Transcript/Abstract Interpretation With Professor Patrick Cousot - Lecture Series on AI


Transcript

SPEAKER A

Hi, everyone, and welcome to the first joint distinguished lecture between FLARE and AI Research, two groups at JPMorgan: FLARE, headed by Marco Pistoia, and AI Research, headed by me, Manuela Veloso. Basically, both groups pursue research in technologies at the frontier of research and development, namely quantum computing and all sorts of other techniques related to technology and AI, as a service and as a discipline of research and engineering applied to financial services. Both Marco and I are extremely pleased to have this joint distinguished lecture because it is in the intersection of the interests of our two groups, and many others will come in that intersection. So without further ado, I just wanted to say hi and welcome everyone, but I will pass to Marco Pistoia to really introduce our remarkable speaker today. Thank you, Marco. Thank you, Patrick.

SPEAKER B

Thanks, Manuela. We're very excited today to have such a distinguished speaker, Professor Patrick Cousot. Professor Cousot is a Silver Professor of Computer Science at the Courant Institute of Mathematical Sciences at New York University. Before that he was a professor at the École Normale Supérieure in Paris, France, at the École Polytechnique, and at the University of Lorraine, and a research scientist of the French National Center for Scientific Research (CNRS) at the Joseph Fourier University of Grenoble, France. He received his PhD in computer science and his doctorate in mathematics from Grenoble Alpes University in France. Professor Cousot, along with Dr. Radhia Cousot, is the inventor of abstract interpretation, which is the foundation of static program analysis. They created this framework in 1977. It is one of the most cited papers in the history of computer science and really the foundation of every work that uses program analysis, which is something companies use every day for code-quality enforcement, security bug detection, and so on. So we're so excited to have Professor Cousot here. He has been awarded numerous honors, like the Silver Medal of the CNRS in 1999, an honorary degree from the Faculty of Mathematics and Informatics of Saarland University in Germany, and the IEEE Harlan D. Mills Award. Along with Dr. Radhia Cousot, he received the ACM SIGPLAN Programming Languages Achievement Award. I'm proud to have collaborated with Professor Cousot and Dr. Radhia Cousot in the past, and I'm so happy to be here today co-hosting with Manuela this talk by Professor Cousot on abstract interpretation. Thank you, Professor Cousot.

SPEAKER C

Thank you. So I will share my slides, and it always takes some time to see where my slides are. Here, that's it. I will try to explain the principles and give you some applications. First, I start with vocabulary, because it's not always well understood what program semantics, verification, static and dynamic analysis, abstract interpretation, and so on mean. So when you have a program, you have a syntax, and you can represent the program in the language by some syntactic element like a syntax tree or things like that. The semantics is a formal definition of the execution of the program: given a program, the semantics is a description of what will happen at execution. Often it's very informal, but sometimes, for example for ML, there is a formal definition, and there are many ways of defining it, for example execution traces or reachable states and so on. Then, to do verification, you need a specification, which is what you would like your program to do, and verification is a mathematical proof that the program semantics satisfies the specification. So when you say "my program satisfies the specification," that's wrong: it is the semantics that does, because the syntax does not satisfy the specification. And because you have recursion and loops, you must do proofs by induction, and that's extremely difficult to automatize, because in induction you have to find an inductive argument, and that is the hard part of the problem: how do you find the inductive argument when you verify? Then another point is called undecidability. It was proved by Gödel, Turing, and many others in the 1930s that any finite mechanical proof must fail on infinitely many programs. By mechanical proof I mean something whose answer is always correct and which terminates. Because of this, you see that it is difficult, and so people have workarounds. The first, for example Coverity: the proof is incorrect. You can make proofs easy if they are not required to be correct. Another case is decidable cases: if you take a specific type of program, like linear arithmetic loops with no test inside and so on, there is an algorithm that gives you the invariant; unfortunately this covers a very small part of programming. What you can also do is assume that everything is finite, as in model checking; then you go out of memory, and the time is almost infinite, so you have no answer. In deductive methods, people have to use interaction to provide the invariant, the inductive argument, because they are not able to infer it. And when we do static analysis by abstract interpretation, the proof is always correct and it always terminates. So there must be something wrong here, and what does not always work is that it is sometimes inconclusive: it cannot tell you whether the specification is satisfied or not. So the game is to have analyzers that are precise and scale, to be conclusive as often as possible. Dynamic analysis is completely different: you monitor the execution at run time, so you instrument your program to check things at run time. The problem is that you can only observe one execution at a time, and some properties, like dependencies, require observing two executions, so you cannot do that at run time. You have people who do symbolic execution, where they put symbolic names instead of values, and rapidly you see that they cannot explore all paths in the program. For example, if a program does not terminate, they can conclude nothing by symbolic execution. Others do bug finding; I mentioned Coverity.
There is another approach, which is also called model checking nowadays: you pick a path and you try to prove that there is a bug. You establish a formula, more or less by symbolic execution, for that path, and you prove there exists a value for which the bug will happen, and you use SMT solvers for that. The problem is when the SMT solver answers nothing: you have no idea what is going on. And so for me these are not verification methods, because they don't prove that the specification is satisfied. So in static analysis, the only input is the program text, and the answer will be valid for all executions; it will always be correct. Abstract interpretation is a theory which allows you to abstract the semantics that you have defined into something which is able to do analysis and also verification; the method follows from the semantics by abstract interpretation. So the main objective is soundness, that is, what is proved is true. Sometimes we have completeness, that is, if something is true, the method that you use allows you to prove it. Maybe you have to find an inductive argument, and so the machine will not do it, but for manual verification methods at least you have this property, that the method you propose allows you to prove something if it is true. Static analyses are incomplete: what is true might not be provable because of approximation, but at least what is proved is always correct, which is important. Another point is that abstract interpretation provides a method to design static analyzers, and this design is machine checkable. For the moment we can check only small analyzers, but at least for small tools we can prove they are correct; for big ones it's a limitation of deductive methods, but we have made progress on this. So I will try to introduce informally the main ideas. You have a universe which is concrete. In this universe you have elements — for example, each point is a program execution — and then you have properties, and properties are sets of program executions. For example, "my program terminates" means all these executions here are finite, and maybe this one is "all executions terminate in less than 1000 steps." And this is not manageable by computer, because it's an infinite mathematical object: you cannot represent it in the machine and you cannot enumerate all the points. So what you do is introduce an abstract universe of properties, which will have finite representations of properties. So these properties have this finite representation, and these two are represented in the same way. You see that there is a loss of information, because in the abstract these two properties are represented the same way, whereas in the concrete they are different, and that is where I do an approximation: I approximate this dark blue by the light blue. Now I have to establish a correspondence between the two worlds, and this is done by what is called mathematically a Galois connection. But it's just a map from concrete properties to the abstract ones. So you see, the two blues are mapped to the same one, and there is a kind of inverse function which says: this represents this purple property, this one represents this light blue property. And so there is a notion of inclusion or over-approximation. If I have the light blue, I can represent it exactly in the abstract.
If I have the dark blue, I cannot, because it has nothing corresponding, but I can over-approximate to get this representation, and when I come back by gamma, you see it's an inclusion, an over-approximation. So I have lost information; but without losing information, the only way is to enumerate everything, and because I have infinitely many infinite objects, I cannot do that. Okay, so to this order corresponds an abstract implication, and the trick is that the theory ensures that whenever you say anything and prove it in the abstract domain, it will be true in the concrete one. That is soundness. And obviously in this case it is incomplete, because, you see, I cannot prove this property; it is not represented exactly. But sometimes, for proof methods, we also have completeness. Okay? So the main idea is: everything provable in the abstract will be true in the concrete. Now I will illustrate the idea, and I have taken a modest program with two variables. This represents: at time zero I have x and y at that point; then my program does one computation step, and x now is changed to this value and y to this one; and the next computation step gives me this value. Each time I do a computation step, I move to the next state. For real programs you have a huge number of dimensions and this is very complicated, but I take a simple example, and you see I am looping forever here: I come back to a previous state. Because this picture is hard to manage, I will represent it in this way. You see, I have the time, and at each time there is a point here. So this one is this one, this one is the next one, this one is the next one, and I have made it continuous because a line is better than dots. So this line is one point here. Then, when we define the semantics, you will have many points, because you will have many input states, or inputs, or different files you are managing, and each data input gives you a possible trajectory. This set of trajectories is the semantics, which is a program property; it is one thing in blue here, and the semantics is an infinite set of infinite objects, but mathematically I can reason about it. Then I have a specification, which means, in my example here, no trajectory should go into the red points. So now I have an abstraction. I have to prove that these trajectories never go into the red. What I will do is over-approximate the trajectories and put them in some envelope, and here is my envelope. I have computed this envelope; we will come back to this point later. For the moment, imagine I am able to have such an object. You see, this object can be represented in a machine, because essentially it is determined by these points here, and it is one of the points I have in the abstract. This is one program property; the property is "all executions are in the green." So I can compute that, and then I say: look, because all trajectories are in the green and the green does not intersect with the red, I have proved that my specification is satisfied. So it's two steps. First I take the semantics and I construct the static analyzer; the static analyzer takes a program and constructs the green over-approximation of the behavior. Then, in the second phase, it checks that it does not intersect with the red. Now, this is a miraculous case, but we know by undecidability that this does not always work. When it works, we are happy, because we know all executions are in the green, the green does not intersect the red, and the proof is finished.
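To make this picture concrete, here is a minimal sketch in OCaml (the language Astrée happens to be written in) of the interval abstraction as a Galois connection between sets of integers and intervals. The types and names are ours, purely illustrative, not from the talk:

```ocaml
(* A toy Galois connection: concrete properties are sets of integers
   (finite lists here, for simplicity), abstract properties are intervals. *)

type bound = MinusInf | Int of int | PlusInf

type itv = Bot | Itv of bound * bound   (* Bot represents the empty set *)

(* alpha: concrete set -> abstract interval (loses information) *)
let alpha (s : int list) : itv =
  match s with
  | [] -> Bot
  | x :: xs ->
      let lo = List.fold_left min x xs and hi = List.fold_left max x xs in
      Itv (Int lo, Int hi)

(* gamma, as a membership test: is n in the concretization of the interval? *)
let mem (n : int) (a : itv) : bool =
  match a with
  | Bot -> false
  | Itv (lo, hi) ->
      (match lo with MinusInf -> true | Int l -> l <= n | PlusInf -> false)
      && (match hi with PlusInf -> true | Int h -> n <= h | MinusInf -> false)

(* Soundness: every concrete value is in gamma (alpha s). For example,
   alpha [2; 5; 3] = [2,5], and 4 is in gamma [2,5] even though 4 was not
   in the set: that is the over-approximation. *)
let () =
  let a = alpha [2; 5; 3] in
  assert (List.for_all (fun n -> mem n a) [2; 5; 3]);
  assert (mem 4 a)  (* imprecision: 4 is included in the envelope *)
```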
And so it is what I represented here: what is true here is true here. The fact that the green does not intersect the red implies that the trajectories do not intersect the red. Now, let's see other methods. When you do testing, you try a few executions, and if you are lucky, you will find this one that goes into the red. But everyone knows that programmers are not lucky, and often they miss errors. So you say that this is not sound, because you miss errors. If you do model checking — nowadays people do bounded model checking — they say: I will try all executions for ten minutes or seven days, and then I will stop. And you see, at the beginning there is no error, so they will claim "I have found no bug," but you see that later there is one, and they missed it. Now, what can also happen is tools like Coverity. They say "I cover," but they forget some cases, so in fact they prove nothing: they prove that maybe there is no error in the green, but because some executions go out of it, some might go into a forbidden zone — it's an error, and you will get no message. This will never happen with an abstract-interpretation-based static analyzer. But what can happen is this: it is incompleteness. That is, you see, I have over-approximated, but a bit too much, and so now the green goes into the red. So what I must say is: you might have an error here, and I don't know if you have one or not, because there are two situations. The first one is that you have an error, because there is one execution that goes into the red, but I don't know that; I know only this. And the other is this one: there is no such execution, but again, I don't know it. So this one is called a false alarm, and this one a true alarm. In that case, the only solution we have is to say our analyzer is not precise enough; we are going to use abstractions that are more precise. And so the design of a static analyzer is a bit experimental: you start with a very coarse abstraction and you refine your abstraction until you are satisfied, knowing that it is not possible to get a perfect solution. I am going to go now into more specific detail. There is a little part which is theoretical, just to show that there is some mathematics behind it, but if you skip it, it's okay. So what is the trace semantics? It is the finite or infinite sequences of states that I have represented by the lines, and you may have many of them, or infinitely many of them, for example if you have different inputs. So a trace just records the state of the memory along time, and the state is just what is in the memory plus what is in the control. The control depends on the language, but usually you have a program point, you have a stack to record the calls to procedures, and you may have more complicated things with object languages; and the memory state records the values of variables, what you have allocated in memory, what input you have done, and so on. I will take a very simple case. It's a while language, a small subset of C, and the state will just be a program point l, saying "I am at this point in the program," and the environment rho here tells me the values of the variables in this state. So rho of x equal to five means the value of the variable x in the program is equal to five. This semantics is defined by structural induction, that is, by induction on the syntax of the program; I define it for infinitely many programs by considering finitely many cases. So a base case would be the assignment.
So I have an assignment S, which is a labelled variable to which I assign an expression. This expression has no side effects; it's very simple arithmetic: variables, constants, basic operations, plus, minus, and so on. So what are the traces? It's prefixes: I just observe the beginning of all executions, but it can be as long as I want. Here I can have the initial state: I am at L with any possible values of the program variables, and you see, I stop there. My observation is just: I start the observation, I stop immediately. The next one is: I start and then I do one step. After this step, the control has changed — it is after the statement — and the value of the variable has changed, because x has been assigned the value of the expression, and I have some formal definition: if I provide the values of the variables, this function will give me the value of the arithmetic expression. This rho is the old one, but now the value of x is fixed. So that's the base case, very simple. Now we move to a very complicated case, which is iteration, and for that we will define the semantics as a solution of an equation of the form x = f(x). These solutions are called fixed points. When you have such an equation, you may have zero, one, or many solutions, and there is a famous theorem by Tarski that says: if x ranges over a complete lattice and f is increasing, then there is a least solution, the least fixed point for this order. So you have a partial order, which in our case will be implication, and in some cases you can calculate the fixed point by iterating: you start from the smallest element, you apply f, you apply f again, and this may go on forever, but you can pass to the limit by taking the least upper bound, and this is a fixed point. So there is mathematics that ensures that when I write this, this object exists. And now here is the definition of iteration; I will spend some time on it. You see, the semantics of the while is the least fixed point of this operator. That's my function f in the previous picture, and now I define this function f. It takes an argument, which is the prefix traces that I have built up to now, and because we start from bottom, at the first iteration it will be empty: we say I have built no iteration. So you see, this is false because it's empty, this is false because it's empty; these two terms here are false, and there is only one which is possible when X is empty, which is: an execution can start at the program entry of the while loop and stop there. That's the same as the assignment; it is the base case. Now we have done one step with this, and we can iterate. At the next iteration, x will be these traces here. I evaluate the Boolean expression in the environment, and it is false. So what is the possible execution? You arrive at L, the test is false, so you exit the loop and you never enter. So this is this execution: you arrive at L, and then, because the test is false, you go after the while without changing the values of the variables, because in my language I can have no side effects in Boolean expressions. And now there is another case. Remember, I arrive at L, okay? I arrive at L here, and when I evaluate in the environment, it is true. So what do I do? I start executing the body, and the semantics of the body I have already defined, because it is by structural induction.
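As an aside, here is a minimal sketch of the Kleene/Tarski iteration just described: iterate an increasing function from the least element until stabilization. It assumes the iterates stabilize in finitely many steps (true for the finite toy domain below; in general one must pass to the limit, or widen, as discussed later). The example program and all names are ours:

```ocaml
(* Compute the least fixed point of an increasing function f by iterating
   from the least element until the iterates stabilize. *)
let rec lfp (equal : 'a -> 'a -> bool) (f : 'a -> 'a) (bottom : 'a) : 'a =
  let x' = f bottom in
  if equal bottom x' then bottom else lfp equal f x'

(* Toy instance: the set of values of x reachable at the loop head of
   "x := 0; while x < 5 do x := x + 1". The concrete domain is sets of
   integers, bottom is the empty set, and f adds the entry value 0 plus
   the successor of every value that passes the test. *)
module IntSet = Set.Make (Int)

let f (s : IntSet.t) : IntSet.t =
  IntSet.union
    (IntSet.singleton 0)                                    (* loop entry *)
    (IntSet.map (fun x -> x + 1) (IntSet.filter (fun x -> x < 5) s))

let () =
  let r = lfp IntSet.equal f IntSet.empty in
  assert (IntSet.elements r = [0; 1; 2; 3; 4; 5])
```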
So I define the semantics for this, and then I define the semantics for that, so I can say I know the semantics of the body. And the semantics of the body will tell me that I have a possible execution here, which is: you start at the beginning with the same environment, which was the initial one, and you go inside the loop body. When you go inside the loop body, you can stop at any time. So here the semantics of the loop will be: I arrive, I go into the loop body, I stop; or you can say I terminate the loop body and I come back at L. So here you will have: I arrive at L, I execute the body, and I come back at L. So now our x is: either I have done no iteration — that was the base case — or I have done some iterations and stopped in the middle, or I came back. And then again, here you will have: the test is false, so after one iteration I exit, or I do one more; after two iterations I exit, or I do one more; after three iterations I exit, or one more. And so when I iterate this function, I have longer and longer iterations in the loop body. It can be infinite, so maybe it will go forever, and when I take the least upper bound, which is the join of all of them, I will have the prefix executions. So you see, mathematically we can give a formal definition, and this will be the basis of all our reasoning. If this is wrong, all our later reasoning will be wrong. But you see, nowadays we can take this and generate compilers that are proved correct from such definitions. For example, CompCert is a project where a compiler for C is proved correct for some machines, and it uses a definition of the semantics like this. So if we use this semantics, our static analysis will be correct, okay? Now we want the maximal traces. A maximal trace is a prefix trace such that you arrive after the statement, or there was a break inside the statement and you go to the label where you break; and there are also the infinite traces, infinite sequences of states all of whose prefixes are in the prefix semantics. So you see, for this definition of the while loop: I iterate, and at some point I exit the loop and I cannot go longer — this will be a maximal execution; or the loop goes forever, and this will be an infinite sequence, another execution of the loop, because if there are inputs inside the loop, there can be many executions. So that is more or less the formal part. I did it not to show off, but to show that it is possible to give a formal definition of a language; and for example for C, there are some. Okay, so the first abstraction is reachability. You see, my abstraction takes the traces that I had before — and also it takes a precondition that I have a bit forgotten here — it takes a program point, and it returns the set of values of the variables at each program point. So it says: how do I get this environment at program point l? I pick an execution, any one I have defined previously. What I forgot here is to say that it should start with an initial state in P0, the precondition; I forgot it, sorry. So I pick all the traces defined before, this one here, and I look at all the traces, and in each trace I look at all the times you go through this point l, and I collect the values of the variables. So what I get is: when I am at point L, x equals one, maybe, or another one would be x equals five, or another one would be x equals ten, et cetera.
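A minimal sketch of this reachability abstraction, with toy types of our own choosing: from a set of traces, collect, for each program point, the values of a variable observed there.

```ocaml
(* States are (program point, value of x) pairs; a trace is a list of
   states; alpha maps a set of traces to a map from program points to the
   set of values of x seen at that point. *)
module IntSet = Set.Make (Int)
module PointMap = Map.Make (String)

type state = { point : string; x : int }

let alpha (traces : state list list) : IntSet.t PointMap.t =
  List.fold_left
    (fun acc trace ->
      List.fold_left
        (fun acc s ->
          PointMap.update s.point
            (fun vs ->
              Some (IntSet.add s.x (Option.value vs ~default:IntSet.empty)))
            acc)
        acc trace)
    PointMap.empty traces

(* Two executions passing through point "L" with x = 1 and x = 5: the
   abstraction records {1, 5} at "L", the strongest invariant there. *)
let () =
  let t1 = [ { point = "entry"; x = 0 }; { point = "L"; x = 1 } ] in
  let t2 = [ { point = "entry"; x = 0 }; { point = "L"; x = 5 } ] in
  let inv = alpha [ t1; t2 ] in
  assert (IntSet.elements (PointMap.find "L" inv) = [ 1; 5 ])
```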
And you see that when I have defined this, I take my previous definition here, I apply the alpha, and I get this one. This one generates the strongest invariant, that is, exactly, at each program point, the possible values of the variables. Of course, this is not computable, so it's an intermediate step in my reasoning. But from this I can derive proof methods like Floyd-Hoare logic, if you know it; I can derive them from this, but it's not computable. So I will do another abstraction, a Cartesian abstraction. Assume I have an invariant saying x equals two and y equals one, or x equals three and y equals two, or x equals five and y equals four. So this is this red line, and x and y live on this red line. What I will do is project on the coordinates here. When I have projected, I know that x is between two and five, and I know that y is between one and four. But what I have lost is the fact that y plus one is equal to x; I have lost the information that I am on this line, and now the information is that I am in this box. So the abstraction is very simple: for each variable — remember, this was a set of environments, the possible values of the variables at some point — I project and collect its possible values in R (which I have written P here; it's unfortunate, I should have written R). So you see the idea: I forget about relations between variables. So now we have sets of values, but I can have an infinite set, that is, x might be 1, 5, and so on forever. So I will make a further abstraction: I will take the minimum and maximum value of the set. So where I had an infinite set of values, now I have two numbers, or maybe plus infinity and minus infinity, and now this is representable in the machine. I can represent that for each variable and each program point — that's finite — and I can put that in my computer and compute the semantics where I have transformed everything into intervals. This will not terminate — we will see later why — but it's progress. So when you design a static analyzer, you will have many abstractions, maybe 60 or 100 depending on the configuration you take, because it's extremely difficult to find one abstraction that will cover all your problems. So what you do is say: I take intervals, it solves some problems; I take congruences, it solves some problems. For example, to know that x is equal to something modulo four is useful in C, because you must have variables that are aligned on words, double words, and so on; so if you know that the value of the pointer is some constant modulo four, you know the alignment. And if you have two analyses, one with congruences and one with intervals, what you can do is make a reduction. That is, if you satisfy this congruence and you are in this interval, you must be in this smaller interval: this gives me a reduction saying that I must be between these two values, which is more precise, because one of the old bounds is impossible. This operation of reduction is really important, because you create more and more complex abstractions by adding new ones and making reductions, so that you can transfer information from one to the other. Now we have fixed points, and the idea of fixed point abstraction is the following. You have a concrete function which has a fixed point here; you over-approximate it, and you compute the fixed point of the over-approximation, and you see that it is above the concrete fixed point.
So when we cannot compute this, we take an over-approximation of the function, compute the fixed point of that function, and we guarantee that it is an over-approximation here. Another representation is this one: I have my function, I pass to the limit, I continue to iterate, and I abstract to four points only, which is very nice. You see, at this point they all go to this one, and all these guys go there. So now when I iterate, it will be a finite iteration, and this point over-approximates the fixed point here. So it is correct — not very precise, but correct. Now, with intervals you can have [0,1], [0,2], [0,3], [0,4], forever, and so we must have another idea, which is called widening in abstract interpretation, and it can be likened to Newton iteration. For example, if you have a function and you want to calculate its fixed point, you iterate: you start at zero, you apply f, you apply f, you apply f, and very slowly it converges to the fixed point. If you do these iterates in a machine, you have errors because of floating-point computation — let us forget that — but it will converge extremely slowly. So what Newton proposed is: instead of using the function, I will use the derivative, and so instead of applying the function to go to the next point here, I will use the derivative to extrapolate. And when I have this condition, I know by Tarski's theorem that my x is an over-approximation. So that's the idea of getting a fixed point, or an over-approximation of a fixed point, which is very important because it is an inductive argument; that's a way of finding inductive arguments that we can use to do inductive proofs. And I take the same example of widening for intervals. You can imagine that the analysis starts executing the program, and the first point is this one, and I abstract it into an interval, which is the same point. Then I do one step, and I get the previous point here plus this one here. So, these two points: that's not an interval, so I have to abstract it into an interval, which is this one: x is one and y is between one and two. Then I go on — imagine it's a loop — and the next iteration introduces this point, x equals two, y equals two. So now I have an interval and a point; this is not an interval, so I have to abstract, and I get this interval. My three points are definitely in this rectangle, and I go on, and I have this new point and this new interval. And you see that it can go on forever by introducing new points. So this is where I'm going to do widening. I'm going to say: I take the previous interval, I take the next one, which is this one, and I do a widening that will extrapolate to get this interval. You see, the idea of the extrapolation is very simple: one is stable, I keep it; two is not stable, I extrapolate to plus infinity; one is stable, I keep it; two is stable, I keep it. And so you see, it's an induction saying: oh, I guess that maybe the next points will be in this interval. You see that I cannot go on forever, because in the worst case I will get minus infinity, plus infinity everywhere; this will be very imprecise, but nevertheless correct. And this idea always applies. When you take examples of static analyses of numerical properties: we have seen this semantics; we have seen we can cover it with intervals — this needs a widening; we have seen that we can use simple congruences — we don't need widenings for this one, because this domain doesn't have infinite increasing chains.
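Here is a minimal sketch of such an interval widening; it is a toy version of the idea just described, not Astrée's actual operator:

```ocaml
(* Unstable bounds are extrapolated to infinity so that ascending chains
   like [0,1], [0,2], [0,3], ... stabilize after one widening step. *)
type bound = MinusInf | Int of int | PlusInf
type itv = Bot | Itv of bound * bound

let le_bound a b =
  match a, b with
  | MinusInf, _ | _, PlusInf -> true
  | Int x, Int y -> x <= y
  | _ -> false

(* widen old new: keep stable bounds, jump to infinity on unstable ones *)
let widen (old_i : itv) (new_i : itv) : itv =
  match old_i, new_i with
  | Bot, i | i, Bot -> i
  | Itv (lo1, hi1), Itv (lo2, hi2) ->
      let lo = if le_bound lo1 lo2 then lo1 else MinusInf in  (* stable? *)
      let hi = if le_bound hi2 hi1 then hi1 else PlusInf in
      Itv (lo, hi)

(* [1,1] widened with [1,2]: lower bound stable and kept, upper bound
   unstable and extrapolated to plus infinity. *)
let () =
  assert (widen (Itv (Int 1, Int 1)) (Itv (Int 1, Int 2))
          = Itv (Int 1, PlusInf))
```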
You can have octagons here, which are more precise than intervals and can be used to handle arrays. For example, when you have an array from one to N, you need to know that the index is between one and N, and octagons will allow you to do that. When we work on control/command software, you have filters, and you need a domain for them. And when you do floating-point computation, you need to bound the errors, and that's another domain. In all these cases, except the congruences as we saw, we need widenings. And so you see, this can be very precise by combining them.
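As an illustration of the reduction between domains mentioned above, here is a minimal sketch of reducing an interval by a congruence constraint; the representation and names are ours:

```ocaml
(* Reduction between an interval and a congruence x = r (mod m): tighten
   the interval bounds to the nearest values satisfying the congruence. *)
type itv = Bot | Itv of int * int            (* finite bounds, for brevity *)
type cong = { r : int; m : int }             (* x = r (mod m), with m > 0 *)

let reduce (i : itv) (c : cong) : itv =
  match i with
  | Bot -> Bot
  | Itv (lo, hi) ->
      (* smallest n >= lo and largest n <= hi with n mod m = r *)
      let up n = n + ((c.r - n) mod c.m + c.m) mod c.m in    (* round up *)
      let down n = n - ((n - c.r) mod c.m + c.m) mod c.m in  (* round down *)
      let lo' = up lo and hi' = down hi in
      if lo' > hi' then Bot else Itv (lo', hi')

(* x in [3,18] with x = 0 (mod 4) reduces to [4,16]; an empty result
   (e.g. [5,6] with x = 0 (mod 4)) proves the conjunction infeasible. *)
let () =
  assert (reduce (Itv (3, 18)) { r = 0; m = 4 } = Itv (4, 16));
  assert (reduce (Itv (5, 6)) { r = 0; m = 4 } = Bot)
```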

SPEAKER B

Professor, sorry for interrupting. I just wanted to remind you that we have approximately 12 minutes left.

SPEAKER C

I am very late. So, we can handle symbolic properties like heap allocation, and we can do checking, that is, check specifications like this one. You see, this one means: while I am not at L, x will be positive for a number of steps; then, if I arrive at L, x will be zero; and then I can be anywhere and x will be negative. So that's something I can do. And soundness — I will not go on about this, and I will go to two examples of static analyzers; I will take five minutes. The first one is Andromeda, and you know the person who did it at IBM. What it does is security for web applications, in languages for the web. It's very interesting because it's demand-driven: it does not calculate all the information and then check; if information is missing, it calculates it when needed. So it's precise and scalable, and it checks things that are bad on the web, like XSS or SQL injection and things like that. I was a co-author, but Marco and his team did all the work; I just thought about it. Another one is Astrée, which is used in Europe in the automobile industry, in avionics, in space; in the US it is not really used. So let's go on. It has a nice interface. When I did it with my team at ENS, it was 60,000 lines of OCaml; now it's 264,000 lines of OCaml, not counting the interface, which is also thousands and thousands of lines. So you check for division by zero, out-of-bounds pointers, bad arithmetic, overflow. It can handle parallel programs, so you check for deadlocks, which are bad. It models the operating system — not all operating systems, but in avionics and automotive you have specific operating systems — and it can analyze the interaction with the system, and so on. It was evaluated by the American government, whose objective is to develop techniques for evaluating software tools. They tried four tools and two were successful. One is Astrée, which is French and based on abstract interpretation, done by my former students, and the other one is also French, based on abstract interpretation, and done by master's students — not PhDs, where this one was done by PhDs — and the others, they don't say what they are. So you see, they have a test set, and both tools found all the bugs, plus some false alarms — Astrée many fewer than Frama-C. And interestingly, Astrée discovered bugs in the test set: they were provided the test set, they did not know some bugs, and Astrée found them. The evaluators made a manual analysis, and now these bugs are included in the ones you must find if you want to pass. Now, you have to choose analyzers. The first option is to avoid them, because programmers are never held responsible, so they can do anything and it will always be fine. The other option is to take academic analyzers. For example, this recent paper checks model checkers on automotive code. You see here about 1,000 or 2,000 lines of C; Astrée analyzed 10 million lines in one shot, so you see the scale factor is 10,000. And when you look, most of them fail. But they are very successful on academic tests that are built to be little programs of two lines that are difficult to analyze — it doesn't scale. There are two excellent books, referenced at the end. Thank you.

SPEAKER B

Thank you, Professor Cousot, such a fascinating talk. Thank you very much.

SPEAKER C

Okay, I will put the slide online. You can also take them, it's no problem.

SPEAKER B

Thank you. So I think we have time for a few questions.

SPEAKER D

Any of the attendees that would like to ask a question, please use the "raise your hand" feature found in Zoom. Looks like we do have one hand up currently; it comes from Carrie. Carrie, please unmute to ask your question.

SPEAKER A

Hi, Professor.

SPEAKER E

How are you? Thank you so much for presenting today. Can you hear me okay?

SPEAKER C

Yes, yes. Speak slowly because my English is not perfect. Okay.

SPEAKER A

But I am wondering I appreciate your.

SPEAKER E

Expertise and I'm wondering if we were.

SPEAKER A

Talking about symbols at some point.

SPEAKER E

I'm wondering if you could provide some.

SPEAKER A

Insight on what is the symbol for iteration. Would it be uppercase sigma, or does it really matter?

SPEAKER C

I used symbols essentially on two slides. One was a picture of a data structure — I don't know if you refer to this one. And on another one, you give a specification using a regular expression where you replace the symbols by assertions, and then you can build an analyzer that will check your specification. If you speak of the first, the symbols are just representations of a very complex graph, which is the set of all possible memory allocations in your program. So one memory allocation is a graph, and because it changes over time, you have to represent this set itself by a graph. So it's rather complex; you can read the paper — I gave the reference. And the other one was more oriented towards checking programs. But instead of doing testing, you just specify properties and you use an analyzer to tell you whether these properties are true or false. And instead of invariants, I want to be able to represent evolution over time. So either you choose a temporal logic, where the formulas just say: when I am at this point, this is true of the variables, and then I go to the next point, this is true of the variables, and so on. But my observation is that temporal logic is a bit difficult, so I use regular expressions, because most programmers understand regular expressions. You see, a regular expression tells you a possible sequence of properties of the execution, and you can check that very precisely if you want. But you see, the analysis will not enumerate, contrary to model checking; that's a big difference. I don't know if I answered your question.
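As a rough illustration of the idea in this answer, here is a toy sketch of matching an execution trace against a regular expression over assertions. It supports only "p" and "p*" atoms and matches greedily, so it is far simpler than the real analysis; all names are ours:

```ocaml
(* A specification is a sequence of assertion atoms: Once p matches one
   state satisfying p, Star p matches any number of such states. *)
type 'a re = Star of ('a -> bool) | Once of ('a -> bool)

(* Greedy matching of a trace (list of states) against the specification;
   adequate for this illustration, not for general regular expressions. *)
let rec matches (spec : 'a re list) (trace : 'a list) : bool =
  match spec, trace with
  | [], [] -> true
  | Once p :: rest, s :: trace' -> p s && matches rest trace'
  | Star p :: _, s :: trace' when p s -> matches spec trace'
  | Star _ :: rest, _ -> matches rest trace
  | _, _ -> false

(* The spec from the talk: x > 0 for a while, then x = 0 once, then x < 0 *)
let spec =
  [ Star (fun x -> x > 0); Once (fun x -> x = 0); Star (fun x -> x < 0) ]

let () =
  assert (matches spec [ 3; 2; 1; 0; -1; -2 ]);
  assert (not (matches spec [ 3; 0; 1 ]))
```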

SPEAKER E

I appreciate your insight.

SPEAKER C

Yeah, okay, thank you.

SPEAKER D

I'm showing more hands. The next hand up is from Ann. Ann, please go ahead and unmute to ask your question.

SPEAKER E

Hi professor, thank you for the lecture. You were talking about dividing the input sets and I'm interested in actually narrowing the input sets so that we can identify the steps that will cause a program to fail.

SPEAKER C

Yes.

SPEAKER E

Yeah. Have you thought about it? Where can I read more about that? That's something.

SPEAKER C

Okay, so I have not spoken of what is called backward analysis. The analyses that I have presented start from the beginning and go forward in time, but you can do the reverse. You can say: assume I have reached this point here where there is an error, and let us go backward to see what are the input conditions that will create this error. You can iterate this process, and it will stabilize at some point. Then there are the cases: either you prove that it's possible to have the error, or that it's impossible, or you don't know. Then there are a number of things you can do. For example, you can say: I will cut my input domain in two. You have an input interval where you might have errors; you cut it in two and you do two analyses. We have done experiments like this, and it is more precise than many other tools. The problem is to scale: if your programs are reasonably small, that is, a million lines, it's okay; if it is a billion lines, it will not work. But there are techniques, and many people work on these techniques nowadays; it is making progress slowly. There are randomization techniques, there are case-analysis techniques. It is a problem, because when we were working with Airbus on Astrée, when we were finding a bug, they were asking for a counterexample. And a plane flies for 10 hours; if you have a bug after 10 hours, your counterexample will be a trace of 10 hours with a point every five milliseconds — a huge object. So usually what we do is we don't go backward up to the beginning, but we go backward to a point where we are confident that things are okay. If you are in the loop, we will go backward up to the end of the loop, not to the beginning of the loop, and this proved to be rather effective. You can also ask the user to help a lot by providing, for example, a regular expression telling where he thinks the error should be found; this will narrow the band of cases that we have to check. So my answer is: if you want one execution, it's rather difficult, but in general you get a narrow band, and the programmer is intelligent enough to understand at some point what the problem is. That's my experience.
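A minimal sketch of the backward reasoning described in this answer, on intervals; the program and transfer functions are toy examples of ours, not the actual tool:

```ocaml
(* Given a set of "bad" output states, push it backward through
   "x := x + c" and "assume x <= k" to find the inputs reaching the error. *)
type itv = Bot | Itv of int * int

let meet a b =
  match a, b with
  | Bot, _ | _, Bot -> Bot
  | Itv (l1, h1), Itv (l2, h2) ->
      let l = max l1 l2 and h = min h1 h2 in
      if l > h then Bot else Itv (l, h)

(* backward x := x + c : states before the assignment that lead to post *)
let bwd_add (c : int) (post : itv) : itv =
  match post with Bot -> Bot | Itv (l, h) -> Itv (l - c, h - c)

(* backward "assume x <= k" : before the test, x must also satisfy it *)
let bwd_assume_le (k : int) (post : itv) : itv =
  meet post (Itv (min_int, k))

(* Program: assume x <= 10; x := x + 5; error if x lands in [12, 20].
   Backward analysis: inputs in [7, 10] can reach the error region. *)
let () =
  let error = Itv (12, 20) in
  let before_assign = bwd_add 5 error in          (* [7, 15] *)
  let inputs = bwd_assume_le 10 before_assign in  (* [7, 10] *)
  assert (inputs = Itv (7, 10))
```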

SPEAKER E

Thank you, professor. Yes, I have a way to narrow down the line number set to the one that interests us. And I actually appreciate your point about not going to the beginning, but going to, as they call it, the last known good point, and going from there. Yeah. Thank you.

SPEAKER C

If you have procedures and so on, there are techniques to infer specifications — preconditions and postconditions — and then you can do your analysis within the body of the procedure, which is usually not that big. So up to procedures of 1,000 or 2,000 lines, it will work. The problem might be that the specification of the procedure is wrong, but that will be another analysis that will find it.

SPEAKER E

I'd love to talk to you more about it, but I know you're out of time. But thank you so much.

SPEAKER C

I have not understood if it is a question or not.

SPEAKER E

I apologize. I said that we are out of time. I would love to talk to you more about it. We've done something interesting in my space. I'm sure you'd be curious to find out. Anyway, we can contact you later.

SPEAKER C

Yes, it's no problem. You ask Marco, he knows my email. You send me an email and we can discuss; I can do a Zoom session, I can give you my Zoom number. I am available to discuss if you have more technical questions. This was a very general talk, so I did not go into all the details, but I can. If you want to discuss this separately, I'm available. I'm a researcher, so most people think I have nothing to do, and I am always available.

SPEAKER A

I'm sorry, I have to leave. Yes, thank you very much for your talk and Marco will take the final goodbye.

SPEAKER C

Okay, bye bye.

SPEAKER A

Thank you, Patrick.

SPEAKER B

Professor Cousot, if you still have time, we have a few more questions, but—

SPEAKER C

If you need, yes, I can stay as long as you want. Thank you. I was impressed by the number of participants. I don't know if it is true, but I have never had a class that big; the maximum class I have had was 500, which is much less than we had today.

SPEAKER B

We had over 860 people attending, so it was really incredible. We have other questions coming.

SPEAKER D

Yes, our next question comes from Alexander. Alexander, please unmute to ask your question.

SPEAKER C

Yes, please.

SPEAKER F

Professor Cousot, thank you very much for an absolutely great presentation. I have a question. A few years ago I was working with a third-party company which used first-order logic proof solvers to analyze the code, which is a similar concept to what you have: using math and analysis to analyze code. What we found, though, is that it works pretty well on low-level languages, maybe like assembler or even C. But, for example, in highly distributed types of programming, like microservices — in my case that was a Scala-based framework using the actor system — it was really hard to use for a system like that and find errors, find these kinds of conditions, because of the orchestration, because such a system is highly distributed and messaging-driven. Do you have any thoughts about that?

SPEAKER C

Yes. So one problem is that we are dependent on the language, because we depend on the semantics, and when you take two different languages, they have two different semantics. So it's very hard to do an analysis that works for many kinds of semantics, in particular in parallelism: if you take monitors, actors, semaphores, communications, asynchronous, synchronous, and so on, each time it's a different model, so it's a different analysis. Second point: it is difficult, so researchers do small problems and don't go to big ones; they take the low-hanging fruit first. Now, it is not impossible to analyze parallel code if it has statically allocated processes. The example I have in mind is a pilot interface in planes — the communication with the pilot and the system, screens, and so on. It's 3 million lines of C, it has 15 processors, and it has many communications: it has voice and sound and blinking things, and you want to prove that when somebody is speaking a message, there will be no other sound at the same time, and when you have a blinking screen, it will be on two different devices, not on the same one, and things like that. On this program, which is partly handwritten and partly generated automatically, we had at the time about a thousand alarms. I was completely disappointed; I said a thousand alarms is not manageable. But when I discussed with the people, they told me: yeah, but we have a hundred engineers working on this project, and a thousand divided by a hundred means each engineer will have between zero and twenty alarms, and this will be manageable. We went on, and the problem was there was a big data structure which represents the state of the system, with semaphores inside and so on, so we are still working on that, and I am hopeful that someday we will reach almost no false alarms. The trick is that this is for a model of parallelism that is most often found in embedded systems: it will work for automotive, it will work for planes, satellites, and so on, and it will not work for, say, cloud computing, because the model is different. What I think is unfortunate is that you have thousands of researchers, they work on small problems, they publish a lot, and they are reluctant to have projects that, like Astrée, take 20 years. So you see, it's very difficult to have funding for this long term, and that's a difficulty. Bankers know that if you don't invest, you have no return; it's the same in program analysis. So you say actors: there are some analyzers, but you see, usually it's one guy that does that for his CV, and then it goes to the trash. To have a long-term project is much more difficult.

SPEAKER F

Thank you very much for your answer; actually, I got a very good answer. In my experience working with this company, like you say, they wanted to go with something they knew rather than explore boundaries that were new for them, and we couldn't find common ground. And like you say, for cloud computing and distributed systems, your approaches actually might work.

SPEAKER C

Yes, I can tell you a story from when I was young. After maybe five years of abstract interpretation, I decided to work on parallelism, and I started to work on parallelism, and then at some point I got no money, because the people at the time were saying: yeah, they build these parallel machines, but by Moore's law, two years later your monoprocessor is more powerful than your parallel processor, so we are not going to spend money on parallelism. And so we are paying now for the fact that the deciders 10 or 20 years ago were not able to anticipate that parallelism would be the main problem of nowadays, and so there has been very little research that scales and is sound. That's unfortunate.

SPEAKER F

Thank you.

SPEAKER C

That's a great question.

SPEAKER F

And now we're at the time when we cannot scale up any more, right? Because of the quantum limits of the microprocessors, we have to go to parallelism; there is no other way. Thank you so much for your answer, professor.

SPEAKER C

All right.

SPEAKER D

And we have another question from Caroline. Caroline, please unmute to ask your question.

SPEAKER C

I understand that you hear nothing — there is a mic not working. Is it better now? Not better.

SPEAKER E

My question is, as a student of mathematics, can you recommend lectures that I should get into? You already mentioned what from algebra is used; is there anything outside algebra? Are there any others?

SPEAKER C

There are two people who are speaking at the same time, so I don't understand. Marco, have you understood something?

SPEAKER B

Caroline, can you actually mute yourself for a second? You might be in a busy place. But I think Caroline was asking you, Professor Cousot, if you have any literature — a book, maybe — that you would recommend for people.

SPEAKER C

Yeah, at the end of my talk, I gave one book, which is more for engineers and which is written by one of my former students. It's a good book, so I would recommend this one. And the other one is a book I have been working on this last year. It's not yet published; it will appear in September. But it's more for researchers. I can show my slide — I will put the slides online.

SPEAKER B

We can actually get your slide later and find the reference.

SPEAKER C

Yes.

SPEAKER B

So thank you, Caroline. I'm sorry, but I think maybe you are in the office and there were other voices coming through, but I think I got your question for Professor Cousot.

SPEAKER D

All right, I am showing a couple more questions. Our next question comes from Tom. Tom, please unmute to ask your question.

SPEAKER G

Can you hear me?

SPEAKER C

Yes.

SPEAKER G

Okay, excellent. I guess it's a question that follows on a little bit from the previous one, which is: we're often in the business of having to build highly distributed systems, with dependencies outside of our control as well. And certainly we've looked at using something like TLA+ to help us understand and specify our interfaces and determine that our protocols are right. But you seem to shy away a little bit from temporal logic as a way of expressing these things. How do you see the work that you've done applying to these large-scale distributed systems that we're increasingly being driven to both write and also reason about? Because we would like to be able to validate correctness, and then be able to assert that the implementation we have, and the assumptions we've made about the underlying infrastructure, are actually asserted by the model.

SPEAKER C

Okay, so this is not a simple question. When you speak of TLA+ or the B method, or Event-B, or methods like this, it is a method that proceeds by refinement. That is, you are supposed to have a specification and refine it until you get a program. At each refinement you have proof obligations, and it is guaranteed that whenever you have a requirement at one level, it will be satisfied at the implementation level if you prove the refinement constraints. This is very nice for small algorithms: if you have a small protocol of, say, 300 lines, it will take you two or three months to get it this way, but in the end it will work. So for very specific algorithms, I think doing program refinement is great, and formally it is just an inverse of abstract interpretation; it can be formalized in the same way. Now, if you go to programs that are, say, 100,000 lines or more, that are replicated on thousands of computers, I don't think you will make it, because it's extremely difficult to go through all the proofs. The proofs are usually much larger than the programs: if you have a program of 10,000 lines, your proof will be a million lines. So you see, you move the problem from understanding programs to understanding proofs. So I would recommend refinement in the small. In the large, in principle, it should be possible to have a static analysis tool that offers some guarantees, like messages are not lost, or there is no global deadlock, or things like that. Unfortunately, no one has invested in such tools. For example, if you use Astrée without parallelism, you will be able to analyze a program in C or C++; whenever there is a call to communication or to the system, you will have to provide some specification, and that's it — it will not be a global view of the system. So, to your question, my answer is: in the small do refinement, in the large do analysis — but there are no analyzers, so try to find a company that will build them. It's extremely hard, because there are few competent companies in software; in the US you have maybe two or three that are competent, and most of them are commercial. For example, one has an analyzer that works on 70 languages. When you take the intersection of 70 languages, you prove nothing: take any 70 languages on the market and you will see that they are sufficiently different that you cannot do the same reasoning for all of them. So I am a bit pessimistic, and too old to start a project of 20 years now, but my students in Europe are starting projects. Don't worry, in ten years you will have solutions — that's the time scale. It's so difficult that you cannot go faster. My last student took seven years to do a thesis, and he was employed the next day by Facebook. So that was one person on the market, employed immediately. And for a project like this, you need ten: you need someone who has experience, and previous failures, to understand what is important and what is not, to draw the contours of the project and to organize it. It's really difficult, and the competence is not there, unfortunately. You see, I am the only one giving a class on abstract interpretation in the US. In all the other classes, you have one or two lectures on it, so people know it exists, but they never go into the details, and so they are unable to apply it. That's really a problem.
So what I would recommend is: you take the problem which is the most important for you — only one — and try to find somebody who is able to find this kind of bug, and only this kind of bug; who makes a model of programs to find the specific bug that is important to you. That's already a start. Then you move on to more and more complicated properties, people get experience, and slowly you grow your team and the scope of your tool. But start small. So when we did Astrée, we asked for small programs. We got a simulator of planes, which is a big program, so we told them it's too big. We said: maybe you have a simulator of the simulator. They said, yes, we have one. It was too big. So we said: maybe you have a simulator of the simulator of the simulator. Yes, they had one. So we analyzed this one, then the second one, then the simulator, and then the plane. It took five years, but it grew slowly, from problems we could solve to more and more difficult problems. People get experience, knowledge, and I think that's the way to go. If you say "I want to check my cloud computing," it's too difficult, you will not succeed. You have to isolate. And I spent three months discussing the problems that are important to the avionics industry before starting my little 500-line analysis of the problem. So it's important, I would say, to restrict the ambition to start with. Maybe you ask too much of these companies.

SPEAKER G

Maybe. Okay, thank you very much.

SPEAKER D

I am showing a few more questions. Our next question comes from Vincent. Vincent, please unmute to ask your question.

SPEAKER C

Cousot. Cousot. In French, we don't pronounce the T at the end, so you don't say "Cousotte," you say "Cousot" — like merlot, pinot. Okay.

SPEAKER F

I'm actually from Canada, so I definitely got a lot of French back in the day. Okay, my question is — I ask this partially in good humor — how do you analyze the static analyzers? How do you make sure that the analyzer itself is sufficiently correct?

SPEAKER C

So when we did Astrée, we had a theory, and you have seen that it comes down to small parts. For example, intervals: you have the addition. So my students were doing proofs by hand: they say, given the intervals of the arguments, I will prove that the interval of the result is correct. And they have done that a lot — for filters, for all the domains. The general structure was known to be correct, because for years we had worked on that. So we did not do a full proof, but the fact that there is a theory is a guideline, and we knew the problems. For example, when you have a reduction, it can destroy the widening: you do a widening and then you do a reduction, so you have extended your interval and the reduction will bring it back; the next widening will extend it, the reduction by some analysis will bring it back, and it will not terminate. We knew this problem existed, and so we proved it did not occur — in fact it did, but we made the modification and then proved it did not. So it was, I would say, informal proof, like mathematicians do: they have a convincing argument, but not a list of formulas that is hundreds of pages. Then some groups started a project where they built a small analyzer with intervals, octagons, and so on — the same structure as Astrée, with layers of domains that communicate for reduction and so on — and they proved it in Coq. The problem is that this small analyzer is a factor of 100 from the true Astrée. So you see: the analyzer is small, we can prove it formally, we have the mathematics, we have the provers, we have the people who do it; to scale is difficult. It's the same problem as scaling proofs that you do with any prover: going big is difficult. But contrary to many tools — if you compare to compilers, for example — in Astrée we have found very few bugs, very few: maybe one bug every month, which is very few, and now it's a long time since we have found a bug. The bugs that are difficult are the ones of imprecision: your answer is correct, but the bug is that you could have done better, because you have over-approximated at some point where it was not useful. These bugs are difficult to find. We had a few, because people say: I will not spend too much on this part, I will over-approximate, it will be correct, let's go. And so it was, and someday you find a problem where you should have been more precise. So these bugs, which are sound but imprecise, are difficult. And I can tell you about the last bug I heard of. There was a guy using the analyzer who said: I have found a bug in Astrée. So we said, no problem, we will have a meeting; we need a little extract of the program. He came with his extract and we thought about it, and we said: the problem is that the bug which is reported by Astrée is a true bug — it is in your program. And we showed the bug. So he ran to the telephone and discussed the bug with his friends in his company; it was a bug in flying software, so it was important. I am optimistic that there are very few bugs in programs designed with a theory behind them that you can use informally, as mathematicians do. Now, there are very few people doing that. Are you satisfied? Myself, yes. Thank you.
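As a small illustration of the kind of local soundness fact mentioned here (interval addition), a sketch that merely tests the property at random, where the students would prove it by hand; the types are ours and machine-integer overflow is ignored:

```ocaml
(* Local soundness of interval addition: if x is in [a,b] and y is in
   [c,d], then x + y is in [a+c, b+d]. *)
type itv = { lo : int; hi : int }

let add (i : itv) (j : itv) : itv = { lo = i.lo + j.lo; hi = i.hi + j.hi }

let mem n i = i.lo <= n && n <= i.hi

(* Randomized check: forall x in i, y in j, x + y is in (add i j). *)
let () =
  Random.self_init ();
  for _ = 1 to 10_000 do
    let i = { lo = Random.int 100 - 50; hi = Random.int 100 } in
    let i = { i with hi = max i.lo i.hi } in   (* ensure lo <= hi *)
    let j = { lo = Random.int 100 - 50; hi = Random.int 100 } in
    let j = { j with hi = max j.lo j.hi } in
    let x = i.lo + Random.int (i.hi - i.lo + 1) in
    let y = j.lo + Random.int (j.hi - j.lo + 1) in
    assert (mem (x + y) (add i j))
  done
```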

SPEAKER D

And our last question comes from Ian. Ian, please unmute to ask your question.

SPEAKER H

Okay, thank you. First of all, thank you very much for the interesting lecture. It really is one of the areas that I'm particularly interested in. Now, you pointed out that the analysis needs to reflect the semantics of the programming language. We would like to maybe design the next 700 programming languages to have semantics that are most helpful to static analysis, to abstract interpretation, and yet still have the largest domain of useful programs. So with that in mind, what kinds of semantics are more or less hazardous to that goal? Or what advice can you give to, perhaps, a language designer who wants to be able to handle nondeterministic concurrency, dynamic behaviors and so forth, without making it impossible to tell if your program is good?

SPEAKER C

Yeah. So which kind of semantics? I would say no single one, because in practice you don't have one semantics, you have several. For example, you will have traces, which is one kind of semantics, with interleaving for parallelism, which is not so good, because maybe you lose true parallelism. And then you will have another layer, maybe, where you separate, for example, the semantics of procedures from the semantics of the program. So this is an abstraction that can be formalized by abstract interpretation. Then maybe you will want a semantics of functions which is a precondition and a postcondition, to be able to analyze functions, procedures, methods and so on separately. Then, if you have classes, you will have to have a class invariant, and so on. And all of this will come from your trace semantics, the initial level of abstraction, which you can define formally. And now, when you go to the analysis, for some problems some semantics will be useful; for some other problems, another semantics will be useful. So in practice we have a kind of hierarchy of three or four semantics, and we build the analysis with respect to one of them for some property, and with respect to another one for some other property. And because we have shown that each is an abstraction of another, by composition we are sure that they all analyze the same concrete, basic semantics. So that's one point. The next point is that to have a universal language that does everything is extremely difficult. You see, if you have dynamic creation of processes, you have distribution, you have arbitrarily complex classes, inheritance; if you put everything in a language, I think history has proved that you will fail. And I would cite an example from IBM, because Manuela and Marco are no longer at IBM. IBM, when I was young, said: we have too many languages, we will create one. And they called it Programming Language One, PL/I. So I studied PL/I, and you had, for example, automatic conversion from one type to another. So you had a matrix of 80 by 80, and you had rules to go through this matrix to know, for any data structure, like a Boolean, how it becomes a complex number or an array of characters. So you could do that. So they spent a lot of money on developing compilers, on developing the type system. They created a laboratory in Vienna to have a formal specification, with the Vienna Definition Language and so on. And it failed, because it was too complicated; too many parts were not understood. And if you don't have the abstract machine, you cannot run it. And so they abandoned it, after spending I don't know how much. The good point is that they learned the lesson: they never tried again to have a universal language. And the second point is that VDL, the formal semantics, became what is called VDM, the Vienna Development Method, which is still used, at least in Europe, for specification and program development. So it was not a wasted effort. But I would discourage you from doing a universal language. And if you want a language which is dynamic, one of my students, Antoine Miné, has an analyzer for, I think it's Perl, which is rather dynamic. And if you send me an email, or Marco sends me an email, I will be happy to give you the reference. You can try the analysis. It's for one language, and it's very different from the others because of the dynamic aspects, typing and so on, and maybe it can be useful.
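
To make the hierarchy of semantics concrete, here is a small sketch, under our own assumptions (toy data, invented function names), of two composed abstractions: traces are abstracted to reachable states, which are further abstracted to an interval. Because the second map is an abstraction of the first, both levels provably describe the same concrete executions, as the answer above explains.

```python
# A concrete "trace semantics": each execution is the sequence of values
# taken by a single variable x (a deliberately tiny stand-in).
traces = [
    [0, 1, 2, 3],          # one execution
    [0, 1, 2, 3, 4, 5],    # another execution
]

def alpha_reach(traces):
    """First abstraction: forget the order of events and keep only the
    set of reachable values (a reachability semantics)."""
    return {v for trace in traces for v in trace}

def alpha_interval(states):
    """Second abstraction: forget the exact set and keep only its bounds."""
    return (min(states), max(states))

# Composition: the interval semantics is an abstraction of the
# reachability semantics, which is an abstraction of the trace semantics,
# so all three levels speak about the same executions.
states = alpha_reach(traces)          # {0, 1, 2, 3, 4, 5}
interval = alpha_interval(states)     # (0, 5)
print(states, interval)
```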

SPEAKER H

All right. I suppose you're pcousot at nyu.edu?

SPEAKER C

Yeah, you send me a message and I will tell you the reference, the pointers you need. And for the previous question, on the book: I will send it to Marco with my slides. So we have answered these two specific questions.

SPEAKER B

Professor Cousot, I would like to thank you really very much for this very fascinating and interesting lecture. I'm sorry that we kept you here for an hour and 36 minutes, but it doesn't happen often that we have the opportunity to meet somebody who knows the Bible of the field...

SPEAKER C

By heart. I can prove that I am an old guy.

SPEAKER B

You definitely are one of the top scientists in the field. So it's an honor for us to have you here and to have had the opportunity to learn from you and ask you all these questions. So thank you again, and we hope to have the opportunity to collaborate with you in the future.

SPEAKER C

If you have questions, don't hesitate to ask. And I enjoyed the talk, and I was happy to meet you again, because we have not seen each other for a long time now. So bye, everybody. Thank you.

SPEAKER B

Thank you. Bye.
