Transcript
So we have talked much already about interpretation, like running a program. For example, we have exercised the denotational style of semantics to get interpreters, or the operational style. Here we want to add a form of interpretation, usually referred to as abstract interpretation, that allows you to compute properties of the program rather than actually execute or evaluate the program. For this to make sense, I assume that you have had coverage of the basics of interpretation and denotational semantics, as we rely on these subjects in what follows. Okay, so here is a simple motivation for abstract interpretation. Look at this simple imperative program: completely silly, but enough to make the point. So x might even be a program variable that is part of the input, which we only provide if we actually run the program for real. Nevertheless, without knowing the value of x, we know for sure that the value of y at this point must be positive.
I mean, just by basic reasoning, basic arithmetic, we know it must be positive here. And because it's positive, we never actually need to do this test, so we can already save on the test. Additionally, we don't really need this code; we should emit a warning that this code will never be executed. So depending on how you look at it, it's either an optimization or maybe something we should warn about. And so we really want to understand how we can write some form of interpreter, an abstract interpreter, that gives us this information, namely that y is positive at that point. Once we know these things, we can easily implement optimizations or warnings. Okay? So the key idea of abstract interpretation is that we no longer operate on actual values as you know them, like ints and booleans. Rather we operate on abstract values. For example, for the problem that we just looked into, it may be enough to operate on signs rather than actual ints. And again, for the problem we just looked at, regardless of the value, regardless of the sign of x, y would be positive at the appropriate point in the program. So here are just some examples of operations that we would typically use on numbers, but now we define them instead on signs. And all these rules are pretty straightforward. If we multiply two negative numbers, we get a positive number. If we multiply negative and positive, we get negative. And here we also add a question mark because sometimes we simply don't know.
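To make this concrete, here is a minimal Haskell sketch of such a sign domain and its multiplication, mirroring the table just described; the names are illustrative, the question mark is rendered as Unknown, and a refined version with bottom and top elements appears further below.

<pre>
data Sign = Zero | Pos | Neg | Unknown
  deriving (Eq, Show)

-- Multiplication on signs rather than on ints.
multSign :: Sign -> Sign -> Sign
multSign Zero _       = Zero       -- zero times anything is zero
multSign _    Zero    = Zero
multSign Unknown _    = Unknown    -- otherwise an unknown operand makes the result unknown
multSign _    Unknown = Unknown
multSign Pos  Pos     = Pos
multSign Neg  Neg     = Pos        -- minus times minus is plus
multSign _    _       = Neg        -- pos * neg or neg * pos
</pre>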
Okay? So if you don't know the sign, then in many cases we also don't know the result, except maybe for zero, where we know that the product is always zero. And similarly, we can apply this trick to pretty much any operation that a normal interpreter would involve. In this manner we can imagine how the normal interpreter becomes more abstract in computing on properties rather than actual values. Okay? So in order to make this step from a, let's say, concrete interpreter to an abstract interpreter, here is a simple concrete interpreter again for the imperative language. We want to somehow factor it so that it's clear what part of interpretation is shared among concrete versus abstract interpretation. Let's just recall how this interpreter is structured, right? Essentially we have this compositional scheme: we traverse over all the statement forms and invoke recursion on all the substatements. And then, once we have determined all the meanings of subexpressions and substatements, we have some operator that defines the meaning as it is composed from the constituents, okay? And here are all the helper functions, all the operators that indeed define meanings, or compose meanings, if you like. This was the denotational style of interpretation, and we are going to rely on this style as far as abstract interpretation is concerned. So remember: the semantics of skip is the identity function.
For assignment, essentially we just update the variable binding in the store; for sequential composition of statements, we use function composition; for the if statement, we essentially also use a conditional within the functional language here; and for while we need to compute an appropriate fixed point, if you remember. Now, how can we extract the common structure for both a concrete and an abstract interpreter? Well, the trick is that the actual definitions of all these functions are specific to the kind of interpreter we need, but the fact that we need such members at all, that's a general fact. So what we do here is we imagine that we build an algebra, a record of all these ingredients, and that we parameterize this compositional and recursive scheme by this algebra. Rather than hardwiring specific choices here, we group up such specific choices in an algebra, in Haskell as a record, and we pass this record to this function. And by this we get a large amount of reuse. So here is the type of these semantic algebras, a kind of record type that we need. First of all, we assume that we need some sort of store transformation without committing to any particular representation of stores, so this is a type variable. We also assume, because of expression evaluation, that we have some form of, you could say, store observation. And we don't really commit to any particular representation for values here either, because we want to use this for both abstract and concrete interpretation. So we rely on these type variables within this record type. And you see we have a skip and an assign, a seq, an if, et cetera. So we basically have record components that correspond to the operators that we need for filling in the details of composite meanings in an interpreter. And all these guys have the appropriate types. For example, seq' takes a store transformation and another store transformation and returns a store transformation. Okay? So this record type provides the structure of the kind of ingredients that we need in order to commit to a particular interpreter, in fact to a particular representation for all the involved types and particular definitions of the operators for composing the meanings. Okay, now here is the parameterized interpreter. So we have this additional argument here: we receive a semantic algebra which is still fully polymorphic, so we don't really commit to any particular store or value representation. And then this code is very similar to what we had before, except that it always picks up the appropriate operator from the semantic algebra a. It's the same style of recursion into the statements and into the expressions, but whenever we need to combine the ingredients, we use the operators as we can access them within the semantic algebra a.
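As a rough illustration, here is a small Haskell sketch of such a semantic algebra and of the parameterized interpreter, assuming a tiny while-language syntax; all names (Alg, seq', execute, and so on) are illustrative and do not necessarily match the book's code.

<pre>
-- Imports used in this and the following sketches.
import qualified Data.Map as Map
import Data.Function (fix)

-- A tiny while-language, just enough for the examples in this transcript.
data Stm = Skip | Assign String Exp | Seq Stm Stm
         | If Exp Stm Stm | While Exp Stm

data Exp = IntConst Int | Var String
         | Add Exp Exp | Mul Exp Exp | Leq Exp Exp

-- The semantic algebra, polymorphic in the store type s and the value type v.
data Alg s v = Alg
  { skip'   :: s -> s                                    -- meaning of skip
  , assign' :: String -> (s -> v) -> s -> s              -- meaning of assignment
  , seq'    :: (s -> s) -> (s -> s) -> s -> s            -- sequential composition
  , if'     :: (s -> v) -> (s -> s) -> (s -> s) -> s -> s
  , while'  :: (s -> v) -> (s -> s) -> s -> s
  , intc'   :: Int -> s -> v                             -- integer constants
  , var'    :: String -> s -> v                          -- variable lookup
  , add'    :: (s -> v) -> (s -> v) -> s -> v
  , mul'    :: (s -> v) -> (s -> v) -> s -> v
  , leq'    :: (s -> v) -> (s -> v) -> s -> v
  }

-- The parameterized interpreter: the recursive, compositional scheme is fixed,
-- while the composition of meanings is delegated to the algebra a.
execute :: Alg s v -> Stm -> s -> s
execute a Skip         = skip' a
execute a (Assign x e) = assign' a x (evaluate a e)
execute a (Seq s1 s2)  = seq' a (execute a s1) (execute a s2)
execute a (If e s1 s2) = if' a (evaluate a e) (execute a s1) (execute a s2)
execute a (While e s)  = while' a (evaluate a e) (execute a s)

evaluate :: Alg s v -> Exp -> s -> v
evaluate a (IntConst i) = intc' a i
evaluate a (Var x)      = var' a x
evaluate a (Add e1 e2)  = add' a (evaluate a e1) (evaluate a e2)
evaluate a (Mul e1 e2)  = mul' a (evaluate a e1) (evaluate a e2)
evaluate a (Leq e1 e2)  = leq' a (evaluate a e1) (evaluate a e2)
</pre>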
So this is now a fully parameterized interpreter. Accordingly, we should now go and instantiate this interpreter to basically get back our standard interpreter. And once we have achieved this, we can try to build a different, abstract interpreter that computes the properties of interest, such as signs rather than concrete values. Okay, in our standard interpreter, values are either ints or bools, and the stores are just maps from variable identifiers to values. The algebra that we need here is then indeed obtained by instantiating the polymorphic algebra type with store and value. Here we just repeat all the definitions that we extracted from the original monolithic, if you like, standard interpreter. So these are just the same definitions of seq and if and while that we have seen previously, except that they are now grouped up as an algebra, so that we can pass this algebra to the parameterized interpreter function and thereby get back our standard interpreter.
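Continuing the illustrative sketch from above, the standard interpreter might be recovered roughly like this; again, the names and details are mine, not necessarily the book's.

<pre>
data Value = IntVal Int | BoolVal Bool
  deriving (Eq, Show)

type Store = Map.Map String Value

-- The standard interpreter's algebra: concrete stores and values.
standardAlg :: Alg Store Value
standardAlg = Alg
  { skip'   = id
  , assign' = \x f s -> Map.insert x (f s) s
  , seq'    = \f g -> g . f
  , if'     = \c f g s -> case c s of
                            BoolVal True  -> f s
                            BoolVal False -> g s
                            _             -> error "bool expected"
  , while'  = \c f -> fix (\w s -> case c s of
                                     BoolVal True  -> w (f s)
                                     BoolVal False -> s
                                     _             -> error "bool expected")
  , intc'   = \i _ -> IntVal i
  , var'    = \x s -> s Map.! x
  , add'    = \f g s -> case (f s, g s) of
                          (IntVal a, IntVal b) -> IntVal (a + b)
                          _                    -> error "ints expected"
  , mul'    = \f g s -> case (f s, g s) of
                          (IntVal a, IntVal b) -> IntVal (a * b)
                          _                    -> error "ints expected"
  , leq'    = \f g s -> case (f s, g s) of
                          (IntVal a, IntVal b) -> BoolVal (a <= b)
                          _                    -> error "ints expected"
  }
</pre>

So, in this sketch, execute standardAlg is just the ordinary interpreter again.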
Okay? So now let's imagine how we instead set up an abstract interpreter which operates on signs, essentially, or computes signs as properties instead of ints and booleans. To this end we basically replace those concrete domains that we talked about previously by these domains. So rather than talking about int, we will be talking about sign. And what is a sign? Well, it's zero, pos, and neg; there's nothing surprising about that. But there are now two additional elements here, bottom and top. We will see this in a second, but the intuition is that we will be doing some sort of iterative fixed point computation. Bottom will serve a special role in helping us to start the fixed point computation from basically undefined, whereas top is an indication of error, in the sense that we were not able to compute any useful property; so basically it means we could have any sign here and we're really stuck. We also replace booleans by CPOBool, where CPO stands for complete partial order. This is hinting at the kind of algebraic structure that we need here for the kind of fixed point computation to be possible. But if you like, it's basically just a richer version of Bool: it's either a proper bool, or we also have a bottom and a top for booleans, okay? And then Property replaces Value: a property is now either a sign or a CPO bool. And Properties replaces Store: it's also a map, and it's also a map from variable identifiers, but we don't map to values, we map to properties. Okay? So this is how we prepare our abstract interpreter in terms of the types that the interpreter operates on. Now, just a fun fact here: we can clarify that signs are essentially really abstract numbers, and there is some formal reasoning behind this that I skip over here. But I give you a very simple constructive argument: we just instantiate the Num class of Haskell with sign. In this way we sort of show that sign is something like a number, just as much as int is a number. And you remember the tables for these operations that I showed earlier in this presentation? We can capture them here as the operations of Num. So here, for example, we point out how addition works, and you see that whenever top and bottom are involved, this basically defines the result. Think of it like this: if we already cannot hope for one of the operands to have a more specific sign, but have to assume it could be any sign, well, then we will not be able to assign a specific sign to the result. And if, again imagining the fixed point computation, we don't know yet anything about the sign of either side, then we also don't know yet the sign of the result. Otherwise, if you add zero and zero, it's zero; if you add pos and zero, it's positive; and so on.
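Here is a sketch of these abstract domains in Haskell, together with a Num instance for signs that captures the addition and multiplication tables. It refines the earlier three-valued preview with bottom and top elements; the constructor names (SBottom, SZero, and so on) are again illustrative rather than the book's.

<pre>
-- Signs, extended with a bottom (undefined) and a top (overdefined) element.
data Sign = SBottom | SZero | SPos | SNeg | STop
  deriving (Eq, Show)

-- Booleans, likewise extended with bottom and top.
data CPOBool = BBottom | BBool Bool | BTop
  deriving (Eq, Show)

-- Properties replace values; abstract stores (Properties) replace stores.
data Property = SignProp Sign | BoolProp CPOBool
  deriving (Eq, Show)

type Properties = Map.Map String Property

-- Signs are "abstract numbers": Sign can be made an instance of Num.
instance Num Sign where
  SBottom + _   = SBottom          -- nothing known yet about an operand
  _   + SBottom = SBottom
  STop + _      = STop             -- an operand could be any sign
  _   + STop    = STop
  SZero + y     = y                -- adding zero preserves the sign
  x + SZero     = x
  SPos + SPos   = SPos
  SNeg + SNeg   = SNeg
  _ + _         = STop             -- pos + neg: sign unknown

  SBottom * _   = SBottom
  _ * SBottom   = SBottom
  SZero * _     = SZero            -- zero times anything is zero
  _ * SZero     = SZero
  STop * _      = STop
  _ * STop      = STop
  SPos * SPos   = SPos
  SNeg * SNeg   = SPos             -- minus times minus is plus
  _ * _         = SNeg             -- pos * neg or neg * pos

  negate SPos   = SNeg
  negate SNeg   = SPos
  negate s      = s                -- zero, bottom, and top are fixed by negation

  abs SNeg      = SPos
  abs s         = s

  signum        = id               -- a sign is its own signum

  fromInteger n
    | n > 0     = SPos
    | n < 0     = SNeg
    | otherwise = SZero
</pre>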
Now, I was mentioning already the fact that we need these bottoms and tops, bottom being basically the undefined value and top being the overdefined value. And we need to arrange all the proper values, or signs, together with the bottom and the top element in an appropriate structure: a complete partial order. Essentially we want some sort of partial order relation and also a least upper bound. In this way we can later on perform the appropriate fixed point computation in the abstract interpreter, in a way similar to how we performed fixed point computation in the denotational interpreter. Okay, the partial ordering: if x and y are equal, they are ordered. Bottom is smaller than anything else, and anything is smaller than top. Otherwise, for signs, we don't have any other true order relationships; for example, zero is not smaller than pos, pos is not smaller than neg, and so on. Therefore we have a catch-all case here with False. The least upper bound of two values is x if they are the same. And obviously, if bottom is on either side, then the least upper bound is the other value. If we have any other combination of values, for example the combination of pos and neg, then we get the top sign, which means we don't know. Okay, this is also how we explicitly tell the type system here that we have a bottom, but that's a bit of a coding detail. Just to clarify, the kind of complete partial order that we use here can also be visualized by a Hasse diagram like this. So this is bottom, this is top, and we have three distinct elements in between, neg, zero, and pos; bottom is smaller than all of these elements, and all of these elements are smaller than top.
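Such a partial order and least upper bound might be captured by a small type class along the following lines; the class and its pointwise lifting to maps are my own sketch, continuing the illustrative definitions above, and not necessarily the book's exact interface.

<pre>
-- A small interface for complete partial orders with a bottom element.
class Eq a => CPO a where
  bottom :: a
  (<=.)  :: a -> a -> Bool       -- the partial order
  lub    :: a -> a -> a          -- least upper bound

instance CPO Sign where
  bottom = SBottom
  x <=. y
    | x == y       = True
    | x == SBottom = True        -- bottom is below everything
    | y == STop    = True        -- everything is below top
    | otherwise    = False       -- no other order relationships between signs
  lub x y
    | x == y       = x
    | x == SBottom = y           -- bottom is neutral for the least upper bound
    | y == SBottom = x
    | otherwise    = STop        -- e.g. lub SPos SNeg = STop: we don't know

instance CPO CPOBool where
  bottom = BBottom
  x <=. y
    | x == y       = True
    | x == BBottom = True
    | y == BTop    = True
    | otherwise    = False
  lub x y
    | x == y       = x
    | x == BBottom = y
    | y == BBottom = x
    | otherwise    = BTop

-- Properties lift the structure of signs and abstract booleans.
instance CPO Property where
  bottom = SignProp SBottom      -- an arbitrary but convenient choice of bottom
  SignProp a <=. SignProp b = a <=. b
  BoolProp a <=. BoolProp b = a <=. b
  _          <=. _          = False
  lub (SignProp a) (SignProp b) = SignProp (lub a b)
  lub (BoolProp a) (BoolProp b) = BoolProp (lub a b)
  lub _            _            = error "incompatible properties"

-- Abstract stores (maps) are ordered and joined pointwise;
-- absent variables are treated as bottom.
instance (Ord k, CPO v) => CPO (Map.Map k v) where
  bottom    = Map.empty
  m1 <=. m2 = all (\(k, v) -> v <=. Map.findWithDefault bottom k m2) (Map.toList m1)
  lub       = Map.unionWith lub
</pre>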
We actually opted for this model of signs here, but you could also argue that we might want to add a bit more precision. You could have additional elements like non-pos and non-neg. This would allow us to define more least upper bounds without going right away to top. More generally, when you do abstract interpretation, you always have some choice in how you set up your abstract domains, basically how big these abstract domains are. Obviously, the bigger the domains are, the more precision you can maybe expect, but maybe the computations also get very expensive. So there's always a trade-off. Okay, now here is our algebra for sign detection. You might need to put in some time to understand this in detail, so I will just run through it quickly to provide some intuitions here. It is quite a beast, obviously, but some parts are easy. Here we just repeat some of the type synonyms that we rely on: we have our abstract values, that is Property, and we have our abstract stores, that is Properties. And here we build our algebra, an algebra that is somewhat close to the algebra for the standard interpreter. For example, skip is defined by id, just as in the standard interpreter. Also the treatment of assignments is the very same, with the only difference, of course, that we manipulate different kinds of stores here; but the shape of it is the same as in the standard interpreter. And that's also true for sequential composition of statements.
We apply function composition just as before. So there's apparently nothing that we can or must abstract from here; things start to be different for if. For if, what we do is: we receive something that corresponds to the condition, and something g and h that correspond to the then and the else branch. Now what we do is we look at the meaning of the condition: we apply it to the given store, the abstract store, and we insist that we get back an abstract bool. We certainly don't want to get a sign here, so we just insist it's an abstract bool. Then we perform a case discrimination on that abstract bool: it could be a true, it could be a false, and it could also be a bottom or a top. If we somehow know that the condition will evaluate to true, then the meaning of the if statement is solely defined in terms of the meaning of the then branch. Likewise, for false we defer to the else branch. But now it could be that we don't really know yet what the value of the condition is because, remember, and we will see this in a second, this is where the fixed point computation kicks in. If we start from bottoms with some fixed point computation to carry out interpretation, initially this condition may also evaluate to bottom. And if this is the case, the composed meaning of the if statement will also be bottom, which means we probably need to continue the fixed point computation to obtain meaningful results. But it may also be the case that we have already figured out that the condition could evaluate to both true and false, and this is what the top bool case is for. If the condition can evaluate to both boolean results, then we need to combine the meanings of the then and the else branches, and this is where we use the least upper bound. This is quite a nasty bit: we are essentially combining the abstract stores here, right? Whatever abstract store the then branch could compute or the else branch could compute needs to be combined, and that combined store models what could be the result of the if statement.
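A sketch of this treatment of if, in terms of the illustrative types from above; the exact rendering of the bottom case is my own simplification.

<pre>
ifAbs :: (Properties -> Property) -> (Properties -> Properties)
      -> (Properties -> Properties) -> Properties -> Properties
ifAbs c g h s =
  case c s of
    BoolProp (BBool True)  -> g s              -- condition known to hold: then branch only
    BoolProp (BBool False) -> h s              -- known to fail: else branch only
    BoolProp BBottom       -> bottom           -- nothing known yet: keep iterating
    BoolProp BTop          -> lub (g s) (h s)  -- could be either: join both branches
    _                      -> error "abstract bool expected"
</pre>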
So now, for while, it gets a bit tricky, and don't expect to get this just from my very brief explanation, okay? But the essence of it is as follows: f is the meaning of the condition of the while statement, and g is the meaning of the body of the while statement. We then compute a fixed point over these, where we involve if and skip and compose the assumed meaning of the while statement, x, with the meaning of the body. In a way, this is very similar to the standard interpreter, except that we use a different fixed point operator here, and that we initiate the fixed point computation in a special way, starting from bottom. We will look at this a bit more in detail on the next slide. But the key thing to understand is: yes, we perform a very similar fixed point computation for a while statement, because the meaning of a while statement needs to be defined by a fixed point, except that we perform the fixed point computation on these complete partial orders and we start the fixed point computation from the bottoms.
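The definition described above composes the meanings via if and skip much as in the standard interpreter. As a deliberately simplified stand-in (not the book's exact formulation), here is one standard way to obtain the abstract effect of a loop as a least fixed point at the loop head, computed from bottom with the fixLimit operator sketched further below.

<pre>
whileAbs :: (Properties -> Property) -> (Properties -> Properties)
         -> Properties -> Properties
whileAbs c body s0 = ifAbs c (const bottom) id inv
  where
    -- the abstract store at the loop head: what flows in from before the
    -- loop, joined with the effect of one more iteration when the
    -- condition may still hold
    inv    = fixLimit step bottom
    step x = lub s0 (ifAbs c body (const bottom) x)
</pre>

The exit store keeps inv only when the condition may be false. This is not the book's formulation verbatim, but it exhibits the same ingredients: if, the least upper bound, and a fixed point computation started from bottom.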
Okay? And the rest of this stuff is again very similar to what we had in the standard interpreter; we just operate on different, that is, abstract domains, but other than that it's pretty much the same. Okay, so let's look a little bit more at this problem of fixed point computation. Remember, this is the kind of fixed point operator that we used in the standard denotational interpreter; it is essentially defined by the fixed point condition: the fixed point of f is such that if we apply f to the fixed point, well, that's again the fixed point. In a way, this just relies on computation unfolding: we rely on the meaning of the body of the while loop to reach a base case at some point, so that this infinite recursion terminates. For the abstract interpretation, we really want to make sure that we terminate. I mean, we want to terminate with the result of the analysis; we don't want it to just analyze forever. So we use a different way of computing fixed points. Essentially, what we do here is we apply a given function, call it h, to some initial value i, think of this as bottom, and we apply it any number of times. So basically, iterate h i produces a list of applications of h to i, and each element corresponds to an increasing number of applications: it's first i, then h i, then h (h i), and so on. In our case, this is something like the store here, let's say. And then what limit does is it basically looks at a prefix of this infinite list, and we are in a lazy language, so we never materialize this infinite list.
It looks at the first two elements of that prefix, and if they are the same, we have reached the fixed point; if not, we just keep on looking through this recursion with limit. This fixed point exists if we have constructed our abstract interpretation in a sound manner. I'm not going to be very precise about the constraints that we have to meet, but essentially it requires that we are able to work on complete partial orders here and able to start from bottom, and also that all these functions meet additional conditions. If we construct the system in such a sound manner, then this computation will also terminate. I realize that this is not quite obvious, and you will have to play with this in order to get a good understanding of it. Anyway.
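Here is a sketch of this iteration-based fixed point operator, and of how the pieces above might be assembled into a complete sign-detection algebra for the parameterized interpreter; as before, the names are illustrative and the assembly is my own reconstruction rather than the book's code.

<pre>
-- Apply h repeatedly, starting from an initial value (think: bottom),
-- and stop as soon as two consecutive iterates coincide. Termination
-- relies on the soundness conditions mentioned above.
fixLimit :: Eq a => (a -> a) -> a -> a
fixLimit h i = limit (iterate h i)
  where
    limit (x : y : rest)
      | x == y    = x                -- reached the fixed point
      | otherwise = limit (y : rest)
    limit _       = error "impossible: iterate produces an infinite list"

-- The sign-detection algebra, assembled from the sketches above.
abstractAlg :: Alg Properties Property
abstractAlg = Alg
  { skip'   = id
  , assign' = \x f s -> Map.insert x (f s) s     -- same shape as in the standard algebra
  , seq'    = \f g -> g . f
  , if'     = ifAbs
  , while'  = whileAbs
  , intc'   = \i _ -> SignProp (fromIntegral i)  -- uses the Num instance on Sign
  , var'    = \x s -> Map.findWithDefault (SignProp SBottom) x s
  , add'    = liftSign (+)
  , mul'    = liftSign (*)
  , leq'    = \f g s -> case (f s, g s) of
                          (SignProp a, SignProp b) -> BoolProp (leqSign a b)
                          _                        -> error "signs expected"
  }

-- Lift a binary operation on signs to abstract store observations.
liftSign :: (Sign -> Sign -> Sign)
         -> (Properties -> Property) -> (Properties -> Property)
         -> Properties -> Property
liftSign op f g s = case (f s, g s) of
  (SignProp a, SignProp b) -> SignProp (a `op` b)
  _                        -> error "signs expected"

-- Abstract comparison of signs.
leqSign :: Sign -> Sign -> CPOBool
leqSign SBottom _     = BBottom
leqSign _     SBottom = BBottom
leqSign SNeg  SZero   = BBool True
leqSign SNeg  SPos    = BBool True
leqSign SZero SPos    = BBool True
leqSign SZero SZero   = BBool True
leqSign SPos  SNeg    = BBool False
leqSign SPos  SZero   = BBool False
leqSign SZero SNeg    = BBool False
leqSign _     _       = BTop          -- same non-zero sign, or top: could go either way
</pre>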
So we now have our abstract interpreter for sign detection; let's put it to work. Here are two variations of a program for computing factorials. Basically, y should be the factorial of x, and I'm putting up two variations just to pinpoint a challenge, because our sign detection may more easily be useful for one of these variations than the other. What's the difference? Well, in the first one we use an extra loop variable i, which we basically increment until we hit x, while in the second one we basically use x itself and decrement it down in a certain way. This is the difference between these two variations. And as we will see, our sign detection algebra, as we have defined it so far, works fine for the first variation of factorial (the upper one) but has some issues with the second variation. So let's look at this. What do we do here? We take this function for factorial and we feed some assumptions into the analysis; that is, we assume that x is positive, because we want to compute the factorial of a positive number x. Sign detection by abstract interpretation then gives us some properties for the variables that are involved in the program. In particular, it figures out that y, which is the result that we're interested in, is going to be positive. It also figures out that i and x are positive at the end of the program. That's nice to know, but maybe we were not interested in that.
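A hedged rendering of the two factorial variants in the illustrative syntax from above, together with the analysis calls; the concrete programs on the slides may differ in detail.

<pre>
-- Variation 1: y = 1; i = 1; while (i <= x) { y = y * i; i = i + 1; }
fac1 :: Stm
fac1 =
  Seq (Assign "y" (IntConst 1))
      (Seq (Assign "i" (IntConst 1))
           (While (Leq (Var "i") (Var "x"))
                  (Seq (Assign "y" (Mul (Var "y") (Var "i")))
                       (Assign "i" (Add (Var "i") (IntConst 1))))))

-- Variation 2: y = 1; while (1 <= x) { y = y * x; x = x + (-1); }
fac2 :: Stm
fac2 =
  Seq (Assign "y" (IntConst 1))
      (While (Leq (IntConst 1) (Var "x"))
             (Seq (Assign "y" (Mul (Var "y") (Var "x")))
                  (Assign "x" (Add (Var "x") (IntConst (-1))))))

-- Feed the assumption that x is positive into the analysis.
analyzeFac1, analyzeFac2 :: Properties
analyzeFac1 = execute abstractAlg fac1 (Map.fromList [("x", SignProp SPos)])
analyzeFac2 = execute abstractAlg fac2 (Map.fromList [("x", SignProp SPos)])
</pre>

With the sketches as given, analyzeFac1 maps y to SignProp SPos, whereas analyzeFac2 only maps y to SignProp STop, which mirrors the discussion that follows.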
Now, we apply the same abstract interpretation to the other variation of factorial, where we don't actually have a variable i but instead count down x, and, surprise surprise, abstract interpretation fails here to figure out the right property for y. What it does is it assigns top to y, which means it really doesn't know what sign y is going to have at the end of the program. It could be positive, it could be negative, it could even be zero. So this is not so great. You can look at the algebra, and it's discussed at length in the book: basically, we can apply a simple trick to become a little bit more nuanced about how we handle if statements, because, remember, there was the problem that we would quite easily end up combining then and else branches in a way that could easily lead to trouble. There is a more clever way of doing this; I'm not going into it here, but indeed this more clever analysis will still figure out that y is positive. Great.
Okay. Yeah, that's it. So we basically looked at interpretation in the style of denotational semantics. We factored out the compositional scheme and the typing structure underlying it, so that we could parameterize interpretation in a semantic algebra. Then we could either use a concrete semantic algebra and get back standard interpretation, or we could set up an abstract algebra that would instead involve properties of interest, in our example signs, and by this we could compute properties of the program. All the code that I discussed here is of course available in the repo, and there are also additional metaprogramming techniques other than abstract interpretation covered in the book. Thanks.
Auto Chapters
INTERPRETATION WITH ABSTRACT INTERPRETATION
Abstract interpretation allows you to compute properties of the program rather than actually execute or evaluate the program. The key idea is that we no longer operate on actual values as you know them; rather, we operate on abstract values. Once we know these things, we can easily implement optimizations or warnings.
HASKELL INTERPRETER: FULLY PARAMETERIZED INTERPRETATION
We want to somehow factor it so that it's clear what part of interpretation is shared among concrete versus abstract interpretation. How can we extract the common structure for both concrete versus an abstract interpreter?
INTRO TO THE ABSTRACT INTERPRETER IN HASKELL
An abstract interpreter which operates on signs, essentially, or computes signs as properties instead of ints and booleans. What is a sign? Well, it's zero, pos, and neg. And in this way we can later on perform the appropriate fixed point computation in the interpreter.
INTERPRETER ALGEBRA FOR SIGN DETECTION
Here is our algebra for sign detection. It is somewhat close to the algebra for the standard interpreter. The treatment of assignments is the very same. There's apparently nothing that we can or must abstract from here; things start to be different for if.
FIXED POINT COMPUTATION FOR WHILE
In a way, this is very similar to the standard interpreter, except that we use a different fixed point operator here and that we initiate the fixed point computation in a special way starting from bottom. Here we rely on the meaning of the body of the while loop to essentially have a base case so that we terminate this infinite recursion.
SIGN DETECTION BY ABSTRACT INTERPRETATION
So here are two variations of a program for computing factorials. The sign detection algebra works fine for the first variation of factorial (the upper one) but has some issues with the second variation. We can apply a simple trick to become a little bit more nuanced about how we handle if statements.