Issue
Python's builtin function compile()
code = """def fact(x):
if x <= 1:
return 1
print (1)
return x*fact(x-1)
fact(10)"""
c = compile(code, "<string>", "exec")
code = """def fact(y):
if y <= 1:
return 1
print (1)
return y*fact(y-1)
fact(10)"""
d = compile(code, "<string>", "exec")
here c == d is False. Which is the expected behavior? (adding space in source code or making print 1
instead of print(1)
doesn't result in a changed object, which is correct.)
My question is: Can compile object be used to detect changes in the Python source code?
Edit:
To explain it more clearly, I am working on a web application that will allow users to execute their Python code.
Every time a user executes code, its executed on the server. Even adding a space/bracket etc results in a new execution. I am trying to optimize this step by storing the compiled code and if the code is the same in the new request then not executing it.
How do I know that code has changed? (Is it the kind of change that requires executing the code again or a simple space addition.)
I do think there are other ways to achieve what I am trying to achieve by using hashed value or something like this but I want to do it this way because it seems more Pythonic and better than reinventing the wheel.
Solution
Don't do this, as an optimization it won't result in great speed ups.
compile
is a built-in function, it is implemented in C
, it is fast and is not a place where you should be looking to optimize. When a user tries to execute code you should allow it to be compiled and executed without any sort of caching.
Consider the following, writing code that tries to discover if there's any difference in text entered by a user, will most likely result in more execution time than actually just compiling the function and executing it. Apart from this, you also have the fact that you'll have to store and retrieve the compile code somewhere. The first adds unnecessary space requirements and, the second, adds to the overhead because of the look ups you have to do (depending on how you chose to store it, of course).
Apart from what I just said, you can try and compare the result of compiling Python code but the approach is limited and obfuscated. I will assume the code snippets you entered should compare equal since the only difference between them lies in the parameter names (whose name has no impact on code execution).
Use the byte code:
In general the approach I'm going to present only returns "equal" (True
) if the code snippets in question only differ in white-space and/or parameter names, in all other cases, it returns False
(you could attempt to make it more intelligent but that would probably require you taking many edge cases into consideration).
What you should generally be aware of is that compile
returns a code
object. code
objects generally contain all the information Python
needs in order to execute a piece of code. You can view the instructions contained in a code object by using the dis
module:
from dis import dis
dis(c)
1 0 LOAD_CONST 0 (<code object fact at 0x7fa7bc30e630, file "<string>", line 1>)
3 MAKE_FUNCTION 0
6 STORE_NAME 0 (fact)
6 9 LOAD_NAME 0 (fact)
12 LOAD_CONST 1 (10)
15 CALL_FUNCTION 1
18 POP_TOP
19 LOAD_CONST 2 (None)
22 RETURN_VALUE
This is what your code snippet does. It loads another code object (which represents the function fact
), makes a function call and returns.
The same exact output is generated when you call dis(d)
, the only difference lies in the loading of the code
object for fact
:
dis(d)
1 0 LOAD_CONST 0 (<code object fact at 0x7fa7bc30e5b0, file "<string>", line 1>)
As you can see, they are different:
code object fact at 0x7fa7bc30e630 != code object fact at 0x7fa7bc30e5b0
but their difference does not lie in their functionality, rather they are different instances representing a unit of functionality that is identical.
So here lies the question: do we care about the fact that they are different instances if they represent the same functionality? Similarly, do we care if two variables have different names if they represent the same value? I'm assuming we don't and this is why I think you can make a meaningful basic comparison of code objects if you compare their raw byte code.
Comparing the raw byte code:
Raw byte code is pretty much what I previously described, no names, no identities just a sequence of commands for the Python interpreter (pure functionality). Let's take a look at it:
c.co_code
'd\x00\x00\x84\x00\x00Z\x00\x00e\x00\x00d\x01\x00\x83\x01\x00\x01d\x02\x00S'
Okay, that's ugly, let's have a better look:
dis(c.co_code)
0 LOAD_CONST 0 (0)
3 MAKE_FUNCTION 0
6 STORE_NAME 0 (0)
9 LOAD_NAME 0 (0)
12 LOAD_CONST 1 (1)
15 CALL_FUNCTION 1
18 POP_TOP
19 LOAD_CONST 2 (2)
22 RETURN_VALUE
That looks better. This is exactly the same as the previous dis(c)
command, the only difference is that the names are not present since they don't really play a part in all this.
So, what do we get if we compare d.co_code
with c.co_code
? Of course, equality since the commands executed are identical. But, there's a pit-fall here, in order to be 100% sure that d.co_code
is equal to c.co_code
we also need to compare the code objects for the functions loaded inside c
and d
(the code objects representing the function fact
) if we don't compare these, we'll be getting false positives.
So where does the code
object for the functions fact
lie in each case? They lie in a field called co_consts
inside the code
object c
and d
respectively, co_consts
is a list
containing all the constants for a specific code
object. If you take a peak inside there, you'll be able to see the definition for each of these:
# located at position 0 in this case.
# the byte code for function 'fact'
dis(c.co_consts[0])
2 0 LOAD_FAST 0 (x)
3 LOAD_CONST 1 (1)
6 COMPARE_OP 1 (<=)
9 POP_JUMP_IF_FALSE 16
3 12 LOAD_CONST 1 (1)
15 RETURN_VALUE
4 >> 16 LOAD_CONST 1 (1)
19 PRINT_ITEM
20 PRINT_NEWLINE
5 21 LOAD_FAST 0 (x)
24 LOAD_GLOBAL 0 (fact)
27 LOAD_FAST 0 (x)
30 LOAD_CONST 1 (1)
33 BINARY_SUBTRACT
34 CALL_FUNCTION 1
37 BINARY_MULTIPLY
38 RETURN_VALUE
So, what do we do? We compare their raw byte code to see if they represent the same functionality, as we did previously.
As you can understand this turns out to be a recursive procedure where we first compare the raw byte code for the input, then scan through co_consts
to see if another code
object is present and repeat until no code
objects can be found, if the code
objects are in a different position in co_consts
we'll return False
.
In code, that should look like this:
from types import CodeType
def equal_code(c1, c2):
if c1.co_code != c2.co_code:
return False
for i, j in zip(c1.co_consts, c2.co_consts):
if isinstance(i, CodeType):
if isinstance(j, CodeType):
return equal_code(i, j)
else: # consts in different position, return false
return False
return True
Where CodeType
from types
is used to check for code
instances.
I think this is pretty much the best you can do using only code
objects generated from compile
.
Answered By - Dimitris Fasarakis Hilliard
0 comments:
Post a Comment
Note: Only a member of this blog may post a comment.