llllllllll / codetransformer
Python code object transformers
Home Page: http://codetransformer.readthedocs.io
License: GNU General Public License v2.0
In to_pycode we need to emit EXTENDED_ARG when the index is > 255 in py3.6 or > 2**16 - 1 before py3.6, for instructions like LOAD_FAST where the argument is an index.
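A minimal sketch of the encoding described above, assuming the Python 3.6+ wordcode layout (two bytes per instruction, one byte of argument each). The helper name `encode_wordcode` is hypothetical and not part of codetransformer; it only illustrates where the EXTENDED_ARG prefixes would go.

```python
import dis

EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]
LOAD_FAST = dis.opmap["LOAD_FAST"]


def encode_wordcode(opcode, arg):
    """Encode one instruction as 3.6+ wordcode (hypothetical helper).

    Each instruction carries one byte of argument; higher bytes are
    supplied by preceding EXTENDED_ARG instructions, most significant first.
    """
    if not 0 <= arg <= 0xFFFFFFFF:
        raise ValueError("argument does not fit in 32 bits")
    out = bytearray()
    for shift in (24, 16, 8):
        # Emit a prefix for this byte if it or any higher byte is non-zero.
        if arg >> shift:
            out += bytes([EXTENDED_ARG, (arg >> shift) & 0xFF])
    out += bytes([opcode, arg & 0xFF])
    return bytes(out)


small = encode_wordcode(LOAD_FAST, 255)  # fits in one byte, no prefix
large = encode_wordcode(LOAD_FAST, 256)  # needs one EXTENDED_ARG prefix
```

Before 3.6 the argument occupied two bytes, so the same idea applies with a 16-bit threshold instead of an 8-bit one.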
Currently Handles:
or
and and
Does Not Currently Handle:
or
and and
It should almost certainly be visiting LOAD_DEREF. The others are less clear. LOAD_CLASSDEREF should only appear in the body of a class, which I don't think we know how to visit? And LOAD_FAST implies that there's a local store to the name somewhere in the function. We might also want to be more explicit about rejecting any STORE_* instructions to a name that's been declared frozen.
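To make the distinction between the load instructions concrete, here is a small sketch using only the standard `dis` module. The function names are made up for illustration; the point is that the same source-level name lookup compiles to a different opcode depending on scope.

```python
import dis

x = 1


def uses_global():
    # `x` is resolved at module scope, so this compiles to LOAD_GLOBAL.
    return x


def make_closure():
    y = 2

    def inner():
        # `y` lives in the enclosing function, so this compiles to LOAD_DEREF.
        return y

    return inner


def load_ops(func):
    """Return the set of LOAD_* opnames used in func's bytecode."""
    return {
        ins.opname
        for ins in dis.get_instructions(func)
        if ins.opname.startswith("LOAD_")
    }


global_ops = load_ops(uses_global)
closure_ops = load_ops(make_closure())
```

A transformer that only visits the global-style loads would therefore miss names that are captured by a closure.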
If I understand correctly, both decompiler and transformers are optional to the main functionality of the package: inspection and modification of code objects.
I propose to make both packages optional (e.g. via extras_require).
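A rough sketch of what such a split could look like in setup.py. The dependency names below are placeholders, not the project's real requirements; only the `install_requires` and `extras_require` keywords are actual setuptools options.

```python
# Placeholder dependency lists: the real requirements are not known here.
CORE_REQUIRES = []

EXTRAS = {
    "decompiler": ["placeholder-decompiler-dep"],
    "transformers": ["placeholder-transformers-dep"],
}
# Convenience extra that pulls in every optional dependency.
EXTRAS["all"] = sorted({dep for deps in EXTRAS.values() for dep in deps})

# In setup.py these would be passed along as:
#   setup(..., install_requires=CORE_REQUIRES, extras_require=EXTRAS)
```

Users who only need code object inspection would then install the bare package, while others could opt in with something like `pip install codetransformer[transformers]`.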
we should use it
dict/list/set comprehensions use BUILD_{type} 0 plus an iterator and a sequence of MAP_ADD/LIST_APPEND/SET_ADD instructions, which breaks the current transformers, since they assume you can just insert a function call after a BUILD_* instruction.
The right fix for this is probably a more expressive pattern-based API.
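The comprehension bytecode shape can be checked with the standard `dis` module. This sketch walks nested code objects as well, because older interpreters compile a comprehension into a separate code object stored in `co_consts`; the helper name `opnames` is just for illustration.

```python
import dis
import types


def opnames(code):
    """Collect opnames from a code object and any nested code objects."""
    names = [ins.opname for ins in dis.get_instructions(code)]
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names.extend(opnames(const))
    return names


def squares(xs):
    return [x * x for x in xs]


def square_map(xs):
    return {x: x * x for x in xs}


list_ops = opnames(squares.__code__)
dict_ops = opnames(square_map.__code__)
```

In both cases the container is built empty and then filled one element at a time, so there is no single BUILD_* instruction that sees all the elements.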
This morning I had a shower idea: "Python function calls can have a high overhead. What if we could use the disassembler to directly insert a function's bytecode into the bytecode of the caller function?".
I wrote an MWE to show that this might provide some speed benefit in some cases:
def add(a, b):
    return a + b


def mult(c, d):
    """
    Time calling the add function and its internal bytecode
    """
    result = 0
    for _ in range(d):
        a = result
        b = c
        result = add(a, b)
    return result


def mult_inline(c, d):
    """
    Time calling just the bytecode inside the add function
    """
    result = 0
    for _ in range(d):
        a = result
        b = c
        result = a + b
    return result


def baseline(c, d):
    """
    Overhead of the function without calling add at all
    """
    result = 0
    for _ in range(d):
        a = result
        b = c
        pass
    return result


def mwe():
    c, d = 100, 100
    import ubelt as ub

    mt = ub.Timerit(100, label='mult')
    for timer in mt:
        with timer:
            mult(c, d)

    mit = ub.Timerit(100, label='mult-inline')
    for timer in mit:
        with timer:
            mult_inline(c, d)

    bt = ub.Timerit(100, label='baseline')
    for timer in bt:
        with timer:
            baseline(c, d)

    overhead = min(bt.times)
    nocall = min(mit.times)
    withcall = min(mt.times)

    # time of a single call to the interesting parts
    add_time = (nocall - overhead) / d * 1e9
    call_add_time = (withcall - overhead) / d * 1e9
    call_time = call_add_time - add_time
    print('add_time = {:.2f} ns'.format(add_time))
    print('call_add_time = {:.2f} ns'.format(call_add_time))
    print('call_time = {:.2f} ns'.format(call_time))


if __name__ == '__main__':
    mwe()
Output:
Timing complete for: mult, 100 loops, best of 3
time per loop : best=12.1 µs, ave=12.14 ± 0.15 µs
Timing complete for: mult-inline, 100 loops, best of 3
time per loop : best=5.731 µs, ave=5.881 ± 0.35 µs
Timing complete for: baseline, 100 loops, best of 3
time per loop : best=2.547 µs, ave=4.608 ± 3.3 µs
add_time = 31.84 ns
call_add_time = 95.52 ns
call_time = 63.68 ns
This analysis shows that the overhead for calling a function is about 63 nanoseconds in this case. When we insert the bytecode inline inside the loop, we end up saving ~6.4 microseconds over the 100 iterations (12.1 µs versus 5.731 µs best time).
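For reference, the per-call figures can be recomputed from the rounded best times printed in the output. This is only arithmetic on the reported numbers, so the last digit can differ slightly from the printed values, which were computed from unrounded timings.

```python
# Best times per loop from the output above, in microseconds.
best_mult_us = 12.1
best_inline_us = 5.731
best_baseline_us = 2.547
d = 100  # iterations of the inner loop

# Per-iteration costs in nanoseconds (1 microsecond = 1000 nanoseconds).
add_ns = (best_inline_us - best_baseline_us) / d * 1000
call_add_ns = (best_mult_us - best_baseline_us) / d * 1000
call_ns = call_add_ns - add_ns

# Total time saved per loop by inlining, in microseconds.
saved_us = best_mult_us - best_inline_us
```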
This shows that there may be some benefit to doing this in simple cases. In more complex cases, this may not be feasible due to the semantics of function scopes, but in simple cases (i.e. where the args to a callee are not modified, and other assumptions I haven't thought of yet are met), this might be doable.
While thinking about this I recalled the PyCon 2016 talk and I rewatched it, and was reminded that the codetransformer project exists. I skimmed the code, but didn't see anything that explicitly did what I envisioned. I was wondering if a PSF certified bytecode expert might weigh in on whether or not this seemed feasible / interesting.
From my initial reading / experiments, there are a few challenges/side-effects that need to be addressed/accepted to do something like this:
Might something like this be easy to do with codetransformer? Any ideas for where to get started / things that may cause issues that I haven't considered?
I would like to start thinking about the API for the transformers to allow for pattern matching.
Some ideas:
multipledispatch
The idea with using multiple dispatch would be to attempt to call some dispatched function over all of the instructions, shrinking the tuple until we get a match, and then consuming all of the instructions and continuing.
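A minimal sketch of that shrinking-window idea, written without the multipledispatch library. It uses a plain dict keyed on exact instruction types, which is a simplification of real multiple dispatch (no subclass resolution). The instruction classes and the `handles` and `visit_all` helpers are stand-ins invented for this example.

```python
class Instr:
    pass


class LOAD_FAST(Instr):
    pass


class STORE_FAST(Instr):
    pass


class BINARY_ADD(Instr):
    pass


class POP_TOP(Instr):
    pass


HANDLERS = {}  # maps a tuple of instruction types to a handler


def handles(*types):
    def register(func):
        HANDLERS[types] = func
        return func
    return register


@handles(STORE_FAST, LOAD_FAST)
def visit_roundtrip(store, load):
    return ["roundtrip"]


@handles(BINARY_ADD)
def visit_add(add):
    return ["add"]


def visit_all(instrs):
    longest = max(len(key) for key in HANDLERS)
    out = []
    i = 0
    while i < len(instrs):
        # Try the longest window first, shrinking until a handler matches.
        for n in range(min(longest, len(instrs) - i), 0, -1):
            window = instrs[i:i + n]
            handler = HANDLERS.get(tuple(type(x) for x in window))
            if handler is not None:
                out.extend(handler(*window))
                i += n  # consume every matched instruction
                break
        else:
            # No handler matched: pass the instruction through by name.
            out.append(type(instrs[i]).__name__)
            i += 1
    return out


stream = [LOAD_FAST(), STORE_FAST(), LOAD_FAST(), BINARY_ADD(), POP_TOP()]
labels = visit_all(stream)
```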
We can always just write some pattern language and mix the abstract pattern types in with our concrete instruction types like:
@dispatch(SETUP_WITH, var, WITH_CLEANUP_FINISH)
def visit_pattern(self, *instrs):
    ...
We could always go full crazy and do something like alex in our code to define this stuff:
example:
class MyTransformer(CodeTransformer):
    with $rules:
        SETUP_WITH .* WITH_CLEANUP_FINISH { visit_with }
        STORE_FAST LOAD_FAST { visit_roundtrip }

    def visit_with(self, *instrs):
        ...

    def visit_roundtrip(self, store, load):
        ...
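Something close to the rules above can be approximated in plain Python by running regular expressions over a space-separated string of opnames. This is only a sketch of the idea: the `RULES` table and `apply_rules` helper are hypothetical, and handlers are represented by their names rather than real methods.

```python
import re

# Each rule pairs a regex over "OPNAME " tokens with a handler name.
RULES = [
    (re.compile(r"SETUP_WITH (?:\w+ )*?WITH_CLEANUP_FINISH "), "visit_with"),
    (re.compile(r"STORE_FAST LOAD_FAST "), "visit_roundtrip"),
]


def apply_rules(opnames):
    """Group opnames into (handler_name, matched_opnames) pairs."""
    text = "".join(name + " " for name in opnames)

    # Character offset in `text` where each instruction's token starts.
    offsets = []
    pos = 0
    for name in opnames:
        offsets.append(pos)
        pos += len(name) + 1

    result = []
    i = 0
    while i < len(opnames):
        for pattern, handler in RULES:
            match = pattern.match(text, offsets[i])
            if match:
                # Every token ends in one space, so spaces count tokens.
                n = match.group().count(" ")
                result.append((handler, opnames[i:i + n]))
                i += n
                break
        else:
            result.append((None, opnames[i:i + 1]))
            i += 1
    return result


ops = [
    "LOAD_FAST",
    "SETUP_WITH", "POP_TOP", "LOAD_CONST", "WITH_CLEANUP_FINISH",
    "STORE_FAST", "LOAD_FAST",
]
groups = apply_rules(ops)
```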
In code.py, lines 68 and 306, there's a lonely "*" used as a parameter, which raises an invalid syntax error. I'm not sure if I should remove them or replace them with something else. I'll be back if I find out.
toolz is a great library, but if #58 is to be followed, installing it just for flip and complement looks like overkill.
If it makes life easier for decompiler / transformers, it should be listed as their requirement.
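For a sense of scale, simplified local stand-ins for the two helpers could look like the sketch below. These are not drop-in replacements: the toolz versions are more general (for example, toolz's flip is curried), so any call sites would need checking.

```python
def flip(func):
    """Return a version of a two-argument function with its arguments swapped."""
    def flipped(a, b):
        return func(b, a)
    return flipped


def complement(pred):
    """Return a predicate that is the logical negation of `pred`."""
    def negated(*args, **kwargs):
        return not pred(*args, **kwargs)
    return negated


def is_even(n):
    return n % 2 == 0


is_odd = complement(is_even)
type_first_isinstance = flip(isinstance)  # called as (type, obj)
```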
from codetransformer import CodeTransformer, pattern
from codetransformer.instructions import BINARY_ADD, BINARY_MULTIPLY


class add2mul(CodeTransformer):
    @pattern(BINARY_ADD)
    def _add2mul(self, add_instr):
        yield BINARY_MULTIPLY().steal(add_instr)


def f(*, k=3):
    return k + 1


ff = add2mul()(f)
print(f.__kwdefaults__)   # => {'k': 3}
print(f())                # => 4
print(ff.__kwdefaults__)  # => None
print(ff())               # => TypeError: f() missing 1 required positional argument: 'k'
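The lost `__kwdefaults__` can be reproduced without codetransformer, since a function rebuilt with `types.FunctionType` does not carry keyword-only defaults over by itself. The sketch below shows one possible fix, copying the attribute explicitly; the `rebuild` helper is hypothetical and is not how the library is known to construct functions.

```python
from types import FunctionType


def rebuild(func, copy_kwdefaults=True):
    """Rebuild a function from its parts (hypothetical helper)."""
    new = FunctionType(
        func.__code__,
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )
    if copy_kwdefaults and func.__kwdefaults__ is not None:
        # Keyword-only defaults are stored separately from __defaults__.
        new.__kwdefaults__ = dict(func.__kwdefaults__)
    return new


def f(*, k=3):
    return k + 1


broken = rebuild(f, copy_kwdefaults=False)
fixed = rebuild(f)
```

Here `broken` has no keyword-only defaults, so calling it without `k` fails, while `fixed` behaves like the original.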
I'm guessing it's an old version of the API, but the documentation says:
We can convert our Code object back into its raw form via the to_bytecode() method:
Presumably it should say "via the to_pycode() method:"
We should talk about what to do with steal. I am not sure I like how it mutates the instructions in place.
We need to actually document how to use any of this. Also maybe do readthedocs or some doc hosting.
I've already made good progress here. I'm quite aware of the challenges since python3 enhanced the capabilities of the dis module. Several modules need to be manually backported.
Reintegrating to retain python 3 support as well will require some effort.
I'm making this issue to judge interest in accepting a PR with this, otherwise I will maintain a separate fork.
The argument for the instruction objects should be the actual thing, not the low level representation.
Hi Everyone,
I've had a change of heart. I now think that it's best to stick to normal mundane technologies rather than perform my normal wizardry.
In a related change of heart, I'm considering changing my username to something that is less easy to mimic with Upper-case I's, perhaps something like @jjevnik , which is still open.
-Evil Joe
for example:
from codetransformer import CodeTransformer, pattern
from codetransformer.instructions import CALL_FUNCTION


def add(*args):
    return sum([*args])


class EmptyTransformer(CodeTransformer):
    @pattern(CALL_FUNCTION)
    def _call(self, call):
        yield call


transformer = EmptyTransformer()
new_add = transformer(add)
add()      # => 0
new_add()  # => TypeError: add() missing 1 required positional argument: 'args'
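One way to narrow down a failure like this is to compare the signature-related metadata of the original and the transformed function. The helpers below are a hypothetical debugging aid using only standard attributes; they do not identify the cause by themselves.

```python
import inspect


def signature_metadata(func):
    """Collect the fields that determine how arguments are bound."""
    code = func.__code__
    return {
        "argcount": code.co_argcount,
        "kwonlyargcount": code.co_kwonlyargcount,
        "varargs": bool(code.co_flags & inspect.CO_VARARGS),
        "varkeywords": bool(code.co_flags & inspect.CO_VARKEYWORDS),
        "defaults": func.__defaults__,
        "kwdefaults": func.__kwdefaults__,
    }


def diff_metadata(original, transformed):
    """Return {field: (original_value, transformed_value)} for mismatches."""
    a = signature_metadata(original)
    b = signature_metadata(transformed)
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


def add(*args):
    return sum(args)


meta = signature_metadata(add)
```

Calling `diff_metadata(add, new_add)` on the pair from the example would list whichever of these fields the transformation changed.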
The example code from the README works correctly under Python 3.7 but gives a TypeError under Python 3.8.0. Any pointers for how to fix this?
from codetransformer import CodeTransformer, instructions, pattern


class FoldNames(CodeTransformer):
    @pattern(
        instructions.LOAD_GLOBAL,
        instructions.LOAD_GLOBAL,
        instructions.BINARY_ADD,
    )
    def _load_fast(self, a, b, add):
        yield instructions.LOAD_FAST(a.arg + b.arg).steal(a)


@FoldNames()
def f():
    ab = 3
    return a + b


f()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-6-319624ac894a> in <module>
11
12 @FoldNames()
---> 13 def f():
14 ab = 3
15 return a + b
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/codetransformer/core.py in __call__(self, f, globals_, name, defaults, closure)
213
214 return FunctionType(
--> 215 self.transform(Code.from_pycode(f.__code__)).to_pycode(),
216 _a_if_not_none(globals_, f.__globals__),
217 _a_if_not_none(name, f.__name__),
/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/codetransformer/code.py in to_pycode(self)
582 bc.append(0)
583
--> 584 return CodeType(
585 self.argcount,
586 self.kwonlyargcount,
TypeError: an integer is required (got type bytes)
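The error message is consistent with a change in Python 3.8, where the `CodeType` constructor gained a `posonlyargcount` parameter directly after `argcount`, shifting every later positional argument by one. A possible version-tolerant approach, sketched here outside of codetransformer with a hypothetical `rebuild_code` helper, is to prefer `code.replace()` (available from 3.8) and fall back to the positional constructor only on older interpreters.

```python
from types import CodeType


def rebuild_code(code, new_bytecode):
    """Return a copy of `code` with its bytecode swapped (hypothetical helper)."""
    if hasattr(code, "replace"):
        # Python 3.8+: avoids depending on the constructor's argument order.
        return code.replace(co_code=new_bytecode)
    # Python 3.7 and earlier: the constructor has no posonlyargcount.
    return CodeType(
        code.co_argcount,
        code.co_kwonlyargcount,
        code.co_nlocals,
        code.co_stacksize,
        code.co_flags,
        new_bytecode,
        code.co_consts,
        code.co_names,
        code.co_varnames,
        code.co_filename,
        code.co_name,
        code.co_firstlineno,
        code.co_lnotab,
        code.co_freevars,
        code.co_cellvars,
    )


def f():
    return 42


# Round-trip with unchanged bytecode to show the call shape.
copy = rebuild_code(f.__code__, f.__code__.co_code)
```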
Tests should not be something that's shipped to users.
Perhaps tests' dependencies should be specified via setuptools' tests_require instead of an extra, without pinning. Pinning should be done in requirements.txt.