llllllllll / codetransformer

Python code object transformers

Home Page: http://codetransformer.readthedocs.io

License: GNU General Public License v2.0

Python 100.00%

codetransformer's Issues

we don't emit EXTENDED_ARG for indices

In to_pycode we need to emit EXTENDED_ARG when the index is >= 256 in py3.6 or >= 2**16 before py3.6 for instructions like LOAD_FAST where the argument is an index.
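
A minimal sketch of the encoding this issue asks for, assuming the two-byte wordcode layout of Python 3.6+ (one opcode byte followed by one argument byte). `encode_wordcode` is a hypothetical helper, not part of codetransformer; it only shows how an index that does not fit in one byte could be split across EXTENDED_ARG prefixes.

```python
import dis

EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]


def encode_wordcode(opcode, arg):
    """Encode one instruction as wordcode, prefixing EXTENDED_ARG as needed.

    Hypothetical helper for illustration only.
    """
    if not 0 <= arg <= 0xFFFFFFFF:
        raise ValueError("argument does not fit in four bytes")
    out = bytearray()
    # Emit the high-order bytes first; each prefix carries one byte.
    for shift in (24, 16, 8):
        if arg >> shift:
            out += bytes([EXTENDED_ARG, (arg >> shift) & 0xFF])
    # The real instruction carries the lowest byte of the argument.
    out += bytes([opcode, arg & 0xFF])
    return bytes(out)


LOAD_FAST = dis.opmap["LOAD_FAST"]
small = encode_wordcode(LOAD_FAST, 5)    # fits in one byte, no prefix
large = encode_wordcode(LOAD_FAST, 300)  # 300 == (1 << 8) | 44
```

An index of 300 therefore becomes an EXTENDED_ARG carrying 1 followed by LOAD_FAST carrying 44.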

Finish Decompiler

Currently Handles:

Does Not Currently Handle:

`asconstants` doesn't visit LOAD_DEREF, LOAD_FAST, or LOAD_CLASSDEREF instructions

It should almost certainly be visiting LOAD_DEREF. The others are less clear. LOAD_CLASSDEREF should only appear in the body of a class, which I don't think we know how to visit? And LOAD_FAST implies that there's a local store to the name somewhere in the function. We might also want to be more explicit about rejecting any STORE_* instructions to a name that's been declared frozen.
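
To make the distinction concrete, the standard library `dis` module can show which load instruction a name compiles to. This is only an illustration of where LOAD_DEREF comes from; `make_adder` is a made-up example, and exact opcode names for plain locals can differ between Python versions.

```python
import dis


def make_adder(n):
    def adder(x):
        # `x` is a local (a fast load); `n` is a free variable read
        # from the enclosing scope, so it compiles to LOAD_DEREF.
        return x + n
    return adder


adder = make_adder(1)
loads = [
    (ins.opname, ins.argval)
    for ins in dis.get_instructions(adder)
    if ins.opname.startswith("LOAD_")
]
print(loads)
```

A transformer that only visits LOAD_GLOBAL and LOAD_NAME would never see the read of `n` here.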

Make decompiler and transformers optional

If I understand correctly, both decompiler and transformers are optional to the main functionality of the package: inspection and modification of code objects.

I propose to make both packages optional (e.g. via extras_require).

versioneer

we should use it

overridden literals functions are broken for comprehensions

dict/list/set comprehensions use BUILD_{type} 0 plus an iterator and a sequence of MAP_ADD/LIST_APPEND/SET_ADD instructions, which breaks the current transformers because they assume you can just insert a function call after the BUILD_* instruction.

The right fix for this is probably a more expressive pattern-based API.
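
The shape described above can be checked with `dis`. The sketch below walks nested code objects because older Python versions compile a comprehension into its own code object, while newer ones inline it; `collect_opnames` is a hypothetical helper written for this example.

```python
import dis


def collect_opnames(code):
    """Return opnames from a code object and any nested code objects."""
    names = [ins.opname for ins in dis.get_instructions(code)]
    for const in code.co_consts:
        if hasattr(const, "co_code"):  # nested code object
            names.extend(collect_opnames(const))
    return names


def squares(xs):
    return [x * x for x in xs]


def index(xs):
    return {x: i for i, x in enumerate(xs)}


list_ops = collect_opnames(squares.__code__)
dict_ops = collect_opnames(index.__code__)
print("LIST_APPEND" in list_ops, "MAP_ADD" in dict_ops)
```

The container is built empty and filled one element at a time, so there is no single BUILD_* instruction holding the final contents to wrap in a call.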

Possible support for simple inline code modifications.

This morning I had a shower idea: "Python function calls can have a high overhead. What if we could use the disassembler to directly insert a function's bytecode into the bytecode of the caller function?".

I wrote an MWE (minimal working example) to show that this might provide some speed benefit in some cases:

def add(a, b):
    return a + b


def mult(c, d):
    """
    Time calling the add function and its internal bytecode
    """
    result = 0
    for _ in range(d):
        a = result
        b = c
        result = add(a, b)
    return result

def mult_inline(c, d):
    """
    Time calling just the bytecode inside the add function
    """
    result = 0
    for _ in range(d):
        a = result
        b = c
        result = a + b
    return result

def baseline(c, d):
    """
    Overhead of the function without calling add at all
    """
    result = 0
    for _ in range(d):
        a = result
        b = c
        pass
    return result

def mwe():
    c, d = 100, 100
    import ubelt as ub
    mt = ub.Timerit(100, label='mult')
    for timer in mt:
        with timer: mult(c, d)

    mit = ub.Timerit(100, label='mult-inline')
    for timer in mit:
        with timer: mult_inline(c, d)

    bt = ub.Timerit(100, label='baseline')
    for timer in bt:
        with timer: baseline(c, d)

    overhead = min(bt.times)
    nocall = min(mit.times)
    withcall = min(mt.times)

    # time of a single call to the interesting parts
    add_time = (nocall - overhead) / d * 1e9
    call_add_time = (withcall - overhead) / d * 1e9
    call_time = call_add_time - add_time
    print('add_time = {:.2f} ns'.format(add_time))
    print('call_add_time = {:.2f} ns'.format(call_add_time))
    print('call_time = {:.2f} ns'.format(call_time))

if __name__ == '__main__':  
    mwe()

Output:

Timing complete for: mult, 100 loops, best of 3
    time per loop : best=12.1 µs, ave=12.14 ± 0.15 µs
Timing complete for: mult-inline, 100 loops, best of 3
    time per loop : best=5.731 µs, ave=5.881 ± 0.35 µs
Timing complete for: baseline, 100 loops, best of 3
    time per loop : best=2.547 µs, ave=4.608 ± 3.3 µs
add_time = 31.84 ns
call_add_time = 95.52 ns
call_time = 63.68 ns

This analysis shows that the overhead for calling a function is about 64 nanoseconds in this case. When we insert the bytecode inline inside the loop, we end up saving about 6.4 microseconds over the 100 iterations (12.1 µs versus 5.7 µs).

This shows that there may be some benefit to doing this in simple cases. In more complex cases, this may not be feasible due to the semantics of function scopes, but in simple cases (i.e. where the args to a callee are not modified, and other assumptions I haven't thought of yet are met), this might be doable.

While thinking about this I recalled the PyCon 2016 talk and I rewatched it, and was reminded that the codetransformer project exists. I skimmed the code, but didn't see anything that explicitly did what I envisioned. I was wondering if a PSF certified bytecode expert might weigh in on whether or not this seemed feasible / interesting.

From my initial reading / experiments, there are a few challenges/side-effects that need to be addressed/accepted to do something like this:

the locals need to be expanded to include the variables from the inlined function
need to resolve conflicts between variable names
return statements need to be resolved to assignments to variables in the caller
multiple return statements will not work (return outside function)
closures, nonlocal, and global access seem like a pain but perhaps could be ignored on a first pass
code that reassigns the values of its argument variables may produce unintended results.
doing this will trash any hope of getting a reasonable traceback, but that's probably ok.
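
As a rough illustration of the first two points, here is one way the caller's local names could be extended with the callee's, renaming on conflict. `merge_varnames` and the `_inl_` prefix are hypothetical, and the sketch only computes names; it does not rewrite any bytecode.

```python
def merge_varnames(caller, callee, prefix="_inl_"):
    """Return (merged_names, rename_map) for inlining callee into caller.

    Hypothetical helper: callee locals that clash with caller locals
    get the prefix prepended until they are unique.
    """
    merged = list(caller.__code__.co_varnames)
    taken = set(merged)
    rename = {}
    for name in callee.__code__.co_varnames:
        new_name = name
        while new_name in taken:
            new_name = prefix + new_name
        taken.add(new_name)
        rename[name] = new_name
        merged.append(new_name)
    return tuple(merged), rename


def add(a, b):
    return a + b


def mult(c, d):
    result = 0
    for _ in range(d):
        a = result
        b = c
        result = add(a, b)
    return result


merged, rename = merge_varnames(mult, add)
print(rename)
```

Both of `add`'s parameters clash with locals of `mult`, so both are renamed before being appended to the caller's names.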

Might something like this be easy to do with codetransformer? Any ideas for where to get started / things that may cause issues that I haven't considered?

Copy lnotab position on steal.

transform pattern

I would like to start thinking about the API for the transformers to allow for pattern matching.

Some ideas:

`multipledispatch`

The idea with using multiple dispatch would be to attempt to call some dispatched function over all of the instructions, shrinking the tuple until we get a match, and then consuming all of the instructions and continuing.
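
The shrinking-tuple idea can be sketched without `multipledispatch` itself, using a plain dict keyed by tuples of opnames. Everything here (`Instr`, `dispatch_greedy`, the handler table) is hypothetical and only meant to show the control flow: try the longest window first, shrink until a handler matches, consume the matched instructions, and continue.

```python
from collections import namedtuple

Instr = namedtuple("Instr", "opname arg")


def dispatch_greedy(instrs, handlers):
    """Rewrite instrs using handlers keyed by tuples of opnames."""
    out = []
    i = 0
    while i < len(instrs):
        # Start with everything that is left and shrink the window.
        for end in range(len(instrs), i, -1):
            window = instrs[i:end]
            handler = handlers.get(tuple(ins.opname for ins in window))
            if handler is not None:
                out.extend(handler(*window))
                i = end  # consume the matched instructions
                break
        else:
            out.append(instrs[i])  # no handler matched; keep as-is
            i += 1
    return out


def visit_roundtrip(store, load):
    # Replace a store followed by a load with dup plus store.
    return [Instr("DUP_TOP", None), store]


handlers = {("STORE_FAST", "LOAD_FAST"): visit_roundtrip}
program = [
    Instr("STORE_FAST", "x"),
    Instr("LOAD_FAST", "x"),
    Instr("BINARY_ADD", None),
]
result = dispatch_greedy(program, handlers)
```

The window search is quadratic in the worst case, and the longest-first order is what makes the matching greedy.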

pros

existing, well tested library
I know how to use it from blaze
clear syntax with the decorator pattern

cons

greedy matching
no glob or abstract pattern stuff

custom pattern DSL

We can always just write some pattern language and mix the abstract pattern types in with our concrete instruction types like:

@dispatch(SETUP_WITH, var, WITH_CLEANUP_FINISH)
def visit_pattern(self, *instrs):
    ...

pros

more expressive

cons

a lot more work

use quasiquotes to have a lexer generator block inlined

We could always go full crazy and do something like alex (the Haskell lexer generator) in our code to define this stuff:

example:

class MyTransformer(CodeTransformer):
    with $rules:
        SETUP_WITH .* WITH_CLEANUP_FINISH { visit_with }
        STORE_FAST LOAD_FAST              { visit_roundtrip }

    def visit_with(self, *instrs):
        ...

    def visit_roundtrip(self, store, load):
        ...
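
A regex-like rule of this kind can be approximated with the standard `re` module by encoding the instruction stream as text. This is a sketch under the assumption that each opname is written followed by one space; `find_matches` is a hypothetical helper and does not reflect how codetransformer matches patterns.

```python
import re


def find_matches(opnames, rule):
    """Return (start, stop) instruction index ranges matched by rule."""
    text = "".join(name + " " for name in opnames)
    # Map each instruction's starting text offset back to its index.
    index_of = {}
    pos = 0
    for i, name in enumerate(opnames):
        index_of[pos] = i
        pos += len(name) + 1
    index_of[pos] = len(opnames)
    matches = []
    for m in re.finditer(rule, text):
        # Only accept matches aligned on instruction boundaries.
        if m.start() in index_of and m.end() in index_of:
            matches.append((index_of[m.start()], index_of[m.end()]))
    return matches


ops = [
    "LOAD_GLOBAL",
    "SETUP_WITH",
    "POP_TOP",
    "LOAD_FAST",
    "WITH_CLEANUP_FINISH",
    "RETURN_VALUE",
]
# Text form of the rule `SETUP_WITH .* WITH_CLEANUP_FINISH`.
with_rule = r"SETUP_WITH (?:\w+ )*?WITH_CLEANUP_FINISH "
spans = find_matches(ops, with_rule)
```

Each returned span could then be sliced out of the instruction list and handed to the visitor named in the rule.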

pros

This is wild
really cool
I want to do this
super readable (when you know what you are seeing)
regex-like
frees us from the grip of valid python syntax

cons

quasiquotes are not really that well supported
a lot more work
potentially harder for people to understand

Invalid syntax error

In code.py, lines 68 and 306, there's a lonely "*" used as a parameter, which throws the invalid syntax error. I'm not sure if I should remove them or replace them with something else. I'll be back if I find out.

Remove the toolz dependency

toolz is a great library, but if #58 is to be followed, installing it just for `flip` and `complement` looks like overkill.

If it makes life easier for decompiler / transformers, it should be listed as their requirement.
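
For a sense of scale, simplified stand-ins for the two helpers fit in a few lines of plain Python. These are assumptions about what the package needs, not copies of the toolz versions (the toolz `flip` is curried, which this sketch is not).

```python
import operator


def flip(func):
    """Return a two-argument function that calls func with args swapped."""
    def flipped(a, b):
        return func(b, a)
    return flipped


def complement(func):
    """Return a function that negates the truth value of func's result."""
    def negated(*args, **kwargs):
        return not func(*args, **kwargs)
    return negated


rsub = flip(operator.sub)           # rsub(1, 10) computes 10 - 1
not_digit = complement(str.isdigit)
print(rsub(1, 10), not_digit("abc"))
```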

kwargs default value is not set after transformation

from codetransformer import CodeTransformer, pattern
from codetransformer.instructions import BINARY_ADD, BINARY_MULTIPLY


class add2mul(CodeTransformer):
    @pattern(BINARY_ADD)
    def _add2mul(self, add_instr):
        yield BINARY_MULTIPLY().steal(add_instr)


def f(*, k=3):
    return k + 1


ff = add2mul()(f)

print(f.__kwdefaults__)   # {'k': 3}
print(f())                # 4
print(ff.__kwdefaults__)  # None
print(ff())               # TypeError: f() missing 1 required positional argument: 'k'
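
The lost `__kwdefaults__` can be reproduced with the standard library alone, assuming the transformer rebuilds the function with `types.FunctionType` (as the core.py traceback further down this page suggests). `rebuild` is a hypothetical helper; it covers only the missing defaults and does not explain why the error message calls `k` positional.

```python
from types import FunctionType


def rebuild(func, keep_kwdefaults):
    """Rebuild func from its code object, as a transformer might."""
    new = FunctionType(
        func.__code__,
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )
    # Keyword-only defaults are not copied unless done explicitly.
    if keep_kwdefaults and func.__kwdefaults__ is not None:
        new.__kwdefaults__ = dict(func.__kwdefaults__)
    return new


def f(*, k=3):
    return k + 1


broken = rebuild(f, keep_kwdefaults=False)
fixed = rebuild(f, keep_kwdefaults=True)
print(broken.__kwdefaults__, fixed())
```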

Code object documentation says to_bytecode() not to_pycode()

I'm guessing it's an old version of the API, but the documentation says:

We can convert our Code object back into its raw form via the to_bytecode() method:

Presumably it should say "via the to_pycode() method:"

Discuss steal API

We should talk about what to do with steal. I am not sure I like how it mutates the instructions in place.

Documentation for pattern API

We need to actually document how to use any of this. Also maybe do readthedocs or some doc hosting.

Add Python 2 Support

I've already made good progress here. I'm quite aware of the challenges since python3 enhanced the capabilities of the dis module. Several modules need to be manually backported.

Reintegrating to retain python 3 support as well will require some effort.

I'm making this issue to judge interest in accepting a PR with this, otherwise I will maintain a separate fork.

LOAD_(DEREF|CLOSURE) arg is index not str

The argument for the instruction objects should be the actual thing, not the low level representation.

Change of heart

Hi Everyone,

I've had a change of heart. I now think that it's best to stick to normal mundane technologies rather than perform my normal wizardry.

In a related change of heart, I'm considering changing my username to something that is less easy to mimic with Upper-case I's, perhaps something like @jjevnik , which is still open.

-Evil Joe

*args after transformation is treated as a required positional argument

for example:

from codetransformer import CodeTransformer, pattern
from codetransformer.instructions import CALL_FUNCTION

def add(*args):
    return sum([*args])

class EmptyTransformer(CodeTransformer):
    @pattern(CALL_FUNCTION)
    def _call(self, call):
         yield call

transformer = EmptyTransformer()

new_add = transformer(add)

add()      # 0
new_add()  # TypeError: add() missing 1 required positional argument: 'args'
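
The error message suggests the rebuilt code object treats `args` as an ordinary positional parameter, which would happen if the CO_VARARGS flag were dropped during the round trip. That is a guess, not a confirmed diagnosis. The sketch below shows one way to compare flags before and after a transformation; `missing_flags` is a hypothetical helper and the "after" value is simulated.

```python
import inspect

FLAG_NAMES = {
    inspect.CO_VARARGS: "CO_VARARGS",
    inspect.CO_VARKEYWORDS: "CO_VARKEYWORDS",
    inspect.CO_GENERATOR: "CO_GENERATOR",
}


def missing_flags(before, after):
    """Return names of known flags set in before but cleared in after."""
    return [
        name
        for bit, name in FLAG_NAMES.items()
        if before & bit and not after & bit
    ]


def add(*args):
    return sum([*args])


original_flags = add.__code__.co_flags
# Simulate a round trip that forgets the varargs flag.
simulated_flags = original_flags & ~inspect.CO_VARARGS
lost = missing_flags(original_flags, simulated_flags)
print(lost)
```

Running the same comparison on `add.__code__.co_flags` and `new_add.__code__.co_flags` would confirm or rule out this explanation.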

New TypeError in Python 3.8

The example code from the README works correctly under Python 3.7 but gives a TypeError under Python 3.8.0. Any pointers for how to fix this?

from codetransformer import CodeTransformer, instructions, pattern

class FoldNames(CodeTransformer):
    @pattern(
        instructions.LOAD_GLOBAL,
        instructions.LOAD_GLOBAL,
        instructions.BINARY_ADD,
    )
    def _load_fast(self, a, b, add):
        yield instructions.LOAD_FAST(a.arg + b.arg).steal(a)

@FoldNames()
def f():
    ab = 3
    return a + b

f()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-319624ac894a> in <module>
     11 
     12 @FoldNames()
---> 13 def f():
     14     ab = 3
     15     return a + b

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/codetransformer/core.py in __call__(self, f, globals_, name, defaults, closure)
    213 
    214         return FunctionType(
--> 215             self.transform(Code.from_pycode(f.__code__)).to_pycode(),
    216             _a_if_not_none(globals_, f.__globals__),
    217             _a_if_not_none(name, f.__name__),

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/codetransformer/code.py in to_pycode(self)
    582                 bc.append(0)
    583 
--> 584         return CodeType(
    585             self.argcount,
    586             self.kwonlyargcount,

TypeError: an integer is required (got type bytes)
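
The message is consistent with a change in Python 3.8, where `types.CodeType` gained a `posonlyargcount` parameter in second position, so positional arguments written for 3.7 all shift by one. A hedged sketch of a version-tolerant alternative is to derive the new code object with `CodeType.replace` (available from 3.8) instead of calling the constructor positionally. `with_name` is a hypothetical helper and not how codetransformer is implemented.

```python
import sys


def with_name(code, new_name):
    """Return a copy of code with a different co_name."""
    if sys.version_info >= (3, 8):
        # replace() copies every field not named, including
        # co_posonlyargcount, so no positional ordering is involved.
        return code.replace(co_name=new_name)
    raise NotImplementedError("positional CodeType call needed before 3.8")


def f():
    return 1


renamed = with_name(f.__code__, "g")
print(renamed.co_name, renamed.co_posonlyargcount)
```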

Move tests outside of the packages

Tests should not be something that's shipped to users.

Perhaps the tests' dependencies should be specified via setuptools' tests_require instead of an extra, and without pinning. Pinning should be done in requirements.txt.
