
codetransformer's People

Contributors

andrewwalker, carreau, llllllllll, ssanderson, vgr255

codetransformer's Issues

we don't emit EXTENDED_ARG for indices

In to_pycode we need to emit EXTENDED_ARG when the index is >= 256 in py3.6 or >= 2**16 before py3.6, for instructions like LOAD_FAST where the argument is an index.
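
A minimal sketch of the encoding, assuming the Python 3.6+ two-byte "wordcode" layout, where each instruction carries a one-byte argument and larger values are spread over EXTENDED_ARG prefixes, most significant byte first. `encode_wordcode` is a hypothetical helper for illustration, not part of codetransformer:

```python
import dis

EXTENDED_ARG = dis.opmap["EXTENDED_ARG"]


def encode_wordcode(opcode, arg):
    """Encode one instruction, adding EXTENDED_ARG prefixes when needed."""
    if not 0 <= arg <= 0xFFFFFFFF:
        raise ValueError("argument does not fit in four bytes")
    # Split the argument into bytes, most significant first, dropping
    # leading zero bytes but always keeping at least one byte.
    parts = arg.to_bytes(4, "big").lstrip(b"\x00") or b"\x00"
    out = bytearray()
    for high_byte in parts[:-1]:
        out += bytes((EXTENDED_ARG, high_byte))
    out += bytes((opcode, parts[-1]))
    return bytes(out)


# Index 300 = 1 * 256 + 44, so this yields EXTENDED_ARG 1 followed by LOAD_FAST 44.
encoded = encode_wordcode(dis.opmap["LOAD_FAST"], 300)
```

Any such prefix changes instruction offsets, so jump targets would need to be recomputed after it is inserted.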

Finish Decompiler

Currently Handles:

  • For-Loops
  • While-Loops
  • All Binary Operators except or and and.
  • Dict/Set/Tuple/Number/String/Name Constants
  • Function Definitions
  • Chained Assignments
  • Unpacking Assignments
  • With-Blocks
  • Global and Nonlocal Declarations
  • Imports

Does Not Currently Handle:

  • Python != 3.4.3
  • If-Statements
  • or and and
  • ternary expressions
  • Class Definitions
  • Comprehensions
  • Lambdas
  • Try-Except-Finally Blocks
  • Raise Statements
  • Assert Statements
  • Comparisons

`asconstants` doesn't visit LOAD_DEREF, LOAD_FAST, or LOAD_CLASSDEREF instructions

It should almost certainly be visiting LOAD_DEREF. The others are less clear. LOAD_CLASSDEREF should only appear in the body of a class, which I don't think we know how to visit? And LOAD_FAST implies that there's a local store to the name somewhere in the function. We might also want to be more explicit about rejecting any STORE_* instructions to a name that's been declared frozen.
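
To make the distinction concrete, here is a small sketch that uses only the standard `dis` module (no codetransformer API) to show which load instruction the compiler picks for each kind of name. The function and variable names are made up for the example, and the exact opcode used for locals varies between interpreter versions:

```python
import dis

GLOBAL_NAME = 1


def outer():
    cell_name = 2

    def inner(local_name):
        # GLOBAL_NAME is a global, cell_name is a closure variable,
        # and local_name is a plain local (a parameter).
        return GLOBAL_NAME, cell_name, local_name

    return inner


load_ops = sorted(
    {ins.opname for ins in dis.get_instructions(outer()) if ins.opname.startswith("LOAD_")}
)
print(load_ops)
```

A transformer that only rewrites LOAD_GLOBAL and LOAD_NAME would leave the closure reference to `cell_name` untouched, which is the gap described above.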

Make decompiler and transformers optional

If I understand correctly, both decompiler and transformers are optional to the main functionality of the package: inspection and modification of code objects.

I propose to make both packages optional (e.g. via extras_require).

overridden literals functions are broken for comprehensions

dict/list/set comprehensions use BUILD_{type} 0 plus an iterator and a sequence of MAP_ADD/LIST_APPEND/SET_ADD instructions, which breaks with the current transformers that assume you can just insert a function call after BUILD_* call.

The right fix for this is probably a more expressive pattern-based API.
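
The instruction shapes described above can be checked with the standard `dis` module. This sketch walks nested code objects as well, because some interpreter versions compile the comprehension body into a separate code object; `opnames` is a hypothetical helper name:

```python
import dis
import types


def opnames(code):
    """Collect opcode names from a code object and any nested code objects."""
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            names |= opnames(const)
    return names


INTERESTING = {"BUILD_LIST", "LIST_APPEND", "BUILD_SET", "SET_ADD", "BUILD_MAP", "MAP_ADD"}

for source in ("[x for x in y]", "{x for x in y}", "{x: x for x in y}"):
    found = opnames(compile(source, "<demo>", "eval")) & INTERESTING
    print(source, sorted(found))
```

Because the container is built empty and filled one element at a time, there is no single point after BUILD_* where the finished literal is on the stack for a wrapping function call.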

Possible support for simple inline code modifications.

This morning I had a shower idea: "Python function calls can have a high overhead. What if we could use the disassembler to directly insert a function's bytecode into the bytecode of the caller function?".

I wrote a minimal working example (MWE) to show that this might provide a speed benefit in some cases:

def add(a, b):
    return a + b


def mult(c, d):
    """
    Time calling the add function and its internal bytecode
    """
    result = 0
    for _ in range(d):
        a = result
        b = c
        result = add(a, b)
    return result

def mult_inline(c, d):
    """
    Time calling just the bytecode inside the add function
    """
    result = 0
    for _ in range(d):
        a = result
        b = c
        result = a + b
    return result

def baseline(c, d):
    """
    Overhead of the function without calling add at all
    """
    result = 0
    for _ in range(d):
        a = result
        b = c
        pass
    return result

def mwe():
    c, d = 100, 100
    import ubelt as ub
    mt = ub.Timerit(100, label='mult')
    for timer in mt:
        with timer: mult(c, d)

    mit = ub.Timerit(100, label='mult-inline')
    for timer in mit:
        with timer: mult_inline(c, d)

    bt = ub.Timerit(100, label='baseline')
    for timer in bt:
        with timer: baseline(c, d)

    overhead = min(bt.times)
    nocall = min(mit.times)
    withcall = min(mt.times)

    # time of a single call to the interesting parts
    add_time = (nocall - overhead) / d * 1e9
    call_add_time = (withcall - overhead) / d * 1e9
    call_time = call_add_time - add_time
    print('add_time = {:.2f} ns'.format(add_time))
    print('call_add_time = {:.2f} ns'.format(call_add_time))
    print('call_time = {:.2f} ns'.format(call_time))

if __name__ == '__main__':  
    mwe()

Output:

Timing complete for: mult, 100 loops, best of 3
    time per loop : best=12.1 µs, ave=12.14 ± 0.15 µs
Timing complete for: mult-inline, 100 loops, best of 3
    time per loop : best=5.731 µs, ave=5.881 ± 0.35 µs
Timing complete for: baseline, 100 loops, best of 3
    time per loop : best=2.547 µs, ave=4.608 ± 3.3 µs
add_time = 31.84 ns
call_add_time = 95.52 ns
call_time = 63.68 ns

This analysis shows that the overhead for calling a function is about 64 nanoseconds in this case. When we insert the bytecode inline inside the loop, we end up saving ~6.4 microseconds over the 100 iterations (12.1 µs vs. 5.73 µs).

This shows that there may be some benefit to doing this in simple cases. In more complex cases, this may not be feasible due to the semantics of function scopes, but in simple cases (i.e. where the args to a callee are not modified, and other assumptions I haven't thought of yet are met), this might be doable.

While thinking about this I recalled the PyCon 2016 talk and I rewatched it, and was reminded that the codetransformer project exists. I skimmed the code, but didn't see anything that explicitly did what I envisioned. I was wondering if a PSF certified bytecode expert might weigh in on whether or not this seemed feasible / interesting.

From my initial reading / experiments, there are a few challenges/side-effects that need to be addressed/accepted to do something like this:

  • the locals need to be expanded to include the variables from the inlined function
  • need to resolve conflicts between variable names
  • return statements need to be resolved to assignments to variables in the caller
  • multiple return statements will not work (return outside function)
  • closures, nonlocal, and global access seem like a pain but perhaps could be ignored on a first pass
  • code that reassigns the values of its argument variables may produce unintended results.
  • doing this will trash any hope of getting a reasonable traceback, but that's probably ok.
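
The simplest case in the list above (a single return, arguments that are not reassigned) can be prototyped at the source level before attempting it on bytecode. The sketch below swaps in a different technique, an `ast` rewrite rather than bytecode insertion, and every name in it is hypothetical. It assumes the callee body is exactly one `return <expr>` and that calls use positional arguments only; it substitutes argument expressions directly, so an argument with side effects would be evaluated once per use:

```python
import ast
import copy

CALLEE_SRC = """
def add(a, b):
    return a + b
"""

CALLER_SRC = """
def mult(c, d):
    result = 0
    for _ in range(d):
        result = add(result, c)
    return result
"""


class _Substitute(ast.NodeTransformer):
    """Replace reads of parameter names with the caller's argument expressions."""

    def __init__(self, mapping):
        self.mapping = mapping

    def visit_Name(self, node):
        if isinstance(node.ctx, ast.Load) and node.id in self.mapping:
            return copy.deepcopy(self.mapping[node.id])
        return node


class _Inline(ast.NodeTransformer):
    """Replace calls to the callee with its substituted return expression."""

    def __init__(self, callee):
        if len(callee.body) != 1 or not isinstance(callee.body[0], ast.Return):
            raise ValueError("only single-return functions are supported")
        self.name = callee.name
        self.params = [arg.arg for arg in callee.args.args]
        self.expr = callee.body[0].value

    def visit_Call(self, node):
        self.generic_visit(node)
        simple = (
            isinstance(node.func, ast.Name)
            and node.func.id == self.name
            and not node.keywords
            and len(node.args) == len(self.params)
        )
        if not simple:
            return node
        mapping = dict(zip(self.params, node.args))
        return _Substitute(mapping).visit(copy.deepcopy(self.expr))


def inline_calls(caller_src, callee_src):
    """Compile the caller with calls to the callee inlined; return its namespace."""
    callee = ast.parse(callee_src).body[0]
    tree = _Inline(callee).visit(ast.parse(caller_src))
    ast.fix_missing_locations(tree)
    namespace = {}
    exec(compile(tree, "<inlined>", "exec"), namespace)
    return namespace


namespace = inline_calls(CALLER_SRC, CALLEE_SRC)
print(namespace["mult"](100, 100))
```

The namespace never defines `add`, so the call only succeeds if it really was replaced by `result + c`.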

Might something like this be easy to do with codetransformer? Any ideas for where to get started / things that may cause issues that I haven't considered?

transform pattern

I would like to start thinking about the API for the transformers to allow for pattern matching.

Some ideas:

multipledispatch

The idea with using multiple dispatch would be to attempt to call some dispatched function over all of the instructions, shrinking the tuple until we get a match, and then consuming all of the instructions and continuing.

pros

  1. existing, well tested library
  2. I know how to use it from blaze
  3. clear syntax with the decorator pattern

cons

  1. greedy matching
  2. no glob or abstract pattern stuff
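
The shrinking-tuple matching described for this option can be sketched without the library. This is only an illustration of the control flow: it uses a plain dict keyed on tuples of opcode names instead of multipledispatch, and `greedy_transform` and the rule shown are hypothetical:

```python
def greedy_transform(instrs, rules):
    """Rewrite a sequence of opcode names using the longest matching rule.

    rules maps a tuple of opcode names to a handler that returns the
    replacement sequence. Unmatched instructions pass through unchanged.
    """
    longest = max(map(len, rules), default=0)
    out = []
    i = 0
    while i < len(instrs):
        # Try the longest window first and shrink it until a rule matches.
        for size in range(min(longest, len(instrs) - i), 0, -1):
            window = tuple(instrs[i:i + size])
            handler = rules.get(window)
            if handler is not None:
                out.extend(handler(*window))
                i += size
                break
        else:
            out.append(instrs[i])
            i += 1
    return out


rules = {
    # Keep the value on the stack instead of storing and reloading it.
    ("STORE_FAST", "LOAD_FAST"): lambda store, load: ["DUP_TOP", store],
}
result = greedy_transform(
    ["LOAD_CONST", "STORE_FAST", "LOAD_FAST", "RETURN_VALUE"], rules
)
```

Exact tuples like this cannot express "anything in between", which is the missing wildcard support listed in the cons.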

custom pattern DSL

We can always just write some pattern language and mix the abstract pattern types in with our concrete instruction types like:

@dispatch(SETUP_WITH, var, WITH_CLEANUP_FINISH)
def visit_pattern(self, *instrs):
    ...

pros

  1. more expressive

cons

  1. a lot more work

use quasiquotes to have a lexer generator block inlined

We could always go full crazy and do something like alex in our code to define this stuff:

example:

class MyTransformer(CodeTransformer):
    with $rules:
        SETUP_WITH .* WITH_CLEANUP_FINISH { visit_with }
        STORE_FAST LOAD_FAST              { visit_roundtrip }

    def visit_with(self, *instrs):
        ...

    def visit_roundtrip(self, store, load):
        ...

pros

  1. This is wild
  2. really cool
  3. I want to do this
  4. super readable (when you know what you are seeing)
  5. regex-like
  6. frees us from the grip of valid python syntax

cons

  1. quasiquotes are not really that well supported
  2. a lot more work
  3. potentially harder for people to understand
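
The regex-like rules above do not strictly need new syntax. One way to get similar matching in plain Python, sketched here under the assumption that instructions can be reduced to their opcode names, is to map each opcode to a single character and let the `re` module do the work. All helper names are hypothetical:

```python
import re


def build_alphabet(opnames):
    """Map each opcode name to one private-use Unicode character."""
    return {name: chr(0xE000 + i) for i, name in enumerate(sorted(set(opnames)))}


def compile_rule(rule, alphabet):
    """Turn a rule like 'SETUP_WITH .* WITH_CLEANUP_FINISH' into a regex."""
    parts = []
    for token in rule.split():
        parts.append(".*" if token == ".*" else re.escape(alphabet[token]))
    return re.compile("".join(parts), re.DOTALL)


def find_matches(rule, instrs):
    """Return (start, stop) index pairs of instruction runs matching the rule."""
    rule_names = [token for token in rule.split() if token != ".*"]
    alphabet = build_alphabet(list(instrs) + rule_names)
    text = "".join(alphabet[name] for name in instrs)
    return [m.span() for m in compile_rule(rule, alphabet).finditer(text)]


instrs = ["LOAD_FAST", "SETUP_WITH", "POP_TOP", "WITH_CLEANUP_FINISH", "RETURN_VALUE"]
spans = find_matches("SETUP_WITH .* WITH_CLEANUP_FINISH", instrs)
```

Since each instruction is one character, match offsets are instruction indices. Note that `.*` is greedy here, so with several with-blocks it would span from the first SETUP_WITH to the last WITH_CLEANUP_FINISH.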

Invalid syntax error

In code.py, lines 68 and 306, there's a lonely "*" used as a parameter, which throws the invalid syntax error. I'm not sure if I should remove them or replace them with something else. I'll be back if I find out.
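
For context, a bare `*` in a parameter list is valid Python 3 syntax (PEP 3102): it marks every following parameter as keyword-only. A SyntaxError on such a line therefore usually suggests the file is being run under Python 2. The function below is a made-up illustration, not the code from code.py:

```python
def make_entry(name, *, flags=0):
    """Everything after the bare * must be passed by keyword."""
    return (name, flags)


entry = make_entry("f", flags=1)

try:
    make_entry("f", 1)  # flags passed positionally
except TypeError as exc:
    error = str(exc)
```

Removing the `*` would change the signature by allowing those parameters to be passed positionally.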

Remove the toolz dependency

toolz is a great library, but if #58 is to be followed, installing it just for flip and complement looks like overkill.

If it makes life easier for decompiler / transformers, it should be listed as their requirement.
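
A rough sketch of what standard-library replacements could look like. These are simplified stand-ins: toolz's flip is curried, which is approximated here with `functools.partial`, and how codetransformer actually uses the two functions is not shown in this issue:

```python
from functools import partial


def flip(func, a, b):
    """Call func with its two arguments swapped."""
    return func(b, a)


def complement(predicate):
    """Return a function that negates the result of predicate."""
    def negated(*args, **kwargs):
        return not predicate(*args, **kwargs)
    return negated


# Usage: build single-argument predicates from two-argument ones.
is_int = partial(flip, isinstance, int)   # is_int(x) == isinstance(x, int)
is_not_int = complement(is_int)
```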

kwargs default value is not set after transformation

from codetransformer import CodeTransformer, pattern
from codetransformer.instructions import BINARY_ADD, BINARY_MULTIPLY


class add2mul(CodeTransformer):
    @pattern(BINARY_ADD)
    def _add2mul(self, add_instr):
        yield BINARY_MULTIPLY().steal(add_instr)


def f(*, k=3):
    return k + 1


ff = add2mul()(f)

print(f.__kwdefaults__)   # => {'k': 3}
print(f())                # => 4
print(ff.__kwdefaults__)  # => None
print(ff())               # => TypeError: f() missing 1 required positional argument: 'k'
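
One plausible cause, stated as an assumption rather than a verified diagnosis, is that the transformed function is rebuilt with `types.FunctionType` from the code object, globals, name, defaults, and closure, and that `__kwdefaults__` is never copied across. A minimal sketch of a rebuild that keeps it, using only the standard library (`rebuild` is a hypothetical helper):

```python
from types import FunctionType


def f(*, k=3):
    return k + 1


def rebuild(func, code=None):
    """Recreate func (optionally with new code), keeping keyword-only defaults."""
    new = FunctionType(
        func.__code__ if code is None else code,
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )
    # Keyword-only defaults are not among the five arguments passed above,
    # so they have to be copied separately.
    if func.__kwdefaults__ is not None:
        new.__kwdefaults__ = dict(func.__kwdefaults__)
    return new


g = rebuild(f)
```

Other function attributes such as `__annotations__`, `__doc__`, and `__dict__` are in the same situation and may need the same treatment.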

Discuss steal API

We should talk about what to do with steal. I am not sure I like how it mutates the instructions in place.

Add Python 2 Support

I've already made good progress here. I'm quite aware of the challenges since python3 enhanced the capabilities of the dis module. Several modules need to be manually backported.

Reintegrating to retain python 3 support as well will require some effort.

I'm making this issue to judge interest in accepting a PR with this, otherwise I will maintain a separate fork.

Change of heart

Hi Everyone,

I've had a change of heart. I now think that it's best to stick to normal mundane technologies rather than perform my normal wizardry.

In a related change of heart, I'm considering changing my username to something that is less easy to mimic with Upper-case I's, perhaps something like @jjevnik , which is still open.

-Evil Joe

*args after transformation is considered a required positional argument

For example:

from codetransformer import CodeTransformer, pattern
from codetransformer.instructions import CALL_FUNCTION

def add(*args):
    return sum([*args])

class EmptyTransformer(CodeTransformer):
    @pattern(CALL_FUNCTION)
    def _call(self, call):
        yield call

transformer = EmptyTransformer()

new_add = transformer(add)

add()      # => 0
new_add()  # => TypeError: add() missing 1 required positional argument: 'args'
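
A quick way to narrow this down is to compare the signature-related fields of the code object before and after the transformation. The helper below is a hypothetical diagnostic that uses only the standard library; the guess that the transformed function differs in one of these fields is an assumption:

```python
import inspect


def signature_fields(func):
    """Return the code-object fields that determine how arguments are bound."""
    code = func.__code__
    return {
        "argcount": code.co_argcount,
        "kwonlyargcount": code.co_kwonlyargcount,
        "varargs": bool(code.co_flags & inspect.CO_VARARGS),
        "varkeywords": bool(code.co_flags & inspect.CO_VARKEYWORDS),
    }


def add(*args):
    return sum([*args])


fields = signature_fields(add)
print(fields)
```

For the original `add`, `argcount` is 0 and `varargs` is set. Running the same helper on `new_add` and comparing the two dicts should show which field was changed.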

New TypeError in Python 3.8

The example code from the README works correctly under Python 3.7 but gives a TypeError under Python 3.8.0. Any pointers for how to fix this?

from codetransformer import CodeTransformer, instructions, pattern

class FoldNames(CodeTransformer):
    @pattern(
        instructions.LOAD_GLOBAL,
        instructions.LOAD_GLOBAL,
        instructions.BINARY_ADD,
    )
    def _load_fast(self, a, b, add):
        yield instructions.LOAD_FAST(a.arg + b.arg).steal(a)

@FoldNames()
def f():
    ab = 3
    return a + b

f()
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-319624ac894a> in <module>
     11 
     12 @FoldNames()
---> 13 def f():
     14     ab = 3
     15     return a + b

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/codetransformer/core.py in __call__(self, f, globals_, name, defaults, closure)
    213 
    214         return FunctionType(
--> 215             self.transform(Code.from_pycode(f.__code__)).to_pycode(),
    216             _a_if_not_none(globals_, f.__globals__),
    217             _a_if_not_none(name, f.__name__),

/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/codetransformer/code.py in to_pycode(self)
    582                 bc.append(0)
    583 
--> 584         return CodeType(
    585             self.argcount,
    586             self.kwonlyargcount,

TypeError: an integer is required (got type bytes)
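
A likely explanation is that Python 3.8 added a posonlyargcount argument to the `CodeType` constructor, right after argcount (PEP 570), so a positional call written for 3.7 shifts every later argument by one slot. Python 3.8 also added `CodeType.replace`, which avoids the positional constructor entirely. A minimal standard-library sketch (the function and constants are made up for illustration, and this is not a patch for to_pycode):

```python
from types import FunctionType


def greet():
    return "old"


code = greet.__code__

# Swap one constant and let replace() copy every other field,
# including any fields added in newer Python versions.
new_consts = tuple("new" if const == "old" else const for const in code.co_consts)
new_code = code.replace(co_consts=new_consts)

new_greet = FunctionType(new_code, greet.__globals__, greet.__name__)
print(new_greet())
```

Code that must also run on 3.7 or earlier would still need a version check, since `replace` does not exist there.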

Move tests outside of the packages

Tests should not be something that's shipped to users.

Perhaps the tests' dependencies should be specified via setuptools' tests_require instead of an extra, and without pinning; pinning should be done in requirements.txt.
