construct / construct

This project forked from mostawesomedude/construct

Construct: Declarative data structures for python that allow symmetric parsing and building

Home Page: http://construct.readthedocs.org

License: Other

Python 99.67% Makefile 0.19% Kaitai Struct 0.14%

construct's Introduction

Construct 2.10

Construct is a powerful declarative and symmetrical parser and builder for binary data.

Instead of writing imperative code to parse a piece of data, you declaratively define a data structure that describes your data. As this data structure is not code, you can use it in one direction to parse data into Pythonic objects, and in the other direction, to build objects into binary data.

The library provides both simple, atomic constructs (such as integers of various sizes), as well as composite ones which allow you to form hierarchical and sequential structures of increasing complexity. Construct features bit and byte granularity, easy debugging and testing, an easy-to-extend subclass system, and lots of primitive constructs to make your work easier:

Fields: raw bytes or numerical types
Structs and Sequences: combine simpler constructs into more complex ones
Bitwise: splitting bytes into bit-grained fields
Adapters: change how data is represented
Arrays/Ranges: duplicate constructs
Meta-constructs: use the context (history) to compute the size of data
If/Switch: branch the computational path based on the context
On-demand (lazy) parsing: read and parse only what you require
Pointers: jump from here to there in the data stream
Tunneling: prefix data with a byte count or compress it

Example

A Struct is a collection of ordered, named fields:

>>> format = Struct(
...     "signature" / Const(b"BMP"),
...     "width" / Int8ub,
...     "height" / Int8ub,
...     "pixels" / Array(this.width * this.height, Byte),
... )
>>> format.build(dict(width=3,height=2,pixels=[7,8,9,11,12,13]))
b'BMP\x03\x02\x07\x08\t\x0b\x0c\r'
>>> format.parse(b'BMP\x03\x02\x07\x08\t\x0b\x0c\r')
Container(signature=b'BMP')(width=3)(height=2)(pixels=[7, 8, 9, 11, 12, 13])

A Sequence is a collection of ordered fields, and differs from Array and GreedyRange in that those two are homogeneous:

>>> format = Sequence(PascalString(Byte, "utf8"), GreedyRange(Byte))
>>> format.build([u"lalaland", [255,1,2]])
b'\nlalaland\xff\x01\x02'
>>> format.parse(b"\x004361789432197")
['', [52, 51, 54, 49, 55, 56, 57, 52, 51, 50, 49, 57, 55]]


construct's Issues

Can't install 2.5.0 (from default PyPI URL) using pip

Note how by default it installs 2.06, and if you specify 2.5.0 it gets 2.05 (!):

$ python -V
Python 2.6.6

$ virtualenv --version
1.9.1

$ mkvirtualenv construct_test
New python executable in construct_test/bin/python
Installing setuptools............done.
Installing pip...............done.
virtualenvwrapper.user_scripts creating /home/ramiro/venv/construct_test/bin/predeactivate
virtualenvwrapper.user_scripts creating /home/ramiro/venv/construct_test/bin/postdeactivate
virtualenvwrapper.user_scripts creating /home/ramiro/venv/construct_test/bin/preactivate
virtualenvwrapper.user_scripts creating /home/ramiro/venv/construct_test/bin/postactivate
virtualenvwrapper.user_scripts creating /home/ramiro/venv/construct_test/bin/get_env_details

$ pip install construct
Downloading/unpacking construct
  Downloading construct-2.06.tar.gz (85kB): 85kB downloaded
  Running setup.py egg_info for package construct
Installing collected packages: construct
  Running setup.py install for construct
Successfully installed construct
Cleaning up...

$ pip uninstall construct
...
  Successfully uninstalled construct

$ pip install construct==2.5.0
Downloading/unpacking construct==2.5.0
  Downloading construct-2.05.tar.gz (85kB): 85kB downloaded
  Running setup.py egg_info for package construct
Installing collected packages: construct
  Running setup.py install for construct
Successfully installed construct
Cleaning up...

One needs to specify the tarball URL to get it to behave:

$ pip uninstall construct
...
  Successfully uninstalled construct

$ pip install https://pypi.python.org/packages/source/c/construct/construct-2.5.0.tar.gz#md5=05ac994e51d32a011e650b4410ceb72c
Downloading/unpacking https://pypi.python.org/packages/source/c/construct/construct-2.5.0.tar.gz
  Downloading construct-2.5.0.tar.gz (56kB): 56kB downloaded
  Running setup.py egg_info for package from https://pypi.python.org/packages/source/c/construct/construct-2.5.0.tar.gz
Downloading/unpacking six (from construct==2.5.0)
  Downloading six-1.3.0.tar.gz
  Running setup.py egg_info for package six
Installing collected packages: six, construct
  Running setup.py install for six
  Running setup.py install for construct
Successfully installed six construct
Cleaning up...
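
The behaviour reported above is consistent with version strings being compared numerically, component by component, so that 2.06 reads as 2.6 and 2.05 as 2.5. The following is a minimal sketch of that comparison in plain Python. The helper version_key is hypothetical and written only for illustration; it is not pip's actual implementation.

```python
def version_key(version):
    """Split a dotted version into integers and drop trailing zeros.

    Hypothetical helper: it mimics numeric, component-wise comparison,
    not the exact algorithm pip used at the time.
    """
    parts = [int(p) for p in version.split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


# "2.06" is read as (2, 6), which sorts above (2, 5), so it wins by default.
newest = max(["2.05", "2.06", "2.5.0"], key=version_key)

# "2.05" and "2.5.0" collapse to the same key, so "==2.5.0" can match 2.05.
same = version_key("2.05") == version_key("2.5.0")
```

Under this reading, the old 2.0x releases shadow 2.5.0 unless the tarball URL is given explicitly.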

Anchors cause an AttributeError when building.

If an Anchor is in a construct, a value for it must be present in the Container used for building the stream.
I would prefer not to have to insert a useless value into the Container while building,
i.e. Anchor should follow the behavior of Padding when building.

A workaround is to set the Anchor's required container value to None while building.

More natural handling of little-endian BitStructs

Goal: parse a protocol with little-endian fields. Consider this hypothetical specification for 24-bit data:

bits	description
0	Flag 1
1	Flag 2
03:02	Reserved
19:04	Number (16-bit)
23:20	Reserved 2

As far as I can see there is currently no way to construct a structure for this (where the number overlaps multiple fields). Perhaps this code example demonstrates the problem better:

from construct import *

# Cool, very readable (big endian)!
example_struct = BitStruct("Example",
    Padding(4),                     # bits 23:20
    BitField("number", 16),         # bits 19:04
    Padding(2),                     # bits 03:02
    Bit("flag2"),                   # bit 1
    Bit("flag1")                    # bit 0
)

# WTF?
example_struct_little_endian = BitStruct("Example",
    BitField("number_lo", 4),       # bits 07:04
    Padding(2),                     # bits 03:02
    Bit("flag2"),                   # bit 1
    Bit("flag1"),                   # bit 0

    BitField("number_mi", 8),       # bits 15:08

    Padding(4),                     # bits 23:20
    BitField("number_hi", 4)        # bits 19:16
)


# Tests
data_be = b'\x0a\xbc\xd1'
data_le = b'\xd1\xbc\x0a'

# Data in network order (big endian).
testdata = Container(flag1=1, flag2=0, number=0xabcd)
data = example_struct.build(testdata)
print(data)
assert(data == data_be)

c = example_struct.parse(data)
print(c)
assert(c.flag1 == testdata.flag1)
assert(c.flag2 == testdata.flag2)
assert(c.number == testdata.number)


# Data in little endian.
testdata = Container(number_hi=0xa, flag2=0, flag1=1,
                     number_mi=0xbc, number_lo=0xd)
data = example_struct_little_endian.build(testdata)
print(data)
print(data_le)
assert(data == data_le)

c = example_struct_little_endian.parse(data)
print(c)
assert(c.flag1 == testdata.flag1)
assert(c.flag2 == testdata.flag2)
# It would be nice if this manual computation was unnecessary, so we can use
# number = c.number
number = (c.number_hi << 12) | (c.number_mi << 4) | c.number_lo
assert(number == 0xabcd)

It would be great if we could write a structure like:

LittleEndian24BitStruct("Example",
    Bit("flag1"),                   # bit 0
    Bit("flag2"),                   # bit 1
    Padding(2),                     # bits 03:02
    BitField("number", 16),         # bits 19:04
    Padding(4)                      # bits 23:20
)

so it can also work where fields are overloaded to be 32-bit or 64-bit (depending on the lower bits).

The above is a toy example. In reality I have to parse a larger structure containing sequences of multiple structures of varying size (16-bit, 32-bit, 64-bit).

Adapter cannot be used here as the transformation needs to be done after reading the bytes, but before converting it to an object. Maybe a (sub)construct can be used here though.
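
For comparison, the 24-bit layout from the table can be decoded in plain Python by first reading the three bytes as one little-endian integer and then slicing bit ranges out of it. This is only a sketch of the transformation being asked for, not a construct feature; parse_le24 and build_le24 are hypothetical names.

```python
def parse_le24(data):
    """Decode the 24-bit little-endian layout from the table above."""
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    value = int.from_bytes(data, "little")
    return {
        "flag1": value & 1,                # bit 0
        "flag2": (value >> 1) & 1,         # bit 1
        "number": (value >> 4) & 0xFFFF,   # bits 19:04
    }


def build_le24(flag1, flag2, number):
    """Inverse of parse_le24; reserved bits are written as zero."""
    value = (flag1 & 1) | ((flag2 & 1) << 1) | ((number & 0xFFFF) << 4)
    return value.to_bytes(3, "little")


fields = parse_le24(b"\xd1\xbc\x0a")
```

Swapping the byte order before the bit fields are interpreted is what makes the single 16-bit number field possible.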

TunnelAdapter: zlib

Hi, I am following the example from docs:

TunnelAdapter(
    PascalString("data", encoding = "zlib"),
    GreedyRange(UBInt16("elements"))
)

In Python 3.3, I get the following at parse:
LookupError: unknown encoding: zlib

Protocol Buffers Varint encoding

I noticed that varint encoding is missing among the atomic types. I will work on this myself, just letting you know in advance. If you could point me in the right direction, please do so.

Explanation of varint:
https://developers.google.com/protocol-buffers/docs/encoding
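
As a reference point, the base-128 varint scheme for unsigned integers can be sketched in plain Python as follows. Each byte carries seven payload bits, least significant group first, and the high bit signals that more bytes follow. The function names are hypothetical and this is not construct code.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("only non-negative integers are handled here")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)   # more bytes follow
        else:
            out.append(low7)          # last byte, high bit clear
            return bytes(out)


def decode_varint(data, offset=0):
    """Return (value, next_offset) for the varint starting at offset."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7
```

For example, 300 encodes to the two bytes AC 02.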

Can not access parent context from inner structure in OnDemand

Hi, it seems the context for inner structures is not constructed if they are used in an OnDemand construct, e.g.:

from construct import Struct, String, UBInt8, OnDemand

inner = Struct('inner',
               String('innerstring', lambda ctx: ctx._.length, encoding='utf-8'),
               )

outer = Struct('outer',
               UBInt8('length'),
               OnDemand(inner),
               )

Trying to parse the length-prefixed string leads to an exception:

>>> outer.parse(b'\x03abc')
AttributeError: length

(for full trace see below).

If the OnDemand is removed in the example above, everything works fine. Is this a bug, or did I miss something?

Full trace:

ERROR:root:length
Traceback (most recent call last):
  File "/home/ich/.virtualenvs/construct/lib/python3.4/site-packages/construct/lib/container.py", line 33, in __getattr__
    return self[name]
KeyError: 'length'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<ipython-input-153-9b84b0441fcc>", line 2, in <module>
    outer.parse(b'\x03abc')
  File "/home/ich/.virtualenvs/construct/lib/python3.4/site-packages/construct/core.py", line 188, in parse
    return self.parse_stream(BytesIO(data))
  File "/home/ich/.virtualenvs/construct/lib/python3.4/site-packages/construct/core.py", line 198, in parse_stream
    return self._parse(stream, Container())
  File "/home/ich/.virtualenvs/construct/lib/python3.4/site-packages/construct/core.py", line 670, in _parse
    subobj = sc._parse(stream, context)
  File "/home/ich/.virtualenvs/construct/lib/python3.4/site-packages/construct/core.py", line 1052, in _parse
    stream.seek(self.subcon._sizeof(context), 1)
  File "/home/ich/.virtualenvs/construct/lib/python3.4/site-packages/construct/core.py", line 695, in _sizeof
    return sum(sc._sizeof(context) for sc in self.subcons)
  File "/home/ich/.virtualenvs/construct/lib/python3.4/site-packages/construct/core.py", line 695, in <genexpr>
    return sum(sc._sizeof(context) for sc in self.subcons)
  File "/home/ich/.virtualenvs/construct/lib/python3.4/site-packages/construct/core.py", line 275, in _sizeof
    return self.subcon._sizeof(context)
  File "/home/ich/.virtualenvs/construct/lib/python3.4/site-packages/construct/core.py", line 402, in _sizeof
    return self.lengthfunc(context)
  File "<ipython-input-87-ce9b61e96ee4>", line 2, in <lambda>
    String('innerstring', lambda ctx: ctx._.length, encoding='utf-8'),
  File "/home/ich/.virtualenvs/construct/lib/python3.4/site-packages/construct/lib/container.py", line 35, in __getattr__
    raise AttributeError(name)
AttributeError: length

Over broad exception catching (except Exception)

I have an issue where I'm running pyelftools and need to interrupt processing. I'm calling pyelftools from a Celery task. Celery cancels tasks by raising a celery.exceptions.Terminated exception. I have traced the offending except Exception to the construct library, where it needs to be fixed.

Because of this, I can't reliably cancel my task: sometimes construct's except Exception eats my Celery exception, and as far as I know there is no way to work around it.

As a fix, all occurrences of except Exception should be changed to catch more specific exceptions, such as IOError, ValueError etc.

Example of an offending exception:
https://github.com/construct/construct/blob/master/construct/core.py#L364


$ git grep --line-number "except Exception"
construct/core.py:246:        except Exception:
construct/core.py:364:        except Exception:
construct/core.py:369:        except Exception:
construct/core.py:928:                except Exception:
construct/core.py:1398:        except Exception:
construct/core.py:1405:        except Exception:
construct/debug.py:110:        except Exception:
construct/debug.py:121:        except Exception:

This is a follow-up from eliben/pyelftools#59, where they told me construct is upstream, so here I am.
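
The effect described in this report can be reproduced in a few lines of plain Python. The Terminated class below is a hypothetical stand-in for a cancellation exception that derives from Exception; it is not Celery's class.

```python
class Terminated(Exception):
    """Hypothetical stand-in for a task-cancellation exception."""


def risky():
    raise Terminated("task cancelled")


def broad():
    try:
        risky()
    except Exception:            # also catches Terminated
        return "swallowed"


def narrow():
    try:
        risky()
    except (IOError, ValueError):  # Terminated is not listed, so it escapes
        return "swallowed"


broad_result = broad()
try:
    narrow()
    narrow_result = "swallowed"
except Terminated:
    narrow_result = "propagated"
```

With the narrow handler the cancellation reaches the caller, which is the behaviour the report asks for.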

How to parse/build variable sized array of arrays?

Sorry that this is mostly a help post instead of a real issue...

I have a structure like (pseudo-C):

u32 totalSize;
u32 numArrays;
u32 arrayOffsets[numArrays];
// totalSize includes arrayOffsets
u8 arrayData[totalSize - numArrays * sizeof(u32)];

This is used to describe an array of arrays of multiples of 0x14 bytes (including 0).
In other words, parsing might look like this:

std::vector<std::vector<std::array<u8, 0x14>>> array_of_arrays;
for (int i = 0; i < numArrays; ++i) {
  u32 nextOffset = (i == numArrays - 1) ? totalSize : arrayOffsets[i + 1];
  u32 arraySize = nextOffset - arrayOffsets[i];
  std::vector<std::array<u8, 0x14>> arrays;
  u32 dataOffset = arrayOffsets[i] - numArrays * sizeof(u32);
  for (int j = 0; j < arraySize / 0x14; ++j, dataOffset += 0x14) {
    arrays.pseudo_push(&arrayData[dataOffset], 0x14);
  }
  array_of_arrays.push_back(arrays);
}

Is there a way to describe such a structure in construct so that it is buildable as well? The closest I've managed to get is basically parsing a Bytes object with a length of totalSize and manually running over the structure to parse it, after construct has done its parsing.
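
For reference, here is a plain Python sketch of the layout using only the struct module, covering both directions. It assumes little-endian fields, records of exactly 0x14 bytes, and offsets measured from the start of the offset table (as the pseudo-C implies). The names parse_arrays and build_arrays are hypothetical, and this is not a construct definition.

```python
import struct

RECORD_SIZE = 0x14


def parse_arrays(blob):
    """Return a list of arrays, each a list of 0x14-byte records."""
    total_size, num_arrays = struct.unpack_from("<II", blob, 0)
    offsets = struct.unpack_from("<%dI" % num_arrays, blob, 8)
    table_size = 4 * num_arrays
    data = blob[8 + table_size:8 + total_size]
    result = []
    for i, start in enumerate(offsets):
        # The last array runs to totalSize, the others to the next offset.
        end = offsets[i + 1] if i + 1 < num_arrays else total_size
        chunk = data[start - table_size:end - table_size]
        result.append([chunk[j:j + RECORD_SIZE]
                       for j in range(0, len(chunk), RECORD_SIZE)])
    return result


def build_arrays(arrays):
    """Inverse of parse_arrays; every record must be 0x14 bytes long."""
    num_arrays = len(arrays)
    table_size = 4 * num_arrays
    offsets, body = [], b""
    for records in arrays:
        offsets.append(table_size + len(body))
        body += b"".join(records)
    total_size = table_size + len(body)
    header = struct.pack("<II%dI" % num_arrays,
                         total_size, num_arrays, *offsets)
    return header + body
```

The offsets are derived from the data while building, so the caller only supplies the nested lists.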

Values need to be provided to `Anchor` types when Structs are built.

Consider the following definition:

from hashlib import sha512
from construct import Struct, PascalString, Bytes, Terminator, UBInt16, Validator, Container, Anchor, Pass, ValidationError


class ChecksumValidator(Validator):
    def _validate(self, obj, ctx):
        return obj == sha512(ctx.string).hexdigest()

class ChecksumLengthValidator(Validator):
    def _validate(self, obj, ctx):
        return ctx.checksum_end - ctx.checksum_start == 128

SumString = Struct(
    "checksum_string",
    PascalString('string', length_field=UBInt16('length')),
    Anchor('checksum_start'),
    ChecksumValidator(Bytes('checksum', 128)),
    Anchor('checksum_end'),
    ChecksumLengthValidator(Pass),
    Terminator
)

def make_string(string):
    checksum = sha512(string).hexdigest()
    return SumString.build(Container(string=string, checksum=checksum))

Calls to make_string will fail, as checksum_start and checksum_end values aren't provided to the Container in SumString.build. The actual values passed are irrelevant, as they're properly populated when deserializing.

Also, calling ChecksumLengthValidator on Pass is kinda hacky.

How is construct3 going?

I understand that construct 2.x is meant to exist for as long as Python 2.x exists, but should we consider abandoning the feature requests and moving forward to 3.x? @tomerfiliba

Edit: oh but the project supports PY3 as well.

Unicode Unsupported for Struct Names in Python 2

In Python 2, using a unicode object as the name for a struct object raises an exception. (This does not affect Python 3.) For example, consider the following code:

>>> import construct
>>> my_field = construct.Byte(u'MyName')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/matthewlefavor/python/py2_sandbox/lib/python2.7/site-packages/construct/macros.py", line 131, in UBInt8
    return FormatField(name, ">", "B")
  File "/Users/matthewlefavor/python/py2_sandbox/lib/python2.7/site-packages/construct/core.py", line 352, in __init__
    StaticField.__init__(self, name, self.packer.size)
  File "/Users/matthewlefavor/python/py2_sandbox/lib/python2.7/site-packages/construct/core.py", line 324, in __init__
    Construct.__init__(self, name)
  File "/Users/matthewlefavor/python/py2_sandbox/lib/python2.7/site-packages/construct/core.py", line 103, in __init__
    raise TypeError("name must be a string or None", name)
TypeError: ('name must be a string or None', u'MyName')

I have traced this to line 102 of construct.core, in the constructor for the Construct class, which explicitly asks whether the name argument is an instance of str (not even a subclass of str is allowed!):

101    if name is not None:
102        if type(name) is not str:
103            raise TypeError("name must be a string or None", name)
104        if name == "_" or name.startswith("<"):
105            raise ValueError("reserved name", name)

The simplest solution is to replace if type(name) is not str with the following line (and the appropriate import of six):

if not isinstance(name, six.string_types):

The reason this is an issue (should you need convincing) is that much Python 2 code, in an effort to ease the eventual transition to Python 3, activates the unicode_literals feature from __future__. This, combined with the bug, makes it impossible to use any string literal as a name argument.

I will begin work on a patch soon.
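
The difference between the two checks can be illustrated on Python 3, where there is no separate unicode type, by using a str subclass instead. This is a sketch of the general point only. Label, strict_check and lenient_check are hypothetical names, and on Python 2 the proposed fix would test against six.string_types rather than str.

```python
class Label(str):
    """A str subclass, standing in for any non-exact string type."""


def strict_check(name):
    # Mirrors the exact-type test quoted above.
    return type(name) is str


def lenient_check(name):
    # Mirrors the isinstance-based test proposed as the fix.
    return isinstance(name, str)


name = Label("MyName")
results = (strict_check(name), lenient_check(name))
```

The exact-type test rejects the subclass while the isinstance test accepts it.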

Create (empty) bitmap_file in memory

I am trying to create an empty bitmap in memory so I can save it later as a file.
My current approach is to create a container with all required attributes and then use bitmap_file.build or bitmap_file.sizeof, but neither works in this case:

container = Container(
          signature = 'BM',
          file_size = 0,
          data_offset = 0,
          header_size = 12,
          version = 'v2',
          width = imageWidth,
          height = imageHeight,
          number_of_pixels = imageWidth * imageHeight,
          planes = 1,
          bpp = 8,
          compression = 'Uncompressed',
          image_data_size = 0,
          horizontal_dpi = 72,
          vertical_dpi = 72,
          colors_used = 0,
          important_colors = 0,
          palette = [Container(red=i, green=i, blue=i) for i in range(0, 256)],
          pixels = [[0 for column in range(0, imageWidth)] for row in range(0, imageHeight)]
)

size = bitmap_file.sizeof(container)

return bitmap_file.build(container)

sizeof raises an exception in bitmap_file's lambda expression for the palette length; build takes a very long time and then returns only a long sequence of zero bytes.

Ideally, it would be possible to initialize the pixels attribute with a byte array or something similar.

Container iterkeys/itervalues/iteritems produce wrong order

I noticed that the Container class produces keys in the wrong order. Added regression tests.
https://github.com/construct/construct/commits/fix-container-iter

MERGED INTO 82

A test fails randomly; I don't think it failed before a rebase. Could someone investigate this?
https://travis-ci.org/construct/construct/builds/155820801

ULInt24 throws a field error when building

ULInt24 is a weird case in that it has the same format as all of the other multi-byte parsers, but it acts very differently.

It does seem to work when parsing a binary, but the struct library will throw a Field Error when building a binary.

>>> parser = Struct('foo', ULInt24('my_24bit_thing'))
>>> parser.build(Container(my_24bit_thing=123))
FieldError(error('pack expected 2 items for packing (got 1)',)

Any hints on getting this to work or is it just broken? I'm replacing all ULInt24 parsers in my code with BitFields for now.
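
Python's struct module has no format code for a 3-byte integer, so a 24-bit field has to be assembled by other means. A minimal plain Python sketch using int.to_bytes and int.from_bytes is shown below. The function names are hypothetical and this is not how construct implements ULInt24.

```python
def pack_u24_le(value):
    """Pack an unsigned 24-bit integer as 3 little-endian bytes."""
    if not 0 <= value < 1 << 24:
        raise ValueError("value does not fit in 24 bits")
    return value.to_bytes(3, "little")


def unpack_u24_le(data):
    """Unpack 3 little-endian bytes into an unsigned integer."""
    if len(data) != 3:
        raise ValueError("expected exactly 3 bytes")
    return int.from_bytes(data, "little")


packed = pack_u24_le(123)
```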

Building a Struct and automatically updating some fields (like a length field)

Hi,
I'm sorry if this was covered in the documentation, I tried finding it but did not succeed.
Let's say I want to use the IP Struct to generate an IP message.
Is it possible to make the build() function calculate the length and put it in the right place in the Struct?

Meaning, can I have a field in a struct that is calculated according to the other fields?

Toda,
Joel
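
Outside of construct, the usual pattern for a computed length field is to take the payload first and derive the length from it when packing the header. The sketch below uses a hypothetical three-byte header (a type byte followed by a big-endian total length), not the real IP header layout.

```python
import struct

# Hypothetical header: 1-byte message type, 2-byte big-endian total length.
HEADER = struct.Struct(">BH")


def build_message(msg_type, payload):
    """The caller never supplies the length; it is derived from the payload."""
    total_length = HEADER.size + len(payload)
    return HEADER.pack(msg_type, total_length) + payload


def parse_message(data):
    """Read the header, then slice the payload using the stored length."""
    msg_type, total_length = HEADER.unpack_from(data, 0)
    return msg_type, data[HEADER.size:total_length]


message = build_message(1, b"hello")
```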

Embed / Optional issue

In the example below, the BitStruct i is unintentionally turned into an embedded one, depending on whether the two fields defined before it are Embedded and/or Optional. At the bottom of the code there are three tests describing the issue.

import construct


def vstring(name, embed=True, optional=True):
    name_length_field = "_%s_length" % name.lower()
    s = construct.Struct(
        name,
        construct.ULInt8(name_length_field),
        construct.String(name,
                         lambda ctx: getattr(ctx, name_length_field)))
    if optional:
        s = construct.Optional(s)
    if embed:
        s = construct.Embed(s)
    return s


def build_struct(embed_g=True, embed_h=True):
    s = construct.Struct(
        "mystruct",
        construct.ULInt32("a"),
        construct.ULInt8("b"),
        construct.ULInt8("c"),
        construct.BitStruct(
            "d",
            construct.BitField("d_bit7", 1),
            construct.BitField("d_bit6", 1),
            construct.BitField("d_bit5", 1),
            construct.BitField("d_bit4", 1),
            construct.BitField("d_bit3", 1),
            construct.BitField("d_bit2", 1),
            construct.BitField("d_bit1", 1),
            construct.BitField("d_bit0", 1)),
        construct.BitStruct(
            "e",
            construct.BitField("e_bit7", 1),
            construct.BitField("e_bit6", 1),
            construct.BitField("e_bit5", 1),
            construct.BitField("e_bit4", 1),
            construct.BitField("e_bit3", 1),
            construct.BitField("e_bit2", 1),
            construct.BitField("e_bit1", 1),
            construct.BitField("e_bit0", 1)),
        construct.BFloat32("f"),
        vstring("g", embed=embed_g),
        vstring("h", embed=embed_h),
        construct.BitStruct(
            "i",
            construct.BitField("i_bit7", 1),
            construct.BitField("i_bit6", 1),
            construct.BitField("i_bit5", 1),
            construct.BitField("i_bit4", 1),
            construct.BitField("i_bit3", 1),
            construct.BitField("i_bit2", 1),
            construct.BitField("i_bit1", 1),
            construct.BitField("i_bit0", 1)),
        construct.SBInt8("j"),
        construct.SBInt8("k"),
        construct.SBInt8("l"),
        construct.LFloat32("m"),
        construct.LFloat32("n"),
        vstring("o"),
        vstring("p"),
        vstring("q"),
        vstring("r"))
    return s


bytes = ('\xc3\xc0{\x00\x01\x00\x00\x00HOqA\x12some silly text...\x00\x0e\x00'
         '\x00\x00q=jAq=zA\x02dB\x02%f\x02%f\x02%f')

print ("\n\nNo embedding for neither g and h, i is a container --> OK")
print build_struct(embed_g=False, embed_h=False).parse(bytes)
print ("Embed both g and h, i is not a container --> FAIL")
print build_struct(embed_g=True, embed_h=True).parse(bytes)
print ("\n\nEmbed g but not h --> EXCEPTION")
print build_struct(embed_g=True, embed_h=False).parse(bytes)

# When setting optional to False in vstring method, all three tests above
# work fine.

Running the script (Python 2.7, construct 2.5.1):

No embedding for neither g and h, i is a container --> OK
Container:
    a = 8110275
    b = 1
    c = 0
    d = Container:
        d_bit7 = 0
        d_bit6 = 0
        d_bit5 = 0
        d_bit4 = 0
        d_bit3 = 0
        d_bit2 = 0
        d_bit1 = 0
        d_bit0 = 0
    e = Container:
        e_bit7 = 0
        e_bit6 = 0
        e_bit5 = 0
        e_bit4 = 0
        e_bit3 = 0
        e_bit2 = 0
        e_bit1 = 0
        e_bit0 = 0
    f = 212421.015625
    g = Container:
        g = 'some silly text...'
    h = Container:
        h = ''
    i = Container:
        i_bit7 = 0
        i_bit6 = 0
        i_bit5 = 0
        i_bit4 = 0
        i_bit3 = 1
        i_bit2 = 1
        i_bit1 = 1
        i_bit0 = 0
    j = 0
    k = 0
    l = 0
    m = 14.640000343322754
    n = 15.640000343322754
    o = 'dB'
    p = '%f'
    q = '%f'
    r = '%f'
Embed both g and h, i is not a container --> FAIL
Container:
    a = 8110275
    b = 1
    c = 0
    d = Container:
        d_bit7 = 0
        d_bit6 = 0
        d_bit5 = 0
        d_bit4 = 0
        d_bit3 = 0
        d_bit2 = 0
        d_bit1 = 0
        d_bit0 = 0
    e = Container:
        e_bit7 = 0
        e_bit6 = 0
        e_bit5 = 0
        e_bit4 = 0
        e_bit3 = 0
        e_bit2 = 0
        e_bit1 = 0
        e_bit0 = 0
    f = 212421.015625
    g = 'some silly text...'
    h = ''
    i_bit7 = 0
    i_bit6 = 0
    i_bit5 = 0
    i_bit4 = 0
    i_bit3 = 1
    i_bit2 = 1
    i_bit1 = 1
    i_bit0 = 0
    i = <...>
    j = 0
    k = 0
    l = 0
    m = 14.640000343322754
    n = 15.640000343322754
    o = 'dB'
    p = '%f'
    q = '%f'
    r = '%f'


Embed g but not h --> EXCEPTION
Traceback (most recent call last):
  File "embed.py", line 77, in <module>
    print build_struct(embed_g=True, embed_h=False).parse(bytes)
  File ".../lib/python2.7/site-packages/construct/core.py", line 187, in parse
    return self.parse_stream(BytesIO(data))
  File ".../lib/python2.7/site-packages/construct/core.py", line 197, in parse_stream
    return self._parse(stream, Container())
  File ".../lib/python2.7/site-packages/construct/core.py", line 664, in _parse
    raise OverwriteError("%r would be overwritten but allow_overwrite is False" % (sc.name,))
construct.core.OverwriteError: 'h' would be overwritten but allow_overwrite is False

FlagsEnum print issue

The following test case no longer works. According to git bisect, this is a regression introduced by commit 79ac197:

make container preserve order of insertion; closes #16

from construct import *

def FLAGS(name):
    return Struct(name,
                 FlagsEnum(ULInt8('Flag'),
                    FLAG1     = 0x01,
                    FLAG2     = 0x04,
                    FLAG3     = 0x10,
                    FLAG4     = 0x80,  
                 )  
               )

c= FLAGS('a').parse('\x01')
print c

Traceback (most recent call last):
  File "U:\construct\construct\tests\t2.py", line 14, in <module>
    print c
  File "U:\construct\construct\lib\container.py", line 12, in wrapper
    return func(self, *args, **kw)
  File "U:\construct\construct\lib\container.py", line 101, in __pretty_str__
    text.append(v.__pretty_str__(nesting + 1, indentation))
  File "U:\construct\construct\lib\container.py", line 12, in wrapper
    return func(self, *args, **kw)
  File "U:\construct\construct\lib\container.py", line 125, in __pretty_str__
    v = self.__dict__[k]
KeyError: 'FLAG4'

building failed for BitField with dynamic length

With construct 2.5.1

s = Struct('test',UBInt8('len'), EmbeddedBitStruct(BitField('data', lambda ctx: ctx.len)))
s.build(Container(len=8, data=1))
....
....
114 if number < 0:
115 number += 1 << width
--> 116 i = width - 1
117 bits = ["\x00"] * width
118 while number and i >= 0:
TypeError: unsupported operand type(s) for -: 'function' and 'int'
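
The traceback suggests that the width reached the bit-conversion arithmetic while still being a function. A plain Python sketch of resolving a possibly callable width against the context before using it is shown below. Both helpers are hypothetical and are not construct's code.

```python
def resolve(value, context):
    """Return value(context) when value is callable, else value itself."""
    return value(context) if callable(value) else value


def int_to_bits(number, width):
    """Big-endian list of 0/1 bits; negatives use two's complement."""
    if number < 0:
        number += 1 << width
    return [(number >> i) & 1 for i in range(width - 1, -1, -1)]


context = {"len": 8}
width = resolve(lambda ctx: ctx["len"], context)   # evaluated to 8
bits = int_to_bits(1, width)
```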

how to access the value of previous obj in "greedy range objs" (OptionalGreedyRange(obj) ) ?

I ran into a strange protocol that requires parsing packets (header + payload), and a single transaction can contain multiple packets. OptionalGreedyRange seems to fit this case very well.
However, the header has a field (flag = "unchanged") which means the previous packet's payload format must be reused to decode the current payload.
In this case, is there a way to access the value of the previous object when parsing the greedy range objects?
Can I set some flag (e.g. Value) in the context so that it can be accessed when parsing the next packet?
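
To make the requirement concrete, here is a plain Python sketch of a parser loop that carries the previous format forward between packets. The wire format is invented purely for illustration and does not describe the protocol in question or any construct API.

```python
import struct

# Hypothetical wire format, invented for illustration:
#   flag byte: 0 = a format byte follows, anything else = "unchanged"
#   format byte (only when flag == 0): 1 = big-endian u16, 2 = little-endian u16
#   2-byte payload
FORMATS = {1: ">H", 2: "<H"}


def parse_packets(data):
    packets = []
    previous_format = None          # state shared across packets
    offset = 0
    while offset < len(data):
        flag = data[offset]
        offset += 1
        if flag == 0:
            previous_format = FORMATS[data[offset]]
            offset += 1
        elif previous_format is None:
            raise ValueError("'unchanged' flag before any format was seen")
        (value,) = struct.unpack_from(previous_format, data, offset)
        offset += 2
        packets.append(value)
    return packets


stream = (b"\x00\x01\x01\x02" + b"\x01\x03\x04"
          + b"\x00\x02\x05\x06" + b"\x01\x07\x08")
values = parse_packets(stream)
```

The second and fourth packets carry no format byte and are decoded with whatever format the preceding packet established.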

Example for string build function not working for python3

Hi,

the example in

construct/construct/macros.py

Line 522 in 208df16

>>> foo.build("hello")

won't work for python3, as long as there is no encoding set for the String macro. Guess there should be a hint in the documentation.

When I do

String('foo', 3).build('abc')

I get the error

TypeError: 'str' does not support the buffer interface
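
The underlying Python 3 rule can be shown without construct: a binary stream accepts bytes, not text, so a str has to be encoded first. A small sketch with io.BytesIO:

```python
import io

stream = io.BytesIO()

# Writing text to a binary stream fails on Python 3.
try:
    stream.write("abc")
    error = None
except TypeError as exc:
    error = exc

# Encoding the text first yields bytes that the stream accepts.
stream.write("abc".encode("ascii"))
written = stream.getvalue()
```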

Support realtime streaming data parse

Can construct support streamed data parsing?

How can I handle large files efficiently?

I'm trying to use construct to parse large binary files, often in the range of 500 MBytes.

It works, but it's extremely slow and uses enormous quantities of RAM. A 42.4 MByte file takes ~1 minute to load, and consumes about 2.5 gigabytes of ram once the file is fully loaded.

How does one handle large files effectively with construct?

My parser is as follows

import construct as c

COMBO_PARSER = c.GreedyRange(
    c.Struct("sweep",
        c.SNInt32("segment_sz"),

        c.Struct("header",
            c.SNInt32("frame"),
            c.SNInt32("combo"),
            c.NFloat64("time"),
            c.SNInt32("tx"),
            c.SNInt32("rx"),
            c.SNInt32("tx_p"),
            c.SNInt32("rx_p"),
            c.SNInt32("gain"),
        ),

        c.Struct("serial",
                c.UNInt8("serial_present"),
                c.If(
                    lambda ctx:ctx['serial_present'] > 0,
                    c.Array(CONST__BYTESPerTelemetry, c.SNInt8("serial_data"))
                )
            ),

        c.Struct("telemetry",
                c.NFloat64("comboDelay"),
                c.NFloat64("x"),
                c.NFloat64("y"),
                c.NFloat64("distance"),


                c.NFloat32("theta"),
                c.NFloat32("velocity"),
                c.NFloat32("turnRate"),
                c.Padding(4),

                c.NFloat64("timeStamp"),

                c.UNInt32("LSE"),
                c.UNInt32("RSE"),
                c.UNInt32("flipperSE"),

                c.UNInt16("sonar"),
                c.UNInt8("temperature"),

                c.Padding(1)
            ),
        c.Struct("stitch_points",
                c.SNInt32("stitch_count"),
                c.Array(lambda ctx: ctx.stitch_count,
                    c.Struct("stitch_point",
                            c.SNInt32("I"),
                            c.NFloat32("Q"),
                            c.NFloat32("is_stitch"),

                        )
                    )
            ),
        c.UNInt32("data_point_cnt"),

        c.Array(lambda ctx:ctx.data_point_cnt,
            c.Struct("data_point_arr",
                    c.NFloat32("I"),
                    c.NFloat32("Q"),
                )
            ),
        c.Anchor("data_end"),
    )
)
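
One general technique for keeping memory bounded, independent of construct, is to read and yield one record at a time instead of materialising the whole file as Python objects. The sketch below does this with the struct module for a hypothetical record of two native-order 32-bit floats, similar in shape to the I/Q pairs above. It is not a drop-in replacement for the parser.

```python
import io
import struct

# Hypothetical record: native-order 32-bit floats I and Q.
POINT = struct.Struct("=ff")


def iter_points(stream):
    """Yield (I, Q) tuples lazily; only one record is held at a time."""
    while True:
        chunk = stream.read(POINT.size)
        if len(chunk) < POINT.size:
            return
        yield POINT.unpack(chunk)


source = io.BytesIO(POINT.pack(1.0, 2.0) + POINT.pack(3.0, 4.0))
total_i = sum(i for i, _q in iter_points(source))
```

The same generator works on a file opened in binary mode, so the consumer decides how much is kept in memory.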

Aligned() doesn't align

The documentation for Aligned() implies it works roughly the same way as alignment for C structs:

Aligns the subconstruct to a given modulus boundary (default is 4).

However, the reality doesn't match the expectations. Code:

import construct as c

s = c.Struct(
    'x',
    c.Byte('a'),
    c.Aligned(c.Byte('b')),
    c.Byte('c')
)
print repr(s.build(c.Container(a=1, b=2, c=3)))

Output:

'\x01\x02\x00\x00\x00\x03'

So the offsets for a, b and c are 0, 1 and 5. b isn't aligned (neither is c). It seems Aligned() is merely padding the field.

I would have expected the following output with offsets 0, 4 and 5:

'\x01\x00\x00\x00\x02\x03'

If this is intended behavior, I find the terminology confusing. In any case I think real alignment control would be much more useful than padding.
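
The difference between the two behaviours can be shown without Construct at all. The sketch below is plain Python, not the library's implementation; pad_after and pad_before are hypothetical helper names introduced only for illustration.

```python
def pad_after(data, modulus=4):
    """Padding: append zeros until the field's own length is a multiple of modulus."""
    return data + b"\x00" * (-len(data) % modulus)


def pad_before(offset, data, modulus=4):
    """Alignment: prepend zeros so the field starts at a multiple of modulus."""
    return b"\x00" * (-offset % modulus) + data


a, b, c = b"\x01", b"\x02", b"\x03"

# What the report observes: b is followed by padding, so it stays at offset 1.
padded = a + pad_after(b) + c

# What the report expects: b is pushed forward to offset 4.
aligned = a + pad_before(len(a), b) + c
```

Here padded reproduces the bytes shown under "Output" and aligned reproduces the bytes the reporter expected.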

Numpy Arrays building and parsing

Hello,

I wonder if construct can deal with Numpy Arrays and the use of memmap.
http://docs.scipy.org/doc/numpy/reference/generated/numpy.memmap.html

I didn't see anything about this in the doc but it might be efficient because
I have a binary file format with 1863 bytes of headers (including an array length at offset 1859), an array of 9 unsigned floats (little endian) and 104 bytes at the end.

Kind regards
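
As a point of comparison, a layout like this can be read with the standard library alone. This is only a sketch under assumptions that the report does not state: the length at offset 1859 is taken to be a 4-byte little-endian unsigned integer (which would end exactly at byte 1863), and the array elements are taken to be 32-bit little-endian floats. read_floats is a hypothetical helper, and the sample data is synthetic.

```python
import struct

HEADER_SIZE = 1863   # from the report
COUNT_OFFSET = 1859  # from the report
TRAILER_SIZE = 104   # from the report


def read_floats(buf):
    # Assumption: the array length is a little-endian uint32.
    (count,) = struct.unpack_from("<I", buf, COUNT_OFFSET)
    # Assumption: the elements are little-endian 32-bit floats.
    return list(struct.unpack_from("<%df" % count, buf, HEADER_SIZE))


# Synthetic file contents: header with a count of 9, nine floats, trailer.
header = bytearray(HEADER_SIZE)
struct.pack_into("<I", header, COUNT_OFFSET, 9)
payload = struct.pack("<9f", *[float(i) for i in range(9)])
sample = bytes(header) + payload + bytes(TRAILER_SIZE)

floats = read_floats(sample)
```

numpy.memmap takes an offset argument, so the same two numbers (1863 and the element count) would be what it needs as well.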

A few errors found in d050dd4

Firstly, when importing all modules from construct (from construct import *) you raise an AttributeError - "module has no attribute GreedyRepeater". It appears you've renamed it GreedyRange (or something like that).

Secondly, when using the Switch example found in the documentation, there is a TypeError: 'str' does not support the buffer interface. The documentation does not reflect the fact that you have to pass a bytes stream.

The problem is also found in binary.py, where ord() and chr() are used even though they cannot work in 3.2 due to the distinction between bytes and unicode strings.

Another error I got was when using 8 flags and two BFloat32s in the same struct, even though they were a multiple of eight.

Finally, when using String and parsing it within a struct, its value is a byte stream, not a unicode (or default) string. I'd expect that unless an encoding is specified, it defaults to utf-8? Or this could just be from my perspective.
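
For readers unfamiliar with the bytes/str split behind these errors, here is a small Python 3 illustration. It is generic Python, not code from Construct, and the variable names are made up for the example.

```python
data = b"\x04abc"

# Indexing bytes yields an int in Python 3, so ord(data[0]) is no longer needed
# (and would fail, because ord() expects a length-1 string or bytes object).
first = data[0]

# chr(first) would give a str; bytes([first]) gives the one-byte bytes object.
as_byte = bytes([first])

# Text only appears after an explicit decode with a chosen encoding.
text = data[1:].decode("utf-8")
```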

Update http://construct.wikispaces.com/

"The latest version of Construct is version 2.04, and it is available for download from PyPI"

Latest release on PyPI is 2.06 - and btw. how about a new (python3 compatible) release?

python 3 compatibility

Const & FlagsEnum

FlagsEnum.parse returns a dict-like object, FlagsContainer, which means that, when using Const to verify its value, one must supply a dict-like object. However, FlagsEnum.build (like the rest of construct) examines the passed-in object's attributes for the flag values. Thus, passing a dictionary to build will cause it to not set any flags. However, passing in an object rather than a dictionary will cause Const to fail its test.

To wit:

In [20]: Const(FlagsEnum(Byte("types"), feature=4, output=2, input=1), {'feature': True, 'output': False, 'input': False}).parse('\x04')
Out[20]: FlagsContainer({'output': False, 'feature': True, 'input': False})

In [21]: Const(FlagsEnum(Byte("types"), feature=4, output=2, input=1), {'feature': True, 'output': False, 'input': False}).build({'feature': True, 'output': False, 'input': False})
Out[21]: '\x00'    ## ERROR ##

In [22]: FlagsEnum(Byte("types"), feature=4, output=2, input=1).build(types_obj)
Out[22]: '\x04'

In [23]: Const(FlagsEnum(Byte("types"), feature=4, output=2, input=1), {'feature': True, 'output': False, 'input': False}).build(types_obj)
---------------------------------------------------------------------------
ConstError                                Traceback (most recent call last)
<snip>
ConstError: expected {'output': False, 'feature': True, 'input': False}, found <types_obj>

I'm guessing the more consistent solution is to refactor FlagsContainer to not be a dict-like object.
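
Another possible direction, sketched here in plain Python rather than as a patch to Construct, is for the building side to accept both mappings and attribute-style objects. flag_value, build_flags and TypesObj are hypothetical names; the flag values are the ones from the session above.

```python
from collections.abc import Mapping

FLAGS = {"feature": 4, "output": 2, "input": 1}


def flag_value(obj, name):
    """Look a flag up by key for mappings, by attribute otherwise."""
    if isinstance(obj, Mapping):
        return bool(obj.get(name, False))
    return bool(getattr(obj, name, False))


def build_flags(obj, flags=FLAGS):
    """OR together the bits of every flag that is set on obj."""
    value = 0
    for name, bit in flags.items():
        if flag_value(obj, name):
            value |= bit
    return bytes([value])


class TypesObj(object):
    feature = True
    output = False
    input = False


from_dict = build_flags({"feature": True, "output": False, "input": False})
from_obj = build_flags(TypesObj())
```

With this lookup, the dictionary that Const compares against and the object passed to build produce the same byte.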

features

Tomer,

I checked out the versions of Construct and the last version that did what I needed it to do was version 2.04.

One of the capabilities I needed was a way to display the parsing in the order of the constructs...later versions did not allow me to do this. It is far faster to find things and fix things when they are in order on a large parse.

The second capability I needed was a way to display a table with one column showing the dot notation needed to access an element and the other column containing the value of the object...again all this in parse order.

With version 2.04 I was able to achieve these two capabilities...but not with the later versions of Construct.

This email, however, is a request for an additional capability (possibly as a parameter to set 'True' or 'False'). The capability is to have Construct generate an exception whenever a Construct has embedded within it another structure that would overwrite existing names at the same level...an example:

This would NOT generate an exception (since a,b,c,d do not overwrite one another):

>>> foo = Struct("foo",
...     UBInt8("a"),
...     UBInt8("b"),
... )
>>> bar= Struct("bar",
...     Embed(foo, overwrite=True),  # this places "foo" at same level as "bar"
...     UBInt8("c"),
...     UBInt8("d"),
... )
>>> bar.parse("abcd")
Container(a = 97, b = 98, c = 99, d = 100)

This WOULD generate an exception (since a,b get overwritten):

>>> foo = Struct("foo",
...     UBInt8("a"),
...     UBInt8("b"),
... )
>>> bar= Struct("bar",
...     Embed(foo, overwrite=False),    # this would generate exception if overwrite takes place
...     UBInt8("a"),
...     UBInt8("b"),
... )
>>> bar.parse("abcd")
EXCEPTION GENERATED...

This capability would help immensely in my very large parsing project.
If you can either add this capability or show me how I can add it, it would be a huge time saver.
Construct is an amazing tool and I'm glad you wrote it.

Thanks Tomer,
-Joe
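
The requested check amounts to a guard at the point where a parsed value is stored under its name. A minimal sketch in plain Python, with add_field and the locally defined OverwriteError as illustrative names (this is not Construct's code):

```python
class OverwriteError(Exception):
    """Raised when a field name is already present and overwriting is disallowed."""


def add_field(container, name, value, allow_overwrite=True):
    if name in container and not allow_overwrite:
        raise OverwriteError("%r would be overwritten" % (name,))
    container[name] = value


# First example from the request: a, b, c, d never collide.
ok = {}
for name, value in zip("abcd", b"abcd"):
    add_field(ok, name, value, allow_overwrite=False)

# Second example: the embedded a, b are followed by another a, b.
clash = {}
try:
    for name, value in zip("abab", b"abcd"):
        add_field(clash, name, value, allow_overwrite=False)
    error = None
except OverwriteError as exc:
    error = str(exc)
```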

Aligned() or AlignedStruct() doesn't work with PascalString()

The following code:

from construct import AlignedStruct, ULInt16, PascalString
s = AlignedStruct('s', ULInt16('id'), PascalString('txt'))
print s.parse('\0\1\2\3\4abcd')

gives the following stack trace:

Traceback (most recent call last):
  File "construct_alignedstruct_bug.py", line 3, in <module>
    print s.parse('\0\1\2\3\4abcd')
  File "construct-2.06-py2.7.egg\construct\core.py", line 181, in parse
    return self.parse_stream(StringIO(data))
  File "construct-2.06-py2.7.egg\construct\core.py", line 191, in parse_stream
    return self._parse(stream, Container())
  File "construct-2.06-py2.7.egg\construct\core.py", line 645, in _parse
    subobj = sc._parse(stream, context)
  File "construct-2.06-py2.7.egg\construct\core.py", line 279, in _parse
    return self._decode(self.subcon._parse(stream, context), context)
  File "construct-2.06-py2.7.egg\construct\core.py", line 705, in _parse
    subobj = sc._parse(stream, context)
  File "construct-2.06-py2.7.egg\construct\core.py", line 279, in _parse
    return self._decode(self.subcon._parse(stream, context), context)
  File "construct-2.06-py2.7.egg\construct\core.py", line 388, in _parse
    return _read_stream(stream, self.lengthfunc(context))
  File "construct-2.06-py2.7.egg\construct\macros.py", line 359, in padlength
    return (modulus - (subcon._sizeof(ctx) % modulus)) % modulus
  File "construct-2.06-py2.7.egg\construct\core.py", line 268, in _sizeof
    return self.subcon._sizeof(context)
  File "construct-2.06-py2.7.egg\construct\core.py", line 268, in _sizeof
    return self.subcon._sizeof(context)
  File "construct-2.06-py2.7.egg\construct\core.py", line 668, in _sizeof
    return sum(sc._sizeof(context) for sc in self.subcons)
  File "construct-2.06-py2.7.egg\construct\core.py", line 668, in <genexpr>
    return sum(sc._sizeof(context) for sc in self.subcons)
  File "construct-2.06-py2.7.egg\construct\core.py", line 392, in _sizeof
    return self.lengthfunc(context)
  File "construct-2.06-py2.7.egg\construct\macros.py", line 533, in <lambda>
    Field("data", lambda ctx: ctx[length_field.name]),
  File "construct-2.06-py2.7.egg\construct\lib\container.py", line 35, in __getitem__
    return self.__dict__[name]
KeyError: 'length'

A similar error is shown for this:

from construct import Aligned, Struct, ULInt16, PascalString
s = Struct('s', ULInt16('id'), Aligned(PascalString('txt')))
print s.parse('\0\1\2\3\4abcd')
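
The traceback shows the pad length being derived from subcon._sizeof(ctx), which needs a length that is not in the context at that point. One way around that, sketched here in plain Python and not taken from the library, is to measure how many bytes the inner parser actually consumed. read_aligned, read_u16le and read_pascal_string are hypothetical helpers, and the sample data is the input from the report with its three trailing pad bytes added.

```python
import io


def read_u16le(stream):
    return int.from_bytes(stream.read(2), "little")


def read_pascal_string(stream):
    length = stream.read(1)[0]  # one-byte length prefix
    return stream.read(length)


def read_aligned(stream, reader, modulus=4):
    """Run reader, then skip padding based on the bytes it really consumed."""
    start = stream.tell()
    value = reader(stream)
    consumed = stream.tell() - start
    stream.read(-consumed % modulus)
    return value


stream = io.BytesIO(b"\x00\x01\x02\x03\x04abcd\x00\x00\x00")
id_value = read_aligned(stream, read_u16le)
txt = read_aligned(stream, read_pascal_string)
```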

problem with Optional construct

Hello,

According to the docs, the Optional construct allows a default value, but I could not get that to work. Was it removed? Is there any other way of doing this? It would allow me to make an empty structure with default values, then parse it to fill in the checksum and other fields.

http://construct.readthedocs.org/en/latest/misc.html

Can't be installed in virtualenv

Trying to install Construct in a virtualenv fails, complaining about write permissions on the Python system lib folder.

(txscada)[12:44:23] defo:pysmve git:(master*) $ which python
/home/defo/.virtualenvs/txscada/bin/python
(txscada)[12:44:29] defo:pysmve git:(master*) $ pip install Construct
Downloading/unpacking Construct
  Running setup.py egg_info for package Construct

Installing collected packages: Construct
  Running setup.py install for Construct
    error: could not create '/usr/lib/python2.7/site-packages/construct': Permission denied
    Complete output from command /usr/bin/python2.7 -c "import setuptools;__file__='/home/defo/.virtualenvs/txscada/build/Construct/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-21yutH-record/install-record.txt --install-headers /home/defo/.virtualenvs/txscada/include/site/python2.7:
    running install

running build

running build_py

running install_lib

creating /usr/lib/python2.7/site-packages/construct

error: could not create '/usr/lib/python2.7/site-packages/construct': Permission denied

----------------------------------------
Command /usr/bin/python2.7 -c "import setuptools;__file__='/home/defo/.virtualenvs/txscada/build/Construct/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --single-version-externally-managed --record /tmp/pip-21yutH-record/install-record.txt --install-headers /home/defo/.virtualenvs/txscada/include/site/python2.7 failed with error code 1 in /home/defo/.virtualenvs/txscada/build/Construct
Storing complete log in /home/defo/.pip/pip.log

Unions should build as Selects

I'm trying to build a tool that goes from some fairly human readable data representation such as json into tightly packed byte arrays. Unions are great 1:many mappings of bytes -> structured data, but the implementation in Construct isn't very useful for many:1 mappings of structured data -> bytes. Take the below code for example:

from construct import *

class QuickObj(object):
    def __init__(self, **kwargs):
        self.__dict__.update(kwargs)

test_union = Union("test_union",
    Embed(Struct("foo",
        ULInt8("a"),
        ULInt8("b")
    )),
    Embed(Struct("bar",ULInt16("c")))
)

test_ok = QuickObj(a=1,b=2)
test_union.build(test_ok)
test_fail = QuickObj(c=513)
test_union.build(test_fail)

Because test_fail doesn't follow the "foo" struct, the .build() fails, which isn't expected behavior because test_fail is a valid form of the many possible structured data representations. This basically forces you to reproduce the union twice; once with construct's Union, and once in your python-side object or an adapter.

I think this could be solved by having Union().build behave somewhat like Select().build. Try each subcon in order until one works.
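
The proposed behaviour, reduced to plain Python with the struct module (build_foo, build_bar and build_first_match are illustrative names, not Construct API):

```python
import struct


def build_foo(obj):
    # Two unsigned 8-bit fields, like ULInt8("a") and ULInt8("b").
    return struct.pack("<BB", obj["a"], obj["b"])


def build_bar(obj):
    # One unsigned little-endian 16-bit field, like ULInt16("c").
    return struct.pack("<H", obj["c"])


def build_first_match(obj, builders):
    """Try each alternative in order and return the first that builds."""
    for build in builders:
        try:
            return build(obj)
        except (KeyError, struct.error):
            continue
    raise ValueError("no alternative could build %r" % (obj,))


alternatives = [build_foo, build_bar]
built_ok = build_first_match({"a": 1, "b": 2}, alternatives)
built_fail = build_first_match({"c": 513}, alternatives)
```

Both inputs yield the same two bytes, since 513 is 0x0201, which is what a union over these two layouts implies.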

Pretty print is not functioning

Using the head of tree version.

I am running the example here: http://construct.readthedocs.org/en/latest/basics.html#nested

And the 'print x' output does not look like the output in the documentation.

from construct import *

c = Struct("foo",
     UBInt8("a"),
     UBInt16("b"),
     Struct("bar",
         UBInt8("a"),
         UBInt16("b"),
     )
 )
x = c.parse("ABBabb")
x  
print x

CRC Validator

Hi everyone,

is there a way to build a CRC validator? Validators check the value of a field against, for example in the case of OneOf, a set of values. But when the Validator's _validate() method is called there's no knowledge of the byte stream, only of the Context.
So I don't think a Validator would be sufficient to build a CRC check, contrary to what's stated in the documentation: http://construct.readthedocs.org/en/latest/adapters.html#validating

Perhaps something like a Restream is needed, a Subconstruct that can buffer a number of bytes and return them for further parsing after validation. Any suggestion is welcome. Thank you!
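
The buffering idea can be sketched outside Construct as follows. The frame layout is an assumption made for the example (payload followed by a 4-byte big-endian CRC-32), and parse_with_crc is a hypothetical helper.

```python
import io
import struct
import zlib


def parse_with_crc(stream, length):
    """Read length payload bytes plus a CRC-32, verify, and re-stream the payload."""
    payload = stream.read(length)
    (expected,) = struct.unpack(">I", stream.read(4))
    actual = zlib.crc32(payload) & 0xFFFFFFFF
    if actual != expected:
        raise ValueError("CRC mismatch: expected %08x, got %08x" % (expected, actual))
    # The validated bytes are handed back as a fresh stream for further parsing.
    return io.BytesIO(payload)


payload = b"hello"
frame = payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)
restreamed = parse_with_crc(io.BytesIO(frame), len(payload)).read()
```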

Embedded constructs are not properly inserted into Structs

Here's a fairly minimal example in Python3.

from construct import *
b = b'\x02\x04\x04\x04\x04'
m = MetaArray(lambda ctx: ctx.bytes, Struct('foo', Byte('bar'), Byte('baz')))
Minimal = Debugger(Struct(None, Byte('bytes'), Embed(m)))
print(Minimal.parse(b))

This should probably output something like:

Container:
    bytes = 2
    bar = [
        4
        4
    ]
    baz = [
        4
        4
    ]

Instead, it outputs:

Container:
    bytes = 2
    bar = 4
    baz = 4

The problems are in lines 666-676 of core.py.

            if sc.conflags & self.FLAG_EMBED:
                context["<obj>"] = obj
                sc._parse(stream, context)
            else:
                subobj = sc._parse(stream, context)
                if sc.name is not None:
                    if sc.name in obj and not self.allow_overwrite:
                        raise OverwriteError("%r would be overwritten but allow_overwrite is False" % (sc.name,))
                    obj[sc.name] = subobj
                    context[sc.name] = subobj
        return obj

First, line 668 calls sc._parse but doesn't do anything with the return value. Second, because everything after line 669 is enclosed in the else clause corresponding to the if clause sc.conflags & self.FLAG_EMBED, if self.FLAG_EMBED is set, obj is never changed before it's returned. If called directly on a Struct, the inner Struct's _parse method gets called and lines 658-660 set the object. In this case, it seems like the values in the outer Struct get written twice, the second overwriting the first.

Container: wrong print order

The order of the structure definition is not preserved when printing. Here is an example...

from construct import *

d = Struct("foo",
    UBInt8("a"),
    SLInt16("b"),
    LFloat32("c"),
)

final = d.parse("\x07\x00\x01\x00\x00\x00\x01")

print final

Results in...
Container:
    a = 7
    c = 2.350988701644575e-38
    b = 256
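
For comparison, the same seven bytes decoded with the standard struct module and stored in an OrderedDict keep the definition order when printed. This only illustrates the expected ordering; it is not how Container is implemented.

```python
import struct
from collections import OrderedDict

FIELDS = ("a", "b", "c")

# "<Bhf": unsigned byte, signed little-endian 16-bit int, little-endian 32-bit float.
values = struct.unpack("<Bhf", b"\x07\x00\x01\x00\x00\x00\x01")

parsed = OrderedDict(zip(FIELDS, values))
lines = ["%s = %r" % item for item in parsed.items()]
```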

Error from "from construct import *"

Hello,

I grabbed the latest zip from today, installed it, and am getting an error just from trying to import all of construct:

Python 2.6.6 (r266:84292, Dec 27 2010, 00:02:40)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> from construct import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'GreedyRepeater'

Any help would be appreciated.

Aligned does not preserve the correct context of a nested subcon when calculating the size of the Padding

When wrapping a nested construct such as a Struct within an Aligned, the context passed to the padlength function that calculates the size of the Padding (and subsequently subcon._sizeof and all constructs contained within it) is the parent context, not the context of the construct. This results in most meta constructs failing since they are being passed the wrong context.

The following example, which tries to parse a word-aligned struct that consists of a variable-length field bar, prefixed by its length foo, makes the issue evident:

def barLength(ctx):
  print("Context of bar is", ctx)
  return ctx.foo

Test = Struct("test1",
  Aligned(
    Struct("test2",
      ULInt8("foo"),
      Field("bar", barLength)
    ),
    modulus=4
  )
)

print(Test.parse(b"\x02\xAB\xCD\x00"))

Output:

Context of bar is Container:
    foo = 2

Context of bar is Container:
    test2 = Container:
        foo = 2
        bar = b'\xab\xcd'

Traceback (most recent call last):
  File "C:\tools\python\lib\site-packages\construct\lib\container.py", line 33, in __getattr__
    return self[name]
KeyError: 'foo'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "test.py", line 26, in <module>
    Test.parse(b"\x02\xAB\xCD\x00")
  File "C:\tools\python\lib\site-packages\construct\core.py", line 188, in parse
    return self.parse_stream(BytesIO(data))
  File "C:\tools\python\lib\site-packages\construct\core.py", line 198, in parse_stream
    return self._parse(stream, Container())
  File "C:\tools\python\lib\site-packages\construct\core.py", line 670, in _parse
    subobj = sc._parse(stream, context)
  File "C:\tools\python\lib\site-packages\construct\core.py", line 288, in _parse
    return self._decode(self.subcon._parse(stream, context), context)
  File "C:\tools\python\lib\site-packages\construct\core.py", line 733, in _parse
    subobj = sc._parse(stream, context)
  File "C:\tools\python\lib\site-packages\construct\core.py", line 288, in _parse
    return self._decode(self.subcon._parse(stream, context), context)
  File "C:\tools\python\lib\site-packages\construct\core.py", line 398, in _parse
    return _read_stream(stream, self.lengthfunc(context))
  File "C:\tools\python\lib\site-packages\construct\macros.py", line 379, in padlength
    return (modulus - (subcon._sizeof(ctx) % modulus)) % modulus
  File "C:\tools\python\lib\site-packages\construct\core.py", line 695, in _sizeof
    return sum(sc._sizeof(context) for sc in self.subcons)
  File "C:\tools\python\lib\site-packages\construct\core.py", line 695, in <genexpr>
    return sum(sc._sizeof(context) for sc in self.subcons)
  File "C:\tools\python\lib\site-packages\construct\core.py", line 402, in _sizeof
    return self.lengthfunc(context)
  File "test.py", line 14, in barLength
    return ctx.foo
  File "C:\tools\python\lib\site-packages\construct\lib\container.py", line 35, in __getattr__
    raise AttributeError(name)
AttributeError: foo

As you can see, on the second call to barLength it is being passed the context of test1 instead of the context of test2.

Need insight: are fields required to have unique names?

We are working on issue #56 and I have a feeling that our code fails because each PascalString uses a subcon of the same name, "length". Are names required to be unique within an entire structure? I need someone with knowledge of internals. Ping @tomerfiliba

Struct.sizeof raises SizeofError with a dynamic Struct

I'm using construct version 2.5.0

From the documentation:

foo = Struct("foo",
...     Byte("length"),
...     Array(lambda ctx: ctx.length, UBInt16("data")),
... )

This raises SizeofError:

x=foo.parse('\x01\x02\x00')
foo.sizeof(x)

def bla():
print x

SizeofError: 'Container' object has no attribute 'length'

I was able to workaround this for a simple toplevel structure using the voodoo "advanced usage" nested flag, but this breaks when you really nest a structure.

foo = Struct("foo",
    Byte("length"),
    Array(lambda ctx: ctx.length, UBInt16("data")),
    Struct("bar",
        Byte("length"),
        Array(lambda ctx: ctx.length, UBInt16("data")),
    )
)

The problem is in the following code:
core.py:683: (Version 2.5.0)

    def _sizeof(self, context):
        if self.nested:
            context = Container(_ = context)  # <- This is completely wrong
        return sum(sc._sizeof(context) for sc in self.subcons)

As the container with a true nested structure actually looks like this:
Container:
    bar = Container:
        length = 1
        data = [
            768
        ]
    length = 1
    data = [
        512
    ]

That is the correct code to do this:

def _sizeof(self, context):
    if self.nested:
        context = Container[self.name]
    return sum(sc._sizeof(context) for sc in self.subcons)

And now, with nested=False on the toplevel struct, parsing works for nested structures too.

The problem remains that nested = True for a toplevel Struct.

I can think of some ways to solve this problem:

One is to default nested to False and have the Struct class set it to True when it receives Struct classes in its constructor. Plus: since Sequence inherits from Struct, this will work with its nested field too. Minus: this will hurt performance when creating a Struct.
The other option involves catching the KeyError and trying to use the same context as was given, which will break quite badly when the struct has a member with the same name as the struct.
The last option is to catch the KeyError and re-raise it with an indicative message that tells the user to set nested=False for this case.

I think this exists even in rather old versions of the library, I guess no one uses Struct.sizeof() :P

P.S: The reraise there loses the original traceback (I think there is a better way to reraise in python)
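
On the P.S.: in Python 3, exception chaining keeps the original error and its traceback attached to the new one. A minimal sketch, with SizeofError defined locally as a stand-in for the library's exception and sizeof_length as a hypothetical function:

```python
class SizeofError(Exception):
    """Stand-in for the library's exception, defined here so the sketch runs."""


def sizeof_length(context):
    try:
        return context["length"]
    except KeyError as exc:
        # "from exc" stores the KeyError as __cause__, so both tracebacks are shown.
        raise SizeofError("context has no key %s" % exc) from exc
```

The "raise ... from ..." form is Python 3 syntax only.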

SyntaxErrors with Python 3

$ python3.2 -m compileall -q construct
*** Error compiling 'construct/formats/filesystem/fat16.py'...
  File "construct/formats/filesystem/fat16.py", line 168
    print "failed to read %s bytes at %s" % (
                                        ^
SyntaxError: invalid syntax

*** Error compiling 'construct/formats/graphics/gif.py'...
  File "construct/formats/graphics/gif.py", line 150
    print gif_file.parse(s)
                 ^
SyntaxError: invalid syntax

PascalString and max length

Hello,

I'm reading a binary file.
This file seems to use a sort of PascalString

See this hex:

09 53 65 62 61 73 74 69 65 6E 00 00 00 00 00 00 00 06 43 65 6C 6C 65 73 00 00 00 00 00 00 00 00 00 00 06 46 72 61 6E 63 65 00 00 00 00 00 00 00 00 00 00 06 46 2D 43 47 4E 4A 00 03 46 31 39

I should get

Sebastien (FirstName) -> 53 65 62 61 73 74 69 65 6E
Celles (ForeName) -> 43 65 6C 6C 65 73
France (Country) -> 46 72 61 6E 63 65
F-CGNJ (RN) -> 46 2D 43 47 4E 4A
F19 (CN) -> 46 31 39

09 is len("Sebastien")
06 is len("Celles")
06 is len("France")
06 is len("F-CGNJ")
03 is len("F19")

Unfortunately if I'm using a PascalString I can't define a max length for my strings.

In my case I'd like to be able to define

PascalString("FirstName", length_field=UBInt8('length'), max_length=16)

because

09 53 65 62 61 73 74 69 65 6E 00 00 00 00 00 00 00

is 17 bytes long.

It would be nice to have a max_length parameter added.

Kind regards
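
For illustration, this is what reading such a field looks like in plain Python: a one-byte length followed by a fixed slot of max_length bytes, of which only the first length bytes are the string. read_padded_pascal is a hypothetical helper, not a Construct macro, and ASCII is assumed for the text. The sample bytes are the first two fields from the hex dump above.

```python
import io


def read_padded_pascal(stream, max_length):
    """Read a length byte, then a fixed-size slot; return the used part as text."""
    length = stream.read(1)[0]
    if length > max_length:
        raise ValueError("length %d exceeds slot size %d" % (length, max_length))
    slot = stream.read(max_length)
    return slot[:length].decode("ascii")


raw = bytes.fromhex(
    "09 53 65 62 61 73 74 69 65 6E 00 00 00 00 00 00 00 "
    "06 43 65 6C 6C 65 73 00 00 00 00 00 00 00 00 00 00"
)
stream = io.BytesIO(raw)
first_name = read_padded_pascal(stream, 16)
fore_name = read_padded_pascal(stream, 16)
```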

How to declare a bit padding field with '1' pattern?

def TestPadding(length):
   return PaddingAdapter(Field(None, length), pattern=chr(1), strict=False)

TestPadding(5) in a BitStruct context generates '0x01', not '0x1f'


def TestPadding(length):
   return PaddingAdapter(Field(None, length), pattern=chr(0xff), strict=False)

leads to a coding error:

  File "construct/lib/binary.py", line 166, in decode_bin
    chars[j] = _bin_to_char[data[i:i+8]]
KeyError: '\xff\xff\xff\xff\xff\xff\x00\x00'

I can't figure out the right syntax. Thanks.

varint support?

https://developers.google.com/protocol-buffers/docs/encoding

The latest version of the Minecraft protocol uses varints as of version 13w41b. They appear straightforward in the documentation, so it doesn't look too difficult to implement.
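
For reference, the base-128 varint scheme described on the linked page fits in a few lines of plain Python. This sketch handles non-negative integers only and is not Construct code; encode_varint and decode_varint are illustrative names.

```python
def encode_varint(value):
    """Encode a non-negative int, 7 bits per byte, least significant group first."""
    if value < 0:
        raise ValueError("only non-negative values are handled in this sketch")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def decode_varint(data):
    """Return (value, number_of_bytes_consumed)."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result, index + 1
    raise ValueError("truncated varint")


encoded = encode_varint(300)
decoded = decode_varint(encoded)
```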

Please release 2.5.3

Last version on pypi is over 2 years old.

Please consider updating pypi.

Auto calculated size

Is there any way to automatically calculate the whole struct size?

For example, I have a struct like this:

int8 type; // header 
intX size; // header
int8[] data;

"type" can be anything from 0 to 2, in which

0 makes size be an int8
1 makes size be an int16
2 makes size be an int32

"size" will be calculated based on header length + data length

I can parse this without problems, using a Switch to choose between the int8/16/32 types of "size" and an Anchor to retrieve the current header length. data will be Bytes(), using the context to calculate header.size - header.length (not sure if this is the right way to do it).

The problem comes when I try to build an object using this method.

Is there any way to make size be calculated automatically from the size of the whole struct?

The current code I came up with to parse things:

Header = Struct("header",
    UBInt8("type"),
    Switch("size", lambda ctx: ctx.type,
    {
        0: UBInt8("size"),
        1: UBInt16("size"),
        2: UBInt32("size")
    }),
    Anchor("length")
)

HeaderData = Struct("header",
    Embed(Header),
    Bytes("data", lambda ctx: ctx.size - ctx.length)
)

Building HeaderData will result in an invalid buffer(header)
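
To make the desired build behaviour concrete, here it is written by hand with the struct module: size is derived from the header length plus the data length, and its width follows type. build_packet is a hypothetical function; big-endian unsigned fields are used to match the UBInt macros above.

```python
import struct

# type -> struct format of the size field (big-endian unsigned 8/16/32 bit)
SIZE_FORMATS = {0: ">B", 1: ">H", 2: ">I"}


def build_packet(type_, data):
    fmt = SIZE_FORMATS[type_]
    header_length = 1 + struct.calcsize(fmt)  # type byte + size field
    size = header_length + len(data)          # size covers the whole packet
    return struct.pack(">B", type_) + struct.pack(fmt, size) + data


small = build_packet(0, b"abc")
medium = build_packet(1, b"abc")
```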

Architectural changes

I want to make some changes on the API level in the foreseeable future, so if you have something against it, speak your mind openly:

List is not up to date but close.

Of lesser importance:

Remove six completely
py3compat overhaul
Move prettystr into __str__
Unlimited lines, PEP8 is good but word wrap is better
Arguments being renamed, so length_field becomes lengthfield

Ping @tomerfiliba

KeyError printing a Container with a FlagsEnum field.

Calling print function over a Container with a FlagsEnum field results in a KeyError.

I could actually check that the parsing of the Container is correct.

The problem is in "print(MyContainer)" that in turn calls "Container.pretty_str()" and in turn calls "FlagsContainer.pretty_str()".

The exception is always at line 125 of container.py: "v = self.__dict__[k]". And it occurs because __dict__ = {} and, obviously, trying to access any key "k" will result in a KeyError.

I may be doing something wrong, but I have not touched my code and it used to work without errors with the previous version, "Construct v2.5.0".

It is not a serious error, but it is annoying.

And, as always...
Continue the great work!! Construct rules!! :)

Thanks.
