dalemyers / deserialize

A Python deserialization tool

License: MIT License

deserialize's Introduction

deserialize

A library to make deserialization easy. To get started, just run pip install deserialize

How it used to be

Without the library, if you wanted to convert:

{
    "a": 1,
    "b": 2
}

into a dedicated class, you had to do something like this:

class MyThing:

    def __init__(self, a, b):
        self.a = a
        self.b = b

    @staticmethod
    def from_json(json_data):
        a_value = json_data.get("a")
        b_value = json_data.get("b")

        if a_value is None:
            raise Exception("'a' was None")
        elif b_value is None:
            raise Exception("'b' was None")
        elif type(a_value) != int:
            raise Exception("'a' was not an int")
        elif type(b_value) != int:
            raise Exception("'b' was not an int")

        return MyThing(a_value, b_value)

my_instance = MyThing.from_json(json_data)

How it is now

With deserialize all you need to do is this:

import deserialize

class MyThing:
    a: int
    b: int

my_instance = deserialize.deserialize(MyThing, json_data)

That's it. It will pull out all the data and set it for you, type checking each value and even checking for null values.

If you want null values to be allowed though, that's easy too:

from typing import Optional

class MyThing:
    a: Optional[int]
    b: Optional[int]

Now None is a valid value for these.
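To make the idea concrete, here is a minimal standard-library sketch of annotation-driven checking. It is not the library's implementation: sketch_deserialize is a hypothetical helper, and it only handles plain types and Optional/Union annotations.

```python
from typing import Optional, Union, get_args, get_origin, get_type_hints


def sketch_deserialize(cls, data):
    """Build an instance of cls from a dict, checking each annotated field."""
    instance = cls.__new__(cls)
    for name, expected in get_type_hints(cls).items():
        # A missing key is treated like an explicit None in this sketch.
        value = data.get(name)
        if get_origin(expected) is Union:
            # Optional[X] is Union[X, None], so NoneType is one of the members.
            allowed = get_args(expected)
        else:
            allowed = (expected,)
        if type(value) not in allowed:
            raise TypeError(
                f"{cls.__name__}.{name}: expected {expected}, got {value!r}"
            )
        setattr(instance, name, value)
    return instance


class MyThing:
    a: int
    b: Optional[int]


thing = sketch_deserialize(MyThing, {"a": 1, "b": None})
print(thing.a, thing.b)
```

Because b is declared Optional, None passes the check for b, while a None for a would raise a TypeError.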

Types can be nested as deep as you like. For example, this is perfectly valid:

from typing import List

class Actor:
    name: str
    age: int

class Episode:
    title: str
    identifier: str
    actors: List[Actor]

class Season:
    episodes: List[Episode]
    completed: bool

class TVShow:
    seasons: List[Season]
    creator: str
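For illustration, data shaped to match these classes might look like the following. All values are made up; the point is only that each nested dictionary lines up with one class, so something like deserialize.deserialize(TVShow, tv_show_data) would have a matching structure to walk.

```python
# Hypothetical sample data shaped to match the TVShow classes above.
tv_show_data = {
    "creator": "Jane Doe",
    "seasons": [
        {
            "completed": True,
            "episodes": [
                {
                    "title": "Pilot",
                    "identifier": "S01E01",
                    "actors": [
                        {"name": "Alex Example", "age": 34},
                    ],
                },
            ],
        },
    ],
}

# Nesting mirrors the class hierarchy: TVShow -> Season -> Episode -> Actor.
first_actor = tv_show_data["seasons"][0]["episodes"][0]["actors"][0]
print(first_actor["name"])
```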

Advanced Usage

Custom Keys

It may be that you want to name the properties in your object something different from what is in the data. This can be for readability reasons, or because you have to (such as if your data item is named __class__). This can be handled too. Use the key decorator as follows:

@deserialize.key("identifier", "id")
class MyClass:
    value: int
    identifier: str

This will now assign the data with the key id to the field identifier. You can apply multiple key decorators to override multiple keys.

Auto Snake

Data will often come in with the keys either camelCased or PascalCased. Since Python uses snake_case as standard for members, this means that custom keys are often used to do the conversion. To make this easier, you can add the auto_snake decorator and it will do this conversion for you where it can.

@deserialize.auto_snake()
class MyClass:
    some_integer: int
    some_string: str

Now you can pass this data and it will automatically parse:

{
    "SomeInteger": 3,
    "SomeString": "Hello"
}

Note that all fields need to be snake cased if you use this decorator.
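As a rough picture of the kind of conversion involved, the sketch below turns camelCase and PascalCase keys into snake_case with a regular expression. to_snake_case is a hypothetical helper, not the library's own routine, and its rules (for example, runs of capitals such as acronyms are not split) may differ from what auto_snake actually does.

```python
import re


def to_snake_case(name: str) -> str:
    """Convert a camelCase or PascalCase key to snake_case."""
    # Put an underscore between a lowercase letter or digit and the capital after it.
    with_breaks = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return with_breaks.lower()


for key in ("SomeInteger", "someString", "id"):
    print(key, "->", to_snake_case(key))
```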

Unhandled Fields

Usually, if you don't specify the field in your definition, but it does exist in the data, you just want to ignore it. Sometimes however, you want to know if there is extra data. In this case, when calling deserialize(...) you can set throw_on_unhandled=True and it will raise an exception if any fields in the data are unhandled.

Additionally, sometimes you want this behaviour, but know of a particular field that can be ignored. You can mark such fields as allowed to be unhandled with the decorator @deserialize.allow_unhandled("key_name").
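Conceptually, an unhandled field is a key that is present in the data but neither declared on the class nor explicitly allowed. The sketch below expresses that as a set difference; find_unhandled is a hypothetical helper using only the standard library, not part of deserialize.

```python
from typing import get_type_hints


class MyClass:
    value: int
    identifier: str


def find_unhandled(cls, data, allowed=()):
    """Return keys in data that cls neither declares nor explicitly allows."""
    declared = set(get_type_hints(cls))
    return sorted(set(data) - declared - set(allowed))


payload = {"value": 1, "identifier": "abc", "extra": True, "debug": "x"}
print(find_unhandled(MyClass, payload))
print(find_unhandled(MyClass, payload, allowed=("debug",)))
```

A strict mode would raise when the returned list is non-empty; a lenient mode would simply skip those keys.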

Ignored Keys

You may want some properties in your object that aren't loaded from disk, but instead created some other way. To do this, use the ignore decorator. Here's an example:

@deserialize.ignore("identifier")
class MyClass:
    value: int
    identifier: str

When deserializing, the library will now ignore the identifier property.

Parsers

Sometimes you'll want something in your object in a format that the data isn't in. For example, if you get the data:

{
    "successful": True,
    "timestamp": 1543770752
}

You may want that to be represented as:

class Result:
    successful: bool
    timestamp: datetime.datetime

By default, this deserialization will fail because the value in the data is an integer, not a datetime.datetime. To correct this, use the parser decorator to specify a function that parses the data. For example:

import datetime

import deserialize

@deserialize.parser("timestamp", datetime.datetime.fromtimestamp)
class Result:
    successful: bool
    timestamp: datetime.datetime

When the library handles the data for the key timestamp, it will now run the value through the supplied parser function before assigning it to your new class instance.

The parser is run before type checking is done. This means that if you had something like Optional[datetime.datetime], you should ensure your parser can handle the value being None. Your parser will obviously need to return the type that you have declared on the property in order to work.
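A parser for an Optional[datetime.datetime] field could therefore look like the sketch below. parse_timestamp is a hypothetical name, and passing tz=datetime.timezone.utc is a choice made here so the result does not depend on the machine's local time zone; the example above uses the naive local-time form instead.

```python
import datetime
from typing import Optional


def parse_timestamp(value: Optional[float]) -> Optional[datetime.datetime]:
    """Convert a Unix timestamp to an aware UTC datetime, passing None through."""
    if value is None:
        # The parser runs before type checking, so it must tolerate None itself.
        return None
    return datetime.datetime.fromtimestamp(value, tz=datetime.timezone.utc)


print(parse_timestamp(1543770752))
print(parse_timestamp(None))
```

Such a function could then be supplied as the second argument to the parser decorator in place of datetime.datetime.fromtimestamp.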

Subclassing

Subclassing is supported. If you have a type Shape, for example, which has a subclass Rectangle, any properties on Shape are supported if you try to decode some data into a Rectangle object.

Raw Storage

It can sometimes be useful to keep a reference to the raw data that was used to construct an object. To do this, set the raw_storage_mode parameter to RawStorageMode.ROOT or RawStorageMode.ALL. This will store the data in an attribute named __deserialize_raw__ on the root object, or on all objects in the tree, respectively.

Defaults

Some data will come to you with fields missing. In these cases, a default is often known. To do this, simply decorate your class like this:

@deserialize.default("value", 0)
class IntResult:
    successful: bool
    value: int

If you pass in data like {"successful": True}, this will deserialize with a default value of 0 for value. Note that {"successful": True, "value": None} would not deserialize, since the key is present and value is not optional.
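The distinction between a missing key and a key that is present but None can be sketched with a sentinel object. resolve_int and _MISSING are hypothetical names used only for this illustration; this is not how the library is necessarily written.

```python
# Sentinel that distinguishes "key absent" from "key present with value None".
_MISSING = object()


def resolve_int(data, key, default=_MISSING):
    """Return data[key] as an int, falling back to default only when the key is absent."""
    value = data.get(key, default)
    if value is _MISSING:
        raise KeyError(f"missing value for {key!r}")
    if type(value) is not int:
        # An explicit None lands here: the default does not replace it.
        raise TypeError(f"{key!r} must be an int, got {value!r}")
    return value


print(resolve_int({"successful": True}, "value", default=0))
```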

Post-processing

Not everything can be set on your data straight away. Some things need to be figured out afterwards. For this you need to do some post-processing. The easiest way to do this is through the @constructed decorator. This decorator takes a function which will be called whenever a new instance is constructed with that instance as an argument. Here's an example which converts polar coordinates from using degrees to radians:

import math

import deserialize

data = {
    "angle": 180.0,
    "magnitude": 42.0
}

def convert_to_radians(instance):
    instance.angle = instance.angle * math.pi / 180

@deserialize.constructed(convert_to_radians)
class PolarCoordinate:
    angle: float
    magnitude: float

pc = deserialize.deserialize(PolarCoordinate, data)

print(pc.angle, pc.magnitude)

>>> 3.141592653589793 42.0

Downcasting

Data often carries its type as a field within the data itself. This can be difficult to parse. For example:

data = [
    {
        "data_type": "foo",
        "foo_prop": "Hello World",
    },
    {
        "data_type": "bar",
        "bar_prop": "Goodbye World",
    }
]

Since the fields differ between the two, there's no good way of parsing this data. You could use optional fields on some base class, try multiple deserializations until you find the right one, or do the deserialization based on a mapping you build of the data_type field. None of those solutions are elegant though, and all have issues if the types are nested. Instead, you can use the downcast_field and downcast_identifier decorators.

downcast_field is specified on a base class and gives the name of the field that contains the type information. downcast_identifier takes in a base class and an identifier (which should be one of the possible values of the downcast_field from the base class). Internally, when a class with a downcast field is detected, the field will be extracted, and a subclass with a matching identifier will be searched for. If no such class exists, an UndefinedDowncastException will be thrown.

Here's an example which would handle the above data:

from typing import List

import deserialize

@deserialize.downcast_field("data_type")
class MyBase:
    type_name: str


@deserialize.downcast_identifier(MyBase, "foo")
class Foo(MyBase):
    foo_prop: str


@deserialize.downcast_identifier(MyBase, "bar")
class Bar(MyBase):
    bar_prop: str


result = deserialize.deserialize(List[MyBase], data)

Here, result[0] will be an instance of Foo and result[1] will be an instance of Bar.

If you can't describe all of your types, you can use @deserialize.allow_downcast_fallback on your base class and any unknowns will be left as dictionaries.
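The lookup described in this section can be pictured as a registry that maps identifiers to subclasses. The sketch below uses only the standard library; register, pick_class, and the Sketch* classes are hypothetical names, and the library's internals may be organised differently.

```python
from typing import Dict

# Hypothetical registry: base class -> {identifier: subclass}.
_registry: Dict[type, Dict[str, type]] = {}


def register(base: type, identifier: str):
    """Record that identifier selects the decorated subclass of base."""
    def decorator(cls: type) -> type:
        _registry.setdefault(base, {})[identifier] = cls
        return cls
    return decorator


def pick_class(base: type, field: str, item: dict) -> type:
    """Read the type field from item and return the matching subclass."""
    identifier = item[field]
    try:
        return _registry[base][identifier]
    except KeyError:
        raise LookupError(
            f"no subclass of {base.__name__} registered for {identifier!r}"
        ) from None


class SketchBase:
    pass


@register(SketchBase, "foo")
class SketchFoo(SketchBase):
    pass


@register(SketchBase, "bar")
class SketchBar(SketchBase):
    pass


items = [{"data_type": "foo"}, {"data_type": "bar"}]
print([pick_class(SketchBase, "data_type", item).__name__ for item in items])
```

A fallback mode would catch the lookup failure and return the dictionary unchanged instead of raising.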

deserialize's Issues

Add ability to specify defaults for missing items

The name of the property should be reported in errors

Without the name, it can be difficult to figure out which type is incorrect if it fails to deserialize.

Special case deserializing enum errors

Errors like this aren't helpful:

deserialize.exceptions.DeserializeException: Cannot deserialize '<class 'str'>' to '<enum 'PBXProductType'>' for 'PBXNativeTarget.product_type'

It should say what the value is for enums so we know what has to be added.
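As an illustration of the kind of message being asked for, the sketch below reports both the offending value and the values the enum accepts. parse_enum is a hypothetical helper, and the two members shown are only an illustrative subset chosen for this example.

```python
import enum


class PBXProductType(enum.Enum):
    # Illustrative subset of members for this sketch.
    APPLICATION = "com.apple.product-type.application"
    FRAMEWORK = "com.apple.product-type.framework"


def parse_enum(enum_cls, value):
    """Look up value in enum_cls, naming the value and the valid options on failure."""
    try:
        return enum_cls(value)
    except ValueError:
        valid = ", ".join(repr(member.value) for member in enum_cls)
        raise ValueError(
            f"{value!r} is not a valid {enum_cls.__name__}; expected one of: {valid}"
        ) from None


print(parse_enum(PBXProductType, "com.apple.product-type.framework"))
```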

Unions between base types don't appear to work

Sample class definition:

class Thing:
    value: Union[int, float]

Sample data:

{
  "value": 4
}

Expected results:

The data deserializes correctly.

Actual results:

The type int doesn't match the type Union[int, float].

Add ability to flag on unhandled fields

Sometimes we want to ensure we've parsed everything and should flag on unhandled fields.

Add ability to store raw data

Attribute parser gets redefined by another subclass of common ancestor

When a decorator defines a parser for an attribute in a class then that definition is also applied to other subclasses of the same base class.

import decimal
from typing import Any

import attr
import deserialize


def _money_amount(value: Any):
    return decimal.Decimal(value).quantize(decimal.Decimal("0.01"), decimal.ROUND_HALF_UP) if value else None


@attr.s(auto_attribs=True)
@deserialize.parser("a", _money_amount)
class Base:
    a: decimal.Decimal


@attr.s(auto_attribs=True)
class Foo(Base):
    b: str


@attr.s(auto_attribs=True)
@deserialize.parser("b", _money_amount)
class Bar(Base):
    b: decimal.Decimal


def test_deserialize_base():
    deserialize.deserialize(Base, {"a": 1.23})


def test_deserialize_foo():
    deserialize.deserialize(Foo, {"a": 1.23, "b": "b"})


def test_deserialize_bar():
    deserialize.deserialize(Bar, {"a": 1.23, "b": 1.23})

test_deserlalize.py::test_deserialize_base PASSED
test_deserlalize.py::test_deserialize_bar PASSED

tests/test_deserlalize.py:32 (test_deserialize_foo)
def test_deserialize_foo():

  deserialize.deserialize(Foo, {"a": 1.23, "b": "b"})

test_deserlalize.py:34:

../../../opt/anaconda3/envs/fractal-python/lib/python3.9/site-packages/deserialize/__init__.py:93: in deserialize
return _deserialize(
../../../opt/anaconda3/envs/fractal-python/lib/python3.9/site-packages/deserialize/__init__.py:175: in _deserialize
_deserialize_dict(
../../../opt/anaconda3/envs/fractal-python/lib/python3.9/site-packages/deserialize/__init__.py:370: in _deserialize_dict
property_value = parser_function(value)

value = 'b'

def _money_amount(value: Any):

  return decimal.Decimal(value).quantize(decimal.Decimal("0.01"), decimal.ROUND_HALF_UP) if value else None

E decimal.InvalidOperation: [<class 'decimal.ConversionSyntax'>]

test_deserlalize.py:9: InvalidOperation
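One plausible way this kind of leakage can arise in Python is a decorator that mutates a table inherited from the base class instead of creating its own. This is an assumption offered for illustration, not a diagnosis of the library's code; leaky_parser and isolated_parser are hypothetical stand-ins.

```python
def leaky_parser(field, func):
    """Register a parser by mutating whatever table getattr finds, inherited or not."""
    def decorator(cls):
        table = getattr(cls, "_parsers", None)
        if table is None:
            table = {}
            cls._parsers = table
        table[field] = func  # may write into the base class's dict
        return cls
    return decorator


def isolated_parser(field, func):
    """Register a parser on a fresh copy of the table, leaving ancestors untouched."""
    def decorator(cls):
        cls._parsers = {**getattr(cls, "_parsers", {}), field: func}
        return cls
    return decorator


@leaky_parser("a", float)
class Base:
    a: float


class Foo(Base):
    b: str


@leaky_parser("b", float)
class Bar(Base):
    b: float


@isolated_parser("a", float)
class SafeBase:
    a: float


class SafeFoo(SafeBase):
    b: str


@isolated_parser("b", float)
class SafeBar(SafeBase):
    b: float


print(sorted(Foo._parsers), sorted(SafeFoo._parsers))
```

With the leaky version, Foo ends up with a parser for b that only Bar asked for; with the isolated version, SafeFoo sees only the parser declared on SafeBase.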

Inverse operations should be supported

We can convert a dictionary/list/whatever to objects, but we can't yet go back the other way. This could be extremely useful when doing things like working with APIs where requests need to be generated too.

null should be checked before the parser is run for optional types

Sample class definition:

@deserialize.parser("value", float)
class Thing:
    value: Optional[float]

Sample data:

{
  "value": null
}

Expected results:

We get an instance of Thing with value set to None.

Actual results:

We fail to deserialize since we try and run the parser before we assign None. Since the float() method doesn't work on None, it throws an exception.

Discussion:

I'm not actually 100% sure which way we should be doing this. If we parse first, it allows us to do things like set defaults, etc. when it is None. We can also easily fix the above by creating a simple wrapper around the float function to check for None first. I suspect that the existing implementation is the best option. This issue is for record keeping more than anything.
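The wrapper mentioned above could be as small as the following sketch. none_safe is a hypothetical helper name; it wraps any single-argument parser so that None is passed through untouched.

```python
def none_safe(parser):
    """Wrap parser so that a None input is returned as None instead of being parsed."""
    def wrapper(value):
        return None if value is None else parser(value)
    return wrapper


parse_optional_float = none_safe(float)

print(parse_optional_float("1.5"), parse_optional_float(None))
```

The wrapped function could then be given to the parser decorator in place of float for the Thing class above.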

Subclasses should be supported

JSON compatible attribute names (camelCased)

Hi there.
Is there any way there can be added another argument like case_function which can be used to convert camelCase field names from JSON string to snake_case which will be used in setattr()?

For example if we have class definition like this

class Test:
    field_name: int

and the JSON string looks like this '{"fieldName": 0}'
and deserialize(Test, json.loads('{"fieldName": 0}'))
the current implementation will not be able to deserialize: deserialize.exceptions.DeserializeException: Unexpected missing value for: Test.field_name

Thanks!

What is the difference from json.loads?

Hello,

I am very interested in your project, but I would like to understand the difference between deserialize and json.loads when using the object_hook parameter. https://docs.python.org/3.8/library/json.html#json.loads

Also, does the input to deserialize.deserialize(MyThing, json_data) have to be JSON, or can I use any string?

flat_path: str = "file://path_to_file"
myFile : File = deserialize.deserialize(File, flat_path)

I am looking for a tool to convert any string to a Python type (int, float, dict, str, uri, file, custom_format).

Thx

Enum support?

I haven't found a way to make it deserialize Enums. Am I missing something, or is it simply not supported?

Classes which already have an init method are not supported
