
kaiba's People

Contributors

chameleontartu, dependabot[bot], sietsevandermolen, thomasborgen

Forkers

udhayamlm

kaiba's Issues

Remove Returns dependency

Returns is awesome for functional-style programming; it's just that it still has breaking changes in every minor release and, I guess, is still not ready.

Removing returns should also make contributing easier.

Reversibility?

In addition to the normal config, is it possible to produce the reverse, to recreate the original input data?
This would probably have to be a separate tool that we can call with an existing config.

Rewrite Typing

We need to figure out how to actually do typing.

It's of course a bit hard, since the data we read can be any data type that's possible in JSON, but we should be able to do it better than it is done right now.
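As a starting point for the discussion, here is a minimal sketch of a recursive type alias covering every value JSON can hold, plus a runtime check. The names `JsonValue` and `is_json_value` are made up for illustration and are not part of kaiba.

```python
from typing import Dict, List, Union

# Recursive alias: a JSON value is a scalar, a list of JSON values,
# or an object with string keys and JSON values.
JsonValue = Union[
    None, bool, int, float, str, List["JsonValue"], Dict[str, "JsonValue"]
]


def is_json_value(value: object) -> bool:
    """Return True if value is built only from JSON-compatible types."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return True
    if isinstance(value, list):
        return all(is_json_value(item) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_json_value(item)
            for key, item in value.items()
        )
    return False
```

Whether a static checker accepts the recursive alias depends on the checker and its version, so this is only one possible direction.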

Figure out how to handle data in keys instead of values

Some formats have interesting data in key names instead of in values.

For example, JSON Schema.

{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}

Every child key in "properties" is a field's name. That means it is interesting data.

We could argue that if it's just a key that we would need to know the path to anyway, we don't need to be able to map it. Which is fair. But the properties object could have any number of children, and it definitely feels more like an array than an object.

So what do we do?

Maybe we can support iterating over an object's children?

We also need to consider the reverse. What if we wanted to create this JSON Schema data? How should we go about mapping input data into a key name?
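One way to picture both directions is to convert an object's children into a list of items that carry their key as a normal field, and back again. This is only a sketch in plain Python; the helper names and the `key_name` parameter are hypothetical and say nothing about how kaiba would expose this in a config.

```python
from typing import Any, Dict, List


def children_as_items(obj: Dict[str, Any], key_name: str = "key") -> List[Dict[str, Any]]:
    """Turn {"firstName": {...}} into [{"key": "firstName", ...}, ...]."""
    # Note: a child that already has a field called key_name would overwrite it.
    return [{key_name: key, **child} for key, child in obj.items()]


def items_as_children(items: List[Dict[str, Any]], key_name: str = "key") -> Dict[str, Any]:
    """Reverse direction: use one field of each item as the key name."""
    return {
        item[key_name]: {k: v for k, v in item.items() if k != key_name}
        for item in items
    }


properties = {
    "firstName": {"type": "string"},
    "age": {"type": "integer", "minimum": 0},
}
items = children_as_items(properties, key_name="name")
```

Once the children are a list, they can be iterated like any other array, and `items_as_children(items, key_name="name")` rebuilds the original object.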

Fix docs/configuration.md

We now use pydantic, and hopefully there is some Markdown generation library for pydantic models so that we don't have to keep updating the reference code.

If not, then let's use docstrings.

The file should also be renamed to "schema reference" or something like that.

Do I always have to have attrs to run kaiba?

I tried using kaiba on a Mac in a blank project with no dependencies.

I received this error:

ImportError: cannot import name 'dataclass' from 'attr' (/Users/dc/Library/Caches/pypoetry/virtualenvs/whatever-project-py3.10/lib/python3.10/site-packages/attr.py)

It worked fine after installing attrs - I initially installed attr, not attrs.

Do I always need to install this dependency? Can it be bundled with kaiba?

Add greater/less than conditions in if statements

Need to decide what to do with strings and lists...

"test" > "bob" is True in Python, since strings are compared character by character and ord("t") > ord("b"). It doesn't really give any value, I think. Most likely, checking length for anything that's not a number is best.
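The length-based idea could look roughly like the sketch below: numbers compare by value, and anything else (strings, lists) compares by its length. The function names are hypothetical, and treating a mixed pair such as a string and a number this way is an assumption rather than a decided behavior.

```python
from typing import Any, Union


def size_of(value: Any) -> Union[int, float]:
    """Numbers stand for themselves; everything else is measured by length."""
    if isinstance(value, (int, float)) and not isinstance(value, bool):
        return value
    # Raises TypeError for values without a length, for example None or True.
    return len(value)


def greater_than(left: Any, right: Any) -> bool:
    """Compare two values using size_of on both sides."""
    return size_of(left) > size_of(right)
```

With this rule, `greater_than("test", "bob")` compares 4 with 3 instead of comparing characters.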

#definitions vs #/definitions in schema.json

I wonder why both of these work, but it should be fixed.

#definitions/iterable and #/definitions/attribute

"iterables": {
    "type": "array",
    "items": {
        "$ref": "#definitions/iterable"
    },
    "default": []
},
"branching_attributes": {
    "type": "array",
    "items": {
        "type": "array",
        "items": {
            "$ref": "#/definitions/attribute"
        }
    },
    "default": []
}
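For reference, a fragment that starts with `/` is read as a JSON Pointer, so `#/definitions/attribute` is the conventional spelling. Below is a minimal sketch of a strict resolver that accepts only that form; it is not the resolver kaiba's validator uses, it ignores array indexes and percent-encoding, and why the current tooling accepts both spellings is not established here.

```python
from typing import Any


def resolve_local_ref(schema: dict, ref: str) -> Any:
    """Resolve a same-document "$ref" written as a JSON Pointer fragment."""
    if not ref.startswith("#/"):
        raise ValueError(f"not a JSON Pointer fragment: {ref!r}")
    node: Any = schema
    for part in ref[2:].split("/"):
        # JSON Pointer escapes: "~1" means "/", "~0" means "~".
        part = part.replace("~1", "/").replace("~0", "~")
        node = node[part]
    return node


schema = {"definitions": {"attribute": {"type": "object"}}}
attribute = resolve_local_ref(schema, "#/definitions/attribute")
```

A strict resolver like this would reject `#definitions/iterable` with a `ValueError`, which is one way to catch the inconsistency early.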

Aggregation on keys

I would guess that, without a way to aggregate on a key or set of keys, one would have to fall back to code when dealing with duplicated entries or when only unique values should be handled.

What I believe aggregation entails is, for example, a CSV file like this:

customer_number;item;quantity
1;1;100
1;2;300
2;1;300

With the current state of kaiba one can only iterate over each line here and then create an object per line.

With the possibility to aggregate on customer_number, one would be able to create one customer object per unique customer_number. I still think it makes the most sense to treat each aggregation as a normal iterable, where one can create one item/quantity object per line. So something like this:

{
  "customers": [
    {
      "customer_id": 1,
      "items": [
        { "item_id": 1, "quantity": 100 },
        { "item_id": 2, "quantity": 300 }
      ]
    },
    {
      "customer_id": 2,
      "items": [
        { "item_id": 1, "quantity": 300 }
      ]
    }
  ]
}

Not exactly sure how the implementation would be done, but I'm quite sure it should be possible.
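To make the idea concrete, here is a sketch in plain Python of grouping the CSV lines above by customer_number and then treating each group as a normal iterable. The `aggregate` helper is hypothetical and only illustrates the grouping step, not how it would be expressed in a kaiba config.

```python
import csv
import io
from typing import Dict, List

CSV_TEXT = """customer_number;item;quantity
1;1;100
1;2;300
2;1;300
"""


def aggregate(rows: List[Dict[str, str]], key: str) -> Dict[str, List[Dict[str, str]]]:
    """Group rows by the value of one column, keeping first-seen order."""
    groups: Dict[str, List[Dict[str, str]]] = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


rows = list(csv.DictReader(io.StringIO(CSV_TEXT), delimiter=";"))
result = {
    "customers": [
        {
            "customer_id": int(customer_number),
            # Each group is iterated like any other list of lines.
            "items": [
                {"item_id": int(row["item"]), "quantity": int(row["quantity"])}
                for row in group
            ],
        }
        for customer_number, group in aggregate(rows, "customer_number").items()
    ]
}
```

Supporting several aggregation keys could be a matter of grouping on a tuple of column values instead of a single one.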

I also wonder if there are any reasons to have multiple aggregation_keys.

I'll look into this if there's a need, or whenever I get time.

Is it possible to iterate over data without a root?

I have a JSON file whose top level is a list. Is it possible to iterate over it and process it if there is no root key?

[
  {
    "name": "Sam",
    "age": 12
  },
  ...
  {
    "name": "Bill",
    "age": 45
  }
]

I cannot find this covered anywhere, so I'm asking in case I missed it. Maybe it is a gap in the docs? 😅

Add require_success to functions like casting and regex

require_success will make sure mapping stops if, for example, casting fails. This can happen when the input data is not of the expected type and type conversion is impossible. If statements can also fail this way when using in or contains.
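A rough sketch of the proposed behavior, assuming a failed step otherwise yields None and the mapping carries on. `apply_step` and `MappingError` are hypothetical names for illustration, not existing kaiba code.

```python
from typing import Any, Callable


class MappingError(Exception):
    """Raised when a step marked require_success fails."""


def apply_step(func: Callable[[Any], Any], value: Any, require_success: bool = False) -> Any:
    """Run one casting or regex-like step on a value."""
    try:
        return func(value)
    except (TypeError, ValueError) as error:
        if require_success:
            # Stop the whole mapping instead of silently continuing.
            raise MappingError(f"{func.__name__} failed for {value!r}") from error
        return None
```

For example, `apply_step(int, "abc")` returns None, while the same call with `require_success=True` raises `MappingError`.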

Can Kaiba be used as a converter from different formats such as AVRO and PROTOBUF to JSON and JSON Schema?

@thomasborgen, I believe this was not an initially intended usage of Kaiba. But is it possible to convert AVRO to JSON Schema or PROTOBUF to JSON schema?

Many tools are available in Python, Java, and other languages—for example, https://github.com/criccomini/twister.

All those tools are converting data to data. Is there a way to extend Kaiba to generate both data in JSON and JSON Schema?

Clarify the interface to kaiba.

Right now the interface to kaiba is in process.py. The functions are called process and process_raises.

In the future, we only want to bump the major version if we break anything that these interface functions return, or if we change their function signatures (parameters).

I propose to move most of our code into private modules and to expose only these two functions in kaiba/__init__.py.

What do you think @ChameleonTartu ?

Is it possible to use if-statement on objects, not values?

I have an object that I want to include or leave out based on some of its fields: Field A is the decision field and Field B is the value field.

Is it possible to do if-statements on objects?

input.json

{
  "companies": [
    {
      "name": "Silicon Valley",
      "country": "US"
    },
...
    {
      "name": "Tech Hub",
      "country": "Iceland"
    }
  ]
}

output.json

if country == Iceland:

{
  "companies": [
    {
      "name": "Tech Hub"
    }
  ]
}
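In plain Python, the requested behavior amounts to filtering the list on the decision field before mapping the value field. The sketch below assumes the second company's field is meant to be `country`, and `keep_if` is a hypothetical helper, not a kaiba feature.

```python
from typing import Any, Dict, List


def keep_if(items: List[Dict[str, Any]], field: str, expected: Any) -> List[Dict[str, Any]]:
    """Keep only the objects whose decision field equals the expected value."""
    return [item for item in items if item.get(field) == expected]


data = {
    "companies": [
        {"name": "Silicon Valley", "country": "US"},
        {"name": "Tech Hub", "country": "Iceland"},
    ]
}

output = {
    "companies": [
        {"name": company["name"]}
        for company in keep_if(data["companies"], "country", "Iceland")
    ]
}
```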

Missing method process_as_bytes

It would be cool to have a method process_as_bytes where the input is JSON as bytes.

import requests
from kaiba.process import process_as_bytes

config = {
...
}

resp = requests.get('https://example.com')
process_as_bytes(resp.content, config)

It removes the need for me, as a user, to parse the JSON myself before transforming it, and makes my code cleaner.
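Such a helper could be a thin wrapper around the existing entry point. In this sketch the `process` callable is passed in as a parameter because its exact signature is an assumption here; only `json.loads` accepting bytes is standard library behavior.

```python
import json
from typing import Any, Callable, Dict


def process_as_bytes(
    raw: bytes,
    config: Dict[str, Any],
    process: Callable[[Any, Dict[str, Any]], Any],
) -> Any:
    """Decode JSON bytes, then hand the parsed data to the given process function."""
    # json.loads accepts bytes encoded as UTF-8, UTF-16 or UTF-32.
    return process(json.loads(raw), config)


def fake_process(data: Any, config: Dict[str, Any]) -> Dict[str, Any]:
    """Stand-in for the real process function, used only for this example."""
    return {"seen": data, "config": config}


result = process_as_bytes(b'{"a": 1}', {"x": 1}, fake_process)
```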

Migrate to Pydantic for config models and validation

Pydantic seems to be quite a good option for:
- reading our config input from JSON into typed objects that should be easy to work with.
- validating that the JSON config input is in fact valid; it also seems good at creating error messages.

The current JSON Schema implementation does not handle errors as well as I hoped it would.

Pydantic is also used by FastAPI, which we use to serve kaiba, so we should get easy error handling and good messages.

Add keyword to iterators to reference an array root

When the root is an array, there is currently no way to reference it.

I suggest adding something like _root_ or $root that most likely will never be a key anyone is trying to find. To make it even more foolproof, the keyword will only ever be checked against the first value in a path of length 1.
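A sketch of how the lookup could treat the keyword, assuming paths are lists of keys and indexes. The `fetch` function and the choice of `$root` over `_root_` are illustrative only.

```python
from typing import Any, List, Union

ROOT_KEYWORD = "$root"


def fetch(data: Any, path: List[Union[str, int]]) -> Any:
    """Follow a path of keys and indexes; a lone "$root" returns the data itself."""
    # The keyword is only honored when it is the single element of the path.
    if len(path) == 1 and path[0] == ROOT_KEYWORD:
        return data
    node = data
    for step in path:
        node = node[step]
    return node


people = [{"name": "Sam", "age": 12}, {"name": "Bill", "age": 45}]
root = fetch(people, ["$root"])
```

Because the check only applies to paths of length 1, a real key named `$root` can still be reached in longer paths such as `["$root", "x"]`.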
