kaiba-tech / kaiba



Kaiba is a No-Code, Configurable JSON data transformation tool

Home Page: https://app.kaiba.tech

License: MIT License

Python 100.00%

json transformation mapping mapping-tools configurable kaiba data-science

kaiba's People


udhayamlm

kaiba's Issues

Remove Returns dependency

Returns is awesome for functional-style programming; it's just that it still breaks with every minor release, and I guess it's still not ready.

Removing returns should also make contributing easier.

Reversability?

In addition to the normal config, is it possible to produce its reverse to recreate the original input data?
This would probably have to be a separate tool that we can call with an existing config.
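To make the idea concrete, here is a minimal sketch using a hypothetical, heavily simplified config where each output key maps to a path in the input. This is not kaiba's real config format, and `forward`/`reverse` are illustrative names. It only shows that pure path lookups can be inverted by writing each output value back to its path.

```python
from typing import Any

# Hypothetical simplified config: output key -> path of keys into the input.
# NOT kaiba's real config format; it only illustrates reversibility.
SimpleConfig = dict[str, list[str]]


def fetch(data: Any, path: list[str]) -> Any:
    """Follow a list of keys into nested dicts."""
    for key in path:
        data = data[key]
    return data


def forward(data: dict, config: SimpleConfig) -> dict:
    """Apply the mapping: build each output key from its input path."""
    return {out_key: fetch(data, path) for out_key, path in config.items()}


def reverse(output: dict, config: SimpleConfig) -> dict:
    """Rebuild the input by writing each output value back to its path."""
    original: dict = {}
    for out_key, path in config.items():
        node = original
        for key in path[:-1]:
            node = node.setdefault(key, {})
        node[path[-1]] = output[out_key]
    return original
```

Steps that lose information (casting, defaults, regex extraction, if-statements) would have no inverse in this scheme, which is probably why a separate tool would only be able to reverse a subset of configs.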

Rewrite Typing

We need to figure out how to actually do typing.

It's of course a bit hard, since the data we read can be any datatype that's possible in JSON, but we should be able to do better than we do right now...
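One possible starting point, sketched here as an assumption rather than the planned design, is a recursive type alias that covers every value `json.loads` can return, plus a runtime check for it. The names `JSON`, `parse` and `is_json` are hypothetical.

```python
import json
from typing import Union

# Recursive alias: every value json.loads can produce fits this type.
JSON = Union[dict[str, "JSON"], list["JSON"], str, int, float, bool, None]


def parse(text: str) -> JSON:
    """Parse JSON text; the alias documents what can come back."""
    return json.loads(text)


def is_json(value: object) -> bool:
    """Runtime check that a value fits the JSON alias above."""
    if value is None or isinstance(value, (str, int, float, bool)):
        return True
    if isinstance(value, list):
        return all(is_json(item) for item in value)
    if isinstance(value, dict):
        return all(isinstance(k, str) and is_json(v) for k, v in value.items())
    return False
```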

Add more metadata to pydantic models

Like examples and better descriptions.

Add required keyword to config objects and attributes

The thought is that we can make the mapping fail somehow if we are unable to assign data to an attribute or if an object is empty.
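A minimal sketch of how a `required` flag could behave, assuming a simplified, hypothetical `Attribute` model (kaiba's real one has more fields) and a hypothetical `RequiredValueMissing` error: a required attribute without a value, or a required object that ends up empty, stops the mapping.

```python
from dataclasses import dataclass
from typing import Any, Optional


class RequiredValueMissing(Exception):
    """Raised when a required attribute or object ends up without a value."""


@dataclass
class Attribute:
    # Hypothetical simplified model, not kaiba's real Attribute.
    name: str
    path: list
    required: bool = False


def fetch(data: Any, path: list) -> Optional[Any]:
    """Walk a path of keys/indexes; return None if any step is missing."""
    for step in path:
        try:
            data = data[step]
        except (KeyError, IndexError, TypeError):
            return None
    return data


def map_object(data: Any, attributes: list[Attribute], required: bool = False) -> dict:
    result = {}
    for attribute in attributes:
        value = fetch(data, attribute.path)
        if value is None:
            if attribute.required:
                raise RequiredValueMissing(f"attribute {attribute.name!r} got no value")
            continue  # optional attribute: simply leave it out
        result[attribute.name] = value
    if required and not result:
        raise RequiredValueMissing("object is empty")
    return result
```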

Figure out how to handle data in keys instead of values

Some formats have interesting data in key names instead of in values.

For example JSONSchema.

{
  "$id": "https://example.com/person.schema.json",
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}

Every key in "properties" is a field's name. That means the key itself is interesting data.

We could argue that if it's just a key whose path we would need to know anyway, we don't need to be able to map it, which is fair. But the properties object could have any number of children, and it definitely feels more like an array than an object.

So what do we do?

Maybe we can support iterating over an object's children?

We also need to consider the reverse. What if we wanted to create this JSONSchema data? How should we go about mapping input data into a key name?
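Both directions can be sketched in plain Python, assuming (hypothetically) that iterating over an object's children yields items with `key` and `value` fields, and that the reverse picks one field of each input item to become the key name. The function names are illustrative only.

```python
from typing import Any


def iterate_children(obj: dict) -> list[dict[str, Any]]:
    """Treat an object's children like an array of {"key", "value"} items."""
    return [{"key": key, "value": value} for key, value in obj.items()]


def build_from_children(items: list[dict[str, Any]], key_field: str) -> dict:
    """The reverse: use one field of each item as the key name."""
    return {
        item[key_field]: {k: v for k, v in item.items() if k != key_field}
        for item in items
    }
```

With the JSONSchema above, `iterate_children(schema["properties"])` would give one item per field, so an iterator could treat `properties` like an array.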

Rebrand piri to kaiba

Fix docs/configuration.md

We now use pydantic, and hopefully there is some markdown generation library for pydantic models so that we don't have to keep updating the reference code.

If not, let's use docstrings.

The file should also be renamed to schema reference or something like that.

Do I always have to have attrs to run kaiba?

I tried using kaiba on a Mac with a blank project and no dependencies.

I received this error:

ImportError: cannot import name 'dataclass' from 'attr' (/Users/dc/Library/Caches/pypoetry/virtualenvs/whatever-project-py3.10/lib/python3.10/site-packages/attr.py)

It worked fine after installing attrs - I initially installed attr, not attrs.

Do I always need to install this dependency? Can it be bundled with kaiba?

Rename Iterable -> Iterator

I think Iterator is easier to understand.

Should we bring adapters into this package

I'm wondering if csv-json and xml-json adapters should be in this package and part of the process...

This would then also include json-csv and json-xml post adapters.
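For the CSV side, a pre adapter and a post adapter could be quite small with the standard library. This is a sketch, not an existing kaiba API: `csv_to_json` and `json_to_csv` are hypothetical names, the `;` delimiter is assumed from the CSV example in the aggregation issue, and `json_to_csv` assumes a non-empty list of records that all share the first record's keys. XML is left out because there is no single canonical XML-to-JSON mapping.

```python
import csv
import io
import json


def csv_to_json(text: str, delimiter: str = ";") -> str:
    """Pre adapter: CSV text with a header row -> JSON array of objects."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    # Every value stays a string; casting would be left to the kaiba config.
    return json.dumps(list(reader))


def json_to_csv(records: list[dict], delimiter: str = ";") -> str:
    """Post adapter: list of flat objects -> CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0]), delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()
```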

Add greater/less than conditions in if statements

Need to decide what to do with strings and lists...

"test" > "bob" is actually True, since strings compare character by character and ord("t") > ord("b"). That kind of lexicographic comparison doesn't really give any value, I think. Most likely, checking the length of anything that's not a number is best.
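A minimal sketch of the "numbers by value, everything else by length" rule. The condition names (`gt`, `gte`, `lt`, `lte`) and the `compare` helper are hypothetical, not kaiba's current if-statement operators. Booleans are rejected explicitly because Python treats them as integers.

```python
import operator
from typing import Any

# Hypothetical condition names; kaiba's real operators may differ.
COMPARISONS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def comparable(value: Any) -> float:
    """Numbers compare by value; strings, lists and objects by length."""
    if isinstance(value, bool):
        raise TypeError("booleans are not comparable here")
    if isinstance(value, (int, float)):
        return value
    if isinstance(value, (str, list, dict)):
        return len(value)
    raise TypeError(f"cannot compare {type(value).__name__}")


def compare(value: Any, condition: str, target: Any) -> bool:
    return COMPARISONS[condition](comparable(value), comparable(target))
```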

Github actions stopped working

GitHub Actions stopped working after the main branch was renamed from master to main.

#definitions vs #/definitions in schema.json

I wonder why both of these work, but it should be fixed.

#definitions/iterable and #/definitions/attribute

"iterables": {
    "type": "array",
    "items": {
        "$ref": "#definitions/iterable"
    },
    "default": []
},
"branching_attributes": {
    "type": "array",
    "items": {
        "type": "array",
        "items": {
            "$ref": "#/definitions/attribute"
        }
    },
    "default": []
}
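A JSON Pointer fragment (RFC 6901) must start with "/", so "#/definitions/iterable" is the correct form; "#definitions/iterable" probably only resolves because the validator is lenient. A one-off fix could walk the schema and normalize the refs. This sketch assumes the schema uses no plain-name anchors such as "#foo", which it would otherwise rewrite by mistake; `normalize_refs` is a hypothetical helper.

```python
from typing import Any


def normalize_refs(schema: Any) -> Any:
    """Return a copy where '#definitions/...' refs become '#/definitions/...'."""
    if isinstance(schema, dict):
        fixed = {key: normalize_refs(value) for key, value in schema.items()}
        ref = fixed.get("$ref")
        # Only touch fragment refs that are missing the leading "/".
        if isinstance(ref, str) and len(ref) > 1 and ref.startswith("#") and not ref.startswith("#/"):
            fixed["$ref"] = "#/" + ref[1:]
        return fixed
    if isinstance(schema, list):
        return [normalize_refs(item) for item in schema]
    return schema
```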

Aggregation on keys

I would guess that without the ability to aggregate on a key or set of keys, one would have to fall back to code when dealing with duplicated entries or when only unique values should be handled.

What I believe Aggregation entails is for example a csv file like this:

customer_number;item;quantity
1;1;100
1;2;300
2;1;300

With the current state of kaiba one can only iterate over each line here and then create an object per line.

Being able to aggregate on customer_number would let one create one customer object per unique customer_number. I still think it makes the most sense to treat each aggregation as a normal iterable, where one can create one item/quantity object per line. So something like this:

{
  "customers": [
    {
      "customer_id": 1,
      "items": [
        { "item_id": 1, "quantity": 100 },
        { "item_id": 2, "quantity": 300 }
      ]
    },
    {
      "customer_id": 2,
      "items": [
        { "item_id": 1, "quantity": 300 }
      ]
    }
  ]
}

Not exactly sure how the implementation would be done, but I'm quite sure it should be possible.

I also wonder if there are any reasons to have multiple aggregation_keys.

I'll look into this if there's a need, or whenever I get time.
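The grouping itself could work roughly like this sketch, which reproduces the customers output above from the example CSV. Grouping on a tuple of keys is an assumption that would also cover multiple aggregation_keys; `aggregate` and `customers_from_csv` are hypothetical names, not kaiba functions.

```python
import csv
import io
from typing import Any


def aggregate(rows: list[dict[str, str]], keys: tuple[str, ...]) -> dict[tuple[str, ...], list[dict[str, str]]]:
    """Group rows by the values of `keys`, keeping first-seen order."""
    groups: dict[tuple[str, ...], list[dict[str, str]]] = {}
    for row in rows:
        group_key = tuple(row[key] for key in keys)
        groups.setdefault(group_key, []).append(row)
    return groups


def customers_from_csv(text: str) -> dict[str, Any]:
    rows = list(csv.DictReader(io.StringIO(text), delimiter=";"))
    customers = []
    for (customer_number,), lines in aggregate(rows, ("customer_number",)).items():
        customers.append({
            "customer_id": int(customer_number),
            # Rows inside a group are iterated like a normal iterable.
            "items": [
                {"item_id": int(line["item"]), "quantity": int(line["quantity"])}
                for line in lines
            ],
        })
    return {"customers": customers}
```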

Is it possible to iterate over data without a root?

I have a JSON file whose top level is a list. Is it possible to iterate over it and process it if there is no root?

[
  {
    "name": "Sam",
    "age": 12
  },
  ...
  {
    "name": "Bill",
    "age": 45
  }
]

I cannot find this covered anywhere, so I'm asking in case I missed it. Maybe it is a bug in the docs? 😅
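If kaiba's iterators turn out to need a key path to start from (not confirmed here), one workaround is to wrap a top-level list under a key before processing, so the config can iterate over that key. `load_with_root` and the `"items"` key are hypothetical.

```python
import json
from typing import Any


def load_with_root(text: str, root_key: str = "items") -> Any:
    """Parse JSON; if the top level is a list, put it under `root_key`."""
    data = json.loads(text)
    return {root_key: data} if isinstance(data, list) else data
```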

update docs with new model names

mapping->data_fetcher
attribute.mappings->attribute.data_fetchers
regexp->regex
regexp.search->regex.expression
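Existing configs written with the old names could be migrated with a small script. This is a sketch that assumes the renames show up as plain JSON keys (`mappings`, `regexp`, and `search` inside a regexp object); `migrate` is a hypothetical helper, not part of kaiba.

```python
from typing import Any


def migrate(config: Any) -> Any:
    """Recursively apply the model renames listed above to a config."""
    if isinstance(config, list):
        return [migrate(item) for item in config]
    if not isinstance(config, dict):
        return config
    migrated: dict = {}
    for key, value in config.items():
        if key == "mappings":
            migrated["data_fetchers"] = migrate(value)
        elif key == "regexp" and isinstance(value, dict):
            regex = dict(value)
            if "search" in regex:
                regex["expression"] = regex.pop("search")
            migrated["regex"] = migrate(regex)
        else:
            migrated[key] = migrate(value)
    return migrated
```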

Add require_success to functions like casting and regex

require_success would make sure the mapping stops if, for example, casting fails. This can happen when the input data is not of the expected type and type conversion is impossible. If statements can also fail this way when using in or contains.
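A minimal sketch of the intended behaviour, with hypothetical `cast` and `contains` helpers and a hypothetical `MappingStopped` error: by default a failed step quietly yields no value, while `require_success=True` turns the failure into an error that stops the mapping.

```python
from typing import Any, Optional

# Hypothetical cast targets; kaiba's real casting options may differ.
CASTERS = {"integer": int, "decimal": float, "string": str}


class MappingStopped(Exception):
    """Raised when a step marked require_success fails."""


def cast(value: Any, to: str, require_success: bool = False) -> Optional[Any]:
    try:
        return CASTERS[to](value)
    except (TypeError, ValueError) as error:
        if require_success:
            raise MappingStopped(f"could not cast {value!r} to {to}") from error
        return None  # soft failure: the attribute simply gets no value


def contains(container: Any, item: Any, require_success: bool = False) -> bool:
    try:
        return item in container
    except TypeError as error:  # e.g. `1 in 5`: int is not iterable
        if require_success:
            raise MappingStopped(f"cannot check {item!r} in {container!r}") from error
        return False
```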

Move pydantic models to own folder and create kaiba basemodel

Make something to showcase Kaiba... dev.to, medium post?

Not sure how to show this to the rest of the world 🤷🏼

Maybe some subreddit might be interested...

Can Kaiba be used as a converter from different formats such as AVRO and PROTOBUF to JSON and JSON Schema?

@thomasborgen, I believe this was not an originally intended use of Kaiba, but is it possible to convert AVRO to JSON Schema or PROTOBUF to JSON Schema?

Many tools are available in Python, Java, and other languages, for example https://github.com/criccomini/twister.

All those tools convert data to data. Is there a way to extend Kaiba to generate both JSON data and JSON Schema?

Clarify the interface to kaiba.

Right now the interface to kaiba is in process.py. The functions are called process and process_raises.

In the future, we only want to bump the major version if we break anything that these interface functions return or if we change their function signatures (parameters).

I propose to move most of our code into private modules and in kaiba/__init__.py we will expose only the two functions.

What do you think @ChameleonTartu ?

Rename Mapping

Mapper or DataFetcher

Is it possible to use if-statement on objects, not values?

I have an object that I want to either use or skip based on some of its fields: field A is a decision field and field B is a value field.

Is it possible to do if-statements on objects?

input.json

{
  "companies": [
    {
      "name": "Silicon Valley",
      "country": "US"
    },
...
    {
      "name": "Tech Hub",
      "country": "Iceland"
    }
  ]
}

output.json

if country == Iceland:

{
  "companies": [
    {
      "name": "Tech Hub"
    }
  ]
}
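To pin down the requested semantics, here is the same filter in plain Python: keep a company only when its decision field (`country`) matches, and output only its value field (`name`). How this would be expressed as a kaiba if-statement that skips a whole object is an open question; `filter_companies` is a hypothetical name.

```python
from typing import Any


def filter_companies(data: dict[str, Any], country: str) -> dict[str, Any]:
    """Keep only companies from `country`, and output just their name."""
    return {
        "companies": [
            {"name": company["name"]}
            for company in data.get("companies", [])
            if company.get("country") == country
        ]
    }
```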

Missing method process_as_bytes

It would be cool to have a method process_as_bytes where the input is JSON as bytes.

import requests
from kaiba.process import process_as_bytes

config = {
...
}

resp = requests.get('https://example.com')
process_as_bytes(resp.content, config)

It removes the need for me, as a customer, to decode the JSON before transforming it, and makes my code cleaner.
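Such a wrapper could be very thin, because `json.loads` accepts bytes directly and detects UTF-8, UTF-16 and UTF-32. In this sketch the existing process function is passed in as a parameter so the example stays self-contained; the assumption that it takes `(input_data, configuration)` is not confirmed here.

```python
import json
from typing import Any, Callable


def process_as_bytes(
    raw: bytes,
    configuration: dict,
    process: Callable[[Any, dict], Any],
) -> Any:
    """Decode JSON bytes and hand the result to an existing process function."""
    # json.loads accepts bytes and detects UTF-8/16/32 encodings itself.
    data = json.loads(raw)
    return process(data, configuration)
```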

Regex example in docs is quite hard to follow, make a simple example

The current example could be moved to the use cases, where we talk about using kaiba to parse chess games.

Migrate to Pydantic for config models and validation

Pydantic seems to be quite a good option both for reading our config input from JSON into typed objects, which should be easy to work with, and for validating that the JSON config input is in fact valid. It also seems good at creating error messages.

The current JSON Schema implementation does not handle errors as well as I hoped it would.

Pydantic is also used by FastAPI, which we use to serve kaiba, so we should get very easy error handling and good error messages.