kaiba-tech / kaiba
Kaiba is No-Code Configurable JSON data transformation
Home Page: https://app.kaiba.tech
License: MIT License
Returns is awesome for functional-style programming; it's just that it still has breaking changes in every minor release, so I guess it's not ready yet.
Removing returns should also make contributing easier.
In addition to the normal config, is it possible to produce the reverse config, one that recreates the original input data?
This probably has to be a separate tool that we can call with an existing config.
We need to figure out how to actually do typing.
It's of course a bit hard, since the data we read can be any datatype that's possible in JSON, but we should be able to do it better than it is done right now...
like examples and better descriptions
The thought is that we can make the mapping fail somehow if we are unable to assign data to an attribute or if an object is empty.
Some formats have interesting data in key names instead of in values.
For example JSONSchema.
{
"$id": "https://example.com/person.schema.json",
"$schema": "https://json-schema.org/draft/2020-12/schema",
"title": "Person",
"type": "object",
"properties": {
"firstName": {
"type": "string",
"description": "The person's first name."
},
"lastName": {
"type": "string",
"description": "The person's last name."
},
"age": {
"description": "Age in years which must be equal to or greater than zero.",
"type": "integer",
"minimum": 0
}
}
}
Every child key in "properties" is a field's name. That means that it is interesting data.
We could argue that if it's just a key that we would need to know the path to anyway, we don't need to be able to map it. Which is fair. But the properties object could have any number of children, and it definitely feels more like an array than an object.
So what do we do?
Maybe we can support iterating over an object's children?
We also need to consider the reverse. What if we wanted to create this JSONSchema data? How should we go about mapping input data into a key name?
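One way to picture this is to turn an object into a list of key/value pairs before iterating, and to do the opposite when a key name has to come from input data. The sketch below is plain Python and not kaiba's API; the function names `object_to_items` and `items_to_object` and the `key`/`value` field names are made up for illustration.

```python
from typing import Any, Dict, List


def object_to_items(obj: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn {"a": {...}} into [{"key": "a", "value": {...}}] so it can be iterated like an array."""
    return [{"key": key, "value": value} for key, value in obj.items()]


def items_to_object(items: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Reverse direction: build an object whose key names come from data."""
    return {item["key"]: item["value"] for item in items}


# A cut-down version of the "properties" object from the JSONSchema example.
properties = {
    "firstName": {"type": "string"},
    "age": {"type": "integer", "minimum": 0},
}

items = object_to_items(properties)
field_names = [item["key"] for item in items]  # key names are now ordinary values
```

With the keys exposed as values, the existing iteration over arrays could in principle be reused, and the reverse function covers the case of generating JSONSchema-like output.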
We now use pydantic, and hopefully there is some markdown generation library for pydantic models so that we don't have to keep updating the reference docs by hand.
If not, then let's use docstrings.
The file should also be renamed to schema reference or something like that.
I tried using kaiba on Mac with a blank project, no dependencies.
I received this issue
ImportError: cannot import name 'dataclass' from 'attr' (/Users/dc/Library/Caches/pypoetry/virtualenvs/whatever-project-py3.10/lib/python3.10/site-packages/attr.py)
It worked fine after installing attrs
- I initially installed attr, not attrs.
Do I always need to install this dependency? Can it be bundled with kaiba?
I think Iterator is easier to understand.
I'm wondering if csv-json and xml-json adapters should be in this package and part of the process...
This will then include json-csv and json-xml post adapters.
Need to decide what to do with strings and lists...
"test" > "bob" is True, since ord("t") > ord("b"): strings are compared character by character. That doesn't really give any value, I think. Most likely checking length for anything that's not a number is best.
GitHub Actions stopped working after changing the main branch to main from master
I wonder why both of these reference styles work, but it should be fixed so only one is used:
#definitions/iterable and #/definitions/attribute
"iterables": {
"type": "array",
"items": {
"$ref": "#definitions/iterable"
},
"default": []
},
"branching_attributes": {
"type": "array",
"items": {
"type": "array",
"items": {
"$ref": "#/definitions/attribute"
}
},
"default": []
}
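Assuming the `#/definitions/...` form is the one to keep (the part after `#` is then a JSON Pointer, which starts with `/`), a small check like the following could list the references that use the other form. The function name `find_non_pointer_refs` is made up for this sketch, and the schema fragment is a shortened copy of the one above.

```python
from typing import Any, Iterator


def find_non_pointer_refs(node: Any) -> Iterator[str]:
    """Yield every local "$ref" whose fragment does not start with "#/"."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                # "#" alone points at the whole document and is fine.
                if value.startswith("#") and value != "#" and not value.startswith("#/"):
                    yield value
            else:
                yield from find_non_pointer_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_non_pointer_refs(item)


schema_fragment = {
    "iterables": {
        "type": "array",
        "items": {"$ref": "#definitions/iterable"},
        "default": [],
    },
    "branching_attributes": {
        "type": "array",
        "items": {"type": "array", "items": {"$ref": "#/definitions/attribute"}},
        "default": [],
    },
}

suspicious = list(find_non_pointer_refs(schema_fragment))
```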
I would guess that without a way to aggregate on some key or set of keys, one has to fall back to code when dealing with duplicated entries or when only unique values should be handled.
What I believe Aggregation entails is for example a csv file like this:
customer_number;item;quantity
1;1;100
1;2;300
2;1;300
With the current state of kaiba one can only iterate over each line here and then create an object per line.
Having the possibility to aggregate on customer_number one would then be able to create 1 customer object per unique customer_number. I still think it makes the most sense to treat each aggregation as normal iterables where one will be able to create one item/quantity object per line. So something like this:
{
"customers": [
{
"customer_id": 1,
"items": [
{ "item_id": 1, "quantity": 100 },
{ "item_id": 2, "quantity": 300 }
]
},
{
"customer_id": 2,
"items": [
{ "item_id": 1, "quantity": 300 }
]
}
]
}
Not exactly sure how the implementation would be done, but I'm quite sure it should be possible.
I also wonder if there are any reasons to have multiple aggregation_keys?
I'll look into this if there's a need, or whenever I get time.
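To make the idea concrete, here is a rough sketch of the aggregation in plain Python, outside of kaiba. It groups the CSV rows by `customer_number` and then treats each group as a normal iterable. The helper name `aggregate` is an assumption for illustration, not an existing kaiba feature.

```python
import csv
import io
from typing import Dict, List

CSV_TEXT = """customer_number;item;quantity
1;1;100
1;2;300
2;1;300
"""


def aggregate(rows: List[Dict[str, str]], key: str) -> Dict[str, List[Dict[str, str]]]:
    """Group rows by the value of `key`, keeping first-seen order."""
    groups: Dict[str, List[Dict[str, str]]] = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


rows = list(csv.DictReader(io.StringIO(CSV_TEXT), delimiter=";"))

# One customer object per unique customer_number, one item object per line.
result = {
    "customers": [
        {
            "customer_id": int(customer_number),
            "items": [
                {"item_id": int(row["item"]), "quantity": int(row["quantity"])}
                for row in group
            ],
        }
        for customer_number, group in aggregate(rows, "customer_number").items()
    ]
}
```

Supporting several aggregation keys would mostly mean grouping on a tuple of values instead of a single one.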
I have a JSON file as a list. Is it possible to iterate and process it if there is no root?
[
{
"name": "Sam",
"age": 12
},
...
{
"name": "Bill",
"age": 45
}
]
I cannot find it anywhere covered, so asking in case I missed it. Maybe it is a bug in docs? 😅
mapping->data_fetcher
attribute.mappings->attribute.data_fetchers
regexp->regex
regexp.search->regex.expression
require_success will make sure the mapping stops if, for example, casting fails. This can happen when input data is not of the expected type and type conversion is impossible. If statements can also fail this way when using in or contains.
Not sure how to show this to the rest of the world 🤷🏼
maybe some subreddit might be interested...
@thomasborgen, I believe this was not an initially intended usage of Kaiba. But is it possible to convert AVRO to JSON Schema or PROTOBUF to JSON schema?
Many tools are available in Python, Java, and other languages—for example, https://github.com/criccomini/twister.
All those tools are converting data to data. Is there a way to extend Kaiba to generate both data in JSON and JSON Schema?
Right now the interface to kaiba is in process.py. The functions are called process and process_raises.
In the future we only want to bump major versions if we break anything that these interface functions return or if we change their function signatures (parameters).
I propose to move most of our code into private modules, and in kaiba/__init__.py we will expose only the two functions.
What do you think @ChameleonTartu ?
Mapper or DataFetcher
I have an object that I want to decide to use or not to use, based on some fields. Field A as a decision field and Field B as a value field.
Is it possible to do if-statements on objects?
input.json
{
"companies": [
{
"name": "Silicon Valley",
"country": "US"
},
...
{
"name": "Tech Hub",
"country": "Iceland"
}
]
}
output.json
if country == Iceland:
{
"companies": [
{
"name": "Tech Hub"
}
]
}
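In plain Python the requested behaviour would look roughly like the sketch below: one field decides whether an object is kept, another field provides the value. The function name `filter_objects` is made up here, and this is not existing kaiba configuration.

```python
from typing import Any, Dict, List

input_data = {
    "companies": [
        {"name": "Silicon Valley", "country": "US"},
        {"name": "Tech Hub", "country": "Iceland"},
    ]
}


def filter_objects(
    objects: List[Dict[str, Any]], decision_field: str, expected: Any
) -> List[Dict[str, Any]]:
    """Keep only the objects whose decision field equals the expected value."""
    return [obj for obj in objects if obj.get(decision_field) == expected]


# Field A ("country") decides, field B ("name") is the value that is mapped.
output = {
    "companies": [
        {"name": company["name"]}
        for company in filter_objects(input_data["companies"], "country", "Iceland")
    ]
}
```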
It would be cool to have a method process_as_bytes where the input is JSON as bytes.
import requests
from kaiba.process import process_as_bytes
config = {
...
}
resp = requests.get('https://example.com')
process_as_bytes(resp.content, config)
As a customer, I would no longer need to parse the response into JSON before transforming it, which makes my code cleaner.
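Until something like this exists, a thin wrapper could cover it. The sketch below is an assumption about how it might look: it takes the processing function as a parameter so the example runs on its own, and `fake_process` is only a stand-in for kaiba's real function, whose argument order is assumed here.

```python
import json
from typing import Any, Callable, Dict


def process_as_bytes(
    raw: bytes,
    config: Dict[str, Any],
    process: Callable[[Any, Dict[str, Any]], Any],
) -> Any:
    """Decode JSON bytes, then hand the parsed data to the given process function."""
    data = json.loads(raw)  # json.loads accepts bytes directly
    return process(data, config)


def fake_process(data: Any, config: Dict[str, Any]) -> Any:
    """Stand-in for kaiba's process function, only here to make the sketch runnable."""
    return {"name": data[config["path"]]}


result = process_as_bytes(b'{"first": "Sam"}', {"path": "first"}, fake_process)
```

With requests, `resp.content` could be passed as `raw` without calling `resp.json()` first.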
The current example might be moved to usecases where we talk about using kaiba for parsing chess games
Pydantic seems to be quite a good option for:
- reading our config input from JSON into typed objects that should be easy to work with.
- validating that the JSON config input is in fact valid; it also seems good at creating error messages.
The current JSON schema implementation does not handle errors as well as I hoped it would.
Pydantic is also used by FastAPI that we use to serve kaiba, which should mean that we should get super easy error handling and good messages.
When the root is an array there is currently no way to reference it.
I suggest adding something like _root_ or $root that most likely will never be a key anyone is trying to find. To make it even more foolproof, only the first value in a path with length 1 will ever be checked for this keyword.
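A minimal sketch of how such a keyword could behave, reading the suggestion as "only a path that consists of exactly the keyword refers to the root". The names `ROOT_KEYWORD` and `fetch` are made up for this example and are not kaiba's actual data-fetching code.

```python
from typing import Any, List, Union

ROOT_KEYWORD = "$root"


def fetch(data: Any, path: List[Union[str, int]]) -> Any:
    """Follow `path` through `data`; a path of exactly [ROOT_KEYWORD] returns the root itself."""
    if len(path) == 1 and path[0] == ROOT_KEYWORD:
        return data
    current = data
    for step in path:
        current = current[step]  # dict key or list index
    return current


people = [{"name": "Sam", "age": 12}, {"name": "Bill", "age": 45}]

root = fetch(people, ["$root"])          # the whole array
first_name = fetch(people, [0, "name"])  # normal path lookup still works
```

Because the keyword is only honoured for a path of length 1, a real key named `$root` deeper in a longer path is still looked up normally.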
There's something weird with search, and I feel like expression makes more sense.
It should be clear that we have a frontend for creating configs. Even though it's still in early development, it is functional.