Comments (4)
Hi again @ChameleonTartu
I did a test where I changed an Avro schema into a JSON Schema with kaiba, using the Kaiba App, and it worked.
However, it made clear that we have a limitation in kaiba: we are unable to transform values into keys. For example, in Avro a field is defined like this:
{"name": "field_name", "type": "string"}
But in JSON Schema, a field's name is its key, as in:
{
  "properties": {
    "field_name": {
      "type": "string"
    }
  }
}
I think this is something we should look into supporting, since it could be very powerful. It would involve fetching a key's name instead of its value, and possibly extending the kaiba object to make a dynamic name possible.
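To illustrate what such a feature would need to do, here is a minimal plain-Python sketch (not kaiba functionality) of the value-to-key transform: each Avro field's "name" value is promoted to a key in the JSON Schema "properties" object.

```python
# Plain-Python sketch of the value-to-key transform (not a kaiba feature):
# each Avro field's "name" *value* becomes a *key* in JSON Schema "properties".
avro_fields = [
    {"name": "Name", "type": "string"},
    {"name": "Age", "type": "int"},
]

properties = {field["name"]: {"type": field["type"]} for field in avro_fields}

print(properties)
# {'Name': {'type': 'string'}, 'Age': {'type': 'int'}}
```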
For Protobuf, since it's not JSON data, we can't map directly to it. We can only map to the correct structure and let a post-processor handle the dump from JSON to Protobuf.
Here is how I changed the Avro schema into a JSON Schema.
Given this Avro schema:
{
  "type": "record",
  "namespace": "Tutorialspoint",
  "name": "Employee",
  "fields": [
    {"name": "Name", "type": "string"},
    {"name": "Age", "type": "int"}
  ]
}
And this kaiba config
{
  "name": "root",
  "array": false,
  "iterators": [],
  "attributes": [
    {
      "name": "title",
      "default": "Employee"
    },
    {
      "name": "type",
      "default": "object"
    }
  ],
  "objects": [
    {
      "name": "properties",
      "array": false,
      "objects": [
        {
          "name": "Name",
          "array": false,
          "iterators": [],
          "attributes": [
            {
              "name": "type",
              "data_fetchers": [
                {
                  "path": ["fields", 0, "type"]
                }
              ]
            }
          ]
        },
        {
          "name": "Age",
          "array": false,
          "iterators": [],
          "attributes": [
            {
              "name": "type",
              "data_fetchers": [
                {
                  "path": ["fields", 1, "type"]
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}
You can produce this:
{
  "title": "Employee",
  "type": "object",
  "properties": {
    "Name": {
      "type": "string"
    },
    "Age": {
      "type": "int"
    }
  }
}
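For reference, the same mapping can be generalized outside kaiba in a few lines of plain Python, so any flat Avro record produces the corresponding JSON Schema without hard-coding field indices. This is only a sketch; like the config above, it passes the Avro type names through verbatim (so Avro's "int" comes out as "int", whereas a real converter would map it to JSON Schema's "integer").

```python
def avro_record_to_json_schema(avro_schema: dict) -> dict:
    """Convert a flat Avro record schema to a JSON Schema sketch.

    Plain-Python sketch, not part of kaiba. Avro type names are passed
    through verbatim, mirroring the kaiba config above.
    """
    return {
        "title": avro_schema["name"],
        "type": "object",
        "properties": {
            field["name"]: {"type": field["type"]}
            for field in avro_schema["fields"]
        },
    }

employee = {
    "type": "record",
    "namespace": "Tutorialspoint",
    "name": "Employee",
    "fields": [
        {"name": "Name", "type": "string"},
        {"name": "Age", "type": "int"},
    ],
}

print(avro_record_to_json_schema(employee))
```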
from kaiba.
Hi @ChameleonTartu Thanks for the question :)
After a quick look into Avro, it seems to me that it's a format used for transferring data quickly. It contains a schema, which the Avro writer uses to validate the incoming data it is going to write. If the data is valid, the Avro writer encodes the schema + data into a binary blob. Any Avro reader can then read and decode the data properly, because the schema is included in the blob.
From what I understand, Kaiba could be used either in front, before data injection into Avro, to make arbitrary data conform with what Avro expects; or behind, after the Avro reader has read the data and output some JSON, to turn it into a more desired format.
My initial thinking is that I don't think Kaiba should need to handle the schema part of Avro, but I'll give this more thought.
I've been contemplating adding some pre/post processors directly into kaiba core, but I'm not sure if it's the right place.
I've also just had a quick look at Protobuf, and I was wondering if you could explain a bit more about the use case. Is your use case to change the .proto schema into a JSON Schema declaration? Or is it again to change the data before injection and after reading?
@thomasborgen Let me bring in a bit of context, as my use cases come from Data Engineering and working with Apache Kafka, not from the integration space where we have used Kaiba.
CONTEXT:
In a broad sense, Apache Kafka is a Data Bus where you dump your data (produce) and read it after (consume).
Kafka doesn't care about intake; it knows and stores bytes. It has topics that are practically different channels/queues where you put your messages for separation.
A Kafka reader or writer needs an extra component, the Schema Registry, which stores schemas in AVRO, JSON (JSON Schema), or PROTOBUF format. The algorithm that readers and writers follow is:
Writer:
- Create schema
- Store schema in Schema registry
- Generate data based on the schema
- Send to Kafka
Reader:
- Retrieve schema
- Validate incoming data against the schema
- Process further
The common part, independent of the schema format, is:
schema = read_schema()
validate(message, schema)
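To make that common part concrete, here is a minimal runnable sketch. The in-memory registry, the subject name, and the toy validator are all stand-ins I made up for illustration; a real setup would use a Schema Registry client and a full validation library such as jsonschema.

```python
# Minimal sketch of the reader's common part. The registry dict, the
# subject name, and the validator are toy stand-ins, not a real
# Schema Registry client or a full JSON Schema validator.
SCHEMA_REGISTRY = {
    "employees-value": {
        "type": "object",
        "properties": {"Name": {"type": "string"}, "Age": {"type": "integer"}},
    },
}

PY_TYPES = {"string": str, "integer": int}

def read_schema(subject: str) -> dict:
    """Look up the schema for a topic's subject in the registry."""
    return SCHEMA_REGISTRY[subject]

def validate(message: dict, schema: dict) -> bool:
    """Check only that each known property has the declared type."""
    for key, spec in schema.get("properties", {}).items():
        if key in message and not isinstance(message[key], PY_TYPES[spec["type"]]):
            return False
    return True

schema = read_schema("employees-value")
print(validate({"Name": "Ada", "Age": 36}, schema))    # True
print(validate({"Name": "Ada", "Age": "36"}, schema))  # False
```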
PROBLEM:
There is no simple way to convert between schema formats, so I cannot take AVRO and convert it to JSON Schema, or JSON Schema to PROTOBUF.
The expected behavior would be painless schema-to-schema conversion:
AVRO <-> JSON
PROTOBUF <-> JSON
The reason this use case exists, and its correlation with Kaiba:
1. Read messages from topic ABC. Messages are in JSON format.
2. Enrich or trim the messages and post to topic XYZ. Messages should be in AVRO format. (Kaiba-related)
3. Make a decision and post to topic ATP; the message should be in PROTOBUF format.
Enrichment and data manipulation are truly Kaiba's existence story, but the integration between formats must be solved as well.
As I wrote before, JSON to AVRO and JSON to PROTOBUF are partially solved by industry, while conversion between JSON Schema and other schema formats is a wide-open question that, to my knowledge, still needs to be solved.
QUESTIONS:
Is it kaiba-related? Yes, partially, because Kaiba is great at manipulating data based on a schema.
Do you think this particular request should go to kaiba-core? Not necessarily; it could go into the Kaiba ecosystem and help promote it in the Data Engineering niche.
I am open to discussion and ready to contribute to this branch of the project as I see a great need for it myself.
@thomasborgen This works; the only issue is that it doesn't do any magic. It is very manual and requires a good understanding of both formats as well as of kaiba itself.
Even so, I think this is a great solution, so I will include it in the manual as the AVRO-to-JSON transformation.
Do I understand correctly that there is no "reverse" transformation available yet?
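As far as I can tell the reverse direction is not built in, but for a flat object schema it can be sketched the same way in plain Python. This is a hypothetical helper, not kaiba functionality, with a deliberately minimal type mapping (unknown JSON Schema types are passed through as-is).

```python
# Hypothetical plain-Python sketch of the reverse direction
# (JSON Schema -> Avro record) for a flat object schema; not a kaiba feature.
JSON_TO_AVRO_TYPES = {
    "string": "string",
    "integer": "int",
    "number": "double",
    "boolean": "boolean",
}

def json_schema_to_avro_record(json_schema: dict) -> dict:
    """Build an Avro record schema from a flat JSON Schema object."""
    return {
        "type": "record",
        "name": json_schema.get("title", "Record"),
        "fields": [
            {"name": key, "type": JSON_TO_AVRO_TYPES.get(spec["type"], spec["type"])}
            for key, spec in json_schema.get("properties", {}).items()
        ],
    }

employee_schema = {
    "title": "Employee",
    "type": "object",
    "properties": {"Name": {"type": "string"}, "Age": {"type": "integer"}},
}

print(json_schema_to_avro_record(employee_schema))
```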