Comments (3)
Our team discussed this today. Here are the solutions we discussed and the conclusion.
TL;DR
Conduit will make sure all schema names are unique by adding the connector ID and/or the pipeline ID to the name that a connector developer provides (which is going to be the collection name in most cases).
Long version:
Possible solutions
1. Make every name unique
1a. Connector developers provide a name, which is "made unique" by Conduit by adding a prefix/suffix.
1b. Connector developers don't provide a name.
2. Use "namespaces" (each connector gets one)
Discussion:
1. Make every name unique
1a. Connector developers provide a name. Conduit "makes it unique" by adding a prefix/suffix.
Pros: Makes the connector code a bit clearer (by showing what a schema refers to).
Cons: The actual name is different. The original name is not valid anymore.
1b. Connector developers don't provide a name. Conduit generates a random/unique name.
Pros: Simple implementation.
Cons: The schema registry is not well organized internally. Some organization can be achieved in a limited way by using structured names (e.g. pipeline ID + connector ID + schema name). The actual name is different. The original name is not valid anymore.
2. Use "namespaces" (each connector gets one)
Confluent's SR has schema contexts. A context works more or less like a prefix.
Pros: Intuitive way to organize schemas, easier cleanup.
Cons: The franz-go client doesn't support contexts as "first class citizens". What can currently be done is to change the base URL, but that would mean one client per connector. We might also want to change the client to support schema contexts.
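To illustrate the "change the base URL" workaround, here is a minimal sketch of deriving one base URL per connector. It assumes that a schema context is addressed through a `/contexts/.<name>` path prefix on the registry URL, as we understand it from Confluent's documentation. The function name `contextBaseURL` and the example IDs are hypothetical and not part of Conduit or franz-go.

```go
package main

import (
	"fmt"
	"strings"
)

// contextBaseURL returns a registry base URL scoped to a single context.
// ASSUMPTION: the registry exposes contexts under "/contexts/.<name>".
// Each connector would need its own client pointed at its own URL.
func contextBaseURL(registryURL, contextName string) string {
	return strings.TrimRight(registryURL, "/") + "/contexts/." + contextName
}

func main() {
	// Two connectors, two base URLs, hence two clients.
	for _, connectorID := range []string{"source-a", "destination-b"} {
		fmt.Println(contextBaseURL("http://localhost:8081/", connectorID))
	}
}
```

This also shows the cost mentioned above: the context lives in the URL, so it cannot be switched per request on a shared client.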
Conclusion
We're choosing 1a for the following reasons:
- It's possible to dictate a structure for the IDs, which makes it easier to identify which schemas belong to which pipelines, which in turn makes cleanup easier.
- A connector developer's involvement is minimal.
While it does require some care on a connector developer's behalf, because the actual schema name is different, it's still not a big problem, because the parameter name and docs will call it out.
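A rough sketch of what 1a could look like, only to make the idea concrete. The function names, the `.` separator and the ordering (pipeline ID, connector ID, developer-provided name) are assumptions for illustration, not the actual Conduit implementation. It also assumes that the IDs don't contain the separator themselves.

```go
package main

import (
	"fmt"
	"strings"
)

// sep is a hypothetical separator between the parts of a subject name.
const sep = "."

// uniqueSubject combines the pipeline ID, the connector ID and the name
// provided by the connector developer (e.g. a collection name).
func uniqueSubject(pipelineID, connectorID, name string) string {
	return strings.Join([]string{pipelineID, connectorID, name}, sep)
}

// belongsToPipeline reports whether a subject was created for the given
// pipeline, which is what a cleanup job could filter on.
func belongsToPipeline(subject, pipelineID string) bool {
	return strings.HasPrefix(subject, pipelineID+sep)
}

func main() {
	// The same collection name in two connectors yields two subjects.
	a := uniqueSubject("pipeline-1", "source-a", "users")
	b := uniqueSubject("pipeline-1", "source-b", "users")
	fmt.Println(a, b, belongsToPipeline(a, "pipeline-1"))
}
```

Note that the returned subject differs from the name the developer passed in, which is exactly the caveat mentioned above.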
from conduit.
The mentioned solution relies on a connector being able to identify itself (the combination of the pipeline/connector ID and the name that a developer provided guarantees schema subject uniqueness). Tokens can be used for that. Lovro wrote down some thoughts on how to do that: #1701 (comment)
from conduit.
@lovromazgon and I were discussing the implementation of this. There are a few points:
- We're not quite happy with schema.Create() returning a schema with a different name, but there's no good solution.
- Eventually, we'd like to organize schemas into contexts.
- There might be cases where a user (not necessarily a connector developer) might want to use an existing schema from an external schema registry. Prefixing the subject name with the connector ID always (as in the proposed solution) makes that impossible. So we're going to make that configurable.
- A user will be able to:
  - specify a custom prefix
  - use the default prefix (connector ID)
  - use no prefix at all (subject name is exactly as in the connector code, e.g. the collection name)
- The above will be made possible through 2 configuration parameters:
  - one to enable the prefix
  - one to specify the prefix
- The default behavior is to use the connector ID as the prefix
- Now the plot twist: we'll use `context` instead of `prefix` since we plan to organize schemas into contexts in the future.
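To make the three cases concrete, here is a small sketch of how the two parameters could be resolved into a subject name. The struct, its field names and the `.` separator are hypothetical. The actual parameter names are not decided in this thread.

```go
package main

import "fmt"

// SchemaContextConfig is a hypothetical holder for the two parameters.
type SchemaContextConfig struct {
	Enabled bool   // whether a context (prefix) is applied at all
	Name    string // custom context; empty means "use the connector ID"
}

// defaultConfig mirrors the default behavior: context enabled, no custom
// name, so the connector ID is used.
func defaultConfig() SchemaContextConfig {
	return SchemaContextConfig{Enabled: true}
}

// resolveSubject returns the subject name that would be registered.
// ASSUMPTION: for now the context acts as a plain prefix joined with ".".
func resolveSubject(cfg SchemaContextConfig, connectorID, name string) string {
	if !cfg.Enabled {
		return name // exactly as in the connector code
	}
	ctx := cfg.Name
	if ctx == "" {
		ctx = connectorID
	}
	return ctx + "." + name
}

func main() {
	fmt.Println(resolveSubject(defaultConfig(), "source-a", "users"))
	fmt.Println(resolveSubject(SchemaContextConfig{Enabled: true, Name: "shared"}, "source-a", "users"))
	fmt.Println(resolveSubject(SchemaContextConfig{Enabled: false}, "source-a", "users"))
}
```

Disabling the context is what allows reusing an existing subject from an external schema registry.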
from conduit.