airbytehq / airbyte

The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.

Home Page: https://airbyte.com

License: Other

Shell 0.34% Dockerfile 0.90% Java 16.78% HTML 0.01% CSS 0.04% JavaScript 0.27% Python 68.41% Handlebars 0.13% Makefile 0.01% PLpgSQL 0.11% TSQL 0.02% Jinja 0.10% PLSQL 0.04% Kotlin 12.83%
Topics: data-pipeline, data-analysis, data-engineering, java, python, etl, change-data-capture, data-collection, data-integration

airbyte's Introduction

Airbyte

Data integration platform for ELT pipelines from APIs, databases & files to databases, warehouses & lakes


We believe that only an open-source solution to data movement can cover the long tail of data sources while empowering data engineers to customize existing connectors. Our ultimate vision is to help you move data from any source to any destination. Airbyte already provides the largest catalog of 300+ connectors for APIs, databases, data warehouses, and data lakes.

[Screenshot: Airbyte OSS Connections UI, taken from Airbyte Cloud]

Getting Started

Try it out yourself with our demo app, visit our full documentation and learn more about recent announcements. See our registry for a full list of connectors already available in Airbyte or Airbyte Cloud.

Join the Airbyte Community

The Airbyte community can be found in the Airbyte Community Slack, where you can ask questions and voice ideas. You can also ask for help in our Airbyte Forum, or join our Office Hours. Airbyte's roadmap is publicly viewable on GitHub.

For videos and blogs on data engineering and building your data stack, check out Airbyte's Content Hub and YouTube channel, and sign up for our newsletter.

Dedicated support with direct access to our team is also available for Open Source users. If you are interested, please fill out this form.

Contributing

If you've found a problem with Airbyte, please open a GitHub issue. To contribute to Airbyte and see our Code of Conduct, please see the contributing guide. We have a list of good first issues that contain bugs that have a relatively limited scope. This is a great place to get started, gain experience, and get familiar with our contribution process.

Security

Airbyte takes security issues very seriously. Please do not file GitHub issues or post on our public forum for security vulnerabilities. Email [email protected] if you believe you have uncovered a vulnerability. In the message, try to provide a description of the issue and ideally a way of reproducing it. The security team will get back to you as soon as possible.

Airbyte Enterprise also offers additional security features, among other capabilities, on top of Airbyte Open Source.

License

See the LICENSE file for licensing information, and our FAQ for any questions you may have on that topic.

Thank You

Airbyte would not be possible without the support and assistance of other open-source tools and companies! Visit our thank you page to learn more about how we build Airbyte.

airbyte's People

Contributors

alafanechere, arsenlosenko, artem1205, avaidyanatha, bazarnov, benmoriceau, bnchrch, cgardens, christopheduong, darynaishchenko, davinchia, davydov-d, edgao, erohmensing, evantahler, girarda, grubberr, jamakase, johnlafleur, jrhizor, lazebnyi, lmossman, marcosmarxm, maxi297, michel-tricot, octavia-squidington-iii, sherifnada, subodh1810, timroes, tuliren


airbyte's Issues

Jetty to Java JSON parsing

I need to figure out whether I should actually be calling toString here. I have to double-check how Jetty parses the `any` type in an HTTP request. I don't know whether it gets parsed as a string or a JSON object, and what that actually gets cast to in Java.
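The worker code in question is Java, but the underlying question is language-agnostic: a field typed `any` may deserialize as either a plain string or a structured object, and naively stringifying the object form can produce non-JSON output. A minimal Python sketch of the distinction (function name hypothetical):

```python
import json

def normalize_any(value):
    """Normalize a JSON 'any'-typed field to a string.

    Calling str() on a parsed object yields Python repr (single quotes),
    not valid JSON, so non-string values are re-serialized with json.dumps.
    """
    if isinstance(value, str):
        return value
    return json.dumps(value)

# A field typed 'any' may arrive as a plain string or as a nested object.
print(normalize_any("hello"))       # hello
print(normalize_any({"a": 1}))      # {"a": 1}
```

The same choice exists in Java: `JsonNode.asText()` versus re-serializing the node, depending on which form arrived.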

Feedback when adding a source or destination

When adding a source or destination, from the Onboarding or New source, we should test the connections before moving on.
Here are the screens showcasing the feedback:

@cgardens

https://airbytehq.slack.com/archives/C02RQFX0X89/p1640022762001700?thread_ts=1640022762.001700&cid=C02RQFX0X89

Improve database & config migration lifecycle

Tell us about the problem you're trying to solve

Today we don't have a good process for applying db migrations or config migrations. As we iterate on the project, that will cause issues for users who want to upgrade to the next version.

Describe the solution you’d like

Not yet defined, but I am thinking of having a version-specific container that runs before any others and applies migrations to the db if necessary.
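As a rough illustration of that idea, here is a minimal Python/sqlite3 sketch of a versioned migration runner; the table names and DDL are hypothetical, not Airbyte's actual schema:

```python
import sqlite3

# Hypothetical migrations: version -> DDL that brings the db to that version.
MIGRATIONS = {
    1: "CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)",
    2: "ALTER TABLE config ADD COLUMN updated_at TEXT",
}

def migrate(conn):
    """Apply any migrations newer than the recorded schema version."""
    conn.execute("CREATE TABLE IF NOT EXISTS schema_version (version INTEGER)")
    current = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()[0] or 0
    applied = []
    for version in sorted(MIGRATIONS):
        if version > current:
            conn.execute(MIGRATIONS[version])
            conn.execute("INSERT INTO schema_version VALUES (?)", (version,))
            applied.append(version)
    conn.commit()
    return applied
```

A container running this before the other services start would make upgrades idempotent: a second run applies nothing.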

Track events not well logged on docs.dataline.io

Expected Behavior

When visiting a documentation page, in Segment's debugger (https://app.segment.com/dataline/sources/dataline-website/debugger), I should see 2 events:

  • TRACK: "Viewed [Page Name] Page"
  • PAGE: /deployment/deploying-dataline/with-docker (for instance)

Current Behavior

We currently only see the PAGE event. The issue is that we won't see these page views properly in the analytics tools (Amplitude, etc.).

Steps to Reproduce

  1. Go to any docs page
  2. Check debugger on Segment, and you'll only see the PAGE event, and not the TRACK one.

Severity of the bug for you

Very low / Low / Medium / High / Critical
Medium

Additional context

We had a similar issue in the past. The solution is to pass the name of the page in the Segment event so that "Track Named Pages" works on Segment. As an example, this would be a page call with a name of Pricing passed in:
analytics.page('Pricing');

Always produce the most up to date images to help for local development

Tell us about the problem you're trying to solve

I would like to have all the images available on Docker Hub after I push to master. That will help me set up an up-to-date environment when developing locally.

Describe the solution you’d like

GitHub pushes to Docker Hub as part of the master build.

make integrations self-contained

  • integrations need to be responsible for defining the config spec
  • integration images need an interface to produce the config spec
  • integrations should fail if the provided config is invalid
  • integration tests should fail if validation is not being performed properly or if it can't access the spec
  • integrations should be able to optionally convert their config to a different format (currently only required for BQ, but could be used in the future for postgres, others)
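Taken together, the bullets above describe an interface each integration image would expose. A minimal Python sketch of that contract; the class, method names, and spec shape are all hypothetical, not the actual Airbyte protocol:

```python
import json

class Integration:
    """Sketch of a self-contained integration (hypothetical interface)."""

    SPEC = {"required": ["api_key"], "properties": {"api_key": {"type": "string"}}}

    def spec(self):
        # The image exposes its own config spec instead of the platform owning it.
        return self.SPEC

    def check_config(self, config):
        # Fail fast if a required field is missing, so invalid configs never run.
        missing = [k for k in self.SPEC["required"] if k not in config]
        if missing:
            raise ValueError(f"invalid config, missing: {missing}")
        return True

    def convert_config(self, config):
        # Optional hook to rewrite config into a destination-specific format
        # (e.g. BigQuery wants a credentials file rather than inline fields).
        return json.dumps(config)
```

The test harness then only needs the image, not platform internals: ask for the spec, validate a config against it, and assert that invalid configs fail.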

Onboarding - set up a source

Once users have set up their preferences, they should see the following screen:
https://www.figma.com/file/AODY4oi15iI5l0fbxHvobt/Daxtarity?node-id=5782%3A42748

Setting up the name for the source and selecting a connector for the source should be compulsory.
So the "Set up source" button should be disabled until all fields are filled.

Once the user has selected a connector, 2 things should happen:

  • new fields should be displayed to set up the source (defined by the API)
  • a link towards our documentation for this connector should be displayed. Clicking on the link should display the docs page in a new tab.

@cgardens

Source detail page - Status tab with list of logs

When clicking on a cell in the list of sources in "Sources", the status detail page should be displayed:
https://www.figma.com/file/AODY4oi15iI5l0fbxHvobt/Daxtarity?node-id=5782%3A36940

A few features:

The v1.0 doesn't include a full refresh.

@cgardens

API doesn't fail when incorrect params are passed.

Expected Behavior

  • the API should throw an exception when it receives a request body for a route that does not match what is defined in swagger.

Current Behavior

  • the API just tries to process the incomplete body.

Steps to Reproduce

  1. Run the following request; note that the initialSetupComplete field is missing:

curl -H "Content-Type: application/json" -X POST localhost:8001/api/v1/workspaces/update \
  -d '{ "anonymousDataCollection": false, "email": "[email protected]", "news": false, "securityUpdates": true, "workspaceId": "5ae6b09b-fdec-41af-aaf7-7d94cfc33ef6" }'

Fix CPD configuration in Gradle

Expected Behavior

CPD shouldn't throw warnings.

Current Behavior

CPD throws warnings:

WARNING: Due to the absence of 'LifecycleBasePlugin' on project ':dataline-analytics' the task ':cpdCheck' could not be added to task graph. Therefore CPD will not be executed. To prevent this, manually add a task dependency of ':cpdCheck' to a 'check' task of a subproject.
1) Directly to project ':dataline-integrations:singer:bigquery:destination':
    check.dependsOn(':cpdCheck')
2) Indirectly, e.g. via project ':dataline-analytics':
    project(':dataline-integrations:singer:bigquery:destination') {
        plugins.withType(LifecycleBasePlugin) { // <- just required if 'java' plugin is applied within subproject
            check.dependsOn(cpdCheck)
        }
    }

Steps to Reproduce

  1. Just run build within a subproject: ./gradlew :dataline-server:build

Severity of the bug for you

Very low

Testing please ignore

Expected Behavior

Tell us what should happen.

Current Behavior

Tell us what happens instead of the expected behavior.

Steps to Reproduce

Severity of the bug for you

Very low / Low / Medium / High / Critical

Additional context

Environment, version, integration...

Inject job id into the logs of workers

All logs output by a worker should include job id for debugging purposes. This should be controlled outside of the integration implementations that an OSS contributor makes.
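The workers are Java, but the idea translates to any logging framework that supports contextual fields (Log4j's MDC, for instance). A Python sketch using a `logging.Filter` to stamp every record with a job id, outside the integration code:

```python
import logging

class JobIdFilter(logging.Filter):
    """Attach the current job id to every record, outside connector code."""

    def __init__(self, job_id):
        super().__init__()
        self.job_id = job_id

    def filter(self, record):
        record.job_id = self.job_id
        return True  # never drop records, just annotate them

logger = logging.getLogger("worker")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("job=%(job_id)s %(levelname)s %(message)s"))
logger.addHandler(handler)
logger.addFilter(JobIdFilter("42"))
logger.setLevel(logging.INFO)
logger.info("sync started")  # stderr: job=42 INFO sync started
```

Because the filter lives on the worker's logger, an OSS contributor's integration code never has to know the job id exists.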

Onboarding - set up your destination

Once user set up their source in Onboarding, they should be invited to set up their destination:
https://www.figma.com/file/AODY4oi15iI5l0fbxHvobt/Daxtarity?node-id=5782%3A44298

In v1.0, we only support one destination. This might change in the future.

It should be the same behavior as for sources. But there should be 2 differences:

  • user should be able to click on (1) create a source in the tabs of the header to go back to previous step (note that clicking on "(3) set up connection" shouldn't do anything until you reached that point)
  • user should be able to click on "Source name" in the summary to go back to previous step as well

Summary of expected behavior (similar to source):
Setting up the name for the destination and selecting a connector for the destination should be compulsory. So the "Set up destination" button should be disabled until all fields are filled.

Once the user has selected a connector, 2 things should happen:

  • new fields should be displayed to set up the destination (defined by the API)
  • a link towards our documentation for this connector should be displayed. Clicking on the link should display the docs page in a new tab.

@cgardens

Redshift as a destination

Tell us about the problem you're trying to solve

I'd like to be able to push data to Redshift. 

Describe the solution you’d like

The ability to choose Redshift as a destination in the UI. 

manual sync failing in API

Expected Behavior

  • setting a null schedule should make a connection manual sync only.

Current Behavior

  • sending null returns a json error saying schedule is required

Other

  • Glanced at the config and it looks right.

move JsonSchemaValidator into its own module

Tell us about the problem you're trying to solve

This class depends on no commons-like libraries. Right now it is in commons so that it can be shared with the worker, but that is not ideal because it injects these dependencies into all modules that use commons. Ideally we'd like this class split out into its own module.

Allow workers to split their logs

Tell us about the problem you're trying to solve

One big log file makes it hard to understand what went wrong with a worker.

Describe the solution you’d like

I would like all the logs to be scoped by job id
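One way to scope logs by job id is to give each job its own log file. A minimal Python sketch (the file-naming scheme is hypothetical):

```python
import logging
import os
import tempfile

def job_logger(job_id, log_dir):
    """Return a logger that writes to its own file, scoped by job id."""
    logger = logging.getLogger(f"job-{job_id}")
    logger.setLevel(logging.INFO)
    if not logger.handlers:  # avoid attaching duplicate handlers on repeated calls
        path = os.path.join(log_dir, f"job-{job_id}.log")
        logger.addHandler(logging.FileHandler(path))
    return logger

# Each job writes to its own file instead of one shared log.
log_dir = tempfile.mkdtemp()
job_logger("7", log_dir).info("sync started")
```

Debugging a failed sync then means reading `job-7.log` alone rather than grepping one interleaved file.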

Onboarding - set up the connection with frequency

In onboarding, after setting up the destination, the user should be asked to select a frequency to the syncing:
https://www.figma.com/file/AODY4oi15iI5l0fbxHvobt/Daxtarity?node-id=5782%3A45365

Here are the values of frequency we offer in v1.0:
manual, 5 min, 15 min, 30 min, 1 hour, 2 hours, 3 hours, 6 hours, 8 hours, 12 hours, 24 hours

User should also be able to go back to previous steps by clicking in tabs of header or source/destination in summary cell

@cgardens

Speed up image building

Tell us about the problem you're trying to solve

Building all application images takes ~10 minutes and is not incremental in the dev environment.
It needs to be faster.

Simplify Schedule persistence

Tell us about the problem you're trying to solve

StandardSchedule's lifecycle is the same as StandardSync's, but the code treats them as different entities.

Describe the solution you’d like

StandardSchedule should belong to the StandardSync

Sources section - List of sources

"Sources" should be the default page on which users land after the onboarding or in a new session.
Here is how the list should look: https://www.figma.com/file/AODY4oi15iI5l0fbxHvobt/Daxtarity?node-id=5856%3A0

In v1.0, user shouldn't be able to sort the list yet (this might change in the future).
The list should display some indicators:

  • status of the last sync with icons
  • source name
  • source connector
  • last sync date
  • frequency of the syncs
  • "Enabled" status

User should have the ability to disable / enable a source from this list.
When a source's last sync has failed, the status should display as failed, and the background should be red so it stands out.

In the case of a source being synced manually, the ENABLED toggle should be replaced by a button. When clicking this button, feedback should be displayed: https://www.figma.com/file/AODY4oi15iI5l0fbxHvobt/Daxtarity?node-id=5906%3A810

@cgardens

Support incremental syncs

Tell us about the problem you're trying to solve

Dataline currently only supports dumping the entire data source into the destination. I want to only sync the data that has changed since the last sync, not dump the entire dataset every time.

Describe the solution you’d like

When configuring a connection from the UI, I'd like to be able to configure it to be incremental instead of full refresh.
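A minimal sketch of what incremental sync means at the record level, assuming a sortable cursor field such as `updated_at` (all field names here are hypothetical):

```python
def incremental_sync(records, state):
    """Emit only records newer than the saved cursor, then advance the cursor.

    'state' persists between syncs; an empty state behaves like a full refresh.
    """
    cursor = state.get("updated_at", "")
    new_records = [r for r in records if r["updated_at"] > cursor]
    if new_records:
        state["updated_at"] = max(r["updated_at"] for r in new_records)
    return new_records, state
```

The first run emits everything; subsequent runs emit only what changed since the stored cursor, instead of dumping the entire dataset every time.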

1st screen - Set up your preferences

The first time the user launches the app, they should see this screen:
https://www.figma.com/file/AODY4oi15iI5l0fbxHvobt/Daxtarity?node-id=5894%3A3685

In this screen, we ask for their email address and some other options (data anonymization, newsletter, security updates) that are inactive until the user enters an email address.
This step should be optional for the user, as they can click on "Continue" without entering their email address.
If they enter their email address, the default settings should be:

  • Anonymize my data: off
  • Subscribe to newsletter: off
  • Get security updates: on

Once the user clicks on "Continue", they should be redirected towards the onboarding.

@cgardens

Validate Inputs for connection/create

Expected Behavior

  • The API should throw an exception if either the sourceImplementationId or the destinationImplementationId provided to connection/create do not exist. Currently, it will save the configuration without complaint.
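A sketch of the missing check, in Python for brevity (the real endpoint is Java, and every name here is hypothetical): the create handler should verify both ids exist before persisting anything.

```python
def create_connection(source_id, destination_id, sources, destinations):
    """Reject a connection whose endpoints don't exist, instead of saving it."""
    if source_id not in sources:
        raise KeyError(f"unknown sourceImplementationId: {source_id}")
    if destination_id not in destinations:
        raise KeyError(f"unknown destinationImplementationId: {destination_id}")
    # Only reached when both ids resolve; safe to persist.
    return {"source": source_id, "destination": destination_id}
```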


Remove ConfigFetchers

Tell us about the problem you're trying to solve

There is duplicated code between server and scheduler

Describe the solution you’d like

Find a way to remove these files

Pass only needed information in `StandardTapConfig`

Tell us about the problem you're trying to solve

StandardTapConfig currently contains a lot of "wide" data types which contain a lot more data than is needed to pull data from a Singer tap. This makes writing tests more tedious than is necessary and makes it harder to understand which inputs impact behavior.

Describe the solution you’d like

Create new types to capture exactly the information needed for taps, and remove any extra cruft.
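The idea, sketched in Python (StandardTapConfig itself is a Java type; the fields below are hypothetical): define a narrow type holding only the tap's inputs and project the wide config down to it.

```python
from dataclasses import dataclass

# Wide config: everything the platform knows about a sync.
wide = {
    "sourceId": "s1", "sourceConfig": {"host": "db"},
    "destinationId": "d1", "schedule": "5m", "workspaceId": "w1",
}

@dataclass(frozen=True)
class TapConfig:
    """Only what a tap actually needs (hypothetical narrowed type)."""
    source_config: dict

def to_tap_config(wide_config):
    # Project the wide config down; tests now construct one small field.
    return TapConfig(source_config=wide_config["sourceConfig"])
```

Tests for tap behavior then only have to build a `TapConfig`, and it is obvious which inputs can affect the tap at all.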

Clean up API exceptions handling

Tell us about the problem you're trying to solve

The exception handling in the API needs to be improved to ensure we send the proper HTTP responses.
eg:

  • validation can throw configNotFound

Decrease Logging

Expected Behavior

  • We are okay with logging a lot, especially at the beginning, but logs should be useful and human-readable. Right now we are logging a lot of useless stuff, which makes the useful stuff hard to consume.

Current Behavior

  • JsonReferenceProcessor logs a ton of useless stuff; we should filter it.
dataline-scheduler | 2020-09-10 00:49:15 TRACE JsonReferenceProcessor:191 - key=type
dataline-scheduler | 2020-09-10 00:49:15 TRACE JsonReferenceProcessor:145 - processed: []
  • this log line happens every second.
dataline-scheduler | 2020-09-10 00:51:46 INFO  JobSubmitter:54 - Running job-submitter...
  • Type Validator prints all the time (even when nothing is invalid)
dataline-scheduler | 2020-09-10 00:52:23 DEBUG TypeValidator:136 - validate( {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, $)
dataline-scheduler | 2020-09-10 00:52:23 DEBUG RequiredValidator:136 - validate( {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, $)
dataline-scheduler | 2020-09-10 00:52:23 DEBUG AdditionalPropertiesValidator:136 - validate( {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, $)
dataline-scheduler | 2020-09-10 00:52:23 DEBUG PropertiesValidator:136 - validate( {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, $)
dataline-scheduler | 2020-09-10 00:52:23 DEBUG RequiredValidator:136 - validate( {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, $)
dataline-scheduler | 2020-09-10 00:52:23 DEBUG TypeValidator:136 - validate( "60413b71-0fdb-4b7a-959f-7f316e2adfb0", {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, $.connectionId)
dataline-scheduler | 2020-09-10 00:52:23 DEBUG UUIDValidator:136 - validate( "60413b71-0fdb-4b7a-959f-7f316e2adfb0", {"connectionId":"60413b71-0fdb-4b7a-959f-7f316e2adfb0","manual":true}, $.connectionId)
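The scheduler is Java/Log4j, but the fix is the same in any logging framework: raise the level of the noisy validator loggers without touching application logs. In Python terms:

```python
import logging

# Noisy validator loggers get raised to WARNING so their DEBUG/TRACE chatter
# disappears, while application loggers stay at INFO. (Logger names here
# mirror the classes in the log excerpt; the Python API is illustrative.)
for name in ("JsonReferenceProcessor", "TypeValidator", "RequiredValidator"):
    logging.getLogger(name).setLevel(logging.WARNING)

logging.getLogger("app").setLevel(logging.INFO)
```

The Log4j equivalent is a per-logger level override in the logging config, which silences the per-request `validate(...)` lines shown above without losing warnings.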
