Comments (5)
Btw, I'm thinking of high-performance database cloning with https://github.com/sfu-db/connector-x. We could replace SQLAlchemy with it even now, but I'm not sure there would be much difference. It can, however, load directly to arrow tables, and once dlt can process arrow natively I could use it to load data from db to db almost directly.
from verified-sources.
Amazing! Thank you for the detailed feedback. Will work on these.
from verified-sources.
> Btw, I'm thinking of high-performance database cloning with https://github.com/sfu-db/connector-x. We could replace SQLAlchemy with it even now, but I'm not sure there would be much difference. It can, however, load directly to arrow tables, and once dlt can process arrow natively I could use it to load data from db to db almost directly.
Nice, I didn't know about this. We could use both together I guess. SQLAlchemy for schema reflection and building queries, and connector-x to run them.
from verified-sources.
There are a few weird things about it:
- it loads the whole dataset and gives you a pandas DataFrame or a pyarrow table, so it requires more dependencies and more memory and will not work for big datasets
- only a few sources are supported
- BigQuery credentials are passed as a file, so we should probably fall back to SQLAlchemy in that case
My take: if you want to experiment with it then great, but I'd do it as an option or as a separate pipeline. Just my 2c.
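A minimal sketch of the "as an option" idea above: use connector-x only when it is explicitly requested and the connection URL looks like something it handles, and fall back to SQLAlchemy otherwise (including for BigQuery). The `pick_backend` helper and the `CONNECTORX_SCHEMES` set are hypothetical names for illustration; the scheme set is an assumed subset, not connector-x's actual list of supported sources.

```python
from urllib.parse import urlsplit

# Assumption: an illustrative subset of schemes, not connector-x's authoritative list.
CONNECTORX_SCHEMES = {"postgresql", "mysql", "sqlite"}


def pick_backend(conn_url: str, prefer_connectorx: bool = False) -> str:
    """Return "connectorx" or "sqlalchemy" for the given connection URL."""
    if not prefer_connectorx:
        # connector-x stays opt-in; SQLAlchemy remains the default.
        return "sqlalchemy"
    # SQLAlchemy-style URLs may carry a driver suffix, e.g. "postgresql+psycopg2".
    scheme = urlsplit(conn_url).scheme.split("+")[0].lower()
    if scheme == "bigquery":
        # Credentials would have to be passed as a file, so fall back.
        return "sqlalchemy"
    return "connectorx" if scheme in CONNECTORX_SCHEMES else "sqlalchemy"
```

Keeping the choice in one small function means the rest of the pipeline does not need to know which library actually runs the query.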
from verified-sources.
@rudolfix Question about this:
> note that you can pass a list of resources/sources/generators to dlt.run method so people can pick their own tables.
This would be done by calling source().with_resources(...), right?
Can this be handled lazily somehow? I can't find a way to access the resource list from within the source.
I'm thinking I need to know which tables are requested before generating the resources dynamically, so this could replace my table_names argument. So something like this, where the source is a generator:
@dlt.source
def sql_database(...):
    ...
    # Get the names passed to `with_resources` from somewhere (if none, do full schema reflection)
    resource_names = get_requested_resources()
    # Lazily generate one resource per requested table
    for table in resource_names:
        yield dlt.resource(table_rows, ...)(...)
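The lazy pattern in the pseudocode can be sketched without dlt, using only the standard-library `sqlite3` module. This is not dlt's API and does not answer whether the `with_resources` selection is readable from inside a source: here the requested names still arrive through an explicit `table_names` argument, and full reflection happens only when it is omitted. The `table_rows` and `sql_database` functions below are plain-Python stand-ins for the names used in the thread.

```python
import sqlite3
from typing import Iterator, Optional, Sequence, Tuple


def table_rows(conn: sqlite3.Connection, table: str) -> Iterator[tuple]:
    """Yield the rows of one table; no query runs until iteration starts."""
    # Quote the identifier, since table names cannot be bound as parameters.
    quoted = '"' + table.replace('"', '""') + '"'
    yield from conn.execute(f"SELECT * FROM {quoted}")


def sql_database(
    conn: sqlite3.Connection, table_names: Optional[Sequence[str]] = None
) -> Iterator[Tuple[str, Iterator[tuple]]]:
    """Yield (name, row generator) pairs, reflecting the schema only if needed."""
    if table_names is None:
        # Full "reflection": list every table in the database.
        table_names = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
    for name in table_names:
        yield name, table_rows(conn, name)
```

Calling `dict(sql_database(conn, ["users"]))` touches only the `users` table, and even then no rows are fetched until the generator is consumed.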
from verified-sources.