datacoves / dbt-coves
CLI tool for dbt users to simplify creation of staging models (yml and sql) files
Home Page: https://pypi.org/project/dbt-coves/
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
no
Describe the solution you'd like
use sqlfluff pre-commit hook
https://docs.sqlfluff.com/en/stable/production.html#using-pre-commit
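Following the linked sqlfluff docs, a minimal .pre-commit-config.yaml could look like the sketch below; the pinned rev is an assumption and should be set to whatever sqlfluff release the project standardizes on:

```yaml
repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 2.3.5  # assumption: pin to the sqlfluff release your project uses
    hooks:
      - id: sqlfluff-lint
      - id: sqlfluff-fix
```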
Describe the bug
When using dbt-coves generate sources, the resulting SQL file doesn't consider arrays. These are cast as strings.
Also, dbt-coves doesn't traverse the array when generating the source.yml file; only the top level is represented as a column.
This is at least true for BigQuery.
To Reproduce
Steps to reproduce the behaviour:
dbt-coves generate sources
Expected behaviour
The SQL should contain all fields and wrap them in structs and arrays correctly.
The yml should show all levels of columns.
Best
Andreas
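The traversal the issue asks for could be sketched as a recursive walk over nested RECORD/REPEATED fields. The dict-based schema below is a simplified stand-in for the real BigQuery schema objects, not the actual dbt-coves implementation:

```python
def walk_fields(fields, prefix=""):
    """Recursively yield dotted column paths for a (simplified) BigQuery schema.

    Each field is a dict with 'name', 'type', 'mode', and optional nested
    'fields' for RECORD/STRUCT types -- a stand-in for the real schema API.
    """
    for f in fields:
        path = f"{prefix}{f['name']}"
        yield path, f["type"], f.get("mode", "NULLABLE")
        if f["type"] == "RECORD":  # STRUCT: recurse into nested fields
            yield from walk_fields(f.get("fields", []), prefix=path + ".")

schema = [
    {"name": "id", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "tags", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "key", "type": "STRING"},
        {"name": "value", "type": "STRING"},
    ]},
]

columns = list(walk_fields(schema))
for path, typ, mode in columns:
    print(path, typ, mode)
```

With this kind of walk, every nested level would show up as a column entry in the yml, not just the top level.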
Describe the bug
When I use dbt-coves generate sources in a dbt project with BigQuery and I create models for two tables with the same name but different cases, the function only returns one of the models and overwrites the other.
To Reproduce
Steps to reproduce the behaviour:
Describe the bug
The cookiecutter template used is assuming 0.19 as the dbt version
To Reproduce
Install dbt-coves and then run dbt deps
Expected behaviour
To install compatible packages with dbt 0.20
Ideally dbt-coves init should recognize the installed dbt version and generate the appropriate packages.yml file.
Model properties files (yaml) currently don't generate any field or table descriptions.
It would be nice to add a new parameter --metadata metadata.csv where dbt-coves looks for field descriptions, types, and table descriptions; using that information, the generated yaml becomes almost complete.
This information would also be used to specify the types in the .sql file when flattening.
The structure of the metadata.csv file is https://docs.google.com/spreadsheets/d/1bLWFXt3XhMwTWpNgcXMQovGvITEPIjuOCDarY7xevsY/edit?usp=sharing
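Since the exact CSV layout lives in the linked spreadsheet, the column names below (relation, field, type, description) are an assumption; this is only a sketch of how such a file could be indexed for yml generation:

```python
import csv
import io

def load_metadata(fileobj):
    """Index a metadata CSV by (relation, field) -> (type, description).

    The header names used here are hypothetical; the authoritative layout
    is defined in the linked spreadsheet.
    """
    index = {}
    for row in csv.DictReader(fileobj):
        index[(row["relation"], row["field"])] = (row["type"], row["description"])
    return index

sample = io.StringIO(
    "relation,field,type,description\n"
    "country_codes,cldr_display_name,varchar,Localized country name\n"
)
meta = load_metadata(sample)
print(meta[("country_codes", "cldr_display_name")])
```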
Describe the bug
Generate source throws exception when VARIANT contains no json
To Reproduce
Run dbt-coves generate sources on a table with a VARIANT column containing no JSON.
Is your feature request related to a problem? Please describe.
Generating full .yml files (model name, list of columns, empty description fields) for every model in a dbt project - thorough documentation.
Describe the solution you'd like
If the current generate sources function had a flag that disables .sql file creation (a yml-only flag), it would achieve exactly that. Although it's set up for sources specifically, if you provide your analytics schema in the .dbt_coves.yml, it will create the yml definitions for all of the dbt models in a project.
Describe alternatives you've considered
As an alternative I had a wrapper python function that captured terminal output of this function https://github.com/dbt-labs/dbt-codegen/blob/0.3.2/macros/generate_model_yaml.sql for each model and created the ymls.
Additional context
The generate sources function is extremely useful in creating staging models for a dbt project. It seems, however, that it could be easily extended to help with thorough documentation of all models within a dbt project. Ideally, it wouldn't then put the "sources" part at the top of the file, but that's much easier to get rid of than creating all of the ymls from scratch!
I have two databases, a source database where I load all my raw data to, and then a data model database where dbt operates. So my dbt profile is set to the data model database but has a role that can read from the source database. But the generate sources command doesn't look in my source database, only my data model database.
Describe the bug
A clear and concise description of what the bug is.
After running dbt-coves setup all and going through the prompts, I get:
dbt init SUCCESS ✔
❌ 'bool' object has no attribute 'parent'
To Reproduce
Steps to reproduce the behaviour:
dbt-coves setup all
Expected behaviour
To not get the ❌ 'bool' object has no attribute 'parent' output.
Desktop (please complete the following information):
pip freeze:
Describe the bug
Pre-existing contents of schema.yml:
version: 2
sources:
- name: my_source_name
database: my_database_name
description: 'here is a test source description'
tables:
- name: my_table_name
description: 'here is a test table description'
Command:
dbt-coves generate sources --database my_database_name --schemas my_schema_name --update-strategy update
Contents of schema.yml after executing command:
version: 2
sources:
- name: my_source_name
database: my_database_name
description: here is a test source description
tables:
- name: my_table_name
description: here is a test table description
To Reproduce
Steps to reproduce the behaviour:
dbt-coves generate sources --database my_database_name --schemas my_schema_name --update-strategy update
Expected behaviour
I would expect it to leave existing descriptions untouched
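The expected update-strategy behaviour amounts to a merge rule where a non-empty, hand-written description always wins over the regenerated placeholder. A minimal sketch of that rule, with plain dicts standing in for the parsed yml:

```python
def merge_descriptions(existing, generated):
    """Merge generated entries into existing ones, keeping any non-empty
    description already written by hand (sketch of an 'update' strategy)."""
    merged = dict(generated)
    for name, desc in existing.items():
        if name in merged and desc:
            merged[name] = desc  # hand-written description wins
    return merged

existing = {"my_table_name": "here is a test table description"}
generated = {"my_table_name": "", "new_table": ""}
result = merge_descriptions(existing, generated)
print(result)
```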
Desktop (please complete the following information):
Describe the bug
Lint test failed, but shows as passed on CI job
To Reproduce
Break a pre-commit rule and push to repo.
Describe the current behavior
If I have a pre-existing some_staging_model.sql as:
SELECT
col1
FROM
some_table
WHERE
col1 = FALSE
and there is a new column that should be added because it was recently added in the source, then generate sources will remove my existing transformations when it updates some_staging_model.sql as:
SELECT
col1
, col2
FROM
some_table
Requested behaviour
I would expect that it would retain the existing SQL, and just add new columns / drop newly-deleted columns
Desktop (please complete the following information):
Additional context
Contents of my .dbt_coves.config.yml:
generate:
sources:
update_strategy: update
Maybe this should only be the case if the update strategy is set to "update"?
I'm trying to automate my staging layer refresh using dbt-coves. If, however, I'll lose all staging layer transformations every time it's run, then I can't use it for this purpose.
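The retain-and-diff behaviour the requester describes could be sketched as comparing the database's current columns against the ones already in the model, rather than regenerating the file. This is only an illustration of the requested logic, not the tool's actual behaviour:

```python
def column_diff(db_columns, model_columns):
    """Return (columns to add, columns to drop) so existing SQL can be
    patched instead of overwritten; database order is preserved."""
    to_add = [c for c in db_columns if c not in model_columns]
    to_drop = [c for c in model_columns if c not in db_columns]
    return to_add, to_drop

# col2 was recently added in the source; col1's transformations stay intact.
diff = column_diff(["col1", "col2"], ["col1"])
print(diff)
```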
Is your feature request related to a problem? Please describe.
It can be painful to create .yml files for dbt models; the behaviour of dbt-coves generate properties would be helpful for models also.
Describe the solution you'd like
Addition of command dbt-coves generate properties, which searches the manifest for models without a patch_path, presents an interface for selection like dbt-coves generate sources does, and creates a templated .yml file with all columns listed for selected models.
Describe alternatives you've considered
dbt-sugar contains this functionality, as does codegen. However, using multiple separate tools to do essentially the same thing (from the user perspective) is not ideal, as it requires the user to remember which tool does which specific use case.
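The manifest scan the solution describes could be sketched as below; the node keys (nodes, resource_type, patch_path) follow dbt's manifest.json artifact, but the rest is a hypothetical illustration:

```python
def models_without_properties(manifest):
    """List model names in a parsed manifest.json that have no patch_path,
    i.e. models not yet covered by a .yml properties file."""
    return [
        node["name"]
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model" and not node.get("patch_path")
    ]

manifest = {"nodes": {
    "model.proj.stg_orders": {"name": "stg_orders", "resource_type": "model",
                              "patch_path": "proj://models/staging/stg_orders.yml"},
    "model.proj.stg_customers": {"name": "stg_customers", "resource_type": "model",
                                 "patch_path": None},
}}
candidates = models_without_properties(manifest)
print(candidates)
```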
Is your feature request related to a problem? Please describe.
I'm trying to run generate sources automatically via cron on a virtual machine. But, from what I can tell, there isn't a way to suppress some of the prompts (choose relations, flatten JSON, etc.).
It also seems that I have to hardcode the schemas I'd like, using config.yml.
Describe the solution you'd like
I'd like a simple way to be able to generate sources for all relations in a schema, and for all schemas in a database.
I'd like to be able to have the following in my config
Describe alternatives you've considered
Not being able to automate it. The closest I've come is hard-coding my schemas in config.yml and then still needing to execute manually so I can interact with the prompts.
Running a command that expects a subcommand will return this:
Can't instantiate abstract class GenerateTask with abstract methods run
which is misleading.
Handle the error and print the help instead.
I noticed that when I re-run generate sources, sources and staging models whose base tables have been deleted from the database are not modified or deleted. Is that the expected behavior?
Describe the solution you'd like
We're currently able to supply select, exclude, and filter arguments when generating properties. It would be very helpful to do the same when generating sources.
Describe alternatives you've considered
Manually selecting sources via the CLI
Many sources are based on APIs and often times these APIs are well documented. It would be great to have the option to get descriptions for all fields in a source from the OpenAPI specification.
Update model properties files' table and field descriptions by running dbt-coves docs update --from metadata.csv.
The csv will contain identifiers for tables and fields so dbt-coves could take the descriptions set and update the corresponding model properties yaml files.
Models could be filtered using a --m argument, as dbt does.
Is your feature request related to a problem? Please describe.
When initializing a new dbt project, it's good to create every file in the current folder instead of on a new one.
Describe the solution you'd like
By passing the argument --current-dir, the initialization will happen in the current directory.
Is your feature request related to a problem? Please describe.
Sometimes a column in a table may have the name GROUP, ORDER, START, SCHEMA, TABLE, etc.
generate sources doesn't quote these columns, causing issues when dbt is run.
Describe the solution you'd like
I'd like generate sources to automatically quote column names when they conflict with database keywords.
Describe alternatives you've considered
Manually modifying the templates with huge if x or x conditions to add quotes where needed.
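The requested quoting could be sketched as a simple keyword check; the keyword set below is a tiny illustrative subset, and a real implementation would use the full reserved-word list of the target warehouse:

```python
# Assumption: a small subset of reserved words for illustration only.
RESERVED = {"GROUP", "ORDER", "START", "SCHEMA", "TABLE"}

def quote_if_reserved(column):
    """Double-quote a column name when it collides with a SQL keyword."""
    return f'"{column}"' if column.upper() in RESERVED else column

print(quote_if_reserved("group"), quote_if_reserved("amount"))
```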
Currently dbt-coves init does this for you. But, most of the time, the repo already exists and we need to add the missing components manually.
Is your feature request related to a problem? Please describe.
dbt-coves can set up github actions, airflow, and other pieces of the data stack in a pre-configured simple way.
dbt-coves setup macros # use dbt-package
dbt-coves update macros # use a version matrix and update packages.yml for the user
These are all things we can "setup" for the user but require ongoing lifecycle management.
dbt-coves setup github-actions
dbt-coves setup --update github-actions
dbt-coves setup airflow
dbt-coves setup --update airflow
dbt-coves setup permifrost
dbt-coves setup --update permifrost
dbt-coves setup training-and-tools
dbt-coves setup --update training-and-tools
dbt-coves setup snowpark
dbt-coves setup --update snowpark
Managing the lifecycle of files in user owned repos involves change management. Copier enables efficient change management (including handling merge conflicts) and leverages a centralized repository of templates which provide the actual content. Furthermore by wrapping in dbt-coves setup, we will be able to provide a changelog to assist users in updating dbt-coves derived components in their data stack codebase. We can also consider a dbt-coves setup --update --all [--dry-run ?]
command which checks for component updates and outputs actions / changelogs to terminal.
Is your feature request related to a problem? Please describe.
Sometimes you need to reuse the same python function many times.
Describe the solution you'd like
It would be good to define common functions in dbt macros so they can be used in dbt-coves templates.
Describe the bug
When running dbt-coves generate the output is:
Can't instantiate abstract class GenerateTask with abstract methods run
To Reproduce
dbt-coves generate
Expected behaviour
To show a list of possible arguments.
Describe the bug
'list' object has no attribute 'keys' when flattening table created by airbyte (github_issues)
To Reproduce
Use Airbyte to ingest data from github issues.
Run dbt-coves generate sources; when prompted to flatten data, answer yes.
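The traceback suggests the flattener assumes every nested JSON value is an object, while GitHub issue payloads contain arrays. A hedged sketch of a flatten that guards both cases (not the actual dbt-coves code):

```python
def flatten(value, prefix=""):
    """Flatten nested JSON into dotted keys, handling both objects and
    arrays -- the "'list' object has no attribute 'keys'" case."""
    out = {}
    if isinstance(value, dict):
        for k, v in value.items():
            out.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(value, list):
        for i, v in enumerate(value):
            out.update(flatten(v, f"{prefix}{i}."))
    else:
        out[prefix.rstrip(".")] = value
    return out

flat = flatten({"labels": [{"name": "bug"}]})
print(flat)
```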
Describe the bug
The GitHub airbyte connector created a field named default in lowercase in the github_issue_labels table. dbt-coves does not double-quote case-sensitive snowflake columns.
Describe the bug
Unable to install dbt-coves v1.3 from PyPI (via pip install)
To Reproduce
python -m venv venv_test
source venv_test/bin/activate
pip install --upgrade pip
pip install dbt-coves~=1.3
Expected behaviour
Successful installation (I noticed v1.4 was yanked on PyPI, so pinning to ~=1.3, which I'm assuming is >=1.3,<1.4)
Console Log/Tracebacks
(note, I added line breaks to make it more human-readable)
ERROR: Ignored the following versions that require a different python version: 0.19.1a7 Requires-Python >=3.7,<3.9; 0.19.1a8
Requires-Python >=3.7,<3.9; 0.19.1a9 Requires-Python >=3.7,<3.9; 0.19.2a10 Requires-Python >=3.7,<3.9; 0.20.0a1
Requires-Python >=3.7,<3.9; 0.20.0a2 Requires-Python >=3.7,<3.9; 0.20.0a3 Requires-Python >=3.7,<3.9; 0.21.0a1
Requires-Python >=3.7,<3.9; 0.21.0a10 Requires-Python >=3.7,<3.9; 0.21.0a11 Requires-Python >=3.7,<3.9; 0.21.0a12
Requires-Python >=3.7,<3.9; 0.21.0a13 Requires-Python >=3.7,<3.9; 0.21.0a14 Requires-Python >=3.7,<3.9; 0.21.0a15
Requires-Python >=3.7,<3.9; 0.21.0a16 Requires-Python >=3.7,<3.9; 0.21.0a17 Requires-Python >=3.7,<3.9; 0.21.0a18
Requires-Python >=3.7,<3.9; 0.21.0a2 Requires-Python >=3.7,<3.9; 0.21.0a3 Requires-Python >=3.7,<3.9; 0.21.0a4
Requires-Python >=3.7,<3.9; 0.21.0a5 Requires-Python >=3.7,<3.9; 0.21.0a6 Requires-Python >=3.7,<3.9; 0.21.0a7
Requires-Python >=3.7,<3.9; 0.21.0a8 Requires-Python >=3.7,<3.9; 0.21.0a9 Requires-Python >=3.7,<3.9; 0.21.1a19
Requires-Python >=3.7,<3.9; 0.21.1a20 Requires-Python >=3.7,<3.9
ERROR: Could not find a version that satisfies the requirement dbt-coves~=1.3 (from versions: 1.0.4a1, 1.0.4a2, 1.0.4a3,
1.0.4a4, 1.0.4a17, 1.0.4a18, 1.0.4a19, 1.0.4a20, 1.0.4a21, 1.0.4a22, 1.0.4a23, 1.0.4a24, 1.0.4a25, 1.0.4a26, 1.0.4a27, 1.0.4a28,
1.0.4a29, 1.0.4a30, 1.0.5a1, 1.0.5a2, 1.0.5a3, 1.1.0a1, 1.1.0a2, 1.1.0a3, 1.1.0a4, 1.1.0a5, 1.1.0a6, 1.1.0a7, 1.1.0a8, 1.1.0a9, 1.1.1a0,
1.1.1a1, 1.1.1a2, 1.1.1a3, 1.1.1a4, 1.1.1a5, 1.1.1a6, 1.1.1a7, 1.1.1a8, 1.1.1a9, 1.1.1a10, 1.1.1a11, 1.1.1a12, 1.1.1a13, 1.1.1a14, 1.1.1a15,
1.1.1a16, 1.1.1a17, 1.1.1a18, 1.1.1a19, 1.1.1a20, 1.1.1a21, 1.1.1a22, 1.1.1a23, 1.1.1a24, 1.1.1a25, 1.1.1a26, 1.1.1a27, 1.1.1a28, 1.1.1a29,
1.1.1a30, 1.1.1a31, 1.1.1a32, 1.1.1a33, 1.3.0a1, 1.3.0a2, 1.3.0a3, 1.3.0a4, 1.3.0a5, 1.3.0a6, 1.3.0a7, 1.3.0a8, 1.3.0a9, 1.3.0a10,
1.3.0a11, 1.3.0a12, 1.3.0a13, 1.3.0a14, 1.3.0a15, 1.3.0a16, 1.3.0a17, 1.3.0a18, 1.3.0a19, 1.3.0a20, 1.3.0a21, 1.3.0a22, 1.3.0a23,
1.3.0a24, 1.3.0a25, 1.3.0a26, 1.3.0a27, 1.3.0a28, 1.4.0)
ERROR: No matching distribution found for dbt-coves~=1.3
Desktop (please complete the following information):
Additional context
python --version
Python 3.9.13
Is your feature request related to a problem? Please describe.
Since dbt requires unique model names, our models are named by the convention models/schema/schema__table.sql
. When running generate properties, the models are picked up by the wrong names, e.g.
Model locations.locations__county not materialized, did you execute dbt run?.
Model locations.locations__state not materialized, did you execute dbt run?.
Where the models are actually named
locations.state
locations.county
Describe the solution you'd like
Allow for custom model naming conventions so that model names do not have to equal table names.
Is your feature request related to a problem? Please describe.
In a regular workflow, most times (if not all) the model.sql will be modified (mainly renaming columns and converting/casting), and then the model.yml will need to be regenerated anyway to reflect those modifications.
Describe the solution you'd like
The option to not generate source.yml when dbt-coves generate sources --database <db_name> is called.
Describe alternatives you've considered
Noel suggested changing the path for the generated yml during source creation to go to /tmp, though the directory can cause added confusion.
Is your feature request related to a problem? Please describe.
When you have dozens or hundreds of tables, it's difficult to pick the right ones.
Describe the solution you'd like
Filter schemas by running dbt-coves generate sources --schemas=RAW_*.
Filter tables by running dbt-coves generate sources --relations=S*RC_*.
Describe the bug
When defining sources_destination, models_destination, etc. in .dbt_coves/config.yml, the first reference to schema cannot be altered, while subsequent references can be.
To Reproduce
Steps to reproduce the behaviour:
Define:
models_destination: "models/staging/{{ schema }}/{{ schema | replace('raw','stg') }}__{{ relation }}.sql"
dbt-coves functions as expected, but with
models_destination: "models/staging/{{ schema | replace('raw','') }}/{{ schema | replace('raw','stg') }}__{{ relation }}.sql"
the first reference to schema returns a blank string.
Expected behaviour
I would expect referencing the same variable in the same line to function identically.
Describe the bug
In dbt-coves generate sources, a CSV Airbyte source containing backslashes/hyphens in the column names causes issues on Snowflake.
To Reproduce
Steps to reproduce the behaviour:
Expected behaviour
Expected output:
_airbyte_data:"cldr display name"::varchar as cldr_display_name,
Actual output:
_airbyte_data:cldr display name::varchar as cldr display name,
Results in dbt run:
Database Error in model _airbyte_raw_country_codes (models/sources/raw/_airbyte_raw_country_codes.sql)
001003 (42000): SQL compilation error:
syntax error line 13 at position 35 unexpected 'name'.
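The fix the issue expects amounts to quoting the raw JSON key while emitting a safe snake_case alias. A sketch of that transformation (the function name and alias rule are assumptions, not dbt-coves internals):

```python
import re

def variant_column(key, sql_type="varchar"):
    """Build a Snowflake VARIANT extraction with a double-quoted JSON key
    and a snake_case alias, as the expected output in the issue shows."""
    alias = re.sub(r"[^0-9a-zA-Z_]+", "_", key).strip("_").lower()
    return f'_airbyte_data:"{key}"::{sql_type} as {alias},'

line = variant_column("cldr display name")
print(line)
```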
Describe the bug
I had dbt-redshift installed in a pipenv, and when I installed dbt-coves it looks like it installed dbt 0.19.2 (without a specific dbt adapter specified).
To Reproduce
Steps to reproduce the behaviour:
installed version: 0.19.2
latest version: 0.19.2
Up to date!
Plugins:
- bigquery: 0.18.2
- postgres: 0.19.2
- redshift: 0.19.2
- snowflake: 0.18.2
Expected behaviour
Installing dbt-coves shouldn't clobber an existing dbt install which has specified a specific adapter
Desktop (please complete the following information):
Windows 10
Python 3.8
dbt-coves v0.19.2-a.10
Is your feature request related to a problem? Please describe.
Do more comprehensive flattening of JSON data.
Describe the solution you'd like
Create additional tables when JSON field contains multiple levels of data.
Describe the bug
Database Error 000904 (42000): SQL compilation error: error line 1 at position 19 invalid identifier 'ISSUE'
To Reproduce
Error when flattening github_pull_requests__links.
Describe the bug
Table not found on BigQuery datastet
To Reproduce
dbt-coves generate source
Expected behaviour
Generate files from source tables
Console Log/Tracebacks
Which schemas would you like to inspect? [RAW_RTB_HOUSE_GSHEETS]
No tables/views found in RAW_RTB_HOUSE_GSHEETS schema.
Desktop (please complete the following information):
Additional context
While the console info is all uppercase, all my source datasets and tables are lowercase.
Is your feature request related to a problem? Please describe.
dbt has the option to define column descriptions in markdown files. It would be great to have an option in dbt-coves to generate schema descriptions in the style of:
- name: status
description: '{{ doc("orders_status") }}'
And generate a docs.md file that has the columns defined, so I would just need to add my descriptions there.
Is your feature request related to a problem? Please describe.
dbt-coves seems to do a lot to help users adopt analytics engineering best practices.
One of the best practices is documentation.
A problem I (and a lot of people) have is that you have to repeat the column descriptions in every downstream model.
This results in either a bunch of time copying and pasting or downstream models not having proper descriptions.
Describe the solution you'd like
What I would like to do is just do column descriptions once at the base model and then have that propagate through the DAG to all downstream models.
Describe alternatives you've considered
Use PyYAML to make my own solution (more of a learning project since this is not my speciality)
dbt-osmosis has a way to "Automatically generate documentation based on upstream documented columns"
dbt-osmosis yaml document --project-dir ... --profiles-dir ...
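A single-hop version of the requested propagation could be sketched as copying descriptions from an upstream model to downstream columns with the same name; a full solution would walk the DAG, but the core rule looks like this (an illustration, not dbt-osmosis or dbt-coves code):

```python
def propagate_descriptions(upstream_cols, downstream_cols):
    """Fill empty downstream column descriptions from the upstream model,
    matching by column name; existing downstream descriptions are kept."""
    return {
        name: desc or upstream_cols.get(name, "")
        for name, desc in downstream_cols.items()
    }

upstream = {"order_id": "Primary key of orders"}
downstream = {"order_id": "", "amount": ""}
propagated = propagate_descriptions(upstream, downstream)
print(propagated)
```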
dbt-coves v0.20.0-a.3 dbt v0.20.2
Can't instantiate abstract class GenerateTask with abstract methods run
To Reproduce
Steps to reproduce the behaviour:
I assume this is because dbt v0.20.2 is not supported.
I've installed dbt-coves in a fresh venv. It might be a good idea to constrain the dependencies to supported dbt versions to make getting started easier.
Is your feature request related to a problem? Please describe.
Sometimes you need to exclude some relations when running dbt-coves generate sources and avoid passing all the possible options using just the relations filter.
Describe the solution you'd like
Running dbt-coves generate sources --exclude-relations _airbyte* will process all relations except those starting with _airbyte.
The --exclude-relations filter should be case-insensitive.
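The requested case-insensitive wildcard exclusion could be sketched with the standard library's fnmatch; this is only an illustration of the proposed behaviour:

```python
import fnmatch

def exclude_relations(relations, pattern):
    """Drop relations matching the wildcard pattern, case-insensitively,
    by lowercasing both sides before matching."""
    pat = pattern.lower()
    return [r for r in relations if not fnmatch.fnmatch(r.lower(), pat)]

kept = exclude_relations(["_AIRBYTE_RAW_USERS", "orders"], "_airbyte*")
print(kept)
```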
Is your feature request related to a problem? Please describe.
It seems like running generate sources sends the DESCRIBE TABLE ... statements to Snowflake sequentially, one by one, as it goes. It'd be great if this went a lot faster.
Describe the solution you'd like
Would it be reasonable to queue up all those database statements up front and run through them as the results return, so that it completes much more quickly?
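Since DESCRIBE statements are I/O-bound network round-trips, queueing them on a thread pool would let them overlap. A minimal sketch, with a stand-in function in place of a real Snowflake call:

```python
from concurrent.futures import ThreadPoolExecutor

def describe_table(table):
    """Stand-in for running DESCRIBE TABLE against Snowflake."""
    return f"described {table}"

tables = ["orders", "customers", "payments"]

# Queue every DESCRIBE up front and let the pool drain them concurrently;
# pool.map still returns results in the original submission order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(describe_table, tables))

print(results)
```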
Describe alternatives you've considered
Can't really think of any apart from just using it as it is and waiting much longer.
Additional context
python 3.10.9
Snowflake 7.3.1
macOS 13.2 (22D49)
output of pip freeze:
agate==1.6.3
asn1crypto==1.5.1
attrs==22.2.0
Babel==2.11.0
bump2version==1.0.1
bumpversion==0.6.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
colorama==0.4.5
commonmark==0.9.1
cryptography==36.0.2
dbt-core==1.3.2
dbt-coves==1.3.0a25
dbt-extractor==0.4.1
dbt-snowflake==1.3.0
filelock==3.9.0
future==0.18.3
hologram==0.0.15
idna==3.4
importlib-metadata==6.0.0
isodate==0.6.1
jaraco.classes==3.2.3
Jinja2==3.1.2
jsonschema==3.2.0
keyring==23.13.1
leather==0.3.4
Logbook==1.5.3
luddite==1.0.2
MarkupSafe==2.1.2
mashumaro==3.0.4
minimal-snowplow-tracker==0.0.2
more-itertools==9.0.0
msgpack==1.0.4
networkx==2.8.8
oscrypto==1.3.0
packaging==21.3
parsedatetime==2.4
pathspec==0.9.0
pretty-errors==1.2.25
prompt-toolkit==3.0.36
pycparser==2.21
pycryptodomex==3.17
pydantic==1.10.4
pyfiglet==0.8.post1
Pygments==2.14.0
PyJWT==2.6.0
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyrsistent==0.19.3
python-dateutil==2.8.2
python-slugify==7.0.0
pytimeparse==1.1.8
pytz==2022.7.1
PyYAML==6.0
questionary==1.10.0
requests==2.28.2
rich==12.6.0
ruamel.yaml==0.17.21
ruamel.yaml.clib==0.2.7
six==1.16.0
snowflake-connector-python==2.7.12
sqlparse==0.4.3
text-unidecode==1.3
typing_extensions==4.4.0
urllib3==1.26.14
wcwidth==0.2.6
Werkzeug==2.2.2
yamlloader==1.2.2
zipp==3.12.0
We create one file per schema, as opposed to per table. Currently I am manually merging the table-specific data into a single file, but it would be nice for this to happen automatically.
Given that the strategy config exists at all, I assume this is already under consideration for the future!
Describe the bug
A clear and concise description of what the bug is.
For a single schema, dbt-coves is unable to detect columns via generate sources, producing the following SQL:
SELECT
FROM {{ source('xyz', 'abc') }}
and YML:
- name: xyz__abc
description: ''
columns:
but for all other schemas, dbt-coves correctly generates sources. I'm unable to figure out why this is the case. Some debugging suggestions would be appreciated.
Describe the bug
If I run generate sources, then add a column to one of the tables for which the sources were created, then re-run generate sources, the new column is not picked up. I'm not sure if this is expected or not.
To Reproduce
Steps to reproduce the behaviour:
generate sources
generate sources again
Expected behaviour
I would expect that the new column would be added in the sources.yml file (and for it to potentially remove dropped columns).
Console Log/Tracebacks
Please avoid screenshots if possible; instead, copy and paste the console output and wrap it in a code block.
Desktop (please complete the following information):