datacoves / dbt-coves
CLI tool for dbt users to simplify creation of staging models (yml and sql) files
Home Page: https://pypi.org/project/dbt-coves/
License: Apache License 2.0
Is your feature request related to a problem? Please describe.
no
Describe the solution you'd like
use sqlfluff pre-commit hook
https://docs.sqlfluff.com/en/stable/production.html#using-pre-commit
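Following the linked sqlfluff docs, a minimal .pre-commit-config.yaml could look like the sketch below; the pinned rev is an assumption and should be set to whatever sqlfluff release the project standardizes on:

```yaml
repos:
  - repo: https://github.com/sqlfluff/sqlfluff
    rev: 2.3.5  # assumption: pin to the sqlfluff release your project uses
    hooks:
      - id: sqlfluff-lint
      - id: sqlfluff-fix
```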
Describe the bug
When using dbt-coves generate sources, the resulting SQL file doesn't consider arrays. These are cast as strings.
Also, dbt-coves doesn't traverse the array when generating the source.yml file; only the top level is represented as a column.
This is at least true for BigQuery.
To Reproduce
Steps to reproduce the behaviour:
dbt-coves generate sources
Expected behaviour
The SQL should contain all fields and wrap them in structs and arrays correctly.
The yml should show all levels of columns.
Best
Andreas
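The traversal the issue asks for could be sketched as a recursive walk over nested RECORD/REPEATED fields. The dict-based schema below is a simplified stand-in for the real BigQuery schema objects, not the actual dbt-coves implementation:

```python
def walk_fields(fields, prefix=""):
    """Recursively yield dotted column paths for a (simplified) BigQuery schema.

    Each field is a dict with 'name', 'type', 'mode', and optional nested
    'fields' for RECORD/STRUCT types -- a stand-in for the real schema API.
    """
    for f in fields:
        path = f"{prefix}{f['name']}"
        yield path, f["type"], f.get("mode", "NULLABLE")
        if f["type"] == "RECORD":  # STRUCT: recurse into nested fields
            yield from walk_fields(f.get("fields", []), prefix=path + ".")

schema = [
    {"name": "id", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "tags", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "key", "type": "STRING"},
        {"name": "value", "type": "STRING"},
    ]},
]

columns = list(walk_fields(schema))
for path, typ, mode in columns:
    print(path, typ, mode)
```

With this kind of walk, every nested level would show up as a column entry in the yml, not just the top level.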
Describe the bug
When I use dbt-coves generate sources in a dbt project with BigQuery and I create models for two tables with the same name but different cases, the function only returns one of the models and overwrites the other.
To Reproduce
Steps to reproduce the behaviour:
Describe the bug
The cookiecutter template used is assuming 0.19 as the dbt version
To Reproduce
Install dbt-coves and then run dbt deps
Expected behaviour
To install compatible packages with dbt 0.20
Ideally dbt-coves init should recognize the installed dbt version and generate the appropriate packages.yml file.
Model properties files (yaml) currently don't generate any field or table descriptions.
It would be nice to add a new parameter --metadata metadata.csv where dbt-coves looks for field descriptions, types, and table descriptions; using that information, the generated yaml becomes almost complete.
This information would also be used to specify the types in the .sql file when flattening.
The structure of the metadata.csv file is https://docs.google.com/spreadsheets/d/1bLWFXt3XhMwTWpNgcXMQovGvITEPIjuOCDarY7xevsY/edit?usp=sharing
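Since the exact CSV layout lives in the linked spreadsheet, the column names below (relation, field, type, description) are an assumption; this is only a sketch of how such a file could be indexed for yml generation:

```python
import csv
import io

def load_metadata(fileobj):
    """Index a metadata CSV by (relation, field) -> (type, description).

    The header names used here are hypothetical; the authoritative layout
    is defined in the linked spreadsheet.
    """
    index = {}
    for row in csv.DictReader(fileobj):
        index[(row["relation"], row["field"])] = (row["type"], row["description"])
    return index

sample = io.StringIO(
    "relation,field,type,description\n"
    "country_codes,cldr_display_name,varchar,Localized country name\n"
)
meta = load_metadata(sample)
print(meta[("country_codes", "cldr_display_name")])
```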
Describe the bug
Generate source throws exception when VARIANT contains no json
To Reproduce
Run dbt-coves generate sources on a table with a VARIANT column containing no JSON.
Is your feature request related to a problem? Please describe.
Generating full .yml files (model name, list of columns, empty description fields) for every model in a dbt project - thorough documentation.
Describe the solution you'd like
If the current generate sources function had a flag that disables .sql file creation (a yml-only flag), it would achieve exactly that. Although it's set up for sources specifically, if you provide your analytics schema in the .dbt_coves.yml, it will create the yml definitions for all of the dbt models in a project.
Describe alternatives you've considered
As an alternative I had a wrapper python function that captured terminal output of this function https://github.com/dbt-labs/dbt-codegen/blob/0.3.2/macros/generate_model_yaml.sql for each model and created the ymls.
Additional context
The generate sources function is extremely useful in creating staging models for a dbt project. It seems, however, that it could be easily extended to help with thorough documentation of all models within a dbt project. Ideally, it wouldn't then put the "sources" part at the top of the file, but that's much easier to get rid of than creating all of the ymls from scratch!
I have two databases, a source database where I load all my raw data to, and then a data model database where dbt operates. So my dbt profile is set to the data model database but has a role that can read from the source database. But the generate sources command doesn't look in my source database, only my data model database.
Describe the bug
A clear and concise description of what the bug is.
After running dbt-coves setup all and going through the prompts, I get:
dbt init SUCCESS ✔
❌ 'bool' object has no attribute 'parent'
To Reproduce
Steps to reproduce the behaviour:
dbt-coves setup all
Expected behaviour
To not get the ❌ 'bool' object has no attribute 'parent' output.
Desktop (please complete the following information):
pip freeze:
Describe the bug
Pre-existing contents of schema.yml:
version: 2
sources:
- name: my_source_name
database: my_database_name
description: 'here is a test source description'
tables:
- name: my_table_name
description: 'here is a test table description'
Command:
dbt-coves generate sources --database my_database_name --schemas my_schema_name --update-strategy update
Contents of schema.yml after executing command:
version: 2
sources:
- name: my_source_name
database: my_database_name
description: here is a test source description
tables:
- name: my_table_name
description: here is a test table description
To Reproduce
Steps to reproduce the behaviour:
dbt-coves generate sources --database my_database_name --schemas my_schema_name --update-strategy update
Expected behaviour
I would expect it to leave existing descriptions untouched
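The expected update-strategy behaviour amounts to a merge rule where a non-empty, hand-written description always wins over the regenerated placeholder. A minimal sketch of that rule, with plain dicts standing in for the parsed yml:

```python
def merge_descriptions(existing, generated):
    """Merge generated entries into existing ones, keeping any non-empty
    description already written by hand (sketch of an 'update' strategy)."""
    merged = dict(generated)
    for name, desc in existing.items():
        if name in merged and desc:
            merged[name] = desc  # hand-written description wins
    return merged

existing = {"my_table_name": "here is a test table description"}
generated = {"my_table_name": "", "new_table": ""}
result = merge_descriptions(existing, generated)
print(result)
```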
Desktop (please complete the following information):
Describe the bug
Lint test failed, but shows as passed on CI job
To Reproduce
Break a pre-commit rule and push to repo.
Describe the current behavior
If I have a pre-existing some_staging_model.sql as:
SELECT
col1
FROM
some_table
WHERE
col1 = FALSE
and there is a new column that should be added because it was recently added in the source, then generate sources will remove my existing transformations when it updates some_staging_model.sql as:
SELECT
col1
, col2
FROM
some_table
Requested behaviour
I would expect that it would retain the existing SQL, and just add new columns / drop newly-deleted columns
Desktop (please complete the following information):
Additional context
Contents of my .dbt_coves.config.yml:
generate:
sources:
update_strategy: update
Maybe this should only be the case if the update strategy is set to "update"?
I'm trying to automate my staging layer refresh using dbt-coves. If, however, I'll lose all staging layer transformations every time it's run, then I can't use it for this purpose.
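The retain-and-diff behaviour the requester describes could be sketched as comparing the database's current columns against the ones already in the model, rather than regenerating the file. This is only an illustration of the requested logic, not the tool's actual behaviour:

```python
def column_diff(db_columns, model_columns):
    """Return (columns to add, columns to drop) so existing SQL can be
    patched instead of overwritten; database order is preserved."""
    to_add = [c for c in db_columns if c not in model_columns]
    to_drop = [c for c in model_columns if c not in db_columns]
    return to_add, to_drop

# col2 was recently added in the source; col1's transformations stay intact.
diff = column_diff(["col1", "col2"], ["col1"])
print(diff)
```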
Is your feature request related to a problem? Please describe.
It can be painful to create .yml files for dbt models; the behaviour of dbt-coves generate properties would be helpful for models also.
Describe the solution you'd like
Addition of command dbt-coves generate properties, which searches the manifest for models without a patch_path, presents an interface for selection like dbt-coves generate sources does, and creates a templated .yml file with all columns listed for selected models.
Describe alternatives you've considered
dbt-sugar contains this functionality, as does codegen. However, using multiple separate tools to do essentially the same thing (from the user perspective) is not ideal, as it requires the user to remember which tool does which specific use case.
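The manifest scan the solution describes could be sketched as below; the node keys (nodes, resource_type, patch_path) follow dbt's manifest.json artifact, but the rest is a hypothetical illustration:

```python
def models_without_properties(manifest):
    """List model names in a parsed manifest.json that have no patch_path,
    i.e. models not yet covered by a .yml properties file."""
    return [
        node["name"]
        for node in manifest["nodes"].values()
        if node["resource_type"] == "model" and not node.get("patch_path")
    ]

manifest = {"nodes": {
    "model.proj.stg_orders": {"name": "stg_orders", "resource_type": "model",
                              "patch_path": "proj://models/staging/stg_orders.yml"},
    "model.proj.stg_customers": {"name": "stg_customers", "resource_type": "model",
                                 "patch_path": None},
}}
candidates = models_without_properties(manifest)
print(candidates)
```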
Is your feature request related to a problem? Please describe.
I'm trying to run generate sources automatically via cron on a virtual machine. But, from what I can tell, there isn't a way to suppress some of the prompts (choose relations, flatten JSON, etc.).
It also seems that I have to hardcode the schemas I'd like, using config.yml.
Describe the solution you'd like
I'd like a simple way to be able to generate sources for all relations in a schema, and for all schemas in a database.
I'd like to be able to have the following in my config
Describe alternatives you've considered
Not being able to automate it. The closest I've come is hard-coding my schemas in config.yml and then still needing to execute manually so I can interact with the prompts.
Running a command that expects a subcommand will return this:
Can't instantiate abstract class GenerateTask with abstract methods run
which is misleading.
Handle the error and print the help instead.
I noticed that when I re-run generate sources, sources and staging models whose base tables have been deleted from the database are not modified or deleted. Is that the expected behavior?
Describe the solution you'd like
We're currently able to supply select, exclude, and filter arguments when generating properties. It would be very helpful to do the same when generating sources.
Describe alternatives you've considered
Manually selecting sources via the CLI
Many sources are based on APIs and often times these APIs are well documented. It would be great to have the option to get descriptions for all fields in a source from the OpenAPI specification.
Update model properties files' table and field descriptions by running dbt-coves docs update --from metadata.csv.
The csv will contain identifiers for tables and fields so dbt-coves could take the descriptions set and update the corresponding model properties yaml files.
Models could be filtered using a --m argument, as dbt does.
Is your feature request related to a problem? Please describe.
When initializing a new dbt project, it's good to create every file in the current folder instead of on a new one.
Describe the solution you'd like
By passing the argument --current-dir, the initialization will happen in the current directory.
Is your feature request related to a problem? Please describe.
Sometimes a column in a table may have the name GROUP, ORDER, START, SCHEMA, TABLE, etc.
generate sources doesn't quote these columns, causing issues when dbt is run.
Describe the solution you'd like
I'd like generate sources to automatically quote column names when they conflict with database keywords.
Describe alternatives you've considered
Manually modifying the templates with huge if x or x conditions to add quotes where needed.
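The requested quoting could be sketched as a simple keyword check; the keyword set below is a tiny illustrative subset, and a real implementation would use the full reserved-word list of the target warehouse:

```python
# Assumption: a small subset of reserved words for illustration only.
RESERVED = {"GROUP", "ORDER", "START", "SCHEMA", "TABLE"}

def quote_if_reserved(column):
    """Double-quote a column name when it collides with a SQL keyword."""
    return f'"{column}"' if column.upper() in RESERVED else column

print(quote_if_reserved("group"), quote_if_reserved("amount"))
```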
Currently dbt-coves init does this for you. But, most of the time, the repo already exists and we need to add the missing components manually.
Is your feature request related to a problem? Please describe.
dbt-coves can set up github actions, airflow, and other pieces of the data stack in a pre-configured simple way.
dbt-coves setup macros # use dbt-package
dbt-coves update macros # use a version matrix and update packages.yml for the user
These are all things we can "setup" for the user but require ongoing lifecycle management.
dbt-coves setup github-actions
dbt-coves setup --update github-actions
dbt-coves setup airflow
dbt-coves setup --update airflow
dbt-coves setup permifrost
dbt-coves setup --update permifrost
dbt-coves setup training-and-tools
dbt-coves setup --update training-and-tools
dbt-coves setup snowpark
dbt-coves setup --update snowpark
Managing the lifecycle of files in user owned repos involves change management. Copier enables efficient change management (including handling merge conflicts) and leverages a centralized repository of templates which provide the actual content. Furthermore by wrapping in dbt-coves setup, we will be able to provide a changelog to assist users in updating dbt-coves derived components in their data stack codebase. We can also consider a dbt-coves setup --update --all [--dry-run ?]
command which checks for component updates and outputs actions / changelogs to terminal.
Is your feature request related to a problem? Please describe.
Sometimes you need to reuse the same python function many times.
Describe the solution you'd like
It would be good to define common functions in dbt macros so they can be used in dbt-coves templates.
Describe the bug
When running dbt-coves generate the output is:
Can't instantiate abstract class GenerateTask with abstract methods run
To Reproduce
dbt-coves generate
Expected behaviour
To show a list of possible arguments.
Describe the bug
'list' object has no attribute 'keys' when flattening table created by airbyte (github_issues)
To Reproduce
Use Airbyte to ingest data from github issues.
Run dbt-coves generate sources; when prompted to flatten data, answer yes.
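The traceback suggests the flattener assumes every nested JSON value is an object, while GitHub issue payloads contain arrays. A hedged sketch of a flatten that guards both cases (not the actual dbt-coves code):

```python
def flatten(value, prefix=""):
    """Flatten nested JSON into dotted keys, handling both objects and
    arrays -- the "'list' object has no attribute 'keys'" case."""
    out = {}
    if isinstance(value, dict):
        for k, v in value.items():
            out.update(flatten(v, f"{prefix}{k}."))
    elif isinstance(value, list):
        for i, v in enumerate(value):
            out.update(flatten(v, f"{prefix}{i}."))
    else:
        out[prefix.rstrip(".")] = value
    return out

flat = flatten({"labels": [{"name": "bug"}]})
print(flat)
```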
Describe the bug
The GitHub airbyte connector created a field named default in lowercase in the github_issue_labels table. dbt-coves does not double-quote case-sensitive snowflake columns.
Describe the bug
Unable to install dbt-coves v1.3 from PyPI (via pip install)
To Reproduce
python -m venv venv_test
source venv_test/bin/activate
pip install --upgrade pip
pip install dbt-coves~=1.3
Expected behaviour
Successful installation (I noticed v1.4 was yanked on PyPI, so pinning to ~=1.3, which I'm assuming is >=1.3,<1.4)
Console Log/Tracebacks
(note, I added line breaks to make it more human-readable)
ERROR: Ignored the following versions that require a different python version: 0.19.1a7 Requires-Python >=3.7,<3.9; 0.19.1a8
Requires-Python >=3.7,<3.9; 0.19.1a9 Requires-Python >=3.7,<3.9; 0.19.2a10 Requires-Python >=3.7,<3.9; 0.20.0a1
Requires-Python >=3.7,<3.9; 0.20.0a2 Requires-Python >=3.7,<3.9; 0.20.0a3 Requires-Python >=3.7,<3.9; 0.21.0a1
Requires-Python >=3.7,<3.9; 0.21.0a10 Requires-Python >=3.7,<3.9; 0.21.0a11 Requires-Python >=3.7,<3.9; 0.21.0a12
Requires-Python >=3.7,<3.9; 0.21.0a13 Requires-Python >=3.7,<3.9; 0.21.0a14 Requires-Python >=3.7,<3.9; 0.21.0a15
Requires-Python >=3.7,<3.9; 0.21.0a16 Requires-Python >=3.7,<3.9; 0.21.0a17 Requires-Python >=3.7,<3.9; 0.21.0a18
Requires-Python >=3.7,<3.9; 0.21.0a2 Requires-Python >=3.7,<3.9; 0.21.0a3 Requires-Python >=3.7,<3.9; 0.21.0a4
Requires-Python >=3.7,<3.9; 0.21.0a5 Requires-Python >=3.7,<3.9; 0.21.0a6 Requires-Python >=3.7,<3.9; 0.21.0a7
Requires-Python >=3.7,<3.9; 0.21.0a8 Requires-Python >=3.7,<3.9; 0.21.0a9 Requires-Python >=3.7,<3.9; 0.21.1a19
Requires-Python >=3.7,<3.9; 0.21.1a20 Requires-Python >=3.7,<3.9
ERROR: Could not find a version that satisfies the requirement dbt-coves~=1.3 (from versions: 1.0.4a1, 1.0.4a2, 1.0.4a3,
1.0.4a4, 1.0.4a17, 1.0.4a18, 1.0.4a19, 1.0.4a20, 1.0.4a21, 1.0.4a22, 1.0.4a23, 1.0.4a24, 1.0.4a25, 1.0.4a26, 1.0.4a27, 1.0.4a28,
1.0.4a29, 1.0.4a30, 1.0.5a1, 1.0.5a2, 1.0.5a3, 1.1.0a1, 1.1.0a2, 1.1.0a3, 1.1.0a4, 1.1.0a5, 1.1.0a6, 1.1.0a7, 1.1.0a8, 1.1.0a9, 1.1.1a0,
1.1.1a1, 1.1.1a2, 1.1.1a3, 1.1.1a4, 1.1.1a5, 1.1.1a6, 1.1.1a7, 1.1.1a8, 1.1.1a9, 1.1.1a10, 1.1.1a11, 1.1.1a12, 1.1.1a13, 1.1.1a14, 1.1.1a15,
1.1.1a16, 1.1.1a17, 1.1.1a18, 1.1.1a19, 1.1.1a20, 1.1.1a21, 1.1.1a22, 1.1.1a23, 1.1.1a24, 1.1.1a25, 1.1.1a26, 1.1.1a27, 1.1.1a28, 1.1.1a29,
1.1.1a30, 1.1.1a31, 1.1.1a32, 1.1.1a33, 1.3.0a1, 1.3.0a2, 1.3.0a3, 1.3.0a4, 1.3.0a5, 1.3.0a6, 1.3.0a7, 1.3.0a8, 1.3.0a9, 1.3.0a10,
1.3.0a11, 1.3.0a12, 1.3.0a13, 1.3.0a14, 1.3.0a15, 1.3.0a16, 1.3.0a17, 1.3.0a18, 1.3.0a19, 1.3.0a20, 1.3.0a21, 1.3.0a22, 1.3.0a23,
1.3.0a24, 1.3.0a25, 1.3.0a26, 1.3.0a27, 1.3.0a28, 1.4.0)
ERROR: No matching distribution found for dbt-coves~=1.3
Desktop (please complete the following information):
Additional context
python --version
Python 3.9.13
Is your feature request related to a problem? Please describe.
Since dbt requires unique model names, our models are named by the convention models/schema/schema__table.sql
. When running generate properties, the models are picked up by the wrong names, e.g.
Model locations.locations__county not materialized, did you execute dbt run?.
Model locations.locations__state not materialized, did you execute dbt run?.
Where the models are actually named
locations.state
locations.county
Describe the solution you'd like
Allow for custom model naming conventions so that model names do not have to equal table names.
Is your feature request related to a problem? Please describe.
In a regular workflow, most times (if not all) the model.sql will be modified (mainly renaming columns and converting/casting), and then the model.yml will need to be regenerated anyway to reflect those modifications.
Describe the solution you'd like
The option to not generate source.yml when dbt-coves generate sources --database <db_name> is called.
Describe alternatives you've considered
Noel suggested changing the path for the generated yml during source creation to go to /tmp, though the directory can cause added confusion.
Is your feature request related to a problem? Please describe.
When you have dozens or hundreds of tables, it's difficult to pick the right ones.
Describe the solution you'd like
Filter schemas by running dbt-coves generate sources --schemas=RAW_*.
Filter tables by running dbt-coves generate sources --relations=S*RC_*.
Describe the bug
When defining sources_destination, models_destination, etc. in .dbt_coves/config.yml, the first reference to schema cannot be altered, while subsequent references can be.
To Reproduce
Steps to reproduce the behaviour:
Define:
models_destination: "models/staging/{{ schema }}/{{ schema | replace('raw','stg') }}__{{ relation }}.sql"
dbt-coves functions as expected, but with
models_destination: "models/staging/{{ schema | replace('raw','') }}/{{ schema | replace('raw','stg') }}__{{ relation }}.sql"
the first reference to schema returns a blank string.
Expected behaviour
I would expect referencing the same variable in the same line to function identically.
Describe the bug
In dbt-coves generate sources, a CSV Airbyte source containing backslashes/hyphens in the column names causes issues on Snowflake.
To Reproduce
Steps to reproduce the behaviour:
Expected behaviour
Expected output:
_airbyte_data:"cldr display name"::varchar as cldr_display_name,
Actual output:
_airbyte_data:cldr display name::varchar as cldr display name,
Results in dbt run:
Database Error in model _airbyte_raw_country_codes (models/sources/raw/_airbyte_raw_country_codes.sql)
001003 (42000): SQL compilation error:
syntax error line 13 at position 35 unexpected 'name'.
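The fix the issue expects amounts to quoting the raw JSON key while emitting a safe snake_case alias. A sketch of that transformation (the function name and alias rule are assumptions, not dbt-coves internals):

```python
import re

def variant_column(key, sql_type="varchar"):
    """Build a Snowflake VARIANT extraction with a double-quoted JSON key
    and a snake_case alias, as the expected output in the issue shows."""
    alias = re.sub(r"[^0-9a-zA-Z_]+", "_", key).strip("_").lower()
    return f'_airbyte_data:"{key}"::{sql_type} as {alias},'

line = variant_column("cldr display name")
print(line)
```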
Describe the bug
I had dbt-redshift installed in a pipenv, and when I installed dbt-coves it looks like it installed dbt 0.19.2 (without a specific dbt adapter specified).
To Reproduce
Steps to reproduce the behaviour:
installed version: 0.19.2
latest version: 0.19.2
Up to date!
Plugins:
- bigquery: 0.18.2
- postgres: 0.19.2
- redshift: 0.19.2
- snowflake: 0.18.2
Expected behaviour
Installing dbt-coves shouldn't clobber an existing dbt install which has specified a specific adapter
Desktop (please complete the following information):
Windows 10
Python 3.8
dbt-coves v0.19.2-a.10
Is your feature request related to a problem? Please describe.
Do more comprehensive flattening of JSON data.
Describe the solution you'd like
Create additional tables when JSON field contains multiple levels of data.
Describe the bug
Database Error 000904 (42000): SQL compilation error: error line 1 at position 19 invalid identifier 'ISSUE'
To Reproduce
Error when flattening github_pull_requests__links.
Describe the bug
Table not found on BigQuery datastet
To Reproduce
dbt-coves generate source
Expected behaviour
Generate files from source tables
Console Log/Tracebacks
Which schemas would you like to inspect? [RAW_RTB_HOUSE_GSHEETS]
No tables/views found in RAW_RTB_HOUSE_GSHEETS schema.
Desktop (please complete the following information):
Additional context
While the console info is all uppercase, all my source datasets and tables are lowercase.
Is your feature request related to a problem? Please describe.
dbt has the option to define column descriptions in markdown files. It would be great to have an option in dbt-coves to generate schema descriptions in the style of:
- name: status
description: '{{ doc("orders_status") }}'
And generate a docs.md file that has the columns defined, so I would just need to add my descriptions there.
Is your feature request related to a problem? Please describe.
dbt-coves seems to do a lot to help users adopt analytics engineering best practices.
One of the best practices is documentation.
A problem I (and a lot of people) have is that you have to repeat the column descriptions in every downstream model.
This results in either a bunch of time copying and pasting or downstream models not having proper descriptions.
Describe the solution you'd like
What I would like to do is just do column descriptions once at the base model and then have that propagate through the DAG to all downstream models.
Describe alternatives you've considered
Use PyYAML to make my own solution (more of a learning project since this is not my speciality)
dbt-osmosis has a way to "Automatically generate documentation based on upstream documented columns"
dbt-osmosis yaml document --project-dir ... --profiles-dir ...
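A single-hop version of the requested propagation could be sketched as copying descriptions from an upstream model to downstream columns with the same name; a full solution would walk the DAG, but the core rule looks like this (an illustration, not dbt-osmosis or dbt-coves code):

```python
def propagate_descriptions(upstream_cols, downstream_cols):
    """Fill empty downstream column descriptions from the upstream model,
    matching by column name; existing downstream descriptions are kept."""
    return {
        name: desc or upstream_cols.get(name, "")
        for name, desc in downstream_cols.items()
    }

upstream = {"order_id": "Primary key of orders"}
downstream = {"order_id": "", "amount": ""}
propagated = propagate_descriptions(upstream, downstream)
print(propagated)
```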
dbt-coves v0.20.0-a.3 dbt v0.20.2
Can't instantiate abstract class GenerateTask with abstract methods run
To Reproduce
Steps to reproduce the behaviour:
I assume this is because dbt v0.20.2 is not supported.
I've installed dbt-coves in a fresh venv. It might be a good idea to constrain the dependencies to supported dbt versions to make getting started easier.
Is your feature request related to a problem? Please describe.
Sometimes you need to exclude some relations when running dbt-coves generate sources and avoid passing all the possible options using just the relations filter.
Describe the solution you'd like
Running dbt-coves generate sources --exclude-relations _airbyte* will process all relations except those starting with _airbyte.
The --exclude-relations filter should be case-insensitive.
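The requested case-insensitive wildcard exclusion could be sketched with the standard library's fnmatch; this is only an illustration of the proposed behaviour:

```python
import fnmatch

def exclude_relations(relations, pattern):
    """Drop relations matching the wildcard pattern, case-insensitively,
    by lowercasing both sides before matching."""
    pat = pattern.lower()
    return [r for r in relations if not fnmatch.fnmatch(r.lower(), pat)]

kept = exclude_relations(["_AIRBYTE_RAW_USERS", "orders"], "_airbyte*")
print(kept)
```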
Is your feature request related to a problem? Please describe.
It seems like running generate sources sends the DESCRIBE TABLE ... statements to Snowflake sequentially, one by one, as it goes. It'd be great if this went a lot faster.
Describe the solution you'd like
Would it be reasonable to queue up all those database statements up front and run through them as the results return, so that it completes much more quickly?
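Since DESCRIBE statements are I/O-bound network round-trips, queueing them on a thread pool would let them overlap. A minimal sketch, with a stand-in function in place of a real Snowflake call:

```python
from concurrent.futures import ThreadPoolExecutor

def describe_table(table):
    """Stand-in for running DESCRIBE TABLE against Snowflake."""
    return f"described {table}"

tables = ["orders", "customers", "payments"]

# Queue every DESCRIBE up front and let the pool drain them concurrently;
# pool.map still returns results in the original submission order.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(describe_table, tables))

print(results)
```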
Describe alternatives you've considered
Can't really think of any apart from just using it as it is and waiting much longer.
Additional context
python 3.10.9
Snowflake 7.3.1
macOS 13.2 (22D49)
output of pip freeze:
agate==1.6.3
asn1crypto==1.5.1
attrs==22.2.0
Babel==2.11.0
bump2version==1.0.1
bumpversion==0.6.0
certifi==2022.12.7
cffi==1.15.1
charset-normalizer==2.1.1
click==8.1.3
colorama==0.4.5
commonmark==0.9.1
cryptography==36.0.2
dbt-core==1.3.2
dbt-coves==1.3.0a25
dbt-extractor==0.4.1
dbt-snowflake==1.3.0
filelock==3.9.0
future==0.18.3
hologram==0.0.15
idna==3.4
importlib-metadata==6.0.0
isodate==0.6.1
jaraco.classes==3.2.3
Jinja2==3.1.2
jsonschema==3.2.0
keyring==23.13.1
leather==0.3.4
Logbook==1.5.3
luddite==1.0.2
MarkupSafe==2.1.2
mashumaro==3.0.4
minimal-snowplow-tracker==0.0.2
more-itertools==9.0.0
msgpack==1.0.4
networkx==2.8.8
oscrypto==1.3.0
packaging==21.3
parsedatetime==2.4
pathspec==0.9.0
pretty-errors==1.2.25
prompt-toolkit==3.0.36
pycparser==2.21
pycryptodomex==3.17
pydantic==1.10.4
pyfiglet==0.8.post1
Pygments==2.14.0
PyJWT==2.6.0
pyOpenSSL==22.0.0
pyparsing==3.0.9
pyrsistent==0.19.3
python-dateutil==2.8.2
python-slugify==7.0.0
pytimeparse==1.1.8
pytz==2022.7.1
PyYAML==6.0
questionary==1.10.0
requests==2.28.2
rich==12.6.0
ruamel.yaml==0.17.21
ruamel.yaml.clib==0.2.7
six==1.16.0
snowflake-connector-python==2.7.12
sqlparse==0.4.3
text-unidecode==1.3
typing_extensions==4.4.0
urllib3==1.26.14
wcwidth==0.2.6
Werkzeug==2.2.2
yamlloader==1.2.2
zipp==3.12.0
We create one file per schema, as opposed to per table. Currently I am manually merging the table-specific data into a single file, but it would be nice for this to happen automatically.
Given that the strategy config exists at all, I assume this is already under consideration for the future!
Describe the bug
A clear and concise description of what the bug is.
For a single schema, dbt-coves is unable to detect columns via generate sources, producing the following SQL:
SELECT
FROM {{ source('xyz', 'abc') }}
and YML:
- name: xyz__abc
description: ''
columns:
but for all other schemas, dbt-coves correctly generates sources. I'm unable to figure out why this is the case. Some debugging suggestions would be appreciated.
Describe the bug
If I run generate sources, then add a column to one of the tables for which the sources were created, then re-run generate sources, the new column is not picked up. I'm not sure if this is expected or not.
To Reproduce
Steps to reproduce the behaviour:
generate sources
generate sources again
Expected behaviour
I would expect that the new column would be added in the sources.yml file (and for it to potentially remove dropped columns).
Console Log/Tracebacks
Please avoid screenshots if possible; instead, copy and paste the console output and wrap it in a code block.
Desktop (please complete the following information):