dbt-msft / tsql-utils
dbt-utils for the dbt-msft family of packages
License: MIT License
dbt_utils.get_url_parameter() doesn't work currently (neither does get_url_host() or get_url_path()). I think it has to do with dbt_utils.split_part:
tsql-utils/macros/dbt_utils/cross_db_utils/split_part.sql
Lines 5 to 9 in 23b11b7
dbt-utils's integration test message is:
('42000', '[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]XML parsing: line 1, character 56, semicolon expected (9411) (SQLMoreResults)')
this was the generated SQL by the integration tests...
with data as (
select * from "dbt-msft-serverless-db"."test"."data_urls"
)
select
nullif(
LTRIM(CAST(('<X>'+REPLACE(
LTRIM(CAST(('<X>'+REPLACE(url,'utm_medium=' ,'</X><X>')+'</X>') AS XML).value('(/X)[2]', 'VARCHAR(128)'))
,'&' ,'</X><X>')+'</X>') AS XML).value('(/X)[1]', 'VARCHAR(128)'))
,'') as actual,
medium as expected
from data
union all
select
nullif(
LTRIM(CAST(('<X>'+REPLACE(
LTRIM(CAST(('<X>'+REPLACE(url,'utm_source=' ,'</X><X>')+'</X>') AS XML).value('(/X)[2]', 'VARCHAR(128)'))
,'&' ,'</X><X>')+'</X>') AS XML).value('(/X)[1]', 'VARCHAR(128)'))
,'') as actual,
source as expected
from data
@davidclarance maybe you can help me with this? happy to give more context later -- just jotting this down for now
-- setup
with a as (
select * from "dbt-msft-serverless-db"."test"."test_insert_by_period"
),
b as (
select * from "dbt-msft-serverless-db"."test"."expected_insert_by_period"
),
a_minus_b as (
select "id", "created_at" from a
except
select "id", "created_at" from b
),
b_minus_a as (
select "id", "created_at" from b
except
select "id", "created_at" from a
),
unioned as (
select 'a_minus_b' as which_diff, * from a_minus_b
union all
select 'b_minus_a' as which_diff, * from b_minus_a
)
select * from unioned
which_diff | id | created_at |
---|---|---|
b_minus_a | 3 | 2018-02-02 |
b_minus_a | 4 | 2018-03-02 |
b_minus_a | 5 | 2018-04-02 |
b_minus_a | 6 | 2018-05-02 |
The fix of releasing a version that doesn't have a 4th integer is easy, but I'm hesitant to introduce yet another version set to match...
dbt-labs/dbt-core#3109
consider dbt_utils.star(), which works without issue in TSQL. But because it isn't currently implemented in tsql-utils, you have to call dbt_utils.star() to use it... which is overly complicated?
The simplest solution IMHO involves dbt-labs/dbt-core#2923: if a user calls tsql_utils.star, it would first look for a star macro in the tsql-utils package, and if it can't find one, it would then look in dbt-utils and find dbt_utils.star. However, that's a medium- or long-term fix.
In the short term, @jtcohen6 do you think it makes the most sense to have 100% dbt-utils macro coverage in tsql-utils, where macros that do not need modification are simply dispatched back to dbt-utils, like below? Otherwise, I think it might be challenging for users to have to remember to call dbt-utils for some macros and tsql-utils for others...
{% macro sqlserver__star(from, relation_alias=False, except=[]) %}
{% do return( dbt_utils.star(from, relation_alias, except) ) %}
{% endmacro %}
{% macro synapse__star(from, relation_alias=False, except=[]) %}
{% do return( sqlserver.star(from, relation_alias, except) ) %}
{% endmacro %}
I can open a PR for this with samples from dbt-utils.
Options:
my priority for this macro is not the macro itself, but really because it is used in dbt-utils's integration testing for validation of OTHER macros.
In the calc CTE, these two coalesces are the tricky parts that need to be converted into NULLIFs or CASE WHENs. Also, false needs to be replaced with 0 because TSQL doesn't have true & false literals. Here's a gist of the compiled SQL that's erroring out. If you log into the test db (dbt-msft-serverless-db), you can debug it.
my interpretation of the logic for the first coalesce statement below, though I'm not sure in what scenario lower_bound < upper_bound could be NULL:
- if lower_bound < upper_bound is not null:
  - return the result of the comparison: true or false (or in TSQL's case: 1 or 0)
- else if lower_bound < upper_bound is NULL:
  - return false (i.e. 0)
The error I'm getting is Incorrect syntax near '<'. (102)
Database Error in test dbt_utils_mutually_exclusive_ranges_data_test_mutually_exclusive_ranges_no_gaps_not_allowed__lower_bound__upper_bound (models/schema_tests/schema.yml)
('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near '<'. (102) (SQLExecDirectW)")
compiled SQL at target/compiled/dbt_utils_integration_tests/models/schema_tests/schema.yml/schema_test/dbt_utils_mutually_exclusive_ranges_data_test_mutually_exclusive_ranges_no_gaps_f27ec0771ee181e29da8b3adeddba768.sql
-- For each record: lower_bound should be < upper_bound.
-- Coalesce it to return an error on the null case (implicit assumption
-- these columns are not_null)
coalesce(
lower_bound < upper_bound,
false
) as lower_bound_less_than_upper_bound,
-- For each record: upper_bound {{ allow_gaps_operator }} the next lower_bound.
-- Coalesce it to handle null cases for the last record.
coalesce(
upper_bound {{ allow_gaps_operator }} next_lower_bound,
is_last_record,
false
) as upper_bound_{{ allow_gaps_operator_in_words }}_next_lower_bound
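Following the interpretation above, one way to express the first coalesce without boolean expressions is a CASE that spells out each outcome (a sketch only, not the final macro code; column names as in the compiled test SQL):

```sql
-- CASE WHEN equivalent of coalesce(lower_bound < upper_bound, false).
-- T-SQL cannot select a comparison result directly, so each outcome is
-- listed; a NULL on either side falls through to else and yields 0 (false).
case
    when lower_bound < upper_bound then 1
    when lower_bound >= upper_bound then 0
    else 0
end as lower_bound_less_than_upper_bound
```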
In the window_functions CTE, this piece I was already able to convert:
row_number() over (
{{ partition_clause }}
order by {{ lower_bound_column }} desc
) = 1 as is_last_record
becomes
case when
row_number() over (
{{ partition_clause }}
order by {{ lower_bound_column }} desc
) = 1
then 1 else 0
end as is_last_record
Hi Guys,
First of all, thank you for creating this package. Your efforts are much appreciated.
I noticed the type_string adapter looks like this:
{%- macro sqlserver__type_string() -%}
VARCHAR(900)
{%- endmacro -%}
Is there anything that speaks against using VARCHAR(8000)? By limiting it to 900, there could be instances where strings are truncated.
That's particularly bad since type_string is used in surrogate_key. For example, I use hashes of URLs as keys and this would lead to duplicate surrogate keys, because some URLs are longer than 900 characters.
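The truncation risk is easy to demonstrate (a hypothetical sketch; the long strings stand in for URLs over 900 characters):

```sql
-- Two distinct 951-character strings that only differ after position 900
-- produce identical hashes once cast down to VARCHAR(900).
declare @url1 varchar(max) = replicate(cast('a' as varchar(max)), 950) + 'x';
declare @url2 varchar(max) = replicate(cast('a' as varchar(max)), 950) + 'y';
select
    hashbytes('md5', cast(@url1 as varchar(900))) as key1,
    hashbytes('md5', cast(@url2 as varchar(900))) as key2;
-- key1 = key2, i.e. duplicate surrogate keys
```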
I am having trouble running integration tests for this package in my development environment. I can follow the "Dev environment set-up" instructions up to step 5, but then I encounter a Compilation Error when I try to run dbt seed or related dbt commands like run or test. The error specifies "No macro named 'except' found":
❯ dbt deps --target sqlserver
22:21:02 [WARNING]: Deprecated functionality
The `source-paths` config has been renamed to `model-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
22:21:02 [WARNING]: Deprecated functionality
The `data-paths` config has been renamed to `seed-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
22:21:02 Running with dbt=1.1.1
22:21:02 Installing ../../
22:21:02 Installed from <local @ ../../>
22:21:02 Installing ../../dbt-utils
22:21:02 Installed from <local @ ../../dbt-utils>
22:21:02 Installing ../../dbt-utils/integration_tests
22:21:02 Installed from <local @ ../../dbt-utils/integration_tests>
22:21:02 Installing ../
22:21:02 Installed from <local @ ../>
❯ dbt seed --target sqlserver --full-refresh
22:21:21 [WARNING]: Deprecated functionality
The `source-paths` config has been renamed to `model-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
22:21:21 [WARNING]: Deprecated functionality
The `data-paths` config has been renamed to `seed-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
22:21:21 Running with dbt=1.1.1
22:21:21 Partial parse save file not found. Starting full parse.
22:21:22 Encountered an error:
Compilation Error
In dispatch: No macro named 'except' found
Searched for: 'tsql_utils_dbt_utils_integration_tests.sqlserver__except', 'tsql_utils_dbt_utils_integration_tests.default__except', 'dbt.sqlserver__except', 'dbt.default__except'
Has anyone experienced an issue like this before or have any thoughts on how I can resolve this?
Also potentially related to this issue, I noticed that my dbt-sqlserver version is different from the 0.21.0rc1 version specified in .circleci/config.yml. I'm thinking it would be best for my dev environment to match the CI environment, but it looks like the current version of dbt-utils requires dbt-core >= 1.0.0.
The dbt_date.get_date_dimension macro is not working with SQL Server. Tested with the latest package versions (as of posting) and with versions matching the dbt_date pinned commit from the tsql-utils repo.
{{ dbt_date.get_date_dimension(
"2020-01-01",
"2024-01-01",
)}}
('42000', "[42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near the keyword 'with'. (156) (SQLMoreResults); [42000] [Microsoft][ODBC Driver 17 for SQL Server]
[SQL Server]Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be termina
ted with a semicolon. (319); [42000] [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Incorrect syntax near ')'. (102)")
Microsoft SQL Server 2019 (RTM-GDR) (KB5029377) - 15.0.2104.1 (X64)
Aug 16 2023 00:09:21
Copyright (C) 2019 Microsoft Corporation
Developer Edition (64-bit) on Windows Server 2019 Datacenter 10.0 <X64> (Build 17763: ) (Hypervisor)
dbt_project.yml dispatch configuration:
dispatch:
- macro_namespace: dbt_utils
search_order: ['tsql_utils', 'dbt_utils']
- macro_namespace: dbt_date
search_order: ['tsql_utils', 'dbt_date']
packages.yml
packages:
- package: dbt-labs/dbt_utils
version: 1.1.1
- package: dbt-msft/tsql_utils
version: 0.10.0
- package: calogica/dbt_date
version: 0.10.0
packages.yml testing the dbt_date pinned commit:
packages:
- package: dbt-labs/dbt_utils
version: [">=0.9.0", "<1.0.0"]
- package: dbt-msft/tsql_utils
version: 0.10.0
- package: calogica/dbt_date
version: 0.6.0
As of v0.20.0, the synapse__ macros aren't needed. We should drop them!
get_url_host(), get_url_path(), get_url_host()
The col_3 column has differing values:
expected_dictionary[col_3][2] == None
actual_dictionary[col_3][2] == False
For some reason sqlserver__dateadd works on Azure SQL, but on Synapse it comes up short by a day.
actual | expected |
---|---|
1900-01-02 00:00:00.0000000 | 1900-01-01 00:00:00.000 |
tsql-utils/macros/dbt_utils/cross_db_utils/dateadd.sql
Lines 1 to 13 in 730fa0d
with data as (
select * from "dbtsynapseci"."test"."data_dateadd"
)
select
case
when datepart = 'hour' then cast(
dateadd(
hour,
interval_length,
cast(from_time as datetime)
)
as
datetime2
)
when datepart = 'day' then cast(
dateadd(
day,
interval_length,
cast(from_time as datetime)
)
as
datetime2
)
when datepart = 'month' then cast(
dateadd(
month,
interval_length,
cast(from_time as datetime)
)
as
datetime2
)
when datepart = 'year' then cast(
dateadd(
year,
interval_length,
cast(from_time as datetime)
)
as
datetime2
)
else null
end as actual,
result as expected
from data
Hi,
I noticed while using surrogate_key that the compiled sql was syntactically incorrect due to the operator used to concat the list of fields I am passing to the macro. After some triage, I realized that it was using the default dbt_utils.concat() macro which uses '||' to concat.
In order to resolve this, I added the following to my dbt_project.yml:
dispatch:
- macro_namespace: dbt_utils
search_order: ['tsql_utils', 'dbt_utils']
vars:
dbt_utils_dispatch_list: ['tsql_utils']
dbt_date_dispatch_list: ['tsql_utils']
audit_helper_dispatch_list: ['tsql_utils']
dbt_expectations_dispatch_list: ['tsql_utils']
This solved the issue, but I am not confident that this is the correct solution. Is it?
https://hub.getdbt.com/dbt-msft/tsql_utils/latest/
The top of the hub page contains a default package installation configuration that is insufficient, as described by the README instructions: https://github.com/dbt-msft/tsql-utils#installation-instructions.
To reduce confusion for newcomers, I suggest modifying the Hub page so the top "Installation" part matches the README instructions.
As I work on #59, I've had to disable ~17 tests in integration_tests/dbt_utils/dbt_project.yml for the same reason: the pivot to the test block, where the code to be tested gets injected as a subquery into the result of get_test_sql(). The ~17 or so macros being tested all use CTEs, which aren't allowed as subqueries in TSQL. Previously this was only an issue for bespoke tests, where the solution was to tell users that CTEs in bespoke tests were not supported.
However, this issue is a bigger blocker, because now sqlserver__ ports of dbt-utils macros that were working would need to be refactored to not use CTEs in order to be tested. That is possible, but not ideal. Here's a commit of me doing it once, but I'm not looking forward to how this will look for a macro like get_date_spine.
Your input is much appreciated as always @jtcohen6! It's a bummer that I didn't consider this scenario when initially testing dbt-sqlserver with dbt-core==0.20.0.
I'm using the tsql_utils.surrogate_key macro, and when running or compiling dbt I get this warning:
Warning: the concat macro is now provided in dbt Core. It is no longer available in dbt_utils and backwards compatibility will be removed in a future version of the package. Use concat (no prefix) instead.
Hello!
I'm trying to run a date_spine to populate my Calendar Dimension, but it crashes with a Compilation Error:
23:29:43 Running with dbt=1.4.6
23:29:44 Encountered an error:
Compilation Error in model DimCalendar (models/03_dimensions/DimCalendar.sql)
'dict object' has no attribute 'datediff'
> in macro sqlserver__date_spine (macros/dbt_utils/datetime/date_spine.sql)
> called by macro date_spine (macros/sql/date_spine.sql)
> called by model DimCalendar (models/03_dimensions/DimCalendar.sql)
Version info:
dbt --version
Core:
- installed: 1.4.6
- latest: 1.5.1 - Update available!
Your version of dbt-core is out of date!
You can find instructions for upgrading here:
https://docs.getdbt.com/docs/installation
Plugins:
- sqlserver: 1.4.3 - Up to date!
packages:
- package: dbt-labs/dbt_utils
version: ["1.1.0"]
- package: calogica/dbt_date
version: ["0.7.2"]
- package: dbt-msft/tsql_utils
version: ["0.9.0"]
With a downgrade to dbt_utils 0.9.0, date_spine worked, but emitted some warnings:
> Executing task: dbt compile --model DimCalendar
23:35:07 Running with dbt=1.4.6
23:35:07 [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- tests
23:35:07 Found 65 models, 0 tests, 0 snapshots, 1 analysis, 695 macros, 0 operations, 10 seed files, 199 sources, 0 exposures, 0 metrics
23:35:07
23:35:08 Concurrency: 4 threads (target='dev')
23:35:08
23:35:08 Warning: the `datediff` macro is now provided in dbt Core. It is no longer available in dbt_utils and backwards compatibility will be removed in a future version of the package. Use `datediff` (no prefix) instead. The dbt_sazi.DimCalendar model triggered this warning.
23:35:08 Warning: the `dateadd` macro is now provided in dbt Core. It is no longer available in dbt_utils and backwards compatibility will be removed in a future version of the package. Use `dateadd` (no prefix) instead. The dbt_sazi.DimCalendar model triggered this warning.
23:35:08 Done.
make a PR here
https://github.com/fishtown-analytics/hub.getdbt.com
Hi,
I am in the process of creating TSQL shims for the great packages dbt-date and dbt-expectations (dbt-date-tsql and dbt-expectations-tsql), and I have followed a similar approach to tsql-utils, creating a submodule linking to the original package.
I am actually wondering: why are dbt shims following this approach, and what are the advantages? Is this to track a specific version of the original package? But in that case, shouldn't .gitmodules be defining a specific branch or commit from this package?
And a second question: once the packages are ready, would it make sense to move them under the dbt-msft umbrella?
Cheers,
error message: Cannot find data type 'string'. It has to be varchar. For example:
select
md5(cast(concat(coalesce(cast(ID as string), '')) as string)) as Key
I have dbt_utils and tsql_utils installed.
On one of my model's columns, I have the following test:
- name: MyColumn
tests:
- not_null
- accepted_values:
values: ["Opt1", "Opt2", "Opt3", "Opt4"]
- dbt_utils.expression_is_true:
expression: "!= 'Opt1'"
config:
where: "OtherColumn is null or OtherColumn = 0"
When I run dbt test, I get the following error:
('42000', '[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Create View or Function failed because no column name was specified for column 1. (4511) (SQLMoreResults)')
The compiled SQL is
select
1
from (select * from MyTable where OtherColumn is null or OtherColumn = 0) dbt_subquery
where not(MyColumn != 'Opt1')
which, when run, is created as a view. SQL Server does not support unnamed columns in views, so this fails.
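For what it's worth, simply aliasing the literal makes the view creatable (a sketch of what a sqlserver__ shim could emit; `failures` is an illustrative column name, MyTable/MyColumn as in the report above):

```sql
-- SQL Server requires every view column to be named, so the bare `1`
-- gets an alias.
select
    1 as failures
from (select * from MyTable where OtherColumn is null or OtherColumn = 0) dbt_subquery
where not(MyColumn != 'Opt1')
```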
For reference, the following dispatch configuration is in my dbt_project.yml
dispatch:
- macro_namespace: dbt_utils
search_order: ["tsql_utils", "dbt_utils"]
Is this a bug, or am I doing something wrong? It seems there is no expression_is_true macro in tsql-utils for dbt_utils, but I may be misunderstanding the dispatch flow. Thanks!
It seems the compiled query generates a nested CTE expression, which causes the failure.
{{ dbt_date.get_date_dimension('2000-01-01', '2050-12-31') }}
Incorrect syntax near 'with'.
with base_dates as (
with date_spine as (...
packages.yml
packages:
- package: calogica/dbt_date
version: 0.4.1
- package: dbt-labs/dbt_utils
version: 0.7.4
- package: dbt-msft/tsql_utils
version: 0.8.1
dispatch defined in dbt_project.yml
dispatch:
- macro_namespace: dbt_utils
search_order: ['tsql_utils', 'dbt_utils']
- macro_namespace: dbt_date
search_order: ['tsql_utils', 'dbt_date']
Hey Guys,
yes... ANOTHER surrogate key issue from me... But I think you guys are the right people to nerd around with for this type of thing :)
I am visualizing the models I create with DBT in Power BI and I imagine many of you are doing the same.
One of the best practices in Power BI is to keep columns small and reduce their uniqueness. Unfortunately hash surrogate keys don't fit that best practice at all.
You can't use the binary column type for relationships and instead have to use the string type. So all your keys are 32 character long and highly unique.
I analysed my model with Dax Studio and found that a very large portion of the model size is due to surrogate keys.
So one negative impact that is definitely happening is a bigger model size and RAM consumption of the Power BI datasets.
Additionally, there are some claims that it also affects the performance if you create relationships on large text columns.
Here is an article explaining the performance impact and here is a reddit discussion.
To be honest, I am skeptical about that. Since relationships are pre-defined, it would be silly if Power BI didn't optimize away the strings... (but I don't know for sure).
So my first question is: How do you handle the problem? Or do you think it's not important to handle at all?
Does the simplicity of dbt's hash keys and the productivity increase of dbt outweigh the downsides when using Power BI for you?
Do you have any idea on how to benchmark whether hash keys have an impact on query performance for imported Power BI datasets?
I have also come up with a prototype to solve this issue. I have not implemented it in production yet (and I am still undecided whether I actually should), but it seems to work well enough to get your opinion here.
The model below has a test_int_key
column that is set to null when the model is built.
It also has a post_hook that updates that column with a int value corresponding to the hash key.
{{
config(
materialized='table',
as_columnstore = false,
post_hook=["
{{ gen_int_surrogate_keys(this, 'test_key', 'test_int_key')}}
"]
)
}}
with src_test as (
select * from {{ref('src_test')}}
),
dim_test as (
select
{{ dbt_utils.surrogate_key(["test_id"]) }} as test_key,
cast(null as int) as test_int_key, -- Will be replaced by final int key in macro
status,
foo,
bar
from base_test
)
select * from dim_test
This macro creates a table for the primary key of the dimension, with an identity property for the int_key and another column for the hash_key.
When the post_hook runs this macro, it merges all new hash keys into the table, which generates new rows with a new unique int_key for them (due to the identity column property).
Then the macro updates the int_key column of the dimension table with int keys that correspond to the hash keys.
{%- macro gen_int_surrogate_keys(this,
hash_key_col_name,
int_key_col_name) -%}
{%
set int_table = this.schema + '.int_keys_' + this.table
%}
-- Create table to store int keys for hash keys if it doesn't exist
if object_id ('{{ int_table }}', 'U') is null
begin
create table {{ int_table }} (
int_key int identity(1,1) not null,
hash_key varbinary(8000)
)
{% set idx_name = 'int_keys_' + this.table + '__index_on_hash_key' %}
create nonclustered index {{ idx_name }}
on {{ int_table }} (hash_key)
end;
-- Merge new hash keys that are not in int_table yet
with hash_key_data as (
select
{{ hash_key_col_name }} as hash_key
from {{ this }}
)
merge {{ int_table }} target_tbl
using hash_key_data src_tbl
on target_tbl.hash_key = src_tbl.hash_key
when not matched by target
then insert (hash_key) values (src_tbl.hash_key);
-- Update orig table's int_key column with int keys
update
{{ this }}
set
{{ this }}.{{ int_key_col_name }} = int_key.int_key
from {{ int_table }} int_key
where
{{ this }}.{{ hash_key_col_name }} = int_key.hash_key
{%- endmacro -%}
I generate table models for my facts and dimensions. And then I create views with renamed columns that I actually import in Power BI.
The underlying dimension tables contain the int_key columns. The fact tables are not regenerated to include the int key columns.
Instead I join to the dimensions in the report views on the hash key, and then retrieve the int key from the dimension:
with fact_test as (
select * from {{ref('fact_test')}}
),
dim_test as (
select * from {{ref('dim_test')}}
),
final as (
select
-- Use the int key as the key instead of the hash key
dim_test.test_int_key as TestKey,
spend as Spend,
impressions as Impressions,
clicks as Clicks,
conversions as Conversions
from fact_test
-- Join to the dimension table using the hash key
-- This way we don't have to recreate the actual
-- fact tables with the new int keys.
-- We do it dynamically in the report views
left join dim_test
on fact_test.test_key = dim_test.test_key
)
select * from final
This increases the model generation time and probably makes the Power BI refresh a bit longer due to the join in the report view.
But the upside is that the model size and RAM consumption would be lower. It also has the potential to improve query performance.
Do you think this is an issue worth bothering with? Or is the upside too small?
And what do you think of my idea to solve it? Do you perhaps have improvement suggestions or other ideas of how this could be solved better?
And do you think there is a big enough need for this that we should integrate the solution into tsql_utils?
The insert_by_period materialization is very useful when populating large amounts of historical data. It enables the efficient use of indexes and allows work to be chunked into smaller pieces.
Currently, this materialization is disabled for TSQL.
Having started transitioning our work to dbt recently, I have a working implementation of the insert_by_period materialization that's currently running in production. There are a few additional features I've implemented to allow some developer flexibility. I think it is now stable enough to contribute externally.
The implementation is based on Alex Higg's implementation for Snowflake.
I have a bare-bones install of dbt-sqlserver, with the following packages.
Not sure what to do to fix this.
The CI is failing during the dbt seed stage due to an ill-formatted file (for Synapse); the file is formatted fine for the Azure SQL runs.
This means that the tests aren't running for Synapse.
The data is not stored in this repo, but in dbt_utils (https://github.com/fishtown-analytics/dbt-utils/blob/master/integration_tests/data/etc/data_people.csv)
A local fix could be to let dbt seed fail gracefully without raising an error to the CI and preventing the tests from running. Or to identify why this file does not work and fix that. Alternately, we could --exclude the file's load, but we would also have to exclude downstream tests, which quickly gets messy.
We need to pass a minimum of 2 fields when calling the dbt_utils.generate_surrogate_key macro from the model.
with listing_hosts as (
select
xxxxxxx
from
xxxx
),
dim_host as (
select
yyyyyy
from
yyyy
)
select
{{ dbt_utils.generate_surrogate_key([
'host_id','host_id'
]) }} as host_sid,
{{ cols_host }}
from dim_host
Expected: generate a surrogate key even if I pass only one field as input to the macro.
with listing_hosts as (
select
xxxxxxx
from
xxxx
),
dim_host as (
select
yyyyyy
from
yyyy
)
select
{{ dbt_utils.generate_surrogate_key([
'host_id'
]) }} as host_sid,
{{ cols_host }}
from dim_host
14:52:03 Unhandled error while executing target\run\xxxx\xx\xxxx\xxx\xxxxx.sql
('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Incorrect syntax near the keyword 'from'. (156) (SQLMoreResults); [42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Incorrect syntax near the keyword 'order'. (156); [42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]The concat function requires 2 to 254 arguments. (189)")
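A shim could sidestep the concat arity error by always appending an empty string literal, so even a single-field key compiles to a two-argument concat (a sketch, reusing host_sid/dim_host from the example above; the padding mirrors the multi-field compiled output shown elsewhere in this thread):

```sql
-- concat() in T-SQL requires 2 to 254 arguments; the trailing '' pads
-- a single-column key to two arguments without changing the hash input.
select convert(varchar(50), hashbytes('md5',
    concat(coalesce(cast(host_id as varchar(max)), '_dbt_utils_surrogate_key_null_'), '')
), 2) as host_sid
from dim_host
```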
In this dbt Slack thread, someone shared a dbt-utils-esque version of the date_spine macro.
We should bring this into this package. It might even also work for Synapse; I don't see any recursive CTEs...
- create a datetime folder inside of tsql-utils/macros/dbt_utils/
- move date_spine.sql to the newly created datetime folder
{% macro synapse__date_spine(datepart, start_date, end_date) -%}
{% do return( tsql_utils.sqlserver__date_spine(datepart, start_date, end_date)) %}
{%- endmacro %}
tsql-utils/integration_tests/dbt_utils/dbt_project.yml
Lines 34 to 36 in c47616b
{% macro sqlserver__date_spine_sql(start_date, end_date, datepart) %}
with
l0 as (
select c
from (select 1 union all select 1) as d(c)
),
l1 as (
select
1 as c
from l0 as a
cross join l0 as b
),
l2 as (
select 1 as c
from l1 as a
cross join l1 as b
),
l3 as (
select 1 as c
from l2 as a
cross join l2 as b
),
l4 as (
select 1 as c
from l3 as a
cross join l3 as b
),
l5 as (
select 1 as c
from l4 as a
cross join l4 as b
),
nums as (
select row_number() over (order by (select null)) as rownum
from l5
),
rawdata as (
select top ({{dbt_utils.datediff(start_date, end_date, datepart)}}) + rownum -1 as n
from nums
order by rownum
),
all_periods as (
select (
{{
dbt_utils.dateadd(
datepart,
'n',
start_date
)
}}
) as date_{{datepart}}
from rawdata
),
filtered as (
select *
from all_periods
where date_{{datepart}} <= {{ end_date }}
)
select * from filtered
{% endmacro %}
{% macro sqlserver__date_spine(start_date, end_date, datepart) -%}
{% set date_spine_query %}
{{sqlserver__date_spine_sql(start_date, end_date, datepart)}} order by 1
{% endset %}
{% set results = run_query(date_spine_query) %}
{% if execute %}
{% set results_list = results.columns[0].values() %}
{% else %}
{% set results_list = [] %}
{% endif %}
{%- for date_field in results_list %}
select '{{ date_field }}' as date_{{datepart}} {{ 'union all ' if not loop.last else '' }}
{% endfor -%}
{% endmacro %}
I don't think this is an error with the macro, but really something weird going on with equivalence. For example, why aren't the below equal in Synapse?
actual | expected |
---|---|
D41D8CD98F00B204E9800998ECF8427E | d41d8cd98f00b204e9800998ecf8427e |
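The difference is just hex casing: convert(..., 2) in T-SQL emits uppercase hex digits, while the expected values are lowercase. Normalizing before comparing should make them equal (a sketch using the empty-string md5 from the table above):

```sql
-- convert with style 2 yields uppercase hex; lower() normalizes the
-- casing so the comparison against the lowercase expected value succeeds.
select lower(convert(varchar(32), hashbytes('md5', ''), 2)) as actual;
-- d41d8cd98f00b204e9800998ecf8427e
```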
Hi all, first contribution to a project so please let me know if I've done anything wrong! 😄
Only a small change, but a key one for the functionality of this macro, I think.
Currently, when collecting the model names to check presence in the target DB, the macro only looks at node.name. This means aliased models are not checked correctly, and models still in use are dropped.
The simple suggested solution is to swap current_models.append(node.name) with current_models.append(node.name if node.config.alias == None else node.config.alias).
Hey T-SQL team!
Given that this package exists alongside dispatch, but doesn't dispatch macros itself, I think this will only require a change to the README. Once users and packages have upgraded to the new dispatch syntax in dbt v0.20, they'll want to specify a first-order project config in dbt_project.yml
:
dispatch:
- macro_namespace: dbt_utils
search_order: ['tsql_utils', 'dbt_utils']
- macro_namespace: dbt_date
search_order: ['tsql_utils', 'dbt_date']
- macro_namespace: dbt_expectations
search_order: ['tsql_utils', 'dbt_expectations']
Instead of:
vars:
dbt_utils_dispatch_list: ['tsql_utils']
dbt_date_dispatch_list: ['tsql_utils']
dbt_expectations_dispatch_list: ['tsql_utils']
Two crucial differences in the new syntax: users specify only the namespaces they need, so if they're only using dbt_utils, they should only specify that one. In the meantime, there is backwards compatibility for the old syntax, so users on old versions of dbt, or using old versions of packages, can still specify those vars.
Now that #38 is merged, the fun can begin, shimming macros to get the dbt-expectations integration tests passing.
Currently the only place where you can see which macros are not supported is integration_tests/dbt_utils/dbt_project.yml (see below), which is not intuitive; and while the YAML is precise, the models and tests in the integration tests don't necessarily correspond 1:1 with individual macros.
Maybe something in the README.md? Or perhaps just one issue per macro? However, dbt_utils.group_by() I know we'll never support. Worth mentioning?
After #17, the current macros we could still port are:
- test_split_part() (not working on Synapse?)
- test_date_spine()
- insert_by_period materialization
- generate_series()
- get_column_values()
- get_relations_by_pattern()
- get_relations_by_prefix_and_union()
- get_url_host()
- get_url_parameter()
- get_url_path()
tsql-utils/integration_tests/dbt_utils/dbt_project.yml
Lines 22 to 62 in b68b3a9
While using dbt_utils.deduplicate inside a CTE, the dbt compiler creates code that has a nested CTE, which then throws a compilation error while running:
Incorrect syntax near the keyword 'with'
('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]
Incorrect syntax near the keyword 'with'. (156)
(SQLMoreResults); [42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]
Incorrect syntax near the keyword 'with'. If this statement is a common table expression, an xmlnamespaces clause or a change tracking context clause, the previous statement must be terminated with a semicolon.
(319); [42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Incorrect syntax near 'natural'. (102)")
How to reproduce: I am creating a model that references another model, stg_dim_listings.
My Code:
{% set cols = dbtplyr.get_column_names(ref('stg_dim_listings')) %}
{% set cols_host = dbtplyr.starts_with('host_', cols) | join(', ') %}
with listing_hosts as (
select
{{ cols_host }}
from
{{ ref('stg_dim_listings') }}
),
dim_host as (
{{ dbt_utils.deduplicate(
relation='listing_hosts',
partition_by='host_id',
order_by='host_listings_count desc',
)
}}
)
select
{{ dbt_utils.generate_surrogate_key([
'host_id'
]) }} as host_sid,
{{ cols_host }}
from dim_host
dbt-generated code from the compiled folder:
with listing_hosts as (
select
host_id, host_url, host_name, host_location, host_about, host_response_time, host_is_superhost, host_thumbnail_url, host_picture_url, host_neighbourhood, host_listings_count, host_total_listings_count, host_verifications, host_has_profile_pic, host_identity_verified, host_since, host_response_rate, host_acceptance_rate
from
"azuresql_dbt_db"."staging"."stg_dim_listings"
),
dim_host as (
with row_numbered as (
select
_inner.*,
row_number() over (
partition by host_id
order by host_listings_count desc
) as rn
from listing_hosts as _inner
)
select
distinct data.*
from listing_hosts as data
natural join row_numbered
where row_numbered.rn = 1
)
select convert(varchar(50), hashbytes('md5', concat(coalesce(cast(host_id as VARCHAR(MAX)), '_dbt_utils_surrogate_key_null_'), '')), 2)
as host_sid, host_id, host_url, host_name, host_location, host_about, host_response_time, host_is_superhost, host_thumbnail_url, host_picture_url, host_neighbourhood, host_listings_count, host_total_listings_count, host_verifications, host_has_profile_pic, host_identity_verified, host_since, host_response_rate, host_acceptance_rate
from dim_host
From my understanding, a nested WITH is not allowed in SQL Server. Kindly correct me if I am wrong in my analysis.
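If that analysis holds, one workaround is to inline the deduplication as additional members of the same WITH list instead of calling dbt_utils.deduplicate inside a CTE, so no nested WITH is ever emitted. A hedged sketch against the compiled SQL above (the natural join is replaced by a plain rn = 1 filter, which I believe is equivalent here):

```sql
with listing_hosts as (
    select host_id, host_listings_count  -- ...plus the other host_* columns
    from "azuresql_dbt_db"."staging"."stg_dim_listings"
),
row_numbered as (
    select
        listing_hosts.*,
        row_number() over (
            partition by host_id
            order by host_listings_count desc
        ) as rn
    from listing_hosts
),
dim_host as (
    -- keep only the first row per host_id; no nested WITH required
    select * from row_numbered where rn = 1
)
select * from dim_host
```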
Hi Guys,
First of all, thank you for creating this package. Your efforts are much appreciated.
I noticed the hash adapter converts the md5 bytes into a varchar:
{% macro sqlserver__hash(field) %}
convert(varchar(50), hashbytes('md5', {{field}}), 2)
{% endmacro %}
If we leave the hash as varbinary, it uses only 16 bytes of data, but as varchar it uses 32. I suspect joins on a 32-character varchar are probably also slower than on a varbinary.
Would it be possible to take out that conversion?
I have been using varbinary in my own adapter for several months without problems. The only issue is that Power BI doesn't support relationships on varbinary columns.
To solve that problem I created a separate macro that does the conversion when it's needed:
{% macro cast_hash_to_str(col) -%}
convert(varchar(32), {{ col }}, 2)
{%- endmacro %}
And in my final model that is being consumed by PBI I just do something like...
final as (
select
{{ cast_hash_to_str('url_key') }} as UrlKey,
...
from whatever
)
This way the hashes use less space in the DB and join faster, and when necessary, such as in Power BI, they are converted to strings.
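The size difference is easy to check directly in T-SQL (a minimal sketch; the input string is arbitrary):

```sql
-- an md5 digest is 16 bytes as varbinary, but 32 characters once hex-encoded
select
    datalength(hashbytes('md5', 'example'))                          as varbinary_bytes, -- 16
    datalength(convert(varchar(50), hashbytes('md5', 'example'), 2)) as varchar_bytes    -- 32
```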
I get an error when I try to create a date dimension based on the date_spine macro using a range of 1900-01-01 till 2999-01-01:
Error: ('42000', '[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]Some part of your SQL statement is nested too deeply. Rewrite the query or break it up into smaller queries. (103003) (SQLExecDirectW)'
Note that the date spine macro is apparently able to generate dates with the script below from 1900-01-01 until 1933-06-21; starting from 1933-06-22 it fails.
Copying the script from the compiled folder and pasting it directly into SQL Management Studio causes no problem. It appears the issue is in passing the compiled script to SQL Server.
Steps to reproduce:
with
Base_Date as (
{{ dbt_utils.date_spine(
datepart="day",
start_date="cast('1900-01-01' as date)",
end_date="cast('2999-12-31' as date)"
)
}}
)
select * from Base_Date
Tested on:
- SQL Server 14.0.3445.2
- dbt-core: 1.1.2
- dbt-synapse: 1.1.0
- dbt-sqlserver: 1.1.0
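Until the macro copes with very large ranges, a possible workaround, given that the nesting limit was hit after roughly 33 years of days, is to split the spine into smaller chunks and union them (a sketch; the chunk boundaries are arbitrary, and I'm assuming date_spine's end_date stays exclusive so the boundaries don't duplicate):

```sql
with Base_Date as (
    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="cast('1900-01-01' as date)",
        end_date="cast('1930-01-01' as date)"
    ) }}
    union all
    {{ dbt_utils.date_spine(
        datepart="day",
        start_date="cast('1930-01-01' as date)",
        end_date="cast('1960-01-01' as date)"
    ) }}
    -- ...repeat in ~30-year chunks until the final call ends at
    -- cast('2999-12-31' as date)
)
select * from Base_Date
```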