
Comments (4)

MICHM137 commented on June 20, 2024

Hello,

I can confirm this bug, which is a bit annoying: when we run models in parallel, we end up with duplicates in the tables.
As a workaround, I run the following queries periodically.

-- Deduplicate the elementary artifact tables, keeping one row per unique_id.
-- (Uses QUALIFY, available in e.g. Snowflake and BigQuery. Ordering by
-- generated_at ascending keeps the earliest row; use "order by generated_at desc"
-- to keep the latest instead.)
create or replace table elementary.dbt_tests as (
    select * from elementary.dbt_tests qualify row_number() over (partition by unique_id order by generated_at) = 1
);
create or replace table elementary.dbt_models as (
    select * from elementary.dbt_models qualify row_number() over (partition by unique_id order by generated_at) = 1
);
create or replace table elementary.dbt_sources as (
    select * from elementary.dbt_sources qualify row_number() over (partition by unique_id order by generated_at) = 1
);
create or replace table elementary.dbt_exposures as (
    select * from elementary.dbt_exposures qualify row_number() over (partition by unique_id order by generated_at) = 1
);
create or replace table elementary.dbt_columns as (
    select * from elementary.dbt_columns qualify row_number() over (partition by unique_id order by generated_at) = 1
);


haritamar commented on June 20, 2024

Hi @annav00 and @MICHM137,
Sorry for the delay in responding here. Can you please confirm whether this issue is still relevant for you?
Also, which databases are you using?

I'll mark this as high priority on our end.

In the meantime, a workaround you can consider is setting the var cache_artifacts = False. This forces a full replace of the artifacts on every run, which I think should prevent duplicates (though it can increase on_run_end duration).
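For example, in dbt_project.yml (a minimal sketch; I'm assuming here that the var is set at the project's top level, though dbt also lets you scope vars to a specific package):

vars:
  cache_artifacts: false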

(When caching is enabled we only insert a diff, and I think there's probably a race there.)


mattxxi commented on June 20, 2024

Hey (it's MICHM137),
I would avoid setting cache_artifacts to False, because the on_run_end hook already takes a lot of time and makes our pipelines much longer than they would be without elementary.
Do you plan to optimize the on_run_end hook?
Thanks for your answer.


haritamar commented on June 20, 2024

Hi @mattxxi,
Yeah, that makes sense. We implemented the cache for performance reasons; I just wanted to point out the alternative.

I think the duplicate entries when the cache is enabled result from a race in the delete_and_insert macro, which we need to fix.
I don't have an immediate time frame for it, but I'm guessing we'll prioritize it in the near future.
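To illustrate the kind of race I mean, here is a hypothetical sketch of a delete-then-insert upload interleaving across two concurrent sessions. The table name follows this thread; new_rows_a and new_rows_b are made-up staging tables, and this is not elementary's actual macro code:

-- Hypothetical sketch: two on_run_end hooks (sessions A and B) upload
-- overlapping artifact rows via delete-then-insert.

-- Step 1, session A: delete any existing rows for the ids it will insert.
delete from elementary.dbt_models
where unique_id in (select unique_id from new_rows_a);

-- Step 2, session B: runs the same delete before A's insert is visible,
-- so it finds nothing left to remove for the shared unique_ids.
delete from elementary.dbt_models
where unique_id in (select unique_id from new_rows_b);

-- Step 3: both sessions insert. Any unique_id present in both staging
-- tables now appears twice, matching the duplicates reported above.
insert into elementary.dbt_models select * from new_rows_a;
insert into elementary.dbt_models select * from new_rows_b;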

