Git Product home page Git Product logo

Comments (12)

marclipoff-curative avatar marclipoff-curative commented on August 15, 2024 1

I think it worked. My cluster is having a tough time churning on the huge amount of data we have... so cant be 100% yet

from dbt_mixpanel.

fivetran-joemarkiewicz avatar fivetran-joemarkiewicz commented on August 15, 2024

@marclipoff-curative thanks so much for opening this issue. I am just starting to look into the error and noticed the LISTAGG issue is occurring within our mixpanel__sessions model at the following line.

'{' || {{ fivetran_utils.string_agg("(event_type || ': ' || number_of_events)", "', '") }} || '}' as event_frequencies

The output of this event frequencies field will create a JSON object of the frequency of each event_type during this user session. As Redshift has a limit of 65535, it seems your sessions are having more frequency than the allotted amount. Before digging into the solution I wanted to ask a few quick questions:

  • Is this event_frequencies field important to you? You could analyze it further by exploding the JSON object, but curious if you find this field valuable for your current analysis. If you do not, then we can easily set a variable that will disable this field if you want.
  • If you find the event_frequencies field valuable, would it defeat the integrity of the field if we set a limit on the LISTAGG? For example, if we limited the LISTAGG to 50000 would you find the integrity of the field to be compromised since we are artificially stopping the event_type frequency?

For arguments sake, I actually ended up quickly removing this field if you wanted to see the package models succeed and you can evaluate the outcome and consider the importance of this field. You can leverage this version of the package by adding the below package configuration to your packages.yml

packages:
  - git: https://github.com/fivetran/dbt_mixpanel.git
    revision: hotfix/frequency-variable
    warn-unpinned: false

Let me know your thoughts!

from dbt_mixpanel.

marclipoff-curative avatar marclipoff-curative commented on August 15, 2024

I dont need it at the moment. So I'd be fine excluding it. When running that hotfix, I get:

16:09:04  Running with dbt=1.0.1
16:09:04  [WARNING]: Deprecated functionality
The `source-paths` config has been renamed to `model-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
16:09:04  [WARNING]: Deprecated functionality
The `data-paths` config has been renamed to `seed-paths`. Please update your
`dbt_project.yml` configuration to reflect this change.
16:09:05  [WARNING]: Configuration paths exist in your dbt_project.yml file which do not apply to any resources.
There are 1 unused configuration paths:
- sources.dwh.landing.src_helpscount

16:09:05  Found 443 models, 505 tests, 0 snapshots, 4 analyses, 832 macros, 2 operations, 4 seed files, 633 sources, 0 exposures, 0 metrics
16:09:05
16:09:09  Concurrency: 10 threads (target='prod')
16:09:09
16:09:09  1 of 3 START view model base.stg_mixpanel__event_tmp............................ [RUN]
16:09:12  1 of 3 OK created view model base.stg_mixpanel__event_tmp....................... [CREATE VIEW in 3.27s]
16:09:13  2 of 3 START incremental model base.mixpanel__event............................. [RUN]
16:12:06  2 of 3 OK created incremental model base.mixpanel__event........................ [INSERT 0 2800820 in 172.16s]
16:12:06  3 of 3 START incremental model base.mixpanel__sessions.......................... [RUN]
16:12:06  3 of 3 ERROR creating incremental model base.mixpanel__sessions................. [ERROR in 0.81s]
16:12:07
16:12:07  Running 2 on-run-end hooks
16:12:07  Beginning drop of orphaned relations. dry_run=True
16:12:07  This command would be run: drop view if exists base.redshift_admin_running_queries cascade;
16:12:07  1 of 2 START hook: dwh.on-run-end.0............................................. [RUN]
16:12:07  1 of 2 OK hook: dwh.on-run-end.0................................................ [OK in 0.00s]
16:12:07  2 of 2 START hook: dwh.on-run-end.1............................................. [RUN]
16:12:07  2 of 2 OK hook: dwh.on-run-end.1................................................ [OK in 0.00s]
16:12:07
16:12:07
16:12:07  Finished running 1 view model, 2 incremental models, 2 hooks in 182.13s.
16:12:08
16:12:08  Completed with 1 error and 0 warnings:
16:12:08
16:12:08  Database Error in model mixpanel__sessions (models/mixpanel__sessions.sql)
16:12:08    syntax error at or near "("
16:12:08    LINE 128:     listagg((event_type || ': ' || number_of_events), ', ')
16:12:08                         ^
16:12:08
16:12:08  Done. PASS=2 WARN=0 ERROR=1 SKIP=0 TOTAL=3```

from dbt_mixpanel.

fivetran-joemarkiewicz avatar fivetran-joemarkiewicz commented on August 15, 2024

Sounds good, thanks @marclipoff-curative!

Were you able to run dbt clean before installing the new package? Also, would you be able to run dbt run --full-refresh to make sure we skip over any holdover incremental logic from past runs?

from dbt_mixpanel.

marclipoff-curative avatar marclipoff-curative commented on August 15, 2024

I think the problem is the stray ( . right?

from dbt_mixpanel.

fivetran-joemarkiewicz avatar fivetran-joemarkiewicz commented on August 15, 2024

Where do you see the stray (? The error is a bit strange since I completely removed the listagg function from the branch.

from dbt_mixpanel.

marclipoff-curative avatar marclipoff-curative commented on August 15, 2024

It compiles to this:

agg_event_types as (

    select
        session_id
        -- turn into json
        -- '{' ||
    listagg((event_type || ': ' || number_of_events), ', ')

 || '}' as event_frequencies

    from (

        select
            session_id,
            event_type,
            count(unique_event_id) as number_of_events

        from session_ids
        group by session_id, event_type

    ) as sub group by session_id
),

I think the fivetran_utils.string_agg macro creates a new line. So when commenting out using a --, the new line doesn't get commented out.

from dbt_mixpanel.

fivetran-joemarkiewicz avatar fivetran-joemarkiewicz commented on August 15, 2024

Can you attempt and re-run dbt deps as I think that may be a previous commit since I can see it is commented out. Below is the current version of that CTE in the hotfix branch:

from dbt_mixpanel.

fivetran-joemarkiewicz avatar fivetran-joemarkiewicz commented on August 15, 2024

Actually, I was just messing around and think I found a way for users to leverage this field even if there are a few fields that exceed the limit. See my code below

{% if target.type in ('postgres','redshift') %}
case when count(event_type) <= {{ var('mixpanel__event_frequency_limit', 50000) }}
then '{' || {{ fivetran_utils.string_agg("(event_type || ': ' || number_of_events)", "', '") }} || '}'
else 'Too many event types to render'
end
{% else %}
'{' || {{ fivetran_utils.string_agg("(event_type || ': ' || number_of_events)", "', '") }} || '}'
{% endif %} as event_frequencies

Would you be able to test the new version of the branch as well and let me know if it succeeds for you? If it doesn't, you can try and adjust the new mixpanel__event_frequency_limit variable to be smaller than the default 50,000 I set (50,000 may be too high of a default haha).

vars:
   mixpanel__event_frequency_limit: 1000

from dbt_mixpanel.

fivetran-joemarkiewicz avatar fivetran-joemarkiewicz commented on August 15, 2024

HI @marclipoff-curative I just wanted to reach back out and see if you were able to give the above branch a test?

We are planning to release this early next week and I wanted to make sure it worked on your end before pushing the changes to our next release.

Let me know if you have any questions!

from dbt_mixpanel.

fivetran-joemarkiewicz avatar fivetran-joemarkiewicz commented on August 15, 2024

Thanks! I will also limit the default to maybe 1000 to hopefully help with the string agg explosion as well

from dbt_mixpanel.

fivetran-joemarkiewicz avatar fivetran-joemarkiewicz commented on August 15, 2024

Closing this issue as PR #27 addressed the initial issue identified within this bug report.

from dbt_mixpanel.

Related Issues (14)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.