dbt-teradata

This plugin ports dbt functionality to Teradata Vantage.

Installation

pip install dbt-teradata

If you are new to dbt on Teradata, see the dbt with Teradata Vantage tutorial.
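
After installation, you can confirm that the adapter is available with the dbt CLI. The exact output format varies by dbt version, but the plugins section should list teradata:

dbt --version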

Sample profile

Here is a working example of a dbt-teradata profile:

my-teradata-db-profile:
  target: dev
  outputs:
    dev:
      type: teradata
      host: localhost
      user: dbc
      password: dbc
      schema: dbt_test
      tmode: ANSI

At a minimum, you need to specify host, user, password, schema (database), and tmode.
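
The profile itself typically lives in ~/.dbt/profiles.yml. To point a dbt project at it, reference the profile name from dbt_project.yml; a minimal sketch (the project name is illustrative):

name: 'my_teradata_project'
version: '1.0.0'
config-version: 2
profile: 'my-teradata-db-profile'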

Python compatibility

Compatibility matrix: plugin versions 0.19.0.x, 0.20.0.x, 0.21.1.x, and 1.0.0.x vs. Python 3.6, 3.7, 3.8, 3.9, and 3.10.

Optional profile configurations

Logmech

The logon mechanism for Teradata jobs that dbt executes can be configured with the logmech configuration in your Teradata profile. The logmech field can be set to TD2, LDAP, KRB5, or TDNEGO. For more information on authentication options, see the Teradata Vantage authentication documentation.

my-teradata-db-profile:
  target: dev
  outputs:
    dev:
      type: teradata
      host: <host>
      user: <user>
      password: <password>
      schema: dbt_test
      tmode: ANSI
      logmech: LDAP

Logdata

Additional logon parameters for Teradata jobs that dbt executes can be supplied with the logdata configuration in your Teradata profile. Data such as a secure token, a distinguished name, or a domain/realm name can be passed using logdata. The logdata field can be used with the JWT, LDAP, KRB5, and TDNEGO logon mechanisms; logdata is not used with the TD2 mechanism.

my-teradata-db-profile:
  target: dev
  outputs:
    dev:
      type: teradata
      host: <host>
      schema: dbt_test
      tmode: ANSI
      logmech: LDAP
      logdata: 'authcid=username password=password'
      port: <port>

For more information on authentication options, see the Teradata Vantage authentication documentation.

Stored Password Protection

Stored Password Protection enables an application to provide a connection password in encrypted form to the driver. The plugin supports the Stored Password Protection feature through the ENCRYPTED_PASSWORD( prefix, either in the password connection parameter or in the logdata connection parameter.

  • password
    my-teradata-db-profile:
      target: dev
      outputs:
        dev:
          type: teradata
          host: <host>
          user: <user>
          password: ENCRYPTED_PASSWORD(file:PasswordEncryptionKeyFileName,file:EncryptedPasswordFileName)
          schema: dbt_test
          tmode: ANSI
          port: <port>
  • logdata
    my-teradata-db-profile:
      target: dev
      outputs:
        dev:
          type: teradata
          host: <host>
          schema: dbt_test
          tmode: ANSI
          logmech: LDAP
          logdata: 'authcid=username password=ENCRYPTED_PASSWORD(file:PasswordEncryptionKeyFileName,file:EncryptedPasswordFileName)'
          port: <port>

For a full description of Stored Password Protection, see https://github.com/Teradata/python-driver#StoredPasswordProtection.

Port

If your Teradata database runs on a port different from the default (1025), you can specify a custom port in your dbt profile using the port configuration.

my-teradata-db-profile:
  target: dev
  outputs:
    dev:
      type: teradata
      host: <host>
      user: <user>
      password: <password>
      schema: dbt_test
      tmode: ANSI
      port: <port>

Other Teradata connection parameters

The plugin also supports the following Teradata connection parameters:

  • account
  • column_name
  • cop
  • coplast
  • encryptdata
  • fake_result_sets
  • field_quote
  • field_sep
  • lob_support
  • log
  • logdata
  • max_message_body
  • partition
  • sip_support
  • teradata_values

For a full description of the connection parameters, see https://github.com/Teradata/python-driver#connection-parameters.
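
These parameters are set alongside the standard options in your profile. A minimal sketch (the values shown are illustrative only; consult the driver documentation above for the accepted values and their exact semantics):

my-teradata-db-profile:
  target: dev
  outputs:
    dev:
      type: teradata
      host: <host>
      user: <user>
      password: <password>
      schema: dbt_test
      tmode: ANSI
      log: "8"                  # illustrative value: driver debug logging mask
      teradata_values: "false"  # illustrative value: controls Teradata-to-Python type conversion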

Supported Features

Materializations

  • view
  • table
  • ephemeral
  • incremental
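
For example, an incremental model can use the standard dbt is_incremental() pattern; a minimal sketch (the model, source, and column names are illustrative):

{{
  config(
      materialized="incremental"
  )
}}
select id, updated_at, payload
from {{ ref('stg_events') }}
{% if is_incremental() %}
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}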

Commands

All dbt commands are supported.
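
For example, a typical sequence with this adapter looks the same as with any other dbt project:

dbt debug          # verify that the profile can connect to Teradata
dbt seed           # load seed CSV files
dbt run            # build models
dbt test           # run tests
dbt docs generate  # build documentation artifacts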

Custom configurations

General

  • Enable view column types in docs - Teradata Vantage has a dbscontrol configuration flag called DisableQVCI (QVCI - Queryable View Column Index). When this flag is set to false, the database builds DBC.ColumnsJQV with view column type definitions.

    ℹ️ Existing customers, please see KB0022230 for more information about enabling QVCI.

    To enable this functionality, you need to:

    1. Enable QVCI mode in Vantage. Use the dbscontrol utility and then restart Teradata. Run these commands as a privileged user on a Teradata node:
      # option 551 is DisableQVCI. Setting it to false enables QVCI.
      dbscontrol << EOF
      M internal 551=false
      W
      EOF
      
      # restart Teradata
      tpareset -y Enable QVCI
    2. Instruct dbt to use QVCI mode. Include the following variable in your dbt_project.yml:
      vars:
        use_qvci: true
      For example configuration, see test/catalog/with_qvci/dbt_project.yml.
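
    As a quick sanity check after the restart, you can query DBC.ColumnsJQV directly; a sketch that assumes ColumnsJQV follows the same column layout as DBC.ColumnsV (the database and view names are illustrative):
      SELECT ColumnName, ColumnType
      FROM DBC.ColumnsJQV
      WHERE DatabaseName = 'dbt_test'
        AND TableName = 'my_view';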

Models

Table

The following options apply to table, snapshot, and seed materializations.

  • table_kind - defines the table kind. Legal values are MULTISET (the default for the ANSI transaction mode required by dbt-teradata) and SET, e.g.:

    • in sql materialization definition file:
      {{
        config(
            materialized="table",
            table_kind="SET"
        )
      }}
    • in seed configuration:
      seeds:
        <project-name>:
          table_kind: "SET"

    For details, see CREATE TABLE documentation.

  • table_option - defines table options. Legal values are:

    { MAP = map_name [COLOCATE USING colocation_name] |
      [NO] FALLBACK [PROTECTION] |
      WITH JOURNAL TABLE = table_specification |
      [NO] LOG |
      [ NO | DUAL ] [BEFORE] JOURNAL |
      [ NO | DUAL | LOCAL | NOT LOCAL ] AFTER JOURNAL |
      CHECKSUM = { DEFAULT | ON | OFF } |
      FREESPACE = integer [PERCENT] |
      mergeblockratio |
      datablocksize |
      blockcompression |
      isolated_loading
    }

    where:

    • mergeblockratio:
      { DEFAULT MERGEBLOCKRATIO |
        MERGEBLOCKRATIO = integer [PERCENT] |
        NO MERGEBLOCKRATIO
      }
    • datablocksize:
      DATABLOCKSIZE = {
        data_block_size [ BYTES | KBYTES | KILOBYTES ] |
        { MINIMUM | MAXIMUM | DEFAULT } DATABLOCKSIZE
      }
    • blockcompression:
      BLOCKCOMPRESSION = { AUTOTEMP | MANUAL | ALWAYS | NEVER | DEFAULT }
        [, BLOCKCOMPRESSIONALGORITHM = { ZLIB | ELZS_H | DEFAULT } ]
        [, BLOCKCOMPRESSIONLEVEL = { value | DEFAULT } ]
    • isolated_loading:
      WITH [NO] [CONCURRENT] ISOLATED LOADING [ FOR { ALL | INSERT | NONE } ]

    Examples:

    • in sql materialization definition file:
      {{
        config(
            materialized="table",
            table_option="NO FALLBACK"
        )
      }}
      {{
        config(
            materialized="table",
            table_option="NO FALLBACK, NO JOURNAL"
        )
      }}
      {{
        config(
            materialized="table",
            table_option="NO FALLBACK, NO JOURNAL, CHECKSUM = ON,
              NO MERGEBLOCKRATIO,
              WITH CONCURRENT ISOLATED LOADING FOR ALL"
        )
      }}
    • in seed configuration:
      seeds:
        <project-name>:
          table_option:"NO FALLBACK"
      seeds:
        <project-name>:
          table_option:"NO FALLBACK, NO JOURNAL"
      seeds:
        <project-name>:
          table_option: "NO FALLBACK, NO JOURNAL, CHECKSUM = ON,
            NO MERGEBLOCKRATIO,
            WITH CONCURRENT ISOLATED LOADING FOR ALL"

    For details, see CREATE TABLE documentation.

  • with_statistics - specifies whether statistics should be copied from the base table, e.g.:

    {{
      config(
          materialized="table",
          with_statistics="true"
      )
    }}

    This option is not available for seeds as seeds do not use CREATE TABLE ... AS syntax.

    For details, see CREATE TABLE documentation.
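
    With this option enabled, the generated DDL roughly corresponds to Teradata's CREATE TABLE ... AS ... WITH DATA AND STATISTICS form; a sketch of the shape only, not the adapter's exact output (the table and column names are illustrative):
      CREATE MULTISET TABLE dbt_test.my_model AS (
        SELECT customer_id, order_total
        FROM dbt_test.stg_orders
      ) WITH DATA AND STATISTICS;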

  • index - defines table indices:

    [UNIQUE] PRIMARY INDEX [index_name] ( index_column_name [,...] ) |
    NO PRIMARY INDEX |
    PRIMARY AMP [INDEX] [index_name] ( index_column_name [,...] ) |
    PARTITION BY { partitioning_level | ( partitioning_level [,...] ) } |
    UNIQUE INDEX [ index_name ] [ ( index_column_name [,...] ) ] [loading] |
    INDEX [index_name] [ALL] ( index_column_name [,...] ) [ordering] [loading]
    [,...]

    where:

    • partitioning_level:
      { partitioning_expression |
        COLUMN [ [NO] AUTO COMPRESS |
        COLUMN [ [NO] AUTO COMPRESS ] [ ALL BUT ] column_partition ]
      } [ ADD constant ]
    • ordering:
      ORDER BY [ VALUES | HASH ] [ ( order_column_name ) ]
    • loading:
      WITH [NO] LOAD IDENTITY

    e.g.:

    • in sql materialization definition file:
      {{
        config(
            materialized="table",
            index="UNIQUE PRIMARY INDEX ( GlobalID )"
        )
      }}

      ℹ️ Note, unlike in table_option, there are no commas between index statements!

      {{
        config(
            materialized="table",
            index="PRIMARY INDEX(id)
            PARTITION BY RANGE_N(create_date
                          BETWEEN DATE '2020-01-01'
                          AND     DATE '2021-01-01'
                          EACH INTERVAL '1' MONTH)"
        )
      }}
      {{
        config(
            materialized="table",
            index="PRIMARY INDEX(id)
            PARTITION BY RANGE_N(create_date
                          BETWEEN DATE '2020-01-01'
                          AND     DATE '2021-01-01'
                          EACH INTERVAL '1' MONTH)
            INDEX index_attrA (attrA) WITH LOAD IDENTITY"
        )
      }}
    • in seed configuration:
      seeds:
        <project-name>:
          index: "UNIQUE PRIMARY INDEX ( GlobalID )"

      ℹ️ Note, unlike in table_option, there are no commas between index statements!

      seeds:
        <project-name>:
          index: "PRIMARY INDEX(id)
            PARTITION BY RANGE_N(create_date
                          BETWEEN DATE '2020-01-01'
                          AND     DATE '2021-01-01'
                          EACH INTERVAL '1' MONTH)"
      seeds:
        <project-name>:
          index: "PRIMARY INDEX(id)
            PARTITION BY RANGE_N(create_date
                          BETWEEN DATE '2020-01-01'
                          AND     DATE '2021-01-01'
                          EACH INTERVAL '1' MONTH)
            INDEX index_attrA (attrA) WITH LOAD IDENTITY"

Seeds

Seeds, in addition to the above materialization modifiers, have the following options:

  • use_fastload - use FastLoad when handling the dbt seed command. The option will likely speed up loading when your seed files have hundreds of thousands of rows. You can set this seed configuration option in your dbt_project.yml file, e.g.:
    seeds:
      <project-name>:
        +use_fastload: true

Common Teradata-specific tasks

  • collect statistics - when a table is created or modified significantly, there might be a need to tell Teradata to collect statistics for the optimizer. It can be done using the COLLECT STATISTICS command. You can perform this step using dbt's post-hooks, e.g.:
    {{ config(
      post_hook=[
        "COLLECT STATISTICS ON  {{ this }} COLUMN (column_1,  column_2  ...);"
        ]
    )}}
    See the Collecting Statistics documentation for more information.

Support for dbt-utils package

The dbt-utils package is supported through the teradata/teradata_utils dbt package. The package provides a compatibility layer between dbt_utils and dbt-teradata. See the teradata_utils package for installation instructions.
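
A dbt package is typically installed by listing it in your project's packages.yml and running dbt deps; a minimal sketch (replace the placeholder with the current version published for the teradata_utils package):

packages:
  - package: teradata/teradata_utils
    version: <version>  # placeholder; use the currently published version

Then run dbt deps to install it.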

Limitations

Transaction mode

Only ANSI transaction mode is supported.

Credits

The adapter was originally created by Doug Beatty. Teradata took over the adapter in January 2022. We are grateful to Doug for founding the project and accelerating the integration of dbt + Teradata.

License

The adapter is published under the Apache-2.0 License. Please see the license for terms and conditions, such as creating derivative works and the support model.
