tap-dataverse

tap-dataverse is a Singer tap for Dataverse.

Built with the Meltano Tap SDK for Singer Taps.

Installation

Install from GitHub:

pipx install git+https://github.com/mjsqu/tap-dataverse.git@main

Configuration

Capabilities

  • catalog
  • state
  • discover
  • about
  • stream-maps
  • schema-flattening
  • batch

Settings

| Setting | Required | Default | Description |
|---|---|---|---|
| client_secret | True | None | The client secret to authenticate against the API service |
| client_id | True | None | Client (application) ID |
| tenant_id | True | None | Tenant ID |
| start_date | False | None | The earliest record date to sync (NOT WORKING, see #4) |
| api_url | True | None | The URL for the API service |
| api_version | False | 9.2 | The API version found in the /api/data/v{x.y} segment of URLs |
| annotations | False | False | Turns on annotations |
| sql_attribute_names | False | False | Uses the Snowflake column name rules to translate any characters outside the standard to an underscore. Particularly helpful when annotations are turned on |
| streams | False | None | An array of streams, designed for separate paths using the same base URL |
| stream_maps | False | None | Config object for stream maps capability. For more information check out Stream Maps. |
| stream_map_config | False | None | User-defined config values to be used within map expressions. |
| faker_config | False | None | Config for the Faker instance variable fake used within map expressions. Only applicable if the plugin specifies faker as an additional dependency (through the singer-sdk faker extra or directly). |
| flattening_enabled | False | None | 'True' to enable schema flattening and automatically expand nested properties. |
| flattening_max_depth | False | None | The max depth to flatten schemas. |
| batch_config | False | None | |
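
To illustrate how the required settings fit together, here is a minimal sketch that assembles a config object and prints it as JSON. Every value is a placeholder, the shape of api_url (host only, no path) is an assumption, and the string type for api_version is a guess; check tap-dataverse --about for the authoritative schema.

```python
import json

# Placeholder values only; substitute the details from your own
# App Registration and Dataverse environment.
config = {
    "client_id": "00000000-0000-0000-0000-000000000000",
    "client_secret": "<client-secret>",
    "tenant_id": "11111111-1111-1111-1111-111111111111",
    "api_url": "https://example.api.crm6.dynamics.com",  # assumed shape
    "api_version": "9.2",  # optional; 9.2 is the documented default
}

# The four settings marked Required in the settings table.
required = {"client_id", "client_secret", "tenant_id", "api_url"}
missing = required - set(config)
if missing:
    raise ValueError(f"Missing required settings: {sorted(missing)}")

# Save this output to a file and pass its path to the tap with --config.
print(json.dumps(config, indent=2))
```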

A full list of supported settings and capabilities for this tap is available by running:

tap-dataverse --about

Configure using environment variables

If --config=ENV is provided, this Singer tap automatically imports environment variables from the .env file in the working directory. A config value is then picked up whenever a matching environment variable is set, either in the terminal context or in the .env file.
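
As a rough guide to which variable names would match, the sketch below assumes the usual Singer SDK convention: the plugin name and the setting name are upper-cased, dashes become underscores, and the two are joined with an underscore. The helper env_var_name is hypothetical and not part of the tap; verify the names against the SDK version you have installed.

```python
def env_var_name(setting: str, plugin_name: str = "tap-dataverse") -> str:
    """Build the environment variable name assumed to map to a setting."""
    # Assumption: <PLUGIN_NAME>_<SETTING>, upper-cased, dashes to underscores.
    prefix = plugin_name.upper().replace("-", "_")
    return f"{prefix}_{setting.upper()}"


# Print the assumed names for the required settings.
for setting in ("client_id", "client_secret", "tenant_id", "api_url"):
    print(env_var_name(setting))
```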

Source Authentication and Authorization

This tap authenticates with the client_credentials method, which requires an App Registration and the PowerApp setup steps below. These steps were copied from internal documentation with all identifying values removed. Refer to the official Dataverse API documentation for further assistance, or open a PR if you would like to help improve the docs.

Azure App Registration

  • Log in to https://portal.azure.com using your Azure Admin account
  • Open App registrations
  • Click + New registration
  • Enter a Name (e.g. tap_dataverse_powerplatformaccess)
  • Select Accounts in this organizational directory only (single tenant)
  • Click Register
  • Click Certificates & secrets
  • Click + New client secret
  • Description - (suggestion Tap-Dataverse Power Platform Client Secret)
  • Select Expiry of choice
  • Click Add and record client_secret in config
  • Click Expose an API
  • Click Set
  • Update Application ID URI
  • Enter the OAuth Endpoint
  • Click + Add a scope
  • Enter:
      • Scope name = session:role-any
      • Who can consent? = Admin and users
  • Click Add scope
  • Click Save

Configure the PowerApp

  • Log in to https://admin.powerplatform.microsoft.com

  • Click Environments

  • Click Data Provider environment

  • Click S2S apps See all

  • Click + New app user

  • Click + Add an app

  • Select tap_dataverse_powerplatformaccess or the name selected for the App Registration earlier

  • Click Add

  • Enter the Business unit from the Power Platform developer settings URL. For example, in https://.api.crm6.dynamics.com the business unit value is the part that comes before .api.crm

Once you have completed all these steps, you should have:

  • client_id - GUID

  • tenant_id - GUID (different from client_id)

  • client_secret
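
To show how these three values are typically used, here is a sketch of a client-credentials token request against the Microsoft identity platform v2.0 endpoint. It only builds the request and does not send it. The function build_token_request is hypothetical, and deriving the scope from the API URL plus /.default is an assumption; the tap's own authenticator may use a different endpoint or scope.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(
    tenant_id: str, client_id: str, client_secret: str, api_url: str
) -> Request:
    """Build (but do not send) an OAuth 2.0 client_credentials token request."""
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urlencode(
        {
            "grant_type": "client_credentials",
            "client_id": client_id,
            "client_secret": client_secret,
            # Assumption: the scope is the Dataverse base URL plus "/.default".
            "scope": f"{api_url.rstrip('/')}/.default",
        }
    ).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return Request(token_url, data=body, headers=headers, method="POST")


# Placeholder values for illustration only.
request = build_token_request(
    tenant_id="11111111-1111-1111-1111-111111111111",
    client_id="00000000-0000-0000-0000-000000000000",
    client_secret="<client-secret>",
    api_url="https://example.api.crm6.dynamics.com",
)
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would perform the call; a successful response is a JSON document containing an access_token field, which is then sent as a Bearer token on API requests.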

Usage

You can easily run tap-dataverse by itself or in a pipeline using Meltano.

Executing the Tap Directly

tap-dataverse --version
tap-dataverse --help
tap-dataverse --config CONFIG --discover > ./catalog.json
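
The catalog.json written by --discover follows the Singer catalog layout, with a top-level streams array whose entries carry a tap_stream_id. A small sketch for listing the discovered streams follows; the helper stream_ids and the sample stream name are illustrative only.

```python
def stream_ids(catalog: dict) -> list:
    """Return the tap_stream_id of every stream in a Singer catalog."""
    return [stream["tap_stream_id"] for stream in catalog.get("streams", [])]


# Illustrative stand-in for json.load(open("catalog.json")).
sample_catalog = {
    "streams": [
        {"tap_stream_id": "accounts", "schema": {}, "metadata": []},
    ]
}
print(stream_ids(sample_catalog))
```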

Developer Resources

Follow these instructions to contribute to this project.

Initialize your Development Environment

pipx install poetry
poetry install

Create and Run Tests

Create tests within the tests subfolder and then run:

poetry run pytest

You can also test the tap-dataverse CLI directly using poetry run:

poetry run tap-dataverse --help

Testing with Meltano

Note: This tap will work in any Singer environment and does not require Meltano. Examples here are for convenience and to streamline end-to-end orchestration scenarios.

Next, install Meltano (if you haven't already) and any needed plugins:

# Install meltano
pipx install meltano
# Initialize meltano within this directory
cd tap-dataverse
meltano install

Now you can test and orchestrate using Meltano:

# Test invocation:
meltano invoke tap-dataverse --version
# OR run a test `elt` pipeline:
meltano elt tap-dataverse target-jsonl

SDK Dev Guide

See the dev guide for more instructions on how to use the SDK to develop your own taps and targets.

tap-dataverse's Issues

feat: Infer primary key from EntityDetails query

"@odata.context": "https://{org}.api.crm6.dynamics.com/api/data/v9.2/$metadata#EntityDefinitions('{entity}')/Attributes",

Returns:

        {
            "AttributeType": "Uniqueidentifier",
...
            "IsPrimaryId": true,

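One way this inference could work is sketched below, assuming the Attributes response has been parsed into a list of dictionaries shaped like the fragment above. The function infer_primary_keys and the sample rows are hypothetical, not the tap's implementation.

```python
def infer_primary_keys(attributes: list) -> list:
    """Return the LogicalName of every attribute flagged as IsPrimaryId."""
    return [
        attribute["LogicalName"]
        for attribute in attributes
        if attribute.get("IsPrimaryId")
    ]


# Illustrative rows using only field names that appear in the metadata response.
sample_attributes = [
    {"LogicalName": "accountid", "AttributeType": "Uniqueidentifier", "IsPrimaryId": True},
    {"LogicalName": "donotphone", "AttributeType": "Boolean", "IsPrimaryId": False},
]
print(infer_primary_keys(sample_attributes))
```
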
bug: Stream defined as unsorted - not resumable

The streams are sorted by modifiedon (or the configured replication_key), but the appropriate flag has not been set, so the Meltano SDK assumes the streams are unsorted and therefore not resumable.
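
In the Singer SDK the relevant flag is, to our understanding, the is_sorted attribute on the stream class. The sketch below uses a hypothetical mixin rather than the tap's real stream class, so it runs without the SDK installed; in the tap, the attribute would be set on the class that derives from the SDK stream base.

```python
class SortedByReplicationKey:
    """Hypothetical mixin declaring that records arrive in replication-key order."""

    # Assumption: the SDK reads `is_sorted` to decide whether an interrupted
    # incremental sync can resume from the last emitted bookmark.
    is_sorted = True

    # The column the stream is ordered by in the API request.
    replication_key = "modifiedon"


print(SortedByReplicationKey.is_sorted, SortedByReplicationKey.replication_key)
```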

Handle Virtual types

For example, attributes such as donotphone on the account entity have a companion donotphonename attribute:

        {
            "@odata.type": "#Microsoft.Dynamics.CRM.BooleanAttributeMetadata",
            "LogicalName": "donotphone",
            "AttributeType": "Boolean",
            "IsPrimaryId": false,
            "AttributeOf": null,
            "MetadataId": "a4561f83-3630-4f3c-9a36-1cdfff96a97a"
        },
        {
            "LogicalName": "donotphonename",
            "AttributeType": "Virtual",
            "IsPrimaryId": false,
            "AttributeOf": "donotphone",
            "MetadataId": "ca244852-c3b8-4fea-ae31-68317fa11bf8"
        },

This is represented in the table records response as:

        {
            "@odata.etag": "W/\"6447176\"",
            "donotphone@OData.Community.Display.V1.FormattedValue": "Allow",
            "donotphone": false,
            "accountid": "872cde02-e4cc-ec11-a81b-000d3acb4e05",
            "modifiedon@OData.Community.Display.V1.FormattedValue": "26/03/23 8:54 PM",
            "modifiedon": "2023-03-26T20:54:45Z"
        },

Currently, the formatted value is determined by type. However, the code should also look for any extra attributes that:

  • are Virtual
  • have another column as their AttributeOf value

The presence of such an attribute would then dictate that an annotation is output for the column it refers to.
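
The proposed check could look roughly like the following sketch. It assumes the attribute metadata is available as a list of dictionaries and that the annotation key is the column name followed by @OData.Community.Display.V1.FormattedValue; the function names are hypothetical.

```python
FORMATTED_VALUE = "OData.Community.Display.V1.FormattedValue"


def annotated_columns(attributes: list) -> list:
    """Return columns that have a Virtual attribute pointing at them."""
    return sorted(
        {
            attribute["AttributeOf"]
            for attribute in attributes
            if attribute.get("AttributeType") == "Virtual"
            and attribute.get("AttributeOf")
        }
    )


def annotation_keys(attributes: list) -> list:
    """Return the record keys expected to carry formatted values."""
    return [f"{column}@{FORMATTED_VALUE}" for column in annotated_columns(attributes)]


# The two attributes from the metadata example, reduced to the relevant fields.
sample_attributes = [
    {"LogicalName": "donotphone", "AttributeType": "Boolean", "AttributeOf": None},
    {"LogicalName": "donotphonename", "AttributeType": "Virtual", "AttributeOf": "donotphone"},
]
print(annotation_keys(sample_attributes))
```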

bug: Streams with no prior state are not sorted

Requires this change to the params setting code:

        if self.replication_key:
            params["$orderby"] = f"{self.replication_key} asc"
            if last_run_date:
                params["$filter"] = f"{self.replication_key} ge {last_run_date}"

Always sort the stream if it has a replication key
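
The same logic, pulled out into a standalone function so the behaviour with and without prior state is easy to check. The function build_params is a hypothetical wrapper around the snippet above, not the tap's actual method.

```python
from typing import Optional


def build_params(replication_key: Optional[str], last_run_date: Optional[str]) -> dict:
    """Build OData query parameters for a stream."""
    params = {}
    if replication_key:
        # Always order by the replication key, even on the first run.
        params["$orderby"] = f"{replication_key} asc"
        if last_run_date:
            # Only filter when a bookmark from a previous run exists.
            params["$filter"] = f"{replication_key} ge {last_run_date}"
    return params


print(build_params("modifiedon", None))
print(build_params("modifiedon", "2023-03-26T20:54:45Z"))
```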
