DDP_backend

License: AGPL v3

Django application for the DDP platform's management backend. Exposes API endpoints for the management frontend to communicate with, for the purposes of:

  • Onboarding an NGO client
  • Adding users from the client-organization
  • Creating a client's workspace in our Airbyte installation
  • Configuring that workspace, i.e. setting up sources, destinations, and connections
  • Configuring data ingest jobs in our Prefect setup
  • Connecting to the client's dbt GitHub repository
  • Configuring dbt run jobs in our Prefect setup

Development conventions

API endpoint naming

  • REST conventions are followed.
  • CRUD endpoints for a User resource look like:
    • GET /api/users/
    • GET /api/users/:user_id
    • POST /api/users/
    • PUT /api/users/:user_id
    • DELETE /api/users/:user_id
  • Route parameters should be named in snake_case, as shown above and in the sketch below.
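
A minimal sketch of these conventions as a Django Ninja router. The handler bodies and the users router itself are illustrative, not the project's actual code; note that Ninja spells route parameters as {user_id} rather than :user_id.

from ninja import Router

router = Router()

@router.get("/users/")
def list_users(request):
    ...

@router.get("/users/{user_id}")
def get_user(request, user_id: int):
    ...

@router.post("/users/")
def create_user(request):
    ...

@router.put("/users/{user_id}")
def update_user(request, user_id: int):
    ...

@router.delete("/users/{user_id}")
def delete_user(request, user_id: int):
    ...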

Ninja API docs

  • Django Ninja auto-generates interactive OpenAPI docs for every endpoint; with the server running they are typically served at /api/docs.

Code style

  • PEP 8 is used to standardize variable names, class names, module names, etc.
  • Pylint is the linting tool used to analyze the code against the PEP 8 style.
  • Black is used as the code formatter.

Setting up your VS Code env

  • Recommended IDE is VS Code.
  • Install the Pylint extension in VS Code and enable it.
  • Set the default format provider in VS Code to Black.
  • Update the VS Code settings.json as follows:
    {
        "editor.defaultFormatter": null,
        "python.linting.enabled": true,
        "python.formatting.provider": "black",
        "editor.formatOnSave": true
    }

Running pylint

  • In your virtual environment run pylint ddpui/

Running celery

  • In your virtual environment run:
    celery -A ddpui worker -n ddpui
  • For Windows, run:
    celery -A ddpui worker -n ddpui -P solo
  • To start celery beat run:
    celery -A ddpui beat

Setup instructions

Step 1: Create a Python Virtual Environment

  • pyenv local 3.10

  • pyenv exec python -m venv venv

  • source venv/bin/activate

  • pip install --upgrade pip

  • pip install -r requirements.txt

Step 2: Create the .env file

  • create .env from .env.template

Step 3: Create SQL Database

  • create a SQL database and populate its credentials into .env

  • You can use a postgresql docker image for local development

docker run --name postgres-db -e POSTGRES_PASSWORD=<password> -e POSTGRES_DB=<db name> -p 5432:5432 -d postgres

  • Add the environment variables to .env (a connectivity check is sketched below)
DBNAME=<db name>
DBHOST=localhost
DBPORT=5432
DBUSER=postgres
DBPASSWORD=<password>
DBADMINUSER=postgres
DBADMINPASSWORD=<password>
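
Before running migrations you can sanity-check these credentials. A minimal sketch, assuming the .env values are exported into the shell and psycopg2 is installed (Django's Postgres backend already requires it):

import os
import psycopg2

# connect using the same variables the backend reads from .env
conn = psycopg2.connect(
    host=os.getenv("DBHOST", "localhost"),
    port=os.getenv("DBPORT", "5432"),
    dbname=os.getenv("DBNAME"),
    user=os.getenv("DBUSER"),
    password=os.getenv("DBPASSWORD"),
)
print("connected to", conn.get_dsn_parameters()["dbname"])
conn.close()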

Step 4: Install Airbyte

  • Open a new terminal
  • Start Airbyte and populate connection info in .env
AIRBYTE_SERVER_HOST=
AIRBYTE_SERVER_PORT=
AIRBYTE_SERVER_APIVER=
AIRBYTE_API_TOKEN=<token> # base64 encoding of username:password; the default airbyte:password encodes to YWlyYnl0ZTpwYXNzd29yZA== (see the snippet below)
AIRBYTE_DESTINATION_TYPES=
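
The token is the base64 encoding (not encryption) of username:password; to generate one for non-default credentials:

import base64

# encode "<username>:<password>" for AIRBYTE_API_TOKEN
print(base64.b64encode(b"airbyte:password").decode())  # YWlyYnl0ZTpwYXNzd29yZA==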

Step 5: Install Prefect and Start Prefect Proxy

  • Start the Prefect proxy and put its URL into .env
PREFECT_PROXY_API_URL=

Step 6: Create secrets directory

  • Set DEV_SECRETS_DIR in .env unless you want to use AWS Secrets Manager

Step 7: Install dbt

  • Open a new terminal

  • Create a local venv, install dbt and put its location into DBT_VENV in .env

pyenv local 3.10

pyenv exec python -m venv <env-name>

source <env-name>/bin/activate

python -m pip install \
  dbt-core \
  dbt-postgres \
  dbt-bigquery

  • Create empty directories for CLIENTDBT_ROOT
CLIENTDBT_ROOT=
DBT_VENV=<env-name>/bin/activate

Step 8: Add SIGNUPCODE and FRONTEND_URL

  • The SIGNUPCODE in .env is for signing up using the frontend. If you are running the frontend, set its URL in FRONTEND_URL

Step 9: Start Backend

  • Set DJANGOSECRET in .env
DJANGOSECRET=

  • Create a logs folder in ddpui

  • Create whitelist.py from .whitelist.template.py in the ddpui/assets folder

  • Run DB migrations: python manage.py migrate

  • Seed the DB: python manage.py loaddata seed/*.json

  • Create the system user: python manage.py create-system-orguser

  • Start the server: python manage.py runserver

Step 10: Create first org and user

  • Run python manage.py createorganduser <Org Name> <Email address>

Using Docker

Follow the steps below:

Step 1: Install Docker and Docker Compose

Step 2: Create .env file

  • create .env from .env.template inside the Docker folder

Step 3: Create whitelist.py file

  • Copy the file in ddpui/assets/ to Docker/mount

Step 4: Build the image

If using an M1-based MacBook, run this before building the image: export DOCKER_DEFAULT_PLATFORM=linux/amd64

  • Build the main image: docker build -f Docker/Dockerfile.main --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') -t dalgo_backend_main_image:0.1 .
  • Build the deploy image: docker build -f Docker/Dockerfile.dev.deploy --build-arg BUILD_DATE=$(date -u +'%Y-%m-%dT%H:%M:%SZ') -t dalgo_backend:0.1 .

Step 5: Start the other applications

Step 6: Start Backend

  • docker-compose -f Docker/docker-compose.dev.yml up


ddp_backend's Issues

Create a dev requirement file

Can we please create a dev_requirements.txt file here? Also, list all the dependencies and their version numbers.

Pagination for Prefect's flow-run logs

The prefect-service's API to fetch a flow-run's logs takes an offset parameter, but the Django application doesn't send a value.

This enhancement simply takes an optional offset from the (frontend) client and forwards it to the prefect-service, as sketched below.
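
A minimal sketch of the change, assuming a Ninja endpoint that forwards to the prefect-service; the route paths and names here are illustrative, not the project's actual code:

import os

import requests
from ninja import Router

router = Router()

@router.get("/flow_runs/{flow_run_id}/logs")
def get_flow_run_logs(request, flow_run_id: str, offset: int = 0):
    # accept an optional offset from the frontend and forward it unchanged
    res = requests.get(
        f"{os.getenv('PREFECT_PROXY_API_URL')}/flow_runs/logs/{flow_run_id}",
        params={"offset": offset},
        timeout=30,
    )
    res.raise_for_status()
    return res.json()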

SurveyCTO is failing in the platform.

This is the error we get.

We can add the source for SurveyCTO and add the destination, but the sync is not working.


Traceback (most recent call last):
  File "/home/ddp/DDP_backend/venv/lib/python3.10/site-packages/ninja/operation.py", line 104, in run
    result = self.view_func(request, **values)
  File "/home/ddp/DDP_backend/ddpui/api/client/airbyte_api.py", line 449, in post_airbyte_connection
    airbyte_conn = airbyte_service.create_connection(org.airbyte_workspace_id, payload)
  File "/home/ddp/DDP_backend/ddpui/ddpairbyte/airbyte_service.py", line 258, in create_connection
    sourceschemacatalog = get_source_schema_catalog(
  File "/home/ddp/DDP_backend/ddpui/ddpairbyte/airbyte_service.py", line 154, in get_source_schema_catalog
    raise Exception("Failed to get source schema catalogs")
Exception: Failed to get source schema catalogs

Restrict Airbyte destinations to BigQuery and Postgres

For our first version we will only support BigQuery and Postgres warehouses. The client's warehouse is the only Airbyte destination we support, and so the Airbyte destination configuration should only support BigQuery and Postgres
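
One way to enforce this, sketched against Airbyte's list of destination definitions; the definition-name strings are assumptions:

# allow-list filter over the destination definitions returned by Airbyte,
# so the frontend only ever sees BigQuery and Postgres
SUPPORTED_DESTINATIONS = {"BigQuery", "Postgres"}

def filter_destination_definitions(definitions: list[dict]) -> list[dict]:
    return [d for d in definitions if d.get("name") in SUPPORTED_DESTINATIONS]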

Add 10 tests for DDP_backend

  1. Write tests for 10 endpoints in clientapi.py
  2. Document for the team how we are meant to write our own tests going forward

Set up the DDP warehouse along with the Airbyte destination

The DDP warehouse is used by Airbyte and by dbt. From Airbyte's point of view it is a destination and needs to be set up as one. For dbt it is a warehouse for which dbt requires credentials to be able to read from and write to it

When the user sets up their DDP warehouse we need to set up both their Airbyte destination as well as their dbt warehouse. For Postgres and BigQuery, Airbyte requires at least as much configuration information as dbt, and so we render the UI using Airbyte's destination specification to collect that information, part of which we then store in our db to be able to construct dbt's profiles.yml
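
A minimal sketch of rendering profiles.yml from that stored subset, assuming a Postgres warehouse, PyYAML, and hypothetical field names in creds:

import yaml

def make_profiles_yml(profile_name: str, target_schema: str, creds: dict) -> str:
    # creds holds the subset of the Airbyte destination config we stored
    return yaml.safe_dump({
        profile_name: {
            "target": target_schema,
            "outputs": {
                target_schema: {
                    "type": "postgres",
                    "host": creds["host"],
                    "port": creds["port"],
                    "user": creds["username"],
                    "password": creds["password"],
                    "dbname": creds["database"],
                    "schema": target_schema,
                    "threads": 4,
                },
            },
        }
    })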

API for Prefect flow-runs

We have endpoints to run flows (/prefect/flows/airbyte_sync/ and /prefect/flows/dbt_run/) but none to see an organization's flow history

  • Create a new table in our DB to track flows and flow runs against an org (a model sketch follows this list)
  • When creating a flow using the endpoints above, store the flow id and flow run id
  • Search and retrieve flow runs for an org
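
A hedged sketch of the tracking table as a Django model; the model and field names are assumptions, not the project's actual schema:

from django.db import models

class OrgPrefectFlowRun(models.Model):
    # scope every flow and flow run to the org that created it
    org = models.ForeignKey("ddpui.Org", on_delete=models.CASCADE)
    flow_id = models.CharField(max_length=36)
    flow_run_id = models.CharField(max_length=36)
    created_at = models.DateTimeField(auto_now_add=True)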

How to deal with the updated connector?

For Sneha, we're using a custom connector which we updated ourselves: there was a schema change in the API, so we had to rewrite and update the connector. But the updated connector is not pushed to master, so the CommCare connector currently in the platform will not work, and we don't want to use it.

Proposed solution -

We can build the new connector by building its Docker container from the latest code, but then you'll have to pull the new connector into your connector list and use it to create the connection.

But when the PR is merged to master, how will we update the existing connector? Something to think about.

Bug: TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'

This is the traceback I get when I run the migration. It could be related to the Python version, but I'm using 3.10.

Traceback (most recent call last):
  File "/Users/arun/Documents/DDP_backend/manage.py", line 22, in <module>
    main()
  File "/Users/arun/Documents/DDP_backend/manage.py", line 18, in main
    execute_from_command_line(sys.argv)
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/core/management/__init__.py", line 446, in execute_from_command_line
    utility.execute()
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/core/management/__init__.py", line 440, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/core/management/base.py", line 402, in run_from_argv
    self.execute(*args, **cmd_options)
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/core/management/base.py", line 448, in execute
    output = self.handle(*args, **options)
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/core/management/base.py", line 96, in wrapped
    res = handle_func(*args, **kwargs)
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/core/management/commands/migrate.py", line 97, in handle
    self.check(databases=[database])
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/core/management/base.py", line 475, in check
    all_issues = checks.run_checks(
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/core/checks/registry.py", line 88, in run_checks
    new_errors = check(app_configs=app_configs, databases=databases)
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/core/checks/urls.py", line 14, in check_url_config
    return check_resolver(resolver)
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/core/checks/urls.py", line 24, in check_resolver
    return check_method()
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/urls/resolvers.py", line 494, in check
    for pattern in self.url_patterns:
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/utils/functional.py", line 57, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/urls/resolvers.py", line 715, in url_patterns
    patterns = getattr(self.urlconf_module, "urlpatterns", self.urlconf_module)
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/utils/functional.py", line 57, in __get__
    res = instance.__dict__[self.name] = self.func(instance)
  File "/Users/arun/Documents/DDP_backend/venv/lib/python3.9/site-packages/django/urls/resolvers.py", line 708, in urlconf_module
    return import_module(self.urlconf_name)
  File "/Users/arun/.pyenv/versions/3.9.13/lib/python3.9/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1030, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1007, in _find_and_load
  File "<frozen importlib._bootstrap>", line 986, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 680, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 850, in exec_module
  File "<frozen importlib._bootstrap>", line 228, in _call_with_frames_removed
  File "/Users/arun/Documents/DDP_backend/ddpui/urls.py", line 5, in <module>
    from ddpui.api.client.airbyte_api import airbyteapi
  File "/Users/arun/Documents/DDP_backend/ddpui/api/client/airbyte_api.py", line 21, in <module>
    from ddpui.ddpprefect.prefect_service import run_airbyte_connection_sync
  File "/Users/arun/Documents/DDP_backend/ddpui/ddpprefect/prefect_service.py", line 19, in <module>
    def get_airbyte_server_block_id(blockname) -> str | None:
TypeError: unsupported operand type(s) for |: 'type' and 'NoneType'
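
The venv paths in the traceback show Python 3.9, and the str | None union syntax in annotations only works on Python 3.10+. A backward-compatible fix for the annotation in prefect_service.py:

from typing import Optional

# `str | None` raises TypeError at import time on Python 3.9;
# Optional[str] is equivalent and works on both versions
def get_airbyte_server_block_id(blockname) -> Optional[str]:
    ...

Alternatively, adding from __future__ import annotations at the top of the module defers annotation evaluation and also avoids the error.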

CRUD-ify the PrefectService

The PrefectService offers creation of four types of prefect blocks

Implement read / update / delete for these block types in PrefectService

Implement corresponding API endpoints in the ClientController

Make sure that a user can only touch their own organization's blocks! We track this using the OrgPrefectBlock table in our own DB
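
A hedged sketch of that ownership check, assuming OrgPrefectBlock has org and block_id fields (the real field names and import path may differ):

from ninja.errors import HttpError

from ddpui.models import OrgPrefectBlock  # import path is an assumption

def get_org_block_or_403(orguser, block_id: str):
    # scope the lookup to the caller's org so users can only
    # touch their own organization's blocks
    block = OrgPrefectBlock.objects.filter(
        org=orguser.org, block_id=block_id
    ).first()
    if block is None:
        raise HttpError(403, "block does not belong to your organization")
    return block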

Creating a "ddp dbt profile"

For dbt, a profile is

  • listed in dbt_project.yml
  • a set of targets which appear in profiles.yml

For us, a ddp dbt profile is

  • a dbt profile
  • a single target associated with a single target schema
  • we generate the target name for the profiles.yml (which we generate) ... since this is up to us we will just use the name of the target schema / dataset

We envision a user creating several ddp dbt profiles in their workspace, one for production and others for testing and experimentation, each writing to a different target schema / dataset

When we create a ddp dbt profile we create three Prefect blocks

  • dbt test
  • dbt run
  • dbt docs generate

named using (combined as sketched after this list)

  • the org's slug
  • the profile name
  • the target name
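
A minimal sketch of the naming scheme; the separator and the command labels are assumptions:

def dbt_block_name(org_slug: str, profile_name: str, target_name: str, command: str) -> str:
    # command is one of "test", "run", "docs-generate"
    return f"{org_slug}-{profile_name}-{target_name}-{command}"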

Clean up Airbyte API forwarding code

in ddpairbyte/functions.py

  • within abreq(), try-catch and log to a dedicated logfile
  • rename functions.py to something better

Use scripts/test-airbyte-api.py as a starting point, and write tests for every function in functions.py
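
A hedged sketch of the requested change to abreq(); the real signature and URL construction may differ, but the Airbyte host, port, and API version come from the env vars set in Step 4:

import logging
import os

import requests

# dedicated logfile for Airbyte request failures
airbyte_logger = logging.getLogger("airbyte")
airbyte_logger.addHandler(logging.FileHandler("airbyte.log"))

def abreq(endpoint: str, req: dict = None):
    # POST to the Airbyte API, logging failures before re-raising
    url = (
        f"http://{os.getenv('AIRBYTE_SERVER_HOST')}:{os.getenv('AIRBYTE_SERVER_PORT')}"
        f"/api/{os.getenv('AIRBYTE_SERVER_APIVER')}/{endpoint}"
    )
    try:
        res = requests.post(url, json=req, timeout=30)
        res.raise_for_status()
        return res.json()
    except requests.RequestException:
        airbyte_logger.exception("airbyte request failed: %s", endpoint)
        raise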

Prefect integration

Create API endpoints to create Prefect jobs to

  • Run Airbyte ingests
  • Run dbt jobs

CRUD-ify the AirbyteService

The AirbyteService offers creation and retrieval of sources, destinations and connections.

Implement Delete and Update
