UCX - Unity Catalog Migration Toolkit

Your best companion for upgrading to Unity Catalog. It helps you upgrade all Databricks workspace assets: Legacy Table ACLs, Entitlements, AWS instance profiles, Clusters, Cluster policies, Instance Pools, Databricks SQL warehouses, Delta Live Tables, Jobs, MLflow experiments, MLflow registry, SQL Dashboards & Queries, SQL Alerts, Token and Password usage permissions that are set at the workspace level, Secret scopes, Notebooks, Directories, Repos, and Files.

See contributing instructions to help improve this project.

Introduction

UCX will guide you, the Databricks customer, through the process of upgrading your account, groups, workspaces, jobs, etc. to Unity Catalog.

  1. The upgrade process will first install code, libraries, and workflows into your workspace.
  2. After installation, you will run a series of workflows and examine the output.

UCX leverages the Databricks Lakehouse platform to upgrade itself. This includes creating jobs and notebooks and deploying code and configuration files. The install.sh script guides you through the installation.

Running the installation installs the assessment job and a number of upgrade jobs. These jobs are outlined in the custom-generated README.py, which the installer creates and install.sh displays to you. See interactive installation tutorial here.

The custom generated README.py, config.yaml and other assets are placed into your Databricks workspace home folder, into a subfolder named .ucx. See interactive tutorial.

Once the custom Databricks jobs are installed, begin by triggering the assessment job. The assessment job can be found under your workflows or via the active link in the README.py. Once the assessment job is complete, you can review the results in the custom generated Databricks dashboard (linked to by the custom README.py found in the workspace folder created for you).

You will need account, Unity Catalog, and workspace administrative authority to complete the upgrade process. To run the installer, you will need to set up databricks-cli and a credential, following these instructions. Additionally, the interim metadata and config data processed by UCX will be stored in a Hive Metastore database schema generated at install time.

For questions, troubleshooting, or bug fixes, please see your Databricks account team or submit an issue to the Databricks UCX GitHub repo.

Installation

Prerequisites

  1. Get trained on UC [free instructor-led training 2x week] [full training schedule]
  2. [AWS] [Azure] [GCP] Account level Identity Setup
  3. [AWS] [Azure] [GCP] Unity Catalog Metastore Created (per region)

Download & Install

As a customer, download the latest release from GitHub onto your laptop or desktop machine. Unzip or untar the release.

The ./install.sh script will guide you through the installation process. Make sure you have Python 3.10 (or greater) installed on your workstation and that you have configured authentication for the Databricks Workspace.
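
The Python requirement can be checked before starting the installer. The following is only a minimal sketch: the `python_version_ok` function is a hypothetical helper, not part of UCX or install.sh, and it assumes the interpreter is available on your PATH as `python3`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight helper; not part of UCX or install.sh.

# Succeeds when the given "major.minor[.patch]" version is 3.10 or newer.
python_version_ok() {
    local major minor
    major="${1%%.*}"        # text before the first dot
    minor="${1#*.}"         # text after the first dot
    minor="${minor%%.*}"    # keep only the minor component
    [ "$major" -gt 3 ] || { [ "$major" -eq 3 ] && [ "$minor" -ge 10 ]; }
}

# Assumption: the interpreter is reachable as "python3".
if command -v python3 >/dev/null 2>&1; then
    found="$(python3 -c 'import platform; print(platform.python_version())')"
    if python_version_ok "$found"; then
        echo "Python $found is new enough for the installer"
    else
        echo "Python $found is older than 3.10; upgrade before running ./install.sh"
    fi
else
    echo "python3 was not found on PATH"
fi
```

The check only reports what it finds; it does not stop or modify the installation.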

The easiest way to install and authenticate is through a Databricks configuration profile:

export DATABRICKS_CONFIG_PROFILE=ABC
./install.sh
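
Before running the installer with a profile, it can help to confirm that the profile actually exists. The sketch below assumes the configuration file is in its default location, `~/.databrickscfg`, and that profiles appear as `[NAME]` section headers on their own line. The `profile_exists` function is a hypothetical helper, not part of UCX or the Databricks CLI.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of UCX or install.sh.

# Succeeds when the config file has a line that is exactly "[profile]".
profile_exists() {
    local cfg="$1" profile="$2"
    [ -f "$cfg" ] && grep -qxF "[$profile]" "$cfg"
}

cfg="$HOME/.databrickscfg"    # assumed default location of the config file
profile="${DATABRICKS_CONFIG_PROFILE:-DEFAULT}"

if profile_exists "$cfg" "$profile"; then
    echo "Profile '$profile' found in $cfg"
else
    echo "Profile '$profile' not found in $cfg"
fi
```

The match is exact, so a header with trailing whitespace would be reported as missing; treat a negative result as a prompt to inspect the file rather than as proof.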

You can also specify environment variables in a more direct way, like in this example for installing on an Azure Databricks Workspace using the Azure CLI authentication:

az login
export DATABRICKS_HOST=https://adb-123....azuredatabricks.net/
./install.sh
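
Since either a profile or a host variable can be used, a small guard can report which one is set before the installer starts. This is only a sketch: `auth_mode` is a hypothetical helper, and checking the profile first is a choice made for this example, not a statement about how the installer resolves credentials.

```shell
#!/usr/bin/env bash
# Hypothetical helper; not part of UCX or install.sh.

# Prints "profile", "host", or "none" depending on which argument is non-empty.
# Arguments: profile name, host URL. The profile is checked first (example choice).
auth_mode() {
    if [ -n "$1" ]; then
        echo "profile"
    elif [ -n "$2" ]; then
        echo "host"
    else
        echo "none"
    fi
}

mode="$(auth_mode "${DATABRICKS_CONFIG_PROFILE:-}" "${DATABRICKS_HOST:-}")"

if [ "$mode" = "none" ]; then
    echo "Set DATABRICKS_CONFIG_PROFILE or DATABRICKS_HOST before running ./install.sh"
else
    echo "Authentication setting detected: $mode"
fi
```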

Please follow the instructions in ./install.sh, which will deploy UCX to your workspace and open a notebook with the description of all jobs to trigger. The journey starts with assessment.

Project Support

Please note that all projects in the /databrickslabs GitHub account are provided for your exploration only and are not formally supported by Databricks with Service Level Agreements (SLAs). They are provided AS-IS, and we do not make any guarantees of any kind. Please do not submit a support ticket relating to any issues arising from the use of these projects.

Any issues discovered through the use of this project should be filed as GitHub Issues on the Repo. They will be reviewed as time permits, but there are no formal SLAs for support.
