
Databricks Import Directory Action

GitHub Action that imports the files from a local path into a Databricks workspace.

When to use

This action is useful when you need to import a directory into a Databricks workspace, for example to place notebooks under a specific path. Only directories and files with the extensions .scala, .py, .sql, .r, .R, and .ipynb are imported.

How it works

The GitHub Action wraps the `import_dir` command from the Databricks workspace CLI.
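Under the hood, the call is roughly equivalent to running the CLI yourself. The sketch below is not runnable as-is: the paths are placeholders and it assumes an already-authenticated legacy Databricks CLI.

```shell
# Roughly what the action runs (placeholder paths, authenticated CLI assumed);
# -o/--overwrite replaces files that already exist in the workspace.
databricks workspace import_dir ./my-local-path /my-remote-path -o
```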

Getting Started

Prerequisites

  • Make sure your repo contains the directory you want to import into Databricks
  • Make sure you have installed the Databricks CLI
  • Make sure you have a Databricks access token. It can be a PAT or, if you're working with Azure Databricks, an AAD token.

Note: You can find both sample workflows in this repository.
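For trying the import locally before wiring up the workflow, a typical setup with the legacy `databricks-cli` pip package might look like this. The host and token values are placeholders; the environment-variable names are the ones the legacy CLI documents.

```shell
# Install the legacy Databricks CLI
pip install databricks-cli

# Interactive: prompts for the workspace URL and a token
databricks configure --token

# Or non-interactive, via environment variables (placeholder values):
export DATABRICKS_HOST=https://<instance-name>.cloud.databricks.com
export DATABRICKS_TOKEN=<personal-access-token>
```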

Usage

```yaml
steps:
  - name: databricks-import-directory
    uses: microsoft/databricks-import-notebook@<version>
    with:
      databricks-host: https://<instance-name>.cloud.databricks.com
      databricks-token: token
      local-path: ./my-local-path
      remote-path: /my-remote-path
```

Inputs

| Name | Description | Required | Default value |
| ---- | ----------- | -------- | ------------- |
| `databricks-host` | Workspace URL, in the format `https://<instance-name>.cloud.databricks.com` | true | NA |
| `databricks-token` | Databricks token; it can be a PAT or an AAD token | true | NA |
| `local-path` | Path of the directory you want to import | true | NA |
| `remote-path` | Path of the directory inside the Databricks workspace where you want the files to land | true | NA |

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Trademarks

This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos is subject to those third parties' policies.

Contributors

microsoft-github-operations[bot], microsoftopensource, vianeyja


Issues

Instead of ignoring unsupported files, import_dir throws an error

When importing a local directory into the workspace with import_dir, if a subfolder contains unsupported files (e.g. .html files), the command currently throws errors like the one shown below instead of ignoring those files.
There is also no option to easily exclude the subfolder from import_dir; the -e flag only excludes hidden files. This forces a lot of manual iteration to import each subfolder separately. It would be easier to either ignore folders that contain no supported file extensions or allow the user to exclude subfolders.

Steps to reproduce:

  1. Import a local directory into the workspace with `databricks workspace import_dir`
  2. Error:

```
send: b'"path": "path/path_subfolder/", "format": "HTML", "content": "..........#$%^&", "overwrite": true*'
reply: 'HTTP/1.1 400 Bad Request\r\n'
```
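Until the action or the CLI supports excluding files, one workaround is to stage only files with supported extensions into a separate directory and point `import_dir` at the staged copy. This is a sketch, not part of the action; the directory names below are illustrative, and it assumes filenames without spaces.

```shell
# Workaround sketch: mirror only supported file types into a staging
# directory so import_dir never sees unsupported ones (e.g. .html).
set -e
SRC=src_demo
STAGE=stage_demo

# Build a small demo tree (illustrative only).
rm -rf "$SRC" "$STAGE"
mkdir -p "$SRC/sub" "$STAGE"
touch "$SRC/a.py" "$SRC/sub/b.scala" "$SRC/sub/report.html"

# Copy only .scala/.py/.sql/.r/.R/.ipynb files, preserving the layout.
(cd "$SRC" && find . -type f \( -name '*.scala' -o -name '*.py' \
    -o -name '*.sql' -o -name '*.r' -o -name '*.R' \
    -o -name '*.ipynb' \) -print) |
while read -r f; do
    mkdir -p "$STAGE/$(dirname "$f")"
    cp "$SRC/$f" "$STAGE/$f"
done

# Then import the filtered copy instead of $SRC:
# databricks workspace import_dir "$STAGE" /my-remote-path
```

The staged directory ends up containing only `a.py` and `sub/b.scala`; `sub/report.html` is left behind, so the 400 error above never triggers.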

Databricks configure command missing from AAD example

AAD authentication will not work without using the `databricks configure --token` option.

References:
https://docs.databricks.com/dev-tools/cli/index.html
https://learn.microsoft.com/en-us/azure/databricks/dev-tools/cli/

The documentation and example need to be updated to add this command. Otherwise you get a less-than-helpful 403 error that doesn't point at where the problem really is. We spent more time than I care to admit troubleshooting the access side of the house before realizing it was a missed configuration step.
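Per the linked CLI docs, configuring the legacy CLI with an AAD token looks roughly like the following. This is a sketch: acquiring the AAD token itself is omitted, and the host and token values are placeholders.

```shell
# Assumption: a valid AAD access token for the Azure Databricks resource
# is already in hand. The legacy CLI reads it from DATABRICKS_AAD_TOKEN.
export DATABRICKS_HOST=https://<workspace-url>.azuredatabricks.net
export DATABRICKS_AAD_TOKEN=<aad-access-token>
databricks configure --aad-token
```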
