databricks-api's Introduction

databricks-api

Please switch to the official Databricks SDK for Python (https://github.com/databricks/databricks-sdk-py) by running the following command:

pip install databricks-sdk


[This documentation is auto-generated]

This package provides a simplified interface to the Databricks REST API. The interface is autogenerated on instantiation using the underlying client library from the official databricks-cli Python package.

Install using

pip install databricks-api

The docs here describe the interface for version 0.17.0 of the databricks-cli package for API version 2.0.

The databricks-api package contains a DatabricksAPI class which provides instance attributes for the databricks-cli ApiClient, as well as each of the available service instances. The attributes of a DatabricksAPI instance are:

  • DatabricksAPI.client <databricks_cli.sdk.api_client.ApiClient>
  • DatabricksAPI.jobs <databricks_cli.sdk.service.JobsService>
  • DatabricksAPI.cluster <databricks_cli.sdk.service.ClusterService>
  • DatabricksAPI.policy <databricks_cli.sdk.service.PolicyService>
  • DatabricksAPI.managed_library <databricks_cli.sdk.service.ManagedLibraryService>
  • DatabricksAPI.dbfs <databricks_cli.sdk.service.DbfsService>
  • DatabricksAPI.workspace <databricks_cli.sdk.service.WorkspaceService>
  • DatabricksAPI.secret <databricks_cli.sdk.service.SecretService>
  • DatabricksAPI.groups <databricks_cli.sdk.service.GroupsService>
  • DatabricksAPI.token <databricks_cli.sdk.service.TokenService>
  • DatabricksAPI.instance_pool <databricks_cli.sdk.service.InstancePoolService>
  • DatabricksAPI.delta_pipelines <databricks_cli.sdk.service.DeltaPipelinesService>
  • DatabricksAPI.repos <databricks_cli.sdk.service.ReposService>

To instantiate the client, provide the Databricks host and either a token or a user and password. The full signature of the underlying ApiClient.__init__ is also shown below.

from databricks_api import DatabricksAPI

# Provide a host and token
db = DatabricksAPI(
    host="example.cloud.databricks.com",
    token="dpapi123..."
)

# OR a host and user and password
db = DatabricksAPI(
    host="example.cloud.databricks.com",
    user="[email protected]",
    password="password"
)

# Full __init__ signature
db = DatabricksAPI(
    user=None,
    password=None,
    host=None,
    token=None,
    api_version='2.0',
    default_headers={},
    verify=True,
    command_name='',
    jobs_api_version=None
)
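
For example, methods are called on the service attributes of an instance; the raw ApiClient is also available for endpoints that are not wrapped. A minimal sketch (the host, token, and paths are placeholders):

from databricks_api import DatabricksAPI

db = DatabricksAPI(
    host="example.cloud.databricks.com",
    token="dpapi123..."
)

# Call wrapped endpoints through the service attributes
print(db.dbfs.list("/"))
print(db.cluster.list_clusters())

# The underlying ApiClient can issue raw REST calls against the same host
print(db.client.perform_query("GET", "/clusters/list"))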

Refer to the official Databricks API documentation for the functionality and required arguments of each method listed below.

Each of the service instance attributes provides the following public methods:

DatabricksAPI.jobs

db.jobs.cancel_run(
    run_id,
    headers=None,
    version=None,
)

db.jobs.create_job(
    name=None,
    existing_cluster_id=None,
    new_cluster=None,
    libraries=None,
    email_notifications=None,
    timeout_seconds=None,
    max_retries=None,
    min_retry_interval_millis=None,
    retry_on_timeout=None,
    schedule=None,
    notebook_task=None,
    spark_jar_task=None,
    spark_python_task=None,
    spark_submit_task=None,
    max_concurrent_runs=None,
    tasks=None,
    headers=None,
    version=None,
)

db.jobs.delete_job(
    job_id,
    headers=None,
    version=None,
)

db.jobs.delete_run(
    run_id=None,
    headers=None,
    version=None,
)

db.jobs.export_run(
    run_id,
    views_to_export=None,
    headers=None,
    version=None,
)

db.jobs.get_job(
    job_id,
    headers=None,
    version=None,
)

db.jobs.get_run(
    run_id=None,
    headers=None,
    version=None,
)

db.jobs.get_run_output(
    run_id,
    headers=None,
    version=None,
)

db.jobs.list_jobs(
    job_type=None,
    expand_tasks=None,
    limit=None,
    offset=None,
    headers=None,
    version=None,
)

db.jobs.list_runs(
    job_id=None,
    active_only=None,
    completed_only=None,
    offset=None,
    limit=None,
    headers=None,
    version=None,
)

db.jobs.reset_job(
    job_id,
    new_settings,
    headers=None,
    version=None,
)

db.jobs.run_now(
    job_id=None,
    jar_params=None,
    notebook_params=None,
    python_params=None,
    spark_submit_params=None,
    python_named_params=None,
    idempotency_token=None,
    headers=None,
    version=None,
)

db.jobs.submit_run(
    run_name=None,
    existing_cluster_id=None,
    new_cluster=None,
    libraries=None,
    notebook_task=None,
    spark_jar_task=None,
    spark_python_task=None,
    spark_submit_task=None,
    timeout_seconds=None,
    tasks=None,
    headers=None,
    version=None,
)
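
As a hedged sketch of submitting a one-off notebook run and polling it (the cluster spec and notebook path are placeholders, not values from this README):

run = db.jobs.submit_run(
    run_name="example-run",
    new_cluster={
        "spark_version": "7.3.x-scala2.12",  # assumed runtime version
        "node_type_id": "i3.xlarge",         # assumed node type
        "num_workers": 1,
    },
    notebook_task={"notebook_path": "/Users/someone/example-notebook"},
)

# submit_run returns the run id, which can then be polled
state = db.jobs.get_run(run_id=run["run_id"])
print(state["state"])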

DatabricksAPI.cluster

db.cluster.create_cluster(
    num_workers=None,
    autoscale=None,
    cluster_name=None,
    spark_version=None,
    spark_conf=None,
    aws_attributes=None,
    node_type_id=None,
    driver_node_type_id=None,
    ssh_public_keys=None,
    custom_tags=None,
    cluster_log_conf=None,
    spark_env_vars=None,
    autotermination_minutes=None,
    enable_elastic_disk=None,
    cluster_source=None,
    instance_pool_id=None,
    headers=None,
)

db.cluster.delete_cluster(
    cluster_id,
    headers=None,
)

db.cluster.edit_cluster(
    cluster_id,
    num_workers=None,
    autoscale=None,
    cluster_name=None,
    spark_version=None,
    spark_conf=None,
    aws_attributes=None,
    node_type_id=None,
    driver_node_type_id=None,
    ssh_public_keys=None,
    custom_tags=None,
    cluster_log_conf=None,
    spark_env_vars=None,
    autotermination_minutes=None,
    enable_elastic_disk=None,
    cluster_source=None,
    instance_pool_id=None,
    headers=None,
)

db.cluster.get_cluster(
    cluster_id,
    headers=None,
)

db.cluster.get_events(
    cluster_id,
    start_time=None,
    end_time=None,
    order=None,
    event_types=None,
    offset=None,
    limit=None,
    headers=None,
)

db.cluster.list_available_zones(headers=None)

db.cluster.list_clusters(headers=None)

db.cluster.list_node_types(headers=None)

db.cluster.list_spark_versions(headers=None)

db.cluster.permanent_delete_cluster(
    cluster_id,
    headers=None,
)

db.cluster.pin_cluster(
    cluster_id,
    headers=None,
)

db.cluster.resize_cluster(
    cluster_id,
    num_workers=None,
    autoscale=None,
    headers=None,
)

db.cluster.restart_cluster(
    cluster_id,
    headers=None,
)

db.cluster.start_cluster(
    cluster_id,
    headers=None,
)

db.cluster.unpin_cluster(
    cluster_id,
    headers=None,
)
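
A minimal sketch of creating a cluster and checking its state (the runtime version and node type are assumptions; adjust for your cloud):

cluster = db.cluster.create_cluster(
    num_workers=1,
    cluster_name="example-cluster",
    spark_version="7.3.x-scala2.12",  # assumed runtime version
    node_type_id="i3.xlarge",         # assumed (AWS) node type
    autotermination_minutes=30,
)

# create_cluster returns the new cluster id; get_cluster reports its state
info = db.cluster.get_cluster(cluster["cluster_id"])
print(info["state"])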

DatabricksAPI.policy

db.policy.create_policy(
    policy_name,
    definition,
    headers=None,
)

db.policy.delete_policy(
    policy_id,
    headers=None,
)

db.policy.edit_policy(
    policy_id,
    policy_name,
    definition,
    headers=None,
)

db.policy.get_policy(
    policy_id,
    headers=None,
)

db.policy.list_policies(headers=None)
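
The definition argument is a JSON document of policy rules passed as a string. A hedged sketch (the rule shown is illustrative):

import json

resp = db.policy.create_policy(
    policy_name="example-policy",
    definition=json.dumps({
        "spark_version": {"type": "fixed", "value": "7.3.x-scala2.12"}
    }),
)
print(resp)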

DatabricksAPI.managed_library

db.managed_library.all_cluster_statuses(headers=None)

db.managed_library.cluster_status(
    cluster_id,
    headers=None,
)

db.managed_library.install_libraries(
    cluster_id,
    libraries=None,
    headers=None,
)

db.managed_library.uninstall_libraries(
    cluster_id,
    libraries=None,
    headers=None,
)
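
A sketch of installing a PyPI library on a running cluster and checking library status (the cluster id and package name are placeholders):

cluster_id = "0123-456789-abcde"  # placeholder cluster id

db.managed_library.install_libraries(
    cluster_id,
    libraries=[{"pypi": {"package": "simplejson"}}],
)
print(db.managed_library.cluster_status(cluster_id))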

DatabricksAPI.dbfs

db.dbfs.add_block(
    handle,
    data,
    headers=None,
)

db.dbfs.add_block_test(
    handle,
    data,
    headers=None,
)

db.dbfs.close(
    handle,
    headers=None,
)

db.dbfs.close_test(
    handle,
    headers=None,
)

db.dbfs.create(
    path,
    overwrite=None,
    headers=None,
)

db.dbfs.create_test(
    path,
    overwrite=None,
    headers=None,
)

db.dbfs.delete(
    path,
    recursive=None,
    headers=None,
)

db.dbfs.delete_test(
    path,
    recursive=None,
    headers=None,
)

db.dbfs.get_status(
    path,
    headers=None,
)

db.dbfs.get_status_test(
    path,
    headers=None,
)

db.dbfs.list(
    path,
    headers=None,
)

db.dbfs.list_test(
    path,
    headers=None,
)

db.dbfs.mkdirs(
    path,
    headers=None,
)

db.dbfs.mkdirs_test(
    path,
    headers=None,
)

db.dbfs.move(
    source_path,
    destination_path,
    headers=None,
)

db.dbfs.move_test(
    source_path,
    destination_path,
    headers=None,
)

db.dbfs.put(
    path,
    contents=None,
    overwrite=None,
    headers=None,
    src_path=None,
)

db.dbfs.put_test(
    path,
    contents=None,
    overwrite=None,
    headers=None,
    src_path=None,
)

db.dbfs.read(
    path,
    offset=None,
    length=None,
    headers=None,
)

db.dbfs.read_test(
    path,
    offset=None,
    length=None,
    headers=None,
)
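
For small files, put accepts base64-encoded contents directly; larger files can use the streaming handle returned by create together with add_block and close. A hedged sketch of the streaming flow (the DBFS path is a placeholder):

import base64

handle = db.dbfs.create("/tmp/example.txt", overwrite=True)["handle"]
for chunk in [b"hello ", b"world\n"]:
    db.dbfs.add_block(handle, base64.b64encode(chunk).decode("utf-8"))
db.dbfs.close(handle)

# read returns base64-encoded data
resp = db.dbfs.read("/tmp/example.txt", offset=0, length=1024)
print(base64.b64decode(resp["data"]))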

DatabricksAPI.workspace

db.workspace.delete(
    path,
    recursive=None,
    headers=None,
)

db.workspace.export_workspace(
    path,
    format=None,
    direct_download=None,
    headers=None,
)

db.workspace.get_status(
    path,
    headers=None,
)

db.workspace.import_workspace(
    path,
    format=None,
    language=None,
    content=None,
    overwrite=None,
    headers=None,
)

db.workspace.list(
    path,
    headers=None,
)

db.workspace.mkdirs(
    path,
    headers=None,
)
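
A sketch of exporting a notebook as source and re-importing it under another path (the paths are placeholders; SOURCE and PYTHON follow the REST API's format and language values):

exported = db.workspace.export_workspace(
    "/Users/someone/notebook",
    format="SOURCE",
)

db.workspace.mkdirs("/Users/someone/backup")
db.workspace.import_workspace(
    "/Users/someone/backup/notebook",
    format="SOURCE",
    language="PYTHON",
    content=exported["content"],  # base64-encoded source
    overwrite=True,
)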

DatabricksAPI.secret

db.secret.create_scope(
    scope,
    initial_manage_principal=None,
    scope_backend_type=None,
    backend_azure_keyvault=None,
    headers=None,
)

db.secret.delete_acl(
    scope,
    principal,
    headers=None,
)

db.secret.delete_scope(
    scope,
    headers=None,
)

db.secret.delete_secret(
    scope,
    key,
    headers=None,
)

db.secret.get_acl(
    scope,
    principal,
    headers=None,
)

db.secret.list_acls(
    scope,
    headers=None,
)

db.secret.list_scopes(headers=None)

db.secret.list_secrets(
    scope,
    headers=None,
)

db.secret.put_acl(
    scope,
    principal,
    permission,
    headers=None,
)

db.secret.put_secret(
    scope,
    key,
    string_value=None,
    bytes_value=None,
    headers=None,
)
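
A sketch of creating a scope and storing a secret (the scope and key names are placeholders; secret values cannot be read back through the API, only listed by key):

db.secret.create_scope("example-scope", initial_manage_principal="users")
db.secret.put_secret("example-scope", "db-password", string_value="s3cret")
print(db.secret.list_secrets("example-scope"))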

DatabricksAPI.groups

db.groups.add_to_group(
    parent_name,
    user_name=None,
    group_name=None,
    headers=None,
)

db.groups.create_group(
    group_name,
    headers=None,
)

db.groups.get_group_members(
    group_name,
    headers=None,
)

db.groups.get_groups(headers=None)

db.groups.get_groups_for_principal(
    user_name=None,
    group_name=None,
    headers=None,
)

db.groups.remove_from_group(
    parent_name,
    user_name=None,
    group_name=None,
    headers=None,
)

db.groups.remove_group(
    group_name,
    headers=None,
)

DatabricksAPI.token

db.token.create_token(
    lifetime_seconds=None,
    comment=None,
    headers=None,
)

db.token.list_tokens(headers=None)

db.token.revoke_token(
    token_id,
    headers=None,
)
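
A sketch of creating and then revoking a personal access token (the lifetime and comment are illustrative):

resp = db.token.create_token(lifetime_seconds=3600, comment="ci token")
print(resp["token_value"])                        # the new token value
db.token.revoke_token(resp["token_info"]["token_id"])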

DatabricksAPI.instance_pool

db.instance_pool.create_instance_pool(
    instance_pool_name=None,
    min_idle_instances=None,
    max_capacity=None,
    aws_attributes=None,
    node_type_id=None,
    custom_tags=None,
    idle_instance_autotermination_minutes=None,
    enable_elastic_disk=None,
    disk_spec=None,
    preloaded_spark_versions=None,
    headers=None,
)

db.instance_pool.delete_instance_pool(
    instance_pool_id=None,
    headers=None,
)

db.instance_pool.edit_instance_pool(
    instance_pool_id,
    instance_pool_name=None,
    min_idle_instances=None,
    max_capacity=None,
    aws_attributes=None,
    node_type_id=None,
    custom_tags=None,
    idle_instance_autotermination_minutes=None,
    enable_elastic_disk=None,
    disk_spec=None,
    preloaded_spark_versions=None,
    headers=None,
)

db.instance_pool.get_instance_pool(
    instance_pool_id=None,
    headers=None,
)

db.instance_pool.list_instance_pools(headers=None)

DatabricksAPI.delta_pipelines

db.delta_pipelines.create(
    id=None,
    name=None,
    storage=None,
    configuration=None,
    clusters=None,
    libraries=None,
    trigger=None,
    filters=None,
    allow_duplicate_names=None,
    headers=None,
)

db.delta_pipelines.delete(
    pipeline_id=None,
    headers=None,
)

db.delta_pipelines.deploy(
    pipeline_id=None,
    id=None,
    name=None,
    storage=None,
    configuration=None,
    clusters=None,
    libraries=None,
    trigger=None,
    filters=None,
    allow_duplicate_names=None,
    headers=None,
)

db.delta_pipelines.get(
    pipeline_id=None,
    headers=None,
)

db.delta_pipelines.list(
    pagination=None,
    headers=None,
)

db.delta_pipelines.reset(
    pipeline_id=None,
    headers=None,
)

db.delta_pipelines.run(
    pipeline_id=None,
    headers=None,
)

db.delta_pipelines.start_update(
    pipeline_id=None,
    full_refresh=None,
    headers=None,
)

db.delta_pipelines.stop(
    pipeline_id=None,
    headers=None,
)

DatabricksAPI.repos

db.repos.create_repo(
    url,
    provider,
    path=None,
    headers=None,
)

db.repos.delete_repo(
    id,
    headers=None,
)

db.repos.get_repo(
    id,
    headers=None,
)

db.repos.list_repos(
    path_prefix=None,
    next_page_token=None,
    headers=None,
)

db.repos.update_repo(
    id,
    branch=None,
    tag=None,
    headers=None,
)
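
A sketch of creating a repo and checking out a branch (the URL, provider, and path are placeholders):

repo = db.repos.create_repo(
    url="https://github.com/example/project.git",
    provider="gitHub",
    path="/Repos/someone/project",
)
db.repos.update_repo(repo["id"], branch="main")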

databricks-api's People

Contributors

crflynn, maxcaudle, nfx, zschumacher


databricks-api's Issues

Was not able to use edit_cluster

The snippet:
db.cluster.edit_cluster(cluster_id, spark_version="10.3.x-scala2.12")
Could you share a sample of how to use this edit_cluster API?


HTTPError Traceback (most recent call last)
~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/api_client.py in perform_query(self, method, path, data, headers, files, version)
137 try:
--> 138 resp.raise_for_status()
139 except requests.exceptions.HTTPError as e:

~/Library/Python/3.7/lib/python/site-packages/requests/models.py in raise_for_status(self)
940 if http_error_msg:
--> 941 raise HTTPError(http_error_msg, response=self)
942

HTTPError: 400 Client Error: Bad Request for url: https://adb-xyz.azuredatabricks.net/api/2.0/clusters/edit

During handling of the above exception, another exception occurred:

HTTPError Traceback (most recent call last)
/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/3832242952.py in <module>
1 if __name__ == "__main__":
----> 2 main()

/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/3778288670.py in main()
69 unravel_spark2_cluster_configs,
70 unravel_spark3_cluster_configs,
---> 71 output_directory_path)
72

/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/2351089658.py in configureInteractiveClustersWithUnravel(cluster_list, workspace_id2api, workspace_spark_verisons, unravel_spark2_cluster_configs, unravel_spark3_cluster_configs, output_path)
91
92 db.cluster.edit_cluster(cluster_id,
---> 93 spark_version="10.3.x-scala2.12")
94
95
~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/service.py in edit_cluster(self, cluster_id, num_workers, autoscale, cluster_name, spark_version, spark_conf, aws_attributes, node_type_id, driver_node_type_id, ssh_public_keys, custom_tags, cluster_log_conf, spark_env_vars, autotermination_minutes, enable_elastic_disk, cluster_source, instance_pool_id, headers)
365 _data['instance_pool_id'] = instance_pool_id
366 print(_data)
--> 367 return self.client.perform_query('POST', '/clusters/edit', data=_data, headers=headers)
368
369 def get_cluster(self, cluster_id, headers=None):

~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/api_client.py in perform_query(self, method, path, data, headers, files, version)
144 except ValueError:
145 pass
--> 146 raise requests.exceptions.HTTPError(message, response=e.response)
147 return resp.json()
148

HTTPError: 400 Client Error: Bad Request for url: https://adb-xyz.azuredatabricks.net/api/2.0/clusters/edit
Response from server:
{ 'error_code': 'INVALID_PARAMETER_VALUE',
'message': 'Missing required field: Size'}
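
The /clusters/edit endpoint expects a full cluster specification, including its size, so this error likely means num_workers (or autoscale) must be sent along with the field being changed. A hedged sketch that re-submits the existing spec from get_cluster with only the Spark version updated:

spec = db.cluster.get_cluster(cluster_id)
db.cluster.edit_cluster(
    cluster_id,
    num_workers=spec.get("num_workers"),   # the edit call needs the cluster size
    autoscale=spec.get("autoscale"),
    cluster_name=spec.get("cluster_name"),
    spark_version="10.3.x-scala2.12",      # the field being changed
    node_type_id=spec.get("node_type_id"),
)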

User creation through scim api

Hello

I forked your project and added the following to be able to create new users through the API.

https://docs.databricks.com/dev-tools/api/latest/scim/scim-users.html

def create_user(self, user_name=None, headers=None):
    _data = {}
    if user_name is not None:
        _data['schemas'] = ["urn:ietf:params:scim:schemas:core:2.0:User"]
        _data['userName'] = user_name
        _data['entitlements'] = [{'value': 'allow-cluster-create'}]		
    return self.client.perform_query('POST', '/preview/scim/v2/Users', data=_data, headers=headers)

Of course it could be improved to allow passing groups and so on. Just in case you want it.

Relax databricks-cli version restriction

Right now the package depends only on databricks-cli 0.12.x, while the latest version is 0.14.3. It would be useful to relax the version constraint so the package doesn't force the old release.

Add tasks to an existing job

Hello guys, I'm using this library as an interface for job manipulation, but I couldn't figure out how to add a new task to an existing job.
How could I do this?

Example:

There is the method

db.jobs.reset_job(
    "job_id",
    new_settings,
) 

and I need something like:

db.jobs.update_job(
    "job_id",
    new_settings,
) 
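
reset_job replaces the full job settings, so one way to add a task is to fetch the current settings, append to the tasks list, and reset. A hedged sketch, assuming the workspace uses Jobs API 2.1 (where settings contains a tasks list) and with placeholder task values:

job = db.jobs.get_job(job_id, version="2.1")
settings = job["settings"]
settings.setdefault("tasks", []).append({
    "task_key": "new_task",                      # illustrative task
    "existing_cluster_id": "0123-456789-abcde",  # placeholder cluster id
    "notebook_task": {"notebook_path": "/Users/someone/new-notebook"},
})
db.jobs.reset_job(job_id, settings, version="2.1")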

Create Cluster init_script

Hi

I like your API very much and I will use it in my CI pipeline. Unfortunately I have a problem adding my init_script to the cluster.

This is my code:

cluster_json = db.cluster.create_cluster(
   num_workers=2,
   cluster_name="az-ckw-uieb-databricks-devops_test",
   spark_version="5.5.x-scala2.11",
   spark_conf=None,
   node_type_id="Standard_DS3_v2",
   spark_env_vars={
       "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
   },
   autotermination_minutes=120,
   enable_elastic_disk=True,
   init_scripts=[{'dbfs': {'destination': 'dbfs:/databricks/scripts/oracle-install.sh'}}],
)

However, when I execute it, I get this error message:

TypeError: create_cluster() got an unexpected keyword argument 'init_scripts'

Any idea?

Many Thanks
Christoph
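
Since the create_cluster wrapper in this databricks-cli version does not expose init_scripts, one hedged workaround is to call the REST endpoint directly through the underlying ApiClient, which accepts an arbitrary payload:

cluster_json = db.client.perform_query(
    "POST",
    "/clusters/create",
    data={
        "num_workers": 2,
        "cluster_name": "az-ckw-uieb-databricks-devops_test",
        "spark_version": "5.5.x-scala2.11",
        "node_type_id": "Standard_DS3_v2",
        "spark_env_vars": {"PYSPARK_PYTHON": "/databricks/python3/bin/python3"},
        "autotermination_minutes": 120,
        "enable_elastic_disk": True,
        "init_scripts": [
            {"dbfs": {"destination": "dbfs:/databricks/scripts/oracle-install.sh"}}
        ],
    },
)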

Cannot use DatabricksAPI.workspace

According to the documentation I think I should be able to run from databricks_api import DatabricksAPI and then access DatabricksAPI.workspace, but this doesn't seem to be working. My main goal is to check whether a directory structure exists and, if not, create one.

In [1]: from databricks_api import DatabricksAPI

In [2]: DatabricksAPI.workspace.list()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-09fa92501b4f> in <module>
----> 1 DatabricksAPI.workspace.list()

AttributeError: type object 'DatabricksAPI' has no attribute 'workspace'

In [3]: DatabricksAPI.workspaces
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-12099cc2227b> in <module>
----> 1 DatabricksAPI.workspaces

AttributeError: type object 'DatabricksAPI' has no attribute 'workspaces'

It looks like the DatabricksAPI class doesn't have any publicly accessible attributes:

DatabricksAPI.__dict__

mappingproxy({'__module__': 'databricks_api.databricks',
              '__init__': <function databricks_api.databricks.DatabricksAPI.__init__(self, **kwargs)>,
              '__dict__': <attribute '__dict__' of 'DatabricksAPI' objects>,
              '__weakref__': <attribute '__weakref__' of 'DatabricksAPI' objects>,
              '__doc__': None})
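
The service attributes are set inside __init__, so they exist only on an instance, not on the class itself. A minimal sketch (host and token are placeholders); mkdirs also creates missing parent directories, which covers the "create if absent" case:

from databricks_api import DatabricksAPI

db = DatabricksAPI(host="example.cloud.databricks.com", token="dpapi123...")

db.workspace.mkdirs("/Users/someone/reports")   # succeeds if it already exists
print(db.workspace.list("/Users/someone"))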

New git_source parameter not in API

The JobsCreate endpoint now has a git_source parameter in the 2.1 API. When I try to use this however, I get the following error:

TypeError: create_job() got an unexpected keyword argument 'git_source'

I'm assuming this means the parameters need to be rescanned in a new update of the package; is there a plan to do this any time soon?
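
Until the wrappers are regenerated against the 2.1 spec, a hedged workaround is to call the 2.1 endpoint through the underlying client (perform_query accepts data and version arguments; the payload below is illustrative):

db.client.perform_query(
    "POST",
    "/jobs/create",
    data={
        "name": "example-job",
        "git_source": {
            "git_url": "https://github.com/example/project.git",
            "git_provider": "gitHub",
            "git_branch": "main",
        },
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": "notebooks/run", "source": "GIT"},
            "new_cluster": {
                "spark_version": "10.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
            },
        }],
    },
    version="2.1",
)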

Usage of urllib3 is outdated specific to method_whitelist

Specific traceback:

    method_whitelist=set({'POST'}) | set(Retry.DEFAULT_METHOD_WHITELIST),
AttributeError: type object 'Retry' has no attribute 'DEFAULT_METHOD_WHITELIST'

urllib3 was updated to use neutral language, which affected method_whitelist. Specifically, Retry.DEFAULT_METHOD_WHITELIST was renamed to Retry.DEFAULT_ALLOWED_METHODS.

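A hedged compatibility sketch that builds the Retry object under either urllib3 naming (older method_whitelist vs. newer allowed_methods):

from urllib3.util.retry import Retry

methods = {"POST"} | set(
    getattr(Retry, "DEFAULT_ALLOWED_METHODS",
            getattr(Retry, "DEFAULT_METHOD_WHITELIST", frozenset()))
)
if hasattr(Retry.DEFAULT, "allowed_methods"):   # urllib3 >= 1.26
    retry = Retry(total=3, backoff_factor=1, allowed_methods=methods)
else:                                           # older urllib3
    retry = Retry(total=3, backoff_factor=1, method_whitelist=methods)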

Add Support for Azure and GCP

Currently methods db.cluster.create_cluster(), db.cluster.edit_cluster(), db.instance_pool.create_instance_pool(), and db.instance_pool.edit_instance_pool() only support API calls to AWS-based databricks workspaces. I'd recommend adding azure_attributes and gcp_attributes as parameters to these functions to support API calls on all platforms.

edit: opened an issue on databricks-cli here
