databricks-api's Introduction

databricks-api

Please switch to the official Databricks SDK for Python (https://github.com/databricks/databricks-sdk-py) by running the following command:

pip install databricks-sdk


[This documentation is auto-generated]

This package provides a simplified interface to the Databricks REST API. The interface is autogenerated on instantiation using the underlying client library from the official databricks-cli Python package.

Install using

pip install databricks-api

The docs here describe the interface for version 0.17.0 of the databricks-cli package for API version 2.0.

The databricks-api package contains a DatabricksAPI class which provides instance attributes for the databricks-cli ApiClient, as well as each of the available service instances. The attributes of a DatabricksAPI instance are:

  • DatabricksAPI.client <databricks_cli.sdk.api_client.ApiClient>
  • DatabricksAPI.jobs <databricks_cli.sdk.service.JobsService>
  • DatabricksAPI.cluster <databricks_cli.sdk.service.ClusterService>
  • DatabricksAPI.policy <databricks_cli.sdk.service.PolicyService>
  • DatabricksAPI.managed_library <databricks_cli.sdk.service.ManagedLibraryService>
  • DatabricksAPI.dbfs <databricks_cli.sdk.service.DbfsService>
  • DatabricksAPI.workspace <databricks_cli.sdk.service.WorkspaceService>
  • DatabricksAPI.secret <databricks_cli.sdk.service.SecretService>
  • DatabricksAPI.groups <databricks_cli.sdk.service.GroupsService>
  • DatabricksAPI.token <databricks_cli.sdk.service.TokenService>
  • DatabricksAPI.instance_pool <databricks_cli.sdk.service.InstancePoolService>
  • DatabricksAPI.delta_pipelines <databricks_cli.sdk.service.DeltaPipelinesService>
  • DatabricksAPI.repos <databricks_cli.sdk.service.ReposService>

To instantiate the client, provide the Databricks host and either a token or a user and password. The full signature of the underlying ApiClient.__init__ is also shown below.

from databricks_api import DatabricksAPI

# Provide a host and token
db = DatabricksAPI(
    host="example.cloud.databricks.com",
    token="dpapi123..."
)

# OR a host and user and password
db = DatabricksAPI(
    host="example.cloud.databricks.com",
    user="[email protected]",
    password="password"
)

# Full __init__ signature
db = DatabricksAPI(
    user=None,
    password=None,
    host=None,
    token=None,
    api_version='2.0',
    default_headers={},
    verify=True,
    command_name='',
    jobs_api_version=None
)
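
For example, methods are called on the service attributes of an instance; the raw ApiClient is also available for endpoints that are not wrapped. A minimal sketch (the host, token, and paths are placeholders):

from databricks_api import DatabricksAPI

db = DatabricksAPI(
    host="example.cloud.databricks.com",
    token="dpapi123..."
)

# Call wrapped endpoints through the service attributes
print(db.dbfs.list("/"))
print(db.cluster.list_clusters())

# The underlying ApiClient can issue raw REST calls against the same host
print(db.client.perform_query("GET", "/clusters/list"))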

Refer to the official Databricks API documentation for the functionality and required arguments of each method listed below.

Each of the service instance attributes provides the following public methods:

DatabricksAPI.jobs

db.jobs.cancel_run(
    run_id,
    headers=None,
    version=None,
)

db.jobs.create_job(
    name=None,
    existing_cluster_id=None,
    new_cluster=None,
    libraries=None,
    email_notifications=None,
    timeout_seconds=None,
    max_retries=None,
    min_retry_interval_millis=None,
    retry_on_timeout=None,
    schedule=None,
    notebook_task=None,
    spark_jar_task=None,
    spark_python_task=None,
    spark_submit_task=None,
    max_concurrent_runs=None,
    tasks=None,
    headers=None,
    version=None,
)

db.jobs.delete_job(
    job_id,
    headers=None,
    version=None,
)

db.jobs.delete_run(
    run_id=None,
    headers=None,
    version=None,
)

db.jobs.export_run(
    run_id,
    views_to_export=None,
    headers=None,
    version=None,
)

db.jobs.get_job(
    job_id,
    headers=None,
    version=None,
)

db.jobs.get_run(
    run_id=None,
    headers=None,
    version=None,
)

db.jobs.get_run_output(
    run_id,
    headers=None,
    version=None,
)

db.jobs.list_jobs(
    job_type=None,
    expand_tasks=None,
    limit=None,
    offset=None,
    headers=None,
    version=None,
)

db.jobs.list_runs(
    job_id=None,
    active_only=None,
    completed_only=None,
    offset=None,
    limit=None,
    headers=None,
    version=None,
)

db.jobs.reset_job(
    job_id,
    new_settings,
    headers=None,
    version=None,
)

db.jobs.run_now(
    job_id=None,
    jar_params=None,
    notebook_params=None,
    python_params=None,
    spark_submit_params=None,
    python_named_params=None,
    idempotency_token=None,
    headers=None,
    version=None,
)

db.jobs.submit_run(
    run_name=None,
    existing_cluster_id=None,
    new_cluster=None,
    libraries=None,
    notebook_task=None,
    spark_jar_task=None,
    spark_python_task=None,
    spark_submit_task=None,
    timeout_seconds=None,
    tasks=None,
    headers=None,
    version=None,
)
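
As a hedged sketch of submitting a one-off notebook run and polling it (the cluster spec and notebook path are placeholders, not values from this README):

run = db.jobs.submit_run(
    run_name="example-run",
    new_cluster={
        "spark_version": "7.3.x-scala2.12",  # assumed runtime version
        "node_type_id": "i3.xlarge",         # assumed node type
        "num_workers": 1,
    },
    notebook_task={"notebook_path": "/Users/someone/example-notebook"},
)

# submit_run returns the run id, which can then be polled
state = db.jobs.get_run(run_id=run["run_id"])
print(state["state"])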

DatabricksAPI.cluster

db.cluster.create_cluster(
    num_workers=None,
    autoscale=None,
    cluster_name=None,
    spark_version=None,
    spark_conf=None,
    aws_attributes=None,
    node_type_id=None,
    driver_node_type_id=None,
    ssh_public_keys=None,
    custom_tags=None,
    cluster_log_conf=None,
    spark_env_vars=None,
    autotermination_minutes=None,
    enable_elastic_disk=None,
    cluster_source=None,
    instance_pool_id=None,
    headers=None,
)

db.cluster.delete_cluster(
    cluster_id,
    headers=None,
)

db.cluster.edit_cluster(
    cluster_id,
    num_workers=None,
    autoscale=None,
    cluster_name=None,
    spark_version=None,
    spark_conf=None,
    aws_attributes=None,
    node_type_id=None,
    driver_node_type_id=None,
    ssh_public_keys=None,
    custom_tags=None,
    cluster_log_conf=None,
    spark_env_vars=None,
    autotermination_minutes=None,
    enable_elastic_disk=None,
    cluster_source=None,
    instance_pool_id=None,
    headers=None,
)

db.cluster.get_cluster(
    cluster_id,
    headers=None,
)

db.cluster.get_events(
    cluster_id,
    start_time=None,
    end_time=None,
    order=None,
    event_types=None,
    offset=None,
    limit=None,
    headers=None,
)

db.cluster.list_available_zones(headers=None)

db.cluster.list_clusters(headers=None)

db.cluster.list_node_types(headers=None)

db.cluster.list_spark_versions(headers=None)

db.cluster.permanent_delete_cluster(
    cluster_id,
    headers=None,
)

db.cluster.pin_cluster(
    cluster_id,
    headers=None,
)

db.cluster.resize_cluster(
    cluster_id,
    num_workers=None,
    autoscale=None,
    headers=None,
)

db.cluster.restart_cluster(
    cluster_id,
    headers=None,
)

db.cluster.start_cluster(
    cluster_id,
    headers=None,
)

db.cluster.unpin_cluster(
    cluster_id,
    headers=None,
)
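
A minimal sketch of creating a cluster and checking its state (the runtime version and node type are assumptions; adjust for your cloud):

cluster = db.cluster.create_cluster(
    num_workers=1,
    cluster_name="example-cluster",
    spark_version="7.3.x-scala2.12",  # assumed runtime version
    node_type_id="i3.xlarge",         # assumed (AWS) node type
    autotermination_minutes=30,
)

# create_cluster returns the new cluster id; get_cluster reports its state
info = db.cluster.get_cluster(cluster["cluster_id"])
print(info["state"])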

DatabricksAPI.policy

db.policy.create_policy(
    policy_name,
    definition,
    headers=None,
)

db.policy.delete_policy(
    policy_id,
    headers=None,
)

db.policy.edit_policy(
    policy_id,
    policy_name,
    definition,
    headers=None,
)

db.policy.get_policy(
    policy_id,
    headers=None,
)

db.policy.list_policies(headers=None)
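
The definition argument is a JSON document of policy rules passed as a string. A hedged sketch (the rule shown is illustrative):

import json

resp = db.policy.create_policy(
    policy_name="example-policy",
    definition=json.dumps({
        "spark_version": {"type": "fixed", "value": "7.3.x-scala2.12"}
    }),
)
print(resp)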

DatabricksAPI.managed_library

db.managed_library.all_cluster_statuses(headers=None)

db.managed_library.cluster_status(
    cluster_id,
    headers=None,
)

db.managed_library.install_libraries(
    cluster_id,
    libraries=None,
    headers=None,
)

db.managed_library.uninstall_libraries(
    cluster_id,
    libraries=None,
    headers=None,
)
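
A sketch of installing a PyPI library on a running cluster and checking library status (the cluster id and package name are placeholders):

cluster_id = "0123-456789-abcde"  # placeholder cluster id

db.managed_library.install_libraries(
    cluster_id,
    libraries=[{"pypi": {"package": "simplejson"}}],
)
print(db.managed_library.cluster_status(cluster_id))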

DatabricksAPI.dbfs

db.dbfs.add_block(
    handle,
    data,
    headers=None,
)

db.dbfs.add_block_test(
    handle,
    data,
    headers=None,
)

db.dbfs.close(
    handle,
    headers=None,
)

db.dbfs.close_test(
    handle,
    headers=None,
)

db.dbfs.create(
    path,
    overwrite=None,
    headers=None,
)

db.dbfs.create_test(
    path,
    overwrite=None,
    headers=None,
)

db.dbfs.delete(
    path,
    recursive=None,
    headers=None,
)

db.dbfs.delete_test(
    path,
    recursive=None,
    headers=None,
)

db.dbfs.get_status(
    path,
    headers=None,
)

db.dbfs.get_status_test(
    path,
    headers=None,
)

db.dbfs.list(
    path,
    headers=None,
)

db.dbfs.list_test(
    path,
    headers=None,
)

db.dbfs.mkdirs(
    path,
    headers=None,
)

db.dbfs.mkdirs_test(
    path,
    headers=None,
)

db.dbfs.move(
    source_path,
    destination_path,
    headers=None,
)

db.dbfs.move_test(
    source_path,
    destination_path,
    headers=None,
)

db.dbfs.put(
    path,
    contents=None,
    overwrite=None,
    headers=None,
    src_path=None,
)

db.dbfs.put_test(
    path,
    contents=None,
    overwrite=None,
    headers=None,
    src_path=None,
)

db.dbfs.read(
    path,
    offset=None,
    length=None,
    headers=None,
)

db.dbfs.read_test(
    path,
    offset=None,
    length=None,
    headers=None,
)
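
For small files, put accepts base64-encoded contents directly; larger files can use the streaming handle returned by create together with add_block and close. A hedged sketch of the streaming flow (the DBFS path is a placeholder):

import base64

handle = db.dbfs.create("/tmp/example.txt", overwrite=True)["handle"]
for chunk in [b"hello ", b"world\n"]:
    db.dbfs.add_block(handle, base64.b64encode(chunk).decode("utf-8"))
db.dbfs.close(handle)

# read returns base64-encoded data
resp = db.dbfs.read("/tmp/example.txt", offset=0, length=1024)
print(base64.b64decode(resp["data"]))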

DatabricksAPI.workspace

db.workspace.delete(
    path,
    recursive=None,
    headers=None,
)

db.workspace.export_workspace(
    path,
    format=None,
    direct_download=None,
    headers=None,
)

db.workspace.get_status(
    path,
    headers=None,
)

db.workspace.import_workspace(
    path,
    format=None,
    language=None,
    content=None,
    overwrite=None,
    headers=None,
)

db.workspace.list(
    path,
    headers=None,
)

db.workspace.mkdirs(
    path,
    headers=None,
)
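
A sketch of exporting a notebook as source and re-importing it under another path (the paths are placeholders; SOURCE and PYTHON follow the REST API's format and language values):

exported = db.workspace.export_workspace(
    "/Users/someone/notebook",
    format="SOURCE",
)

db.workspace.mkdirs("/Users/someone/backup")
db.workspace.import_workspace(
    "/Users/someone/backup/notebook",
    format="SOURCE",
    language="PYTHON",
    content=exported["content"],  # base64-encoded source
    overwrite=True,
)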

DatabricksAPI.secret

db.secret.create_scope(
    scope,
    initial_manage_principal=None,
    scope_backend_type=None,
    backend_azure_keyvault=None,
    headers=None,
)

db.secret.delete_acl(
    scope,
    principal,
    headers=None,
)

db.secret.delete_scope(
    scope,
    headers=None,
)

db.secret.delete_secret(
    scope,
    key,
    headers=None,
)

db.secret.get_acl(
    scope,
    principal,
    headers=None,
)

db.secret.list_acls(
    scope,
    headers=None,
)

db.secret.list_scopes(headers=None)

db.secret.list_secrets(
    scope,
    headers=None,
)

db.secret.put_acl(
    scope,
    principal,
    permission,
    headers=None,
)

db.secret.put_secret(
    scope,
    key,
    string_value=None,
    bytes_value=None,
    headers=None,
)
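
A sketch of creating a scope and storing a secret (the scope and key names are placeholders; secret values cannot be read back through the API, only listed by key):

db.secret.create_scope("example-scope", initial_manage_principal="users")
db.secret.put_secret("example-scope", "db-password", string_value="s3cret")
print(db.secret.list_secrets("example-scope"))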

DatabricksAPI.groups

db.groups.add_to_group(
    parent_name,
    user_name=None,
    group_name=None,
    headers=None,
)

db.groups.create_group(
    group_name,
    headers=None,
)

db.groups.get_group_members(
    group_name,
    headers=None,
)

db.groups.get_groups(headers=None)

db.groups.get_groups_for_principal(
    user_name=None,
    group_name=None,
    headers=None,
)

db.groups.remove_from_group(
    parent_name,
    user_name=None,
    group_name=None,
    headers=None,
)

db.groups.remove_group(
    group_name,
    headers=None,
)

DatabricksAPI.token

db.token.create_token(
    lifetime_seconds=None,
    comment=None,
    headers=None,
)

db.token.list_tokens(headers=None)

db.token.revoke_token(
    token_id,
    headers=None,
)
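
A sketch of creating and then revoking a personal access token (the lifetime and comment are illustrative):

resp = db.token.create_token(lifetime_seconds=3600, comment="ci token")
print(resp["token_value"])                        # the new token value
db.token.revoke_token(resp["token_info"]["token_id"])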

DatabricksAPI.instance_pool

db.instance_pool.create_instance_pool(
    instance_pool_name=None,
    min_idle_instances=None,
    max_capacity=None,
    aws_attributes=None,
    node_type_id=None,
    custom_tags=None,
    idle_instance_autotermination_minutes=None,
    enable_elastic_disk=None,
    disk_spec=None,
    preloaded_spark_versions=None,
    headers=None,
)

db.instance_pool.delete_instance_pool(
    instance_pool_id=None,
    headers=None,
)

db.instance_pool.edit_instance_pool(
    instance_pool_id,
    instance_pool_name=None,
    min_idle_instances=None,
    max_capacity=None,
    aws_attributes=None,
    node_type_id=None,
    custom_tags=None,
    idle_instance_autotermination_minutes=None,
    enable_elastic_disk=None,
    disk_spec=None,
    preloaded_spark_versions=None,
    headers=None,
)

db.instance_pool.get_instance_pool(
    instance_pool_id=None,
    headers=None,
)

db.instance_pool.list_instance_pools(headers=None)

DatabricksAPI.delta_pipelines

db.delta_pipelines.create(
    id=None,
    name=None,
    storage=None,
    configuration=None,
    clusters=None,
    libraries=None,
    trigger=None,
    filters=None,
    allow_duplicate_names=None,
    headers=None,
)

db.delta_pipelines.delete(
    pipeline_id=None,
    headers=None,
)

db.delta_pipelines.deploy(
    pipeline_id=None,
    id=None,
    name=None,
    storage=None,
    configuration=None,
    clusters=None,
    libraries=None,
    trigger=None,
    filters=None,
    allow_duplicate_names=None,
    headers=None,
)

db.delta_pipelines.get(
    pipeline_id=None,
    headers=None,
)

db.delta_pipelines.list(
    pagination=None,
    headers=None,
)

db.delta_pipelines.reset(
    pipeline_id=None,
    headers=None,
)

db.delta_pipelines.run(
    pipeline_id=None,
    headers=None,
)

db.delta_pipelines.start_update(
    pipeline_id=None,
    full_refresh=None,
    headers=None,
)

db.delta_pipelines.stop(
    pipeline_id=None,
    headers=None,
)

DatabricksAPI.repos

db.repos.create_repo(
    url,
    provider,
    path=None,
    headers=None,
)

db.repos.delete_repo(
    id,
    headers=None,
)

db.repos.get_repo(
    id,
    headers=None,
)

db.repos.list_repos(
    path_prefix=None,
    next_page_token=None,
    headers=None,
)

db.repos.update_repo(
    id,
    branch=None,
    tag=None,
    headers=None,
)
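
A sketch of creating a repo and checking out a branch (the URL, provider, and path are placeholders):

repo = db.repos.create_repo(
    url="https://github.com/example/project.git",
    provider="gitHub",
    path="/Repos/someone/project",
)
db.repos.update_repo(repo["id"], branch="main")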

databricks-api's People

Contributors

crflynn, maxcaudle, nfx, zschumacher


databricks-api's Issues

Was not able to use edit_cluster

The snippet:
db.cluster.edit_cluster(cluster_id, spark_version="10.3.x-scala2.12")
Could you share a sample of how to use this edit_cluster API?


HTTPError Traceback (most recent call last)
~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/api_client.py in perform_query(self, method, path, data, headers, files, version)
137 try:
--> 138 resp.raise_for_status()
139 except requests.exceptions.HTTPError as e:

~/Library/Python/3.7/lib/python/site-packages/requests/models.py in raise_for_status(self)
940 if http_error_msg:
--> 941 raise HTTPError(http_error_msg, response=self)
942

HTTPError: 400 Client Error: Bad Request for url: https://adb-xyz.azuredatabricks.net/api/2.0/clusters/edit

During handling of the above exception, another exception occurred:

HTTPError Traceback (most recent call last)
/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/3832242952.py in <module>
1 if __name__ == "__main__":
----> 2 main()

/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/3778288670.py in main()
69 unravel_spark2_cluster_configs,
70 unravel_spark3_cluster_configs,
---> 71 output_directory_path)
72

/var/folders/0q/1yts_p_s4rq_1x984fxlbv600000gp/T/ipykernel_15169/2351089658.py in configureInteractiveClustersWithUnravel(cluster_list, workspace_id2api, workspace_spark_verisons, unravel_spark2_cluster_configs, unravel_spark3_cluster_configs, output_path)
91
92 db.cluster.edit_cluster(cluster_id,
---> 93 spark_version="10.3.x-scala2.12")
94
95
~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/service.py in edit_cluster(self, cluster_id, num_workers, autoscale, cluster_name, spark_version, spark_conf, aws_attributes, node_type_id, driver_node_type_id, ssh_public_keys, custom_tags, cluster_log_conf, spark_env_vars, autotermination_minutes, enable_elastic_disk, cluster_source, instance_pool_id, headers)
365 _data['instance_pool_id'] = instance_pool_id
366 print(_data)
--> 367 return self.client.perform_query('POST', '/clusters/edit', data=_data, headers=headers)
368
369 def get_cluster(self, cluster_id, headers=None):

~/Library/Python/3.7/lib/python/site-packages/databricks_cli/sdk/api_client.py in perform_query(self, method, path, data, headers, files, version)
144 except ValueError:
145 pass
--> 146 raise requests.exceptions.HTTPError(message, response=e.response)
147 return resp.json()
148

HTTPError: 400 Client Error: Bad Request for url: https://adb-xyz.azuredatabricks.net/api/2.0/clusters/edit
Response from server:
{ 'error_code': 'INVALID_PARAMETER_VALUE',
'message': 'Missing required field: Size'}
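
The /clusters/edit endpoint expects a full cluster specification, including its size, so this error likely means num_workers (or autoscale) must be sent along with the field being changed. A hedged sketch that re-submits the existing spec from get_cluster with only the Spark version updated:

spec = db.cluster.get_cluster(cluster_id)
db.cluster.edit_cluster(
    cluster_id,
    num_workers=spec.get("num_workers"),   # the edit call needs the cluster size
    autoscale=spec.get("autoscale"),
    cluster_name=spec.get("cluster_name"),
    spark_version="10.3.x-scala2.12",      # the field being changed
    node_type_id=spec.get("node_type_id"),
)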

User creation through scim api

Hello

I forked your project and added the following to be able to create new users through the API.

https://docs.databricks.com/dev-tools/api/latest/scim/scim-users.html

def create_user(self, user_name=None, headers=None):
    _data = {}
    if user_name is not None:
        _data['schemas'] = ["urn:ietf:params:scim:schemas:core:2.0:User"]
        _data['userName'] = user_name
        _data['entitlements'] = [{'value': 'allow-cluster-create'}]		
    return self.client.perform_query('POST', '/preview/scim/v2/Users', data=_data, headers=headers)

Of course it could be improved to allow passing groups and so on. Just in case you want it.

Relax databricks-cli version restriction

Right now the package depends only on databricks-cli 0.12.x, while the latest version is 0.14.3. It would be useful to relax the version constraint so the package doesn't force the old release.

Add tasks to an existing job

Hello guys, I'm using this library as an interface for job manipulation, but I couldn't figure out how to add a new task to an existing job.
How could I do this?

Example:

There is the method

db.jobs.reset_job(
    "job_id",
    new_settings,
) 

and I need something like:

db.jobs.update_job(
    "job_id",
    new_settings,
) 
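
reset_job replaces the full job settings, so one way to add a task is to fetch the current settings, append to the tasks list, and reset. A hedged sketch, assuming the workspace uses Jobs API 2.1 (where settings contains a tasks list) and with placeholder task values:

job = db.jobs.get_job(job_id, version="2.1")
settings = job["settings"]
settings.setdefault("tasks", []).append({
    "task_key": "new_task",                      # illustrative task
    "existing_cluster_id": "0123-456789-abcde",  # placeholder cluster id
    "notebook_task": {"notebook_path": "/Users/someone/new-notebook"},
})
db.jobs.reset_job(job_id, settings, version="2.1")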

Create Cluster init_script

Hi

I like your API very much and I will use it in my CI pipeline. Unfortunately I have a problem adding my init_script to the cluster.

This is my code:

cluster_json = db.cluster.create_cluster(
   num_workers=2,
   cluster_name="az-ckw-uieb-databricks-devops_test",
   spark_version="5.5.x-scala2.11",
   spark_conf=None,
   node_type_id="Standard_DS3_v2",
   spark_env_vars={
       "PYSPARK_PYTHON": "/databricks/python3/bin/python3"
   },
   autotermination_minutes=120,
   enable_elastic_disk=True,
   init_scripts=[{'dbfs': {'destination': 'dbfs:/databricks/scripts/oracle-install.sh'}}],
)

However, when I execute it, I get this error message:

TypeError: create_cluster() got an unexpected keyword argument 'init_scripts'

Any idea?

Many Thanks
Christoph
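
Since the create_cluster wrapper in this databricks-cli version does not expose init_scripts, one hedged workaround is to call the REST endpoint directly through the underlying ApiClient, which accepts an arbitrary payload:

cluster_json = db.client.perform_query(
    "POST",
    "/clusters/create",
    data={
        "num_workers": 2,
        "cluster_name": "az-ckw-uieb-databricks-devops_test",
        "spark_version": "5.5.x-scala2.11",
        "node_type_id": "Standard_DS3_v2",
        "spark_env_vars": {"PYSPARK_PYTHON": "/databricks/python3/bin/python3"},
        "autotermination_minutes": 120,
        "enable_elastic_disk": True,
        "init_scripts": [
            {"dbfs": {"destination": "dbfs:/databricks/scripts/oracle-install.sh"}}
        ],
    },
)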

Cannot use DatabricksAPI.workspace

According to the documentation I think I should be able to run from databricks_api import DatabricksAPI and then access DatabricksAPI.workspace, but this doesn't seem to be working. My main goal is to check whether a directory structure exists and, if not, create one.

In [1]: from databricks_api import DatabricksAPI

In [2]: DatabricksAPI.workspace.list()
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-2-09fa92501b4f> in <module>
----> 1 DatabricksAPI.workspace.list()

AttributeError: type object 'DatabricksAPI' has no attribute 'workspace'

In [3]: DatabricksAPI.workspaces
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-3-12099cc2227b> in <module>
----> 1 DatabricksAPI.workspaces

AttributeError: type object 'DatabricksAPI' has no attribute 'workspaces'

It looks like the DatabricksAPI class doesn't have any publicly accessible attributes:

DatabricksAPI.__dict__

mappingproxy({'__module__': 'databricks_api.databricks',
              '__init__': <function databricks_api.databricks.DatabricksAPI.__init__(self, **kwargs)>,
              '__dict__': <attribute '__dict__' of 'DatabricksAPI' objects>,
              '__weakref__': <attribute '__weakref__' of 'DatabricksAPI' objects>,
              '__doc__': None})
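
The service attributes are set inside __init__, so they exist only on an instance, not on the class itself. A minimal sketch (host and token are placeholders); mkdirs also creates missing parent directories, which covers the "create if absent" case:

from databricks_api import DatabricksAPI

db = DatabricksAPI(host="example.cloud.databricks.com", token="dpapi123...")

db.workspace.mkdirs("/Users/someone/reports")   # succeeds if it already exists
print(db.workspace.list("/Users/someone"))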

New git_source parameter not in API

The JobsCreate endpoint now has a git_source parameter in the 2.1 API. When I try to use this however, I get the following error:

TypeError: create_job() got an unexpected keyword argument 'git_source'

I'm assuming this means the parameters need to be rescanned in a new update of the package; is there a plan to do this any time soon?
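
Until the wrappers are regenerated against the 2.1 spec, a hedged workaround is to call the 2.1 endpoint through the underlying client (perform_query accepts data and version arguments; the payload below is illustrative):

db.client.perform_query(
    "POST",
    "/jobs/create",
    data={
        "name": "example-job",
        "git_source": {
            "git_url": "https://github.com/example/project.git",
            "git_provider": "gitHub",
            "git_branch": "main",
        },
        "tasks": [{
            "task_key": "main",
            "notebook_task": {"notebook_path": "notebooks/run", "source": "GIT"},
            "new_cluster": {
                "spark_version": "10.4.x-scala2.12",
                "node_type_id": "i3.xlarge",
                "num_workers": 1,
            },
        }],
    },
    version="2.1",
)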

Usage of urllib3 is outdated specific to method_whitelist

Specific traceback:

    method_whitelist=set({'POST'}) | set(Retry.DEFAULT_METHOD_WHITELIST),
AttributeError: type object 'Retry' has no attribute 'DEFAULT_METHOD_WHITELIST'

urllib3 was updated to use neutral language, which affected method_whitelist. Specifically, Retry.DEFAULT_METHOD_WHITELIST was renamed to Retry.DEFAULT_ALLOWED_METHODS.

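A hedged compatibility sketch that builds the Retry object under either urllib3 naming (older method_whitelist vs. newer allowed_methods):

from urllib3.util.retry import Retry

methods = {"POST"} | set(
    getattr(Retry, "DEFAULT_ALLOWED_METHODS",
            getattr(Retry, "DEFAULT_METHOD_WHITELIST", frozenset()))
)
if hasattr(Retry.DEFAULT, "allowed_methods"):   # urllib3 >= 1.26
    retry = Retry(total=3, backoff_factor=1, allowed_methods=methods)
else:                                           # older urllib3
    retry = Retry(total=3, backoff_factor=1, method_whitelist=methods)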

Add Support for Azure and GCP

Currently methods db.cluster.create_cluster(), db.cluster.edit_cluster(), db.instance_pool.create_instance_pool(), and db.instance_pool.edit_instance_pool() only support API calls to AWS-based databricks workspaces. I'd recommend adding azure_attributes and gcp_attributes as parameters to these functions to support API calls on all platforms.

edit: opened an issue on databricks-cli here
