databricks / databricks-sdk-py
Databricks SDK for Python (Beta)
Home Page: https://databricks-sdk-py.readthedocs.io/
License: Apache License 2.0
Setting the direct_download flag to True in the API returns an unencoded version of the notebook, but today it throws a non-descriptive error.
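For reference, a minimal repro sketch of the call in question (the notebook path is illustrative; the direct_download parameter appears in the export signature quoted later on this page):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
# Should return the raw, unencoded notebook source, but currently fails
# with a non-descriptive error.
resp = w.workspace.export('/Users/someone@example.com/my_notebook', direct_download=True)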
There is no way to set data_security_mode via WorkspaceClient.clusters.create; I had to fall back to ClusterApi.
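A sketch of that kind of fallback, assuming the generic API client is acceptable here (field names follow the Clusters REST API; values are illustrative):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.api_client.do('POST', '/api/2.0/clusters/create', body={
    'cluster_name': 'secured-cluster',
    'spark_version': '12.2.x-scala2.12',
    'node_type_id': 'i3.xlarge',
    'num_workers': 1,
    'data_security_mode': 'SINGLE_USER',  # the field clusters.create does not expose
})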
Hello,
I am looking for the simplest way to automate submitting a large number of jobs. I see that databricks_cli (https://docs.databricks.com/dev-tools/cli/index.html) does not seem to be in active development, so I am looking at this project as a solution for automation.
I would like to be able to run this SDK from within my Databricks workspace, but when I try the example (https://docs.databricks.com/dev-tools/sdk-python.html#get-started-with-the-databricks-sdk-for-python), I get the error message: default auth: cannot configure default credentials.
I assume that is because databricks-sdk-py is unaware that I am running from within my Databricks workspace. My question is: is there a way to make the module recognize that and assume my credentials?
Here is the code:

from databricks.sdk import AccountClient

account = AccountClient(host="https://accounts.cloud.databricks.com", account_id=db_account_id, username=db_username, password=db_password)
account.networks.list()

I used the e2-certification account.
ValueError Traceback (most recent call last)
in <cell line: 8>()
6 verifier = Verifier(db_account_id, db_username, db_password, aws_key, aws_secret)
7
----> 8 verifier.run_private_link_check("3324600082051037")
in run_private_link_check(self, workspace_id)
80
81 def run_private_link_check(self, workspace_id:str):
---> 82 db_network_id = self.get_db_network_id(workspace_id)
83 vpc_id = self.get_vpc_id(workspace_id)
84 vpc_endpoints_ids = self.get_vpc_endpoint_ids(network_id)
in get_db_network_id(self, workspace_id)
24 def get_db_network_id(self, workspace_id:str):
25 #get network id.
---> 26 for network in self.account.networks.list():
27 if network.workspace_id == self.workspace_id:
28 return network.network_id
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/deployment.py in list(self)
1292
1293 json = self._api.do('GET', f'/api/2.0/accounts/{self._api.account_id}/networks')
-> 1294 return [Network.from_dict(v) for v in json]
1295
1296
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/deployment.py in <listcomp>(.0)
1292
1293 json = self._api.do('GET', f'/api/2.0/accounts/{self._api.account_id}/networks')
-> 1294 return [Network.from_dict(v) for v in json]
1295
1296
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/deployment.py in from_dict(cls, d)
653 vpc_id=d.get('vpc_id', None),
654 vpc_status=VpcStatus(d['vpc_status']) if 'vpc_status' in d else None,
--> 655 warning_messages=[NetworkWarning.from_dict(v)
656 for v in d['warning_messages']] if 'warning_messages' in d else None,
657 workspace_id=d.get('workspace_id', None))
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/deployment.py in <listcomp>(.0)
653 vpc_id=d.get('vpc_id', None),
654 vpc_status=VpcStatus(d['vpc_status']) if 'vpc_status' in d else None,
--> 655 warning_messages=[NetworkWarning.from_dict(v)
656 for v in d['warning_messages']] if 'warning_messages' in d else None,
657 workspace_id=d.get('workspace_id', None))
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/deployment.py in from_dict(cls, d)
710 def from_dict(cls, d: Dict[str, any]) -> 'NetworkWarning':
711 return cls(warning_message=d.get('warning_message', None),
--> 712 warning_type=WarningType(d['warning_type']) if 'warning_type' in d else None)
713
714
/usr/lib/python3.9/enum.py in __call__(cls, value, names, module, qualname, type, start)
358 """
359 if names is None: # simple value lookup
--> 360 return cls.__new__(cls, value)
361 # otherwise, functional API: we're creating a new Enum type
362 return cls._create_(
/usr/lib/python3.9/enum.py in __new__(cls, value)
676 ve_exc = ValueError("%r is not a valid %s" % (value, cls.__qualname__))
677 if result is None and exc is None:
--> 678 raise ve_exc
679 elif exc is None:
680 exc = TypeError(
ValueError: 'vpc' is not a valid WarningType
Errors received from trying to list tables in the workspace
import os
from databricks.sdk import WorkspaceClient

if __name__ == "__main__":
    w = WorkspaceClient(
        host="https://2111501043581247.7.gcp.databricks.com/",
        token=os.environ['PAT_JAPAN'])
    w.jobs.delete()
    for t in w.tables.list():
        print(t)
Traceback (most recent call last):
File "/Users/gant.kuln/test_databricks_py/helloworld.py", line 10, in <module>
for k in w.tables.list():
File "/Users/gant.kuln/miniconda3/envs/test_databricks_py/lib/python3.10/site-packages/databricks/sdk/service/unitycatalog.py", line 3081, in list
json = self._api.do('GET', '/api/2.1/unity-catalog/tables', query=query)
File "/Users/gant.kuln/miniconda3/envs/test_databricks_py/lib/python3.10/site-packages/databricks/sdk/client.py", line 416, in do
raise DatabricksError(**response.json())
TypeError: DatabricksError.__init__() got an unexpected keyword argument 'details'
Getting a TypeError: Object of type Library is not JSON serializable when calling JobsAPI.create with a Library object set in a JobTask. Instead the line should be:
if self.libraries: body['libraries'] = [v.as_dict() for v in self.libraries]
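For context, the rule the fix applies: json.dumps cannot serialize dataclass instances, so nested objects must be converted to dicts first. A self-contained illustration (the Library class below is a stand-in, not the SDK's actual dataclass):

import json
from dataclasses import dataclass

@dataclass
class Library:  # stand-in for the SDK's Library dataclass
    whl: str = None

    def as_dict(self) -> dict:
        return {'whl': self.whl}

libs = [Library(whl='dbfs:/libs/app.whl')]
# json.dumps(libs) raises: TypeError: Object of type Library is not JSON serializable
print(json.dumps([v.as_dict() for v in libs]))  # works: [{"whl": "dbfs:/libs/app.whl"}]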
There is a hidden behaviour in requests where a .netrc file will silently override provided authentication headers unless manually overridden.
We first observed this in dbt-databricks because of databricks/dbt-databricks#337. Fixing it however required two changes: one to dbt-databricks and one to databricks-sdk-py. The fix is simple: override the default behaviour of requests by supplying a custom AuthBase class.
dbt-databricks uses requests in two places: (1) against a REST API (instead of the thrift server) to check if the all-purpose cluster is running, and (2) to perform the OAuth handshake; this second step is only needed where OAuth is used to authenticate dbt-databricks. There is probably an easier way to reproduce this with a short script, but I'm using what I found while developing dbt-databricks.
Check out this code from dbt-databricks: databricks/dbt-databricks#338. This branch incorporates the first fix I described above.
Add an intentionally bad ~/.netrc to your workstation, like this:
machine <my-workspace>.cloud.databricks.com
login token
password <expired_token>
Run the test_python_uc_sql_endpoint integration test after updating _build_databricks_cluster_target in tests/profiles.py to comment out the "token" key. This forces dbt-databricks to use OAuth instead. The test fails with:
E ValueError: b'{"errorCode":"invalid_client","errorSummary":"Invalid value for \'client_id\' parameter.","errorLink":"invalid_client","errorId":"oaeLJQz1r35SrSNVtVcJUig0A","errorCauses":[]}'
This happens because, without the override applied, the SDK includes authentication headers on a REST API request that doesn't require authentication, and the server kicks back an "invalid value for 'client_id'" error.
I'm about to open a PR that fixes this.
There is a related issue on databricks-sql-python (which implements its own oauth process). That fix is the same as this one.
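For illustration, a minimal sketch of the AuthBase approach (the class name is hypothetical): supplying any auth object to requests prevents its .netrc lookup from replacing the Authorization header.

import requests
from requests.auth import AuthBase

class TokenAuth(AuthBase):  # hypothetical helper, not the SDK's actual class
    def __init__(self, token: str):
        self._token = token

    def __call__(self, r: requests.PreparedRequest) -> requests.PreparedRequest:
        # Because an auth callable is present, requests skips its .netrc lookup
        # and this header survives untouched.
        r.headers['Authorization'] = f'Bearer {self._token}'
        return r

session = requests.Session()
session.get('https://<my-workspace>.cloud.databricks.com/api/2.0/clusters/list',
            auth=TokenAuth('<token>'))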
Right now it is quite easy to use OAuthClient. It encapsulates Azure vs AWS differences and it returns a credentials_provider, which is nice. However, ClientCredentials is very crude and does none of the above.
One proposal [preferred] would be to encapsulate ClientCredentials within OAuthClient and have a single client to worry about.
An alternative would be to have a ClientCredentialsClient that provides a cloud-agnostic API and returns a credentials_provider.
I mistakenly supplied "https://accounts.gcp.databricks.com/workspaces/2111501043581247" as the host URL for WorkspaceClient, but it wasn't obvious from the error message what was wrong. It would be great if the SDK could detect this and provide an actionable error message.
Hello,
I believe the current method to create a cluster in the SDK is missing init_scripts (which is part of the Create new cluster API). I am currently using a custom class which adds the init_scripts part to the body of the request.
Is adding init_scripts during cluster creation through the SDK planned?
Thank you.
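For reference, a sketch of the kind of workaround described above, posting to the REST endpoint directly so init_scripts can ride along (values are illustrative):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
w.api_client.do('POST', '/api/2.0/clusters/create', body={
    'cluster_name': 'with-init-scripts',
    'spark_version': '12.2.x-scala2.12',
    'node_type_id': 'Standard_DS3_v2',
    'num_workers': 1,
    'init_scripts': [{'dbfs': {'destination': 'dbfs:/init/setup.sh'}}],
})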
Hello!
I'm not sure if I'm doing something wrong, but I'm getting an error while using the groups' list operation from WorkspaceClient:
03/05/2023 04:00:10 PM :::DEBUG:::: Loaded from environment
03/05/2023 04:00:10 PM :::DEBUG:::: Attempting to configure auth: pat
03/05/2023 04:00:10 PM :::DEBUG:::: Starting new HTTPS connection (1): xxxxxxxxxxxx.cloud.databricks.com:443
03/05/2023 04:00:12 PM :::DEBUG:::: https://xxxxxxxxxxxx.cloud.databricks.com:443 "GET /api/2.0/preview/scim/v2/Groups?filter=displayName%2Beq%2Bmy_group HTTP/1.1" 400 None
03/05/2023 04:00:12 PM :::DEBUG:::: GET /api/2.0/preview/scim/v2/Groups?filter=displayName+eq+my_group
< 400 Bad Request
< {
< "detail": "Given filter operator is not supported.",
< "schemas": [
< "urn:ietf:params:scim:api:messages:2.0:Error"
< ],
< "scimType": "InvalidFilter",
< "status": "400"
< }
Traceback (most recent call last):
File "<redacted>/main.py", line 6, in <module>
databricks_groups = databricks_workspace.groups.list(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/databricks/sdk/service/iam.py", line 1291, in list
json = self._api.do('GET', '/api/2.0/preview/scim/v2/Groups', query=query)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/homebrew/lib/python3.11/site-packages/databricks/sdk/core.py", line 753, in do
raise self._make_nicer_error(status_code=response.status_code, **payload) from None
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: ApiClient._make_nicer_error() missing 1 required positional argument: 'message'
If I try the exact same filter using cURL, for example, everything works fine.
This is the main.py:
from databricks.sdk import WorkspaceClient
from modules.config.log import logger

databricks_workspace = WorkspaceClient()
databricks_groups = databricks_workspace.groups.list(
    filter="displayName+eq+my_group"
)
I'm using SDK version 0.1.2 and Python version 3.11.3.
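Side note on the debug log above: the SDK percent-encodes the literal + characters (filter=displayName%2Beq%2Bmy_group), so the server receives them verbatim rather than as spaces. Assuming the endpoint accepts a standard SCIM filter, a plain-space, quoted form may avoid the 400 (the TypeError in _make_nicer_error is a separate SDK bug either way):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
# Spaces are encoded as %20 and decoded back to spaces server-side.
groups = w.groups.list(filter='displayName eq "my_group"')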
This package currently depends on requests<2.29.0,>=2.28.1, which is a very tight range and unfortunately has published vulnerabilities. Please widen the allowed range (I suggest requests<3,>=2.28.1 if forwards compatibility is important) so that this package does not force insecure dependencies on its consumers.
Repro steps:
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(host = "https://db-sme-demo-docs.cloud.databricks.com", token = "REDACTED")
for c in w.clusters.list():
    print(c.cluster_name)
Expected:
Actual:
Traceback (most recent call last):
File "/Users/paul.cornell/databricks-python-sdk-demo/main.py", line 5, in <module>
for c in w.clusters.list():
File "/Users/paul.cornell/.local/share/virtualenvs/paul.cornell-Otax6dmi/lib/python3.9/site-packages/databricks/sdk/service/clusters.py", line 1908, in list
return [ClusterInfo.from_dict(v) for v in json['clusters']]
File "/Users/paul.cornell/.local/share/virtualenvs/paul.cornell-Otax6dmi/lib/python3.9/site-packages/databricks/sdk/service/clusters.py", line 1908, in <listcomp>
return [ClusterInfo.from_dict(v) for v in json['clusters']]
File "/Users/paul.cornell/.local/share/virtualenvs/paul.cornell-Otax6dmi/lib/python3.9/site-packages/databricks/sdk/service/clusters.py", line 410, in from_dict
data_security_mode=DataSecurityMode(d['data_security_mode'])
File "/usr/local/Cellar/[email protected]/3.9.15/Frameworks/Python.framework/Versions/3.9/lib/python3.9/enum.py", line 384, in __call__
return cls.__new__(cls, value)
File "/usr/local/Cellar/[email protected]/3.9.15/Frameworks/Python.framework/Versions/3.9/lib/python3.9/enum.py", line 702, in __new__
raise ve_exc
ValueError: 'LEGACY_SINGLE_USER_STANDARD' is not a valid DataSecurityMode
I see there are methods in w.grants, but they're not documented in the examples path of the repo.
It would also be nice to add a README.md in the examples folder (or add it to the root README) stating that the structure of the SDK mirrors the URLs of the REST API explorer. That would make it easier to know and navigate the complete functionality of the SDK.
The examples and the README are not helpful for using the SDK when it comes to the Query History APIs.
The documentation and user guide need to be improved; otherwise this is not useful, and I have to use the Python requests package and do everything myself from scratch.
WorkspaceClient.storage_credentials.create() requires a metastore ID which the API is not expecting. It throws the error 'CreateStorageCredential metastore_id can not be provided.' when it is provided, and 'StorageCredentialsAPI.create() missing 1 required positional argument: 'metastore_id'' when it is not.
Import error when using auth_type="oauth-m2m"
When running this python statement:
a = AccountClient(auth_type="oauth-m2m", profile="E2CERTACCT")
I found the error below while running in the Python debugger.
File "/Users/xxxxxx.xxxxxxxx/.pyenv/versions/3.8.12/lib/python3.8/site-packages/databricks/sdk/core.py", line 23, in <module>
from .azure import ARM_DATABRICKS_RESOURCE_ID, ENVIRONMENTS, AzureEnvironment
ImportError: attempted relative import with no known parent package
log:
2023-06-09 22:53:01,090 [databricks.sdk][INFO] loading E2CERTACCT profile from ~/.databrickscfg: host, account_id, client_id, client_secret
2023-06-09 22:53:01,090 [databricks.sdk][DEBUG] Ignoring pat auth, because oauth-m2m is preferred
2023-06-09 22:53:01,090 [databricks.sdk][DEBUG] Ignoring basic auth, because oauth-m2m is preferred
2023-06-09 22:53:01,090 [databricks.sdk][DEBUG] Ignoring metadata-service auth, because oauth-m2m is preferred
2023-06-09 22:53:01,090 [databricks.sdk][DEBUG] Attempting to configure auth: oauth-m2m
Traceback (most recent call last):
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/requests/models.py", line 971, in json
return complexjson.loads(self.text, **kwargs)
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/json/__init__.py", line 357, in loads
return _default_decoder.decode(s)
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/databricks/sdk/core.py", line 415, in __call__
header_factory = provider(cfg)
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/databricks/sdk/core.py", line 60, in wrapper
return func(cfg)
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/databricks/sdk/core.py", line 114, in oauth_service_principal
token_url=resp.json()["token_endpoint"],
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/requests/models.py", line 975, in json
raise RequestsJSONDecodeError(e.msg, e.doc, e.pos)
requests.exceptions.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/databricks/sdk/core.py", line 775, in _init_auth
self._header_factory = self._credentials_provider(self)
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/databricks/sdk/core.py", line 421, in __call__
raise ValueError(f'{auth_type}: {e}') from e
ValueError: oauth-m2m: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/databricks/sdk/core.py", line 496, in __init__
self._init_auth()
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/databricks/sdk/core.py", line 780, in _init_auth
raise ValueError(f'{self._credentials_provider.auth_type()} auth: {e}') from e
ValueError: default auth: oauth-m2m: Expecting value: line 1 column 1 (char 0)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/douglas.moore/development/dba-helper/permissions-graph/extract_account_principals.py", line 44, in <module>
a = AccountClient(auth_type="oauth-m2m", profile="E2CERTACCT", debug_headers=True, debug_truncate_bytes=300)
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/databricks/sdk/__init__.py", line 192, in __init__
config = client.Config(host=host,
File "/Users/douglas.moore/.pyenv/versions/3.8.12/lib/python3.8/site-packages/databricks/sdk/core.py", line 501, in __init__
raise ValueError(message) from e
ValueError: default auth: oauth-m2m: Expecting value: line 1 column 1 (char 0). Config: host=https://accounts.cloud.databricks.com, account_id=deadbeef-deadbeef-deadbeef, client_id=dead999-1234-4321-9999-deadbeef, client_secret=***, profile=E2CERTACCT, auth_type=oauth-m2m, debug_truncate_bytes=300, debug_headers=True
The SCIM Groups API returns a value under meta.resourceType which can be Group (an account-level group) or WorkspaceGroup (a workspace-local group).
Currently, the SDK does not capture this piece of information:
@dataclass
class Group:
    display_name: str = None
    entitlements: 'List[ComplexValue]' = None
    external_id: str = None
    groups: 'List[ComplexValue]' = None
    id: str = None
    members: 'List[ComplexValue]' = None
    roles: 'List[ComplexValue]' = None
Note that the value is present when accessing the API directly.
I propose we add the meta.resourceType field to the Group data class. This functionality is currently leveraged by the UC-Migration project.
Note that I'm happy to make an attempt at contributing this functionality; I don't see any contribution guides for this project.
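A minimal sketch of what the proposal could look like, following the from_dict pattern of the surrounding dataclasses (names are illustrative, not the SDK's actual API):

from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class ResourceMeta:
    # 'Group' for account-level groups, 'WorkspaceGroup' for workspace-local ones.
    resource_type: Optional[str] = None

    @classmethod
    def from_dict(cls, d: Dict[str, any]) -> 'ResourceMeta':
        return cls(resource_type=d.get('resourceType', None))

# Group would then gain a field like: meta: 'ResourceMeta' = None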
I believe the issue is on the API side, but
workspace_client.metastores.delete(id=metastoreId, force=True)
will not force-delete the metastore.
I am having to send force as a URL parameter:
workspace_client.metastores.delete(id=metastoreId + '?force=true', force=True)
which adds the parameter force=true to the URL instead of the message body.
Hello,
When using workspace_conf.get_status() in the WorkspaceClient, the method does not work and returns an AttributeError: type object 'dict' has no attribute 'from_dict'. I believe WorkspaceConf is just a dictionary and thus it raises the AttributeError. Could you please verify and confirm whether this is correct?
e.g. workspace_client.workspace_conf.get_status(keys="enableTokensConfig") does not work.
Thank you.
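A possible workaround sketch while the typed wrapper is broken, calling the workspace configuration endpoint directly (it returns a plain JSON object):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
conf = w.api_client.do('GET', '/api/2.0/workspace-conf', query={'keys': 'enableTokensConfig'})
print(conf)  # e.g. {'enableTokensConfig': 'true'}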
Right now two steps are needed to make the SDK work from notebooks:
So we don't need to solve #1 right away, but maybe we can just bake the logic from #2 into the SDK itself, just like e.g. mlflow works out of the box without passing a token around manually? It would be nice if you could just type w = WorkspaceClient() in a notebook!
Although it's not documented, the query history list endpoint seems to handle filter_by properly only if it is passed in the request body rather than as a query param.
Ran using databricks-sdk v0.1.5, requests v2.28.2:
In [2]: from databricks.sdk import WorkspaceClient
In [3]: client = WorkspaceClient(host=..., token=...)
In [4]: from databricks.sdk.service.sql import QueryFilter
In [6]: filter_by = QueryFilter.from_dict(
...: {
...: "query_start_time_range": {
...: "start_time_ms": 0,
...: "end_time_ms": int(time.time() * 1000),
...: }
...: }
...: )
In [7]: filter_by
Out[7]: QueryFilter(query_start_time_range=TimeRange(end_time_ms=1683569801679, start_time_ms=0), statuses=None, user_ids=None, warehouse_ids=None)
In [9]: next(client.query_history.list(filter_by=filter_by))
---------------------------------------------------------------------------
DatabricksError Traceback (most recent call last)
Cell In[9], line 1
----> 1 next(client.query_history.list(filter_by=filter_by))
File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/databricks/sdk/service/sql.py:2824, in QueryHistoryAPI.list(self, filter_by, include_metrics, max_results, page_token, **kwargs)
2821 if page_token: query['page_token'] = request.page_token
2823 while True:
-> 2824 json = self._api.do('GET', '/api/2.0/sql/history/queries', query=query)
2825 if 'res' not in json or not json['res']:
2826 return
File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/databricks/sdk/core.py:753, in ApiClient.do(self, method, path, query, body)
749 if not response.ok:
750 # TODO: experiment with traceback pruning for better readability
751 # See https://stackoverflow.com/a/58821552/277035
752 payload = response.json()
--> 753 raise self._make_nicer_error(status_code=response.status_code, **payload) from None
754 if not len(response.content):
755 return {}
DatabricksError: Could not parse request object: Expected 'START_OBJECT' not 'VALUE_STRING'
at [Source: (ByteArrayInputStream); line: 1, column: 15]
at [Source: java.io.ByteArrayInputStream@794544b2; line: 1, column: 15]
but if I call:
In [53]: "res" in client.query_history._api.do('GET', '/api/2.0/sql/history/queries', body={"filter_by": filter_by.as_dict()})
Out[53]: True
the API works as expected.
Second, when using pagination with the query history endpoint, it doesn't seem to allow specifying page_token and filter_by at the same time:
In [55]: client.query_history._api.do('GET', '/api/2.0/sql/history/queries', body={"filter_by": filter_by.as_dict(), "page_token": "abc"})
---------------------------------------------------------------------------
DatabricksError Traceback (most recent call last)
Cell In[55], line 1
----> 1 client.query_history._api.do('GET', '/api/2.0/sql/history/queries', body={"filter_by": filter_by.as_dict(), "page_token": "abc"})
File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/databricks/sdk/core.py:753, in ApiClient.do(self, method, path, query, body)
749 if not response.ok:
750 # TODO: experiment with traceback pruning for better readability
751 # See https://stackoverflow.com/a/58821552/277035
752 payload = response.json()
--> 753 raise self._make_nicer_error(status_code=response.status_code, **payload) from None
754 if not len(response.content):
755 return {}
DatabricksError: You can provide only one of 'page_token' or 'filter_by'
The current implementation doesn't remove filter_by on subsequent calls and thus doesn't paginate correctly when filter_by is passed in. Here, I patch the do call to pass query params as body, but this time we fail when we attempt to get the second page.
In [56]: original_do = client.api_client.do
In [57]: from unittest.mock import patch
In [62]: def patch_do(method, path, query = None, body = None):
...: print(method,path,query,body)
...: res = original_do(method, path, query = None, body=query)
...: print("RES", res.keys())
...: return res
...:
In [64]: with patch.object(ApiClient, "do") as mock_do:
...: mock_do.side_effect = patch_do
...: it = client.query_history.list(filter_by=filter_by, max_results=1)
...: next(it)
...: next(it)
...:
GET /api/2.0/sql/history/queries {'filter_by': {'query_start_time_range': {'end_time_ms': 1683569801679}}, 'max_results': 1} None
RES dict_keys(['next_page_token', 'has_next_page', 'res'])
GET /api/2.0/sql/history/queries {'filter_by': {'query_start_time_range': {'end_time_ms': 1683569801679}}, 'max_results': 1, 'page_token': 'CkwKJDAxZWRlZGFjLWJjNDktMTgyOS1hM2UwLTYwNDYwZTVkNjU4MBCD2ZHe/zAY09WStKW5tAUiEDcyNTJjYmE1NTlmNDhkZjQo+JgQEgkSBxDP89Hk/zAYAQ=='} None
---------------------------------------------------------------------------
DatabricksError Traceback (most recent call last)
Cell In[64], line 5
3 it = client.query_history.list(filter_by=filter_by, max_results=1)
4 next(it)
----> 5 next(it)
File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/databricks/sdk/service/sql.py:2824, in QueryHistoryAPI.list(self, filter_by, include_metrics, max_results, page_token, **kwargs)
2821 if page_token: query['page_token'] = request.page_token
2823 while True:
-> 2824 json = self._api.do('GET', '/api/2.0/sql/history/queries', query=query)
2825 if 'res' not in json or not json['res']:
2826 return
File ~/.pyenv/versions/3.10.9/lib/python3.10/unittest/mock.py:1114, in CallableMixin.__call__(self, *args, **kwargs)
1112 self._mock_check_sig(*args, **kwargs)
1113 self._increment_mock_call(*args, **kwargs)
-> 1114 return self._mock_call(*args, **kwargs)
File ~/.pyenv/versions/3.10.9/lib/python3.10/unittest/mock.py:1118, in CallableMixin._mock_call(self, *args, **kwargs)
1117 def _mock_call(self, /, *args, **kwargs):
-> 1118 return self._execute_mock_call(*args, **kwargs)
File ~/.pyenv/versions/3.10.9/lib/python3.10/unittest/mock.py:1179, in CallableMixin._execute_mock_call(self, *args, **kwargs)
1177 raise result
1178 else:
-> 1179 result = effect(*args, **kwargs)
1181 if result is not DEFAULT:
1182 return result
Cell In[62], line 3, in patch_do(method, path, query, body)
1 def patch_do(method, path, query = None, body = None):
2 print(method,path,query,body)
----> 3 res = original_do(method, path, query = None, body=query)
4 print("RES", res.keys())
5 return res
File ~/.pyenv/versions/3.10.9/lib/python3.10/site-packages/databricks/sdk/core.py:753, in ApiClient.do(self, method, path, query, body)
749 if not response.ok:
750 # TODO: experiment with traceback pruning for better readability
751 # See https://stackoverflow.com/a/58821552/277035
752 payload = response.json()
--> 753 raise self._make_nicer_error(status_code=response.status_code, **payload) from None
754 if not len(response.content):
755 return {}
DatabricksError: You can provide only one of 'page_token' or 'filter_by'
To summarize (sorry for the long post!), if my assumptions about the query history API are correct, then: filter_by should be passed in the request body rather than as a query param, and the filter_by param should be removed when querying with page_token.
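A sketch of a paginator consistent with that summary, assuming the endpoint accepts filter_by only in the request body and rejects it alongside page_token (api stands for the SDK's ApiClient):

def list_query_history(api, filter_by: dict, max_results: int = 100):
    # First page: send the filter in the body, as the experiments above suggest.
    body = {'filter_by': filter_by, 'max_results': max_results}
    while True:
        json = api.do('GET', '/api/2.0/sql/history/queries', body=body)
        for q in json.get('res', []):
            yield q
        token = json.get('next_page_token')
        if not token:
            return
        # Subsequent pages: page_token only; the API rejects filter_by here.
        body = {'page_token': token, 'max_results': max_results}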
When creating a single-node job cluster, an error is thrown if num_workers is set to 0. However, this is the same setting shown when inspecting the JSON structure of a single-node job cluster.
Error message:
databricks.sdk.core.DatabricksError: Cluster validation error: Missing required field: settings.cluster_spec.new_cluster.size
Cluster configuration:
{
    "job_cluster_key": "ascend_ingest",
    "new_cluster": {
        "spark_version": "12.2.x-scala2.12",
        "spark_conf": {
            "spark.databricks.delta.preview.enabled": "true",
            "spark.master": "local[*, 4]",
            "spark.databricks.cluster.profile": "singleNode",
        },
        "azure_attributes": {
            "first_on_demand": 1,
            "availability": "ON_DEMAND_AZURE",
            "spot_bid_max_price": -1,
        },
        "node_type_id": "Standard_DS3_v2",
        "custom_tags": {"ResourceClass": "SingleNode"},
        "spark_env_vars": {"PYSPARK_PYTHON": "/databricks/python3/bin/python3"},
        "enable_elastic_disk": True,
        "data_security_mode": "SINGLE_USER",
        "runtime_engine": "STANDARD",
        # Seems there's a bug that won't allow for single node clusters
        "num_workers": 0,
    },
}
Note that changing num_workers to 1 resolves the issue.
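Given the other serialization fixes reported on this page (e.g. the if self.libraries: line above), a plausible cause is a truthiness check in the generated as_dict dropping the zero, so the API never receives the field. This is an assumption, not a confirmed diagnosis:

# Hypothetical generated check: 0 is falsy, so num_workers vanishes from the body.
if self.num_workers: body['num_workers'] = self.num_workers
# A None-safe check would preserve an explicit 0:
if self.num_workers is not None: body['num_workers'] = self.num_workers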
When listing files in a workspace folder that contains MLflow experiments (files with a glass-bottle icon), an exception is thrown.
How to reproduce:
w.workspace.list('/')
Exception thrown:
ValueError Traceback (most recent call last)
<command-3538949529974781> in <cell line: 1>()
----> 1 w.workspace.list('/')
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/workspace.py in list(self, path, notebooks_modified_after, **kwargs)
283
284 json = self._api.do('GET', '/api/2.0/workspace/list', query=query)
--> 285 return [ObjectInfo.from_dict(v) for v in json['objects']]
286
287 def mkdirs(self, path: str, **kwargs):
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/workspace.py in <listcomp>(.0)
283
284 json = self._api.do('GET', '/api/2.0/workspace/list', query=query)
--> 285 return [ObjectInfo.from_dict(v) for v in json['objects']]
286
287 def mkdirs(self, path: str, **kwargs):
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/workspace.py in from_dict(cls, d)
166 modified_at=d.get('modified_at', None),
167 object_id=d.get('object_id', None),
--> 168 object_type=ObjectType(d['object_type']) if 'object_type' in d else None,
169 path=d.get('path', None),
170 size=d.get('size', None))
/usr/lib/python3.9/enum.py in __call__(cls, value, names, module, qualname, type, start)
358 """
359 if names is None: # simple value lookup
--> 360 return cls.__new__(cls, value)
361 # otherwise, functional API: we're creating a new Enum type
362 return cls._create_(
/usr/lib/python3.9/enum.py in __new__(cls, value)
676 ve_exc = ValueError("%r is not a valid %s" % (value, cls.__qualname__))
677 if result is None and exc is None:
--> 678 raise ve_exc
679 elif exc is None:
680 exc = TypeError(
ValueError: 'MLFLOW_EXPERIMENT' is not a valid ObjectType
While this library is easy to use, I found copying files ranging from 200 KB to 5 MB from local laptop storage to DBFS to be much slower with this library than with the requests library or curl.
curl took about 15.3 seconds for a 5 MB file; this library hadn't finished the same 5 MB file after 20 minutes.
The azure_use_msi option is documented in the README, but marked as not implemented. Adding support for managed identity could simplify deployment of Python-based applications.
Please see the process below, where the service_principals.create function fails to add the specified group. Am I using the wrong datatype?
dbw=WorkspaceClient(...)
groups = {group.display_name:group for group in dbw.groups.list()}
print(groups)
{
    'users': Group(id='...', display_name='users', ...),
    'service_principals': Group(id='...', display_name='service_principals', ...),
    'admins': Group(id='...', display_name='admins', ...)
}
sp = dbw.service_principals.create(
    id=secret_client.get_secret("...").value,
    application_id=secret_client.get_secret("...").value,
    display_name="...",
    groups=[
        groups["service_principals"]  # this seems to be ignored and the SP gets added to users instead
    ]
)
print(sp)
ServicePrincipal(
    id='...',
    active=True,
    application_id='...',
    display_name='...',
    entitlements=None,
    external_id=None,
    groups=None,  # <-------------------- ???
    roles=None
)
The expected value for groups is:
groups=[ComplexValue(display='service_principals', primary=None, type='direct', value='...')]
Many thanks.
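For what it's worth, the expected shape above suggests the endpoint wants ComplexValue membership references rather than full Group objects. A hedged sketch (the iam module path and field usage may differ across SDK versions):

from databricks.sdk.service.iam import ComplexValue

sp = dbw.service_principals.create(
    display_name="...",
    groups=[ComplexValue(value=groups["service_principals"].id)],
)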
Looking through the code, I noticed you're using a call to /api/2.0/secrets/get to retrieve secrets remotely. That API endpoint doesn't seem to be documented in the Secrets API 2.0 reference. Could it please be added?
Hello! I'm wondering why the import_ method has a trailing underscore, but export doesn't - was that intentional?
Tried to run the following test:

from databricks.sdk import JobsAPI
from databricks.sdk.core import ApiClient
from databricks.sdk.service.jobs import JobCluster, JobTaskSettings

client = ApiClient()
api = JobsAPI(client)

cluster = JobCluster(
    job_cluster_key = "cluster1"
)
task1 = JobTaskSettings(
    task_key = "task1",
)
api.create(
    job_clusters = [cluster],
    tasks = [task1]
)
This causes the following error:
self = <databricks.sdk.core.ApiClient object at 0x1265bcbb0>, cfg = None
def __init__(self, cfg: Config = None):
    self._cfg = Config() if not cfg else cfg
>       self._debug_truncate_bytes = cfg.debug_truncate_bytes if cfg.debug_truncate_bytes else 96
E       AttributeError: 'NoneType' object has no attribute 'debug_truncate_bytes'
I've fixed the error in a separate branch and will submit a PR. Changes done:
Replaced:

def __init__(self, cfg: Config = None):
    self._cfg = Config() if not cfg else cfg
    self._debug_truncate_bytes = cfg.debug_truncate_bytes if cfg.debug_truncate_bytes else 96

With:

def __init__(self, cfg: Config = None):
    if cfg:
        self._cfg = cfg
        self._debug_truncate_bytes = cfg.debug_truncate_bytes if cfg.debug_truncate_bytes else 96
        self._user_agent_base = cfg.user_agent
    else:
        self._cfg = Config()
        self._debug_truncate_bytes = 96
        self._user_agent_base = None
Currently we add a strict upper bound of <2.29 to the requests library. Requests 2.30+ is incompatible with urllib3 <2, but users of databricks-sdk may still depend on older versions of urllib3. Once psf/requests#6432 is resolved, we should relax the upper bound to allow more recent versions of the requests library, which incorporate the most recent release of urllib3. This should improve the security posture of the SDK.
When calling the API WorkspaceClient.model_registry.get_model(), it always returns GetModelResponse(registered_model=None).
However, the returned value is correct when I call the Web API directly with requests, passing the name as a param:
import requests

url = f"{host}/api/2.0/mlflow/databricks/registered-models/get"
resp = requests.get(url, headers=headers, params={"name": "model_name"})
Could you please take a look?
Getting Cross-origin token redemption is permitted only for the 'Single-Page Application' client-type. Request origin: 'http://localhost:8020/'. when using the OAuth client with Azure.
It works if I remove this code, for both u2m and m2m:
if 'microsoft' in self._client.token_url:
    # Tokens issued for the 'Single-Page Application' client-type may
    # only be redeemed via cross-origin requests
    headers = {'Origin': self._client.redirect_url}
Generated Enum types do not support being instantiated from a string, which forces end users into the non-pythonic syntax of:
from databricks.sdk.service.workspace import ExportFormat
w.workspace.export('/my_notebook', format=ExportFormat.SOURCE)
instead of the neat, short, and pythonic:
w.workspace.export('/my_notebook', format='SOURCE')
which throws an exception:
AttributeError Traceback (most recent call last)
<command-3538949529976490> in <cell line: 1>()
----> 1 w.workspace.export('/Users/[email protected]/auto-dlt/generated/gr_products/gr_bronze-autoloader', format='SOURCE')
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/workspace.py in export(self, path, direct_download, format, **kwargs)
225 query = {}
226 if direct_download: query['direct_download'] = request.direct_download
--> 227 if format: query['format'] = request.format.value
228 if path: query['path'] = request.path
229
AttributeError: 'str' object has no attribute 'value'
For example, help(w.workspace.export) shows the signature:
export(path: str, *, direct_download: bool = None, format: databricks.sdk.service.workspace.ExportFormat = None, **kwargs) -> databricks.sdk.service.workspace.ExportResponse method of databricks.sdk.service.workspace.WorkspaceAPI instance
In turn, the format parameter is of the Enum type defined in databricks.sdk.service.workspace as:
class ExportFormat(Enum):
    """This specifies the format of the file to be imported. By default, this is `SOURCE`. However it
    may be one of: `SOURCE`, `HTML`, `JUPYTER`, `DBC`. The value is case sensitive."""

    DBC = 'DBC'
    HTML = 'HTML'
    JUPYTER = 'JUPYTER'
    R_MARKDOWN = 'R_MARKDOWN'
    SOURCE = 'SOURCE'
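Since these are value-backed Enums where each member's value equals its name, plain strings can be coerced with a by-value lookup. A sketch of how the generated code could accept both forms (the helper name is illustrative):

from enum import Enum

def _coerce_enum_value(fmt, enum_cls):
    # Accept either an enum member or its string value, e.g. 'SOURCE'.
    if isinstance(fmt, enum_cls):
        return fmt.value
    return enum_cls(fmt).value  # by-value lookup; raises ValueError on bad input

# _coerce_enum_value(ExportFormat.SOURCE, ExportFormat) == 'SOURCE'
# _coerce_enum_value('SOURCE', ExportFormat) == 'SOURCE'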
Getting a TypeError: Object of type AccessControlRequest is not JSON serializable when calling JobsAPI.create. Instead the line should be:
if self.access_control_list: body['access_control_list'] = [v.as_dict() for v in self.access_control_list]
This issue makes it impossible to use a custom container for a Cluster as documented here: https://docs.databricks.com/clusters/custom-containers.html
This is an ask from one of my customers: are we planning to support an asynchronous client for our SDK? The current client (from databricks_cli.sdk.api_client import ApiClient) doesn't support asynchronous calls.
Hi,
I'm trying to use the Python SDK to programmatically create (if not already existing) a set of Git credentials in my Databricks environment. Getting the credentials gives the traceback below. I'm calling the SDK as such:
from databricks.sdk import GitCredentialsAPI
from databricks_cli.sdk.api_client import ApiClient
self.git_creds = GitCredentialsAPI(self.api_client)
self.git_creds.list()
Is the ApiClient you are using by any chance not the same class as the one I'm importing?
Thanks!
Traceback (most recent call last):
File "/home/georgelpreput/Source/pushcart-deploy/.venv/bin/pushcart-deploy", line 6, in <module>
sys.exit(deploy())
^^^^^^^^
File "/home/georgelpreput/Source/pushcart-deploy/.venv/lib/python3.11/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/georgelpreput/Source/pushcart-deploy/.venv/lib/python3.11/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
^^^^^^^^^^^^^^^^
File "/home/georgelpreput/Source/pushcart-deploy/.venv/lib/python3.11/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/georgelpreput/Source/pushcart-deploy/.venv/lib/python3.11/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/georgelpreput/Source/pushcart-deploy/src/pushcart_deploy/setup.py", line 118, in deploy
d.deploy()
File "/home/georgelpreput/Source/pushcart-deploy/src/pushcart_deploy/setup.py", line 69, in deploy
_ = self.repos.get_or_create_git_credentials()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/georgelpreput/Source/pushcart-deploy/src/pushcart_deploy/databricks_api/repos_wrapper.py", line 81, in get_or_create_git_credentials
c for c in self.git_creds.list() if c["git_username"] == git_username
^^^^^^^^^^^^^^^^^^^^^
File "/home/georgelpreput/Source/pushcart-deploy/.venv/lib/python3.11/site-packages/databricks/sdk/service/workspace.py", line 759, in list
json = self._api.do('GET', '/api/2.0/git-credentials')
^^^^^^^^^^^^
AttributeError: 'ApiClient' object has no attribute 'do'
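To the question above: the traceback suggests yes, the two ApiClient classes are different. The SDK's GitCredentialsAPI calls self._api.do(), which databricks_cli's ApiClient doesn't provide. A sketch of the likely-intended wiring using the SDK's own client (constructor usage per the ApiClient signature quoted later on this page):

from databricks.sdk.core import ApiClient, Config
from databricks.sdk.service.workspace import GitCredentialsAPI

client = ApiClient(Config())  # the SDK's client implements .do()
git_creds = GitCredentialsAPI(client)
print(list(git_creds.list()))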
With version 0.0.2 of the SDK installed locally, I have a local config profile named DEFAULT defined in my ~/.databrickscfg file. I have no local DATABRICKS_* environment variables defined.
The following code returns the error databricks.sdk.client.DatabricksError: No auth configured:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
for c in w.clusters.list():
    print(c.cluster_name)
So does the following code:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient(profile = 'DEFAULT')
for c in w.clusters.list():
    print(c.cluster_name)
If I set my local DATABRICKS_HOST and DATABRICKS_TOKEN environment variables, however, the following code runs as expected with no errors:
from databricks.sdk import WorkspaceClient
w = WorkspaceClient()
for c in w.clusters.list():
    print(c.cluster_name)
I would expect the SDK to recognize my local DEFAULT profile without my having to specify profile = 'DEFAULT'. Also, I should be able to set my local DEFAULT profile and not be forced to set local DATABRICKS_* environment variables if I don't want to.
Repro steps:
1. Import from the databricks.sdk namespace so that I can access the dbutils global from my Python code.
Expect:
- Code-completion suggestions for dbutils and the Databricks SDK for Python classes.
Actual:
- I can use from databricks.sdk.runtime import dbutils. But then databricks.sdk.runtime squiggles with Import "databricks.sdk.runtime" could not be resolved from source Pylance (reportMissingModuleSource).
- The same happens for from databricks.sdk import WorkspaceClient. However, any code that relies on this import runs as expected.
- When I type from databricks.sdk import and then press Ctrl + Enter, I only get a drop-down with runtime. I was expecting a longer list with Databricks SDK for Python classes, for example AccountClient and WorkspaceClient.
Because of the many tools we own, it would be greatly beneficial to have a single way to store tokens locally: keyring or an encrypted file.
Code:
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import JobTaskSettings, NotebookTask, NotebookTaskSource
w = WorkspaceClient()
job_name = input("Some short name for the job (for example, my-job): ")
description = input("Some short description for the job (for example, My job): ")
existing_cluster_id = input("ID of the existing cluster in the workspace to run the job on (for example, 1234-567890-ab123cd4): ")
notebook_path = input("Workspace path of the notebook to run (for example, /Users/[email protected]/my-notebook): ")
task_key = input("Some key to apply to the job's tasks (for example, my-key): ")
print("Attempting to run the job. Please wait...\n")
j = w.jobs.create(
    name = job_name,
    tasks = [
        JobTaskSettings(
            description = description,
            existing_cluster_id = existing_cluster_id,
            notebook_task = NotebookTask(
                base_parameters = {""},
                notebook_path = notebook_path,
                source = NotebookTaskSource("WORKSPACE")
            ),
            task_key = task_key
        )
    ]
)
print(f"View the job at {w.config.host}/#job/{j.job_id}\n")
Input:
Some short name for the job (for example, my-job): my-job
Some short description for the job (for example, My job): My job
ID of the existing cluster in the workspace to run the job on (for example, 1234-567890-ab123cd4): <CLUSTER-ID-REDACTED>
Workspace path of the notebook to run (for example, /Users/[email protected]/my-notebook): /Users/<FULL-USERNAME-REDACTED>/hello
Some key to apply to the job's tasks (for example, my-key): my-key
Attempting to run the job. Please wait...
Error:
Traceback (most recent call last):
File "/Users/paul.cornell/databricks-sdk-py-demo/run-job.py", line 14, in <module>
j = w.jobs.create(
File "/Users/paul.cornell/databricks-sdk-py-demo/.venv/lib/python3.10/site-packages/databricks/sdk/service/jobs.py", line 2074, in create
json = self._api.do('POST', '/api/2.1/jobs/create', body=body)
File "/Users/paul.cornell/databricks-sdk-py-demo/.venv/lib/python3.10/site-packages/databricks/sdk/core.py", line 686, in do
response = self.request(method, f"{self._cfg.host}{path}", params=query, json=body, headers=headers)
File "/Users/paul.cornell/databricks-sdk-py-demo/.venv/lib/python3.10/site-packages/requests/sessions.py", line 573, in request
prep = self.prepare_request(req)
File "/Users/paul.cornell/databricks-sdk-py-demo/.venv/lib/python3.10/site-packages/requests/sessions.py", line 484, in prepare_request
p.prepare(
File "/Users/paul.cornell/databricks-sdk-py-demo/.venv/lib/python3.10/site-packages/requests/models.py", line 371, in prepare
self.prepare_body(data, files, json)
File "/Users/paul.cornell/databricks-sdk-py-demo/.venv/lib/python3.10/site-packages/requests/models.py", line 511, in prepare_body
body = complexjson.dumps(json, allow_nan=False)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/__init__.py", line 238, in dumps
**kw).encode(obj)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 199, in encode
chunks = self.iterencode(o, _one_shot=True)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 257, in iterencode
return _iterencode(o, 0)
File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type set is not JSON serializable
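Note on the traceback above: in Python, {""} is a one-element set literal, not an empty dict, and JSON has no set type, hence the serialization error. base_parameters expects a dict (or can be omitted entirely):

# A dict literal serializes cleanly; {""} is a set and does not.
notebook_task = NotebookTask(
    base_parameters = {},  # or {"param": "value"}
    notebook_path = notebook_path,
    source = NotebookTaskSource("WORKSPACE")
)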
Hello,
I wrote a script that can repair all the runs for a given job_id. However, when I have to repair a run for a second time, I get this error:
w.jobs.repair_run(j.run_id, rerun_all_failed_tasks=True)
databricks.sdk.core.DatabricksError: The latest repair ID needs to be provided in order to create a new repair
I cannot find any output in the jobs object that has this information. Can you provide some insight here? Thanks!
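One possible direction, assuming the run's repair history is exposed on jobs.get_run the way the REST API returns it (a repair_history list with per-repair IDs):

run = w.jobs.get_run(j.run_id)
latest = run.repair_history[-1].id if run.repair_history else None
w.jobs.repair_run(j.run_id, rerun_all_failed_tasks=True, latest_repair_id=latest)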
The code checks only for .azuredatabricks.net, but I am pretty sure we support other host patterns, some of which are:
".azuredatabricks.net",
".databricks.azure.cn",
".databricks.azure.us",
When calling "dbutils.fs.mount" the proxy call towards the cluster is using the wrong dictionary names for the parameters.
The generated code looks like this:
import json
(args, kwargs) = json.loads('[[], {"source": "<someUrl>", "mountPoint": "/mnt/<myMountPoint", "encryptionType": "", "owner": "", "extraConfigs": null}]')
result = dbutils.fs.mount(*args, **kwargs)
dbutils.notebook.exit(json.dumps(result))
When this is executed on the cluster, the following error is thrown:
TypeError: DBUtils.FSHandler.mount() got an unexpected keyword argument 'mountPoint'
Furthermore, this error is not proxied back; instead you get TypeError: the JSON object must be str, bytes or bytearray, not NoneType (dbutils.py:245).
The actual problem is the wrong parameter names in the method signature of mount.
The original Databricks API expects the parameters with underscores instead of camelCase (see https://docs.databricks.com/dbfs/mounts.html):
dbutils.fs.mount(
    source: str,
    mount_point: str,
    encryption_type: Optional[str] = "",
    extra_configs: Optional[dict[str:str]] = None
)
The same probably applies to the other mount methods.
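For comparison, a sketch of the proxied payload using the snake_case names the runtime signature above expects:

import json
(args, kwargs) = json.loads('[[], {"source": "<someUrl>", "mount_point": "/mnt/<myMountPoint>", "encryption_type": "", "extra_configs": null}]')
# dbutils.fs.mount(*args, **kwargs) would now bind the keyword arguments correctly.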
w.alerts.list() throws the exception below. I am not sure which alert causes this issue; it would be great if debug=True (or something of this sort) were passable as kwargs to enable debugging of the element that caused the exception (or is there a better way to do this from notebooks?).
Exception:
ValueError Traceback (most recent call last)
<command-3538949529976490> in <cell line: 1>()
----> 1 w.alerts.list()
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/sql.py in list(self)
2198
2199 json = self._api.do('GET', '/api/2.0/preview/sql/alerts')
-> 2200 return [Alert.from_dict(v) for v in json]
2201
2202 def list_schedules(self, alert_id: str, **kwargs) -> Iterator[RefreshSchedule]:
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/sql.py in <listcomp>(.0)
2198
2199 json = self._api.do('GET', '/api/2.0/preview/sql/alerts')
-> 2200 return [Alert.from_dict(v) for v in json]
2201
2202 def list_schedules(self, alert_id: str, **kwargs) -> Iterator[RefreshSchedule]:
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/sql.py in from_dict(cls, d)
70 name=d.get('name', None),
71 options=AlertOptions.from_dict(d['options']) if 'options' in d else None,
---> 72 query=Query.from_dict(d['query']) if 'query' in d else None,
73 rearm=d.get('rearm', None),
74 state=AlertState(d['state']) if 'state' in d else None,
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/sql.py in from_dict(cls, d)
1201 latest_query_data_id=d.get('latest_query_data_id', None),
1202 name=d.get('name', None),
-> 1203 options=QueryOptions.from_dict(d['options']) if 'options' in d else None,
1204 permission_tier=PermissionLevel(d['permission_tier']) if 'permission_tier' in d else None,
1205 query=d.get('query', None),
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/sql.py in from_dict(cls, d)
1454 def from_dict(cls, d: Dict[str, any]) -> 'QueryOptions':
1455 return cls(moved_to_trash_at=d.get('moved_to_trash_at', None),
-> 1456 parameters=[Parameter.from_dict(v)
1457 for v in d['parameters']] if 'parameters' in d else None)
1458
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/sql.py in <listcomp>(.0)
1454 def from_dict(cls, d: Dict[str, any]) -> 'QueryOptions':
1455 return cls(moved_to_trash_at=d.get('moved_to_trash_at', None),
-> 1456 parameters=[Parameter.from_dict(v)
1457 for v in d['parameters']] if 'parameters' in d else None)
1458
/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/databricks/sdk/service/sql.py in from_dict(cls, d)
1101 return cls(name=d.get('name', None),
1102 title=d.get('title', None),
-> 1103 type=ParameterType(d['type']) if 'type' in d else None,
1104 value=d.get('value', None))
1105
/usr/lib/python3.9/enum.py in __call__(cls, value, names, module, qualname, type, start)
358 """
359 if names is None: # simple value lookup
--> 360 return cls.__new__(cls, value)
361 # otherwise, functional API: we're creating a new Enum type
362 return cls._create_(
/usr/lib/python3.9/enum.py in __new__(cls, value)
676 ve_exc = ValueError("%r is not a valid %s" % (value, cls.__qualname__))
677 if result is None and exc is None:
--> 678 raise ve_exc
679 elif exc is None:
680 exc = TypeError(
ValueError: 'datetime-range' is not a valid ParameterType
With the following code that uses 0.0.1 of the SDK:
import os
from databricks.sdk import WorkspaceClient

host = os.getenv('DATABRICKS_HOST')
token = os.getenv('DATABRICKS_TOKEN')

w = WorkspaceClient(host = host, token = token, auth_type = "pat")
w.jobs.create(
    job_name = 'my-job',
    tasks = my_tasks
)
The my_tasks declaration doesn't seem to work no matter what syntax I use. For example:
Creating a list returns AttributeError: 'dict' object has no attribute 'as_dict':
my_tasks = [
    {
        "description": "My job.",
        "existing_cluster_id": "1128-232547-p64vrmx2",
        "notebook_task": {
            "notebook_path": "/Users/[email protected]/go-fakedata"
        },
        "task_key": "my-key"
    }
]
Creating a dictionary returns AttributeError: 'str' object has no attribute 'as_dict':
my_tasks = {
    "description": "My job.",
    "existing_cluster_id": "1128-232547-p64vrmx2",
    "notebook_task": {
        "notebook_path": "/Users/[email protected]/go-fakedata"
    },
    "task_key": "my-key"
}
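For reference, a sketch using the typed task wrappers instead of raw dicts, which is what the as_dict errors point at (mirroring the working jobs.create example earlier on this page):

from databricks.sdk.service.jobs import JobTaskSettings, NotebookTask

my_tasks = [
    JobTaskSettings(
        description = "My job.",
        existing_cluster_id = "1128-232547-p64vrmx2",
        notebook_task = NotebookTask(notebook_path = "/Users/[email protected]/go-fakedata"),
        task_key = "my-key",
    )
]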