Azure Databricks Client Library

The Azure Databricks Client Library offers a convenient interface for automating your Azure Databricks workspace through the Azure Databricks REST API.

The implementation of this library is based on REST API version 2.0 and above.

The master branch is for version 2. Version 1.1 (stable) is in the releases/1.1 branch.

Requirements

You must have a personal access token (PAT) or an Azure Active Directory (AAD) token to access the Databricks REST API.
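
For AAD authentication without user interaction, one option is to acquire a token for the global Databricks application (resource ID 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d, the same ID used with the az CLI elsewhere in this document). A minimal sketch, assuming the Azure.Identity package rather than any API of this library:

// Acquire an AAD token for the Databricks resource. DefaultAzureCredential
// tries managed identity, environment credentials, the Azure CLI, and so on.
using Azure.Core;
using Azure.Identity;

var credential = new DefaultAzureCredential();
var context = new TokenRequestContext(new[] { "2ff814a6-3304-4ab8-85cb-cd0e6f879c1d/.default" });
AccessToken aadToken = await credential.GetTokenAsync(context);
// aadToken.Token can then be passed to DatabricksClient.CreateClient in place of a PAT.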

Supported APIs

| API | Version | Description |
| --- | --- | --- |
| Clusters | 2.0 | The Clusters API allows you to create, start, edit, list, terminate, and delete clusters. |
| Jobs | 2.1 | The Jobs API allows you to programmatically manage Azure Databricks jobs. |
| Dbfs | 2.0 | The DBFS API makes it simple to interact with various data sources without having to include your credentials every time you read a file. |
| Secrets | 2.0 | The Secrets API allows you to manage secrets, secret scopes, and access permissions. |
| Groups | 2.0 | The Groups API allows you to manage groups of users. |
| Libraries | 2.0 | The Libraries API allows you to install and uninstall libraries and get the status of libraries on a cluster. |
| Token | 2.0 | The Token API allows you to create, list, and revoke tokens that can be used to authenticate and access Azure Databricks REST APIs. |
| Workspace | 2.0 | The Workspace API allows you to list, import, export, and delete notebooks and folders. |
| InstancePool | 2.0 | The Instance Pools API allows you to create, edit, delete, and list instance pools. |
| Permissions | 2.0 | The Permissions API lets you manage permissions for Tokens, Clusters, Pools, Jobs, Delta Live Tables pipelines, Notebooks, Directories, MLflow experiments, MLflow registered models, SQL warehouses, Repos, and Cluster Policies. |
| Cluster Policies | 2.0 | The Cluster Policies API allows you to create, list, and edit cluster policies. |
| Global Init Scripts | 2.0 | The Global Init Scripts API lets Azure Databricks administrators add global cluster initialization scripts in a secure and controlled manner. |
| SQL Warehouses | 2.0 | The SQL Warehouses API allows you to manage compute resources that let you run SQL commands on data objects within Databricks SQL. |
| Repos | 2.0 | The Repos API allows users to manage their Git repos. Users can use the API to access all repos that they have manage permissions on. |
| Pipelines (Delta Live Tables) | 2.0 | The Delta Live Tables API allows you to create, edit, delete, start, and view details about pipelines. |

Usage

Check out the Sample project for more detailed usages.

In the following examples, the baseUrl variable should be set to the workspace base URL, which looks like https://adb-<workspace-id>.<random-number>.azuredatabricks.net, and the token variable should be set to your Databricks personal access token.

Creating client

using (var client = DatabricksClient.CreateClient(baseUrl, token))
{
    // ...
}

Cluster API

  • Create a single-node cluster:
var clusterConfig = ClusterAttributes
            .GetNewClusterConfiguration("Sample cluster")
            .WithRuntimeVersion(RuntimeVersions.Runtime_10_4)
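            // Note: autoscale bounds apply to multi-node clusters; a SingleNode
            // cluster (set below) has no workers and runs on the driver only.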
            .WithAutoScale(3, 7)
            .WithAutoTermination(30)
            .WithClusterLogConf("dbfs:/logs/")
            .WithNodeType(NodeTypes.Standard_D3_v2)
            .WithClusterMode(ClusterMode.SingleNode);

var clusterId = await client.Clusters.Create(clusterConfig);
  • Wait for the cluster to be ready (or fail to start):
using Policy = Polly.Policy;

static async Task WaitForCluster(IClustersApi clusterClient, string clusterId, int pollIntervalSeconds = 15)
{
    var retryPolicy = Policy.Handle<WebException>()
        .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.BadGateway)
        .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.InternalServerError)
        .Or<ClientApiException>(e => e.Message.Contains("\"error_code\":\"TEMPORARILY_UNAVAILABLE\""))
        .Or<TaskCanceledException>(e => !e.CancellationToken.IsCancellationRequested)
        .OrResult<ClusterInfo>(info => info.State is not (ClusterState.RUNNING or ClusterState.ERROR or ClusterState.TERMINATED))
        .WaitAndRetryForeverAsync(
            _ => TimeSpan.FromSeconds(pollIntervalSeconds),
            (delegateResult, _) =>
            {
                if (delegateResult.Exception != null)
                {
                    Console.WriteLine($"[{DateTime.UtcNow:s}] Failed to query cluster info - {delegateResult.Exception}");
                }
            });
    await retryPolicy.ExecuteAsync(async () =>
    {
        var info = await clusterClient.Get(clusterId);
        Console.WriteLine($"[{DateTime.UtcNow:s}] Cluster:{clusterId}\tState:{info.State}\tMessage:{info.StateMessage}");
        return info;
    });
}

await WaitForCluster(client.Clusters, clusterId);
  • Stop a cluster:
await client.Clusters.Terminate(clusterId);
await WaitForCluster(client.Clusters, clusterId);
  • Delete a cluster:
await client.Clusters.Delete(clusterId);

Jobs API

  • Create a job:
// Job schedule
var schedule = new CronSchedule
{
    QuartzCronExpression = "0 0 9 ? * MON-FRI",
    TimezoneId = "Europe/London",
    PauseStatus = PauseStatus.UNPAUSED
};

// Run with a job cluster
var newCluster = ClusterAttributes.GetNewClusterConfiguration()
    .WithClusterMode(ClusterMode.SingleNode)
    .WithNodeType(NodeTypes.Standard_D3_v2)
    .WithRuntimeVersion(RuntimeVersions.Runtime_10_4);

// Create job settings
var jobSettings = new JobSettings
{
    MaxConcurrentRuns = 1,
    Schedule = schedule,
    Name = "Sample Job"
};

// Adding 3 tasks to the job settings.
var task1 = jobSettings.AddTask("task1", new NotebookTask { NotebookPath = SampleNotebookPath })
    .WithDescription("Sample Job - task1")
    .WithNewCluster(newCluster);
var task2 = jobSettings.AddTask("task2", new NotebookTask { NotebookPath = SampleNotebookPath })
    .WithDescription("Sample Job - task2")
    .WithNewCluster(newCluster);
jobSettings.AddTask("task3", new NotebookTask { NotebookPath = SampleNotebookPath }, new[] { task1, task2 })
    .WithDescription("Sample Job - task3")
    .WithNewCluster(newCluster);

// Create the job.
Console.WriteLine("Creating new job");
var jobId = await client.Jobs.Create(jobSettings);
Console.WriteLine("Job created: {0}", jobId);
  • Start a job run:
// Start the job and retrieve the run id.
Console.WriteLine("Run now: {0}", jobId);
var runId = await client.Jobs.RunNow(jobId);
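To pass notebook parameters when triggering a run (a question that also comes up in the issues below), a parameters payload can be supplied to RunNow. This is a hedged sketch: the RunParameters type, its NotebookParams property, and the RunNow overload are assumed from this library's Jobs models and may differ between versions.
// Hypothetical sketch: trigger the job with notebook parameters, which the
// notebook reads via dbutils.widgets. Names here are assumptions, not
// confirmed API of this library.
var runParams = new RunParameters
{
    NotebookParams = new Dictionary<string, string> { ["inputDate"] = "2022-01-01" }
};
var parameterizedRunId = await client.Jobs.RunNow(jobId, runParams);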
  • Wait for a job run to complete:
using Policy = Polly.Policy;

static async Task WaitForRun(IJobsApi jobClient, long runId, int pollIntervalSeconds = 15)
{
    var retryPolicy = Policy.Handle<WebException>()
        .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.BadGateway)
        .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.InternalServerError)
        .Or<ClientApiException>(e => e.Message.Contains("\"error_code\":\"TEMPORARILY_UNAVAILABLE\""))
        .Or<TaskCanceledException>(e => !e.CancellationToken.IsCancellationRequested)
        .OrResult<RunState>(state =>
            state.LifeCycleState is RunLifeCycleState.PENDING or RunLifeCycleState.RUNNING
                or RunLifeCycleState.TERMINATING)
        .WaitAndRetryForeverAsync(
            _ => TimeSpan.FromSeconds(pollIntervalSeconds),
            (delegateResult, _) =>
            {
                if (delegateResult.Exception != null)
                {
                    Console.WriteLine(
                        $"[{DateTime.UtcNow:s}] Failed to query run - {delegateResult.Exception}");
                }
            });
    await retryPolicy.ExecuteAsync(async () =>
    {
        var (run, _) = await jobClient.RunsGet(runId);
        Console.WriteLine(
            $"[{DateTime.UtcNow:s}] Run:{runId}\tLifeCycleState:{run.State.LifeCycleState}\tResultState:{run.State.ResultState}\tCompleted:{run.IsCompleted}"
        );
        return run.State;
    });
}

await WaitForRun(client.Jobs, runId);
  • Export a job run:
var (run, _) = await client.Jobs.RunsGet(runId);
foreach (var runTask in run.Tasks)
{
    var viewItems = await client.Jobs.RunsExport(runTask.RunId);
    foreach (var viewItem in viewItems)
    {
        Console.WriteLine($"Exported view item from run {runTask.RunId}, task \"{runTask.TaskKey}\", view \"{viewItem.Name}\"");
        Console.WriteLine("====================");
        Console.WriteLine(viewItem.Content[..200] + "...");
        Console.WriteLine("====================");
    }
}

Secrets API

Creating secret scope

const string scope = "SampleScope";
await client.Secrets.CreateScope(scope, null);

Create text secret

var secretName = "secretkey.text";
await client.Secrets.PutSecret("secret text", scope, secretName);

Create binary secret

var secretName = "secretkey.bin";
await client.Secrets.PutSecret(new byte[]{0x01, 0x02, 0x03, 0x04}, scope, secretName);
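
To verify what was written and to clean up afterwards, the scope's secrets can be listed and the scope deleted. A minimal sketch, assuming the Secrets client's ListSecrets and DeleteScope methods; note that the API returns only secret metadata, never the secret values:

// List secret metadata in the scope, then remove the whole scope.
var secrets = await client.Secrets.ListSecrets(scope);
foreach (var secret in secrets)
{
    Console.WriteLine(secret.Key);
}
await client.Secrets.DeleteScope(scope);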

Resiliency

The clusters/create, jobs/run-now, and jobs/runs/submit APIs support an idempotency token: an optional token that guarantees the idempotency of requests. If a resource (a cluster or a run) with the provided token already exists, the request does not create a new resource but returns the ID of the existing resource instead.

If you specify the idempotency token, upon failure you can retry until the request succeeds. Databricks guarantees that exactly one resource is launched with that idempotency token.

The following code illustrates how to use Polly to retry the request with idempotency_token if the request fails.

using Polly;

double retryIntervalSec = 15;
string idempotencyToken = Guid.NewGuid().ToString();

var clusterInfo = ClusterAttributes.GetNewClusterConfiguration("my-cluster")
    .WithNodeType("Standard_D3_v2")
    .WithNumberOfWorkers(25)
    .WithRuntimeVersion(RuntimeVersions.Runtime_7_3);

var retryPolicy = Policy.Handle<WebException>()
    .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.BadGateway)
    .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.InternalServerError)
    .Or<ClientApiException>(e => e.StatusCode == HttpStatusCode.ServiceUnavailable)
    .Or<ClientApiException>(e => e.Message.Contains("\"error_code\":\"TEMPORARILY_UNAVAILABLE\""))
    .Or<TaskCanceledException>(e => !e.CancellationToken.IsCancellationRequested)
    .WaitAndRetryForeverAsync(_ => TimeSpan.FromSeconds(retryIntervalSec));

var clusterId = await retryPolicy.ExecuteAsync(async () => await client.Clusters.Create(clusterInfo, idempotencyToken));

Breaking changes

  • Version 2 of the library targets the .NET 6 runtime.

  • The Jobs API was redesigned to align with the version 2.1 of the REST API.

    • In the previous version, the Jobs API supported only a single task per job. The new Jobs API supports multiple tasks per job, where the tasks are represented as a DAG.

    • The new version supports two more task types: the Python wheel task and the Delta Live Tables pipeline task.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit Microsoft Contributor License Agreement (CLA).

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact [email protected] with any additional questions or comments.

azure-databricks-client's People

Contributors

andyrkn, countervayl, craigjar, dbartek, dependabot[bot], dubrie, insightfactory-mick, jamesfielder, jonathan-vogel-siemens, kenakamu, ksiomelo, kyle8329, memoryz, microsoft-github-policy-service[bot], microsoftopensource, mistermackey, msftgits, samuelchmiel, sjoerdsjoerd, skarpecki, smokedlinq, tomkerkhove


azure-databricks-client's Issues

CreateClient with Interfaces

It seems like version 2 lost the DatabricksClient.CreateClient overload with interfaces for unit testing. Can we add it back to version 2 as well?

Inner HttpClientHandler will get disposed after the first request.

Hi All,
There is an issue with the DatabricksClient.cs class, specifically when using it from a continuous WebJob.

Sometimes it throws System.ObjectDisposedException with the message "Cannot access a disposed object. Object name: 'System.Net.Http.HttpClient'", because the inner handler is disposed inside the HttpClient class.

It would be better to let callers choose whether the inner handler should be disposed.

Here, the inner handler will be disposed after the requests execute:

var handler = new HttpClientHandler
{
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate
};
var httpClient = new HttpClient(handler)
{
    BaseAddress = apiUrl,
    Timeout = TimeSpan.FromSeconds(timeoutSeconds)
};

In this case, the inner handler will not be disposed:
var httpClient = new HttpClient(handler, false) { BaseAddress = apiUrl, Timeout = TimeSpan.FromSeconds(timeoutSeconds) };

Documentation about HttpClient constructors: Link

Same issue using HttpClient: Link

Thanks,
Illia

ObjectType.FILE

I'm getting an exception on the Workspace.List() call if it encounters an object of type FILE. This value doesn't exist in the enum, so the call crashes with a deserialization error. Can this be added, or is there some reason it shouldn't be?

We get the following exception when we try to list scopes via the Databricks C# client.

The operation was canceled.
System.IO.IOException: Unable to read data from the transport connection: Operation canceled.
System.Net.Sockets.SocketException (125): Operation canceled
--- End of inner exception stack trace ---
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.GetResult(Int16 token)
at System.Net.Security.SslStream.<FillBufferAsync>g__InternalFillBufferAsync|215_0[TReadAdapter](TReadAdapter adap, ValueTask`1 task, Int32 min, Int32 initial)
at System.Net.Security.SslStream.ReadAsyncInternal[TReadAdapter](TReadAdapter adapter, Memory`1 buffer)
at System.Net.Http.HttpConnection.FillAsync()
at System.Net.Http.HttpConnection.ReadNextResponseHeaderLineAsync(Boolean foldedHeadersAllowed)
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
--- End of inner exception stack trace ---
at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithNtConnectionAuthAsync(HttpConnection connection, HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.DecompressionHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at Microsoft.Azure.Databricks.Client.TimeoutHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
at Microsoft.Azure.Databricks.Client.ApiClient.HttpGet[T](HttpClient httpClient, String requestUri)
at Microsoft.Azure.Databricks.Client.SecretsApiClient.ListScopes()

Guidance for determining end time of a job

We are going through all of our jobs to interpret them via:

await databricksClient.Jobs.RunsList()

We were wondering if there is a way to determine the end time for a job run.

I've noticed you have SetupDuration, ExecutionDuration & CleanupDuration, but I'm not sure what they represent. Milliseconds?

If we take the sum of those and add it to the start time, does that represent the end time?
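
A hedged sketch of that calculation: the Databricks REST API reports a run's start_time and the three durations in milliseconds, so under that assumption the end time can be approximated as below. The variable names are hypothetical placeholders for the corresponding REST fields, not confirmed properties of this library.

// Approximate end time = start_time + setup + execution + cleanup (all in ms).
// startTimeMs, setupMs, executionMs and cleanupMs are assumed to hold the run's
// start_time, setup_duration, execution_duration and cleanup_duration values.
long endTimeMs = startTimeMs + setupMs + executionMs + cleanupMs;
var endTimeUtc = DateTimeOffset.FromUnixTimeMilliseconds(endTimeMs).UtcDateTime;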

Missing TerminationCode enum values

The following values are listed as valid termination codes but are not in the TerminationCode.cs enum:

BOOTSTRAP_TIMEOUT
CONTAINER_LAUNCH_FAILURE
DBFS_COMPONENT_UNHEALTHY
DRIVER_UNREACHABLE
DRIVER_UNRESPONSIVE
METASTORE_COMPONENT_UNHEALTHY
NETWORK_CONFIGURATION_FAILURE
AZURE_RESOURCE_MANAGER_THROTTLING

This causes JSON deserialization errors.

Support for idempotency token in the RunSubmit API and for one-time notebook run settings in RunOnceSettings.cs

The Databricks RunSubmit API supports an "idempotency_token" to ensure that even if a client submits a run with the same idempotency_token multiple times, it is still treated as a single run. This "idempotency_token" is missing from RunOnceSettings.cs in the current client implementation. Can you please add it to the RunOnceSettings class?

Also, can you provide a way to create one-time notebook run settings for the RunSubmit API, just like the one-time Spark JAR run settings method in RunOnceSettings.cs?

I have the code ready for it but I cannot submit a pull request for the above items as I'm not a collaborator.

await db_client.Client.Workspace.List(path) exception

This call:

List<ObjectInfo> objects = (await db_client.Client.Workspace.List(path)).ToList();

Started throwing this exception recently:

System.Text.Json.JsonException
  HResult=0x80131500
  Message=The JSON value could not be converted to Microsoft.Azure.Databricks.Client.Models.ObjectType. Path: $.object_type | LineNumber: 0 | BytePositionInLine: 26.
  Source=System.Text.Json
  StackTrace:
   at System.Text.Json.ThrowHelper.ThrowJsonException(String message)
   at System.Text.Json.Serialization.Converters.EnumConverter`1.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at System.Text.Json.Serialization.Metadata.JsonPropertyInfo`1.ReadJsonAndSetMember(Object obj, ReadStack& state, Utf8JsonReader& reader)
   at System.Text.Json.Serialization.Converters.ObjectDefaultConverter`1.OnTryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 utf8Json, JsonTypeInfo jsonTypeInfo, Nullable`1 actualByteCount)
   at System.Text.Json.JsonSerializer.ReadNode[TValue](JsonNode node, JsonTypeInfo jsonTypeInfo)
   at Microsoft.Azure.Databricks.Client.WorkspaceApiClient.<>c.<List>b__5_0(JsonNode obj)
   at System.Linq.Enumerable.SelectIListIterator`2.ToList()
   at IntersectCLI.IntersectCLIProgram.<>c__DisplayClass16_0.<<CheckICMDatasets>g__GetNotebooksAsync|1>d.MoveNext()

Error when calling "DatabricksClient.Jobs.RunsGetOutput()": Microsoft.CSharp.RuntimeBinder.RuntimeBinderException: 'Cannot perform runtime binding on a null reference'

Hi team,

I can successfully access a job by its ID and wait for it to finish using the databricksClient.Jobs.RunsGet() method, but when I try to access the output of the job run, I get an exception, probably during conversion of the received payload to a dynamic object, in https://github.com/Azure/azure-databricks-client/blob/master/csharp/Microsoft.Azure.Databricks.Client/JobsApiClient.cs at line 135.

Or maybe I'm doing something wrong? The only expected parameter is a runId, and I pass a correct one (retrieved from the parent job).

Message | "Cannot perform runtime binding on a null reference"
Source | "Anonymously Hosted DynamicMethods Assembly"
StackTrace | "   at System.Dynamic.UpdateDelegates.UpdateAndExecute1[T0,TRet](CallSite site, T0 arg0)
   at Microsoft.Azure.Databricks.Client.JobsApiClient.<RunsGetOutput>d__13.MoveNext()
   at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw()
   at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task)
   at System.Runtime.CompilerServices.TaskAwaiter`1.GetResult()"

Thanks for your help!

fred

Exception when adding member to group

Hi,

I use the following code to add a user to a Databricks group (admins in this case):

var principalName = new PrincipalName { UserName = userName };
await _databricksClient.Groups.AddMember(parentGroupName, principalName);

This results in the following exception:

Microsoft.CSharp.RuntimeBinder.RuntimeBinderException: 'Microsoft.Azure.Databricks.Client.PrincipalName' does not contain a definition for 'parent_name'

In the 'GroupsApiClient' class, you convert the PrincipalName to dynamic and add that property before posting it using the HttpClient. Would an internal property of the same name not have been a better option? Am I the only one having this issue?

Thanks and best regards
fred

Azure Databricks Python SDK

Hi,

I know this is not the right place to ask this question, but is there an Azure Databricks Python SDK? I am looking for something that lets me execute notebook commands remotely.

RunParameters

Hello,

Can you give an example of how to specify notebook parameters using this library?

not possible to use Azure SDK for credential passthrough to assign a user

While developing a solution for a big audit company, we are going to use the “credential passthrough” option when creating Databricks clusters. A crucial part of this process is specifying the user account that should be used for credential passthrough. This can be done from the Databricks UI, but we could not find a way to do it through the C#/.NET SDK (Microsoft.Azure.Databricks.Client): it is possible to configure a cluster with credential passthrough, but not possible to assign a user.

Could you please clarify whether this functionality is really missing from the Azure SDK/Databricks API?

Implementing the Repos API

Firstly, great SDK - it has streamlined my code to no end. Thanks! I was looking for an implementation of the Repos API but saw that it is not included here. I have added a basic implementation of the Repos API to a cloned version of your SDK and am using that now. I also noted that the additional FILE enum value I had added for the Workspace List function has now been corrected in your SDK. Do you have any plans to implement the Repos API? If not in the near timeframe, would you be happy to accept a PR for the Repos API implementation I have added?

Programmatically access Databricks APIs without user token/user intervention

We are using the Databricks client in our service to create and manage Databricks clusters, jobs, and secrets.
But currently, authentication to the client is via a user token.
How can we leverage other Azure libraries to get the access token for the Databricks client programmatically?
We don't want to use a service principal or any browser login/user intervention.

I can see that you can use an AAD access token and initiate the client here.

How can I get the Databricks token mentioned here?

This is how the az CLI is used to get the Databricks token.
Get a token for the global Databricks application:
token_response=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d)
token=$(jq .accessToken -r <<< "$token_response")

How can I achieve the same Databricks token from my .NET service code?

Please share documentation on the same or any code snippet that could be useful here.

Any help here would be really appreciated.

RunSubmit cluster configuration

RunSubmitSettings does not have a method or property for configuring clusters.
As a result, I get the following response when I use this method:

{
    "error_code": "INVALID_PARAMETER_VALUE",
    "message": "Either new_cluster or existing_cluster_id must be specified."
}

What should I do to configure the cluster?

[Feature Request] Cancellation

All methods should provide an additional optional cancellationToken parameter to cancel the corresponding HTTP request.
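
For reference, signatures quoted elsewhere in this document (e.g. WorkspaceApiClient.List(String path, CancellationToken cancellationToken) in a stack trace below) suggest that at least some methods already accept a token. A minimal sketch of how such an overload would be used, assuming it exists on the method in question:

// Cancel the underlying HTTP request if it takes longer than two minutes.
using var cts = new CancellationTokenSource(TimeSpan.FromMinutes(2));
var objects = await client.Workspace.List("/Shared", cts.Token);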

Documentation domain not accessible

The link provided on your homepage is not working, as the whole domain docs.azuredatabricks.net is not accessible:
https://docs.azuredatabricks.net/api/latest/authentication.html
403 ERROR
The request could not be satisfied.
Bad request. We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.
Generated by cloudfront (CloudFront)
Request ID: UWoh2DWX5kHgCXvW4g7sjHBRdMxC2Z_JuLfxADBLXXjjwSX0CBd-AA==

TY

REPO objects prevent WorkspaceApiClient.List from working

When calling DatabricksClient.Workspace.List on directories containing repositories, the call will throw an exception because Microsoft.Azure.Databricks.Client.ObjectType does not define REPO.

NuGet Package, version 1.1.2515.1

Newtonsoft.Json.JsonSerializationException: Error converting value "REPO" to type 'Microsoft.Azure.Databricks.Client.ObjectType'. Path 'objects[0].object_type'.
 ---> System.ArgumentException: Requested value 'REPO' was not found.
   at Newtonsoft.Json.Utilities.EnumUtils.ParseEnum(Type enumType, NamingStrategy namingStrategy, String value, Boolean disallowNumber)
   at Newtonsoft.Json.Converters.StringEnumConverter.ReadJson(JsonReader reader, Type objectType, Object existingValue, JsonSerializer serializer)
   --- End of inner exception stack trace ---
   at Newtonsoft.Json.Converters.StringEnumConverter.ReadJson(JsonReader reader, Type objectType, Object existingValue, JsonSerializer serializer)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.DeserializeConvertable(JsonConverter converter, JsonReader reader, Type objectType, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.SetPropertyValue(JsonProperty property, JsonConverter propertyConverter, JsonContainerContract containerContract, JsonProperty containerProperty, JsonReader reader, Object target)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateObject(Object newObject, JsonReader reader, JsonObjectContract contract, JsonProperty member, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateObject(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.PopulateList(IList list, JsonReader reader, JsonArrayContract contract, JsonProperty containerProperty, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateList(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, Object existingValue, String id)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.CreateValueInternal(JsonReader reader, Type objectType, JsonContract contract, JsonProperty member, JsonContainerContract containerContract, JsonProperty containerMember, Object existingValue)
   at Newtonsoft.Json.Serialization.JsonSerializerInternalReader.Deserialize(JsonReader reader, Type objectType, Boolean checkAdditionalContent)
   at Newtonsoft.Json.JsonSerializer.DeserializeInternal(JsonReader reader, Type objectType)
   at Newtonsoft.Json.Linq.JToken.ToObject(Type objectType, JsonSerializer jsonSerializer)
   at Newtonsoft.Json.Linq.JToken.ToObject(Type objectType)
   at Newtonsoft.Json.Linq.JToken.ToObject[T]()
   at Microsoft.Azure.Databricks.Client.WorkspaceApiClient.List(String path, CancellationToken cancellationToken)

Execution API implementation

I was looking for implementations for the new Execution API but couldn't find them. So I wonder if the maintainers are willing to accept a PR implementing that API.

That implementation is a little bit cumbersome, as it involves waiting for a response asynchronously, pagination, and in some cases streaming files.

Release window of version 2?

Hi all,

can you provide any information on the release of a stable version 2?
Some of our critical infrastructure depends on this client library and we will have to switch to Databricks Jobs API 2.0 soon. RC5 seems pretty stable so far, so we are wondering when it will be released. IMO, this is a very important update to the library.

Best regards
Jonathan

You are doing it wrong.

One picture is worth a thousand words :

[screenshot omitted]

I'll leave it to your imagination to complete the task using NSwag or OpenAPITools...

  • It took me half a day to patch some missing parts in the OpenAPI doc, then compile and publish a .NET client with AAAAAALLLLLL the methods.

Support for .NET Standard 2.0

Is it possible to add .NET Standard 2.0 support for version 2?
The new package release contains Jobs API 2.1, but it now targets only net6.0.

List task in JobsApiClient not working properly

Hi,

The List task in JobsApiClient is not working properly: it takes in the 'limit', 'offset' and 'expand_tasks' parameters but doesn't actually use them or pass them to the Databricks Jobs API. As a result, whatever arguments are passed in, it will always use the defaults.

`Dbfs.List` incorrectly decodes characters in the path

Spark allows using the slash character / in partitioned columns of delta tables. For example, the following table is partitioned by the Name column:

| Name | Value |
| --- | --- |
| a/b | 3 |

When stored as a delta table, it creates a directory for each partition. So calling DatabricksClient.Dbfs.List("/mnt/my-data/my-table") I get the following output:

| FileSize | IsDirectory | Path |
| --- | --- | --- |
| 0 | true | /mnt/my-data/my-table/Name=a%2Fb |
| 0 | true | /mnt/my-data/my-table/_delta_log |

However, when I pass the /mnt/my-data/my-table/Name=a%2Fb path back, I get a ClientException with the message:

{"error_code":"RESOURCE_DOES_NOT_EXIST","message":"No file or directory exists on path dbfs:/mnt/my-data/my-table/Name=a/b"}

Enum JSON Parsing Not Working

There are two enums in the Instance Pool model that are not parsed correctly with the latest v2.0.0-rc.3. Error received:

System.Text.Json.JsonException: The JSON value could not be converted to System.Nullable`1[Microsoft.Azure.Databricks.Client.Models.AzureAvailability]. Path: $[0].azure_attributes.availability | LineNumber: 0 | BytePositionInLine: 273.
  Stack Trace:
      at System.Text.Json.ThrowHelper.ThrowJsonException(String message)
   at System.Text.Json.Serialization.Converters.EnumConverter`1.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at System.Text.Json.Serialization.Converters.NullableConverter`1.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at System.Text.Json.Serialization.Metadata.JsonPropertyInfo`1.ReadJsonAndSetMember(Object obj, ReadStack& state, Utf8JsonReader& reader)
   at System.Text.Json.Serialization.Converters.ObjectDefaultConverter`1.OnTryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)  
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.Metadata.JsonPropertyInfo`1.ReadJsonAndSetMember(Object obj, ReadStack& state, Utf8JsonReader& reader)
   at System.Text.Json.Serialization.Converters.ObjectDefaultConverter`1.OnTryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)  
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonCollectionConverter`2.OnTryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, TCollection& value)  
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 utf8Json, JsonTypeInfo jsonTypeInfo, Nullable`1 actualByteCount)
   at System.Text.Json.JsonSerializer.ReadNode[TValue](JsonNode node, JsonTypeInfo jsonTypeInfo)
   at System.Text.Json.JsonSerializer.Deserialize[TValue](JsonNode node, JsonSerializerOptions options)
   at Microsoft.Azure.Databricks.Client.InstancePoolApiClient.List(CancellationToken cancellationToken) in C:\Users\Craig\projects\azure\azure-databricks-client\csharp\Microsoft.Azure.Databricks.Client\InstancePoolApiClient.cs:line 68
   at Microsoft.Azure.Databricks.Client.Test.InstancePoolApiClientTest.TestList() in C:\Users\Craig\projects\azure\azure-databricks-client\csharp\Microsoft.Azure.Databricks.Client.Test\InstancePoolApiClientTest.cs:line 71
   at Microsoft.VisualStudio.TestPlatform.MSTestAdapter.PlatformServices.ThreadOperations.ExecuteWithAbortSafety(Action action)

As there are no unit tests for the Instance Pool API, I've created one that reproduces the above error; see attached.
InstancePoolApiClientTest.cs.txt

I've traced this back to the change from Newtonsoft.Json to System.Text.Json and the removal of the attribute-level annotation that treats the enums as strings. Note that there is a global config to treat enums as strings, but unfortunately that isn't working. I'm not very experienced with C# or System.Text.Json, so I'm not sure whether that is a bug or expected behaviour. The fix that I have come up with is to add the annotation [JsonConverter(typeof(JsonStringEnumConverter))] to the Availability attribute in AzureAttributes.cs and to the State attribute in InstancePoolInfo.cs.

Suggested fixes attached as well.
InstancePoolInfo.cs.txt
AzureAttributes.cs.txt
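
A minimal sketch of the annotation-level fix described above, with the type and property names taken from the issue text:

// Fix sketch: force string (de)serialization for the enum property, since
// the global enums-as-strings configuration was not taking effect here.
using System.Text.Json.Serialization;

public class AzureAttributes
{
    [JsonConverter(typeof(JsonStringEnumConverter))]
    public AzureAvailability? Availability { get; set; }
}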

ClusterAttributes RuntimeEngine does not get set when Photon engine is selected

When creating a new ClusterAttributes object, if the runtime_engine is set to Photon, it does not get serialized. I believe it is ignored in the ApiClient because of the JsonSerializerOptions, where DefaultIgnoreCondition is set to JsonIgnoreCondition.WhenWritingDefault: since RuntimeEngine.Photon has the value 0, it is treated as the default and skipped, while setting RuntimeEngine.Standard (1) works.

I think the RuntimeEngine enum can be made nullable, since the API documentation states that if left unspecified, the runtime engine is inferred from spark_version.

Also, autotermination_minutes could be made nullable; otherwise, when set to 0 it will be ignored by the serializer. Setting it explicitly to 0 would allow us to disable automatic termination (as per the documentation).

Thank you
