databricks-rest-client's People

Contributors

bolshem, cornelcreanga, dependabot[bot], javamonkey79, jeff303, joongho, jyothikomm, reillydj, samshuster, seregasheypak, ssh-parity, techpavan

databricks-rest-client's Issues

pass tokens in api calls

One suggestion is to pass tokens in the API calls. This is the recommended way of authenticating, and it also opens the API up to Azure Databricks users.
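For illustration, token authentication boils down to sending the token as a standard Bearer header on each request. A minimal JDK-only sketch (the host, endpoint, and method name below are placeholders, not the library's API):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class TokenAuthSketch {
    // Build a request carrying a Databricks personal access token as a
    // Bearer header. Host and endpoint here are illustrative.
    static HttpRequest withToken(String host, String token) {
        return HttpRequest.newBuilder(URI.create("https://" + host + "/api/2.0/clusters/list"))
                .header("Authorization", "Bearer " + token)
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest r = withToken("example.cloud.databricks.com", "dapiXXXX");
        System.out.println(r.headers().firstValue("Authorization").orElse(""));
    }
}
```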

Add support for InstanceProfiles API

The Databricks REST API has an Instance Profiles endpoint, allowing for the creation and deletion of instance profiles in Databricks. This issue covers creating the InstanceProfilesService class and its implementation.

more runnable classes

It might be good to have more runnable classes with main methods that wrap service calls, or even a generic runner class that takes the service as a CLI argument.

Change the initScripts field in NewClusterDTO to be an array

One of our processes failed with an error: com.fasterxml.jackson.databind.JsonMappingException: Can not deserialize instance of com.edmunds.rest.databricks.DTO.InitScriptInfoDTO out of START_ARRAY token

According to the Databricks API documentation, "init_scripts" is an array of InitScriptInfo. Change the initScripts field in NewClusterDTO to InitScriptInfoDTO[] to align with the Databricks API requirements.
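A minimal sketch of the proposed field shape (the stub class here stands in for the real InitScriptInfoDTO, and the class name is illustrative):

```java
// Stub standing in for the real DTO so the snippet is self-contained.
class InitScriptInfoDTO { }

public class NewClusterSketch {
    // "init_scripts" is an array in the Databricks API, so the Java field
    // should be an array (or List) rather than a single object, letting
    // Jackson deserialize a START_ARRAY token without error.
    public InitScriptInfoDTO[] initScripts;

    public static void main(String[] args) {
        NewClusterSketch c = new NewClusterSketch();
        c.initScripts = new InitScriptInfoDTO[] { new InitScriptInfoDTO() };
        System.out.println(c.initScripts.length);
    }
}
```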

Bring ClusterService interface up-to-date

There are a few new methods available in Databricks' Cluster API.

Goal:
Add support for the following methods to the ClusterService

  1. pin
  2. unpin
  3. list node types
  4. list zones
  5. spark versions
  6. permanent delete

Accessing databricks from behind a proxy

Many enterprises use web proxies.

Unfortunately, it seems that the databricks-rest-client library cannot be used in such an environment because of its use of Apache HttpClient. HttpClient has a long-known issue whereby the standard Java system properties relating to proxy setup (http.proxyHost, http.proxyPort, https.proxyHost, https.proxyPort and http.nonProxyHosts) are ignored by default.

The suggested solution is to call useSystemProperties() on the HttpClientBuilder used to create the client.

This (I think!) would require a small change to the first line of DatabricksRestClientImpl.initClient(...)

    // HttpClientBuilder clientBuilder = HttpClients.custom()
    HttpClientBuilder clientBuilder = HttpClientBuilder.create().useSystemProperties() // pick up Java proxy settings

However, I'm a little uncomfortable submitting a pull request for this change, as I'm not sure whether useSystemProperties() will stomp on other aspects of the builder configuration that are important for communicating with a Databricks endpoint. Or maybe there is a workaround for this issue that I haven't spotted.

(Btw, I'm using this library through the databricks-maven-plugin, and our build/CI system is secured behind a proxy, which I think is a fairly common set-up.)
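For reference, the JVM proxy properties that useSystemProperties() would honor can be read directly; this hand-rolled sketch only illustrates the mechanism (which properties are consulted and how they map to a proxy address) and is not the library's code:

```java
import java.net.InetSocketAddress;
import java.net.Proxy;

public class ProxySketch {
    // Mirrors the standard JVM proxy properties that
    // HttpClientBuilder.useSystemProperties() picks up. Values used in
    // main() are illustrative only.
    static Proxy fromSystemProperties() {
        String host = System.getProperty("http.proxyHost");
        String port = System.getProperty("http.proxyPort", "80");
        if (host == null) {
            return Proxy.NO_PROXY; // no proxy configured
        }
        return new Proxy(Proxy.Type.HTTP, new InetSocketAddress(host, Integer.parseInt(port)));
    }

    public static void main(String[] args) {
        System.setProperty("http.proxyHost", "proxy.example.com");
        System.setProperty("http.proxyPort", "3128");
        System.out.println(fromSystemProperties());
    }
}
```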

Allow use of custom HttpRequestExecutor in HttpClient

Hello, we are currently using the library to run jobs in our production environment. We need to collect metrics around requests to the Databricks API, and right now we use a custom class extending HttpRequestExecutor to accomplish this. We had to implement our own extension of DatabricksRestClientImpl in order to pass our custom HttpRequestExecutor to the HttpClientBuilder, which led to a lot of code duplicated from the AbstractDatabricksRestClient#initClient method, since the builder is not accessible before initialization. We would like a way to pass this executor to the builder and are willing to contribute to the project to make it happen.

My first thought was to add this executor to the DatabricksServiceFactory.Builder class so that it could be set in the initClient method, but it does not seem to fit the pattern of all the values in the builder being primitives or strings. Right now I do not see a pattern for passing custom parameters to the HttpClientBuilder. Is there a preferred way to accomplish this that I could work on?

Here is a snippet of the workaround we have implemented:

    public CustomDatabricksRestClient(DatabricksServiceFactory.Builder builder, ...) {
        super(builder);
        initClientWithExecutor(builder, ...);
    }

    @Override
    protected void initClient(DatabricksServiceFactory.Builder builder) {
        // No-op init
    }

    private void initClientWithExecutor(DatabricksServiceFactory.Builder builder, ...) {
        CustomHttpRequestExecutor customHttpRequestExecutor = new CustomHttpRequestExecutor(...);

        HttpClientBuilder clientBuilder = HttpClients.custom().useSystemProperties()
                .setRetryHandler(retryHandler)
                .setServiceUnavailableRetryStrategy(retryStrategy)
                .setRequestExecutor(customHttpRequestExecutor)
                .setDefaultRequestConfig(createRequestConfig(builder));

        List<Header> headers = new ArrayList<>();
        if (StringUtils.isNotEmpty(builder.getToken())) {
            Header authHeader = new BasicHeader("Authorization", String.format("Bearer %s", builder.getToken()));
            headers.add(authHeader);
        }

        String userAgent = builder.getUserAgent();
        if (userAgent != null && userAgent.length() > 0) {
            Header userAgentHeader = new BasicHeader("User-Agent", userAgent);
            headers.add(userAgentHeader);
        }

        if (!headers.isEmpty()) {
            clientBuilder.setDefaultHeaders(headers);
        }

        try {
            SSLContext ctx = SSLContext.getDefault();
            // Allow TLSv1.2 protocol only
            SSLConnectionSocketFactory sslsf = new SSLConnectionSocketFactory(
                    ctx,
                    new String[]{"TLSv1.2"},
                    null,
                    SSLConnectionSocketFactory.getDefaultHostnameVerifier());
            clientBuilder = clientBuilder.setSSLSocketFactory(sslsf);
        } catch (Exception e) {
            _log.error("", e);
        }

        client = clientBuilder.build(); //CloseableHttpClient

        url = String.format("https://%s/api/%s", host, apiVersion);
        mapper = new ObjectMapper().setSerializationInclusion(JsonInclude.Include.NON_DEFAULT);
    }

Add support for jobs API 2.1 (multitask jobs)

Currently, it is not possible to configure multiple tasks for one job. However, Databricks Jobs API 2.1 allows that, and it is a feature we would like to use when configuring jobs programmatically with this library. Is it possible to add support for it?
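For reference, a Jobs API 2.1 job settings payload expresses multiple tasks as a tasks array, with dependencies declared via task_key (all names, paths, and ids below are illustrative):

```json
{
  "name": "multitask-example",
  "tasks": [
    {
      "task_key": "ingest",
      "notebook_task": { "notebook_path": "/Jobs/ingest" },
      "existing_cluster_id": "1234-567890-abcde123"
    },
    {
      "task_key": "transform",
      "depends_on": [ { "task_key": "ingest" } ],
      "notebook_task": { "notebook_path": "/Jobs/transform" },
      "existing_cluster_id": "1234-567890-abcde123"
    }
  ]
}
```

Supporting this would mostly mean adding the tasks and depends_on structures to the job settings DTOs.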

Cleanup / organize DTOs

There are plenty of DTOs in the project. Do we really need all of them? If so, maybe we could add Javadoc to these classes and organize them into separate packages.

databricks-rest-client:3.0.6 - java.lang.NoSuchMethodError

databricks-rest-client:3.0.6 upgraded log4j to 2.17.2; however, this introduced a new issue:

Exception in thread "main" java.lang.NoSuchMethodError: 'java.lang.ClassLoader org.apache.logging.log4j.util.StackLocatorUtil.getCallerClassLoader(int)'
	at org.apache.log4j.Logger.getLogger(Logger.java:40)
	at com.edmunds.rest.databricks.restclient.DefaultHttpClientBuilderFactory.<clinit>(DefaultHttpClientBuilderFactory.java:44)
	at com.edmunds.rest.databricks.DatabricksServiceFactory$Builder.build(DatabricksServiceFactory.java:352)
	at X.X.clients.Databricks.ServiceFactoryByToken(Databricks.java:26)
	at VincentApp.main(VincentApp.java:26)

DatabricksRestClientTest only tests Password Authenticated Client

The DataProvider(name = "Clients") provides three references to the same client (the password-authenticated client), since the DatabricksFixtures class only holds one client reference.

The goal is to have the DataProvider provide references to each of the three different kinds of clients.

Add mvn-checkstyle

In order to ensure that the project keeps consistent formatting, we need checkstyle as part of the build.

In terms of the checkstyle.xml used, we should determine whether to use Edmunds.com's checkstyle.xml or a different one.

Add support for new object types introduced in Workspace List API

Recently, the workspace list API has started returning new object types, FILE and REPO (https://docs.databricks.com/dev-tools/api/latest/workspace.html#objecttype); FILE is not listed there yet, but we are seeing it in our API responses.
As far as I can see, this library supports only three object types, so I would like to add support for the new ones.
I can contribute this myself, but I wanted to report the issue and get opinions before moving forward.


Adopt checkstyle settings for a modern plugin version

The settings file (checkstyle/google-idea-checkstyle.xml) is incompatible with the checkstyle plugin versions available for the latest IntelliJ IDEA releases.

Acceptance criteria

google-idea-checkstyle.xml can be imported from modern IntelliJ IDEA versions

Jackson dependencies - NoClassDefFoundError

When using databricks-rest-client, I am getting:

java.lang.NoClassDefFoundError: com/fasterxml/jackson/annotation/JsonMerge

Inspecting dependencies, I see that

[INFO] +- com.edmunds:databricks-rest-client:jar:2.1.2:compile
[INFO] | +- com.fasterxml.jackson.core:jackson-databind:jar:2.9.7:compile
[INFO] | | +- com.fasterxml.jackson.core:jackson-annotations:jar:2.6.0:compile
[INFO] | | \- com.fasterxml.jackson.core:jackson-core:jar:2.9.7:compile

(I know that the latest version of databricks-rest-client is 2.2.x, but jackson dependencies didn't change)

It pulls in jackson-databind 2.9.7, which for some reason pulls in an older version of jackson-annotations (2.6.0).

jackson-annotations 2.6.0 does not have JsonMerge:
https://github.com/FasterXML/jackson-annotations/blob/2.6/src/main/java/com/fasterxml/jackson/annotation/JsonMerge.java
whereas the latest version does:
https://github.com/FasterXML/jackson-annotations/blob/ab01a57066d441ed4eda8719808de2f39f094973/src/main/java/com/fasterxml/jackson/annotation/JsonMerge.java

Googling the issue turned up this:
FasterXML/jackson-annotations#119
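Assuming a Maven consumer, a common workaround is to declare jackson-annotations explicitly in the consuming project so its version matches jackson-databind's (the version below is illustrative, chosen to match the 2.9.7 databind shown above):

```xml
<!-- Illustrative pin in the consuming project's pom.xml -->
<dependency>
  <groupId>com.fasterxml.jackson.core</groupId>
  <artifactId>jackson-annotations</artifactId>
  <version>2.9.7</version>
</dependency>
```

The cleaner long-term fix would be for databricks-rest-client itself to depend on aligned Jackson component versions.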

Add support for Groups API

Databricks Rest API has a Groups endpoint, allowing for creation and deletion of databricks groups. This issue would create the GroupsService class and implementation.

Add more documentation for checkstyle guidelines

The documentation for setting up the Google-standard checkstyle could use some beefing up. Specifically, add notes (for IntelliJ users) on installing the correct plugin, importing the Google standards, and running the checkstyle code scan.

Clean up JobServiceTest and JobRunnerTests

Right now there is code repetition between the two classes.
The tests have also become a bit sloppy and could be cleaned up.

Goal would be to:

  1. Abstract away the commonalities between the two.
  2. Clean up the tests.

Update TerminationCodeDTO according to the latest API spec

Some of the codes specified at https://docs.databricks.com/dev-tools/api/latest/clusters.html#clusterterminationreasonterminationcode are not reflected in TerminationCodeDTO, which causes a deserialization error (*) and makes the cluster API unusable.

(*)
Example:
Cannot deserialize value of type com.edmunds.rest.databricks.DTO.clusters.TerminationCodeDTO from String "SPARK_ERROR": not one of the values accepted for Enum class: [CLOUD_PROVIDER_LAUNCH_FAILURE, INIT_SCRIPT_FAILURE, INTERNAL_ERROR, INSTANCE_UNREACHABLE, INSTANCE_POOL_CLUSTER_FAILURE, COMMUNICATION_LOST, REQUEST_REJECTED, INACTIVITY, CLOUD_PROVIDER_SHUTDOWN, USER_REQUEST, TRIAL_EXPIRED, INVALID_ARGUMENT, SPARK_STARTUP_FAILURE, UNEXPECTED_LAUNCH_FAILURE, JOB_FINISHED]

#77

@samshuster

create toString methods for databricks DTO classes

Narrative

As an engineer working with the databricks rest client
I'd like meaningful toString methods on all java bean classes
Such that I can easily work with and debug these classes

Implementation Details

Nice to have: use jackson to create toString for DTOs, to match up with annotations, e.g.

    // ObjectMapper could come from a constant
    return new ObjectMapper().writeValueAsString(this);

Alternatively, use ReflectionToStringBuilder from commons-lang:

    return ReflectionToStringBuilder.toString(this);
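For comparison, the idea can be shown with a hand-rolled reflective toString in plain JDK; this is only an illustration of what the generated output would look like, since the issue itself suggests Jackson or commons-lang (the ClusterInfo class below is a made-up example bean):

```java
import java.lang.reflect.Field;
import java.util.StringJoiner;

public class ToStringSketch {
    // Reflectively list all declared fields as name=value pairs.
    static String describe(Object o) throws IllegalAccessException {
        StringJoiner out = new StringJoiner(", ", o.getClass().getSimpleName() + "{", "}");
        for (Field f : o.getClass().getDeclaredFields()) {
            f.setAccessible(true);
            out.add(f.getName() + "=" + f.get(o));
        }
        return out.toString();
    }

    // Made-up bean standing in for a DTO.
    static class ClusterInfo {
        String clusterId = "abc-123";
        int numWorkers = 2;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(describe(new ClusterInfo()));
    }
}
```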

Update (again) TerminationCodeDTO

A new Azure code, GLOBAL_INIT_SCRIPT_FAILURE, was added; it is not present in the documentation yet, and Jackson deserialization will fail on it.

In the future, I think a better solution would be to map unknown enum values to null (com.fasterxml.jackson.databind.DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_AS_NULL) instead of throwing an error. The Azure API seems to change often, and it will be hard to keep pace.
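For illustration, enabling that Jackson feature (mapper.enable(DeserializationFeature.READ_UNKNOWN_ENUM_VALUES_AS_NULL)) makes unknown enum strings deserialize to null; the hand-rolled equivalent below sketches the same behavior with a trimmed-down stand-in enum:

```java
public class EnumSketch {
    // Trimmed-down stand-in for TerminationCodeDTO's enum values.
    enum TerminationCode { USER_REQUEST, JOB_FINISHED, SPARK_STARTUP_FAILURE }

    // Same effect as Jackson's READ_UNKNOWN_ENUM_VALUES_AS_NULL: codes
    // added by a newer API version map to null instead of throwing.
    static TerminationCode parseLenient(String raw) {
        try {
            return TerminationCode.valueOf(raw);
        } catch (IllegalArgumentException e) {
            return null; // unknown code, e.g. from a newer Azure release
        }
    }

    public static void main(String[] args) {
        System.out.println(parseLenient("USER_REQUEST"));
        System.out.println(parseLenient("GLOBAL_INIT_SCRIPT_FAILURE"));
    }
}
```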

#83

@samshuster

ClusterService should have an "upsert" method and it should take ClusterAttributesDTO as parameters

The purpose of this story is to make the ClusterService easier to use.
However, we should keep it backwards compatible for the time being.

Requirement 1

ClusterService currently has create and edit methods but no upsert method ("upsert" here meaning: create the cluster if it doesn't exist, or update its configuration if it does), which is a very useful piece of logic to have in the library.

Requirement 2

In addition, we should offer methods in ClusterService that take a ClusterAttributesDTO, which can be deserialized directly from JSON, instead of forcing users to construct CreateClusterRequest objects.

For an example look at the JobService which does not use Request objects anymore.

I think the old CreateClusterRequest methods should be marked as deprecated.
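The requested upsert could be sketched as follows; the service interface and its method names here are hypothetical stand-ins, not the library's actual signatures:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class UpsertSketch {
    // Hypothetical, trimmed-down service; findByName/create/edit only
    // sketch the requested logic.
    interface ClusterService {
        Optional<String> findByName(String name);             // -> cluster id
        String create(String name, Map<String, String> attrs);
        void edit(String clusterId, Map<String, String> attrs);
    }

    // Upsert: edit the cluster if it already exists, otherwise create it.
    static String upsert(ClusterService svc, String name, Map<String, String> attrs) {
        Optional<String> existing = svc.findByName(name);
        if (existing.isPresent()) {
            svc.edit(existing.get(), attrs);
            return existing.get();
        }
        return svc.create(name, attrs);
    }

    public static void main(String[] args) {
        Map<String, String> clusters = new HashMap<>();
        // In-memory fake so the sketch runs without a real workspace.
        ClusterService fake = new ClusterService() {
            public Optional<String> findByName(String n) {
                return clusters.containsKey(n) ? Optional.of("id-" + n) : Optional.empty();
            }
            public String create(String n, Map<String, String> a) {
                clusters.put(n, "created");
                return "id-" + n;
            }
            public void edit(String id, Map<String, String> a) { }
        };
        System.out.println(upsert(fake, "etl", Map.of())); // creates
        System.out.println(upsert(fake, "etl", Map.of())); // edits existing
    }
}
```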

Upgrade log4j from 1.2.17 to latest

Apache Log4j2 versions 2.0-beta7 through 2.17.0 (excluding security fix releases 2.3.2 and 2.12.4) are vulnerable to a remote code execution (RCE) attack where an attacker with permission to modify the logging configuration file can construct a malicious configuration using a JDBC Appender with a data source referencing a JNDI URI which can execute remote code. This issue is fixed by limiting JNDI data source names to the java protocol in Log4j2 versions 2.17.1, 2.12.4, and 2.3.2.

Group Integration Test Not Working

As of the last couple of months, the group API appears to function differently than it used to. The functionality is not necessarily broken for users, but the integration test is broken unless the test user has account API access.

This story would be to examine how the test could be improved.

Update (again) TerminationCodeDTO according to the latest API spec

New codes were added (AZURE_RESOURCE_PROVIDER_THROTTLING, AZURE_RESOURCE_MANAGER_THROTTLING, NETWORK_CONFIGURATION_FAILURE), and Jackson deserialization will fail for these cases.

The codes were only added to the Microsoft specification (https://docs.microsoft.com/en-us/azure/databricks/dev-tools/api/latest/clusters#--terminationcode); the Databricks documentation (https://docs.databricks.com/dev-tools/api/latest/clusters.html#clusterterminationreasonterminationcode) seems to be lagging behind.

Update README for recent issues

Groups API and InstanceProfiles API were recently implemented, but the README was not updated to reflect that functionality.

Goal of this issue is to update the README to show that those two APIs are available for use.

Please remove log4j.xml from the project

Please remove log4j.xml from the project. There is no reason for a project that is going to be included as a library in another one to define its own log4j configuration file. If there are multiple log4j files on the classpath, the classloader will just pick one of them arbitrarily.

Update instance types used in ClusterServiceTest

The instance types specified in the ClusterServiceTest integration tests have been deprecated by Databricks. It is probably best to update them to an instance type that is still supported.

I'd suggest changing r3.xlarge to m4.large where applicable.
