Machine Learning with Azure Databricks

An easy-to-get-started collection of machine learning examples for Azure Databricks

Example Notebooks: HTML format, GitHub

Azure Databricks Reference Architecture - Machine Learning & Advanced Analytics

Key Benefits:

  • Built for enterprise with security, reliability, and scalability
  • End-to-end integration: data access (ADLS, SQL DW, EventHub, Kafka, etc.), data prep, feature engineering, single-node or distributed model building, MLOps with MLflow, and integration with AzureML, Synapse, and other Azure services
  • Delta Lake to set the data foundation with higher data quality, reliability and performance for downstream ML & AI use cases
  • ML Runtime Optimizations
    • Reliable and secure distribution of open source ML frameworks
    • Packages and optimizes most common ML frameworks
    • Built-in optimization for distributed deep learning
    • Built-in AutoML and Experiment tracking
    • Customized environments using conda for reproducibility
  • Distributed Machine Learning
    • Spark MLlib
    • Migrate single-node code to distributed execution with only a few lines of code changes:
      • Distributed hyperparameter search (Hyperopt, grid search)
      • PandasUDF to distribute models over different subsets of data or hyperparameters
      • Koalas: Pandas DataFrame API on Spark
    • Distributed Deep Learning training with Horovod
  • Use your own tools
    • Multiple languages in same Databricks notebooks (Python, R, Scala, SQL)
    • Databricks Connect: connect external tools to Azure Databricks (IDEs, RStudio, Jupyter, ...)
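
As an illustration of the "few lines of code changes" point above, here is a minimal sketch of a distributed hyperparameter search with Hyperopt. It is not taken from the example notebooks: the toy `objective` and the helper name `run_search` are assumptions made for this sketch, and a real objective would train and score a model. The only change from a single-node search is passing `SparkTrials` instead of the default trials object.

```python
def objective(params):
    """Toy loss to minimize; stands in for a real cross-validated model score."""
    x = params["x"]
    return (x - 3.0) ** 2


def run_search(max_evals=50, parallelism=4):
    """Run the search on a Spark cluster (requires hyperopt and pyspark)."""
    # Imported lazily so the objective above can be checked without a cluster.
    from hyperopt import SparkTrials, fmin, hp, tpe

    # Hypothetical one-dimensional search space for the toy objective.
    space = {"x": hp.uniform("x", -10.0, 10.0)}
    # SparkTrials evaluates trials on Spark workers; omitting it runs on one node.
    trials = SparkTrials(parallelism=parallelism)
    return fmin(
        fn=objective,
        space=space,
        algo=tpe.suggest,
        max_evals=max_evals,
        trials=trials,
    )
```

In a Databricks notebook attached to an ML Runtime cluster, calling `run_search()` should return the best parameters found, as a dictionary keyed by `"x"`.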

Machine Learning & MLOps Examples using Azure Databricks:

To review example notebooks below in HTML format: https://joelcthomas.github.io/ml-azuredatabricks/
To reproduce in a notebook, see instructions below.

Adding soon:

  • Single-node scikit-learn to distributed hyperparameter search using Hyperopt
  • Single-node pandas to distributed using Koalas
  • PandasUDF to distribute models over different subsets of data or hyperparameters
  • Using the Databricks automl-toolkit in Azure Databricks
  • Using AutoML from AzureML in Azure Databricks
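
Until those notebooks are available, the PandasUDF item can be sketched as follows. This is an illustrative assumption rather than the repository's code: the column names `group`, `x`, and `y` and the helpers `fit_line`, `fit_group`, and `fit_all_groups` are hypothetical. It uses `applyInPandas`, the grouped-map form available in Spark 3.0 and later; on Spark 2.4 the equivalent is a `GROUPED_MAP` `pandas_udf`.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept, in pure Python."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_group(pdf):
    """Fit one model per group; receives that group's rows as a pandas DataFrame."""
    import pandas as pd  # imported lazily; present on Databricks workers

    slope, intercept = fit_line(pdf["x"].tolist(), pdf["y"].tolist())
    return pd.DataFrame(
        {"group": [pdf["group"].iloc[0]], "slope": [slope], "intercept": [intercept]}
    )


def fit_all_groups(spark_df):
    """Distribute fit_group over the groups of a Spark DataFrame (Spark 3.0+)."""
    schema = "group string, slope double, intercept double"
    return spark_df.groupBy("group").applyInPandas(fit_group, schema=schema)
```

Each group is fitted independently on a worker, so the same pattern applies to training one model per store, device, or hyperparameter setting.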

Other:

MLflow

Overview of MLflow and its features
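
For orientation, a minimal sketch of MLflow experiment tracking is shown here. The helper names `rmse` and `log_run` are assumptions for this sketch; the MLflow calls used are `start_run`, `log_params`, and `log_metric`.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def log_run(params, y_true, y_pred):
    """Record hyperparameters and an evaluation metric as one MLflow run."""
    import mlflow  # imported lazily; preinstalled on the ML Runtime

    with mlflow.start_run():
        mlflow.log_params(params)  # dict of hyperparameter name -> value
        mlflow.log_metric("rmse", rmse(y_true, y_pred))
```

In a Databricks notebook, runs logged this way should appear in the notebook's experiment, where parameters and metrics can be compared across runs.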

How to run this example?

To reproduce the examples provided here, import the ml-azuredatabricks.dbc file from the root directory of the Git repository into your Databricks workspace.

Instructions on how to import notebooks in Databricks
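
The import can also be scripted instead of done through the UI. The sketch below assumes the Databricks Workspace REST API (`POST /api/2.0/workspace/import`) and a personal access token; the helper names, the workspace host, and the target path are placeholders, not values from this repository.

```python
import base64
import json
import urllib.request


def build_import_payload(dbc_bytes, workspace_path):
    """Build the JSON body for the workspace import call."""
    return {
        "path": workspace_path,  # target folder; should not already exist
        "format": "DBC",
        "content": base64.b64encode(dbc_bytes).decode("ascii"),
    }


def import_dbc(host, token, dbc_file, workspace_path):
    """Upload a .dbc archive; host looks like https://<workspace-url>."""
    with open(dbc_file, "rb") as f:
        payload = build_import_payload(f.read(), workspace_path)
    request = urllib.request.Request(
        f"{host}/api/2.0/workspace/import",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

A call such as `import_dbc(host, token, "ml-azuredatabricks.dbc", "/Shared/ml-azuredatabricks")` would place the notebooks under the given workspace folder; the folder name here is only an example.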

Setup Cluster

Create a cluster - https://docs.microsoft.com/en-us/azure/databricks/clusters/create
GPU enabled Clusters - https://docs.microsoft.com/en-us/azure/databricks/clusters/gpu
Install a library/package - https://docs.microsoft.com/en-us/azure/databricks/libraries
Machine Learning Runtime - https://docs.microsoft.com/en-us/azure/databricks/runtime/mlruntime
List of packages already available in each runtime - https://docs.microsoft.com/en-us/azure/databricks/release-notes/runtime/releases
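
If you prefer to script cluster creation, a cluster can be described as a JSON specification and sent to the Clusters REST API (`POST /api/2.0/clusters/create`). The sketch below only builds that specification. The helper name and all values in the usage note are placeholders chosen for illustration; check the runtime release notes linked above for the versions available in your workspace.

```python
import json


def build_cluster_spec(name, spark_version, node_type_id,
                       num_workers=2, autotermination_minutes=60):
    """Return a cluster specification for the Clusters API create call."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,  # runtime version key, e.g. an ML runtime
        "node_type_id": node_type_id,    # Azure VM size for driver and workers
        "num_workers": num_workers,
        "autotermination_minutes": autotermination_minutes,
    }


def to_json(spec):
    """Serialize the specification as the request body."""
    return json.dumps(spec, indent=2)
```

For example, `build_cluster_spec("ml-examples", "7.3.x-cpu-ml-scala2.12", "Standard_DS3_v2")` describes a small two-worker cluster on an ML runtime, assuming that runtime version and VM size exist in your region.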

Additional Information

For more information on using Azure Databricks
https://docs.microsoft.com/en-us/azure/azure-databricks/

Contributors

joelcthomas