Introduction to Data Science Workshop Series

Teaching materials for KAUST Visualization Core Lab (KVL) Introduction to Data Science Workshop Series.

Course Curricula

Identifying Core Competencies for Data Science

According to a recent O’Reilly Data Science Survey, most data scientists use multiple programming languages daily. The top three programming languages used by data scientists are SQL, Python, and Bash. The ability to share and reproduce data science workflows is critical, whether those workflows provide decision support in industrial applications or generate novel insights from scientific data. Core tools for reproducible data science workflows include version control tools such as Git, virtual environment tools such as Conda, and container technologies such as Docker.

Building Data Science Capacity at KAUST

KVL has organized a series of Introduction to Data Science workshops to build capacity in the core data science tools and enable future data science applications at KAUST.

  • Introduction to Python for Data Science: TBD
  • Introduction to Conda for (Data) Scientists: TBD
  • Introduction to Shell for (Data) Scientists: TBD
  • Introduction to Version Control using Git for (Data) Scientists: TBD
  • Introduction to SQL for Data Science: TBD
  • Introduction to Docker for (Data) Scientists: TBD

The core workshop material largely follows curricula developed by Software Carpentry and Data Carpentry, two global nonprofit organizations that teach foundational coding and data science skills to researchers worldwide. The full curriculum will be offered every Fall and Spring semester to give KAUST students, post-docs, staff, and researchers an opportunity to develop their skills in these core data science tools.

KAUST Core Labs will offer a Certificate of Completion to those learners who complete the core Introduction to Data Science curriculum.

Helping to Advance the State-of-the-Art in Data Science at KAUST

In addition to building capacity in core data science tools, KVL and KAUST Supercomputing Core Laboratory (KSL) are planning to offer additional advanced training courses in tools used in state-of-the-art data science applications with a particular focus on enabling data science with GPUs.

Using Conda

Creating the Conda environment

After adding any necessary dependencies to the Conda environment.yml file, you can create the environment in a sub-directory of your project directory by running the following command.

$ conda env create --prefix ./env --file environment.yml

Once the new environment has been created, you can activate it with the following command.

$ conda activate ./env

Note that the env directory is not under version control, because it can always be re-created from the environment.yml file.
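
As a minimal sketch of the setup described above, the following commands write a small environment.yml and keep the env directory out of version control. The demo-project directory, the conda-forge channel, the listed packages, and the use of a .gitignore entry are all assumptions made for illustration; they are not taken from this repository.

```shell
# Hypothetical scratch project; substitute your own project directory.
mkdir -p demo-project

# Illustrative environment.yml (channel and packages are assumptions).
cat > demo-project/environment.yml <<'EOF'
channels:
  - conda-forge
dependencies:
  - python=3.11
  - jupyterlab
  - pandas
EOF

# Assumption: the env directory is excluded from Git with a .gitignore entry.
# The grep guard avoids adding the entry twice when the script is re-run.
grep -qx 'env/' demo-project/.gitignore 2>/dev/null || echo 'env/' >> demo-project/.gitignore
```

From inside demo-project, the conda env create and conda activate commands shown above would then build and activate the environment in ./env.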

Building JupyterLab extensions (optional)

If you wish to use any JupyterLab extensions included in the environment.yml file, activate the environment and rebuild the JupyterLab application by sourcing the postBuild script with the following commands.

$ conda activate ./env # optional if the environment is already active
(/path/to/env) $ . postBuild

Updating the Conda environment

If you add dependencies to, or remove them from, the environment.yml file after the environment has been created, you can update the environment with the following command. The --prune option removes installed packages that are no longer listed in the file.

$ conda env update --prefix ./env --file environment.yml --prune

Listing the full contents of the Conda environment

You can list the full contents of the Conda environment by running the following command.

$ conda list --prefix ./env
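
To check for a single package rather than reading the whole listing, the output can be filtered. The sketch below defines a hypothetical helper, has_package (not part of Conda), which assumes the usual conda list layout: comment lines starting with # followed by one package per line, with the package name in the first column. The sample listing uses illustrative values only.

```shell
# has_package NAME: read `conda list` output on stdin and succeed
# only if a package called NAME is listed (hypothetical helper).
has_package() {
    awk -v pkg="$1" '
        /^#/ { next }                  # skip the header comment lines
        $1 == pkg { found = 1 }        # first column is the package name
        END { exit !found }            # exit status 0 if found, 1 otherwise
    '
}

# Example with a canned listing, so the sketch runs without Conda installed.
sample='# packages in environment at /path/to/env:
#
# Name                    Version                   Build  Channel
numpy                     1.26.4                  py311_0    conda-forge'

if printf '%s\n' "$sample" | has_package numpy; then
    echo "numpy is installed"
fi
```

With a real environment, the same helper could be fed directly from Conda, for example: conda list --prefix ./env | has_package numpy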

Using Docker

To build Docker images for your project and run containers, you will need to install Docker and Docker Compose.

Detailed instructions for using Docker to build an image and launch containers can be found in docker/README.md.
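
Before following those instructions, it can help to confirm that both tools are on your PATH. The sketch below uses a hypothetical helper, check_tool, built on the standard command -v lookup. It assumes Docker Compose is installed as the standalone docker-compose executable; newer Docker installations may instead provide it as the docker compose subcommand, in which case the second check reports "not found" even though Compose is available.

```shell
# check_tool NAME: report whether NAME is available on PATH
# (hypothetical helper; returns non-zero when the tool is missing).
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# `|| true` keeps the script going so that both tools are always reported.
check_tool docker || true
check_tool docker-compose || true
```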

Contributors

davidrpugh
