Introduction to Data Science Workshop Series

Teaching materials for KAUST Visualization Core Lab (KVL) Introduction to Data Science Workshop Series.

Course Curricula

Identifying Core Competencies for Data Science

According to a recent O’Reilly Data Science Survey, most data scientists use multiple programming languages on a daily basis to solve their data science problems. The top three programming languages used by data scientists are SQL, Python, and Bash. The ability to share and reproduce data science workflows is critical, whether the workflows are providing decision support in industrial applications, or generating novel insights from scientific data. Core tools for facilitating reproducible data science workflows are version control tools such as Git, virtual environment tools such as Conda, and container technologies such as Docker.

Building Data Science Capacity at KAUST

KVL has organized a series of Introduction to Data Science workshops to build capacity in the core data science tools and enable future data science applications at KAUST.

Introduction to Python for Data Science: TBD
Introduction to Conda for (Data) Scientists: TBD
Introduction to Shell for (Data) Scientists: TBD
Introduction to Version Control using Git for (Data) Scientists: TBD
Introduction to SQL for Data Science: TBD
Introduction to Docker for (Data) Scientists: TBD

The core workshop material largely follows a curriculum developed by Software and Data Carpentry, two global nonprofit organizations that teach foundational coding and data science skills to researchers worldwide. The curriculum will be offered every Fall and Spring semester in its entirety in order to provide KAUST students, post-docs, staff, and researchers with an opportunity to develop their skills in these core data science tools.

KAUST Core Labs will offer a Certificate of Completion to those learners who complete the core Introduction to Data Science curriculum.

Helping to Advance the State-of-the-Art in Data Science at KAUST

In addition to building capacity in core data science tools, KVL and KAUST Supercomputing Core Laboratory (KSL) are planning to offer additional advanced training courses in tools used in state-of-the-art data science applications with a particular focus on enabling data science with GPUs.

Using Conda

Creating the Conda environment

After adding any necessary dependencies to the Conda environment.yml file, you can create the environment in a sub-directory of your project directory by running the following command.

$ conda env create --prefix ./env --file environment.yml
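For readers who have not written an environment.yml file before, the sketch below writes a minimal one that the command above could consume. The channel and the package names are illustrative assumptions, not the workshop's actual dependency list. The name field is left as null on the assumption that the environment's location is always supplied with --prefix.

```shell
# Write a minimal, hypothetical environment.yml into the current directory.
# The channel and packages below are placeholders chosen for illustration;
# replace them with the dependencies your project actually needs.
cat > environment.yml <<'EOF'
name: null

channels:
  - conda-forge

dependencies:
  - python
  - jupyterlab
  - pandas
EOF
```

Be aware that running this overwrites any existing environment.yml in the current directory, so treat it as a starting template rather than something to run inside an established project.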

Once the new environment has been created you can activate the environment with the following command.

$ conda activate ./env
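A quick way to confirm that the activation took effect is to inspect the CONDA_PREFIX variable, which conda sets to the path of the active environment. The following is a minimal sketch; the function name project_env_is_active is hypothetical, and the comparison assumes that CONDA_PREFIX holds the same absolute path that pwd reports (it may not if the project directory is reached through a symbolic link).

```shell
# Succeeds only when the active Conda environment is ./env in the current project.
# Assumes `conda activate` has exported CONDA_PREFIX as an absolute path.
project_env_is_active() {
    [ "${CONDA_PREFIX:-}" = "$(pwd)/env" ]
}

if project_env_is_active; then
    echo "project environment is active"
else
    echo "project environment is not active"
fi
```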

Note that the env directory is not under version control as it can always be re-created from the environment.yml file as necessary.
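If you are setting up a similar project of your own, one common way to keep the env directory out of version control is a .gitignore entry. The sketch below assumes the project uses Git and that the environment lives in ./env; it adds the entry only when it is not already present, so it is safe to run more than once.

```shell
# Add env/ to .gitignore only if that exact line is not already listed.
touch .gitignore
if ! grep -qxF 'env/' .gitignore; then
    echo 'env/' >> .gitignore
fi
```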

Building JupyterLab extensions (optional)

If you wish to use any JupyterLab extensions included in the environment.yml file, activate the environment and then rebuild the JupyterLab application by sourcing the postBuild script with the following commands.

$ conda activate ./env # optional if environment already active
(/path/to/env) $ . postBuild

Updating the Conda environment

If you add dependencies to, or remove dependencies from, the environment.yml file after the environment has been created, you can update the environment with the following command. The --prune option removes installed packages that are no longer listed in the file.

$ conda env update --prefix ./env --file environment.yml --prune

Listing the full contents of the Conda environment

You can list the full contents of the Conda environment by running the following command.

$ conda list --prefix ./env

Using Docker

In order to build Docker images for your project and run containers you will need to install Docker and Docker Compose.
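Before building any images, it can help to confirm that both tools are reachable from your shell. The following is a minimal sketch; the helper name check_tool is hypothetical. Note that newer Docker installations ship Compose as the "docker compose" subcommand, so a missing docker-compose executable does not necessarily mean that Compose is unavailable.

```shell
# Report whether a command is available on PATH.
# Prints "<name>: found" and succeeds, or prints "<name>: not found" and fails.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
        return 1
    fi
}

# "|| true" keeps the script going so that both tools are always reported.
check_tool docker || true
check_tool docker-compose || true
```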

Detailed instructions for using Docker to build an image and launch containers can be found in the docker/README.md.
