
vscode-development-containers's Introduction

Pyspark/Python Development Containers

This repository contains a set of configurations for creating PySpark and Python development environments in VSCode using Docker containers. PySpark development supports two strategies: running Spark in local mode or in standalone mode.

VSCode Development Environments

| Environment | Configurations | Spark Mode |
|---|---|---|
| Python | python-devenv-wks | - |
| Pyspark | pyspark-devenv-local-wks | Local |
| Pyspark | pyspark-devenv-standalone-wks | Standalone |
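
From the application's point of view, the two Spark modes differ mainly in the master URL that a session connects to. The sketch below illustrates that difference. The function name and the `spark-master` host name are hypothetical and not taken from this repository; `local[N]` and `spark://host:port` are the standard Spark master URL forms, and 7077 is Spark's default standalone master port.

```python
def spark_master_url(mode: str, host: str = "spark-master",
                     port: int = 7077, cores: str = "*") -> str:
    """Return a Spark master URL for the given mode.

    `host` is a placeholder: the real master host name depends on how the
    standalone container is configured, which this sketch does not know.
    """
    if mode == "local":
        # Local mode: driver and executors run inside a single JVM.
        return f"local[{cores}]"
    if mode == "standalone":
        # Standalone mode: connect to a separate master process.
        return f"spark://{host}:{port}"
    raise ValueError(f"unknown Spark mode: {mode!r}")


print(spark_master_url("local", cores="2"))
print(spark_master_url("standalone"))
```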

Requirements

Installation

To get started, follow these steps:

  1. Install Visual Studio Code (VSCode).
  2. Install and configure Docker for your operating system.
  3. Install the Remote Development extension pack.

Install development environment

  1. Download the devcontainer files (a zip archive) for the desired VSCode development environment, and unzip them to a location of your choice.

  2. If needed, configure the environment variables (see section Environment Variables).

    • Rename the file .env_template inside the folder .devcontainer to .env.
    • Add or modify the environment variables with the values that best fit your needs.
  3. Start VSCode.

  4. Open View → Command Palette and search for Remote-Containers: Open Folder in Container....

  5. Select Remote-Containers: Open Folder in Container...

  6. Select the folder pyspark-devenv-{spark mode}-wks or python-devenv-wks, which contains the .devcontainer folder.

  7. Leave the process running until the installation is complete.
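
Step 2 above can also be scripted. The following is a minimal sketch, assuming the template uses plain `KEY=VALUE` lines (the usual format for Docker `.env` files; the actual contents of `.env_template` are not shown here). The function name `prepare_env_file` is hypothetical. It copies the template instead of renaming it, so the original stays available for reference.

```python
import shutil
from pathlib import Path
from typing import Dict, Optional


def prepare_env_file(workspace: Path,
                     overrides: Optional[Dict[str, str]] = None) -> Path:
    """Create .devcontainer/.env from .env_template and apply overrides."""
    devcontainer = Path(workspace) / ".devcontainer"
    env_file = devcontainer / ".env"
    if not env_file.exists():
        # Copy rather than rename so the template is preserved.
        shutil.copyfile(devcontainer / ".env_template", env_file)

    remaining = dict(overrides or {})
    updated = []
    for line in env_file.read_text().splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in remaining:
            # Replace the value of a variable that is already present.
            updated.append(f"{key}={remaining.pop(key)}")
        else:
            updated.append(line)
    # Append variables that were not present in the template.
    updated.extend(f"{key}={value}" for key, value in remaining.items())
    env_file.write_text("\n".join(updated) + "\n")
    return env_file
```

For example, `prepare_env_file(Path("pyspark-devenv-local-wks"), {"JUPYTER_PORT": "9999"})` would leave every other line of the template unchanged.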

Environment Variables

Common environment variables:

| Variable | Description | Default Value |
|---|---|---|
| JUPYTER_PORT | Port used to access the Jupyter environment | 8888 |
| GIT_EMAIL | Email address used to configure git | default |
| GIT_USERNAME | Username used to configure git | default |
| DISABLE_JUPYTER | Set to 1 to disable the Jupyter environment | 0 |
| JUPYTER_ALLOW_ORIGIN | Origin address allowed to access your Jupyter server | 0.0.0.0 |
| JUPYTER_PASSWORD | Hashed password used to access your Jupyter server | hashed string of "devuser" |
| TAG | Tag of the Docker image for the development environment (see Tags below) | latest |
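
JUPYTER_PASSWORD expects a hashed value, not the plain password. Which hash format the container accepts is not stated here, so treat the following as a sketch under an assumption: it produces the legacy salted `algorithm:salt:digest` form that Jupyter's own `passwd` helper has historically generated (newer Jupyter releases default to argon2). The function names are illustrative. If Jupyter is installed, its own password helper is the safer choice.

```python
import hashlib
import secrets


def hash_password(passphrase: str, algorithm: str = "sha1",
                  salt_len: int = 12) -> str:
    """Return 'algorithm:salt:digest' for the given passphrase.

    Assumption: the server accepts the legacy salted-hash format.
    """
    salt = secrets.token_hex(salt_len // 2)  # salt_len hex characters
    data = passphrase.encode("utf-8") + salt.encode("ascii")
    digest = hashlib.new(algorithm, data).hexdigest()
    return f"{algorithm}:{salt}:{digest}"


def check_password(hashed: str, passphrase: str) -> bool:
    """Verify a passphrase against a value made by hash_password."""
    algorithm, salt, digest = hashed.split(":", 2)
    data = passphrase.encode("utf-8") + salt.encode("ascii")
    expected = hashlib.new(algorithm, data).hexdigest()
    # Constant-time comparison avoids leaking where the digests differ.
    return secrets.compare_digest(expected, digest)


hashed = hash_password("devuser")
print(hashed.split(":")[0], check_password(hashed, "devuser"))
```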

Pyspark environment variables:

Both modes

| Variable | Description | Default Value |
|---|---|---|
| JUPYTER_SPARk_MEMORY | Amount of memory that the Spark instance used by Jupyter is allowed to consume | 2g |
| JUPYTER_SPARK_CORES | Number of CPU cores that the Spark instance used by Jupyter is allowed to use | 2 |
| SPARK_EXECUTOR_MEMORY | Amount of memory that a Spark executor can consume | 2g |
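
How the containers translate these variables into Spark settings is not documented here. The sketch below shows one plausible mapping onto standard Spark properties; the mapping itself and the function name `jupyter_spark_conf` are assumptions. The variable names are copied as they appear in the table above, and the defaults are the documented ones.

```python
import os
from typing import Dict, Mapping, Optional


def jupyter_spark_conf(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Build Spark properties from the documented environment variables.

    The variable-to-property mapping is an assumption made for illustration.
    """
    env = os.environ if env is None else env
    return {
        # Variable name spelled exactly as in the table above.
        "spark.driver.memory": env.get("JUPYTER_SPARk_MEMORY", "2g"),
        "spark.cores.max": env.get("JUPYTER_SPARK_CORES", "2"),
        "spark.executor.memory": env.get("SPARK_EXECUTOR_MEMORY", "2g"),
    }


for key, value in jupyter_spark_conf({}).items():
    print(key, value)
```

With PySpark installed, each pair could be passed to `SparkSession.builder.config(key, value)` before calling `getOrCreate()`.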

Standalone Mode

| Variable | Description | Default Value |
|---|---|---|
| SPARK_WORKER_CORES | Number of CPU cores that a Spark worker can use | 2 |
| SPARK_WORKER_MEMORY | Amount of memory that a Spark worker can use | 4g |
| HISTORY_CLEANER_INTERVAL | How often the filesystem job history cleaner checks for files to delete | 1d |
| HISTORY_MAX_AGE | History files older than this value are deleted when the filesystem history cleaner runs | 7d |
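
The two history variables match the descriptions and defaults of Spark's `spark.history.fs.cleaner.interval` and `spark.history.fs.cleaner.maxAge` properties. Assuming that is how the standalone container uses them (not confirmed here), the corresponding `spark-defaults.conf` lines could be rendered as in this sketch; the function name is hypothetical.

```python
import os
from typing import List, Mapping, Optional


def history_cleaner_conf(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Render spark-defaults.conf lines for the history cleaner.

    Assumes the variables map onto Spark's history-server cleaner properties.
    """
    env = os.environ if env is None else env
    props = {
        # The cleaner only runs when it is explicitly enabled.
        "spark.history.fs.cleaner.enabled": "true",
        "spark.history.fs.cleaner.interval": env.get("HISTORY_CLEANER_INTERVAL", "1d"),
        "spark.history.fs.cleaner.maxAge": env.get("HISTORY_MAX_AGE", "7d"),
    }
    # spark-defaults.conf separates key and value with whitespace.
    return [f"{key} {value}" for key, value in props.items()]


print("\n".join(history_cleaner_conf({"HISTORY_MAX_AGE": "3d"})))
```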

Tags

| python-devenv-wks |
|---|
| 3.10 |
| 3.8 |

| pyspark-devenv-local-wks/pyspark-devenv-standalone-wks | Python version | Spark version |
|---|---|---|
| 3.10-3.4.0 (latest) | 3.10 | 3.4.0 |
| 3.10-3.3.2 | 3.10 | 3.3.2 |
| 3.8-3.2.1 | 3.8 | 3.2.1 |
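
The tags listed above follow the pattern `{python version}` for the Python images and `{python version}-{spark version}` for the PySpark images. A small helper can split a TAG value into its parts, for example to check it before writing it to `.env`. This is an illustrative sketch; `ImageTag` and `parse_tag` are hypothetical names.

```python
from typing import NamedTuple, Optional


class ImageTag(NamedTuple):
    python_version: str
    spark_version: Optional[str]  # None for the Python-only images


def parse_tag(tag: str) -> ImageTag:
    """Split a tag such as '3.10-3.4.0' into Python and Spark versions."""
    if tag == "latest":
        # 'latest' is an alias and carries no version information itself.
        raise ValueError("'latest' does not encode versions")
    python_version, sep, spark_version = tag.partition("-")
    return ImageTag(python_version, spark_version if sep else None)


print(parse_tag("3.10-3.4.0"))
print(parse_tag("3.8"))
```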

vscode-development-containers's People

Contributors

ornrocha
