- Lecture notebooks are organized by class, each in its own folder.
- The requirements.txt file can be used to install all the necessary packages for all the notebooks in this course.
Note: As detailed below, multiple steps are necessary to run pyspark locally. We recommend trying out Vocareum before setting up locally, since the environment is already up and running there.
You need both Java and Python for pyspark to work locally. Follow the steps here (https://www.datacamp.com/tutorial/installation-of-pyspark) for your operating system to download and install Java and set up the corresponding environment variables.
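Before moving on, it can help to confirm that Java is visible from Python. The following is a minimal sketch using only the standard library; the helper name `check_java` is hypothetical and not part of this repository.

```python
import os
import shutil


def check_java():
    """Report whether a Java executable and JAVA_HOME are visible to this process."""
    java_path = shutil.which("java")          # None if java is not on PATH
    java_home = os.environ.get("JAVA_HOME")   # None if the variable is unset
    return {
        "java_on_path": java_path is not None,
        "java_path": java_path,
        "java_home": java_home,
        # True only if JAVA_HOME is set and points at an existing directory
        "java_home_exists": bool(java_home) and os.path.isdir(java_home),
    }


if __name__ == "__main__":
    for key, value in check_java().items():
        print(f"{key}: {value}")
```

If `java_on_path` or `java_home_exists` comes back `False`, revisit the environment variable step in the linked tutorial and open a new terminal before retrying.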
After you have done the above, create a Python environment for the course from the requirements.txt file given here.
You're free to do this any other way if you're comfortable, but we recommend the steps below:
- Install Anaconda from https://www.anaconda.com/download. You can refer to https://docs.anaconda.com/free/anaconda/install/index.html for installation instructions. MAKE SURE TO DOWNLOAD THE APPROPRIATE ONE FOR YOUR OPERATING SYSTEM.
- Once Anaconda is installed and the conda command is working in your terminal (refer to the installation instructions), confirm that the following command works (this should activate your base environment):
conda activate base
- Run the following (replace <your-environment-name> with a name for your environment; I used DSC232r):
conda create -n <your-environment-name> python=3.7.5
conda activate <your-environment-name>
- Make sure you have git installed and a GitHub account. Clone this repository (https://github.com/ucsd-dsc232r-s24/lecture-notebooks.git) by running the following command in a terminal, in the location where you want the repository:
git clone https://github.com/ucsd-dsc232r-s24/lecture-notebooks.git
- Navigate to the folder where you cloned this repository. Then, with your environment still active, run
pip install -r requirements.txt
This should install all the necessary packages into your environment.
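Once the packages are installed, a quick smoke test can confirm the setup. This is a minimal sketch, assuming that `pyspark` is among the packages in requirements.txt and that the course environment is active. The function names `describe_environment` and `spark_smoke_test` and the app name `setup-check` are hypothetical, not part of this repository.

```python
import os
import sys


def describe_environment():
    """Return the Python version and the active conda environment name, if any."""
    return {
        "python_version": "{}.{}.{}".format(*sys.version_info[:3]),
        # conda sets CONDA_DEFAULT_ENV when an environment is activated
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


def spark_smoke_test():
    """Start a local Spark session and count a tiny range.

    Returns None if PySpark cannot be imported.
    """
    try:
        from pyspark.sql import SparkSession
    except ImportError:
        return None
    spark = (
        SparkSession.builder
        .master("local[1]")        # run Spark locally with a single worker thread
        .appName("setup-check")    # hypothetical app name for this check
        .getOrCreate()
    )
    try:
        return spark.range(5).count()
    finally:
        spark.stop()


if __name__ == "__main__":
    print(describe_environment())
    print("Spark row count:", spark_smoke_test())
```

A row count of 5 suggests that Java and PySpark are working together. A result of `None` means PySpark could not be imported, which usually points to the wrong environment being active.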