dlab-berkeley / python-fundamentals-legacy

D-Lab's 12-hour introduction to Python. Learn how to create variables and functions, use control flow structures, use libraries, import data, and more, using Python and Jupyter Notebooks.

License: Other

Jupyter Notebook 100.00%

python data-science jupyter introduction-to-python

python-fundamentals-legacy's Introduction

D-Lab Python Fundamentals Workshop

This repository contains the materials for D-Lab’s Python Fundamentals workshop. No prior experience with Python is required to attend this workshop.

Workshop Goals

This four-part, interactive workshop series is a complete introduction to programming in Python for people with little or no previous programming experience. By the end of the series, you will be able to apply your knowledge of basic principles of programming and data manipulation to a real-world social science application.

Each of the parts is divided into a lecture-style coding walkthrough interrupted by challenge problems, discussions of the solutions, and breaks. Instructors and TAs are dedicated to engaging you in the classroom and answering questions in plain language.

Part 1: Introduction to Python and Jupyter Notebooks, variables, data types, and functions.
Part 2: Data structures, loops, conditionals, and creating functions.
Part 3: Libraries, File I/O, and scientific computing.
Part 4: Error handling, style, and an applied, in-depth project.

Installation Instructions

Anaconda is a package management tool that allows you to run Python and Jupyter notebooks easily. Installing Anaconda is the easiest way to make sure you have all the necessary software to run the materials for this workshop. If you would like to run Python on your own computer, complete the following steps prior to the workshop:

Download and install Anaconda (Python 3.9 distribution). Click the "Download" button.
Download the Python Fundamentals workshop materials:
- Click the green "Code" button in the top right of the repository information.
- Click "Download Zip".
- Extract this file to a folder on your computer where you can easily access it (we recommend Desktop).
Optional: if you're familiar with git, you can instead clone this repository by opening a terminal and entering the command git clone git@github.com:dlab-berkeley/Python-Fundamentals.git.

Is Python Not Working on Your Laptop?

If you do not have Anaconda installed and the materials loaded on your computer by the time the workshop starts, we strongly recommend using the D-Lab DataHub to run the materials for these lessons. You can access the DataHub by clicking the following button:

The DataHub downloads this repository, along with any necessary packages, and allows you to run the materials in a Jupyter notebook that is stored on UC Berkeley's servers. No installation is necessary on your end: you only need an internet browser and a CalNet ID to log in. By using the DataHub, you can save your work and come back to it at any time. When you want to return to your saved work, just go straight to DataHub, sign in, and click on the Python-Fundamentals folder.

If you don't have a Berkeley CalNet ID, you can still run these lessons in the cloud, by clicking this button:

Binder operates similarly to the D-Lab DataHub, but on a different set of servers. By using Binder, however, you cannot save your work.

Run the Code

Now that you have all the required software and materials, you need to run the code.

Open the Anaconda Navigator application. You should see the green snake logo appear on your screen. Note that this can take a few minutes to load up the first time.
Click the "Launch" button under "JupyterLab" and navigate through your file system in the left-hand pane to the Python-Fundamentals folder you downloaded above. Note that, if you downloaded the materials from GitHub, the folder name may instead be Python-Fundamentals-main.
Open 00_workshop_setup.ipynb to begin.
Press Shift + Enter (or Ctrl + Enter) to run a cell.

Note that all of the above steps can be run from the terminal, if you're familiar with how to interact with Anaconda in that fashion. However, using Anaconda Navigator is the easiest way to get started if this is your first time working with Anaconda.

Additional Resources

Check out the following online resources to learn more about Python:

About the UC Berkeley D-Lab

D-Lab works with Berkeley faculty, research staff, and students to advance data-intensive social science and humanities research. Our goal at D-Lab is to provide practical training, staff support, resources, and space to enable you to use Python for your own research applications. Our services cater to all skill levels and no programming, statistical, or computer science backgrounds are necessary. We offer these services in the form of workshops, one-to-one consulting, and working groups that cover a variety of research topics, digital tools, and programming languages.

Visit the D-Lab homepage to learn more about us. You can view our calendar for upcoming events, learn about how to utilize our consulting and data services, and check out upcoming workshops. Subscribe to our newsletter to stay up to date on D-Lab events, services, and opportunities.

Other D-Lab Python Workshops

D-Lab offers a variety of Python workshops, catered toward different levels of expertise.

Introductory Workshops

Intermediate and Advanced Workshops

Contributors

Emily Grabowski
Pratik Sachdeva
Christopher Hench
Rochelle Terman

python-fundamentals-legacy's Issues

Day_3/Day_3_Answers/13_Answers.ipynb is missing numpy import

Day_3/Day_3_Answers/13_Answers.ipynb

The Challenge 2: Writing a CSV file needs:

import numpy as np

namespace, %who, and %whos (notebooks 03 and 11)

Maybe consider moving the namespace section from 03_Variables_Assignment into 11_scope.

In place of namespace in 03_Variables_Assignment, perhaps replace it with %who and %whos.

Kernel dying

The kernel was dying for me and a couple of other participants on DataHub. I wasn't hitting the memory limit, and I made sure to shut down notebooks as I moved along. Day 3 seemed to be particularly problematic.

Part 1 : typos

Fix link to 'documentation' at the top of notebook 5
'An argument with out the proper number of arguments' --> 'A function without the proper number of arguments' near the top of notebook 5

Day2 lesson 7 answers out of sync with the lesson challenges

please update the lesson 7 answers to match the challenges.

Inconsistency in Notebook 9

Notebook 9: in Principles of Custom Functions, adjust the 'Plan' section output to be a dataframe, rather than two lists, to match the rest of the section.

Provide Guidance for Windows Anaconda Installation

While installing Anaconda on Windows 10, I get this pop-up and I am not sure what I should do. Should I keep the defaults or should I change something in the advanced options of the installation? It would help to have some documentation about what to do when I see this screen.

Challenge #2 in Notebook 13

Instead of importing pandas as pd, we need to actually import numpy and use that instead

Typo in notebook 5

'An argument with out the proper number of arguments' --> 'A function without the proper number of arguments' near the top of notebook 5

Day 2- 07_List Challenge 1 Question 1 does not match answer

A participant mentioned that the list for the first challenge question doesn’t match the one in the answer.

Unnecessary line continuation (backslash) characters for dictionary literals

It's not necessary to use line continuation (backslash) characters in first two code cells with dictionary literals in
Day_3/12_Dictionaries.ipynb.

poets_dict = {"name": "Forough Farrokhzad", \
            "year of birth": 1935, \
            "year of death": 1967, \
            "place of birth": "Iran", \
            "language": "Persian", \
            "works": ["Remembrance of a Day","Unison","The Shower of Your Hair","Portrait of Forough"]}

works just fine as this:

poets_dict = {"name": "Forough Farrokhzad",
            "year of birth": 1935,
            "year of death": 1967,
            "place of birth": "Iran",
            "language": "Persian",
            "works": ["Remembrance of a Day","Unison","The Shower of Your Hair","Portrait of Forough"]}

This should be just fine in both python3 and python2 (I think?!) so I'm not sure why this was used in the first place?

Also, we should introduce the term "dictionary literal" as this version of defining as dictionary is using the "literal" syntax. For some explanation see: https://softwareengineering.stackexchange.com/questions/319547/literals-versus-instantiating-by-name-lists-and-dicts-in-python and https://www.python.org/dev/peps/pep-0586/
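To make the term concrete, here is a minimal sketch contrasting the literal syntax with building the same dictionary by calling dict() by name. The variable names are illustrative and the entries are a shortened version of the example above. Because the literal sits inside braces, Python continues the expression across lines without any backslashes.

```python
# Literal syntax: braces with key: value pairs written directly in the code.
# Line breaks inside the braces need no backslash continuation characters.
poet_literal = {
    "name": "Forough Farrokhzad",
    "year of birth": 1935,
    "language": "Persian",
}

# Constructor syntax: calling dict() by name. Keys such as "year of birth"
# are not valid identifiers, so a list of (key, value) pairs is used here
# rather than keyword arguments.
poet_constructed = dict([
    ("name", "Forough Farrokhzad"),
    ("year of birth", 1935),
    ("language", "Persian"),
])

# Both approaches produce equal dictionaries.
print(poet_literal == poet_constructed)
```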

Day 3 Comprehensions & Day 4 Notebooks

16_Comprehensions

Consider adding an if, else list comprehension as well (question came up from participants)
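One possible illustration of the difference, with made-up data: a trailing if filters items out of the result, while an if/else expression placed before the for keeps every item and chooses a value for each one.

```python
numbers = [1, 2, 3, 4, 5, 6]

# Filtering form: the condition comes after the loop and drops items.
evens = [n for n in numbers if n % 2 == 0]

# if/else form: the conditional expression comes before the loop and
# produces one output value for every input item.
labels = ["even" if n % 2 == 0 else "odd" for n in numbers]

print(evens)
print(labels)
```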

16_Answers

For each comprehension challenge question, we may want to reinitialize or use new variable names to be used in the answers so the solutions can be created without using the variable names that were already run and set using a for loop (even if the code was incorrect, you would get the correct answer if you reuse the variables)

UPR_Workbook

Import statements for 'os' and 'csv' are easy to miss as they fall right under the long introduction text. May want to move these down to Part B where
'csv' module is never called, so there is no need to import it

Part A1

The parameter name 'file_name' in section 1.5 is different from the originally used variable 'filename' (no _) used in section 1.1, which could lead to some confusion. We may want to redefine the function parameter as 'filename' to keep it consistent

Part A2

In section 2.2, the instructions list the wrong variable names. The 3 new lists should be 'accept_recs', 'examine_recs', 'reject_recs'

Part A3

Might be a little out of scope for this workshop, but we could use this part as a good opportunity to introduce Regular Expressions (regex), and demonstrate what a solution using regex might look like.
*Same comment throughout the notebook. For clarity, may want to use the actual variable name in the markdown
We do not specify what strings to use for the 'decision_type' variable. We may want to make that explicit

DataFrame subsetting (notebook 17 and 6)

Streamline subsetting to 2-3 key ways (i.e. selecting a column and boolean mask)-- add framing to point students to pandas workshop for further work with dataframes/subsetting

Part 4: notebook 17 - missing README file

Include 'README.txt' file to filter out in airline_data folder for notebook 17
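A rough sketch of the kind of filtering this would let students practice. The folder here is a temporary stand-in for airline_data, and the CSV file names are hypothetical placeholders, not the actual workshop files.

```python
import os
import tempfile

# Build a throwaway folder that mimics a data directory containing a README.
with tempfile.TemporaryDirectory() as data_dir:
    for name in ["flights_a.csv", "flights_b.csv", "README.txt"]:
        with open(os.path.join(data_dir, name), "w") as f:
            f.write("placeholder\n")

    # Keep only the CSV files, which leaves README.txt out of the list.
    data_files = sorted(
        name for name in os.listdir(data_dir) if name.endswith(".csv")
    )

print(data_files)
```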

Provide Guidance for Linux Anaconda Installation

Just like #11 for Windows, it would be helpful to provide Linux guidance on whether to accept the defaults or not.

In accumulator patterns, change variable names to be more informative

For example: for n in numbers --> for number in numbers
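As a sketch of what an accumulator loop with descriptive names might look like (the list contents and variable names are made up for illustration):

```python
numbers = [3, 8, 15]

# Accumulator pattern: start from an initial value and update it each pass.
total = 0
for number in numbers:
    # The loop variable name says what each item is.
    total = total + number

print(total)
```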

Notebook 17 part 1 scaffolding

Increase scaffolding for notebook 17 part 1 (importing data), and include skeleton code for the answers to help guide students.

This is a test

@mariepelagie - These are the test changes that I propose:

Broken link in notebook 5

Fix link to 'documentation' at the top of notebook 5

Day 3 Workshop Notes 12-8-2021

11 attendees at beginning of workshop. Student comments at start : in this class there is a lot of material to get acquainted with when not familiar with programming, may need another class after this one to continue to learn. Student question concerning order of executing cells in a notebook, having to do with lengths/red/green/blue example. Instructor demonstration about how to restart kernel and clear outputs, note that we don't have to work on notebook in a linear fashion. 16 participants as of 11:00 am. Introduced and discussed python pretty print (pprint library), working with dictionaries/collections. Bulk of time spent on notebook 12 and really drilling down into dictionaries and files, having students share their screens/code and working through it. 14 attendees after the noon break. 15 as of 12:52. Workshop went super smooth.

Day 2 Notebook 11 & Day 3 Notebooks

11_Scope

We use docstrings in some of the functions, but never talk about what they are. We may want to have a quick blurb introducing what the triple quotes mean and what docstrings are used for.
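A short blurb could be paired with an example along these lines. The function is a hypothetical one chosen for illustration; the point is that a triple-quoted string placed first in the function body becomes its docstring, which Python stores in __doc__ and shows in help().

```python
def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32) * 5 / 9


# The docstring is stored on the function object itself.
print(fahrenheit_to_celsius.__doc__)

# help() displays the signature together with the docstring.
help(fahrenheit_to_celsius)
```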

13_Files:

Bash is introduced without any context, may want to add a little blurb about it here
For Challenge 1, some participants decided to read in the file using Pandas. This sets 'Alameda' to the header. Here, it may be a good opportunity to include a hint about setting one of the function arguments 'header=None', and linking to the Pandas documentation to introduce how to read source code docs.
Need "import numpy as np" statement for Challenge 2 question

15_Errors

File Errors: The fourth bullet detailing the UnsupportedOperation error should probably say "the 'write' flag instead of the 'read' flag", not the other way around, because the code cell underneath uses the write flag.
We should clarify that errors_01 and errors_02 packages are custom .py files we have written, maybe by specifying this during the import statement.
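For reference, a minimal sketch of how the UnsupportedOperation error arises when a file opened with the write flag is read from. The file name and temporary directory are stand-ins, not the notebook's actual code.

```python
import io
import os
import tempfile

error_name = None

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")

    # "w" opens the file for writing only.
    with open(path, "w") as f:
        f.write("hello")
        try:
            # Reading from a write-only file object is not supported.
            f.read()
        except io.UnsupportedOperation as err:
            error_name = type(err).__name__
            print(error_name, "-", err)
```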

15_Answers

error_02 module cannot be found because it is not in the same directory as the answers notebooks. Either we would need to copy the errors_02 module to the answers, or create an importable directory that contains both of these error_* files (e.g., an error directory with an __init__.py that contains both error_* files)

17_Beautiful-Code

Not sure what the import statements are for under the "Long Lines & Continuations" section. They also throw errors.

day 1 , lesson 6 change parameter to argument

In the sentence "A function is a piece of code that is called by name. It can be passed data to operate on (ie. the parameters) and can optionally return data (the return value)." change parameters to arguments.
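The distinction can be shown in a few lines. The function and values below are hypothetical: a parameter is the name listed in the function definition, while an argument is the actual value passed in when the function is called.

```python
def greet(name):
    # name is a parameter: a placeholder defined with the function.
    return "Hello, " + name + "!"


# "Ada" is an argument: the data passed to the function in this call.
message = greet("Ada")
print(message)
```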

Curriculum Review

Day 2, 07 Answers and 07_lists have different answers for low_high question
Add time, biobreaks, and breakout group suggestions (in the questions / problems) into the presentation so it’s not skipped in remote workshops
Teaching Day 4 is hard because it’s not very interactive, especially in remote workshops
Across all of these, create a set of instructions that we can project into the screen (powerpoint maybe?)
Download Anaconda
Go to github dlab
How to open a file in Jupyter Notebook
Also FAQ that we can add into the invitation about whether we will provide recordings

datahub link does not work

The current datahub link listed in the README does not work:

Update link to D-Lab DataHub

We should update the button to point to the D-Lab DataHub.

Part 2 08_loops Issue with mountain_df

Under Challenge 2: Looping Through a Series, the end of the code block that creates the mountains_df data frame refers to "mountain_df" instead of "mountains_df".

Import numpy as np? Day 3 Notebook 13 Challenge 2 solution

import numpy as np

make binder use classic notebook instead of jupyter lab as default

Links need to be updated to add ?urlpath=tree as described in the UI section of the Binder user guide

Part 2, 08_loops error with rounding

Under "Loops for repeated computation", the list is currently:

tires = [41,35,28]

This should be changed to:

tires = [40.9, 35.2, 28.4]

This will show the round() in the loop is actually working.
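A quick sketch of the effect, assuming the loop simply rounds each value (the notebook's actual loop body may differ). With whole numbers the output would look identical to the input, so the rounding would be invisible.

```python
tires = [40.9, 35.2, 28.4]

rounded_tires = []
for tire in tires:
    # round() with one argument returns the nearest integer.
    rounded_tires.append(round(tire))

print(rounded_tires)  # [41, 35, 28]
```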

Notebook 15 challenge 1 solution missing

Include complete answers for notebook 15 challenge 1 in solutions notebook

Notebook output in Part 1, notebook 4

The notebook 04_data_types.ipynb has output rendered, which needs to be cleared.

Include both mac and windows filepaths

Especially in Notebook 17, include windows filepath separators
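One way to show both separators side by side is the standard library's pathlib module, sketched below. The file name example.csv is a hypothetical placeholder. Using Path lets the same code build the right path on either system.

```python
from pathlib import Path, PurePosixPath, PureWindowsPath

# Folder and file name; example.csv is a made-up placeholder.
parts = ("airline_data", "example.csv")

# macOS and Linux join path parts with a forward slash.
posix_path = PurePosixPath(*parts)
print(posix_path)    # airline_data/example.csv

# Windows joins path parts with a backslash.
windows_path = PureWindowsPath(*parts)
print(windows_path)  # airline_data\example.csv

# Path picks the correct flavor for whichever system runs the code.
local_path = Path(*parts)
print(local_path)
```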

Why is the "to" country "cotedivoire" for all 1709 rows?

Data Frame methods (notebook 12)

Reduce the number of methods introduced to 2-3 and include a challenge question to reinforce learning ( move other functions to an appendix or other section)

Parts 2-4 considerations

Part 2:

Potentially split up notebook 6 into two sections, lists and dictionaries/dataframes.

Notebook 9: in Principles of Custom Functions, adjust the 'Plan' section output to be a dataframe, rather than two lists, to match the rest of the section.

Consider reducing material slightly (I ended up teaching the second half of notebook 9 on Day 3)

Part 3:

Consider reducing number of skills covered slightly. Some suggestions:
Shorten or remove numpy notebook (consider introducing the numpy package in the Libraries notebook (10) instead)
Reduce number of functions introduced in pandas notebook, and increase guided exercise

Part 4:

Include answer for notebook 15 challenge 1 in solutions notebook
Increase scaffolding throughout the exercise in notebook 17, clarify steps explicitly, and include skeleton code in some areas to help facilitate coding. Potentially shorten sections 2 and 3 slightly to reduce the total amount of material

Split notebook 6

Split up notebook 6 into two sections, lists and dictionaries/dataframes.

Potential Renaming of Repo

Should the Repo be renamed to match the name of the workshop by adding "Parts 1-4" onto the end?

Suggested Revisions for the Day 4 project

The Day 4 project got a bit sidetracked by the complexity of section 4.4 Combine output dictionaries. So I added a section 2.4 to the attached solutions notebook and then reworked the rest of the code as one way to make it a bit more digestible. This file would need to be cleaned up a bit and then worked into the other two notebooks, but I think it would be worthwhile because this is a 95% great notebook.
UPR_Solutions-Patty-revisions.ipynb.zip

Update installation instructions to say Python 3.7 or higher

The instructions currently specify Python 3.7 exactly, which confuses some detail-oriented people because they can no longer find the older 3.7 now that Anaconda provides a newer 3.8 version of Python.

This confusion can be avoided by updating the instructions everywhere (not just in this repo, but we should check for this elsewhere, too) to say Python 3.7 or higher.

---------- Forwarded message ---------
From: D-Lab Frontdesk [email protected]
Date: Tue, Oct 26, 2021 at 2:17 PM
Subject: Re: Python 3.7 Installation problem?
To: Matt Nolan [email protected]

Hi Matt, the instructions should say Python 3.7 or higher. Thanks for pointing out the confusion!

On Tue, Oct 26, 2021 at 10:08 AM Matt Nolan [email protected] wrote:
The instructions for the class seem to be out of date.

The page you pointed to contains places to download Python 3.8 but not Python 3.7.

Python 3.7 isn't listed in the archive

Maybe its the notes in the google calendar event that are mistaken it says 3.7, should it say 3.8?

Can you add some clarity about what exactly I am supposed to download to be prepared for the class today? thanks

Software Requirements: Installation Instructions for Python Anaconda

D-Lab Python Fundamentals
This is the repository for D-Lab's introductory Python-Fundamentals workshop series. Laptop, Internet connection, and Zoom account required.

Workshop goals
There are four folders (one for each day) that contain the notebooks we will walk through for each day:

Day_1 - Running Python, Jupyter Notebooks, variables assignment, data type conversion, working with strings, built-in functions

Day_2 - Lists, for-loops, conditional statements, writing your own functions, scope

Day_3 - Dictionaries, reading and writing data from and to files, installing and importing libraries, debugging errors, list comprehensions, beautiful code

Day_4 - Python application for information retrieval. You will extract targeted information from a text data set of United Nations documents to generate tabular data in a .csv file suitable for subsequent statistical analysis. Everything needed for this exercise is covered in Days 1, 2, and 3.

Installation Instructions
Download and install Python Anaconda distribution 3.7 and the workshop materials to get started. Before Part 1 be sure to:

Download and install Python Anaconda distribution 3.7 --> Click "Download" and then click 64-bit "Graphical Installer" for your current operating system.

====

I searched the page and Download isn't found. Towards the bottom of the link to 3.7 installation its python 3.8 downloads

Fix typo in questions of Files

In the first cell of Day_3/13_Files.ipynb there's a typo:

- "How do a open a file and read its contents?"

which should be:

- "How to open a file and read its contents?"

Reduce material in part 3 - numpy

Reduce/remove numpy notebook. I see potential for featuring numpy in the 'libraries' notebook (10) rather than a standalone notebook

Workshop Notes PyFun 4-5-2022 through 4-14-2022

Attendees = 8 at start. Wide range of student interests in learning python - to broaden skills, for jobs, not to get left behind when it comes to programming languages, 13 as of 9:36. Workshop going super smoothly. Lots of good questions and student interest. Drop reasons include choppy zoom/audio, other causes include other commitments. 12 as of 10:33. Some of the usual issues with Anaconda - tab completion not working, files not showing up in file list when attempting to open in Jupyter Notebook, installs not working properly. Love that we are offering Datahub link as an alternative to facilitate learning in what is most often the first experience for folks who are interested in learning python. 8 as of 11:15. 9 as of 11:33.

Use pip instead of Anaconda Navigator in Day 3 Libraries

Currently there is a lengthy statement about how to use the Anaconda Navigator to install the fuzzywuzzy library. To limit time spent in this notebook on an already lengthy day, switch to using pip instead.

Shift custom functions to Part 1

Move 10_custom_functions to Part 1, and streamline it down so that it doesn't use material from Part 2. Move the challenges to Part 2.

Adding a cool graphic to python day 3

How to debug your code:

introduction of encoding = "utf-8" without explanation

In section Read a text file in one line this lesson on Files Day_3/13_Files.ipynb introduces encoding = "utf-8" without any explanation.

my_text = open("../Day_4/data/txts/fiji2014.txt", encoding = "utf-8").read()

Also it turns out for OSX (my OS) and Python3, it is not necessary to include the encoding parameter at all, as this works just fine:

my_text = open("../Day_4/data/txts/fiji2014.txt").read()

But this may fail elsewhere (even with Python3) if the default encoding for either the OS or the filesystem is something other than utf-8.

So, either we need to remove this (simplest solution) from the code or we need to provide an explanation and motivating example(s) (which could probably be a whole module itself on text file encodings, etc).
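If an explanation is added, a small motivating example might look like the sketch below. The sample text and file name are made up. The file is written as UTF-8, then read back once with the matching encoding and once with a mismatched one to show how non-ASCII characters get garbled.

```python
import locale
import os
import tempfile

text = "naïve café"  # sample text containing non-ASCII characters

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.txt")

    # Write the file as UTF-8 explicitly.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Reading with the same encoding recovers the original text.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

    # Reading with a different encoding silently produces garbled text.
    with open(path, encoding="latin-1") as f:
        misread = f.read()

# Without an encoding argument, open() in text mode falls back to a
# platform-dependent default, which this call reports.
default_encoding = locale.getpreferredencoding(False)

print(round_trip)
print(misread)
print(default_encoding)
```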

add punct import for Day 2 Notebook 9 Challenge 4 solution

from string import punctuation

Add punct import to solutions Day 2 Notebook 9 Challenge 4 (acrostics)

from string import punctuation
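For context, a minimal sketch of what this import provides. The acrostics challenge itself is not reproduced here, and the sample string is made up: punctuation is a string of ASCII punctuation characters that can be used to strip those characters from text.

```python
from string import punctuation

line = "Hello, world!"  # made-up sample text

# Keep only the characters that are not ASCII punctuation.
cleaned = "".join(ch for ch in line if ch not in punctuation)

print(cleaned)
```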

Day 4 Solution Notebook Part 5 `process_document` function uses global variable

The process_document function in part 5 of the Day 4 Solution Notebook does not extract the year from each file. Instead, it uses the global year variable assigned in section 1.2 of the notebook, whose value is 2014. As a result, even for the text files with 2013 in the name, the final output csv has 2014 as the year.

Since the read_recommendations function already extracts the year and country from the filename but does nothing with this information, it might be good to have this function ALSO return year and country on top of the recs. Then there could be a local variable year within the process_document function that stores these returned variables in year and country variables.
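A rough sketch of the suggested fix, under stated assumptions: parse_filename is a hypothetical helper, the functions below are simplified stand-ins for the notebook's read_recommendations and process_document, and file names are assumed to follow the country-then-four-digit-year pattern seen in fiji2014.txt. The point is that year and country become local values derived from each file name rather than a global.

```python
import os


def parse_filename(path):
    """Hypothetical helper: split a name like 'fiji2014.txt' into parts."""
    stem = os.path.splitext(os.path.basename(path))[0]
    # Assumes the last four characters of the stem are the year.
    country, year = stem[:-4], int(stem[-4:])
    return country, year


def process_document(path):
    """Simplified stand-in that uses local year and country values."""
    country, year = parse_filename(path)
    return {"country": country, "year": year}


# The second file name is a made-up example of a 2013 document.
paths = ["data/txts/fiji2014.txt", "data/txts/fiji2013.txt"]
rows = [process_document(path) for path in paths]
print(rows)
```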
