python-novice-inflammation's Introduction

Programming with Python

An introduction to Python for non-programmers using inflammation data.

About the Lesson

This lesson teaches novice programmers to write modular code to perform data analysis using Python. The emphasis, however, is on teaching language-agnostic principles of programming such as automation with loops and encapsulation with functions. See Best Practices for Scientific Computing and Good enough practices in scientific computing to learn more.

The example used in this lesson analyses a set of 12 files with simulated inflammation data collected from a trial for a new treatment for arthritis. Learners are shown how it is better to automate analysis using functions instead of repeating analysis steps manually.

The rendered version of the lesson is available at: https://swcarpentry.github.io/python-novice-inflammation/.

This lesson is also available in R and MATLAB.

Episodes

#   Episode                             Time (min)  Question(s)
1   Python Fundamentals                 30          What basic data types can I work with in Python?
                                                    How can I create a new variable in Python?
                                                    Can I change the value associated with a variable after I create it?
2   Analyzing Patient Data              60          How can I process tabular data files in Python?
3   Visualizing Tabular Data            50          How can I visualize tabular data in Python?
                                                    How can I group several plots together?
4   Storing Multiple Values in Lists    30          How can I store many values together?
5   Repeating Actions with Loops        30          How can I do the same operations on many different values?
6   Analyzing Data from Multiple Files  20          How can I do the same operations on many different files?
7   Making Choices                      30          How can my programs do different things based on data values?
8   Creating Functions                  30          How can I define new functions?
                                                    What's the difference between defining and calling a function?
                                                    What happens when I call a function?
9   Errors and Exceptions               30          How does Python report errors?
                                                    How can I handle errors in Python programs?
10  Defensive Programming               30          How can I make my programs more reliable?
11  Debugging                           30          How can I debug my program?
12  Command-Line Programs               30          How can I write Python programs that will work like Unix command-line tools?

Contributing

We welcome all contributions to improve the lesson! Maintainers will do their best to help you if you have any questions, concerns, or experience any difficulties along the way.

We'd like to ask you to familiarize yourself with our Contribution Guide and have a look at the more detailed guidelines on proper formatting, ways to render the lesson locally, and even how to write new episodes!

Maintainers

Lesson maintainers are Toan Phung and Indraneel Chakraborty.

Authors

A list of contributors to the lesson can be found in AUTHORS.

License

Instructional material from this lesson is made available under the Creative Commons Attribution (CC BY 4.0) license. Except where otherwise noted, example programs and software included as part of this lesson are made available under the MIT license. For more information, see LICENSE.md.

Citation

To cite this lesson, please consult CITATION.

About Software Carpentry

Software Carpentry is a volunteer project that has been teaching basic computing skills to researchers since 1998. More information about Software Carpentry can be found here.

About The Carpentries

The Carpentries is a fiscally sponsored project of Community Initiatives, a registered 501(c)3 non-profit organisation based in California, USA. We are a global community teaching foundational computational and data science skills to researchers in academia, industry and government. More information can be found here.

python-novice-inflammation's People

Contributors

aaren, abostroem, adamobeng, benlaken, bkatiemills, damienirving, davidbenncsiro, drlabratory, dstndstn, ethanwhite, gcapes, ineelhere, janetriley, jhamrick, katkoler, kris-joseph, ldko, leyder, maxim-belkin, mkcor, noatgnu, nsoranzo, pbarmby, qulogic, rgaiacs, sparce, tbekolay, tobyhodges, valentina-s, zonca

python-novice-inflammation's Issues

updating the Debugging a Function subsection to py3k

The bug in debugging a function is no longer an issue in python 3. Should the section be removed entirely, or should a new bug be created, so students can continue to experience the joys of debugging?

One possibility would be to introduce an order of operations bug, temp - 32 * 5/9 instead of (temp - 32) * (5/9).
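To illustrate the proposed bug, here is a minimal sketch in Python 3. The function names are hypothetical and are not taken from the lesson; only the two expressions come from the suggestion above.

```python
def fahr_to_celsius_buggy(temp):
    # Bug: * and / bind more tightly than -, so this evaluates
    # temp - ((32 * 5) / 9) rather than converting the temperature.
    return temp - 32 * 5 / 9


def fahr_to_celsius(temp):
    # Parentheses force the subtraction to happen first.
    return (temp - 32) * (5 / 9)


# Water boils at 212 F, which should be close to 100 C.
print(fahr_to_celsius(212))
print(fahr_to_celsius_buggy(212))
```

The buggy version still runs without raising an error, which is what would make it a useful debugging exercise: learners have to compare the result against a known value to notice the problem.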

Lesson 01 and 04: matplotlib.pyplot.show ignores any arguments

In lessons 01-numpy and 04-files, we teach that matplotlib.pyplot.show takes graphics or figures as arguments. For example:

ave_inflammation = data.mean(axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
matplotlib.pyplot.show(ave_plot)

The matplotlib API documentation does not mention that show takes matplotlib objects as arguments. The stated behaviour is that show displays all figures and blocks.

I agree that it would be useful to be able to tell matplotlib which graphic we want to display, but the API as it is now does not support it. To convince ourselves, simply run the following code:

import numpy

data = numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')

import matplotlib.pyplot

ave_inflammation = data.mean(axis=0)
ave_plot = matplotlib.pyplot.plot(ave_inflammation)
max_plot = matplotlib.pyplot.plot(data.max(axis=0))
min_plot = matplotlib.pyplot.plot(data.min(axis=0))
matplotlib.pyplot.show(ave_plot)

What we would expect is to only see the average plot, while we end up with all three graphics on the same figure.

From my point of view, since the arguments passed to show are simply dropped, it serves no purpose to tell students to pass graphics or figures to show; the lessons should simply call show without arguments.

Issues at 05-defensive.md

From https://github.com/swcarpentry/lesson-template/blob/gh-pages/tools/check.py:

  • Heading at line 41 should be level 2
  • Heading at line 433 should be level 2
  • Heading at line 500 should be level 2
  • Heading at line 524 should be level 2
  • Heading at line 551 should be level 2
  • Heading at line 574 should be level 2
  • Heading at line 605 should be level 2
  • Document contains heading not specified in the template: Assertions
  • Document contains heading not specified in the template: Test-Driven Development
  • Document contains heading not specified in the template: Debugging
  • Document contains heading not specified in the template: Know What It's Supposed to Do
  • Document contains heading not specified in the template: Make It Fail Every Time
  • Document contains heading not specified in the template: Make It Fail Fast
  • Document contains heading not specified in the template: Change One Thing at a Time, For a Reason
  • Document contains heading not specified in the template: Keep Track of What You've Done
  • Document contains heading not specified in the template: Be Humble
  • Could not find the linked asset file ../../rules.html in /tmp/python/../../rules.html. If this is a URL, it must be prefixed with http(s):// or ftp://.
  • The topic page should not have sub-headings outside of special blocks. If a topic needs sub-headings, it should be broken into multiple topics.

Figure numbering for figure captions

Since Pandoc can generate figure captions for images, it would be nice to be able to use figure numbering to refer to figures from the text (Figure 1, Figure 2, etc.). Currently, the figure captions are escaped by using the <backslash><white-space> syntax.

This is currently not possible directly in Pandoc, but there are some solutions discussed by @r-gaia-cs that involve CSS or Pandoc filters.

Perhaps we shouldn't have readings-01.py, readings-02.py etc?

Every time I read the command line lesson (08-cmdline.html), the files readings-01.py, readings-02.py, etc make me very uncomfortable, because that type of file naming behavior is exactly what we are trying to avoid by teaching version control.

I think the solution to this problem comes from the very beginning of the lesson, which reads:

... save the following in a text file called sys-version.py:

import sys
print 'version is', sys.version

We could take a similar approach to the rest of the lesson and instead of saying $ cat readings-01.py we could simply save the following in a text file called readings.py:

import sys
import numpy as np

def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = np.loadtxt(filename, delimiter=',')
    for m in data.mean(axis=1):
        print m

Later in the lesson when we get to readings-02.py, we could just say that we open our text editor and make the following changes to readings.py:

import sys
import numpy as np

def main():
    script = sys.argv[0]
    filename = sys.argv[1]
    data = np.loadtxt(filename, delimiter=',')
    for m in data.mean(axis=1):
        print m

main()

The trick would be to somehow highlight main() in bold or a different color, because that is the part of the script that has changed since last time (i.e. kind of like git diff highlighting).

If people are happy with this suggestion of altering the wording of the command line lesson to make successive edits to readings.py, as opposed to going $ cat readings-01.py, $ cat readings-02.py etc, then I'd be happy to submit a pull request. (I'm also open to suggestions of how to incorporate git diff style highlighting into that PR, but I'd have to be able to write it in markdown and then have pandoc convert it appropriately. I'm not aware of any way to make that happen.)

"Easier" example for TDD

Currently, we teach TDD using range_overlap as an example. I have found that the algorithm is not immediately obvious to learners, contributing significant extraneous load. I'd like to know if other instructors have had this problem and would be minded to change it in our lessons. I would propose something much simpler, like

def common(first, second):
    """ Returns a (sorted) list of elements common to the
        lists 'first' and 'second'
    """

I'm happy to implement this along with some tests if there is any interest.
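For discussion, one possible implementation with tests written first might look like the sketch below. It is not the issue author's code. It assumes that duplicates collapse to a single entry and that the elements are hashable and mutually comparable; neither point is specified in the docstring above.

```python
def common(first, second):
    """Return a sorted list of the elements common to the
    lists 'first' and 'second'.
    """
    # Assumption: duplicates are reported once, so sets are sufficient.
    return sorted(set(first) & set(second))


# Tests in the TDD style: each one states the expected behaviour.
assert common([], []) == []
assert common([1, 2, 3], [4, 5]) == []
assert common([3, 1, 2], [2, 3, 4]) == [2, 3]
assert common([1, 1, 2], [1]) == [1]
print("all tests passed")
```

The duplicate case is the kind of ambiguity that writing the tests first tends to surface, which may make it a useful discussion point with learners.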

Add link to data files?

I'm not sure of the intention of the html version, but wouldn't it make sense to include a direct link to the csv data files? That way students can work on this tutorial at their own pace without having to clone the repository.
See also #9.

Issues at 02-func.md

From https://github.com/swcarpentry/lesson-template/blob/gh-pages/tools/check.py:

  • Heading at line 61 should be level 2
  • Document contains heading not specified in the template: Debugging a Function
  • Document contains heading not specified in the template: Composing Functions
  • Document contains heading not specified in the template: The Call Stack
  • Document contains heading not specified in the template: Testing and Documenting
  • Document contains heading not specified in the template: Defining Defaults
  • Could not find the linked asset file 01-numpy.ipynb in /tmp/python/01-numpy.ipynb. If this is a URL, it must be prefixed with http(s):// or ftp://.
  • The topic page should not have sub-headings outside of special blocks. If a topic needs sub-headings, it should be broken into multiple topics.

Location of code and data does not agree with lesson 06-cmdline

The code and the data used in the lesson 06-cmdline have been moved to separate subdirectories. Thus the code as written in the lesson would not be executable. Is your plan to have this layout for the repo only, and then in live workshops have everything in one directory?

I ask because I know that novices struggle with navigating directories (most will have just seen it for the first time in the shell lesson). For the novice R lessons, we are trying to keep the source executable in R Markdown files. For this to work, what is written in the lesson has to correspond to the actual layout in the lesson repository. We recently moved the inflammation files to a subdirectory and updated the lessons. However, we have not yet moved the scripts. I hesitate to do so because this lesson is already challenging, and it is inevitable that many students will save the R script in the current working directory because they did not cd code before running nano.

I'd appreciate any advice on how to proceed.

Tuples referenced but not included in lesson

From an email I received:

Dear Software Carpentry,

Thank you for doing this work. I refer all scientists I know who express interest in becoming better programmers to your website and lessons.

I have one small bug report. I was reading your Python lesson on conditionals. One of the learning objectives is explaining the difference between tuples and lists, but tuples are not mentioned again in the lesson.
http://swcarpentry.github.io/python-novice-inflammation/05-cond.html

Also tuples and ipythonblocks library are mentioned in the reference material, but are not present in the lesson.
http://swcarpentry.github.io/python-novice-inflammation/reference.html

Thanks again for making this service available.

'Tabs and Spaces' example does not actually include tabs

The "Tabs and Spaces" callout in lesson 7 includes a code example where supposedly "the first two lines are using a tab for indentation, while the third line uses four spaces". In fact, all three lines use spaces for indentation.

Someone copying this example would not be able to reproduce the error.

I'm opening an issue rather than just fixing it, because:

  1. this might be a deliberate decision (the markdown file also has spaces, so this is not just a conversion problem)?
  2. fixing it is non-trivial, given that pandoc converts tabs to spaces by default (the '--preserve-tabs' option would have to be added to the pandoc call in the Makefile).
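One way to check the error without relying on how the page is rendered is to build the source text with explicit \t escapes and compile it. This is only a sketch, assuming Python 3; the helper name compiles and the function body are made up for illustration.

```python
def compiles(source):
    """Return True if 'source' compiles, False on an indentation error."""
    try:
        compile(source, "<example>", "exec")
    except IndentationError:
        # TabError is a subclass of IndentationError, so both are caught.
        return False
    return True


# All three body lines are indented with four spaces.
spaces_only = "def f():\n    x = 1\n    y = 2\n    return x + y\n"

# First two body lines use a tab, the third uses four spaces.
mixed = "def f():\n\tx = 1\n\ty = 2\n    return x + y\n"

print(compiles(spaces_only))
print(compiles(mixed))
```

Because the tab characters are written as escapes, this example survives any tab-to-space conversion applied to the lesson source.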

What is the inflammation data?

Just wondering what the data is a measurement of. It is certainly not temperature, is it CRP count or something else?

Knowing what it is, and the scale of the measurement, would make it easier to teach.

Issues at 06-cmdline.md

From https://github.com/swcarpentry/lesson-template/blob/gh-pages/tools/check.py:

  • Heading at line 56 should be level 2
  • Document contains heading not specified in the template: Command-Line Arguments
  • Document contains heading not specified in the template: Handling Multiple Files
  • Document contains heading not specified in the template: Handling Command-Line Flags
  • Document contains heading not specified in the template: Handling Standard Input
  • Could not find the linked asset file ../../rules.html in /tmp/python/../../rules.html. If this is a URL, it must be prefixed with http(s):// or ftp://
  • Could not find the linked asset file earlier in /tmp/python/earlier. If this is a URL, it must be prefixed with http(s):// or ftp://
  • The topic page should not have sub-headings outside of special blocks. If a topic needs sub-headings, it should be broken into multiple topics.

Make an IPython- / Jupyter- specific callout style

There are some places where instructors and students using the Jupyter notebook need to do something that those in an interpreter don't need to do; e.g., %matplotlib inline. We should put these notes in callouts. It would help, for those not using the Jupyter notebook, to be able to visually ignore these callouts. We could do that by making the Jupyter callouts a specific style / color etc.

Should at least some of the inflammation data pass the `detect_problems` function?

So we have a bunch of csv files with inflammation data. In 05-cond and 06-func, we develop some tests to tell whether or not our data is suspicious. One of the tests concerns the max behavior and one of the tests concerns the min behavior. But when we run detect_problems() on all of the files in inflammation*.csv we see that none of the csv files results in the "Seems OK!" print statement.

I just taught this lesson and some of the students stopped me to ask "Wait, so none of our data is okay?" I think it was demotivating for some of them (as it would be in the real world if you realized that all of your data were bad).

Should we consider changing either the datasets or the conditional tests in detect_problems() so that at least some of the inflammation data is considered OK?

Importing?

Now that we have broken up our lessons do we plan on teaching in a single notebook or a notebook per lesson? If we are going to teach in a notebook per lesson there should be import statements at the beginning of each lesson for modules we use over and over again (such as Numpy).

Issues at 07-errors.md

From https://github.com/swcarpentry/lesson-template/blob/gh-pages/tools/check.py:

  • Document contains heading not specified in the template: Syntax Errors
  • Document contains heading not specified in the template: Variable Name Errors
  • Document contains heading not specified in the template: Item Errors
  • Document contains heading not specified in the template: File Errors
  • The topic page should not have sub-headings outside of special blocks. If a topic needs sub-headings, it should be broken into multiple topics.

I was confused by the official location of the HTML of the lessons

In preparing for a workshop in which I will point to this material, I was working through the lessons at https://swcarpentry.github.io/python-novice-inflammation/01-numpy.html. I ran into problems, which I started to annotate using hypothes.is (e.g., https://via.hypothes.is/https://swcarpentry.github.io/python-novice-inflammation/01-numpy.html).

This morning, I then encountered http://www.software-carpentry.org/v5/novice/python/01-numpy.html, which seems to be a different (and perhaps more correct) version of https://swcarpentry.github.io/python-novice-inflammation/01-numpy.html.

Please make clearer the different versions of the material on the web.

Make a script to update python-novice-inflammation.zip

We have a zip file in the main directory to make it easy to get access to the necessary files used in the lesson. There's repetition here, but for pragmatic reasons it is not repetition that we can avoid. However, we should make sure that this zip always contains the right versions of the right files, so it would be helpful to have a little Python or Bash script that either checks that this zip is correct or, perhaps ideally, creates the appropriate zip file (which should be identical to the existing zip file if the files in it are the same).
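A rough sketch of the "create the zip" option, using only the standard library, is shown below. The function name build_zip, its parameters, and the fixed timestamp are assumptions made for illustration; the actual list of files to include would have to come from the repository.

```python
import zipfile
from pathlib import Path

# A fixed timestamp keeps the archive from changing just because
# file modification times changed (assumption: timestamps are not needed).
FIXED_TIME = (1980, 1, 1, 0, 0, 0)


def build_zip(root, names, dest):
    """Write the files 'names' (paths relative to 'root') into 'dest'."""
    root = Path(root)
    with zipfile.ZipFile(dest, "w") as archive:
        # Sorting makes the member order independent of the input order.
        for name in sorted(names):
            info = zipfile.ZipInfo(name, date_time=FIXED_TIME)
            info.compress_type = zipfile.ZIP_DEFLATED
            archive.writestr(info, (root / name).read_bytes())
```

With a fixed member order and timestamp, rebuilding from unchanged files on the same machine should give the same bytes, so a check script could compare the rebuilt archive against the committed one.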

Data file is in data subdirectory, numpy.loadtxt fails

At line 39 of 01-numpy.md, we load our first data file:

numpy.loadtxt(fname='inflammation-01.csv', delimiter=',')

The .csv file is in the data directory. Since I was working from the top level directory, the file wasn't found. I had to change it to:

numpy.loadtxt(fname='data\inflammation-01.csv', delimiter=',')

Are students expected to work from the data directory?

See also the discussion on issue #4.
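If the lesson does keep the files in a data subdirectory, one way to build the path without hard-coding a separator is pathlib, as in the sketch below. It assumes the working directory is the top level of the repository; the variable names are illustrative.

```python
from pathlib import Path

# Assumption: the script is run from the top-level directory,
# with the CSV files in a 'data' subdirectory.
data_dir = Path("data")
csv_path = data_dir / "inflammation-01.csv"

print(csv_path)

# numpy.loadtxt accepts a Path for fname, so the call in the lesson
# could become: numpy.loadtxt(fname=csv_path, delimiter=',')
```

Whether to introduce pathlib to novices at this point is a separate teaching decision; a plain string such as 'data/inflammation-01.csv' with a forward slash also works on Windows.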

Issues at 03-loop.md

From https://github.com/swcarpentry/lesson-template/blob/gh-pages/tools/check.py:

  • Document contains heading not specified in the template: For Loops
  • Document contains heading not specified in the template: Lists
  • Document contains heading not specified in the template: Processing Multiple Files
  • The topic page should not have sub-headings outside of special blocks. If a topic needs sub-headings, it should be broken into multiple topics.

Examples in defensive.md should reference inflammation lesson

Testing is one of the hardest things for scientists to apply to their everyday coding. Our example of testing should make use of the scientific case we are following in this lesson to demonstrate how testing can be applied to scientific analysis.

Code in cmdline.md

It is unclear from the text of cmdline.md whether students are supposed to create files with the scripts, use the scripts in the code directory, or just read the code that's on the screen. This should be clarified.

Test stdin in cmdline.md

At the end of the cmdline.md lesson we modify our code to accept stdin, but then we only test our finished code by passing a filename. We should test stdin before we declare our code a success.
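As a sketch of what such a test could look like, the example below feeds an in-memory stream to a function that reads comma-separated rows. The function row_means is a hypothetical stand-in written with the csv module; it is not the lesson's script, which uses numpy.

```python
import csv
import io


def row_means(stream):
    """Return the mean of each comma-separated row read from 'stream'."""
    means = []
    for row in csv.reader(stream):
        if row:  # skip blank lines
            values = [float(value) for value in row]
            means.append(sum(values) / len(values))
    return means


# io.StringIO behaves like an open text stream, so it can stand in
# for sys.stdin when testing without a pipe.
fake_stdin = io.StringIO("0,1,2\n3,4,5\n")
print(row_means(fake_stdin))  # [1.0, 4.0]
```

In a real script the same function would be called with sys.stdin when no filenames are given, and the lesson could demonstrate it by piping a data file into the script.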

Example solutions for 10-cmdline

I've been working on adding solutions to the instructor's guide, but as I have never actually taught the Python+command line section, I don't have the solutions easily at hand. It would save me some time to have a place to start rather than coding them all from scratch. Can anyone point me to a relevant notebook or repo that would help? Thanks.

Optional callouts

Many of our callouts add additional information to the lesson (and are great for people reading through the lessons) but may not be necessary to teach during workshops. It could be useful to note that the callouts aren't necessarily things that need to be taught in a workshop, but are primarily there for interest or for further reading. A few ways we could do this (no real preference toward one; comments with preferences welcome!)

  1. Add a note to the instructor guide specifying that callouts don't need to be taught unless it seems useful.
  2. Add a callout on the index page explaining what callouts are used for (I think these appear at the start of many technical books like the "for Dummies" series).
  3. Make all callout boxes collapsed by default, but shown when an arrow is clicked (as in, e.g., Bootstrap collapse)
  4. Make a separate "optional" callout that is collapsed by default, but shown when an arrow is clicked.
  5. Make a separate "optional" callout that is rendered differently than the usual blue callout.

tuples mentioned but not explained in 05-cond.md

The learning objectives for this section include "Explain the similarities and differences between tuples and lists." but this isn't really covered except briefly in the last exercise. Tuples are also mentioned later on (in 08-defensive.md).

Python modules: hyphens to underscores

Many of the python modules in code include hyphens in the name, making them practically impossible for learners to import (there is a way, but it's really obtuse and bad form). These should be changed to e.g. underscores in order to make them easily importable.
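For reference, the workaround alluded to above can be sketched with importlib, as below. The file is created in a temporary directory so the example is self-contained; the file name readings-01.py matches the naming pattern discussed in these issues, while its contents (ANSWER) are made up.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Create a module whose file name contains a hyphen.
    Path(tmp, "readings-01.py").write_text("ANSWER = 42\n")
    sys.path.insert(0, tmp)
    try:
        # 'import readings-01' is a syntax error because the hyphen is
        # parsed as a minus sign; passing the name as a string avoids that.
        module = importlib.import_module("readings-01")
    finally:
        sys.path.remove(tmp)

print(module.ANSWER)  # 42
```

Renaming the file to readings_01.py would let learners write a plain import statement instead, which is the change proposed here.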

imports in the python lesson

The first episode of the Python lesson [1] shows importing numpy and matplotlib.pyplot without an alias, but in the next episode, where we go back to plotting inflammation data [2], we import glob and the code in the for loop uses the np and plt aliases.

If students follow the current lesson material, they will not know to import using an alias and will trip up when the code fails.

[1] http://swcarpentry.github.io/python-novice-inflammation/01-numpy.html
[2] http://swcarpentry.github.io/python-novice-inflammation/04-files.html

Issues at 04-cond.md

From https://github.com/swcarpentry/lesson-template/blob/gh-pages/tools/check.py:

  • Document contains heading not specified in the template: Image Grids
  • Document contains heading not specified in the template: Conditionals
  • Document contains heading not specified in the template: Nesting
  • Document contains heading not specified in the template: Creating a Heat Map
  • The topic page should not have sub-headings outside of special blocks. If a topic needs sub-headings, it should be broken into multiple topics.

What is wrong with the inflammation data?

I am preparing for teaching next week, and I have some questions:

  1. Throughout the lesson, it is indicated that there is something wrong with the data. I have yet to figure out what exactly the problem is?
  2. A more teaching technical thing: in 02 Checking our data, it is indicated that you can stuff the shown if statement into the for loop shown in 01 Processing multiple files. Do people do that when they teach, or do they just show them on screen and explain?
  3. When I do put that if statement in there, I get either suspicious maxima or adding up to zero on all of them. Is this actually correct?

Tuples and Exchanges challenge throws an error (with fix)

In 05-cond.md, this section:

Tuples and exchanges
Explain what the overall effect of this code is:

temp = left
left = right
right = temp

produces this error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-199-42933dda7154> in <module>()
----> 1 temp = left
      2 left = right
      3 right = temp

NameError: name 'left' is not defined

left and right should be initialized to prevent the error.

Pull request here.
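A runnable version of the challenge might look like the sketch below. The initial values 'L' and 'R' are chosen only for illustration, and the tuple form at the end is included because the challenge title mentions tuples.

```python
# Initialize both names so the exchange does not raise a NameError.
left = 'L'
right = 'R'

# The three-step exchange from the challenge.
temp = left
left = right
right = temp
print(left, right)  # R L

# The same exchange written with tuple packing and unpacking.
left, right = right, left
print(left, right)  # L R
```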

Is there interest in an upgrade to Python 3?

Hi all,

There are some hints here that @gvwilson wasn't pleased with the Python 3 transition, but that he now believes it is no harder to teach than Python 2. (I haven't found a primary source for these comments.)

I think every beginner we teach Python 2 puts additional load on the community (which has to maintain Python 2 code for that much longer), so my view is that we should port this lesson to Python 3 ASAP.

If there is interest, I'm more than happy to take this on.

Thanks!

Lessons never say to start the notebook

The first lesson starts right off showing Python commands and there's never discussion of where someone following along should put those commands. This may be intentional so that the lessons are flexible, but it also leaves someone following along on their own confused as to where they should be typing. I'd recommend adding some preliminary discussion pointing people at the IPython notebook and terminal.
