
python-novice-gapminder's Introduction

python-novice-gapminder's People

Contributors

abostroem, alee, alex-ball, amasson84, biologyguy, chbrandt, delocalizer, deppen8, echism, eldobbins, iimog, justbennet, katrinleinweber, lexnederbragt, lo5an, martinosorb, maxim-belkin, montoyjh, ntmoore, phcerdan, rgaiacs, shyamd, souravsingh, sstevens2, upendrak, vahtras, valentina-s, vinisalazar, wikfeldt, zkamvar

python-novice-gapminder's Issues

Allow for use of plain-text editor + shell as alternative to Jupyter Notebook

I know this is revisiting something that has been discussed in several places before, but I worry that the use of a more abstract environment like the Jupyter Notebook might confuse and add an additional barrier to people wanting to "really" use Python.

I have found in the past that also introducing the shell is a very minor addition, and feel that by teaching Python via the notebook we would only delay this necessary skill. In particular, only a few commands are needed to use the shell well enough to navigate and run Python scripts, and they are the same commands on both Windows command prompts and Linux/macOS terminals.

On the other hand, by using the notebook, the students are blocked from simple concepts like importing functions etc. defined in other scripts.
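
A minimal sketch of what importing a function defined in another script looks like. The module name `gapminder_helpers` and the function `mean_of` are hypothetical, and the file is written to a temporary directory only so that the example is self-contained; in a workshop the learner would create it in a plain-text editor.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical helper script; a learner would normally write this file in a
# plain-text editor and save it next to the main script.
helper_source = '''
def mean_of(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(values) / len(values)
'''

# Write the helper to a temporary directory so the sketch is self-contained.
script_dir = Path(tempfile.mkdtemp())
(script_dir / "gapminder_helpers.py").write_text(helper_source)

# Python looks for importable modules in the directories listed in sys.path.
sys.path.insert(0, str(script_dir))
importlib.invalidate_caches()

from gapminder_helpers import mean_of

result = mean_of([1.0, 2.0, 3.0])
print(result)
```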

motivating example

This is the first of several issues I'm opening in response to Greg's request (03/07/16) for feedback on the lesson. So, sorry for creating a bunch of new issues, but I think it's best to keep the topics separate.

So, first item: Perhaps this is still in the works, but I think some sort of motivating example at the very beginning would be useful. A "Here's what you'll be able to do by the end of the day" sort of thing. It's always good for students to be able to see where they are going, I think (and why).

Some comments on your very nice work

Greg,

I think all the topics are useful and germane to the audience. The only things I might add or remove are: 1) Lists and iterables are such a central part of Python that they almost have to be talked about somehow (suggestions below). 2) I see nothing on writing data files, and I think most people will want to be able to save results. This looks really good. Please let me know if you have fleshed out sections to review, and I'll find time to help, if my comments are on-topic and helpful.

Under Essential Questions, clarify what you mean by "process multiple data sets". Coming from statistics, that could be each file treated independently, or it could imply merging multiple files into a single data set with an indicator and then doing a stratified analysis. Both would be valuable to demonstrate.

I think that, this being Python, covering iteration in some way is almost required. This could be introduced with file handles and reinforced with loops, maybe? Yes, I see that you explicitly say NOT lists. But they are such an integral part of the Python way.

Might I suggest introducing writing functions with 'templates' -- That includes the docstring, the init, a test, and a dummy function that assigns a variable and prints its value. Once the docstring is written, that is used as the guide for writing the real functionality. I don't do this consistently enough, but I think my code gets written faster, I resist feature creep better, and it's easier to do it this way than to decode already written code. I think it also helps encourage comments in the code. Like an outline for a term paper.

Which are the key SciPy modules?

Make it clear which sections most classes do or don't reach

This is great work, I'm really keen to teach this lesson! I do however think the time estimates are too short across the board; marching through this in a straight line will maybe get through functions by the end of a day. People have raised concerns in the past about feeling disappointed they didn't get through all the material; as I always say, it's fine if you don't finish - but given that this seems to stress people out, maybe frame that material as an appendix / supplement to signal that flexibility.

possible midterm exam

Looking at what's covered in the morning, I was going to suggest something similar to what's in the python-novice-inflammation lesson: loop over a set of datafiles and for each one, read data, extract a subset, calculate something, plot something else. This seems awfully ambitious for people who haven't programmed before, however - but maybe for motivated folks it's feasible.

Suggestions on how to introduce error reading and use of help

I might be getting a bit ahead of the game here, but I'm keen to see how we actually introduce errors and help. It's easy to forget, but most docs are utterly inscrutable for beginners; pointing them at the help and telling them to solve a problem with it tends to be an exercise in despair. I do agree with keeping this in there - but close guidance will be necessary.

Reading Tabular Data - Numpy vs Pandas

When reading in the Gapminder data, why not read it in with Pandas? That way the learners can easily see the column names, and you won't get into trouble loading the data because of the country column being a string in the datasets we are giving them (np.loadtxt throws a ValueError: could not convert string to float: b'country' when you try to use it to load gapminder_gdp_americas.csv).

I find Pandas more intuitive with the accessible column names and easy to view tabular-like data frames. Once it's loaded with Pandas, we can use Numpy functions to do the planned calculations. Thus, we don't have to get into all the Pandas nitty gritty.
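
A rough sketch of the workflow proposed here: load with Pandas, then calculate with Numpy. The CSV text below is an invented stand-in for a file such as gapminder_gdp_americas.csv (the country names and numbers are made up), and it is read from a string so the example runs without the lesson data.

```python
import io

import numpy as np
import pandas as pd

# Made-up stand-in for a Gapminder-style CSV file; the numbers are invented.
csv_text = """country,gdpPercap_1952,gdpPercap_2007
Aland,1000.0,3000.0
Borduria,2000.0,5000.0
"""

# read_csv keeps the header row as column names; the text column becomes
# the row index instead of breaking the numeric conversion.
data = pd.read_csv(io.StringIO(csv_text), index_col="country")

# NumPy functions accept the numeric columns directly.
mean_2007 = np.mean(data["gdpPercap_2007"])

print(data.columns.tolist())
print(mean_2007)
```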

Exercise time estimates

I really like the proposed outline in index.md! My only thought when reading through was that the morning exercises might take longer than estimated for a novice group, especially when we get to things like loading up Pandas dataframes and reading documentation.

Rather than extend the exercises and shift chunks around, I wonder if it would be better to take time from the teaching portion and shift it to the exercises. Or perhaps make the teaching / exercise split more flexible? E.g., for Built-in functions (and methods) and help, we could say Teaching + Exercises (10 min) but with a minimum of 5 minutes for exercises?

Just some brainstorms; it's hard to estimate how long challenges will take without seeing the challenges, so the time estimates might be way more appropriate to talk about later on down the line. The general pace of the lesson seems good to me!

Take out testing

If there is a need to save time, Testing could go - unfortunate, but it deserves a separate lesson. It could be mentioned during the Defensive Programming part.

Writing lesson material & exercises

This is in reference to Greg's blog post "Designing Lessons Collaboratively".

I like the idea of throwing the lesson design open, to however many volunteers there are to write individual sections, based on the plan outline. Of course, this runs the risk of lacking continuity, consistency, and common themes. My hope would be that the established lesson outline, and agreed use of a dataset (the GapMinder data), would provide focus and mitigate against this.

Speaking as a newly-badged instructor, I'd be very happy to contribute exercises to the lesson. Perhaps volunteer instructors could be assigned (or choose) one or more sections to write an exercise for? If there are enough willing volunteers, this could provide overlapping coverage, ensuring multiple exercises for each section/topic.

Teach Markdown notation for writing documentation throughout

Along with programming style, instructors should implicitly teach students Markdown notation when writing {pseudocode, comments, docstrings}—and explicitly emphasize consistency and parallel phrasing and structure (modularity) in writing the same. Could go straight in the checklist.

Starting the habit early can only help.

Use Bokeh for plotting instead of Matplotlib

Matplotlib is so 2002 - if you want to give the students a 'wow' feeling and teach them the new normal, how about Bokeh?

I would say the library is becoming mature now, currently at 0.11, and to make nice-looking high-level interactive plots the interface is very simple. At least no more cognitive load than starting with matplotlib in my opinion. Plus, if you are already planning on Pandas (which I fully +1) this already ties in perfectly to Bokeh, which understands how to deal with pandas objects.

You could use the code here from the Bokeh Gapminder example (https://anaconda.org/bokeh/gapminder/notebook) to perhaps create a simpler version for the lesson. (Without some of the formatting, the background year text, and the legend, it could be simple-ish, and perhaps even still have the slider widget and hover tool?)

If that is more for an intermediate to advanced level, you could use grid-plot to show a couple of snap-shots in time from the dataset, rendering them together with linked axes and a hover tool (quite easy in Bokeh). See grid plot examples here http://bokeh.pydata.org/en/0.10.0/docs/user_guide/layout.html

Launching jupyter notebook using "jupyter notebook ."

This may just be my own ignorance but why do we launch the Jupyter notebook in the exercise (in 01-run-quit.md) with a trailing dot?

$ jupyter notebook .

Could we add a bullet point explaining that to the users? I would have added one myself but I sort of need someone to explain it to me...

Discuss Python community coding standards somewhere

Relocated from swcarpentry/DEPRECATED-bc#930 (by @gregcaporaso)

Community standards

As you begin to develop python packages (i.e., bundled collections of python code) that others are using, or that you are hoping other developers will contribute to, it's useful to adhere to python community standards. Some python community standards that you should be aware of (and ideally adhere to in your own python package) include:

  • pep8: a style guide for python, which discusses topics such as how you should name variables, how you should use indentation in your code, and how you should structure your import statements, among many other things. Adhering to pep8 makes it easier for other python developers to read and understand your code, and to understand what their contributions should look like. The pep8 application and python library can check your code for compliance with pep8.
  • numpydoc: a standard for API documentation through docstrings used by numpy, scipy, and many other python scientific computing packages. Adhering to numpydoc helps ensure that users and developers will know how to use your python package, either for their own analyses or as a component of their own python packages. If you use numpydoc, you can also use existing tools such as Sphinx to automatically generate HTML documentation for your API.
  • pypi (the python package index) and pip: standards for making your python package accessible and installable from the command line. Uploading releases of your python package to pypi and testing that they are installable with pip enables users to easily obtain working versions of your python packages, and developers to easily distribute their own tools that rely on your python package.
  • Semantic Versioning: a standard describing how to define versions of your python package. Using Semantic Versioning makes it easy for other developers to understand what is guaranteed to stay the same and what might change across versions of your python package. This in turn enables other developers to confidently build tools that depend on your python package.
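
To make the numpydoc item above more concrete, here is a minimal sketch of a docstring laid out with numpydoc-style section headings. The function `rescale` is hypothetical and exists only to carry the docstring.

```python
def rescale(values, factor=1.0):
    """Multiply every value by a constant factor.

    Parameters
    ----------
    values : list of float
        The numbers to rescale.
    factor : float, optional
        The multiplier applied to each value (default 1.0).

    Returns
    -------
    list of float
        A new list containing the rescaled values.
    """
    return [value * factor for value in values]


scaled = rescale([1.0, 2.0], factor=10.0)
print(scaled)
```

Calling help(rescale) in an interpreter displays this docstring.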

Include references to online documentation for Pandas, to Stack Overflow, etc. in lessons

This comment is related to some of the concerns brought up in #22. I think it is important that learners know where they can go for help and that they realize they don't have to commit everything to memory. But whenever I try to introduce tools like online documentation (e.g. pandas), it's not very well received. Perhaps it is because the documentation is difficult even for experienced programmers but sometimes I get the feeling that beginners think it is almost like cheating to look online for help.

I see there is some room for online tools (python.org) early in the lesson. But what about including reference to pandas online documentation, perhaps in the Documentation section as an example? As difficult as the online documentation is to read, I think that reading/writing and using documentation is an important part of programming.

Frequent, short exercises are pragmatically difficult

The lesson includes many short 5-10 minute exercises. Is the intention to have the instructor working on these exercises interactively with the audience, or is there a dedicated break where the attendees work on the exercises themselves (or in small groups)? It has been my experience that the latter is quite difficult in practice. Novices need a lot of time to get oriented and understand the task they are being asked to perform.

Similarly, I've never managed to have a 15 minute coffee break. Attendees wander off or get deep into a conversation (this sort of discussion is valuable and I think worth encouraging). It usually ends up being 25-30 minutes long.

Overall, my point is that time estimates will always be too low compared to what happens in a live workshop where you have to manage a room full of people. A 6.5 hour lesson means that the last few topics will rarely be taught. One idea is to make sure the last Wrap-Up lesson makes sense even if Programming Style and Debugging had to be skipped for time.

reduce jargon in 08-lists

I'd consider cutting the section "A list is a mutable ordered collection of heterogeneous values." in 08-lists. This only teaches learners new jargon, not new concepts, and I am not sure the jargon is very helpful at this point.

manual vs power tools

Nice work. I've found that introducing 'power tools', i.e. functions that accomplish a whole lot, too early leads to confusion and hurts a person's chances of forming a proper mental model of the topic. An example is introducing the pandas read_csv function before the person has had a chance to manually loop through a file and try to parse it themselves.

Considering the ambitious 1-day schedule, at a minimum it would be good to manually build one functionality and then introduce the similar functionality with a module from Pandas. Then the same parallel can be drawn to other functionalities that are demonstrated.
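
A sketch of the "manual first, power tool second" idea. The sample data is invented, and the standard-library `csv.DictReader` stands in for the power tool so that the example needs no extra packages; in the lesson that role would be played by the pandas function.

```python
import csv
import io

# Invented sample data standing in for one of the lesson's CSV files.
csv_text = "country,year,pop\nAland,2007,1000\nBorduria,2007,2500\n"

# Manual version: split the text into lines and fields by hand.
lines = csv_text.strip().split("\n")
header = lines[0].split(",")
manual_rows = []
for line in lines[1:]:
    fields = line.split(",")
    manual_rows.append(dict(zip(header, fields)))

# "Power tool" version: csv.DictReader does the same parsing in one call.
# (This is a stand-in for the pandas reader discussed in the lesson.)
tool_rows = list(csv.DictReader(io.StringIO(csv_text)))

print(manual_rows == tool_rows)
```

Both versions leave every field as a string, which is itself a useful thing for learners to notice before a library converts types for them.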

Nice work.

Lots of broken external links in lesson materials

Almost all of the external links for this lesson that I've tried clicking on so far seem to be broken:

For instance, the link to the gapminder dataset in the setup instructions points to http://swcarpentry.github.io/python-novice-gapminder/setup/python-novice-gapminder-data.zip, when it should instead point to https://github.com/swcarpentry/python-novice-gapminder/raw/gh-pages/python-novice-gapminder-data.zip.

Here's another example - all of the sites in the Next Steps segment seem to point at the GitHub pages site instead of the actual external websites (scipy, for example, points at http://swcarpentry.github.io/python-novice-gapminder/21-next-steps/scipy.org instead of http://scipy.org/).

No explanation as to what Data Frames are

I think it is important to briefly specify what a data frame is, as leaving it out might leave attendees confused. For example, a data frame is a two-dimensional table with columns of potentially different data types.

This is to be inserted in the "Use DataFrame.info to find out more about a data frame" section, right after the bullet point "This is a data frame."
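
A small illustration of that definition, assuming pandas is installed. The values are invented; the point is only that each column has its own data type.

```python
import pandas as pd

# Invented values, for illustration only.
frame = pd.DataFrame(
    {
        "country": ["Aland", "Borduria"],  # text column
        "year": [2007, 2007],              # integer column
        "lifeExp": [71.5, 68.2],           # floating-point column
    }
)

print(frame.shape)   # (number of rows, number of columns)
print(frame.dtypes)  # one data type per column
```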

remove unnecessary detail from 17-call-stack

I fear there is way too much detail in 17-call-stack for "people who have never programmed before". The key concept I would want my students to learn in this section is the idea of scope. All of the information about the run-time stack, stack frames, etc. is low-level implementation details, and I doubt any of it is necessary. I don't think a first-time programmer should have to worry about how scoping is implemented. The important thing is to understand what it means.
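
A sketch of how scope could be shown without mentioning stack frames at all. The names `pressure`, `adjust`, and `temperature` are illustrative choices, not taken from the episode text.

```python
pressure = 103.9  # global variable: visible everywhere below this line


def adjust(t):
    # 't' and 'temperature' are local: they exist only while adjust() runs.
    temperature = t * 1.43 / pressure
    return temperature


result = adjust(0.9)
print(result)

# Outside the function, the local name is not defined.
try:
    temperature
    leaked = True
except NameError:
    leaked = False
print("local name visible outside the function:", leaked)
```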

Docstrings in python

Can we introduce the concept of docstrings in Python and explain how useful they are? Docstrings are used in several places throughout the lessons, but they are never discussed anywhere.
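
As a possible starting point, a minimal sketch that contrasts a docstring with a comment. The function `kelvin_to_celsius` is a hypothetical example.

```python
def kelvin_to_celsius(kelvin):
    """Convert a temperature from kelvin to degrees Celsius."""
    # This comment is for people reading the source; help() does not show it.
    return kelvin - 273.15


# The docstring is stored on the function and is what help() displays.
print(kelvin_to_celsius.__doc__)
```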

Requirements - developing environment

What about using Rodeo instead of the Notebook or editing in nano? I have heard great things, and it looks very similar to RStudio (which works quite well for teaching). One issue to be aware of is that installing Rodeo and having it work with the Anaconda Python distribution we ask learners to download requires editing the PATH. We don't want to ask learners to do this, so we need to come up with some sort of installer to smooth this out.

20-style: add style guidelines

20-style points learners to PEP8, which gives them a good resource for later consultation, but it's not a quick read for use in the context of a SC workshop. It'd be better, I think, if this section also summarized some of the most important concepts. This section would also be a nice place to include less mechanical stylistic advice, such as 1) why it's a good idea to avoid using lots of global variables; 2) DRY; 3) etc.

Strings to Numbers exercise (ep 03)

I wonder if it would be better, in episode 03, to demonstrate how recasting breaks on the cases where it should be expected to break before giving a task that will throw an exception.
See the changes suggested in commit 113b92c to see what I mean.
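
For readers without the commit at hand, a sketch of the kind of demonstration meant here. It is an illustration written for this issue, not the content of the commit.

```python
# Conversions that succeed.
print(int("3"))      # string of digits -> integer
print(float("3.4"))  # numeric string -> float
print(int(3.4))      # float -> integer, dropping the fractional part

# A conversion that fails: int() rejects a string containing a decimal point.
try:
    int("3.4")
    failed = False
except ValueError as error:
    failed = True
    print("ValueError:", error)
```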

when to present Pandas/matplotlib

I understand the motivation for covering pandas and matplotlib -- I'm guessing the idea is to give learners "powerful tools" right from the get-go. However, if this is really for "people who have never programmed before", I wonder whether throwing in all of that in the middle of the day will leave students feeling empowered or overwhelmed. Just learning the basics of a general-purpose programming language, like Python, for the first time, in a single day, is a lot to cover. I'm not suggesting cutting this entirely, but I wonder if it might work better as a "putting it all together" project at the end, if there is time. And maybe also cut it down to a really small set of basic Pandas/matplotlib functionality to keep from overwhelming students -- perhaps only what is needed to complete some relatively simple data analysis task.

Introduce dictionaries

I would like to see dictionaries introduced in the beginner lesson. Lists and dictionaries are basic Python data types and it seems appropriate to discuss dictionaries after lists. Lists are suitable for ordered data and dictionaries are for categorical data. I would like to see some information about choosing an appropriate data structure for the type of data being manipulated.
Pandas will also create data frame columns, with column names, from the key:value pairs in a dictionary.
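
A minimal sketch of the list/dictionary contrast described above, using invented values.

```python
# A list keeps values in order and is indexed by position.
years = [1952, 1957, 1962]
first_year = years[0]

# A dictionary maps keys to values and is indexed by key.
# The population figures are invented for illustration.
population = {"Aland": 1000, "Borduria": 2500}
borduria_pop = population["Borduria"]

print(first_year, borduria_pop)
```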

Review scope of lesson and make prerequisites more accurate (or shorten lesson).

Please redirect discussion to #70.

Given the contents and time allotted, "This lesson is an introduction to programming Python for people with little or no previous programming experience." should read "This lesson is an introduction to Python programming for people who have no prior Python experience, but who have a working understanding of basic programming concepts (e.g. loops, variables, data types, if/then) and who are familiar with statistical analysis."

The ~10 minutes allotted to many of these core programming concepts is enough to expose people to syntax, but nowhere near enough for them to build mental models, test them, iterate on the broken bits, allow concepts to sink in, et cetera.

I've covered less than half of this material with genuine novices in a day and a half (using jQuery so as to eliminate any kind of installfest/dependency management time), and that's the only class I've done where I really got the sense that, given another half-day, we could have attacked actual use cases productively. (And even then, having students who do have some programming experience helps a lot in terms of making sure there's lots of help available throughout the room.)

It's possible that you can cut the time somewhat with people who have substantial STEM backgrounds (common in your audience, rare in mine), but 1) I'm not sure, and 2) I don't know how to state that as a prereq without sounding like a jerk.

Devote more time to NumPy

I agree that the pandas DataFrame is becoming the workhorse of data analysis in Python. Recently, I gave an SC lesson on pandas, which was received quite well. However, I think one should introduce or at least mention numpy in this lesson for several reasons:

  • students leaving the course without basic familiarity with numpy will not be able to understand ~60% (my rough guess) of scientific Python applications (this contradicts the last goal listed for the lesson),
  • much of pandas is based on numpy API
  • pandas is big and complex and there are many ways to get yourself into trouble (for example, there are multiple ways of indexing), whereas numpy has a simpler design and uses only a few concepts (indexing, ufuncs, dtype).

On the other hand:

  • pandas includes (basic) plotting
  • pandas can read and export to many data formats
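
For reference, a short sketch of the three numpy concepts named above (indexing, ufuncs, dtype), assuming numpy is installed. The array contents are arbitrary.

```python
import numpy as np

values = np.array([[1.0, 2.0, 3.0],
                   [4.0, 5.0, 6.0]])

# dtype: every element of an array shares one data type.
print(values.dtype)

# Indexing: select row 1, then column 0 across all rows.
second_row = values[1]
first_column = values[:, 0]

# ufuncs: functions such as np.sqrt are applied element by element.
roots = np.sqrt(values)

print(second_row, first_column, roots.shape)
```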

Objectives in "Writing Functions"

The objectives in the "Writing Functions" episode are:

  • Explain and identify the difference between function definition and function call.
  • Write a function that takes a small, fixed number of arguments and produces a single result.
  • Correctly identify local and global variable use in a function.
  • Correctly identify portions of source code that will be displayed as online help, and in particular distinguish docstrings from comments.
  • Write short docstrings for functions.

However, the latter three are not covered in this episode. Variable scope has its own episode, and docstrings/online help are part of the "Programming Style" episode. Therefore these three points should be moved to their respective episodes.

Introduce writing functions by use of templates

From #24:

Might I suggest introducing writing functions with 'templates' -- That includes the docstring, the init, a test, and a dummy function that assigns a variable and prints its value. Once the docstring is written, that is used as the guide for writing the real functionality. I don't do this consistently enough, but I think my code gets written faster, I resist feature creep better, and it's easier to do it this way than to decode already written code. I think it also helps encourage comments in the code. Like an outline for a term paper.

Add discussion of paths and the filesystem so that learners understand how to open files

I understand the need to minimise prerequisites and certainly the portions of the shell lessons relating to loops, pipes, filters, and scripts are unnecessary overhead. However, given that they will be loading data from files into pandas dataframes, there has to be some concrete concept of where the files are relative to the working directory (even in the Jupyter notebook). This is also going to be an issue with getting the data onto the machines the learners are using and loading them up later.

Anyway, my feeling is that, if this isn't expected as a prerequisite (which is reasonable for most of our novice learners), then we have to figure out how to get that knowledge to stick. At present, there are 10-15 minutes allocated in the "Reading tabular data" segment; that may be optimistic. If I were to squeeze out something, I'd start with the survey of scipy at the end of the day, followed by testing (as discussed in another issue).
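
A sketch of how the working directory and a relative path could be shown side by side. The `data` folder is an assumed layout, and the file is not required to exist for the example to run.

```python
from pathlib import Path

# The working directory is where relative paths start from.
working_dir = Path.cwd()

# Assumed layout: a 'data' folder next to the notebook or script.
relative_path = Path("data") / "gapminder_gdp_americas.csv"
absolute_path = working_dir / relative_path

print("working directory:", working_dir)
print("relative path:    ", relative_path)
print("resolves to:      ", absolute_path)
print("file exists:      ", absolute_path.exists())
```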

Clarify what is meant by processing multiple data sets

From #24:

Under Essential Questions, clarify what you mean by "process multiple data sets". Coming from statistics, that could be each file treated independently, or it could imply merging multiple files into a single data set with an indicator and then doing a stratified analysis. Both would be valuable to demonstrate.
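
The two readings can be sketched with the standard library alone. The file contents, region names, and numbers below are invented, and plain dictionaries are used here rather than whatever data structure the lesson ends up choosing.

```python
import csv
import io

# Invented stand-ins for two per-region files; the numbers are made up.
files = {
    "americas": "country,gdp\nAland,10\nBorduria,30\n",
    "europe": "country,gdp\nSyldavia,20\nRuritania,40\n",
}

# Reading 1: treat each file independently and summarise it on its own.
per_file_mean = {}
for region, text in files.items():
    rows = list(csv.DictReader(io.StringIO(text)))
    per_file_mean[region] = sum(float(r["gdp"]) for r in rows) / len(rows)

# Reading 2: merge everything into one data set, tagging each row with an
# indicator of its source so that a stratified analysis is possible later.
merged = []
for region, text in files.items():
    for row in csv.DictReader(io.StringIO(text)):
        merged.append({"region": region, **row})

print(per_file_mean)
print(len(merged), merged[0])
```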

Collected comments

It is a really nice outline! Here are a few comments.

Goals

Not all are treated in the lesson it appears, e.g.

f. Dependencies and requirements are explicit (e.g., a requirements.txt file)
i. Submit code to a reputable DOI-issuing repository upon submission of paper

In

c. Programs of all kinds (including "scripts") are broken into functions that:

the subitems from "Good Enough Practices in Scientific Computing" are missing.

Running and Quitting Interactively

Perhaps reverse the order: first show that the cell you have is a python cell. Then add that one can have markdown cells as well - in the hope of reducing confusion

Variables and Assignment

Trace behavior of swapping (a, b = b, a the old fashioned way) with an intermediate variable

I think I understand the idea, but it is not entirely clear from the description
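
My guess at what the exercise would trace, as a sketch:

```python
a, b = 1, 2

# The old-fashioned way: a temporary variable holds one value during the swap.
temp = a  # temp is 1
a = b     # a is 2, b is still 2
b = temp  # b is 1
after_manual = (a, b)
print(after_manual)

# The idiomatic way: tuple assignment swaps in a single statement.
a, b = b, a
print(a, b)
```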

Looping Over Data Sets

(use glob to get filenames)

This will require some explanation if learners do not know unix (but it is not impossible)
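
A self-contained sketch of the wildcard idea that would need explaining. The files are created empty in a temporary directory, and their names only imitate the lesson's data files.

```python
import glob
import os
import tempfile

# Create a few empty files in a temporary directory so the sketch runs anywhere.
data_dir = tempfile.mkdtemp()
for name in ["gapminder_gdp_africa.csv", "gapminder_gdp_asia.csv", "notes.txt"]:
    open(os.path.join(data_dir, name), "w").close()

# '*' matches any run of characters, so this pattern selects only the CSV files.
pattern = os.path.join(data_dir, "gapminder_*.csv")
filenames = sorted(glob.glob(pattern))

basenames = [os.path.basename(filename) for filename in filenames]
for basename in basenames:
    print(basename)
```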

Covering the .something and .something() syntax

There is a section titled "Built-in Functions (and Methods) and Help". Is this where we would introduce Python dot-something syntax and discuss attributes and methods? It should definitely happen here if not before, and feeds nicely into looking at help pages etc.
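
A small sketch of the distinction, using only built-in types:

```python
name = "gapminder"
number = 3 + 4j

# .something() calls a method: a function attached to the value.
shouted = name.upper()

# .something (no parentheses) reads an attribute: a piece of stored data.
real_part = number.real

# Without parentheses, a method is only referred to, not called.
method_object = name.upper

print(shouted, real_part, callable(method_object))
```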

Add Socrative Quizzes for real-time assessment

I think it would be nice to make a Socrative question for all the short exercise questions so that the instructor can administer the exercise and get realtime feedback on learner responses.

Pros:

  1. It can help keep the time devoted to each question very manageable because the instructor can see student correct and incorrect answers.
  2. We can collect data on what students are learning and how effective the lesson is.
  3. We can see which exercises the instructors chose to administer. Given how many people think fewer exercises are better, this could be some very valuable data.

01-Running - formatting and starting off

(There is a formatting glitch in the ~~~ for running the notebook. Needs to be on a new line? Also not clear to a total newcomer whether you type the $)

If this is for people who have never programmed before, I found the organization and beginning really confusing: "Python programs are plain text files" Now let's talk about using Jupyter which stores scripts as JSON.

Maybe there needs to be a lesson zero, which talks about gathering the commands they have typed into the python interpreter into a text file and running them as scripts. Then the current 01 could be called something about using Jupyter, and it could do the same thing in there?

The confusion is redoubled by the fact that they go back to interpreter-like interactions with basic assignments, rather than running scripts within a notebook.

Strings to Numbers exercise errors

In the 3rd chapter of the Python introduction, I noticed the following errors:

print("string to float:", float("3.4"))

The result of this should be

string to float: 3.4

and not

3.4

Similarly

print("float to int:", int(3.4))

The result of this should be

float to int: 3

and not

3

Thanks
Upendra
