
data_bootcamp's People

Contributors

cc7768, davebackus, pbackus, sebecketthile, sglyon


data_bootcamp's Issues

class 1: describe directory structures

We might want to do this in a way that's approachable for newbies:

  • describe what they look like
  • explain how to create a new directory (Data_Bootcamp)
  • download a file from the GitHub repo and save it

It has to be platform- and OS-independent.
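One platform-independent way to handle the directory steps is Python's pathlib, which builds paths with the right separator on Windows, macOS, and Linux. Below is a minimal sketch, assuming the folder lives in the user's home directory. The function names are hypothetical, and the download helper needs a real raw-file URL from the GitHub repo, which is not given here.

```python
from pathlib import Path
from urllib.request import urlretrieve


def course_dir(base=None):
    """Return the Data_Bootcamp folder inside `base`, creating it if needed.

    `base` defaults to the user's home directory, which pathlib resolves
    correctly on Windows, macOS, and Linux. (Hypothetical helper.)
    """
    base = Path.home() if base is None else Path(base)
    path = base / "Data_Bootcamp"
    path.mkdir(parents=True, exist_ok=True)  # no error if it already exists
    return path


def download(url, folder):
    """Save the file at `url` into `folder`, keeping its original name. (Hypothetical helper.)"""
    dest = Path(folder) / url.rsplit("/", 1)[-1]
    urlretrieve(url, str(dest))
    return dest


# Show where the folder would go on this machine; the separators differ by OS.
print(Path.home() / "Data_Bootcamp")
```

For example, `download(url, course_dir())`, with `url` set to a file's raw GitHub URL, would save that file into the Data_Bootcamp folder.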

css themes for IPython

Low priority, but Brandon Rhodes uses one to make the display of DataFrames sharper. I'm curious how this works.

pandas cleaning notebook

Below is a list of comments. If you want me to make any of the changes, let me know:

  • When you start the section on string methods, you include the pure Python example of converting "$123.45" to a float twice. I think once is enough.
  • I've often used string methods and pd.to_datetime to combine three numeric columns for year, month, and day into a single column with a pandas datetime type.
  • When you introduce selecting variables and observations, you might also mention indexing, a term I hear often in addition to subsetting, filtering, and slicing.
  • There is a typo in the fourth bullet point showing all the ways you can index into df: you wrote df[nlist]] instead of df[nlist].
  • When talking about boolean selection, we might want to introduce the query method. It is very concise, and when numexpr is installed it can evaluate the expression more efficiently than building Series/DataFrames of booleans by hand. I also like that it frees us from managing the boolean objects ourselves: there's less room for bugs when we handle fewer temporary variables.
  • Formatting preference: for the exercise at the bottom, I'd put the list of questions after the code snippet that loads the data instead of before it.
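Below is a rough sketch of the string conversion, the two ways of building a date, and the query method mentioned in these comments. It assumes pandas and uses made-up data; the column names and values are hypothetical, not taken from the notebook.

```python
import pandas as pd

# Hypothetical data: three numeric date columns plus a price stored as text.
df = pd.DataFrame({
    "year": [2014, 2015, 2015],
    "month": [1, 6, 12],
    "day": [15, 30, 1],
    "price": ["$123.45", "$67.80", "$1,020.00"],
})

# String methods: strip "$" and "," literally, then convert to float.
df["price"] = (
    df["price"]
    .str.replace("$", "", regex=False)
    .str.replace(",", "", regex=False)
    .astype(float)
)

# Option 1: pd.to_datetime assembles dates directly from year/month/day columns.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Option 2: build "YYYY-MM-DD" strings with string methods, then parse them.
date_str = (
    df["year"].astype(str)
    + "-" + df["month"].astype(str).str.zfill(2)
    + "-" + df["day"].astype(str).str.zfill(2)
)
df["date2"] = pd.to_datetime(date_str, format="%Y-%m-%d")

# Boolean selection built by hand, with a temporary mask variable...
mask = (df["year"] == 2015) & (df["price"] > 100)
by_hand = df[mask]

# ...and the same selection with query(), with no mask to manage.
by_query = df.query("year == 2015 and price > 100")
print(by_query)
```

Passing `regex=False` makes "$" a literal character. Read as a regular expression, it would mean "end of string".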

website

I've created a directory called Markdown and put a few files in it that should serve as the basis for a course website. Links to other pages are relative.

The question is how we want to set up the course website: use this approach, or use these inputs to create something nicer? Thoughts welcome.

Suggestion: Add lecture on SQL

Hi Dave,

Your posts today led me here. Reading the syllabus you've posted, I'd urge you to add a lecture on SQL, since it's by far the most common source of data in a corporate environment.

In my experience, work breaks down as:

  1. Get data
  2. Transform data
  3. Explore data
  4. Model data
  5. Generate graphs/figures
  6. Present analysis

For me, SQL is a critical piece of steps 1, 2, and sometimes 3.

Best,

-Tim

Cannot run anything in SQL_Intro file

I have run the SQL_Intro file without any problems in the past. I went back to it for review today. After selecting Run All, I noticed that none of the cells (the In [ ] prompts) had numbers in the brackets. Whenever I enter a command, I don't get the expected output. I'm pretty certain I haven't installed any apps or anything else that would have changed settings on my computer.

Here's an example:

run('''
PRAGMA TABLE_INFO(sales_table)
''')


NameError Traceback (most recent call last)
in ()
----> 1 run('''
2 PRAGMA TABLE_INFO(sales_table)
3 ''')

NameError: name 'run' is not defined

I am running Anaconda (Py 3.4) on a Windows 7 device.

Thank you.
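`NameError: name 'run' is not defined` means the cell that defines `run` has not been executed in the current kernel session. Empty `In [ ]` brackets also indicate that cells have not run, so restarting the kernel and running the cells from the top is the first thing to try. For reference, a helper like `run` might look roughly like the hypothetical sketch below. It uses the standard-library sqlite3 module and a stand-in in-memory table; the notebook's own helper and database may differ and may, for instance, return a pandas DataFrame.

```python
import sqlite3

# Hypothetical stand-in for the notebook's database: an in-memory sales_table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_table (id INTEGER, item TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales_table VALUES (?, ?, ?)",
    [(1, "widget", 9.99), (2, "gadget", 24.50)],
)
conn.commit()


def run(query):
    """Execute `query` on `conn` and return (column names, rows). (Hypothetical helper.)"""
    cursor = conn.execute(query)
    if cursor.description is None:  # statements like CREATE return no result set
        return [], []
    columns = [d[0] for d in cursor.description]
    return columns, cursor.fetchall()


columns, rows = run('''
PRAGMA TABLE_INFO(sales_table)
''')
for row in rows:
    print(dict(zip(columns, row)))
```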

Add intro

  • How programs work

Unlike spreadsheets, programs go in order:

input data
plot data
compute mean
compute standard deviation
run regression

Each of these components has a number of steps, but the idea is to run through them in order: like a novel, and unlike a spreadsheet, which recalculates everything at once.
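As an illustration of running these steps in order, here is a tiny program using only Python's standard library. The data are made up, and the plotting step is left as a comment so the sketch runs without a plotting library.

```python
import statistics

# Step 1: input data (hypothetical numbers; a real program might read a file here).
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

# Step 2: plot data (omitted to keep the sketch free of plotting libraries).

# Step 3: compute the mean.
mean_y = statistics.mean(y)

# Step 4: compute the standard deviation (sample version, n - 1 denominator).
sd_y = statistics.stdev(y)

# Step 5: run a simple least-squares regression of y on x.
mean_x = statistics.mean(x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x

print(f"mean={mean_y:.2f}, sd={sd_y:.2f}, slope={slope:.2f}, intercept={intercept:.2f}")
```

The regression in step 5 uses `mean_y` from step 3. The program only works because the lines run from top to bottom.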

  • Resources

Hard way

Codecademy

Coursera

Comment on the strengths and weaknesses of each.

Compelling examples for first class

It's helpful to show students what they'll get out of the course and what their output might look like. So I thought we might collect some figures that we find compelling in some way. Some of the appeal will come from the inherent interest in the data, but we might also mix in some interesting high-tech stuff. If things cross your mind, add them here for future reference.

My current list, not filtered for quality, is:

  • Emerging market indicators: indicators of business conditions by country from the World Bank, ending with a radar chart.
  • Business cycle indicators: leading and contemporaneous monthly indicators of US economic growth
  • Demographics: current and projected age distributions of populations by country. Can we make this dynamic, e.g. as a GIF?
  • Equity returns: distributions of returns on Fama-French equity portfolios. What would $1000 in 1950 be worth now?
  • Term structure of interest rates: dynamics of the term structure of interest rates. Sarah did a dynamic version in plot.ly, but that takes some setup.
  • Options: prices of puts and calls by strike, implied volatility smiles.

I'm not particularly attached to any of them.
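For the equity-returns idea, the "$1000 in 1950" question is a compounding calculation. The ending value is the initial amount times the product of (1 + r) over all periods. Below is a minimal sketch with made-up annual returns, not actual Fama-French data. The function name is hypothetical; returns quoted in percent would need dividing by 100 first.

```python
import math


def grow(initial, returns):
    """Value of `initial` after compounding a sequence of periodic returns.

    Each return is a decimal (0.10 means +10%), so the growth factor is
    the product of (1 + r) over all periods. (Hypothetical helper.)
    """
    return initial * math.prod(1 + r for r in returns)


# Hypothetical annual returns: +10%, -5%, +20%.
sample_returns = [0.10, -0.05, 0.20]
print(round(grow(1000, sample_returns), 2))  # 1254.0
```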

wireless access in the classroom

We should check this:

  • If we have 50 people in a classroom downloading Anaconda, will the system be overloaded?
  • Also, what should we do about non-Stern students? Can we get them guest accounts, or give them instructions for setting up their Stern accounts?

package installation and updates

We should have a section about conda and pip, and maybe some other tools, so students understand how to install and update packages. Would you mind sketching out a Markdown review of how these work? Rough is fine. You might also check whether Launcher does this for conda.

Assigned to: Chase and Spencer
Approx deadline: Sep 15

Example for problem with years

Add an example where years are represented as floats, which makes the x or y axis display in scientific notation (e.g. 1.0 + 2.01e3).
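Below is a sketch of such an example, assuming matplotlib. The right panel forces plain (non-offset, non-scientific) tick labels with `Axes.ticklabel_format` and restricts ticks to whole numbers with `MaxNLocator(integer=True)`. Whether the default left panel actually shows an offset depends on the data's range and on matplotlib's settings, so the data here are only illustrative.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator

# Hypothetical data: years stored as floats rather than integers.
years = [2010.0, 2011.0, 2012.0, 2013.0, 2014.0, 2015.0]
values = [1.2, 1.5, 1.4, 1.8, 2.0, 2.3]

fig, (ax_before, ax_after) = plt.subplots(1, 2, figsize=(8, 3))

ax_before.plot(years, values)
ax_before.set_title("default tick formatting")

ax_after.plot(years, values)
# Turn off the offset and scientific notation on the x axis...
ax_after.ticklabel_format(axis="x", style="plain", useOffset=False)
# ...and place ticks only at whole years.
ax_after.xaxis.set_major_locator(MaxNLocator(integer=True))
ax_after.set_title("plain, whole-year ticks")

fig.tight_layout()
# Use fig.savefig("years.png") or plt.show() to view the result.
```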
