davebackus / data_bootcamp
Materials for a course at NYU Stern using Python to study economic and financial data.
License: MIT License
We might want to do this in a way that's approachable by newbies:
Has to be platform/OS independent.
Low priority, but Brandon Rhodes uses one to make the display of df's sharper. I'm curious how this works.
We started the problem sets in LaTeX. For example:
https://github.com/DaveBackus/Data_Bootcamp/blob/master/Documents/bootcamp_practice_1.pdf
Is there a better technology than this? Submitting answers requires students to write out code by hand, which seems a little clunky. We could do notebooks, but had thought we'd postpone them until the second half.
Spencer and I thought we'd see what Chase could come up with. Or should we stay with LaTeX?
Below is a list of comments. If you want me to make any of the changes let me know:
"$123.45"
a float two times. I think just once is enough.
pd.to_datetime to convert three numeric columns for year, month, day into a single column with a pandas datetime type.
indexing often (in addition to subsetting, filtering, and slicing)
df. You wrote df[nlist]] instead of df[nlist]
query method. It is very concise, compiles the expressions, and runs them more efficiently than we do when constructing these Series/DataFrames of booleans by hand. I also like that we don't have to manage the boolean objects ourselves: there's less room for bugs when we handle fewer temporary variables.
I've created a directory called Markdown and put a few files in it that should serve as the basis for a course website. Links to other pages are relative.
The question is how we want to set up the course website. Use this way, or use these inputs to create something nicer? Thoughts welcome.
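For the review comments further up (currency strings, pd.to_datetime, and the query method), here is a minimal sketch of what those suggestions could look like. The DataFrame, its column names, and the values are made up for illustration; they are not taken from the course notebooks.

```python
import pandas as pd

# Made-up data: year/month/day as separate numeric columns, prices as strings.
df = pd.DataFrame({
    "year": [2014, 2015, 2015],
    "month": [12, 1, 6],
    "day": [31, 15, 30],
    "price": ["$123.45", "$99.00", "$250.10"],
})

# Convert the currency strings to floats in a single pass.
df["price"] = df["price"].str.replace("$", "", regex=False).astype(float)

# Assemble one datetime column from the year, month, and day columns.
# pd.to_datetime expects the columns to be named year, month, day.
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# Filter with query instead of building boolean Series by hand.
recent_expensive = df.query("year == 2015 and price > 100")
print(recent_expensive)
```

With query there are no intermediate boolean variables to keep track of; the whole condition lives in one string.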
Hi Dave,
Your posts today led me here. Reading the syllabus you've posted here, I'd urge you to add a lecture on SQL as it's by far the most common source of data in a corporate environment.
In my experience, work breaks down as:
For me, SQL's a critical piece of 1, 2, and sometimes 3.
Best,
-Tim
I have been running the SQL_Intro file without any problems in the past. I went back to it for review today. After selecting run all, I noticed that none of the line items (in [ ]) had numbers in the brackets. Any time I enter a prompt, I don't get the proper outputs. I'm pretty certain I haven't installed any apps or anything that would've changed settings on my computer.
Here's an example:
run('''
PRAGMA TABLE_INFO(sales_table)
''')
NameError Traceback (most recent call last)
in ()
----> 1 run('''
2 PRAGMA TABLE_INFO(sales_table)
3 ''')
NameError: name 'run' is not defined
I am running Anaconda (Py 3.4) on a Windows 7 device.
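A NameError like this means the cell that defines run has not been executed in the current kernel session, and empty brackets next to every cell suggest that nothing ran at all. Restarting the kernel and re-running the notebook from the top is the usual first step. For reference, here is a rough sketch of what such a helper might look like. This is an assumption, not the actual definition from SQL_Intro: the connection, the table schema, and the body of run are all hypothetical.

```python
import sqlite3
import pandas as pd

# Hypothetical setup: an in-memory database standing in for the notebook's data.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales_table (order_id INTEGER, revenue REAL)")

def run(query):
    """Run a SQL query against `con` and return the result as a DataFrame."""
    return pd.read_sql_query(query, con)

# This only works after the cell defining `run` has been executed.
info = run('''
PRAGMA TABLE_INFO(sales_table)
''')
print(info)
```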
Not like spreadsheets, they go in order:
input data
plot data
compute mean
compute standard deviation
run regression
Each of these components has a number of steps, but the idea is to run through them in order. Like a novel, but not like a spreadsheet, which does everything at once.
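The steps above can be sketched as a short script that runs top to bottom. The numbers are made up, and the plotting line is left commented out since it assumes matplotlib is installed.

```python
import numpy as np
import pandas as pd

# 1. Input data (made-up numbers for illustration).
df = pd.DataFrame({"x": [1, 2, 3, 4, 5],
                   "y": [2.1, 3.9, 6.2, 7.8, 10.1]})

# 2. Plot data (uncomment if matplotlib is installed).
# df.plot.scatter(x="x", y="y")

# 3. Compute mean.
y_mean = df["y"].mean()

# 4. Compute standard deviation (pandas uses the sample formula by default).
y_std = df["y"].std()

# 5. Run regression: least-squares fit of y = intercept + slope * x.
slope, intercept = np.polyfit(df["x"], df["y"], deg=1)

print(y_mean, y_std, slope, intercept)
```

Each line depends only on what came before it, which is the sense in which a script reads like a novel.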
Hard way
Codecademy
Coursera
Comments on each, strengths and weaknesses.
It's helpful to show students what they'll get out of the course, what their output might look like. So I thought we might collect some figures that we think are compelling in some way. Some of this will follow from the inherent interest in the data, but we might also mix in some interesting hi-tech stuff. If things cross your mind, add them here for future reference.
My current list, not filtered for quality, is
I'm not particularly attached to any of them.
We should check this:
We should have a section about conda and pip, maybe some other things, so students understand how to install and update packages. Would you mind sketching out a markdown review of how these work? Rough is fine. You might also check and see if Launcher does this for conda.
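As a possible starting point for that section: packages are typically installed and updated from a terminal with commands such as `conda install pandas`, `conda update pandas`, or `pip install --upgrade pandas`. From inside Python, students can check what is currently installed. The sketch below assumes Python 3.8 or later (for importlib.metadata); installed_version is a hypothetical helper name, and the package list is just an example.

```python
from importlib import metadata

def installed_version(package):
    """Return the installed version string of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None

# Example package names; swap in whatever the course requires.
for name in ["pandas", "numpy", "matplotlib"]:
    print(name, installed_version(name))
```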
Assigned to: Chase and Spencer
Approx deadline: Sep 15
Add an example where years are represented as floats, which makes the x or y axis display in scientific notation (e.g. 1.0 + 2.01e3).
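A minimal sketch of such an example, with made-up data. Whether the offset notation actually appears depends on the matplotlib version and the range of the data, so treat this as an illustration of the fix rather than a guaranteed reproduction of the problem.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

years = [2010.0, 2011.0, 2012.0, 2013.0, 2014.0]  # years stored as floats
values = [1.2, 1.5, 1.1, 1.8, 2.0]                # made-up data

fig, ax = plt.subplots()
ax.plot(years, values)

# Turn off offset / scientific notation on the x axis.
ax.ticklabel_format(axis="x", style="plain", useOffset=False)

# Put one tick on each year so labels read 2010, 2011, ...
ax.set_xticks(years)
```

An alternative is to convert the years to integers (or to dates) before plotting, which avoids the float formatting altogether.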