data-science's Introduction

Open Source Society University

📊 Path to a free self-taught education in Data Science!

Open Source Society University - Data Science

About

This is a path for those of you who want to complete the Data Science undergraduate curriculum on your own time, for free, with courses from the best universities in the world.

In our curriculum, we give preference to MOOC (Massive Open Online Course) style courses because these courses were created with our style of learning in mind.

Curricular Guideline

OSSU Data Science uses the report Curriculum Guidelines for Undergraduate Programs in Data Science as our guide for course recommendation.

Curriculum

Introduction to Data Science

What is Data Science

Introduction to Computer Science

Students who already know basic programming in any language can skip this first course

Introduction to programming

Introduction to Computer Science and Programming Using Python

Introduction to Computational Thinking and Data Science

Data Structures and Algorithms

The Algorithms courses are taught in Java. If students need to learn Java, they should take this course first

Java Programming

Algorithms I: ArrayLists, LinkedLists, Stacks and Queues

Algorithms II: Binary Trees, Heaps, SkipLists and HashMaps

Algorithms III: AVL and 2-4 Trees, Divide and Conquer Algorithms

Algorithms IV: Pattern Matching, Dijkstra’s, MST, and Dynamic Programming Algorithms

Databases

Database Management Essentials

Data Warehouse Concepts, Design, and Data Integration

Relational Database Support for Data Warehouses

Business Intelligence Concepts, Tools, and Applications

Design and Build a Data Warehouse for Business Intelligence Implementation

MongoDB for Developers Learning Path

Single Variable Calculus

Calculus 1A: Differentiation

Calculus 1B: Integration

Calculus 1C: Coordinate Systems & Infinite Series

Linear Algebra

Essence of Linear Algebra

Linear Algebra

Multivariable Calculus

Multivariable Calculus

Statistics & Probability

Introduction to Probability

Intro to Descriptive Statistics

Intro to Inferential Statistics

Statistical Learning with Python by Stanford University on edX or Statistical Learning with R by Stanford University on edX

Data Science Tools & Methods

Tools for Data Science

Data Science Methodology

Data Science: Wrangling

Machine Learning/Data Mining

Machine Learning

Intro to Machine Learning

Mining Massive Datasets

Process Mining

How to use this guide

Duration

It is possible to finish within about 2 years if you plan carefully and devote roughly 20 hours/week to your studies. Learners can use this spreadsheet to estimate their end date. Make a copy and input your start date and expected hours per week in the Timeline sheet. As you work through courses you can enter your actual course completion dates in the Curriculum Data sheet and get updated completion estimates.
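The arithmetic behind such an estimate is simple enough to sketch in Python. The function below is a hypothetical illustration, not the logic of the OSSU spreadsheet: `estimate_end_date` and its parameters are names invented here, and the total number of course hours is a figure you would have to supply yourself.

```python
import math
from datetime import date, timedelta


def estimate_end_date(start: date, total_hours: float, hours_per_week: float) -> date:
    """Return the date on which `total_hours` of study would be finished.

    Assumes a constant weekly pace and rounds up to whole weeks.
    """
    if hours_per_week <= 0:
        raise ValueError("hours_per_week must be positive")
    weeks_needed = math.ceil(total_hours / hours_per_week)
    return start + timedelta(weeks=weeks_needed)


# Hypothetical inputs: 2,000 course hours at 20 hours/week, starting 2024-01-01.
print(estimate_end_date(date(2024, 1, 1), total_hours=2000, hours_per_week=20))
```

With these assumed inputs the plan spans 100 weeks, a little under two years. Halving the weekly hours doubles the number of weeks.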

Order of the classes

Some courses can be taken in parallel, while others must be taken sequentially. All of the courses within a topic should be taken in the order listed in the curriculum. The graph below demonstrates how topics should be ordered.

Topic Progression Graph

Track your progress

  1. Create an account in Trello.
  2. Copy this board to your personal account. See how to copy a board here.

Now you just need to move the cards to the Doing or Done column as you progress in your studies.

Which programming languages should I use?

Python and R are heavily used in the Data Science community, and our courses teach you both. Remember, the important thing for each course is to internalize the core concepts and to be able to use them with whatever tool (programming language) you wish.

Content Policy

You must share only files that you are allowed to share. Do NOT violate the code of conduct that you sign at the beginning of your courses.

Prerequisites

The Data Science curriculum assumes the student has taken high school math and statistics.

How to contribute

You can open an issue and give us your suggestions as to how we can improve this guide, or what we can do to improve the learning experience.

You can also fork this project and send a pull request to fix any mistakes that you have found.

If you want to suggest a new resource, send a pull request adding such resource to the extras section. The extras section is a place where all of us will be able to submit interesting additional articles, books, courses and specializations.

Code of Conduct

OSSU's code of conduct.

Community

We have a Discord server! This should be your first stop to talk with other OSSU students. Why don't you introduce yourself right now?

Subscribe to our newsletter.

You can also interact through GitHub issues.

Add Open Source Society University to your LinkedIn and Facebook profiles!

Team

data-science's People

Contributors

ahmedabbas11, akshat02, daniel-web-developer, elahi-cs, ericdouglas, johnaoss, meboler, pulkitkrishna00, raincrash, royshouvik, sidgupta234, smcgb, tawfiq9009, torrontogosh, tuskydev, waciumawanjohi, zayd-r

data-science's Issues

Trello Board

When accessing the Data Science Trello board and clicking on the menu, we are unable to copy the board.

Capstone in Coursera and Nanodegree of Udacity are not free.

As far as I know, on Coursera all courses from the specialization except for the capstone project can be found and enrolled in separately. To access the capstone project, you need to get certification from all other courses in the specialization.
Also, on Udacity the Nanodegree programs are not free.

I think you should add these points in the introduction as many people may think all those specializations are free. Unless I'm wrong in which case I would like to know what's the procedure of accessing them for free.

Thanks

Not all courses listed here are free

The majority of courses that link to Coursera from the Databases and Data Science Tools & Methods section are not free because they belong to some specializations. Is there any way to access them for free or alternatives to them?

Question

Should I take the Computer Science curriculum first, or can I start with this one?

LICENSE

The repository is missing a LICENSE. I propose adding the MIT license, consistent with the ossu/computer-science repository.

Data Wrangling

Hi
I'm currently a Udacity Data Analyst Nanodegree student. One of the tougher projects that I encountered recently involved data munging and wrangling. As the New York Times once noted, data wrangling can take 50-80% of a data scientist's time on data science projects. So I think it is important for people to bring their data wrangling skills up to par as well when learning data science.
If you'd like you can include the following course under a new 'Data Wrangling' section -
Data Wrangling with MongoDB

Or maybe any other courses from EdX, Coursera or elsewhere.

Thanks
Akshat Tickoo

Request for Comments: Data Science Curriculum v2

Problem:
The curriculum has not been maintained and does not represent best practice.

Duration:
2020-08-31

Background:
OSSU recommends courses that would constitute an undergraduate major in Data Science. It is our responsibility to ensure that we follow best practice. To do so, we must bring the curriculum into alignment with external guidelines. A candidate set of guidelines has been identified and previously proposed.

In 2017, the Annual Review of Statistics and Its Application published the report "Curriculum guidelines for undergraduate programs in data science." The report was authored by “25 undergraduate faculty from a variety of institutions in the United States, primarily from the disciplines of mathematics, statistics, and computer science.” It had a goal of providing “structure for institutions planning for or revising a major in data science.”

The current state of OSSU Data Science is one of disrepair. The curriculum has had 1 change in 3 years. That change deleted a link to a broken application. But there remained many links to courses that are no longer offered. A list of these can be found here. Prospective students have posted in the issues asking if the Data Science curriculum is still maintained. Updating the curriculum must ensure that all courses are available for students.

Proposal:
OSSU Data Science should adopt “Curriculum guidelines for undergraduate programs in data science” (CGUPDS) as our guidelines. The curriculum should be updated to match. The exact changes can be reviewed in this pull request.

RFC:

Problem:
From what I can see, there is no way to track progress unless users track it all by themselves. Could we create a README.md in which the user enters a command and the class is checked off as "completed" by the program when rendered?

There are more things I would like to add to this like specific courses for specialties but I didn't know how active or dynamic the project is.
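One way such a command could work is sketched below. It assumes the README lists each course as a GitHub task-list item (`- [ ] Course name`), which the current README does not do, and `mark_completed` is a hypothetical helper name, not an existing OSSU tool.

```python
import re


def mark_completed(markdown: str, course: str) -> str:
    """Tick the task-list checkbox of `course` in a Markdown checklist.

    Only an unchecked item whose text matches `course` exactly is changed.
    """
    pattern = re.compile(
        r"^([ \t]*[-*] )\[ \]( " + re.escape(course) + r")[ \t]*$",
        re.MULTILINE,
    )
    return pattern.sub(r"\1[x]\2", markdown)


# Hypothetical checklist built from two course titles in the curriculum.
checklist = (
    "- [ ] Calculus 1A: Differentiation\n"
    "- [ ] Calculus 1B: Integration\n"
)
print(mark_completed(checklist, "Calculus 1A: Differentiation"))
```

GitHub renders `- [x]` items as ticked checkboxes, so a personal fork of the README would show progress without any extra tooling.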

our curricular guidelines. Examples are:

  • OSSU lists course X as required when the course's topics are elective in our curricular guidelines.
  • OSSU does not have a course to cover required topic X from our curricular guidelines.
  • OSSU lists courses X, Y and Z that cover the same topics when fewer courses could suffice.
  • OSSU recommends course X to teach a topic, but there exists a higher quality course that covers the same material.

Duration:
This should most often be 1 month from the date of posting.

Background:
Give an in depth description of the problem. Describe a solution to the problem. Describe the advantages and disadvantages of this solution. This section should be a few paragraphs.

Proposal:
Give a bullet point list of changes that are being proposed. These can link to a Pull Request.

Alternatives:
Give a bullet point list of alternative ways to address the problem.

Fixing the link of Probability and Statistics

The link "Probability and Statistics with R" currently points to
https://github.com/open-source-society/data-science#probability-and-statistics-with-r

But the section "Probability and Statistics with R" has href "#probability-and-statistics".
One of them needs to be changed.

I am learning to contribute. Please help me out in creating a pull request because I want to fix the issue myself. I want this to be my first contribution towards open-source.

Add official badge

Maybe we can put an official badge in this repository, so students will be able to link back to this in their own projects.

Example: Open Source Society University - Data Science

  • Markdown: [![Open Source Society University - Data Science](https://img.shields.io/badge/OSSU-data--science-blue.svg)](https://github.com/open-source-society/data-science)
  • HTML: <a href="https://github.com/open-source-society/data-science"><img alt="Open Source Society University - Data Science" src="https://img.shields.io/badge/OSSU-data--science-blue.svg"></a>

😄

Programming languages

As a data scientist, Python is your native language, and it is a bit annoying to learn another programming language just for data structures and algorithms. So I think you may need to find alternative courses for data structures and algorithms that use Python.

RFC: Add python alternative for algorithms course

Problem:
Our curricular guidelines do not require learning multiple languages, but our curriculum asks students to learn a language just to take the algorithms classes.

Duration:
2021 Aug 15

Background:
Our curricular guidelines make only a few references to programming languages. Students are expected to know SQL, the "language" of math, and "a suitable high-level language". (emphasis mine) The introductory courses, and most courses, use Python as this high-level language.
But students are directed to Robert Sedgewick's Algorithms course, which is taught in Java.

Students have asked for a Python alternative in issues and in the Discord (example).

A possible option is the free interactive textbook Problem Solving with Algorithms and Data Structures using Python. The book links to a set of supporting lectures. It also has some exercises in the text, which are paired with YouTube video solutions. Each chapter ends with a set of exercises; it does not seem that there is an official solution set, but some student solutions can be found on GitHub.

This free textbook is used by dozens of college courses. It is well rated by Goodreads and by pythonbooks (which is really a measure of popularity on Amazon).

It is not clear that the book is of the same quality as the Sedgewick course. For one, the Sedgewick course provides an autograder. For another, user ratings of the Sedgewick Algorithms book are notably higher.

Proposal:
Offer Problem Solving with Algorithms and Data Structures using Python as an alternative Algorithms course for students who want to study in Python.

Alternatives:

  1. Stick with the status quo.
  2. Replace the Sedgewick course rather than offer an alternative.

RFC: Remove Patreon Link

Problem:
The OSSU Patreon link at the top of the curriculum does not navigate to an active contribution page on Patreon and may be Eric's individual page. (The same link is listed on the bioinformatics curriculum).

Duration:
Until 11/23/2021

Proposal:
Propose removing the Patreon link or replacing it with an updated OSSU Patreon account.

Courses in the web app are different

The README says that I can track my progress in the web app ("my progress" tab). However, the courses there are different: for example, the data science track features Calculus from edX and MIT, but the web app has only calculus from Coursera. Am I missing something?

Is this course still relevant?

The last update to this curriculum was made 3 years ago, so is it still relevant? Are there better free courses out there than the ones listed here? I am planning on starting this. Is there anything I need to know?

Natural Language Processing

Hi, it seems the Natural Language Processing course linked does not exist anymore.

I recommend using this course in its place.

If you agree, I can make a pull request.

Probability and Statistics with R

curriculum title - "Probability and Statistics with R"

This part has nothing to do with R programming. I think the term "R" should be removed from the title, and courses about R and statistics need to be added.

Linear Algebra

Is there not a better Linear Algebra course? This LAFF course is painfully obtuse and dull.

Q: How long does this program typically take?

Hi, sorry if this is the wrong place to post this question but I can’t seem to find an answer. What is the typical duration for the OSSU data science curriculum (assuming a full-time study load)? Thanks!

RFC: Overhaul Statistics

Summary

OSSU should undertake a search for a number of new courses in statistics.

Background

OSSU currently recommends 2 courses on statistics:

The first of these is no longer offered.

Guidelines

OSSU Data Science uses the report Curriculum Guidelines for Undergraduate Programs in Data Science as our guide for course recommendation.

Section 6 "Transitioning To A Data Science Major Using Typical Existing Courses" states:

...The courses shown in bold are the ten courses that cover the bare minimum of the basic skills needed for data science...

Subsection 6.3 "Courses in Statistics" states:

Content in the Introduction to Statistics course should follow the revised Guidelines for Assessment and Instruction in Statistics Education (GAISE) for college courses

  • Introduction to Statistics
  • Statistical Modeling/Regression
  • Machine Learning/Data Mining
  • Theory of Statistics (requires Probability Theory)

Gaise

For reference, the K-12 GAISE report uses a framework of 3 levels of sophistication with stats expected of K-12 students. This can be found on page 24.

The GAISE College Report includes goals, recommendations, and suggestions for topics that might be omitted.

Goals (summarized)

  1. Critique stats based results/conclusions.
  2. Recognize when statistics would be useful and carry out investigations using stats.
  3. Produce graphical displays and numerical summaries. Interpret them.
  4. Explain the role of variability in statistics.
  5. Explain the central role of randomness in designing studies and drawing conclusions.
  6. Use statistical models, including multivariable models.
  7. Understand and use hypothesis tests and interval estimation in a variety of settings.
  8. Interpret and draw conclusions from output of statistical software packages.
  9. Demonstrate an awareness of ethical issues associated with sound statistical practice.

Recommendations

These are largely recommendations for how statistics courses should be taught.

  1. Teach statistical thinking
  2. Focus on conceptual understanding
  3. Integrate real data with a context and a purpose
  4. Foster active learning
  5. Use technology to explore concepts and analyze data
  6. Use assessments to improve and evaluate student learning

Suggestions for Topics that Might be Omitted from Introductory Statistics Courses

  • Probability theory
  • Constructing plots by hand
  • Basic statistics
  • Drills with z-, t-, χ²-, and F-tables
  • Advanced training on a statistical software program

Of note, the basic statistics section reads:

Histograms, pie charts, scatterplots, means, and medians are now taught in middle and high school and are a prominent part of the Common Core State Standards in Mathematics. Classes taught to adults continuing their education or to students with a different high school background may need to spend a bit more time on basic statistics. No matter the audience, instructors will want to be sure that students truly understand these concepts, but should not dwell on them more than is necessary. Instructors may want to briefly review them to be sure terminology and notation are consistent, but this should take little time.

Assertions

  • OSSU Data Science curriculum should not recommend a descriptive stats course. This is prerequisite material; OSSU's focus is requisite material for undergraduate learners.
  • OSSU should identify a suitable Introduction to Statistics course, replacing the two current recommendations
  • After identifying the appropriate Introduction to Statistics course, OSSU should determine if a Statistical Modeling/Regression course is necessary. I would be unsurprised if a suitably rigorous Intro course, paired with our existing ML courses, proves sufficient.
  • OSSU should identify an optional Theory of Statistics course.

Request for Comments

This RFC is asking specifically for comments on the assertions above. Are these the right steps? Are there other implications for OSSU's curriculum that are not identified?

There will be other RFCs for carrying out the individual steps (e.g. there will be a separate RFC for Identify an Introduction to Statistics course).

5 years duration !

I just calculated the duration for all courses and it came to 5 years! 😄

PROJECTS file

Hi,

It seems the projects file of the computer-science path has been moved to its own repository, and I was told that the file in open-source-society/help is not used anymore. It seems convenient to have our own PROJECTS file in this repository.

We could use the file in open-source-society/help as a starting point, but that seems too general as it includes courses from other paths, too.

related: https://github.com/open-source-society/help/pull/7

mentioning: @ericdouglas

a lot of Python Courses

There are 4 courses for Python, which is more than enough,
2 courses for probability,
and more duplicated courses.
Please refactor the content like the Computer Science track.

SDK update required to sign up via Web App

Steps to reproduce
Navigate to https://ossu.firebaseapp.com/#/
Click Login with Github
Error Message encountered: {"code":"SERVICE_UNAVAILABLE","statusCode":410,"message":"SDK Update Required: See https://goo.gl/2cKQWm.","details":"SDK Update Required: https://goo.gl/2cKQWm. "}
More details on updating the SDK, as of December 2018, are given in the Google doc below.
https://docs.google.com/document/d/1vpQV8DBQLkIZci7Vh8N4LUHyTkBOsuW5eXE1x8IAqvw/edit?usp=sharing

Would like to know if we can update the SDK as per above instructions. Thanks
