Stat 133: Concepts in Computing with Data

Calendar

Instructor: Gaston Sanchez
Lecture: MWF 3:00-4:00pm VLSB 2050
Tentative calendar (weekly topics), subject to change depending on the pace of the course.
Notes (📁) cover material discussed in class.
Reading (📖) covers material that expands on lecture topics, as well as coding examples that you should practice on your own.
Misc (📰) is supporting material that is worth a look.

0. Course Introduction

📇 Dates: Jan 22-25
📎 Topics: Welcome to Stat 133. We begin with the usual review of the course policies/logistics, expectations, topics in a nutshell, etc. Then, we move on to an unconventional introduction to computing with data using my favorite analogy: "Data Analysis is a lot like Cooking".
📁 Notes:
📖 Reading:
🔬 Lab: No lab
📰 Misc:
- What is Data Science?
🔈 To Do:
- Install R
- Install RStudio Desktop (open source version, free)

1. The Big Picture and R Survival Skills

📇 Dates: Jan 28-Feb 01
📎 Topics: First things first. At the conceptual level we'll discuss how data analysis projects usually start with a Research Question. Also, we'll describe how Data can actually be seen from a triangular perspective (i.e. my "3 Views of Data"). At the practical level, you'll begin learning basic survival skills for R, followed by an overall review of the RStudio workspace. Then we move on to discuss basic data types and their implementation in R around vectors and other data structures.
📁 Notes:
- The Starting Point: Research Questions
- The Three Views of Data
- Be the Boss of your Data (talk and chalk)
- Data Types and Vectors
📖 Reading:
- First contact with R (tutorial)
- Intro to Rmd files (tutorial)
🔬 Lab:
- Getting started with R and RStudio (due Feb-01, open till Feb-17)
📰 Misc:
- Introduction to R Markdown (by RStudio)
💡 Cheat sheet:
- RStudio cheat sheet
- R markdown cheat sheet
🎯 WARM-UP 1:
- Markdown practice (due Feb-03, open till Feb-17)

2. More Data Structures: Arrays, Lists, and Dataframes

📇 Dates: Feb 04-08
📎 Topics: This week you'll keep learning about R data structures like arrays and lists. More specifically, we'll focus on fundamental concepts like atomicity, vectorization, recycling, and subsetting. And given that we are studying vectors and their cousins, we'll briefly review the traditional base graphics approach, which is built around R vectors.
📁 Notes:
- Arrays and Factors and Lists
- Data Frames part 1 and part 2
- Data Tables (introduction) and Spreadsheets
📖 Reading:
- Intro to vectors (tutorial)
- Intro to Data Technologies (preface, chapter 1, and chapter 5) (by Paul Murrell)
🔬 Lab:
- Getting started with vectors and factors (due Feb-08, open till Feb-17)
📰 Misc:
- chapter 20: Vectors (R for Data Science by Grolemund and Wickham)
💡 Cheat sheet:
- Base R
🎯 WARM-UP 2:
- Basic Data Objects (due Feb-10, open till Feb-17)

3. Transforming and Visualizing Tabular Data

📇 Dates: Feb 11-15
📎 Topics: Because data tables are so ubiquitous, you will have the chance to practice some data manipulation operations on data frames. Also, we'll discuss some considerations when importing tables (in R). Likewise, we begin a comprehensive discussion on concepts for data visualization.
📁 Notes:
- Importing tables part 1 and part 2
- Datavis: Classic Examples and Introduction
- Datavis: Encoding Data in Graphs
- Datavis: The Visual System
📖 Reading:
- "dplyr" tutorial slides (by Hadley Wickham)
🔬 Lab:
- Data Frame Basics (due Feb-17)
📰 Misc:
- Organizing data in spreadsheets (by Karl Broman)
- tibbles vignette
- Introduction to dplyr (by Hadley Wickham)
💡 Cheat sheet:
- Data transformation cheat sheet
- Data visualization with ggplot2
🎯 WARM-UP 3:
- Basic Data Manipulation (due Feb-17)

4. More Visualization

📇 Dates: Feb 18-22 (Holiday Feb-18)
📎 Topics: We continue reviewing more concepts of data visualization. At the practical level, it's important that you learn how to manipulate data tables via R data frames in a more modern and syntactic way. How? By following the data plying framework provided by the package "dplyr".
📁 Notes:
- Datavis: Using Color
- Datavis: Effective Charts
📖 Reading:
- "ggplot2" lecture (by Karthik Ram)
🔬 Lab:
- Data Wrangling and Graphics (due Feb-22)
📰 Misc:
- Tidy Data (by Hadley Wickham)
💡 Cheat sheet:
- Data transformation cheat sheet
🎯 WARM-UP 4:
- More Data Wrangling (due Feb-27)

5. Housekeeping: Filesystem and Bash Commands

📇 Dates: Feb 25-Mar 01
📎 Topics: Data Analysis Projects (DAPs) are made of files and directories. Therefore, we need to review some fundamental concepts such as the file system, the command line interface, and some basic shell commands.
📁 Notes:
📖 Reading:
- Linux Tutorial lessons 1-5 (by Ryan Chadwick)
- The Unix Shell lessons 1-3 (by Software Carpentry)
🔬 Lab:
- Command Line Basics (due Mar-01)
📰 Misc:
- Linux Command Line tutorial (by Guru99)
💡 Cheat sheet:
- command line cheat sheet
🎯 WORK-OUT 1:
- TBA (due Mar-13)
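
As a rough illustration of the basic shell commands mentioned in this week's topics, here is a minimal sketch. It is not taken from the lab; the directory and file names (`stat133-demo`, `data`, `raw.csv`, `notes.md`) are hypothetical, and everything happens inside a temporary directory so that none of your own files are touched.

```bash
# Work inside a throwaway directory (assumes mktemp is available, as on Linux and macOS).
# All names below are hypothetical examples, not course files.
workdir="$(mktemp -d)"
cd "$workdir"

pwd                              # print the current working directory
mkdir -p stat133-demo/data       # create nested directories in one step
cd stat133-demo                  # move into the project directory
touch README.md data/raw.csv     # create two empty files
ls -R                            # list the contents recursively
cp README.md notes.md            # copy a file
mv notes.md data/notes.md        # move (or rename) a file
rm data/raw.csv                  # delete a file (there is no trash bin)
cd ..                            # go back up one level
```

Note that `rm` deletes permanently, which is one reason to practice in a scratch directory first.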

6. Housekeeping: Version Control with Git and GitHub

📇 Dates: Mar 04-08
📎 Topics: We continue talking about file structure topics, and we introduce basic notions of version control systems (VCS) using Git and its companion hosting platform, GitHub. On the Data side, we begin our discussion about Tables: the most common form in which data is stored, handled, and manipulated. Consequently, we need to talk about the typical storage formats of tabular data, and the relationship between tables and R data frames.
📁 Notes:
- Git Basics (slides)
- Git Workflow (slides)
- Data Tables (slides)
- Importing Tables in R (slides)
📖 Reading:
- Read sections 4 to 9 in Part I Installation (Happy Git and GitHub for the useR by Jenny Bryan et al.)
- Basic manipulation of Data Frames (slides)
🔬 Lab:
- Git Basics (due Mar-08)
- Get your GitHub Classroom repo
📰 Misc:
- Data Import (R for Data Science by Grolemund and Wickham)
💡 Cheat sheet:
- Data import cheat sheet
- git cheat sheet
🎓 MIDTERM 1: Friday Mar-08

7. Transition to Programming Basics for Data Analysis (part 1)

📇 Dates: Mar 11-15
📎 Topics: You don’t need to be an expert programmer to be a data scientist, but learning more about programming allows you to automate common tasks, and solve new problems with greater ease. We'll discuss how to write basic functions, the notion of R expressions, and an introduction to conditionals.
📁 Notes:
- Creating functions (tutorial)
- Introduction to functions (tutorial)
- Introduction to R expressions and conditionals (tutorial)
🔬 Lab:
- Getting started with functions and conditionals (due Mar-15)
📰 Misc:
- chapter 19: Functions (R for Data Science by Grolemund and Wickham)
🎯 WARM-UP 5:
- TBA (due Mar-20)

8. Programming Basics for Data Analysis (part 2)

📇 Dates: Mar 18-22
📎 Topics: In addition to writing functions to reduce duplication in your code, you also need to learn about iteration, which helps you when you need to do the same operation several times. Namely, we review control flow structures such as for loops, while loops, and repeat loops, as well as the apply family of functions.
📁 Notes:
- Introduction to loops (tutorial)
- More about functions (tutorial)
- Functions (Advanced R by H. Wickham)
- Environments (Advanced R by H. Wickham)
🔬 Lab:
- Getting started with loops (due Mar-22)
📰 Misc:
- chapter 21: Iteration (R for Data Science by Grolemund and Wickham)
🎯 WARM-UP 6:
- TBA (due Apr-03)

9. Spring Recess

📇 Dates: Mar 25-29
📎 Topics: Recharge your batteries

10. Manipulating Character Strings and Testing Functions

📇 Dates: Apr 01-05
📎 Topics: At its heart, computing involves working with numbers. However, a considerable amount of information and data is in the form of text. Therefore, you also need to learn about character strings, and how to perform basic manipulation of strings. In parallel, we'll keep working on writing functions, especially focusing on testing functions.
📁 Notes:
- Intro to testing functions (tutorial)
- Character strings in R (r4strings by Sanchez)
- Basic string manipulations (r4strings by Sanchez)
📖 Reading:
- testthat: Get started with testing (by Wickham)
🔬 Lab:
- Getting started with strings (due Apr-05)
📰 Misc:
- chapter 14: Strings (R for Data Science by Grolemund and Wickham)
💡 Cheat sheet:
- Stringr cheat sheet
🎯 WORK-OUT 2:
- TBA (due Apr-17)

11. Regular Expressions

📇 Dates: Apr 08-12
📎 Topics: To unleash the power of string manipulation, we need to take things to the next level and learn about Regular Expressions. Regular expressions are a tool that allows us to describe sets of text strings by means of "patterns". We'll describe the basic concepts of regex and the common operations to match text patterns.
📁 Notes:
- Regexpal tester tool.
- Long Jump World Record example
- Log file example
📖 Reading:
- Handling Strings in R (by Sanchez)
🔬 Lab:
- Regular Expressions (due Apr-12)
💡 Cheat sheet:
- Regular Expressions cheat sheet
🎯 WORK-OUT 2:
- Keep working on your workout02 assignment.
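
To make the idea of matching text patterns concrete, here is a small sketch using `grep -E` from the command line. This is an assumption-laden illustration rather than the approach used in the course notes (which work with R): the log lines are made up for this example, and `grep -E` uses POSIX extended regular expressions, whose syntax differs in some details from R's regex functions.

```bash
# Hypothetical log lines, invented for this sketch (not the course's log file).
log='2019-04-08 10:15:02 INFO  server started
2019-04-08 10:15:07 ERROR disk quota exceeded
2019-04-09 08:01:44 WARN  slow response (1200 ms)
2019-04-09 08:02:10 ERROR connection reset'

# Anchor (^), character class ([0-9]), and quantifier ({4}):
# -c counts the lines that begin with a YYYY-MM-DD date.
dated=$(printf '%s\n' "$log" | grep -cE '^[0-9]{4}-[0-9]{2}-[0-9]{2} ')

# Alternation with (A|B): keep lines whose level is ERROR or WARN.
problems=$(printf '%s\n' "$log" | grep -E ' (ERROR|WARN) ')

# -o prints only the matched text: extract the HH:MM:SS timestamps.
times=$(printf '%s\n' "$log" | grep -oE '[0-9]{2}:[0-9]{2}:[0-9]{2}')

echo "$dated"
echo "$problems"
echo "$times"
```

The same three building blocks (anchors, character classes with quantifiers, and alternation) carry over to most regex flavors, although escaping rules vary.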

12. Random Numbers, Simulations, and Shiny Apps

📇 Dates: Apr 15-19
📎 Topics: Random numbers have many applications in science and computer programming, especially when there are significant uncertainties in a phenomenon of interest. In this part of the course we'll look at some basic problems involving working with random numbers and creating simulations. Alongside this, we will briefly discuss Shiny apps to better visualize the results of some simulations. These apps are a nice companion to R, making it quick and simple to deliver interactive analysis and graphics on any web browser.
📁 Notes:
- Introduction to random numbers
- Coin toss shiny app
- shiny tutorial (by Grolemund)
📖 Reading:
- Part 1 - How to build a Shiny app (video)
🔬 Lab:
- Random numbers and simulations (due Apr-19)
📰 Misc:
- Part 2 - How to customize reactions (video)
- Part 3 - How to customize appearance (video)
💡 Cheat sheet:
- shiny cheat sheet
🎯 WARM-UP 7:
- TBA (due Apr-24)

13. R Packaging (part 1)

📇 Dates: Apr 22-26
📎 Topics: Packages are the fundamental units of reproducible R code. They include reusable functions, the documentation that describes how to use them, and sample data. In this part we'll start describing how to turn your code into an R package.
📁 Notes:
- Programming S3 Classes
- Methods (by Sanchez)
📖 Reading:
- Package Structure (R packages by Wickham)
- See package components: http://r-pkgs.had.co.nz/ (R packages by Wickham)
🔬 Lab:
- HTML and Web scraping (due Apr-26)
💡 Cheat sheet:
- Package Development cheat sheet
🎯 WORK-OUT 3:
- TBA (due May-03)

14. R Packaging (part 2)

📇 Dates: Apr 29-May 03
📎 Topics: Creating an R package can seem overwhelming at first. So we'll keep working on the creation of a relatively basic package. This will give you the opportunity to apply most of the concepts seen in the course.
📁 Notes:
- Pack YouR Code (by Sanchez)
📖 Reading:
- See package components: http://r-pkgs.had.co.nz (R packages by Wickham)
🔬 Lab:
- Take advantage of lab discussion to work on the workout03 assignment
💡 Cheat sheet:
- Package Development cheat sheet

15. RRR Week and Final Exam

📇 Dates: May 06-10
📎 Topics: Prepare for final examination
📁 Notes:
- No lecture. Instructor will hold OH (in 309 Evans)
🎓 FINAL: May-15th, 7-10 pm (Room TBA)
- More details about the final will be posted on bCourses
