datalorax / sds-r

Repo for a draft book on social data science methods with R

Home Page: https://sds.pub

License: Creative Commons Attribution 4.0 International

CSS 37.90% HTML 0.61% R 61.49%

social-data-science rstats data-science r

sds-r's Issues

Fix tibble printing

Not sure what's going on. The tibble outputs print fine on my local machine but look messed up here.

Update README

@ohmaddi can you update the README to mention that you are a co-author as well?

@brendanhcullen I know you already started on the package for learnr tutorials. I think we need a standalone package for the book. It could be the one you started, or we could start a new one and port what you have in that over. The only reason I think the latter might be a better idea is just because then we could have the package name be sds or sdsr to match the book title.

The main reason I think we want a package for the book is because (a) we can create our own themes for different types of plots that we use throughout the book, and (b) we can use it as a way to share data, so they don't have to clone the repo or something to get the data. Incorporating the learnr tutorials there also makes sense.

Happy to start this or work from what you have, Brendan. Let me know what you each think.

Git Chapter review

Review bagged trees and random forests chapter

I merged in the full draft so students have access to it, but we'll still need a review at some point (doesn't have to be right away).

Taking out time-intensive chunks

I think we should basically manually cache a lot of the big chunks. Here's how I did it with the boosted trees chapter (which I'll be pushing tonight - it's done but I'm having issues with other finicky things I was trying to clean up).

Basically we'll have a new folder called "models" (which will also be up there after I push the boosted tree stuff), and we just save the models there after we run them. Then we only echo the code that shows the model running, and eval (but don't echo) the code that reads the model in. This is way faster and will help our build time a lot. If we don't do this, I don't know if there is a way to get the book to build with the cache without committing it, which I really don't want to do.
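For reference, here is a minimal sketch of that manual-caching pattern in base R. It is an illustration under assumptions, not the code actually used in the boosted trees chapter: `fit_model()` and the file name `example-fit.rds` are hypothetical, and a quick `lm()` fit stands in for a slow model. Only the "models" folder name comes from the note above.

```r
# Hypothetical stand-in for a slow model fit; the book's real models differ.
fit_model <- function() lm(mpg ~ wt + hp, data = mtcars)

model_dir  <- "models"                                  # folder name from the issue
model_path <- file.path(model_dir, "example-fit.rds")   # hypothetical file name

# Chunk shown to readers (echo = TRUE, eval = FALSE): displayed, never run
# during the book build.
# fit <- fit_model()

# Run once locally to create the saved model that gets committed.
if (!file.exists(model_path)) {
  dir.create(model_dir, showWarnings = FALSE, recursive = TRUE)
  saveRDS(fit_model(), model_path)
}

# Hidden chunk (echo = FALSE, eval = TRUE): reads the saved model at build
# time so later chunks can use `fit` without refitting.
fit <- readRDS(model_path)
```

The trade-off is that the saved `.rds` files have to be refreshed by hand whenever the modeling code changes, since nothing here detects a stale model.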

I can try to submit a new PR for this probably later tonight or maybe tomorrow.

Decision Trees

Basically just a note that the next chapter I'll be working on will be Decision Trees. I'll be trying to get a full draft by the end of the week.

Boosted trees

This is the next chapter I'll be working on.

.gitignore and deploy

I think we should ignore the _book in the main repo, but then build out the GH Actions so that it deploys to a gh-actions branch. We can then point netlify to that branch, instead of the main/_book directory, and everything should work well.

Small style edits

Add GitHub icon to navbar that links back here
Add a default fill color. Could be any of the colors that look good in the --highligh-* vars in the CSS
@brendanhcullen tries to add sidenotes

The first two are simple, the third seems much more difficult. But, I also wanted to mention that if you're able to get this to work we should move the page navigation to be at the bottom always. Right now, page navigation is on the sides of the text if the window is wide enough, and at the bottom otherwise. If we can build in side-notes, we should probably just have it at the bottom always.

Review Decision Tree chapter

Same as #22 but for the Decision Tree chapter.

Review FE chapter

It would be great if one or both of you could review the FE chapter. Basically, I think just look at the rendered version and look at the R Markdown and make any edits you think are needed. If there's something you think is not clear that needs further revision from me, feel free to add in a comment in the document.

In terms of workflow, I think just create a branch and go for it, then submit a PR with your review. There may be better ways, but that should generally work okay. If you have ideas on better workflow, I'm fully open to them.

Bagged trees

Just a note that this is the next chapter I'll be working on. Hope to have a draft by Monday the 16th at the latest, ideally.

Intro to R

Following Daniel's strategy, this is a note to finalize the intro by end of this week.

Tibble printing

Is still messed up. The lock file has the latest version of tibble so I'm not sure what's going on.

Book splitting

There are a number of different options for how to split the book. Currently the table of contents shows too many levels of headers by default. I think there's a way we can change that. I'd like it to show just the chapter level unless you're on that chapter. I think that's possible but I'm not sure. This is mostly a note to myself to come back to this later.

Git chapter

Just an FYI that this will be the next chapter I start working on.

Full book outline

I think we're at the point where we can fairly safely create a full book outline. It's starting to get messy in terms of organization without it. Once we get that, we may want to go ahead and create empty Rmd's with just the headers for each chapter, then we can look at #24 a bit easier. I'd like to have four sections: Foundations, Data Visualization, Functional Programming, and Machine Learning. Then probably 10-ish chapters in each section.

I can draft an outline sometime in the next few days, but if either of you feel up for taking the first crack at it, that would be great. As I've mentioned before, I basically just want to mirror the courses.

Review boosted trees chapter

Who wants it?!