forestgeo / learn

Links to interesting articles, videos, tutorials, tips, and more

learn's Issues

Saving and loading objects in R: saveRDS() vs. save(), and readRDS() vs. load()

Thanks @gabrielareto!

http://www.fromthebottomoftheheap.net/2012/04/01/saving-and-loading-r-objects/

Avoid re-writing your git credentials every time

By default, Git asks for your username and password each time you push over HTTPS. To avoid this, the safest approach is to create an SSH key. To create one, read Initial Setup at http://r-pkgs.had.co.nz/git.html#git-init (item 4).

If Git continues to ask for your credentials after you add an SSH key, try removing the remote and adding it again with its SSH address.

git remote remove origin
git remote add origin git@github.com:owner/repo.git
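
A minimal sketch of how the swap could be checked, run in a throwaway repository so nothing real is touched. The `owner/repo` address is the placeholder used above, and the temporary directory is an assumption made only for illustration.

```shell
# Work in a temporary repository (hypothetical; use your own project instead)
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Start with an HTTPS remote, which is the kind that prompts for credentials
git remote add origin https://github.com/owner/repo.git

# Remove it and add the SSH address instead
git remote remove origin
git remote add origin git@github.com:owner/repo.git

# Confirm which address the remote now uses
url=$(git config --get remote.origin.url)
echo "$url"
```

If the printed address starts with `git@github.com:`, the remote uses SSH and the key should be picked up on the next push.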

If all of the above fails, you can either cache your credentials or save them in a text file for Git to access:

To cache your credentials, you can read this (but I found it confusing).
To store your credentials, read Store Credentials here. For example:

$ git config credential.helper store

# Important: push to some repo using the remote address explicitly.
# (Git must first "learn" the format of the address; see Storage Format at https://goo.gl/t7W8ri)
# Example
$ git push https://github.com/forestgeo/fgeo.tool.git
# or
$ git push git@github.com:forestgeo/fgeo.tool.git
Username: <type your username>
Password: <type your password>

[several days later]
$ git push http://example.com/repo.git
[your credentials are used automatically]

Remember: this option is less secure than an SSH key.
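
For the caching option mentioned above, here is a minimal sketch that assumes Git's built-in `cache` helper. The one-hour timeout and the throwaway repository are assumptions for illustration; with this helper, credentials are kept in memory for the given time rather than written to disk.

```shell
# Work in a temporary repository (hypothetical; use your own project instead)
repo=$(mktemp -d)
git init -q "$repo"

# Ask Git to remember credentials in memory for 3600 seconds (one hour)
git -C "$repo" config credential.helper 'cache --timeout=3600'

# Confirm the setting for this repository
helper=$(git -C "$repo" config credential.helper)
echo "$helper"
```

Adding `--global` to the `git config` call would apply the same setting to every repository for the current user.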

Track figures, tables, etc. from article back to code with tag snippets

Tag important parts of your papers (figures, tables, analyses, p values, etc.) with a unique identifier. Then use Ctrl + . (Code > Go To File/Function) to find the file and line where you placed the tag. Make the tag a string, and describe what it tags.

I do this often, so I wrote a code snippet (https://goo.gl/FLmG48 and https://goo.gl/3VPZvc).

Struggling with `devtools`? Instead, install packages with `remotes`

https://twitter.com/mauro_lepore/status/933471081452720128

Install from GitHub with:

source("https://install-github.me/<user>/<repo>")

remotes::install_github("<user>/<repo>")

The zen of design

From https://www.python.org/dev/peps/pep-0020/#id3

The Zen of Design (edit mine; originally The Zen of Python)

Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

Special cases aren't special enough to break the rules.

Although practicality beats purity.

Errors should never pass silently.

Unless explicitly silenced.

In the face of ambiguity, refuse the temptation to guess.

There should be one--and preferably only one--obvious way to do it.

Although that way may not be obvious at first unless you're Dutch.

Now is better than never.

Although never is often better than right now.

If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

Namespaces are one honking great idea -- let's do more of those!

--Tim Peters

Publish R products with one click from RStudio

With a single click in RStudio, we can now publish R products, including R Markdown documents, Shiny applications, R plots, and so on. For example: https://bookdown.org/forestgeoguest/mpart

This service makes it easier to follow the progress of an analysis. Readers can view the latest version of any publication directly online, without exchanging emails.

Privacy can be controlled at different levels, from completely private, through to accessible to only specific users, all the way to completely public.

Any email account can be linked to this service. For now, ours is [email protected].

Connecting a new account to RStudio Connect via bookdown.org

upload your [R-product] to https://bookdown.org, which is a website provided by RStudio to host your [R-products] for free. This website is built on top of “RStudio Connect”, an RStudio product that allows you to deploy a variety of R-related applications to a server, including R Markdown documents, Shiny applications, R plots, and so on.

---https://bookdown.org/yihui/bookdown/rstudio-connect.html

To connect an account to RStudio Connect:

1. Log in to the account (e.g. a Gmail account) that you want to add to the RStudio Connect publish button.
2. Click the little dropdown arrow next to the "Publish" button.
3. Choose "Manage Accounts".
4. Type bookdown.org as the server URL.
5. Follow the prompts.

The tidyverse style guide

To check your code against this style guide automatically, use lintr (https://github.com/jimhester/lintr).

Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. Just as with punctuation, while there are many code styles to choose from, some are more reader-friendly than others. The style presented here, which is used throughout the tidyverse, is derived from Google’s R style guide and has evolved considerably over the years.

--The tidyverse style guide, by Hadley Wickham (http://style.tidyverse.org/)

GitHub work flow and continuous integration of R packages

To develop our software, I recommend following the GitHub flow:

Create a branch from the repository.
Create, edit, rename, move, or delete files.
Send a pull request from your branch with your proposed changes to kick off a discussion.
Make changes on your branch as needed. Your pull request will update automatically.
Merge the pull request once the branch is ready to be merged.
Tidy up your branches using the delete button in the pull request or on the branches page.

That is, I recommend collaborating with the shared repository development model (the simplest one):

In the shared repository model, collaborators are granted push access to a single shared repository and topic branches are created when changes need to be made. Pull requests are useful in this model as they initiate code review and general discussion about a set of changes before the changes are merged into the main development branch. This model is more prevalent with small teams and organizations collaborating on private projects.

This approach allows continuous integration:

Continuous integration (CI) is a software engineering practice in which isolated changes are immediately tested and reported on when they are added to a larger code base. The goal of CI is to provide rapid feedback so that if a defect is introduced into the code base, it can be identified and corrected as soon as possible. Continuous integration software tools can be used to automate the testing and build a document trail.

Continuous integration via TravisCI is highly recommended to automatically check R Packages after every commit.

If you use git and GitHub ... I highly recommend learning about Travis. Travis is a continuous integration service, which means that it runs automated testing code everytime you push to GitHub. ... For an R package, the most useful code to run is devtools::check().

Issue template

(From dplyr.)

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form, but instead ask on the mailing list https://groups.google.com/forum/#!forum/manipulatr or http://stackoverflow.com.

Please include a minimal reprex. The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it. If you've never heard of a reprex before, start by reading https://github.com/jennybc/reprex#what-is-a-reprex, and follow the advice further down the page. Do NOT include session info unless it's explicitly asked for, or you've used reprex::reprex(..., si = TRUE) to hide it away.

Delete these instructions once you have read them.

Brief description of the problem

# insert reprex here

Let users install packages from private repos via GUEST_TOKEN

Stuart may give guests a GUEST_TOKEN to download packages from private repos with:

GUEST_TOKEN <- "<GUEST_TOKEN>"
devtools::install_github("forestgeo/<PRIVATE_PACKAGE>", auth_token = GUEST_TOKEN)

Create a reproducible example with `reprex()`

library(reprex)

Copy the next code chunk to the clipboard:

# A reproducible example:

df <- data.frame(x = 1:10, y = rep(c("D", "A"), 5))
df$index <- rownames(df)
df
sub <- df[df$y != "D", ]
(is_max <- sub$x == max(sub$x))
sub[is_max, ]

Run this and read the message on the console.

reprex()

Now, your clipboard has a reproducible example. Paste it in an issue.

If you paste your clipboard contents you get this:

# A reproducible example:
df <- data.frame(x = 1:10, y = rep(c("D", "A"), 5))
df$index <- rownames(df)
df
#>     x y index
#> 1   1 D     1
#> 2   2 A     2
#> 3   3 D     3
#> 4   4 A     4
#> 5   5 D     5
#> 6   6 A     6
#> 7   7 D     7
#> 8   8 A     8
#> 9   9 D     9
#> 10 10 A    10
sub <- df[df$y != "D", ]
(is_max <- sub$x == max(sub$x))
#> [1] FALSE FALSE FALSE FALSE  TRUE
sub[is_max, ]
#>     x y index
#> 10 10 A    10

The code above can be copied and pasted directly into the console. The comments are special: they let you compare your output with the output above.

How to contribute

A good way to contribute to this package is described, for example, here.

Why use purrr::map instead of lapply?

https://twitter.com/mauro_lepore/status/928784233669234689

Which users can or can't push directly, based on privileges

Users can be given different levels of privilege for specific repos. Users with:

admin privilege can always push directly, even to protected branches,
write privilege can push directly except to protected branches, where they need to push to a topic branch and merge via pull request,
read privilege can never push, but can fork the repo and propose a pull request.

Find all in an RStudio project

Use Ctrl + . (Code > Go To File/Function).

Ignoring files like .Rproj so that they don't show up in Repository

Tip found here

"Often, there are files that you don’t want to include in the repository. They might be transient (like LaTeX or C build artefacts), very large, or generated on demand. Rather than carefully not staging them each time, you should instead add them to .gitignore. This will prevent them from accidentally being added. The easiest way to do this is to right-click on the file in the Git pane and select Ignore"

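The same can be done from the shell. Below is a minimal sketch, assuming a throwaway repository and a hypothetical project file named `learn.Rproj`; the two patterns are only examples and should be adapted to your project.

```shell
# Work in a temporary repository (hypothetical; use your own project instead)
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Add patterns for RStudio's project files to .gitignore
printf '%s\n' '.Rproj.user/' '*.Rproj' >> .gitignore

# Create a project file, then ask Git whether it is ignored
touch learn.Rproj
ignored=$(git check-ignore learn.Rproj)
echo "$ignored"
```

`git check-ignore` prints the path only when a pattern in `.gitignore` matches it, so an empty result means the file would still show up in the Git pane.
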
Prune/Cleanup the local references to remote branches

git remote prune origin

(from https://goo.gl/rG5myw)

Grep text in file names

git ls-files | grep "data"

Request unlimited free private repos

For researchers, teachers, students, etc.:
https://education.github.com/discount_requests/new

For a classroom setting:
https://education.github.com/guide/private_repos

If you fail to get free unlimited private repositories from GitHub, try Bitbucket (where they are the default).

Initial Set up

(Adapted from http://r-pkgs.had.co.nz/git.html#git-init)

If you haven't yet created an account on GitHub, do it now at
https://github.com. The free plan is fine; you can request unlimited free
private repos at https://education.github.com/discount_requests/new. Your
request will more likely be successful if you use an @edu email account.

Also install Git and link Git and GitHub together using the same email address:

Install Git:
- Windows: http://git-scm.com/download/win.
- OS X: http://git-scm.com/download/mac.
- Debian/Ubuntu: sudo apt-get install git.
- Other Linux distros: http://git-scm.com/download/linux.
During installation, accept all the defaults.
Launch Git Bash and tell Git your name and email address. These are used to
label each commit so that when you start collaborating with others, it's
clear who made each change. In the shell, run:
```
git config --global user.name "YOUR FULL NAME"
git config --global user.email "YOUR EMAIL ADDRESS"
```
You can check if you're set up correctly by running:
```
git config --global --list
```

(If git says: "*** Please tell me who you are.", see comment below and #32.)

The following steps are optional:

If you want to avoid typing your username and password each time you
push, generate an SSH key. SSH keys allow you to securely communicate
with websites without a password.

Go to RStudio preferences (Tools > Global Options...), choose the Git/SVN
panel, and click "Create RSA key..." (or create the key from git as explained here).
Give GitHub your SSH public key: https://github.com/settings/ssh.
The easiest way to find the key is to click "View public key" in
RStudio's Git/SVN preferences pane.

Given one local project, keep two remote branches, one public and one private

(from https://24ways.org/2013/keeping-parts-of-your-codebase-private-on-github/)

Given one local project, you can keep two remote branches, one public and one private. This requires some experience with git, possibly from the shell. Without experience the process seems complicated but solves some problems that users may occasionally face.

For example, consider https://github.com/forestgeo/test and https://github.com/forestgeo/test-private (you may access them as forestgeotest or forestgeoguest). One local repo on my computer pushes to two remotes:

branch master pushes to a public remote repository,

git push -u https://github.com/forestgeo/test.git master

branch master-private pushes to a private remote repository,

git push -u https://github.com/forestgeo/test-private.git master-private

The commands above were used only the first time each branch was pushed. Because I used -u, I can now push from either branch simply with:

git push

Downloading and uploading single files from and to a repo

DOWNLOAD
From https://goo.gl/qi7zck

Go to the file you want to download.
Click it to view the contents within the GitHub UI.
In the top right, right click the Raw button.
In your web browser, Save as... (Windows: Ctrl + S)

UPLOAD

On the repo page, click the Upload files button.

Best practice to install a package from a private repo

More
If you install packages from private repos often, passing your authorization token to install_github() becomes tedious. You can avoid that pain very easily. Here is how:
http://bit.ly/auth_token
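
The linked post is not reproduced here. One common approach, which may differ from what the post describes, is to keep the token in the `GITHUB_PAT` environment variable, which `remotes` and `devtools` read by default when `auth_token` is not given. The sketch below writes to a temporary file standing in for `~/.Renviron`; the file path and the `<YOUR_TOKEN>` placeholder are assumptions for illustration.

```shell
# Stand-in for ~/.Renviron (assumption: in real use, edit the file in your home directory)
renviron=$(mktemp)

# Store the token as an environment variable that R reads at startup
echo 'GITHUB_PAT=<YOUR_TOKEN>' >> "$renviron"

# Restrict the file to your own user, since it holds a secret
chmod 600 "$renviron"

# Confirm the entry was written
line=$(grep '^GITHUB_PAT=' "$renviron")
echo "$line"
```

After restarting R, `remotes::install_github("<user>/<repo>")` should pick up the token without it being passed explicitly.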

Discounts for students, teachers, administrators and researchers on GitHub Education

https://education.github.com

https://twitter.com/mauro_lepore/status/941393586075168769

Insert screenshot images in an R Markdown document with Greenshot and Imgur.

https://goo.gl/QpTdAi

Initial setup

from http://r-pkgs.had.co.nz/git.html#git-init

1.1 Open Git Bash

Windows key + "Git"

4.1 In R or RStudio, you can check if you already have an SSH key-pair by running:

file.exists("~/.ssh/id_rsa.pub")
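
The same check can be sketched from Git Bash. It assumes the default RSA key name used above; keys created with another type or file name will not be found by it.

```shell
# Default location of an RSA public key (assumption: default file name)
key=~/.ssh/id_rsa.pub

# Report whether the public key file exists
if [ -f "$key" ]; then
  status="found"
else
  status="missing"
fi
echo "SSH public key $status: $key"
```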

Conventions

http://r4ds.had.co.nz/introduction.html

Functions are in a code font and followed by parentheses, like sum(), or mean().
Other R objects (like data or function arguments) are in a code font, without parentheses, like flights or x.
If we want to make it clear what package an object comes from, we’ll use the package name followed by two colons, like dplyr::mutate(), or nycflights13::flights. This is also valid R code.
Package names are in bold, e.g. dplyr, except when they are in code such as dplyr::mutate().

Make Git “forget” file that was tracked but is now in .gitignore

How to make Git “forget” about a file that was tracked but is now in .gitignore?

https://gist.github.com/maurolepore/cc966a117fe53d87f05d0158fde649c2
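
The gist is not reproduced here. One common approach, which may or may not match it, is to remove the file from Git's index while leaving it on disk. The sketch below runs in a throwaway repository; the file name `notes.txt` and the user details are hypothetical.

```shell
# Work in a temporary repository (hypothetical; use your own project instead)
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

# Track a file by mistake
echo "secret" > notes.txt
git add notes.txt
git commit -q -m "Track notes.txt by mistake"

# Ignore it from now on, and remove it from the index only (the file stays on disk)
echo "notes.txt" >> .gitignore
git rm -q --cached notes.txt
git add .gitignore
git commit -q -m "Stop tracking notes.txt"

# List what Git still tracks
tracked=$(git ls-files)
echo "$tracked"
```

Note that the file remains in earlier commits; this only stops Git from tracking future changes.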

Introduce Team Mentions

Introducing Team Mentions

Reformat code for easy reading

The most important thing to write clear code (code that humans can read) is style. For example,

good <- function() {
  x <- "Nicely formatted"
}

bad <- function(){x="Badly formatted"}

To nicely format my code, I use this style guide. To help you comply with that style you can use three things:

1. lintr

https://github.com/jimhester/lintr

2. Shortcut Ctrl + Shift + A

3. RStudio Addin

Grep text in files

git grep "text"

Learning road map, with RStudio Webinars (~45min each)

(File with links to the resources listed below.)

If you want to teach yourself, these webinars are fun and excellent:
Programming Part 1 (Writing code in RStudio): Single most important step to know RStudio.

Project management
Managing Change Part 1 (Projects in RStudio): Single most important step in better managing projects.
Programming Part 3 (Package writing in RStudio)

GitHub
Managing Change Part 2 (Github and RStudio)
Collaboration and time travel: version control with git, github, and RStudio*

Data science workflow

Tools
Import: Get your data into R
Wrangle: Data Wrangling with R and RStudio
Visualize: The Grammar and Graphics of Data Science
Communicate: Reproducible reporting

Concepts
Data Science in the Tidyverse*
Pipelines for data analysis in R*

*By Hadley Wickham (Chief Scientist at RStudio and Adjunct Professor of Statistics at Rice University)

RStudio IDE Easy Tricks You Might've Missed

Happy Git(Hub) for the {Human} useR (selected slides)

Most common reaction to git

When you "save" a file in Git, which is called a commit, Git records:

the changes to the file
other important metadata
- who
- when
- a message (optional, useful if it says why, not what)
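
A minimal sketch of where that metadata can be seen, using a throwaway repository. The user name, email, file name and commit message are all hypothetical.

```shell
# Work in a temporary repository (hypothetical; use your own project instead)
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

# "Save" a file, i.e. make a commit
echo "x <- 1" > analysis.R
git add analysis.R
git commit -q -m "Add analysis script to explore the data"

# Read back who, when, and the message from the latest commit
who=$(git log -1 --format='%an')
when=$(git log -1 --format='%ad' --date=short)
why=$(git log -1 --format='%s')
echo "$who | $when | $why"
```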

Resources to learn GitHub

https://bookdown.org/maurolepore/idigbio/idigbio.html#16

Build a package for your research paper or data paper.

For an example, see the following:

repository https://github.com/dfalster/baad
paper http://onlinelibrary.wiley.com/doi/10.1890/14-1889.1/abstract
package https://github.com/traitecoevo/baad.data

For a template to build your own paper-package see: https://github.com/forestgeo/pkg

To learn how to build R packages in greater detail see http://r-pkgs.had.co.nz/

Using functions from other packages inside functions of your own package (NAMESPACE issues)

Read this if one of the following applies to you:

You are using library(package) or require(package) inside your functions;
You don't know exactly when you need the syntax package::function() or package:::function() as opposed to function();
You don't know when you should list a package in the file DESCRIPTION, under the section Imports;
You don't know when to use the tag @importFrom .
The output of devtools::check() warns you about undefined global variables.

https://twitter.com/mauro_lepore/status/930607639574319106

On TravisCI, do not treat warnings as errors

Add this to the file .travis.yml:

warnings_are_errors: false

Best Practices for Scientific Computing

Recommended by Helene Muller-Landau:

Squash messy history into a clean commit and merge to master

With Git

https://stackoverflow.com/questions/5308816/how-to-use-git-merge-squash

git checkout master
git merge --squash bugfix
git commit
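
To see the effect end to end, here is a sketch in a throwaway repository. The branch name `bugfix` follows the commands above; the file, the messages and the user details are hypothetical, and the base branch name is detected rather than assumed because newer Git setups may call it `main` instead of `master`.

```shell
# Work in a temporary repository (hypothetical; use your own project instead)
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Example User"
git config user.email "user@example.com"

# One commit on the base branch
echo "v1" > file.txt
git add file.txt
git commit -q -m "Initial commit"
base=$(git rev-parse --abbrev-ref HEAD)

# Two messy commits on the bugfix branch
git checkout -q -b bugfix
echo "v2" >> file.txt
git commit -q -am "wip 1"
echo "v3" >> file.txt
git commit -q -am "wip 2"

# Squash them into a single commit on the base branch
git checkout -q "$base"
git merge -q --squash bugfix > /dev/null
git commit -q -m "Fix bug (squashed from bugfix)"

# The base branch now has two commits instead of three
count=$(git rev-list --count HEAD)
echo "$count"
```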

With GitHub

https://github.com/blog/2141-squash-your-commits

Publish with privacy control

To publish with privacy control there are some options:

Publish to a private repo and manage access with GitHub
Publish to https://bookdown.org/ via RStudio Connect. Let people with an RStudio Connect account (or a guest account, e.g. forestgeoguest) view and collaborate. They need to log in through, e.g., Google (using e.g. [email protected])

Using RStudio on a cloud

https://twitter.com/mauro_lepore/status/933679801252089856

Error in Git's setup: “please tell me who you are”

Thanks @hqzzlo for this:

This is Qing. Here is the link that I found super helpful
https://stackoverflow.com/questions/11656761/git-please-tell-me-who-you-are-error

How to create a GitHub account

video: How to create a GitHub account

good practices in real statistical/ecological projects

this is a list of suggested basic good practices in real projects like those of CTFS:

version control for data and code

copy, then touch
a version is a copy with a unique name
don't use the f word ("final") on the file names
use dates in the file names
use git and github if you can
record R version, packages versions
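
A small sketch of "copy, then touch" with a dated file name, for when git is not an option. The file `census.csv` and its contents are made up for illustration.

```shell
# Work in a temporary directory (hypothetical; use your own project instead)
work=$(mktemp -d)
cd "$work"

# An existing data file
echo "id,dbh" > census.csv

# Copy it under a unique, dated name, then edit only the copy
stamp=$(date +%Y-%m-%d)
copy="census_${stamp}.csv"
cp census.csv "$copy"
echo "1,12.5" >> "$copy"

# The original is unchanged; the new version carries the date
ls
```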

automation

touch the script, not the console.
write and edit, don't try to remember a sequence of clicks etc
don't touch raw data
don't touch processed data
don't touch figures
don't touch the text if you can avoid it

separated directories in self-contained projects

raw vs processed data
typical directories: raw data, processed data, scripts, results
(figures, tables), text (manually updated), other directories?
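
One possible layout, sketched as shell commands. The directory names are only suggestions based on the list above, and the project is created in a temporary location for illustration.

```shell
# Create a self-contained project skeleton (hypothetical names and location)
project="$(mktemp -d)/my-project"
mkdir -p \
  "$project/raw_data" \
  "$project/processed_data" \
  "$project/scripts" \
  "$project/results/figures" \
  "$project/results/tables" \
  "$project/text"

# Show the top-level directories
ls "$project"
```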

document

humans do read scripts
use English
use at least raw text
use other tools if you can (Rmarkdown etc)

good practices for coding or programming are a different story, see issue #17 . But, in general, a typical script could look like this:

erase everything
tell where the stuff is
set the seed and maybe other general parameters (e.g. date for
version control)
load functionality
load data
review and clean data
do interesting things
keep results

this script-level structure is much more subjective and
project-dependent than the other things, though.

What do you think?

Plot species distributions

Visualizing the distribution of species in a plot is a common exploratory task. This little tutorial shows one flexible way to plot species distributions: http://rpubs.com/forestgeo/plot_sp