learn's People

Contributors

maurolepore

learn's Issues

Avoid re-writing your git credentials every time

By default, git asks for your username and password each time you push to an HTTPS remote. To avoid this, the safest approach is to create an SSH key. To create one, read Initial Setup at http://r-pkgs.had.co.nz/git.html#git-init (item 4).

If git continues to ask for your credentials after you add an SSH key, try removing the remote and adding it again.

git remote remove origin
git remote add origin git@github.com:owner/repo.git
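
If you would rather not remove and re-add the remote, a possible alternative is to change its address in place. This is only a minimal sketch and not part of the original tip: it runs in a throwaway repository, and owner/repo is a placeholder.

```shell
# Work in a throwaway repository so nothing real is touched.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Start with an HTTPS remote (owner/repo is a placeholder).
git remote add origin https://github.com/owner/repo.git

# Switch the existing remote to the SSH address in place.
git remote set-url origin git@github.com:owner/repo.git

# Confirm which address git will use from now on.
remote_url="$(git remote get-url origin)"
echo "$remote_url"
```

In a real project you would run only the set-url line, then check the result with git remote -v.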

If all of the above fails you may try to cache your credentials or to save them in a text file for git to access:

  • To cache your credential you can read this (but I found it confusing)
  • To store your credentials read Store Credentials here. For example:
$ git config credential.helper store

# Important: push to some repo using the remote address explicitly.
# (Git must first "learn" the format of the address; see Storage Format at https://goo.gl/t7W8ri)
# Example
$ git push https://github.com/forestgeo/fgeo.tool.git
# (An SSH address such as git@github.com:forestgeo/fgeo.tool.git uses your
# SSH key instead, so the credential helper is not involved.)
Username: <type your username>
Password: <type your password>

[several days later]
$ git push http://example.com/repo.git
[your credentials are used automatically]

Remember: this option is less secure than an SSH key.
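
For the caching option mentioned above, one possible setup keeps credentials in memory for a limited time instead of writing them to a file. This is a minimal sketch, assuming the cache helper is available on your platform (it may not be on every system). The one-hour timeout is an arbitrary choice, and the throwaway repository is only there so the setting stays local to the demo.

```shell
# Work in a throwaway repository so the setting does not leak elsewhere.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q

# Keep credentials in memory for 3600 seconds (one hour) after they are typed.
git config credential.helper 'cache --timeout=3600'

# Show the helper that git will use for this repository.
helper="$(git config --get credential.helper)"
echo "$helper"
```

Adding --global to the git config call would apply the same helper to every repository of the current user.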

The zen of design

From https://www.python.org/dev/peps/pep-0020/#id3

The Zen of Python Design (edit mine)

Beautiful is better than ugly.

Explicit is better than implicit.

Simple is better than complex.

Complex is better than complicated.

Flat is better than nested.

Sparse is better than dense.

Readability counts.

Special cases aren't special enough to break the rules.

Although practicality beats purity.

Errors should never pass silently.

Unless explicitly silenced.

In the face of ambiguity, refuse the temptation to guess.

There should be one--and preferably only one--obvious way to do it.

Although that way may not be obvious at first unless you're Dutch.

Now is better than never.

Although never is often better than right now.

If the implementation is hard to explain, it's a bad idea.

If the implementation is easy to explain, it may be a good idea.

Namespaces are one honking great idea -- let's do more of those!

--Tim Peters

Publish R products with one click from RStudio

With a single click from RStudio, we can now publish R products, including R Markdown documents, Shiny applications, R plots, and so on. For example: https://bookdown.org/forestgeoguest/mpart

This service makes it easier to follow the progress of an analysis. Readers can view the latest version of any publication directly online, without exchanging emails.

Privacy can be controlled at different levels, from completely private, through to accessible to only specific users, all the way to completely public.

Any email account can be linked to this service. For now, ours is [email protected].

Connecting a new account to RStudio Connect via bookdown.org

upload your [R-product] to https://bookdown.org, which is a website provided by RStudio to host your [R-products] for free. This website is built on top of “RStudio Connect”, an RStudio product that allows you to deploy a variety of R-related applications to a server, including R Markdown documents, Shiny applications, R plots, and so on.

---https://bookdown.org/yihui/bookdown/rstudio-connect.html

To connect to RStudio Connect:

  • Log in to the account (e.g. a Gmail account) that you want to add to the RStudio Connect publish button.
  • Click the little dropdown arrow next to the "publish" button.
  • Choose "manage accounts".
  • Type bookdown.org as the server URL.
  • Follow the prompts.

The tidyverse style guide

To check your code against this style guide automatically, use lintr (https://github.com/jimhester/lintr).

Good coding style is like correct punctuation: you can manage without it, butitsuremakesthingseasiertoread. Just as with punctuation, while there are many code styles to choose from, some are more reader-friendly than others. The style presented here, which is used throughout the tidyverse, is derived from Google’s R style guide and has evolved considerably over the years.

--The tidyverse style guide, by Hadley Wickham (http://style.tidyverse.org/)

GitHub work flow and continuous integration of R packages

To develop our software, I recommend following the GitHub flow:

  1. Create a branch from the repository.
  2. Create, edit, rename, move, or delete files.
  3. Send a pull request from your branch with your proposed changes to kick off a discussion.
  4. Make changes on your branch as needed. Your pull request will update automatically.
  5. Merge the pull request once the branch is ready to be merged.
  6. Tidy up your branches using the delete button in the pull request or on the branches page.
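
The steps above can be sketched from the shell. This is only an illustration under stated assumptions: a local bare repository stands in for the GitHub repository, the branch name fix-typo and the file contents are made up, and the merge that would normally happen through the pull request button on GitHub is simulated locally.

```shell
# A local bare repository stands in for the shared GitHub repository.
work="$(mktemp -d)"
git init -q --bare "$work/remote.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/main   # fix the branch name for this demo
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$work/remote.git"

# Starting point: one commit on the main branch.
echo "first draft" > README.md
git add README.md
git commit -q -m "Initial commit"
git push -q -u origin main

# 1. Create a branch.
git checkout -q -b fix-typo

# 2. Create, edit, rename, move, or delete files.
echo "second draft" > README.md
git commit -q -am "Improve README"

# 3-4. Push the branch; on GitHub you would now open a pull request,
#      and later pushes to this branch would update it automatically.
git push -q -u origin fix-typo

# 5. Merge once the branch is ready (simulated locally here).
git checkout -q main
git merge -q --no-ff -m "Merge fix-typo" fix-typo
git push -q origin main

# 6. Tidy up the branch, locally and on the remote.
git branch -q -d fix-typo
git push -q origin --delete fix-typo
```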

That is, I recommend collaborating with the shared repository development model (the simplest one):

In the shared repository model, collaborators are granted push access to a single shared repository and topic branches are created when changes need to be made. Pull requests are useful in this model as they initiate code review and general discussion about a set of changes before the changes are merged into the main development branch. This model is more prevalent with small teams and organizations collaborating on private projects.

This approach allows continuous integration:

Continuous integration (CI) is a software engineering practice in which isolated changes are immediately tested and reported on when they are added to a larger code base. The goal of CI is to provide rapid feedback so that if a defect is introduced into the code base, it can be identified and corrected as soon as possible. Continuous integration software tools can be used to automate the testing and build a document trail.

Continuous integration via TravisCI is highly recommended to automatically check R Packages after every commit.

If you use git and GitHub ... I highly recommend learning about Travis. Travis is a continuous integration service, which means that it runs automated testing code everytime you push to GitHub. ... For an R package, the most useful code to run is devtools::check().

Issue template

(From dplyr.)

Please briefly describe your problem and what output you expect. If you have a question, please don't use this form, but instead ask on the mailing list https://groups.google.com/forum/#!forum/manipulatr or http://stackoverflow.com.

Please include a minimal reprex. The goal of a reprex is to make it as easy as possible for me to recreate your problem so that I can fix it. If you've never heard of a reprex before, start by reading https://github.com/jennybc/reprex#what-is-a-reprex, and follow the advice further down the page. Do NOT include session info unless it's explicitly asked for, or you've used reprex::reprex(..., si = TRUE) to hide it away.

Delete these instructions once you have read them.


Brief description of the problem

# insert reprex here

Create a reproducible example with `reprex()`

library(reprex)

Copy the next "code paragraph" chunk to the clipboard:

# A reproducible example:
df <- data.frame(x = 1:10, y = rep(c("D", "A"), 5))
df$index <- rownames(df)
df
sub <- df[df$y != "D", ]
(is_max <- sub$x == max(sub$x))
sub[is_max, ]

Run this and read the message on the console.

reprex()

Now, your clipboard has a reproducible example. Paste it in an issue.

If you paste your clipboard contents you get this:

# A reproducible example:
df <- data.frame(x = 1:10, y = rep(c("D", "A"), 5))
df$index <- rownames(df)
df
#>     x y index
#> 1   1 D     1
#> 2   2 A     2
#> 3   3 D     3
#> 4   4 A     4
#> 5   5 D     5
#> 6   6 A     6
#> 7   7 D     7
#> 8   8 A     8
#> 9   9 D     9
#> 10 10 A    10
sub <- df[df$y != "D", ]
(is_max <- sub$x == max(sub$x))
#> [1] FALSE FALSE FALSE FALSE  TRUE
sub[is_max, ]
#>     x y index
#> 10 10 A    10

The code above can be copied and pasted directly into the console. The comments are special: they let you compare your output with the output above.

Which users can or can't push directly, based on privileges

Users can be given different levels of privilege for specific repos. Users with:

  • admin privilege can always push directly, even to protected branches,
  • write privilege can push directly except to protected branches, where they need to push to a topic branch and open a pull request,
  • read privilege can never push directly, but can fork the repo and propose a pull request.

Ignoring files like .Rproj so that they don't show up in Repository

Tip found here

"Often, there are files that you don’t want to include in the repository. They might be transient (like LaTeX or C build artefacts), very large, or generated on demand. Rather than carefully not staging them each time, you should instead add them to .gitignore. This will prevent them from accidentally being added. The easiest way to do this is to right-click on the file in the Git pane and select Ignore"

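The same result can be reached from the shell by editing .gitignore directly. This is a minimal sketch in a throwaway repository; the file names (learn.Rproj, analysis.R) are made up for illustration.

```shell
# Work in a throwaway repository with a few example files.
demo_dir="$(mktemp -d)"
cd "$demo_dir"
git init -q
touch analysis.R learn.Rproj
mkdir .Rproj.user
touch .Rproj.user/session

# Ignore RStudio project files and RStudio's per-user directory.
printf '%s\n' '*.Rproj' '.Rproj.user/' >> .gitignore

# Ask git which of these paths it now ignores.
ignored="$(git check-ignore learn.Rproj .Rproj.user/session analysis.R)"
echo "$ignored"

# Only the files that are not ignored show up as untracked.
git status --porcelain --untracked-files=all
```

Files that are already tracked stay tracked even after they match a pattern in .gitignore, so ignoring works best before the first commit of a file.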

Initial Set up

(Adapted from http://r-pkgs.had.co.nz/git.html#git-init)

If you haven't yet created an account on GitHub, do it now at
https://github.com. The free plan is fine; you can request unlimited free
private repos at https://education.github.com/discount_requests/new. Your
request will more likely be successful if you use an @edu email account.

Also install Git and link Git and GitHub together using the same email address:

  1. Install Git:

    During installation, accept all the defaults.

  2. Launch Git Bash and tell Git your name and email address. These are used to
    label each commit so that when you start collaborating with others, it's
    clear who made each change. In the shell, run:

    git config --global user.name "YOUR FULL NAME"
    git config --global user.email "YOUR EMAIL ADDRESS"

    You can check if you're set up correctly by running:

    git config --global --list

(If git says: "*** Please tell me who you are.", see comment below and #32.)

The following steps are optional:

If you want to avoid typing your username and password each time you
push, generate an SSH key. SSH keys allow you to securely communicate
with websites without a password.

  1. Go to RStudio preferences (Tools > Global Options...), choose the Git/SVN
    panel, and click "Create RSA key..." (or create the key from git as explained here).

  2. Give GitHub your SSH public key: https://github.com/settings/ssh.
    The easiest way to find the key is to click "View public key" in
    RStudio's Git/SVN preferences pane.

Given one local project, keep two remote branches, one public and one private

(from https://24ways.org/2013/keeping-parts-of-your-codebase-private-on-github/)

Given one local project, you can keep two remote branches, one public and one private. This requires some experience with git, possibly from the shell. Without that experience the process may seem complicated, but it solves some problems that users occasionally face.

For example, consider https://github.com/forestgeo/test and https://github.com/forestgeo/test-private (you may access them as forestgeotest or forestgeoguest). One local repo on my computer pushes to two remotes:

  • branch master pushes to a public remote repository,
git push -u https://github.com/forestgeo/test.git master
  • branch master-private pushes to a private remote repository,
git push -u https://github.com/forestgeo/test-private.git master-private

The commands above were used only the first time each branch was pushed. Because I used -u, I can now push from either branch simply with:

git push
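
The whole setup can be tried without touching GitHub. This is a sketch under stated assumptions: two local bare repositories stand in for the public and private GitHub repos, and named remotes (public, private) are used instead of the bare URLs shown above, which is a swapped-in convenience rather than the original commands. File names and contents are made up.

```shell
# Two local bare repositories stand in for the public and private remotes.
work="$(mktemp -d)"
git init -q --bare "$work/public.git"
git init -q --bare "$work/private.git"
git init -q "$work/local"
cd "$work/local"
git symbolic-ref HEAD refs/heads/master   # fix the branch name for this demo
git config user.name "Example User"
git config user.email "user@example.com"
git remote add public "$work/public.git"
git remote add private "$work/private.git"

# Branch master goes to the public remote; -u records that choice.
echo "public notes" > notes.txt
git add notes.txt
git commit -q -m "Public work"
git push -q -u public master

# Branch master-private goes to the private remote.
git checkout -q -b master-private
echo "private notes" > private.txt
git add private.txt
git commit -q -m "Private work"
git push -q -u private master-private

# Later pushes need no arguments: each branch remembers its remote.
echo "more private notes" >> private.txt
git commit -q -am "More private work"
git push -q

# Show where each branch pushes.
git config --get branch.master.remote
git config --get branch.master-private.remote
```

Because the private branch is never pushed to the public remote, its commits stay out of the public repository as long as you do not merge master-private into master.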

Conventions

http://r4ds.had.co.nz/introduction.html

  • Functions are in a code font and followed by parentheses, like sum(), or mean().

  • Other R objects (like data or function arguments) are in a code font, without parentheses, like flights or x.

  • If we want to make it clear what package an object comes from, we’ll use the package name followed by two colons, like dplyr::mutate(), or nycflights13::flights. This is also valid R code.

  • Package names are in bold, e.g. dplyr, except when they are in code such as dplyr::mutate().

Learning road map, with RStudio Webinars (~45min each)

(File with links to the resources listed below.)

If you want to teach yourself, these webinars are fun and excellent:
Programming Part 1 (Writing code in RStudio): Single most important step to know RStudio.

Project management
Managing Change Part 1 (Projects in RStudio): Single most important step in better managing projects.
Programming Part 3 (Package writing in RStudio)

GitHub
Managing Change Part 2 (Github and RStudio)
Collaboration and time travel: version control with git, github, and RStudio*

Data science workflow

Tools
Import: Get your data into R
Wrangle: Data Wrangling with R and RStudio
Visualize: The Grammar and Graphics of Data Science
Communicate: Reproducible reporting

Concepts
Data Science in the Tidyverse*
Pipelines for data analysis in R*

*By Hadley Wickham (Chief Scientist at RStudio and Adjunct Professor of Statistics at Rice University)

Using functions from other packages inside functions of your own package (NAMESPACE issues)

Read this if one of the following applies to you:

  • You are using library(package) or require(package) inside your functions;
  • You don't know exactly when you need the syntax package::function() or package:::function() as opposed to function();
  • You don't know when you should list a package in the file DESCRIPTION, under the section Imports;
  • You don't know when to use the tag @importFrom;
  • The output of devtools::check() warns you about undefined global variables.

https://twitter.com/mauro_lepore/status/930607639574319106

Publish with privacy control

To publish with privacy control there are some options:

  1. Publish to a private repo and manage access with GitHub

  2. Publish to https://bookdown.org/ via RStudio Connect. Let people with an RStudio Connect account (or a guest account, e.g. forestgeoguest) view and collaborate. They need to log in through e.g. Google (using e.g. [email protected]).

good practices in real statistical/ecological projects

this is a list of suggested basic good practices in real projects like those of CTFS:

version control for data and code

  • copy, then touch
  • a version is a copy with a unique name
  • don't use the f word ("final") on the file names
  • use dates in the file names
  • use git and github if you can
  • record R version, package versions
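
when git is not an option, the "copy, then touch" rule with dated file names could look like the sketch below. this is one possible reading of the rule (freeze a dated copy first, then edit the working file); the file name census.csv and its contents are made up.

```shell
# Work in a throwaway directory with a small example data file.
work="$(mktemp -d)"
cd "$work"
printf 'x,y\n1,2\n' > census.csv

# Copy first: the version is a copy with a unique, dated name.
stamp="$(date +%Y-%m-%d)"
versioned="census_${stamp}.csv"
cp census.csv "$versioned"

# Then touch: only the working file changes, the dated copy stays frozen.
printf '3,4\n' >> census.csv

ls
```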

automation

  • touch the script, not the console.
  • write and edit, don't try to remember a sequence of clicks etc
  • don't touch raw data
  • don't touch processed data
  • don't touch figures
  • don't touch the text if you can avoid it

separated directories in self-contained projects

  • raw vs processed data
  • typical directories: raw data, processed data, scripts, results
    (figures, tables), text (manually updated), other directories?
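
a possible starting layout for such a self-contained project, as a sketch only. the directory names follow the list above but their exact spelling is an assumption, and the project is created in a temporary location for the demo.

```shell
# Create the project skeleton in a temporary location for this demo.
project="$(mktemp -d)/my-project"
mkdir -p \
  "$project/raw-data" \
  "$project/processed-data" \
  "$project/scripts" \
  "$project/results/figures" \
  "$project/results/tables" \
  "$project/text"

# List the directories that were created.
find "$project" -type d | sort
```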

document

  • humans do read scripts
  • use English
  • use at least raw text
  • use other tools if you can (Rmarkdown etc)

good practices for coding or programming are a different story, see issue #17 . But, in general, a typical script could look like this:

  1. erase everything
  2. tell where the stuff is
  3. set the seed and maybe other general parameters (e.g. date for
    version control)
  4. load functionality
  5. load data
  6. review and clean data
  7. do interesting things
  8. keep results

this script-level structure is much more subjective and
project-dependent than the other things, though.

What do you think?
