rddj-reproducibility-workflow's Introduction

A workflow for reproducible and transparent data journalism with R and GitHub

This repo is a tutorial and an example at the same time, yay!

The goal of this workflow is to automagically upload your knitted RMarkdown file to a GitHub Page and to "build" your R code into a zipped folder which can be downloaded by your readers. Ideally, you would reference this zipped folder from within your knitted RMarkdown.

Note: The repo from which the knitted RMarkdown is served to the GitHub Page can also be private!

Steps

Step 0

Initialize an empty repository and add a remote:

mkdir rddj-reproducibility-workflow
cd rddj-reproducibility-workflow
git init
# replace the following with your account and repo
git remote add origin https://github.com/grssnbchr/rddj-reproducibility-workflow.git 

Add a .gitignore to ignore standard R output and project files, as well as the tmp folder we'll need for building:

.RData
.Rhistory
.Rprofile
main.html
output/*
tmp/
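
To confirm that the ignore rules behave as intended, a small helper built on git check-ignore can list the paths that are not covered. This is only a sketch: the function name report_unignored is hypothetical and not part of this repo.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original workflow):
# print every given path that is NOT matched by an ignore rule.
report_unignored() {
  local path
  for path in "$@"; do
    # git check-ignore -q exits 0 when the path is ignored and
    # non-zero otherwise (including when run outside a repository)
    if ! git check-ignore -q "$path"; then
      echo "$path"
    fi
  done
}
```

Called inside the repository as, for example, report_unignored main.html .Rhistory main.Rmd, it should print only main.Rmd if the rules above are in place.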

Step 1 (repeat as needed)

All your "productive" R code goes into one RMarkdown file, but you can include source files (see main.Rmd).

You can work with your repo as you would with any other, doing stuff like

git add
git commit
git push
...

Step 2

Now you want to publish your RMarkdown, and, ideally, your whole R script (together with the input files) on GitHub Pages.

Initially, and only once, you need to do the following in your working directory:

  • Start a new branch gh-pages
git checkout -b gh-pages
  • Remove everything except .gitignore (in Bash, the extglob shell option must be enabled for this command to work)
shopt -s extglob
git rm -rf !(.gitignore)
git add -u
  • Make an initial commit
git commit -m "first commit to gh-pages branch"
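
The glob in the removal step skips hidden files, so any other tracked dotfile would survive on gh-pages. As a sanity check, the following sketch succeeds only if .gitignore is the sole tracked file. The function name only_gitignore_tracked is an assumption for illustration.

```shell
#!/bin/bash
# Hypothetical check (not part of the original workflow):
# succeed only if .gitignore is the single file tracked on the current branch.
only_gitignore_tracked() {
  local tracked
  # git ls-files prints one tracked path per line
  tracked=$(git ls-files) || return 1
  [ "$tracked" = ".gitignore" ]
}
```

Run it on the gh-pages branch, for instance as only_gitignore_tracked || git ls-files, to see which files are still tracked.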

Step 3

For deployment, we want the following:

  • (Optional) The RMarkdown should be automagically knitted to HTML
  • The knitted RMarkdown file (main.html) should be pushed as index.html, so it is shown on the GitHub Page
  • The R code and the input files should be made available for download as a zipped folder, so everyone can rerun the RMarkdown and/or modify the code and produce the output folder.

In order to automate this deployment process, we create a little shell script.

First, make sure you are in the master branch:

git checkout master

Then, fire up your favorite editor and create a shell script called deploy.sh in the top folder, with the following content:

#!/bin/bash
# first, knit
# only works if you have pandoc > 1.9.0 installed 
# R -e "rmarkdown::render('main.Rmd')"
# make a temporary copy of everything we need for the build
mkdir tmp
cp main.Rmd tmp/
cp -r input tmp/
cp processData.R tmp/ # replace this with the name of your subroutine file and add more, if needed
# switch to gh-pages branch
git checkout gh-pages
# rename index file (the processed main.Rmd) from master branch
mv main.html index.html
# make folder for rscript
mkdir rscript
# copy over necessary scripts from master branch 
cp -r tmp/* rscript/

# zip the rscript folder
zip -r rscript.zip rscript
# remove the rscript folder
rm -rf rscript
# remove temporary folder
rm -rf tmp
# add everything for committing
git add .
# commit in gh-pages
git commit -m "build and deploy to gh-pages"
# push to remote:gh-pages
git push origin gh-pages 
# checkout master again
git checkout master

Finally, make the script executable:

chmod 755 deploy.sh
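
Because the knitting line in the script is commented out, main.html has to exist before the script switches branches. Otherwise the mv fails and the commit goes ahead without an index.html. A guard along these lines could catch that early. It is a sketch, and require_paths is a hypothetical name.

```shell
#!/bin/bash
# Hypothetical guard (not part of the original deploy.sh):
# report every missing path and fail if at least one is absent.
require_paths() {
  local missing=0
  local path
  for path in "$@"; do
    if [ ! -e "$path" ]; then
      echo "Missing: $path" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With the file names used in this repo, a call could look like require_paths main.html main.Rmd input processData.R && ./deploy.sh.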

Step 4 (repeat as needed)

Now, every time you want to deploy your updated RMarkdown and your R script to your GitHub Page, you can run

./deploy.sh

And your knitted RMarkdown will magically find its way to username.github.io/reponame. Note: This also works when reponame is a private repo!

For this to work best, make sure you are in the master branch and you have a clean working directory!
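
Both conditions can be checked automatically before deploying. The following is a minimal sketch, assuming the main branch is called master as in this tutorial. The function name preflight_check is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original deploy.sh):
# succeed only on the master branch with a clean working directory.
preflight_check() {
  local branch
  branch=$(git rev-parse --abbrev-ref HEAD) || return 1
  if [ "$branch" != "master" ]; then
    echo "Not on master (current branch: $branch)" >&2
    return 1
  fi
  # --porcelain prints one line per modified or untracked file;
  # ignored files such as main.html are not listed
  if [ -n "$(git status --porcelain)" ]; then
    echo "Working directory is not clean; commit or stash first" >&2
    return 1
  fi
  return 0
}
```

One way to use it is preflight_check && ./deploy.sh, so the deployment only starts when both conditions hold.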

In the case of this demonstration repo, the results are viewable at http://grssnbchr.github.io/rddj-reproducibility-workflow.
