molecular-project's Introduction

Overview

A template file and folder structure for a data analysis project/paper done with R/Quarto/GitHub. Other components (e.g., other programming languages) can be added as needed.

Pre-requisites

This is a template for a data analysis project using R, Quarto, GitHub and a reference manager that can handle BibTeX. We recommend Zotero with the Better BibTeX plugin/extension as the reference manager. It is also assumed that you have a word processor installed (e.g., MS Word or LibreOffice). You need that software stack to make use of this template.

Template structure and content

The template comes with a folder structure and example files to illustrate the kinds of content you would place in the different folders. The following is a brief description of the contents. See the readme files in each folder for more details.

  • The assets folder contains static assets like manually generated schematics/diagrams, bibtex files, csl style files, PDFs of references, and other such content. These assets are not code-based and are not generated by code.

  • All code goes into the code folder and subfolders. Currently, there are 3 sub-folders that do different parts of an analysis. You can re-organize such that it makes most sense for your project. The folders contain files that do some data cleaning and analysis to illustrate the overall setup and workflow. See the readme files in those folders for details.

  • All data goes into the data folder and subfolders. Currently, there are 2 sub-folders that contain different versions of a simple example data set. You can re-organize such that it makes most sense for your project.

  • The products folder and its subfolders contain deliverables, such as the manuscript/report, the supplement, slide decks, posters, Shiny web apps, etc. Those should generally be made with Quarto/R. As needed, other formats can be used. There is an example manuscript and an example slide deck.

    • The manuscript subfolder contains a template for a report written as a Quarto file. If you access this repository as part of my Modern Applied Data Science course, the sections are guides for your project. If you found your way to this repository outside the course, you might only be interested in seeing how the file pulls in results and references and generates a Word document as output, without paying attention to the detailed structure. There is also a sub-folder containing an example template for a supplementary material file.
    • The slides subfolder contains a basic example of slides made with Quarto.
  • The results folder contains automatically/code generated output. This includes figures, tables saved as serialized R data (.Rds) files, computed values and other outputs. All content in these folders should be automatically generated by code. Manually generated results should be avoided as much as possible. If absolutely necessary, they go into the assets folder.

  • There are multiple special files in the repo.

    • readme.md: this file contains instructions or details about the folder it is located in. You are reading the project-level README.md file right now. There is a readme in almost every folder.
    • data-analysis-template.Rproj is a file that tells RStudio that this is the main folder for a project. Rename if you want.
    • a few "hidden" files and folders (they start with a . and depending on how your OS is configured, you might not see them). Those are for R/RStudio and Git/GitHub and you can ignore them.
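
As a minimal sketch of the results convention described above, the following R code saves a computed table as serialized R data and reads it back, the way a manuscript or slide file might. The folder and file names (results/tables, summary-table.rds) are assumptions for illustration, and a temporary directory stands in for the project's main folder so the example can be run anywhere.

```r
# A temporary directory stands in for the project's main folder (assumption).
project_root <- file.path(tempdir(), "example-project")
results_dir <- file.path(project_root, "results", "tables")
dir.create(results_dir, recursive = TRUE, showWarnings = FALSE)

# Analysis code computes a table from the built-in mtcars data ...
summary_table <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)

# ... and saves it as serialized R data in the results folder.
table_path <- file.path(results_dir, "summary-table.rds")
saveRDS(summary_table, file = table_path)

# A manuscript or slide deck later loads the saved object
# instead of recomputing or hand-copying it.
loaded_table <- readRDS(table_path)
print(loaded_table)
```

Because the table is regenerated whenever the analysis code runs, the deliverables stay in sync with the code rather than with a manually edited copy.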

Naming conventions

We try to follow these naming conventions for folders and files:

  • Somewhat descriptive and easy to understand names.
  • Only lower-case letters (and numbers if needed). Words separated by a -.

For instance, there is a folder called analysis-code with a file called exploratory-analysis-v2.qmd in it. We don't use _ or blank spaces as separators. We also don't use CamelCase, only lower-case. Exceptions are made for standard file endings; for instance, R scripts end in .R (instead of .r).
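
If you want to check names against these conventions, something like the following might work. It is only a sketch: the function name follows_convention and the exact regular expression are assumptions, not part of the template, and the last three example names are made up to show failures.

```r
# Check names against the convention: lower-case letters and digits,
# words separated by "-", plus an optional file ending such as .R or .qmd
# (the ending itself may contain upper-case letters).
follows_convention <- function(name) {
  grepl("^[a-z0-9]+(-[a-z0-9]+)*(\\.[A-Za-z0-9]+)?$", name)
}

example_names <- c(
  "analysis-code",                # folder name from the text above
  "exploratory-analysis-v2.qmd",  # file name from the text above
  "processing_code.R",            # hypothetical: uses _ as separator
  "ExploratoryAnalysis.R",        # hypothetical: CamelCase
  "my file.txt"                   # hypothetical: blank space
)

print(data.frame(name = example_names, ok = follows_convention(example_names)))
```

Hidden files (those starting with a .) and upper-case README.md files would not pass this check, so you would likely want to exclude them before applying it to a whole project.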

Package management

It is recommended to use renv to manage R packages and increase the chances of future reproducibility. This is required if you are using the template as part of a research project for our group. Otherwise, you can decide whether or not to implement renv. This can happen at any stage, though earlier in the project is generally better.

If you plan to use renv, start by reading the introduction to renv article so you know how to use it.
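
For orientation, a typical renv workflow looks roughly like the sketch below. The helper name use_renv is an assumption for illustration; renv::init(), renv::snapshot(), renv::restore() and renv::status() are the renv functions it relies on. The setup is wrapped in a function so that nothing in your project changes until you call it yourself.

```r
# Set up renv for a project. Run this once, from the project's main folder.
# use_renv() is a hypothetical helper, not part of the template.
use_renv <- function(project = ".") {
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  # Creates a project-specific package library and an renv.lock file.
  renv::init(project = project)
}

# Typical follow-up calls, run interactively as needed:
#   renv::snapshot()  # record currently used package versions in renv.lock
#   renv::restore()   # reinstall the versions recorded in renv.lock
#   renv::status()    # check whether library and lockfile are in sync
```

The introduction to renv article remains the reference for what each step does and when to run it.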

Getting started

This is a GitHub template repository. The best way to get it and start using it is by following these steps.

Once you have the repository, you can check out the examples by executing them in order. First run the processing code, which will produce the processed data. Then run the analysis scripts, which will take the processed data and produce some results. Then you can run the manuscript, poster and slides example files in any order. Those files pull in the generated results and display them. These files also pull in references from the BibTeX file and format them according to the CSL style.
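
The run order described above could be scripted along these lines. This is a sketch under assumptions: run_in_order is a hypothetical helper, the file paths in the commented usage example are placeholders rather than the template's actual file names, and rendering .qmd files this way requires the quarto R package and a Quarto installation.

```r
# Run files in the given order: .R scripts are sourced,
# .qmd files are rendered with the quarto package.
run_in_order <- function(files) {
  for (f in files) {
    ext <- tolower(tools::file_ext(f))
    message("Running ", f)
    if (ext == "r") {
      # Each script runs in its own environment to avoid leaking objects.
      source(f, local = new.env())
    } else if (ext == "qmd") {
      quarto::quarto_render(input = f)
    } else {
      stop("Don't know how to run: ", f)
    }
  }
  invisible(files)
}

# Usage with placeholder paths (replace with the files in your copy):
# run_in_order(c(
#   "code/processing-code/processing.R",   # hypothetical path
#   "code/analysis-code/analysis.R",       # hypothetical path
#   "products/manuscript/manuscript.qmd"   # hypothetical path
# ))
```

Keeping the order in one place makes it easier to rerun everything from raw data to final documents after a change.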
