

Data Science Project Template

This is my personal way of structuring projects, built through observation and experience; it has helped me plan and scale up without getting lost.

.
├── config.yml
├── data
├── envs
├── LICENSE
├── README.md
├── reports
├── results
├── src
├── start_project.sh
└── workflows

Rationale

  • config.yml: hand-curated list of external files and parameters required for the project.
  • data: keep raw and preprocessed data organized.
  • envs: conda environment files to run the main project; add more environments if needed.
  • LICENSE
  • reports: discuss your insights on a project webpage built with jupyter-book.
  • results: store output files and final plots for every experiment.
  • src: project modules.
  • workflows: pipelines to download, preprocess, and analyze your data.
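The template includes a `config.py` module under `src/python/your_project_name/`, but its contents are not shown here. One way to make the folders above usable project-wide is to resolve them once, relative to the directory that holds `config.yml`. The following is a minimal sketch under that assumption; the helpers `find_project_root` and `project_dirs` and the key names are hypothetical, not the template's actual code.

```python
from pathlib import Path


def find_project_root(start: Path, marker: str = "config.yml") -> Path:
    """Walk up from `start` until a directory containing `marker` is found.

    Assumption (hypothetical helper): the project root is the folder with config.yml.
    """
    start = start.resolve()
    for candidate in (start, *start.parents):
        if (candidate / marker).is_file():
            return candidate
    raise FileNotFoundError(f"no {marker} found above {start}")


def project_dirs(root: Path) -> dict:
    """Map the template's top-level folders to absolute paths (hypothetical key names)."""
    return {
        "ROOT": root,
        "DATA_DIR": root / "data",
        "RAW_DIR": root / "data" / "raw",
        "PREP_DIR": root / "data" / "prep",
        "REFERENCES_DIR": root / "data" / "references",
        "RESULTS_DIR": root / "results",
        "REPORTS_DIR": root / "reports",
        "WORKFLOWS_DIR": root / "workflows",
    }
```

A workflow script could then call `project_dirs(find_project_root(Path(__file__).parent))` and build every path from the returned dictionary, so that no script hard-codes where data or results live.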

Typical workflow

  1. Modify config.yml to your taste, adding any variables that may be useful project-wide.
  2. Create the workflows to download and preprocess your project's data in workflows/download/ and workflows/preprocess/, respectively. Distinguish between code used project-wide (place it in the project's modules in src/ and call its functions from your workflow) and code used only in one part of the project (place it in that workflow's scripts/ subdirectory).
  3. Analyze your data by creating each experiment as a subdirectory of workflows/analyses/ that reads its inputs from data/ and writes its outputs to results/your_experiment_name/.
  4. Commit your work, and consider adding README files.
  5. Inspect and explore the results by creating Jupyter notebooks in reports/notebooks/, which can be rendered into static webpages with jupyter-book. Structure your project's book by editing reports/_toc.yml.
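Steps 2 and 3 are easiest to keep apart when each script in a workflow's scripts/ folder receives its input and output paths as arguments, so the snakefile (not the script) decides where files live. The following is a minimal sketch of what `workflows/analyses/new_experiment/scripts/workflow_step.py` might contain. It assumes tab-separated inputs with a header row. The `--input_file`, `--output_file` and `--column` flags and the value-counting "analysis" are placeholders, not the template's actual script.

```python
import argparse
import csv
from collections import Counter
from pathlib import Path


def count_by_column(input_file: Path, column: str) -> Counter:
    """Placeholder analysis: count how often each value appears in `column`."""
    with open(input_file, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        return Counter(row[column] for row in reader)


def write_counts(counts: Counter, output_file: Path) -> None:
    """Write a two-column TSV, creating results/<experiment>/files/ if needed."""
    output_file.parent.mkdir(parents=True, exist_ok=True)
    with open(output_file, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["value", "count"])
        for value, count in sorted(counts.items()):
            writer.writerow([value, count])


def parse_args(argv=None) -> argparse.Namespace:
    # Hypothetical flags: the snakefile rule would pass its input/output paths here.
    parser = argparse.ArgumentParser(description="Example workflow step.")
    parser.add_argument("--input_file", type=Path, required=True)
    parser.add_argument("--output_file", type=Path, required=True)
    parser.add_argument("--column", default="group")
    return parser.parse_args(argv)


def main(argv=None) -> None:
    args = parse_args(argv)
    write_counts(count_by_column(args.input_file, args.column), args.output_file)


if __name__ == "__main__":
    main()
```

In the snakefile, a rule's `input:` and `output:` paths (for example, a file under data/prep/ and results/new_experiment/files/output_example.tsv) would be handed to this script on the command line. Any logic reused by other workflows would move to the package in src/.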

Requirements (for this use case)

  • an environment manager: e.g. conda
  • a workflow manager: e.g. snakemake
  • (optional) a webpage builder: e.g. jupyter-book

Installation

# clone the repository
git clone https://github.com/MiqG/project_template
cd project_template

# remove the template's git remote
bash start_project.sh

# start_project.sh is no longer needed
rm start_project.sh

Structure

.
├── config.yml
├── data
│   ├── prep
│   ├── raw
│   └── references
├── envs
│   └── main.yml
├── LICENSE
├── README.md
├── reports
│   ├── _config.yml
│   ├── images
│   │   └── logo.png
│   ├── notebooks
│   │   ├── example_notebook.md
│   │   ├── intro.md
│   │   └── README.md
│   ├── README.md
│   └── _toc.yml
├── results
│   ├── new_experiment
│   │   ├── files
│   │   │   └── output_example.tsv
│   │   └── plots
│   │       └── output_example.pdf
│   └── README.md
├── src
│   └── python
│       ├── setup.py
│       └── your_project_name
│           └── config.py
├── start_project.sh
└── workflows
    ├── analyses
    │   └── new_experiment
    │       ├── README.md
    │       ├── run_all.sh
    │       ├── scripts
    │       │   └── workflow_step.py
    │       └── snakefile
    ├── download
    │   ├── README.md
    │   ├── run_all.sh
    │   ├── scripts
    │   │   └── workflow_step.py
    │   └── snakefile
    ├── preprocess
    │   ├── README.md
    │   ├── run_all.sh
    │   ├── scripts
    │   │   └── workflow_step.py
    │   └── snakefile
    └── README.md
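Every experiment mirrors the new_experiment layout in both workflows/analyses/ and results/, so adding one mostly involves copying. The following is a minimal sketch that assumes `workflows/analyses/new_experiment/` is kept as the template to copy. The `add_experiment` helper is hypothetical and not part of the repository.

```python
import shutil
from pathlib import Path


def add_experiment(root: Path, name: str, template: str = "new_experiment") -> Path:
    """Copy the template experiment workflow and create matching result folders.

    Hypothetical helper; returns the path of the new workflow directory.
    """
    src = root / "workflows" / "analyses" / template
    dst = root / "workflows" / "analyses" / name
    if dst.exists():
        raise FileExistsError(f"experiment {name!r} already exists at {dst}")
    # README.md, run_all.sh, scripts/ and snakefile are copied as-is;
    # copytree preserves file permissions, so run_all.sh stays executable.
    shutil.copytree(src, dst)
    # Outputs are expected under results/<name>/files and results/<name>/plots.
    for sub in ("files", "plots"):
        (root / "results" / name / sub).mkdir(parents=True, exist_ok=True)
    return dst
```

Any hard-coded new_experiment paths inside the copied snakefile or run_all.sh would still need to be renamed by hand.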


Have fun!


Contributors

miqg
