
github-learning-lab commented on August 26, 2024

We're going to combine several things into this assignment. You'll be asked to make some modifications to your training repository and also to create a pull request that captures these changes.

Background

We use team conventions for how our pipelines are organized, which make it easier to hop in and out of collaborative projects and to rapidly understand what is going on where.

We refer to major elements of a pipeline as "phases", and name phases according to their purpose, such as 1_fetch or 2_process. These phases are used to separate files and data based on the intent of the code we are writing, and make it tractable to figure out where you'd need to edit code if you were coming in fresh to the project.

For medium to large pipeline projects, you'll see these workflow phases explicitly named with a number, often followed by a verb (separated by an underscore). We use these phases to create different folders 📁 for data and code, and also to specify how we orchestrate the running of code (more on that later).

So, if we have a 1_fetch phase, code in the 1_fetch folder 📁 would be used to do things like get data from web services, Google Drive, or an FTP server, or to scrape a website. 2_process (or 2_munge) might contain code that transforms the "fetched" data into more usable formats.

We recommend having src and out folders within each phase folder: src contains the code for that phase, and out contains the data (or other files) that phase produces. In some of our existing pipelines, you will also see other folders 📁 named in, log, and tmp, which hold manually added files, logged/diagnostic output, and temporary data files, respectively.
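As a rough illustration of the layout described above, the sketch below builds the phase folders in a scratch directory. The project name demo_pipeline is hypothetical and used only for the example; the phase and folder names come from the conventions in this section.

```shell
# Minimal sketch: build the phase layout in a throwaway directory.
# "demo_pipeline" is a hypothetical project name, not part of the course.
set -eu
project="$(mktemp -d)/demo_pipeline"

for phase in 1_fetch 2_process; do
  # src holds the code for the phase; out holds the files it produces
  mkdir -p "$project/$phase/src" "$project/$phase/out"
done

# List the resulting folders to check the structure
(cd "$project" && find . -type d | sort)
```

Optional folders such as in, log, and tmp could be added to the mkdir line in the same way when a phase needs them.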

⌨️ Activity: Restructure your code repository to follow our team's conventions for folders and files

Create a two-phase directory structure for the "fetch" and "process" concepts, and include src and out folders in each phase. Move the example script (my_happy_script.R) from the my_work_R folder into one of the src folders (at this time, it doesn't matter which one you choose) and delete any existing folders that aren't part of the intended structure.

When you are done, open a pull request with the changes.
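One possible way to carry out the restructuring from the command line is sketched below. It runs against a scratch repository so it can be tried safely; the branch name restructure-folders, the demo identity, the script contents, and the .gitkeep placeholder files are all assumptions made for illustration, not requirements of the assignment.

```shell
set -eu
# Scratch repository standing in for your training repository (illustration only)
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example User"       # hypothetical identity for the demo
git config user.email "user@example.com"
mkdir my_work_R
echo 'message("hello")' > my_work_R/my_happy_script.R   # hypothetical contents
git add my_work_R
git commit -q -m "Starting state"

# Restructure on a new branch ("restructure-folders" is a hypothetical name)
git checkout -q -b restructure-folders
mkdir -p 1_fetch/src 1_fetch/out 2_process/src 2_process/out
git mv my_work_R/my_happy_script.R 1_fetch/src/my_happy_script.R
# Remove the old folder if it is still on disk (it is empty after the move)
rmdir my_work_R 2>/dev/null || true

# Git does not track empty directories, so add placeholder files.
# (.gitkeep is a common convention, not a Git feature.)
touch 1_fetch/out/.gitkeep 2_process/src/.gitkeep 2_process/out/.gitkeep
git add 1_fetch 2_process
git commit -q -m "Restructure into 1_fetch and 2_process phases"

# Show the files Git now tracks
git ls-files
```

In your own repository you would skip the scratch setup at the top, push the branch (for example with git push -u origin followed by your branch name), and then open the pull request on GitHub.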


Check your new pull request for a comment from me (you might have to wait a few seconds).

from ds-pipelines-targets-1.
