
4oss-lesson's People

Contributors

abbycabs, allegravia, bebatut, brandoncurtis, c-martinez, davidbenncsiro, erinbecker, evanwill, fmichonneau, fpsom, gvwilson, ianlee1521, jduckles, jpallen, katrinleinweber, malvikasharan, mawds, maxim-belkin, mkuzak, neon-ninja, orchid00, pbanaszkiewicz, pfern, pipitone, rgaiacs, synesthesiam, tobyhodges, twitwi, vdda, wclose

4oss-lesson's Issues

REC1: README section (minor fixes)

Take a look at the README.md file in the repository that you created in the introduction

  • automatically displayed on the front page of your repository
  • project name and subheading emphasised by default, using Markdown syntax
  • italics, bold, link, list

The list seems a bit disconnected (especially the last bullet point). I think it could be removed and replaced with a link to the Markdown cheatsheet.

read more about Markdown here: https://guides.github.com/features/mastering-markdown/

Replace/add the full URL with a clickable URL.

as well as the README & LICENSE .md files

Should be "In addition to the README & LICENSE .md files".
Also, in the same list, maybe start each bullet point with a capital letter (and update the link in bullet point 2 with a clickable version).

REC1: Documentation - Challenge

I don't think that "create a simple page of documentation for your software using the GitHub page" is a great exercise. R has a different way to write documentation. I think it is more important to ask people to write minimal README-like documentation. I would replace this exercise with something like

For the Python or R script provided below, write minimal documentation that follows the best practices mentioned before. After writing your documentation, swap it with your neighbour and give each other feedback.

Python Script

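import sys
import pandas

def main(csv_path):
    data = pandas.read_csv(csv_path)  # path of the CSV file to read
    data.plot()

if __name__ == "__main__":
    main(sys.argv[1])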

R Script

library(ggplot2)
csv <- read.csv()
ggplot(csv, aes(x=x, y=y)) + geom_point()

Notes about the scripts

Each script should have 12 lines or fewer. It should depend on a popular third-party library. It should read a CSV file. It should create a data visualisation.

The Reference link in the Extras dropdown menu links to Key Points, not to actual references

We need to change the current reference.md to auto-populate with references from episodes, instead of with Key Points (as it does now).
To this end, we should:

  • add references in episode headers
  • move current reference.md content to, e.g., key-point.md
  • create an item ("Key Points") in the Extras dropdown menu pointing to key-point.md
  • modify current reference.md so that it auto-populates with references from episode headers
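The last step above could be prototyped outside the lesson template before committing to a template change. The following is a minimal Python sketch, not the template's own mechanism. It assumes each episode file starts with a header delimited by `---` lines, and that references are stored under a hypothetical `references:` key as `- "..."` items. The key name, the item layout and both function names are assumptions made for illustration.

```python
def extract_references(episode_text):
    """Return the references listed in an episode header.

    Assumes a header delimited by "---" lines that contains a
    hypothetical "references:" key followed by "- " list items.
    """
    lines = episode_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return []  # no header block at the top of the file
    references = []
    in_references = False
    for line in lines[1:]:
        stripped = line.strip()
        if stripped == "---":
            break  # end of the header
        if stripped == "references:":
            in_references = True
        elif in_references and stripped.startswith("- "):
            references.append(stripped[2:].strip().strip('"'))
        elif stripped and not line.startswith(" "):
            in_references = False  # a different top-level key started
    return references


def build_reference_page(episodes):
    """Build a bulleted list from {filename: text} pairs, without duplicates."""
    seen = []
    for name in sorted(episodes):
        for reference in extract_references(episodes[name]):
            if reference not in seen:
                seen.append(reference)
    return "\n".join("- " + reference for reference in seen)
```

Episodes are processed in filename order, so references appear in the order in which learners meet them, and a reference cited in several episodes is listed once.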

First episode needs more discussion and restructuring

The structure of this episode is currently confusing. Here are the potential changes that we (Mateusz and Allegra) discussed in Genoa (the suggested section headers mentioned in the following are just meant to give an idea of the corresponding content):

  1. The first part of the "Hosting a project on GitHub" section (text until - and including - the bulleted point list) is not about GitHub and should be part of an initial "What are the benefits of making your software project public from the beginning?" section.
  2. This (new) first section should be followed by a second section "How do I make my project publicly accessible?". This section should have the following sub-sections:
  • "Use a version control system to keep track of changes in your code"; the discussion about "How does version control help your research", could be moved to this section
  • "Hosting a project on GitHub", in which the part about GitHub would benefit from a bit of expansion.
  • "Documenting your code". This section should have the following sub-sections:
  1. "Write a good README - The 'front page' of your project". I suggest that we refer to README text files that should always be associated with a coding project (not only on GitHub), with a description of the project, instructions for installation and usage, and examples. Then I would talk about the README.md file on GitHub. We could also provide links to guidelines on how to write nice README files (for example this or this), to a template to make a good README.md file and to good examples (see this list for example, and this example in particular).
  2. "Documentation"; The Discussion at the beginning of this section ("What experiences have you had with good or bad documentation") should be changed into the following one: Think about software that you wanted to install and use in your work. Think from the user perspective: was it easy to download? Was it easy to install? What kind of documentation was provided? Was there any information you would have liked to find with the software (but didn't)?
    The developer perspective is not relevant for this discussion.
    In order to reflect from the developer perspective, we can introduce a Challenge: Have you ever written documentation for your own software? If yes, what type? How? Where? If not, why?
    Mateusz will provide the link for the last Challenge of this episode.
  • "Make your software (re)usable"
    When discussing the importance of good code, we should stress the importance of commenting the code (to make it more easily readable by collaborators and therefore more accessible)

  • "Publishing"
    This section needs expansion.


Here is how I see the structure (to be discussed) of this episode:

A) "What are the benefits of making your software project public from the beginning?"

B) "How do I make my project publicly accessible?".

  • "Use a version control system to keep track of changes in your code";
  • "Hosting a project on GitHub"
  • "Documenting your code".
    - "Write a good README - The 'front page' of your project"
    - "Documentation"
  • "Make your software (re)usable"
  • "Publishing"

Resources for contributing

- https://opensource.guide/how-to-contribute/
- https://opensource.guide/building-community/
- https://ropensci.org/community/
- https://en.wikipedia.org/wiki/Wikipedia:Contributing_to_Wikipedia
- https://help.github.com/articles/about-project-boards/
- https://onboarding.ropensci.org/
- https://github.com/marketplace/category/project-management
    -- Free for open source projects
    https://github.com/marketplace/zenhub
    https://github.com/marketplace/zube free up to 4 users
    https://github.com/marketplace/waffle 
    https://github.com/marketplace/issue-sh free up to 5 users
    http://sciencetogether.online/tools/
- https://www.openproject.org/
- https://gitlab.com/
- https://opensource.com/
- http://mozillascience.github.io/working-open-workshop/
- https://github.com/mozillascience/working-open-workshop/blob/gh-pages/handouts/contributing.md
- https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32018H0790 point 12

Code of conduct - examples

- https://www.contributor-covenant.org/

Order of Episodes

Order of episodes/recommendations should be the following:

  1. 3rd recommendation "Adopt a licence and comply with the licence of third-party dependencies"
  2. 1st recommendation "Make source code publicly accessible from day one"
  3. 4th recommendation "Define clear and transparent contribution, governance and communication processes"
  4. 2nd recommendation "Make software easy to discover by providing software metadata via a popular community registry"

Content of introduction

We didn't discuss what will go into the introductory episode. A couple of thoughts:

  1. The licensing episode relies on learners having already created a project on GitHub, which means that we need to move that section out of episode 3, either into the introduction or as a prerequisite (we can put the instructions into extras or get rid of it entirely and link to another online guide)
  2. The introduction seems like a good place to answer the question "why open source?" I would like to see this acknowledge and address some of the "fears" listed in the supplementary materials of the paper (link), perhaps as an exercise/icebreaker.

Collaboration

Make clear what your project is and how to contribute, for new and existing collaborators

From Good enough practices for scientific computing

  • Create an overview of your project. (MUST INCLUDE)
  • Create a shared "to-do" list for the project. (MUST INCLUDE)
  • Decide on communication strategies. (MUST INCLUDE)
  • Make the license explicit. (will be part of the license recommendation)
  • Make the project citable (where to add?)

Except for the to-do list, all of the above should be in the README file. I'm working on this.

more on communication channels

The choice of communication channels depends not only on the size of the team but also on persistence, urgency, provenance, etc. There is a good overview in Producing Open Source Software.

REC2: Stress the importance of metadata in software

  • Discussion: what is your favorite/most used tool? Does it have metadata? Where is it accessible from?
  • Clarify that the metadata captured may differ between tools. Non-standardized metadata can create problems (show the GEO "Age" figure)
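The problem with non-standardized metadata can also be shown in a few lines of code. The following is a small Python sketch with made-up records. The field names are invented for illustration and are not taken from GEO, and both function names are hypothetical. It shows that a lookup by one exact key misses most records when every submitter spells the "age" field differently.

```python
# Illustrative records only: the field names are invented for this
# sketch and are not taken from GEO.
records = [
    {"age": "34"},
    {"Age": "41"},
    {"age (years)": "29"},
    {"AGE_YRS": "52"},
    {"tissue": "liver"},
]


def find_age(record):
    """Naive lookup: only matches the exact key "age"."""
    return record.get("age")


def find_age_normalized(record):
    """Heuristic lookup: match any key that starts with "age", ignoring case."""
    for key, value in record.items():
        if key.strip().lower().startswith("age"):
            return value
    return None


naive = [find_age(record) for record in records]
normalized = [find_age_normalized(record) for record in records]
```

The naive lookup finds one value out of four. The heuristic finds all four, but it is fragile in its own way: it would also match an unrelated key such as "agent". That fragility is the argument for agreeing on a standard field name instead of guessing afterwards.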

Test run feedback

I organized a workshop on "Good practices in scientific software development" and had this lesson as part of the workshop (knowing full well that the lesson is still under development). I got some feedback after the workshop, so I'm sharing that here:

  • Most participants found the recommendations good to know, especially at the early stages of their PhD.
  • Even so, some participants found that the pace was a bit slow and we could have gone through this part of the workshop a bit faster.
  • In episode 3, in the exercise "Highlighting the importance of metadata", a lot of participants struggled because they could not remember the names of actors, directors, etc., so that is something to think about.

Metadata episode - review after the Utrecht hackathon, August 1-3

Leyla, Carlos, Fatma, Radka,
@ljgarcia @c-martinez

I just merged the PRs on the metadata episode.
#45
#38
#36
#32
#31
#30

Thanks for your contributions to the metadata episode, and for your effort in addressing many comments. You can now continue to work on it.

How to continue?

  • I recommend going through the issues at https://github.com/SoftDev4Research/4OSS-lesson/issues and closing those that have already been addressed.
  • When merging I noticed two exercises with the same title; one of them might need to be removed or updated: "Exercise: Using a registry, e.g., bio.tools". It might have been my mistake, but there were about 25+ conflicts :S
  • One of the PRs changed "participants" to "learners"; I didn't know which one to pick. I personally like "participants" better.
  • I would suggest looking for a definition outside of Wikipedia.
  • Take a look at extra resources: https://www.ands.org.au/guides/metadata-working
  • Continue the conversation on Twitter, to find more examples!

This issue can be closed, or kept as a record for adding comments.

Add definition of notebook

In 4OSS-lesson/_episodes/02-make-it-public.md, we have the "What if your software is an analysis in a Notebook" section, but there is no definition of the term "notebook"; instead, we start writing about Jupyter straight away.

It'll be useful to include a general definition of "notebooks". @mkuzak is currently writing one.

REC1: Documentation

Replace

For end-user, a good documentation should clearly state what problems the software is designed to solve and who the target audience is. Installation instruction should be added clearly-stated list of dependencies. Ideally these should be handled with an automated package management solution. The documentation should also depict some examples of how to use the software (ideally to solve real-world analysis problems), explanation of the expected inputs and outputs.

with

For end users, good documentation should clearly state what problems the software is designed to solve and who the target audience is. Installation instructions should include a list of dependencies and, ideally, suggestions on how to install them. The documentation should also show some examples of how to use the software, with an explanation of the expected inputs and outputs.

Replace

The software API should be documented to a suitable level. In the best documentation, all functions/methods are documented including example inputs and outputs.

with

If the software has an API, all functions/methods from the API should be documented including example inputs and outputs.
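As an illustration of what "documented including example inputs and outputs" can look like, here is a minimal Python sketch. The function is hypothetical and was chosen only because its outputs are easy to check by hand. The docstring layout with "Parameters", "Returns" and "Examples" sections is one common convention, not the only one.

```python
def celsius_to_fahrenheit(celsius):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit.

    Parameters
    ----------
    celsius : float
        Temperature in degrees Celsius.

    Returns
    -------
    float
        Temperature in degrees Fahrenheit.

    Examples
    --------
    >>> celsius_to_fahrenheit(100)
    212.0
    >>> celsius_to_fahrenheit(-40)
    -40.0
    """
    return celsius * 9 / 5 + 32
```

Because the examples use the interactive-prompt format, they can be checked automatically with Python's standard doctest module (for instance `python -m doctest` followed by the file name), which helps keep the documented outputs in step with the code.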

Setup

Write a short setup: how to create a repo, etc.

Episode Contributions - review after the Utrecht hackathon, August 1-3

@Pfern @fmichonneau @malvikasharan

Thanks for your contributions to the Contributions episode.
I just merged PR #24 and addressed the smaller changes.

How to continue? Please see the comments listed below. The order is: number, topic, what to change and who suggested it. If you would rather not address any of the TODOs, please share the reasoning.

General comments:
We could work on one point each, as there are four. Or we could discuss each point here in the comments and then proceed.
I also suggest giving the episode a full review now that it is merged, and adding points that need to be modified. If you see any small typos, please send small PRs.

Reviews numbered

  1. To allow meaningful contributions,
    it is crucial to make your project FAIR

TODO:
@mkuzak: how are meaningful contributions connected to being FAIR?
@orchid00: I'm also not sure how to make the connection, but there might be sub-points to relate.

  2. If you are not yet convinced that having your project open for contributions is

TODO:
@mkuzak: I don't see how my not being sure about the advantages of opening the project relates to questions about how people can contribute.

  3. For the developers to make relevant contributions, it is also worth thinking...

TODO:
@mkuzak: This is a very long and hard to understand sentence. Can you rewrite and simplify it?

  4. Technical requirements (software versions, space, etc)
    TODO:
    @mkuzak: what kind of space? hard drive space?

Comments on metadata episode

Thanks for your contributions!
Reviewed August 3,

A few points to take care of

  • please consistently write metadata instead of meta-data
  • I've removed dots after objectives, but kept them after key points
  • please consistently write life sciences instead of Life Sciences
  • please avoid using "just"
  • what is the NetCDF API? Maybe add a link.
  • "About good standards" should maybe be an H2?
  • the table |Code|metadata| is not very clear or consistent in the type of information it holds per cell
  • please use " instead of “
  • please see how to put together these two sections:
    • Examples in Life Sciences:
    • existing meta-data platforms and tools
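Two of the points above (straight quotes and "metadata" without a hyphen) are mechanical and could be applied with a short script instead of by hand. The following is a minimal Python sketch under that assumption. The function name is hypothetical, and the result should still be reviewed as a diff, since the substitution would also touch URLs, file names or quoted titles that legitimately contain "meta-data".

```python
import re

# Curly quotation marks mapped to their straight ASCII equivalents.
QUOTES = {"\u201c": '"', "\u201d": '"', "\u2018": "'", "\u2019": "'"}


def normalize_episode_text(text):
    """Replace curly quotes and rewrite "meta-data" as "metadata"."""
    for curly, straight in QUOTES.items():
        text = text.replace(curly, straight)
    # Keep the case of the first letter: "Meta-data" becomes "Metadata".
    return re.sub(r"\b([Mm])eta-data\b", r"\1etadata", text)
```

The remaining points (heading levels, table consistency, merging the two sections) need editorial judgement and are not covered by the sketch.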

Licence episode - review after the Utrecht hackathon, August 1-3

@allegravia @rgaiacs @JenHarrow

I just merged the PR on the Licence episode.
#52

Thanks for your contributions on the Licence episode.

How to continue? Please address the comments listed below. There are small things to fix in this episode; please see below and contact the reviewer with questions. If you would rather not address any of the TODOs, please explain the reasoning.

I recommend making smaller PRs that clearly state the change, so I can merge them easily. Thanks!

General comments:
@orchid00: "licence" and "license": British English: spell it licence when you use it as a noun and license when you use it as a verb

@tobyhodges: You've done a great job of capturing and explaining a lot of potentially difficult concepts and information here +1 There's quite a lot to read through here. Is it possible to cut the amount of text down a bit? You've had good ideas for the exercises - encouraging learners to discuss the consequences and implications of choosing different licences is a smart way of invoking peer instruction and (I hope) will reduce the amount of time that the instructor has to spend talking :)

Reviews numbered

  1. "First objective."
  • Tell what is a copyright and what a licence does
  • tell why is important that a product/code has a licence

TODO:
@mkuzak: why it is important

@ljgarcia: Could a verb other than "tell" be used for the objectives? Telling and understanding is not necessarily the same.

  2. In the title h3: Different licenses for different types of products (even data)
    TODO:
    @mkuzak: why it is important

  3. Typo " For example, if your data analysis in R uses ggplot2 for the visualisation, ggplot2 would be a third-party dependence."

TODO:
@mkuzak: third-party dependence > third-party dependency

@fmichonneau: In the same paragraph.
I think it's a little more subtle than this here. Licenses of dependencies will affect your project differently depending on whether you need to include the code in your own project, or whether you are simply linking to it. There are probably very few cases where including the source code of ggplot2 inside another project would make sense, and I worry it will give the wrong impression to the learners. For instance, you could compare the fs package, which includes the libuv C++ library, and the magick package, which does not include the source code from ImageMagick but links to it. In the first case, the fs package needs to be released with a GPL3 license because of the libuv package. In the second case, the magick package is released with an MIT license even though ImageMagick is released under a custom GPL-compatible license.

  4. Exercise:

"Linking this library statically or dynamically with other modules is making a combined work based on this library. Thus, the terms and conditions of the GNU General Public License cover the whole combination."

TODO:
@mkuzak: explain what static and dynamic linking mean
@ljgarcia: +1
@LourensVeen: Good luck :). It's pretty clear for compiled and linked languages, but for modern stuff like Python...

  5. Point 4. Select your license by opening it page
    TODO:
    @tobyhodges: "opening its page" would be correct

  6. "Fair rules" - parody, criticism, quotation - are an exception to these rights. For example, you can publish a study comparing the performance or the accuracy of several different programs, without infringing the copyright of the programs.

TODO:
@tobyhodges: Should this be "fair use" rules? That's the terminology that I'm familiar with

  7. Scenario: you receive a script from your supervisor asking you to modify it to analyse a different type of input data.

TODO:
@tobyhodges: Did the supervisor write the script? This was unclear to me from the wording. Perhaps, Your supervisor sends you a script that they wrote, asking you to modify [...]. Or: Abdul's supervisor sends him a script that she wrote, asking him to [...]

@LourensVeen: Usually, you and your supervisor would be working for the same entity on a work-for-hire basis, so is this really a copyright issue? That may actually be interesting to discuss here, so it's not a bad example at all, but maybe it should be mentioned somewhere?

  8. What an open source licence does?

TODO:
@tobyhodges: What does an open source licence do?
Or, What an open source licence does

  9. Exercise:
    Work in pairs. Ask learners to say wich permissions they have from a licence of a given software. Write permissions on sticky notes and stick them under each software name. Instructors should push learners to be more creative in terms of permissions. For example, can you sell the software? Can you sell support?

TODO:
@tobyhodges: This is a nice exercise. I'd recommend to rewrite so that it's aimed at the learners.
@fmichonneau: It seems this challenge needs a little more information. Any software? What does "permissions" refer to (do you mean "use cases")?

  10. Image: http://journals.plos.org/ploscompbiol/article/figure/image?size=large&download=&id=10.1371/journal.pcbi.1002598.g002

TODO:
@tobyhodges: This is an awesome figure! Can you add an image credit?

Before the image @LourensVeen: I would avoid the term "open access" here, as it's something different, and most versions of open access (notably green and gold open access) are not at all equivalent to open source. "continued access to the software and its source code" would be better.

  11. Fix typo

TODO:
@fmichonneau distibute -> distribute

  12. Copyright protects the creative expression of ideas, patents protect novel technological inventions, trademarks protect ownership.

TODO:
@fmichonneau: I think it would be useful to provide a little more detail here, with some examples for each word.

@LourensVeen: I think "trademarks protect brand identities" is a more accurate statement.

  13. Question: are you allowed to use the photos that you downloaded for your machine learning algorithm?

TODO:
@fmichonneau: Adding some guidelines/possible solutions for this challenge will be useful to instructors
@LourensVeen: And they should probably mention that the terms and conditions of the website (a contract, not a copyright license) play a role here, lest people take an affirmative answer for a blanket permission to do this always.

  14. e.g., by email.

TODO:
@fmichonneau: maybe switch the "share by email" example to another use case; it's probably best to encourage better practices even in the examples :)

  15. Select the chosen license

TODO:
@fmichonneau: It might be worth adding that different communities have different practices. When preparing an R package to be submitted to CRAN the format of the LICENSE file needs to follow a strict format, and be named LICENSE (not LICENSE.md for instance). The usethis package makes this easy by having functions that do the right thing depending on the license you choose: use_mit_license(), use_gpl3_license(), etc...

  16. Copyright is .... in that paragraph

TODO:
@ljgarcia: Is there any source that can be cited here?

  17. Scenario: you receive a sticker and your friends want a copy.

TODO:
@ljgarcia: What kind of sticker? Is this a logo or so? I suppose depending on the sticker there could be no copyright?

  18. Question: are you allowed to use the photos that you downloaded for your machine learning algorithm?

TODO:
@ljgarcia: Do you plan to add a solution here?
Could you use maybe a solution here rather than a challenge?

  19. third-party dependencies are any library

TODO:
@ljgarcia: It does not have to be a library, it could be just a component... maybe "code" rather than "library" here?
@LourensVeen: and not just of the library, but also the rest of the code ("Corresponding Source"). It's a strong copyleft, your users are interacting with the whole thing (similarly to a statically linked binary), and that's what section 13 says.

  20. Choose a licence for my code
TODO:
@ljgarcia: Choosing?

  21. "a licence and all future versions [...]".

TODO:
@ljgarcia: Do you mean "the current version and all future versions [...]"?

  22. Add the previously selected license to your GitHub repository

TODO:
@ljgarcia: "the previously selected", selected where? Can't they change their mind and select a license they like better than the previously selected one?

  23. each Contributor hereby grants to ...

TODO:
@c-martinez: I asked one of my colleagues (who knows a lot more about licenses than I do) to have a quick look at the challenges in this episode. He suggested adding these two questions:

"Who owns the copyright to the various programs you've written so far?"
"What are the (dis)advantages to licensing your software under the Apache license versus licensing it under the GPL?"
I am guessing this might be a good place to add these questions?

Touch upon free labour

Explain that the contributions are usually made by people in their free time. Is this a problem? What are the implications?

Last episode needs more discussion about what to include

Mateusz and I think we should discuss whether we are fully happy with the current content of this episode.

  • Can documentation be a sort of metadata? More explanation is needed about the difference between documentation and metadata (e.g., by providing examples).
  • Should we add a description on how to add metadata "manually" (i.e. not using a platform/registry)? Should we describe the different ways to add metadata and tell that registries are one of those?
  • We think we should provide more examples of metadata (e.g., HTML metadata tags, Package Metadata in R, CodeMeta json-ld) and explain how metadata make software more discoverable (in these examples, e.g. HTML pages by Google).
  • Some concepts in the episode should be made clearer (e.g. the paragraph on controlled vocabularies).
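To make the "manual" route concrete, one of the examples mentioned above (CodeMeta JSON-LD) can be written without any registry. The following is a minimal Python sketch, assuming the CodeMeta 2.0 context URL and a handful of its terms. All project details (name, description, repository URL, keywords, license choice) are placeholders, and the helper name `write_codemeta` is hypothetical.

```python
import json

# All project details below are placeholders for illustration.
codemeta = {
    "@context": "https://doi.org/10.5063/schema/codemeta-2.0",
    "@type": "SoftwareSourceCode",
    "name": "example-analysis",
    "description": "Reads a CSV file and plots its columns.",
    "programmingLanguage": "Python",
    "license": "https://spdx.org/licenses/MIT",
    "codeRepository": "https://example.org/example-analysis",
    "keywords": ["csv", "visualisation"],
}


def write_codemeta(metadata, path="codemeta.json"):
    """Write the metadata as an indented JSON file."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(metadata, handle, indent=2)
        handle.write("\n")
```

A file like this lives next to the README in the repository. Unlike free-text documentation, every field has a defined meaning, so a registry or search tool can read it without guessing, which is one way to illustrate the difference between documentation and metadata.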

REC1: Documentation - Conclusion

Good documentation can make a big difference on how a software is used but also in creating a community around it to support it.

needs improvement.

Single "practical" track across all episodes

The idea is to go across all steps within the workshop

  • Put a single line of code in a repo
  • Add license
  • ...
  • ...

We should have a concrete track / specific approach that can be applicable throughout the workshop. In each step we mention all the available options, but then we focus on the one particular approach/topic/platform/technology so that the exercises are specific.

REC1: Write code for readability/re-usability

Learners should know where to find style recommendations and practices that can make code more readable

Possible exercises

  • reviewing style guides (PEP8?, parsing tools)
  • fixing some bad code
  • learning to write better functions and tests
  • learning how to comment well/write good documentation (Read the docs? GitHub pages?)

Exercises could divide the room according to learners' abilities/interest/language/etc. What not to do: https://www.doc.ic.ac.uk/~susan/475/unmain.html
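A possible starting point for the "fixing some bad code" exercise is a before/after pair. The following is a small Python sketch. The functions are invented for illustration, and the "after" version is one reasonable rewrite, not the only one.

```python
# Before: works, but the names and layout hide the intent. PEP 8 advises
# against the single-letter name "l" and against putting a compound
# statement on one line.
def f(l):
    r = []
    for i in l:
        if i % 2 == 0: r.append(i * i)
    return r


# After: same behaviour, with a descriptive name and a docstring.
def squares_of_even_numbers(numbers):
    """Return the squares of the even values in ``numbers``, in order."""
    return [number * number for number in numbers if number % 2 == 0]


def test_squares_of_even_numbers():
    assert squares_of_even_numbers([1, 2, 3, 4]) == [4, 16]
    assert squares_of_even_numbers([]) == []
    # The rewrite must not change behaviour.
    assert squares_of_even_numbers([5, 6]) == f([5, 6])
```

Learners could be given only the "before" function and asked to rename, document and test it, which combines three of the exercise ideas listed above.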
