Thanks For The Code!

GitHub Copilot helps us write buggy code faster than ever! It was trained on public code on GitHub (and more). It is really nice and you should try it.

In rare cases, it copies code literally. In the old days, we used to copy code from public repositories by hand. We would manually read the LICENSE file and do the polite thing:

  1. Say thanks.
  2. Include a copy of the license and copyright notice.
  3. Apply the GPL to the entire codebase.

Some of us still want to be polite now that AI overlords lend us an automated hand. The problem is: there are more than 140 million public repositories on GitHub, with all kinds of licenses. Which repositories should we give attribution to? And how? We propose a simple solution:

Say thanks to ALL public GitHub repositories [1]

This is the easy way of doing attribution [2], with or without using Copilot! No more tracking what code you stole from what source [3].

When you are not sure what you did wrong, apologize to everyone you encounter.

This repository contains a tiny 1.8GB zipped list of repositories that you may or may not have unwittingly borrowed code from. Just download the whole repository and run unpack.sh to unzip the archive. Don't be a stranger! Feel free to distribute it with all your applications, libraries and websites.

I can't add a 1.8GB file to my website/library/application!!

Fear not! There are even lazier ways of doing attribution.

Just link to thanksforthecode.com, which will (eventually) thank all GitHub repositories [1]. Your users can casually browse the list like they would their favorite EULA. You can give it a personal touch by providing a parameter: thanksforthecode.com?name=PROJECT_NAME. This website is free of ads and cookies.
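
A minimal sketch of building that link, assuming the site is served over https and that the name parameter should be URL-encoded in the usual way. The helper `thanks_url` and the constant `BASE_URL` are hypothetical and not something this repository ships.

```python
from urllib.parse import urlencode

BASE_URL = "https://thanksforthecode.com"  # assumption: https scheme


def thanks_url(project_name=None):
    # Hypothetical helper: without a name, link to the plain page.
    if not project_name:
        return BASE_URL
    # urlencode escapes spaces and special characters in the project name.
    return f"{BASE_URL}?{urlencode({'name': project_name})}"


print(thanks_url("My Project"))
```

Encoding the name keeps project names with spaces or ampersands from breaking the query string.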

I don't want to link to thanksforthecode.com!!

Worry not, as there are still ways to do attribution. As Nasreddin said:

The sound of coins is payment for the smell of soup.

If you are already using GitHub Copilot, you can have it generate attribution on demand. Just add a comment saying something like "Copyright notice", and let the AI work her magic. For example:

# Copyright notice
print('Copyright (c) 2020, The Regents of the University of California. All Rights Reserved.')

Copilot donated my copyright to a non-fictional entity [4], and put it in a print statement so the world will know. Very cool.

For those who don't like generating (fake?) attribution, I included a script that randomly samples a few very lucky repositories. Some of you may ask: what attribution probability can compensate for being one 140-millionth of the training data? I asked Copilot that question by giving it a function signature:

def chance_of_attribution(fraction_of_training_data: float) -> float:
    return 1 - (1 - fraction_of_training_data) ** 2

For the average repository we have fraction_of_training_data = 1/140e6. With the above formula, we get chance_of_attribution ≈ 1.4e-8. By multiplying that by the total number of repositories, we learn that we need to attribute exactly 2 random repositories [5].
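
The arithmetic can be checked with a minimal sketch. It assumes N = 140e6 equally sized repositories, as in the text; `expected_attributions` is a hypothetical helper, not the sampling script included in this repository. The exact-fraction variant shows what the result is without floating point rounding.

```python
from fractions import Fraction

N = 140_000_000  # assumed number of public repositories, from the text


def chance_of_attribution(fraction_of_training_data):
    # Same formula as the generated function; works for floats and Fractions.
    return 1 - (1 - fraction_of_training_data) ** 2


def expected_attributions(n, exact=False):
    # Hypothetical helper: chance per repository times number of repositories.
    fraction = Fraction(1, n) if exact else 1 / n
    return chance_of_attribution(fraction) * n


approx = expected_attributions(N)             # float arithmetic, close to 2
exact = expected_attributions(N, exact=True)  # exactly 2 - 1/N

print(f"float: {approx!r}")
print(f"exact: {exact}")
```

The exact value is always slightly below 2, so any float result above 2 comes from rounding in the subtraction `1 - fraction`.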

This is horrible / great

In case you are pleased/offended by Thanks For The Code, I'd like to preemptively say "You're welcome!" and "I'm sorry!" to all sentient beings. Finding a list of names of all sentient beings is left as an exercise for the reader.

Feel free to create an Issue.

Footnotes

  1. Or maybe 99.9% of all public repositories, and a bunch of repositories that have gone private over the last few weeks.

  2. I read Tom Sawyer, I like to eat rice. I'm not a lawyer, this is not legal advice.

  3. Write your employer, tell them they are nice. I'm not a lawyer, this is not legal advice.

  4. I'm not affiliated with (The Regents of) the University of California, and they do not have copyright over this document. I'm not a lawyer, this is not legal advice.

  5. ok it was 2.00000001, which makes sense because (1 - (1 - 1/N)²) * N = N - N * (1 - 2/N + 1/N²) = 2 - 1/N which was bigger than 2 for N=140e6 because of a freak accident in floating point arithmetic.

Contributors: usewits
