upstream-ontologist's Introduction

Upstream Ontologist

The upstream ontologist provides a common interface for finding metadata about upstream software projects.

It gathers information from all available sources, prioritizes data that it has higher confidence in, and reports the confidence for each piece of metadata.
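The following minimal Rust sketch shows how confidence-based prioritization can work. `Certainty`, `Guess` and `select_best` are hypothetical names chosen for illustration and are not the crate's actual API. The sketch assumes that, for each field, the guess with the highest certainty wins, and that ties go to the source consulted first.

```rust
use std::collections::HashMap;

/// Confidence levels, weakest first. Deriving `Ord` makes later variants
/// compare as greater, so `Certain > Possible`. (Hypothetical names.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Certainty {
    Possible,
    Likely,
    Confident,
    Certain,
}

/// A single guess for one metadata field and the source it came from.
#[derive(Debug, Clone)]
pub struct Guess {
    pub field: String,
    pub value: String,
    pub certainty: Certainty,
    pub origin: String,
}

/// For each field, keep the guess with the highest certainty. A later guess
/// only replaces an earlier one if it is strictly more certain, so ties go
/// to the source that was consulted first.
pub fn select_best(guesses: &[Guess]) -> HashMap<String, Guess> {
    let mut best: HashMap<String, Guess> = HashMap::new();
    for guess in guesses {
        let replace = match best.get(&guess.field) {
            Some(current) => guess.certainty > current.certainty,
            None => true,
        };
        if replace {
            best.insert(guess.field.clone(), guess.clone());
        }
    }
    best
}
```

To report confidence, print each selected value's `certainty` alongside it.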

The ontologist originated in Debian, and the metadata fields it currently reports are loosely based on DEP-12, but it is meant to be distribution-agnostic.

Provided Fields

Standard fields:

  • Homepage: homepage URL
  • Name: human-readable name of the upstream project
  • Contact: some form of contact address for the upstream (e-mail address, mailing list URL)
  • Repository: VCS URL
  • Repository-Browse: web URL for browsing the VCS
  • Bug-Database: bug database URL (generally for web viewing)
  • Bug-Submit: URL or e-mail address to use to submit new bugs
  • Screenshots: list of URLs of screenshots
  • Archive: archive used (e.g. SourceForge)
  • Security-Contact: e-mail address or URL with instructions for reporting security issues
  • Documentation: link to documentation on the web

Extensions for upstream-ontologist, not defined in DEP-12:

  • SourceForge-Project: SourceForge project name
  • Wiki: wiki URL
  • Summary: one-line description of the project
  • Description: longer description of the project
  • License: single-line license (e.g. "GPL 2.0")
  • Copyright: list of copyright holders
  • Version: current upstream version
  • Security-MD: URL to a Markdown file with the security policy
  • Author: list of people who contributed to the project
  • Maintainer: the maintainer of the project
  • Funding: URL with more information about funding

Supported Data Sources

At the moment, the ontologist can read metadata from the following upstream data sources:

It also scans README and INSTALL files for possible upstream repository URLs and attempts to verify that they match the local repository.
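Here is a rough sketch of the scan-and-verify idea, using plain string matching. It pulls candidate URLs on a few well-known forge hosts out of the README text, normalizes them, and keeps only those equal to the local checkout's remote URL. The function names and the host list are hypothetical. The sketch ignores SSH-style remotes such as `git@host:path`.

```rust
/// Trim cosmetic differences (a trailing "/" and ".git") from a repository URL.
pub fn normalize_repo_url(url: &str) -> String {
    let url = url.trim_end_matches('/');
    url.strip_suffix(".git").unwrap_or(url).to_string()
}

/// Extract candidate repository URLs from README/INSTALL text. Only a few
/// hosting prefixes are recognized in this sketch (an assumption).
pub fn candidate_repo_urls(text: &str) -> Vec<String> {
    const PREFIXES: [&str; 3] = [
        "https://github.com/",
        "https://gitlab.com/",
        "https://salsa.debian.org/",
    ];
    let mut found = Vec::new();
    for word in text.split_whitespace() {
        // Strip punctuation that commonly wraps URLs in prose or Markdown.
        let word = word.trim_matches(|c: char| "()<>[]\"'.,;".contains(c));
        if PREFIXES.iter().any(|p| word.starts_with(p)) {
            found.push(normalize_repo_url(word));
        }
    }
    found
}

/// Keep only the candidates that match the local checkout's remote URL.
pub fn verified_repo_urls(text: &str, local_remote: &str) -> Vec<String> {
    let local = normalize_repo_url(local_remote);
    candidate_repo_urls(text)
        .into_iter()
        .filter(|u| *u == local)
        .collect()
}
```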

In addition to local files, it can also consult external directories using their APIs:

Example Usage

The easiest way to use the upstream ontologist is to run the guess-upstream-metadata command on a software project's directory:

$ guess-upstream-metadata ~/src/dulwich
Security-MD: https://github.com/dulwich/dulwich/tree/HEAD/SECURITY.md
Name: dulwich
Version: 0.20.15
Bug-Database: https://github.com/dulwich/dulwich/issues
Repository: https://www.dulwich.io/code/
Summary: Python Git Library
Bug-Submit: https://github.com/dulwich/dulwich/issues/new
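The output consists of YAML-style `Field: value` lines. For quick scripting, a few lines of Rust can pick out the single-line fields. This helper is hypothetical and not part of upstream-ontologist. It skips indented and list-item lines, so multi-line and list-valued entries are not captured. A real YAML parser is the better choice for those.

```rust
use std::collections::BTreeMap;

/// Parse single-line "Field: value" pairs from guess-upstream-metadata output.
/// Indented continuation lines and "-" list items are skipped, and fields
/// with no value on the same line are ignored (limitations of this sketch).
pub fn parse_simple_fields(output: &str) -> BTreeMap<String, String> {
    let mut fields = BTreeMap::new();
    for line in output.lines() {
        if line.starts_with(char::is_whitespace) || line.starts_with('-') {
            continue;
        }
        // Split at the first ": " so URLs such as "https://..." stay intact.
        if let Some((key, value)) = line.split_once(": ") {
            fields.insert(key.to_string(), value.trim().to_string());
        }
    }
    fields
}
```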

Alternatively, there is a Python API. There are also autocodemeta and autodoap commands that can generate output in the codemeta and DOAP formats, respectively.

upstream-ontologist's People

Contributors

debian-janitor, dependabot[bot], jelmer

upstream-ontologist's Issues

support using github credentials

The GitHub API has a low rate-limit per IP for unauthenticated users, which we regularly hit on the scruffy.

The upstream ontologist should support having credentials passed in, or reading them from the user's home directory.

For simplicity, we should support the GITHUB_TOKEN environment variable, in addition to potentially reading credentials from the user's home directory.

  • Use GITHUB_TOKEN for API operations
  • Inject into Breezy git operations
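Here is a minimal sketch of the first task. It assumes the token is sent as a bearer token in the `Authorization` header of GitHub REST API requests. The function names are hypothetical. Headers are returned as plain `(name, value)` pairs so the example does not depend on any particular HTTP client crate. Wiring the token into Breezy's git operations is not covered.

```rust
use std::env;

/// Build headers for a GitHub REST API request (hypothetical helper).
/// GitHub requires a User-Agent header; the token, if present and non-empty,
/// is sent as a bearer token so requests count against the higher
/// authenticated rate limit.
pub fn github_headers(token: Option<&str>) -> Vec<(String, String)> {
    let mut headers = vec![
        ("Accept".to_string(), "application/vnd.github+json".to_string()),
        ("User-Agent".to_string(), "upstream-ontologist".to_string()),
    ];
    if let Some(t) = token.filter(|t| !t.is_empty()) {
        headers.push(("Authorization".to_string(), format!("Bearer {t}")));
    }
    headers
}

/// Read the token from the GITHUB_TOKEN environment variable, as the issue proposes.
pub fn token_from_env() -> Option<String> {
    env::var("GITHUB_TOKEN").ok()
}
```

A caller would pass `token_from_env().as_deref()` to `github_headers` and attach the result to each API request.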

python3-upstream-ontologist: should depend on python3-ruamel.yaml and python3-breezy

Hi, since there isn't a Discussions tab I had to open an issue. I found this bug on the Debian bug tracking system, https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1029750, and I would like to work on it. However, I could not set up my development environment because I'm using a Windows machine with WSL enabled. Could you guide me in setting up the development environment and mentor me in resolving this bug? I'm fairly new to Debian. Here is what I've done so far:

  • subscribed to the relevant mailing lists
  • understood the bug-tracking system
  • asked #debian-mentors on IRC how to set up the dev environment; they told me to use sbuild for building the package and Docker for testing, but I don't know how to reproduce the error on WSL (Ubuntu).

So as of now, I need help setting up my dev environment and reproducing the error.
Thanks, and apologies if this isn't the right way to ask for help.

restructure API

The current Rust API is basically a 1:1 copy of what existed in Python. Instead, we should probably have a more idiomatic Rust API.

Should UpstreamDatum be an enum (as it is now) or, for example, a trait?
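The sketch below puts the two designs side by side. Both are heavily simplified and hypothetical. The real `UpstreamDatum` has many more variants, and the `Datum` trait and the `Name` and `Homepage` structs exist only for this comparison.

```rust
/// Option 1: a closed enum (simplified stand-in for the real type).
#[derive(Debug, Clone, PartialEq)]
pub enum UpstreamDatum {
    Name(String),
    Homepage(String),
    Maintainers(Vec<String>),
}

impl UpstreamDatum {
    /// Exhaustive match: adding a variant forces every match to be updated.
    pub fn field(&self) -> &'static str {
        match self {
            UpstreamDatum::Name(_) => "Name",
            UpstreamDatum::Homepage(_) => "Homepage",
            UpstreamDatum::Maintainers(_) => "Maintainer",
        }
    }
}

/// Option 2: an open trait (hypothetical); other modules or crates can add
/// new field types by implementing it.
pub trait Datum: std::fmt::Debug {
    fn field(&self) -> &'static str;
}

#[derive(Debug)]
pub struct Name(pub String);

impl Datum for Name {
    fn field(&self) -> &'static str {
        "Name"
    }
}

#[derive(Debug)]
pub struct Homepage(pub String);

impl Datum for Homepage {
    fn field(&self) -> &'static str {
        "Homepage"
    }
}

/// Mixed collections of trait-based data need trait objects.
pub fn field_names(data: &[Box<dyn Datum>]) -> Vec<&'static str> {
    data.iter().map(|d| d.field()).collect()
}
```

The enum keeps every field in one place and lets the compiler check that each `match` handles all of them. The trait lets new field types be defined outside the crate, but mixed collections need `Box<dyn Datum>` and the exhaustiveness check is lost.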

suggests invalid bug URLs for GitLab instances

In https://salsa.debian.org/debian/decopy/-/merge_requests/3, lintian-brush suggests a Bug-Database URL that is broken.

I can reproduce this with guess-upstream-metadata:

Name: decopy
Repository: https://salsa.debian.org/debian/decopy.git
Homepage: https://salsa.debian.org/debian/decopy
X-Version: 0.2.4.7
X-Summary: Automatic debian/copyright Generator
X-Description: |2-

      Decopy automates writing and updating the debian/copyright file.

      It reads all files in the source tree, analyzes the licenses and copyright
      messages included and generates the corresponding debian/copyright file.
      When the file already exists, decopy parses it to generate a more complete
      output.

X-License: ISC
X-Author:
- !Person
  name: Maximiliano Curia
  email: [email protected]
  url:
Repository-Browse: https://salsa.debian.org/debian/decopy
Bug-Database: https://salsa.debian.org/debian/decopy/issues
Bug-Submit: https://salsa.debian.org/debian/decopy/issues/new

However, while the project exists, it does not have issues enabled, so the suggested Bug-Database and Bug-Submit URLs do not work.
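One way to avoid suggesting such URLs is to emit them only once the tracker is known to exist. The sketch below assumes the caller has already looked that up, for example from the `issues_access_level` field returned by GitLab's projects API. The function name and its boolean parameter are hypothetical. It uses the `/-/issues` path under which current GitLab versions serve the issue tracker.

```rust
/// Derive (Bug-Database, Bug-Submit) URLs for a GitLab-hosted repository,
/// or None if the caller has determined that issues are disabled.
pub fn gitlab_bug_urls(repo_url: &str, issues_enabled: bool) -> Option<(String, String)> {
    if !issues_enabled {
        // The project exists but has no tracker: suggesting URLs would
        // produce broken links like the ones reported for decopy.
        return None;
    }
    // Turn a clone URL into the project's web URL.
    let base = repo_url.trim_end_matches('/');
    let base = base.strip_suffix(".git").unwrap_or(base);
    let database = format!("{base}/-/issues");
    let submit = format!("{database}/new");
    Some((database, submit))
}
```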

support multiple maintainers

Some projects have multiple maintainers, and e.g. DOAP files will list all of them.

It would be good to change the X-Maintainer field to a List[Person] rather than a single string.
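Here is a minimal sketch of the proposed change. It assumes a `Person` with the same name, email and URL fields as the `!Person` entries in the decopy output above. The `merge_maintainers` helper is hypothetical: it combines maintainer lists from several sources and drops exact duplicates.

```rust
/// A person as it appears in upstream metadata; email and URL are optional.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Person {
    pub name: String,
    pub email: Option<String>,
    pub url: Option<String>,
}

/// Merge maintainer lists from several sources (e.g. a DOAP file with several
/// maintainer entries) into one list, dropping exact duplicates and keeping
/// the order in which people were first seen.
pub fn merge_maintainers(sources: &[Vec<Person>]) -> Vec<Person> {
    let mut merged: Vec<Person> = Vec::new();
    for person in sources.iter().flatten() {
        if !merged.contains(person) {
            merged.push(person.clone());
        }
    }
    merged
}
```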

parse README.md with markdown to extract long description

Some approximation of a long description can probably be done by parsing README.md and:

  • Skipping over the initial header for the project
  • Taking the paragraphs until the next header
  • Filtering out anything clearly irrelevant ("See INSTALL for ... ")

If there are too many paragraphs, perhaps just take the first paragraph. Otherwise, take all paragraphs.

@isomer
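The README heuristic proposed in this issue can be sketched as a line-based pass over the README text. This hypothetical helper is not a full Markdown parser. It only recognizes ATX headers (lines starting with `#`) and treats blank lines as paragraph breaks. It uses a single illustrative filter for "See INSTALL" paragraphs, and `max_paragraphs` is the caller's threshold for "too many".

```rust
/// Approximate a long description from README.md text (hypothetical helper).
pub fn long_description(readme: &str, max_paragraphs: usize) -> Vec<String> {
    let mut paragraphs: Vec<String> = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    let mut seen_title = false;

    for line in readme.lines() {
        let trimmed = line.trim();
        if trimmed.starts_with('#') {
            if !seen_title && paragraphs.is_empty() && current.is_empty() {
                // Skip the project's initial header.
                seen_title = true;
                continue;
            }
            // Any later header ends the description.
            break;
        }
        if trimmed.is_empty() {
            // A blank line closes the current paragraph.
            if !current.is_empty() {
                paragraphs.push(current.join(" "));
                current.clear();
            }
        } else {
            current.push(trimmed);
        }
    }
    if !current.is_empty() {
        paragraphs.push(current.join(" "));
    }

    // Illustrative filter for clearly irrelevant paragraphs.
    paragraphs.retain(|p| !p.starts_with("See INSTALL"));

    // Too many paragraphs: fall back to just the first one.
    if paragraphs.len() > max_paragraphs {
        paragraphs.truncate(1);
    }
    paragraphs
}
```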
