
osv-schema's Introduction

Open Source Vulnerability Schema

This is the repository for the Open Source Vulnerability schema (OSV Schema), which is currently exported by:

Together, these include vulnerabilities from:

  • AlmaLinux
  • Alpine
  • Android
  • Bitnami
  • Chainguard
  • crates.io
  • Debian GNU/Linux
  • GitHub Actions
  • Go
  • Haskell
  • Hex
  • Linux kernel
  • Mageia
  • Maven
  • npm
  • NuGet
  • OSS-Fuzz
  • Packagist
  • Photon OS
  • Pub
  • PyPI
  • Python
  • R (CRAN and Bioconductor)
  • Rocky Linux
  • RubyGems
  • Ubuntu

These vulnerabilities are aggregated by https://osv.dev.

Join the discussion in the OpenSSF Slack channel #osv_schema

Reference tooling (e.g. converters) can be found in the tools/ directory

The current version of the specification is rendered here.

The OSV-Schema specification and the tools here are maintained by the Open Source Security Foundation (OpenSSF) Vulnerability Disclosures Working Group (WG).

osv-schema's People

Contributors

achrinza, andrewpollock, another-rex, calebbrown, captn3m0, chrisbloom7, dependabot[bot], dodys, gongomgra, hayleycd, hythloda, joshbuker, kurtseifried, michaelchirico, michaelkedar, mihaimaruseac, mstg, oliverchang, oswalpalash, pandatix, ph0tonic, randy3k, redenmartinez, roo4l, rthorpeii, ryru, sethmlarson, sse4, tylfin, zacchiro

osv-schema's Issues

missing context in aliases

Speaking of which in the OSV data there's a lot of aliases of just some number:

["CVE-2020-3671","2576091"]

which makes it... hard to track, or know where it came from, e.g. unless an alias is unique (so not just a number) and identifiable, e.g. CVE-, GSD-, RHSA-, it's basically just confusing. To quote the docs:

The aliases field gives a list of IDs of the same vulnerability in other databases, in the form of the id field. This allows one database to claim that its own entry describes the same vulnerability as one or more entries in other databases. Or if one database entry has been deduplicated into another in the same database, the duplicate entry could be written using only the id, modified, and aliases field, to point to the canonical one.

but there's no data around which database that alias is for...

I feel like I should open up another ticket to discuss this as a pile of random numbers that end up overlapping between different entries is going to be problematic.
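
To illustrate the ambiguity, here is a minimal Python sketch that separates aliases carrying a recognizable database prefix from bare numbers. The prefix list is an assumption taken from the examples in this issue (CVE-, GSD-, RHSA-), not an official registry, and `split_aliases` is a hypothetical helper.

```python
import re

# Assumed prefixes, taken from the examples in this issue; not an official list.
KNOWN_PREFIXES = ("CVE", "GSD", "RHSA")

# An alias is treated as "prefixed" if it starts with letters followed by a dash.
PREFIXED = re.compile(r"^([A-Za-z]+)-")


def split_aliases(aliases):
    """Return (identifiable, ambiguous): a prefix-per-alias dict and a list of leftovers."""
    identifiable, ambiguous = {}, []
    for alias in aliases:
        match = PREFIXED.match(alias)
        if match and match.group(1).upper() in KNOWN_PREFIXES:
            identifiable[alias] = match.group(1).upper()
        else:
            # Nothing in the alias says which database issued it.
            ambiguous.append(alias)
    return identifiable, ambiguous
```

For the example above, "2576091" lands in the ambiguous list because nothing in the record says which database it belongs to.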

Moved here as per CloudSecurityAlliance/gsd-database#2389

additional affected:ranges entry suggestion

So we have

affected:ranges:events:introduced
affected:ranges:events:fixed
affected:ranges:events:last_affected
affected:ranges:events:limit

but we're missing a "notaffected" or "unaffected" as in "we checked and we're not affected at all", e.g. data from the CISA log4j repo:

https://github.com/kurtseifried/log4j-affected-db/blob/develop/data/cisagov.yml

cves:
  cve-2021-4104:
    investigated: true
    affected_versions: []
    fixed_versions: []
    unaffected_versions:
      - '>= 1.0.0'

  cve-2021-44228:
    investigated: true
    affected_versions: []
    fixed_versions: []
    unaffected_versions:
      - '>= 1.0.0'

So I'd like to propose a "notaffected" tag.

Rename `WEB` to aid in clarity?

Currently the catch-all for references of an unknown type is "WEB". At first glance however, it's not necessarily clear that these are references of essentially an undefined or unknown type.

Some various alternatives:

  • UNKNOWN
  • OTHER
  • UNSPECIFIED

@oliverchang this would be a pretty major change, but I think it would have some positive long-term implications as far as keeping the data intuitive and reducing the institutional knowledge required.

versions required if git commit present, but versions can be blank []

The versions array is mandatory if you, for example, put a git commit in for affected, but the versions array can be blank:

"allOf": [
  {
    "if": {
      "properties": {
        "ranges": {
          "contains": {
            "properties": {
              "type": {
                "enum": [
                  "SEMVER",
                  "ECOSYSTEM"
                ]
              }
            }
          }
        }
      }
    },
    "then": {},
    "else": {
      "required": [
        "versions"

I would ask that it either be removed or, if it's intended to be there, that some content be checked for (which breaks things, since some things have git commits but no releases, etc.), so I would suggest we remove this versions requirement.
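
As a rough illustration of the behaviour described here, the following Python sketch mirrors my reading of the quoted (truncated) schema fragment without using a JSON Schema library. The function names are hypothetical, and the handling of missing keys follows the general rule that `properties` and `contains` only constrain values that are actually present.

```python
VERSIONED_TYPES = ("SEMVER", "ECOSYSTEM")


def versions_required(affected):
    """True when the fragment's "else" branch applies to this affected entry."""
    if "ranges" not in affected:
        return False  # "properties" passes vacuously, so "then": {} applies
    ranges = affected["ranges"]
    if not isinstance(ranges, list):
        return False  # "contains" only constrains arrays

    def matches(item):
        # An item with no "type" key satisfies the inner "properties" vacuously.
        if not isinstance(item, dict) or "type" not in item:
            return True
        return item["type"] in VERSIONED_TYPES

    return not any(matches(item) for item in ranges)


def satisfies_fragment(affected):
    """"required" only checks that the key exists, so an empty list passes."""
    return (not versions_required(affected)) or ("versions" in affected)
```

A GIT-only entry with `"versions": []` passes, while the same entry without the key fails, which is the inconsistency raised above.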

What kind of credit in credits field?

credits fields

{
  "credits": [ {
    "name": string,
    "contact": [ string ]
  } ]
}

The credits field is a JSON array providing a way to give credit for the discovery, confirmation, patch, or other events in the life cycle of a vulnerability.

Is there some reason we don't have an optional text description or ENUM for what kind of credit(s)?

Discussion about references field

The references field which is currently:

{
  "references": [ {
    "type": string,
    "url": string
  } ]
}

One major change I'd like to propose is that type be EITHER a string (singular) or an array (multiple entries), although ideally I'd prefer it just be an array with 1 or more entries.

Related to this I'd like to propose some additional values for type such as:

EXPLOIT (exploit code/methodology/etc.)
EXPLOITATION (reports of exploitation)
WORKAROUND
DISCUSSION (e.g. Twitter threads)

and obviously, some of these URLs can be multiple things, e.g. an ADVISORY could contain FIX and WORKAROUND and EXPLOIT hence the string/array request

I'd also like to propose adding a timestamp for when it was seen, this is extremely useful when creating timelines and I've noticed a worrying trend for web pages/forum posts/etc to not contain a published date or "1 month ago".

As you can see from https://github.com/cloudsecurityalliance/gsd-database/blob/main/2021/1002xxx/GSD-2021-1002352.json (search for "timestamp") it's incredibly helpful when reconstructing the Twitter activity timeline.
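
A consumer could accept both shapes of the proposed type field with very little code. The sketch below is only an illustration of the string-or-array idea; `reference_types` is a hypothetical helper and the multi-valued form is the proposal in this issue, not part of the current schema.

```python
def reference_types(reference):
    """Return the reference's type(s) as a list, whether stored as a string or an array."""
    raw = reference.get("type", [])
    if isinstance(raw, str):
        return [raw]  # current schema: a single string
    return list(raw)  # proposed: an array with one or more entries


def has_type(reference, wanted):
    """True if the reference carries the given type in either representation."""
    return wanted in reference_types(reference)
```

With this, an advisory URL tagged `["ADVISORY", "FIX", "WORKAROUND"]` and one tagged `"ADVISORY"` can be filtered the same way.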

Additional options for references field to support reproducers/exploit code

This is in relation to: CloudSecurityAlliance/gsd-database#2389

Ok meta comments:

Obviously, we want this added as a reference, but the OSV schema only supports:

https://ossf.github.io/osv-schema/#references-field

ADVISORY: A published security advisory for the vulnerability.
ARTICLE: An article or blog post describing the vulnerability.
REPORT: A report, typically on a bug or issue tracker, of the vulnerability.
FIX: A source code browser link to the fix (e.g., a GitHub commit) Note that the fix type is meant for viewing by people using web browsers. Programs interested in analyzing the exact commit range would do better to use the GIT-typed affected[].ranges entries (described above).
PACKAGE: A home web page for the package.
EVIDENCE: A demonstration of the validity of a vulnerability claim, e.g. app.any.run replaying the exploitation of the vulnerability.
WEB: A web page of some unspecified kind.

So it could be classed as "EVIDENCE", which is what I'll use for now, but I'm going to file an upstream issue with OSV to try for a better word, e.g.:

REPRODUCER: Nonweaponized, programmatic method to trigger the vulnerability?
EXPLOIT: weaponized, programmatic method to trigger the vulnerability (can be a link to code?)
EXPLOITATION: reports of exploitation but no actual technical reproducer/exploit available?

I think one possible way to do this would be to use "sub tags" e.g. EVIDENCE:EXPLOIT or EVIDENCE:REPRODUCER. This would prevent clutter and an explosion of top level tags, and for data consumers they can for example simply support "EVIDENCE" and ignore subtags, or choose to also support subtags, either way they are able to broadly classify the data according to the top level tags, even if they don't want to check for subtags.
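
To show how cheap sub tags would be for consumers, here is a small Python sketch. The `TOP:SUB` format is the proposal above, not something the schema defines, and `split_reference_type` is a hypothetical name.

```python
def split_reference_type(value):
    """Split "EVIDENCE:EXPLOIT" into ("EVIDENCE", "EXPLOIT"); plain tags get a None sub tag."""
    top, _, sub = value.partition(":")
    return top, (sub or None)


def top_level_type(value):
    """What a consumer that ignores sub tags would see."""
    return split_reference_type(value)[0]
```

A consumer that only understands top-level tags calls `top_level_type` and still classifies the reference as EVIDENCE.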

VMWare Photon OS Advisories

I was looking to enrich the GSD Database with additional data from VMWare Photon Advisories (Filed PR: CloudSecurityAlliance/gsd-database#2443), and was planning to add PHSA as a database prefix here once that is merged.

However, I had a few questions:

  1. What's the expectation from a "home"-database? Does it need to serve valid OSV schema, or will a redirect to the relevant CVE also work? Is there a specific URL which needs to be registered, any other requirements?
    I was planning to just keep all the advisories in OSV format in a GitHub Repository - that should be fine?
    In this case, since the PHSA Advisory links to multiple CVEs: I can't really setup redirects, what will a home database for PHSA look like?

  2. In the case of Photon, an advisory (identifier issued here) contains multiple vulnerabilities. This impacts the severity field, but also other details, which might vary on a per-package basis.

This also creates issues around aliases - should PHSA-2022-0304 be considered an alias for both CVE-2022-4415 and CVE-2022-43551?

OSV v1.0.0 is missing a canonical URL

Discovered this while discussing #132

Currently, the canonical url for the schema is:
https://raw.githubusercontent.com/ossf/osv-schema/v<SEMVER>/validation/schema.json

For example, 1.4.0 resolves to:
https://raw.githubusercontent.com/ossf/osv-schema/v1.4.0/validation/schema.json

1.0.0 is historically important because a null schema_version is implied to be 1.0.0; however, that URL currently 404s:
https://raw.githubusercontent.com/ossf/osv-schema/v1.0.0/validation/schema.json

This should be added or backported so that it is consistent and can be used by validators.
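
For context, this is roughly how a validator might pick the schema URL for a record. It is a sketch assuming the URL pattern quoted above and the "missing schema_version means 1.0.0" rule; `schema_url` is a hypothetical helper and no network request is made.

```python
# URL pattern quoted in this issue.
URL_TEMPLATE = (
    "https://raw.githubusercontent.com/ossf/osv-schema/"
    "v{version}/validation/schema.json"
)


def schema_url(record):
    """Return the schema URL for a record, treating a missing or null version as 1.0.0."""
    version = record.get("schema_version") or "1.0.0"
    return URL_TEMPLATE.format(version=version)
```

A record without schema_version resolves to the v1.0.0 URL, which is exactly the one that is missing.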

Add support for editing of Ruby language (non-RubyGem) advisories in GHSA database

Add support for editing of Ruby language (non-RubyGem) advisories in GHSA database.

 * jruby: https://github.com/advisories?query=jruby+ (23)
 * mruby: https://github.com/advisories?query=mruby  (40)
 * ruby-lang: https://github.com/advisories?query=ruby-lang (86)
TOTAL: 143
Looks like there are 5 non-rubygems, non-ruby-language advisories,
but we can deal with them separately.
 * Unreviewed "ruby": https://github.com/advisories?query=ruby+type%3Aunreviewed (148)

I tried to add missing information to some of them lately and was blocked
by not having an "ECOSYSTEM" value for them.

I propose something similar to "Ruby Languages" or "Rubies" be added
as a possible "ECOSYSTEM" value.

The "PACKAGE" value could be ["ruby-lang", "jruby", "mruby", "rbx/rubinius", "truffleruby", etc].

A good reference for other possible "PACKAGE" values at https://github.com/codicoscepticos/ruby-implementations
Thanks.

Tooling for schema validation

As part of #761, we became aware that the Cloud Security Alliance has a schema validator.

It seems like shipping a canonical, authoritative validator tool and library with the schema would be best, rather than each ecosystem integrator needing to reinvent the wheel (and possibly less comprehensively than desired).

# [Codecov](https://codecov.io/gh/ietf-tools/datatracker/pull/5226?src=pr&el=h1&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=ietf-tools) Report

Codecov Report

Merging #5226 (799b462) into main (0ec1264) will decrease coverage by 0.02%.
The diff coverage is 94.28%.

@@            Coverage Diff             @@
##             main    #5226      +/-   ##
==========================================
- Coverage   88.58%   88.57%   -0.02%     
==========================================
  Files         295      294       -1     
  Lines       40114    40094      -20     
==========================================
- Hits        35534    35512      -22     
- Misses       4580     4582       +2     
Impacted Files Coverage Δ
ietf/iesg/urls.py 100.00% <ø> (ø)
ietf/meeting/models.py 86.03% <ø> (-0.05%) ⬇️
ietf/meeting/urls.py 81.25% <ø> (ø)
ietf/iesg/views.py 92.53% <85.18%> (-0.71%) ⬇️
ietf/doc/mails.py 96.20% <100.00%> (ø)
ietf/group/views.py 90.95% <100.00%> (+0.09%) ⬆️
ietf/ietfauth/views.py 92.12% <100.00%> (+1.09%) ⬆️
ietf/meeting/helpers.py 89.84% <100.00%> (-0.07%) ⬇️
ietf/meeting/views.py 91.16% <100.00%> (+0.03%) ⬆️
ietf/secr/proceedings/proc_utils.py 86.33% <100.00%> (ø)
... and 6 more


Originally posted by @codecov in ietf-tools/datatracker#5226 (comment)

Please add "data_type" tag to make parsing easier

Please add a "data_type" tag to make parsing easier. For example, the CVE JSON format has:

"data_type": "CVE", which means you don't have to guess and test to see what kind of data you're parsing. If OSV would add "data_type": "OSV" to everything, that would make life much simpler, especially if more people start using things like OSV's database_specific field to embed OSV data into other formats (cats and dogs living together, it's turtles all the way down).

affected version assumptions/logic clarification

So I read https://ossf.github.io/osv-schema/ and I think there's an underlying assumption that should maybe be made explicit:

In most cases, there should be exactly one entry in the affected array per affected package to describe all affected versions. In rare cases, for example if the ecosystem_specific encodes platform information that doesn’t apply equally to all listed versions and ranges, a separate entry with the same package in the affected array may be needed.

The versions field can enumerate a specific set of affected versions, and the ranges field can list ranges of affected versions, under a given defined ordering. A version is considered affected if it lies within any one of the ranges or is listed in the versions list.

So in theory (assuming the data is correct) anything NOT listed is explicitly NOT vulnerable, correct? Should this logic be made explicit (assuming it is correct)?

I also wonder if it might be helpful to also support an "unknown" or "needs investigation:" type tag, this would for example allow people to publish OSV's quicker ("we know this specific version is affected, but we haven't looked at much older code yet, someone should probably do that") with a finer degree of detail, or is this overengineering and not needed (I'm leaning in this direction mostly right now).

What is the major difference between osv-schema and cve-schema

It is my understanding that both osv-schema and cve-schema try to describe a vulnerability, and many fields in osv-schema and cve-schema share the same meaning, especially for cve-schema v5.0.

Therefore, I'm confused about what the major difference between the two schemas is.

Thank you for your response.

Expand vuln id relationships

Allow not just aliasing (this ID is also known as...) and relates to (this ID is different from, but related to...), but also more nuanced relationships like parent/child and such.

@JLLeitschuh what would be most useful relationship types to include from a researcher's perspective?

complete example file for OSV

Is this missing anything? It seems like a complete and blank example:

{
  "schema_version": "string",
  "id": "string",
  "modified": "string",
  "published": "string",
  "withdrawn": "string",
  "aliases": [
    "string"
  ],
  "related": [
    "string"
  ],
  "summary": "string",
  "details": "string",
  "severity": [
    {
      "type": "string",
      "score": "string"
    }
  ],
  "affected": [
    {
      "package": {
        "ecosystem": "string",
        "name": "string",
        "purl": "string"
      },
      "severity": [
        {
          "type": "CVSS_V2",
          "score": "string"
        },
        {
          "type": "CVSS_V3",
          "score": "string"
        },
        {
          "type": "EPSS",
          "score": "string"
        }
      ],
      "ranges": [
        {
          "type": "SEMVER",
          "repo": "string",
          "events": [
            {
              "introduced": "string",
              "fixed": "string",
              "last_affected": "string",
              "limit": "string"
            }
          ],
          "database_specific": {}
        },
        {
          "type": "ECOSYSTEM",
          "repo": "string",
          "events": [
            {
              "introduced": "string",
              "fixed": "string",
              "last_affected": "string",
              "limit": "string"
            }
          ],
          "database_specific": {}
        },
        {
          "type": "GIT",
          "repo": "string",
          "events": [
            {
              "introduced": "string",
              "fixed": "string",
              "last_affected": "string",
              "limit": "string"
            }
          ],
          "database_specific": {}
        }
      ],
      "versions": [
        "string"
      ],
      "ecosystem_specific": {},
      "database_specific": {}
    }
  ],
  "references": [
    {
      "type": "string",
      "url": "string"
    }
  ],
  "credits": [
    {
      "name": "string",
      "contact": [
        "string"
      ],
      "type": [
        "string"
      ]
    }
  ],
  "database_specific": {}
}

Unified field for describing affected functions

Rename `database_specific` to `experimental`?

Per some of the conversation in #50, allowing the use of database_specific on a permanent and intentional basis could lead to an undesirable fracturing of the OSV standard, where every database has their own pet fields that get included and create sub-standards. (Obligatory XKCD)

By renaming the database_specific field to experimental, it would still allow databases to explore new fields for inclusion in the OSV schema proper, while making it clear that those fields are not official and reliable yet. It would also encourage databases to push for fields that are being heavily used in their own projects back upstream into the central OSV format.

Add a "first_not_affected" field or similar to properly describe all possible ranges for affected packages

Description

Hi!

I was trying to express some NVD vulnerabilities with OSV.
They use these keywords for the ranges

  • versionStartIncluding
  • versionStartExcluding
  • versionEndIncluding
  • versionEndExcluding

So we can have these type of intervals

  • [X,..., Y] : from X to Y, both ends included
  • [X,..., Y) : from X to Y, X included, Y excluded
  • (X,..., Y] : from X to Y, X excluded, Y included
  • (X,..., Y) : from X to Y, both ends excluded

But here in OSV, we have:

  • introduced: this is equivalent to versionStartIncluding
  • fixed: this is equivalent to versionEndExcluding
  • last_affected: this is equivalent to versionEndIncluding
  • limit: in this context, it is similar to fixed

It isn't possible to express (X,..., Y] / (X,..., Y) ranges; we are missing an equivalent to versionStartExcluding.
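
The gap shows up as soon as the mapping is written down. The Python sketch below converts one NVD-style match object into OSV events using only the equivalences listed above; `nvd_to_events` is a hypothetical helper, and raising an error for versionStartExcluding is just one way to surface the missing case.

```python
# Equivalences listed in this issue.
NVD_TO_OSV = {
    "versionStartIncluding": "introduced",
    "versionEndExcluding": "fixed",
    "versionEndIncluding": "last_affected",
}


def nvd_to_events(match):
    """Translate NVD range keywords into a list of single-key OSV events."""
    events = []
    for key, version in match.items():
        if key == "versionStartExcluding":
            # No OSV event type expresses an exclusive lower bound.
            raise ValueError("versionStartExcluding has no OSV equivalent")
        if key in NVD_TO_OSV:
            events.append({NVD_TO_OSV[key]: version})
    return events
```

Three of the four keywords translate directly; the fourth cannot be represented without knowing the next version after X.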

Example

The https://nvd.nist.gov/vuln/detail/CVE-2022-34465 vulnerability has this type of ranges

example

Why is CVSS limited to CVSS v3.x?

CVSS_V3 | A CVSS vector string representing the unique characteristics and severity of the vulnerability using a version of the Common Vulnerability Scoring System notation that is >= 3.0 and < 4.0 (e.g."CVSS:3.1/AV:N/AC:H/PR:N/UI:N/S:C/C:H/I:N/A:N").

Can we please either:

  1. add CVSS_V2 tag for CVSS v2 data (e.g. historical stuff)
  2. change CVSS_V3 to CVSS and simply specify the CVSS version in the string (as generally done). This also solves the "what about CVSS v4?" problem in future.

Severity schema should it be a map rather than an array?

I noted that the latest version of the schema adds severity, but it seems to allow multiple scores of the same type without a means to distinguish them, so if you wanted to favour a severity from a particular source, it is challenging to work out which one that is. Currently the database only contains CVSS_V3. Because identical score types cannot be distinguished, should the data structure, instead of being

"severity" : [
    {
         "type":"string",
         "score":"string"
    }
]

be based on this, as types really should be unique:

"severity" : {
     "type_key" : "score"
     ....
}

You could even enumerate the types supported, but this allows the same flexibility while avoiding the doubt around duplicate values, as a duplicate is simply not valid.

So what is done today in the database would be:

"severity" : {
    "CVSS_V3": "CVSS v3 score uri",
    "CVSS_V2": "CVSS v2 score uri",
    "ANotherseverityscore": "An other score"
}
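
The difference between the two shapes can be seen by converting one into the other. This is a minimal Python sketch, with `severity_to_map` as a hypothetical helper; it rejects the duplicate types that the array form silently allows.

```python
def severity_to_map(severity):
    """Convert the array form [{"type": ..., "score": ...}] into a {type: score} map."""
    result = {}
    for entry in severity:
        kind = entry["type"]
        if kind in result:
            # The array form permits this; the map form cannot represent it.
            raise ValueError(f"duplicate severity type: {kind}")
        result[kind] = entry["score"]
    return result
```

With a map, a consumer looks up `severity["CVSS_V3"]` directly instead of scanning the array and deciding which duplicate to trust.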

Add support for lower bound exclusive version ranges

When defining vulnerable version ranges, the > operator, aka lower bound exclusive, represents the last-known-good release before a vulnerability is introduced. It might be argued that if one knows the last-known-good, one should also know the first-known-bad. However, this isn't always possible, and in some cases there are legitimate reasons for using > even if one has access to the first-known-bad version.

Example 1 - prereleases

In SemVer syntax, 1.2.3.alpha1 is less than 1.2.3 because the former is a prerelease of the latter. If a vulnerability is known to have been introduced in a particular prerelease then one could simply say >= 1.2.3.rc2, which can be converted to an introduced range event type to cover all later prereleases plus the release itself, assuming the order of the prereleases is alphabetical and machine parseable. However, sometimes there may be so many prereleases that it's not possible to immediately identify which ones are affected. In this case, one way to ensure that all prereleases are covered is to specify > last-known-good.

Example 2 - interlaced affected versions

If a package has multiple affected versions with interlaced fixes, it's reasonable to list them like so:

>= 2.0-beta7, < 2.3.2
> 2.3.2, < 2.12.4
> 2.12.4, < 2.17.1

That example is from the recent log4j vulnerability.

Proposal

One suggestion for handling this is to extend affected[].ranges[].events[] to accept a "last known good" event, and change the validation so that at least one introduced OR "last known good" field is present. This can still be programmatically evaluated by comparing a given version to see if it's > the last known good, just as you could compare it to see if it's >= an introduced event. This could be expressed in the evaluation example in the schema docs by slightly altering the IncludedInRanges method:

func IncludedInRanges(v, ranges)
  vulnerable = false
  for range in ranges
    if BeforeLimits(v, range)
      for evt in sorted(range.events)
        if evt.last_known_good is present && v > evt.last_known_good
           vulnerable = true
        else if evt.introduced is present && v >= evt.introduced
           vulnerable = true
        if evt.fixed is present && v >= evt.fixed
           vulnerable = false
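
For readers who want to try the idea, here is a runnable Python sketch of the altered method. It is not the schema's reference algorithm: it assumes plain dotted-integer versions (so no prerelease tags), one key per event, skips the BeforeLimits check, and treats last_known_good as the hypothetical event proposed above.

```python
def parse(version):
    """Turn "2.12.4" into (2, 12, 4); assumes dotted integers only."""
    return tuple(int(part) for part in version.split("."))


def event_key(event):
    """Sort events by version; at equal versions, apply "fixed" first."""
    (kind, version), = event.items()
    return (parse(version), 0 if kind == "fixed" else 1)


def included_in_ranges(v, ranges):
    """Return True if version v is vulnerable according to any range."""
    target = parse(v)
    for rng in ranges:
        vulnerable = False
        for event in sorted(rng["events"], key=event_key):
            (kind, version), = event.items()
            bound = parse(version)
            if kind == "last_known_good" and target > bound:
                vulnerable = True  # exclusive lower bound
            elif kind == "introduced" and target >= bound:
                vulnerable = True  # inclusive lower bound
            elif kind == "fixed" and target >= bound:
                vulnerable = False
        if vulnerable:
            return True
    return False


# Simplified version of Example 2 (2.0 stands in for 2.0-beta7).
RANGES = [
    {"events": [{"introduced": "2.0"}, {"fixed": "2.3.2"}]},
    {"events": [{"last_known_good": "2.3.2"}, {"fixed": "2.12.4"}]},
    {"events": [{"last_known_good": "2.12.4"}, {"fixed": "2.17.1"}]},
]
```

With these ranges, 2.3.2 and 2.12.4 evaluate as not vulnerable, while the releases between them evaluate as vulnerable.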

Enable GHSA to consolidate `affected` entries

Context

Currently, GitHub's Advisory Database encodes pairs of (introduced, fixed) events in separate affected objects. This is because GitHub needs to preserve these pairs when converting the OSV entry back into their internal database representation.

An example: https://github.com/github/advisory-database/blob/d6004eb8de91ad341605da869ab1b9f1e4abe433/advisories/github-reviewed/2017/10/GHSA-xgr2-v94m-rc9g/GHSA-xgr2-v94m-rc9g.json#L14

"affected": [
    {
      "package": {
        "ecosystem": "RubyGems",
        "name": "activesupport"
      },
      "ranges": [
        {
          "type": "ECOSYSTEM",
          "events": [
            { "introduced": "2.3.2" },
            { "fixed": "2.3.16" }
          ]
        }
      ]
    },
    {
      "package": {
        "ecosystem": "RubyGems",
        "name": "activesupport"
      },
      "ranges": [
        {
          "type": "ECOSYSTEM",
          "events": [
            { "introduced": "3.0.0" },
            { "fixed": "3.0.20" }
          ]
        }
      ]
    }
  ],

One obvious alternative here is to use a single affected entry and have two separate ECOSYSTEM ranges in there:

      "ranges": [
        {
          "type": "ECOSYSTEM",
          "events": [
            { "introduced": "2.3.2" },
            { "fixed": "2.3.16" }
          ]
        }, 
        {
          "type": "ECOSYSTEM",
          "events": [
            { "introduced": "3.0.0" },
            { "fixed": "3.0.20" }
          ]
        }
      ]
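
The consolidation itself is mechanical, as this Python sketch suggests. `consolidate` is a hypothetical helper that merges affected entries sharing the same ecosystem and name; note that it deliberately ignores versions, ecosystem_specific and database_specific, so it only fits entries that carry nothing but ranges.

```python
def consolidate(affected):
    """Merge affected entries for the same package into one entry with several ranges."""
    merged = {}
    for entry in affected:
        package = entry["package"]
        key = (package["ecosystem"], package["name"])
        # The first entry seen for a package supplies the package object.
        target = merged.setdefault(key, {"package": package, "ranges": []})
        target["ranges"].extend(entry.get("ranges", []))
    return list(merged.values())
```

Applied to the activesupport example, the two affected objects collapse into a single object with two ECOSYSTEM ranges.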

However, I believe this wasn't done because GitHub needs to record version metadata using a <= operator, and relies on database_specific to do this. One example: https://github.com/github/advisory-database/blob/d6004eb8de91ad341605da869ab1b9f1e4abe433/advisories/github-reviewed/2020/03/GHSA-gww7-p5w4-wrfv/GHSA-gww7-p5w4-wrfv.json

This has a bunch of affected entries that look like:

GHSA-gww7-p5w4-wrfv.json

    {
      "ranges": [
        {
          "type": "ECOSYSTEM",
          "events": [
            {
              "introduced": "2.0.0"
            }
          ]
        }
      ],
      "database_specific": {
        "last_known_affected_version_range": "<= 2.0.6"
      }
    },

Problems

There are a number of problems here:

  1. Using the current evaluation pseudo-code, the semantics of the GHSA-gww7-p5w4-wrfv.json entry above would mean: everything after 2.0.0 is affected, regardless of any fixes that are encoded afterwards in other affected entries.

    This advisory has fixes later specified in 2.8.11.5 and 2.9.10.1, but they're not going to be considered under the current evaluation algorithm. Any version after 2.0.0 will be incorrectly considered to be vulnerable.

  2. Having duplicate package entries makes the entry a lot less lean than it should be

Proposed solutions

Add database_specific to affected[].ranges[]

If we instead add database_specific to affected[].ranges[], we could encode GHSA-gww7-p5w4-wrfv as

{
      "package": {
        "ecosystem": "Maven",
        "name": "com.fasterxml.jackson.core:jackson-databind"
      },
      "ranges": [
        {
          "type": "ECOSYSTEM",
          "events": [
            {
              "introduced": "2.0.0"
            }
          ],
          "database_specific": {
            "last_known_affected_version_range": "<= 2.0.6"
          }
        }, 
        {
          "type": "ECOSYSTEM",
          "events": [
            {
              "introduced": "2.1.0"
            }
          ],
          "database_specific": {
            "last_known_affected_version_range": "<= 2.1.5"
          }
        }, 
        ...
      ],

The pseudo-code as is should work as intended with this encoding (with minor changes re sorting), and the resulting JSON has far less duplication.

Support a <= event

Similar to above, rather than encoding this in database_specific, we support a "last_affected" event or something similar:

      "ranges": [
        {
          "type": "ECOSYSTEM",
          "events": [
            {
              "introduced": "2.0.0",
              "last_affected": "2.0.6"
            }
          ]
        }, 
        {
          "type": "ECOSYSTEM",
          "events": [
            {
              "introduced": "2.1.0",
              "last_affected": "2.1.5"
            }
          ]
        }, 

In the vast majority of cases, GitHub does actually know the exact fix version (example), but they would still like to record this <= information for use in tools such as Dependabot.

Supporting this would mean we potentially complicate our version specification, and potentially encourage entries that only use last_affected when we really want fixed in all cases. On the other hand, CVE 5.0 does support a lessThanOrEqual, and perhaps OSV was missing this.

Fix the evaluation algorithm to consider all affected entries for the same package.

We could alternatively make the pseudo-code handle these problematic cases, but it doesn't seem ideal as it:

  • Does not address the duplication issue. The intention of having multiple duplicate affected entries for the same package was to encode things like platform configurations where the set of affected versions are not the same.
  • Complicates the algorithm.

Ecosystem support for tooling that is outside of the language or os specific package management system

It would be incredibly useful to have some standard way of referring to generic tooling that is part of a language or OS ecosystem, but not actually installed via that ecosystem's package registry.

An example might be cargo, which is a part of the rust ecosystem and has advisories issued from the Rust Advisory Database but is apart from the existing crates.io OSV ecosystem (some previous discussion on this)

Or a very recent example would be some way to refer to generic nodejs that hasn't been installed via a system package manager and would be separate from the existing npm OSV ecosystem, but still quite important to have some standard way of representing security advisories for. That would also hopefully open the door to flagging these tools within the existing GitHub Advisory Database for the languages they currently support

Missing endpoint to serve schema files as application/schema+json

When embedding the OSV schema into another schema using $ref we get an error:

Error: https://raw.githubusercontent.com/ossf/osv-schema/v1.3.1/validation/schema.json is not a schema. Found a document with media type: text/plain

https://raw.githubusercontent.com/ossf/osv-schema/v1.3.1/validation/schema.json is served as text/plain, which fails: it needs to be served as "application/schema+json".

is there some official endpoint at osv.dev for example that serves the various schema files as application/schema+json?

If there already is one I apologize, I couldn't find it.

Overloading of ecosystem

Where do we put the vendor name? There are lots of packages, sometimes in the same ecosystem, or not really in any ecosystem at all, for which the vendor name is really helpful.

The osv-data seems to do things like:

"ecosystem": "Debian:5.0",
"ecosystem": "Debian:10",

Is this the official way to do it? If so, can we update the documentation? https://ossf.github.io/osv-schema/#affected-fields doesn't mention this explicitly but does show examples, and:

Your ecosystem here. | Send us a PR.

if so (for the data I'm currently working with) there are about 50 vendors with 100+ items, 250 with 10-99 and 300 with 5-9. What's the bar for entry here to get listed? Do they need to be listed? (E.g. you have Debian so can we add all the major Linux vendors? BSD's?).

Missing reference type value - the inverse of "FIX"

So for reference types we have:

FIX: A source code browser link to the fix (e.g., a GitHub commit) Note that the fix type is meant for viewing by people using web browsers. Programs interested in analyzing the exact commit range would do better to use the GIT-typed affected[].ranges entries (described above).

But we have no inverse, no "INTRODUCED", and for many of these vulnerabilities we have the commit/change/issue/pull/etc that introduced the vulnerability, but no way to label it as such. I would like to simply suggest we add "INTRODUCED" as the inverse of "FIX" for the reference types.

Severity does not support multiple differing severities for one ID

I just noticed if I have two different severities for the same identifier, I can't note this in a sensible manner.

For example

Let's say ID-1 affects two Linux distributions differently. On one distro it's critical, on the other it's important.

Right now the severity field is inflexible
https://ossf.github.io/osv-schema/#severity-field

It's an array, but the spec only supports CVSSv3, and there is no way to scope a severity to a package or ecosystem. In the affected fields, I would specify an ecosystem to account for this:

{"ecosystem": "distro1", "name": "package"}

and

{"ecosystem": "distro2", "name": "package"}

I've also seen instances where different versions have different severities to make this even more difficult. This is less common, so trying to put severity in the affected field would create a lot of duplication for what is not a common occurrence.
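One possible shape, sketched in Python under stated assumptions: each `affected` entry may carry its own optional `severity` list, and consumers fall back to the top-level `severity` when it is absent. The per-entry `severity` key, the `severity_for` helper, and the CVSS vectors are all illustrative assumptions. The fallback keeps the common case free of duplication.

```python
record = {
    "id": "ID-1",
    # Top-level severity acts as the default for every affected entry.
    "severity": [{"type": "CVSS_V3",
                  "score": "CVSS:3.1/AV:L/AC:L/PR:L/UI:N/S:U/C:H/I:H/A:H"}],
    "affected": [
        {"package": {"ecosystem": "distro1", "name": "package"},
         # Hypothetical per-entry override for the distro that is hit harder.
         "severity": [{"type": "CVSS_V3",
                       "score": "CVSS:3.1/AV:N/AC:L/PR:N/UI:N/S:U/C:H/I:H/A:H"}]},
        {"package": {"ecosystem": "distro2", "name": "package"}},
    ],
}


def severity_for(record, ecosystem, name):
    """Return the entry-level severity if present, else the top-level one."""
    for affected in record.get("affected", []):
        pkg = affected.get("package", {})
        if pkg.get("ecosystem") == ecosystem and pkg.get("name") == name:
            if affected.get("severity"):
                return affected["severity"]
    return record.get("severity", [])
```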

Support the GitLab community advisory database

It would be great to pull in advisory data from the GitLab Community Advisory Database.

I think transforming the GitLab format into OSV shouldn't be too difficult; however, figuring out what to use as the identifier within OSV might need some discussion, as GitLab primarily uses the CVE ID as the identifier within their DB. Perhaps something like a prefix of GITLAB and the uuid from the record (although that field isn't documented and I haven't verified it exists on every record)?

Also, I know @oliverchang submitted an issue directly to GitLab quite some time ago about exporting OSV directly, but I'm not sure there's been much movement there yet.

I do think it'd be valuable to start bringing this in, as there is quite a bit more coverage, particularly in the Maven ecosystem, than GitHub currently has. I'm very slowly working on getting them in sync, but that is going to take quite a long time.

affected[].ranges[].type missing timestamp type

So we have:
https://github.com/cloudsecurityalliance/gsd-database/blob/main/2023/1001xxx/GSD-2023-1001657.json

"product_version": "prior to Jan 4 of 2023 (2022/01/04)",
"vulnerability_type": "XSS",
"affected_component": "https://app.zerossl.com",

As it's a service, there is no good way to specify a version. But we know for sure it was vulnerable prior to the fix time, so this is actionable information (if you have any certs from them that were created before this time, you should probably roll them over).

Having a TIMESTAMP type in addition to SEMVER and GIT would solve this problem easily. Can I submit a PR to add this?

As we've seen, there are a lot of vulnerabilities in a lot of services, and we need to start documenting them.
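A rough sketch of how a TIMESTAMP range could be evaluated, written in Python. The `TIMESTAMP` type is only the proposal from this issue, the RFC 3339 event values and the `affected_at` helper are assumptions, and the cut-off date is illustrative rather than taken authoritatively from the record above.

```python
from datetime import datetime, timezone


def parse_ts(value):
    """Parse an RFC 3339 UTC timestamp; '0' is treated as 'before anything'."""
    if value == "0":
        return datetime.min.replace(tzinfo=timezone.utc)
    # Older Python versions do not accept a trailing 'Z' in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def affected_at(moment, events):
    """Walk the events in time order, as the version-based evaluation does."""
    when = parse_ts(moment)
    affected = False
    for evt in sorted(events, key=lambda e: parse_ts(next(iter(e.values())))):
        if "introduced" in evt and when >= parse_ts(evt["introduced"]):
            affected = True
        elif "fixed" in evt and when >= parse_ts(evt["fixed"]):
            affected = False
    return affected


# Hypothetical range for a service fixed on an illustrative date.
hypothetical_range = {
    "type": "TIMESTAMP",
    "events": [{"introduced": "0"}, {"fixed": "2023-01-04T00:00:00Z"}],
}
```

With this shape, a certificate issued before the fix time would be flagged and one issued afterwards would not.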

Add support for optional published and modified dates for URL data

I'm finding that timeline analysis is a lot easier when the source URLs are tagged with time data. Can I suggest we add optional (not required) support for a publishedDate and lastModifiedDate? Almost all Red Hat RHSAs have these (Issued: 2017-11-16 Updated: 2017-11-16), as do Debian DSAs (Date Reported: 09 Mar 2022), and so on.
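As a small illustration of the use case, assuming references gain the optional `publishedDate` and `lastModifiedDate` keys proposed here (neither exists in the schema today, and the URLs and dates below are placeholders), a timeline could be built like this in Python:

```python
from datetime import date

# Placeholder references; only the two date keys are the proposed additions.
references = [
    {"type": "ADVISORY", "url": "https://example.com/advisory",
     "publishedDate": "2017-11-16", "lastModifiedDate": "2017-11-16"},
    {"type": "FIX", "url": "https://example.com/commit",
     "publishedDate": "2017-11-10"},
    {"type": "WEB", "url": "https://example.com/blog"},  # undated, so skipped
]


def timeline(refs):
    """Return the dated references, oldest first; undated ones are left out."""
    dated = [r for r in refs if "publishedDate" in r]
    return sorted(dated, key=lambda r: date.fromisoformat(r["publishedDate"]))


for ref in timeline(references):
    print(ref["publishedDate"], ref["type"], ref["url"])
```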

What does an "introduced" value of "0" mean?

From: https://ossf.github.io/osv-schema/

Special values
"introduced" allows a version of the value "0" to represent a version that sorts before any other version.

What does 0 mean? "since forever"? "unknown"? Something else? Right after we have:

"limit" allows versions containing the string "*" to represent “infinity”. If no limit events are provided, an implicit { "limit": "*" } is assumed to exist. Multiple "limit" events are allowed in the same range.

So I assume 0 is meant to be the inverse, i.e. "since forever". But this breaks in the common case where the code has NOT been present in the given software since "forever" (e.g. in the Linux kernel, a lot of security fixes are for code added in the last decade, not earlier). Can we please clearly define what the "0" means?
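For reference, the quoted text only says that "0" sorts before any other version. A minimal Python sketch of that reading, assuming plain dotted-numeric versions (real SemVer or ecosystem ordering is more involved) and using hypothetical helper names:

```python
def sort_key(version):
    """Order versions so the special value '0' comes before everything else."""
    if version == "0":
        return (0, ())  # sentinel: lower than any parsed version
    return (1, tuple(int(part) for part in version.split(".")))


def in_range(version, introduced, fixed):
    """True when introduced <= version < fixed under the ordering above."""
    return sort_key(introduced) <= sort_key(version) < sort_key(fixed)


print(sorted(["1.0.2", "0", "0.0.1"], key=sort_key))
print(in_range("0.0.1", introduced="0", fixed="1.0.2"))
```

Under this reading "0" is purely a lower bound for matching. It says nothing about when the vulnerable code was actually added, which is the ambiguity raised here.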

affected ranges are order dependent

The algorithm for evaluating if a package/version is affected by a vulnerability (https://ossf.github.io/osv-schema/#evaluation) is dependent on the order of ranges. It doesn't look like this is intentional.

func IncludedInRanges(v, ranges)
  vulnerable = false
  for range in ranges
    if BeforeLimits(v, range)
      for evt in sorted(range.events)
        if evt.introduced is present && v >= evt.introduced
           vulnerable = true
        else if evt.fixed is present && v >= evt.fixed
           vulnerable = false
        else if evt.last_affected is present && v > evt.last_affected
           vulnerable = false

  return vulnerable

If we have:

"ranges": [ {
    "type": "SEMVER",
    "events": [
      { "introduced": "1.1.0" },
      { "fixed": "1.1.3" }
    ]
}, {
    "type": "SEMVER",
    "events": [
      { "introduced": "0" },
      { "fixed": "1.0.2" }
    ]
} ]

Then according to this algorithm, 1.1.0 is not vulnerable: the first iteration of "for range in ranges" sets vulnerable=true, the second iteration sets vulnerable=false, and the function returns false.

I think there may be an implicit assumption that there will be only one range of a given type, but I can't find any such restriction in the spec. Perhaps I'm overlooking it. Even if there is such a restriction, however, it seems like IncludedInRanges should return true if any range matches the given version.
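To make the order dependence concrete, here is a minimal Python transcription of the pseudocode next to an "any range matches" variant. It is a sketch under assumptions: versions are plain dotted integers rather than full SemVer, limit handling is omitted, and the function names are mine.

```python
def parse(version):
    """Parse 'X.Y.Z' into a tuple; '0' becomes (0,), which sorts first."""
    return tuple(int(part) for part in version.split("."))


def included_in_ranges_spec(version, ranges):
    """Literal transcription: one shared flag across all ranges."""
    v = parse(version)
    vulnerable = False
    for rng in ranges:
        events = sorted(rng["events"], key=lambda e: parse(next(iter(e.values()))))
        for evt in events:
            if "introduced" in evt and v >= parse(evt["introduced"]):
                vulnerable = True
            elif "fixed" in evt and v >= parse(evt["fixed"]):
                vulnerable = False
            elif "last_affected" in evt and v > parse(evt["last_affected"]):
                vulnerable = False
    return vulnerable


def included_in_any_range(version, ranges):
    """Suggested variant: evaluate each range alone and OR the results."""
    return any(included_in_ranges_spec(version, [rng]) for rng in ranges)


ranges = [
    {"type": "SEMVER", "events": [{"introduced": "1.1.0"}, {"fixed": "1.1.3"}]},
    {"type": "SEMVER", "events": [{"introduced": "0"}, {"fixed": "1.0.2"}]},
]

print(included_in_ranges_spec("1.1.0", ranges))        # order as written above
print(included_in_ranges_spec("1.1.0", ranges[::-1]))  # same ranges, reversed
print(included_in_any_range("1.1.0", ranges))
```

The first two calls disagree even though the advisory data is the same, while the per-range variant gives the same answer regardless of order.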