cavil's Issues

Don't scan the code itself

Things like "throw IllegalArgumentException();" in code trigger risk 9 because the string "legal" is part of the exception name. Comments that state what is "legal" or "illegal" input for a function do the same. Would there be a way to make the scanning smarter and avoid a load of false positives?
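
One possible direction, sketched below, is to match the keyword only as a standalone word instead of as a substring. This is a minimal illustration and not how Cavil's matcher actually works; the names LEGAL_WORD, naive_match and word_match are hypothetical.

```python
import re

# Hypothetical keyword pattern: "legal" must be a whole word, so identifiers
# such as IllegalArgumentException no longer match.
LEGAL_WORD = re.compile(r"\blegal\b", re.IGNORECASE)


def naive_match(line: str) -> bool:
    """Substring search, which is what produces the false positive."""
    return "legal" in line.lower()


def word_match(line: str) -> bool:
    """Whole-word search using regex word boundaries."""
    return LEGAL_WORD.search(line) is not None
```

Note that this only addresses the identifier case. A comment such as "legal input values" would still match, so ignoring code comments would need a different approach.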

Handle arbitrary tarballs

There's currently a hack that pretends uploaded tarballs are actually RPM packages. It kinda works for testing stuff, but we should at some point make that a real feature and fully support arbitrary tarballs.

OBS import race condition

The recent Factory change to accept every request after 2 hours (and to obsolete the legal review in progress) means that we now have many repeated imports of the same sources from OBS (first factory review, then product sync). That seems to result in source checkouts getting lost completely sometimes, causing an empty legal report.

The problem probably existed for a long time, but the recent change made it a much more frequent occurrence.

trademark logo scan

Is Cavil able to scan for trademarked logos and flag them for manual review? Trademark usage is difficult because fair use could apply, but I think we currently do not highlight those at all.

Show the original string if SPDX parsing fails

For spec files not coming from openSUSE, Cavil makes it hard to review the RPM license because it enforces SPDX. If SPDX parsing fails, it should show the original RPM license together with a note that the SPDX mapping failed.

handle OCI container license labelling

OCI containers can have a license declaration as part of the container metadata:

https://github.com/opencontainers/image-spec/blob/master/annotations.md#pre-defined-annotation-keys

Kiwi and podman / docker support setting those labels at build time. Example for Kiwi:

    <containerconfig
        name="my-container"
        tag="latest"
        additionaltags="1.0.0.%RELEASE%">
      <labels>
        <!-- See https://en.opensuse.org/Building_derived_containers#Labels -->
        <suse_label_helper:add_prefix prefix="org.example.container">
          ...
          <!-- Select a correct license from https://github.com/openSUSE/spec-cleaner#spdx-licenses -->
          <label name="org.opencontainers.image.licenses" value="SUSE-Permissive"/>
        </suse_label_helper:add_prefix>
      </labels>
      <history author="Fabian Vogt &lt;[email protected]&gt;">Derive the image</history>
    </containerconfig>

It would be good if the legal auto bot checked that this label is set and that it is accurate (the latter is harder).

Support LicenseRef- prefix in specfiles

The nmap package in Factory has started using the license LicenseRef-NPSL-0.93 and Cavil currently thinks that is not valid SPDX. But it is spec compliant, while our SUSE-* prefix is not. So we should at least support LicenseRef-* in addition to SUSE-*.
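
As a rough sketch of what accepting both prefixes could look like: the SPDX specification defines user-defined license references as "LicenseRef-" followed by letters, digits, "." or "-". The character set allowed after "SUSE-" below is an assumption, and the function name is hypothetical.

```python
import re

# SPDX user-defined license reference: "LicenseRef-" plus an idstring made of
# letters, digits, "." and "-".
LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.-]+")

# Assumption: the SUSE-* names use a similar character set (plus "+").
SUSE_PREFIX = re.compile(r"SUSE-[A-Za-z0-9.+-]+")


def is_custom_license_id(name: str) -> bool:
    """Return True for LicenseRef-* and SUSE-* style identifiers."""
    return bool(LICENSE_REF.fullmatch(name) or SUSE_PREFIX.fullmatch(name))
```

For example, is_custom_license_id("LicenseRef-NPSL-0.93") is true, while a bare "LicenseRef-" without an idstring is rejected.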

Port the UI to Vue.js

Performance issues are becoming more common with our current UI. Especially the AJAX driven tables are a big problem once the data sets reach a certain size. We've learned a lot about how to make better performing UIs with Vue.js for the QEM Dashboard. Those lessons should be applied to Cavil as well.

Difficult to navigate to sources from report with nested archives

A report for an .obscpio archive that contains other archives looks like this:

MPL-Unspecified: 3 files

node_modules.obscpio._/package._1281/index.js
node_modules.obscpio._/package._1282/index.js
node_modules.obscpio._/package._943/node_modules/spdx-correct/index.js

It would be a lot more helpful if the output included the names of the inner archives in the filenames. Even if you filter them down to a limited character set such as [0-9a-zA-Z_+\-\.] (think XSS), the result would be much more useful than the current format:

MPL-Unspecified: 3 files

node_modules.obscpio._/package._1281.some_program_5.4.tgz/index.js
node_modules.obscpio._/package._1282.another_program_1.4.tgz/index.js
node_modules.obscpio._/package._943.magics_23.tgz/node_modules/spdx-correct/index.js
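
The character filtering mentioned above could be sketched roughly as follows. The helper names and the choice of "_" as the replacement character are assumptions for illustration; only the allowed character set comes from the request itself.

```python
import re

# Anything outside the allowed set [0-9a-zA-Z_+\-.] is considered unsafe.
UNSAFE = re.compile(r"[^0-9a-zA-Z_+\-.]")


def safe_archive_name(name: str) -> str:
    """Replace every character outside the allowed set with an underscore."""
    return UNSAFE.sub("_", name)


def display_dir(index: int, archive_name: str) -> str:
    """Build a directory label like 'package._1281.some_program_5.4.tgz'."""
    return f"package._{index}.{safe_archive_name(archive_name)}"
```

With this, display_dir(1281, "some_program_5.4.tgz") yields the first directory name from the example, and markup characters in an archive name are neutralised before they reach the report.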

handle dockerfile and helm Chart.yml package containers

When a package consists only of a Dockerfile or a Chart.yaml (which refers to a helm chart), Cavil fails the review with an error message.

It would be good if we continued with the source tarball evaluation instead (e.g. scan all the tar files in that package), so that we get a valid report for legal.

2-clause BSD license recognized as BSD unspecified

We can get a bit better at recognizing particular BSD licenses. See my example:
The following snippet was recognized as BSD with an unspecified version, although it clearly states that it is BSD-2-Clause.

src/regex/tre*) is Copyright © 2001-2008 Ville Laurikari and licensed	
under a 2-clause BSD license (license text in the source files). 

Even opensource.org calls it "2-Clause BSD License":
POPULAR / STRONG COMMUNITY
The 2-Clause BSD License
SPDX short identifier: BSD-2-Clause

Inconsistent patterns without license

We have 461 patterns without a license, and only 48 of them are keywords with risk 9. 248 have a risk assessment of 0, suggesting they were used as a hack-ish version of ignore patterns before the real feature existed. An unknown but not insignificant number also contain actual license text and seem to have accidentally not been assigned a license name.

We should find out which of these patterns are not in use for current Factory packages and remove all that have become obsolete.

Map licenses to SPDX identifiers

Much of our license pattern data predates the existence of SPDX, so we rely on mostly arbitrarily chosen identifiers. Recently there has been growing interest in reports that also include SPDX identifiers. This has many advantages, such as the ability to exchange reports in standard formats with tools like Fossology, which in turn would also allow us to cooperate more with open source projects like OSSelot (see #64).

Make priorities more visible for open reviews

Currently we only list priorities as part of the link, which is easily overlooked. It should probably be a separate table column with some kind of colour highlighting for high priority reviews.

LegalDB report should use license definitions acceptable by obs-service-format_spec_file

Hello

This is an example copy-paste (see the report excerpt below), where we should use GPL-2.0-or-later rather than GPL-2.0+, and similar.

It has happened to me a few times that we accepted changes into a devel project which then failed to build in Factory, where we have strict rpmlint checking. I can't recall what the license was, but the mistake was that I copy-pasted the license text from Cavil and didn't cross-check it against https://github.com/openSUSE/obs-service-format_spec_file/blob/master/licenses_changes.txt, which I now do since this issue occurred.
So, could we only use licenses and exceptions that are accepted/listed by obs-service-format_spec_file?
GPL-2.0+ OR MIT: [1 files] ...
GPL-2.0+ WITH Autoconf-exception-3.0: ...
GPL-2.0+ WITH Libtool-exception: ...
GPL-3.0+ WITH Autoconf-Exception-3.0 ...

I understand that this might be challenging, as I've seen a report that referenced an older version of a license than the one in obs-service-format_spec_file. Perhaps such exceptions could be colorized or similar, to warn the reviewer.
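
To illustrate the kind of normalisation being asked for, here is a minimal sketch that rewrites the deprecated "+" suffix of GNU license identifiers to the SPDX "-or-later" form. It covers only GPL, LGPL and AGPL; the authoritative mapping would be licenses_changes.txt from obs-service-format_spec_file, which this sketch does not read. The function name is hypothetical.

```python
import re

# Matches GNU family identifiers with the deprecated "+" suffix,
# e.g. "GPL-2.0+" or "LGPL-2.1+".
PLUS_SUFFIX = re.compile(r"\b((?:AGPL|LGPL|GPL)-\d+\.\d+)\+")


def modernize_expression(expression: str) -> str:
    """Rewrite 'GPL-2.0+' style identifiers to 'GPL-2.0-or-later'."""
    return PLUS_SUFFIX.sub(r"\1-or-later", expression)
```

Applied to "GPL-2.0+ WITH Autoconf-exception-3.0", this gives "GPL-2.0-or-later WITH Autoconf-exception-3.0". Differences in exception spelling (such as Autoconf-Exception-3.0 versus Autoconf-exception-3.0) are not handled here.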

Review correction ui

Currently it is very hard to audit and possibly correct any already finished reviews. We could probably expand the file viewer to show all pattern matches for the whole file, including ignores and use a pull down menu with correction options, such as the removal of an ignore pattern.

Prevent obs_import race condition

Reported by @coolo. If the same package is requested multiple times in quick succession it might be possible to create the same obs_import job multiple times, resulting in a race condition.
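
The general idea of rejecting a second import while one is still pending can be sketched as below. This is an in-memory illustration only: a real fix would have to live in the job queue or database so that it works across processes, and the class and method names are hypothetical.

```python
import threading


class ImportGuard:
    """Tracks packages with a pending import so duplicates are rejected."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pending = set()

    def try_start(self, package_key) -> bool:
        """Return True if the caller may enqueue an import for this package."""
        with self._lock:
            if package_key in self._pending:
                return False  # an import for this package is already queued
            self._pending.add(package_key)
            return True

    def finish(self, package_key) -> None:
        """Mark the import as done so later requests are accepted again."""
        with self._lock:
            self._pending.discard(package_key)
```

The check and the insert happen under the same lock, which is the property that matters: two requests arriving in quick succession cannot both see "not pending".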

tree navigation of parsed licenses

The cavil view should have a cascaded view of dependencies instead of just a flat-view. So instead of just having a flat list of files, I should be able to navigate the tree and have a subview of the licenses in that tree only.

Add UI for removing globs again

Currently we only have a UI for adding globs, but not one to remove them again.
There should probably be a simple table view with delete button for admin users.

Allow filtering open reviews by minimum priority

Our production review backlog tends to be rather large because of many automatically imported priority 1 (low) openSUSE:Factory packages. It would help to be able to filter those out with a minimum priority setting in the ui, probably defaulting to priority 2 and above.
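
A minimal sketch of such a filter, assuming each open review carries a numeric priority where higher means more urgent. The Review type and function name are hypothetical; the default of 2 follows the suggestion above.

```python
from dataclasses import dataclass


@dataclass
class Review:
    package: str
    priority: int


def filter_by_min_priority(reviews, min_priority=2):
    """Keep only reviews at or above the given minimum priority."""
    return [r for r in reviews if r.priority >= min_priority]
```

In practice this would more likely be a WHERE clause in the query behind the server-side paginated table than a filter in application code.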

Optimize daily cleanup

Our daily cleanup background jobs take much longer than they should, wasting a lot of resources. There are probably many ways to improve that significantly.

Inconsistent risk assessments

Some named licenses have multiple conflicting risk assessments for various patterns:

Apache-1.1: 3, 4
Apache-2.0: 2, 1, 3
Apache-2.0 AND CC-BY-SA-4.0: 3, 2
Apache-2.0 OR Artistic-2.0: 3, 2
Apache-2.0 OR BSD-3-Clause: 2, 1
Apache-2.0 OR GPL-2.0: 3, 2
Apache-2.0 OR GPL-2.0+: 3, 2
Apache-2.0 OR MIT: 3, 1
Apache-2.0 WITH LLVM-exception: 3, 2
...

This needs to be cleaned up once we have gotten a normalised list back from the lawyers. And perhaps it would be a good idea to dedicate a new cli command to license pattern maintenance.

Be aware: Cases like Any Proprietary: 5, 3, 1, 4 need to have patterns with different risk assessments, since they don't represent one specific named license.
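
A list like the one above could be produced with a small grouping step, sketched here under the assumption that patterns are available as (license name, risk) pairs. The function name and the input shape are hypothetical, and the real data lives in the database.

```python
from collections import defaultdict


def conflicting_risks(patterns):
    """Map each named license to its risks if it has more than one."""
    risks = defaultdict(set)
    for license_name, risk in patterns:
        if license_name:  # patterns without a license name are skipped
            risks[license_name].add(risk)
    return {name: sorted(r) for name, r in risks.items() if len(r) > 1}
```

Catch-all names such as Any Proprietary would still show up in the result and would need to be excluded explicitly, since their differing risks are intentional.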

Flagging changes authored by AI

Hello team,

SUSE is currently running a pilot of Github Copilot https://mysuse.sharepoint.com/sites/github-copilot-pilot/SitePages/Introduction.aspx

So far it is a pilot aware of "AI Pair programmer" https://opensource.suse.com/legal/policy and none of the code will make it to the SUSE product.

However, since SUSE Legal doesn't review any of the openSUSE legal reviews, I'd like to make sure that we do not automatically fast-forward requests containing such changes without somebody actually looking into them.

Keeping this open as a high level tracker.

Zstandard compression support

Cavil can't currently unpack .zst files because File::Unpack lacks support for the format. We have started to see this format being used in OBS though, for packages like trivy.

Bring back ordering for ui tables

With the switch to server-side pagination we've lost support for ordering in all ui tables. It's a bit tricky to reimplement, but we should bring it back at least in places where users have requested it.

  • Order packages by state for /products/*

Support license incompatibilities

Not all Open Source licenses are compatible with each other. It would be nice if Cavil could highlight known incompatibilities in the report. Perhaps with a UI for incompatibility management.

Position dropdown menu for managing patterns dynamically

Currently the dropdown menu for managing patterns in the review ui always appears right below the button. This can be problematic if the pattern is at the end of the report. So we should probably position the menu dynamically above or below the button depending on where we have the most space.

Race condition around unpack job locks

This seems to be pretty rare, but recently we've seen a job history like this where a lock was active that should not have been, preventing the package from being unpacked.

    id    |    task     |  state   |                   result                    |            created            
----------+-------------+----------+---------------------------------------------+-------------------------------
 91476425 | obs_import  | finished | null                                        | 2021-12-09 00:09:02.835544+01
 91476428 | unpack      | finished | "Package 281044 is already being processed" | 2021-12-09 00:09:21.143415+01
 91487879 | index_later | finished | null                                        | 2021-12-10 20:00:02.032591+01

UI for reviewing ML classification

Currently we have to look at PostgreSQL directly to review ML classification results. This has become rather tedious with an increasing number of classification failures. We are going to need a proper UI for reviewing results that can also be used to create new training data to reduce the number of failures again.

What I'd like to see is a long list of license snippets with ML assessment results and two buttons for a human to press, green and red.

Problems with File::Unpack

ldig@legaldb:~/cavil> ./script/cavil minion job -f 28673440
T: 445 files ...
Deep recursion on subroutine "File::Unpack::unpack" at /usr/lib/perl5/vendor_perl/5.18.2/File/Unpack.pm line 1170.
unpack('/data/auto-co/legal-bot/gcc46/5c4638b8b35ffd2d07223f6844e9f64e/.unpacked/gcc-4.6.2-20111212/libgo/go/archive/zip/testdata/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r','/data/auto-co/legal-bot/gcc46/5c4638b8b35ffd2d07223f6844e9f64e/.unpacked/gcc-4.6.2-20111212/libgo/go/archive/zip/testdata/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r'): recursion limit 200 at /home/ldig/cavil/script/../lib/Cavil/Checkout.pm line 154.
[2019-07-04 06:52:47.58874] [8318] [info] [27411] Unpacked /data/auto-co/legal-bot/gcc46/5c4638b8b35ffd2d07223f6844e9f64e

Ignore snippet everywhere does not work

It appears the "Ignore snippet everywhere" feature does not currently work, and results in a 400 response. Ignore snippet for package seems fine though, which is probably why it has not been noticed earlier.

Inconsistent license capitalisation

We have some duplicate license names with different capitalisation. Like Any permissive and Any Permissive, which are considered different licenses. That should probably not be the case.
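
Finding such duplicates could be done with a case-insensitive grouping, sketched below. The function name is hypothetical and the input is assumed to be a plain list of license names.

```python
from collections import defaultdict


def case_variants(names):
    """Group license names that differ only in capitalisation."""
    groups = defaultdict(set)
    for name in names:
        groups[name.casefold()].add(name)
    return [sorted(variants) for variants in groups.values() if len(variants) > 1]
```

Each returned group is a candidate for merging into a single canonical spelling.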

Full test coverage for the main review process

#76 has shown that our test coverage is still not great. We should make sure that at least everything needed for the normal review process is covered. A good first step would probably be the addition of coverage metrics.

One click UI for creating new patterns

From the report it should be easier to create new license patterns. For many keyword matches Cavil already has a good estimation for what the license pattern metadata will look like. Here it should be much easier to create the pattern without leaving the report UI. Perhaps we could even do something like GitHub reviews, where multiple patterns can be created from the report UI and submitted together as a batch.

Encoding error when generating SPDX reports

Dec 04 13:13:02 legaldb cavil[1636]: [1636] [e] Non-existing path in SPDX report 329096: /data/auto-co/legal-bot/java-11-openjdk/2a9e351679e9f9d5f078110b24744813/.unpacked/openjdk/test/jdk/sun/misc/URLClassPath/testclasses/+ª-ë-ï+Ñ-å-î.class
Dec 04 13:13:05 legaldb cavil[1636]: [1636] [e] Non-existing path in SPDX report 329096: /data/auto-co/legal-bot/java-11-openjdk/2a9e351679e9f9d5f078110b24744813/.unpacked/openjdk/test/jdk/sun/security/tools/jarsigner/JarSigning_RU/New/ðñð©ÐêðÁÐÇ/English
Dec 04 13:13:05 legaldb cavil[1636]: [1636] [e] Non-existing path in SPDX report 329096: /data/auto-co/legal-bot/java-11-openjdk/2a9e351679e9f9d5f078110b24744813/.unpacked/openjdk/test/jdk/sun/security/tools/jarsigner/JarSigning_RU/New/ðñð©ÐêðÁÐÇ/ðáÐâÐüÐüð¦ð©ð¦
Dec 04 13:13:05 legaldb cavil[1636]: [1636] [e] Non-existing path in SPDX report 329096: /data/auto-co/legal-bot/java-11-openjdk/2a9e351679e9f9d5f078110b24744813/.unpacked/openjdk/test/jdk/tools/launcher/UnicodeTest/ClassAϺ+äϦϦϿ+èÏ®õ©¡µûçõ©¡µûçÓñ¦Óñ+ÓñéÓñªÓÑÇÎóÎæοÎÖάµùѵ£¼Þ¬×Ýò£ÛÁ¡ýû¦espa+¦olÓ¦äÓ©ùÓ©ó.class

It seems we have a file system path encoding problem somewhere between File::Unpack2 and Cavil. For now I've added a workaround so that such files no longer prevent SPDX report generation. But of course this will need to be fixed.

RFE: Sharing and Re-using OSS Compliance information

Hello

This is just a quick thought from today's Open Chain webinar by Caren Kresse about OSSelot: The Open Source Curation Database

Project site: See https://osselot.org/
Videos: https://www.osselot.org/index.php?s=videos

Could we extend or reuse existing analysed data as part of our legal review process?
https://github.com/Open-Source-Compliance

Seems like the process utilizes Fossology for the scan.

Data:
https://github.com/Open-Source-Compliance/package-analysis/tree/main/analysed-packages

The DB grows every day, and it seems to be a way to get an extra curator (Oliver reviews PRs).

Error-0:Yv6G - pocl empty on checkout works on reimport

Context: Leap has a backlog of roughly 60+ requests that have not been reviewed for over 8 days.

This particular issue was identified by Sebastian Riedl: there were two requests for pocl in the backlog, and one has an Error-0:Yv6G, which means it was empty when checked out from OBS.

The legal report should not be empty, judging by:

$ curl https://api.opensuse.org/public/source/science/pocl?rev=05f1e68e1f6817c7e6c5391f8eac871e
<directory name="pocl" rev="05f1e68e1f6817c7e6c5391f8eac871e" srcmd5="05f1e68e1f6817c7e6c5391f8eac871e">
  <linkinfo project="openSUSE:Factory" package="pocl" srcmd5="c566cacae98e515d0e0c93647593f951" baserev="c566cacae98e515d0e0c93647593f951" lsrcmd5="c492a9be0daa12e8a8e737d7959356f9"/>
  <entry name="link_against_libclang-cpp_so.patch" md5="fb3145931e75c3a11f764f22e68425cf" size="553" mtime="1608907150"/>
  <entry name="pocl-3.0.tar.gz" md5="bd79db59fa31e38759296849291210a3" size="1722809" mtime="1662482194"/>
  <entry name="pocl-rpmlintrc" md5="a8031c13cb3a4cb232bed0fd7f42dd4e" size="45" mtime="1662487601"/>
  <entry name="pocl.changes" md5="0c2660588be38db939d7f04cf1c5ec7b" size="16917" mtime="1667384010"/>
  <entry name="pocl.spec" md5="67906134b152d59a355044292367b6ce" size="4377" mtime="1667388077"/>
</directory>

A manual reimport works fine.

Extend the import mechanism with git support

Currently we are very focused on importing packages from OBS. To support new ALP workflows and to make Cavil easier to use for the community, we should implement native git (GitHub) support.

RFE: Speeding up license correction

Hello team!

I am writing from the position of a person fast-tracking Leap legal reviews in my spare time.
During my reviews I often see a package where the list of licenses doesn't match the spec file; this is basically every second review of community packages.

I typically want to submit the correct license right away. That typically means sending an SR to the development project where the package is developed, and then getting the change into Factory and finally Leap.

I typically start with osc bco $some:Devel:project $package ... followed by commit && sr once the license in the spec file is tweaked.

For that, I typically click on the SR (in my case the Leap SR), which takes me to the OBS Leap submission. From there I click on the source (typically openSUSE:Factory), where I can see which devel project the package is developed in.

It would help me speed up such corrections if I saw "developed in (link)$project(/link)" right in the web UI of Cavil.
The ideal would be "To checkout package: osc bco $some:Devel:project $package". Perhaps that's too much detail for applications outside of OBS (not our case), but it would be quite nice.

It might sound like not a big deal, but we have a queue of 100 plus packages, and a good 50 will have incorrect/partial license tags in the spec file. And that is a lot of clicking.

Carwos project "API"

As discussed over email.

Background: Our CI legal pipeline, which produces the license report for our customer, depends on retrieving JSON from legaldb.suse.de, and that interface has been changing recently.

We rely on the following interfaces (accessed with a header containing the "carwos" token and "Accept: application/json"):

url = f"http://legaldb.suse.de/package/{report_id}"
url = f"http://legaldb.suse.de/reviews/calc_report/{report_id}"
url = f"http://legaldb.suse.de/reviews/fetch_source/{file_id}"
url = "http://legaldb.suse.de/packages?" + urlencode(info)
(report_id is e.g. 232522, file_id is e.g. 7115234143, etc.)

We can fix the query at our end (e.g. to url = f"http://legaldb.suse.de/reviews/calc_report/{report_id}.json"), but we would like to ensure that an interface defined this way does not change too often or disappear completely.
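
For reference, the four endpoints listed above can be collected into one small URL-building module, sketched below. The function names are hypothetical, and the token header is deliberately left out because its name is not given here; only the Accept header is taken from the description.

```python
from urllib.parse import urlencode

BASE_URL = "http://legaldb.suse.de"

# Only the Accept header is known from the description above; the header
# carrying the "carwos" token would have to be added by the caller.
JSON_HEADERS = {"Accept": "application/json"}


def package_url(report_id: int) -> str:
    return f"{BASE_URL}/package/{report_id}"


def calc_report_url(report_id: int) -> str:
    return f"{BASE_URL}/reviews/calc_report/{report_id}"


def fetch_source_url(file_id: int) -> str:
    return f"{BASE_URL}/reviews/fetch_source/{file_id}"


def search_url(info: dict) -> str:
    """Build the package search URL from a dict of query parameters."""
    return f"{BASE_URL}/packages?" + urlencode(info)
```

Keeping the URL construction in one place means a change such as the ".json" suffix only has to be made once on the client side.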
