opensuse / cavil
The legal review and SBOM system used by SUSE and openSUSE
License: GNU General Public License v2.0
Things like "throw IllegalArgumentException();" in code trigger Risk 9 because the string "legal" is part of the exception name. Comments that state what is "legal" or "illegal" input for a function do the same. Would there be a way to make the scanning smarter and avoid a load of false positives?
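One possible direction is to match the keyword only as a standalone word. The following is a minimal sketch in Python (not Cavil's own code); the pattern and the function name `has_legal_keyword` are hypothetical and only illustrate the idea.

```python
import re

# Hypothetical keyword pattern: match "legal" only as a standalone word,
# so identifiers such as IllegalArgumentException are not flagged.
LEGAL_KEYWORD = re.compile(r"\blegal\b", re.IGNORECASE)


def has_legal_keyword(line: str) -> bool:
    """Return True if the line contains "legal" as a whole word."""
    return LEGAL_KEYWORD.search(line) is not None
```

This would skip "IllegalArgumentException" and "illegal", but a comment that uses the bare word "legal" for valid input would still match, so it only addresses part of the problem.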
There's currently a hack that pretends uploaded tarballs are actually RPM packages. It kinda works for testing stuff, but we should at some point make that a real feature and fully support arbitrary tarballs.
The recent Factory change to accept every request after 2 hours (and to obsolete the legal review in progress) means that we now have many repeated imports of the same sources from OBS (first factory review, then product sync). That seems to result in source checkouts getting lost completely sometimes, causing an empty legal report.
The problem probably existed for a long time, but the recent change made it a much more frequent occurrence.
Is Cavil able to scan for trademarked logos and flag them for manual review? Trademark usage is difficult because fair use could apply, but I think we currently do not highlight those at all.
It should be possible to link directly to individual paginated result subsets, like https://legaldb.suse.de/#page=12.
For spec files not coming from openSUSE, Cavil makes it hard to review the RPM license because it enforces SPDX. If SPDX parsing fails, it should show the original RPM license together with a note that the SPDX mapping failed.
OCI containers can have a license declaration as part of the container metadata:
https://github.com/opencontainers/image-spec/blob/master/annotations.md#pre-defined-annotation-keys
Kiwi and podman / docker support setting those labels during build time. example for kiwi:
<containerconfig
name="my-container"
tag="latest"
additionaltags="1.0.0.%RELEASE%">
<labels>
<!-- See https://en.opensuse.org/Building_derived_containers#Labels -->
<suse_label_helper:add_prefix prefix="org.example.container">
...
<!-- Select a correct license from https://github.com/openSUSE/spec-cleaner#spdx-licenses -->
<label name="org.opencontainers.image.licenses" value="SUSE-Permissive"/>
</suse_label_helper:add_prefix>
</labels>
<history author="Fabian Vogt <[email protected]>">Derive the image</history>
</containerconfig>
It would be good if the legal auto bot checked that this label is set and accurate (the latter is harder).
The nmap package in Factory has started using the license LicenseRef-NPSL-0.93, and Cavil currently thinks that is not valid SPDX. But it is spec compliant, while our SUSE-* prefix is not. So we should at least support LicenseRef-* in addition to SUSE-*.
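A minimal sketch of such a check, written in Python rather than Cavil's own code. The function name `is_custom_license_id` is hypothetical, and the sketch assumes that a user-defined SPDX reference is "LicenseRef-" followed by letters, digits, "-" or "."; the optional "DocumentRef-" prefix is not handled here.

```python
import re

# User-defined SPDX reference: "LicenseRef-" plus letters, digits, "-" or ".".
LICENSE_REF = re.compile(r"LicenseRef-[A-Za-z0-9.\-]+")
# The SUSE-specific prefix mentioned above; it is not part of SPDX.
SUSE_REF = re.compile(r"SUSE-[A-Za-z0-9.\-]+")


def is_custom_license_id(identifier: str) -> bool:
    """Return True for LicenseRef-* or SUSE-* identifiers."""
    return bool(
        LICENSE_REF.fullmatch(identifier) or SUSE_REF.fullmatch(identifier)
    )
```

With this, `LicenseRef-NPSL-0.93` and `SUSE-Permissive` would both be accepted as custom identifiers, while a bare `NPSL-0.93` would not.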
We occasionally have the case where users submit packages with an unintended suffix, like:
5 | ibs#289721 | a month ago | qt6-base.SUSE_SLE-15-SP4_GA | Error-9:Z9pY |
These "spec file not found" cases should probably lead to a decline, or at least some more attention?
Performance issues are becoming more common with our current UI. Especially the AJAX driven tables are a big problem once the data sets reach a certain size. We've learned a lot about how to make better performing UIs with Vue.js for the QEM Dashboard. Those lessons should be applied to Cavil as well.
There seems to be a trend where open source projects include SPDX license expressions in their LICENSE file. We should probably try that too when extracting package metadata. Especially when we detect Kiwi files or Helm charts.
Example: https://github.com/osquery/osquery/blob/master/LICENSE
A report for an .obscpio archive that contains other archives currently looks like this:
MPL-Unspecified: 3 files
node_modules.obscpio._/package._1281/index.js
node_modules.obscpio._/package._1282/index.js
node_modules.obscpio._/package._943/node_modules/spdx-correct/index.js
It would be a lot more helpful to have the output include the names of the inner archives in the filenames. Even if the names were filtered down to a limited character set such as [0-9a-zA-Z_+\-\.] (think XSS), it would be a lot more helpful than the current format:
MPL-Unspecified: 3 files
node_modules.obscpio._/package._1281.some_program_5.4.tgz/index.js
node_modules.obscpio._/package._1282.another_program_1.4.tgz/index.js
node_modules.obscpio._/package._943.magics_23.tgz/node_modules/spdx-correct/index.js
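The character filter suggested above could look roughly like this. It is a sketch in Python, not Cavil's unpacking code; `safe_archive_label` and `unpacked_dir_name` are hypothetical helpers, and the directory naming simply mirrors the example output.

```python
import re

# Everything outside the proposed whitelist [0-9a-zA-Z_+\-\.] is dropped.
UNSAFE_CHARS = re.compile(r"[^0-9a-zA-Z_+\-.]")


def safe_archive_label(name: str) -> str:
    """Strip characters that are not in the whitelist from an archive name."""
    return UNSAFE_CHARS.sub("", name)


def unpacked_dir_name(index: int, archive_name: str) -> str:
    """Build a directory name that keeps the (filtered) inner archive name."""
    return f"package._{index}.{safe_archive_label(archive_name)}"
```

Because "/" is not in the whitelist, a label produced this way cannot introduce additional path components.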
When a package only consists of a Dockerfile or a Chart.yaml (which refers to a Helm chart), Cavil fails the review with an error message.
It would be good if we continued with source tarball evaluation instead (e.g. scan all the tar files in that package), so that we get a valid report for legal.
We could get a bit better at recognizing particular BSD licenses. In the following example the snippet was recognized as BSD with an unspecified version, although it clearly states that it is a 2-clause BSD license (BSD-2-Clause):
src/regex/tre*) is Copyright © 2001-2008 Ville Laurikari and licensed
under a 2-clause BSD license (license text in the source files).
Even opensource.org calls it "2-Clause BSD License":
POPULAR / STRONG COMMUNITY
The 2-Clause BSD License
SPDX short identifier: BSD-2-Clause
We have 461 patterns without a license, and only 48 of them are keywords with risk 9. 248 have a risk assessment of 0, suggesting they have been used as a hack-ish version of ignore patterns before the real feature existed. An unknown but not insignificant number also contain actual license text and seem to have accidentally not been assigned a license name.
We should find out which of these patterns are not in use for current Factory packages and remove all that have become obsolete.
Much of our license pattern data predates the existence of SPDX, so we rely on mostly arbitrarily chosen identifiers. Recently there has been growing interest in reports that also include SPDX identifiers. This has many advantages, such as the ability to exchange reports in standard formats with tools like Fossology, which in turn would also allow us to cooperate more with open source projects like OSSelot (see #64).
Currently we only list priorities as part of the link, which is easily overlooked. It should probably be a separate table column with some kind of colour highlighting for high priority reviews.
Hello
This is an example copy-paste where, rather than GPL-2.0+, we should use GPL-2.0-or-later and similar.
It has happened to me a few times that we accepted changes to a devel project that then failed to build in Factory, where we have strict rpmlint checking. I can't recall what the license was, but the mistake was that I copy-pasted the license text from Cavil and didn't cross-check it against https://github.com/openSUSE/obs-service-format_spec_file/blob/master/licenses_changes.txt, which I now do since this issue occurred.
So, could we only use licenses and exceptions that are acceptable/listed by obs-service-format_spec_file?
GPL-2.0+ OR MIT: [1 files] ...
GPL-2.0+ WITH Autoconf-exception-3.0: ...
GPL-2.0+ WITH Libtool-exception: ...
GPL-3.0+ WITH Autoconf-Exception-3.0 ...
I understand that that might be challenging as I've seen a report which was referencing an older version of license than we had in the obs-service-format_spec_file. Perhaps such exceptions could be colorized or so, to warn the reviewer.
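As an illustration of the `GPL-2.0+` to `GPL-2.0-or-later` rewrite only, here is a small Python sketch. The function name `modernize` is hypothetical, the regular expression covers just the GPL, LGPL and AGPL families, and the sketch does not consult licenses_changes.txt, which would be the real source of truth.

```python
import re

# Deprecated "+" suffix on GNU license identifiers, e.g. GPL-2.0+ or LGPL-2.1+.
DEPRECATED_PLUS = re.compile(r"\b((?:A|L)?GPL-\d+(?:\.\d+)*)\+")


def modernize(expression: str) -> str:
    """Rewrite identifiers like GPL-2.0+ to the GPL-2.0-or-later form."""
    return DEPRECATED_PLUS.sub(r"\1-or-later", expression)
```

Exception names, such as the differing capitalisation of `Autoconf-exception-3.0` in the list above, are left untouched and would need a separate lookup.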
Currently it is very hard to audit and possibly correct any already finished reviews. We could probably expand the file viewer to show all pattern matches for the whole file, including ignores and use a pull down menu with correction options, such as the removal of an ignore pattern.
Similar to #71, direct tarball downloads would serve a similar purpose and give us more options in the future.
Reported by @coolo. If the same package is requested multiple times in quick succession it might be possible to create the same obs_import job multiple times, resulting in a race condition.
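The general idea of guarding against duplicate imports can be sketched as follows. This is an in-memory Python illustration, not how Cavil's job queue works; the class `ImportQueue` and its methods are hypothetical, and a real fix would need a guard that is shared between all workers.

```python
import threading


class ImportQueue:
    """Toy job queue that refuses a second import for a pending package."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._pending: set[int] = set()
        self.jobs: list[tuple[str, int]] = []

    def enqueue_import(self, package_id: int) -> bool:
        # Check and insert under one lock so two callers cannot both pass.
        with self._guard:
            if package_id in self._pending:
                return False
            self._pending.add(package_id)
            self.jobs.append(("obs_import", package_id))
            return True

    def finish_import(self, package_id: int) -> None:
        # Allow a later re-import once the current one is done.
        with self._guard:
            self._pending.discard(package_id)
```

The important part is that the "already pending?" check and the job creation happen atomically; checking first and enqueuing later in a separate step would reintroduce the race.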
The cavil view should have a cascaded view of dependencies instead of just a flat-view. So instead of just having a flat list of files, I should be able to navigate the tree and have a subview of the licenses in that tree only.
Our production review backlog tends to be rather large because of many automatically imported priority 1 (low) openSUSE:Factory packages. It would help to be able to filter those out with a minimum priority setting in the ui, probably defaulting to priority 2 and above.
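The filter itself is simple; a Python sketch under the assumption that each backlog entry carries a numeric priority (the `Review` class and `filter_backlog` function are hypothetical names):

```python
from dataclasses import dataclass


@dataclass
class Review:
    package: str
    priority: int


def filter_backlog(reviews: list[Review], min_priority: int = 2) -> list[Review]:
    """Keep only reviews at or above the minimum priority (default 2)."""
    return [review for review in reviews if review.priority >= min_priority]
```

With the default, automatically imported priority 1 packages would be hidden unless the reviewer lowers the threshold.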
Unpacking libadplug from the OBS request https://build.opensuse.org/request/show/1067071 fails with:
libmagic (null) at /usr/lib/perl5/vendor_perl/5.26.1/x86_64-linux-thread-multi/File/LibMagic.pm line 206.
SIGINT handler "default" not defined.
Our daily cleanup background jobs take much longer than they should, wasting a lot of resources. There are probably many ways to improve that significantly.
Some named licenses have multiple conflicting risk assessments for various patterns:
Apache-1.1: 3, 4
Apache-2.0: 2, 1, 3
Apache-2.0 AND CC-BY-SA-4.0: 3, 2
Apache-2.0 OR Artistic-2.0: 3, 2
Apache-2.0 OR BSD-3-Clause: 2, 1
Apache-2.0 OR GPL-2.0: 3, 2
Apache-2.0 OR GPL-2.0+: 3, 2
Apache-2.0 OR MIT: 3, 1
Apache-2.0 WITH LLVM-exception: 3, 2
...
This needs to be cleaned up once we have gotten a normalised list back from the lawyers. And perhaps it would be a good idea to dedicate a new cli command to license pattern maintenance.
Be aware: Cases like Any Proprietary: 5, 3, 1, 4 need to have patterns with different risk assessments, since they don't represent one specific named license.
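A maintenance command could start from a grouping like the one below. This is a Python sketch over plain `(license, risk)` pairs; the function name `conflicting_risks`, the input shape and the exemption list are assumptions for illustration and do not reflect Cavil's database schema.

```python
from collections import defaultdict


def conflicting_risks(patterns, exempt=("Any Proprietary",)):
    """Map each license name to its risks if more than one risk is in use.

    `patterns` is an iterable of (license_name, risk) pairs. Names listed
    in `exempt` are catch-all categories that may legitimately vary.
    """
    risks = defaultdict(set)
    for license_name, risk in patterns:
        risks[license_name].add(risk)
    return {
        name: sorted(found)
        for name, found in risks.items()
        if len(found) > 1 and name not in exempt
    }
```

Catch-all names are skipped on purpose, matching the caveat above about licenses that do not represent one specific named license.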
Hello team,
SUSE is currently running a pilot of Github Copilot https://mysuse.sharepoint.com/sites/github-copilot-pilot/SitePages/Introduction.aspx
So far it is a pilot that is aware of the "AI Pair programmer" section of https://opensource.suse.com/legal/policy, and none of the code will make it into a SUSE product.
However, since SUSE Legal doesn't review any openSUSE legal reviews, I'd like to make sure that we do not automatically fast-forward requests containing such changes without somebody actually looking into them.
Keeping this open as a high level tracker.
Cavil can't currently unpack .zst files because File::Unpack lacks support for the format. We have started to see this format being used in OBS though, for packages like trivy.
With the switch to server-side pagination we've lost support for ordering in all ui tables. It's a bit tricky to reimplement, but we should bring it back at least in places where users have requested it.
/products/*
Not all Open Source licenses are compatible with each other. It would be nice if Cavil could highlight known incompatibilities in the report. Perhaps with a UI for incompatibility management.
This seems to be pretty rare, but recently we've seen a job history like this where a lock was active that should not have been, preventing the package from being unpacked.
id | task | state | result | created
----------+-------------+----------+---------------------------------------------+-------------------------------
91476425 | obs_import | finished | null | 2021-12-09 00:09:02.835544+01
91476428 | unpack | finished | "Package 281044 is already being processed" | 2021-12-09 00:09:21.143415+01
91487879 | index_later | finished | null | 2021-12-10 20:00:02.032591+01
Currently we have to look at PostgreSQL directly to review ML classification results. This has become rather tedious with an increasing number of classification failures. We are going to need a proper UI for reviewing results that can also be used to create new training data to reduce the number of failures again.
What I'd like to see is a long list of license snippets with ML assessment results and two buttons for a human to press, green and red.
ldig@legaldb:~/cavil> ./script/cavil minion job -f 28673440
T: 445 files ...
Deep recursion on subroutine "File::Unpack::unpack" at /usr/lib/perl5/vendor_perl/5.18.2/File/Unpack.pm line 1170.
unpack('/data/auto-co/legal-bot/gcc46/5c4638b8b35ffd2d07223f6844e9f64e/.unpacked/gcc-4.6.2-20111212/libgo/go/archive/zip/testdata/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r','/data/auto-co/legal-bot/gcc46/5c4638b8b35ffd2d07223f6844e9f64e/.unpacked/gcc-4.6.2-20111212/libgo/go/archive/zip/testdata/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r/r'): recursion limit 200 at /home/ldig/cavil/script/../lib/Cavil/Checkout.pm line 154.
[2019-07-04 06:52:47.58874] [8318] [info] [27411] Unpacked /data/auto-co/legal-bot/gcc46/5c4638b8b35ffd2d07223f6844e9f64e
The licenses of BuildRequires should also be presented. Otherwise it's not possible to see the license of the aggregate.
We have some duplicate license names with different capitalisation, like Any permissive and Any Permissive, which are considered different licenses. That should probably not be the case.
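Finding such duplicates is straightforward once the names are compared case-insensitively. A Python sketch, with the hypothetical helper `case_variants` operating on a plain list of license names:

```python
from collections import defaultdict


def case_variants(names):
    """Return groups of names that differ only in capitalisation."""
    groups = defaultdict(set)
    for name in names:
        # casefold() gives a case-insensitive key for grouping.
        groups[name.casefold()].add(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]
```

Each returned group is a candidate for merging into one canonical spelling.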
#76 has shown that our test coverage is still not great. We should make sure that at least everything needed for the normal review process is covered. A good first step would probably be the addition of coverage metrics.
From the report it should be easier to create new license patterns. For many keyword matches Cavil already has a good estimation for what the license pattern metadata will look like. Here it should be much easier to create the pattern without leaving the report UI. Perhaps we could even do something like GitHub reviews, where multiple patterns can be created from the report UI and submitted together as a batch.
Dec 04 13:13:02 legaldb cavil[1636]: [1636] [e] Non-existing path in SPDX report 329096: /data/auto-co/legal-bot/java-11-openjdk/2a9e351679e9f9d5f078110b24744813/.unpacked/openjdk/test/jdk/sun/misc/URLClassPath/testclasses/+ª-ë-ï+Ñ-å-î.class
Dec 04 13:13:05 legaldb cavil[1636]: [1636] [e] Non-existing path in SPDX report 329096: /data/auto-co/legal-bot/java-11-openjdk/2a9e351679e9f9d5f078110b24744813/.unpacked/openjdk/test/jdk/sun/security/tools/jarsigner/JarSigning_RU/New/ðñð©ÐêðÁÐÇ/English
Dec 04 13:13:05 legaldb cavil[1636]: [1636] [e] Non-existing path in SPDX report 329096: /data/auto-co/legal-bot/java-11-openjdk/2a9e351679e9f9d5f078110b24744813/.unpacked/openjdk/test/jdk/sun/security/tools/jarsigner/JarSigning_RU/New/ðñð©ÐêðÁÐÇ/ðáÐâÐüÐüð¦ð©ð¦
Dec 04 13:13:05 legaldb cavil[1636]: [1636] [e] Non-existing path in SPDX report 329096: /data/auto-co/legal-bot/java-11-openjdk/2a9e351679e9f9d5f078110b24744813/.unpacked/openjdk/test/jdk/tools/launcher/UnicodeTest/ClassAϺ+äϦϦϿ+èÏ®õ©¡µûçõ©¡µûçÓñ¦Óñ+ÓñéÓñªÓÑÇÎóÎæοÎÖάµùѵ£¼Þ¬×Ýò£ÛÁ¡ýû¦espa+¦olÓ¦äÓ©ùÓ©ó.class
It seems we have a file system path encoding problem somewhere between File::Unpack2 and Cavil. For now I've added a workaround that makes such files not prevent SPDX report generation anymore. But of course this will need to be fixed.
Hello
This is just a quick thought from today's Open Chain webinar by Caren Kresse about OSSelot: The Open Source Curation Database
Project site: See https://osselot.org/
Videos: https://www.osselot.org/index.php?s=videos
Could we extend or reuse existing analysed data as part of our legal review process?
https://github.com/Open-Source-Compliance
Seems like the process utilizes Fossology for the scan.
Data:
https://github.com/Open-Source-Compliance/package-analysis/tree/main/analysed-packages
The DB grows every day, and it seems to be a way to get an extra curator (Oliver reviews PRs).
Context: Leap has backlog of 60~+ requests not reviewed for over 8 days.
This particular issue was identified by Sebastian Riedl: there were two requests for pocl in the backlog, and one has an Error-0:Yv6G, which means it was empty when checked out from OBS.
The legal report should not be empty, judging by:
$ curl https://api.opensuse.org/public/source/science/pocl?rev=05f1e68e1f6817c7e6c5391f8eac871e
<directory name="pocl" rev="05f1e68e1f6817c7e6c5391f8eac871e" srcmd5="05f1e68e1f6817c7e6c5391f8eac871e">
<linkinfo project="openSUSE:Factory" package="pocl" srcmd5="c566cacae98e515d0e0c93647593f951" baserev="c566cacae98e515d0e0c93647593f951" lsrcmd5="c492a9be0daa12e8a8e737d7959356f9"/>
<entry name="link_against_libclang-cpp_so.patch" md5="fb3145931e75c3a11f764f22e68425cf" size="553" mtime="1608907150"/>
<entry name="pocl-3.0.tar.gz" md5="bd79db59fa31e38759296849291210a3" size="1722809" mtime="1662482194"/>
<entry name="pocl-rpmlintrc" md5="a8031c13cb3a4cb232bed0fd7f42dd4e" size="45" mtime="1662487601"/>
<entry name="pocl.changes" md5="0c2660588be38db939d7f04cf1c5ec7b" size="16917" mtime="1667384010"/>
<entry name="pocl.spec" md5="67906134b152d59a355044292367b6ce" size="4377" mtime="1667388077"/>
</directory>
Works fine on manual reimport
Currently we are very focused on importing packages from OBS. To support new ALP workflows and to make Cavil easier to use for the community, we should implement native git (GitHub) support.
Hello team!
I'm writing as a person who fast-tracks Leap legal reviews in my spare time.
During my reviews I often see a package where the list of licenses doesn't match the spec file, which is basically every second review of community packages.
I typically want to submit the correct license right away. You typically want to send an SR to the devel project where the package is developed, then get the change to Factory and finally to Leap.
I typically start with osc bco $some:Devel:project $package
... followed by commit && sr once the license in spec is tweaked.
For that, I typically click on the SR (in my case the Leap SR), which takes me to the OBS Leap submission. From there I click on the source (typically openSUSE:Factory), where I can see in which devel project the package is developed.
It would help me to speed up such corrections if I'd see "developed in (link)$project(/link)" right in the WebUI of cavil.
The ideal would be "To checkout package osc bco $some:Devel:project $package". But perhaps that's too much detail for applications outside of OBS (not our case), but that would be quite nice.
It might sound like not a big deal, but we have a queue of 100 plus packages, and a good 50 will have incorrect/partial license tags in the spec file. And that is a lot of clicking.
As discussed over an email.
Background: Our CI legal pipeline, which produces license reports for our customer, depends on retrieving JSON from legaldb.suse.de, and the interface has been changing recently.
We rely on the following interfaces (accessed with a header containing the "carwos" token and "Accept: application/json"):
url = f"http://legaldb.suse.de/package/{report_id}"
url = f"http://legaldb.suse.de/reviews/calc_report/{report_id}"
url = f"http://legaldb.suse.de/reviews/fetch_source/{file_id}"
url = "http://legaldb.suse.de/packages?" + urlencode(info)
(the report_id e.g. 232522, file_id is e.g. 7115234143 etc.)
We can fix the query at our end (e.g. to url = f"http://legaldb.suse.de/reviews/calc_report/{report_id}.json"), but we would like to ensure that an interface defined this way does not change too often or disappear completely.
The code we currently use was copied mostly from openQA a long time ago and is not all that reliable. We should replace it with Mojolicious::Plugin::OAuth2, which recently gained support for OpenID Connect (which the openSUSE identity provider also supports).
It looks like some recent update or rewrite of the UI dropped the ability to sort the pending review table by its columns.
I'm no longer able to sort it by creation time.