bagit-support's People

Contributors

bbranan, dependabot[bot], mikejritter

Forkers

mikejritter

bagit-support's Issues

Add code coverage reporting

Since this project is small, it would be nice to know how much of the code is covered by tests.

Two options are Codecov and Coveralls. They should be compared, and the one chosen should then be integrated with the project.

BagProfile Improvements

BagProfile offers validation for both BagConfig and Bag, but it might be nice to offer more generalized approaches to validation. For example, BagConfig holds tag files and their key/value pairs in a Map<String, Map<String, String>>. An extra validation method could be made for the Map.
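
A minimal sketch of what a Map-based validation method might look like. The class name, method name, and the idea of expressing requirements as a Map<String, Set<String>> of required field names are assumptions made for illustration; none of them are part of the existing BagProfile API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: names and signatures are illustrative only.
final class TagMapValidator {

    private TagMapValidator() {
    }

    /**
     * Checks that every required field is present and non-blank.
     *
     * @param requiredFields tag file name -> field names that must be present (assumed shape)
     * @param tagFiles       tag file name -> (field name -> value), the shape held by BagConfig
     * @return human-readable errors; empty when the map satisfies the requirements
     */
    static List<String> validate(final Map<String, Set<String>> requiredFields,
                                 final Map<String, Map<String, String>> tagFiles) {
        final List<String> errors = new ArrayList<>();
        for (final Map.Entry<String, Set<String>> required : requiredFields.entrySet()) {
            final String tagFile = required.getKey();
            final Map<String, String> fields = tagFiles.get(tagFile);
            if (fields == null) {
                errors.add("Missing tag file: " + tagFile);
                continue;
            }
            for (final String field : required.getValue()) {
                final String value = fields.get(field);
                if (value == null || value.trim().isEmpty()) {
                    errors.add("Missing field " + field + " in " + tagFile);
                }
            }
        }
        return errors;
    }
}
```

Returning a list of errors rather than throwing lets the same method back both the BagConfig and Bag validation paths.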

Separate handling for payload manifest and tag manifest algorithms

The BagIt Profiles specification has separate sections for specifying valid algorithms for the payload manifests and the tag manifests (Manifests-Allowed and Tag-Manifests-Allowed, respectively). Currently the BagWriter takes only a single Set<BagItDigest> for the algorithms to use.

We should strive for better conformance to the specification by adding support for this differentiation in the BagWriter. The current constructor should still be seen as valid, and represents a case when both the payload and tag manifest will use the same algorithms. A secondary constructor should be created which allows for setting separate algorithms per manifest.
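
The constructor pair described above could be sketched roughly as follows. SketchBagWriter and its nested Digest enum are stand-ins so the example compiles on its own; the real BagWriter takes other arguments and uses BagItDigest, so treat this only as an illustration of the delegation.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, trimmed-down writer showing only the constructor delegation.
final class SketchBagWriter {

    // Stand-in for the project's BagItDigest.
    enum Digest { MD5, SHA1, SHA256, SHA512 }

    private final Set<Digest> payloadAlgorithms;
    private final Set<Digest> tagAlgorithms;

    // Existing style: one set is applied to both the payload and tag manifests.
    SketchBagWriter(final Set<Digest> algorithms) {
        this(algorithms, algorithms);
    }

    // Proposed style: separate sets, mirroring Manifests-Allowed and Tag-Manifests-Allowed.
    SketchBagWriter(final Set<Digest> payloadAlgorithms, final Set<Digest> tagAlgorithms) {
        this.payloadAlgorithms = copyOf(payloadAlgorithms);
        this.tagAlgorithms = copyOf(tagAlgorithms);
    }

    Set<Digest> getPayloadAlgorithms() {
        return payloadAlgorithms;
    }

    Set<Digest> getTagAlgorithms() {
        return tagAlgorithms;
    }

    // Defensive, read-only copy so later changes to the caller's set have no effect.
    private static Set<Digest> copyOf(final Set<Digest> algorithms) {
        final Set<Digest> copy = EnumSet.noneOf(Digest.class);
        copy.addAll(algorithms);
        return Collections.unmodifiableSet(copy);
    }
}
```

Delegating the single-set constructor to the two-set one keeps existing callers working unchanged.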

Project Organization

As the bagit-support project has grown, the package structure has remained flat. We should start to group common classes together, e.g. exceptions, serializers, in order to provide better organization for the project.

Fix BeyondTheRepository Version

The BTR profile JSON currently shows 0.1 as the Version, but it should be 1.0. The rest of the profile is up to date.

  "BagIt-Profile-Info": {
    "Source-Organization": "Beyond the Repository Bagit Profile Group",
    "External-Description": "Bagit Profile for Consistent Deposit to Distributed Digital Preservation Services",
    "Version": "0.1",
    "BagIt-Profile-Identifier": "https://github.com/dpscollaborative/btr_bagit_profile/releases/download/1.0/btr-bagit-profile.json",
    "BagIt-Profile-Version": "1.3.0"
  }

Use BagIt-Profile-Identifier from version 1.0 of BTR Profile

Version 1.0 of the BTR BagIt Profile was released today (2020-04-10): https://github.com/dpscollaborative/btr_bagit_profile/releases/tag/1.0

The BagIt-Profile-Identifier for this version is: https://github.com/dpscollaborative/btr_bagit_profile/releases/download/1.0/btr-bagit-profile.json.

This identifier should be used to set the BagIt-Profile-Identifier field when creating bags that conform to the BTR profile (rather than the current beyondtherepository value).

Default Bag Profile

Allow a no-arg BagProfile constructor which uses the beyondtherepository profile. This conflicts with the current default profile, which could be renamed to fedora-import-export.

Improvements for BagWriter

Working with the BagWriter has shown that some of its methods are clunky to use. A few things that would be nice to have:

  • append values to tag files instead of needing to do a get each time
  • ability to register a single checksum
    • this would not be used at the moment and as such is being omitted
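
One possible shape for the "append" convenience, assuming tag files are held in the same Map<String, Map<String, String>> layout that BagConfig uses. TagFileAppender and its method names are hypothetical and are not existing BagWriter methods.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical holder for tag file contents; not the real BagWriter.
final class TagFileAppender {

    // tag file name -> (field name -> value); insertion order is preserved for writing.
    private final Map<String, Map<String, String>> tagFiles = new LinkedHashMap<>();

    // Adds one field, creating the tag file entry on first use so callers never need a get.
    void addTag(final String tagFile, final String key, final String value) {
        tagFiles.computeIfAbsent(tagFile, name -> new LinkedHashMap<>()).put(key, value);
    }

    // Adds several fields at once to the same tag file.
    void addTags(final String tagFile, final Map<String, String> values) {
        tagFiles.computeIfAbsent(tagFile, name -> new LinkedHashMap<>()).putAll(values);
    }

    // Read-only view of a tag file's fields; empty if nothing was added.
    Map<String, String> getTags(final String tagFile) {
        return Collections.unmodifiableMap(tagFiles.getOrDefault(tagFile, Collections.emptyMap()));
    }
}
```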

Description text not loading from bag profile json

In BagProfile.java, the JSON node for the description field is never set on the rule. The code only checks whether the node is empty.

The test for BagProfile.java should also ensure that ProfileFieldRule#getDescription is called so that we know the description is read in properly.

ProfileValidationUtil readInfo failure

In the readInfo method of ProfileValidationUtil, the line.matches call will always fail for real content. String#matches requires the regex to match the entire line, and the regex only matches a line consisting solely of spaces.

To fix this we could either use Matcher#find which will search for the occurrence, or update the regex to include a wildcard after the starting spaces.
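
The difference between the two fixes can be shown with a small, self-contained example. The pattern ^\s+ used here is an assumption standing in for the actual regex in readInfo, and the class and method names are illustrative.

```java
import java.util.regex.Pattern;

// Illustrative only: the real regex in ProfileValidationUtil may differ.
final class ContinuationLineCheck {

    // Assumed pattern: leading whitespace marks a continuation line.
    private static final Pattern LEADING_SPACE = Pattern.compile("^\\s+");

    private ContinuationLineCheck() {
    }

    // Current behaviour: String#matches must consume the whole line,
    // so this is true only for lines made up entirely of whitespace.
    static boolean wholeLineMatch(final String line) {
        return line.matches("^\\s+");
    }

    // Fix 1: Matcher#find looks for an occurrence instead of a full match.
    static boolean usingFind(final String line) {
        return LEADING_SPACE.matcher(line).find();
    }

    // Fix 2: keep String#matches but allow anything after the leading whitespace.
    static boolean usingWildcard(final String line) {
        return line.matches("^\\s+.*");
    }
}
```

For a line such as "  continued value", wholeLineMatch returns false while both usingFind and usingWildcard return true.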

Maven build warnings

The following build warning is emitted at the beginning of the mvn clean install process:

[WARNING] 
[WARNING] Some problems were encountered while building the effective model for org.duraspace:bagit-support:jar:1.1.0-SNAPSHOT
[WARNING] 'build.plugins.plugin.version' for com.github.github:site-maven-plugin is missing. @ line 289, column 15
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-site-plugin is missing. @ line 294, column 15
[WARNING] 'build.plugins.plugin.version' for org.apache.maven.plugins:maven-resources-plugin is missing. @ line 251, column 15
[WARNING] 
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING] 
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING] 

Checksum payload manifests after OutputStream is closed

Currently in the BagWriter, the check to see if a payload manifest should be registered happens while writing is still ongoing. This can cause the actual checksum of the file to be incorrect when being registered with the tagmanifest.
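
As a rough illustration of the ordering concern, the sketch below only reads the digest after the stream has been closed, so any buffered bytes have already passed through the DigestOutputStream. The class and method names are hypothetical, and this is not how the BagWriter is currently structured.

```java
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.DigestOutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper demonstrating "finish writing, then checksum".
final class ChecksumAfterClose {

    private ChecksumAfterClose() {
    }

    // Writes content to target and returns the hex digest of what was written.
    static String writeAndChecksum(final Path target, final byte[] content, final String algorithm)
            throws IOException, NoSuchAlgorithmException {
        final MessageDigest digest = MessageDigest.getInstance(algorithm);
        try (OutputStream out = new BufferedOutputStream(
                new DigestOutputStream(Files.newOutputStream(target), digest))) {
            out.write(content);
        }
        // The stream is closed at this point, so the buffer has been flushed
        // through the DigestOutputStream and the digest covers the full file.
        return toHex(digest.digest());
    }

    private static String toHex(final byte[] bytes) {
        final StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (final byte b : bytes) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Calling digest.digest() inside the try block, before close, could miss bytes still sitting in the buffer, which is the kind of mismatch described above.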

Review exceptions in BagProfile

The BagProfile class has methods for validating conformance for both a BagConfig and a Bag, but both of these throw RuntimeExceptions if validation fails. While this was ok when these classes were only a part of the import-export-utility, it might be better to either use a checked exception now or return a result value which encapsulates the errors and status of the validation.
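
A result value of the kind suggested here might look something like the following. ValidationResult and its methods are hypothetical names, not an existing class in bagit-support.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical result type that carries validation status and errors together.
final class ValidationResult {

    private final List<String> errors;

    private ValidationResult(final List<String> errors) {
        this.errors = Collections.unmodifiableList(new ArrayList<>(errors));
    }

    // A result with no errors.
    static ValidationResult valid() {
        return new ValidationResult(Collections.emptyList());
    }

    // A result built from whatever errors were collected; valid if the list is empty.
    static ValidationResult of(final List<String> errors) {
        return new ValidationResult(errors);
    }

    boolean isValid() {
        return errors.isEmpty();
    }

    List<String> getErrors() {
        return errors;
    }
}
```

Compared with a checked exception, this lets callers see every failure at once instead of only the first one thrown.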

Relax tar content type matching

Currently, when serializing or deserializing a tarball, the SerializationSupport class differentiates between application/tar and application/x-tar in its commonTypeMap. However, some bag profiles specify only application/tar, and we treat the two content types as functionally equivalent, so the matching should accept either.
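
One way to relax the matching is to normalize both values to a single canonical type before comparing. The helper below is a sketch under that assumption: TarContentTypes is a hypothetical class, and choosing application/tar as the canonical form is arbitrary.

```java
import java.util.Locale;

// Hypothetical helper; SerializationSupport does not currently expose these methods.
final class TarContentTypes {

    private static final String TAR = "application/tar";
    private static final String X_TAR = "application/x-tar";

    private TarContentTypes() {
    }

    // Maps application/x-tar onto application/tar; other types are only lower-cased and trimmed.
    static String normalize(final String contentType) {
        final String lower = contentType.trim().toLowerCase(Locale.ROOT);
        return X_TAR.equals(lower) ? TAR : lower;
    }

    // True when the profile's type and the requested type are equal after normalization.
    static boolean matches(final String profileType, final String requestedType) {
        return normalize(profileType).equals(normalize(requestedType));
    }
}
```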

Use BagItDigest in BagWriter checksum maps

When writing bags we track checksums for files in a Map which currently uses a String as the key (Map<String, Map<File, String>>). This String represents the name of the algorithm, which we typically get from the BagItDigest. This should be updated to use the BagItDigest as the type for the key in order to give further assurances about what algorithms are being used.
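
A sketch of the re-keyed map, using a nested Digest enum as a stand-in for BagItDigest (this assumes BagItDigest is an enum; a HashMap would work the same way if it is not). ChecksumRegistry and its methods are illustrative names only.

```java
import java.io.File;
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Hypothetical registry showing Map<Digest, Map<File, String>> in place of a String key.
final class ChecksumRegistry {

    // Stand-in for the project's BagItDigest.
    enum Digest { MD5, SHA1, SHA256, SHA512 }

    private final Map<Digest, Map<File, String>> checksums = new EnumMap<>(Digest.class);

    // Records a checksum; a misspelled algorithm name can no longer compile.
    void register(final Digest algorithm, final File file, final String checksum) {
        checksums.computeIfAbsent(algorithm, key -> new HashMap<>()).put(file, checksum);
    }

    // Read-only view of the checksums recorded for one algorithm.
    Map<File, String> checksumsFor(final Digest algorithm) {
        return Collections.unmodifiableMap(checksums.getOrDefault(algorithm, Collections.emptyMap()));
    }
}
```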
