archivematica-sampledata's People

Contributors

hwesta, jhsimpson, jraddaoui, jrwdunham, mamedin, mcantelon, mistydemeo, replaceafill, ross-spencer, sallain, sarah-mason, sevein, sromkey

archivematica-sampledata's Issues

Non-UTF-8 file name creation command fails with IOError

The new createtransfers.py script fails when run as ./createtransfers.py create-variously-encoded-files, raising IOError: [Errno 84] Invalid or incomplete multibyte or wide character.

This failure happens on the following platforms:

  • Mac OS X 10.13.1 High Sierra, with every version of Python I tried
  • Debian 8.9 (jessie) with Python 2.7.13 (i.e., in the python:2.7 Docker container created by the SS Dockerfile)

This failure does not happen with:

  • Ubuntu 16.04 xenial and Python 2.7.12
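
A quick way to tell whether a given platform is affected is to probe the target directory before creating the files. The following is a minimal sketch, not the code in createtransfers.py: the function name `can_create_non_utf8_name` and the probe file name are hypothetical, and it is written for Python 3 (where IOError is an alias of OSError) rather than the Python 2.7 mentioned above.

```python
import errno
import os


def can_create_non_utf8_name(directory):
    """Return True if `directory` accepts a file name that is not valid UTF-8.

    Hypothetical helper: it creates and removes a small probe file.
    """
    # b"\xe9" is "e with acute accent" in Latin-1 but is not valid UTF-8 here.
    name = os.path.join(os.fsencode(directory), b"caf\xe9.txt")
    try:
        with open(name, "wb"):
            pass
    except OSError as exc:
        # EILSEQ is the "illegal byte sequence" error; it is errno 84 on Linux.
        if exc.errno == errno.EILSEQ:
            return False
        raise
    os.remove(name)
    return True


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(can_create_non_utf8_name(tmp))
```

Using a bytes path avoids any dependence on the locale when the name is passed to the operating system, so a failure here points at the filesystem or kernel rather than at Python's own encoding of the name.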

Problem: DemoTransfer causing invalid METS

Running the DemoTransfer produces invalid METS because of its PREMIS rights.csv file:

line 2224, column 57: cvc-type.3.1.3: The value '' of element 'premis:copyrightStatusDeterminationDate' is not valid.
line 2226, column 49: cvc-complex-type.2.4.b: The content of element 'premis:copyrightApplicableDates' is not complete. One of '{"info:lc/xmlns/premis-v2":startDate}' is expected.
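
Both errors can be spotted without a full schema validation. The sketch below is only an illustration: the function name `find_empty_premis_dates` is hypothetical, it checks just the two conditions reported above, and it is no substitute for validating the METS against the PREMIS schema.

```python
import xml.etree.ElementTree as ET

# Namespace taken from the validator message above.
PREMIS_NS = "info:lc/xmlns/premis-v2"


def find_empty_premis_dates(xml_text):
    """Return a list of problems matching the two validation errors above."""
    determination = "{%s}copyrightStatusDeterminationDate" % PREMIS_NS
    applicable = "{%s}copyrightApplicableDates" % PREMIS_NS
    start_date = "{%s}startDate" % PREMIS_NS

    problems = []
    for elem in ET.fromstring(xml_text).iter():
        if elem.tag == determination and not (elem.text or "").strip():
            problems.append("empty copyrightStatusDeterminationDate")
        elif elem.tag == applicable and elem.find(start_date) is None:
            problems.append("copyrightApplicableDates without startDate")
    return problems


if __name__ == "__main__":
    # Hand-written fragment for illustration, not output from Archivematica.
    sample = (
        '<premis:copyrightInformation xmlns:premis="info:lc/xmlns/premis-v2">'
        "<premis:copyrightStatusDeterminationDate/>"
        "<premis:copyrightApplicableDates/>"
        "</premis:copyrightInformation>"
    )
    for problem in find_empty_premis_dates(sample):
        print(problem)
```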

Problem: Create files to test issues around extension-based identification

As a digital preservation analyst, I want to understand what Archivematica's format identification tools report when given files that do not conform to any format specification but are named with known file-format extensions, e.g. .jpg, .gif, .bin. Depending on what a tool reports, I may need to design a different workflow around one specific tool, or seek to improve the output of one or more of the tools.
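
Such files could be generated rather than collected. The following is a rough sketch under stated assumptions: the function name `create_mislabelled_files`, the file name stem, and the default size are all hypothetical, and the content is simply seeded pseudo-random bytes, which will almost never match a real format signature.

```python
import os
import random

# Extensions taken from the examples in the issue text.
EXTENSIONS = [".jpg", ".gif", ".bin"]


def create_mislabelled_files(directory, size=1024, seed=0):
    """Write one file of pseudo-random bytes per extension and return the paths."""
    rng = random.Random(seed)  # seeded so that the files are reproducible
    os.makedirs(directory, exist_ok=True)
    paths = []
    for ext in EXTENSIONS:
        path = os.path.join(directory, "not_really" + ext)
        payload = bytes(rng.randrange(256) for _ in range(size))
        with open(path, "wb") as handle:
            handle.write(payload)
        paths.append(path)
    return paths


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        for created in create_mislabelled_files(tmp):
            print(os.path.basename(created), os.path.getsize(created))
```

Seeding the generator keeps the sample data stable between runs, so differences in identification results can be attributed to the tools rather than to the files.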

Problem: Need a file over 20mb with a virus for testing

There are four scenarios needed to test ClamAV effectively:

  1. Files < 20 MB that have a virus (WE HAVE)
  2. Files < 20 MB that don't have a virus (WE HAVE)
  3. Files > 20 MB that have a virus (WE DO NOT HAVE)
  4. Files > 20 MB that don't have a virus (WE HAVE)

We need a file over 20 MB that contains a virus.

Problem: Review sampledata

It's been a while since the sampledata set was thoroughly reviewed. Many things have been added over time, and it's likely that there's unnecessary repetition within the sample data. At the same time, the feature set in Archivematica has been growing quickly, so there are features for which there is no sample data (see, for example, #31). Even some basic features are not testable with the current sample data (see #28).

We should consider how the auto-generated sample data fits in, and whether it needs to be incorporated in a better or more readily apparent way. It is not yet clear how to do this, but it warrants consideration.

We could also look at how the sampledata set is deployed to sandbox and testing servers.

Finally, better documentation about the data and what the various transfers are supposed to test would be helpful (probably as part of the README here).

Problem: filenames with strange encodings are not created programmatically

Filenames with strange (non-ASCII, non-UTF-8) encodings are currently stored at TestTransfers/files_with_various_encodings/. However, on certain platforms (e.g., Mac OS X 10.13.1), attempting to check out the master branch of this repo triggers an error in git:

error: unable to create file TestTransfers/files_with_various_encodings/big5/?s?{ (Illegal byte sequence)
error: unable to create file TestTransfers/files_with_various_encodings/shift_jis/?ۂ??Ղ郁?C?? (Illegal byte sequence)
error: unable to create file TestTransfers/files_with_various_encodings/windows_1252/s?ster (Illegal byte sequence)

If this issue cannot be overcome by some other means, then these transfers should be created programmatically (e.g., via make rules) in this sampledata repo.
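
One possible shape for programmatic creation is sketched below. It is an assumption-laden illustration, not the repo's implementation: the function name `create_encoded_name_files` and the sample names in `SAMPLE_NAMES` are hypothetical stand-ins (the real file names are garbled in the git output above and are not reproduced here). Only the three directory names come from the error messages.

```python
import os

# Directory names come from the git errors above; the file names are
# hypothetical placeholders chosen to be representable in each encoding.
SAMPLE_NAMES = {
    "big5": "\u6e2c\u8a66",
    "shift_jis": "\u30c6\u30b9\u30c8",
    "windows_1252": "caf\u00e9",
}


def create_encoded_name_files(base_dir):
    """Create one file per encoding; return (created, skipped) lists of byte paths."""
    created, skipped = [], []
    for encoding, name in SAMPLE_NAMES.items():
        target_dir = os.path.join(os.fsencode(base_dir), encoding.encode("ascii"))
        os.makedirs(target_dir, exist_ok=True)
        # Encode the name ourselves so the raw bytes reach the filesystem.
        path = os.path.join(target_dir, name.encode(encoding))
        try:
            with open(path, "wb") as handle:
                handle.write(b"sample content\n")
        except OSError:
            # Some filesystems reject names that are not valid UTF-8.
            skipped.append(path)
        else:
            created.append(path)
    return created, skipped


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        made, missed = create_encoded_name_files(tmp)
        print(len(made), "created;", len(missed), "skipped")
```

Recording skipped paths instead of aborting lets the same rule run on platforms that reject such names, while still producing the full set where the filesystem allows it.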

Problem: Create a Fail Transfer Compliance Test Set

At present, all it takes to fail the verify transfer compliance check is a single empty folder. We can create a structure like this so that people can observe that behaviour:

FailTransferCompliance/
├── README.md
└── TransferThisFolder

1 directory, 1 file
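
Because git does not track empty directories, this layout may be easier to generate than to commit. A minimal sketch follows, assuming a hypothetical helper name `create_fail_transfer_compliance` and placeholder README text:

```python
import os


def create_fail_transfer_compliance(base_dir):
    """Create the FailTransferCompliance layout shown above and return its path."""
    root = os.path.join(base_dir, "FailTransferCompliance")
    # The empty directory is the transfer that is expected to fail.
    os.makedirs(os.path.join(root, "TransferThisFolder"), exist_ok=True)
    with open(os.path.join(root, "README.md"), "w") as handle:
        # Placeholder wording, not the README proposed in the issue.
        handle.write("Transfer the empty TransferThisFolder directory.\n")
    return root


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        print(sorted(os.listdir(create_fail_transfer_compliance(tmp))))
```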
