artefactual / archivematica-sampledata
Archivematica sample data
Home Page: http://www.archivematica.org
We could do with a little more information being output to the command line, both to get a feel for whether writing the files has worked at all and to show any other relevant output that we or a user might need.
This is similar to the need to test filenames with different encodings. If any path we work on points to a directory rather than a file, and passes through a different portion of code, e.g. a mkdir or mv, then it might trigger a different set of errors. Related is artefactual/archivematica#1104, where a directory in a zip file using cp437 is causing undefined behaviour in the transfer.
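A minimal sketch of how directory cases like this might be generated, assuming a POSIX file system that accepts arbitrary byte sequences in names. The helper name make_encoded_dir and the sample name are hypothetical and are not taken from createtransfers.py.

```python
import os
import tempfile


def make_encoded_dir(base, name, encoding):
    """Create a directory whose name is `name` encoded with `encoding`.

    Returns the bytes path on success, or None if the name cannot be
    encoded or the file system rejects the resulting byte sequence.
    """
    try:
        raw_name = name.encode(encoding)
    except UnicodeEncodeError as err:
        print("skip {!r} ({}): {}".format(name, encoding, err))
        return None
    # Work with bytes paths so Python does not re-encode the name.
    path = os.path.join(os.fsencode(base), raw_name)
    try:
        os.mkdir(path)
    except OSError as err:
        print("skip {!r} ({}): {}".format(name, encoding, err))
        return None
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        # Hypothetical sample name; each encoding yields different bytes.
        for enc in ("utf-8", "cp437", "windows-1252"):
            print(enc, make_encoded_dir(base, "café", enc))
```

Reporting a skipped name instead of raising lets the rest of a sample set be built on platforms that reject some byte sequences.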
Create a set of files that can test the capability of BE inside Archivematica.
The new createtransfers.py script fails when calling ./createtransfers.py create-variously-encoded-files with IOError: [Errno 84] Invalid or incomplete multibyte or wide character.
This failure happens on the following platforms:
This failure does not happen with:
An issue template is required to redirect people to the issue repo.
If you run the DemoTransfer the PREMIS rights.csv file causes invalid METS:
line 2224, column 57: cvc-type.3.1.3: The value '' of element 'premis:copyrightStatusDeterminationDate' is not valid.
line 2226, column 49: cvc-complex-type.2.4.b: The content of element 'premis:copyrightApplicableDates' is not complete. One of '{"info:lc/xmlns/premis-v2":startDate}' is expected.
Issue artefactual/archivematica#1104 in the Archivematica repository is an interesting one: it means testing the ability of Archivematica to recurse into structures like zip files and still be able to perform the same activities. More samples with zips inside zips will help us to test that.
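One way such samples could be produced is sketched below, using only the standard library. The function names, member names and payload are hypothetical choices for illustration, not an existing part of this repository.

```python
import io
import zipfile


def build_nested_zip(depth, payload_name="payload.txt", payload=b"sample data\n"):
    """Return the bytes of a zip that wraps `payload` in `depth` zip layers."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    name, data = payload_name, payload
    for level in range(depth):
        buf = io.BytesIO()
        # Each pass wraps the previous layer in a new (uncompressed) zip.
        with zipfile.ZipFile(buf, "w") as zf:
            zf.writestr(name, data)
        name, data = "level_{}.zip".format(level), buf.getvalue()
    return data


def count_layers(data):
    """Count the zip layers opened before reaching a member that is not a zip."""
    layers = 0
    while zipfile.is_zipfile(io.BytesIO(data)):
        with zipfile.ZipFile(io.BytesIO(data)) as zf:
            data = zf.read(zf.namelist()[0])
        layers += 1
    return layers


if __name__ == "__main__":
    nested = build_nested_zip(3)
    print(len(nested), "bytes,", count_layers(nested), "layers")
```

Writing the returned bytes to a file such as zip_in_zip.zip (a hypothetical name) would give a transfer that exercises recursive extraction.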
NB. This would likely be in support of a feature file describing the behaviour as well.
To acceptance test the correct fix of artefactual/archivematica#808
I would like to test MediaConch in Archivematica, but there are no mkv samples included in this repo. I am having trouble finding some with an explicit licence that would be suitable for inclusion in this repo.
Samples I found are:
http://jell.yfish.us/
https://www.matroska.org/downloads/test_w1.html
As a tester I need to be able to create fairly arbitrarily sized transfers to be able to test Archivematica's limits. An issue where the limits of AM were noticed, and thus fixed is here.
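A rough sketch of a generator for such transfers follows. The function name, its parameters and the directory layout are assumptions made for illustration. The files are zero-filled, so files of equal size share a checksum; a real utility might prefer random content.

```python
import os
import tempfile


def create_sized_transfer(root, file_count, file_size, files_per_dir=1000):
    """Write `file_count` files of `file_size` bytes each under `root`.

    Files are spread across numbered subdirectories so that no single
    directory holds more than `files_per_dir` files.
    Returns the total number of bytes written.
    """
    # Write in blocks of at most 1 MiB to keep memory use bounded.
    block = b"\0" * min(file_size, 1024 * 1024)
    total = 0
    for index in range(file_count):
        subdir = os.path.join(root, "dir_{:05d}".format(index // files_per_dir))
        os.makedirs(subdir, exist_ok=True)
        path = os.path.join(subdir, "file_{:08d}.bin".format(index))
        remaining = file_size
        with open(path, "wb") as handle:
            while remaining > 0:
                chunk = block[:remaining]
                handle.write(chunk)
                remaining -= len(chunk)
        total += file_size
    return total


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(create_sized_transfer(root, file_count=20, file_size=4096), "bytes written")
```

Scaling file_count tests limits on the number of objects, while scaling file_size tests limits on individual object size.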
Have a look at creating this sample set in the create transfers utility for testing purposes.
Namely, those from #15.
The OPF format corpus is included in this repo as a submodule. The version being pulled in is a few years old, and some valuable updates are missing.
The submodule link should be updated to openpreserve/format-corpus@5a93e3e
As a digital preservation analyst I want to understand what will be reported by Archivematica's format identification tools when fed files that do not conform to any specification, but are named with known file-format extensions, e.g. .jpg, .gif, .bin, etc. The determination made by one of these tools may require me to design a different workflow around one specific tool, or to seek improvements to the output of one tool or another.
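Such files could be generated with something like the sketch below. The function name, file names and payload text are hypothetical; the payload is plain text chosen so that it does not begin with a JPEG or GIF signature.

```python
import os
import tempfile

# Arbitrary text that matches none of the formats implied by the extensions.
PAYLOAD = b"This file does not conform to any image or binary format.\n"


def create_mislabelled_files(root, extensions=(".jpg", ".gif", ".bin")):
    """Write the same non-conforming payload under each extension.

    Returns the list of paths that were written.
    """
    os.makedirs(root, exist_ok=True)
    paths = []
    for ext in extensions:
        path = os.path.join(root, "nonconforming" + ext)
        with open(path, "wb") as handle:
            handle.write(PAYLOAD)
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        for path in create_mislabelled_files(root):
            print(path)
```

Because every file has identical content, any difference in what the identification tools report can be attributed to the extension alone.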
While it's likely that we can recreate the issues of artefactual/archivematica#1104 with other encodings, cp437 seems to be a popular encoding used in the past. We can pretty easily add this code page as part of the other createtransfers.py work.
There are 4 scenarios to effectively test clamAV:
We need a file that is over 20 MB and contains a virus.
It's been a while since the sampledata set was thoroughly reviewed. Many things have been added over time, and it's likely that there's unnecessary repetition within the sample data. At the same time, the feature set in Archivematica has been growing quickly, so there are features for which there is no sample data (see, for example, #31). Even some basic features are not testable with the current sample data (see #28).
We should consider how the auto-generated sample data fits in, and if it needs to be incorporated in a better/more readily apparent way - not sure how to do this, but it warrants consideration.
We could also look at how the sampledata set is deployed to sandbox and testing servers.
Finally, better documentation about the data and what the various transfers are supposed to test would be helpful (probably as part of the README here).
Filenames with strange (non-ASCII, non-UTF8) encodings are currently stored at TestTransfers/files_with_various_encodings/. However, on certain platforms (e.g., Mac OS X 10.13.1) attempting to check out the master branch of this repo triggers an error in git:
error: unable to create file TestTransfers/files_with_various_encodings/big5/?s?{ (Illegal byte sequence)
error: unable to create file TestTransfers/files_with_various_encodings/shift_jis/?ۂ??Ղ郁?C?? (Illegal byte sequence)
error: unable to create file TestTransfers/files_with_various_encodings/windows_1252/s?ster (Illegal byte sequence)
If this issue cannot be overcome by some other means, then these transfers should be created programmatically (e.g., via make rules) in this sampledata repo.
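A sketch of what programmatic creation might look like, which a make rule could then invoke. The sample names below are hypothetical stand-ins, since the real names are not recoverable from the error output above. It assumes a POSIX platform and assumes that the "Illegal byte sequence" failure surfaces in Python as an OSError with errno EILSEQ.

```python
import errno
import os
import tempfile

# Hypothetical sample names, keyed by the encoding directories listed above.
SAMPLES = {
    "big5": "中文",
    "shift_jis": "日本語",
    "windows_1252": "café",
}


def create_encoded_files(root, samples=SAMPLES):
    """Create one file per encoding under root/<encoding>/.

    Returns (created, skipped), two lists of encoding names.
    """
    created, skipped = [], []
    for encoding, name in sorted(samples.items()):
        subdir = os.path.join(os.fsencode(root), encoding.encode("ascii"))
        os.makedirs(subdir, exist_ok=True)
        # Bytes path, so the file name is stored exactly as encoded.
        path = os.path.join(subdir, name.encode(encoding))
        try:
            with open(path, "wb") as handle:
                handle.write(b"sample\n")
        except OSError as err:
            # Assumption: file systems that reject the name raise EILSEQ.
            if err.errno != errno.EILSEQ:
                raise
            print("skipped {}: {}".format(encoding, err))
            skipped.append(encoding)
        else:
            created.append(encoding)
    return created, skipped


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        print(create_encoded_files(root))
```

Generating the files at build time keeps the problematic names out of the git tree, so a checkout succeeds even where some names cannot be created.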
To fail verify transfer compliance (at present) all you need is a single empty folder. We can create a structure like this so that folks can observe that behaviour:
FailTransferCompliance/
├── README.md
└── TransferThisFolder
1 directory, 1 file
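Because git does not track empty directories, this structure may be easier to generate than to commit. A minimal sketch follows; the function name and the README wording are placeholders.

```python
import os
import tempfile

# Placeholder README text; the real wording is up to the maintainers.
README_TEXT = (
    "Transfer the empty TransferThisFolder directory to observe a failure "
    "at verify transfer compliance.\n"
)


def create_fail_transfer_compliance(root):
    """Build FailTransferCompliance/ with a README and one empty folder."""
    base = os.path.join(root, "FailTransferCompliance")
    transfer_dir = os.path.join(base, "TransferThisFolder")
    os.makedirs(transfer_dir, exist_ok=True)
    with open(os.path.join(base, "README.md"), "w") as handle:
        handle.write(README_TEXT)
    return base


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        base = create_fail_transfer_compliance(root)
        print(sorted(os.listdir(base)))
```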
The AM Ansible role has been recently updated so it runs createtransfers.py once the repo is downloaded. I think that we should introduce a Makefile in this repo so the build process details are hidden from the consumers. This would allow us greater flexibility in controlling what's in the build.