demetsiiify's Introduction

demetsiiify is a web service for creating IIIF manifests from METS/MODS documents. It does not store the document images itself, but merely keeps track of the available dimensions, redirecting to the most suitable original resource when requested via the IIIF Image API.

It sports the following features:

  • Included Annotation Server: Users can create and share annotations using the Mirador viewer
  • RESTful API that can be used from scripts and other programs
  • Every ID in the generated manifests is fully dereferenceable (i.e. canvases, ranges, structures, etc.)
  • Exposes the complete set of imported documents as a paginated IIIF collection
  • Rudimentary support for the IIIF Content Search API, which allows searching through user-created annotations by target and date (no full-text search yet)

The service is available at https://demetsiiify.jbaiter.de

To run it on your own machine, make sure that you have up-to-date versions of both docker and docker-compose installed. Then follow these steps:

  1. Run docker-compose up to start the individual services
  2. Run docker-compose run webapp pipenv run manage create to initialise the database

You should then be able to reach the service at http://localhost:5000

Caveats

So far the service has only been tested with METS/MODS documents that comply with the guidelines from the German Research Foundation (DFG), including most of the ~1.6 million digitized volumes available at the Central Directory of Digitized Prints.

If you would like to add support for your own flavor of METS/MODS, feel free to open an issue with a few example documents and I will try to adapt the software accordingly.

demetsiiify's Issues

Wrong URL for generated manifest

List index out of range.

I'm getting this error when I submit a METS document to a locally hosted demetsiiify instance:

  File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/code/demetsiiify/blueprints/api.py", line 127, in api_import
    job_meta = mets.get_basic_info(mets_url)
  File "/code/demetsiiify/mets.py", line 382, in get_basic_info
    doc.read_metadata()
  File "/code/demetsiiify/mets.py", line 162, in read_metadata
    self.metadata['title'] = self._read_titles()
  File "/code/demetsiiify/mets.py", line 148, in _read_titles
    ".//mods:relatedItem[@type='host']/mods:titleInfo")[0]]
IndexError: list index out of range
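
The traceback shows a `[0]` index applied to an XPath result that can be empty when the MODS record has no host item. A minimal sketch of a guarded lookup follows. It is not the project's actual fix: the function name `read_host_title` is hypothetical, and the standard library's `xml.etree.ElementTree` is used instead of lxml only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET

# MODS v3 namespace, bound to the same "mods" prefix used in the traceback
NS = {"mods": "http://www.loc.gov/mods/v3"}


def read_host_title(mods_root):
    """Return the title of the host item, or None if there is no host item.

    Hypothetical helper for illustration, not part of demetsiiify.
    """
    title_infos = mods_root.findall(
        ".//mods:relatedItem[@type='host']/mods:titleInfo", NS)
    # An empty result list would raise IndexError on [0], so check first.
    if not title_infos:
        return None
    title = title_infos[0].find("mods:title", NS)
    return title.text if title is not None else None
```

A caller could then fall back to the document's own title whenever this returns None.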

Import of ZEFYS METS fails

Currently, the digitised newspapers in ZEFYS rely on a somewhat sneaky way to publish METS files dynamically from Solr - which nevertheless seems to work fine with the venerable DFG-Viewer as in e.g. http://zefys.staatsbibliothek-berlin.de/dfg-viewer/?no_cache=1&set%5Bmets%5D=http%3A%2F%2Fzefys.staatsbibliothek-berlin.de%2Foai%2F%3Ftx_zefysoai_pi1%255Bidentifier%255D%3D6ab07f67-b3be-4309-9e5a-e8e97781048b

However, importing the above URL into https://zvdd-ng.de/ results in an error with the stack trace below:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/code/demetsiiify/blueprints/api.py", line 127, in api_import
    job_meta = mets.get_basic_info(mets_url)
  File "/code/demetsiiify/mets.py", line 380, in get_basic_info
    tree = etree.parse(mets_url)
  File "src/lxml/lxml.etree.pyx", line 3427, in lxml.etree.parse (src/lxml/lxml.etree.c:81117)
  File "src/lxml/parser.pxi", line 1811, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:117848)
  File "src/lxml/parser.pxi", line 1837, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:118195)
  File "src/lxml/parser.pxi", line 1741, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:117107)
  File "src/lxml/parser.pxi", line 1138, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:111653)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105109)
  File "src/lxml/parser.pxi", line 706, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:106817)
  File "src/lxml/parser.pxi", line 635, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:105671)
  File "http://zefys.staatsbibliothek-berlin.de/dfg-viewer/?no_cache=1&set%5Bmets%5D=http%3A%2F%2Fzefys.staatsbibliothek-berlin.de%2Foai%2F%3Ftx_zefysoai_pi1%255Bidentifier%255D%3D4b94aaaa-9d1b-4afa-8211-b52720ffc22e", line 184
lxml.etree.XMLSyntaxError: Entity 'nbsp' not defined, line 184, column 27

When the actual METS URL http://zefys.staatsbibliothek-berlin.de/oai/?tx_zefysoai_pi1%5Bidentifier%5D=6ab07f67-b3be-4309-9e5a-e8e97781048b is extracted from the DFG-Viewer URL above and fed directly to https://zvdd-ng.de/ instead, a different error ensues:

Traceback (most recent call last):
  File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.5/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/code/demetsiiify/blueprints/api.py", line 127, in api_import
    job_meta = mets.get_basic_info(mets_url)
  File "/code/demetsiiify/mets.py", line 380, in get_basic_info
    tree = etree.parse(mets_url)
  File "src/lxml/lxml.etree.pyx", line 3427, in lxml.etree.parse (src/lxml/lxml.etree.c:81117)
  File "src/lxml/parser.pxi", line 1811, in lxml.etree._parseDocument (src/lxml/lxml.etree.c:117848)
  File "src/lxml/parser.pxi", line 1837, in lxml.etree._parseDocumentFromURL (src/lxml/lxml.etree.c:118195)
  File "src/lxml/parser.pxi", line 1741, in lxml.etree._parseDocFromFile (src/lxml/lxml.etree.c:117107)
  File "src/lxml/parser.pxi", line 1138, in lxml.etree._BaseParser._parseDocFromFile (src/lxml/lxml.etree.c:111653)
  File "src/lxml/parser.pxi", line 595, in lxml.etree._ParserContext._handleParseResultDoc (src/lxml/lxml.etree.c:105109)
  File "src/lxml/parser.pxi", line 706, in lxml.etree._handleParseResult (src/lxml/lxml.etree.c:106817)
  File "src/lxml/parser.pxi", line 635, in lxml.etree._raiseParseError (src/lxml/lxml.etree.c:105671)
  File "<string>", line 0
lxml.etree.XMLSyntaxError: Unknown encoding ""
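
The extraction of the METS URL from the DFG-Viewer URL mentioned above can be sketched with the standard library. This assumes the viewer always carries the METS location in the `set[mets]` query parameter, as it does in the URL from this report. The function name `extract_mets_url` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def extract_mets_url(viewer_url):
    """Return the METS URL embedded in a DFG-Viewer URL, or None.

    Hypothetical helper for illustration, not part of demetsiiify.
    """
    query = parse_qs(urlsplit(viewer_url).query)
    # parse_qs percent-decodes keys and values exactly once,
    # so the key "set%5Bmets%5D" is looked up as "set[mets]".
    values = query.get("set[mets]")
    return values[0] if values else None


viewer_url = (
    "http://zefys.staatsbibliothek-berlin.de/dfg-viewer/?no_cache=1"
    "&set%5Bmets%5D=http%3A%2F%2Fzefys.staatsbibliothek-berlin.de%2Foai%2F"
    "%3Ftx_zefysoai_pi1%255Bidentifier%255D"
    "%3D6ab07f67-b3be-4309-9e5a-e8e97781048b"
)
print(extract_mets_url(viewer_url))
```

Because the embedded URL is double-encoded in the viewer link, a single round of decoding leaves the `%5B` and `%5D` escapes of the OAI URL intact.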

Don't create IIIF Image Proxy-Service if there's only one image

Currently the app will always generate a proxy Image API info.json and reference that in the manifest for the page image.
In cases where there is only a single image available for a given page, this is not necessary and we should instead just directly put the image URL into the manifest.
In these cases we could also support non-JPEG formats like PNG, since the referenced image is no longer an IIIF Image (which mandates JPEG-support).
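
The proposal could look roughly like the sketch below, which builds the image resource for a IIIF Presentation 2.x canvas. The function name, the shape of the `images` records and the `proxy_service_id` parameter are assumptions made for illustration and do not reflect the existing code in demetsiiify.

```python
def make_image_resource(images, proxy_service_id):
    """Build an image resource dict for one page.

    `images` is assumed to be a list of dicts with the keys
    "url", "width", "height" and "mimetype" (hypothetical structure).
    """
    if not images:
        raise ValueError("page has no images")
    if len(images) == 1:
        # Single image: link to it directly, without an Image API service.
        # Any format the viewer can display is acceptable here.
        only = images[0]
        return {
            "@id": only["url"],
            "@type": "dctypes:Image",
            "format": only["mimetype"],
            "width": only["width"],
            "height": only["height"],
        }
    # Several sizes: advertise the proxy Image API service as before.
    largest = max(images, key=lambda img: img["width"])
    return {
        "@id": "{}/full/full/0/default.jpg".format(proxy_service_id),
        "@type": "dctypes:Image",
        "format": "image/jpeg",
        "width": largest["width"],
        "height": largest["height"],
        "service": {
            "@context": "http://iiif.io/api/image/2/context.json",
            "@id": proxy_service_id,
            "profile": "http://iiif.io/api/image/2/level0.json",
        },
    }
```

With this split, a page that only has a single PNG would yield a resource without a "service" entry, while pages with several derivatives keep the current behaviour.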

Could not import DFG-Viewer example (caused by image URL with redirect?)

While trying to process the BSB example, the METS import fails:

[...]
  File "/code/demetsiiify/mets.py", line 376, in image_info
    'mimetype': server_mime}) from exc
demetsiiify.mets.MetsImportError: Could not open image from http://daten.digitale-sammlungen.de/~db/0002/bsb00020619/images/150/bsb00020619_00015.jpg, likely the server sent corrupt data.
[...]

The image link works in a web browser and redirects to a different web page.

demetsiiify shows an error message with missing information:

 Unfortunately we were unable to generate a IIIF manifest from the METS located at

                                                           .

The error was logged in our backend and will be examined. If you wish to help with debugging, you can consult the traceback below and open an issue on GitHub. 
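
Since the image link reportedly redirects to a web page, one way to narrow this down is to check whether the downloaded bytes look like an image at all before handing them to an image library. The following is only a diagnostic sketch: it assumes the response body is already available as bytes, and `sniff_image_type` is a hypothetical helper that does not exist in demetsiiify.

```python
def sniff_image_type(payload):
    """Guess the image MIME type from the leading bytes, or return None."""
    # JPEG files start with the SOI marker followed by another marker byte.
    if payload.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    # PNG files start with a fixed 8-byte signature.
    if payload.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    # Anything else (for example an HTML page served after a redirect)
    # is reported as unknown.
    return None
```

If this returns None for the body fetched from the image URL, the server most likely answered with something other than the image itself.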

Latest code not working (undefined variables, wrong function call)

The code uses undefined variables mets_doc and manifest_ident and calls _fill_manifest_metadata with only one instead of two arguments.

I had to fix those lines to get something which works partially:

diff --git a/demetsiiify/iiif.py b/demetsiiify/iiif.py
index 7b861db..1aeafc5 100644
--- a/demetsiiify/iiif.py
+++ b/demetsiiify/iiif.py
@@ -175,7 +175,7 @@ def _make_empty_manifest(ident, label):
         current_app.config['SERVER_NAME']))
     manifest_factory.set_iiif_image_info('2.0', 0)
     manifest = manifest_factory.manifest(ident=manifest_ident,
-                                         label=make_label(mets_doc.metadata))
+                                         label=label)
     return manifest
 
 
@@ -212,9 +212,12 @@ def make_manifest(ident, mets_doc, physical_map, thumbs_map):
     :returns:               Generated IIIF manifest
     :rtype:                 dict
     """
+    manifest_ident = '{}://{}/iiif/{}/manifest'.format(
+        current_app.config['PREFERRED_URL_SCHEME'],
+        current_app.config['SERVER_NAME'], ident)
     manifest = _make_empty_manifest(ident=manifest_ident,
                                     label=make_label(mets_doc.metadata))
-    _fill_manifest_metadata(manifest)
+    _fill_manifest_metadata(manifest, mets_doc.metadata)
 
     phys_to_canvas = {}
     seq = manifest.sequence(ident='default')

Setup demetsiiify behind Nginx

I set up the app on a server and have no problem getting the content of, e.g., the root location with curl. In the browser, however, I always get a 404 and can't figure out the nginx configuration that is needed.

My nginx site config is:

server {
  listen [::]:80;
  listen 80;
  server_name foo.bar;

  location /socket.io {
    proxy_pass http://127.0.0.1:5000/socket.io;
    proxy_redirect off;
    proxy_buffering off;
    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
  }

  location / {
    proxy_pass http://127.0.0.1:5000;
    proxy_redirect off;

    proxy_set_header Host $host;
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
  }
}

I set up logging for gunicorn and can see that the calls are passed to the app in the docker container, but all that gets logged is something like:

[2017-06-29 15:04:03 +0000] [7] [DEBUG] GET /
[2017-06-29 15:04:03 +0000] [7] [DEBUG] Closing connection. 
[2017-06-29 15:04:10 +0000] [7] [DEBUG] GET /browse
[2017-06-29 15:04:10 +0000] [7] [DEBUG] Closing connection. 

Since I see that you are also using Nginx for https://zvdd-ng.de/, can you tell me what is going wrong?
