uoregon-libraries / rais-image-server

RAIS: A IIIF-compliant, 100% open source image server for blazing-fast deep zooming

License: Creative Commons Zero v1.0 Universal

Makefile 1.05% Go 91.43% Shell 3.05% C 1.81% HTML 1.39% Dockerfile 1.27%

iiif tile-server jp2 golang-application jpeg2000 image-server zoomable-images

rais-image-server's Introduction

Rodent-Assimilated Image Server

RAIS was originally built by eikeon as a 100% open source, no-commercial-products-required, proof-of-concept tile server for JP2 images within chronam.

It has been updated to allow more command-line options, more source file formats, more features, and conformance to the IIIF spec.

RAIS is very efficient, completely free, and easy to set up and run. See our wiki pages for more details and documentation.

Configuration

Main Configuration Settings

RAIS uses a configuration system that allows environment variables, a config file, and/or command-line flags. See rais-example.toml for an example of a configuration file. RAIS will use a configuration file if one exists at /etc/rais.toml.

The configuration file's values can be overridden by environment variables, while command-line flags will override both configuration files and environment variables. Configuration is best explained and understood by reading the example file above, which describes all the values in detail.
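The override order described above can be sketched in a few lines of Go. This is only an illustration of the precedence rule, not how RAIS actually loads its settings; the resolve function and the sample values are hypothetical, and an empty string stands in for "not set".

```go
package main

import "fmt"

// resolve returns the effective value for one setting, checking sources from
// highest to lowest priority: command-line flag, environment variable, then
// config file. An empty string means "not set" in this sketch.
func resolve(flagVal, envVal, fileVal string) string {
	for _, v := range []string{flagVal, envVal, fileVal} {
		if v != "" {
			return v
		}
	}
	return ""
}

func main() {
	// Hypothetical values for a single setting such as the listen address.
	fmt.Println(resolve("", ":12415", ":8080"))      // environment beats the file
	fmt.Println(resolve(":9000", ":12415", ":8080")) // flag beats both
}
```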

Cloud Settings

Because connecting to a cloud provider is optional, often means using a container-based setup, and differs from one provider to the next, all RAIS cloud configuration is environment-only. This means it can't be specified on the command line or in rais.toml.

Currently RAIS can theoretically support S3, Azure, and Google Cloud backends, but only S3 has had much testing. To set up RAIS for S3, you would have to export the following environment variables (in addition to having an S3-compatible object store running):

AWS_ACCESS_KEY_ID: Required
AWS_SECRET_ACCESS_KEY: Required
AWS_REGION: Required
RAIS_S3_ENDPOINT: optionally set for custom S3 backends; e.g., "minio:9000"
RAIS_S3_DISABLESSL: optionally set this to "true" for custom S3 backends which don't need SSL (for instance if they're running on the same server as RAIS)
RAIS_S3_FORCEPATHSTYLE: optionally set this to "true" to force path-style S3 calls. This is typically necessary for custom S3 backends like minio, but not for AWS.
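As a rough pre-flight check, a small Go program can confirm that the three required variables are present before RAIS is started. This is a sketch only: missingS3Vars is a hypothetical helper, not part of RAIS, and it takes the lookup function as a parameter so it can be exercised without touching the real environment.

```go
package main

import (
	"fmt"
	"os"
)

// missingS3Vars reports which of the required variables are unset or empty.
// lookup is os.Getenv in real use; it is a parameter so the check is testable.
func missingS3Vars(lookup func(string) string) []string {
	required := []string{"AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION"}
	var missing []string
	for _, name := range required {
		if lookup(name) == "" {
			missing = append(missing, name)
		}
	}
	return missing
}

func main() {
	if missing := missingS3Vars(os.Getenv); len(missing) > 0 {
		fmt.Println("missing required S3 settings:", missing)
		return
	}
	fmt.Println("all required S3 settings are present")
}
```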

Other backends have their own environment variables which have to be set in order to have RAIS connect to them.

For a full demo of a working custom S3 backend powered by minio, see docker/s3demo.

Note that external storage is going to be slower than serving images from local filesystems! Make sure you test carefully!

IIIF Features

RAIS supports level 2 of the IIIF Image API 2.1 as well as a handful of features beyond level 2. See the IIIF Features wiki page for an in-depth look at feature support.

Caching

RAIS can internally cache the IIIF info.json requests and individual tile requests. See the RAIS Caching wiki page for details.

Generating tiled, multi-resolution JP2s

RAIS performs best with JP2s which are generated as tiled, multi-resolution (think "zoom levels") images. Generating images like this is fairly easy with either the openjpeg tools or graphicsmagick. Other tools probably do this well, but we've only directly used those.

You can find detailed instructions on the How to encode jp2s wiki page.

License

RAIS Image Server is in the public domain under a CC0 license.

Contributors

Special thanks to Jessica Dussault (@jduss4) for providing the hand-drawn "Gocutus" logo, and Greg Tunink (@techgique) for various digital refinements to said logo.

rais-image-server's Issues

Implement IIIF 2.1 features

This would actually help with some of the current DoS vectors (maximum size being the big one)

Expose a way for plugins to request configuration data

Plugins having to include viper and rely on global state is unnecessary and potentially dangerous (as global state tends to be). Plugins should be able to simply request a configuration value, such as "TracerOut", and the app should return the value, looking it up in config as well as the environment.

Make a dedicated thumbnail cache option?

Caching thumbnails is pretty easy to do externally, so this has never been a priority, but anybody wanting an all-in-one solution might prefer a simple in-memory thumbnail cache. The tile cache is simply nowhere near as valuable as a thumbnail cache would be for large collections.

Let S3 plugins' IDs include a bucket

I don't recall why I wanted a single bucket, but if it wasn't something the AWS APIs made impossible, each S3 request should be able to just dynamically include a bucket. If the format is something we can make unambiguous (s3:bucket:identifier may work), then the lookup should be able to just pull from different buckets as needed.

If that's too crazy, we're probably fine forcing a single bucket, but it's definitely not ideal.
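To get a feel for whether the s3:bucket:identifier format is unambiguous, here is a minimal parsing sketch. The parseS3ID function is hypothetical. It relies on the fact that S3 bucket names cannot contain a colon, so everything after the second colon can be treated as the object key, even if the key itself has colons in it.

```go
package main

import (
	"fmt"
	"strings"
)

// parseS3ID splits an ID of the proposed form "s3:bucket:key". Only the first
// two colons are treated as separators, so the key may contain colons.
func parseS3ID(id string) (bucket, key string, ok bool) {
	parts := strings.SplitN(id, ":", 3)
	if len(parts) != 3 || parts[0] != "s3" || parts[1] == "" || parts[2] == "" {
		return "", "", false
	}
	return parts[1], parts[2], true
}

func main() {
	bucket, key, ok := parseS3ID("s3:bucket1:maps/giovanni.jp2")
	fmt.Println(bucket, key, ok)

	_, _, ok = parseS3ID("s3:giovanni.jp2") // no bucket part, so this is rejected
	fmt.Println(ok)
}
```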

Remove all viper/cobra dependencies

The viper stuff for command-line args is handy, but not worth the half a billion random dependencies it pulls in. Let's figure out a cheaper way to do this. We could parse the toml manually, and probably jam together the ENV and CLI stuff in half a day.

In a very quick test, viper adds 6 megs to a stripped binary, versus about 200k for the toml parser by itself.

Add plugin (and plugin support) for background processing of non-JP2s

When dealing with a huge non-JP2 image, the server can become... sluggish, let's just say. It would be very handy to be able to put all requests for a given resource into some kind of queue and generate a tiled, multi-resolution JP2 before serving up those requests.

The external images plugin sort of does this, but it is definitely not production-ready, it does its processing in a very weird hook (the id-to-iiif call), and it just shells out to the JP2 compression command (though this might be the easiest approach, it's certainly not ideal).

Use JP2 streaming functions instead of file-opening functions

OpenJPEG has API functions for reading from a stream rather than opening a file on disk. Implementing this may not be trivial, but it shouldn't be too bad, and it could open us up to some performance improvements if we wanted to cache a small selection of most recent JP2s in memory / read from S3 directly instead of copying the file / etc.

It would be valuable to do performance testing against S3-streamed vs. in-memory vs. on-disk JP2s if we implement this. If there's not a decent gain, that would be unfortunate, but good to know. If there is, it would be good to rebuild the S3 plugin to stream as well as optionally caching JP2s in RAM for small exhibits that need really fast tiles.

plugin was built with a different version of package go.opencensus.io/resource

I installed the latest RAIS with Go go1.21.0 linux/amd64 and compiled imagick_decoder.so, which completed without any errors. But when I restart RAIS, I see the following error:
INFO - Loading plugin "/usr/local/rais/plugins/imagick-decoder.so"
Aug 11 09:59:48 srv1 rais-server[1262522]: 2023/08/11 09:59:48.706 - rais-server - ERROR - Unable to load "/usr/local/rais/plugins/imagick-decoder.so": cannot load plugin "/usr/local/rais/plugins/imagick-decoder.so": plugin.Open("/usr/local/rais/plugins/imagick-decoder"): plugin was built with a different version of package go.opencensus.io/resource

Make log output pluggable

This could get tricky since logging is so integral to RAIS, and some logs are produced before plugins are even loaded. But it would be very handy to be able to transform the log format prior to final output. For instance, if logs were printed out as structured JSON, we could use them in interesting ways with third-party services.

Add flag for maximum tile size and resize

For resize operations, we probably want to have a maximum pixel dimensions option. Probably just a single flag that neither dimension is allowed to exceed. It might make sense to do the same for tiles, but also maybe a minimum size for tiles - if something weird happened to the JS, a tile size of 4px, for instance, could probably take down the server.
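The check itself would be tiny. Here is a sketch, assuming a single minimum and a single maximum that apply to both dimensions; the limits type and the numbers used are hypothetical, and no such flag exists yet.

```go
package main

import "fmt"

// limits holds hypothetical bounds, in pixels, that neither the width nor the
// height of a requested output may fall outside.
type limits struct {
	min, max int
}

// allows reports whether both dimensions are within bounds.
func (l limits) allows(w, h int) bool {
	return w >= l.min && h >= l.min && w <= l.max && h <= l.max
}

func main() {
	l := limits{min: 64, max: 4096}
	fmt.Println(l.allows(1024, 1024)) // an ordinary tile
	fmt.Println(l.allows(4, 4))       // the pathological 4px tile
	fmt.Println(l.allows(20000, 300)) // one dimension too large
}
```

A real implementation would probably need to exempt tiles at the right and bottom edges of an image, which can legitimately be narrower than the nominal tile size.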

IIIF Image v3

The following is based on https://iiif.io/api/image/3.0/change-log/

Breaking changes
Size

Full is deprecated
max returns maximum size available, but constrained by any of maxWidth, maxHeight, or maxArea
sizeAboveFull is deprecated
sizeByForcedWh is deprecated
sizeByDistortedWh is deprecated (or rather, unneeded, as the feature is the same as sizeByWh)
see below for extra features tasks

Information

@id is replaced by id
@type is replaced by type, is required, value is a version specific string, eg "ImageService3"
profile is a single value, uses the label, eg "level0", "level1", "level2"
update context to v3
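For reference, the Information changes above could look roughly like this as a Go struct. It covers only a minimal subset of the v3 info.json properties, and the infoV3 type, the newInfoV3 helper, and the example ID are made up for illustration rather than taken from RAIS.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// infoV3 holds a minimal subset of the Image API 3.0 info.json properties.
// Note the plain "id" and "type" keys and the single-string profile.
type infoV3 struct {
	Context  string `json:"@context"`
	ID       string `json:"id"`
	Type     string `json:"type"`
	Protocol string `json:"protocol"`
	Profile  string `json:"profile"`
	Width    int    `json:"width"`
	Height   int    `json:"height"`
}

// newInfoV3 fills in the values that are fixed for a level 2 v3 service.
func newInfoV3(id string, width, height int) infoV3 {
	return infoV3{
		Context:  "http://iiif.io/api/image/3/context.json",
		ID:       id,
		Type:     "ImageService3",
		Protocol: "http://iiif.io/api/image",
		Profile:  "level2",
		Width:    width,
		Height:   height,
	}
}

func main() {
	info := newInfoV3("https://example.org/iiif/sample.jp2", 6000, 4000)
	out, err := json.MarshalIndent(info, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```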

Seemingly a type mismatch in the return value of opjStreamRead

In src/openjpeg/image_stream.go, the value returned by opjStreamRead is opjMinusOne64 (type OPJ_UINT64), which appears not to match the declared return type OPJ_SIZE_T on some build systems.

Compilation fails in a local Go environment on macOS Catalina with the Homebrew openjpeg package installed and used as a library. The problem is fixed by changing the return value to opjMinusOneSizeT.

I think it works in the Docker environments (both alpine and golang) because the two types refer to the same underlying type in libopenjp2-7-dev. In my local macOS environment, though, the two types are u_longlong and u_long respectively.

I have opened a pull request to try to fix this.

Look into limiting incoming requests

Go's standard http package is a little too willing to accept requests, so when concurrency reaches absurd levels (say 20+ concurrent requests), all the requests are fighting for very limited resources, which can cause responses to take a very long time.

We could implement a worker queue of some sort in the image processing code, perhaps. Kind of hacky, but it would prevent the server from getting into a death spiral where the slowness causes requests to keep piling up, which makes things even slower, etc.
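One cheap way to try this out is a counting semaphore in front of the handlers. Note that the sketch below is not a worker queue: it swaps in a simpler technique and rejects excess requests with a 503 rather than queueing them. The limiter type and the limitRequests middleware are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
)

// limiter is a counting semaphore built on a buffered channel.
type limiter chan struct{}

func newLimiter(max int) limiter { return make(limiter, max) }

// acquire claims a slot without blocking; it returns false when all are busy.
func (l limiter) acquire() bool {
	select {
	case l <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot claimed by a successful acquire.
func (l limiter) release() { <-l }

// limitRequests rejects requests with a 503 once max are already in flight.
func limitRequests(max int, next http.Handler) http.Handler {
	l := newLimiter(max)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !l.acquire() {
			http.Error(w, "server busy", http.StatusServiceUnavailable)
			return
		}
		defer l.release()
		next.ServeHTTP(w, r)
	})
}

func main() {
	l := newLimiter(1)
	fmt.Println(l.acquire(), l.acquire()) // the second attempt finds no free slot
	l.release()
	fmt.Println(l.acquire())
	_ = limitRequests(4, http.NotFoundHandler()) // wiring example only
}
```

Making acquire block instead of returning false would turn this into a simple queue, at the cost of letting waiting requests pile up.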

Then again, this has never caused problems outside of extreme load testing - real-world use just doesn't get that heavy. Definitely not a major priority.

Add TIFF support

Make this service capable of reading TIFFs as well as JP2s. Maybe other formats, too? If we want this to be useful beyond UO, we need at least TIFF support. I think I saw an institution using PNGs, so that might be worth supporting if it's not too crazy.

Make caching pluggable

It would be excellent if the caching layer were a set of plugin hooks rather than a hard-coded option. The current in-memory 2Q LRU cache could be a default plugin, much like the S3 plugin, but it would be great if we could just drop in plugins at will so that if, for instance, the in-memory cache doesn't find something, a redis cache could be hit, or a filesystem cache, or whatever.
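The fall-through lookup could be as simple as the sketch below. The Cache interface, the chain type, and the map-backed demo cache are all hypothetical; nothing like this exists in the plugin API today, and a real version would also need a way to write to each layer.

```go
package main

import "fmt"

// Cache is a hypothetical read-side interface a cache plugin might satisfy.
type Cache interface {
	Get(key string) ([]byte, bool)
}

// mapCache is a trivial in-memory cache used only for this demo.
type mapCache map[string][]byte

func (m mapCache) Get(key string) ([]byte, bool) {
	v, ok := m[key]
	return v, ok
}

// chain asks each cache in order and returns the first hit.
type chain []Cache

func (c chain) Get(key string) ([]byte, bool) {
	for _, cache := range c {
		if v, ok := cache.Get(key); ok {
			return v, true
		}
	}
	return nil, false
}

func main() {
	memory := mapCache{}                                  // stands in for the 2Q LRU cache
	slower := mapCache{"tile-1": []byte("cached bytes")} // stands in for redis or disk
	caches := chain{memory, slower}

	v, ok := caches.Get("tile-1")
	fmt.Println(string(v), ok)
}
```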

On the flip side, though, there's currently no way to purge all "things" related to a single file (e.g., if you cache half the tiles for a given JP2, then change the JP2, the tiles served by RAIS could be really weird). If we start offering up more long-lived cache options, this might need to be addressed, or at least spelled out very clearly.

Look into partial JPG decoding

libjpeg-turbo/libjpeg-turbo#34

If this does what it sounds like, it could be a huge performance benefit to implement a lower-level JPG decoder that can avoid killing RAM on huge JPG files.

Fix decentralized and not-awesome logging practices!

The main web server logs things one way (using the go log library), while the go-based openjpeg package uses a custom logger which is used to also handle the C openjpeg library errors/warnings/etc. Somewhere, something else is also doing its own logging.

In an ideal world, all logs would include more Apache-like semantics: separation of error vs. access logging (rather than error vs. "standard" logging), error logs including the log level, all logs including a useful datestamp, etc. Additionally, all logging should go through a log-level-filterable system. It could just continue being the uber-simple logger I wrote, so long as changes are made to provide more useful output.

Improve configuration options

Add config file: We already have way too many config flags, which are further painful because they have to be specified on the command line.
Add config options: ImageMagick has a billion ways to configure it, and some may be handy to expose
Allow configuring JPG output quality - most tile servers will use JPG files, and a hard-coded quality is bad

Respect the log level's AUTHORITAH

The middleware that logs requests is using the "logger" package's global default logger. This needs to stop.

This may be the time to do a proper release of the gopkg utility repo so RAIS can pull in a version that doesn't expose global state, ensuring this kind of nonsense stops.

documentation

List of potential tweaks:

RAIS_IIIFURL is still lurking in a few places in the wiki (deprecated in v4)
Docker is discussed on the Installation page, but then again on its own child page, and again (a bit) in the Development guide. The content on the Docker page is probably more current, as it doesn't refer to RAIS_IIIFURL. Potentially merge all the content to the child page and just link to it?
On the docker demo page, it might be helpful to note that images added to docker/images don't appear until the container is restarted.
Development guide has a link to a wiki page that doesn't exist: Setup

Add caching options

Caching for resize operations for sure - tiles probably not for now, though if easy enough, it could be optional

Consider file modification time for tile and info cache

When a file is replaced by another with different dimensions, the cache is a mess and wrecks things until the cache(s) expire. Since we don't expire based on a time, instead opting for a "most popular" kind of cache, the expiration can be seconds, hours, days, weeks, etc.

It might take a bit of work to figure this out, but it would really be ideal to have cached requests actually look at the file just to get a quick stat - if the mod time is more recent than the cached data, don't serve up the cached data.

Of course we don't currently store any kind of cached data timestamp, so that would be the first step to having a smarter cache.
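A minimal sketch of the idea, assuming each cached item stores the time it was cached. The entry type and the fresh function are hypothetical; the demo uses a temporary file and os.Chtimes to simulate a source image being replaced after its tile was cached.

```go
package main

import (
	"fmt"
	"os"
	"time"
)

// entry pairs cached bytes with the time they were stored.
type entry struct {
	data     []byte
	cachedAt time.Time
}

// fresh reports whether the entry was cached at or after the file's last
// modification time.
func fresh(e entry, modTime time.Time) bool {
	return !modTime.After(e.cachedAt)
}

func main() {
	f, err := os.CreateTemp("", "rais-demo-*.jp2")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	f.Close()

	e := entry{data: []byte("tile"), cachedAt: time.Now()}

	// Simulate the source file being replaced after the tile was cached.
	later := e.cachedAt.Add(time.Minute)
	if err := os.Chtimes(f.Name(), later, later); err != nil {
		panic(err)
	}

	st, err := os.Stat(f.Name())
	if err != nil {
		panic(err)
	}
	fmt.Println("serve from cache:", fresh(e, st.ModTime()))
}
```

The extra stat on every cached request is the cost of this approach, so it would be worth measuring before adopting it.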

Add pyramidal TIFF support

It might be useful to at least read the right image within a pyramidal TIFF's image list in order to avoid decoding the huge 30-megapixel image when the 2-megapixel version will suffice for a given request.

IIP already handles pyramidal TIFFs really well, so this isn't something we're likely to implement anytime soon.

Remove double-header writing

I believe this happens when the image server experiences an error decoding, which is typically due to a connection being broken mid-download (closing a browser tab, OpenSeadragon timing out, etc.). Since the first call to Write causes net/http to spit out all headers, an error during that write cannot be propagated in said headers.

At a minimum, errors during the "write to client" phase shouldn't try to re-send headers. However, removing that behavior doesn't actually fix the problem entirely.

Despite this error currently happening only due to client behaviors, the real solution is to verify that the decode works separately from writing out to the client. In the (granted, unlikely) situation that a decode operation fails, that error should be separate from errors writing to the client. The downside here is that we'd need to write to a temporary buffer instead of being able to just stream to the client. For large operations, like full- or max-image requests, this could explode RAM usage.
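The buffered approach might look like the following sketch. The serveEncoded function and the two stand-in encoders are hypothetical; a real encoder would decode the source image and write the JPEG. Everything is held in memory before the first byte goes out, which is exactly the RAM trade-off mentioned above.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"log"
	"net/http"
	"net/http/httptest"
)

// serveEncoded runs encode into a buffer first, so a failure can still
// produce a clean 500 before any header has been sent.
func serveEncoded(w http.ResponseWriter, encode func(io.Writer) error) {
	var buf bytes.Buffer
	if err := encode(&buf); err != nil {
		http.Error(w, "unable to process image", http.StatusInternalServerError)
		return
	}
	w.Header().Set("Content-Type", "image/jpeg")
	if _, err := w.Write(buf.Bytes()); err != nil {
		// Headers are already out; all we can do is log the client failure.
		log.Printf("error writing to client: %v", err)
	}
}

// okEncoder and failEncoder are stand-ins for real decode/encode work.
func okEncoder(w io.Writer) error {
	_, err := w.Write([]byte("fake image bytes"))
	return err
}

func failEncoder(w io.Writer) error {
	return errors.New("decode failed")
}

// statusFor runs serveEncoded against an in-memory response recorder.
func statusFor(encode func(io.Writer) error) int {
	rec := httptest.NewRecorder()
	serveEncoded(rec, encode)
	return rec.Code
}

func main() {
	fmt.Println(statusFor(okEncoder), statusFor(failEncoder))
}
```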

Serving large JPEG files fails after retrieving info.json

The latest Docker images fail to serve large JPEG files after retrieving info.json.
Versions prior to 4.0.1 are not affected.

To reproduce, I followed these steps:

Serve a 10MB JPEG file through RAIS docker image ≥ 4.0.1 (e.g. https://upload.wikimedia.org/wikipedia/commons/f/ff/Pizigani_1367_Chart_10MB.jpg).
Retrieve the info.json for this image. This works once.
Any other call to this image (info, crop, resize) fails with a 500.

Logs:

2021/04/16 07:39:36.256 - rais-server - DEBUG - SchemeMap translated "/images/jpg/sample.jpg" to URL "file:///var/local/images/images/jpg/sample.jpg"
2021/04/16 07:39:36.256 - rais-server - DEBUG - Loading image data from image resource (id: /images/jpg/sample.jpg)
2021/04/16 07:39:36.259 - rais-server - ERROR - Error getting image and/or IIIF Info for "/images/jpg/sample.jpg": 445: 0x7fa544013190 - <nil>
2021/04/16 07:39:36.259 - rais-server - INFO - Request: [172.18.0.2:48168,172.18.0.1] /images%2fjpg%2fsample.jpg/info.json - 500

build fails

I am trying to build rais-server, but I get an error:
jpe@srv1:~/rais-image-server$ make
go run src/transform/generator.go
go fmt src/transform/rotation.go
src/transform/rotation.go
go generate rais/src/version
go build -ldflags="-s -w" -buildmode=plugin -o bin/plugins/json-tracer.so rais/src/plugins/json-tracer
build rais/src/plugins/json-tracer: cannot load io/fs: malformed module path "io/fs": missing dot in first path element
make: *** [Makefile:74: bin/plugins/json-tracer.so] Error 1

Ignore trailing slashes in the IIIF path argument / configuration

Per #20 (comments):

docker run --rm -it --env-file=.env -e RAIS_S3ZONE=us-east-1 -e RAIS_S3BUCKET=ndnp-batches -e RAIS_LOGLEVEL=DEBUG -e RAIS_ADDRESS=":12415" -e RAIS_S3CACHE=/tmp/rais-s3 -e RAIS_IIIFURL=http://localhost/iiif/2/ -p 80:12415 uolibraries/rais

behaves differently from

docker run --rm -it --env-file=.env -e RAIS_S3ZONE=us-east-1 -e RAIS_S3BUCKET=ndnp-batches -e RAIS_LOGLEVEL=DEBUG -e RAIS_ADDRESS=":12415" -e RAIS_S3CACHE=/tmp/rais-s3 -e RAIS_IIIFURL=http://localhost/iiif/2 -p 80:12415 uolibraries/rais

This should not happen.
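One possible fix is to normalize the configured value once at startup. A minimal sketch, with normalizeBase as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeBase strips any trailing slashes so that both spellings of the
// configured IIIF URL compare equal.
func normalizeBase(u string) string {
	return strings.TrimRight(u, "/")
}

func main() {
	a := normalizeBase("http://localhost/iiif/2/")
	b := normalizeBase("http://localhost/iiif/2")
	fmt.Println(a, a == b)
}
```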

S3 identifier prefix

I'm not sure whether this is an update for the documentation or a feature request, but I wanted to set up an instance of RAIS which only serves images from S3 and ended up having to read the source to learn that I have to prefix the IIIF image identifier with s3:. It's not the end of the world, but it feels like an implementation detail I'd prefer not to leak into public URLs.

Make ImageMagick optional

Adding ImageMagick dependencies means we have a lot of extra overhead that isn't necessary most of the time. If a server has huge PNG/BMP/JPG images, it's unreasonable to decode that stuff in full on every tile request. If it has small non-JP2 images, the built-in Go image decoding is probably acceptable.

Therefore, ImageMagick should be available, but not default. This means:

Make the whole decoder situation just plugin-based - it's already practically there since decoders are registered per file type
Make ImageMagick decoding into a plugin
Change the alpine image to strictly use JP2
The huge Fedora image won't change enough to care about dropping ImageMagick, I'm betting, though it is probably worth looking into. If we remove it, though, it's probably worth having a "RAIS full" docker image tag so that there is an easy and efficient way to serve up any image type.

Unescape %2F in the IIIF identifier part of the path?

It seems that when Apache proxies, requests which include a query arg end up escaping the "%2F" to make it "%252F", so that when it's unescaped and sent to RAIS, it's "%2F" rather than "/". Oddly this seems to only happen when a query arg is present.

It may be necessary to make RAIS unescape %2F itself. This would allow more consistent parsing, because then (with nocanon turned off on the Apache rule), all incoming URLs appear to be escaped equally. But are there dangerous edge cases here? I can't imagine anybody actually using "%2F" in a filename, but you never know. Though presumably that would get encoded to "%252F" so it would still work. Right?

This needs some careful consideration.
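The filename edge case can be checked with Go's net/url package, which escapes a literal percent sign as %25. The sketch below only demonstrates the round trip for a single path segment; it says nothing about what Apache does in between, and encodeID and decodeID are hypothetical wrappers.

```go
package main

import (
	"fmt"
	"net/url"
)

// encodeID escapes an identifier for use as a single URL path segment.
func encodeID(id string) string {
	return url.PathEscape(id)
}

// decodeID reverses encodeID by unescaping exactly once.
func decodeID(segment string) (string, error) {
	return url.PathUnescape(segment)
}

func main() {
	for _, id := range []string{"dir/file.jp2", "odd%2Fname.jp2"} {
		enc := encodeID(id)
		dec, err := decodeID(enc)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s -> %s -> %s\n", id, enc, dec)
	}
}
```

As long as each layer unescapes exactly once, a filename that really contains "%2F" survives. The trouble described above comes from a layer escaping a second time.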

Use a regex more specific for 'pct:'

The regex at https://github.com/uoregon-libraries/rais-image-server/blob/master/src/iiif/url.go#L28 could be more specific, for example:

pct:((100|[1-9]?\d(\.\d+)?),){3}(100|[1-9]?\d(\.\d+)?)
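A quick way to see what the proposed pattern accepts is to run it through Go's regexp package. The ^ and $ anchors are an addition for this sketch so that the whole string has to match, and isPctRegion is a hypothetical helper, not the function in url.go.

```go
package main

import (
	"fmt"
	"regexp"
)

// pctRegion is the proposed pattern, anchored so the whole string must match.
var pctRegion = regexp.MustCompile(`^pct:((100|[1-9]?\d(\.\d+)?),){3}(100|[1-9]?\d(\.\d+)?)$`)

// isPctRegion reports whether s is a percent-based region with four values
// between 0 and 100.
func isPctRegion(s string) bool {
	return pctRegion.MatchString(s)
}

func main() {
	for _, s := range []string{
		"pct:0,0,100,100",
		"pct:10.5,20,30,40.25",
		"pct:101,0,50,50",
		"pct:10,20,30",
	} {
		fmt.Printf("%-22s %v\n", s, isPctRegion(s))
	}
}
```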

Add an ID prefix mapping configuration / plugin option

Use case: I don't want to expose internal information about my images. But I don't want to map every possible image ID to something opaque.

Create a prefix mapper. Could be configuration that just points to a file, or it could be a plugin. Config may be easier. Could even be 100% config and have no file, just a list of mappings. May need both to cover complex setups, though.
- May be worth redefining the id-to-path plugin hook to be something more like "translateID" so it's usable by other plugins and can be way more cool if necessary. Then config can be fairly simple but a plugin could add more complex use-cases.
For the built-in translation, we just use prefixes. If an id string has a prefix of one of the maps, we replace the prefix with its mapping.

This won't look great as an ENV variable, but if we want it to be extremely easy to configure, it needs to be configurable in the environment. e.g.:

export RAIS_ID_MAPS="collection1-=s3:bucket1:||collection2-=s3:bucket2:"

This makes any ID starting with collection1- translate under the hood to s3:bucket1:, so that, e.g., collection1-giovanni.jp2 is what the public would see if inspecting the code, but s3:bucket1:giovanni.jp2 is what RAIS resolves.
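Parsing that format and applying it is straightforward. The sketch below assumes "||" separates mappings and the first "=" separates a prefix from its replacement, as in the example value; idMap, parseIDMaps, and translate are hypothetical names, and RAIS_ID_MAPS is only a proposal at this point.

```go
package main

import (
	"fmt"
	"strings"
)

// idMap is one public-prefix to internal-prefix mapping.
type idMap struct {
	prefix, target string
}

// parseIDMaps reads mappings of the form "prefix=target" separated by "||".
func parseIDMaps(s string) ([]idMap, error) {
	var maps []idMap
	for _, pair := range strings.Split(s, "||") {
		prefix, target, ok := strings.Cut(pair, "=")
		if !ok || prefix == "" {
			return nil, fmt.Errorf("invalid mapping %q", pair)
		}
		maps = append(maps, idMap{prefix, target})
	}
	return maps, nil
}

// translate replaces the first matching prefix; unmatched IDs pass through.
func translate(id string, maps []idMap) string {
	for _, m := range maps {
		if strings.HasPrefix(id, m.prefix) {
			return m.target + strings.TrimPrefix(id, m.prefix)
		}
	}
	return id
}

func main() {
	maps, err := parseIDMaps("collection1-=s3:bucket1:||collection2-=s3:bucket2:")
	if err != nil {
		panic(err)
	}
	fmt.Println(translate("collection1-giovanni.jp2", maps))
	fmt.Println(translate("unmapped.jp2", maps))
}
```

Mappings are checked in the order given, so if one prefix is itself a prefix of another, the more specific one would need to come first.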

Add administrative API

Add functions that can be called via HTTP requests:

Purge in-memory cache. This shouldn't affect enough of the app to worry about granularity, at least not with the current setup. It should just clear both the info and tile caches.
Purge caches for a given identifier. This would take some doing, and right now isn't possible (the tile cache is cached by the IIIF request, and nothing indexes all cached tiles by identifier). Eventually, though, it could be that we have multiple caches from different sources: info cache, tile cache, S3 download cache, tiles cached on disk, etc. All caches should get a signal when a given IIIF ID needs its cache purged.
Health / liveness probes? RAIS is simple enough that if the service is running, it is almost guaranteed to be both healthy and live. But these may be worth considering.

The admin API could listen on a separate address pretty easily so it could easily be exposed just to internal users.
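As a rough sketch of the first item, a purge-everything endpoint on its own handler might look like this. The caches type uses plain maps in place of the real LRU caches, and the /admin/cache/purge path and the 127.0.0.1:12416 address are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"sync"
)

// caches stands in for the info and tile caches; plain maps are used here.
type caches struct {
	mu    sync.Mutex
	info  map[string][]byte
	tiles map[string][]byte
}

func newCaches() *caches {
	return &caches{info: map[string][]byte{}, tiles: map[string][]byte{}}
}

// purge drops everything from both caches.
func (c *caches) purge() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.info = map[string][]byte{}
	c.tiles = map[string][]byte{}
}

// adminHandler exposes the purge operation over HTTP (POST only).
func adminHandler(c *caches) http.Handler {
	mux := http.NewServeMux()
	mux.HandleFunc("/admin/cache/purge", func(w http.ResponseWriter, r *http.Request) {
		if r.Method != http.MethodPost {
			http.Error(w, "POST required", http.StatusMethodNotAllowed)
			return
		}
		c.purge()
		w.WriteHeader(http.StatusNoContent)
	})
	return mux
}

func main() {
	c := newCaches()
	c.tiles["some-tile-request"] = []byte("jpeg bytes")
	c.purge()
	fmt.Println("tiles cached after purge:", len(c.tiles))

	// In a server, this handler would get its own listener, for example:
	//   go http.ListenAndServe("127.0.0.1:12416", adminHandler(c))
	_ = adminHandler(c)
}
```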

Server crashes on exit

This has been going on for a bit, and is probably related to the cleanup code. It doesn't really cause problems since it's only crashing after you request RAIS to exit... but it still is really annoying.

Get URL from Apache / nginx environment

There should be headers that tell us the full URL, including the scheme. Previously I was concerned we'd only get the hostname, but I think that isn't correct. Needs some investigation, but if we can ditch the stupid --iiif-url setting (or rather, make it optional), that would be lovely.
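A sketch of what that could look like, assuming the proxy sets the conventional X-Forwarded-Proto and X-Forwarded-Host headers. Whether a given Apache or nginx setup sends them depends on its configuration, and they should only be trusted when RAIS is reachable solely through the proxy. The baseURL function is hypothetical.

```go
package main

import (
	"fmt"
	"net/http"
)

// baseURL builds scheme://host from proxy headers, falling back to the
// request's own values. header is r.Header.Get in real use.
func baseURL(header func(string) string, host string, isTLS bool) string {
	scheme := "http"
	if isTLS {
		scheme = "https"
	}
	if proto := header("X-Forwarded-Proto"); proto != "" {
		scheme = proto
	}
	if fwdHost := header("X-Forwarded-Host"); fwdHost != "" {
		host = fwdHost
	}
	return scheme + "://" + host
}

func main() {
	r, err := http.NewRequest("GET", "http://rais:12415/iiif/sample.jp2/info.json", nil)
	if err != nil {
		panic(err)
	}
	r.Header.Set("X-Forwarded-Proto", "https")
	r.Header.Set("X-Forwarded-Host", "images.example.org")

	fmt.Println(baseURL(r.Header.Get, r.Host, r.TLS != nil))
}
```

With more than one proxy in the chain, these headers can hold comma-separated lists, which this sketch does not handle.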
