preview-server

A docker container to produce PNG image previews for common file types. This container is intended to be used as part of a larger application stack.

The container uses monit to execute and monitor a Python async HTTP server as well as soffice.bin (via unoconv), which is used for office document conversion. For other file formats, the preview service uses libav, gslib, imagemagick-wand and PIL.

The focus of this project is to provide a preview success rate as close as possible to 100%. This is achieved by careful testing and error handling. For example, soffice.bin is restarted if it consumes too much memory. A healthcheck ensures soffice.bin is available (restarting it if not). Also, the preview service will retry requests to soffice.bin in order to recover from conversion errors.

Usage

You should pull the latest stable image from DockerHub and run it like this:

$ make small

Four flavors are provided:

  • small, just two containers, preview-server and preview-soffice.
  • medium, preview-server, haproxy and 3 preview-soffice replicas.
  • large, two preview-server replicas, haproxy, nginx, prometheus, grafana and 5 preview-soffice replicas.
  • dev, same as large but also enables reloading.

Once the service has initialised, files, paths, or URLs can be sent to the /preview/ endpoint, and a PNG preview is returned. Using paths requires additional configuration; see Options below for more information. Optional width and height arguments can be sent with the request to control the size of the returned preview image.

$ curl -o out.png -F '[email protected]' http://localhost:3000/preview/
$ curl -o out.png -F 'url=http://somedomain.com/some-pdf' http://localhost:3000/preview/
$ curl -o out-small.png -F 'width=100' -F 'height=50' -F '[email protected]' http://localhost:3000/preview/
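
The same requests can be made from Python. The following is a minimal sketch using only the standard library; the helper names build_multipart and request_preview are hypothetical, and the form field names (file, url, width, height) are taken from the curl examples above.

```python
import uuid
import urllib.request


def build_multipart(fields, files):
    """Encode form fields and files as multipart/form-data.

    fields: dict of name -> str
    files: dict of name -> (filename, bytes)
    Returns (body_bytes, content_type_header_value).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode()
        )
    for name, (filename, data) in files.items():
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"; '
            f'filename="{filename}"\r\n'
            f'Content-Type: application/octet-stream\r\n\r\n'
        ).encode()
        parts.append(header + data + b'\r\n')
    parts.append(f'--{boundary}--\r\n'.encode())
    return b''.join(parts), f'multipart/form-data; boundary={boundary}'


def request_preview(endpoint, fields, files=None):
    """POST to the /preview/ endpoint and return the response body."""
    body, content_type = build_multipart(fields, files or {})
    req = urllib.request.Request(
        endpoint, data=body, headers={'Content-Type': content_type})
    with urllib.request.urlopen(req) as resp:
        return resp.read()
```

For example, request_preview('http://localhost:3000/preview/', {'url': 'http://somedomain.com/some-pdf', 'width': '100', 'height': '50'}) would correspond to the URL-based curl call with explicit dimensions.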

Options

A number of features are controlled by environment variables.

PVS_FILES - This option tells the preview service where files are located. When it is set, the service can be sent a path instead of a POSTed file body or URL. The given path is interpreted relative to PVS_FILES.

For example, below the file located at /mnt/files/path/to/file.doc will be previewed.

$ docker run -d -p 3000:3000 --tmpfs /tmp \
    -v /mnt/files:/mnt/files -e PVS_FILES=/mnt/files \
    btimby/preview-server

$ curl -o out.png -F 'path=/path/to/file.doc' \
    -F 'width=200' -F 'height=100' http://localhost:3000/preview/
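
How a requested path might be mapped onto PVS_FILES can be sketched as follows. This is an illustration only, not the service's actual code: the function name resolve_path is hypothetical, and rejecting paths that escape the root is an assumption about sensible behaviour rather than something the README states.

```python
import os


def resolve_path(files_root, requested):
    """Resolve a requested path relative to files_root.

    A leading slash is ignored, so '/path/to/file.doc' and
    'path/to/file.doc' map to the same file under files_root.
    Raises ValueError if the result would fall outside files_root.
    """
    root = os.path.realpath(files_root)
    candidate = os.path.realpath(os.path.join(root, requested.lstrip('/')))
    if os.path.commonpath([root, candidate]) != root:
        raise ValueError('path escapes the files root')
    return candidate
```

With PVS_FILES=/mnt/files, the request path /path/to/file.doc resolves to /mnt/files/path/to/file.doc, matching the example above.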

PVS_CACHE_CONTROL - This option controls the Cache-Control header emitted by the service. When omitted, the header is suppressed. When present, it controls how long previews should be cached. The value should be an interval such as 15m or 1h.
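
To make the interval format concrete, here is a small sketch of how such a value could be turned into a header. The function names are hypothetical. The set of accepted unit suffixes and the use of the max-age directive are assumptions; the README only shows 15m and 1h and does not say which directives the service emits.

```python
import re

# Assumed unit suffixes; the README only gives "m" and "h" as examples.
_UNITS = {'s': 1, 'm': 60, 'h': 3600, 'd': 86400}


def parse_interval(value):
    """Convert an interval such as '15m' or '1h' to seconds."""
    match = re.fullmatch(r'(\d+)([smhd])', value.strip().lower())
    if match is None:
        raise ValueError(f'invalid interval: {value!r}')
    return int(match.group(1)) * _UNITS[match.group(2)]


def cache_control_header(value):
    """Return response headers; empty when the option is unset."""
    if not value:
        return {}
    return {'Cache-Control': f'max-age={parse_interval(value)}'}
```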

PVS_STORE - By default, generated previews are ephemeral. If you wish to store the previews so that they are not regenerated in future requests, you can do so using this option. This option is required by PVS_X_ACCEL_REDIR. The value should be the path to a volume you mount for this purpose.

This can be used as a cache mechanism, for example by using tmpfs. Optionally, you can provide a file system (even a shared file system) for long-term storage. When combined with PVS_FILES, the file's mtime is compared to the preview's mtime. If the source file is newer, the preview is regenerated. This option has no effect for POSTed or downloaded files.

For example, below the host's /mnt/store directory or device will be used to store generated previews. The second call to curl will be much faster as it will simply return the preview generated in the first call.

$ docker run -d -p 3000:3000 --tmpfs /tmp \
    -v /mnt/files:/mnt/files -e PVS_FILES=/mnt/files \
    -v /mnt/store:/mnt/store -e PVS_STORE=/mnt/store \
    btimby/preview-server

$ curl -o out.png -F 'path=/path/to/file.doc' http://localhost:3000/preview/
$ curl -o out.png -F 'path=/path/to/file.doc' http://localhost:3000/preview/
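
The mtime comparison described for PVS_STORE can be sketched as below. The function name needs_regeneration is hypothetical, and treating a missing stored preview as stale is an assumption.

```python
import os


def needs_regeneration(source_path, preview_path):
    """Return True when no stored preview exists or the source is newer."""
    try:
        preview_mtime = os.path.getmtime(preview_path)
    except FileNotFoundError:
        # Nothing stored yet, so a preview must be generated.
        return True
    return os.path.getmtime(source_path) > preview_mtime
```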

PVS_X_ACCEL_REDIR - This option offloads file transfers to nginx. It requires that PVS_STORE be configured and that the volume be shared with nginx. The value should be the URI of the location in the nginx configuration file.

https://www.nginx.com/resources/wiki/start/topics/examples/xsendfile/
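
As an illustration of the idea, the sketch below maps a stored preview to the URI that would be placed in the X-Accel-Redirect header. The function name and the example values (/mnt/store, /protected/) are hypothetical; on the nginx side, the matching location would typically be marked internal and point at the shared volume.

```python
import posixpath


def accel_redirect_headers(store_root, redir_uri, stored_path):
    """Map a file under store_root to a URI under the nginx location."""
    rel = posixpath.relpath(stored_path, store_root)
    if rel == '..' or rel.startswith('../'):
        raise ValueError('path is outside the store')
    return {'X-Accel-Redirect': posixpath.join(redir_uri, rel)}
```

With PVS_STORE=/mnt/store and PVS_X_ACCEL_REDIR=/protected/, a preview stored at /mnt/store/ab/cd.png would be handed to nginx as /protected/ab/cd.png.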

PVS_DEFAULT_WIDTH & PVS_DEFAULT_HEIGHT - These options provide the default width and height of generated PNG previews. If the caller omits width and height parameters to the service, these defaults are used.

PVS_MAX_WIDTH & PVS_MAX_HEIGHT - These options provide the maximum allowable width and height that a user can request.
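
The interaction of the default and maximum settings can be sketched as follows. The README does not say whether an oversized request is clamped or rejected; this sketch assumes clamping. The function name and the numeric defaults are placeholders, not the service's real values.

```python
def resolve_size(width, height, default=(320, 240), maximum=(800, 600)):
    """Apply defaults for missing values, then clamp to the maximum.

    default and maximum are (width, height) pairs standing in for
    PVS_DEFAULT_* and PVS_MAX_*; the numbers here are placeholders.
    """
    w = default[0] if width is None else int(width)
    h = default[1] if height is None else int(height)
    return min(w, maximum[0]), min(h, maximum[1])
```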

PVS_LOGLEVEL & PVS_HTTP_LOGLEVEL - These options control the log output generated by the preview service. The first applies to the service in general, the second to aiohttp's access log.

PVS_METRICS - Enable Prometheus metrics at the /metrics/ endpoint. The large flavor bundles preconfigured Prometheus and Grafana. Grafana provisioning is used to bundle a prebuilt dashboard. Grafana is at http://localhost:3001 and Prometheus is at http://localhost:9090/.

PVS_RELOAD - Enable reloading of preview-server when source code changes.

PROFILE_PATH - Enable code profiling, runtime stats will be stored here when preview-server stops.

MAX_FILE_SIZE - Limit the size (in bytes) of files that can be previewed [default: unlimited].

MAX_PAGES - Limit the number of pages included in preview [default: unlimited]. Users can request page ranges or "all"; however, this limit will still be enforced.

PVS_PORT - The port that the preview-server binds within the container.

PVS_UID - The UID to use for preview-server and preview-soffice. This may be necessary to ensure that they can access volumes.

PVS_GID - The GID to use for preview-server and preview-soffice. This may be necessary to ensure that they can access volumes.

PVS_SOFFICE_ADDR - Used by preview-server to connect to soffice. Used by soffice for bind.

PVS_SOFFICE_PORT - Used by preview-server to connect to soffice. Used by soffice for bind.

PVS_SOFFICE_TIMEOUT - Control how long to wait for a response from soffice before retrying.

PVS_SOFFICE_RETRY - Control how many times to retry connection to soffice before failing.
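
The way a timeout and a retry count usually combine can be sketched with asyncio. This is not the service's implementation: call_with_retry and its convert argument are hypothetical, and treating the retry count as "attempts after the first" is an assumption.

```python
import asyncio


async def call_with_retry(convert, timeout, retries):
    """Await convert() until it succeeds or the retries are used up.

    convert: zero-argument coroutine function (stand-in for a soffice call)
    timeout: seconds to wait for each attempt (cf. PVS_SOFFICE_TIMEOUT)
    retries: extra attempts after the first (cf. PVS_SOFFICE_RETRY)
    """
    last_error = None
    for _ in range(max(retries, 0) + 1):
        try:
            return await asyncio.wait_for(convert(), timeout)
        except (asyncio.TimeoutError, ConnectionError) as exc:
            last_error = exc
    raise last_error
```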

Error tracking with Sentry

You can enable sentry error tracking by setting some environment variables:

SENTRY_DSN - The DSN for your sentry server.

SENTRY_RELEASE - The release to include with error reports. This defaults to the git hash of preview-server.

SENTRY_ENVIRONMENT - The environment to include with error reports. There is no default for this.

Development

To build, run:

$ make dev

Once the service is initialized, you can test it using:

  • The stress testing tool: make test
  • The interactive test: http://localhost:3000/test/

License

MIT, see LICENSE.

Contributors

btimby, jalohse, pudo, sunu

preview-server's Issues

OfficeBackend emits error when converting uploaded file

The following error occurs because an upload is in /tmp which is not accessible to OO. In this case, we need to pass a file-like object and use private:stream / stdin to do the conversion.

preview-server_1  | Office probably died. Unsupported URL <file:///tmp/tmpduzfg5y2.pptx>: "type detection failed"
preview-server_1  | OfficeBackend.preview('/mnt/files/agreement.docx', 100, 124) took 9.370164s
nginx_1           | nginx.1    | preview 172.27.0.1 - - [01/Oct/2019:06:44:13 +0000] "GET /preview/?path=candea.pptx&width=103&height=310 HTTP/1.1" 200 2308 "-" "Python/3.6 aiohttp/4.0.0a0"
nginx_1           | nginx.1    | preview 172.27.0.1 - - [01/Oct/2019:06:44:13 +0000] "GET /preview/?path=agreement.docx&width=100&height=124 HTTP/1.1" 200 3738 "-" "Python/3.6 aiohttp/4.0.0a0"
preview-server_1  | improper image header `/tmp/tmpkbqabg2h.png' @ error/png.c/ReadPNGImage/3954
preview-server_1  | Traceback (most recent call last):
preview-server_1  |   File "/app/preview/__main__.py", line 160, in preview
preview-server_1  |     status, path = 200, await generate(path, format, width, height)
preview-server_1  |   File "/usr/lib/python3.6/concurrent/futures/thread.py", line 56, in run
preview-server_1  |     result = self.fn(*self.args, **self.kwargs)
preview-server_1  |   File "/app/preview/preview.py", line 43, in generate
preview-server_1  |     store_path = Backend.preview(path, width, height)
preview-server_1  |   File "/app/preview/preview.py", line 34, in preview
preview-server_1  |     return obj.preview(path, width, height)
preview-server_1  |   File "/app/preview/utils.py", line 53, in inner
preview-server_1  |     return f(*args, **kwargs)
preview-server_1  |   File "/app/preview/backends/office.py", line 155, in preview
preview-server_1  |     return PdfBackend().preview(t.name, width, height)
preview-server_1  |   File "/app/preview/utils.py", line 53, in inner
preview-server_1  |     return f(*args, **kwargs)
preview-server_1  |   File "/app/preview/backends/pdf.py", line 43, in preview
preview-server_1  |     return ImageBackend().preview(t.name, width, height)
preview-server_1  |   File "/app/preview/utils.py", line 53, in inner
preview-server_1  |     return f(*args, **kwargs)
preview-server_1  |   File "/app/preview/backends/image.py", line 43, in preview
preview-server_1  |     return resize_image(path, width, height)
preview-server_1  |   File "/app/preview/backends/image.py", line 17, in resize_image
preview-server_1  |     with Image(filename=path, resolution=300) as s:
preview-server_1  |   File "/usr/local/lib/python3.6/dist-packages/wand/image.py", line 8212, in __init__
preview-server_1  |     units=units)
preview-server_1  |   File "/usr/local/lib/python3.6/dist-packages/wand/image.py", line 8686, in read
preview-server_1  |     self.raise_exception()
preview-server_1  |   File "/usr/local/lib/python3.6/dist-packages/wand/resource.py", line 240, in raise_exception
preview-server_1  |     raise e
preview-server_1  | wand.exceptions.CorruptImageError: improper image header `/tmp/tmpkbqabg2h.png' @ error/png.c/ReadPNGImage/3954
preview-server_1  | OfficeBackend.preview('/mnt/files/sample.docx', 677, 242) took 12.502673s

Add MAX_SIZE

Add a config option for MAX_SIZE. If a preview is requested for a file larger than MAX_SIZE, return an error image instead.

.ico files are not working.

preview-server emits an error when trying to preview a .ico file.

MissingDelegateError: no decode delegate for this image format `' @ error/constitute.c/ReadImage/504
  File "preview/__init__.py", line 265, in handler
    await generate(obj)
  File "concurrent/futures/thread.py", line 56, in run
    result = self.fn(*self.args, **self.kwargs)
  File "preview/preview.py", line 60, in generate
    Backend.preview(obj)
  File "preview/preview.py", line 46, in preview
    return _preview(be, obj)
  File "preview/preview.py", line 30, in _preview
    be.preview(obj)
  File "preview/backends/base.py", line 25, in preview
    return method(obj)
  File "preview/utils.py", line 55, in inner
    return f(*args, **kwargs)
  File "preview/backends/image.py", line 79, in _preview_image
    path = resize_image(obj.src.path, obj.width, obj.height)
  File "preview/backends/image.py", line 22, in resize_image
    with Image(filename=path, resolution=300) as s:
  File "wand/image.py", line 8368, in __init__
    units=units)
  File "wand/image.py", line 8841, in read
    self.raise_exception()
  File "wand/resource.py", line 230, in raise_exception
    raise e

https://sentry.lumanox.io/organizations/lumanox/issues/2885/?project=14&query=is%3Aunresolved

Modify cleanup routine.

Currently the cleanup routine allows the user to configure a max age. This is less useful than allowing a max_size. Max_size would allow the storage used by preview server to be capped. This would require introducing an interval at which to perform cleanup. Add the following two options:

PVS_CLEANUP_MAX_SIZE=100g
PVS_CLEANUP_INTERVAL=3h

This way the preview server can prune the oldest files to bring storage usage below max_size.
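
A possible shape for the size-based pruning is sketched below. The function name prune_store is hypothetical, and using mtime to decide which files are "oldest" is an assumption. Parsing a value such as 100g into bytes and scheduling the call every PVS_CLEANUP_INTERVAL are left out.

```python
import os


def prune_store(root, max_bytes):
    """Delete the oldest files under root until usage is <= max_bytes.

    Returns the list of removed paths, oldest first.
    """
    entries = []
    for dirpath, _dirs, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)
            entries.append((st.st_mtime, st.st_size, path))
    total = sum(size for _mtime, size, _path in entries)
    removed = []
    for _mtime, size, path in sorted(entries):  # oldest mtime first
        if total <= max_bytes:
            break
        os.remove(path)
        total -= size
        removed.append(path)
    return removed
```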

Reinstate cleanup()

Rewrite cleanup() to delete files that are a configured age or older. This should be much less likely to delete a file that is being served by nginx.

Exactly scale image output

When given a width and height the resulting image should be exactly those dimensions. First scale the image down to that bounding box while preserving aspect ratio, then pad the image with transparency to meet the requested dimensions.
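
The geometry of "fit, then pad" can be worked out independently of the imaging library. The sketch below uses a hypothetical function name and computes the scaled size and the offsets at which the scaled image would be placed on a transparent canvas of the requested size. Centering the image, and scaling smaller images up to fit, are both assumptions.

```python
def fit_and_pad(src_w, src_h, box_w, box_h):
    """Return (scaled_w, scaled_h, offset_x, offset_y).

    The source is scaled to fit inside the box with its aspect ratio
    preserved; the offsets center it on a box_w x box_h canvas.
    """
    scale = min(box_w / src_w, box_h / src_h)
    new_w = max(1, min(box_w, round(src_w * scale)))
    new_h = max(1, min(box_h, round(src_h * scale)))
    return new_w, new_h, (box_w - new_w) // 2, (box_h - new_h) // 2
```

For a 1000x500 source and a 200x200 request, this gives a 200x100 image placed 50 pixels from the top, with the remaining rows left transparent.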

Remove jwilder-nginx

I believe stock nginx with resolver will now work with multiple backends and autoscale.

Use aiofiles for FS interface

A slow filesystem will affect preview-server. A lot of file access is done in a threadpool already, however, many stat calls etc are performed in a blocking fashion.

Add aiofiles library and convert all file access to asyncio.

Allow caller to opt out of storage

Allow user to disable storage for a preview request. This can be used to bypass caching or to not store the generated preview (for a one-off).

Allow icon redirection

Instead of resizing an icon and returning it, allow a base url (ICON_BASE_URL) to be configured.

With this URL, the browser can be redirected by raising a 301 response. Be sure to set the cache-control header according to config.

Additionally, use an LRU to cache the best file for a given dimension and extension.

Integrate with nginx

Allow nginx integration by emitting an X-Accel-Redirect header instead of the image file. Assuming the storage location is shared with nginx, this will allow it to offload the transfer and cache the image data.

Comprehensive file support

The extension lists are by no means comprehensive. Need to investigate all supported formats and ensure they are present in the extension lists. Also need to widen the file types present in fixtures.

Add limit to soffice backend

Provide a configuration option to limit the size of the threadpool used to run soffice conversions. In the absence of haproxy, this should be set to 1.

Implement storage

Build a unique key from the given path and options. Use the key to look up the image in the storage location. If the image exists, serve it; otherwise create it, store it, and then serve it.
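
One way to derive such a key is to hash the path together with the options. In the sketch below, the function names, the choice of SHA-256, the option set (width, height, format) and the directory fan-out are all assumptions made for illustration.

```python
import hashlib


def storage_key(path, width, height, fmt='png'):
    """Derive a stable key from the source path and preview options."""
    # NUL cannot appear in a path, so it is a safe field separator.
    raw = '\0'.join([path, str(width), str(height), fmt])
    return hashlib.sha256(raw.encode('utf-8')).hexdigest()


def storage_path(store_root, key, fmt='png'):
    """Place the preview under a two-level fan-out of the key prefix."""
    return f'{store_root}/{key[:2]}/{key[2:4]}/{key}.{fmt}'
```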

Support PDF output

Allow document types to be previewed as a single page PDF instead of image only.

Reinstate source change reloading

docker/preview/restart uses inotifywait to touch a file that used to cause monit to restart the service.

I removed monit. I need to integrate this script into docker/preview/start so that inotifywait instead calls a function to HUP or restart the preview server.

Consolidate backends.

The proxy plugin currently uses two backends. One for authenticated requests and one for anonymous requests. The authenticated backend is capable of serving anonymous requests, but the url must be rewritten. Change configuration to accept a single backend:

PROXY_AUTH_UPSTREAM=http://foo.bar/
PROXY_ANON_UPSTREAM=http://quux.baz/
# vs.
PROXY_UPSTREAM=http://foo.bar/

The anonymous endpoint then must rewrite a URL such as /s0f9d0s9fs/file.name?preview=true to /link/s0f9d0s9fs/file.name?preview=true, allowing the same backend to handle all requests.
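
The rewrite itself is a small path transformation. A sketch with a hypothetical function name, assuming the /link prefix is simply prepended to the path and the query string is kept unchanged:

```python
from urllib.parse import urlsplit, urlunsplit


def rewrite_anonymous(url, prefix='/link'):
    """Prepend prefix to the URL path, leaving the query string intact."""
    parts = urlsplit(url)
    return urlunsplit(parts._replace(path=prefix + parts.path))
```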

Represent all features in `test.html`

test.html needs the following features.

  • Ajax loading while images are generated.
  • Timing information.
  • Slider to control width / height.
  • Upload / URL input and submit button.

Need to represent all features on this page.

Add file type icon fallback

Currently, unsupported files and failed conversions emit HTTP 203 with an error image. Instead, emit HTTP 203 and a file type icon.

Allow user to mount an icon set of their choice. Icon sets should be arranged in directories by resolution.

Pick the best resolution and resize to match requested width and height.
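
Choosing the "best" resolution could work as in the sketch below. The function name is hypothetical. It assumes square icons whose directories are named by pixel size, and it prefers the smallest icon that covers the request so that resizing scales down rather than up.

```python
def best_resolution(available, width, height):
    """Pick the smallest size covering the request, else the largest.

    available: iterable of icon sizes in pixels (e.g. directory names)
    """
    sizes = list(available)
    target = max(width, height)
    covering = [size for size in sizes if size >= target]
    return min(covering) if covering else max(sizes)
```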

Increase gs DPI when making large images.

Currently Ghostscript is using the default DPI which is fine for small images. If a large image is requested, the resulting image must be upscaled, resulting in a blurry image. If Ghostscript rendered to a higher DPI, then larger images could be produced without upscaling.

Use the requested resolution to determine what DPI to use with Ghostscript.
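
The arithmetic behind this can be sketched as follows. PDF page sizes are measured in points (1/72 inch), so rendering at D dpi produces page_points * D / 72 pixels. The function name is hypothetical, and the 72 dpi floor and the fit-inside-the-box behaviour are assumptions; the result would be passed to Ghostscript's -r option.

```python
import math


def ghostscript_dpi(page_w_pt, page_h_pt, width_px, height_px, base_dpi=72):
    """Smallest DPI at which the page fits the request without upscaling.

    The page is fitted inside width_px x height_px, so the tighter of
    the two dimensions determines the pixel size actually needed.
    """
    needed = min(width_px * 72 / page_w_pt, height_px * 72 / page_h_pt)
    return max(base_dpi, math.ceil(needed))
```

For a US Letter page (612x792 points), a 1224x1584 request needs 144 dpi, while a 100x100 thumbnail stays at the 72 dpi floor.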
