docker-virtuoso's Introduction

Virtuoso docker

Docker for hosting Virtuoso.

Virtuoso is built from a specific commit SHA in https://github.com/openlink/virtuoso-opensource. This image is currently built from commit f3d88f16bca4274265160e098be3ba3c7d68341c, which corresponds to Virtuoso 7.2.10. You can build this image from a different commit by providing the commit id as the VIRTUOSO_COMMIT build argument.
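
As a minimal sketch of such a build, the helper below only prints the `docker build` command for a given commit so it can be reviewed before running it. The function name `virtuoso_build_cmd`, the default tag `my-virtuoso:custom`, and the insistence on a full 40-character SHA are assumptions made for illustration; the build itself may accept other forms of commit id.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a docker build command that passes
# the VIRTUOSO_COMMIT build argument described above.
virtuoso_build_cmd() {
  commit=$1
  tag=${2:-my-virtuoso:custom}   # illustrative default tag, not from the README

  # Assumption: require a full 40-character lowercase hex SHA to avoid ambiguity.
  case $commit in
    ''|*[!0-9a-f]*)
      echo "not a lowercase hex commit id: '$commit'" >&2
      return 1
      ;;
  esac
  if [ "${#commit}" -ne 40 ]; then
    echo "expected 40 hex characters, got ${#commit}" >&2
    return 1
  fi

  printf 'docker build --build-arg VIRTUOSO_COMMIT=%s -t %s .\n' "$commit" "$tag"
}
```

Calling `virtuoso_build_cmd f3d88f16bca4274265160e098be3ba3c7d68341c` prints a command that would presumably be run from a checkout of this repository, next to its Dockerfile.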

Running your Virtuoso

docker run --name my-virtuoso \
    -p 8890:8890 -p 1111:1111 \
    -e DBA_PASSWORD=myDbaPassword \
    -e SPARQL_UPDATE=true \
    -e DEFAULT_GRAPH=http://www.example.com/my-graph \
    -v /my/path/to/the/virtuoso/db:/data \
    -d redpencil/virtuoso

The Virtuoso database folder is mounted in /data.

The Docker image exposes ports 8890 and 1111.

Docker compose

The image can also be configured and used via docker-compose.

db:
  image: redpencil/virtuoso:1.0.0
  environment:
    SPARQL_UPDATE: "true"
    DEFAULT_GRAPH: "http://www.example.com/my-graph"
  volumes:
    - ./data/virtuoso:/data
  ports:
    - "8890:8890"

Upgrading

There are multiple ways of upgrading your Virtuoso version. The procedure described here takes a bit longer, but it lets you use all of the latest features of your new Virtuoso version and optimizes your DB size on disk.

NOTE: Upgrading Virtuoso is a procedure to be done with great care. Make sure to have backups before starting.

1. dump nquads

When upgrading it's recommended (and sometimes required!) to first dump to quads using the dump_nquads procedure:

docker compose exec virtuoso isql-v
SQL> dump_nquads ('dumps', 1, 1000000000, 1);

2. stop the db

docker compose stop virtuoso

3. remove old db and related files

Once the database has stopped, move the contents of the dumps folder to the toLoad folder. Make sure to remove the following files:

  • .data_loaded
  • .dba_pwd_set
  • virtuoso.db
  • virtuoso.trx
  • virtuoso.pxa
  • virtuoso-temp.db

mkdir -p data/db/toLoad
mv data/db/dumps/* data/db/toLoad
rm data/db/virtuoso.{db,trx,pxa} data/db/virtuoso-temp.db data/db/.data_loaded data/db/.dba_pwd_set

Consider truncating or removing the virtuoso.log file as well.
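
Because this step deletes the database files, it can be worth confirming that the dump actually produced output first. The following is a minimal sketch of such a pre-flight check; the function name `dump_is_present` is hypothetical, and the assumption that dump files end in `.nq` or `.nq.gz` should be checked against what your dump folder really contains.

```shell
#!/bin/sh
# Hypothetical pre-flight check: succeed only if the dump folder contains
# at least one non-empty N-Quads file.
# Assumption: dump files end in .nq or .nq.gz.
dump_is_present() {
  dumps_dir=$1
  for f in "$dumps_dir"/*.nq "$dumps_dir"/*.nq.gz; do
    # An unmatched glob stays literal, so test that the file really exists
    # and has a size greater than zero.
    if [ -s "$f" ]; then
      return 0
    fi
  done
  echo "no non-empty .nq or .nq.gz files in $dumps_dir" >&2
  return 1
}
```

One way to use it is to guard the destructive commands, for example `dump_is_present data/db/dumps || exit 1` before the `mv` and `rm` lines.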

4. update virtuoso version

Modify the docker-compose file to update the Virtuoso version:

   virtuoso:
-    image: redpencil/virtuoso:1.0.0
+    image: redpencil/virtuoso:1.2.0-rc.1

5. start the db

Start the DB and monitor the logs. Importing the nquads might take a long time.

docker compose up -d virtuoso
docker compose logs -f virtuoso

After that your application can be started again and you should be good to go.

Configuration

dba password

The dba password can be set at container start up via the DBA_PASSWORD environment variable. If not set, the default dba password will be used.

SPARQL update permission

The SPARQL_UPDATE permission on the SPARQL endpoint can be granted by setting the SPARQL_UPDATE environment variable to true.

CORS

You may want to enable basic CORS headers on the SPARQL endpoint. This can be done by setting the ENABLE_CORS environment variable to any value. If not set (the default), no CORS headers are sent.

.ini configuration

All properties defined in virtuoso.ini can be configured via environment variables. The environment variable should be prefixed with VIRT_ and have a format like VIRT_$SECTION_$KEY. $SECTION and $KEY are case sensitive and should be CamelCased as in virtuoso.ini. For example, property ErrorLogFile in the Database section should be configured as VIRT_Database_ErrorLogFile=error.log.
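
The naming rule can be sketched as a pair of small shell helpers. The function names `virt_env_name` and `virt_env_pair` are hypothetical and only illustrate the VIRT_$SECTION_$KEY format; they do not validate that the section or key exists in virtuoso.ini.

```shell
#!/bin/sh
# Hypothetical helper: build the environment variable name for a
# virtuoso.ini property, following the VIRT_$SECTION_$KEY format.
# Section and key are passed through unchanged because they are case sensitive.
virt_env_name() {
  section=$1
  key=$2
  printf 'VIRT_%s_%s\n' "$section" "$key"
}

# Print a NAME=value pair, e.g. for `docker run -e` or an env file.
virt_env_pair() {
  printf '%s=%s\n' "$(virt_env_name "$1" "$2")" "$3"
}
```

For instance, `docker run -e "$(virt_env_pair Database ErrorLogFile error.log)" ...` would pass the example property from the paragraph above.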

Dumping your Virtuoso data as quads

Enter the Virtuoso container, open ISQL and execute the dump_nquads procedure. The dump will be available in /my/path/to/the/virtuoso/db/dumps.

docker exec -it my-virtuoso bash
isql-v -U dba -P $DBA_PASSWORD
SQL> dump_nquads ('dumps', 1, 10000000, 1);

For more information, see http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtRDFDumpNQuad

Loading quads in Virtuoso

Manually

Make the quad .nq files available in /my/path/to/the/virtuoso/db/dumps. The quad files may be compressed. Enter the Virtuoso container, open ISQL, then register and run the load.

docker exec -it my-virtuoso bash
isql-v -U dba -P $DBA_PASSWORD
SQL> ld_dir('dumps', '*.nq', 'http://foo.bar');
SQL> rdf_loader_run();
SQL> checkpoint;
SQL> checkpoint_interval(N);
SQL> scheduler_interval(M);

Note: N and M should be fetched from your virtuoso.ini config by looking for CheckpointInterval and SchedulerInterval respectively.

Validate the ll_state of the load. If ll_state is 2, the load completed.

select * from DB.DBA.load_list;

For more information, see http://virtuoso.openlinksw.com/dataspace/doc/dav/wiki/Main/VirtBulkRDFLoader

Automatically

By default, any data that is put in the toLoad directory in the Virtuoso database folder (/my/path/to/the/virtuoso/db/toLoad) is automatically loaded into Virtuoso on the first startup of the Docker container. The default graph is set by the DEFAULT_GRAPH environment variable, which defaults to http://localhost:8890/DAV.
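
Since the automatic load only happens on the first startup, it can help to check whether a given database folder is still eligible for it. The sketch below assumes that the `.data_loaded` file (listed in the upgrade steps as a file to remove) is the marker the image uses to skip the load; that is an inference from this README, not confirmed behaviour, and the function name `autoload_pending` is hypothetical.

```shell
#!/bin/sh
# Hypothetical check: report whether files placed in toLoad would be picked
# up on the next container start.
# Assumption: the image skips the automatic load once .data_loaded exists.
autoload_pending() {
  db_dir=$1
  if [ -e "$db_dir/.data_loaded" ]; then
    echo "marker $db_dir/.data_loaded present; automatic load is assumed to be skipped" >&2
    return 1
  fi
  # Succeed only if toLoad exists and contains at least one entry.
  [ -d "$db_dir/toLoad" ] && [ -n "$(ls -A "$db_dir/toLoad")" ]
}
```

Running `autoload_pending /my/path/to/the/virtuoso/db` on the host before starting the container gives a quick indication of whether the files in toLoad are likely to be loaded.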

Creating a backup

A virtuoso backup can be created by executing the appropriate commands via the ISQL interface.

docker exec -i virtuoso_container mkdir -p backups
docker exec -i virtuoso_container isql-v <<EOF
    exec('checkpoint');
    backup_context_clear();
    backup_online('backup_',30000,0,vector('backups'));
    exit;
EOF

Restoring a backup

To restore a backup, stop the running container and restore the database using a new container.

docker run --rm -it -v path-to-your-database:/data redpencil/virtuoso virtuoso-t +restore-backup backups/backup_ +configfile /data/virtuoso.ini

The new container will exit once the backup has been restored. You can then restart the original db container.

It is also possible to restore a backup placed in /data/backups using an environment variable. With this approach, the backup is loaded automatically on startup and no separate container is required.

docker run --name my-virtuoso \
            -p 8890:8890 \
            -p 1111:1111 \
            -e DBA_PASSWORD=dba \
            -e SPARQL_UPDATE=true \
            -e BACKUP_PREFIX=backup_ \
            -v path-to-your-database:/data \
            -d redpencil/virtuoso
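
Before starting a container with BACKUP_PREFIX, it may be useful to confirm on the host that backup files with that prefix exist in the backups folder of the mounted database directory. This is a minimal sketch; the function name `count_backup_files` is hypothetical, and it assumes that backup_online names its files `<prefix><number>.bp`.

```shell
#!/bin/sh
# Hypothetical check: count backup files for a prefix in a backups folder.
# Assumption: backup_online names its files <prefix><number>.bp.
count_backup_files() {
  backups_dir=$1
  prefix=$2
  count=0
  for f in "$backups_dir/$prefix"*.bp; do
    # An unmatched glob stays literal, so only count files that exist.
    if [ -f "$f" ]; then
      count=$((count + 1))
    fi
  done
  echo "$count"
}
```

For example, `count_backup_files path-to-your-database/backups backup_` printing `0` suggests there is nothing for the container to restore.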

Contributing

Contributions to this repository are welcome. Please create a pull request on the master branch.

New features will be tested on redpencil/virtuoso:latest first. Once the image is verified, version branches will be rebased on master.

docker-virtuoso's People

Contributors

nvdk, erikap, mikidi, madnificent, jan-pieterbaert, rahien, sergiofenoll, cecton, mielvds, sandervd, bertvannuffelen, fr0gs, jvstein, mpparsley, peternowee, wdullaer

docker-virtuoso's Issues

Docs: add some info about updating images

When updating a virtuoso image, in some cases it's required to dump/load the data, to stop virtuoso from running in some kind of compatibility mode.
This would be a nice place to document our recommended procedure (quad-dump vs store/load backup)?

If it's the quad-dump procedure, I'd like some information about the "default graph" argument: how does this work when the database contains multiple graphs? Is it only setting the default graph for those triples that don't have an explicit graph? (this is what I assume, but an explicit mention would be reassuring). Maybe even a snippet to check what the default graph currently is in your database.

A mention of which files to remove (.trx files?) would also be welcome.

None of this info is technically the responsibility of this repo, but virtuoso documentation can be quite difficult to wade through, and database dump/load operations are stressful, so a bit more handholding would be very welcome.

provide a mu script to optimize virtuoso configuration

See https://github.com/mu-semtech/mu-cli#writing-your-own-script on writing scripts. We should provide a script that allows users to optimize their configuration.

This script could either calculate or ask the amount of memory available and the size of the database and optimize at least the following parameters based on that:

MaxCheckpointRemap should be 1/4 of the database size
NumberOfBuffers maximum should be about 60-70% of the available memory, count 9KB per buffer
MaxDirtyBuffers should be about 75% of NumberOfBuffers
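
The three rules of thumb above can be sketched as a small calculator. This is not the proposed mu script, only an illustration of the arithmetic: the function name `suggest_virtuoso_params` is hypothetical, 65% is picked as the midpoint of the 60-70% range, and MaxCheckpointRemap is assumed to be counted in 8 KB database pages.

```shell
#!/bin/sh
# Hypothetical calculator for the rules of thumb listed above.
# Inputs: available memory and database size, both in MB.
# Assumptions: 65% of memory (middle of the 60-70% range) and 8 KB
# database pages for MaxCheckpointRemap. Integer division rounds down.
suggest_virtuoso_params() {
  mem_mb=$1
  db_mb=$2

  mem_kb=$((mem_mb * 1024))
  number_of_buffers=$((mem_kb * 65 / 100 / 9))         # 9 KB per buffer
  max_dirty_buffers=$((number_of_buffers * 75 / 100))  # 75% of NumberOfBuffers

  db_pages=$((db_mb * 1024 / 8))                       # assumed 8 KB pages
  max_checkpoint_remap=$((db_pages / 4))               # 1/4 of the database size

  echo "NumberOfBuffers=$number_of_buffers"
  echo "MaxDirtyBuffers=$max_dirty_buffers"
  echo "MaxCheckpointRemap=$max_checkpoint_remap"
}
```

With 8192 MB of memory and a 4096 MB database, `suggest_virtuoso_params 8192 4096` prints NumberOfBuffers=605843, MaxDirtyBuffers=454382 and MaxCheckpointRemap=131072. Treat these as starting points to compare against the linked documentation rather than as final values.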

More info on
http://docs.openlinksw.com/virtuoso/checkpointparams/
http://docs.openlinksw.com/virtuoso/sampleconf/
http://docs.openlinksw.com/virtuoso/rdfperfloading/
http://docs.openlinksw.com/virtuoso/perfdiag/

[regression]: migrations fail

Steps to reproduce:

  • checkout CH project (https://github.com/lblod/app-contact-hub)
  • Switch docker image for virtuoso
  • fetch private migrations files in CH ACC (config/migrations/private-data/**)
  • delete data/db
  • run docker-compose -f docker-compose.yml -f docker-compose.dev.yml up -d

Expected result:

  • All migrations running and status DONE

Actual result:

  • 20210901153815-worship-local.ttl fails with error:

migrations_1                                                | E, [2021-12-07T11:21:48.067865 #12] ERROR -- : Invalid Turtle file 20210901153815-worship-local.ttl
migrations_1                                                | /usr/local/lib/ruby/2.5.0/net/protocol.rb:181:in `rbuf_fill': too many connection resets (due to Net::ReadTimeout - Net::ReadTimeout) after 0 requests on 47381480942380, last used 1638876108.0677445 seconds ago (Net::HTTP::Persistent::Error)
migrations_1                                                |   from /usr/local/lib/ruby/2.5.0/net/protocol.rb:157:in `readuntil'
migrations_1                                                |   from /usr/local/lib/ruby/2.5.0/net/protocol.rb:167:in `readline'
migrations_1                                                |   from /usr/local/lib/ruby/2.5.0/net/http/response.rb:40:in `read_status_line'
migrations_1                                                |   from /usr/local/lib/ruby/2.5.0/net/http/response.rb:29:in `read_new'
migrations_1                                                |   from /usr/local/lib/ruby/2.5.0/net/http.rb:1494:in `block in transport_request'
migrations_1                                                |   from /usr/local/lib/ruby/2.5.0/net/http.rb:1491:in `catch'
migrations_1                                                |   from /usr/local/lib/ruby/2.5.0/net/http.rb:1491:in `transport_request'
migrations_1                                                |   from /usr/local/lib/ruby/2.5.0/net/http.rb:1464:in `request'
migrations_1                                                |   from /usr/local/bundle/gems/net-http-persistent-3.1.0/lib/net/http/persistent.rb:964:in `block in request'
migrations_1                                                |   from /usr/local/bundle/gems/net-http-persistent-3.1.0/lib/net/http/persistent.rb:662:in `connection_for'
migrations_1                                                |   from /usr/local/bundle/gems/net-http-persistent-3.1.0/lib/net/http/persistent.rb:958:in `request'
migrations_1                                                |   from /usr/local/bundle/gems/sparql-client-3.0.1/lib/sparql/client.rb:696:in `request'
migrations_1                                                |   from /usr/local/bundle/gems/sparql-client-3.0.1/lib/sparql/client.rb:344:in `response'
migrations_1                                                |   from /usr/local/bundle/gems/sparql-client-3.0.1/lib/sparql/client.rb:305:in `query'
migrations_1                                                |   from /usr/local/bundle/gems/mu-auth-sudo-0.1.0/lib/mu/auth-sudo.rb:22:in `query'
migrations_1                                                |   from /usr/local/bundle/gems/mu-auth-sudo-0.1.0/lib/mu/auth-sudo.rb:26:in `update'
migrations_1                                                |   from /usr/local/bundle/gems/mu-auth-sudo-0.1.0/lib/mu/auth-sudo.rb:8:in `block in included'
migrations_1                                                |   from /usr/src/app/ext/web.rb:129:in `batch_insert'
migrations_1                                                |   from /usr/src/app/ext/web.rb:78:in `execute!'
migrations_1                                                |   from /usr/src/app/ext/web.rb:151:in `block in execute_migrations'
migrations_1                                                |   from /usr/src/app/ext/web.rb:149:in `each'
migrations_1                                                |   from /usr/src/app/ext/web.rb:149:in `execute_migrations'
migrations_1                                                |   from /usr/src/app/ext/web.rb:214:in `boot'
migrations_1                                                |   from /usr/src/app/ext/web.rb:217:in `<top (required)>'
migrations_1                                                |   from web.rb:87:in `require_relative'
migrations_1                                                |   from web.rb:87:in `<main>'

Database logs:

triplestore_1                                               | 11:21:50 ERRS_0 HY008 SR189 Async statement killed by SQLCancel.
triplestore_1                                               | 11:21:50 ERRS_0 S1T00 {CLI.. Client cancelled or disconnected
triplestore_1                                               | 11:21:50 ERRS_0 40001 SR337 Transaction aborted due to async rollback in cluster
triplestore_1                                               | 11:21:50 ERRS_0 S1T00 {CLI.. Client cancelled or disconnected
triplestore_1                                               | 11:21:50 ERRS_0 40001 SR337 Transaction aborted due to async rollback in cluster
triplestore_1                                               | 11:21:50 ERRS_0 40001 SR337 Transaction aborted due to async rollback in cluster
triplestore_1                                               | 11:21:50 ERRS_0 40001 SR337 Transaction aborted due to async rollback in cluster
triplestore_1                                               | 11:21:50 ERRS_0 S1T00 {CLI.. Client cancelled or disconnected
triplestore_1                                               | 11:21:50 ERRS_0 40001 SR337 Transaction aborted due to async rollback in cluster
triplestore_1                                               | 11:21:50 ERRS_0 S1T00 {CLI.. Client cancelled or disconnected
triplestore_1                                               | 11:21:50 ERRS_0 40001 SR337 Transaction aborted due to async rollback in cluster
triplestore_1                                               | 11:21:50 ERRS_0 S1T00 {CLI.. Client cancelled or disconnected
triplestore_1                                               | 11:21:50 ERRS_0 40001 SR337 Transaction aborted due to async rollback in cluster
triplestore_1                                               | 11:21:50 ERRS_0 40001 SR337 Transaction aborted due to async rollback in cluster
triplestore_1                                               | 11:26:53 ERRS_0 01V01 QW004 Incompatible types VARCHAR (182) and INTEGER (189) in COALESCE for coalesce_ret and <constant>
triplestore_1                                               | 11:26:53 ERRS_0 01V01 QW004 Incompatible types VARCHAR (182) and INTEGER (189) in := for coalesce_ret and <constant>
triplestore_1                                               | 11:26:53 ERRS_0 01V01 QW004 Incompatible types VARCHAR (182) and INTEGER (189) in COALESCE for coalesce_ret and <constant>
triplestore_1                                               | 11:26:53 ERRS_0 01V01 QW004 Incompatible types VARCHAR (182) and INTEGER (189) in := for coalesce_ret and <constant>
