
A library and microservice implementing the health and care terminology SNOMED CT with support for cross-maps, inference, fast full-text search, autocompletion, compositional grammar and the expression constraint language.

License: Eclipse Public License 2.0

Topics: terminology, snomed-ct, snomed, terminology-server, clojure, healthcare, diagnoses, drugs, health, icd-10


Hermes : terminology tools, library and microservice.


Hermes provides a set of terminology tools built around SNOMED CT including:

  • a fast terminology service with full-text search functionality; ideal for driving autocompletion in user interfaces
  • an inference engine in order to analyse SNOMED CT expressions and concepts and derive meaning
  • cross-mapping to and from other code systems including ICD-10, Read codes and OPCS
  • support for SNOMED CT compositional grammar (cg) and expression constraint language (ECL) v2.2.
  • optional HL7 FHIR terminology server via hades

It is designed as both a library for embedding into larger applications and as a standalone microservice.

It is fast, both for import and for use. It imports and indexes the International and UK editions of SNOMED CT in less than 5 minutes; you can have a server running seconds after that.

It replaces previous similar tools I wrote in Java and Go and is designed to fit into a wider architecture with identifier resolution, mapping and semantics as first-class abstractions.

Rather than a single monolithic terminology server, it is entirely reasonable to build multiple services, each providing an API around a specific edition or version of SNOMED CT, and to use an API gateway to manage client access. Hermes is lightweight and designed to be composed with other services.

It is part of my PatientCare v4 development; previous versions have been operational within NHS Wales since 2007.

You can have a working terminology server running by typing only a few lines at a terminal. There's no need for any special hardware, or any special dependencies such as setting up your own Elasticsearch or Solr cluster. You just need a filesystem! Many other tools take hours to import the SNOMED data; you'll be finished in less than 10 minutes!

An HL7 FHIR terminology facade is under development: hades. This exposes the functionality available in hermes via a FHIR terminology API. It already supports search and autocompletion using the $expand operation.


Quickstart

You can have a terminology server running in minutes. Full documentation is below, but here is a quickstart.

Before you begin, you will need to have Java installed.

1. Download hermes

You can choose to run a jar file by downloading a release and running using Java, or run from source code using Clojure:

Download a release and run using Java

Download the latest release from https://github.com/wardle/hermes/releases. For simplicity, I've renamed the downloaded jar file to 'hermes.jar' for these examples.

Run the jar file using:

java -jar hermes.jar

When run without parameters, you will be given help text.

In all examples below, java -jar hermes.jar is equivalent to clj -M:run and vice versa.

Run from source code using Clojure

Install Clojure, e.g. on Mac OS X:

brew install clojure

Then clone the repository, change directory and run:

git clone https://github.com/wardle/hermes
cd hermes
clj -M:run

When run without parameters, you will be given help text.

In all examples below, java -jar hermes.jar is equivalent to clj -M:run and vice versa.

2. Download and install one or more distributions

You will need to download distributions from a National Release Centre.

How to do this will principally depend on your location.

For more information, see https://www.snomed.org/snomed-ct/get-snomed. SNOMED International provides a Member Licensing and Distribution Service (MLDS).

In the United States, the National Library of Medicine (NLM) has more information. For example, the SNOMED USA edition is available from https://www.nlm.nih.gov/healthit/snomedct/us_edition.html.

In the United Kingdom, you can download a distribution from NHS Digital using the TRUD service.

hermes also provides automated downloads for a range of distributions worldwide using the MLDS.

If you've downloaded a distribution manually, import using one of these commands:

java -jar hermes.jar --db snomed.db import ~/Downloads/snomed-2021/

or

clj -M:run --db snomed.db import ~/Downloads/snomed-2021/

If you're a UK user and want to use automatic downloads, you can do this:

java -jar hermes.jar --db snomed.db install --dist uk.nhs/sct-clinical --dist uk.nhs/sct-drug-ext --api-key trud-api-key.txt --cache-dir /tmp/trud
clj -M:run --db snomed.db install --dist uk.nhs/sct-clinical --dist uk.nhs/sct-drug-ext --api-key trud-api-key.txt --cache-dir /tmp/trud

Ensure you have a TRUD API key.

This will download both the UK clinical edition and the UK drug extension. If you're a UK user, I'd recommend installing both.

When running interactively at the command-line, you can use --progress to turn on progress reporting when downloading items.

e.g.

java -jar hermes.jar --progress --db snomed.db install --dist uk.nhs/sct-clinical --dist uk.nhs/sct-drug-ext --api-key trud-api-key.txt --cache-dir /tmp/trud
clj -M:run --progress --db snomed.db install --dist uk.nhs/sct-clinical --dist uk.nhs/sct-drug-ext --api-key trud-api-key.txt --cache-dir /tmp/trud

You can download a specific release using an ISO 8601 formatted date:

java -jar hermes.jar --db snomed.db install --dist uk.nhs/sct-clinical --api-key trud-api-key.txt --cache-dir /tmp/trud --release-date 2021-03-24
java -jar hermes.jar --db snomed.db install --dist uk.nhs/sct-drug-ext --api-key trud-api-key.txt --cache-dir /tmp/trud --release-date 2021-03-24

or

clj -M:run --db snomed.db install --dist uk.nhs/sct-clinical --api-key trud-api-key.txt --cache-dir /tmp/trud --release-date 2021-03-24
clj -M:run --db snomed.db install --dist uk.nhs/sct-drug-ext --api-key trud-api-key.txt --cache-dir /tmp/trud --release-date 2021-03-24

These are most useful for building reproducible container images. You can get a list of available UK versions by simply looking at the TRUD website, or using:

java -jar hermes.jar available --dist uk.nhs/sct-clinical --api-key trud-api-key.txt --cache-dir /tmp/trud

or

clj -M:run available --dist uk.nhs/sct-clinical --api-key trud-api-key.txt --cache-dir /tmp/trud

My tiny i5 'NUC' machine takes 1 minute to import the UK edition of SNOMED CT and a further minute to import the UK dictionary of medicines and devices.

If you have an account with the MLDS, then you can use that website to download a distribution manually, or hermes can do it for you. You can list the distributions available for automatic installation using:

java -jar hermes.jar available

or

clj -M:run available

For example, to install the Irish distribution:

java -jar hermes.jar --db snomed.db install --dist ie.mlds/285520 --username xxxx --password password.txt

or

clj -M:run --db snomed.db install --dist ie.mlds/285520 --username xxxx --password password.txt

You can request a specific version by providing --release-date as an option. You will need to have a licence for the distribution you are trying to download, or you will get an 'invalid credentials' error.

3. Index and compact

You must index. Compaction is not mandatory, but advisable.

java -jar hermes.jar --db snomed.db index compact

or

clj -M:run --db snomed.db index compact

My machine takes 6 minutes to build the search indices and 20 seconds to compact the database.

4. Run a server!

java -jar hermes.jar --db snomed.db --port 8080 --bind-address 0.0.0.0 serve

or

clj -M:run --db snomed.db --port 8080 serve

You can use hades with the 'snomed.db' index to give you a FHIR terminology server.

More detailed documentation is included below.

You can combine multiple commands in a single invocation.

For example:

java -jar hermes.jar --api-key trud-api-key.txt --db snomed.db install uk.nhs/sct-clinical index compact serve 

This will download, extract, import, index and compact a database, and then run a server.

Common questions

What can I do with hermes?

hermes provides a simple library, and optionally a microservice, to help you make use of SNOMED CT.

A library can be embedded into your application; this is easy using Clojure or Java or any other language running on the JVM. You make calls using the API just as you'd use any regular library.

A microservice runs independently and you make use of the data and software by making an API call over the network. This makes the functionality available to any software code that can use HTTP and JSON, such as C#, Python or R.

Like all PatientCare components, you can use hermes in either way. Sometimes, when you're starting out, it's best to use it as a library, but larger projects and larger installations will want to run their software components independently, optimising for usage patterns, resilience, reliability and rate of change.

Most people who use a terminology run a server and make calls over the network.

How is this different to a national terminology service?

Previously, I implemented SNOMED CT within an EPR. Later I realised how important it was to build it as a separate module; I created terminology servers in Java, and then later in Go; hermes is written in Clojure. In the UK, the different health services in England and Wales have procured a centralised national terminology server. While I support the provision of a national terminology server for convenience, I think it's important to recognise that it is the data that matters most. We need to cooperate and collaborate on semantic interoperability, but the software services that make use of those data can be centralised or distributed; when I do analytics, I can't see myself making server round-trips for every check of subsumption! That would be silly; I've been using SNOMED for analytics for longer than most; you need flexibility in provisioning terminology services. I want tooling that can both provide services at scale and run on my personal computers as well.

Unlike other available terminology servers, hermes is lightweight and has no other dependencies except a filesystem, which can be read-only when in operation. This makes it ideal for use in situations such as a data pipeline, perhaps built upon Apache Kafka - with hermes, SNOMED analytics capability can be embedded anywhere.

I don't believe in the idea of uploading codesystems and value sets in place. My approach to versioning is to run different services; I simply deploy new services and switch at the API gateway level.

Localisation

SNOMED CT is distributed across the world. The base distribution is the International release, but your local distribution will include this together with local data. Local data will include region-specific language reference sets.

The core SNOMED API relating to concepts and their meaning is not affected by issues of locale. Locale is used to derive the synonyms for any given concept. There should be a single preferred synonym for every concept in a given language reference set.

When you build a database, the search index caches the preferred synonyms using the installed locales.

Can I get support?

Yes. Raise an issue; more formal support options, including a fully-managed service, are available on request.

Why are you building so many repositories?

Yes, I have a lot of repositories at https://github.com/wardle, providing functionality such as:

  • Integration with UK NHS services via concierge

  • UK dictionary of medicines and devices via dmd

  • socioeconomic deprivation data via deprivare

  • UK reference data updates via trud

  • UK organisational data via clods

  • UK geographical data via the NHS postcode directory nhspd

I previously built an electronic patient record as a monolithic application with many of these subsystems as modules of that larger system. Over time, I'm splitting them out into their own more independent modules.

I see the future of building health and care applications as simply composing together different modules of core well-tested functionality to solve user problems.

Small modules of functionality are easier to develop, easier to understand, easier to test and easier to maintain. I design modules to be composable so that I can stitch different components together in order to solve problems.

In larger systems, it is easy to see code rotting. Dependencies become outdated and the software becomes difficult to change because of the software that depends on it. Small, well-defined modules are much easier to build and are less likely to need ongoing changes over time; my goal is to update modules only in response to changes in the domain, not the software itself. I aim for an accretion of functionality.

It is very difficult to 'prove' software is working as designed when there are lots of moving parts.

What are you using hermes for?

I have embedded it into clinical systems; I use it for a fast autocompletion service so users start typing and the diagnosis, or procedure, or occupation, or ethnicity, or whatever, pops up. Users don't generally know they're using SNOMED CT. I use it to populate pop-ups and drop-down controls, and I use it for decision support to switch functionality on and off in my user interface - e.g. does this patient have a type of 'x' such as motor neurone disease - as well as for analytics. A large number of my academic publications are a result of using SNOMED in analytics.

What is this graph stuff you're doing?

I think health and care data are and always will be heterogeneous, incomplete and difficult to process. I do not think trying to build entities or classes representing our domain works at scale; it is fine for toy applications and trivial data modelling such as e-observations, but classes and object-orientation cannot scale across such a complex and disparate environment. Instead, I find it much easier to think about first-class properties (entity, attribute, value) and use such triples as a way of building and navigating a complex, hierarchical graph.

I am using a graph API in order to decouple subsystems and can now navigate from clinical data into different types of reference data seamlessly. For example, with the same backend data, I can view an X.500 representation of a practitioner, or a FHIR R4 Practitioner resource model. The key is to recognise that identifier resolution and mapping are first-class problems within the health and care domain. Similarly, I think the semantics of reading data are very different to those of writing data. I cannot shoehorn health and care data into a REST model in which we read and write to resources representing the type. Instead, just as in real life, we record event data which can effect change. In the end, it is all data.

Is hermes fast?

Hermes benefits from the speed of the libraries it uses, particularly Apache Lucene and LMDB, and from some fundamental design decisions including read-only operation and memory-mapped data files. It provides an HTTP server using the lightweight and reliable Jetty web server.

I have a small i3 NUC server on my local wifi network, and here is an example of load testing, in which users are typing 'mnd' and expecting an autocompletion:

mark@jupiter classes % wrk -c300 -t12 -d30s --latency  'http://nuc:8080/v1/snomed/search?s=mnd'
Running 30s test @ http://nuc:8080/v1/snomed/search?s=mnd
  12 threads and 300 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency    40.36ms   19.97ms 565.73ms   92.08%
    Req/Sec   632.19     66.79     0.85k    68.70%
  Latency Distribution
     50%   38.76ms
     75%   45.93ms
     90%   54.09ms
     99%   79.31ms
  226942 requests in 30.09s, 125.75MB read
Requests/sec:   7540.91
Transfer/sec:      4.18MB

This uses 12 threads to make 300 concurrent HTTP connections. On 99% of occasions, that would provide a fast enough response for autocompletion (<79ms). Of course, that is 300 users typing at exactly the same time, so a single instance could support many more concurrent users than that. Hermes is designed to scale horizontally easily: you can simply run more servers and load balance across them. Of course, these data are fairly crude, because in real life you'll be doing more complex concurrent calls. In real deployments, I've only needed one instance for hundreds of concurrent users, but it is nice to know I can scale easily.

Can I use hermes with containers?

Yes. It is designed to be containerised, although I have a mixture of different approaches in production, including running from source code directly. I would usually advise creating a volume and populating that with data, and then permitting read-only access to your service containers. A shared volume can be memory mapped by multiple running instances and provide high scalability.

There are some examples of different configurations available.

Can I use hermes on Apple Silicon?

Yes. There are three options.

The first is to use Rosetta and run an x86 Java SDK; this will use the x86 LMDB library already bundled with hermes.

The other two options install a native aarch64 LMDB library, and make it available to hermes. The best performance will be gained from using a native library.

The next version of lmdbjava will include a pre-built LMDB binary for ARM on Mac OS X, so these steps will become unnecessary and hermes will work on multiple architectures and operating systems out of the box.

Option 1. Install an x86 Java SDK and run using that (Rosetta).

For example, you can get a list of installed JDKs:

$ /usr/libexec/java_home -V

    11.0.17 (arm64) "Amazon.com Inc." - "Amazon Corretto 11" /Users/mark/Library/Java/JavaVirtualMachines/corretto-11.0.17/Contents/Home
    11.0.14.1 (x86_64) "Azul Systems, Inc." - "Zulu 11.54.25" /Users/mark/Library/Java/JavaVirtualMachines/azul-11.0.14.1-1--x86/Contents/Home

Choose an SDK and check which architecture is in use:

$ export JAVA_HOME=$(/usr/libexec/java_home -v 11.0.14.1)
$ clj -M -e '(System/getProperty "os.arch")'

"x86_64"

Option 2. Install the lmdb library for your architecture

Here I use Homebrew on my Mac:

brew install lmdb
brew list lmdb

Once you have a native LMDB installed on your machine, you can reference it from the command line:

java -Dlmdbjava.native.lib=/opt/homebrew/Cellar/lmdb/0.9.30/lib/liblmdb.dylib -jar target/hermes-1.2.1151.jar --db snomed.db status

or

clj -J-Dlmdbjava.native.lib=/opt/homebrew/Cellar/lmdb/0.9.30/lib/liblmdb.dylib -M:run --db snomed.db status

Option 3. Build the lmdb library for your architecture (i.e. arm64).

Install the Xcode command line tools, if they are not already installed:

xcode-select --install

And then download lmdb and build:

git clone --depth 1 https://git.openldap.org/openldap/openldap.git
cd openldap/libraries/liblmdb
make -e SOEXT=.dylib
mkdir -p ~/Library/Java/Extensions
cp liblmdb.dylib ~/Library/Java/Extensions

In this example, rather than specifying the location of the library at the command line, I'm just copying the library to a well-known location.

Once this native library is copied, you can use hermes natively using an arm64-based JDK.

$ export JAVA_HOME=$(/usr/libexec/java_home -v 11.0.17)
$ clj -M -e '(System/getProperty "os.arch")'

"aarch64"

Can I use hermes on other architectures or operating systems such as FreeBSD?

If hermes does not already contain a pre-built binary for your operating system and architecture, you simply need to install LMDB yourself. You may also need to tell hermes where to find the native library.

e.g. on FreeBSD:

$ pkg info -lx lmdb | grep liblmdb

	/usr/local/lib/liblmdb.a
	/usr/local/lib/liblmdb.so
	/usr/local/lib/liblmdb.so.0

java -Dlmdbjava.native.lib=/usr/local/lib/liblmdb.so -jar target/hermes-1.2.1151.jar --db snomed.db status

or

clj -J-Dlmdbjava.native.lib=/usr/local/lib/liblmdb.so -M:run --db snomed.db status

Documentation

A. How to download and build a terminology service

Ensure you have a pre-built jar file, or the source code checked out from github. See below for build instructions.

I'd recommend installing Clojure and running from source code, but use the pre-built jar file if you prefer.

1. Download and install at least one distribution.

If your local distributor is supported, hermes can do this automatically for you. Otherwise, you will need to download your local distribution(s) manually.

i) Use a registered SNOMED CT distributor to automatically download and import

You can see distributions that are available for automatic installation:

java -jar hermes.jar available
clj -M:run available

The basic command is:

clj -M:run --db snomed.db install --dist <distribution-identifier> [properties] 

or if you are using a precompiled jar:

java -jar hermes.jar --db snomed.db install --dist <distribution-identifier> [properties]

The distribution, as defined by distribution-identifier, will be downloaded and imported to the file-based database snomed.db.

Distribution identifier   Description
uk.nhs/sct-clinical       UK SNOMED CT clinical - incl international release
uk.nhs/sct-drug-ext       UK SNOMED CT drug extension - incl dm+d
uk.nhs/sct-monolith       UK SNOMED CT monolith edition: includes everything

At the time of writing, the UK monolith edition is labelled as Draft for Trial Use.

Each distribution might require custom configuration options.

For example, the UK releases use the NHS Digital TRUD API, and so you need to pass in the following parameters:

  • --api-key : path to a file containing your NHS Digital TRUD api key
  • --cache-dir : directory to use for downloading and caching releases

For example, this command will download, cache and install the UK monolith edition, which includes the International release together with the UK clinical and drug content:

clj -M:run --db snomed.db install uk.nhs/sct-monolith --api-key=trud-api-key.txt --cache-dir=/tmp/trud

hermes will tell you what configuration parameters are required:

java -jar hermes.jar install --dist uk.nhs/sct-clinical --help

or

clj -M:run install --dist uk.nhs/sct-clinical --help

For the UK, TRUD requires an --api-key, which should be a path to a file containing your API key for that service.

You will need to provide different configuration options if hermes is using the MLDS to download distributions:

java -jar hermes.jar install --dist nl.mlds/128785 --help

or

clj -M:run install --dist nl.mlds/128785 --help

For MLDS downloads, you will need to provide --username and --password options. The password should be the path to a file containing your password. This makes it safer to use in automated pipelines and less likely to be accidentally logged.

ii) Download and install SNOMED CT distribution file(s) manually

Depending on where you live in the world, download the most appropriate distribution(s) for your needs.

In the UK, we can obtain these from TRUD.

For example, you can download the UK "Clinical Edition", containing the International and UK clinical distributions as part of TRUD pack 26/subpack 101.

Optionally, you can also download the UK SNOMED CT drug extension, which contains the dictionary of medicines and devices (dm+d); it is available as part of TRUD pack 26/subpack 105.

Once you have downloaded what you need, unzip them to a common directory and then you can use hermes to create a file-based database.

If you are running using the jar file:

java -jar hermes.jar --db snomed.db import ~/Downloads/snomed-2020

If you are running from source code:

clj -M:run --db snomed.db import ~/Downloads/snomed-2020/

The import of both International and UK distribution files takes a total of less than 3 minutes on my machine.

2. Index

For correct operation, indices are needed for components, search and reference set membership.

Run

java -jar hermes.jar --db snomed.db index

or

clj -M:run --db snomed.db index

This will build the indices; it takes about 6 minutes on my machine.

3. Compact database (optional).

This reduces the file size and takes 20 seconds. This is an optional step, but recommended.

java -jar hermes.jar --db snomed.db compact

or

clj -M:run --db snomed.db compact

4. Run a REPL (optional)

When I first built terminology tools, either in Java or in Go, I needed to also build a custom command-line interface in order to explore the ontology. This is not necessary here, as most developers using Clojure quickly learn the value of the REPL: a read-eval-print loop in which one can issue arbitrary commands to execute. As such, one has a full Turing-complete language (a Lisp) in which to explore the domain.

I usually use a REPL from within my IDE, so I run it from there. You can also run an nREPL server, which makes it easy to connect from other editors, such as emacs or neovim:

clj -M:dev:nrepl-server

You can run a REPL and use the terminology services interactively at the command-line, but I would not advise this. It is much better to use a REPL within your editor.

clj -M:dev
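
Once you are in a REPL, you can explore the terminology interactively. The snippet below is a minimal, illustrative sketch: the namespace and function names are assumptions about the public API, so check the cljdoc API documentation for the exact signatures. It uses identifiers that appear elsewhere in this document (24700007 is multiple sclerosis; 6118003 is its parent, demyelinating disease of the central nervous system).

(require '[com.eldrix.hermes.core :as hermes])   ;; namespace name is an assumption

(def svc (hermes/open "snomed.db"))              ;; open the file-based database

(hermes/get-concept svc 24700007)                ;; the concept for multiple sclerosis

;; is multiple sclerosis a type of demyelinating disease of the CNS?
(hermes/subsumed-by? svc 24700007 6118003)       ;; => true

(hermes/close svc)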

5. Get the status of your installed index

You can obtain status information about any index by using:

clj -M:run --db snomed.db status --format json

Result:

{"releases":
["SNOMED Clinical Terms version: 20220731 [R] (July 2022 Release)",
  "35.6.0_20230315000001 UK drug extension",
  "35.6.0_20230315000001 UK clinical extension"],
  "locales":["en-GB", "en-US"],
  "components":
  {"concepts":1068735,
    "descriptions":3050621,
    "relationships":7956235,
    "concrete-values":33349,
    "refsets":541,
    "refset-items":13349472,
    "indices":
    {"descriptions-concept":3050621,
      "concept-parent-relationships":4737884,
      "concept-child-relationships":4737884,
      "component-refsets":10595249,
      "associations":1254384,
      "descriptions-search":3050621,
      "members-search":13349472}}}

In this example, you can see I have the July 2022 International release, with the UK clinical and drug extensions from March 2023. Given that these releases have been imported, hermes recognises it can support the locales en-GB and en-US. For completeness, detailed statistics on components and indices are also provided. Additional options are available:

java -jar hermes.jar --db snomed.db status --help

or

clj -M:run --db snomed.db status --help

6. Run a terminology web service

By default, data are returned as JSON, but you can request EDN by simply adding "Accept: application/edn" to the request header.

java -jar hermes.jar --db snomed.db --port 8080 --bind-address 0.0.0.0 serve 

or

clj -M:run --db snomed.db --port 8080 --bind-address 0.0.0.0 serve

There are a number of configuration options for serve:

java -jar hermes.jar serve --help

or

clj -M:run serve --help

Usage: hermes [options] serve

Start a terminology server

Options:
      --allowed-origin "*" or ORIGIN    []     Set CORS policy, with "*" or hostname
      --allowed-origins "*" or ORIGINS         Set CORS policy, with "*" or comma-delimited hostnames
  -a, --bind-address BIND_ADDRESS              Address to bind
  -d, --db PATH                                Path to database directory
  -h, --help
      --locale LOCALE                   en-GB  Set default / fallback locale
  -p, --port PORT                       8080   Port number
  • --bind-address is optional. You may want to use --bind-address 0.0.0.0
  • --allowed-origins is optional. You could use --allowed-origins "*" or --allowed-origins example.com,example.net
  • --allowed-origin example.com --allowed-origin example.net is equivalent to --allowed-origins example.com,example.net.
  • --allowed-origin "*" is equivalent to --allowed-origins "*"
  • --locale sets the default locale. This is used as a default if clients do not specify their preference. e.g. --locale=en-GB

If --locale is not specified, the default locale will be determined by looking at which language reference sets are installed.

7. Run a HL7 FHIR terminology web service

You can use hades together with the files you have just created to run a FHIR R4 terminology server.

B. Endpoints for the HTTP terminology server

There are a range of endpoints.

I have a very small, low-powered server (<$3/mo) available for demonstration purposes. It is not intended for production use.

Here are some examples:

WARNING

The HTTP API returns data formatted as either JSON or EDN. Identifiers, such as concept or description identifiers, in SNOMED CT are 64-bit positive integers. The JSON specification does not limit the size of numeric types, but some implementations struggle to properly manage very large numbers and can silently truncate numbers. Most implementations have no such difficulty; if your client library or platform does not properly handle large numbers in JSON, there is usually a way to configure your parser to work correctly. For example, in JavaScript, you can use a reviver parameter.

Hermes could offer a per-server, or per-request configuration to stringify identifiers when output to JSON to help broken client implementations. If this applies to you, please join the discussion.

Get a single concept
http '127.0.0.1:8080/v1/snomed/concepts/24700007'
{
  "active": true,
  "definitionStatusId": 900000000000074008,
  "effectiveTime": "2002-01-31",
  "id": 24700007,
  "moduleId": 900000000000207008
}

Try it live: http://128.140.5.148:8080/v1/snomed/concepts/24700007

You'll want to use the other endpoints much more frequently.

Get extended information about a single concept
http 127.0.0.1:8080/v1/snomed/concepts/24700007/extended

Try it live: http://128.140.5.148:8080/v1/snomed/concepts/24700007/extended

The result is an extended concept definition - all the information needed for inference, logic and display. For example, at the client level, we can then check whether this is a type of demyelinating disease or is a disease affecting the central nervous system without further server round-trips. Each relationship also includes the transitive closure tables for that relationship, making it easier to execute logical inference. Note how the list of descriptions includes a convenient acceptableIn and preferredIn so you can easily display the preferred term for your locale. If you provide an Accept-Language header, then you will also get a preferredDescription that is the best choice for those language preferences given what is installed.

HTTP/1.1 200 OK
Content-Type: application/json
Date: Mon, 08 Mar 2021 22:01:13 GMT

{
    "concept": {
        "active": true,
        "definitionStatusId": 900000000000074008,
        "effectiveTime": "2002-01-31",
        "id": 24700007,
        "moduleId": 900000000000207008
    },
    "descriptions": [
        {
            "acceptableIn": [],
            "active": true,
            "caseSignificanceId": 900000000000448009,
            "conceptId": 24700007,
            "effectiveTime": "2017-07-31",
            "id": 41398015,
            "languageCode": "en",
            "moduleId": 900000000000207008,
            "preferredIn": [
                900000000000509007,
                900000000000508004,
                999001261000000100
            ],
            "refsets": [
                900000000000509007,
                900000000000508004,
                999001261000000100
            ],
            "term": "Multiple sclerosis",
            "typeId": 900000000000013009
        },
        {
            "acceptableIn": [],
            "active": false,
            "caseSignificanceId": 900000000000020002,
            "conceptId": 24700007,
            "effectiveTime": "2002-01-31",
            "id": 41399011,
            "languageCode": "en",
            "moduleId": 900000000000207008,
            "preferredIn": [],
            "refsets": [],
            "term": "Multiple sclerosis, NOS",
            "typeId": 900000000000013009
        },
        {
            "acceptableIn": [],
            "active": false,
            "caseSignificanceId": 900000000000020002,
            "conceptId": 24700007,
            "effectiveTime": "2015-01-31",
            "id": 41400016,
            "languageCode": "en",
            "moduleId": 900000000000207008,
            "preferredIn": [],
            "refsets": [],
            "term": "Generalized multiple sclerosis",
            "typeId": 900000000000013009
        },
        {
            "acceptableIn": [],
            "active": false,
            "caseSignificanceId": 900000000000020002,
            "conceptId": 24700007,
            "effectiveTime": "2015-01-31",
            "id": 481990016,
            "languageCode": "en",
            "moduleId": 900000000000207008,
            "preferredIn": [],
            "refsets": [],
            "term": "Generalised multiple sclerosis",
            "typeId": 900000000000013009
        },
        {
            "acceptableIn": [],
            "active": true,
            "caseSignificanceId": 900000000000448009,
            "conceptId": 24700007,
            "effectiveTime": "2017-07-31",
            "id": 754365011,
            "languageCode": "en",
            "moduleId": 900000000000207008,
            "preferredIn": [
                900000000000509007,
                900000000000508004,
                999001261000000100
            ],
            "refsets": [
                900000000000509007,
                900000000000508004,
                999001261000000100
            ],
            "term": "Multiple sclerosis (disorder)",
            "typeId": 900000000000003001
        },
        {
            "acceptableIn": [
                900000000000509007,
                900000000000508004,
                999001261000000100
            ],
            "active": true,
            "caseSignificanceId": 900000000000448009,
            "conceptId": 24700007,
            "effectiveTime": "2017-07-31",
            "id": 1223979019,
            "languageCode": "en",
            "moduleId": 900000000000207008,
            "preferredIn": [],
            "refsets": [
                900000000000509007,
                900000000000508004,
                999001261000000100
            ],
            "term": "Disseminated sclerosis",
            "typeId": 900000000000013009
        },
        {
            "acceptableIn": [
                900000000000509007,
                900000000000508004,
                999001261000000100
            ],
            "active": true,
            "caseSignificanceId": 900000000000017005,
            "conceptId": 24700007,
            "effectiveTime": "2003-07-31",
            "id": 1223980016,
            "languageCode": "en",
            "moduleId": 900000000000207008,
            "preferredIn": [],
            "refsets": [
                900000000000509007,
                900000000000508004,
                999001261000000100
            ],
            "term": "MS - Multiple sclerosis",
            "typeId": 900000000000013009
        },
        {
            "acceptableIn": [
                900000000000509007,
                900000000000508004,
                999001261000000100
            ],
            "active": true,
            "caseSignificanceId": 900000000000017005,
            "conceptId": 24700007,
            "effectiveTime": "2003-07-31",
            "id": 1223981017,
            "languageCode": "en",
            "moduleId": 900000000000207008,
            "preferredIn": [],
            "refsets": [
                900000000000509007,
                900000000000508004,
                999001261000000100
            ],
            "term": "DS - Disseminated sclerosis",
            "typeId": 900000000000013009
        }
    ],
    "directParentRelationships": {
        "116676008": [
            409774005,
            32693004
        ],
        "116680003": [
            6118003,
            414029004,
            39367000
        ],
        "363698007": [
            21483005
        ],
        "370135005": [
            769247005
        ]
    },
    "parentRelationships": {
        "116676008": [
            138875005,
            107669003,
            123037004,
            409774005,
            32693004,
            49755003,
            118956008
        ],
        "116680003": [
            6118003,
            138875005,
            404684003,
            123946008,
            118234003,
            128139000,
            23853001,
            246556002,
            363170005,
            64572001,
            118940003,
            414029004,
            362975008,
            363171009,
            39367000,
            80690008,
            362965005
        ],
        "363698007": [
            138875005,
            21483005,
            442083009,
            123037004,
            25087005,
            91689009,
            91723000
        ],
        "370135005": [
            138875005,
            769247005,
            308489006,
            303102005,
            281586009,
            362981000,
            719982003
        ]
    },
    "refsets": [
        991381000000107,
        999002271000000101,
        991411000000109,
        1127581000000103,
        1127601000000107,
        900000000000497000,
        447562003
    ]
}

Get properties for a single concept

Each concept within SNOMED CT is associated with relationships. You can use hermes to return these as groups of properties, including concrete values when available.

Here we look at properties for the concept representing the anti-convulsant lamotrigine:

http 'http://127.0.0.1:8080/v1/snomed/concepts/1231295007/properties'

Try it live http://128.140.5.148:8080/v1/snomed/concepts/1231295007/properties

Note that when results are not expanded, the metadata model is used to fix the cardinality of the values for the relationship in the context of the concept.

{
    "0": {
        "1142139005": "#1",
        "116680003": [
            779653004
        ],
        "411116001": 385060002,
        "763032000": 732936001,
        "766939001": [
            773862006
        ]
    },
    "1": {
        "1142135004": "#250",
        "1142136003": "#1",
        "732943007": 387562000,
        "732945000": 258684004,
        "732947008": 732936001,
        "762949000": 387562000
    }
}

Available parameters

  • expand - expand results to include transitive relationships (true/false/1/0)
  • format - format results
  • key-format - format keys
  • value-format - format values

For machine-interpretation, it is best to simply use ?expand=1 and process identifiers appropriately. For human consumption, and for interactive use, properties can be pretty-printed using a variety of formatting options:

Each format can be one of

  • id the identifier
  • syn synonym (language determined by Accept-Language header or system/index defaults)
  • id:syn a string of identifier and synonym
  • [id:syn] a vector of identifier and synonym
  • {id:syn} a map of identifier to synonym

Example:

http 'http://127.0.0.1:8080/v1/snomed/concepts/1231295007/properties?expand=0&format=id:syn'

Try it live http://128.140.5.148:8080/v1/snomed/concepts/1231295007/properties?expand=1&format=id:syn

Note again how the models within SNOMED CT are used to determine the cardinality of the returned relationships. A drug can have multiple roles, but has only a single 'count of base of active ingredient' and a single 'manufactured dose form' property.

{
    "0": {
        "1142139005:Count of base of active ingredient": "#1",
        "116680003:Is a": [
            "779653004:Lamotrigine only product in oral dose form"
        ],
        "411116001:Has manufactured dose form": "385060002:Prolonged-release oral tablet",
        "763032000:Has unit of presentation": "732936001:Tablet",
        "766939001:Plays role": [
            "773862006:Anticonvulsant therapeutic role"
        ]
    },
    "1": {
        "1142135004:Has presentation strength numerator value": "#250",
        "1142136003:Has presentation strength denominator value": "#1",
        "732943007:Has BoSS": "387562000:Lamotrigine",
        "732945000:Has presentation strength numerator unit": "258684004:mg",
        "732947008:Has presentation strength denominator unit": "732936001:Tablet",
        "762949000:Has precise active ingredient": "387562000:Lamotrigine"
    }
}

Search

Example usage of search endpoint.

http '127.0.0.1:8080/v1/snomed/search?s=mnd\&constraint=<64572001&maxHits=5'

Try it live: http://128.140.5.148:8080/v1/snomed/search?s=mnd&constraint=<64572001&maxHits=5

[
  {
    "id": 486696014,
    "conceptId": 37340000,
    "term": "MND - Motor neurone disease",
    "preferredTerm": "Motor neuron disease"
  }
]

This searches only active concepts, but both active and inactive descriptions, by default. This can be changed per request. The defaults are sensible, because a user trying to find something with a now inactive synonym such as 'Wegener's Granulomatosis' will be surprised that their search fails to return any results.

Search parameters:

  • s - the text to search
  • constraint - an ECL expression to constrain the search; I never use search without this
  • maxHits - maximum number of hits
  • inactiveConcepts - whether to search inactive concepts (default, false)
  • inactiveDescriptions - whether to search inactive descriptions (default, true)
  • fuzzy - whether to use fuzziness for search (default, false)
  • fallbackFuzzy - whether to retry using a fuzziness factor if initial search returns no results (default, false)
  • removeDuplicates - whether to remove consecutive results with the same conceptId and text (default, false)

For autocompletion, in a typical type-ahead user interface control, you might use fallbackFuzzy=1 (or fallbackFuzzy=true) and removeDuplicates=1 (or removeDuplicates=true). That will mean that if a user mistypes one or two characters, they should still get some sensible results.

removeDuplicates is designed to create a better user experience when searching SNOMED CT. In general, during search, you will want to show the user the multiple synonyms for a given concept. More recently, however, and particularly if you are using multiple SNOMED CT distributions (e.g. both the UK clinical and drug extensions), a single concept may have multiple synonyms with the same textual content. This can be disconcerting for end-users as it looks as if there are duplicates in the autocompletion list. Each, of course, has a different description id, but we do not show identifiers to end-users. To improve the user experience, I advise using removeDuplicates to remove consecutive results with the same conceptId and text.

Here I search for all UK medicinal products with the name amlodipine and populate my autocompletion control using the results:

http '127.0.0.1:8080/v1/snomed/search?s=amlodipine\&constraint=<10363601000001109&fallbackFuzzy=true&removeDuplicates=true&maxHits=500'

Try it live: http://128.140.5.148:8080/v1/snomed/search?s=amlodipine&constraint=<10363601000001109&fallbackFuzzy=true&removeDuplicates=true&maxHits=500

More complex expressions are supported, and no search term is actually needed.

Let's get all drugs with exactly three active ingredients:

http '127.0.0.1:8080/v1/snomed/search?constraint=<373873005|Pharmaceutical / biologic product| : [3..3]  127489000 |Has active ingredient|  = <  105590001 |Substance|'

Try it live: http://128.140.5.148:8080/v1/snomed/search?constraint=<373873005|Pharmaceutical / biologic product| : [3..3] 127489000 |Has active ingredient| = < 105590001 |Substance|

Or, what about all disorders of the lung that are associated with oedema?

http -j '127.0.0.1:8080/v1/snomed/search?constraint= <  19829001 |Disorder of lung|  AND <  301867009 |Edema of trunk|'

Try it live: http://128.140.5.148:8080/v1/snomed/search?constraint= < 19829001 |Disorder of lung| AND < 301867009 |Edema of trunk|

The ECL can be written more concisely:

http -j '127.0.0.1:8080/v1/snomed/search?constraint= <19829001 AND <301867009'

Expanding ECL without search

SNOMED CT provides the Expression Constraint Language (ECL) to declaratively define constraints for expressions. hermes provides support for the latest version of ECL. If you are simply expanding an ECL expression without search terms, you can use the expand endpoint.

http -j '127.0.0.1:8080/v1/snomed/expand?ecl= <19829001 AND <301867009&includeHistoric=true'

Try it live: http://128.140.5.148:8080/v1/snomed/expand?ecl=<19829001 AND <301867009&includeHistoric=true

This has an optional parameter includeHistoric which can expand the expansion to include historical associations. This is very useful in analytics. SNOMED introduced dedicated historic functionality in ECL v2.0, allowing you to choose to include historic associations as part of your ECL. You can use either approach in hermes.

For example,

<195967001 |Asthma| {{ +HISTORY-MOD }}

is an ECL expression that will return Asthma, and all subtypes, including those now considered inactive or duplicate. You can read more about the new history supplement functionality in ECL2.0 in the formal documentation.

Try it live: http://128.140.5.148:8080/v1/snomed/expand?ecl=<<195967001 {{ +HISTORY-MOD }}

As a concept identifier is actually a valid SNOMED ECL expression, you can do this:

http -j '127.0.0.1:8080/v1/snomed/expand?ecl=24700007&includeHistoric=true'

Try it live: http://128.140.5.148:8080/v1/snomed/expand?ecl=24700007&includeHistoric=true

[
    {
        "conceptId": 586591000000100,
        "id": 1301271000000113,
        "preferredTerm": "Multiple sclerosis NOS",
        "term": "Multiple sclerosis NOS"
    },
    {
        "conceptId": 192930001,
        "id": 297181019,
        "preferredTerm": "Multiple sclerosis NOS",
        "term": "Multiple sclerosis NOS"
    },
    {
        "conceptId": 24700007,
        "id": 41398015,
        "preferredTerm": "Multiple sclerosis",
        "term": "Multiple sclerosis"
    }
    ...
]

You can search using concrete values.

Here is SNOMED ECL that will return all products containing 250mg of amoxicillin that have an oral dose form:

< 763158003 |Medicinal product (product)| :
     411116001 |Has manufactured dose form (attribute)|  = <<  385268001 |Oral dose form (dose form)| ,
    {    <<  127489000 |Has active ingredient (attribute)|  = <<  372687004 |Amoxicillin (substance)| ,
          1142135004 |Has presentation strength numerator value (attribute)|  = #250,
         732945000 |Has presentation strength numerator unit (attribute)|  =  258684004 |milligram (qualifier value)|}

You can use hermes to expand this:

Try it live: http://128.140.5.148:8080/v1/snomed/expand?ecl=<7631580003...

Unfortunately, at the time of writing, the UK SNOMED drug extension doesn't currently publish concrete values data for products in the UK dictionary of medicines and devices, but this is on their roadmap.

Crossmap to and from SNOMED CT

There are endpoints for crossmapping to and from SNOMED.

Let's map one of our diagnostic terms into ICD-10:

  • 24700007 is multiple sclerosis.
  • 999002271000000101 is the ICD-10 UK complex map reference set.
http -j 127.0.0.1:8080/v1/snomed/concepts/24700007/map/999002271000000101

Try it live: http://128.140.5.148:8080/v1/snomed/concepts/24700007/map/999002271000000101

Result:

[
    {
        "active": true,
        "correlationId": 447561005,
        "effectiveTime": "2020-08-05",
        "id": "57433204-2371-5c6f-855f-94ff9dad7ba6",
        "mapAdvice": "ALWAYS G35.X",
        "mapCategoryId": 1,
        "mapGroup": 1,
        "mapPriority": 1,
        "mapRule": "",
        "mapTarget": "G35X",
        "moduleId": 999000031000000106,
        "referencedComponentId": 24700007,
        "refsetId": 999002271000000101
    }
]

And of course, we can crossmap back to SNOMED as well:

http -j 127.0.0.1:8080/v1/snomed/crossmap/999002271000000101/G35X

Try it live: http://128.140.5.148:8080/v1/snomed/crossmap/999002271000000101/G35X

If you map a concept into a reference set that doesn't contain that concept, you'll automatically get the best parent matches instead.

Map a concept into a reference set

You will usually crossmap using a SNOMED CT crossmap reference set, such as those for ICD-10 or OPCS. However, Hermes supports crossmapping a concept into any reference set. You can use this feature in data analytics in order to reduce the dimensionality of your dataset.

Here we have multiple sclerosis (24700007), and we're mapping into the UK emergency unit reference set (991411000000109):

http -j 127.0.0.1:8080/v1/snomed/concepts/24700007/map/991411000000109

Try it live: http://128.140.5.148:8080/v1/snomed/concepts/24700007/map/991411000000109

The UK emergency unit reference set gives a subset of concepts used for central reporting problems and diagnoses in UK emergency units.

As multiple sclerosis is in that reference set, you'll simply get:

[
  {
    "active": true,
    "effectiveTime": "2015-10-01",
    "id": "d55ce305-3dcc-5723-8814-cd26486c37f7",
    "moduleId": 999000021000000109,
    "referencedComponentId": 24700007,
    "refsetId": 991411000000109
  }
]

But what happens if we try something that isn't in that emergency reference set?

Here is 'limbic encephalitis with LGI1 antibodies' (763794005). It isn't in that UK emergency unit reference set:

http -j 127.0.0.1:8080/v1/snomed/concepts/763794005/map/991411000000109

Try it live: http://128.140.5.148:8080/v1/snomed/concepts/763794005/map/991411000000109

Result:

[
  {
    "active": true,
    "effectiveTime": "2015-10-01",
    "id": "5b3b8cdd-dd02-50e3-b207-bf4a3aa17694",
    "moduleId": 999000021000000109,
    "referencedComponentId": 45170000,
    "refsetId": 991411000000109
  }
]

You get a more general concept, 'encephalitis' (45170000), that is in the emergency unit reference set. This makes it straightforward to map concepts into subsets of terms, as defined by a reference set, for analytics.

You could limit users to only entering the terms in a subset, but it is much better to allow clinicians to record highly-specific granular terms and be able to map to less granular terms on demand.

C. Embed into another application

You can use git coordinates in a deps.edn file, or use maven:

In your deps.edn file (make sure you change the commit-id):

com.eldrix/hermes {:git/url "https://github.com/wardle/hermes.git"
                   :sha     "097e3094070587dc9362ca4564401a924bea952c"}

In your pom.xml:

<dependency>
  <groupId>com.eldrix</groupId>
  <artifactId>hermes</artifactId>
  <version>1.0.960</version>
</dependency>

Remember to use the latest version.

You may need to add Clojars as a repository in your build tool. Here for maven:

<repositories>
    <repository>
        <id>clojars.org</id>
        <url>https://clojars.org/repo</url>
    </repository>
</repositories>
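
Once the dependency is on your classpath, you call hermes just like any other library. The example below is a minimal, illustrative sketch: the namespace, function names and search option keys are assumptions about the public API, so consult the cljdoc API documentation for the current signatures. The result keys shown (:conceptId, :preferredTerm) mirror those returned by the HTTP search endpoint above.

(ns my.app
  (:require [com.eldrix.hermes.core :as hermes]))

(defn -main [& _]
  (let [svc (hermes/open "snomed.db")]                 ;; open the file-based database
    (try
      ;; an autocompletion-style search, constrained to types of disease (ECL <64572001)
      (doseq [result (hermes/search svc {:s "mnd" :constraint "<64572001" :max-hits 5})]
        (println (:conceptId result) (:preferredTerm result)))
      (finally
        (hermes/close svc)))))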

D. Development

See /doc/development on how to develop, test, lint, deploy and release hermes.

E. Backwards compatibility and versioning

Hermes uses versions of the form major.minor.commit.

Hermes builds a file-based database made up of a store and indices, and each database is also versioned. Hermes of a given major/minor version is compatible with databases created by the same major/minor version. For example, a database created with Hermes v1.4.1265 can be read by Hermes v1.4.1320, but one created with Hermes v1.3.1262 cannot.

If backwards compatibility can easily be preserved, the major/minor version is kept the same. For example, when support for concrete values was added, this was an additive change so that newer versions of Hermes would simply degrade gracefully, but throw a warning to say concrete values were not supported for this database.

On some occasions, compatibility is deliberately broken even when there is only a minor change to the database format, in order to prevent user inconvenience, error or confusion. For example, in the change from the 1.3 series to 1.4, the search index changed to use normalised (folded) text according to term locale. This was a small change and degradation could have occurred gracefully, but such a fallback would lead to varying behaviour depending on which database was used and potentially confuse users.

In general therefore, the policy for versioning is to enforce exact version matching for a given Hermes and database version with a bias towards bumping versions when backwards compatibility or fallback modes of operation could result in confusing or unexpected behaviour.

Mark

hermes's People

Contributors

sidharthramesh, wardle, wilkerlucio


hermes's Issues

Backing key value store

Evaluate and switch to an alternative file-based key value store. Given the read-only nature of the service in production, a B-tree based system makes sense, at the slight cost of slower import, which occurs only once. The Java FFI / Panama project isn't finalised, but lmdb and lmdbjava may work in the interim. Mapdb works but is no longer actively maintained as far as I can tell.

I have an active private branch with a new lmdb based store, but this needs more testing and benchmarking. I'd be grateful for testing by others once I make it public - particularly given lmdb uses a native library.

Build from source - ClassNotFoundException

ClassNotFoundException) at java.net.URLClassLoader/findClass (URLClassLoader.java:445) when building from source.

Same on Ubuntu

openjdk version "17.0.7" 2023-04-18
OpenJDK Runtime Environment (build 17.0.7+7-Ubuntu-0ubuntu122.10.2)
OpenJDK 64-Bit Server VM (build 17.0.7+7-Ubuntu-0ubuntu122.10.2, mixed mode, sharing)

Macos

openjdk version "11.0.8" 2020-07-14
OpenJDK Runtime Environment AdoptOpenJDK (build 11.0.8+10)
OpenJDK 64-Bit Server VM AdoptOpenJDK (build 11.0.8+10, mixed mode)

Improve server error messages

See #47. For some reason the server gives a 404 when, at the library level, an exception is appropriately thrown during parsing of an unsupported ECL expression. The server should catch exceptions such as this and display a more useful error message.

Consider adding alternatives to expand-ecl

At the moment expand-ecl provides only a list of ResultItems. This is fine as an efficient baseline query, but it would be helpful to support clients with different results, such as a set of concept identifiers, or only ResultItems representing descriptions from a particular language reference set and of a specific type (e.g. preferred synonyms only).

Search wraps properties containing non-vector types as a vector

This is fine for Clojure, as one passes in either {snomed/isA [1234]} or {snomed/isA 1234} and both work. The latter is recognised and wrapped in a vector. However, Java clients may pass in any arbitrary java.util.Collection, and in fact Lucene accepts any arbitrary Collection, so we need to check for a Collection type rather than using vector?
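
A minimal sketch of the coercion described above, using a hypothetical helper name (it is not part of the hermes API). Clojure vectors already implement java.util.Collection, so checking for Collection covers both Clojure and Java callers:

;; hypothetical helper illustrating the proposed check
(defn ->coll
  "Wrap a single value in a vector unless it is already a java.util.Collection."
  [v]
  (if (instance? java.util.Collection v) v [v]))

(->coll 1234)     ;; => [1234]
(->coll [1234])   ;; => [1234]  (a Clojure vector is already a java.util.Collection)
(->coll #{1234})  ;; => #{1234} (any Collection passes through unchanged)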

Allow explicit locale fallback preferences

There are five functions that use the system default locale as a fallback in the top-level API:

  • preferred-synonym
  • fully-specified-name
  • search-concept-ids
  • pprint-properties
  • index

Each permits an explicit locale to be set using a language range (e.g. "en-GB"). index takes an explicit locale, but falls back to the system locale, and falls back to "en-US" if that system locale cannot be found.

There are four functions that take a collection of language reference set identifiers in order to choose locale-specific user presentable descriptions:

  • synonyms
  • preferred-synonym*
  • search-concept-ids
  • expand-ecl*

Each permits an explicit locale to be set using a collection of language reference set identifiers. They are tested in order as there are some language reference sets that do not cover the whole of SNOMED CT (e.g. some UK specialist extensions).

There are two distinct but related types of 'fallback'.

The first is when a client application does not include language preferences. In this situation, the current fallback is to use the system default locale, which is passed to match-locale and the results used. This will usually work (e.g. in the UK, my system locale is 'en-GB' and the UK distribution can always return results for 'en-GB', so this fallback always gives a match), but there are cases in which it will not (e.g. if I'm in France but running the UK distribution for some reason, my system locale is 'fr-FR' and I will be surprised to get a nil result when requesting, say, a preferred synonym).

The second fallback is when a requested set of locale preferences cannot be met by the installed language reference sets and the client would rather have some result than no result. This is usually, but not always, what is wanted.

So there are two distinct issues here.

First, when a database is opened, it would be sensible to permit an explicit base fallback locale, rather than assuming the system default locale. If that fallback locale is not available in the set of installed language reference sets, then an error should be thrown. We would then know that the database fallback language will always return a valid result. This does not prevent the system locale from being used when a base fallback is not provided explicitly, but an error should be thrown if that locale preference cannot be met by the set of installed reference sets.

Second, functions returning locale-sensitive data should use the base database fallback language if no explicit language preference is provided, or, optionally, use that fallback if the passed-in preference cannot be met. The latter will usually need to be opt-in, depending on the use case.

Proposal:

  1. Introduce a top-level default locale that can be explicitly chosen by clients on opening a file-based database. This can itself default to the system locale, to maintain backwards compatibility. However, if that default locale preference cannot be satisfied by the installed set of language reference sets, then there should be an error.
  2. Change functions to use the database default locale rather than the system locale as a fallback.
  3. Make it explicit in the API so that clients can choose whether to fall back to the database default locale or to return nil values when explicit preferences cannot be met at the time (a sketch follows below).
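
A hypothetical sketch of what this might look like for a client; the option map and the :fallback? flag are illustrative only and do not exist yet:

;; hypothetical API shape for the proposal above - option and parameter names
;; are illustrative only
(require '[com.eldrix.hermes.core :as hermes])

(def svc (hermes/open "snomed.db" {:default-locale "en-GB"}))
;; throws at open time if no installed language reference set satisfies "en-GB"

(hermes/preferred-synonym svc 24700007)                           ;; no preference: db default locale used
(hermes/preferred-synonym svc 24700007 "fr-FR")                   ;; explicit preference that cannot be met: nil
(hermes/preferred-synonym svc 24700007 "fr-FR" {:fallback? true}) ;; opt in to falling back to the db default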

Evaluate compression in backing store

Many key-value stores use compression to reduce I/O overhead, and this is likely to matter most in memory-constrained systems in which the whole database can't sit in memory. The UK release, which includes the International edition plus the UK clinical and UK drug extensions, takes 5.3 GB.

It is clear that overall, compression is possible as one can compress the database files using command-line tools.

Some systems compress by putting the most compressible data into blocks (pages) that can be compressed, necessitating I/O for the whole page and decompression in order to read a single data item. Others use item-level compression with a shared dictionary to improve compressibility.

To evaluate using a crude item-level compression, one can use synthetic SNOMED data:

(def refset-items (gen/sample (rf2/gen-refset) 1000000))

Here I generate one million random reference set items. An example of a generated item:

=> #com.eldrix.hermes.snomed.SimpleMapRefsetItem{:id #uuid"24889666-2bc4-4631-a11e-1d09f417ef60",
                                               :effectiveTime #object[java.time.LocalDate 0x2f052298 "2015-04-07"],
                                               :active false,
                                               :moduleId 4554197100,
                                               :refsetId 4554198008,
                                               :referencedComponentId 4554199102,
                                               :mapTarget "8cP1RB4o6792q1HBb",
                                               :fields []}

We can write a crude test of compression. Here is a function that takes a single item, and compresses it (using LZ4), returning a map containing information about the compressed and uncompressed lengths:

;; assumes lz4-java and Netty are on the classpath; ser/write-refset-item and
;; rf2/gen-refset are hermes namespaces assumed to be required already
(import '(net.jpountz.lz4 LZ4Factory LZ4Compressor LZ4FastDecompressor)
        '(io.netty.buffer PooledByteBufAllocator ByteBufUtil)
        '(java.io ByteArrayInputStream ByteArrayOutputStream DataInputStream DataOutputStream))

(def lz4-factory (delay (LZ4Factory/fastestInstance)))
(def compressor (delay (.highCompressor @lz4-factory)))
(def decompressor (delay (.fastDecompressor @lz4-factory)))

(defn compress
  [^bytes data]
  (let [^LZ4Compressor compressor @compressor
        uncompressed-length (alength data)
        max-compressed-length (.maxCompressedLength compressor uncompressed-length)
        data' (byte-array max-compressed-length)
        compressed-length (.compress compressor data 0 uncompressed-length data' 0 max-compressed-length)
        baos (ByteArrayOutputStream. (+ 4 compressed-length)) ;; space for length and the compressed data
        dos (DataOutputStream. baos)]
    (.writeInt dos uncompressed-length)
    (.write dos data' 0 compressed-length)
    (.toByteArray baos)))

(defn uncompress
  [^bytes data]
  (let [^LZ4FastDecompressor decompressor @decompressor
        bais (ByteArrayInputStream. data)
        dis (DataInputStream. bais)
        uncompressed-length (.readInt dis)
        data' (byte-array uncompressed-length)]
    (.decompress decompressor data 4 data' 0 uncompressed-length)
    data'))

(defn test-compress
  [item]
  (let [buf (.heapBuffer (PooledByteBufAllocator/DEFAULT) 256)
        _ (ser/write-refset-item buf item)
        uncompressed-length (.readableBytes buf)
        compressed (compress (ByteBufUtil/getBytes buf))
        compressed-length (alength compressed)]
    (.release buf)
    {:uncompressed uncompressed-length
     :compressed compressed-length}))

We can now use that function to test how compressible our reference set items are, if we use an item-per-item compressor:

(def results
  (reduce (fn [acc v] (merge-with + acc v)) {} (map test-compress (gen/sample (rf2/gen-refset) 1000000))))

This simply reduces over our sample of one million items, merging each result with '+', and binds the totals to results:

results
=> {:uncompressed 105847725, :compressed 97297913}

But that's in bytes, so let's convert to megabytes:

(float (/ (- (:uncompressed results ) (:compressed results)) 1024 1024))
=> 8.153736

Oh! But hold on, we don't have a million reference set items. The UK distribution at the time of writing has 12694317 items. So let's scale our answer:

(float (/ (* (/ 12694317 1000000) (- (:uncompressed results ) (:compressed results))) 1024 1024))
=> 103.50611

We would save only around 103 megabytes if we compressed reference set items in such a naive, item-by-item way. This is a conservative estimate, as real strings might compress better than these randomly generated ones, but I think it is still a useful exercise in deciding whether to spend any more time on this approach.

Importing additional refsets from the Spanish drugs extension

Hi Mark, sorry, me again.

My next issue with using hermes to host a Spanish SNOMED terminology service is that it doesn't seem to be importing all refsets from the release files. Below are the logs from my build script; as you can see, it's importing four snapshot refsets:

  • der2_cRefset_AttributeValueSpainDrugExtensionSnapshot_es-ES_ES_20211001.txt" type: "AttributeValueRefset"
  • der2_cRefset_AssociationSpainDrugExtensionSnapshot_es-ES_ES_20211001.txt" type: "AssociationRefset"
  • der2_cRefset_LanguageSpainDrugExtensionSnapshot_en_20211001.txt" type: "LanguageRefset"
  • der2_cRefset_LanguageSpainDrugExtensionSnapshot_es-ES_ES_20211001.txt" type: "LanguageRefset"

In the Refset/Content directory there are more files that I think are important to my use-case, including:

  • der2_sRefset_VTMSpainDrugSnapshot_es-ES_ES_20211001.txt
  • der2_scRefset_VMPSpainDrugSnapshot_es-ES_ES_20211001.txt
  • der2_scicRefset_VMPPSpainDrugSnapshot_es-ES_ES_20211001.txt

I couldn't spot anything in the documentation for importing refsets as it looks like it tries to do everything automagically. My aim is to produce typeahead drug pickers, hence the need for those refsets. In the UK it's not really necessary since all the medication products have relationships meaning you can do ECL queries of the form < VMP. The Spanish version seems a lot less fleshed out, so I'm hoping the refsets will provide what I need.

All build output for this step:

Step 20/30 : RUN clj -M:run --db snomed.db import ../content/snomed-es-drugs/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/
 ---> Running in 1f7fb7c3e4ee
2021-11-01 11:22:54,069 [main] INFO  com.eldrix.hermes.importer - importing files from  "../content/snomed-es-drugs/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/"
2021-11-01 11:22:54,122 [async-thread-macro-1] INFO  com.eldrix.hermes.importer - Processing:  "../content/snomed-es-drugs/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Refset/Content/der2_cRefset_AttributeValueSpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "AttributeValueRefset"
2021-11-01 11:22:54,138 [async-thread-macro-2] INFO  com.eldrix.hermes.importer - Processing:  "../content/snomed-es-drugs/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Refset/Content/der2_cRefset_AssociationSpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "AssociationRefset"
2021-11-01 11:22:54,138 [async-thread-macro-4] INFO  com.eldrix.hermes.importer - Processing:  "../content/snomed-es-drugs/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Refset/Language/der2_cRefset_LanguageSpainDrugExtensionSnapshot_en_20211001.txt"  type:  "LanguageRefset"
2021-11-01 11:22:54,139 [async-thread-macro-1] INFO  com.eldrix.hermes.importer - Processing:  "../content/snomed-es-drugs/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Terminology/sct2_Description_SpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "Description"
2021-11-01 11:22:54,150 [async-thread-macro-3] INFO  com.eldrix.hermes.importer - Processing:  "../content/snomed-es-drugs/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Refset/Language/der2_cRefset_LanguageSpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "LanguageRefset"
2021-11-01 11:22:54,195 [async-thread-macro-2] INFO  com.eldrix.hermes.importer - Processing:  "../content/snomed-es-drugs/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Terminology/sct2_Relationship_SpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "Relationship"
2021-11-01 11:22:54,299 [async-thread-macro-4] INFO  com.eldrix.hermes.importer - Processing:  "../content/snomed-es-drugs/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Terminology/sct2_Concept_SpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "Concept"
Removing intermediate container 1f7fb7c3e4ee

Server output uses default platform character encoding instead of UTF-8.

In the interim, one can start the server by using

java -Dfile.encoding=UTF-8 -jar target/hermes-v0.7.jar --db snomed.db serve

This is only an issue on Windows, as the default character encoding will be windows-1252 unless explicitly changed at the command line. Firstly, Java should use UTF-8 by default; secondly, no library should default to the platform encoding; and thirdly, I should have checked and set the encoding explicitly.

Multiple word searches have missing results

Search for "Abdominal Pain":

http://localhost:8080/v1/snomed/search?s=abdominal - Returns a lot of results
http://localhost:8080/v1/snomed/search?s=abdominal+p - No results
http://localhost:8080/v1/snomed/search?s=abdominal+pa - No results
http://localhost:8080/v1/snomed/search?s=abdominal+pai - Returns a lot of results

Hermes always returns 0 results / a 404 error when the second word has fewer than 3 characters.

Needs a logo?

Hermes is the only project that is a viable open-source SNOMED server right now. And it needs to get more popular. Having a logo would help.

Add refset-id support for choosing language reference set in Accept-Language header

Currently, we map an ISO language code into a "best fit" reference set, or set of reference sets. This actually works - so it is reasonable, for example, to map en-GB to what we think of as the best language reference sets for the UK given our local distributions (i.e. UK clinical and UK dm+d). But it would be nice to give clients complete control over the language reference set, if they wish.

So we could accept a language code of the format "en-x-900000000000508004" and use that reference set if given explicitly, falling back to other methods of language matching otherwise.
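
A sketch of how such a tag might be parsed; the function name and regular expression are illustrative only:

(defn language-refset-id
  "Return the language reference set identifier from a tag such as
   \"en-x-900000000000508004\", or nil if the tag has no refset extension."
  [tag]
  (when-let [[_ id] (re-matches #"(?i)[a-z]{2}(?:-[a-z]{2})?-x-(\d+)" tag)]
    (Long/parseLong id)))

(language-refset-id "en-x-900000000000508004") ;; => 900000000000508004
(language-refset-id "en-GB")                   ;; => nil - fall back to locale matching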

More complete live test coverage

I run live tests using downloaded distributions but this means these tests cannot easily run in an automated pipeline. It is not possible to distribute a real distribution owing to licensing issues.

My preference would be to build tests using a synthetic distribution of SNOMED with lots of edge cases and tricky 'stuff'.

There is a limit to how 'good' you can make tests without such data.

I have suggested this to SNOMED International - in essence we need a testing distribution that can be used as a way to validate any particular implementation.

In the meantime, I think I should carve out some time to build a rudimentary fake distribution as a proof-of-concept that can be freely shared and improved at a community level.

Add support for concrete types, for indexing and for search

I'm trying to run an ECL query on a terminology server using the Argentina edition of SNOMED CT. The query is meant to retrieve concepts with specific attributes, but it's not returning the expected results.
Here's the ECL query I'm using:

<781025005: 763032000 = 732936001, 1142135004 = #250

Concept ID 781025005 (Product containing metamizole)
Attribute 763032000 (Has dose form) with value 732936001 (Oral tablet)
Attribute 1142135004 (Has presentation strength numerator value) with value #250

The presentation strength refinement is the one not working. Using a wildcard returns an empty array, and using a value (#250) returns a 404. The correct result for #250 is 11 results.
You can test here for the correct output https://browser.ihtsdotools.org/?perspective=full&conceptId1=404684003&edition=MAIN/SNOMEDCT-ES/SNOMEDCT-AR/2022-11-30&release=&languages=es,en

Thanks in advance. I'm looking closely into your commits to be of more help in future issues, and to learn some Lisp along the way. Thank you again for your awesome piece of software.

Server not responding in Version 0.6

I've imported and indexed all the files using the jar in v0.6.

I skipped the compact step due to a Java Heap Space error.

Running the serve command brings up everything in the terminal as usual

$ java -jar hermes-v0.6.jar -d db serve

2021-04-20 00:41:16,060 [main] INFO  com.eldrix.hermes.core - starting terminology server  {:http/server {:port 8080, :svc #integrant.core.Ref{:key :terminology/service}, :join? true}, :terminology/service {:path db}}
2021-04-20 00:41:16,218 [main] INFO  com.eldrix.hermes.terminology - hermes terminology service opened  "db" {:version 0.4, :store "store.db", :search "search.db", :created "2021-04-20T00:33:18.951033", :releases ("SNOMED Clinical Terms version: 20200731 [R] (July 2020 Release)")}
2021-04-20 00:41:16,219 [main] INFO  com.eldrix.hermes.server - starting server on port  8080
2021-04-20 00:41:16,447 [main] INFO  org.eclipse.jetty.server.Server - jetty-9.4.z-SNAPSHOT; built: 2019-04-29T20:42:08.989Z; git: e1bc35120a6617ee3df052294e433f3a25ce7097; jvm 11.0.8+10-post-Ubuntu-0ubuntu120.04
2021-04-20 00:41:16,488 [main] INFO  org.eclipse.jetty.server.Server - Started @4113ms

But there is no response when making a request.

Error: Couldn't connect to server

On chrome visiting http://localhost:8080

localhost refused to connect

Add graph resolution.

Hermes fits into a wider architecture with first-class properties (namespace and identifier) and graph traversal via those properties. This builds on the prior usage in PatientCare v3 of SNOMED as a lingua franca and extensive use of dynamic key-value coding (as used in Apple WebObjects, Swift and, of course, NeXTSTEP). Using key-value coding and key paths, I could traverse heterogeneous entities to resolve an arbitrary keypath; this could drive semantic inference as well as making user interface binding very easy indeed. The modern approach is to use a graph-like API as a facade - think of GraphQL, or a set of triples (subject, predicate, object) - in front of multiple source systems. So, we need to add a graph-like API facade permitting arbitrary resolution of identifiers and relationships.

Search by Semantic Tag

While showing a drug list for clinicians to pick from, I am trying to filter out the 'Clinical Drug' and 'Real Clinical Drug'. However, these seem to be semantic tags and not a part of the SNOMED CT hierarchy. Is it possible to include this information in an ECL? Or is there any way of including the semantic tag in a search?

This is only an issue with the clinical drugs and real clinical drugs semantic tags, since they don't seem to be a part of a hierarchy like Disorders, which can clearly be filtered using an ECL. Related issue.

Dropping support for Java 1.8 and Lucene 8?

Lucene 9.5 has a new API which is faster and more memory efficient, using 'storedFields' in the index searcher.

To use this, there are two possibilities:

  1. Drop support for Lucene 8 - which was the last Lucene to support java 1.8.
  2. Add compile-time code to check Lucene version and fallback to v8 compatible-code.

I am in favour of (1) unless there is interest in maintaining backwards compatibility. It would not be difficult to implement (2), however: use the new API when the major Lucene version is >= 9, and the older, now-deprecated API when it is 8.
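
For reference, the two call patterns look roughly like this from Clojure (a sketch assuming an org.apache.lucene.search.IndexSearcher bound to searcher and an int doc-id):

;; Lucene 9.5+ : retrieve stored fields via the new StoredFields accessor
(let [stored-fields (.storedFields ^org.apache.lucene.search.IndexSearcher searcher)]
  (.document stored-fields doc-id))

;; Lucene 8.x : the older call, deprecated in recent 9.x releases
(.doc ^org.apache.lucene.search.IndexSearcher searcher doc-id)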

Support for other terminologies?

I think SNOMED CT is probably the most complex terminology in the healthcare space. Is there any scope to coerce other terminologies such as LOINC and ICD-10 into the search engine?

PS: I have scripts that convert LOINC and ICD-10 into SNOMED RF2 release files. They work with Hermes. However, having multiple endpoints would be nice. Maybe a uniform release file that Hermes can consume?

Documentation for importing other countries' release files

Hello, I am attempting to use hermes to host the Spanish edition and am running into some trouble. The Spanish download centre lists 3 different files which look like they cover the content I need:

  • SnomedCT_International_Edition (I assume necessary as a base for the next two)
  • SnomedCT_SpanishRelease-es_PRODUCTION_20210430T120000Z
  • SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000

The first two files import cleanly but the last produces the following output:

2021-10-26 20:29:09,822 [main] INFO  com.eldrix.hermes.core - importing  0  distributions from  "/Users/djb/Downloads/SNOMED_CT_SPANISH/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/"
2021-10-26 20:29:09,988 [main] INFO  com.eldrix.hermes.importer - importing files from  "/Users/djb/Downloads/SNOMED_CT_SPANISH/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/"
2021-10-26 20:29:10,002 [async-thread-macro-4] INFO  com.eldrix.hermes.importer - Processing:  "/Users/djb/Downloads/SNOMED_CT_SPANISH/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Refset/Language/der2_cRefset_LanguageSpainDrugExtensionSnapshot_en_20211001.txt"  type:  "LanguageRefset"
2021-10-26 20:29:10,003 [async-thread-macro-2] INFO  com.eldrix.hermes.importer - Processing:  "/Users/djb/Downloads/SNOMED_CT_SPANISH/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Refset/Language/der2_cRefset_LanguageSpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "LanguageRefset"
2021-10-26 20:29:10,003 [async-thread-macro-1] INFO  com.eldrix.hermes.importer - Processing:  "/Users/djb/Downloads/SNOMED_CT_SPANISH/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Refset/Content/der2_cRefset_AssociationSpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "AssociationRefset"
2021-10-26 20:29:10,002 [async-thread-macro-3] INFO  com.eldrix.hermes.importer - Processing:  "/Users/djb/Downloads/SNOMED_CT_SPANISH/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Refset/Content/der2_cRefset_AttributeValueSpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "AttributeValueRefset"
2021-10-26 20:29:10,005 [async-thread-macro-1] INFO  com.eldrix.hermes.importer - Processing:  "/Users/djb/Downloads/SNOMED_CT_SPANISH/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Terminology/sct2_Description_SpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "Description"
2021-10-26 20:29:10,005 [async-thread-macro-3] INFO  com.eldrix.hermes.importer - Processing:  "/Users/djb/Downloads/SNOMED_CT_SPANISH/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Terminology/sct2_Concept_SpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "Concept"
2021-10-26 20:29:10,059 [async-thread-macro-4] INFO  com.eldrix.hermes.importer - Processing:  "/Users/djb/Downloads/SNOMED_CT_SPANISH/SnomedCT_SpainDrugExtension-ES_PRODUCTION_20211001T120000/RF2Release/Snapshot/Terminology/sct2_Relationship_SpainDrugExtensionSnapshot_es-ES_ES_20211001.txt"  type:  "Relationship"
Exception in thread "async-thread-macro-15" java.lang.NullPointerException
	at com.eldrix.hermes.impl.ser$write_description.invokeStatic(ser.clj:27)
	at com.eldrix.hermes.impl.ser$write_description.invoke(ser.clj:25)
	at com.eldrix.hermes.impl.store$fn__11936$fn__11937.invoke(store.clj:43)
	at com.eldrix.hermes.impl.store.proxy$org.mapdb.serializer.GroupSerializerObjectArray$ff19274a.serialize(Unknown Source)
	at org.mapdb.serializer.GroupSerializerObjectArray.valueArraySerialize(GroupSerializerObjectArray.java:19)
	at com.eldrix.hermes.impl.store.proxy$org.mapdb.serializer.GroupSerializerObjectArray$ff19274a.valueArraySerialize(Unknown Source)
	at org.mapdb.BTreeMapJava$NodeSerializer.serialize(BTreeMapJava.java:171)
	at org.mapdb.BTreeMapJava$NodeSerializer.serialize(BTreeMapJava.java:136)
	at org.mapdb.StoreDirectAbstract.serialize(StoreDirectAbstract.kt:243)
	at org.mapdb.StoreDirect.update(StoreDirect.kt:631)
	at org.mapdb.BTreeMap.put2(BTreeMap.kt:408)
	at org.mapdb.BTreeMap.putIfAbsent(BTreeMap.kt:863)
	at com.eldrix.hermes.impl.store$write_object.invokeStatic(store.clj:127)
	at com.eldrix.hermes.impl.store$write_object.invoke(store.clj:115)
	at com.eldrix.hermes.impl.store$write_object.invokeStatic(store.clj:125)
	at com.eldrix.hermes.impl.store$write_object.invoke(store.clj:115)
	at com.eldrix.hermes.impl.store$write_descriptions.invokeStatic(store.clj:332)
	at com.eldrix.hermes.impl.store$write_descriptions.invoke(store.clj:319)
	at com.eldrix.hermes.impl.store$eval12076$fn__12077.invoke(store.clj:467)
	at clojure.lang.MultiFn.invoke(MultiFn.java:234)
	at com.eldrix.hermes.impl.store$write_batch_worker.invokeStatic(store.clj:478)
	at com.eldrix.hermes.impl.store$write_batch_worker.invoke(store.clj:473)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:667)
	at clojure.core$apply.invoke(core.clj:662)
	at com.eldrix.hermes.importer$create_workers$fn__14009.invoke(importer.clj:150)
	at clojure.core.async$thread_call$fn__8625.invoke(async.clj:484)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Exception in thread "async-thread-macro-13" java.lang.NullPointerException
	at com.eldrix.hermes.impl.ser$write_language_refset_item.invokeStatic(ser.clj:133)
	at com.eldrix.hermes.impl.ser$write_language_refset_item.invoke(ser.clj:131)
	at com.eldrix.hermes.impl.ser$eval11888$fn__11889.invoke(ser.clj:307)
	at clojure.lang.MultiFn.invoke(MultiFn.java:234)
	at com.eldrix.hermes.impl.store$fn__11950$fn__11951.invoke(store.clj:57)
	at com.eldrix.hermes.impl.store.proxy$org.mapdb.serializer.GroupSerializerObjectArray$ff19274a.serialize(Unknown Source)
	at org.mapdb.serializer.GroupSerializerObjectArray.valueArraySerialize(GroupSerializerObjectArray.java:19)
	at com.eldrix.hermes.impl.store.proxy$org.mapdb.serializer.GroupSerializerObjectArray$ff19274a.valueArraySerialize(Unknown Source)
	at org.mapdb.BTreeMapJava$NodeSerializer.serialize(BTreeMapJava.java:171)
	at org.mapdb.BTreeMapJava$NodeSerializer.serialize(BTreeMapJava.java:136)
	at org.mapdb.StoreDirectAbstract.serialize(StoreDirectAbstract.kt:243)
	at org.mapdb.StoreDirect.update(StoreDirect.kt:631)
	at org.mapdb.BTreeMap.put2(BTreeMap.kt:408)
	at org.mapdb.BTreeMap.putIfAbsent(BTreeMap.kt:863)
	at com.eldrix.hermes.impl.store$write_object.invokeStatic(store.clj:127)
	at com.eldrix.hermes.impl.store$write_object.invoke(store.clj:115)
	at com.eldrix.hermes.impl.store$write_object.invokeStatic(store.clj:125)
	at com.eldrix.hermes.impl.store$write_object.invoke(store.clj:115)
	at com.eldrix.hermes.impl.store$write_refset_items.invokeStatic(store.clj:379)
	at com.eldrix.hermes.impl.store$write_refset_items.invoke(store.clj:354)
	at com.eldrix.hermes.impl.store$eval12084$fn__12085.invoke(store.clj:471)
	at clojure.lang.MultiFn.invoke(MultiFn.java:234)
	at com.eldrix.hermes.impl.store$write_batch_worker.invokeStatic(store.clj:478)
	at com.eldrix.hermes.impl.store$write_batch_worker.invoke(store.clj:473)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.core$apply.invokeStatic(core.clj:667)
	at clojure.core$apply.invoke(core.clj:662)
	at com.eldrix.hermes.importer$create_workers$fn__14009.invoke(importer.clj:150)
	at clojure.core.async$thread_call$fn__8625.invoke(async.clj:484)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)

Skipping over that problem for the moment, I tried to continue following the instructions.

 clj -J-Xmx8g -M:run --db snomed.db compact

Success!

And now indexing:

$ clj -M:run --db snomed.db index

2021-10-26 20:31:33,493 [main] INFO  com.eldrix.hermes.core - Building search index {:root "snomed.db", :languages "en-GB"}
2021-10-26 20:38:28,215 [async-thread-macro-7] WARN  com.eldrix.hermes.impl.search - could not determine preferred synonym for  450828004  using refsets:  (900000000000508004)
2021-10-26 20:38:28,216 [async-thread-macro-7] WARN  com.eldrix.hermes.impl.search - could not determine preferred synonym for  450829007  using refsets:  (900000000000508004)
(Lots and lots of this type of stuff)

I noted here that languages only contains en-GB, and I haven't been able to find any way of overriding that through a command-line argument.

To try to query the Spanish descriptions, I have tried the queries from the project readme, and variations that seemed like they might work from trying to make sense of the impl/language.clj file. Spanish terms can be searched; however, the returned preferredTerm is always the English translation:

$ curl "http://localhost:8080/v1/snomed/search?s=infarto+mio&maxHits=5" -H "Accept: application/json"  -H "Accept-Language: es-x-448879004" | jq

[
  {
    "id": 898589016,
    "conceptId": 22298006,
    "term": "infarto de miocardio",
    "preferredTerm": "Myocardial infarction"
  },
  {
    "id": 1410692015,
    "conceptId": 164865005,
    "term": "ECG: infarto miocárdico",
    "preferredTerm": "ECG: myocardial infarction"
  },
  {
    "id": 983083013,
    "conceptId": 57054005,
    "term": "infarto agudo de miocardio",
    "preferredTerm": "Acute myocardial infarction"
  },
  {
    "id": 1325180015,
    "conceptId": 233843008,
    "term": "infarto miocárdico silente",
    "preferredTerm": "Silent myocardial infarction"
  },
  {
    "id": 1410695018,
    "conceptId": 164866006,
    "term": "ECG: sin infarto miocárdico",
    "preferredTerm": "ECG: no myocardial infarction"
  }
]

How should I negotiate the language or is there another problem here?

Getting Java Class error

clj -M:run --db snomed.db import ..
Execution error (UnsupportedClassVersionError) at java.lang.ClassLoader/defineClass1 (REPL:-2).
org/apache/lucene/index/Term has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0

I.e. it wants JRE version 11 and only v8 is installed.

Does not recognise simple refsets without "Simple" in the (actually optional) Summary of the "ContentSubType" element.

Hey @wardle, I've been having issues querying members of refsets using the constraint parameter of the API.

Version: v0.8.1

Querying one of the refsets to get its members using:
http://localhost:8080/v1/snomed/search?constraint=^1131000189100

Gives a 404 Not Found error.

The release files I've used to index and search can be found here.

The query http://localhost:8080/v1/snomed/search?constraint=^1101000189108 - Members of 1101000189108 |CTV3 simple map reference set (foundation metadata concept)| refset seems to work just fine.

Investigating further, looking at one of the Refset files ./SnomedCT_IndiaReferenceSetsRF2_PRODUCTION_202108067T120000Z/Snapshot/Refset/Content/der2_Refset_cardiologySnapshot_IN1000189_20210806.txt:

id	effectiveTime	active	moduleId	refsetId	referencedComponentId
a60050b1-8079-49ab-a3ad-3cb59ed33bdc	20201127	1	1121000189102	1131000189100	1001000119102

The concept 1001000119102 |Pulmonary embolism with pulmonary infarction (disorder)| has only the following refsets:

  "refsets": [
    900000000000497000,
    447562003
  ]

and does not include 1131000189100 which is part of the file used to index. However, the concept 1131000189100 does exist in the server.

I believe the issues can be replicated by just using the packages: ./SnomedCT_IndiaReferenceSetsRF2_PRODUCTION_202108067T120000Z and ./SnomedCT_InternationalRF2_PRODUCTION_20210131T120000Z (link here). It might have something to do with the naming conventions and directory structure of the files?

Add better configuration for starting server.

As I am building a graph API in front of hermes, I want to be able to plug and play server implementations - which is why I've not done this already. But I'm flagging an issue here as it is still functionality that is needed.

No language refset for any locale listed in priority list

Hey @wardle, I just updated to the latest version and I tried to import and index the SNOMED CT International Edition (SnomedCT_InternationalRF2_PRODUCTION_20210131T120000Z). However, I got this error:

java -jar hermes-v0.8.0.jar -d ./snomed2.db index                       

2021-10-30 20:06:17,617 [main] INFO  com.eldrix.hermes.terminology - Building search index {:root "./snomed2.db", :languages "en-IN"}
Exception in thread "main" clojure.lang.ExceptionInfo: No language refset for any locale listed in priority list {:priority-list "en-IN", :store-filename "/Users/sid/Desktop/mlds/snomed2.db/store.db"}
        at com.eldrix.hermes.impl.search$build_search_index.invokeStatic(search.clj:138)
        at com.eldrix.hermes.impl.search$build_search_index.invoke(search.clj:131)
        at com.eldrix.hermes.terminology$build_search_index.invokeStatic(terminology.clj:173)
        at com.eldrix.hermes.terminology$build_search_index.invoke(terminology.clj:168)
        at com.eldrix.hermes.terminology$build_search_index.invokeStatic(terminology.clj:169)
        at com.eldrix.hermes.terminology$build_search_index.invoke(terminology.clj:168)
        at com.eldrix.hermes.core$build_index.invokeStatic(core.clj:53)
        at com.eldrix.hermes.core$build_index.invoke(core.clj:51)
        at com.eldrix.hermes.core$invoke_command.invokeStatic(core.clj:118)
        at com.eldrix.hermes.core$invoke_command.invoke(core.clj:116)
        at com.eldrix.hermes.core$_main.invokeStatic(core.clj:135)
        at com.eldrix.hermes.core$_main.doInvoke(core.clj:121)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at com.eldrix.hermes.core.main(Unknown Source)

I'm curious - how does Hermes know that the priority list is "en-IN"? I'm only importing the international version. Any idea on why this error happens?

Egregious change in UK filenames

Since May 2021, NHS Digital has introduced a code (UKED, UKCL, UKCR or UKDG) into filenames to show where the file used to exist. They've shoehorned this data into the ContentSubType/Summary field, which used to be a tuple of component type (e.g. Simple) and release type (e.g. Snapshot). Arguably they've kept to the standard, as the strict content of that field is not specified, but the existing usage combines component type and release type - particularly useful for reference set subtypes - and this diverges from the International releases. So we need to fix this, which is relatively straightforward.

I have added a unit test that will fail until this is fixed.

Not working on M1 mac

Was just trying to import SNOMED release files on an M1 Mac. Seems like there are some specific dependencies that need an arm64 version. Will try to build it in the cloud for now.

2022-06-02 23:07:39,311 [main] INFO  com.eldrix.hermes.core - importing 1 distributions from "./downloads/extracts"
2022-06-02 23:07:39,312 [main] INFO  com.eldrix.hermes.core - distribution:  "SnomedCT_InternationalRF2_PRODUCTION_20220531T120000Z"
2022-06-02 23:07:39,312 [main] INFO  com.eldrix.hermes.core - license:  "© 2022 International Health Terminology Standards Development Organisation 2002-2022.  All rights reserved.  SNOMED CT® was originally created by the College of American Pathologists.  'SNOMED' and 'SNOMED CT' are registered trademarks of International Health Terminology Standards Development Organisation, trading as SNOMED International.  SNOMED CT has been created by combining SNOMED RT and a computer based nomenclature and classification known as Clinical Terms Version 3, formerly known as Read Codes Version 3, which was created on behalf of the UK Department of Health and is Crown copyright.  This document forms part of the International Edition release of SNOMED CT® distributed by SNOMED International, which is subject to the SNOMED CT® Affiliate License, details of which may be found at  https://www.snomed.org/snomed-ct/get-snomed."
2022-06-02 23:07:39,312 [main] INFO  com.eldrix.hermes.core - importing 0 modules
Exception in thread "main" java.lang.UnsatisfiedLinkError: could not load FFI provider jnr.ffi.provider.jffi.Provider
        at jnr.ffi.provider.InvalidProvider$1.loadLibrary(InvalidProvider.java:49)
        at jnr.ffi.LibraryLoader.load(LibraryLoader.java:417)
        at jnr.ffi.LibraryLoader.load(LibraryLoader.java:396)
        at org.lmdbjava.Library.<clinit>(Library.java:125)
        at org.lmdbjava.Env$Builder.open(Env.java:486)
        at org.lmdbjava.Env$Builder.open(Env.java:512)
        at com.eldrix.hermes.impl.lmdb$open_STAR_.invokeStatic(lmdb.clj:82)
        at com.eldrix.hermes.impl.lmdb$open_STAR_.doInvoke(lmdb.clj:66)
        at clojure.lang.RestFn.invoke(RestFn.java:423)
        at com.eldrix.hermes.impl.lmdb$open_store.invokeStatic(lmdb.clj:110)
        at com.eldrix.hermes.impl.lmdb$open_store.invoke(lmdb.clj:107)
        at com.eldrix.hermes.impl.store$open_store.invokeStatic(store.clj:50)
        at com.eldrix.hermes.impl.store$open_store.invoke(store.clj:47)
        at com.eldrix.hermes.core$do_import_snomed.invokeStatic(core.clj:678)
        at com.eldrix.hermes.core$do_import_snomed.invoke(core.clj:672)
        at com.eldrix.hermes.core$import_snomed.invokeStatic(core.clj:721)
        at com.eldrix.hermes.core$import_snomed.invoke(core.clj:710)
        at com.eldrix.hermes.cmd.core$import_from.invokeStatic(core.clj:15)
        at com.eldrix.hermes.cmd.core$import_from.invoke(core.clj:12)
        at com.eldrix.hermes.cmd.core$invoke_command.invokeStatic(core.clj:115)
        at com.eldrix.hermes.cmd.core$invoke_command.invoke(core.clj:113)
        at com.eldrix.hermes.cmd.core$_main.invokeStatic(core.clj:132)
        at com.eldrix.hermes.cmd.core$_main.doInvoke(core.clj:118)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at com.eldrix.hermes.cmd.core.main(Unknown Source)
Caused by: java.lang.UnsatisfiedLinkError: could not get native definition for type `POINTER`, original error message follows: java.lang.UnsatisfiedLinkError: Unable to execute or load jffi binary stub from `/var/folders/kx/4sxz0qs91jzbx26_kxs392lw0000gn/T/`. Set `TMPDIR` or Java property `java.io.tmpdir` to a read/write path that is not mounted "noexec".
/Users/sid/Desktop/medblocks/mlds/jffi1863568547990917776.dylib: dlopen(/Users/sid/Desktop/medblocks/mlds/jffi1863568547990917776.dylib, 0x0001): tried: '/Users/sid/Desktop/medblocks/mlds/jffi1863568547990917776.dylib' (fat file, but missing compatible architecture (have 'i386,x86_64', need 'arm64e')), '/usr/lib/jffi1863568547990917776.dylib' (no such file)
        at com.kenai.jffi.internal.StubLoader.tempLoadError(StubLoader.java:448)
        at com.kenai.jffi.internal.StubLoader.loadFromJar(StubLoader.java:433)
        at com.kenai.jffi.internal.StubLoader.load(StubLoader.java:300)
        at com.kenai.jffi.internal.StubLoader.<clinit>(StubLoader.java:511)
        at java.base/java.lang.Class.forName0(Native Method)
        at java.base/java.lang.Class.forName(Class.java:398)
        at com.kenai.jffi.Init.load(Init.java:68)
        at com.kenai.jffi.Foreign$InstanceHolder.getInstanceHolder(Foreign.java:50)
        at com.kenai.jffi.Foreign$InstanceHolder.<clinit>(Foreign.java:46)
        at com.kenai.jffi.Foreign.getInstance(Foreign.java:104)
        at com.kenai.jffi.Type$Builtin.lookupTypeInfo(Type.java:242)
        at com.kenai.jffi.Type$Builtin.getTypeInfo(Type.java:237)
        at com.kenai.jffi.Type.resolveSize(Type.java:155)
        at com.kenai.jffi.Type.size(Type.java:138)
        at jnr.ffi.provider.jffi.NativeRuntime$TypeDelegate.size(NativeRuntime.java:182)
        at jnr.ffi.provider.AbstractRuntime.<init>(AbstractRuntime.java:48)
        at jnr.ffi.provider.jffi.NativeRuntime.<init>(NativeRuntime.java:61)
        at jnr.ffi.provider.jffi.NativeRuntime.<init>(NativeRuntime.java:45)
        at jnr.ffi.provider.jffi.NativeRuntime$SingletonHolder.<clinit>(NativeRuntime.java:57)
        at jnr.ffi.provider.jffi.NativeRuntime.getInstance(NativeRuntime.java:53)
        at jnr.ffi.provider.jffi.Provider.<init>(Provider.java:29)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at java.base/java.lang.Class.newInstance(Class.java:584)
        at jnr.ffi.provider.FFIProvider$SystemProviderSingletonHolder.getInstance(FFIProvider.java:68)
        at jnr.ffi.provider.FFIProvider$SystemProviderSingletonHolder.<clinit>(FFIProvider.java:57)
        at jnr.ffi.provider.FFIProvider.getSystemProvider(FFIProvider.java:35)
        at jnr.ffi.LibraryLoader.create(LibraryLoader.java:88)
        at org.lmdbjava.Library.<clinit>(Library.java:125)
        at org.lmdbjava.Env$Builder.open(Env.java:486)
        at org.lmdbjava.Env$Builder.open(Env.java:512)
        at com.eldrix.hermes.impl.lmdb$open_STAR_.invokeStatic(lmdb.clj:82)
        at com.eldrix.hermes.impl.lmdb$open_STAR_.doInvoke(lmdb.clj:66)
        at clojure.lang.RestFn.invoke(RestFn.java:423)
        at com.eldrix.hermes.impl.lmdb$open_store.invokeStatic(lmdb.clj:110)
        at com.eldrix.hermes.impl.lmdb$open_store.invoke(lmdb.clj:107)
        at com.eldrix.hermes.impl.store$open_store.invokeStatic(store.clj:50)
        at com.eldrix.hermes.impl.store$open_store.invoke(store.clj:47)
        at com.eldrix.hermes.core$do_import_snomed.invokeStatic(core.clj:678)
        at com.eldrix.hermes.core$do_import_snomed.invoke(core.clj:672)
        at com.eldrix.hermes.core$import_snomed.invokeStatic(core.clj:721)
        at com.eldrix.hermes.core$import_snomed.invoke(core.clj:710)
        at com.eldrix.hermes.cmd.core$import_from.invokeStatic(core.clj:15)
        at com.eldrix.hermes.cmd.core$import_from.invoke(core.clj:12)
        at com.eldrix.hermes.cmd.core$invoke_command.invokeStatic(core.clj:115)
        at com.eldrix.hermes.cmd.core$invoke_command.invoke(core.clj:113)
        at com.eldrix.hermes.cmd.core$_main.invokeStatic(core.clj:132)
        at com.eldrix.hermes.cmd.core$_main.doInvoke(core.clj:118)
        at clojure.lang.RestFn.applyTo(RestFn.java:137)
        at com.eldrix.hermes.cmd.core.main(Unknown Source)

        at com.kenai.jffi.Type$Builtin.lookupTypeInfo(Type.java:253)
        at com.kenai.jffi.Type$Builtin.getTypeInfo(Type.java:237)
        at com.kenai.jffi.Type.resolveSize(Type.java:155)
        at com.kenai.jffi.Type.size(Type.java:138)
        at jnr.ffi.provider.jffi.NativeRuntime$TypeDelegate.size(NativeRuntime.java:182)
        at jnr.ffi.provider.AbstractRuntime.<init>(AbstractRuntime.java:48)
        at jnr.ffi.provider.jffi.NativeRuntime.<init>(NativeRuntime.java:61)
        at jnr.ffi.provider.jffi.NativeRuntime.<init>(NativeRuntime.java:45)
        at jnr.ffi.provider.jffi.NativeRuntime$SingletonHolder.<clinit>(NativeRuntime.java:57)
        at jnr.ffi.provider.jffi.NativeRuntime.getInstance(NativeRuntime.java:53)
        at jnr.ffi.provider.jffi.Provider.<init>(Provider.java:29)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
        at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
        at java.base/java.lang.Class.newInstance(Class.java:584)
        at jnr.ffi.provider.FFIProvider$SystemProviderSingletonHolder.getInstance(FFIProvider.java:68)
        at jnr.ffi.provider.FFIProvider$SystemProviderSingletonHolder.<clinit>(FFIProvider.java:57)
        at jnr.ffi.provider.FFIProvider.getSystemProvider(FFIProvider.java:35)
        at jnr.ffi.LibraryLoader.create(LibraryLoader.java:88)
        ... 22 more

Add support for arbitrary reference sets

Hermes has a built-in understanding of the core reference set types, because it makes use of those data in its functioning; it does not simply treat items as an opaque collection of data.

However, there will be distributions, whether the International edition or national releases, that have custom reference set types with additional columns representing arbitrary data.

The reference set descriptor defines the structure of each reference set in a distribution. Hermes already has first-class support for the refset-descriptor reference set, so it is not difficult to imagine building a way of storing arbitrary reference set items according to this more dynamic model.
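
For illustration, one way to model such an item is to keep the standard RF2 columns as named keys and carry the descriptor-defined extra columns in order. This is a sketch only, with a made-up refset identifier, and not a statement of the eventual design:

;; hypothetical custom reference set item with two extra columns; their meaning
;; (and order) would come from the refset descriptor for this made-up refset id
{:id                    #uuid "24889666-2bc4-4631-a11e-1d09f417ef60"
 :effectiveTime         (java.time.LocalDate/of 2020 11 27)
 :active                true
 :moduleId              1121000189102
 :refsetId              999001110000000000
 :referencedComponentId 1001000119102
 :fields                ["free-text annotation" 42]}   ;; extra columns, in descriptor order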

Relationship indices are not generated correctly when there are multiple relationships relating to same source/target/type

SNOMED distributions can, even in a snapshot, have more than one relationship relating to the same combination of source, target, type and modifier identifiers.

For example, this is from the September 22 UK edition:

➜  Terminology git:(main) head -n 1 sct2_Relationship_UKCLSnapshot_GB1000000_20220928.txt 
id	effectiveTime	active	moduleId	sourceId	destinationId	relationshipGroup	typeId	characteristicTypeId	modifierId
➜
Terminology git:(main) cat sct2_Relationship_UKCLSnapshot_GB1000000_20220928.txt | grep 1089261000000101
832591000000123.        20210512	0	999000011000000103	1089261000000101	609336008	0	116680003	900000000000011006	900000000000451002
2191421000000129        20210512	0	999000011000000103	1089261000000101	301857004	0	116680003	900000000000011006	900000000000451002
3219831000000124	20210512	1	999000011000000103	1089261000000101	773760007	2	42752001	900000000000011006	900000000000451002
3228451000000128	20210512	1	999000011000000103	1089261000000101	51576004	1	363698007	900000000000011006	900000000000451002
3229451000000120	20210512	1	999000011000000103	1089261000000101	12835000	1	116676008	900000000000011006	900000000000451002
3229461000000123	20210512	1	999000011000000103	1089261000000101	213345000	0	116680003	900000000000011006	900000000000451002
5687171000000128	20210512	0	999000011000000103	1089261000000101	213345000	0	116680003	900000000000011006	900000000000451002
5687191000000129	20210512	0	999000011000000103	1089261000000101	36818005	1	116676008	900000000000011006	900000000000451002
5687201000000127	20210512	0	999000011000000103	1089261000000101	52530000	1	363698007	900000000000011006	900000000000451002

In this, you can see that 3229461000000123 and 5687171000000128 both relate to the same source, target and type:

3229461000000123	20210512	1	999000011000000103	1089261000000101	213345000	0	116680003	900000000000011006	900000000000451002
5687171000000128	20210512	0	999000011000000103	1089261000000101	213345000	0	116680003	900000000000011006	900000000000451002	

The current relationship indexing is done in a single pass during relationship importing. This would work if there were no relationships that essentially had the same data. In this case, you can see that the later row shows the relationship to be inactive, while the earlier row shows it to be active.

The current import looks at the effective date and, if it is the same or later, deletes the index entry because the relationship is inactive in the later row. This is incorrect behaviour when multiple relationships can reference the same source-target-type tuple.

This results in roughly 70 concepts not having their relationships stored correctly, affecting search and inference. The fix is to adopt a two-pass approach, in which relationships are imported first and the indices are rebuilt after import.
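
A minimal sketch of the second pass, not the actual hermes implementation: group all imported rows by their source/type/destination tuple, reduce each relationship id to its most recent row, and index the tuple only if any of those rows is active:

(defn active-relationship-tuples
  "Given parsed relationship rows (maps with :id, :effectiveTime, :active (boolean),
   :sourceId, :typeId and :destinationId), return the set of source/type/destination
   tuples that should appear in the index."
  [relationships]
  (->> relationships
       (group-by (juxt :sourceId :typeId :destinationId))
       (keep (fn [[tuple rows]]
               ;; reduce each relationship id to its most recent row, then keep the
               ;; tuple if any relationship is active in its latest state
               (let [latest-per-id (map #(last (sort-by :effectiveTime %))
                                        (vals (group-by :id rows)))]
                 (when (some :active latest-per-id)
                   tuple))))
       set))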

Use optimised 'description'

The new 'description' fn provides an optimised path if a client knows both concept and description ids. 'preferred-description' should use this optimised path.

Search using constraint with refinements times out

The Health Ministry of Argentina advises using the following constraint when searching within generic medications: <763158003: 732943007 =*, [0..0] 774159003=*.
I've noticed that using this constraint on the AR edition of SNOMED causes the request to time out. Removing the second refinement reduces search time to around 15 seconds (Docker container on an 8-core i7 iMac), so it does seem to be interpreting the ECL correctly.
I wonder if this is an indexing issue. Apart from this one problem, we are in awe of the performance of this terminology server: orders of magnitude more performance and a smaller footprint. The full-text search with its fuzzy fallback is also amazing.

Dockerise Hermes

I love the project and am very persuaded by the need for SNOMED in clinical systems - thank you for building it. I am not familiar with Clojure - I was never a fan of semicolons, and now I have to think about colons as well!

How would you feel about dockerising it (or about me having a go)? That way it could run locally in its own container without having to worry about dependencies and the like.

Paginate Search Results

Is there a way to paginate the search results? Something like an offset parameter with the search endpoint would be perfect.

Switch to lucene for raw component store?

Initial benchmarking shows the International and UK releases can be imported into a simple Lucene index in less than 2 minutes and compacted in less than a minute, giving <1 ms response times for single-entity fetches and <6 ms for more complex queries (e.g. fetching the reference set items for a given component and refset). While much of this improvement may lie in the use of a different serialisation library, the disk space ends up at only 1.2 GB for the core backend storage of the SNOMED components. In addition, Lucene is a stable, well-supported library with considerable longevity, even if its API can change in breaking ways between versions, and it is already used for the fast free-text faceted search. So: finish benchmarking and switch to Lucene for the main file-based storage? Keep the indexes for store and search separate, as they might conceivably run as independent services in a high-load production environment in future, and it is fine to have multiple indexes.

Modify search to use normalised / folded strings to better handle diacritics

Hi Mark,

Firstly, thank you for your continuous efforts with the Hermes terminology server. We've noticed a slight challenge when searching for specific Spanish SNOMED terms. After some investigation, we believe there's a possibility that the way Lucene (integrated within Hermes) processes Spanish terms might need some attention.

Details

  • We're primarily using the SNOMED Spanish AR edition.
  • A search for 'nódulo hepático' using substrings like 'nod hep' doesn't return the anticipated results. Interestingly, when we use 'nód hep', the search is spot on.
  • Given the integral role of Lucene in Hermes's search capabilities, there might be nuances related to Lucene's processing of Spanish terms.

To elaborate, if Lucene's default configurations (like tokenization, stemming, and stop words) are more English-centric, it could potentially lead to some nuances in handling Spanish terms within Hermes.

We genuinely appreciate any insights or suggestions you might have on this. Thank you for your understanding and support.

Best regards,

BTW: atm we're using hermes-1.2.1032
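
For illustration, one way to make 'nod hep' match 'nódulo hepático' is to fold diacritics in both indexed terms and queries, for example via java.text.Normalizer (a sketch, not Hermes' actual Lucene analysis chain; Lucene's ASCIIFoldingFilter achieves the same at analysis time):

(require '[clojure.string :as str])
(import '(java.text Normalizer Normalizer$Form))

(defn fold-diacritics
  "Decompose accented characters and strip the combining marks."
  [s]
  (-> (Normalizer/normalize s Normalizer$Form/NFD)
      (str/replace #"\p{M}" "")))

(fold-diacritics "nódulo hepático") ;; => "nodulo hepatico"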

Status info missing

Hi Mark
Have installed and got it all working fine, many thanks - we will be using this for sure.
One small issue is that I would like to get the status info you describe in the readme, but instead I get a list of all the packages (as per core.clj),
i.e. I would like to see this count info:

{:installed-releases
 ("SNOMED Clinical Terms version: 20200731 [R] (July 2020 Release)"
  "31.3.0_20210120000001 UK clinical extension"),
 :concepts 574414,
 :descriptions 1720404,
 :relationships 3263996,
 :refsets 9424174,
 :indices
 {:descriptions-concept 1720404,
  :concept-parent-relationships 1210561,
  :concept-child-relationships 1210561,
  :installed-refsets 293,
  :component-refsets 6094742,
  :map-target-component 1125516}}

Inactive Descriptions show up in search

Indexed: SnomedCT_InternationalRF2_PRODUCTION_20220930T120000Z
Version: 1.0.712
Server is running

Problem

The search results seem to return the same term multiple times. For example, searching "Paracetamol 500" returns an array with "Paracetamol 500mg tablet" and "Paracetamol 500mg tablet" as the first two results, both belonging to the concept 322236009.

curl http://localhost:8080/v1/snomed/search?s=paracetamol%20500
[
	{
		"id": 464271012,
		"conceptId": 322236009,
		"term": "Paracetamol 500mg tablet",
		"preferredTerm": "Paracetamol 500 mg oral tablet"
	},
	{
		"id": 1236200010,
		"conceptId": 322236009,
		"term": "Paracetamol 500mg tablet",
		"preferredTerm": "Paracetamol 500 mg oral tablet"
	},
	{
		"id": 464300011,
		"conceptId": 322280009,
		"term": "Paracetamol 500mg capsule",
		"preferredTerm": "Paracetamol 500 mg oral capsule"
	},
	{
		"id": 1236213016,
		"conceptId": 322280009,
		"term": "Paracetamol 500mg capsule",
		"preferredTerm": "Paracetamol 500 mg oral capsule"
	},
	{
		"id": 2154521017,
		"conceptId": 322280009,
		"term": "Paracetamol 500mg capsule",
		"preferredTerm": "Paracetamol 500 mg oral capsule"
	},
	{
		"id": 464283014,
		"conceptId": 322250004,
		"term": "Paracetamol 500mg suppository",
		"preferredTerm": "Paracetamol 500 mg rectal suppository"
	},
	{
		"id": 1236205017,
		"conceptId": 322250004,
		"term": "Paracetamol 500mg suppository",
		"preferredTerm": "Paracetamol 500 mg rectal suppository"
	}
]

Investigating the descriptions:

curl http://localhost:8080/v1/snomed/concepts/322236009/extended | jq ".descriptions[] | select (.active)"
{
  "id": 3500544012,
  "effectiveTime": "2017-07-31",
  "active": true,
  "moduleId": 900000000000207000,
  "conceptId": 322236009,
  "languageCode": "en",
  "typeId": 900000000000013000,
  "term": "Acetaminophen 500 mg oral tablet",
  "caseSignificanceId": 900000000000448000,
  "refsets": [
    900000000000509000,
    900000000000508000
  ],
  "preferredIn": [
    900000000000509000
  ],
  "acceptableIn": [
    900000000000508000
  ]
}
{
  "id": 3500545013,
  "effectiveTime": "2017-07-31",
  "active": true,
  "moduleId": 900000000000207000,
  "conceptId": 322236009,
  "languageCode": "en",
  "typeId": 900000000000013000,
  "term": "Paracetamol 500 mg oral tablet",
  "caseSignificanceId": 900000000000448000,
  "refsets": [
    900000000000509000,
    900000000000508000
  ],
  "preferredIn": [
    900000000000508000
  ],
  "acceptableIn": [
    900000000000509000
  ]
}
{
  "id": 3681734015,
  "effectiveTime": "2018-07-31",
  "active": true,
  "moduleId": 900000000000207000,
  "conceptId": 322236009,
  "languageCode": "en",
  "typeId": 900000000000003000,
  "term": "Product containing precisely paracetamol 500 milligram/1 each conventional release oral tablet (clinical drug)",
  "caseSignificanceId": 900000000000448000,
  "refsets": [
    900000000000509000,
    900000000000508000
  ],
  "preferredIn": [
    900000000000509000,
    900000000000508000
  ],
  "acceptableIn": []
}

Only 3 active descriptions are found. The description with id 464271012 is not active; however, it is shown in the search results.

Expected: only active descriptions should be returned by default.

How to search?

Hi there Wardle! Thank you for the great work.
I was able to index my files and get a server up and running. However, I still haven't figured out the API to search for terms.
http://127.0.0.1:8080/v1/snomed/search? returns a lot of concepts, although in a sort of LISP representation:

(
#com.eldrix.hermes.impl.search.Result{:id 2616344010, :conceptId 420744004, :term "L", :preferredTerm "Roman numeral L"} 
#com.eldrix.hermes.impl.search.Result{:id 2616333019, :conceptId 420766006, :term "U", :preferredTerm "U"} 
#com.eldrix.hermes.impl.search.Result{:id 2616340018, :conceptId 420925002, :term "z", :preferredTerm "z"} 
...
)

How do I search for terms? and Is there a JSON API?

Some client JSON parsers don't support 64-bit integers, potentially causing truncation

Hello

First I'd like to say this is a much-needed library that's both incredibly fast and implements recent operators of the ECL.
I'm enjoying learning to use it!

I just came across a behaviour I don't quite understand for concepts in the UK drug extension, which have longer identifiers.
This first happened on my laptop and it seems the behaviour is similar using your helpful test server.
I do apologise in advance if the explanation should be obvious - I seem to be observing a change in the last digit of the concept IDs of UK medical products. The two examples below seem to affect all the REST endpoints I've tested.

I am wondering whether I'm missing something. The data stored seems valid, it's just the REST response doesn't match.
Curious to understand what is happening with the retrieval.
Thanks!
Peter

http://3.9.221.177:8080/v1/snomed/search?constraint=9401401000001101

[
  {
    "id": 47770001000001110,
    "conceptId": 9401401000001100,
    "term": "Zoladex",
    "preferredTerm": "Zoladex"
  }
]

You can see the concept id 9401401000001101 reads 9401401000001100

http://3.9.221.177:8080/v1/snomed/concepts/38001711000001104/preferred

{
  "id": 653951000001114,
  "effectiveTime": "2020-01-22",
  "active": true,
  "moduleId": 999000011000001200,
  "conceptId": 38001711000001100,
  "languageCode": "en",
  "typeId": 900000000000013000,
  "term": "Leuprorelin 10.72mg implant pre-filled syringes 1 pre-filled disposable injection",
  "caseSignificanceId": 900000000000017000
}

You can see the concept id 38001711000001104 reads 38001711000001100
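
For context, these identifiers exceed 2^53 - 1 (JavaScript's Number.MAX_SAFE_INTEGER), so any client that parses JSON numbers into IEEE-754 doubles silently rounds the trailing digits. A quick illustration at a Clojure REPL:

(long (Math/pow 2 53))
;; => 9007199254740992 - integers above this cannot all be represented exactly as doubles
(long (double 9401401000001101))
;; => 9401401000001100 - the identifier above, after a round trip through a double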

Excluding Codes

Hi Mark
I have been told that I need to exclude 3 codes and their children, e.g. 410663007 <- 246061005, from entering the patient record.
Ideally this would be done when loading from TRUD, but alternatively, is there a way to delete a concept ID and all of its children?
If not, could codes be excluded in a config file for the server, or, least preferably, when searching?
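
One way to achieve this without deleting data is to filter at query time with an ECL constraint that subtracts those hierarchies. A minimal sketch, with hermes and svc as in the earlier sketch, assuming the search parameters accept an ECL :constraint key (as the REST constraint parameter used earlier in this document suggests); the search text is illustrative and the codes are the two given above.

;; Sketch only: "*" matches any concept and MINUS removes the
;; descendants-or-self (<<) of the excluded codes at query time.
(hermes/search svc {:s          "procedure"
                    :constraint "* MINUS (<< 410663007 OR << 246061005)"})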

Better error reporting during import

See #26 - we had a distribution with invalid data in its descriptions file. It would be very useful to catch a problem during import, which happens in large batches, and then re-execute that batch item by item to flag up the faulty input. We could then log the incorrect item and possibly even continue.
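
A hedged sketch of the kind of fallback intended, with write-batch! and write-item! as hypothetical placeholders for the real store operations rather than existing hermes functions:

;; Sketch only: try the whole batch first; on failure, retry item by item so
;; the single faulty row can be identified, logged and skipped.
;; `write-batch!` and `write-item!` are hypothetical placeholders.
(defn write-with-fallback!
  [write-batch! write-item! items]
  (try
    (write-batch! items)
    (catch Exception _
      (doseq [item items]
        (try
          (write-item! item)
          (catch Exception e
            (println "skipping invalid item:" item "-" (ex-message e))))))))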

In core API, 'synonyms' should also accept locale preferences

The current main API returns all synonyms for a given concept. This will generally be fine, but it includes synonyms that are not preferred or acceptable for a given locale. This means that, for user-facing applications in which the client wishes to show a list of synonyms for a concept, very outdated and unacceptable descriptions will be shown.

For example,

(map :term (synonyms svc 24700007))
=>
("Multiple sclerosis"
 "Multiple sclerosis, NOS"
 "Generalized multiple sclerosis"
 "Generalised multiple sclerosis"
 "Disseminated sclerosis"
 "MS - Multiple sclerosis"
 "DS - Disseminated sclerosis")

While we may wish to allow search of outdated synonyms (e.g. Wegener's granulomatosis), we should not show outdated terms by default. As such, 'synonyms' should accept localisation preferences, so that there is an optimised fetch of synonyms that are either preferred, or acceptable, for a collection of language reference sets. This additional parameterisation should also be available via the graph API.

This would be an additive change, and not affect existing functionality.

It would not be a goal to support greater levels of detail, or to provide categories such as :preferred, :acceptable and :other. If more detail is required, then the already available 'extended-concept' can be used to return, in detail, how each of a concept's descriptions relates to the set of installed language reference sets.
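
A hedged sketch of the filtering step such an API might perform, assuming each description carries :preferredIn and :acceptableIn sets of language reference set identifiers as in the JSON examples earlier in this document; the refset identifier in the usage comment is taken from those examples.

;; Sketch only: restrict a sequence of descriptions to those preferred or
;; acceptable in any of the supplied language reference sets. The
;; :preferredIn/:acceptableIn keys are assumed per the JSON shown above.
(defn filter-synonyms
  [language-refset-ids descriptions]
  (let [wanted (set language-refset-ids)]
    (filter (fn [{:keys [preferredIn acceptableIn]}]
              (or (some wanted preferredIn)
                  (some wanted acceptableIn)))
            descriptions)))

;; e.g. keep only descriptions preferred or acceptable in one of the language
;; refsets shown in the JSON above:
;; (map :term (filter-synonyms #{900000000000508000} (synonyms svc 24700007)))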

Add integration with new UK TRUD release distribution API.

We need a modular system to make it much easier to download regional SNOMED distributions. In the UK, this is done via TRUD, but it differs in other regions. It would be very helpful to automate this process and keep any single service up-to-date, or instead to have a bootstrapping system that creates a new service based on the latest distribution files.
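
A purely illustrative sketch of the kind of bootstrap flow envisaged; every function below is a stub standing in for a regional distribution API client, a downloader, and the existing import pipeline, and none of them is an existing hermes or TRUD API.

;; Sketch only: each step is a stub. Real implementations would query a
;; regional distribution service (e.g. TRUD in the UK), download the release
;; archive, and then reuse the existing import/index pipeline.
(defn latest-release [api-key item-id] {:release-date "2022-01-01" :url "https://example.org/release.zip"})
(defn download!      [release]         "/tmp/release.zip")
(defn import!        [db-path archive] :imported)

(defn bootstrap!
  "Create or refresh a terminology database from the latest distribution."
  [api-key item-id db-path]
  (->> (latest-release api-key item-id)
       (download!)
       (import! db-path)))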

Error importing distribution

Originally posted by @sidharthramesh in #39 (comment)

Upon running it on Google Cloud (AMD64) I get the error below. The exact same build worked with hermes v0.8.4-alpha; it is currently updated to the latest version, v0.12.654.

This is the Dockerfile being built: https://github.com/medblocks/mlds/blob/master/Dockerfile. It sources some of the SNOMED packages from a Google Cloud Storage bucket.

Step 14/21 : RUN java -jar hermes.jar -d ./snomed.db import ./extracts
 ---> Running in 481266795a68
2022-06-02 18:39:24,598 [main] INFO  com.eldrix.hermes.core - importing 1 distributions from "./extracts"
2022-06-02 18:39:24,600 [main] INFO  com.eldrix.hermes.core - distribution:  "SnomedCT_InternationalRF2_PRODUCTION_20220531T120000Z"
2022-06-02 18:39:24,601 [main] INFO  com.eldrix.hermes.core - license:  "© 2022 International Health Terminology Standards Development Organisation 2002-2022.  All rights reserved.  SNOMED CT® was originally created by the College of American Pathologists.  'SNOMED' and 'SNOMED CT' are registered trademarks of International Health Terminology Standards Development Organisation, trading as SNOMED International.  SNOMED CT has been created by combining SNOMED RT and a computer based nomenclature and classification known as Clinical Terms Version 3, formerly known as Read Codes Version 3, which was created on behalf of the UK Department of Health and is Crown copyright.  This document forms part of the International Edition release of SNOMED CT® distributed by SNOMED International, which is subject to the SNOMED CT® Affiliate License, details of which may be found at  https://www.snomed.org/snomed-ct/get-snomed."
2022-06-02 18:39:24,602 [main] INFO  com.eldrix.hermes.core - importing 0 modules
2022-06-02 18:39:24,906 [async-thread-macro-1] INFO  com.eldrix.hermes.importer - Processing:  "der2_cciRefset_RefsetDescriptorSnapshot_IN1000189_20210806.txt"  type:  "RefsetDescriptorRefset"
2022-06-02 18:39:24,931 [async-thread-macro-1] INFO  com.eldrix.hermes.importer - Processing:  "sct2_Concept_Snapshot_IN1000189_20210806.txt"  type:  "Concept"
2022-06-02 18:39:24,981 [async-thread-macro-34] ERROR com.eldrix.hermes.impl.store - import error: failed to import data:  {:type :info.snomed/RefsetDescriptorRefset, :parser #object[com.eldrix.hermes.snomed$parse_snomed_filename$fn__1048 0x70360744 "com.eldrix.hermes.snomed$parse_snomed_filename$fn__1048@70360744"], :headings ["id" "effectiveTime" "active" "moduleId" "refsetId" "referencedComponentId" "attributeDescription" "attributeType" "attributeOrder"], :data [#com.eldrix.hermes.snomed.RefsetDescriptorRefsetItem{:id #uuid "3dddb01f-6e1d-473e-8e70-b1b3f2887439", :effectiveTime #object[java.time.LocalDate 0x1e065e3f "2019-11-22"], :active true, :moduleId 683851000189105, :refsetId 672411000189107, :referencedComponentId 672411000189107, :attributeDescriptionId 900000000000461009, :attributeTypeId 900000000000461009, :attributeOrder 0}]}
Exception in thread "main" clojure.lang.ExceptionInfo: Import error {:batch {:type :info.snomed/RefsetDescriptorRefset, :parser #object[com.eldrix.hermes.snomed$parse_snomed_filename$fn__1048 0x70360744 "com.eldrix.hermes.snomed$parse_snomed_filename$fn__1048@70360744"], :headings ["id" "effectiveTime" "active" "moduleId" "refsetId" "referencedComponentId" "attributeDescription" "attributeType" "attributeOrder"]}, :data {:type :info.snomed/RefsetDescriptorRefset, :parser #object[com.eldrix.hermes.snomed$parse_snomed_filename$fn__1048 0x70360744 "com.eldrix.hermes.snomed$parse_snomed_filename$fn__1048@70360744"], :headings ["id" "effectiveTime" "active" "moduleId" "refsetId" "referencedComponentId" "attributeDescription" "attributeType" "attributeOrder"], :data [#com.eldrix.hermes.snomed.RefsetDescriptorRefsetItem{:id #uuid "3dddb01f-6e1d-473e-8e70-b1b3f2887439", :effectiveTime #object[java.time.LocalDate 0x1e065e3f "2019-11-22"], :active true, :moduleId 683851000189105, :refsetId 672411000189107, :referencedComponentId 672411000189107, :attributeDescriptionId 900000000000461009, :attributeTypeId 900000000000461009, :attributeOrder 0}]}, :exception {:via [{:type java.lang.ClassCastException, :message "class com.eldrix.hermes.snomed.RefsetDescriptorRefsetItem cannot be cast to class com.eldrix.hermes.snomed.RefsetDescriptorRefsetItem (com.eldrix.hermes.snomed.RefsetDescriptorRefsetItem is in unnamed module of loader clojure.lang.DynamicClassLoader @7a606260; com.eldrix.hermes.snomed.RefsetDescriptorRefsetItem is in unnamed module of loader 'app')", :at [com.eldrix.hermes.impl.ser$write_refset_descriptor_refset_item invokeStatic "ser.clj" 357]}], :trace [[com.eldrix.hermes.impl.ser$write_refset_descriptor_refset_item invokeStatic "ser.clj" 357] [com.eldrix.hermes.impl.ser$write_refset_descriptor_refset_item invoke "ser.clj" 356] [com.eldrix.hermes.impl.ser$fn__11117 invokeStatic "ser.clj" 411] [com.eldrix.hermes.impl.ser$fn__11117 invoke "ser.clj" 409] [clojure.lang.MultiFn invoke "MultiFn.java" 234] [com.eldrix.hermes.impl.lmdb$write_refset_items invokeStatic "lmdb.clj" 257] [com.eldrix.hermes.impl.lmdb$write_refset_items invoke "lmdb.clj" 226] [com.eldrix.hermes.impl.store$fn__11381 invokeStatic "store.clj" 434] [com.eldrix.hermes.impl.store$fn__11381 invoke "store.clj" 432] [clojure.lang.MultiFn invoke "MultiFn.java" 234] [com.eldrix.hermes.impl.store$write_batch_one_by_one$fn__11398 invoke "store.clj" 441] [com.eldrix.hermes.impl.store$write_batch_one_by_one invokeStatic "store.clj" 440] [com.eldrix.hermes.impl.store$write_batch_one_by_one invoke "store.clj" 436] [com.eldrix.hermes.impl.store$write_batch_with_fallback invokeStatic "store.clj" 451] [com.eldrix.hermes.impl.store$write_batch_with_fallback invoke "store.clj" 447] [com.eldrix.hermes.core$do_import_snomed$fn__14256 invoke "core.clj" 687] [clojure.core$map$fn__5931$fn__5932 invoke "core.clj" 2759] [clojure.core.async.impl.channels$chan$fn__1328 invoke "channels.clj" 304] [clojure.core.async.impl.channels.ManyToManyChannel put_BANG_ "channels.clj" 147] [clojure.core.async$fn__6536 invokeStatic "async.clj" 172] [clojure.core.async$fn__6536 invoke "async.clj" 164] [clojure.core.async$pipeline_STAR_$process__6720 invoke "async.clj" 534] [clojure.core.async$pipeline_STAR_$fn__6732 invoke "async.clj" 549] [clojure.core.async$thread_call$fn__6643 invoke "async.clj" 484] [clojure.lang.AFn run "AFn.java" 22] [java.util.concurrent.ThreadPoolExecutor runWorker nil -1] [java.util.concurrent.ThreadPoolExecutor$Worker run nil -1] 
[java.lang.Thread run nil -1]], :cause "class com.eldrix.hermes.snomed.RefsetDescriptorRefsetItem cannot be cast to class com.eldrix.hermes.snomed.RefsetDescriptorRefsetItem (com.eldrix.hermes.snomed.RefsetDescriptorRefsetItem is in unnamed module of loader clojure.lang.DynamicClassLoader @7a606260; com.eldrix.hermes.snomed.RefsetDescriptorRefsetItem is in unnamed module of loader 'app')"}}
	at com.eldrix.hermes.impl.store$write_batch_one_by_one$fn__11398.invoke(store.clj:444)
	at com.eldrix.hermes.impl.store$write_batch_one_by_one.invokeStatic(store.clj:440)
	at com.eldrix.hermes.impl.store$write_batch_one_by_one.invoke(store.clj:436)
	at com.eldrix.hermes.impl.store$write_batch_with_fallback.invokeStatic(store.clj:451)
	at com.eldrix.hermes.impl.store$write_batch_with_fallback.invoke(store.clj:447)
	at com.eldrix.hermes.core$do_import_snomed$fn__14256.invoke(core.clj:687)
	at clojure.core$map$fn__5931$fn__5932.invoke(core.clj:2759)
	at clojure.core.async.impl.channels$chan$fn__1328.invoke(channels.clj:304)
	at clojure.core.async.impl.channels.ManyToManyChannel.put_BANG_(channels.clj:147)
	at clojure.core.async$fn__6536.invokeStatic(async.clj:172)
	at clojure.core.async$fn__6536.invoke(async.clj:164)
	at clojure.core.async$pipeline_STAR_$process__6720.invoke(async.clj:534)
	at clojure.core.async$pipeline_STAR_$fn__6732.invoke(async.clj:549)
	at clojure.core.async$thread_call$fn__6643.invoke(async.clj:484)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
The command '/bin/sh -c java -jar hermes.jar -d ./snomed.db import ./extracts' returned a non-zero code: 1


Prolonged hermes snomed_db build time

I am currently trying to build the snomed.db from a downloaded RF2 file using the instructions in the documentation. I am using only the 'snapshot' folder. The documentation says it takes a few minutes to run, but mine has been running for more than 24 hours and hasn't stopped. Am I missing something? My machine is a Windows 10 with 8 GB of RAM. Any suggestions would be appreciated. Thanks.
