
Comments (7)

kaicode commented on August 10, 2024

The Snowstorm "semantic index" is an index of all concept parents, ancestors, attributes and attribute groups. This is used to answer the findConceptParents call and other hierarchy and ECL queries.

Initial thoughts:

  • The extension inferred relationship 739111000009126 will have been imported because a relationship with that identifier does not exist in the International Edition. Checks are based on the component id here.
  • The semantic index may have been updated incorrectly because the triple 81260002 is-a 321351000009104 is active in one relationship and inactive using a different relationship identifier. Snowstorm does not always catch this sort of update when importing an extension.

There is a workaround for this scenario. Could you try rebuilding the semantic index please? This can be done using the rebuildBranchTransitiveClosure function under the Concepts area of Swagger.
(This will move to the new Admin area of Swagger in v3.x)

from snowstorm.

kaicode commented on August 10, 2024

There are a few duplicate triples like this in the International Snapshot. When building the semantic index, we sort by effectiveTime and active to get the most effective relationships in the right order for processing, but it looks like avoiding duplicate triples with different relationship ids is not working when importing a delta. I would be interested to hear whether rebuilding the semantic index solves this.


dkincaid commented on August 10, 2024

I just ran the rebuild. Now I do get back that parent when I query the MAIN/SNOMED-VET endpoint, but it is very slow to return (like 6-7 seconds). Before it was pretty much instantaneous. I also see this log message output when I call that endpoint now:

2019-04-30 12:04:32.967  WARN 2794 --- [/O dispatcher 1] org.elasticsearch.client.RestClient      : request [GET http://localhost:9200/es-query/query-concept/_search?typed_keys=true&ignore_unavailable=false&expand_wildcards=open&allow_no_indices=true&search_type=dfs_query_then_fetch&batched_reduce_size=512] returned 1 warnings: 
[299 Elasticsearch-6.4.2-04711c2 "Deprecated: the number of terms [699096] used in the Terms Query
 request has exceeded the allowed maximum of [65536]. This maximum can be set by changing the 
[index.max_terms_count] index level setting." "Tue, 30 Apr 2019 17:04:25 GMT"]

That seems surprising for just a single concept parent query.


kaicode commented on August 10, 2024

Thanks for trying that. I'm glad you are getting the desired parents back now. The semantic index rebuild on this branch had quite a performance impact didn't it!

It's slower now because we have two full semantic indexes sitting on top of each other: one on MAIN and the other on the MAIN/SNOMED-VET branch. The query is larger and slower because its exclusion clause has to list all the concepts in the MAIN semantic index, which were all replaced when the index was rebuilt on SNOMED-VET. Just about the only weakness of Snowstorm is that if you replace tens of thousands of components on branches other than MAIN, things will start to slow down.
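To illustrate why the exclusion list grows, here is a minimal sketch of a layered branch read. It is a simplified model, not Snowstorm's actual code: the function names, the document ids and the `replaced_on_child` set are all hypothetical. The point is only that every parent document superseded on the child branch becomes one more term to exclude at query time.

```python
def visible_documents(parent_docs, child_docs, replaced_on_child):
    """Return the documents visible when reading the child branch.

    parent_docs / child_docs: dict of document id -> payload (hypothetical model).
    replaced_on_child: ids of parent documents superseded on the child branch.
    """
    # Parent documents stay visible unless the child branch replaced them.
    visible = {
        doc_id: doc
        for doc_id, doc in parent_docs.items()
        if doc_id not in replaced_on_child
    }
    # Documents written on the child branch are always visible there.
    visible.update(child_docs)
    return visible


def exclusion_term_count(replaced_on_child):
    """Each replaced parent document adds one term to the exclusion clause."""
    return len(replaced_on_child)


# Five semantic index documents on the parent branch (ids are made up).
parent = {f"m{i}": f"concept-{i}" for i in range(5)}

# Incremental update: one document replaced on the child branch.
incremental_child = {"c0": "concept-0 (updated)"}
incremental_replaced = {"m0"}

# Full rebuild on the child branch: every parent document is replaced.
rebuilt_child = {f"c{i}": f"concept-{i} (rebuilt)" for i in range(5)}
rebuilt_replaced = set(parent)

incremental_view = visible_documents(parent, incremental_child, incremental_replaced)
rebuilt_view = visible_documents(parent, rebuilt_child, rebuilt_replaced)
```

In this toy model both reads return five documents, but the incremental update needs one exclusion term while the full rebuild needs one per parent document. Scaled up to a whole edition, that is consistent with the very large terms query reported in the warning above.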

I've marked this down as a bug. It's going to take some thought to solve this without impacting the performance of the incremental semantic index update. Thanks for reporting the issue.


kaicode commented on August 10, 2024

@dkincaid If you would like this working now, another workaround you could try is to import the vet extension into MAIN, rebuild the semantic index on MAIN, and simply not use the SNOMED-VET branch. That should give you fast, consistent results until this bug can be fixed.


kaicode commented on August 10, 2024

Hi @dkincaid,

In version 4.1.0 of Snowstorm we have updated the semantic index update function to use all active triples (source, type and destination concept) when processing each relationship change. This was necessary because in the US Edition there are over one hundred cases of triples being made inactive in the US module straight after the same triple is made active in the International module. The inactivation in the US module is done using a different relationship id but Snowstorm was making the triple inactive until this fix.
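As a rough illustration of the difference, the sketch below contrasts processing each relationship change in isolation with deciding activity per triple. It is not the Snowstorm implementation: the `Relationship` type, both function names, the relationship ids `rel-A` and `rel-B` and the effective times are hypothetical. Only the triple (81260002 is-a 321351000009104) comes from this thread.

```python
from typing import NamedTuple


class Relationship(NamedTuple):
    rel_id: str
    effective_time: str  # yyyyMMdd
    active: bool
    source: str
    type_id: str
    destination: str


IS_A = "116680003"  # SNOMED CT "Is a" attribute
TRIPLE = ("81260002", IS_A, "321351000009104")


def apply_per_relationship(changes):
    """Naive update: every inactive row removes its triple, whatever its id."""
    triples = set()
    for r in sorted(changes, key=lambda r: (r.effective_time, r.active)):
        triple = (r.source, r.type_id, r.destination)
        if r.active:
            triples.add(triple)
        else:
            triples.discard(triple)
    return triples


def apply_per_triple(changes):
    """Triple-aware update: a triple stays active while the latest version of
    at least one relationship carrying it is active."""
    latest = {}
    for r in sorted(changes, key=lambda r: r.effective_time):
        latest[r.rel_id] = r  # later versions of the same id overwrite earlier ones
    return {
        (r.source, r.type_id, r.destination)
        for r in latest.values()
        if r.active
    }


# Same triple, two relationship ids: one active, one inactivated later.
changes = [
    Relationship("rel-A", "20190131", True, *TRIPLE),
    Relationship("rel-B", "20190301", False, *TRIPLE),
]

naive_result = apply_per_relationship(changes)
triple_result = apply_per_triple(changes)
```

Here `naive_result` loses the parent because the later inactive row discards the triple, while `triple_result` keeps it because `rel-A` is still active.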

This should also fix the issue you were seeing where relationships were going missing because I believe this was happening for the very same reason. This fix should give you accurate child/parent/ECL results straight after the RF2 import. The workaround we tried before gives me confidence that v4.1.0 (or later) will work for you without wrecking your performance.

I just thought I should let you know in case you have time to try it again. I recommend deleting all your Snowstorm Elasticsearch indexes and starting afresh, because some of the index mappings have changed to support better non-English search and other features. We still require just the date in the effectiveTime field, so remember to simplify those values if you do import the Vet Extension.

I hope you are tempted to try! 😄

Kind regards,
Kai


kaicode commented on August 10, 2024

Closing this ticket because I believe it's fixed in 4.1.0.
Please add comments or reopen the ticket as required.

