

javier-molina commented on June 11, 2024

It looks like the first attempt is biocache-hubs#341, but it should now be possible to deliver this, as the new Solr (per the Core Infrastructure Upgrade) is in place.

@adam-collins open for comments/feedback

from extended-data-model.

adam-collins commented on June 11, 2024

AtlasOfLivingAustralia/ala-install@75ea0b7


adam-collins commented on June 11, 2024

Visible on the Event Search tab of https://biocache-databox.ala.org.au/#tab_eventSearch


javier-molina commented on June 11, 2024

From Ely:

Testing Events queries on biocache-test.ala.org.au
Background
Testing is done using the acronym PPBES, which stands for Port Phillip Bay Environmental Survey.
This survey was conducted several times over a number of years.
The dataset name contains the name of the survey/expedition. It varies slightly but always contains the acronym PPBES (PPBES, PPBES-3, PPBES-4, etc.).
Each survey field trip has multiple field collection sites. Each site is supplied in the data as Field Number (not as Event ID).
Each field collection site (Field Number) may have multiple occurrences associated with it.
Query: Dataset name = PPBES
As a user, I am likely to be interested in all the specimens collected during a particular PPBES expedition. I may also be interested in specimens collected at a specific site across multiple surveys.
The query returned 7,904 records with DQ filters on and 11,390 with DQ filters off.
The largest category of exclusions is "Exclude duplicate records", which filters out 3,402 records. These are almost certainly not duplicates. They are multiple occurrences of the same species at the same location and time, which is a common error in duplicate detection for animal records.
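To show why such records get caught, the sketch below uses a deliberately naive rule: any records that share species, coordinates and date count as duplicates. This rule is an assumption for illustration and is not ALA's actual duplicate-detection algorithm. Several individuals of one species collected in the same haul share all of those values, so the rule flags them even though each is a distinct occurrence. The record shape and sample values are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Occurrence(NamedTuple):
    # Hypothetical, minimal record shape for illustration only.
    record_id: str
    scientific_name: str
    latitude: float
    longitude: float
    event_date: str


def naive_duplicate_groups(records: list[Occurrence]) -> list[list[str]]:
    """Group record IDs that share species, coordinates and date.

    This simplified rule cannot tell a true duplicate (the same specimen
    recorded twice) from several individuals of one species collected at
    the same place and time.
    """
    groups: dict[tuple, list[str]] = defaultdict(list)
    for rec in records:
        key = (rec.scientific_name, rec.latitude, rec.longitude, rec.event_date)
        groups[key].append(rec.record_id)
    # Only keys shared by two or more records are reported as "duplicates".
    return [ids for ids in groups.values() if len(ids) > 1]


if __name__ == "__main__":
    # Hypothetical haul: two individuals of one species plus one other species.
    haul = [
        Occurrence("r1", "Species A", -38.1, 144.9, "2000-01-01"),
        Occurrence("r2", "Species A", -38.1, 144.9, "2000-01-01"),
        Occurrence("r3", "Species B", -38.1, 144.9, "2000-01-01"),
    ]
    print(naive_duplicate_groups(haul))  # [['r1', 'r2']]
```

A rule that also looked at fields such as a specimen or catalogue number would separate these cases. Whether such fields are available for these records is not stated in this thread.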
First record returned is https://biocache-test.ala.org.au/occurrences/bf28998b-cf64-4e33-bbe7-5037f31cda44
This record shows that the dataset name supplied in the record is PPBES-4, so the partial-match search is successful.
I can also customise my filters to include “EventID”. Even though the raw data is supplied in “Field Number”, I can use EventID to filter on Field Number.
Conclusion: the query is successful, but the ability to filter further by Dataset Name is still missing. I have run a general query for “PPBES” and now want to refine the results to see how many occurrences are associated with each of PPBES, PPBES-3 and PPBES-4.
Solution: add a facet to filter on “Dataset Name”. Could DatasetName also be added to the facet list?
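A Dataset Name facet amounts to a count of matching records per distinct value of that field. The sketch below models this locally on a list of dataset-name values. The function name and sample data are hypothetical. A real implementation would ask Solr for the counts rather than computing them client-side.

```python
from collections import Counter


def facet_counts(dataset_names: list[str]) -> list[tuple[str, int]]:
    """Count records per dataset name, most frequent first.

    Models what a "Dataset Name" facet would show for a result set;
    values with equal counts keep the order in which they were first seen.
    """
    return Counter(dataset_names).most_common()


if __name__ == "__main__":
    # Hypothetical dataset-name values from a "PPBES" result set.
    results = ["PPBES-4", "PPBES", "PPBES-4", "PPBES-3", "PPBES-4", "PPBES"]
    for name, count in facet_counts(results):
        print(f"{name}: {count}")
```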
Query: Dataset name = PPBES-4
As a user, I expect this query to give me all the occurrences associated with survey PPBES-4. I don’t expect any other PPBES survey data (e.g. PPBES-3) to be returned in the result set.
Result: the query returns 1,061 records with filters on (1,209 with filters off).
If I list the field collection sites under the “Event ID” facet, every site is labelled “PPBES-4” followed by the site number, so the PPBES-4 query has been successful.
Conclusion: the query runs as expected.
I ran the same query without the hyphen in the dataset name: Query = PPBES 4
Result: the query returns the same result as PPBES-4 (with the hyphen).
Conclusion: the query runs as expected.
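Two results so far fit one explanation: "PPBES 4" matches the same records as "PPBES-4", and "PPBES" matches PPBES-4 records. Both are what you would expect if the dataset name is searched through a tokenized, lower-cased text field in which all query terms must match (AND semantics). Under those assumptions, "PPBES-4" and "PPBES 4" both break into the tokens ppbes and 4.

The sketch below is a simplified stand-in for such an analyzer. It splits on any non-alphanumeric character and lower-cases the result. It is not Solr's actual analysis chain, and the field, analyzer and default operator that biocache-test uses are assumptions.

```python
import re


def analyze(value: str) -> list[str]:
    """Split on runs of non-alphanumeric characters and lower-case the tokens.

    A rough approximation of a standard tokenizer plus a lower-case filter.
    """
    return [tok.lower() for tok in re.split(r"[^0-9A-Za-z]+", value) if tok]


def all_terms_match(indexed_value: str, query: str) -> bool:
    """True if every query token appears among the indexed value's tokens (AND)."""
    return set(analyze(query)) <= set(analyze(indexed_value))


if __name__ == "__main__":
    print(analyze("PPBES-4"), analyze("PPBES 4"))  # ['ppbes', '4'] ['ppbes', '4']
    for name in ["PPBES", "PPBES-3", "PPBES-4"]:
        print(name, all_terms_match(name, "PPBES"), all_terms_match(name, "PPBES 4"))
```

With OR semantics, "PPBES 4" would instead match every PPBES dataset. The observed result therefore suggests that AND or phrase matching is in effect.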
Query: EventID/Field Number = PPBES-4 102 1
As a user, I want to see all the records associated with the field site PPBES-4 102 1.
I expect to see approximately 35 records (if I query on PPBES-4 in DatasetName and then facet on EventID, I get 35 records).
A query using EventID gives 33 records (with filters) or 37 records (no filters).
A query using Field Number gives 33 records (with filters) or 37 records (no filters).
Conclusion: the query runs as expected, with the bonus that I can use either EventID or Field Number.
Query: EventID/Field Number = PPBES-4 102
As a user, I want to see all the records associated with field site PPBES-4 102, but this time I have left the final 1 off the end (a partial-match query).
A query using EventID returns zero matches.
A query using Field Number returns zero matches.
Conclusion: the query fails.
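This failure is consistent with Event ID and Field Number being matched as whole values. "PPBES-4 102" is a prefix of "PPBES-4 102 1" but is not equal to it.

One way to support this kind of partial match is a Lucene/Solr prefix (wildcard) query. In such a query, spaces and query-syntax characters such as - must be backslash-escaped. The helper below builds that query string. The field name fieldNumber comes from the schema comparison in this thread. Whether the search UI should generate queries this way is an assumption.

```python
# Characters with special meaning in the Lucene/Solr standard query syntax.
_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(value: str) -> str:
    """Backslash-escape query-syntax characters and whitespace in a literal value."""
    out = []
    for ch in value:
        if ch in _SPECIAL or ch.isspace():
            out.append("\\")
        out.append(ch)
    return "".join(out)


def prefix_query(field: str, partial_value: str) -> str:
    """Build a prefix query matching any value that starts with partial_value."""
    return f"{field}:{escape_term(partial_value)}*"


if __name__ == "__main__":
    print(prefix_query("fieldNumber", "PPBES-4 102"))
    # fieldNumber:PPBES\-4\ 102*
```

Keep three caveats in mind. On a string field this is a case-sensitive prefix match. It would also match a hypothetical site such as "PPBES-4 1020 1". Prefix terms are not run through a text field's full analyzer.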
Query: EventID/Field Number = PPBES-4 102 1, PPBES-4 110 1, PPBES-4 203 1, PPBES-4 205 1
As a user, I am instructed to enter multiple Event IDs, one per line.
A query using EventID returns 120 records (with filters) or 128 (no filters).
A query using Field Number returns 120 records (with filters) or 128 (no filters).
Conclusion: the query runs as expected, with the bonus that I can use either EventID or Field Number.
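A one-per-line list like this maps naturally onto a single Solr query that ORs together exact, quoted values. The sketch below shows one way to build it. The function name is hypothetical, and eventID is the field name used in the schema comparison in this thread. This is not necessarily how the Event Search tab constructs its query.

```python
def multi_value_query(field: str, lines: str) -> str:
    """Turn one value per line into field:("v1" OR "v2" ...).

    Blank lines are skipped; backslashes and double quotes inside a value
    are escaped so each value is matched as a single quoted phrase.
    """
    values = [line.strip() for line in lines.splitlines() if line.strip()]
    if not values:
        raise ValueError("no values supplied")
    quoted = []
    for v in values:
        escaped = v.replace("\\", "\\\\").replace('"', '\\"')
        quoted.append(f'"{escaped}"')
    return f"{field}:({' OR '.join(quoted)})"


if __name__ == "__main__":
    ids = """PPBES-4 102 1
PPBES-4 110 1
PPBES-4 203 1
PPBES-4 205 1"""
    print(multi_value_query("eventID", ids))
    # eventID:("PPBES-4 102 1" OR "PPBES-4 110 1" OR "PPBES-4 203 1" OR "PPBES-4 205 1")
```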


qifeng-bai commented on June 11, 2024

@elywallis It has been deployed to biocache-test.ala.org.au, but only a limited set of datasets is available there.


javier-molina commented on June 11, 2024

@qifeng-bai we should be able to issue partial searches; it seems that broke at some point. For example:

Searching for datasetname: PPBES should return matches for PPBES, PPBES-3 and PPBES-4.
Currently it returns results only for the exact match PPBES.

Partial match should work for all fields on the event search tab.

See #43 for additional changes to Event Search tab


qifeng-bai commented on June 11, 2024

If we check:

https://biocache-ws-test.ala.org.au/ws/occurrences/search?q=dataset_name:PPBES returns >6000 records
(https://atlaslivingaustralia.slack.com/archives/CC0JZ0YH0/p1655162555158999),
but https://biocache-ws-test.ala.org.au/ws/occurrences/search?q=dataset_name:ppbes returns zero.
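The PPBES/ppbes difference is consistent with dataset_name being indexed as an untokenized string field. Such a field matches only the complete value with identical case, whereas a text field lower-cases values at both index and query time. The toy comparison below models the two behaviours. It is an illustration, not Solr's implementation, and the field type of dataset_name on biocache-ws-test is inferred from the results above rather than confirmed.

```python
# Dataset-name values mentioned in this thread (not a full list).
DATASET_NAMES = ["PPBES", "PPBES-3", "PPBES-4"]


def string_field_hits(query: str) -> list[str]:
    """Untokenized string field: only the complete, case-identical value matches."""
    return [name for name in DATASET_NAMES if name == query]


def lowercased_hits(query: str) -> list[str]:
    """Case-insensitive whole-value match (lower-casing only, no tokenizing)."""
    return [name for name in DATASET_NAMES if name.lower() == query.lower()]


if __name__ == "__main__":
    print(string_field_hits("PPBES"))  # ['PPBES']
    print(string_field_hits("ppbes"))  # []
    print(lowercased_hits("ppbes"))    # ['PPBES']
```

Lower-casing alone still does not make PPBES match PPBES-4. That also requires tokenizing or partial matching, which is what the separate partial-search field is for.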

Solution
Add datasetName to the 'Text' field to enable full-text search.
Add datasetName to 'text_datasetName' to enable partial search (requires an index rebuild).
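In Solr terms, "adding datasetName into" another field is usually done with a copyField rule. That rule copies the source value into the destination field at index time, which is why a rebuild is needed.

Suppose the change were applied to a running collection through Solr's Schema API rather than by editing managed-schema in Pipelines. The request body could then look like the one built below. The destination names text and text_datasetName follow this comment and are assumptions about the actual schema. The snippet only prints the JSON and does not send it.

```python
import json


def copy_field_command(source: str, destinations: list[str]) -> str:
    """Return a Solr Schema API request body that adds copyField rules.

    The body is meant to be POSTed to /solr/<collection>/schema; this
    function only builds the JSON and does not contact Solr.
    """
    return json.dumps(
        {"add-copy-field": {"source": source, "dest": destinations}}, indent=2
    )


if __name__ == "__main__":
    # Field names taken from the comment above; verify against the real schema.
    print(copy_field_command("datasetName", ["text", "text_datasetName"]))
```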


qifeng-bai commented on June 11, 2024

@javier-molina
Prod and Test both have text_datasetName defined, which enables partial search, but in different positions in the schema, and there are slight differences between them (see below).

However, no text_datasetName field is defined in Pipelines/managed-schema (dev branch), and @djtfmartin mentioned this schema should be the source of truth.

I will change the following Event-related fields:

Field         | Prod                       | Test                                     | Pipelines (added by Bai)
------------- | -------------------------- | ---------------------------------------- | ----------------------------------------
eventID       | text                       | text                                     | partial
parentEventID | string                     | string                                   | partial
fieldNumber   | string                     | string                                   | partial
datasetName   | partial (text_datasetName) | partial (text_datasetName) / text search | partial (text_datasetName) / text search
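The type changes in the table could be expressed as replace-field commands, again assuming a hypothetical Schema API setting. There are two caveats. First, replace-field overwrites the whole field definition, so attributes such as stored, docValues or multiValued must be copied from the current definition; they are a parameter here because the thread does not list them. Second, "partial" is the field-type name from the table, and its analyzer definition is not shown in this thread. A full reindex would still be needed.

```python
import json
from typing import Optional


def replace_field_commands(
    new_types: dict[str, str],
    current_attrs: Optional[dict[str, dict]] = None,
) -> str:
    """Build a Schema API body with one replace-field definition per field.

    current_attrs should carry each field's existing attributes (stored,
    docValues, multiValued, ...), because replace-field discards anything
    that is not restated.
    """
    current_attrs = current_attrs or {}
    definitions = []
    for name, field_type in new_types.items():
        definition = dict(current_attrs.get(name, {}))
        definition.update({"name": name, "type": field_type})
        definitions.append(definition)
    # Several definitions for the same command are sent as a list under one key.
    return json.dumps({"replace-field": definitions}, indent=2)


if __name__ == "__main__":
    # Target types from the "Pipelines (added by Bai)" column of the table.
    print(replace_field_commands(
        {"eventID": "partial", "parentEventID": "partial", "fieldNumber": "partial"}
    ))
```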

Other differences between Test and Dev:

Dev Test
taxonConceptID to text
