Comments (8)
It looks like the first attempt is biocache-hubs#341, but it should be possible to deliver it now that the new Solr (as per the Core Infrastructure Upgrade) is in place.
@adam-collins open for comments/feedback
from extended-data-model.
AtlasOfLivingAustralia/ala-install@75ea0b7
Visible on the Event Search tab of https://biocache-databox.ala.org.au/#tab_eventSearch
From Ely:
Testing Events queries in biocache-test.ala.org.au
Background
Testing is done using the acronym PPBES, which stands for Port Phillip Bay Environmental Survey
This was a survey conducted several times over a number of years.
Dataset name will contain the name of the survey/expedition and will vary slightly whilst always containing the acronym PPBES (PPBES, PPBES-3, PPBES-4 etc).
Within each individual survey field trip there will be multiple field collection sites. Each field collection site will be provided in the data as Field Number (not in Event ID)
Each field collection site (Field Number) may have multiple occurrences associated with it.
Query: Dataset name = PPBES
As a user, I am likely to be interested in all the specimens collected during a particular PPBES expedition. I may also be interested in specimens collected at a specific site across multiple surveys.
Returned 7,904 records with DQ filters on, 11,390 with DQ filters off.
The largest category of exclusions is "Exclude duplicate records", which filters out 3,402 records. Note that these are almost certainly not duplicates but multiple occurrences of the same species at the same location and time. This is a common error in duplicate detection for animal records.
First record returned is https://biocache-test.ala.org.au/occurrences/bf28998b-cf64-4e33-bbe7-5037f31cda44
This record shows that the dataset name supplied in the record is PPBES-4, so the partial match search is successful.
I can also customise my filters to include “EventID”. Even though the raw data is being supplied in “Field Number” I can use EventID to filter on Field number.
Conclusion: the query is successful, but the ability to filter further on Dataset Name is still missing. I have done a general query “PPBES” but now want to filter my results further to show how many occurrences are associated with each of PPBES, PPBES-3 and PPBES-4.
Solution: add a facet to filter on “Dataset Name”. Could DatasetName also be added to the facet list?
Query: Dataset name = PPBES-4
As a user, I expect this query to give me all the occurrences associated with survey PPBES-4. I don’t expect any other PPBES survey data (e.g. PPBES-3) to be returned in the result set.
Result: the query returns 1,061 records with filters on (1,209 with filters off)
If I show the list of field collection sites under the facet “Event ID” all the field collection sites are labelled as “PPBES-4” and then the site number, so the PPBES-4 query has been successful.
Conclusion: query runs as expected
Run the same query but without the hyphen in the Dataset name. Query = PPBES 4
Result: query returns the same result as PPBES-4 (with the hyphen)
Conclusion: query runs as expected
Query: EventID/Field Number = PPBES-4 102 1
As a user, I want to see all the records associated with the field site PPBES-4 102 1
I expect to see approximately 35 records. (If I query on PPBES-4 in DatasetName then facet on EventID, I get 35 records)
Query using EventID gives 33 records (with filter), or 37 records (no filter)
Query using Field Number gives 33 records (with filter) or 37 records (no filter)
Conclusion: query runs as expected, with a bonus that I can use either EventID or Field Number
Query: EventID/Field Number = PPBES-4 102
As a user, I want to see all the records associated with the field site PPBES-4 102 – but this time I have left a 1 off the end (partial match query)
Query using EventID I get zero matches
Query using Field Number I get zero matches
Conclusion: query fails
Query: EventID/Field Number = PPBES-4 102 1, PPBES-4 110 1, PPBES-4 203 1, PPBES-4 205 1
As a user, I am instructed to enter multiple Event IDs with one per line
Query using EventID I get 120 records (with filter), 128 (no filter)
Query using Field Number I get 120 records (with filter), 128 (no filter)
Conclusion: query runs as expected, with a bonus that I can use either EventID or Field Number
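The multi-ID search above can be pictured as a single OR query over one field. The following is a minimal sketch only: the helper name `build_event_id_query` is hypothetical, the field names `eventID` and `fieldNumber` are taken from the schema table later in this thread, and the way biocache-hubs actually builds the query from the one-per-line input is an assumption. It uses standard Lucene-style phrase quoting so that IDs containing spaces stay intact.

```python
def build_event_id_query(raw_input, field="eventID"):
    """Turn one-ID-per-line input into a single OR query on one field.

    Hypothetical helper for illustration; not taken from biocache-hubs.
    """
    # Drop blank lines and surrounding whitespace
    ids = [line.strip() for line in raw_input.splitlines() if line.strip()]
    if not ids:
        raise ValueError("no event IDs supplied")
    # Quote each ID so embedded spaces are treated as part of one phrase,
    # escaping backslashes and double quotes inside the value
    quoted = ['"' + i.replace("\\", "\\\\").replace('"', '\\"') + '"' for i in ids]
    return field + ":(" + " OR ".join(quoted) + ")"


user_input = """PPBES-4 102 1
PPBES-4 110 1
PPBES-4 203 1
PPBES-4 205 1
"""

query = build_event_id_query(user_input)
print(query)
```

Passing `field="fieldNumber"` would produce the equivalent query on Field Number, which matches the observation that both fields gave the same counts.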
@elywallis It has been deployed to biocache-test.ala.org.au, but it has only limited datasets available
@qifeng-bai we should be able to issue partial searches; it seems that broke at some point. For example:
Searching for datasetname: PPBES should return matches for PPBES, PPBES-3, PPBES-4
Currently it returns results only for the exact match PPBES
Partial match should work for all fields on the event search tab.
See #43 for additional changes to Event Search tab
If we check:
https://biocache-ws-test.ala.org.au/ws/occurrences/search?q=dataset_name:PPBES returns >6000 records
(https://atlaslivingaustralia.slack.com/archives/CC0JZ0YH0/p1655162555158999),
but https://biocache-ws-test.ala.org.au/ws/occurrences/search?q=dataset_name:ppbes returns zero.
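To make the case-sensitivity check repeatable, the two URLs above can be generated from one value. This is a rough sketch: the base URL and the `dataset_name` field come from the URLs quoted in this comment, while the helper names `search_url` and `case_variants` are hypothetical. It only builds the URLs and does not call the service, so no response format is assumed.

```python
from urllib.parse import urlencode

# Base URL as quoted in this thread (test environment)
BASE_URL = "https://biocache-ws-test.ala.org.au/ws/occurrences/search"


def search_url(field, value):
    """Build an occurrence search URL for a single field:value query."""
    # urlencode percent-encodes the colon, which decodes back to field:value
    return BASE_URL + "?" + urlencode({"q": field + ":" + value})


def case_variants(value):
    """Return distinct case variants of a value, preserving order."""
    candidates = [value, value.lower(), value.upper(), value.capitalize()]
    # dict.fromkeys removes duplicates while keeping first-seen order
    return list(dict.fromkeys(candidates))


urls = [search_url("dataset_name", v) for v in case_variants("PPBES")]
for url in urls:
    print(url)
```

If the field were case-insensitive, each generated URL would be expected to report the same record count.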
Solution
Add datasetName to the 'Text' field to support full text search
Add datasetName to 'text_datasetName' to support partial search (requires an index rebuild)
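The difference between the current behaviour and the proposed one can be illustrated with a deliberately simplified model. This is not how Solr's analyzers are implemented; it assumes only that an untokenised string field matches on exact, case-sensitive equality, and that a text-style field lowercases the value and splits it on non-alphanumeric characters. The function names are hypothetical.

```python
import re


def exact_match(indexed, query):
    """Model of an untokenised string field: whole value, case-sensitive."""
    return indexed == query


def tokens(value):
    """Lowercase a value and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", value.lower())


def text_match(indexed, query):
    """Model of a tokenised text field: every query token must be present."""
    indexed_tokens = set(tokens(indexed))
    query_tokens = tokens(query)
    return bool(query_tokens) and all(t in indexed_tokens for t in query_tokens)


dataset_names = ["PPBES", "PPBES-3", "PPBES-4"]

for q in ["PPBES", "ppbes", "PPBES 4"]:
    exact = [n for n in dataset_names if exact_match(n, q)]
    text = [n for n in dataset_names if text_match(n, q)]
    print(q, "| exact:", exact, "| text:", text)
```

Under this model, the exact field reproduces the reported behaviour (only `PPBES` matches, and `ppbes` matches nothing), while the tokenised field returns all three dataset names for `PPBES` or `ppbes` and only `PPBES-4` for `PPBES 4`.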
@javier-molina
Prod and Test both have text_datasetName defined, which enables partial search, but in different positions. They have slight differences (see below).
However, there is no text_datasetName field defined in Pipelines/managed-schema (dev branch). @djtfmartin mentioned this schema should be a point of truth.
Planned changes to event-related fields:
| Prod | Test | Pipelines (added by Bai) |
|---|---|---|
| eventID: text | eventID: text | eventID: partial |
| parentEventID: string | parentEventID: string | parentEventID: partial |
| fieldNumber: string | fieldNumber: string | fieldNumber: partial |
| datasetName: partial (text_datasetName) | datasetName: partial (text_datasetName) / text search | datasetName: partial (text_datasetName) / text search |
Other differences between Test and Dev
Dev | Test |
---|---|
taxonConceptID to text |