Comments (6)
@RogerGee - I am sorry it has taken so long to reply. The full-text search in NADA does not support OR; it always performs an AND query combining all keywords, as you guessed. We will add support for OR in the next release of NADA.
For a better full-text search, NADA supports Solr (https://solr.apache.org/). Solr offers a much better search and lots of options to configure and tune the search according to your needs.
from nada.
@mah0001 Thank you very much for your response. I will look into Solr integration and appreciate the suggestion.
Kindly let me know if you'd like me to close this issue or leave it open pending support for operator OR in the next release of NADA.
I configured an instance of NADA to connect to a Solr server. After getting everything indexed in Solr, I tried a few searches with Solr's Boolean operator syntax. The syntax didn't work. It appears, as with MySQL FULLTEXT, that the operators are getting stripped out of the search string. (See this snippet of the relevant source code that escapes terms.)
Please let me know if you have any recommendations or if I am missing something obvious with Solr. Otherwise, I will wait on future releases of NADA that may support more complex search expressions. Thank you.
@RogerGee - The terms are not stripped; the method $this->escapeTerm escapes special characters such as spaces so they can be passed to Solr. Without escaping, search queries will fail.
Try this:
1 - Enable debug mode for Solr so you can see the actual queries run by Solr. Edit application/config/solr.php and change the value of the 'solr_debug' setting to true:
$config['solr_debug']=true;
2 - Now try a search on the catalog search page; it will print the raw query executed by Solr. You should be able to see whether anything has been removed.
3 - Copy the raw query and run it directly in the Solr UI to see if you get the same results.
4 - Change the mm (minimum match) option (for more info on mm, see https://solr.apache.org/guide/7_7/the-extended-dismax-query-parser.html). In the 'application/config/solr.php' file, change the value for 'mm' to '0%':
'mm'=>'0%',
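Step 3 can also be done outside the admin UI by requesting Solr's select handler directly. The sketch below only builds such a URL. The host, port, and core name are assumptions to replace with your own, and it assumes the query goes through the extended DisMax parser, which is the parser the mm setting belongs to.

```php
<?php
// Assumed values: replace with your own Solr host, port, and core name.
$base = 'http://127.0.0.1:8983/solr/nada/select';

$params = [
    'q'          => 'population NOT albania', // raw query copied from the debug output
    'defType'    => 'edismax',                // extended DisMax parser, which honours mm
    'mm'         => '0%',                     // minimum match: no optional clause is mandatory
    'debugQuery' => 'true',                   // ask Solr to report how it parsed the query
];

// PHP_QUERY_RFC3986 encodes spaces as %20 rather than '+'.
$url = $base . '?' . http_build_query($params, '', '&', PHP_QUERY_RFC3986);

echo $url . PHP_EOL;
```

Opening the printed URL in a browser or with curl should return the same documents as the admin UI, along with a debug section showing the parsed query.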
Here is a screenshot with debug mode enabled:
@mah0001 Thanks so much for your detailed response. I'm not well-versed in Solr, so it was very helpful!
I walked through the steps you outlined and noticed something peculiar. When I execute a query similar to yours (i.e. using the Boolean operator NOT), the spaces in my query are escaped, whereas in your query they remain unescaped. When I run the query (with escaped spaces) in the Solr UI, I get the same unexpected results as in NADA. If I take out the backslashes, the results are correct. I presume this is because an escaped space is not treated as a delimiter for parsing a Boolean operator.
For example, when I search oklahoma NOT health, the q query parameter gets set to oklahoma\ NOT\ health. When you ran population NOT albania, the query parameter was set correctly to population NOT albania (i.e. without backslashes). I'm not sure why it would work one way for you and differently for me. See screenshot below.
(As you can see, the first search result contains the word health, which is incorrect.)
I verified that this is happening due to Solarium\Core\Query\Helper::escapeTerm. When I run the snippet of code below, I get the value with the escaped spaces:
<?php
require_once 'vendor/autoload.php';

// Connection settings for the local Solr core.
$config = [
    'endpoint' => [
        'localhost' => [
            'host' => '127.0.0.1',
            'port' => 8983,
            'path' => '/',
            'core' => 'nada',
        ],
    ],
];

$adapter = new Solarium\Core\Client\Adapter\Curl;
$event_dispatcher = new Symfony\Component\EventDispatcher\EventDispatcher;
$solr_client = new Solarium\Client($adapter, $event_dispatcher, $config);

// The query helper provides escapeTerm().
$query = $solr_client->createSelect();
$helper = $query->getHelper();

// Escape each search string as a whole and print the result.
$search = 'oklahoma NOT state';
echo $helper->escapeTerm($search) . PHP_EOL;

$search = 'population NOT albania';
echo $helper->escapeTerm($search) . PHP_EOL;
The result of running this code in my environment is:
oklahoma\ NOT\ state
population\ NOT\ albania
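The effect can be reproduced without Solarium or a running Solr server. The sketch below uses a simplified stand-in escaper (the function name and character list are illustrative, not Solarium's implementation) to show what happens when a whole search string is escaped as if it were one term.

```php
<?php
/**
 * Simplified stand-in for a term escaper (NOT Solarium's implementation):
 * prefixes spaces and a few query-syntax characters with a backslash.
 */
function escape_term_standin(string $term): string
{
    return preg_replace('#([ +\-!(){}\[\]^"~*?:\\\\/])#', '\\\\$1', $term);
}

$search = 'oklahoma NOT health';

// Escaping the whole string also escapes the spaces around NOT, so the
// query parser receives one long term instead of two terms and an operator.
$whole = escape_term_standin($search);

echo $whole . PHP_EOL; // oklahoma\ NOT\ health
```

Because the spaces are now literal characters, NOT no longer stands on its own as an operator, which is consistent with the unexpected results described above.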
I'm using the version of Solarium that is bundled with NADA's codebase (i.e. 6.2.7).
It seems that escapeTerm is designed to escape a single term. I don't know if it is supposed to be used to prepare the entire search string. I tried to alter the implementation in Catalog_search_solr.php to parse out the terms and escape each individually. This is pretty rudimentary and doesn't handle the case where you'd want a space included in a search term.
$query_keywords = preg_split('/\s+/', $this->study_keywords);
array_walk($query_keywords, function (string &$term) use ($helper) {
    $BOOLEAN_OPS = ['AND', 'OR', 'NOT'];
    if (!in_array(strtoupper($term), $BOOLEAN_OPS)) {
        $term = $helper->escapeTerm($term);
    }
});
$query->setQuery(implode(' ', $query_keywords));
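One way to cover the case of a space inside a search term is to treat double-quoted phrases as single tokens. The sketch below is an illustration under stated assumptions, not NADA's code: build_query and escape_token are hypothetical names, and escape_token is a simplified stand-in for where $helper->escapeTerm would be called.

```php
<?php
// Simplified stand-in for a single-term escaper (not Solarium's implementation).
function escape_token(string $token): string
{
    return preg_replace('#([ +\-!(){}\[\]^"~*?:\\\\/])#', '\\\\$1', $token);
}

/**
 * Hypothetical helper: escape user input token by token while leaving
 * Boolean operators and double-quoted phrases intact.
 */
function build_query(string $input): string
{
    $operators = ['AND', 'OR', 'NOT'];

    // Match either a complete "quoted phrase" or a run of non-space characters.
    preg_match_all('/"[^"]*"|\S+/', trim($input), $matches);

    $parts = [];
    foreach ($matches[0] as $token) {
        if (in_array($token, $operators, true)) {
            // Uppercase operator: pass through unchanged.
            $parts[] = $token;
        } elseif (strlen($token) >= 2 && $token[0] === '"' && substr($token, -1) === '"') {
            // Phrase: keep the quotes and spaces, escape only backslashes inside.
            $inner = substr($token, 1, -1);
            $parts[] = '"' . str_replace('\\', '\\\\', $inner) . '"';
        } else {
            // Ordinary term: escape query-syntax characters.
            $parts[] = escape_token($token);
        }
    }

    return implode(' ', $parts);
}

echo build_query('oklahoma NOT health') . PHP_EOL; // oklahoma NOT health
echo build_query('"new york" AND C++') . PHP_EOL;  // "new york" AND C\+\+
```

Unlike the snippet above, this sketch matches operators in uppercase only, so a lowercase "not" is treated as an ordinary word. It also does not understand fielded syntax such as title:"country survey", which would be split at the space and escaped.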
I'll keep looking into this to see if there's possibly some quirk of my environment that explains this. I appreciate any comments you might have. Thanks!
@RogerGee Excellent debugging. You are right, the escapeTerm usage was incorrect. For our NADA instance with Solr, we are not escaping the terms at all. I have pushed the changes to the nada-5.3 branch. This branch has lots of other changes, so it might be easier to replace your Catalog_search_solr.php with this: (https://github.com/ihsn/nada/blob/nada-5.3/application/libraries/Catalog_search_solr.php)
With escapeTerm removed, you should be able to get the correct results for Boolean queries, and you can also use various other search options (https://solr.apache.org/guide/6_6/the-standard-query-parser.html).
Here is an example of searching using specific fields:
title:"country survey" AND years:2021
Let me know if you notice any issues.