Git Product home page Git Product logo

Comments (6)

mah0001 avatar mah0001 commented on June 2, 2024

@RogerGee - I am sorry it has taken so long to reply. The full-text search in NADA does not support OR, it always performs an AND query combining all keywords as you have guessed. We will add support for OR in the next release of NADA.

For a better full-text search, NADA supports Solr (https://solr.apache.org/). Solr offers a much better search and lots of options to configure and tune the search according to your needs.

from nada.

RogerGee avatar RogerGee commented on June 2, 2024

@mah0001 Thank you very much for your response. I will look into Solr integration and appreciate the suggestion.

Kindly let me know if you'd like me to close this issue or leave it open pending support for operator OR in the next release of NADA.

from nada.

RogerGee avatar RogerGee commented on June 2, 2024

I configured an instance of NADA to connect to a Solr server. After getting everything indexed in Solr, I tried a few searches with Solr's Boolean operator syntax. The syntax didn't work. It appears, as with MySQL FULLTEXT, that the operators are getting stripped out of the search string. (See this snippet of the relevant source code that escapes terms.)

Please let me know if you have any recommendations or if I am missing something obvious with Solr. Otherwise, I will wait on future releases of NADA that may support more complex search expressions. Thank you.

from nada.

mah0001 avatar mah0001 commented on June 2, 2024

@RogerGee - The terms are not stripped, the method $this->escapeTerm escapes special characters such as spaces to be passed to SOLR. Without escaping, search queries will fail.

Try this:
1 - Enable debug mode for SOLR so you can see the actual queries run by SOLR. Edit application/config/solr and change the value for the setting 'solr_debug' to 'true':

$config['solr_debug']=true;

2 - Now try a search on the catalog search page, it will print the raw query executed by SOLR. You should be able to see if anything has been removed.

3- Copy the raw query and try directly in SOLR UI to see if you get same results.

Change mm (minimum match) options (for more info on mm, see https://solr.apache.org/guide/7_7/the-extended-dismax-query-parser.html).

4 - In the 'application/config/solr.php' file, change the value for 'mm' to '0%'.

'mm'='0%',

Here is screenshot with the Debug mode enabled:
image

from nada.

RogerGee avatar RogerGee commented on June 2, 2024

@mah0001 Thanks so much for your detailed response. I'm not well-versed in Solr, so it was very helpful!

I walked through the steps you outlined and noticed something peculiar. When I execute a query similar to yours (i.e. using the Boolean operator NOT), the spaces in my query are escaped whereas in your query they remain unescaped. When I run the query (with escaped spaces) in the Solr UI, I get the same unexpected results as in NADA. If I take out the backslashes, the results are correct. I presume this is because an escaped space is not treated as a delimiter for parsing a Boolean operator.

For example, when I search oklahoma NOT health, the q query parameter gets set to oklahoma\ NOT\ health. When you ran population NOT albania, the query parameter was set correctly to population NOT albania (i.e. without backslashes). I'm not sure why it would work one way for you and differently for me. See screenshot below.

Screen Shot 2023-11-14 at 11 24 09 AM

(As you can see, the first search result contains the word health, which is incorrect.)

I verified that this is happening due to Solarium\Core\Query\Helper::escapeTerm. When I run the snippet of code below, I get the value with the escaped spaces:

<?php

require_once 'vendor/autoload.php';

$config = [
    'endpoint' => [
        'localhost' => [
            'host' => '127.0.0.1',
            'port' => 8983,
            'path' => '/',
            'core' => 'nada',
        ],
    ],
];

$adapter = new Solarium\Core\Client\Adapter\Curl;
$event_dispatcher = new Symfony\Component\EventDispatcher\EventDispatcher;
$solr_client =  new Solarium\Client($adapter,$event_dispatcher,$config);

$query = $solr_client->createSelect();
$helper = $query->getHelper();

$search = 'oklahoma NOT state';
echo $helper->escapeTerm($search) . PHP_EOL;

$search = 'population NOT albania';
echo $helper->escapeTerm($search) . PHP_EOL;

The result of running this code in my environment is:

oklahoma\ NOT\ state
population\ NOT\ albania

I'm using the version of Solarium that is bundled with NADA's codebase (i.e. 6.2.7).

It seems that escapeTerm is designed to escape a single term. I don't know if it is supposed to be used to prepare the entire search string. I tried to alter the implementation in Catalog_search_solr.php to parse out the terms and escape each individually. This is pretty rudimentary and doesn't handle the case where you'd want a space included in a search term.

$query_keywords = preg_split('/\s+/',$this->study_keywords);
array_walk($query_keywords,function (string &$term) use($helper) {
    $BOOLEAN_OPS = ['AND','OR','NOT'];
    if (!in_array(strtoupper($term),$BOOLEAN_OPS)) {
        $term = $helper->escapeTerm($term);
    }
});
$query->setQuery(implode(' ',$query_keywords));

I'll keep looking into this to see if there's possibly some quirk of my environment that explains this. I appreciate any comments you might have. Thanks!

from nada.

mah0001 avatar mah0001 commented on June 2, 2024

@RogerGee Excellent debugging. You are right, escapeTerms usage was incorrect. For our NADA instance with Solr, we are not escaping the terms at all. I have pushed the changes to nada-5.3 branch. This branch has lots of other changes, it might be easier to replace your Catalog_search_solr.php
with this: (https://github.com/ihsn/nada/blob/nada-5.3/application/libraries/Catalog_search_solr.php)

With the escapeTerms removed, you should be able to get the correct results for Boolean queries and you can also use various other search options (https://solr.apache.org/guide/6_6/the-standard-query-parser.html):

Here is an example of searching using specific fields:

title:"country survey" AND years:2021

Let me know if you notice any issues.

from nada.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    πŸ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. πŸ“ŠπŸ“ˆπŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❀️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.