
moodle-search_elastic's Introduction

Moodle Global Search - Elasticsearch Backend

This plugin allows Moodle to use Elasticsearch as the search engine for Moodle's Global Search.

The following features are provided by this plugin:

  • Multiple versions of Elasticsearch
  • File indexing
  • Request signing, compatible with Amazon Web Services (AWS)
  • Respects Moodle Proxy settings
  • Image recognition and indexing (via external service)

Supported Moodle Versions

This plugin currently supports Moodle:

Moodle version     | Branch
Moodle 4.2 and up  | MOODLE_402_STABLE
Moodle 3.10 to 4.1 | MOODLE_310_STABLE
Moodle 3.5 to 3.9  | master
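
As a rough illustration of the table above, the branch choice can be expressed as a version comparison. This is only a sketch: the function names are made up for this example and are not part of the plugin.

```python
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, int]:
    """Turn a string such as "3.11" or "4.2.1" into a (major, minor) tuple."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def branch_for_moodle(version: str) -> Optional[str]:
    """Return the plugin branch for a Moodle version, or None if unsupported."""
    v = parse_version(version)
    if v >= (4, 2):
        return "MOODLE_402_STABLE"
    if v >= (3, 10):
        return "MOODLE_310_STABLE"
    if v >= (3, 5):
        return "master"
    return None
```

Versions are compared as integer tuples rather than strings so that 3.10 sorts after 3.9.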

Elasticsearch Version Support

Currently this plugin is tested to work against the following versions of Elasticsearch:

  • 5.5.0
  • 6.4.1
  • 6.6.1
  • 7.17.9
  • 8.5.3

And the following version of OpenSearch:

  • 2.4.1

Verified Platforms

This plugin has been tested to work on the following cloud platforms:

Generic Elasticsearch Setup

To use this plugin you will first need to set up an Elasticsearch service.

The following is the bare minimum to get Elasticsearch working in a Debian/Ubuntu environment. Consult the Elasticsearch documentation for in-depth instructions, or for details on how to install on other operating systems.

NOTE: The instructions below should only be used for test and development purposes. Don't do this in production. For a production setup we recommend running Elasticsearch as a cluster. Getting started documentation can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html

Elasticsearch requires Java as a prerequisite. To install Java:

sudo apt-get install default-jre default-jdk

Once Java is installed, the following commands will install and start Elasticsearch.

wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-6.6.1.deb
sudo dpkg -i elasticsearch-6.6.1.deb
sudo update-rc.d elasticsearch defaults
sudo service elasticsearch start

A quick test can be performed by running the following from the command line.

curl -X GET 'http://localhost:9200'

The output should look something like:

{
  "name" : "1QHLiux",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "mLRqIsnVRrGdgg2OfHWNrg",
  "version" : {
    "number" : "5.1.2",
    "build_hash" : "c8c4c16",
    "build_date" : "2017-01-11T20:18:39.146Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

By default the Elasticsearch service is available on: http://localhost:9200
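
The same quick test can be scripted. The sketch below reads the version number out of the JSON shown above and compares it with the Elasticsearch versions this README lists as tested. The function names are illustrative, and `check_server` assumes a server is listening on the default address.

```python
import json
from urllib.request import urlopen

# Elasticsearch versions this README lists as tested.
TESTED_VERSIONS = {"5.5.0", "6.4.1", "6.6.1", "7.17.9", "8.5.3"}


def version_from_body(body: str) -> str:
    """Extract version.number from the JSON returned by the root endpoint."""
    return json.loads(body)["version"]["number"]


def is_tested_version(version: str) -> bool:
    """True if the version is one of the releases listed as tested."""
    return version in TESTED_VERSIONS


def check_server(url: str = "http://localhost:9200", timeout: float = 5.0) -> str:
    """Fetch the root endpoint and return the version the server reports."""
    with urlopen(url, timeout=timeout) as response:
        return version_from_body(response.read().decode("utf-8"))
```

A version outside the tested set may still work; the check only tells you whether you are on a release the plugin has been tested against.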

Docker

You can also run Elasticsearch with Docker. The project publishes an official container for each supported version, with instructions.

Azure Elasticsearch Setup

To use this plugin you will first need to set up an Elasticsearch service.

To use Microsoft Azure to provide an Elasticsearch service for Moodle:

  1. Create a Microsoft Azure account: Account creation page
  2. Create a Linux virtual machine and connect to virtual machine: Azure Linux virtual machine setup guide
  3. Setup an Elasticsearch service:

Elasticsearch requires Java as a prerequisite. To install Java:


sudo apt-get install default-jre default-jdk

Once Java is installed, the following commands will install and start Elasticsearch.


wget https://artifacts.elastic.co/downloads/elasticsearch/elasticsearch-5.5.0.deb
sudo dpkg -i elasticsearch-5.5.0.deb
sudo update-rc.d elasticsearch defaults
sudo service elasticsearch start

A quick test can be performed by running the following from the command line.


curl -X GET 'http://localhost:9200'

The output should look something like:


{
  "name" : "1QHLiux",
  "cluster_name" : "elasticsearch",
  "cluster_uuid" : "mLRqIsnVRrGdgg2OfHWNrg",
  "version" : {
    "number" : "5.1.2",
    "build_hash" : "c8c4c16",
    "build_date" : "2017-01-11T20:18:39.146Z",
    "build_snapshot" : false,
    "lucene_version" : "6.3.0"
  },
  "tagline" : "You Know, for Search"
}

AWS Elasticsearch Setup

To use this plugin you will first need to set up an Elasticsearch service.

To use Amazon Web Services (AWS) to provide an Elasticsearch service for Moodle:

  1. Create an AWS account: Account creation guide
  2. Setup an Elasticsearch service: AWS Elasticsearch setup guide

Elastic Cloud Setup

To use Elastic Cloud to provide an Elasticsearch service for Moodle:

  1. Create an Elastic Cloud account and set up Elasticsearch: Elasticsearch: Getting Started
  2. Generate an API key for your search index and enter this key in the Moodle plugin settings.

Moodle Plugin Installation

Once you have set up an Elasticsearch service you can install the Moodle plugin.

These setup steps are the same regardless of how you have set up the Elasticsearch service.

  1. Get the code and copy/install it to: <moodledir>/search/engine/elastic
  2. This plugin also depends on local_aws. Get the code from https://github.com/catalyst/moodle-local_aws and copy/install it into <moodledir>/local/aws (this is required regardless of how your Elasticsearch service is being supplied).
  3. Run the upgrade: sudo -u www-data php admin/cli/upgrade.php Note: the user may be different to www-data on your system.

Moodle Plugin Setup

Once you have set up an Elasticsearch service you can configure the Moodle plugin.

These setup steps are the same regardless of how you have set up the Elasticsearch service.

  1. Log into Moodle as an administrator
  2. Set up the plugin in Site administration > Plugins > Search > Manage global search by selecting elastic as the search engine.
  3. Configure the Elasticsearch plugin at: Site administration > Plugins > Search > Elastic
  4. Set hostname and port of your Elasticsearch server
  5. Optionally, change the Request size variable. Generally this can be left as is. Some Elasticsearch providers, such as AWS, limit how big the HTTP payload can be, so the plugin limits each request to this size in bytes.
  6. Optionally, set API Key as some cloud platforms (e.g. Elastic Cloud) use it for authorizing HTTP requests.
  7. To create the index and populate Elasticsearch with your site's data, run this CLI script: sudo -u www-data php search/cli/indexer.php --force
  8. Enable Global search in Site administration > Advanced features
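
To illustrate what the Request size setting in step 5 guards against, here is a minimal sketch of grouping documents into batches that stay under a byte limit. It is not the plugin's implementation. The function name and the newline-delimited JSON encoding are assumptions made for this example.

```python
import json
from typing import Iterable, Iterator, List


def batch_by_size(docs: Iterable[dict], max_bytes: int) -> Iterator[List[str]]:
    """Group JSON-encoded documents so each batch stays within max_bytes.

    A single document larger than max_bytes is still yielded on its own;
    a real client would have to decide whether to skip or reject it.
    """
    batch: List[str] = []
    size = 0
    for doc in docs:
        line = json.dumps(doc) + "\n"
        line_size = len(line.encode("utf-8"))
        # Close the current batch before it would exceed the limit.
        if batch and size + line_size > max_bytes:
            yield batch
            batch, size = [], 0
        batch.append(line)
        size += line_size
    if batch:
        yield batch
```

Sizes are measured on the UTF-8 encoded bytes, not on character counts, because the provider's limit applies to the HTTP payload.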

File Indexing Support

This plugin uses Apache Tika for file indexing support. Tika parses files, extracts the text, and returns it via a REST API.

Currently this plugin is tested to work against the following versions of Tika:

  • 1.16
  • 1.28
  • 2.5.0
  • 2.8.0

Tika Setup

Setting up a Tika test service is straightforward. In most cases on a Linux environment, you can simply download the Java JAR and then run the service.


wget http://apache.mirror.amaze.com.au/tika/tika-server-1.16.jar
java -jar tika-server-1.16.jar

This will start Tika on the host. By default the Tika service is available on: http://localhost:9998
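
Once the service is running you can check that it extracts text. The sketch below sends a local file to Tika Server's `PUT /tika` resource and asks for plain text back. Note that this is a different endpoint from the `/tika/form` POST that appears in the plugin patch quoted in one of the issues further down this page. The helper names are made up for this example.

```python
from urllib.request import Request, urlopen


def build_tika_request(data: bytes, host: str = "http://localhost", port: int = 9998) -> Request:
    """Build a PUT request asking Tika to return plain text for `data`."""
    url = f"{host.rstrip('/')}:{port}/tika"
    return Request(url, data=data, method="PUT", headers={"Accept": "text/plain"})


def extract_text(path: str, timeout: float = 30.0) -> str:
    """Send a local file to the Tika server and return the extracted text."""
    with open(path, "rb") as handle:
        request = build_tika_request(handle.read())
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")
```

Calling `extract_text("example.pdf")` against a running server should return the document text, where `example.pdf` stands for any file you have to hand.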

Enabling File indexing support in Moodle

Once a Tika service is available the Elasticsearch plugin in Moodle needs to be configured for file indexing support.
Assuming you have already followed the basic installation steps, to enable file indexing support:

  1. Configure the Elasticsearch plugin at: Site administration > Plugins > Search > Elastic
  2. Select the Enable file indexing checkbox.
  3. Set Tika hostname and Tika port of your Tika service. If you followed the basic Tika setup instructions the defaults should not need changing.
  4. Click the Save Changes button.

What is Tika

From the Apache Tika website:

The Apache Tika™ toolkit detects and extracts metadata and text from over a thousand different file types (such as PPT, XLS, and PDF). All of these file types can be parsed through a single interface, making Tika useful for search engine indexing, content analysis, translation, and much more. You can find the latest release on the download page. Please see the Getting Started page for more information on how to start using Tika.

Why use Tika as a standalone service?

It is common to see Elasticsearch implementations using an Elasticsearch file indexing plugin rather than a standalone service. Current Elasticsearch plugins are a wrapper around Tika. (The Solr search engine also uses Tika.)
Using Tika as a standalone service has the following advantages:

  • Can support file indexing for Elasticsearch setups that don't support file indexing plugins such as AWS.
  • No need to change setup or plugins based on Elasticsearch version.
  • You can share one Tika service across multiple Elasticsearch clusters.
  • Can run Tika on dedicated infrastructure that is not part of your search nodes.
  • Files stored using native Elasticsearch functionality are stored as separate records inside Elasticsearch, separate from the rest of the data stored for that file.
  • Ingesting files using native Elasticsearch functionality is very inefficient. Files are stored in the Elasticsearch internal database as base64-encoded strings. Base64 takes up about 33% more space than the original binary. This is in addition to the content extracted from the file, which is also stored in Elasticsearch.
  • The Elasticsearch documentation also states: "Extracting contents from binary data is a resource intensive operation and consumes a lot of resources. It is highly recommended to run pipelines using this processor in a dedicated ingest node."
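
The storage overhead of base64 mentioned in the list above is easy to check: every 3 bytes of input become 4 output characters, so the encoded form is about one third larger. A small sketch (the function name is illustrative):

```python
import base64
import os


def base64_overhead(raw: bytes) -> float:
    """Return the size of the base64 encoding relative to the raw bytes."""
    encoded = base64.b64encode(raw)
    return len(encoded) / len(raw)


if __name__ == "__main__":
    sample = os.urandom(3_000_000)  # stand-in for a 3 MB binary file
    print(f"encoded size is {base64_overhead(sample):.2f}x the original")
```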

Image Recognition and Indexing

This plugin can use the Amazon Web Services (AWS) Rekognition service (https://aws.amazon.com/rekognition/) to identify the contents of images. The identified content is then indexed by Elasticsearch and can be searched for in Moodle (cool huh?).

NOTE: Indexing of files by Moodle's core Global Search is currently limited to only indexing files from a couple of places. Tracker issue MDL-59459 has been raised to increase the coverage of the files indexed by Global Search.

Currently the best way to test image search functionality is to add an image via the Moodle course file resource.

Enabling image recognition and indexing support in Moodle

Once you have set up Elasticsearch in AWS, Moodle needs to be configured for Image Recognition.
Assuming you have already followed the basic installation steps and the file indexing steps, to enable Image Recognition:

  1. Configure the Elasticsearch plugin at: Site administration > Plugins > Search > Elastic
  2. Select the Enable image signing checkbox.
  3. Set Key ID, Secret Key and Region of your AWS credentials and Rekognition region.
  4. Click the Save Changes button.

NOTE: You will need a set of AWS API keys for an AWS IAM user with full Rekognition permissions. Setting this up is beyond the scope of this README. For further information see the AWS Documentation.

Request Signing

Amazon Web Services (AWS) provides Elasticsearch as a managed service. This makes it easy to provision and manage an Elasticsearch cluster.
One of the ways you can secure access to your data in Elasticsearch when using AWS is to use request signing. Request signing allows only valid signed requests to be accepted by the Elasticsearch endpoint. Requests that are unsigned are not authorised to access the endpoint.

Enabling Request Signing support in Moodle

Once you have set up Elasticsearch in AWS, Moodle needs to be configured for Request Signing.
Assuming you have already followed the basic installation steps, to enable Request Signing:

  1. Configure the Elasticsearch plugin at: Site administration > Plugins > Search > Elastic
  2. Select the Enable request signing checkbox.
  3. Set Key ID, Secret Key and Region of your AWS credentials and Elasticsearch region.
  4. Click the Save Changes button.
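
For background, the Key ID, Secret Key and Region from step 3 are the inputs to AWS Signature Version 4. The sketch below shows only the signing-key derivation step of that scheme, to make clear why the region matters. It is not the plugin's code, and a complete signature also requires building a canonical request and a string to sign, which are omitted here. The function names and the sample secret are placeholders.

```python
import hashlib
import hmac


def _hmac_sha256(key: bytes, message: str) -> bytes:
    return hmac.new(key, message.encode("utf-8"), hashlib.sha256).digest()


def derive_signing_key(secret_key: str, date_stamp: str, region: str, service: str = "es") -> bytes:
    """Derive the Signature Version 4 signing key.

    date_stamp is YYYYMMDD; "es" is the service name AWS uses for its
    managed Elasticsearch offering.
    """
    k_date = _hmac_sha256(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = _hmac_sha256(k_date, region)
    k_service = _hmac_sha256(k_region, service)
    return _hmac_sha256(k_service, "aws4_request")


def sign(signing_key: bytes, string_to_sign: str) -> str:
    """Return the hex signature computed over an already built string to sign."""
    return hmac.new(signing_key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()
```

Because the region is mixed into the key, a request signed for the wrong region is rejected even when the credentials are valid.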

Webservices

This plugin exposes two AJAX enabled webservices, to allow you to integrate Moodle's Global search with other systems and services. The two available webservices are:

  • search_elastic_search - Returns search results based on provided search query.
  • search_elastic_search_areas - Returns the search area IDs for each available search area in Moodle.

Setup and documentation of these services is consistent with other Moodle core web services.

This plugin sets up a pre-configured External service called Search service when the plugin is installed. This service adds and enables the two webservice methods provided by this plugin.

NOTE: You will need to have Global search and this plugin enabled and configured correctly before you can use the provided web services.
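
As a rough sketch of calling these functions from another system, the helper below builds a request URL for Moodle's standard REST web service endpoint. The site URL and token are placeholders. The `q` parameter in the usage note is an assumption, so check the generated API documentation on your site for the real parameter names.

```python
from urllib.parse import urlencode


def build_ws_url(site: str, token: str, function: str, params: dict) -> str:
    """Build a URL for Moodle's REST web service endpoint."""
    query = {
        "wstoken": token,
        "wsfunction": function,
        "moodlewsrestformat": "json",
    }
    query.update(params)  # function-specific arguments
    return f"{site.rstrip('/')}/webservice/rest/server.php?{urlencode(query)}"
```

For example, `build_ws_url("https://moodle.example.com", "TOKEN", "search_elastic_search", {"q": "forum"})` would produce a URL that can be fetched with any HTTP client, provided the token belongs to a user allowed to use the Search service.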

Test Setup

In order to run the PHPUnit tests for this plugin you need to set up and configure an Elasticsearch instance, as well as supply the instance details to Moodle. You need to define:

  • Hostname: The URL of the host of your Elasticsearch instance
  • Port: The TCP port the host is listening on
  • Index: The name of the index to use during tests. NOTE: Make sure this is different from your production index!

Setup via config.php

To define the required variables via your Moodle configuration file, add the following to config.php:


define('TEST_SEARCH_ELASTIC_HOSTNAME', 'http://127.0.0.1');
define('TEST_SEARCH_ELASTIC_PORT', 9200);
define('TEST_SEARCH_ELASTIC_INDEX', 'moodle_test_2');

Setup via Environment variables

The required Elasticsearch instance configuration variables can also be provided as environment variables. To do this at the Linux command line:


export TEST_SEARCH_ELASTIC_HOSTNAME=http://127.0.0.1; export TEST_SEARCH_ELASTIC_PORT=9200; export TEST_SEARCH_ELASTIC_INDEX=moodle_test
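
Before running the tests it can help to confirm that all three variables are actually set. The following is a minimal sketch of such a pre-flight check. The function name is invented for this example and is not part of the plugin or of Moodle.

```python
import os
from typing import Dict, Mapping, Optional

REQUIRED = (
    "TEST_SEARCH_ELASTIC_HOSTNAME",
    "TEST_SEARCH_ELASTIC_PORT",
    "TEST_SEARCH_ELASTIC_INDEX",
)


def elastic_test_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the three test variables and return the endpoint URL and index name."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise KeyError("missing settings: " + ", ".join(missing))
    host = env["TEST_SEARCH_ELASTIC_HOSTNAME"].rstrip("/")
    port = int(env["TEST_SEARCH_ELASTIC_PORT"])  # fails early on a non-numeric port
    return {"url": f"{host}:{port}", "index": env["TEST_SEARCH_ELASTIC_INDEX"]}
```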

Running the tests

First initialise the test environment from the Moodle code home directory:

php admin/tool/phpunit/cli/init.php

To run only this plugin's tests:

vendor/bin/phpunit search_elastic_engine_testcase search/engine/elastic/tests/engine_test.php

Crafted by Catalyst IT

This plugin was developed by Catalyst IT Australia:

https://www.catalyst-au.net/

Contributing and Support

Issues and pull requests on GitHub are welcome and encouraged!

https://github.com/catalyst/moodle-search_elastic/issues

If you would like commercial support or would like to sponsor additional improvements to this plugin please contact us:

https://www.catalyst-au.net/contact-us

moodle-search_elastic's People

Contributors

andrewhancox, andrewmadden, anupamatd, brendanheywood, cameron1729, dmitriim, dvdcastro, golenkovm, jackson-catalyst, jwalits, keevan, marxjohnson, matthewhilton, peterburnett, scottverbeek, srdjan-catalyst, superzoelu, tuanngocnguyen

moodle-search_elastic's Issues

Problems with setup on Moodle 3.4

There is an error in the setup in Moodle 3.4 and 3.5 where the plugin returns "Section Error" and we can get no further in the installation. (We downgraded and the plugin worked fine.)

[image: to3dmoq]

And this is the error:
[image: 5ref1a6]

Indexing cron tasks deletes results

This is a critical one.
When the index is initially created everything is fine, i.e. by running: sudo -u www-data php search/cli/indexer.php --force --reindex
However, when the index is updated either by scheduled task or CLI (sudo -u www-data php search/cli/indexer.php) all search results are deleted from the search engine backend.

All search results filtered by Moodle

We are seeing an edge case where a search term entered in Moodle returns no results to the user. However there are plenty of results being returned by the Elasticsearch backend.

This looks to be an issue with the compile_results method in the engine class. It is still to be 100% confirmed but the root cause seems to be:

  • User searches for a term
  • Results from the Elasticsearch backend are limited to 100 total results.
  • 100 Results are returned from the Elasticsearch backend
  • The user has access to the context the results are in, but not the results themselves (This is the case for messages between users).
  • compile_results filters out the results that the user doesn't have access to. In this case all of the results are filtered.
  • No results are displayed to the end user.

The next steps:

  • Confirm the above is correct
  • Find out why the context filtering is allowing results to be returned that the user doesn't have access to. This might mean that user messages are a special case.
  • Fix it
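
The suspected failure mode can be modelled in a few lines. The sketch below is a simplified stand-in, not the plugin's compile_results method: hits are plain integers and `can_view` is a hypothetical access check. The paged variant shows one possible mitigation, not a confirmed fix.

```python
from typing import Callable, List

BACKEND_LIMIT = 100  # the cap on backend results described above


def search_once(hits: List[int], can_view: Callable[[int], bool]) -> List[int]:
    """Model the suspected bug: cap the results first, then filter by access."""
    return [h for h in hits[:BACKEND_LIMIT] if can_view(h)]


def search_paged(hits: List[int], can_view: Callable[[int], bool], wanted: int = 10) -> List[int]:
    """One possible mitigation: keep paging until enough visible hits are found."""
    visible: List[int] = []
    for start in range(0, len(hits), BACKEND_LIMIT):
        page = hits[start:start + BACKEND_LIMIT]
        visible.extend(h for h in page if can_view(h))
        if len(visible) >= wanted:
            break
    return visible[:wanted]
```

If the first 100 hits are all messages the user cannot read, `search_once` returns nothing even though later hits are visible, which matches the behaviour described in this issue.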

Debug messages with indexing the "Messages - received" and "Messages - sent" area

Hi Matt,

I am currently looking into search_elastic and have added the latest version of this plugin to a Moodle 3.2.3+ (Build: 20170622) instance and have hooked this up to a fresh elasticsearch 5.5 instance.

While doing the first indexing with
sudo -u apache /opt/rh/rh-php70/root/usr/bin/php /var/www/html/moodle_dev3/search/cli/indexer.php --force
I saw that there are tons of CLI debug messages for the "Messages - received" and "Messages - sent" area telling me:

++ Error retrieving core_message-message_sent 2111873 document, not all required data is available: Ungültige Nutzer/in ++
* line 59 of /message/classes/search/base_message.php: call to debugging()
* line 61 of /message/classes/search/message_sent.php: call to core_message\search\base_message->get_document()
* line ? of unknownfile: call to core_message\search\message_sent->get_document()
* line 103 of /lib/classes/dml/recordset_walk.php: call to call_user_func()
* line 573 of /search/classes/manager.php: call to core\dml\recordset_walk->current()
* line 75 of /search/cli/indexer.php: call to core_search\manager->index()

("Ungültige Nutzer/in" is the German term for "Invalid user", as I have set $CFG->lang = 'de' in config.php.)

However, after some time and a very long CLI output, the indexing job comes to an end.

I also quickly set up a Solr instance some weeks ago, and I can't remember indexing the same Moodle instance with Solr throwing these kinds of errors.

The only reason for these problems I can think of is that we are using auth_ldap sync on a regular basis to delete Moodle accounts which have disappeared in LDAP, so there might be messages in the Moodle database which don't have a connected sender or receiver Moodle account anymore.

In the end, I am wondering if these debug messages come from your plugin or from Moodle core and if I should worry about them or not.

Thanks in advance,
Alex

Add Webservice and AJAX API support

Add webservice and AJAX API support.
This will allow webservices and embedded Ajax functions in Moodle to run search queries and get results. This would allow for custom search interfaces such as chat bots.

There is a very good argument that this functionality should exist in core Global Search. However, in the interest of getting this into the wild quickly, I'm adding it to this plugin first.

Searching anything with a colon character ':' causes an error: Error executing query in Elasticsearch backend.

To see this error:

  1. Make sure the wildcard config options are enabled: Wildcard at the end/start.
  2. Search for

title:something

as specified in the help icon documentation.

See the error:

Sep 30 10:13:36 moodle_prod: 2019/09/30 10:13:36 [error]: *1 FastCGI sent in stderr: "PHP message: Default exception handler: Error executing query in Elasticsearch backend. Debug: {"error":{"root_cause":[{"type":"parse_exception","reason":"parse_exception: Encountered \" \":\" \": \"\" at line 1, column 10.\nWas expecting one of:\n    <EOF> \n    <AND> ...\n    <OR> ...\n    <NOT> ...\n    \"+\" ...\n    \"-\" ...\n    <BAREOPER> ...\n    \"(\" ...\n    \"*\" ...\n    \"^\" ...\n    <QUOTED> ...\n    <TERM> ...\n    <FUZZY_SLOP> ...\n    <PREFIXTERM> ...\n    <WILDTERM> ...\n    <REGEXPTERM> ...\n    \"[\" ...\n    \"{\" ...\n    <NUMBER> ...\n    "}],"type":"search_phase_execution_exception","reason":"all shards failed","phase":"query","g" while reading upstream, server: localhost, request: "GET /search/index.php?q=something%3Aelse&context=11 HTTP/1.1", upstream: "fastcgi://unix:/var/run/php/php7.2-fpm.sock:", host: "localhost", referrer: "localhost/my/"
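
The parse exception comes from the colon being interpreted as query syntax. One generic workaround, not necessarily the fix applied in this plugin, is to backslash-escape the characters that the Elasticsearch query_string syntax reserves before sending user input. The sketch below does that; the function name is made up for this example. Note the trade-off: escaping the colon also disables field searches such as title:something, so this only suits input that should be treated as literal text.

```python
import re

# Reserved characters that can be escaped with a backslash.
_ESCAPABLE = re.compile(r'([+\-=!(){}\[\]^"~*?:\\/&|])')


def escape_query_string(text: str) -> str:
    """Escape reserved query_string characters in user-supplied text.

    "<" and ">" cannot be escaped, so they are removed instead.
    """
    text = text.replace("<", "").replace(">", "")
    return _ESCAPABLE.sub(r"\\\1", text)
```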


Course name and description indexing

Review how course name and description are indexed. This might be a core bug or an elastic search plugin bug.

I have a course called "Mathematics 101", and when I search for "math" I don't get any results. I should get the course.

Core bugs that could be related: MDL-59373 and MDL-55303

Cannot connect to standalone tika instance

Hi Matt,

I am currently looking into search_elastic and have added the latest version of this plugin to a Moodle 3.2.3+ (Build: 20170622) instance and have hooked this up to a fresh elasticsearch 5.5 instance.

I also started a standalone tika instance running on a separate machine and configured this tika instance in the plugin's settings.

While doing the first indexing with
sudo -u apache /opt/rh/rh-php70/root/usr/bin/php /var/www/html/moodle_dev3/search/cli/indexer.php --force
with fileindexing enabled, the indexer script encountered a fatal error and stopped with this message:

PHP Notice:  Undefined variable: client in /var/www/html/moodle_dev3/search/engine/elastic/classes/document.php on line 157

Notice: Undefined variable: client in /var/www/html/moodle_dev3/search/engine/elastic/classes/document.php on line 157
Default exception handler: Fehler: Call to a member function post() on null Debug: 
Error code: generalexceptionmessage
* line 157 of /search/engine/elastic/classes/document.php: Error thrown
* line 286 of /search/engine/elastic/classes/document.php: call to search_elastic\document->extract_text()
* line 352 of /search/engine/elastic/classes/engine.php: call to search_elastic\document->export_file_for_engine()
* line 510 of /search/engine/elastic/classes/engine.php: call to search_elastic\engine->process_document_files()
* line 588 of /search/classes/manager.php: call to search_elastic\engine->add_document()
* line 75 of /search/cli/indexer.php: call to core_search\manager->index()

!!! Fehler: Call to a member function post() on null !!!
!! 
Error code: generalexceptionmessage !!
!! Stack trace: * line 157 of /search/engine/elastic/classes/document.php: Error thrown
* line 286 of /search/engine/elastic/classes/document.php: call to search_elastic\document-&gt;extract_text()
* line 352 of /search/engine/elastic/classes/engine.php: call to search_elastic\document-&gt;export_file_for_engine()
* line 510 of /search/engine/elastic/classes/engine.php: call to search_elastic\engine-&gt;process_document_files()
* line 588 of /search/classes/manager.php: call to search_elastic\engine-&gt;add_document()
* line 75 of /search/cli/indexer.php: call to core_search\manager-&gt;index()
 !!

I traced the problem back to commit 4c32c71 which breaks the connection to tika. Based on the latest code, this patch should solve the problem and clean up the function at the same time:

diff --git a/classes/document.php b/classes/document.php
index 6a574df..50ab93b 100644
--- a/classes/document.php
+++ b/classes/document.php
@@ -148,19 +148,18 @@ class document extends \core_search\document {
      */
     private function extract_text($file) {
         // TODO: add timeout and retries for tika.
-        $config = get_config('search_elastic');
         $extractedtext = '';
         $port = $this->tikaport;
-        $hostname = rtrim($this->tikahostname, "/");
+        $hostname = $this->tikahostname;
         $url = $hostname . ':'. $port . '/tika/form';
 
+        $client = new \curl();
         $response = $client->post($url, array('file' => $file));
         if ($client->info['http_code'] === 200) {
             $extractedtext = $response;
         }
 
         return $extractedtext;
-
     }
 
     /**

However, I am wondering how this problem could remain undetected as you are running this plugin in production...

Thanks,
Alex

Add timeout for requests

I have run into a situation where a restored config from another environment results in the Moodle upgrade hanging at the search_elastic step as it tries and fails to reach the Elasticsearch server. Having a configurable timeout would resolve this issue where the Elasticsearch server falls over and/or cannot be reached.

Elasticsearch 2GB Limit guard

Elasticsearch has a 2GB limit on record size.
We need to add a check in code to not submit records greater than this limit.

This is an edge case that I don't expect to occur in real world use. However, to be safe we should add a check in the code.
If an individual record is over 2GB this check will fail and a debug message will be raised. The document at fault will not be added to the index.
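
A guard of this kind could look roughly like the sketch below. It is an illustration only: the names are hypothetical, the list stands in for the real index, and interpreting "2GB" as 2 * 1024^3 bytes is an assumption.

```python
import json

MAX_RECORD_BYTES = 2 * 1024 ** 3  # assumed meaning of the 2GB limit


def fits_record_limit(document: dict, limit: int = MAX_RECORD_BYTES) -> bool:
    """Check the encoded size of a document against the record limit."""
    size = len(json.dumps(document).encode("utf-8"))
    return size <= limit


def index_if_small_enough(document: dict, index: list, limit: int = MAX_RECORD_BYTES) -> bool:
    """Add the document to the index unless it exceeds the limit."""
    if not fits_record_limit(document, limit):
        # Stand-in for the debug message described above.
        print(f"Skipping document {document.get('id')}: larger than {limit} bytes")
        return False
    index.append(document)
    return True
```

The limit is a parameter so the check can be exercised in tests without building a 2GB record.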

Add subplugin architecture

Refactor plugin to use a subplugin architecture.
The Tika text extraction and AWS Rekognition image recognition features should be subplugins. This would make it easier to manage and create integrations with other services.
For example, to use Google's image recognition service instead of AWS.

Add image search functionality

Extend plugin to support indexing and searching of images.

AWS has a service that will return content information of a provided image. Integrate this service with the plugin so images can be indexed and searched.

Relevant documentation links:
http://docs.aws.amazon.com/aws-sdk-php/v3/api/api-rekognition-2016-06-27.html#detectlabels
https://aws.amazon.com/rekognition/

Initial high level tasks

  • Add configuration options for AWS credentials and region for Rekognition service
  • When files are indexed, if a file is a JPEG or PNG with dimensions greater than 80 x 80 pixels, send it to Rekognition for processing.
  • Use returned metadata to create index document
  • Add document to index.
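
The file selection rule in the task list above can be sketched as a simple predicate. The names are illustrative, and reading "greater than 80 x 80" as both width and height being strictly greater than 80 pixels is an assumption.

```python
MIN_DIMENSION = 80  # pixels
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png"}


def should_send_to_rekognition(mimetype: str, width: int, height: int) -> bool:
    """Decide whether an image qualifies for label detection."""
    return (
        mimetype in SUPPORTED_MIME_TYPES
        and width > MIN_DIMENSION
        and height > MIN_DIMENSION
    )
```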

Support for Tika as an ElasticSearch plugin?

Hi,

I am currently looking into search_elastic to run it as an alternative to search_solr, mainly because Elasticsearch seems to be easier to install and run on our RHEL 7 systems.

I have seen that you recommend running Tika for file indexing as a standalone application (see https://github.com/catalyst/moodle-search_elastic#tika-setup). However, there are no rpm packages for Tika out there as far as I see and fiddling with a manually configured service for Tika can be daunting.

On the other hand, you also write that there are Elasticsearch plugins for Tika. I have found https://www.elastic.co/guide/en/elasticsearch/plugins/current/ingest-attachment.html and would like to ask:

  • Is this plugin the Tika plugin you are mentioning?
  • Is search_elastic able to work with Tika as an Elasticsearch plugin or does it really need Tika as a standalone application?

Thanks,
Alex

Upgrade fails if invalid index url

It has been noted that if there is no pre-existing Moodle configuration for Elasticsearch and the default config values fail to resolve to a response, the request instead returns a new \search_elastic\guzzle_exception(), which does not have a getBody() method. This breaks the validate_index() method of \search_elastic\engine.

Here is a stack trace from an upgrade on a Moodle instance:

Default exception handler: Exception - Call to undefined method search_elastic\guzzle_exception::getBody() Debug:
Error code: generalexceptionmessage

  • line 147 of /search/engine/elastic/classes/engine.php: Error thrown
  • line 38 of /search/engine/elastic/db/upgrade.php: call to search_elastic\engine->validate_index()
  • line 632 of /lib/upgradelib.php: call to xmldb_search_elastic_upgrade()
  • line 1857 of /lib/upgradelib.php: call to upgrade_plugins()
  • line 182 of /admin/cli/upgrade.php: call to upgrade_noncore()

Support for Moodle 3.5

One issue is in classes/query.php, line 321. This code is never executed, because it seems that $usercontexts is an object in 3.5 and not an array:

// Add contexts.
if (gettype($usercontexts) == 'array') {
    $contexts = $this->construct_contexts($usercontexts);
    array_push ($query['query']['bool']['filter']['bool']['must'], $contexts);
}

As a result, the option "Search within enrolled courses only" does not have any effect.

Another change I had to make to get it working is this (added groupid, line 118):

$excludedfields = array('itemid', 'areaid', 'courseid', 'contextid', 'userid', 'owneruserid', 'modified', 'type',
    'groupid' // added group id, otherwise it always fails when searching for string
);

Enable file indexing retroactively is not possible

Steps to reproduce:

  • Don't enable the "Enable file indexing" setting in search_elastic
  • Fully index your Moodle instance
  • Enable the "Enable file indexing" setting in search_elastic retroactively

Expected result:

  • Moodle / search_elastic will index the existing files additionally to the existing content

Actual result:

  • There isn't anything indexed additionally

I assume that this additional index can't be made automatically under the hood and that it basically needs a full re-index of the existing content to also index the existing files. That's why I would just propose to add this fact to the description of the "Enable file indexing" setting in search_elastic.

Add support for document converter API

Moodle 3.3 introduced a document converter API: https://docs.moodle.org/dev/File_Converters
Need to modify this plugin to use the converter API for converting images and text files ready for indexing.

NOTE: this will probably mean making pre and post Moodle 3.3 branches and maintaining 2 versions of the plugin until Moodle 3.2 is out of support

The original idea was to implement sub plugin support as outlined in issue #19 however the document converter API is a better approach.

Add rekognition cost report

Using rekognition to extract data from images costs money, see: https://aws.amazon.com/rekognition/pricing/
It would be good to have a report that analyzes the searchable images in a Moodle instance and estimates how much they will cost to index.
This will be useful in determining if we should turn on this feature.

Modifying activity results in duplicate result

When an existing activity is edited and the search index updated, the number of returned search results for that activity increases by one. This occurs every time the activity is modified and re-indexed, resulting in an ever-increasing number of results for the same activity.

Indexing files inside of html_blocks may be incorrect

When a file is embedded inside an HTML content block and Tika file indexing is enabled, a search that matches content inside the file causes Moodle to treat the containing object of that text as a block rather than a file, and so it throws exceptions when trying to locate a block that may not exist.

This issue was fixed in PR #55, which forces the ID to be correct before display, to avoid exceptions. It is worth investigating what causes the ID to be incorrect in the first place, to avoid other issues of this type in future.
E.g. after set_data_from_engine just before display, the ID would be set to '123', the ID of the document of interest, not 'html_block-content-9', the ID of the containing block, which is what it should be set to. This points to an underlying problem in the way the data is indexed when files are embedded inside a block.

Implement support for the regex search

Following the discussion with @mattporritt creating this issue.

Recently we got a couple of requests for reports like "Get all resources that use embedded links". It's really hard to scan all DB tables to get this info and build the required report.

It seems like elastic search could be used for that. One missing bit is to be able to search by regex.
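
For reference, Elasticsearch already offers a `regexp` query type, so the request body for such a search could be built roughly as sketched below. The field name `content` in the usage note is an assumption about the index mapping, and the helper is hypothetical. Keep in mind that regexp patterns are matched against whole indexed terms, so on an analyzed text field the pattern is applied to individual tokens rather than to the full text.

```python
import json


def regexp_query(field: str, pattern: str, size: int = 100) -> dict:
    """Build a search body that uses the Elasticsearch regexp query."""
    return {"size": size, "query": {"regexp": {field: {"value": pattern}}}}


def to_request_body(query: dict) -> str:
    """Serialise the query for the body of a _search request."""
    return json.dumps(query)
```

For example, `regexp_query("content", "https?.*")` builds a query for terms starting with "http" or "https" in the assumed `content` field.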

File indexing crashes if file is missing in Moodle filedir

Hi Matt,

I am currently looking into search_elastic. I have added the latest version of this plugin to a Moodle 3.2.3+ (Build: 20170622) instance and hooked it up to a fresh Elasticsearch 5.5 instance.

I also started a standalone Tika instance on a separate machine and configured it in the plugin's settings.

While doing the first indexing with
sudo -u apache /opt/rh/rh-php70/root/usr/bin/php /var/www/html/moodle_dev3/search/cli/indexer.php --force
with fileindexing enabled, the indexer script encountered a fatal error and stopped with this message:

////Default exception handler: Die Datei kann nicht gelesen werden. Eventuell existiert sie nicht oder es gibt ein Rechteproblem. Debug: [dataroot]/filedir/c2/40/c24091a092e5afc7310088ce2e416d1d7efcda11
Error code: storedfilecannotread
* line 579 of /lib/filestorage/stored_file.php: file_exception thrown
* line 274 of /search/engine/elastic/classes/document.php: call to stored_file->get_imageinfo()
* line 353 of /search/engine/elastic/classes/engine.php: call to search_elastic\document->export_file_for_engine()
* line 511 of /search/engine/elastic/classes/engine.php: call to search_elastic\engine->process_document_files()
* line 588 of /search/classes/manager.php: call to search_elastic\engine->add_document()
* line 75 of /search/cli/indexer.php: call to core_search\manager->index()

!!! Die Datei kann nicht gelesen werden. Eventuell existiert sie nicht oder es gibt ein Rechteproblem. !!!
!! [dataroot]/filedir/c2/40/c24091a092e5afc7310088ce2e416d1d7efcda11
Error code: storedfilecannotread !!
!! Stack trace: * line 579 of /lib/filestorage/stored_file.php: file_exception thrown
* line 274 of /search/engine/elastic/classes/document.php: call to stored_file->get_imageinfo()
* line 353 of /search/engine/elastic/classes/engine.php: call to search_elastic\document->export_file_for_engine()
* line 511 of /search/engine/elastic/classes/engine.php: call to search_elastic\engine->process_document_files()
* line 588 of /search/classes/manager.php: call to search_elastic\engine->add_document()
* line 75 of /search/cli/indexer.php: call to core_search\manager->index()
 !!

The German sentences in the debug message say that the file cannot be read, that it may not exist, or that there may be a permissions problem. This is correct: I ran the indexing on a Moodle test instance that was rsynced from our production system, excluding large files in Moodledata (almost always videos) to save storage. Unfortunately, this means I am unable to test indexing properly on this test instance.

Would it be possible to check if a file really exists on disk before it is sent to the file indexing backend and ignore it otherwise?

Thanks,
Alex
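A minimal sketch of the existence check requested above, written in Python for illustration rather than as plugin code. It assumes the filedir layout visible in the error message (`filedir/c2/40/<contenthash>`); the function names `filedir_path` and `readable_files` are hypothetical.

```python
import os


def filedir_path(dataroot, contenthash):
    """Return the on-disk path for a content hash.

    Layout assumed from the error above: the first two and next two
    characters of the hash form the two directory levels.
    """
    return os.path.join(
        dataroot, "filedir", contenthash[:2], contenthash[2:4], contenthash
    )


def readable_files(dataroot, contenthashes):
    """Split content hashes into (readable, missing) lists.

    Only the readable ones would be passed on to the file indexing
    backend; the missing ones could be logged and skipped.
    """
    readable, missing = [], []
    for contenthash in contenthashes:
        path = filedir_path(dataroot, contenthash)
        if os.path.isfile(path) and os.access(path, os.R_OK):
            readable.append(contenthash)
        else:
            missing.append(contenthash)
    return readable, missing
```

With a check like this, a missing file would lead to a skipped file record instead of a fatal `storedfilecannotread` error during indexing.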

Tika file size limit

It appears that Tika does not have a configuration-based limit on the size of file that the Tika service can process. Instead, the limit seems to be set by the Java memory available to the Tika application. This is not ideal.

To give some control over the size of files submitted to Tika we need to add a user configuration option to this plugin.
This configuration option will limit the size of the file sent to Tika.
If a file is larger than this setting, a file record will still be created in Elasticsearch, but the file content will not be included in the index.
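The behaviour described above can be sketched as follows. This is an illustrative Python sketch, not the plugin's implementation: the record keys (`title`, `filesize`, `filetext`), the function name, and the `extract_text` callback standing in for the Tika call are all assumptions.

```python
def build_file_record(filename, size_bytes, max_size_bytes, extract_text):
    """Create a file record, adding extracted text only for small enough files.

    `extract_text` is a zero-argument callable standing in for the Tika
    request; it is never called for files above the limit, so oversized
    files are not sent to Tika at all.
    """
    record = {"title": filename, "filesize": size_bytes}
    if size_bytes <= max_size_bytes:
        record["filetext"] = extract_text()
    return record


# Example: a 5 MB limit. The large file still gets a record, but no content.
limit = 5 * 1024 * 1024
small = build_file_record("notes.pdf", 1024, limit, lambda: "extracted text")
large = build_file_record("lecture.mp4", 50 * limit, limit, lambda: "never used")
```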

Move item deletion to ad-hoc task

Currently, in engine::compile_results(), items that return with \core_search\manager::ACCESS_DELETED are deleted from the Elasticsearch index synchronously. This is a performance hit and doesn't need to be this way.

Instead, let's refactor the deletion into an ad-hoc task.
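The general shape of the refactor, sketched in Python for illustration only. `DeletionQueue`, the `ACCESS_DELETED` constant, and the `check_access` and `delete_batch` callbacks are hypothetical stand-ins, not Moodle's ad-hoc task API. The sketch only handles the deleted case and ignores other access outcomes.

```python
from collections import deque

# Hypothetical stand-in for \core_search\manager::ACCESS_DELETED.
ACCESS_DELETED = "deleted"


class DeletionQueue:
    """Collects document IDs during a search and deletes them later."""

    def __init__(self):
        self._pending = deque()

    def enqueue(self, doc_id):
        self._pending.append(doc_id)

    def run(self, delete_batch, batch_size=100):
        """Drain the queue in batches; returns the number of IDs processed."""
        total = 0
        while self._pending:
            count = min(batch_size, len(self._pending))
            batch = [self._pending.popleft() for _ in range(count)]
            delete_batch(batch)
            total += len(batch)
        return total


def compile_results(hits, check_access, queue):
    """Filter out deleted hits without touching the index synchronously."""
    visible = []
    for hit in hits:
        if check_access(hit) == ACCESS_DELETED:
            queue.enqueue(hit["id"])  # deferred, not deleted here
            continue
        visible.append(hit)
    return visible
```

The search request then only pays for an in-memory append, and the actual index deletions happen when the queued task runs.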

Document Mapping Failing

Currently there is an issue when the document mapping is created for the index. It causes most document field types to be set to "text" instead of integer, date, etc. As a result, date sorting does not work for search results, and results behave strangely in other ways.

To make matters worse, this happens on a real site but not in unit tests, where the document mapping is created correctly.

Real site document mapping

curl -XGET 'http://localhost:9200/moodle2/_mapping?pretty=true' 
{
  "moodle2" : {
    "mappings" : {
      "doc" : {
        "properties" : {
          "areaid" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "content" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "contextid" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "courseid" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "description1" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "id" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "itemid" : {
            "type" : "long"
          },
          "modified" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "owneruserid" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "parentid" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "title" : {
            "type" : "text",
            "fields" : {
              "keyword" : {
                "type" : "keyword",
                "ignore_above" : 256
              }
            }
          },
          "type" : {
            "type" : "long"
          },
          "userid" : {
            "type" : "long"
          }
        }
      }
    }
  }
}

Unit test document mapping

curl -XGET 'http://localhost:9200/moodle_test/_mapping?pretty=true' 
{
  "moodle_test" : {
    "mappings" : {
      "doc" : {
        "properties" : {
          "areaid" : {
            "type" : "keyword"
          },
          "content" : {
            "type" : "text"
          },
          "contextid" : {
            "type" : "integer"
          },
          "courseid" : {
            "type" : "integer"
          },
          "id" : {
            "type" : "keyword"
          },
          "itemid" : {
            "type" : "integer"
          },
          "modified" : {
            "type" : "date",
            "format" : "epoch_second"
          },
          "owneruserid" : {
            "type" : "integer"
          },
          "parentid" : {
            "type" : "keyword"
          },
          "title" : {
            "type" : "text"
          },
          "type" : {
            "type" : "integer"
          }
        }
      }
    }
  }
}
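The real-site mapping above resembles what Elasticsearch's dynamic mapping typically produces (strings as "text" with a "keyword" sub-field, numbers as "long"), which may mean the explicit mapping was not in place before documents were indexed. The Python sketch below is a hypothetical diagnostic for comparing the two `_mapping` responses field by field. The function names are made up, and the `doc` type level matches the responses shown here rather than newer Elasticsearch versions without mapping types.

```python
def field_types(mapping_response, index, doc_type="doc"):
    """Flatten a `_mapping` response into {field name: type}."""
    properties = mapping_response[index]["mappings"][doc_type]["properties"]
    return {name: spec.get("type") for name, spec in properties.items()}


def mapping_mismatches(expected, actual):
    """Return {field: (expected type, actual type)} for fields that differ."""
    return {
        name: (want, actual.get(name))
        for name, want in expected.items()
        if actual.get(name) != want
    }


# Example with two fields taken from the responses above.
real_site = {"moodle2": {"mappings": {"doc": {"properties": {
    "modified": {"type": "text"},
    "areaid": {"type": "text"},
}}}}}
expected = {"modified": "date", "areaid": "keyword"}
print(mapping_mismatches(expected, field_types(real_site, "moodle2")))
```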

Deleting a search area index doesn't seem to work

i.e. from /admin/searchareas.php

I first tried to delete the broken area, HTML block. It reported that the area was deleted, and the area no longer showed in the table. But the search results were still broken, so I do not believe it did the correct thing on the Elasticsearch side, and the success message was a false positive.

When I deleted all indexed content and reset the lot, then it did work as expected.

Enabling "Wildcard at the start" breaks lucene requests

Steps to replicate:

  1. Assuming you have global search configured to use elastic search
  2. Go to /admin/settings.php?section=elasticsettings and enable "Wildcard at the start"
  3. Search for title:news

Error: Error executing query in search engine: Failed to parse query [*title:news]
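One possible way to avoid producing `*title:news` is to apply the wildcard to the term after the field prefix instead of to the whole token. The Python sketch below illustrates that idea only; it is not how the plugin builds its queries, the function names are hypothetical, and it does not handle quoted phrases or grouping.

```python
OPERATORS = {"AND", "OR", "NOT"}


def _prefix(term):
    """Add a leading wildcard unless the term already has one."""
    return term if term.startswith("*") else "*" + term


def add_leading_wildcard(query):
    """Prefix each search term with '*', keeping any 'field:' part intact."""
    rewritten = []
    for token in query.split():
        if token in OPERATORS:
            rewritten.append(token)  # leave boolean operators untouched
            continue
        field, sep, term = token.partition(":")
        if sep and field and term:
            rewritten.append(field + ":" + _prefix(term))
        else:
            rewritten.append(_prefix(token))
    return " ".join(rewritten)
```

Under this approach, `title:news` would be rewritten to `title:*news` rather than `*title:news`.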

Simplify README file

The README file contains an extensive description of the plugin and related platforms. It feels like it could be simplified, with most of the information moved to the GitHub wiki, where it could be better organised.

general feedback

  • $item->index->status >= 300 would be future proof

  • add protection from network failure around json_decode and $client->post ( $docurl, $payload )->getBody
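Both points can be sketched together. The Python below is illustrative only and does not mirror the plugin's PHP: `failed_items` and `BulkResponseError` are hypothetical names, and the response shape assumed is the Elasticsearch bulk API's `items` list with an `index` entry per document.

```python
import json


class BulkResponseError(Exception):
    """Raised when the response body cannot be used at all."""


def failed_items(raw_body):
    """Return the per-document results whose status is 300 or above.

    Guards against an empty or non-JSON body, e.g. after a network
    failure or a proxy error page.
    """
    try:
        payload = json.loads(raw_body)
    except (TypeError, ValueError) as exc:
        raise BulkResponseError("response body is not valid JSON") from exc
    if not isinstance(payload, dict):
        raise BulkResponseError("unexpected response structure")

    failures = []
    for item in payload.get("items", []):
        result = item.get("index", {})
        # A missing status is treated as a failure in this sketch.
        if result.get("status", 500) >= 300:
            failures.append(result)
    return failures
```

Comparing against `>= 300` treats any non-2xx status as a failure, including codes the caller does not know about yet.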

CLI Output inaccurate

When the CLI index task is run and documents are added to the index, the CLI output is wrong.
Regardless of how many documents are added, the output shows messages of the form:
No new documents to index for area.
It is not clear whether this is a bug in this plugin or in core Global Search.

Error retrieving core_message-message_sent

Running CLI reindex command
sudo -u www-data php search/cli/indexer.php --force --reindex

I'm getting

Processing Messages - sent area
++ Error retrieving core_message-message_sent 80 document, not all required data is available: Invalid user ++

  • line 59 of /message/classes/search/base_message.php: call to debugging()
  • line 61 of /message/classes/search/message_sent.php: call to core_message\search\base_message->get_document()
  • line ? of unknownfile: call to core_message\search\message_sent->get_document()
  • line 103 of /lib/classes/dml/recordset_walk.php: call to call_user_func()
  • line 573 of /search/classes/manager.php: call to core\dml\recordset_walk->current()
  • line 79 of /search/cli/indexer.php: call to core_search\manager->index()
    No new documents to index for Messages - sent area.

This may be related to a user having been removed. I am not sure whether it needs any attention, but I am logging it as part of my testing.

Requirement for local_aws on a local ElasticSearch instance?

Hi,

I am currently looking into search_elastic to run it as an alternative to search_solr, mainly because Elasticsearch seems to be easier to install and run on our RHEL 7 systems.

I have seen that search_elastic requires local_aws. May I ask if local_aws is really necessary for local ElasticSearch instances (i.e. not in AWS)?

If it is really necessary, could you please also publish local_aws to the Moodle plugins repository so that we can install it from there and get update notifications?

If it is not necessary, could you please remove this requirement from version.php and, for example, replace it with a custom check that local_aws is installed before an AWS instance can be configured?

Thanks,
Alex
