examine's People

Contributors

alastairtree, benjaminc, bergmania, bjarnef, callumbwhyte, chrish619, dependabot[bot], fcingolani, fspezi, ja0b, jakoss, jbreuer, jclementson, jmayntzhusen, jsheardry, lars-erik, leekelleher, matthewcare, nikcio, nzdev, perplexdaniel, samgooch, shazwazza, sniffdk, vivekboii

examine's Issues

Lucene.Net.Store.AlreadyClosedException: this IndexReader is closed - during shutdown

When running stress tests that shut down the appdomain very often while also trying to view search results, an exception may occur:

Lucene.Net.Store.AlreadyClosedException: this IndexReader is closed

This is because when the appdomain is shut down, the reader is closed, but at the same time another request might still be iterating it. So we need to close the reader only at the last possible moment, just before the appdomain terminates.

OrderBy and OrderByDescending not working

I'm trying to search the index and have it return the results ordered, but it isn't working for me; the results are always returned in the same order. Here is my query:

BaseSearchProvider searcher = ExamineManager.Instance.SearchProviderCollection["FooSearcher"];
ISearchCriteria searchCriteria = searcher.CreateSearchCriteria();
IBooleanOperation boolOperation = searchCriteria.NodeTypeAlias("fooNodeTypeAlias");
boolOperation = boolOperation.And().OrderBy("fooName");
ISearchResults results = searcher.Search(boolOperation.Compile());

Here is my index:
<IndexSet SetName="FooIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/Foo/">
  <IndexAttributeFields>
    <add Name="id" />
    <add Name="nodeName"/>
    <add Name="nodeTypeAlias" />
  </IndexAttributeFields>
  <IndexUserFields>
    <add Name="fooName" EnableSorting="true" />
  </IndexUserFields>
</IndexSet>

And here is my searcher:
<add name="FooIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" supportUnpublished="false" supportProtected="true" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />

Edit: I'm using Examine 0.1.70.0

Wrong files showing while querying the index path using lucene.

Hi everyone,

I'm new to Lucene. I've an issue while querying the index file (path) while searching for a string in the files (say: doc files). I've looped through all the files using the following code.

string indexFileLocation = txtRootDirectory.Text.Trim();
Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory(indexFileLocation, false);
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir, false);
Lucene.Net.Index.Term searchTerm = new Lucene.Net.Index.Term("content", txtSearch.Text.Trim());
Lucene.Net.Search.Query query = new Lucene.Net.Search.TermQuery(searchTerm);
Lucene.Net.Search.Hits hits = searcher.Search(query);

for (int i = 0; i < hits.Length(); i++)
{
    Lucene.Net.Documents.Document doc = hits.Doc(i);
    StringBuilder contentValue = new StringBuilder();
    contentValue.Append(doc.Get("content"));
    string id = doc.Get("id");
    lblSearchResults.Text += id + "<br />";
}

But unfortunately, I've been getting the same search results, the same file name as follows.

repetition

I can't figure out whether I'm doing anything wrong in my code. Please help me out.

Thanks in advance.

Q: How to use 'ExamineManager.Instance.ReIndexNode()' with custom data?

Hi!
I've got a custom (non umbraco) Indexer set up and working properly by looping through my custom data and creating a "SimpleDataSet" for each item. Now I am adding functionality to update the index when operations happen on the custom objects.

I have successfully set up a "Remove from Index" function to run on object delete by looking up the index nodeId for the object, and passing it to 'ExamineManager.Instance.DeleteFromIndex(...)'

Now I'd like to add operations to run on object create and update which would add just the current object to the index. I was looking at 'ExamineManager.Instance.ReIndexNode()' which expects an "XElement" as the representation of the index data, but I am unclear what format that needs to be in, or how to convert a SimpleDataSet into an XElement.

Is it possible to only index a single object? I'd rather not have to run 'ExamineManager.Instance.IndexAll()' every time something is added or changed... But perhaps that isn't possible?

NRT Readers

We can support Near Real Time (NRT) readers in Examine (yes, even in v1!). Previously I thought this was only possible based on the ctor, but I have managed to come up with a nice solution.

Push Search methods into BaseSearchProvider or create an interface and use it

Hi, I am trying to integrate Elasticsearch into Umbraco using Examine, but I hit a roadblock: there are a few places in Umbraco where the SearchProvider is cast to the specific LuceneSearcher in order to use a few extra Search methods implemented on that specific searcher.

Would it be possible to create two abstract methods
BaseLuceneSearcher: ISearchResults Search(ISearchCriteria searchParams, int maxResults)
LuceneSearcher: ISearchResults Search(string searchText, bool useWildcards, string indexType)

in the BaseSearchProvider, or create an interface for the methods that can be implemented by other providers?

This would make it possible to avoid the casts in Umbraco that make it difficult to implement SearchProviders based on engines other than Lucene.net.

Next up would be to refactor the usage of UmbracoContentIndexer in Umbraco, as it is also tied to Lucene.net, but that's another story. :-)

StackOverflow Exception when running v0.1.67 with Umbraco v7.2.8

I previously had an Umbraco v7.2.8 installation running with Examine v0.1.66 for a while now. After upgrading Examine to v0.1.67, the application fails to start with the following exception occurring every 30 seconds or so.

screen shot 2015-08-12 at 15 17 29

In order to reproduce the error, create a v7.2.8 installation of Umbraco and upgrade its Examine NuGet package to v0.1.67. Delete the Examine indexes folder, then navigate to the site; on the first request the above exception should occur. As a fix, I have downgraded the Examine NuGet package to v0.1.66, which seems to allow the application to start correctly.

I understand this is probably not much to go on. I have checked Umbraco's log files and my system's Event Logs, and nothing is logged relating to the exception. I can replicate this using a fresh install of Umbraco via NuGet. I'm guessing the issue is related to the following changes.

v0.1.66...v0.1.67

The exception occurs within LuceneIndex.cs according to Visual Studio:

screen shot 2015-08-12 at 15 53 38

Support for multiple fields with the same name

Currently, if you use the DocumentWriting event and create multiple fields with the same name, it will index just fine, since Lucene supports that. However, when you search you will get a dictionary error because it is trying to add the same field to the dictionary multiple times.

Since we cannot change the dictionary result since that is a breaking change, we'll support this by doing the following:

The normal dictionary of values will contain the first value; however, there is a new method on the SearchResult object, public IEnumerable<string> GetValues(string key), which will give you all of the values indexed for that field. This method will never return null; if a key doesn't exist at all, an empty collection is returned.
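
The behaviour described above could be sketched roughly as follows. This is only an illustration of the "first value in the dictionary, all values via GetValues" idea; MultiValueResult, its Add method and the Fields property are hypothetical names, not Examine's actual SearchResult implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search result; NOT Examine's real SearchResult class.
public class MultiValueResult
{
    // Every value indexed under a key, in insertion order.
    private readonly Dictionary<string, List<string>> _all =
        new Dictionary<string, List<string>>();

    // Backwards-compatible dictionary: holds only the first value per key.
    public IDictionary<string, string> Fields { get; } =
        new Dictionary<string, string>();

    public void Add(string key, string value)
    {
        List<string> values;
        if (!_all.TryGetValue(key, out values))
        {
            values = new List<string>();
            _all[key] = values;
            Fields[key] = value; // first value wins, so no duplicate-key error
        }
        values.Add(value);
    }

    // Never returns null: an unknown key yields an empty sequence.
    public IEnumerable<string> GetValues(string key)
    {
        List<string> values;
        return _all.TryGetValue(key, out values)
            ? values
            : Enumerable.Empty<string>();
    }
}

public static class Demo
{
    public static void Main()
    {
        var result = new MultiValueResult();
        result.Add("tag", "red");
        result.Add("tag", "blue");

        Console.WriteLine(result.Fields["tag"]);                      // red
        Console.WriteLine(string.Join(",", result.GetValues("tag"))); // red,blue
        Console.WriteLine(result.GetValues("missing").Any());         // False
    }
}
```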

Issue in Italian PC

I noticed that when the indexing process runs under the "it-IT" culture and the data to index contains types such as Double, Lucene throws an exception.

I found the line of code where it happens, in the
Examine.LuceneEngine.Providers.LuceneIndexer.TryConvert<T>(string var, out object parsedVal) method.

The tc.ConvertFrom(val) line tries to convert the string val to type T. If T is Double or Float and val contains decimal digits (like "1234.567"), the conversion fails because the dot is not the decimal separator in the "it-IT" culture.

I think I have a solution.
This code appears to solve the issue:
parsedVal = (T)tc.ConvertFrom(null, System.Globalization.CultureInfo.InvariantCulture, val);
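
The difference between the two ConvertFrom overloads can be reproduced outside Examine. The sketch below is a standalone console program, not Examine code; it forces the thread culture to "it-IT" purely to simulate the reporter's machine and uses the standard TypeConverter for double.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;
using System.Threading;

public static class InvariantConvertDemo
{
    public static void Main()
    {
        // Assumption: simulate a machine configured for Italian,
        // where ',' is the decimal separator.
        Thread.CurrentThread.CurrentCulture = new CultureInfo("it-IT");

        TypeConverter tc = TypeDescriptor.GetConverter(typeof(double));
        string val = "1234.567";

        try
        {
            // Single-argument overload: converts using the current culture.
            object parsed = tc.ConvertFrom(val);
            Console.WriteLine("Current culture parsed: " + parsed);
        }
        catch (Exception ex)
        {
            Console.WriteLine("Current culture failed: " + ex.GetType().Name);
        }

        // Overload proposed in the issue: always treat '.' as the decimal separator.
        object invariant = tc.ConvertFrom(null, CultureInfo.InvariantCulture, val);
        Console.WriteLine(((double)invariant).ToString(CultureInfo.InvariantCulture));
    }
}
```

Note that this only helps if the values being indexed are always formatted with the invariant culture; data formatted with a comma as the decimal separator would then need to be normalised before indexing.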

Is this a good solution? Is it possible to apply this patch in Examine?

Thanks

OutOfMemoryException Building Index - Need to make the enumeration more lazy

I've set up an index using a SimpleDataIndexer that is trying to index data from a database table with around 2 million rows. I'm using Umbraco's database object to query the data, which appears to use a DataReader to read a row at a time. In my SimpleDataService I'm looping through the objects returned from Umbraco and yielding a new SimpleDataSet for each. Am I doing something wrong, or is indexing this much data just not supported?
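
For illustration, "more lazy" enumeration on the consuming side could mean pulling items from the source in bounded batches instead of materialising the whole sequence (for example with ToList()). The sketch below is generic C#; InBatches and ReadRows are hypothetical names, and this is not how Examine's indexer is actually written.

```csharp
using System;
using System.Collections.Generic;

public static class LazyBatching
{
    // Splits a lazily produced sequence into batches without buffering the
    // whole source: at most batchSize items are held at any time.
    public static IEnumerable<List<T>> InBatches<T>(IEnumerable<T> source, int batchSize)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));
        if (batchSize <= 0) throw new ArgumentOutOfRangeException(nameof(batchSize));

        var batch = new List<T>(batchSize);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == batchSize)
            {
                yield return batch;
                batch = new List<T>(batchSize);
            }
        }

        // Emit the final, partially filled batch.
        if (batch.Count > 0) yield return batch;
    }

    // Stand-in for a data service that yields one row at a time.
    private static IEnumerable<int> ReadRows(int count)
    {
        for (int i = 0; i < count; i++) yield return i;
    }

    public static void Main()
    {
        foreach (List<int> batch in InBatches(ReadRows(10), 4))
        {
            Console.WriteLine(batch.Count); // prints 4, 4, 2 on separate lines
        }
    }
}
```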

Search results: retrieve only specific fields instead of returning the whole doc.

Hi,

I have been reading Examine's documentation and searching the Umbraco forums, but couldn't find anything.
I was wondering if it is possible to retrieve only specific fields instead of returning the whole doc in the search results?
If not, would it be difficult to implement this feature (I am considering submitting a PR)?

I have found the following doc/examples on the internet:
http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/FieldSelector.html
http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/FieldSelectorResult.html
http://kb.ucla.edu/articles/why-are-lucenes-stored-fields-so-slow-to-access
Would it be the correct way to implement this feature?

Thanks,
Alain

Indexer Error when trying to rebuild the index

I have a simple console app to index the data from a SQL table. I am receiving the following error

Value cannot be null
at System.Web.Configuration.ProvidersHelper.InstantiateProvider(ProviderSettings providerSettings, Type providerType)
at System.Web.Configuration.ProvidersHelper.InstantiateProviders(ProviderSettingsCollection configProviders, ProviderCollection providers, Type providerType)
at Examine.ExamineManager.EnsureProviders() in X:\Projects\Examine\Examine\Projects\Examine\ExamineManager.cs:line 96
at Examine.ExamineManager.get_IndexProviderCollection() in X:\Projects\Examine\Examine\Projects\Examine\ExamineManager.cs:line 72

on this line : ExamineManager.Instance.IndexProviderCollection["Simple2Indexer"].RebuildIndex();

Here is what My app.config looks like

<configSections>
    <!-- For more information on Entity Framework configuration, visit http://go.microsoft.com/fwlink/?LinkID=237468 -->
    <section name="Examine" type="Examine.Config.ExamineSettings, Examine" requirePermission="false" />
    <section name="ExamineLuceneIndexSets" type="Examine.LuceneEngine.Config.IndexSets, Examine" requirePermission="false" />

  </configSections>

  <Examine RebuildOnAppStart="false">
    <ExamineIndexProviders>
      <providers>
        <add name="Simple2Indexer" type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine" dataService="LucenePOC.Data.ForumDataReaderService,LucenePOC.Data"  indexTypes="TestType" runAsync="false"/>
        <add name="SecondIndexer" type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine" dataService="LucenePOC.Data.ForumDataReaderService,LucenePOC.Data" indexTypes="TestType2" runAsync="false"/>
      </providers>
    </ExamineIndexProviders>
    <ExamineSearchProviders defaultProvider="Simple2Searcher">
      <providers>
        <add name="Simple2Searcher" type="Examine.LuceneEngine.Providers.LuceneSearcher, Examine"  />
        <add name="MultiIndexSearcher" type="Examine.LuceneEngine.Providers.MultiIndexSearcher, Examine"
         indexSets="Simple2IndexSet,SecondIndexSet" />
      </providers>
    </ExamineSearchProviders>
  </Examine>
  <ExamineLuceneIndexSets>
    <IndexSet SetName="Simple2IndexSet" IndexPath="F:\Temp\Examine\SimpleIndexSet2">
      <IndexUserFields>
        <add Name="Id" />
        <add Name="Link" />
        <add Name="Module" />
        <add Name="Section" />
        <add Name="Message" />
        <add Name="CreatedBy" />
        <add Name="CreatedOn" />
        <add Name="ModifiedBy" />
        <add Name="ModifiedOn" />
      </IndexUserFields>
    </IndexSet>
    <IndexSet SetName="SecondIndexSet" IndexPath="F:\Temp\Examine\SimpleIndexSet2">
      <IndexUserFields>
        <add Name="Id" />
        <add Name="Link" />
        <add Name="Module" />
        <add Name="Section" />
        <add Name="Message" />
        <add Name="CreatedBy" />
        <add Name="CreatedOn" />
        <add Name="ModifiedBy" />
        <add Name="ModifiedOn" />
      </IndexUserFields>
    </IndexSet>
  </ExamineLuceneIndexSets>

My Package.config looks like this

<packages>
  <package id="EntityFramework" version="6.1.3" targetFramework="net452" />
  <package id="Examine" version="0.1.70.0" targetFramework="net452" />
  <package id="Lucene.Net" version="2.9.4.1" targetFramework="net452" />
  <package id="SharpZipLib" version="0.86.0" targetFramework="net452" />
</packages>

Can you please let me know, what I am missing? Thanks

GroupedOr with a boosted stop word causes an Object Null Reference

I ran into a problem today where if I boosted a stop word and passed it to a GroupedOr, it would explode with an Object Null Reference Exception in Examine.LuceneEngine.SearchCriteria.LuceneSearchCriteria line 308.

I was using Umbraco 7.1.8. I haven't had time to see if there are easier ways to reproduce.

This is a trimmed down, example version of the code I was using to cause the problem.
var searchPhrase = "the united states";
var searchTerms = searchPhrase.RemoveStopWords().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var siteSearcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
var searchCriteria = siteSearcher.CreateSearchCriteria(BooleanOperation.Or);
var query = searchCriteria.GroupedOr(new [] {"nodeName", "navigationTitle"}, searchTerms.Select(t => t.Boost(10)).ToArray());

In case anyone runs into this, the quick workaround is to use this cool string extension method I found called RemoveStopWords(). You can just strip the stop words out before you search with them.

Allow setting parentNodeId per content type

Currently on an indexer, you can set a parentNodeId, however the umbraco content indexer can index both content and media so it's impossible to set a parentNodeId that is relevant for both. We should be allowed to set one per content type.

Stacktrace when entering search term with unmatched double quotes

Entering a search term that contains an unmatched double-quote character (") results in a stacktrace (below). If the number of double-quote characters is even (0, 2, 4) it works fine, but an odd number fails. An example search string that causes this error is as follows:

http://[removed]/search?q=zzz"

The site is using an older version of Umbraco (I believe it's 7.1.6), so the line number in the stacktrace (62) doesn't match up with the latest version of Examine (looking at the code, I think it should be line 79 in the current version).

I realise this isn't the most helpful bug report. I don't have access to the source of the website, so I'm afraid I can't give any useful information about the exact versions of software in use (and won't be able to test a fixed version).

~rbsec

Last Error: System.Web.HttpCompileException
Controller: SearchResultPage
Action: ArticleList
Exception: mscorlib
Length cannot be less than zero. Parameter name: length

System.String.InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy)
Examine.StringExtensions.RemoveStopWords(String searchText) in x:\Projects\Examine\Examine\Projects\Examine\StringExtensions.cs:line 62
Umbraco.Extensions.Services.SiteSearchService.GetPagesAndEvents(String searchTerm, Nullable`1 enf, Nullable`1 wildcardMaxLenth)
Umbraco.Extensions.Services.ContentService.SearchResults_viewModelGet(String queryString, Nullable`1 enf, Nullable`1 wildcardMaxLength)
Umbraco.Extensions.Controllers.SearchResultPageController.ArticleList()
lambda_method(Closure , ControllerBase , Object[] )
System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters) 
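
Until the library handles this input itself, a caller could guard against it before passing the text to RemoveStopWords(). The following is only a sketch of one possible caller-side guard; QuerySanitizer and BalanceQuotes are hypothetical names, and dropping the last unmatched quote is an assumption about the desired behaviour, not Examine's fix.

```csharp
using System;
using System.Linq;

public static class QuerySanitizer
{
    // Returns the input unchanged when the double quotes are balanced;
    // otherwise removes the last (unmatched) double quote.
    public static string BalanceQuotes(string searchText)
    {
        if (string.IsNullOrEmpty(searchText)) return searchText;

        int quoteCount = searchText.Count(c => c == '"');
        if (quoteCount % 2 == 0) return searchText;

        int last = searchText.LastIndexOf('"');
        return searchText.Remove(last, 1);
    }

    public static void Main()
    {
        Console.WriteLine(BalanceQuotes("zzz\""));           // zzz
        Console.WriteLine(BalanceQuotes("\"united states\"")); // "united states"
    }
}
```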

Problem with GroupedNot

When using the GroupedNot method with one field and multiple values, only the first value is added to the query:

E.g.
searchCriteria.NodeTypeAlias("myDocumentTypeAlias");
searchCriteria.GroupedNot(new[] { "id" }.ToList(), new [] {"1","2","3"});

output query:
LuceneQuery: {+__NodeTypeAlias:myDocumentTypeAlias +(-id:1)}

Also, the produced query does not work because of the additional + sign right before the opening bracket.
The correct query should look like this:

+__NodeTypeAlias:myDocumentTypeAlias -id:(1 2 3)
or
+__NodeTypeAlias:myDocumentTypeAlias -id:1 -id:2 -id:3
or
+__NodeTypeAlias:myDocumentTypeAlias -(id:1 id:2 id:3)
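
As an illustration of the second form above, the exclusion part of the query can be produced by emitting one negated clause per field/value pair. This is a plain string-building sketch, not Examine's GroupedNot implementation; ExclusionClause.Build is a hypothetical helper and it does not escape Lucene special characters.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExclusionClause
{
    // Builds "-field:value" for every field/value combination, space separated.
    public static string Build(IEnumerable<string> fields, IEnumerable<string> values)
    {
        List<string> valueList = values.ToList();
        return string.Join(
            " ",
            fields.SelectMany(f => valueList.Select(v => "-" + f + ":" + v)));
    }

    public static void Main()
    {
        string clause = Build(new[] { "id" }, new[] { "1", "2", "3" });
        Console.WriteLine("+__NodeTypeAlias:myDocumentTypeAlias " + clause);
        // +__NodeTypeAlias:myDocumentTypeAlias -id:1 -id:2 -id:3
    }
}
```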

Exception in background thread killing w3wp

Seeing this exception in Umbraco v8 when displaying the Examine dashboard.

   Lucene.Net.Store.AlreadyClosedException
   at Lucene.Net.Index.IndexReader.EnsureOpen()
   at Lucene.Net.Index.IndexReader.IncRef()
   at Examine.LuceneEngine.Cru.SearcherManager.AcquireSearcher()
   at Examine.LuceneEngine.Cru.SearcherManager.get_IsSearcherCurrent()
   at Examine.LuceneEngine.Cru.NrtManager.MaybeReopen(Boolean)
   at Examine.LuceneEngine.Cru.NrtManagerReopener.Start()
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
   at System.Threading.ThreadHelper.ThreadStart()

No idea why it is throwing, but an exception thrown in a background thread kills w3wp entirely. Creating the issue to keep a reference and details.
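
The general reason is that an unhandled exception on a manually created thread terminates the whole process. A common defensive pattern is to catch and report exceptions inside the thread's loop. The sketch below shows that pattern in isolation; SafeBackgroundLoop and its parameters are hypothetical, and this is not the code of Examine's NrtManagerReopener.

```csharp
using System;
using System.Threading;

public static class SafeBackgroundLoop
{
    // Runs 'work' repeatedly on a background thread. Exceptions are passed to
    // 'onError' instead of escaping the thread and taking the process down.
    public static Thread Start(Action work, Action<Exception> onError,
                               CancellationToken token, TimeSpan interval)
    {
        var thread = new Thread(() =>
        {
            while (!token.IsCancellationRequested)
            {
                try
                {
                    work();
                }
                catch (Exception ex)
                {
                    onError(ex);
                }

                // Sleep for the interval, but wake immediately on cancellation.
                token.WaitHandle.WaitOne(interval);
            }
        })
        { IsBackground = true };

        thread.Start();
        return thread;
    }

    public static void Main()
    {
        using (var cts = new CancellationTokenSource())
        {
            Thread t = Start(
                () => { throw new InvalidOperationException("simulated closed reader"); },
                ex => Console.WriteLine("Logged: " + ex.Message),
                cts.Token,
                TimeSpan.FromMilliseconds(50));

            Thread.Sleep(200);
            cts.Cancel();
            t.Join();
        }
    }
}
```

Catching everything hides the underlying bug, so in practice the handler should at least log the exception so that the root cause can still be investigated.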

Umbraco Examine internal index becomes corrupt regularly

Hi,

On an Umbraco 7.3.2 instance with Examine 0.1.68 installed, we have the out of the box internal index set up:

ExamineIndex.config:

<!-- The internal index set used by Umbraco back-office - DO NOT REMOVE -->
<IndexSet SetName="InternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Internal/"/>

ExamineSettings.config:

<ExamineIndexProviders>
  <providers>
    <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" supportUnpublished="true" supportProtected="true" analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
  </providers>
</ExamineIndexProviders>
<ExamineSearchProviders defaultProvider="ExternalSearcher">
  <providers>
    <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" enableLeadingWildcard="true" enableDefaultEventHandler="true"/>
  </providers>
</ExamineSearchProviders>

We have enabled wildcard search and updates on content saving for this internal index using these settings:
enableLeadingWildcard="true"
enableDefaultEventHandler="true"

This works fine most of the time, but then it stops working completely, with no error in the logs. We can see the index has become corrupt because searching for some content in Umbraco brings no results back. Doing a full site republish fixes the issue and the index is back to normal, but this happens regularly.

Is there something we can do to stop the internal index becoming corrupt?

Thank you.

Support of Lucene 3.0.3 and .net 4.5

The current version of Examine is built on top of Lucene 2.9.4.1 and targets .NET Framework 4.0.
It would be nice to have it built on top of the latest version of Lucene (currently 3.0.3) and .NET 4.5.1.

Examine indexing unpublished nodes despite SupportUnpublishedContent = false

Examine indexes published child nodes of unpublished parents both while rebuilding the index and while listening to AfterUpdateDocumentCache/AfterClearDocumentCache.

When rebuilding an index from scratch, published child nodes of unpublished parents are included.
I gave up digging for answers after following:

@UmbracoExamine\BaseUmbracoIndexer.cs line 315

    protected virtual XDocument GetXDocument(string xPath, string type)
    {
        if (this.SupportUnpublishedContent)
        {
            return DataService.ContentService.GetLatestContentByXPath(xPath);
        }
        else
        {
            return DataService.ContentService.GetPublishedContentByXPath(xPath);
        }
    }

@UmbracoExamine\DataServices\UmbracoContentService.cs line 41

    public XDocument GetPublishedContentByXPath(string xpath)
    {
        return library.GetXmlNodeByXPath(xpath).ToXDocument();
    }

@Umbraco.Web\umbraco.presentation\library.cs line 1416

        //TODO: WTF, why is this here? This won't matter if there's an UmbracoContext or not, it will call the same underlying method!
        // only difference is that the UmbracoContext way will check if its in preview mode.
        private static XmlDocument GetThreadsafeXmlDocument()
        {
            return UmbracoContext.Current != null
                       ? UmbracoContext.Current.GetXml()
                       : content.Instance.XmlContent;
        }

        /// <summary>
        /// Queries the umbraco Xml cache with the specified Xpath query
        /// </summary>
        /// <param name="xpathQuery">The XPath query</param>
        /// <returns>Returns nodes matching the xpath query as a XpathNodeIterator</returns>
        public static XPathNodeIterator GetXmlNodeByXPath(string xpathQuery)
        {
            XPathNavigator xp = GetThreadsafeXmlDocument().CreateNavigator();

            return xp.Select(xpathQuery);
        }

@Umbraco.Web\umbraco.presentation\UmbracoContext.cs line 84

        public XmlDocument GetXml()
        {
            var umbracoContext = Umbraco.Web.UmbracoContext.Current;
            var cache = umbracoContext.ContentCache.InnerCache as Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedContentCache;
            if (cache == null)
                throw new InvalidOperationException("Unsupported IPublishedContentCache, only the Xml one is supported.");

            return cache.GetXml(umbracoContext, umbracoContext.InPreviewMode);
        }

It would seem umbraco's published document cache is the root cause

Also, when unpublishing a node the AfterClearDocumentCache does not fire for the children of an unpublished node, leaving children in the index.

RebuildIndex doesn't need to clear out the index, we can just ctor a new Writer

Currently we clear out the documents to rebuild, but that is unnecessary. Here are the Lucene docs:

The create argument to the constructor determines whether a new index is created, or whether an existing index is opened. Note that you can open an index with create=true even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open. There are also constructors with no create argument which will create a new index if there is not already an index at the provided path and otherwise open the existing index.

https://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/index/IndexWriter.html

When using German Analyzer for SearchProvider and IndexProvider, query is incorrect.

Hi,

When using GermanAnalyzer for the search provider, the LuceneBooleanOperation Compile() method generates a query that looks like this:
{ SearchIndexType: content, LuceneQuery: +(+(contents:searchedword*)) +__IndexType:con }

On the other hand, the StandardAnalyzer, which is used for English, generates the following query when used as the search provider:
{ SearchIndexType: content, LuceneQuery: +(+(contents:searchedword*)) +__IndexType:content }

After further investigation, it seems that the field __IndexType is being tokenized and stemmed by the GermanAnalyzer, so the word content becomes con.
With the search condition __IndexType:con, Lucene returns 0 results, as __IndexType only ever holds the value content or media.

I'm not sure how to fix it, as the project is complex.
After a brief investigation I found that the following line is missing a 4th parameter that would prevent this field from being analyzed:

this.search.FieldInternal( LuceneExamineIndexer.IndexTypeFieldName, new ExamineValue(Examineness.Explicit, this.search.SearchIndexType.ToString().ToLower()), BooleanClause.Occur.MUST);

I have a workaround for this now, but it is hacky.
When I have some spare time I will try to create a pull request.

Deletions do not commit to index until a subsequent re-index of another node.

I'm using Examine in an MVC web application and have had issues with deletions not being committed to the Lucene index. My project has been set up with a DataService that implements ISimpleDataService and quite happily indexes the data in my Entity Framework database.

However, I have discovered that when issuing a delete operation through the ExamineManager class, the node passed to it is not deleted immediately from the index. The node will eventually be deleted when a new or existing node is indexed, but until that time the node remains in the search index and appears in search results. This causes a 404 within my application, as the corresponding database record has already been removed while the index entry lingers in the Lucene index.

I decided to have a peek into the source files for Examine to try to understand what is going on, and in the file Examine/LuceneEngine/Providers/LuceneIndexer.cs I believe I have found an erroneous function call that is causing this issue.

On line 1521 of the aforementioned file is the following method.

[SecuritySafeCritical]
private void ProcessQueueItem(IndexOperation item, ICollection<IndexedNode> indexedNodes, IndexWriter writer)
{
    switch (item.Operation)
    {
        case IndexOperationType.Add:
            if (ValidateDocument(item.Item.DataToIndex))
            {
                //var added = ProcessIndexQueueItem(item, inMemoryWriter);
                var added = ProcessIndexQueueItem(item, writer);
                indexedNodes.Add(added);
            }
            else
            {
                //do the delete but no commit - it may or may not exist in the index but since it is not
                // valid it should definitely not be there.
                ProcessDeleteQueueItem(item, writer, false);

                OnIgnoringNode(new IndexingNodeDataEventArgs(item.Item.DataToIndex, int.Parse(item.Item.Id), null, item.Item.IndexType));
            }
            break;
        case IndexOperationType.Delete:
            ProcessDeleteQueueItem(item, writer, false);
            break;
        default:
            throw new ArgumentOutOfRangeException();
    }
}

For the IndexOperationType.Delete case, I believe the final parameter of the ProcessDeleteQueueItem should be set to true, as it is a flag as to whether to commit the change or not.

As it stands I believe that the current action is not being committed until a subsequent re-index action is processed and commits any outstanding actions to the index because the delete operation is not committing its own change to the index.

I've not managed to test this out yet, but was wondering if you could confirm my suspicions or not.

Kind regards,
Tim

Raw lucene sub-query

I was wondering whether there is a way to add raw Lucene sub-queries.

If not, would it be very difficult to implement a method like RawQuery(string rawQuery)? That would help to create complex queries using the Fluent API.

Example:

var query = searchCriteria
    .Fields("nodeName","hello")
    .And().RawQuery(" +(metaTitle:hello metaDescription:goodbye)")
    .Compile();

When app pool is shutting down there's potential for a YSOD because the BlockingCollection is disposed

YSOD:

Server Error in '/' Application.

The collection has been disposed.
Object name: 'BlockingCollection'.

Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code. 

Exception Details: System.ObjectDisposedException: The collection has been disposed.
Object name: 'BlockingCollection'.

Source Error: 


Line 1615:            }
Line 1616:            else
Line 1617:            {
Line 1618:                OnIndexingError(
Line 1619:                    new IndexingErrorEventArgs(

Source File: x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneIndexer.cs    Line: 1617 

Stack Trace: 


[ObjectDisposedException: The collection has been disposed.
Object name: 'BlockingCollection'.]
   System.Collections.Concurrent.BlockingCollection`1.CheckDisposed() +2116463
   System.Collections.Concurrent.BlockingCollection`1.TryAddWithNoTimeValidation(T item, Int32 millisecondsTimeout, CancellationToken cancellationToken) +52
   Examine.LuceneEngine.Providers.LuceneIndexer.EnqueueIndexOperation(IndexOperation op) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneIndexer.cs:1617
   Examine.LuceneEngine.Providers.LuceneIndexer.IndexAll(String type) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneIndexer.cs:829
   UmbracoExamine.BaseUmbracoIndexer.IndexAll(String type) in X:\Projects\Umbraco\Umbraco_7.4\src\UmbracoExamine\BaseUmbracoIndexer.cs:279
   UmbracoExamine.BaseUmbracoIndexer.PerformIndexRebuild() in X:\Projects\Umbraco\Umbraco_7.4\src\UmbracoExamine\BaseUmbracoIndexer.cs:353
   UmbracoExamine.BaseUmbracoIndexer.RebuildIndex() in X:\Projects\Umbraco\Umbraco_7.4\src\UmbracoExamine\BaseUmbracoIndexer.cs:265
   UmbracoExamine.UmbracoContentIndexer.RebuildIndex() in X:\Projects\Umbraco\Umbraco_7.4\src\UmbracoExamine\UmbracoContentIndexer.cs:483
   Overflow.Controllers.TestController.Index(RenderModel render) in x:\Projects\Umbraco\Umbraco_7.4\src\Umbraco.Web.UI\App_Code\UmbContactController.cs:117
   lambda_method(Closure , ControllerBase , Object[] ) +139
   System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters) +229
   System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +35
   System.Web.Mvc.Async.AsyncControllerActionInvoker.<BeginInvokeSynchronousActionMethod>b__39(IAsyncResult asyncResult, ActionInvocation innerInvokeState) +39
   System.Web.Mvc.Async.WrappedAsyncResult`2.CallEndDelegate(IAsyncResult asyncResult) +71
   System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeActionMethod(IAsyncResult asyncResult) +42
   System.Web.Mvc.Async.AsyncInvocationWithFilters.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3d() +72
   System.Web.Mvc.Async.<>c__DisplayClass46.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3f() +386
   System.Web.Mvc.Async.<>c__DisplayClass46.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3f() +386
   System.Web.Mvc.Async.<>c__DisplayClass46.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3f() +386
   System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeActionMethodWithFilters(IAsyncResult asyncResult) +42
   System.Web.Mvc.Async.<>c__DisplayClass2b.<BeginInvokeAction>b__1c() +38
   System.Web.Mvc.Async.<>c__DisplayClass21.<BeginInvokeAction>b__1e(IAsyncResult asyncResult) +186
   System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeAction(IAsyncResult asyncResult) +38
   System.Web.Mvc.Controller.<BeginExecuteCore>b__1d(IAsyncResult asyncResult, ExecuteCoreState innerState) +29
   System.Web.Mvc.Async.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult) +67
   System.Web.Mvc.Controller.EndExecuteCore(IAsyncResult asyncResult) +53
   System.Web.Mvc.Async.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult) +36
   System.Web.Mvc.Controller.EndExecute(IAsyncResult asyncResult) +38
   System.Web.Mvc.MvcHandler.<BeginProcessRequest>b__5(IAsyncResult asyncResult, ProcessRequestState innerState) +44
   System.Web.Mvc.Async.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult) +67
   System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult) +38
   System.Web.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +399
   System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +137

This can be replicated by doing this in a GET request (not that you should ever do this):

    var doc = Services.ContentService.GetById(CurrentPage.Id);
    var xml = doc.ToXml();
    // add an icon attribute to get indexed
    xml.Add(new XAttribute("icon", doc.ContentType.Icon));

    // request an app pool restart, so shutdown begins while this request is still running
    ApplicationContext.RestartApplicationPool(HttpContext);

    // these calls then try to enqueue index operations during shutdown
    ExamineManager.Instance.IndexProviderCollection["InternalIndexer"].RebuildIndex();
    ExamineManager.Instance.IndexProviderCollection["InternalIndexer"].ReIndexNode(xml, IndexTypes.Content);
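
The exception at the top of the trace can also be seen in isolation. This is a minimal sketch that assumes nothing about LuceneIndexer beyond what the stack trace shows (TryAdd being called on a BlockingCollection that has already been disposed). The class and method names are hypothetical and are not part of Examine:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical demo class; not part of Examine.
public static class DisposedQueueDemo
{
    // Returns true if adding to a disposed BlockingCollection throws
    // ObjectDisposedException, as in the stack trace above.
    public static bool AddAfterDisposeThrows()
    {
        var queue = new BlockingCollection<string>();
        queue.Dispose(); // stands in for the queue being disposed during shutdown

        try
        {
            queue.TryAdd("index operation");
            return false;
        }
        catch (ObjectDisposedException)
        {
            return true;
        }
    }
}
```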

Use a new IndexWriterTracker to track IndexWriters across LuceneSearcher and LuceneIndexer to have NRT by default

Examine v1 has NRT (near-real-time search) built in through the ctor overloads, but not through the standard Examine config/provider model. We could achieve this by creating an IndexWriterTracker, similar to the DirectoryTracker that we already use, so that there is only one IndexWriter per Directory, which would make NRT the default.

This has some consequences, though: many of the methods such as GetIndexWriter are virtual and are overridden in Umbraco, so it would be hard to force the usage of NRT. Perhaps we can support both, and libraries that want NRT will need to adjust their overrides.
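
The one-writer-per-directory idea could look roughly like the sketch below. It is only an illustration: WriterTracker, GetWriter and the string directory key are hypothetical, TWriter stands in for Lucene's IndexWriter, and nothing here is taken from Examine's DirectoryTracker:

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical tracker; TWriter stands in for Lucene's IndexWriter.
public sealed class WriterTracker<TWriter>
{
    // One lazily created writer per directory path (assumed to identify the Directory).
    private readonly ConcurrentDictionary<string, Lazy<TWriter>> _writers =
        new ConcurrentDictionary<string, Lazy<TWriter>>(StringComparer.OrdinalIgnoreCase);

    // Returns the shared writer for the directory, creating it on first use.
    // Lazy<T> makes sure the factory runs at most once per directory, even
    // when a searcher and an indexer ask for the writer at the same time.
    public TWriter GetWriter(string directoryPath, Func<string, TWriter> factory)
    {
        return _writers
            .GetOrAdd(directoryPath, key => new Lazy<TWriter>(() => factory(key)))
            .Value;
    }
}
```

A searcher and an indexer that both call GetWriter with the same path would then share one writer instance, which is the precondition for opening NRT readers from it.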

Race condition could occur when an index doesn't exist

The searcher and the indexer could simultaneously attempt to initialize an index in a Lucene directory if it doesn't exist. I'm not sure if this has ever happened, but the possibility is there in the code: legacy code in the searcher makes sure that an index exists at its location and creates one if it doesn't, even though this is the responsibility of the indexer.
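
One way to rule out the simultaneous initialization is to funnel the check-and-create step through a single lock. This is a swapped-in technique for illustration, not the fix described in the issue (which points at the indexer owning index creation). IndexInitGuard, EnsureCreated and the two delegates are hypothetical names:

```csharp
using System;

// Hypothetical guard; not part of Examine.
public static class IndexInitGuard
{
    private static readonly object Sync = new object();

    // Runs create() only if exists() reports that no index is present.
    // The lock makes the check and the creation one atomic step, so two
    // callers cannot both decide that the index is missing.
    // Returns true if this call created the index.
    public static bool EnsureCreated(Func<bool> exists, Action create)
    {
        lock (Sync)
        {
            if (exists())
            {
                return false;
            }

            create();
            return true;
        }
    }
}
```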

If there are multiple app restarts AlreadyClosedException may occur

After running some tests with high concurrency and adding multiple app restarts into the mix, we end up with error logs such as:

2015-03-30 13:25:30,307 [49] ERROR UmbracoExamine.DataServices.UmbracoLogService - [Thread 71] Provider=InternalIndexer, NodeId=-1
System.Exception: IndexSet: InternalIndexSet, Lucene.Net.Store.AlreadyClosedException: this IndexWriter is closed
at Lucene.Net.Index.IndexWriter.EnsureOpen(Boolean includePendingClose)
at Lucene.Net.Index.IndexWriter.EnsureOpen()
at Lucene.Net.Index.IndexWriter.Commit(IDictionary`2 commitUserData)
at Examine.LuceneEngine.Providers.LuceneIndexer.ForceProcessQueueItems(Boolean block) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 1456

This is due to the way the index writer is closed in some cases during disposal. We block the shutdown thread until writing has finished and then commit, but in some cases another thread might still be trying to commit the last batch, and then we close too early. To fix this, we simply track the number of active entries in ForceProcessQueueItems and, during disposal, wait until this is zero.
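
The counting approach described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in LuceneIndexer: CommitGate, TryCommit and the two delegates are hypothetical, with the delegates standing in for committing and closing the IndexWriter:

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the indexer's commit/dispose logic; not Examine's code.
public sealed class CommitGate : IDisposable
{
    private readonly Action _commit; // stands in for committing the writer
    private readonly Action _close;  // stands in for closing the writer
    private int _active;             // threads currently inside TryCommit
    private int _disposed;           // 0 = open, 1 = shutting down

    public CommitGate(Action commit, Action close)
    {
        _commit = commit;
        _close = close;
    }

    // Commits unless shutdown has begun; returns false if it was skipped.
    public bool TryCommit()
    {
        // Register as active before checking the flag, so Dispose either
        // sees this entry or this entry sees the shutdown flag.
        Interlocked.Increment(ref _active);
        try
        {
            if (Volatile.Read(ref _disposed) == 1)
            {
                return false;
            }

            _commit();
            return true;
        }
        finally
        {
            Interlocked.Decrement(ref _active);
        }
    }

    public void Dispose()
    {
        // Only the first caller performs the shutdown.
        if (Interlocked.Exchange(ref _disposed, 1) == 1)
        {
            return;
        }

        // Wait until every in-flight commit has finished, then close.
        SpinWait.SpinUntil(() => Volatile.Read(ref _active) == 0);
        _close();
    }
}
```

The shutdown flag is set with Interlocked.Exchange rather than a plain write so that the flag update and the later read of the counter cannot be reordered against each other.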
