shazwazza / examine
A .NET indexing and search engine powered by Lucene.Net
Home Page: https://shazwazza.github.io/Examine/
If the indexed node count is 0, there is no need to commit, merge, or raise events.
When running stress tests that shut down the appdomain very often while also trying to view search results, an exception may occur:
Lucene.Net.Store.AlreadyClosedException: this IndexReader is closed
This is because when the app domain is shut down, the reader is closed, but at the same time another request might still be iterating over it. So we should close the reader only at the last possible moment, just before the appdomain terminates.
I'm trying to search the index and have it return the results ordered, but it isn't working for me; the results are always returned in the same order. Here is my query:
BaseSearchProvider searcher = ExamineManager.Instance.SearchProviderCollection["FooSearcher"];
ISearchCriteria searchCriteria = searcher.CreateSearchCriteria();
IBooleanOperation boolOperation = searchCriteria.NodeTypeAlias("fooNodeTypeAlias");
boolOperation = boolOperation.And().OrderBy("fooName");
ISearchResults results = searcher.Search(boolOperation.Compile());
Here is my index:
<IndexSet SetName="FooIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/{machinename}/Foo/">
  <IndexAttributeFields>
    <add Name="id" />
    <add Name="nodeName"/>
    <add Name="nodeTypeAlias" />
  </IndexAttributeFields>
  <IndexUserFields>
    <add Name="fooName" EnableSorting="true" />
  </IndexUserFields>
</IndexSet>
And here is my searcher:
<add name="FooIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" supportUnpublished="false" supportProtected="true" analyzer="Lucene.Net.Analysis.Standard.StandardAnalyzer, Lucene.Net" />
Edit: I'm using Examine 0.1.70.0
I was wondering if it is possible to get the total number of results from the search query without loading all results into memory? My understanding is that BaseSearchProvider.Search() will load everything into memory, is this correct?
My main use for this is for paging.
Previous changes were made to Examine to pass in an IEnumerable collection to be indexed, which would be resolved lazily, but the SimpleDataIndexer wasn't updated to support this feature.
Hi everyone,
I'm new to Lucene. I've an issue while querying the index file (path) while searching for a string in the files (say: doc files). I've looped through all the files using the following code.
string indexFileLocation = txtRootDirectory.Text.Trim();
Lucene.Net.Store.Directory dir = Lucene.Net.Store.FSDirectory.GetDirectory(indexFileLocation, false);
Lucene.Net.Search.IndexSearcher searcher = new Lucene.Net.Search.IndexSearcher(dir, false);
Lucene.Net.Index.Term searchTerm = new Lucene.Net.Index.Term("content", txtSearch.Text.Trim());
Lucene.Net.Search.Query query = new Lucene.Net.Search.TermQuery(searchTerm);
Lucene.Net.Search.Hits hits = searcher.Search(query);
for (int i = 0; i < hits.Length(); i++)
{
    Lucene.Net.Documents.Document doc = hits.Doc(i);
    StringBuilder contentValue = new StringBuilder();
    contentValue.Append(doc.Get("content"));
    string id = doc.Get("id");
    lblSearchResults.Text += id + "<br />";
}
But unfortunately, I keep getting the same search results, with the same file name, as follows.
I couldn't figure out whether I'm doing something wrong anywhere in my code. Please help me out.
Thanks in advance.
What I need is the ability to index documents with multi-value fields, e.g. tags. There is no way I can add a document with many values for the same field.
Hi!
I've got a custom (non umbraco) Indexer set up and working properly by looping through my custom data and creating a "SimpleDataSet" for each item. Now I am adding functionality to update the index when operations happen on the custom objects.
I have successfully set up a "Remove from Index" function to run on object delete by looking up the index nodeId for the object, and passing it to 'ExamineManager.Instance.DeleteFromIndex(...)'
Now I'd like to add operations to run on object create and update which would add just the current object to the index. I was looking at 'ExamineManager.Instance.ReIndexNode()' which expects an "XElement" as the representation of the index data, but I am unclear what format that needs to be in, or how to convert a SimpleDataSet into an XElement.
Is it possible to only index a single object? I'd rather not have to run 'ExamineManager.Instance.IndexAll()' every time something is added or changed... But perhaps that isn't possible?
We can allow Near Real Time readers in Examine (yes, even in v1!). Previously I thought this was only possible via the ctor, but I have managed to come up with a nice solution.
For example, when indexing thousands of items, it would be much better if we can queue up an iterator for the worker thread to process instead of queuing up already serialized items which can consume memory.
Hi, I am trying to integrate Elasticsearch into Umbraco using Examine, but I hit a roadblock: there are a few places in Umbraco where the SearchProvider is cast to the specific LuceneSearcher in order to use a few extra Search methods implemented on that specific searcher.
Would it be possible to create two abstract methods
BaseLuceneSearcher: ISearchResults Search(ISearchCriteria searchParams, int maxResults)
LuceneSearcher: ISearchResults Search(string searchText, bool useWildcards, string indexType)
in the BaseSearchProvider, or create an interface for the methods that can be implemented by other providers?
This would make it possible to avoid the casts in Umbraco that make it difficult to implement SearchProviders based on engines other than Lucene.Net.
Next up would be to refactor the usage of UmbracoContentIndexer in Umbraco, as it is also tied to Lucene.Net, but that's another story. :-)
Hi,
I searched the entire solution and I cannot find the mentioned Azure providers anywhere. Could you please point me at the right location?
Thanks.
For example, specifying a max result of 3 will return 3 results; however, TotalItemCount should return the actual total, not just the limited amount.
I have had an Umbraco v7.2.8 installation running with Examine v0.1.66 for a while. After upgrading Examine to v0.1.67, the application fails to start with the following exception occurring every 30 seconds or so.
To reproduce the error, create a v7.2.8 installation of Umbraco and upgrade its Examine NuGet package to v0.1.67. To replicate, delete the Examine indexes folder before navigating to the site. When navigating to the site for the first time, the above exception should occur. To fix it, I have downgraded the Examine NuGet package to v0.1.66, which seems to allow the application to start correctly.
I understand this is probably not much to go on. I have checked Umbraco's log files and my system's Event Logs, and nothing is logged relating to the exception. I can replicate this using a fresh install of Umbraco via NuGet. I'm guessing the issue is related to the following changes.
The exception occurs within LuceneIndex.cs according to Visual Studio:
Currently, if you use the DocumentWriting event and create multiple fields with the same name, it will index just fine since Lucene supports that. However, when you search you will get a dictionary error because it is trying to add the same field to the dictionary multiple times.
Because changing the dictionary result would be a breaking change, we'll support this by doing the following:
The normal dictionary of values will contain the first value; however, there's a new method on the SearchResult object:
public IEnumerable<string> GetValues(string key)
which will give you all of the values indexed for that field. This method will never return null; if a key doesn't exist at all, an empty collection is returned.
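The behaviour described above can be sketched roughly as follows. This is only an illustration: MultiValueResult and AddValue are hypothetical names and are not Examine's actual SearchResult implementation; only the GetValues signature comes from the description above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a search result; not Examine's SearchResult class.
public class MultiValueResult
{
    // Every value seen for each field, in the order it was added.
    private readonly Dictionary<string, List<string>> _allValues =
        new Dictionary<string, List<string>>();

    // The "normal" dictionary: holds only the first value of each field.
    public IDictionary<string, string> Fields { get; } =
        new Dictionary<string, string>();

    // Hypothetical helper used to populate the result.
    public void AddValue(string key, string value)
    {
        List<string> values;
        if (!_allValues.TryGetValue(key, out values))
        {
            values = new List<string>();
            _allValues[key] = values;
            Fields[key] = value; // only the first value lands in the dictionary
        }
        values.Add(value);
    }

    // Never returns null: an unknown key yields an empty sequence.
    public IEnumerable<string> GetValues(string key)
    {
        List<string> values;
        return _allValues.TryGetValue(key, out values)
            ? values
            : Enumerable.Empty<string>();
    }
}
```

In this sketch a field with a single value is still returned by GetValues, so callers do not need to special-case it.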
AzureDirectory on NuGet has moved to a later version of Lucene. We need to keep supporting 2.9.4.1, but with the bug fixes, so we'll release a separate version of it to help with some Umbraco-related bits.
I noticed that when the indexing process runs under the "it-IT" culture and the data to index contains types such as Double, Lucene throws an exception.
I found the line of code where it happens, in the Examine.LuceneEngine.Providers.LuceneIndexer.TryConvert<T>(string val, out object parsedVal) method.
The line tc.ConvertFrom(val) tries to convert the string val to type T. If T is Double or Float and val contains decimal digits (like "1234.567"), the method can't convert it to T because the dot character is not the decimal separator in the "it-IT" culture.
I think I have a solution. This code appears to solve the issue:
parsedVal = (T)tc.ConvertFrom(null, System.Globalization.CultureInfo.InvariantCulture, val);
Is this a good solution? Is it possible to apply this patch in Examine?
Thanks
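For illustration, here is a minimal standalone sketch of the proposed fix. TryConvertInvariant is a hypothetical helper written for this example, not Examine's actual TryConvert method; the only call taken from the post above is the TypeConverter.ConvertFrom overload that accepts a CultureInfo.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;

public static class InvariantConversion
{
    // Hypothetical helper: converts a string to T using the invariant culture,
    // so "." is treated as the decimal separator regardless of the thread culture.
    public static bool TryConvertInvariant<T>(string val, out object parsedVal)
    {
        TypeConverter tc = TypeDescriptor.GetConverter(typeof(T));
        try
        {
            parsedVal = tc.ConvertFrom(null, CultureInfo.InvariantCulture, val);
            return true;
        }
        catch (Exception)
        {
            // The built-in numeric converters report unparsable input by throwing.
            parsedVal = null;
            return false;
        }
    }
}
```

With this approach the indexed value must itself be written with an invariant-culture decimal separator, which matches the "1234.567" example above.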
We shouldn't have AutomaticallyOptimize set to true by default; it should be false. Optimization comes with a large overhead, and we don't really want sites to suddenly start optimizing large indexes, which could cause slowness, etc.
Also note that optimization for Lucene is more or less a legacy thing:
http://blog.trifork.com/2011/11/21/simon-says-optimize-is-bad-for-you/
I've set up an index using a SimpleDataIndexer that is trying to index data from a database table with around 2 million rows. I'm using Umbraco's database object to query the data, which appears to use a DataReader to read a row at a time. In my SimpleDataService I'm looping through the objects returned from Umbraco and yielding a new SimpleDataSet. Am I doing something wrong, or is indexing this much data just not supported?
Hi,
I have been reading Examine's documentation and searching the Umbraco forums, but couldn't find anything.
I was wondering if it is possible to retrieve only specific fields instead of returning the whole doc in the search results?
If not, would it be difficult to implement this feature (I am considering submitting a PR)?
I have found the following doc/examples on the internet:
http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/FieldSelector.html
http://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/document/FieldSelectorResult.html
http://kb.ucla.edu/articles/why-are-lucenes-stored-fields-so-slow-to-access
Would it be the correct way to implement this feature?
Thanks,
Alain
Examine search issue on Umbraco
During app shutdown, once cancellation is requested, do not allow rebuild or optimize
I have a simple console app to index the data from a SQL table. I am receiving the following error
Value cannot be null
at System.Web.Configuration.ProvidersHelper.InstantiateProvider(ProviderSettings providerSettings, Type providerType)
at System.Web.Configuration.ProvidersHelper.InstantiateProviders(ProviderSettingsCollection configProviders, ProviderCollection providers, Type providerType)
at Examine.ExamineManager.EnsureProviders() in X:\Projects\Examine\Examine\Projects\Examine\ExamineManager.cs:line 96
at Examine.ExamineManager.get_IndexProviderCollection() in X:\Projects\Examine\Examine\Projects\Examine\ExamineManager.cs:line 72
on this line : ExamineManager.Instance.IndexProviderCollection["Simple2Indexer"].RebuildIndex();
Here is what my app.config looks like
<configSections>
  <!-- For more information on Entity Framework configuration, visit http://go.microsoft.com/fwlink/?LinkID=237468 -->
  <section name="Examine" type="Examine.Config.ExamineSettings, Examine" requirePermission="false" />
  <section name="ExamineLuceneIndexSets" type="Examine.LuceneEngine.Config.IndexSets, Examine" requirePermission="false" />
</configSections>
<Examine RebuildOnAppStart="false">
  <ExamineIndexProviders>
    <providers>
      <add name="Simple2Indexer" type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine" dataService="LucenePOC.Data.ForumDataReaderService,LucenePOC.Data" indexTypes="TestType" runAsync="false"/>
      <add name="SecondIndexer" type="Examine.LuceneEngine.Providers.SimpleDataIndexer, Examine" dataService="LucenePOC.Data.ForumDataReaderService,LucenePOC.Data" indexTypes="TestType2" runAsync="false"/>
    </providers>
  </ExamineIndexProviders>
  <ExamineSearchProviders defaultProvider="Simple2Searcher">
    <providers>
      <add name="Simple2Searcher" type="Examine.LuceneEngine.Providers.LuceneSearcher, Examine" />
      <add name="MultiIndexSearcher" type="Examine.LuceneEngine.Providers.MultiIndexSearcher, Examine"
           indexSets="Simple2IndexSet,SecondIndexSet" />
    </providers>
  </ExamineSearchProviders>
</Examine>
<ExamineLuceneIndexSets>
  <IndexSet SetName="Simple2IndexSet" IndexPath="F:\Temp\Examine\SimpleIndexSet2">
    <IndexUserFields>
      <add Name="Id" />
      <add Name="Link" />
      <add Name="Module" />
      <add Name="Section" />
      <add Name="Message" />
      <add Name="CreatedBy" />
      <add Name="CreatedOn" />
      <add Name="ModifiedBy" />
      <add Name="ModifiedOn" />
    </IndexUserFields>
  </IndexSet>
  <IndexSet SetName="SecondIndexSet" IndexPath="F:\Temp\Examine\SimpleIndexSet2">
    <IndexUserFields>
      <add Name="Id" />
      <add Name="Link" />
      <add Name="Module" />
      <add Name="Section" />
      <add Name="Message" />
      <add Name="CreatedBy" />
      <add Name="CreatedOn" />
      <add Name="ModifiedBy" />
      <add Name="ModifiedOn" />
    </IndexUserFields>
  </IndexSet>
</ExamineLuceneIndexSets>
My Package.config looks like this
<packages>
  <package id="EntityFramework" version="6.1.3" targetFramework="net452" />
  <package id="Examine" version="0.1.70.0" targetFramework="net452" />
  <package id="Lucene.Net" version="2.9.4.1" targetFramework="net452" />
  <package id="SharpZipLib" version="0.86.0" targetFramework="net452" />
</packages>
Can you please let me know what I am missing? Thanks
I ran into a problem today where if I boosted a stop word and passed it to a GroupedOr, it would explode with an Object Null Reference Exception in Examine.LuceneEngine.SearchCriteria.LuceneSearchCriteria line 308.
I was using Umbraco 7.1.8. I haven't had time to see if there are easier ways to reproduce.
This is a trimmed down, example version of the code I was using to cause the problem.
var searchPhrase = "the united states";
var searchTerms = searchPhrase.RemoveStopWords().Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
var siteSearcher = ExamineManager.Instance.SearchProviderCollection["ExternalSearcher"];
var searchCriteria = siteSearcher.CreateSearchCriteria(BooleanOperation.Or);
var query = searchCriteria.GroupedOr(new [] {"nodeName", "navigationTitle"}, searchTerms.Select(t => t.Boost(10)).ToArray());
In case anyone runs into this, the quick workaround is to use this cool string extension method I found called RemoveStopWords(). You can just strip the stop words out before you search with them.
Currently on an indexer you can set a parentNodeId; however, the Umbraco content indexer can index both content and media, so it's impossible to set a parentNodeId that is relevant for both. We should be allowed to set one per content type.
Entering a search term that contains an odd number of double-quote characters (") results in a stack trace (below). If the number of double-quote characters is even (0, 2, 4) it works fine, but an unmatched one fails. An example search string that causes this error is as follows:
http://[removed]/search?q=zzz"
The site is using an older version of Umbraco (I believe it's 7.1.6), so the line number in the stack trace (62) doesn't match up with the latest version of Examine (looking at the code, I think it should be line 79 in the current version).
I realise this isn't the most helpful bug report. I don't have access to the source of the website, so I'm afraid I can't give any useful information about the exact versions of software in use (and won't be able to test a fixed version).
~rbsec
Last Error: System.Web.HttpCompileException
Controller: SearchResultPage
Action: ArticleList
Exception: mscorlib
Length cannot be less than zero. Parameter name: length
System.String.InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy)
Examine.StringExtensions.RemoveStopWords(String searchText) in x:\Projects\Examine\Examine\Projects\Examine\StringExtensions.cs:line 62
Umbraco.Extensions.Services.SiteSearchService.GetPagesAndEvents(String searchTerm, Nullable`1 enf, Nullable`1 wildcardMaxLenth)
Umbraco.Extensions.Services.ContentService.SearchResults_viewModelGet(String queryString, Nullable`1 enf, Nullable`1 wildcardMaxLength)
Umbraco.Extensions.Controllers.SearchResultPageController.ArticleList()
lambda_method(Closure , ControllerBase , Object[] )
System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters)
This feature allows us to not have to create sub-classed indexers/searchers to use custom directories.
When using the GroupedNot method with one field and multiple values, only the first value is added to the query:
E.g.
searchCriteria.NodeTypeAlias("myDocumentTypeAlias");
searchCriteria.GroupedNot(new[] { "id" }.ToList(), new [] {"1","2","3"});
output query:
LuceneQuery: {+__NodeTypeAlias:myDocumentTypeAlias +(-id:1)}
Also, the produced query does not work because of the additional + sign right before the opening bracket.
The correct query should look like this:
+__NodeTypeAlias:myDocumentTypeAlias -id:(1 2 3)
or
+__NodeTypeAlias:myDocumentTypeAlias -id:1 -id:2 -id:3
or
+__NodeTypeAlias:myDocumentTypeAlias -(id:1 id:2 id:3)
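As a rough illustration of the second form above, the prohibited clauses could be generated like this. BuildGroupedNot is a hypothetical helper that only builds a query string; it is not Examine's GroupedNot implementation, and it assumes the values need no escaping.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NotClauseSketch
{
    // Hypothetical helper: emits one prohibited clause ("-field:value")
    // for every field/value combination, separated by spaces.
    public static string BuildGroupedNot(IEnumerable<string> fields, IEnumerable<string> values)
    {
        // Materialize the values once because they are enumerated per field.
        List<string> valueList = values.ToList();
        IEnumerable<string> clauses =
            from field in fields
            from value in valueList
            select "-" + field + ":" + value;
        return string.Join(" ", clauses);
    }
}
```

Appending the result to the +__NodeTypeAlias clause gives the second query shown above.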
Currently when shutdown is requested, all pending adds are cancelled. We want to allow a small window of opportunity for a pending add during a shutdown to get processed.
Seeing this exception in Umbraco v8 when displaying the Examine dashboard.
Lucene.Net.Store.AlreadyClosedException
at Lucene.Net.Index.IndexReader.EnsureOpen()
at Lucene.Net.Index.IndexReader.IncRef()
at Examine.LuceneEngine.Cru.SearcherManager.AcquireSearcher()
at Examine.LuceneEngine.Cru.SearcherManager.get_IsSearcherCurrent()
at Examine.LuceneEngine.Cru.NrtManager.MaybeReopen(Boolean)
at Examine.LuceneEngine.Cru.NrtManagerReopener.Start()
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Threading.ThreadHelper.ThreadStart()
No idea why this is throwing, but throwing in a background thread kills w3wp entirely. Creating the issue to keep a reference and the details.
GetValues should return a result even if there is only one value; currently it only returns values if there is more than one.
To work around this for now, you'd have to Union GetValues("myField") with the result["myField"] value.
Hi,
On an Umbraco 7.3.2 instance with Examine 0.1.68 installed, we have the out of the box internal index set up:
ExamineIndex.config:
<!-- The internal index set used by Umbraco back-office - DO NOT REMOVE -->
<IndexSet SetName="InternalIndexSet" IndexPath="~/App_Data/TEMP/ExamineIndexes/Internal/"/>
ExamineSettings.config:
<ExamineIndexProviders>
  <providers>
    <add name="InternalIndexer" type="UmbracoExamine.UmbracoContentIndexer, UmbracoExamine" supportUnpublished="true" supportProtected="true" analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net"/>
  </providers>
</ExamineIndexProviders>
<ExamineSearchProviders defaultProvider="ExternalSearcher">
  <providers>
    <add name="InternalSearcher" type="UmbracoExamine.UmbracoExamineSearcher, UmbracoExamine" analyzer="Lucene.Net.Analysis.WhitespaceAnalyzer, Lucene.Net" enableLeadingWildcard="true" enableDefaultEventHandler="true"/>
  </providers>
</ExamineSearchProviders>
We have enabled wildcard search and updates on content saving for this internal index using these settings:
enableLeadingWildcard="true"
enableDefaultEventHandler="true"
This works fine most of the time, but then it stops working completely, with no error in the logs. We can see the index has become corrupt because searching for some content in Umbraco brings no results back. After doing a full site republish the issue gets fixed and the index is back to normal, but this happens regularly.
Is there something we can do to stop the internal index becoming corrupt?
Thank you.
I don't know if this is a known issue, but:
Query strings containing Lucene-recognizable boolean operators cause a QueryParseException.
These include: !, ||, &&, NOT, OR, AND.
Example from our.umbraco.org: https://our.umbraco.org/search?q=OR
The same happens with other Umbraco sites using Examine.
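One possible way to guard against this before handing user input to the query parser is sketched below. QuerySanitizer is a hypothetical helper, not part of Examine. It backslash-escapes the characters that the Lucene query syntax treats as special and lowercases bare AND/OR/NOT tokens, on the assumption that the classic query parser only recognizes the uppercase forms as operators. Lucene.Net's QueryParser also offers a static Escape method for the character part, which is probably preferable when the library is referenced.

```csharp
using System;
using System.Linq;
using System.Text;

public static class QuerySanitizer
{
    // Characters with special meaning in the Lucene query syntax.
    private const string SpecialChars = "\\+-!():^[]\"{}~*?|&";

    // Keywords that are treated as boolean operators when written in uppercase.
    private static readonly string[] Operators = { "AND", "OR", "NOT" };

    // Hypothetical helper: returns text intended to be parsed as plain terms only.
    public static string Sanitize(string input)
    {
        var tokens = (input ?? string.Empty)
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(t => Operators.Contains(t) ? t.ToLowerInvariant() : Escape(t));
        return string.Join(" ", tokens);
    }

    // Prefixes every special character in the token with a backslash.
    private static string Escape(string token)
    {
        var sb = new StringBuilder();
        foreach (char c in token)
        {
            if (SpecialChars.IndexOf(c) >= 0)
            {
                sb.Append('\\');
            }
            sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Note that this deliberately disables phrase and wildcard syntax typed by the user, so it only suits search boxes that should treat input as plain words.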
The current version of Examine is built on top of Lucene 2.9.4.1 and targets .NET Framework 4.0.
It would be nice to have it built on top of the latest version of Lucene (currently 3.0.3) and .NET 4.5.1.
Examine indexes published child nodes of unpublished parents both while rebuilding the index and while listening to AfterUpdateDocumentCache/AfterClearDocumentCache.
When rebuilding an index from scratch, published child nodes of unpublished parents are included.
I gave up digging for answers after following:
@UmbracoExamine\BaseUmbracoIndexer.cs line 315
protected virtual XDocument GetXDocument(string xPath, string type)
if (this.SupportUnpublishedContent)
{
return DataService.ContentService.GetLatestContentByXPath(xPath);
}
else
{
return DataService.ContentService.GetPublishedContentByXPath(xPath);
}
@UmbracoExamine\DataServices\UmbracoContentService.cs line 41
public XDocument GetPublishedContentByXPath(string xpath)
{
return library.GetXmlNodeByXPath(xpath).ToXDocument();
}
@Umbraco.Web\umbraco.presentation\library.cs line 1416
//TODO: WTF, why is this here? This won't matter if there's an UmbracoContext or not, it will call the same underlying method!
// only difference is that the UmbracoContext way will check if its in preview mode.
private static XmlDocument GetThreadsafeXmlDocument()
{
return UmbracoContext.Current != null
? UmbracoContext.Current.GetXml()
: content.Instance.XmlContent;
}
/// <summary>
/// Queries the umbraco Xml cache with the specified Xpath query
/// </summary>
/// <param name="xpathQuery">The XPath query</param>
/// <returns>Returns nodes matching the xpath query as a XpathNodeIterator</returns>
public static XPathNodeIterator GetXmlNodeByXPath(string xpathQuery)
{
XPathNavigator xp = GetThreadsafeXmlDocument().CreateNavigator();
return xp.Select(xpathQuery);
}
@Umbraco.Web\umbraco.presentation\UmbracoContext.cs line 84
public XmlDocument GetXml()
{
var umbracoContext = Umbraco.Web.UmbracoContext.Current;
var cache = umbracoContext.ContentCache.InnerCache as Umbraco.Web.PublishedCache.XmlPublishedCache.PublishedContentCache;
if (cache == null)
throw new InvalidOperationException("Unsupported IPublishedContentCache, only the Xml one is supported.");
return cache.GetXml(umbracoContext, umbracoContext.InPreviewMode);
}
It would seem Umbraco's published document cache is the root cause.
Also, when unpublishing a node the AfterClearDocumentCache does not fire for the children of an unpublished node, leaving children in the index.
Currently we clear out the documents to rebuild, but that is unnecessary. Here are the Lucene docs:
The create argument to the constructor determines whether a new index is created, or whether an existing index is opened. Note that you can open an index with create=true even while readers are using the index. The old readers will continue to search the "point in time" snapshot they had opened, and won't see the newly created index until they re-open. There are also constructors with no create argument which will create a new index if there is not already an index at the provided path and otherwise open the existing index.
https://lucene.apache.org/core/2_9_4/api/all/org/apache/lucene/index/IndexWriter.html
Hi,
When using GermanAnalyzer for the search provider, the LuceneBooleanOperation Compile() method generates a query that looks like this:
{ SearchIndexType: content, LuceneQuery: +(+(contents:searchedword*)) +__IndexType:con }
On the other hand, the StandardAnalyzer, which is used for English as the search provider's analyzer, generates the following query:
{ SearchIndexType: content, LuceneQuery: +(+(contents:searchedword*)) +__IndexType:content }
After further investigation, it seems that the __IndexType field is being tokenized and stemmed by the GermanAnalyzer, so the word content becomes con.
With the search condition __IndexType:con, Lucene returns 0 results, because __IndexType only ever contains content or media.
I'm not sure how to fix it, as the project is complex.
After a brief investigation I found that the following line is missing a fourth parameter that would prevent this field from being analyzed:
this.search.FieldInternal( LuceneExamineIndexer.IndexTypeFieldName, new ExamineValue(Examineness.Explicit, this.search.SearchIndexType.ToString().ToLower()), BooleanClause.Occur.MUST);
I have a workaround for this now, but it is hacky.
When I have some spare time I will try to create a pull request.
I'm using Examine in an MVC web application and have had issues with deletions not being committed to the Lucene index. My project has been set up with a DataService that implements ISimpleDataService and quite happily indexes the data in my Entity Framework database.
However, I have discovered that when issuing a delete operation through the ExamineManager class, the node passed to it is not deleted immediately from the index. The node is eventually deleted when a new or existing node is indexed, but until then it remains in the search index and appears in search results. This causes a 404 within my application, because the corresponding database record has already been removed while the index entry lingers in the Lucene index.
I decided to have a peek at the Examine source to try to understand what is going on, and in Examine/LuceneEngine/Providers/LuceneIndexer.cs I believe I have found an erroneous function call that is causing this issue.
On line 1521 of that file is the following method.
[SecuritySafeCritical]
private void ProcessQueueItem(IndexOperation item, ICollection<IndexedNode> indexedNodes, IndexWriter writer)
{
    switch (item.Operation)
    {
        case IndexOperationType.Add:
            if (ValidateDocument(item.Item.DataToIndex))
            {
                //var added = ProcessIndexQueueItem(item, inMemoryWriter);
                var added = ProcessIndexQueueItem(item, writer);
                indexedNodes.Add(added);
            }
            else
            {
                //do the delete but no commit - it may or may not exist in the index but since it is not
                // valid it should definitely not be there.
                ProcessDeleteQueueItem(item, writer, false);
                OnIgnoringNode(new IndexingNodeDataEventArgs(item.Item.DataToIndex, int.Parse(item.Item.Id), null, item.Item.IndexType));
            }
            break;
        case IndexOperationType.Delete:
            ProcessDeleteQueueItem(item, writer, false);
            break;
        default:
            throw new ArgumentOutOfRangeException();
    }
}
For the IndexOperationType.Delete case, I believe the final parameter of the ProcessDeleteQueueItem should be set to true, as it is a flag as to whether to commit the change or not.
As it stands I believe that the current action is not being committed until a subsequent re-index action is processed and commits any outstanding actions to the index because the delete operation is not committing its own change to the index.
I've not managed to test this out yet, but was wondering if you could confirm my suspicions or not.
Kind regards,
Tim
I was wondering whether there is a way to add raw Lucene sub-queries.
If not, would it be very difficult to implement a method like RawQuery(string rawQuery)? That would help to create complex queries using the Fluent API.
Example:
var query = searchCriteria
.Fields("nodeName","hello")
.And().RawQuery(" +(metaTitle:hello metaDescription:goodbye)")
.Compile();
Currently the DataService.GetAllData puts the result into memory and then calls AddNodesToIndex with the memory blob, but we can just iterate over the enumerable and iteratively call AddNodesToIndex. Then we are not doubling up on memory usage.
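The idea can be sketched with a small lazy batching helper. LazyBatching.Batch is a hypothetical name and is not part of Examine; the sketch only shows how a lazily evaluated enumerable can be consumed in fixed-size chunks, so that at most one chunk is held in memory at a time. Each yielded batch would then be passed to something like AddNodesToIndex.

```csharp
using System;
using System.Collections.Generic;

public static class LazyBatching
{
    // Hypothetical helper: pulls items from the source one at a time and
    // yields them in lists of at most 'size' items. The source is never
    // fully materialized; only the current batch is kept in memory.
    public static IEnumerable<List<T>> Batch<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0)
        {
            throw new ArgumentOutOfRangeException("size");
        }

        var batch = new List<T>(size);
        foreach (T item in source)
        {
            batch.Add(item);
            if (batch.Count == size)
            {
                yield return batch;
                batch = new List<T>(size); // start a fresh list for the next chunk
            }
        }

        // Emit the final, possibly smaller, batch.
        if (batch.Count > 0)
        {
            yield return batch;
        }
    }
}
```

Because this is an iterator method, nothing is read from the source (and the size check does not run) until the caller starts enumerating.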
YSOD:
Server Error in '/' Application.
The collection has been disposed.
Object name: 'BlockingCollection'.
Description: An unhandled exception occurred during the execution of the current web request. Please review the stack trace for more information about the error and where it originated in the code.
Exception Details: System.ObjectDisposedException: The collection has been disposed.
Object name: 'BlockingCollection'.
Source Error:
Line 1615: }
Line 1616: else
Line 1617: {
Line 1618: OnIndexingError(
Line 1619: new IndexingErrorEventArgs(
Source File: x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneIndexer.cs Line: 1617
Stack Trace:
[ObjectDisposedException: The collection has been disposed.
Object name: 'BlockingCollection'.]
System.Collections.Concurrent.BlockingCollection`1.CheckDisposed() +2116463
System.Collections.Concurrent.BlockingCollection`1.TryAddWithNoTimeValidation(T item, Int32 millisecondsTimeout, CancellationToken cancellationToken) +52
Examine.LuceneEngine.Providers.LuceneIndexer.EnqueueIndexOperation(IndexOperation op) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneIndexer.cs:1617
Examine.LuceneEngine.Providers.LuceneIndexer.IndexAll(String type) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneIndexer.cs:829
UmbracoExamine.BaseUmbracoIndexer.IndexAll(String type) in X:\Projects\Umbraco\Umbraco_7.4\src\UmbracoExamine\BaseUmbracoIndexer.cs:279
UmbracoExamine.BaseUmbracoIndexer.PerformIndexRebuild() in X:\Projects\Umbraco\Umbraco_7.4\src\UmbracoExamine\BaseUmbracoIndexer.cs:353
UmbracoExamine.BaseUmbracoIndexer.RebuildIndex() in X:\Projects\Umbraco\Umbraco_7.4\src\UmbracoExamine\BaseUmbracoIndexer.cs:265
UmbracoExamine.UmbracoContentIndexer.RebuildIndex() in X:\Projects\Umbraco\Umbraco_7.4\src\UmbracoExamine\UmbracoContentIndexer.cs:483
Overflow.Controllers.TestController.Index(RenderModel render) in x:\Projects\Umbraco\Umbraco_7.4\src\Umbraco.Web.UI\App_Code\UmbContactController.cs:117
lambda_method(Closure , ControllerBase , Object[] ) +139
System.Web.Mvc.ReflectedActionDescriptor.Execute(ControllerContext controllerContext, IDictionary`2 parameters) +229
System.Web.Mvc.ControllerActionInvoker.InvokeActionMethod(ControllerContext controllerContext, ActionDescriptor actionDescriptor, IDictionary`2 parameters) +35
System.Web.Mvc.Async.AsyncControllerActionInvoker.<BeginInvokeSynchronousActionMethod>b__39(IAsyncResult asyncResult, ActionInvocation innerInvokeState) +39
System.Web.Mvc.Async.WrappedAsyncResult`2.CallEndDelegate(IAsyncResult asyncResult) +71
System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeActionMethod(IAsyncResult asyncResult) +42
System.Web.Mvc.Async.AsyncInvocationWithFilters.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3d() +72
System.Web.Mvc.Async.<>c__DisplayClass46.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3f() +386
System.Web.Mvc.Async.<>c__DisplayClass46.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3f() +386
System.Web.Mvc.Async.<>c__DisplayClass46.<InvokeActionMethodFilterAsynchronouslyRecursive>b__3f() +386
System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeActionMethodWithFilters(IAsyncResult asyncResult) +42
System.Web.Mvc.Async.<>c__DisplayClass2b.<BeginInvokeAction>b__1c() +38
System.Web.Mvc.Async.<>c__DisplayClass21.<BeginInvokeAction>b__1e(IAsyncResult asyncResult) +186
System.Web.Mvc.Async.AsyncControllerActionInvoker.EndInvokeAction(IAsyncResult asyncResult) +38
System.Web.Mvc.Controller.<BeginExecuteCore>b__1d(IAsyncResult asyncResult, ExecuteCoreState innerState) +29
System.Web.Mvc.Async.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult) +67
System.Web.Mvc.Controller.EndExecuteCore(IAsyncResult asyncResult) +53
System.Web.Mvc.Async.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult) +36
System.Web.Mvc.Controller.EndExecute(IAsyncResult asyncResult) +38
System.Web.Mvc.MvcHandler.<BeginProcessRequest>b__5(IAsyncResult asyncResult, ProcessRequestState innerState) +44
System.Web.Mvc.Async.WrappedAsyncVoid`1.CallEndDelegate(IAsyncResult asyncResult) +67
System.Web.Mvc.MvcHandler.EndProcessRequest(IAsyncResult asyncResult) +38
System.Web.CallHandlerExecutionStep.System.Web.HttpApplication.IExecutionStep.Execute() +399
System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously) +137
This can be replicated by doing this in a GET request (not that you should ever do this):
var doc = Services.ContentService.GetById(CurrentPage.Id);
var xml = doc.ToXml();
//add an icon attribute to get indexed
xml.Add(new XAttribute("icon", doc.ContentType.Icon));
ApplicationContext.RestartApplicationPool(HttpContext);
ExamineManager.Instance.IndexProviderCollection["InternalIndexer"].RebuildIndex();
ExamineManager.Instance.IndexProviderCollection["InternalIndexer"].ReIndexNode(xml, IndexTypes.Content);
Examine v1 supports NRT (near-real-time) search through the ctor overloads, but not through the standard Examine config/provider model. We could achieve this by creating an IndexWriterTracker, similar to the DirectoryTracker that we already use, so that there is only one IndexWriter per Directory, which would make NRT the default.
This has some consequences, though: many of the GetIndexWriter and related methods are virtual and are overridden in Umbraco, so it would be hard to force the use of NRT. Perhaps we can support both, and libraries that want NRT will need to adjust their overrides.
Currently it's very strongly tied to the config file. We need to abstract this out somehow, or at least make it replaceable, so that we can configure it outside of a web app.
The searcher and the indexer could simultaneously attempt to initialize an index in a Lucene directory if it doesn't exist. I'm not sure if this has ever happened, but it is possible in theory because there is legacy code in the searcher that ensures an index exists at its location, creating one if it doesn't, even though this is the responsibility of the indexer.
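The "one writer per directory" idea could look roughly like the sketch below. It is only an illustration: WriterTracker, GetWriter and the factory delegate are hypothetical names, the writer type is left generic so the snippet compiles without Lucene.Net, and it is not how Examine's DirectoryTracker is actually implemented.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch: hands out exactly one writer instance per directory path.
// TWriter stands in for Lucene.Net's IndexWriter so the example is self-contained.
public sealed class WriterTracker<TWriter> where TWriter : class
{
    // Lazy<T> (default thread-safety mode) guarantees the factory runs once per key,
    // even if several threads call GetOrAdd for the same path at the same time.
    private readonly ConcurrentDictionary<string, Lazy<TWriter>> _writers =
        new ConcurrentDictionary<string, Lazy<TWriter>>(StringComparer.OrdinalIgnoreCase);

    public TWriter GetWriter(string directoryPath, Func<string, TWriter> factory)
    {
        if (directoryPath == null) throw new ArgumentNullException(nameof(directoryPath));
        if (factory == null) throw new ArgumentNullException(nameof(factory));

        return _writers
            .GetOrAdd(directoryPath, path => new Lazy<TWriter>(() => factory(path)))
            .Value;
    }
}
```

A provider that wants NRT would ask the tracker for its writer instead of constructing one, so every indexer and searcher pointing at the same directory shares a single instance. Overrides of GetIndexWriter that create their own writer would bypass this, which is the compatibility concern raised above.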
After running some tests with high concurrency and adding multiple app restarts into the mix, we end up with error logs such as:
2015-03-30 13:25:30,307 [49] ERROR UmbracoExamine.DataServices.UmbracoLogService - [Thread 71] Provider=InternalIndexer, NodeId=-1
System.Exception: IndexSet: InternalIndexSet, Lucene.Net.Store.AlreadyClosedException: this IndexWriter is closed
at Lucene.Net.Index.IndexWriter.EnsureOpen(Boolean includePendingClose)
at Lucene.Net.Index.IndexWriter.EnsureOpen()
at Lucene.Net.Index.IndexWriter.Commit(IDictionary`2 commitUserData)
at Examine.LuceneEngine.Providers.LuceneIndexer.ForceProcessQueueItems(Boolean block) in x:\Projects\Examine\Examine\Projects\Examine\LuceneEngine\Providers\LuceneIndexer.cs:line 1456
This is due to the way that the index writer is closed in some cases during disposal. We block the shutdown thread until the writing has finished and then commit, but in some cases another thread might still be trying to commit the last batch, and then we close too early. To fix this, we simply track the number of active entries in ForceProcessQueueItems and, during disposal, wait until this is zero.
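A minimal sketch of that counting approach is shown below. The class and member names (QueueProcessor, ProcessQueueItems, WriterClosed) are hypothetical stand-ins, the "writer" is reduced to a boolean flag, and the real change in LuceneIndexer may differ in detail.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the indexer; not Examine's actual code.
public sealed class QueueProcessor : IDisposable
{
    private int _activeEntries; // threads currently inside ProcessQueueItems
    private int _disposing;     // 0 = running, 1 = disposal has started
    private int _writerClosed;  // stands in for the IndexWriter having been closed

    public bool WriterClosed => Volatile.Read(ref _writerClosed) == 1;

    // Returns false if disposal has already begun and the commit was skipped.
    public bool ProcessQueueItems(Action commit)
    {
        // Register as active BEFORE checking the flag, so Dispose either sees
        // this entry in the counter or this thread sees the disposing flag.
        Interlocked.Increment(ref _activeEntries);
        try
        {
            if (Volatile.Read(ref _disposing) == 1) return false;
            commit();
            return true;
        }
        finally
        {
            Interlocked.Decrement(ref _activeEntries);
        }
    }

    public void Dispose()
    {
        // Interlocked.Exchange acts as a full fence, so the flag is published
        // before the counter is read in the wait below.
        if (Interlocked.Exchange(ref _disposing, 1) == 1) return;

        // Block the shutdown thread until no commit is in flight.
        SpinWait.SpinUntil(() => Volatile.Read(ref _activeEntries) == 0);

        // Only now is it safe to close the writer.
        Volatile.Write(ref _writerClosed, 1);
    }
}
```

The ordering matters: a thread that enters after disposal has started backs out without committing, and a thread that was already committing keeps the counter above zero until it finishes, so the writer is never closed underneath it. A production version would probably add a timeout to the wait so that a stuck commit cannot hang shutdown indefinitely.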