chriseldredge / lucene.net.linq
LINQ provider to run native queries on a Lucene.Net index
License: Other
When using the fluent mappings, I am getting no results back when searching on a string containing at least one upper-case letter. It would appear that even though the ClassMap is properly setting CaseInsensitiveKeywordAnalyzer, this isn't being copied into the PerFieldAnalyzer properly.
Failing unit test: https://gist.github.com/mj1856/9147377
Are array properties supported? It seems not.
Lucene supports multiple field instances with the same name for a single doc. Is this feature supported in Lucene.Net.Linq?
The Context class in Context.cs depends on a local SearcherClientTracker object (instance name tracker) which is itself disposable; however, Context does not expose any way for a client app to dispose of it. This results in hemorrhaging memory when a LuceneDataContext is instantiated for each request (the identifying case was in a WCF service).
I've coded a fairly dirty and ineffective workaround by making Context disposable, disposing of its tracker instance in Context.Dispose, and disposing of the Context instance in the LuceneDataContext.Dispose method. This resolves the memory leak but causes an unfortunate decline in scalability (i.e., performance suffers badly at high volume). The scalability issue can be worked around by forcing my WCF service to act as a singleton (safe because the library is thread-safe), but that is far from ideal.
The second issue might be addressed by reviewing overall library design to move heavy state to lower/lesser used abstractions in the stack, but such changes are fairly invasive and might be hard to swallow. Wondering if you have any ideas on how to address it otherwise.
Let me know if you would like to see a repro or pull request for more information.
Thanks.
I have an index with 5 million records of one model. When I read the index, it loads all the data and takes 4 GB of memory. You can see my code below. Is there any way to optimize reading performance?
ReadOnlyLuceneDataProvider provider = new ReadOnlyLuceneDataProvider(FSDirectory.Open(Configuration.GetConfig(ConfigsKey.RequestUrlPath)), Lucene.Net.Util.Version.LUCENE_30);
var email = provider.AsQueryable<Pcapnet.Model.RequestUrl>();
Hi. I have the document with Id ([NumericField(Key = true)]).
For example I have 10 documents. If I insert document, I have 11 documents in session.
But if I update document, I will have 1 document in session.
using (var session = LuceneDataProvider.OpenSession())
{
var doc = session.Query().First(x => x.Id == id);
doc.Relevance++;
session.Add(doc);
}
I use Lucene.Net 3.0.3 and Lucene.Net.Linq 3.1.46.0
Am I doing something wrong?
I'm using Lucene.Net.Linq v3.1.48
I make the LuceneDataProvider a singleton. I checked the index folder and found that all index files are in the same folder. Now if I index many Majors and Universities, and query it like this:
// The returned majors contain both Majors and Universities
// (the CLR type is Major, but Universities are returned as Majors);
// I just want it to return Majors, not Universities
var majors = provider.OpenSession<Major>().Query();
var universities = provider.OpenSession<University>().Query();
for example, the first line above will return these data:
Major { Code = "0001", Name = "Major 1", MajorSpecificField1 = "major 1" } // correct
Major { Code = "0002", Name = "Major 2", MajorSpecificField1 = "major 2" } // correct
Major { Code = "0011", Name = "University 1", MajorSpecificField1 = null } // incorrect
NOTE: The Majors and Universities are indexed using the same LuceneDataProvider, and both Major and University have "Name" and "Code" properties. I think this might be the cause?
public class Major {
[Field(Key = true)]
public string Code { get; set; }
public string Name { get; set; }
// other properties
}
public class University {
[Field(Key = true)]
public string Code { get; set; }
public string Name { get; set; }
// other properties
}
Check this gist: https://gist.github.com/mouhong/5924744#file-program-cs
Any ideas? Thanks
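A sketch of a possible workaround, assuming the provider has no built-in per-type discrimination: give each class a constant discriminator key so each type filters to its own documents. The "EntityType" field name and this whole approach are assumptions, not a documented Lucene.Net.Linq feature.

```csharp
// Hypothetical sketch only: "EntityType" is an invented field. Verify against
// the library's actual key/filter behavior before relying on this.
public class Major
{
    [Field(Key = true)]
    public string Code { get; set; }

    // A constant per-type key value, intended so a Major document can never be
    // materialized as a University (and vice versa).
    [Field(Key = true)]
    public string EntityType
    {
        get { return "Major"; }
        set { /* ignored; always "Major" */ }
    }

    public string Name { get; set; }
}
```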
This should be a fairly common issue:
I have a date field that I'm storing in the index using the Lucene.Net.Document.DateTools methods. This requires me to place a Converter in the mapping of the field. Unfortunately, this causes Lucene.Net.Linq to use a custom comparator for sorting on that field, and it's excruciatingly slow (about 5 s to sort a few thousand records). Most data in a Lucene index is (or should be) stored in a way that sorts naturally when sorting is anticipated on that field, so there should be a way to disable the custom comparator; unfortunately, there is no way to do so in the presence of a converter in the mapping.
I've dug through the classes that ultimately trigger this behaviour, between ReflectionDocumentMapper and ReflectionFieldMapper, and there are a whole lot of static private methods that would need to be re-implemented if I wanted to map a single field with a custom subclass of ReflectionFieldMapper.
I've looked at trying to do something with ClassMap, but I can't subclass PropertyMap because all of the constructors are internal...
In all, I'm stumped. Any clues where I could work around this?
If you think a new extension point could be made to remedy this, do you think it should be specific to the sorting behaviour (for example, a UseNaturalSorting property of FieldAttribute) or a lower-level extension point to be able to provide your own mapper?
Incidentally, this happens even if you just apply a Format option instead of a Converter (since I figure it builds a converter from the format internally).
An example of the final Lucene query it generates looks like:
+(-validstate:999 :) +(+program:221) sort by <custom:"date": Lucene.Net.Linq.Search.NonGenericConvertableFieldComparatorSource>!.Take(10)
My organization is playing with the idea of utilizing this great library to implement a NoSQL-ish data store based on Lucene.NET to improve accessibility from our .NET products.
In my POC testing I've noticed that when multiple processes attempt to open a single index to search, it results in a LockObtainFailedException when used with (at least) FSDirectory.
This exception is the result of blindly attempting to open an IndexWriter when instantiating the LuceneDataProvider class (LuceneDataProvider.cs:90) after another application (or possibly even the same one) already has an open LuceneDataProvider instance.
I've implemented a hack to workaround this limitation for the time being, but wanted to notify you of the issue before losing track of it. I'll describe the hack here for anyone else that encounters it, but am happy to submit a pull request if you feel it is appropriate.
Thanks.
LuceneDataProvider:107:
get
{
+ if (Lucene.Net.Index.IndexWriter.IsLocked(directory))
+ return null;
lock (sync)
LuceneDataProvider:201:
if (writerIsExternal) return;
+ if (IndexWriter != null)
IndexWriter.Dispose();
}
You have two test methods which fail on systems with a German culture:
FieldMappingInfoBuilderNumericFieldTests.CopyFromDocument
FieldMappingInfoBuilderNumericFieldTests.CopyToDocument
String lengths are both 4. Strings differ at index 1.
Expected: "1.34"
But was: "1,34"
------------^
I suppose you should somehow inject an invariant CultureInfo instance (or do so via attribution) into your reflection-based mapping.
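A minimal sketch of the suggested fix, assuming the mapper currently formats numbers with the ambient culture: pass CultureInfo.InvariantCulture explicitly when converting field values to and from strings.

```csharp
using System;
using System.Globalization;

class InvariantFormatDemo
{
    static void Main()
    {
        double value = 1.34;
        // On a de-DE system, value.ToString() yields "1,34"; the invariant
        // culture always yields "1.34", so stored terms are portable.
        string stored = value.ToString(CultureInfo.InvariantCulture);
        double roundTripped = double.Parse(stored, CultureInfo.InvariantCulture);
        Console.WriteLine(stored);                // 1.34
        Console.WriteLine(roundTripped == value); // True
    }
}
```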
Here is a console application I made to try Lucene.Net.Linq.
When I run this code I get this result:
Id : 10, Name : Product 10, Category : test 10
Id : 1, Name : Product 1, Category : test 1
Id : 10, Name : Product 10, Category : test 10
Id : 1, Name : Product 1, Category : test 1
Why do I get duplicate rows?
The code :
class Program
{
const string IndexPath = @"C:\temp\Lucense\products";
static public IEnumerable<Product> GetProducts()
{
var result = new List<Product>();
for (int i = 0; i < 10000; i++)
{
result.Add(new Product() { Id = i, Category = "test " + i, Name = "Product " + i });
}
return result;
}
static void Main(string[] args)
{
var directory = FSDirectory.Open(IndexPath);
var provider = new LuceneDataProvider(directory, Lucene.Net.Util.Version.LUCENE_30);
using (var session = provider.OpenSession<Product>())
{
//GetProducts().ToList().ForEach(i => session.Add(i));
var query = session.Query().Where(i => i.Name.Contains("Product 1")).Where(i => i.Id < 100).Where(i => i.Category.Contains("test"));
query.ToList().ForEach(i => Console.WriteLine(i));
}
Console.WriteLine("Done !");
Console.ReadLine();
}
}
A common use case is to execute a query and display total hits while only retrieving the first page of results. Currently this requires executing a query twice: once to count all results and again to retrieve the first N items.
Provide a LINQ extension that allows clients to register a callback or output object that contains metadata such as total hits, top score and other metadata.
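A hypothetical shape for such an extension. CaptureStatistics and SearchStatistics are invented names illustrating the request, not an existing Lucene.Net.Linq API.

```csharp
// Invented API sketch: an output object the query would populate during
// execution, so one search both pages results and reports totals.
public class SearchStatistics
{
    public int TotalHits { get; set; }
    public float TopScore { get; set; }
}

// Imagined usage:
//
//   var stats = new SearchStatistics();
//   var page = documents.CaptureStatistics(stats).Take(10).ToList();
//   Console.WriteLine("{0} total hits", stats.TotalHits);
```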
I have a simple class like:
public class Data
{
[NumericField(Key = true)]
public int Id { get; set; }
[Field]
public string Name { get; set; }
}
Documents of this type are never selected from the catalog because the executionContext.Filter passed to the searcher in LuceneQueryExecutor.ExecuteCollection contains a filter (+Id:*) that filters out all results.
Changing it to an ordinary Field fixes the issue.
Thanks for your help so far. I've gotten stuck trying to remove/update items.
My PieReflection mapper sets a constant DocumentKey, so I'm thinking you remove the item, then add it again? It doesn't seem to be taking things out of the index... I suspect I am doing it wrong, though...
private void UpdateDocumentInIndex<T>(PieDocument document) where T : new()
{
using (var session = _provider.OpenSession<T>(new PieReflectionDocumentMapper<T>((Version)Version.LUCENE_30, this)))
{
session.Delete((T) document.Data);
}
using (var session = _provider.OpenSession<T>(new PieReflectionDocumentMapper<T>((Version)Version.LUCENE_30, this)))
{
session.Add((T)document.Data);
}
}
I've been trying to run queries like var users = q.Where(u => u.Name == "Firstname Lastname"); but am getting incorrect results.
I can replicate the issue by adding another test to CustomWhereTests.cs:
[Test]
public void WhereUsingExpression()
{
AddDocument(new SampleDocument { Name = "Documents Bill", Id = "X.Y.1.2" });
AddDocument(new SampleDocument { Name = "Bills Document", Id = "X.Z.1.3" });
var documents = provider.AsQueryable<SampleDocument>();
var result = documents.Where(d => d.Name == "Bills Document");
//Fails, no results found
Assert.That(result.Single().Name, Is.EqualTo("Bills Document"));
}
I think this is because the query term ("Firstname Lastname") gets split into two parts by the analyzer. Apparently a TermQuery is the correct thing to generate. Is there some way to hint my model to do this? Maybe a QueryType.ExactMatch is required?
I'm also seeing a second issue: I get extra results when querying with q.Where(u => u.Name.StartsWith("Oli") || u.Name.Contains("Oli")). I get results coming back with names that don't have "Oli" in them, but if I search for "Oliv" it returns the correct results.
Is there something I am missing? thanks again in advance.
I have created a fluent map:
ClassMap<Record> map = new ClassMap<Record>(Lucene.Net.Util.Version.LUCENE_30);
map.Property(…);
…
LuceneDataProvider provider = new …;
Then provider.AsQueryable<Record>() automatically uses my map, but provider.OpenSession<Record>() does not. I need to explicitly specify the document mapper: provider.OpenSession<Record>(map.ToDocumentMapper()).
What am I doing or understanding wrong?
It would appear I can only map a collection property as a field if it is declared as IEnumerable<T>. It should really support any type that implements IEnumerable<T>, including lists, arrays, IList<T>, etc.
It appears that using a catch-all filter in LuceneQueryExecutorBase.ExecuteCollection(136) causes 100-400 ms of unnecessary overhead in IndexSearcher.Search when the context is read-only and working with > 50K records per index. This is caused by the QueryExecutionContext.Filter being set to <KeyField>:*, filtering for ALL records when not necessary.
I can't tell whether this was a necessary adjustment to get your Nuget implementation going or not, but based on the documentation, this filter is only necessary when Updates/Deletes are possible. This should never be the case when in a read-only context so the performance degradation shouldn't apply.
As a dirty workaround, I've commented out QueryModelTranslator.cs(99) so the Filter property is never set, but this might break whatever use case this was initially intended to address (which likely doesn't apply to my environment). I couldn't find a good way in your code to turn this off only for searches (though I didn't try that hard either).
If this is necessary for your purposes then I can fork the project or help you work through the test scenario. Otherwise, let me know if you need a repro or pull request for further detail.
Thanks.
I looked at the source, unit tests, and limited docs, but couldn't find an API for this. Is there one?
Hi,
How do you ignore properties when using fluent mapping?
I am unable to use the attributes because I am using EF6; before considering DTO classes I wanted to try this feature, but I am having problems.
Thanks
A FieldAttribute does not have a bool property Store, but rather a StoreMode property Store, so:
Store = true
should be
Store = StoreMode.Yes
Also, there is no PorterStemAnalyzer class in the bundle, but there are KeywordAnalyzer and StandardAnalyzer classes.
And you should include the implementation of the VersionConverter class!
Many times I need to get the native Query from a LuceneQueryable because I need to do something not yet provided by Lucene.Net.Linq (e.g., faceted search). So I think it might be better to add a method/property (e.g., GetNativeQuery()) to LuceneQueryable to return the underlying native Query, and to have session.Query() return LuceneQueryable rather than IQueryable, so that I can access the Query like this:
var nativeQuery = session.Query().Where(...).GetNativeQuery();
What do you think?
It would be great if the FieldAttribute and the FieldMapper supported setting a TermVector and a Boost value. Right now we use these settings on some fields, and they don't seem to be supported by the POCO <-> document mapping.
In our document we have the command:
doc.Add(new Field("MediumName", model.MediumName, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES) {Boost = 2f});
I'm trying to sort on a DateTime property, but am getting an error in:
public class DateTimeConverter : TypeConverter
{
...
public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
{
return DateTime.SpecifyKind(DateTime.ParseExact((string) value, format, null), DateTimeKind.Utc); //exception here. value="2013-04-30t07:12:47" but format = "yyyy-MM-ddTHH:mm:ss"
}
}
I notice that the format has a capital T, but the value being parsed has a lower case t.
I will try to put together a test for this.
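One possible client-side workaround (an assumption, not the library's fix) is to parse with a format string whose quoted literal separator matches the lower-case 't' actually stored:

```csharp
using System;
using System.Globalization;

class DateParseDemo
{
    static void Main()
    {
        // The stored value uses a lower-case 't' separator, while the mapper's
        // format string expects an upper-case 'T'. Quoting the literal in the
        // custom format lets ParseExact accept the stored form as-is.
        var value = "2013-04-30t07:12:47";
        var parsed = DateTime.SpecifyKind(
            DateTime.ParseExact(value, "yyyy-MM-dd't'HH:mm:ss", CultureInfo.InvariantCulture),
            DateTimeKind.Utc);
        Console.WriteLine(parsed.Kind);          // Utc
        Console.WriteLine(parsed.ToString("s")); // 2013-04-30T07:12:47
    }
}
```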
[Field(IndexMode.NotAnalyzed)]
public bool Flag { get; set; }
var query = session.Query().Where(x => x.Flag == true); // Always returns nothing
Expected: return results that Flag is true
Actual: return nothing
I checked the source code and found that:
In the field mapper, BooleanConverter converts the value true to the string "True".
In the LINQ provider part, Where(x => x.Flag == true) is translated to the Lucene query Flag:true. Note that it's "true", not "True".
So the query returns an empty result.
Because the boolean field is "Analyzed" by default, this problem only happens when the field is manually set to IndexMode.NotAnalyzed.
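A possible workaround sketch (an assumption, untested against the library): register a converter that emits booleans in lower case, so the un-analyzed stored term matches what the query translator generates.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;

// Hypothetical converter: behaves like BooleanConverter but produces
// "true"/"false" instead of "True"/"False" when converting to string.
public class LowerCaseBooleanConverter : BooleanConverter
{
    public override object ConvertTo(ITypeDescriptorContext context,
        CultureInfo culture, object value, Type destinationType)
    {
        var result = base.ConvertTo(context, culture, value, destinationType);
        return destinationType == typeof(string)
            ? ((string)result).ToLowerInvariant()
            : result;
    }
}
```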
How can I set LuceneQueryPredicateExpression.AllowSpecialCharacters to true?
Chris, is it possible to point this package at a Solr location, passing along the native Lucene queries it builds from OData?
Thanks.
Edit: This WAS titled: OData: Combining substringof('value',Field) or other 'function' operator with any other clause causes error
I don't know if this is actually something that can be fixed or not, though the documentation of HandleNullPropagationOptions suggests that null propagation can be determined by the query provider. The following article indicates that LINQ actually determines this by a hard-coded check on the assembly name of the provider: http://blogs.msdn.com/b/alexj/archive/2012/08/21/web-api-queryable-current-support-and-tentative-roadmap.aspx
So, maybe the best we can do is document this in a prominent place, or else improve the unmangling of the Linq AST to handle the 'function' style operators better in the presence of the null checks.
I'm not really sure where to start tracking this one down, but here's my setup:
I have a WebApi method that takes an ODataQueryOptions object as its parameter, and a DTO object that I'm mapping to my Lucene index. I'm using ODataQueryOptions.Filter.ApplyTo on my Lucene IQueryable. For the most part, this all works wonderfully. I have a problem that is a blocker, though: if I have a $filter that combines a 'substringof' function (or any other function) with any other clause, I get an exception. For example:
(substringof('jeff',Field1) and Field2 eq 'bob')
Also tested and not working in combination with other clauses: startswith and endswith.
Combining two or more 'eq' clauses together is fine.
Combining 'eq' and 'ne' (not equal) clauses is fine.
The error is:
Expected Left or Right to be LuceneQueryFieldExpression
System.NotSupportedException
at Lucene.Net.Linq.Transformation.TreeVisitors.BinaryToQueryExpressionTreeVisitor.VisitBinaryExpression(BinaryExpression expression)
at Remotion.Linq.Parsing.ExpressionTreeVisitor.VisitExpression(Expression expression)
at Remotion.Linq.Clauses.WhereClause.TransformExpressions(Func`2 transformation)
at Lucene.Net.Linq.Transformation.QueryModelTransformer.VisitWhereClause(WhereClause whereClause, QueryModel queryModel, Int32 index)
at Remotion.Linq.Clauses.WhereClause.Accept(IQueryModelVisitor visitor, QueryModel queryModel, Int32 index)
at Remotion.Linq.QueryModelVisitorBase.VisitBodyClauses(ObservableCollection`1 bodyClauses, QueryModel queryModel)
at Remotion.Linq.QueryModelVisitorBase.VisitQueryModel(QueryModel queryModel)
at Remotion.Linq.QueryModel.Accept(IQueryModelVisitor visitor)
at Lucene.Net.Linq.LuceneQueryExecutorBase`1.PrepareQuery(QueryModel queryModel)
at Lucene.Net.Linq.LuceneQueryExecutorBase`1.ExecuteScalar[T](QueryModel queryModel)
at Remotion.Linq.Clauses.StreamedData.StreamedScalarValueInfo.ExecuteScalarQueryModel[T](QueryModel queryModel, IQueryExecutor executor)
at Remotion.Linq.Clauses.StreamedData.StreamedScalarValueInfo.ExecuteQueryModel(QueryModel queryModel, IQueryExecutor executor)
at Remotion.Linq.QueryModel.Execute(IQueryExecutor executor)
at Remotion.Linq.QueryProviderBase.Execute(Expression expression)
at Remotion.Linq.QueryProviderBase.System.Linq.IQueryProvider.Execute[TResult](Expression expression)
at System.Linq.Queryable.Count[TSource](IQueryable`1 source)
at (My controller class)
I suspect this has something to do with how WebApi's OData support turns the OData filter clause into a LINQ query, because writing
queryable.Where(x => x.Field1.Contains("jeff") && x.Field2 == "bob")
'by hand' in LINQ works just fine.
I suspect this information is probably not enough to start working on this problem, but please let me know how to gather more information about this issue. I've been working with Lucene.Net.Linq for several weeks already and for some reason have never tried this particular use case which will be very common for our users, so I'm kinda painted into a corner right now.
Is there any way to query against other Lucene.Net features like SpellChecker, MoreLikeThis, or DocFreq?
It may not be an issue but a misunderstanding: I consider the LuceneDataProvider a long-lived object, while sessions are short-lived. Now, if a session fails to commit, it calls Close on the parent data provider's writer, making it unusable from then on.
Is it expected that one creates a LuceneDataProvider per session?
Trying a delete/insert for key value 5501, I got an exception in QueryParser:
Lucene.Net.QueryParsers.ParseException occurred
HResult=-2146233088
Message=Cannot parse '` *}': Lexical error at line 1, column 7. Encountered: after : ""
Source=Lucene.Net
StackTrace:
at Lucene.Net.QueryParsers.QueryParser.Parse(String query) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 239
InnerException: Lucene.Net.QueryParsers.TokenMgrError
HResult=-2146232832
Message=Lexical error at line 1, column 7. Encountered: after : ""
Source=Lucene.Net
StackTrace:
at Lucene.Net.QueryParsers.QueryParserTokenManager.GetNextToken() in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParserTokenManager.cs:line 1429
at Lucene.Net.QueryParsers.QueryParser.Jj_ntk() in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 1929
at Lucene.Net.QueryParsers.QueryParser.Term(String field) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 1461
at Lucene.Net.QueryParsers.QueryParser.Clause(String field) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 1383
at Lucene.Net.QueryParsers.QueryParser.Query(String field) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 1301
at Lucene.Net.QueryParsers.QueryParser.TopLevelQuery(String field) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 1287
at Lucene.Net.QueryParsers.QueryParser.Parse(String query) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 223
InnerException:
Whenever I call .Count() on a LuceneQueryable I get the following exception:
[InvalidOperationException: The operands for operator 'Equal' do not match the parameters of method 'op_Equality'.]
System.Linq.Expressions.Expression.GetMethodBasedBinaryOperator(ExpressionType binaryType, Expression left, Expression right, MethodInfo method, Boolean liftToNull) +4339023
System.Linq.Expressions.Expression.Equal(Expression left, Expression right, Boolean liftToNull, MethodInfo method) +6153784
System.Linq.Expressions.Expression.MakeBinary(ExpressionType binaryType, Expression left, Expression right, Boolean liftToNull, MethodInfo method, LambdaExpression conversion) +94
Remotion.Linq.Parsing.ExpressionTreeVisitor.VisitBinaryExpression(BinaryExpression expression) +261
Remotion.Linq.Clauses.WhereClause.TransformExpressions(Func`2 transformation) +41
Lucene.Net.Linq.Transformation.QueryModelTransformer.VisitWhereClause(WhereClause whereClause, QueryModel queryModel, Int32 index) +332
Remotion.Linq.QueryModelVisitorBase.VisitBodyClauses(ObservableCollection`1 bodyClauses, QueryModel queryModel) +284
Remotion.Linq.QueryModelVisitorBase.VisitQueryModel(QueryModel queryModel) +73
Lucene.Net.Linq.LuceneQueryExecutorBase`1.PrepareQuery(QueryModel queryModel) +109
Lucene.Net.Linq.LuceneQueryExecutorBase`1.ExecuteScalar(QueryModel queryModel) +45
Remotion.Linq.Clauses.StreamedData.StreamedScalarValueInfo.ExecuteScalarQueryModel(QueryModel queryModel, IQueryExecutor executor) +79
Remotion.Linq.Clauses.StreamedData.StreamedScalarValueInfo.ExecuteQueryModel(QueryModel queryModel, IQueryExecutor executor) +207
Remotion.Linq.QueryProviderBase.System.Linq.IQueryProvider.Execute(Expression expression) +35
System.Linq.Queryable.Count(IQueryable`1 source) +298
When searching for MERSEDES-BENZ, Lucene.Net.Linq builds the query {text:mersedes-benz} and gets no results, while Luke builds {text:mersedes text:benz} and finds all items.
Hello Chris,
I recently found an interesting GitHub project for Lucene.NET:
https://github.com/NielsKuhnel/NrtManager/tree/master/Lucene.Net.Contrib.Management
Niels ported the NrtManager class (introduced in Lucene 3.5 I think) to .NET.
This class is useful for managing single-writer, multiple-reader scenarios without having to commit the index.
There is also a background saver and a searcher manager.
Count() and Any() are implemented using ScoreDocs.Length. This means that for a query such as Documents.Count(), all the metadata for each doc needs to be loaded by Lucene.
If the TotalHits field were used instead, only a single record would be requested in the search, which should improve performance greatly for large indexes.
e.g.:
var hits = indexSearcher.Search(query, null, 1, new Sort());
return hits.TotalHits;
instead of:
var hits = indexSearcher.Search(query, null, int.MaxValue, new Sort());
return hits.ScoreDocs.Length;
Original report: OctopusDeploy/Issues#189
Also reported at: themotleyfool/NuGet.Lucene#10
Queries that call String.Contains on a null-safe expression are not converted correctly.
Example where clause:
docs.Where(d => (d.Name != null ? d.Name.ToLower() : "").Contains("foo"))
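The failing clause can be rewritten without the conditional; whether Lucene.Net.Linq translates the rewrite cleanly is an assumption, but the two forms are equivalent in plain LINQ, as this self-contained LINQ-to-objects check shows:

```csharp
using System;
using System.Linq;

class NullSafeContainsDemo
{
    class Doc { public string Name { get; set; } }

    static void Main()
    {
        var docs = new[]
        {
            new Doc { Name = "Foobar" },
            new Doc { Name = null },
            new Doc { Name = "other" },
        }.AsQueryable();

        // The reported clause, with the conditional null check:
        var a = docs.Where(d => (d.Name != null ? d.Name.ToLower() : "").Contains("foo")).Count();

        // An equivalent rewrite without null propagation, which a provider
        // may translate more readily (an assumption for Lucene.Net.Linq):
        var b = docs.Where(d => d.Name != null && d.Name.ToLower().Contains("foo")).Count();

        Console.WriteLine(a == b); // True (both match only "Foobar")
    }
}
```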
First of all, thanks for your project, it's almost exactly what I have been looking for. Now for that almost :)
I was wondering if you could add some extension points to the object mapping code, so that I could:
a. define custom search/storage keys based on things other than properties of the object
b. override the re-materialization process
I want to do this because I am trying to use your library to provide search capabilities for my custom object database. My db objects are POCOs and don't even have their own Id properties; these are stored in an external lookup. I need to be able to put these ids into the search index so that when I retrieve the data, I can use the stored id to find the original object in my db and return that, rather than the reconstructed object.
I have tried forking so I could do this myself, but I am getting a compile error about a missing ../build/version.cs file.
Range queries such as this (Time is a NumericField):
(from a in data
where
a.Time >= startTime
&& a.Time < endTime
select a)
.Take(10);
are translated into two separate range queries:
+Time:{635197536000000000 TO 3155378975999999999] +Time:[288000000000 TO 635197536600000000]
i.e., startTime TO EndOfTime AND BeginningOfTime TO endTime.
Ideally, this should be translated into a single range query:
+Time:{635197536000000000 TO 635197536600000000]
i.e., startTime TO endTime.
When the index is large, this causes a large performance penalty.
In tests on an index with 50 million documents:
Using a single range query, a range that covers 1 million documents takes 100 ms.
Using the double range query pattern described above, a range that covers 1 million documents takes 3 seconds.
As the range covers more of the available documents, the performance penalty decreases but is still quite noticeable.
I cannot configure an IEnumerable field using the fluent interface.
Property:
public IEnumerable<String> Tags { get; set; }
Mapping:
map.Property(x => x.Tags);
The field doesn't appear in the index after adding a document.
This has been making debugging problematic.
Any time I use an enum type in a query clause, I get an error about not being able to cast an Int32 to the enum type. Mapping the field with a TypeConverter doesn't help, as it appears that this is not used for transforming term values, only when storing or reading. There IS a bizarre workaround: if I cast BOTH sides of the term of the query to object, it behaves correctly.
So, this results in an exception:
provider.AsQueryable<MyMappedType>().Where(x => x.EnumField != MyEnumType.SomeValue);
This does not:
provider.AsQueryable<MyMappedType>().Where(x => ((object)x.EnumField) != ((object)MyEnumType.SomeValue));
I would like to see a fluent interface be available for configuration so the classes that represent index items do not require attributes and have a dependency on the Lucene.Net.Linq assembly.
Supposing a document like:
public class Book
{
public string Title { get; set; }
public string Author { get; set; }
public string Text { get; set; }
}
And a query like:
from b in books
where b.Title == "foo" || b.Author == "bar"
select new { b.Title, b.Author };
Lucene.Net.Linq should not retrieve large, possibly compressed fields like Text since the client is not using that field.
It seems the Uri type is not supported in fluent configuration. Previously I had this attribute mapping:
[Field("Url")]
public IEnumerable<Uri> Urls { get; set; }
And now I'm using fluent configuration:
map.Property(x => x.Urls).ToField("Url");
This property does not appear in the index.
Version 3.2.53
If I try to do queriable.Where(x => listOfValues.Contains(x.Value)), I get an exception:
The binary operator Equal is not defined for the types 'System.String[]' and 'System.String'.
I wonder if there's a way to special-case this into a multi-term query, which should run efficiently in Lucene (although subject to the term limit, of course).
I have been trying to get a simple example working using the fluent code. I have a simple Account class with two properties: AccountId and AccountName.
public class Account
{
public int AccountId { get; set; }
public string AccountName { get; set; }
}
I am creating a directory in memory, adding two accounts, and then searching for them. I notice that having a space in the AccountName breaks the search. Based on some of your examples, I can't see why this isn't working. Could you give me a little insight?
var version = Lucene.Net.Util.Version.LUCENE_30;
var mapping = new ClassMap<Account>(version);
mapping.Key(a => a.AccountId).AsNumericField();
mapping.Property(a => a.AccountName).WithTermVector.Yes();
var directory = new RAMDirectory();
var provider = new LuceneDataProvider(directory, version);
provider.Settings.EnableMultipleEntities = false;
using (var session = provider.OpenSession(mapping.ToDocumentMapper()))
{
var account1 = new Account() { AccountId = 1, AccountName = "test account", };
var account2 = new Account() { AccountId = 2, AccountName = "account test", };
session.Add(account1, account2);
}
var accounts = from account in provider.AsQueryable<Account>(mapping.ToDocumentMapper())
where account.AccountName == "test"
orderby account.Score()
select account;
foreach (var account in accounts)
{
Console.Out.WriteLine(account.AccountName);
}
Hi! Can you tell me whether this can be used with EF, or with a database without a DbContext? I've read the documentation and didn't see this information.
var articlesByJohn = from a in articles
where a.Author == "John Doe" && a.PublishDate > threshold
orderby a.Title
select a;
This expression gives me a "Children could not be evaluated" error in the debugger, and when calling .ToList() (or .Count(), ...), the returned collection is empty.
I'm trying to run the example from the readme (it is a bit outdated) and it fails with this:
Sequence contains more than one element
at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source)
at Lucene.Net.Linq.Util.AnalyzerExtensions.Analyze(Analyzer analyzer, String fieldName, String pattern)
at Lucene.Net.Linq.Mapping.ReflectionFieldMapper`1.EvaluateExpressionToStringAndAnalyze(Object value)
at Lucene.Net.Linq.Mapping.ReflectionFieldMapper`1.CreateRangeQuery(Object lowerBound, Object upperBound, RangeType lowerRange, RangeType upperRange)
at Lucene.Net.Linq.Translation.TreeVisitors.QueryBuildingExpressionTreeVisitor.CreateRangeQuery(IFieldMappingInfo mapping, QueryType queryType, LuceneQueryPredicateExpression lowerBoundExpression, LuceneQueryPredicateExpression upperBoundExpression)
at Lucene.Net.Linq.Translation.TreeVisitors.QueryBuildingExpressionTreeVisitor.VisitLuceneQueryPredicateExpression(LuceneQueryPredicateExpression expression)
at Lucene.Net.Linq.Clauses.TreeVisitors.LuceneExpressionTreeVisitor.VisitExtensionExpression(ExtensionExpression expression)
at Remotion.Linq.Clauses.Expressions.ExtensionExpression.Accept(ExpressionTreeVisitor visitor)
at Remotion.Linq.Parsing.ExpressionTreeVisitor.VisitExpression(Expression expression)
at Remotion.Linq.Parsing.ExpressionTreeVisitor.VisitBinaryExpression(BinaryExpression expression)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Linq;
using Lucene.Net.Linq.Mapping;
using Lucene.Net.Store;
using ServiceStack.Text;
namespace ConsoleApplication2
{
public class Article
{
public string Author { get; set; }
public string Title { get; set; }
public DateTimeOffset PublishDate { get; set; }
// Stores the field as a NumericField
[NumericField]
public long Id { get; set; }
// Stores the field as text
public int IssueNumber { get; set; }
[Field(IndexMode.NotIndexed, Store = StoreMode.Yes)]
public string BodyText { get; set; }
// Maps to field "text"
[Field("text", Store = StoreMode.No)]
public string SearchText
{
get { return string.Join(" ", new[] { Author, Title, BodyText }); }
}
// Add IgnoreFieldAttribute to properties that should not be mapped to/from Document
[IgnoreField]
public string IgnoreMe { get; set; }
}
class Program
{
static void Main(string[] args)
{
var directory = new RAMDirectory();
var writer = new IndexWriter(directory, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);
var provider = new LuceneDataProvider(directory, writer.Analyzer, Lucene.Net.Util.Version.LUCENE_30, writer);
// add some documents
using (var session = provider.OpenSession<Article>())
{
session.Add(new Article { Author = "John Doe", BodyText = "some body text", PublishDate = DateTimeOffset.UtcNow });
}
var articles = provider.AsQueryable<Article>();
var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30));
var articlesByJohn = from a in articles
where a.Author == "John Doe" && a.PublishDate > threshold
orderby a.Title
select a;
var searchResults = from a in articles
where a.SearchText == "some search query"
select a;
Console.WriteLine(articlesByJohn.Dump());
Console.WriteLine(searchResults.Dump());
Console.Read();
}
}
}
I'm trying to save and query a DateTime as a numeric field (for performance), so I've added a custom TypeConverter to do the conversion, but it seems the converter is only used when adding documents.
When querying, it defaults to the built-in DateTimeConverter and fails with a FormatException: "String was not recognized as a valid DateTime".
And shouldn't the value be fetched as a long, given that it's defined as a numeric field?
The same thing works when using attributes: the custom converter is used both for adding and querying (and is passed a long/Int64).
I've created unit tests for the failing (fluent) and working (attributes) examples in this gist:
https://gist.github.com/TheoAndersen/8236625
It can also be found in this commit on my fork
TheoAndersen@408bab0
/Theo