chriseldredge / lucene.net.linq

LINQ provider to run native queries on a Lucene.Net index

License: Other

C# 100.00%

lucene.net.linq's Issues

Fluent mappings aren't using per-field analyzers

When using the fluent mappings, I am getting no results back when searching on a string containing at least one upper-case letter. It would appear that even though the ClassMap is properly setting CaseInsensitiveKeywordAnalyzer, this isn't being copied into the PerFieldAnalyzer properly.

Failing unit test: https://gist.github.com/mj1856/9147377

Array support

Are array properties supported? It seems not.

Lucene supports multiple field instances with the same name for a single doc. Is this feature supported in Lucene.Net.Linq?

Memory leak in Context object

The Context class in Context.cs depends on a local SearcherClientTracker object (instance name tracker), which is itself disposable; however, Context does not give a client app any way to dispose of it. This results in hemorrhaging memory when a LuceneDataContext is instantiated for each request (the identifying case was in a WCF service).

I've coded a fairly dirty and ineffective workaround by making the Context disposable, disposing of its tracker instance in Context.Dispose, and disposing of the Context instance in the LuceneDataContext.Dispose method. This resolves the memory leak but causes an unfortunate decline in scalability (i.e. performance blows at high volume). The scalability issue is simply worked around by forcing my WCF service to act as a singleton (safe because the library is thread-safe), but is far from ideal.
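The ownership pattern described in this workaround can be sketched in isolation. This is a minimal sketch, not the library's code: `Tracker` and `OwningContext` are hypothetical stand-ins for SearcherClientTracker and Context, and the only point shown is that the owner forwards Dispose to the disposable it created.

```csharp
using System;

// Hypothetical stand-in for a disposable resource such as a searcher tracker.
public sealed class Tracker : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        IsDisposed = true;
    }
}

// Hypothetical stand-in for an object that creates and therefore owns a Tracker.
public sealed class OwningContext : IDisposable
{
    private readonly Tracker tracker = new Tracker();
    private bool disposed;

    public Tracker Tracker
    {
        get { return tracker; }
    }

    public void Dispose()
    {
        if (disposed) return;   // repeated Dispose calls are harmless
        disposed = true;
        tracker.Dispose();      // release the owned resource
    }
}
```

A caller would wrap the owner in a `using` block, or dispose it from its own Dispose method, as the report describes for LuceneDataContext.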

The second issue might be addressed by reviewing overall library design to move heavy state to lower/lesser used abstractions in the stack, but such changes are fairly invasive and might be hard to swallow. Wondering if you have any ideas on how to address it otherwise.

Let me know if you would like to see a repro or pull request for more information.

Thanks.

Read Index

I have an index of 5 million records of one model. When I read the index, it loads all the data and takes 4 GB of memory. You can see my code below. Is there any way to optimize read performance?

ReadOnlyLuceneDataProvider provider = new ReadOnlyLuceneDataProvider(FSDirectory.Open(Configuration.GetConfig(ConfigsKey.RequestUrlPath)), Lucene.Net.Util.Version.LUCENE_30);

 var email = provider.AsQueryable<Pcapnet.Model.RequestUrl>();

Update document. Is it correct?

Hi. I have documents with an Id ([NumericField(Key = true)]).
For example, I have 10 documents. If I insert a document, I have 11 documents in the session.
But if I update a document, I have only 1 document in the session.

using (var session = LuceneDataProvider.OpenSession())
{
    var doc = session.Query().First(x => x.Id == id);
    doc.Relevance++;
    session.Add(doc);
}

I use Lucene.Net 3.0.3 and Lucene.Net.Linq 3.1.46.0.
Am I doing something wrong?

Search ModelType1 but also return results of ModelType2

I'm using Lucene.Net.Linq v3.1.48

I make the LuceneDataProvider a singleton. I checked the index folder and found that all index files are in the same folder. Now if I index many Majors and Universities, and query it like this:

// The returned majors contain both Majors and Universities
// (the CLR type is Major, but it returns Universities as Majors),
// but I just want it to return Majors, not Universities
var majors = provider.OpenSession<Major>().Query();
var universities = provider.OpenSession<University>().Query();

For example, the first line above returns this data:

Major { Code = "0001", Name = "Major 1", MajorSpecificField1 = "major 1" } // correct
Major { Code = "0002", Name = "Major 2", MajorSpecificField1 = "major 2" } // correct
Major { Code = "0011", Name = "University 1", MajorSpecificField1 = null } // incorrect

NOTE: The Majors and Universities are indexed using the same LuceneDataProvider, and Major and University both have "Name" and "Code" properties. I think this might be the cause?

public class Major {
    [Field(Key = true)]
    public string Code { get; set; }
    public string Name { get; set; }
    // other properties 
}

public class University {
    [Field(Key = true)]
    public string Code { get; set; }
    public string Name { get; set; }
    // other properties
}

Check this gist: https://gist.github.com/mouhong/5924744#file-program-cs

Any ideas? Thanks

Unable to use natural sorting with a property assigned a Converter or Format

This should be a fairly common issue:
I have a date field that I'm storing in the index using the Lucene.Net.Document.DateTools methods. This requires me to place a Converter in the mapping of the field. Unfortunately, this causes Lucene.Net.Linq to use a custom comparator for sorting on that field, and it's excruciatingly slow (about 5s to sort a few thousand records)! If most data in a Lucene index is (or should be) stored in a way that can be naturally sorted (if sorting is anticipated on that field), there should be a way to disable this custom comparator, but there is currently no way to do so when the mapping has a converter.

I've dug through the classes that ultimately trigger this behaviour, between ReflectionDocumentMapper and ReflectionFieldMapper, and there's a whole lot of static private methods that would need to be re-implemented if I wanted to map a single field with a custom subclass of ReflectionFieldMapper.

I've looked at trying to do something with ClassMap, but I can't subclass PropertyMap because all of the constructors are internal...

In all, I'm stumped. Any clues where I could work around this?

If you think a new extension point could be made to remedy this, do you think it should be specific to the sorting behaviour (for example, a UseNaturalSorting property of FieldAttribute) or a lower-level extension point to be able to provide your own mapper?

Incidentally, this happens even if you just apply a Format option instead of a Converter (since I figure it will make a converter from the format internally).

An example of the final Lucene query it generates looks like:

+(-validstate:999 :) +(+program:221) sort by <custom:"date": Lucene.Net.Linq.Search.NonGenericConvertableFieldComparatorSource>!.Take(10)

No support for multiple read-only processes

My organization is playing with the idea of utilizing this great library to implement a NoSQL-ish data store based on Lucene.NET to improve accessibility from our .NET products.

In my POC testing I've noticed that when multiple processes attempt to open a single index to search, it results in a LockObtainFailedException exception when used with (at least) the FSDirectory.

This exception is the result of attempting to blindly open an IndexWriter when instantiating the LuceneDataProvider class (LuceneDataProvider.cs:90) after another application (or possibly even the same one) already has an open LuceneDataProvider instance.

I've implemented a hack to work around this limitation for the time being, but wanted to notify you of the issue before losing track of it. I'll describe the hack here for anyone else who encounters it, but am happy to submit a pull request if you feel it is appropriate.

Thanks.

LuceneDataProvider:107:

            get
            {
+                if (Lucene.Net.Index.IndexWriter.IsLocked(directory))
+                    return null;

                lock (sync)

LuceneDataProvider:201:

            if (writerIsExternal) return;

+            if (IndexWriter != null)
                IndexWriter.Dispose();
        }

Mapping of System.Double not working

You have two test methods that fail on systems with a German culture:

FieldMappingInfoBuilderNumericFieldTests.CopyFromDocument
FieldMappingInfoBuilderNumericFieldTests.CopyToDocument

  String lengths are both 4. Strings differ at index 1.
  Expected: "1.34"
  But was:  "1,34"
  ------------^

I suppose you somehow inject an invariant CultureInfo-instance (or by attribution) into your reflection based mapping.
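The "1.34" versus "1,34" mismatch is what culture-sensitive number formatting produces. As a minimal sketch using only the .NET base class library (the `NumberFormatting` class is a hypothetical helper, not part of Lucene.Net.Linq), passing an explicit culture makes the text independent of the machine's settings:

```csharp
using System.Globalization;

// Hypothetical helper, not part of Lucene.Net.Linq.
public static class NumberFormatting
{
    // Formats with an explicit culture, so the result does not depend
    // on the current thread's culture.
    public static string Format(double value, CultureInfo culture)
    {
        return value.ToString(culture);
    }

    // Parses text that was written with the invariant culture.
    public static double ParseInvariant(string text)
    {
        return double.Parse(text, NumberStyles.Float, CultureInfo.InvariantCulture);
    }
}
```

With `CultureInfo.InvariantCulture`, `Format(1.34, ...)` yields "1.34"; with a German culture such as `new CultureInfo("de-DE")` the decimal separator becomes a comma, which matches the failure shown above.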

row appears twice in result

Here is a console application I made to try Lucene.Net.Linq.

When I run this code, I get this result:

Id : 10, Name : Product 10, Category : test 10
Id : 1, Name : Product 1, Category : test 1
Id : 10, Name : Product 10, Category : test 10
Id : 1, Name : Product 1, Category : test 1

Why do I get duplicate rows?

The code:

class Program
    {
        const string IndexPath = @"C:\temp\Lucense\products";

        static public IEnumerable<Product> GetProducts()
        {
            var result = new List<Product>();
            for (int i = 0; i < 10000; i++)
            {
                result.Add(new Product() { Id = i, Category = "test " + i, Name = "Product " + i });
            }
            return result;
        }
        static void Main(string[] args)
        {
            var directory = FSDirectory.Open(IndexPath);

            var provider = new LuceneDataProvider(directory, Lucene.Net.Util.Version.LUCENE_30);

            using (var session = provider.OpenSession<Product>())
            {
                //GetProducts().ToList().ForEach(i => session.Add(i));

                var query = session.Query().Where(i => i.Name.Contains("Product 1")).Where(i => i.Id < 100).Where(i => i.Category.Contains("test"));

                query.ToList().ForEach(i => Console.WriteLine(i));
            }
            Console.WriteLine("Done !");
            Console.ReadLine();
        }
    }

Capture metadata on query execution

A common use case is to execute a query and display total hits while only retrieving the first page of results. Currently this requires executing a query twice: once to count all results and again to retrieve the first N items.

Provide a LINQ extension that allows clients to register a callback or output object that receives metadata such as total hits and top score.

NumericField(Key = true) never returned from Query

I have a simple class like:

public class Data
{
    [NumericField(Key=true)]
    public Id { get; set; }
    [Field]
    public Name { get; set; }
}

It is never selected from catalog because executionContext.Filter passed to searcher in LuceneQueryExecutor.ExecuteCollection contains filter (+Id:*) that filters out all results.

Changing it to ordinary Field fixes the issue.

How to use delete?

Thanks for your help so far. I've got stuck trying to remove/update items.

My PieReflection mapper sets a constant DocumentKey, so I'm thinking you remove, then add it again? It doesn't seem to be taking things out of the index... I suspect I am doing it wrong though...

private void UpdateDocumentInIndex<T>(PieDocument document) where T : new()
        {
            using (var session = _provider.OpenSession<T>(new PieReflectionDocumentMapper<T>((Version)Version.LUCENE_30, this)))
            {
                session.Delete((T) document.Data);
            }
            using (var session = _provider.OpenSession<T>(new PieReflectionDocumentMapper<T>((Version)Version.LUCENE_30, this)))
            {
                session.Add((T)document.Data);
            }
        }

Incorrect results when querying by text containing a space

I've been trying to do some var users = q.Where(u => u.Name == "Firstname Lastname"); queries, but am getting incorrect results.

I can replicate the issue by adding another test to CustomWhereTests.cs

        [Test]
        public void WhereUsingExpression()
        {
            AddDocument(new SampleDocument { Name = "Documents Bill", Id = "X.Y.1.2" });
            AddDocument(new SampleDocument { Name = "Bills Document", Id = "X.Z.1.3" });

            var documents = provider.AsQueryable<SampleDocument>();

            var result = documents.Where(d => d.Name == "Bills Document");

            //Fails, no results found
            Assert.That(result.Single().Name, Is.EqualTo("Bills Document"));

        }

I think this is because the query term ("Firstname Lastname") gets pulled into two parts by the . Apparently a TermQuery is the correct thing to generate. Is there some way to hint my model to do this? Maybe a QueryType.ExactMatch is required?

I'm also seeing a second issue: I get extra results when querying with q.Where(u => u.Name.StartsWith("Oli") || u.Name.Contains("Oli")). Results come back with names that don't have "Oli" in them, but if I search for "Oliv" it returns the correct results.

Is there something I am missing? Thanks again in advance.

Fluent mapping and AsQueryable/OpenSession inconsistency

I have created a fluent map:

ClassMap<Record> map = new ClassMap<Record>(Lucene.Net.Util.Version.LUCENE_30);
map.Property(…);
…
LuceneDataProvider provider = new …;

Then provider.AsQueryable<Record>() automatically uses my map. But provider.OpenSession<Record>() does not. I need to explicitly specify the document mapper: provider.OpenSession<Record>(map.ToDocumentMapper())

What am I doing wrong or misunderstanding?

Collection fields

It would appear I can only map a collection property as a field if it is declared as IEnumerable<T>. It should really support any type that implements IEnumerable<T>, including lists, arrays, IList<T>, etc.
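One way such a check could be written is with reflection over the property type's interfaces. This is a sketch under assumptions: `CollectionTypeInspector` is a hypothetical helper, not the library's mapper code, and it deliberately excludes `string`, which implements `IEnumerable<char>` but is normally treated as a scalar.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of Lucene.Net.Linq.
public static class CollectionTypeInspector
{
    // Returns the element type T when 'type' is IEnumerable<T> or implements it;
    // returns null for non-collection types and for string.
    public static Type GetElementType(Type type)
    {
        if (type == typeof(string)) return null;

        // The type itself may be IEnumerable<T>.
        if (type.IsGenericType && type.GetGenericTypeDefinition() == typeof(IEnumerable<>))
            return type.GetGenericArguments()[0];

        // Otherwise look through implemented interfaces (covers List<T>, T[], IList<T>, ...).
        return type.GetInterfaces()
            .Where(i => i.IsGenericType && i.GetGenericTypeDefinition() == typeof(IEnumerable<>))
            .Select(i => i.GetGenericArguments()[0])
            .FirstOrDefault();
    }
}
```

Called with `typeof(List<int>)`, `typeof(string[])` or `typeof(IList<Uri>)`, this returns the element type; called with `typeof(int)` or `typeof(string)`, it returns null.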

`QueryExecutionContext.Filter` causing unnecessary overhead for read-only contexts with large indexes

It appears that using a catch-all filter on LuceneQueryExecutorBase.ExecuteCollection(136) causes 100-400 ms of unnecessary overhead in IndexSearcher.Search when the context is read-only and working with > 50K records/index. This is caused by the QueryExecutionContext.Filter being set to <KeyField>:* and filtering for ALL records when not necessary.

I can't tell whether this was a necessary adjustment to get your Nuget implementation going or not, but based on the documentation, this filter is only necessary when Updates/Deletes are possible. This should never be the case when in a read-only context so the performance degradation shouldn't apply.

As a dirty workaround, I've commented out QueryModelTranslator.cs(99) so the Filter property is never set, but this might break whatever use case this was initially intended to address (which likely doesn't apply to my environment). I couldn't find a good way in your code to only turn this off for searches (though I didn't try that hard either).

If this is necessary for your purposes then I can fork the project or help you work through the test scenario. Otherwise, let me know if you need a repro or pull request for further detail.

Thanks.

Question: Is there an attribute or fluent syntax for boosting documents in an index?

I looked at the source, unit tests, and limited docs, but couldn't find an API for this. Is there one?

Equivalent to IgnoreAttribute for Fluent Mapping

Hi,

How do you ignore properties when using Fluent mapping?

I am unable to use the attributes due to using EF6; before considering DTO classes I wanted to try and use this feature but I am having problems.

Thanks

Example not compiling

A FieldAttribute does not have a bool property Store, rather a StoreMode property Store, so:

Store = true

should be

Store = StoreMode.Yes

Also, there is no PorterStemAnalyzer-class in the bundle, but a KeywordAnalyzer-class and StandardAnalyzer-class.

And, you should include the implementation of VersionConverter-class!

Get native Query from LuceneQueryable

Many times I need to get the native Query from LuceneQueryable because I need to do something not yet provided by Lucene.Net.Linq (e.g., facet search). So I think it might be better to add a method/property (e.g., GetNativeQuery()) to LuceneQueryable to return the underlying native Query, and let session.Query() return LuceneQueryable, not IQueryable, so that I can access the Query like this:

var nativeQuery = session.Query().Where(...).GetNativeQuery();

What do you think?

Support for TermVector and Boosting in the FieldAttribute

It would be great if the FieldAttribute and the FieldMapper supported setting a TermVector and a Boost value. Right now we're using these settings on some fields and they don't seem to be supported by the POCO <-> document mapping.

In our document we have the command:

doc.Add(new Field("MediumName", model.MediumName, Field.Store.YES, Field.Index.ANALYZED, Field.TermVector.YES) {Boost = 2f});

DateTime parse issue when sorting

I'm trying to sort against a DateTime property, but am getting an error in

public class DateTimeConverter : TypeConverter
    {
       ... 
        public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
        {
            return DateTime.SpecifyKind(DateTime.ParseExact((string) value, format, null), DateTimeKind.Utc); //exception here. value="2013-04-30t07:12:47" but format = "yyyy-MM-ddTHH:mm:ss" 
        }
    }

I notice that the format has a capital T, but the value being parsed has a lower case t.

I will try to put together a test for this.
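Until the cause is found, one possible workaround is to normalize the casing before parsing. This is only a sketch: `IndexedDateParsing` is a hypothetical helper, and it assumes the value was lower-cased somewhere before reaching the converter (an analyzer is a plausible cause, but that is not confirmed here). Upper-casing is safe for this purely numeric format; formats containing other letters would need more care.

```csharp
using System;
using System.Globalization;

// Hypothetical helper illustrating one possible workaround.
public static class IndexedDateParsing
{
    private const string Format = "yyyy-MM-ddTHH:mm:ss";

    // Upper-cases the stored value so a lower-cased 't' separator
    // matches the literal 'T' in the format string, then marks the result as UTC.
    public static DateTime ParseUtc(string value)
    {
        DateTime parsed = DateTime.ParseExact(
            value.ToUpperInvariant(), Format, CultureInfo.InvariantCulture);
        return DateTime.SpecifyKind(parsed, DateTimeKind.Utc);
    }
}
```

With the value from the report, `IndexedDateParsing.ParseUtc("2013-04-30t07:12:47")` parses to 30 April 2013, 07:12:47 with `DateTimeKind.Utc`.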

Not analyzed boolean field not working

[Field(IndexMode.NotAnalyzed)]
public bool Flag { get; set; }

var query = session.Query().Where(x => x.Flag == true); // Always returns nothing

Expected: returns the documents whose Flag is true
Actual: returns nothing

I checked the source code and found that:
In the field mapper, BooleanConverter converts the value true to the string "True".
In the LINQ provider, Where(x => x.Flag == true) is translated to the Lucene query Flag: true. Note that it's "true", not "True".

So the query returns an empty result.

By default a boolean field is analyzed, so this problem only happens when the field is manually set to IndexMode.NotAnalyzed.
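The casing mismatch can be reproduced with the base class library alone. A minimal sketch (the `BooleanTermCasing` class is hypothetical and only models an exact, case-sensitive term comparison; it does not call into Lucene):

```csharp
using System;

// Hypothetical helper, not part of Lucene.Net.Linq.
public static class BooleanTermCasing
{
    // Boolean.ToString() produces "True" or "False", with a capital first letter.
    public static string AsStored(bool value)
    {
        return value.ToString();
    }

    // Case-sensitive comparison, standing in for an exact match on a field
    // that is not analyzed (so nothing lower-cases the stored term).
    public static bool ExactMatch(string storedTerm, string queryTerm)
    {
        return string.Equals(storedTerm, queryTerm, StringComparison.Ordinal);
    }
}
```

`ExactMatch(AsStored(true), "true")` is false, while lower-casing the stored term first makes it match, which is consistent with the query only failing when the field is not analyzed.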

Is there a possibility to configure AllowSpecialCharacters?

How can I set LuceneQueryPredicateExpression.AllowSpecialCharacters to true?

Searching against Solr

Chris, is it possible to point this package to a Solr location - passing the native lucene queries it builds from OData?

Thanks.

Document : Always set ODataQuerySettings.HandleNullPropagation = HandleNullPropagationOption.False

Edit: This WAS titled: OData: Combining substringof('value',Field) or other 'function' operator with any other clause causes error
I don't know if this is actually something that can be fixed or not, though the documentation of HandleNullPropagationOptions suggests that the null propagation can be determined by the query provider. The following article indicates that Linq ACTUALLY determines this by a hard-coded check on the assembly name of the provider: http://blogs.msdn.com/b/alexj/archive/2012/08/21/web-api-queryable-current-support-and-tentative-roadmap.aspx

So, maybe the best we can do is document this in a prominent place, or else improve the unmangling of the Linq AST to handle the 'function' style operators better in the presence of the null checks.

Original post below

I'm not really sure where to start tracking this one down, but here's my setup:

I have a WebApi method, that takes an ODataQueryOptions object as its parameter. I have a DTO object that I'm mapping to my lucene index. I'm using ODataQueryOptions.Filter.ApplyTo on my lucene IQueryable. For the most part, this all works wonderfully. I have a problem that is a blocker, though: If I have a $filter that has a 'substringof' function or any other function combined with any other clause, I get an exception. For example
(substringof('jeff',Field1) and Field2 eq 'bob')
also tested as not working in combination with other clauses: startswith and endswith
Combining two or more 'eq' clauses together is fine.
Combining 'eq' and 'ne' (not equal) clauses is fine.

The error is:

Expected Left or Right to be LuceneQueryFieldExpression
System.NotSupportedException
at Lucene.Net.Linq.Transformation.TreeVisitors.BinaryToQueryExpressionTreeVisitor.VisitBinaryExpression(BinaryExpression expression)
   at Remotion.Linq.Parsing.ExpressionTreeVisitor.VisitExpression(Expression expression)
   at Remotion.Linq.Clauses.WhereClause.TransformExpressions(Func`2 transformation)
   at Lucene.Net.Linq.Transformation.QueryModelTransformer.VisitWhereClause(WhereClause whereClause, QueryModel queryModel, Int32 index)
   at Remotion.Linq.Clauses.WhereClause.Accept(IQueryModelVisitor visitor, QueryModel queryModel, Int32 index)
   at Remotion.Linq.QueryModelVisitorBase.VisitBodyClauses(ObservableCollection`1 bodyClauses, QueryModel queryModel)
   at Remotion.Linq.QueryModelVisitorBase.VisitQueryModel(QueryModel queryModel)
   at Remotion.Linq.QueryModel.Accept(IQueryModelVisitor visitor)
   at Lucene.Net.Linq.LuceneQueryExecutorBase`1.PrepareQuery(QueryModel queryModel)
   at Lucene.Net.Linq.LuceneQueryExecutorBase`1.ExecuteScalar[T](QueryModel queryModel)
   at Remotion.Linq.Clauses.StreamedData.StreamedScalarValueInfo.ExecuteScalarQueryModel[T](QueryModel queryModel, IQueryExecutor executor)
   at Remotion.Linq.Clauses.StreamedData.StreamedScalarValueInfo.ExecuteQueryModel(QueryModel queryModel, IQueryExecutor executor)
   at Remotion.Linq.QueryModel.Execute(IQueryExecutor executor)
   at Remotion.Linq.QueryProviderBase.Execute(Expression expression)
   at Remotion.Linq.QueryProviderBase.System.Linq.IQueryProvider.Execute[TResult](Expression expression)
   at System.Linq.Queryable.Count[TSource](IQueryable`1 source)
   at (My controller class)

I suspect this has something to do with how WebApi's OData support is turning the odata filter clause into a Linq query, because doing:
queryable.Where(x => x.Field1.Contains("jeff") && x.Field2 == "bob")
'by hand' in linq works just fine.

I suspect this information is probably not enough to start working on this problem, but please let me know how to gather more information about this issue. I've been working with Lucene.Net.Linq for several weeks already and for some reason have never tried this particular use case which will be very common for our users, so I'm kinda painted into a corner right now.

Other Lucene.NET commands

Is there any way to query against other Lucene.NET features like SpellChecker, MoreLikeThis or DocFreq?

LuceneDataSource gets corrupted on session rollback

It may not be an issue but a misunderstanding, but I consider LuceneDataSource a long-lived object and sessions short-lived. If a session fails to commit, it calls close on the parent data source's writer, making it unusable from then on.
Is it expected to create a LinqDataSource per session?

How To Get DocID ?

Invalid query for NumericField(Key=True) on some values

Trying a delete/insert for key value 5501, I got an exception in QueryParser:

Lucene.Net.QueryParsers.ParseException occurred
HResult=-2146233088
Message=Cannot parse '` *}': Lexical error at line 1, column 7. Encountered: after : ""
Source=Lucene.Net
StackTrace:
at Lucene.Net.QueryParsers.QueryParser.Parse(String query) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 239
InnerException: Lucene.Net.QueryParsers.TokenMgrError
HResult=-2146232832
Message=Lexical error at line 1, column 7. Encountered: after : ""
Source=Lucene.Net
StackTrace:
at Lucene.Net.QueryParsers.QueryParserTokenManager.GetNextToken() in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParserTokenManager.cs:line 1429
at Lucene.Net.QueryParsers.QueryParser.Jj_ntk() in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 1929
at Lucene.Net.QueryParsers.QueryParser.Term(String field) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 1461
at Lucene.Net.QueryParsers.QueryParser.Clause(String field) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 1383
at Lucene.Net.QueryParsers.QueryParser.Query(String field) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 1301
at Lucene.Net.QueryParsers.QueryParser.TopLevelQuery(String field) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 1287
at Lucene.Net.QueryParsers.QueryParser.Parse(String query) in d:\Lucene.Net\FullRepo\trunk\src\core\QueryParser\QueryParser.cs:line 223
InnerException:

Count() returning exception

Whenever I call .Count() on a LuceneQueryable I get the following exception:

[InvalidOperationException: The operands for operator 'Equal' do not match the parameters of method 'op_Equality'.]
System.Linq.Expressions.Expression.GetMethodBasedBinaryOperator(ExpressionType binaryType, Expression left, Expression right, MethodInfo method, Boolean liftToNull) +4339023
System.Linq.Expressions.Expression.Equal(Expression left, Expression right, Boolean liftToNull, MethodInfo method) +6153784
System.Linq.Expressions.Expression.MakeBinary(ExpressionType binaryType, Expression left, Expression right, Boolean liftToNull, MethodInfo method, LambdaExpression conversion) +94
Remotion.Linq.Parsing.ExpressionTreeVisitor.VisitBinaryExpression(BinaryExpression expression) +261
Remotion.Linq.Clauses.WhereClause.TransformExpressions(Func`2 transformation) +41
Lucene.Net.Linq.Transformation.QueryModelTransformer.VisitWhereClause(WhereClause whereClause, QueryModel queryModel, Int32 index) +332
Remotion.Linq.QueryModelVisitorBase.VisitBodyClauses(ObservableCollection`1 bodyClauses, QueryModel queryModel) +284
Remotion.Linq.QueryModelVisitorBase.VisitQueryModel(QueryModel queryModel) +73
Lucene.Net.Linq.LuceneQueryExecutorBase`1.PrepareQuery(QueryModel queryModel) +109
Lucene.Net.Linq.LuceneQueryExecutorBase`1.ExecuteScalar(QueryModel queryModel) +45
Remotion.Linq.Clauses.StreamedData.StreamedScalarValueInfo.ExecuteScalarQueryModel(QueryModel queryModel, IQueryExecutor executor) +79
Remotion.Linq.Clauses.StreamedData.StreamedScalarValueInfo.ExecuteQueryModel(QueryModel queryModel, IQueryExecutor executor) +207
Remotion.Linq.QueryProviderBase.System.Linq.IQueryProvider.Execute(Expression expression) +35
System.Linq.Queryable.Count(IQueryable`1 source) +298

Can not find items with special chars -,(,), issue with build query

When I search for MERSEDES-BENZ, Lucene.Net.Linq builds the query {text:mersedes-benz} and returns no results, whereas Luke builds {text:mersedes text:benz} and finds all items.

Near real time search (NrtManager)

Hello Chris,

I recently found an interesting GitHub project for Lucene.NET:
https://github.com/NielsKuhnel/NrtManager/tree/master/Lucene.Net.Contrib.Management

Niels ported the NrtManager class (introduced in Lucene 3.5 I think) to .NET.
This class is useful for managing single-writer, multiple-reader scenarios without having to commit the index.

There is also a background saver and a searcher manager.

Performance - Use TotalHits instead of ScoreDocs.length when computing Count()/Any()

Count()/Any() are implemented using ScoreDocs.Length. This means that for a query such as Documents.Count(), all the metadata for each doc needs to be loaded by Lucene.
If the TotalHits field were used instead, you would only have to request a single record in the search, which should improve performance greatly for large indexes.

For example:
var hits = indexSearcher.Search(query, null, 1, new Sort());
return hits.TotalHits;

instead of:

var hits = indexSearcher.Search(query, null, int.MaxValue, new Sort());
return hits.ScoreDocs.Length;

Null-safe string contains not converted correctly

Original report: OctopusDeploy/Issues#189
Also reported at: themotleyfool/NuGet.Lucene#10

Queries that do String.Contains on a null safe expression are not converted correctly.

Example where clause:

docs.Where(d => (d.Name != null ? d.Name.ToLower() : "").Contains("foo"))

IObjectMapper extension points

First of all, thanks for your project, it's almost exactly what I have been looking for. Now for that almost :)

I was wondering if you could add some extension points to the object mapping code, so that I could:

a. define custom search/storage keys based on things other than properties of the object
b. override the re-materialization process

I want to do this as I am trying to use your library to provide search capabilities for my custom object database. My db objects are POCO and don't even have their own Id properties - these are stored in an external lookup. I need to be able to put these id's into the search index, so when I retrieve the data, I can use the stored id to find the original object using my db, and return that, rather than the reconstructed object.

I have tried forking so I could do this myself, but I am getting a compile error about a missing ../build/version.cs file.

RangeQueries on NumericFields are translated to Lucene as 2 separate NumericRangeQueries, greatly decreasing performance.

Range queries such as this (Time is a NumericField):
(from a in data
 where a.Time >= startTime
    && a.Time < endTime
 select a)
.Take(10);

are translated into two separate range queries:
+Time:{635197536000000000 TO 3155378975999999999] +Time:[288000000000 TO 635197536600000000]

eg) startTime TO EndOfTime AND BeginningOfTime TO endTime

Ideally, it should be translated into a single range query:
+Time:{635197536000000000 TO 635197536600000000]

eg) startTime TO endTime

When the index is large, this causes a large performance penalty.
In tests on an index with 50 million documents:
Using a single range query, a range that covers 1 million documents takes 100 ms.
Using the double range query pattern described above, a range that covers 1 million documents takes 3 seconds.

As the range covers more of the available documents, the performance penalty decreases, but it is still quite noticeable.
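The merge being asked for amounts to intersecting two half-open ranges on the same field. The following is a sketch of that step only: `LongRange` is a hypothetical type, not something Lucene.Net.Linq exposes, and its four values correspond to the bounds and inclusivity flags a numeric range query would take.

```csharp
using System;

// Hypothetical type for illustration; not part of Lucene.Net.Linq.
public sealed class LongRange
{
    public long? Min { get; private set; }   // null means unbounded below
    public long? Max { get; private set; }   // null means unbounded above
    public bool MinInclusive { get; private set; }
    public bool MaxInclusive { get; private set; }

    public LongRange(long? min, long? max, bool minInclusive, bool maxInclusive)
    {
        Min = min;
        Max = max;
        MinInclusive = minInclusive;
        MaxInclusive = maxInclusive;
    }

    // Combines two ranges on the same field into the single range that satisfies both.
    public LongRange Intersect(LongRange other)
    {
        long? min; bool minInc;
        if (Min == null || (other.Min != null && other.Min > Min))
        { min = other.Min; minInc = other.MinInclusive; }       // other has the tighter lower bound
        else if (other.Min == null || Min > other.Min)
        { min = Min; minInc = MinInclusive; }                   // this has the tighter lower bound
        else
        { min = Min; minInc = MinInclusive && other.MinInclusive; } // equal bounds

        long? max; bool maxInc;
        if (Max == null || (other.Max != null && other.Max < Max))
        { max = other.Max; maxInc = other.MaxInclusive; }       // other has the tighter upper bound
        else if (other.Max == null || Max < other.Max)
        { max = Max; maxInc = MaxInclusive; }                   // this has the tighter upper bound
        else
        { max = Max; maxInc = MaxInclusive && other.MaxInclusive; } // equal bounds

        return new LongRange(min, max, minInc, maxInc);
    }
}
```

For the query above, `a.Time >= startTime` would become `new LongRange(startTime, null, true, false)` and `a.Time < endTime` would become `new LongRange(null, endTime, false, false)`; intersecting them gives one range from startTime (inclusive) to endTime (exclusive).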

IEnumerable and fluent interface

Cannot configure IEnumerable field using fluent interface.
Property:
public IEnumerable<String> Tags { get; set; }
Mapping:
map.Property(x => x.Tags);
The field doesn't appear in the index after adding a document.

NuGet package missing pdb file

This has been making debugging problematic.

Can't create query terms with enum types

Any time I use an enum type in a query clause, I get an error about not being able to cast an Int32 to the enum type. Mapping the field with a TypeConverter doesn't help, as it appears that this is not used for transforming term values, only when storing or reading. There IS a bizarre workaround: if I cast BOTH sides of the term of the query to an object, it behaves correctly.

So, this results in an exception:

provider.AsQueryable<MyMappedType>().Where(x => x.EnumField != MyEnumType.SomeValue);

This does not:

provider.AsQueryable<MyMappedType>().Where(x => ((object)x.EnumField) != ((object)MyEnumType.SomeValue));

Support a fluent interface

I would like to see a fluent interface be available for configuration so the classes that represent index items do not require attributes and have a dependency on the Lucene.Net.Linq assembly.

Lazy field retrieval

Supposing a document like:

public class Book
{
  public string Title { get; set; }
  public string Author { get; set; }
  public string Text { get; set; }
}

And a query like:

from b in books
where b.Title == "foo" || b.Author == "bar"
select new { b.Title, b.Author };

Lucene.Net.Linq should not retrieve large, possibly compressed fields like Text since the client is not using that field.

Uri and fluent interface

It seems the Uri type is not supported in fluent configuration. Previously I had this attribute mapping:

        [Field("Url")]
        public IEnumerable<Uri> Urls { get; set; }

And now I'm using fluent configuration:

map.Property(x => x.Urls).ToField("Url");

This property does not appear in the index.
Version 3.2.53

Search on arbitrary number of homogenous terms / foo.Contains(x.Id)

If I try to do queriable.Where(x => listOfValues.Contains(x.Value)) I get an exception:

The binary operator Equal is not defined for the types 'System.String[]' and 'System.String'.

I wonder if there's a way to special-case this into a multi-term query, which should run efficiently in Lucene (although subject to the term limit, of course).
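Until something like that exists, a possible client-side workaround is to expand the list into a chain of `==` comparisons joined with `||` before handing the predicate to the provider. This is a sketch under assumptions: `PredicateBuilder.EqualsAny` is a hypothetical helper built only on System.Linq.Expressions, and whether Lucene.Net.Linq translates the resulting expression as hoped has not been verified here (other reports on this page do use `||` in where clauses).

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical helper; not part of Lucene.Net.Linq.
public static class PredicateBuilder
{
    // Builds x => selector(x) == v1 || selector(x) == v2 || ...
    // An empty value list yields x => false.
    public static Expression<Func<T, bool>> EqualsAny<T, TValue>(
        Expression<Func<T, TValue>> selector, IEnumerable<TValue> values)
    {
        Expression body = null;
        foreach (TValue value in values)
        {
            // Reuse the selector body so every comparison shares one parameter.
            Expression equals = Expression.Equal(
                selector.Body, Expression.Constant(value, typeof(TValue)));
            body = body == null ? equals : Expression.OrElse(body, equals);
        }

        if (body == null) body = Expression.Constant(false);
        return Expression.Lambda<Func<T, bool>>(body, selector.Parameters);
    }
}
```

It would be used as `queryable.Where(PredicateBuilder.EqualsAny<Item, string>(x => x.Value, listOfValues))`, where `Item` stands for the mapped document type. The number of values is still subject to Lucene's clause limit.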

Simple Search not Working

I have been trying to get a simple example working using the Fluent code. I have a simple Account class with two properties: AccountId and AccountName.

        public class Account
        {
            public int AccountId { get; set; }

            public string AccountName { get; set; }
        }

I am creating a directory in memory, adding two accounts and then searching for them. I am noticing that having a space in the AccountName is breaking the search. Based on some of your examples, I can't see why this isn't working. Could you give me a little insight?

        var version = Lucene.Net.Util.Version.LUCENE_30;
        var mapping = new ClassMap<Account>(version);
        mapping.Key(a => a.AccountId).AsNumericField();
        mapping.Property(a => a.AccountName).WithTermVector.Yes();

        var directory = new RAMDirectory();
        var provider = new LuceneDataProvider(directory, version);
        provider.Settings.EnableMultipleEntities = false;

        using (var session = provider.OpenSession(mapping.ToDocumentMapper()))
        {
            var account1 = new Account() { AccountId = 1, AccountName = "test account", };
            var account2 = new Account() { AccountId = 2, AccountName = "account test", };
            session.Add(account1, account2);
        }

        var accounts = from account in provider.AsQueryable<Account>(mapping.ToDocumentMapper())
                       where account.AccountName == "test"
                       orderby account.Score()
                       select account;

        foreach (var account in accounts)
        {
            Console.Out.WriteLine(account.AccountName);
        }

Integration with EF

Hi! Can you tell me whether this can be used with EF, or with a database without DbContext? I've read the documentation and didn't see any information about it.

Example not working

var articlesByJohn = from a in articles
                      where a.Author == "John Doe" && a.PublishDate > threshold
                      orderby a.Title
                      select a;

This expression gives me a "Children could not be evaluated" error in the debugger, and when calling .ToList() (or .Count(), ...), the returned collection is empty.

Exception is thrown trying to run example from readme

I'm trying to run the example from the README (it is a bit outdated) and it fails with this:

Sequence contains more than one element
at System.Linq.Enumerable.Single[TSource](IEnumerable`1 source)
at Lucene.Net.Linq.Util.AnalyzerExtensions.Analyze(Analyzer analyzer, String fieldName, String pattern)
at Lucene.Net.Linq.Mapping.ReflectionFieldMapper`1.EvaluateExpressionToStringAndAnalyze(Object value)
at Lucene.Net.Linq.Mapping.ReflectionFieldMapper`1.CreateRangeQuery(Object lowerBound, Object upperBound, RangeType lowerRange, RangeType upperRange)
at Lucene.Net.Linq.Translation.TreeVisitors.QueryBuildingExpressionTreeVisitor.CreateRangeQuery(IFieldMappingInfo mapping, QueryType queryType, LuceneQueryPredicateExpression lowerBoundExpression, LuceneQueryPredicateExpression upperBoundExpression)
at Lucene.Net.Linq.Translation.TreeVisitors.QueryBuildingExpressionTreeVisitor.VisitLuceneQueryPredicateExpression(LuceneQueryPredicateExpression expression)
at Lucene.Net.Linq.Clauses.TreeVisitors.LuceneExpressionTreeVisitor.VisitExtensionExpression(ExtensionExpression expression)
at Remotion.Linq.Clauses.Expressions.ExtensionExpression.Accept(ExpressionTreeVisitor visitor)
at Remotion.Linq.Parsing.ExpressionTreeVisitor.VisitExpression(Expression expression)
at Remotion.Linq.Parsing.ExpressionTreeVisitor.VisitBinaryExpression(BinaryExpression expression)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Index;
using Lucene.Net.Linq;
using Lucene.Net.Linq.Mapping;
using Lucene.Net.Store;
using ServiceStack.Text;

namespace ConsoleApplication2
{  
    public class Article
    {
        public string Author { get; set; }
        public string Title { get; set; }
        public DateTimeOffset PublishDate { get; set; }

        // Stores the field as a NumericField
        [NumericField]
        public long Id { get; set; }

        // Stores the field as text
        public int IssueNumber { get; set; }

        [Field(IndexMode.NotIndexed, Store = StoreMode.Yes)]
        public string BodyText { get; set; }

        // Maps to field "text"
        [Field("text", Store = StoreMode.No)]
        public string SearchText
        {
            get { return string.Join(" ", new[] { Author, Title, BodyText }); }
        }

        // Add IgnoreFieldAttribute to properties that should not be mapped to/from Document
        [IgnoreField]
        public string IgnoreMe { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            var directory = new RAMDirectory();
            var writer = new IndexWriter(directory, new StandardAnalyzer(Lucene.Net.Util.Version.LUCENE_30), IndexWriter.MaxFieldLength.UNLIMITED);

            var provider = new LuceneDataProvider(directory, writer.Analyzer, Lucene.Net.Util.Version.LUCENE_30, writer);

            // add some documents
            using (var session = provider.OpenSession<Article>())
            {
                session.Add(new Article { Author = "John Doe", BodyText = "some body text", PublishDate = DateTimeOffset.UtcNow });
            }

            var articles = provider.AsQueryable<Article>();

            var threshold = DateTimeOffset.UtcNow.Subtract(TimeSpan.FromDays(30));

            var articlesByJohn = from a in articles
                                 where a.Author == "John Doe" && a.PublishDate > threshold
                                 orderby a.Title
                                 select a;

            var searchResults = from a in articles
                                where a.SearchText == "some search query"
                                select a;

            Console.WriteLine(articlesByJohn.Dump());
            Console.WriteLine(searchResults.Dump());
            Console.Read();
        }
    }
}

Fluent mappings do not use custom converter for queries

I'm trying to save and query a DateTime as a numeric field (for performance), so I've added a custom TypeConverter to do the conversion. However, it seems the converter is only used when adding documents.
When querying, it defaults to the built-in DateTimeConverter and fails with a FormatException: "String was not recognized as a valid DateTime".
And shouldn't the value be fetched as a long, given that it's defined as a numeric field?

The same thing works when using attributes: the custom converter is used for both adding and querying (and is passed a long/Int64).
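For readers without the gist at hand, a converter of the kind described might look roughly like the sketch below. It is not the converter from the gist: the class name DateTimeTicksConverter is hypothetical, it relies only on System.ComponentModel.TypeConverter, and how Lucene.Net.Linq invokes a converter is not shown here and is an assumption.

```csharp
using System;
using System.ComponentModel;
using System.Globalization;

// Hypothetical converter that stores a DateTime as its UTC tick count (a long).
public class DateTimeTicksConverter : TypeConverter
{
    public override bool CanConvertFrom(ITypeDescriptorContext context, Type sourceType)
    {
        return sourceType == typeof(long) || base.CanConvertFrom(context, sourceType);
    }

    public override bool CanConvertTo(ITypeDescriptorContext context, Type destinationType)
    {
        return destinationType == typeof(long) || base.CanConvertTo(context, destinationType);
    }

    // long -> DateTime, used when reading a value back.
    public override object ConvertFrom(ITypeDescriptorContext context, CultureInfo culture, object value)
    {
        if (value is long)
        {
            return new DateTime((long)value, DateTimeKind.Utc);
        }
        return base.ConvertFrom(context, culture, value);
    }

    // DateTime -> long, used when writing a value.
    // Note: ToUniversalTime treats DateTimeKind.Unspecified values as local time.
    public override object ConvertTo(ITypeDescriptorContext context, CultureInfo culture, object value, Type destinationType)
    {
        if (destinationType == typeof(long) && value is DateTime)
        {
            return ((DateTime)value).ToUniversalTime().Ticks;
        }
        return base.ConvertTo(context, culture, value, destinationType);
    }
}
```

A query path that honours the custom converter would be expected to call ConvertTo(value, typeof(long)) on the query constant, the same way the add path does, instead of falling back to the built-in DateTimeConverter.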

I've created unit tests for the failing (fluent) and working (attributes) example in this gist
https://gist.github.com/TheoAndersen/8236625

It can also be found in this commit on my fork
TheoAndersen@408bab0

/Theo
