
xapian-haystack's Introduction

Xapian backend for Django-Haystack

Xapian-haystack is a backend of Django-Haystack for the Xapian search engine. Thanks for checking it out.

You can find more information about Xapian here.

Features

Xapian-Haystack provides all the standard features of Haystack:

  • Weighting
  • Faceted search (date, query, etc.)
  • Sorting
  • Spelling suggestions
  • EdgeNGram and Ngram (for autocomplete)

Limitations

The endswith search operation is not supported by Xapian-Haystack.

Requirements

  • Python 3+
  • Django 2.2+
  • Django-Haystack 2.8.0
  • Xapian 1.4+

Installation

First, install Xapian on your machine, e.g. with the provided script, install_xapian.sh. Run it after activating the virtual environment:

source <path>/bin/activate
./install_xapian.sh <version>

<version> must be >=1.4.0. This takes around 10 minutes.

Finally, install Xapian-Haystack by running:

pip install xapian-haystack

Configuration

Xapian is configured like the other Haystack backends. You have to define the connection to the database, which is a path to a directory, e.g.:

HAYSTACK_CONNECTIONS = {
    'default': {
        'ENGINE': 'xapian_backend.XapianEngine',
        'PATH': os.path.join(os.path.dirname(__file__), 'xapian_index')
    },
}

The backend has the following optional settings:

  • HAYSTACK_XAPIAN_LANGUAGE: the stemming language; the default is english and the list of available languages can be found here.
  • HAYSTACK_XAPIAN_WEIGHTING_SCHEME: a tuple with parameters to be passed to the weighting scheme BM25. By default, it uses the same parameters as Xapian recommends; this setting allows you to change them.
  • HAYSTACK_XAPIAN_FLAGS: the options used to parse AutoQueries; the default is FLAG_PHRASE | FLAG_BOOLEAN | FLAG_LOVEHATE | FLAG_WILDCARD | FLAG_PURE_NOT. See here for more information on what they mean.
  • HAYSTACK_XAPIAN_STEMMING_STRATEGY: This option lets you choose the stemming strategy used by Xapian. Possible values are STEM_NONE, STEM_SOME, STEM_ALL, STEM_ALL_Z, where STEM_SOME is the default. See here for more information about the different strategies.
  • XAPIAN_NGRAM_MIN_LENGTH, XAPIAN_NGRAM_MAX_LENGTH: options for custom configuration of ngrams (phrases) length.
  • HAYSTACK_XAPIAN_USE_LOCKFILE: Use a lockfile to prevent database locking errors when running management commands with multiple workers. Defaults to True.
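
As a rough illustration, the optional settings could sit next to HAYSTACK_CONNECTIONS in settings.py as sketched below. The values are illustrative assumptions, not recommendations. In particular, giving the stemming strategy by name as a string is an assumption about how the backend reads it, so check the backend source for your version.

```python
# settings.py (fragment) -- illustrative values only, not recommendations.

# Stemming language; the documented default is "english".
HAYSTACK_XAPIAN_LANGUAGE = "french"

# Assumption: the strategy is given by name, as one of the documented values.
HAYSTACK_XAPIAN_STEMMING_STRATEGY = "STEM_ALL"

# The documented default is True.
HAYSTACK_XAPIAN_USE_LOCKFILE = False

# Bounds for the ngram length; the numbers are made up for this sketch.
XAPIAN_NGRAM_MIN_LENGTH = 2
XAPIAN_NGRAM_MAX_LENGTH = 15
```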

Testing

Xapian-Haystack has a test suite that runs in continuous integration on GitHub Actions. The file .github/workflows/test.yml contains the steps required to run the test suite.

Source

The source code can be found on GitHub.

Credits

Xapian-Haystack is maintained by Jorge C. Leitão; David Sauve was the main contributor of Xapian-Haystack and Xapian-Haystack was originally funded by Trapeze. Claude Paroz is a frequent contributor. ANtlord implemented support for EdgeNgram and Ngram.

License

Xapian-haystack is free software licenced under GNU General Public Licence v2 and Copyright (c) 2009, 2010, 2011, 2012 David Sauve, 2009, 2010 Trapeze, 2014 Jorge C. Leitão. It may be redistributed under the terms specified in the LICENSE file.

Questions, Comments, Concerns:

Feel free to open an issue here or pull request your work.

You can ask questions on the django-haystack mailing list or in the #haystack IRC channel.

xapian-haystack's People

Contributors

ajslater, alexsilva, asedeno, claudep, jezdez, jorgecarleitao, jrast, karolyi, kneufeld, mop, notanumber, symroe, toastdriven, viorels, wshallum

xapian-haystack's Issues

filter with __gte, __gt, __lt or __lte doesn't have any schemas in the backend

I'm able to perform a search like this:

    searchqueryset = SearchQuerySet(site=site)
    searchqueryset.filter(date=datetime.date(2010, 1, 10))

But not like this:

    searchqueryset = SearchQuerySet(site=site)
    searchqueryset.filter(date__gte=datetime.date(2010, 1, 10))

Here's a truncated traceback:

    ...
    File "/home/peterbe/dev/DJANGO/xapian-haystack/xapian_backend.py", line 1021, in _filter_gte 
    pos, begin, end = vrp('%s:%s' % (field, _marshal_value(term)), '*') 
    TypeError: 'NoneType' object is not iterable 

Fuller description can be found here: http://groups.google.com/group/django-haystack/browse_thread/thread/87aeb8de7799a1ee

Current version of xapian_backend is (1, 1, 1, 'beta')

I've concluded that the TypeError is not really the problem. The problem is that self.backend.schema inside the call of XHValueRangeProcessor is an empty list.

Site passed to SearchQuerySet(site=my_site) doesn't get transferred to xapian backend

The site passed to SearchQuerySet(site=my_site) doesn't get transferred to the xapian backend.

If you do this in your view code:

   searchqueryset = SearchQuerySet(site=site)

That site that you have specified gets lost when using the xapian
backend by xapian_backend.py. xapian_backend ends up using the default
site instance from haystack.sites which might not be the one you want
to use. In my case, I've subclassed SearchSite in a custom class that
knows not to use the Django ORM.

This patch has two problems: 1) It's a patch that involves two
different projects and 2) I don't know how to add tests to prove it.
Current versions:

  • django-haystack: (1, 1, 0, 'alpha')
  • xapian-haystack: (1, 1, 1, 'beta')

If you apply the following change to xapian-haystack/
xapian_backend.py:

diff --git a/xapian_backend.py b/xapian_backend.py
index ec648a7..7036d69 100755
--- a/xapian_backend.py
+++ b/xapian_backend.py
@@ -795,16 +795,18 @@ class SearchQuery(BaseSearchQuery):
     It acts as an intermediary between the SearchQuerySet and the
     SearchBackend itself.
     """
-    def __init__(self, backend=None):
+    def __init__(self, backend=None, site=None):
         """
         Create a new instance of the SearchQuery setting the backend as
         specified. If no backend is set, will use the Xapian SearchBackend.
 
         Optional arguments:
             ``backend`` -- The ``SearchBackend`` to use (default = None)
+            `site` -- The site to use in the new `SearchBackend` if it
+                      needs to be created (default = None)
         """
         super(SearchQuery, self).__init__(backend=backend)
-        self.backend = backend or SearchBackend()
+        self.backend = backend or SearchBackend(site=site)
 
     def run(self, spelling_query=None):
         try:

And the following change to django-haystack/haystack/query.py:

diff --git a/haystack/query.py b/haystack/query.py
index 4fc96a0..503e2e9 100644
--- a/haystack/query.py
+++ b/haystack/query.py
@@ -13,7 +13,7 @@ class SearchQuerySet(object):
     Supports chaining (a la QuerySet) to narrow the search.
     """
     def __init__(self, site=None, query=None):
-        self.query = query or backend.SearchQuery()
+        self.query = query or backend.SearchQuery(site=site)
         self._result_cache = []
         self._result_count = None
         self._cache_full = False

Then the site you pass when you create a SearchQuerySet instance gets
used in the xapian backend.

_do_field_facets

Ignore my post on the haystack mailing list that says CharField works. Even when using CharField, if I use something like:

def prepare_vehicles(self, object):
        return [v.model for v in object.vehicles.all()]

Xapian haystack gets back a list in _do_field_facets(), where it then dies. I made the following change which seems to work with CharField, but not MultiValueField. I don't really understand all that's going on to be of much more help...

diff --git a/xapian_backend.py b/xapian_backend.py
index 0400610..462db6f 100644
--- a/xapian_backend.py
+++ b/xapian_backend.py
@@ -530,7 +530,11 @@ class SearchBackend(BaseSearchBackend):
             
             for result in results:
                 field_value = getattr(result, field)
-                facet_list[field_value] = facet_list.get(field_value, 0) + 1
+                if isinstance(field_value, list):
+                    for value in field_value:
+                        facet_list[value] = facet_list.get(value, 0) + 1
+                else:
+                    facet_list[field_value] = facet_list.get(field_value, 0) + 1
             
             facet_dict[field] = facet_list.items()
         
@@ -1020,4 +1024,4 @@ class SearchQuery(BaseSearchQuery):
         results = self.backend.more_like_this(self._mlt_instance, additional_query_string, **kwargs)
         self._results = results.get('results', [])
         self._hit_count = results.get('hits', 0)
-        
\ No newline at end of file
+  
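
The idea behind the patch can be tried outside the backend. The following is a standalone sketch, not the backend's code: `count_field_facets` is a hypothetical helper and the sample values are made up. It counts scalar values directly and flattens list values, which is what a multi-valued field hands back.

```python
from collections import Counter


def count_field_facets(values):
    """Count facet values, flattening list-valued (multi-value) fields."""
    counts = Counter()
    for value in values:
        if isinstance(value, (list, tuple)):
            # A multi-valued field contributes one count per element.
            counts.update(value)
        else:
            counts[value] += 1
    return counts


# Made-up field values: two scalars and two lists.
facets = count_field_facets(["sedan", ["sedan", "truck"], "truck", ["coupe"]])
```

Lists are unhashable, so using one as a dictionary key fails; flattening avoids that.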

Missing index should raise a better exception

Currently, when the content of the HAYSTACK_XAPIAN_PATH folder is empty (usually prior to the first reindex) and an attempt is made to search, a Xapian.DatabaseOpeningError is raised. This should be wrapped in a Haystack error with a friendlier message.

If the folder exists, but is empty, consider the index empty and don't raise an exception. If the folder does not exist, raise an exception.
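
A minimal sketch of the proposed check, using only the standard library. `MissingIndexError` and `index_is_empty` are hypothetical names chosen for illustration; a real fix would raise whichever Haystack exception fits and would live inside the backend.

```python
import os


class MissingIndexError(Exception):
    """Hypothetical stand-in for a Haystack-level error."""


def index_is_empty(path):
    """Return True for an existing but empty folder; raise if the folder is missing."""
    if not os.path.isdir(path):
        raise MissingIndexError(
            "Index folder %r does not exist; has the index been built?" % path
        )
    # An existing folder with no files is treated as an empty index.
    return not os.listdir(path)
```

A caller could return an empty result set when `index_is_empty(path)` is true instead of opening the database.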

Re-write SearchBackend to not use QueryParser

Re-write the SearchBackend and SearchQuery classes so that they do not make use of the QueryParser class and instead create and chain Query objects together.

This will allow a much greater control of how the queries are built and parsed and should eliminate strange ordering issues with OR, NOT, AND, etc not parsing as expected.

__startswith is highly inefficient

Our app with a reasonably large index (about 40k objects) was brought to its knees due to the use of startswith.

From my reading of the backend, it seems that the startswith query is finding all query terms which start with what you typed, making a python set out of them, and then passing them back into xapian to search against O_o

For my own use, I've hacked around this by using the parse_query SearchBackend method - it uses the FLAG_PARTIAL, which is all you really need.

import haystack
from haystack.query import SearchQuerySet

class StartswithSQ(haystack.backend.SearchQuery):
    def _filter_startswith(self, term, field, is_not, *args, **kwargs):
        """
        A rough hack at a more efficient startswith parser.

        This totally ignores the "field", doesn't handle is_not and just
        parses the term. But it's good enough for what we need here.

        """
        return haystack.backend.SearchBackend().parse_query(term)

matches = SearchQuerySet(query=StartswithSQ()).filter(text__startswith=q)

DatabaseModifiedError

I got this from an error report email. It has happened once or twice in the last few days.
I can't reproduce the error myself, but here is the traceback.
I use 1.1beta.

  File "/project_path/apps/search/views.py", line 70, in search_result
    facets = results.facet_counts()
  File "/project_path/haystack/query.py", line 403, in facet_counts
    return clone.query.get_facet_counts()
  File "/project_path/haystack/backends/__init__.py", line 430, in get_facet_counts
    self.run()
  File "/project_path/haystack/backends/__init__.py", line 354, in run
    results = self.backend.search(final_query, **kwargs)
  File "/project_path/haystack/backends/__init__.py", line 47, in wrapper
    return func(obj, query_string, *args, **kwargs)
  File "/project_path/haystack/backends/xapian_backend.py", line 370, in search
    app_label, module_name, pk, model_data = pickle.loads(match.document.get_data())
DatabaseModifiedError: The revision being read has been discarded - you should call Xapian::Database::reopen() and retry the operation

My view:

    results = search_form.search()
    hits = results.count()
    facets = results.facet_counts()

This error can occur on either "hits = results.count()" or "facets = results.facet_counts()".
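
The error message itself suggests reopening the database and retrying. Below is a minimal sketch of that pattern, written generically so that it runs without Xapian. `retry_after_reopen` is a hypothetical helper. In real use `error_type` would presumably be `xapian.DatabaseModifiedError` and `reopen` the database object's `reopen()` method.

```python
def retry_after_reopen(operation, reopen, error_type, attempts=3):
    """Run operation(); when it raises error_type, call reopen() and try again."""
    for attempt in range(attempts):
        try:
            return operation()
        except error_type:
            if attempt == attempts - 1:
                # Out of attempts: let the last error propagate.
                raise
            reopen()
```

Bounding the number of attempts keeps a constantly modified index from causing an endless loop.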

OR operator doesn't function as expected

When attempting to use the OR operator, expression OR expression, I receive an incorrect number of results. The number of results returned is lower than it should be.

>>> from haystack.query import SearchQuerySet
>>> sqs = SearchQuerySet()
>>> len(sqs.auto_query("javascript"))
8
>>> len(sqs.auto_query("css"))
6
>>> len(sqs.auto_query("javascript AND css"))
6
>>> len(sqs.auto_query("javascript OR css"))
4

As you can see, the AND query returns 6 results, so the OR query should return between 8 (the larger single-term count) and 14 results, not 4.
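
The expected size of the OR result follows from inclusion-exclusion, assuming each query counts distinct documents. A small sketch with the counts from the session above (`or_count` is a hypothetical helper):

```python
def or_count(count_a, count_b, count_both):
    """Size of (A OR B) by inclusion-exclusion: |A| + |B| - |A AND B|."""
    return count_a + count_b - count_both


# javascript: 8 results, css: 6 results, javascript AND css: 6 results
expected = or_count(8, 6, 6)
```

Under that assumption the OR query should return 8 results, twice the 4 that came back.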

Exact queries don't work on IntegerField and DateFields

In my application, using the filter method of a SearchQuerySet with an exact field lookup on IntegerFields and DateFields produces no results. However, chaining lte and gte field lookups does work on both of these fields.

I've provided appropriate pastes below to detail the situation in which this occurs:
SearchIndex class - http://dpaste.com/126825/
Interactive session showing issue - http://dpaste.com/126829/

If you have more questions or need more information, I will be idling in #haystack on freenode for at least the next week as 'smehmood'.

gt, lt, lte, and gte filters don't work with FloatField's

I can't seem to get gt, lt, lte, or gte to work with FloatFields. lte and gte always return 0 results, and gt and lt always return all results.

search_indexes.py:
from haystack import indexes, site
from job.models import Job

class JobIndex(indexes.SearchIndex):
    id = indexes.IntegerField(model_attr="id")
    lat = indexes.FloatField(model_attr="lat")
    lon = indexes.FloatField(model_attr="lon")
    date = indexes.DateTimeField(model_attr="date_added")

site.register(Job, JobIndex)

Output from shell:
from haystack.query import SearchQuerySet

# data
>>> sqs = SearchQuerySet()
>>> for job in sqs.order_by("id"):
...     print job.id
1
2
3
4
5
8
9
10
11
12
13
14
15
16
17

# IntegerField's work as expected
>>> sqs.filter(id__gt=15)
[<SearchResult: job.job (pk=16)>, <SearchResult: job.job (pk=17)>]

>>> sqs.filter(id__gte=15)
[<SearchResult: job.job (pk=15)>, <SearchResult: job.job (pk=16)>, 
          <SearchResult: job.job (pk=17)>]

>>> sqs.filter(id__lte=2)
[<SearchResult: job.job (pk=1)>, <SearchResult: job.job (pk=2)>]

>>> sqs.filter(id__lt=2)
[<SearchResult: job.job (pk=1)>]

# DateFields also work
>>> from datetime import datetime
>>> for job in sqs.filter(date__lte=datetime(2009, 8, 20)):
...     print job.date
2009-07-27 15:34:54.404177
2009-07-28 10:29:50.365970
2009-07-28 14:21:28.560279
2009-07-29 13:17:36.364374
2009-08-07 10:09:54.974200
2009-08-18 11:14:02.744569

>>> for job in sqs.filter(date__lt=datetime(2009, 8, 20)):
...     print job.date
2009-08-07 10:09:54.974200
2009-07-28 10:29:50.365970
2009-07-28 14:21:28.560279
2009-07-27 15:34:54.404177
2009-07-29 13:17:36.364374
2009-08-18 11:14:02.744569

>>> for job in sqs.filter(date__gt=datetime(2009, 8, 20)):
...     print job.date
2009-08-20 09:03:47.502073
2009-08-20 09:05:10.059170
2009-08-20 09:09:42.932609
2009-08-20 09:06:41.144744
2009-08-20 09:11:33.291274
2009-08-20 09:10:42.417625
2009-08-20 09:07:47.276048
2009-08-20 09:12:35.100536
2009-08-20 09:05:47.030590

>>> for job in sqs.filter(date__gte=datetime(2009, 8, 20)):
...     print job.date
2009-08-20 09:03:47.502073
2009-08-20 09:05:10.059170
2009-08-20 09:05:47.030590
2009-08-20 09:06:41.144744
2009-08-20 09:07:47.276048
2009-08-20 09:09:42.932609
2009-08-20 09:10:42.417625
2009-08-20 09:11:33.291274
2009-08-20 09:12:35.100536


# data
>>> for job in sqs:
...     print job.lat
32.847521
32.886601
32.718834
32.721742
32.718834
61.216531
39.294255
32.718834
32.718834
39.200763
32.718834
40.202799
32.718834
30.268735
39.172728

# FloatField's return none when using gte and lte
>>> sqs.filter(lat__gte=34).count()
0
>>> sqs.filter(lat__lte=34).count()
0

# FloatField's return all when using gt and lt
>>> sqs.filter(lat__lt=32).count()
15
>>> sqs.filter(lat__gt=70).count()
15

If I change the lat and lon fields to integer then it works, however that would not work for coordinates.

Untitled

I'm trying to do a simple filter by an integer value. The integer is being unicode encoded somewhere which is causing mass craziness. Verified issue with daniellindsley on #haystack.

joshua@symbiosis-staging:/var/www/vhosts/webrealtr.com$ sudo python pri*/pro*/my*/manage.py shell
/var/lib/python-support/python2.6/MySQLdb/__init__.py:34: DeprecationWarning: the sets module is deprecated
  from sets import ImmutableSet
Python 2.6.2 (release26-maint, Apr 19 2009, 01:56:41) 
[GCC 4.3.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
(InteractiveConsole)
>>> from haystack.query import SearchQuerySet
>>> SearchQuerySet().filter(price__gte=0)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File "/usr/local/lib/python2.6/dist-packages/haystack/query.py", line 39, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE])
  File "/usr/local/lib/python2.6/dist-packages/haystack/query.py", line 156, in __getitem__
    if not self._cache_is_full():
  File "/usr/local/lib/python2.6/dist-packages/haystack/query.py", line 59, in _cache_is_full
    return len(self._result_cache) >= len(self) - self._ignored_result_count
  File "/usr/local/lib/python2.6/dist-packages/haystack/query.py", line 48, in __len__
    return self.query.get_count()
  File "/usr/local/lib/python2.6/dist-packages/haystack/backends/__init__.py", line 284, in get_count
    self.run()
  File "/usr/local/lib/python2.6/dist-packages/haystack/backends/xapian_backend.py", line 983, in run
    results = self.backend.search(final_query, **kwargs)
  File "/usr/local/lib/python2.6/dist-packages/haystack/backends/xapian_backend.py", line 319, in search
    database, query_string, narrow_queries, boost
  File "/usr/local/lib/python2.6/dist-packages/haystack/backends/xapian_backend.py", line 724, in _query
    query = qp.parse_query(query_string, self._flags(query_string))
TypeError: : Swig director type mismatch in output value of type '(Xapian::valueno, std::string, std::string)'
>>> 

When you get a chance ;)

Permission on indexes differs between reindex command and internal methods

Permission on indexes differs between reindex command and internal methods.

This is the result of ./manage.py reindex running as the current user, and the internal methods running under different (web) user. Perhaps there needs to be a check performed prior to accessing the indexes in order to ensure they can be properly read and written.
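
One possible shape for such a check, sketched with the standard library only. `index_is_accessible` is a hypothetical helper, not part of the backend. Note that `os.access` tests the real user id of the process, so it reflects the user the code actually runs as.

```python
import os


def index_is_accessible(path):
    """Return True if this process can read, write and traverse the index folder."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.W_OK | os.X_OK)
```

Running the same check from the management command and from the web process would show whether the two users see the index differently.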

Index Field with index=False raises KeyError

Here's the end of the traceback:
.../xapian_backend.py", line 193, in update
document.add_value(field['column'], self._from_python(value))
KeyError: 'column'

I put a print of field just before that line; just before the failure it outputs:
{'indexed': 'false', 'type': 'text', 'field_name': 'slug', 'multi_valued': 'false'}

Here's the index Field for that field:
slug = indexes.CharField(indexed=False, model_attr='slug')

date facet issue when gap by month and gap amount larger than 1

I added some tests to /tests/xapian_backend.py:

        results = self.sb.search('index', date_facets={'pub_date': {'start_date': datetime.datetime(2008, 10, 26), 'end_date': datetime.datetime(2009, 3, 27), 'gap_by': 'month', 'gap_amount': 3}})
        self.assertEqual(results['hits'], 3)
        self.assertEqual(results['facets']['dates']['pub_date'], [
            ('2009-01-26T00:00:00', 3),
            ('2008-10-26T00:00:00', 0),
        ])

        results = self.sb.search('index', date_facets={'pub_date': {'start_date': datetime.datetime(2008, 10, 26), 'end_date': datetime.datetime(2009, 3, 27), 'gap_by': 'month', 'gap_amount': 2}})
        self.assertEqual(results['hits'], 3)
        self.assertEqual(results['facets']['dates']['pub_date'], [
            ('2009-02-26T00:00:00', 0),
            ('2008-12-26T00:00:00', 3),
            ('2008-10-26T00:00:00', 0),
        ])

        results = self.sb.search('index', date_facets={'pub_date': {'start_date': datetime.datetime(2005, 10, 26), 'end_date': datetime.datetime(2009, 11, 27), 'gap_by': 'month', 'gap_amount': 15}})
        self.assertEqual(results['hits'], 3)
        self.assertEqual(results['facets']['dates']['pub_date'], [
             ('2009-07-26T00:00:00', 0),
             ('2008-04-26T00:00:00', 3),
             ('2007-01-26T00:00:00', 0),
             ('2005-10-26T00:00:00', 0)
        ])

I think the code around line 619 of xapian_backend.py should be changed:

            elif gap_type == 'month':
                if date_range.month == 12:
                    date_range = date_range.replace(
                        month=1, year=date_range.year + int(gap_value)
                    )
                else:
                    date_range = date_range.replace(
                        month=date_range.month + int(gap_value)
                    )

to this:

            elif gap_type == 'month':
                if date_range.month + int(gap_value) > 12:
                    add_year = (date_range.month + int(gap_value))/12
                    new_month = (date_range.month + int(gap_value))%12
                    date_range = date_range.replace(
                        month=new_month, year=date_range.year + add_year
                    )
                else:
                    date_range = date_range.replace(
                        month=date_range.month + int(gap_value)
                    )
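
One edge case in the proposed change: the modulo yields month 0 whenever the month plus the gap is a multiple of 12 (for example December with a 12-month gap), which `replace` rejects. Below is a standalone sketch of month arithmetic that avoids this by working with zero-based months. `add_months` is a hypothetical helper, and clamping the day to the end of the target month is an added assumption, not something the issue asks for.

```python
import calendar
import datetime


def add_months(date, months):
    """Return date shifted forward by a whole number of months."""
    # Work with a zero-based month so that divmod never produces month 0.
    years, month_index = divmod(date.month - 1 + months, 12)
    year = date.year + years
    month = month_index + 1
    # Clamp the day so that e.g. 31 January + 1 month gives the last day of February.
    day = min(date.day, calendar.monthrange(year, month)[1])
    return date.replace(year=year, month=month, day=day)
```

With the start dates from the tests above, a 3-month gap from 2008-10-26 gives 2009-01-26, and repeated 15-month gaps from 2005-10-26 give 2007-01-26, 2008-04-26 and 2009-07-26.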

__in filter combined with punctuation drops valid results

For instance, given a valid search like so:

print 1, [i.organization_title for i in item_results]
item_results = item_results.filter(organization_title__in=organization_titles)
print 2, organization_titles
print 3, [i.organization_title for i in item_results]

The output is:

1 [u'Carefirst Family Health Team', u'FRASER VALLEY CHILD DEVELOPMENT CENTRE FOUNDATION', u'ROJAERLOJO FOUNDATION', u'TIMMINS LEARNING CENTRE INC.']
2 [u'Carefirst Family Health Team', u'FRASER VALLEY CHILD DEVELOPMENT CENTRE FOUNDATION', u'ROJAERLOJO FOUNDATION', u'TIMMINS LEARNING CENTRE INC.']
3 [u'Carefirst Family Health Team', u'FRASER VALLEY CHILD DEVELOPMENT CENTRE FOUNDATION', u'ROJAERLOJO FOUNDATION']

If you remove the "." from the TIMMINS value before passing it in, it works. It seems single quotes also suffer from this issue.

order_by not working?

I can't get the order of my queryset to change... for instance, here's my weird console output:

In [6]: [r.popularity for r in  SearchQuerySet().order_by('-popularity')[:5]]
Out[6]: [35.0, 834.0, 521.0, 146.0, 972.0]

more_like_this error

I'm getting an error when I try to use more_like_this():

>>> SearchQuerySet().more_like_this(p)
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib/python2.6/site-packages/haystack/query.py", line 39, in __repr__
    data = list(self[:REPR_OUTPUT_SIZE])
  File "/usr/lib/python2.6/site-packages/haystack/query.py", line 153, in __getitem__
    if not self._cache_is_full():
  File "/usr/lib/python2.6/site-packages/haystack/query.py", line 59, in _cache_is_full
    return len(self._result_cache) >= len(self) - self._ignored_result_count
  File "/usr/lib/python2.6/site-packages/haystack/query.py", line 48, in __len__
    return self.query.get_count()
  File "/usr/lib/python2.6/site-packages/haystack/backends/__init__.py", line 272, in get_count
    self.run_mlt()
  File "/usr/lib/python2.6/site-packages/haystack/backends/xapian_backend.py", line 943, in run_mlt
    results = self.backend.more_like_this(self._mlt_instance, additional_query_string, **kwargs)
TypeError: more_like_this() takes exactly 2 non-keyword arguments (3 given)

With complex queries, SearchQuerySet.exclude() filters instead of excludes

Okay, here's my SearchQuerySet query, which should exclude all old or far-future events from our search results:

sqs = sqs.exclude(
    SQ(contenttype='events.event') &
    (
        SQ(date__lte=(today - SITE_SEARCH_EVENTS_FROM)) |
        SQ(date__gte=(today + SITE_SEARCH_EVENTS_TO))
    )
)

Here is the SQ.__repr__() it generates (the sites part is applied to the query earlier on):

<SQ: AND (content__exact=old AND content__exact=97 AND sites__in=1 AND NOT ((contenttype__exact=events.event AND (date__lte=2010-01-15 OR date__gte=2010-04-22))))>

Here is the xapian.Query.get_description() that the SQ produces:

Xapian::Query(((Zold OR old) AND (Z97 OR 97) AND (ZXSITES1 OR XSITES1) AND (ZXCONTENTTYPEevents.ev OR XCONTENTTYPEevents.event) AND (VALUE_RANGE 15 00010101000000 20100115000000 OR VALUE_RANGE 15 20100422000000 99990101000000)))

Because of the 3rd AND, this filters by my exclusion instead of excluding it. It should be AND_NOT (ZXCONTENTTYPE...

If I could only change that one AND to an AND_NOT, my problems would be over for now. I've been unable to do it with my own experimentation, however.

Thank you again for any help you can provide!

Spelling suggestions includes content type words

When I try to enable spelling suggestions, I get some pretty crazy results. :-) On a search for "old 97's", I get the spelling suggestion "blogentry newsstory eventsnew". I'll try to correct this in my fork.

SearchQuery doesn't seem to handle multiple exclude attributes correctly

I've tried using the SearchQuerySet.filter() method, and I've tried using SearchQuery.add_filter() with SQ objects. Both get the same result. Here's some output from ipdb:

ipdb> sqs
Out[0]: [<SearchResult: restaurants.restaurant (pk=621)>, <SearchResult: places.place (pk=5159)>, <SearchResult: events.event (pk=509)>, <SearchResult: restaurants.restaurant (pk=487)>, <SearchResult: restaurants.restaurant (pk=518)>, <SearchResult: restaurants.restaurant (pk=589)>, <SearchResult: restaurants.restaurant (pk=604)>, <SearchResult: restaurants.restaurant (pk=649)>, <SearchResult: restaurants.restaurant (pk=673)>, <SearchResult: restaurants.restaurant (pk=548)>, <SearchResult: restaurants.restaurant (pk=557)>, <SearchResult: restaurants.restaurant (pk=608)>, <SearchResult: restaurants.restaurant (pk=626)>, <SearchResult: restaurants.restaurant (pk=704)>, <SearchResult: restaurants.restaurant (pk=508)>, <SearchResult: restaurants.restaurant (pk=592)>, <SearchResult: restaurants.restaurant (pk=593)>, <SearchResult: restaurants.restaurant (pk=347)>, <SearchResult: restaurants.restaurant (pk=989)>, '...(remaining elements truncated)...']
ipdb> sqs.filter(contenttype='events.event', date__lte=today)
Out[0]: [<SearchResult: events.event (pk=509)>]
ipdb> sqs.exclude(contenttype='events.event', date__lte=today)
Out[0]: []
ipdb> 

In the output above, [<SearchResult: events.event (pk=509)>] should have been filtered out of the sqs, but instead it returned a completely empty sqs.

This looks like a xapian_backend.py problem, since I get the same results when manipulating SQ objects directly. I'm going to continue trying to resolve the issue today, and I'll post more here if I make progress.

Thanks!

Add multi-field ordering

Currently, order_by only works with one field. Extend this to use Xapian::MultiValueSorter in order to allow sorting by multiple fields.

gte and lte filters raise QueryParserError

According to the documentation the following field lookups are supported: exact, gt, gte, lt, lte, in, startswith. Using a simple search_index with a FloatField an error is raised when using gte or lte.

search_indexes.py:
class JobIndex(indexes.SearchIndex):
    text = indexes.CharField(document=True)
    title = indexes.CharField(model_attr="title")
    lat = indexes.FloatField(model_attr="lat")
    lon = indexes.FloatField(model_attr="lon")

>>> from haystack.query import SearchQuerySet
>>> SearchQuerySet()
[<SearchResult: job.job (pk='2')>, <SearchResult: job.job (pk='3')>,     <SearchResult: job.job (pk='1')>]
>>> sqs.filter(lat__gt=0)
[]    
>>> sqs.filter(lat__gte=0)
...
haystack/backends/xapian_backend.pyc in search(self, query_string, sort_by, start_offset, end_offset, fields, highlight, facets, date_facets, query_facets, narrow_queries, **kwargs)
--> 276             query = qp.parse_query(query_string, flags)
277             if getattr(settings, 'HAYSTACK_INCLUDE_SPELLING', False) is True:
278                 spelling_suggestion = qp.get_corrected_query_string()
QueryParserError: Syntax: <expression> NOT <expression>

The query_string is set to: NOT lat:*..0

The same happens when using lte, however lt works.

Wildcard

Queries with wildcards give no results.

Multi-value fields

It appears that xapian-haystack may not currently handle multi-value field indexes as shown in the haystack documentation of the prepare statement:

def prepare(self, object):
        self.prepared_data = super(NoteIndex, self).prepare(object)

        # Add in tags (assuming there's a M2M relationship to Tag on the model).
        # Note that this would NOT get picked up by the automatic
        # schema tools provided by Haystack.
        self.prepared_data['tags'] = [tag.name for tag in object.tags.all()]

        return self.prepared_data

Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/lib/python2.6/site-packages/haystack/query.py", line 393, in facet_counts
    return clone.query.get_facet_counts()
  File "/usr/lib/python2.6/site-packages/haystack/backends/__init__.py", line 312, in get_facet_counts
    self.run()
  File "/usr/lib/python2.6/site-packages/haystack/backends/xapian_backend.py", line 966, in run
    results = self.backend.search(final_query, **kwargs)
  File "/usr/lib/python2.6/site-packages/haystack/backends/xapian_backend.py", line 348, in search
    facets_dict['fields'] = self._do_field_facets(results, facets)
  File "/usr/lib/python2.6/site-packages/haystack/backends/xapian_backend.py", line 525, in _do_field_facets
    facet_list[field_value] = facet_list.get(field_value, 0) + 1
TypeError: unhashable type: 'list'

Release notes inaccurate regarding Xapian and mod_python

Requirements

[...]

  • Xapian 1.0.13.X (May work with earlier versions, but untested)
  • mod_wsgi 1.3.X

Notes

  • Due to an issue with mod_python causing deadlocks with Xapian (http://trac.xapian.org/ticket/185), mod_python is not supported with xapian-haystack. It may work, with some tweaking, but your mileage will vary.

Ticket #185 was fixed in Xapian 1.0.13 (by avoiding the situation which causes mod_python to deadlock), so if people follow the requirements, this isn't an issue.

It's probably wise to choose mod_wsgi over mod_python, as the former seems better maintained now, but these deadlocks are no longer a reason to avoid mod_python.

Also, it's "Xapian 1.0.13" not "Xapian 1.0.13.X".

Percent attribute

It would be nice to have a percent (%) attribute, like the current score attribute.

I think Xapian already provides this, but Haystack does not.

Investigate start_offset and end_offset

Need to investigate whether start_offset and end_offset are working as expected in regards to Solr and Whoosh.

Currently, start_offset is 0 indexed and end_offset is not actually the offset, but is the maximum number of values to return.
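
If end_offset were treated as a true slice bound, converting to a "first result, maximum number of results" pair (the shape that Xapian's Enquire.get_mset takes) is simple arithmetic. A minimal sketch, with `slice_to_first_and_max` as a hypothetical helper:

```python
def slice_to_first_and_max(start_offset, end_offset):
    """Convert slice-style bounds to a (first, maxitems) pair."""
    if end_offset is None:
        raise ValueError("end_offset is required in this sketch")
    if start_offset < 0 or end_offset < start_offset:
        raise ValueError("invalid offsets")
    # The number of items is the distance between the two bounds.
    return start_offset, end_offset - start_offset
```

For example, the slice [10:30] maps to first=10 and maxitems=20.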
