whoosh-community / whoosh
Whoosh is a fast, featureful full-text indexing and searching library implemented in pure Python.
License: Other
Original report by Marcin Kuzminski (Bitbucket: marcinkuzminski, GitHub: marcinkuzminski).
The ability to use spans is a great feature. When I build an index from scratch it works fine,
but reindexing that builds a second index without merging it into the default one raises this exception:
NotImplementedError: spans not implemented in <class 'whoosh.matching.MultiMatcher'>
Is it possible to implement spans in MultiMatcher?
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
Original report by Chris Dent (Bitbucket: cdent, GitHub: cdent).
Parsing a query with a large number of 'OR' statements in it, with the default QueryParser will cause
RuntimeError: maximum recursion depth exceeded while calling a Python object
Here's a basic test case
>>> from whoosh.fields import Schema, TEXT
>>> from whoosh.qparser import QueryParser
>>> schema = Schema(content=TEXT())
>>> parser = QueryParser("content", schema=schema)
>>> x = parser.parse(u"1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1 OR 1")
What matters here is not the values being queried between the ORs but the number of ORs. Take one away and recursion depth is not hit. The above number of ORs may seem odd but comes about as the result of automatic query generation (there are many layers of processing in this situation). The actual query is more than this, but the above is the minimum to tickle the problem.
Is this a fundamental limitation in the parser or a fixable bug?
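The failure comes from each OR adding another level of recursion in the parse tree. A minimal illustration of the difference between a nested (binary) and a flat (N-ary) representation, using stand-in tuples rather than Whoosh's actual query objects:

```python
# Hypothetical sketch, not Whoosh code: a parser that nests every OR
# one level deeper recurses once per clause and hits Python's recursion
# limit, while a flat N-ary node has constant depth.
def nested_or(terms):
    # one level of recursion per OR clause
    if len(terms) == 1:
        return ("term", terms[0])
    return ("or", ("term", terms[0]), nested_or(terms[1:]))

def flat_or(terms):
    # all clauses collected into a single node: constant depth
    return ("or",) + tuple(("term", t) for t in terms)

# The flat form handles any number of clauses without recursion
assert len(flat_or(["1"] * 10000)) == 10001
```

Whether the parser can be restructured to build the flat form directly (rather than recursing per operator) is essentially the question being asked.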
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
Don't call _read_values() for a block until someone asks for a value().
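The idea can be sketched as follows (a hypothetical class, not Whoosh's actual block reader): keep the raw bytes around, decode only on the first value() call, and cache the result.

```python
# Sketch of the proposed laziness, under the assumption that decoding
# posting values is the expensive step. LazyBlock and its fields are
# made up for illustration.
class LazyBlock:
    def __init__(self, raw):
        self._raw = raw          # undecoded posting values
        self._values = None      # decoded lazily
        self.reads = 0           # instrumentation for the example

    def _read_values(self):
        # stands in for the expensive decode step
        self.reads += 1
        return [int(x) for x in self._raw.split(b",")]

    def value(self, i):
        if self._values is None:      # first access triggers the read
            self._values = self._read_values()
        return self._values[i]

block = LazyBlock(b"10,20,30")
assert block.reads == 0   # nothing decoded yet
assert block.value(1) == 20
assert block.reads == 1   # decoded exactly once, then cached
```

Matchers that only need ids and weights would then never pay for value decoding at all.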
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
aback42 on the forum:
In Whoosh 1.0.0b2,
when searching on 8.2G index this error occurs:
Traceback (most recent call last):
File "main.py", line 241, in miadvsearch_activate_cb
File "main.py", line 259, in search_stmt
File "cireader.pyc", line 332, in search
File "searchengine\searching.pyc", line 29, in search
File "whoosh\searching.pyc", line 351, in search
File "whoosh\query.pyc", line 466, in matcher
File "whoosh\searching.pyc", line 119, in postings
File "whoosh\reading.pyc", line 465, in postings
File "whoosh\filedb\filereading.pyc", line 238, in postings
File "whoosh\filedb\filepostings.pyc", line 252, in __init__
File "whoosh\filedb\filepostings.pyc", line 396, in _next_block
File "whoosh\filedb\filepostings.pyc", line 380, in _consume_block
File "whoosh\filedb\filepostings.pyc", line 353, in _read_values
File "whoosh\filedb\structfile.pyc", line 110, in __getitem__
OverflowError: long int too large to convert to int
Original report by Anonymous.
Original report by Marcin Kuzminski (Bitbucket: marcinkuzminski, GitHub: marcinkuzminski).
I'm using Whoosh as an indexer in my app, which scans over Mercurial repositories. When marking search results using the highlight module, it can sometimes kill my server. For example, after building an index on a few dictionary files, searching for one word from that index makes my app freeze for 2-4 minutes at 100% CPU usage just to highlight the one word in the content.
This is how I do it; it's not a complicated analyzer or formatter.
#!python
analyzer = RegexTokenizer(expression=r"\w+") | LowercaseFilter()
formatter = HtmlFormatter('span',
                          between='\n<span class="break">...</span>\n')
# how the parts are split within the same text part
fragmenter = SimpleFragmenter(200)
#fragmenter = ContextFragmenter(search_items)
for res in results:
    d = {}
    d.update(res)
    hl = highlight(escape(res['content']), search_items,
                   analyzer=analyzer,
                   fragmenter=fragmenter,
                   formatter=formatter,
                   top=5)
It would be awesome if it could be improved.
Original report by Daniel Lindsley (Bitbucket: toastdriven, GitHub: toastdriven).
While struggling with a bug of my own, I fleshed out some open range tests for the NUMERIC
type. They pass for me, which means a more complete test suite for you, but sadly don't reveal my bug. Patch included.
Original report by Adam Blinkinsop (Bitbucket: blinks, GitHub: blinks).
Looks like it's caused by line 908 of whoosh.query:
#!python
return [word for word in self.words if (fieldname, word) in ixreader]
This won't return anything if the words don't exist in the reader, so if I'm looking for the reverse of existing terms (non-existent terms?), I get nothing.
Original report by jdubery (Bitbucket: jdubery, GitHub: Unknown).
Function whoosh.query.Phrase.eq() has [and self.words == other.word and] in its return statement; it should have [and self.words == other.words and]
cheers, John
Original report by eevee (Bitbucket: eevee, GitHub: eevee).
I ran into this using searcher.search(sorted_by=(u'foo', u'bar')), done mostly out of habit. The actual field names don't contain any non-ASCII characters, but merely naming them with unicodes instead of strs probably shouldn't crash.
Actual problem is that encode_termkey does "%s %s" % (fieldname, encoded_text); having the fieldname be a unicode forces the returned string to also be a unicode, which tries to decode the encoded text as ascii, which generally fails since it's actually encoded UTF-8.
Attached patch fixes this and adds a test. Also fixes the test_random_termkeys test; it would intermittently bomb because it can pick characters from the D800–DFFF range, which are all used for surrogate pairs and aren't real characters, so they don't roundtrip when encoded+decoded as utf8.
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
Add a method for iterating through all documents that yields both docnums and fields. This would make e.g. finding out-of-date documents and deleting them easier.
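A sketch of what the method might look like, built on reader calls that already exist (doc_count_all(), is_deleted(), stored_fields()); the iter_docs name itself is the proposal, and a stand-in reader is used here so the sketch is runnable:

```python
# Proposed iteration over all live documents, yielding (docnum, fields).
def iter_docs(reader):
    for docnum in range(reader.doc_count_all()):
        if not reader.is_deleted(docnum):
            yield docnum, reader.stored_fields(docnum)

# Stand-in reader for illustration only
class FakeReader:
    def __init__(self, docs, deleted):
        self._docs, self._deleted = docs, deleted
    def doc_count_all(self):
        return len(self._docs)
    def is_deleted(self, docnum):
        return docnum in self._deleted
    def stored_fields(self, docnum):
        return self._docs[docnum]

reader = FakeReader([{"path": "a"}, {"path": "b"}, {"path": "c"}], {1})
assert list(iter_docs(reader)) == [(0, {"path": "a"}), (2, {"path": "c"})]
```

With this, finding stale documents becomes a single pass comparing each yielded fields dict against the source data.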
Original report by Collin Anderson (Bitbucket: collinmanderson, GitHub: Unknown).
I want to use AsyncWriter to solve my LockError problems, but I get this error when calling delete_document:
_record() takes exactly 4 arguments (3 given)
The problem is that delete_document passes docnum instead of passing *args and **kwargs like the others.
Original report by Stéphane Démurget (Bitbucket: zzrough, GitHub: zzrough).
(I hope creating new issues at bitbucket instead of the old Trac system is okay)
I've encountered http://whoosh.ca/ticket/70 (file lock fd not closed, that you fixed when importing whoosh into bitbucket -- this was the original point of my patch).
I'm opening a writer for each write in a FastCGI application (writes are not so frequent, but still, the system is exhausted of fds as the processes are reused).
Here's a small patch that tries to close the fd even if the flock fails. This also ensures the object state stays in sync (self.locked and self.fd). I'm silencing eventual close errors so as not to mask the real exception raised, until logging is available (I saw #6 here). I also fixed a missing self.locked update in the msvcrt case.
I do not have access to the box leaking fds on my side at the moment, but I should get an lsof or /proc/xxx/fd listing soon to confirm it's only the index not being properly closed. Still, your fix might only address part of #70, since the lsof output shows the fds of the opened segments (though that might just be the reporter's program indexing when lsof was launched, in which case there is already no more leak).
Original report by Anonymous.
When I type the search text (Tamil) "அம்மா" and do a search, the query searches only for a portion of the text, "அம", leading me to a wrong result.
What I could see from the library is that the data sent from the parser class was correct. When it calls the Term class in query.py, the data assigned to self.text contains only "அம" instead of "அம்மா". At the calling site I can see the full text; only when the data is assigned to self.text do I get this issue.
But when I search in English, I get the correct result.
How do I rectify this issue? Any update would be appreciated.
Thanks,
Veera
Original report by Stavros Korokithakis (Bitbucket: Stavros, GitHub: Stavros).
Deleted documents are still returned when the query is just "*". Is this by design? I couldn't find anything in the docs. How can I get rid of these results?
Original report by Nicolas Vandamme (Bitbucket: nvandamme, GitHub: nvandamme).
Hi,
Running in a Pylons environment, I open the index on every search request in my controller. However, on each request the index never closes its file descriptors, whether I call ix.close() or not, leading to IOError: [Errno 24] Too many open files. Is there a way to manually close all index files?
Original report by Anonymous.
I created some data files on OS X 64-bit and used them on WinXP 32-bit, then got the following error.
ERROR:root:Uncaught exception GET /search?query=tcp&index_name=seed (127.0.0.1)
HTTPRequest(protocol='http', host='127.0.0.1:10000', method='GET', uri='/search?query=tcp&index_name=seed', version='HTTP/1.1', remote_ip='127.0.0.1', remote_ip='127.0.0.1', body='', headers={'Accept-Language': 'zh-CN,zh;q=0.8', 'Accept-Encoding': 'gzip,deflate,sdch', 'Host': '127.0.0.1:10000', 'Accept': 'application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5', 'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.472.59 Safari/534.3 ChromePlus/1.4.2.0alpha1', 'Accept-Charset': 'GBK,utf-8;q=0.7,*;q=0.3', 'Connection': 'keep-alive', 'Referer': 'http://127.0.0.1:10000/'})
Traceback (most recent call last):
  File "C:\Python26\lib\site-packages\tornado-1.1-py2.6.egg\tornado\web.py", line 810, in _stack_context
    yield
  File "C:\Python26\lib\site-packages\tornado-1.1-py2.6.egg\tornado\stack_context.py", line 77, in StackContext
    yield
  File "C:\Python26\lib\site-packages\tornado-1.1-py2.6.egg\tornado\web.py", line 827, in _execute
    getattr(self, self.request.method.lower())(*args, **kwargs)
  File "E:\lee\luna\aio\core\search_manager.py", line 57, in get
    page_length = page_length)
  File "E:\lee\luna\aio\core\indexer_searcher.py", line 55, in esearch_seed
    page_length = page_length)
  File "E:\lee\luna\aio\core\_search_whoosh_backend.py", line 80, in esearch
    ix = _index.open_dir(datapath, indexname)
  File "C:\Python26\lib\site-packages\whoosh-1.0.0-py2.6.egg\whoosh\index.py", line 97, in open_dir
    return storage.open_index(indexname)
  File "C:\Python26\lib\site-packages\whoosh-1.0.0-py2.6.egg\whoosh\filedb\filestore.py", line 49, in open_index
    return FileIndex(self, schema=schema, indexname=indexname)
  File "C:\Python26\lib\site-packages\whoosh-1.0.0-py2.6.egg\whoosh\filedb\fileindex.py", line 218, in __init__
    _read_toc(self.storage, self._schema, self.indexname)
  File "C:\Python26\lib\site-packages\whoosh-1.0.0-py2.6.egg\whoosh\filedb\fileindex.py", line 126, in _read_toc
    check_size("long", _LONG_SIZE)
  File "C:\Python26\lib\site-packages\whoosh-1.0.0-py2.6.egg\whoosh\filedb\fileindex.py", line 123, in check_size
    raise IndexError("Index was created on different architecture: saved %s = %s, this computer = %s" % (name, sz, target))
IndexError: Index was created on different architecture: saved long = 4, this computer = 8
ERROR:root:500 GET /search?query=tcp&index_name=seed (127.0.0.1) 62.00ms
Original report by Marcin Kuzminski (Bitbucket: marcinkuzminski, GitHub: marcinkuzminski).
In whoosh/query.py in the b13 version:
#!python
from whoosh.matching import (AndMaybeMatcher, DisjunctionMaxMatcher,
                             ListMatcher, IntersectionMatcher, InverseMatcher,
                             NullMatcher, PhraseMatcher, RequireMatcher,
                             UnionMatcher, WrappingMatcher)
PhraseMatcher is imported here but is commented out in matching.py, so this raises an ImportError.
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
query.Phrase should use "position boost" instead of "position" and use the boosts to calculate the score of the phrase.
Right now the scorer's _poses() method returns a list of positions using scorer.value_as("positions"), so this would have to be changed to use a list of (position, score) tuples.
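A rough sketch of the proposed change, with made-up names and data rather than the real scorer: _poses() would return (position, boost) pairs, and the phrase score would combine the boosts of the positions that actually matched.

```python
# Hypothetical illustration of scoring from (position, boost) pairs;
# poses_with_boosts and phrase_score are invented for this sketch.
def poses_with_boosts():
    # value_as("position_boosts") would yield pairs like these
    return [(0, 1.0), (3, 2.0), (7, 1.0)]

def phrase_score(matched_positions, pairs):
    # sum the boosts of the positions where the phrase matched
    boosts = dict(pairs)
    return sum(boosts[p] for p in matched_positions)

assert phrase_score([3, 7], poses_with_boosts()) == 3.0
```

The real change would replace the flat position list in _poses() with these pairs and thread the boost through the phrase-matching logic.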
Original report by Olexiy Strashko (Bitbucket: olexiy_strashko, GitHub: Unknown).
Hi, I will describe problem and fix here:
Environment: haystack 1.1.0a and whoosh 0.3.18.
Problem: when the writer is locked, AsyncWriter raises an exception:
#!python
File "/home/bogushtime/prod/third_party_apps/haystack/indexes.py", line 152, in update_object
self.backend.update(self, [instance])
File "/home/bogushtime/prod/third_party_apps/haystack/backends/whoosh_backend.py", line 161, in update
writer.update_document(**doc)
File "/home/bogushtime/prod/third_party_apps/whoosh/writing.py", line 215, in update_document
self._record("update_document", *args, **kwargs)
File "/home/bogushtime/prod/third_party_apps/whoosh/writing.py", line 194, in _record
self.events.add(method, args, kwargs)
AttributeError: 'list' object has no attribute 'add'
I've investigated the code, and I found the problem:
Code from AsyncWriter:
#!python
self.events.add(method, args, kwargs)
That's the problem: a list has no add method, and a tuple should be appended instead.
So the fix is this change in the _record method:
#!python
self.events.append((method, args, kwargs))
When no locking occurs, everything works great.
Thanks for whoosh!
With respect, Olexiy.
Original report by Anonymous.
Tried both easy_install and trunk install. Same result:
ImportError Traceback (most recent call last)
/home/martin/Downloads/ in ()
Thanks!
Original report by jdubery (Bitbucket: jdubery, GitHub: Unknown).
Hi,
just had the following exception generated in whoosh code (whoosh 0.3.18) ...
File "c:\python25\lib\site-packages\whoosh\searching.py", line 266, in search
scored_list = sorter.order(self, query.docs(self), reverse=reverse)
File "C:\Python25\lib\site-packages\whoosh\query.py", line 184, in docs
return self.scorer(searcher).all_ids()
File "C:\Python25\lib\site-packages\whoosh\query.py", line 665, in scorer
return InverseScorer(scorer, reader.doc_count_all(), reader.is_deleted)
File "C:\Python25\lib\site-packages\whoosh\postings.py", line 871, in __init__
self._find_next()
File "C:\Python25\lib\site-packages\whoosh\postings.py", line 879, in _find_next
while self.id == self.scorer.id and not self.is_deleted(self.id):
TypeError: is_deleted() takes exactly 1 argument (2 given)
... looks like a whoosh typo
John
Original report by eevee (Bitbucket: eevee, GitHub: eevee).
SpellChecker.suggest() runs a query, then sorts the results and returns the top suggestions.
But now searcher.search() defaults limit to 10, and suggest()'s query doesn't sort in any particularly useful order. So it's getting back 10 more-or-less arbitrary results, then sorting those in spell-check order. For words with a lot of common n-grams and big dictionaries, this gives total garbage results, and suggestions_and_scores() is seriously gimped.
Adding limit=5000 to the s.search(q) call would restore 0.3's behavior. A better solution might be to also rewrite suggest()'s sorting as a custom weighter and do the querying and sorting at the same time.
Original report by Paul Davis (Bitbucket: davisp, GitHub: davisp).
The math for calculating the total number of pages is off by one if the number of results is exactly divisible by the page length.
http://bitbucket.org/mchaput/whoosh/src/tip/src/whoosh/searching.py#cl-714
Should be something like:
#!python
self.pagecount = self.total // pagelen
if self.total % pagelen != 0:
    self.pagecount += 1
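The same ceiling division can be written as a single expression (a standard idiom, not code from Whoosh itself):

```python
# Ceiling division without floats: one extra page only when there is a
# partial page of leftover results.
def page_count(total, pagelen):
    return (total + pagelen - 1) // pagelen

assert page_count(100, 10) == 10   # exactly divisible: no phantom page
assert page_count(101, 10) == 11   # one leftover result adds a page
assert page_count(0, 10) == 0
```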
Original report by ollyc (Bitbucket: ollyc, GitHub: ollyc).
I'm using the SimpleParser in Whoosh 0.3.18 and I get the following exception. It seems to occur whenever there is a stop word in a query.
#!python
>>> from whoosh.fields import Schema, TEXT
>>> from whoosh.qparser import SimpleParser
>>> schema = Schema(content=TEXT())
>>> parser = SimpleParser("content", schema=schema)
>>> parser.parse(u"sound the trumpets")
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/tmp/ve/lib/python2.6/site-packages/whoosh/qparser/simple.py", line 118, in parse
opts = [make_clause(text) for text in opts]
File "/tmp/ve/lib/python2.6/site-packages/whoosh/qparser/simple.py", line 107, in make_clause
return self.make_basic_clause(self.fieldname, text, boost=boost)
File "/tmp/ve/lib/python2.6/site-packages/whoosh/qparser/simple.py", line 104, in make_basic_clause
return self.termclass(fieldname, parts[0], boost=boost)
IndexError: list index out of range
Original report by Peter Hansen (Bitbucket: microcode, GitHub: microcode).
The following code demonstrates a failure with 1.0.0b9 (and perhaps earlier) when you create an index but don't add any documents and then try to search it. The problem seems to come from code in fileindex.SegmentSet.reader()
which doesn't handle the no-segment case gracefully.
#!python
from whoosh import index
from whoosh.fields import Schema, TEXT

schema = Schema(text=TEXT())
ix = index.create_in('.', schema)
search = ix.searcher()
search.find('text', u'foo')
This gives a traceback ending with this:
File ".....\whoosh\searching.py", line 264, in find
qp = QueryParser(defaultfield, schema=self.ixreader.schema)
AttributeError: 'MultiReader' object has no attribute 'schema'
Original report by encukou (Bitbucket: encukou, GitHub: encukou).
CompatibilityScorer was introduced here: http://bitbucket.org/mchaput/whoosh/changeset/cc65bdd6d89d#chg-src/whoosh/scoring.py_newline315
In __init__(), "self.method" is set, but in score(), "self.scoremethod" is used.
Whoosh crashes on me with the obvious AttributeError: 'CompatibilityScorer' object has no attribute 'scoremethod'.
When I change the line in __init__() to "self.scoremethod = scoremethod", Whoosh starts working again for me.
Original report by Marcin Kuzminski (Bitbucket: marcinkuzminski, GitHub: marcinkuzminski).
It has happened to me several times that the dfl function returned 0 when searching, causing a float-division error when returning the value.
The thing is, I can't reproduce this bug, but it happened when I started searching while my daemon was indexing, and also when I renamed the index dir and then renamed it back to its original name.
I think a simple try/except would fix the issue, or rewriting doc_field_length so that the default always applies.
#!python
class WOLWeighting(Weighting):
    """Abstract middleware class for weightings that can use
    "weight-over-length" (WOL) as an approximate quality rating.
    """
    def quality_fn(self, searcher, fieldname, text):
        dfl = searcher.doc_field_length
        def fn(m):
            return m.weight() / dfl(m.id(), fieldname, 1)  # here
        return fn

    def block_quality_fn(self, searcher, fieldname, text):
        def fn(m):
            return m.blockinfo.maxwol
        return fn
Update:
Since it happens to me more often, I rewrote doc_field_length to something like this:
#!python
@protected
def doc_field_length(self, docnum, fieldname, default=0):
    if self.fieldlengths is None:
        return default
    fl = self.fieldlengths.get(docnum, fieldname, default=default)
    if fl == 0:
        return default
    return fl
Original report by Stavros Korokithakis (Bitbucket: Stavros, GitHub: Stavros).
I would like to delete a document according to search results, but there does not appear to be a way to get some sort of immutable ID so I can delete it with delete_document().
Do documents have some sort of immutable ID (perhaps a hash)? Otherwise, the simple use case of deleting documents by search results is impossible...
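The usual workaround is to store your own unique "id" field on each document at index time and delete by that term after a search; in Whoosh the deletion call would be writer.delete_by_term("id", value). The pattern, modeled here with a plain dict standing in for the index so the sketch is runnable:

```python
# Stand-in "index" keyed by an application-assigned unique id; the
# docids and contents are invented for illustration.
index = {"doc1": "hello world", "doc2": "goodbye world"}

def search(index, word):
    # find the ids of documents matching a query word
    return [docid for docid, text in index.items() if word in text]

def delete_by_term(index, docid):
    # delete whichever document carries that unique id
    index.pop(docid, None)

for docid in search(index, "goodbye"):
    delete_by_term(index, docid)

assert "doc2" not in index
assert "doc1" in index
```

The key point is that the immutable identifier comes from your data model, not from Whoosh's internal docnums, which can change when segments are merged.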
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
When you finally get around to writing tutorials, include one about using the Weighting.final() method to influence document scores based on a document popularity factor stored somewhere else.
#!python
class MyWeighting(scoring.BM25F):
    def final(self, searcher, docnum, score):
        # Let's say your model associates a document ID with the hit count
        # for each document, and the document ID is in the "id" stored field.
        # First, get the contents of the "id" field for this document
        docid = searcher.stored_fields(docnum)["id"]
        # Look up the document's hit count in my model
        maxhits = mymodel.max_hits()
        hitcount = mymodel.get_hits(docid)
        # Multiply the computed score for this document by the popularity
        return score * (hitcount / maxhits)
Original report by Anonymous.
Hello,
I am packaging python-whoosh for Debian. Since build chroots usually do not mount /dev/shm as tmpfs, which causes one of the tests (test_multipool in tests/test_indexing.py) to fail, I added a patch for that test to check whether Queue() can be run before attempting the test. Please find the patch attached.
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
This blog post describes an interesting "champion list"-like extension to the inverted index, called a "reverted index".
http://palblog.fxpal.com/?p=4550
This would be very straightforward to implement in Whoosh as a one-off, but it would be better to look for a way to generalize the creation of the inverted index, term vectors, and reverted index, which are essentially variations on the same storage mechanisms.
Original report by Stéphane Démurget (Bitbucket: zzrough, GitHub: zzrough).
When I try to use a BOOLEAN field, the initializer fails when invoking self.format = Existence(), as Existence needs at least an analyzer parameter.
You did not specify the BOOLEAN field in the documentation so maybe you think it's not ready to be used yet? Please close if that's the case.
Original report by Stavros Korokithakis (Bitbucket: Stavros, GitHub: Stavros).
If I query, say, for "url: something", whoosh dies with an exception.
Original report by jdubery (Bitbucket: jdubery, GitHub: Unknown).
I am currently using a patched whoosh to provide this facility; I would like to remove the need for patching.
Original report by Collin Anderson (Bitbucket: collinmanderson, GitHub: Unknown).
Same as #40, except ExcludeMatcher. Would a django traceback help?
Original report by tcrombez (Bitbucket: tcrombez, GitHub: tcrombez).
My index and search are working great, except for phrase search, which is vital to my project.
This is the output:
#!python
>>> s.search(u'"nous avons"')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "whoosh_indexer.py", line 77, in search
results = searcher.search(query, sortedby=sortedby)
File "build/bdist.macosx-10.3-i386/egg/whoosh/searching.py", line 369, in search
File "build/bdist.macosx-10.3-i386/egg/whoosh/searching.py", line 310, in sort_query
File "build/bdist.macosx-10.3-i386/egg/whoosh/scoring.py", line 491, in order
File "build/bdist.macosx-10.3-i386/egg/whoosh/spans.py", line 194, in all_ids
File "build/bdist.macosx-10.3-i386/egg/whoosh/spans.py", line 184, in next
File "build/bdist.macosx-10.3-i386/egg/whoosh/spans.py", line 169, in _find_next
File "build/bdist.macosx-10.3-i386/egg/whoosh/spans.py", line 343, in _get_spans
File "build/bdist.macosx-10.3-i386/egg/whoosh/matching.py", line 404, in spans
IndexError: list index out of range
Original report by eevee (Bitbucket: eevee, GitHub: eevee).
SpellChecker has accepted a minscore param for as long as I can remember, but the code doesn't actually do anything with it.
Patch is fairly trivial and is attached, with a test that fails against trunk.
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
Consider this query:
#!python
And([Term(), And([Term(), Term()], boost=2.0)])
Normalizing will merge the two Ands, but that changes the meaning of the query.
Normalize should either not merge CompoundQueries with different boosts, or should apply the subquery's boost to its members before hoisting them into the parent query.
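The second option can be sketched as follows, with stand-in (kind, boost, payload) tuples rather than Whoosh's actual query classes: before hoisting a subquery's children into the parent, multiply the subquery's boost down onto each child.

```python
# Hypothetical flattening step: distribute a compound query's boost
# onto its children so merging no longer changes scoring.
def flatten_and(subqueries):
    out = []
    for q in subqueries:
        if q[0] == "and":                    # ("and", boost, children)
            _, boost, children = q
            for (kind, b, payload) in children:
                out.append((kind, b * boost, payload))
        else:
            out.append(q)
    return out

merged = flatten_and([
    ("term", 1.0, "a"),
    ("and", 2.0, [("term", 1.0, "b"), ("term", 1.5, "c")]),
])
# The inner And's boost of 2.0 is preserved on its former members
assert merged == [("term", 1.0, "a"), ("term", 2.0, "b"), ("term", 3.0, "c")]
```

After this step the parent And contains only plain terms, so hoisting is safe.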
Original report by Alexander Clausen (Bitbucket: alexc, GitHub: alexc).
Using whoosh 0.3.18 together with django-haystack, deployed on mod_wsgi. I'm getting errors when searching that look suspiciously like those when Whoosh was not thread safe:
Traceback (most recent call last):
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/django/core/handlers/base.py", line 101, in get_response
response = callback(request, *callback_args, **callback_kwargs)
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/views.py", line 131, in search_view
return view_class(*args, **kwargs)(request)
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/views.py", line 45, in __call__
return self.create_response()
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/views.py", line 117, in create_response
(paginator, page) = self.build_page()
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/views.py", line 99, in build_page
page = paginator.page(self.request.GET.get('page', 1))
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/django/core/paginator.py", line 37, in page
number = self.validate_number(number)
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/django/core/paginator.py", line 28, in validate_number
if number > self.num_pages:
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/django/core/paginator.py", line 60, in _get_num_pages
if self.count == 0 and not self.allow_empty_first_page:
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/django/core/paginator.py", line 48, in _get_count
self._count = self.object_list.count()
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/query.py", line 377, in count
return len(clone)
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/query.py", line 53, in __len__
self._result_count = self.query.get_count()
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/backends/__init__.py", line 408, in get_count
self.run()
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/backends/__init__.py", line 363, in run
results = self.backend.search(final_query, **kwargs)
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/backends/__init__.py", line 52, in wrapper
return func(obj, query_string, *args, **kwargs)
File "/usr/local/pythonenv/flussinfo/lib/python2.5/site-packages/haystack/backends/whoosh_backend.py", line 298, in search
narrow_searcher = self.index.searcher()
File "build/bdist.linux-x86_64/egg/whoosh/index.py", line 329, in searcher
return Searcher(self.reader(), **kwargs)
File "build/bdist.linux-x86_64/egg/whoosh/filedb/fileindex.py", line 291, in reader
return self.segments.reader(self.storage, self.schema)
File "build/bdist.linux-x86_64/egg/whoosh/filedb/fileindex.py", line 422, in reader
for segment in segments]
File "build/bdist.linux-x86_64/egg/whoosh/filedb/filereading.py", line 73, in __init__
self.termtable = open_terms(storage, segment)
File "build/bdist.linux-x86_64/egg/whoosh/filedb/filereading.py", line 34, in open_terms
termfile = storage.open_file(segment.term_filename)
File "build/bdist.linux-x86_64/egg/whoosh/filedb/filestore.py", line 56, in open_file
f = StructFile(open(self._fpath(name), "rb"), *args, **kwargs)
IOError: [Errno 2] No such file or directory: u'/usr/local/pythonenv/flussinfo/share/flussinfo/whoosh_index/_MAIN_7.tiz'
and yes, they seem to go away when switching to threads=1 in the WSGIDaemonProcess. Strangely the site worked fine for almost a month with threads enabled.
Original report by Collin Anderson (Bitbucket: collinmanderson, GitHub: Unknown).
Attached is a django traceback
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
Outside of tight loops, Whoosh should use logging to log useful information. By default the log would be thrown away, but the user could redirect it to see the information.
For example, when the posting pool writes out a run.
Original report by Marcin Kuzminski (Bitbucket: marcinkuzminski, GitHub: marcinkuzminski).
There's a typo in the spans implementation.
Currently it looks like this:
#!python
def spans(self):
    return sorted(set(self.a.spans() | set(self.b.spans())))
As you can see, the first set( is not closed; this causes a TypeError when using queries with two or more search terms.
Regards
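The fix presumably just balances the parentheses so both span lists become sets before the union. A runnable sketch with stand-in matcher objects (FakeMatcher and UnionSpans are invented for illustration):

```python
# Stand-in for a matcher exposing spans() as a list
class FakeMatcher:
    def __init__(self, spans):
        self._spans = spans
    def spans(self):
        return list(self._spans)

class UnionSpans:
    def __init__(self, a, b):
        self.a, self.b = a, b
    def spans(self):
        # was: sorted(set(self.a.spans() | set(self.b.spans())))
        # The unclosed set( applied | to a list, raising TypeError.
        return sorted(set(self.a.spans()) | set(self.b.spans()))

u = UnionSpans(FakeMatcher([3, 1]), FakeMatcher([2, 1]))
assert u.spans() == [1, 2, 3]   # deduplicated, sorted union
```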
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
Apparently it currently raises a KeyError.
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
Original report by Sardar Yumatov (Bitbucket: sardarnl, GitHub: sardarnl).
If no sorting is used, then search will always return results in reverse order. This means the current page will be the first one in the result set (which is strange; the highest-scoring results should go first...). The ResultsPage doesn't know about the reverse sorting, so it always fetches the tail of the result set, which is actually the first page. So ResultsPage is always the first page.
Steps to reproduce:
#!python
page = searcher.search_page(query, 2, pagelen=5)
for r in page:
    print r
Try different page numbers, the result is always the first page.
Work around:
#!python
results = searcher.search(query, limit=page * pagelen)
page = results[0:pagelen]
Original report by Matt Chaput (Bitbucket: mchaput, GitHub: mchaput).
Original report by jdubery (Bitbucket: jdubery, GitHub: Unknown).
Using Python 2.5 (the most recent on my machine), I get a syntax error on line 1218 of revision 1.0.0b9 query.py: return self.__class__(*self.subqueries, boost=self.boost). This can be fixed by changing it to the following: return self.__class__(*self.subqueries, **{"boost": self.boost}).