Comments (11)
Hi Amit; glad you have found this useful.
I think your requirement could be implemented easily by modifying ConcatenateFilter to have a new mode, perhaps called "prefixShingles", in which it outputs the token on every word added. Without looking closer, this might be better as a separate Filter; not sure. If you implement this requirement, please contribute it back; it would be a very welcome addition.
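To make the suggested behavior concrete, here is a minimal Python sketch of what a "prefixShingles" mode might emit. It is only an illustration of the idea, not the ConcatenateFilter implementation: the function name `prefix_shingles` and the single-space separator are assumptions made for this example.

```python
def prefix_shingles(words, separator=" "):
    """Return the running concatenation after each word is appended.

    Hypothetical illustration of a "prefixShingles" mode: instead of
    emitting one token for the whole name, emit a token every time a
    word is added. The separator is an assumption for this sketch.
    """
    shingles = []
    current = ""
    for word in words:
        # Append the word to the concatenation built so far.
        current = word if not current else current + separator + word
        shingles.append(current)
    return shingles


print(prefix_shingles(["new", "york", "city"]))
# ['new', 'new york', 'new york city']
```

With tokens like these in the index, a partial name such as "new york" would match a tag whose full name is "new york city", at whole-word granularity only.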
from solrtexttagger.
Hi David,
I would be happy to contribute and your suggestion looks doable. Let me try it out.
Best,
Amit
from solrtexttagger.
Hi David,
While working on the above, I tried a different approach for my needs. I added an EdgeNGramFilterFactory at the end of the index analyzer chain, like this:
<fieldType name="tagger" class="solr.TextField" positionIncrementGap="100" postingsFormat="Memory" omitTermFreqAndPositions="true" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="org.opensextant.solrtexttagger.ConcatenateFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="50" minGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Now with this, I am able to not only do word completions, but also partial prefix matches. So this is working perfectly as I wanted. However, there are 2 things where I am facing issues -
1). In the output, the data is not returned in the correct sort order. For example:
When I type - I am from asi
it returns "asian continent" first and "asia" second. My guess is that the default sort it is doing is based on _id field and not on the tag itself. I then tried to apply "sort" condition in the query, but that also does not work.
Can you please let me know how I could achieve it?
2). If in a sentence there are multiple tags, I want only top 10 suggestions for each tag. But there is no option for that currently.
I would really appreciate any help you could provide here.
BR,
Amit
from solrtexttagger.
Hi Amit,
RE (1) sort: I'm unclear what you want to achieve. First of all, sorting of what? The tags, or the matching Solr documents? The matching Solr documents are not sortable at present, though such a feature request is welcome. By "not sortable" I mean they come back in whatever order Lucene/Solr has the documents, which might appear to have an order, but I wouldn't depend on it. However, I don't think it's that useful, since in general you're going to want all the data, and so you can sort how you want client-side. I think the tags are sorted by start offset, then end offset; does that sound fine?
RE (2) limit per sentence: I guess that could be useful, though what would constitute the "top" suggestions? Anyway, feel free to file an issue if you think you might get to it.
from solrtexttagger.
Hi David,
Thanks for the reply.
Yes, I meant sorting of the matching Solr documents. In my case, as I mentioned earlier, since I have added EdgeNGramFilterFactory at the end, it has allowed me to do tag searches based on a partial left-substring match (a sort of autocomplete). This results in getting hundreds of matching documents for one tag. So what I was looking for is a way to sort them and limit them to only a few.
For example, let's say I have indexed my tags in the order "Tag 100", "Tag 99", "Tag 98", ..., "Tag 1", "Tag". If I type "tag", it returns the results in that same order, but I would have liked the reverse, and only 10 of them, i.e.
"Tag", "Tag 1", ..., "Tag 9".
Hope this makes sense.
BR,
Amit
from solrtexttagger.
I understand how the document order is what it is, but I'm less clear on how it is you're getting hundreds of matching documents per tag. Are you not using overlaps=LONGEST_DOMINANT_RIGHT?
I think that's key for a use case like this.
from solrtexttagger.
If I remove EdgeNGramFilterFactory, then NO_SUB and LONGEST_DOMINANT_RIGHT work as expected, but not while I have this filter. It returns all matches.
from solrtexttagger.
Actually, let me take that back. It works here too. For example:
If I type tag 100, then it only gives "Tag 100" back (and not "Tag" or "Tag 1").
But the problem is that when I type "tag", it returns all of them.
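The behavior described here follows from how edge n-grams work, which a small Python sketch can illustrate. This is a simplified model under stated assumptions, not Solr's actual code: `edge_ngrams` and `build_index` are hypothetical helpers, the index is a plain dictionary, and names are lowercased to mimic the LowerCaseFilterFactory step.

```python
from collections import defaultdict


def edge_ngrams(term, min_gram=2, max_gram=50):
    """Return the leading substrings of term with length min_gram..max_gram."""
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


def build_index(names):
    """Map every edge n-gram of each lowercased name back to that name."""
    index = defaultdict(set)
    for name in names:
        for gram in edge_ngrams(name.lower()):
            index[gram].add(name)
    return index


names = ["Tag", "Tag 1", "Tag 99", "Tag 100"]
index = build_index(names)

# "tag" is a leading substring of every name, so every name matches.
print(sorted(index["tag"]))      # ['Tag', 'Tag 1', 'Tag 100', 'Tag 99']
# "tag 100" is a leading substring of only one name.
print(sorted(index["tag 100"]))  # ['Tag 100']
```

In this model the gram "tag" is indexed for every name, so all of those entries match the same span of input text equally. There is no longer match available for an overlap rule to prefer, which is consistent with all of them being returned.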
from solrtexttagger.
Ok. Perhaps what's needed is some sort of way to limit the number of tags at a specific location -- not for the sentence but for a given word. And then you'd need to provide some way to articulate to the tagger a sorting function of the tags. You might for example add a char length number to each document and want the shortest tag.
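Until something like that exists in the tagger, the same idea can be approximated client-side. The Python sketch below groups matches by location, sorts each group by name length (shortest first, ties broken alphabetically), and keeps the first few. The match shape (dictionaries with `start`, `end`, and `name` keys) and the function `limit_per_location` are assumptions for illustration; they are not the tagger's response format or API.

```python
from collections import defaultdict


def limit_per_location(matches, limit=10):
    """Keep at most `limit` names per (start, end) span, shortest first.

    `matches` is a hypothetical flat list of dicts with "start", "end"
    and "name" keys; adapt it to whatever your client actually receives.
    """
    grouped = defaultdict(list)
    for match in matches:
        grouped[(match["start"], match["end"])].append(match["name"])
    return {
        location: sorted(names, key=lambda n: (len(n), n))[:limit]
        for location, names in grouped.items()
    }


matches = [
    {"start": 0, "end": 3, "name": "Tag 100"},
    {"start": 0, "end": 3, "name": "Tag 99"},
    {"start": 0, "end": 3, "name": "Tag 1"},
    {"start": 0, "end": 3, "name": "Tag"},
]
print(limit_per_location(matches, limit=3))
# {(0, 3): ['Tag', 'Tag 1', 'Tag 99']}
```

Sorting by character length is just one possible ranking; any key function could be substituted, for example a popularity score stored on each document.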
from solrtexttagger.
Exactly. Would it be possible?
from solrtexttagger.
Yeah; doesn't sound too hard, and it sounds generally useful.
from solrtexttagger.