Comments (7)
Pinging @elastic/es-search (Team:Search)
from elasticsearch.
@S-Dragon0302 Would you please let us know what problem you are encountering? I'm going to remove the "bug" label for now, as I don't see what's missing. Also keep in mind that if this is a language-specific problem, the language-specific discuss forums (https://discuss.elastic.co/c/in-your-native-tongue/11) might be a good place to ask.
The segmentation result is incorrect. The segmentation produced no tokens, when there should actually be a result.
The segmentation result should be this.
{
"tokens": [
{
"token": "是不",
"start_offset": 0,
"end_offset": 3,
"type": "word",
"position": 0
},
{
"token": "不是",
"start_offset": 2,
"end_offset": 5,
"type": "word",
"position": 1
},
{
"token": "是发",
"start_offset": 4,
"end_offset": 7,
"type": "word",
"position": 2
},
{
"token": "发现",
"start_offset": 6,
"end_offset": 9,
"type": "word",
"position": 3
},
{
"token": "现我",
"start_offset": 8,
"end_offset": 11,
"type": "word",
"position": 4
},
{
"token": "我的",
"start_offset": 10,
"end_offset": 13,
"type": "word",
"position": 5
},
{
"token": "的字",
"start_offset": 12,
"end_offset": 15,
"type": "word",
"position": 6
},
{
"token": "字冒",
"start_offset": 14,
"end_offset": 17,
"type": "word",
"position": 7
},
{
"token": "冒烟",
"start_offset": 16,
"end_offset": 19,
"type": "word",
"position": 8
},
{
"token": "烟了",
"start_offset": 18,
"end_offset": 21,
"type": "word",
"position": 9
}
]
}
The actual result is this.
{
"tokens" : [ ]
}
For the given text:
是ྂ不ྂ是ྂ发ྂ现ྂ我ྂ的ྂ字ྂ冒ྂ烟ྂ了ྂ
the pattern tokenizer without the ngram filtering:
GET /my_index/_analyze
{
"filter": [
"lowercase"
],
"tokenizer": {
"type": "pattern",
"pattern": "[^\\p{L}\\p{N}]+"
},
"text": "是ྂ不ྂ是ྂ发ྂ现ྂ我ྂ的ྂ字ྂ冒ྂ烟ྂ了ྂ"
}
Results in:
{
"tokens": [
{
"token": "是",
"start_offset": 0,
"end_offset": 1,
"type": "word",
"position": 0
},
{
"token": "不",
"start_offset": 2,
"end_offset": 3,
"type": "word",
"position": 1
},
{
"token": "是",
"start_offset": 4,
"end_offset": 5,
"type": "word",
"position": 2
},
{
"token": "发",
"start_offset": 6,
"end_offset": 7,
"type": "word",
"position": 3
},
{
"token": "现",
"start_offset": 8,
"end_offset": 9,
"type": "word",
"position": 4
},
{
"token": "我",
"start_offset": 10,
"end_offset": 11,
"type": "word",
"position": 5
},
{
"token": "的",
"start_offset": 12,
"end_offset": 13,
"type": "word",
"position": 6
},
{
"token": "字",
"start_offset": 14,
"end_offset": 15,
"type": "word",
"position": 7
},
{
"token": "冒",
"start_offset": 16,
"end_offset": 17,
"type": "word",
"position": 8
},
{
"token": "烟",
"start_offset": 18,
"end_offset": 19,
"type": "word",
"position": 9
},
{
"token": "了",
"start_offset": 20,
"end_offset": 21,
"type": "word",
"position": 10
}
]
}
None of those tokens is longer than a 1-gram, so filtering that requires 2-grams results in no output.
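The reasoning above can be reproduced outside Elasticsearch with a rough Python sketch. This is a simplified model, not Elasticsearch's implementation: `pattern_tokenize` and `ngram_filter` are hypothetical helpers that only mimic the `[^\p{L}\p{N}]+` split and a per-token n-gram filter, and the mark between the Chinese characters is assumed to be U+0F82 (a nonspacing mark, so neither a letter nor a number). The lowercase filter is left out because it has no effect on these characters.

```python
import unicodedata


def pattern_tokenize(text):
    """Split on runs of characters that are neither letters (L*) nor numbers (N*).

    Rough stand-in for a pattern tokenizer using [^\\p{L}\\p{N}]+ as the separator.
    """
    tokens, current = [], []
    for ch in text:
        if unicodedata.category(ch)[0] in ("L", "N"):
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


def ngram_filter(tokens, min_gram, max_gram):
    """Emit every n-gram of each token, for min_gram <= n <= max_gram.

    N-grams are built inside a single token, never across token boundaries.
    """
    grams = []
    for token in tokens:
        for n in range(min_gram, max_gram + 1):
            for start in range(len(token) - n + 1):
                grams.append(token[start:start + n])
    return grams


MARK = "\u0f82"  # assumption: the combining mark used in the issue text
text = "".join(ch + MARK for ch in "是不是发现我的字冒烟了")

tokens = pattern_tokenize(text)        # one single-character token per Chinese character
bigrams = ngram_filter(tokens, 2, 2)   # empty: no token has 2 characters

print(tokens)
print(bigrams)
```

Because each mark is a separator, every token is exactly one character long, and a filter that requires at least 2 characters per token has nothing to emit.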
Closing as expected behavior. Filtering that requires 2-grams when every token is only a 1-gram is expected to return no tokens.