Comments (3)
As a workaround, I'm trying the PUT /indexname endpoint with committing turned off, and each request is taking surprisingly long to complete.
My index PUT /indexname/_create body
[
  {
    "name": "id",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "default"
      },
      "stored": true
    }
  },
  {
    "name": "title",
    "type": "text",
    "options": {
      "indexed": true,
      "stored": true
    }
  },
  {
    "name": "body",
    "type": "text",
    "options": {
      "indexed": true,
      "stored": true
    }
  },
  {
    "name": "url",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "default"
      },
      "stored": true
    }
  }
]
Example records
(I removed body to see if that was slowing it down, but that didn't really make it go any faster)
Surprising insertion times
Snippet of JavaScript API calls
In this example, I've set parallelism to 1, so there's only one PUT /indexname request in flight at a time, with commit: false.
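// Imports assumed from the names used below (axios, p-map, present).
import axios from 'axios';
import pMap from 'p-map';
import present from 'present';

// Send one record to the index without committing, and log how long the request took.
async function postRecord(record) {
  // Drop the body field to check whether it was slowing the request down.
  const bodyRemoved = { ...record };
  delete bodyRemoved.body;
  try {
    const t1 = present();
    await axios.put(
      `http://localhost:8080/wikipedia`,
      JSON.stringify({
        options: { commit: false },
        document: bodyRemoved,
      }),
      { headers: { 'Content-Type': 'application/json' } },
    );
    const t2 = present();
    console.log(`done posting record ${record.id} in ${Math.round(t2 - t1)}ms`);
  } catch (error) {
    if (error.response) {
      // axios exposes the response payload as `data`, not `body`.
      console.warn(error.response.status);
      console.warn(error.response.data);
    }
    throw error;
  }
}

// Post the records one at a time (concurrency 1).
async function postBatch(batch) {
  await pMap(batch, postRecord, { concurrency: 1 });
}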
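To make the slowdown easier to compare between runs, the per-request durations could be collected and summarized instead of only logged one by one. This is a minimal sketch, not part of the original script: `summarize` is a hypothetical helper, and the sample durations are made up for illustration.

```javascript
// Hypothetical helper: summarize request durations given in milliseconds.
function summarize(durationsMs) {
  if (durationsMs.length === 0) return null;
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...durationsMs].sort((a, b) => a - b);
  // Nearest-rank percentile on the sorted sample.
  const pick = (p) =>
    sorted[Math.min(sorted.length - 1, Math.ceil(p * sorted.length) - 1)];
  return {
    count: sorted.length,
    min: sorted[0],
    median: pick(0.5),
    p95: pick(0.95),
    max: sorted[sorted.length - 1],
  };
}

// Made-up durations, standing in for the (t2 - t1) values measured per request.
const sample = [30, 10, 20, 50, 40];
console.log(summarize(sample));
```

In the script above, each `t2 - t1` value could be pushed into an array inside `postRecord` and passed to `summarize` once `postBatch` resolves.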
CPU and Memory are bored during this slowness
Any ideas? I'm happy to help debug if you think it's a bug.
from toshi.
I appreciate the very detailed report. I'm almost certain this goes back to how I originally wrote the bulk insert: I struggled with how to bubble errors back up to the top and return them for the request without leaving a ton of open channels and threads floating around after a bad request. I'm almost certain an error is happening during the channel passing.
I noticed immediately that there are unescaped newlines in your JSON. Can you humor me and try escaping the input so I can rule that out?
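For reference, a small sketch of what escaping means here, assuming the records are built in JavaScript as in the report. The record contents are hypothetical; only `JSON.stringify` and `JSON.parse` are used. A literal newline inside a JSON string is invalid, while `JSON.stringify` emits it as the two characters `\n`, so each serialized document stays on one line.

```javascript
// Hypothetical record; the field names mirror the schema in the report.
const record = { id: "1", title: "Example", body: "first line\nsecond line" };

// JSON.stringify escapes the newline inside the string as backslash + n,
// so the serialized document contains no raw newline character.
const serialized = JSON.stringify(record);

// A hand-built payload with a raw newline inside the string value.
// (In this JS literal, \n is an actual newline character.)
const handBuilt = '{"body": "first line\nsecond line"}';

// Returns true when the text parses as JSON, false otherwise.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(serialized.includes("\n")); // false
console.log(isValidJson(serialized));   // true
console.log(isValidJson(handBuilt));    // false
```

Parsing `serialized` gives back the original body with its newline intact, so nothing is lost by escaping.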
As for the slow post times, those definitely seem wrong too. Is the index particularly large when you commit? Or is this an empty index?
from toshi.
I don't have time to keep tinkering with this right now. I was using spare time on a Friday night just to check it out.
The index was empty, and it got locked up pretty soon after I started bulk inserting. Based on what you're saying, it seems like a bug with bulk insert.
I'll let you know if I get around to poking at this again.
from toshi.