
Comments (3)

aguynamedben commented on June 14, 2024

As a workaround, I'm trying the PUT /indexname endpoint with committing turned off, and each request is taking surprisingly long to complete.

Body of my index creation request (PUT /indexname/_create)

[
  {
    "name": "id",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "default"
      },
      "stored": true
    }
  },
  {
    "name": "title",
    "type": "text",
    "options": {
      "indexed": true,
      "stored": true
    }
  },
  {
    "name": "body",
    "type": "text",
    "options": {
      "indexed": true,
      "stored": true
    }
  },
  {
    "name": "url",
    "type": "text",
    "options": {
      "indexing": {
        "record": "position",
        "tokenizer": "default"
      },
      "stored": true
    }
  }
]

Example records

(I removed body to see if that was slowing it down, but that didn't really make it go any faster)
image

Surprising insertion times

image

Snippet of JavaScript API calls

In this example, I've set the concurrency to 1, so only one PUT /indexname request is in flight at a time, with commit: false.

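// Assumed imports (not shown in the original snippet)
import axios from 'axios';
import pMap from 'p-map';
import present from 'present';

async function postRecord(record) {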
  const bodyRemoved = { ...record };
  delete bodyRemoved.body;
  try {
    const t1 = present();
    const response = await axios.put(
      `http://localhost:8080/wikipedia`,
      JSON.stringify({
        options: { commit: false },
        document: bodyRemoved,
      }),
      { headers: { 'Content-Type': 'application/json' } },
    );
    const t2 = present();
    console.log(`done posting record ${record.id} in ${Math.round(t2 - t1)}ms`);
  } catch (error) {
    // axios exposes the response payload as `data`, not `body`
    if (error.response) {
      console.warn(error.response.status);
      console.warn(error.response.data);
    }
    throw error;
  }
}

async function postBatch(batch) {
  await pMap(batch, postRecord, { concurrency: 1 });
}
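
To put numbers on the per-request times logged above, a small summary helper might be useful. This is only a sketch: `summarizeLatencies` is a hypothetical helper that is not part of the original script, and the sample values are invented for illustration, not measurements from this report.

```javascript
// Hypothetical helper: summarizes request latencies (in ms) collected by the caller.
function summarizeLatencies(samplesMs) {
  if (samplesMs.length === 0) {
    throw new Error('no samples');
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  // Nearest-rank percentile: smallest value with at least p% of samples at or below it.
  const percentile = (p) =>
    sorted[Math.max(0, Math.ceil((p / 100) * sorted.length) - 1)];
  return {
    count: sorted.length,
    min: sorted[0],
    median: percentile(50),
    p95: percentile(95),
    max: sorted[sorted.length - 1],
  };
}

// Invented sample values, only to show the shape of the output.
const stats = summarizeLatencies([12, 15, 11, 900, 14, 13, 16, 12, 850, 15]);
console.log(stats);
```

A large gap between the median and the p95 or max would suggest occasional stalls rather than uniformly slow requests.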

CPU and Memory are bored during this slowness

image
image

Any ideas? I'm happy to help debug if you think it's a bug.

from toshi.

hntd187 commented on June 14, 2024

I appreciate the very detailed report. I'm almost certain this goes back to when I originally wrote the bulk insert: I struggled with how to bubble errors back up to the top and return them for the request without leaving a ton of open channels and threads floating around after a bad request. I'm almost certain an error is happening during the channel passing.

I noticed immediately that there are unescaped newlines in your JSON. Can you humor me and try escaping the input so I can rule that out?
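
One way to check this, sketched under the assumption that records are built as JavaScript objects: `JSON.stringify` escapes newline characters inside string values, while splicing raw text into a hand-built JSON string does not. The `record` object, its values, and the `isValidJson` helper are made up for illustration.

```javascript
// Hypothetical record; field names mirror the schema above, values are invented.
const record = { id: '1', title: 'Example', body: 'first line\nsecond line' };

// JSON.stringify escapes the newline as the two characters `\` and `n`.
const escaped = JSON.stringify(record);

// Hand-built JSON with a raw newline inside a string value is invalid JSON.
const handBuilt = '{"id": "1", "body": "first line\nsecond line"}';

// Hypothetical helper: reports whether a string parses as JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    return false;
  }
}

console.log(escaped.includes('\n')); // false: no raw newline remains
console.log(isValidJson(escaped));   // true
console.log(isValidJson(handBuilt)); // false
```

If the payload sent to the server passes a check like this, unescaped newlines can likely be ruled out as the cause.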

As for the slow post times, those definitely seem wrong too. Is the index particularly large when you commit? Or is this an empty index?


aguynamedben commented on June 14, 2024

I don't have time to keep tinkering with this right now. I was using spare time on a Friday night just to check it out.

The index was empty, and it got locked up pretty soon after I started bulk inserting. Based on what you're saying, it seems like a bug with bulk insert.

I'll let you know if I get around to poking at this again.
