Comments (5)

elasticsearchmachine commented on August 18, 2024

Pinging @elastic/ml-core (Team:ML)

from elasticsearch.

prwhelan commented on August 18, 2024

Similar to #107266 (comment)

A stopped transform still has a running thread that eventually fails at whatever it is doing (in this case, saving to the index).

prwhelan commented on August 18, 2024

I cannot think of a good way to approach this. The current option is to explicitly check for this here:

 else if (irrecoverableException instanceof IndexNotFoundException && IndexerState.ABORTING == getState()) {
    logger.debug(
        "[{}] Bulk index experienced IndexNotFoundException failure while Transform is aborting. This is likely "
            + "due to the Transform delete API called with delete_dest_index=true.  Aborting indexer. ",
        getJobId()
    );
    onAbort();
    // do not call listener
}

This is similar to what happens before and after this code is invoked, where the Transform exits gracefully if it was moved into the ABORTING state (via the DELETE API).

The alternative is to have fail check for the ABORTING state and exit gracefully there, but that is a more invasive change with broader impact.
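
To make the trade-off concrete, here is a minimal sketch of that alternative, where the central failure path checks for ABORTING once instead of each call site checking per exception type. Everything in it is a hypothetical stand-in (SketchIndexer, SketchIndexerState, SketchIndexNotFoundException, the events list); none of it is the real Transform indexer code, and the real fail path may have other responsibilities this ignores.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for illustration; these are NOT the Elasticsearch classes.
enum SketchIndexerState { STARTED, INDEXING, STOPPING, ABORTING, STOPPED }

class SketchIndexNotFoundException extends RuntimeException {
    SketchIndexNotFoundException(String index) {
        super("no such index [" + index + "]");
    }
}

class SketchIndexer {
    private volatile SketchIndexerState state = SketchIndexerState.INDEXING;

    // Records what the indexer decided to do, so the behaviour can be inspected.
    final List<String> events = new ArrayList<>();

    void setState(SketchIndexerState newState) {
        state = newState;
    }

    // Central failure path: the ABORTING check lives here, so it covers every
    // exception type rather than only IndexNotFoundException.
    void fail(Exception cause) {
        if (state == SketchIndexerState.ABORTING) {
            // Assumed behaviour: exit gracefully and do not mark the task as failed.
            events.add("aborted");
            return;
        }
        events.add("failed: " + cause.getMessage());
    }
}

class AbortAwareFailSketch {
    public static void main(String[] args) {
        SketchIndexer indexer = new SketchIndexer();
        indexer.setState(SketchIndexerState.ABORTING);
        indexer.fail(new SketchIndexNotFoundException("dest"));
        System.out.println(indexer.events);
    }
}
```

The broader impact is visible even in the sketch: once the check sits in fail, any exception raised while ABORTING is swallowed, not just the missing-index case.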

prwhelan commented on August 18, 2024

The above isn't quite true: it is correlated, but I cannot repro it.

This is actually the line that is throwing the exception: https://github.com/elastic/elasticsearch/blob/main/server/src/main/java/org/elasticsearch/action/support/replication/TransportReplicationAction.java#L832

I haven't fully digested what this code is doing enough to repro it. It feels vaguely like the node running the transform has an earlier cluster state version than the node processing the bulk request, so we fail out instead of retrying?

prwhelan commented on August 18, 2024

This can be reproduced fairly consistently by getting the shards to relocate. When this happens, we want the Transform to retry the checkpoint and check whether the Index exists.
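
A rough sketch of the retry-and-check behaviour described above, assuming the bulk failure surfaces as a single exception type and that index existence can be queried cheaply. All names here (CheckpointRetrySketch, runCheckpoint, CheckpointOutcome, TransientIndexNotFoundException, maxAttempts) are hypothetical and only illustrate the control flow; the actual Transform retry mechanism is not shown in this thread.

```java
import java.util.function.BooleanSupplier;

// Hypothetical stand-in for the exception surfaced by the failed bulk request.
class TransientIndexNotFoundException extends RuntimeException {
    TransientIndexNotFoundException(String message) {
        super(message);
    }
}

enum CheckpointOutcome { SUCCEEDED, INDEX_MISSING, RETRIES_EXHAUSTED }

class CheckpointRetrySketch {

    // Runs the bulk step for one checkpoint. On an index-not-found failure it
    // checks whether the destination index still exists: if it does, the failure
    // is treated as transient (for example, shards relocating) and the step is
    // retried; if it does not, the caller is told so it can shut down gracefully.
    static CheckpointOutcome runCheckpoint(Runnable bulkIndex, BooleanSupplier destIndexExists, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                bulkIndex.run();
                return CheckpointOutcome.SUCCEEDED;
            } catch (TransientIndexNotFoundException e) {
                if (!destIndexExists.getAsBoolean()) {
                    return CheckpointOutcome.INDEX_MISSING;
                }
                // Index still exists: fall through and retry the checkpoint.
            }
        }
        return CheckpointOutcome.RETRIES_EXHAUSTED;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The first bulk attempt fails as if shards were relocating; later ones succeed.
        CheckpointOutcome outcome = runCheckpoint(() -> {
            if (calls[0]++ == 0) {
                throw new TransientIndexNotFoundException("shard relocating");
            }
        }, () -> true, 3);
        System.out.println(outcome);
    }
}
```

The INDEX_MISSING outcome is where a graceful shutdown would hook in for the delete case, while RETRIES_EXHAUSTED keeps the existing failure behaviour.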

It might be possible that a Delete API can still trigger this issue, in which case we will want to have the Indexer gracefully shut down.

It also might be possible that the user can delete the Index using the Index API without stopping the Transform. In that case the Transform seems to continue running and the Bulk API will recreate the Index as per its spec. Ideally we'd either stop the Transform or have the Transform create/update the Index as per the Transport API spec, but I don't think that is within the scope of this issue.
