Comments (7)

twmb commented on June 28, 2024

Topic IDs are globally unique forever, even if you recreate a topic with the same name. The client remembers the ID and forever uses it. You'll have to purge the topic and then re-add it to forget the old ID -- but you can do this without recreating the client.
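
A minimal sketch of that purge-then-re-add sequence, assuming the client's `PurgeTopicsFromConsuming` and `AddConsumeTopics` methods. The `topicConsumer` interface, the `resetTopic` helper, and the `fakeClient` type are hypothetical names introduced only so the example runs without a broker; in real code a `*kgo.Client` would be passed instead.

```go
package main

import "fmt"

// topicConsumer lists the two client methods this sketch relies on.
// Declaring them as an interface keeps the example runnable without Kafka.
type topicConsumer interface {
	PurgeTopicsFromConsuming(topics ...string)
	AddConsumeTopics(topics ...string)
}

// resetTopic (hypothetical helper) purges the topic so the client forgets
// the old topic ID, then adds the topic back for consuming.
func resetTopic(c topicConsumer, topic string) {
	c.PurgeTopicsFromConsuming(topic)
	c.AddConsumeTopics(topic)
}

// fakeClient records calls in order; it stands in for a real client.
type fakeClient struct{ calls []string }

func (f *fakeClient) PurgeTopicsFromConsuming(topics ...string) {
	f.calls = append(f.calls, fmt.Sprintf("purge %v", topics))
}

func (f *fakeClient) AddConsumeTopics(topics ...string) {
	f.calls = append(f.calls, fmt.Sprintf("add %v", topics))
}

func main() {
	f := &fakeClient{}
	resetTopic(f, "large_trips")
	fmt.Println(f.calls)
}
```

The order matters: adding the topic back before purging it would leave the remembered ID in place.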

from franz-go.

twmb commented on June 28, 2024

Have you seen the UnknownTopicRetries option / does it help / provide a way to answer your question?

Also -- when you say tight loop, are you not checking the fetches.Errors(), or not backing off a little bit if it contains UnknownTopicOrPartition?

Also, what do debug logs look like? I'm curious why there's a tight loop.

genzgd commented on June 28, 2024

Thanks for looking! I hadn't noticed the UnknownTopicRetries option, but its default of 4 helps explain why the initial delay before the error was reported (approximately 6 minutes) was longer than we expected.

We are checking fetches.Errors(), but as I noted in the issue, we removed our backoff/retry logic because it duplicated functionality internal to the kgo.Client. From this behavior, it seems we have to treat the UnknownTopicOrPartitionError differently. That is not a problem; I just want to confirm it's the right thing to do.

Debug logs are attached. The tight loop starts at 01:57:44.723, where PollRecords starts returning immediately. The logs are very noisy until then.

franzretry.txt

twmb commented on June 28, 2024

The tight loop is because the connection to your broker is really fast, and you're doing nothing with the error that is being returned from PollFetches.

The UnknownTopicID error is stripped the first 5 times (per partition) because Kafka can briefly return it right after a topic is created, before the controller has told the leader about the new partition. See here:

franz-go/pkg/kgo/source.go

Lines 925 to 946 in ae169a1

case kerr.UnknownTopicID:
	// We need to keep UnknownTopicID even though it is
	// retryable, because encountering this error means
	// the topic has been recreated and we will never
	// consume the topic again anymore. This is an error
	// worth bubbling up.
	//
	// Kafka will actually return this error for a brief
	// window immediately after creating a topic for the
	// first time, meaning the controller has not yet
	// propagated to the leader that it is now the leader
	// of a new partition. We need to ignore this error
	// for a little bit.
	if fails := partOffset.from.unknownIDFails.Add(1); fails > 5 {
		partOffset.from.unknownIDFails.Add(-1)
		keep = true
	} else if s.cl.cfg.keepRetryableFetchErrors {
		keep = true
	} else {
		strip(topic, partition, fp.Err)
	}

After 5 failures (per partition), the error is bubbled up to you. Once that happens, you can see in your logs that you poll, print the error, and immediately poll again. The connection to your broker is fast enough that this looks like a spin loop, but the client is actually sending a request to the broker each time and getting the error back very quickly.

2161 2023-10-10 01:57:44.857 ERR kafka fetch failed after retries error="UNKNOWN_TOPIC_ID: This server does not host this topic ID." pipe_id=test_pipe_id retriable=true                      
2162 2023-10-10 01:57:44.857 DBG wrote Fetch v15 broker=42 bytes_written=62 err=null pipe_id=test_pipe_id time_to_write=0.03925 write_wait=0.051041
2163 2023-10-10 01:57:44.889 DBG read Fetch v15 broker=16 bytes_read=76 err=null pipe_id=test_pipe_id read_wait=0.054042 time_to_read=61.768375
2164 2023-10-10 01:57:44.889 DBG updated uncommitted group=geoff_mac_717 pipe_id=test_pipe_id to=large_trips[]
2165 2023-10-10 01:57:44.889 ERR kafka fetch failed after retries error="UNKNOWN_TOPIC_ID: This server does not host this topic ID." pipe_id=test_pipe_id retriable=true
2166 2023-10-10 01:57:44.889 DBG wrote Fetch v15 broker=16 bytes_written=62 err=null pipe_id=test_pipe_id time_to_write=0.079 write_wait=0.055333
2167 2023-10-10 01:57:44.908 DBG read Fetch v15 broker=54 bytes_read=76 err=null pipe_id=test_pipe_id read_wait=0.045041 time_to_read=72.789042
2168 2023-10-10 01:57:44.908 DBG updated uncommitted group=geoff_mac_717 pipe_id=test_pipe_id to=large_trips[]
2169 2023-10-10 01:57:44.908 ERR kafka fetch failed after retries error="UNKNOWN_TOPIC_ID: This server does not host this topic ID." pipe_id=test_pipe_id retriable=true
2170 2023-10-10 01:57:44.908 DBG wrote Fetch v15 broker=54 bytes_written=62 err=null pipe_id=test_pipe_id time_to_write=0.0865 write_wait=0.034917
2171 2023-10-10 01:57:44.921 DBG read Fetch v15 broker=42 bytes_read=76 err=null pipe_id=test_pipe_id read_wait=0.063667 time_to_read=63.454208
2172 2023-10-10 01:57:44.921 DBG updated uncommitted group=geoff_mac_717 pipe_id=test_pipe_id to=large_trips[]
2173 2023-10-10 01:57:44.921 ERR kafka fetch failed after retries error="UNKNOWN_TOPIC_ID: This server does not host 

You can use PurgeTopicsFromConsuming if you actually expect the topic to not exist anymore.
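
To make the threshold in the source.go excerpt concrete, here is a small standalone model of just that counter. It is a sketch, not the library's code: `partition`, `stripLimit`, and `keepUnknownTopicID` are names made up for this example, and only the branch order and the literal 5 come from the excerpt.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// stripLimit matches the literal 5 in the excerpt.
const stripLimit = 5

// partition models only the per-partition failure counter.
type partition struct{ unknownIDFails atomic.Int32 }

// keepUnknownTopicID reports whether the error would be surfaced to the
// caller. It follows the excerpt's branch order: count the failure, surface
// it once the count exceeds the limit, otherwise surface it only if
// retryable fetch errors are being kept.
func keepUnknownTopicID(p *partition, keepRetryable bool) bool {
	if fails := p.unknownIDFails.Add(1); fails > stripLimit {
		p.unknownIDFails.Add(-1) // undo the increment; the count stays at the limit
		return true
	}
	return keepRetryable
}

func main() {
	var p partition
	for i := 1; i <= 7; i++ {
		fmt.Printf("fetch %d: surfaced=%v\n", i, keepUnknownTopicID(&p, false))
	}
}
```

In this model, fetches 1 through 5 are stripped and every fetch from the sixth onward surfaces the error, which matches the point in the logs where the errors start appearing on every poll.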

genzgd commented on June 28, 2024

Thanks for the explanation! It seems clear that in this particular instance we'll need to add an additional backoff/retry loop rather than relying on the Client. Do you know offhand any other types of errors we should look for that would cause similar behavior?

twmb commented on June 28, 2024

I don't know what you mean by retry -- this error is not recoverable if you deleted the topic. You need to handle it; otherwise you will keep receiving it. The same goes for UnknownTopicOrPartition.

genzgd commented on June 28, 2024

Ah, I didn't realize that the error was not recoverable even if the topic was recreated. Our application runs as a service and the coordination between the user who deleted the topic and the user who is responsible for monitoring the service for errors could take quite some time, so we were thinking that the user could "fix" the situation by recreating the topic. I'm guessing we would have to create a new client/consumer group/etc. in that case. Again thanks for the insight!
