Comments (9)
@anuchandy please take a look
from azure-sdk-for-java.
Hello @neumannm,

- There shouldn't be a need for `enableCrossEntityTransactions()`; the reading and forwarding from the DLQ can still be done without it, so in that case you don't have to enable this config and can remove it.
- The `maxAutoLockRenewDuration(..)` has no impact on messages peeked using the "peek*" API. Lock renewal only matters when the client is used to receive messages via the "receive*" API (messages that later get a disposition: complete|abandon). The same goes for `disableAutoComplete()`. Keeping these two settings doesn't hurt, though; the peek calls won't be affected by either.
- The message `{"az.sdk.message":"AMQP response did not contain OK status code.","statusCode":"NO_CONTENT"}` means there is no message in the queue at the moment; in this case the SDK returns an empty list. So this is not a problem, just the service telling the SDK that there is no message to peek.
- The message "not opened within.." means that the SDK cannot establish its TCP connection from the host machine to the Service Bus endpoint. The reason for the multiple occurrences of the error message `Error occurred while refreshing token that is not retriable. Not scheduling refresh task` after the connectivity error message (i.e., "not opened within..") is this: the SDK was trying to establish a TCP connection while several peekMessages (or other API) calls were queued up. All of these calls need a shared authentication that in turn depends on the TCP connection. The SDK could not open the connection after waiting for a minute, so all the enqueued peekMessages calls waiting for auth to complete received a signal indicating a connectivity issue. That is the sequence the log messages show. In this case, it indicates a temporary network connectivity problem from the host running the app to Service Bus.
Regarding the last question on recommendations:

- One suggestion in terms of coding patterns: use a separate builder instance to build the client for each queue (i.e., 1 builder : 1 queue), rather than the current approach of a shared builder instance (i.e., 1 builder : N queues). This approach will give your app more resiliency.
- I would also suggest upgrading to 7.15.1. Please refer to this documentation on upgrading and the v2 flags to enable: Troubleshoot Azure Service Bus - Azure SDK for Java | Microsoft Learn
One question - in the application, are you opening, peeking and closing the client for every http request?
Hi @anuchandy, thank you for your detailed response and the useful hints! I will adapt my code accordingly.
> The message "not opened within.." means that the SDK cannot establish its TCP connection from the host machine to the Service Bus endpoint. [...] In this case, it indicates that there is a temporary network connectivity problem from host running the app to service bus.
Good to know! Might also explain why I don't see this error when testing locally, only in production (where the application runs in a Kubernetes cluster and the requests go through Azure API management... maybe there's some issue specific to this environment causing connectivity problems 🤔).
Regarding your comment:

> The maxAutoLockRenewDuration(..) has no impact on message peeked using "peek*" api. The lock renew matters only if the client is used to receive messages using "receive*" api (those messages which later gets disposition (complete|abandon)).

-> We also have methods to delete or resend, where we indeed use the receive* API. So what about those: is a `maxAutoLockRenewDuration` of 1 minute advisable (when the lock duration is also 1 minute), or should it be higher? I think I read somewhere that these durations should not be the same.
> One question - in the application, are you opening, peeking and closing the client for every http request?
Well yes, for every HTTP request the `peekMessages` method is called, which contains the code snippet from above. Since I'm using try-with-resources, the (`AutoCloseable`) receiver client should be closed after the method returns. This is what it looks like after refactoring:
```java
try (ServiceBusReceiverClient receiver = new ServiceBusClientBuilder()
        .connectionString(connectionString)
        .configuration(new ConfigurationBuilder()
                .putProperty("com.azure.messaging.servicebus.nonSession.syncReceive.v2", "true")
                .build())
        .receiver()
        .queueName(queueName)
        .buildClient()) {
    // ... peek messages here ...
}
```
Why are you asking?
Update: I implemented all the suggestions, including updating to 7.15.1 and setting the v2 flag on the receiver client, and also creating a separate builder instance to build the client for each queue (for each request).
Unfortunately, it did not help with the problem. The connection is still lost frequently.
```
WARN 1 --- [ctor-executor-3] c.a.c.a.i.handler.ConnectionHandler: {"az.sdk.message":"onTransportError","connectionId":"MF_5af603_1710239185932","errorCondition":"amqp:connection:framing-error","errorDescription":"org.apache.qpid.proton.engine.TransportException: connection aborted","hostName":"my-sbns.servicebus.windows.net"}
```
I cannot reproduce this when running the application locally (using the same Service Bus connection); it only happens with the instance running in the Kubernetes cluster. Any ideas how to debug that? I tried running `watch nc -z -v my-sbns.servicebus.windows.net 443` from another container in the same namespace, but this continuously yields `Connection to my-sbns.servicebus.windows.net (23.102.0.186) 443 port [tcp/https] succeeded!` even during the time when the timeout occurs.
Sidenote: now that I dropped `disableAutoComplete()`, each request gives me a warning:

```
WARN 1 --- [nio-8080-exec-1] c.a.m.s.ServiceBusClientBuilder: 'enableAutoComplete' is not supported in synchronous client except through callback receive.
```
Hi @neumannm, thanks for the additional details. I didn't realize that client creation and disposal happen for every HTTP request.
It is not an efficient pattern to create and dispose of client instances for each HTTP request, as this incurs extra costs, such as connection/link negotiation, AD authentication calls and other overhead, impacting the networking. In common messaging (Service Bus, Event Hubs) scenarios, the client instances are long-lived, so the application should follow a caching approach. The approach here, for example, would be to use a `ConcurrentHashMap` with the queue name as the entry key and the client instance as the entry value. This map is scoped to the application and is discarded when the application shuts down. An entry can be populated using the `computeIfAbsent` method with a provider that is responsible for newing up a builder and a client from it. Each time the application needs to call peek*, it reaches out to this global map to obtain an already cached client, or `computeIfAbsent` creates and caches one.
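A minimal sketch of this caching pattern, with the real `ServiceBusReceiverClient`/builder replaced by a stand-in `Client` record and a hypothetical `createClient` factory (only the map logic is what matters here; a creation counter is included just to demonstrate that each queue's client is built exactly once):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class ReceiverCache {
    // Stand-in for the real client; in the actual app this would be a
    // ServiceBusReceiverClient built from its own ServiceBusClientBuilder.
    record Client(String queueName) {}

    // Application-scoped map: queue name -> cached client instance.
    private final Map<String, Client> cache = new ConcurrentHashMap<>();
    final AtomicInteger creations = new AtomicInteger(); // demo-only counter

    // Hypothetical factory: one builder and one client per queue, created once.
    private Client createClient(String queueName) {
        creations.incrementAndGet();
        return new Client(queueName);
    }

    // Each peek* call goes through here; computeIfAbsent builds and caches
    // the client only on the first request for a given queue.
    public Client clientFor(String queueName) {
        return cache.computeIfAbsent(queueName, this::createClient);
    }

    public static void main(String[] args) {
        ReceiverCache app = new ReceiverCache();
        app.clientFor("orders");
        app.clientFor("orders");   // second request hits the cache
        app.clientFor("invoices");
        System.out.println(app.creations.get()); // prints 2
    }
}
```

In the real application, the value factory would new up a `ServiceBusClientBuilder`, configure it for the queue, and call `buildClient()`; the map entries would be closed when the application shuts down.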
I don't have expertise in AKS infra or its low-level networking debugging, but I would say we start with the above approach, which makes the app more suitable for limited container environments, and then check whether it lowers the frequency of the network connection abort logs you noticed.
Also, I know that the application has some code for receive/delete/forward as well. If it runs in the same pod as the peek code, you can simplify debugging by temporarily disabling (commenting out) the receive/delete/forward code and running only peek. This will make the logs less cluttered and help you see whether the network situation gets better for that peek-only run. Also disable any other workloads in the app or other apps in the same pod.
A few questions:

- How many queues does the application monitor using the message-peek method you showed earlier?
- What does the HTTP request rate look like (requests per minute/second)?
A few things to check:

- Are the pod (hosting the application) and the Service Bus namespace deployed to the same region? A client and server located in distinct regions also impacts networking.
- I also wonder about the core and memory settings for the pod that runs the Java application; too little resourcing (e.g., 0.2 or 0.5 cores per pod) is another reason for frequent timeouts or stalling in a constrained environment. Based on case studies of many containerized Java production apps, the OpenJDK team at Microsoft suggests no less than 1 core per pod. Here is that team's documentation regarding cores/memory: https://learn.microsoft.com/en-us/azure/developer/java/containers/overview#determine-how-many-cpu-cores-are-needed
Also, we can ignore the WARN about auto-complete; it's a feature available only for the processor client, so the SDK is simply saying the chosen client does not support it. You may keep `disableAutoComplete()` (which has no impact on the receiver client anyway) if the WARN is noise. With the caching approach discussed above, you should see this WARN only once, when a cache entry is populated.
(Regarding the question on max-renewal, I’ll follow up)
@neumannm, checking back, did the recommendations above help with your use case?
Closing this issue, assuming that the suggestions were useful or this is not a priority at the moment. Feel free to reach out if any assistance is needed at a later point.
@anuchandy So sorry for the late reply, project work got me distracted from this issue unfortunately...
Yes, your recommendations seem to have helped. I changed the code for the peeking receiver as you suggested, using a map to store receivers per queue and creating each receiver only on the first request. Since then, we have not had any notable issues with peeking into our queues.
I still need to refactor the code for the functionality to receive from the DLQ and re-send to the corresponding "normal" queue for reprocessing. I tried to do the same as with the peeking receiver, but my first attempt was not successful, which might have to do with the code being suboptimal. You mentioned that cross-entity transactions are not needed for this functionality, but I was not able to solve it without them. I hope I find time soon to dig into that part again.
You said you wanted to follow up on my question regarding max-renewal, did you find anything?
Hello @neumannm, no worries. Glad to hear that the caching pattern worked.
Regarding lock renewal for the receiver: you only need to set an auto-lock-renew duration in the client if the client application is expected to hold on to a received message for longer than the lock duration set at the entity (queue, topic) level (e.g., in the Azure portal). Client-side lock renewal means calls to the service, so it's a tradeoff between setting a higher value in the portal and reducing client-side calls to the service. For example, if the entity-level lock duration is currently 60 seconds but the application almost always takes ~70 seconds to process and call `complete`, then consider bumping the entity-level value to ~70-80 seconds.
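To make the tradeoff concrete, here is a small sketch of the decision rule using `java.time.Duration` (the numbers mirror the 60s/70s example above; `needsClientSideRenewal` is a hypothetical helper for illustration, not an SDK API):

```java
import java.time.Duration;

public class LockRenewCheck {
    // Client-side auto lock renewal is only needed when the expected
    // processing time exceeds the lock duration configured on the entity.
    static boolean needsClientSideRenewal(Duration entityLockDuration,
                                          Duration expectedProcessing) {
        return expectedProcessing.compareTo(entityLockDuration) > 0;
    }

    public static void main(String[] args) {
        Duration processing = Duration.ofSeconds(70); // typical time to process + complete

        // 60s entity lock: message lock would expire mid-processing,
        // so either renew client-side or raise the entity-level lock.
        System.out.println(needsClientSideRenewal(Duration.ofSeconds(60), processing)); // prints true

        // After bumping the entity lock to 80s, no client-side renewal
        // (and no extra renewal calls to the service) is needed.
        System.out.println(needsClientSideRenewal(Duration.ofSeconds(80), processing)); // prints false
    }
}
```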