I have the app running in a Web App for container running in Azure. After a period


AzNamingTool - Application is crashing because of logfiles? (cloudadoptionframework, 14 comments, closed)

microsoft commented on May 21, 2024

from cloudadoptionframework.

Comments (14)

iic-eon commented on May 21, 2024

Dear team, it will be helpful if the above can be solved. Thank you.

jamasten commented on May 21, 2024

Thank you for reporting the issue. We are looking into it and will respond here once we've implemented a fix.

ciprianglg commented on May 21, 2024

@jamasten If required, I can provide the app URL and credentials if that will speed things up, at least to help understand what the problem might be.

BryanSoltis commented on May 21, 2024

Hi @ciprianglg,

Thank you for the feedback. I have been looking at the code and may have some ideas as to the cause, but need to clarify some aspects first.

1. Your stack traces in your original message show errors on the Generate and Reference pages. However, later in your report, under "how to replicate", you mention calling the API. Can you please clarify how you are using the site and receiving the error?
2. If you are using the site (not the API), are you receiving the error after some time of not accessing the site? The stack trace indicates the "delimiter" value is empty, so I believe the root cause is that the site no longer has the correct configuration data. I have added code that would reload this data if it's ever empty, but wanted to confirm this was the scenario you are seeing.
3. The site by default will attempt to cache configuration for 5 minutes (I have increased this to 10 minutes in the next update). When the configuration data is requested and it's empty or expired, the site should re-pull the configuration data. This would require something happening on the site (reloading a page, navigating to another page, etc.). Can you detail the steps you take when you see the errors?
4. When you stated you cleared the admin log/generated names log files, are you also restarting the application? If so, I believe the log aspect is unrelated, as I think the cause is the configuration not being loaded (see #3 above).
5. Your stacktrace.txt file shows errors when the site attempts to deserialize the data from the JSON files. If you are running the site as a container, these files would be stored in the mounted storage account. Can you confirm that the storage account contains the JSON files? You will need to browse to the storage account directly to look at the file share files.
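To make the caching behavior described above concrete (configuration cached for a few minutes and re-pulled when it is empty or expired), here is a minimal Python sketch. It is not the tool's own implementation: `ConfigCache`, the `loader` callable, and the injectable `clock` are hypothetical names for illustration, and the 600-second default simply mirrors the 10-minute duration mentioned above.

```python
import time
from typing import Callable, Optional


class ConfigCache:
    """Hypothetical sketch: cache configuration data, reload when empty or expired."""

    def __init__(self, loader: Callable[[], dict], ttl_seconds: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # stands in for reading the JSON config files
        self._ttl = ttl_seconds        # time-to-live for the cached copy
        self._clock = clock            # injectable so the behavior can be tested
        self._data: Optional[dict] = None
        self._loaded_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        expired = now - self._loaded_at >= self._ttl
        # Re-pull when nothing is cached, the cached copy is empty,
        # or the time-to-live has elapsed.
        if not self._data or expired:
            self._data = self._loader()
            self._loaded_at = now
        return self._data
```

Treating an empty result as "not cached" is what makes the cache self-correcting: a bad load (for example, a missing "delimiter" value because the whole config came back empty) is retried on the next request instead of being served until the TTL runs out.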

Thank you for any additional details you can provide!

-Bryan

ciprianglg commented on May 21, 2024

@BryanSoltis Sorry for the late response; I was off for a couple of days.
Point 1:
The "how to replicate" section just describes how you can make the application hang and stop working.
On my side, I mostly use the application through API calls to generate names. Suddenly I don't get a reply back, and in the logs I can see the stack overflow error at the point where the response fails.
Then I move to the web app to see what is going on, and there I can see the errors shown in the screenshot (also posted above) when I navigate through the menu, especially to Reference/Generate.

Point 2:
As mentioned in point 1, immediately after the API call stops working, I switch to the web interface to see if the application is still working and navigate through the interface.

Point 3:
The application restarts very often because it reports that it is not healthy. I stop and start it, but when it comes back it still crashes because it is not healthy. The only way to make it work properly again is to stop the web app, delete those two log files, and start it again.

Point 4:
I'm running it in Azure App Service as a Web App for Containers, and I have mounted a file share from a storage account to the app service. All the files are there all the time, as I use the share to delete the log files and get the app working again.

If required, we can have a session together for a couple of minutes to go through the errors. The idea is that I want to use this app, especially the API, to provision Azure resources programmatically. I find it fits my purpose, as I don't need to worry about what names I should give to resources.

BryanSoltis commented on May 21, 2024

Hi @ciprianglg,

Thank you for the details! It definitely helps me understand better about what might be happening. In reviewing the stack trace again, I can see it's failing repeatedly to write to the Admin Log. When a name is requested via the API and the process fails, it will attempt to do this. The actual "RequestName" function has a single try/catch that will attempt this if anything goes wrong. What's odd is that your stack trace file has MANY occurrences of this in a short succession (milliseconds apart).

The "Write to Admin Log" process goes like this:

1. Get all the current log items
2. Increase the id counter
3. Write the new log item to the list with the new id
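The three steps above form a read-modify-write cycle on a single JSON file. A minimal Python sketch of that cycle follows; it is a hypothetical illustration, not the tool's own code, and the entry fields (`id`, `message`, `createdOn`) are assumed names. The lock is an addition in this sketch: it serializes the three steps within one process so that concurrent callers cannot interleave their reads and writes.

```python
import json
import os
import threading
from datetime import datetime, timezone

_log_lock = threading.Lock()  # guards the read-modify-write cycle in this process


def write_admin_log(path: str, message: str) -> dict:
    """Hypothetical sketch: append one entry to a JSON array stored at `path`."""
    with _log_lock:
        # Step 1: get all the current log items (empty list if no file yet).
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                items = json.load(f)
        else:
            items = []
        # Step 2: increase the id counter.
        next_id = max((item["id"] for item in items), default=0) + 1
        # Step 3: write the new log item to the list with the new id.
        entry = {
            "id": next_id,
            "message": message,
            "createdOn": datetime.now(timezone.utc).isoformat(),
        }
        items.append(entry)
        with open(path, "w", encoding="utf-8") as f:
            json.dump(items, f)
        return entry
```

Because every call rewrites the whole file, a burst of requests means a burst of full rewrites; without some form of serialization, overlapping writers could leave the file in an inconsistent state. A process-local lock does not help if several processes or containers share the same file.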

I think the issue might be that the site hits an error when it attempts to pull those Admin Log messages. However, that function is also wrapped in a try/catch and should just return a single error response if it fails.

During the development of the tool and after the first launch, I hosted it in an Azure App Service container as well. While this worked almost every time, occasionally I would see an issue with it connecting to the storage mount. I was never able to fully identify the circumstances that led to the issue, and restarting it usually resolved it. I'm wondering if that is the issue you are experiencing as well.

I am looking at the code now to see if I can add some sort of check to "verify" that the storage is available, in case that is causing the issue. It's not a full solution, but it would at least avoid the problem you are seeing, as the site should self-correct if it happens.
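The thread does not show what such a check looks like. A minimal Python sketch, assuming it only needs to confirm that the mounted directory exists and is writable, could probe the mount with a throwaway file. `storage_is_available` and the `mount_path` parameter are hypothetical names, not part of the tool.

```python
import os
import uuid


def storage_is_available(mount_path: str) -> bool:
    """Hypothetical check: does `mount_path` exist, and can we write and delete a file there?"""
    if not os.path.isdir(mount_path):
        return False
    # Unique probe name so concurrent checks do not collide.
    probe = os.path.join(mount_path, f".probe-{uuid.uuid4().hex}")
    try:
        with open(probe, "w", encoding="utf-8") as f:
            f.write("ok")
        os.remove(probe)
        return True
    except OSError:
        # Covers permission errors and a share that dropped mid-check.
        return False
```

A caller might run this before reading the settings files and, if it returns False, retry or report the instance as unhealthy rather than trying to deserialize files it cannot reach.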

Once I get that fix in place, I will send you an invite to a private repo with the change, if you don't mind testing. I will attempt to recreate the problem, but it would be awesome if you could, as well.

Bryan

ciprianglg commented on May 21, 2024

Hi @BryanSoltis, imagine that I'm trying to provision resources for a project that contains different kinds of resources. Each time I make a change, there are lots of requests to get names, which explains the number of calls. In the example I use to reproduce the error quickly, I request a name for all of the 350+ short names, which is why you see so many occurrences (not ideal).
Also, I thought the problem was with the storage account, but I managed to crash the app without the storage account mounted, so I've ruled that out and don't think you should focus on it. The downside of not having the file share mounted was that each time the app crashed, the image was pulled from the registry and the app started fresh, so I couldn't find the logs in the container.
As for testing, I'm ready when you are :).

BryanSoltis commented on May 21, 2024

OK, good to know. In looking at it some more, I think the issue may actually be with the caching. I'll be sharing a private repo shortly with the updates for you to test.

NOTE
You will see tool/file version messages, as this code is ahead of the official release.

Bryan

BryanSoltis commented on May 21, 2024

This has been fixed in the following PR: #124

Bryan

ciprianglg commented on May 21, 2024

@BryanSoltis I still have the issue that the application crashes from time to time with the following stack overflow message:
stackOverflowIssue31.01.2023.txt

I tried several times to restart the app service, and to stop and start it, but I was not able to get it working again, as the container kept crashing.
The only way to solve it was to delete the admin log; after that, the application started working again.
Any hints?

BryanSoltis commented on May 21, 2024

Hello Ciprian,

Sorry to hear you are still having issues. From the logs, it appears to have a problem with the serialization/deserialization of the Admin Log messages. Then, when it has this issue, it attempts to log the message in the Admin Log, further compounding the problem. Additional logic was added in the previous PR to address this issue, but it seems like something else is still not working.
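The compounding effect described above happens when the error path of the logger writes back into the same broken log. A minimal Python sketch of one way to break that loop: when the admin log itself cannot be read or written, report the failure somewhere else (stderr here) instead of calling the logger again. This is a hypothetical illustration, not the tool's code; `log_admin_error` and the `{"message": ...}` entry shape are assumed names.

```python
import json
import os
import sys


def log_admin_error(path: str, message: str) -> bool:
    """Hypothetical sketch: append `message` to the JSON admin log; never recurse on failure."""
    try:
        items = []
        if os.path.exists(path):
            with open(path, encoding="utf-8") as f:
                # Raises json.JSONDecodeError (a ValueError subclass) on malformed
                # content, e.g. a stray trailing "]".
                items = json.load(f)
        items.append({"message": message})
        with open(path, "w", encoding="utf-8") as f:
            json.dump(items, f)
        return True
    except (OSError, ValueError) as exc:
        # Deliberately NOT calling log_admin_error() here: logging the logging
        # failure into the same broken file is the recursion that keeps growing the stack.
        print(f"admin log unavailable ({exc}); message was: {message}", file=sys.stderr)
        return False
```

With this shape, a malformed log file costs one failed write and one stderr line per error instead of an unbounded chain of nested logging attempts, and the file is left untouched for later inspection.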

Can you please provide the following:

- Current tool version you are running
- Environment you are running (AKS, Docker, etc.)

Also, I have created a new private GitHub repo and added you as a collaborator. Can you please add your /settings/adminlogmessages.json file to the repo so I can review it? Ideally, I would like the log in the state it is in when you experience the issue, to determine whether specific content is causing it.

Once I have the above, I will research this more to try and determine what the problem may be.

-Bryan

ciprianglg commented on May 21, 2024

@BryanSoltis
I'm using the latest version of the tool, running inside an App Service web app. Unfortunately, I didn't save the adminlogmessages.json from the storage account, and only realized later that I should have kept it.
So I will wait for the next failure and then upload it together with multiple logs from the app service. If there is no urgent need, I will share the links with you so you can see the behavior.
Is this OK with you?

BryanSoltis commented on May 21, 2024

Sounds good. Let me know when you upload the files to the private repo and I'll check it out. Thank you!

-Bryan

ciprianglg commented on May 21, 2024

To summarize: I used my own code to call the API to return generated names. My code was sending requests too fast, which caused strange behavior in the application that eventually resulted in a stack overflow message (this happened because the log files, especially adminlogmessages.json, were badly formatted, containing an extra "]").
My code has been refactored, and so far I have not been able to reproduce the stack overflow message.
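The refactored client is not shown in the thread. One hypothetical way to keep a caller from sending requests too fast is to issue them one at a time with a minimum spacing between request starts. In the Python sketch below, `request` stands in for whatever function performs the call to the tool's name-generation API (not shown here), and the 0.2-second default interval is an arbitrary assumption to tune.

```python
import time
from typing import Callable, Iterable, List, Optional, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def call_throttled(items: Iterable[T], request: Callable[[T], R],
                   min_interval: float = 0.2,
                   clock: Callable[[], float] = time.monotonic,
                   sleep: Callable[[float], None] = time.sleep) -> List[R]:
    """Call `request` once per item, sequentially, starting at most one call per `min_interval` seconds."""
    results: List[R] = []
    last_start: Optional[float] = None
    for item in items:
        if last_start is not None:
            # Wait out whatever remains of the interval since the previous start.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(request(item))
    return results
```

Because the calls are sequential, the server never sees overlapping requests from this client, and the interval caps the request rate even when each call returns quickly. `clock` and `sleep` are injectable only so the behavior can be tested without real delays.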
