Comments (14)
Strange... Can you please share the logs? (located under your data folder in the "logs" subfolder)
from clearml-server.
apiserver.log contains only messages as follows:
[2019-06-24 13:47:08,731] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 15ms
[2019-06-24 13:47:08,736] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 13ms
[2019-06-24 13:47:08,859] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 8ms
[2019-06-24 13:47:09,096] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:09,139] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:09,167] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:09,265] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 10ms
[2019-06-24 13:47:09,295] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 13ms
[2019-06-24 13:47:09,300] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 11ms
[2019-06-24 13:47:09,348] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:09,361] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 8ms
[2019-06-24 13:47:09,445] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:09,638] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 13ms
[2019-06-24 13:47:09,651] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 17ms
[2019-06-24 13:47:09,658] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 13ms
[2019-06-24 13:47:09,715] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 13ms
[2019-06-24 13:47:09,731] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 20ms
[2019-06-24 13:47:09,734] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 16ms
[2019-06-24 13:47:09,779] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 12ms
[2019-06-24 13:47:09,786] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 11ms
[2019-06-24 13:47:09,890] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:10,128] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:10,173] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 8ms
[2019-06-24 13:47:10,311] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:10,352] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 15ms
[2019-06-24 13:47:10,357] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 13ms
[2019-06-24 13:47:10,390] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:10,402] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 8ms
[2019-06-24 13:47:10,476] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:10,723] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 17ms
[2019-06-24 13:47:10,728] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 13ms
[2019-06-24 13:47:10,741] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 10ms
[2019-06-24 13:47:10,782] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:10,803] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 16ms
[2019-06-24 13:47:10,805] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 14ms
[2019-06-24 13:47:10,824] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 8ms
[2019-06-24 13:47:10,839] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:10,921] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:11,159] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:11,204] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 8ms
[2019-06-24 13:47:11,247] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:11,354] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:11,423] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 17ms
[2019-06-24 13:47:11,426] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 12ms
[2019-06-24 13:47:11,439] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 10ms
[2019-06-24 13:47:11,453] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 11ms
[2019-06-24 13:47:11,508] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:11,773] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:11,786] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:11,798] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 8ms
[2019-06-24 13:47:11,825] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:11,885] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 17ms
[2019-06-24 13:47:11,887] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 24ms
[2019-06-24 13:47:11,903] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 22ms
[2019-06-24 13:47:11,905] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 12ms
[2019-06-24 13:47:11,952] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:12,191] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
[2019-06-24 13:47:12,235] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 9ms
webserver.log contains:
[2019-06-20 10:22:38,614] [7] [INFO] [trains.webserver] ################ Web Server initializing #####################
from clearml-server.
It seems the apiserver is busy answering repeated "get all users" calls (users.get_all). These requests are typically issued by the webserver to display the full user list on the login page. It looks like the webserver's login page is being refreshed constantly (at intervals of under 40ms).
Any idea why? Is there any chance your server is publicly accessible?
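To put a number on how busy the apiserver is, a rough sketch like the following could parse apiserver.log and report the request rate per endpoint. It assumes every relevant line has the same shape as the excerpt above; the names parse_line and summarize are made up for this illustration and are not part of TRAINS.

```python
import re
from datetime import datetime

# Matches lines shaped like the apiserver.log excerpt in this thread, e.g.
# [2019-06-24 13:47:08,731] [8] [INFO] [trains.service_repo] Returned 200 for users.get_all in 15ms
LINE_RE = re.compile(
    r"^\[(?P<ts>[\d\-]+ [\d:]+,\d{3})\] \[\d+\] \[\w+\] \[[\w.]+\] "
    r"Returned (?P<status>\d+) for (?P<endpoint>[\w.]+) in (?P<ms>\d+)ms$"
)


def parse_line(line):
    """Return (timestamp, endpoint, duration_ms), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S,%f")
    return ts, m.group("endpoint"), int(m.group("ms"))


def summarize(lines):
    """Return {endpoint: (request_count, requests_per_second)}."""
    by_endpoint = {}
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines in any other format
        ts, endpoint, _ = parsed
        by_endpoint.setdefault(endpoint, []).append(ts)

    summary = {}
    for endpoint, stamps in by_endpoint.items():
        span = (max(stamps) - min(stamps)).total_seconds()
        rate = len(stamps) / span if span > 0 else float("inf")
        summary[endpoint] = (len(stamps), rate)
    return summary
```

For example, calling summarize(open("apiserver.log")) returns one entry per endpoint, so a single endpoint dominating the log stands out immediately.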
from clearml-server.
Any idea why? Is there any chance your server is publicly accessible?
Shouldn't be, not that I know of. I'll recheck it of course...
Is there a way to get verbose logging to figure out the IP of the sender?
from clearml-server.
The easiest way would be to run netstat -na | grep :8080 on the host machine (use 8080 to monitor connections to the web-server, and 8008 to monitor connections to the api-server).
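If the netstat output is long, a small script can tally the remote addresses instead of reading them by eye. This is only a sketch: it assumes the usual Linux netstat column layout (protocol, Recv-Q, Send-Q, local address, foreign address, state), and count_remote_hosts is a hypothetical helper name.

```python
from collections import Counter


def count_remote_hosts(netstat_output, port=8080):
    """Count non-listening TCP connections to the given local port, per remote host.

    Assumes Linux `netstat -na` columns:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP/unix sockets and anything without a state column.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if state == "LISTEN":
            continue  # the server's own listening socket, not a client
        if local.rsplit(":", 1)[-1] != str(port):
            continue
        # Split on the last colon so IPv6-style addresses keep their host part.
        counts[remote.rsplit(":", 1)[0]] += 1
    return counts
```

Feeding it the text of netstat -na (for example, read from a saved file) with port=8080 for the web-server or port=8008 for the api-server shows at a glance whether the connections come from localhost or from another machine.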
from clearml-server.
@doronAtuar was the high CPU usage problem solved? Did you manage to get the TRAINS-server working?
from clearml-server.
Yes,
did you manage to get the TRAINS-server working?
It was working fine
The easiest way would be to run netstat -na | grep :8080
I ran netstat -na | grep :8080
All connections seem like they were coming from localhost
After that, I had to stop Trains due to another problem
So that is how far I was able to investigate
from clearml-server.
@doronAtuar if the high CPU usage problem persists, please share a few more details on your installation environment, as we were not able to reproduce this behavior.
- Cloud provider / On-Prem
- OS
- docker version
- public/private IP, firewall, load-balancer
- etc.
from clearml-server.
Sure...
- On-Prem
- OS - Ubuntu 18.04.1
- docker version - Docker version 18.09.5, build e8ff056
- It was exposed only internally in our LAN, which is behind a firewall.
These are the details of the previous run
I will try soon on another machine and comment again if it persists.
from clearml-server.
@doronAtuar , thanks!
Unfortunately, we tested on Ubuntu 18.04 and everything seemed normal.
Could you also please share the output of sudo netstat -natp | grep 8080?
from clearml-server.
Hey,
I'm unable to provide the logs since I had to stop the service.
Unfortunately, I didn't save them.
I cannot start it again on the same machine since it collided with another service running MongoDB.
I will try to start it on another machine and let you know
from clearml-server.
I cannot start it again on the same machine since it collided with another service running MongoDB.
This might actually be the issue. If there is a mongodb service running, configured on the same port as the mongodb docker, then the mongodb docker will keep restarting. The API server will keep trying to access the mongodb that is not available (or even worse, the other mongodb service), and this will incur high CPU usage...
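A quick way to check for this kind of collision before starting the containers is to test whether something is already listening on the MongoDB port. The sketch below assumes MongoDB's default port 27017 (the port actually used depends on your configuration), and port_in_use is a hypothetical helper name.

```python
import socket

# Assumption: MongoDB's default port. Adjust to the port your setup maps.
MONGO_PORT = 27017


def port_in_use(port, host="127.0.0.1"):
    """Return True if something already accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(1.0)
        # connect_ex returns 0 on success instead of raising an exception.
        return s.connect_ex((host, port)) == 0
```

If port_in_use(MONGO_PORT) returns True while the TRAINS containers are stopped, another service (such as a host-level MongoDB) is holding the port.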
from clearml-server.
@doronAtuar a new TRAINS-server was released, please let me know if the problem still exists.
from clearml-server.
Closing, due to lack of activity.
from clearml-server.