Comments (7)
Hey @jakob-reesalu, I was pretty busy with my day job for the past two weeks. I'll take a look at this issue and will also follow up on the other one. 👍
@burningalchemist Ya no worries!
Update:
To get logging, I've now run my sql-exporters from the command line. One weird thing I've seen when running from the command line is that everything seems to lock up until I press the up/down arrow (it might be that any key press works) in the command-line window.
For example: I refresh the /metrics page and it just gets stuck loading. Then I bring up the command window and press the up or down arrow, and suddenly I see logging about collected metrics; at that point the /metrics page stops loading and presents the metrics.
So, coming back from the weekend I had logs showing "Error gathering metrics" until I pressed the up/down arrows in the command window; then the exporter logged that metrics were gathered:
[Just lots of errors for most of the weekend]
...
ts=2022-11-14T05:58:17.379Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 3 error(s) occurred:\n* [from Gatherer #1] [collector=\"cleanup_jobs\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"index_fragmentation_checks\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"table_counts\"] context deadline exceeded"
ts=2022-11-14T06:58:16.894Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 3 error(s) occurred:\n* [from Gatherer #1] [collector=\"table_counts\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"index_fragmentation_checks\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"cleanup_jobs\"] context deadline exceeded"
ts=2022-11-14T07:58:16.410Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 3 error(s) occurred:\n* [from Gatherer #1] [collector=\"table_counts\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"cleanup_jobs\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"index_fragmentation_checks\"] context deadline exceeded"
[So here I press the up or down arrow, and now suddenly we get "Collecting fresh metrics...". Some errors, but then it all starts working.]
ts=2022-11-14T08:11:46.823Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"table_counts\"] Collecting fresh metrics: min_interval=21600.000s cache_age=234005.553s"
ts=2022-11-14T08:11:46.823Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 2 error(s) occurred:\n* [from Gatherer #1] [collector=\"table_counts\", query=\"<table1>s_count\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"table_counts\", query=\"<table2>s_count\"] context deadline exceeded"
ts=2022-11-14T08:11:46.823Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Collecting fresh metrics: min_interval=3600.000s cache_age=230405.467s"
ts=2022-11-14T08:11:46.824Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Collecting fresh metrics: min_interval=21600.000s cache_age=234005.553s"
ts=2022-11-14T08:11:46.824Z caller=klog.go:84 level=debug func=Infof msg="Error gathering metrics: 2 error(s) occurred:\n* [from Gatherer #1] [collector=\"table_counts\"] context deadline exceeded\n* [from Gatherer #1] [collector=\"index_fragmentation_checks\", query=\"index_fragmentation_<table1>_<table2>_<table3>\"] context deadline exceeded"
ts=2022-11-14T08:13:20.163Z caller=klog.go:55 level=debug func=Verbose.Infof msg="returned_columns=\"[database count]\"collector=\"table_counts\", query=\"<table1>s_count\""
ts=2022-11-14T08:13:23.571Z caller=klog.go:55 level=debug func=Verbose.Infof msg="returned_columns=\"[database count]\"collector=\"table_counts\", query=\"<table2>s_count\""
ts=2022-11-14T08:15:46.414Z caller=klog.go:55 level=debug func=Verbose.Infof msg="returned_columns=\"[database table index avgFragmentationInPercent]\"collector=\"index_fragmentation_checks\", query=\"index_fragmentation_<table1>_<table2>_<table3>\""
ts=2022-11-14T08:15:57.153Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"table_counts\"] Returning cached metrics: min_interval=21600.000s cache_age=318.392s"
ts=2022-11-14T08:15:57.154Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Returning cached metrics: min_interval=3600.000s cache_age=318.392s"
ts=2022-11-14T08:15:57.250Z caller=klog.go:55 level=debug func=Verbose.Infof msg="returned_columns=\"[database cleanupPerformed]\"collector=\"cleanup_jobs\", query=\"refreshtoken_cleanup_job\""
ts=2022-11-14T08:15:57.305Z caller=klog.go:55 level=debug func=Verbose.Infof msg="returned_columns=\"[database cleanupPerformed]\"collector=\"cleanup_jobs\", query=\"<table1>s_cleanup_job\""
ts=2022-11-14T08:15:57.885Z caller=klog.go:55 level=debug func=Verbose.Infof msg="returned_columns=\"[database cleanupPerformed]\"collector=\"cleanup_jobs\", query=\"<table2>s_cleanup_job\""
ts=2022-11-14T08:15:57.885Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=21600.000s cache_age=318.392s"
ts=2022-11-14T08:20:24.757Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"table_counts\"] Returning cached metrics: min_interval=21600.000s cache_age=1327.962s"
ts=2022-11-14T08:20:24.758Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"cleanup_jobs\"] Returning cached metrics: min_interval=21600.000s cache_age=1327.963s"
ts=2022-11-14T08:20:24.758Z caller=klog.go:55 level=debug func=Verbose.Infof msg="[collector=\"index_fragmentation_checks\"] Returning cached metrics: min_interval=3600.000s cache_age=1327.963s"
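For reference, the min_interval values in these logs come from sql_exporter's per-collector caching. A collector file along these lines (a minimal sketch with placeholder metric and table names, not my actual config) would produce the 21600s interval shown above:

```yaml
# Sketch of a collector file: min_interval drives the
# "Collecting fresh metrics" / "Returning cached metrics" lines above.
collector_name: table_counts
min_interval: 6h               # 21600s, as logged for table_counts and cleanup_jobs
metrics:
  - metric_name: mssql_table_count
    type: gauge
    help: 'Number of rows in the table.'
    key_labels: [database]     # matches returned_columns "[database count]"
    values: [count]
    query: |
      SELECT DB_NAME() AS [database], COUNT(*) AS [count]
      FROM dbo.SomeTable
```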
Actually! I'm running in PowerShell and googled this; it turns out to be a PowerShell console setting (QuickEdit mode pauses output, and with it the process, until a key is pressed) described in this answer: https://serverfault.com/questions/204150/sometimes-powershell-stops-sending-output-until-i-press-enter-why
So that's solved now, and it's not a SQL Exporter issue :)
Hey @jakob-reesalu, did you eventually figure out the configuration?
I believe in the end it's related to the timeout configured on the Prometheus side. If the connection is cut after 15s, sql_exporter also cancels the in-flight queries, since there's no receiver left to respond to.
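For illustration, with a Prometheus job roughly like this (a sketch; the interval, timeout, and target are assumed values), any query still running when scrape_timeout hits gets cancelled with a context deadline exceeded error:

```yaml
# Hypothetical Prometheus scrape job: scrape_timeout caps how long the
# exporter's queries may run before the scrape context is cancelled.
scrape_configs:
  - job_name: 'sql_exporter'
    scrape_interval: 1m
    scrape_timeout: 15s               # queries still running at 15s get cancelled
    static_configs:
      - targets: ['localhost:9399']   # assumed exporter host:port
```

The exporter's own global scrape_timeout and scrape_timeout_offset settings bound the same window, so both sides need enough headroom for the slowest query.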
I'm going to close the issue as stale/solved. Feel free to reopen it. 👍
Cheers!
Hey man @burningalchemist!
I don't recall exactly what I did in the end. Unfortunately I haven't fully reached the "end", as we're still having issues with the exporter. =/ Right now we get one or perhaps a few scrapes a day, then Prometheus alerts that the exporter is down even though the service is still running on the DB machine. And even while the service is running, the /metrics page shows errors instead of the metrics it previously managed to get. Not sure if our issue relates to what you've added to the start page:
"Configuration
SQL Exporter is deployed alongside the DB server it collects metrics from. If both the exporter and the DB server are on the same host, they will share the same failure domain: they will usually be either both up and running or both down. When the database is unreachable, /metrics responds with HTTP code 500 Internal Server Error, causing Prometheus to record up=0 for that scrape. Only metrics defined by collectors are exported on the /metrics endpoint. SQL Exporter process metrics are exported at /sql_exporter_metrics."
The thing is, the database is up and running in our case, so it doesn't seem to be unreachable.
I'll see if I get time to come back to this in the future.
@jakob-reesalu Got it, thanks! Yeah, please re-open once you have some time, and maybe we can go over it again; I'm happy to assist. In the meantime I'll think about it. There can be many factors when queries are long-running.