
Unified Observability for Oracle Database

This project aims to provide observability for the Oracle Database so that users can understand performance and diagnose issues easily across applications and database. Over time, this project will provide not just metrics, but also logging and tracing support, and integration into popular frameworks like Spring Boot. The project aims to deliver functionality to support both cloud and on-premises databases, including those running in Kubernetes and containers.

From the first production release, v1.0, onwards, this project provides a Prometheus exporter for Oracle Database that is based in part on a Prometheus exporter created by Seth Miller with changes to comply with various Oracle standards and policies.

Contributions are welcome - please see contributing.


Release Notes

Version 1.3.1, July 22, 2024

This release includes the following changes:

  • Alert logs can be disabled by setting parameter log.disable to 1.
  • Alert log exporter will stop if it gets three consecutive failures.
  • Updated the list of required permissions.
  • Updated the TxEventQ sample dashboard.
  • Updated some third-party dependencies.

Thank you to the following people for their suggestions and contributions:

Version 1.3.0, June 7, 2024

This release includes the following changes:

  • Alert logs can be exported for collection by a log reader such as Promtail or Fluent Bit. By default, the output is written to /log/alert.log in JSON format.
  • Provide ability to connect as SYSDBA or SYSOPER by setting DB_ROLE.
  • New default metric is added to report the type of database connected to (CDB or PDB).
  • New default metrics are added for cache hit ratios.
  • Default metrics updated to suppress spurious warnings in the log.
  • Wait class metric updated to use a better query.
  • The sample dashboard is updated to include new metrics.
  • Fixed a bug which prevented periodic freeing of memory.
  • Set CLIENT_INFO to a meaningful value.
  • Update Go toolchain to 1.22.4.
  • Updated some third-party dependencies.

Thank you to the following people for their suggestions and contributions:

Version 1.2.1, April 16, 2024

This release includes the following changes:

  • Accept max idle and open connections settings as parameters.
  • Updated some third-party dependencies.

Version 1.2.0, January 17, 2024

This release includes the following changes:

  • Introduced a new feature to periodically restart the process if requested.
  • Introduced a new feature to periodically attempt to free OS memory if requested.
  • Updated some third-party dependencies.

Version 1.1.1, November 28, 2023

This release just updates some third-party dependencies.

Version 1.1, October 27, 2023

This release includes the following changes:

  • The query for the standard metric wait_class has been updated so that it will work in both container databases and pluggable databases, including in Oracle Autonomous Database instances. Note that this query will not return any data unless the database instance is under load.
  • Support for reading the database password from OCI Vault has been added (see details).
  • Log messages have been improved.
  • Some dependencies have been updated.

Version 1.0, September 13, 2023

The first production release, v1.0, includes the following features:

  • A number of standard metrics are exposed,
  • Users can define custom metrics,
  • Oracle regularly reviews third-party licenses and scans the code and images, including transitive/recursive dependencies for issues,
  • Connection to Oracle can be a basic connection or use an Oracle Wallet and TLS - connection to Oracle Autonomous Database is supported,
  • Metrics for Oracle Transactional Event Queues are also supported,
  • A Grafana dashboard is provided for Transactional Event Queues, and
  • A pre-built container image is provided, based on Oracle Linux, and optimized for size and security.

Note that this exporter uses a different Oracle Database driver, which in turn uses code written by Oracle to access the database. This driver requires an Oracle client. In this initial release, the client is bundled into the container image; however, we intend to make that optional in order to minimize the image size.

The interfaces for this version have been kept as close as possible to those of earlier alpha releases in this repository to assist with migration. However, it should be expected that there may be breaking changes in future releases.

Roadmap

We always welcome input on features you would like to see supported. Please open an issue in this repository with your suggestions.

Currently, we plan to address the following key features:

  • Implement multiple database support - allow the exporter to publish metrics for multiple database instances,
  • Implement connection storm protection - prevent the exporter from repeatedly connecting when the credentials fail, to prevent a storm of connections causing accounts to be locked across a large number of databases,
  • Provide the option to have the Oracle client outside of the container image, e.g., on a shared volume,
  • Implement the ability to update the configuration dynamically, i.e., without a restart,
  • Implement support for exporting logs, including audit logs for example, from the database,
  • Implement support for tracing within the database, e.g., using an execution context ID provided by an external caller,
  • Provide additional pre-built Grafana dashboards,
  • Integration with Spring Observability, e.g., Micrometer,
  • Provide additional documentation and samples, and
  • Integrate with the Oracle Database Operator for Kubernetes.

Standard metrics

The following metrics are exposed by default:

# HELP oracledb_activity_execute_count Generic counter metric from v$sysstat view in Oracle.
# TYPE oracledb_activity_execute_count gauge
oracledb_activity_execute_count 64469
# HELP oracledb_activity_parse_count_total Generic counter metric from v$sysstat view in Oracle.
# TYPE oracledb_activity_parse_count_total gauge
oracledb_activity_parse_count_total 25883
# HELP oracledb_activity_user_commits Generic counter metric from v$sysstat view in Oracle.
# TYPE oracledb_activity_user_commits gauge
oracledb_activity_user_commits 158
# HELP oracledb_activity_user_rollbacks Generic counter metric from v$sysstat view in Oracle.
# TYPE oracledb_activity_user_rollbacks gauge
oracledb_activity_user_rollbacks 2
# HELP oracledb_db_platform_value Database platform
# TYPE oracledb_db_platform_value gauge
oracledb_db_platform_value{platform_name="Linux x86 64-bit"} 1
# HELP oracledb_db_system_value Database system resources metric
# TYPE oracledb_db_system_value gauge
oracledb_db_system_value{name="cpu_count"} 2
oracledb_db_system_value{name="pga_aggregate_limit"} 2.147483648e+09
oracledb_db_system_value{name="sga_max_size"} 1.610612736e+09
# HELP oracledb_dbtype Type of database the exporter is connected to (0=non-CDB, 1=CDB, >1=PDB).
# TYPE oracledb_dbtype gauge
oracledb_dbtype 0
# HELP oracledb_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, goversion from which oracledb_exporter was built, and the goos and goarch for the build.
# TYPE oracledb_exporter_build_info gauge
oracledb_exporter_build_info{branch="",goarch="amd64",goos="linux",goversion="go1.22.4",revision="unknown",tags="unknown",version=""} 1
# HELP oracledb_exporter_last_scrape_duration_seconds Duration of the last scrape of metrics from Oracle DB.
# TYPE oracledb_exporter_last_scrape_duration_seconds gauge
oracledb_exporter_last_scrape_duration_seconds 0.040507382
# HELP oracledb_exporter_last_scrape_error Whether the last scrape of metrics from Oracle DB resulted in an error (1 for error, 0 for success).
# TYPE oracledb_exporter_last_scrape_error gauge
oracledb_exporter_last_scrape_error 0
# HELP oracledb_exporter_scrapes_total Total number of times Oracle DB was scraped for metrics.
# TYPE oracledb_exporter_scrapes_total counter
oracledb_exporter_scrapes_total 3
# HELP oracledb_process_count Gauge metric with count of processes.
# TYPE oracledb_process_count gauge
oracledb_process_count 79
# HELP oracledb_resource_current_utilization Generic counter metric from v$resource_limit view in Oracle (current value).
# TYPE oracledb_resource_current_utilization gauge
oracledb_resource_current_utilization{resource_name="branches"} 0
oracledb_resource_current_utilization{resource_name="cmtcallbk"} 0
oracledb_resource_current_utilization{resource_name="dml_locks"} 0
oracledb_resource_current_utilization{resource_name="enqueue_locks"} 43
oracledb_resource_current_utilization{resource_name="enqueue_resources"} 31
oracledb_resource_current_utilization{resource_name="gcs_resources"} 0
oracledb_resource_current_utilization{resource_name="gcs_shadows"} 0
oracledb_resource_current_utilization{resource_name="ges_big_msgs"} 0
oracledb_resource_current_utilization{resource_name="ges_cache_ress"} 0
oracledb_resource_current_utilization{resource_name="ges_locks"} 0
oracledb_resource_current_utilization{resource_name="ges_procs"} 0
oracledb_resource_current_utilization{resource_name="ges_reg_msgs"} 0
oracledb_resource_current_utilization{resource_name="ges_ress"} 0
oracledb_resource_current_utilization{resource_name="ges_rsv_msgs"} 0
oracledb_resource_current_utilization{resource_name="k2q_locks"} 0
oracledb_resource_current_utilization{resource_name="max_rollback_segments"} 22
oracledb_resource_current_utilization{resource_name="max_shared_servers"} 2
oracledb_resource_current_utilization{resource_name="parallel_max_servers"} 2
oracledb_resource_current_utilization{resource_name="processes"} 80
oracledb_resource_current_utilization{resource_name="sessions"} 95
oracledb_resource_current_utilization{resource_name="smartio_buffer_memory"} 0
oracledb_resource_current_utilization{resource_name="smartio_metadata_memory"} 0
oracledb_resource_current_utilization{resource_name="smartio_overhead_memory"} 0
oracledb_resource_current_utilization{resource_name="smartio_sessions"} 0
oracledb_resource_current_utilization{resource_name="sort_segment_locks"} 2
oracledb_resource_current_utilization{resource_name="temporary_table_locks"} 0
oracledb_resource_current_utilization{resource_name="transactions"} 0
# HELP oracledb_resource_limit_value Generic counter metric from v$resource_limit view in Oracle (UNLIMITED: -1).
# TYPE oracledb_resource_limit_value gauge
oracledb_resource_limit_value{resource_name="branches"} -1
oracledb_resource_limit_value{resource_name="cmtcallbk"} -1
oracledb_resource_limit_value{resource_name="dml_locks"} -1
oracledb_resource_limit_value{resource_name="enqueue_locks"} 5542
oracledb_resource_limit_value{resource_name="enqueue_resources"} -1
oracledb_resource_limit_value{resource_name="gcs_resources"} -1
oracledb_resource_limit_value{resource_name="gcs_shadows"} -1
oracledb_resource_limit_value{resource_name="ges_big_msgs"} -1
oracledb_resource_limit_value{resource_name="ges_cache_ress"} -1
oracledb_resource_limit_value{resource_name="ges_locks"} -1
oracledb_resource_limit_value{resource_name="ges_procs"} 0
oracledb_resource_limit_value{resource_name="ges_reg_msgs"} -1
oracledb_resource_limit_value{resource_name="ges_ress"} -1
oracledb_resource_limit_value{resource_name="ges_rsv_msgs"} 0
oracledb_resource_limit_value{resource_name="k2q_locks"} -1
oracledb_resource_limit_value{resource_name="max_rollback_segments"} 65535
oracledb_resource_limit_value{resource_name="max_shared_servers"} -1
oracledb_resource_limit_value{resource_name="parallel_max_servers"} 32767
oracledb_resource_limit_value{resource_name="processes"} 300
oracledb_resource_limit_value{resource_name="sessions"} 472
oracledb_resource_limit_value{resource_name="smartio_buffer_memory"} -1
oracledb_resource_limit_value{resource_name="smartio_metadata_memory"} -1
oracledb_resource_limit_value{resource_name="smartio_overhead_memory"} -1
oracledb_resource_limit_value{resource_name="smartio_sessions"} -1
oracledb_resource_limit_value{resource_name="sort_segment_locks"} -1
oracledb_resource_limit_value{resource_name="temporary_table_locks"} -1
oracledb_resource_limit_value{resource_name="transactions"} -1
# HELP oracledb_sessions_value Gauge metric with count of sessions by status and type.
# TYPE oracledb_sessions_value gauge
oracledb_sessions_value{status="ACTIVE",type="BACKGROUND"} 65
oracledb_sessions_value{status="ACTIVE",type="USER"} 7
oracledb_sessions_value{status="INACTIVE",type="USER"} 1
# HELP oracledb_tablespace_bytes Generic counter metric of tablespaces bytes in Oracle.
# TYPE oracledb_tablespace_bytes gauge
oracledb_tablespace_bytes{tablespace="SYSAUX",type="PERMANENT"} 5.7442304e+08
oracledb_tablespace_bytes{tablespace="SYSTEM",type="PERMANENT"} 1.101135872e+09
oracledb_tablespace_bytes{tablespace="TEMP",type="TEMPORARY"} 0
oracledb_tablespace_bytes{tablespace="UNDOTBS1",type="UNDO"} 4.1353216e+07
oracledb_tablespace_bytes{tablespace="USERS",type="PERMANENT"} 1.048576e+06
# HELP oracledb_tablespace_free Generic counter metric of tablespaces free bytes in Oracle.
# TYPE oracledb_tablespace_free gauge
oracledb_tablespace_free{tablespace="SYSAUX",type="PERMANENT"} 1.7939390464e+10
oracledb_tablespace_free{tablespace="SYSTEM",type="PERMANENT"} 1.7936965632e+10
oracledb_tablespace_free{tablespace="TEMP",type="TEMPORARY"} 1.7947820032e+10
oracledb_tablespace_free{tablespace="UNDOTBS1",type="UNDO"} 3.4318368768e+10
oracledb_tablespace_free{tablespace="USERS",type="PERMANENT"} 1.7930805248e+10
# HELP oracledb_tablespace_max_bytes Generic counter metric of tablespaces max bytes in Oracle.
# TYPE oracledb_tablespace_max_bytes gauge
oracledb_tablespace_max_bytes{tablespace="SYSAUX",type="PERMANENT"} 1.8513813504e+10
oracledb_tablespace_max_bytes{tablespace="SYSTEM",type="PERMANENT"} 1.9038101504e+10
oracledb_tablespace_max_bytes{tablespace="TEMP",type="TEMPORARY"} 1.7947820032e+10
oracledb_tablespace_max_bytes{tablespace="UNDOTBS1",type="UNDO"} 3.4359721984e+10
oracledb_tablespace_max_bytes{tablespace="USERS",type="PERMANENT"} 1.7931853824e+10
# HELP oracledb_tablespace_used_percent Gauge metric showing as a percentage of how much of the tablespace has been used.
# TYPE oracledb_tablespace_used_percent gauge
oracledb_tablespace_used_percent{tablespace="SYSAUX",type="PERMANENT"} 3.102672714489066
oracledb_tablespace_used_percent{tablespace="SYSTEM",type="PERMANENT"} 5.783853352019611
oracledb_tablespace_used_percent{tablespace="TEMP",type="TEMPORARY"} 0
oracledb_tablespace_used_percent{tablespace="UNDOTBS1",type="UNDO"} 0.1203537561196118
oracledb_tablespace_used_percent{tablespace="USERS",type="PERMANENT"} 0.005847560493698568
# HELP oracledb_teq_curr_inst_id ID of current instance
# TYPE oracledb_teq_curr_inst_id gauge
oracledb_teq_curr_inst_id 1
# HELP oracledb_top_sql_elapsed SQL statement elapsed time running
# TYPE oracledb_top_sql_elapsed gauge
oracledb_top_sql_elapsed{sql_id="01uy9sb7w8a9g",sql_text=" begin      dbms_aqadm_sys.remove_all_nondurablesub(:1,"} 0.147496
oracledb_top_sql_elapsed{sql_id="0nakmm882vmq0",sql_text="select /* QOSH:DROP_STAT_HIST_PARTS */ partition_name, "} 0.072836
oracledb_top_sql_elapsed{sql_id="0sbbcuruzd66f",sql_text="select /*+ rule */ bucket_cnt, row_cnt, cache_cnt, null"} 0.072226
oracledb_top_sql_elapsed{sql_id="121ffmrc95v7g",sql_text="select i.obj#,i.ts#,i.file#,i.block#,i.intcols,i.type#,"} 0.17176
oracledb_top_sql_elapsed{sql_id="20x4skzx6dbjm",sql_text="INSERT INTO OPATCH_XINV_TAB(XML_INVENTORY) SELECT * FRO"} 2.656821
oracledb_top_sql_elapsed{sql_id="3un99a0zwp4vd",sql_text="select owner#,name,namespace,remoteowner,linkname,p_tim"} 0.069393
oracledb_top_sql_elapsed{sql_id="3wrrjm9qtr2my",sql_text="SELECT T.CLIENT_ID,         T.OPERATION_ID,         T.T"} 0.309885
oracledb_top_sql_elapsed{sql_id="44dn40afubks4",sql_text="select decode(u.type#, 2, u.ext_username, u.name), o.na"} 0.098865
oracledb_top_sql_elapsed{sql_id="586577qpbkgnk",sql_text="select 1 from DBA_SCHEDULER_JOBS  where JOB_NAME like '"} 0.072079
oracledb_top_sql_elapsed{sql_id="5yutdqf5nvrmt",sql_text="SELECT     dt.tablespace_name as tablespace,     dt.con"} 0.081922
oracledb_top_sql_elapsed{sql_id="8gbt6t0s3jn0t",sql_text="MERGE /*+ OPT_PARAM('_parallel_syspls_obey_force' 'fals"} 0.068104
oracledb_top_sql_elapsed{sql_id="b9c6ffh8tc71f",sql_text="BEGIN dbms_output.enable(NULL); END;"} 0.0982
oracledb_top_sql_elapsed{sql_id="cz8wbmy7k5bxn",sql_text="begin sys.dbms_aq_inv.internal_purge_queue_table(:1, :2"} 0.181691
# HELP oracledb_up Whether the Oracle database server is up.
# TYPE oracledb_up gauge
oracledb_up 1
# HELP oracledb_wait_time_administrative counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_administrative counter
oracledb_wait_time_administrative 0
# HELP oracledb_wait_time_application counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_application counter
oracledb_wait_time_application 0.03
# HELP oracledb_wait_time_cluster counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_cluster counter
oracledb_wait_time_cluster 0
# HELP oracledb_wait_time_commit counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_commit counter
oracledb_wait_time_commit 0.04
# HELP oracledb_wait_time_concurrency counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_concurrency counter
oracledb_wait_time_concurrency 0.56
# HELP oracledb_wait_time_configuration counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_configuration counter
oracledb_wait_time_configuration 0.15
# HELP oracledb_wait_time_network counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_network counter
oracledb_wait_time_network 0
# HELP oracledb_wait_time_other counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_other counter
oracledb_wait_time_other 16.44
# HELP oracledb_wait_time_scheduler counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_scheduler counter
oracledb_wait_time_scheduler 0.59
# HELP oracledb_wait_time_system_io counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_system_io counter
oracledb_wait_time_system_io 1.62
# HELP oracledb_wait_time_user_io counter metric from system_wait_class view in Oracle.
# TYPE oracledb_wait_time_user_io counter
oracledb_wait_time_user_io 24.5
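
The tablespace gauges above are internally consistent: oracledb_tablespace_used_percent is bytes divided by max_bytes, times 100. A quick sanity check of the SYSAUX samples with awk (a sketch; the literal values are copied from the output above):

```shell
# Recompute used_percent for SYSAUX from the bytes and max_bytes samples:
# 5.7442304e+08 / 1.8513813504e+10 * 100, which matches the exported
# oracledb_tablespace_used_percent value to four decimal places.
awk 'BEGIN { printf "%.4f\n", 574423040 / 18513813504 * 100 }'
```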

Database permissions required

For the built-in default metrics, the database user that the exporter uses to connect to the Oracle Database instance must be granted the SELECT_CATALOG_ROLE role and/or SELECT permission on the following objects:

  • dba_tablespace_usage_metrics
  • dba_tablespaces
  • v$system_wait_class
  • v$asm_diskgroup_stat
  • v$datafile
  • v$sysstat
  • v$process
  • v$waitclassmetric
  • v$session
  • v$resource_limit
  • v$parameter
  • v$database
  • v$sqlstats
  • v$sysmetric
  • v$diag_alert_ext (for alert logs only)
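
If you prefer individual grants over SELECT_CATALOG_ROLE, the statements can be generated from the list above. This is a sketch assuming a hypothetical monitoring user named EXPORTER_USER; note that direct grants on V$ views are made through their V_$ synonyms:

```shell
# Emit GRANT statements for the views listed above, for a hypothetical
# monitoring user EXPORTER_USER. Review the output, then run it in
# SQL*Plus as a suitably privileged user.
MONITOR_USER=EXPORTER_USER
for obj in dba_tablespace_usage_metrics dba_tablespaces v_\$system_wait_class \
           v_\$asm_diskgroup_stat v_\$datafile v_\$sysstat v_\$process \
           v_\$waitclassmetric v_\$session v_\$resource_limit v_\$parameter \
           v_\$database v_\$sqlstats v_\$sysmetric v_\$diag_alert_ext; do
  echo "GRANT SELECT ON $obj TO $MONITOR_USER;"
done
```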

Alert logs

The exporter can export alert log records to a file suitable for collection by a log ingestion tool such as Promtail or Fluent Bit.

You can specify the interval at which log records are published with the log.interval parameter, which takes a Go duration string. A duration string is a possibly signed sequence of decimal numbers, each with an optional fraction and a unit suffix, such as "300ms", "-1.5h" or "2h45m". Valid time units are "ns", "us" (or "µs"), "ms", "s", "m", "h", "d", "w", "y".

You can specify the location of the output log file using the environment variable LOG_DESTINATION. The default is /log/alert.log. If you are running in Kubernetes, you should mount a volume on /log so that it can be accessed by both the exporter container and your log collector container.
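
For the Kubernetes case, a shared emptyDir volume mounted at /log in both containers is one way to do this. The fragment below is illustrative only; the container names and the idea of a Promtail/Fluent Bit sidecar are assumptions, not part of the provided manifests:

```yaml
# Illustrative pod spec fragment: exporter and log collector share /log
spec:
  containers:
    - name: metrics-exporter
      # ... image, env, ports ...
      volumeMounts:
        - name: alert-log
          mountPath: /log
    - name: log-collector          # e.g., a Promtail or Fluent Bit sidecar
      # ... image and collector configuration ...
      volumeMounts:
        - name: alert-log
          mountPath: /log
          readOnly: true
  volumes:
    - name: alert-log
      emptyDir: {}
```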

The output is formatted as one JSON record per line, which most log collection tools will be able to parse with minimal configuration.

Here is an example of the output:

{"timestamp":"2023-09-02T05:40:43.626Z","moduleId":"","ecid":"","message":"Starting ORACLE instance (restrict) (OS id: 1473)"}
{"timestamp":"2023-09-02T05:40:43.64Z","moduleId":"","ecid":"","message":"****************************************************"}
{"timestamp":"2023-09-02T05:40:43.64Z","moduleId":"","ecid":"","message":" Sys-V shared memory will be used for creating SGA "}
{"timestamp":"2023-09-02T05:40:43.64Z","moduleId":"","ecid":"","message":" ****************************************************"}
{"timestamp":"2023-09-02T05:40:43.641Z","moduleId":"","ecid":"","message":"**********************************************************************"}
{"timestamp":"2023-09-02T05:40:43.641Z","moduleId":"","ecid":"","message":"Dump of system resources acquired for SHARED GLOBAL AREA (SGA) "}
{"timestamp":"2023-09-02T05:40:43.642Z","moduleId":"","ecid":"","message":" Domain name: kubepods.slice/kubepods-besteffort.slice/kubepods-besteffort-poda2061467_5334_40c3_9328_71be8196ee89.slice/crio-09918aac8159cea"}
{"timestamp":"2023-09-02T05:40:43.642Z","moduleId":"","ecid":"","message":" Per process system memlock (soft) limit = 64K"}
{"timestamp":"2023-09-02T05:40:43.642Z","moduleId":"","ecid":"","message":" Expected per process system memlock (soft) limit to lock"}
{"timestamp":"2023-09-02T05:40:43.642Z","moduleId":"","ecid":"","message":" instance MAX SHARED GLOBAL AREA (SGA) into memory: 1532M"}
{"timestamp":"2023-09-02T05:40:43.643Z","moduleId":"","ecid":"","message":" Available system pagesizes:"}
{"timestamp":"2023-09-02T05:40:43.643Z","moduleId":"","ecid":"","message":"  4K, 2048K "}
{"timestamp":"2023-09-02T05:40:43.643Z","moduleId":"","ecid":"","message":" Supported system pagesize(s):"}
{"timestamp":"2023-09-02T05:40:43.643Z","moduleId":"","ecid":"","message":"  PAGESIZE  AVAILABLE_PAGES  EXPECTED_PAGES  ALLOCATED_PAGES  ERROR(s)"}
{"timestamp":"2023-09-02T05:40:43.644Z","moduleId":"","ecid":"","message":"        4K       Configured               5           391529        NONE"}
{"timestamp":"2023-09-02T05:40:43.644Z","moduleId":"","ecid":"","message":"     2048K                0             766                0        NONE"}

You may disable alert logs by setting the parameter log.disable to 1.

Installation

There are a number of ways to run the exporter; the following sections cover each in turn.

Docker, Podman, etc.

You can run the exporter in a local container using a container image from Oracle Container Registry. The container image is available in the "observability-exporter" repository in the "Database" category. No authentication or license presentment/acceptance is required to pull this image from the registry.

Oracle Database

If you need an Oracle Database to test the exporter, you can use the command below to start up an instance of Oracle Database 23ai Free, which also requires no authentication or license presentment/acceptance to pull the image.

If you are running on a Mac with an Apple silicon (M-series) processor, you need to install and run [colima](https://github.com/abiosoft/colima) so that the 23ai Free database can run on the Mac platform. Run this command to start colima:

colima start --arch x86_64 --memory 8 --vm-type=vz --mount-type virtiofs
docker run --name free23ai \
    -d \
    -p 1521:1521 \
    -e ORACLE_PWD=Welcome12345 \
    container-registry.oracle.com/database/free:latest

This will pull the image and start up the database with a listener on port 1521. It will also create a pluggable database called "FREEPDB1" and set the admin passwords to the password you specified in this command.

You can tail the logs to see when the database is ready to use:

docker logs -f free23ai

(look for this message...)
#########################
DATABASE IS READY TO USE!
#########################

To obtain the IP address of the container, which you will need to connect to the database, use this command. Note: depending on your platform and container runtime, you may be able to access the database at "localhost":

docker inspect free23ai | grep IPA
    "SecondaryIPAddresses": null,
    "IPAddress": "172.17.0.2",
            "IPAMConfig": null,
            "IPAddress": "172.17.0.2",

Exporter

You need to give the exporter the connection details for the Oracle Database that you want it to run against. You can use a simple connection, or a wallet.

Simple connection

For a simple connection, you will provide the details using these variables:

  • DB_USERNAME is the database username, e.g., pdbadmin
  • DB_PASSWORD is the password for that user, e.g., Welcome12345
  • DB_CONNECT_STRING is the connection string, e.g., free23ai:1521/freepdb1
  • DB_ROLE (Optional) can be set to SYSDBA or SYSOPER if you want to connect with one of those roles, however Oracle recommends that you connect with the lowest possible privileges and roles necessary for the exporter to run.

To run the exporter in a container and expose the port, use a command like this, with the appropriate values for the environment variables:

docker run -it --rm \
    -e DB_USERNAME=pdbadmin \
    -e DB_PASSWORD=Welcome12345 \
    -e DB_CONNECT_STRING=free23ai:1521/freepdb1 \
    -p 9161:9161 \
    container-registry.oracle.com/database/observability-exporter:1.3.1

Using a wallet

For a wallet connection, you must first set up the wallet. If you are using Oracle Autonomous Database, for example, you can download the wallet from the Oracle Cloud Infrastructure (OCI) console.

  1. Unzip the wallet into a new directory, e.g., called wallet.
  2. Edit the sqlnet.ora file and set the DIRECTORY to /wallet. This is the path inside the exporter container where you will provide the wallet.
  3. Take a note of the TNS name from the tnsnames.ora that will be used to connect to the database, e.g., devdb_tp.
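
Step 2 can be done with a one-line edit. The sketch below runs against a sample sqlnet.ora whose WALLET_LOCATION line mirrors the shape of the one shipped in an Autonomous Database wallet zip (the sample content is illustrative):

```shell
# Create a sample sqlnet.ora like the one in an ADB wallet, then point its
# wallet DIRECTORY at /wallet (the path mounted into the exporter container).
cat > sqlnet.ora <<'EOF'
WALLET_LOCATION = (SOURCE = (METHOD = file) (METHOD_DATA = (DIRECTORY="?/network/admin")))
SSL_SERVER_DN_MATCH=yes
EOF
sed -i 's|DIRECTORY="[^"]*"|DIRECTORY="/wallet"|' sqlnet.ora
grep WALLET_LOCATION sqlnet.ora
```

On macOS, use `sed -i ''` (BSD sed requires an explicit backup suffix argument).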

Now, you provide the connection details using these variables:

  • DB_USERNAME is the database username, e.g., pdbadmin
  • DB_PASSWORD is the password for that user, e.g., Welcome12345
  • DB_CONNECT_STRING is the connection string, e.g., devdb_tp?TNS_ADMIN=/wallet
  • ORACLE_HOME is the location of the Oracle Instant Client, e.g., /lib/oracle/21/client64/lib. If you built your own container image, the path may be different.

To run the exporter in a container and expose the port, use a command like this, with the appropriate values for the environment variables, and mounting your wallet directory as /wallet in the container to provide access to the wallet:

docker run -it --rm \
    -e DB_USERNAME=pdbadmin \
    -e DB_PASSWORD=Welcome12345 \
    -e DB_CONNECT_STRING=devdb_tp?TNS_ADMIN=/wallet \
    -v ./wallet:/wallet \
    -p 9161:9161 \
    container-registry.oracle.com/database/observability-exporter:1.3.1

Test/demo environment with Docker Compose

If you would like to set up a test environment with the exporter, you can use the provided "Docker Compose" file in this repository which will start an Oracle Database instance, the exporter, Prometheus and Grafana.

If you are running on a Mac with an Apple silicon (M-series) processor, you need to install and run [colima](https://github.com/abiosoft/colima) so that the 23ai Free database can run on the Mac platform. Run this command to start colima:

colima start --arch x86_64 --memory 8 --vm-type=vz --mount-type virtiofs

```bash
cd docker-compose
docker-compose up -d
```

The containers will take a short time to start. The first time, the Oracle container might take a few minutes while it creates the database instance, but this is a one-time operation; subsequent restarts will be much faster (a few seconds).

Once the containers are all running, you can access the services using these URLs:

  • Exporter
  • Prometheus - try a query for "oracle".
  • Grafana - username is "admin" and password is "grafana". An Oracle Database dashboard is provisioned and configured to use data from the exporter.

Kubernetes

To run the exporter in Kubernetes, you need to complete the following steps. All steps must be completed in the same Kubernetes namespace. The examples below assume a namespace called exporter; if you wish to use a different namespace, adjust the commands accordingly.

Create a secret with credentials for connecting to the Oracle Database

Create a secret with the Oracle database user and password that the exporter should use to connect to the database using this command. You must specify the correct user and password for your environment. This example uses pdbadmin as the user and Welcome12345 as the password:

kubectl create secret generic db-secret \
    --from-literal=username=pdbadmin \
    --from-literal=password=Welcome12345 \
    -n exporter

Create a config map for the wallet (optional)

If you are using a wallet, create a config map from its contents. Run this command in the wallet directory you created earlier:

kubectl create cm db-metrics-tns-admin \
    --from-file=cwallet.sso \
    --from-file=ewallet.p12 \
    --from-file=ewallet.pem \
    --from-file=keystore.jks \
    --from-file=ojdbc.properties \
    --from-file=sqlnet.ora \
    --from-file=tnsnames.ora \
    --from-file=truststore.jks \
    -n exporter

Create a config map for your metrics definition file (optional)

If you have defined any custom metrics, you must create a config map for the metrics definition file. For example, if you created a configuration file called txeventq-metrics.toml, then create the config map with this command:

kubectl create cm db-metrics-txeventq-exporter-config \
    --from-file=txeventq-metrics.toml \
    -n exporter

Deploy the Oracle Database Observability exporter

A sample Kubernetes manifest is provided here. You must edit this file to set the namespace you wish to use, the database connect string to use, and if you have any custom metrics, you will need to uncomment and customize some sections in this file.

Once you have made the necessary updates, apply the file to your cluster using this command:

kubectl apply -f metrics-exporter-deployment.yaml

You can check the deployment was successful and monitor the exporter startup with this command:

kubectl get pods -n exporter -w

You can view the exporter's logs with this command:

kubectl logs -f svc/metrics-exporter -n exporter

Create a Kubernetes service for the exporter

Create a Kubernetes service to allow access to the exporter pod(s). A sample Kubernetes manifest is provided here. You may need to customize this file to update the namespace.

Once you have made any necessary updates, apply the file to your cluster using this command:

kubectl apply -f metrics-exporter-service.yaml

Create a Kubernetes service monitor

Create a Kubernetes service monitor to tell Prometheus (for example) to collect metrics from the exporter. A sample Kubernetes manifest is provided here. You may need to customize this file to update the namespace.

Once you have made any necessary updates, apply the file to your cluster using this command:

kubectl apply -f metrics-service-monitor.yaml

Configure a Prometheus target (optional)

You may need to update your Prometheus configuration to add a target. If so, you can use this example job definition as a guide:

  - job_name: 'oracle-exporter'
    metrics_path: '/metrics'
    scrape_interval: 15s
    scrape_timeout: 10s
    static_configs:
    - targets: 
      - metrics-exporter.exporter.svc.cluster.local:9161

Import Grafana dashboard definition(s) (optional)

See Grafana dashboards below.

Standalone binary

Pre-compiled versions for 64-bit Linux can be found under releases.

In order to run, you'll need the Oracle Instant Client Basic for your operating system. Only the basic version is required for execution.

The following command line arguments (flags) can be passed to the exporter:

Usage of oracledb_exporter:
  --log.format value
       	If set use a syslog logger or JSON logging. Example: logger:syslog?appname=bob&local=7 or logger:stdout?json=true. Defaults to stderr.
  --log.level value
       	Only log messages with the given severity or above. Valid levels: [debug, info, warn, error, fatal].
  --custom.metrics string
        File that may contain various custom metrics in a TOML file.
  --default.metrics string
        Default TOML file metrics.
  --web.systemd-socket
        Use systemd socket activation listeners instead of port listeners (Linux only).
  --web.listen-address string
       	Address to listen on for web interface and telemetry. (default ":9161")
  --web.telemetry-path string
       	Path under which to expose metrics. (default "/metrics")
  --database.maxIdleConns string
        Number of maximum idle connections in the connection pool. (default "0")
  --database.maxOpenConns string
        Number of maximum open connections in the connection pool. (default "10")
  --web.config.file
        Path to configuration file that can enable TLS or authentication.
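For example, a typical standalone invocation might look like this (the connection details and file paths below are placeholders for your environment, not defaults):

```
export DB_USERNAME=exporter
export DB_PASSWORD=yourpassword
export DB_CONNECT_STRING=localhost:1521/freepdb1
./oracledb_exporter --log.level=info --web.listen-address=:9161 --default.metrics=./default-metrics.toml
```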

Using OCI Vault

The exporter will read the password from a secret stored in OCI Vault if you set these two environment variables:

  • VAULT_ID should be set to the OCID of the OCI vault that you wish to use
  • VAULT_SECRET_NAME should be set to the name of the secret in the OCI vault which contains the database password
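For example (both values below are placeholders; substitute your own vault OCID and secret name):

```shell
# Placeholder values -- substitute your own vault OCID and secret name
export VAULT_ID=ocid1.vault.oc1..aaaaexample
export VAULT_SECRET_NAME=exporter-db-password
```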

Custom metrics

The exporter allows definition of arbitrary custom metrics in a TOML file. To specify this file to the exporter, you can:

  • Use --custom.metrics flag followed by the name of the TOML file, or
  • Export the CUSTOM_METRICS environment variable (export CUSTOM_METRICS=my-custom-metrics.toml)

This file must contain the following elements:

  • One or more metric sections ([[metric]])
  • For each section: a context, a request, and a metricsdesc map linking each field returned by the request to its description

Here's a simple example:

[[metric]]
context = "test"
request = "SELECT 1 as value_1, 2 as value_2 FROM DUAL"
metricsdesc = { value_1 = "Simple example returning always 1.", value_2 = "Same but returning always 2." }

NOTE: Do not add a semicolon (;) at the end of the SQL queries.

This file produces the following entries in the exporter:

# HELP oracledb_test_value_1 Simple example returning always 1.
# TYPE oracledb_test_value_1 gauge
oracledb_test_value_1 1
# HELP oracledb_test_value_2 Same but returning always 2.
# TYPE oracledb_test_value_2 gauge
oracledb_test_value_2 2

You can also provide labels using the labels field. Here's an example providing two metrics, with and without labels:

[[metric]]
context = "context_no_label"
request = "SELECT 1 as value_1, 2 as value_2 FROM DUAL"
metricsdesc = { value_1 = "Simple example returning always 1.", value_2 = "Same but returning always 2." }

[[metric]]
context = "context_with_labels"
labels = [ "label_1", "label_2" ]
request = "SELECT 1 as value_1, 2 as value_2, 'First label' as label_1, 'Second label' as label_2 FROM DUAL"
metricsdesc = { value_1 = "Simple example returning always 1.", value_2 = "Same but returning always 2." }

This TOML file produces the following result:

# HELP oracledb_context_no_label_value_1 Simple example returning always 1.
# TYPE oracledb_context_no_label_value_1 gauge
oracledb_context_no_label_value_1 1
# HELP oracledb_context_no_label_value_2 Same but returning always 2.
# TYPE oracledb_context_no_label_value_2 gauge
oracledb_context_no_label_value_2 2
# HELP oracledb_context_with_labels_value_1 Simple example returning always 1.
# TYPE oracledb_context_with_labels_value_1 gauge
oracledb_context_with_labels_value_1{label_1="First label",label_2="Second label"} 1
# HELP oracledb_context_with_labels_value_2 Same but returning always 2.
# TYPE oracledb_context_with_labels_value_2 gauge
oracledb_context_with_labels_value_2{label_1="First label",label_2="Second label"} 2

Finally, you can set the metric type using the metricstype field.

[[metric]]
context = "context_with_labels"
labels = [ "label_1", "label_2" ]
request = "SELECT 1 as value_1, 2 as value_2, 'First label' as label_1, 'Second label' as label_2 FROM DUAL"
metricsdesc = { value_1 = "Simple example returning always 1 as counter.", value_2 = "Same but returning always 2 as gauge." }
# Can be counter or gauge (default)
metricstype = { value_1 = "counter" }

This TOML file will produce the following result:

# HELP oracledb_context_with_labels_value_1 Simple example returning always 1 as counter.
# TYPE oracledb_context_with_labels_value_1 counter
oracledb_context_with_labels_value_1{label_1="First label",label_2="Second label"} 1
# HELP oracledb_context_with_labels_value_2 Same but returning always 2 as gauge.
# TYPE oracledb_context_with_labels_value_2 gauge
oracledb_context_with_labels_value_2{label_1="First label",label_2="Second label"} 2

You can find working examples of custom metrics for slow queries, big queries, and top 100 tables. An example of custom metrics for Transactional Event Queues is also provided.

Customize metrics in a container image

If you run the exporter as a container image and want to include your custom metrics in the image itself, you can use the following example Dockerfile to create a new image:

FROM container-registry.oracle.com/database/observability-exporter:1.3.1
COPY custom-metrics.toml /
ENTRYPOINT ["/oracledb_exporter", "--custom.metrics", "/custom-metrics.toml"]

Controlling memory usage

If you are running in an environment with limited memory, or you are running a large number of exporters, you may want to control the exporter's usage of memory.

Under normal circumstances, the exporter process retains OS memory that was used by the Go garbage collector but is no longer needed, in case it is needed again in the future, releasing it only when the host OS is under memory pressure. The result of this behavior (which is the normal behavior of the Go runtime) is that the resident set size will not decrease until host OS memory is almost exhausted. Under most circumstances this will not cause any issues, but if you are in an environment where you need to conserve memory, the following options are provided:

  • You may set the FREE_INTERVAL environment variable to a Go duration string, e.g., 60s, and run the exporter in debug mode by setting the GODEBUG environment variable to a value including madvdontneed=1, e.g., GODEBUG=gctrace=1,madvdontneed=1. The exporter will call FreeOSMemory() at the specified interval. This tells the Go runtime to attempt to release memory which is no longer needed. Please note that this does not guarantee that the memory will be released to the OS, but over time you should see the RSS shrink sooner than without these settings.
  • You may set the RESTART_INTERVAL environment variable to a Go duration string, e.g., 10m. The exporter will restart its own process at the specified interval (by calling the OS exec syscall). As no new process is created, the process identifier (PID) does not change, but the machine code, data, heap, and stack of the process are replaced by those of the new program (source: Wikipedia). This has the side effect of freeing the resident set, so that it will return to its original size.
  • In addition to these, you may also set GOMAXPROCS, GOGC, and GOMEMLIMIT (see documentation) to further limit the amount of resources that the Go runtime may use.
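Putting these options together, a memory-constrained deployment might export settings like the following before starting the exporter (all values are illustrative, not recommendations):

```shell
# Illustrative values only -- tune for your workload
export GODEBUG=gctrace=1,madvdontneed=1   # let the RSS shrink sooner
export FREE_INTERVAL=60s                  # call FreeOSMemory() every minute
export RESTART_INTERVAL=10m               # re-exec the process every 10 minutes
export GOMEMLIMIT=128MiB                  # soft memory limit for the Go runtime
```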

Grafana dashboards

A sample Grafana dashboard definition is provided in this directory. You can import this into your Grafana instance, and set it to use the Prometheus datasource that you have defined for the Prometheus instance that is collecting metrics from the exporter.

The dashboard shows some basic information, as shown below:

Monitoring Transactional Event Queues

Oracle Transactional Event Queues ("TxEventQ") is a fault-tolerant, scalable, real-time messaging backbone offered by the converged Oracle Database that allows you to build enterprise-class event-driven architectures.

Access to real-time broker, producer, and consumer metrics in a single dashboard, along with alerts for issues, allows teams to understand the state of their system.

The exporter includes a set of metrics for monitoring TxEventQ and a pre-built Grafana dashboard.

Note: The metrics are written for Oracle Database 21c or later.

How to create some traffic with PL/SQL

If you need to create a topic to monitor, you can use these statements to create and start a topic, and create a subscriber:

declare
    subscriber sys.aq$_agent;
begin
  -- create the topic
  dbms_aqadm.create_transactional_event_queue(
    queue_name => 'my_topic',
    multiple_consumers => true  -- true makes a pub/sub topic
  );

  -- start the topic
  dbms_aqadm.start_queue(
    queue_name => 'my_topic'
  );

  -- create a subscriber
  dbms_aqadm.add_subscriber(
    queue_name => 'my_topic',
    subscriber => sys.aq$_agent(
      'my_subscriber',    -- the subscriber name
      null,               -- address, only used for notifications
      0                   -- protocol
    ),
    rule => 'correlation = ''my_subscriber'''
  );
end;
/

You can produce a message with these commands:

declare
    enqueue_options    dbms_aq.enqueue_options_t;
    message_properties dbms_aq.message_properties_t;
    message_handle     raw(16);
    message            SYS.AQ$_JMS_TEXT_MESSAGE;
begin
    -- create the message payload
    message := SYS.AQ$_JMS_TEXT_MESSAGE.construct;
    message.set_text('{"orderid": 12345, "username": "Jessica Smith"}');

    -- set the consumer name
    message_properties.correlation := 'my_subscriber';

    -- enqueue the message
    dbms_aq.enqueue(
        queue_name           => 'my_topic',
        enqueue_options      => enqueue_options,
        message_properties   => message_properties,
        payload              => message,
        msgid                => message_handle);

    -- commit the transaction
    commit;
end;
/

How to create some traffic with Java (Spring Boot)

A simple load generator is provided in this directory which you can use to create some traffic so you can experiment with the sample dashboard.

To run the sample, first update application.yaml with the correct IP address for your database, then start the application as follows:

mvn spring-boot:run

The application will create ten queues named TOPIC_0 through TOPIC_9 and randomly produce and consume messages on those queues. The example dashboard shown below was monitoring traffic produced using this application.

Metrics definitions

The metrics definitions are provided in this file. You need to provide this file to the exporter, e.g., by adding it to your container image, or creating a Kubernetes config map containing the file and mounting that config map as a volume in your deployment. You also need to set the CUSTOM_METRICS environment variable to the location of this file.
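As an illustration, a hypothetical ConfigMap wrapping the metrics file might look like this (the names, namespace, and mount path are placeholders, not values mandated by the exporter):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: txeventq-metrics
  namespace: exporter
data:
  txeventq-metrics.toml: |
    # paste the contents of the TxEventQ metrics definitions file here
```

You would then mount this ConfigMap as a volume in the exporter deployment (for example, at /oracle/observability) and set the CUSTOM_METRICS environment variable to /oracle/observability/txeventq-metrics.toml.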

Additional database permissions

The database user that the exporter uses to connect to the database will also need additional permissions, which can be granted with these statements. This example assumes the exporter connects with the username "exporter":

grant execute on dbms_aq to exporter;
grant execute on dbms_aqadm to exporter;
grant execute on dbms_aqin to exporter;
grant execute on dbms_aqjms_internal to exporter;
grant execute on dbms_teqk to exporter;
grant execute on DBMS_RESOURCE_MANAGER to exporter;
grant select_catalog_role to exporter;
grant select on sys.aq$_queue_shards to exporter;
grant select on user_queue_partition_assignment_table to exporter;

Grafana dashboard

A Grafana dashboard for Transactional Event Queues is provided in this file. This can be imported into your Grafana environment. Choose the Prometheus datasource that is collecting metrics from the exporter.

Note: You may not see any activity on the dashboard unless there are clients producing and consuming messages from topics.

The dashboard will look like this:

Developer notes

The exporter itself is fairly simple. The initialization is done as follows:

  • Parse command-line flags
  • Load the default TOML file (default-metrics.toml) and store each metric in a Metric struct
  • Load the custom TOML file, if one is given
  • Create an Exporter object
  • Register the exporter with the Prometheus client library
  • Launch a web server to handle incoming requests

These operations are mainly done in the main function.

After this initialization phase, the exporter will wait for the arrival of a request.

On each request, it iterates over the contents of the metricsToScrape structure (in the scrape function, func (e *Exporter) scrape(ch chan<- prometheus.Metric)).

For each element (of Metric type), a call to the ScrapeMetric function will be made which will itself make a call to the ScrapeGenericValues function.

The ScrapeGenericValues function will read the information from the Metric structure and, depending on the parameters, will generate the metrics to return. In particular, it will use the GeneratePrometheusMetrics function which will make SQL calls to the database.

Docker/container build

To build a container image, run the following command:

make docker

Building Binaries

Run build:

make go-build

This will create binaries and archives inside the dist folder for the operating system you are building on.

Contributing

This project welcomes contributions from the community. Before submitting a pull request, please review our contribution guide.

Security

Please consult the security guide for our responsible security vulnerability disclosure process.

License

Copyright (c) 2016, 2024, Oracle and/or its affiliates.

Released under the Universal Permissive License v1.0 as shown at https://oss.oracle.com/licenses/upl/ and the MIT License (MIT)

oracle-db-appdev-monitoring's People

Contributors

andytael, djelibeybi, markxnelson, pasimoes, paulparkinson, spavlusieva


oracle-db-appdev-monitoring's Issues

fix the config file of oracledb dashboard

I found a mistake in 3333_rev1.json.
The query for active sessions is written as follows:

"expr": "oracledb_sessions_active{instance=\"$host\"}"

but I think it should be written as follows:

"expr": "oracledb_sessions_value{instance=\"$host\", status=\"ACTIVE\", type=\"USER\"}"

This worked correctly for me.

oracledb_exporter get error="dpoPool_create user= extAuth=0: ORA-24415: Missing or null username."

Hi,
When I start oracledb_exporter, I get the error error="dpoPool_create user= extAuth=0: ORA-24415: Missing or null username.". The OS user is oracle.
DB_USERNAME=oracledb_exporter
DB_PASSWORD=p2k6hiiRK
DB_CONNECT_STRING=emcc
/mysqlbackup/recovery/oracle/monitor/oracledb_exporter --default.metrics "/mysqlbackup/recovery/oracle/monitor/default-metrics.toml" --log.level=info --web.listen-address=0.0.0.0:9161 --log.destination="/var/log/alert.log"
ts=2024-07-02T02:42:45.343Z caller=main.go:72 level=info msg="FREE_INTERVAL end var is not present, will not periodically attempt to release memory"
ts=2024-07-02T02:42:45.343Z caller=main.go:79 level=info msg="RESTART_INTERVAL env var is not present, so will not restart myself periodically"
ts=2024-07-02T02:42:45.358Z caller=collector.go:391 level=info msg="Could not set CLIENT_INFO."
ts=2024-07-02T02:42:45.358Z caller=collector.go:396 level=info msg="dbtype err =dpoPool_create user= extAuth=0: ORA-24415: Missing or null username."
ts=2024-07-02T02:42:45.359Z caller=collector.go:402 level=info msg="got error checking my database role"
ts=2024-07-02T02:42:45.359Z caller=collector.go:404 level=info msg="Connected as SYSDBA? "
ts=2024-07-02T02:42:45.360Z caller=collector.go:283 level=error msg="Error pinging oracle" error="dpoPool_create user= extAuth=0: ORA-24415: Missing or null username."
ts=2024-07-02T02:42:45.360Z caller=main.go:107 level=info msg="Starting oracledb_exporter" version=1.3.0
ts=2024-07-02T02:42:45.360Z caller=main.go:108 level=info msg="Build context" build="(go=go1.22.4, platform=linux/amd64, user=, date=, tags=unknown)"
ts=2024-07-02T02:42:45.360Z caller=main.go:109 level=info msg="Collect from: " metricPath=/metrics
ts=2024-07-02T02:42:45.360Z caller=main.go:159 level=info msg="Exporting alert logs to /var/log/alert.log"
ts=2024-07-02T02:42:45.361Z caller=tls_config.go:313 level=info msg="Listening on" address=[::]:9161
ts=2024-07-02T02:42:45.361Z caller=tls_config.go:316 level=info msg="TLS is disabled." http2=false address=[::]:9161
ts=2024-07-02T02:42:50.357Z caller=collector.go:283 level=error msg="Error pinging oracle" error="dpoPool_create user= extAuth=0: ORA-24415: Missing or null username."
ts=2024-07-02T02:43:00.369Z caller=alertlog.go:103 level=error msg="Error querying the alert logs"
ts=2024-07-02T02:43:05.356Z caller=collector.go:283 level=error msg="Error pinging oracle" error="dpoPool_create user= extAuth=0: ORA-24415: Missing or null username."
ts=2024-07-02T02:43:15.368Z caller=alertlog.go:103 level=error msg="Error querying the alert logs"
ts=2024-07-02T02:43:20.357Z caller=collector.go:283 level=error msg="Error pinging oracle" error="dpoPool_create user= extAuth=0: ORA-24415: Missing or null username."

Bug in metric definition

Hello,

First of all I would like to thank you for building this product.
I would like to raise the problem with one definition of the default metric:

context = "wait_time"
metricsdesc = { value="Generic counter metric from v$waitclassmetric view in Oracle." }
fieldtoappend= "wait_class"
request = '''
SELECT wait_class as WAIT_CLASS, sum(time_waited) as VALUE
FROM gv$active_session_history 
where wait_class is not null 
and sample_time > sysdate - interval '1' hour
GROUP BY wait_class;
'''

it is using summary of time_waited taken from sampled rows keep by the v$active_session_history view.
There are a few issues related to this SQL query:

  • sum(time_waited) should not be used from ASH (see page 28 of this Oracle presentation: ASH deep dive)
  • the v$active_session_history view requires Oracle EE edition plus the Diagnostics Pack - this should be mentioned in the documentation if you want to keep using it
  • using samples for the last hour when Prometheus metric scraping is done every 5 sec by default may not be representative

I would propose to use v$system_wait_class (as it was in the initial version of this exporter) and define all metrics as counters (as this is how they are implemented in the v$system_wait_class view).

I am still working on some enhancements of the metrics and, if you are OK with it, I can raise a PR when I am ready.

regards,
Marcin

failed to connect to local PDB on an Oracle Base Database DB System in OCI Cloud

Installed oracledb_exporter from the 1.3.0 binary on the DB System.

Executed as the oracle user:
./startExporter.sh

ts=2024-07-18T13:33:50.693Z caller=main.go:72 level=info msg="FREE_INTERVAL end var is not present, will not periodically attempt to release memory"
ts=2024-07-18T13:33:50.694Z caller=main.go:79 level=info msg="RESTART_INTERVAL env var is not present, so will not restart myself periodically"
ts=2024-07-18T13:33:50.695Z caller=collector.go:361 level=debug msg="Launching connection to localhost:1521/JOCHEN01"
ts=2024-07-18T13:33:50.695Z caller=collector.go:374 level=debug msg="connection properties: user=system password=SECRET-*** connectString=localhost:1521/MyPDBName configDir=\nconnectionClass= enableEvents=0 externalAuth=0 heterogeneousPool=0 initOnNewConnection=0\nlibDir= noBreakOnContextCancel=0 noTimezoneCheck=0 perSessionTimezone=0 pingInterval=0s\npoolIncrement=0 poolMaxSessions=0 poolMinSessions=0 poolSessionMaxLifetime=0s\npoolSessionTimeout=0s poolWaitTimeout=0s prelim=0 standaloneConnection=0 sysasm=0\nsysdba=0 sysoper=0 timezone="
ts=2024-07-18T13:33:50.695Z caller=collector.go:379 level=debug setmaxidleconnectionsto=0
ts=2024-07-18T13:33:50.695Z caller=collector.go:381 level=debug setmaxopenconnectionsto=10
ts=2024-07-18T13:33:50.695Z caller=collector.go:384 level=debug msg="Successfully configured connection to localhost:1521/JOCHEN01"
ts=2024-07-18T13:33:50.748Z caller=collector.go:391 level=info msg="Could not set CLIENT_INFO."
ts=2024-07-18T13:33:50.748Z caller=collector.go:396 level=info msg="dbtype err =pool=0x29f1410 stats=busy=0 open=0 max=0 maxLifetime=1h0m0s timeout=5m0s waitTimeout=30s params={authMode:0 connectionClass: connectionClassLength:0 purity:0 newPassword: newPasswordLength:0 appContext: numAppContext:0 externalAuth:0 externalHandle: pool:0x29f1410 tag: tagLength:0 matchAnyTag:0 outTag: outTagLength:0 outTagFound:0 shardingKeyColumns: numShardingKeyColumns:0 superShardingKeyColumns: numSuperShardingKeyColumns:0 outNewSession:0}: ORA-24415: Missing or null username."
ts=2024-07-18T13:33:50.748Z caller=collector.go:402 level=info msg="got error checking my database role"
ts=2024-07-18T13:33:50.748Z caller=collector.go:404 level=info msg="Connected as SYSDBA? "
ts=2024-07-18T13:33:50.748Z caller=collector.go:272 level=debug msg="error = pool=0x29f1410 stats=busy=0 open=0 max=0 maxLifetime=1h0m0s timeout=5m0s waitTimeout=30s params={authMode:0 connectionClass: connectionClassLength:0 purity:0 newPassword: newPasswordLength:0 appContext: numAppContext:0 externalAuth:0 externalHandle: pool:0x29f1410 tag: tagLength:0 matchAnyTag:0 outTag: outTagLength:0 outTagFound:0 shardingKeyColumns: numShardingKeyColumns:0 superShardingKeyColumns: numSuperShardingKeyColumns:0 outNewSession:0}: ORA-24415: Missing or null username."
ts=2024-07-18T13:33:50.749Z caller=collector.go:283 level=error msg="Error pinging oracle" error="pool=0x29f1410 stats=busy=0 open=0 max=0 maxLifetime=1h0m0s timeout=5m0s waitTimeout=30s params={authMode:0 connectionClass: connectionClassLength:0 purity:0 newPassword: newPasswordLength:0 appContext: numAppContext:0 externalAuth:0 externalHandle: pool:0x29f1410 tag: tagLength:0 matchAnyTag:0 outTag: outTagLength:0 outTagFound:0 shardingKeyColumns: numShardingKeyColumns:0 superShardingKeyColumns: numSuperShardingKeyColumns:0 outNewSession:0}: ORA-24415: Missing or null username."
ts=2024-07-18T13:33:50.749Z caller=main.go:107 level=info msg="Starting oracledb_exporter" version=1.3.0
ts=2024-07-18T13:33:50.749Z caller=main.go:108 level=info msg="Build context" build="(go=go1.22.4, platform=linux/amd64, user=, date=, tags=unknown)"
ts=2024-07-18T13:33:50.749Z caller=main.go:109 level=info msg="Collect from: " metricPath=/metrics

startExporter.sh

#!/bin/bash
. exporter.env
/usr/local/bin/oracledb_exporter --log.level debug --web.listen-address 0.0.0.0:9161 --default.metrics /etc/oracledb_exporter/default-metrics.toml

exporter.env

#/bin/bash
export LOG_DESTINATION=/var/log/oracledb_exporter/alert.log
export DB_USERNAME=c##_monitoring
export DB_PASSWORT=*******
export DB_CONNECT_STRING=localhost:1521/MyPDBName

Database is available:

sqlplus c##_monitoring/"*********"@MyPDBName

SQL*Plus: Release 19.0.0.0.0 - Production on Thu Jul 18 16:00:14 2024
Version 19.23.0.0.0
Copyright (c) 1982, 2023, Oracle.  All rights reserved.

Connected to:
Oracle Database 19c Standard Edition 2 Release 19.0.0.0.0 - Production
Version 19.23.0.0.0

SQL> 

Exporter throws errors, Oracle query timeout and no metrics found while parsing

The logfile for the exporter contains a lot of these error messages:

ts=2024-05-24T17:46:41.315Z caller=collector.go:328 level=error msg="Error scraping metric" Context=obaas_top_sql MetricsDesc="map[elapsed:SQL statement elapsed time running]" time=5.007098854s error="Oracle query timed out"
ts=2024-05-24T17:46:41.319Z caller=collector.go:328 level=error msg="Error scraping metric" Context=resource MetricsDesc="map[current_utilization:Generic counter metric from v$resource_limit view in Oracle (current value). limit_value:Generic counter metric from v$resource_limit view in Oracle (UNLIMITED: -1).]" time=1.74656ms error="no metrics found while parsing, query returned no rows"
ts=2024-05-24T17:46:46.324Z caller=collector.go:328 level=error msg="Error scraping metric" Context=obaas_top_sql MetricsDesc="map[elapsed:SQL statement elapsed time running]" time=5.006780993s error="Oracle query timed out"
ts=2024-05-24T17:46:46.328Z caller=collector.go:328 level=error msg="Error scraping metric" Context=resource MetricsDesc="map[current_utilization:Generic counter metric from v$resource_limit view in Oracle (current value). limit_value:Generic counter metric from v$resource_limit view in Oracle (UNLIMITED: -1).]" time=1.722233ms error="no metrics found while parsing, query returned no rows"

k describe -n oracle-database-exporter pod/metrics-exporter-64658f4559-9lmrx gives this:

Name:             metrics-exporter-64658f4559-9lmrx
Namespace:        oracle-database-exporter
Priority:         0
Service Account:  default
Node:             10.42.6.12/10.42.6.12
Start Time:       Thu, 23 May 2024 13:17:38 -0500
Labels:           app=metrics-exporter
                  app.kubernetes.io/name=oracle-database-exporter
                  app.kubernetes.io/part-of=database
                  app.kubernetes.io/version=1.2.1
                  pod-template-hash=64658f4559
Annotations:      <none>
Status:           Running
IP:               10.244.0.23
IPs:
  IP:           10.244.0.23
Controlled By:  ReplicaSet/metrics-exporter-64658f4559
Containers:
  metrics-exporter:
    Container ID:  cri-o://fdd77d9a9a5db05f896e6d85a63ca00831571575eaba24103948d6c8cae7e48e
    Image:         container-registry.oracle.com/database/observability-exporter:1.2.1
    Image ID:      8839ee049af745ca71baa76884657909585501bc52797ae8bdacd06f1cd2d5e5
    Port:          8080/TCP
    Host Port:     0/TCP
    Command:
      /oracledb_exporter
    Args:
      --log.level=info
    State:          Running
      Started:      Thu, 23 May 2024 13:18:56 -0500
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     500m
      memory:  128Mi
    Requests:
      cpu:     250m
      memory:  64Mi
    Environment:
      CUSTOM_METRICS:     /oracle/observability/obaas-metrics.toml
      ORACLE_HOME:        /lib/oracle/21/client64/lib
      TNS_ADMIN:          ${ORACLE_HOME}/network/admin
      DB_USERNAME:        ADMIN
      DB_PASSWORD:        <set to the key 'db.password' in secret 'zimbadb-db-secrets'>  Optional: false
      DB_CONNECT_STRING:  zimbadb_tp
    Mounts:
      /lib/oracle/21/client64/lib/network/admin from tns-admin (rw)
      /oracle/observability/obaas-metrics.toml from config-volume (rw,path="obaas-metrics.toml")
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-67kqf (ro)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True
  Initialized                 True
  Ready                       True
  ContainersReady             True
  PodScheduled                True
Volumes:
  tns-admin:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  zimbadb-tns-admin
    Optional:    false
  config-volume:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      obaas-metrics
    Optional:  false
  kube-api-access-67kqf:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:                      <none>

Database is ADB-S, 19c on OCI

metrics default error

I have problems with some checks against an autonomous database in the standard metrics.
The exporter is deployed in Kubernetes.

I also have the same problem in custom checks such as slow queries.

caller=collector.go:326 level=error Errorscrapingfor=resource _="unsupported value type" 5.687861ms=:
caller=collector.go:326 level=error Errorscrapingfor=wait_time _="unsupported value type" 5.207271ms=:

provide more spring boot (security, etc.) examples and facilitate usage

Currently, we defer to the Spring Boot docs for security aspects and any other features the user may want that are inherited from Spring Boot. We should provide more examples and facilitate the use of such features, thereby ensuring, e.g., that the user does not need to build the image in order to use them and can simply provide environment variables/properties to the prebuilt hosted image.

provide more examples

Currently, we defer to the workshop, but we need to provide more examples of all three exporters, dashboards, etc.

FRAME_SIZE_ERROR from tracing

i.o.e.internal.grpc.OkHttpGrpcExporter : Failed to export spans. The request could not be executed. Full error message: FRAME_SIZE_ERROR: 4740180 occurs, likely due to some spring boot dependency as this does not occur in standalone version/run.

Multiple database support

Is there any update on this feature from the roadmap?
We have tested the exporter in our environment, but can't really start using it before we have support for multi-target.
Our environment consists of multiple cdbs and pdbs.

write tests

Both unit and functional (bring in the functional tests currently run outside this repo as part of this).

Exporter Modification Request

1. Timeout granularity:

Some SQL statements will have longer runtimes or higher volatility in runtimes. These longer runtimes might be across the board, or just on a subset of host environments with certain characteristics (e.g. orders of magnitude more tablespaces on certain hosts/applications can impact tablespace infrastructure focused queries). As the number of queries in a config file grows, the current solution of increasing the timeout for every query in the config file has the potential issue of allowing the config file to run very, very long. It is allowed to run up to the timeout value for every SQL statement. A few ways this could be dealt with include:

  • Timeout granularity per SQL statement
  • Timeout granularity per host
  • Increase the current limit of two config files

These potential solutions are not exclusionary and each provides different benefits.

2. Frequency of SQL statement execution:

Some SQL statements do not need to be executed as frequently as others. Many need to be run at the typical 3 minute scrape frequency. Others might need to be run only once every 30 minutes. Being able to account for these differences would reduce load on the hosts. Possible solutions include:

  • Allowing multiple config files and allowing adjustment of the call time for each one
  • Allowing some SQL statements to execute every x calls of the config file (run this statement on every 10th call of the config file)

Provide ability for docker container to accept JVM options

This would provide the ability for the user to optionally specify any number of runtime options to the JVM in the exporter's container. This is important for users that might want to set certain Spring Boot options, reduce the JVM footprint (e.g., setting a heap max), etc.

How are things with RAC support?

Hi. First of all, I would like to thank you for building this product.
We have a cluster with two active nodes. We want to track each instance individually; what do we need to do?
That is, we want to see the instance name as a label.

Get secrets from OCI Vault

Currently trying to run this in Container Instances in OCI. The container is set up to fetch the secret from OCI Vault by providing the VAULT_ID and VAULT_SECRET_NAME.
The common.ConfigurationProviderEnvironmentVariables("vault","") and secrets.NewSecretsClientWithConfigurationProvider(configProvider) in vault.go cannot create a client unless vault_tenancy_ocid, vault_user_ocid, vault_fingerprint, and vault_region are defined.
This is intended to run as a standalone container in Container Instances and let the container read the secret (without any additional users added or config files located on the container).

The dynamic group has been set up with: matching_rule = "All {resource.type='computecontainerfamily'}"
The identity policy has been set up with:
Allow dynamic-group GROUP to read vaults in tenancy
Allow dynamic-group GROUP to read secret-bundles in tenancy.

Potential memory leak

A user reports that they are seeing a potential memory leak which appears to be easily reproducible and needs further analysis and correction.

Reproduction steps from reporter:

  • set GOMEMLIMIT=40MiB
  • Invoke with --scrape.interval=5s
  • Give it a decent toml file with some queries that return some data (not sure how necessary this is really, we just run it with ~10 or 15 queries that return a moderate amount of data)
  • Watch mem usage in mem tool of choice
  • I've captured the go stats that the exporter itself collects, and the only one that I could see grow (that wasn't a total metric) was go_memstats_buck_hash_sys_bytes but not sure if that's the problem.
  • Ran pprof with inuse_object and inuse_bytes repeatedly over a couple of days and see plenty of GC and what not, but can't see anything not getting GC'd or "dangling" but that may just be my inexperience with pprof.

Reporter commented that the same issue appears to occur with upstream iamseth code (with different db library/driver, etc.).

Reporter commented that they found very little on buck_hash; the best they could find is that it has to do with internal representations of maps in Go. They said this somewhat jibes with what they're seeing in pprof, in that none of the "application code" shows up as a suspect for ever-increasing object counts or growing heap size. That made them wonder whether there was some validity to the buck bytes theory, in that it might not be visible in pprof.

Seems to be reproducible on linux x86_64 and macOS M2 (and in a container on linux)

Reporter provided this script that they use to perform reproduction/test runs:

export LD_LIBRARY_PATH="/path/to/instantclient_19_8:$LD_LIBRARY_PATH"
export DYLD_LIBRARY_PATH="$LD_LIBRARY_PATH"
export QUERY_TIMEOUT='15'
export DB_USERNAME='c##odbe'
export DB_PASSWORD=''
export DB_CONNECT_STRING='localhost:1521/FREE'

os="$(uname -s | tr '[A-Z]' '[a-z]')"
#arch="$(uname -m)"
# manually set arch since it will return arm64 on m2
arch="amd64"
rm -Rf dist
make go-build-$os-$arch
if [ $? -eq 0 ]; then
    export GOMAXPROCS=1
    export GOGC=75
    export GOMEMLIMIT=40MiB
    ./dist/oracledb_exporter-2.0.0.$os-$arch/oracledb_exporter --scrape.interval=5s
fi

Need help to use oci vault

My oracledb_exporter.service file has the lines:

Environment="vault_tenancy_ocid=ocid1.tenancy.oc1..aaaaaaaa..."
Environment="VAULT_ID=<the_name_of_my_vault>" # name or id? Id returned an error when starting the service
Environment="VAULT_SECRET_NAME=oracledb_exporter"

Starting the service with this environment fails:

systemctl status -l oracledb_exporter
● oracledb_exporter.service - Prometheus oracledb_exporter
   Loaded: loaded (/etc/systemd/system/oracledb_exporter.service; enabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Tue 2024-08-06 11:42:27 CEST; 3s ago
  Process: 4321 ExecStart=/usr/local/bin/oracledb_exporter --log.level debug --web.listen-address 0.0.0.0:9161 --default.metrics /etc/oracledb_exporter/default-metrics.toml (code=exited, status=1/FAILURE)
 Main PID: 4321 (code=exited, status=1/FAILURE)

Started Prometheus oracledb_exporter.
ts=2024-08-06T09:42:27.513Z caller=main.go:65 level=info msg="VAULT_ID env var is present so using OCI Vault" vault_name=<the_name_of_my_vault>
oracledb_exporter.service: main process exited, code=exited, status=1/FAILURE
Unit oracledb_exporter.service entered failed state.
oracledb_exporter.service failed.

The log level is debug.

When I query the secret with the OCI CLI, I get the correct value. But I used the compartment ID.

COMPARTMENT_OCID=<my_compartment_ocid>
SECRET_OCID=$(oci vault secret list --compartment-id "$COMPARTMENT_OCID" \
                      --query "data[?\"secret-name\" == '$SECRET_NAME'].id | [0]" \
                      --raw-output
)
SECRET=$(oci secrets secret-bundle get \
    --secret-id "$SECRET_OCID" \
    --raw-output \
    --query "data.\"secret-bundle-content\".content" | base64 -d
)

Expose db connection status as a metric

Expose the current database connection (pool) status as a simple internal metric (up/down) to enable easy detection of connection issues.

Note that the driver does not actually connect to the database when Open() is called; it connects only when Ping() is called or a query is run.
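If such a metric existed, it would make alerting straightforward. The metric name `oracledb_up` below is an assumption for illustration; this is a sketch of a Prometheus alerting rule, not a confirmed metric in this exporter:

```yaml
# Alerting-rule sketch; the metric name oracledb_up is assumed.
groups:
  - name: oracledb
    rules:
      - alert: OracleDBConnectionDown
        expr: oracledb_up == 0
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Exporter cannot reach the Oracle database"
```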

Provide ability to use a directory of config files

For exporter deployments with a large number of queries, the ability to use more than two config files would be helpful. Whether that's a directory of config files or a numbered set of CONFIG_FILE variables, some way to break up the SQL into more manageable files would be a nice addition.
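As a rough illustration of the directory idea, a wrapper script could collect every TOML file in a conf.d-style directory into a single comma-separated value. Whether the exporter would accept such a list is part of the proposal, not current behavior:

```shell
# Sketch: gather all .toml files from a directory into one comma-separated
# variable (hypothetical; the exporter does not read such a list today).
confdir="$(mktemp -d)"
touch "$confdir/a.toml" "$confdir/b.toml"
CUSTOM_METRICS="$(ls "$confdir"/*.toml | paste -sd, -)"
echo "$CUSTOM_METRICS"
```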

Provide mechanism to specify how often each metric should be collected

While you would normally want to collect all metrics at each scrape, it is likely that there are some metrics you would wish to collect less frequently, because collecting the data for them is much more expensive.

This enhancement request seeks a mechanism to add an interval (or perhaps a set of times) that determines when a metric should be collected. If this is not specified in the definition of a metric, the default interval should apply.
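For example, the metric definition format could grow an optional interval field. The `scrapeinterval` name below is an assumption for the sake of the sketch, not current syntax:

```toml
[[metric]]
context = "segment_growth"
request = "SELECT owner, SUM(bytes) AS bytes FROM dba_segments GROUP BY owner"
labels = ["owner"]
metricsdesc = { bytes = "Total segment size per owner in bytes." }
# Hypothetical: collect this expensive metric only every 30 minutes;
# metrics without this field would use the default scrape interval.
scrapeinterval = "30m"
```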

discrepancy between SESSIONTIMEZONE and SYSTIMESTAMP

Hi, everyone!
Can someone help me resolve the warning below?

oracledb-exporter-1  | godror WARNING: discrepancy between SESSIONTIMEZONE ("+00:00"=0) and SYSTIMESTAMP ("+05:00"=500) - set connection timezone, see https://github.com/godror/godror/blob/master/doc/timezone.md

I'm running the exporter in docker-compose.
