opsdis / aci-exporter
A Cisco ACI Prometheus exporter written in Go
Home Page: https://www.opsdis.com
License: GNU General Public License v3.0
Is your feature request related to a problem? Please describe.
I want to scrape timestamp attributes from ACI, such as modTs, as metric values (not as labels).
{
"totalCount":"1",
"imdata":[
{
"infraWiNode":{
"attributes":{
-snip-
"modTs":"2020-04-18T05:24:07.722+00:00",
Describe the solution you'd like
I think aci-exporter needs to translate "2020-04-18T05:24:07.722+00:00" to Unix time.
Do you have any plans to add the above function?
Or is it already implemented?
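A minimal sketch of the conversion being asked for, using only the Go standard library. The function name modTsToUnix is hypothetical and not part of aci-exporter; it assumes the APIC timestamp is RFC 3339 formatted, as in the example above, and treats anything else (for instance "never") as an error.

```go
package main

import (
	"fmt"
	"time"
)

// modTsToUnix (hypothetical helper) converts an APIC timestamp such as
// "2020-04-18T05:24:07.722+00:00" to Unix time in seconds, keeping the
// fractional part. Non-timestamp values such as "never" yield an error.
func modTsToUnix(ts string) (float64, error) {
	t, err := time.Parse(time.RFC3339Nano, ts)
	if err != nil {
		return 0, fmt.Errorf("not an RFC 3339 timestamp %q: %w", ts, err)
	}
	return float64(t.UnixNano()) / 1e9, nil
}

func main() {
	for _, ts := range []string{"2020-04-18T05:24:07.722+00:00", "never"} {
		v, err := modTsToUnix(ts)
		fmt.Println(ts, v, err)
	}
}
```

Returning an error rather than 0 for "never" lets the caller decide whether to skip the sample or export a sentinel value.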
Describe the solution you'd like
In the current implementation all requests to the ACI APIC are done sequentially, so the scrape time is the sum of all the request times. Instead, a goroutine should be used for each HTTP request, with the response returned on one or more channels. Depending on the parallelism of the machine, the scrape time would then be close to the longest response time of all the requests made to the APIC.
This will have little effect on scrape time if the number of APIC queries is low, but a major effect when the number increases, and it keeps the scrape time more "constant" as the number of queries grows.
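The goroutine-per-request idea can be sketched as follows. This is an illustration only: fetchAll, result and the fetch callback are hypothetical names, and the HTTP call to the APIC is replaced by a function parameter so the example stays self-contained.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// result carries the outcome of one query back to the collector.
type result struct {
	query string
	body  string
	err   error
}

// fetchAll runs fetch for every query in its own goroutine and collects the
// results from a buffered channel. fetch stands in for the HTTP request.
func fetchAll(queries []string, fetch func(string) (string, error)) []result {
	ch := make(chan result, len(queries))
	var wg sync.WaitGroup
	for _, q := range queries {
		wg.Add(1)
		go func(q string) {
			defer wg.Done()
			body, err := fetch(q)
			ch <- result{query: q, body: body, err: err}
		}(q)
	}
	wg.Wait()
	close(ch)

	results := make([]result, 0, len(queries))
	for r := range ch {
		results = append(results, r)
	}
	return results
}

func main() {
	// Simulated APIC request that takes 100 ms per query.
	slow := func(q string) (string, error) {
		time.Sleep(100 * time.Millisecond)
		return "response for " + q, nil
	}
	start := time.Now()
	res := fetchAll([]string{"node_bgp_peers", "node_interface_info", "fabric_health"}, slow)
	fmt.Printf("%d results in about %v\n", len(res), time.Since(start).Round(10*time.Millisecond))
}
```

Results arrive in completion order, not request order, so each result carries its query name. A real implementation would probably also cap the number of concurrent requests to avoid overloading the APIC.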
I am trying to extract the EPG to port mapping with this code:
epg_to_port:
  class_name: vlanCktEp
  query_parameter: '?rsp-subtree-include=required&rsp-subtree-class=l2RsPathDomAtt&rsp-subtree=children'
  metrics:
    - name: dynamic_binding
      value_name: vlanCktEp.attributes.pcTag
      type: gauge
  labels:
    - property_name: vlanCktEp.attributes.epgDn
      regex: "^uni/tn-(?P<tenant>.*)/ap-(?P<app>.*)/epg-(?P<epg>.*)"
    - property_name: vlanCktEp.children.[l2RsPathDomAtt].attributes.tDn
      regex: "^topology/pod-(?P<podid>[1-9][0-9]*)/node-(?P<nodeid>[1-9][0-9]+)/sys/conng/path-\\[(?P<interface>[^\\]]+)\\]"
But it seems the vlanCktEp.children.[l2RsPathDomAtt].attributes.tDn label is not processed.
I found a way to make it work by passing the child in the metric.
For example, if I use vlanCktEp.children.[l2RsPathDomAtt].attributes.parentSKey as the metric's value_name, then the vlanCktEp.children.[l2RsPathDomAtt].attributes.tDn label is added as well as vlanCktEp.attributes.epgDn.
Is this expected?
Describe the solution you'd like
Add support to exclude and/or include which queries to execute for a specific fabric.
aci-exporter allows the user to limit which queries to execute by passing a "queries" parameter with a comma separated list of queries.
For example: queries=node_bgp_peers,node_interface_info
If the Prometheus scrape config is not correct, it is possible to send a list that contains spaces, newlines, etc.
aci-exporter does not validate this field and will just fail silently.
It should validate the queries parameter and return an error if the format is not correct.
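A sketch of what such validation could look like. The naming rule in queryName (letters, digits and underscores) is an assumption made for illustration; the exporter's real rules for query names may differ, and parseQueries is a hypothetical function name.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// queryName is an ASSUMED rule for a valid query name. Go's "$" only matches
// at the very end of the text, so a trailing newline is rejected too.
var queryName = regexp.MustCompile(`^[A-Za-z_][A-Za-z0-9_]*$`)

// parseQueries splits the comma separated "queries" parameter and rejects
// any entry that is empty or contains characters such as spaces or newlines.
func parseQueries(param string) ([]string, error) {
	names := strings.Split(param, ",")
	for _, n := range names {
		if !queryName.MatchString(n) {
			return nil, fmt.Errorf("invalid query name %q in queries parameter", n)
		}
	}
	return names, nil
}

func main() {
	fmt.Println(parseQueries("node_bgp_peers,node_interface_info"))
	fmt.Println(parseQueries("node_bgp_peers, node_interface_info\n"))
}
```

In an HTTP handler the returned error would typically be mapped to a 400 response so that the misconfigured scrape shows up as failed in Prometheus instead of silently returning nothing.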
Is your feature request related to a problem? Please describe.
Every query API call performs a new login, which adds latency.
Describe the solution you'd like
After a valid login, reuse the token until it fails.
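One possible shape for this, as a sketch: cache the token and only log in again when a request is rejected. All names here (session, errUnauthorized, the login and do callbacks) are hypothetical, and the APIC calls are replaced by function parameters.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errUnauthorized is what the hypothetical request callback returns when the
// APIC rejects the token.
var errUnauthorized = errors.New("token rejected")

// session caches a login token and only logs in again when needed.
type session struct {
	mu     sync.Mutex
	token  string
	logins int // number of logins performed, kept for illustration
	login  func() (string, error)
}

// query runs do with the cached token. If the token is rejected it logs in
// once more and retries a single time.
func (s *session) query(do func(token string) (string, error)) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	for attempt := 0; attempt < 2; attempt++ {
		if s.token == "" {
			t, err := s.login()
			if err != nil {
				return "", err
			}
			s.token = t
			s.logins++
		}
		body, err := do(s.token)
		if !errors.Is(err, errUnauthorized) {
			return body, err
		}
		s.token = "" // token expired or invalid: force a fresh login
	}
	return "", errUnauthorized
}

func main() {
	s := &session{login: func() (string, error) { return "tok", nil }}
	for i := 0; i < 3; i++ {
		body, _ := s.query(func(token string) (string, error) { return "ok with " + token, nil })
		fmt.Println(body)
	}
	fmt.Println("logins:", s.logins)
}
```

For simplicity this sketch holds the lock for the whole request, which serializes queries; a version meant to run queries in parallel would only lock around reading and refreshing the token.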
Hi @thenodon :
Describe the bug
Is it possible to add a static label in a class query?
BEFORE
aci_node_system_psu_status{aci="MY_FABRIC",fabric="MY_FABRIC",node_id="2",pod_id="1",psu_slot_id="6"} 1
AFTER
aci_node_system_psu_status{aci="MY_FABRIC",fabric="MY_FABRIC",node_id="2",pod_id="1",psu_slot_id="6",sensor_name="PDU"} 1
By static I mean that this label doesn't correspond to any class attribute. It is like a custom label that is added to the config.
Hello,
First of all, thank you for this great exporter !
I'm currently trying to get usage visibility on dynamic VLAN pools, and in order to do so I would need to use the attributes 'to' and 'from' of class fvnsEncapBlk as metric values.
The problem is that these attribute values are in the form 'vlan-xxxx', so I need to remove the 'vlan-' part in order to get a float value.
I did not manage to find a way to use a regex with the value_transform function. Is it possible?
BR
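For illustration, the conversion described above amounts to a regex capture followed by a float parse. The sketch below is standalone Go, not the exporter's implementation; vlanToFloat and the exact pattern are assumptions.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// vlanID matches values such as "vlan-110" and captures the numeric part.
var vlanID = regexp.MustCompile(`^vlan-(\d+)$`)

// vlanToFloat extracts the VLAN number from an encap string as a float64.
func vlanToFloat(v string) (float64, error) {
	m := vlanID.FindStringSubmatch(v)
	if m == nil {
		return 0, fmt.Errorf("value %q does not match %s", v, vlanID)
	}
	return strconv.ParseFloat(m[1], 64)
}

func main() {
	fmt.Println(vlanToFloat("vlan-110"))
	fmt.Println(vlanToFloat("unknown"))
}
```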
Describe the solution you'd like
If more than one APIC is configured for the fabric, the exporter should be able to round-robin between them.
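A minimal round-robin selector could look like the sketch below. The type apicPool and its pick method are hypothetical names; the example uses atomic.Uint64, which needs Go 1.19 or later, so that concurrent scrapes can share one pool.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// apicPool hands out controller URLs in round-robin order and is safe for
// concurrent use.
type apicPool struct {
	apics []string
	next  atomic.Uint64
}

// pick returns the next controller, wrapping around at the end of the list.
func (p *apicPool) pick() string {
	n := p.next.Add(1) - 1
	return p.apics[n%uint64(len(p.apics))]
}

func main() {
	pool := &apicPool{apics: []string{"https://apic1", "https://apic2", "https://apic3"}}
	for i := 0; i < 5; i++ {
		fmt.Println(pool.pick())
	}
}
```

A production version would likely also skip controllers that recently failed, which this sketch does not attempt.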
Describe the bug
When you use a prefix, the # HELP and # TYPE lines in the generated metric file do not reflect it, which effectively renders those lines ineffective.
To Reproduce
Steps to reproduce the behavior:
prefix: aci_
# HELP health_ratio Returns health score
# TYPE health_ratio gauge
aci_health_ratio
Expected behavior
It should be:
# HELP aci_health_ratio Returns health score
# TYPE aci_health_ratio gauge
aci_health_ratio
These endpoints are not correct
urlMap["login"] = "/api/mo/aaaLogin.xml"
urlMap["logout"] = "/api/mo/aaaLogout.xml"
and are not valid in ACI 5.2 and should be updated to
urlMap["login"] = "/api/aaaLogin.xml"
urlMap["logout"] = "/api/aaaLogout.xml"
The new syntax should work across previous ACI versions as well.
Currently aci-exporter works fine for most configurations, but on a large-scale fabric, if a query returns too many objects it might hit the maximum response size the APIC can handle, and the query will fail.
aci-exporter should implement pagination.
Describe the solution you'd like
Be able to add static labels to queries that are not parsed from the query response. Like:
static_labels:
  - key: xyz
    value: XYZ
Problem Description
I'm always frustrated when trying to integrate various monitoring solutions due to the lack of a standardized format and semantics. OpenTelemetry (OTel) and the OpenTelemetry Protocol (OTLP) offer a standard approach, but many solutions don't natively support it. This results in redundant work and increased complexity in data integration.
Desired Solution
I would like to see a native OpenTelemetry (OTel) exporter implemented, enabling the seamless export of data (primarily metrics from the MELT stack) to any backend that supports OTel, such as Splunk Observability Cloud, Dynatrace, Loki, etc. This would simplify data integration and enhance interoperability among observability products.
Alternatives Considered
SNMPoTel Project: While the SNMPoTel project provides a solution, SNMP itself is cumbersome and adds unnecessary complexity. A native OTel exporter would offer a more streamlined and efficient approach. More details on SNMPoTel can be found here.
Additional Context
A wide range of observability products already leverage OpenTelemetry, making it a robust standard for sending and contextualizing data across different backends. Implementing a native OTel exporter would align with this trend and provide significant benefits in standardization and ease of integration.
Is your feature request related to a problem? Please describe.
The example used under "Parsing and metrics" is not realistic as it doesn't extract a useful metric.
Describe the solution you'd like
It would be great if the example would show how to extract all the relevant metrics.
Examples of json-result from ACI
{
"totalCount": "1",
"imdata": [
{
"ethpmDOMCurrentStats": {
"attributes": {
"alert": "none",
"childAction": "",
"dn": "topology/pod-1/node-101/sys/phys-[eth1/1]/phys/domstats/current",
"hiAlarm": "12.000001",
"hiAlarm2": "0.000000",
"hiAlarm3": "0.000000",
"hiAlarm4": "0.000000",
"hiAlarm5": "0.000000",
"hiAlarm6": "0.000000",
"hiAlarm7": "0.000000",
"hiAlarm8": "0.000000",
"hiWarn": "10.500001",
"hiWarn2": "0.000000",
"hiWarn3": "0.000000",
"hiWarn4": "0.000000",
"hiWarn5": "0.000000",
"hiWarn6": "0.000000",
"hiWarn7": "0.000000",
"hiWarn8": "0.000000",
"lanes": "1",
"loAlarm": "1.000000",
"loAlarm2": "0.000000",
"loAlarm3": "0.000000",
"loAlarm4": "0.000000",
"loAlarm5": "0.000000",
"loAlarm6": "0.000000",
"loAlarm7": "0.000000",
"loAlarm8": "0.000000",
"loWarn": "2.500000",
"loWarn2": "0.000000",
"loWarn3": "0.000000",
"loWarn4": "0.000000",
"loWarn5": "0.000000",
"loWarn6": "0.000000",
"loWarn7": "0.000000",
"loWarn8": "0.000000",
"modTs": "never",
"status": "",
"value": "5.640000",
"value2": "0.000000",
"value3": "0.000000",
"value4": "0.000000",
"value5": "0.000000",
"value6": "0.000000",
"value7": "0.000000",
"value8": "0.000000"
}
}
}
]
}
{
"totalCount": "1",
"imdata": [
{
"ethpmDOMCurrentStats": {
"attributes": {
"alert": "none",
"childAction": "",
"dn": "topology/pod-1/node-101/sys/phys-[eth1/36]/phys/domstats/current",
"hiAlarm": "14.996000",
"hiAlarm2": "14.996000",
"hiAlarm3": "14.996000",
"hiAlarm4": "14.996000",
"hiAlarm5": "14.996000",
"hiAlarm6": "14.996000",
"hiAlarm7": "14.996000",
"hiAlarm8": "14.996000",
"hiWarn": "12.998000",
"hiWarn2": "12.998000",
"hiWarn3": "12.998000",
"hiWarn4": "12.998000",
"hiWarn5": "12.998000",
"hiWarn6": "12.998000",
"hiWarn7": "12.998000",
"hiWarn8": "12.998000",
"lanes": "8",
"loAlarm": "4.496000",
"loAlarm2": "4.496000",
"loAlarm3": "4.496000",
"loAlarm4": "4.496000",
"loAlarm5": "4.496000",
"loAlarm6": "4.496000",
"loAlarm7": "4.496000",
"loAlarm8": "4.496000",
"loWarn": "5.000000",
"loWarn2": "5.000000",
"loWarn3": "5.000000",
"loWarn4": "5.000000",
"loWarn5": "5.000000",
"loWarn6": "5.000000",
"loWarn7": "5.000000",
"loWarn8": "5.000000",
"modTs": "never",
"status": "",
"value": "28.256001",
"value2": "28.256001",
"value3": "24.744001",
"value4": "33.856003",
"value5": "16.448000",
"value6": "16.448000",
"value7": "25.702002",
"value8": "24.676001"
}
}
}
]
}
The relevant parts of the data:
value: actual value for lane 1
value2: actual value for lane 2 and so on
hiAlarm: high threshold value for alarm for lane 1, alarm will be triggered above this value
hiAlarm2: high threshold value for alarm for lane 2, alarm will be triggered above this value
loAlarm: low threshold value for alarm for lane 1, alarm will be triggered below this value
The same applies to hiWarn and loWarn.
The most important metrics are "value" for all the lanes as this indicates the optical quality of the link.
Is your feature request related to a problem? Please describe.
The config.yml can become large with all the query definitions.
Describe the solution you'd like
Support a configuration directory that can hold multiple files, including a mix of queries and query types.
Is your feature request related to a problem? Please describe.
Currently no metrics are returned if there is an issue with the login to the APIC.
Describe the solution you'd like
An additional aci_up metric that follows the usual exporter pattern: return 1 if the target can be scraped and 0 if it fails.
Describe alternatives you've considered
N/A
Additional context
N/A
Describe the bug
A query like /api/class/fvAEPg.json?rsp-subtree-include=health,required will return the health for the object and often the health for the node, e.g:
"children": [
{
"healthNodeInst": {
"attributes": {
"childAction": "deleteNonPresent",
"chng": "400",
"cur": "100",
"isExisting": "no",
"lcOwn": "local",
"maxSev": "cleared",
"modTs": "never",
"nodeId": "101",
"podId": "1",
"prev": "20",
"rn": "nodehealth-101",
"status": "",
"twScore": "100",
"updTs": "2020-08-11T17:41:24.154+02:00",
"weight": "1"
}
}
},
{
"healthNodeInst": {
"attributes": {
"childAction": "deleteNonPresent",
"chng": "400",
"cur": "100",
"isExisting": "no",
"lcOwn": "local",
"maxSev": "cleared",
"modTs": "never",
"nodeId": "102",
"podId": "1",
"prev": "20",
"rn": "nodehealth-102",
"status": "",
"twScore": "100",
"updTs": "2020-08-11T17:41:31.400+02:00",
"weight": "1"
}
}
},
{
"healthInst": {
"attributes": {
"childAction": "",
"chng": "400",
"cur": "100",
"maxSev": "cleared",
"modTs": "never",
"prev": "20",
"rn": "health",
"status": "",
"twScore": "100",
"updTs": "2020-08-11T17:41:32.306+02:00"
}
}
}
]
In the query I am only interested in the healthInst, but I have not found a way to filter on just healthInst when querying the APIC API.
With Go gjson I have not found a way to express the parsing string for an array of different objects. From the documentation it looks like code is needed. The only way I know of now is to sort that string, and since healthInst will be "before" healthNodeInst we can apply the .0. for the index, like this:
This is not in any way a solid solution.
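If code is acceptable, one order-independent way to pick out healthInst is to decode the children array and look for the class name as a key. The sketch below uses only encoding/json instead of gjson; healthInstCur and the child type are hypothetical names.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// child is one element of the "children" array: a single-key object whose key
// is the class name, e.g. "healthInst" or "healthNodeInst".
type child map[string]struct {
	Attributes map[string]string `json:"attributes"`
}

// healthInstCur returns the "cur" attribute of the first healthInst child,
// skipping healthNodeInst entries regardless of their position in the array.
func healthInstCur(children []byte) (string, bool) {
	var cs []child
	if err := json.Unmarshal(children, &cs); err != nil {
		return "", false
	}
	for _, c := range cs {
		if h, ok := c["healthInst"]; ok {
			return h.Attributes["cur"], true
		}
	}
	return "", false
}

func main() {
	children := []byte(`[
		{"healthNodeInst": {"attributes": {"cur": "90", "nodeId": "101"}}},
		{"healthNodeInst": {"attributes": {"cur": "95", "nodeId": "102"}}},
		{"healthInst": {"attributes": {"cur": "100", "rn": "health"}}}
	]`)
	fmt.Println(healthInstCur(children))
}
```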
Is your feature request related to a problem? Please describe.
We are trying to retrieve the property topSystem.attributes.systemUpTime, which works (as a label), but unfortunately we didn't find a way to convert the format (i.e. 210:20:16:20.000) to a correct value (seconds).
Describe the solution you'd like
Ideally we would want a metric like this :
aci_node_uptime_duration_seconds{aci="GVA_DC_FABRIC",nodeid="1",podid="1"} 4320000
with the value being the number of seconds since the last reboot.
Describe alternatives you've considered
Unfortunately I don’t see any other alternative as we didn’t find another property which would store the value in a raw format.
From what I understand if a calculation does not already exist it would require a custom one, correct ?
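Such a custom calculation could be quite small. The sketch below assumes the systemUpTime format is days:hours:minutes:seconds.milliseconds, as the example 210:20:16:20.000 suggests; uptimeSeconds is a hypothetical function name and not an existing exporter feature.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// uptimeSeconds converts an uptime string in the ASSUMED format
// "DD:HH:MM:SS.mmm" to a number of seconds.
func uptimeSeconds(s string) (float64, error) {
	parts := strings.Split(s, ":")
	if len(parts) != 4 {
		return 0, fmt.Errorf("unexpected uptime format %q", s)
	}
	weights := []float64{86400, 3600, 60, 1} // seconds per day, hour, minute, second
	total := 0.0
	for i, p := range parts {
		v, err := strconv.ParseFloat(p, 64)
		if err != nil {
			return 0, fmt.Errorf("unexpected uptime format %q: %w", s, err)
		}
		total += v * weights[i]
	}
	return total, nil
}

func main() {
	fmt.Println(uptimeSeconds("210:20:16:20.000"))
}
```

For the example value this gives 210 days, 20 hours, 16 minutes and 20 seconds, i.e. 18216980 seconds.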
Describe the bug
value_transform does not work. It always displays 0, no matter the value of the operstatus.
class_name: eqptPsu
metrics:
  - name: infra_node_psu
    value_name: X
    type: "gauge"
    help: "Returns the info of the node psu states"
    unit: "info"
    value_calculation: "1"
    value_transform:
      'unknown': 0
      'ok': 1
      'fail': 2
      'absent': 3
      'shut': 4
      'mismatch': 5
Describe the bug
As my fabric and my exporter configuration are getting bigger, I would like to know if there is a way to reduce the scrape time, for example by running multiple class queries at once.
I am trying to extract the static binding info with this class query:
static_binding_info:
  class_name: fvAEPg
  query_parameter: '?rsp-subtree-include=required&rsp-subtree-class=fvRsPathAtt&rsp-subtree=children'
  metrics:
    - name: static_binding
      value_name: fvAEPg.children.[fvRsPathAtt].attributes.encap
      type: gauge
      value_regex_transformation: "vlan-(.*)"
      help: "Static Binding Infos"
  labels:
    - property_name: fvAEPg.attributes.dn
      regex: "^uni/tn-(?P<tenant>.*)/ap-(?P<app>.*)/epg-(?P<epg>.*)"
    - property_name: fvAEPg.attributes.[.*].attributes.tDn
      regex: "^topology/pod-(?P<podid>[1-9][0-9]*)/(protpaths|paths)-(?P<nodeid>[1-9][0-9].*)/pathep-\\[(?P<port>.+)\\]"
    - property_name: fvAEPg.attributes.[.*].attributes.encap
      regex: "^(?P<encap>.*)"
The query works, but it seems the value_regex_transformation is not being triggered.
I get this error message:
{"level":"info","msg":"could not convert value to float, will return 0.0 ","time":"2023-07-12T03:02:43Z","value":"vlan-110"}
This matches the info message of the toFloat function, so it seems toFloatTransform isn't called?
The returned data is:
aci_static_binding{aci="ACI Fabric2",app="Trex",encap="vlan-110",epg="Trex-2",fabric="fab2",nodeid="203-204",podid="1",port="bm-01-40G_PolGrp",tenant="Trex"} 0
There are no other error messages so I am not sure why this is happening.
Currently all the data collected by aci-exporter comes from the APICs.
This works great for small fabrics but at scale generates a lot of extra load on the APICs.
A good example is interface statistics: when aci-exporter collects them for all the interfaces of all the switches, the APIC becomes the bottleneck.
A better approach would be to gather this data directly from the switches.
In order to do so we need to:
Describe the bug
The exporter is unable to convert operSpeed string values to float.
To Reproduce
class_queries:
  interface_info:
    class_name: l1PhysIf
    query_parameter: "?rsp-subtree=children&rsp-subtree-include=stats&rsp-subtree-class=ethpmPhysIf,eqptIngrBytes5min,eqptEgrBytes5min,eqptIngrDropPkts5min,eqptEgrDropPkts5min&query-target-filter=and(ne( l1PhysIf.adminSt, \"down\"))"
    metrics:
      # It works here
      - name: interface_oper_state
        value_name: l1PhysIf.children.[ethpmPhysIf].attributes.operSt
        type: gauge
        help: The current operational state of the interface. (0=unknown, 1=down, 2=up, 3=link-up)
        # A string to float64 transform table of the value
        value_transform:
          'down': 0 ## ~ disabled interfaces
          'up': 1 ## ~ enabled interfaces
      # But not here
      - name: interface_oper_speed
        value_name: l1PhysIf.children.[ethpmPhysIf].attributes.operSpeed
        type: gauge
        help: The current operational speed of the interface, in bits per second.
        value_transform:
          'unknown': 0
          '100M': 100000000
          '1G': 1000000000
          '10G': 10000000000
          '25G': 25000000000
          '40G': 40000000000
          '100G': 100000000000
          '400G': 400000000000
Additional context
{"level":"info","msg":"could not convert value to float, will return 0.0 ","time":"2023-08-11T09:56:17Z","value":"100G"}
I get this log entry for all values (1G, 10G, etc.).
It works for other attributes of this class, e.g. operSt, but not for operSpeed.
Authentication failed for a user with a longish password that contains several special characters. Putting the user and password between double quotes in the XML fixed the issue for me.
diff --git a/aci-connection.go b/aci-connection.go
index ccfc6dd..d4c9110 100644
--- a/aci-connection.go
+++ b/aci-connection.go
@@ -84,7 +84,7 @@ func newAciConnction(ctx context.Context, fabricConfig *Fabric) *AciConnection {
func (c AciConnection) login() error {
for i, controller := range c.fabricConfig.Apic {
_, status, err := c.doPostXML("login", fmt.Sprintf("%s%s", controller, c.URLMap["login"]),
- []byte(fmt.Sprintf("<aaaUser name=%s pwd=%s/>", c.fabricConfig.Username, c.fabricConfig.Password)))
+ []byte(fmt.Sprintf("<aaaUser name=\"%s\" pwd=\"%s\"/>", c.fabricConfig.Username, c.fabricConfig.Password)))
if err != nil || status != 200 {
err = fmt.Errorf("failed to login to %s, try next apic", controller)
Describe the bug
If you don't set the type and/or help field in the YAML, the exporter will successfully generate the metric file, but Prometheus will fail to scrape it.
To Reproduce
If you leave the type and/or help lines commented out in the following metric, the file will be generated with empty # HELP and/or # TYPE lines:
metrics:
  - name: test_metric
    value_name: eqptCh.attributes.model
    # type: "gauge"
    unit: "ratio"
    # help: "Returns the kernel space cpu load of a fabric node"
Expected behavior
Option 1: The program should detect the YAML file as invalid.
Option 2: The program should replace the missing fields with default values, e.g. type: "gauge" / help: "Missing".
The log level in the config file is never read; it only works if you set it as a flag.
As a quick fix I added this after err := viper.ReadInConfig() and that fixed it:
ll := viper.GetString("loglevel")
if ll != "" {
	level, err := log.ParseLevel(ll)
	if err != nil {
		log.Error(fmt.Sprintf("Not supported log level - %s", err))
		os.Exit(1)
	}
	log.SetLevel(level)
}
The same applies to config_dir.
Describe the bug
The label names set by the user in the YAML files can contain uppercase characters.
Expected behavior
As a best practice, label names should always be lowercase.
You can find regexes that validate label names and metric names in this issue, for example: prometheus/client_java#28
For label names, a .ToLower() could be an option and would minimise rework for existing files.
... But it might break existing queries
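A sketch of the lowercase-and-validate idea. The pattern is the label name rule from the Prometheus data model, [a-zA-Z_][a-zA-Z0-9_]*; normalizeLabel is a hypothetical function name and not part of aci-exporter.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// labelName is the Prometheus label name pattern. Names starting with "__"
// are additionally reserved for internal use, which is not checked here.
var labelName = regexp.MustCompile(`^[a-zA-Z_][a-zA-Z0-9_]*$`)

// normalizeLabel lowercases a configured label name and rejects names that
// are still not valid after lowercasing.
func normalizeLabel(name string) (string, error) {
	lower := strings.ToLower(name)
	if !labelName.MatchString(lower) {
		return "", fmt.Errorf("invalid label name %q", name)
	}
	return lower, nil
}

func main() {
	fmt.Println(normalizeLabel("NodeId"))
	fmt.Println(normalizeLabel("node-id"))
}
```

Lowercasing silently changes the label names that existing dashboards and alerts see, so it may be safer to log a warning for each renamed label or to make the behavior opt-in.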