Comments (10)
@dszakallas IMHO the right approach is to leverage the metrics filtering support provided by Spark. Currently, all metrics are retained: https://github.com/banzaicloud/spark-metrics/blob/master/src/main/scala/com/banzaicloud/spark/metrics/sink/PrometheusSink.scala#L61
Users should be able to pass in the metric filter class implementation to be used (e.g. via a `metrics-filter-class` property). This class would be instantiated and passed to `ScheduledReporter`'s constructor.
This way metric filtering becomes fully customizable.
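For illustration, a user-supplied filter could look like the sketch below. `RegexMetricsFilter` is a hypothetical class name (not part of the plugin), and the Dropwizard `metrics-core` jar must be on the classpath:

```scala
import com.codahale.metrics.{Metric, MetricFilter}

// Hypothetical user-supplied filter: keeps only metrics whose fully
// qualified name matches the given regular expression.
class RegexMetricsFilter(pattern: String) extends MetricFilter {
  private val compiled = java.util.regex.Pattern.compile(pattern)

  override def matches(name: String, metric: Metric): Boolean =
    compiled.matcher(name).matches()
}
```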
from spark-metrics.
Seems like the way to go. Just to make sure I understood you correctly: the proposal is to instantiate a user-provided class using reflection and use it as the `MetricFilter` implementation.
Ideally we should provide a way for the user to parametrize the class; e.g. in this particular case, different instances could carry different regexes. This would come in handy for me, so that the same class could be used with the direct Prometheus sink with a different regex, although I don't think that sink currently has any support for `MetricFilter`s.
Spark has a general mechanism for instantiating metrics sinks (and, in the case of Spark 3, plugins) where each instance is named and can be parametrized. Adding such a registry for metric filters is quite a disruptive change and should be a Spark feature.
A much more conservative approach (it only affects this plugin and doesn't require a registry) is to have a single unnamed metric filter local to this plugin. In this case we look for a `metrics-filter.class` property and, if it is defined, we search for properties matching `metrics-filter.*` and pass them to its constructor as `Properties`.
Of course, there's still the third option of leaving it unconfigurable and instantiating with the default constructor. If the user wants different regexes, they need to find another means to do it, e.g. use different classes.
> Ideally we should be able to provide a way for the user to parametrize the class, e.g. in this particular case, different instances can provide different regexps. This would come in handy for me so that the same class could be used with the direct Prometheus sink with a different regexp, although I don't think that sink has any support for `MetricFilter`s currently.
Yes, the Prometheus Sink currently doesn't have any support for `MetricFilter`s.
> A much more conservative way to go (it only affects this plugin, and doesn't require a registry) is to have a single unnamed metric-filter locally for this plugin. In this case we look for a `metrics-filter.class` property and if it is defined we search for properties matching `metrics-filter.*` and pass them to its constructor as `Properties`.
The metric filter class should be a property of the sink, e.g. `*.sink.prometheus.metrics-filter-class`. (Note: due to the way Spark resolves Sink properties, the last element of the config property key cannot contain a dot, hence `metrics-filter-class`.)
Metric filter implementations must implement the `com.codahale.metrics.MetricFilter` interface. The Prometheus Sink will instantiate the class specified in `metrics-filter-class` using reflection and pass it down to https://github.com/banzaicloud/spark-metrics/blob/master/src/main/scala/com/banzaicloud/spark/metrics/sink/PrometheusSink.scala#L61
If no `metrics-filter-class` is provided in the config, then the `MetricFilter.ALL` filter should be used by default, as it is now.
Since metric filter implementations may require one or more configuration settings, we could pass the path of a config file into the constructor. This way different configuration formats can be supported (e.g. JSON, YAML, etc.).
If `*.sink.prometheus.metrics-filter-config-path` is set, then the value of this Sink property is passed into the constructor of the metrics filter; otherwise, the parameterless constructor is used.
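The config-path proposal could be sketched as follows. This is not the shipped implementation, and `instantiate` is our own name: if a config path is set, its value is passed to a single-`String` constructor; otherwise the parameterless constructor is used. The sink would then cast the result to `com.codahale.metrics.MetricFilter`.

```scala
object FilterLoader {
  // Sketch: choose between a (String) constructor and the no-arg constructor,
  // depending on whether a config path was provided.
  def instantiate(className: String, configPath: Option[String]): AnyRef = {
    val cls = Class.forName(className)
    configPath match {
      case Some(path) => cls.getConstructor(classOf[String]).newInstance(path).asInstanceOf[AnyRef]
      case None       => cls.getDeclaredConstructor().newInstance().asInstanceOf[AnyRef]
    }
  }
}
```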
> Note: due to the way Spark resolves Sink properties the last element of the config property key can not contain dot hence the `metrics-filter-class`
Interesting, I didn't know that.
> Since metric filter implementations may require one or more configuration settings we could pass the path to the config file into the constructor. This way different configuration formats can be supported (e.g. json, yaml, etc)
It's a bit cumbersome to require a separate file deployed to the cluster just to configure the metric filter. I would prefer the settings to be passed as configuration options. How about passing `s/metrics-filter-(.*)/\1/`, i.e. using a leading dash instead of a dot?
Can you expand a bit more on how the Prometheus Sink configuration would look?
`metrics-filter-class` would be the implementor class's name. If provided, then properties matching `metrics-filter-(.*)` are passed as a `Map[String, String]` to the constructor, with unqualified keys. For example:
```
*.sink.prometheus.metrics-filter-class=mycompany.RegexMetricsFilter
*.sink.prometheus.metrics-filter-regex=mycompany\.(.*)
*.sink.prometheus.metrics-filter-ignore-case=true
```
will result in something like (using reflection, of course):
```scala
new RegexMetricsFilter(Map(
  "regex" -> "mycompany\\.(.*)",
  "ignore-case" -> "true"
))
```
I was thinking something along the lines of:
```
*.sink.prometheus.metrics-filter-class=mycompany.RegexMetricsFilter
*.sink.prometheus.metrics-filter-arg-regex=mycompany\.(.*)
*.sink.prometheus.metrics-filter-arg-ignore-case=true
```
where the part after `-arg-` denotes constructor parameters. (`regex` could be mapped to the constructor parameter called `regex`, while `ignore-case` is mapped to `ignoreCase`, but your initial proposal of passing all of these as a `Map` would work as well.)
If we pass them as args, we will have to match the types, the names, and the order of the arguments, i.e. open up a great big can of worms. The `Map` approach would defer this task to the implementation. Spark uses the latter for configuring plugins, so we stay fairly conventional if we go with this approach.
I also find the `arg` prefix superfluous, and the name itself is not too descriptive when using a property map.
Alright, let's go with the `Map` approach and drop the `arg` prefix.
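The agreed scheme could be sketched as below: collect every sink property matching `metrics-filter-(.*)` (except `metrics-filter-class` itself), strip the prefix, and hand the resulting `Map[String, String]` to the filter's constructor. The object and method names here are ours, not part of the plugin:

```scala
import scala.collection.JavaConverters._ // Scala 2.11/2.12; use scala.jdk.CollectionConverters on 2.13

object FilterConfig {
  val Prefix   = "metrics-filter-"
  val ClassKey = Prefix + "class"

  // Unqualified keys: "metrics-filter-regex" becomes "regex", etc.
  def filterArgs(props: java.util.Properties): Map[String, String] =
    props.asScala.toMap.collect {
      case (key, value) if key.startsWith(Prefix) && key != ClassKey =>
        key.stripPrefix(Prefix) -> value
    }
}
```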
OK, I think everything is clear. I'll draft a PR real quick.