Comments (10)

stoader commented on June 12, 2024

@dszakallas IMHO the right approach is to leverage the metrics filtering support provided by Spark. Currently, all metrics are retained: https://github.com/banzaicloud/spark-metrics/blob/master/src/main/scala/com/banzaicloud/spark/metrics/sink/PrometheusSink.scala#L61

Users should be able to pass in the metric filter class implementation to be used (e.g. via a metrics-filter-class property). This class would be instantiated and passed to the ScheduledReporter's constructor.

This way the metric filtering becomes fully customizable.
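
For context, a filter passed this way only has to implement the single matches method of com.codahale.metrics.MetricFilter. A minimal sketch, assuming a hypothetical RegexMetricFilter that keeps metrics whose names match a regex:

```scala
import java.util.regex.Pattern

import com.codahale.metrics.{Metric, MetricFilter}

// Hypothetical user-supplied filter: keep only metrics whose names match the regex.
class RegexMetricFilter(regex: String) extends MetricFilter {
  private val pattern = Pattern.compile(regex)

  override def matches(name: String, metric: Metric): Boolean =
    pattern.matcher(name).matches()
}
```

The sink would then hand such an instance to the reporter in place of MetricFilter.ALL.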

dszakallas commented on June 12, 2024

Seems like the way to go. Just to make sure I understood you correctly: the proposal is to instantiate a user-provided class via reflection and use it as the MetricFilter implementation.

Ideally we should be able to provide a way for the user to parametrize the class, e.g. in this particular case, different instances can provide different regexps. This would come in handy for me so that the same class could be used with the direct Prometheus sink with a different regexp, although I don't think that sink has any support for MetricFilters currently.

Spark has a general mechanism for instantiating metrics sinks (and in case of Spark 3, plugins) where each instance is named, and can be parametrized. Adding such a registry for metric filters is quite a disruptive change and should be a Spark feature.

A much more conservative way to go (it only affects this plugin, and doesn't require a registry) is to have a single unnamed metric-filter locally for this plugin. In this case we look for a metrics-filter.class property and if it is defined we search for properties matching metrics-filter.* and pass them to its constructor as Properties.

Of course, there's still the third option of leaving it unconfigurable and instantiating it with the default constructor. If the user wants different regexps, they need to find another means to do it, e.g. use different classes.
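
A rough sketch of the conservative (second) option above, assuming the filter class exposes a Properties-taking constructor; the loadFilter helper and property handling are illustrative, not existing code:

```scala
import java.util.Properties

import com.codahale.metrics.MetricFilter

// Illustrative only: reflectively instantiate the configured filter class,
// handing it all metrics-filter.* settings as a single Properties object.
def loadFilter(className: String, filterProps: Properties): MetricFilter =
  Class.forName(className)
    .getConstructor(classOf[Properties])
    .newInstance(filterProps)
    .asInstanceOf[MetricFilter]
```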

stoader commented on June 12, 2024

> Ideally we should be able to provide a way for the user to parametrize the class, e.g. in this particular case, different instances can provide different regexps. This would come in handy for me so that the same class could be used with the direct Prometheus sink with a different regexp, although I don't think that sink has any support for MetricFilters currently.

Yes, the Prometheus Sink currently doesn't have any support for MetricFilters.

> A much more conservative way to go (it only affects this plugin, and doesn't require a registry) is to have a single unnamed metric-filter locally for this plugin. In this case we look for a metrics-filter.class property and if it is defined we search for properties matching metrics-filter.* and pass them to its constructor as Properties.

The metric filter class should be a property of the sink, e.g. *.sink.prometheus.metrics-filter-class. (Note: due to the way Spark resolves Sink properties, the last element of the config property key cannot contain a dot, hence metrics-filter-class.)

Metric filter implementations must implement the com.codahale.metrics.MetricFilter interface. The Prometheus Sink will instantiate the class specified in metrics-filter-class using reflection and pass it down to https://github.com/banzaicloud/spark-metrics/blob/master/src/main/scala/com/banzaicloud/spark/metrics/sink/PrometheusSink.scala#L61

If no metrics-filter-class is provided in the config, then the MetricFilter.ALL filter should be used by default, as it is now.

Since metric filter implementations may require one or more configuration settings, we could pass the path to a config file into the constructor. This way different configuration formats can be supported (e.g. JSON, YAML, etc.).

If *.sink.prometheus.metrics-filter-config-path is set, then the value of this Sink property is passed into the constructor of the metric filter; otherwise, the parameterless constructor is used.
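
A minimal sketch of how the sink could resolve the filter under this scheme; the createFilter helper and the assumed constructor shapes (a String config-path constructor, or a no-arg one) are illustrative:

```scala
import java.util.Properties

import com.codahale.metrics.MetricFilter

// Illustrative resolution logic for the proposed sink properties.
def createFilter(properties: Properties): MetricFilter =
  Option(properties.getProperty("metrics-filter-class")) match {
    case None =>
      MetricFilter.ALL // current behaviour: keep every metric
    case Some(className) =>
      val clazz = Class.forName(className)
      Option(properties.getProperty("metrics-filter-config-path")) match {
        case Some(path) =>
          // Config path provided: pass it to a (String) constructor so the filter
          // can parse whatever format it likes (JSON, YAML, ...).
          clazz.getConstructor(classOf[String]).newInstance(path).asInstanceOf[MetricFilter]
        case None =>
          // Otherwise fall back to the parameterless constructor.
          clazz.getDeclaredConstructor().newInstance().asInstanceOf[MetricFilter]
      }
  }
```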

dszakallas commented on June 12, 2024

> Note: due to the way Spark resolves Sink properties, the last element of the config property key cannot contain a dot, hence metrics-filter-class.

Interesting, I didn't know that.

> Since metric filter implementations may require one or more configuration settings, we could pass the path to a config file into the constructor. This way different configuration formats can be supported (e.g. JSON, YAML, etc.).

It's a bit cumbersome to require a separate file deployed in the cluster just to configure the metric filter. I would prefer it to be passed as a configuration option. How about passing properties matching s/metrics-filter-(.*)/\1/, i.e. using a dash instead of a dot as the separator?

stoader commented on June 12, 2024

Can you expand a bit more on how the Prometheus Sink configuration would look?

dszakallas commented on June 12, 2024

metrics-filter-class would be the implementor class's name. If it is provided, then properties matching metrics-filter-(.*) are passed to the constructor as a Map[String, String] with unqualified keys. For example:

```
*.sink.prometheus.metrics-filter-class=mycompany.RegexMetricsFilter
*.sink.prometheus.metrics-filter-regex=mycompany\.(.*)
*.sink.prometheus.metrics-filter-ignore-case=true
```

will result in something like this (using reflection, of course):

```scala
new RegexMetricsFilter(Map(
  "regex" -> "mycompany\\.(.*)",
  "ignore-case" -> "true"
))
```
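
For reference, a rough sketch of how the sink side could collect those properties into the map; the prefix handling and helper name are assumptions, not the actual implementation:

```scala
import java.util.Properties

import scala.collection.JavaConverters._

import com.codahale.metrics.MetricFilter

// Illustrative only: gather metrics-filter-* properties (minus the class itself)
// into a Map with unqualified keys and pass it to the filter's constructor.
def createFilter(properties: Properties): MetricFilter = {
  val prefix = "metrics-filter-"
  Option(properties.getProperty(s"${prefix}class")) match {
    case None => MetricFilter.ALL
    case Some(className) =>
      val args: Map[String, String] = properties.stringPropertyNames().asScala
        .filter(key => key.startsWith(prefix) && key != s"${prefix}class")
        .map(key => key.stripPrefix(prefix) -> properties.getProperty(key))
        .toMap
      Class.forName(className)
        .getConstructor(classOf[Map[String, String]])
        .newInstance(args)
        .asInstanceOf[MetricFilter]
  }
}
```

With the config from the example above, args would contain the keys regex and ignore-case.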

stoader commented on June 12, 2024

I was thinking of something along these lines:

```
*.sink.prometheus.metrics-filter-class=mycompany.RegexMetricsFilter
*.sink.prometheus.metrics-filter-arg-regex=mycompany\.(.*)
*.sink.prometheus.metrics-filter-arg-ignore-case=true
```

where the part after -arg- denotes constructor parameters (regex could be mapped to the constructor parameter called regex while ignore-case is mapped to ignoreCase, but your initial proposal of passing all of these as a Map would work as well).

dszakallas commented on June 12, 2024

If we pass them as args, we will have to match the types, names, and order of the constructor arguments, which opens up a great big can of worms. The Map approach defers this task to the implementation. Spark uses the latter for configuring plugins, so we stay fairly conventional if we go with this approach.

I also find the arg prefix superfluous, and the name itself isn't very descriptive when using a property map.

stoader commented on June 12, 2024

Alright, let's go with the Map approach and drop the arg prefix.

dszakallas commented on June 12, 2024

OK, I think everything is clear. I'll draft a PR real quick.
