
Introduction

Markus

Markus is a Python library for generating metrics.

Code: https://github.com/willkg/markus
Issues: https://github.com/willkg/markus/issues
License: MPL v2
Documentation: http://markus.readthedocs.io/en/latest/

Goals

Markus makes it easier to generate metrics in your program by:

  • providing multiple backends (Datadog statsd, statsd, logging, logging rollup, and so on) for sending data to different places
  • sending metrics to multiple backends at the same time
  • providing a testing framework for easy testing
  • providing a decoupled architecture that makes it easier to write metrics-generating code without having to worry about whether a metrics client has been created and configured--similar to the Python logging module in this way

I use it at Mozilla in the collector of our crash ingestion pipeline. Peter used it to build our symbols lookup server, too.

Install

To install Markus, run:

$ pip install markus

(Optional) To install the requirements for the markus.backends.statsd.StatsdMetrics backend:

$ pip install 'markus[statsd]'

(Optional) To install the requirements for the markus.backends.datadog.DatadogMetrics backend:

$ pip install 'markus[datadog]'

Quick start

Similar to using the logging library, every Python module can create a markus.main.MetricsInterface (loosely equivalent to a Python logging Logger) at any time, including at module import time, and use it to generate metrics.

For example:

import markus

metrics = markus.get_metrics(__name__)

Creating a markus.main.MetricsInterface using __name__ causes it to prefix all stats keys with the dotted Python path to that module.
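
For instance, if this code lives in a module imported as myapp.kitchen (a hypothetical module name):

metrics = markus.get_metrics(__name__)  # __name__ == "myapp.kitchen"
metrics.incr("vegetable")               # published as "myapp.kitchen.vegetable"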

Then you can use the markus.main.MetricsInterface anywhere in that module:

@metrics.timer_decorator("chopping_vegetables")
def some_long_function(vegetables):
    for veg in vegetables:
        chop_vegetable(veg)
        metrics.incr("vegetable", value=1)
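
The interface also supports the other standard metric types; a quick sketch using the same metrics object:

metrics.gauge("pot.temperature", value=100)      # record a current value
metrics.timing("chop_time_ms", value=23)         # record a duration
metrics.histogram("vegetable_weight", value=42)  # record a distribution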

At application startup, configure Markus with the backends you want and any options they require to publish metrics.

For example, let us configure Markus to publish metrics to the Python logging infrastructure and Datadog:

import markus

markus.configure(
    backends=[
        {
            # Publish metrics to the Python logging infrastructure
            "class": "markus.backends.logging.LoggingMetrics",
        },
        {
            # Publish metrics to Datadog
            "class": "markus.backends.datadog.DatadogMetrics",
            "options": {
                "statsd_host": "example.com",
                "statsd_port": 8125,
                "statsd_namespace": ""
            }
        }
    ]
)

Once you've added code that publishes metrics, you'll want to test it and make sure it's working correctly. Markus comes with a markus.testing.MetricsMock to make testing and asserting specific outcomes easier:

from markus.testing import MetricsMock


def test_something():
    with MetricsMock() as mm:
        # ... Do things that might publish metrics

        # Make assertions on metrics published
        mm.assert_incr_once("some.key", value=1)

Contributors

bradykieffer, dependabot[bot], jruere, jwhitlock, mythmon, peterbe, robhudson, willkg


Issues

support passing in no key prefix to `get_metrics`

get_metrics requires a "thing" from which it generates a key prefix for all keys produced with that MetricsInterface.

It'd be nice if get_metrics didn't require a thing; with no thing, there would be no key prefix.
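
A hypothetical sketch of the proposed API:

import markus

metrics = markus.get_metrics()  # no thing passed
metrics.incr("key1")            # published as "key1", with no prefix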

Implement tags in Statsd backend creating additional metrics

For a given metric A published with tags, would it make sense to create an additional metric for each tag/value pair?

For example, for calls:

incr('A')
incr('A', tags=dict(tag1=value1, tag2=value2))
incr('A', tags=dict(tag2=value2))

It would generate metrics:

  • A.count = 3
  • A.tagged.tag1.value1.count = 1
  • A.tagged.tag2.value2.count = 2
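
A hypothetical sketch of the expansion a statsd backend could perform under this proposal:

def expand_tagged_keys(stat, tags=None):
    # Yield the base key plus one ".tagged" key per tag/value pair.
    yield stat
    for tag, value in (tags or {}).items():
        yield f"{stat}.tagged.{tag}.{value}"

# list(expand_tagged_keys("A", {"tag1": "value1", "tag2": "value2"}))
# -> ["A", "A.tagged.tag1.value1", "A.tagged.tag2.value2"]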

clean up args

Go through and clean up the args to incr, gauge, timing, and histogram.

They should take a stat and a value.

Print markus records on assertion failure

When using MetricsMock's test helpers, I always want to see the records on test failures. This can be done manually, such as running pytest --pdb and inspecting, or changing the test to include metricsmock.print_records().

pytest has a type-based method for adding more data to assertions, but this may require some gymnastics to wrap results in a new type, solely for the purpose of better failed test output.

Another option is to manually create the failure message to include the records. This would lose pytest assertion output but would work for other test frameworks.
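
A sketch of that last option, with the helper's internals assumed from the traceback MetricsMock produces today:

def assert_incr_once(self, stat, value=1, tags=None):
    records = self.filter_records("incr", stat=stat, value=value, tags=tags)
    # Putting the captured records in the assertion message makes them
    # show up on failure in any test framework.
    assert len(records) == 1, (
        f"expected exactly one incr for {stat!r}; "
        f"captured records: {self.get_records()}"
    )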

way to define metrics published and autodocumenting

One of the things I want Markus to do is make it easier to document the metrics generated by some module.

What metrics does this module publish? What type are they (gauge, counter, and so on)? What do they mean?

It'd be great if we could automatically document metrics using a Sphinx extension and something like:

.. autometrics:: path.to.MetricsImplementation.instance

I think the best way to do this is to declare what metrics get published somewhere. Maybe do it when acquiring the MetricsInterface? Maybe something like this:

import markus


metrics = markus.get_metrics(
    'antenna.breakpad_resource',
    stats=[
        markus.Stat(key='widget', doc='Counts the number of times we make a widget.'),
        markus.Stat(key='widget_creation', doc='Timing for how long it takes to make a widget.'),
    ]
)

Then you point autometrics at that and it'd spit out something like this:

Metrics:

.. markus:stat:: antenna.breakpad_resource.widget

   Counts the number of times we make a widget.

.. markus:stat:: antenna.breakpad_resource.widget_creation

   Timing for how long it takes to make a widget.

Maybe we could also add a flag to markus.get_metrics that has it raise an error if something tries to publish a metric it doesn't know about?

Is the syntax for markus.get_metrics good? Are there better ways to specify things?

All of this would be optional--you could use it ad hoc, or in a declare-before-use mode, the latter allowing for auto-documentation.

support python 3.7

We should support Python 3.7 and make whatever changes that are necessary.

sphinx extension for documenting keys used

Markus should have a sphinx extension that makes it easier to document the keys in use, link to them in the docs, and spit out a listing of all the documented keys.

Maybe something like this:

.. markus:metric:: widget.count
   :type: count

   This key counts the number of widgets made.
.. markus:metricslist::

:markus:metric:`widget.count`

switch to src/ layout

We should switch to a src/ directory model.

We'd need to change tox to install markus before running tests.

We should also add a tox environment for running twine check on the sdist.

requires 'six' but doesn't include it as a dependency?

I added markus to my project, but when Circle tried to run my tests, it spat out:

Traceback (most recent call last):
  File "manage.py", line 21, in <module>
    main()
  File "manage.py", line 17, in main
    execute_from_command_line(sys.argv)
  File "/app/.local/lib/python3.7/site-packages/django/core/management/__init__.py", line 381, in execute_from_command_line
    utility.execute()
  File "/app/.local/lib/python3.7/site-packages/django/core/management/__init__.py", line 375, in execute
    self.fetch_command(subcommand).run_from_argv(self.argv)
  File "/app/.local/lib/python3.7/site-packages/django/core/management/commands/test.py", line 23, in run_from_argv
    super().run_from_argv(argv)
  File "/app/.local/lib/python3.7/site-packages/django/core/management/base.py", line 315, in run_from_argv
    parser = self.create_parser(argv[0], argv[1])
  File "/app/.local/lib/python3.7/site-packages/django/core/management/base.py", line 289, in create_parser
    self.add_arguments(parser)
  File "/app/.local/lib/python3.7/site-packages/django/core/management/commands/test.py", line 44, in add_arguments
    test_runner_class = get_runner(settings, self.test_runner)
  File "/app/.local/lib/python3.7/site-packages/django/test/utils.py", line 303, in get_runner
    test_runner_class = test_runner_class or settings.TEST_RUNNER
  File "/app/.local/lib/python3.7/site-packages/django/conf/__init__.py", line 79, in __getattr__
    self._setup(name)
  File "/app/.local/lib/python3.7/site-packages/django/conf/__init__.py", line 66, in _setup
    self._wrapped = Settings(settings_module)
  File "/app/.local/lib/python3.7/site-packages/django/conf/__init__.py", line 157, in __init__
    mod = importlib.import_module(self.SETTINGS_MODULE)
  File "/usr/local/lib/python3.7/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
  File "<frozen importlib._bootstrap>", line 983, in _find_and_load
  File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 728, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/app/privaterelay/settings.py", line 16, in <module>
    import markus
  File "/app/.local/lib/python3.7/site-packages/markus/__init__.py", line 5, in <module>
    from markus.main import configure, get_metrics  # noqa
  File "/app/.local/lib/python3.7/site-packages/markus/main.py", line 12, in <module>
    import six
ModuleNotFoundError: No module named 'six'
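
One minimal fix, assuming six stays a runtime import rather than being removed outright, would be to declare it in setup.py so pip installs it alongside markus:

from setuptools import setup

setup(
    name="markus",
    install_requires=["six"],
    # ... rest of the existing setup() arguments ...
)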

add tests for backends

We currently have no tests for markus.configure or the backends. We should probably add at least some basic ones.

  • test markus.configure with zero, one, and multiple backends does the right thing
  • test markus.configure with backend options passes the options in
  • test markus.configure with backend filters passes the filters in
  • test that filters work on the backends

Probably some other things, too.
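
A minimal sketch of one such test using pytest's caplog fixture; the log capture approach and line contents here are assumptions about LoggingMetrics internals:

import logging

import markus


def test_logging_backend_receives_incr(caplog):
    markus.configure(
        backends=[{"class": "markus.backends.logging.LoggingMetrics"}]
    )
    metrics = markus.get_metrics("thing")
    with caplog.at_level(logging.INFO):
        metrics.incr("key1")
    # Assumes LoggingMetrics writes the full stat key into the message.
    assert any("thing.key1" in rec.getMessage() for rec in caplog.records)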

drop support for Python 3.6

Support for Python 3.6 ends in December 2021. However, there are some irritants with supporting 3.6, so it'll be easier maintenance-wise to drop support now.

give logging backend more structure

I think there are two use cases for the logging backend:

  1. An ops person or developer is watching the logs sail by and keeping an eye on certain metrics. This is good for local development.
  2. Someone writes a log parser that pulls out metrics and keeps track of them for reporting.

The latter would be a lot easier if the logged lines were better structured and easier to tokenize.

This issue covers fixing that. Maybe the best idea is to mimic the format we use in the cloudwatch backend?
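
For illustration, a line format like this would be easy to tokenize (the separator and field order are hypothetical, not the current output):

line = "MARKUS METRICS|2021-06-01T12:00:00|incr|app.widget|1|env:prod"
source, timestamp, stat_type, key, value, tags = line.split("|")
assert stat_type == "incr" and key == "app.widget"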

support python 2.7

I want to use Markus in Socorro, but I need it to support Python 2.7 first.

One of the things that's hard is that Markus uses the Python 3 statistics module. There's a backport (https://pypi.python.org/pypi/statistics) for Python 2.7, so I think we can use that. Otherwise, I think this should be straightforward.

rework tags

The Datadog and logging backends support tags, but the API for doing tags is a little weird and there's nothing that sanitizes tag keys and values.

This issue covers rethinking that a bit.

add usage docs

Need to add usage docs. All we have currently is the README.

add support for disabling container id in datadog library

If you use Markus and the Datadog Python library 0.45.0 in a k8s environment, the Datadog library will add a container id to metrics. Telegraf doesn't support the container id.

influxdata/telegraf#12991

We should add something to Markus to disable the container id from being added by default.

@jwhitlock suggested adding origin_detection_enabled and defaulting it to False in the DatadogMetrics backend and passing that through to the DogStatsD client.
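
A sketch of the suggested wiring inside the backend, assuming the keyword the recent datadog client exposes for origin detection:

from datadog.dogstatsd import DogStatsd

client = DogStatsd(
    host="example.com",
    port=8125,
    namespace="",
    # Defaulting the proposed markus option to False stops the client
    # from appending a container id to metrics.
    origin_detection_enabled=False,
)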

fix defining filters at configure time

The docs say you can do this:

markus.configure(
    backends=[
        {
            "class": "markus.backends.datadog.DatadogMetrics",
            "options": {
                "statsd_host": "example.com",
                "statsd_port": 8125,
                "statsd_namespace": ""
            },
            "filters": [HostFilter(HOSTID)]
        }
    ]
)

Except that doesn't work because the filters aren't passed as an argument when instantiating the backend class.

This issue covers fixing that.

While doing that, we should also fix the docs for writing filters to have a link to the MetricsRecord class.
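
For reference, a hypothetical HostFilter like the one above, assuming the documented filter protocol of a .filter(record) method that returns the (possibly modified) record or None to drop it:

class HostFilter:
    def __init__(self, hostid):
        self.hostid = hostid

    def filter(self, record):
        # Tag every metric with the host it was published from.
        record.tags.append(f"host:{self.hostid}")
        return record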

add "clear" to MetricsMock

If you want to use MetricsMock with pytest and fixtures, you might make a fixture like this:

import pytest

from markus.testing import MetricsMock


@pytest.fixture
def metricsmock():
    with MetricsMock() as mm:
        yield mm


def test_something(metricsmock):
    # do things that create metrics
    assert metricsmock.has_record('incr', stat='something', value=1)

The thing MetricsMock is missing is the ability to clear the records.

This issue covers implementing that.
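
Hypothetical usage once implemented, reusing the fixture above:

def test_phases(metricsmock):
    # ... phase one publishes metrics and gets asserted ...
    # Proposed method: wipe the captured records between phases.
    metricsmock.clear_records()
    # ... phase two starts from a clean slate ...
    assert metricsmock.get_records() == []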

rewrite internals to emit records

Currently, we carry the whole incr, gauge, timing, and histogram thing from the metrics interface all the way through to the backends. Doing that makes it hard to think about metrics generation as a pipeline with intermediary steps between the interface and the backends.

This issue covers reworking that so that the metrics interface creates a record and then calls .emit(record) on the backends.

This does a couple of things:

  1. simplifies the backends--they only need to implement one method
  2. makes it possible to expand the pipeline with filters in the middle that can change the metrics as they go by--this preps us for issue #40
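
A minimal sketch of the proposed shape, with hypothetical names:

from dataclasses import dataclass, field


@dataclass
class MetricsRecord:
    stat_type: str  # "incr", "gauge", "timing", or "histogram"
    key: str
    value: float
    tags: list = field(default_factory=list)


def publish(record, filters, backends):
    # Filters sit in the middle of the pipeline and may modify or
    # drop records on their way to the backends.
    for f in filters:
        record = f.filter(record)
        if record is None:
            return
    for backend in backends:
        # Backends only need to implement emit().
        backend.emit(record)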

drop support for python 2.7

Originally, Markus only supported Python 3 but then I needed to use it in Socorro which used Python 2.7, so I added support for that. Socorro is now Python 3, so we can drop support for Python 2.7 again.

Enable pytest asserts for MetricsMock helpers

When using MetricsMock for testing (mozilla/ichnaea#1164 (comment)), methods like assert_incr_once use plain assertions rather than pytest's rewritten assertions, so failures look like:

Traceback (most recent call last):
  File "/app/ichnaea/content/tests/test_views.py", line 112, in test_content
    metricsmock.assert_incr_once('request', tags=[f"path:{metric_path}", "method:get", f"status:{status}"])
  File "/usr/local/lib/python3.8/site-packages/markus/testing.py", line 106, in assert_incr_once
    assert len(self.filter_records(INCR, stat=stat, value=value, tags=tags)) == 1
AssertionError

I enabled assertion rewriting in my project's conftest.py file:

pytest.register_assert_rewrite("markus.testing")
from markus.testing import MetricsMock  # noqa: E402

A similar method could be used in markus/__init__.py, according to the pytest assertion rewriting docs. Something like:

try:
    import pytest
except ImportError:
    pass
else:
    pytest.register_assert_rewrite("markus.testing")

calling configure in MetricsMock context messes up MetricsMock

If you're testing with MetricsMock and, inside the context, the code under test calls markus.configure(), the configure call stomps on the MetricsMock backend and you don't capture any records.

For example, this doesn't work:

import markus
from markus.testing import MetricsMock

def test_something():
    with MetricsMock() as mm:
        markus.configure([{'class': 'markus.backends.logging.LoggingMetrics'}])

        metrics = markus.get_metrics('thing')
        metrics.incr('key1')

        assert mm.has_record('incr', stat='thing.key1', value=1)

add support for Python 3.9

Python 3.9 has been released. We should add support for it. This might be as simple as adding it to the test suite.

add support for Python 3.10

Python 3.10 is out. We should add support for it.

Pretty sure that there's no actual work here and it's just updating the metadata.

add pytest fixture

Everyone I've talked to who uses Markus is also using pytest and has to add something like this to their conftest.py:

import pytest

from markus.testing import MetricsMock


@pytest.fixture
def metricsmock():
    with MetricsMock() as mm:
        yield mm

That seems silly. We should add it to Markus proper and document usage of the fixture.

need way to globally modify metrics

We've started to use Markus on Normandy. Thanks for making it easy to integrate!

I think a good addition to Markus would be a middleware system, that could process the data before it is sent to the backend. This would help us solve a problem we're having on Normandy.

One of the requests we had from Ops was the ability to have global tags that could be controlled via environment variables. Specifically, we wanted to add a tag to canary instances so we could compare their stats against the general population.

Instead of updating every single metrics call location with this functionality, I thought to add it as a part of the config. The only configurable place I could find to do this was the backends. Making custom backends to do this either ended up being very repetitive, or complicated and hard to test. It also means that any backend we added would have to have the same modifications, which is hard to enforce.
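
For example, the canary use case could be handled by a small piece of middleware along these lines (all names hypothetical):

import os


class GlobalTagsMiddleware:
    def __init__(self, tags):
        self.tags = tags

    def process(self, record):
        # Applied to every record before it reaches any backend.
        record.tags.extend(self.tags)
        return record


# e.g. CANARY=1 in the environment tags every metric from a canary node
middleware = GlobalTagsMiddleware(
    tags=["canary"] if os.environ.get("CANARY") else []
)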

What do you think? I could work on a PR to add this feature if you think it's a good feature for Markus to have.

document backend support for features

Not all features are supported across all backends. We should denote which features are supported in which backends.

For example, the statsd backend doesn't support tags, and .histogram() calls are sent as timings.

support python 3.8

Need to add support for Python 3.8. For now, that's adding tests in CI.
