reddit / baseplate
reddit's python service framework
Home Page: https://baseplate.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
It's hard to take advantage of more complex structuring with them; for example, it's impossible to use nested keys:
thrift.service_a.endpoint = ...
thrift.service_b.endpoint = ...
This wouldn't work with thrift_pool_from_config, since it can't handle multi-level prefixes.
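To illustrate what multi-level prefix support would involve, here's a sketch that folds dotted keys into nested dicts. This is pure illustration: parse_nested is a hypothetical helper, not part of baseplate's config API.

```python
def parse_nested(app_config, prefix):
    """Collect keys under a dotted prefix into nested dicts, e.g.
    "thrift.service_a.endpoint" -> {"service_a": {"endpoint": ...}}.
    (Hypothetical helper for illustration only.)
    """
    result = {}
    for key, value in app_config.items():
        if not key.startswith(prefix):
            continue
        node = result
        parts = key[len(prefix):].split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return result

app_config = {
    "thrift.service_a.endpoint": "localhost:9090",
    "thrift.service_b.endpoint": "localhost:9091",
}
nested = parse_nested(app_config, "thrift.")
```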
This should follow the same model as event publishers where the application workers write to POSIX message queues and a sidecar daemon consumes these messages and publishes batches.
This has a few advantages:
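A minimal sketch of that worker-to-sidecar flow, using a plain queue.Queue as a stand-in for the POSIX message queue the real implementation would use:

```python
import json
import queue

# Stand-in for the POSIX message queue shared between workers and the sidecar.
mq = queue.Queue()

def worker_write(mq, payload):
    # Application workers just enqueue and move on; no network IO in-request.
    mq.put(json.dumps(payload).encode("utf-8"))

def sidecar_drain(mq, max_batch_size):
    # The sidecar daemon drains messages and publishes them in batches.
    batch = []
    while len(batch) < max_batch_size:
        try:
            batch.append(mq.get_nowait())
        except queue.Empty:
            break
    return batch

for i in range(5):
    worker_write(mq, {"metric": "requests", "count": 1, "seq": i})
batch = sidecar_drain(mq, max_batch_size=3)
```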
Could be based on number of requests or a time limit. Should probably be fuzzed to stagger the restarts around the cluster.
The goal is to refresh ourselves every now and then to combat memory leaks/fragmentation etc. and also to get new DNS if necessary.
This probably means telling Einhorn that we want to be replaced, but we probably also want to be able to just turn ourselves off and let a process manager figure it out for itself.
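A sketch of how the fuzzed restart threshold might be picked; the function name and jitter amount are illustrative:

```python
import random

def fuzzed_restart_threshold(base_requests, jitter=0.2):
    # Each worker picks a slightly different request budget so that
    # restarts stagger across the cluster instead of happening at once.
    low = int(base_requests * (1 - jitter))
    high = int(base_requests * (1 + jitter))
    return random.randint(low, high)

threshold = fuzzed_restart_threshold(10000)
```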
From my understanding, we currently only authenticate to vault using the EC2 instance identity document. In the future, when we run applications on kubernetes, this will not be sufficient, as many arbitrary applications may run on the same instance in different pods.
The kubernetes auth backend works by providing a role and a JWT, mounted within kubernetes pods at /run/secrets/kubernetes.io/serviceaccount/token, which is used to authenticate as a service account. Therefore, we just need to read the contents of this file and authenticate at a designated mount point for the kubernetes auth backend in order to get this working.
Here is some sample code using the hvac library:
import hvac

TOKEN_FILE = '/run/secrets/kubernetes.io/serviceaccount/token'
VAULT_URL = 'xxxx'
AUTH_ENDPOINT = '/v1/auth/kubernetes/login'

# Read the service account JWT mounted into the pod.
with open(TOKEN_FILE, 'r') as f:
    token = f.read()

role = 'xxxx'
params = {
    'jwt': token,
    'role': role,
}

# Log in against the kubernetes auth backend's mount point.
client = hvac.Client(url=VAULT_URL)
client.auth(AUTH_ENDPOINT, json=params)
I'm still getting familiar with the code, but it seems we may want to build this functionality into VaultClientFactory rather than use hvac for more flexibility.
Right now, it's a free-for-all. It'd be nice to figure out some common patterns that we're going to be using (especially re: Zipkin) and then figure out how to make sugar for them which 1) reduces boilerplate, and 2) improves consistency. It'd also be good to figure out how some annotations can map to the metrics observer (e.g. count of failed RPC attempts in thrift_pool). This will probably require more things to use the API before we can get a good feel for patterns.
It'd be more consistent for the event queues to show up on the context object.
It'd be nice to have the service IDL include annotations on which methods are safe to retry and which are not so that service consumers can interact with us safely. Once that's in place, it becomes obvious that we can just do retries semi-automatically (i.e. with some specification of retry parameters by the client).
We have pshell for pyramid services; it'd be really nice to have an equivalent for quick messing around with thrift services.
To ensure we have standard processing of reddit IDs (which look suspiciously like r2 fullnames).
Since we're not using a paste entry point, it's difficult to get pshell to work with baseplate services. It'd be useful to have this for local dev.
DictOf does what we want without Optional, but the combination confusingly always returns the default. This may just be something that we detect and reject with a "don't do that", since it's unnecessary.
from baseplate import config

CONFIG = {
    "foo": config.Optional(
        config.DictOf(config.TupleOf(config.String)),
        default={},
    ),
}

app_config = {
    "foo.bar": "a, b, c, d",
    "foo.baz": "e, f",
}

cfg = config.parse_config(app_config, CONFIG)
print(cfg.foo.bar)  # prints {}
Maybe the server should dump a traceback on SIGUSR1? This can be helpful in debugging.
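One possible implementation using only the stdlib's faulthandler module; this is a sketch, not the actual server code:

```python
import os
import signal
import sys
import faulthandler

# Dump tracebacks for all threads to stderr whenever SIGUSR1 arrives.
# faulthandler keeps the handler async-signal-safe, unlike a Python-level
# handler that formats the traceback itself.
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)

# Trigger it from a shell with: kill -USR1 <pid>
os.kill(os.getpid(), signal.SIGUSR1)
```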
B3 propagation has become the standard when using HTTP. Baseplate currently supports different headers for propagating trace data (Thrift ref, Pyramid/WSGI ref).
Supporting both the old version of headers and B3 shouldn't be too difficult.
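A sketch of what dual-scheme extraction could look like. The B3 header names are the standard ones; the legacy X-Trace/X-Span/X-Parent names are assumptions for illustration, not confirmed from baseplate's source.

```python
def extract_trace_headers(headers):
    """Prefer B3 headers; fall back to an older scheme.
    (Legacy header names here are illustrative assumptions.)
    """
    if "X-B3-TraceId" in headers:
        return {
            "trace_id": headers["X-B3-TraceId"],
            "span_id": headers["X-B3-SpanId"],
            "parent_id": headers.get("X-B3-ParentSpanId"),
            "sampled": headers.get("X-B3-Sampled"),
        }
    return {
        "trace_id": headers.get("X-Trace"),
        "span_id": headers.get("X-Span"),
        "parent_id": headers.get("X-Parent"),
        "sampled": headers.get("X-Sampled"),
    }

ctx = extract_trace_headers({
    "X-B3-TraceId": "463ac35c9f6413ad",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
})
```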
We should detect and at the very least log when greenlets block for too long. This is probably indicative of a blocking API being called somewhere by accident (despite monkeypatching, stuff like flock is still unpatched and still blocks).
This might be relevant: http://www.rfk.id.au/blog/entry/detect-gevent-blocking-with-greenlet-settrace/
Using pip 10.0.1 and python 3.6, pip install baseplate returns:
running bdist_wheel
running build
running build_py
SPECIAL baseplate/thrift/baseplate.thrift build/thrift/baseplate/thrift/baseplate.thrift_buildstamp
mkdir -p build/thrift/baseplate/thrift/baseplate.thrift
thrift1 -strict -gen py:utf8strings,slots,new_style -out build/thrift/baseplate/thrift/baseplate.thrift baseplate/thrift/baseplate.thrift
cp -r build/thrift/baseplate/thrift/baseplate.thrift/baseplate/thrift/ baseplate/
touch build/thrift/baseplate/thrift/baseplate.thrift_buildstamp
make: *** No rule to make target `tests/integration/test.thrift', needed by `build/thrift/tests/integration/test.thrift_buildstamp'. Stop.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/5k/h40qktzn2dl9z13v0cmf33hh0000gn/T/pip-install-o4ua71_c/baseplate/setup.py", line 125, in <module>
'build_py': BuildPyCommand,
File "/Users/mohamed/.virtualenvs/local/lib/python3.6/site-packages/setuptools/__init__.py", line 129, in setup
return distutils.core.setup(**attrs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/Users/mohamed/.virtualenvs/local/lib/python3.6/site-packages/wheel/bdist_wheel.py", line 179, in run
self.run_command('build')
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/private/var/folders/5k/h40qktzn2dl9z13v0cmf33hh0000gn/T/pip-install-o4ua71_c/baseplate/setup.py", line 67, in run
self._make_thrift()
File "/private/var/folders/5k/h40qktzn2dl9z13v0cmf33hh0000gn/T/pip-install-o4ua71_c/baseplate/setup.py", line 64, in _make_thrift
subprocess.check_call(make_cmd)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 291, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/make', 'thrift']' returned non-zero exit status 2.
----------------------------------------
Failed building wheel for baseplate
It seems the tests directory is missing from the tarball uploaded to PyPI.
The Cassandra driver is asynchronous under the hood. It runs IO in a reactor on another thread (or in the native reactor if already async, e.g. gevent). It provides an API that returns Futures. You can either attach callbacks to the future to get notified of completion/failure, or call result() on the future to block the current thread until the result comes in. Callbacks are called on the IO thread, or in the registering thread if the future had already completed at time of registration. The result() API works by blocking on a threading.Event object. When the operation finishes inside the IO thread, the future signals the Event and then calls each of the callbacks in turn, in that order.
Baseplate's Cassandra instrumentation intercepts Futures and attaches callbacks to them before returning them to the application. This is how we mark spans finished. The execute_async() API works this way by default, and the execute() API is just execute_async() with an immediate call to result() on the returned future.
Because the IO thread signals the event, which unleashes the main thread's result(), before running the callbacks, there's a race condition where the main thread finishes up its work and closes the server span before the callbacks, and therefore the child span, get to finish.
The impact of this is that a child span might sometimes finish after its parent span. For most observers this should be fine. The metrics observer will lose that child span's metric, though, because the batch for the server span will have already shipped. This means undercounting Cassandra calls and timings.
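The ordering can be reproduced with just the stdlib, using a sleep to exaggerate the window between signaling the Event and running the callbacks:

```python
import threading
import time

order = []
done = threading.Event()

def io_thread():
    # The driver signals the Event first...
    done.set()
    # ...and only then runs the attached callbacks (the span-finishing
    # callback baseplate registers would run here).
    time.sleep(0.05)  # exaggerate the window to make the race deterministic
    order.append("callback: finish child span")

t = threading.Thread(target=io_thread)
t.start()

done.wait()  # this is what result() blocks on
order.append("main: finish server span")
t.join()
```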
https://github.com/reddit/baseplate/blob/master/baseplate/config.py#L307
If the value parsed from the config is falsy (like False or an empty dict), this check fails and the default will be returned.
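A simplified stand-in for that check, showing why comparing against None instead would fix it; these helpers are illustrative, not the actual config.py code:

```python
DEFAULT = {"default": True}

def parse_optional_buggy(value, default=DEFAULT):
    # Mirrors the truthiness check: an empty-but-valid parsed value
    # (like {}) is indistinguishable from "not set at all".
    if value:
        return value
    return default

def parse_optional_fixed(value, default=DEFAULT):
    # Comparing against None keeps legitimately-falsy parsed values.
    if value is not None:
        return value
    return default
```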
Things like sample rates are really common in configs, it'd be nice to make them more human readable:
[app:main]
rate = 37%
>>> cfg = config.parse_config(raw_config, {
...     "rate": config.Percent,
... })
>>> print(cfg.rate)
0.37
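A sketch of what such a parser could look like; this is hypothetical, not the actual config.Percent implementation:

```python
def percent(text):
    # Accept a human-readable percentage like "37%" and return 0.37.
    text = text.strip()
    if not text.endswith("%"):
        raise ValueError("expected a percentage like '37%%', got %r" % text)
    return float(text[:-1]) / 100.0
```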
baseplate-{script,serve} already hide gevent monkeypatching, so it'd be nice to do the same for psycogreen monkeypatching if sqlalchemy+PG is in use.
baseplate-healthcheck uses urllib.quote when performing a wsgi healthcheck on an Endpoint whose family is socket.AF_UNIX. urllib.quote was moved to urllib.parse in py3. We should probably add it to _compat.py and use that.
Traceback (most recent call last):
File "/usr/local/bin/baseplate-healthcheck3", line 9, in <module>
load_entry_point('baseplate', 'console_scripts', 'baseplate-healthcheck3')()
File "/usr/lib/python3/dist-packages/baseplate/server/healthcheck.py", line 68, in run_healthchecks
checker(args.endpoint)
File "/usr/lib/python3/dist-packages/baseplate/server/healthcheck.py", line 35, in check_http_service
quoted_path = urllib.quote(endpoint.address, safe="")
AttributeError: 'module' object has no attribute 'quote'
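A minimal sketch of the _compat.py shim this would amount to:

```python
# Import quote from wherever the running Python keeps it.
try:
    # Python 3
    from urllib.parse import quote
except ImportError:
    # Python 2
    from urllib import quote

# Same usage as healthcheck.py: percent-encode a unix socket path.
quoted_path = quote("/var/run/app.sock", safe="")
```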
e.g. consider a baseplate-based script launched via python foo/bar.py
In the following code, directory would look like foo. Because this is not the root directory of the git repo, the exception handling branch is taken. We attempt to move up one directory by calling os.path.dirname on the string "foo", and directory is then updated to be an empty string. From there, we continue in an infinite loop, with directory always set to an empty string and the exception branch always hit.
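A quick demonstration of the failure mode, plus one way a fixed walk could terminate. find_upward is illustrative, not the actual code:

```python
import os.path

# os.path.dirname on a bare relative path goes to "" and stays there,
# so a loop that walks upward "until it finds the repo root" never ends:
assert os.path.dirname("foo") == ""
assert os.path.dirname("") == ""

def find_upward(start, is_root):
    # Normalizing to an absolute path first gives the walk a real
    # termination point at the filesystem root.
    directory = os.path.abspath(start)
    while not is_root(directory):
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the root without a match
            return None
        directory = parent
    return directory
```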
It's currently using a ServerSpan-level thread-local internally to pass state around, which doesn't work with local spans. This means that the parentage of client spans made ostensibly inside local spans incorrectly points at the server span instead.
It'd be nice to get some data on how application connection pools are doing. This should be doable for things like thrift_pool that we control completely and sqlalchemy that give us introspection events.
Running baseplate actions against a vanilla virtualenv with just the PyPI install of baseplate breaks with a hard dependency on thrift (which is an optional requirement). Upon installing thrift, the error changes, but appears to be due to a dependency on a different version of the thrift library.
Stacktrace from within a virtualenv:
$ python -m baseplate
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 163, in _run_module_as_main
mod_name, _Error)
File "/usr/lib64/python2.7/runpy.py", line 111, in _get_module_details
__import__(mod_name) # Do not catch exceptions initializing package
File "<$PWD>/.venv/lib/python2.7/site-packages/baseplate/__init__.py", line 10, in <module>
from .core import Baseplate
File "<$PWD>/.venv/lib/python2.7/site-packages/baseplate/core.py", line 12, in <module>
from thrift.util import Serializer
ImportError: No module named util
installed packages:
$ pip freeze
baseplate==0.23.1
certifi==2017.11.5
chardet==3.0.4
enum34==1.1.6
idna==2.6
posix-ipc==1.0.0
PyJWT==1.5.3
requests==2.18.4
six==1.11.0
thrift==0.10.0
urllib3==1.22
EPIPE seems pretty safe, likewise zero-bytes-written. These should be cases where the RPC wasn't even sent, so retrying is always safe. Let's save the application from having to do that.
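A sketch of the retry policy this suggests; call_with_retry is a hypothetical wrapper, not baseplate API:

```python
import errno

def call_with_retry(rpc, max_attempts=2):
    # Retry only failures where the request provably never went out:
    # EPIPE (like a zero-byte write) means the connection died before send,
    # so retrying is always safe regardless of the RPC's idempotency.
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc()
        except OSError as exc:
            if exc.errno != errno.EPIPE or attempt == max_attempts:
                raise
```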
They appear in /usr/lib/python2.7/dist-packages/tests, which is bad. They shouldn't be installed at all, ideally, since they're not needed once the build's done.
The python-driver for cassandra can be very noisy in the logs. Lots of messages like
Jul 3 10:29:05 host [service] 23701:MainThread:cassandra.policies:INFO:Using datacenter 'ue1' for DCAwareRoundRobinPolicy (via host '10.0.0.0'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes
This comes from the on_up method, as found here: https://datastax.github.io/python-driver/_modules/cassandra/policies.html
I believe we could reduce this by optionally passing in load_balancing_policy here: https://github.com/reddit/baseplate/blob/master/baseplate/context/cassandra.py#L14
load_balancing_policy=DCAwareRoundRobinPolicy(local_dc=local_dc)
This check means we never load the file if it doesn't exist / we can't access it. This leads to an unhelpful error when fetching a secret later on:
File "/usr/lib/python2.7/dist-packages/baseplate/secrets/store.py", line 224, in get_versioned
secret_attributes = self.get_raw(path)
File "/usr/lib/python2.7/dist-packages/baseplate/secrets/store.py", line 169, in get_raw
return self._secrets[path]
TypeError: 'NoneType' object has no attribute '__getitem__'
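A sketch of a loader that fails loudly up front instead; this is illustrative, not the actual secrets store code:

```python
import json
import os

def load_secrets(path):
    # Fail at load time with an actionable message, instead of leaving
    # the store empty and blowing up later with a confusing NoneType
    # error deep inside get_raw().
    if not os.access(path, os.R_OK):
        raise RuntimeError(
            "secrets file %r does not exist or is not readable; "
            "is the secrets fetcher running?" % path
        )
    with open(path) as f:
        return json.load(f)
```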
Things like context.trace should be easier to deal with in tests.
These don't get overwritten by later updates which makes them a pain to deal with. Where are they coming from!?
It should probably require some text to be present, and the user can use Optional if they want it to really be optional.
This should watch for changes in any related source or config files and restart itself. Useful for quick development.
It currently only works with URL dispatch.
r2 activity service stuff is doing a lot of g.stats.simple_event("activity_service.write.fail") in a try/except. This should be automatic as part of the child span stuff.
This is useful for local testing.
BaseplateProcessorEventHandler.handlerError uses logger.exception, which throws an error whenever you raise a Thrift exception in Python 3, since the generated Thrift exceptions are unhashable.
This means that we cannot raise Thrift exceptions within a Python 3 Thrift service without monkey patching TException to include the methods that we need.
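The problem and the monkeypatch can be shown with a stand-in class, since defining __eq__ without __hash__ is what makes a class unhashable in Python 3. GeneratedError is a stand-in, not real generated code:

```python
# Stand-in for a generated Thrift exception: defining __eq__ without
# __hash__ sets __hash__ to None, making instances unhashable in Python 3.
class GeneratedError(Exception):
    def __eq__(self, other):
        return isinstance(other, GeneratedError)

err = GeneratedError()
try:
    hash(err)
    hashable_before = True
except TypeError:
    hashable_before = False

# The monkeypatch the issue describes amounts to restoring identity hashing:
GeneratedError.__hash__ = object.__hash__
```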
From @pacejackson:
crypto.make_signature returns a "bytes" object, but crypto.verify_signature says that signature should be a str. In practice, either a string or a bytes object for signature works, because the base64 decode methods started accepting both bytes and strings in python 3.3.
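A quick demonstration of that base64 behavior:

```python
import base64

# base64.b64decode has accepted both str and bytes since Python 3.3,
# which is why a str signature happens to work in verify_signature.
from_str = base64.b64decode("aGVsbG8=")
from_bytes = base64.b64decode(b"aGVsbG8=")
```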
Baseplate uses a few third-party libraries, such as the python cassandra driver. Their log lines are rather chatty, making backend logs very noisy.
It would be good if baseplate provided an idiomatic way for services to tweak the log levels, since baseplate is aware of all the libraries it depends on.
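A sketch of what such a knob could look like; the logger names listed are illustrative guesses at the chatty ones, not a definitive list:

```python
import logging

# Raise the level on third-party loggers baseplate knows are noisy.
CHATTY_LOGGERS = ["cassandra", "cassandra.policies", "urllib3"]

def quiet_chatty_libraries(level=logging.WARNING):
    for name in CHATTY_LOGGERS:
        logging.getLogger(name).setLevel(level)

quiet_chatty_libraries()
```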
If we increment the same counter a number of times in the same request (e.g. success counters for a client call) we'll currently append a bunch of +1 lines to the metrics batch. It'd be much better to turn that into a single +N line to save space in metrics packets.
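A sketch of the aggregation; the serialized form uses the standard statsd counter format, and the function name is illustrative:

```python
from collections import Counter

def aggregate_counters(increments):
    # Collapse repeated +1s for the same counter into one +N statsd line.
    totals = Counter()
    for name, delta in increments:
        totals[name] += delta
    return ["%s:%d|c" % (name, total) for name, total in sorted(totals.items())]

lines = aggregate_counters([
    ("clients.activity.success", 1),
    ("clients.activity.success", 1),
    ("clients.activity.success", 1),
    ("server.requests", 1),
])
```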
Though the spec says "For compatibility all values should be integers in the range (-2^53^, 2^53^)." rounding the metrics to an integer is making for really really ugly graphs when things are quick. It looks like everything we're using is float-safe, so let's just let the raw numbers through.
I needed to install the following to get the docs to build locally:
I have traditionally followed the docs-requirements.txt / dev-requirements.txt pattern for stuff like this, but didn't want to presume this is where you were heading with a premature PR.
I can toss a PR over the fence if you let me know how you'd like to handle it (or not handle it).
Looking good so far!
This'll require choosing a max line length (we're generally OK up to 100 I think) and dealing with the three violators of that rule (mostly long URLs in comments).
There's no good reason for it to default to unlimited and it's a footgun. Require applications to set something since we can't really think of a good default since it depends on how CPU-bound the application is.
We should consider adding a new secretSource: vault (or some similar config) for the publisher that allows you to pull the key name and secret from vault.
I believe we're currently pulling them from config in these lines: https://github.com/reddit/baseplate/blob/master/baseplate/events/publisher.py#L157-L159
Because of the TException constructor, message is None. This makes for rather annoying log messages.