reddit / baseplate
reddit's python service framework
Home Page: https://baseplate.readthedocs.io
License: BSD 3-Clause "New" or "Revised" License
It's hard to take advantage of more complex structuring with them; for example, it's impossible to use nested keys:
thrift.service_a.endpoint = ...
thrift.service_b.endpoint = ...
This wouldn't work with thrift_pool_from_config, since it can't handle multi-level prefixes.
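To illustrate what multi-level prefix support would involve, here's a sketch that folds dotted keys into nested dicts. This is pure illustration: parse_nested is a hypothetical helper, not part of baseplate's config API.

```python
def parse_nested(app_config, prefix):
    """Collect keys under a dotted prefix into nested dicts, e.g.
    "thrift.service_a.endpoint" -> {"service_a": {"endpoint": ...}}.
    (Hypothetical helper for illustration only.)
    """
    result = {}
    for key, value in app_config.items():
        if not key.startswith(prefix):
            continue
        node = result
        parts = key[len(prefix):].split(".")
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return result

app_config = {
    "thrift.service_a.endpoint": "localhost:9090",
    "thrift.service_b.endpoint": "localhost:9091",
}
nested = parse_nested(app_config, "thrift.")
```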
This should follow the same model as event publishers where the application workers write to POSIX message queues and a sidecar daemon consumes these messages and publishes batches.
This has a few advantages:
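A minimal sketch of that worker-to-sidecar flow, using a plain queue.Queue as a stand-in for the POSIX message queue the real implementation would use:

```python
import json
import queue

# Stand-in for the POSIX message queue shared between workers and the sidecar.
mq = queue.Queue()

def worker_write(mq, payload):
    # Application workers just enqueue and move on; no network IO in-request.
    mq.put(json.dumps(payload).encode("utf-8"))

def sidecar_drain(mq, max_batch_size):
    # The sidecar daemon drains messages and publishes them in batches.
    batch = []
    while len(batch) < max_batch_size:
        try:
            batch.append(mq.get_nowait())
        except queue.Empty:
            break
    return batch

for i in range(5):
    worker_write(mq, {"metric": "requests", "count": 1, "seq": i})
batch = sidecar_drain(mq, max_batch_size=3)
```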
Could be based on number of requests or a time limit. Should probably be fuzzed to stagger the restarts around the cluster.
The goal is to refresh ourselves every now and then to combat memory leaks/fragmentation etc. and also to get new DNS if necessary.
This probably means telling Einhorn that we want to be replaced, but we probably also want to be able to just turn ourselves off and let a process manager figure it out for itself.
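A sketch of how the fuzzed restart threshold might be picked; the function name and jitter amount are illustrative:

```python
import random

def fuzzed_restart_threshold(base_requests, jitter=0.2):
    # Each worker picks a slightly different request budget so that
    # restarts stagger across the cluster instead of happening at once.
    low = int(base_requests * (1 - jitter))
    high = int(base_requests * (1 + jitter))
    return random.randint(low, high)

threshold = fuzzed_restart_threshold(10000)
```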
From my understanding, we currently only authenticate to vault using the EC2 instance identity document. In the future, when we run applications on kubernetes, this will not be sufficient, as many arbitrary applications may run on the same instance in different pods.
The kubernetes auth backend works by providing a role and a JWT, mounted within kubernetes pods at /run/secrets/kubernetes.io/serviceaccount/token, which is used to authenticate as a service account. Therefore, we just need to read the contents of this file and authenticate at a designated mount point for the kubernetes auth backend in order to get this working.
Here is some sample code using the hvac library:
import hvac

TOKEN_FILE = '/run/secrets/kubernetes.io/serviceaccount/token'
VAULT_URL = 'xxxx'
AUTH_ENDPOINT = '/v1/auth/kubernetes/login'

# Read the service account JWT mounted into the pod.
with open(TOKEN_FILE, 'r') as f:
    token = f.read()

role = 'xxxx'
params = {
    'jwt': token,
    'role': role,
}

# Log in against the kubernetes auth backend's mount point.
client = hvac.Client(url=VAULT_URL)
client.auth(AUTH_ENDPOINT, json=params)
I'm still getting familiar with the code, but it seems we may want to build this functionality into VaultClientFactory rather than use hvac for more flexibility.
Right now, it's a free-for-all. It'd be nice to figure out some common patterns that we're going to be using (especially re: Zipkin) and then figure out how to make sugar for them which 1) reduces boilerplate, and 2) improves consistency. It'd also be good to figure out how some annotations can map to the metrics observer (e.g. count of failed RPC attempts in thrift_pool). This will probably require more things to use the API before we can get a good feel for patterns.
It'd be more consistent for the event queues to show up on the context object.
It'd be nice to have the service IDL include annotations on which methods are safe to retry and which are not so that service consumers can interact with us safely. Once that's in place, it becomes obvious that we can just do retries semi-automatically (i.e. with some specification of retry parameters by the client).
We have pshell for pyramid services; it'd be really nice to have an equivalent for quick messing around with thrift services.
To ensure we have standard processing of reddit IDs (which look suspiciously like r2 fullnames).
Since we're not using a paste entry point, it's difficult to get pshell to work with baseplate services. It'd be useful to have this for local dev.
DictOf does what we want without Optional, but the combination confusingly always returns the default. This may just be something that we detect and reject with a "don't do that", since it's unnecessary.
from baseplate import config

CONFIG = {
    "foo": config.Optional(
        config.DictOf(config.TupleOf(config.String)),
        default={},
    ),
}

app_config = {
    "foo.bar": "a, b, c, d",
    "foo.baz": "e, f",
}

cfg = config.parse_config(app_config, CONFIG)
print(cfg.foo.bar)  # prints {}
Maybe the server should dump a traceback on SIGUSR1? This can be helpful in debugging.
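One possible implementation using only the stdlib's faulthandler module; this is a sketch, not the actual server code:

```python
import os
import signal
import sys
import faulthandler

# Dump tracebacks for all threads to stderr whenever SIGUSR1 arrives.
# faulthandler keeps the handler async-signal-safe, unlike a Python-level
# handler that formats the traceback itself.
faulthandler.register(signal.SIGUSR1, file=sys.stderr, all_threads=True)

# Trigger it from a shell with: kill -USR1 <pid>
os.kill(os.getpid(), signal.SIGUSR1)
```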
B3 propagation has become the standard when using HTTP. Baseplate currently supports different headers for propagating trace data (Thrift ref, Pyramid/WSGI ref).
Supporting both the old version of headers and B3 shouldn't be too difficult.
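A sketch of what dual-scheme extraction could look like. The B3 header names are the standard ones; the legacy X-Trace/X-Span/X-Parent names are assumptions for illustration, not confirmed from baseplate's source.

```python
def extract_trace_headers(headers):
    """Prefer B3 headers; fall back to an older scheme.
    (Legacy header names here are illustrative assumptions.)
    """
    if "X-B3-TraceId" in headers:
        return {
            "trace_id": headers["X-B3-TraceId"],
            "span_id": headers["X-B3-SpanId"],
            "parent_id": headers.get("X-B3-ParentSpanId"),
            "sampled": headers.get("X-B3-Sampled"),
        }
    return {
        "trace_id": headers.get("X-Trace"),
        "span_id": headers.get("X-Span"),
        "parent_id": headers.get("X-Parent"),
        "sampled": headers.get("X-Sampled"),
    }

ctx = extract_trace_headers({
    "X-B3-TraceId": "463ac35c9f6413ad",
    "X-B3-SpanId": "a2fb4a1d1a96d312",
})
```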
We should detect and at the very least log when greenlets block for too long. This is probably indicative of a blocking API being called somewhere by accident (despite monkeypatching, stuff like flock is still unpatched and still blocks).
This might be relevant: http://www.rfk.id.au/blog/entry/detect-gevent-blocking-with-greenlet-settrace/
Using pip 10.0.1 and python 3.6, pip install baseplate returns:
running bdist_wheel
running build
running build_py
SPECIAL baseplate/thrift/baseplate.thrift build/thrift/baseplate/thrift/baseplate.thrift_buildstamp
mkdir -p build/thrift/baseplate/thrift/baseplate.thrift
thrift1 -strict -gen py:utf8strings,slots,new_style -out build/thrift/baseplate/thrift/baseplate.thrift baseplate/thrift/baseplate.thrift
cp -r build/thrift/baseplate/thrift/baseplate.thrift/baseplate/thrift/ baseplate/
touch build/thrift/baseplate/thrift/baseplate.thrift_buildstamp
make: *** No rule to make target `tests/integration/test.thrift', needed by `build/thrift/tests/integration/test.thrift_buildstamp'. Stop.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/private/var/folders/5k/h40qktzn2dl9z13v0cmf33hh0000gn/T/pip-install-o4ua71_c/baseplate/setup.py", line 125, in <module>
'build_py': BuildPyCommand,
File "/Users/mohamed/.virtualenvs/local/lib/python3.6/site-packages/setuptools/__init__.py", line 129, in setup
return distutils.core.setup(**attrs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/core.py", line 148, in setup
dist.run_commands()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 955, in run_commands
self.run_command(cmd)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/Users/mohamed/.virtualenvs/local/lib/python3.6/site-packages/wheel/bdist_wheel.py", line 179, in run
self.run_command('build')
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/command/build.py", line 135, in run
self.run_command(cmd_name)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/cmd.py", line 313, in run_command
self.distribution.run_command(command)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
cmd_obj.run()
File "/private/var/folders/5k/h40qktzn2dl9z13v0cmf33hh0000gn/T/pip-install-o4ua71_c/baseplate/setup.py", line 67, in run
self._make_thrift()
File "/private/var/folders/5k/h40qktzn2dl9z13v0cmf33hh0000gn/T/pip-install-o4ua71_c/baseplate/setup.py", line 64, in _make_thrift
subprocess.check_call(make_cmd)
File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 291, in check_call
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/make', 'thrift']' returned non-zero exit status 2.
----------------------------------------
Failed building wheel for baseplate
It seems the tests directory is missing from the tarball uploaded to PyPI.
The Cassandra driver is asynchronous under the hood. It runs IO in a reactor on another thread (or in the native reactor if already async, e.g. gevent). It provides an API that returns Futures. You can either attach callbacks to the future to get notified of completion/failure, or call result() on the future to block the current thread until the result comes in. Callbacks are called on the IO thread, or in the registering thread if the future had already completed at time of registration. The result() API works by blocking on a threading.Event object. When the operation finishes inside the IO thread, the future signals the Event and then calls each of the callbacks in turn, in that order.
Baseplate's Cassandra instrumentation intercepts Futures and attaches callbacks to them before returning them to the application. This is how we mark spans finished. The execute_async() API works this way by default, and the execute() API is just execute_async() with an immediate call to result() on the returned future.
Because the IO thread signals the event, which unleashes the main thread's result(), before running the callbacks, there's a race condition where the main thread finishes up its work and closes the server span before the callbacks, and therefore the child span, get to finish.
The impact of this is that a child span might sometimes finish after its parent span. For most observers this should be fine. The metrics observer will lose that child span's metric, though, because the batch for the server span will have already shipped. This means undercounting Cassandra calls and timings.
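The ordering can be reproduced with just the stdlib, using a sleep to exaggerate the window between signaling the Event and running the callbacks:

```python
import threading
import time

order = []
done = threading.Event()

def io_thread():
    # The driver signals the Event first...
    done.set()
    # ...and only then runs the attached callbacks (the span-finishing
    # callback baseplate registers would run here).
    time.sleep(0.05)  # exaggerate the window to make the race deterministic
    order.append("callback: finish child span")

t = threading.Thread(target=io_thread)
t.start()

done.wait()  # this is what result() blocks on
order.append("main: finish server span")
t.join()
```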
https://github.com/reddit/baseplate/blob/master/baseplate/config.py#L307
If the value parsed from the config is falsy (like False or an empty dict), this check fails and the default will be returned.
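A simplified stand-in for that check, showing why comparing against None instead would fix it; these helpers are illustrative, not the actual config.py code:

```python
DEFAULT = {"default": True}

def parse_optional_buggy(value, default=DEFAULT):
    # Mirrors the truthiness check: an empty-but-valid parsed value
    # (like {}) is indistinguishable from "not set at all".
    if value:
        return value
    return default

def parse_optional_fixed(value, default=DEFAULT):
    # Comparing against None keeps legitimately-falsy parsed values.
    if value is not None:
        return value
    return default
```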
Things like sample rates are really common in configs, it'd be nice to make them more human readable:
[app:main]
rate = 37%
>>> cfg = config.parse_config(raw_config, {
...     "rate": config.Percent,
... })
>>> print(cfg.rate)
0.37
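A sketch of what such a parser could look like; this is hypothetical, not the actual config.Percent implementation:

```python
def percent(text):
    # Accept a human-readable percentage like "37%" and return 0.37.
    text = text.strip()
    if not text.endswith("%"):
        raise ValueError("expected a percentage like '37%%', got %r" % text)
    return float(text[:-1]) / 100.0
```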
baseplate-{script,serve} already hide gevent monkeypatching, so it'd be nice to do the same for psycogreen monkeypatching if sqlalchemy+PG is in use.
baseplate-healthcheck uses urllib.quote when performing a wsgi healthcheck on an Endpoint whose family is socket.AF_UNIX. urllib.quote was moved to urllib.parse in py3. We should probably add it to _compat.py and use that.
Traceback (most recent call last):
File "/usr/local/bin/baseplate-healthcheck3", line 9, in <module>
load_entry_point('baseplate', 'console_scripts', 'baseplate-healthcheck3')()
File "/usr/lib/python3/dist-packages/baseplate/server/healthcheck.py", line 68, in run_healthchecks
checker(args.endpoint)
File "/usr/lib/python3/dist-packages/baseplate/server/healthcheck.py", line 35, in check_http_service
quoted_path = urllib.quote(endpoint.address, safe="")
AttributeError: 'module' object has no attribute 'quote'
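A minimal sketch of the _compat.py shim this would amount to:

```python
# Import quote from wherever the running Python keeps it.
try:
    # Python 3
    from urllib.parse import quote
except ImportError:
    # Python 2
    from urllib import quote

# Same usage as healthcheck.py: percent-encode a unix socket path.
quoted_path = quote("/var/run/app.sock", safe="")
```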
e.g. consider a baseplate-based script launched via python foo/bar.py
In the following code, directory would look like foo. Because this is not the root directory of the git repo, the exception handling branch is taken. We attempt to move up one directory by calling os.path.dirname on the string "foo", and directory is then updated to be an empty string. From there, we continue in an infinite loop, with directory always set to an empty string and the exception branch always hit.
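A quick demonstration of the failure mode, plus one way a fixed walk could terminate. find_upward is illustrative, not the actual code:

```python
import os.path

# os.path.dirname on a bare relative path goes to "" and stays there,
# so a loop that walks upward "until it finds the repo root" never ends:
assert os.path.dirname("foo") == ""
assert os.path.dirname("") == ""

def find_upward(start, is_root):
    # Normalizing to an absolute path first gives the walk a real
    # termination point at the filesystem root.
    directory = os.path.abspath(start)
    while not is_root(directory):
        parent = os.path.dirname(directory)
        if parent == directory:  # reached the root without a match
            return None
        directory = parent
    return directory
```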
It's currently using a ServerSpan-level thread-local internally to pass state around, which doesn't work with local spans. This means that the parentage of client spans made ostensibly inside local spans incorrectly points at the server span instead.
It'd be nice to get some data on how application connection pools are doing. This should be doable for things like thrift_pool that we control completely and sqlalchemy that give us introspection events.
Running baseplate actions against a vanilla virtualenv with just the PyPI install of baseplate breaks with a hard dependency on thrift (which is an optional requirement). Upon installing thrift, the error changes, but appears to be due to a dependency on a different version of the thrift library.
Stacktrace from within a virtualenv:
$ python -m baseplate
Traceback (most recent call last):
File "/usr/lib64/python2.7/runpy.py", line 163, in _run_module_as_main
mod_name, _Error)
File "/usr/lib64/python2.7/runpy.py", line 111, in _get_module_details
__import__(mod_name) # Do not catch exceptions initializing package
File "<$PWD>/.venv/lib/python2.7/site-packages/baseplate/__init__.py", line 10, in <module>
from .core import Baseplate
File "<$PWD>/.venv/lib/python2.7/site-packages/baseplate/core.py", line 12, in <module>
from thrift.util import Serializer
ImportError: No module named util
installed packages:
$ pip freeze
baseplate==0.23.1
certifi==2017.11.5
chardet==3.0.4
enum34==1.1.6
idna==2.6
posix-ipc==1.0.0
PyJWT==1.5.3
requests==2.18.4
six==1.11.0
thrift==0.10.0
urllib3==1.22
EPIPE seems pretty safe, likewise zero-bytes-written. These should be cases where the RPC wasn't even sent, so retrying is always safe. Let's save the application from having to do that.
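A sketch of the retry policy this suggests; call_with_retry is a hypothetical wrapper, not baseplate API:

```python
import errno

def call_with_retry(rpc, max_attempts=2):
    # Retry only failures where the request provably never went out:
    # EPIPE (like a zero-byte write) means the connection died before send,
    # so retrying is always safe regardless of the RPC's idempotency.
    for attempt in range(1, max_attempts + 1):
        try:
            return rpc()
        except OSError as exc:
            if exc.errno != errno.EPIPE or attempt == max_attempts:
                raise
```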
They appear in /usr/lib/python2.7/dist-packages/tests, which is bad. They shouldn't be installed at all, ideally, since they're not needed once the build's done.
The python-driver for cassandra can be very noisy in the logs. Lots of messages like
Jul 3 10:29:05 host [service] 23701:MainThread:cassandra.policies:INFO:Using datacenter 'ue1' for DCAwareRoundRobinPolicy (via host '10.0.0.0'); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes
This comes from the on_up method, as found here: https://datastax.github.io/python-driver/_modules/cassandra/policies.html
I believe we could reduce this by optionally passing in load_balancing_policy here: https://github.com/reddit/baseplate/blob/master/baseplate/context/cassandra.py#L14
load_balancing_policy=DCAwareRoundRobinPolicy(local_dc=local_dc)
This check means we never load the file if it doesn't exist / we can't access it. This leads to an unhelpful error when fetching a secret later on:
File "/usr/lib/python2.7/dist-packages/baseplate/secrets/store.py", line 224, in get_versioned
secret_attributes = self.get_raw(path)
File "/usr/lib/python2.7/dist-packages/baseplate/secrets/store.py", line 169, in get_raw
return self._secrets[path]
TypeError: 'NoneType' object has no attribute '__getitem__'
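A sketch of a loader that fails loudly up front instead; this is illustrative, not the actual secrets store code:

```python
import json
import os

def load_secrets(path):
    # Fail at load time with an actionable message, instead of leaving
    # the store empty and blowing up later with a confusing NoneType
    # error deep inside get_raw().
    if not os.access(path, os.R_OK):
        raise RuntimeError(
            "secrets file %r does not exist or is not readable; "
            "is the secrets fetcher running?" % path
        )
    with open(path) as f:
        return json.load(f)
```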
Things like context.trace should be easier to deal with in tests.
These don't get overwritten by later updates which makes them a pain to deal with. Where are they coming from!?
It should probably require some text to be present, and the user can use Optional if they want it to really be optional.
This should watch for changes in any related source or config files and restart itself. Useful for quick development.
It currently only works with URL dispatch.
r2 activity service stuff is doing a lot of g.stats.simple_event("activity_service.write.fail") in a try/except. This should be automatic as part of the child span stuff.
This is useful for local testing.
BaseplateProcessorEventHandler.handlerError uses logger.exception, which throws an error whenever you raise a Thrift exception in Python 3, since the generated Thrift exceptions are unhashable.
This means that we cannot raise Thrift exceptions within a Python 3 Thrift service without monkey patching TException to include the methods that we need.
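The problem and the monkeypatch can be shown with a stand-in class, since defining __eq__ without __hash__ is what makes a class unhashable in Python 3. GeneratedError is a stand-in, not real generated code:

```python
# Stand-in for a generated Thrift exception: defining __eq__ without
# __hash__ sets __hash__ to None, making instances unhashable in Python 3.
class GeneratedError(Exception):
    def __eq__(self, other):
        return isinstance(other, GeneratedError)

err = GeneratedError()
try:
    hash(err)
    hashable_before = True
except TypeError:
    hashable_before = False

# The monkeypatch the issue describes amounts to restoring identity hashing:
GeneratedError.__hash__ = object.__hash__
```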
From @pacejackson:
crypto.make_signature returns a "bytes" object, but crypto.verify_signature says that signature should be a str. In practice, either a string or a bytes object for signature works, because the base64 decode methods started accepting both bytes and strings in python 3.3.
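A quick demonstration of that base64 behavior:

```python
import base64

# base64.b64decode has accepted both str and bytes since Python 3.3,
# which is why a str signature happens to work in verify_signature.
from_str = base64.b64decode("aGVsbG8=")
from_bytes = base64.b64decode(b"aGVsbG8=")
```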
Baseplate uses a few third-party libraries, such as the python cassandra driver. Their log lines are rather chatty, making backend logs very noisy.
It would be good if baseplate provided an idiomatic way for services to tweak the log levels, since baseplate is aware of all the libraries it depends on.
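A sketch of what such a knob could look like; the logger names listed are illustrative guesses at the chatty ones, not a definitive list:

```python
import logging

# Raise the level on third-party loggers baseplate knows are noisy.
CHATTY_LOGGERS = ["cassandra", "cassandra.policies", "urllib3"]

def quiet_chatty_libraries(level=logging.WARNING):
    for name in CHATTY_LOGGERS:
        logging.getLogger(name).setLevel(level)

quiet_chatty_libraries()
```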
If we increment the same counter a number of times in the same request (e.g. success counters for a client call) we'll currently append a bunch of +1 lines to the metrics batch. It'd be much better to turn that into a single +N line to save space in metrics packets.
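A sketch of the aggregation; the serialized form uses the standard statsd counter format, and the function name is illustrative:

```python
from collections import Counter

def aggregate_counters(increments):
    # Collapse repeated +1s for the same counter into one +N statsd line.
    totals = Counter()
    for name, delta in increments:
        totals[name] += delta
    return ["%s:%d|c" % (name, total) for name, total in sorted(totals.items())]

lines = aggregate_counters([
    ("clients.activity.success", 1),
    ("clients.activity.success", 1),
    ("clients.activity.success", 1),
    ("server.requests", 1),
])
```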
Though the spec says "For compatibility all values should be integers in the range (-2^53^, 2^53^)." rounding the metrics to an integer is making for really really ugly graphs when things are quick. It looks like everything we're using is float-safe, so let's just let the raw numbers through.
I needed to install the following to get the docs to build locally:
I have traditionally followed the docs-requirements.txt / dev-requirements.txt pattern for stuff like this, but didn't want to presume this is where you were heading with a premature PR.
I can toss a PR over the fence if you let me know how you'd like to handle it (or not handle it).
Looking good so far!
This'll require choosing a max line length (we're generally OK up to 100 I think) and dealing with the three violators of that rule (mostly long URLs in comments).
There's no good reason for it to default to unlimited and it's a footgun. Require applications to set something since we can't really think of a good default since it depends on how CPU-bound the application is.
We should consider adding a new secretSource: vault (or some similar config) for the publisher that allows you to pull the key name and secret from vault.
I believe we're currently pulling them from config in these lines: https://github.com/reddit/baseplate/blob/master/baseplate/events/publisher.py#L157-L159
Because of the TException constructor, message is None. This makes for rather annoying log messages.