baseplate.py's Introduction

This repository is archived.

This repository is archived and will not receive any updates or accept issues or pull requests.

To report bugs in reddit.com please make a post in /r/bugs.

If you have found a bug that can in some way compromise the security of the site or its users, please exercise responsible disclosure and e-mail [email protected].


API

For notices about reddit API changes and discussion of reddit API client development, subscribe to the /r/redditdev and /r/changelog subreddits.

To learn more about reddit's API, check out our automated API documentation and the API wiki page. Please use a unique User-Agent string and take care to abide by our API rules.

Quickstart

To set up your own instance of reddit see the install guide.

baseplate.py's People

Contributors

alienth, bradengroom, bsimpson63, chriskuehl, ckwang8128, cshoe, curioussavage, dependabot[bot], diffyqgirl, dwick, fishy, foreverest, ghirsch-reddit, kaitaan, krav, ktatkinson, manishapme, markis, nataliest, nsheaps, pacejackson, pnovotnak, praxist, roganmurley, rram, siddharthmanoj, sp3nx0r, spladug, superq, xaelias

baseplate.py's Issues

Fix str/bytes confusion in crypto signer API

From @pacejackson:

crypto.make_signature returns a bytes object, but crypto.verify_signature says that signature should be a str. In practice, either a str or a bytes object works for signature because the base64 decode functions started accepting both bytes and ASCII strings in Python 3.3.
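The leniency described above can be reproduced with the standard library alone. This is a minimal sketch, not baseplate's signer code: the payload is made up, and URL-safe base64 is assumed here only as a stand-in for whatever encoding the signer uses.

```python
import base64

# A made-up payload standing in for a signature; not baseplate's real format.
raw = b"example-signature-payload"

encoded_bytes = base64.urlsafe_b64encode(raw)   # bytes, like make_signature's return value
encoded_str = encoded_bytes.decode("ascii")     # str, as verify_signature documents

# Since Python 3.3 the decode functions accept ASCII-only str as well as bytes,
# so both spellings decode to the same payload.
decoded_from_bytes = base64.urlsafe_b64decode(encoded_bytes)
decoded_from_str = base64.urlsafe_b64decode(encoded_str)
```

Both calls succeed, which is why the mismatch between the documented and actual types goes unnoticed in practice.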

Cannot raise Thrift exceptions in Python 3

BaseplateProcessorEventHandler.handlerError uses logger.exception, which throws an error whenever you raise a Thrift exception in Python 3 because the generated Thrift exceptions are unhashable.

https://github.com/reddit/baseplate/blob/69eba73ad7493e8c4822677b391346b5a8ee7c38/baseplate/integration/thrift/__init__.py#L117

This means that we cannot raise Thrift exceptions within a Python 3 Thrift service without monkey patching TException to include the methods that we need.
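The unhashability comes from a general Python 3 rule: a class that defines __eq__ without __hash__ gets __hash__ set to None. The sketch below illustrates that rule and one possible workaround. FakeThriftError and HashableThriftError are hypothetical stand-ins, not generated Thrift code, and the __eq__ shown is only assumed to resemble what the generator emits.

```python
class FakeThriftError(Exception):
    """Hypothetical stand-in for a generated Thrift exception."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code

    # Defining __eq__ without __hash__ makes Python 3 set __hash__ to None,
    # so instances of this class cannot be hashed.
    def __eq__(self, other):
        return isinstance(other, self.__class__) and self.__dict__ == other.__dict__


class HashableThriftError(FakeThriftError):
    # One possible workaround: restore the default identity-based hash.
    __hash__ = Exception.__hash__


def is_hashable(obj):
    """Return True if hash(obj) succeeds."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

Here is_hashable(FakeThriftError(1)) is False while is_hashable(HashableThriftError(1)) is True, which is roughly what a monkey patch on TException would need to achieve.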

Coalesce counters in metrics batches

If we increment the same counter a number of times in the same request (e.g. success counters for a client call) we'll currently append a bunch of +1 lines to the metrics batch. It'd be much better to turn that into a single +N line to save space in metrics packets.
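A minimal sketch of the coalescing idea follows. CoalescingBatch is a hypothetical class, not baseplate's metrics batch, and the statsd-style "name:value|c" line format is an assumption made for illustration.

```python
from collections import Counter


class CoalescingBatch:
    """Hypothetical batch that sums counter increments before serializing."""

    def __init__(self):
        self._counters = Counter()

    def increment(self, name, delta=1):
        # Accumulate in memory instead of appending one line per increment.
        self._counters[name] += delta

    def serialize(self):
        # Emit a single line per counter carrying the summed delta.
        return [
            "{}:{}|c".format(name, total).encode("ascii")
            for name, total in sorted(self._counters.items())
        ]


batch = CoalescingBatch()
for _ in range(3):
    batch.increment("client.success")
lines = batch.serialize()
```

Three increments of the same counter produce one line with a delta of 3 rather than three lines with a delta of 1.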

Annotate thrift service methods as retryable

It'd be nice to have the service IDL include annotations on which methods are safe to retry and which are not so that service consumers can interact with us safely. Once that's in place, it becomes obvious that we can just do retries semi-automatically (i.e. with some specification of retry parameters by the client).

Better logging level control for third-party dependencies

Baseplate uses a few third-party libraries, such as the Python Cassandra driver. Their log lines are rather chatty, making backend logs very noisy.

It would be good if Baseplate provided an idiomatic way for services to tweak the log levels, since Baseplate is aware of all the libraries it depends on.
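Until something like that exists, a service can do this by hand with the standard logging module. The helper below is a hypothetical sketch, not a baseplate API; "cassandra" is the top-level logger name the Cassandra driver logs under.

```python
import logging


def quiet_loggers(levels):
    """Set the level of each named logger; child loggers inherit it."""
    for name, level in levels.items():
        logging.getLogger(name).setLevel(level)


# Raise the threshold for the Cassandra driver so INFO chatter is dropped.
quiet_loggers({"cassandra": logging.WARNING})
```

Because logger levels are inherited, setting the level on "cassandra" also quiets "cassandra.policies" and other child loggers that have no level of their own.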

Add test fakes

Things like context.trace should be easier to deal with in tests.

Remove rounding from metrics

Though the spec says "For compatibility all values should be integers in the range (-2^53, 2^53)", rounding the metrics to integers makes for really ugly graphs when things are quick. It looks like everything we're using is float-safe, so let's just let the raw numbers through.

[Image: screenshot from 2015-12-18 09 38 01]

Document or include Sphinx requirements

I needed to install the following to get the docs to build locally:

  • sphinx
  • sphinxcontrib-spelling
  • pyenchant

I have traditionally followed the docs-requirements.txt/dev-requirements.txt pattern for stuff like this, but didn't want to presume this is where you were heading with a premature PR.

I can toss a PR over the fence if you let me know how you'd like to handle it (or not handle it).

Looking good so far!

See when we can safely retry thrift operations

EPIPE seems pretty safe, likewise zero-bytes-written. These should be cases where the RPC wasn't even sent, so retrying is always safe. Let's save the application from having to do that.
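A rough sketch of what such a retry could look like, assuming (as this issue suggests) that EPIPE means the request never went out. call_with_retry is a hypothetical helper, not part of baseplate or Thrift, and it only handles the EPIPE case.

```python
import errno


def call_with_retry(operation, attempts=3):
    """Call operation(), retrying only when it fails with EPIPE."""
    for attempt in range(attempts):
        try:
            return operation()
        except OSError as exc:
            # Any other error, or running out of attempts, is re-raised as-is.
            if exc.errno != errno.EPIPE or attempt == attempts - 1:
                raise
```

Errors other than EPIPE propagate immediately, since nothing in them says whether the RPC reached the server.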

Quiet cassandra log spam

The python-driver for cassandra can be very noisy in the logs. Lots of messages like

Jul  3 10:29:05 host [service] 23701:MainThread:cassandra.policies:INFO:Using datacenter 'ue1' for DCAwareRoundRobinPolicy (via host '10.0.0.0); if incorrect, please specify a local_dc to the constructor, or limit contact points to local cluster nodes#015

This comes from the on_up method as found here https://datastax.github.io/python-driver/_modules/cassandra/policies.html

I believe we could reduce this by optionally passing in load_balancing_policy here: https://github.com/reddit/baseplate/blob/master/baseplate/context/cassandra.py#L14

load_balancing_policy=DCAwareRoundRobinPolicy(local_dc=local_dc)

Move trace shipping to sidecar

This should follow the same model as event publishers where the application workers write to POSIX message queues and a sidecar daemon consumes these messages and publishes batches.

This has a few advantages:

  • The sidecar can be reused for other languages that need to ship spans.
  • The application can be unaware of which tracing service is being used (Zipkin etc.)
  • Traces waiting in queue aren't lost during application shutdown.

Fix race condition in Cassandra span reporting

The Cassandra driver is asynchronous under the hood. It runs IO in a reactor on another thread (or in the native reactor if already async, e.g. gevent). It provides an API that returns Futures. You can either attach callbacks to the future to get notified of completion/failure or call result() on the future to block the current thread until the result comes in. Callbacks are called on the IO thread or in the registering thread if the future had already completed at time of registration. The result() API works by blocking on a threading.Event object. When the operation finishes inside the IO thread, the future signals the Event and then calls each of the callbacks in turn in that order.

Baseplate's Cassandra instrumentation intercepts Futures and attaches callbacks to them before returning them to the application. This is how we mark spans finished. The execute_async() API works this way by default and the execute() API is just execute_async() with an immediate call to result() on the returned future.

Because the IO thread signals the event, which unleashes the main thread's result(), before running the callbacks, there's a race condition: the main thread can finish up its work and finish the server span before the callbacks run, and therefore before the child span gets to finish.

The impact of this is that a child span might finish after its parent span sometimes. For most observers this should be fine. The metrics observer will lose that child span's metric though because the batch for the server span will have already shipped. This means undercounting Cassandra calls and timings.
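The ordering at the heart of the race can be shown with a toy future. SketchFuture is a hypothetical class written for illustration; it is not the driver's ResponseFuture and it runs everything on one thread so that the sequence is deterministic.

```python
import threading


class SketchFuture:
    """Toy future that mimics the ordering described above."""

    def __init__(self):
        self._event = threading.Event()
        self._callbacks = []
        self.order = []  # records what happened, in sequence

    def add_callback(self, fn):
        self._callbacks.append(fn)

    def complete(self):
        # The event is signalled first, which is what lets result() return...
        self.order.append("event signalled")
        self._event.set()
        # ...and only afterwards are the callbacks run.
        for fn in self._callbacks:
            fn()

    def result(self):
        self._event.wait()


future = SketchFuture()
future.add_callback(lambda: future.order.append("child span finished"))
future.complete()
future.result()
```

With a real IO thread, a caller blocked in result() can wake up between the two recorded steps, which is the window in which the server span may finish before the child span.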

Make max_concurrency server option required

There's no good reason for it to default to unlimited, and that default is a footgun. Applications should be required to set a value, because we can't really think of a good default: it depends on how CPU-bound the application is.

Initializing raven hangs indefinitely if a script is being run via a non-absolute path that is not the base git directory

e.g. consider a baseplate-based script launched via python foo/bar.py

In the following code, directory would look like foo. Because this is not the root directory of the git repo, this causes the exception handling branch to be taken. We attempt to move up one directory, calling os.path.dirname on the string "foo", and directory is then updated to be an empty string. From there, we continue in an infinite loop, with directory always being set to an empty string, and the exception branch always being hit.

https://github.com/reddit/baseplate/blob/20f5a0513cb2ad5f5d7ef1b4178fcc9cc112fb67/baseplate/__init__.py#L173-L181
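The loop never ends because os.path.dirname of a bare relative name is the empty string, and os.path.dirname of the empty string is again the empty string. The sketch below shows one way to walk upwards safely; walk_up is a hypothetical helper, not the code in baseplate.

```python
import os.path


def walk_up(directory):
    """Yield the absolute form of directory and each parent up to the root."""
    current = os.path.abspath(directory)
    while True:
        yield current
        parent = os.path.dirname(current)
        if parent == current:
            # dirname() of the filesystem root is the root itself: stop here.
            return
        current = parent


# The degenerate case from the report: a relative name collapses to "".
stuck = os.path.dirname(os.path.dirname("foo"))
```

Converting to an absolute path first and stopping when dirname() stops changing guarantees termination, whichever directory the script was launched from.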

Add monitoring of connection pools

It'd be nice to get some data on how application connection pools are doing. This should be doable for things like thrift_pool, which we control completely, and sqlalchemy, which gives us introspection events.

Improve error message when SecretsStore unable to load file

This check means we never load the file if it doesn't exist / we can't access it. This leads to an unhelpful error when fetching a secret later on:

  File "/usr/lib/python2.7/dist-packages/baseplate/secrets/store.py", line 224, in get_versioned
    secret_attributes = self.get_raw(path)
  File "/usr/lib/python2.7/dist-packages/baseplate/secrets/store.py", line 169, in get_raw
    return self._secrets[path]
TypeError: 'NoneType' object has no attribute '__getitem__'

Improve span annotations API

Right now, it's a free-for-all. It'd be nice to figure out some common patterns that we're going to be using (especially re: Zipkin) and then figure out how to make sugar for them which 1) reduces boilerplate, and 2) improves consistency. It'd also be good to figure out how some annotations can map to the metrics observer (e.g. count of failed RPC attempts in thrift_pool). This will probably require more things to use the API before we can get a good feel for patterns.

Add percent config validator

Things like sample rates are really common in configs. It'd be nice to make them more human-readable:

[app:main]
rate = 37%

>>> cfg = config.parse_config(raw_config, {
...    "rate": config.Percent,
... })
>>> print(cfg.rate)
0.37
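A validator along these lines could be a plain function from text to float. The percent function below is a hypothetical sketch of the proposed config.Percent, including the assumption that values outside 0% to 100% should be rejected.

```python
def percent(text):
    """Parse a string such as '37%' into a fraction such as 0.37."""
    text = text.strip()
    if not text.endswith("%"):
        raise ValueError("a percentage must end with '%'")
    value = float(text[:-1]) / 100
    # Assumption: rates outside 0%..100% are configuration mistakes.
    if not 0 <= value <= 1:
        raise ValueError("a percentage must be between 0% and 100%")
    return value


rate = percent("37%")
```

Requiring the trailing "%" keeps a bare "37" from being silently read as either 37 or 0.37.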

Figure out what's wrong with config.Optional(config.DictOf)

DictOf does what we want without Optional but the combo confusingly always returns the default. This may just be something that we detect and say "don't do that" since it's unnecessary.

from baseplate import config

CONFIG = {
    "foo": config.Optional(
        config.DictOf(config.TupleOf(config.String)),
        default={},
    )
}

app_config = {
    "foo.bar": "a, b, c, d",
    "foo.baz": "e, f",
}

cfg = config.parse_config(app_config, CONFIG)
print(cfg.foo)

prints {}.

Add option to servers to restart occasionally

Could be based on number of requests or a time limit. Should probably be fuzzed to stagger the restarts around the cluster.

The goal is to refresh ourselves every now and then to combat memory leaks/fragmentation etc. and also to get new DNS if necessary.

This probably means telling Einhorn that we want to be replaced, but we probably also want to be able to just turn ourselves off and let a process manager figure it out for itself.
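The fuzzing could be as simple as adding random jitter to the configured limit when a worker starts. fuzzed_limit is a hypothetical helper and the 10% default jitter is an arbitrary choice for illustration.

```python
import random


def fuzzed_limit(base, jitter=0.1, rng=random):
    """Return base adjusted by a random amount of up to +/- jitter * base."""
    spread = base * jitter
    return base + rng.uniform(-spread, spread)


# Each worker would compute its own limit once at startup, for example a
# request count or a lifetime in seconds.
max_requests = fuzzed_limit(10000)
```

Because every worker draws its own value, restarts spread out over time instead of hitting the whole cluster at once.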

SQLAlchemy instrumentation doesn't handle local spans

It currently uses a ServerSpan-level threadlocal internally to pass state around, which doesn't work with local spans. As a result, client spans created inside a local span incorrectly point at the server span as their parent.

Remove hard dependencies that live outside PyPI

Running baseplate actions against a vanilla virtualenv with just the PyPI install of baseplate breaks with a hard dependency on thrift (which is an optional requirement). Upon installing thrift the error changes but appears to be due to a dependency on a different version of the thrift library.

Stacktrace from within a virtualenv:

$ python -m baseplate
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 163, in _run_module_as_main
    mod_name, _Error)
  File "/usr/lib64/python2.7/runpy.py", line 111, in _get_module_details
    __import__(mod_name)  # Do not catch exceptions initializing package
  File "<$PWD>/.venv/lib/python2.7/site-packages/baseplate/__init__.py", line 10, in <module>
    from .core import Baseplate
  File "<$PWD>/.venv/lib/python2.7/site-packages/baseplate/core.py", line 12, in <module>
    from thrift.util import Serializer
ImportError: No module named util

installed packages:

$ pip freeze
baseplate==0.23.1
certifi==2017.11.5
chardet==3.0.4
enum34==1.1.6
idna==2.6
posix-ipc==1.0.0
PyJWT==1.5.3
requests==2.18.4
six==1.11.0
thrift==0.10.0
urllib3==1.22

Add helpers for IDs

To ensure we have standard processing of reddit IDs (which look suspiciously like r2 fullnames).

Make a shell for thrift services

We have pshell for pyramid services. It'd be really nice to have an equivalent for quick messing around with thrift services.

Integrate `_from_config` functions better with the rest of config

It's hard to take advantage of more complex structuring with them. For example, it's impossible to use nested keys:

thrift.service_a.endpoint = ...
thrift.service_b.endpoint = ...

wouldn't work with thrift_pool_from_config since it can't handle multi-level prefixes.
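One way to support multi-level prefixes is to strip the prefix from the raw config before handing the remainder to a parser. with_prefix is a hypothetical helper, not an existing baseplate function, and the endpoint values are made up.

```python
def with_prefix(app_config, prefix):
    """Return the keys under prefix, with the prefix and its dot removed."""
    prefix = prefix.rstrip(".") + "."
    return {
        key[len(prefix):]: value
        for key, value in app_config.items()
        if key.startswith(prefix)
    }


app_config = {
    "thrift.service_a.endpoint": "a.local:9090",
    "thrift.service_b.endpoint": "b.local:9090",
}
service_a = with_prefix(app_config, "thrift.service_a")
```

Each service then sees only its own keys, so the same parsing code can be reused at any nesting depth.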

Support kubernetes auth backend for vault

From my understanding, we currently only authenticate to vault using the EC2 instance identity document. For the future when we run applications on kubernetes, this will not be sufficient as many arbitrary applications may run on the same instance in different pods.

The kubernetes auth backend is used by providing the role and a JWT, which kubernetes mounts inside pods at /run/secrets/kubernetes.io/serviceaccount/token and which authenticates the pod as a service account. To get this working, we just need to read the contents of this file and authenticate at the designated mount point for the kubernetes auth backend.

Here is some sample code using the hvac library:

import hvac

TOKEN_FILE = '/run/secrets/kubernetes.io/serviceaccount/token'
VAULT_URL = 'xxxx' 
AUTH_ENDPOINT = '/v1/auth/kubernetes/login'

with open(TOKEN_FILE, 'r') as f:
    token = f.read()

role = 'xxxx'

params = {
  'jwt': token,
  'role': role
}

client = hvac.Client(url=VAULT_URL)
client.auth(AUTH_ENDPOINT, json=params)

I'm still getting familiar with the code, but it seems we may want to build this functionality into VaultClientFactory rather than use hvac for more flexibility.

Cannot install via pip

Using pip 10.0.1 and python 3.6, pip install baseplate returns:

running bdist_wheel
running build
running build_py
SPECIAL baseplate/thrift/baseplate.thrift build/thrift/baseplate/thrift/baseplate.thrift_buildstamp
mkdir -p build/thrift/baseplate/thrift/baseplate.thrift
thrift1 -strict -gen py:utf8strings,slots,new_style -out build/thrift/baseplate/thrift/baseplate.thrift baseplate/thrift/baseplate.thrift
cp -r build/thrift/baseplate/thrift/baseplate.thrift/baseplate/thrift/ baseplate/
touch build/thrift/baseplate/thrift/baseplate.thrift_buildstamp
make: *** No rule to make target `tests/integration/test.thrift', needed by `build/thrift/tests/integration/test.thrift_buildstamp'.  Stop.
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/private/var/folders/5k/h40qktzn2dl9z13v0cmf33hh0000gn/T/pip-install-o4ua71_c/baseplate/setup.py", line 125, in <module>
    'build_py': BuildPyCommand,
  File "/Users/mohamed/.virtualenvs/local/lib/python3.6/site-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/Users/mohamed/.virtualenvs/local/lib/python3.6/site-packages/wheel/bdist_wheel.py", line 179, in run
    self.run_command('build')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/command/build.py", line 135, in run
    self.run_command(cmd_name)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/private/var/folders/5k/h40qktzn2dl9z13v0cmf33hh0000gn/T/pip-install-o4ua71_c/baseplate/setup.py", line 67, in run
    self._make_thrift()
  File "/private/var/folders/5k/h40qktzn2dl9z13v0cmf33hh0000gn/T/pip-install-o4ua71_c/baseplate/setup.py", line 64, in _make_thrift
    subprocess.check_call(make_cmd)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/make', 'thrift']' returned non-zero exit status 2.

----------------------------------------
Failed building wheel for baseplate

It seems the tests directory is missing from the tarball uploaded to PyPI.

Healthcheck script errors in python 3 when using an AF_UNIX endpoint

When performing a wsgi healthcheck on an Endpoint whose family is socket.AF_UNIX, baseplate-healthcheck uses urllib.quote, which was moved to urllib.parse in py3. We should probably add it to _compat.py and use that.

Traceback (most recent call last):
  File "/usr/local/bin/baseplate-healthcheck3", line 9, in <module>
    load_entry_point('baseplate', 'console_scripts', 'baseplate-healthcheck3')()
  File "/usr/lib/python3/dist-packages/baseplate/server/healthcheck.py", line 68, in run_healthchecks
    checker(args.endpoint)
  File "/usr/lib/python3/dist-packages/baseplate/server/healthcheck.py", line 35, in check_http_service
    quoted_path = urllib.quote(endpoint.address, safe="")
AttributeError: 'module' object has no attribute 'quote'
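A compatibility import of the kind suggested for _compat.py could look like the sketch below. The socket path is a made-up example, and where the import would live inside baseplate is left open.

```python
try:
    from urllib.parse import quote  # Python 3 location
except ImportError:
    from urllib import quote  # Python 2 location

# Percent-encode every reserved character, including "/", so a socket path
# can be embedded in a URL as the healthcheck does with safe="".
quoted_path = quote("/run/service.sock", safe="")
```

The healthcheck would then call the imported quote directly instead of reaching through the urllib module.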
