cincoconfig's People

Contributors

ameily, burrch3s, dependabot[bot], vix597

Forkers

burrch3s hboshnak

cincoconfig's Issues

v1.0.0 Refactor

Several cincoconfig design choices were made to keep mypy, pylint, etc. happy, including breaking out Schema and Config into abstract base classes, Base*, with concrete implementations in the Schema and Config classes. This design made it cumbersome to keep mypy/pylint happy while keeping the API intuitive. For example, when adding field/config reference paths to error messages, several assumptions had to be made that the base type was really fully implemented. Additionally, because of the design, there were several measures in place to stop cyclic imports.

The refactor, in preparation for a stable v1.0.0 release, would make the API more intuitive, use better and more Pythonic implementations, and keep mypy/pylint happy with minimal ignore/disable comments.

The refactor will allow more accurate field/config reference paths and allow the API to easily grow as features are added.

Field Environment Variable

Create a new attribute for the Field class, Field.env, that specifies the environment variable that overrides the config value. The env variable would override both the Field.default and any values loaded from a configuration file. So, if a configuration file sets a field to X but the corresponding environment variable is set to Y, the config's value would be Y (the env variable.)

To make it easier, there will be a helper value that will autogenerate the environment variable based on the Field.key:

# these two lines are equivalent
schema.db.host = HostnameField(env='DB_HOST')
schema.db.host = HostnameField(env=True)
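
The precedence described above (environment variable, then config file, then default) could be sketched roughly as follows. This is only an illustration using the standard library: `env_name_from_key` and `resolve_value` are hypothetical helpers, not part of cincoconfig, and the rule that turns `db.host` into `DB_HOST` is an assumption based on the example above.

```python
import os
from typing import Any, Mapping, Optional


def env_name_from_key(full_key: str) -> str:
    """Derive an environment variable name from a dotted key (assumed rule)."""
    return full_key.replace(".", "_").replace("-", "_").upper()


def resolve_value(full_key: str, default: Any, file_value: Optional[Any] = None,
                  env: Optional[Mapping[str, str]] = None) -> Any:
    """Pick the effective value: env variable, then file value, then default."""
    environ = os.environ if env is None else env
    name = env_name_from_key(full_key)
    if name in environ:
        # the environment variable overrides both the file and the default
        return environ[name]
    return default if file_value is None else file_value
```

With a file value of X and `DB_HOST` set to Y, `resolve_value("db.host", "localhost", "X", {"DB_HOST": "Y"})` yields Y, matching the behavior requested in this issue.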

make_type produces a type that can't be serialized when following the list of complex types example recipe

Code:

from cincoconfig import *

# first, define the configuration's schema -- the fields available that
# customize the application's or library's behavior
schema = Schema()

# User account schema
user_account_schema = Schema()
user_account_schema.username = StringField(required=True)
user_account_schema.password = SecureField(required=True)
user_account_schema.groups = ListField(default=lambda: [])
UserAccount = user_account_schema.make_type("UserAccount")

schema.user_accounts = ListField(user_account_schema, default=lambda: [])

config = schema()  # Compile

ua = UserAccount(
    username="user",
    password="password"
)
config.user_accounts.append(ua)

print(config.dumps(format='json', pretty=True))

Error:

Traceback (most recent call last):
  File ".\example.py", line 28, in <module>
    print(config.dumps(format='json', pretty=True))
  File "C:\Users\\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 487, in dumps
    return formatter.dumps(self, self.to_tree())
  File "C:\Users\\Documents\GitRepo\cincoconfig\cincoconfig\formats\json.py", line 42, in dumps
    return json.dumps(tree, indent=2 if self.pretty else None).encode()
  File "C:\Python37\lib\json\__init__.py", line 238, in dumps
    **kw).encode(obj)
  File "C:\Python37\lib\json\encoder.py", line 201, in encode
    chunks = list(chunks)
  File "C:\Python37\lib\json\encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "C:\Python37\lib\json\encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "C:\Python37\lib\json\encoder.py", line 325, in _iterencode_list
    yield from chunks
  File "C:\Python37\lib\json\encoder.py", line 438, in _iterencode
    o = _default(o)
  File "C:\Python37\lib\json\encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type UserAccount is not JSON serializable

Add a way to tell if a field was set from the default value

I just want BaseField.__setdefault__, or wherever is appropriate, to set an internal flag that identifies whether the field was set using the default value. The flag would be cleared if the value was later updated.

The rationale being that you might set a field to 127.0.0.1 or something by default, but you might also configure it to be 127.0.0.1 in the configuration. If there wasn't an explicit action taken to configure something, that should be known.

The use case is that a sane default should be set, but after parsing the config file and analyzing other configuration values, you might want to set the value to something else based on non-trivial logic.

A more trivial example (which could be another feature in its own right -- linking fields to the value of others; circular deps are hard though):

{
    "item1": "abc",
    "item2": "def",
    "item3": "ghi",
    "item4": null
}

schema = Schema()
schema.item1 = StringField(required=True)
schema.item2 = StringField(required=True, default='jkl')
schema.item3 = StringField(required=True)
schema.item4 = StringField(required=True, default='jkl')
cfg = schema()
if cfg.item4.isdefault():
    # Set item4 to whatever item2 is configured as
    # even if it is the default value, they should sync
    # if item4 wasn't explicitly configured
    cfg.item4 = cfg.item2
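
One possible way to track the flag, sketched with a toy class rather than cincoconfig's real internals: `TrackedConfig` and its methods are hypothetical names, and the design (a set of keys that still hold their default) is just one assumption about how this could work.

```python
from typing import Any, Dict, Set


class TrackedConfig:
    """Toy config that remembers which keys still hold their default value."""

    def __init__(self, defaults: Dict[str, Any]) -> None:
        self._data: Dict[str, Any] = dict(defaults)
        # every key starts out as "set from default"
        self._defaulted: Set[str] = set(defaults)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        # an explicit assignment clears the flag, even if the value
        # happens to equal the default
        self._defaulted.discard(key)

    def get(self, key: str) -> Any:
        return self._data[key]

    def is_default(self, key: str) -> bool:
        return key in self._defaulted
```

Here a key explicitly configured to the same value as its default is still reported as not default, which is the distinction this issue asks for.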

Booleans in JSON are accepted in numerous non-Boolean field types

For example, having a config file testme.json:

{
    "something": true
}

This value is considered valid for:

  • IntField (considered a 1)
  • PortField (considered a 1)
  • HostnameField (converted to "0.0.0.1")
  • StringField (just converted to a string)
  • FilenameField (also just converted to a string)
  • IPv4NetworkField (converted to "0.0.0.1/32")
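
A likely contributor (an assumption, the issue does not state the cause) is that `bool` is a subclass of `int` in Python, so `isinstance(True, int)` is true and `int(True)` is 1. A stricter check could look like this sketch, where `validate_int` is a hypothetical stand-in for a field's validation method:

```python
from typing import Any


def validate_int(value: Any) -> int:
    """Accept real integers only; reject booleans explicitly."""
    # bool must be tested first because bool is a subclass of int
    if isinstance(value, bool) or not isinstance(value, int):
        raise ValueError("value must be an integer, not %s" % type(value).__name__)
    return value
```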

Generate Argparse parser based on config schema

Feature Idea: Add the ability to generate an ArgumentParser object based on a config schema. The application could retrieve this argparse object, parse args, then pass the args into cmdline_args_override.

The generated calls to add_argument could also add in support for parsing from Environment variables:

import argparse
import os
parser = argparse.ArgumentParser(description='test')
parser.add_argument('--url', default=os.environ.get('URL'))

It would be nice to choose to include/exclude certain fields from being overridden, but I'm not sure what that would look like. It could be another argument to the Field class, or a list of config paths to include/exclude (ex, include=['config.http', 'config.port']).
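
As a rough sketch of the idea, the parser could be generated from a mapping of config paths to defaults. Everything here is hypothetical: `build_parser`, the `include` filter, and the naming rules (`http.port` becomes `--http-port` and `HTTP_PORT`) are assumptions for illustration, and a real implementation would walk the schema's fields instead of a plain dict.

```python
import argparse
import os
from typing import Any, Iterable, Mapping, Optional


def build_parser(fields: Mapping[str, Any], include: Optional[Iterable[str]] = None,
                 environ: Optional[Mapping[str, str]] = None) -> argparse.ArgumentParser:
    """Create one --option per config path, defaulting to the env variable."""
    environ = os.environ if environ is None else environ
    allowed = None if include is None else set(include)
    parser = argparse.ArgumentParser(description="generated from schema")
    for path, default in fields.items():
        if allowed is not None and path not in allowed:
            continue  # field was not selected for command line override
        env_name = path.replace(".", "_").upper()
        parser.add_argument("--" + path.replace(".", "-"), dest=path,
                            default=environ.get(env_name, default))
    return parser
```

Because `dest` keeps the dotted path, `vars(parser.parse_args())` gives a dict keyed by config path, which could then be handed to something like cmdline_args_override.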

Cincoconfig File Format

Create a feature rich cincoconfig file format. The format parser and serializer must support:

  • Reading and preserving comments.
  • Adding comments when writing to disk.
  • Simple to write and human readable
  • Format is simple and secure
  • Distinguishable from YAML, JSON, etc.

My initial thought is something like an INI format:

[db]
host = 192.168.1.2
port = 10

# Sub sections are [X.Y]
[db.ssl]
enabled = true
client-cert = /path/to/cert

# List of objects (schemas) is multiple [X.Y] sections, where X.Y is the list field key
[db.users]
name = Adam
# list of simple types (int, str, float) can be comma separated or newline separated
groups = users, sudo, thing, item\, with a comma

[db.users]
name = Sean

[banner]
message = <
  Very long message that can
  span multiple lines. "\n" chars
  are translated as single spaces.

  But, two "\n" chars are a new
  paragraph. This is similar rules
  to Markdown.
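
For the comma separated list values in the proposal above, splitting while honoring the `\,` escape could be sketched like this. `split_list_value` is a hypothetical helper and only covers the comma separated form, not the newline separated one.

```python
import re
from typing import List


def split_list_value(raw: str) -> List[str]:
    """Split on commas that are not escaped with a backslash."""
    # the negative lookbehind skips any comma preceded by a backslash
    parts = re.split(r"(?<!\\),", raw)
    # trim whitespace and turn the escaped "\," back into a plain comma
    return [part.strip().replace("\\,", ",") for part in parts]
```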

Load and combine multiple configuration files

Allow the Config.load() function to accept a list of file names. When a list is specified, the behavior becomes:

  1. Load all config files using the specified format. If a file doesn't exist or can't be read, do not raise an exception.
  2. Combine all the trees
  3. Parse the combined tree.

final = dict()
for tree in trees:
    self._deep_dict_update(final, tree)  # deep version of dict.update that handles lists and dicts

self.load_tree(final)
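
The deep update helper referenced above is not defined in this issue. A minimal sketch might look like the following, under the assumption that nested dicts are merged recursively, lists are concatenated, and any other value from a later file replaces the earlier one. The list behavior in particular is a guess and could just as reasonably be "replace".

```python
import copy
from typing import Any, Dict


def deep_dict_update(base: Dict[str, Any], other: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge ``other`` into ``base`` and return ``base``."""
    for key, value in other.items():
        current = base.get(key)
        if isinstance(current, dict) and isinstance(value, dict):
            deep_dict_update(current, value)  # merge nested sections
        elif isinstance(current, list) and isinstance(value, list):
            current.extend(copy.deepcopy(value))  # assumption: lists append
        else:
            # copy so later merges never mutate the source tree
            base[key] = copy.deepcopy(value)
    return base
```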

Cinco key cannot be generated on Windows and "key file is not open" on first save.

When creating a key file for the first time on Windows, the following exception is thrown.

Traceback (most recent call last):
  File ".\example.py", line 53, in <module>
    print(config.dumps(format='json', pretty=True).decode())
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 376, in dumps
    return formatter.dumps(self, self.to_tree())
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 450, in to_tree
    tree[key] = self._data[key].to_tree()
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 452, in to_tree
    tree[key] = field.to_basic(self, field.__getval__(self))
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\fields.py", line 985, in to_basic
    with cfg._keyfile as ctx:
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\encryption.py", line 134, in __enter__
    self.__load_key()
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\encryption.py", line 111, in __load_key
    self.generate_key()
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\encryption.py", line 120, in generate_key
    with open(self.filename, 'wb') as fp:
FileNotFoundError: [Errno 2] No such file or directory: '~\\.cincokey'

Fix for the above (need os.path.expanduser):

DEFAULT_CINCOKEY_FILEPATH = os.path.join(os.path.expanduser("~"), ".cincokey")

After this fix, there was another issue on config.dumps:

Traceback (most recent call last):
  File ".\example.py", line 56, in <module>
    print(config.dumps(format='json', pretty=True).decode())
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 376, in dumps
    return formatter.dumps(self, self.to_tree())
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 450, in to_tree
    tree[key] = self._data[key].to_tree()
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 452, in to_tree
    tree[key] = field.to_basic(self, field.__getval__(self))
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\fields.py", line 986, in to_basic
    secret = ctx.encrypt(value, method=self.method)
  File "D:\Documents\GitRepo\cincoconfig\cincoconfig\encryption.py", line 174, in encrypt
    raise TypeError('key file is not open')
TypeError: key file is not open

The above only happened on initial key generation. This was because, in generate_key(), self.__key was not being set. Fix:

    def generate_key(self) -> None:
        '''
        Generate a random 32 byte key and save it to ``filename``.
        '''
        self.__key = os.urandom(32)
        with open(self.filename, 'wb') as fp:
            fp.write(self.__key)

"Encrypted" or Secure Field

It'd be nice to be able to securely store sensitive config values within a config file. It doesn't have to be perfect, it'd just be nice to have a protection in place to stop from someone viewing credentials in plain text.

One way we can do this is to copy how Django does it with a SECRET_KEY, where the application has a secret key hardcoded. The secret key would be application-specific.

With our goal of limiting dependencies, we'd have to stick with the standard library.

We would also need a script to generate secure values. Or somehow flag a value as secure. In a perfect world, parsing a config file like this:

{
  "mongodb": {
    "creds": "user:password"
  }
}

The parser would read the creds string value, transparently encrypt it, and write it back to the config:

{
  "mongodb": {
    "creds": {
        "type": "secure_value",
        "value": "<salt>:<encrypted value>"
    }
  }
}

So this issue contains two parts:

  1. a method to encrypt or securely store a string value with the Python stdlib
  2. a new Field that can read both string and secure values and automatically secure string values. This can be done in the API right now by implementing Field.to_python.

"Hint" argument to Field() class

Feature Idea: Add a "hint" argument to the Field class that stores information about what the config option does. Two possible uses:

  • Set the function docstring to this value for evaluation within the application
  • Write out a comment before the value in configuration files for formats that support it (ex, yaml)

Example:

from cincoconfig import *

# first, define the configuration's schema -- the fields available that
# customize the application's or library's behavior
schema = Schema()
schema.mode = ApplicationModeField(
    default='production', hint="Mode of application Operation. Development mode enables additional logging and features."
)

# ...

# set a config value manually
if config.mode == 'production':
    config.db.name = config.db.name + '_production'

print(config.dumps(format='yaml', hints=True).decode())

db:
  host: localhost
  name: my_app_production
  password: null
  port: 27017
  user: null
http:
  address: 127.0.0.1
  port: 8080
  ssl:
    cafile: null
    certfile: null
    enabled: false
    keyfile: null
# Mode of application Operation. Development mode enables additional logging and features.
mode: production

Update Toolchain

Update the dev toolchain based on my experience with other tools:

  • flake8 replaces pylint
  • black replaces pycodestyle
  • add poetry
  • pyright replaces mypy
  • switch to GitHub actions

Create IncludeField

Create a new field that can include another configuration file. The idea is that the configuration can load values from a shared or common config:

{
  "val": 1,
  "x": "y",
  "include": "/path/to/other.json"
}

This would require a new attribute in the BaseConfig class, deferred_includes, that is evaluated after a config is loaded but before validation.

Enable pylint missing-docstring error

The pylint missing-docstring error is disabled for the time being to allow the build to pass while we are still developing the foundation. Enable this error message once the foundation is good.

IncludeField nested in a Schema object not evaluated

IncludeFields only seem to be parsed if the IncludeField is a value at root depth. For example:

log_level: "warn"
include: "path/to/file.yml"

Will evaluate the include declaration, and import any values from there, but:

log_level: "warn"
db:
  include: "path/to/file.yml"

will not go to path/to/file.yml and parse values there when cincoconfig.config.Schema.loads is called. The reason seems to be this statement, which only checks top-level fields and does not descend into a field that is itself a schema, which might contain an IncludeField:

cincoconfig.config Schema.loads()

includes = [(key, field) for key, field in self._schema._fields.items()
                    if isinstance(field, IncludeField)]

I consider this a bug because it seems to contradict the IncludeField docstring, which states that the following is valid:

        # file1.yaml
        db:
          include: "db.yaml"
        include: "core.yaml"

        # db.yaml
        host: "0.0.0.0"
        port: 27017

        # core.yaml
        mode: "production"
        ssl: true
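
One way to find nested include fields would be to walk the schema recursively instead of only scanning the top level. The sketch below uses stand-in classes (`FakeSchema`, `FakeIncludeField`) rather than cincoconfig's real ones, and assumes only that a schema exposes a `_fields` mapping as in the statement quoted above.

```python
from typing import Any, Iterator, Tuple


class FakeIncludeField:
    """Stand-in for IncludeField (hypothetical, for illustration only)."""


class FakeSchema:
    """Stand-in for a schema that stores its fields in ``_fields``."""

    def __init__(self, **fields: Any) -> None:
        self._fields = dict(fields)


def iter_include_fields(schema: FakeSchema, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted path, field) for include fields at any depth."""
    for key, field in schema._fields.items():
        path = prefix + key
        if isinstance(field, FakeIncludeField):
            yield path, field
        elif isinstance(field, FakeSchema):
            # descend into sub-schemas, which the quoted statement skips
            yield from iter_include_fields(field, path + ".")
```

For the file1.yaml layout from the docstring, this would report both `db.include` and `include`.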

Collect Validation Errors

To support UIs that wrap a configuration, it would be helpful if all validation errors were collected and returned as a list rather than raising an exception on the first validation error. This would allow UIs to collect all validation errors and display them.
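
A minimal sketch of the collecting behavior, using plain callables in place of real fields. `collect_errors`, `require_port` and `require_text` are hypothetical names, and the choice to return `(key, message)` tuples is an assumption about what a UI would need.

```python
from typing import Any, Callable, Dict, List, Tuple


def require_port(value: Any) -> None:
    if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 65535:
        raise ValueError("invalid port")


def require_text(value: Any) -> None:
    if not isinstance(value, str) or not value:
        raise ValueError("value is required")


def collect_errors(values: Dict[str, Any],
                   validators: Dict[str, Callable[[Any], None]]) -> List[Tuple[str, str]]:
    """Run every validator and return all failures instead of stopping early."""
    errors: List[Tuple[str, str]] = []
    for key, validate in validators.items():
        try:
            validate(values.get(key))
        except ValueError as exc:
            errors.append((key, str(exc)))  # keep going after a failure
    return errors
```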

"Map" Field

Allow a new field type, "Map Field", which allows providing a dictionary that can validate both the key and value against a given field type.

Alternatively, update DictField to support this behavior.

Here is a PoC:

from typing import Any, Dict, Optional, Type, Union

class MapField(Field):
    def __init__(self, value_field: Union[BaseField, Type[ConfigType]], key_field: Optional[StringField] = None, *args, **kwargs):
        self.key_field = key_field or StringField(required=True)
        self.value_field = value_field
        super().__init__(*args, **kwargs)

    def _validate(self, cfg: 'Config', value: Dict[str, Any]) -> Any:
        for k, v in value.items():
            if type(k) is not str:
                raise ValidationError(cfg, self, 'keys must be strings')
            self.key_field.validate(cfg, k)
            self.value_field.validate(cfg, v)
        return value

Errors don't give full path to config value

For example, a nested SecureField, config.one.two.three.password = SecureField(), will raise a ValueError if set to something invalid, and that error will be something like "Error: password invalid" when it should be "Error: one.two.three.password invalid".
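
Building the full path mostly requires each node to know its key and its parent. The sketch below shows the idea with a hypothetical `Node` class. It does not reflect how cincoconfig actually stores parent references.

```python
from typing import List, Optional


class Node:
    """Toy tree node with a key and a parent link (hypothetical)."""

    def __init__(self, key: Optional[str] = None, parent: Optional["Node"] = None) -> None:
        self.key = key
        self.parent = parent

    def full_path(self, sep: str = ".") -> str:
        parts: List[str] = []
        node: Optional[Node] = self
        # walk up to the root, which has no key of its own
        while node is not None and node.key is not None:
            parts.append(node.key)
            node = node.parent
        return sep.join(reversed(parts))
```

An error message could then be formatted as `"%s invalid" % field.full_path()` instead of using only the field's own name.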

List Field of Complex Types Doesn't work

Using the recipe from the docs: https://cincoconfig.readthedocs.io/en/latest/recipes.html

from cincoconfig import Schema, UrlField, BoolField, ListField

webhook_schema = Schema()
webhook_schema.url = UrlField(required=True)
webhook_schema.verify_ssl = BoolField(default=True)

schema = Schema()
schema.issue_webhooks = ListField(webhook_schema)
schema.merge_request_webhooks = ListField(webhook_schema)

config = schema()

wh = webhook_schema()
wh.url = 'https://google.com'
config.issue_webhooks.append(wh)

(saved as test.py for example) - Running:

python test.py 
Traceback (most recent call last):
  File "test.py", line 15, in <module>
    config.issue_webhooks.append(wh)
AttributeError: 'NoneType' object has no attribute 'append'

Version: cincoconfig==0.9.0

Python: 3.10.12

OS: Linux

Implement a SQLITE format

I had this idea this morning.

Why

  1. SQLite3 is built into Python
  2. Lots of web applications (namely Django) use SQLite3

TODO

  1. I'm not sure what dumps() would look like for sqlite
  2. Also, for it to be useful, config.save() would need to be patched by the sqlite format to not overwrite the file if it exists and instead just load it and insert into the database

It's hard

  1. How do we deal with a changing schema?
  2. Who actually wants this?
  3. If we were to support sqlite config and use it for a django app...how would that make anything better?

This also might be a terrible idea. Feel free to just close this if it's dumb.

IPv4Network min/max prefix length

Implement options for validating an IPv4 network with a minimum and maximum prefix length. For example, these options could be used to filter out single IP addresses (max_prefix_length = 31) and filter out class A networks (min_prefix_length = 9).
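
The check itself is small when built on the standard library's `ipaddress` module. In this sketch `validate_ipv4_network` is a hypothetical function, and the parameter names are taken from the examples in the paragraph above.

```python
import ipaddress
from typing import Optional


def validate_ipv4_network(value: str, min_prefix_length: Optional[int] = None,
                          max_prefix_length: Optional[int] = None) -> ipaddress.IPv4Network:
    """Parse an IPv4 network and enforce optional prefix length bounds."""
    net = ipaddress.IPv4Network(value)  # raises ValueError on malformed input
    if min_prefix_length is not None and net.prefixlen < min_prefix_length:
        raise ValueError("prefix length must be at least %d" % min_prefix_length)
    if max_prefix_length is not None and net.prefixlen > max_prefix_length:
        raise ValueError("prefix length must be at most %d" % max_prefix_length)
    return net
```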

Implement configuration path parsing

It seems like it would be useful if there was a top-level helper method (maybe called get_path) in Field that would return the full path to that field. I imagine the definition being something like this:

class Field:
    def get_path(self, sep="."):
        pass

Additionally, we will need to add support in the BaseSchema to index by path. I would expect the usage and output to look like this:

Usage

schema = Schema()
schema.mode = ApplicationModeField(default='production')
schema.http.port = PortField(default=8080, required=True)
schema.http.ssl.enabled = BoolField(default=False)
schema.some.really.nested.config.value.that.I.would.rather.not.have.to.type.all.the.time = BoolField(default=False)

mode_p = schema.mode.get_path()
port_p = schema.http.port.get_path()
ssl_p = schema.http.ssl.enabled.get_path(sep="/")
long_p = schema.some.really.nested.config.value.that.I.would.rather.not.have.to.type.all.the.time.get_path()

print(mode_p)
print(port_p)
print(ssl_p)
print(long_p)

print()

print(schema[mode_p])
print(schema[port_p])
print(schema[ssl_p])
print(schema[long_p])

print()

config = schema()

print(config[mode_p])
print(config[port_p])
print(config[ssl_p])
print(config[long_p])

Output

mode
http.port
http/ssl/enabled
some.really.nested.config.value.that.I.would.rather.not.have.to.type.all.the.time

ApplicationModeField@sfhksdhfd
PortField@sfhksdjf
BoolField@slfhkjdsf
BoolField@shfkjdsf

production
8080
False
False

NOTE: Per the output example above, both get_path and the path indexing should work on a BaseSchema object and on a BaseConfig.

Feature Flag Field

An application feature can be defined as a single sub schema or config. For example, an application may support exporting data to Elasticsearch, and the corresponding configuration would look like:

schema.elastic.enabled = BoolField(default=False)
schema.elastic.url = UrlField(required=True)

In practice, any required configuration options should be ignored when the feature is disabled (i.e., do not perform validation if the feature is disabled). Cincoconfig's current design doesn't allow for this and will always raise a validation error when the URL is not specified or is not valid.

This issue would add a new field, FeatureFlagField that, when set to False, would disable the bound configuration's validation.
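
The intended behavior can be sketched with plain dicts: validation of a section only runs when its flag is true. `validate_feature` is a hypothetical helper, and returning a list of missing keys rather than raising is an assumption made to keep the example short.

```python
from typing import Any, Dict, Iterable, List


def validate_feature(section: Dict[str, Any], required: Iterable[str],
                     flag: str = "enabled") -> List[str]:
    """Return the missing required keys, or nothing if the feature is off."""
    if not section.get(flag, False):
        return []  # feature disabled: skip validation entirely
    return [key for key in required if section.get(key) is None]
```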

Set config value via full path

It'd be nice to be able to set a configuration value programmatically via its full path:

config['db.ssl.enabled'] = True

This would split the key by '.' and then navigate the tree to the actual field / config.
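
The split and navigate step could look like this sketch on a nested dict. `set_by_path` is a hypothetical helper. In cincoconfig the same logic would presumably live in the config's `__setitem__`, which is an assumption on my part.

```python
from typing import Any, Dict


def set_by_path(tree: Dict[str, Any], path: str, value: Any, sep: str = ".") -> None:
    """Set a nested value addressed by a separator-delimited path."""
    *parents, leaf = path.split(sep)
    node = tree
    for key in parents:
        node = node[key]  # raises KeyError if an intermediate key is missing
    node[leaf] = value
```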

Empty ListField not properly dealt with

If a ListField is left empty in the config, the value is left as None and causes an error when trying to output to a dict or JSON.

from cincoconfig import *

schema = Schema()

schema.something = ListField()

config = schema()
print(config.dumps(format='json', pretty=True).decode())

causes the error:

Traceback (most recent call last):
  File ".\configtest.py", line 9, in <module>
    print(config.dumps(format='json', pretty=True).decode())
  File "C:\Users\user\Documents\test\venv\lib\site-packages\cincoconfig\config.py", line 518, in dumps
    return formatter.dumps(self, self.to_tree(virtual=virtual, secure_mask=secure_mask))
  File "C:\Users\user\Documents\test\venv\lib\site-packages\cincoconfig\config.py", line 675, in to_tree
    value = field.to_basic(self, field.__getval__(self))
  File "C:\Users\user\Documents\test\venv\lib\site-packages\cincoconfig\fields.py", line 679, in to_basic
    return list(value)
TypeError: 'NoneType' object is not iterable

List of complex types doesn't print full config path on error

The following code:

import getpass
from cincoconfig import *

# first, define the configuration's schema -- the fields available that
# customize the application's or library's behavior
schema = Schema()

# Create a user account schema to be nested
user_account_schema = Schema()
user_account_schema.username = StringField(required=True, transform_case="lower", transform_strip=True)
user_account_schema.password = ChallengeField("sha512", required=True)
user_account_schema.groups = ListField(default=lambda: [])
UserAccount = user_account_schema.make_type("UserAccount")

# Add to top-level schema
schema.user_accounts = ListField(user_account_schema, default=lambda: [])


config = schema()

config.user_accounts.append(UserAccount(
    username="User"
))

Triggers the following exception:

Traceback (most recent call last):
  File ".\example.py", line 25, in <module>
    username="User"
  File "C:\Users\Documents\GitRepo\cincoconfig\cincoconfig\fields.py", line 494, in append   
    super().append(self._validate(item))
  File "C:\Users\Documents\GitRepo\cincoconfig\cincoconfig\fields.py", line 537, in _validate
    value.validate()
  File "C:\Users\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 686, in validate 
    self._schema._validate(self)  # type: ignore
  File "C:\Users\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 264, in _validate
    field.validate(config, val)
  File "C:\Users\Documents\GitRepo\cincoconfig\cincoconfig\abc.py", line 154, in validate    
    raise ValueError('%s is required' % self.name)
ValueError: password is required

Expected something more like:

ValueError: user_accounts index 0, password is required

human readable time fields

Add in a field to validate human readable time strings like 5d20h30m15s

This is my proof of concept I was able to cook up. The to_python and to_basic might need some work on serialization, as the timeparse library is very flexible and if you read in one format, then serialize the cincoconfig to JSON, it won't necessarily be in the format that it was input as. The serialized format will still be a valid config, it just won't necessarily look like the values that were read in initially.

from datetime import timedelta
from typing import Any, Optional

from pytimeparse.timeparse import timeparse
from cincoconfig import Config, Field

class DurationField(Field):
    """
    A human readable duration field. Values are validated that they parse to timedelta.
    """

    storage_type = timedelta

    def __init__(self, **kwargs):
        """
        Override to make sure default is valid
        """
        value = kwargs.get("default")
        if value is not None:
            value = self._convert(value)
            kwargs["default"] = value
        super().__init__(**kwargs)

    def _convert(self, value: Any) -> timedelta:
        if isinstance(value, timedelta):
            return value
        elif isinstance(value, str):
            v = timeparse(value)
            if v is None:
                raise ValueError("value can not be parsed by any format")
            return timedelta(seconds=float(v))
        elif isinstance(value, float):
            return timedelta(seconds=value)
        elif isinstance(value, int):
            return timedelta(seconds=float(value))

        raise ValueError("value must be timedelta, str or number, not %s" % type(value).__name__)

    def _validate(self, cfg: Config, value: Any) -> timedelta:
        return self._convert(value)

    def to_basic(self, cfg: Config, value: timedelta) -> str:
        return str(value)

    def to_python(self, cfg: Config, value: str) -> Optional[timedelta]:
        return self._convert(value)

fields.py is getting huge. Break into a module

Something like:

fields/
    __init__.py # __all__ = ('All', 'The', 'Fields',)
    secure.py # SecureField
    string.py # StringField, ApplicationModeField, etc...
    number.py # All number fields
    list.py # List fields and ListProxy
    bool.py # Bool field
    etc...

Config Validator

Create an overall config validator hook that can validate the entire config after it's been loaded.
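
A rough sketch of what such a hook could look like, using a plain dict as the loaded config. `load_with_validators` and `require_ssl_files` are hypothetical names, and the convention that a hook raises ValueError on failure is an assumption.

```python
from typing import Any, Callable, Dict, List

ConfigValidator = Callable[[Dict[str, Any]], None]


def load_with_validators(tree: Dict[str, Any],
                         validators: List[ConfigValidator]) -> Dict[str, Any]:
    """Load a tree, then run whole-config validators over the result."""
    config = dict(tree)
    for validate in validators:
        validate(config)  # each hook may raise ValueError
    return config


def require_ssl_files(config: Dict[str, Any]) -> None:
    """Example hook: a cross-field rule that no single field can check."""
    ssl = config.get("ssl", {})
    if ssl.get("enabled") and not ssl.get("certfile"):
        raise ValueError("ssl.certfile is required when ssl.enabled is true")
```

The value of a config-level hook is that it can express rules spanning several fields, such as the SSL example here.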
