ameily / cincoconfig: Python Cinco Config
License: ISC License
Several cincoconfig design choices were made to keep mypy, pylint, etc. happy, including breaking out Schema and Config into abstract base classes, Base*, with concrete implementations in the Schema and Config classes. This design made it cumbersome to keep mypy/pylint happy while keeping the API intuitive. For example, when adding field/config reference paths to error messages, several assumptions had to be made that the base type was really fully implemented. Additionally, because of the design, several measures were needed to prevent cyclic imports.
The refactor, in preparation for a stable v1.0.0 release, would make the API more intuitive, use better and more Pythonic implementations, and keep mypy/pylint happy with minimal ignore/disable comments.
The refactor will allow more accurate field/config reference paths and allow the API to easily grow as features are added.
Create a new attribute for the Field class, Field.env, that specifies the environment variable that overrides the config value. The env variable would override both the Field.default and any values loaded from a configuration file. So, if a configuration file sets a field to X but the corresponding environment variable is set to Y, the config's value would be Y (the env variable).
To make it easier, there will be a helper value that will autogenerate the environment variable name based on the Field.key:
# these two lines are equivalent
schema.db.host = HostnameField(env='DB_HOST')
schema.db.host = HostnameField(env=True)
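A minimal sketch of the precedence described above, written without cincoconfig itself. The helper names env_name and resolve are hypothetical, as is the convention that a key path such as "db.host" maps to DB_HOST and that a file value of None means "not set in the file":

```python
import os
from typing import Any, Mapping, Optional, Union


def env_name(key_path: str, env: Union[str, bool, None]) -> Optional[str]:
    """Return the environment variable name for a field, or None if disabled."""
    if env is True:
        # Assumed convention: "db.host" becomes "DB_HOST"
        return key_path.replace(".", "_").upper()
    # An explicit string is used as-is; False, None and "" disable the override
    return env or None


def resolve(key_path: str, env: Union[str, bool, None], default: Any,
            file_value: Any = None,
            environ: Mapping[str, str] = os.environ) -> Any:
    """Pick the effective value: environment first, then file, then default."""
    name = env_name(key_path, env)
    if name and name in environ:
        return environ[name]
    if file_value is not None:
        return file_value
    return default
```

Passing the environment as a mapping keeps the sketch easy to test; a real field would still need to run the string from the environment through its normal validation.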
Provide a function similar to dumps that returns a dictionary of the data within a schema. A method named dict() would make sense and be consistent with Pydantic.
Currently using schema.field._data for this functionality.
Code:
from cincoconfig import *
# first, define the configuration's schema -- the fields available that
# customize the application's or library's behavior
schema = Schema()
# User account schema
user_account_schema = Schema()
user_account_schema.username = StringField(required=True)
user_account_schema.password = SecureField(required=True)
user_account_schema.groups = ListField(default=lambda: [])
UserAccount = user_account_schema.make_type("UserAccount")
schema.user_accounts = ListField(user_account_schema, default=lambda: [])
config = schema() # Compile
ua = UserAccount(
    username="user",
    password="password"
)
config.user_accounts.append(ua)
print(config.dumps(format='json', pretty=True))
Error:
Traceback (most recent call last):
File ".\example.py", line 28, in <module>
print(config.dumps(format='json', pretty=True))
File "C:\Users\\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 487, in dumps
return formatter.dumps(self, self.to_tree())
File "C:\Users\\Documents\GitRepo\cincoconfig\cincoconfig\formats\json.py", line 42, in dumps
return json.dumps(tree, indent=2 if self.pretty else None).encode()
File "C:\Python37\lib\json\__init__.py", line 238, in dumps
**kw).encode(obj)
File "C:\Python37\lib\json\encoder.py", line 201, in encode
chunks = list(chunks)
File "C:\Python37\lib\json\encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "C:\Python37\lib\json\encoder.py", line 405, in _iterencode_dict
yield from chunks
File "C:\Python37\lib\json\encoder.py", line 325, in _iterencode_list
yield from chunks
File "C:\Python37\lib\json\encoder.py", line 438, in _iterencode
o = _default(o)
File "C:\Python37\lib\json\encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type UserAccount is not JSON serializable
I just want BaseField.__setdefault__, or wherever is appropriate, to set an internal flag identifying whether the field was set using its default value. The flag would be cleared if the value was later updated.
The rationale being that you might set a field to 127.0.0.1 or something by default, but you might also configure it to be 127.0.0.1 in the configuration. Whether an explicit action was taken to configure something should be known.
The use case is that a sane default should be set, but after parsing the config file and analyzing other configuration values, you might want to set the value to something else based on non-trivial logic.
A more trivial example (which could be another feature in its own right -- linking fields to the value of others; circular deps are hard though):
{
    "item1": "abc",
    "item2": "def",
    "item3": "ghi",
    "item4": null
}
schema = Schema()
schema.item1 = StringField(required=True)
schema.item2 = StringField(required=True, default='jkl')
schema.item3 = StringField(required=True)
schema.item4 = StringField(required=True, default='jkl')
cfg = schema()
if cfg.item4.isdefault():
    # Set item4 to whatever item2 is configured as
    # even if it is the default value, they should sync
    # if item4 wasn't explicitly configured
    cfg.item4 = cfg.item2
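One way the flag could be tracked, as a rough sketch outside of cincoconfig. The TrackedValues class and its method names are made up for illustration; the only point is that an explicit set clears the flag even when the new value equals the default:

```python
from typing import Any, Dict, Set


class TrackedValues:
    """Stores values and remembers which keys still hold their default."""

    def __init__(self, defaults: Dict[str, Any]) -> None:
        self._data: Dict[str, Any] = dict(defaults)
        # Every key starts out flagged as "came from the default"
        self._defaulted: Set[str] = set(defaults)

    def set(self, key: str, value: Any) -> None:
        self._data[key] = value
        # Any explicit assignment clears the flag, even if value == default
        self._defaulted.discard(key)

    def get(self, key: str) -> Any:
        return self._data[key]

    def is_default(self, key: str) -> bool:
        return key in self._defaulted


values = TrackedValues({"item2": "jkl", "item4": "jkl"})
values.set("item2", "def")          # simulates loading item2 from the file
if values.is_default("item4"):
    # internal sync: write the data directly so item4 stays flagged as default
    values._data["item4"] = values.get("item2")
```

Whether the sync in the last step should itself clear the flag is a design decision; here it is left set so a later explicit value can still be told apart.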
For example, having a config file testme.json:
{
    "something": true
}
This value is considered valid for:
- IntField (considered a 1)
- PortField (considered a 1)
- HostnameField (converted to "0.0.0.1")
- StringField (just converted to string)
- FilenameField (also just converted to a string)
- IPv4NetworkField (converted to "0.0.0.1/32")
All paths must be passed through os.path.expanduser so that they can handle ~/path/to/thing.
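On the boolean report above: in Python, bool is a subclass of int, so a plain isinstance(value, int) check accepts true as 1, which is probably how it slips through the numeric and address fields. A small sketch of an explicit guard follows; validate_int is a hypothetical stand-in, not the library's IntField:

```python
from typing import Any


def validate_int(value: Any) -> int:
    """Accept ints and numeric strings, but reject booleans explicitly."""
    # bool must be checked first because isinstance(True, int) is True
    if isinstance(value, bool):
        raise ValueError("expected an integer, got a boolean")
    if isinstance(value, int):
        return value
    if isinstance(value, str):
        return int(value)  # raises ValueError for non-numeric text
    raise ValueError("expected an integer, got %s" % type(value).__name__)
```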
Add a new argument to dumps and to_tree that can mask secure values.
Feature Idea: Add the ability to generate an ArgumentParser object based on a config schema. The application could retrieve this argparse object, parse args, then pass the args into cmdline_args_override.
The generated calls to add_argument could also add in support for parsing from environment variables:
import argparse
import os
parser = argparse.ArgumentParser(description='test')
parser.add_argument('--url', default=os.environ.get('URL'))
It would be nice to choose to include/exclude certain fields from being overridden, but I'm not sure what that would look like. It could be another argument to the Field class, or a list of config paths to include/exclude (ex, include=['config.http', 'config.port']).
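A rough sketch of what the generated parser might look like, using a flat mapping of dotted paths to defaults in place of a real schema. The build_parser name, the include argument, and the flag and environment naming conventions are all assumptions for illustration:

```python
import argparse
import os
from typing import Any, Dict, Iterable, Mapping, Optional


def build_parser(fields: Dict[str, Any],
                 include: Optional[Iterable[str]] = None,
                 environ: Mapping[str, str] = os.environ) -> argparse.ArgumentParser:
    """Create one --flag per field path, defaulting to the env var if set."""
    parser = argparse.ArgumentParser(description="generated from schema")
    allowed = set(include) if include is not None else None
    for path, default in fields.items():
        if allowed is not None and path not in allowed:
            continue  # field was not opted in to command line overrides
        flag = "--" + path.replace(".", "-").replace("_", "-")
        env_key = path.replace(".", "_").upper()
        # dest keeps the dotted path so results map back onto the config
        parser.add_argument(flag, dest=path,
                            default=environ.get(env_key, default))
    return parser


parser = build_parser({"http.port": "8080", "db.host": "localhost"},
                      environ={"DB_HOST": "db.internal"})
args = vars(parser.parse_args(["--http-port", "9000"]))
```

Because dest is the dotted path, the parsed values come back as a dict keyed by config path (via vars()), which is a convenient shape to hand to something like cmdline_args_override.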
Create a feature rich cincoconfig file format. The format parser and serializer must support:
My initial thought is something like an INI format:
[db]
host = 192.168.1.2
port = 10
# Sub sections are [X.Y]
[db.ssl]
enabled = true
client-cert = /path/to/cert
# List of objects (schemas) is multiple [X.Y] sections, where X.Y is the list field key
[db.users]
name = Adam
# list of simple types (int, str, float) can be comma separated or newline separated
groups = users, sudo, thing, item\, with a comma
[db.users]
name = Sean
[banner]
message = <
Very long message that can
span multiple lines. "\n" chars
are translated as single spaces.
But, two "\n" chars are a new
paragraph. This is similar rules
to Markdown.
Allow the Config.load() function to accept a list of file names. When a list is specified, the behavior becomes:
final = dict()
for tree in trees:
    self._deep_dict_update(final, tree)  # deep version of dict.update that handles lists and dicts
self.load_tree(final)
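The deep update helper is not spelled out above, so here is one possible sketch. It assumes that nested dicts merge key by key and that lists are concatenated; replacing lists outright would be an equally reasonable choice. The function names are illustrative:

```python
import copy
from typing import Any, Dict, Iterable


def deep_dict_update(target: Dict[str, Any], source: Dict[str, Any]) -> Dict[str, Any]:
    """Recursively merge source into target, in place."""
    for key, value in source.items():
        existing = target.get(key)
        if isinstance(value, dict) and isinstance(existing, dict):
            deep_dict_update(existing, value)
        elif isinstance(value, list) and isinstance(existing, list):
            # Assumption: later files append to earlier lists
            existing.extend(copy.deepcopy(value))
        else:
            # Copy so later merges never mutate the loaded source trees
            target[key] = copy.deepcopy(value)
    return target


def merge_trees(trees: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Fold the trees left to right; later trees win on conflicts."""
    final: Dict[str, Any] = {}
    for tree in trees:
        deep_dict_update(final, tree)
    return final


base = {"db": {"host": "localhost", "port": 27017}, "groups": ["users"]}
override = {"db": {"host": "10.0.0.5"}, "groups": ["sudo"]}
merged = merge_trees([base, override])
```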
When creating a key file for the first time on Windows, the following exception is thrown.
Traceback (most recent call last):
File ".\example.py", line 53, in <module>
print(config.dumps(format='json', pretty=True).decode())
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 376, in dumps
return formatter.dumps(self, self.to_tree())
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 450, in to_tree
tree[key] = self._data[key].to_tree()
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 452, in to_tree
tree[key] = field.to_basic(self, field.__getval__(self))
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\fields.py", line 985, in to_basic
with cfg._keyfile as ctx:
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\encryption.py", line 134, in __enter__
self.__load_key()
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\encryption.py", line 111, in __load_key
self.generate_key()
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\encryption.py", line 120, in generate_key
with open(self.filename, 'wb') as fp:
FileNotFoundError: [Errno 2] No such file or directory: '~\\.cincokey'
Fix for the above (need os.path.expanduser):
DEFAULT_CINCOKEY_FILEPATH = os.path.join(os.path.expanduser("~"), ".cincokey")
After this fix, there was another issue in config.dumps:
Traceback (most recent call last):
File ".\example.py", line 56, in <module>
print(config.dumps(format='json', pretty=True).decode())
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 376, in dumps
return formatter.dumps(self, self.to_tree())
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 450, in to_tree
tree[key] = self._data[key].to_tree()
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 452, in to_tree
tree[key] = field.to_basic(self, field.__getval__(self))
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\fields.py", line 986, in to_basic
secret = ctx.encrypt(value, method=self.method)
File "D:\Documents\GitRepo\cincoconfig\cincoconfig\encryption.py", line 174, in encrypt
raise TypeError('key file is not open')
TypeError: key file is not open
The above only happened on initial key generation. This was because, in generate_key(), self.__key was not being set. Fix:
def generate_key(self) -> None:
    '''
    Generate a random 32 byte key and save it to ``filename``.
    '''
    self.__key = os.urandom(32)
    with open(self.filename, 'wb') as fp:
        fp.write(self.__key)
It'd be nice to be able to securely store sensitive config values within a config file. It doesn't have to be perfect, it'd just be nice to have some protection in place to stop someone from viewing credentials in plain text.
One way we can do this is to copy how Django does it with a SECRET_KEY, where the application has a secret key hardcoded. The secret key would be application-specific.
With our goal of limiting dependencies, we'd have to stick with the standard library.
We would also need a script to generate secure values. Or somehow flag a value as secure. In a perfect world, parsing a config file like this:
{
"mongodb": {
"creds": "user:password"
}
}
The parser would read the creds string value, transparently encrypt it, and write it back to the config:
{
"mongodb": {
"creds": {
"type": "secure_value",
"value": "<salt>:<encrypted value>"
}
}
}
So this issue contains two parts:
- A Field that can read both string and secure values and automatically secure string values. This can be done in the API right now by implementing Field.to_python.
Feature Idea: Add a "hint" argument to the Field class that stores information about what the config option does. Two possible uses:
from cincoconfig import *
# first, define the configuration's schema -- the fields available that
# customize the application's or library's behavior
schema = Schema()
schema.mode = ApplicationModeField(
default='production', hint="Mode of application Operation. Development mode enables additional logging and features."
)
# ...
# set a config value manually
if config.mode == 'production':
    config.db.name = config.db.name + '_production'
print(config.dumps(format='yaml', hints=True).decode())
db:
  host: localhost
  name: my_app_production
  password: null
  port: 27017
  user: null
http:
  address: 127.0.0.1
  port: 8080
  ssl:
    cafile: null
    certfile: null
    enabled: false
    keyfile: null
# Mode of application Operation. Development mode enables additional logging and features.
mode: production
Update the dev toolchain based on my experience with other tools:
Implement the INI config file format.
Create a new field that can include another configuration file. The idea is that the configuration can load values from a shared or common config:
{
"val": 1,
"x": "y",
"include": "/path/to/other.json"
}
This would require a new attribute in the BaseConfig class, deferred_includes, holding the includes that are evaluated after a config is loaded but before validation.
The pylint missing-docstring error is disabled for the time being to allow the build to pass while we are still developing the foundation. Enable this error message once the foundation is good.
IncludeFields only seem to be parsed if the IncludeField is a value at root depth. For example:
log_level: "warn"
include: "path/to/file.yml"
Will evaluate the include declaration, and import any values from there, but:
log_level: "warn"
db:
  include: "path/to/file.yml"
will not go to path/to/file.yml and parse values there when cincoconfig.config.Schema.loads is called. The reason seems to be this statement, which doesn't handle the case where field is a schema that might itself contain an IncludeField:
cincoconfig.config Schema.loads()
includes = [(key, field) for key, field in self._schema._fields.items()
            if isinstance(field, IncludeField)]
My reasoning for this being a bug is that it seems to contradict the IncludeField docstring, which states that the following is valid:
# file1.yaml
db:
  include: "db.yaml"
include: "core.yaml"
# db.yaml
host: "0.0.0.0"
port: 27017
# core.yaml
mode: "production"
ssl: true
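To illustrate the kind of recursive lookup that would pick up nested includes, here is a self-contained sketch. The IncludeField and SubSchema classes below are bare stand-ins defined only for this example, not the cincoconfig classes, and find_includes is a hypothetical helper:

```python
from typing import Any, Dict, List


class IncludeField:
    """Stand-in marker for a field that pulls in another file."""


class SubSchema:
    """Stand-in for a nested schema holding its own fields."""

    def __init__(self, fields: Dict[str, Any]) -> None:
        self.fields = fields


def find_includes(fields: Dict[str, Any], prefix: str = "") -> List[str]:
    """Return the dotted path of every include field, at any depth."""
    found: List[str] = []
    for key, field in fields.items():
        path = prefix + "." + key if prefix else key
        if isinstance(field, IncludeField):
            found.append(path)
        elif isinstance(field, SubSchema):
            # Descend instead of only scanning the root level
            found.extend(find_includes(field.fields, path))
    return found


fields = {
    "mode": object(),
    "include": IncludeField(),
    "db": SubSchema({"host": object(), "include": IncludeField()}),
}
paths = find_includes(fields)
```

With the root-only list comprehension quoted above, only the first of these two paths would be discovered.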
To support UIs that wrap a configuration, it would be helpful if all validation errors were collected and returned as a list rather than raising an exception on the first validation error. This would allow UIs to collect all validation errors and display them.
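A minimal sketch of the collect-then-report idea, independent of cincoconfig. The validators here are plain functions that raise ValueError, and collect_errors, require_port and require_text are hypothetical names:

```python
from typing import Any, Callable, Dict, List, Tuple


def require_port(value: Any) -> int:
    if isinstance(value, bool) or not isinstance(value, int) or not 1 <= value <= 65535:
        raise ValueError("must be a port number between 1 and 65535")
    return value


def require_text(value: Any) -> str:
    if not isinstance(value, str) or not value:
        raise ValueError("must be a non-empty string")
    return value


def collect_errors(values: Dict[str, Any],
                   validators: Dict[str, Callable[[Any], Any]]) -> List[Tuple[str, str]]:
    """Run every validator and return (key, message) pairs for the failures."""
    errors: List[Tuple[str, str]] = []
    for key, validate in validators.items():
        try:
            validate(values.get(key))
        except ValueError as exc:
            # Keep going so the caller sees every problem at once
            errors.append((key, str(exc)))
    return errors


validators = {"port": require_port, "host": require_text}
errors = collect_errors({"port": 70000, "host": ""}, validators)
```

A UI can then render the whole list, while existing callers could keep the raise-on-first-error behavior by raising when the list is non-empty.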
If you do something like this:
ssl_ca_certs = FilenameField(exists="file", required=False)
and the value in the config is null, you get a ValueError. I would expect that, since I'm saying it's not required (meaning the value can be null), this wouldn't raise an exception.
Allow a new field type, "Map Field", which accepts a dictionary and validates both the keys and the values against given field types.
Alternatively, update DictField to support this behavior.
Here is a PoC:
from typing import Any, Dict, Optional, Type, Union


class MapField(Field):
    def __init__(self, value_field: Union[BaseField, Type[ConfigType]],
                 key_field: Optional[StringField] = None, *args, **kwargs):
        self.key_field = key_field or StringField(required=True)
        self.value_field = value_field
        super().__init__(*args, **kwargs)

    def _validate(self, cfg: 'Config', value: Dict[str, Any]) -> Any:
        for k, v in value.items():
            # reject non-string keys before running the field validators
            if type(k) is not str:
                raise ValidationError(cfg, self, 'keys must be strings')
            self.key_field.validate(cfg, k)
            self.value_field.validate(cfg, v)
        return value
For example, a nested SecureField: config.one.two.three.password = SecureField() will raise a ValueError if set to something invalid and that error will be something like: Error: password invalid when it should be Error: one.two.three.password invalid
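One way to get the full path is for each node to keep a reference to its parent and walk upward when building the message. The Node class and full_path method below are hypothetical and only sketch the idea:

```python
from typing import List, Optional


class Node:
    """A schema, config, or field that knows its key and its parent."""

    def __init__(self, key: str, parent: Optional["Node"] = None) -> None:
        self.key = key
        self.parent = parent

    def full_path(self, sep: str = ".") -> str:
        """Join the keys from the root down to this node."""
        parts: List[str] = []
        node: Optional[Node] = self
        # The root has an empty key, which ends the walk
        while node is not None and node.key:
            parts.append(node.key)
            node = node.parent
        return sep.join(reversed(parts))


root = Node("")
one = Node("one", root)
two = Node("two", one)
three = Node("three", two)
password = Node("password", three)
message = "Error: %s invalid" % password.full_path()
```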
The error message for ApplicationMode and really any StringField with choices should be updated to include the valid choices. This could blow up error output, so I propose either hard-coding a limited number of choices to display or exposing an option in the field to display valid choices on error.
Using the recipe from the docs: https://cincoconfig.readthedocs.io/en/latest/recipes.html
from cincoconfig import Schema, UrlField, BoolField, ListField
webhook_schema = Schema()
webhook_schema.url = UrlField(required=True)
webhook_schema.verify_ssl = BoolField(default=True)
schema = Schema()
schema.issue_webhooks = ListField(webhook_schema)
schema.merge_request_webhooks = ListField(webhook_schema)
config = schema()
wh = webhook_schema()
wh.url = 'https://google.com'
config.issue_webhooks.append(wh)
(saved as test.py for example) - Running:
python test.py
Traceback (most recent call last):
File "test.py", line 15, in <module>
config.issue_webhooks.append(wh)
AttributeError: 'NoneType' object has no attribute 'append'
Version: cincoconfig==0.9.0
Python: 3.10.12
OS: Linux
I had this idea this morning.
- It's unclear what dumps() would look like for sqlite
- config.save() would need to be patched by the sqlite format to not overwrite the file if it exists and instead just load it and insert into the database
This also might be a terrible idea. Feel free to just close this if it's dumb.
Implement options for validating an IPv4 network with a minimum and maximum prefix length. For example, these options could be used to filter out single IP addresses (max_prefix_length = 31) and filter out class A networks (min_prefix_length = 9).
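A sketch of the check using the standard library ipaddress module. The function name validate_network is hypothetical; the two keyword names mirror the options proposed above:

```python
import ipaddress
from typing import Optional


def validate_network(value: str,
                     min_prefix_length: Optional[int] = None,
                     max_prefix_length: Optional[int] = None) -> ipaddress.IPv4Network:
    """Parse an IPv4 network and enforce optional prefix length bounds."""
    # A bare address such as "192.168.1.5" parses as a /32 network
    network = ipaddress.IPv4Network(value)
    if min_prefix_length is not None and network.prefixlen < min_prefix_length:
        raise ValueError("prefix length must be at least %d" % min_prefix_length)
    if max_prefix_length is not None and network.prefixlen > max_prefix_length:
        raise ValueError("prefix length must be at most %d" % max_prefix_length)
    return network


network = validate_network("192.168.1.0/24",
                           min_prefix_length=9, max_prefix_length=31)
```

With these bounds, "10.0.0.0/8" fails the minimum and a single address fails the maximum, matching the two filtering examples in the request.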
It seems like it would be useful if there was a top-level helper method (maybe called get_path) in Field that would return the full path to that field. I imagine the definition being something like this:
class Field:
    def get_path(self, sep="."):
        pass
Additionally, we will need to add support in the BaseSchema to index by path. I would expect the usage and output to look like this:
schema = Schema()
schema.mode = ApplicationModeField(default='production')
schema.http.port = PortField(default=8080, required=True)
schema.http.ssl.enabled = BoolField(default=False)
schema.some.really.nested.config.value.that.I.would.rather.not.have.to.type.all.the.time = BoolField(default=False)
mode_p = schema.mode.get_path()
port_p = schema.http.port.get_path()
ssl_p = schema.http.ssl.enabled.get_path(sep="/")
long_p = schema.some.really.nested.config.value.that.I.would.rather.not.have.to.type.all.the.time.get_path()
print(mode_p)
print(port_p)
print(ssl_p)
print(long_p)
print()
print(schema[mode_p])
print(schema[port_p])
print(schema[ssl_p])
print(schema[long_p])
print()
config = schema()
print(config[mode_p])
print(config[port_p])
print(config[ssl_p])
print(config[long_p])
mode
http.port
http/ssl/enabled
some.really.nested.config.value.that.I.would.rather.not.have.to.type.all.the.time
ApplicationModeField@sfhksdhfd
PortField@sfhksdjf
BoolField@slfhkjdsf
BoolField@shfkjdsf
production
8080
False
False
NOTE: Per the output example above, both get_path and the path indexing should work on a BaseSchema object and a BaseConfig.
An application feature can be defined as a single sub schema or config. For example, an application may support exporting data to Elasticsearch, and the corresponding configuration would look like:
schema.elastic.enabled = BoolField(default=False)
schema.elastic.url = UrlField(required=True)
In practice, any required configuration options should be ignored when the feature is disabled (i.e., do not perform validation if the feature is disabled). Cincoconfig's current design doesn't allow for this and will always raise a validation error when URL is not specified or is not valid.
This issue would add a new field, FeatureFlagField, that, when set to False, would disable the bound configuration's validation.
It'd be nice to be able to set a configuration value programmatically via its full path:
config['db.ssl.enabled'] = True
This would split the key by '.' and then navigate the tree to the actual field / config.
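The split-and-navigate step could look roughly like this, shown on nested dicts rather than real config objects. set_by_path and get_by_path are hypothetical helper names:

```python
from typing import Any, Dict


def set_by_path(tree: Dict[str, Any], path: str, value: Any, sep: str = ".") -> None:
    """Walk to the parent of the last key, then assign the value there."""
    *parents, leaf = path.split(sep)
    node = tree
    for part in parents:
        node = node[part]  # raises KeyError if an intermediate key is missing
    node[leaf] = value


def get_by_path(tree: Dict[str, Any], path: str, sep: str = ".") -> Any:
    """Follow each key in turn and return whatever is at the end."""
    node: Any = tree
    for part in path.split(sep):
        node = node[part]
    return node


tree = {"db": {"ssl": {"enabled": False}}}
set_by_path(tree, "db.ssl.enabled", True)
```

In the library itself, the same walk would live in the config's __setitem__ and __getitem__ so that config['db.ssl.enabled'] = True works as shown.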
If a ListField is left empty in the config, the value is left as None, which causes an error when trying to output to a dict or JSON.
from cincoconfig import *
schema = Schema()
schema.something = ListField()
config = schema()
print(config.dumps(format='json', pretty=True).decode())
causes the error:
Traceback (most recent call last):
File ".\configtest.py", line 9, in <module>
print(config.dumps(format='json', pretty=True).decode())
File "C:\Users\user\Documents\test\venv\lib\site-packages\cincoconfig\config.py", line 518, in dumps
return formatter.dumps(self, self.to_tree(virtual=virtual, secure_mask=secure_mask))
File "C:\Users\user\Documents\test\venv\lib\site-packages\cincoconfig\config.py", line 675, in to_tree
value = field.to_basic(self, field.__getval__(self))
File "C:\Users\user\Documents\test\venv\lib\site-packages\cincoconfig\fields.py", line 679, in to_basic
return list(value)
TypeError: 'NoneType' object is not iterable
The following code:
import getpass
from cincoconfig import *
# first, define the configuration's schema -- the fields available that
# customize the application's or library's behavior
schema = Schema()
# Create a user account schema to be nested
user_account_schema = Schema()
user_account_schema.username = StringField(required=True, transform_case="lower", transform_strip=True)
user_account_schema.password = ChallengeField("sha512", required=True)
user_account_schema.groups = ListField(default=lambda: [])
UserAccount = user_account_schema.make_type("UserAccount")
# Add to top-level schema
schema.user_accounts = ListField(user_account_schema, default=lambda: [])
config = schema()
config.user_accounts.append(UserAccount(
    username="User"
))
Triggers the following exception:
Traceback (most recent call last):
File ".\example.py", line 25, in <module>
username="User"
File "C:\Users\Documents\GitRepo\cincoconfig\cincoconfig\fields.py", line 494, in append
super().append(self._validate(item))
File "C:\Users\Documents\GitRepo\cincoconfig\cincoconfig\fields.py", line 537, in _validate
value.validate()
File "C:\Users\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 686, in validate
self._schema._validate(self) # type: ignore
File "C:\Users\Documents\GitRepo\cincoconfig\cincoconfig\config.py", line 264, in _validate
field.validate(config, val)
File "C:\Users\Documents\GitRepo\cincoconfig\cincoconfig\abc.py", line 154, in validate
raise ValueError('%s is required' % self.name)
ValueError: password is required
Expected something more like:
ValueError: user_accounts index 0, password is required
Add in a field to validate human readable time strings like 5d20h30m15s.
This is the proof of concept I was able to cook up. The to_python and to_basic methods might need some work on serialization, as the timeparse library is very flexible: if you read in one format and then serialize the cincoconfig to JSON, the output won't necessarily be in the format it was input as. The serialized format will still be a valid config, it just won't necessarily look like the values that were read in initially.
from datetime import timedelta
from typing import Any, Optional

from pytimeparse.timeparse import timeparse
from cincoconfig import Config, Field


class DurationField(Field):
    """
    A human readable duration field. Values are validated that they parse to timedelta.
    """
    storage_type = timedelta

    def __init__(self, **kwargs):
        """
        Override to make sure default is valid
        """
        value = kwargs.get("default")
        if value is not None:
            value = self._convert(value)
            kwargs["default"] = value
        super().__init__(**kwargs)

    def _convert(self, value: Any) -> timedelta:
        # normalize every accepted input type to a timedelta
        if isinstance(value, timedelta):
            return value
        elif isinstance(value, str):
            v = timeparse(value)
            if v is None:
                raise ValueError("value can not be parsed by any format")
            return timedelta(seconds=float(v))
        elif isinstance(value, float):
            return timedelta(seconds=value)
        elif isinstance(value, int):
            return timedelta(seconds=float(value))
        raise ValueError("value must be timedelta, str or number, not %s" % type(value).__name__)

    def _validate(self, cfg: Config, value: Any) -> timedelta:
        return self._convert(value)

    def to_basic(self, cfg: Config, value: timedelta) -> str:
        return str(value)

    def to_python(self, cfg: Config, value: str) -> Optional[timedelta]:
        return self._convert(value)
Something like:
fields/
    __init__.py  # __all__ = ('All', 'The', 'Fields',)
    secure.py    # SecureField
    string.py    # StringField, ApplicationModeField, etc...
    number.py    # All number fields
    list.py      # List fields and ListProxy
    bool.py      # Bool field
    etc...
Create an overall config validator hook that can validate the entire config after it's been loaded.
Implement the XML format.
Empty strings should probably be treated as None for SecureField.