google / ml_collections
ML Collections is a library of Python Collections designed for ML use cases.
Home Page: https://ml-collections.readthedocs.io/
License: Apache License 2.0
Hi, I am trying to do some more complicated configuration file setups, which include combining a couple of lazy ints into a lazy tuple. The following script is an example of how I can get it to work. However, the resulting FieldReference object stores a tuple but thinks it is storing an int.
from ml_collections import ConfigDict, FieldReference
from ml_collections.config_dict import _Op
if __name__ == "__main__":
    a = FieldReference(None, int)
    b = FieldReference(None, int)
    a_tuple = FieldReference(a, op=_Op(lambda x: (x,), ()))
    b_tuple = FieldReference(b, op=_Op(lambda x: (x,), ()))
    c = a_tuple + b_tuple

    a.set(1)
    b.set(2)

    print(f"a_tuple: {a_tuple.get()}")
    print(f"b_tuple: {b_tuple.get()}")
    print(f"c: {c.get()}")
    print(f"a_tuple type: {a_tuple._field_type}")
    print(f"b_tuple type: {b_tuple._field_type}")
    print(f"c type: {c._field_type}")
This outputs
a_tuple: (1,)
b_tuple: (2,)
c: (1, 2)
a_tuple type: <class 'int'>
b_tuple type: <class 'int'>
c type: <class 'int'>
This is probably a bug. It was run on Python 3.10.8 and ml_collections 0.1.1.
Note that if we change the script to
from ml_collections import ConfigDict, FieldReference
from ml_collections.config_dict import _Op
if __name__ == "__main__":
    a = FieldReference(None, int)
    b = FieldReference(None, int)
    a_tuple = FieldReference(a, tuple, op=_Op(lambda x: (x,), ()))
    b_tuple = FieldReference(b, tuple, op=_Op(lambda x: (x,), ()))
    c = a_tuple + b_tuple

    a.set(1)
    b.set(2)

    print(f"a_tuple: {a_tuple.get()}")
    print(f"b_tuple: {b_tuple.get()}")
    print(f"c: {c.get()}")
    print(f"a_tuple type: {a_tuple._field_type}")
    print(f"b_tuple type: {b_tuple._field_type}")
    print(f"c type: {c._field_type}")
it throws an exception:
Traceback (most recent call last):
File "/home/andrius/repos/sde-sampling/tmp.py", line 8, in <module>
a_tuple = FieldReference(a, tuple, op=_Op(lambda x : (x,), ()))
File "/home/andrius/env/lib/python3.10/site-packages/ml_collections/config_dict/config_dict.py", line 248, in __init__
self.set(default)
File "/home/andrius/env/lib/python3.10/site-packages/ml_collections/config_dict/config_dict.py", line 305, in set
raise TypeError('Reference is of type {} but should be of type {}'
TypeError: Reference is of type <class 'int'> but should be of type <class 'tuple'>
I think it would be better to change the behaviour as follows:

- If field_type is not specified when a FieldReference object is constructed, set it to the type of the argument default, as is done now.
- If field_type is specified, then:
  - if op is not specified and field_type is different from the type of default, raise an exception;
  - if op is specified, raise an exception in get if the result of op is not of the field_type specified.

I'd be happy to file a PR if the authors agree.
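The proposed rules can be sketched as a small self-contained class. This is hypothetical illustration code: the class and attribute names are made up and do not match ml_collections internals.

```python
class LazyRef:
    """Hypothetical sketch of the proposed FieldReference validation rules."""

    def __init__(self, default, field_type=None, op=None):
        if field_type is None:
            # Rule 1: infer field_type from the default, as is done now.
            field_type = type(default)
        elif op is None and not isinstance(default, (field_type, LazyRef)):
            # Rule 2: explicit field_type, no op, mismatched default
            # -> fail eagerly at construction time.
            raise TypeError(f"default {default!r} is not of type {field_type.__name__}")
        self._default = default
        self._field_type = field_type
        self._op = op

    def get(self):
        value = self._default.get() if isinstance(self._default, LazyRef) else self._default
        if self._op is not None:
            value = self._op(value)
            # Rule 3: with an op, validate the *result* type lazily in get().
            if not isinstance(value, self._field_type):
                raise TypeError(
                    f"op result {value!r} is not of type {self._field_type.__name__}")
        return value
```

Under these rules, the first script's a_tuple would report field_type tuple, and the second script would no longer fail at construction.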
notebook code:
import multiprocessing as mp

from ml_collections import config_flags

_CONFIG = config_flags.DEFINE_config_file('config')
mp.set_start_method('spawn')
params = _CONFIG.value
I get an error:
---------------------------------------------------------------------------
UnparsedFlagAccessError Traceback (most recent call last)
Cell In[2], line 5
2 _CONFIG = config_flags.DEFINE_config_file('config')
3 mp.set_start_method('spawn')
----> 5 params = _CONFIG.value
File ~/anaconda3/envs/LLM/lib/python3.11/site-packages/absl/flags/_flagvalues.py:1376, in FlagHolder.value(self)
1365 @property
1366 def value(self):
1367 """Returns the value of the flag.
1368
1369 If ``_ensure_non_none_value`` is ``True``, then return value is not
(...)
1374 IllegalFlagValueError: if value is None unexpectedly.
1375 """
-> 1376 val = getattr(self._flagvalues, self._name)
1377 if self._ensure_non_none_value and val is None:
1378 raise _exceptions.IllegalFlagValueError(
1379 'Unexpected None value for flag %s' % self._name)
File ~/anaconda3/envs/LLM/lib/python3.11/site-packages/absl/flags/_flagvalues.py:481, in FlagValues.__getattr__(self, name)
479 return fl[name].value
480 else:
--> 481 raise _exceptions.UnparsedFlagAccessError(
482 'Trying to access flag --%s before flags were parsed.' % name)
UnparsedFlagAccessError: Trying to access flag --config before flags were parsed.
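As the error message says, the flag is being read before absl has parsed any flags: in a notebook there is no app.run(main) call to do the parsing. One workaround is to parse an explicit argv yourself before reading flag values. This is a sketch assuming absl-py is installed; the flag name and value below are made up for illustration:

```python
from absl import flags

# Define a flag as usual.
flags.DEFINE_string('demo_flag', 'default', 'A demo flag.')

# Parse an explicit argv list; the first element plays the role of the
# program name. After this call, flag values become readable.
flags.FLAGS(['notebook', '--demo_flag=hello'])

print(flags.FLAGS.demo_flag)
```

For the original snippet, you would parse something like `flags.FLAGS(['notebook', '--config=<path to your config>'])` before touching `_CONFIG.value`. Note that re-running a DEFINE_* cell in the same kernel raises a duplicate-flag error, so this is best kept in a run-once cell.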
Describe the bug
This project uses the imp module, which has been deprecated since Python 3.4 and removed in 3.12:

- PendingDeprecationWarning since 3.4 (2014)
- DeprecationWarning since 3.5 (2015)
- DeprecationWarning announcing removal in 3.12, since 3.10 (2021)

Python 3.12 is set for release on 2023-10-02, and this library is one of the top 5,000 most-downloaded from PyPI.
Please could you upgrade to use importlib? The imp docs have suggestions on what to use to replace each function and constant.
To Reproduce
git grep "\bimp\b"
Expected behavior
Works on Python 3.12 using importlib.
Would it be possible to get git tags for releases? This helps with packaging outside of PyPI, e.g. on NixOS. I looked through the commit history, but couldn't find a commit that looked like a release.
Hi,
This code is generating a lot of interest in the advanced research computing community.
Unfortunately, it currently cannot be supported on most clusters because of its lack of release versions.
This makes it very difficult to install the software in a reproducible way.
Any plan to address this would be appreciated:
Make proper release versions (I recommend Semantic versioning: https://semver.org/lang)
Thanks!
Hi,
There have been some important updates (such as enum support) since the current release. Do you have any plan to cut a new release? It would be more convenient for users; otherwise we have to pip install from the latest commit.
This is not a bug, but I cannot remove the bug label.
Thanks for open-sourcing this nice package!
I don't think I see why, in principle, FrozenConfigDict can't include fields which are tuples of frozen config dicts, given that Python tuples themselves are already frozen. Is it just a matter of support not having been added yet, or is there something more fundamental?
For example,
FrozenConfigDict({'a': ({'b': 1}, {'b': 2})})
gives the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-133-a093d88df5a2> in <module>
----> 1 c=FrozenConfigDict({'a': ({'b': 1}, {'b': 2})})
~/opt/anaconda3/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py in __init__(self, initial_dictionary, type_safe)
1635 type_safe=type_safe)
1636
-> 1637 _frozenconfigdict_valid_input(initial_configdict)
1638 # This will define the self._configdict attribute
1639 _frozenconfigdict_fill_seed(self, initial_configdict)
~/opt/anaconda3/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py in _frozenconfigdict_valid_input(obj, ancestor_list)
1418 if isinstance(obj, ConfigDict):
1419 for value in obj.values():
-> 1420 _frozenconfigdict_valid_input(value, ancestor_list)
1421 elif isinstance(obj, FieldReference):
1422 _frozenconfigdict_valid_input(obj.get(), ancestor_list)
~/opt/anaconda3/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py in _frozenconfigdict_valid_input(obj, ancestor_list)
1424 for element in obj:
1425 if isinstance(element, (dict, ConfigDict, FieldReference)):
-> 1426 raise ValueError('Bad FrozenConfigDict initialization: Cannot '
1427 'contain a dict, ConfigDict, or FieldReference '
1428 'within a list or tuple.')
ValueError: Bad FrozenConfigDict initialization: Cannot contain a dict, ConfigDict, or FieldReference within a list or tuple.
Describe the bug
ConfigDict() does not allow a dictionary with int keys.
To Reproduce
from ml_collections import ConfigDict
tmp = ConfigDict()
tmp.demo0 = 10 # no bug
tmp.demo1 = '10' # no bug
tmp.demo2 = {'0': '10', '1': '20'} # no bug
tmp.demo3 = {0: '10', 1: '20'} # bug
ConfigDict / ConfigFlags versions: I'm not sure how to provide these, so I created a brand-new conda virtual env to demonstrate the bug.
~$ conda create --name bug python=3.8
~$ conda activate bug
~$ pip install ml-collections
~$ python
from ml_collections import ConfigDict
tmp = ConfigDict()
tmp.demo = {0: '10', 1: '20'}
Expected behavior
The code should run and save the dictionary without issue.
Environment:
Additional context
When I change the key to str type, the code runs. However, I want to be able to use int keys.
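Until int keys are supported, a possible stdlib-only workaround (not a fix in ml_collections itself) is to stringify int keys before assigning the dict to a ConfigDict field and convert them back on read:

```python
def encode_keys(d):
    """Stringify int keys so the dict is safe to store in a ConfigDict."""
    return {str(k): v for k, v in d.items()}


def decode_keys(d):
    """Restore the original int keys when reading the dict back."""
    return {int(k): v for k, v in d.items()}


data = {0: '10', 1: '20'}
stored = encode_keys(data)        # safe to assign, e.g. tmp.demo3 = stored
assert decode_keys(stored) == data
```

This round-trips cleanly as long as all keys really are ints.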
I'm currently using the following pattern to load a config from dictionary
from absl import app, flags
from ml_collections import config_flags
from .config import MY_CONFIG_DICT
FLAGS = flags.FLAGS
_CONFIG = config_flags.DEFINE_config_from_dict('config', MY_CONFIG_DICT)
# ...
absl allows me to alter the config from the command line - this is great for running experiments with different parameters. However, if I want to re-use the same configuration for another run, there doesn't seem to be a de-facto way to do this.
For example, I want to be able to run:
python -m src.experiment --config.value1 10 --config.value2 20
and then be able to re-use this config without having to remember the flags.
How can we save and re-load a ConfigDict instance when using absl flags?
My current approach to this is:
1. In the experiment script: save the ConfigDict to a .yml file.
2. In a postprocessing script: load the .yml file.
# postprocessing.py
import yaml

from absl import flags
from ml_collections import config_dict, config_flags

from .config import MY_CONFIG_DICT

FLAGS = flags.FLAGS
_CONFIG = config_flags.DEFINE_config_from_dict('config', MY_CONFIG_DICT)
_LOAD_CONFIG = flags.DEFINE_string('load_config', None, '.yml file from which to update config.')

def update_config():
    def _update_config_dict(a, b) -> config_dict.ConfigDict:
        da = a.to_dict(preserve_field_references=True)
        db = b.to_dict(preserve_field_references=True)
        return config_dict.ConfigDict(da | db)

    with open(FLAGS.load_config, 'r') as f:
        config_from_yaml = yaml.load(f, Loader=yaml.UnsafeLoader)

    FLAGS.config = _update_config_dict(FLAGS.config, config_from_yaml)

def main(_):
    update_config()
    # ...
While this works it doesn't seem a great approach.
Is there any standard way to accomplish this - or should I be dealing with updated configs in a different manner?
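The round trip itself can be sketched with nothing but the stdlib. The example below uses plain dicts and JSON for illustration; with ml_collections you would dump cfg.to_dict() on the experiment side and wrap the loaded dict back in config_dict.ConfigDict on the postprocessing side. The values here are made up:

```python
import json
import os
import tempfile

# Experiment script side: persist the final (possibly CLI-overridden) config.
base = {'value1': 1, 'value2': 2}   # defaults from the config module
overrides = {'value1': 10}          # what --config.value1=10 changed
merged = {**base, **overrides}

path = os.path.join(tempfile.mkdtemp(), 'config.json')
with open(path, 'w') as f:
    json.dump(merged, f)

# Postprocessing script side: restore exactly what the experiment ran with.
with open(path) as f:
    reloaded = json.load(f)

assert reloaded == {'value1': 10, 'value2': 2}
```

Persisting the fully-resolved config at experiment time avoids having to re-supply the flags later.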
Say I have a configuration like
configs = ConfigDict()
configs.num_neurons = 15
configs.activation = "relu"
Now I would like to delete activation. Is it possible?
Describe the bug
When using config_flags.DEFINE_config_dict we are unable to use --flagfile to override values; we get the error:
FATAL Flags parsing error: Unknown command line flag ...
To Reproduce
# test.py
from absl import app, flags
from ml_collections import config_dict, config_flags
import copy
CONFIG = config_dict.ConfigDict(
    {
        'animal': {
            'cat': 10,
            'dog': 20
        },
        'vehicle': {
            'car': 30,
            'train': 40
        }
    }
)

FLAGS = flags.FLAGS
# _CONFIG = config_flags.DEFINE_config_dict('config', copy.deepcopy(CONFIG))
_CONFIG = config_flags.DEFINE_config_dict('config', CONFIG)

def main(_):
    print(FLAGS.config)

if __name__ == '__main__':
    app.run(main)
# flagfile.txt
--config.animal.cat=5000
$ python test.py --flagfile=flagfile.txt
FATAL Flags parsing error: Unknown command line flag 'config.animal.cat'
Pass --helpshort or --helpfull to see help on flags.
Expected behavior
absl should be able to read the changes to the config from flagfile.txt; something is going wrong here.
Environment:
Describe the bug
The requirements.txt file is missing from the source tarball that was pushed to PyPI (see https://pypi.org/project/ml-collections/0.1.0/#files). That means installing from that source tarball is broken, because setup.py tries to read requirements.txt, which leads to:
Running setup.py (path:/tmp/eb-hdz3kwtm/pip-req-build-7djg6soh/setup.py) egg_info for package from file:///tmp/ml_collections/ml_collections-0.1.0
Created temporary directory: /tmp/eb-hdz3kwtm/pip-pip-egg-info-s7iw7xqe
Running command python setup.py egg_info
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/eb-hdz3kwtm/pip-req-build-7djg6soh/setup.py", line 42, in <module>
install_requires=_parse_requirements('requirements.txt'),
File "/tmp/eb-hdz3kwtm/pip-req-build-7djg6soh/setup.py", line 23, in _parse_requirements
with open(requirements_txt_path) as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
To Reproduce
Download ml_collections-0.1.0.tar.gz from https://pypi.org/project/ml-collections/0.1.0/#files, unpack it, and run pip install .
ConfigDict: (none, irrelevant)
ConfigFlags: (none, irrelevant)
Expected behavior
Installing from PyPI source tarball works.
Environment:
Additional context
(none)
Dataclasses are still the way to go for some libraries. Is there any simple way to convert a dataclass object into a ConfigDict?
Hi, I've tried to install ml-collections with pip install ml-collections in a terminal window in PyCharm, but it returns ERROR: No .egg-info directory found in C:\xxx\xxx..
Did this happen because the version of Python doesn't satisfy the requirements, or something else? Could you please tell me how to install this package successfully?
I find this behavior surprising with regards to the type safety ConfigDict offers:
c = ConfigDict({'a': 1})
c.a = None
c.a = 'string'
I've managed to change the type of a from integer to string without any invocation of ignore_type. Would it perhaps make sense for the a field to "remember" that it was originally an int?
This is on the master branch (cd29283).
Hi,
I have found the private function _apply_op to be quite useful for custom lazy computation. For example:
import ml_collections
structures = ml_collections.FieldReference(
    [
        "parotid_right",
        "parotid_left",
        "brainstem",
        "spinal_cord",
    ],
    field_type=list,
)
################################################
number_of_structures = structures._apply_op(len)
number_of_structures._field_type = int
################################################
edge_length = ml_collections.FieldReference(512, field_type=int)
cfg = ml_collections.ConfigDict(
    {
        "structures": structures,
        "edge_length": edge_length,
        "mask_shape": (edge_length, edge_length, number_of_structures),
    }
)
assert cfg.mask_shape == (512, 512, 4) # Passes
cfg.structures = ["brainstem", "spinal_cord"]
cfg.edge_length = 64
assert cfg.mask_shape == (64, 64, 2) # Passes
Might it be possible for this method to be exposed as a first-class method named apply_op (or more simply apply)? I'd be happy to make a PR, including the above as a test, if there is interest in making this design adjustment.
Here is an example implementation:
def apply(parent: ml_collections.FieldReference, fn, field_type=None):
    child = parent._apply_op(fn)  # pylint: disable=protected-access
    if field_type is not None:
        child._field_type = field_type  # pylint: disable=protected-access
    a_child_value = child.get()
    if not isinstance(
        a_child_value, child._field_type  # pylint: disable=protected-access
    ):
        raise TypeError(
            f"Expected operation result to be of type {field_type}. "
            f"Instead however it was {type(a_child_value)}. "
            "Adjust the `field_type` parameter accordingly."
        )
    return child
and a corresponding test:
def test_config_apply():
    fr = ml_collections.FieldReference("a,b,c", field_type=str)

    def fn(string):
        return string.split(",")

    with pytest.raises(TypeError):
        apply(fr, fn)

    new_fr = apply(fr, fn, field_type=list)
    assert new_fr.get() == ["a", "b", "c"]
Cheers,
Simon
***run:
import ml_collections
import os
import logging

def get_logger(logpath, filepath, package_files=[], displaying=True, saving=True, debug=False):
    logger = logging.getLogger(__name__)
    if debug:
        level = logging.DEBUG
    else:
        level = logging.INFO
    logger.setLevel(level)
    if saving:
        info_file_handler = logging.FileHandler(logpath, mode="a")
        info_file_handler.setLevel(level)
        logger.addHandler(info_file_handler)
    if displaying:
        console_handler = logging.StreamHandler()
        console_handler.setLevel(level)
        formatter = logging.Formatter('%(asctime)s -- [%(levelname)s] -- %(message)s')
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    logger.info(filepath)
    with open(filepath, "r") as f:
        logger.info(f.read())
    for f in package_files:
        logger.info(f)
        with open(f, "r") as package_f:
            logger.info(package_f.read())
    return logger

log = get_logger(logpath='../exp/log.txt', filepath=os.path.abspath(__file__))
log.info('fasd')
***output:
2023-01-13 18:14:09,523 -- [INFO] -- fasd
I0113 18:14:09.523282 139717526757952 vit_seg_configs.py:164] fasd
The second line of output is garbled.
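The second line looks like absl's own log format, so the likely cause is that the record propagates to the root logger, where absl's handler emits it a second time. Disabling propagation on your own logger is one common fix; a stdlib-only sketch (the logger name here is made up):

```python
import logging

logger = logging.getLogger("my_experiment")   # hypothetical logger name
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(asctime)s -- [%(levelname)s] -- %(message)s'))
logger.addHandler(handler)

# Keep records from reaching the root logger, where absl's handler would
# re-emit them in its own format.
logger.propagate = False

logger.info("fasd")   # now emitted once, by your handler only
```

Alternatively, absl.logging offers ways to hand its output over to the standard logging module, but stopping propagation is the smallest change to the snippet above.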