google / ml_collections

ML Collections is a library of Python Collections designed for ML use cases.

Home Page: https://ml-collections.readthedocs.io/

License: Apache License 2.0

Starlark 2.60% Python 96.98% Shell 0.42%

ml_collections's Introduction

ML Collections

ML Collections is a library of Python Collections designed for ML use cases.

ConfigDict

ConfigDict and FrozenConfigDict are "dict-like" data structures with dot access to nested elements. Together, they are intended to be the main way of expressing configurations of experiments and models.

This document describes example usage of ConfigDict, FrozenConfigDict, and FieldReference.

Features

Dot-based access to fields.
Locking mechanism to prevent spelling mistakes.
Lazy computation.
FrozenConfigDict() class which is immutable and hashable.
Type safety.
"Did you mean" functionality.
Human readable printing (with valid references and cycles), using valid YAML format.
Fields can be passed as keyword arguments using the ** operator.
There is one exception to the strong type-safety of the ConfigDict: int values can be passed in to fields of type float. In such a case, the value is type-converted to a float before being stored. (Back in the day of Python 2, there was a similar exception to allow both str and unicode values in string fields.)

Basic Usage

from ml_collections import config_dict

cfg = config_dict.ConfigDict()
cfg.float_field = 12.6
cfg.integer_field = 123
cfg.another_integer_field = 234
cfg.nested = config_dict.ConfigDict()
cfg.nested.string_field = 'tom'

print(cfg.integer_field)  # Prints 123.
print(cfg['integer_field'])  # Prints 123 as well.

try:
  cfg.integer_field = 'tom'  # Raises TypeError as this field is an integer.
except TypeError as e:
  print(e)

cfg.float_field = 12  # Works: `Int` types can be assigned to `Float`.
cfg.nested.string_field = u'bob'  # `String` fields can store Unicode strings.

print(cfg)

FrozenConfigDict

A FrozenConfigDict is an immutable, hashable type of ConfigDict:

from ml_collections import config_dict

initial_dictionary = {
    'int': 1,
    'list': [1, 2],
    'tuple': (1, 2, 3),
    'set': {1, 2, 3, 4},
    'dict_tuple_list': {'tuple_list': ([1, 2], 3)}
}

cfg = config_dict.ConfigDict(initial_dictionary)
frozen_dict = config_dict.FrozenConfigDict(initial_dictionary)

print(frozen_dict.tuple)  # Prints tuple (1, 2, 3)
print(frozen_dict.list)  # Prints tuple (1, 2)
print(frozen_dict.set)  # Prints frozenset {1, 2, 3, 4}
print(frozen_dict.dict_tuple_list.tuple_list[0])  # Prints tuple (1, 2)

frozen_cfg = config_dict.FrozenConfigDict(cfg)
print(frozen_cfg == frozen_dict)  # True
print(hash(frozen_cfg) == hash(frozen_dict))  # True

try:
  frozen_dict.int = 2  # Raises AttributeError as FrozenConfigDict is immutable.
except AttributeError as e:
  print(e)

# Converting between `FrozenConfigDict` and `ConfigDict`:
thawed_frozen_cfg = config_dict.ConfigDict(frozen_dict)
print(thawed_frozen_cfg == cfg)  # True
frozen_cfg_to_cfg = frozen_dict.as_configdict()
print(frozen_cfg_to_cfg == cfg)  # True

FieldReferences and placeholders

A FieldReference is useful for having multiple fields use the same value. It can also be used for lazy computation.

You can use placeholder() as a shortcut to create a FieldReference (field) with a None default value. This is useful if a program uses optional configuration fields.

from ml_collections import config_dict

placeholder = config_dict.FieldReference(0)
cfg = config_dict.ConfigDict()
cfg.placeholder = placeholder
cfg.optional = config_dict.placeholder(int)
cfg.nested = config_dict.ConfigDict()
cfg.nested.placeholder = placeholder

try:
  cfg.optional = 'tom'  # Raises TypeError as this field is an integer.
except TypeError as e:
  print(e)

cfg.optional = 1555  # Works fine.
cfg.placeholder = 1  # Changes the value of both placeholder and
                     # nested.placeholder fields.

print(cfg)

Note that the indirection provided by FieldReferences will be lost if accessed through a ConfigDict.

from ml_collections import config_dict

placeholder = config_dict.FieldReference(0)
cfg = config_dict.ConfigDict()
cfg.field1 = placeholder
cfg.field2 = placeholder  # This field will be tied to cfg.field1.
cfg.field3 = cfg.field1  # This will just be an int field initialized to 0.

Lazy computation

Using a FieldReference in a standard operation (addition, subtraction, multiplication, etc...) will return another FieldReference that points to the original's value. You can use FieldReference.get() to execute the operations and get the reference's computed value, and FieldReference.set() to change the original reference's value.

from ml_collections import config_dict

ref = config_dict.FieldReference(1)
print(ref.get())  # Prints 1

add_ten = ref.get() + 10  # ref.get() is an integer and so is add_ten
add_ten_lazy = ref + 10  # add_ten_lazy is a FieldReference - NOT an integer

print(add_ten)  # Prints 11
print(add_ten_lazy.get())  # Prints 11 because ref's value is 1

# Addition is lazily computed for FieldReferences so changing ref will change
# the value that is used to compute add_ten.
ref.set(5)
print(add_ten)  # Prints 11
print(add_ten_lazy.get())  # Prints 15 because ref's value is 5

If a FieldReference has None as its original value, or any operation has an argument of None, then the lazy computation will evaluate to None.

We can also use fields in a ConfigDict in lazy computation. In this case a field will only be lazily evaluated if ConfigDict.get_ref() is used to get it.

from ml_collections import config_dict

config = config_dict.ConfigDict()
config.reference_field = config_dict.FieldReference(1)
config.integer_field = 2
config.float_field = 2.5

# No lazy evaluations because we didn't use get_ref()
config.no_lazy = config.integer_field * config.float_field

# This will lazily evaluate ONLY config.integer_field
config.lazy_integer = config.get_ref('integer_field') * config.float_field

# This will lazily evaluate ONLY config.float_field
config.lazy_float = config.integer_field * config.get_ref('float_field')

# This will lazily evaluate BOTH config.integer_field and config.float_field
config.lazy_both = (config.get_ref('integer_field') *
                    config.get_ref('float_field'))

config.integer_field = 3
print(config.no_lazy)  # Prints 5.0 - It uses integer_field's original value

print(config.lazy_integer)  # Prints 7.5

config.float_field = 3.5
print(config.lazy_float)  # Prints 7.0
print(config.lazy_both)  # Prints 10.5

Changing lazily computed values

Lazily computed values in a ConfigDict can be overridden in the same way as regular values. The reference to the FieldReference used for the lazy computation will be lost and all computations downstream in the reference graph will use the new value.

from ml_collections import config_dict

config = config_dict.ConfigDict()
config.reference = 1
config.reference_0 = config.get_ref('reference') + 10
config.reference_1 = config.get_ref('reference') + 20
config.reference_1_0 = config.get_ref('reference_1') + 100

print(config.reference)  # Prints 1.
print(config.reference_0)  # Prints 11.
print(config.reference_1)  # Prints 21.
print(config.reference_1_0)  # Prints 121.

config.reference_1 = 30

print(config.reference)  # Prints 1 (unchanged).
print(config.reference_0)  # Prints 11 (unchanged).
print(config.reference_1)  # Prints 30.
print(config.reference_1_0)  # Prints 130.

Cycles

You cannot create cycles using references. Fortunately the only way to create a cycle is by assigning a computed field to one that is not the result of computation. This is forbidden:

from ml_collections import config_dict

config = config_dict.ConfigDict()
config.integer_field = 1
config.bigger_integer_field = config.get_ref('integer_field') + 10

try:
  # Raises a MutabilityError because setting config.integer_field would
  # cause a cycle.
  config.integer_field = config.get_ref('bigger_integer_field') + 2
except config_dict.MutabilityError as e:
  print(e)

One-way references

One gotcha with get_ref is that it creates a bi-directional dependency when no operations are performed on the value.

from ml_collections import config_dict

config = config_dict.ConfigDict()
config.reference = 1
config.reference_0 = config.get_ref('reference')
config.reference_0 = 2
print(config.reference)  # Prints 2.
print(config.reference_0)  # Prints 2.

This can be avoided by using get_oneway_ref instead of get_ref.

from ml_collections import config_dict

config = config_dict.ConfigDict()
config.reference = 1
config.reference_0 = config.get_oneway_ref('reference')
config.reference_0 = 2
print(config.reference)  # Prints 1.
print(config.reference_0)  # Prints 2.

Advanced usage

Here are some more advanced examples showing lazy computation with different operators and data types.

from ml_collections import config_dict

config = config_dict.ConfigDict()
config.float_field = 12.6
config.integer_field = 123
config.list_field = [0, 1, 2]

config.float_multiply_field = config.get_ref('float_field') * 3
print(config.float_multiply_field)  # Prints 37.8

config.float_field = 10.0
print(config.float_multiply_field)  # Prints 30.0

config.longer_list_field = config.get_ref('list_field') + [3, 4, 5]
print(config.longer_list_field)  # Prints [0, 1, 2, 3, 4, 5]

config.list_field = [-1]
print(config.longer_list_field)  # Prints [-1, 3, 4, 5]

# Both operands can be references
config.ref_subtraction = (
    config.get_ref('float_field') - config.get_ref('integer_field'))
print(config.ref_subtraction)  # Prints -113.0

config.integer_field = 10
print(config.ref_subtraction)  # Prints 0.0

Equality checking

You can use == and .eq_as_configdict() to check equality among ConfigDict and FrozenConfigDict objects.

from ml_collections import config_dict

dict_1 = {'list': [1, 2]}
dict_2 = {'list': (1, 2)}
cfg_1 = config_dict.ConfigDict(dict_1)
frozen_cfg_1 = config_dict.FrozenConfigDict(dict_1)
frozen_cfg_2 = config_dict.FrozenConfigDict(dict_2)

# True because FrozenConfigDict converts lists to tuples
print(frozen_cfg_1.items() == frozen_cfg_2.items())
# False because == distinguishes the underlying difference
print(frozen_cfg_1 == frozen_cfg_2)

# False because == distinguishes these types
print(frozen_cfg_1 == cfg_1)
# But eq_as_configdict() treats both as ConfigDict, so these are True:
print(frozen_cfg_1.eq_as_configdict(cfg_1))
print(cfg_1.eq_as_configdict(frozen_cfg_1))

Equality checking with lazy computation

Equality checks compare the computed values. Two configs are equal even if their fields are computed differently, as long as the computations result in the same values.

from ml_collections import config_dict

cfg_1 = config_dict.ConfigDict()
cfg_1.a = 1
cfg_1.b = cfg_1.get_ref('a') + 2

cfg_2 = config_dict.ConfigDict()
cfg_2.a = 1
cfg_2.b = cfg_2.get_ref('a') * 3

# True because all computed values are the same
print(cfg_1 == cfg_2)

Locking and copying

Here is an example with lock() and deepcopy():

import copy
from ml_collections import config_dict

cfg = config_dict.ConfigDict()
cfg.integer_field = 123

# Locking prohibits the addition and deletion of new fields but allows
# modification of existing values.
cfg.lock()
try:
  cfg.intagar_field = 124  # Modifies the wrong field
except AttributeError as e:  # Raises AttributeError and suggests valid field.
  print(e)
with cfg.unlocked():
  cfg.intagar_field = 1555  # Works fine.

# Get a copy of the config dict.
new_cfg = copy.deepcopy(cfg)
new_cfg.integer_field = -123  # Works fine.

print(cfg)
print(new_cfg)

Output:

'Key "intagar_field" does not exist and cannot be added since the config is locked. Other fields present: "{\'integer_field\': 123}"\nDid you mean "integer_field" instead of "intagar_field"?'
intagar_field: 1555
integer_field: 123

intagar_field: 1555
integer_field: -123

Dictionary attributes and initialization

from ml_collections import config_dict

referenced_dict = {'inner_float': 3.14}
d = {
    'referenced_dict_1': referenced_dict,
    'referenced_dict_2': referenced_dict,
    'list_containing_dict': [{'key': 'value'}],
}

# We can initialize on a dictionary
cfg = config_dict.ConfigDict(d)

# Reference structure is preserved
print(id(cfg.referenced_dict_1) == id(cfg.referenced_dict_2))  # True

# And the dict attributes have been converted to ConfigDict
print(type(cfg.referenced_dict_1))  # ConfigDict

# However, the initialization does not look inside of lists, so dicts inside
# lists are not converted to ConfigDict
print(type(cfg.list_containing_dict[0]))  # dict

More Examples

For more examples, take a look at ml_collections/config_dict/examples/

For examples and gotchas specifically about initializing a ConfigDict, see ml_collections/config_dict/examples/config_dict_initialization.py.

Config Flags

This library adds flag definitions to absl.flags to handle config files. It does not wrap absl.flags so if using any standard flag definitions alongside config file flags, users must also import absl.flags.

Currently, this module adds two new flag types, namely DEFINE_config_file which accepts a path to a Python file that generates a configuration, and DEFINE_config_dict which accepts a configuration directly. Configurations are dict-like structures (see ConfigDict) whose nested elements can be overridden using special command-line flags. See the examples below for more details.

Usage

Use ml_collections.config_flags alongside absl.flags. For example:

script.py:

from absl import app
from absl import flags

from ml_collections import config_flags

_CONFIG = config_flags.DEFINE_config_file('my_config')
_MY_FLAG = flags.DEFINE_integer('my_flag', None, 'An example integer flag.')

def main(_):
  print(_CONFIG.value)
  print(_MY_FLAG.value)

if __name__ == '__main__':
  app.run(main)

config.py:

# Note that this is a valid Python script.
# get_config() can return an arbitrary dict-like object. However, it is advised
# to use ml_collections.config_dict.ConfigDict.
# See ml_collections/config_dict/examples/config_dict_basic.py

from ml_collections import config_dict

def get_config():
  config = config_dict.ConfigDict()
  config.field1 = 1
  config.field2 = 'tom'
  config.nested = config_dict.ConfigDict()
  config.nested.field = 2.23
  config.tuple = (1, 2, 3)
  return config

Warning: If you are using a pickle-based distributed programming framework such as Launchpad, be aware of limitations on the structure of this script that are described below in the "Config Files and Pickling" section.

Now, after running:

python script.py --my_config=config.py \
                 --my_config.field1=8 \
                 --my_config.nested.field=2.1 \
                 --my_config.tuple='(1, 2, (1, 2))'

we get:

field1: 8
field2: tom
nested:
  field: 2.1
tuple: !!python/tuple
- 1
- 2
- !!python/tuple
  - 1
  - 2

Usage of DEFINE_config_dict is similar to DEFINE_config_file; the main difference is that the configuration is defined in script.py instead of in a separate file.

script.py:

from absl import app

from ml_collections import config_dict
from ml_collections import config_flags

config = config_dict.ConfigDict()
config.field1 = 1
config.field2 = 'tom'
config.nested = config_dict.ConfigDict()
config.nested.field = 2.23
config.tuple = (1, 2, 3)

_CONFIG = config_flags.DEFINE_config_dict('my_config', config)

def main(_):
  print(_CONFIG.value)

if __name__ == '__main__':
  app.run(main)

config_file flags are compatible with the command-line flag syntax. All the following options are supported for non-boolean values in configurations:

-(-)config.field=value
-(-)config.field value

Options for boolean values are slightly different:

-(-)config.boolean_field: set boolean value to True.
-(-)noconfig.boolean_field: set boolean value to False.
-(-)config.boolean_field=value: value is true, false, True or False.

Note that -(-)config.boolean_field value is not supported.

Parameterising the get_config() function

It's sometimes useful to be able to pass parameters into get_config, and change what is returned based on this configuration. One example is if you are grid searching over parameters which have a different hierarchical structure - the flag needs to be present in the resulting ConfigDict. It would be possible to include the union of all possible leaf values in your ConfigDict, but this produces a confusing config result as you have to remember which parameters will actually have an effect and which won't.

A better system is to pass some configuration, indicating which structure of ConfigDict should be returned. An example is the following config file:

from ml_collections import config_dict

def get_config(config_string):
  possible_structures = {
      'linear': config_dict.ConfigDict({
          'model_constructor': 'snt.Linear',
          'model_config': config_dict.ConfigDict({
              'output_size': 42,
          }),
      }),
      'lstm': config_dict.ConfigDict({
          'model_constructor': 'snt.LSTM',
          'model_config': config_dict.ConfigDict({
              'hidden_size': 108,
          })
      })
  }

  return possible_structures[config_string]

The value of config_string will be anything that is to the right of the first colon in the config file path, if one exists. If no colon exists, no value is passed to get_config (producing a TypeError if get_config expects a value).

The above example can be run like:

python script.py -- --config=path_to_config.py:linear \
                    --config.model_config.output_size=256

or like:

python script.py -- --config=path_to_config.py:lstm \
                    --config.model_config.hidden_size=512
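
The colon rule described above can be illustrated with a few lines of plain Python. This is only a sketch of the documented behaviour, not the library's own parsing code, and split_config_path is a hypothetical helper name used for illustration.

```python
def split_config_path(flag_value):
  """Splits 'path.py:arg' at the first colon (hypothetical helper)."""
  path, sep, config_string = flag_value.partition(':')
  # Without a colon there is nothing to pass to get_config().
  return path, (config_string if sep else None)


print(split_config_path('path_to_config.py:linear'))
print(split_config_path('path_to_config.py:a:b'))
print(split_config_path('path_to_config.py'))
```

Only the first colon separates the path from the argument, so a config_string may itself contain colons.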

Additional features

Loads any valid Python script that defines a get_config() function returning any Python object.
Automatic locking of the loaded object, if the loaded object defines a callable .lock() method.
Supports command-line overriding of arbitrarily nested values in dict-like objects (with key/attribute based getters/setters) of the following types:
- int
- float
- bool
- str
- tuple (but not list)
- enum.Enum
Overriding is type safe.
Overriding of a tuple can be done by passing in the tuple value as a string (see the example in the Usage section).
The overriding tuple object can be of a different length and have different item types than the original. Nested tuples are also supported.

Config Files and Pickling

This is likely to be troublesome:

import dataclasses

@dataclasses.dataclass
class MyRecord:
  num_balloons: int
  color: str

def get_config():
  return MyRecord(num_balloons=99, color='red')

This is not:

import dataclasses

def get_config():
  @dataclasses.dataclass
  class MyRecord:
    num_balloons: int
    color: str

  return MyRecord(num_balloons=99, color='red')

Explanation

A config file is a Python module but it is not imported through Python's usual module-importing mechanism.

Meanwhile, serialization libraries such as cloudpickle (which is used by Launchpad) and Apache Beam expect to be able to pickle an object without also pickling every type to which it refers, on the assumption that types defined at module scope can later be reconstructed simply by re-importing the modules in which they are defined.

That assumption does not hold for a type that is defined at module scope in a config file, because the config file can't be imported the usual way. The symptom of this will be an ImportError when unpickling an object.

The treatment is to move types from module scope into get_config() so that they will be serialized along with the values that have those types.
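
The underlying mechanism can be reproduced with the standard library alone. The sketch below is an illustration under stated assumptions: it uses plain pickle rather than cloudpickle, a plain class rather than a dataclass, and a hypothetical file name (demo_config_not_importable.py) that is executed via importlib without being registered in sys.modules. With plain pickle the failure already appears at pickling time, because the class is stored by reference to a module that cannot be imported.

```python
import importlib.util
import os
import pickle
import tempfile

# Source of a hypothetical config file with a type at module scope.
CONFIG_SOURCE = '''
class MyRecord:
  def __init__(self, num_balloons, color):
    self.num_balloons = num_balloons
    self.color = color

def get_config():
  return MyRecord(num_balloons=99, color='red')
'''

module_name = 'demo_config_not_importable'  # Hypothetical name.
with tempfile.TemporaryDirectory() as tmp_dir:
  path = os.path.join(tmp_dir, module_name + '.py')
  with open(path, 'w') as f:
    f.write(CONFIG_SOURCE)
  # Execute the file without going through the normal import system,
  # so a later `import demo_config_not_importable` cannot find it.
  spec = importlib.util.spec_from_file_location(module_name, path)
  module = importlib.util.module_from_spec(spec)
  spec.loader.exec_module(module)
  record = module.get_config()

try:
  pickle.dumps(record)
  pickled_ok = True
except pickle.PicklingError as e:
  # pickle refers to MyRecord by module and name, and that module
  # cannot be imported.
  pickled_ok = False
  print('Pickling failed:', e)
```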

Authors

Sergio Gómez Colmenarejo
Wojciech Marian Czarnecki
Nicholas Watters
Mohit Reddy

ml_collections's Issues

Any way to convert a dataclass object into a ConfigDict?

Dataclasses are still the way to go for some libraries. Is there any simple way to convert a dataclass object into a ConfigDict?
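
One possible approach, sketched here with the standard library only: dataclasses.asdict() recursively converts a dataclass instance into nested plain dicts, and the README above shows that a ConfigDict can be initialized from a dictionary. The ModelConfig and TrainConfig classes are made-up examples, and the final ConfigDict call is left as a comment because it is an untested assumption that it suits every field type.

```python
import dataclasses


@dataclasses.dataclass
class ModelConfig:  # Hypothetical example dataclass.
  hidden_size: int = 128
  activation: str = 'relu'


@dataclasses.dataclass
class TrainConfig:  # Hypothetical example dataclass.
  learning_rate: float = 1e-3
  model: ModelConfig = dataclasses.field(default_factory=ModelConfig)


# asdict() recurses into nested dataclasses and returns plain dicts.
as_dict = dataclasses.asdict(TrainConfig())
print(as_dict)

# Assumed next step (requires ml_collections):
#   cfg = config_dict.ConfigDict(as_dict)
```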

ConfigDict does not allow dictionary key to be int type

Describe the bug

ConfigDict() does not allow a dictionary with int keys.

To Reproduce

from ml_collections import ConfigDict

tmp = ConfigDict()
tmp.demo0 = 10                         # no bug 
tmp.demo1 = '10'                       # no bug 
tmp.demo2 = {'0': '10', '1': '20'}     # no bug 
tmp.demo3 = {0: '10', 1: '20'}         # bug

ConfigDict

ConfigFlags

I'm not sure how to provide config flags, so I created a brand new conda virtual env to demonstrate the bug.

~$ conda create --name bug python=3.8
~$ conda activate bug
~$ pip install ml-collections
~$ python

    from ml_collections import ConfigDict
    tmp = ConfigDict()
    tmp.demo = {0: '10', 1: '20'}

Expected behavior

The code should run and save the dictionary without issue.

Environment:

OS: ubuntu
OS Version: 20
Python: 3.8

Additional context

When I changed the keys to str type, the code ran. However, I want to be able to use int type.

Using 'import ml_collections' together with 'import logging' causes a bug.

Run:

import ml_collections
import os
import logging

def get_logger(logpath, filepath, package_files=[], displaying=True, saving=True, debug=False):
    logger = logging.getLogger(__name__)
    if debug:
        level = logging.DEBUG
    else:
        level = logging.INFO
    logger.setLevel(level)
    if saving:
        info_file_handler = logging.FileHandler(logpath, mode="a")
        info_file_handler.setLevel(level)
        logger.addHandler(info_file_handler)
    if displaying:
        console_handler = logging.StreamHandler()
        console_handler.setLevel(level)
        formatter = logging.Formatter('%(asctime)s -- [%(levelname)s] -- %(message)s')  # '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    logger.info(filepath)
    with open(filepath, "r") as f:
        logger.info(f.read())

    for f in package_files:
        logger.info(f)
        with open(f, "r") as package_f:
            logger.info(package_f.read())
    return logger

log = get_logger(logpath='../exp/log.txt', filepath=os.path.abspath(__file__))
log.info('fasd')

Output:
2023-01-13 18:14:09,523 -- [INFO] -- fasd
I0113 18:14:09.523282 139717526757952 vit_seg_configs.py:164] fasd
The second line of output is garbled.

FrozenConfigDict can't initialize from ConfigDict with tuples

Thanks for open-sourcing this nice package!

I don't think I see why in principle FrozenConfigDict can't include fields which are tuples of frozen config dicts, given that Python tuples themselves are already frozen. Is it just a matter of not adding support yet, or is there something more fundamental?

For example,

FrozenConfigDict({'a': ({'b': 1}, {'b': 2})})

gives the error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-133-a093d88df5a2> in <module>
----> 1 c=FrozenConfigDict({'a': ({'b': 1}, {'b': 2})})

~/opt/anaconda3/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py in __init__(self, initial_dictionary, type_safe)
   1635                                     type_safe=type_safe)
   1636 
-> 1637     _frozenconfigdict_valid_input(initial_configdict)
   1638     # This will define the self._configdict attribute
   1639     _frozenconfigdict_fill_seed(self, initial_configdict)

~/opt/anaconda3/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py in _frozenconfigdict_valid_input(obj, ancestor_list)
   1418   if isinstance(obj, ConfigDict):
   1419     for value in obj.values():
-> 1420       _frozenconfigdict_valid_input(value, ancestor_list)
   1421   elif isinstance(obj, FieldReference):
   1422     _frozenconfigdict_valid_input(obj.get(), ancestor_list)

~/opt/anaconda3/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py in _frozenconfigdict_valid_input(obj, ancestor_list)
   1424     for element in obj:
   1425       if isinstance(element, (dict, ConfigDict, FieldReference)):
-> 1426         raise ValueError('Bad FrozenConfigDict initialization: Cannot '
   1427                          'contain a dict, ConfigDict, or FieldReference '
   1428                          'within a list or tuple.')

ValueError: Bad FrozenConfigDict initialization: Cannot contain a dict, ConfigDict, or FieldReference within a list or tuple.

Replace deprecated imp module with importlib

Describe the bug

This project uses the imp module which has been deprecated since Python 3.4 and removed in 3.12:

Raised PendingDeprecationWarning since 3.4 (2014)
Raised DeprecationWarning since 3.5 (2015)
Updated DeprecationWarning to say removal in 3.12 since 3.10 (2021)
Removal planned for 3.12 (2023)

Python 3.12 is set for release on 2023-10-02 and this library is one of the top 5,000 most-downloaded from PyPI.

Please could you upgrade to use importlib? The imp docs have suggestions on what to use to replace each function and constant.

To Reproduce

git grep "\bimp\b"

Expected behavior

Works on Python 3.12 using importlib.
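
A minimal sketch of the kind of replacement requested here, using only the standard library. It loads a Python file by path with importlib.util, which is the usual substitute for imp.load_source. The helper name load_module_from_path and the temporary demo_config.py file are illustrative assumptions, and this is not the change that was actually made in ml_collections.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name, path):
  """Loads a Python source file as a module (hypothetical helper)."""
  spec = importlib.util.spec_from_file_location(name, path)
  module = importlib.util.module_from_spec(spec)
  spec.loader.exec_module(module)  # Runs the file's top-level code.
  return module


# Write a tiny config file so that the example is self-contained.
with tempfile.TemporaryDirectory() as tmp_dir:
  config_path = os.path.join(tmp_dir, 'demo_config.py')
  with open(config_path, 'w') as f:
    f.write("def get_config():\n  return {'field1': 1}\n")
  demo_module = load_module_from_path('demo_config', config_path)

print(demo_module.get_config())
```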

FieldReference op assumes type does not change when it does

Hi, I am trying to do some more complicated configuration file setups which include combining a couple of lazy ints into a lazy tuple. The following script is an example of how I can get it to work. However, the resulting FieldReference object stores a tuple but thinks it is storing an int.

from ml_collections import ConfigDict, FieldReference
from ml_collections.config_dict import _Op

if __name__ == "__main__":
    a = FieldReference(None, int)
    b = FieldReference(None, int)

    a_tuple = FieldReference(a, op=_Op(lambda x : (x,), ()))
    b_tuple = FieldReference(b, op=_Op(lambda x : (x,), ()))
    c = a_tuple + b_tuple

    a.set(1)
    b.set(2)

    print(f"a_tuple: {a_tuple.get()}")
    print(f"b_tuple: {b_tuple.get()}")
    print(f"c: {c.get()}")

    print(f"a_tuple type: {a_tuple._field_type}")
    print(f"b_tuple type: {b_tuple._field_type}")
    print(f"c type: {c._field_type}")

This outputs

a_tuple: (1,)
b_tuple: (2,)
c: (1, 2)
a_tuple type: <class 'int'>
b_tuple type: <class 'int'>
c type: <class 'int'>

Which is probably a bug. This was run on python 3.10.8 and ml_collections 0.1.1.

Note that if we change the script to

from ml_collections import ConfigDict, FieldReference
from ml_collections.config_dict import _Op

if __name__ == "__main__":
    a = FieldReference(None, int)
    b = FieldReference(None, int)

    a_tuple = FieldReference(a, tuple, op=_Op(lambda x : (x,), ()))
    b_tuple = FieldReference(b, tuple, op=_Op(lambda x : (x,), ()))
    c = a_tuple + b_tuple

    a.set(1)
    b.set(2)

    print(f"a_tuple: {a_tuple.get()}")
    print(f"b_tuple: {b_tuple.get()}")
    print(f"c: {c.get()}")

    print(f"a_tuple type: {a_tuple._field_type}")
    print(f"b_tuple type: {b_tuple._field_type}")
    print(f"c type: {c._field_type}")

it throws an exception:

Traceback (most recent call last):
  File "/home/andrius/repos/sde-sampling/tmp.py", line 8, in <module>
    a_tuple = FieldReference(a, tuple, op=_Op(lambda x : (x,), ()))
  File "/home/andrius/env/lib/python3.10/site-packages/ml_collections/config_dict/config_dict.py", line 248, in __init__
    self.set(default)
  File "/home/andrius/env/lib/python3.10/site-packages/ml_collections/config_dict/config_dict.py", line 305, in set
    raise TypeError('Reference is of type {} but should be of type {}'
TypeError: Reference is of type <class 'int'> but should be of type <class 'tuple'>

I think it would be better to change the behaviour as follows:

If field_type is not specified when a FieldReference object is constructed, set it to the type of the argument default as it is done now.
If field_type is specified, then:
- If op is not specified and field_type is different from the type of default, raise an exception.
- If op is specified, raise an exception in get if the result of op is not the same as the field_type specified.

I'd be happy to file a PR if authors agree.

Possible to delete a key?

Say I have a configuration like

configs = ConfigDict()
configs.num_neurons = 15
configs.activation = "relu"

Now I would like to delete activation. Is it possible?

git tags for releases?

Would it be possible to get git tags for releases? This helps with packaging outside of pypi, eg. on NixOS. I looked through the commit history, but couldn't find a commit that looked like a release.

requirements.txt missing from PyPI source tarball

Describe the bug

The requirements.txt file is missing from the source tarball that was pushed to PyPI (see https://pypi.org/project/ml-collections/0.1.0/#files). That means installing from that source tarball is broken, because the setup.py tries to read requirements.txt, which leads to:

    Running setup.py (path:/tmp/eb-hdz3kwtm/pip-req-build-7djg6soh/setup.py) egg_info for package from file:///tmp/ml_collections/ml_collections-0.1.0
    Created temporary directory: /tmp/eb-hdz3kwtm/pip-pip-egg-info-s7iw7xqe
    Running command python setup.py egg_info
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/eb-hdz3kwtm/pip-req-build-7djg6soh/setup.py", line 42, in <module>
        install_requires=_parse_requirements('requirements.txt'),
      File "/tmp/eb-hdz3kwtm/pip-req-build-7djg6soh/setup.py", line 23, in _parse_requirements
        with open(requirements_txt_path) as fp:
    FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'

To Reproduce

download ml_collections-0.1.0.tar.gz from https://pypi.org/project/ml-collections/0.1.0/#files
unpack, and try to install with pip install .

ConfigDict

(none, irrelevant)

ConfigFlags

(none, irrelevant)

Expected behavior

Installing from PyPI source tarball works.

Environment:

OS: Linux
OS Version: RHEL8
Python: Python 3.8.6

Additional context

(none)

Saving ConfigDict to File -- Using ABSL

The Problem:

I'm currently using the following pattern to load a config from a dictionary:

from absl import app, flags
from ml_collections import config_flags

from .config import MY_CONFIG_DICT

FLAGS = flags.FLAGS

_CONFIG = config_flags.DEFINE_config_dict('config', MY_CONFIG_DICT)

# ...

absl allows me to alter the config from the command line - this is great for running experiments with different parameters. However, if I want to re-use the same configuration for another run, there doesn't seem to be a de-facto way to do this.

For example, I want to be able to run:

python -m src.experiment --config.value1 10 --config.value2 20

and then be able to re-use this config without having to remember the flags.

How can we save and re-load a ConfigDict instance when using absl flags?

My Approach:

My current approach to this is:

In the experiment script:

Load config into flags as above.
Save the updated ConfigDict to a .yml file.

In a postprocessing script:

Load the original config using the same method.
Provide an additional flag which takes a path to the saved .yml file.
Update the original config as follows:

# postprocessing.py

from absl import flags
import yaml
from ml_collections import config_dict, config_flags

from .config import MY_CONFIG_DICT

FLAGS = flags.FLAGS

_CONFIG = config_flags.DEFINE_config_dict('config', MY_CONFIG_DICT)
_LOAD_CONFIG = flags.DEFINE_string('load_config', None, '.yml file from which to update config.')

def update_config():

    def _update_config_dict(a, b) -> config_dict.ConfigDict:

        da = a.to_dict(preserve_field_references=True)
        db = b.to_dict(preserve_field_references=True)

        return config_dict.ConfigDict(da | db)

    with open(FLAGS.load_config, 'r') as f:
        config_from_yaml = yaml.load(f, Loader=yaml.UnsafeLoader)

    FLAGS.config = _update_config_dict(FLAGS.config, config_from_yaml)

def main(_):

    update_config()

    # ...

While this works it doesn't seem a great approach.

Is there any standard way to accomplish this - or should I be dealing with updated configs in a different manner?

Proper release

Hi,
This code is generating a lot of interest in the advanced research computing community.
Unfortunately, it currently cannot be supported on most clusters because it has no release versions.
This makes it very difficult to install the software in a reproducible way.

Any plan to address this would be appreciated:

Make proper release versions (I recommend semantic versioning: https://semver.org/lang)

Thanks!

ml-collections breaks on Python 3.12 by relying on removed `imp` module

Describe the bug
This package no longer works on Python 3.12, because it relies on the imp module which has been removed in Python 3.12.

To Reproduce

from ml_collections import config_flags

which gives the error message:

Traceback (most recent call last):
  File "", line 1, in <module>
  File "/home/user/.local/lib/python3.12/site-packages/ml_collections/config_flags/__init__.py", line 17, in <module>
    from .config_flags import DEFINE_config_dataclass
  File "/home/user/.local/lib/python3.12/site-packages/ml_collections/config_flags/config_flags.py", line 23, in <module>
    import imp
ModuleNotFoundError: No module named 'imp'

Environment:

OS Version: Ubuntu 24.04 LTS
Python: 3.12
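
The issue does not say how the library resolved this. For reference, `importlib` is the standard-library replacement for `imp`; a minimal sketch of loading a module from a file path (the kind of job `imp.load_source` used to do) might look like the following. The helper name `load_module_from_path` and the throwaway `demo_config.py` file are hypothetical.

```python
import importlib.util
import os
import tempfile


def load_module_from_path(name, path):
    """Loads the Python file at `path` as a module called `name`."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # Runs the file's top-level code.
    return module


with tempfile.TemporaryDirectory() as tmp:
    # Write a tiny config file to load; its contents are made up.
    path = os.path.join(tmp, "demo_config.py")
    with open(path, "w") as f:
        f.write("def get_config():\n    return {'learning_rate': 0.1}\n")

    demo = load_module_from_path("demo_config", path)
    config = demo.get_config()
```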

How to use it in a Jupyter notebook?

Notebook code:

from ml_collections import config_flags
_CONFIG = config_flags.DEFINE_config_file('config')
mp.set_start_method('spawn')

params = _CONFIG.value

This raises an error:

---------------------------------------------------------------------------
UnparsedFlagAccessError                   Traceback (most recent call last)
Cell In[2], line 5
      2 _CONFIG = config_flags.DEFINE_config_file('config')
      3 mp.set_start_method('spawn')
----> 5 params = _CONFIG.value

File ~/anaconda3/envs/LLM/lib/python3.11/site-packages/absl/flags/_flagvalues.py:1376, in FlagHolder.value(self)
   1365 @property
   1366 def value(self):
   1367   """Returns the value of the flag.
   1368 
   1369   If ``_ensure_non_none_value`` is ``True``, then return value is not
   (...)
   1374     IllegalFlagValueError: if value is None unexpectedly.
   1375   """
-> 1376   val = getattr(self._flagvalues, self._name)
   1377   if self._ensure_non_none_value and val is None:
   1378     raise _exceptions.IllegalFlagValueError(
   1379         'Unexpected None value for flag %s' % self._name)

File ~/anaconda3/envs/LLM/lib/python3.11/site-packages/absl/flags/_flagvalues.py:481, in FlagValues.__getattr__(self, name)
    479   return fl[name].value
    480 else:
--> 481   raise _exceptions.UnparsedFlagAccessError(
    482       'Trying to access flag --%s before flags were parsed.' % name)

UnparsedFlagAccessError: Trying to access flag --config before flags were parsed.

Plan for next version?

Hi,

There have been some important updates (such as enum support) since the current release. Do you have any plans to publish a new release? It would be more convenient for users; otherwise we have to pip install from the latest commit.

This is not a bug, but I cannot remove the bug label.

Failed to install

Environment:

pip version: 22.2
Python version: 3.7.2

Hi, I tried to install ml-collections by running "pip install ml-collections" in the PyCharm terminal, but it fails with ERROR: No .egg-info directory found in C:\xxx\xxx... Does this happen because my Python version doesn't meet the requirements, or for some other reason? Could you please tell me how to install this package successfully?

Feature request: Expose `_apply_op` as a public method

Hi,

I have found the private function _apply_op to be quite useful for custom lazy computation. For example:

import ml_collections

structures = ml_collections.FieldReference(
    [
        "parotid_right",
        "parotid_left",
        "brainstem",
        "spinal_cord",
    ],
    field_type=list,
)

################################################
number_of_structures = structures._apply_op(len)
number_of_structures._field_type = int
################################################

edge_length = ml_collections.FieldReference(512, field_type=int)

cfg = ml_collections.ConfigDict(
    {
        "structures": structures,
        "edge_length": edge_length,
        "mask_shape": (edge_length, edge_length, number_of_structures),
    }
)

assert cfg.mask_shape == (512, 512, 4)  # Passes

cfg.structures = ["brainstem", "spinal_cord"]
cfg.edge_length = 64
assert cfg.mask_shape == (64, 64, 2)  # Passes

Might it be possible that this method be exposed as a first class method named apply_op (or more simply apply)? I'd be happy to make a PR including the above as a test if there is interest in making this design adjustment.

Here is an example implementation:

def apply(parent: ml_collections.FieldReference, fn, field_type=None):
    child = parent._apply_op(fn)  # pylint: disable = protected-access

    if field_type is not None:
        child._field_type = field_type  # pylint: disable = protected-access

    a_child_value = child.get()
    if not isinstance(
        a_child_value, child._field_type  # pylint: disable = protected-access
    ):
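        raise TypeError(
            # Report the type actually checked; `field_type` may be None here.
            f"Expected operation result to be of type {child._field_type}. "  # pylint: disable = protected-access
            f"Instead however it was {type(a_child_value)}. "
            "Adjust the `field_type` parameter accordingly."
        )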

    return child

and a corresponding test:

import pytest


def test_config_apply():
    fr = ml_collections.FieldReference("a,b,c", field_type=str)

    def fn(string):
        return string.split(",")

    with pytest.raises(TypeError):
        apply(fr, fn)

    new_fr = apply(fr, fn, field_type=list)
    assert new_fr.get() == ["a", "b", "c"]

Cheers,
Simon

ml_collections / absl -- cannot read config updates from --flagfile

Describe the bug
When using config_flags.DEFINE_config_dict, we are unable to use --flagfile to override values. We get the error:

FATAL Flags parsing error: Unknown command line flag ...

To Reproduce

# test.py

from absl import app, flags
from ml_collections import config_dict, config_flags

import copy


CONFIG = config_dict.ConfigDict(
    {
        'animal': {
            'cat': 10,
            'dog': 20
        },
        'vehicle': {
            'car': 30,
            'train': 40
        }
    }
)


FLAGS = flags.FLAGS
# _CONFIG = config_flags.DEFINE_config_dict('config', copy.deepcopy(CONFIG))
_CONFIG = config_flags.DEFINE_config_dict('config', CONFIG)


def main(_):

    print(FLAGS.config)


if __name__ == '__main__':
    app.run(main)

# flagfile.txt
--config.animal.cat=5000

$ python test.py --flagfile=flagfile.txt

FATAL Flags parsing error: Unknown command line flag 'config.animal.cat'
Pass --helpshort or --helpfull to see help on flags.

Expected behavior
absl should be able to read the config overrides from flagfile.txt, but something is going wrong here.

Environment:

OS: macOS
OS Version: macOS Ventura (13.0)
Python: 3.10.0

Field types can change after being set to None

I find this behavior surprising with regard to the type safety ConfigDict offers:

c = ConfigDict({'a': 1})
c.a = None
c.a = 'string'

I've managed to change the type of a from integer to string without any invocation of ignore_type. Would it perhaps make sense for the a field to "remember" that it was originally an int?

This is on the master branch (cd29283).
