google / ml_collections
ML Collections is a library of Python Collections designed for ML use cases.
Home Page: https://ml-collections.readthedocs.io/
License: Apache License 2.0
Hi, I am trying to do some more complicated configuration file setups, which include combining a couple of lazy ints into a lazy tuple. The following script is an example of how I can get it to work. However, the resulting FieldReference object stores a tuple but thinks it is storing an int.
from ml_collections import ConfigDict, FieldReference
from ml_collections.config_dict import _Op
if __name__ == "__main__":
    a = FieldReference(None, int)
    b = FieldReference(None, int)
    a_tuple = FieldReference(a, op=_Op(lambda x: (x,), ()))
    b_tuple = FieldReference(b, op=_Op(lambda x: (x,), ()))
    c = a_tuple + b_tuple

    a.set(1)
    b.set(2)

    print(f"a_tuple: {a_tuple.get()}")
    print(f"b_tuple: {b_tuple.get()}")
    print(f"c: {c.get()}")
    print(f"a_tuple type: {a_tuple._field_type}")
    print(f"b_tuple type: {b_tuple._field_type}")
    print(f"c type: {c._field_type}")
This outputs
a_tuple: (1,)
b_tuple: (2,)
c: (1, 2)
a_tuple type: <class 'int'>
b_tuple type: <class 'int'>
c type: <class 'int'>
This is probably a bug. It was run on Python 3.10.8 and ml_collections 0.1.1.
Note that if we change the script to
from ml_collections import ConfigDict, FieldReference
from ml_collections.config_dict import _Op
if __name__ == "__main__":
    a = FieldReference(None, int)
    b = FieldReference(None, int)
    a_tuple = FieldReference(a, tuple, op=_Op(lambda x: (x,), ()))
    b_tuple = FieldReference(b, tuple, op=_Op(lambda x: (x,), ()))
    c = a_tuple + b_tuple

    a.set(1)
    b.set(2)

    print(f"a_tuple: {a_tuple.get()}")
    print(f"b_tuple: {b_tuple.get()}")
    print(f"c: {c.get()}")
    print(f"a_tuple type: {a_tuple._field_type}")
    print(f"b_tuple type: {b_tuple._field_type}")
    print(f"c type: {c._field_type}")
it throws an exception:
Traceback (most recent call last):
File "/home/andrius/repos/sde-sampling/tmp.py", line 8, in <module>
a_tuple = FieldReference(a, tuple, op=_Op(lambda x : (x,), ()))
File "/home/andrius/env/lib/python3.10/site-packages/ml_collections/config_dict/config_dict.py", line 248, in __init__
self.set(default)
File "/home/andrius/env/lib/python3.10/site-packages/ml_collections/config_dict/config_dict.py", line 305, in set
raise TypeError('Reference is of type {} but should be of type {}'
TypeError: Reference is of type <class 'int'> but should be of type <class 'tuple'>
I think it would be better to change the behaviour as follows:

- If field_type is not specified when a FieldReference object is constructed, set it to the type of the argument default, as is done now.
- If field_type is specified, then:
  - if op is not specified and field_type is different from the type of default, raise an exception;
  - if op is specified, raise an exception in get if the result of op is not of the field_type specified.

I'd be happy to file a PR if the authors agree.
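The proposed rules can be sketched as a small self-contained class. This is hypothetical illustration code: the class and attribute names are made up and do not match ml_collections internals.

```python
class LazyRef:
    """Hypothetical sketch of the proposed FieldReference validation rules."""

    def __init__(self, default, field_type=None, op=None):
        if field_type is None:
            # Rule 1: infer field_type from the default, as is done now.
            field_type = type(default)
        elif op is None and not isinstance(default, (field_type, LazyRef)):
            # Rule 2: explicit field_type, no op, mismatched default
            # -> fail eagerly at construction time.
            raise TypeError(f"default {default!r} is not of type {field_type.__name__}")
        self._default = default
        self._field_type = field_type
        self._op = op

    def get(self):
        value = self._default.get() if isinstance(self._default, LazyRef) else self._default
        if self._op is not None:
            value = self._op(value)
            # Rule 3: with an op, validate the *result* type lazily in get().
            if not isinstance(value, self._field_type):
                raise TypeError(
                    f"op result {value!r} is not of type {self._field_type.__name__}")
        return value
```

Under these rules, the first script's a_tuple would report field_type tuple, and the second script would no longer fail at construction.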
notebook code:
import multiprocessing as mp

from ml_collections import config_flags

_CONFIG = config_flags.DEFINE_config_file('config')
mp.set_start_method('spawn')
params = _CONFIG.value
I get an error:
---------------------------------------------------------------------------
UnparsedFlagAccessError Traceback (most recent call last)
Cell In[2], line 5
2 _CONFIG = config_flags.DEFINE_config_file('config')
3 mp.set_start_method('spawn')
----> 5 params = _CONFIG.value
File ~/anaconda3/envs/LLM/lib/python3.11/site-packages/absl/flags/_flagvalues.py:1376, in FlagHolder.value(self)
1365 @property
1366 def value(self):
1367 """Returns the value of the flag.
1368
1369 If ``_ensure_non_none_value`` is ``True``, then return value is not
(...)
1374 IllegalFlagValueError: if value is None unexpectedly.
1375 """
-> 1376 val = getattr(self._flagvalues, self._name)
1377 if self._ensure_non_none_value and val is None:
1378 raise _exceptions.IllegalFlagValueError(
1379 'Unexpected None value for flag %s' % self._name)
File ~/anaconda3/envs/LLM/lib/python3.11/site-packages/absl/flags/_flagvalues.py:481, in FlagValues.__getattr__(self, name)
479 return fl[name].value
480 else:
--> 481 raise _exceptions.UnparsedFlagAccessError(
482 'Trying to access flag --%s before flags were parsed.' % name)
UnparsedFlagAccessError: Trying to access flag --config before flags were parsed.
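As the error message says, the flag is being read before absl has parsed any flags: in a notebook there is no app.run(main) call to do the parsing. One workaround is to parse an explicit argv yourself before reading flag values. This is a sketch assuming absl-py is installed; the flag name and value below are made up for illustration:

```python
from absl import flags

# Define a flag as usual.
flags.DEFINE_string('demo_flag', 'default', 'A demo flag.')

# Parse an explicit argv list; the first element plays the role of the
# program name. After this call, flag values become readable.
flags.FLAGS(['notebook', '--demo_flag=hello'])

print(flags.FLAGS.demo_flag)
```

For the original snippet, you would parse something like `flags.FLAGS(['notebook', '--config=<path to your config>'])` before touching `_CONFIG.value`. Note that re-running a DEFINE_* cell in the same kernel raises a duplicate-flag error, so this is best kept in a run-once cell.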
Describe the bug
This project uses the imp module, which has been deprecated since Python 3.4 and removed in 3.12:

- PendingDeprecationWarning since 3.4 (2014)
- DeprecationWarning since 3.5 (2015)
- DeprecationWarning announcing removal in 3.12, since 3.10 (2021)

Python 3.12 is set for release on 2023-10-02, and this library is one of the top 5,000 most-downloaded from PyPI.
Please could you upgrade to use importlib? The imp docs have suggestions on what to use to replace each function and constant.
To Reproduce
git grep "\bimp\b"
Expected behavior
Works on Python 3.12 using importlib.
Would it be possible to get git tags for releases? This helps with packaging outside of PyPI, e.g. on NixOS. I looked through the commit history, but couldn't find a commit that looked like a release.
Hi,
This code is generating a lot of interest in the advanced research computing community.
Unfortunately, it currently cannot be supported on most clusters because of its lack of release versions.
This makes it very difficult to install the software in a reproducible way.
Any plan to address this would be appreciated:
Make proper release versions (I recommend Semantic versioning: https://semver.org/lang)
Thanks!
Hi,
There have been some important updates (such as enum support) since the current release. Do you have any plan to cut a new release? It would be more convenient for users; otherwise we have to pip install from the latest commit.
This is not a bug, but I cannot remove the bug label.
Thanks for open-sourcing this nice package!
I don't think I see why, in principle, FrozenConfigDict can't include fields which are tuples of frozen config dicts, given that Python tuples themselves are already frozen. Is it just a matter of support not having been added yet, or is there something more fundamental?
For example,
FrozenConfigDict({'a': ({'b': 1}, {'b': 2})})
gives the error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-133-a093d88df5a2> in <module>
----> 1 c=FrozenConfigDict({'a': ({'b': 1}, {'b': 2})})
~/opt/anaconda3/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py in __init__(self, initial_dictionary, type_safe)
1635 type_safe=type_safe)
1636
-> 1637 _frozenconfigdict_valid_input(initial_configdict)
1638 # This will define the self._configdict attribute
1639 _frozenconfigdict_fill_seed(self, initial_configdict)
~/opt/anaconda3/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py in _frozenconfigdict_valid_input(obj, ancestor_list)
1418 if isinstance(obj, ConfigDict):
1419 for value in obj.values():
-> 1420 _frozenconfigdict_valid_input(value, ancestor_list)
1421 elif isinstance(obj, FieldReference):
1422 _frozenconfigdict_valid_input(obj.get(), ancestor_list)
~/opt/anaconda3/lib/python3.7/site-packages/ml_collections/config_dict/config_dict.py in _frozenconfigdict_valid_input(obj, ancestor_list)
1424 for element in obj:
1425 if isinstance(element, (dict, ConfigDict, FieldReference)):
-> 1426 raise ValueError('Bad FrozenConfigDict initialization: Cannot '
1427 'contain a dict, ConfigDict, or FieldReference '
1428 'within a list or tuple.')
ValueError: Bad FrozenConfigDict initialization: Cannot contain a dict, ConfigDict, or FieldReference within a list or tuple.
Describe the bug
ConfigDict() does not allow a dictionary with int keys.
To Reproduce
from ml_collections import ConfigDict
tmp = ConfigDict()
tmp.demo0 = 10 # no bug
tmp.demo1 = '10' # no bug
tmp.demo2 = {'0': '10', '1': '20'} # no bug
tmp.demo3 = {0: '10', 1: '20'} # bug
ConfigDict / ConfigFlags versions: I'm not sure how to provide these, so I created a brand-new conda virtual env to demonstrate the bug.
~$ conda create --name bug python=3.8
~$ conda activate bug
~$ pip install ml-collections
~$ python
from ml_collections import ConfigDict
tmp = ConfigDict()
tmp.demo = {0: '10', 1: '20'}
Expected behavior
The code should run and save the dictionary without issue.
Environment:
Additional context
When I change the key to str type, the code runs. However, I want to be able to use int keys.
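Until int keys are supported, a possible stdlib-only workaround (not a fix in ml_collections itself) is to stringify int keys before assigning the dict to a ConfigDict field and convert them back on read:

```python
def encode_keys(d):
    """Stringify int keys so the dict is safe to store in a ConfigDict."""
    return {str(k): v for k, v in d.items()}


def decode_keys(d):
    """Restore the original int keys when reading the dict back."""
    return {int(k): v for k, v in d.items()}


data = {0: '10', 1: '20'}
stored = encode_keys(data)        # safe to assign, e.g. tmp.demo3 = stored
assert decode_keys(stored) == data
```

This round-trips cleanly as long as all keys really are ints.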
I'm currently using the following pattern to load a config from dictionary
from absl import app, flags
from ml_collections import config_flags
from .config import MY_CONFIG_DICT
FLAGS = flags.FLAGS
_CONFIG = config_flags.DEFINE_config_from_dict('config', MY_CONFIG_DICT)
# ...
absl allows me to alter the config from the command line - this is great for running experiments with different parameters. However, if I want to re-use the same configuration for another run, there doesn't seem to be a de-facto way to do this.
For example, I want to be able to run:
python -m src.experiment --config.value1 10 --config.value2 20
and then be able to re-use this config without having to remember the flags.
How can we save and re-load a ConfigDict instance when using absl flags?
My current approach to this is:
1. In the experiment script: save the ConfigDict to a .yml file.
2. In a postprocessing script: load the .yml file.
# postprocessing.py
import yaml

from absl import flags
from ml_collections import config_dict, config_flags

from .config import MY_CONFIG_DICT

FLAGS = flags.FLAGS
_CONFIG = config_flags.DEFINE_config_from_dict('config', MY_CONFIG_DICT)
_LOAD_CONFIG = flags.DEFINE_string('load_config', None, '.yml file from which to update config.')

def update_config():
    def _update_config_dict(a, b) -> config_dict.ConfigDict:
        da = a.to_dict(preserve_field_references=True)
        db = b.to_dict(preserve_field_references=True)
        return config_dict.ConfigDict(da | db)

    with open(FLAGS.load_config, 'r') as f:
        config_from_yaml = yaml.load(f, Loader=yaml.UnsafeLoader)

    FLAGS.config = _update_config_dict(FLAGS.config, config_from_yaml)

def main(_):
    update_config()
    # ...
While this works it doesn't seem a great approach.
Is there any standard way to accomplish this - or should I be dealing with updated configs in a different manner?
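The round trip itself can be sketched with nothing but the stdlib. The example below uses plain dicts and JSON for illustration; with ml_collections you would dump cfg.to_dict() on the experiment side and wrap the loaded dict back in config_dict.ConfigDict on the postprocessing side. The values here are made up:

```python
import json
import os
import tempfile

# Experiment script side: persist the final (possibly CLI-overridden) config.
base = {'value1': 1, 'value2': 2}   # defaults from the config module
overrides = {'value1': 10}          # what --config.value1=10 changed
merged = {**base, **overrides}

path = os.path.join(tempfile.mkdtemp(), 'config.json')
with open(path, 'w') as f:
    json.dump(merged, f)

# Postprocessing script side: restore exactly what the experiment ran with.
with open(path) as f:
    reloaded = json.load(f)

assert reloaded == {'value1': 10, 'value2': 2}
```

Persisting the fully-resolved config at experiment time avoids having to re-supply the flags later.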
Say I have a configuration like
configs = ConfigDict()
configs.num_neurons = 15
configs.activation = "relu"
Now I would like to delete activation. Is it possible?
Describe the bug
When using config_flags.DEFINE_config_dict we are unable to use --flagfile to override values; we get the error:
FATAL Flags parsing error: Unknown command line flag ...
To Reproduce
# test.py
from absl import app, flags
from ml_collections import config_dict, config_flags
import copy
CONFIG = config_dict.ConfigDict(
    {
        'animal': {
            'cat': 10,
            'dog': 20
        },
        'vehicle': {
            'car': 30,
            'train': 40
        }
    }
)

FLAGS = flags.FLAGS
# _CONFIG = config_flags.DEFINE_config_dict('config', copy.deepcopy(CONFIG))
_CONFIG = config_flags.DEFINE_config_dict('config', CONFIG)

def main(_):
    print(FLAGS.config)

if __name__ == '__main__':
    app.run(main)
# flagfile.txt
--config.animal.cat=5000
$ python test.py --flagfile=flagfile.txt
FATAL Flags parsing error: Unknown command line flag 'config.animal.cat'
Pass --helpshort or --helpfull to see help on flags.
Expected behavior
absl should be able to read the changes to the config from flagfile.txt; something is going wrong here.
Environment:
Describe the bug
The requirements.txt file is missing from the source tarball that was pushed to PyPI (see https://pypi.org/project/ml-collections/0.1.0/#files). That means installing from that source tarball is broken, because setup.py tries to read requirements.txt, which leads to:
Running setup.py (path:/tmp/eb-hdz3kwtm/pip-req-build-7djg6soh/setup.py) egg_info for package from file:///tmp/ml_collections/ml_collections-0.1.0
Created temporary directory: /tmp/eb-hdz3kwtm/pip-pip-egg-info-s7iw7xqe
Running command python setup.py egg_info
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/tmp/eb-hdz3kwtm/pip-req-build-7djg6soh/setup.py", line 42, in <module>
install_requires=_parse_requirements('requirements.txt'),
File "/tmp/eb-hdz3kwtm/pip-req-build-7djg6soh/setup.py", line 23, in _parse_requirements
with open(requirements_txt_path) as fp:
FileNotFoundError: [Errno 2] No such file or directory: 'requirements.txt'
To Reproduce
Download ml_collections-0.1.0.tar.gz from https://pypi.org/project/ml-collections/0.1.0/#files, unpack it, and run pip install .
ConfigDict: (none, irrelevant)
ConfigFlags: (none, irrelevant)
Expected behavior
Installing from PyPI source tarball works.
Environment:
Additional context
(none)
Dataclasses are still the way to go for some libraries. Is there any simple way to convert a dataclass object into a ConfigDict?
Hi, I've tried to install ml-collections with pip install ml-collections in a terminal window in PyCharm, but it returns ERROR: No .egg-info directory found in C:\xxx\xxx..
Did this happen because the version of Python doesn't satisfy the requirements, or something else? Could you please tell me how to install this package successfully?
I find this behavior surprising with regards to the type safety ConfigDict offers:
c = ConfigDict({'a': 1})
c.a = None
c.a = 'string'
I've managed to change the type of a from integer to string without any invocation of ignore_type. Would it perhaps make sense for the a field to "remember" that it was originally an int?
This is on the master branch (cd29283).
Hi,
I have found the private function _apply_op to be quite useful for custom lazy computation. For example:
import ml_collections
structures = ml_collections.FieldReference(
    [
        "parotid_right",
        "parotid_left",
        "brainstem",
        "spinal_cord",
    ],
    field_type=list,
)
################################################
number_of_structures = structures._apply_op(len)
number_of_structures._field_type = int
################################################
edge_length = ml_collections.FieldReference(512, field_type=int)
cfg = ml_collections.ConfigDict(
    {
        "structures": structures,
        "edge_length": edge_length,
        "mask_shape": (edge_length, edge_length, number_of_structures),
    }
)
assert cfg.mask_shape == (512, 512, 4) # Passes
cfg.structures = ["brainstem", "spinal_cord"]
cfg.edge_length = 64
assert cfg.mask_shape == (64, 64, 2) # Passes
Might it be possible for this method to be exposed as a first-class method named apply_op (or more simply apply)? I'd be happy to make a PR, including the above as a test, if there is interest in making this design adjustment.
Here is an example implementation:
def apply(parent: ml_collections.FieldReference, fn, field_type=None):
    child = parent._apply_op(fn)  # pylint: disable=protected-access
    if field_type is not None:
        child._field_type = field_type  # pylint: disable=protected-access
    a_child_value = child.get()
    if not isinstance(
        a_child_value, child._field_type  # pylint: disable=protected-access
    ):
        raise TypeError(
            f"Expected operation result to be of type {field_type}. "
            f"Instead however it was {type(a_child_value)}. "
            "Adjust the `field_type` parameter accordingly."
        )
    return child
and a corresponding test:
def test_config_apply():
    fr = ml_collections.FieldReference("a,b,c", field_type=str)

    def fn(string):
        return string.split(",")

    with pytest.raises(TypeError):
        apply(fr, fn)

    new_fr = apply(fr, fn, field_type=list)
    assert new_fr.get() == ["a", "b", "c"]
Cheers,
Simon
***run:
import ml_collections
import os
import logging

def get_logger(logpath, filepath, package_files=[], displaying=True, saving=True, debug=False):
    logger = logging.getLogger(__name__)
    if debug:
        level = logging.DEBUG
    else:
        level = logging.INFO
    logger.setLevel(level)
    if saving:
        info_file_handler = logging.FileHandler(logpath, mode="a")
        info_file_handler.setLevel(level)
        logger.addHandler(info_file_handler)
    if displaying:
        console_handler = logging.StreamHandler()
        console_handler.setLevel(level)
        formatter = logging.Formatter('%(asctime)s -- [%(levelname)s] -- %(message)s')
        console_handler.setFormatter(formatter)
        logger.addHandler(console_handler)
    logger.info(filepath)
    with open(filepath, "r") as f:
        logger.info(f.read())
    for f in package_files:
        logger.info(f)
        with open(f, "r") as package_f:
            logger.info(package_f.read())
    return logger

log = get_logger(logpath='../exp/log.txt', filepath=os.path.abspath(__file__))
log.info('fasd')
***output:
2023-01-13 18:14:09,523 -- [INFO] -- fasd
I0113 18:14:09.523282 139717526757952 vit_seg_configs.py:164] fasd
The second line of output is garbled.
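The second line looks like absl's own log format, so the likely cause is that the record propagates to the root logger, where absl's handler emits it a second time. Disabling propagation on your own logger is one common fix; a stdlib-only sketch (the logger name here is made up):

```python
import logging

logger = logging.getLogger("my_experiment")   # hypothetical logger name
logger.setLevel(logging.INFO)

handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('%(asctime)s -- [%(levelname)s] -- %(message)s'))
logger.addHandler(handler)

# Keep records from reaching the root logger, where absl's handler would
# re-emit them in its own format.
logger.propagate = False

logger.info("fasd")   # now emitted once, by your handler only
```

Alternatively, absl.logging offers ways to hand its output over to the standard logging module, but stopping propagation is the smallest change to the snippet above.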