
rtfm's Issues

Inconsistent parameters

The readme states (rtfm/README.md, line 67 in 9884a6b)

python scripts/utils/prepare_csv_for_eval.py --output_dir ./eval_tasks/my_task

which gives

ERROR: The function received no value for the required argument: target_colname
Usage: prepare_csv_for_eval.py CSV OUT_DIR TARGET_COLNAME TO_REGRESSION

so output_dir should be out_dir, to_regression is missing, and, most importantly, target_colname is missing. However, even when I provide it, it gets ignored in

generate_files_from_csv(csv, out_dir, to_regression=to_regression)

It is then overwritten with a value inferred here:

target_colname = df.columns[-1]

To solve this, target_colname should be optional, and only when it is not given should it be inferred from df.columns[-1]. Also, to_regression should have a default: to_regression: bool = False. What do you think?
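
A minimal sketch of what that could look like (the parameter names come from the Usage string above; whether generate_files_from_csv can accept a target_colname override is an assumption):

from typing import Optional

import pandas as pd

def prepare_csv_for_eval(
    csv: str,
    out_dir: str,
    target_colname: Optional[str] = None,  # optional; inferred if omitted
    to_regression: bool = False,  # proposed default
):
    df = pd.read_csv(csv)
    # Only infer the target column when the caller did not provide one.
    if target_colname is None:
        target_colname = df.columns[-1]
    # Assumes generate_files_from_csv is extended to accept the target column.
    generate_files_from_csv(
        csv, out_dir, target_colname=target_colname, to_regression=to_regression
    )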

Using wrong files for training

In the readme (rtfm/README.md, line 41 in 9884a6b) there is

...
  --train-task-file "./sampledata/v6.0.3-serialized/test/test-files.txt" \
  --eval-task-file "./sampledata/v6.0.3-serialized/train/train-files.txt" \
...

however --train-task-file should point to train/train-files.txt and --eval-task-file to test/test-files.txt; the two appear to be swapped, right?
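
Presumably the corrected lines should read:

...
  --train-task-file "./sampledata/v6.0.3-serialized/train/train-files.txt" \
  --eval-task-file "./sampledata/v6.0.3-serialized/test/test-files.txt" \
...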

Not usable absolute paths

In

https://github.com/mlfoundations/rtfm/blob/main/sampledata/v6.0.3-serialized/test/test-files.txt

there are absolute paths specific to one user's machine, such as

/Users/jpgard/Documents/github/tablm/sampledata/v6.0.3-serialized/test/test-000002.tar

however they should be changed to relative paths, e.g.

rtfm/sampledata/v6.0.3-serialized/test/test-000002.tar

as it is also done in

https://github.com/mlfoundations/rtfm/blob/main/sampledata/v6.0.3-serialized/train/train-files.txt

But since all other paths are relative to the rtfm directory itself, it should actually be

sampledata/v6.0.3-serialized/test/test-000002.tar

and the rtfm/ prefix should then also be dropped in the train file list.
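
Until the lists are fixed upstream, a quick local workaround is to strip the prefix in place (a sketch; the prefix and file path are taken from the examples above):

prefix = "/Users/jpgard/Documents/github/tablm/"
list_path = "sampledata/v6.0.3-serialized/test/test-files.txt"

# Drop the user-specific prefix from every entry in the file list.
with open(list_path) as f:
    lines = [line.replace(prefix, "", 1) for line in f]
with open(list_path, "w") as f:
    f.writelines(lines)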

RuntimeError: Invalid device string: 'cuda:None'

During training (on a Tesla V100-PCIE-16GB) I get the following error:

Train:   0%|                                                                                                                                           | 0/10 [00:00<?, ?it/s]Traceback (most recent call last):
  File "/anaconda/envs/rtfm/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/anaconda/envs/rtfm/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-medekm-gpu/code/Users/michael.medek/rtfm/rtfm/finetune.py", line 451, in <module>
    main(
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-medekm-gpu/code/Users/michael.medek/rtfm/rtfm/finetune.py", line 408, in main
    results = train(
  File "/mnt/batch/tasks/shared/LS_root/mounts/clusters/dev-medekm-gpu/code/Users/michael.medek/rtfm/rtfm/train_utils.py", line 274, in train
    batch[key] = batch[key].to(f"cuda:{local_rank}")
RuntimeError: Invalid device string: 'cuda:None'
Train:   0%| 

This traces back to

batch[key] = batch[key].to(f"cuda:{local_rank}")

where local_rank is None, hence Invalid device string: 'cuda:None'. How is this supposed to work? The function's default is local_rank=None, which should be invalid, since it must be an int, right? In evaluate() the annotation is simply local_rank: int.

By adding

local_rank = 0
rank = 0
print("WARNING! Overwriting local_rank and rank to 0!")

the issue can be worked around.
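
A less intrusive fix might be to read the rank from the environment instead of hard-coding zero (a sketch; torchrun sets LOCAL_RANK for each worker, and plain single-process runs fall back to GPU 0):

import os

# Fall back to GPU 0 when LOCAL_RANK is not set by the launcher.
local_rank = int(os.environ.get("LOCAL_RANK", 0))
batch[key] = batch[key].to(f"cuda:{local_rank}")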

TrainConfig does not contain serializer_cls

Hello! I was trying to run the inference.ipynb notebook, and I get an AttributeError in the first cell because the TrainConfig object has no attribute 'serializer_cls'.

What is the correct way to run serialization?

Best regards,
Anna Badalyan

Deprecated readme path of evaluate_checkpoint.py

In the readme there still is

python scripts/evaluate_checkpoint.py ...

however the file was moved, so it should now be

python rtfm/evaluation/evaluate_checkpoint.py

or

python -m rtfm.evaluation.evaluate_checkpoint ...
