Error on fine tuning paligemma for object detection about transformers HOT 7 OPEN

hadariru commented on June 29, 2024

Error on fine tuning paligemma for object detection


Comments (7)

hadariru commented on June 29, 2024

@molbap
Yes, the evaluation part is giving me an error.
Training itself works fine: I can see that the fine-tuning is working (I checked by running prediction on the training data).


SangbumChoi commented on June 29, 2024

@muellerz @molbap @hadariru I think this happens because the trainer accepts the case where the loss is None.

transformers/src/transformers/trainer.py

Line 3765 in ab0f050

if loss is not None:

When the loss is None, the losses variable is never defined, because calling the gather function on None in a multi-GPU setup would be useless. When the metrics are then computed, the trainer tries to del the losses variable, which fails because it was never defined.
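The failure mode described above can be reproduced in a few lines of plain Python. This is a toy stand-in, not the actual Trainer code: `eval_step` and `run` are hypothetical names, and the list assignment only stands in for gathering the loss across processes.

```python
def eval_step(loss):
    """Toy stand-in for one evaluation step (hypothetical, not Trainer code)."""
    if loss is not None:
        # Stands in for gathering the loss across processes.
        losses = [loss]
    # ... metrics would be computed here ...
    # Cleanup: raises UnboundLocalError if the branch above was skipped.
    del losses
    return "ok"


def run(loss):
    """Run one step and report the exception class name if it fails."""
    try:
        return eval_step(loss)
    except UnboundLocalError as exc:
        return type(exc).__name__


with_loss = run(0.25)
without_loss = run(None)
```

With a real loss the step completes; with `None` the unconditional `del` hits a name that was never bound.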


amyeroberts commented on June 29, 2024

cc @molbap


molbap commented on June 29, 2024

Thanks for the issue @hadariru - just one note: it looks like the fine-tuning itself is working (i.e. the loss goes down if you don't add eval), and it's the evaluation part in Trainer that has an issue?
It seems the only way for losses to not be accessible would be prediction_step failing. cc @muellerzr in case you are familiar; I will take a look at this soon.


SangbumChoi commented on June 29, 2024

I think there are two ways to make this work

@hadariru Make sure that PaliGemma returns an appropriate loss value (check that you set the appropriate arguments).
@muellerz Or we can add an if/else statement to the trainer that checks whether that variable can be deleted.
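The second option could look roughly like the sketch below. It is an illustration of the idea only, not the patch applied in transformers: `eval_step_guarded` is a hypothetical name, and binding `losses` to `None` up front is one of several ways to make the cleanup safe.

```python
def eval_step_guarded(loss):
    """Toy evaluation step whose cleanup is safe when no loss is returned."""
    # Bind the name up front so the later `del` can never fail.
    losses = None
    if loss is not None:
        # Stands in for gathering the loss across processes.
        losses = [loss]
    gathered = losses is not None
    # ... metrics would be computed here ...
    del losses
    return gathered


gathered_with_loss = eval_step_guarded(0.5)
gathered_without_loss = eval_step_guarded(None)
```

An explicit `if loss is not None: del losses` would work as well; initialising the variable keeps the cleanup line unconditional.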


hadariru commented on June 29, 2024

@SangbumChoi
this is the model that I used

    model = PaliGemmaForConditionalGeneration.from_pretrained(
        object_detection_config.MODEL_ID,
        torch_dtype=object_detection_config.MODEL_DTYPE,
        device_map=device,
        revision=object_detection_config.MODEL_REVISION,
    )

I tried to backtrack the reason why the loss is None.
I found that, during evaluation, self.label_names is [] and loss_without_labels is False.

I am not sure what value to give label_names or how to set it on the trainer.
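As a rough model of why these two values lead to a None loss, the sketch below assumes the trainer only computes an evaluation loss when every key in `label_names` is present in the batch, or when `loss_without_labels` is set. The function `should_compute_eval_loss` and the example batch are hypothetical and simplified; this is not the Trainer implementation.

```python
def should_compute_eval_loss(batch, label_names, loss_without_labels=False):
    """Simplified decision model (hypothetical, not the actual Trainer code)."""
    # An empty label_names list means "this batch has no labels".
    has_labels = bool(label_names) and all(
        batch.get(name) is not None for name in label_names
    )
    return has_labels or loss_without_labels


# Hypothetical batch as a collator might emit it.
batch = {"input_ids": [[1, 2, 3]], "labels": [[-100, 2, 3]]}

no_names = should_compute_eval_loss(batch, label_names=[])
with_names = should_compute_eval_loss(batch, label_names=["labels"])
```

`TrainingArguments` accepts a `label_names` list, so passing `label_names=["labels"]` there may be worth trying, assuming the collator emits a `labels` key for the evaluation batches.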


hadariru commented on June 29, 2024

Changing
data_collator = partial(self.data_collator, train=False) to data_collator = partial(self.data_collator, train=True) in get_eval_dataloader

gives me this error:

Traceback (most recent call last):
  File "xxx", line 361, in <module>
    trainer.train()
  File "xxxlib/python3.11/site-packages/transformers/trainer.py", line 1885, in train
    return inner_training_loop(
           ^^^^^^^^^^^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer.py", line 2291, in _inner_training_loop
    self._maybe_log_save_evaluate(tr_loss, grad_norm, model, trial, epoch, ignore_keys_for_eval)
  File "xxxlib/python3.11/site-packages/transformers/trainer.py", line 2721, in _maybe_log_save_evaluate
    metrics = self.evaluate(ignore_keys=ignore_keys_for_eval)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer.py", line 3572, in evaluate
    output = eval_loop(
             ^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer.py", line 3780, in evaluation_loop
    all_preds.add(logits)
  File "xxxlib/python3.11/site-packages/transformers/trainer_pt_utils.py", line 326, in add
    self.tensors = nested_concat(self.tensors, tensors, padding_index=self.padding_index)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer_pt_utils.py", line 138, in nested_concat
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer_pt_utils.py", line 138, in <genexpr>
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer_pt_utils.py", line 138, in nested_concat
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer_pt_utils.py", line 138, in <genexpr>
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer_pt_utils.py", line 138, in nested_concat
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer_pt_utils.py", line 138, in <genexpr>
    return type(tensors)(nested_concat(t, n, padding_index=padding_index) for t, n in zip(tensors, new_tensors))
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer_pt_utils.py", line 140, in nested_concat
    return torch_pad_and_concatenate(tensors, new_tensors, padding_index=padding_index)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "xxxlib/python3.11/site-packages/transformers/trainer_pt_utils.py", line 99, in torch_pad_and_concatenate
    return torch.cat((tensor1, tensor2), dim=0)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: Sizes of tensors must match except in dimension 0. Expected size 454 but got size 482 for tensor number 1 in the list.
  0%|          | 10/24240 [00:27<18:35:03,  2.76s/it]
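The RuntimeError says that two evaluation batches disagree in a dimension other than 0 (454 versus 482), so they cannot be concatenated along the batch dimension as they are. A minimal pure-Python sketch of the usual remedy, padding both batches to a common length before concatenating, is shown below. The helper names and the pad value -100 are assumptions for illustration, and nested lists stand in for tensors.

```python
PAD = -100  # hypothetical padding value


def pad_to(rows, length, pad=PAD):
    """Right-pad every row to `length` with `pad`."""
    return [row + [pad] * (length - len(row)) for row in rows]


def pad_and_concat(batch_a, batch_b, pad=PAD):
    """Pad both batches to the longest row, then stack along dimension 0."""
    width = max(len(row) for row in batch_a + batch_b)
    return pad_to(batch_a, width, pad) + pad_to(batch_b, width, pad)


first = [[1, 2, 3], [4, 5, 6]]   # batch with sequence length 3
second = [[7, 8, 9, 10, 11]]     # batch with sequence length 5
merged = pad_and_concat(first, second)
```

Having the collator pad every evaluation batch to one fixed length would avoid the mismatch in the same way.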

