Describe the bug There seems to be a mismatch betwe

Thanks for reporting the issue <a class="user-mention notranslate" data-hovercard-type

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Thanks for your research and explanation <a class="user-mention notranslate" data-hove

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Hi <a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="

upload_to_hf_hub model path mismatch with model.save,about ludwig-ai/ludwig

Comments (16)

osma commented on June 4, 2024 1

FWIF, I was able to work around the problem by creating the directory finetuned-model/model and moving all files below that directory.

from ludwig.

arnavgarg1 commented on June 4, 2024 1

Thanks for reporting the issue @osma and I'm glad you were able to find a workaround. @alexsherstinsky Will be taking a look at it over the next few days

from ludwig.

alexsherstinsky commented on June 4, 2024 1

@sanjaydasgupta Has graciously agreed to work on this issue.

from ludwig.

sanjaydasgupta commented on June 4, 2024 1

After poking around a bit in the code-base here is what I learned:

Changing the implementation of Ludwig.api.LudwigModel.save() so that it first suffixes a literal /model to the provided save_path (an intuitive but naive solution) breaks 7 or 8 tests. Inspection of those tests reveals that save() has a twin -- predictably named load(). Also, save() and load() are designed to play well together as is.

Therefore, changing save() so that it works more intuitively with upload_to_hf_hub() will entail changes to load() and possibly other things -- probably be a lot of work. One alternative to save that effort is to improve the documentation of save() to clearly state that the save_path provided must end in a model. That same save_path can then be passed to upload_to_hf_hub() successfully.

@alexsherstinsky please let me know what you think.

from ludwig.

alexsherstinsky commented on June 4, 2024 1

@sanjaydasgupta and @alexsherstinsky met to discuss this and agreed that there is a design issue that needs to be studied deeper, potentially affecting "ludwig/utils/upload_utils.py" and modules that use it.

from ludwig.

alexsherstinsky commented on June 4, 2024 1

@osma Your criticism is absolutely valid. Last night (Pacific Time), @sanjaydasgupta and I had a call to brainstorm how this can be improved. Your filing this issue and continued comments have been very helpful, and so your participation going forward is welcome. One idea that @sanjaydasgupta is currently evaluating is: What if we just modify _validate_upload_parameters() in ludwig/utils/upload_utils.py, such that we do not hard code the path in line 102 (trained_model_artifacts_path = os.path.join(model_path, "model", "model_weights")) but instead only use model_path -- and then enhance the validation so as to make sure that the files we need from the subdirectories model and model_weights are there. In fact, we already have the validation for the model_weights directory contents; we would just need to add the validation for the directory one above (i.e., model) for things that we care about, such as the hyperparameter configuration, which @sanjaydasgupta added last week. Then any model_path would work as long as its contents pass this validation; in this case, modifying tests would be acceptable. What do you both think? Thank you.

from ludwig.

sanjaydasgupta commented on June 4, 2024

#take

from ludwig.

osma commented on June 4, 2024

Thanks for your research and explanation @sanjaydasgupta !

From a naïve user perspective, what surprised me is that upload_to_hf_hub() requires a literal model directory. So instead of changing the implementation of save() and load(), which undoubtedly would cause many problems, I would suggest instead looking at whether this requirement in upload_to_hf_hub() could be lifted. The simplest possible fix would be to remove "model" from line 99 of upload_utils.py, which I quoted in my OP.

from ludwig.

alexsherstinsky commented on June 4, 2024

@sanjaydasgupta and @osma Thank you very much for reporting this issue and the detailed analysis.

@sanjaydasgupta Is it possible to investigate the implementation from the first principles? Let us figure out what is happening first, and then discuss potential fixes with backward-compatibility in mind to avoid breaking changes.

We know that the default results relative directory is /results/api_experiment_run/model/model_weights where the next model will be saved into /results/api_experiment_run_1/model/model_weights and then next into /results/api_experiment_run_2/model/model_weights, and so on. This is what we pass to upload_to_hf_hub(). The reason for that is that HuggingFace expects only such files as:

adapter_config.json
adapter_model.safetensors
README.md

because this is all that is needed for loading the adapter. The configuration file, which you added in your previous PR is needed for Ludwig fine-tuning reproducibility concerns, but it is not needed for loading adapters.

So it looks like the custom name finetuned-model replaced the generic api_experiment_run, api_experiment_run_1, api_experiment_run_2, etc. directory names, with the rest of the structure (/model and model_weights) preserved. Thus, loading to HuggingFace would be done using:

LudwigModel.upload_to_hf_hub(MY_HF_MODEL_ID, "/results/finetuned-model/model/model_weights")

(Hence, I would prefer using the name finetuned_model as traditionally with the underscore for clarity when saving and loading the model -- because this is the path on the local filesystem, not an ID in a cloud storage provider.)

If we try it and it works, then it means that we have to perhaps update the documentation in order to avoid confusion.

Please let me know what you think. Thank you!

from ludwig.

sanjaydasgupta commented on June 4, 2024

Hi @alexsherstinsky and @osma, thank you very much for your inputs. Allow me to add some more information about these two functions:

Both, LudwigModel.save() and LudwigModel.upload_to_hf_hub() take a path parameter as input. But the meaning of that parameter is different. Compare the published documentation for the two functions:

save()'s parameter save_path is defined as: path to the directory where the model is going to be saved. Both a JSON file containing the model architecture hyperparameters and checkpoints files containing model weights will be saved.
upload_to_hf_hub()'s parameter model_path is defined as: The path of the saved model. This is the top level directory where the models weights as well as other associated training artifacts are saved.

Despite the somewhat similar beginning of those two descriptions, they are different. Referring to the screenshot (see below) of a typical Ludwig run, upload_to_hf_hub() needs its model_path parameter to point to the api_experiment_run directory. But save() creates the contents of the model directory (as seen in the screenshot) in the directory pointed to by save_path.

This difference has existed since before PR-3965.

We can change the implementation of upload_to_hf_hub() to examine the contents of model_path, determine what it has been passed, and then do the right thing. That will also protect any existing code that uses it. upload_to_hf_hub() does not have any in-project clients, so it is easier and safer to change it's implementation rather than save()'s.

from ludwig.

alexsherstinsky commented on June 4, 2024

@sanjaydasgupta So it looks like the implementation is correct, it is just that possibly the documentation may need to be improved. The purpose of these functions is indeed different -- they are not symmetric. The save() method is Ludwig focused -- it saves the various artifacts that Ludwig creates, including the hyperparameters (the full config) and the actual model weights. However, the HuggingFace upload only concerns itself with the latter (and we recently added the config). So if we add any protection/checks, we should do it in a way that constitutes a non-breaking change for all users thus far. Thank you very much!

from ludwig.

sanjaydasgupta commented on June 4, 2024

Thanks for your research and explanation @sanjaydasgupta !

From a naïve user perspective, what surprised me is that upload_to_hf_hub() requires a literal model directory. So instead of changing the implementation of save() and load(), which undoubtedly would cause many problems, I would suggest instead looking at whether this requirement in upload_to_hf_hub() could be lifted. The simplest possible fix would be to remove "model" from line 99 of upload_utils.py, which I quoted in my OP.

LudwigModel.upload_to_hf_hub() is a thin wrapper that just invokes ludwig.utils.upload_utils.HuggingFaceHub.upload(). All implementation, including parameter checks, occurs after that point. Unfortunately there appear to be 10 or more other execution paths that run through HuggingFaceHub.upload(), so removing the requirement that the directory be named model will likely have a large impact.

from ludwig.

sanjaydasgupta commented on June 4, 2024

Hi @osma, I am trying to get a better understanding of your use-case.

It seems you had already fine-tuned an LLM, and then wanted to upload the fine-tuned model to HF. Normally, in this situation, calling upload_to_hf_hub() with a model_path pointing to the api_experiment_run directory is sufficient (it is not necessary to explicitly call save() before calling upload_to_hf_hub()). Am I missing something about your use-case?

from ludwig.

sanjaydasgupta commented on June 4, 2024

Thanks for the detailed discussion @alexsherstinsky.

from ludwig.

osma commented on June 4, 2024

@sanjaydasgupta

It seems you had already fine-tuned an LLM, and then wanted to upload the fine-tuned model to HF. Normally, in this situation, calling upload_to_hf_hub() with a model_path pointing to the api_experiment_run directory is sufficient (it is not necessary to explicitly call save() before calling upload_to_hf_hub()). Am I missing something about your use-case?

Yes, you are right. I was just learning to use Ludwig, trying to replace a brittle set of Python fine-tuning code (based on transformers, trl, bitsandbytes, peft...) with a higher level approach. So my mindset was that I first want to fine-tune a model, then save it (so it won't get lost after spending a long time fine-tuning it), then run some inference tests to make sure that the model does what I want to. I was looking at tutorials like the Zephyr example from Predibase (which is half Ludwig tutorial, half marketing of the Predibase API service, which seems to be a common pattern nowadays).

After a few iterations of testing, I was happy with the fine-tuned model, so I wanted to publish it on HF Hub. I looked at Ludwig API specs and found the promising upload_to_hf_hub method, so I tried to use it, and ended up in the situation described in the OP.

Nowhere in my thought process did I really think about the api_experiment_run directory. To me, that was some internal housekeeping stuff that Ludwig handles, not interesting unless I want to dig deeper into the details of the fine-tuning process (and my goal was rather to get away from lower level stuff into a higher abstraction level). I'm used to other tools and APIs for machine learning such as scikit-learn, Keras, torch, trl etc. and maybe my expectations were influenced by how those work; for example model.train() and model.save() are pretty common operations in most if not all of these.

I hope that explains my thinking. I'm just a beginner, trying to get an understanding of tools like Ludwig. Maybe reading some other documentation would have helped me avoid the problem. But I think you may have some deeper conceptual issues that could be solved to make your tool easier to learn and apply.

from ludwig.

alexsherstinsky commented on June 4, 2024

PR #3977 addresses this issue; hence, closing. Thank you @osma for reporting it and to @sanjaydasgupta for fixing it!

from ludwig.

upload_to_hf_hub model path mismatch with model.save about ludwig HOT 16 CLOSED

Comments (16)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent