Comments (16)
FWIW, I was able to work around the problem by creating the directory `finetuned-model/model` and moving all the files below that directory.
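The workaround above can be sketched as a few lines of Python (the directory name `finetuned-model` comes from the comment; the helper function and artifact file names are hypothetical, not Ludwig code):

```python
import os
import shutil

# Sketch of the workaround: save() wrote its artifacts directly into the
# save directory, but upload_to_hf_hub() expects them one level down,
# under a literal model/ subdirectory.
def move_under_model_subdir(save_dir: str) -> str:
    """Create save_dir/model and move save()'s files below it."""
    model_dir = os.path.join(save_dir, "model")
    os.makedirs(model_dir, exist_ok=True)
    for name in os.listdir(save_dir):
        src = os.path.join(save_dir, name)
        if os.path.isfile(src):  # leave the model/ directory itself in place
            shutil.move(src, os.path.join(model_dir, name))
    return model_dir
```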
from ludwig.
Thanks for reporting the issue @osma, and I'm glad you were able to find a workaround. @alexsherstinsky will be taking a look at it over the next few days.
@sanjaydasgupta has graciously agreed to work on this issue.
After poking around a bit in the code-base, here is what I learned: changing the implementation of `ludwig.api.LudwigModel.save()` so that it first suffixes a literal `/model` to the provided `save_path` (an intuitive but naive solution) breaks 7 or 8 tests. Inspection of those tests reveals that `save()` has a twin, predictably named `load()`, and that `save()` and `load()` are designed to play well together as is. Therefore, changing `save()` so that it works more intuitively with `upload_to_hf_hub()` would entail changes to `load()` and possibly other things -- probably a lot of work. One alternative that saves that effort is to improve the documentation of `save()` to clearly state that the `save_path` provided must end in `model`. That same `save_path` can then be passed to `upload_to_hf_hub()` successfully.
@alexsherstinsky please let me know what you think.
@sanjaydasgupta and @alexsherstinsky met to discuss this and agreed that there is a design issue that needs to be studied more deeply, potentially affecting `ludwig/utils/upload_utils.py` and the modules that use it.
@osma Your criticism is absolutely valid. Last night (Pacific Time), @sanjaydasgupta and I had a call to brainstorm how this can be improved. Your filing this issue and your continued comments have been very helpful, so your participation going forward is welcome. One idea that @sanjaydasgupta is currently evaluating is: what if we just modify `_validate_upload_parameters()` in `ludwig/utils/upload_utils.py` so that we do not hard-code the path in line 102 (`trained_model_artifacts_path = os.path.join(model_path, "model", "model_weights")`) but instead use only `model_path`, and then enhance the validation to make sure that the files we need from the `model` and `model_weights` subdirectories are there? In fact, we already have the validation for the `model_weights` directory contents; we would just need to add validation for the directory one above (i.e., `model`) for the things we care about, such as the hyperparameter configuration, which @sanjaydasgupta added last week. Then any `model_path` would work as long as its contents pass this validation; in this case, modifying tests would be acceptable. What do you both think? Thank you.
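The validation idea above could be sketched roughly as follows (the function name, the required-file sets, and the artifact file names are my assumptions for illustration; this is not Ludwig's actual `_validate_upload_parameters()`):

```python
import os

# Hypothetical relaxed validation: instead of hard-coding
# os.path.join(model_path, "model", "model_weights") up front, verify
# that the expected artifacts actually exist under model_path.
REQUIRED_WEIGHT_FILES = {"adapter_model.safetensors"}   # assumed name
REQUIRED_CONFIG_FILES = {"model_hyperparameters.json"}  # assumed name

def validate_model_path(model_path: str) -> bool:
    """Return True if model_path holds a model/ dir with weights and config."""
    model_dir = os.path.join(model_path, "model")
    weights_dir = os.path.join(model_dir, "model_weights")
    if not os.path.isdir(weights_dir):
        return False
    # the model_weights/ contents are already validated today ...
    if not REQUIRED_WEIGHT_FILES <= set(os.listdir(weights_dir)):
        return False
    # ... and this adds the check for the directory one above (model/)
    return REQUIRED_CONFIG_FILES <= set(os.listdir(model_dir))
```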
#take
Thanks for your research and explanation @sanjaydasgupta!
From a naïve user perspective, what surprised me is that `upload_to_hf_hub()` requires a literal `model` directory. So instead of changing the implementation of `save()` and `load()`, which undoubtedly would cause many problems, I would suggest instead looking at whether this requirement in `upload_to_hf_hub()` could be lifted. The simplest possible fix would be to remove `"model"` from line 99 of `upload_utils.py`, which I quoted in my OP.
@sanjaydasgupta and @osma Thank you very much for reporting this issue and the detailed analysis.
@sanjaydasgupta Is it possible to investigate the implementation from first principles? Let us figure out what is happening first, and then discuss potential fixes with backward compatibility in mind to avoid breaking changes.
We know that the default results relative directory is `/results/api_experiment_run/model/model_weights`, where the next model will be saved into `/results/api_experiment_run_1/model/model_weights`, the next into `/results/api_experiment_run_2/model/model_weights`, and so on. This is what we pass to `upload_to_hf_hub()`. The reason for that is that HuggingFace expects only such files as:
- `adapter_config.json`
- `adapter_model.safetensors`
- `README.md`

because this is all that is needed for loading the adapter. The configuration file, which you added in your previous PR, is needed for Ludwig fine-tuning reproducibility, but it is not needed for loading adapters.
So it looks like the custom name `finetuned-model` replaced the generic `api_experiment_run`, `api_experiment_run_1`, `api_experiment_run_2`, etc. directory names, with the rest of the structure (`/model` and `model_weights`) preserved. Thus, uploading to HuggingFace would be done using:
`LudwigModel.upload_to_hf_hub(MY_HF_MODEL_ID, "/results/finetuned-model/model/model_weights")`
(Hence, I would prefer using the name `finetuned_model`, traditionally with the underscore, for clarity when saving and loading the model -- because this is a path on the local filesystem, not an ID in a cloud storage provider.)
If we try it and it works, then perhaps it means that we have to update the documentation in order to avoid confusion.
Please let me know what you think. Thank you!
Hi @alexsherstinsky and @osma, thank you very much for your inputs. Allow me to add some more information about these two functions:
Both `LudwigModel.save()` and `LudwigModel.upload_to_hf_hub()` take a path parameter as input, but the meaning of that parameter is different. Compare the published documentation for the two functions:
- `save()`'s parameter `save_path` is defined as: "path to the directory where the model is going to be saved. Both a JSON file containing the model architecture hyperparameters and checkpoints files containing model weights will be saved."
- `upload_to_hf_hub()`'s parameter `model_path` is defined as: "The path of the saved model. This is the top level directory where the models weights as well as other associated training artifacts are saved."

Despite the somewhat similar beginnings of those two descriptions, they are different. Referring to the screenshot (see below) of a typical Ludwig run, `upload_to_hf_hub()` needs its `model_path` parameter to point to the `api_experiment_run` directory, whereas `save()` creates the contents of the `model` directory (as seen in the screenshot) in the directory pointed to by `save_path`.
This difference has existed since before PR-3965.
We can change the implementation of `upload_to_hf_hub()` to examine the contents of `model_path`, determine what it has been passed, and then do the right thing. That will also protect any existing code that uses it. `upload_to_hf_hub()` does not have any in-project clients, so it is easier and safer to change its implementation rather than `save()`'s.
@sanjaydasgupta So it looks like the implementation is correct; it is just that the documentation may need to be improved. The purpose of these functions is indeed different -- they are not symmetric. The `save()` method is Ludwig-focused: it saves the various artifacts that Ludwig creates, including the hyperparameters (the full config) and the actual model weights. However, the HuggingFace upload only concerns itself with the latter (and we recently added the config). So if we add any protection/checks, we should do it in a way that constitutes a non-breaking change for all users thus far. Thank you very much!
> Thanks for your research and explanation @sanjaydasgupta!
> From a naïve user perspective, what surprised me is that `upload_to_hf_hub()` requires a literal `model` directory. So instead of changing the implementation of `save()` and `load()`, which undoubtedly would cause many problems, I would suggest instead looking at whether this requirement in `upload_to_hf_hub()` could be lifted. The simplest possible fix would be to remove `"model"` from line 99 of `upload_utils.py`, which I quoted in my OP.
`LudwigModel.upload_to_hf_hub()` is a thin wrapper that just invokes `ludwig.utils.upload_utils.HuggingFaceHub.upload()`. All of the implementation, including the parameter checks, occurs after that point. Unfortunately, there appear to be 10 or more other execution paths that run through `HuggingFaceHub.upload()`, so removing the requirement that the directory be named `model` will likely have a large impact.
Hi @osma, I am trying to get a better understanding of your use-case.
It seems you had already fine-tuned an LLM and then wanted to upload the fine-tuned model to HF. Normally, in this situation, calling `upload_to_hf_hub()` with a `model_path` pointing to the `api_experiment_run` directory is sufficient (it is not necessary to explicitly call `save()` before calling `upload_to_hf_hub()`). Am I missing something about your use-case?
Thanks for the detailed discussion @alexsherstinsky.
> It seems you had already fine-tuned an LLM, and then wanted to upload the fine-tuned model to HF. Normally, in this situation, calling `upload_to_hf_hub()` with a `model_path` pointing to the `api_experiment_run` directory is sufficient (it is not necessary to explicitly call `save()` before calling `upload_to_hf_hub()`). Am I missing something about your use-case?
Yes, you are right. I was just learning to use Ludwig, trying to replace a brittle set of Python fine-tuning code (based on transformers, trl, bitsandbytes, peft...) with a higher-level approach. So my mindset was that I first want to fine-tune a model, then save it (so it won't get lost after I've spent a long time fine-tuning it), then run some inference tests to make sure that the model does what I want it to. I was looking at tutorials like the Zephyr example from Predibase (which is half Ludwig tutorial, half marketing of the Predibase API service, which seems to be a common pattern nowadays).
After a few iterations of testing, I was happy with the fine-tuned model, so I wanted to publish it on HF Hub. I looked at the Ludwig API specs and found the promising `upload_to_hf_hub()` method, so I tried to use it, and ended up in the situation described in the OP.
Nowhere in my thought process did I really think about the `api_experiment_run` directory. To me, that was internal housekeeping stuff that Ludwig handles, not interesting unless I want to dig deeper into the details of the fine-tuning process (and my goal was rather to get away from lower-level stuff to a higher abstraction level). I'm used to other tools and APIs for machine learning such as scikit-learn, Keras, torch, trl etc., and maybe my expectations were influenced by how those work; for example, `model.train()` and `model.save()` are pretty common operations in most if not all of these.
I hope that explains my thinking. I'm just a beginner trying to get an understanding of tools like Ludwig. Maybe reading some other documentation would have helped me avoid the problem. But I think you may have some deeper conceptual issues that could be solved to make your tool easier to learn and apply.
PR #3977 addresses this issue; hence, closing. Thank you @osma for reporting it and to @sanjaydasgupta for fixing it!