Comments (3)
hey @hSterz thanks for your question!
the reason is that RoBERTa is not natively supported by pyvene (pyreft's parent library). we thus use RoBERTa to show that we can work with any torch model.
to use RoBERTa, you have to set up your config with string access to the component (e.g. "model.layers[0].output"; note that this is an example for a llama model, but the same concept applies here):
# get reft model
reft_config = pyreft.ReftConfig(representations={
    "layer": 15, "component": "block_output",
    # alternatively, you can specify as string component access,
    # "component": "model.layers[0].output",
    "low_rank_dimension": 4,
    "intervention": pyreft.LoreftIntervention(embed_dim=model.config.hidden_size,
                                              low_rank_dimension=4)})
reft_model = pyreft.get_reft_model(model, reft_config)
reft_model.set_device("cuda")
reft_model.print_trainable_parameters()
"""
trainable intervention params: 32,772 || trainable model params: 0
model params: 6,738,415,616 || trainable%: 0.00048634578018881287
"""
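As a rough sanity check on the printout above, the trainable count can be reproduced by hand. This is only a sketch under two assumptions that the thread does not state: that the model's hidden size is 4096, and that a LoReFT intervention consists of one low-rank projection matrix plus one linear layer with a bias.

```python
# Assumed shapes (not stated in the thread): hidden size 4096, and a LoReFT
# intervention made of a low-rank projection plus a linear layer with bias.
hidden_size = 4096        # assumption: consistent with the 7B model in the printout
low_rank_dimension = 4    # taken from the config above

projection_params = hidden_size * low_rank_dimension
linear_params = hidden_size * low_rank_dimension + low_rank_dimension  # weight + bias
trainable = projection_params + linear_params

model_params = 6_738_415_616  # taken from the printout
trainable_pct = 100 * trainable / model_params

print(trainable)      # 32772
print(trainable_pct)
```

If the assumptions hold, both numbers line up with what print_trainable_parameters() reports.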
in our actual code, you can see how we did it as well here:
https://github.com/stanfordnlp/pyreft/blob/main/examples/loreft/train.py#L286
Thank you for the reply @frankaging. My question is: how can I load a ReFT module that was added and trained as described in your example?
@hSterz got it! so, if the model is natively supported by pyvene (supported models can be found here), you can load the model as,
reft_model = pyreft.ReftModel.load("<your_directory>", model)
if the model is not supported by pyvene, you have to either (1) add the support in pyvene and reinstall pyvene, or (2) reinitialize the pyreft model and load the weights manually yourself. All the interventions can be accessed as reft_model.interventions.
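For option (2), one possible way to save and restore the intervention weights by hand is sketched below. It is not pyreft's own loading code: the helper names and the one-file-per-intervention layout are hypothetical, and it assumes each value in reft_model.interventions is either a torch module or a tuple whose first element is one (this may differ between pyvene versions).

```python
import os

import torch


def _unwrap(entry):
    # Assumption: an entry is either the intervention module itself or a
    # tuple whose first element is the intervention module.
    return entry[0] if isinstance(entry, tuple) else entry


def save_interventions(interventions, directory):
    """Write one state-dict file per intervention (hypothetical file layout)."""
    os.makedirs(directory, exist_ok=True)
    for index, key in enumerate(sorted(interventions)):
        path = os.path.join(directory, f"intervention_{index}.pt")
        torch.save(_unwrap(interventions[key]).state_dict(), path)


def load_interventions(interventions, directory, device="cpu"):
    """Load weights into already-initialized interventions, in key order."""
    for index, key in enumerate(sorted(interventions)):
        path = os.path.join(directory, f"intervention_{index}.pt")
        state = torch.load(path, map_location=device)
        _unwrap(interventions[key]).load_state_dict(state)
```

The idea would be to call save_interventions(reft_model.interventions, "<your_directory>") after training, then rebuild the model with the same ReftConfig via pyreft.get_reft_model and call load_interventions on the new model's interventions. The keys must match between the two models, since files are paired with interventions by sorted key order.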
let me know if these help.