Comments (3)
Thanks for raising this point. Unfortunately, it could be hard, in the same sense that adaptors are not folded into the weights because of the non-linearity, and thus add compute overhead. If ReFT is applied between two linear layers, then yes, it could potentially be folded in by coupling the rotation weights with the linear layer weights.
Another reason it is hard is that we intervene not on all positions but only on a very limited set (e.g., the first n tokens and last n tokens of the prompt). These positions depend on the input, so the interventions have to happen at run time to target dynamic locations.
Hopefully this is helpful!
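To make the foldable case concrete, here is a minimal NumPy sketch. It assumes the low-rank intervention form h + R^T (W h + b - R h) described in the ReFT paper, applied at every position and feeding directly into a linear layer y = A h + c. All names and shapes are illustrative assumptions, not pyreft's API.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, d_out = 8, 2, 5  # hidden size, intervention rank, next-layer width

# Low-rank intervention parameters (assumed LoReFT-style form).
R = np.linalg.qr(rng.normal(size=(d, r)))[0].T  # (r, d), orthonormal rows
W = rng.normal(size=(r, d))
b = rng.normal(size=r)

# The linear layer that consumes the intervened hidden state.
A = rng.normal(size=(d_out, d))
c = rng.normal(size=d_out)


def intervene(h):
    """Apply h + R^T (W h + b - R h) to a single hidden vector."""
    return h + R.T @ (W @ h + b - R @ h)


# The intervention is affine in h, so it composes with the next linear layer.
A_folded = A @ (np.eye(d) + R.T @ (W - R))
c_folded = A @ (R.T @ b) + c

h = rng.normal(size=d)
unfolded = A @ intervene(h) + c   # intervention at run time, then linear layer
folded = A_folded @ h + c_folded  # single linear layer with folded weights
```

The two results agree only because nothing non-linear sits between the intervention and the layer, and because the same intervention is applied to every position.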
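As a rough illustration of why the locations are dynamic, the sketch below computes which token indices would be intervened on for prompts of different lengths. The helper `intervention_positions` and its parameters are hypothetical, written only for this example, and are not part of pyreft.

```python
def intervention_positions(prompt_len, n_first, n_last):
    """Return sorted indices of the first n_first and last n_last prompt tokens.

    Hypothetical helper: overlapping indices (short prompts) are deduplicated.
    """
    first = range(min(n_first, prompt_len))
    last = range(max(prompt_len - n_last, 0), prompt_len)
    return sorted(set(first) | set(last))


# The "last n" indices move with the prompt length, so they are only
# known once the input is seen.
short_prompt = intervention_positions(6, 2, 2)    # [0, 1, 4, 5]
long_prompt = intervention_positions(10, 2, 2)    # [0, 1, 8, 9]
tiny_prompt = intervention_positions(3, 2, 2)     # [0, 1, 2]
```

Because the set of positions changes per input, a single set of folded weights cannot reproduce the effect for every prompt.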
from pyreft.
This is helpful for understanding how it works. From tinkering with it over the last few weeks, it seems unique in how it works, and it would probably need to be built from scratch to work on anything other than torch/transformers. Is that a fair assumption? I looked into what it would take to support MLX a few weeks ago and came back to it this weekend; given that pyreft requires pyvene, it sounds like much more than a weekend project.
Fun though - really neat to be able to steer it so easily.
Closing this issue for now, as MLX support will be tracked in pyvene: stanfordnlp/pyvene#67