pratyushasharma / laser
The Truth Is In There: Improving Reasoning in Language Models with Layer-Selective Rank Reduction
Home Page: https://pratyushasharma.github.io/laser/
License: MIT License
Hi,
Great work on this codebase! Would you mind adding a license, e.g. MIT/Apache/ISC?
Thank you!
Thanks for publishing this excellent work. If I understand correctly, you run LASER intervention separately for each evaluation task.
Would it be possible to make one LASER model that is generic to all tasks? My goal is to compress LLAMA-v2-7B to be smaller, for executing faster on mobile devices.
Also, is it correct that you just apply LASER to one layer of the model? I was wondering, did you try applying it to most of the layers?
Title.
Hello! Thanks for the idea and the code. I am applying the code to my model and have two questions:
Thank you!
Can you consider implementing LASER on three-dimensional tensors? For example, use this method for the conv3d architecture.
Do you publish the rank-reduced models anywhere?
In sections 5.1 and 5.2, I have a few questions
Reference:
Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in
GPT. Advances in Neural Information Processing Systems, 36, 2022.
Thanks for the awesome work. I would like to try it out. What is the ETA on the code?
Thanks for providing the code for this promising research. I'm looking forward to seeing how far this idea can be pushed. It would be especially cool if a heuristic could be found that applies this technique to multiple layers chosen in a way that works across different models.
When I investigated the results a bit closer and ran some of the benchmarks locally, I came across some potential issues. Specifically, I took a look at the BigBench-Epistemic Reasoning benchmark, but I suspect that others could also be affected. First of all, I noticed that the accuracy of the models without intervention was below 50% (Tab. 1). For a binary classification task, this is strange. When debugging the results, I found that for Roberta and GPT-J (I haven't tested Llama), the models always predicted the same label, and since that label was used in 37% of the samples in the dataset, that is also their accuracy. As Llama has 63% accuracy with intervention, I suspect that it simply always predicts the other label.
Digging a bit deeper, I found the logits for the label tokens to be extremely small. This typically happens when the model is somehow "derailed" and wants to predict neither of the tokens. Sometimes, this simply comes down to tokenization: often, the models try to predict " True" and " False" (leading whitespace) because this is how they tokenize the text. Other times, they want to go in a completely different direction. I recommend logging the absolute probabilities of the label tokens and double-checking when they are too low. Often, this can be fixed by slight adjustments to prompts or labels.
Also, there is a typo in this prompt: "entails" => "entail".
I hope this is helpful.
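The check suggested in this issue (log the absolute probability mass on the label tokens and flag it when it is too low) could be sketched roughly as follows. This is only an illustration, not code from the repository: the function name label_token_report, the min_mass threshold, the token ids, and the toy logits are all assumptions. In practice the logits would come from the model's next-token distribution and the ids from its tokenizer, including the leading-whitespace variants such as " True".

```python
import math

def label_token_report(logits, label_token_ids, min_mass=0.5):
    """Report absolute probabilities of candidate label tokens.

    logits: list of floats over the whole vocabulary (one position).
    label_token_ids: dict mapping a label string to a token id (hypothetical ids).
    min_mass: assumed threshold below which the prediction is flagged.
    """
    # Numerically stable softmax over the full vocabulary.
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    probs = {label: exps[tok] / total for label, tok in label_token_ids.items()}
    mass = sum(probs.values())
    return {"probs": probs, "label_mass": mass, "suspicious": mass < min_mass}

# Toy vocabulary of 6 tokens: ids 1/2 stand for "True"/"False",
# ids 3/4 stand for " True"/" False" (leading whitespace).
toy_logits = [0.0, -4.0, -5.0, 3.0, 2.0, 0.5]
no_space = label_token_report(toy_logits, {"True": 1, "False": 2})
with_space = label_token_report(toy_logits, {" True": 3, " False": 4})
```

In this toy case almost no probability mass lands on the labels without leading whitespace, so an argmax over those two tokens would be close to arbitrary, while the whitespace variants carry most of the mass.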
Hi,
Thank you so much for making this project! I see that there's a CLI argument for dataset_file. Do you know what I should point it to for the counterfact method?
Thank you!
This issue contains the list of all features for the upcoming refactoring:
A unified abstract class that handles the common work, such as creating command-line arguments, constructing an LLM, and running the experiment. We may have only 1 file per LLM (or per LLM type) plus this abstract class. We may not be able to get it down to a single file, since certain LLMs like Roberta, which are really Masked Language Models, have a different procedure for computing accuracy and log-loss using the tokens.
Replace the use of rate with ρ which is used in the paper.
Add a feature to support memory reduction by storing separate U, S, V matrices rather than multiplying them back and losing the memory advantage.
Add more LLMs, specifically, Mistral and other Llama2 versions and Phi models.
Release LLMs with optimally chosen reductions from Table 3 of the paper https://arxiv.org/pdf/2312.13558.pdf.
If you have more requests, please paste them below. Do note that the first version of refactoring may not be able to do all of the above, but we'll do our best. We welcome PRs.
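To make the memory-reduction item in the list above concrete, here is a small parameter-count sketch. It assumes a weight matrix of shape m x n replaced by a rank-r factorization stored as U (m x r), S (r values), and V (n x r); the function names and the example shape are illustrative and not taken from the codebase.

```python
def dense_params(m, n):
    # Parameters in the original (or re-multiplied) m x n weight matrix.
    return m * n

def factored_params(m, n, r):
    # U is m x r, S holds r singular values, V is n x r.
    return r * (m + n + 1)

def break_even_rank(m, n):
    # Largest rank r for which the factored form is strictly smaller
    # than the dense matrix: r * (m + n + 1) < m * n.
    return (m * n - 1) // (m + n + 1)

# Hypothetical MLP weight shape, chosen only for illustration.
m, n = 4096, 16384
r_star = break_even_rank(m, n)
savings_at_rank_40 = 1 - factored_params(m, n, 40) / dense_params(m, n)
```

Multiplying U, S, and V back together always yields m * n parameters, so any storage saving exists only while the factors are kept separate and the retained rank stays below the break-even value.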
Hi! I really enjoyed the paper.
I've implemented a version that uses the Marchenko-Pastur law to speed up the search, instead of running a grid search.
If this is of interest to you, we can join efforts.
I would be glad if you could take a look at https://github.com/cognitivecomputations/laserRMT
Congratulations for your work.
Best,
Fernando
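For readers unfamiliar with the Marchenko-Pastur idea mentioned in this issue, a generic random-matrix heuristic is sketched below. It may differ from what laserRMT actually does. The assumption is that an m x n matrix of i.i.d. noise with standard deviation sigma has its largest singular value near sigma * (sqrt(m) + sqrt(n)), so singular values above that edge are treated as signal. The function names are hypothetical, and how sigma is estimated is left open.

```python
import math

def mp_singular_value_edge(m, n, sigma):
    # Approximate upper edge of the singular-value bulk for an m x n
    # matrix of i.i.d. noise with standard deviation sigma.
    return sigma * (math.sqrt(m) + math.sqrt(n))

def rank_above_noise(singular_values, m, n, sigma):
    # Count singular values that exceed the noise edge; this count is a
    # candidate rank to retain, replacing a grid search over ranks.
    edge = mp_singular_value_edge(m, n, sigma)
    return sum(1 for s in singular_values if s > edge)

# Toy spectrum for a 100 x 400 matrix with unit-variance noise (edge = 30).
toy_rank = rank_above_noise([50.0, 35.0, 29.0, 10.0], 100, 400, 1.0)
```

The appeal of such a threshold is that it needs one SVD per matrix rather than one evaluation run per candidate rank.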
Hello @pratyushasharma, thanks for your effort and the code. I have been trying to reproduce the Llama2-7B + TruthfulQA result with your code so that I can use your work as a baseline for further research, but I found that the results (i.e. accuracy) were almost the same, around 56.52, especially for the base model. I do not know what is wrong, and I am still unsure what causes such a large accuracy increase for Llama2-7B + TruthfulQA (around 5.7% in your results). I would appreciate it if you could help me check this result.
Hi,
Great work on this! Is Mistral supported? Right now I only see GPT-J and Llama 2.
Thank you!
Which command should I run to obtain the accuracy of the base model?
The GitHub README says 'rate' measures how much rank to retain, but the code, which calls "results = torch.svd_lowrank(weight, q=desired_rank, niter=niter)", confuses me a bit. desired_rank is the rank to retain, and desired_rank = max_rank * k, so k is the fraction to retain. Since k = (10 - rate) * 0.1, rate should mean how much to reduce. Is there something wrong with my understanding?
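The arithmetic described in this issue can be checked with a few lines. This sketch uses only the two relations quoted above, k = (10 - rate) * 0.1 and desired_rank = max_rank * k; the function names and the truncation to an integer are assumptions, not the repository's code.

```python
def retained_fraction(rate):
    # Relation quoted in the issue: k = (10 - rate) * 0.1
    return (10 - rate) * 0.1

def desired_rank(max_rank, rate):
    # Relation quoted in the issue: desired_rank = max_rank * k.
    # Truncating to an integer is an assumption made for this sketch.
    return int(max_rank * retained_fraction(rate))

# A larger rate leaves a smaller retained rank.
ranks = [desired_rank(4096, rate) for rate in (0, 5, 10)]
```

Under these relations, rate = 0 keeps the full rank and rate = 10 keeps none, which supports the reading that a larger rate means more reduction rather than more retention.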
Hi,
Thanks for releasing this code. Does this codebase decrease the size of the model (i.e. file size, required VRAM)?
Thank you!