laser's Introduction

pratyushasharma.github.io

laser's Issues

License

Hi,
Great work on this codebase! Would you mind adding a license, e.g. MIT, Apache, or ISC?
Thank you!

Generic model?

Thanks for publishing this excellent work. If I understand correctly, you run the LASER intervention separately for each evaluation task.

Would it be possible to make one LASER model that is generic to all tasks? My goal is to compress LLAMA-v2-7B so that it is smaller and runs faster on mobile devices.

Also, is it correct that you apply LASER to only one layer of the model? Did you try applying it to most of the layers?

method of composing reductions across layers

Hello! Thanks for the idea and the code; I am applying it to my model and have two questions:

  1. The paper says to "greedily search" over the parameters and to use a "simple compose strategy" when composing reductions across layers. Does this mean searching for the best rate in each of the later MLP layers and then simply composing them?
  2. Can I compose reductions across layers with a single command, or do I need to repeat the single-layer intervention several times?

Thank you!
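One possible reading of question 1 is sketched below: visit candidate layers in a fixed order, try each rate on the current layer, and keep a reduction only if it beats the best validation score so far, so every accepted reduction is composed with the earlier ones. This is an illustrative interpretation, not confirmed as the paper's exact procedure; `greedy_compose` and the `evaluate` callback (which would apply the reductions and score the model on held-out data) are hypothetical.

```python
from typing import Callable, Dict, Iterable, Sequence, Tuple

# Hypothetical representation of a composed intervention: layer index -> rate.
Config = Dict[int, float]


def greedy_compose(
    layers: Sequence[int],
    rates: Iterable[float],
    evaluate: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Greedily add at most one rank reduction per layer, keeping it only if the score improves."""
    rates = list(rates)
    config: Config = {}
    best_score = evaluate(config)  # score of the unmodified model
    for layer in layers:
        best_rate = None
        for rate in rates:
            # Try this layer's reduction on top of the reductions already kept.
            candidate = {**config, layer: rate}
            score = evaluate(candidate)
            if score > best_score:
                best_score, best_rate = score, rate
        if best_rate is not None:
            config[layer] = best_rate
    return config, best_score


# Toy scoring function standing in for a real validation run.
def toy_evaluate(config: Config) -> float:
    score = 0.0
    for layer, rate in config.items():
        if (layer, rate) == (27, 9.0):
            score += 2.0
        elif (layer, rate) == (26, 5.0):
            score += 1.0
        else:
            score -= 0.5
    return score


print(greedy_compose([27, 26, 25], [1.0, 5.0, 9.0], toy_evaluate))
```

Because each layer is searched once with earlier choices held fixed, this needs roughly (number of layers × number of rates) evaluations instead of the exponential count of a joint search; the trade-off is that it can miss reductions that only help in combination.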

Excellent work, looking forward to following up with further research!

I have a few questions about Sections 5.1 and 5.2:

  1. On the CounterFact dataset, we should compare not only top-k accuracy but also the ES and EM metrics from Meng et al., since an intervention may increase the probability of the wrong answer while increasing the probability of the correct one (e.g., "The native language of Danielle Darrieux is English" vs. "Danielle Darrieux's native language is French").
  2. If LASER is tuned on each dataset individually, it may significantly improve prediction performance, but isn't there a risk of overfitting? I think a single, uniform set of rank-reduction hyperparameters should be found to demonstrate the effectiveness of this approach.
  3. Nevertheless, I think this is a very worthwhile endeavor, and it gives us valuable insight into the inner workings of the transformer.

Reference:
Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35, 2022.
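The ES/EM point in item 1 can be made concrete. As we read Meng et al., the Efficacy Score (ES) is the fraction of prompts on which one object receives higher probability than the competing one, and the Efficacy Magnitude (EM) is the mean probability gap; here they are applied to the correct vs. the wrong answer. A minimal sketch, assuming per-prompt probabilities have already been extracted as `(p_correct, p_wrong)` pairs; `efficacy_metrics` and the numbers are hypothetical.

```python
from typing import Sequence, Tuple


def efficacy_metrics(pairs: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Return (ES, EM) for a list of (p_correct, p_wrong) probability pairs."""
    n = len(pairs)
    # ES: share of prompts where the correct answer beats the wrong one.
    es = sum(1 for p_correct, p_wrong in pairs if p_correct > p_wrong) / n
    # EM: average margin by which the correct answer beats the wrong one.
    em = sum(p_correct - p_wrong for p_correct, p_wrong in pairs) / n
    return es, em


# Made-up probabilities for the same two prompts before and after an intervention.
before = [(0.25, 0.125), (0.25, 0.5)]
after = [(0.5, 0.375), (0.375, 0.5)]
print(efficacy_metrics(before), efficacy_metrics(after))
```

In this made-up example the correct answer's probability rises on both prompts, but on the first prompt the wrong answer's probability rises too, so that margin is unchanged; reporting EM alongside accuracy makes such cases visible.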

Potential improvements for evaluation

Thanks for providing the code for this promising research. I'm looking forward to seeing how far this idea can be pushed. It would be especially cool if a heuristic could be found for choosing multiple layers to apply this technique to, in a way that works across different models.

When I investigated the results more closely and ran some of the benchmarks locally, I came across some potential issues. Specifically, I looked at the BigBench-Epistemic Reasoning benchmark, but I suspect that others could be affected too. First, the accuracy of the models without intervention is below 50% (Tab. 1), which is strange for a binary classification task. While debugging, I found that Roberta and GPT-J (I haven't tested Llama) always predicted the same label; since that label occurs in 37% of the samples, that is also their accuracy. As Llama reaches 63% accuracy with intervention, I suspect that it simply always predicts the other label.

Digging deeper, I found that the logits for the label tokens were extremely small. This typically happens when the model is somehow "derailed" and wants to predict neither token. Sometimes this simply comes down to tokenization: models often try to predict " True" and " False" (with a leading space) because that is how they tokenize the text. Other times, they want to go in a completely different direction. I would recommend logging the absolute probabilities of the label tokens and double-checking cases where they are too low. Often, this can be fixed by slight adjustments to the prompts or labels.
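The two checks suggested above, comparing accuracy with what a constant predictor would get and flagging examples where the label tokens receive little probability, can be sketched in plain Python. The label names, the 0.1 threshold, and the probabilities are illustrative assumptions, and extracting the probabilities from the model is left out.

```python
from collections import Counter
from typing import Dict, List, Sequence


def constant_prediction_accuracy(labels: Sequence[str]) -> Dict[str, float]:
    """Accuracy a model would reach by always predicting each label."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}


def low_label_mass(label_probs: Sequence[Dict[str, float]], threshold: float = 0.1) -> List[int]:
    """Indices of examples whose label tokens together receive less than `threshold` probability."""
    return [i for i, probs in enumerate(label_probs) if sum(probs.values()) < threshold]


# Illustrative 37/63 label split (which label is the minority one is a placeholder).
labels = ["True"] * 37 + ["False"] * 63
print(constant_prediction_accuracy(labels))

# Illustrative probabilities of the two label tokens, tokenized with a leading space.
probs = [{" True": 0.45, " False": 0.40}, {" True": 0.01, " False": 0.02}]
print(low_label_mass(probs))
```

An accuracy that coincides with one of the constant-prediction values, as with the 37% and 63% figures above, hints that the model may be emitting a single label.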

Also, there is a typo in this prompt: "entails" => "entail".

I hope this is helpful.

Where to Get the Dataset

Hi,
Thank you so much for making this project! I see that there's a CLI argument for dataset_file; what should I point it to for the CounterFact experiment?
Thank you!

Feature Request for Upcoming Refactoring

This issue lists all the features planned for the upcoming refactoring:

  1. A unified abstract class that handles all the common steps, such as creating command-line arguments, making an LLM, and running the experiment. We may have only one file per LLM (or per LLM type) plus this abstract class. We may not be able to get it down to a single file, since certain LLMs such as Roberta, which are really masked language models, compute accuracy and log-loss from the tokens differently.

  2. Replace the use of rate with ρ, the notation used in the paper.

  3. Add a feature to support memory reduction by storing the U, S, V matrices separately rather than multiplying them back and losing the memory advantage.

  4. Add more LLMs, specifically Mistral, other Llama2 versions, and Phi models.

  5. Release LLMs with optimally chosen reductions from Table 3 of the paper https://arxiv.org/pdf/2312.13558.pdf.
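The memory argument in item 3 comes down to a parameter count: a rank-r factorization of an m × n weight stores U (m × r), the r singular values, and V (n × r), i.e. r(m + n + 1) numbers instead of mn, so it only saves memory when r(m + n + 1) < mn. A minimal sketch of that arithmetic; the 4096 × 16384 shape is an illustrative choice, not a claim about a specific model in the paper.

```python
def dense_params(m: int, n: int) -> int:
    """Number of entries in a dense m x n weight matrix."""
    return m * n


def factored_params(m: int, n: int, r: int) -> int:
    """Entries needed to store U (m x r), the r singular values, and V (n x r)."""
    return m * r + r + n * r


def largest_saving_rank(m: int, n: int) -> int:
    """Largest r for which the factored form is strictly smaller than the dense matrix."""
    return (m * n - 1) // (m + n + 1)


m, n = 4096, 16384
print(dense_params(m, n))          # 67108864
print(factored_params(m, n, 40))   # 819240
print(largest_saving_rank(m, n))   # 3276
```

Multiplying the factors back into a dense m × n matrix returns to the mn count, which is why item 3 notes that doing so gives up the saving.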

If you have more requests, please post them below. Note that the first version of the refactoring may not cover all of the above, but we'll do our best. We welcome PRs.
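The layout in item 1 resembles the template-method pattern: a base class owns argument parsing and the evaluation loop, and each model family overrides only model loading and scoring (which is where a masked LM like Roberta would differ). A minimal sketch using Python's standard abc and argparse modules; every class, method, and flag name here is hypothetical, not taken from the repository.

```python
import argparse
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional, Sequence


class Experiment(ABC):
    """Shared driver: command-line parsing, model creation, and the evaluation loop."""

    def __init__(self, argv: Optional[Sequence[str]] = None) -> None:
        self.args = self.build_parser().parse_args(argv)

    def build_parser(self) -> argparse.ArgumentParser:
        parser = argparse.ArgumentParser()
        parser.add_argument("--layer", type=int, default=0)    # hypothetical flag
        parser.add_argument("--rho", type=float, default=1.0)  # hypothetical flag, named after item 2
        return parser

    @abstractmethod
    def make_llm(self) -> Any:
        """Load the model and apply the rank reduction (model-specific)."""

    @abstractmethod
    def is_correct(self, model: Any, example: Dict[str, str]) -> bool:
        """Score one example (differs between causal and masked LMs)."""

    def run(self, dataset: List[Dict[str, str]]) -> float:
        """Return accuracy over the dataset."""
        model = self.make_llm()
        correct = sum(self.is_correct(model, example) for example in dataset)
        return correct / len(dataset)


class LookupExperiment(Experiment):
    """Toy subclass: the 'model' is a dictionary from question to answer."""

    def make_llm(self) -> Dict[str, str]:
        return {"2+2": "4"}

    def is_correct(self, model: Any, example: Dict[str, str]) -> bool:
        return model.get(example["question"]) == example["answer"]


experiment = LookupExperiment(["--rho", "0.1"])
data = [{"question": "2+2", "answer": "4"}, {"question": "3+3", "answer": "6"}]
print(experiment.args.rho, experiment.run(data))
```

With this split, adding a new model family means writing one subclass with two methods, while the command-line handling and the loop stay in one place.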

Llama2-7B + TruthfulQA reproduce issue

Hello~ @pratyushasharma. Thanks for your effort and the code. I have been reproducing the Llama2-7B + TruthfulQA result with your code so that I can use your work as a baseline for further research, but the accuracy I obtain after the intervention is almost the same as that of the base model, around 56.52. I do not know what is wrong, and I am still confused about why the intervention yields such a large accuracy increase for Llama2-7B + TruthfulQA (around 5.7% in your results). I would appreciate it if you could help me check this result.

Mistral Support

Hi,
Great work on this! Is Mistral supported? Right now I only see GPT-J and Llama 2.
Thank you!

What does the 'rate' parameter actually mean in the code?

The GitHub README says that 'rate' measures how much rank to retain, but the line "results = torch.svd_lowrank(weight, q=desired_rank, niter=niter)" in the code confuses me a bit. desired_rank is the rank to retain, and desired_rank = max_rank * k, so k is the fraction to retain. Since k = (10 - rate) * 0.1, shouldn't rate mean how much to reduce? Is something wrong with my understanding?
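The two formulas quoted in the question can be traced numerically. A minimal sketch that reproduces only that arithmetic; taking max_rank as the smaller matrix dimension and truncating with int() are assumptions, and the SVD call itself is omitted.

```python
from typing import Tuple


def retained_fraction(rate: float) -> float:
    """k = (10 - rate) * 0.1, as quoted in the question."""
    return (10 - rate) * 0.1


def desired_rank(shape: Tuple[int, int], rate: float) -> int:
    """Rank kept for a 2-D weight: max_rank * k (assumed to be truncated to an int)."""
    max_rank = min(shape)  # assumption: full rank of a rows x cols matrix
    return int(max_rank * retained_fraction(rate))


for rate in (0, 5, 9, 9.9):
    print(rate, desired_rank((4096, 16384), rate))
```

Under these formulas, rate = 0 keeps the full rank of 4096 and rate = 9.9 keeps about 1% of it (40), so a larger rate removes more rank, which matches the reading in the question.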

Question

Hi,
Thanks for releasing this code. Does this codebase decrease the size of the model (e.g., file size or required VRAM)?
Thank you!
