Comments (5)
What's your extrapolation setting? Is it identical to the one in our paper? You could try window attention first, which is much easier to implement, to see how it performs.
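As a rough illustration of the window attention suggested above, here is a minimal sketch of a causal sliding-window mask in plain Python. The function name and the list-of-lists layout are assumptions made for illustration and are not part of torchscale.

```python
def sliding_window_causal_mask(seq_len, window):
    """Build a seq_len x seq_len boolean mask.

    mask[i][j] is True when query position i may attend to key position j:
    the key must not be in the future (j <= i) and must lie within the
    last `window` positions, counting position i itself.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Print the pattern for a short sequence; '1' marks an allowed pair.
    for row in sliding_window_causal_mask(seq_len=6, window=3):
        print("".join("1" if allowed else "." for allowed in row))
```

In a real model the mask would be converted to additive form (0 where attention is allowed, a large negative value elsewhere) before the softmax.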
from torchscale.
If you use it in the LongEval setting, I don't think it can retrieve very long topics. Local techniques only preserve local modeling, where ppl is more stable.
Thanks for the reply, @sunyt32! I was actually using the rotary embedding as implemented in the LLaMA HF code, and I only implemented BCA to help it extrapolate to longer contexts. I did some very simple tests for debugging:
For example, I set the window size to w and padded my prompt on the left to 2w (e.g., w = 16, 32, 128). (Do you think that's a reasonable case for debugging?) The LLaMA model worked well when I turned BCA off. With BCA, I expected it to generate reasonable answers following the prompt, but I got gibberish like 6.666666 after three to five new tokens had been generated. I think this dummy case indicates there might be a bug in my code. So I would appreciate any additional information that can help me check the expected outputs and intermediate tensors (like the k,v cache and the rotary positional embedding calculation with BCA) during generation. That would be super helpful!
Thanks a lot for your time!
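One way to check the attention pattern in a debugging case like the one above is to compare the mask the implementation produces against a small reference. The following is a minimal sketch, assuming BCA lets each query attend causally to keys in its own block and in the immediately preceding block. That reading, the function names, and the row check are assumptions for illustration, not torchscale code.

```python
def blockwise_causal_mask(seq_len, block):
    """Reference mask for blockwise causal attention (assumed definition).

    mask[i][j] is True when key j is not in the future of query i and
    sits in the same block as i or in the block right before it.
    """
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            within_two_blocks = (i // block) - (j // block) <= 1
            row.append(j <= i and within_two_blocks)
        mask.append(row)
    return mask


def has_fully_masked_row(mask):
    """Return True if some query cannot attend to any key.

    A fully masked row turns into NaN after softmax, which is one
    possible source of degenerate generations.
    """
    return any(not any(row) for row in mask)


if __name__ == "__main__":
    reference = blockwise_causal_mask(seq_len=8, block=2)
    for row in reference:
        print("".join("1" if allowed else "." for allowed in row))
    print("fully masked row:", has_fully_masked_row(reference))
```

With left padding, the padding mask is applied on top of this pattern, so it is worth re-running the row check on the combined mask.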
I see. The reason here is similar: window attention doesn't actually have the ability to handle longer context. However, using BCA or window attention should not cause gibberish. The generated sequence should at least be coherent.
I have to admit that long-context evaluation is much more reasonable nowadays... It's a mistake to concentrate only on ppl. Let's forget about window-attention-style methods...
NTK extrapolation is a good technique for these tasks, but xPos still has its value. Our experiments show that xPos+NTK performs more stably than RoPE on both ppl and retrieval.
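For readers unfamiliar with NTK extrapolation, here is a minimal sketch of the commonly used "NTK-aware" adjustment to the RoPE base. It is an assumption about the variant meant here and does not cover the xPos decay term or the exact setup used in the experiments mentioned above. The function names and the `alpha` parameter are hypothetical.

```python
import math


def rope_inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies: base^(-2i/dim) for i in [0, dim/2)."""
    return [base ** (-2.0 * i / dim) for i in range(dim // 2)]


def ntk_scaled_inv_freq(dim, alpha, base=10000.0):
    """RoPE inverse frequencies with an NTK-aware scaled base.

    The base is multiplied by alpha^(dim / (dim - 2)), which leaves the
    highest frequency unchanged and divides the lowest frequency by alpha.
    """
    scaled_base = base * alpha ** (dim / (dim - 2))
    return rope_inv_freq(dim, scaled_base)


if __name__ == "__main__":
    plain = rope_inv_freq(128)
    scaled = ntk_scaled_inv_freq(128, alpha=4.0)
    print("highest frequency ratio:", scaled[0] / plain[0])
    print("lowest frequency ratio:", scaled[-1] / plain[-1])
```

The design intent is that nearby-token resolution (high frequencies) is preserved while the long-range components are stretched to cover a context roughly `alpha` times longer.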
Gotcha! Thanks for the nice advice! I'll try the other way you suggested!