
clex's People

Contributors

guanzhchen, lixin4ever


clex's Issues

Questions about the code

Hi, I have several questions about the implementation.

  1. For scaled_inv_freq during validation, as I understand it, it should be scale_inv_freq = self.freq_cached[int(t_val)]; it doesn't need to subtract 1.
  2. In L97, if seq_len < self.max_position_embeddings, scale_factor would be zero, so L104 would raise a divide-by-zero error. It seems that // should be replaced with / in L97 and L105.
  3. In ODELinear
    • In L31, why assign alpha = 2 * t - 1 rather than t?
    • In L35, the calculation of delta_ntk_freq is not found in the paper; it appears to be $$-\frac{2i}{d-2} \cdot 10000^{\frac{i}{d}} \cdot \alpha^{\frac{i}{d-2}+1}$$
    • In L40-41, why does x add torch.log(time), and why is time_embed = delta_time / time? I am a bit confused when comparing this with the paper's Eq. (14).
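Point 2 above can be reproduced with plain integer arithmetic. A minimal sketch, with names (seq_len, max_position_embeddings, scale_factor) paraphrased from the issue text rather than taken from the repository's actual code:

```python
# Sketch of the floor-division issue described in point 2.
# `max_position_embeddings` and the formulas below are illustrative,
# not the repository's actual code.
max_position_embeddings = 4096

def scale_with_floor_div(seq_len):
    # L97-style computation: integer division truncates to 0
    # whenever seq_len < max_position_embeddings ...
    scale_factor = seq_len // max_position_embeddings
    return 1.0 / scale_factor  # ... so an L104-style division raises ZeroDivisionError

def scale_with_true_div(seq_len):
    # True division keeps scale_factor non-zero for any seq_len > 0.
    scale_factor = seq_len / max_position_embeddings
    return 1.0 / scale_factor

print(scale_with_true_div(2048))   # 2.0
try:
    scale_with_floor_div(2048)
except ZeroDivisionError as exc:
    print("floor division fails:", exc)
```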

Slow training speed

Hi, I found that the forward and backward passes of odeint are very slow, probably because of the many iterations needed to solve the Neural ODE; the backward pass is similar to an RNN's BPTT. Have you measured the training latency in your experiments? How does it compare to baselines such as PI and YaRN?
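A rough way to see why odeint can dominate the runtime: each step of a fixed-step RK4 solver calls the right-hand-side network 4 times, and BPTT-style backprop replays roughly the same work. The sketch below is a tiny plain-Python RK4 integrator instrumented to count evaluations; it is a cost illustration, not the repository's actual solver configuration:

```python
# Tiny fixed-step RK4 integrator for dy/dt = f(t, y), instrumented to
# count right-hand-side evaluations. Cost scales linearly with the
# number of solver steps, and a BPTT-style backward pass roughly
# doubles the total network work.
def rk4(f, y0, t0, t1, num_steps):
    evals = 0
    def g(t, y):
        nonlocal evals
        evals += 1
        return f(t, y)
    h = (t1 - t0) / num_steps
    t, y = t0, y0
    for _ in range(num_steps):
        k1 = g(t, y)
        k2 = g(t + h / 2, y + h * k1 / 2)
        k3 = g(t + h / 2, y + h * k2 / 2)
        k4 = g(t + h, y + h * k3)
        y = y + (h / 6) * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y, evals

# dy/dt = y, y(0) = 1  ->  y(1) ~ e, at the cost of 4 * 100 evaluations
y, evals = rk4(lambda t, y: y, 1.0, 0.0, 1.0, 100)
print(y, evals)
```

With an adaptive solver the step count (and hence the cost) also grows with the integration interval, which is consistent with long-context training being slow.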

Unable to load model

Hi,

I am getting the following error when trying to load the model using AutoModelForCausalLM:

```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 526, in from_pretrained
    config, kwargs = AutoConfig.from_pretrained(
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1099, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/configuration_utils.py", line 774, in from_dict
    config = cls(**config_dict)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 160, in __init__
    self._rope_scaling_validation()
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 180, in _rope_scaling_validation
    raise ValueError(
ValueError: rope_scaling must be a dictionary with two fields, type and factor, got {'max_factor': 16, 'param_factor': 1, 'type': 'clex', 'factor': 1}
```
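The traceback comes from the stock LlamaConfig rope_scaling validator, which only accepts the two keys type and factor, while the CLEX checkpoint ships extra keys and a custom type. A simplified stand-in for that check (not the actual transformers source) shows the mismatch:

```python
# Simplified stand-in for LlamaConfig._rope_scaling_validation: the stock
# validator accepts exactly two keys, `type` and `factor`, so CLEX's extra
# keys (`max_factor`, `param_factor`) and its custom type `clex` fail it.
def validate_rope_scaling(rope_scaling):
    if not isinstance(rope_scaling, dict) or len(rope_scaling) != 2:
        raise ValueError(
            "rope_scaling must be a dictionary with two fields, "
            f"type and factor, got {rope_scaling}"
        )

clex_scaling = {"max_factor": 16, "param_factor": 1, "type": "clex", "factor": 1}
try:
    validate_rope_scaling(clex_scaling)  # rejected: four keys, custom type
except ValueError as exc:
    print(exc)

validate_rope_scaling({"type": "linear", "factor": 2.0})  # passes the stock check
```

In practice this suggests the checkpoint must be loaded with the repository's custom configuration/model classes rather than the stock Llama ones; if the repo ships that code with the checkpoint, passing trust_remote_code=True to from_pretrained may help, though I have not verified this for CLEX.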

and when trying to load it via PhiForCausalLM, I get an error during generate:

```
  File "/opt/conda/envs/clex/lib/python3.10/site-packages/torch/autograd/function.py", line 506, in apply
    return super().apply(*args, **kwargs)  # type: ignore[misc]
  File "/opt/conda/envs/clex/lib/python3.10/site-packages/flash_attn/bert_padding.py", line 17, in forward
    return torch.gather(
RuntimeError: index 17 is out of bounds for dimension 0 with size 7
```

Can you please guide me on how to set this up properly?

Potential bug of logn_scale

Hi, I just found a potential bug in the logn implementation in this repo. As shown in this line, the scale factor is math.log(k_len) / math.log(train_len) for every q. However, according to Su's blog, it should be torch.arange(k_len).log() / math.log(train_len). An implementation can also be found in ReRoPE.
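The difference is between one scalar shared by all positions and a per-position scale. A plain-Python sketch (no torch; train_len and k_len are illustrative values, and positions are 1-based here to avoid log(0)):

```python
import math

train_len, k_len = 4096, 8192

# Scalar scale, as the issue says the repo implements it: one value
# shared by every query position.
scalar_scale = math.log(k_len) / math.log(train_len)

# Per-position scale from Su's log-n trick (torch.arange(k_len).log()
# in the original, using 1-based positions here): position m gets
# log(m) / log(train_len).
per_position_scale = [
    math.log(m) / math.log(train_len) for m in range(1, k_len + 1)
]

print(scalar_scale)            # one constant for all positions
print(per_position_scale[0])   # 0.0 at position 1
print(per_position_scale[-1])  # equals the scalar value only at the last position
```

So the scalar version applies the last position's scale to every query, while the per-position version grows with the query index.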
