No GQA implementation was found, so the model cannot scale to 70B for compos

Is the project not implemented for 70B LLaMA? (llm-shearing, open issue)

zhangzhenyu13 commented on August 20, 2024

Is the project not implemented for 70B LLaMA?

Comments (7)

zhangzhenyu13 commented on August 20, 2024

> Pruning query heads might leave different groups with different numbers of query heads. So maybe group-based pruning is more reasonable? @zhangzhenyu13

Yes, your settings are right. We need to share z (the query-head mask) across groups.

xiamengzhou commented on August 20, 2024

Hi, the modeling file does not currently support GQA, but supporting it should require only minimal changes. What you described should work perfectly :)
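To make "minimal changes" more concrete, here is a minimal NumPy sketch of a causal GQA forward pass for a single sequence. It is not the llm-shearing modeling code. It assumes the convention of the Hugging Face LLaMA implementation's `repeat_kv` helper, where each key/value head is repeated `n_heads // n_kv_heads` times so that query head `i` reads KV head `i // n_rep`; `repeat_kv` is reimplemented here and `gqa_attention` is an illustrative name.

```python
import numpy as np


def repeat_kv(x: np.ndarray, n_rep: int) -> np.ndarray:
    """Repeat each key/value head n_rep times along the head axis.

    x: (n_kv_heads, seq_len, head_dim) -> (n_kv_heads * n_rep, seq_len, head_dim),
    so query head i is paired with KV head i // n_rep.
    """
    return np.repeat(x, n_rep, axis=0)


def gqa_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray) -> np.ndarray:
    """Causal grouped-query attention (sketch, single sequence, no batch axis).

    q:    (n_heads, seq_len, head_dim)
    k, v: (n_kv_heads, seq_len, head_dim), with n_heads % n_kv_heads == 0
    """
    n_heads, seq_len, head_dim = q.shape
    n_kv_heads = k.shape[0]
    assert n_heads % n_kv_heads == 0, "query heads must split evenly into groups"
    n_rep = n_heads // n_kv_heads

    # Expand the shared K/V heads so that every query head has a partner.
    k = repeat_kv(k, n_rep)
    v = repeat_kv(v, n_rep)

    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)  # (n_heads, seq, seq)
    # Causal mask: position t may only attend to positions <= t.
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over the key axis.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v  # (n_heads, seq_len, head_dim)
```

With `n_kv_heads == n_heads` this reduces to ordinary multi-head attention, which suggests why the change can be small: roughly, the K/V projections produce `n_kv_heads * head_dim` outputs instead of `n_heads * head_dim`, and the heads are expanded before the attention product.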

ZhiYuanZeng commented on August 20, 2024

It seems that we need a hierarchical pruning scheme for GQA: group pruning, plus head pruning inside each group? We need to keep the number of query heads the same in every group.
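A minimal sketch of the two-level idea, assuming two hypothetical mask tensors: `z_group` (one entry per layer and KV group) and `z_head` (one entry per query head inside each group). Neither name comes from llm-shearing. A query head survives only if both its group and the head itself are kept, and the check reports whether every kept group ends up with the same number of query heads.

```python
import numpy as np


def hierarchical_head_mask(z_group: np.ndarray, z_head: np.ndarray) -> np.ndarray:
    """Combine a group-level mask with a within-group query-head mask.

    z_group: (n_layers, n_groups); a 0 drops a whole KV group.
    z_head:  (n_layers, n_groups, heads_per_group); a 0 drops one query head.
    Returns the effective query-head mask with the same shape as z_head.
    """
    return z_group[..., None] * z_head


def groups_are_uniform(z_group: np.ndarray, z_head: np.ndarray) -> bool:
    """True if, in every layer, all kept groups keep the same number of query heads."""
    counts = (hierarchical_head_mask(z_group, z_head) != 0).sum(axis=-1)  # (n_layers, n_groups)
    for layer_counts, kept_groups in zip(counts, z_group != 0):
        kept = layer_counts[kept_groups]
        if kept.size and not np.all(kept == kept[0]):
            return False
    return True
```

For example, with `z_group = [[1, 0, 1]]` the middle group (its KV head and all of its query heads) disappears, and the two remaining groups must then prune the same number of query heads for `groups_are_uniform` to hold.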

zhangzhenyu13 commented on August 20, 2024

> It seems that we need a hierarchical pruning scheme for GQA: group pruning, plus head pruning inside each group? We need to keep the number of query heads the same in every group.

To keep the pruned model runnable with tensor parallelism (TP), it would be better to leave the number of groups unchanged.
Then we only need to prune the query heads within each group, so perhaps a z_group_query mask of shape layer_num * group_num * group_heads_query needs to be initialized.
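A sketch of this proposal, assuming a hypothetical `z_group_query` tensor of shape `(layer_num, group_num, group_heads_query)`. Masks that are pruned independently per group do not, by themselves, keep the per-group counts equal; `keep_top_k_per_group` is one simple way to enforce that when binarizing, an assumed approach rather than something proposed in this thread.

```python
import numpy as np


def init_z_group_query(layer_num: int, group_num: int, group_heads_query: int) -> np.ndarray:
    """Hypothetical per-(layer, group, query-head) mask, initialized to keep everything.

    Only query heads inside a group are pruned; the group axis is never pruned,
    so the number of KV groups (which the TP argument above relies on) is unchanged.
    """
    return np.ones((layer_num, group_num, group_heads_query))


def keep_top_k_per_group(scores: np.ndarray, k: int) -> np.ndarray:
    """Binarize mask scores so that every group keeps exactly k query heads.

    scores: (layer_num, group_num, group_heads_query). Selecting the top-k
    scores within each group makes the per-group head counts equal by design.
    """
    if not 0 <= k <= scores.shape[-1]:
        raise ValueError("k must be between 0 and group_heads_query")
    order = np.argsort(-scores, axis=-1)  # head indices sorted by descending score
    mask = np.zeros_like(scores, dtype=float)
    np.put_along_axis(mask, order[..., :k], 1.0, axis=-1)
    return mask
```

With sizes matching LLaMA-2-70B (80 layers, 64 query heads, 8 KV heads, hence 8 query heads per group), `init_z_group_query(80, 8, 8)` gives an all-ones `(80, 8, 8)` mask, and `keep_top_k_per_group(scores, k=4)` on scores of that shape keeps 4 query heads in every group, i.e. 32 per layer.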

xiamengzhou commented on August 20, 2024

Pruning query heads might leave different groups with different numbers of query heads. So maybe group-based pruning is more reasonable? @zhangzhenyu13

ZhiYuanZeng commented on August 20, 2024

Could we share the query-head mask across different groups?

> Pruning query heads might leave different groups with different numbers of query heads. So maybe group-based pruning is more reasonable? @zhangzhenyu13
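Sharing the query-head mask across groups can be sketched as follows, assuming a hypothetical per-layer mask `z_shared` of shape `(layer_num, group_heads_query)` that is broadcast to every KV group, and the head ordering used by Hugging Face LLaMA's `repeat_kv`, in which query head `h` belongs to group `h // group_heads_query`.

```python
import numpy as np


def expand_shared_query_mask(z_shared: np.ndarray, group_num: int) -> np.ndarray:
    """Broadcast one query-head mask per layer to every KV group.

    z_shared: (layer_num, group_heads_query). Returns a read-only view of shape
    (layer_num, group_num, group_heads_query) in which every group carries the
    same mask, so all groups necessarily keep the same number of query heads.
    """
    layer_num, group_heads_query = z_shared.shape
    return np.broadcast_to(z_shared[:, None, :], (layer_num, group_num, group_heads_query))


def flatten_to_head_mask(z_grouped: np.ndarray) -> np.ndarray:
    """Flatten (layer, group, head-in-group) to (layer, n_heads), assuming query
    head h belongs to group h // group_heads_query."""
    layer_num, group_num, group_heads_query = z_grouped.shape
    return z_grouped.reshape(layer_num, group_num * group_heads_query)
```

The trade-off: equal per-group counts hold by construction, and the mask needs only `group_heads_query` entries per layer, but the same head positions are pruned in every group, which is stricter than a per-group mask with a count constraint.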

Longyichen commented on August 20, 2024

Hi @zhangzhenyu13, I'm a bit confused. The author's Composer LLaMA file does not implement any GQA functionality. Did you implement the GQA forward pass yourself? Which LLaMA repository's implementation is better to refer to?
