I was previously able to compile llama 2 7B using ten


Compilation errors for llama 2 models (transformers-neuronx issue, closed)

dacorvo commented on June 26, 2024

Compilation errors for llama 2 models

from transformers-neuronx.

Comments (8)

dacorvo commented on June 26, 2024

I did more tests, changing the default compiler optimization option from -O2 to -O1, and I am able to use the same configurations I used with transformers-neuronx==0.6.106: batch_size=1 and n_positions=2048.

During inference, device memory usage is 64 GB for the 13B model and 22 GB for the 7B model.

I also tested with -O3 but got the same kind of errors.
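For readers who want to try the same change, here is a minimal sketch of switching the optimization level. It assumes the compiler flags are passed through the `NEURON_CC_FLAGS` environment variable; the thread does not say how the option was actually changed, and whether this overrides the library's default may depend on the installed version. The helper name `set_neuron_opt_level` is hypothetical.

```python
import os
import re


def set_neuron_opt_level(level, env=None):
    """Set -O<level> in NEURON_CC_FLAGS, replacing any existing -O<n> flag.

    Hypothetical helper: it only edits the environment variable and does
    not call the compiler itself.
    """
    if level not in (1, 2, 3):
        raise ValueError("expected optimization level 1, 2 or 3")
    env = os.environ if env is None else env
    flags = env.get("NEURON_CC_FLAGS", "")
    # Drop any optimization flag that is already present, keep the rest.
    kept = [f for f in flags.split() if not re.fullmatch(r"-O[0-9]", f)]
    kept.append(f"-O{level}")
    env["NEURON_CC_FLAGS"] = " ".join(kept)
    return env["NEURON_CC_FLAGS"]


# Assumption: this must run before the model is compiled, since the flags
# are read at compilation time.
print(set_neuron_opt_level(1))
```

Passing a plain dictionary as `env` makes it possible to check the resulting flag string without touching the real process environment.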


aws-rhsoln commented on June 26, 2024

Thank you for reporting the issue. We are replicating the issue on our end and will get back with a fix.


santhoshkolloju commented on June 26, 2024

Hi
What throughput (tokens/sec) did you get on the 7 billion parameter model?


dacorvo commented on June 26, 2024

With the 2.14.1 compiler (neuronx-cc), I am able to compile the llama2 7B model with -O1 for different batch sizes.

I tested several combinations of cores / batch size with the default maximum sequence length for the llama model (2048).

Here are the results:

| cores/batch | 128 tokens | 512 tokens | 1024 tokens | 2048 tokens | Throughput   |
|-------------|------------|------------|-------------|-------------|--------------|
| 2c / bs2    | 8.5 s      | 34 s       | 69 s        | 143 s       | 29 tokens/s  |
| 2c / bs4    | 8.6 s      | 35 s       | 72 s        | 150 s       | 55 tokens/s  |
| 24c / bs2   | 1.3 s      | 5.4 s      | 11.5 s      | 22.8 s      | 180 tokens/s |
| 24c / bs4   | 1.4 s      | 5.8 s      | 11.5 s      | 24 s        | 341 tokens/s |
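The throughput column appears to equal the batch size times 2048 tokens, divided by the 2048-token time. That formula is inferred from the numbers and is not stated in the thread. A short sketch that reproduces the column under that assumption:

```python
# Times in seconds for 2048 tokens, copied from the table above.
# Keys are (cores, batch_size).
LATENCY_2048_S = {
    (2, 2): 143.0,
    (2, 4): 150.0,
    (24, 2): 22.8,
    (24, 4): 24.0,
}


def throughput(batch_size, n_tokens, latency_s):
    """Tokens per second summed over the whole batch (assumed formula)."""
    return batch_size * n_tokens / latency_s


# Round to whole tokens/s, as in the table.
results = {
    key: round(throughput(key[1], 2048, latency))
    for key, latency in LATENCY_2048_S.items()
}

for (cores, bs), tps in results.items():
    print(f"{cores}c / bs{bs}: {tps} tokens/s")
```

Under this reading, doubling the batch size nearly doubles throughput because the time per sequence barely changes between batch sizes 2 and 4.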

Note: I experienced extremely long compilation times for batch size 4 (more than 3 hours), even with -O1, whereas it takes only minutes for batch size 1 or 2.


awsilya commented on June 26, 2024

@dacorvo thank you for confirming. Yes, batch 4 compilation time is an issue; we are working on it and it is being tracked elsewhere. I'm closing this one.


awsilya commented on June 26, 2024

closing


dacorvo commented on June 26, 2024

On most open-source projects, issues are closed only when they have been resolved, so that:

- users reporting the issue can be notified when a fix is pushed,
- new users facing the issue later can be redirected to the proper version.

How can we track progress on these compilation errors now that you've closed this one? Can you link it to the relevant issues?


hannanjgaws commented on June 26, 2024

Hi @dacorvo:

We confirmed that the Llama 7B compilation error you reported is fixed in the 2.15.2 Release. Can you install the latest Neuron SDK and try re-running your script to confirm that you no longer see compilation issues for this model?

