
text-generation-webui-docker's People

Contributors

atinoda, benmclean

text-generation-webui-docker's Issues

Unable to load GGUF model (llama-cpu-nightly)

Using atinoda/text-generation-webui:llama-cpu-nightly, the llama-cpu variant fails for a different reason (likely missing GGUF support).

Log output while loading ggml-model-f16.gguf:

2023-09-02 13:05:52 INFO:Loading ggml-model-f16.gguf...
text-generation-webui           | 2023-09-02 13:05:52 INFO:llama.cpp weights detected: /models/ggml-model-f16.gguf
text-generation-webui           | 2023-09-02 13:05:52 INFO:Cache capacity is 0 bytes
text-generation-webui           | llama_model_loader: loaded meta data with 18 key-value pairs and 291 tensors from /models/ggml-model-f16.gguf (version GGUF V2 (latest))
text-generation-webui           | llama_model_loader: - tensor    0:                token_embd.weight f16      [  4096, 32000,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor    1:           blk.0.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor    2:            blk.0.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor    3:            blk.0.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor    4:              blk.0.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor    5:            blk.0.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor    6:              blk.0.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor    7:         blk.0.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor    8:              blk.0.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor    9:              blk.0.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   10:           blk.1.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   11:            blk.1.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   12:            blk.1.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   13:              blk.1.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   14:            blk.1.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   15:              blk.1.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   16:         blk.1.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   17:              blk.1.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   18:              blk.1.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   19:          blk.10.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   20:           blk.10.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   21:           blk.10.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   22:             blk.10.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   23:           blk.10.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   24:             blk.10.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   25:        blk.10.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   26:             blk.10.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   27:             blk.10.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   28:          blk.11.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   29:           blk.11.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   30:           blk.11.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   31:             blk.11.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   32:           blk.11.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   33:             blk.11.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   34:        blk.11.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   35:             blk.11.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   36:             blk.11.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   37:          blk.12.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   38:           blk.12.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   39:           blk.12.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   40:             blk.12.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   41:           blk.12.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   42:             blk.12.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   43:        blk.12.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   44:             blk.12.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   45:             blk.12.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   46:          blk.13.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   47:           blk.13.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   48:           blk.13.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   49:             blk.13.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   50:           blk.13.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   51:             blk.13.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   52:        blk.13.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   53:             blk.13.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   54:             blk.13.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   55:          blk.14.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   56:           blk.14.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   57:           blk.14.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   58:             blk.14.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   59:           blk.14.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   60:             blk.14.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   61:        blk.14.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   62:             blk.14.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   63:             blk.14.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   64:          blk.15.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   65:           blk.15.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   66:           blk.15.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   67:             blk.15.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   68:           blk.15.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   69:             blk.15.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   70:        blk.15.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   71:             blk.15.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   72:             blk.15.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   73:          blk.16.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   74:           blk.16.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   75:           blk.16.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   76:             blk.16.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   77:           blk.16.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   78:             blk.16.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   79:        blk.16.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   80:             blk.16.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   81:             blk.16.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   82:          blk.17.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   83:           blk.17.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   84:           blk.17.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   85:             blk.17.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   86:           blk.17.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   87:             blk.17.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   88:        blk.17.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   89:             blk.17.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   90:             blk.17.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   91:          blk.18.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   92:           blk.18.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   93:           blk.18.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   94:             blk.18.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   95:           blk.18.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   96:             blk.18.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   97:        blk.18.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   98:             blk.18.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor   99:             blk.18.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  100:          blk.19.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  101:           blk.19.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  102:           blk.19.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  103:             blk.19.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  104:           blk.19.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  105:             blk.19.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  106:        blk.19.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  107:             blk.19.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  108:             blk.19.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  109:           blk.2.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  110:            blk.2.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  111:            blk.2.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  112:              blk.2.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  113:            blk.2.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  114:              blk.2.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  115:         blk.2.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  116:              blk.2.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  117:              blk.2.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  118:          blk.20.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  119:           blk.20.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  120:           blk.20.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  121:             blk.20.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  122:           blk.20.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  123:             blk.20.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  124:        blk.20.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  125:             blk.20.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  126:             blk.20.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  127:          blk.21.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  128:           blk.21.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  129:           blk.21.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  130:             blk.21.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  131:           blk.21.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  132:             blk.21.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  133:        blk.21.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  134:             blk.21.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  135:             blk.21.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  136:          blk.22.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  137:           blk.22.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  138:           blk.22.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  139:             blk.22.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  140:           blk.22.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  141:             blk.22.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  142:        blk.22.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  143:             blk.22.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  144:             blk.22.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  145:          blk.23.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  146:           blk.23.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  147:           blk.23.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  148:             blk.23.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  149:           blk.23.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  150:             blk.23.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  151:        blk.23.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  152:             blk.23.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  153:             blk.23.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  154:           blk.3.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  155:            blk.3.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  156:            blk.3.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  157:              blk.3.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  158:            blk.3.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  159:              blk.3.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  160:         blk.3.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  161:              blk.3.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  162:              blk.3.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  163:           blk.4.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  164:            blk.4.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  165:            blk.4.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  166:              blk.4.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  167:            blk.4.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  168:              blk.4.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  169:         blk.4.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  170:              blk.4.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  171:              blk.4.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  172:           blk.5.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  173:            blk.5.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  174:            blk.5.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  175:              blk.5.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  176:            blk.5.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  177:              blk.5.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  178:         blk.5.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  179:              blk.5.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  180:              blk.5.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  181:           blk.6.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  182:            blk.6.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  183:            blk.6.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  184:              blk.6.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  185:            blk.6.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  186:              blk.6.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  187:         blk.6.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  188:              blk.6.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  189:              blk.6.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  190:           blk.7.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  191:            blk.7.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  192:            blk.7.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  193:              blk.7.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  194:            blk.7.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  195:              blk.7.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  196:         blk.7.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  197:              blk.7.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  198:              blk.7.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  199:           blk.8.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  200:            blk.8.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  201:            blk.8.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  202:              blk.8.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  203:            blk.8.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  204:              blk.8.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  205:         blk.8.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  206:              blk.8.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  207:              blk.8.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  208:           blk.9.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  209:            blk.9.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  210:            blk.9.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  211:              blk.9.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  212:            blk.9.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  213:              blk.9.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  214:         blk.9.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  215:              blk.9.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  216:              blk.9.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  217:                    output.weight f16      [  4096, 32000,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  218:          blk.24.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  219:           blk.24.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  220:           blk.24.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  221:             blk.24.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  222:           blk.24.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  223:             blk.24.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  224:        blk.24.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  225:             blk.24.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  226:             blk.24.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  227:          blk.25.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  228:           blk.25.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  229:           blk.25.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  230:             blk.25.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  231:           blk.25.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  232:             blk.25.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  233:        blk.25.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  234:             blk.25.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  235:             blk.25.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  236:          blk.26.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  237:           blk.26.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  238:           blk.26.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  239:             blk.26.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  240:           blk.26.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  241:             blk.26.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  242:        blk.26.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  243:             blk.26.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  244:             blk.26.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  245:          blk.27.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  246:           blk.27.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  247:           blk.27.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  248:             blk.27.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  249:           blk.27.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  250:             blk.27.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  251:        blk.27.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  252:             blk.27.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  253:             blk.27.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  254:          blk.28.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  255:           blk.28.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  256:           blk.28.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  257:             blk.28.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  258:           blk.28.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  259:             blk.28.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  260:        blk.28.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  261:             blk.28.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  262:             blk.28.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  263:          blk.29.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  264:           blk.29.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  265:           blk.29.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  266:             blk.29.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  267:           blk.29.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  268:             blk.29.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  269:        blk.29.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  270:             blk.29.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  271:             blk.29.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  272:          blk.30.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  273:           blk.30.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  274:           blk.30.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  275:             blk.30.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  276:           blk.30.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  277:             blk.30.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  278:        blk.30.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  279:             blk.30.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  280:             blk.30.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  281:          blk.31.attn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  282:           blk.31.ffn_down.weight f16      [ 11008,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  283:           blk.31.ffn_gate.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  284:             blk.31.ffn_up.weight f16      [  4096, 11008,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  285:           blk.31.ffn_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  286:             blk.31.attn_k.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  287:        blk.31.attn_output.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  288:             blk.31.attn_q.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  289:             blk.31.attn_v.weight f16      [  4096,  4096,     1,     1 ]
text-generation-webui           | llama_model_loader: - tensor  290:               output_norm.weight f32      [  4096,     1,     1,     1 ]
text-generation-webui           | llama_model_loader: - kv   0:                       general.architecture str     
text-generation-webui           | llama_model_loader: - kv   1:                               general.name str     
text-generation-webui           | llama_model_loader: - kv   2:                       llama.context_length u32     
text-generation-webui           | llama_model_loader: - kv   3:                     llama.embedding_length u32     
text-generation-webui           | llama_model_loader: - kv   4:                          llama.block_count u32     
text-generation-webui           | llama_model_loader: - kv   5:                  llama.feed_forward_length u32     
text-generation-webui           | llama_model_loader: - kv   6:                 llama.rope.dimension_count u32     
text-generation-webui           | llama_model_loader: - kv   7:                 llama.attention.head_count u32     
text-generation-webui           | llama_model_loader: - kv   8:              llama.attention.head_count_kv u32     
text-generation-webui           | llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32     
text-generation-webui           | llama_model_loader: - kv  10:                          general.file_type u32     
text-generation-webui           | llama_model_loader: - kv  11:                       tokenizer.ggml.model str     
text-generation-webui           | llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr     
text-generation-webui           | llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr     
text-generation-webui           | llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr     
text-generation-webui           | llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32     
text-generation-webui           | llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32     
text-generation-webui           | llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32     
text-generation-webui           | llama_model_loader: - type  f32:   65 tensors
text-generation-webui           | llama_model_loader: - type  f16:  226 tensors
text-generation-webui           | llm_load_print_meta: format         = GGUF V2 (latest)
text-generation-webui           | llm_load_print_meta: arch           = llama
text-generation-webui           | llm_load_print_meta: vocab type     = SPM
text-generation-webui           | llm_load_print_meta: n_vocab        = 32000
text-generation-webui           | llm_load_print_meta: n_merges       = 0
text-generation-webui           | llm_load_print_meta: n_ctx_train    = 2048
text-generation-webui           | llm_load_print_meta: n_ctx          = 2048
text-generation-webui           | llm_load_print_meta: n_embd         = 4096
text-generation-webui           | llm_load_print_meta: n_head         = 32
text-generation-webui           | llm_load_print_meta: n_head_kv      = 32
text-generation-webui           | llm_load_print_meta: n_layer        = 32
text-generation-webui           | llm_load_print_meta: n_rot          = 128
text-generation-webui           | llm_load_print_meta: n_gqa          = 1
text-generation-webui           | llm_load_print_meta: f_norm_eps     = 1.0e-05
text-generation-webui           | llm_load_print_meta: f_norm_rms_eps = 1.0e-05
text-generation-webui           | llm_load_print_meta: n_ff           = 11008
text-generation-webui           | llm_load_print_meta: freq_base      = 10000.0
text-generation-webui           | llm_load_print_meta: freq_scale     = 1
text-generation-webui           | llm_load_print_meta: model type     = 7B
text-generation-webui           | llm_load_print_meta: model ftype    = mostly F16
text-generation-webui           | llm_load_print_meta: model size     = 6.74 B
text-generation-webui           | llm_load_print_meta: general.name   = ..
text-generation-webui           | llm_load_print_meta: BOS token = 1 '<s>'
text-generation-webui           | llm_load_print_meta: EOS token = 2 '</s>'
text-generation-webui           | llm_load_print_meta: UNK token = 0 '<unk>'
text-generation-webui           | llm_load_print_meta: LF token  = 13 '<0x0A>'
text-generation-webui           | /scripts/docker-entrypoint.sh: line 69:    90 Illegal instruction     (core dumped) "${LAUNCHER[@]}"
text-generation-webui exited with code 132

I am able to run the model directly with llama.cpp, so I am not sure what is going wrong here. Please let me know if you have any insights.
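
Exit code 132 corresponds to SIGILL (128 + signal 4), and on CPU builds of llama.cpp an illegal-instruction crash during model load usually means the binary was compiled with SIMD extensions (e.g. AVX2) that the host CPU lacks. A hedged diagnostic sketch for the host (Linux only):

# List the SIMD instruction sets the host CPU supports (Linux)
grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(avx|sse4|fma)' | sort -u

If AVX/AVX2 are missing from this list, a nightly image built with those extensions enabled would die exactly as shown above.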

Can't launch the UI with docker compose up

WARN[0000] /mnt/extradisk/text-generation-webui-docker/docker-compose.yml: `version` is obsolete 
Attaching to text-generation-webui
Error response from daemon: failed to create endpoint text-generation-webui on network text-generation-webui-docker_default: failed to add the host (veth7a52283) <=> sandbox (vetheb0e327) pair interfaces: operation not supported

I have run git pull and docker compose pull, but the issue is still unresolved. Thanks for your help!
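
This veth pairing failure usually points at the host kernel rather than at compose itself (for example, a kernel upgrade without a reboot, or a missing veth module). A hedged first check on the host:

# Confirm the veth kernel module is available and loaded
lsmod | grep -w veth || sudo modprobe veth
# If the module cannot load against the running kernel, reboot the host and retry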

Minor issue: syntax error from Windows line endings

scripts/build_extensions.sh has Windows line endings (CRLF), which causes a syntax error and crashes the build on Docker Desktop for Windows.

Run dos2unix on it, or open it in Notepad++ and convert the line endings to Unix (LF), and it will sail through.
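
For reference, either of these standard commands normalises the file in place (run from the repository root):

# Convert CRLF line endings to LF
dos2unix scripts/build_extensions.sh
# Equivalent if dos2unix is not installed
sed -i 's/\r$//' scripts/build_extensions.sh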

Thanks for all the work.

Unable to Save Characters in Chat Mode

Steps to Reproduce:

  1. Enable Chat Mode.
  2. Navigate to Chat Settings and select the Characters tab.
  3. Enter a new character and attempt to save it.
  4. When prompted, enter a file name.

Expected Behavior:
The character's YAML file should be successfully saved in the character directory. The character should be available for selection even after reloading.

Actual Behavior:
The character's YAML file is not created, and the character is not available for selection after reloading.
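
In a containerised deployment, a common cause of silent save failures is that the target directory is not mounted as a writable volume, so writes land in the ephemeral container layer (or fail on permissions). A hedged compose fragment to check this, assuming the upstream layout keeps characters under /app/characters and the service is named text-generation-webui:

# Bind-mount the characters directory so saved characters persist (paths assumed)
mkdir -p config/characters
cat > docker-compose.override.yml <<'EOF'
services:
  text-generation-webui:
    volumes:
      - ./config/characters:/app/characters
EOF
docker compose up -d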

Clarification on use of AutoGPTQ, etc.

Hi, I noticed the removal of the AutoGPTQ build in the latest image. Is it because of this bug? AutoGPTQ/AutoGPTQ#128

Sounds like @PanQiWei is on it, so that's good, but my biggest question is whether this effectively removes GPU support. I admit I'm still a bit new to this space, so please let me know if there is some alternative mechanism in place now for GPU support. I had been downloading .safetensors versions of models, but should I be looking for something different now?

Running without a GPU

Hey,

I wanted to check: is it possible to run this container without a GPU?

Thanks,
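
For anyone else wondering: CPU-only image variants appear elsewhere in this tracker (the llama-cpu-nightly tag above), so one hedged starting point is to run such a tag directly; the exact tag name is an assumption based on that naming:

# Run a CPU-only variant (tag name assumed from llama-cpu-nightly above)
docker run --rm -p 7860:7860 \
  -v "$PWD/models:/app/models" \
  atinoda/text-generation-webui:llama-cpu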

Container instantly crashes when trying to load GGUF

I am currently running the container on Unraid. I have used the docker compose file, and also tried manually creating the container and changing the storage mounts. I am able to download models from HF, and when I select the GGUF model from the drop-down it selects the llama.cpp loader. I have tried many different variations of settings, but no combination works; the same is true of ctransformers. As soon as I click Load, the container crashes with no logs. I am passing in my GTX 1070 with 8 GB of VRAM, and it is visible from within the container by running nvidia-smi. I have tried the DEFAULT and NVIDIA variants, and even snapshots from 2023. I am not sure what I am doing wrong.
(Five screenshots attached, taken 2024-03-12, showing the model loader settings.)

Intel Arc support

Intel Arc GPUs now have their own images, following developments in the upstream project. I do not have the hardware to test them, so please give them a go! Reports are welcome.

Not running on Apple M1: third_party/cuda/bin/ptxas --version died with <Signals.SIGTRAP: 5>

It looks like the default variant (and the CPU one too) does not run on M1. I tried specifying - EXTRA_LAUNCH_ARGS="--listen --verbose --loader llama.cpp", but that had no effect.
It would be nice to have this fixed, since llama.cpp actually runs pretty well with smaller models on M1.
Any pointers?

[+] Building 0.0s (0/0)                                                                                         
[+] Running 2/0
 ✔ Container text-generation-webui  Recreated  0.1s
 ! text-generation-webui-docker The requested image's platform (linux/amd64) does not match the detected host platform (linux/arm64/v8) and no specific platform was requested 0.0s 
Attaching to text-generation-webui
text-generation-webui  | === Running text-generation-webui variant: 'DEFAULT' snapshot-2023-10-15 ===
text-generation-webui  | === (This version is 11 commits behind origin main) ===
text-generation-webui  | === Image build date: 2023-10-18 11:30:52 ===
text-generation-webui  | 2023-10-23 18:27:36 WARNING:
text-generation-webui  | You are potentially exposing the web UI to the entire internet without any access password.
text-generation-webui  | You can create one with the "--gradio-auth" flag like this:
text-generation-webui  | 
text-generation-webui  | --gradio-auth username:password
text-generation-webui  | 
text-generation-webui  | Make sure to replace username:password with your own.
text-generation-webui  | /venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
text-generation-webui  |   warn("The installed version of bitsandbytes was compiled without GPU support. "
text-generation-webui  | /venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
text-generation-webui  | Traceback (most recent call last):
text-generation-webui  |   File "/app/server.py", line 31, in <module>
text-generation-webui  |     from modules import (
text-generation-webui  |   File "/app/modules/training.py", line 21, in <module>
text-generation-webui  |     from peft import (
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/peft/__init__.py", line 22, in <module>
text-generation-webui  |     from .auto import (
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/peft/auto.py", line 31, in <module>
text-generation-webui  |     from .mapping import MODEL_TYPE_TO_PEFT_MODEL_MAPPING
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/peft/mapping.py", line 23, in <module>
text-generation-webui  |     from .peft_model import (
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/peft/peft_model.py", line 38, in <module>
text-generation-webui  |     from .tuners import (
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/peft/tuners/__init__.py", line 21, in <module>
text-generation-webui  |     from .lora import LoraConfig, LoraModel
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/peft/tuners/lora.py", line 45, in <module>
text-generation-webui  |     import bitsandbytes as bnb
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/bitsandbytes/__init__.py", line 16, in <module>
text-generation-webui  |     from .nn import modules
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/bitsandbytes/nn/__init__.py", line 6, in <module>
text-generation-webui  |     from .triton_based_modules import SwitchBackLinear, SwitchBackLinearGlobal, SwitchBackLinearVectorwise, StandardLinear
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/bitsandbytes/nn/triton_based_modules.py", line 8, in <module>
text-generation-webui  |     from bitsandbytes.triton.dequantize_rowwise import dequantize_rowwise
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/bitsandbytes/triton/dequantize_rowwise.py", line 10, in <module>
text-generation-webui  |     import triton
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/__init__.py", line 20, in <module>
text-generation-webui  |     from .compiler import compile, CompilationError
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/compiler/__init__.py", line 1, in <module>
text-generation-webui  |     from .compiler import CompiledKernel, compile, instance_descriptor
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/compiler/compiler.py", line 27, in <module>
text-generation-webui  |     from .code_generator import ast_to_ttir
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/compiler/code_generator.py", line 8, in <module>
text-generation-webui  |     from .. import language
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/language/__init__.py", line 4, in <module>
text-generation-webui  |     from . import math
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/language/math.py", line 4, in <module>
text-generation-webui  |     from . import core
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/language/core.py", line 1376, in <module>
text-generation-webui  |     def minimum(x, y):
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/runtime/jit.py", line 542, in jit
text-generation-webui  |     return decorator(fn)
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/runtime/jit.py", line 534, in decorator
text-generation-webui  |     return JITFunction(
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/runtime/jit.py", line 433, in __init__
text-generation-webui  |     self.run = self._make_launcher()
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/runtime/jit.py", line 388, in _make_launcher
text-generation-webui  |     scope = {"version_key": version_key(),
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/runtime/jit.py", line 120, in version_key
text-generation-webui  |     ptxas = path_to_ptxas()[0]
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/triton/common/backend.py", line 114, in path_to_ptxas
text-generation-webui  |     result = subprocess.check_output([ptxas_bin, "--version"], stderr=subprocess.STDOUT)
text-generation-webui  |   File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
text-generation-webui  |     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
text-generation-webui  |   File "/usr/lib/python3.10/subprocess.py", line 526, in run
text-generation-webui  |     raise CalledProcessError(retcode, process.args,
text-generation-webui  | subprocess.CalledProcessError: Command '['/venv/lib/python3.10/site-packages/triton/common/../third_party/cuda/bin/ptxas', '--version']' died with <Signals.SIGTRAP: 5>.
text-generation-webui exited with code 1

Make settings.yaml persistent

Please find a way to make settings.yaml persistent, e.g. apply a patch after cloning https://github.com/oobabooga/text-generation-webui.git in the Dockerfile to change the default directory for settings.yaml to, say, /extensions.

Losing the settings on every update is absolutely annoying.

Here's the patch you could use:

https://github.com/Gee1111/text-generation-webui/blob/main/make_settings_persistent.patch

Edit: I could send you a merge request, but it's better to put make_settings_persistent.patch in your own repo and access it from there.
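
Alternatively, the file itself can be bind-mounted without patching anything, following the volume style of the stock docker-compose.yml (a minimal sketch; the host path is an assumption):

volumes:
  - ./config/settings.yaml:/app/settings.yaml  # keep settings.yaml on the host so it survives container updates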

EXTRA_LAUNCH_ARGS are not honored

Running in Kubernetes, everything works fine except that EXTRA_LAUNCH_ARGS is not honored as an env var.

I would expect the following to enable the API and preload a model, but neither happens.

            - name: EXTRA_LAUNCH_ARGS
              value: "--listen --verbose --api --extensions api --model TheBloke/vicuna-13B-v1.5-GPTQ --gpus all"

Full Kubernetes deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: text-gen-webui
  namespace: text-gen-demo
spec:
  replicas: 1
  selector:
    matchLabels:
      component: text-gen-webui
  template:
    metadata:
      labels:
        component: text-gen-webui
    spec:
      tolerations:
        - key: "nvidia.com/gpu"
          value: present
          effect: NoSchedule
      containers:
        - name: text-gen-demo-container
          image: atinoda/text-generation-webui
          ports:
            - containerPort: 7860
            - containerPort: 5000
            - containerPort: 5005
          resources:
            limits:
              nvidia.com/gpu: "1"
          env:
            - name: EXTRA_LAUNCH_ARGS
              value: "--listen --verbose --api --extensions api --model TheBloke/vicuna-13B-v1.5-GPTQ --gpus all"
            - name: TORCH_CUDA_ARCH_LIST
              value: "7.5"
          volumeMounts:
            - name: text-gen-demo-pvc
              mountPath: /app/loras
              subPath: loras
            - name: text-gen-demo-pvc
              mountPath: /app/models
              subPath: models
            - name: shm
              mountPath: /dev/shm
      volumes:
        - name: text-gen-demo-pvc
          emptyDir: {}
        - name: shm
          emptyDir:
            medium: Memory
            sizeLimit: 1Gi

Can't load models with CPU

I used this docker-compose setup to try to run under CPU, but when loading a model I get a CUDA version error:
text-generation-webui | === Running text-generation-webui variant: 'DEFAULT' ===
text-generation-webui | === (This version is 75 commits behind origin) ===
text-generation-webui | === Image build date: 2023-07-18 18:43:00 ===
text-generation-webui | /venv/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cpu.so: undefined symbol: cadam32bit_grad_fp32
text-generation-webui | /venv/lib/python3.10/site-packages/bitsandbytes/cextension.py:34: UserWarning: The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
text-generation-webui | warn("The installed version of bitsandbytes was compiled without GPU support. "
text-generation-webui | 2023-07-30 18:34:51 INFO:Loading ggml-model-q4_1.bin...
text-generation-webui | CUDA error 35 at ggml-cuda.cu:2478: CUDA driver version is insufficient for CUDA runtime version
text-generation-webui | /arrow/cpp/src/arrow/filesystem/s3fs.cc:2598: arrow::fs::FinalizeS3 was not called even though S3 was initialized. This could lead to a segmentation fault at exit
text-generation-webui exited with code 1

Isn't this supposed to load without trying to use an Nvidia GPU? I have an AMD GPU and was trying to use the CPU instead, but it doesn't work.
I'm on Linux; do I need to do any extra steps?
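
For a CPU-only setup, the llama-cpu variant avoids the CUDA code path entirely (a sketch based on the stock docker-compose.yml; note the image tag and the absence of any GPU reservation):

services:
  text-generation-webui-docker:
    image: atinoda/text-generation-webui:llama-cpu  # CPU-only variant
    # ...ports and volumes as in the stock compose file...
    # no deploy/resources/devices block, so Docker never requests the nvidia driver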

No such file or directory error

I just tried to start it with docker compose up and hit the following error.

FileNotFoundError: [Errno 2] No such file or directory: 'presets/simple-1.yaml'

Any idea how to fix this?

Thanks
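
A likely cause (an assumption based on the stock compose file): bind-mounting an empty host directory over /app/presets hides the presets shipped inside the image. Copying the defaults out of the image once should fix it (a sketch; --entrypoint bypasses the normal startup script):

docker run --rm --entrypoint cp -v "$(pwd)/config:/host" atinoda/text-generation-webui:default -r /app/presets /host/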

Directories are mapped against actual windows folders instead of volumes

Amazing work! It was very easy to use! I followed all the instructions and successfully ran a container. I thought the folders inside the image would be mapped to volumes inside Docker Desktop, but they are linked to Windows folders:


Why is it configured this way instead of using the volumes that Docker themselves recommend? Docker volumes increase portability and persist across container updates just as well.

I am using Windows 11 Home edition + WSL, running an Nvidia RTX 3060, and CUDA GPU acceleration seems to work just fine.
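
For anyone who prefers Docker-managed volumes, the bind mounts can be swapped for named volumes (a sketch; the volume names here are made up for illustration):

services:
  text-generation-webui-docker:
    volumes:
      - tgw-models:/app/models
      - tgw-extensions:/app/extensions
volumes:
  tgw-models:
  tgw-extensions: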

New Cuda

I think the newest CUDA release has broken the Docker deployment on vast.ai.


Error response from daemon: could not select device driver "nvidia" with capabilities:

No luck for me when trying to use this. Am I missing something? Thanks.

(base) dewi@DewiJones:~/code/text-generation-webui-docker/text-generation-webui-docker$ gs
++ pwd
+ current_dir=/home/dewi/code/text-generation-webui-docker/text-generation-webui-docker
+ [[ /home/dewi/code/text-generation-webui-docker/text-generation-webui-docker == \/\m\n\t\/\c* ]]
+ /usr/bin/git status -v -v
On branch master
Your branch is up to date with 'origin/master'.

Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git restore <file>..." to discard changes in working directory)
        modified:   docker-compose.yml

--------------------------------------------------
Changes not staged for commit:
diff --git i/docker-compose.yml w/docker-compose.yml
index d1caff0..33dda5e 100644
--- i/docker-compose.yml
+++ w/docker-compose.yml
@@ -1,7 +1,7 @@
 version: "3"
 services:
   text-generation-webui-docker:
-    image: atinoda/text-generation-webui:default # Specify variant as the :tag
+    image: atinoda/text-generation-webui:llama-cpu # Specify variant as the :tag
     container_name: text-generation-webui
     environment:
       - EXTRA_LAUNCH_ARGS="--listen --verbose" # Custom launch args (e.g., --model MODEL_NAME)
no changes added to commit (use "git add" and/or "git commit -a")
git status
commit d4b58daffec5096e2a7057388420e74987537766 (HEAD -> master, origin/master, origin/HEAD)
Author: Atinoda <[email protected]>
Date:   Wed Oct 18 15:49:48 2023 +0100

    Separate nightly builds
(base) dewi@DewiJones:~/code/text-generation-webui-docker/text-generation-webui-docker$ docker compose up
Attaching to text-generation-webui
Error response from daemon: could not select device driver "nvidia" with capabilities: [[gpu]]
(base) dewi@DewiJones:~/code/text-generation-webui-docker/text-generation-webui-docker$
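
Two common causes (assumptions, not confirmed from the log): the NVIDIA Container Toolkit is not installed on the host, or a CPU-only image is being run while the GPU reservation is still present in docker-compose.yml. The diff above switches to the llama-cpu image but keeps the reservation; for CPU variants the whole block can be removed (a sketch):

deploy:  # delete or comment out this entire block for CPU-only variants
  resources:
    reservations:
      devices:
        - driver: nvidia
          capabilities: [gpu]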

Error when loading some GGUF files with llama.cpp

My spec: RTX 3060 Ti, R5 5600X, 16 GB RAM.
I want to load DeepSeek models. I have tried DeepSeek 6.7B and 33B, and neither works. I have run several 7B, 14B, and 34B models without any problems, but when I try to load a DeepSeek model, the webui just crashes immediately.

Here is the error output from terminal:

text-generation-webui  | llama_model_loader: - type  f32:   65 tensors
text-generation-webui  | llama_model_loader: - type q6_K:  226 tensors
text-generation-webui  | ERROR: byte not found in vocab: '
text-generation-webui  | '
text-generation-webui  | /scripts/docker-entrypoint.sh: line 69:    98 Segmentation fault      (core dumped) "${LAUNCHER[@]}"
text-generation-webui exited with code 139
chj@archlinux /m/m/text-generation-webui-docker (master)> 

Cannot access WebUI

Running the default build on Windows Docker with WSL2 Ubuntu 24.04. I'm unable to access the UI at http://127.0.0.1:7860
Log shows:
2024-06-06 13:34:17 === Running text-generation-webui variant: 'Nvidia Extended' abe5ddc8833206381c43b002e95788d4cca0893a ===
2024-06-06 13:34:18 === (This version is 0 commits behind origin main) ===
2024-06-06 13:34:18 === Image build date: 2024-06-04 21:34:25 ===
2024-06-06 13:34:21 20:34:21-479491 INFO Starting Text generation web UI
2024-06-06 13:34:22
2024-06-06 13:34:22 Running on local URL: http://127.0.0.1:7860
2024-06-06 13:34:22
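
The line "Running on local URL: http://127.0.0.1:7860" suggests Gradio is binding to localhost inside the container, so the published port has nothing to forward to. Passing --listen, as the stock docker-compose.yml does, makes it bind to 0.0.0.0 (a sketch):

environment:
  - EXTRA_LAUNCH_ARGS="--listen --verbose"  # --listen binds the UI to 0.0.0.0 inside the container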

Unable to load coqui_tts

Environment

  • docker version: 24.0.7, build afdd53b4e3
  • kernel version: 6.6.7-arch1-1
  • OS: EndeavourOS
  • nvidia-container-toolkit version: 1.13.5

Docker images used

  • atinoda/text-generation-webui:latest
  • atinoda/text-generation-webui:latest-nightly
  • atinoda/text-generation-webui:default

Setting BUILD_EXTENSIONS_LIVE="coqui_tts" does not build the extension live; it still has to be enabled manually in the webui

When enabling coqui_tts in the webui, it expects user input

docker-compose log

text-generation-webui  | 2023-12-15 22:33:07 INFO:Loading the extension "coqui_tts"...
text-generation-webui  | [XTTS] Loading XTTS...
text-generation-webui  |  > You must agree to the terms of service to use this model.
text-generation-webui  |  | > Please see the terms of service at https://coqui.ai/cpml.txt
text-generation-webui  |  | > "I have read, understood and agreed to the Terms and Conditions." - [y/n]
text-generation-webui  | 2023-12-15 22:33:15 ERROR:Failed to load the extension "coqui_tts".
text-generation-webui  | Traceback (most recent call last):
text-generation-webui  |   File "/app/modules/extensions.py", line 41, in load_extensions
text-generation-webui  |     extension.setup()
text-generation-webui  |   File "/app/extensions/coqui_tts/script.py", line 180, in setup
text-generation-webui  |     model = load_model()
text-generation-webui  |   File "/app/extensions/coqui_tts/script.py", line 76, in load_model
text-generation-webui  |     model = TTS(params["model_name"]).to(params["device"])
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/TTS/api.py", line 81, in __init__
text-generation-webui  |     self.load_tts_model_by_name(model_name, gpu)
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/TTS/api.py", line 195, in load_tts_model_by_name
text-generation-webui  |     model_path, config_path, vocoder_path, vocoder_config_path, model_dir = self.download_model_by_name(
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/TTS/api.py", line 149, in download_model_by_name
text-generation-webui  |     model_path, config_path, model_item = self.manager.download_model(model_name)
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/TTS/utils/manage.py", line 433, in download_model
text-generation-webui  |     self.create_dir_and_download_model(model_name, model_item, output_path)
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/TTS/utils/manage.py", line 359, in create_dir_and_download_model
text-generation-webui  |     if not self.ask_tos(output_path):
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/TTS/utils/manage.py", line 338, in ask_tos
text-generation-webui  |     answer = input(" | | > ")
text-generation-webui  | EOFError: EOF when reading a line
text-generation-webui  |  | | > Running on local URL:  http://0.0.0.0:7860
text-generation-webui  | 
text-generation-webui  | To create a public link, set `share=True` in `launch()`.

After doing some digging, this seems to be a known problem that has already been fixed upstream in oobabooga; refer to this issue
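
A possible workaround until that fix lands (an assumption: it relies on the bundled TTS library honouring the COQUI_TOS_AGREED variable) is to accept the terms via the environment so the extension never reaches the interactive prompt:

environment:
  - COQUI_TOS_AGREED=1  # assumed: pre-accepts the Coqui ToS and skips the input() prompt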

How to reproduce

docker-compose.yml

version: "3"
services:
  text-generation-webui-docker:
    image: atinoda/text-generation-webui:default-nightly # Specify variant as the :tag
    container_name: text-generation-webui
    environment:
      - EXTRA_LAUNCH_ARGS="--listen --verbose" # Custom launch args (e.g., --model MODEL_NAME)
      - BUILD_EXTENSIONS_LIVE="coqui_tts" # Install named extensions during every container launch. THIS WILL SIGNIFICANTLY SLOW LAUNCH TIME.
    ports:
      - 7860:7860  # Default web port
      - 5000:5000  # Default API port
      - 5005:5005  # Default streaming port
      - 5001:5001  # Default OpenAI API extension port
    volumes:
      - ./config/characters:/app/characters
      - ./config/loras:/app/loras
      - ./config/models:/app/models
      - ./config/presets:/app/presets
      - ./config/prompts:/app/prompts
      - ./config/training:/app/training
      - ./config/extensions:/app/extensions  # Persist all extensions
#      - ./config/extensions/silero_tts:/app/extensions/silero_tts  # Persist a single extension
    logging:
      driver:  json-file
      options:
        max-file: "3"   # number of files or file count
        max-size: '10m'
    deploy:
        resources:
          reservations:
            devices:
              - driver: nvidia
                device_ids: ['0']
                capabilities: [gpu]
docker-compose up -d

Access the webui on port 7860 -> enable coqui_tts from the session tab
Run docker-compose logs

Alltalk_tts extension doesn't run properly?

Is anyone able to get the alltalk_tts extension (https://github.com/erew123/alltalk_tts) to run properly in this container? I've done both the webui and standalone routes, but both throw errors. I'm running this container in Unraid, which otherwise has been fine. The coqui and silero tts extensions seem to work fine.

I installed it following the alltalk_tts GitHub instructions (https://github.com/erew123/alltalk_tts?tab=readme-ov-file#-quick-setup-text-generation-webui--standalone-installation). After consoling into the TGW docker container, an example workflow for the run-with-TGW option is:

#Put alltalk_tts git repo into extensions folder
cd extensions && git clone https://github.com/erew123/alltalk_tts
#Install curl, as the start_linux.sh script needs it
apt install curl -y
#Run the webui start_linux.sh script to get the setup dependencies, like conda, installed, which the alltalk scripts need to work
cd .. && ./start_linux.sh
#select options Nvidia and N for old CUDA version
#Start the env, which alltalk_tts atsetup.sh needs
./cmd_linux.sh
#Make the alltalk config script executable and start execution
cd extensions/alltalk_tts && chmod +x ./atsetup.sh && bash ./atsetup.sh
#option 1 to start install

Alltalk diag log and TGW container log when attempting to use Alltalk extension attached. Thoughts?

alltalk_diag.txt
tgw_alltalk_log.txt

Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1:

I am getting this error when composing. My GPU is an RTX 3070 Ti.
Cannot start Docker Compose application. Reason: compose [start] exit status 1.
Container text-generation-webui-text-generation-webui-1 Starting
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown
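
That error normally means the NVIDIA Container Toolkit cannot locate the host driver library. A typical repair sequence on a Debian/Ubuntu host (a sketch; it assumes the NVIDIA apt repository is already configured):

sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker  # registers the nvidia runtime with Docker
sudo systemctl restart docker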

Missing ffmpeg dependency for whisper_stt

Warning: Couldn't find ffprobe or avprobe - defaulting to ffprobe, but may not work
Followed by
OSError: [Errno 7] Argument list too long: 'ffprobe'

This occurs when using the API. I'm unsure whether the interface side works; I've not sorted out the SSL issues yet, but I can't imagine it works any better.
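
As a stopgap (assuming ffmpeg is the only missing piece), the dependency can be installed inside the running container, at the cost of being lost when the container is recreated:

docker exec -it text-generation-webui bash -c "apt-get update && apt-get install -y ffmpeg"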

Is Mistral supported on default version?

Hi, I'm trying to use one of the Mistral-derived models (which works on the non-docker text-gen), but I keep getting the error message below. Things work with Llama models. Am I doing something wrong, or is Mistral not supported on the default version? I tried the nightly version, but it seems to fail as well.

text-generation-webui  | To create a public link, set `share=True` in `launch()`.
text-generation-webui  | 2023-10-12 13:35:19 INFO:Loading Mistral-7B-OpenOrca...
text-generation-webui  | 2023-10-12 13:35:19 ERROR:Failed to load the model.
text-generation-webui  | Traceback (most recent call last):
text-generation-webui  |   File "/app/modules/ui_model_menu.py", line 198, in load_model_wrapper
text-generation-webui  |     shared.model, shared.tokenizer = load_model(shared.model_name, loader)
text-generation-webui  |   File "/app/modules/models.py", line 78, in load_model
text-generation-webui  |     output = load_func_map[loader](model_name)
text-generation-webui  |   File "/app/modules/models.py", line 122, in huggingface_loader
text-generation-webui  |     config = AutoConfig.from_pretrained(path_to_model, trust_remote_code=params['trust_remote_code'])
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1039, in from_pretrained
text-generation-webui  |     config_class = CONFIG_MAPPING[config_dict["model_type"]]
text-generation-webui  |   File "/venv/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 734, in __getitem__
text-generation-webui  |     raise KeyError(key)
text-generation-webui  | KeyError: 'mistral'
text-generation-webui  | 
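
KeyError: 'mistral' generally means the bundled transformers library predates Mistral support. As a quick check (a sketch; the /venv path is taken from the tracebacks above, and an in-place upgrade will not survive a container rebuild):

docker exec -it text-generation-webui /venv/bin/pip install -U transformers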

Where do I have to upload the training dataset?

Hi, I am running Oobabooga LLM WebUI and everything is fine, but now I am trying to upload a training dataset to /app/training/datasets.
The GUI does not find anything; /app/models/ also appears empty, even though a model is working.
Where do I have to upload the training dataset?
Thanks
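
If the stock docker-compose.yml is in use, those paths are bind mounts, so files have to be placed on the host side (a note based on the compose file shown elsewhere in this document):

volumes:
  - ./config/training:/app/training  # put datasets in ./config/training/datasets on the host
  - ./config/models:/app/models      # models likewise live in ./config/models on the host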

exec /scripts/docker-entrypoint.sh: exec format error

No build errors; however, I noticed the container wasn't actually launching. I ran it with logs and this is all it produced.

ubuntu:~/text-generation-webui-docker$ sudo docker-compose logs -f
Attaching to text-generation-webui
text-generation-webui | exec /scripts/docker-entrypoint.sh: exec format error
text-generation-webui exited with code 1
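
An "exec format error" usually indicates an architecture mismatch between the image and the host, e.g. an amd64-only image on an arm64 machine. Two quick checks (a sketch):

uname -m  # host architecture
docker image inspect atinoda/text-generation-webui:default --format '{{.Architecture}}'  # image architecture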

Cannot load exllamav2 models

This happens with the two most recent consecutive nightly versions, and I have also built an image from the 2024-03-10 snapshot version:
https://github.com/oobabooga/text-generation-webui/releases/tag/snapshot-2024-03-10 . The issue happens with both of them.
This is the base-nvidia version.
When I try to load an exllamav2 model, I receive this error message:

File "/app/modules/ui_model_menu.py", line 245, in load_model_wrapper shared.model, shared.tokenizer = load_model(selected_model, loader) File "/app/modules/models.py", line 87, in load_model output = load_func_map[loader](model_name) File "/app/modules/models.py", line 378, in ExLlamav2_HF_loader from modules.exllamav2_hf import Exllamav2HF File "/app/modules/exllamav2_hf.py", line 7, in from exllamav2 import ( File "/venv/lib/python3.10/site-packages/exllamav2/init.py", line 3, in from exllamav2.model import ExLlamaV2 File "/venv/lib/python3.10/site-packages/exllamav2/model.py", line 23, in from exllamav2.config import ExLlamaV2Config File "/venv/lib/python3.10/site-packages/exllamav2/config.py", line 2, in from exllamav2.fasttensors import STFile File "/venv/lib/python3.10/site-packages/exllamav2/fasttensors.py", line 5, in from exllamav2.ext import exllamav2_ext as ext_c File "/venv/lib/python3.10/site-packages/exllamav2/ext.py", line 15, in import exllamav2_ext ImportError: /venv/lib/python3.10/site-packages/exllamav2_ext.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c107WarningC1ESt7variantIJNS0_11UserWarningENS0_18DeprecationWarningEEERKNS_14SourceLocationESsb

I built an image from the official repo as well, and that worked flawlessly.
I think the issue could be this step from the official repository:
conda install -y -c "nvidia/label/cuda-12.1.1" cuda-runtime

I couldn't find this step in the Dockerfile here.
Thanks for the help!

Apple Silicon / MacOS support

For atinoda/text-generation-webui:llama-cpu-nightly:

 ⠹ text-generation-webui Pulling                                                                                                                                                                       1.2s 
no matching manifest for linux/arm64/v8 in the manifest list entries
make: *** [up] Error 18

For reference, atinoda/text-generation-webui:llama-cpu works without error.

Unraid app development

I wanted to make sure you're cool with me using and forking your repo to make an Unraid community app.

Error 804

nvidia-smi displays the data shown below.
After a "docker-compose up" and some downloading, the error 804 messages appear (see below).
I'm running this inside a host running Debian 12.

This seems to be an nvidia/torch-related issue; any pointers on how to resolve it?

Thu Feb  1 12:42:13 2024       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.125.06   Driver Version: 525.125.06   CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:01:00.0 Off |                  Off |
|  0%   37C    P8    12W / 450W |      6MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      1137      G   /usr/lib/xorg/Xorg                  4MiB |
+-----------------------------------------------------------------------------+
Recreating text-generation-webui ... done
Attaching to text-generation-webui
text-generation-webui           | === Running text-generation-webui variant: 'DEFAULT' snapshot-2023-12-31 ===
text-generation-webui           | === (This version is 11 commits behind origin main) ===
text-generation-webui           | === Image build date: 2024-01-04 22:56:23 ===
text-generation-webui           | 11:39:20-264526 INFO     Starting Text generation web UI                        
text-generation-webui           | 11:39:20-266083 WARNING                                                         
text-generation-webui           |                          You are potentially exposing the web UI to the entire  
text-generation-webui           |                          internet without any access password.                  
text-generation-webui           |                          You can create one with the "--gradio-auth" flag like  
text-generation-webui           |                          this:                                                  
text-generation-webui           |                                                                                 
text-generation-webui           |                          --gradio-auth username:password                        
text-generation-webui           |                                                                                 
text-generation-webui           |                          Make sure to replace username:password with your own.  
text-generation-webui           | 11:39:20-266993 INFO     Loading the extension "gallery"                        
text-generation-webui           | ╭───────────────────── Traceback (most recent call last) ──────────────────────╮
text-generation-webui           | │ /app/server.py:254 in                                                │
text-generation-webui           | │                                                                              │
text-generation-webui           | │   253         # Launch the web UI                                            │
text-generation-webui           | │ ❱ 254         create_interface()                                             │
text-generation-webui           | │   255         while True:                                                    │
text-generation-webui           | │                                                                              │
text-generation-webui           | │ /app/server.py:133 in create_interface                                       │
text-generation-webui           | │                                                                              │
text-generation-webui           | │   132         ui_parameters.create_ui(shared.settings['preset'])  # Paramete │
text-generation-webui           | │ ❱ 133         ui_model_menu.create_ui()  # Model tab                         │
text-generation-webui           | │   134         training.create_ui()  # Training tab                           │
text-generation-webui           | │                                                                              │
text-generation-webui           | │ /app/modules/ui_model_menu.py:36 in create_ui                                │
text-generation-webui           | │                                                                              │
text-generation-webui           | │    35         for i in range(torch.cuda.device_count()):                     │
text-generation-webui           | │ ❱  36             total_mem.append(math.floor(torch.cuda.get_device_properti │
text-generation-webui           | │    37                                                                        │
text-generation-webui           | │                                                                              │
text-generation-webui           | │ /venv/lib/python3.10/site-packages/torch/cuda/__init__.py:449 in             │
text-generation-webui           | │ get_device_properties                                                        │
text-generation-webui           | │                                                                              │
text-generation-webui           | │    448     """                                                               │
text-generation-webui           | │ ❱  449     _lazy_init()  # will define _get_device_properties                │
text-generation-webui           | │    450     device = _get_device_index(device, optional=True)                 │
text-generation-webui           | │                                                                              │
text-generation-webui           | │ /venv/lib/python3.10/site-packages/torch/cuda/__init__.py:298 in _lazy_init  │
text-generation-webui           | │                                                                              │
text-generation-webui           | │    297             os.environ["CUDA_MODULE_LOADING"] = "LAZY"                │
text-generation-webui           | │ ❱  298         torch._C._cuda_init()                                         │
text-generation-webui           | │    299         # Some of the queued calls may reentrantly call _lazy_init(); │
text-generation-webui           | ╰──────────────────────────────────────────────────────────────────────────────╯
text-generation-webui           | RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda 
text-generation-webui           | functions before calling NumCudaDevices() that might have already set an error? 
text-generation-webui           | Error 804: forward compatibility was attempted on non supported HW
text-generation-webui exited with code 1

older images?

Hello!
I just stumbled upon this and think it is a great addition to the quick deployment capabilities for text-generation-webui.

Looking through the code updates, it looks like we're about two weeks behind; looking at the image tags, we track a default stable build and a nightly build.

I'm wondering why there aren't any older builds? Normally on Docker Hub you can revert back if necessary, but I don't see any of the older images. Will this be an option in the future?

Thank you!

exllamav2 integration?

Hi,
Firstly, thanks for maintaining the docker container!

I just want to know whether exllamav2 is integrated in the docker container version?

I installed exllamav2 with pip via the container console and the install succeeded. But when trying to load https://huggingface.co/turboderp/CodeLlama-13B-instruct-2.65bpw-h6-exl2 or any other exl2 model, it fails; see logs below:

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/app/modules/ui_model_menu.py", line 196, in load_model_wrapper
    shared.model, shared.tokenizer = load_model(shared.model_name, loader)
  File "/app/modules/models.py", line 79, in load_model
    output = load_func_map[loader](model_name)
  File "/app/modules/models.py", line 149, in huggingface_loader
    model = LoaderClass.from_pretrained(Path(f"{shared.args.model_dir}/{model_name}"), low_cpu_mem_usage=True, torch_dtype=torch.bfloat16 if shared.args.bf16 else torch.float16, trust_remote_code=shared.args.trust_remote_code)
  File "/venv/lib/python3.10/site-packages/transformers/models/auto/auto_factory.py", line 516, in from_pretrained
    return model_class.from_pretrained(
  File "/venv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2650, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named pytorch_model.bin, tf_model.h5, model.ckpt.index or flax_model.msgpack found in directory models/turboderp_Llama2-13B-4.0bpw-h6-exl2.
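
From the traceback, the model is being routed through the Transformers loader (huggingface_loader), which looks for pytorch_model.bin and friends, rather than through an ExLlamav2 loader. If the image ships an exllamav2-capable build, forcing the loader may help (a sketch; --loader is a standard text-generation-webui flag, the model name is copied from the log above, and the loader name may need to match the UI's spelling):

environment:
  - EXTRA_LAUNCH_ARGS="--listen --loader exllamav2 --model turboderp_Llama2-13B-4.0bpw-h6-exl2"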

EXTRA_LAUNCH_ARGS only accepts the first argument

EXTRA_LAUNCH_ARGS only accepts the first argument and ignores the others.
I need, e.g.: --listen --trust-remote-code --gradio-auth user:mypw

so my EXTRA_LAUNCH_ARGS value is: --listen --trust-remote-code --gradio-auth user:mypw

I fixed it by wrapping the value in quotes ("..."), as shown below.
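
For reference, a working compose entry with the whole value wrapped in quotes (taken from the report above):

environment:
  - EXTRA_LAUNCH_ARGS="--listen --trust-remote-code --gradio-auth user:mypw"  # quoting keeps all arguments in one value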

load model failed

I ran the container with docker compose up successfully, then set the model in the webui (http://localhost:7860). I can download the models successfully from https://huggingface.co/TheBloke/Llama-2-7B-Chat-GGML/tree/main, and there is a TheBloke_Llama-2-7B-Chat-GGML/ dir in the models dir.
But when I choose the model and click the load button, I get this error:
text-generation-webui | 08:57:39-112718 INFO Loading "TheBloke_Llama-2-7B-Chat-GGML"
text-generation-webui | 08:57:39-152392 ERROR Failed to load the model.
text-generation-webui | Traceback (most recent call last):
text-generation-webui | File "/app/modules/ui_model_menu.py", line 242, in load_model_wrapper
text-generation-webui | shared.model, shared.tokenizer = load_model(selected_model, loader)
text-generation-webui | File "/app/modules/models.py", line 87, in load_model
text-generation-webui | output = load_func_map[loader](model_name)
text-generation-webui | File "/app/modules/models.py", line 247, in llamacpp_loader
text-generation-webui | model_file = list(Path(f'{shared.args.model_dir}/{model_name}').glob('*.gguf'))[0]
text-generation-webui | IndexError: list index out of range

I also tried modifying docker-compose.yml by adding "--model TheBloke_Llama-2-7B-Chat-GGML" to - EXTRA_LAUNCH_ARGS="--listen --verbose" # Custom launch args (e.g., --model MODEL_NAME), but I get the same error.

How can I resolve it?
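
For context: the traceback shows the llama.cpp loader globbing for *.gguf files, but GGML repos only ship .bin files, so the glob comes back empty and indexing [0] raises the IndexError. Recent llama.cpp builds only read GGUF, so downloading the GGUF build of the same model should work (a sketch; the repo name is an assumption based on TheBloke's naming):

python download-model.py TheBloke/Llama-2-7B-Chat-GGUF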

Training in 4-bit

Hi, how do I train a LoRA with a GPTQ 4-bit model as its base? I tried with and without the new monkeypatch and with all the different model loaders. I got the furthest by trying to use AutoGPTQ without the monkeypatch, and got this response:

285, in _create_new_module raise ValueError( ValueError: Target module QuantLinear() is not supported. Currently, only torch.nn.Linear and Conv1D are supported.

just like this issue
