Comments (15)

Atinoda commented on July 22, 2024

Glad to hear that you're diving into LoRA training! The monkey-patch technique is quite old, but I've just checked upstream and I see that they're making some updates. I will keep an eye on how things develop before I start making structural changes to this repo. What GPU and OS are you running?

For now, I suggest that you use the default variant, use the Transformers loader (that's just 'plain' huggingface models - not GGML or GPTQ), and check the load-in-4-bit option. I am able to train ehartford/WizardLM-7B-Uncensored 4-bit LoRAs on an RTX3090 using these options. I think this is what Ph0rk0z is referring to in the link that you sent with "You can train with qlora (full size files)..."

It also seems to be possible to train LoRAs on modern GPTQ models, but it is not something I am currently doing.
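For the curious, here's a minimal sketch of roughly what that combination does under the hood, assuming the usual transformers / peft / bitsandbytes stack - an illustration of the QLoRA-style recipe, not textgen's actual trainer code:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model quantised to 4-bit (the "load-in-4-bit" option)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "ehartford/WizardLM-7B-Uncensored",  # the model named in this thread
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Attach trainable low-rank adapters; the 4-bit base weights stay frozen
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Only the adapter weights are updated during training, which is why this fits on a single RTX3090.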


Atinoda commented on July 22, 2024

Your system is well aligned with the project! That's basically what I run as the main LLM rig too. I think that the Transformers loader method I described previously is using the QLoRA technique (but I haven't taken the time to actually dig into the code - so I cannot say for sure). If the built-in trainer is sufficient then you do not need to run any code. If you're trying to modify textgen's training behaviour then you're in for a fair bit of development work!

Are you trying to achieve a longer context length?


Atinoda commented on July 22, 2024

Extended context is on my hit-list as well. Have you had any joy so far?

I still need to have a read through https://github.com/epfml/landmark-attention, which seemed promising.


Atinoda commented on July 22, 2024

Which guys cracked it - the ones you linked earlier? If there's off-the-shelf code to be integrated into textgen or deployable by Docker then that's exactly what I'll do!

I'm currently working on a large-ish project that integrates LLMs, and extended context is due for investigation a little bit later on. If I get to that point and there's nothing deployable then I'll have a go at coding a solution out of papers.


Atinoda commented on July 22, 2024

Thanks man, it's good to see other people's excitement and that you spotted that paper too! I'll need to sit down and read it properly soon, then go through their code and try running some implementations.

If you're using textgen as your platform, I do not think that you can run highly customised models and training regimes like that without, at the very least, a custom extension - but it will probably require a customised branch of the source code. That's not as bad as it sounds though: my production build uses a custom fork of textgen with some extra features that I need (and it runs using this repo's docker-compose.yml.build).


commented on July 22, 2024

Source: TurboDerp and kaiokendev

[perplexity plot]

And

```python
# These two lines:
self.scale = 1 / 4
t *= self.scale
```


Atinoda commented on July 22, 2024

I've flicked through that link and it does indeed look promising - I'm definitely saving that for later. Unfortunately the two lines of code described there are pretty low-level from my understanding (i.e., it's "only two bricks to move" but those two bricks are at the bottom of the wall on the ground floor of a skyscraper!). Nevertheless, if someone hasn't moved those bricks around in one of the major inference libraries by the time I'm looking at extended context, I will attempt it.


commented on July 22, 2024

Actually, it was pretty easy - I managed to port it to Alpaca LoRA 4-bit, but then I couldn't even get it to run any more, with or without the updated monkey-patch.

Here's a little summary. LLMs have been trained with tokens alongside their positional embeddings - for LLaMA, that's 2048. Now, what if we decoupled the positional embedding from the number of tokens fed in? We could extrapolate and run with 2049 and beyond, but we've tried it, and the line in yellow shows its limitations. This is where the realisation comes in that the models have always been trained at their max limit: going beyond 2048, LLaMA is like "huh, wtf is this?" So what next? Let's work within the 2048 with scaling. If we make it so every other token is 1/2 a positional step, e.g. 0.5, 1, 1.5..., then we essentially get 4096, and with 1/4, 8192, and so on. That was the 2 lines of code mentioned - just turning the scale into a fraction. Now when we train on it, we can get the model to recognise the in-between positional embeddings and use the max of 2048 as a float scale, possibly reaching a whole sentence per 1 positional embedding. Neat, huh?
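To make the two-line change concrete, here's a minimal sketch of where that scaling lands in a LLaMA-style rotary embedding - my own illustration of the idea, assuming the standard RoPE inverse-frequency setup, not the exact upstream patch:

```python
import torch

def scaled_rope_angles(dim: int, max_positions: int,
                       scale: float = 1 / 4, base: float = 10000.0) -> torch.Tensor:
    # Standard RoPE inverse frequencies, one per pair of hidden dimensions
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Position indices 0, 1, 2, ..., max_positions - 1
    t = torch.arange(max_positions).float()
    # The two-line trick: squash positions by `scale`, so scale = 1/4
    # fits 8192 tokens inside the 0..2048 range the model was trained on
    t *= scale
    # Angle for every (position, frequency) pair, used for the cos/sin caches
    return torch.outer(t, inv_freq)
```

With scale = 1 this reproduces vanilla RoPE; the fractional scale is the only difference, which is why the patch really is just two lines.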


commented on July 22, 2024

Do you get what I mean now? They've cracked it with a really simple solution that anyone could play around with, even me :)


commented on July 22, 2024

I managed to get a training run going with 4096 context on Alpaca LoRA 4-bit :)


commented on July 22, 2024

I wrote the docker-compose and edited the Dockerfile for it, if you want them to run your own.


Atinoda commented on July 22, 2024

Congratulations - that's an awesome achievement, and right at the cutting edge of open-source LLM deployments! If you're happy to share the files, then please do - I'll test and try to integrate them as a variant here.


Atinoda commented on July 22, 2024

Would you be happy to post the edited files here? Or fork the repo then link / PR your changes in a new branch? If we develop in a public forum then it adds visibility for other people, and opens the door for other collaborators. How do you think that your results compare to Mosaic's model?


commented on July 22, 2024

I know - I already created a repo and all, I just wanted to have some form of PM. I'll add you to the repo tomorrow. Well, MPT-7B isn't as good as LLaMA right off the bat on long contexts, whereas kaiokendev released this 30B 8k-token model and that performs great - much better than MPT imo, even their 30B. The one thing I'll be excited for, though, is MPT-30B StoryWriter.


Atinoda commented on July 22, 2024

Stretched contexts through modified positional encodings are included in the ExLlama loader via the max_seq_len and compress_pos_emb options. Check it out! I am going to close this issue as completed, because the motivation of the 4-bit training feature request was to replicate this functionality.
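For anyone landing here later, a rough sketch of how those two settings relate (my own illustration with a hypothetical helper, not ExLlama's code):

```python
TRAINED_CTX = 2048  # LLaMA's native training context

def compress_factor(desired_ctx: int, trained_ctx: int = TRAINED_CTX) -> float:
    # compress_pos_emb divides every position index, mapping the stretched
    # sequence back into the 0..trained_ctx window the model was trained on
    return desired_ctx / trained_ctx

print(compress_factor(4096))  # -> 2.0: set max_seq_len=4096, compress_pos_emb=2
print(compress_factor(8192))  # -> 4.0: set max_seq_len=8192, compress_pos_emb=4
```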

