jzhang38 / longmamba Goto Github PK
View Code? Open in Web Editor NEWSome preliminary explorations of Mamba's context scaling.
Some preliminary explorations of Mamba's context scaling.
8x A100 80 GB strikes me as a lot of memory usage for a 2.8 GB model at 16k context.
Is this attributable to the trainable A matrix being large? (owing to the size of h, the state)?
I love this exploration! Thanks for writing and coding this up. Right now, we're working on modifications to the causal conv1d and selective scan CUDA kernels to support defining the input state, so we are reviewing your code carefully.
What is the objective of the exponential fall-off in the cache clearing in train-infinite.py
?
if completed_steps % clear_cache_interval == 0:
for layer_idx in range(model.config.n_layer):
conv_state = torch.zeros((1, model.config.d_model*2, 3), dtype=torch.bfloat16, device=accelerator.device).detach()
ssm_state = torch.zeros((1, model.config.d_model*2, 16), dtype=torch.bfloat16, device=accelerator.device).detach()
previous_hidden_states.append((conv_state, ssm_state))
clear_cache_interval *= 2
Also, a general question: do you have a feeling for why your current implementation isn't working? Might vanishing gradients be an issue when running over longer sequences? I noticed that you're using bf16. I found this caused instability, and using amp for higher precision seemed to help.
Hello,
Just wondering if the model that you provided on huggingface was instruction tuned to perform the needle in the haystack test.
Also, (hypothetically speaking) would some of the practices to reduce GPU requirements also apply to SSSM models? For example, Unsloth reduces the GPU demand so consumer GPUs can train Llama2 -7B and Mistral - 7B models. My 8BG GPU was able to finetune Mistral for a small usecase of mine. It would absolutely amazing to see a Mamba-7B model train for half the resources that Unsloth Mistral 7B needs.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
๐ Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. ๐๐๐
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google โค๏ธ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.