Google Docs integration is broken. Something appears to have gone wrong during the last major refactor: the old document-related user commands are no longer recognized.
I think we should be spinning up at least the default model on startup, rather than waiting for the first message. This will cut down on apparent response time for the user's first message.
The giant IO_functions.py has got to go; it's getting out of control. The module is approaching 1000 lines and contains functions for both Discord and Matrix. Here's the plan:
Separate discord and matrix functions into bartleby_discord and bartleby_matrix
Move any functions used by both matrix and discord into helper_functions
Move all of the command definitions out of bartleby_discord into 'cogs', i.e. their own dedicated discord.py classes
As the conversation history grows, the original prompt gets pushed out of the LLM's input buffer. I think we should probably be keeping it to anchor the slice of messages we do send for inference.
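One way to anchor the original prompt is a simple token-budget trim that always keeps the prompt and fills the rest with the newest messages. A sketch under assumed names; the whitespace token count is a stand-in for the real tokenizer:

```python
def build_context(system_prompt, history, max_tokens,
                  count_tokens=lambda m: len(m.split())):
    # Always keep the system/original prompt, then fill the remaining budget
    # with the most recent messages, restored to oldest-first order.
    budget = max_tokens - count_tokens(system_prompt)
    kept = []
    for message in reversed(history):
        cost = count_tokens(message)
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return [system_prompt] + list(reversed(kept))
```

Old messages still fall out of the window as history grows, but the prompt itself never does.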
LLM instances need lifecycle management. Two things we are looking for here:
If a user has not interacted with an LLM instance in some amount of time, we kill it to reclaim GPU resources.
If a user tries to spin up a new LLM instance, first check whether we have room; if we don't, either fall back to CPU or kick an older model off the GPU.
Now that I'm writing this, maybe we should demote older LLM instances to CPU before/instead of garbage collecting them. That way, when/if someone starts talking to them again, we don't need to go through a cold start, but we also aren't hogging GPU.
Anyway, this deserves some attention: as it stands now, whenever a user wants to talk to a new type of model, we just keep jamming models onto the GPUs until we inevitably OOM. Not good.
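A rough sketch of the demote-then-evict policy described above. `gpu_slots`, the idle thresholds, and the device bookkeeping are all assumptions; in real code the device changes would be actual `to_cpu`/`unload` calls on the model objects:

```python
import time

class ModelManager:
    """Tracks LLM instances and where they live: demotes idle models to CPU
    before unloading them, instead of piling everything onto the GPU."""

    def __init__(self, gpu_slots=2, idle_demote_s=600, idle_kill_s=3600):
        self.gpu_slots = gpu_slots
        self.idle_demote_s = idle_demote_s
        self.idle_kill_s = idle_kill_s
        self.models = {}  # name -> {"device": "gpu"|"cpu", "last_used": ts}

    def acquire(self, name):
        """User wants to talk to `name`: load or promote it onto the GPU."""
        self.models[name] = {"device": "gpu", "last_used": time.time()}
        self._make_room(keep=name)

    def _make_room(self, keep):
        # If the GPU is over capacity, demote the least recently used
        # GPU resident (other than `keep`) to CPU instead of OOMing.
        on_gpu = [n for n, e in self.models.items() if e["device"] == "gpu"]
        while len(on_gpu) > self.gpu_slots:
            victim = min((n for n in on_gpu if n != keep),
                         key=lambda n: self.models[n]["last_used"])
            self.models[victim]["device"] = "cpu"  # to_cpu(victim) in real code
            on_gpu.remove(victim)

    def reap(self):
        # Periodic sweep: demote idle GPU models, unload very stale CPU ones.
        now = time.time()
        for name, entry in list(self.models.items()):
            idle = now - entry["last_used"]
            if entry["device"] == "gpu" and idle > self.idle_demote_s:
                entry["device"] = "cpu"   # to_cpu(name)
            elif entry["device"] == "cpu" and idle > self.idle_kill_s:
                del self.models[name]     # unload(name)
```

This keeps GPU residency bounded at `gpu_slots` while idle models linger on CPU for a warm restart.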
Bartleby is not responding to messages sent from the Element Android app. Suspect this has something to do with differences in the message event.source coming from web/desktop vs mobile clients. Best guess is that the messages are not being picked up properly by the @mention filter.
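For debugging, it may help to check the mention against both the plain `body` and `formatted_body`, since clients differ in how they render mention pills: web/desktop typically puts the full user ID in an HTML pill in `formatted_body`, while the plain `body` may only carry the display name. A sketch, assuming event source dicts shaped like the spec's m.room.message content:

```python
def mentions_bot(event_source, bot_user_id, bot_display_name):
    # Check all the places a client might have put the mention.
    content = event_source.get("content", {})
    body = content.get("body", "") or ""
    formatted = content.get("formatted_body", "") or ""
    return (bot_user_id in formatted
            or bot_user_id in body
            or bot_display_name in body)
```

Logging which branch matched (or didn't) for web vs Android events should narrow down what the Android client is actually sending.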
Not sure we will be using Docker for deployment, but we shouldn't let the work that was done to containerize bartleby get stale and be forgotten. It would be a nice option for folks who want to run it themselves. Figure out how to document it and integrate it with main.
Fix or remove Dialo. It was added to make rapid testing easy early on; I don't think we really need it anymore, and it generates subpar output. Either we are prompting it incorrectly and it should be fixed, or it's too weak a model to do what we are asking of it and should be removed.
Sometimes on model start-up, the LLM is fed old messages from chat as if they were new. Seems like we should be saving the sync state more often, or at other places in the matrix listener loop.
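If the replay is caused by losing the sync position, persisting the next-batch token after every sync pass (not just on shutdown) would keep restarts from re-reading old events. A stdlib-only sketch of that idea; the file path and function names are illustrative:

```python
import json
import os
import tempfile

TOKEN_FILE = "next_batch.json"  # illustrative path

def save_sync_token(token, path=TOKEN_FILE):
    # Write atomically so a crash mid-write can't corrupt the saved token.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump({"next_batch": token}, f)
    os.replace(tmp, path)

def load_sync_token(path=TOKEN_FILE):
    try:
        with open(path) as f:
            return json.load(f)["next_batch"]
    except (FileNotFoundError, KeyError, json.JSONDecodeError):
        return None  # first run: sync from "now" rather than replaying history
```

The listener loop would call `save_sync_token` with the token returned by each sync, and pass the loaded token back as the `since` position on start-up.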