abanteai / mentat
Mentat - The AI Coding Assistant
Home Page: https://mentat.ai
License: Apache License 2.0
Hey, I think there should be support for the petals.dev network, which uses a BitTorrent-style method of serving LLMs like LLaMA 2 and Stable Beluga 2 across a network of machines.
Add a config option that lets you define a list of prompt aliases to use during a session.
For example, we might then add single-word aliases for commonly used custom prompts.
These could then be used inside the session by typing a single word instead of the whole sentence.
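A minimal sketch of how alias expansion could work, assuming the aliases are loaded from the user's config as a simple name-to-prompt mapping (the alias names and config shape here are hypothetical):

# Hypothetical alias map, as it might be loaded from the config file.
PROMPT_ALIASES = {
    "tests": "Write unit tests for the code currently in context.",
    "docstrings": "Add docstrings to every public function in context.",
}

def expand_aliases(user_input: str) -> str:
    # If the whole input is a known alias, substitute the full prompt;
    # otherwise pass the input through unchanged.
    return PROMPT_ALIASES.get(user_input.strip(), user_input)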
Right now, git_root gets passed around a lot, including being stored in each CodeChange object. This is necessary because CodeFileManager.file_lines maps full paths to files.
To clean this up, we should refactor to have CodeFileManager.file_lines keys just be the relative file paths within the project (from the git root). CodeFileManager is the only thing that needs to know the actual path to the git project on the system, since it reads / writes to files.
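A rough sketch of what the refactor implies, assuming CodeFileManager keeps the git root internally and everything else only ever sees relative paths (the method shown is illustrative, not the real implementation):

from pathlib import Path

class CodeFileManager:
    def __init__(self, git_root: Path):
        # Only CodeFileManager knows the absolute location of the project.
        self.git_root = git_root
        # Keyed by path relative to the git root, not by absolute path.
        self.file_lines: dict[Path, list[str]] = {}

    def _read_file(self, rel_path: Path) -> list[str]:
        # Absolute paths are resolved only at the read/write boundary.
        return (self.git_root / rel_path).read_text().split("\n")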
Using ctags (#66) to get relevant names, autocomplete them as the user types. Right now we have autocomplete for prompt history, but that's just for the entire request the user types, not at the word level. I'm not sure what switching that will look like.
This will probably have some conflicts with #59, since that also changes user input. Might be good to merge that first.
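A small sketch of word-level completion, assuming a prompt_toolkit-based input session and a list of symbol names already extracted via ctags (the extraction itself is out of scope here):

from prompt_toolkit import PromptSession
from prompt_toolkit.completion import WordCompleter

def make_prompt_session(ctags_names: list[str]) -> PromptSession:
    # Complete individual words against symbol names rather than
    # whole previous requests from the prompt history.
    completer = WordCompleter(ctags_names, ignore_case=True)
    return PromptSession(completer=completer)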
It seems so smooth, thanks xD
Currently, Mentat is aware of uncommitted changes to files in context (the git diff for these files).
It'd be nice if you could make it aware of changes that have been committed, like having it look at the changes from the last 3 commits, or the difference between two branches (a PR review assistant?). This could make it more helpful at continuing changes you are working on, or at helping you review a PR.
This could also be a way to automatically give context - instead of manually choosing files, just look at all the files that have changes.
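Gathering that extra context could be as simple as shelling out to git; a sketch, where the commit count and branch names are just examples:

import subprocess

def last_commits_diff(n_commits: int = 3) -> str:
    # Diff introduced by the last n commits.
    return subprocess.check_output(
        ["git", "diff", f"HEAD~{n_commits}", "HEAD"], text=True
    )

def branch_diff(base: str, head: str) -> str:
    # Diff between two branches, e.g. for reviewing a PR.
    return subprocess.check_output(
        ["git", "diff", f"{base}...{head}"], text=True
    )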
The current implementation of Mentat does not allow users to choose which language model to use, including the more advanced GPT-4 and its 32K-context version.
The proposal is to add a new command-line argument, --model, that lets users specify the language model they want to use. This feature is key to user customization and to leveraging the full potential of the latest language models.
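A minimal sketch of such a flag, assuming the CLI is argparse-based (the default shown is only an example):

import argparse

parser = argparse.ArgumentParser(prog="mentat")
parser.add_argument(
    "--model",
    default="gpt-4",
    help="language model to use, e.g. gpt-4, gpt-4-32k, gpt-3.5-turbo",
)
args = parser.parse_args()
print(f"Using model: {args.model}")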
Currently the conversation we send to the model is structured like this:
(in this example, the model already responded previously once, the structure is the same for shorter or longer conversations, always with a "code message" from the system as the final message)
This order is likely not optimal, because models seem to pay the most attention to the end of the prompt, which here we are filling with code context. My best guess for a better ordering would be:
This would put the most important thing last (the user message). It may be better to reverse the order of (1) and (2). We should experiment to see if this improves performance at all. However, running these experiments effectively will require longer-context benchmarks than we currently have.
Once code maps are merged (#66), create a new benchmark that requires the model to use the function signature in the code map correctly to pass. Perhaps this could be in the form of it using methods on a class defined in a file not included in context.
It should be difficult enough that the pass rate is ~40%-70%, which gives us something we can realistically aim to improve relatively quickly.
When you interrupt the model's streaming output, the stream itself doesn't get closed, so I think it still generates (and charges you for) the full response. This thread shows how to do it; should be simple to integrate.
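One way to guarantee the stream gets torn down is to wrap it and close the underlying async generator whenever the consumer stops early; a sketch, assuming the streaming response behaves like a standard async generator:

async def consume_stream(response, handle_chunk):
    # Close the async generator even if we stop early (e.g. on a user
    # interrupt), so the request is shut down rather than left running.
    try:
        async for chunk in response:
            handle_chunk(chunk)
    finally:
        await response.aclose()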
The way we currently generate context is, roughly: the code for the files the user added, the diff or pr-diff they select, the code map if no_code_map is false, and then whatever fits in the remaining context.
At 4), we want to use the remaining context in the most valuable way, not just fill it in with code_map. To do this we will (a rough sketch follows the plan below):
- Break files into features such as (mentat/app.py, 'code'), (mentat/app.py, 'diff'), (mentat/app.py, 'cmap_signatures') etc. for different features of a file, as well as smaller chunks, e.g. (mentat/app.py:run, 'code'), (mentat/app.py:loop, 'code'). Chunks within files should cover the entire file without overlap.
- Weight features, e.g. with a cmap_signature_weight. We just want to prioritize higher-density information.
- Avoid redundancy: if <file>:<func> is already included and you add <file>, keep the higher-level item.
Happy for questions or suggestions on the approach! My plan moving forward is:
1. Move get_code_message to CodeFile, and update diff and codemaps to work on individual files. Eventually CodeFile will become CodeFeature and can be anything in a).
2. Update diff and codemaps.
3. Use Tree-sitter to parse files into smaller chunks.
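A toy sketch of the selection step in 4): features carry a token cost and a weight, and we greedily fill the remaining budget with the highest-density items first (names, fields, and weights are illustrative, not the final design):

from dataclasses import dataclass

@dataclass
class CodeFeature:
    path: str        # e.g. "mentat/app.py" or "mentat/app.py:run"
    kind: str        # "code", "diff", "cmap_signatures", ...
    tokens: int      # cost of including this feature in the prompt
    weight: float    # higher-density information gets a higher weight

def select_features(features: list[CodeFeature], budget: int) -> list[CodeFeature]:
    selected = []
    # Prioritize by weight per token so denser features win.
    for feat in sorted(features, key=lambda f: f.weight / max(f.tokens, 1), reverse=True):
        if feat.tokens <= budget:
            selected.append(feat)
            budget -= feat.tokens
    return selected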
Am I the only one running into this?
File "/minecraft/CODE/mentat/mentat/mentat/app.py", line 36, in run
loop(paths, cost_tracker)
File "/minecraft/CODE/mentat/mentat/mentat/app.py", line 59, in loop
explanation, code_changes = conv.get_model_response(code_file_manager, config)
File "/minecraft/CODE/mentat/mentat/mentat/conversation.py", line 35, in get_model_response
state = run_async_stream_and_parse_llm_response(
File "/minecraft/CODE/mentat/mentat/mentat/parsing.py", line 129, in run_async_stream_and_parse_llm_response
asyncio.run(
File "/opt/conda/envs/Py10/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/opt/conda/envs/Py10/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
return future.result()
File "/minecraft/CODE/mentat/mentat/mentat/parsing.py", line 152, in stream_and_parse_llm_response
response = await call_llm_api(messages, model)
File "/minecraft/CODE/mentat/mentat/mentat/llm_api.py", line 44, in call_llm_api
response = await openai.ChatCompletion.acreate(
File "/opt/conda/envs/Py10/lib/python3.10/site-packages/openai/api_resources/chat_completion.py", line 45, in acreate
return await super().acreate(*args, **kwargs)
File "/opt/conda/envs/Py10/lib/python3.10/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 217, in acreate
response, _, api_key = await requestor.arequest(
File "/opt/conda/envs/Py10/lib/python3.10/site-packages/openai/api_requestor.py", line 382, in arequest
resp, got_stream = await self._interpret_async_response(result, stream)
File "/opt/conda/envs/Py10/lib/python3.10/site-packages/openai/api_requestor.py", line 726, in _interpret_async_response
self._interpret_response_line(
File "/opt/conda/envs/Py10/lib/python3.10/site-packages/openai/api_requestor.py", line 763, in _interpret_response_line
raise self.handle_error_response(
openai.error.RateLimitError: Rate limit reached for 10KTPM-200RPM in organization org-Ab4yA9 on tokens per min. Limit: 10000 / min. Please try again in 6ms. Contact us through our help center at help.openai.com if you continue to have issues.
Would be cool if when presented with multiple changes in interactive mode, you could give feedback for updates to a specific function during that flow.
i.e.
In this example, if I wanted Mentat to make the function more concise or to use a different method for calculating primes, I could just ask it to update its suggestion.
I get the error "your OpenAI API key doesn't have access to gpt-4-0314" when I try to run mentat. I thought the GPT-4 API was generally available now? Do I need to change the code to use a specific version of the GPT-4 API?
I got an error when running mentat .
app
├── * .gitignore
├── .gitlab-ci.yml
├── Dockerfile
├── README.md
├── main.py
├── helper
│   ├── __init__.py
│   └── variables.py
├── model
│   ├── .gitignore
│   ├── clf_model.pkl
│   ├── wiki.txt.gz
│   └── word2vec,pkl
├── requirements.txt
├── sample
│   └── .gitignore
└── tests
    ├── mock_training.json
    ├── model
    │   ├── .gitignore
    │   └── wiki.txt.gz
    └── tes_training.py
Total session cost: $0.00
Traceback (most recent call last):
File "/home/user/app/venv/bin/mentat", line 8, in <module>
sys.exit(run_cli())
File "/home/user/app/venv/lib/python3.10/site-packages/mentat/app.py", line 24, in run_cli
run(paths)
File "/home/user/app/venv/lib/python3.10/site-packages/mentat/app.py", line 35, in run
loop(paths, cost_tracker)
File "/home/user/app/venv/lib/python3.10/site-packages/mentat/app.py", line 48, in loop
tokens = count_tokens(code_file_manager.get_code_message())
File "/home/user/app/venv/lib/python3.10/site-packages/mentat/code_file_manager.py", line 260, in get_code_message
self._read_all_file_lines()
File "/home/user/app/venv/lib/python3.10/site-packages/mentat/code_file_manager.py", line 257, in _read_all_file_lines
self.file_lines[abs_path] = self._read_file(abs_path)
File "/home/user/app/venv/lib/python3.10/site-packages/mentat/code_file_manager.py", line 251, in _read_file
lines = f.read().split("\n")
File "/usr/lib/python3.10/codecs.py", line 322, in decode
(result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x8b in position 1: invalid start byte
This is the error I'm seeing, please help.
Last login: Wed Jul 26 13:13:02 on ttys002
rileylovett@Rileys-MacBook-Air ~ % /Users/rileylovett/.venv_new/bin/mentat /Users/rileylovett/Discord2
Files included in context:
Discord2
├── .env
├── README.md
├── app.py
├── bot.py
├── boty.py
├── pp.py
└── requirements.txt
File token count: 6671
Type 'q' or use Ctrl-C to quit at any time.
What can I do for you?
explain
Total token count: 7518
Total session cost: $0.00
Traceback (most recent call last):
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/aiohttp/connector.py", line 980, in _wrap_create_connection
return await self._loop.create_connection(*args, **kwargs) # type: ignore[return-value] # noqa
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1112, in create_connection
transport, protocol = await self._create_connection_transport(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 1145, in _create_connection_transport
await waiter
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 575, in _on_handshake_complete
raise handshake_exc
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/sslproto.py", line 557, in _do_handshake
self._sslobj.do_handshake()
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/ssl.py", line 979, in do_handshake
self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/openai/api_requestor.py", line 592, in arequest_raw
result = await session.request(**request_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/aiohttp/client.py", line 536, in _request
conn = await self._connector.connect(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/aiohttp/connector.py", line 540, in connect
proto = await self._create_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/aiohttp/connector.py", line 901, in _create_connection
_, proto = await self._create_direct_connection(req, traces, timeout)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/aiohttp/connector.py", line 1206, in _create_direct_connection
raise last_exc
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/aiohttp/connector.py", line 1175, in _create_direct_connection
transp, proto = await self._wrap_create_connection(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/aiohttp/connector.py", line 982, in _wrap_create_connection
raise ClientConnectorCertificateError(req.connection_key, exc) from exc
aiohttp.client_exceptions.ClientConnectorCertificateError: Cannot connect to host api.openai.com:443 ssl:True [SSLCertVerificationError: (1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1002)')]
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/Users/rileylovett/.venv_new/bin/mentat", line 8, in
sys.exit(run_cli())
^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/mentat/app.py", line 24, in run_cli
run(paths)
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/mentat/app.py", line 35, in run
loop(paths, cost_tracker)
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/mentat/app.py", line 57, in loop
explanation, code_changes = conv.get_model_response(code_file_manager, config)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/mentat/conversation.py", line 52, in get_model_response
state = run_async_stream_and_parse_llm_response(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/mentat/parsing.py", line 129, in run_async_stream_and_parse_llm_response
asyncio.run(
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 190, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/mentat/parsing.py", line 152, in stream_and_parse_llm_response
response = await call_llm_api(messages, model)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/mentat/llm_api.py", line 46, in call_llm_api
response = await openai.ChatCompletion.acreate(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/openai/api_resources/chat_completion.py", line 45, in acreate
return await super().acreate(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/openai/api_resources/abstract/engine_api_resource.py", line 217, in acreate
response, _, api_key = await requestor.arequest(
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/openai/api_requestor.py", line 304, in arequest
result = await self.arequest_raw(
^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/rileylovett/.venv_new/lib/python3.11/site-packages/openai/api_requestor.py", line 609, in arequest_raw
raise error.APIConnectionError("Error communicating with OpenAI") from e
openai.error.APIConnectionError: Error communicating with OpenAI
rileylovett@Rileys-MacBook-Air ~ %
I'm imagining a workflow where you run something like mentat --debug python myfile.py arg1 from the terminal, and it:
- runs the command (python myfile.py arg1),
- captures the failure and pulls in the relevant code context (cmap?) to debug it.
Anyone else think this would be useful? Maybe it could be implemented as a new interface.
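A rough sketch of the entry point for that workflow, assuming the debug command simply runs the target, captures its output, and hands any traceback to Mentat as part of the prompt (everything here is hypothetical):

import subprocess

def run_and_capture(cmd: list[str]) -> tuple[int, str]:
    # Run the user's command (e.g. ["python", "myfile.py", "arg1"]) and
    # capture stdout/stderr so the traceback can be added to the prompt.
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr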
Add tests for running mentat from a subdirectory of the git repository (we might have them already but I don't think we do; would be good to check first!).
Add a command to add / remove files from the context inside of the current session to avoid having to restart it after each edit.
I seem to "randomly" have my ctrl+c keybind disabled during response streaming from OpenAI. KeyboardInterrupt is getting swallowed somewhere; however, I'm still trying to consistently reproduce this behavior.
These take forever to run sequentially, and since we have temperature set to a non-zero value we should run each benchmark several times and average the pass/fail totals.
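A sketch of running repeated benchmarks concurrently and averaging the results, assuming each benchmark is exposed as a callable that takes a run index and returns True/False (the helper is made up for illustration):

from concurrent.futures import ProcessPoolExecutor

def average_pass_rate(benchmark, repeats: int = 5, workers: int = 5) -> float:
    # Run the same benchmark several times in parallel; with a non-zero
    # temperature the outcome varies, so report the mean pass rate.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(benchmark, range(repeats)))
    return sum(results) / repeats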
@biobootloader @waydegg @PCSwingle
I see a lot of indecisiveness around the conversation class with a lot of things that seem out of scope for it.
May I propose a structure with better separation of concerns and a bit more forethought (possibly a future dependency)?
Note, I'm thinking ahead here to fully logged, model-agnostic, multi-agent conversations (similar to ChatDev), with model training in mind, where manual & automated training-data augmentation can occur with traceability to obey the TOS of any model used. -- I know, that's a bit much, but I think that will ultimately be needed for any such AI pair-programming system to scale.
(Mind you, I want to really emphasize fine-tuning here, because even if you favor OpenAI models, GPT-4 will likely be fine-tunable in time, as GPT-3.5 now is.)
I've been prototyping this in AbstractAI, and am currently re-working it locally for readability, simplicity, and easier serialization in SQLite and JSON, plus transport & syncing via Flask... But when I saw the number of changes to the conversation class, which I'm currently using to log my mentat conversations on my instance, I wondered if a discussion wasn't warranted.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional

# The original snippet used an `internal_field` helper; it is approximated
# here as a field excluded from __init__ and __repr__.
def internal_field(**kwargs):
    return field(init=False, repr=False, **kwargs)


@dataclass
class MessageSource:
    pass


@dataclass
class Message:
    content: str
    creation_time: datetime = field(default_factory=datetime.now)
    prev_message: Optional["Message"] = None
    conversation: Optional["Conversation"] = None
    source: Optional[MessageSource] = None
    _children: List["Message"] = internal_field(default_factory=list)


@dataclass
class MessageSequence:
    conversation: "Conversation"
    messages: List[Message] = field(default_factory=list)

    def add_message(self, message: Message):
        message.prev_message = None if len(self.messages) == 0 else self.messages[-1]
        self.messages.append(message)


@dataclass
class Conversation:
    name: str
    description: str
    creation_time: datetime = field(default_factory=datetime.now)
    message_sequence: Optional[MessageSequence] = None
    _all_messages: List[Message] = internal_field(default_factory=list)
    _root_messages: List[Message] = internal_field(default_factory=list)

    def __post_init__(self):
        # Create a default sequence if none was provided.
        if self.message_sequence is None:
            self.message_sequence = MessageSequence(self)

    def add_message(self, message: Message):
        self._all_messages.append(message)
        self.message_sequence.add_message(message)
        if message.prev_message is None:
            self._root_messages.append(message)
        else:
            message.prev_message._children.append(message)


@dataclass
class EditSource(MessageSource):
    original: Message
    new: Message
    new_message_source: MessageSource


@dataclass
class ModelSource(MessageSource):
    model_name: str
    model_parameters: dict
    message_sequence: MessageSequence


@dataclass
class UserSource(MessageSource):
    user_name: Optional[str] = None
In order to get a realistic sense of Mentat's performance, it'd be best to compare the code it generates to human-generated code. We could pick out some high-quality open source GitHub repos with good standards around writing commit messages, issues, and pull requests. Check out the code before a commit and prompt Mentat with "produce a diff that accomplishes $COMMIT_MESSAGE towards solving $ISSUE". We can then evaluate it in two ways:
I'm interested in evaluating 1 even though it's not that informative on its own because it will give us a benchmark to evaluate as we implement ideas aimed at expanding the model's effective context, Issue 3.
We can discuss other ideas for benchmarks here and break good ideas out into separate issues.
Will there be such an option?
Hi:
Great demos, but when I installed from GitHub and tried to run my local copy I always get this error. I created its own virtual environment and all. Any ideas?
Thank you so much! Very excited to try it.
[1] % mentat .
Traceback (most recent call last):
File "/home/gussand/.venv/bin/mentat", line 5, in
from mentat.app import run_cli
File "/home/gussand/github/mentat/mentat/app.py", line 117
match user_response.lower():
^
SyntaxError: invalid syntax
The inclusion of an automated file selection feature could greatly enhance workflow efficiency, eliminating the need for manual selection of relevant files. Here's a proposed implementation:
Automated Descriptions with Local Indexing: We could introduce a locally stored index.json file containing automatically generated descriptions of each folder and file, outlining their respective roles.
Index Updating Command: We could create a command, for instance, mentat index (/optional/path)
, designed to keep the index.json
file current.
Query-Based File Suggestion: With a command like mentat code $query, the system could leverage the index.json file to determine and suggest the most relevant files based on the user's query (a rough sketch follows this list). The user could then either confirm the selection or provide feedback.
Seamless Integration with Existing Workflow: Once the appropriate files are selected, the user can proceed with their usual editing tasks.
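A sketch of the query step, assuming index.json maps each path to a short generated description and that relevance is scored by something like embedding similarity (score_relevance and the index shape are hypothetical):

import json

def suggest_files(query: str, index_path: str = "index.json", top_k: int = 5) -> list[str]:
    # index.json: {"path/to/file.py": "what this file does", ...}
    with open(index_path) as f:
        index = json.load(f)
    scored = [
        (score_relevance(query, description), path)  # hypothetical scorer
        for path, description in index.items()
    ]
    return [path for _, path in sorted(scored, reverse=True)[:top_k]]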
Follow-up from this discussion: #123 (comment)
Mentat currently lacks the ability to automatically test and debug code.
This would drastically speed up the development and debugging process.
Refactoring CodeChange into a concrete and abstract format; the concrete format would manage parsing the model's output and would be able to convert into an abstract change. The abstract change would use an internal format and would handle actually changing files. This would allow us to easily create new formats and test them out, without having to change anything outside of the parsing (and maybe streaming?) logic. After talking with @biobootloader here is the current plan:
Concrete Change: Contains logic for parsing different model formats; can be converted to an abstract change. Can also be converted from an abstract change so that we can automatically generate examples for different models (and possibly tests) from a single example set.
Additions/Deletions: Make up an Abstract Change. These will be the simplest possible changes, either adding a block of code at a certain line or deleting a certain block of code. Ideally any conflicts will be resolved by the concrete change parser, but even if they aren't the Additions and Deletions will be processed in order and so conflicts shouldn't be an issue.
Abstract Change: Will be stored per file; each file will contain a list of Additions and Deletions; additionally, each Addition and Deletion will be marked with a specific Concrete Change; this will allow the user to use interactive mode to accept or reject (and eventually maybe ask the model to change?) specific changes that it outputs.
Another design choice we could have made is to have Abstract Changes represent an entire file, and simply be the text of the file after applying all changes rather than having Additions and Deletions; this would resolve some problems with conflicts and renaming, but wouldn't let us reference specific Concrete Changes from the Abstract Changes, meaning we would most likely end up simply using hunks (groups of changed lines) in interactive mode. For now, @biobootloader and I have decided to go with the previous plan to allow interactive mode to be more powerful.
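A bare-bones sketch of what the abstract side could look like under this plan, with Additions and Deletions stored per file (names, fields, and the apply strategy are illustrative only):

from dataclasses import dataclass, field

@dataclass
class Addition:
    line: int            # insert before this 1-indexed line
    code: list[str]

@dataclass
class Deletion:
    start_line: int      # 1-indexed, inclusive
    end_line: int        # exclusive

def _position(edit) -> int:
    return edit.line if isinstance(edit, Addition) else edit.start_line

@dataclass
class AbstractChange:
    file: str
    edits: list[Addition | Deletion] = field(default_factory=list)

    def apply(self, lines: list[str]) -> list[str]:
        # Apply bottom-up so earlier line numbers stay valid; conflicts are
        # expected to have been resolved by the concrete-change parser.
        for edit in sorted(self.edits, key=_position, reverse=True):
            if isinstance(edit, Deletion):
                lines = lines[: edit.start_line - 1] + lines[edit.end_line - 1 :]
            else:
                lines = lines[: edit.line - 1] + edit.code + lines[edit.line - 1 :]
        return lines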
Hi, nice work!
I just wanted to ask if you have any cost estimations for the usage of the tool. It would be really nice to see some numbers on what we can expect.
I'm getting this error and it seems I'm not alone.
The model
gpt-4-0314
does not exist or you do not have access to it. Learn more: https://help.openai.com/en/articles/7102672-how-can-i-access-gpt-4.
This is a necessary step for working with larger codebases that don't fit in context. We'll have some sort of automatic process to gather key locations to include in context, instead of just user added files. This is also necessary to run Mentat at all on single files that exceed context limits.
It's not clear how users could easily add sections of files themselves, but that's not the primary goal here anyway. The requirements are:
Update CodeFileManager to keep track of "file segments" instead of whole files that are in context. Perhaps for each file it would also store a list of tuples encoding the start and end line numbers of included sections? CodeFileManager.get_code_message will also need to build its message from those segments (a small sketch is below).
There are probably several details and decisions to be made here that I haven't thought of yet!
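A small sketch of the bookkeeping this implies, assuming segments are stored as (start, end) line tuples per file (illustrative only):

class CodeFileManager:
    def __init__(self):
        self.file_lines: dict[str, list[str]] = {}
        # For each file, the included sections as 1-indexed, end-exclusive
        # (start, end) tuples; no entry means the whole file is included.
        self.file_segments: dict[str, list[tuple[int, int]]] = {}

    def get_code_message(self) -> str:
        parts = []
        for path, lines in self.file_lines.items():
            segments = self.file_segments.get(path, [(1, len(lines) + 1)])
            for start, end in segments:
                parts.append(f"{path}:{start}-{end - 1}")
                parts.extend(lines[start - 1 : end - 1])
        return "\n".join(parts)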
Hello! I love this project and it is honestly really useful. But the cost is a lot when using GPT-4. Is there a way to manually change the model from GPT-4 to GPT-3.5 turbo? I thought I had seen an issue about this before but I do not see it anymore (I did not see it in the closed issues as well). There might be a way to do this already, and if there is I have not found how to do so yet and am sorry.
Regarding the project, it is yet another example of what great things can be created when great developers and AI work together. Thanks for creating it!
If we incorporate the whole codebase in the prompt sent to the LLM, we might overflow the LLM's maximum context length. So if the codebase is large, maybe we want to add a retrieval step where we identify the most relevant pieces of code given the user's instructions and only add those relevant ones to the LLM input.
This issue is a place to collect follow-ups from: #119
Just edit and add to this list:
color argument in StreamMessage: since we just copied the cprint API with .send, we can/should have types for colors here that the static type checker verifies are correct.
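A tiny sketch of what that could look like, using a Literal over the color names that termcolor's cprint accepts:

from typing import Literal, Optional
from termcolor import cprint

Color = Literal["grey", "red", "green", "yellow", "blue", "magenta", "cyan", "white"]

def send(content: str, color: Optional[Color] = None) -> None:
    # A type checker can now reject typos like "gren" at call sites.
    cprint(content, color)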
Traceback (most recent call last):
File "/Users/.../.pyenv/versions/3.11.0/bin/mentat", line 8, in <module>
sys.exit(run_cli())
^^^^^^^^^
File "/Users/.../.pyenv/versions/3.11.0/lib/python3.11/site-packages/mentat/app.py", line 24, in run_cli
run(paths)
File "/Users/.../.pyenv/versions/3.11.0/lib/python3.11/site-packages/mentat/app.py", line 35, in run
loop(paths, cost_tracker)
File "/Users/.../.pyenv/versions/3.11.0/lib/python3.11/site-packages/mentat/app.py", line 48, in loop
tokens = count_tokens(code_file_manager.get_code_message())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/.../.pyenv/versions/3.11.0/lib/python3.11/site-packages/mentat/llm_api.py", line 57, in count_tokens
return len(tiktoken.encoding_for_model("gpt-4").encode(message))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/.../.pyenv/versions/3.11.0/lib/python3.11/site-packages/tiktoken/core.py", line 117, in encode
raise_disallowed_special_token(match.group())
File "/Users/.../.pyenv/versions/3.11.0/lib/python3.11/site-packages/tiktoken/core.py", line 351, in raise_disallowed_special_token
raise ValueError(
ValueError: Encountered text corresponding to disallowed special token '<|endoftext|>'.
If you want this text to be encoded as a special token, pass it to `allowed_special`, e.g. `allowed_special={'<|endoftext|>', ...}`.
If you want this text to be encoded as normal text, disable the check for this token by passing `disallowed_special=(enc.special_tokens_set - {'<|endoftext|>'})`.
To disable this check for all special tokens, pass `disallowed_special=()`.
We are experimenting with edit specification formats for the models. We benchmark performance using different formats, but that just tells us how successful they are. It'd be very helpful to understand why some edits are bad, i.e. did they lead to duplicated lines? accidentally deleted lines?
By sending a unified diff of the result after applying each edit (or multiple edits) to GPT-4, we can ask it what went wrong and have it list issues for us. We could define issues for it to use as labels, such as duplicate-line, mis-deleted-line, incorrect-insert-location, etc., as well as letting it give a freeform response for errors that don't match.
This would be something we could run during / after benchmarking.
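A sketch of that labeling pass, assuming the chat API call shape used elsewhere in the codebase (the prompt wording and label set are just examples):

import openai

LABELS = ["duplicate-line", "mis-deleted-line", "incorrect-insert-location", "other"]

def label_edit_errors(unified_diff: str, model: str = "gpt-4") -> str:
    # Ask the model what went wrong with an applied edit, constrained to a
    # known set of labels plus freeform notes for anything else.
    prompt = (
        "Here is a unified diff produced by applying a model-generated edit.\n"
        f"Label any problems using these labels: {', '.join(LABELS)}, and "
        "describe anything that doesn't fit a label.\n\n" + unified_diff
    )
    response = openai.ChatCompletion.create(
        model=model, messages=[{"role": "user", "content": prompt}]
    )
    return response["choices"][0]["message"]["content"]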
We'd like mentat to select files related to a prompt based on how similar their embeddings are. I've set up a test in a Jupyter Notebook (I can share directly). The process is:
- Get embeddings with text-embedding-ada-002. We can send up to 8192 tokens per API call, either a string or a list of strings (batch).
- Use ast and astor to parse Python functions from the mentat code. There doesn't seem to be a one-stop solution to parsing different languages, but we have some leads.
I wrote 3 test prompts, with a list of files and functions I expected to see. Then I ran the algo and returned the top 10, and counted how many of the expected were included (a condensed sketch of the ranking step follows the results below).
Prompt: Give me a summary of how asyncio is used in the code.
Files: 3/5
Functions: 1/4
Prompt: Add a new pytest fixture function in conftest to call git commands and return the results. Then, replace calls to subprocess/git in the tests with that new fixture.
Files: 3/4
Functions: 3/10
Prompt: Allow the user to use gpt-3.5 instead of gpt-4, but give them a warning. Add relevant, cost, metadata, etc for the new model, as well as tests and stats.
Files: 2/3
Functions: 2/7
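A condensed sketch of the ranking step, assuming the chunks (files or functions) have already been extracted as text and using the same embedding model as above:

import numpy as np
import openai

def rank_by_similarity(prompt: str, chunks: dict[str, str], top_k: int = 10) -> list[str]:
    # Batch-embed the prompt together with every chunk in a single call.
    texts = [prompt] + list(chunks.values())
    response = openai.Embedding.create(model="text-embedding-ada-002", input=texts)
    vectors = np.array([item["embedding"] for item in response["data"]])
    prompt_vec, chunk_vecs = vectors[0], vectors[1:]
    # Cosine similarity between the prompt and each chunk.
    scores = chunk_vecs @ prompt_vec / (
        np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(prompt_vec)
    )
    names = list(chunks.keys())
    return [names[i] for i in np.argsort(scores)[::-1][:top_k]]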
To prepare for future editor extensions and other interfaces, we need to separate the core logic from the terminal interface (both input and output). The goal is to allow for multiple interfaces: terminal, VSCode extension, NeoVim extension, etc
Once code maps are merged (#66), write a test (not a benchmark; it should run on GH actions) that checks that calling CodeMap.get_message with different token_limits correctly transitions between a full map (with signatures), a map without signatures, and just a file map.
Reason for test: this is an important feature, and it seems like it would be easy for us to not immediately notice if we broke it by accident.
Does it have an ability to ignore files or folders? For example, I've got a large project with node_modules and other files that I would like to exclude from the context. Or I would like to work only on particular files.
Getting this error when getting mentat to rename a JS file.
Running the repo locally on Ubuntu 22.04.2, haven't tried it on any other OS.
I'm not a Python guy, so apologies if it's an obvious env thing lol.
Here's how I recreated it, along with the stack trace:
What can I do for you?
>>> rename the JS file to app.js, and update package.json with the new filename
Total token count: 1458
streaming... use control-c to interrupt the model at any point
I will rename the JS file to app.js and update the package.json with the new filename.
Steps:
1. Rename the download-data.js file to app.js.
2. Update the package.json file with the new filename.
2023-07-26 23:48:42,828 - ERROR - an error occurred during closing of asynchronous generator <async_generator object APIRequestor.arequest.<locals>.wrap_resp at 0x7f275e2a23c0>
asyncgen: <async_generator object APIRequestor.arequest.<locals>.wrap_resp at 0x7f275e2a23c0>
Traceback (most recent call last):
File "/home/zhoug/.local/lib/python3.10/site-packages/openai/api_requestor.py", line 324, in wrap_resp
yield r
GeneratorExit
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/zhoug/.local/lib/python3.10/site-packages/openai/api_requestor.py", line 326, in wrap_resp
await ctx.__aexit__(None, None, None)
File "/usr/lib/python3.10/contextlib.py", line 206, in __aexit__
await anext(self.gen)
RuntimeError: anext(): asynchronous generator is already running
Total session cost: $0.00
Traceback (most recent call last):
File "/home/zhoug/.local/bin/mentat", line 8, in <module>
sys.exit(run_cli())
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/app.py", line 24, in run_cli
run(paths)
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/app.py", line 35, in run
loop(paths, cost_tracker)
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/app.py", line 57, in loop
explanation, code_changes = conv.get_model_response(code_file_manager, config)
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/conversation.py", line 52, in get_model_response
state = run_async_stream_and_parse_llm_response(
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/parsing.py", line 129, in run_async_stream_and_parse_llm_response
asyncio.run(
File "/usr/lib/python3.10/asyncio/runners.py", line 44, in run
return loop.run_until_complete(main)
File "/usr/lib/python3.10/asyncio/base_events.py", line 646, in run_until_complete
return future.result()
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/parsing.py", line 158, in stream_and_parse_llm_response
await _process_response(state, response, printer, code_file_manager)
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/parsing.py", line 178, in _process_response
_process_content_line(state, content_line, printer, code_file_manager)
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/parsing.py", line 201, in _process_content_line
state.new_line(code_file_manager)
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/parsing.py", line 93, in new_line
self.create_code_change(code_file_manager)
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/parsing.py", line 71, in create_code_change
CodeChange(json_data, self.code_lines, self.git_root, code_file_manager)
File "/home/zhoug/.local/lib/python3.10/site-packages/mentat/code_change.py", line 50, in __init__
self.action = CodeChangeAction(self.json_data["action"])
File "/usr/lib/python3.10/enum.py", line 385, in __call__
return cls.__new__(cls, value)
File "/usr/lib/python3.10/enum.py", line 710, in __new__
raise ve_exc
ValueError: 'rename-file' is not a valid CodeChangeAction
This plan should provide a solid foundation for the implementation of the desired functionality.
stream_handler error: 'content'
{}
--------smol dev done!---------
Segmentation fault
Different edit formats can improve the success rate of Mentat, but not all edit formats are equally concise. For example, we could ask GPT-4 to rewrite entire files, which would be easier than specifying edits that involve insertions / deletions, but this would be slow (and more expensive!) to use.
To keep track of this we should record how many tokens the models generate while using each edit format during benchmarks. Comparison is tricky as different tasks will require different numbers of tokens. Perhaps we could choose a set of exercises that are all solved on the first attempt when using each edit format.
The test_start_project_from_scratch test is raising a FileNotFoundError on the main branch.
It's looking for calculator.py, which shouldn't be needed in this test anyway.
===================================== test session starts ======================================
platform darwin -- Python 3.11.4, pytest-7.4.0, pluggy-1.2.0
rootdir: /Users/waydegg/ghq/github.com/biobootloader/mentat
plugins: reportlog-0.4.0, mock-3.11.1, repeat-0.9.1
collected 34 items
tests/benchmark_test.py ....F [100%]
=========================================== FAILURES ===========================================
_______________________________ test_start_project_from_scratch ________________________________
mock_collect_user_input = <MagicMock id='4711084368'>
def test_start_project_from_scratch(mock_collect_user_input):
# Clear the testbed so we can test that it works with empty directories
for item in os.listdir("."):
if os.path.isfile(item):
os.remove(item)
elif os.path.isdir(item):
if item != ".git":
shutil.rmtree(item)
mock_collect_user_input.side_effect = [
"make a file that does fizzbuzz, named fizzbuzz.py, going up to 10",
"y",
KeyboardInterrupt,
]
> run(["."])
/Users/waydegg/ghq/github.com/biobootloader/mentat/tests/benchmark_test.py:126:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
/Users/waydegg/ghq/github.com/biobootloader/mentat/mentat/app.py:70: in run
loop(paths, exclude_paths, cost_tracker)
/Users/waydegg/ghq/github.com/biobootloader/mentat/mentat/app.py:92: in loop
code_file_manager = CodeFileManager(
/Users/waydegg/ghq/github.com/biobootloader/mentat/mentat/code_file_manager.py:110: in __init__
self._set_file_paths(paths, exclude_paths)
/Users/waydegg/ghq/github.com/biobootloader/mentat/mentat/code_file_manager.py:143: in _set_file_paths
file_paths_direct, file_paths_from_dirs = _abs_file_paths_from_list(
/Users/waydegg/ghq/github.com/biobootloader/mentat/mentat/code_file_manager.py:90: in _abs_file_paths_from_list
file_paths_from_dirs.update(
/Users/waydegg/ghq/github.com/biobootloader/mentat/mentat/code_file_manager.py:92: in <lambda>
lambda f: (not check_for_text) or _is_file_text_encoded(f),
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
file_path = '/private/var/folders/f0/l3msd5g16kn5vkkrdy0hbsnw0000gn/T/tmphjm0i8zm/testbed/multifile_calculator/calculator.py'
def _is_file_text_encoded(file_path):
try:
# The ultimate filetype test
> with open(file_path) as f:
E FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/f0/l3msd5g16kn5vkkrdy0hbsnw0000gn/T/tmphjm0i8zm/testbed/multifile_calculator/calculator.py'
/Users/waydegg/ghq/github.com/biobootloader/mentat/mentat/code_file_manager.py:65: FileNotFoundError
------------------------------------- Captured stdout call -------------------------------------
Total session cost: $0.00
=================================== short test summary info ====================================
FAILED tests/benchmark_test.py::test_start_project_from_scratch - FileNotFoundError: [Errno 2] No such file or directory: '/private/var/folders/f0/l3msd5g16k...
=========================== 1 failed, 4 passed in 126.61s (0:02:06) ============================
I'm assuming we're using mypy since we have mypy-extensions as a dependency in requirements.txt; however, we don't have any checks set up in GH actions. It would be pretty trivial to add a command that checks for this.
@biobootloader any specific reason for choosing mypy over Pyright? I'm not sure what the best choice would be, tbh.