nvms / wingman Goto Github PK

View Code? Open in Web Editor NEW

60.0 60.0 9.0 100.69 MB

Your pair programming wingman. Supports OpenAI, Anthropic, or any LLM on your local inference server.

Home Page: https://marketplace.visualstudio.com/items?itemName=nvms.ai-wingman

License: ISC License

CSS 3.84% HTML 0.23% JavaScript 0.37% TypeScript 51.54% Svelte 44.02%

wingman's People

Contributors

Stargazers

Watchers

Forkers

capdevc kning0 synw sycomix cyd3nt entaigner uberizual umutcanoner fredxfred

wingman's Issues

Write function comment prompt: ask before injecting the response into the file

I am using a very small local 3B model that does not follow the prompt instructions very well sometimes. For example using the "Write function comment" prompt it still outputs the body of the function despite the instruction not to included in the prompt:

So in the end the body of the function is reinjected in the file. I suggest an option to ask before modifying files, in case of wrong model output. In the example above the output is of the wrong format but is semantically correct, so I can just copy the comments into the file manually

The template feature would be useful here to include a few shots examples in the prompt to make a small local model to be able to output correctly only the comments block

Configurable timeout

Hi, thanks for this nice extension

The timeout is currently hard-coded to 1 minute. I would like to be able to make it longer. My use case: I am doing experiments using small local ggml models running via a Llama.cpp based Go server of my own (something like LocalAi). Because I have no gpu when the prompt is long it takes a lot of time to process it with only cpu, making the timeout to be reached but my local server has not finished to respond with an inference result

It would be great to have a timeout parameter in the settings to increase the timeout if necessary, or even to disable it (there is a cancel request button anyway)

Add support for Visual Studio Code for the Web

https://vscode.dev

Disable the copy icon when generating

When some code is generated in a code block in the conversation window when the mouse cursor hovers the block the copy icon is blinking. It would be better to disable it when generating

Restructure prompt categories to allow for different use context cases.

Currently only really supports software development, other cases could be writing.

FEAT: LSP Provided Context

I'd love to be able to select some code in the editor, and automatically have the source code for functions, classes, etc referenced in my selected code block included in the prompt as additional context.

The idea would be to use the LSP for whatever language I'm using to provide that via the vscode API (vscode.executeDefinitionProvider etc.)

Example:

def f(x):
    return 1 + x

def g(x):
    return 2 * f(x)

Here, if I select g and have a prompt asking the model to generate a docstring for it, the source code for f would be automatically included in the context.

Fancier versions could do things like trim source code from long definitions if a docstring is available, use vector similarity to find relevant source code in addition to parsing, or replace the source code with an llm generated summary of what a called functions does.

Also, this becomes even more useful with longer context models like Claude 100k.

The selected code is forcibly replaced.

hello @nvms
The experience of wingman v2 is excellent. Thank you for your contribution.

I found a relatively obvious problem: The selected code is automatically replaced with the generated code, and in many cases the generated code is not suitable for directly replacing the original code.

In addition, if the text button below the code is changed to an easy-to-understand icon button, the interface layout should be more beautiful.

New Goinfer provider

I started to add support for a Goinfer provider here: https://github.com/synw/wingman

main...synw:wingman:main

Please include the julia language (.jl) extension also.

It is excellent, Thank you. Please enable the right click chat option . Please include the julia language (.jl) extension also. Also please include Julia docs (1.9.3 version) in help for reference. Because, mostly all llama are trained on julia 1.6 version from 2021 data. But it changed a lot in recent months.

Add configuration for openai response type: stream or buffer

Right now we define an onProgress handler for all OpenAI requests. transitive-bullshit/chatgpt-api sees this handler and then configures fetch to receive the response in streamed chunks.

Introduce a configuration option to allow for a buffered response instead.

Relevant:

https://github.com/transitive-bullshit/chatgpt-api/blob/main/src/chatgpt-api.ts#L207

Support configurable inference params per prompt

It would be nice to support per prompt params. For now only the temperature param is supported, and can not be adjusted for different prompts because the setting is global

Example use hypothetical case: I may want a more creative setup for prompts like the Explain one, and a more deterministic for code gen. For example I could set a higher temperature and some tail free sampling for Explain, and want a lower temperature and lower top_p for the Analyze one

Some params from the Llama.cpp server api that I would like to have support for:

interface InferParams {
  n_predict: number;
  top_k: number;
  top_p: number;
  temperature: number;
  frequency_penalty: number;
  presence_penalty: number;
  repeat_penalty: number;
  tfs_z: number;
  stop: Array<string>;
}

Ref: the Llama.cpp completion endpoint doc

It would also be nice to be able to have the model params per prompt for server tools that support multiple models at runtime. My use case: I have very small 3B and 7B models and I want to use one or the other depending on the prompt: I have very specific tasks tailored for a particular model with predefined inference params (example of the concept)

Feature Request: Support Anthropic messages API

The new Claude 3 model family by Anthropic (claude-3-sonnet, claude-3-opus, and claude-3-haiku in the future) is only available in the /messages endpoint.

Usage example:

curl https://api.anthropic.com/v1/messages \
     --header "x-api-key: $ANTHROPIC_API_KEY" \
     --header "anthropic-version: 2023-06-01" \
     --header "content-type: application/json" \
     --data \
'{
    "model": "claude-3-opus-20240229",
    "max_tokens": 1024,
    "messages": [
        {"role": "user", "content": "Hello, world"}
    ]
}'

Template format for local models

A nice feature to have would be to be able to select a prompt format. The prompt templates are different for each model family. I see that you use the Alpaca format ( ### Instruction: ... {prompt}\n\n### Response:). For example it could be nice to be able to change the prompt template to for example the Llama 2 format:

<s>[INST] <<SYS>>
You are a helpful, respectful and honest assistant. Always answer as helpfully as possible

If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
<</SYS>>

{prompt} [/INST]

or Orca 2:

### System:
You are an AI assistant that follows instruction extremely well. Help as much as you can.

### User:
{prompt}

### Response:

[Edit] Example use case: I would like to use https://huggingface.co/TheBloke/LosslessMegaCoder-Llama2-7B-Mini-GGML that looks good on coding tasks, but it's template format is ChatMl:

<|im_start|>system
{system_message}<|im_end|>
<|im_start|>user
{prompt}<|im_end|>
<|im_start|>assistant

Feature Request: Integration of Azure OpenAI Credentials Support

🌐 Enable integration with Azure OpenAI credentials in Wingman.
🔑 This feature would allow users to utilize their Azure-based OpenAI API keys, expanding the range of accessible AI services.
💻 Facilitate seamless connectivity for developers who rely on Azure for their cloud services and AI operations.
⚙️ Enhance the flexibility of Wingman by accommodating a wider variety of cloud platforms and services.
🛠️ This addition would streamline workflows for teams and individuals already invested in the Azure ecosystem.
🤖 Support for Azure OpenAI credentials could potentially unlock new capabilities and features unique to Azure's AI offerings.

Custom Templates can not use call backs

callbackType: CallbackType.Replace, is not a valid option for custom templates as CallbackType.Replace is not defined within their context.

A new theme sharing

Thanks to nvms for writing such a concise and practical plug-in, I'm just an amateur programmer and can't make a big contribution to your project.

In order to try another new theme color matching style, If someone like it, just download the attachment below to replace it.
media.zip

After adding special prompts to llama2, it can work perfectly with LM Studio.
The timeout parameter seems to be unnecessary on the local server, and the openai timeout reminder will keep popping up.

I have made the following modifications and added the judgment of whether the message has been received to automatically complete a single conversation.

      const response = await this.instance!.sendMessage(message, {
        onProgress: (partialResponse: ChatMessage) => {
          if (!parentMessageId) {
              parentMessageId = partialResponse.parentMessageId;
          }
         // Determine whether the message has been received.
          if (partialResponse.detail?.choices[0].finish_reason == "stop") {
              SecondaryViewProvider?.postMessage({ type: "aborted" });
              return
          }
          this.viewProvider?.postMessage({ type: "partialResponse", value: partialResponse });
        },
        systemMessage,
        abortSignal: this._abort.signal,
        ...this.conversationState,
      });

New Koboldcpp provider

I made a Koboldcpp experimental provider: https://github.com/synw/wingman/blob/main/src/providers/koboldcpp.ts. I did it to be able to run inference from my 8Go RAM phone from Wingman queries using small models (Koboldcpp is the only thing that runs on my phone)

It would be nice to be able to switch provider depending on your prompt commands. I have different local servers running different backends: Goinfer on Linux, Koboldcpp on Android. I would like to be able to submit a query to one or another depending on the prompt. If the command could specify the provider it would be nice

By the way what's the plan for this extension: do you want to develop it further and maintain it, or not really? I am wondering because I am suggesting many changes and improvements, but they might not fit in your plan

Missing Implementation of 'file' Placeholder as Mentioned in README

README indicates the presence of a "{{file}}" placeholder, which is intended to be replaced with the contents of the active file, but this functionality is not implemented or not working in the extension.

Model Installed in VS Code but No Settings No Logo in Sidebar Menu

I have installed Wingman in Visual Studio Code Version: 1.85.0 on an M1 Pro MacBook running MacOS Sonoma but I can't find any settings for it and its logo doesn't show up in the left sidebar menu like other extensions do. Also when I open the Command Pallet (Command-Shift_P) I don't see the large control screen like the one shown in your README page https://marketplace.visualstudio.com/items?itemName=nvms.ai-wingman. Am I missing something obvious? I closed and restarted VS Code to no avail.

Great Job!

I can't review this in the VS Code Marketplace but I added a star here and just wanted to say I really like what you've done and how you went about it. I think there's some great potential here!

Context window and max_tokens management

Running the "Write unit tests" command with a local Llama 2 model I get an error message because of the default 1000 max_tokens param:

llama_predict: error: prompt is too long (1133 tokens, max 1020)

I would like to be able to set the context window size of the model (Llama 2 is 4096 tokens). This way the max_tokensparam value could be automatically calculated using the llama-tokenizer-js lib:

import llamaTokenizer from 'llama-tokenizer-js';

const promptNtokens = llamaTokenizer.encode(prompt).length;
const maxTokens = model_context_window_size_param - promptNTokens

[Edit]: it would need another tokenizer for the OpenAi, this one is for local models

Feature Request: Enhance Contextual Understanding by Allowing Code File Integration

📁 Integrate functionality for adding code files directly into Wingman for contextual reference.
🚀 This feature will enable Wingman to provide more accurate and context-aware suggestions, enhancing the pair programming experience.
💡 Allow users to link code files from their current project, serving as an additional context layer for the AI's understanding.
🤖 Improve the AI's ability to understand project-specific conventions, structure, and coding patterns.
🛠️ This will particularly benefit complex projects where understanding the broader codebase is essential for accurate code generation and review.
🌐 Enhance Wingman's versatility and utility across various programming languages and frameworks.

Discussion: modelfusion

Opening this issue to discuss potentially migrating all the provider API handling over to modelfusion.

https://github.com/lgrammel/modelfusion

@nvms We discussed a little over email, but maybe here is better.

Potential benefits as I see them

Wingman gets instant support for all the providers that modelfusion supports.
- OpenAI
- Azure OpenAI
- Anthropic
- Cohere
- Huggingface
- Ollama
- Llama.cpp
- OpenAI Compatible APIs (vLLM, oobabooga, LM Studio for example)
Wingman doesn't have to handle changes to those APIs etc.
Potential to use other tools provided by modelfusion at some point in the future.
- Token counter/API usage cost tracker
- Model functions for tool usage (web browsing, google searching, etc)
- Easier agents (ReAct etc)
- VectorDB/RAG tools (embeddding and search of codebase, or documentation websites, for example)
Pace of modelfusion development is currently pretty fast. Lots of new features and fast support for new models.

Potential issues:

Adding a not totally lightweight dependency
Pace of modelfusion development is really fast, so things are a little unstable.

I think a POC implementation using modelfusion would be a good starting point, so I've begun to work on that. I'll try and get a draft PR up with at least some of the work done to get to feature parity (at least) with where Wingman is now. I'm still working through the current Wingman codebase to figure out how everything is put together.

One thing I think we'll need to change up is some of the Preset parameter handling . Right now I think there's an assumption that all API config is a url and an api_key. The Azure OpenAI API config, for example, has no url parameter. Instead, it takes a few other parameters (resourceName, deploymentId, apiVersion) that are used to construct different end points for different models. I think a better separation of "API config" vs "completion/chat parameters" would maybe make sense.

`Error: Failed to open completion stream: 429 Too Many Requests`

Hi, first of - great extension! Exactly what I've been looking for.

I keep getting 429 Too Many Requests when I try to use a prompt. Nothing appears on the chat view either apart from the correct text selection and the generated prompt:

I've setup the OpenAI key, made sure VSCode and the extension are up to date, restarted VSCode several times, tried using different prompts and text selections. Still no bueno. No relevant logs in the output of Extension Host either.

Would much appreciate some help in debugging this!
Cheers.

How to debug the api responses for local model usage?

I am trying to use this extension with my custom minimalist inference server for local models (this).

I managed to make the server to process the requests correctly and run inference from the extension api calls. It responds in the OpenAi api format but the extension does not seem to understand or get the response, and is still waiting after the request is completed.

Is there a way to get the errors or debug the responses received in the extension, so that I can see what is the problem?

FEAT: API keys in VS Code secrets API

Basically, get provider API keys out of settings.json and into the VS Code secrets store

https://code.visualstudio.com/api/references/vscode-api#SecretStorage

Using the new version with local open source backends

Hi. I have checked the new version of Wingman, and I am quite disappointed by it: there is no first class support for local models and backends. Apparently the extension was built around the way that the big platforms work. I see that the concept of prompt template is not present, and this is a pain for local usage. When running local models it is very important to be able to use the correct template format according to the model if you want good results. For now I must duplicate every prompt and manually add the template format, for each model/template type: the prompt and the template are not separated here, as the big players apis do not need templating.

About the presets, it seems confusing to me because it mixes up some concepts: the provider with api type and connection params, the inference params (that depend on each query, not on the provider), and the system message that is a template concept. Some templates have a system message, and some do not, like Mistral for example.

What about implementing a prompt templates support? The idea would be able to use the predefined prompts with different template formats. If it can help I have made the Modprompt library that does this, supporting many generic template formats, as well as few shots prompts, that are often useful when working with small models.

About the providers I have made the Locallm library that supports multiple providers with a unique api: Llama.cpp server, Koboldcpp and Ollama: it may help to simplify implementing these providers support if you wish.

FEAT: Anthropic Claude support

Any interest in supporting Anthropic's Claude model? The 100k token context window opens a lot of possibilities.

If so I might take a stab at it and send a PR.

Also, love how configurable your extension is.

Wingman v2.0.8 , Local model is not supporting

Wingman v2.0.8 , Local model is not supporting. LM Studio, Koboldcpp URL are not working, please provide documentation or tutorial or video about enabling local models. Thank you

The spinning icon is still showed when a request is canceled

The spinning icon that is showed when the streaming response has not started does not disappear when a request is canceled. We must click clear conversation to remove it. It would be better if that icon could be hided on request cancel

Add support for llama2

https://ai.meta.com/resources/models-and-libraries/llama/

Allow users to define their own "provider" by decoupling the API style from predefined `Provider` types

@capdevc had the idea to decouple the API style from the provider type here:

https://github.com/nvms/wingman/pull/5#issuecomment-1586367918](https://github.com/nvms/wingman/pull/5#issuecomment-1586367918)

If solutions like https://github.com/go-skynet/LocalAI become more commonplace, this change could become more valuable to users that want this extension to use their own self-hosted solution that intentionally replicates the API of some bigger player in the space such that tools like wingman can more easily integrate with them.

Remove restriction on Ollama models

Remove restriction and allow other values in chatModel and codeModel inside "Wingman.Ollama".
In version v0.3.1 only specific models allowed. This is very limiting.

Allow use of llama3 models.