
Comments (10)

KutalVolkan commented on July 28, 2024

Hi @romanlutz,

I'd like to help out with implementing the many-shot jailbreaking feature. I'll read the paper, and if your suggestion about needing 256+ Q/A pairs seems to be correct, I'll start with that. Since this will be my first time contributing to an open-source project, could you please provide some guidance on the general steps for contributing?

Thanks!
Volkan

from pyrit.

romanlutz commented on July 28, 2024

Hi @KutalVolkan!

Thanks for reaching out! We'd love to collaborate on this one 🙂 I see this as two tasks really:

  • adding a mechanism to craft a prompt with arbitrarily many examples plus the malicious prompt we want the LLM to answer
  • collecting the examples

For the former, we have prompt templates under PyRIT/datasets/prompt_templates. Perhaps it's possible to write one that has a single placeholder for where the examples would go, and then have a new subclass of PromptTemplate that can insert all the examples rather than just one? Something like:

template = ManyShotTemplate.from_yaml_file(...)  # same as PromptTemplate
template.apply_parameters(prompt=prompt, examples=examples)

Where examples would be the Q&A pairs.

And then a simple orchestrator like PromptSendingOrchestrator could handle sending it to targets.
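
To make the template idea above concrete, here is a minimal, self-contained sketch. It does not use PyRIT's actual PromptTemplate class: the ManyShotTemplate and QAPair classes, the {{ examples }} / {{ prompt }} placeholder syntax, and the "User:/Assistant:" rendering are all assumptions made for illustration, and the examples are deliberately benign.

```python
import re
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class QAPair:
    """Hypothetical container for one example question and its answer."""
    question: str
    answer: str


class ManyShotTemplate:
    """Hypothetical stand-in for the proposed subclass; not PyRIT's API."""

    # Matches placeholders such as {{ examples }} or {{prompt}}.
    _PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

    def __init__(self, template: str) -> None:
        self.template = template

    def apply_parameters(self, prompt: str, examples: Sequence[QAPair]) -> str:
        # Render every Q&A pair as one dialogue turn, separated by blank lines.
        rendered = "\n\n".join(
            f"User: {pair.question}\nAssistant: {pair.answer}" for pair in examples
        )
        values = {"prompt": prompt, "examples": rendered}
        # Single-pass substitution: inserted text is never re-scanned, so a
        # prompt that happens to contain "{{ examples }}" is left untouched.
        # Unknown placeholders are kept as they are.
        return self._PLACEHOLDER.sub(
            lambda match: values.get(match.group(1), match.group(0)),
            self.template,
        )


template = ManyShotTemplate("{{ examples }}\n\nUser: {{ prompt }}\nAssistant:")
examples = [
    QAPair("What is the capital of France?", "Paris."),
    QAPair("How many days are in a week?", "Seven."),
]
result = template.apply_parameters(prompt="What is 2 + 2?", examples=examples)
print(result)
```

The single placeholder for all examples keeps the YAML template itself short, while the number of shots is decided entirely by the length of the examples list passed in.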

For the latter, we don't really want to become the place where all the bad stuff from the internet is collected 😄 Ideally, we would want to find these in another repository and just have an import function. Plus, people can always generate or write their own set, of course.
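
An import function along these lines could be quite small. The sketch below assumes the external dataset is a JSON list of records with "question" and "answer" keys; that format, the key names, and the parse_qa_pairs helper are all hypothetical, since the thread does not settle on a source repository or file layout. Fetching the file is left out so the example stays self-contained.

```python
import json
from typing import List, Tuple


def parse_qa_pairs(raw_json: str) -> List[Tuple[str, str]]:
    """Parse a JSON list of {"question": ..., "answer": ...} records.

    The record layout is an assumption for illustration only.
    """
    records = json.loads(raw_json)
    if not isinstance(records, list):
        raise ValueError("expected a JSON list of records")
    pairs = []
    for index, record in enumerate(records):
        try:
            pairs.append((str(record["question"]), str(record["answer"])))
        except (KeyError, TypeError) as error:
            # Fail loudly so a malformed external file is noticed early.
            raise ValueError(f"record {index} lacks 'question'/'answer'") from error
    return pairs


sample = '[{"question": "What color is the sky?", "answer": "Blue."}]'
pairs = parse_qa_pairs(sample)
print(pairs)
```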

Regarding contributing guidelines, there should be plenty in the doc folder.

Please let me know if you have questions or want to chat about any of these points! I may very well have skipped something...


KutalVolkan commented on July 28, 2024

Hi @romanlutz,

I'll start by reading the paper and then implement the many-shot jailbreaking feature as you described. I'll keep you updated on my progress.

Thanks,
Volkan


romanlutz commented on July 28, 2024

Fantastic!

I guess I made an assumption here that the "many shots" are all in one prompt. Another option would be to "fake" the conversation history, which is possible with some plain model endpoints but rather unlikely with full generative AI systems (which should prevent you from doing that). So I think I'd go with the single prompt, and hence the prompt template makes sense.

Happy to discuss options, though!


romanlutz commented on July 28, 2024

We usually use model endpoints in Azure, so I can't comment much on running locally. Maybe using an earlier version of torch helps? PyRIT shouldn't be too opinionated on which one you use.

The list of prompts you found makes sense. Still, we'd have to check in the responses somewhere. As mentioned before, I'd prefer to avoid making PyRIT the place where all the bad stuff on the internet is collected. Maybe it makes sense to put that Q&A dataset in a separate repo from where we can import it? Just thinking out loud here...


KutalVolkan commented on July 28, 2024

Hello @romanlutz,

I just wanted to inform you that, according to the paper, we can use this uncensored model: WizardLM-13B-Uncensored.

We can use it to provide answers to the following questions in the "behavior" column of this dataset: harmbench_behaviors_text_all.csv.
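
Pulling the questions out of such a file could look roughly like the sketch below. It assumes the CSV has a header row containing a column spelled exactly "behavior", as written above; the real file's header capitalization has not been checked here, and the read_column helper and the placeholder rows are made up for illustration.

```python
import csv
import io
from typing import List, TextIO


def read_column(handle: TextIO, column: str) -> List[str]:
    """Return the non-empty values of one named column from a CSV with a header."""
    reader = csv.DictReader(handle)
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in header {reader.fieldnames}")
    # Skip rows where the cell is missing or blank.
    return [row[column].strip() for row in reader if row[column] and row[column].strip()]


# Stand-in for the real file; with a file on disk, pass
# open(path, newline="", encoding="utf-8") instead.
sample_csv = io.StringIO(
    "behavior,category\n"
    "Placeholder question one,demo\n"
    ",demo\n"
    "Placeholder question two,demo\n"
)
behaviors = read_column(sample_csv, "behavior")
print(behaviors)
```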

I tried to run the model locally and encountered an issue:

UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
attn_output = torch.nn.functional.scaled_dot_product_attention(

This issue is likely not solvable according to this discussion: GitHub Issue.

Therefore, I thought about using the inference endpoints from Hugging Face instead.

P.S. Your approach of using a single prompt definitely makes sense, and I will go with that.
