
llamppl's Introduction

LLaMPPL: A Large Language Model Probabilistic Programming Language

LLaMPPL is a research prototype for language model probabilistic programming: specifying language generation tasks by writing probabilistic programs that combine calls to LLMs, symbolic program logic, and probabilistic conditioning. To solve these tasks, LLaMPPL uses a specialized sequential Monte Carlo inference algorithm. This technique, SMC steering, is described in our paper: https://arxiv.org/abs/2306.03081.

Note: A new version of this library is available at https://github.com/probcomp/hfppl that integrates with HuggingFace language models and supports GPU acceleration.

Installation

Clone this repository and run pip install -e . in the root directory, or python setup.py develop to install in development mode. Then run python examples/{example}.py for one of our examples (constraints.py, infilling.py, or prompt_intersection.py) to test the installation. You will be prompted for a path to the weights of a pretrained LLaMA model in GGML format. If you have access to Meta's LLaMA weights, you can follow the instructions here to convert them to the proper format.

Usage

A LLaMPPL program is a subclass of the llamppl.Model class.

from llamppl import Model, Transformer, EOS, TokenCategorical

# A LLaMPPL model subclasses the Model class
class MyModel(Model):

    # The __init__ method is used to process arguments
    # and initialize instance variables.
    def __init__(self, prompt, forbidden_letter):
        super().__init__()

        # The string we will be generating
        self.s         = ""
        # A stateful context object for the LLM, initialized with the prompt
        self.context   = self.new_context(prompt)
        # The forbidden letter
        self.forbidden = forbidden_letter
    
    # The step method is used to perform a single 'step' of generation.
    # This might be a single token, a single phrase, or any other division.
    # Here, we generate one token at a time.
    def step(self):
        # Sample a token from the LLM -- automatically extends `self.context`
        token = self.sample(Transformer(self.context), proposal=self.proposal())

        # Condition on the token not having the forbidden letter
        self.condition(self.forbidden not in str(token).lower())

        # Update the string
        self.s += str(token)

        # Check for EOS or end of sentence
        if token == EOS or str(token) in ['.', '!', '?']:
            # Finish generation
            self.finish()
    
    # Helper method to define a custom proposal
    def proposal(self):
        # Start from the LLM's next-token logits
        logits = self.context.logits().copy()
        # Find every vocabulary token containing the forbidden letter
        forbidden_token_ids = [i for (i, v) in enumerate(self.vocab())
                               if self.forbidden in str(v).lower()]
        # Mask those tokens (set their logits to -inf) so the proposal never samples them
        logits[forbidden_token_ids] = -float('inf')
        return TokenCategorical(logits)

The Model class provides a number of useful methods for specifying a LLaMPPL program:

  • self.sample(dist[, proposal]) samples from the given distribution. Providing a proposal does not modify the task description, but can improve inference. Here, for example, we use a proposal that pre-emptively avoids the forbidden letter.
  • self.condition(cond) conditions on the given Boolean expression.
  • self.new_context(prompt) creates a new context object, initialized with the given prompt.
  • self.finish() indicates that generation is complete.
  • self.observe(dist, obs) performs a form of 'soft conditioning' on the given distribution. It is equivalent to (but more efficient than) sampling a value v from dist and then immediately running condition(v == obs); see the sketch below.
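
For instance, here is a minimal sketch (not taken from the library's documentation) of two alternative step bodies intended to describe the same model, each forcing the next token to equal some target token; whether the observed value should be a token object or a token id is an assumption here.

# Sketch: two equivalent ways to force the next token to equal `target`.
# Form A: sample, then hard-condition -- particles that sample anything
# else receive zero weight and are effectively wasted.
def step_condition(self, target):
    token = self.sample(Transformer(self.context))
    self.condition(token == target)

# Form B: soft-condition directly -- the particle is simply reweighted
# by the probability of `target`, with no wasted samples.
def step_observe(self, target):
    self.observe(Transformer(self.context), target)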

To run inference, we use the smc_steer method:

from llamppl import smc_steer, LLaMAConfig
# Initialize the model with weights
LLaMAConfig.set_model_path("path/to/weights.ggml")
# Create a model instance
model = MyModel("The weather today is expected to be", "e")
# Run inference
particles = smc_steer(model, 5, 3) # number of particles N, and beam factor K
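
To inspect the results, we can print each particle's generated string; this assumes, as in the example above, that the model accumulates its output in self.s.

# Each returned particle is a weighted copy of the model;
# print the string each one generated.
for particle in particles:
    print(particle.s)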

Sample output:

sunny.
sunny and cool.
34° (81°F) in Chicago with winds at 5mph.
34° (81°F) in Chicago with winds at 2-9 mph.

llamppl's People

Contributors

alex-lew, postylem


llamppl's Issues

Shared library not found

Hey all,

I'm trying to run LLaMPPL in a Colab notebook, and after following the installation instructions and running an example I get the following error.

Traceback (most recent call last):
  File "/content/LLaMPPL/examples/constraints.py", line 1, in <module>
    import llamppl as llp
  File "/content/LLaMPPL/llamppl/__init__.py", line 1, in <module>
    from .llama_cpp import *
  File "/content/LLaMPPL/llamppl/llama_cpp.py", line 67, in <module>
    _lib = _load_shared_library(_lib_base_name)
  File "/content/LLaMPPL/llamppl/llama_cpp.py", line 58, in _load_shared_library
    raise FileNotFoundError(
FileNotFoundError: Shared library with base name 'llama' not found

I suspect this is downstream of the -e flag in the pip install but I haven't been able to fix it.

Perhaps it's a scikit-build issue?

Any thoughts?

Buffer calls to LLM

In models that sample tokens from the prior, it is unnecessary to actually run the LLM on the newly sampled token unless the particle survives the next resampling step. Maybe there is a good way to buffer or lazily execute the LLM calls so that this optimization is automated.
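
One possible shape for such a scheme (purely a sketch: LazyContext and its methods are hypothetical names, and the underlying context API is assumed to mirror the one used in the README example):

class LazyContext:
    # Hypothetical wrapper that defers LLM evaluation of queued tokens.

    def __init__(self, context):
        self.context = context   # the real LLM context object
        self.pending = []        # tokens sampled but not yet evaluated

    def extend(self, token):
        # Record the token without running the LLM forward pass yet.
        self.pending.append(token)

    def logits(self):
        # Only when logits are actually needed -- i.e. the particle
        # survived resampling and is about to sample again -- run the
        # LLM on all buffered tokens at once.
        for token in self.pending:
            self.context.extend(token)   # assumed underlying API
        self.pending.clear()
        return self.context.logits()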

Awesome work!

I will be playing with this, as I have a vested interest in seeing proper constraints implemented in LLMs that solve the issues you outline with the technique I was using (namely, that filter-assisted decoding is greedy).

Super cool to see a bunch of MIT researchers citing my work, I'm honored!

Better alignment than "per-token" for SMC

When to resample in SMC? Currently particles are aligned by number-of-tokens, so when we resample, all particles have the same number of tokens (unless some have already hit EOS). But this isn't really fair. For example:

  • When intersecting "My favorite physicist is" and "My favorite writer is", we end up comparing particles that say, e.g., " Richard Feynman. He was" and " Neil deGrasse Tyson" -- when we really want to compare " Richard Feynman" to " Neil deGrasse Tyson".
  • When intersecting "A great personal finance tip is" and "A great tip for healthy living is", we end up comparing particles that say, e.g., " to avoid eating out" and " to make sure you're". The former loses out, intuitively because its weight already factors in the semantic constraints whereas they largely 'withhold judgment' on the vaguer latter particle.

It would be great to find a clear theoretical framework for thinking about these intermediate distributions, and other heuristics (or principled strategies) for alignment.

One heuristic worth trying might be to resample at syntax-directed points -- at the end of each sentence, clause, or some other grammatical element.
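
For instance, the syntax-directed heuristic could be prototyped within the current API by making each step generate an entire sentence, so that resampling (which happens between steps) aligns particles at sentence boundaries. A minimal sketch, assuming the Model interface from the README example:

class SentenceStepModel(Model):
    def __init__(self, prompt):
        super().__init__()
        self.s = ""
        self.context = self.new_context(prompt)

    # Each step emits a full sentence, so particles are compared
    # (and resampled) at sentence boundaries rather than per token.
    def step(self):
        while True:
            token = self.sample(Transformer(self.context))
            self.s += str(token)
            if token == EOS:
                self.finish()
                return
            if str(token) in ['.', '!', '?']:
                return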

Update to more recent ggml format

I ported your llama.cpp changes onto the most recent llama.cpp.

Then I had to modify LLaMPPL/llamppl/llama_cpp.py to use the new code from llama_cpp_python; you can see the new file here.

Probably the easier change on your end is to pull the changes from main into your llama_cpp branch and edit llama_cpp.py, but these are here if needed.

Edit: hmm, I'm having this issue with my changes when I offload to the GPU; hold on, let me look into it:

GGML_ASSERT: C:\...\llama-cpp-python\vendor\llama.cpp\ggml.c:15154: tensor->src0->backend == GGML_BACKEND_CPU

Edit 2: never mind, those are just because eval_multi doesn't have GPU support yet.
