

openicl's Issues

SNLI template in tutorial

Hi, I have been reading your tutorial on how to use templates, and I noticed that for the example using the 'snli' dataset, the tutorial has the following code:

# Define a DatasetReader, loading dataset from huggingface and selecting 10 pieces of data randomly.
data = DatasetReader('snli', input_columns=['premise', 'hypothesis'], output_column='label', ds_size=10)

# SNLI Template
tp_str = '</E>Premise:</premise>\nHypothesis:</hypothesis>\nDoes the premise entail the hypothesis?\nOPTIONS:\n-yes -It is not possible to tell -no'
template = PromptTemplate(tp_str, column_token_map={'premise' : '</premise>', 'hypothesis' : '</hypothesis>'}, ice_token='</E>')

We can see that in the template tp_str you did not include any label information, only templated OPTIONS text. In my opinion, that doesn't match how in-context learning is supposed to work, since ICL requires 'question-answer' demonstrations in the context. In this template, however, each demonstration provides only the 'premise' and 'hypothesis', not the 'label', which doesn't make sense.
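For reference, a label-aware demonstration can be sketched in plain Python (a hypothetical render_demo helper, not OpenICL's API — though, if I recall the README correctly, OpenICL's PromptTemplate also accepts a per-label dict of template strings for exactly this purpose):

```python
# Hypothetical label-aware template: each SNLI label id maps to a
# demonstration string that ends with the verbalized answer.
LABEL_TEMPLATES = {
    0: "Premise: {premise}\nHypothesis: {hypothesis}\nAnswer: yes",
    1: "Premise: {premise}\nHypothesis: {hypothesis}\nAnswer: It is not possible to tell",
    2: "Premise: {premise}\nHypothesis: {hypothesis}\nAnswer: no",
}

def render_demo(example: dict) -> str:
    """Fill in the template that matches the example's gold label."""
    return LABEL_TEMPLATES[example["label"]].format(
        premise=example["premise"], hypothesis=example["hypothesis"]
    )

demo = render_demo(
    {"premise": "A man eats.", "hypothesis": "Someone is eating.", "label": 0}
)
```

This way every in-context demonstration carries its answer, while the query at the end of the prompt is left for the model to complete.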

The usage of the tokenizer in icl_topk_retriever.py

Hi, thank you for the nice codebase!

I have a question regarding the usage of the tokenizer (e.g., tokenizer_name='gpt2-xl') in the TopkRetriever.

Since sentence transformers have an "encode()" function that accepts raw text directly as input (line 113), I was wondering: is there any particular reason you first encode the text with the DatasetEncoder (e.g., line 82) and then decode it again (e.g., line 112)?
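For context, once embeddings exist the retriever's core step is just a nearest-neighbour lookup over the corpus; a minimal NumPy sketch (a simplification of what TopkRetriever does, not its actual code):

```python
import numpy as np

def topk_neighbors(query_emb: np.ndarray, corpus_embs: np.ndarray, k: int) -> np.ndarray:
    """Return indices of the k corpus rows most cosine-similar to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    c = corpus_embs / np.linalg.norm(corpus_embs, axis=1, keepdims=True)
    sims = c @ q                      # cosine similarity of each row vs. query
    return np.argsort(-sims)[:k]      # indices of the k highest similarities

corpus = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
nearest = topk_neighbors(np.array([1.0, 0.0]), corpus, k=2)
```

The question above is only about how the embeddings are produced: whether the tokenize-then-decode round trip before calling encode() is needed, given that encode() takes raw strings.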

Thank you in advance!

Enable multi-gpu inference

When running inference with larger models (tens or hundreds of billions of parameters), a single GPU will run out of memory. Is it possible to load models across multiple GPUs?
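One common route (an assumption about the setup, not a feature this repo documents) is Hugging Face Accelerate's device_map="auto", which shards a checkpoint across all visible GPUs at load time; a minimal loading sketch:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# device_map="auto" lets Accelerate place layers across all visible GPUs
# (spilling to CPU RAM if needed); requires `pip install accelerate`.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2-xl",            # stand-in checkpoint; substitute your target model
    device_map="auto",
    torch_dtype="auto",
)
tokenizer = AutoTokenizer.from_pretrained("gpt2-xl")
```

Whether OpenICL's inferencers accept such a pre-sharded model directly would need confirmation from the maintainers.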

Is it possible to use this codebase to further fine-tune (warm up) LLMs to obtain better ICL capability?

First of all, thank you for this awesome repository. It makes it ten times easier to test a model's in-context learning ability. I've been trying to fine-tune some LLMs to 'warm up' their ICL capability, but it's hard to find a mature codebase for that. I've tried MetaICL's codebase, but downloading the datasets for it is really painful. So I wonder: is it possible to use OpenICL for further fine-tuning of LLMs? If so, would you mind giving me an example script or a tutorial on how to do the fine-tuning? Thanks a lot.
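For anyone exploring this, the data side of MetaICL-style warmup is straightforward to sketch: each training sequence concatenates k demonstrations with a query, and the model is fine-tuned to produce the query's answer as the continuation. A hypothetical helper (build_warmup_example is my name, not part of either codebase):

```python
# MetaICL-style warmup data construction (sketch): demonstrations and the
# query are joined into one prompt; the query's answer is the LM target.
def build_warmup_example(demos: list, query: dict, sep: str = "\n\n") -> tuple:
    context = sep.join(f"{d['input']}\n{d['output']}" for d in demos)
    prompt = f"{context}{sep}{query['input']}\n"
    return prompt, query["output"]  # (input text, target text) for fine-tuning

prompt, target = build_warmup_example(
    demos=[{"input": "2+2?", "output": "4"}, {"input": "3+3?", "output": "6"}],
    query={"input": "4+4?", "output": "8"},
)
```

Pairs like (prompt, target) could then feed any standard causal-LM fine-tuning loop; whether OpenICL's retrievers can be reused to pick the demonstrations for such training sequences is exactly the question above.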

Request for the code behind all your experimental results

Hello, I am trying to reproduce the results of your paper to gain a deeper understanding. Could you please upload the code for all your experimental results? I can only see code for partial results here.

Template sharing?

Hi, thanks for the work! The whole framework is easy to use!

I am reproducing some of the work in the paper, and I would like to know the standard rules for building a template for PIQA and other types of NLP tasks.

package version

Hi, thanks for the work!

After installing OpenICL (either through pip or from source), I got the following error when running the command python test/test_flores101.py in this repo: https://github.com/OwenNJU/MMT-LLM

RuntimeError: Failed to import transformers.pipelines because of the following error (look up to see its traceback):
cannot import name 'PartialState' from 'accelerate' (/anaconda/envs/mmt/lib/python3.8/site-packages/accelerate/__init__.py)
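This error usually means the installed accelerate predates the release that introduced PartialState, while the installed transformers expects it. The usual fix is to upgrade accelerate (or pin both packages to mutually compatible versions); a sketch of the environment repair, assuming a pip-managed env:

```shell
# Upgrade accelerate so that `PartialState` becomes importable, then
# sanity-check the import that transformers.pipelines needs.
pip install --upgrade accelerate
python -c "from accelerate import PartialState; print('ok')"
```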
