1rgs / jsonformer
A Bulletproof Way to Generate Structured JSON from Language Models
License: MIT License
I'm trying to generate an array containing objects with the following format:
"results": {
"type": "array",
"items": {
"type": "object",
"properties": {
"T": {"type": "string"},
"E": {"type": "string"},
}
}
}
But the resulting array always contains only 1 item.
Update: perhaps it's a problem with my prompt and my model, so I will close this for now.
Great library, but some use-cases require that fields be omitted, or that values can be of one type or another.
In your readme.md, your model and tokenizer are:
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b")
I just want to use a typical T5ForConditionalGeneration model as follows:
"""This module provides a T5 model jsonformer."""
import transformers
from jsonformer import Jsonformer
pretrained_model_name = "t5-small"
model = transformers.T5ForConditionalGeneration.from_pretrained(
pretrained_model_name
)
tokenizer = transformers.T5Tokenizer.from_pretrained(pretrained_model_name)
json_schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
}
}
prompt = "Generate a person's information based on the following schema:"
jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()
print(generated_data)
But it fails with the following error when generating a string:
111 self.debug("[generate_string] response", response)
112 split = response.split('"')
--> 113 assert len(split) >= 2
114 return split[1]
AssertionError:
Also, could you add more docstrings and type hints to the project?
I get the error below when trying to run poetry install. This is because the README is listed as README.md in pyproject.toml, but the file in the repository is named readme.md (lowercase). Changing either of these to match solves the problem.
$ poetry install
Installing dependencies from lock file
No dependencies to install or update
[Errno 2] No such file or directory: '/home/mmior-admin/apps/jsonformer/README.md'
Is there a way to make it work with AWQ models?
Output:
Fetching 14 files: 100%|███████████████████████████| 14/14 [00:00<00:00, 115591.06it/s]
Replacing layers...: 100%|█████████████████████████████| 32/32 [00:02<00:00, 14.02it/s]
Fusing layers...: 100%|████████████████████████████████| 32/32 [00:04<00:00, 7.79it/s]
2023-12-10 18:50:15.290217: I tensorflow/tsl/cuda/cudart_stub.cc:28] Could not find cuda drivers on your machine, GPU will not be used.
2023-12-10 18:50:15.345000: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2023-12-10 18:50:15.951851: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
Traceback (most recent call last):
File "/home/conic/llm_experimentation/./jsonformer_test.py", line 40, in <module>
generated_data = jsonformer()
File "/home/conic/.local/lib/python3.10/site-packages/jsonformer/main.py", line 242, in __call__
generated_data = self.generate_object(
File "/home/conic/.local/lib/python3.10/site-packages/jsonformer/main.py", line 147, in generate_object
obj[key] = self.generate_value(schema, obj, key)
File "/home/conic/.local/lib/python3.10/site-packages/jsonformer/main.py", line 168, in generate_value
return self.generate_boolean()
File "/home/conic/.local/lib/python3.10/site-packages/jsonformer/main.py", line 90, in generate_boolean
output = self.model.forward(input_tensor.to(self.model.device))
File "/home/conic/.local/lib/python3.10/site-packages/awq/models/base.py", line 37, in forward
return self.model(*args, **kwargs)
File "/home/conic/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/conic/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/conic/.local/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1052, in forward
logits = self.lm_head(hidden_states)
File "/home/conic/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/conic/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/conic/.local/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: Inference tensors cannot be saved for backward. To work around you can make a clone to get a normal tensor and use it in autograd.
Code:
from jsonformer import Jsonformer
import torch
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer
model_name_or_path = "TheBloke/Xwin-LM-7B-V0.2-AWQ"
device_map = 'auto'
json_schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "number"},
"is_student": {"type": "boolean"},
"courses": {
"type": "array",
"items": {"type": "string"}
}
}
}
device = 'cuda' if torch.cuda.is_available() else 'cpu'
available_devices = torch.cuda.device_count()
device_name = torch.cuda.get_device_name(device=device)
model = AutoAWQForCausalLM.from_quantized(model_name_or_path,
fuse_layers=True,
trust_remote_code=False,
safetensors=True,
device_map=device_map,
)
model.device = device
tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, trust_remote_code=False,
)
prompt = "Generate a person's information based on the following schema:"
jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()
print(generated_data)
As far as I understand it, this is currently not the case.
Having Jsonformer derive from PreTrainedModel would enable immediate use with e.g. pipeline and other ecosystem building blocks that require a PreTrainedModel.
It might even be possible to automatically derive directly from the (more specialized) base class loaded using from_pretrained (and automatically load the tokenizer from the same path unless specified otherwise). That way, almost no functions would need to be changed. Other ideas:
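from inspect import getmembers, ismethod
# somewhere in Jsonformer.__init__, probably
# getmembers already returns (name, value) pairs; bound methods on an
# instance are matched by ismethod rather than isfunction
for name, func in getmembers(self.model, ismethod):
    setattr(self, name, func)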
Edit: Thinking about it some more (and understanding the PreTrainedModel interface better), it's probably not that easy.
Is there any way to return an array of objects (e.g. return multiple car objects):
{"type": "object", "properties": {"car": {"type": "object", "properties": {"make": {"type": "string"}, "model": {"type": "string"}, "year": {"type": "number"}, "colors": {"type": "array", "items": {"type": "string"}}, "features": {"type": "object", "properties": {"audio": {"type": "object", "properties": {"brand": {"type": "string"}, "speakers": {"type": "number"}, "hasBluetooth": {"type": "boolean"}}}, "safety": {"type": "object", "properties": {"airbags": {"type": "number"}, "parkingSensors": {"type": "boolean"}, "laneAssist": {"type": "boolean"}}}, "performance": {"type": "object", "properties": {"engine": {"type": "string"}, "horsepower": {"type": "number"}, "topSpeed": {"type": "number"}}}}}}}, "owner": {"type": "object", "properties": {"firstName": {"type": "string"}, "lastName": {"type": "string"}, "age": {"type": "number"}}}}}
Here is an example I tried that gave the below error:
json_schema = {
"type": "array",
"properties": {
"type": "object",
"properties": {
"car": {
"type": "object",
"properties": {
"make": {"type": "string"},
"model": {"type": "string"},
"horsepower": {"type": "number"}
}
}
}
}
}
error:
TypeError: string indices must be integers
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0fa4792b-fa73-408b-b0a8-ecf9f5e56538/lib/python3.10/site-packages/jsonformer/main.py:242, in Jsonformer.__call__(self)
240 def __call__(self) -> Dict[str, Any]:
241 self.value = {}
--> 242 generated_data = self.generate_object(
243 self.json_schema["properties"], self.value
244 )
245 return generated_data
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0fa4792b-fa73-408b-b0a8-ecf9f5e56538/lib/python3.10/site-packages/jsonformer/main.py:147, in Jsonformer.generate_object(self, properties, obj)
145 for key, schema in properties.items():
146 self.debug("[generate_object] generating value for", key)
--> 147 obj[key] = self.generate_value(schema, obj, key)
148 return obj
File /local_disk0/.ephemeral_nfs/envs/pythonEnv-0fa4792b-fa73-408b-b0a8-ecf9f5e56538/lib/python3.10/site-packages/jsonformer/main.py:156, in Jsonformer.generate_value(self, schema, obj, key)
150 def generate_value(
151 self,
152 schema: Dict[str, Any],
153 obj: Union[Dict[str, Any], List[Any]],
154 key: Union[str, None] = None,
155 ) -> Any:
--> 156 schema_type = schema["type"]
157 if schema_type == "number":
158 if key:
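Judging from the traceback, `__call__` reads `json_schema["properties"]`, so the top-level schema has to be an object, and the other traces on this page show arrays being read through `schema["items"]`. Below is a sketch of a schema shaped that way, with the list of cars wrapped in a top-level `cars` key (a name chosen here purely for illustration). `check_shape` is a hypothetical helper, not part of jsonformer, and whether the model then emits more than one element is a separate question.

```python
# Sketch: array of car objects nested under a top-level object.
# "cars" is an illustrative key name, not something jsonformer requires.
car_schema = {
    "type": "object",
    "properties": {
        "make": {"type": "string"},
        "model": {"type": "string"},
        "horsepower": {"type": "number"},
    },
}

json_schema = {
    "type": "object",
    "properties": {
        "cars": {
            "type": "array",
            "items": car_schema,  # arrays describe their elements under "items"
        }
    },
}


def check_shape(schema: dict) -> None:
    """Raise ValueError if an object lacks "properties" or an array lacks "items"."""
    kind = schema.get("type")
    if kind == "object":
        if "properties" not in schema:
            raise ValueError('object schema needs "properties"')
        for child in schema["properties"].values():
            check_shape(child)
    elif kind == "array":
        if "items" not in schema:
            raise ValueError('array schema needs "items"')
        check_shape(schema["items"])


check_shape(json_schema)  # passes silently for the schema above
```

Running `check_shape` on the schema from the report (an array with `properties` instead of `items`) raises `ValueError`, which points at the same structural problem before any model call is made.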
Coming here from this issue over on transformers. Looks like I'm no longer getting the error I was getting earlier but there are still performance issues.
Taking a look at the code, the key issue is that it calls generate once for every value and does not cache the key and value tensors, so a lot of work is repeated. On my machine the example code takes about 4 seconds to generate with gpt2 as the model, whereas generating a comparable amount of free-form text normally takes about 0.25 seconds.
I see two solutions:
prefix_allowed_tokens_fn argument in generate gets called. My approach does have the problem mentioned in the other issue, which is that it's very slow when implemented in Python so it would need to be done in C/C++/Rust. It would also basically be a complete rewrite of this library.
When I attempt to run the code in the README, it fails with the following stack trace:
Traceback (most recent call last):
File "/home/oogali/lab/llmjson/./poc.py", line 32, in <module>
sys.exit(main())
File "/home/oogali/lab/llmjson/./poc.py", line 27, in main
generated_data = jsonformer()
File "/home/oogali/lab/llmjson/venv/lib/python3.10/site-packages/jsonformer/main.py", line 188, in __call__
generated_data = self.generate_object(
File "/home/oogali/lab/llmjson/venv/lib/python3.10/site-packages/jsonformer/main.py", line 114, in generate_object
obj[key] = self.generate_value(schema, obj, key)
File "/home/oogali/lab/llmjson/venv/lib/python3.10/site-packages/jsonformer/main.py", line 136, in generate_value
return self.generate_array(schema["items"], new_array)
File "/home/oogali/lab/llmjson/venv/lib/python3.10/site-packages/jsonformer/main.py", line 146, in generate_array
element = self.generate_value(item_schema, obj)
File "/home/oogali/lab/llmjson/venv/lib/python3.10/site-packages/jsonformer/main.py", line 131, in generate_value
obj[key if key else -1] = self.generation_marker
IndexError: list assignment index out of range
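The last frame assigns to `obj[-1]` on what appears to be a newly created, still-empty list. In Python, index assignment never grows a list, so this raises `IndexError`. A minimal reproduction that does not involve jsonformer, with `append` shown as one way to reserve the slot first (the `marker` name is illustrative):

```python
marker = object()  # stand-in for a placeholder value

new_array = []
try:
    new_array[-1] = marker  # no element exists yet, so there is nothing to overwrite
except IndexError as exc:
    error_message = str(exc)

new_array.append(marker)  # grow the list first...
new_array[-1] = "value"   # ...then overwriting the last slot works
```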
Hi, is it possible to run jsonformer using a GPU on Google Colab?
When I ran it today on Google Colab with an A100 GPU runtime, I only saw it using CPU/RAM resources. Is it possible to run dolly + jsonformer on a GPU, and would that decrease the generation time?
Jsonformer doesn't work with GPTQ models. For inference speed, it would be nice to have support for such models.
I ran this on a laptop with 32 GB of RAM, and the memory filled up very quickly.
I'm curious about the use case where you need to extract strings with predefined requirements that are easy to validate, such as phone numbers or car license numbers in specific regions that follow a predefined pattern. This is similar to the HTML input pattern attribute: https://www.w3schools.com/tags/att_input_pattern.asp
Example:
car = {
"type": "object",
"properties": {
"car": {
"type": "object",
"properties": {
"make": {"type": "string"},
"model": {"type": "string"},
"year": {"type": "number", "range": [1900, 2028]},  # proposed key: "range"
"license_number": {"type": "string", "pattern": "[0-9]{6,7}"}  # proposed key: "pattern"
}
}
}
}
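The `range` and `pattern` keys in the example are a proposal; nothing on this page shows jsonformer supporting them. In the meantime, one option is to check generated values after the fact. A rough sketch, where `validate_value` is a hypothetical helper and the two keys are read exactly as written in the proposed schema:

```python
import re


def validate_value(value, field_schema: dict) -> bool:
    """Check one generated value against the proposed "range"/"pattern" keys."""
    if "range" in field_schema:
        low, high = field_schema["range"]
        if not (low <= value <= high):
            return False
    if "pattern" in field_schema:
        # fullmatch: the whole string must match, like the HTML pattern attribute
        if re.fullmatch(field_schema["pattern"], value) is None:
            return False
    return True


year_schema = {"type": "number", "range": [1900, 2028]}
license_schema = {"type": "string", "pattern": "[0-9]{6,7}"}

ok_year = validate_value(2015, year_schema)
bad_year = validate_value(1850, year_schema)
ok_license = validate_value("1234567", license_schema)
bad_license = validate_value("12-345", license_schema)
```

This only detects a bad value; constraining the model so that it cannot produce one in the first place would need support inside the generation loop.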
First off, great package! Thanks for the contribution.
I've been using this package to basically transform unstructured text to JSON. It works very well with one exception. If a value does not exist in the text, one is made up.
Instead, it would be better for a number, for example, to come back as null, or for a string to come back as an empty string.
I'd issue a PR but I am not sure how to accomplish this.
Thanks again!
Here's the example at the time of writing:
from jsonformer import Jsonformer
from transformers import AutoModelForCausalLM, AutoTokenizer
model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-12b")
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-12b")
json_schema = {
"type": "object",
"properties": {
"name": {"type": "string"},
"age": {"type": "number"},
"is_student": {"type": "boolean"},
"courses": {
"type": "array",
"items": {"type": "string"}
}
}
}
prompt = "Generate a person's information based on the following schema:"
jsonformer = Jsonformer(model, tokenizer, json_schema, prompt)
generated_data = jsonformer()
print(generated_data)
I find it confusing, because you're not giving the model any data to extract into JSON. Should the prompt be more like: "Generate a person's information based on the following schema. The person is John Doe, aged 23. John is a student at Georgia Tech, and takes the following courses: Chemistry, Mathematics, and a minor in Japanese."
Hey!
I'm currently working with a group to try to automate metadata for scholarly articles, and I wanted to use JSONformer to return the metadata. However, I am hoping to translate JSON into JSON-LD. I couldn't find here if you had support for JSON-LD or only support for JSON.
Could someone let me know if I can use this as a method to translate semi-structured data to JSON-LD specifically?
When output is enforced through Jsonformer, I get out-of-memory errors as the number of tokens increases.
RuntimeError: CUDA out of memory. Tried to allocate 16.00 MiB (GPU 0; 10.76 GiB total capacity; 9.57 GiB already allocated; 16.25 MiB free; 9.70 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CON
Hey there, I think I came across this repo because GitHub Explore suggested it to me. Thanks for putting this out. I do a lot of prompt engineering to get LLMs to output clean JSON files and it is maddening how often the data returned will be malformatted in some trivially small way, so it's great to see something like this.
Had two requests for you, time permitting.
Thanks again, really appreciate your work.
I see in the code and the readme that the stopping criterion for strings is the second quotation mark. However, in most JSON dialects you can escape a quote in a string with \". This appears sometimes, so it's not impossible for the LLM to produce this, too.
e.g. the prompt might be "extract the second sentence from the following paragraph" and the paragraph is:
This is the first sentence. This is the "second" sentence. This is the third sentence.
The LLM would ideally output "This is the \"second\" sentence." but the parser wouldn't handle this correctly.
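To make the failure concrete, here is a small sketch that is independent of jsonformer. It compares splitting on every `"` with scanning for the first quote that is not escaped by a backslash. `first_unescaped_quote` is a hypothetical helper, and `generated` stands for the text a model might produce after the opening quote.

```python
import json


def first_unescaped_quote(text: str) -> int:
    """Index of the first '"' preceded by an even number of backslashes, or -1."""
    backslashes = 0
    for i, ch in enumerate(text):
        if ch == "\\":
            backslashes += 1
            continue
        if ch == '"' and backslashes % 2 == 0:
            return i
        backslashes = 0  # any other character (or an escaped quote) resets the run
    return -1


# Text produced after the opening quote, including an escaped quote pair.
generated = r'This is the \"second\" sentence.", "next'

naive = generated.split('"')[0]         # stops at the first escaped quote
end = first_unescaped_quote(generated)  # stops at the real closing quote
value = json.loads('"' + generated[:end] + '"')
```

Here `naive` is cut off right after `This is the \`, while `value` decodes to the full sentence with plain quotes around `second`.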
Hello, any plans for supporting training / fine tuning on specific tokens only ?
Thanks for the library. I would like to test large models such as llama2-70b from huggingface_hub. I wonder if I can use jsonformer via InferenceClient from the hub, because I don't want to download the model.
add compatibility with gemini
I encounter this error when I try to run the latest version of JSONFormer. It looks like there is no support for generation configs yet.
You have modified the pretrained model configuration to control generation. This is a deprecated strategy to control generation and will be removed soon, in a future version. Please use a generation configuration file (see https://huggingface.co/docs/transformers/main_classes/text_generation )
Can you please add support for this?
This work is very interesting and potentially useful in many domains.
Do you have any recommendations on how we might fine-tune models in specific domains to better support structured extraction? Specifically, we would be interested in extracting structured data from medical reports, where things such as a fixed set of conditions, location site, and other semantic labels are specific to the input text, but patterns could be learned through fine-tuning. While we could fine-tune the model on pairs of report input and JSON response, it is not clear that this would be the best approach.
Thanks for this work and any future response.
Currently, the Jsonformer class is using a local transformer model and tokenizer to generate data in JSON schema format. However, it would be useful to have a version of the class that uses OpenAI's Language Models to generate data. Therefore, I would like to request the implementation of a new version of the class that takes OpenAI API keys and a model name as parameters, in addition to the existing parameters of the Jsonformer class.
The OpenAI API keys should be used to authenticate the requests made to OpenAI's API. The model name should be used to specify which LLM to use for data generation. The new version of the class should work similar to the current implementation but use OpenAI's API to generate data
This feature would allow for the generation of data using state-of-the-art LLMs from OpenAI.
Thank you for considering this feature request.
Hi, I am really interested in your jsonformer project. I have read the code again and again, and I understand how it masks tokens, but I still don't know how it ensures that the result is strictly JSON. Finally, can we use other GPT models such as turbo to do what jsonformer does?
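On the first question: judging from the tracebacks quoted on this page, `generate_object` walks the schema's `properties` and asks the model only for each value, so the braces, keys, and commas come from ordinary Python code and not from the model. A simplified sketch of that idea follows. It is not the library's actual code, and `fake_model_value` is a hypothetical stand-in for a constrained model call.

```python
import json


def fake_model_value(schema_type: str):
    """Hypothetical stand-in for a model call that is constrained to one type."""
    return {"string": "Ada", "number": 36, "boolean": True}[schema_type]


def build(schema: dict):
    """Walk the schema; only leaf values come from the 'model'."""
    kind = schema["type"]
    if kind == "object":
        return {key: build(sub) for key, sub in schema["properties"].items()}
    if kind == "array":
        return [build(schema["items"])]  # one element, to keep the sketch short
    return fake_model_value(kind)


schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "number"},
        "tags": {"type": "array", "items": {"type": "string"}},
    },
}

result = build(schema)
text = json.dumps(result)  # a dict of plain values serialises to well-formed JSON
```

Because the structure is assembled by code, a malformed brace or a missing comma cannot occur; what remains uncertain is only the quality of the individual values.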
Is there a way to add description to the keys so that the values are mapped correctly?
OpenAI JSON function calling supports a couple of additional keys that jsonformer doesn't seem to have the structure to parse: description, enums and required.
Does anyone have interest in introducing these?
If I see it correctly, the iterations variable is never incremented in this function. Or did I miss something?
I want the LLM to generate a JSON like this
{
"name": "John Doe",
"info": {
"age": "41",
"tennis_club": "Detroit Club",
"wife": "Jan Doe"
}
}
The info is not always the same. Some people might not have a wife so the property would be omitted.
In theory, there could be thousands of combinations in the "info".
If I specify a schema like this:
{
"type": "object",
"properties": {
"name": { "type": "string" },
"info": {
"type": "object"
}
}
}
It would error with:
File "venv/lib/python3.11/site-packages/jsonformer/main.py", line 185, in generate_value
return self.generate_object(schema["properties"], new_obj)
~~~~~~^^^^^^^^^^^^^^
KeyError: 'properties'
So "properties" is required.
The only way I currently see to solve this is to use an array with objects of format {"key": "...", "value": "..."}
So the schema would look like:
{
"type": "object",
"properties": {
"name": { "type": "string" },
"info": {
"type": "array",
"items": {
"type": "object",
"properties": {
"key": { "type": "string" },
"value": { "type": "string" }
}
}
}
}
}
But I'd rather avoid this.
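If the key/value workaround is used anyway, the array can be folded back into a plain object after generation, so the rest of the code still sees the desired shape. A small sketch, where `fold_pairs` is a hypothetical helper and `generated` is made-up output in the workaround's format:

```python
def fold_pairs(generated: dict) -> dict:
    """Turn the "info" list of {"key", "value"} objects into a plain dict."""
    folded = dict(generated)
    folded["info"] = {pair["key"]: pair["value"] for pair in generated["info"]}
    return folded


generated = {
    "name": "John Doe",
    "info": [
        {"key": "age", "value": "41"},
        {"key": "tennis_club", "value": "Detroit Club"},
    ],
}

person = fold_pairs(generated)
```

If the model repeats a key, the later pair overwrites the earlier one, which may or may not be the behaviour you want.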
Hey guys!
First, thank you so much for this project and for open-sourcing it. Absolutely great idea!
I keep running into issues when my model is already placed on the GPU, and I see no way to specify a device.
Issue I get is:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument index in method wrapper_CUDA__index_select)
It would be great if you could pass the input_ids to cuda() before calling the model.generate.
Or have a way to specify your device target.
Thanks again!!
As per my current testing, it seems like jsonformer is only compatible with text based prompts. It is not compatible with prompts for models like LLaVa
Hello: When running the tiiuae/falcon-7b model, I have no issues using the package as intended. But some models, such as tiiuae/falcon-rw-1b, raise an error in OutputNumbersTokens().__call__() like the one below:
The expanded size of the tensor (50304) must match the existing size (50257) at non-singleton dimension 1. Target sizes: [1, 50304]. Tensor sizes: [50257]
I've been trying to debug this on my own but have not figured out why self.allowed_mask and scores sometimes have mismatching shapes (depending on the model), which causes the above error when trying to run:
self.allowed_mask.expand_as(scores)
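The sizes in the error suggest one plausible cause, offered here as an assumption and not a confirmed diagnosis: the mask is built from the tokenizer's vocabulary (50257 entries) while the model's output layer is wider (50304 columns), which can happen when an embedding matrix is padded. A sketch of the idea using plain lists instead of tensors, where `pad_mask` is a hypothetical helper and the padded positions are treated as disallowed:

```python
def pad_mask(allowed_mask: list, logits_width: int) -> list:
    """Extend a per-token mask with False so it matches the logits width."""
    if len(allowed_mask) > logits_width:
        raise ValueError("mask is wider than the logits")
    return allowed_mask + [False] * (logits_width - len(allowed_mask))


tokenizer_vocab_size = 50257  # size reported for the mask in the error
logits_width = 50304          # size reported for the scores in the error

allowed_mask = [False] * tokenizer_vocab_size
allowed_mask[15] = True       # pretend token 15 is a digit token

padded = pad_mask(allowed_mask, logits_width)
```

With tensors, the equivalent would be to size the mask from the last dimension of `scores` instead of from the tokenizer, but that is a guess about a fix and has not been checked against the library.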
With llama-2-7b, I can normally pass in a 2k-token context and my GPU handles it, but when the model is wrapped with jsonformer, I get an out-of-memory error with just 500 tokens in the context.
We might not always want greedy sampling, do we?
Could you make do_sample an init parameter for Jsonformer, or is there anything technical that prohibits this change?
Hi,
I have an issue with the generated JSON response. It seems that it doesn't respond well to array-related prompt instructions.
from transformers import AutoModelForCausalLM, AutoTokenizer
print("Loading model and tokenizer...")
model_name = "databricks/dolly-v2-3b"
model = AutoModelForCausalLM.from_pretrained(model_name, use_cache=True, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True, use_cache=True)
print("Loaded model and tokenizer")
Prompt:
from jsonformer.format import highlight_values
from jsonformer.main import Jsonformer
stock2 = {
"type": "object",
"properties": {
"stocks": {
"type": "array",
"items": {"type": "string"}
}
}
}
builder = Jsonformer(
model=model,
tokenizer=tokenizer,
json_schema=stock2,
debug=True,
prompt="generate 10 stocks code",
)
print("Generating...")
output = builder()
highlight_values(output)
Response:
Generating...
[generate_object] generating value for stocks
[generate_string] generate 10 stocks code
Output result in the following JSON schema format:
{"type": "object", "properties": {"stocks": {"type": "array", "items": {"type": "string"}}}}
Result: {"stocks": ["
[generate_string] |ABC",|
[generate_string] generate 10 stocks code
Output result in the following JSON schema format:
{"type": "object", "properties": {"stocks": {"type": "array", "items": {"type": "string"}}}}
Result: {"stocks": ["ABC", "
[generate_string] |XYZ",|
[generate_string] generate 10 stocks code
Output result in the following JSON schema format:
{"type": "object", "properties": {"stocks": {"type": "array", "items": {"type": "string"}}}}
Result: {"stocks": ["ABC", "XYZ", "
[generate_string] |PQR",|
{
stocks: [
"ABC",
"XYZ",
"PQR"
]
}
The response contains only 3 items, not 10 as requested in the prompt. I am not sure whether this is an issue with the model.
Also, you may notice that the memory used for the 3b model is 23 GB of RAM. Is this normal?
Any help would be appreciated. Thank you.
This would be awesome to also have in the JS version.
My not yet complete attempt to translate JSONFormer into TypeScript:
https://github.com/vincenzodomina/jsonformerjs
I used numjs as a lightweight alternative to tensorflow to replace torch, but Huggingface transformers is not available in JavaScript, any help with how to work with or substitute the Huggingface interface and to at least get it running with the OpenAI API would be appreciated.
from jsonformer import Jsonformer
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
text_generation_model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2-0.5B-Instruct")
json_schema = {
"type": "object",
"properties": {
"status": {
"type": "string",
"enum": ["success", "failure"]
},
"mcq_items": {
"type": "array",
"items": {
"type": "object",
"properties": {
"question": {
"type": "string"
},
"options": {
"type": "array",
"items": {
"type": "object",
"properties": {
"option": {
"type": "string"
},
"reasoning": {
"type": "string"
},
"label": {
"type": "number"
}
},
"required": ["option", "reasoning", "label"]
}
},
"answer": {
"type": "number"
}
},
"required": ["question", "options", "answer"]
}
}
},
"required": ["status", "mcq_items"]
}
context = "A story revolving around a man forced by circumstances to participate in a mysterious game of survival. Zheng Kai Si has nothing in his name and in order to pay off his debts, he goes aboard a cruise ship as one of the players in a deadly game. It's a game of lies and deception to outsmart the enemy and emerge victoriously. For the sake of his mother and Liu Qing, Kai Si struggles to survive."
prompt = f"Generate multiple choice question(s) using provided context/topic \
and your general knowledge, including 1 correct option and 3 \
wrong options. Here is the context: {context}"
jsonformer = Jsonformer(text_generation_model, tokenizer, json_schema, prompt)
generated_data = jsonformer()
print(generated_data)
This fails when I use "number" as a data type.
Error:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Traceback (most recent call last):
File "C:\Users\Lenovo\Desktop\project_minerva\jsonformer_test.py", line 57, in <module>
generated_data = jsonformer()
^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 242, in __call__
generated_data = self.generate_object(
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 147, in generate_object
obj[key] = self.generate_value(schema, obj, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 178, in generate_value
return self.generate_array(schema["items"], new_array)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 192, in generate_array
element = self.generate_value(item_schema, obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 185, in generate_value
return self.generate_object(schema["properties"], new_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 147, in generate_object
obj[key] = self.generate_value(schema, obj, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 178, in generate_value
return self.generate_array(schema["items"], new_array)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 192, in generate_array
element = self.generate_value(item_schema, obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 185, in generate_value
return self.generate_object(schema["properties"], new_obj)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 147, in generate_object
obj[key] = self.generate_value(schema, obj, key)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 162, in generate_value
return self.generate_number()
^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\main.py", line 61, in generate_number
response = self.model.generate(
^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\torch\utils_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\transformers\generation\utils.py", line 1758, in generate
result = self._sample(
^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\transformers\generation\utils.py", line 2410, in _sample
next_token_scores = logits_processor(input_ids, next_token_logits)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\transformers\generation\logits_process.py", line 98, in __call__
scores = processor(input_ids, scores)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\Lenovo\Desktop\project_minerva.venv\Lib\site-packages\jsonformer\logits_processors.py", line 81, in __call__
mask = self.allowed_mask.expand_as(scores)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The expanded size of the tensor (151936) must match the existing size (151646) at non-singleton dimension 1. Target sizes: [1, 151936]. Tensor sizes: [151646]
hey any chance the team can work to provide ctransformers / GGML support? also key description options would be clutch, thanks
Many standard Ubuntu images ship with Python 3.8.
Can we reduce the required Python version from 3.10 to 3.8?