mangiucugna / json_repair

A python module to repair invalid JSON, commonly used to parse the output of LLMs

Home Page: https://pypi.org/project/json-repair/

License: MIT License

Python 100.00%

deep-learning gpt-4 json llama3 llm machine-learning mistral openai-api parser repair

json_repair's Issues

[Bug]: poor repair from json to just integer

Version of the library

0.21.0

Describe the bug

See reproduction

How to reproduce

response0 = """  Here is an example employee profile in JSON format, with keys that are less than 64 characters and made of only alphanumerics, underscores, or hyphens:
```json
{
  "employee_id": 1234,
  "name": "John Doe",
  "email": "[email protected]",
  "job_title": "Software Engineer",
  "department": "Engineering",
  "hire_date": "2020-01-01",
  "salary": 100000,
  "manager_id": 5678
}

In Markdown, you can display this JSON code block like this:

{
"employee_id": 1234,
"name": "John Doe",
"email": "[email protected]",
"job_title": "Software Engineer",
"department": "Engineering",
"hire_date": "2020-01-01",
"salary": 100000,
"manager_id": 5678
}

This will display the JSON code block with proper formatting and highlighting.
"""
from json_repair import repair_json
response = repair_json(response0)
print(response)

gives just 64

I don't see why it picks that integer out of the introductory sentence and treats it as the valid JSON.

How can I control which part it returns? For example, if I am always looking for a {} structure, I would like to be able to tell repair_json that.

Expected behavior

Some control over whether it should look for a number, string, boolean, array, or object.
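Until such an option exists, one possible workaround is to cut the input down to the first {...} span before handing it to the repair step. The sketch below is only an illustration using the standard library: `extract_first_object` is a hypothetical helper, not part of json_repair, and it assumes the object you want is the first balanced pair of braces in the text.

```python
import json
from typing import Optional


def extract_first_object(text: str) -> Optional[str]:
    """Return the first balanced {...} span in text, or None if none is found.

    Hypothetical helper (not part of json_repair). Braces inside
    double-quoted strings are ignored when counting nesting depth.
    """
    start = text.find("{")
    while start != -1:
        depth = 0
        in_string = False
        escaped = False
        for i in range(start, len(text)):
            ch = text[i]
            if in_string:
                if escaped:
                    escaped = False      # character after a backslash is literal
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    return text[start:i + 1]
        # No balanced span from this brace; try the next opening brace.
        start = text.find("{", start + 1)
    return None


reply = 'Keys are less than 64 characters:\n{"employee_id": 1234, "name": "John Doe"}\nDone.'
candidate = extract_first_object(reply)
if candidate is not None:
    # The candidate could be passed to repair_json instead of json.loads
    # when the object itself is malformed.
    print(json.loads(candidate)["employee_id"])
```

The helper returns None when no balanced object exists, for example when the output was truncated, so the caller can decide whether to fall back to repairing the whole text.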

JSON returned by json_repair is not in expected format

Describe the bug
The JSON returned by json_repair is not in the expected format.

To Reproduce
Steps to reproduce the behavior:
Run the file in a Python environment with langchain and json_repair installed.

Expected behavior
I expected repair_json to repair the output JSON.

Desktop (please complete the following information):

OS: Windows
Browser Chrome
Version 123.0.6312.106

Additional context
OpenAI returned incomplete JSON because of a limit on the completion. The JSON is missing the ":" after 'answer40' at the end of the response. I expected json_repair to fix this, but it produced unexpected JSON instead.

Code to get the json:

import json_repair
from langchain_core.messages.ai import AIMessage

response = AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{""answer"":[{""traits"":""Female aged 60+"",""answer1"":""5"",""answer2"":""Labrador Retriever"",""answer3"":""75"",""answer4"":""Buddy"",""answer5"":""It was my late husband\'s nickname"",""answer6"":""Bud"",""answer7"":""Before 2020"",""answer8"":""yes"",""answer9"":""6"",""answer10"":""Instant connection"",""answer11"":""It\'s taught me more about unconditional love"",""answer12"":""5"",""answer13"":""Family Member"",""answer14"":""Because he\'s part of the family"",""answer15"":""Paw shake, about a week"",""answer16"":""14, in a dog bed by my bedside"",""answer17"":""yes"",""answer18"":""$1200"",""answer19"":""20%"",""answer20"":""60%"",""answer21"":""$100"",""answer22"":""Spent less in March"",""answer23"":""Monthly"",""answer24"":""Dry food"",""answer25"":""I know a lot about the brand"",""answer26"":""5"",""answer27"":""3"",""answer28"":""5"",""answer29"":""4"",""answer30"":""5"",""answer31"":""4"",""answer32"":""2"",""answer33"":""Blue Buffalo"",""answer34"":""High quality, my dog loves it, good value"",""answer35"":""5"",""answer36"":""4"",""answer37"":""5"",""answer38"":""4"",""answer39"":""5"",""answer40"":""4"",""answer41"":""3"",""answer42"":""3"",""answer43"":""I interact more"",""answer44"":""The ongoing pandemic"",""answer45"":""No"",""answer46"":""Yes"",""answer47"":""Car"",""answer48"":""Female"",""answer49"":""63"",""answer50"":""1"",""answer51"":""Graduate degree"",""answer52"":""Retired"",""answer53"":""Less than $25,000"",""answer54"":""Sarasota"",""answer55"":""Florida""},{""traits"":""Female aged 60+"",""answer1"":""3"",""answer2"":""Beagle"",""answer3"":""20"",""answer4"":""Scout"",""answer5"":""To Kill a Mockingbird is my favorite book"",""answer6"":""Scooty"",""answer7"":""After 2020"",""answer8"":""no"",""answer9"":""8"",""answer10"":""Overwhelmed with joy"",""answer11"":""I\'ve become more active"",""answer12"":""5"",""answer13"":""Best 
Friend"",""answer14"":""We do everything together"",""answer15"":""Stay, a couple of days"",""answer16"":""16, on the living room couch"",""answer17"":""yes"",""answer18"":""$800"",""answer19"":""30%"",""answer20"":""50%"",""answer21"":""$70"",""answer22"":""Spent more in March"",""answer23"":""Bi-weekly"",""answer24"":""Mix"",""answer25"":""I know just the basics about the brand"",""answer26"":""5"",""answer27"":""4"",""answer28"":""5"",""answer29"":""3"",""answer30"":""5"",""answer31"":""2"",""answer32"":""4"",""answer33"":""Purina"",""answer34"":""Affordable, accessible, and Scout enjoys it"",""answer35"":""4"",""answer36"":""5"",""answer37"":""4"",""answer38"":""3"",""answer39"":""5"",""answer40"":""2"",""answer41"":""4"",""answer42"":""2"",""answer43"":""No change"",""answer44"":""The pandemic, but we\'ve adapted"",""answer45"":""Yes"",""answer46"":""No"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""68"",""answer50"":""2"",""answer51"":""Bachelor\'s degree"",""answer52"":""Retired"",""answer53"":""$50,000 to $74,999"",""answer54"":""Boise"",""answer55"":""Idaho""},{""traits"":""Female aged 60+"",""answer1"":""7"",""answer2"":""Golden Retriever"",""answer3"":""65"",""answer4"":""Sunny"",""answer5"":""Her cheerful personality"",""answer6"":""Sun"",""answer7"":""Before 2020"",""answer8"":""yes"",""answer9"":""10"",""answer10"":""Pure happiness"",""answer11"":""I appreciate the simple joys more"",""answer12"":""5"",""answer13"":""Loyal companion"",""answer14"":""Sunny is always by my side"",""answer15"":""Open doors, took about a month"",""answer16"":""12, on her plush bed in the sunroom"",""answer17"":""yes"",""answer18"":""$1500"",""answer19"":""25%"",""answer20"":""40%"",""answer21"":""$130"",""answer22"":""Spent about the same in March"",""answer23"":""Weekly"",""answer24"":""Dry food"",""answer25"":""I know a lot about the 
brand"",""answer26"":""5"",""answer27"":""2"",""answer28"":""5"",""answer29"":""4"",""answer30"":""5"",""answer31"":""5"",""answer32"":""1"",""answer33"":""Royal Canin"",""answer34"":""Superb quality, vet recommended"",""answer35"":""5"",""answer36"":""3"",""answer37"":""5"",""answer38"":""4"",""answer39"":""5"",""answer40"":""5"",""answer41"":""2"",""answer42"":""2"",""answer43"":""I interact more"",""answer44"":""Pandemic, we\'re staying home more"",""answer45"":""Not Sure"",""answer46"":""Yes"",""answer47"":""Car"",""answer48"":""Female"",""answer49"":""65"",""answer50"":""1"",""answer51"":""Some college, no degree"",""answer52"":""Employed part-time"",""answer53"":""$25,000 to $49,999"",""answer54"":""Tucson"",""answer55"":""Arizona""},{""traits"":""Female aged 60+"",""answer1"":""9"",""answer2"":""Dachshund"",""answer3"":""14"",""answer4"":""Frankie"",""answer5"":""His long body reminds me of a hotdog"",""answer6"":""Frank"",""answer7"":""In 2020"",""answer8"":""no"",""answer9"":""48"",""answer10"":""Amused by his quirky attitude"",""answer11"":""Laughs are a daily occurrence now"",""answer12"":""5"",""answer13"":""Protector"",""answer14"":""Fearless despite his size"",""answer15"":""Dance, surprisingly one weekend"",""answer16"":""10, in my bed"",""answer17"":""yes"",""answer18"":""$500"",""answer19"":""20%"",""answer20"":""40%"",""answer21"":""$40"",""answer22"":""Spent less in March"",""answer23"":""Monthly"",""answer24"":""Wet food"",""answer25"":""I\'m not very familiar with the brand"",""answer26"":""5"",""answer27"":""3"",""answer28"":""4"",""answer29"":""3"",""answer30"":""4"",""answer31"":""3"",""answer32"":""3"",""answer33"":""Hill\'s Science Diet"",""answer34"":""Recommended by my vet, Frankie\'s health"",""answer35"":""4"",""answer36"":""4"",""answer37"":""4"",""answer38"":""3"",""answer39"":""4"",""answer40"":""4"",""answer41"":""3"",""answer42"":""4"",""answer43"":""I interact more"",""answer44"":""Retirement, I have more 
time"",""answer45"":""Yes"",""answer46"":""No"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""70"",""answer50"":""1"",""answer51"":""Associate degree"",""answer52"":""Retired"",""answer53"":""Less than $25,000"",""answer54"":""Springfield"",""answer55"":""Missouri""},{""traits"":""Female aged 60+"",""answer1"":""2"",""answer2"":""Poodle"",""answer3"":""10"",""answer4"":""Coco"",""answer5"":""Her fur\'s chocolate color"",""answer6"":""Cokes"",""answer7"":""After 2020"",""answer8"":""yes"",""answer9"":""3"",""answer10"":""Adoration"",""answer11"":""I feel less lonely"",""answer12"":""5"",""answer13"":""Best Friend"",""answer14"":""Coco follows me everywhere"",""answer15"":""Fetch, just a few sessions"",""answer16"":""18, on a designer doggy bed"",""answer17"":""yes"",""answer18"":""$2000"",""answer19"":""10%"",""answer20"":""70%"",""answer21"":""$160"",""answer22"":""Spent more in March"",""answer23"":""Weekly"",""answer24"":""Mix"",""answer25"":""I know a lot about the brand"",""answer26"":""5"",""answer27"":""2"",""answer28"":""5"",""answer29"":""5"",""answer30"":""5"",""answer31"":""3"",""answer32"":""5"",""answer33"":""Orijen"",""answer34"":""Coco\'s improvement, natural ingredients"",""answer35"":""5"",""answer36"":""3"",""answer37"":""5"",""answer38"":""5"",""answer39"":""5"",""answer40"":""3"",""answer41"":""5"",""answer42"":""3"",""answer43"":""I interact more"",""answer44"":""Having moved to a pet-friendly community"",""answer45"":""Yes"",""answer46"":""Yes"",""answer47"":""Car"",""answer48"":""Female"",""answer49"":""61"",""answer50"":""1"",""answer51"":""Graduate degree"",""answer52"":""Self-employed"",""answer53"":""$100,000 to $149,000"",""answer54"":""San Diego"",""answer55"":""California""},{""traits"":""Female aged 60+"",""answer1"":""11"",""answer2"":""Yorkshire Terrier"",""answer3"":""7"",""answer4"":""Pixie"",""answer5"":""Her tiny, fairy-like 
appearance"",""answer6"":""Pix"",""answer7"":""Before 2020"",""answer8"":""no"",""answer9"":""12"",""answer10"":""A bit anxious, she was so small"",""answer11"":""I enjoy everyday moments more"",""answer12"":""5"",""answer13"":""Family Member"",""answer14"":""She\'s been with me through thick and thin"",""answer15"":""Spin, just took two weeks"",""answer16"":""13, her cushioned crate"",""answer17"":""yes"",""answer18"":""$700"",""answer19"":""15%"",""answer20"":""45%"",""answer21"":""$60"",""answer22"":""Spent about the same in March"",""answer23"":""Bi-weekly"",""answer24"":""Dry food"",""answer25"":""I know just the basics about the brand"",""answer26"":""4"",""answer27"":""3"",""answer28"":""4"",""answer29"":""5"",""answer30"":""5"",""answer31"":""1"",""answer32"":""4"",""answer33"":""Iams"",""answer34"":""My Pixie\'s health, cost-effective"",""answer35"":""4"",""answer36"":""5"",""answer37"":""4"",""answer38"":""4"",""answer39"":""5"",""answer40"":""1"",""answer41"":""4"",""answer42"":""5"",""answer43"":""I interact more"",""answer44"":""The pandemic has us spending more time indoors"",""answer45"":""Not Sure"",""answer46"":""No"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""72"",""answer50"":""2"",""answer51"":""High school diploma or equivalent"",""answer52"":""Retired"",""answer53"":""Less than $25,000"",""answer54"":""Macon"",""answer55"":""Georgia""},{""traits"":""Female aged 60+"",""answer1"":""4"",""answer2"":""Border Collie"",""answer3"":""45"",""answer4"":""Shep"",""answer5"":""Traditional name for a sheepdog"",""answer6"":""Sheppy"",""answer7"":""Before 2020"",""answer8"":""yes"",""answer9"":""5"",""answer10"":""I felt protective"",""answer11"":""Daily physical activity is a must"",""answer12"":""5"",""answer13"":""Loyal companion"",""answer14"":""Always ready to work and play"",""answer15"":""We\'ve mastered herding techniques, took months"",""answer16"":""10, in the utility room on 
his mat"",""answer17"":""yes"",""answer18"":""$1300"",""answer19"":""10%"",""answer20"":""55%"",""answer21"":""$110"",""answer22"":""Spent about the same in March"",""answer23"":""Weekly"",""answer24"":""Dry food"",""answer25"":""I know a lot about the brand"",""answer26"":""5"",""answer27"":""4"",""answer28"":""5"",""answer29"":""3"",""answer30"":""5"",""answer31"":""3"",""answer32"":""2"",""answer33"":""Acana"",""answer34"":""High-quality, breed-specific formula"",""answer35"":""5"",""answer36"":""4"",""answer37"":""5"",""answer38"":""4"",""answer39"":""5"",""answer40"":""3"",""answer41"":""2"",""answer42"":""4"",""answer43"":""No change"",""answer44"":""Seasonal allergies, we adjust our time outside"",""answer45"":""No"",""answer46"":""No"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""67"",""answer50"":""1"",""answer51"":""Bachelor\'s degree"",""answer52"":""Employed part-time"",""answer53"":""$25,000 to $49,999"",""answer54"":""Topeka"",""answer55"":""Kansas""},{""traits"":""Female aged 60+"",""answer1"":""10"",""answer2"":""Chihuahua"",""answer3"":""5"",""answer4"":""Tiny"",""answer5"":""Her petite size"",""answer6"":""T"",""answer7"":""After 2020"",""answer8"":""yes"",""answer9"":""24"",""answer10"":""Amused by her sassiness"",""answer11"":""It\'s important to love fiercely"",""answer12"":""5"",""answer13"":""Best Friend"",""answer14"":""Tiny is always by my side, my constant tiny shadow"",""answer15"":""Sit pretty, she picked it up in days"",""answer16"":""14, she curls up in the bed under the window"",""answer17"":""yes"",""answer18"":""$550"",""answer19"":""40%"",""answer20"":""30%"",""answer21"":""$50"",""answer22"":""Spent less in March"",""answer23"":""Bi-weekly"",""answer24"":""Mix"",""answer25"":""I know just the basics about the 
brand"",""answer26"":""5"",""answer27"":""5"",""answer28"":""5"",""answer29"":""2"",""answer30"":""5"",""answer31"":""2"",""answer32"":""1"",""answer33"":""Cesar"",""answer34"":""Tiny loves it, it\'s affordable, easy to store"",""answer35"":""5"",""answer36"":""5"",""answer37"":""4"",""answer38"":""2"",""answer39"":""5"",""answer40"":""2"",""answer41"":""1"",""answer42"":""5"",""answer43"":""No change"",""answer44"":""The seasons changing is the biggest influence"",""answer45"":""Yes"",""answer46"":""Yes"",""answer47"":""Car"",""answer48"":""Female"",""answer49"":""69"",""answer50"":""1"",""answer51"":""Some college, no degree"",""answer52"":""Retired"",""answer53"":""$50,000 to $74,999"",""answer54"":""El Paso"",""answer55"":""Texas""},{""traits"":""Female aged 60+"",""answer1"":""6"",""answer2"":""Bulldog"",""answer3"":""50"",""answer4"":""Winston"",""answer5"":""His stately, British-like demeanor"",""answer6"":""Win"",""answer7"":""Before 2020"",""answer8"":""no"",""answer9"":""3"",""answer10"":""Awe, he\'s got so much character"",""answer11"":""I\'ve become more patient"",""answer12"":""5"",""answer13"":""Family Member"",""answer14"":""He\'s more like a child to me than a pet"",""answer15"":""Speak on command, a stubborn month or so"",""answer16"":""12, sprawled across the hallway rug"",""answer17"":""yes"",""answer18"":""$1100"",""answer19"":""15%"",""answer20"":""45%"",""answer21"":""$90"",""answer22"":""Spent about the same in March"",""answer23"":""Monthly"",""answer24"":""Dry food"",""answer25"":""I know a lot about the brand"",""answer26"":""4"",""answer27"":""2"",""answer28"":""4"",""answer29"":""5"",""answer30"":""5"",""answer31"":""3"",""answer32"":""3"",""answer33"":""Nutro"",""answer34"":""Non-GMO ingredients, Winston\'s preference"",""answer35"":""4"",""answer36"":""3"",""answer37"":""4"",""answer38"":""5"",""answer39"":""5"",""answer40"":""3"",""answer41"":""3"",""answer42"":""3"",""answer43"":""I interact less"",""answer44"":""Winston\'s aging, he 
is more independent"",""answer45"":""Not Sure"",""answer46"":""No"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""65"",""answer50"":""2"",""answer51"":""Graduate degree"",""answer52"":""Retired"",""answer53"":""Prefer not to say"",""answer54"":""Charleston"",""answer55"":""South Carolina""},{""traits"":""Female aged 60+"",""answer1"":""8"",""answer2"":""Shih Tzu"",""answer3"":""12"",""answer4"":""Gizmo"",""answer5"":""His playful and curious nature"",""answer6"":""Gizzy"",""answer7"":""In 2020"",""answer8"":""yes"",""answer9"":""60"",""answer10"":""Instant love, he\'s such a fluffball"",""answer11"":""I\'m more outgoing, he needs lots of socializing"",""answer12"":""5"",""answer13"":""Best Friend"",""answer14"":""We have an unbreakable bond"",""answer15"":""Roll over, a week\'s full of treats and praise"",""answer16"":""15, on his favorite armchair"",""answer17"":""yes"",""answer18"":""$950"",""answer19"":""10%"",""answer20"":""55%"",""answer21"":""$80"",""answer22"":""Spent about the same in March"",""answer23"":""Monthly"",""answer24"":""Dry food"",""answer25"":""I know just the basics about the brand"",""answer26"":""5"",""answer27"":""4"",""answer28"":""5"",""answer29"":""3"",""answer30"":""5"",""answer31"":""1"",""answer32"":""4"",""answer33"":""Science Diet"",""answer34"":""Gizzy likes it, vet approved, easy to find"",""answer35"":""4"",""answer36"":""4"",""answer37"":""5"",""answer38"":""3"",""answer39"":""5"",""answer40"":""1"",""answer41"":""4"",""answer42"":""4"",""answer43"":""No change"",""answer44"":""Seasonal weather, we keep our routine"",""answer45"":""Yes"",""answer46"":""Not Sure"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""64"",""answer50"":""1"",""answer51"":""Bachelor\'s degree"",""answer52"":""Employed full-time"",""answer53"":""$50,000 to 
$74,999"",""answer54"":""Reno"",""answer55"":""Nevada""},{""traits"":""Female aged 60+"",""answer1"":""1"",""answer2"":""Australian Shepherd"",""answer3"":""30"",""answer4"":""Blue"",""answer5"":""His striking blue eyes"",""answer6"":""Boo"",""answer7"":""After 2020"",""answer8"":""no"",""answer9"":""2"",""answer10"":""Thrilled, he was full of energy"",""answer11"":""I\'ve got a new sense of purpose"",""answer12"":""5"",""answer13"":""Loyal companion"",""answer14"":""He\'s loyal and always there for support"",""answer15"":""Fetch with a frisbee, about two weeks"",""answer16"":""14, on a runner in the hall"",""answer17"":""yes"",""answer18"":""$1000"",""answer19"":""5%"",""answer20"":""65%"",""answer21"":""$85"",""answer22"":""Spent more in March"",""answer23"":""Weekly"",""answer24"":""Dry food"",""answer25"":""I know a lot about the brand"",""answer26"":""5"",""answer27"":""3"",""answer28"":""5"",""answer29"":""4"",""answer30"":""5"",""answer31"":""4"",""answer32"":""1"",""answer33"":""Taste of the Wild"",""answer34"":""Grain-free, Blue\'s coat looks amazing"",""answer35"":""5"",""answer36"":""4"",""answer37"":""5"",""answer38"":""4"",""answer39"":""5"",""answer40', 'name': 'Answers'}})

final_json = json_repair.loads(response.additional_kwargs['function_call']['arguments'])['answer']

JSON returned by json_repair:

[{'traits': '', 60: '', 'answer1': '', 5: ',', 'answer2': '', 'Retriever': '', 'answer3': '', 75: ',', 'answer4': '', 'Buddy': '', 'answer5': '', 'nickname': '', 'answer6': '', 'Bud': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 6: ',', 'answer10': '', 'connection': '', 'answer11': '', 'love': '', 'answer12': '', 'answer13': '', 'Member': '', 'answer14': '', 'family': '', 'answer15': '', 'week': '', 'answer16': '', 14: 'in a dog bed by my bedside', ',': 'answer18', ':': 1200, 'answer19': '', 20: '', 'answer20': '', 'answer21': '', 100: ',', 'answer22': '', 'March': '', 'answer23': '', 'Monthly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 3: ',', 'answer28': '', 'answer29': '', 4: ',', 'answer30': '', 'answer31': '', 'answer32': '', 2: ',', 'answer33': '', 'Buffalo': '', 'answer34': '', 'value': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'more': '', 'answer44': '', 'pandemic': '', 'answer45': '', 'No': '', 'answer46': '', 'Yes': '', 'answer47': '', 'Car': '', 'answer48': '', 'Female': '', 'answer49': '', 63: ',', 'answer50': '', 1: ',', 'answer51': '', 'degree': '', 'answer52': '', 'Retired': '', 'answer53': '', 25: 0, 'answer54': '', 'Sarasota': '', 'answer55': '', 'Florida': ''}, {'traits': '', 60: '', 'answer1': '', 3: ',', 'answer2': '', 'Beagle': '', 'answer3': '', 20: ',', 'answer4': '', 'Scout': '', 'answer5': '', 'book': '', 'answer6': '', 'Scooty': '', 'answer7': '', 2020: ',', 'answer8': '', 'no': '', 'answer9': '', 8: ',', 'answer10': '', 'joy': '', 'answer11': '', 'active': '', 'answer12': '', 5: ',', 'answer13': '', 'Friend': '', 'answer14': '', 'together': '', 'answer15': '', 'days': '', 'answer16': '', 16: 'on the living room couch', ',': 'answer18', ':': 800, 'answer19': '', 30: '', 'answer20': '', 50: 0, 'answer21': '', 70: ',', 'answer22': '', 'March': '', 'answer23': 
'', 'Bi-weekly': '', 'answer24': '', 'Mix': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 4: ',', 'answer28': '', 'answer29': '', 'answer30': '', 'answer31': '', 2: ',', 'answer32': '', 'answer33': '', 'Purina': '', 'answer34': '', 'it': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'change': '', 'answer44': '', 'adapted': '', 'answer45': '', 'Yes': '', 'answer46': '', 'No': '', 'answer47': '', 'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 68: ',', 'answer50': '', 'answer51': '', 'degree': '', 'answer52': '', 'Retired': '', 'answer53': '', 74: 999, 'answer54': '', 'Boise': '', 'answer55': '', 'Idaho': ''}, {'traits': '', 60: '', 'answer1': '', 7: ',', 'answer2': '', 'Retriever': '', 'answer3': '', 65: ',', 'answer4': '', 'Sunny': '', 'answer5': '', 'personality': '', 'answer6': '', 'Sun': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 10: ',', 'answer10': '', 'happiness': '', 'answer11': '', 'more': '', 'answer12': '', 5: ',', 'answer13': '', 'companion': '', 'answer14': '', 'side': '', 'answer15': '', 'month': '', 'answer16': '', 12: 'on her plush bed in the sunroom', ',': 'answer18', ':': 1500, 'answer19': '', 25: 0, 'answer20': '', 40: '', 'answer21': '', 130: ',', 'answer22': '', 'March': '', 'answer23': '', 'Weekly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 2: ',', 'answer28': '', 'answer29': '', 4: ',', 'answer30': '', 'answer31': '', 'answer32': '', 1: ',', 'answer33': '', 'Canin': '', 'answer34': '', 'recommended': '', 'answer35': '', 'answer36': '', 3: ',', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'answer44': '', 'answer45': '', 'Sure': '', 'answer46': '', 'Yes': '', 'answer47': '', 'Car': '', 'answer48': '', 'Female': '', 'answer49': '', 'answer50': '', 'answer51': '', 'degree': 
'', 'answer52': '', 'part-time': '', 'answer53': '', 49: 999, 'answer54': '', 'Tucson': '', 'answer55': '', 'Arizona': ''}, {'traits': '', 60: '', 'answer1': '', 9: ',', 'answer2': '', 'Dachshund': '', 'answer3': '', 14: ',', 'answer4': '', 'Frankie': '', 'answer5': '', 'hotdog': '', 'answer6': '', 'Frank': '', 'answer7': '', 2020: ',', 'answer8': '', 'no': '', 'answer9': '', 48: ',', 'answer10': '', 'attitude': '', 'answer11': '', 'now': '', 'answer12': '', 5: ',', 'answer13': '', 'Protector': '', 'answer14': '', 'size': '', 'answer15': '', 'weekend': '', 'answer16': '', 10: 'in my bed', ',': 'answer18', ':': 500, 'answer19': '', 20: '', 'answer20': '', 40: ',', 'answer21': '', 'answer22': '', 'March': '', 'answer23': '', 'Monthly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 3: ',', 'answer28': '', 4: ',', 'answer29': '', 'answer30': '', 'answer31': '', 'answer32': '', 'answer33': '', 'Diet': '', 'answer34': '', 'health': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'more': '', 'answer44': '', 'time': '', 'answer45': '', 'Yes': '', 'answer46': '', 'No': '', 'answer47': '', 'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 70: ',', 'answer50': '', 1: ',', 'answer51': '', 'degree': '', 'answer52': '', 'Retired': '', 'answer53': '', 25: 0, 'answer54': '', 'Springfield': '', 'answer55': '', 'Missouri': ''}, {'traits': '', 60: '', 'answer1': '', 2: ',', 'answer2': '', 'Poodle': '', 'answer3': '', 10: '', 'answer4': '', 'Coco': '', 'answer5': '', 'color': '', 'answer6': '', 'Cokes': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 3: ',', 'answer10': '', 'Adoration': '', 'answer11': '', 'lonely': '', 'answer12': '', 5: ',', 'answer13': '', 'Friend': '', 'answer14': '', 'everywhere': '', 'answer15': '', 'sessions': '', 'answer16': '', 18: 'on a designer doggy bed', ',': 'answer18', ':': 
2000, 'answer19': '', 'answer20': '', 70: '', 'answer21': '', 160: ',', 'answer22': '', 'March': '', 'answer23': '', 'Weekly': '', 'answer24': '', 'Mix': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 'answer28': '', 'answer29': '', 'answer30': '', 'answer31': '', 'answer32': '', 'answer33': '', 'Orijen': '', 'answer34': '', 'ingredients': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'more': '', 'answer44': '', 'community': '', 'answer45': '', 'Yes': '', 'answer46': '', 'answer47': '', 'Car': '', 'answer48': '', 'Female': '', 'answer49': '', 61: ',', 'answer50': '', 1: ',', 'answer51': '', 'degree': '', 'answer52': '', 'Self-employed': '', 'answer53': '', 100: 0, 149: 0, 'answer54': '', 'Diego': '', 'answer55': '', 'California': ''}, {'traits': '', 60: ',', 'answer1': '', 11: ',', 'answer2': '', 'Terrier': '', 'answer3': '', 7: ',', 'answer4': '', 'Pixie': '', 'answer5': '', 'appearance': '', 'answer6': '', 'Pix': '', 'answer7': '', 2020: ',', 'answer8': '', 'no': '', 'answer9': '', 12: ',', 'answer10': '', 'small': '', 'answer11': '', 'more': '', 'answer12': '', 5: ',', 'answer13': '', 'Member': '', 'answer14': '', 'thin': '', 'answer15': '', 'weeks': '', 'answer16': '', 13: 'her cushioned crate', ',': 'answer18', ':': 700, 'answer19': '', 15: '', 'answer20': '', 45: '', 'answer21': '', 'answer22': '', 'March': '', 'answer23': '', 'Bi-weekly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 4: ',', 'answer27': '', 3: ',', 'answer28': '', 'answer29': '', 'answer30': '', 'answer31': '', 1: ',', 'answer32': '', 'answer33': '', 'Iams': '', 'answer34': '', 'cost-effective': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'answer44': '', 'indoors': '', 'answer45': '', 'Sure': '', 'answer46': '', 'No': '', 'answer47': '', 
'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 72: ',', 'answer50': '', 2: ',', 'answer51': '', 'equivalent': '', 'answer52': '', 'Retired': '', 'answer53': '', 25: 0, 'answer54': '', 'Macon': '', 'answer55': '', 'Georgia': ''}, {'traits': '', 60: '', 'answer1': '', 4: ',', 'answer2': '', 'Collie': '', 'answer3': '', 45: ',', 'answer4': '', 'Shep': '', 'answer5': '', 'sheepdog': '', 'answer6': '', 'Sheppy': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 5: ',', 'answer10': '', 'protective': '', 'answer11': '', 'must': '', 'answer12': '', 'answer13': '', 'companion': '', 'answer14': '', 'play': '', 'answer15': '', 'months': '', 'answer16': '', 10: '', ',': 'answer18', ':': 1300, 'answer19': '', 'answer20': '', 55: '', 'answer21': '', 110: ',', 'answer22': '', 'March': '', 'answer23': '', 'Weekly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 'answer28': '', 'answer29': '', 3: ',', 'answer30': '', 'answer31': '', 'answer32': '', 2: ',', 'answer33': '', 'Acana': '', 'answer34': '', 'formula': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'change': '', 'answer44': '', 'outside': '', 'answer45': '', 'No': '', 'answer46': '', 'answer47': '', 'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 67: ',', 'answer50': '', 1: ',', 'answer51': '', 'degree': '', 'answer52': '', 'part-time': '', 'answer53': '', 25: 0, 49: 999, 'answer54': '', 'Topeka': '', 'answer55': '', 'Kansas': ''}, {'traits': '', 60: '', 'answer1': '', 10: ',', 'answer2': '', 'Chihuahua': '', 'answer3': '', 5: ',', 'answer4': '', 'Tiny': '', 'answer5': '', 'size': '', 'answer6': '', 'T': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 24: ',', 'answer10': '', 'sassiness': '', 'answer11': '', 'fiercely': '', 'answer12': '', 'answer13': '', 'Friend': '', 'answer14': '', 'shadow': '', 'answer15': 
'', 'days': '', 'answer16': '', 14: 'she curls up in the bed under the window', ',': 'answer18', ':': 550, 'answer19': '', 40: '', 'answer20': '', 30: '', 'answer21': '', 50: 0, 'answer22': '', 'March': '', 'answer23': '', 'Bi-weekly': '', 'answer24': '', 'Mix': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 'answer28': '', 'answer29': '', 2: ',', 'answer30': '', 'answer31': '', 'answer32': '', 1: ',', 'answer33': '', 'Cesar': '', 'answer34': '', 'store': '', 'answer35': '', 'answer36': '', 'answer37': '', 4: ',', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'change': '', 'answer44': '', 'influence': '', 'answer45': '', 'Yes': '', 'answer46': '', 'answer47': '', 'Car': '', 'answer48': '', 'Female': '', 'answer49': '', 69: ',', 'answer50': '', 'answer51': '', 'degree': '', 'answer52': '', 'Retired': '', 'answer53': '', 74: 999, 'answer54': '', 'Paso': '', 'answer55': '', 'Texas': ''}, {'traits': '', 60: '', 'answer1': '', 6: ',', 'answer2': '', 'Bulldog': '', 'answer3': '', 50: ',', 'answer4': '', 'Winston': '', 'answer5': '', 'demeanor': '', 'answer6': '', 'Win': '', 'answer7': '', 2020: ',', 'answer8': '', 'no': '', 'answer9': '', 3: ',', 'answer10': '', 'character': '', 'answer11': '', 'patient': '', 'answer12': '', 5: ',', 'answer13': '', 'Member': '', 'answer14': '', 'pet': '', 'answer15': '', 'so': '', 'answer16': '', 12: 'sprawled across the hallway rug', ',': 'answer18', ':': 1100, 'answer19': '', 15: '', 'answer20': '', 45: '', 'answer21': '', 90: ',', 'answer22': '', 'March': '', 'answer23': '', 'Monthly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 4: ',', 'answer27': '', 2: ',', 'answer28': '', 'answer29': '', 'answer30': '', 'answer31': '', 'answer32': '', 'answer33': '', 'Nutro': '', 'answer34': '', 'preference': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 
'answer43': '', 'less': '', 'answer44': '', 'independent': '', 'answer45': '', 'Sure': '', 'answer46': '', 'No': '', 'answer47': '', 'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 65: ',', 'answer50': '', 'answer51': '', 'degree': '', 'answer52': '', 'Retired': '', 'answer53': '', 'say': '', 'answer54': '', 'Charleston': '', 'answer55': '', 'Carolina': ''}, {'traits': '', 60: ',', 'answer1': '', 8: ',', 'answer2': '', 'Tzu': '', 'answer3': '', 12: ',', 'answer4': '', 'Gizmo': '', 'answer5': '', 'nature': '', 'answer6': '', 'Gizzy': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 'answer10': '', 'fluffball': '', 'answer11': '', 'socializing': '', 'answer12': '', 5: ',', 'answer13': '', 'Friend': '', 'answer14': '', 'bond': '', 'answer15': '', 'praise': '', 'answer16': '', 15: 'on his favorite armchair', ',': 'answer18', ':': 950, 'answer19': '', 10: '', 'answer20': '', 55: '', 'answer21': '', 80: ',', 'answer22': '', 'March': '', 'answer23': '', 'Monthly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 4: ',', 'answer28': '', 'answer29': '', 3: ',', 'answer30': '', 'answer31': '', 1: ',', 'answer32': '', 'answer33': '', 'Diet': '', 'answer34': '', 'find': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'change': '', 'answer44': '', 'routine': '', 'answer45': '', 'Yes': '', 'answer46': '', 'Sure': '', 'answer47': '', 'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 64: ',', 'answer50': '', 'answer51': '', 'degree': '', 'answer52': '', 'full-time': '', 'answer53': '', 50: 0, 74: 999, 'answer54': '', 'Reno': '', 'answer55': '', 'Nevada': ''}, {'traits': '', 60: '', 'answer1': '', 1: ',', 'answer2': '', 'Shepherd': '', 'answer3': '', 30: ',', 'answer4': '', 'Blue': '', 'answer5': '', 'eyes': '', 'answer6': '', 'Boo': '', 'answer7': '', 2020: ',', 'answer8': '', 'no': '', 
'answer9': '', 2: ',', 'answer10': '', 'energy': '', 'answer11': '', 'purpose': '', 'answer12': '', 5: ',', 'answer13': '', 'companion': '', 'answer14': '', 'support': '', 'answer15': '', 'weeks': '', 'answer16': '', 14: 'on a runner in the hall', ',': 'answer18', ':': 1000, 'answer19': '', 'answer20': '', 65: '', 'answer21': '', 85: ',', 'answer22': '', 'March': '', 'answer23': '', 'Weekly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 3: ',', 'answer28': '', 'answer29': '', 4: ',', 'answer30': '', 'answer31': '', 'answer32': '', 'answer33': '', 'Wild': '', 'answer34': '', 'amazing': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': ''}]

[Bug]: unescaped quotes immediately followed by comma fail.

Version of the library

0.20.1

Describe the bug

I see that in #46 you fixed the case where a comma follows the unescaped quote with some text in between.

But sometimes we get results back from the LLM that look like the following:
{"notes": "Sent a message to the "dictator", waiting on response."}
which gets truncated:
{"notes": "Sent a message to the \"dictator"}

How to reproduce

Add this line to test_object_edge_cases and run the tests again.

assert repair_json('{"key": "Lorem "ipsum", s"}') == '{"key": "Lorem \\"ipsum\\", s"}'

Expected behavior

I would expect that repair_json('{"key": "Lorem "ipsum", s"}') == '{"key": "Lorem \\"ipsum\\", s"}'

[Bug]: Failed repair on some quote cases

Version of the library

0.25.2

Describe the bug

As shown by the cases below, IDs 1, 4, and 5 failed during the repair.

input: {"na"me": "Jack O"Sullivan", "id": "1"}
output: {"na": "e", "Jack O": "ullivan", "id": "1"}
------------
input: {"name": "Jack: The "OG" O"Sullivan"", "id": "2"}
output: {"name": "Jack: The \"OG\" O\"Sullivan\"", "id": "2"}
------------
input: {"name": "Jack: The "OG"", "surname": 'O'Sullivan', "id": "3"}
output: {"name": "Jack: The \"OG\"", "surname": "O'Sullivan", "id": "3"}
------------
input: {"test_str": {"1singlechar": "a""a""a", "2singlechars": "a"a"a"a"a"a"a"a"a"}, "id": "4"}
output: {"test_str": {"1singlechar": "a\"", "a": "a", "2singlechars": "a\"a\"a\"a\"a\"a\"a\"a\"a"}, "id": "4"}
------------
input: {'name': 'Jack O'Sullivan, 'id': '5'}
output: {"name": "Jack O", "id": "5"}
------------

How to reproduce

from json_repair import repair_json


req_jsons = [
    '{"na"me": "Jack O"Sullivan", "id": "1"}',
    '{"name": "Jack: The "OG" O"Sullivan"", "id": "2"}',
    '{"name": "Jack: The "OG"", "surname": \'O\'Sullivan\', "id": "3"}',
    '{"test_str": {"1singlechar": "a""a""a", "2singlechars": "a"a"a"a"a"a"a"a"a"}, "id": "4"}',
    "{'name': 'Jack O'Sullivan, 'id': '5'}",
]

for bad_json_string in req_jsons:
    good_json_string = repair_json(bad_json_string, skip_json_loads=True)
    print(f"input: {bad_json_string}\noutput: {good_json_string}")
    print("------------")

Expected behavior

input: {"na"me": "Jack O"Sullivan", "id": "1"}
output: {"na\"me": "Jack O\"Sullivan", "id": "1"}
------------
input: {"name": "Jack: The "OG" O"Sullivan"", "id": "2"}
output: {"name": "Jack: The \"OG\" O\"Sullivan\"", "id": "2"}
------------
input: {"name": "Jack: The "OG"", "surname": 'O'Sullivan', "id": "3"}
output: {"name": "Jack: The \"OG\"", "surname": "O'Sullivan", "id": "3"}
------------
input: {"test_str": {"1singlechar": "a""a""a", "2singlechars": "a"a"a"a"a"a"a"a"a"}, "id": "4"}
output: {"test_str": {"1singlechar": "a\"\"a\"\"a", "2singlechars": "a\"a\"a\"a\"a\"a\"a\"a\"a"}, "id": "4"}
------------
input: {'name': 'Jack O'Sullivan, 'id': '5'}
output: {"name": "Jack O'Sullivan", "id": "5"}

Wrong result when parsing JSON with trailing text

Describe the bug
Parsing JSON that is followed by trailing text gives the wrong result.

To Reproduce
The following code should return {'a': '', 'b': [{'c': 1}]}

json_repair.loads("""{"a": "", "b": [ { "c": 1} ]}```""")
# This is parsed to {'a': ', "b'}

json_repair.loads("""{    "a": "",    "b": [ { "c": 1} ] \n}```""")
# This will raise exception TypeError: unhashable type: 'list'
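
When the JSON itself is valid and only the trailing text is the problem, one possible workaround is the standard library's `json.JSONDecoder.raw_decode`, which parses the first value and reports where it ended. This is only a sketch under that assumption; `parse_leading_json` is a hypothetical helper name, and it does not repair anything.

```python
import json

FENCE = "`" * 3  # the trailing markdown fence from the examples above


def parse_leading_json(text):
    """Parse the first JSON value in text and return (value, trailing_text)."""
    decoder = json.JSONDecoder()
    stripped = text.lstrip()  # raw_decode does not skip leading whitespace
    value, end = decoder.raw_decode(stripped)
    return value, stripped[end:]


value, rest = parse_leading_json('{"a": "", "b": [ { "c": 1} ]}' + FENCE)
print(value)  # {'a': '', 'b': [{'c': 1}]}
print(rest)   # the leftover fence
```

If `raw_decode` raises `json.JSONDecodeError`, the input is genuinely broken and still needs a repair step.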

JSON info gets cut off with misplaced brackets

Describe the bug
Hi, I was testing the json_repair module for a personal project that extracts information from a medical text and asks an LLM to fill in a JSON object. The LLM I'm using is not great at returning well-formatted JSON, so I was wondering how this module might help. However, I noticed that when the ill-formatted JSON has objects with extra closing brackets, the parser stops altogether and assumes the JSON has ended, thus cutting off information.

To Reproduce
The ill-formatted JSON string is:

{"claimant_info":{"name":"John Doe","gender":"male","dominant_hand":"right-handed","date_of_birth":"01/01/2000"},"employment_info":{"occupation":"bank clerk","hours_per_week":0,"was_at_workplace_at_time_of_accident":false,"absence_not_working":[{"type":"sleep disturbance and frequent headaches","duration":""}],"work_restrictions":[{"type":""}]},"past_medical_history":[{"disease_or_pathology":"High cholesterol","text_span":""}]},"recovery_time":[{"body_part":"chest, neck, and back","recovery_time_in_days":"3-4 weeks from 1st treatment date or 9 to 12 visits whichever comes first","text_span":""}]},"dates":{"accident_date":"3/20/2021","examination_date":"3/26/2021","next_examination_date":"04/09/2021","signing_date":"3/26/2021 4:54:17 PM"}}

My test code:

from json_repair import json_repair
import json

jsonString = """{"claimant_info":{"name":"John Doe","gender":"male","dominant_hand":"right-handed","date_of_birth":"01/01/2000"},"employment_info":{"occupation":"bank clerk","hours_per_week":0,"was_at_workplace_at_time_of_accident":false,"absence_not_working":[{"type":"sleep disturbance and frequent headaches","duration":""}],"work_restrictions":[{"type":""}]},"past_medical_history":[{"disease_or_pathology":"High cholesterol","text_span":""}]},"recovery_time":[{"body_part":"chest, neck, and back","recovery_time_in_days":"3-4 weeks from 1st treatment date or 9 to 12 visits whichever comes first","text_span":""}]},"dates":{"accident_date":"3/20/2021","examination_date":"3/26/2021","next_examination_date":"04/09/2021","signing_date":"3/26/2021 4:54:17 PM"}}"""

repaired = json_repair.loads(jsonString)
output = json.dumps(repaired, indent=2)
with open("output.txt","w") as f:
    f.write(output)

Expected behavior
I guess the expected behavior should be that if an extra closing bracket is followed by a comma, the parser infers that this bracket is misplaced rather than the end of the JSON.

Desktop (please complete the following information):

OS: Windows 11
Python Kernel: 3.11.1
IDE: VSCode
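
To locate this kind of problem before parsing, a small scanner can track bracket depth and report where the top-level value closes while text still follows. This is a rough sketch, not the library's logic; `find_premature_close` is a hypothetical name, and it assumes quotes inside strings are properly escaped.

```python
def find_premature_close(text):
    """Return the index of the bracket that closes the top-level value
    while non-whitespace text still follows, or None if there is none."""
    depth = 0
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            # Inside a string only an unescaped quote changes state.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
            if depth == 0 and text[i + 1:].strip():
                return i
    return None


sample = '{"a": {"b": 1}}, "c": 2}'
index = find_premature_close(sample)
print(index, sample[index:index + 2])
```

A non-None result followed by a comma matches the situation described in this issue: the value was closed too early and more keys follow.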

Real numbers without preceding zero are converted incorrectly.

Describe the bug
For a real number without a preceding zero, the value is converted to a whole number.
Example:
.25 -> 25
Should be:
.25 -> 0.25

Additional context
I encountered this output from an AI model.
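
Until this is handled by the parser, a pre-processing pass could add the missing zero. The sketch below is an assumption-laden workaround, not the library's behavior: `add_leading_zeros` is a hypothetical helper that skips over double-quoted strings and assumes their quotes are properly escaped.

```python
import json


def add_leading_zeros(text):
    """Insert '0' before a bare '.' that starts a number outside of strings."""
    out = []
    in_string = False
    escaped = False
    for i, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif (
            ch == "."
            and text[i + 1:i + 2].isdigit()             # a digit follows the dot
            and not (i > 0 and text[i - 1].isdigit())   # no digit before the dot
        ):
            out.append("0")
        out.append(ch)
    return "".join(out)


fixed = add_leading_zeros('{"a": .25, "b": "v.25", "c": 1.5, "d": -.5}')
print(fixed)
print(json.loads(fixed))
```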

Escaping underscores

Describe the bug
We have an issue here: OpenDevin/OpenDevin#495

The LLM response tries to escape underscores. So the key new_monologue becomes new\_monologue in the LLM response. json_repair double-escapes the backslash, instead of removing it.

This behavior, where the LLM attempts to escape underscores, does not seem uncommon. Maybe we could add a special case that replaces \_ with _?

Expected behavior
Escape characters removed
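
As a stopgap on the caller's side, the replacement suggested above can be applied before parsing. This is a minimal sketch; `unescape_underscores` is a hypothetical helper, and it would also alter a legitimately escaped backslash that happens to be followed by an underscore.

```python
import json


def unescape_underscores(text):
    r"""Replace the invalid escape \_ with a plain underscore."""
    return text.replace("\\_", "_")


raw = r'{"new\_monologue": "hello\_world"}'
print(json.loads(unescape_underscores(raw)))  # {'new_monologue': 'hello_world'}
```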

Respect original formatting/whitespace

Describe the bug
(This might be too hard and out of scope depending on the implementation. If so, feel free to close)

The repair removes whitespace, so that if the LLM responds with:

{
  "foo": "bar"
}

repair "corrects" it to

{"foo":"bar"}

This makes it hard to detect if an actual repair was made.

To Reproduce
repair JSON with whitespace

Expected behavior
whitespace is preserved
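
If the goal is only to know whether a repair was needed, one option that sidesteps formatting is to test the original text with the standard `json` module first. A sketch, with `is_valid_json` as a hypothetical helper name:

```python
import json


def is_valid_json(text):
    """Return True when text parses as JSON without any repair."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(is_valid_json('{\n  "foo": "bar"\n}'))  # True: whitespace only, nothing to repair
print(is_valid_json('{"foo": "bar"'))         # False: a repair would be needed
```

Comparing `json.loads(original) == json.loads(repaired)` is another whitespace-insensitive check when the original is already valid.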

Missing left quotes in numbers are not parsed properly

Describe the bug
If a JSON value is missing its opening (left) quote, it is not repaired properly.

To Reproduce
Steps to reproduce the behavior:
Run this json file with json_repair

{
  "words": abcdef",
  "numbers": 12345",
  "words2": ghijkl"
}

Gets parsed like this

{'words': 'abcdef', 'numbers': 12345, ',\n ': 'ords2', 'ghijkl': ''}

Expected behavior
Proper output:
{'words': 'abcdef', 'numbers': '12345', 'words2': 'ghijkl'}

[Bug]: Extra backslash when repairing JSON with escaped double quote

Version of the library

0.26.0

Describe the bug

It looks like repairing the object leads to escaped double quotes getting an extra backslash.

How to reproduce

Run the following snippet:

import json_repair

a = '{"foo": "\\"bar\\""}'
print(json_repair.loads(a))
# {'foo': '"bar"'}
# => OK!

b = """{
  "items": [
    {
      "foo": "\\"bar\\""
    }
"""
print(json_repair.loads(b))
# {'items': [{'foo': '\\"bar"'}]}
# => KO, expected {'items': [{'foo': '"bar"'}]}

c = """{
  "items": [
    {
      "foo": "\\"bar\\""
    }
  ]
}"""
print(json_repair.loads(c))
# {'items': [{'foo': '"bar"'}]}
# => OK!

Expected behavior

No extra backslash in the parsed string

[Bug]: Does not add missing comma in an array of strings but works fine with array of numbers

Version of the library

0.25.1

Describe the bug

Broken Json :
{
"name": "Mike",
"age": 29,
"is_student": "false",
"bio": "Loves to read and play guitar",
"hobbies": ["Reading" "Playing guitar" "Swimming"]
}
Repaired Json:
{
"age": 29,
"bio": "Loves to read and play guitar",
"hobbies": [
"Reading" "Playing guitar" "Swimming"
],
"is_student": "false",
"name": "Mike"
}
This is not how the repaired JSON should look; I was expecting it to look like the one below:
{
"age": 29,
"bio": "Loves to read and play guitar",
"hobbies": [
"Reading", "Playing guitar" ,"Swimming"
],
"is_student": "false",
"name": "Mike"
}
The library does work fine for an array of integers, though.

How to reproduce

{
"name": "Mike",
"age": 29,
"is_student": "false",
"bio": "Loves to read and play guitar",
"hobbies": ["Reading" "Playing guitar" "Swimming"]
}

Expected behavior

TypeError on malformed string

Describe the bug
The following code throws TypeError: unhashable type: 'dict'.
Notice that the JSON string is malformed (unmatched double quotes); however, we should not throw an exception in such cases.

To Reproduce

json_repair.loads('''{ "a": "aa", "de": "{ asdf": {} }" }''')

Exception thrown in repair_json.py

Exception is thrown after input of specific string

To reproduce:
good_json = repair_json(' - { "test_key": ["test_value", "test_value2"] }')

Expected behavior
Should convert string ' - { "test_key": ["test_value", "test_value2"] }' into '{ "test_key": ["test_value", "test_value2"] }'

Exception
File "...\venv\Lib\site-packages\json_repair\json_repair.py", line 251, in parse_number
return int(number_str)
^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '-'
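
The traceback shows `int()` being called on a lone `-`. A tolerant conversion could fall back to returning the token unchanged instead of raising. The sketch below only illustrates that idea; `parse_number_token` is a hypothetical name and this is not the library's `parse_number`.

```python
def parse_number_token(token):
    """Convert a numeric-looking token to int or float, else return it unchanged."""
    for convert in (int, float):
        try:
            return convert(token)
        except ValueError:
            continue
    return token


print(parse_number_token("-"))     # stays a string
print(parse_number_token("12"))    # int
print(parse_number_token("-1.5"))  # float
```

Note that `float()` also accepts tokens such as `nan` and `inf`, which a stricter parser may want to reject.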

[Bug]: Maximum recursion depth reached when parsing very long incompatible strings (2972 chars)

Version of the library

0.25.0

Describe the bug

Hi! Really appreciate this library :)

Somehow it's happened a few times recently that output from an LLM results in a RecursionError: maximum recursion depth exceeded when passed into repair_json. Unfortunately I don't have the specific output for those cases, but I've been able to reproduce one case.

How to reproduce

An input to repair_json() where there are at least 2972 sequential characters that don't contain valid JSON (for instance a paragraph of text) will result in this error.

Examples:

repair_json("a" * 2972)
repair_json('{"key": "value"}' + ("a" * 2972))
repair_json(("a" * 2972) + '{"key": "value"}' + ("b" * 2972))

paragraph = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin tincidunt laoreet lorem, ac posuere sapien luctus ut. Etiam volutpat vehicula dolor sit amet aliquet. Maecenas id maximus velit. Phasellus velit justo, consequat et tristique ac, tincidunt sed ligula. Cras ut auctor enim. Ut interdum euismod risus id posuere. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum cursus felis massa, id faucibus nisl commodo vitae. Maecenas ex ipsum, consequat a eleifend sed, lacinia id nibh. Pellentesque semper ultrices nunc sit amet tincidunt. Integer pulvinar mi magna, a ultrices sem euismod vitae. Nullam odio turpis, suscipit eget viverra a, rutrum nec tellus.

Curabitur vitae tincidunt lorem, id tincidunt massa. Nam mi massa, accumsan sit amet tellus in, venenatis facilisis est. Sed eu risus fermentum, varius nulla ac, ullamcorper lacus. Nulla facilisi. Praesent a ex nunc. Integer iaculis elit vitae libero pretium elementum. Nullam eu leo vitae neque ullamcorper fermentum a sed tellus. Ut sollicitudin, nibh a faucibus suscipit, enim dolor sodales ante, a accumsan neque diam a justo. Mauris vel orci vitae tellus iaculis dictum id in magna. Duis auctor id dui eget iaculis. Sed quis massa commodo, aliquet tellus quis, tristique nisl.

In luctus tempus quam tempus vulputate. Maecenas laoreet arcu diam, sed bibendum sapien egestas vitae. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Phasellus laoreet ipsum non ante cursus imperdiet. Interdum et malesuada fames ac ante ipsum primis in faucibus. Aenean laoreet accumsan mollis. Proin feugiat, lacus non congue tincidunt, erat arcu tincidunt metus, eget dignissim ante quam ut diam. Vivamus luctus aliquam placerat. Fusce risus ante, porta ac molestie at, laoreet et odio. Sed quis facilisis magna. Vestibulum sagittis nunc tellus, iaculis ultricies est cursus vitae. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Duis posuere venenatis posuere. Sed a gravida est, sit amet condimentum massa.

Morbi efficitur aliquam dui a imperdiet. Duis lacus enim, interdum a orci nec, varius porttitor est. Donec vel mi eu mauris sagittis hendrerit sed quis nunc. Vestibulum tortor leo, pulvinar in dignissim ut, ultricies sit amet est. Morbi et viverra magna, eu lacinia nisl. Cras vitae tincidunt dui, vel lobortis velit. Suspendisse tristique imperdiet odio, ac sodales velit pulvinar at. Sed diam enim, imperdiet sit amet mi sollicitudin, rutrum condimentum leo. In id est quis diam pellentesque pharetra sit amet eget tortor.

Etiam vehicula massa quam, sit amet consequat tellus tincidunt vitae. Nam semper ex ut hendrerit pretium. Nam eleifend tincidunt lectus, ut consectetur orci mattis id. Mauris eu sapien id turpis ullamcorper facilisis vitae nec mi. Ut metus augue, mollis nec faucibus sed, malesuada quis ipsum. Vivamus sit amet odio orci. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Donec posuere lectus risus, pharetra vestibulum turpis vulputate a. Nunc quis felis ullamcorper, blandit purus sit amet, ultricies nibh. Nam vitae sem imperdiet, posuere ex nec, rutrum metus. Aliquam congue magna id ultrices hendrerit.
"""

repair_json(paragraph)

Expected behavior

It would be fine for my use case to just return an empty string or None if this error is caught, or a certain recursion depth is reached.

File input flag

Is your feature request related to a problem? Please describe.
I'd prefer to be able to pass a file name to the function and have it automatically read the content and load it into the function, rather than implementing the logic myself.

Describe the solution you'd like
I'm willing to implement this in a PR using argparse if that's good with you; it would simplify my use case, in which the Python library is programmatically invoked from a Bash file.
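
A rough idea of what the argparse part could look like is sketched below. The argument name `filename` and the helpers `build_parser` and `read_input` are assumptions for illustration, not an existing interface of this library.

```python
import argparse
from pathlib import Path


def build_parser():
    """Build a parser with a single positional file name (hypothetical CLI)."""
    parser = argparse.ArgumentParser(description="Repair a JSON file (sketch).")
    parser.add_argument("filename", help="path of the file containing the JSON to repair")
    return parser


def read_input(argv):
    """Parse argv and return the text of the named file."""
    args = build_parser().parse_args(argv)
    return Path(args.filename).read_text(encoding="utf-8")


print(build_parser().parse_args(["broken.json"]).filename)  # broken.json
```

The text returned by `read_input(sys.argv[1:])` could then be passed to `repair_json`.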

Removing markdown code blocks?

Sometimes I've gotten responses from LLMs that look like this:

```json
{
  "msg": "test"
}
```

and this:

```
{
  "msg": "test"
}
```

Do you think it makes sense to add markdown backtick stripping to this library? =)
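
For reference, stripping the fence on the caller's side could look roughly like this. It is a sketch under simple assumptions (one fenced block, optional language tag); `strip_code_fence` is a hypothetical helper, and an inline body that starts with a bare word right after the fence would be mistaken for the language tag.

```python
import re

FENCE = "`" * 3  # built this way so the sketch can itself sit inside a fenced block

FENCED_BLOCK = re.compile(
    re.escape(FENCE) + r"[A-Za-z0-9_-]*[ \t]*\n?(.*?)" + re.escape(FENCE),
    re.DOTALL,
)


def strip_code_fence(text):
    """Return the body of the first fenced block, or the text unchanged."""
    match = FENCED_BLOCK.search(text)
    return match.group(1).strip() if match else text


with_tag = FENCE + 'json\n{\n  "msg": "test"\n}\n' + FENCE
without_tag = FENCE + '\n{\n  "msg": "test"\n}\n' + FENCE
print(strip_code_fence(with_tag))
print(strip_code_fence(without_tag))
```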

[Bug]: loading an ordinary string results in empty string object

Version of the library

0.25.3

Describe the bug

Hi there!

Just upgraded from v0.13 to current release and found that loading a raw string does not work anymore:

>>> json_repair.loads('test')
''

I expected

>>> json_repair.loads('test')
'test'

as a fallback. This breaks my setup, as I now have to check myself whether the parameter is just a raw string without any JSON object definition. Is this a bug, or is there a good reason for that change?

Thanks!

How to reproduce

json_repair.loads('test')
''

Expected behavior

json_repair.loads('test')
'test'
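
In the meantime, the old fallback can be approximated on the caller's side. This is a sketch only: `loads_with_fallback` is a hypothetical wrapper that takes the loader as an argument, and it assumes an empty-string result for non-empty input means nothing was parsed.

```python
def loads_with_fallback(loads, text):
    """Return loads(text), or text itself when non-empty input parses to ''."""
    result = loads(text)
    stripped = text.strip()
    # '""' is a genuine JSON empty string, so only fall back for other input.
    if result == "" and stripped and stripped != '""':
        return text
    return result


# Stand-in loader that mimics the behavior reported above for plain strings.
print(loads_with_fallback(lambda s: "", "test"))  # test
```

In real use the first argument would be `json_repair.loads`.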

[Bug]: Strings containing unescaped quotes followed by commas are incorrectly truncated

Version of the library

0.19.2

Describe the bug

Within a string with an unescaped quote followed at a later point by a comma, the string gets truncated after the second " character in the unescaped quote within the string. If this string is at the end of the JSON object and the string is not immediately followed by } (i.e. is followed by whitespace or e.g. a comma), then the final word in the string is parsed as a key with an empty (string) value.

This seems to relate to #44, but it seems the attempted fix for that bug report didn't fully resolve this.

How to reproduce

(Note, I've formatted the recovered/output JSON just to make it more readable)

For

>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"}')

the recovered JSON is:

{
  "lorem": "Lorem \"ipsum"
}

For any of the following examples

>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum" }')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"\n}')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum",}')

the recovered JSON is:

{
  "lorem": "Lorem \"ipsum",
  "laborum": ""
}

Removing the comma, the output matches what we'd expect:

>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum"}')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum" }')

yields

{
  "lorem": "Lorem \"ipsum\" excepteur sint suntid est laborum"
}

Expected behavior

>>> print(repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"}'))
{"lorem": "Lorem \"ipsum\" excepteur sint, suntid est laborum"}

>>> print(repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum" }'))
{"lorem": "Lorem \"ipsum\" excepteur sint, suntid est laborum"}

Not extracting specific JSON examples

Describe the bug
The library works great for simple JSON responses from LLMs, but for some responses, such as the one under "Expected return", it misses the JSON completely.

The result this package returned to me:
{ \"Summary"

Expected return

{
    "Summary": "The customer, Joey, contacted Avanser to inquire about a specific vehicle model. He was interested in purchasing a silver-colored car and wanted to know if it was available. The agent checked the inventory and found that they had a similar model with a purple color. After discussing the availability of the silver model, the agent offered to allocate one of their salespeople to call Joey back to discuss further. The customer expressed his satisfaction with the service and requested doorstep delivery nationwide.",
    "Brand": "JD",
    "Model": "Silver One on the Back One" (Note: This is not a real vehicle model, but rather a description provided by the customer),
    "Primary topic": "Vehicle Availability",
    "Primary topic explanation": "The customer wanted to know if a specific silver-colored car was available for purchase.",
    "Secondary topic": "Trade-in Options and Financing",
    "Secondary topic explanation": "The customer mentioned that he had cash and was interested in trading in his current vehicle, but the agent clarified that they did not have the necessary information on hand.",
    "Issue resolution": "Partially resolved",
    "Issue resolution explanation": "The agent checked the inventory and found a similar model with a purple color, but could not confirm the availability of the silver model. The customer was offered an alternative solution to discuss further with one of their salespeople."
}

Environment (please complete the following information):

OS: Linux (AWS SageMaker Notebook)
Browser: Chrome

repair_json incorrectly truncates JSON strings with escaped quotes and commas

Describe the bug
The valid JSON {"foo": "bar \"foo\", baz"} gets turned into the broken JSON {"foo": "bar \\"foo"} when using repair_json.
I think this is related to the escaped quotes and comma.

To Reproduce
Steps to reproduce the behavior:

Call repair_json('{"foo": "bar \"foo\", baz"}')
Check the output {"foo": "bar \\"foo"}

Expected behavior
Correct output {"foo": "bar \"foo\", baz"}

Python 3.7 is not supported

Describe the bug

First, sorry for my English.

This package uses the := operator.
That operator was introduced in Python 3.8, so the package does not work on Python 3.7.

However, installation on Python 3.7 succeeds, and pyproject.toml declares support for Python 3.7.
Either requires-python should be set to >=3.8, or the := operator should not be used.

json_repair/pyproject.toml

Line 14 in eaf67ab

requires-python = ">=3.7"

To Reproduce
Steps to reproduce the behavior:

Use python3.7.
Use this package
See error

Expected behavior
It should either run successfully on Python 3.7 or fail to install on Python 3.7.

Screenshots

OS: windows10
Python-Version: 3.7

Additional context

Example code(in python3.7):

import json

from json_repair import repair_json

json_string = r'{"a": 1 }{}'

json.loads(repair_json(json_string))
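
For the second option (not using :=), the rewrite is usually mechanical: assign first, then test. The sketch below uses a made-up function, `first_char`, purely to show the pattern; it is not taken from the library's source.

```python
import sys

# The := operator is only available from Python 3.8 onwards.
WALRUS_SUPPORTED = sys.version_info >= (3, 8)


def first_char(text):
    """Return the first non-whitespace character of text, or None."""
    # Python 3.8+ form:  if (stripped := text.lstrip()): return stripped[0]
    stripped = text.lstrip()  # Python 3.7-compatible: plain assignment first
    if stripped:
        return stripped[0]
    return None


print(WALRUS_SUPPORTED)
print(first_char("   {a"))
```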

Not working for basic example

Describe the bug
The following "broken" json:

[
    {
        "foo": "Foo bar baz",
        "tag": "#foo-bar-baz"
    },
    {
        "foo": "foo bar "foobar" foo bar baz.",
        "tag": "#foo-bar-foobar"
    }
]

is repaired well by: https://josdejong.github.io/jsonrepair/

but not by this library.

To Reproduce

>>> bad_json
'[\n    {\n        "foo": "Foo bar baz",\n        "tag": "#foo-bar-baz"\n    },\n    {\n        "foo": "foo bar "foobar" foo bar baz.",\n        "tag": "#foo-bar-foobar"\n    }\n]'
>>> json_repair.loads(bad_json)
[{'foo': 'Foo bar baz', 'tag': '#foo-bar-baz"\n    },\n    {\n        "foo', 'foo bar "foobar" foo bar baz.': 'tag', '#foo-bar-foobar': ''}]

Expected behavior
Expected output:

[
    {
        "foo": "Foo bar baz",
        "tag": "#foo-bar-baz"
    },
    {
        "foo": "foo bar \"foobar\" foo bar baz.",
        "tag": "#foo-bar-foobar"
    }
]

(as per https://josdejong.github.io/jsonrepair/)

output instead:

[{'foo': 'Foo bar baz', 'tag': '#foo-bar-baz"\n    },\n    {\n        "foo', 'foo bar "foobar" foo bar baz.': 'tag', '#foo-bar-foobar': ''}]

Issue with parsing when there is leading text

Describe the bug
Issue with parsing when there is leading text

To Reproduce

json_repair.loads("Based on the information extracted, here is the filled JSON output: ```json { 'a': 'b' } ```")
# this returns the same string inputted to the function

Expected behavior
It returns { 'a': 'b' }

I've noticed that the repair works well with trailing text, e.g.

json_repair.loads("```json { 'a': 'b' } ``` This output reflects the information given in the input.")
# returns {'a': 'b'} as expected
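
When the embedded JSON is itself valid, the leading text can be skipped with the standard library by trying `json.JSONDecoder.raw_decode` at each opening bracket. This is a sketch, not the library's approach: `extract_first_json` is a hypothetical helper, and the example uses double quotes because single-quoted keys are not valid JSON.

```python
import json

FENCE = "`" * 3


def extract_first_json(text):
    """Return the first valid JSON object or array embedded in text, else None."""
    decoder = json.JSONDecoder()
    for index, ch in enumerate(text):
        if ch in "{[":
            try:
                value, _end = decoder.raw_decode(text, index)
            except json.JSONDecodeError:
                continue  # not valid from here, try the next bracket
            return value
    return None


reply = ("Based on the information extracted, here is the filled JSON output: "
         + FENCE + 'json { "a": "b" } ' + FENCE)
print(extract_first_json(reply))  # {'a': 'b'}
```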

[Bug]: json_repair does not work for following json

Thanks for the library! :)

Version of the library

0.20.0

Describe the bug

After parsing the "graphics" object is not in the correct hierarchy.

Also, the bonus with bool variables does not seem to be parsed correctly.

How to reproduce

Use json_repair.loads() with following json:
https://raw.githubusercontent.com/vcmi-mods/tides-of-war/vcmi-1.5/Mods/alternative-creatures/content/config/creatures/rampart/dryad.json

Compare object for example with json5 library output.

Getting '_io.TextIOWrapper' object has no attribute 'strip'; not sure if I'm using it wrong or if it's a Python version conflict

When I try the example of:

import sys
import json_repair
from json_repair import repair_json

JSON_PATH = "/Path/To/JSON/File.txt"

print(sys.version)

try:
    file_descriptor = open(JSON_PATH, 'r')
except OSError:
    ...

try:
    with file_descriptor:
        decoded_object = json_repair.load(file_descriptor)
except Exception as e:
    print("Repairing logfile failed")
    print(f"An exception occurred: {e}")

I get the following output:

3.9.2 (default, Feb 28 2021, 17:03:44) 
[GCC 10.2.1 20210110]
Repairing logfile failed
An exception occurred: '_io.TextIOWrapper' object has no attribute 'strip'

In this case the JSON file referenced by JSON_PATH is valid. I've also tried breaking it, but I always get this exception.
As you can see, I'm running Python 3; not sure if that could cause the problem?
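
The error message suggests the file object itself was treated like a string: file objects have no `strip` method, while the text they contain does. A possible workaround, assuming the string-based `json_repair.loads` shown in other issues, is to read the content first. The helper name `load_from_file_object` is hypothetical, and `json.loads` is only the default so the sketch runs without extra packages.

```python
import io
import json


def load_from_file_object(fp, loads=json.loads):
    """Read the whole file object and hand the resulting str to a string-based loader."""
    return loads(fp.read())


fake_file = io.StringIO('{"a": 1}')
print(hasattr(fake_file, "strip"))       # False: file objects have no strip()
print(load_from_file_object(fake_file))  # {'a': 1}
```

With the library, the call would be `load_from_file_object(file_descriptor, json_repair.loads)`.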

Special invalid json text: no response for a long time

# test
gpt_content = """```json
{
    "Name": {
        "en": "Jia-Ming Li",
        "zh": "李家明",
        "de": "Jia-Ming Li"
    },
    "Contact": {
        "en": "Phone: 010-62788597\nFax: 010-62788597\nEmail: [email protected]",
        "zh": "电话：010-62788597\n传真：010-62788597\n电子邮件：[email protected]",
        "de": "Telefon: 010-62788597\nFax: 010-62788597\nE-Mail: [email protected]"
    },
    "Language Ability": {
        "en": [
            "English",
            "Chinese"
        ],
        "zh": [
            "英语",
            "中文"
        ],
        "de": [
            "Englisch",
            "Chinesisch"
        ]
    },
    "Province": {
        "en": "Beijing",
        "zh": "北京",
        "de": "Peking"
    },
    "Title": {
        "en": "Professor, Director of the Center for Atomic and Molecular Nanosciences, Tsinghua University",
        "zh": "教授，清华大学原子分子纳米科学研究中心主任",
        "de": "Professor, Direktor des Zentrums für atomare und molekulare Nanowissenschaften, Tsinghua-Universität"
    },
    "Academic Background & Achievements": {
        "en": [
            "1968 - B.S. in Engineering, Taiwan University",
            "1974 - Ph.D., University of Chicago",
            "Academician, Chinese Academy of Sciences"
        ],
        "zh": [
            "1968 - **大学工程学士",
            "1974 - 美国芝加哥大学博士",
            "**科学院院士"
        ],
        "de": [
            "1968 - B.S. in Ingenieurwissenschaften, Universität Taiwan",
            "1974 - Ph.D., Universität von Chicago",
            "Akademiker, Chinesische Akademie der Wissenschaften"
        ]
    },
    "Work Experience": {
        "en": [
            "1974 - Research Associate, Department of Physics, University of Chicago",
            "1975-1976 - Research Associate, Department of Physics and Astronomy, University of Pittsburgh",
            "1977-1978 - Senior Research Associate, Laser Energy Research Institute, University of Rochester",
            "1979-1982 - Associate Researcher, Institute of Physics, Chinese Academy of Sciences",
            "1983-Present - Researcher, Institute of Physics, Chinese Academy of Sciences",
            "1997-Present - Professor, Director of the Center for Atomic and Molecular Nanosciences, Department of Physics, Tsinghua University",
            "2003-Present - Professor, Department of Physics, Shanghai Jiao Tong University"
        ],
        "zh": [
            "1974 - 美国芝加哥大学物理系，研究助理",
            "1975-1976 - 美国匹兹堡大学物理天文系，研究助理",
            "1977-1978 - 美国罗彻斯特大学激光能量研究所，高级研究助理",
            "1979-1982 - **科学院物理研究所，副研究员",
            "1983至今 - **科学院物理研究所，研究员",
            "1997至今 - 清华大学物理系，原子分子纳米科学研究中心，教授，中心主任",
            "2003至今 - 上海交通大学物理系，教授"
        ],
        "de": [
            "1974 - Forschungsassistent, Abteilung für Physik, Universität Chicago",
            "1975-1976 - Forschungsassistent, Abteilung für Physik und Astronomie, Universität Pittsburgh",
            "1977-1978 - Senior Research Associate, Laser Energy Research Institute, Universität Rochester",
            "1979-1982 - Associate Researcher, Institut für Physik, Chinesische Akademie der Wissenschaften",
            "1983-heute - Forscher, Institut für Physik, Chinesische Akademie der Wissenschaften",
            "1997-heute - Professor, Direktor des Zentrums für atomare und molekulare Nanowissenschaften, Abteilung für Physik, Tsinghua-Universität",
            "2003-heute - Professor, Abteilung für Physik, Shanghai Jiao Tong Universität"
        ]
    },
    "Awards": {
        "en": [
            "1986 - Kastler Prize, International Centre for Theoretical Physics",
            "1990 - Second Class Prize, Natural Science, Chinese Academy of Sciences",
            "1991 - Outstanding Young Expert, Chinese Academy of Sciences",
            "1992 - Second Class Prize, Natural Science, Chinese Academy of Sciences",
            "1994 - Advanced Individual in Scientific Research under the 863 Program, Ministry of Science and Technology of China",
            "2001 - Advanced Individual Award for the 15th Anniversary of the 863 Program, Ministry of Science and Technology of China"
        ],
        "zh": [
            "1986 - 国际理论物理中心的 Kastler 奖",
            "1990 - **科学院自然科学奖二等奖",
            "1991 - **科学院有突出贡献的中青年专家",
            "1992 - **科学院自然科学奖二等奖",
            "1994 - 国防科学技术工业委员会评为“在863计划科研工作中的先进个人”",
            "2001 - **人民解放军总装备部和国家科技部授予“863计划十五周年先进个人奖”"
        ],
        "de": [
            "1986 - Kastler-Preis, Internationales Zentrum für Theoretische Physik",
            "1990 - Zweiter Klasse Preis, Naturwissenschaften, Chinesische Akademie der Wissenschaften",
            "1991 - Herausragender junger Experte, Chinesische Akademie der Wissenschaften",
            "1992 - Zweiter Klasse Preis, Naturwissenschaften, Chinesische Akademie der Wissenschaften",
            "1994 - Fortgeschrittene Einzelperson in der wissenschaftlichen Forschung unter dem 863-Programm, Ministerium für Wissenschaft und Technologie von China",
            "2001 - Fortgeschrittene Einzelperson Auszeichnung zum 15. Jahrestag des 863-Programms, Ministerium für Wissenschaft und Technologie von China"
        ]
    },
    "Areas of Focus": {
        "en": [
            "Atomic and molecular physics",
            "Computational physics",
            "Theoretical physics",
            "Nanoscience"
        ],
        "zh": [
            "原子分子物理",
            "计算物理",
            "理论物理",
            "纳米科学"
        ],
        "de": [
            "Atom- und Molekülphysik",
            "Computational Physik",
            "Theoretische Physik",
            "Nanowissenschaft"
        ]
    },
    "Keywords for Area of Focus": {
        "en": [
            "quantum theory", "computational methods", "atomic properties", "molecular systems", "clusters", "physical properties", "dynamics", "theoretical calculations", "nanotubes", "semiconductors"
        ],
        "zh": [
            "量子理论", "计算方法", "原子属性", "分子系统", "团簇", "物理属性", "动力学", "理论计算", "纳米管", "半导体"
        ],
        "de": [
            "Quantentheorie", "Berechnungsmethoden", "atomare Eigenschaften", "molekulare Systeme", "Cluster", "physikalische Eigenschaften", "Dynamik", "theoretische Berechnungen", "Nanoröhren", "Halbleiter"
        ]
    },
    "Publications": {
        "en": [
            {
                "Title": "Spectroscopy and Collision Theory: The Ar Absorption Spectrum",
                "Author": "C.M.Lee (Jia-Ming Li), K.T.Lu",
                "Publish Date": "1973-01-01"
            },

            {
                "Title": "Variational Calculation of R-matrix: Application to Ar Photoabsorption",
                "Author": "U.Fano, C.M.Lee (Jia-Ming Li)",
                "Publish Date": "1973-01-01"
            },

            {
                "Title": "Spectroscopy and Collision Theory: Atomic Eigenchannel Calculation by a Hartree-Fock-Roothaan Method",
                "Author": "C.M.Lee (Jia-Ming Li)",
                "Publish Date": "1974-01-01"
            },

            {
                "Title": "Spin Polarization and Angular Distribution of Photoelectrons in Jacob-Wick Helicity Formalism: Application to Autoionzation Resonances",
                "Author": "C.M.Lee (Jia-Ming Li)",
                "Publish Date": "1974-01-01"
            },

            {
                "Title": "Multichannel Photodetachment Theory",
                "Author": "C.M.Lee (Jia-Ming Li)",
                "Publish Date": "1975-01-01"
            },

            {
                "Title": "Comment on Structure near the Cut-off Of the Continuous X-ray Spectrum of Lanthanum",
                "Author": "C.M.Lee (Jia-Ming Li), R.H.Pratt",
                "Publish Date": "1975-01-01"
            },

            {
                "Title": "Radiative Capture of High-energy Electrons",
                "Author": "C.M.Lee (Jia-Ming Li), R.H.Pratt",
                "Publish Date": "1975-01-01"
            },

            {
                "Title": "The Electron Bremsstrahlung Spectrum 1---500 keV",
                "Author": "C.M.Lee (Jia-Ming Li), L.Kissel, R.H.Pratt, H.K.Tseng",
                "Publish Date": "1976-01-01"
            },

            {
                "Title": "Radiative Electron Capture by Mo Ions",
                "Author": "C.M.Lee (Jia-Ming Li), R.H.Pratt",
                "Publish Date": "1976-01-01"
            },

            {
                "Title": "Multichannel Dissociative Recombination Theory",
                "Author": "C.M.Lee (Jia-Ming Li)",
                "Publish Date": "1977-01-01"
            },

            {
                "Title": "Application of Low Energy Theorem in Electron Bremsstrahlung",
                "Author": "R.H.Pratt, C.M.Lee (Jia-Ming Li)",
                "Publish Date": "1977-01-01"
            },

            {
                "Title": "Bremsstrahlung Spectrum from Atomic Ions",
                "Author": "C.M.Lee (Jia-Ming Li), R.H.Pratt, H.K.Tseng",
                "Publish Date": "1977-01-01"
            },

            {
                "Title": "Radiative Charge Exchange Process in High-energy Ion-Atom Collisions",
                "Author": "C.M.Lee (Jia-Ming Li)",
                "Publish Date": "1978-01-01"
            },

            {
                "Title": "On the Dispresion Relation for Electron-Atom Scattering",
                "Author": "E.Gerjuoy, C.M.Lee (Jia-Ming Li)",
                "Publish Date": "1978-01-01"
            },

            {
                "Title": "Properties of Matter at High Pressures and Temperatures",
                "Author": "C.M.Lee (Jia-Ming Li), E.Thorsos",
                "Publish Date": "1978-01-01"
            },

            {
                "Title": "Bremsstrahlung Energy Spectra from Electrons of Kinetic Energy 1keV~$\le$~T~$\le$~200~keV incident on Neutral Atoms 2~$\le$~Z~$\le$~92",
                "Author": "R.H.Pratt,H.K.Tseng, C.M.Lee (Jia-Ming Li), L.kissel",
                "Publish Date": "1977-01-01"
            },

            {
                "Title": "Measurement of Compressed Core Density of Laser-imploded Target by X-ray Continuum Edge Shift",
                "Author": "C.M.Lee (Jia-Ming Li), A.Hauer",
                "Publish Date": "1978-01-01"
            },

            {
                "Title": "Electron Bremsstrahlung Angular Distribution in the 1---500 keV Energy Range",
                "Author": "H.K.Tseng, R.H.Pratt, C.M.Lee (Jia-Ming Li)",
                "Publish Date": "1979-01-01"
            },

            {
                "Title": "Explosive-pusher-type Laser Compression Experiment with Neon-filled Microballons",
                "Author": "B.Yaakobi, D.Steel, E.Thorsos, A.Hauer, B.Perry, S.Skupsky, J.Geiger, C.M.Lee (Jia-Ming Li), S.Letzring, J.Rizzo, T.Mukaiyama, E.Lazarus, G.Halpern, H.Deckman, J.Delettrez",
                "Publish Date": "1979-01-01"
            },

            {
                "Title": "Relativistic Random Phase Approximation",
                "Author": "W.R.Johoson, C.D.Lin, K.T.Cheng, C.M.Lee (Jia-Ming Li)",
                "Publish Date": "1980-01-01"
            },

            {
                "Title": "Electronic Impact Excitation of Li-Like Ions",
                "Author": "Jia-Ming Li",
                "Publish Date": "1980-01-01"
            },

            {
                "Title": "Scattering Theory and Specctroscopy: Relativistic Multichannel Quantum Defect Theory",
                "Author": "C.M.Lee (Jia-Ming Li), W.R.Johnson",
                "Publish Date": "1980-01-01"
            },

            {
                "Title": "Systematic Variation of Line-shift of K Radiation from Atomic Ions",
                "Author": "Jia-Ming Li, Zhong-Xin Zhao",
                "Publish Date": "1981-01-01"
            },

            {
                "Title": "Variation in L, M, N Inner-shell Electron Binding Energies of Rare-earth Elements in Valence Transition",
                "Author": "Jia-Ming Li, Zhong-Xin Zhao",
                "Publish Date": "1982-01-01"
            },

            {
                "Title": "Multichannel Inverse Dielectronic Recombination Theory",
                "Author": "Jia-Ming Li",
                "Publish Date": "1983-01-01"
            },

            {
                "Title": "Quantum Defect Theory:Rydberg States of Molecules NO",
                "Author": "Jia-Ming Li, Vo Ky Lan",
                "Publish Date": "1983-01-01"
            },

            {
                "Title": "Generalized Oscillator Strength Density",
                "Author": "Bo-Gang Tian, Jia-Ming Li",
                "Publish Date": "1984-01-01"
            },

            {
                "Title": "Theoretical Calculations of Atomic Two-photon Ionization Processes",
                "Author": "Ying-Jian Wu, Jia-Ming Li",
                "Publish Date": "1985-01-01"
            },

            {
                "Title": "Non-relativistic and Relativistic Atomic Configuration Theory: Excitation Energies and Radiative Transition Probabilities",
                "Author": "Zhong-Xin Zhao, Jia-Ming Li",
                "Publish Date": "1985-01-01"
            },

            {
                "Title": "Minima of Oscillator Strenth Densities for Excited Atoms",
                "Author": "Xiao-Ling Liang, Jia-Ming Li",
                "Publish Date": "1985-01-01"
            },

            {
                "Title": "Scaling Relation of Generlized Oscillator Strength Densities along Isoelectronic Sequence",
                "Author": "Xiao-Chuan Pan, Jia-Ming Li",
                "Publish Date": "1985-01-01"
            },

            {
                "Title": "Eletronic Structure of Atomic Ions With the 4f electrons",
                "Author": "Zhong-Xin Zhao, Jia-Ming Li",
                "Publish Date": "1985-01-01"
            },

            {
                "Title": "Ionization Channels of Superexcited Molecules",
                "Author": "Xiao-Ling Liang, Xiao-Chuan Pan, Jia-Ming Li",
                "Publish Date": "1985-01-01"
            },

            {
                "Title": "Progress Report on Quantum Defect Theory: Dynamics of Excited Atoms and Molecules",
                "Author": "Jia-Ming Li",
                "Publish Date": "1986-01-01"
            },

            {
                "Title": "Eletronic Impact Excitation Cross Sections and Rates: I Spin Allowed Excitation Processes",
                "Author": "Bo-Gang Tian ,Jia-Ming Li",
                "Publish Date": "1986-01-01"
            },

            {
                "Title": "Current Topic in Atomic Physics: Studies on Excited Atoms and Molecules",
                "Author": "Jia-Ming Li",
                "Publish Date": "1986-01-01"
            },

            {
                "Title":"""

decoded_object = json_repair.repair_json(gpt_content, return_objects=True, logging=True)

print(decoded_object)

It doesn't work: the call hangs and produces no response for a long time.

Infinite loop with open array and closed parent element

Describe the bug
The library runs into an infinite loop when calling repair_json('{foo: [}').

To Reproduce
Steps to reproduce the behavior:

Call repair_json('{foo: [}')

Expected behavior
Output of fixed json: {foo: []}

Unfortunately, I don't have the time to create a PR so I just report the bug here.
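
Hangs like this one typically come from a parsing loop whose position stops advancing. The following is only a minimal sketch of a progress guard that turns such a hang into an error. `parse_items` and `parse_one` are hypothetical names used for illustration and are not json_repair's internals.

```python
from typing import Any, Callable, List, Tuple


def parse_items(
    text: str, parse_one: Callable[[str, int], Tuple[Any, int]]
) -> List[Any]:
    """Repeatedly call parse_one, failing loudly if it stops consuming input.

    parse_one is a hypothetical element parser that returns
    (parsed value, index of the next unread character).
    """
    items = []
    index = 0
    while index < len(text):
        value, new_index = parse_one(text, index)
        if new_index <= index:
            # Without this check a stuck parse_one would loop forever.
            raise ValueError(f"parser made no progress at index {index}")
        items.append(value)
        index = new_index
    return items
```

A caller can then catch the ValueError and fall back to returning an empty object instead of hanging.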

Infinite loop when key is empty

Hi, first of all, thanks for this very useful library!
My model occasionally produces JSON strings with empty keys, so I encountered the following issue:

Describe the bug
When a key in the JSON string is empty, the library runs into an infinite loop in 'parse_object'.

To Reproduce
Steps to reproduce the behavior:

Call 'repair_json' with a key in the JSON string being empty,
for example: repair_json("{'': 'test'}")

Expected behavior
Either return 'Invalid JSON format' or substitute the empty key? Not sure what's best here

Adding missing escape for double quote

Hi @mangiucugna,

Thank you for your efforts on this. I've encountered a similar issue with the output from the LLM. It seems that the repair_json function isn't handling certain cases correctly.

For instance, when trying to repair the following JSON string:

json_str = '{\n"html": "<h3 id="title">Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>"}'
data = repair_json(json_str, return_objects=True)

The current output is:

{
    'html': '<h3 id=', 
    'techniek': 'h3>',
    'title': u'Waarom meer dan 200 Technical Experts - '
}

However, the expected output should be:

{
    'html': '<h3 id="title">Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>'
}

It seems like the function is having trouble handling certain characters or nested structures properly. Would you mind looking into this further?
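
One possible heuristic for this case, sketched under assumptions: treat a double quote as the end of a value only when the next non-blank character is a comma, a closing brace or bracket, or the end of input. `find_closing_quote` is a hypothetical helper, not part of json_repair, and it ignores backslash-escaped quotes for brevity.

```python
def find_closing_quote(text: str, start: int) -> int:
    """Guess which double quote closes the string whose content starts at
    text[start]. Returns the index of that quote, or -1 if none qualifies."""
    index = text.find('"', start)
    while index != -1:
        rest = text[index + 1:].lstrip()
        # A plausible terminator is followed by a structural character
        # or by nothing at all.
        if rest == "" or rest[0] in ",}]":
            return index
        index = text.find('"', index + 1)
    return -1


json_str = '{\n"html": "<h3 id="title">Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>"}'
value_start = json_str.index('"<h3') + 1
value_end = find_closing_quote(json_str, value_start)
print(json_str[value_start:value_end])
```

The heuristic will misjudge a value whose text really does contain a quote followed by a comma, so it is a trade-off rather than a complete fix.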

Thank you again for your attention to this matter.

Originally posted by @nikolaysm in #20 (comment)

Infinite Loop when the input is `{"`

Describe the bug
I was streaming a JSON response from the OpenAI API and repairing it with this library. When the JSON input was {" my program got stuck. After some investigation, I found that the cause was an infinite loop in this library. Attached below are some screenshots showing what happens and the function it happens in. I added some print() calls to the library code to trace it.

To Reproduce
Steps to reproduce the behavior:

Import the library
Set your JSON string to {"
Call the repair_json function from the library on it
The program hangs in an infinite loop

Expected behavior
Should throw an error or have an empty object like {}.

Screenshots

Desktop (please complete the following information):

OS: MacOS Sonoma
Version 0.4.1

Problem with missing Value or comment in Key-Value pair

Describe the bug
When there is a comment in the json or a value is missing, the tool creates a new k-v pair.

To Reproduce
{
"value_1": true, SHOULD_NOT_EXIST
"value_2": "data"
}

TRANSFORMS TO

{
"value_1": true,
"SHOULD_NOT_EXIST\n\n": "alue_2",
"": "data",
"}": ""
}

AND

{
"value_1":
"value_2": "data"
}

TRANSFORMS TO

{
"value_1": "value_2",
"": "data",
"}": ""
}

Expected behavior
Those are the 2 expected results:

{
"value_1": true,
"value_2": "data"
}

{
"value_1": "",
"value_2": "data"
}

Desktop (please complete the following information):

OS: Debian
Python 3.11.2
Version 0.2.0

Additional context
The JSON files were generated with Llama 2 and Mistral.

Quote problem - failed JSON

First of all, thank you for this amazing library! Just wanted to report a case, I got on my 10th try.
Describe the bug
ChatGPT returned the following string (I'm using JSON function call method, while using their API)
{ "content": "[LINK]("https://google.com")" }

To Reproduce
Steps to reproduce the behavior:

Try to fix this string, using json_repair

Expected behavior
String should be fixed to:
{ "content": "[LINK](https://google.com)" }
or similar

Additional context
I'm using the latest version
Traceback (line numbers may differ):
[1930] Failed to execute script 'jsonfix' due to unhandled exception!
Traceback (most recent call last):
File "jsonfix.py", line 281, in repair_json
File "json/__init__.py", line 348, in loads
File "json/decoder.py", line 337, in decode
File "json/decoder.py", line 353, in raw_decode
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 2 column 20 (char 22)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "jsonfix.py", line 310, in <module>
File "jsonfix.py", line 283, in repair_json
File "jsonfix.py", line 17, in parse
File "jsonfix.py", line 29, in parse_json
File "jsonfix.py", line 106, in parse_object
File "jsonfix.py", line 59, in parse_json
ValueError: Invalid JSON format

Whitespace before a closing ] breaks fixing unclosed nested arrays

Describe the bug

JSON like this:

[[1 ]

produces an Invalid JSON format exception instead of repairing the JSON. It's the trailing whitespace that is the problem; this:

[[1]

is fine.

(This is a reduced version of JSON produced by ChatGPT where the original error occurred.)

To Reproduce
Try fixing the above JSON.

Note that there are two related issues:

the code only checks for space (not other whitespace like carriage returns)
space at the end of an array is not handled
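
Both points come down to how whitespace is skipped. A minimal sketch, assuming a hypothetical `skip_whitespace` helper (not the library's actual code), that uses `str.isspace()` so carriage returns, tabs and newlines are treated like spaces:

```python
def skip_whitespace(text: str, index: int) -> int:
    """Return the index of the first non-whitespace character at or after
    index. str.isspace() covers spaces, tabs, newlines and carriage returns."""
    while index < len(text) and text[index].isspace():
        index += 1
    return index


text = "[[1 \r\n\t]"
next_index = skip_whitespace(text, 3)
print(text[next_index])  # the closing bracket that follows the whitespace
```

Calling such a helper before checking for the closing bracket would handle trailing whitespace at the end of an array.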

Additional context
PR which fixes this coming shortly.

[Feature Request]: Get multiple valid fixes for an invalid JSON string

Describe the solution you'd like

Imagine your LLM spits out:

{"foo": {"bar": {"baz": 1}, "zig": {"zag": 2}}

This JSON is missing a bracket. There are two places it could go that would fix the issue:

{"foo": {"bar": {"baz": 1}, "zig": {"zag": 2}}
|-------------------------^--------or---------^|

Currently, json_repair.loads() returns the valid JSON with the added closing bracket at the end. But my pydantic model actually requires the other solution.

So if I could do:

valid_jsons = json_repair.parse_all(invalid_llm_output)  # proposed API, does not exist yet
validated_output = None
for valid_json in valid_jsons:
    try:
        validated_output = MyExpectedOutputModel.parse(valid_json)
        break  # stop at the first candidate that validates
    except ValidationError:
        continue

if not validated_output:
    ... do something ...

Context

That would help make this an even more powerful tool for avoiding a second LLM call just to fix the JSON.
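
Until something like that exists, the idea can be approximated outside the library. The sketch below is a brute-force illustration only: `candidate_repairs` is a hypothetical helper that tries a single missing closer at every position and keeps whatever the standard `json` module accepts. It assumes exactly one closing character is missing.

```python
import json
from typing import Any, List


def candidate_repairs(broken: str, closer: str = "}") -> List[Any]:
    """Insert one closer at every position and collect each distinct
    candidate string that parses as valid JSON."""
    seen = set()
    results = []
    for position in range(len(broken) + 1):
        candidate = broken[:position] + closer + broken[position:]
        if candidate in seen:
            continue  # the same string can arise from adjacent positions
        seen.add(candidate)
        try:
            results.append(json.loads(candidate))
        except json.JSONDecodeError:
            continue
    return results


broken = '{"foo": {"bar": {"baz": 1}, "zig": {"zag": 2}}'
for fixed in candidate_repairs(broken):
    print(fixed)
```

Each returned object can then be passed to the validation loop above. The cost grows with input length, so this is only practical for short outputs.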

Newlines are not respected in strings, this breaks gpt functions return content

I'm testing an automation that scaffolds a project based on a txt file with architecture details, using a twitter posting tool as an example (I know, this wouldn't really work because they've changed their API since 2021).

Originally posted by @cooleydw494 in #3 (reply in thread)

[Bug]:

Version of the library

0.20.1

Describe the bug

When some of the string elements in a list are missing double quotes, the current repair logic repairs all the elements as a single unit instead of one by one.

I haven't found a suitable solution either. I was thinking of treating a comma as the start of a new list element, but commas are ambiguous: sometimes a comma is an element delimiter, and sometimes it is just punctuation inside a sentence.

I wonder whether this could be fixed more precisely by adding a more detailed context check.

How to reproduce

{
"people": ["Rilee Smith", travel bloggers, Matthias Keller, Ben Harrell"],
"additional_research_needed": [
"Current AI trends in the travel industry for 2024.",
"User satisfaction and feedback on AI travel planning tools like ChatGPT, Copilot, and Gemini.",
"Latest advancements in the AI-driven content marketing landscape."
]
}

Expected behavior

The repair should be more precise in using the surrounding JSON context to decide how to fix each element.

Price like numbers not properly parsed

Describe the bug
Price-like numbers are not parsed properly.

To Reproduce
from json_repair import repair_json
repair_json("{'price': [105,000.00']}")

Expected behavior
return "{'price': ['105,000.00']}" (or return "{'price': [105,000.00]}"; the exact format doesn't matter since a quote is lost)
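
One way to get that behavior, sketched under assumptions: recognize tokens with thousands separators before attempting numeric conversion and keep them as strings. `PRICE_LIKE` and `coerce_token` are hypothetical names for illustration, not json_repair's implementation.

```python
import re

# Digits grouped in threes by commas, with an optional decimal part.
PRICE_LIKE = re.compile(r"\d{1,3}(?:,\d{3})+(?:\.\d+)?")


def coerce_token(token: str):
    """Convert a bare token to int or float, but keep price-like tokens
    such as 105,000.00 intact as strings."""
    token = token.strip().strip("'\"")  # drop stray surrounding quotes
    if PRICE_LIKE.fullmatch(token):
        return token
    try:
        return int(token)
    except ValueError:
        pass
    try:
        return float(token)
    except ValueError:
        return token
```

With this check in place, a token like 105,000.00' comes back as the string 105,000.00 rather than being split at the comma.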

Desktop (please complete the following information):

Version Python==3.9.18 json_repair==0.10.1

exchange of a friendly link request

Hello,

I would like to extend my gratitude for the exceptional project you have created. Inspired by your work, I have developed a Go language version of json-repair, which can be found at https://github.com/RealAlexandreAI/json-repair. With the aim of fostering a collaborative development environment and providing developers with access to libraries in various programming languages, I propose the exchange of a friendly link between our projects.

This initiative is particularly relevant in addressing the challenge of handling the chaotic JSON strings often generated by Large Language Models (LLMs). By linking our repositories, we can offer a more comprehensive solution to the developer community, enabling them to effectively manage and repair JSON data across different platforms and programming languages.

I look forward to your positive response and the potential benefits that such a collaboration could bring to both our projects and their users.

Parsing comments with float point number

Consider the following code:

json_repair.loads("""{ \n "b": "xxxxx" // comment 1.2 \n }""")

It should be {'b': 'xxxxx'}.
But the library outputs {'b': 'xxxxx', 0.2: ''}.
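
As a workaround sketch, line comments can be stripped before parsing. `strip_line_comments` below is a hypothetical pre-processing helper, not part of json_repair. It tracks whether it is inside a double-quoted string so that text such as a URL containing // is left alone.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside double-quoted strings."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                # Copy the escaped character so \" does not end the string.
                out.append(text[i + 1])
                i += 2
                continue
            if ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            newline = text.find("\n", i)
            if newline == -1:
                break  # the comment runs to the end of the input
            i = newline  # resume at the newline, which is kept
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)


cleaned = strip_line_comments("""{ \n "b": "xxxxx" // comment 1.2 \n }""")
print(json.loads(cleaned))  # {'b': 'xxxxx'}
```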

Fixing single quote problems from llm when using loads

Hola,
Really useful library
Small LLMs sometimes output JSON with single quotes, and they are not always consistent about it.

Here is how I deal with it right now. It's not super efficient, but it works for now.

import re

def fix_single_quotes(json_str):
    # This pattern matches single-quoted keys and values in a JSON string
    # (the * quantifiers appear to have been lost from the posted pattern)
    pattern = r"'([^']*)'(?=[\s,}:])|(?<=[:{,]\s)'([^']*)'"

    # Use a list to store the parts of the string that will not be changed
    parts = []
    last_end = 0

    # Iterate over all matches
    for match in re.finditer(pattern, json_str):
        # Add the part of the string before the match to the list
        parts.append(json_str[last_end:match.start()])
        # Only one of the two groups matches; pick whichever is not None
        # (group(1 or 2) always evaluated to group(1))
        content = match.group(1) if match.group(1) is not None else match.group(2)
        # Replace the single quotes around the match with double quotes and add it to the list
        parts.append('"' + content + '"')
        last_end = match.end()

    # Add the part of the string after the last match to the list
    parts.append(json_str[last_end:])

    # Join the parts into a single string
    return ''.join(parts)

Feel free to improve it.
Best,

Handle missing comma

Describe the bug
Can we handle missing comma when parsing dict?

To Reproduce

json_repair.loads('''{
  "number": 1,
  "reason": "According..."
  "ans": "YES"
}''')

This code produces {'number': 1, 'reason': 'According..."\n "ans": "YES'}.
But we expect {'number': 1, 'reason': 'According...', 'ans': 'YES'}.
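
A possible pre-processing workaround, sketched with the standard `re` module: when a closing quote ends a line and the next line begins with an opening quote, assume a comma was dropped between them. `insert_missing_commas` is a hypothetical helper, not the library's fix, and it can misfire on string values that legitimately span lines.

```python
import json
import re


def insert_missing_commas(text: str) -> str:
    """Add a comma between a quote that ends one line and a quote that
    starts the next line."""
    return re.sub(r'(")([ \t]*\r?\n\s*")', r"\1,\2", text)


broken = '''{
  "number": 1,
  "reason": "According..."
  "ans": "YES"
}'''
print(json.loads(insert_missing_commas(broken)))
# {'number': 1, 'reason': 'According...', 'ans': 'YES'}
```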

[Bug]: Unable to handle double quotes in start of string

Version of the library

0.25.2

Describe the bug

Working with LLMs (Llama) and having it produce some output in JSON format. There is an edge case I have encountered when working with Chinese headings, where it often emits a doubled opening quote on the "title" property in the JSON string. This breaks the formatting.

Using the json_repair library should fix this, but instead it returns an empty string in the title.

Output:
[{"chapter_id": 1, "starting_time_stamp": "0:00:00", "title": ""}, {"chapter_id": 2, "starting_time_stamp": "0:01:00", "title": ""}, {"chapter_id": 3, "starting_time_stamp": "0:02:00", "title": ""}, {"chapter_id": 4, "starting_time_stamp": "0:04:00", "title": ""}, {"chapter_id": 5, "starting_time_stamp": "0:06:00", "title": ""}, {"chapter_id": 6, "starting_time_stamp": "0:09:00", "title": ""}, {"chapter_id": 7, "starting_time_stamp": "0:11:00", "title": ""}]

How to reproduce

Use the following JSON.

raw_json = """[
  {
    "chapter_id": 1,
    "starting_time_stamp": "0:00:00",
    "title": ""国内苹果用户和安卓用户使用TikTok的各种方法"
  },
  {
    "chapter_id": 2,
    "starting_time_stamp": "0:01:00",
    "title": ""苹果安卓通用最简单的方法"
  },
  {
    "chapter_id": 3,
    "starting_time_stamp": "0:02:00",
    "title": ""不插卡使用"
  },
  {
    "chapter_id": 4,
    "starting_time_stamp": "0:04:00",
    "title": ""免拔卡模式"
  },
  {
    "chapter_id": 5,
    "starting_time_stamp": "0:06:00",
    "title": ""MITM抓包安装支持MITM的旧版TikTok客户端"
  },
  {
    "chapter_id": 6,
    "starting_time_stamp": "0:09:00",
    "title": ""安卓用户使用修改版"
  },
  {
    "chapter_id": 7,
    "starting_time_stamp": "0:11:00",
    "title": ""苹果端无视SIM卡地区限制的第三方修改版"
  }
]"""

Calling code:

valid_json = repair_json(raw_json)
print(valid_json)

Expected behavior

Expected one of the two quotes at the start of the "title" string to be removed.
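
As a stopgap, the doubled quote can be collapsed before the text reaches the repair step. This is a sketch with a hypothetical helper, `collapse_doubled_opening_quote`: it only rewrites `""` after a colon when more text follows, so genuinely empty strings are left untouched.

```python
import json
import re


def collapse_doubled_opening_quote(text: str) -> str:
    """Turn  : ""text"  into  : "text"  while leaving empty strings such as
    : "",  or  : ""}  alone."""
    return re.sub(r'(:\s*)""(?=[^\s,}\]])', r'\1"', text)


raw = '[{"chapter_id": 3, "title": ""不插卡使用"}, {"chapter_id": 4, "title": ""}]'
print(json.loads(collapse_doubled_opening_quote(raw)))
```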

[Bug]: Failing to parse truncated JSON (due to LLM repetition and max_tokens)

Version of the library

0.25.3

Describe the bug

Not sure whether to call this a bug or a feature request. Some models have a habit of getting into loops (Llama-3.1 in this case) so the output gets truncated by max_tokens and the JSON is borked. I'd say there were two issues here - one that it's not parsing the quotes correctly (look at the single vs double quotes in its output compared with the input) and secondly that it's not managing to include much of the input JSON. Is it possible to parse this?

How to reproduce

LLM output:
{ "text description" : "subcutaneous oxycodone",\n"terms" : [\n {"term": "Localized swelling, mass and lump of skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of head, face and neck", "score": 0},\n {"term": "Localized hyperhidrosis", "score": 0},\n {"term": "Excessive and redundant skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of other and unspecified sites", "score": 0},\n {"term": "Superficial frostbite of neck", "score": 0},\n {"term": "Superficial frostbite", "score": 0},\n {"term": "Cellulitis and abscess of mouth", "score": 0},\n {"term": "Frostbite with tissue necrosis of neck", "score": 0},\n {"term": "Other disorders of skin and subcutaneous tissue, not elsewhere classified", "score": 0},\n {"term": "Localized swelling, mass and lump of skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of head, face and neck", "score": 0},\n {"term": "Localized hyperhidrosis", "score": 0},\n {"term": "Excessive and redundant skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of other and unspecified sites", "score": 0},\n {"term": "Superficial frostbite of neck", "score": 0},\n {"term": "Superficial frostbite", "score": 0},\n {"term": "Cellulitis and abscess of mouth", "score": 0},\n {"term": "Frostbite with tissue necrosis of neck", "score": 0},\n {"term": "Other disorders of skin and subcutaneous tissue, not elsewhere classified", "score": 0},\n {"term": "Localized swelling, mass and lump of skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of head, face and neck", "score": 0},\n {"term": "Localized hyperhidrosis", "score": 0},\n {"term": "Excessive and redundant skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm 
of skin and subcutaneous tissue of other and unspecified sites", "score": 0},\n {"term": "Superficial frostbite of neck", "score": 0},\n {"term": "Superficial frostbite", "score": 0},\n {"term": "Cellulitis and abscess of mouth", "score": 0},\n {"term": "Frostbite with tissue necrosis of neck", "score": 0},\n {"term": "Other disorders of skin and subcutaneous tissue, not elsewhere classified", "score": 0},\n {"term": "Localized swelling, mass and lump of skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of head, face and neck", "score": 0},\n {"term": "Localized hyperhidrosis", "score": 0},\n {"term": "Excessive and redundant skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of other and unspecified sites", "score": 0},\n {"term": "Superficial frostbite of neck", "score": 0},\n {"term": "Superficial frostbite", "score": 0},\n {"term": "Cellulitis and abscess of mouth", "score": 0},\n {"term": "Frostbite with tissue necrosis of neck", "score": 0},\n {"term": "Other disorders of skin and subcutaneous tissue, not elsewhere classified", "score": 0},\n {"term": "Localized swelling, mass and lump of skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of head, face and neck", "score": 0},\n {"

json_repair.loads():
{'text description" : "subcutaneous oxycodone': 'terms" : [\n {"term', 'Localized swelling, mass and lump of skin and subcutaneous tissue': 'score'}

Expected behavior

Ideally, parsed with all the LLM output present in the loaded JSON, but at least something with the "text description" and "terms" objects correctly existing rather than being combined.

I appreciate it might be a big change to json_repair but I did wonder if there might be a way to pass a JSON schema to it, so it can ensure the output conforms.
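
For truncation specifically, one workaround is to cut the text back to the last point where a bracket closed cleanly and then append the closers that are still pending. The sketch below uses a hypothetical helper, `close_truncated`, that is not part of json_repair. It discards anything after the last closed bracket, so a trailing scalar key-value pair can be lost.

```python
import json
from typing import Optional


def close_truncated(text: str) -> Optional[str]:
    """Cut text after the last cleanly closed bracket and close whatever
    brackets were still open at that point. Returns None if nothing usable
    is found or the brackets are mismatched."""
    closers = {"{": "}", "[": "]"}
    stack = []
    in_string = False
    escaped = False
    last_good = None  # (cut index, brackets still open at that index)
    for index, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in closers:
            stack.append(ch)
        elif ch in "}]":
            if not stack or closers[stack[-1]] != ch:
                return None
            stack.pop()
            last_good = (index + 1, list(stack))
    if last_good is None:
        return None
    end, still_open = last_good
    return text[:end] + "".join(closers[b] for b in reversed(still_open))


truncated = '{"text description": "subcutaneous oxycodone", "terms": [{"term": "A", "score": 0}, {"term": "B", "score": 0}, {"'
print(json.loads(close_truncated(truncated)))
```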

JSON decoding problem

Describe the bug
Failed JSON decoding

[6771] Failed to execute script 'jsonfix' due to unhandled exception!
Traceback (most recent call last):
File "jsonfix.py", line 281, in repair_json
File "json/__init__.py", line 348, in loads
File "json/decoder.py", line 337, in decode
File "json/decoder.py", line 353, in raw_decode
json.decoder.JSONDecodeError: Invalid control character at: line 2 column 48 (char 50)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "jsonfix.py", line 310, in <module>
File "jsonfix.py", line 283, in repair_json
File "jsonfix.py", line 17, in parse
File "jsonfix.py", line 29, in parse_json
File "jsonfix.py", line 106, in parse_object
File "jsonfix.py", line 59, in parse_json
ValueError: Invalid JSON format

To Reproduce
Try to parse following json:

{
"real_content": "Some string: Some other string

Some string <a href=\"https://domain.com\">Some link</a>
"
}

Expected behavior
Correct json.
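
The first error in the traceback comes from the standard `json` module, which rejects raw control characters (such as literal newlines) inside strings by default. When that is the only defect, `json.loads` accepts the input if called with `strict=False`. `load_lenient` below is just a hypothetical wrapper name.

```python
import json


def load_lenient(text: str):
    # strict=False allows control characters, including raw newlines,
    # inside JSON string values.
    return json.loads(text, strict=False)


raw = '{\n"real_content": "Some string: Some other string\n\nSome link\n"\n}'
print(load_lenient(raw))
```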

Handle number fractions in json that are not enclosed in quotes.

Is your feature request related to a problem? Please describe.
The code currently does not handle '{"key": 1/3}': the fraction derails the parsing of the rest of the JSON string. It treats the "1" as the value and the "3" as the next key, then swaps keys and values for the rest of the JSON. For example, the input
'{"key1": 1/3, "key2": 1, "key3": "value3", "key4": "value4"}'
is parsed as
{'key1': 1, 3: 'key2', 1: 'key3', 'value3': 'key4'}

What I would like instead is '{"key": "1/3"}'.

This original json output of '{"key": 1/3}' is a response I have received from an LLM.

Describe the solution you'd like
I would like it to output the fraction as a string.
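
A pre-processing sketch of that behavior, using only the standard library: wrap an unquoted fraction that appears as a value in double quotes before parsing. `quote_fractions` is a hypothetical helper, and because it works on raw text it could also touch a similar-looking pattern inside a string value.

```python
import json
import re

# A colon, then digits/digits, followed by a comma or a closing bracket.
FRACTION_VALUE = re.compile(r'(:\s*)(\d+\s*/\s*\d+)(?=\s*[,}\]])')


def quote_fractions(text: str) -> str:
    """Wrap bare fraction values such as 1/3 in double quotes."""
    return FRACTION_VALUE.sub(r'\1"\2"', text)


raw = '{"key1": 1/3, "key2": 1, "key3": "value3", "key4": "value4"}'
print(json.loads(quote_fractions(raw)))
# {'key1': '1/3', 'key2': 1, 'key3': 'value3', 'key4': 'value4'}
```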


[Bug]: Fails to accurately capture value with missing opening quote if a comma comes before closing quote

Version of the library

0.23.1

Describe the bug

When parsing broken json that looks like this:

[
  {
    "Snippet Summary Id": 1,
    "Overview": "Syncing with Company",
    "Description": The conversation focused on how this company's release management system integrates with ours, providing a streamlined workflow for documentation approval, unlike Jim.",
    "What the Prospect said": "John was interested in understanding how the release flow works and how it can be used to approve documentation and drawings directly in the product.",
    "Seller Response": "Gene explained that the configuration allows the release flow to start from the other product and push information to ours, enabling a wider team to approve documentation without needing direct access to our product.",
    "Quote": "Okay. So configuration is done right now."
  },
  {
    "Snippet Summary Id": 2,
    "Overview": "Assigning Part Numbers",
    "Description": "The discussion covered the capability of this product to assign part numbers to CAD data, a feature that might differentiate Our product from theirs.",
    "What the Prospect said": "Eve was looking at the part table and seemed curious about how part numbers could be assigned and mapped to categories in our product.",
    "Seller Response": "Gene demonstrated how part numbers could be assigned to CAD data through our product and mapped to various categories, showcasing the product's flexibility.",
    "Quote": "One of the options is that you can ask the product to assign per numbers to your CAD data."
  }
]

The missing quote after "Description": is repaired but instead of closing the quote at the existing closing quote, the package inserts a new quote at the first comma it finds, resulting in this:

[
  {
    "Snippet Summary Id": 1,
    "Overview": "Syncing with Company",
    "Description": "The conversation focused on how this company's release management system integrates with ours",
    "Jim.": "What the Prospect said\": \"John was interested in understanding how the release flow works and how it can be used to approve documentation and drawings directly in the product.",
    "Seller Response": "Gene explained that the configuration allows the release flow to start from the other product and push information to ours, enabling a wider team to approve documentation without needing direct access to our product.",
    "Quote": "Okay. So configuration is done right now."
  },
  {
    "Snippet Summary Id": 2,
    "Overview": "Assigning Part Numbers",
    "Description": "The discussion covered the capability of this product to assign part numbers to CAD data, a feature that might differentiate Our product from theirs.",
    "What the Prospect said": "Eve was looking at the part table and seemed curious about how part numbers could be assigned and mapped to categories in our product.",
    "Seller Response": "Gene demonstrated how part numbers could be assigned to CAD data through our product and mapped to various categories, showcasing the product's flexibility.",
    "Quote": "One of the options is that you can ask the product to assign per numbers to your CAD data."
  }
]

How to reproduce

string = """
[
  {
    "Snippet Summary Id": 1,
    "Overview": "Syncing with Company",
    "Description": The conversation focused on how this company's release management system integrates with ours, providing a streamlined workflow for documentation approval, unlike Jim.",
    "What the Prospect said": "John was interested in understanding how the release flow works and how it can be used to approve documentation and drawings directly in the product.",
    "Seller Response": "Gene explained that the configuration allows the release flow to start from the other product and push information to ours, enabling a wider team to approve documentation without needing direct access to our product.",
    "Quote": "Okay. So configuration is done right now."
  },
  {
    "Snippet Summary Id": 2,
    "Overview": "Assigning Part Numbers",
    "Description": "The discussion covered the capability of this product to assign part numbers to CAD data, a feature that might differentiate Our product from theirs.",
    "What the Prospect said": "Eve was looking at the part table and seemed curious about how part numbers could be assigned and mapped to categories in our product.",
    "Seller Response": "Gene demonstrated how part numbers could be assigned to CAD data through our product and mapped to various categories, showcasing the product's flexibility.",
    "Quote": "One of the options is that you can ask the product to assign per numbers to your CAD data."
  }
]
"""
repair_json(string, return_objects=True)

Expected behavior

I'd expect this:

[
  {
    "Snippet Summary Id": 1,
    "Overview": "Syncing with Company",
    "Description": "The conversation focused on how this company's release management system integrates with ours, providing a streamlined workflow for documentation approval, unlike Jim.",
    "What the Prospect said": "John was interested in understanding how the release flow works and how it can be used to approve documentation and drawings directly in the product.",
    "Seller Response": "Gene explained that the configuration allows the release flow to start from the other product and push information to ours, enabling a wider team to approve documentation without needing direct access to our product.",
    "Quote": "Okay. So configuration is done right now."
  },
  {
    "Snippet Summary Id": 2,
    "Overview": "Assigning Part Numbers",
    "Description": "The discussion covered the capability of this product to assign part numbers to CAD data, a feature that might differentiate Our product from theirs.",
    "What the Prospect said": "Eve was looking at the part table and seemed curious about how part numbers could be assigned and mapped to categories in our product.",
    "Seller Response": "Gene demonstrated how part numbers could be assigned to CAD data through our product and mapped to various categories, showcasing the product's flexibility.",
    "Quote": "One of the options is that you can ask the product to assign per numbers to your CAD data."
  }
]

Overall this is an awesome tool!! It's handled everything else I've thrown at it perfectly.
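
For anyone triaging this, a quick way to confirm where the input first stops being valid JSON is to ask the standard library. This is a minimal sketch using only the stdlib `json` module. The helper name `locate_json_error` and the shortened sample string are made up for illustration, and it says nothing about how json_repair itself decides what to repair.

```python
import json


def locate_json_error(text):
    """Return (line, column, message) for the first JSON syntax error, or None if text parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return exc.lineno, exc.colno, exc.msg
    return None


# Shortened stand-in for the report above: the value is missing its opening quote.
broken = '{"Description": The conversation focused on release management."}'
print(locate_json_error(broken))
```

The reported column points at the first character of the unquoted value, which is where a repair would need to reinsert the missing quote.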

Function "repair_json" raises an AttributeError exception when called with a certain input

Describe the bug
When I call "repair_json" with a certain input, it throws AttributeError in version 0.17.0:

in JSONParser.parse_string(self)

    234     lstring_delimiter = "“"
    235     rstring_delimiter = "”"
--> 236 elif char.isalpha():
    237     # This could be a <boolean> and not a string. Because (T)rue or (F)alse or (N)ull are valid
    238     if char.lower() in ["t", "f", "n"]:
    239         value = self.parse_boolean_or_null()

AttributeError: 'bool' object has no attribute 'isalpha'

To Reproduce

Install json_repair==0.17.0
Execute json_repair.repair_json("[{]", return_objects=True)

Expected behavior
Returns "[]"

Desktop (please complete the following information):

OS: Windows 11

Additional context

The bug occurs only in version 0.17.0.
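
The traceback above shows `char` being a `bool` when `.isalpha()` is called. One plausible cause, which is an assumption inferred from the error message and not confirmed against the library source, is a character getter that returns `False` once the parser runs past the end of the input. The sketch below reproduces that failure mode and a guard against it. `get_char_at`, `is_alpha_unsafe`, and `is_alpha_safe` are hypothetical names, not json_repair's API.

```python
from typing import Union


def get_char_at(text: str, index: int) -> Union[str, bool]:
    """Hypothetical getter: the character at index, or False when index is past the end."""
    try:
        return text[index]
    except IndexError:
        return False


def is_alpha_unsafe(text: str, index: int) -> bool:
    char = get_char_at(text, index)
    # Raises AttributeError when char is the False sentinel.
    return char.isalpha()


def is_alpha_safe(text: str, index: int) -> bool:
    char = get_char_at(text, index)
    # Check the sentinel first so .isalpha() only ever runs on a str.
    return bool(char) and char.isalpha()


print(is_alpha_safe("[{]", 3))  # index 3 is past the end of the 3-character input
```

With a sentinel like this, every string-method call on the current character needs the truthiness check first, or the getter could return an empty string instead so that `.isalpha()` is always defined.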

repair_json throws an error when a single ']' is missing

Describe the bug
I am getting an error when trying to repair a small json file with one single issue: a missing ']'

To Reproduce
Just run the following code

from json_repair import repair_json

str = '''
{
  "resourceType": "Bundle",
  "id": "1",
  "type": "collection",
  "entry": [
    {
      "resource": {
        "resourceType": "Patient",
        "id": "1",
        "name": [
          {"use": "official", "family": "Corwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."},
          {"use": "maiden", "family": "Goodwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."]}
        ]
      }
    }
  ]
}
'''
repair_json(str, skip_json_loads=True)

Observations
The problem seems to be that "name" contains two dicts. If you remove the second entry and pass only

 "name": [
          {"use": "official", "family": "Corwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."}
]

the repair seems to work.
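
To pinpoint the damaged spot in input like this before repairing it, a small bracket scanner is enough. The following is a minimal stdlib-only sketch. `first_bracket_mismatch` is a hypothetical helper, not part of json_repair, and it only understands double-quoted strings with backslash escapes.

```python
def first_bracket_mismatch(text):
    """Return (position, expected_closer, found_closer) for the first closing
    bracket that does not match the innermost open one, or None if all closing
    brackets match. Brackets inside double-quoted strings are ignored."""
    closers = {"{": "}", "[": "]"}
    stack = []
    in_string = False
    escaped = False
    for pos, ch in enumerate(text):
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in closers:
            stack.append(closers[ch])
        elif ch in "}]":
            expected = stack.pop() if stack else None
            if expected != ch:
                return pos, expected, ch
    return None


# The first "name" entry from the report: "prefix" opens a list that is never closed.
snippet = '{"use": "official", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."}'
print(first_bracket_mismatch(snippet))
```

For the snippet above, the scanner flags the final `}` and reports that a `]` was expected there, which matches the single missing bracket described in this issue.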
