mangiucugna / json_repair
A Python module to repair invalid JSON, commonly used to parse the output of LLMs
Home Page: https://pypi.org/project/json-repair/
License: MIT License
0.21.0
See reproduction
response0 = """ Here is an example employee profile in JSON format, with keys that are less than 64 characters and made of only alphanumerics, underscores, or hyphens:
```json
{
"employee_id": 1234,
"name": "John Doe",
"email": "[email protected]",
"job_title": "Software Engineer",
"department": "Engineering",
"hire_date": "2020-01-01",
"salary": 100000,
"manager_id": 5678
}
In Markdown, you can display this JSON code block like this:
{
"employee_id": 1234,
"name": "John Doe",
"email": "[email protected]",
"job_title": "Software Engineer",
"department": "Engineering",
"hire_date": "2020-01-01",
"salary": 100000,
"manager_id": 5678
}
This will display the JSON code block with proper formatting and highlighting.
"""
from json_repair import repair_json
response = repair_json(response0)
print(response)
This prints just 64.
I don't see why it would pick the integer from the first sentence as the valid JSON.
How can I control which part it returns? For example, if I am always looking for a {} structure, I would like to be able to tell repair_json that.
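One possible workaround, independent of json_repair itself, is to cut the first brace-delimited span out of the LLM response before handing it to any repair step. The sketch below uses only the standard library; extract_first_object is a hypothetical helper name, not part of json_repair, and the sample text is a shortened stand-in for response0.

```python
import json


def extract_first_object(text):
    """Return the first balanced {...} span in text.

    Returns None when text has no "{" at all. If the braces never balance,
    everything from the first "{" onward is returned so that a repair step
    can still try to close it.
    """
    start = text.find("{")
    if start == -1:
        return None
    depth = 0
    in_string = False
    escaped = False
    for i in range(start, len(text)):
        ch = text[i]
        if in_string:
            # Inside a string, braces do not count; only track escapes and the end quote
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[start:i + 1]
    return text[start:]


# Shortened stand-in for the LLM response in the report above
sample = (
    "Keys are less than 64 characters:\n"
    '{"employee_id": 1234, "name": "John Doe"}\n'
    "This will display the JSON code block."
)
candidate = extract_first_object(sample)
parsed = json.loads(candidate)
```

The extracted candidate could then be passed to repair_json instead of the full response, so the leading prose (and the 64 inside it) never reaches the parser.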
Describe the bug
The JSON returned by json_repair is not in the expected format.
To Reproduce
Steps to reproduce the behavior:
Run the file in a Python environment with langchain and json_repair installed.
Expected behavior
I expected repair_json to repair the output JSON.
Additional context
OpenAI returned incomplete JSON because the completion hit its length limit. The JSON is missing the ":" after 'answer40' at the end of the response. I expected json_repair to fix the issue, but it produced unexpected JSON instead.
Code to get the json:
import json_repair
from langchain_core.messages.ai import AIMessage
response = AIMessage(content='', additional_kwargs={'function_call': {'arguments': '{""answer"":[{""traits"":""Female aged 60+"",""answer1"":""5"",""answer2"":""Labrador Retriever"",""answer3"":""75"",""answer4"":""Buddy"",""answer5"":""It was my late husband\'s nickname"",""answer6"":""Bud"",""answer7"":""Before 2020"",""answer8"":""yes"",""answer9"":""6"",""answer10"":""Instant connection"",""answer11"":""It\'s taught me more about unconditional love"",""answer12"":""5"",""answer13"":""Family Member"",""answer14"":""Because he\'s part of the family"",""answer15"":""Paw shake, about a week"",""answer16"":""14, in a dog bed by my bedside"",""answer17"":""yes"",""answer18"":""$1200"",""answer19"":""20%"",""answer20"":""60%"",""answer21"":""$100"",""answer22"":""Spent less in March"",""answer23"":""Monthly"",""answer24"":""Dry food"",""answer25"":""I know a lot about the brand"",""answer26"":""5"",""answer27"":""3"",""answer28"":""5"",""answer29"":""4"",""answer30"":""5"",""answer31"":""4"",""answer32"":""2"",""answer33"":""Blue Buffalo"",""answer34"":""High quality, my dog loves it, good value"",""answer35"":""5"",""answer36"":""4"",""answer37"":""5"",""answer38"":""4"",""answer39"":""5"",""answer40"":""4"",""answer41"":""3"",""answer42"":""3"",""answer43"":""I interact more"",""answer44"":""The ongoing pandemic"",""answer45"":""No"",""answer46"":""Yes"",""answer47"":""Car"",""answer48"":""Female"",""answer49"":""63"",""answer50"":""1"",""answer51"":""Graduate degree"",""answer52"":""Retired"",""answer53"":""Less than $25,000"",""answer54"":""Sarasota"",""answer55"":""Florida""},{""traits"":""Female aged 60+"",""answer1"":""3"",""answer2"":""Beagle"",""answer3"":""20"",""answer4"":""Scout"",""answer5"":""To Kill a Mockingbird is my favorite book"",""answer6"":""Scooty"",""answer7"":""After 2020"",""answer8"":""no"",""answer9"":""8"",""answer10"":""Overwhelmed with joy"",""answer11"":""I\'ve become more active"",""answer12"":""5"",""answer13"":""Best 
Friend"",""answer14"":""We do everything together"",""answer15"":""Stay, a couple of days"",""answer16"":""16, on the living room couch"",""answer17"":""yes"",""answer18"":""$800"",""answer19"":""30%"",""answer20"":""50%"",""answer21"":""$70"",""answer22"":""Spent more in March"",""answer23"":""Bi-weekly"",""answer24"":""Mix"",""answer25"":""I know just the basics about the brand"",""answer26"":""5"",""answer27"":""4"",""answer28"":""5"",""answer29"":""3"",""answer30"":""5"",""answer31"":""2"",""answer32"":""4"",""answer33"":""Purina"",""answer34"":""Affordable, accessible, and Scout enjoys it"",""answer35"":""4"",""answer36"":""5"",""answer37"":""4"",""answer38"":""3"",""answer39"":""5"",""answer40"":""2"",""answer41"":""4"",""answer42"":""2"",""answer43"":""No change"",""answer44"":""The pandemic, but we\'ve adapted"",""answer45"":""Yes"",""answer46"":""No"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""68"",""answer50"":""2"",""answer51"":""Bachelor\'s degree"",""answer52"":""Retired"",""answer53"":""$50,000 to $74,999"",""answer54"":""Boise"",""answer55"":""Idaho""},{""traits"":""Female aged 60+"",""answer1"":""7"",""answer2"":""Golden Retriever"",""answer3"":""65"",""answer4"":""Sunny"",""answer5"":""Her cheerful personality"",""answer6"":""Sun"",""answer7"":""Before 2020"",""answer8"":""yes"",""answer9"":""10"",""answer10"":""Pure happiness"",""answer11"":""I appreciate the simple joys more"",""answer12"":""5"",""answer13"":""Loyal companion"",""answer14"":""Sunny is always by my side"",""answer15"":""Open doors, took about a month"",""answer16"":""12, on her plush bed in the sunroom"",""answer17"":""yes"",""answer18"":""$1500"",""answer19"":""25%"",""answer20"":""40%"",""answer21"":""$130"",""answer22"":""Spent about the same in March"",""answer23"":""Weekly"",""answer24"":""Dry food"",""answer25"":""I know a lot about the 
brand"",""answer26"":""5"",""answer27"":""2"",""answer28"":""5"",""answer29"":""4"",""answer30"":""5"",""answer31"":""5"",""answer32"":""1"",""answer33"":""Royal Canin"",""answer34"":""Superb quality, vet recommended"",""answer35"":""5"",""answer36"":""3"",""answer37"":""5"",""answer38"":""4"",""answer39"":""5"",""answer40"":""5"",""answer41"":""2"",""answer42"":""2"",""answer43"":""I interact more"",""answer44"":""Pandemic, we\'re staying home more"",""answer45"":""Not Sure"",""answer46"":""Yes"",""answer47"":""Car"",""answer48"":""Female"",""answer49"":""65"",""answer50"":""1"",""answer51"":""Some college, no degree"",""answer52"":""Employed part-time"",""answer53"":""$25,000 to $49,999"",""answer54"":""Tucson"",""answer55"":""Arizona""},{""traits"":""Female aged 60+"",""answer1"":""9"",""answer2"":""Dachshund"",""answer3"":""14"",""answer4"":""Frankie"",""answer5"":""His long body reminds me of a hotdog"",""answer6"":""Frank"",""answer7"":""In 2020"",""answer8"":""no"",""answer9"":""48"",""answer10"":""Amused by his quirky attitude"",""answer11"":""Laughs are a daily occurrence now"",""answer12"":""5"",""answer13"":""Protector"",""answer14"":""Fearless despite his size"",""answer15"":""Dance, surprisingly one weekend"",""answer16"":""10, in my bed"",""answer17"":""yes"",""answer18"":""$500"",""answer19"":""20%"",""answer20"":""40%"",""answer21"":""$40"",""answer22"":""Spent less in March"",""answer23"":""Monthly"",""answer24"":""Wet food"",""answer25"":""I\'m not very familiar with the brand"",""answer26"":""5"",""answer27"":""3"",""answer28"":""4"",""answer29"":""3"",""answer30"":""4"",""answer31"":""3"",""answer32"":""3"",""answer33"":""Hill\'s Science Diet"",""answer34"":""Recommended by my vet, Frankie\'s health"",""answer35"":""4"",""answer36"":""4"",""answer37"":""4"",""answer38"":""3"",""answer39"":""4"",""answer40"":""4"",""answer41"":""3"",""answer42"":""4"",""answer43"":""I interact more"",""answer44"":""Retirement, I have more 
time"",""answer45"":""Yes"",""answer46"":""No"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""70"",""answer50"":""1"",""answer51"":""Associate degree"",""answer52"":""Retired"",""answer53"":""Less than $25,000"",""answer54"":""Springfield"",""answer55"":""Missouri""},{""traits"":""Female aged 60+"",""answer1"":""2"",""answer2"":""Poodle"",""answer3"":""10"",""answer4"":""Coco"",""answer5"":""Her fur\'s chocolate color"",""answer6"":""Cokes"",""answer7"":""After 2020"",""answer8"":""yes"",""answer9"":""3"",""answer10"":""Adoration"",""answer11"":""I feel less lonely"",""answer12"":""5"",""answer13"":""Best Friend"",""answer14"":""Coco follows me everywhere"",""answer15"":""Fetch, just a few sessions"",""answer16"":""18, on a designer doggy bed"",""answer17"":""yes"",""answer18"":""$2000"",""answer19"":""10%"",""answer20"":""70%"",""answer21"":""$160"",""answer22"":""Spent more in March"",""answer23"":""Weekly"",""answer24"":""Mix"",""answer25"":""I know a lot about the brand"",""answer26"":""5"",""answer27"":""2"",""answer28"":""5"",""answer29"":""5"",""answer30"":""5"",""answer31"":""3"",""answer32"":""5"",""answer33"":""Orijen"",""answer34"":""Coco\'s improvement, natural ingredients"",""answer35"":""5"",""answer36"":""3"",""answer37"":""5"",""answer38"":""5"",""answer39"":""5"",""answer40"":""3"",""answer41"":""5"",""answer42"":""3"",""answer43"":""I interact more"",""answer44"":""Having moved to a pet-friendly community"",""answer45"":""Yes"",""answer46"":""Yes"",""answer47"":""Car"",""answer48"":""Female"",""answer49"":""61"",""answer50"":""1"",""answer51"":""Graduate degree"",""answer52"":""Self-employed"",""answer53"":""$100,000 to $149,000"",""answer54"":""San Diego"",""answer55"":""California""},{""traits"":""Female aged 60+"",""answer1"":""11"",""answer2"":""Yorkshire Terrier"",""answer3"":""7"",""answer4"":""Pixie"",""answer5"":""Her tiny, fairy-like 
appearance"",""answer6"":""Pix"",""answer7"":""Before 2020"",""answer8"":""no"",""answer9"":""12"",""answer10"":""A bit anxious, she was so small"",""answer11"":""I enjoy everyday moments more"",""answer12"":""5"",""answer13"":""Family Member"",""answer14"":""She\'s been with me through thick and thin"",""answer15"":""Spin, just took two weeks"",""answer16"":""13, her cushioned crate"",""answer17"":""yes"",""answer18"":""$700"",""answer19"":""15%"",""answer20"":""45%"",""answer21"":""$60"",""answer22"":""Spent about the same in March"",""answer23"":""Bi-weekly"",""answer24"":""Dry food"",""answer25"":""I know just the basics about the brand"",""answer26"":""4"",""answer27"":""3"",""answer28"":""4"",""answer29"":""5"",""answer30"":""5"",""answer31"":""1"",""answer32"":""4"",""answer33"":""Iams"",""answer34"":""My Pixie\'s health, cost-effective"",""answer35"":""4"",""answer36"":""5"",""answer37"":""4"",""answer38"":""4"",""answer39"":""5"",""answer40"":""1"",""answer41"":""4"",""answer42"":""5"",""answer43"":""I interact more"",""answer44"":""The pandemic has us spending more time indoors"",""answer45"":""Not Sure"",""answer46"":""No"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""72"",""answer50"":""2"",""answer51"":""High school diploma or equivalent"",""answer52"":""Retired"",""answer53"":""Less than $25,000"",""answer54"":""Macon"",""answer55"":""Georgia""},{""traits"":""Female aged 60+"",""answer1"":""4"",""answer2"":""Border Collie"",""answer3"":""45"",""answer4"":""Shep"",""answer5"":""Traditional name for a sheepdog"",""answer6"":""Sheppy"",""answer7"":""Before 2020"",""answer8"":""yes"",""answer9"":""5"",""answer10"":""I felt protective"",""answer11"":""Daily physical activity is a must"",""answer12"":""5"",""answer13"":""Loyal companion"",""answer14"":""Always ready to work and play"",""answer15"":""We\'ve mastered herding techniques, took months"",""answer16"":""10, in the utility room on 
his mat"",""answer17"":""yes"",""answer18"":""$1300"",""answer19"":""10%"",""answer20"":""55%"",""answer21"":""$110"",""answer22"":""Spent about the same in March"",""answer23"":""Weekly"",""answer24"":""Dry food"",""answer25"":""I know a lot about the brand"",""answer26"":""5"",""answer27"":""4"",""answer28"":""5"",""answer29"":""3"",""answer30"":""5"",""answer31"":""3"",""answer32"":""2"",""answer33"":""Acana"",""answer34"":""High-quality, breed-specific formula"",""answer35"":""5"",""answer36"":""4"",""answer37"":""5"",""answer38"":""4"",""answer39"":""5"",""answer40"":""3"",""answer41"":""2"",""answer42"":""4"",""answer43"":""No change"",""answer44"":""Seasonal allergies, we adjust our time outside"",""answer45"":""No"",""answer46"":""No"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""67"",""answer50"":""1"",""answer51"":""Bachelor\'s degree"",""answer52"":""Employed part-time"",""answer53"":""$25,000 to $49,999"",""answer54"":""Topeka"",""answer55"":""Kansas""},{""traits"":""Female aged 60+"",""answer1"":""10"",""answer2"":""Chihuahua"",""answer3"":""5"",""answer4"":""Tiny"",""answer5"":""Her petite size"",""answer6"":""T"",""answer7"":""After 2020"",""answer8"":""yes"",""answer9"":""24"",""answer10"":""Amused by her sassiness"",""answer11"":""It\'s important to love fiercely"",""answer12"":""5"",""answer13"":""Best Friend"",""answer14"":""Tiny is always by my side, my constant tiny shadow"",""answer15"":""Sit pretty, she picked it up in days"",""answer16"":""14, she curls up in the bed under the window"",""answer17"":""yes"",""answer18"":""$550"",""answer19"":""40%"",""answer20"":""30%"",""answer21"":""$50"",""answer22"":""Spent less in March"",""answer23"":""Bi-weekly"",""answer24"":""Mix"",""answer25"":""I know just the basics about the 
brand"",""answer26"":""5"",""answer27"":""5"",""answer28"":""5"",""answer29"":""2"",""answer30"":""5"",""answer31"":""2"",""answer32"":""1"",""answer33"":""Cesar"",""answer34"":""Tiny loves it, it\'s affordable, easy to store"",""answer35"":""5"",""answer36"":""5"",""answer37"":""4"",""answer38"":""2"",""answer39"":""5"",""answer40"":""2"",""answer41"":""1"",""answer42"":""5"",""answer43"":""No change"",""answer44"":""The seasons changing is the biggest influence"",""answer45"":""Yes"",""answer46"":""Yes"",""answer47"":""Car"",""answer48"":""Female"",""answer49"":""69"",""answer50"":""1"",""answer51"":""Some college, no degree"",""answer52"":""Retired"",""answer53"":""$50,000 to $74,999"",""answer54"":""El Paso"",""answer55"":""Texas""},{""traits"":""Female aged 60+"",""answer1"":""6"",""answer2"":""Bulldog"",""answer3"":""50"",""answer4"":""Winston"",""answer5"":""His stately, British-like demeanor"",""answer6"":""Win"",""answer7"":""Before 2020"",""answer8"":""no"",""answer9"":""3"",""answer10"":""Awe, he\'s got so much character"",""answer11"":""I\'ve become more patient"",""answer12"":""5"",""answer13"":""Family Member"",""answer14"":""He\'s more like a child to me than a pet"",""answer15"":""Speak on command, a stubborn month or so"",""answer16"":""12, sprawled across the hallway rug"",""answer17"":""yes"",""answer18"":""$1100"",""answer19"":""15%"",""answer20"":""45%"",""answer21"":""$90"",""answer22"":""Spent about the same in March"",""answer23"":""Monthly"",""answer24"":""Dry food"",""answer25"":""I know a lot about the brand"",""answer26"":""4"",""answer27"":""2"",""answer28"":""4"",""answer29"":""5"",""answer30"":""5"",""answer31"":""3"",""answer32"":""3"",""answer33"":""Nutro"",""answer34"":""Non-GMO ingredients, Winston\'s preference"",""answer35"":""4"",""answer36"":""3"",""answer37"":""4"",""answer38"":""5"",""answer39"":""5"",""answer40"":""3"",""answer41"":""3"",""answer42"":""3"",""answer43"":""I interact less"",""answer44"":""Winston\'s aging, he 
is more independent"",""answer45"":""Not Sure"",""answer46"":""No"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""65"",""answer50"":""2"",""answer51"":""Graduate degree"",""answer52"":""Retired"",""answer53"":""Prefer not to say"",""answer54"":""Charleston"",""answer55"":""South Carolina""},{""traits"":""Female aged 60+"",""answer1"":""8"",""answer2"":""Shih Tzu"",""answer3"":""12"",""answer4"":""Gizmo"",""answer5"":""His playful and curious nature"",""answer6"":""Gizzy"",""answer7"":""In 2020"",""answer8"":""yes"",""answer9"":""60"",""answer10"":""Instant love, he\'s such a fluffball"",""answer11"":""I\'m more outgoing, he needs lots of socializing"",""answer12"":""5"",""answer13"":""Best Friend"",""answer14"":""We have an unbreakable bond"",""answer15"":""Roll over, a week\'s full of treats and praise"",""answer16"":""15, on his favorite armchair"",""answer17"":""yes"",""answer18"":""$950"",""answer19"":""10%"",""answer20"":""55%"",""answer21"":""$80"",""answer22"":""Spent about the same in March"",""answer23"":""Monthly"",""answer24"":""Dry food"",""answer25"":""I know just the basics about the brand"",""answer26"":""5"",""answer27"":""4"",""answer28"":""5"",""answer29"":""3"",""answer30"":""5"",""answer31"":""1"",""answer32"":""4"",""answer33"":""Science Diet"",""answer34"":""Gizzy likes it, vet approved, easy to find"",""answer35"":""4"",""answer36"":""4"",""answer37"":""5"",""answer38"":""3"",""answer39"":""5"",""answer40"":""1"",""answer41"":""4"",""answer42"":""4"",""answer43"":""No change"",""answer44"":""Seasonal weather, we keep our routine"",""answer45"":""Yes"",""answer46"":""Not Sure"",""answer47"":""N/A-I\'m not planning to travel with my dog this Spring"",""answer48"":""Female"",""answer49"":""64"",""answer50"":""1"",""answer51"":""Bachelor\'s degree"",""answer52"":""Employed full-time"",""answer53"":""$50,000 to 
$74,999"",""answer54"":""Reno"",""answer55"":""Nevada""},{""traits"":""Female aged 60+"",""answer1"":""1"",""answer2"":""Australian Shepherd"",""answer3"":""30"",""answer4"":""Blue"",""answer5"":""His striking blue eyes"",""answer6"":""Boo"",""answer7"":""After 2020"",""answer8"":""no"",""answer9"":""2"",""answer10"":""Thrilled, he was full of energy"",""answer11"":""I\'ve got a new sense of purpose"",""answer12"":""5"",""answer13"":""Loyal companion"",""answer14"":""He\'s loyal and always there for support"",""answer15"":""Fetch with a frisbee, about two weeks"",""answer16"":""14, on a runner in the hall"",""answer17"":""yes"",""answer18"":""$1000"",""answer19"":""5%"",""answer20"":""65%"",""answer21"":""$85"",""answer22"":""Spent more in March"",""answer23"":""Weekly"",""answer24"":""Dry food"",""answer25"":""I know a lot about the brand"",""answer26"":""5"",""answer27"":""3"",""answer28"":""5"",""answer29"":""4"",""answer30"":""5"",""answer31"":""4"",""answer32"":""1"",""answer33"":""Taste of the Wild"",""answer34"":""Grain-free, Blue\'s coat looks amazing"",""answer35"":""5"",""answer36"":""4"",""answer37"":""5"",""answer38"":""4"",""answer39"":""5"",""answer40', 'name': 'Answers'}})
final_json = json_repair.loads(response.additional_kwargs['function_call']['arguments'])['answer']
JSON returned by json_repair:
[{'traits': '', 60: '', 'answer1': '', 5: ',', 'answer2': '', 'Retriever': '', 'answer3': '', 75: ',', 'answer4': '', 'Buddy': '', 'answer5': '', 'nickname': '', 'answer6': '', 'Bud': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 6: ',', 'answer10': '', 'connection': '', 'answer11': '', 'love': '', 'answer12': '', 'answer13': '', 'Member': '', 'answer14': '', 'family': '', 'answer15': '', 'week': '', 'answer16': '', 14: 'in a dog bed by my bedside', ',': 'answer18', ':': 1200, 'answer19': '', 20: '', 'answer20': '', 'answer21': '', 100: ',', 'answer22': '', 'March': '', 'answer23': '', 'Monthly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 3: ',', 'answer28': '', 'answer29': '', 4: ',', 'answer30': '', 'answer31': '', 'answer32': '', 2: ',', 'answer33': '', 'Buffalo': '', 'answer34': '', 'value': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'more': '', 'answer44': '', 'pandemic': '', 'answer45': '', 'No': '', 'answer46': '', 'Yes': '', 'answer47': '', 'Car': '', 'answer48': '', 'Female': '', 'answer49': '', 63: ',', 'answer50': '', 1: ',', 'answer51': '', 'degree': '', 'answer52': '', 'Retired': '', 'answer53': '', 25: 0, 'answer54': '', 'Sarasota': '', 'answer55': '', 'Florida': ''}, {'traits': '', 60: '', 'answer1': '', 3: ',', 'answer2': '', 'Beagle': '', 'answer3': '', 20: ',', 'answer4': '', 'Scout': '', 'answer5': '', 'book': '', 'answer6': '', 'Scooty': '', 'answer7': '', 2020: ',', 'answer8': '', 'no': '', 'answer9': '', 8: ',', 'answer10': '', 'joy': '', 'answer11': '', 'active': '', 'answer12': '', 5: ',', 'answer13': '', 'Friend': '', 'answer14': '', 'together': '', 'answer15': '', 'days': '', 'answer16': '', 16: 'on the living room couch', ',': 'answer18', ':': 800, 'answer19': '', 30: '', 'answer20': '', 50: 0, 'answer21': '', 70: ',', 'answer22': '', 'March': '', 'answer23': 
'', 'Bi-weekly': '', 'answer24': '', 'Mix': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 4: ',', 'answer28': '', 'answer29': '', 'answer30': '', 'answer31': '', 2: ',', 'answer32': '', 'answer33': '', 'Purina': '', 'answer34': '', 'it': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'change': '', 'answer44': '', 'adapted': '', 'answer45': '', 'Yes': '', 'answer46': '', 'No': '', 'answer47': '', 'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 68: ',', 'answer50': '', 'answer51': '', 'degree': '', 'answer52': '', 'Retired': '', 'answer53': '', 74: 999, 'answer54': '', 'Boise': '', 'answer55': '', 'Idaho': ''}, {'traits': '', 60: '', 'answer1': '', 7: ',', 'answer2': '', 'Retriever': '', 'answer3': '', 65: ',', 'answer4': '', 'Sunny': '', 'answer5': '', 'personality': '', 'answer6': '', 'Sun': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 10: ',', 'answer10': '', 'happiness': '', 'answer11': '', 'more': '', 'answer12': '', 5: ',', 'answer13': '', 'companion': '', 'answer14': '', 'side': '', 'answer15': '', 'month': '', 'answer16': '', 12: 'on her plush bed in the sunroom', ',': 'answer18', ':': 1500, 'answer19': '', 25: 0, 'answer20': '', 40: '', 'answer21': '', 130: ',', 'answer22': '', 'March': '', 'answer23': '', 'Weekly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 2: ',', 'answer28': '', 'answer29': '', 4: ',', 'answer30': '', 'answer31': '', 'answer32': '', 1: ',', 'answer33': '', 'Canin': '', 'answer34': '', 'recommended': '', 'answer35': '', 'answer36': '', 3: ',', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'answer44': '', 'answer45': '', 'Sure': '', 'answer46': '', 'Yes': '', 'answer47': '', 'Car': '', 'answer48': '', 'Female': '', 'answer49': '', 'answer50': '', 'answer51': '', 'degree': 
'', 'answer52': '', 'part-time': '', 'answer53': '', 49: 999, 'answer54': '', 'Tucson': '', 'answer55': '', 'Arizona': ''}, {'traits': '', 60: '', 'answer1': '', 9: ',', 'answer2': '', 'Dachshund': '', 'answer3': '', 14: ',', 'answer4': '', 'Frankie': '', 'answer5': '', 'hotdog': '', 'answer6': '', 'Frank': '', 'answer7': '', 2020: ',', 'answer8': '', 'no': '', 'answer9': '', 48: ',', 'answer10': '', 'attitude': '', 'answer11': '', 'now': '', 'answer12': '', 5: ',', 'answer13': '', 'Protector': '', 'answer14': '', 'size': '', 'answer15': '', 'weekend': '', 'answer16': '', 10: 'in my bed', ',': 'answer18', ':': 500, 'answer19': '', 20: '', 'answer20': '', 40: ',', 'answer21': '', 'answer22': '', 'March': '', 'answer23': '', 'Monthly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 3: ',', 'answer28': '', 4: ',', 'answer29': '', 'answer30': '', 'answer31': '', 'answer32': '', 'answer33': '', 'Diet': '', 'answer34': '', 'health': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'more': '', 'answer44': '', 'time': '', 'answer45': '', 'Yes': '', 'answer46': '', 'No': '', 'answer47': '', 'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 70: ',', 'answer50': '', 1: ',', 'answer51': '', 'degree': '', 'answer52': '', 'Retired': '', 'answer53': '', 25: 0, 'answer54': '', 'Springfield': '', 'answer55': '', 'Missouri': ''}, {'traits': '', 60: '', 'answer1': '', 2: ',', 'answer2': '', 'Poodle': '', 'answer3': '', 10: '', 'answer4': '', 'Coco': '', 'answer5': '', 'color': '', 'answer6': '', 'Cokes': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 3: ',', 'answer10': '', 'Adoration': '', 'answer11': '', 'lonely': '', 'answer12': '', 5: ',', 'answer13': '', 'Friend': '', 'answer14': '', 'everywhere': '', 'answer15': '', 'sessions': '', 'answer16': '', 18: 'on a designer doggy bed', ',': 'answer18', ':': 
2000, 'answer19': '', 'answer20': '', 70: '', 'answer21': '', 160: ',', 'answer22': '', 'March': '', 'answer23': '', 'Weekly': '', 'answer24': '', 'Mix': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 'answer28': '', 'answer29': '', 'answer30': '', 'answer31': '', 'answer32': '', 'answer33': '', 'Orijen': '', 'answer34': '', 'ingredients': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'more': '', 'answer44': '', 'community': '', 'answer45': '', 'Yes': '', 'answer46': '', 'answer47': '', 'Car': '', 'answer48': '', 'Female': '', 'answer49': '', 61: ',', 'answer50': '', 1: ',', 'answer51': '', 'degree': '', 'answer52': '', 'Self-employed': '', 'answer53': '', 100: 0, 149: 0, 'answer54': '', 'Diego': '', 'answer55': '', 'California': ''}, {'traits': '', 60: ',', 'answer1': '', 11: ',', 'answer2': '', 'Terrier': '', 'answer3': '', 7: ',', 'answer4': '', 'Pixie': '', 'answer5': '', 'appearance': '', 'answer6': '', 'Pix': '', 'answer7': '', 2020: ',', 'answer8': '', 'no': '', 'answer9': '', 12: ',', 'answer10': '', 'small': '', 'answer11': '', 'more': '', 'answer12': '', 5: ',', 'answer13': '', 'Member': '', 'answer14': '', 'thin': '', 'answer15': '', 'weeks': '', 'answer16': '', 13: 'her cushioned crate', ',': 'answer18', ':': 700, 'answer19': '', 15: '', 'answer20': '', 45: '', 'answer21': '', 'answer22': '', 'March': '', 'answer23': '', 'Bi-weekly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 4: ',', 'answer27': '', 3: ',', 'answer28': '', 'answer29': '', 'answer30': '', 'answer31': '', 1: ',', 'answer32': '', 'answer33': '', 'Iams': '', 'answer34': '', 'cost-effective': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'answer44': '', 'indoors': '', 'answer45': '', 'Sure': '', 'answer46': '', 'No': '', 'answer47': '', 
'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 72: ',', 'answer50': '', 2: ',', 'answer51': '', 'equivalent': '', 'answer52': '', 'Retired': '', 'answer53': '', 25: 0, 'answer54': '', 'Macon': '', 'answer55': '', 'Georgia': ''}, {'traits': '', 60: '', 'answer1': '', 4: ',', 'answer2': '', 'Collie': '', 'answer3': '', 45: ',', 'answer4': '', 'Shep': '', 'answer5': '', 'sheepdog': '', 'answer6': '', 'Sheppy': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 5: ',', 'answer10': '', 'protective': '', 'answer11': '', 'must': '', 'answer12': '', 'answer13': '', 'companion': '', 'answer14': '', 'play': '', 'answer15': '', 'months': '', 'answer16': '', 10: '', ',': 'answer18', ':': 1300, 'answer19': '', 'answer20': '', 55: '', 'answer21': '', 110: ',', 'answer22': '', 'March': '', 'answer23': '', 'Weekly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 'answer28': '', 'answer29': '', 3: ',', 'answer30': '', 'answer31': '', 'answer32': '', 2: ',', 'answer33': '', 'Acana': '', 'answer34': '', 'formula': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'change': '', 'answer44': '', 'outside': '', 'answer45': '', 'No': '', 'answer46': '', 'answer47': '', 'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 67: ',', 'answer50': '', 1: ',', 'answer51': '', 'degree': '', 'answer52': '', 'part-time': '', 'answer53': '', 25: 0, 49: 999, 'answer54': '', 'Topeka': '', 'answer55': '', 'Kansas': ''}, {'traits': '', 60: '', 'answer1': '', 10: ',', 'answer2': '', 'Chihuahua': '', 'answer3': '', 5: ',', 'answer4': '', 'Tiny': '', 'answer5': '', 'size': '', 'answer6': '', 'T': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 24: ',', 'answer10': '', 'sassiness': '', 'answer11': '', 'fiercely': '', 'answer12': '', 'answer13': '', 'Friend': '', 'answer14': '', 'shadow': '', 'answer15': 
'', 'days': '', 'answer16': '', 14: 'she curls up in the bed under the window', ',': 'answer18', ':': 550, 'answer19': '', 40: '', 'answer20': '', 30: '', 'answer21': '', 50: 0, 'answer22': '', 'March': '', 'answer23': '', 'Bi-weekly': '', 'answer24': '', 'Mix': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 'answer28': '', 'answer29': '', 2: ',', 'answer30': '', 'answer31': '', 'answer32': '', 1: ',', 'answer33': '', 'Cesar': '', 'answer34': '', 'store': '', 'answer35': '', 'answer36': '', 'answer37': '', 4: ',', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'change': '', 'answer44': '', 'influence': '', 'answer45': '', 'Yes': '', 'answer46': '', 'answer47': '', 'Car': '', 'answer48': '', 'Female': '', 'answer49': '', 69: ',', 'answer50': '', 'answer51': '', 'degree': '', 'answer52': '', 'Retired': '', 'answer53': '', 74: 999, 'answer54': '', 'Paso': '', 'answer55': '', 'Texas': ''}, {'traits': '', 60: '', 'answer1': '', 6: ',', 'answer2': '', 'Bulldog': '', 'answer3': '', 50: ',', 'answer4': '', 'Winston': '', 'answer5': '', 'demeanor': '', 'answer6': '', 'Win': '', 'answer7': '', 2020: ',', 'answer8': '', 'no': '', 'answer9': '', 3: ',', 'answer10': '', 'character': '', 'answer11': '', 'patient': '', 'answer12': '', 5: ',', 'answer13': '', 'Member': '', 'answer14': '', 'pet': '', 'answer15': '', 'so': '', 'answer16': '', 12: 'sprawled across the hallway rug', ',': 'answer18', ':': 1100, 'answer19': '', 15: '', 'answer20': '', 45: '', 'answer21': '', 90: ',', 'answer22': '', 'March': '', 'answer23': '', 'Monthly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 4: ',', 'answer27': '', 2: ',', 'answer28': '', 'answer29': '', 'answer30': '', 'answer31': '', 'answer32': '', 'answer33': '', 'Nutro': '', 'answer34': '', 'preference': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 
'answer43': '', 'less': '', 'answer44': '', 'independent': '', 'answer45': '', 'Sure': '', 'answer46': '', 'No': '', 'answer47': '', 'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 65: ',', 'answer50': '', 'answer51': '', 'degree': '', 'answer52': '', 'Retired': '', 'answer53': '', 'say': '', 'answer54': '', 'Charleston': '', 'answer55': '', 'Carolina': ''}, {'traits': '', 60: ',', 'answer1': '', 8: ',', 'answer2': '', 'Tzu': '', 'answer3': '', 12: ',', 'answer4': '', 'Gizmo': '', 'answer5': '', 'nature': '', 'answer6': '', 'Gizzy': '', 'answer7': '', 2020: ',', 'answer8': '', 'yes': '', 'answer9': '', 'answer10': '', 'fluffball': '', 'answer11': '', 'socializing': '', 'answer12': '', 5: ',', 'answer13': '', 'Friend': '', 'answer14': '', 'bond': '', 'answer15': '', 'praise': '', 'answer16': '', 15: 'on his favorite armchair', ',': 'answer18', ':': 950, 'answer19': '', 10: '', 'answer20': '', 55: '', 'answer21': '', 80: ',', 'answer22': '', 'March': '', 'answer23': '', 'Monthly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 4: ',', 'answer28': '', 'answer29': '', 3: ',', 'answer30': '', 'answer31': '', 1: ',', 'answer32': '', 'answer33': '', 'Diet': '', 'answer34': '', 'find': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': '', 'answer40': '', 'answer41': '', 'answer42': '', 'answer43': '', 'change': '', 'answer44': '', 'routine': '', 'answer45': '', 'Yes': '', 'answer46': '', 'Sure': '', 'answer47': '', 'Spring': '', 'answer48': '', 'Female': '', 'answer49': '', 64: ',', 'answer50': '', 'answer51': '', 'degree': '', 'answer52': '', 'full-time': '', 'answer53': '', 50: 0, 74: 999, 'answer54': '', 'Reno': '', 'answer55': '', 'Nevada': ''}, {'traits': '', 60: '', 'answer1': '', 1: ',', 'answer2': '', 'Shepherd': '', 'answer3': '', 30: ',', 'answer4': '', 'Blue': '', 'answer5': '', 'eyes': '', 'answer6': '', 'Boo': '', 'answer7': '', 2020: ',', 'answer8': '', 'no': '', 
'answer9': '', 2: ',', 'answer10': '', 'energy': '', 'answer11': '', 'purpose': '', 'answer12': '', 5: ',', 'answer13': '', 'companion': '', 'answer14': '', 'support': '', 'answer15': '', 'weeks': '', 'answer16': '', 14: 'on a runner in the hall', ',': 'answer18', ':': 1000, 'answer19': '', 'answer20': '', 65: '', 'answer21': '', 85: ',', 'answer22': '', 'March': '', 'answer23': '', 'Weekly': '', 'answer24': '', 'food': '', 'answer25': '', 'brand': '', 'answer26': '', 'answer27': '', 3: ',', 'answer28': '', 'answer29': '', 4: ',', 'answer30': '', 'answer31': '', 'answer32': '', 'answer33': '', 'Wild': '', 'answer34': '', 'amazing': '', 'answer35': '', 'answer36': '', 'answer37': '', 'answer38': '', 'answer39': ''}]
0.20.1
I see that in #46 you fixed the case where a comma follows the unescaped quote with some text in between the comma and the unescaped quote.
But sometimes we get results back from the LLM that look like the following:
{"notes": "Sent a message to the "dictator", waiting on response."}
which gets truncated:
{"notes": "Sent a message to the \"dictator"}
Add this line to test_object_edge_cases and run test again.
assert repair_json('{"key": "Lorem "ipsum", s"}') == '{"key": "Lorem \\"ipsum\\", s"}'
I would expect that repair_json('{"key": "Lorem "ipsum", s"}') == '{"key": "Lorem \\"ipsum\\", s"}'
0.25.2
As shown by the cases below, IDs 1, 4, and 5 failed during the repair.
input: {"na"me": "Jack O"Sullivan", "id": "1"}
output: {"na": "e", "Jack O": "ullivan", "id": "1"}
------------
input: {"name": "Jack: The "OG" O"Sullivan"", "id": "2"}
output: {"name": "Jack: The \"OG\" O\"Sullivan\"", "id": "2"}
------------
input: {"name": "Jack: The "OG"", "surname": 'O'Sullivan', "id": "3"}
output: {"name": "Jack: The \"OG\"", "surname": "O'Sullivan", "id": "3"}
------------
input: {"test_str": {"1singlechar": "a""a""a", "2singlechars": "a"a"a"a"a"a"a"a"a"}, "id": "4"}
output: {"test_str": {"1singlechar": "a\"", "a": "a", "2singlechars": "a\"a\"a\"a\"a\"a\"a\"a\"a"}, "id": "4"}
------------
input: {'name': 'Jack O'Sullivan, 'id': '5'}
output: {"name": "Jack O", "id": "5"}
------------
from json_repair import repair_json
req_jsons = [
'{"na"me": "Jack O"Sullivan", "id": "1"}',
'{"name": "Jack: The "OG" O"Sullivan"", "id": "2"}',
'{"name": "Jack: The "OG"", "surname": \'O\'Sullivan\', "id": "3"}',
'{"test_str": {"1singlechar": "a""a""a", "2singlechars": "a"a"a"a"a"a"a"a"a"}, "id": "4"}',
"{'name': 'Jack O'Sullivan, 'id': '5'}",
]
for bad_json_string in req_jsons:
    good_json_string = repair_json(bad_json_string, skip_json_loads=True)
    print(f"input: {bad_json_string}\noutput: {good_json_string}")
    print("------------")
input: {"na"me": "Jack O"Sullivan", "id": "1"}
output: {"na\me": "Jack O\"Sullivan", "id": "1"}
------------
input: {"name": "Jack: The "OG" O"Sullivan"", "id": "2"}
output: {"name": "Jack: The \"OG\" O\"Sullivan\"", "id": "2"}
------------
input: {"name": "Jack: The "OG"", "surname": 'O'Sullivan', "id": "3"}
output: {"name": "Jack: The \"OG\"", "surname": "O'Sullivan", "id": "3"}
------------
input: {"test_str": {"1singlechar": "a""a""a", "2singlechars": "a"a"a"a"a"a"a"a"a"}, "id": "4"}
output: {"test_str": {"1singlechar": "a\"\"a\"\"a", "2singlechars": "a\"a\"a\"a\"a\"a\"a\"a\"a"}, "id": "4"}
------------
input: {'name': 'Jack O'Sullivan, 'id': '5'}
output: {"name": "Jack O'Sullivan", "id": "5"}
Describe the bug
Wrong result when parsing JSON with trailing text.
To Reproduce
The following code should return {'a': '', 'b': [{'c': 1}]}
json_repair.loads("""{"a": "", "b": [ { "c": 1} ]}```""")
# This is parsed to {'a': ', "b'}
json_repair.loads("""{ "a": "", "b": [ { "c": 1} ] \n}```""")
# This will raise exception TypeError: unhashable type: 'list'
Describe the bug
Hi, I was testing the json_repair module for a personal project that extracts information from a medical text and asks an LLM to fill in a JSON. The LLM I'm using is not great at returning well-formatted JSON, so I was wondering how this module might help. However, I noticed that when the ill-formatted JSON has objects with extra closing brackets, the parser stops altogether and assumes the JSON has ended, thus cutting off information.
To Reproduce
The ill-formatted JSON string is:
{"claimant_info":{"name":"John Doe","gender":"male","dominant_hand":"right-handed","date_of_birth":"01/01/2000"},"employment_info":{"occupation":"bank clerk","hours_per_week":0,"was_at_workplace_at_time_of_accident":false,"absence_not_working":[{"type":"sleep disturbance and frequent headaches","duration":""}],"work_restrictions":[{"type":""}]},"past_medical_history":[{"disease_or_pathology":"High cholesterol","text_span":""}]},"recovery_time":[{"body_part":"chest, neck, and back","recovery_time_in_days":"3-4 weeks from 1st treatment date or 9 to 12 visits whichever comes first","text_span":""}]},"dates":{"accident_date":"3/20/2021","examination_date":"3/26/2021","next_examination_date":"04/09/2021","signing_date":"3/26/2021 4:54:17 PM"}}
My test code:
from json_repair import json_repair
import json
jsonString = """{"claimant_info":{"name":"John Doe","gender":"male","dominant_hand":"right-handed","date_of_birth":"01/01/2000"},"employment_info":{"occupation":"bank clerk","hours_per_week":0,"was_at_workplace_at_time_of_accident":false,"absence_not_working":[{"type":"sleep disturbance and frequent headaches","duration":""}],"work_restrictions":[{"type":""}]},"past_medical_history":[{"disease_or_pathology":"High cholesterol","text_span":""}]},"recovery_time":[{"body_part":"chest, neck, and back","recovery_time_in_days":"3-4 weeks from 1st treatment date or 9 to 12 visits whichever comes first","text_span":""}]},"dates":{"accident_date":"3/20/2021","examination_date":"3/26/2021","next_examination_date":"04/09/2021","signing_date":"3/26/2021 4:54:17 PM"}}"""
repaired = json_repair.loads(jsonString)
output = json.dumps(repaired, indent=2)
with open("output.txt","w") as f:
    f.write(output)
Expected behavior
I guess the expected behavior should be that if the extra closing bracket is followed by a comma, the parser should infer that this bracket is misplaced.
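To make the failure mode above concrete, here is a minimal sketch that locates the first closing bracket with no matching opener. The helper name `first_unmatched_closer` is hypothetical and is not part of json_repair; it only illustrates how a scanner could notice that the document "ends" too early.

```python
from typing import Optional

# Map each closing bracket to the opener it must match.
PAIRS = {"}": "{", "]": "["}


def first_unmatched_closer(text: str) -> Optional[int]:
    """Return the index of the first closing bracket with no matching opener, or None."""
    stack = []
    in_string = False
    escaped = False
    for index, char in enumerate(text):
        if in_string:
            # Inside a string, brackets are plain characters; only track escapes and the end quote.
            if escaped:
                escaped = False
            elif char == "\\":
                escaped = True
            elif char == '"':
                in_string = False
        elif char == '"':
            in_string = True
        elif char in "{[":
            stack.append(char)
        elif char in PAIRS:
            if not stack or stack[-1] != PAIRS[char]:
                return index
            stack.pop()
    return None


sample = '{"a": [{"b": 1}]}, "c": [2]}, "d": {}}'
position = first_unmatched_closer(sample)
print(position, sample[position])
```

In the sample, the root object closes after the first array, so the `}` that follows `"c": [2]` has nothing left to close. A repair strategy could treat that position as a misplaced bracket rather than as the end of the document, but that policy is an assumption here, not the library's documented behavior.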
Describe the bug
For a real number without a preceding zero, the value is converted to a whole number.
Example:
.25 -> 25
Should be:
.25 -> 0.25
Additional context
I encountered this output from an AI model.
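A minimal sketch of the expected conversion, assuming a hypothetical helper `parse_number_token` (not json_repair's own code). Python's built-in `float` already accepts a leading dot, so a token such as `.25` can be converted without dropping the fractional part.

```python
def parse_number_token(token: str):
    """Hypothetical helper: convert a numeric token to int or float."""
    try:
        # A dot or exponent marks a real number; float(".25") yields 0.25.
        if any(ch in token for ch in ".eE"):
            return float(token)
        return int(token)
    except ValueError:
        # Not a number after all; hand the raw text back unchanged.
        return token


print(parse_number_token(".25"))  # 0.25
print(parse_number_token("25"))   # 25
```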
Describe the bug
We have an issue here: OpenDevin/OpenDevin#495
The LLM response tries to escape underscores. So the key new_monologue becomes new\_monologue in the LLM response. json_repair double-escapes the backslash, instead of removing it.
This behavior, where the LLM attempts to escape underscores, seems not uncommon. Maybe we could have a special pattern for replacing \_ with _?
Expected behavior
Escape characters removed
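One way to picture the requested behavior is a pre-processing pass that runs before any parsing. The sketch below uses a hypothetical helper `unescape_underscores`; it is an illustration of the idea, not something json_repair is known to do.

```python
def unescape_underscores(text: str) -> str:
    """Hypothetical pre-processing step: replace the invalid escape \\_ with _."""
    return text.replace("\\_", "_")


raw_response = r'{"new\_monologue": "hello"}'
print(unescape_underscores(raw_response))  # {"new_monologue": "hello"}
```

A blanket replacement like this would also rewrite a legitimately escaped backslash that happens to be followed by an underscore, so a real implementation would probably need to be more careful.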
Describe the bug
(This might be too hard and out of scope depending on the implementation. If so, feel free to close)
The repair removes whitespace, so that if the LLM responds with:
{
"foo": "bar"
}
repair "corrects" it to
{"foo":"bar"}
This makes it hard to detect if an actual repair was made.
To Reproduce
repair JSON with whitespace
Expected behavior
whitespace is preserved
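If the goal is only to detect whether a repair changed the content, comparing parsed values instead of raw strings sidesteps the whitespace question. A minimal sketch using only the standard library; the function name `was_repaired` is made up for illustration.

```python
import json


def was_repaired(original: str, repaired: str) -> bool:
    """Return True if the repaired text differs from the original in content, not just layout."""
    try:
        original_value = json.loads(original)
    except json.JSONDecodeError:
        # The original was not valid JSON, so some repair must have happened.
        return True
    # Both parse: compare the decoded values, which ignores whitespace differences.
    return original_value != json.loads(repaired)


print(was_repaired('{\n  "foo": "bar"\n}', '{"foo":"bar"}'))  # False
print(was_repaired('{"foo": "bar"', '{"foo": "bar"}'))        # True
```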
Describe the bug
If the opening quote of a JSON string value is missing, the value is not repaired properly.
To Reproduce
Steps to reproduce the behavior:
Run this json file with json_repair
{
"words": abcdef",
"numbers": 12345",
"words2": ghijkl"
}
Gets parsed like this
{'words': 'abcdef', 'numbers': 12345, ',\n ': 'ords2', 'ghijkl': ''}
Expected behavior
Proper output:
{'words': 'abcdef', 'numbers': '12345', 'words2': 'ghijkl'}
0.26.0
It looks like repairing the object leads to escaped double quotes getting an extra backslash.
Run the following snippet:
import json_repair
a = '{"foo": "\\"bar\\""}'
print(json_repair.loads(a))
# {'foo': '"bar"'}
# => OK!
b = """{
"items": [
{
"foo": "\\"bar\\""
}
"""
print(json_repair.loads(b))
# {'items': [{'foo': '\\"bar"'}]}
# => KO, expected {'items': [{'foo': '"bar"'}]}
c = """{
"items": [
{
"foo": "\\"bar\\""
}
]
}"""
print(json_repair.loads(c))
# {'items': [{'foo': '"bar"'}]}
# => OK!
No extra backslash in the parsed string
0.25.1
Broken Json :
{
"name": "Mike",
"age": 29,
"is_student": "false",
"bio": "Loves to read and play guitar",
"hobbies": ["Reading" "Playing guitar" "Swimming"]
}
Repaired Json:
{
"age": 29,
"bio": "Loves to read and play guitar",
"hobbies": [
"Reading" "Playing guitar" "Swimming"
],
"is_student": "false",
"name": "Mike"
}
This is not how the repaired JSON should look; I was expecting it to look like the one below:
{
"age": 29,
"bio": "Loves to read and play guitar",
"hobbies": [
"Reading", "Playing guitar" ,"Swimming"
],
"is_student": "false",
"name": "Mike"
}
Although the library works fine in the case of an array of integers
{
"name": "Mike",
"age": 29,
"is_student": "false",
"bio": "Loves to read and play guitar",
"hobbies": ["Reading" "Playing guitar" "Swimming"]
}
Describe the bug
The following code throws TypeError: unhashable type: 'dict'.
Notice that the JSON string is malformed (unmatched double quotes); however, we should not throw an exception in such cases.
To Reproduce
json_repair.loads('''{ "a": "aa", "de": "{ asdf": {} }" }''')
An exception is thrown for a specific input string
To reproduce:
good_json = repair_json(' - { "test_key": ["test_value", "test_value2"] }')
Expected behavior
Should convert string ' - { "test_key": ["test_value", "test_value2"] }' into '{ "test_key": ["test_value", "test_value2"] }'
Exception
File "...\venv\Lib\site-packages\json_repair\json_repair.py", line 251, in parse_number
return int(number_str)
^^^^^^^^^^^^^^^
ValueError: invalid literal for int() with base 10: '-'
0.25.0
Hi! Really appreciate this library :)
Somehow it's happened a few times recently that output from an LLM results in a RecursionError: maximum recursion depth exceeded when passed into repair_json. Unfortunately I don't have the specific output for those cases, but I've been able to reproduce one case.
An input to repair_json() where there are at least 2972 sequential characters that don't contain valid JSON (for instance a paragraph of text) will result in this error.
Examples:
repair_json("a" * 2972)
repair_json('{"key": "value"}' + ("a" * 2972))
repair_json(("a" * 2972) + '{"key": "value"}' + ("b" * 2972))
paragraph = """
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Proin tincidunt laoreet lorem, ac posuere sapien luctus ut. Etiam volutpat vehicula dolor sit amet aliquet. Maecenas id maximus velit. Phasellus velit justo, consequat et tristique ac, tincidunt sed ligula. Cras ut auctor enim. Ut interdum euismod risus id posuere. Pellentesque habitant morbi tristique senectus et netus et malesuada fames ac turpis egestas. Vestibulum cursus felis massa, id faucibus nisl commodo vitae. Maecenas ex ipsum, consequat a eleifend sed, lacinia id nibh. Pellentesque semper ultrices nunc sit amet tincidunt. Integer pulvinar mi magna, a ultrices sem euismod vitae. Nullam odio turpis, suscipit eget viverra a, rutrum nec tellus.
Curabitur vitae tincidunt lorem, id tincidunt massa. Nam mi massa, accumsan sit amet tellus in, venenatis facilisis est. Sed eu risus fermentum, varius nulla ac, ullamcorper lacus. Nulla facilisi. Praesent a ex nunc. Integer iaculis elit vitae libero pretium elementum. Nullam eu leo vitae neque ullamcorper fermentum a sed tellus. Ut sollicitudin, nibh a faucibus suscipit, enim dolor sodales ante, a accumsan neque diam a justo. Mauris vel orci vitae tellus iaculis dictum id in magna. Duis auctor id dui eget iaculis. Sed quis massa commodo, aliquet tellus quis, tristique nisl.
In luctus tempus quam tempus vulputate. Maecenas laoreet arcu diam, sed bibendum sapien egestas vitae. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Phasellus laoreet ipsum non ante cursus imperdiet. Interdum et malesuada fames ac ante ipsum primis in faucibus. Aenean laoreet accumsan mollis. Proin feugiat, lacus non congue tincidunt, erat arcu tincidunt metus, eget dignissim ante quam ut diam. Vivamus luctus aliquam placerat. Fusce risus ante, porta ac molestie at, laoreet et odio. Sed quis facilisis magna. Vestibulum sagittis nunc tellus, iaculis ultricies est cursus vitae. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Duis posuere venenatis posuere. Sed a gravida est, sit amet condimentum massa.
Morbi efficitur aliquam dui a imperdiet. Duis lacus enim, interdum a orci nec, varius porttitor est. Donec vel mi eu mauris sagittis hendrerit sed quis nunc. Vestibulum tortor leo, pulvinar in dignissim ut, ultricies sit amet est. Morbi et viverra magna, eu lacinia nisl. Cras vitae tincidunt dui, vel lobortis velit. Suspendisse tristique imperdiet odio, ac sodales velit pulvinar at. Sed diam enim, imperdiet sit amet mi sollicitudin, rutrum condimentum leo. In id est quis diam pellentesque pharetra sit amet eget tortor.
Etiam vehicula massa quam, sit amet consequat tellus tincidunt vitae. Nam semper ex ut hendrerit pretium. Nam eleifend tincidunt lectus, ut consectetur orci mattis id. Mauris eu sapien id turpis ullamcorper facilisis vitae nec mi. Ut metus augue, mollis nec faucibus sed, malesuada quis ipsum. Vivamus sit amet odio orci. Vestibulum ante ipsum primis in faucibus orci luctus et ultrices posuere cubilia curae; Donec posuere lectus risus, pharetra vestibulum turpis vulputate a. Nunc quis felis ullamcorper, blandit purus sit amet, ultricies nibh. Nam vitae sem imperdiet, posuere ex nec, rutrum metus. Aliquam congue magna id ultrices hendrerit.
"""
repair_json(paragraph)
It would be fine for my use case to just return an empty string or None if this error is caught, or a certain recursion depth is reached.
Is your feature request related to a problem? Please describe.
I'd prefer to be able to pass a file name to the function and have it automatically read the content and load it into the function, rather than implementing the logic myself.
Describe the solution you'd like
I'm willing to implement this in a PR using argparse if that's good with you. It will simplify my use case, in which the Python library is programmatically invoked from a Bash file.
Sometimes I've gotten responses from LLMs that look like this:
```json
{
"msg": "test"
}
```
and this:
```
{
"msg": "test"
}
```
Do you think it makes sense to add markdown backtick stripping to this library? =)
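For illustration, stripping a fence that wraps the whole response can be done with one regular expression. This is a sketch under the assumption that the fence opens on the first line and closes on the last; `strip_code_fence` is a hypothetical name, not a json_repair function.

```python
import re

FENCE = "`" * 3  # three backticks, spelled this way so the example stays readable

# Opening fence with an optional language tag, the body, then the closing fence.
_FENCE_RE = re.compile(
    r"^\s*" + FENCE + r"[\w-]*[ \t]*\n(.*?)\n?[ \t]*" + FENCE + r"\s*$",
    re.DOTALL,
)


def strip_code_fence(text: str) -> str:
    """Return the body of a fenced block, or the stripped input if there is no fence."""
    match = _FENCE_RE.match(text)
    return match.group(1).strip() if match else text.strip()


fenced = FENCE + 'json\n{\n"msg": "test"\n}\n' + FENCE
print(strip_code_fence(fenced))
```

The pattern deliberately ignores fences that appear in the middle of surrounding prose; handling those would need a search rather than a full match.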
0.25.3
Hi there!
Just upgraded from v0.13 to current release and found that loading a raw string does not work anymore:
>>> json_repair.loads('test')
''
I expected
>>> json_repair.loads('test')
'test'
as a fallback. This breaks my setup as I now have to check myself somehow if parameter is just a raw string without any JSON object definition. Is this a bug or are there any good reasons for that change?
Thanks!
0.19.2
Within a string with an unescaped quote followed at a later point by a comma, the string gets truncated after the second " character in the unescaped quote within the string. If this string is at the end of the JSON object and the string is not immediately followed by } (i.e. is followed by whitespace or e.g. a comma), then the final word in the string is parsed as a key with an empty (string) value.
This seems to relate to #44, but it seems the attempted fix for that bug report didn't fully resolve this.
For
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"}')
the recovered JSON is:
{
"lorem": "Lorem \"ipsum"
}
For any of the following examples
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum" }')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum"\n}')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint, suntid est laborum",}')
the recovered JSON is:
{
"lorem": "Lorem \"ipsum",
"laborum": ""
}
Removing the comma, the output matches what we'd expect:
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum"}')
>>> repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum" }')
yields
{
"lorem": "Lorem \"ipsum\" excepteur sint suntid est laborum"
}
>>> print(repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum"}'))
{"lorem": "Lorem \"ipsum\" excepteur sint, suntid est laborum"}
>>> print(repair_json('{"lorem": "Lorem "ipsum" excepteur sint suntid est laborum" }'))
{"lorem": "Lorem \"ipsum\" excepteur sint, suntid est laborum"}
Describe the bug
The library works great for simple JSON responses from LLMs, but for some responses it misses the content completely.
The results that this package returned to me
{ \"Summary"
Expected return
{
"Summary": "The customer, Joey, contacted Avanser to inquire about a specific vehicle model. He was interested in purchasing a silver-colored car and wanted to know if it was available. The agent checked the inventory and found that they had a similar model with a purple color. After discussing the availability of the silver model, the agent offered to allocate one of their salespeople to call Joey back to discuss further. The customer expressed his satisfaction with the service and requested doorstep delivery nationwide.",
"Brand": "JD",
"Model": "Silver One on the Back One" (Note: This is not a real vehicle model, but rather a description provided by the customer),
"Primary topic": "Vehicle Availability",
"Primary topic explanation": "The customer wanted to know if a specific silver-colored car was available for purchase.",
"Secondary topic": "Trade-in Options and Financing",
"Secondary topic explanation": "The customer mentioned that he had cash and was interested in trading in his current vehicle, but the agent clarified that they did not have the necessary information on hand.",
"Issue resolution": "Partially resolved",
"Issue resolution explanation": "The agent checked the inventory and found a similar model with a purple color, but could not confirm the availability of the silver model. The customer was offered an alternative solution to discuss further with one of their salespeople."
}
Describe the bug
The valid JSON {"foo": "bar \"foo\", baz"} gets turned into the broken JSON {"foo": "bar \\"foo"} when using repair_json.
I think this is related to the escaped quotes and comma.
To Reproduce
Steps to reproduce the behavior:
repair_json('{"foo": "bar \"foo\", baz"}')
{"foo": "bar \\"foo"}
Expected behavior
Correct output {"foo": "bar \"foo\", baz"}
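One detail worth checking when reproducing this: in a regular Python string literal, `\"` is itself an escape sequence, so the backslashes never reach the function. The sketch below uses only the standard library to show the difference between a plain and a raw literal; whether this explains the reported output is an assumption, not something the report confirms.

```python
import json

plain = '{"foo": "bar \"foo\", baz"}'  # Python consumes the backslashes here
raw = r'{"foo": "bar \"foo\", baz"}'   # a raw literal keeps them

print(plain)            # {"foo": "bar "foo", baz"}
print(raw)              # {"foo": "bar \"foo\", baz"}
print(json.loads(raw))  # {'foo': 'bar "foo", baz'}
```

Only the raw literal is valid JSON, so a reproduction meant to pass valid JSON would need the `r` prefix or doubled backslashes.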
Describe the bug
First, sorry for my English.
This package uses the := operator. That operator was introduced in python3.8, so python3.7 is not supported. However, the package installs successfully on python3.7, and pyproject.toml declares support for python3.7.
Either the minimum version should be set to python3.8, or the := operator should not be used.
Line 14 in eaf67ab
To Reproduce
Steps to reproduce the behavior:
Run the example code below under python3.7.
Expected behavior
It should either run successfully on python3.7 or fail to install on python3.7.
Additional context
Example code(in python3.7):
import json
from json_repair import repair_json
json_string = r'{"a": 1 }{}'
json.loads(repair_json(json_string))
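For readers unfamiliar with the operator in question, here is a small illustration of an assignment expression next to an equivalent form that also parses on python3.7. The regular expression and variable names are made up for the example and are unrelated to the library's source.

```python
import re

line = '{"a": 1 }{}'

# Python 3.8+ only: the assignment happens inside the condition.
if (match := re.search(r"\d+", line)) is not None:
    found_38 = match.group()

# Equivalent form that is also valid syntax on Python 3.7.
match = re.search(r"\d+", line)
if match is not None:
    found_37 = match.group()

print(found_38 == found_37)  # True
```

On python3.7 the first form is a SyntaxError at import time, which is why the failure shows up even though installation succeeds.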
Describe the bug
The following "broken" json:
[
{
"foo": "Foo bar baz",
"tag": "#foo-bar-baz"
},
{
"foo": "foo bar "foobar" foo bar baz.",
"tag": "#foo-bar-foobar"
}
]
is repaired well by: https://josdejong.github.io/jsonrepair/
but not by this library.
To Reproduce
>>> bad_json
'[\n {\n "foo": "Foo bar baz",\n "tag": "#foo-bar-baz"\n },\n {\n "foo": "foo bar "foobar" foo bar baz.",\n "tag": "#foo-bar-foobar"\n }\n]'
>>> json_repair.loads(bad_json)
[{'foo': 'Foo bar baz', 'tag': '#foo-bar-baz"\n },\n {\n "foo', 'foo bar "foobar" foo bar baz.': 'tag', '#foo-bar-foobar': ''}]
Expected behavior
Expected output:
[
{
"foo": "Foo bar baz",
"tag": "#foo-bar-baz"
},
{
"foo": "foo bar \"foobar\" foo bar baz.",
"tag": "#foo-bar-foobar"
}
]
(as per https://josdejong.github.io/jsonrepair/)
output instead:
[{'foo': 'Foo bar baz', 'tag': '#foo-bar-baz"\n },\n {\n "foo', 'foo bar "foobar" foo bar baz.': 'tag', '#foo-bar-foobar': ''}]
Describe the bug
Issue with parsing when there is leading text
To Reproduce
json_repair.loads("Based on the information extracted, here is the filled JSON output: ```json { 'a': 'b' } ```")
# this returns the same string inputted to the function
Expected behavior
It returns { 'a': 'b' }
I've noticed that the repair works well with trailing text, e.g.
json_repair.loads("```json { 'a': 'b' } ``` This output reflects the information given in the input.")
# returns {'a': 'b'} as expected
Thanks for the library! :)
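Until leading text is handled, one caller-side workaround is to cut the input down to the braced span before repairing it. A minimal sketch with a hypothetical helper `extract_braced_span`; it assumes the payload is a single object and that no stray braces appear in the surrounding prose.

```python
def extract_braced_span(text: str) -> str:
    """Hypothetical helper: return the text from the first '{' to the last '}'."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        return text  # no usable braces found; leave the input untouched
    return text[start : end + 1]


reply = "Here is the filled JSON output: { 'a': 'b' } This reflects the input."
print(extract_braced_span(reply))  # { 'a': 'b' }
```

The extracted span still has single quotes, so it would need to go through a repair step afterwards; the helper only removes the surrounding text.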
0.20.0
After parsing the "graphics" object is not in the correct hierarchy.
Also, the bonus with bool variables does not seem to be parsed correctly.
Use json_repair.loads() with following json:
https://raw.githubusercontent.com/vcmi-mods/tides-of-war/vcmi-1.5/Mods/alternative-creatures/content/config/creatures/rampart/dryad.json
Compare object for example with json5 library output.
When I try the example of:
import sys
import json_repair
from json_repair import repair_json
JSON_PATH = "/Path/To/JSON/File.txt"
print(sys.version)
try:
    file_descriptor = open(JSON_PATH, 'r')
except OSError:
    ...
try:
    with file_descriptor:
        decoded_object = json_repair.load(file_descriptor)
except Exception as e:
    print("Repairing logfile failed")
    print(f"An exception occurred: {e}")
I get returned:
3.9.2 (default, Feb 28 2021, 17:03:44)
[GCC 10.2.1 20210110]
Repairing logfile failed
An exception occurred: '_io.TextIOWrapper' object has no attribute 'strip'
In this case the JSON file that is referenced with JSON_PATH is valid. I've also tried breaking it but I always get this exception.
As you can see I'm running Python 3, not sure if that could cause the problem?
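The error message suggests a file object reached code that expected a string. A possible workaround, sketched here under that assumption, is to read the file's text first and pass the string to a string-based parser. `load_via_text` is a hypothetical helper; it defaults to the standard library's `json.loads` so the example is self-contained.

```python
import json
import os
import tempfile
from typing import Any, Callable


def load_via_text(path: str, parse: Callable[[str], Any] = json.loads) -> Any:
    """Read the whole file as text, then hand the string to `parse`."""
    with open(path, "r", encoding="utf-8") as handle:
        return parse(handle.read())


# Self-contained demonstration using a temporary file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8"
) as tmp:
    tmp.write('{"status": "ok"}')
print(load_via_text(tmp.name))  # {'status': 'ok'}
os.remove(tmp.name)
```

With json_repair installed, passing `parse=json_repair.loads` should route the text through the string-based entry point used elsewhere in these reports.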
# test
gpt_content = """```json
{
"Name": {
"en": "Jia-Ming Li",
"zh": "李家明",
"de": "Jia-Ming Li"
},
"Contact": {
"en": "Phone: 010-62788597\nFax: 010-62788597\nEmail: [email protected]",
"zh": "电话:010-62788597\n传真:010-62788597\n电子邮件:[email protected]",
"de": "Telefon: 010-62788597\nFax: 010-62788597\nE-Mail: [email protected]"
},
"Language Ability": {
"en": [
"English",
"Chinese"
],
"zh": [
"英语",
"中文"
],
"de": [
"Englisch",
"Chinesisch"
]
},
"Province": {
"en": "Beijing",
"zh": "北京",
"de": "Peking"
},
"Title": {
"en": "Professor, Director of the Center for Atomic and Molecular Nanosciences, Tsinghua University",
"zh": "教授,清华大学原子分子纳米科学研究中心主任",
"de": "Professor, Direktor des Zentrums für atomare und molekulare Nanowissenschaften, Tsinghua-Universität"
},
"Academic Background & Achievements": {
"en": [
"1968 - B.S. in Engineering, Taiwan University",
"1974 - Ph.D., University of Chicago",
"Academician, Chinese Academy of Sciences"
],
"zh": [
"1968 - **大学工程学士",
"1974 - 美国芝加哥大学博士",
"**科学院院士"
],
"de": [
"1968 - B.S. in Ingenieurwissenschaften, Universität Taiwan",
"1974 - Ph.D., Universität von Chicago",
"Akademiker, Chinesische Akademie der Wissenschaften"
]
},
"Work Experience": {
"en": [
"1974 - Research Associate, Department of Physics, University of Chicago",
"1975-1976 - Research Associate, Department of Physics and Astronomy, University of Pittsburgh",
"1977-1978 - Senior Research Associate, Laser Energy Research Institute, University of Rochester",
"1979-1982 - Associate Researcher, Institute of Physics, Chinese Academy of Sciences",
"1983-Present - Researcher, Institute of Physics, Chinese Academy of Sciences",
"1997-Present - Professor, Director of the Center for Atomic and Molecular Nanosciences, Department of Physics, Tsinghua University",
"2003-Present - Professor, Department of Physics, Shanghai Jiao Tong University"
],
"zh": [
"1974 - 美国芝加哥大学物理系,研究助理",
"1975-1976 - 美国匹兹堡大学物理天文系,研究助理",
"1977-1978 - 美国罗彻斯特大学激光能量研究所,高级研究助理",
"1979-1982 - **科学院物理研究所,副研究员",
"1983至今 - **科学院物理研究所,研究员",
"1997至今 - 清华大学物理系,原子分子纳米科学研究中心,教授,中心主任",
"2003至今 - 上海交通大学物理系,教授"
],
"de": [
"1974 - Forschungsassistent, Abteilung für Physik, Universität Chicago",
"1975-1976 - Forschungsassistent, Abteilung für Physik und Astronomie, Universität Pittsburgh",
"1977-1978 - Senior Research Associate, Laser Energy Research Institute, Universität Rochester",
"1979-1982 - Associate Researcher, Institut für Physik, Chinesische Akademie der Wissenschaften",
"1983-heute - Forscher, Institut für Physik, Chinesische Akademie der Wissenschaften",
"1997-heute - Professor, Direktor des Zentrums für atomare und molekulare Nanowissenschaften, Abteilung für Physik, Tsinghua-Universität",
"2003-heute - Professor, Abteilung für Physik, Shanghai Jiao Tong Universität"
]
},
"Awards": {
"en": [
"1986 - Kastler Prize, International Centre for Theoretical Physics",
"1990 - Second Class Prize, Natural Science, Chinese Academy of Sciences",
"1991 - Outstanding Young Expert, Chinese Academy of Sciences",
"1992 - Second Class Prize, Natural Science, Chinese Academy of Sciences",
"1994 - Advanced Individual in Scientific Research under the 863 Program, Ministry of Science and Technology of China",
"2001 - Advanced Individual Award for the 15th Anniversary of the 863 Program, Ministry of Science and Technology of China"
],
"zh": [
"1986 - 国际理论物理中心的 Kastler 奖",
"1990 - **科学院自然科学奖二等奖",
"1991 - **科学院有突出贡献的中青年专家",
"1992 - **科学院自然科学奖二等奖",
"1994 - 国防科学技术工业委员会评为“在863计划科研工作中的先进个人”",
"2001 - **人民解放军总装备部和国家科技部授予“863计划十五周年先进个人奖”"
],
"de": [
"1986 - Kastler-Preis, Internationales Zentrum für Theoretische Physik",
"1990 - Zweiter Klasse Preis, Naturwissenschaften, Chinesische Akademie der Wissenschaften",
"1991 - Herausragender junger Experte, Chinesische Akademie der Wissenschaften",
"1992 - Zweiter Klasse Preis, Naturwissenschaften, Chinesische Akademie der Wissenschaften",
"1994 - Fortgeschrittene Einzelperson in der wissenschaftlichen Forschung unter dem 863-Programm, Ministerium für Wissenschaft und Technologie von China",
"2001 - Fortgeschrittene Einzelperson Auszeichnung zum 15. Jahrestag des 863-Programms, Ministerium für Wissenschaft und Technologie von China"
]
},
"Areas of Focus": {
"en": [
"Atomic and molecular physics",
"Computational physics",
"Theoretical physics",
"Nanoscience"
],
"zh": [
"原子分子物理",
"计算物理",
"理论物理",
"纳米科学"
],
"de": [
"Atom- und Molekülphysik",
"Computational Physik",
"Theoretische Physik",
"Nanowissenschaft"
]
},
"Keywords for Area of Focus": {
"en": [
"quantum theory", "computational methods", "atomic properties", "molecular systems", "clusters", "physical properties", "dynamics", "theoretical calculations", "nanotubes", "semiconductors"
],
"zh": [
"量子理论", "计算方法", "原子属性", "分子系统", "团簇", "物理属性", "动力学", "理论计算", "纳米管", "半导体"
],
"de": [
"Quantentheorie", "Berechnungsmethoden", "atomare Eigenschaften", "molekulare Systeme", "Cluster", "physikalische Eigenschaften", "Dynamik", "theoretische Berechnungen", "Nanoröhren", "Halbleiter"
]
},
"Publications": {
"en": [
{
"Title": "Spectroscopy and Collision Theory: The Ar Absorption Spectrum",
"Author": "C.M.Lee (Jia-Ming Li), K.T.Lu",
"Publish Date": "1973-01-01"
},
{
"Title": "Variational Calculation of R-matrix: Application to Ar Photoabsorption",
"Author": "U.Fano, C.M.Lee (Jia-Ming Li)",
"Publish Date": "1973-01-01"
},
{
"Title": "Spectroscopy and Collision Theory: Atomic Eigenchannel Calculation by a Hartree-Fock-Roothaan Method",
"Author": "C.M.Lee (Jia-Ming Li)",
"Publish Date": "1974-01-01"
},
{
"Title": "Spin Polarization and Angular Distribution of Photoelectrons in Jacob-Wick Helicity Formalism: Application to Autoionzation Resonances",
"Author": "C.M.Lee (Jia-Ming Li)",
"Publish Date": "1974-01-01"
},
{
"Title": "Multichannel Photodetachment Theory",
"Author": "C.M.Lee (Jia-Ming Li)",
"Publish Date": "1975-01-01"
},
{
"Title": "Comment on Structure near the Cut-off Of the Continuous X-ray Spectrum of Lanthanum",
"Author": "C.M.Lee (Jia-Ming Li), R.H.Pratt",
"Publish Date": "1975-01-01"
},
{
"Title": "Radiative Capture of High-energy Electrons",
"Author": "C.M.Lee (Jia-Ming Li), R.H.Pratt",
"Publish Date": "1975-01-01"
},
{
"Title": "The Electron Bremsstrahlung Spectrum 1---500 keV",
"Author": "C.M.Lee (Jia-Ming Li), L.Kissel, R.H.Pratt, H.K.Tseng",
"Publish Date": "1976-01-01"
},
{
"Title": "Radiative Electron Capture by Mo Ions",
"Author": "C.M.Lee (Jia-Ming Li), R.H.Pratt",
"Publish Date": "1976-01-01"
},
{
"Title": "Multichannel Dissociative Recombination Theory",
"Author": "C.M.Lee (Jia-Ming Li)",
"Publish Date": "1977-01-01"
},
{
"Title": "Application of Low Energy Theorem in Electron Bremsstrahlung",
"Author": "R.H.Pratt, C.M.Lee (Jia-Ming Li)",
"Publish Date": "1977-01-01"
},
{
"Title": "Bremsstrahlung Spectrum from Atomic Ions",
"Author": "C.M.Lee (Jia-Ming Li), R.H.Pratt, H.K.Tseng",
"Publish Date": "1977-01-01"
},
{
"Title": "Radiative Charge Exchange Process in High-energy Ion-Atom Collisions",
"Author": "C.M.Lee (Jia-Ming Li)",
"Publish Date": "1978-01-01"
},
{
"Title": "On the Dispresion Relation for Electron-Atom Scattering",
"Author": "E.Gerjuoy, C.M.Lee (Jia-Ming Li)",
"Publish Date": "1978-01-01"
},
{
"Title": "Properties of Matter at High Pressures and Temperatures",
"Author": "C.M.Lee (Jia-Ming Li), E.Thorsos",
"Publish Date": "1978-01-01"
},
{
"Title": "Bremsstrahlung Energy Spectra from Electrons of Kinetic Energy 1keV~$\le$~T~$\le$~200~keV incident on Neutral Atoms 2~$\le$~Z~$\le$~92",
"Author": "R.H.Pratt,H.K.Tseng, C.M.Lee (Jia-Ming Li), L.kissel",
"Publish Date": "1977-01-01"
},
{
"Title": "Measurement of Compressed Core Density of Laser-imploded Target by X-ray Continuum Edge Shift",
"Author": "C.M.Lee (Jia-Ming Li), A.Hauer",
"Publish Date": "1978-01-01"
},
{
"Title": "Electron Bremsstrahlung Angular Distribution in the 1---500 keV Energy Range",
"Author": "H.K.Tseng, R.H.Pratt, C.M.Lee (Jia-Ming Li)",
"Publish Date": "1979-01-01"
},
{
"Title": "Explosive-pusher-type Laser Compression Experiment with Neon-filled Microballons",
"Author": "B.Yaakobi, D.Steel, E.Thorsos, A.Hauer, B.Perry, S.Skupsky, J.Geiger, C.M.Lee (Jia-Ming Li), S.Letzring, J.Rizzo, T.Mukaiyama, E.Lazarus, G.Halpern, H.Deckman, J.Delettrez",
"Publish Date": "1979-01-01"
},
{
"Title": "Relativistic Random Phase Approximation",
"Author": "W.R.Johoson, C.D.Lin, K.T.Cheng, C.M.Lee (Jia-Ming Li)",
"Publish Date": "1980-01-01"
},
{
"Title": "Electronic Impact Excitation of Li-Like Ions",
"Author": "Jia-Ming Li",
"Publish Date": "1980-01-01"
},
{
"Title": "Scattering Theory and Specctroscopy: Relativistic Multichannel Quantum Defect Theory",
"Author": "C.M.Lee (Jia-Ming Li), W.R.Johnson",
"Publish Date": "1980-01-01"
},
{
"Title": "Systematic Variation of Line-shift of K Radiation from Atomic Ions",
"Author": "Jia-Ming Li, Zhong-Xin Zhao",
"Publish Date": "1981-01-01"
},
{
"Title": "Variation in L, M, N Inner-shell Electron Binding Energies of Rare-earth Elements in Valence Transition",
"Author": "Jia-Ming Li, Zhong-Xin Zhao",
"Publish Date": "1982-01-01"
},
{
"Title": "Multichannel Inverse Dielectronic Recombination Theory",
"Author": "Jia-Ming Li",
"Publish Date": "1983-01-01"
},
{
"Title": "Quantum Defect Theory:Rydberg States of Molecules NO",
"Author": "Jia-Ming Li, Vo Ky Lan",
"Publish Date": "1983-01-01"
},
{
"Title": "Generalized Oscillator Strength Density",
"Author": "Bo-Gang Tian, Jia-Ming Li",
"Publish Date": "1984-01-01"
},
{
"Title": "Theoretical Calculations of Atomic Two-photon Ionization Processes",
"Author": "Ying-Jian Wu, Jia-Ming Li",
"Publish Date": "1985-01-01"
},
{
"Title": "Non-relativistic and Relativistic Atomic Configuration Theory: Excitation Energies and Radiative Transition Probabilities",
"Author": "Zhong-Xin Zhao, Jia-Ming Li",
"Publish Date": "1985-01-01"
},
{
"Title": "Minima of Oscillator Strenth Densities for Excited Atoms",
"Author": "Xiao-Ling Liang, Jia-Ming Li",
"Publish Date": "1985-01-01"
},
{
"Title": "Scaling Relation of Generlized Oscillator Strength Densities along Isoelectronic Sequence",
"Author": "Xiao-Chuan Pan, Jia-Ming Li",
"Publish Date": "1985-01-01"
},
{
"Title": "Eletronic Structure of Atomic Ions With the 4f electrons",
"Author": "Zhong-Xin Zhao, Jia-Ming Li",
"Publish Date": "1985-01-01"
},
{
"Title": "Ionization Channels of Superexcited Molecules",
"Author": "Xiao-Ling Liang, Xiao-Chuan Pan, Jia-Ming Li",
"Publish Date": "1985-01-01"
},
{
"Title": "Progress Report on Quantum Defect Theory: Dynamics of Excited Atoms and Molecules",
"Author": "Jia-Ming Li",
"Publish Date": "1986-01-01"
},
{
"Title": "Eletronic Impact Excitation Cross Sections and Rates: I Spin Allowed Excitation Processes",
"Author": "Bo-Gang Tian ,Jia-Ming Li",
"Publish Date": "1986-01-01"
},
{
"Title": "Current Topic in Atomic Physics: Studies on Excited Atoms and Molecules",
"Author": "Jia-Ming Li",
"Publish Date": "1986-01-01"
},
{
"Title":"""
decoded_object = json_repair.repair_json(gpt_content, return_objects=True, logging=True)
print(decoded_object)
This call hangs: there is no response for a long time.
Describe the bug
The library runs into an infinite loop when calling repair_json('{foo: [}').
To Reproduce
Steps to reproduce the behavior:
repair_json('{foo: [}')
Expected behavior
Output of fixed json: {foo: []}
Unfortunately, I don't have the time to create a PR so I just report the bug here.
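Until a hang like this is fixed upstream, one possible caller-side guard is to run the repair call with a time limit. The sketch below uses only the standard library; `call_with_timeout` is a hypothetical helper, not part of json_repair. It cannot kill a thread that is stuck in an infinite loop, it only stops waiting for it, so a separate process would be needed for real isolation.

```python
import threading

def call_with_timeout(func, *args, timeout=2.0, **kwargs):
    """Run func in a daemon thread; raise TimeoutError if it does not finish in time."""
    outcome = {}

    def worker():
        try:
            outcome["value"] = func(*args, **kwargs)
        except Exception as exc:  # hand the callee's own error back to the caller
            outcome["error"] = exc

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    thread.join(timeout)
    if thread.is_alive():
        # The worker is still running; we stop waiting but cannot stop the thread itself
        raise TimeoutError(f"call did not finish within {timeout} seconds")
    if "error" in outcome:
        raise outcome["error"]
    return outcome["value"]
```

Assuming json_repair is installed, a call would look like `call_with_timeout(repair_json, '{foo: [}', timeout=2.0)`, with the `TimeoutError` handled by the caller.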
Hi, first of all, thanks for this very useful library!
My model occasionally produces JSON strings with empty keys, so I encountered the following issue:
Describe the bug
When a key in the JSON string is empty, the library runs into an infinite loop in 'parse_object'.
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Either return 'Invalid JSON format' or substitute the empty key? Not sure what's best here
Hi @mangiucugna,
Thank you for your efforts on this. I've encountered a similar issue with the output from the LLM. It seems that the repair_json function isn't handling certain cases correctly.
For instance, when trying to repair the following JSON string:
json_str = '{\n"html": "<h3 id="title">Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>"}'
data = repair_json(json_str, return_objects=True)
The current output is:
{
'html': '<h3 id=',
'techniek': 'h3>',
'title': u'Waarom meer dan 200 Technical Experts - '
}
However, the expected output should be:
{
'html': '<h3 id="title">Waarom meer dan 200 Technical Experts - "Passie voor techniek"?</h3>'
}
It seems like the function is having trouble handling certain characters or nested structures properly. Would you mind looking into this further?
Thank you again for your attention to this matter.
Originally posted by @nikolaysm in #20 (comment)
Describe the bug
I was streaming a JSON response from the OpenAI API and repairing it with this library. When the JSON input was {" my program got stuck. After a deep investigation, I found an infinite loop in this library. Attached below are some screenshots showing what's happening and the function it happens in. I added some print() calls in the library code to check what's happening.
To Reproduce
Steps to reproduce the behavior:
1. Use {" as the input.
2. Call the repair_json function from the library on it.
Expected behavior
Should throw an error or return an empty object like {}.
Desktop (please complete the following information):
Describe the bug
When there is a comment in the JSON or a value is missing, the tool creates a new key-value pair.
To Reproduce
{
"value_1": true, SHOULD_NOT_EXIST
"value_2": "data"
}
TRANSFORMS TO
{
"value_1": true,
"SHOULD_NOT_EXIST\n\n": "alue_2",
"": "data",
"}": ""
}
AND
{
"value_1":
"value_2": "data"
}
TRANSFORMS TO
{
"value_1": "value_2",
"": "data",
"}": ""
}
Expected behavior
Those are the 2 expected results:
{
"value_1": true,
"value_2": "data"
}
{
"value_1": "",
"value_2": "data"
}
Desktop (please complete the following information):
Additional context
The JSON files were created with Llama 2 and Mistral.
First of all, thank you for this amazing library! I just wanted to report a case I hit on my 10th try.
Describe the bug
ChatGPT returned the following string (I'm using JSON function call method, while using their API)
{ "content": "[LINK]("https://google.com")" }
To Reproduce
Steps to reproduce the behavior:
Expected behavior
String should be fixed to:
{ "content": "[LINK](https://google.com)" }
or similar
Additional context
I'm using the latest version
Traceback (line numbers may differ):
[1930] Failed to execute script 'jsonfix' due to unhandled exception!
Traceback (most recent call last):
File "jsonfix.py", line 281, in repair_json
File "json/__init__.py", line 348, in loads
File "json/decoder.py", line 337, in decode
File "json/decoder.py", line 353, in raw_decode
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 2 column 20 (char 22)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "jsonfix.py", line 310, in <module>
File "jsonfix.py", line 283, in repair_json
File "jsonfix.py", line 17, in parse
File "jsonfix.py", line 29, in parse_json
File "jsonfix.py", line 106, in parse_object
File "jsonfix.py", line 59, in parse_json
ValueError: Invalid JSON format
Describe the bug
JSON like this:
[[1 ]
produces an Invalid JSON format exception instead of repairing the JSON. It's the trailing whitespace that is the problem; this:
[[1]
is fine.
(This is a reduced version of JSON produced by ChatGPT where the original error occurred.)
To Reproduce
Try fixing the above JSON.
Note that there are two related issues:
Additional context
PR which fixes this coming shortly.
Imagine your LLM spits out:
{"foo": {"bar": {"baz": 1}, "zig": {"zag": 2}}
This JSON is missing a bracket. There are two places it could go that would fix the issue:
{"foo": {"bar": {"baz": 1}, "zig": {"zag": 2}}
|-------------------------^--------or---------^|
Currently, json_repair.loads() returns the valid JSON with the added closing bracket at the end. But my pydantic model actually requires the other solution.
So if I could do:
valid_jsons = json_repair.parse_all(invalid_llm_output)
validated_output = None
for valid_json in valid_jsons:
    try:
        validated_output = MyExpectedOutputModel.parse(valid_json)
        break
    except ValidationError:
        continue
if not validated_output:
    ... do something ...
That would help make this an even more powerful tool for avoiding another LLM call just to fix the JSON.
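The idea of enumerating every valid repair can be sketched with the standard library alone. The function below is a hypothetical illustration, not a json_repair API (the `parse_all` call above is the poster's proposal): it assumes exactly one closing brace is missing, tries inserting it at every position, and keeps each distinct result that `json.loads` accepts.

```python
import json

def candidate_repairs(text, missing="}"):
    """Yield each distinct object obtained by inserting `missing` once into `text`."""
    seen = set()
    for pos in range(len(text) + 1):
        attempt = text[:pos] + missing + text[pos:]
        try:
            parsed = json.loads(attempt)
        except json.JSONDecodeError:
            continue  # this insertion point does not give valid JSON
        # Different insertion points can produce the same object; report it once
        key = json.dumps(parsed, sort_keys=True)
        if key not in seen:
            seen.add(key)
            yield parsed

broken = '{"foo": {"bar": {"baz": 1}, "zig": {"zag": 2}}'
candidates = list(candidate_repairs(broken))
```

For the example above this produces the two readings marked in the diagram, and each candidate could then be passed to a validation step like the loop shown earlier. The brute-force scan costs one parse per character, so it is only reasonable for short outputs.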
I'm testing an automation that scaffolds a project based on a txt file with architecture details, using a twitter posting tool as an example (I know, this wouldn't really work because they've changed their API since 2021).
Originally posted by @cooleydw494 in #3 (reply in thread)
0.20.1
When some of the string elements in a list are missing double quotes, the current repair logic repairs all the elements together as one single string.
I haven't found a suitable solution for this either. I was thinking of treating a comma as the start of a new list element, but commas are ambiguous: sometimes a comma is an element delimiter, and sometimes it is punctuation inside a sentence.
I wonder if this could be fixed more precisely by adding more detailed context checks.
{
"people": ["Rilee Smith", travel bloggers, Matthias Keller, Ben Harrell"],
"additional_research_needed": [
"Current AI trends in the travel industry for 2024.",
"User satisfaction and feedback on AI travel planning tools like ChatGPT, Copilot, and Gemini.",
"Latest advancements in the AI-driven content marketing landscape."
]
}
The goal is to be more precise in determining the contextual conditions of the JSON format.
Describe the bug
Price-like numbers are not parsed properly.
To Reproduce
from json_repair import repair_json
repair_json("{'price': [105,000.00']}")
Expected behavior
Return "{'price': ['105,000.00']}" (or return "{'price': [105,000.00]}"; the exact format doesn't matter since one quote is lost anyway).
Desktop (please complete the following information):
Hello,
I would like to extend my gratitude for the exceptional project you have created. Inspired by your work, I have developed a Go language version of json-repair, which can be found at https://github.com/RealAlexandreAI/json-repair. With the aim of fostering a collaborative development environment and providing developers with access to libraries in various programming languages, I propose the exchange of a friendly link between our projects.
This initiative is particularly relevant in addressing the challenge of handling the chaotic JSON strings often generated by Large Language Models (LLMs). By linking our repositories, we can offer a more comprehensive solution to the developer community, enabling them to effectively manage and repair JSON data across different platforms and programming languages.
I look forward to your positive response and the potential benefits that such a collaboration could bring to both our projects and their users.
Consider the following code:
json_repair.loads("""{ \n "b": "xxxxx" // comment 1.2 \n }""")
It should be {'b': 'xxxxx'}.
But the library outputs {'b': 'xxxxx', 0.2: ''}.
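A possible workaround is to strip `//` comments before handing the text to the parser. The sketch below is a standalone pre-processing step using only the standard library; `strip_line_comments` is a hypothetical helper, and it assumes strings are delimited by double quotes, so it will not protect `//` inside single-quoted strings.

```python
import json

def strip_line_comments(text):
    """Remove `//` comments that sit outside double-quoted strings."""
    out = []
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(text):
                out.append(text[i + 1])  # keep the escaped character as-is
                i += 1
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
            out.append(ch)
        elif text.startswith("//", i):
            # Skip to the end of the line, leaving the newline in place
            while i < len(text) and text[i] != "\n":
                i += 1
            continue
        else:
            out.append(ch)
        i += 1
    return "".join(out)

raw = '{ \n "b": "xxxxx" // comment 1.2 \n }'
cleaned = strip_line_comments(raw)
parsed = json.loads(cleaned)
```

Tracking the in-string state is what keeps values such as URLs containing `//` intact.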
Hola,
Really useful library
Small LLMs can sometimes output JSON in single quotes, and they are not always consistent about it.
Here is how I deal with it right now. It's not super efficient, but it works for now.
import re

def fix_single_quotes(json_str):
    # This pattern matches single-quoted keys and values in a JSON string
    pattern = r"'([^']*)'(?=[\s,}:])|(?<=[:{,]\s)'([^']*)'"
    # Use a list to store the parts of the output string
    parts = []
    last_end = 0
    # Iterate over all matches
    for match in re.finditer(pattern, json_str):
        # Add the part of the string before the match to the list
        parts.append(json_str[last_end:match.start()])
        # Exactly one of the two groups matched; pick it (it may be an empty string)
        content = match.group(1) if match.group(1) is not None else match.group(2)
        # Replace the single quotes around the match with double quotes and add it to the list
        parts.append('"' + content + '"')
        last_end = match.end()
    # Add the part of the string after the last match to the list
    parts.append(json_str[last_end:])
    # Join the parts into a single string
    return ''.join(parts)
Feel free to improve it.
Best,
Describe the bug
Can we handle a missing comma when parsing a dict?
To Reproduce
json_repair.loads('''{
"number": 1,
"reason": "According..."
"ans": "YES"
}''')
This code produces {'number': 1, 'reason': 'According..."\n "ans": "YES'}.
But we expect {'number': 1, 'reason': 'According...', 'ans': 'YES'}.
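For this specific layout (one pair per line), a narrow pre-processing rule can restore the comma before parsing. The sketch below is an assumption-laden illustration using only the standard library, not how json_repair works internally: it adds a comma only when a line ends with a closing quote and the next line starts with something that looks like `"key":`.

```python
import json
import re

# A closing quote at the end of a line, followed by a line that starts with `"key":`
MISSING_COMMA = re.compile(r'"[ \t]*\n(?=\s*"[^"\n]*"\s*:)')

def add_missing_commas(text):
    """Insert a comma between two string-terminated lines of an object."""
    return MISSING_COMMA.sub('",\n', text)

broken = '''{
  "number": 1,
  "reason": "According..."
  "ans": "YES"
}'''
fixed = json.loads(add_missing_commas(broken))
```

The rule is deliberately conservative: it ignores values that are not strings, and a multi-line string value whose next line happens to look like a key would be split incorrectly.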
0.25.2
I'm working with LLMs (Llama) and having them produce output in JSON format. There is an edge case I have encountered with Chinese headings, where the model often produces a doubled double quote at the start of the "title" value in the JSON string. This breaks the formatting.
Using the json_repair library should fix this, but instead it returns an empty string in the title.
Output:
[{"chapter_id": 1, "starting_time_stamp": "0:00:00", "title": ""}, {"chapter_id": 2, "starting_time_stamp": "0:01:00", "title": ""}, {"chapter_id": 3, "starting_time_stamp": "0:02:00", "title": ""}, {"chapter_id": 4, "starting_time_stamp": "0:04:00", "title": ""}, {"chapter_id": 5, "starting_time_stamp": "0:06:00", "title": ""}, {"chapter_id": 6, "starting_time_stamp": "0:09:00", "title": ""}, {"chapter_id": 7, "starting_time_stamp": "0:11:00", "title": ""}]
Use the following JSON.
raw_json = """[
{
"chapter_id": 1,
"starting_time_stamp": "0:00:00",
"title": ""国内苹果用户和安卓用户使用TikTok的各种方法"
},
{
"chapter_id": 2,
"starting_time_stamp": "0:01:00",
"title": ""苹果安卓通用最简单的方法"
},
{
"chapter_id": 3,
"starting_time_stamp": "0:02:00",
"title": ""不插卡使用"
},
{
"chapter_id": 4,
"starting_time_stamp": "0:04:00",
"title": ""免拔卡模式"
},
{
"chapter_id": 5,
"starting_time_stamp": "0:06:00",
"title": ""MITM抓包安装支持MITM的旧版TikTok客户端"
},
{
"chapter_id": 6,
"starting_time_stamp": "0:09:00",
"title": ""安卓用户使用修改版"
},
{
"chapter_id": 7,
"starting_time_stamp": "0:11:00",
"title": ""苹果端无视SIM卡地区限制的第三方修改版"
}
]"""
Calling code:
valid_json = repair_json(raw_json)
print(valid_json)
Expected: one of the two quotes at the start of each "title" string value is removed.
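As a stopgap, the doubled opening quote can be collapsed before the text reaches the parser. This is a minimal sketch with the standard library; `collapse_doubled_quotes` is a hypothetical helper, and the pattern assumes the doubled quote comes right after a colon and is followed by real content, so genuinely empty strings are left alone.

```python
import json
import re

# `: ""x` where x starts real content; `"",` `""}` and `""]` are genuine empty strings
DOUBLED_OPEN_QUOTE = re.compile(r'(:\s*)""(?=[^\s",}\]])')

def collapse_doubled_quotes(text):
    """Turn a doubled opening quote after a colon into a single quote."""
    return DOUBLED_OPEN_QUOTE.sub(r'\1"', text)

raw = '[{"chapter_id": 3, "starting_time_stamp": "0:02:00", "title": ""不插卡使用"}, {"note": ""}]'
fixed = json.loads(collapse_doubled_quotes(raw))
```

The lookahead is what distinguishes `""不插卡使用"` from an intentionally empty value such as `"note": ""`.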
0.25.3
Not sure whether to call this a bug or a feature request. Some models have a habit of getting into loops (Llama-3.1 in this case) so the output gets truncated by max_tokens and the JSON is borked. I'd say there were two issues here - one that it's not parsing the quotes correctly (look at the single vs double quotes in its output compared with the input) and secondly that it's not managing to include much of the input JSON. Is it possible to parse this?
LLM output:
{ "text description" : "subcutaneous oxycodone",\n"terms" : [\n {"term": "Localized swelling, mass and lump of skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of head, face and neck", "score": 0},\n {"term": "Localized hyperhidrosis", "score": 0},\n {"term": "Excessive and redundant skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of other and unspecified sites", "score": 0},\n {"term": "Superficial frostbite of neck", "score": 0},\n {"term": "Superficial frostbite", "score": 0},\n {"term": "Cellulitis and abscess of mouth", "score": 0},\n {"term": "Frostbite with tissue necrosis of neck", "score": 0},\n {"term": "Other disorders of skin and subcutaneous tissue, not elsewhere classified", "score": 0},\n {"term": "Localized swelling, mass and lump of skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of head, face and neck", "score": 0},\n {"term": "Localized hyperhidrosis", "score": 0},\n {"term": "Excessive and redundant skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of other and unspecified sites", "score": 0},\n {"term": "Superficial frostbite of neck", "score": 0},\n {"term": "Superficial frostbite", "score": 0},\n {"term": "Cellulitis and abscess of mouth", "score": 0},\n {"term": "Frostbite with tissue necrosis of neck", "score": 0},\n {"term": "Other disorders of skin and subcutaneous tissue, not elsewhere classified", "score": 0},\n {"term": "Localized swelling, mass and lump of skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of head, face and neck", "score": 0},\n {"term": "Localized hyperhidrosis", "score": 0},\n {"term": "Excessive and redundant skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm 
of skin and subcutaneous tissue of other and unspecified sites", "score": 0},\n {"term": "Superficial frostbite of neck", "score": 0},\n {"term": "Superficial frostbite", "score": 0},\n {"term": "Cellulitis and abscess of mouth", "score": 0},\n {"term": "Frostbite with tissue necrosis of neck", "score": 0},\n {"term": "Other disorders of skin and subcutaneous tissue, not elsewhere classified", "score": 0},\n {"term": "Localized swelling, mass and lump of skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of head, face and neck", "score": 0},\n {"term": "Localized hyperhidrosis", "score": 0},\n {"term": "Excessive and redundant skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of other and unspecified sites", "score": 0},\n {"term": "Superficial frostbite of neck", "score": 0},\n {"term": "Superficial frostbite", "score": 0},\n {"term": "Cellulitis and abscess of mouth", "score": 0},\n {"term": "Frostbite with tissue necrosis of neck", "score": 0},\n {"term": "Other disorders of skin and subcutaneous tissue, not elsewhere classified", "score": 0},\n {"term": "Localized swelling, mass and lump of skin and subcutaneous tissue", "score": 0},\n {"term": "Benign lipomatous neoplasm of skin and subcutaneous tissue of head, face and neck", "score": 0},\n {"
json_repair.loads():
{'text description" : "subcutaneous oxycodone': 'terms" : [\n {"term', 'Localized swelling, mass and lump of skin and subcutaneous tissue': 'score'}
Ideally, parsed with all the LLM output present in the loaded JSON, but at least something with the "text description" and "terms" objects correctly existing rather than being combined.
I appreciate it might be a big change to json_repair but I did wonder if there might be a way to pass a JSON schema to it, so it can ensure the output conforms.
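Short of full schema support, a caller can at least detect a bad repair by checking the loaded object against the expected shape. The sketch below is a plain-Python stand-in, not JSON Schema and not a json_repair feature; `has_expected_shape` and the `REQUIRED` mapping are hypothetical names chosen for this example.

```python
def has_expected_shape(obj, required):
    """Return True if obj is a dict whose required keys exist with the given types."""
    if not isinstance(obj, dict):
        return False
    return all(
        key in obj and isinstance(obj[key], expected_type)
        for key, expected_type in required.items()
    )

# Hypothetical expectation for the prompt above: a description plus a list of terms
REQUIRED = {"text description": str, "terms": list}

# Shortened form of the merged-key result reported above
bad = {'text description" : "subcutaneous oxycodone': 'terms'}
good = {
    "text description": "subcutaneous oxycodone",
    "terms": [{"term": "Superficial frostbite", "score": 0}],
}
bad_ok = has_expected_shape(bad, REQUIRED)
good_ok = has_expected_shape(good, REQUIRED)
```

A failed check could trigger a retry with a larger max_tokens rather than silently accepting the merged keys.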
Describe the bug
Failed JSON decoding
[6771] Failed to execute script 'jsonfix' due to unhandled exception!
Traceback (most recent call last):
File "jsonfix.py", line 281, in repair_json
File "json/__init__.py", line 348, in loads
File "json/decoder.py", line 337, in decode
File "json/decoder.py", line 353, in raw_decode
json.decoder.JSONDecodeError: Invalid control character at: line 2 column 48 (char 50)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "jsonfix.py", line 310, in <module>
File "jsonfix.py", line 283, in repair_json
File "jsonfix.py", line 17, in parse
File "jsonfix.py", line 29, in parse_json
File "jsonfix.py", line 106, in parse_object
File "jsonfix.py", line 59, in parse_json
ValueError: Invalid JSON format
To Reproduce
Try to parse following json:
{
"real_content": "Some string: Some other string
Some string <a href=\"https://domain.com\">Some link</a>
"
}
Expected behavior
Correct json.
Is your feature request related to a problem? Please describe.
The code currently does not handle '{"key": 1/3}': it breaks the parsing of the rest of the JSON string. It treats the "1" as the value and the "3" as the next key, then swaps keys and values for the rest of the JSON.
Input: '{"key1": 1/3, "key2": 1, "key3": "value3", "key4": "value4"}'
Output: {'key1': 1, 3: 'key2', 1: 'key3', 'value3': 'key4'}
What I would like instead is '{"key": "1/3"}'.
This original json output of '{"key": 1/3}' is a response I have received from an LLM.
Describe the solution you'd like
I would like it to output the fraction as a string.
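The requested behavior can be approximated today with a pre-processing step that wraps bare fractions in quotes. This is a minimal sketch using only the standard library; `quote_fractions` is a hypothetical helper, and the pattern assumes the fraction sits directly after a colon and directly before `,`, `}` or `]`. It does not check whether the match is inside a string value.

```python
import json
import re

# A bare `a/b` right after a colon and right before `,`, `}` or `]`
BARE_FRACTION = re.compile(r'(:\s*)(-?\d+\s*/\s*\d+)(?=\s*[,}\]])')

def quote_fractions(text):
    """Wrap unquoted fractions such as 1/3 in double quotes."""
    return BARE_FRACTION.sub(r'\1"\2"', text)

raw = '{"key1": 1/3, "key2": 1, "key3": "value3", "key4": "value4"}'
fixed = json.loads(quote_fractions(raw))
```

Keeping the fraction as a string avoids guessing whether the model meant a ratio, a date fragment, or a division to evaluate.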
Describe alternatives you've considered
Additional context
0.23.1
When parsing broken json that looks like this:
[
{
"Snippet Summary Id": 1,
"Overview": "Syncing with Company",
"Description": The conversation focused on how this company's release management system integrates with ours, providing a streamlined workflow for documentation approval, unlike Jim.",
"What the Prospect said": "John was interested in understanding how the release flow works and how it can be used to approve documentation and drawings directly in the product.",
"Seller Response": "Gene explained that the configuration allows the release flow to start from the other product and push information to ours, enabling a wider team to approve documentation without needing direct access to our product.",
"Quote": "Okay. So configuration is done right now."
},
{
"Snippet Summary Id": 2,
"Overview": "Assigning Part Numbers",
"Description": "The discussion covered the capability of this product to assign part numbers to CAD data, a feature that might differentiate Our product from theirs.",
"What the Prospect said": "Eve was looking at the part table and seemed curious about how part numbers could be assigned and mapped to categories in our product.",
"Seller Response": "Gene demonstrated how part numbers could be assigned to CAD data through our product and mapped to various categories, showcasing the product's flexibility.",
"Quote": "One of the options is that you can ask the product to assign per numbers to your CAD data."
}
]
The missing quote after "Description": is repaired, but instead of closing the string at the existing closing quote, the package inserts a new quote at the first comma it finds, resulting in this:
[
{
"Snippet Summary Id": 1,
"Overview": "Syncing with Company",
"Description": "The conversation focused on how this company's release management system integrates with ours",
"Jim.": "What the Prospect said\": \"John was interested in understanding how the release flow works and how it can be used to approve documentation and drawings directly in the product.",
"Seller Response": "Gene explained that the configuration allows the release flow to start from the other product and push information to ours, enabling a wider team to approve documentation without needing direct access to our product.",
"Quote": "Okay. So configuration is done right now."
},
{
"Snippet Summary Id": 2,
"Overview": "Assigning Part Numbers",
"Description": "The discussion covered the capability of this product to assign part numbers to CAD data, a feature that might differentiate Our product from theirs.",
"What the Prospect said": "Eve was looking at the part table and seemed curious about how part numbers could be assigned and mapped to categories in our product.",
"Seller Response": "Gene demonstrated how part numbers could be assigned to CAD data through our product and mapped to various categories, showcasing the product's flexibility.",
"Quote": "One of the options is that you can ask the product to assign per numbers to your CAD data."
}
]
string = """
[
{
"Snippet Summary Id": 1,
"Overview": "Syncing with Company",
"Description": The conversation focused on how this company's release management system integrates with ours, providing a streamlined workflow for documentation approval, unlike Jim.",
"What the Prospect said": "John was interested in understanding how the release flow works and how it can be used to approve documentation and drawings directly in the product.",
"Seller Response": "Gene explained that the configuration allows the release flow to start from the other product and push information to ours, enabling a wider team to approve documentation without needing direct access to our product.",
"Quote": "Okay. So configuration is done right now."
},
{
"Snippet Summary Id": 2,
"Overview": "Assigning Part Numbers",
"Description": "The discussion covered the capability of this product to assign part numbers to CAD data, a feature that might differentiate Our product from theirs.",
"What the Prospect said": "Eve was looking at the part table and seemed curious about how part numbers could be assigned and mapped to categories in our product.",
"Seller Response": "Gene demonstrated how part numbers could be assigned to CAD data through our product and mapped to various categories, showcasing the product's flexibility.",
"Quote": "One of the options is that you can ask the product to assign per numbers to your CAD data."
}
]
"""
repair_json(string, return_objects=True)
I'd expect this:
[
{
"Snippet Summary Id": 1,
"Overview": "Syncing with Company",
"Description": "The conversation focused on how this company's release management system integrates with ours, providing a streamlined workflow for documentation approval, unlike Jim.",
"What the Prospect said": "John was interested in understanding how the release flow works and how it can be used to approve documentation and drawings directly in the product.",
"Seller Response": "Gene explained that the configuration allows the release flow to start from the other product and push information to ours, enabling a wider team to approve documentation without needing direct access to our product.",
"Quote": "Okay. So configuration is done right now."
},
{
"Snippet Summary Id": 2,
"Overview": "Assigning Part Numbers",
"Description": "The discussion covered the capability of this product to assign part numbers to CAD data, a feature that might differentiate Our product from theirs.",
"What the Prospect said": "Eve was looking at the part table and seemed curious about how part numbers could be assigned and mapped to categories in our product.",
"Seller Response": "Gene demonstrated how part numbers could be assigned to CAD data through our product and mapped to various categories, showcasing the product's flexibility.",
"Quote": "One of the options is that you can ask the product to assign per numbers to your CAD data."
}
]
Overall this is an awesome tool!! It's handled everything else I've thrown at it perfectly.
Describe the bug
When I call "repair_json" with a certain input, it throws AttributeError in version 0.17.0:
in JSONParser.parse_string(self)
234 lstring_delimiter = "“"
235 rstring_delimiter = "”"
--> 236 elif char.isalpha():
237 # This could be a <boolean> and not a string. Because (T)rue or (F)alse or (N)ull are valid
238 if char.lower() in ["t", "f", "n"]:
239 value = self.parse_boolean_or_null()
AttributeError: 'bool' object has no attribute 'isalpha'
To Reproduce
json_repair.repair_json("[{]", return_objects=True)
Expected behavior
Returns "[]"
Desktop (please complete the following information):
Additional context
Bug occurs in 0.17.0 version only
Describe the bug
I am getting an error when trying to repair a small json file with one single issue: a missing ']'
To Reproduce
Just run the following code
from json_repair import repair_json
str = '''
{
"resourceType": "Bundle",
"id": "1",
"type": "collection",
"entry": [
{
"resource": {
"resourceType": "Patient",
"id": "1",
"name": [
{"use": "official", "family": "Corwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."},
{"use": "maiden", "family": "Goodwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."]}
]
}
}
]
}
'''
repair_json(str, skip_json_loads=True)
Observations
The problem seems to be the fact that "name" is made of two dicts. If you remove the second entry, and simply input
"name": [
{"use": "official", "family": "Corwin", "given": ["Keisha", "Sunny"], "prefix": ["Mrs."}
]
the repair seems to work.