json_repair's Introduction

This simple package can be used to repair a broken JSON file. To see all the cases in which this package works, check out the unit tests.

Inspired by https://github.com/josdejong/jsonrepair with contributions by GPT-4

Motivation

[UPDATE] OpenAI has released an update with JSON mode in function calling, so if you use OpenAI with function calling you probably don't need this.

I was using GPT a lot and there is no surefire way to get structured output out of it. You can ask for JSON output or use the Functions paradigm, but either way the OpenAI documentation clearly states that it might not return valid JSON. Luckily, the mistakes GPT makes are simple enough to be fixed without destroying the content. I searched for a lightweight Python package but couldn't find any.

So I wrote this one.

You can see how I used it by checking out this demo: https://huggingface.co/spaces/mangiucugna/difficult-conversations-bot/

How to use

from json_repair import repair_json

try:
    good_json_string = repair_json(bad_json_string)
except Exception:
    # Not even this library could fix this JSON
    good_json_string = None

You can use this library to completely replace json.loads():

import json_repair

try:
    decoded_object = json_repair.loads(json_string)
except Exception:
    # Handle the exception
    decoded_object = None

or just

import json_repair

try:
    decoded_object = json_repair.repair_json(json_string, return_objects=True)
except Exception:
    # Handle the exception
    decoded_object = None
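
If most of your input is already valid, one possible pattern is to try the standard parser first and only fall back to a repair step when it fails. The sketch below is an illustration, not part of this package: loads_with_fallback is a hypothetical helper, and its repair argument is assumed to be any callable that takes a broken JSON string and returns a fixed one (repair_json as used above fits that shape).

```python
import json
from typing import Any, Callable


def loads_with_fallback(text: str, repair: Callable[[str], str]) -> Any:
    """Parse strictly first; call the repair step only when that fails.

    `repair` is a hypothetical parameter: any function that maps a broken
    JSON string to a corrected JSON string.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        # Strict parsing failed, so parse the repaired string instead.
        # If the repaired string is still invalid, the error propagates.
        return json.loads(repair(text))
```

You would call it as loads_with_fallback(json_string, repair_json). Valid input never reaches the repair step, so it is returned exactly as json.loads() would return it.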

How it works

This module parses the JSON input following this BNF definition:

<json> ::= <primitive> | <container>

<primitive> ::= <number> | <string> | <boolean>
; Where:
; <number> is a valid real number expressed in one of a number of given formats
; <string> is a string of valid characters enclosed in quotes
; <boolean> is one of the literal strings 'true', 'false', or 'null' (unquoted)

<container> ::= <object> | <array>
<array> ::= '[' [ <json> *(', ' <json>) ] ']' ; A sequence of JSON values separated by commas
<object> ::= '{' [ <member> *(', ' <member>) ] '}' ; A sequence of 'members'
<member> ::= <string> ': ' <json> ; A pair consisting of a name, and a JSON value
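
To make the grammar concrete, here is a minimal strict recursive-descent parser with roughly one function per rule. It is a sketch for illustration only and not the code this package uses: all names in it are made up for the example, it does not support \uXXXX escapes, and it raises ValueError at the first problem instead of repairing anything. Those failure points are where a repairing parser would apply its heuristics.

```python
import re

_NUMBER = re.compile(r"-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?")
_LITERALS = {"true": True, "false": False, "null": None}
_ESCAPES = {'"': '"', "\\": "\\", "/": "/", "n": "\n", "t": "\t", "r": "\r"}


def _skip_ws(text, pos):
    while pos < len(text) and text[pos] in " \t\r\n":
        pos += 1
    return pos


def _parse_string(text, pos):
    # <string>: characters in double quotes (no \uXXXX support in this sketch)
    chars, pos = [], pos + 1
    while pos < len(text):
        ch = text[pos]
        if ch == '"':
            return "".join(chars), pos + 1
        if ch == "\\":
            pos += 1
            if text[pos:pos + 1] not in _ESCAPES:
                raise ValueError(f"unsupported escape at index {pos}")
            ch = _ESCAPES[text[pos]]
        chars.append(ch)
        pos += 1
    raise ValueError("unterminated string")


def _parse_container(text, pos, closer):
    # <array> and <object> share one shape: items separated by ',' until closer
    result = {} if closer == "}" else []
    pos = _skip_ws(text, pos + 1)
    if text[pos:pos + 1] == closer:
        return result, pos + 1
    while True:
        if closer == "}":
            # <member> ::= <string> ': ' <json>
            pos = _skip_ws(text, pos)
            if text[pos:pos + 1] != '"':
                raise ValueError(f"expected a string key at index {pos}")
            key, pos = _parse_string(text, pos)
            pos = _skip_ws(text, pos)
            if text[pos:pos + 1] != ":":
                raise ValueError(f"expected ':' at index {pos}")
            result[key], pos = _parse_value(text, pos + 1)
        else:
            value, pos = _parse_value(text, pos)
            result.append(value)
        pos = _skip_ws(text, pos)
        if text[pos:pos + 1] == ",":
            pos += 1
        elif text[pos:pos + 1] == closer:
            return result, pos + 1
        else:
            raise ValueError(f"expected ',' or {closer!r} at index {pos}")


def _parse_value(text, pos):
    # <json> ::= <primitive> | <container>
    pos = _skip_ws(text, pos)
    ch = text[pos:pos + 1]
    if ch == "{":
        return _parse_container(text, pos, "}")
    if ch == "[":
        return _parse_container(text, pos, "]")
    if ch == '"':
        return _parse_string(text, pos)
    for word, literal in _LITERALS.items():
        if text.startswith(word, pos):
            return literal, pos + len(word)
    match = _NUMBER.match(text, pos)
    if match:
        token = match.group()
        is_float = any(c in token for c in ".eE")
        return (float(token) if is_float else int(token)), match.end()
    raise ValueError(f"unexpected input at index {pos}")


def parse_json(text):
    """Strictly parse `text`; raise ValueError on anything malformed."""
    value, pos = _parse_value(text, 0)
    if _skip_ws(text, pos) != len(text):
        raise ValueError(f"unexpected trailing data at index {pos}")
    return value
```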

If something is wrong (a missing parenthesis or quote, for example), it uses a few simple heuristics to fix the JSON string:

  • Add the missing parentheses if the parser believes that the array or object should be closed
  • Quote strings or add missing single quotes
  • Adjust whitespace and remove line breaks
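
As a rough illustration of the first heuristic, the sketch below tracks which containers are still open and appends the matching closers at the end of the input. It is a simplified stand-in written for this README, not the package's actual implementation: close_open_containers is a hypothetical name, and it only handles truncated input (it does not fix mismatched or extra closers).

```python
def close_open_containers(text: str) -> str:
    """Append closers for any '{' or '[' (and string) left open at the end."""
    closers = {"{": "}", "[": "]"}
    stack = []  # expected closers, innermost last
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            # Brackets inside a string are content, not structure.
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in closers:
            stack.append(closers[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
    # Close a dangling string first, then the containers from the inside out.
    suffix = '"' if in_string else ""
    return text + suffix + "".join(reversed(stack))
```

For example, close_open_containers('{"a": [1, 2') returns '{"a": [1, 2]}', which json.loads() then accepts.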

I am sure some corner cases are missing. If you have examples, please open an issue or, even better, push a PR.

How to develop

Just create a virtual environment and install the dependencies from requirements.txt. The setup uses pre-commit to make sure all tests are run.

How to release

You will need owner access to this repository.

  • Edit pyproject.toml and update the version number appropriately using semver notation
  • Run python -m build
  • Commit and push all changes to the repository before continuing, or the next steps will fail
  • Create a new release on GitHub, making sure to tag all the issues solved and contributors. Create the new tag, using the same version as the one in the build configuration
  • Once the release is created, a new GitHub Actions workflow will start to publish to PyPI. Make sure it didn't fail

Bonus Content

If you need some good Custom Instructions (System Message) to improve your chatbot responses, try https://gist.github.com/mangiucugna/7ec015c4266df11be8aa510be0110fe4

json_repair's People

Contributors

mangiucugna, brettrp
