
rjson's Introduction

rjson

A C-based JSON parser for R. Released versions can be found at http://cran.r-project.org/web/packages/rjson/index.html

Alex Couture-Beil

Development notes

rjson uses earthly to containerize common development tasks.

unit tests

To run the unit tests, run:

earthly +unittest

rcheck

To run rcheck, run:

earthly +rcheck

Packaging rjson for cran

To create a source rjson_<version>.tar.gz, run:

earthly +cran

This will output a compressed source archive under output/.

Non-earthly tasks

To run R check with valgrind, in the past I have run:

docker run -v `pwd`:/foo -w /foo -ti -e VALGRIND_OPTS='--leak-check=full --show-reachable=yes' --rm --cap-add SYS_PTRACE rocker/r-devel-ubsan-clang R CMD check --use-valgrind rjson

(This should be moved into the Earthfile to make it easier to run).

rjson's People

Contributors

alexcb, brodieg, dfsp-spirit

rjson's Issues

convert emoji characters.

received via email:

rjson package fails to convert emoji characters.
Other tools like iconv and python3 say these are valid JSON strings.

> rjson::toJSON("🌹")
Error in rjson::toJSON("🌹") : unable to escape string. String is not utf8

Here's a quick look at the string I did via debugging:

writeBin("🌹", "emojii.data")
$ cat emojii.data  | xxd 
00000000: f09f 8cb9 00                             .....

which matches up with what python encodes it as too:

>>> print("🌹".encode('utf8'))
b'\xf0\x9f\x8c\xb9'
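As a cross-check on the bytes above, here is a minimal Python sketch of four-byte UTF-8 encoding. The helper name `encode_utf8_4byte` is hypothetical and is not taken from rjson's parser.c; it only illustrates that U+1F339 legitimately encodes to f0 9f 8c b9, which suggests the rejection comes from how code points above U+FFFF are handled rather than from the input.

```python
def encode_utf8_4byte(code_point: int) -> bytes:
    """Encode a supplementary-plane code point (U+10000..U+10FFFF) as UTF-8."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("expected a code point that needs four UTF-8 bytes")
    return bytes([
        0xF0 | (code_point >> 18),           # 11110xxx: top 3 bits
        0x80 | ((code_point >> 12) & 0x3F),  # 10xxxxxx: next 6 bits
        0x80 | ((code_point >> 6) & 0x3F),   # 10xxxxxx: next 6 bits
        0x80 | (code_point & 0x3F),          # 10xxxxxx: low 6 bits
    ])


rose = encode_utf8_4byte(0x1F339)  # U+1F339 is the rose emoji from the report
print(rose.hex())
```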

Convert strings with javascript escaped unicode properly

When using rjson to parse JSON strings containing JavaScript-escaped characters, e.g. " #FOXSportsGO\n\ud83d\udcfb"

rjson incorrectly produces the invalid UTF-8 string "\xed\xa0\xbd\xed\xb3\xbb" instead of 📻.
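A minimal Python sketch of the decoding involved, for illustration only: the helper `combine_surrogates` is a hypothetical name and is not rjson's C implementation. It shows the standard UTF-16 surrogate-pair arithmetic, and that encoding each surrogate separately reproduces the invalid bytes quoted above.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into a single Unicode scalar value."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid high/low surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


# Pairing the two escapes first gives one code point, U+1F4FB.
scalar = combine_surrogates(0xD83D, 0xDCFB)
paired_bytes = chr(scalar).encode("utf-8")

# Encoding each surrogate on its own gives two three-byte sequences,
# which strict UTF-8 decoders reject.
unpaired_bytes = "\ud83d\udcfb".encode("utf-8", "surrogatepass")

print(f"U+{scalar:X}", paired_bytes, unpaired_bytes)
```

The same arithmetic applied to D83D + DE0A gives U+1F60A.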

Surrogate Pairs decoding

This is more a question than a bug.
I was trying to decode emoji in Twitter streams.
Example:
In a JSON file (obtained via filterStream from the streamR package) I have this:
{"text":"#smmdayit\n#21mcdg\n#formazione24 \nAre you ready??? \ud83d\ude0a http://t.co/vWWIqQHbfn"}
but after using parseTweets (streamR package), which uses the fromJSON function from this package, I obtain (only for the emoji section, i.e. \ud83d\ude0a) this:
"\xed<U+00A0><U+00BD>\xed<U+00BD>\u008a" while I was expecting to see the Unicode scalar value 1F60A, the result of the surrogate pair D83D + DE0A, which corresponds to 😊

array with single element in JSON

Hello,
thanks for this package -
I encountered an issue when transforming this data with fromJSON and toJSON:
http://bl.ocks.org/mbostock/raw/1044242/readme-flare-imports.json

For example, the "imports" field in the 11th element contains only one element in the array enclosed with "[ ]":
... {"name":"flare.animate.Easing","size":17010,"imports":["flare.animate.Transition"]}, ...

Unfortunately, this property is missing after the fromJSON / toJSON transformation and the JS packageImports() function of http://bl.ocks.org/mbostock/1044242 doesn't work any more.

Please let me know if anything is unclear

Best, Bo

[minor] couple of compiler warnings

Hi @alexcb. Thanks for the great work. I see a couple of compiler warnings:

parser.c: In function ‘hasClass’:
parser.c:68:16: warning: comparison between signed and unsigned integer expressions [-Wsign-compare]
for( i = 0; i < size; i++ ) {
^

parser.c: In function ‘UTF8EncodeUnicode’:
parser.c:112:40: warning: suggest parentheses around arithmetic in operand of ‘|’ [-Wparentheses]
s[ 1 ] = (MASKBYTE | ( input >> 12 ) & MASKBITS );

I think these would be nice to fix.
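The second warning is about operator precedence: `&` binds tighter than `|`, so the expression is already grouped as `MASKBYTE | (( input >> 12 ) & MASKBITS)`. Python happens to share C's relative precedence for `>>`, `&` and `|`, so the grouping can be sketched as follows. The constant values are assumptions chosen for illustration, not copied from parser.c.

```python
MASKBYTE = 0x80  # assumed value: UTF-8 continuation-byte marker
MASKBITS = 0x3F  # assumed value: low six bits
code_point = 0x20AC  # arbitrary three-byte code point used as sample input

# As written in the warning: `&` is applied before `|`.
implicit = MASKBYTE | (code_point >> 12) & MASKBITS

# The same grouping made explicit, which is what silences -Wparentheses.
explicit = MASKBYTE | ((code_point >> 12) & MASKBITS)

# The other possible reading, which would drop the marker bit.
other_grouping = (MASKBYTE | (code_point >> 12)) & MASKBITS

print(hex(implicit), hex(explicit), hex(other_grouping))
```

Adding the inner parentheses therefore would not change behavior under these assumptions; it only documents the intent.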

weird result (with NA involved)

I get some weird results when using rjson.

x <- list(ss = c(123L, NA_integer_))

rjson::toJSON(x)
# [1] "{\"ss\":[123,\"NA\"]}"

x is an integer vector with an NA, but toJSON gives an integer and a string.

x <- list(ss = c(123L, NA_integer_))
rjson::fromJSON(rjson::toJSON(x))
# $ss
# $ss[[1]]
#  [1] 123
# $ss[[2]]
#  [1] "NA"

It is not equal to the original x.
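For comparison, here is a small Python sketch of one convention that does round-trip: representing the missing value as JSON null. This illustrates the idea only; it is not a statement of what rjson does or should do, and Python's `None` merely stands in for R's `NA_integer_`.

```python
import json

values = [123, None]  # None stands in for R's NA_integer_ in this sketch

encoded = json.dumps({"ss": values})
decoded = json.loads(encoded)

print(encoded)
print(decoded)
```

Because null is a distinct JSON value rather than the string "NA", the decoded structure compares equal to the original.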

fromJSON("{test:\"123\"}") failed

Hi

{test: "123"} is a valid json string, and I parsed it using rjson, the result is

Error in fromJSON("{test:\"123\"}") : 
  parseTrue: expected to see 'true' - likely an unquoted string starting with 't'.

Of course, {"test": "123"} works fine. So is that a bug?
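For reference, the JSON grammar (RFC 8259) requires object member names to be quoted strings, so other strict parsers also reject the unquoted form. A minimal sketch using Python's standard `json` module; the helper name `parses` is hypothetical.

```python
import json


def parses(text: str) -> bool:
    """Return True if Python's strict JSON parser accepts the text."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


print(parses('{test:"123"}'))    # unquoted member name
print(parses('{"test":"123"}'))  # quoted member name
```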

Read smart contracts with rjson

Hi Experts,

I have a small program that reads smart contracts; the contracts look like the one below.

Everything was working with rjson version 0.2.17 (jsonlite was not working).
Now, with version 0.2.21, it no longer works and gives the following error:
Example:
Error in rjson::fromJSON("[\n ... : not all data was parsed (3707 chars were parsed out of a total of 3963 chars).

Previously it worked without any issue.

Note: This was a from-scratch installation using the same OS but with new versions of all R packages.

I have now noticed that jsonlite fails at what appears to be the same place:
Error: parse error: trailing garbage
"type": "function" } ], "evm": { "bytecode": {
(right here) ------^

Command that worked before, with the smart contract:
abi_load_UniswapV2Factory <-rjson::fromJSON('[
{
"anonymous": false,
"inputs": [
{
"indexed": true,
"internalType": "address",
"name": "token0",
"type": "address"
},
{
"indexed": true,
"internalType": "address",
"name": "token1",
"type": "address"
},
{
"indexed": false,
"internalType": "address",
"name": "pair",
"type": "address"
},
{
"indexed": false,
"internalType": "uint256",
"name": "",
"type": "uint256"
}
],
"name": "PairCreated",
"type": "event"
},
{
"constant": true,
"inputs": [
{
"internalType": "uint256",
"name": "",
"type": "uint256"
}
],
"name": "allPairs",
"outputs": [
{
"internalType": "address",
"name": "pair",
"type": "address"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "allPairsLength",
"outputs": [
{
"internalType": "uint256",
"name": "",
"type": "uint256"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": false,
"inputs": [
{
"internalType": "address",
"name": "tokenA",
"type": "address"
},
{
"internalType": "address",
"name": "tokenB",
"type": "address"
}
],
"name": "createPair",
"outputs": [
{
"internalType": "address",
"name": "pair",
"type": "address"
}
],
"payable": false,
"stateMutability": "nonpayable",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "feeTo",
"outputs": [
{
"internalType": "address",
"name": "",
"type": "address"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "feeToSetter",
"outputs": [
{
"internalType": "address",
"name": "",
"type": "address"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [
{
"internalType": "address",
"name": "tokenA",
"type": "address"
},
{
"internalType": "address",
"name": "tokenB",
"type": "address"
}
],
"name": "getPair",
"outputs": [
{
"internalType": "address",
"name": "pair",
"type": "address"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": false,
"inputs": [
{
"internalType": "address",
"name": "",
"type": "address"
}
],
"name": "setFeeTo",
"outputs": [],
"payable": false,
"stateMutability": "nonpayable",
"type": "function"
},
{
"constant": false,
"inputs": [
{
"internalType": "address",
"name": "",
"type": "address"
}
],
"name": "setFeeToSetter",
"outputs": [],
"payable": false,
"stateMutability": "nonpayable",
"type": "function"
}
],
"evm": {
"bytecode": {
"linkReferences": {},
"object": "",
"opcodes": "",
"sourceMap": ""
},
"deployedBytecode": {
"linkReferences": {},
"object": "",
"opcodes": "",
"sourceMap": ""
}
},
"interface": [
{
"anonymous": false,
"inputs": [
{
"indexed": true,
"internalType": "address",
"name": "token0",
"type": "address"
},
{
"indexed": true,
"internalType": "address",
"name": "token1",
"type": "address"
},
{
"indexed": false,
"internalType": "address",
"name": "pair",
"type": "address"
},
{
"indexed": false,
"internalType": "uint256",
"name": "",
"type": "uint256"
}
],
"name": "PairCreated",
"type": "event"
},
{
"constant": true,
"inputs": [
{
"internalType": "uint256",
"name": "",
"type": "uint256"
}
],
"name": "allPairs",
"outputs": [
{
"internalType": "address",
"name": "pair",
"type": "address"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "allPairsLength",
"outputs": [
{
"internalType": "uint256",
"name": "",
"type": "uint256"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": false,
"inputs": [
{
"internalType": "address",
"name": "tokenA",
"type": "address"
},
{
"internalType": "address",
"name": "tokenB",
"type": "address"
}
],
"name": "createPair",
"outputs": [
{
"internalType": "address",
"name": "pair",
"type": "address"
}
],
"payable": false,
"stateMutability": "nonpayable",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "feeTo",
"outputs": [
{
"internalType": "address",
"name": "",
"type": "address"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [],
"name": "feeToSetter",
"outputs": [
{
"internalType": "address",
"name": "",
"type": "address"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": true,
"inputs": [
{
"internalType": "address",
"name": "tokenA",
"type": "address"
},
{
"internalType": "address",
"name": "tokenB",
"type": "address"
}
],
"name": "getPair",
"outputs": [
{
"internalType": "address",
"name": "pair",
"type": "address"
}
],
"payable": false,
"stateMutability": "view",
"type": "function"
},
{
"constant": false,
"inputs": [
{
"internalType": "address",
"name": "",
"type": "address"
}
],
"name": "setFeeTo",
"outputs": [],
"payable": false,
"stateMutability": "nonpayable",
"type": "function"
},
{
"constant": false,
"inputs": [
{
"internalType": "address",
"name": "",
"type": "address"
}
],
"name": "setFeeToSetter",
"outputs": [],
"payable": false,
"stateMutability": "nonpayable",
"type": "function"
}
]')
}
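Both error messages point at the same spot: the top-level array closes before `"evm"`, so everything after that `]` is extra text outside the first JSON value. A minimal Python sketch on a shortened stand-in for the input above (the sample string is an assumption, not the full contract) shows where the first value ends.

```python
import json

# Shortened stand-in for the input above: an array followed by more text.
text = '[{"type": "function"}], "evm": {"bytecode": {}}'

# raw_decode parses the first JSON value and reports where it stopped,
# without complaining about what follows.
value, end = json.JSONDecoder().raw_decode(text)

print(value)
print(repr(text[end:]))  # the part a strict parser treats as trailing data
```

Wrapping the whole input in `{ "abi": [ ... ], "evm": { ... }, ... }` or parsing only the leading array are two ways to make such input a single valid value; which one fits depends on how the contract file was produced.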

fix protect/unprotect rchk issues

rjson has two functions with loops that build up elements and protect them as they are created.

Upon an error, a call to UNPROTECT( objs ); is made, where objs is an int that is incremented for each PROTECT. The rchk PROTECT/UNPROTECT linter does not support this form, although R itself does (rjson predates these rchk checks).

             +rcheck | Function parseArray
             +rcheck |   [UP] protect stack is too deep, unprotecting all variables, results will be incomplete
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:508
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:514
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:535
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:585
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:600
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:604
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:614

             +rcheck | Function parseList
             +rcheck |   [UP] protect stack is too deep, unprotecting all variables, results will be incomplete
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:637
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:642
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:650
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:667
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:672
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:680
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:691
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:707
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:727
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:741
             +rcheck |   [UP] unsupported form of unprotect, unprotecting all variables, results will be incomplete /rchk/packages/build/hWyBjPqk/rjson/src/parser.c:755
             +rcheck | Analyzed 166 functions, traversed 6001242 states.

I then also tried calling UNPROTECT in a loop (see 9d05c9b), but that introduced other issues.

A CRAN update is blocked until this is resolved.

v3.6.3 version?

Hi, we noticed you recently published a new release to CRAN. Will you be providing a v3.6.3 version?

unsupported SEXPTYPE: 19

This case below causes the above message to be printed:

{
    "array":[
	{
	    "foo":"bar"
	},
	{
	    "bah":"baz"
	},
    ]
}

It also fails to load the contents of the "array" object:

$array
list()

Btw, I'd recommend making that printed message an error.

Error not raised on trailing comma in dict

> fromJSON('{"a": "b",}')
list()
> fromJSON('{"a": 1,}')
list()
> fromJSON('{"a": null,}')
list()

This should have created an error, similar to:

> fromJSON('{,}')
Error in fromJSON("{,}") : unexpected character ','
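As a point of comparison, a strict parser rejects every one of these inputs. The sketch below uses Python's standard `json` module purely as a reference implementation; it says nothing about how rjson's parser is structured.

```python
import json

samples = ['{"a": "b",}', '{"a": 1,}', '{"a": null,}', '{,}']

rejected = []
for sample in samples:
    try:
        json.loads(sample)
    except json.JSONDecodeError:
        # A trailing comma is not allowed by the JSON grammar.
        rejected.append(sample)

print(rejected)
```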

Square brackets (toJSON)

Hello,

I have been using your R package rjson. It has been really useful for some of my current work, in particular for enabling the creation and integration of dynamic network graphs into a user interface generated with R package Shiny.

Everything was working smoothly until sometime this week when I updated a number of R packages as well as my version of R. Now the same code that previously worked fine to generate the contents of the file "Good.html" (shown below in the FIRST screenshot) has for some reason started generating instead the contents of the SECOND file below, "Problematic.html"...
[Screenshot: Good.html]
[Screenshot: Problematic.html]

You can see that while "Good.html" contains the appropriate (square) brackets in the section containing the bulk of the script, a la:

var g = { "nodes":[{.....}], "edges":[{....}]};

by contrast, "Problematic.html" is overrun with square brackets. I can use some regular expressions to convert Problematic back into Good, but I was hoping to ask your opinion first in case there was some (new?) argument in toJSON that I should be using to convert a list (in my case a list of two lists, the first of length 20_6 for "nodes" and the second of length 19_6 for "edges") to JSON format s.t. its bracketing format resembles the Good version and not the Problematic version.

Finally, if nothing has changed in rjson's toJSON function, might you still have any suggestions for me? I would really appreciate it. I'm keen to have something stable so I can release this stuff as part of a package whose development I'm trying to wrap up. I'd be really grateful for your help, but either way I just wanted to say thanks for all the great work you've been doing on rjson; it's been very useful and quite nice to work with so far.

Thanks for your time and input, and let me know if anything I said was unclear or if you have any questions.

All the best,
Cait.

incorrect simplify docs

https://cran.r-project.org/web/packages/rjson/rjson.pdf

Today it READS:

#As a result, this will output "1"

toJSON(fromJSON('[1]', simplify=TRUE))

#Compared with this which will output "[1]" as expected

toJSON(fromJSON('[1]', simplify=TRUE))

I think SHOULD READ:

#As a result, this will output "1"

toJSON(fromJSON('[1]', simplify=TRUE))

#Compared with this which will output "[1]" as expected

toJSON(fromJSON('[1]', simplify=FALSE)) <--- should be false?

[feature request] support for bit64::integer64

Hi Alex. Would it be possible to add serialization of integer64 from bit64 package?

rjson::toJSON(list(bit64::as.integer64(111)))
#"[5.48412866883784e-322]"

bit64 pkg uses R's double vectors to keep 64 bit integers (so essentially it is numeric vector with class = "integer64" attribute). Underlying memory layout is the same as C/C++ int64_t.
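The odd output is consistent with the 64-bit integer's bytes being read as an IEEE 754 double. A minimal Python sketch of that reinterpretation, using the standard `struct` module; the helper names are hypothetical.

```python
import struct


def int64_bits_as_double(value: int) -> float:
    """Reinterpret the 8 bytes of a signed 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<q", value))[0]


def double_bits_as_int64(value: float) -> int:
    """Reinterpret the 8 bytes of a double as a signed 64-bit integer."""
    return struct.unpack("<q", struct.pack("<d", value))[0]


misread = int64_bits_as_double(111)
print(misread)                        # a tiny subnormal double
print(double_bits_as_int64(misread))  # the original integer is recoverable
```

A serializer that checks for the "integer64" class attribute could apply the second conversion before formatting the number.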

Add a simplify option to fromJSON to keep json list structure

When simplify=TRUE (the default), JSON lists will be simplified to vectors:

fromJSON("[1,2,3]", simplify=TRUE) would return c(1,2,3)
fromJSON("[1,null, false]", simplify=TRUE) would return list(1, NULL, FALSE)
fromJSON("[1]", simplify=TRUE) would return c(1)

When simplify=FALSE, values will always be returned as lists:

fromJSON("[1,2,3]", simplify=FALSE) would return list(1,2,3)
fromJSON("[1,null, false]", simplify=FALSE) would return list(1, NULL, FALSE)
fromJSON("[1]", simplify=FALSE) would return list(1)

This would allow users to separate the following cases:

fromJSON('{"key": "value"}')
fromJSON('{"key": ["value"]}')

Feature request: a multiline JSON output

Hi,

I really like rjson, especially its speed and reliability.
One thing that could be improved though: the output of toJSON() is a one-line string, and as such is not human readable.
For example, if you save your JSON to a file:

  • if you want to glance quickly at your data, this long line is not readable.
  • if you have saved many JSON files and want to explore them with command-line tools such as grep, a match returns the whole file or nothing.

I have my own workaround that post-processes the toJSON() output, but it would be nice to have this implemented natively, perhaps controlled by an argument.
What do you think?
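To make the request concrete, this is the kind of output being asked for, sketched with Python's standard `json` module and its `indent` argument. The sample data is made up, and this does not describe any existing rjson argument.

```python
import json

data = {"name": "rjson", "tags": ["json", "parser"], "nested": {"a": 1}}

one_line = json.dumps(data)              # compact, single-line form
multi_line = json.dumps(data, indent=2)  # one key or element per line

print(one_line)
print(multi_line)
```

With the indented form, a line-oriented tool such as grep returns only the matching lines instead of the whole document.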

investigate "String is not utf8" error on what appears to be valid utf8 data

library(base64enc)
library(rjson)
rjson::toJSON(rawToChar(base64decode("8JCNg/CQjL7wkIy58JCMuvCQjLDwkIyy8JCNiQ==")))

produces the error:
unable to escape string. String is not utf8

and if I move to bash, I see it's being decoded correctly:

$ echo 8JCNg/CQjL7wkIy58JCMuvCQjLDwkIyy8JCNiQ== | base64 -d
𐍃𐌾𐌹𐌺𐌰𐌲𐍉
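The same payload can be checked outside R and bash with a short Python sketch. It decodes the base64 text, performs a strict UTF-8 decode, and lists the code points; all of them lie above U+FFFF, i.e. each needs four UTF-8 bytes, which may be a useful hint about where the validation goes wrong.

```python
import base64

payload = base64.b64decode("8JCNg/CQjL7wkIy58JCMuvCQjLDwkIyy8JCNiQ==")

# A strict decode raises UnicodeDecodeError if the bytes are not valid UTF-8.
text = payload.decode("utf-8")

code_points = [f"U+{ord(ch):X}" for ch in text]
print(len(payload), len(text), code_points)
```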

`fromJSON` Crash With Bad Input

On current CRAN version. Obviously shouldn't be a common issue, but this suggests there might be some undefined behavior lurking in the code that might be worth protecting against.

> rjson::fromJSON('"\\U1F600"')
R(9747,0x10fa46e00) malloc: *** error for object 0x7fba26f74370: pointer being freed was not allocated
R(9747,0x10fa46e00) malloc: *** set a breakpoint in malloc_error_break to debug
Abort trap: 6
rjson(issue20)$ 
R Under development (unstable) (2021-11-21 r81221)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Big Sur/Monterey 10.16

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] compiler_4.2.0 R6_2.5.0       magrittr_2.0.1 tools_4.2.0    roxygen2_7.1.1
 [6] Rcpp_1.0.7     xml2_1.3.2     stringi_1.6.2  knitr_1.33     xfun_0.23     
[11] stringr_1.4.0  rlang_0.4.11   purrr_0.3.4  
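For comparison, `\U` is not one of the escapes defined by the JSON grammar, so a strict parser can reject this input with an ordinary error. A minimal sketch with Python's standard `json` module, shown only as a reference for the expected behavior:

```python
import json

bad_input = '"\\U1F600"'  # the JSON text "\U1F600"; \U is not a JSON escape

try:
    json.loads(bad_input)
    outcome = "accepted"
except json.JSONDecodeError as exc:
    outcome = f"rejected: {exc.msg}"

print(outcome)
```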

not all data parsed check introduces regression

Size check introduced in 7974ab7 does not account for multi-byte characters

rjson::fromJSON('{"a":"\uef"}')
## Error in rjson::fromJSON("{\"a\":\"ï\"}") : 
##  not all data was parsed (10 chars were parsed out of a total of 9 chars)
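The two numbers in the message line up with a character count versus a UTF-8 byte count for the same string, which a short Python sketch can confirm. The variable names are illustrative only.

```python
text = '{"a":"\u00ef"}'  # same input as above; U+00EF needs two UTF-8 bytes

n_chars = len(text)                  # number of code points
n_bytes = len(text.encode("utf-8"))  # number of bytes a C parser walks over

print(n_chars, n_bytes)
```

Comparing a byte offset against a character count therefore fails as soon as the input contains any non-ASCII character.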

PR w/ fix incoming shortly.
