json.awk's People

Contributors

kimbo, mohd-akram, step-

json.awk's Issues

No output on GNU AWK, wrong output depending on AWK version

Introduction

Version 1.01 includes a kludgy work-around for some AWK implementations that incorrectly tokenize the backslash-quote escape in JSON strings. Recently I stumbled upon a good test case which has enabled me to:

  • Get rid of the kludgy work-around
  • Uncover and fix another major AWK implementation-dependent issue

This fix is included in JSON.awk version 1.1 (recommended upgrade) along with other minor enhancements and fixes.

If you want to trace the following steps, start from https://github.com/step-/JSON.awk/blob/962a8e4a44eb310866a587df355973e62d43790c/JSON.awk and comment out lines 236-239 and 241-242.

Issue Description

Create file test.json with the following contents:

{"sh": "[ -r 'k.cfg' ] || echo \"# k.cfg - `date`\" >'k.cfg';s=$(awk \"`echo -n 'BEGIN{nf=1} /^@Bs*mode=/{sub(/=.*/,@Q=123@Q);nf=0} {print} END{if(nf) print @Qmode=123@Q}'|sed 's/@Q/\\x22/g;s/@B/\\x5C/g'`\" 'k.cfg') && [ 0 != ${#s} ] && echo -n \"$s\" >'k.cfg'"}

Parse test.json with JSON.awk:

echo -e "test.json\n" | awk -f JSON.awk

I am going to show output from two AWK implementations: (A) GNU AWK 4.0.0 and (B) Busybox 1.17.1 AWK.
(A)

expected <, or }> but got <#> at input token 5
{ "sh" : "[ -r 'k.cfg' ] || echo \" <<#>> k . c f g - ` d a t

(B)

expected <value> but got <"> at input token 4
{ "sh" : <<">> [ - r ' k . c f g ' 

Now change line 230 from

CHAR="[^[:cntrl:]\\\"]"

to:

CHAR="[^[:cntrl:]\"\\]"

(A)

 awk: JSON.awk:241: (FILENAME=uiui FNR=1) fatal: Invalid range end: /"[^[:cntrl:]"\]*((\[^u[:cntrl:]]|\u[0-9a-fA-F]{4})[^[:cntrl:]"\]*)*"|-?(0|[1-9][0-9]*)([.][0-9]*)?([eE][+-]?[0-9]*)?|null|false|true|[[:space:]]+|./

(B)

expected <value> but got <"> at input token 4
{ "sh" : <<">> [ - r ' k . c f g ' 

So we get very different results and, in all cases, the tokenizer fails. Replacing the string constant in gsub() on line 241 with a regex constant fixes this issue on both platforms! Change line 241 to:

gsub(/\"[^[:cntrl:]\"\\]*((\\[^u[:cntrl:]]|\\u[0-9a-fA-F]{4})[^[:cntrl:]\"\\]*)*\"|-?(0|[1-9][0-9]*)([.][0-9]*)?([eE][+-]?[0-9]*)?|null|false|true|[[:space:]]+|./, "\n&", a1)

You may comment out line 230 if you like; it makes no difference anymore.

(A)

["sh"]  "[ -r 'k.cfg' ] || echo \"# k.cfg - `date`\" >'k.cfg';s=$(awk \"`echo -n 'BEGIN{nf=1} /^@Bs*mode=/{sub(/=.*/,@Q=123@Q);nf=0} {print} END{if(nf) print @Qmode=123@Q}'|sed 's/@Q/\\x22/g;s/@B/\\x5C/g'`\" 'k.cfg') && [ 0 != ${#s} ] && echo -n \"$s\" >'k.cfg'"

(B)

["sh"]  "[ -r 'k.cfg' ] || echo \"# k.cfg - `date`\" >'k.cfg';s=$(awk \"`echo -n 'BEGIN{nf=1} /^@Bs*mode=/{sub(/=.*/,@Q=123@Q);nf=0} {print} END{if(nf) print @Qmode=123@Q}'|sed 's/@Q/\\x22/g;s/@B/\\x5C/g'`\" 'k.cfg') && [ 0 != ${#s} ] && echo -n \"$s\" >'k.cfg'"

Conclusion

Although a complex regex constant isn't very readable, it gets the job done correctly, so this change is committed for good, with due comments, in version 1.1.
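A small illustration of why the two forms behave differently (a sketch, not code from JSON.awk): awk unescapes a string constant once when it parses it and again when the string is used as a regex, while a /regexp constant/ is unescaped only once. The first print shows what the line-230 CHAR string becomes after the string-level pass; gawk appears to read the trailing \] as an escaped ], so the bracket expression does not close where intended, which is consistent with the "Invalid range end" error in (A). The function name escape_layers_demo is made up for this example.

```shell
#!/bin/sh
# Hypothetical demo: string constants used as regexes are unescaped twice,
# regexp constants only once.
escape_layers_demo() {
    awk 'BEGIN {
        # The line-230 CHAR string after string-level unescaping:
        s = "[^[:cntrl:]\"\\]"
        print s
        # Matching one literal backslash: regexp constant vs. string (dynamic) regex.
        a = "a\\b"; n1 = gsub(/\\/, "X", a)
        b = "a\\b"; n2 = gsub("\\\\", "X", b)
        print n1, a, n2, b
    }'
}

escape_layers_demo
```

It prints [^[:cntrl:]"\] on the first line and 1 aXb 1 aXb on the second: matching a single backslash takes two backslashes in the regexp constant but four in the string.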

How to pass JSON using pipeline?

I want to be able to use the following command:

$ curl -s https://raw.github.com/archan937/jsonv.sh/master/examples/complex.json | awk -f JSON.awk

So far, I have found that I can pass a file path, e.g.:

$ echo "examples/complex.json" | awk -f utils/json.awk

and that I can enter JSON with cat:

$ { echo -; echo; cat; } | awk -f utils/json.awk 
{"foo":"bar"}
^D
["foo"] "bar"

The latter option is close but I want it to be automated.

Can you provide a solution for this? I am trying to use JSON.awk with https://github.com/archan937/jsonv.sh (instead of JSON.sh)

Error about regex

I got this problem:
awk: ./JSON.awk:258: warning: regexp escape sequence `\"' is not a known regexp operator

It's just a warning, and the result is OK.
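If line 258 of that JSON.awk version writes \" inside a regexp constant (an assumption, since the line isn't quoted here), the backslash is unnecessary: a double quote has no special meaning in a regex, and gawk warns about the unknown escape but still treats it as a plain quote, which fits "the result is OK". A minimal sketch with the quote written bare; count_quotes is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical demo: a bare " inside a regexp constant matches a literal quote,
# with no backslash and therefore no gawk warning.
count_quotes() {
    awk 'BEGIN {
        s = "say \"hi\""          # the string: say "hi"
        n = gsub(/"/, "Q", s)     # replace every double quote with Q
        print n, s
    }'
}

count_quotes
```

It prints 2 say QhiQ.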

HOW-TO mac OSX

I do not have access to Apple computers, so I can't test JSON.awk under OS X. I would like this thread to become a self-help resource for Mac users.

If you have found an issue involving JSON.awk on a Mac OS X system, and you have found a solution, please post a comment here describing the issue and the solution.

In the interest of anyone reading, please do not post issues without solutions. If you want support, please open a new issue. This thread is about solutions.

Can't parse stdin with JSON data

There are cases when JSON.awk can't parse valid JSON data coming from a pipe. For instance see issue #6, comments 6 (definition) through 8 (analysis). Test case file

illegal primary in regular expression ^(|[^0-9])$

I got the following error when I tried JSON.awk on OS X:

awk: illegal primary in regular expression ^(|[^0-9])$ at [^0-9])$
 source line number 158 source file json.awk
 context is
        } else if (TOKEN ~ >>>  /^(|[^0-9])$/ <<< ) {
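The error points at the empty alternative in (|[^0-9]). Taken literally, the pattern means "empty, or exactly one non-digit", and ^[^0-9]?$ describes the same set without an empty alternative. A sketch that checks a few sample tokens against the rewritten pattern (classify_tokens is a hypothetical helper, not part of JSON.awk):

```shell
#!/bin/sh
# Hypothetical check: /^[^0-9]?$/ accepts the empty string or a single
# non-digit, like /^(|[^0-9])$/, but has no empty alternative.
classify_tokens() {
    printf '%s\n' '' 'a' '1' 'ab' |
        awk '{ print "[" $0 "]", (($0 ~ /^[^0-9]?$/) ? "match" : "no") }'
}

classify_tokens
```

Expected output: [] match, [a] match, [1] no, [ab] no, one per line.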

How to embed JSON.awk without editing function apply?

(Request started in PR #11)

The current method for embedding JSON.awk in a larger awk application requires modifying JSON.awk by editing the stub function apply. Can a new embedding method be defined that doesn't require modifying the JSON.awk script and that works across POSIX awk, mawk and gawk?

For instance, the following method (not tested, suggested in #11) leverages gawk's @include statement, which POSIX awk and mawk don't support.

$ declare -A "$(aws deploy get-deployment --deployment-id $DEPLOYMENT_ID \
  |awk -v STREAM=0 -v ARRAY="DEPLOYMENT_GITHUB" -v KEYS='"deploymentInfo","revision","gitHubLocation","(commitId|repository)"' '
    @include "json.awk";
{
    array = sprintf("%s=(",ARRAY);
    regex = "["KEYS"]";
    for (key in JPATHS) {
        if( JPATHS[key] ~ KEYS ) {
            n=patsplit(JPATHS[key], path, "\"([^\"]+)\"");
            array = sprintf("%s [%s]=%s", array, path[n-1], path[n]);
        }
    }
    array = sprintf("%s )", array);
    printf "%s", array;
}' -
)"
$ echo "${DEPLOYMENT_GITHUB[repository]}"
$ echo "${DEPLOYMENT_GITHUB[commitId]}"
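One portable building block, sketched with two hypothetical files, lib.awk and app.awk: POSIX awk, mawk and gawk all accept several -f options and treat the files as one program, so a function defined in one file can be called from another without @include. This alone does not remove the need to edit apply, because awks typically reject a function that is defined twice; it only shows the loading mechanism.

```shell
#!/bin/sh
# Hypothetical demo: two -f files combine into one awk program.
multi_f_demo() {
    dir=$(mktemp -d) || return 1
    # "Library" file: defines a function only.
    cat > "$dir/lib.awk" <<'EOF'
function greet(name) { return "hello, " name }
EOF
    # "Application" file: calls the function defined in the other file.
    cat > "$dir/app.awk" <<'EOF'
BEGIN { print greet("json") }
EOF
    awk -f "$dir/lib.awk" -f "$dir/app.awk"
    status=$?
    rm -rf "$dir"
    return $status
}

multi_f_demo
```

It prints hello, json.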

mawk 1.3.3 support (Debian/Ubuntu)

It seems that the script is not mawk-compatible. Running it on this test file:

{
    "asd":1
}

Result:

./test.json: expected <string> but got <EOF> at input token 2
{ <<EOF>>          " a s d "
invalid: ./test.json
expected <string> but got <EOF> at input token 2
{ <<EOF>>          " a s d "

System info:

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.1 LTS
Release:        16.04
Codename:       xenial

AWK: ii mawk 1.3.3-17ubuntu2

It works as expected with gawk.
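A quick probe, sketched under the assumption that the failure comes from regex features the tokenizer relies on: the gsub() pattern quoted earlier in this thread uses POSIX classes such as [[:space:]] and interval expressions such as {4}, and older mawk releases may lack one or both. The probe only reports what the awk on PATH supports; it does not diagnose JSON.awk itself. probe_awk_regex is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical probe: report whether this awk supports two regex features.
probe_awk_regex() {
    awk 'BEGIN {
        # POSIX character class, as in [[:space:]]
        print "char-class:", ((" " ~ /^[[:space:]]$/) ? "yes" : "no")
        # Interval expression, as in [0-9a-fA-F]{4}
        print "interval:", (("aaaa" ~ /^a{4}$/) ? "yes" : "no")
    }'
}

probe_awk_regex
```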

Even the simplest json can not be parsed on FreeBSD or OS X

curl -O https://raw.githubusercontent.com/step-/JSON.awk/master/JSON.awk
cat <<EOF > x.json
{
    "name": "latest"
}
EOF
awk -f JSON.awk x.json 

produces

x.json: expected <string> but got <"> at input token 2
{ <<">> 
x.json: expected <value> but got <a> at input token 1
<<a>> m e ": " l a t e s t " 
invalid: x.json
expected <string> but got <"> at input token 2
{ <<">> 
expected <value> but got <a> at input token 1
<<a>> m e ": " l a t e s t "

This is on FreeBSD with awk version 20121220 (FreeBSD) as well as on Darwin with awk version 20070501.

function cb_fail1 never defined

I downloaded JSON.awk-1.4.2.tar.gz from the JSON.awk tarball and ported it to Ubuntu 18.04.3 LTS (Bionic Beaver). When I run it with
awk -f JSON.awk object.json
it prints:

awk: JSON.awk: line 409: function cb_fail1 never defined
awk: JSON.awk: line 409: function cb_parse_object_exit never defined
awk: JSON.awk: line 409: function cb_parse_object_enter never defined
awk: JSON.awk: line 409: function cb_parse_object_empty never defined
awk: JSON.awk: line 409: function cb_parse_array_exit never defined

The object.json file contains:
{
"key": "Value"
}
Is this an issue?

Parse files with byte-order-mark

When parsing a JSON file with BOM such as:

https://github.com/dotnet/toolset/blob/40cc5860e2ef311b9aca733b1d2eccaa681bd422/TestAssets/InstallationScriptTests/InstallationScriptTests.json

JSON.awk gives the following error:

/datadrive/projects/toolset/TestAssets/InstallationScriptTests/InstallationScriptTests.json: expected <value> but got <> at input token 1
<<>> { "sdk" : { "version" : "1.0.0-beta.19463.3" } }

The current workaround is to strip these characters using a tool like sed: sed '1s/^\xEF\xBB\xBF//' "$json_file" | awk -f JSON.awk - | .....

It would be nice if the parser ignored these BOM characters so consumers do not need to strip them.
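A variant of the workaround above that avoids sed's \xHH escapes (a GNU sed extension, not available in every sed) by letting printf produce the three BOM bytes. strip_bom is a hypothetical helper name; only the start of the first line is touched, so a file without a BOM passes through unchanged.

```shell
#!/bin/sh
# Hypothetical helper: remove a UTF-8 byte-order mark (EF BB BF) from the
# start of line 1; reads the named files, or stdin if none are given.
strip_bom() {
    bom=$(printf '\357\273\277')
    sed "1s/^$bom//" "$@"
}

printf '\357\273\277{"sdk": {"version": "1.0.0-beta.19463.3"}}\n' | strip_bom
```

Usage would then be along the lines of strip_bom "$json_file" | awk -f JSON.awk -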

Bug?

gawk: JSON.awk:231: (FILENAME=c.tmp FNR=1) fatal: delete: illegal use of variable `JPATHS' as array

My gawk cannot delete the array before it has been used.

split("",JPATHS)
is better (more portable) than
delete JPATHS;
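A self-contained sketch of the split() idiom: splitting an empty string leaves the target array with no elements, which clears it without relying on delete applied to a whole array. The array name a and the function name are made up for the example.

```shell
#!/bin/sh
# Hypothetical demo: empty an awk array with split("", arr).
clear_array_demo() {
    awk 'BEGIN {
        a["x"] = 1; a["y"] = 2
        split("", a)          # removes every element of a
        n = 0
        for (k in a) n++      # count what is left
        print n
    }'
}

clear_array_demo
```

It prints 0.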

Use of [:cntrl:] character class in tokenize()

The POSIX [:cntrl:] character class does not cover exactly the same characters that must be escaped according to https://tools.ietf.org/html/rfc7159#section-7 (i.e., U+0000 through U+001F). [:cntrl:] also covers U+007F and, when used in a UTF-8 locale, all C1 control characters. See the example below, where I get an error in my locale, "en_US.UTF-8". Apart from using LC_ALL=C, the error can be avoided by changing [:cntrl:] to the range defined in the spec: \x00-\x1F.

$ echo world_bank109.json | awk -f JSON.awk > /dev/null
world_bank109.json: expected <value> but got <"> at input token 263
, "productlinetype" : "L" , "project_abstract" : { "cdata" : <<">> T h e o b j e c t i
$ echo world_bank109.json | LC_ALL=C awk -f JSON.awk > /dev/null
(no error message here)

Test file: world_bank109.json.txt, which is line 109 of the World Bank sample file at http://jsonstudio.com/resources/
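To see the U+007F point in isolation, the sketch below feeds a line containing DEL (byte 0x7F) to awk in the C locale; [[:cntrl:]] matches it even though RFC 7159 only requires U+0000 through U+001F to be escaped inside strings. The function name is made up for the example.

```shell
#!/bin/sh
# Hypothetical demo: [[:cntrl:]] matches DEL (0x7F) even in the C locale.
cntrl_matches_del() {
    printf 'a\177b\n' |
        LC_ALL=C awk '{ print (($0 ~ /[[:cntrl:]]/) ? "cntrl" : "none") }'
}

cntrl_matches_del
```

It prints cntrl.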

[Busybox] "Invalid regexp" after applying the busybox patch from FAQ

Hello,

I'm running Busybox on Alpine 3.12.1. I'm still getting an "Invalid regexp" after applying the Busybox patch from the FAQ (sed -i "s#\\\000#\\\001#g" JSON.awk).

Before patch:

/ # wget -qO- http://localhost:8080/actuator/metrics/jvm.memory.committed | awk -f JSON.awk -
awk: bad regex '^|^��|^��|"[^"\\': Invalid regexp

After patch (sed -i "s#\\\000#\\\001#g" JSON.awk):

/ # wget -qO- http://localhost:8080/actuator/metrics/jvm.memory.committed | awk -f JSON.awk -
awk: bad regex '^|^��|^��|"[^"\\╔-]*((\\[^u╔-]|\\u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])[^"\\╔-]*)*"|-?(0|[1-9][0-9]*)([.][0-9]+)?([eE][+-]?[0-9]+)?|null|false|true|[
]+|.': Invalid regexp
/ #

Infos:

/ # cat /etc/alpine-release
3.12.1
/ # awk --help
BusyBox v1.31.1 () multi-call binary.

Usage: awk [OPTIONS] [AWK_PROGRAM] [FILE]...

        -v VAR=VAL      Set variable
        -F SEP          Use SEP as field separator
        -f FILE         Read program from FILE
        -e AWK_PROGRAM
/ #
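For reference, this is what the FAQ's sed command does to the script text, shown on a made-up sample line rather than the real JSON.awk source: each literal \000 sequence becomes \001, and other escapes such as \037 are left alone.

```shell
#!/bin/sh
# Hypothetical demo: apply the FAQ substitution (without -i) to a sample line.
show_patch_effect() {
    printf '%s\n' 'x = "\000-\037"' | sed "s#\\\000#\\\001#g"
}

show_patch_effect
```

It prints x = "\001-\037".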

What are the valid ESCAPE characters?

The JSON grammar diagram on www.json.org shows a small set of valid escaped characters, while the ESCAPE regex in tokenizer() is more inclusive. Should the ESCAPE regex be reduced to comply with json.org?

What do other JSON parsers do?

  • jshon reports, e.g., \x22 as an invalid escape

Expert advice?
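For comparison, a stricter pattern that accepts only the escapes on json.org (\" \\ \/ \b \f \n \r \t, and \u followed by four hex digits) could look like the sketch below. It is a hypothetical stand-alone check, not the ESCAPE regex from tokenizer(); the hex digits are spelled out because some awks lack {4} intervals.

```shell
#!/bin/sh
# Hypothetical check: classify escape sequences against the json.org set.
check_escapes() {
    printf '%s\n' '\n' '\u00e9' '\x22' '\/' '\\' '\q' |
        awk '{
            ok = ($0 ~ /^\\(["bfnrt]|\\|\/|u[0-9a-fA-F][0-9a-fA-F][0-9a-fA-F][0-9a-fA-F])$/)
            print $0, (ok ? "valid" : "invalid")
        }'
}

check_escapes
```

Expected output: \n, \u00e9, \/ and \\ are reported valid; \x22 and \q are reported invalid.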
