bazaarvoice / jolt

JSON to JSON transformation library written in Java.

License: Apache License 2.0

Shell 0.07% Java 99.93%

jolt's Issues

Removr should be able to handle Arrays

Make Jolt available by brew (maybe)

Dunno how brew handles / can handle Java as a dependency.

Changing date to string

I have a date as object in my input json . I want to store it as a string in my output json.
From what I understand, Jolt doesn't do any type conversions. But I am wondering whether it is possible at all?

[docs] Getting started documents only one dependency while there are two

Getting started wiki page documents that jolt-core only depends on "apache.commons". It should be noted that it's commons-lang. More important issue is that javax.inject is also a dependency so either docs should be updated or javax.inject dependency removed.

Spec errors should have location and context

I'm a fumble-fingered fugitive from typing class who couldn't data-enter a JSON file correctly if my life depended on it. So I really appreciate the fix you made in 0.13 that identifies row and column for JSON format mistakes. Unfortunately, even if I get the JSON format correct, I can still make Jolt DSL mistakes and then have no clue on where I did something wrong. For example, if I have a JSON-legal leaf that reads:

{ "value": "&$3.DollarsEarned" }

Then Jolt gives me the error:

DotNotation (write key) can not contain '@', '*', or '$'.

This is nice, but it does not give me row and column. I know, the location information is probably lost when the json file was sucked into a Chainr object, but there are ways to preserve it (as attributes called joltRow and joltColumn if nothing else; I have modified XML parsers to do the same thing). At any rate, without too much work, I changed line 46 of ShiftrWriter.java to read:

throw new SpecException("DotNotation (write key) can not contain '@', '*', or '$' at " + dotNotation + ".");

Now, the error output gives me some context:

DotNotation (write key) can not contain '@', '*', or '$' at root.&$3.DollarsEarned.

Giving context and location in every SpecException would be nice. The problem is that there are lots of them (which is a great thing, BTW). I'd change them all myself, except:

I might break something,
I'm not so familiar with the code that I know exactly how to get location.
Some people might not like to see more error information (IMHO, they are silly), and
Upper management might not approve of my giving away source code.

So all I can do is ask.

Pretty please.

Create a "jolt-complete" module that bundles up "core", "json-utils", and "guice".

Concierge, and generally other server side Java deployments of Jolt, will pretty much grab all three of those artifacts, so let's just package them all up.

Leverage Github "Releases"

https://github.com/blog/1547-release-your-software

Fix JsonUtils removeRecursive to not break "un-necessarily" on ImmutableMaps

removeRecursive should do a contains, then a remove.
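A minimal sketch of the "contains, then remove" idea, not the actual JsonUtils code. The class name is hypothetical, and `Map.of` stands in for an immutable map whose `remove` throws even when the key is absent.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class RemoveRecursiveSketch {

    // Removes every occurrence of keyToRemove from nested Maps and Lists.
    // containsKey is checked first, so a map that does not hold the key
    // (immutable or not) is never asked to mutate itself.
    @SuppressWarnings("unchecked")
    static void removeRecursive(Object json, String keyToRemove) {
        if (json instanceof Map) {
            Map<String, Object> map = (Map<String, Object>) json;
            if (map.containsKey(keyToRemove)) {
                // Still throws if the map is immutable AND really holds the key.
                map.remove(keyToRemove);
            }
            for (Object child : map.values()) {
                removeRecursive(child, keyToRemove);
            }
        } else if (json instanceof List) {
            for (Object child : (List<Object>) json) {
                removeRecursive(child, keyToRemove);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> root = new HashMap<>();
        root.put("drop", 1);
        root.put("frozen", Map.of("keep", 2)); // immutable, but has no "drop" key
        removeRecursive(root, "drop");
        System.out.println(root.containsKey("drop")); // prints "false"
    }
}
```

With this ordering the call only fails when an immutable map genuinely contains the key, which is arguably the one case where failing is "necessary".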

LHS parent value reference

I need some assistance.

Input

{
   "entities":[
      {
         "type":"alpha",
         "data":"foo"
      },
      {
         "type":"beta",
         "data":"bar"
      },
      {
         "type":"alpha",
         "data":"baz"
      }
   ]
}

Desired Output

{
   "alpha":[
      {
         "type":"alpha",
         "data":"foo"
      },
      {
         "type":"alpha",
         "data":"baz"
      }
   ],
   "beta":[
      {
         "type":"beta",
         "data":"bar"
      }
   ]
}

Using this shiftr spec

[
    {
        "operation": "shift",
        "spec": {
            "entities": {
                "*": {
                    "type": {
                        "*": {
                            "$2": "&1.[]"
                        }
                    }
                }
            }
        }
    }
]

I get...

{
   "alpha":[
      "0",
      "2"
   ],
   "beta":[
      "1"
   ]
}

Is there a way to insert the value object found at that index, as opposed to the index itself? Something like "entities[$2]": "&1.[]"

ArrayOrderObliviousDiffy fails to identify identical json documents

Given two identical JSON documents with differently ordered sub-arrays, Jolt does not identify them as identical.

jolt diffy -a expected.json actual.json

expected.json

{
  "TagDistributionOrder" : [ "ProsGames", "ConsGames" ],
  "TagDistribution" : {
    "ProsGames" : {
      "Id" : "ProsGames",
      "Label" : "ProsGames",
      "Values" : [ {
        "Value" : "Can Withstand Use",
        "Count" : 3
      }, {
        "Value" : "Interactive",
        "Count" : 3
      }, {
        "Value" : "Thought Provoking",
        "Count" : 3
      }, {
        "Value" : "Entertaining",
        "Count" : 3
      }, {
        "Value" : "Fun",
        "Count" : 2
      }, {
        "Value" : "Easy To Play",
        "Count" : 2
      }, {
        "Value" : "Educational",
        "Count" : 1
      } ]
    },
    "ConsGames" : {
      "Id" : "ConsGames",
      "Label" : "ConsGames",
      "Values" : [ {
        "Value" : "Visually Unpleasing",
        "Count" : 3
      }, {
        "Value" : "Difficult Instructions",
        "Count" : 2
      }, {
        "Value" : "Unoriginal",
        "Count" : 1
      }, {
        "Value" : "Boring",
        "Count" : 2
      }, {
        "Value" : "Poor Quality",
        "Count" : 2
      } ]
    }
  }
}

actual.json

{
  "TagDistributionOrder" : [ "ConsGames", "ProsGames" ],
  "TagDistribution" : {
    "ConsGames" : {
      "Id" : "ConsGames",
      "Label" : "ConsGames",
      "Values" : [ {
        "Value" : "Visually Unpleasing",
        "Count" : 3
      }, {
        "Value" : "Poor Quality",
        "Count" : 2
      }, {
        "Value" : "Difficult Instructions",
        "Count" : 2
      }, {
        "Value" : "Boring",
        "Count" : 2
      }, {
        "Value" : "Unoriginal",
        "Count" : 1
      } ]
    },
    "ProsGames" : {
      "Id" : "ProsGames",
      "Label" : "ProsGames",
      "Values" : [ {
        "Value" : "Thought Provoking",
        "Count" : 3
      }, {
        "Value" : "Interactive",
        "Count" : 3
      }, {
        "Value" : "Entertaining",
        "Count" : 3
      }, {
        "Value" : "Can Withstand Use",
        "Count" : 3
      }, {
        "Value" : "Fun",
        "Count" : 2
      }, {
        "Value" : "Easy To Play",
        "Count" : 2
      }, {
        "Value" : "Educational",
        "Count" : 1
      } ]
    }
  }
}

Erroneous Output

Differences found. Input #1 contained this:
{
  "TagDistribution" : {
    "ConsGames" : {
      "Values" : [ {
        "Value" : "Difficult Instructions"
      }, {
        "Value" : "Poor Quality",
        "Count" : 2
      } ]
    }
  }
}
Input #2 contained this:
{
  "TagDistribution" : {
    "ConsGames" : {
      "Values" : [ {
        "Value" : "Poor Quality"
      }, {
        "Value" : "Difficult Instructions",
        "Count" : 2
      } ]
    }
  }
}
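The behaviour the report expects can be sketched in plain Java as a deep comparison that ignores list order. This is an illustration under assumptions, not ArrayOrderObliviousDiffy's code: the class and method names are hypothetical, and the inputs are assumed to be already parsed into Maps, Lists and scalars.

```java
import java.util.List;
import java.util.Map;

class OrderObliviousEquals {

    // Deep equality for JSON-like trees that treats lists as unordered bags.
    static boolean sameIgnoringOrder(Object a, Object b) {
        if (a instanceof Map && b instanceof Map) {
            Map<?, ?> ma = (Map<?, ?>) a;
            Map<?, ?> mb = (Map<?, ?>) b;
            if (!ma.keySet().equals(mb.keySet())) {
                return false;
            }
            for (Object key : ma.keySet()) {
                if (!sameIgnoringOrder(ma.get(key), mb.get(key))) {
                    return false;
                }
            }
            return true;
        }
        if (a instanceof List && b instanceof List) {
            List<?> la = (List<?>) a;
            List<?> lb = (List<?>) b;
            if (la.size() != lb.size()) {
                return false;
            }
            // Pair each element of la with one not-yet-used element of lb.
            boolean[] used = new boolean[lb.size()];
            for (Object item : la) {
                boolean matched = false;
                for (int i = 0; i < lb.size() && !matched; i++) {
                    if (!used[i] && sameIgnoringOrder(item, lb.get(i))) {
                        used[i] = true;
                        matched = true;
                    }
                }
                if (!matched) {
                    return false;
                }
            }
            return true;
        }
        return a == null ? b == null : a.equals(b);
    }

    public static void main(String[] args) {
        Object expected = List.of(
                Map.of("Value", "Difficult Instructions", "Count", 2),
                Map.of("Value", "Poor Quality", "Count", 2));
        Object actual = List.of(
                Map.of("Value", "Poor Quality", "Count", 2),
                Map.of("Value", "Difficult Instructions", "Count", 2));
        System.out.println(sameIgnoringOrder(expected, actual)); // prints "true"
    }
}
```

The key point is that list elements are matched as whole values; matching them field by field at the same index would produce the partial differences shown in the erroneous output.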

Add '#' wildcard to support the hokey but useful Map to Array transform

Currently Jolt can transform Arrays into Maps, but it can't do the opposite of transforming a Map into an Array.

The idea is to add an implicit counter that can be used to "auto generate" array indices.

The usecase is hokey, in that Map to Array transform will not have a guaranteed order, but sometimes there doesn't need to be an order.
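To make the idea concrete, here is a plain Java sketch of a Map to Array conversion driven by a running counter. It does not show how '#' would be written in a spec; the class name and the "key"/"value" field names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class MapToArraySketch {

    // Turns each map entry into one array element. The counter plays the
    // role of the proposed implicit index: it is never present in the input.
    static List<Map<String, Object>> mapToArray(Map<String, Object> input) {
        List<Map<String, Object>> out = new ArrayList<>();
        int counter = 0;
        for (Map.Entry<String, Object> entry : input.entrySet()) {
            Map<String, Object> element = new LinkedHashMap<>();
            element.put("key", entry.getKey());
            element.put("value", entry.getValue());
            out.add(counter, element); // index comes from the counter
            counter++;
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> ratings = new LinkedHashMap<>();
        ratings.put("quality", 4);
        ratings.put("value", 5);
        System.out.println(mapToArray(ratings).size()); // prints "2"
    }
}
```

The output order simply follows the map's iteration order, which is why the result has no guaranteed order unless the input map is itself ordered.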

Shiftr # needs to be updated. Its scope was too narrow.

Its original use case of counting the number of '*' matches works, but it does so by rolling all the '*' hits up to the parent.

It does not work if you have prefixed "stars", aka "rating-*" and "cdv-*". In that case you want the number of matches against those individual specs, and not the parent.

Test case here : https://github.com/milosimpson/jolt/blob/bb08587844c4a9319f76a3645e503b1672537ae6/jolt-core/src/test/resources/json/shiftr/prefixedStarsToLists.json

Would be useful to allow addition to STOCK_TRANSFORMS

Hi, I'm a new user of Jolt. First off, thanks for a very useful and well put together library!

I am writing a new transform and the JSON using it will be created/modified by semi-technical users of our product. I'd rather not have them enter the fully qualified classname for this custom transform.

Currently STOCK_TRANSFORMS is unmodifiable and final, so I don't think there is a way to register a new one.
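One possible shape for this, sketched as a standalone registry rather than a change to Jolt itself. The class, its methods, and `com.example.MaskTransform` are hypothetical; only the general idea of mapping a short operation name to a class name comes from the request.

```java
import java.util.HashMap;
import java.util.Map;

class TransformRegistrySketch {

    // Short operation name -> fully qualified class name.
    private final Map<String, String> aliases = new HashMap<>();

    // Starts from a copy of the stock aliases so the originals stay untouched.
    TransformRegistrySketch(Map<String, String> stock) {
        aliases.putAll(stock);
    }

    // Adds a custom alias, refusing to shadow an existing one.
    void register(String alias, String className) {
        if (aliases.containsKey(alias)) {
            throw new IllegalArgumentException("Alias already registered: " + alias);
        }
        aliases.put(alias, className);
    }

    // Returns the class name for an alias, or the input unchanged so that
    // fully qualified class names keep working.
    String resolve(String operation) {
        return aliases.getOrDefault(operation, operation);
    }

    public static void main(String[] args) {
        TransformRegistrySketch registry =
                new TransformRegistrySketch(Map.of("shift", "com.bazaarvoice.jolt.Shiftr"));
        registry.register("mask", "com.example.MaskTransform");
        System.out.println(registry.resolve("mask")); // prints "com.example.MaskTransform"
    }
}
```

Copying the stock map instead of mutating it keeps the built-in names safe from accidental overrides.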

I may be able to make the changes if you think this is useful and have an idea how it should best be done...

Thanks, Alfie.

Add "@" logic to Defaultr

You can use * logic to apply defaults to children object, but if you do so it "forces" the parent object to be created.

If added "@" to Defaultr we could control that.

Jolt Diffy subcommand doesn't support file descriptors

Normal diff utility supports the use of file descriptors as inputs, but this doesn't work with Jolt Diffy:

Old school diff:

diff <(cat myfile) <(some-script.sh)

Diffy:

jolt diffy <(curl http://www.example.com/file.json) <(some-script-that-outputs-json.sh)

If you have at least one file, @milosimpson and @snkinard noted that stdin can be used to diff against a file as a workaround, although this syntax feels a little awkward:

curl -s http://www.example.com/file.json | jolt diffy some-file.json

Of course, it still doesn't work for the case where you want to use two file descriptors.

Design discussion - explicit usage of Jackson

Hi Milo, guys,

I would like to help you with developing/improving JOLT as it looks very promising! Before I start submitting any big changes I would like to discuss them with you and get general approval for my ideas. So first a couple of questions about existing design. Here goes the first one:

Why do Transform.transform() and similar methods accept and return Object? Did you consider using Jackson's JsonNode, etc. instead?

It looks like JOLT is already using Jackson, and keeping and maintaining a 'hydrated' object only makes the library code less readable and harder to understand for a newcomer. What do you think about using 'native' Jackson interfaces?

Provide a Chainr factory method/class

ChainrSpecs are defined using JSON files. Currently, a Jolt user would have to rewrite the boilerplate to read from a JSON file into a ChainrSpec for every project where it is used. A factory method that can abstract this away would be quite convenient.

CLI appears to run in "silent" mode with shortened sub-commands

When using a shortened version of a sub-command, the CLI appears to be running in 'silent' mode. For example:

jolt diffy diff3.json diff4.json

The above writes output to standard out. The below does not

jolt diff diff3.json diff4.json

The two commands should behave the same or shortened sub commands should not be considered valid by the argument parser

Make pretty Github page for Jolt

http://pages.github.com/

Implement Nudgr aka Shiftr with implied @ at every level

This would be useful for implementing "backwards compatibility" of JSON data APIs.
Example, current code generates data in "1.5" format, but you need to be able to transform it "down" to 1.4, and 1.3 for existing clients.
Assuming that the changes between versions are mostly "minor", the idea would be to write Jolt transforms for 1.5 -> 1.4 and 1.4 -> 1.3.
In that case, Shiftr is not your friend, as he will only pass thru data that is explicitly in his spec. We need a new transform for the usecase of "I just want to adjust / nudge a little bit of my input JSON."
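The difference from Shiftr can be sketched in plain Java: everything passes through by default and only the listed fields move. This is a top-level-only illustration with hypothetical names, not a proposed Nudgr implementation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NudgeSketch {

    // Copies every top-level field through unchanged, then moves only the
    // fields named in renames (old name -> new name).
    static Map<String, Object> nudge(Map<String, Object> input, Map<String, String> renames) {
        Map<String, Object> out = new LinkedHashMap<>(input);
        for (Map.Entry<String, String> rename : renames.entrySet()) {
            if (out.containsKey(rename.getKey())) {
                Object value = out.remove(rename.getKey());
                out.put(rename.getValue(), value);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> v15 = new LinkedHashMap<>();
        v15.put("id", 1);
        v15.put("displayName", "Widget");
        // Hypothetical 1.5 -> 1.4 change: "displayName" used to be "name".
        System.out.println(nudge(v15, Map.of("displayName", "name"))); // prints "{id=1, name=Widget}"
    }
}
```

A Shiftr-style spec for the same change would have to list "id" explicitly as well, which is the overhead the proposal wants to avoid.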

Hello Concierge Team!

Transformation problem

Hello,
Can you help me with my problem with transformation of input json file like this:

[
    {
        "a": "t1",
        "b": "t2",
        "c": "t3",
        "d": "t5",
        "e": "t8"
    },
    {
        "a": "t1",
        "b": "t2",
        "c": "t3",
        "d": "t6",
        "e": "t9"
    },
    {
        "a": "t1",
        "b": "t2",
        "c": "t4",
        "d": "t7",
        "e": "t10"
    }
]

I need to get this json file:

{
    "a": "t1",
    "b": "t2",
    "c": {
        "t3": [
            {
                "d": "t5",
                "e": "t8"
            },
            {
                "d": "t6",
                "e": "t9"
            }
        ],
        "t4": [
            {
                "d": "t7",
                "e": "t10"
            }
        ]
    }
}

I don't understand how I can do this aggregation with JOLT.

I tried to create spec like this:

[
    {
        "operation": "shift",
        "spec": {
            "*": {
                "a": "a",
                "b": "b",
                "c": {
                    "*": {
                        "$": "c"
                    }
                }
            }
        }
    }
]

And result is:

{
  "a" : [ "t1", "t1", "t1" ],
  "b" : [ "t2", "t2", "t2" ],
  "c" : [ "t3", "t3", "t4" ]
}

Questions:

How can I aggregate "a" and "b" to one value (if all values are the same)?
How can I add to elements of array "c" some children from top level of json?
Maybe other questions will appear later.

Or maybe this problem cannot be resolved using the latest (0.0.12) version of JOLT?

Command Line Tools for Jolt

Jolt should have a suite of command line tools, so one can utilize diffy and the various transforms at the command line

Create Demo site where transforms can be run

Something like a little ElasticBeanstalk or Heroku thing that can receive json run the transform and return.

Needs a pretty web page with reasonable examples pre-populated.

Fail build when license headers are omitted from a file

From @shawnsmith:

It's really common to forget to keep license headers up-to-date as code evolves. So when you're adding them to all the files I strongly suggest you look into something like http://creadur.apache.org/rat/ to fail the build when license headers are omitted. For example:

   <build>
      <plugins>
        <plugin>
          <groupId>org.apache.rat</groupId>
          <artifactId>apache-rat-plugin</artifactId>
          <version>0.9</version>
          <configuration>
            <excludes>
            <exclude>.git/**</exclude>
            <exclude>.gitignore</exclude>
            <exclude>**/resources/*.txt</exclude>
            </excludes>
          </configuration>
          <executions>
            <execution>
              <id>rat-check</id>
              <phase>test</phase>
              <goals>
                <goal>check</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
  ...

Option to force existence of fields in a shiftr transformer

Hey,

do we have an option to "force" (e.g. fail where there is no such field) existence of a field that is being shifted to the other field? It would be very useful. If there's no such feature what do you think about adding it?

Cli for Chainr

Shiftr RHS "" should have more tests

Aka what happens if two different places in the spec try to write to RHS ""?

Design discussion - dependency on Guava in source code

Another design idea. I would like to introduce dependency on Guava in JOLT source tree (it's already there for tests) and start using Preconditions for validating arguments. Any concerns?

One of the changes would be to start throwing NPE instead of IAE exceptions for null method arguments.
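The behavioural change can be illustrated without Guava. In this sketch `java.util.Objects.requireNonNull` stands in for Guava's `Preconditions.checkNotNull`, since both throw NullPointerException; the class and method names are hypothetical.

```java
import java.util.Objects;

class NullCheckSketch {

    // Current style: a hand-written check that throws IllegalArgumentException.
    static String describeBefore(Object spec) {
        if (spec == null) {
            throw new IllegalArgumentException("spec must not be null");
        }
        return spec.toString();
    }

    // Proposed style: a precondition helper that throws NullPointerException.
    static String describeAfter(Object spec) {
        Objects.requireNonNull(spec, "spec must not be null");
        return spec.toString();
    }

    public static void main(String[] args) {
        try {
            describeAfter(null);
        } catch (NullPointerException e) {
            System.out.println(e.getMessage()); // prints "spec must not be null"
        }
    }
}
```

Callers that currently catch IllegalArgumentException for null arguments would need to be reviewed, which is the main compatibility concern with the switch.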

Getting started example is not working

Hi
https://github.com/bazaarvoice/jolt/blob/master/gettingStarted.md
does not work with version 0.0.10. I get a "JoltBootStrap.class not found" error.

Removr should be able to handle Left Hand Side "*" wildcards.

Example / Unit Test :

{
    "input": {
        "TAG-Sharpness$fr": "nettete",
        "TAG-Sharpness#fr_fr": "nettete",
        "TAG-Sharpness": "Sharpness",

        "TAG-Bob": "smith",
        "TAG-Bob$ge": "",

        "ThisIsSillypantsValue" : "should be delted",

        "buckets": {
            "a$b": "AB",
            "c$d": "cd",
            "bucket-a$b": "ab"
        }
    },

    "spec": {
        "TAG-*$*": "",
        "TAG-*#*": "",

        "*pants*" : "",

        "buckets": {
            "a$*": ""
        }
    },

    "expected": {
        "TAG-Sharpness": "Sharpness",

        "TAG-Bob": "smith",

        "buckets": {
            "c$d": "cd",
            "bucket-a$b": "ab"
        }
    }
}

Right now Removr is just a simple recursive parallel tree walk which does not handle JSON arrays.

For the purposes of this use case, it does not need to handle JSON arrays.

The idea here is to parse the Removr spec into RemovrSpec nodes.
A RemovrSpecNode would have two lists of children, Literal and Computed (following the pattern from CompositeShiftrSpec). If the RemovrSpecNode has no children, then it can just remove if it matches.

If a RemovrSpecNode just has literal children, then it can just loop over the list of LiteralChildren and see if their keys are in the input.

If a RemovrSpecNode has both Literal and ComputedChildren, then it should evaluate the Literal children first, then the Computed ones.

Should be able to reuse the StarPathElement class, to help with the pattern matching.
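A simplified sketch of the matching and removal, assuming '*' matches any run of characters (including none) and every other character is literal. It uses `java.util.regex` instead of StarPathElement and skips the Literal/Computed split, so treat it as an illustration of the expected behaviour rather than the proposed design; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.regex.Pattern;

class WildcardRemoveSketch {

    // Converts a spec key such as "TAG-*$*" into a regex where only '*' is special.
    static Pattern toPattern(String specKey) {
        String[] literals = specKey.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Walks spec and input in parallel. A spec value of "" removes the matching
    // key; a nested spec map recurses into the matching child map.
    @SuppressWarnings("unchecked")
    static void remove(Map<String, Object> input, Map<String, Object> spec) {
        for (Map.Entry<String, Object> rule : spec.entrySet()) {
            Pattern pattern = toPattern(rule.getKey());
            // Iterate over a copy of the keys so removal is safe.
            for (String key : new ArrayList<>(input.keySet())) {
                if (!pattern.matcher(key).matches()) {
                    continue;
                }
                Object specValue = rule.getValue();
                Object child = input.get(key);
                if (specValue instanceof Map && child instanceof Map) {
                    remove((Map<String, Object>) child, (Map<String, Object>) specValue);
                } else if ("".equals(specValue)) {
                    input.remove(key);
                }
            }
        }
    }
}
```

Run against the unit test above, "a$*" removes "a$b" but leaves "bucket-a$b", because the pattern must match the whole key.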

how do you do an array implode shift spec

input

        "food_pairing": [{
            "food_id": "7",
            "food_name": "Salad"
        }, {
            "food_id": "12",
            "food_name": "Fish"
        }, {
            "food_id": "13",
            "food_name": "Shellfish"
        }, {
            "food_id": "19",
            "food_name": "Vegetarian"
        }]

output

        "foods": "Salad, Fish, Shellfish, Vegetarian"

Can jolt achieve this transformation?
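Whether a shift spec alone can do this is the open question here. As a fallback, the join is easy to do in plain Java after (or instead of) the transform; the sketch below uses hypothetical names and assumes the array has already been parsed into a list of maps.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class ImplodeSketch {

    // Picks one field from every element and joins the values with ", ".
    static String implode(List<Map<String, String>> items, String field) {
        return items.stream()
                .map(item -> item.get(field))
                .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        List<Map<String, String>> foodPairing = List.of(
                Map.of("food_id", "7", "food_name", "Salad"),
                Map.of("food_id", "12", "food_name", "Fish"),
                Map.of("food_id", "13", "food_name", "Shellfish"),
                Map.of("food_id", "19", "food_name", "Vegetarian"));
        // prints "Salad, Fish, Shellfish, Vegetarian"
        System.out.println(implode(foodPairing, "food_name"));
    }
}
```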

"Getting Started" java code breaks

Jolt is an awesomely great idea, and the java/json community should be grateful that Milo and Sam implemented it.

Unfortunately, the start-up example is difficult to understand (it would have made sense to use one of the prototypical music discography or book catalog examples).

More importantly, the java code in "Getting Started" has a bug. Specifically, in the java code at:

Chainr chainr = new Chainr( chainrSpecJSON );

I get:

Exception in thread "main" java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to com.bazaarvoice.jolt.JoltTransform

ArrayOrderObliviousDiffy has no tests

They were all in the parent project when Jolt was extracted.

Continuous Integration and Realtime Code Coverage Stats

The Jolt project should use TravisCI (https://travis-ci.org/) or some other tool for continuous integration testing and real-time code coverage stats in the readme.

Create key/value pair from 2 values

I am trying to convert this json:

{
    "id": "1234",
    "classification": "aclass",
    "condition": "poor",
    "links": [
        {
            "name": "self",
            "href": "https://selfurl.com"
        },
        {
            "name": "events",
            "href": "https://eventurl.com"
        }
    ],
    "eventList": {
        "totalCount": 2,
        "criticalCount": 1,
        "majorCount": 1,
        "minorCount": 2,
        "okCount": 0,
        "informationalCount": 0,
        "events": [
            {
                "id": "event1",
                "elementId": "element1",
                "requestedFields": [
                    {
                        "name": "Severity",
                        "value": "Fatal"
                    },
                    {
                        "name": "status_url",
                        "value": "https://statusurl.com"
                    }
                ]
            },
            {
                "id": "event2",
                "elementId": "element2",
                "requestedFields": [
                    {
                        "name": "Severity",
                        "value": "Medium"
                    },
                    {
                        "name": "status_url",
                        "value": "https://statusurl.com"
                    }
                ]
            }
        ]
    }
}

with this spec:

[
    {
        "operation": "shift",
        "spec": {
            "id": "summary.id",
            "classification": "summary.classification",
            "condition": "summary.condition",
            "eventList": {
                "informationalCount": "summary.informationcount",
                "majorCount": "summary.majorcount",
                "minorCount": "summary.minorcount",
                "okCount": "summary.okcount",
                "totalCount": "summary.totalcount",
                "events": {
                    "0": {
                        "*": {
                            "$": "headers[]",
                            "@": "rows[]"
                        }
                    },
                    "*": {
                        "*": {
                            "@": "rows[]"
                        }
                    }
                }
            }
        }
    }
]

which produces:

{
  "headers" : [ "id", "elementId", "requestedFields" ],
  "rows" : [ "event1", "element1", [ {
    "name" : "Severity",
    "value" : "Fatal"
  }, {
    "name" : "status_url",
    "value" : "https://statusurl.com"
  } ], "event2", "element2", [ {
    "name" : "Severity",
    "value" : "Medium"
  }, {
    "name" : "status_url",
    "value" : "https://statusurl.com"
  } ] ],
  "summary" : {
    "classification" : "aclass",
    "condition" : "poor",
    "id" : "1234",
    "informationcount" : 0,
    "majorcount" : 1,
    "minorcount" : 2,
    "okcount" : 0,
    "totalcount" : 2
  }
}

I want to make 'rows' an array of arrays where each event is stored in a nested array. I would also like to flatten the events and move any nested elements up a level. To do this the 'name' and 'value' values would have to become key/value pairs in the parent (event). So the output would be:

{
  "headers" : [ "id", "elementId", "requestedFields" ],
  "rows" : [ ["event1", "element1", "Severity" : "Fatal", "status_url" : "https://statusurl.com"], ["event2", "element2", "Severity" : "Medium", "status_url" : "https://statusurl.com" ] ],
  "summary" : {
    "classification" : "aclass",
    "condition" : "poor",
    "id" : "1234",
    "informationcount" : 0,
    "majorcount" : 1,
    "minorcount" : 2,
    "okcount" : 0,
    "totalcount" : 2
  }
}

Is this possible?

Thanks,

Rob

List<Object> list = Cardinality.many( Object input );
and
Object obj = Cardinality.one( Object input );
