bazaarvoice / jolt
JSON to JSON transformation library written in Java.
License: Apache License 2.0
Dunno how brew handles / can handle Java as a dependency.
I have a date as an object in my input JSON. I want to store it as a string in my output JSON.
From what I understand, Jolt doesn't do any type conversions. But I am wondering whether it is possible at all?
The Getting Started wiki page documents that jolt-core only depends on "apache.commons". It should be noted that it's commons-lang. A more important issue is that javax.inject is also a dependency, so either the docs should be updated or the javax.inject dependency removed.
I'm a fumble-fingered fugitive from typing class who couldn't data-enter a JSON file correctly if my life depended on it. So I really appreciate the fix you made in 0.13 that identifies row and column for JSON format mistakes. Unfortunately, even if I get the JSON format correct, I can still make Jolt DSL mistakes and then have no clue on where I did something wrong. For example, if I have a JSON-legal leaf that reads:
{ "value": "&$3.DollarsEarned" }
Then Jolt gives me the error:
DotNotation (write key) can not contain '@', '*', or '$'.
This is nice, but it does not give me row and column. I know, the location information is probably lost when the json file was sucked into a Chainr object, but there are ways to preserve it (as attributes called joltRow and joltColumn if nothing else; I have modified XML parsers to do the same thing). At any rate, without too much work, I changed line 46 of ShiftrWriter.java to read:
throw new SpecException("DotNotation (write key) can not contain '@', '*', or '$' at " + dotNotation + ".");
Now, the error output gives me some context:
DotNotation (write key) can not contain '@', '*', or '$' at root.&$3.DollarsEarned.
Giving context and location in every SpecException would be nice. The problem is that there are lots of them (which is a great thing, BTW). I'd change them all myself, except:
So all I can do is ask.
Pretty please.
Concierge, and generally other server side Java deployments of Jolt, will pretty much grab all three of those artifacts, so let's just package them all up.
removeRecursive should do a contains, then a remove.
I need some assistance.
Input
{
"entities":[
{
"type":"alpha",
"data":"foo"
},
{
"type":"beta",
"data":"bar"
},
{
"type":"alpha",
"data":"baz"
}
]
}
Desired Output
{
"alpha":[
{
"type":"alpha",
"data":"foo"
},
{
"type":"alpha",
"data":"baz"
}
],
"beta":[
{
"type":"beta",
"data":"bar"
}
]
}
Using this shiftr spec
[
{
"operation": "shift",
"spec": {
"entities": {
"*": {
"type": {
"*": {
"$2": "&1.[]"
}
}
}
}
}
}
]
I get...
{
"alpha":[
"0",
"2"
],
"beta":[
"1"
]
}
Is there a way to insert the value object found at that index, as opposed to the index? Something like "entities[$2]": "&1.[]"
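For reference, the grouping asked for above can be written in plain Java. This is only a sketch of the desired result, not a Jolt spec and not part of Jolt's API; the class name GroupByField and the entity helper are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of Jolt.
final class GroupByField {

    // Places each entity under the value of its `field` key, keeping input order.
    static Map<String, List<Map<String, Object>>> groupBy(
            List<Map<String, Object>> entities, String field) {
        Map<String, List<Map<String, Object>>> grouped = new LinkedHashMap<>();
        for (Map<String, Object> entity : entities) {
            String key = String.valueOf(entity.get(field));
            grouped.computeIfAbsent(key, k -> new ArrayList<>()).add(entity);
        }
        return grouped;
    }

    // Builds one sample entity shaped like the input above.
    static Map<String, Object> entity(String type, String data) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("type", type);
        m.put("data", data);
        return m;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> entities = new ArrayList<>();
        entities.add(entity("alpha", "foo"));
        entities.add(entity("beta", "bar"));
        entities.add(entity("alpha", "baz"));
        System.out.println(groupBy(entities, "type"));
    }
}
```

The whole entity object is added to the group, rather than its index, which is the difference from the output shown above.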
Given two identical JSON documents with differently ordered sub-arrays, Jolt does not identify them as identical.
jolt diffy -a expected.json actual.json
expected.json
{
"TagDistributionOrder" : [ "ProsGames", "ConsGames" ],
"TagDistribution" : {
"ProsGames" : {
"Id" : "ProsGames",
"Label" : "ProsGames",
"Values" : [ {
"Value" : "Can Withstand Use",
"Count" : 3
}, {
"Value" : "Interactive",
"Count" : 3
}, {
"Value" : "Thought Provoking",
"Count" : 3
}, {
"Value" : "Entertaining",
"Count" : 3
}, {
"Value" : "Fun",
"Count" : 2
}, {
"Value" : "Easy To Play",
"Count" : 2
}, {
"Value" : "Educational",
"Count" : 1
} ]
},
"ConsGames" : {
"Id" : "ConsGames",
"Label" : "ConsGames",
"Values" : [ {
"Value" : "Visually Unpleasing",
"Count" : 3
}, {
"Value" : "Difficult Instructions",
"Count" : 2
}, {
"Value" : "Unoriginal",
"Count" : 1
}, {
"Value" : "Boring",
"Count" : 2
}, {
"Value" : "Poor Quality",
"Count" : 2
} ]
}
}
}
actual.json
{
"TagDistributionOrder" : [ "ConsGames", "ProsGames" ],
"TagDistribution" : {
"ConsGames" : {
"Id" : "ConsGames",
"Label" : "ConsGames",
"Values" : [ {
"Value" : "Visually Unpleasing",
"Count" : 3
}, {
"Value" : "Poor Quality",
"Count" : 2
}, {
"Value" : "Difficult Instructions",
"Count" : 2
}, {
"Value" : "Boring",
"Count" : 2
}, {
"Value" : "Unoriginal",
"Count" : 1
} ]
},
"ProsGames" : {
"Id" : "ProsGames",
"Label" : "ProsGames",
"Values" : [ {
"Value" : "Thought Provoking",
"Count" : 3
}, {
"Value" : "Interactive",
"Count" : 3
}, {
"Value" : "Entertaining",
"Count" : 3
}, {
"Value" : "Can Withstand Use",
"Count" : 3
}, {
"Value" : "Fun",
"Count" : 2
}, {
"Value" : "Easy To Play",
"Count" : 2
}, {
"Value" : "Educational",
"Count" : 1
} ]
}
}
}
Erroneous Output
Differences found. Input #1 contained this:
{
"TagDistribution" : {
"ConsGames" : {
"Values" : [ {
"Value" : "Difficult Instructions"
}, {
"Value" : "Poor Quality",
"Count" : 2
} ]
}
}
}
Input #2 contained this:
{
"TagDistribution" : {
"ConsGames" : {
"Values" : [ {
"Value" : "Poor Quality"
}, {
"Value" : "Difficult Instructions",
"Count" : 2
} ]
}
}
}
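To make the expected behaviour concrete, here is a minimal sketch of an order-oblivious comparison over parsed JSON (Maps, Lists and scalars). It is not Diffy's implementation; the class name and the greedy pairing strategy are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not Diffy's actual algorithm.
final class OrderObliviousEquals {

    // Returns true when a and b hold the same data, ignoring array order.
    static boolean sameJson(Object a, Object b) {
        if (a instanceof Map && b instanceof Map) {
            Map<?, ?> ma = (Map<?, ?>) a;
            Map<?, ?> mb = (Map<?, ?>) b;
            if (!ma.keySet().equals(mb.keySet())) {
                return false;
            }
            for (Object key : ma.keySet()) {
                if (!sameJson(ma.get(key), mb.get(key))) {
                    return false;
                }
            }
            return true;
        }
        if (a instanceof List && b instanceof List) {
            List<?> la = (List<?>) a;
            List<Object> remaining = new ArrayList<>((List<?>) b);
            if (la.size() != remaining.size()) {
                return false;
            }
            // Pair every element of a with one unused, equal element of b.
            for (Object item : la) {
                int match = -1;
                for (int i = 0; i < remaining.size(); i++) {
                    if (sameJson(item, remaining.get(i))) {
                        match = i;
                        break;
                    }
                }
                if (match < 0) {
                    return false;
                }
                remaining.remove(match); // removes by index
            }
            return true;
        }
        return a == null ? b == null : a.equals(b);
    }
}
```

Whole array elements are compared before any of them is reported as different, so two "Values" entries that merely swapped positions would pair up. The pairing is quadratic in the array length, which is fine for a sketch but may matter for large documents.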
Currently Jolt can transform Arrays into Maps, but it can't do the opposite of transforming a Map into an Array.
The idea is to add an implicit counter that can be used to "auto generate" array indices.
The usecase is hokey, in that Map to Array transform will not have a guaranteed order, but sometimes there doesn't need to be an order.
Its inception use case of counting the number of '*' matches works, but it does it by rolling all the '*' hits up to the parent.
It does not work if you have prefixed "stars", aka "rating-*" and "cdv-*". In that case you want the number of matches against those individual specs, and not the parent.
Test case here : https://github.com/milosimpson/jolt/blob/bb08587844c4a9319f76a3645e503b1672537ae6/jolt-core/src/test/resources/json/shiftr/prefixedStarsToLists.json
Hi, I'm a new user of Jolt. First off, thanks for a very useful and well put together library!
I am writing a new transform and the JSON using it will be created/modified by semi-technical users of our product. I'd rather not have them enter the fully qualified classname for this custom transform.
Currently STOCK_TRANSFORMS is unmodifiable and final, so I don't think there is a way to register a new one.
I may be able to make the changes if you think this is useful and have an idea how it should best be done...
Thanks, Alfie.
You can use * logic to apply defaults to children object, but if you do so it "forces" the parent object to be created.
If we added "@" to Defaultr we could control that.
The normal diff utility supports the use of file descriptors as inputs, but this doesn't work with Jolt Diffy:
Old school diff:
diff <(cat myfile) <(some-script.sh)
Diffy:
jolt diffy <(curl http://www.example.com/file.json) <(some-script-that-outputs-json.sh)
If you have at least one file, @milosimpson and @snkinard noted that stdin can be used to diff against a file as a workaround, although this syntax feels a little awkward:
curl -s http://www.example.com/file.json | jolt diffy some-file.json
Of course, it still doesn't work for the case where you want to use two file descriptors.
Hi Milo, guys,
I would like to help you with developing/improving JOLT as it looks very promising! Before I start submitting any big changes I would like to discuss them with you and get general approval for my ideas. So first a couple of questions about existing design. Here goes the first one:
Why do Transform.transform() and similar methods take and return Object? Did you consider using Jackson's JsonNode, etc. instead?
It looks like JOLT is already using Jackson, and keeping and maintaining a 'hydrated' object only makes the library code less readable and harder to understand for a newcomer. What do you think about using Jackson's 'native' interfaces?
ChainrSpecs are defined using JSON files. Currently, a Jolt user would have to rewrite the boilerplate to read from a JSON file into a ChainrSpec for every project where it is used. A factory method that can abstract this away would be quite convenient.
When using a shortened version of a sub command, the CLI appears to be running in 'mode'. ex:
jolt diffy diff3.json diff4.json
The above writes output to standard out. The below does not:
jolt diff diff3.json diff4.json
The two commands should behave the same or shortened sub commands should not be considered valid by the argument parser
This would be useful for implementing "backwards compatibility" of JSON data APIs.
Example, current code generates data in "1.5" format, but you need to be able to transform it "down" to 1.4, and 1.3 for existing clients.
Assuming that the changes between versions are mostly "minor", the idea would be to write Jolt transforms for 1.5 -> 1.4 and 1.4 -> 1.3.
In that case, Shiftr is not your friend, as he will only pass thru data that is explicitly in his spec. We need a new transform for the usecase of "I just want to adjust / nudge a little bit of my input JSON."
Hello,
Can you help me with my problem with transformation of input json file like this:
[
{
"a": "t1",
"b": "t2",
"c": "t3",
"d": "t5",
"e": "t8"
},
{
"a": "t1",
"b": "t2",
"c": "t3",
"d": "t6",
"e": "t9"
},
{
"a": "t1",
"b": "t2",
"c": "t4",
"d": "t7",
"e": "t10"
}
]
I need to get this json file:
{
"a": "t1",
"b": "t2",
"c": {
"t3": [
{
"d": "t5",
"e": "t8"
},
{
"d": "t6",
"e": "t9"
}
],
"t4": [
{
"d": "t7",
"e": "t10"
}
]
}
}
I don't understand how I can do this aggregation with JOLT.
I tried to create spec like this:
[
{
"operation": "shift",
"spec": {
"*": {
"a": "a",
"b": "b",
"c": {
"*": {
"$": "c"
}
}
}
}
}
]
And result is:
{
"a" : [ "t1", "t1", "t1" ],
"b" : [ "t2", "t2", "t2" ],
"c" : [ "t3", "t3", "t4" ]
}
Questions:
or maybe this problem cannot be resolved using the latest (0.0.12) version of JOLT?
Jolt should have a suite of command line tools, so one can utilize diffy and the various transforms at the command line
Something like a little ElasticBeanstalk or Heroku thing that can receive json run the transform and return.
Needs a pretty web page with reasonable examples pre-populated.
From @shawnsmith:
It's really common to forget to keep license headers up-to-date as code evolves. So when you're adding them to all the files I strongly suggest you look into something like http://creadur.apache.org/rat/ to fail the build when license headers are omitted. For example:
<build>
<plugins>
<plugin>
<groupId>org.apache.rat</groupId>
<artifactId>apache-rat-plugin</artifactId>
<version>0.9</version>
<configuration>
<excludes>
<exclude>.git/**</exclude>
<exclude>.gitignore</exclude>
<exclude>**/resources/*.txt</exclude>
</excludes>
</configuration>
<executions>
<execution>
<id>rat-check</id>
<phase>test</phase>
<goals>
<goal>check</goal>
</goals>
</execution>
</executions>
</plugin>
...
See also http://creadur.apache.org/rat/apache-rat-plugin/examples/custom-license.html
Hey,
do we have an option to "force" (e.g. fail where there is no such field) existence of a field that is being shifted to the other field? It would be very useful. If there's no such feature what do you think about adding it?
Aka what happens if two different places in the spec try to write to RHS ""?
Another design idea. I would like to introduce dependency on Guava in JOLT source tree (it's already there for tests) and start using Preconditions for validating arguments. Any concerns?
One of the changes would be to start throwing NPE instead of IAE exceptions for null method arguments.
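A minimal sketch of what that change in exception type looks like for callers. To keep it dependency-free, it uses java.util.Objects.requireNonNull and a hand-written range check as stand-ins for Guava's Preconditions.checkNotNull and Preconditions.checkArgument; the class and method names are made up for illustration.

```java
import java.util.Objects;

// Hypothetical example class, not taken from the Jolt source tree.
final class ArgumentChecks {

    // Null argument: NullPointerException, as Preconditions.checkNotNull would throw.
    static String describe(Object spec) {
        Objects.requireNonNull(spec, "spec must not be null");
        return "spec of type " + spec.getClass().getSimpleName();
    }

    // Non-null but invalid argument: IllegalArgumentException,
    // as Preconditions.checkArgument would throw.
    static int checkedIndex(int index) {
        if (index < 0) {
            throw new IllegalArgumentException("index must be non-negative: " + index);
        }
        return index;
    }
}
```

Any caller that currently catches IllegalArgumentException for null arguments would need to be updated, which is probably the main compatibility concern with this proposal.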
Hi
The example at https://github.com/bazaarvoice/jolt/blob/master/gettingStarted.md is not working with version 0.0.10. I am getting "JoltBootStrap.class not found".
Example / Unit Test :
{
"input": {
"TAG-Sharpness$fr": "nettete",
"TAG-Sharpness#fr_fr": "nettete",
"TAG-Sharpness": "Sharpness",
"TAG-Bob": "smith",
"TAG-Bob$ge": "",
"ThisIsSillypantsValue" : "should be delted",
"buckets": {
"a$b": "AB",
"c$d": "cd",
"bucket-a$b": "ab"
}
},
"spec": {
"TAG-*$*": "",
"TAG-*#*": "",
"*pants*" : "",
"buckets": {
"a$*": ""
}
},
"expected": {
"TAG-Sharpness": "Sharpness",
"TAG-Bob": "smith",
"buckets": {
"c$d": "cd",
"bucket-a$b": "ab"
}
}
}
Right now Removr is just a simple recursive parallel tree walk which does not handle JSON arrays.
For the purposes of this use case, it does not need to handle JSON arrays.
The idea here is to parse the Removr spec into RemovrSpec nodes.
A RemovrSpecNode would have two lists of children, Literal and Computed (following the pattern from CompositeShiftrSpec). If the RemovrSpecNode has no children, then it can just remove if it matches.
If a RemovrSpecNode just has literal children, then it can just loop over the list of LiteralChildren and see if their keys are in the input.
If a RemovrSpecNode has both Literal and ComputedChildren, then it should evaluate the Literal children first, then the Computed ones.
Should be able to reuse the StarPathElement class, to help with the pattern matching.
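A rough, simplified sketch of the wildcard matching described above, in plain Java. It does not use StarPathElement or a RemovrSpecNode tree, it treats every spec key as a pattern instead of splitting literal and computed children, and it assumes '*' may match any run of characters. The class name is made up.

```java
import java.util.ArrayList;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch, not the proposed Removr implementation.
final class WildcardRemover {

    // Turns a spec key such as "TAG-*$*" into a regex: '*' becomes ".*",
    // every other character is matched literally.
    static Pattern toPattern(String specKey) {
        String[] literals = specKey.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    // Removes matching keys from input in place. A spec value that is a Map
    // means "recurse into the matching child"; anything else means "remove".
    @SuppressWarnings("unchecked")
    static void remove(Map<String, Object> spec, Map<String, Object> input) {
        for (Map.Entry<String, Object> rule : spec.entrySet()) {
            Pattern pattern = toPattern(rule.getKey());
            // Iterate over a copy of the keys so entries can be removed safely.
            for (String key : new ArrayList<>(input.keySet())) {
                if (!pattern.matcher(key).matches()) {
                    continue;
                }
                Object child = input.get(key);
                if (rule.getValue() instanceof Map) {
                    if (child instanceof Map) {
                        remove((Map<String, Object>) rule.getValue(),
                               (Map<String, Object>) child);
                    }
                } else {
                    input.remove(key);
                }
            }
        }
    }
}
```

Run against the unit test above, this keeps "TAG-Sharpness", "TAG-Bob" and the "c$d" and "bucket-a$b" buckets. A real implementation would precompile the patterns once when the spec is parsed rather than on every call.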
input
"food_pairing": [{
"food_id": "7",
"food_name": "Salad"
}, {
"food_id": "12",
"food_name": "Fish"
}, {
"food_id": "13",
"food_name": "Shellfish"
}, {
"food_id": "19",
"food_name": "Vegetarian"
}]
output
"foods": "Salad, Fish, Shellfish, Vegetarian"
Can jolt achieve this transformation?
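Whether a Jolt spec can do the join is the open question here. As a fallback, the concatenation is straightforward as a plain-Java step on the parsed input. The sketch below assumes the "food_pairing" array has been parsed into a List of Maps; the class name is made up.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical post-processing helper, not a Jolt transform.
final class FoodNames {

    // Joins every "food_name" value with ", " in input order.
    static String joinNames(List<Map<String, String>> foodPairing) {
        return foodPairing.stream()
                .map(food -> food.get("food_name"))
                .collect(Collectors.joining(", "));
    }
}
```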
Jolt is an awesomely great idea, and the java/json community should be grateful that Milo and Sam implemented it.
Unfortunately, the start-up example is difficult to understand (it would have made sense to use one of the prototypical music discography or book catalog examples).
More importantly, the java code in "Getting Started" has a bug. Specifically, in the java code at:
Chainr chainr = new Chainr( chainrSpecJSON );
I get:
Exception in thread "main" java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to com.bazaarvoice.jolt.JoltTransform
They were all in the parent project when Jolt was extracted.
The Jolt project should use TravisCI (https://travis-ci.org/) or some other tool for continuous integration testing and real time code coverage stats in the readme.
I am trying to convert this json:
{
"id": "1234",
"classification": "aclass",
"condition": "poor",
"links": [
{
"name": "self",
"href": "https://selfurl.com"
},
{
"name": "events",
"href": "https://eventurl.com"
}
],
"eventList": {
"totalCount": 2,
"criticalCount": 1,
"majorCount": 1,
"minorCount": 2,
"okCount": 0,
"informationalCount": 0,
"events": [
{
"id": "event1",
"elementId": "element1",
"requestedFields": [
{
"name": "Severity",
"value": "Fatal"
},
{
"name": "status_url",
"value": "https://statusurl.com"
}
]
},
{
"id": "event2",
"elementId": "element2",
"requestedFields": [
{
"name": "Severity",
"value": "Medium"
},
{
"name": "status_url",
"value": "https://statusurl.com"
}
]
}
]
}
}
with this spec:
[
{
"operation": "shift",
"spec": {
"id": "summary.id",
"classification": "summary.classification",
"condition": "summary.condition",
"eventList": {
"informationalCount": "summary.informationcount",
"majorCount": "summary.majorcount",
"minorCount": "summary.minorcount",
"okCount": "summary.okcount",
"totalCount": "summary.totalcount",
"events": {
"0": {
"*": {
"$": "headers[]",
"@": "rows[]"
}
},
"*": {
"*": {
"@": "rows[]"
}
}
}
}
}
}
]
which produces:
{
"headers" : [ "id", "elementId", "requestedFields" ],
"rows" : [ "event1", "element1", [ {
"name" : "Severity",
"value" : "Fatal"
}, {
"name" : "status_url",
"value" : "https://statusurl.com"
} ], "event2", "element2", [ {
"name" : "Severity",
"value" : "Medium"
}, {
"name" : "status_url",
"value" : "https://statusurl.com"
} ] ],
"summary" : {
"classification" : "aclass",
"condition" : "poor",
"id" : "1234",
"informationcount" : 0,
"majorcount" : 1,
"minorcount" : 2,
"okcount" : 0,
"totalcount" : 2
}
}
I want to make 'rows' an array of arrays where each event is stored in a nested array. I would also like to flatten the events and move any nested elements up a level. To do this the 'name' and 'value' values would have to become key/value pairs in the parent (event). So the output would be:
{
"headers" : [ "id", "elementId", "requestedFields" ],
"rows" : [ ["event1", "element1", "Severity" : "Fatal", "status_url" : "https://statusurl.com"], ["event2", "element2", "Severity" : "Medium", "status_url" : "https://statusurl.com" ] ],
"summary" : {
"classification" : "aclass",
"condition" : "poor",
"id" : "1234",
"informationcount" : 0,
"majorcount" : 1,
"minorcount" : 2,
"okcount" : 0,
"totalcount" : 2
}
}
Is this possible?
Thanks,
Rob
Remove the dependency on Apache Commons StringUtils.
Make a copy of the StringUtils class like ElasticSearch does.
ChainrFactory.fromClassPath is nice, but it only loads the spec. There is a lot of test code that is loading "input" and "expected". Would be nice if JsonUtils had the same "fromFile" and "fromClassPath" constructs.
For documentation purposes, it would be very handy to have comments in the JSON test fixtures.
First thing custom Java transforms tend to do is :
if ( ! ( obj instanceof Map ) )
Either provide a base class that does this for you, or change the interface to have typed transform methods, with Chainr doing the instanceof check and calling the appropriate method.
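A minimal sketch of the base-class option. The names TypedTransform, transformMap, transformList and KeyCounter are hypothetical and not part of Jolt; the untyped transform( Object ) entry point is assumed to mirror the existing interface.

```java
import java.util.List;
import java.util.Map;

// Hypothetical base class: does the instanceof check once and dispatches.
abstract class TypedTransform {

    public final Object transform(Object input) {
        if (input instanceof Map) {
            @SuppressWarnings("unchecked")
            Map<String, Object> map = (Map<String, Object>) input;
            return transformMap(map);
        }
        if (input instanceof List) {
            @SuppressWarnings("unchecked")
            List<Object> list = (List<Object>) input;
            return transformList(list);
        }
        return input; // scalars and null pass through untouched
    }

    // Subclasses override only the shapes they care about.
    protected Object transformMap(Map<String, Object> input) {
        return input;
    }

    protected Object transformList(List<Object> input) {
        return input;
    }
}

// Example subclass: replaces a Map with its number of keys.
final class KeyCounter extends TypedTransform {
    @Override
    protected Object transformMap(Map<String, Object> input) {
        return input.size();
    }
}
```

With default pass-through methods, a custom transform that only understands Maps no longer needs its own type check, and inputs of other shapes are returned unchanged instead of failing.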
It should just Transform what it can.
A Jolt CLI tool should be created. The Diffy CLI should become a sub command of this tool. Argparse4j should be able to facilitate this relatively easily with SubParsers.
A sub command for sortr should be implemented as well, with the option to pretty-print the output.
Reuse the Shiftr tree walking infrastructure (like Cardinality does), but instead of writing data somewhere:
a) run a specified Java class and / or
b) an mvel script
We are familiar with mvel from ElasticSearch scripting. It can be "compiled" once and run many times, which fit with the Jolt paradigm.
Something like
List<Object> list = Cardinality.many( input );
and
Object obj = Cardinality.one( input );
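A sketch of what such helpers could look like. The semantics are assumptions: "many" wraps a non-list value in a list and maps null to an empty list, and "one" takes the first element of a list. The class is named CardinalitySketch to make clear it is not the existing Cardinality transform.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical utility, not part of Jolt.
final class CardinalitySketch {

    // Always returns a list: a copy of a List input, a one-element list
    // for any other value, or an empty list for null.
    static List<Object> many(Object input) {
        if (input == null) {
            return new ArrayList<>();
        }
        if (input instanceof List) {
            return new ArrayList<>((List<?>) input);
        }
        List<Object> out = new ArrayList<>();
        out.add(input);
        return out;
    }

    // Always returns a single value: the first element of a List input
    // (null if the list is empty), or the input itself otherwise.
    static Object one(Object input) {
        if (input instanceof List) {
            List<?> list = (List<?>) input;
            return list.isEmpty() ? null : list.get(0);
        }
        return input;
    }
}
```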
Sometimes you want all the arrays to be ArrayOrderOblivious, except for a special few.
Sometimes you want all the arrays to be Order correct, except for a special few.