
jolt's Issues

Changing date to string

I have a date as an object in my input JSON. I want to store it as a string in my output JSON.
From what I understand, Jolt doesn't do any type conversions. Is this possible at all?

Spec errors should have location and context

I'm a fumble-fingered fugitive from typing class who couldn't data-enter a JSON file correctly if my life depended on it. So I really appreciate the fix you made in 0.13 that identifies row and column for JSON format mistakes. Unfortunately, even if I get the JSON format correct, I can still make Jolt DSL mistakes and then have no clue on where I did something wrong. For example, if I have a JSON-legal leaf that reads:

{ "value": "&$3.DollarsEarned" }

Then Jolt gives me the error:

DotNotation (write key) can not contain '@', '*', or '$'.

This is nice, but it does not give me row and column. I know, the location information is probably lost when the json file was sucked into a Chainr object, but there are ways to preserve it (as attributes called joltRow and joltColumn if nothing else; I have modified XML parsers to do the same thing). At any rate, without too much work, I changed line 46 of ShiftrWriter.java to read:

throw new SpecException("DotNotation (write key) can not contain '@', '*', or '$' at " + dotNotation + ".");

Now, the error output gives me some context:

DotNotation (write key) can not contain '@', '*', or '$' at root.&$3.DollarsEarned.
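For illustration, a minimal sketch of the same idea applied consistently: an exception type whose constructor requires the offending spec path, so no call site can omit it. `ContextSpecException` and `WriteKeyValidator` are hypothetical names made up for this sketch, not Jolt's actual classes, and the check only mirrors the rule stated in the error message quoted above.

```java
/** Hypothetical stand-in for Jolt's SpecException that always records where the problem is. */
class ContextSpecException extends RuntimeException {
    ContextSpecException(String problem, String specPath) {
        super(problem + " at " + specPath + ".");
    }
}

final class WriteKeyValidator {
    /** Rejects write keys containing '@', '*', or '$', as in the error quoted above. */
    static void validateWriteKey(String dotNotation) {
        for (char c : dotNotation.toCharArray()) {
            if ("@*$".indexOf(c) >= 0) {
                throw new ContextSpecException(
                        "DotNotation (write key) can not contain '@', '*', or '$'", dotNotation);
            }
        }
    }
}
```

Calling `WriteKeyValidator.validateWriteKey("root.&$3.DollarsEarned")` throws with the path appended to the message. Row and column would need extra bookkeeping at parse time, which this sketch does not attempt.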

Giving context and location in every SpecException would be nice. The problem is that there are lots of them (which is a great thing, BTW). I'd change them all myself, except:

  1. I might break something,
  2. I'm not familiar enough with the code to know exactly how to get the location,
  3. Some people might not like to see more error information (IMHO, they are silly), and
  4. Upper management might not approve of my giving away source code.

So all I can do is ask.

Pretty please.

LHS parent value reference

I need some assistance.

Input

{
   "entities":[
      {
         "type":"alpha",
         "data":"foo"
      },
      {
         "type":"beta",
         "data":"bar"
      },
      {
         "type":"alpha",
         "data":"baz"
      }
   ]
}

Desired Output

{
   "alpha":[
      {
         "type":"alpha",
         "data":"foo"
      },
      {
         "type":"alpha",
         "data":"baz"
      }
   ],
   "beta":[
      {
         "type":"beta",
         "data":"bar"
      }
   ]
}

Using this shiftr spec

[
    {
        "operation": "shift",
        "spec": {
            "entities": {
                "*": {
                    "type": {
                        "*": {
                            "$2": "&1.[]"
                        }
                    }
                }
            }
        }
    }
]

I get...

{
   "alpha":[
      "0",
      "2"
   ],
   "beta":[
      "1"
   ]
}

Is there a way to insert the value object found at that index, as opposed to the index itself? Something like "entities[$2]": "&1.[]"
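To pin down the desired result independently of Shiftr syntax, here is a minimal plain-Java sketch of the grouping being asked for. It is not a Jolt spec and does not use Jolt; `GroupEntitiesByType` is a hypothetical name, and the input is assumed to be already parsed into a list of maps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GroupEntitiesByType {
    /** Groups each whole entity under the value of its "type" field, preserving input order. */
    static Map<String, List<Map<String, Object>>> groupByType(List<Map<String, Object>> entities) {
        Map<String, List<Map<String, Object>>> grouped = new LinkedHashMap<>();
        for (Map<String, Object> entity : entities) {
            String type = String.valueOf(entity.get("type"));
            // Append the entity itself (not its index) to the list for its type.
            grouped.computeIfAbsent(type, k -> new ArrayList<>()).add(entity);
        }
        return grouped;
    }
}
```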

ArrayOrderObliviousDiffy fails to identify identical json documents

Given two JSON documents that are identical except for the order of their sub-arrays, Jolt does not identify them as identical.

jolt diffy -a expected.json actual.json

expected.json

{
  "TagDistributionOrder" : [ "ProsGames", "ConsGames" ],
  "TagDistribution" : {
    "ProsGames" : {
      "Id" : "ProsGames",
      "Label" : "ProsGames",
      "Values" : [ {
        "Value" : "Can Withstand Use",
        "Count" : 3
      }, {
        "Value" : "Interactive",
        "Count" : 3
      }, {
        "Value" : "Thought Provoking",
        "Count" : 3
      }, {
        "Value" : "Entertaining",
        "Count" : 3
      }, {
        "Value" : "Fun",
        "Count" : 2
      }, {
        "Value" : "Easy To Play",
        "Count" : 2
      }, {
        "Value" : "Educational",
        "Count" : 1
      } ]
    },
    "ConsGames" : {
      "Id" : "ConsGames",
      "Label" : "ConsGames",
      "Values" : [ {
        "Value" : "Visually Unpleasing",
        "Count" : 3
      }, {
        "Value" : "Difficult Instructions",
        "Count" : 2
      }, {
        "Value" : "Unoriginal",
        "Count" : 1
      }, {
        "Value" : "Boring",
        "Count" : 2
      }, {
        "Value" : "Poor Quality",
        "Count" : 2
      } ]
    }
  }
}

actual.json

{
  "TagDistributionOrder" : [ "ConsGames", "ProsGames" ],
  "TagDistribution" : {
    "ConsGames" : {
      "Id" : "ConsGames",
      "Label" : "ConsGames",
      "Values" : [ {
        "Value" : "Visually Unpleasing",
        "Count" : 3
      }, {
        "Value" : "Poor Quality",
        "Count" : 2
      }, {
        "Value" : "Difficult Instructions",
        "Count" : 2
      }, {
        "Value" : "Boring",
        "Count" : 2
      }, {
        "Value" : "Unoriginal",
        "Count" : 1
      } ]
    },
    "ProsGames" : {
      "Id" : "ProsGames",
      "Label" : "ProsGames",
      "Values" : [ {
        "Value" : "Thought Provoking",
        "Count" : 3
      }, {
        "Value" : "Interactive",
        "Count" : 3
      }, {
        "Value" : "Entertaining",
        "Count" : 3
      }, {
        "Value" : "Can Withstand Use",
        "Count" : 3
      }, {
        "Value" : "Fun",
        "Count" : 2
      }, {
        "Value" : "Easy To Play",
        "Count" : 2
      }, {
        "Value" : "Educational",
        "Count" : 1
      } ]
    }
  }
}

Erroneous Output

Differences found. Input #1 contained this:
{
  "TagDistribution" : {
    "ConsGames" : {
      "Values" : [ {
        "Value" : "Difficult Instructions"
      }, {
        "Value" : "Poor Quality",
        "Count" : 2
      } ]
    }
  }
}
Input #2 contained this:
{
  "TagDistribution" : {
    "ConsGames" : {
      "Values" : [ {
        "Value" : "Poor Quality"
      }, {
        "Value" : "Difficult Instructions",
        "Count" : 2
      } ]
    }
  }
}
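For reference, a minimal sketch of the comparison one would expect here, assuming both documents are already parsed into nested `Map`/`List`/scalar values. This is not Jolt's ArrayOrderObliviousDiffy implementation; `OrderObliviousEquals` is a hypothetical name. Each list element must find an exact deep match among the other list's not-yet-matched elements.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;

final class OrderObliviousEquals {
    /** Deep equality over Map/List/scalar trees that ignores the order of list elements. */
    static boolean sameIgnoringArrayOrder(Object a, Object b) {
        if (a instanceof Map && b instanceof Map) {
            Map<?, ?> ma = (Map<?, ?>) a;
            Map<?, ?> mb = (Map<?, ?>) b;
            if (!ma.keySet().equals(mb.keySet())) {
                return false;
            }
            for (Map.Entry<?, ?> e : ma.entrySet()) {
                if (!sameIgnoringArrayOrder(e.getValue(), mb.get(e.getKey()))) {
                    return false;
                }
            }
            return true;
        }
        if (a instanceof List && b instanceof List) {
            List<?> la = (List<?>) a;
            List<Object> unmatched = new ArrayList<>((List<?>) b);
            if (la.size() != unmatched.size()) {
                return false;
            }
            // Pair each element of `a` with any still-unmatched, fully equal element of `b`.
            for (Object item : la) {
                int hit = -1;
                for (int i = 0; i < unmatched.size(); i++) {
                    if (sameIgnoringArrayOrder(item, unmatched.get(i))) {
                        hit = i;
                        break;
                    }
                }
                if (hit < 0) {
                    return false;
                }
                unmatched.remove(hit); // removes by index
            }
            return true;
        }
        return Objects.equals(a, b);
    }
}
```

Because the pairing requires a full deep match, "Poor Quality"/2 can only pair with "Poor Quality"/2, never with another entry that merely shares the same Count.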

Add '#' wildcard to support the hokey but useful Map to Array transform

Currently Jolt can transform Arrays into Maps, but it can't do the opposite of transforming a Map into an Array.

The idea is to add an implicit counter that can be used to "auto generate" array indices.

The usecase is hokey, in that Map to Array transform will not have a guaranteed order, but sometimes there doesn't need to be an order.
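A minimal plain-Java sketch of what the implicit counter amounts to, outside of any Shiftr syntax. `MapToArray` is a hypothetical name; the resulting order is whatever order the map happens to iterate in, which is exactly the caveat above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class MapToArray {
    /** Copies the map's values into a list, assigning indices from an implicit counter. */
    static <V> List<V> toArray(Map<String, V> input) {
        List<V> out = new ArrayList<>();
        int counter = 0; // stands in for the proposed auto-generated array index
        for (V value : input.values()) {
            out.add(counter++, value);
        }
        return out;
    }
}
```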

Shiftr # needs to be updated. Its scope was too narrow.

Its original use case of counting the number of '*' matches works, but it does so by rolling all the '*' hits up to the parent.

It does not work if you have prefixed "stars", e.g. "rating-*" and "cdv-*". In that case you want the number of matches against those individual specs, not the parent.

Test case here : https://github.com/milosimpson/jolt/blob/bb08587844c4a9319f76a3645e503b1672537ae6/jolt-core/src/test/resources/json/shiftr/prefixedStarsToLists.json

Would be useful to allow addition to STOCK_TRANSFORMS

Hi, I'm a new user of Jolt. First off, thanks for a very useful and well put together library!

I am writing a new transform and the JSON using it will be created/modified by semi-technical users of our product. I'd rather not have them enter the fully qualified classname for this custom transform.

Currently STOCK_TRANSFORMS is unmodifiable and final, so I don't think there is a way to register a new one.

I may be able to make the changes if you think this is useful and have an idea how it should best be done...
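One possible shape for this, purely as a sketch: a small registry that starts from the stock aliases and lets callers add their own short names. `TransformRegistry`, its methods, and `com.example.MyTransform` are hypothetical; nothing here is Jolt's existing API.

```java
import java.util.HashMap;
import java.util.Map;

final class TransformRegistry {
    private final Map<String, String> aliases = new HashMap<>();

    /** Seeds the registry with the stock short-name to class-name mappings. */
    TransformRegistry(Map<String, String> stock) {
        aliases.putAll(stock);
    }

    /** Registers a short name; refuses to silently replace an existing alias. */
    void register(String shortName, String className) {
        if (aliases.putIfAbsent(shortName, className) != null) {
            throw new IllegalArgumentException("Alias already registered: " + shortName);
        }
    }

    /** Returns the class name for an alias, or the input unchanged (treated as a class name). */
    String resolve(String operation) {
        return aliases.getOrDefault(operation, operation);
    }
}
```

With this, a spec could say `"operation": "mine"` after a one-time `register("mine", "com.example.MyTransform")`, while fully qualified class names keep working.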

Thanks, Alfie.

Add "@" logic to Defaultr

You can use * logic to apply defaults to child objects, but doing so "forces" the parent object to be created.

If we added "@" to Defaultr, we could control that.

Jolt Diffy subcommand doesn't support file descriptors

The normal diff utility supports the use of file descriptors as inputs, but this doesn't work with Jolt Diffy:

Old school diff:

diff <(cat myfile) <(some-script.sh)

Diffy:

jolt diffy <(curl http://www.example.com/file.json) <(some-script-that-outputs-json.sh)

If you have at least one file, @milosimpson and @snkinard noted that stdin can be used to diff against a file as a workaround, although this syntax feels a little awkward:

curl -s http://www.example.com/file.json | jolt diffy some-file.json

Of course, it still doesn't work for the case where you want to use two file descriptors.

Design discussion - explicit usage of Jackson

Hi Milo, guys,

I would like to help you with developing/improving JOLT as it looks very promising! Before I start submitting any big changes I would like to discuss them with you and get general approval for my ideas. So first a couple of questions about existing design. Here goes the first one:

Why do Transform.transform() and similar methods parse and return Object? Did you consider using Jackson's JsonNode, etc. instead?

It looks like JOLT is already using Jackson, and keeping and maintaining 'hydrated' objects only makes the library code less readable and harder for a newcomer to understand. What do you think about using Jackson's 'native' interfaces?

Provide a Chainr factory method/class

ChainrSpecs are defined using JSON files. Currently, a Jolt user would have to rewrite the boilerplate to read from a JSON file into a ChainrSpec for every project where it is used. A factory method that can abstract this away would be quite convenient.

CLI appears to run in "silent" mode with shortened sub-commands

When using a shortened version of a sub-command, the CLI appears to be running in 'silent' mode. For example:

jolt diffy diff3.json diff4.json

The above writes output to standard out. The below does not:

jolt diff diff3.json diff4.json

The two commands should behave the same, or shortened sub-commands should not be considered valid by the argument parser.

Implement Nudgr aka Shiftr with implied @ at every level

This would be useful for implementing "backwards compatibility" of JSON data APIs.
For example, the current code generates data in "1.5" format, but you need to be able to transform it "down" to 1.4 and 1.3 for existing clients.
Assuming that the changes between versions are mostly "minor", the idea would be to write Jolt transforms for 1.5 -> 1.4 and 1.4 -> 1.3.
In that case, Shiftr is not your friend, as he will only pass thru data that is explicitly in his spec. We need a new transform for the usecase of "I just want to adjust / nudge a little bit of my input JSON."

Transformation problem

Hello,
Can you help me transform an input JSON file like this:

[
    {
        "a": "t1",
        "b": "t2",
        "c": "t3",
        "d": "t5",
        "e": "t8"
    },
    {
        "a": "t1",
        "b": "t2",
        "c": "t3",
        "d": "t6",
        "e": "t9"
    },
    {
        "a": "t1",
        "b": "t2",
        "c": "t4",
        "d": "t7",
        "e": "t10"
    }
]

I need to get this json file:

{
    "a": "t1",
    "b": "t2",
    "c": {
        "t3": [
            {
                "d": "t5",
                "e": "t8"
            },
            {
                "d": "t6",
                "e": "t9"
            }
        ],
        "t4": [
            {
                "d": "t7",
                "e": "t10"
            }
        ]
    }
}

I don't understand how I can do this aggregation with JOLT.

I tried to create spec like this:

[
    {
        "operation": "shift",
        "spec": {
            "*": {
                "a": "a",
                "b": "b",
                "c": {
                    "*": {
                        "$": "c"
                    }
                }
            }
        }
    }
]

And result is:

{
  "a" : [ "t1", "t1", "t1" ],
  "b" : [ "t2", "t2", "t2" ],
  "c" : [ "t3", "t3", "t4" ]
}

Questions:

  1. How can I aggregate "a" and "b" into one value (if all values are the same)?
  2. How can I add some children from the top level of the JSON to the elements of array "c"?
  3. Other questions may appear later.

Or maybe this problem cannot be solved using the latest (0.0.12) version of JOLT?
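To make the target shape concrete, here is a minimal plain-Java sketch of the aggregation, independent of Jolt. `RowAggregator` is a hypothetical name, and the sketch assumes every row carries the same "a" and "b", so it simply keeps the values from the first row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class RowAggregator {
    /** Collapses rows into one object: shared "a"/"b", and "d"/"e" pairs grouped by "c". */
    static Map<String, Object> aggregate(List<Map<String, String>> rows) {
        Map<String, Object> out = new LinkedHashMap<>();
        Map<String, List<Map<String, String>>> byC = new LinkedHashMap<>();
        for (Map<String, String> row : rows) {
            // Assumption: "a" and "b" are identical across rows, so the first one wins.
            out.putIfAbsent("a", row.get("a"));
            out.putIfAbsent("b", row.get("b"));
            Map<String, String> child = new LinkedHashMap<>();
            child.put("d", row.get("d"));
            child.put("e", row.get("e"));
            byC.computeIfAbsent(row.get("c"), k -> new ArrayList<>()).add(child);
        }
        out.put("c", byC);
        return out;
    }
}
```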

Command Line Tools for Jolt

Jolt should have a suite of command line tools, so one can use Diffy and the various transforms from the command line.

Fail build when license headers are omitted from a file

From @shawnsmith:

It's really common to forget to keep license headers up-to-date as code evolves. So when you're adding them to all the files I strongly suggest you look into something like http://creadur.apache.org/rat/ to fail the build when license headers are omitted. For example:

   <build>
      <plugins>
        <plugin>
          <groupId>org.apache.rat</groupId>
          <artifactId>apache-rat-plugin</artifactId>
          <version>0.9</version>
          <configuration>
            <excludes>
              <exclude>.git/**</exclude>
              <exclude>.gitignore</exclude>
              <exclude>**/resources/*.txt</exclude>
            </excludes>
          </configuration>
          <executions>
            <execution>
              <id>rat-check</id>
              <phase>test</phase>
              <goals>
                <goal>check</goal>
              </goals>
            </execution>
          </executions>
        </plugin>
  ...

See also http://creadur.apache.org/rat/apache-rat-plugin/examples/custom-license.html

Design discussion - dependency on Guava in source code

Another design idea. I would like to introduce a dependency on Guava in the JOLT source tree (it's already there for tests) and start using Preconditions for validating arguments. Any concerns?

One of the changes would be to start throwing NPE instead of IAE exceptions for null method arguments.

Removr should be able to handle Left Hand Side "*" wildcards.

Example / Unit Test :

{
    "input": {
        "TAG-Sharpness$fr": "nettete",
        "TAG-Sharpness#fr_fr": "nettete",
        "TAG-Sharpness": "Sharpness",

        "TAG-Bob": "smith",
        "TAG-Bob$ge": "",

        "ThisIsSillypantsValue" : "should be delted",

        "buckets": {
            "a$b": "AB",
            "c$d": "cd",
            "bucket-a$b": "ab"
        }
    },

    "spec": {
        "TAG-*$*": "",
        "TAG-*#*": "",

        "*pants*" : "",

        "buckets": {
            "a$*": ""
        }
    },

    "expected": {
        "TAG-Sharpness": "Sharpness",

        "TAG-Bob": "smith",

        "buckets": {
            "c$d": "cd",
            "bucket-a$b": "ab"
        }
    }
}

Right now Removr is just a simple recursive parallel tree walk which does not handle JSON arrays.

For the purposes of this use case, it does not need to handle JSON arrays.

The idea here is to parse the Removr spec into RemovrSpec nodes.
A RemovrSpecNode would have two lists of children, Literal and Computed (following the pattern from CompositeShiftrSpec). If the RemovrSpecNode has no children, then it can just remove if it matches.

If a RemovrSpecNode just has literal children, then it can just loop over the list of LiteralChildren and see if their keys are in the input.

If a RemovrSpecNode has both Literal and ComputedChildren, then it should evaluate the Literal children first, then the Computed ones.

Should be able to reuse the StarPathElement class, to help with the pattern matching.
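As a rough illustration of the matching behaviour in the unit test above, here is a simplified sketch that treats every spec key as a pattern. It skips the Literal/Computed split and the StarPathElement reuse described above, ignores JSON arrays, and assumes '*' may match any run of characters, including none. `WildcardRemover` is a hypothetical name.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.regex.Pattern;

final class WildcardRemover {
    /** Converts a key such as "TAG-*$*" into a regex in which only '*' is special. */
    static Pattern toPattern(String key) {
        String[] literals = key.split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // assumption: '*' matches zero or more characters
            }
            regex.append(Pattern.quote(literals[i]));
        }
        return Pattern.compile(regex.toString());
    }

    /** Applies the spec in place: "" removes matching keys, a nested map recurses. */
    @SuppressWarnings("unchecked")
    static void remove(Map<String, Object> input, Map<String, Object> spec) {
        for (Map.Entry<String, Object> rule : spec.entrySet()) {
            Pattern pattern = toPattern(rule.getKey());
            Iterator<Map.Entry<String, Object>> it = input.entrySet().iterator();
            while (it.hasNext()) {
                Map.Entry<String, Object> entry = it.next();
                if (!pattern.matcher(entry.getKey()).matches()) {
                    continue;
                }
                if (rule.getValue() instanceof Map && entry.getValue() instanceof Map) {
                    remove((Map<String, Object>) entry.getValue(),
                            (Map<String, Object>) rule.getValue());
                } else if ("".equals(rule.getValue())) {
                    it.remove();
                }
            }
        }
    }
}
```

The input maps must be mutable, since matching entries are removed in place.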

how do you do an array implode shift spec

input

        "food_pairing": [{
            "food_id": "7",
            "food_name": "Salad"
        }, {
            "food_id": "12",
            "food_name": "Fish"
        }, {
            "food_id": "13",
            "food_name": "Shellfish"
        }, {
            "food_id": "19",
            "food_name": "Vegetarian"
        }]

output

        "foods": "Salad, Fish, Shellfish, Vegetarian"

Can jolt achieve this transformation?
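For clarity about what is being asked, here is the "implode" written as a minimal plain-Java sketch. It does not use Jolt and makes no claim about whether a shift spec can express it; `FoodImploder` is a hypothetical name, and the input is assumed to be already parsed into a list of maps.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

final class FoodImploder {
    /** Joins every "food_name" in the array into one comma-separated string. */
    static String implode(List<Map<String, String>> foodPairing) {
        return foodPairing.stream()
                .map(pairing -> pairing.get("food_name"))
                .filter(Objects::nonNull) // skip entries without a name
                .collect(Collectors.joining(", "));
    }
}
```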

"Getting Started" java code breaks

Jolt is an awesomely great idea, and the java/json community should be grateful that Milo and Sam implemented it.

Unfortunately, the start-up example is difficult to understand (it would have made sense to use one of the prototypical music discography or book catalog examples).

More importantly, the java code in "Getting Started" has a bug. Specifically, in the java code at:

Chainr chainr = new Chainr( chainrSpecJSON );

I get:

Exception in thread "main" java.lang.ClassCastException: java.util.LinkedHashMap cannot be cast to com.bazaarvoice.jolt.JoltTransform

Create key/value pair from 2 values

I am trying to convert this json:

{
    "id": "1234",
    "classification": "aclass",
    "condition": "poor",
    "links": [
        {
            "name": "self",
            "href": "https://selfurl.com"
        },
        {
            "name": "events",
            "href": "https://eventurl.com"
        }
    ],
    "eventList": {
        "totalCount": 2,
        "criticalCount": 1,
        "majorCount": 1,
        "minorCount": 2,
        "okCount": 0,
        "informationalCount": 0,
        "events": [
            {
                "id": "event1",
                "elementId": "element1",
                "requestedFields": [
                    {
                        "name": "Severity",
                        "value": "Fatal"
                    },
                    {
                        "name": "status_url",
                        "value": "https://statusurl.com"
                    }
                ]
            },
            {
                "id": "event2",
                "elementId": "element2",
                "requestedFields": [
                    {
                        "name": "Severity",
                        "value": "Medium"
                    },
                    {
                        "name": "status_url",
                        "value": "https://statusurl.com"
                    }
                ]
            }
        ]
    }
}

with this spec:

[
    {
        "operation": "shift",
        "spec": {
            "id": "summary.id",
            "classification": "summary.classification",
            "condition": "summary.condition",
            "eventList": {
                "informationalCount": "summary.informationcount",
                "majorCount": "summary.majorcount",
                "minorCount": "summary.minorcount",
                "okCount": "summary.okcount",
                "totalCount": "summary.totalcount",
                "events": {
                    "0": {
                        "*": {
                            "$": "headers[]",
                            "@": "rows[]"
                        }
                    },
                    "*": {
                        "*": {
                            "@": "rows[]"
                        }
                    }
                }
            }
        }
    }
]

which produces:

{
  "headers" : [ "id", "elementId", "requestedFields" ],
  "rows" : [ "event1", "element1", [ {
    "name" : "Severity",
    "value" : "Fatal"
  }, {
    "name" : "status_url",
    "value" : "https://statusurl.com"
  } ], "event2", "element2", [ {
    "name" : "Severity",
    "value" : "Medium"
  }, {
    "name" : "status_url",
    "value" : "https://statusurl.com"
  } ] ],
  "summary" : {
    "classification" : "aclass",
    "condition" : "poor",
    "id" : "1234",
    "informationcount" : 0,
    "majorcount" : 1,
    "minorcount" : 2,
    "okcount" : 0,
    "totalcount" : 2
  }
}

I want to make 'rows' an array of arrays where each event is stored in a nested array. I would also like to flatten the events and move any nested elements up a level. To do this the 'name' and 'value' values would have to become key/value pairs in the parent (event). So the output would be:

{
  "headers" : [ "id", "elementId", "requestedFields" ],
  "rows" : [ ["event1", "element1", "Severity" : "Fatal", "status_url" : "https://statusurl.com"], ["event2", "element2", "Severity" : "Medium", "status_url" : "https://statusurl.com" ] ],
  "summary" : {
    "classification" : "aclass",
    "condition" : "poor",
    "id" : "1234",
    "informationcount" : 0,
    "majorcount" : 1,
    "minorcount" : 2,
    "okcount" : 0,
    "totalcount" : 2
  }
}

Is this possible?

Thanks,

Rob

Make jolt-core Pure Java

Remove the dependency on Apache Commons StringUtils.

Make a copy of the StringUtils class like ElasticSearch does.

Jolt CLI

A Jolt CLI tool should be created. The Diffy CLI should become a sub command of this tool. Argparse4j should be able to facilitate this relatively easily with SubParsers.

A sub command for sortr should be implemented as well, with the option to pretty-print the output.

Transform for data manipulation

Reuse the Shiftr tree-walking infrastructure (like Cardinality does), but instead of writing data somewhere:
a) run a specified Java class, and/or
b) run an mvel script.

We are familiar with mvel from ElasticSearch scripting. It can be "compiled" once and run many times, which fit with the Jolt paradigm.
