bpsm / edn-java


a reader for extensible data notation

License: Eclipse Public License 1.0

Java 16.74% Clojure 83.26%

edn-java's Issues

it should be possible to configure the parser with data

I'm thinking of something like:

Parser p = Parsers.newParser(…);
Parseable r = Parsers.newParseable( … );
Parser.Config cfg = (Parser.Config) p.nextValue(r);

Where the Parseable r reads edn that's something like this:

#bpsm.edn-java/parser-config {
    :listFactory #bpsm.edn-java/class "fully.qualified.class.Name"
    ;; new fully.qualified.class.Name() returns a CollectionBuilder.Factory
    …
    :handlers {
        :my/base64 #bpsm.edn-java/method {
            :class #bpsm.edn-java/class "fully.qualified.class.Name"
            :name "staticFactoryMethod"
        }
        ;; fully.qualified.class.Name.staticFactoryMethod()
        ;;     returns a TagHandler.
        …
    }
}

Open design questions:

the example above makes heavy use of tags, distributing the work of parsing the configuration between the parser-config, class, and method tags. An alternative is conceivable that leaves out the inner tags for brevity and instead packs that knowledge into the handler for parser-config.
the prefix #bpsm.edn-java is just made up. Convention elsewhere in the code base is to use #info.bsmithmannschott, but that's rather long -- though it's certainly unique.
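
Whatever the tags end up being called, the handlers for the class and method tags would presumably come down to plain reflection. A minimal sketch, assuming the class has a no-argument constructor and the factory method is public, static and takes no arguments (ReflectiveConfig and both method names are made up for illustration, they are not part of edn-java):

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

/** Hypothetical helpers for a data-driven parser config; not edn-java API. */
final class ReflectiveConfig {

    /** Instantiates the named class via its no-argument constructor. */
    static Object newInstance(String className) throws ReflectiveOperationException {
        return Class.forName(className).getDeclaredConstructor().newInstance();
    }

    /** Calls a public static no-argument method and returns whatever it returns. */
    static Object callStaticFactory(String className, String methodName)
            throws ReflectiveOperationException {
        Method m = Class.forName(className).getMethod(methodName);
        if (!Modifier.isStatic(m.getModifiers())) {
            throw new IllegalArgumentException(methodName + " is not static");
        }
        return m.invoke(null);
    }
}
```

The parser-config handler would then cast the results to CollectionBuilder.Factory or TagHandler and fail with a clear message if the cast is not possible.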

Nestable / composable parsers

It would be very helpful to nest and compose parsers to deal with situations where you have a schema with different components that require different parsing logic.

For example consider the following hypothetical edn message:

{:vector-value [1.0 2.0 3.0 4.0]
 :attribute-list [:foo :bar]}

You might want a different parser for the vector-value to produce a specialised double[] array or similar, but stick with the regular parser for the attribute-list of keywords.

Implementation thoughts:

This would imply the user writing a customised parser for the overall map value, which could specify specific parsers for individual components
If we want to support arbitrary schemas then we would need to support arbitrary logic for nesting, e.g. I might have a vector that needed the 3rd element with custom logic, but only if the 1st element has the value "Herring"
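
As a rough illustration of the simplest case (per-key logic on a map), here is a sketch that converts values after a regular parse rather than nesting real parsers. PerKeyConverters and its methods are hypothetical names, and plain Object keys stand in for edn-java Keywords:

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical post-processing step: applies a converter chosen by map key. */
final class PerKeyConverters {
    private final Map<Object, Function<Object, Object>> byKey = new HashMap<>();

    PerKeyConverters register(Object key, Function<Object, Object> converter) {
        byKey.put(key, converter);
        return this;
    }

    /** Returns a copy of the parsed map; values without a converter pass through. */
    Map<Object, Object> convert(Map<?, ?> parsed) {
        Map<Object, Object> out = new LinkedHashMap<>();
        for (Map.Entry<?, ?> e : parsed.entrySet()) {
            Function<Object, Object> f =
                byKey.getOrDefault(e.getKey(), Function.identity());
            out.put(e.getKey(), f.apply(e.getValue()));
        }
        return out;
    }

    /** Converter for the vector-value case: a List of Numbers becomes a double[]. */
    static Object toDoubleArray(Object value) {
        List<?> list = (List<?>) value;
        double[] result = new double[list.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = ((Number) list.get(i)).doubleValue();
        }
        return result;
    }
}
```

This does not cover the "Herring" case, where the logic for one element depends on another. That would need a converter that sees the whole collection instead of a single value.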

Support # embedded in symbol names

// Symbols begin with a non-numeric character and can contain
// alphanumeric characters and `. * + ! - _ ?  $ % & = < >`. If `-`,
// `+` or `.` are the first character, the second character (if any)
// must be non-numeric. Additionally, `: #` are allowed as constituent
// characters in symbols other than as the first character.

It appears that edn-java parses symbols with embedded "#" incorrectly. Given the text [a#b {}] we expect a vector of these two items:

The symbol a#b
The empty map

Instead we get these two items:

The symbol a
The empty map, tagged as #b
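
For reference, the rule quoted above can be written down as a check on a single symbol component (no "/" handling). This is only a sketch: SymbolNames is a hypothetical class, not the ScannerImpl logic, and it assumes "alphanumeric" means ASCII letters and digits.

```java
import java.util.regex.Pattern;

/** Hypothetical validity check for one symbol component, per the quoted rule. */
final class SymbolNames {
    // First character: anything allowed in a symbol except digits, ':' and '#'.
    private static final String FIRST = "[A-Za-z.*+!\\-_?$%&=<>]";
    // Later characters: additionally digits, ':' and '#'.
    private static final String REST = "[A-Za-z0-9.*+!\\-_?$%&=<>:#]";
    private static final Pattern NAME = Pattern.compile(FIRST + REST + "*");
    // '-', '+' or '.' followed directly by a digit would read as a number.
    private static final Pattern SIGN_THEN_DIGIT = Pattern.compile("[-+.][0-9].*");

    static boolean isValidName(String s) {
        return NAME.matcher(s).matches() && !SIGN_THEN_DIGIT.matcher(s).matches();
    }
}
```

Under this check a#b is a single valid name, so a scanner following the same rule would not stop at the "#".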

Printer should be able to work with any Appendable (not just Writer)

Currently Printer requires a Writer. This is unnecessarily restrictive. We should be able to work with anything that's Appendable.

This follows up on issue-11.
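
To illustrate what this buys us: code written against Appendable works unchanged with a StringBuilder, a Writer or a CharBuffer. A small sketch (AppendableDemo is a made-up class, not the Printer implementation, and it prints items with String.valueOf rather than real edn rules):

```java
import java.io.IOException;
import java.util.List;

/** Hypothetical example of printing to any Appendable. */
final class AppendableDemo {

    /** Writes the items space-separated between square brackets. */
    static void printVector(List<?> items, Appendable out) throws IOException {
        out.append('[');
        String separator = "";
        for (Object item : items) {
            out.append(separator).append(String.valueOf(item));
            separator = " ";
        }
        out.append(']');
    }
}
```

Passing a StringBuilder avoids the StringWriter detour when the caller only wants a String.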

EDN List, Vector types indistinguishable due to common RandomAccess interface

This may be a design decision, but it is not clear so here goes.

The documentation states "Lists "(...)" and vectors "[...]" are both mapped to implementations of java.util.List. A vector maps to a List implementation that also implements the marker interface java.util.RandomAccess."

However, due to this commit which changes the backing of the List type to ArrayList instead of LinkedList, there is no way to distinguish between Lists or Vectors when "roundtripped" through the parser.

This means that "(1, 2, 3)" and "[1, 2, 3]" will parse to identical values. Printing back the parsed List "(1, 2, 3)" to EDN via the printer will yield the Vector "[1, 2, 3]".

This is not caught in the unit tests because the method assertEquals merely tests that the parsed objects are equal (they are, as Vectors), but does not compare a second round of printing with the original string.

Teach edn-java to read namespaced maps as per CLJ-1910

See clojure/clojure@6d48ae3
See http://dev.clojure.org/jira/browse/CLJ-1910

Octal escapes in string and character literals?

#59 and #60 implement unicode escapes in character and string literals.

How about octal escapes?

I discovered that both clojure language reader and the edn reader from the official clojure github project - https://github.com/clojure/tools.reader - support this.

(Octal escapes in string literals come from Java, except that the Java syntax is a backslash followed by up to 3 digits, while Clojure and tools.reader require exactly 3 digits.)

In string literals the syntax is a backslash followed by 3 digits: \NNN. The first digit can be between 0 and 3, the last two digits are between 0 and 7.

For character literals the syntax is \oNNN. Again, the first digit is between 0 and 3, the last two are between 0 and 7.

$ clj -r
Clojure 1.10.1
user=> "aaa\062aaa"
"aaa2aaa"
user=> \o062
\2
user=> (require '[clojure.tools.reader.edn :as edn])
nil
user=> (edn/read-string "\"aaa\\062aaa\"")
"aaa2aaa"
user=> (edn/read-string "\\o062")
\2

The proposal in edn-format/edn#65 also includes octal escapes.

Not to say I personally need to use octal literals in my code. Just FYI. It may be good to have some consistency between EDN implementations.
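
In case it helps, here is a sketch of the string-literal rule as described above (backslash, one digit 0-3, two digits 0-7). OctalEscapes is a hypothetical helper. It looks at octal escapes in isolation, so a real reader would have to handle them in the same pass as the other escapes (for example an escaped backslash directly before digits).

```java
/** Hypothetical decoder for \NNN octal escapes inside a string body. */
final class OctalEscapes {

    /** Replaces each backslash + [0-3][0-7][0-7] with the character it denotes. */
    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '\\' && i + 3 < s.length()
                    && inRange(s.charAt(i + 1), '3')
                    && inRange(s.charAt(i + 2), '7')
                    && inRange(s.charAt(i + 3), '7')) {
                // Three octal digits give a value between 0 and 255.
                out.append((char) Integer.parseInt(s.substring(i + 1, i + 4), 8));
                i += 4;
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    private static boolean inRange(char c, char max) {
        return c >= '0' && c <= max;
    }
}
```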

Adopt X.Y.Z version naming

Currently we are using X.Y version naming. May I suggest we move to X.Y.Z?

This would be more consistent with other Maven/Clojure projects and make it easier to distinguish between major/minor/bugfix releases which we are likely to need ultimately.

Version 1.0?

Hi,
Thanks for the hard work you put to make this awesome library.
I see the library is at version 0.6, and no commit since summer last year.
Is the library stable enough to be used as is? Are there known problems to be aware of?
I am still in the process of learning - but I am happy to help.

With kind regards,
Nicolas

Add Java 8 to test with Travis too.

Thank you.

Support reading \uXXXX character literals

See edn-format/edn@ee49674

Fix broken GPG magic

Whatever magic set of configurations I made years ago to allow releases to Maven Central via Sonatype has ceased functioning for reasons I have not yet been able to diagnose. As a result 0.7.0 has been tagged, but not published on Maven Central.

[INFO] --- maven-gpg-plugin:1.6:sign-and-deploy-file (default-cli) @ edn-java ---
gpg: using "........" as default secret key for signing
gpg: signing failed: No pinentry
gpg: signing failed: No pinentry

I've googled around for what to do about "No pinentry" but have not yet found anything that leads me to a solution.

Remove checked exceptions from Printer interface

The printer interface currently declares checked exceptions (IOException).

I believe these should be removed for two main reasons:

Not all printers will perform IO (e.g. a specialised printer could build a byte array in memory)
It makes usage of printers less convenient for users (forced to add exception handling even where this is not necessary)
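
One way this could look, sketched with the JDK's UncheckedIOException. Whether edn-java should wrap in that or in its own EdnException is an open choice, and UncheckedAppender is a made-up name:

```java
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical wrapper that hides IOException from callers. */
final class UncheckedAppender {
    private final Appendable out;

    UncheckedAppender(Appendable out) {
        this.out = out;
    }

    /** Appends s; an IOException from the target is rethrown unchecked. */
    UncheckedAppender append(CharSequence s) {
        try {
            out.append(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return this;
    }
}
```

Callers that print to an in-memory target then need no try/catch, while callers that do real IO can still catch the unchecked exception and get at the cause.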

Comma character prints incorrectly

import us.bpsm.edn.printer.Printers;

public class CommaBug {

    public static void main(String[] args){
        System.out.println(Printers.printString(','));
    }
}

Running this program produces the following output

Exception in thread "main" us.bpsm.edn.EdnException: Whitespace character 0x2c is unsupported.
        at us.bpsm.edn.printer.Printers$11.eval(Printers.java:384)
        at us.bpsm.edn.printer.Printers$11.eval(Printers.java:357)
        at us.bpsm.edn.printer.Printers$1.printValue(Printers.java:142)
        at us.bpsm.edn.printer.Printers.printString(Printers.java:74)
        at us.bpsm.edn.printer.Printers.printString(Printers.java:56)

The expected output is

\,
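
A sketch of character printing that gives the expected result. It is not the code in Printers.java, only an illustration of the idea that the named characters get their names, other whitespace is rejected, and everything else (comma included) prints as a backslash plus the character:

```java
/** Hypothetical character printer; CharPrinting is not an edn-java class. */
final class CharPrinting {

    static String printChar(char c) {
        switch (c) {
            case '\n': return "\\newline";
            case '\r': return "\\return";
            case ' ':  return "\\space";
            case '\t': return "\\tab";
            default:
                // Character.isWhitespace(',') is false, so comma falls through.
                if (Character.isWhitespace(c)) {
                    throw new IllegalArgumentException(
                        String.format("Whitespace character 0x%x is unsupported.", (int) c));
                }
                return "\\" + c;
        }
    }
}
```

Comma counts as whitespace for the edn reader, which is probably where the bug comes from, but Java's own whitespace test does not include it.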

misspelling in the docs

java.meth.BigDecimal

should be

java.math.BigDecimal

PrintingExamples are sensitive to hash ordering

Release availability?

I wanted to try out the new 0.4.0 release, but wasn't able to find it on any of the usual public repos (Clojars, Maven Central). I could only find the snapshot versions.

Are the releases available anywhere that I should be aware of?

If not, I think they should be - it makes it much easier for people to pick up and run with the library

Values returned by Parser should be Serializable by default

For some use cases it is desirable that the Values produced by parsing EDN text be able to participate in Java Serialization.

Currently this is impossible because Keyword, Symbol, Tag, TaggedValue and DelegatingList do not implement Serializable.
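
For the simple value classes this is mostly a matter of adding implements Serializable. The one wrinkle I can see is identity: if instances are ever interned, deserialization should hand back the canonical instance. A sketch with a hypothetical stand-in class (SerializableKeyword is not the real Keyword, and the unbounded cache is a simplification):

```java
import java.io.Serializable;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical keyword-like value that is both interned and Serializable. */
final class SerializableKeyword implements Serializable {
    private static final long serialVersionUID = 1L;

    // Simplification: entries are never evicted.
    private static final ConcurrentHashMap<String, SerializableKeyword> CACHE =
        new ConcurrentHashMap<>();

    private final String name;

    private SerializableKeyword(String name) {
        this.name = name;
    }

    /** Returns the one canonical instance for the given name. */
    static SerializableKeyword of(String name) {
        return CACHE.computeIfAbsent(name, SerializableKeyword::new);
    }

    /** Called by Java Serialization; swaps the fresh copy for the canonical one. */
    private Object readResolve() {
        return of(name);
    }

    @Override
    public String toString() {
        return ":" + name;
    }
}
```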

Printer should also support pretty-printed output

Currently Printer produces output that is all in one line:

[{:a "asdfasdfasdfasdfasdfasdf" :b 1234 :c "uoiuojoijoijmoinoihohkjhlkjhlkjhu", :d #{ … } … } … ]

This is great for communication, since it's compact and no human has to be able to read it. On the other hand, it stinks for debugging scenarios or where edn data is stored in version control where it may be subject to merges.

Printer should support the option to format output in multiple lines with some amount of indentation to indicate logical nesting. It need not be highly configurable. It need not match the output of Clojure's pprint. It must be faster than Clojure's pprint.
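
To make the idea concrete, a very small sketch of one-item-per-line output with two spaces of indentation per nesting level. PrettySketch is hypothetical and cuts corners: scalars and empty collections are printed via toString, so strings are not quoted and this is not valid edn output as it stands.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical indenting printer for nested Lists and Maps. */
final class PrettySketch {

    static String pretty(Object value) {
        StringBuilder sb = new StringBuilder();
        write(value, 0, sb);
        return sb.toString();
    }

    private static void write(Object v, int depth, StringBuilder sb) {
        if (v instanceof List && !((List<?>) v).isEmpty()) {
            sb.append("[\n");
            for (Object item : (List<?>) v) {
                indent(depth + 1, sb);
                write(item, depth + 1, sb);
                sb.append('\n');
            }
            indent(depth, sb);
            sb.append(']');
        } else if (v instanceof Map && !((Map<?, ?>) v).isEmpty()) {
            sb.append("{\n");
            for (Map.Entry<?, ?> e : ((Map<?, ?>) v).entrySet()) {
                indent(depth + 1, sb);
                write(e.getKey(), depth + 1, sb);
                sb.append(' ');
                write(e.getValue(), depth + 1, sb);
                sb.append('\n');
            }
            indent(depth, sb);
            sb.append('}');
        } else {
            sb.append(v); // scalars and empty collections, via toString
        }
    }

    private static void indent(int depth, StringBuilder sb) {
        for (int i = 0; i < depth; i++) {
            sb.append("  ");
        }
    }
}
```

Putting every element on its own line is wasteful for long vectors of numbers, but it keeps diffs and merges line-oriented, which is the version control use case above.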

add convenience factory methods to the various Named classes

e.g. Symbol.newSymbol(name) should construct a Symbol without a namespace. Keyword.newKeyword(ns, name) should be equivalent to Keyword.newKeyword(Symbol.newSymbol(ns, name)). Similarly for Tag.

symbols beginning with `+` must continue with a non-digit

"if -, + or . are the first character, the second character must be non-numeric."

performance tests need work

Commit 7e5bd61 merges an expanded performance-test branch to master, but this still needs work.

It's obvious that I don't know my way around Caliper, so I imagine potential improvements would become apparent by investing some time in learning that better.
Ideally, I'd like the benchmark results to provide results as characters/second, but I have no idea how I might go about doing that with caliper.
currently, benchmark times include the overhead of opening and closing the reader as well as the associated IO of the underlying characters. That has advantages, since it exercises the more complex of the two Parseable implementations. OTOH, parsing directly out of a String already in memory would produce less variable results.
I've not added performance tests for Printing. Printing itself is still in an early implementation, but it would be good to get something in place for detecting performance changes there too.

edn-java should provide a way to write edn

> and < symbols are not supported

ScannerImpl does not recognize symbols '>' and '<'.

doesn't read clojure ratios

;; clojure defacto-standard edn reader
user=> (clojure.edn/read-string "1/2")
1/2

// scala
val p = newParser(defaultConfiguration())
val r = p.nextValue(newParseable("{:x 1/2}")).asInstanceOf[java.util.Map[Keyword,Any]].toMap

us.bpsm.edn.EdnSyntaxException: Not a number: '1/'.
    at us.bpsm.edn.parser.ScannerImpl.readNumber(ScannerImpl.java:434)
    at us.bpsm.edn.parser.ScannerImpl.scanNextToken(ScannerImpl.java:153)
    at us.bpsm.edn.parser.ScannerImpl.nextToken(ScannerImpl.java:61)
        ...
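
If ratios were to be supported, the scanner would need to split the token at the slash instead of giving up. A sketch of that step only. RatioToken is hypothetical, and returning a reduced numerator/denominator pair is an assumption, since the text above does not say what Java type a ratio should map to:

```java
import java.math.BigInteger;

/** Hypothetical parser for tokens of the form N/D. */
final class RatioToken {

    /** Returns {numerator, denominator} in lowest terms. */
    static BigInteger[] parse(String token) {
        int slash = token.indexOf('/');
        if (slash <= 0 || slash == token.length() - 1) {
            throw new NumberFormatException("Not a ratio: '" + token + "'");
        }
        BigInteger n = new BigInteger(token.substring(0, slash));
        BigInteger d = new BigInteger(token.substring(slash + 1));
        if (d.signum() <= 0) {
            throw new NumberFormatException("Denominator must be positive: '" + token + "'");
        }
        BigInteger g = n.gcd(d);
        return new BigInteger[] { n.divide(g), d.divide(g) };
    }
}
```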

Make strategy to determine if a sequence is vector or list pluggable

As suggested by abernard in a comment on issue 32:

On the issue of this handling Guava collections, I wonder if the code to determine List or Vector should be separated out into an interface. This would be something like:

interface SequenceTypeSelector {
    boolean isVector(Object o);
}

The selector could be attached to the ProtocolBuilder for the Printer (with a default implementation provided of course). Extending SequenceTypeSelector would allow custom dispatch for types, allowing the simple if-else select on java.util.RandomAccess, or a Map lookup for more complex type hierarchies.

This was in response to a comment of mine that I was losing the list/vector distinction in edn-java-guava since guava's immutable list implementations implement RandomAccess.
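
The two dispatch styles mentioned in the suggestion could be sketched like this. The interface is the one proposed above, while Selectors, RANDOM_ACCESS and byClass are made-up names, and how a selector would be attached to the Printer is left open:

```java
import java.util.Map;
import java.util.RandomAccess;

interface SequenceTypeSelector {
    boolean isVector(Object o);
}

/** Hypothetical selector implementations. */
final class Selectors {

    /** The simple if-else on the marker interface. */
    static final SequenceTypeSelector RANDOM_ACCESS = o -> o instanceof RandomAccess;

    /** Looks up the concrete class first and asks the fallback for unknown classes. */
    static SequenceTypeSelector byClass(Map<Class<?>, Boolean> table,
                                        SequenceTypeSelector fallback) {
        return o -> {
            Boolean known = table.get(o.getClass());
            return known != null ? known : fallback.isVector(o);
        };
    }
}
```

For the Guava case the table would map the immutable list classes to false (or true), overriding what RandomAccess alone would say.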

Review edn-format; compare to implementation

Issue 60 involves a requirement documented in edn-format that I'd not implemented because I missed it somehow.
I'd like to release 1.0, but before doing that it makes sense to review edn-format one last time to make sure that there's not something else I've missed.

Is it possible to grab this as a .jar package somewhere?

I am trying to parse a small bit of EDN in Processing and the only way I can import this is by using it as a 'library' via a compiled .jar.

Keywords should be interned during parsing

"If the target platform supports some notion of interning, it is a further semantic of keywords that all instances of the same keyword yield the identical object."

Reader flag to support unicode escapes in string literals

Although the edn format doesn't specify unicode escapes in string literals (unlike for characters), their absence is sometimes very inconvenient in practice. Optional support for unicode escapes in string literals, controlled by a reader config flag, would be very desirable.

Duplicate map keys are parsed without error

{:a 1 :a 2} parses without error in edn-java

whereas

user=> (clojure.edn/read-string "{:a 1 :a 2}")

IllegalArgumentException Duplicate key: :a  clojure.lang.PersistentArrayMap.createWithCheck (PersistentArrayMap.java:71)

The edn spec states that keys should appear "at most once" so I think an error should be reported in this scenario.
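
A sketch of what the check could look like in whatever builds the map. StrictMapBuilder is a hypothetical class, not the CollectionBuilder edn-java uses, and the exception type is a placeholder:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical map builder that rejects duplicate keys. */
final class StrictMapBuilder {
    private final Map<Object, Object> map = new LinkedHashMap<>();

    StrictMapBuilder put(Object key, Object value) {
        if (map.containsKey(key)) {
            throw new IllegalArgumentException("Duplicate key: " + key);
        }
        map.put(key, value);
        return this;
    }

    Map<Object, Object> build() {
        return Collections.unmodifiableMap(map);
    }
}
```

The same check applies to sets, where the spec has the equivalent requirement that elements are unique.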

Parser should be able to work with any Readable (not just Reader)

This follows up on issue-11

This would open up CharBuffer and the like as possible sources. This will take some doing, as a raw Readable does not support character-at-a-time reading, which is what our Scanner needs internally.
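
The character-at-a-time part could be bridged with a small buffering adapter along these lines. This is a sketch with a made-up class name and an arbitrary buffer size, and it leaves out the unread/pushback a scanner usually also needs:

```java
import java.io.IOException;
import java.nio.CharBuffer;

/** Hypothetical adapter: reads one char at a time from any Readable. */
final class ReadableChars {
    private final Readable source;
    private final CharBuffer buf = CharBuffer.allocate(4096);
    private boolean eof;

    ReadableChars(Readable source) {
        this.source = source;
        buf.flip(); // start with an empty buffer (nothing remaining to read)
    }

    /** Returns the next char as an int, or -1 at end of input. */
    int read() throws IOException {
        while (!buf.hasRemaining()) {
            if (eof) {
                return -1;
            }
            buf.clear();                 // make room for the next chunk
            int n = source.read(buf);    // fills buf, returns -1 at end of input
            buf.flip();                  // switch from filling to draining
            if (n < 0) {
                eof = true;
                return -1;
            }
        }
        return buf.get();
    }
}
```

A Reader is itself a Readable, so existing callers would keep working if Parseable were built on top of something like this.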

Teach edn-java to optionally print namespaced maps as per CLJ-1910

See clojure/clojure@6d48ae3
See: http://dev.clojure.org/jira/browse/CLJ-1910

I don't think this new printing behavior should be the default as this could produce output that older versions of edn-java could not read by default.

CI Support with Travis?

I've found Travis CI to be a pretty good tool for continuous integration and testing.

https://travis-ci.org/

Ben - want me to add Travis CI support for edn-Java?

It basically requires:

One small config file in the root directory.
Anyone who wants to use Travis to sign into Travis with their GitHub account and switch on the CI for edn-java
(optional) include a CI build status on the README.md for the front page (Ben - would probably require you to use your account for this if you want to show the status on master)

Clojure's edn-read allows unicode in names, should edn-java do the same?

parser should return immutable lists, sets and maps

The Lists, Sets and Maps returned by Parser should be immutable by default. The simplest way to achieve this within the JDK is to wrap them in Collections.unmodifiableXXX before returning them.
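
A sketch of the wrapping step (ParsedCollections and seal are made-up names). Conveniently, Collections.unmodifiableList keeps the RandomAccess marker of the list it wraps, so wrapping does not by itself blur the list/vector distinction:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical final step applied to each collection before it is returned. */
final class ParsedCollections {

    static <T> List<T> seal(List<T> list) {
        return Collections.unmodifiableList(list);
    }

    static <T> Set<T> seal(Set<T> set) {
        return Collections.unmodifiableSet(set);
    }

    static <K, V> Map<K, V> seal(Map<K, V> map) {
        return Collections.unmodifiableMap(map);
    }
}
```

These are views, so immutability only holds as long as the parser does not keep (or leak) a reference to the underlying mutable collection.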

"Thus the resulting values should be considered immutable, and a reader implementation should yield values that ensure this, to the extent possible."

0.7.0 or 0.7.1

The pom.xml contains version 0.7.0, but the README mentions 0.7.1.

Is that a mistake in the README? Or has the pom.xml change not been pushed to GitHub?

Switch branching strategy to "git-flow"

I intend to switch the branching model of edn-java to git-flow.

This entails:

master will be reset to point to the same revision as the 0.4.0 tag.
future commits on master will only be actual releases (not SNAPSHOT)
the develop branch takes over the role currently played by 'master', but will build as version develop-SNAPSHOT.
releases will be prepared on temporary release branches originating from develop, with merge back to develop and master.
we decide on the release branch which version the release in preparation will have.
merging a release branch to master is making a release. The resulting revision will be tagged accordingly.
feature branches originate on develop and merge back to develop.

The git-flow model strikes me as a clean way to manage branches.

It has the advantage, on GitHub, that the branch users see by default ('master') will show them the README of the most recent stable release.

One potential drawback is that there's a monotonicity to putting all releases on master.

Consider this hypothetical: we release 2.0.0 but need to continue maintenance of 1.1.x, for some time because it takes users a while to upgrade to 2.0.0. Git-flow doesn't make explicit allowances for this, but it seems it could be addressed by branching "master-1.1.x" from the last 1.1.x tag and treating it like a second "master" branch.

I don't consider it likely that I'll need to maintain two production versions of edn-java in parallel at this stage in its life cycle (or really, ever), so I think this potential drawback is acceptable.

edn-java doesn't recognize 'foo//' as a symbol

I suspect this is a defect. Awaiting feedback on edn-format/edn#51 to know how to proceed.

QuickCheck to test?

Hi,

Any chance using QuickCheck https://github.com/clojure/test.check to test?

Thank you.

Symbol.checkName() should forbid names of the form "^[+][0-9].*"

See https://github.com/edn-format/edn#symbols

Which states that symbols beginning with +, must continue with a non-digit.

Currently we catch this for namespace-less symbols and for prefixes of symbols but don't catch it if the symbol has a legal prefix, but the name itself violates this rule.

This test should pass (by having scan throw an exception):

@Test(expected=EdnException.class)
public void symbolNameStartsWithPlusDigit() {
    scan("foo/+4blah");
}

Currently this test fails.
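
The missing piece is applying the same check to the part after the slash. A sketch of the rule on its own. PlusDigitRule is hypothetical and not how Symbol.checkName is written, and it covers -, + and . as the spec text does, not only + as in the title:

```java
/** Hypothetical check: does the prefix or the name start like a number? */
final class PlusDigitRule {

    static boolean violates(String symbol) {
        int slash = symbol.indexOf('/');
        String prefix = slash < 0 ? "" : symbol.substring(0, slash);
        String name = slash < 0 ? symbol : symbol.substring(slash + 1);
        return startsWithSignDigit(prefix) || startsWithSignDigit(name);
    }

    private static boolean startsWithSignDigit(String s) {
        return s.length() >= 2
            && "+-.".indexOf(s.charAt(0)) >= 0
            && Character.isDigit(s.charAt(1));
    }
}
```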

Iterate over list received via get.

I am storing parsed EDN in a map m:

m.get(Keyword.newKeyword("modules"));

Under the keyword modules I have a list of maps:

[
{:active=1, :addr=10657, :sensors=[520, 519, 0, 0]}, 
{:active=0, :addr=8217, :sensors=[212, 520, 0, 0]}, 
{:active=0, :addr=0, :sensors=[0, 0, 0, 0]}
]

That's what gets printed if I call println(m.get(Keyword.newKeyword("modules")));

My question is how would I iterate through the [] list and access each of the maps directly. ( I haven't been able to do it as I am getting an error "... cannot convert from capture#2-of ? to ..." )

Thanks.
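
The capture error usually means a wildcard-typed value is being assigned to a concrete generic type. A sketch of one way around it, keeping the wildcards and casting each element as you go. ModulesWalk is a made-up example, and the keys are passed in as plain Objects so the snippet does not depend on edn-java:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical example: walks a parsed list of maps and counts active modules. */
final class ModulesWalk {

    static int countActive(Map<?, ?> m, Object modulesKey, Object activeKey) {
        int active = 0;
        // The value under modulesKey is a List whose elements are Maps.
        List<?> modules = (List<?>) m.get(modulesKey);
        for (Object item : modules) {
            Map<?, ?> module = (Map<?, ?>) item;
            Object flag = module.get(activeKey);
            if (flag instanceof Number && ((Number) flag).longValue() == 1) {
                active++;
            }
        }
        return active;
    }
}
```

With edn-java the call would be something like countActive(m, Keyword.newKeyword("modules"), Keyword.newKeyword("active")).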

single quote in a string is incorrectly escaped

Printing a string containing single quotes will incorrectly escape these quotes with a backslash. In a groovy shell:

> System.out.withWriter("UTF-8") { us.bpsm.edn.printer.Printers.newPrinter(it).printValue("a 'b' c") }
"a \'b\' c">

Version affected: 0.4.0
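
For comparison, a sketch of string printing that escapes only the double quote, the backslash and the newline, return and tab characters, and leaves single quotes alone. StringPrinting is a hypothetical class, not the code in Printers:

```java
/** Hypothetical string printer; single quotes pass through unescaped. */
final class StringPrinting {

    static String printString(String s) {
        StringBuilder sb = new StringBuilder();
        sb.append('"');
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n");  break;
                case '\r': sb.append("\\r");  break;
                case '\t': sb.append("\\t");  break;
                default:   sb.append(c);
            }
        }
        sb.append('"');
        return sb.toString();
    }
}
```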

use shorter groupId and unify with package prefix

info.bsmithmannschott is OK as a groupId, but very long as a package prefix. I'd like to use the same thing for both, so something shorter is desirable.

I've registered the domain bpsm.us, which is short enough.

GroupId will change from info.bsmithmannschott to us.bpsm.

For consistency, the common package prefix should be changed from bpsm.edn to us.bpsm.edn, though this will cause some complication for merging back branches created before this switch.

eliminate dependency cycle between `…parser` and `…parser.inst`

edn-java should be free of cyclic dependencies between packages. The solution, in this case, is to pull the contents of us.bpsm.edn.parser.inst into us.bpsm.edn.parser.

protocols implementation is a crude hack

The protocols implementation the printer builds on is a hack, particularly WRT how it detects and deals with ambiguity. Is there some better way to implement this? Which implementation do we choose if more than one could apply? Currently we require an explicit binding for the Object's concrete class.

Things to look at for ideas:

How Clojure deals with this in its own multimethods.
How python deals with method lookup, seeing as it supports multiple inheritance.
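
As a strawman for discussion, here is one possible lookup policy: prefer the nearest bound class on the superclass chain, otherwise take the most specific bound interface, and fail loudly when two unrelated interfaces both apply. This is neither what the current protocols code does nor a copy of Clojure's or Python's rules, and Dispatch is a made-up class.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical type-based dispatch table with explicit ambiguity detection. */
final class Dispatch<F> {
    private final Map<Class<?>, F> bindings = new LinkedHashMap<>();

    Dispatch<F> bind(Class<?> type, F fn) {
        bindings.put(type, fn);
        return this;
    }

    /** Returns the function bound for target, or null if nothing applies. */
    F lookup(Class<?> target) {
        // 1. The class itself or its nearest bound superclass (Object comes last).
        for (Class<?> c = target; c != null && c != Object.class; c = c.getSuperclass()) {
            F f = bindings.get(c);
            if (f != null) {
                return f;
            }
        }
        // 2. Bound interfaces that target implements.
        List<Class<?>> matches = new ArrayList<>();
        for (Class<?> k : bindings.keySet()) {
            if (k.isInterface() && k.isAssignableFrom(target)) {
                matches.add(k);
            }
        }
        // Keep only the most specific: drop any match that another match extends.
        List<Class<?>> mostSpecific = new ArrayList<>();
        for (Class<?> a : matches) {
            boolean shadowed = false;
            for (Class<?> b : matches) {
                if (a != b && a.isAssignableFrom(b)) {
                    shadowed = true;
                }
            }
            if (!shadowed) {
                mostSpecific.add(a);
            }
        }
        if (mostSpecific.size() > 1) {
            throw new IllegalStateException("Ambiguous bindings: " + mostSpecific);
        }
        if (mostSpecific.size() == 1) {
            return bindings.get(mostSpecific.get(0));
        }
        // 3. Fall back to a binding for Object, if there is one.
        return bindings.get(Object.class);
    }
}
```

With bindings for both List and Collection, an ArrayList resolves to the List binding. With bindings for List and RandomAccess it is reported as ambiguous, and the user has to bind ArrayList explicitly, as is required today.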
