
mikera commented on September 26, 2024

I guess this could work something like a parser combinator library - if you had specialised parsers then you could do something like:

Parser p = Parsers.customMapParser(new ParserMapEntry(":vector-value", doubleVectorParser));

Or if you wanted a parser for vectors with a specific type of child:

Parser p = Parsers.vectorOf(childParser);

For more info on parser combinators:
http://en.wikipedia.org/wiki/Parser_combinator
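To make the idea concrete, here is a toy, self-contained illustration of the combinator style (all names are hypothetical, not a proposed edn-java API): vectorOf(child) builds a parser for a vector whose elements are each handled by the child parser.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy combinator sketch. All names are hypothetical - this is not edn-java's
// API, just an illustration of how vectorOf(childParser) composes.
public class CombinatorSketch {

    interface Parser<T> { T parse(Deque<Object> tokens); }

    // A "specialised parser" for a single double token.
    static final Parser<Double> DOUBLE =
        tokens -> ((Number) tokens.pop()).doubleValue();

    // Combinator: builds a parser for "[" child* "]".
    static <T> Parser<List<T>> vectorOf(Parser<T> child) {
        return tokens -> {
            if (!"[".equals(tokens.pop()))
                throw new IllegalStateException("expected [");
            List<T> out = new ArrayList<>();
            while (!"]".equals(tokens.peek()))
                out.add(child.parse(tokens));
            tokens.pop(); // consume "]"
            return out;
        };
    }

    public static void main(String[] args) {
        Deque<Object> toks = new ArrayDeque<>(
            java.util.Arrays.<Object>asList("[", 1.0, 2.0, 3.0, "]"));
        System.out.println(vectorOf(DOUBLE).parse(toks)); // [1.0, 2.0, 3.0]
    }
}
```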

from edn-java.

bpsm commented on September 26, 2024

No thanks. I don't intend for edn-java to become a parser combinator library. The extension mechanism you're proposing seems to overlap strongly with the tagged-element extension mechanism that's already built into edn. Also, I'm not seeing a concrete use case here that would necessitate this change.

An alternative to what you sketched above:

{:vector-value #some.ns/doubles [1.0, 2.0, 3.0, 4.0] ... }

This solution would imply constructing a List<Double> temporarily before the #some.ns/doubles handler transformed that to a double[]. If that turned out to be an actual performance problem for some use case, one could still consider giving the CollectionBuilder knowledge of the #some.ns/doubles so that it could decide to do something more optimal in that case.
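As a self-contained sketch of what such a handler could do (the class and method names here are hypothetical stand-ins, not edn-java's actual TagHandler API), the transformation itself is trivial:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, minimal stand-in for a tag handler: converts the List<Double>
// the parser would build for #some.ns/doubles [1.0 2.0 ...] into a double[].
public class DoublesTagHandler {

    public static double[] transform(Object value) {
        List<?> parsed = (List<?>) value;      // the parser's intermediate form
        double[] out = new double[parsed.size()];
        for (int i = 0; i < out.length; i++) {
            out[i] = ((Number) parsed.get(i)).doubleValue();
        }
        return out;
    }

    public static void main(String[] args) {
        double[] ds = transform(Arrays.asList(1.0, 2.0, 3.0, 4.0));
        System.out.println(Arrays.toString(ds)); // [1.0, 2.0, 3.0, 4.0]
    }
}
```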

But, if performance is a real problem and you find yourself shipping around huge vectors of doubles, I'd submit that a binary format would be more suitable in any case.


mikera commented on September 26, 2024

The use case I have in mind is:

  • You need to read a specific edn representation e.g. "[[1 2 3] [4 5 6] [7 8 9]]"
  • You want to map this representation to generate a specific concrete class (e.g. a Matrix)

Guess it's a decision about whether we want to support this capability in edn-java: if not, then someone with the use case above would need to build an extra conversion layer on top to convert to the correct target classes. Not a big deal I guess, but I suspect it would lead to a lot of wheel-reinventing.

Tagged extension mechanisms aren't really a solution if either: a) you want to use a specific simple, uncluttered edn representation or b) you don't have control over the input format. Tagged extensions seem much more useful for the case where you have unstructured data / don't have a defined schema in advance.


mikera commented on September 26, 2024

Pragmatic suggestion - how about we leave this capability out of edn-java in the name of simplicity, but consciously design the Parser so that it is open to extension.

That way if somebody wants to build an edn-based schema validator or parser combinator library on top then it would be relatively easy to do so?


bpsm commented on September 26, 2024
  • I don't intend edn-java as a generic serialization mechanism for general object graphs. There are plenty of other formats that do that well enough through implementations that pay the resulting complexity cost.
  • Schema support is also currently a non-goal. I could imagine that being useful in the future, but I'd rather extend the parser then -- with a concrete use case -- than try to guess now about what a hypothetical schema checker might need.

https://github.com/edn-format/edn :

edn is a system for the conveyance of values. It is not a type system, and has no schemas. Nor is it a system for representing objects - there are no reference types, nor should a consumer have an expectation that two equivalent elements in some body of edn will yield distinct object identities when read, unless a reader implementation goes out of its way to make such a promise. Thus the resulting values should be considered immutable, and a reader implementation should yield values that ensure this, to the extent possible.

  • I don't see a pressing need to make the current parser arbitrarily extensible, given that a working edn parser can be written in a weekend and that this edn parser is open source, available for forking and modification by anyone who wishes to do so.
  • Also, I consider this a useful warning.


mikera commented on September 26, 2024

Well, the main reason I'm interested in edn-java is that I see it as a very useful tool for "the conveyance of values"; it's just that in Java the source value (for writing) and the desired end result (for reading) is more often than not an instance of some arbitrary class rather than a nice tree of nested lists and hashmaps (as it is in Clojure).

Or put it another way, it is an absolute requirement that I can do the following conversion effectively:

edn format -> MyJavaDataStructure

I strongly suspect that other potential users of edn-java will need to do the same thing.

So here are my options:

  1. I can write a new edn parser for my own use cases in a weekend. Could even be fun. But I'd rather not do that - that way lies "The Curse of Lisp" (http://www.winestockwebdesign.com/Essays/Lisp_Curse.html if you haven't read it already)
  2. I could write a separate layer that parses already parsed edn-java trees into MyJavaDataStructure or any other Java classes. Seems ugly - apart from the unnecessary runtime overhead of going via an intermediate representation, it seems a rather crazy design to layer a parser library on top of a parser library. I might as well do 1.
  3. I could fork edn-java and rewrite it to my needs. Actually probably the easiest option for me since there is a lot of good code here already, but I'd rather not fork edn-java because I'd rather work with you to contribute to a definitive, high quality library.
  4. We could make edn-java into an all-singing all-dancing parser library full of combinators and schema generation etc. I agree this would be a mistake - it's always better to keep libraries simple and focused on a core task
  5. We can include enough extensibility in edn-java that I can implement my use cases quickly (which would be done outside edn-java, most likely by subclassing some Parser implementation and maybe writing one or two trivial combinators). This doesn't imply including any parser combinators in edn-java itself, but it does mean guaranteeing enough extensibility in the public API to support this use case.

My preferred option is no. 5, but obviously that needs your agreement as a design goal. I'm totally happy to write and maintain any code needed to make it work (which I think will be pretty small in any case, probably just a couple of extensible methods that can be overridden and a set of test cases to prove they work)

What do you suggest is the best way forward?


bpsm commented on September 26, 2024

Well, the main reason I'm interested in edn-java is that I see it as a very useful tool for "the conveyance of values"; it's just that in Java the source value (for writing) and the desired end result (for reading) is more often than not an instance of some arbitrary class rather than a nice tree of nested lists and hashmaps (as it is in Clojure).
Or put it another way, it is an absolute requirement that I can do the following conversion effectively:
edn format -> MyJavaDataStructure

This, it seems to me, is precisely the sort of thing that edn's tagged values are intended for. Why would they not work for you? Do you just not want to have to include the tags in your edn?

#my.package/MyClass { ... }

Parses as an instance of MyClass. Not ok?

{ ... }

Parses (magically) as an instance of MyClass. ok?

Assuming I've understood your intent: what happens when you have MyClass and MyOtherClass? How does the extensible parser know which map maps to the former and which to the latter?

Also, have you considered that it might be possible to achieve your desired behavior by providing a custom map builder factory? I don't believe there's anything in edn-java that says the map factory actually has to return a map. (If so, then it's a bug and can be changed.)

So, wouldn't it be possible to pack the smarts you need into a set of appropriately intelligent collection builder factories?
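A self-contained sketch of that idea (the interface shape here is illustrative, not edn-java's actual CollectionBuilder API): a "map builder" that accepts alternating keys and values like any other, but whose build() returns a POJO instead of a Map. In real edn the keys would be Keyword instances; plain strings stand in for them here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical "smart map builder": receives keys and values as an
// alternating stream, but build() hands back a Point, not a Map.
public class PointBuilder {
    private final Map<Object, Object> entries = new HashMap<>();
    private Object pendingKey;
    private boolean haveKey;

    public static class Point {
        public final long x, y;
        public Point(long x, long y) { this.x = x; this.y = y; }
    }

    /** Map builders see keys and values alternately. */
    public void add(Object item) {
        if (!haveKey) { pendingKey = item; haveKey = true; }
        else { entries.put(pendingKey, item); haveKey = false; }
    }

    /** Instead of a Map, hand back the target class. */
    public Point build() {
        return new Point((Long) entries.get(":x"), (Long) entries.get(":y"));
    }

    public static void main(String[] args) {
        PointBuilder b = new PointBuilder();
        for (Object o : new Object[] {":x", 1L, ":y", 2L}) b.add(o);
        Point p = b.build();
        System.out.println(p.x + "," + p.y); // 1,2
    }
}
```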


mikera commented on September 26, 2024

Two main reasons why tags don't seem satisfactory for my purposes:

  • I don't want to force my users to tag values in situations where the tags are clearly redundant (which is quite often when you are using a well-defined schema - the "tag" is implied by the context and structure of the data)
  • I can foresee situations where you don't fully control the edn format produced (e.g. results coming out of datomic perhaps?) but you still want to parse to a specific target class. Tags don't help you here.

The way I see it, tagged values are there to allow extensibility when the reader doesn't know what sort of value to expect - the tag is a way of removing ambiguity and saying "this value is special and should be handled in a specific way". My case is the opposite: The reader knows exactly how the next value should be handled (and indeed, any other kind of value could be regarded as an error).

I actually tried going the custom map / vector collection builder route as my first idea. This works well enough at the top level but then doesn't work for nested structures which require sub-elements to be interpreted differently (hence the title of this thread....)


bpsm commented on September 26, 2024

I can see wanting automatic schema-guided unmarshalling of edn text into Java POJOs when I put on my "Conventional Java Programmer" glasses, but this completely misses the point of edn. Edn is about values. Maps are maps, not objects. If something whose data can be written as a map is to have other semantics, that's what a tag is for. The whole point of edn is that you don't need a schema to make sense of the data.

Going the way you are suggesting would mean at least:

  • a schema language
  • the ability to validate a given edn value against a given schema.
  • an interpreter for the schema language that knows how to use it to guide the creation and initialization of POJOs from edn values described by that schema.

If you want to go this route, I would suggest option 2: traverse the values returned by the parser and translate them to POJOs in a separate pass. This strikes you as clumsy, but it's the most stable API you're likely to find in this library to build atop, since it is a one-to-one reflection of what edn actually is: structured values.

I would be open to supporting additional hooks/extensibility to the parser to support your use case, if this can be achieved cleanly and without adding undue complexity to the parser's existing control flow. I'm not at all clear on what such an extension might look like though. Perhaps you could help me with a concrete example:

Let's suppose we have the following Java classes:

class Point { int x, y; }
class Rect { Point upperLeft, lowerRight; }
/* just imagine that Rect and Point have getters,
   setters, constructors, whatever you might need */

Now, I'm going to claim that idiomatic edn might encode this so:

#geometry/rectangle {
  :upperLeft #geometry/point {:x 1, :y 2},
  :lowerRight #geometry/point {:x 3, :y 4}}

How this would be parsed should be pretty obvious. We'd just have to configure the parser with handlers for geometry/rectangle and geometry/point.

But, you're supposing that you're going to get data like this:

{:upperLeft {:x 1 :y 2}, :lowerRight {:x 3, :y 4}}

What kind of an API do you imagine for configuring the parser, and how would you want to use it to produce a result equivalent to:

new Rect(new Point(1, 2), new Point(3, 4));


mikera commented on September 26, 2024

Example API that I might want to build:

Keyword kx = Keyword.intern(":x"), ky = Keyword.intern(":y");
Parser doubleParser = BasicParsers.DOUBLE_PARSER;

HashMap<Keyword, Parser> pointMap = new HashMap<Keyword, Parser>();
pointMap.put(kx, doubleParser);
pointMap.put(ky, doubleParser);
Parser pointParser = new CustomParsers.MapParser(pointMap) {
  public Point construct(Map<Object, Object> data) {
    return new Point((Double) data.get(kx), (Double) data.get(ky));
  }
};

Keyword kul = Keyword.intern(":upperLeft"), klr = Keyword.intern(":lowerRight");
HashMap<Keyword, Parser> rectMap = new HashMap<Keyword, Parser>();
rectMap.put(kul, pointParser);
rectMap.put(klr, pointParser);
Parser rectParser = new CustomParsers.MapParser(rectMap) {
  public Rect construct(Map<Object, Object> data) {
    return new Rect((Point) data.get(kul), (Point) data.get(klr));
  }
};

This probably could be simplified and tidied up with some better factory methods and/or use of generics but hopefully you get the idea..... the main point is that I now have a parser that outputs POJOs according to whatever edn schema I specify.

Hopefully you see that this is much nicer than forcing users to write an extra layer that does the POJO building on top of an already-parsed tree of edn data.

The option is left open to construct such parsers from a schema definition - don't yet need this myself but I could see it being helpful as a potential future extension that might make sense for some use cases.

So the hooks I think I need are:

  • Some sort of abstract base parser that I can extend to build the custom parsers. Would probably need to encapsulate the scanner functionality but not much else.
  • Abstract base parser needs to provide a nextToken(Parseable) implementation
  • I guess I need the Token enum to be public
  • It might be helpful to get access to the collection builders and associated factories. Not sure about this one though - it might make more sense for edn-java to keep these private as implementation details.
  • It would be nice if I could pass one of the custom parsers back to edn-java and associate it with a given tag. I think this requires either changing the current TagHandler interface or providing a new/updated interface that can do transform(Tag, Parseable)

Make sense?


bpsm commented on September 26, 2024

I've been thinking about this, while preparing the other two branches for merging, but my opinions haven't changed much:

  • I think the very idea goes against the core intent of edn.
  • However, this doesn't prevent it from being useful to a Java dev.
  • I think the correct place for this is layered on top of the values produced by the Parser, not somehow intertwined with the parsing itself.
  • I'll try this out on a branch and see how it feels.


bpsm commented on September 26, 2024

Please see a98a0e8 and let me know if that could (be extended to) address your use case.


mikera commented on September 26, 2024

Hi Ben,

Had a look - some nice ideas. I like the fact that your approach is standalone and doesn't require extensive hacking inside the core code.

Though ultimately this is basically a very simple version of my option 2) above and I still have a couple of big issues with this approach:

  • It's a parser built on top of a parser. I think this is less "simple" than extending a custom parser - look at the "pointParser" and "rectParser" definitions in my sample code, which express how simple I think this should be.
  • It goes via an unnecessary intermediate representation. Building intermediate maps etc. is expensive compared to just copying tokens into the final constructed object. This is incidental complexity that IMHO isn't necessary - with the right hooks as I outlined above the unmarshaller should be able to access the raw stream of tokens for this purpose.

Regarding your specific implementation:

  • The Unmarshaller seems to be fixed around the idea of managing a specific map<->field mapping. This is fine, but it is just one special case: you might also want to unmarshall using arrays, custom lists, lookups into static data structures, calls to a builder API etc.
  • I think the API is complex compared to the API I outlined above (which is effectively equivalent to the code in UnmarshallersTest.java). Basically you are building field handlers, passing them to an unmarshalling config builder object, building an unmarshaller and layering the result on top of a regular parser. So your user has to deal with at least 4 abstractions. If you just allow users to extend a custom parser, they only need to deal with 1 abstraction (the parser itself).

If I get a free hour or two this week I'll try a quick implementation of my proposed approach to see how it compares.


bpsm commented on September 26, 2024

If I get a free hour or two this week I'll try a quick implementation of my proposed approach to see how it compares.

That'd be great!


mikera commented on September 26, 2024

OK, I hacked something together this morning that works reasonably well:

I built it on top of your unmarshall branch in case we want to compare side-by-side

Key features:

  • Custom parser for building Java arrays (see CustomParsers.ArrayParser)
  • Custom parser for objects constructed from maps (see CustomParsers.MapParser) - this is roughly equivalent to the API I outlined above
  • Proves that the nested parser concept works (see CustomParsers.VectorParser) - parses a vector of anything
  • Specialised parsers for Long and Double. Mostly just a POC, but the idea is that by nesting these you can get some level of basic schema validation
  • It turned out that it was pretty easy to make CustomParser use Java generics for the returned object type - I think this is nice for Javaland users
  • Custom parsers get access to the raw stream of tokens if they need it.

Main trickiness was the fact that we don't have lookahead on the Scanner, so I had to do a slightly ugly nextValue(Object, Parseable) hack to get one-element lookahead. This works up to a point.... but isn't very nice and means we can't easily do things like call back into the generic edn-java parser implementation.

Having gone through this exercise, the thought occurs to me that maybe adopting Clojure's "sequence" abstraction for the scanner would be a win - we can make the scanning work like an immutable lazy list. This would solve a lot of problems around lookahead of tokens since you can keep a reference to the head, while also allowing the parser and scanner to behave as immutable objects.

Anyway, let me know what you think....


bpsm commented on September 26, 2024

I'm still digesting your branch, so I won't comment on that now.

Regarding this:

Having gone through this exercise, the thought occurs to me that maybe adopting Clojure's "sequence" abstraction for the scanner would be a win - we can make the scanning work like an immutable lazy list.

I considered this at one point or another at all three levels (the character stream, the token stream, and even the parser).

Lazy sequence implies at least one memory allocation per item, so the character stream was right out for (hopefully obvious) performance reasons.

The Scanner is designed deliberately to either produce objects that the parser will be able to hang directly in the parse tree, or produce objects that require no allocation. (That's why Token is an enum.) Now, I cheated a little for simplicity of implementation when it came time to parse integers: I always convert to a BigInteger first, before deciding whether to reinterpret the result as some kind of fixed-length number.

But, let's assume we replace our Scanner with what I'll call a TokenSeq:

interface TokenSeq {
    Object first(); // Token.*, String, BigInteger, …
    TokenSeq next();
}
// For the final node of any TokenSeq:
//   first() => Token.END_OF_INPUT
//   next()  => this (returns itself)

Here are the consequences:

  • Scanner is again stateful and holds a copy of its source-of-characters (which we currently call Parseable).
  • We must be careful not to unintentionally hold on to head.
  • We incur an overhead of allocating and initializing one TokenSeq instance per scanned Token, even when not requiring look-ahead.

Is that worth it? Honestly, I don't know, but I'd be willing to try it out because now I'm curious.
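To make the trade-offs concrete, here is a runnable, self-contained sketch of the TokenSeq idea (names hypothetical, with an Iterator standing in for the stateful Scanner): each cell is realized at most once, one cell is allocated per token, and the terminal cell returns END_OF_INPUT and is its own next().

```java
import java.util.Iterator;

// Sketch of an immutable, lazily-realized view over a mutable token source.
// Note: this uses (first == null) to mean "not yet realized", so null tokens
// are not representable in this toy version.
public class TokenSeqDemo {
    static final Object END_OF_INPUT = new Object();

    public static final class TokenSeq {
        private final Iterator<Object> source; // the stateful "scanner"
        private Object first;                  // realized on demand
        private TokenSeq next;

        TokenSeq(Iterator<Object> source) { this.source = source; }

        private void realize() {
            if (first == null)
                first = source.hasNext() ? source.next() : END_OF_INPUT;
        }

        public Object first() { realize(); return first; }

        public TokenSeq next() {
            realize();
            if (first == END_OF_INPUT) return this;  // terminal self-loop
            if (next == null) next = new TokenSeq(source); // one alloc/token
            return next;
        }
    }

    public static void main(String[] args) {
        TokenSeq s = new TokenSeq(
            java.util.Arrays.<Object>asList("[", 1L, "]").iterator());
        System.out.println(s.first());        // "[" - repeatable look-ahead
        System.out.println(s.next().first()); // 1
        System.out.println(s.next().next().next().first() == END_OF_INPUT);
    }
}
```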

What about the Parser?

interface ValueSeq {
    Object first(); // Map, List, String, BigInteger, …, Token.END_OF_INPUT
    ValueSeq next();
}

?


bpsm commented on September 26, 2024

The first road-block in making TokenSeq work is this:

Currently we pass Parseable to our Parser. The reason we can call Parser.nextValue() repeatedly on the same Parseable and get successive values is because Parseable is mutable. In effect, Parser.nextValue() "returns" two things: the actual value parsed and the fact that Parseable has advanced by N characters.

So, how do we make this work for TokenSeq, which is immutable by definition? We can't pass it around the parser as a parameter between methods. How would the parse methods communicate to their caller the tail of TokenSeq that must still be considered the next time around?

I see three options:

  1. Parser.valueOf() (and helper methods) must return the tuple (Value, TokenSeq), and the caller must know to unpack this and pass the contained TokenSeq to the next call to Parser.nextValue()
  2. Parser must contain a mutable reference to the TokenSeq, which must be mutated every time we advance one token. That makes Parser once again stateful.
  3. Parser must itself be a Seq, such that first() returns a parsed value and next() returns the Parser for the next (top-level) value. This implies that library users must take care not to hold onto the head, resulting in a public API that's hardly idiomatic Java.

I find 1 unacceptable and 3 problematic. That leaves only 2.

But, that won't fit with your ideas about nestable parsers, since presumably you'd want to reuse such a complex thing once you'd gone through the effort of building it up. That won't fly, since binding Parser to TokenSeq to Parseable necessarily means Parser is single-use. The consequence is that you'd need to separate the configuration describing your bunch-of-parsers (immutable and reusable) from the actual bunch-of-parsers (mutable and single-use).

This is one of the two reasons why I chose to separate Parser.Config from Parser (when Parser was still mutable). The second reason is that I was already planning on #20 when I made that design choice.


mikera commented on September 26, 2024

Your points are all extremely valid - it's a tricky tradeoff.

The way to handle 1) in a functional style would (I think) be to have parse map a lazy sequence of tokens to a similar lazy sequence of object values. Very idiomatic in Clojure, not so much in Java, where as you say you need to do a certain amount of unpacking (though it would be easy to add a convenience function to just get the first result....). Also, the extra allocations are unavoidable (though I still think it is only going to be O(n) for an n-token input, which you might consider acceptable).

I had a look at how a couple of other parsers do it.

jparsec uses a mutable "ParseContext" which encapsulates a position, a result and a few other things. Parsers are immutable, and can parse CharSequences / Readables directly by constructing a ParseContext on demand.

Clojure makes heavy use of pushback readers. Interestingly, it defines reader functions that have a method invoke(Object reader, Object leftbracket), e.g. LispReader.VectorReader, which is exactly analogous to the approach I took in my CustomParser implementation. I assume it is provably the case that Clojure / edn only ever need one token of lookahead?

Scala's parser library appears to use the lazy sequence approach, and returns an immutable result object that encapsulates the tail of the input token stream (closest to your option 1 above).

Conceptually, I still really like the token seq approach (for elegance, functional style, flexibility) but I agree that it will have performance issues and may seem strange to Java-oriented users so we may want to rule it out on those grounds.

Having gone through all this stuff my current view is that the best two options are:

  • Having a ParseContext for the mutable state, which keeps (at a minimum) the current token and a Readable / Parseable / some other stream representation for the rest of the input. Interestingly, I think users only need to know about / access ParseContext directly if they want to do sequential parsing off the same stream. This could be the right approach if we think we'll ever need any other parsing state (line numbers? errors?).
  • Keep it pretty much as-is (i.e. keep the mutability in Parseable), but pull up the nextValue(Object firstToken, Parseable pbr) method from CustomParser into Parser. This enables us to do very efficient parsing similar to Clojure's approach, and works better for nested/composable parsers. Prevents us from doing arbitrary-lookahead parsing, but I still can't (currently) think of a case where we actually care about arbitrary lookahead.
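The first option could look something like this minimal sketch (all names hypothetical, with an Iterator standing in for the character/token stream): ParseContext owns the one mutable cursor, so parsers themselves can stay stateless and composable.

```java
import java.util.Iterator;

// Hypothetical ParseContext: a small mutable holder for the one-token
// look-ahead plus the remaining token source.
public class ParseContextDemo {
    static final Object END_OF_INPUT = new Object();

    public static final class ParseContext {
        private final Iterator<Object> tokens; // stands in for the Scanner
        private Object current;

        public ParseContext(Iterator<Object> tokens) {
            this.tokens = tokens;
            advance();
        }

        /** The current token; may be inspected any number of times. */
        public Object current() { return current; }

        /** Consume the current token and move to the next one. */
        public void advance() {
            current = tokens.hasNext() ? tokens.next() : END_OF_INPUT;
        }
    }

    public static void main(String[] args) {
        ParseContext ctx = new ParseContext(
            java.util.Arrays.<Object>asList("[", 1L, "]").iterator());
        System.out.println(ctx.current()); // "["
        ctx.advance();
        System.out.println(ctx.current()); // 1
    }
}
```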


bpsm commented on September 26, 2024

Yuck.

I've implemented TokenSeq as a public interface to the capabilities of the Scanner. You can have a look at it on the topic/token-sequence branch.

I'll investigate your two suggested options tomorrow when I've had more sleep. I think the second one could work well enough.

In terms of lookahead: scanning edn into tokens requires the scanner be able to consider two characters from the stream. The parser for edn only has to consider one token at a time from the scanner. Your custom parsing doesn't really require more than one token either, I think, it's just that it divides the work such that it needs to look at the current token more than once before advancing the stream.


mikera commented on September 26, 2024

TokenSeq implementation looks pretty sane. Quick comments:

  • I'm assuming it is only intended for single threaded use.
  • Using (first==null) as the test for whether the TokenSeq has been realised yet doesn't allow for null values. Though maybe that isn't an issue for edn-java since we use Token.NIL?
  • Did you rule out the Clojure-style approach of using null as the end of sequence marker? I see we get an infinite sequence of END_OF_INPUT at the end of the stream, which I guess is fine but might confuse people more used to the Clojure idiom. Maybe I'm just paranoid about making a finite sequence behave like an infinite sequence :-) how about at least throwing an exception if you call rest() beyond the end of the input?

A possible improvement that occurs to me is to set the Parseable to null to indicate that the next item has been realised. This has a few potential advantages:

  • You can now use null to indicate end of sequence, Clojure style if desired
  • It allows nulls in the token sequence.
  • Make the first field final and populate it on TokenSeq construction (since you no longer need to mutate it to indicate whether the TokenSeq is realised). This should be an efficiency win.
  • Avoids "holding onto the head" by allowing the Parseable to be GC'd as soon the TokenSeq is fully realised. Probably irrelevant most of the time, though I suspect there could be some corner-case instances where there is a large set of data being held behind the Parseable reference.

On the lookahead point, the reason I found the need to keep the firstToken separately as a parameter is this: suppose you have a custom parser that is building something from a vector of objects, and it is halfway through processing the vector. When it reads the next token, it could be either a ] (which would signal the end of the vector) or anything else (which would signal the start of the next object contained within the vector). In the latter case, it needs to keep the token so that it can pass it to the next parser in the chain to parse the object.

I see you take this approach anyway in ParserImpl as a private implementation detail.

Alternatives would be:

  • Pushing back the token into the Parseable - but that is nasty at the token level, since you would have to print it!
  • Having a peekToken method that doesn't consume the current token. Though this is problematic because either it needs the scanner to be stateful to remember the current token or it needs to scan the token multiple times.
  • Using the TokenSeq, which lets you explore the token stream as much as you like without mutating it. The main downside of this, of course, is the extra overhead that TokenSeqs imply.

I think I like breaking out the firstToken parameter over all these alternatives, though it's a bit of a tricky call...
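The firstToken hand-off described above can be sketched like this (all names hypothetical): the vector parser reads one token of look-ahead; if it isn't "]", the token already belongs to the next element, so it is passed down to the element parser.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Sketch of the firstToken hand-off between nested parsers.
public class VectorParserDemo {

    /** Parse one element, given its first, already-consumed token. */
    static Object parseValue(Object firstToken, Iterator<Object> rest) {
        return firstToken; // scalars only, for the sketch; rest is unused here
    }

    /** Called after "[" has been consumed. */
    static List<Object> parseVectorBody(Iterator<Object> tokens) {
        List<Object> out = new ArrayList<>();
        while (true) {
            Object tok = tokens.next();        // one token of look-ahead
            if ("]".equals(tok)) return out;   // end of vector
            out.add(parseValue(tok, tokens));  // hand the token down
        }
    }

    public static void main(String[] args) {
        Iterator<Object> toks =
            java.util.Arrays.<Object>asList(1L, 2L, 3L, "]").iterator();
        System.out.println(parseVectorBody(toks)); // [1, 2, 3]
    }
}
```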


bpsm commented on September 26, 2024

Single threaded

Yes, TokenSeq is intended for single-threaded use only. When would it ever make sense to consume the contents of a single TokenSeq from more than one thread concurrently? This is not intended to be a general sequence abstraction.

Not using null

Yes, I gave considerable thought to what role, if any, 'null' should play in the interface, and decided to avoid it entirely, both in the case of individual tokens and in the case of the TokenSeq itself.

Scanner.nextToken() and TokenSeq.first() both return an individual token. It's not clear here what null should mean. Does it signal the end of input? Does it signal that we just read 'nil' from the input stream? I side-stepped that awkwardness when designing Scanner by giving each of these concepts their own explicit representation: Token.END_OF_INPUT and Token.NIL respectively.

I don't see a good reason that TokenSeq.first() should behave differently from Scanner.nextToken() by, for example, returning null in place of Token.NIL.

What I'm doing with TokenSeq is a variation on the null object pattern. The "empty" sequence is one whose first element is the distinguished value Token.END_OF_INPUT. The rest of an empty sequence is, unsurprisingly, empty.

I considered providing TokenSeq.next() instead of TokenSeq.rest(); next() would have returned null to signal end of sequence, but this has drawbacks:

  • It forces TokenSeq to always peek one token ahead, since it will need to know whether the first() of the next() it's trying to create is Token.END_OF_INPUT, in which case it won't allocate a TokenSeq but will instead return null.
  • It means that there is no such thing as an empty sequence; it must be represented as null.
  • This means that anyone given a sequence must do a null check on it before calling first() or next(), because, you know, the damn thing might be null.

Clojure Style

Clojure's style of using null to signal the end of a sequence (or an empty sequence) works well in Clojure because we're applying functions to values, and these functions can encapsulate the necessary null checks to make the abstraction work.

This doesn't work for Java since in Java we're typically invoking methods on objects, which invites all sorts of misery when the reference we have in hand doesn't refer to any object.

firstToken

On the lookahead point, the reason I found the need to keep the firstToken separately as a parameter is this: suppose you have a custom parser that is building something from a vector of objects, and it is halfway through processing the vector. When it reads the next token, it could be either a ] (which would signal the end of the vector) or anything else (which would signal the start of the next object contained within the vector). In the latter case, it needs to keep the token so that it can pass it to the next parser in the chain to parse the object.

I don't want the public interface of Parser to require the caller to spoon-feed it firstTokens when parsing multiple edn values from a single stream. That seems like a losing API.

I see you take this approach anyway in ParserImpl as a private implementation detail.

Yes, but that's just a left-over from an earlier implementation. It's not required. See 85a77aa.


bpsm commented on September 26, 2024

Here's what I think makes the most sense going forward:

I've made the Scanner part of edn-java’s public API. I intend to keep it stable. See 74df9ca. TokenSeq or whatever else custom parsers might require can be built on this.

I've created edn-pojos, which incorporates your custom parser code from your topic/unmarshal branch. Please feel free to make it your own by cloning it to your local machine, repackaging it and renaming it to your liking, and then pushing it to a fresh repo on your github account. I'll then fork your copy and remove mine.


mikera commented on September 26, 2024

OK, makes sense. I've built the new repo here:

https://github.com/mikera/edn-pojos

Can you push a point release or a snapshot to a public Maven repository so I can make the Travis build work with the latest dependency on edn-java? Maven Central or Clojars would seem like the best options.....


bpsm commented on September 26, 2024

Clojars seems like a poor fit, since edn-java isn't in Clojure, so I'll research what's necessary to get something on central. This may take a few days.


mikera commented on September 26, 2024

Central would definitely be preferred!

I only mentioned Clojars as a temporary measure as it's slightly easier to set up, and I think edn-java fits into the general category of open source libraries useful in Clojure (even if the main audience might be Java users, it is still part of the same ecosystem). I even pushed an early build of edn-java out myself when I needed to test some build processes: (https://clojars.org/net.mikera/edn-java)


bpsm commented on September 26, 2024

I've made a snapshot available as described in the readme at bpsm/edn-mvn-repo. I'm working on a real solution via Sonatype, but I wouldn't bet on me being done with that before next week.


mikera commented on September 26, 2024

OK great, thanks!

Agree it's only a temporary solution but it works for testing out the build at least

