Git Product home page Git Product logo

Comments (8)

mikera avatar mikera commented on September 26, 2024

I've implemented this before in another library so know what to do - want a patch for edn-java for this one?

from edn-java.

bpsm avatar bpsm commented on September 26, 2024

fixed. Also made interning configurable.

from edn-java.

bpsm avatar bpsm commented on September 26, 2024

@mikera I just comitted a fix before I saw you message. Please feel free to make suggestions on how it could be improved.

from edn-java.

mikera avatar mikera commented on September 26, 2024

Just had a look, main comments would be:

  • Interners should be static so that different scanner instances return the same keyword. I think this is implied by the edn spec and is consistent with how Clojure interns keywords
  • Should use ConcurrentHashMap for storing the symbols with a putIfAbsent for atomic interning. This is needed to avoid subtle concurrency issues (clojure.lang.Keyword also uses this method....)
  • Should consider using weak references so that keywords get GC'd if no longer being used
  • I'm not sure if we should be interning strings - this could cause memory leaks in long running apps that churn a lot of data. Keywords and symbols are probably OK because the same keywords / symbols are likely to be used every time. If we want to intern strings, we definitely need to use weak or soft references

Happy to fix up some of this if needed, let me know

from edn-java.

mikera avatar mikera commented on September 26, 2024

Made a couple of quick commits in my branch which make the interning static. Hopefully this helps!

I think this is incompatible with the idea of configurable interning, since the interning code needs to be global. My view would be that it is more important to get a solid interning implementation working by default rather than having it configurable - it's one of those things where there is a "right way to do it" that should serve 99.99% of users.

from edn-java.

bpsm avatar bpsm commented on September 26, 2024

I considered global (static interning), but decided against it. I'll have a look at your commits and see if they change my mind.

As you've noted, global interning introduces concurrency issues, which Parser-scoped interning does not. A single Parser is not intended for use by more than one thread, nor would it make sense to support this.

Global interning can, in principle, use unbounded amounts of memory in a way that depends entirely on the peculiarities of the input data. I'd rather not encourage that.

Parser-local interning has a chance of staying the young generation. The maps and their content can be collected as soon as we're done with the parser. This seems like a potential win.

It's not that I'm barring the door to doing interning globally, it's just that it's not clear to me yet that it is needed.

This depends on what the aim is of the interning, and that's not clear to me yet. Right now, I just consider it a slight memory savings and efficiency hack, but that's surely influenced by my use case: I tend to parse a single file containing potentially thousands of homogenous maps. So, the same keyword occurs thousands of times. Clearly interning at the parser level is already a win here.

On the other hand, if I were parsing each of these same maps with a separate parser because I was retrieving each with an individual request, then I wouldn't see any gain from interning, but would that be such a loss? I'm not sure.

If interning means that we can always safely do this:

if (someKeyword == SOME_KEYWORD) { ... }

Then we'll need really global interning. (No public constructor, just a factory method for the class Keyword!)

If, on the other hand we're just hoping for less memory usage for large data sets and occasionally more efficient execution of:

if (someKeyword.equals(SOME_KEYWORD)) { ... }

Then we don't need global interning.

from edn-java.

bpsm avatar bpsm commented on September 26, 2024

@mikera I've reworked the Keyword interning such that it's global. I borrowed Clojure's approach to this, which you described briefly.

I've been thinking about it all afternoon, and decided that the most likely intended benefit for interning is to facilitate fast map lookups. To get that benefit, all keywords need to be interned and the equals method needs to know that it can rely on reference equality.

I've got an idea cooking for hooking into the scanner such that client code can exert some influence when token values are created by the scanner. This should allow clients who would benefit from interning of symbols or short strings to add this in themselves. (My application would benefit from interning symbols.)

from edn-java.

mikera avatar mikera commented on September 26, 2024

Cool I think global interning is the right way to go - as you note the value of global interning is that it allows you to make map lookups.

The fast map lookup is actually a special case of a more general advantage of global interning - it allows you to make some stronger assumptions about keyword identity (because an equality check can become an identity check). So you get benefits for set membership tests, sorting etc.

from edn-java.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.