Comments (8)
I've implemented this before in another library so know what to do - want a patch for edn-java for this one?
from edn-java.
fixed. Also made interning configurable.
from edn-java.
@mikera I just comitted a fix before I saw you message. Please feel free to make suggestions on how it could be improved.
from edn-java.
Just had a look, main comments would be:
- Interners should be static so that different scanner instances return the same keyword. I think this is implied by the edn spec and is consistent with how Clojure interns keywords
- Should use ConcurrentHashMap for storing the symbols with a putIfAbsent for atomic interning. This is needed to avoid subtle concurrency issues (clojure.lang.Keyword also uses this method....)
- Should consider using weak references so that keywords get GC'd if no longer being used
- I'm not sure if we should be interning strings - this could cause memory leaks in long running apps that churn a lot of data. Keywords and symbols are probably OK because the same keywords / symbols are likely to be used every time. If we want to intern strings, we definitely need to use weak or soft references
Happy to fix up some of this if needed, let me know
from edn-java.
Made a couple of quick commits in my branch which make the interning static. Hopefully this helps!
I think this is incompatible with the idea of configurable interning, since the interning code needs to be global. My view would be that it is more important to get a solid interning implementation working by default rather than having it configurable - it's one of those things where there is a "right way to do it" that should serve 99.99% of users.
from edn-java.
I considered global (static interning), but decided against it. I'll have a look at your commits and see if they change my mind.
As you've noted, global interning introduces concurrency issues, which Parser-scoped interning does not. A single Parser is not intended for use by more than one thread, nor would it make sense to support this.
Global interning can, in principle, use unbounded amounts of memory in a way that depends entirely on the peculiarities of the input data. I'd rather not encourage that.
Parser-local interning has a chance of staying the young generation. The maps and their content can be collected as soon as we're done with the parser. This seems like a potential win.
It's not that I'm barring the door to doing interning globally, it's just that it's not clear to me yet that it is needed.
This depends on what the aim is of the interning, and that's not clear to me yet. Right now, I just consider it a slight memory savings and efficiency hack, but that's surely influenced by my use case: I tend to parse a single file containing potentially thousands of homogenous maps. So, the same keyword occurs thousands of times. Clearly interning at the parser level is already a win here.
On the other hand, if I were parsing each of these same maps with a separate parser because I was retrieving each with an individual request, then I wouldn't see any gain from interning, but would that be such a loss? I'm not sure.
If interning means that we can always safely do this:
if (someKeyword == SOME_KEYWORD) { ... }
Then we'll need really global interning. (No public constructor, just a factory method for the class Keyword!)
If, on the other hand we're just hoping for less memory usage for large data sets and occasionally more efficient execution of:
if (someKeyword.equals(SOME_KEYWORD)) { ... }
Then we don't need global interning.
from edn-java.
@mikera I've reworked the Keyword interning such that it's global. I borrowed Clojure's approach to this, which you described briefly.
I've been thinking about it all afternoon, and decided that the most likely intended benefit for interning is to facilitate fast map lookups. To get that benefit, all keywords need to be interned and the equals method needs to know that it can rely on reference equality.
I've got an idea cooking for hooking into the scanner such that client code can exert some influence when token values are created by the scanner. This should allow clients who would benefit from interning of symbols or short strings to add this in themselves. (My application would benefit from interning symbols.)
from edn-java.
Cool I think global interning is the right way to go - as you note the value of global interning is that it allows you to make map lookups.
The fast map lookup is actually a special case of a more general advantage of global interning - it allows you to make some stronger assumptions about keyword identity (because an equality check can become an identity check). So you get benefits for set membership tests, sorting etc.
from edn-java.
Related Issues (20)
- Iterate over list recieved via get. HOT 1
- > and < symbols are not supported HOT 3
- Values returned by Parser should be Serializable by default HOT 1
- Comma character prints incorrectly HOT 3
- PrintingExamples are sensitive to hash ordering HOT 1
- Teach edn-java to read namespaced maps as per CLJ-1910 HOT 1
- Teach edn-java to optionally print namespaced maps as per CLJ-1910
- Duplicate map keys are parsed without error HOT 5
- Add Java 8 to test with Travis too. HOT 1
- QuickCheck to test? HOT 1
- mispelling, in the docs
- Version 1.0? HOT 1
- Reader flag to support unicode escapes in string literals HOT 14
- Support reading \uXXXX character literals HOT 1
- Review edn-format; compare to implementation HOT 1
- Support # embedded in symbol names HOT 1
- Clojure's edn-read allows unicode in names, should edn-java do the same?
- Fix broken GPG magic HOT 4
- Octal escapes in string and character literals? HOT 1
- 0.7.0 or 0.7.1 HOT 1
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from edn-java.