Comments (9)
I think text conversion depends on the implementation, i.e., the rules are not related to the data format. The compiler manual (see README) states the following.
Text validation is not part of the marshalling and unmarshalling
process. C and Go just pass any malformed UTF-8 characters. Java
and JavaScript replace unmappable content with the '?' character
(ASCII 63).
So java.lang.String is no longer backed by a char(acter) array. With the new implementation it is even harder to access the data in an efficient way. Happy to hear about better alternatives for String#charAt(int). String#getBytes allocates memory. The unmarshaller uses String(byte[],int,int,java.nio.charset.Charset) now, and that works fine.
No external libraries for generated code is key!
Feel free to open an issue for a specific improvement idea.
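For readers following along, the String#charAt(int) approach mentioned above can be sketched roughly as follows. This is a minimal illustration under assumptions, not Colfer's generated code: the class name `CharAtUtf8` and the method `encode` are hypothetical. It writes into a caller-supplied buffer, so no intermediate byte array is allocated, and it emits '?' (ASCII 63) for unpaired surrogates, in line with the README quote.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical sketch; not Colfer's generated code. */
final class CharAtUtf8 {

    /**
     * Encodes s as UTF-8 into buf, starting at offset, and returns the
     * index after the last byte written. Unpaired surrogates become '?'
     * (ASCII 63). The caller must provide room for 3 bytes per char.
     */
    static int encode(String s, byte[] buf, int offset) {
        int i = offset;
        for (int ci = 0, n = s.length(); ci < n; ci++) {
            char c = s.charAt(ci);
            if (c < 0x80) { // 1 byte
                buf[i++] = (byte) c;
            } else if (c < 0x800) { // 2 bytes
                buf[i++] = (byte) (0xC0 | (c >>> 6));
                buf[i++] = (byte) (0x80 | (c & 0x3F));
            } else if (!Character.isSurrogate(c)) { // 3 bytes
                buf[i++] = (byte) (0xE0 | (c >>> 12));
                buf[i++] = (byte) (0x80 | ((c >>> 6) & 0x3F));
                buf[i++] = (byte) (0x80 | (c & 0x3F));
            } else if (Character.isHighSurrogate(c) && ci + 1 < n
                    && Character.isLowSurrogate(s.charAt(ci + 1))) {
                // surrogate pair: 2 chars become 4 bytes
                int cp = Character.toCodePoint(c, s.charAt(++ci));
                buf[i++] = (byte) (0xF0 | (cp >>> 18));
                buf[i++] = (byte) (0x80 | ((cp >>> 12) & 0x3F));
                buf[i++] = (byte) (0x80 | ((cp >>> 6) & 0x3F));
                buf[i++] = (byte) (0x80 | (cp & 0x3F));
            } else { // unpaired surrogate
                buf[i++] = (byte) '?';
            }
        }
        return i;
    }

    public static void main(String[] args) {
        String s = "a\u0080\u0800\uD800\uDC00";
        byte[] buf = new byte[3 * s.length()];
        int n = encode(s, buf, 0);
        // compare against the allocating library encoder
        byte[] want = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(Arrays.copyOf(buf, n), want));
    }
}
```

The `main` method only compares the result with String#getBytes as a sanity check; the encoder itself does not allocate.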
from colfer.
Had a quick look at the new streams with String#chars. It is way slower than String#charAt(int).
from colfer.
- The size fit is the maximum ratio from UTF-16 char(acters) to UTF-8 bytes. That is, encoding of a char costs 1, 2 or 3 bytes; never more.
- The golden cases have a string with all cases covered.
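The bound from the first point can be checked with a few lines of Java. This is a sketch under assumptions: the class `SizeFit` and its methods are hypothetical names, and String#getBytes is used only because allocation does not matter for a check like this.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical check of the 3-bytes-per-char bound; names are illustrative. */
final class SizeFit {

    /** Upper bound for the UTF-8 size of a String with charCount UTF-16 chars. */
    static long maxUtf8Size(int charCount) {
        return 3L * charCount;
    }

    /** Actual UTF-8 size; allocates, so it is only suited for checks like this one. */
    static int utf8Size(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // 1 char, 1 byte
            "\u0080",       // 1 char, 2 bytes
            "\u0800",       // 1 char, 3 bytes
            "\uD83D\uDE00", // U+1F600 as a surrogate pair: 2 chars, 4 bytes
        };
        for (String s : samples) {
            System.out.printf("chars=%d utf8=%d bound=%d%n",
                s.length(), utf8Size(s), maxUtf8Size(s.length()));
        }
    }
}
```

A code point outside the Basic Multilingual Plane needs 4 bytes in UTF-8, but it also occupies two chars in the String, so it costs only 2 bytes per char and stays within the bound.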
from colfer.
What do you mean by "for now"? This must hold forever, even with malformed UTF-16 sequences.
from colfer.
--- a/ecma/test.js
+++ b/ecma/test.js
@@ -50,6 +50,7 @@ function newGoldenCases() {
'87ffffffffffffffff2e5da4e77f': {t: new Date(-223), t_ns: 888999},
'0801417f': {s: 'A'},
'080261007f': {s: 'a\x00'},
+ '0804f0908d887f': {s: '𐍈'},
'0809c280e0a080f09080807f': {s: '\u0080\u0800\u{10000}'},
'08800120202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020202020207f': {s: ' '},
'0901ff7f': {a: new Uint8Array([0xFF])},
… passes just fine.
from colfer.
Thank you for clarifying. So here are some specific suggestions:
- Java's size fit should probably use *4 or *5 as a general rule, assuming emojis are the norm rather than the exception.
- We should probably add ser/deser tests with characters across all 4 UTF-8 length ranges (1 to 4 bytes).
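A round-trip check of the kind suggested in the second point might look like the sketch below. It is an assumption-laden illustration: the class `Utf8RoundTrip` is hypothetical and uses only the standard library, whereas a real test would go through the generated marshal and unmarshal methods.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical round-trip check; it does not use Colfer's generated code. */
final class Utf8RoundTrip {

    /** Encodes s as UTF-8 and decodes it again. */
    static String roundTrip(String s) {
        byte[] data = s.getBytes(StandardCharsets.UTF_8);
        return new String(data, 0, data.length, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // one sample per UTF-8 encoding length: 1, 2, 3 and 4 bytes
        String[] samples = {"A", "\u00E9", "\u20AC", "\uD800\uDF48"};
        for (String s : samples) {
            if (!roundTrip(s).equals(s)) {
                throw new AssertionError("round trip failed for " + s.length() + " char(s)");
            }
        }
        System.out.println("all round trips match");
    }
}
```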
from colfer.
Ah, yes, based on characters, not code points. That should be okay for now.
from colfer.
By "for now" I meant: until Unicode ups the range dramatically.
from colfer.
Unicode won't up the range dramatically. It would also be against their own stability policy.
from colfer.