Comments (8)
If you return a ByteIterator
it doesn't box! https://kotlin.godbolt.org/z/P5rqb8vx7
I forgot Kotlin has these iterator specializations. We of course never had access to them in Okio historially since it spent most of its life written in Java.
That said, as soon as someone calls an extension on it like map
the game is over and we're back to boxing.
On Java all big-B Byte
instances are cached as long as you go through Byte.valueOf
. I have no idea what happens on other Kotlin targets though.
from kotlinx-io.
On the JVM this would incur boxing and a ton of virtual dispatching making the performance pretty awful. What are you trying to do?
from kotlinx-io.
If I understand correctly, implementing
Iterable
would only incur boxing when theiterator
method is called.
Every call to Iterator#next()
would box, and then immediately unbox because it has to travel through the interface.
Kotlin does feature iterable specializations that avoid the boxing (ByteIterable
), but I don't recall the for
/in
syntax using them. And certainly none of the transforming operators use them as you mention below.
I don't have one specific use case, but it seems all around practical because we immediately can use entire suites of extensions methods on
Iterable
such as Kotlin Collections from the standard library. Also, I like to make classes that takeIterable
constructor parameters. Then I build a list or set to get a safe immutable copy inside the class.
For actual collections of objects this makes sense. Unfortunately for bytes you are much better off treating them as large units. ByteString
is already immutable and represents a chunk which as a whole semantically represents something. Additionally, bytes are typically not processed using the operators one would normally use on other iterable things. Sometimes you do, sure, but it's very rare, and in those cases you need to operate directly on the bytes rather than through a (comparatively) heavy abstraction like Iterable
.
In the 10 years of Okio we have not needed to iterate a ByteString
, at least not in a way that couldn't be done by shallow-copying its backing segments to a Buffer
and operating on that directly or its backing ByteArray
s via UnsafeCursor
.
I think the particular thing I noticed was that there is no existing
forEach
extension for ByteString. I could quickly write it myself, but it just gave me this idea because implementing Iterable would make it work out of the box in convenient ways.
The need to iterate bytes like this is not entirely uncommon, but if you're operating at the level of bytes you generally want to ingest with as little overhead as possible because you're interpreting those bytes as gzip data, png data, mp3 data, or something else. You aren't looking at each byte one-by-one and doing something with it individually.
On the topic of performance, I try to first write code that is safe, easy to read, and easy to maintain and try not to let performance preocupy me too much since in many real life scenarios these performance differences will be negligible.
This is a good strategy for high-level code which deals with objects and things. Correctness first, performance second.
I'm not sure it makes sense at this layer of abstraction, however. Concrete use-cases of what you are trying to accomplish would go a long way in helping us understand if this is the right strategy or if there's a better one. Speculatively adding the feature, though, is unlikely to yield great results.
Also what's the point of computer hardware getting better if we don't let it shave off a few lines of code for us?
I'm not sure when you started using computers, but computers today are perceptively slower than older ones! Using programs on my 333Mhz PC twenty-five years ago was a much more responsive and snappy experience than my M3 Mac.
The pace at which we've introduced indirection and abstraction at every layer has outpaced Moore's law (which itself has mostly died). The laws of thermodynamics and the speed of light remain fixed, and we're smack up against them now.
The ability to process things efficiently, especially those represented in low-level constructs like a series of bytes, has actually become a competitive advantage for apps.
from kotlinx-io.
Nice discussion!
but I don't recall the
for
/in
syntax using them.
They should, but for now it is a hidden knowledge for no reason. Filed https://youtrack.jetbrains.com/issue/KT-67379/
Example
class ByteString : Iterable<Byte> {
class BIter(private val arr: ByteArray) : ByteIterator() {
private var idx = -1
override fun nextByte(): Byte = arr[++idx]
override fun hasNext(): Boolean = idx < arr.size - 1
}
// Experiment: replace ByteIterator with Iterator<Byte> and see the bytecode difference
override fun iterator(): ByteIterator = BIter(byteArrayOf(1, 2, 3))
}
@Test
fun sample() {
var sum: Int = 0
for (b in ByteString()) {
sum += b
}
}
I'm not sure it makes sense at this layer of abstraction, however. Concrete use-cases of what you are trying to accomplish would go a long way in helping us understand if this is the right strategy or if there's a better one. Speculatively adding the feature, though, is unlikely to yield great results.
This is quite a subtle topic, especially if/when we are going to move ByteString
into the standard library and have kotlin.String
and kotlin.ByteString
along each other, where it is reasonable to expect them to be more or less aligned.
On the one hand, for passer-by usages (and if we are successful, we expect plenty of them!), the impedance mismatch would be quite noticeable -- why for String
a user can iterate over characters, while for ByteString
they cannot iterate over their bytes/ascii characters? (if there will be such a need at all) It's hard to explain either -- boxing considerations is not a lingua we expect everyone to understand and reason with. Having non-included batteries in the standard library is always questionable.
On the other hand, at the very core, the API is quite specialized, and exposing inherently suboptimal operations is a questionable thing to do. Even if boxing is optimizeable via ByteIterator
, the iterator allocation is not (and, more importantly, byte-by-byte processing becomes more straightforward than batch processing that we encourage). This is a perfect use case for https://youtrack.jetbrains.com/issue/KT-33851/iterator-less-for-statement-operator-convention though
from kotlinx-io.
The ability to process things efficiently, especially those represented in low-level constructs like a series of bytes, has actually become a competitive advantage for apps.
I share this sentiment: for apps, for tools, for IDEs, for everything, to be honest.
This is the topic that puzzles me regularly -- from one perspective, we can only provide quite tailored APIs that are hard to misuse ([lack of] byte iteration is a nice example!); it might help, but comes with its own price. From the other perspective, being efficient cannot really be an afterthought for an app, and no amount of well-thought API can help it if it's not a consideration in the first place. All the convenience extensions will be written anyway if they are not here.
The place on the spectrum from "efficient low-level API" and "convenient yet probably slower API" is really elusive, esp. for IO layer.
from kotlinx-io.
Boxing is a topic I don't understand very well. Can someone point me to something I can read on this? Or is it something that can be explained here? I understand that the Iterator object will have to be instantiated, but I don't understand why the ByteString object will have to be repeatedly boxed and unboxed as opposed to boxed once and then have a reference of the box held by the iterator.
Iterator over ByteString
would implement Iterator<Byte>
.
When targeting JVM, Kotlin's Iterator
interface is mapped to Java java.util.Iterator
(javadoc).
Java prohibits generic type instantiation using primitive types.
As a result, iterator's fun next(): Byte
would need to return Byte-object, not a primitive type. But values stored inside ByteString
's backing array are primitive bytes, not objects. So every byte needs to boxed into java.lang.Byte
before returning from next
and then, most likely, it needs to be unboxed at a use-site.
Here's an example of an iterator over ByteArray
: https://godbolt.org/z/WE8nonbeK
As you can see, a value retrieved from the array is wrapper into j.l.Byte
via java/lang/Byte.valueOf
call.
from kotlinx-io.
Thanks for explaining. If I understand correctly, implementing Iterable
would only incur boxing when the iterator
method is called.
I don't have one specific use case, but it seems all around practical because we immediately can use entire suites of extensions methods on Iterable
such as Kotlin Collections from the standard library. Also, I like to make classes that take Iterable
constructor parameters. Then I build a list or set to get a safe immutable copy inside the class.
I think the particular thing I noticed was that there is no existing forEach
extension for ByteString. I could quickly write it myself, but it just gave me this idea because implementing Iterable would make it work out of the box in convenient ways.
I also feel convinced because since ByteString is immutable, it is particularly more safe to iterate than bytearray (even concurrently).
On the topic of performance, I try to first write code that is safe, easy to read, and easy to maintain and try not to let performance preocupy me too much since in many real life scenarios these performance differences will be negligable. Also what's the point of computer hardware getting better if we don't let it shave off a few lines of code for us?
If the consensus to implementing Iterable
is not to, I would consider an alternative solution maybe to be extension methods to match what we have in the standard library for ByteArray or String.
from kotlinx-io.
Thank you both for the thoughtful answers! I have a lot to learn here.
Every call to Iterator#next() would box, and then immediately unbox because it has to travel through the interface.
Boxing is a topic I don't understand very well. Can someone point me to something I can read on this? Or is it something that can be explained here? I understand that the Iterator object will have to be instantiated, but I don't understand why the ByteString object will have to be repeatedly boxed and unboxed as opposed to boxed once and then have a reference of the box held by the iterator.
In the 10 years of Okio we have not needed to iterate a ByteString
I see my request might be going against an important principle. That principle being that binary objects should be handled with bulk operations.
I think I can respect this principle but still make a case for supporting iteration. One reason is debugging. If a bulk operation on a binary object has an issue, it might be useful to iterate one byte at a time and see what is going wrong, maybe with a breakpoint in there.
Another reason is being able to maintain more of a flow state in your head when writing an operation on bytes in which you can immidiately think of how to do it byte-by-byte, but figuring out the appropriate bulk operation might take some more thought.
I'm not sure when you started using computers, but computers today are perceptively slower than older ones!
The pace at which we've introduced indirection and abstraction at every layer has outpaced Moore's law
I started around 2005. I think this is a fascinating perspective but I personally don't see it. I mean, unless we are talking about gradle builds, that's about the only thing that is slow for me. But even there once the configuration cache was introduced it got super fast (at least when the configuration cache works correctly!). Then again, I take what you are saying as the truth given you have more experience.
On the one hand, for passer-by usages (and if we are successful, we expect plenty of them!), the impedance mismatch would be quite noticeable -- why for String a user can iterate over characters, while for ByteString they cannot iterate over their bytes/ascii characters?
For the record that's definitely what brought me here! It just stuck me as strange, even through I can see now there are valid reasons for the abscence of an iteration API on ByteString.
This is indeed a nice topic. Maybe documenting the reasoning behind not including forEach
could be just as valuable as the community (if not more) than including it at all. Though I still think some of my reasons in this comment are worth considering. I'll definitely respect whatever decision you all make.
from kotlinx-io.
Related Issues (20)
- Consider moving core IO primitives to the standard library prior to their stabilization
- Investigate the potential mechanisms for project-wide opt-out from unsafe API HOT 6
- Improve SegmentPool test coverage
- Add support for writing other primitive arrays to Buffers HOT 2
- [help] How to read and write a file? HOT 4
- Add a way to check if Path points to a link HOT 2
- `readFromHead`/`writeToTail` should propagate the `Int` returned from the lambda HOT 2
- Lambda-accepting `UnsafeBufferOperations` need contracts for calls-in-place
- Rename `RealSource` and `RealSink` to `BufferedSource` and `BufferedSink` HOT 2
- Should `Source.buffer` be `@UnsafeIoApi` instead of `@InternalIoApi`? HOT 3
- Evaluate use of unsigned numbers for public API HOT 9
- Add support for atomic file creation
- Move smoke pre-release tests to kotlinx-io repo
- Provide Sink/Source implementations writing to stdout/reading from stdin HOT 11
- Use ByteArray.set<Type>At and get<Type>At to read/write from Segments on Native
- Tidy up samples to make them more kotlin-playground-friendly HOT 2
- FileSystem: missing a function to construct an absolute path from a relative one HOT 2
- FS: On Windows, use Win32 API accepting wide string to avoid issues with the file path encoding
- ByteString: provide ByteString-factory overload accepting unsigned bytes
- Temporary file creation support HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from kotlinx-io.