Comments (4)
That's an even better fix! I tested that locally and it worked, but assumed failing the wpt tests was a non-starter, so didn't bother suggesting it.
In most cases I would agree, but in this case there's no way to have an efficient implementation that passes the tests. For the tests to pass, you have to make the original ArrayBuffer appear "neutered", and there's just no "good" way to do that.
I'm hoping the ArrayBuffer.prototype.transfer() proposal goes through, so we can provide a correct and efficient implementation on modern browsers. Until then, I'm okay with breaking these couple of tests.
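For reference, a real transfer really does detach the source buffer. A small sketch of that observable side effect, using structuredClone with a transfer list (available in modern Node and browsers) to perform the transfer:

```javascript
// Demonstrating the detach ("neuter") side effect a real transfer produces:
// structuredClone with a transfer list moves the buffer, detaching the source.
const original = new ArrayBuffer(8);
const moved = structuredClone(original, { transfer: [original] });

console.log(original.byteLength); // 0 — original is now detached
console.log(moved.byteLength);    // 8 — the data lives on in the new buffer
```

This is exactly the behavior that is expensive to fake without a native transfer primitive.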
In that method thisPool is a node Buffer, where slice() is the same as TypedArray's subarray(), so it doesn't incur a copy. Confusing, I know 😕.
Wow, TIL that Buffer.slice != TypedArray.slice! That's a very unfortunate naming conflict. 😝
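To make the difference concrete (run under Node, where Buffer is a global): Buffer#slice returns a view over the same memory, while TypedArray#slice returns an independent copy.

```javascript
// Buffer#slice (now deprecated in favor of subarray) is a zero-copy view:
const buf = Buffer.from([1, 2, 3, 4]);
const view = buf.slice(0, 2);
view[0] = 99;
console.log(buf[0]); // 99 — writing through the "slice" mutated the original

// TypedArray#slice, by contrast, copies:
const copy = Uint8Array.prototype.slice.call(buf, 0, 2);
copy[1] = 42;
console.log(buf[1]); // 2 — the original is untouched
```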
If I understand correctly, a Node ReadStream creates an ArrayBuffer as its "pool", and uses that to fulfill multiple reads. Once the pool is full, it allocates a new one and uses that for future reads. And when all references to the original pool are gone (because all references to the pushed chunks are gone), the original ArrayBuffer gets garbage collected.
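A hypothetical sketch of that pooling strategy (not Node's actual code; POOL_SIZE and readChunk are made-up names for illustration):

```javascript
// Hypothetical sketch of the pooling strategy described above.
const POOL_SIZE = 1024;
let pool = null;
let offset = 0;

function readChunk(nBytes) {
  if (pool === null || offset + nBytes > pool.byteLength) {
    // Pool exhausted: allocate a fresh one. The old pool stays alive only as
    // long as previously returned chunks still reference it; once those are
    // gone, it gets garbage collected.
    pool = new ArrayBuffer(POOL_SIZE);
    offset = 0;
  }
  const chunk = new Uint8Array(pool, offset, nBytes);
  offset += nBytes;
  return chunk;
}
```

Consecutive small reads share one allocation; a read that doesn't fit triggers a new pool.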
Yeah, it would be cool if we could do that with readable byte streams... You could almost get it to work with:
outer: while (true) {
  let pool = new ArrayBuffer(1024);
  let offset = 0;
  while (offset < pool.byteLength) {
    const { done, value } = await reader.read(new Uint8Array(pool, offset));
    if (done) {
      break outer;
    }
    pool = value.buffer;
    offset += value.byteLength;
    this.push(value);
  }
}
But then the next call to reader.read() would invalidate all previously pushed chunks using that same pool. Unless you can somehow guarantee that those previous chunks are no longer needed when the next chunk is read... but that's probably not possible. 😞
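You can see why with a tiny simulation: once the pool's buffer is transferred (here simulated with structuredClone, since a BYOB read() transfers the buffer it is given), any earlier view on it collapses to zero length:

```javascript
const pool = new ArrayBuffer(16);
const earlierChunk = new Uint8Array(pool, 0, 4); // a previously "pushed" chunk

// A BYOB read() transfers the buffer; simulate that transfer here:
structuredClone(pool, { transfer: [pool] });

console.log(earlierChunk.byteLength); // 0 — the old view is now unusable
```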
Yep, and this is where we reach the limits of our ability to cleanly bridge all these APIs. The "lowest-common-denominator" concept in all these APIs is buffer message-passing.
I see. Indeed, that's quite a challenge to get a common I/O API across all platforms.
At the end of the day, forcing a copy anywhere (node or the browser) means we can't do things like this as flawlessly:
That looks amazing! Good luck with the project! 👍
from web-streams-polyfill.
Indeed, TransferArrayBuffer aims to replicate the side effects of transferring an ArrayBuffer, but in order to do so it copies the ArrayBuffer, which is exactly what you want to avoid with a transfer! This is fine for the purposes of the reference implementation, but unacceptable for a polyfill.
I think I'll replace TransferArrayBuffer and IsDetachedBuffer with dummy implementations in the polyfill. All WPT tests related to transferring buffers are in readable-byte-streams/detached-buffers.any.js, so I can just exclude that one file in the polyfill's test suite.
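A dummy implementation in that spirit might look like this (a hypothetical sketch, not the polyfill's actual code):

```javascript
// Hypothetical no-op stand-ins: skip the copy entirely and never report
// a buffer as detached.
function TransferArrayBuffer(buffer) {
  return buffer; // hand back the same buffer instead of copying it
}

function IsDetachedBuffer(buffer) {
  return false; // pretend detachment never happens
}
```

This trades spec-exact detach semantics for zero-copy behavior, which is precisely the trade-off being discussed.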
Node often uses internal ArrayBuffer pools (like here in fs.ReadStream), so the Object.defineProperty call is modifying the ArrayBuffer pool, breaking unrelated operations dependent on that pool.
Although we can fix this for the polyfill, I think this can still cause issues in the future when using a native readable byte stream implementation. A native implementation would actually transfer the ArrayBuffer, so any references to the old buffer become "neutered" and cannot be re-used.
In your next function, you pass the Uint8Array returned by reader.read() to controller.enqueue(). The enqueue method transfers the underlying buffer, so that buffer can never be used for a future call to reader.read(). If I understand your question correctly, you want these buffers to come from a pool. This can become a problem if two calls to reader.read() are allowed to return two Uint8Array views on the same ArrayBuffer from that pool.
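In runtimes with a native byte stream implementation, the transfer performed by enqueue() is directly observable: the chunk's buffer is detached as soon as it is enqueued. A small check, assuming ReadableStream with type: 'bytes' is available (as in modern Node and browsers):

```javascript
let lengthAfterEnqueue;
new ReadableStream({
  type: 'bytes',
  start(controller) {
    const chunk = new Uint8Array(8);
    controller.enqueue(chunk);
    // enqueue() transferred chunk.buffer, so it is now detached:
    lengthAfterEnqueue = chunk.buffer.byteLength;
  }
});
console.log(lengthAfterEnqueue); // 0 in a spec-conformant implementation
```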
The "intended" way to do zero-copy reads with a readable byte stream is to use controller.byobRequest instead of controller.enqueue(). Therefore, you must have your reader use a given buffer to store the read data in, rather than having your reader return a (new or re-used) buffer. For example, assuming you had a reader.readInto(view: ArrayBufferView): Promise<number> method that returns the number of bytes read into view, you could wrap a readable byte stream around that:
const readable = new ReadableStream({
  type: 'bytes',
  async pull(controller) {
    const byobRequest = controller.byobRequest;
    if (byobRequest) {
      // using a BYOB reader
      const bytesRead = await reader.readInto(byobRequest.view);
      if (bytesRead === 0) {
        controller.close();
        byobRequest.respond(0); // signal end-of-stream to the outstanding BYOB request
      } else {
        byobRequest.respond(bytesRead);
      }
    } else {
      // using a default reader
      const buf = await reader.read();
      if (!buf) {
        controller.close();
      } else {
        controller.enqueue(buf.slice()); // need to copy if `buf.buffer` could be re-used
      }
    }
  }
});
Alternatively, you can set autoAllocateChunkSize so you always have a byobRequest even when a default reader is used. That way, you only need the if block in the above pull() method.
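As a sketch, here is a byte stream that only ever services byobRequest, with autoAllocateChunkSize covering default reads (the in-memory source data here is just an example):

```javascript
const data = Uint8Array.from([1, 2, 3, 4, 5]);
let pos = 0;

const readable = new ReadableStream({
  type: 'bytes',
  autoAllocateChunkSize: 2, // default reads now get an auto-allocated byobRequest
  pull(controller) {
    const { byobRequest } = controller;
    const view = byobRequest.view;
    const n = Math.min(view.byteLength, data.length - pos);
    if (n === 0) {
      controller.close();
      byobRequest.respond(0); // release the outstanding request on close
      return;
    }
    new Uint8Array(view.buffer, view.byteOffset, n).set(data.subarray(pos, pos + n));
    pos += n;
    byobRequest.respond(n);
  }
});

// Even a default reader now goes through the BYOB path:
const collected = (async () => {
  const reader = readable.getReader();
  const out = [];
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return out;
    out.push(...value);
  }
})();
```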
Some off-topic remarks:
- You shouldn't need both start and pull methods if both simply do await next(controller). You can leave out the start method.
- In your next function, you don't need a loop. If the internal queue size (as indicated by controller.desiredSize) is greater than 0, the stream implementation will automatically call your pull method again. 😉
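That is, with a highWaterMark above zero the stream keeps calling pull() on its own until the queue is full (or the stream closes), no loop required. A quick demonstration:

```javascript
let pulls = 0;
const rs = new ReadableStream({
  pull(controller) {
    pulls++;
    controller.enqueue(pulls);
    if (pulls === 3) controller.close();
  }
}, { highWaterMark: 3 });
// The stream calls pull() again on its own while desiredSize > 0, so after a
// tick pulls reaches 3 without any explicit loop inside pull().
```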
The "intended" way to do zero-copy reads with a readable byte stream is to use controller.byobRequest instead of controller.enqueue().
Unfortunately, for this to be truly zero-copy, you would need to use BYOB readers everywhere, which would require a massive rewrite... I guess that's not really an option.
Perhaps an easier option would be to always copy the chunk before enqueuing it:
async function next(controller: ReadableStreamDefaultController<Uint8Array>) {
  let buf = await reader.read(controller.desiredSize || null);
  if (buf) {
    controller.enqueue(buf.slice());
  } else {
    controller.close();
  }
}
That way, you don't have to change the rest of the code, at the cost of one copy per chunk (given that the polyfill is fixed to not copy in TransferArrayBuffer). However, it seems like this copy is unavoidable: even Node's file reader copies the read bytes to avoid consumers messing with the pool.
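And on the enqueue side, TypedArray#slice() (as used in the next() snippet above) really does make an independent copy:

```javascript
const src = Uint8Array.from([1, 2, 3]);
const copied = src.slice();
src[0] = 9;
console.log(copied[0]); // 1 — mutating the source doesn't affect the copy
console.log(copied.buffer === src.buffer); // false — separate backing buffer
```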
I think I'll replace TransferArrayBuffer and IsDetachedBuffer with dummy implementations in the polyfill. All WPT tests related to transferring buffers are in readable-byte-streams/detached-buffers.any.js, so I can just exclude that one file in the polyfill's test suite.
That's an even better fix! I tested that locally and it worked, but assumed failing the wpt tests was a non-starter, so didn't bother suggesting it.
In your next function, you pass the Uint8Array returned by reader.read() to controller.enqueue(). The enqueue method transfers the underlying buffer, so that buffer can never be used for a future call to reader.read(). If I understand your question correctly, you want these buffers to come from a pool. This can become a problem if two calls to reader.read() are allowed to return two Uint8Array views on the same ArrayBuffer from that pool.
Yes, that's the heart of the issue. I'm happy to conform to the semantics of whatever APIs I need to get the job done, so in my mind it's not so much about what I want (aside from ensuring zero-copy), and more about making the polyfill play nice with node.
Although we can fix this for the polyfill, I think this can still cause issues in the future when using a native readable byte stream implementation. A native implementation would actually transfer the ArrayBuffer, so any references to the old buffer become "neutered" and cannot be re-used.
Yes, but at the moment the only pooling that's happening is in node core, so if node ever gets a native ReadableByteStream implementation, they'll have to deal with this issue at that point anyway.
even Node's file reader copies the read bytes to avoid consumers messing with the pool
In that method thisPool is a node Buffer, where slice() is the same as TypedArray's subarray(), so it doesn't incur a copy. Confusing, I know 😕.
The "intended" way to do zero-copy reads with a readable byte stream is to use controller.byobRequest instead of controller.enqueue(). Therefore, you must have your reader use a given buffer to store the read data in, rather than having your reader return a (new or re-used) buffer. For example, assuming you had a reader.readInto(view: ArrayBufferView): Promise<number> method that returns the number of bytes read into view, you could wrap a readable byte stream around that:
Yes, I actually do this a few steps before in AdaptiveByteReader#read(size?). Allow me to clarify a bit, because the code I've linked to in this issue is... special.
The Apache Arrow project is a streaming zero-copy dataframe standard for large analytics workloads. We compile many flavors of the JS library for both node and browsers, and we want to provide a simple and consistent public API for reading/writing data regardless of which environment you're in. We're sensitive to bundle size, and would like to use the native APIs in each environment, so we accept and produce the environment's stream primitives at the edges, and adapt them to Iterables and AsyncIterables for use internally.
We have a few generic io primitive impls, but they're intentionally slim wrappers that adapt the native io primitives to our Iterable/AsyncIterable protocol. We could split everything out and build special versions for each environment, but doing this means there's just one npm package to consume, maximizes code reuse (I'm the main JS contributor at the moment, and I'm stretched pretty thin), and reduces the surface area/complexity for future maintainers.
The "intended" way to do zero-copy reads with a readable byte stream is to use controller.byobRequest instead of controller.enqueue().
Unfortunately, for this to be truly zero-copy, you would need to use BYOB readers everywhere, which would require a massive rewrite... I guess that's not really an option. 😕
Perhaps an easier option would be to always copy the chunk before enqueuing it:
Yep, and this is where we reach the limits of our ability to cleanly bridge all these APIs. The "lowest-common-denominator" concept in all these APIs is buffer message-passing. I could bake the "byob" concept into our "bridge" primitives, but then I force a copy in node, and am ultimately just rebuilding whatwg reference impl. At the end of the day, forcing a copy anywhere (node or the browser) means we can't do things like this as flawlessly:
Thanks for considering this change!
Best,
Paul