wicg / direct-sockets

Direct Sockets API for the web platform

License: Other

HTML 100.00%

direct-sockets's Introduction

direct-sockets

This is a proposal for a future Direct Sockets web platform API.

direct-sockets's Issues

support for setting IP_TOS

how open are we to exposing more socket properties ? i'd really like to be able to set IP_TOS for interactive connections.
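
For reference, a minimal sketch of the value such an option would carry. The IPv4 TOS byte holds the 6-bit DSCP in its upper bits, so a DSCP class has to be shifted left by two before it is handed to IP_TOS. The helper name `dscpToTosByte` is hypothetical, and the proposal currently defines no option that would accept its result.

```typescript
// Hypothetical helper: not part of the Direct Sockets proposal.
// The IPv4 TOS byte carries the 6-bit DSCP value in its upper six bits;
// the lower two bits are used for ECN and are left at zero here.
function dscpToTosByte(dscp: number): number {
  if (!Number.isInteger(dscp) || dscp < 0 || dscp > 63) {
    throw new RangeError("DSCP must be an integer in the range 0..63");
  }
  return dscp << 2;
}

// DSCP 46 is Expedited Forwarding, a class commonly used for interactive traffic.
const expeditedForwardingTos = dscpToTosByte(46); // 184 (0xb8)
```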

specifying a hostname in the user consent prompt is useless without DNS rebinding protection

The security considerations section says that user consent would be host-specific, but also that DNS rebinding protection would be limited to preventing connections to "private network addresses". There are two big problems with this:

- If the DNS rebinding protection is only against rebinding to private IPs, you'd effectively grant permission to connect to any non-private IP at a specific port; the host component of the user consent is more or less useless. Perhaps this part could be improved a bit by requiring that the reverse DNS of the IP address matches the original hostname or so.
- In IPv6, it is perfectly normal to have publicly routable IP addresses inside home networks and such. Filtering simply by a fixed list of private address blocks is useless for preventing connections to the local network.
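
The second point can be illustrated with a sketch of a fixed-list filter. The function name `isOnFixedPrivateList` and the exact set of blocks are assumptions chosen for illustration; the explainer's actual list is not reproduced here. The parsing is deliberately simple and only handles plain address literals.

```typescript
// Sketch of a "fixed list of private address blocks" check (assumed list).
function isOnFixedPrivateList(address: string): boolean {
  if (address.includes(":")) {
    const lower = address.toLowerCase();
    if (lower === "::1") return true; // IPv6 loopback
    const firstGroup = parseInt(lower.split(":")[0] || "0", 16);
    const uniqueLocal = (firstGroup & 0xfe00) === 0xfc00; // fc00::/7
    const linkLocal = (firstGroup & 0xffc0) === 0xfe80; // fe80::/10
    return uniqueLocal || linkLocal;
  }
  const parts = address.split(".");
  if (parts.length !== 4 || !parts.every((p) => /^\d{1,3}$/.test(p) && Number(p) <= 255)) {
    throw new TypeError(`not an IP address literal: ${address}`);
  }
  const a = Number(parts[0]);
  const b = Number(parts[1]);
  return (
    a === 10 || // 10.0.0.0/8
    a === 127 || // loopback
    (a === 172 && b >= 16 && b <= 31) || // 172.16.0.0/12
    (a === 192 && b === 168) || // 192.168.0.0/16
    (a === 169 && b === 254) // link-local
  );
}
```

A typical IPv4 home address such as 192.168.1.10 is caught, but a device on the same home network that only has a globally routable IPv6 address (2001:db8::1 stands in for one here) is not, which is the gap described above.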

Consider using streams for UDP sockets as well

There are basically two models for programming against UDP sockets:

1. Event style, where it's possible to tune in and out of the socket at any time, possibly missing messages
2. Streams-with-no-backpressure, where all messages accumulate into an infinitely-growing buffer, and get processed by the consumer in order.

This spec seems to have chosen (2), since it uses async iterables (which have a built-in buffer of that sort)... but it did so without using the streams API. If (2) is indeed the desired design, then I suggest using ReadableStream / WritableStream.

FWIW, we had a similar discussion in the early days of the streams API around https://github.com/sysapps/tcp-udp-sockets, which ended up using (2).
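
To make the difference between the two models concrete, here is a toy sketch. Both classes and the `deliver` hook (which stands in for a datagram arriving from the network) are hypothetical and do not correspond to anything in the spec or in the streams API.

```typescript
type Datagram = Uint8Array;

// Model 1: event style. A datagram that arrives while no listener is
// attached is simply dropped.
class EventStyleSocket {
  private listener: ((datagram: Datagram) => void) | null = null;

  setListener(listener: ((datagram: Datagram) => void) | null): void {
    this.listener = listener;
  }

  deliver(datagram: Datagram): void {
    if (this.listener) this.listener(datagram);
  }
}

// Model 2: stream without backpressure. Every datagram is queued until the
// consumer reads it, so the buffer can grow without bound.
class QueueStyleSocket {
  private queue: Datagram[] = [];

  deliver(datagram: Datagram): void {
    this.queue.push(datagram);
  }

  read(): Datagram | undefined {
    return this.queue.shift();
  }

  get buffered(): number {
    return this.queue.length;
  }
}
```

With model 1, a consumer that attaches late never sees earlier datagrams; with model 2, it sees all of them in order, at the cost of memory if it reads too slowly.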

Align with WebSocketStream on opening

WebSocketStream (shipped in Chromium) is an example of a duplex stream of the type used here for TCP sockets. However, the design differs from yours in how the stream is created and opened. It would probably be worth aligning.

See WebSocketStream API design and the explainer. I can't actually find the spec... @ricea?

How to play with the API?

Hello, I'm very enthusiastic about this API and was wondering what is the current approach for getting it to run?

I did try to read some of the discussion here: https://groups.google.com/a/chromium.org/g/blink-reviews/search?q=direct%20sockets but I can honestly say I don't understand the code or C++ well enough to follow what's being said.

Also, conservatively, when would the team expect to have something they could ship in the main Chromium? Even as a conditionally disabled feature? Very interested in this work, thank you.

Create a ReadableByteStream for TCP sockets

TCP is a byte stream transport so it should create a ReadableByteStream rather than a regular stream. This will also allow developers to create a "byob" stream reader and avoid allocating memory for receive buffers.

Define high trust mode

Hi! The proposal says “The api will only be available in high-trust mode.”

Is this high trust mode specified/described somewhere? If so, can you link it, please? If not, can you describe it here?

Support connectionless UDP sockets

The current specification only supports UDP sockets which are bound to a remote address and port. To listen for incoming UDP packets from any host the remoteAddress and remotePort parameters must be made optional. This will also require adding support for specifying a remoteAddress and remotePort when constructing individual UDP messages.
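
A sketch of how the two modes could be told apart once the parameters are optional. All names here (`UdpSocketOptionsSketch`, `UdpMessageSketch`, `destinationFor`) are hypothetical, and rejecting a per-message destination on a socket that already has a remote peer is an assumption, not something the specification states.

```typescript
interface UdpSocketOptionsSketch {
  remoteAddress?: string;
  remotePort?: number;
}

interface UdpMessageSketch {
  data: Uint8Array;
  remoteAddress?: string;
  remotePort?: number;
}

interface UdpDestination {
  address: string;
  port: number;
}

// Decide where an outgoing message goes, depending on how the socket was opened.
function destinationFor(options: UdpSocketOptionsSketch, message: UdpMessageSketch): UdpDestination {
  const { remoteAddress: socketAddress, remotePort: socketPort } = options;
  const { remoteAddress: messageAddress, remotePort: messagePort } = message;

  if ((socketAddress === undefined) !== (socketPort === undefined)) {
    throw new TypeError("remoteAddress and remotePort must be given together");
  }
  if (socketAddress !== undefined && socketPort !== undefined) {
    // Socket bound to a remote peer: every message goes to that peer.
    if (messageAddress !== undefined || messagePort !== undefined) {
      throw new TypeError("messages on this socket cannot name their own destination");
    }
    return { address: socketAddress, port: socketPort };
  }
  // Connectionless socket: each message must carry its own destination.
  if (messageAddress === undefined || messagePort === undefined) {
    throw new TypeError("messages on a connectionless socket need remoteAddress and remotePort");
  }
  return { address: messageAddress, port: messagePort };
}
```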

Accessing other services via TCP is not mentioned in threats

Example: Many organizations permit submitting mail from a user's PC unencrypted and unauthenticated.
This extension would allow any Web page or service to connect to port 25 of the mailserver and inject mail.

IPv6 zone index handling

might be getting a little ahead of things, but can we make sure that zone indexes are handled by the spec, even if as a non-normative section ?
https://en.wikipedia.org/wiki/IPv6_address#Scoped_literal_IPv6_addresses_(with_zone_index)

i mention this as it's easy to overlook, and would be nice to have it as part of the initial spec when projects start to implement it to help mitigate.

i don't know if there are other web standards that expose interface level information that could be used as precedent. WebRTC maybe?
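
For illustration, a minimal sketch of the parsing step an implementation would need for scoped literals such as `fe80::1%eth0`. The function name `splitZoneIndex` is hypothetical, and the sketch only separates the zone from the address; it does not validate either part or decide how the zone is applied to the socket.

```typescript
// Split a scoped IPv6 literal into its address and zone index parts.
// Everything after the first "%" is treated as the zone.
function splitZoneIndex(literal: string): { address: string; zone?: string } {
  const percent = literal.indexOf("%");
  if (percent === -1) return { address: literal };
  const address = literal.slice(0, percent);
  const zone = literal.slice(percent + 1);
  if (address.length === 0 || zone.length === 0) {
    throw new TypeError(`malformed scoped address: ${literal}`);
  }
  return { address, zone };
}
```

For example, `splitZoneIndex("fe80::1%eth0")` yields the address `fe80::1` with zone `eth0`, while an unscoped literal comes back with no zone.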

Code injection via ads is not mentioned in threats

The most common method of having a page run JavaScript the page doesn't know about is to insert it in an ad (see, for instance, Geoff Huston's IPv6 penetration measurements, which are mainly done via Google ads).

The security implications of this form of code injection need to be explored.

Security implications of installing a kernel mode driver (such as WinPcap) to enable TCP raw sockets on Windows.

On Windows versions >= Windows XP SP2 raw sockets not only require administrator privileges but are extremely limited, to the point that sending TCP data over raw sockets is impossible. The standard way to work around that is to install a third party driver such as WinPcap which for obvious reasons is a potentially serious security risk, especially if the functionality of such a driver is exposed to ordinary users instead of being restricted to administrators only.

In addition, such drivers are typically detected and blocked by most antivirus/antimalware software and even if Chrome/Chromium came with its own such driver rather than just installing WinPcap it would almost certainly be flagged as a "Hacktool" by such security software as well.

dynamic socket settings

currently the various socket settings can only be set once at construction (connect) time. i don't understand why. *NIX platforms allow all of these (send buf size, recv buf size, keep alive, nodelay) to be modified on the fly. Windows does too afaik. is there a platform that doesn't allow this that the API is catering to ?

having this limitation makes it difficult to map existing POSIX code onto the web platform without rewriting/forking things.

Backend uses in Chrome Headless

Hi.
There is also a very promising use case: Chrome Headless on the backend.
IBM browser functions

per-endpoint user consent is incompatible with Distributed Hash Table usecase

https://github.com/WICG/raw-sockets/blob/master/docs/explainer.md lists "Distributed Hash Tables for P2P systems" as a usecase, but the "security considerations" section seems to suggest that user consent would be required for each host. This combination seems infeasible.

dangerous interaction with application-level protocol sniffing in routers

APIs that allow attackers to connect to attacker-owned servers on arbitrary low ports are dangerous because such connections may be interpreted by routers/firewalls that attempt to enable reverse connections on dynamic, separate ports by sniffing application traffic.

This is why Flash banned traffic to low ports to fix CVE-2017-2938.

For an illustration of how such an attack might look, see e.g. https://samy.pl/slipstream/.

Huge P2P Potential

I just want to mention how much I'm in favour of this API. Current in-browser p2p is a lame duck because of NAT. In order to bootstrap a mesh, we have to rely on centralized websockets servers for signaling. The introduction of raw sockets opens up the possibility for in-browser servers, which I believe to be essential for the next generation of the web. Of course we do want a permission prompt when the user visits.

DTLS for UDP

The explainer states, "We should facilitate use of TLS on TCP connections." The corollary should be, "We should facilitate use of DTLS on UDP connections."

FWIW, WebRTC relies on DTLS so supporting User Agents should already have this capability implemented.

Bypassing CORS would be very useful

Threat

Attackers may use the API to by-pass third parties' CORS policies.

Mitigation

We could forbid the API from being used for TCP with the well known HTTPS port, whenever the destination host supports CORS.

A lot of services (Reddit and Twitter for instance) are available over the web yet have a CORS policy that forbids third-party websites from accessing them. That is so because they don’t want malicious scripts on third-party sites accessing their data.

Third-party clients for such services are in demand. The only way to build one at the moment is as a native app, and there are plenty of those for the aforementioned Reddit and Twitter.

I’ve personally made a third-party client for jeuxvideo.com’s forums in the past (biggest forums in Europe, and my app was eventually responsible for 10% of the messages posted there), which was loved both by users and the staff as it resulted in much more engagement. It was a web app though, with a back-end proxying the site, and it was delicate to do as the developers had to make special exceptions in their infrastructure specifically for my service when they had a lot of other priorities, resulting in my service often rendered unavailable.

Making a third-party client as a web app makes it available on all platforms, without requiring a download, while requiring less effort to implement it than on just one platform natively, and there are more developers available for the web platforms that can create such solutions. It’s an absolutely wonderful thing.

As long as the user is made aware clearly of what allowing Raw Sockets on an HTTPS service entails, I believe this should be made possible as it would unlock a ton of useful and very accessible services, without more harm done than is possible with non-HTTPS services accessed through Raw Sockets.

General explainer review

I did a read-through of the explainer and in addition to some more specific issues I will file, here are some minor ones. Especially-important ones are in bold. The ordering is roughly top-of-document to bottom-of-document.

The frequent mention of XMLHttpRequest is a bit strange and distracting, since that API has been superseded by the fetch API. I'd suggest just replacing mentions of it with fetch.

The "Initial Focus" section could be clearer on which bullet points from "Use cases" are in scope vs. out of scope for the initial focus. It's clear DHTs are out of scope but I'm unclear if avoiding "IP multicast and UDP Broadcast" removes other possibilities. I'd suggest a summary paragraph like: "In terms of the [use cases] identified above, this means our initial focus will be on solving: X, Y, Z, and W. Whereas, A, B, and C will not be solvable with the current proposal."

The "Permissions Policy integration" section could benefit from some background explanation of why permissions policy should be integrated at all. Something like, "This means that the direct sockets APIs will not be available to cross-origin subframes, unless the outer page explicitly delegates that ability."

"Security Considerations" mentions "high-trust mode" but does not give any further detail. This should link somewhere, probably.

The mitigation for CORS bypasses should mention this is not a complete mitigation as it means you could attack servers using anything besides "the well known HTTPS port" (which I assume means 443). E.g. you could attack port 80 or port 8080 or port 3000.

"User agents should reject connection attempts when Content Security Policy allows the unsafe-eval source expression": is this a "should" or a "must"/"will be required to"? In general any use of "should" is a bit suspicious.

The explainer gestures at a user-facing security model several times but never explains it. E.g. it talks about users typing things, or about an option to permit future connections from an origin to specific hosts. I think you need an up-front section, probably right after "Initial Focus", that explains this. Even if the goal is not for the spec to constrain user agent UI, you should explain what Chromium is planning to implement, as an example of the kind of UI that this API and your security analysis is being designed around.
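
The CORS point in the list above can be shown with a very small sketch. It assumes, as the review does, that "the well known HTTPS port" means 443, and it ignores the "whenever the destination host supports CORS" condition; the function name is hypothetical.

```typescript
// Sketch of the proposed mitigation reduced to its port check (assumed to be 443).
function blockedByHttpsPortRule(port: number): boolean {
  return port === 443;
}

// Ports on which HTTP services are commonly reachable.
const candidatePorts = [443, 80, 8080, 3000];
const stillReachable = candidatePorts.filter((port) => !blockedByHttpsPortRule(port));
// stillReachable is [80, 8080, 3000]
```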

Security implications of Chrome/Chromium running suid root (Linux, possibly BSD), which is required for raw sockets.

In order to be able to use raw sockets on Linux (and presumably BSD) a program must be run with root privileges, which means making Chrome/Chromium suid root. Even if said privileges are dropped immediately after setting CAP_NET_RAW (and presumably CAP_NET_ADMIN for setting promiscuous mode and/or MAC spoofing) the security risks are still significantly higher than not having it run as root at all.

Support TCP server sockets

Define a TCPServerSocket interface which provides a single ReadableStream where each "chunk" is a TCPSocket. Similar to WebTransport.incomingBidirectionalStreams.

Rename this project

The current name might be associated with IPPROTO_RAW / SOCK_RAW, as documented in https://linux.die.net/man/7/raw

The name native sockets has been suggested.

Massive security concerns

So, there are clearly going to be massive security concerns with this API... as there have been with every attempt to do this in the past (e.g., https://github.com/sysapps/tcp-udp-sockets ... to Web Sockets themselves). Before we even start on this, it might be worth having a call with various folks around how feasible it is to even attempt this API.

no way to control IPv4 or IPv6 hostname resolution

sadly, IPv6 / IPv4 dual stack reliability is still not a thing everywhere. as such, it can be helpful for applications to expose forcing of a particular IP version when connecting via a hostname. the currently proposed API lacks such a knob. and since there's no DNS type of API available currently either, it's (practically speaking) impossible to work around.

can we add another property to TCPSocketOptions & UDPSocketOptions to control this ? would want to support at least 3 values -- any (the default), ipv4, and ipv6. i have no opinions as how best to encode that in the standard.
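
One way such a knob could behave, as a sketch only: the type name `IpVersionPreference`, its string values, and the function `filterByIpVersion` are all hypothetical, and the input list stands in for whatever addresses the user agent's resolver returned for the hostname.

```typescript
type IpVersionPreference = "any" | "ipv4" | "ipv6";

// Keep only the resolved addresses that match the requested IP version.
// Address family is detected by the presence of ":" in the literal.
function filterByIpVersion(addresses: readonly string[], preference: IpVersionPreference): string[] {
  if (preference === "any") return addresses.slice();
  const wantIpv6 = preference === "ipv6";
  return addresses.filter((address) => address.includes(":") === wantIpv6);
}
```

With `["2001:db8::1", "192.0.2.1"]` as the resolver result, `"ipv4"` leaves only `192.0.2.1`, `"ipv6"` leaves only `2001:db8::1`, and `"any"` keeps both in their original order.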

constructor() method buffer sizes

the spec currently says:

If options["sendBufferSize"] is equal to 0, throw a TypeError.
If options["receiveBufferSize"] is equal to 0, throw a TypeError.

shouldn't those be less than or equal to 0 ?
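
The stricter check being asked about would look roughly like this. The function and interface names are hypothetical, and whether a negative value can reach this step at all depends on the IDL type of the members, which is not quoted here.

```typescript
interface BufferSizeOptionsSketch {
  sendBufferSize?: number;
  receiveBufferSize?: number;
}

// Reject zero and negative buffer sizes; omitted members are left alone.
function checkBufferSizes(options: BufferSizeOptionsSketch): void {
  if (options.sendBufferSize !== undefined && options.sendBufferSize <= 0) {
    throw new TypeError("sendBufferSize must be greater than 0");
  }
  if (options.receiveBufferSize !== undefined && options.receiveBufferSize <= 0) {
    throw new TypeError("receiveBufferSize must be greater than 0");
  }
}
```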

keepAliveDelay setting: milliseconds vs boolean

the TCP constructor defines keepAliveDelay as so:

https://wicg.github.io/direct-sockets/#tcpsocketoptions-dictionary
keepAliveDelay member
Enables TCP Keep-Alive by setting SO_KEEPALIVE option on the socket equal to the specified Keep-Alive ping delay in milliseconds.

i don't understand this as SO_KEEPALIVE is a boolean in the UNIX & Windows worlds, it isn't a ping delay in milliseconds.

POSIX -- "Non-zero requests periodic transmission of keepalive messages (protocol-specific)."
Windows -- "This value is treated as a boolean value with 0 used to indicate FALSE (disabled) and a nonzero value to indicate TRUE (enabled)."

so what system/API is this targeting, and how are implementers expected to support it ?
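
One possible reading, sketched below, is that a single keepAliveDelay value is meant to drive two separate options: the boolean SO_KEEPALIVE and a platform-specific idle timer. On Linux that timer is TCP_KEEPIDLE, which is expressed in seconds rather than milliseconds. This mapping is an assumption for illustration, not something the spec text confirms, and the function name is hypothetical.

```typescript
interface LinuxKeepAliveSettings {
  SO_KEEPALIVE: 0 | 1; // boolean switch
  TCP_KEEPIDLE?: number; // idle time before the first probe, in seconds
}

// Translate a millisecond delay into the pair of Linux socket options.
function keepAliveDelayToLinuxOptions(keepAliveDelayMs?: number): LinuxKeepAliveSettings {
  if (keepAliveDelayMs === undefined) return { SO_KEEPALIVE: 0 };
  if (!(keepAliveDelayMs > 0)) {
    throw new TypeError("keepAliveDelay must be a positive number of milliseconds");
  }
  // Round up so that a sub-second delay still yields the 1 second minimum.
  return { SO_KEEPALIVE: 1, TCP_KEEPIDLE: Math.ceil(keepAliveDelayMs / 1000) };
}
```

Under this reading, a delay of 1500 ms would enable SO_KEEPALIVE and set TCP_KEEPIDLE to 2, which also shows that millisecond precision is lost on such a platform.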

Suggestion: condition this on a web permission + CSP + CORS + explicit permissions request to provide a strong layer of security

I was reading through #1 and w3ctag/design-reviews#548, and I feel that the security concerns could be mostly addressed by doing the following in concert:

Require the connection to be established from a secure browsing context.
- This reduces the likelihood of arbitrary script injection.
Add a CSP directive to limit what scripts can connect to, and disallow all without this header present.
- This allows site operators to prevent XSS holes from being exploited to target arbitrary servers with random connections.
Add a permission to limit what browsing contexts can create such connections, and disallow all without the permission explicitly declared (at least for cross-origin contexts).
- This allows site operators to prevent loaded iframes and the like from targeting arbitrary servers with random connections.
Require explicit user permission (similar to geolocation) to allow opening the sockets allowed per the CSP directive and permissions policy.
- The permissions would be able to provide the list of what could open the connection and the CSP directive would be able to provide the list of where it could connect, and these two lists could be provided to the user so they could make a more informed decision on the matter, without necessarily having to know all sorts of low-level networking internals. (If anything, it'd scare non-technical users a little, especially if the domains don't look like they're from the site in question.)
- For * and *:port, the prompt could ask if the user would like to give the site permission to search their network and connect to local devices on their network (using port whatever if such a port is given) instead, possibly with an additional snippet noting the obvious privacy ramifications. This would appear to the user to be a lot more dangerous than simply "can this website connect to anything on the internet", and so it'd be a little harder to simply socially engineer around.
- The permissions prompt here would resolve my concerns in #19 (comment) as it wouldn't be fully automated. (With an explicit permissions prompt, you'd have to have control over the machine to automate it, and by that point, it'd be easier and more cost-effective to just do it directly in C or whatever.)

I won't say it would address the social engineering concerns from the design review in full (virtually anything can be socially engineered around at least in theory - just ask your average scam boss or physical pentester), but I do feel this would at least be a step in the right direction.
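
As a rough sketch of the CSP part of this suggestion: the directive does not exist, so the entry syntax below (`host:port`, `*:port`, and `*`) is purely an assumption modelled on the wildcards mentioned above, and the function name is hypothetical. IPv6 literals are out of scope for this sketch.

```typescript
// Check a destination against a list of allowed entries from a hypothetical directive.
// With an empty list (directive absent), everything is disallowed.
function isDestinationAllowed(allowlist: readonly string[], host: string, port: number): boolean {
  return allowlist.some((entry) => {
    if (entry === "*") return true; // any host, any port
    const separator = entry.lastIndexOf(":");
    if (separator === -1) return false; // entries without a port are ignored here
    const entryHost = entry.slice(0, separator).toLowerCase();
    const entryPort = Number(entry.slice(separator + 1));
    const hostMatches = entryHost === "*" || entryHost === host.toLowerCase();
    return hostMatches && entryPort === port;
  });
}
```

A user agent could then show the matching entries in the permission prompt, and treat `*` and `*:port` entries as the more alarming "search your network" case described above.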

Current status?

What is the current status of this proposal? Is there still an intent on moving it forward?

What is the concern with CORS?

Threat
Attackers may use the API to by-pass third parties' CORS policies.

Mitigation
We could forbid the API from being used for TCP with the well known HTTPS port, whenever the destination host supports CORS.

I don't get it. Isn't the whole reason for CORS "Resource Sharing" (e.g. indirectly using resources like cookies belonging to another domain)? With direct sockets, no shared resource is being accessed; all the browser does is open a TCP connection (e.g. no cookie accessed or sent anywhere).

I can understand that TCP connections could be abused by some websites (e.g. using your browser for spamming, accessing unsecured local services, etc.) but this can be solved with a permission style popup just like with the geolocation or webcam APIs.

But I don't get how it has anything to do with CORS or why some arbitrary ports should be blocked.

Is IP multicast in scope?

It seems the API would allow IP multicast, and if so it would help to describe some use cases for it. I'm not so closely involved any more but I did write a white paper, describing BBC R&D's work experimenting with multicast in the browser. This was based on a multicast profile of the QUIC transport protocol, available as an Internet-Draft: https://tools.ietf.org/html/draft-pardue-quic-http-mcast-06

@GrumpyOldTroll is pursuing browser multicast from a different angle, the MulticastReceiverAPI: https://github.com/GrumpyOldTroll/wicg-multicast-receiver-api/blob/master/explainer.md. So I'm pinging him here.

Proposed privacy mitigation raises privacy concerns

If the API is denied in private browsing modes, then a site can easily detect whether a user is in a private browsing mode by simply attempting a connection: if it's denied, the site knows the user is in some private mode.

This could be used as a way to track users of private modes, or potentially as a way of 'browser fingerprinting'.

Specification is missing normative content

The specification right now contains:

IDL
A few non-normative summaries of method names (e.g. "openTCPSocket is used to open a client TCP socket.") and dictionary members (e.g. "The port that the socket should be bound to. When omitted, or 0, any port can be used.")
One "SHOULD NOT" normative requirement: "User agents that do not support direct sockets SHOULD NOT expose openTCPSocket() or openUDPSocket() on the Navigator interface." which is kind of strange because (a) in general if a user agent doesn't implement a spec, it won't implement the IDL portions of a spec either, so we don't bother stating that; (b) this is only a SHOULD NOT, not a MUST NOT, which is quite unusual.

To write a full specification, you'll need normative algorithm steps, paralleling implementation code, for all public APIs.

A good example of this for a streams-based spec is https://wicg.github.io/serial/. Notable places:

The requestPort() method shows how to normatively specify methods steps, including the processing of dictionary members, checking for transient activation, checking for permissions policy, etc.
The readable getter steps show how to lazily construct a ReadableStream instance, with the appropriate pullAlgorithm and cancelAlgorithm. You might not want to do this lazily, but the stream construction example there is the right amount of detail.

Note how it's very precise about error conditions, task queuing, state management, etc. But it's appropriately vague about the actual getting-bytes parts, saying "Invoke the operating system to read up to desiredSize bytes from the port, putting the result in the byte sequence bytes." I.e. you don't need to explain to people how to use OS TCP libraries.
Similarly the writable getter steps for the WritableStream construction.

Explainer should explain the API design choices

E.g.:

Why use streams? Especially for UDP sockets? #17 is relevant. (To be clear, I think using streams is the right decision!)
Explain the connection-establishment style. Probably after doing #18 which suggests a better design :).
Comparison to other popular socket APIs, e.g. B2G, Chrome Apps, Node.js.

Question regarding allowHalfOpen Option

Node.js' socket API allows for an allowHalfOpen option whose behavior is simply defined as:

If allowHalfOpen=false, when a socket receives a fin from the peer, the socket will be automatically transitioned into a draining state where the writable side is closed, existing writes in the queue are permitted to drain, followed by a fin sent immediately when the queue is drained.

If allowHalfOpen=true, when a socket receives a fin from the peer, only the readable side of the socket is closed. The writable side remains open until the user code explicitly closes the stream.

At all times, the stream allows a writable-closed-readable-open state.

For Node.js sockets, the default is allowHalfOpen=false.

The key question here is whether this is something that y'all considered for direct sockets and ruled out, didn't consider at all, decided Node.js' behavior is wrong, etc.
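
The behavior described above can be captured in a small state model. This is a toy sketch of the description in this issue, not Node.js code and not anything from the Direct Sockets spec; the class `HalfOpenModel` and its methods are hypothetical.

```typescript
type WritableState = "open" | "draining" | "closed";

class HalfOpenModel {
  readable: "open" | "closed" = "open";
  writable: WritableState = "open";
  private queuedWrites = 0;
  private readonly allowHalfOpen: boolean;

  constructor(allowHalfOpen = false) {
    this.allowHalfOpen = allowHalfOpen;
  }

  // User code queues a write.
  write(): void {
    if (this.writable !== "open") throw new Error("writable side is not open");
    this.queuedWrites += 1;
  }

  // One queued write reaches the network.
  drainOne(): void {
    if (this.queuedWrites > 0) this.queuedWrites -= 1;
    this.finishIfDrained();
  }

  // The peer sent a FIN: the readable side always closes.
  receiveFin(): void {
    this.readable = "closed";
    if (!this.allowHalfOpen && this.writable === "open") {
      this.writable = "draining";
      this.finishIfDrained();
    }
  }

  // User code closes the writable side explicitly.
  end(): void {
    if (this.writable === "open") {
      this.writable = "draining";
      this.finishIfDrained();
    }
  }

  // Returns "<readable>/<writable>" for inspection.
  snapshot(): string {
    return `${this.readable}/${this.writable}`;
  }

  private finishIfDrained(): void {
    if (this.writable === "draining" && this.queuedWrites === 0) {
      this.writable = "closed"; // the point at which our own FIN would be sent
    }
  }
}
```

With `allowHalfOpen` set to false, a FIN from the peer moves a socket with one queued write to `closed/draining` and then to `closed/closed` once the write drains; with it set to true, the same FIN leaves the socket at `closed/open` until `end()` is called.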
