multiformats / multiaddr


Composable and future-proof network addresses

Home Page: https://multiformats.io/multiaddr

License: MIT License

Makefile 3.99% Go 61.46% Gherkin 14.37% Clojure 20.18%

multiaddr

Introduction
Use cases
- Encapsulation based on context
Specification
- Encoding
- Decoding
Protocols
Implementations
Contribute
License

Introduction

Multiaddr aims to make network addresses future-proof, composable, and efficient.

Current addressing schemes have a number of problems.

They hinder protocol migrations and interoperability between protocols.
They don't compose well. There are plenty of X-over-Y constructions, but only a few of them can be addressed in a classic URI/URL or host:port scheme.
They don't multiplex: they address ports, not processes.
They're implicit, in that they presume out-of-band values and context.
They don't have efficient machine-readable representations.

Multiaddr solves these problems by modelling network addresses as arbitrary encapsulations of protocols.

Multiaddrs support addresses for any network protocol.
Multiaddrs are self-describing.
Multiaddrs conform to a simple syntax, making them trivial to parse and construct.
Multiaddrs have human-readable and efficient machine-readable representations.
Multiaddrs encapsulate well, allowing trivial wrapping and unwrapping of encapsulation layers.

Multiaddr was originally thought up by @jbenet.

Interpreting multiaddrs

Multiaddrs are parsed from left to right, but they should be interpreted right to left. Each component of a multiaddr wraps all the left components in its context. For example, the multiaddr /dns4/example.com/tcp/1234/tls/ws/tls (ignore the double encryption for now) is interpreted by taking the first tls component from the right and interpreting it as the libp2p security protocol to use for the connection, then passing the rest of the multiaddr to the websocket transport to create the websocket connection. The websocket transport sees /dns4/example.com/tcp/1234/tls/ws/ and interprets the tls in this context to mean that this is going to be a secure websocket connection. The websocket transport also gets the host to dial along with the tcp port from the rest of the multiaddr.
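The parse-left-to-right, interpret-right-to-left rule can be sketched as follows. This is only an illustration: the `ARITY` table and the `parse`, `to_string`, and `unwrap` helpers are hypothetical names, not part of any multiaddr library, and the table covers only the protocols from the example above.

```python
from typing import List, Optional, Tuple

# Hypothetical table: number of value segments each protocol takes.
# Only the protocols used in the example are listed.
ARITY = {"dns4": 1, "tcp": 1, "tls": 0, "ws": 0}

Component = Tuple[str, Optional[str]]


def parse(addr: str) -> List[Component]:
    """Parse a human-readable multiaddr left to right into components."""
    parts = [p for p in addr.split("/") if p]
    comps: List[Component] = []
    i = 0
    while i < len(parts):
        name = parts[i]
        if name not in ARITY:
            raise ValueError(f"unknown protocol: {name}")
        if ARITY[name]:
            if i + 1 >= len(parts):
                raise ValueError(f"missing value for {name}")
            comps.append((name, parts[i + 1]))
            i += 2
        else:
            comps.append((name, None))
            i += 1
    return comps


def to_string(comps: List[Component]) -> str:
    """Render components back into the human-readable form."""
    return "".join(f"/{n}" if v is None else f"/{n}/{v}" for n, v in comps)


def unwrap(addr: str) -> Tuple[Component, str]:
    """Peel off the rightmost component and return it with the rest."""
    comps = parse(addr)
    if not comps:
        raise ValueError("empty multiaddr")
    return comps[-1], to_string(comps[:-1])


outer, rest = unwrap("/dns4/example.com/tcp/1234/tls/ws/tls")
inner, rest2 = unwrap(rest)
```

Here `outer` is the rightmost `tls` component, and `rest` is what the websocket transport would receive; unwrapping again yields `ws` and the address that remains for the layers underneath.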

Components to the right can also provide parameters to components to the left, since they are in charge of the rest of the multiaddr's interpretation. For example, in /ip4/1.2.3.4/tcp/1234/tls/p2p/QmFoo the p2p component has the value of the peer id and passes it to the next component, in this case the tls security protocol, as the expected peer id for this connection. Another example is /ip4/.../p2p/QmR/p2p-circuit/p2p/QmA. Here p2p/QmA is passed to p2p-circuit, and the p2p-circuit component then knows it needs to use the rest of the multiaddr as the information to connect to the relay node.

This enables nesting and arbitrary parameters. A component can parse arbitrary data with some encoding and pass it as a parameter to the next component of the multiaddr. For example, we could reference a specific HTTP path by composing path and percentencode components along with an http component. This would look like /dns4/example.com/http/GET/path/percentencode/somepath%2ftosomething. The percentencode component parses the data and passes it as a parameter to path, which passes it as a named parameter (path=somepath/tosomething) to a GET request. A user may not like percentencode for their use case and may prefer to use lenprefixencode, so that the multiaddr instead looks like /dns4/example.com/http/GET/path/lenprefixencode/20_somepath/tosomething. This would work the same and require no changes to the path or GET component. It's important to note that the binary representation of the data in percentencode and lenprefixencode would be the same. The only difference is how it appears in the human-readable representation.
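To illustrate that the two textual encodings carry the same data, here is a minimal sketch. The function names are hypothetical, and the `<length>_<data>` layout for lenprefixencode is inferred from the single example above rather than from any specification.

```python
from urllib.parse import unquote


def decode_percentencode(text: str) -> bytes:
    """Decode a percent-encoded value into raw bytes."""
    return unquote(text).encode("utf-8")


def decode_lenprefixencode(text: str) -> bytes:
    """Decode '<decimal length>_<data>' (assumed layout) into raw bytes."""
    length, sep, rest = text.partition("_")
    if not sep or not length.isdigit():
        raise ValueError("expected '<length>_<data>'")
    data = rest.encode("utf-8")
    if len(data) != int(length):
        raise ValueError("length prefix does not match data")
    return data


a = decode_percentencode("somepath%2ftosomething")
b = decode_lenprefixencode("20_somepath/tosomething")
```

Both calls produce the same bytes, which is why the path and GET components would not need to know which encoding the user chose.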

Use cases

TODO: unpack the shortcomings of URLs
- example: hostnames in https://
  - can't sidestep DNS
  - can't use different SNI vs. Host headers
  - can't do http-over-utp
  - TODO check out how http/1.1 vs. http/2 is distinguished
- rift between filesystem, web, and databases
TODO: case study: domain fronting
TODO: case study: tunnelling
TODO: case study: http proxying
TODO: case study: multi-hop circuit relay
TODO: case study: protocol migrations (e.g. ip4/ip6, 4in6, 6in4)

Encapsulation based on context

Although multiaddrs are self-describing, it's possible to further encapsulate them based on context. For example, in a web browser it's obvious that, given a hostname, HTTP should be spoken. The specifics of this HTTP connection are not important (except maybe the use of TLS), and will be derived from the browser's capabilities and configuration.

example.com/index.html
/http/example.com/index.html
/tls/sni/example.com/http/example.com/index.html
/dns4/example.com/tcp/443/tls/sni/example.com/http/example.com/index.html
/ip4/1.2.3.4/tcp/443/tls/sni/example.com/http/example.com/index.html

The resulting layers of encapsulation reflect exactly how the bidirectional stream between client and server is constructed.

Now you can imagine how, based on the browser's configuration, the multiaddr might look different. For example, you could use HTTP proxying or SOCKS proxying, or use domain fronting to evade censorship. This kind of proxying is of course possible without multiaddr, but only with multiaddr do we have a way of consistently addressing these networking constructions.

Specification

Human-readable multiaddr: (/<protoName string>/<value string>)+
- Example: /ip4/127.0.0.1/udp/1234
Machine-readable multiaddr: (<protoCode uvarint><value []byte>)+
- Same example: 0x4 0x7f 0x0 0x0 0x1 0x91 0x2 0x4 0xd2
- Values are usually length-prefixed with a uvarint

Multiaddr and all other multiformats use unsigned varints (uvarint). Read more about it in multiformats/unsigned-varint.
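As a worked example, the following sketch converts the address above between its two representations. It handles only ip4 and udp, both of which have fixed-size values and therefore no length prefix. The protocol codes (ip4 = 4, udp = 273) are the ones implied by the example bytes; the function names are illustrative, not a library API.

```python
import ipaddress

# Codes and value sizes (in bytes) for the two protocols in the example.
CODES = {"ip4": 4, "udp": 273}
SIZES = {"ip4": 4, "udp": 2}


def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def read_uvarint(buf: bytes, pos: int):
    """Decode an unsigned varint at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode(addr: str) -> bytes:
    """Human-readable -> machine-readable (ip4 and udp only)."""
    parts = addr.strip("/").split("/")
    out = bytearray()
    for name, value in zip(parts[0::2], parts[1::2]):
        out += uvarint(CODES[name])
        if name == "ip4":
            out += ipaddress.IPv4Address(value).packed
        else:
            out += int(value).to_bytes(SIZES[name], "big")
    return bytes(out)


def decode(buf: bytes) -> str:
    """Machine-readable -> human-readable (ip4 and udp only)."""
    names = {code: name for name, code in CODES.items()}
    pos, parts = 0, []
    while pos < len(buf):
        code, pos = read_uvarint(buf, pos)
        name = names[code]
        raw = buf[pos:pos + SIZES[name]]
        pos += SIZES[name]
        if name == "ip4":
            parts += [name, str(ipaddress.IPv4Address(raw))]
        else:
            parts += [name, str(int.from_bytes(raw, "big"))]
    return "/" + "/".join(parts)


packed = encode("/ip4/127.0.0.1/udp/1234")
```

The udp code 273 does not fit in seven bits, so it occupies two bytes (0x91 0x02), and port 1234 is stored big-endian as 0x04 0xd2.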

Encoding

TODO: specify the encoding (byte-array to string) procedure

Decoding

TODO: specify the decoding (string to byte-array) procedure

Protocols

See protocols.csv for a list of protocol codes and names, and protocols/ for specifications of the currently supported protocols.

TODO: most of these are way underspecified

/ip4, /ip6
/ipcidr
/dns4, /dns6
/dnsaddr
/tcp
/udp
/utp
/tls
/ws, /wss
/ipfs
/p2p-circuit
/p2p-webrtc-star, /p2p-webrtc-direct
/p2p-websocket-star
/onion

Implementations

js-multiaddr - stable
go-multiaddr - stable
- go-multiaddr-dns
java-multiaddr - stable
haskell-multiaddr - stable
py-multiaddr - stable
rust-multiaddr - stable
cs-multiaddress - alpha
net-ipfs-core - stable
swift-multiaddr - stable
elixir-multiaddr - alpha
multiaddr sub-module of Python module multiformats - alpha
dart-multiaddr - alpha
Kotlin
- kotlin-multiaddr - stable
- multiaddr part of Kotlin project multiformat - alpha

TODO: reconsider these alpha/beta/stable labels

Contribute

Contributions welcome. Please check out the issues.

Check out our contributing document for more information on how we work, and about contributing in general. Please be aware that all interactions related to multiformats are subject to the IPFS Code of Conduct.

Small note: If editing the README, please conform to the standard-readme specification.

License

multiaddr's Issues

how to encode publicKeys / other meta information

How would you add something like a public key? Like this?

/ip6/<ipv6 str addr>/<tcp int port>/<pubkey>

Also in rlpx we only have tcp. Never udp. So can we drop the udp/tcp identifier?
Lastly, Pubkeys are optional. How can that be expressed?

Related
#6 (comment)

Defining /http

Currently, libraries are implementing /http even though it hasn't been defined. We should catch up with the implementations and define it, then implement the behaviour.

As far as I know:

Can be wrapped within /tls to support https.
Is a path protocol (like /unix), meaning no further multiaddrs can follow /http/a/b/c, as we don't know where the path ends. This is a general problem for path protocols (currently just /unix).
Unsure how to deal with complex strings that contain ?, / and other characters that might confuse multiaddr. Encode them maybe?

Examples:

https://1.2.3.4:5001/api => /ip4/1.2.3.4/tcp/5001/tls/http/api
http://1.2.3.4/api => /ip4/1.2.3.4/tcp/80/http/api
http://1.2.3.4 => /ip4/1.2.3.4/tcp/80/http
http://1.2.3.4/api/ => /ip4/1.2.3.4/tcp/80/http/api/ (keeping trailing slash)
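The mapping in these examples can be sketched as below. This is an illustration of the examples only, since /http is not yet defined: the function name is hypothetical, only literal IP hosts are handled, and URLs with a query or fragment are rejected because their treatment is still an open question.

```python
import ipaddress
from urllib.parse import urlsplit


def http_url_to_multiaddr(url: str) -> str:
    """Convert an http(s) URL with a literal IP host into a multiaddr string."""
    u = urlsplit(url)
    if u.scheme not in ("http", "https"):
        raise ValueError("only http and https URLs are handled")
    if u.query or u.fragment:
        raise ValueError("query and fragment handling is undefined")
    ip = ipaddress.ip_address(u.hostname)  # raises ValueError for host names
    proto = "ip4" if ip.version == 4 else "ip6"
    port = u.port or (443 if u.scheme == "https" else 80)
    tls = "/tls" if u.scheme == "https" else ""
    # The URL path (including any trailing slash) is appended unchanged.
    return f"/{proto}/{ip}/tcp/{port}{tls}/http{u.path}"


print(http_url_to_multiaddr("https://1.2.3.4:5001/api"))
```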

Issues mentioning /http and /tls/http (https)

Consolidate multiaddr transformations

We already support a number of map/filter like operations on multiaddrs, which take as input a list of multiaddrs and sometimes certain conditions, and return as output a list of multiaddrs.

fmt (a.k.a. mafmt)
- lets us define protocol schemata and filter addresses that match
- e.g. the schema /ip6/udp matches /ip6/::1/udp/4002
- implementations: go-multiaddr-fmt, js-mafmt
filter
- lets us filter addresses based on membership of an IP CIDR block
- e.g. /ip4/192.168.0.0/ipcidr/16 matches /ip4/192.168.1.42
- implementations: libp2p/go-maddr-filter
addr-util
- filters certain classes of IP addresses (link-local, transports that can't dial out, etc.)
- implementations: libp2p/go-addr-util, go-multiaddr-net
dns
- resolves /dnsaddr, /dns4, and /dns6 addresses
- implementations: go-multiaddr-dns, js-multiaddr

The first three should probably be collapsed into a single package.

A minimal generalized API for these transformations should be part of the multiaddr spec.
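To make the first two transformations concrete, here is a rough sketch of schema matching and CIDR filtering on string multiaddrs. It does not reproduce the API of go-multiaddr-fmt, js-mafmt, or go-maddr-filter; the `TAKES_VALUE` table and both function names are assumptions made for the illustration.

```python
import ipaddress

# Hypothetical table of protocols that are followed by a value segment.
TAKES_VALUE = {"ip4", "ip6", "tcp", "udp", "ipcidr"}


def protocol_names(addr: str) -> list:
    """Return only the protocol names of a multiaddr, skipping values."""
    parts = addr.strip("/").split("/")
    names, i = [], 0
    while i < len(parts):
        names.append(parts[i])
        i += 2 if parts[i] in TAKES_VALUE else 1
    return names


def matches_schema(schema: str, addr: str) -> bool:
    """fmt-style check: does addr use exactly the protocols in schema?"""
    return protocol_names(addr) == schema.strip("/").split("/")


def in_cidr(filter_addr: str, addr: str) -> bool:
    """filter-style check: is addr's IP inside the filter's CIDR block?"""
    _, fproto, base, keyword, prefix = filter_addr.split("/")
    _, aproto, ip = addr.split("/")[:3]
    if keyword != "ipcidr" or fproto != aproto:
        return False
    network = ipaddress.ip_network(f"{base}/{prefix}", strict=False)
    return ipaddress.ip_address(ip) in network


print(matches_schema("/ip6/udp", "/ip6/::1/udp/4002"))
print(in_cidr("/ip4/192.168.0.0/ipcidr/16", "/ip4/192.168.1.42"))
```

Both helpers have the same shape (a multiaddr plus a condition in, a boolean out), which is the kind of common ground a generalized API could build on.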

Proposal: Add keyword arguments to protocols

Problem description

The current MultiAddr spec does not have any good way for dealing with optional protocol parameters that have well defined defaults. Depending on the specific protocol in question different workarounds have been proposed, the predominant theme being recursion:

IPv6 link scopes: /ip6/fe00::32/ip6zone/6/…
TLS Server Name Identification: /tls/sni/example.com/…

This has the obvious problems that:

Each protocol must have a special parser which will then greedily swallow up all following components that it considers relevant
All possible such “attribute protocol items” must be reserved to ensure that their names are not used as “regular” / “top-level” protocol names
- As protocols evolve this may also cause nasty conflicts between newly defined attributes and existing protocol names.
- While attribute names may be shared between different protocols they must still be treated as a separate class from top-level protocol names since they may never appear top-level while still sharing a common namespace with that top-level class.
It is not immediately obvious to a human reader which items are items of the previous top-level protocol and which constitute the start of a new encapsulation layer

There also do not seem to be any obvious advantages to this scheme that would somehow make the above problems appear like reasonable trade-offs.

Another proposal suggested in some places (#63) was using plain greediness: after a given protocol item shows up in the path, all further items are swallowed up and used as a single “path parameter”:

HTTP: /http/example.com/api/v1 (here example.com is the hostname and /api/v1 the HTTP path base)
WS and WSS: /wss/example.com/api/v1/tls/ws
Unix domain: /unix/path/to/socket.sock/tls/ws

While HTTP arguably is a terminator protocol (meaning that no other protocol may follow it anyways – this notion needs separate discussion!), Unix domain sockets and WebSockets definitely are not. Hence, it is unclear how a parser should figure out that /tls does not refer to a path component and whether this even is the case (the parser would have to proactively probe the file system for this, which is very much not in line with the vision of MultiAddr being a common description of paths to application endpoints; with WebSockets this is not even reliably possible to start with).

The example with WebSockets in particular demonstrates why this cannot work. A suggested alternative was to wrap the path parameter inside some kind of special set of delimiters (different kinds of braces were suggested):

/wss/(/example.com/api/v1)/tls/ws

While this works, it does not take into account the fact that there is nothing usually required about the given parameter: The hostname can usually be inferred from previous protocol levels (and left empty if unknown) and the path may always be empty.

Also, potentially relevant data (such as HTTP basic auth) may be missing from the above. By combining the two approaches discussed above we arrive at something similar to the following:

/wss/(/example.com:4443/api/v1)/user/john/password/doh/cookie/bla=blab/tls/ws

Or the following when excluding all attributes:

/wss/()/tls/ws

Neither of these strikes the author as particularly intelligible.

This proposal will not attempt to resolve the issues with Unix domain sockets.

Proposed solution

Summary:

Allow each protocol to carry an arbitrary number of keyword arguments whose meaning is protocol dependent
Deprecate existing attribute protocol items: ip6zone
- (Are there more actually standardized at the moment?)

Text-representation syntax

Extending the current spec, each protocol name may now optionally be followed by an opening parenthesis character (`(`) indicating the start of the protocol parameter list. This is to be followed by an arbitrary number of key-value parameters, each delimited by the comma character (`,`), and terminated by a closing parenthesis character (`)`). After this closing character a forward slash (`/`) is expected. If the parameter list is skipped, the protocol name should immediately be followed by a forward slash (as is currently the case); an empty parameter list (`()`) is allowed as well.

Each key-value pair consists of a name, made up only of ASCII lower-case characters, ASCII digits and the ASCII minus sign (`-`), followed by a single equals sign (`=`), followed by an arbitrary UTF-8 encoded value. The value may contain any character other than the NUL byte, but the following characters must be escaped with a single backward slash (`\`) if they are to appear inside the value field: the opening (`(`) and closing (`)`) parentheses, the comma character (`,`) and the backward slash (`\`) itself. Most importantly, the forward slash (`/`) does not need to be escaped since it carries no special significance inside the protocol parameter list; this allows for easy embedding of paths, like in the following example:

/http(host=example.com,base=/api/v1)
/http(base=/endpoint\(1:2\))

More examples:

/tls(sni=example.com)
/ip6(scope=6)/fe00::32/tcp/80/http
/wss(host=example.com:4443,base=/api/v1,user=john,password=doh,cookie=bla=blab)/tls/ws
- Note: The name host here refers to the HTTP Host header and has nothing to do with where the connection will actually be made.
/wss/tls/ws

Each protocol may still accept zero or one static parameters of known or unknown binary length after the final forward slash. It is expected that the use of optional parameters will be minimal in practice (HTTP-y stuff probably being the prominent exception here, not the rule).

(Precise syntax subject to change/bikeshedding!)
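A parser for a single protocol item under the proposed text syntax might look roughly like this. It is a sketch of the rules as described in this proposal, not an existing implementation; `parse_protocol_item` is a hypothetical name. It deliberately handles one item only, since splitting a whole multiaddr on `/` would break values that contain slashes.

```python
import re


def parse_protocol_item(item: str):
    """Split 'name(k=v,...)' into (name, params), honouring backslash escapes."""
    name, paren, rest = item.partition("(")
    if not paren:
        return name, {}  # no parameter list at all
    params, key, buf = {}, None, []
    closed = False
    i = 0
    while i < len(rest):
        ch = rest[i]
        if ch == "\\":
            i += 1  # take the next character literally
            if i >= len(rest):
                raise ValueError("dangling escape")
            buf.append(rest[i])
        elif ch == "=" and key is None:
            key = "".join(buf)
            if not re.fullmatch(r"[a-z0-9-]+", key):
                raise ValueError(f"invalid parameter name: {key!r}")
            buf = []
        elif ch in ",)":
            if key is not None:
                params[key] = "".join(buf)
            elif buf:
                raise ValueError("parameter without '='")
            key, buf = None, []
            if ch == ")":
                closed = True
                i += 1
                break
        elif ch == "(":
            raise ValueError("unescaped '(' in value")
        else:
            buf.append(ch)  # includes '/' and any later '='
        i += 1
    if not closed or i != len(rest):
        raise ValueError("malformed parameter list")
    return name, params


print(parse_protocol_item("http(host=example.com,base=/api/v1)"))
```

Only the first `=` in a pair separates key from value, which is what lets `cookie=bla=blab` from the examples above parse as a cookie value of `bla=blab`.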

Binary-representation syntax

The general format for the binary syntax is:

<BinaryMultiAddr> := (<ProtocolBinary>(<AttributeBinary>*))+

<ProtocolBinary> is the binary MultiAddr representation of the protocol itself and uses the following format:

<ProtocolBinary> := <ProtocolType>([NIL]|<ProtocolValue>|<ProtocolLength><ProtocolValue>)

The format used for the <ProtocolValue> part of the representation depends on the <ProtocolType>:

[NIL] (No value): Used by all protocols with zero static parameters; no value follows and attributes or further protocols may immediately follow.
<ProtocolValue>: Used by all protocols with one static parameter of known binary length; the value, of a length predefined for each protocol type, immediately follows.
<ProtocolLength><ProtocolValue>: Used by all protocols with one static parameter of variable binary length; the <ProtocolLength> is a UVarInt containing the length of the following protocol value.
The mapping between the text and binary representation of the protocol's value may be implemented by an arbitrary protocol-specific function, as long as it is ensured that such transformation may be performed without loss of information with regards to the protocol described. That is, the following constraints must hold:
- text_value ≃ binary2text(text2binary(text_value))
- binary_value ≃ text2binary(binary2text(binary_value))
- ≃ means “must be equal with regards to the constraints imposed by the protocol” – for instance, DNS names are case-insensitive hence a loss of case may be acceptable as this is not considered relevant “information” in this protocol (XXX: find better wording for this).
Due to this definition it is not possible to parse binary MultiAddrs with unknown protocol values.

<AttributeBinary> is the binary MultiAddr representation of a single protocol attribute and must follow either a protocol binary representation or another attribute. All attributes share a single format:

<AttributeBinary> := [ATTR_TOKEN]<AttributeKey><AttributeLength><AttributeValue>

In this definition:

[ATTR_TOKEN] is a reserved UVarInt indicating the start of an attribute, whose value must never be used for a <ProtocolValue> (TODO: Decide on a value)
<AttributeKey> is a UVarInt from a table of known attribute names. Attributes in this table are not bound to any specific protocol; the table serves only as a look-up table for keeping the binary representation of attributes small.
<AttributeLength> is a UVarInt determining the length of the following <AttributeValue> in bytes.
<AttributeValue> is the UTF-8 encoded text of the attribute's value in the text representation.
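The attribute layout can be sketched as follows. Note that the proposal has not chosen a value for [ATTR_TOKEN] and defines no attribute table, so `ATTR_TOKEN = 0x7FFF` and the entries in `ATTR_KEYS` below are placeholders invented purely for this illustration.

```python
ATTR_TOKEN = 0x7FFF  # placeholder: the proposal leaves this value undecided
ATTR_KEYS = {"sni": 1, "host": 2, "base": 3}  # placeholder attribute table


def uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def read_uvarint(buf: bytes, pos: int):
    """Decode an unsigned varint at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_attribute(key: str, value: str) -> bytes:
    """[ATTR_TOKEN]<AttributeKey><AttributeLength><AttributeValue>"""
    raw = value.encode("utf-8")
    return uvarint(ATTR_TOKEN) + uvarint(ATTR_KEYS[key]) + uvarint(len(raw)) + raw


def decode_attribute(buf: bytes, pos: int = 0):
    """Read one attribute at pos; return (key, value, next position)."""
    token, pos = read_uvarint(buf, pos)
    if token != ATTR_TOKEN:
        raise ValueError("not an attribute")
    key_code, pos = read_uvarint(buf, pos)
    length, pos = read_uvarint(buf, pos)
    names = {code: name for name, code in ATTR_KEYS.items()}
    value = buf[pos:pos + length].decode("utf-8")
    return names[key_code], value, pos + length


encoded = encode_attribute("sni", "example.com")
```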

TODO: Allow storing unknown attributes in binary, whose names are not in the table?

Other requirements

Unexpected parameters should result in an error when trying to instantiate the given protocol and may result in an error during parsing of the given MultiAddr. For each expected parameter there must be a sensible default value and parameters whose value corresponds to such default value should be omitted from the textual and binary representations. All parameters must be optional, for mandatory parameters the current /protoname/param syntax should be used instead.

EDIT 1: Some language improvements + language-change to always call it an “HTTP path base”, since the path only refers to the path bases used to multiplex different HTTP services of a single hostname and not about referring to actual single files

EDIT 2: Added example for escaping

EDIT 3: Specify binary encoding (but specific to the proposal at hand and for what we already have)

Reading list

The uniform resource locator (URL) is a data structure and an associated serialization format that aims to uniquely identify any resource on the Internet (and other networks). (See also uniform resource identifier (URI).) That’s a lofty goal, but it has proven more or less tractable and practical. Which is astounding and great! A global network namespace enables powerful applications, and powerful interactions between applications.

However, URLs have some problems of usability, security, and economics. Many of us have wished for a global namespace with fewer problems. I’ll address that first, and then I’ll have some fun with the technical aspects of the problem. You can skip that stuff, if you like.

https://noncombatant.org/2017/11/07/problems-of-urls/

add swarm protocol code

Currently there does not exist a protocol code for Ethereum's Swarm content hashes (https://swarm-guide.readthedocs.io/en/latest/usage.html#bzz-url-schemes).

This makes it difficult to use multiaddr in Ethereum Improvement Proposal #1577 (ethereum/EIPs#1577).

Multiple multiaddrs

How should multiple multiaddr addresses be expressed? (both in string and binary form). There may be a nice way to allow "anything that takes one multiaddr" to take "n multiaddrs".

How should variable length data be parsed for machine encoding?

Hello, I was wondering how variable length data formats are supposed to be handled for the packed machine encoding? How are we supposed to know when to read the next codec? Examples of variable length formats are: unix, any of the dns family, p2p/ipfs, garlic, and memory.
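Going by the README's note that values are usually length-prefixed with a uvarint, a decoder can tell where the next protocol code starts by reading either a fixed number of bytes or a length prefix first. The sketch below assumes codes 54 for dns4 and 6 for tcp, which should be checked against protocols.csv; the tables and function names are illustrative.

```python
# Assumed codes; verify against protocols.csv before relying on them.
NAMES = {54: "dns4", 6: "tcp"}
FIXED_SIZE = {"tcp": 2}  # protocols absent from this table are length-prefixed


def read_uvarint(buf: bytes, pos: int):
    """Decode an unsigned varint at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def split_components(buf: bytes):
    """Split a binary multiaddr into (protocol name, raw value) pairs."""
    pos, out = 0, []
    while pos < len(buf):
        code, pos = read_uvarint(buf, pos)
        name = NAMES[code]
        if name in FIXED_SIZE:
            size = FIXED_SIZE[name]
        else:
            size, pos = read_uvarint(buf, pos)  # variable length: read prefix
        out.append((name, buf[pos:pos + size]))
        pos += size
    return out


# /dns4/example.com/tcp/443 under the assumptions above
binary = bytes([54, 11]) + b"example.com" + bytes([6, 0x01, 0xBB])
components = split_components(binary)
```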

/dns in multiaddrs

There's been some conversation on how to handle DNS, in these issues and others:

Finally looking back at this-- sorry for the delay everyone -- @sivachandran @lgierth @Gaboose and more. I am going to collect the options here and make a decision in coming weeks.

Proposals so far

So far, the proposals are:

Proposal 1: `/dns/<domain>/<proto>`

# format
/dns/<domain>/<proto> 

# example
/dns/example.com/ip4  --resolves-to-->  /ip4/1.2.3.4

Proposal 2: `/dns/<domain>/<record-type>`

# format
/dns/<domain>/<record-type> 

# possibilities / examples 
/dns/example.com/A     --resolves-to-->  /ip4/1.2.3.4
/dns/example.com/AAAA  --resolves-to-->  /ip6/1::1
/dns/example.com/CNAME --resolves-to-->  /dns/another.com/*
/dns/example.com/MX    --resolves-to-->  /dns/mail.another.com/* 
/dns/example.com/A     --resolves-to-->  /dns/another.com/A  (through a CNAME)
/dns/example.com/TXT   --resolves-to-->  /ipfs/QmZS2PeM6rtSJXPPSBiqyNRss3M1JpuNAkkQmzEWR7aw1w  (through a dnslink)

Adding MQTT protocol

IoT seems to be a good use case for libp2p in general, so support should be added for IoT protocols like MQTT. Luckily it already has a path-like addressing syntax, so it could be considered a (restricted) type of path protocol, e.g. /mqtt/stat/tasmota/POWER. This can then be combined with host resolution protocols, e.g. /mdns/mqtt.local/tcp/1883/tls/sni/mqtt/stat/tasmota/POWER

Handle unparsable multiaddrs

Due to the fact that binary multiaddrs don't include protocol definitions, we can't parse them unless we know all the relevant protocols.

However, we can parse a prefix. Therefore, I'd like to propose a special, string-only "unknown" protocol that takes a single multibase encoded argument. That is: `/ip4/1.2.3.4/tcp/123/unknown/bxyz`. This would only exist in the string format and would allow us to keep and use multiaddrs we don't fully understand.
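One possible reading of this proposal, sketched below: decode components until an unrecognised protocol code appears, then emit everything from that code onward as a single multibase argument. The `b` prefix is multibase's lower-case, unpadded base32. The `KNOWN` table and the choice to include the unknown code itself in the encoded tail are assumptions of this sketch, not something the proposal specifies.

```python
import base64

# Protocols this hypothetical implementation understands: code -> (name, size)
KNOWN = {4: ("ip4", 4), 6: ("tcp", 2)}


def read_uvarint(buf: bytes, pos: int):
    """Decode an unsigned varint at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def to_string_with_unknown(buf: bytes) -> str:
    """Render the parsable prefix, then wrap the remainder in /unknown/."""
    pos, parts = 0, []
    while pos < len(buf):
        start = pos
        code, pos = read_uvarint(buf, pos)
        if code not in KNOWN:
            tail = base64.b32encode(buf[start:]).decode("ascii")
            parts += ["unknown", "b" + tail.lower().rstrip("=")]
            break
        name, size = KNOWN[code]
        raw = buf[pos:pos + size]
        pos += size
        if name == "ip4":
            parts += [name, ".".join(str(b) for b in raw)]
        else:
            parts += [name, str(int.from_bytes(raw, "big"))]
    return "/" + "/".join(parts)


mystery = bytes([0x63, 0x01, 0x02])  # code 0x63 is not in KNOWN
text = to_string_with_unknown(bytes([4, 1, 2, 3, 4, 6, 0, 123]) + mystery)
```

Because the tail keeps the original bytes, converting the string back to binary loses nothing even though the protocol was never understood.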

This should fix some of the problems described in #6.

Add WebTransport

A transport similar to WebSocket that uses HTTP/3 instead. Great news for browsers, as it (should) allow them to use the multiplexing and congestion control capabilities of QUIC.

w3 draft: https://www.w3.org/TR/webtransport/
firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=1692754
chrome: https://chromestatus.com/feature/4854144902889472

Supporting ZeroMQ

ZeroMQ imposes specific framing on a transport and therefore requires further detail in the address to differentiate it from a standard transport. For example, /ip4/127.0.0.1/tcp/3000 describes how to reach a server, but if that server is running a 0MQ socket, standard TCP applications will not be able to communicate with it. An extension is therefore needed to describe it adequately: /ip4/127.0.0.1/tcp/3000/0mq

There are a lot of protocols (like #106) that require additional information in the address, so perhaps a multiaddr code that describes a generic protocol could be useful. A generic form such as /ip4/127.0.0.1/tcp/3000/protocol/0mq would allow developers to use multiaddr without requiring custom additions directly to the table. However, this would only cover protocols that don't require a value, since the protocol name takes the place of the value. I think #70 might be more along those lines.

Preserve code 104 for backward compatibility

104 is the ASCII code for the letter h, which is the first letter of http and https URLs.

I've been suggesting the use of multiaddr in EIPs, but these suggestions have mostly been rejected. The reason given is that multiaddr is not "stable". Of course, multiaddr will probably never be truly stable, but that's a discussion for another day.

All that said, preserving 104 would allow for wider multiaddr use in protocols which currently only allow http and https URLs.

Protocol specifiers are ambiguous?

If you use Unix file paths and delimiters, it seems like it would be really easy to make things ambiguous.

First off, I don't see a "file" protocol type. Is this the unix protocol type listed in protocols.csv?

If so, what happens if I make a file path /home/icefox/tcp/127.0.0.1/foo and try to represent it as a multiaddr path?

/unix/home/icefox/tcp/127.0.0.1/foo

Okay, that's a little artificial, and you could specify that you can disambiguate from context; a unix protocol address cannot contain a tcp protocol subaddress. Though that seems to defeat the purpose; things stop being self-describing and become context-dependent. (And on Plan9 you certainly can do TCP via the filesystem.)

Okay, what about a more concrete example using already defined protocols:

/wss/example.com/http/example.com

Is this accessing the websockets endpoint wss://example.com/http/example.com, or is it trying to nest the HTTP protocol inside the websockets endpoint wss://example.com/ ?

Does Multiaddr Support ip4 and ip6 in same string?

Is it possible to encode something like:

/ip4/127.0.0.1/ip6/::1 to indicate that the two addresses go to the same place? Or is this invalid syntax?

Add experimental protocol

I propose to reserve an identifier for "experimental" protocols. Generally this option will be ignored (even dropped in public networks), but it can be used to help implementers experiment with new protocols, or with add-ons to existing ones.

How to handle SSL/TLS?

If an address is listening for TLS connections, how should that be encoded? Encoding X509 certificates seems particularly tricky. You could base64 encode the certificates, but that is variable length, which the format doesn't seem to handle right now.

NPM, GIT and related

I would like to use multiaddr for locating packages like in npm package.json.

{
  "dependencies": {
    "http": "/ipfs/QmVv4Wz46JaZJeH5PMV4LGbRiiMKEmszPYY3g6fjGnVXBS",
    "react-fatigue-dev": "/github/tj:react-fatigue-dev",
    "lodash": "/npm/lodash"
  }
}

Github address

I think github address is a good idea, it is just a wrapper for the address.
Github address is much simpler because we can use : for user and repository separator.

/github/tj:react-fatigue-dev --->
  /dns/github.com/git/tj:react-fatigue-dev ---> tj/react-fatigue-dev in fact

Git address

Github address does not solve the problem if we cannot create a valid multiaddr for the repo.

/dns/github.com/git/tj/react-fatigue-dev ---> invalid multiaddr?

/unix multiaddr

/unix was added to go-multiaddr a while ago in multiformats/go-multiaddr#31. It needs to be added here too.

udp and tcp vs ip4 and ip6

What is the value of having udp and tcp in the registry, separate from ip4 and ip6? They are on different layers.

In the examples in the README.md you have tcp4://... but not so in the protocols.csv file.

One of the uses for this is to bootstrap IPFS to connect to other nodes. Are there any other uses? It is not apparent from the repo. If only for connecting to other IPFS nodes, you should probably be happy by using a generic ipfs-bootstrap: URI allowing hostnames (works for .onion as well) and IP addresses. If connecting to local sockets, a file URI also works.

Connectivity Transports

It would be interesting to add special transports for things like:

packet loss
latency
bandwidth limits
with parameters specified as the address.

Something like:

/ip4/1.2.3.4/testloss/0.7/udp/1234     # 70% packet loss
/ip4/1.2.3.4/testlat/0.250:1/udp/1234  # + ~ N(0.25, 1) s of latency
/ip4/1.2.3.4/testbw/10M:1M/udp/1234    # bandwidth ratio limit: 10Mbps down : 1Mbps up

Extending multiaddr table per Application

Is anything in the works to make it pluggable? In case I need to support some custom protocol for my app like:
"/ip4/104.236.151.122/david-protocol/best-protocol-eva!"

Release multiaddr to maven central

The build of dependent projects rely on this to be found on central or some other mvn repository. I can provide a PR that would allow to run

mvn release:prepare

I could then use the created tag to deploy it to central. Better still, someone else with appropriate authorization could run

mvn release:perform

Blocked by: ipld/java-cid#7

Where can I find a list of all the protocol codes?

Not just the ones in protocols.csv

Potential import collision: import path should be "github.com/rdumont/assistdog", not "github.com/hellomd/assistdog"

Background

I find that github.com/rdumont/assistdog and github.com/hellomd/assistdog coexist in this repo:
https://github.com/multiformats/multiaddr/blob/master/test/go.mod (lines 10 and 5)

github.com/hellomd/assistdog v0.0.0-20171107191847-c91d7a54538c 
github.com/rdumont/assistdog v0.0.0-20171107191847-c91d7a54538c // indirect

That's because assistdog has already renamed its import path from "github.com/hellomd/assistdog" to "github.com/rdumont/assistdog". When you use the old path "github.com/hellomd/assistdog" to import assistdog, Go reintroduces assistdog through the import statement "import github.com/rdumont/assistdog/…" in assistdog's own Go source files.

https://github.com/rdumont/assistdog/blob/c91d7a54538c6d66d976d4e71ad31b8661eddea3/assist.go#L11

package assistdog
import (
	"github.com/DATA-DOG/godog/gherkin"
	"github.com/rdumont/assistdog/defaults"
	…
)

The "github.com/rdumont/assistdog" and "github.com/hellomd/assistdog" paths refer to the same repo. Importing both works in isolation, but brings potential risks and problems.

Solution

Replace all the old import paths: change "github.com/hellomd/assistdog" to "github.com/rdumont/assistdog".
Where did you import it: https://github.com/multiformats/multiaddr/search?q=hellomd%2Fassistdog&unscoped_q=hellomd%2Fassistdog

Colon string format?

Should the string format be:

/ip4:192.168.0.1/udp:12345

/ip4/192.168.0.1/udp/12345

DNS name binary encoding

As mentioned in #22, /dns, /dns4, /dns6 & /dnsaddr do not currently have an official binary encoding defined.

In py-multiaddr we currently use Unicode for the text representation and IDNA-2008/Punycode for the binary representation. Obviously this distinction is only relevant for domains with labels containing non-ASCII characters.
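The text/binary distinction can be illustrated with Python's built-in `idna` codec. Be aware that this codec implements the older IDNA 2003 rules (RFC 3490), so it only approximates the IDNA-2008 behaviour described above and is not the code py-multiaddr uses; the function names are hypothetical.

```python
def dns_text_to_binary(name: str) -> bytes:
    """Unicode domain name -> ASCII/Punycode bytes (IDNA 2003 via stdlib)."""
    return name.encode("idna")


def dns_binary_to_text(raw: bytes) -> str:
    """ASCII/Punycode bytes -> Unicode domain name."""
    return raw.decode("idna")


ascii_form = dns_text_to_binary("bücher.example")
round_trip = dns_binary_to_text(ascii_form)
```

For pure-ASCII names such as example.com the two representations are byte-for-byte identical, which is why the question only matters for internationalized labels.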

How is this handled in other implementations? Is the current behaviour something that could be standardized or is another behaviour more desirable?

How to deal with ip4 and ip6?

Having /ip4 and /ip6 explicit is very useful and nice. Though there are use cases where people want to be able to represent either, such as being able to have a /dns multiaddr like:

/dns/example.com/tcp/80

This would resolve to either ip4 or ip6 addresses, depending on the application's capabilities:

/ip4/1.2.3.4/tcp/80
/ip6/aaaa::1/tcp/80

I'm not convinced this is necessary, we can likely do without.
But opening issue to gather thoughts
This is also related to combinations of multiaddrs

Improve Readme

The multiaddr readme needs to get to the level of multihash, etc.

Handle Both Types Of Onion Service Address

Right now it appears /onion is a fixed size which likely relates to Tor v2 onion service addresses. However, there is a longer v3 onion service address that is supported. Please either make a new protocol name for v3 or remove the size restriction on /onion.

EIP1577 - Multiaddr support for ENS

Conversation happening at https://ethereum-magicians.org/t/eip1577-multiaddr-support-for-ens/1969

Variable length addresses

At the moment, multiaddr only handles fixed-length addresses. If we were to add support for IPFS, it has variable length addresses (multihashes).

Not sure exactly how multiaddr should address this. Perhaps variable length addresses can be prefixed with their byte length when in binary packed form. e.g.

(<1 byte proto> (<n byte addr>))+
(<1 byte proto> (<1 byte addr size (n-1)><n-1 byte addr>))+
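
A minimal sketch of the length-prefixed layout proposed above, assuming a 1-byte protocol code followed by a 1-byte size and then the address bytes. The function names are hypothetical and the protocol codes in the usage lines are placeholders, not assignments from the protocol table.

```python
from typing import List, Tuple


def encode_component(proto_code: int, addr: bytes) -> bytes:
    """Pack one component as <1 byte proto><1 byte size><addr bytes>."""
    if not 0 <= proto_code <= 0xFF:
        raise ValueError("protocol code must fit in one byte")
    if len(addr) > 0xFF:
        raise ValueError("address too long for a 1-byte size prefix")
    return bytes([proto_code, len(addr)]) + addr


def decode_components(buf: bytes) -> List[Tuple[int, bytes]]:
    """Split a packed buffer back into (proto_code, addr) pairs."""
    out = []
    i = 0
    while i < len(buf):
        if i + 2 > len(buf):
            raise ValueError("truncated component header")
        code, size = buf[i], buf[i + 1]
        i += 2
        if i + size > len(buf):
            raise ValueError("truncated address")
        out.append((code, buf[i:i + size]))
        i += size
    return out


# Placeholder codes: 4 for a fixed 4-byte address, 42 for a variable one.
packed = encode_component(4, bytes([127, 0, 0, 1])) + encode_component(42, b"variable-length")
print(decode_components(packed))
```

A 1-byte size prefix caps addresses at 255 bytes; a varint size would lift that limit at the cost of a slightly more involved decoder.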

Why don't the protocol codes simply start at 0 and increment?

In protocols.csv, I see that the code column contains the integer representations of each protocol under multiaddress. However, I do not notice a pattern between each protocol and its respective code. Why don't the codes just start at 0 and increment for each new protocol added? Is there a pattern or set of rules that constitutes the assignment of a code number to a protocol?

MPTCP addresses

Data Roads Foundation projects have an interest in utilizing MultiPath TCP (MPTCP) connections between our partner cooperative's mesh nodes, edge caches, and VPN gateways -- especially in regard to our Unwatch.Me project. Multipath erasure, RAIL, or network coded UDP connections will probably be used instead in the future (to avoid repeat sends during congestion loss); but MPTCP already has kernel level support so it is usable today.

Optional addendum request: In addition to onion tunnels, we also have an interest in utilizing i2p tunnels where available, with the potential for load balancing or secret sharing between both.

In the multiaddr use case where CID or metadata records contain multiple multiaddr paths to the same content or transform set, similar to a Magnet URI or Metalink, then even where MPTCP is not implemented on the underlying platform this could be used to show a difference between separate content servers, vs. lone servers which merely have multiple accessible addresses and TCP pathways into the same hardware. This distinction may become important for load balancing in high performance or low latency tolerance distributed applications, for example real time multiplayer online games.

The protocol table entry could be shortened to mtcp if that is desirable.

Hopefully this site and post will address any objections to adding this "experimental" protocol to the supported list:

http://blog.multipath-tcp.org/blog/html/2017/01/04/experimental.html

As the MPTCP RFC 6824 linked in the post specifies the ability to send and receive an arbitrary number of accessible TCP/IP address:port pairs on each end of a given MPTCP session or flow connection, this /mtcp/ prefix could be interpreted to assume that an arbitrary number of (ip4|ip6/<address-value>/<port-num>/)+ 3-tuples will follow in the multiaddr sequence -- and that repeating tcp/ for each would be inefficiently redundant. Of course the multi-multiaddr use case summarized above could point to a different pattern, such as /mp/(<ip-version>/<address-value>/<protocol>/<port-num>)+ -- wherein the MultiPath nested protocol-per-path mix can be completely arbitrary per node, and the use of MPTCP becomes an optional and platform specific multi-multiaddr implementation detail.

I look forward to discussing all practical options here with anyone interested. Please point me to any prior related discussions I may have missed in my search.

Add DTLS protocol

We're considering building DTLS transports in libp2p/libp2p#49. Therefore, I suggest adding a 'dtls' multiaddr protocol. I'd like to have some input on the following:

We don't really need a 'value' part of the multiaddr to establish a connection. However, we could add a certificate fingerprint as a form of authentication. Is this a good use of the 'value'? Can this be made optional?
Should I open a PR on this repo, the multicodec repo, or both?

traceroute for multiaddr

link to the links :)

ipfs/notes#192

dnsaddr doesn't compose with p2p-circuit

For example, I can't have:

/dnsaddr/foo.com/p2p/QmRelay/p2p-circuit/p2p/QmTarget

This expects foo.com to resolve to /something/something/p2p/QmTarget.

Moving forward with multiaddr

Speaking with @lgierth we've come up with a preliminary todo list for making multiaddr fit for general usage:

Outline use cases (fill them out probably)
- dns over https should be included
Consolidate discussions about paths in segment values (url/unix paths)
Come up with multiaddr interface (what functions every implementation should expose)
Encoding/Decoding (WIP: #54)
Figure out how to handle partial decoding of multiaddr
- Basically be able to parse selected parts of a multiaddr even if there are missing codecs, allowing custom/experimental codecs
Update the entire ecosystem (go+js+rust first)
Document dnsaddr + (dnslink)
- make dnslink/dnsaddr separation clearer

Ambiguity in /memory text format

The /memory multiaddr can contain an arbitrary payload. But if this payload contains a /, then you can't properly serialize it without introducing an ambiguity.

I would propose to revert #92
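
The ambiguity can be shown with a short sketch. It assumes a naive text format that simply joins components with "/" and splits on it again; the helper names are hypothetical and this is not how any particular implementation parses.

```python
from typing import List


def serialize_memory(payload: str) -> str:
    """Serialize a /memory component by appending the raw payload."""
    return "/memory/" + payload


def naive_split(text: str) -> List[str]:
    """Split a multiaddr string into its slash-separated parts."""
    if not text.startswith("/"):
        raise ValueError("multiaddr text must start with '/'")
    return text[1:].split("/")


# A payload containing "/" ...
with_slash = serialize_memory("a/tcp/80")
# ... is indistinguishable from a shorter payload followed by a tcp component.
composed = serialize_memory("a") + "/tcp/80"

print(with_slash == composed)
print(naive_split(with_slash))
```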

python implementation

I am slowly but surely moving IPFS in the Python direction :)

I'm currently working on an implementation of multiaddr in python and I had a few questions.

What is the point of the binary format in terms of communicating with the IPFS API? Or is it only used internally?
In the binary format, is every part encoded as an unsigned varint? Or just the protocol codes and IPFS hashes?

Better readme?

Would be fantastic to have a better readme in this repo to showcase the benefits of the format

cc @whyrusleeping @diasdavid @lgeirth @RichardLitt

Note: String Address representations should support both lower and upper case versions

When adding string address representations that use both lower + upper case (e.g. base58), compatibility with case-sensitive file systems is broken. This ideally would be avoided.

Why?

You write of the URL standard:

This isn't optimal. Instead, addresses should be formatted so:

but you don't write WHY it isn't optimal and WHY your new standard is better...

/ws with paths not supported?

It appears that specifying a path with websockets is not supported. This isn't explicit (the /protocols dir from the README is missing) but all the examples I can find use a /ip4/x.x.x.x/tcp/XXXX/ws type format and the test file for go-ipfs specifies /ws/foo as an invalid test case.

e.g. the javascript library ws does allow you to specify a path:

new WebSocket('ws://www.host.com/path')

while this isn't meaningful on the websockets side, it can be useful when e.g. using nginx to upgrade a connection to a particular path. Was this a conscious decision?

IPCIDR missing from table

/ip4/192.168.0.0/ipcidr/16 is mentioned in the docs as a swarm address filter. However, ipcidr is not defined in the protocols.csv table.
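
For readers unfamiliar with the filter form, here is a sketch of one plausible reading, assuming the ipcidr value is a CIDR prefix length applied to the preceding ip4 address. The function name is hypothetical and the parsing is deliberately limited to this exact four-part shape.

```python
import ipaddress


def parse_ip4_cidr_filter(text: str) -> ipaddress.IPv4Network:
    """Interpret /ip4/<addr>/ipcidr/<bits> as an IPv4 network.

    Assumption: <bits> is the CIDR prefix length for <addr>.
    """
    parts = text.strip("/").split("/")
    if len(parts) != 4 or parts[0] != "ip4" or parts[2] != "ipcidr":
        raise ValueError("expected /ip4/<addr>/ipcidr/<bits>")
    return ipaddress.IPv4Network(f"{parts[1]}/{parts[3]}")


net = parse_ip4_cidr_filter("/ip4/192.168.0.0/ipcidr/16")
# Membership test: does a given address fall inside the filtered range?
print(ipaddress.IPv4Address("192.168.12.7") in net)
print(ipaddress.IPv4Address("10.0.0.1") in net)
```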

Vague format specification.

This format seems very vague as to the specifics of each protocol.

It may seem very intuitive that an ipv4 address (a 32 bit unsigned integer) should look like <bits 31 to 24 in decimal>.<bits 23 to 16 in decimal>.<bits 15 to 8 in decimal>.<bits 7 to 0 in decimal> (2130706433 should be represented as 127.0.0.1) but I think this should be clearly specified somewhere. The same goes for the binary representation, network byte order may seem like the obvious choice but how do I know what to use for a multiaddr?

These are just protocols which I know of and they have well known and established address representations, there are plenty of protocols in the csv with which I am not familiar. I am not sure how to go about implementing multiaddr without this information.
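
The IPv4 example above can be written out as a short sketch. It assumes dotted-decimal text and network (big-endian) byte order for the binary form, which is the kind of detail the issue asks the spec to state; the function names are hypothetical.

```python
import ipaddress


def ip4_to_text(value: int) -> str:
    """Render a 32-bit unsigned integer as dotted-decimal, high byte first."""
    return str(ipaddress.IPv4Address(value))


def ip4_to_bytes(value: int) -> bytes:
    """Pack a 32-bit unsigned integer in network byte order (big-endian)."""
    return value.to_bytes(4, "big")


print(ip4_to_text(2130706433))
print(ip4_to_bytes(2130706433).hex())
```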

1 byte binary representation vs. protocol codes > 255

The spec states that in the binary representation every protocol has a 1 byte code. However, quite a few protocols are defined with a number that cannot be represented in 1 byte.
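
One common way to represent codes above 255 is an unsigned varint (7 data bits per byte, high bit set on every byte except the last). The sketch below shows that technique under the assumption that it would be applied to protocol codes; the function names are hypothetical and the values used are only illustrative.

```python
from typing import Tuple


def encode_uvarint(n: int) -> bytes:
    """Encode a non-negative integer as an unsigned varint (LEB128 style)."""
    if n < 0:
        raise ValueError("uvarint cannot encode negative numbers")
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)         # final byte, high bit clear
            return bytes(out)


def decode_uvarint(buf: bytes) -> Tuple[int, int]:
    """Decode a uvarint from the start of buf; return (value, bytes_read)."""
    value = 0
    shift = 0
    for i, b in enumerate(buf):
        value |= (b & 0x7F) << shift
        if not b & 0x80:
            return value, i + 1
        shift += 7
    raise ValueError("truncated uvarint")


# Codes below 128 still take one byte; larger codes take two or more.
print(encode_uvarint(6).hex())
print(encode_uvarint(300).hex())
print(decode_uvarint(encode_uvarint(300) + b"\x00"))
```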

The fate of /libp2p-webrtc-direct

@diasdavid IIRC it doesn't work, so should we remove it?

Compressed Multiaddr Combinations

When handling many multiaddrs that have common prefixes, or sections, it has been discussed that it may be useful to have a format for expressing these as combinations, or DAGs. This is to allow much more compressed transmissions, and even in-memory representations.

For example, the list:

/ip4/1.2.3.4/tcp/4001/ipfs/Qmf8SVETTnpzzCJyurQa2ekxFwKnUNNYycLHsNfVjiq19B
/ip4/1.2.3.4/udp/4002/utp/ipfs/Qmf8SVETTnpzzCJyurQa2ekxFwKnUNNYycLHsNfVjiq19B
/ip4/127.0.0.1/tcp/4001/ipfs/Qmf8SVETTnpzzCJyurQa2ekxFwKnUNNYycLHsNfVjiq19B
/ip4/127.0.0.1/udp/4002/utp/ipfs/Qmf8SVETTnpzzCJyurQa2ekxFwKnUNNYycLHsNfVjiq19B
/ip6/::1/tcp/4001/ipfs/Qmf8SVETTnpzzCJyurQa2ekxFwKnUNNYycLHsNfVjiq19B
/ip6/::1/udp/4002/utp/ipfs/Qmf8SVETTnpzzCJyurQa2ekxFwKnUNNYycLHsNfVjiq19B

OR-Lists

It is an expansion of the combinations

/ip4/1.2.3.4 OR /ip4/127.0.0.1 OR /ip6/::1
/tcp/4001 OR /udp/4002/udt
/ipfs/Qmf8SVETTnpzzCJyurQa2ekxFwKnUNNYycLHsNfVjiq19B

This notation could be a much more compressed way to represent these than the full list.

DAGs

Another such representation could treat it like a DAG:

n0 := /ip4/1.2.3.4
n1 := /ip4/127.0.0.1
n2 := /ip6/::1
n3 := /tcp/4001
n4 := /udp/4002/udt
n5 := /ipfs/Qmf8SVETTnpzzCJyurQa2ekxFwKnUNNYycLHsNfVjiq19B
n6 := n0 OR n1 OR n2
n7 := n3 OR n4
n8 := n6 AND n7 AND n5

This one could also be very compressed, and allow representing much more versatile lists.
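
Expanding either compressed form back into the full list amounts to a Cartesian product over the OR groups, joined in AND order. A minimal sketch, using the values from the OR-list as written above; the function name and the list-of-lists representation are assumptions, not a proposed format.

```python
from itertools import product
from typing import List


def expand(groups: List[List[str]]) -> List[str]:
    """Expand AND-ed groups of OR alternatives into full multiaddr strings.

    Each inner list holds alternatives (OR); the outer list is the
    sequence in which they are concatenated (AND).
    """
    return ["".join(parts) for parts in product(*groups)]


groups = [
    ["/ip4/1.2.3.4", "/ip4/127.0.0.1", "/ip6/::1"],            # n6
    ["/tcp/4001", "/udp/4002/udt"],                            # n7
    ["/ipfs/Qmf8SVETTnpzzCJyurQa2ekxFwKnUNNYycLHsNfVjiq19B"],  # n5
]

for addr in expand(groups):
    print(addr)
```

Going the other way, from an arbitrary list to a compact set of groups, is the harder problem listed below.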

Some open problems with this:

Transforming lists of addrs into these reprs.
Dealing with changes efficiently (one node added or dropped) -- this may turn out pretty easy
Use it for the wire format transmission of a big multiaddr set
Use it for the in-memory representation of a big multiaddr set! (this could be very cool)
Making all this super efficient.

I'm not convinced how useful vs complex this would be.
Opening to gather thoughts.
Wonder if there's already some generic way of doing this over any set of strings that we can just use.