ipfs / go-bitswap Goto Github PK

View Code? Open in Web Editor NEW

217.0 39.0 112.0 3.08 MB

The golang implementation of the bitswap protocol

License: MIT License

Go 100.00%

go-bitswap's People

Stargazers

Watchers

Forkers

ivan386 taylormike john-waitforit cannium dms3-fs dgrisham luceas chaog b5 kjzz jbenet huiyiwyh jangocheng zenground0 lordgenry raulk bpot mattskala wade-welles proximax-storage limomomo jayd2446 rtradeltd dirkmc hinshun jimpick swedneck kpp mgr9525 holajiawei mrd0ll4r diaakasem npfoss ethfsx rchaves-veniam michaelmure adlrocha wolneykien jatofg tzdybal badkk joaosilvestre95 isabella232 yourkeychen daotl ronsherfey coryschwartz hubertwng crustio paulteck jorropo paulzyf 0racl3z shaheerbakali vdamle synzhu peergos wcgcyx compscidr korlaism mazhaojia123 matias-correia quieoo guseggert sitedata laurentsenta mjjdeluxe cpucorecore tomsfreitas iand ariescodescream hlm-ipfs diogo464 salvivona rmseq datahop isgasho yoshikawa0711 uss2022sayahi ebma aetheriaxai lukax gitaaron savetherbtz epikd panicalways oascigil peterargue onflow andriykalashnykov pengandkun gmh5225 guillaumemichel rishvin alienc0der phantue99 galargh nujz parthshah1 isaac-defrain

go-bitswap's Issues

Avoid keeping blocks in memory longer than we need to

When we get a block, we currently send it to every interested session (SessionManager.ReceiveBlocksFrom). Looking into this, we mostly just need the CID. That would:

Save us some memory (sometimes).
Avoid keeping blocks we didn't ask for around. We currently do this to track duplicates but I'd rather toss them ASAP.

We still need the blocks themselves to send them to the user but we should be able to avoid that by removing the per-session PubSub instance and reverting to a global PubSub instance.

Store a complete wantlist

As far as I can tell, sessions mean that we don't actually store a complete wantlist anywhere. This is extremely frustrating from a UX standpoint and makes solving #21 difficult.

Meta: Directions For Bitswap Improvement

As @dirkmc takes over on a lot of Bitswap maintenance, just want to summarize what I see as possible gotchas and directions for bitswap improvement:

Sessions (requesting blocks):

Adjusting request splitting to incorporate less optimized peers in requests on a periodic random basis

See existing PR: #157

Improve/fix request splitting based on duplication

Currently, the session request splitter attempts to minimize duplication of blocks by splitting requests so each block request goes to fewer peers when bitswap is receiving lots of duplicates, in an attempt to bring that number down.

There is at least one case where this is not a good approach:
#120

Specifically, if we fail to get a response from a request, we fall back to broadcast (see sessions.handleTick) -- which tends to result in more duplicates, which tends to cause us to split more, which in the case where we are already failing to fetch blocks, is the opposite of what we want.

My recommendation is to reset the split to a low value whenever we miss a block (i.e. handleTick) and reset dup tracking as well, and not start again till the blocks we broadcast for are received. alternatively, we could track if a want was targetted or broadcast, and not count any broadcasts in dup tracking, since they always produce dupes.

Another improvement here: currently the splitting is very binary and goes from one extreme to the other, because the code for adjusting it is so simple. It could be made less sensitive by making adjustments less frequent the longer a session is running.

Increase Wantlist size

There's no really good reason to limit session wants to 32 at this point -- it oughta at least be a slightly bigger number. Low hanging, needs to be tested

Returning blocks

Increasing TaskWorker Concurrency

ipfs/boxo#116 -- probably can't hurt, is low hanging, but would need to be experimentally tested to see if it helps.

Making Bitswap less error prone

It's hard to tell how much tis affects things these days, but one potential slowdown for bitswap is that the protocol has no error correction, meaning that frames can get dropped without noticing. This can result in lost want requests. Right now the only fix is to periodically rebroadcast the whole wantlist (see rebroadcastTimer in messageQueue.go). Another fix I proposed was to extend the protocol: ipfs/specs#201

Improve decision logic

The previous bitswap maintainer @whyrusleeping stated "The peer request queue is a priority queue that sorts available tasks by some metric, currently, that metric is very simple and aims to fairly address the tasks of each other peer. More advanced decision logic will be implemented in the future." I've yet to touch this in my time on bitswap, but I wonder what could be accomplished here, since most of my work has been around the requesting of blocks rather than the sending of them.

Tracking Performance

These wouldn't necessarily improve performance themselves but might help identify bottlenecks more effectively:

Better Simulated Networks

There is a PR that would add simulated DHT queries to Benchmarks -- #136 -- I'm not sure that's the best direction

Honestly, I think realistic testbeds at a higher level are the best option here

Better tracing/logging

If we get a test bed that supports Jaeger, I think tracing would be the key tool to actually tracking performance. That would be able to give us actual time elapsed in a request in different parts of the code, and to see how calls actually get made in real world scenarios.

Fair warning: Logging and tracing may end up touching go-log, which has its own rats nest of tasks to make it better, which have lingered for some time.

I'll add more ideas if I think of them -- @hannahhoward

Panic when using sessions

This:

func TestSession(t *testing.T) {
	ctx := context.Background()
	p1, p2, closer := setupPeers(t)
	defer closer(t)

	m := map[string]string{
		"akey": "avalue",
	}

	codec := uint64(multihash.SHA2_256)
	node, err := cbor.WrapObject(m, codec, multihash.DefaultLengths[codec])
	if err != nil {
		t.Fatal(err)
	}

	t.Log("created node: ", node.Cid())
	err = p1.Add(ctx, node)
	if err != nil {
		t.Fatal(err)
	}

	sesGetter := p2.Session(ctx)
	_, err = sesGetter.Get(ctx, node.Cid())
	if err != nil {
		t.Fatal(err)
	}
}

results in:

--- FAIL: TestSession (0.63s)
    ipfs_test.go:151: created node:  bafyreigzkampgfuhmld36ljrywwcxogf5zyjbkasehbggbmgpy5tmrygpe
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
        panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x4ee586]

goroutine 597 [running]:
testing.tRunner.func1(0xc0003a5e00)
        /usr/lib64/go/1.12/src/testing/testing.go:830 +0x392
panic(0xbbefc0, 0x150a8c0)
        /usr/lib64/go/1.12/src/runtime/panic.go:522 +0x1b5
context.propagateCancel(0x0, 0x0, 0xed0760, 0xc000372b80)
        /usr/lib64/go/1.12/src/context/context.go:242 +0x26
context.WithCancel(0x0, 0x0, 0x7, 0x2a, 0x0)
        /usr/lib64/go/1.12/src/context/context.go:231 +0x9c
github.com/ipfs/go-bitswap/sessionmanager.(*SessionManager).NewSession(0xc00065a180, 0x0, 0x0, 0x0, 0xc0005e3ce8)
        /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/sessionmanager/sessionmanager.go:69 +0x6b
github.com/ipfs/go-bitswap.(*Bitswap).NewSession(0xc0000c5970, 0x0, 0x0, 0x40452f, 0xc0005e3d50)
        /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/bitswap.go:401 +0x46
github.com/ipfs/go-blockservice.(*Session).getSession(0xc00036eaa0, 0x0, 0x0)
        /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/blockservice.go:337 +0xbe
github.com/ipfs/go-blockservice.getBlock(0xedb4c0, 0xc0000ac010, 0xc000456450, 0x24, 0xee2360, 0xc0003131a0, 0xc0005e3e50, 0xc0000100f0, 0xc0005e3e78, 0x8de929, ...)
        /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/blockservice.go:224 +0x209
github.com/ipfs/go-blockservice.(*Session).GetBlock(0xc00036eaa0, 0xedb4c0, 0xc0000ac010, 0xc000456450, 0x24, 0xc0004b4aa0, 0x7f924dc83f50, 0xc0004b4aa0, 0xc0003a5e01)
        /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/blockservice.go:345 +0x92
github.com/ipfs/go-merkledag.(*sesGetter).Get(0xc0000100f0, 0xedb4c0, 0xc0000ac010, 0xc000456450, 0x24, 0x0, 0x0, 0xc0000b27c0, 0x29)
        /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/merkledag.go:136 +0x66
github.com/hsanjuan/ipfs-lite.TestSession(0xc0003a5e00)
        /home/hector/go/src/github.com/hsanjuan/ipfs-lite/ipfs_test.go:158 +0x361
testing.tRunner(0xc0003a5e00, 0xe00b40)

Bad log formatting in workers.go

Just saw this in my IPFS logs:

16:07:31.315 DEBUG    bitswap: %!s(int=1)  keys in bitswap wantlist workers.go:188

Apparently you are using log.Debug(n, " keys in bitswap wantlist") to print this message (which btw is valid according to the interface), it's however not implemented correctly by go-logger, which uses %s as the format string for all parameters.

My initial idea was that this should be fixed in the logger, but changing it from %s to %v might have cascading adverse effects on the entire IPFS codebase, so maybe it's better to roll with the fault and maybe document the issue?

cc @whyrusleeping

Odd test failure.

--- FAIL: TestRateLimitingRequests (0.02s)
    providerquerymanager_test.go:268: Queries made: 7
    providerquerymanager_test.go:269: Did not limit parallel requests to rate limit

Test suite unstable on CI after bitswap refactor

Getting a lot of unstable builds, based on new tests in bitswap refactor.

Cleanup block receiving logic

We appear to send blocks to sessions twice: once using ReceiveBlockFrom and once using UpdateRecevieCounters. We should be able to do this all in one go.

Improve Request Patterns

#165 outlines a typical request pattern for blocks from Bitswap. For example when downloading a file, IPFS will typically ask for the root block, then the next layer of blocks, then the sub-layers of each layer etc.

It's advantageous to spread the file around as quickly as possible between peers, so that

there is less load on the seed peer
peers can more quickly begin downloading from each other first
- higher bandwidth (more peers to download from)
- natural load balancing
- less load on the network (download from nearer peers)

A common strategy (used by eg BitTorrent) is to download random blocks and to try to get the rarest piece first. Because Bitswap doesn't have manifest files, we don't know in advance which peer has each block, and it can be expensive to find out.

If we assume that the application layer will call Bitswap with groups of block CIDs at a time (eg all the children of the root node) then one strategy for spreading out the data could be to request those block CIDs from the network in random order. In this case we should still request all CIDs from one GetBlocks() call before requesting any CIDs from subsequent GetBlocks() calls. For example:

GetBlocks(cid1, cid2, cid3)
GetBlocks(cidA, cidB, cidC)
Bitswap -> cid3, cid1, cid2, cidC, cidB, cidA -> Network

Bitswap Sessions Exploration

Bitswap sessions group together requests for related blocks (eg the blocks that make up a file) to take advantage of the fact that usually related blocks can be retrieved from the same peers.

This Issue describes the current implementation of sessions.

Background

In IPFS a file is broken up into blocks that are organized into a DAG. The file is identified by the CID of the block representing the DAG root. To fetch a file:

IPFS asks Bitswap for the block corresponding to the root CID
Bitswap retrieves the block with that CID
IPFS asks Bitswap for the blocks for the root's children
Bitswap retrieves the children
IPFS asks Bitswap for the children of those children, and so on

If these requests are all made using a single Bitswap session, the session remembers which peers have responded before, how "far" each peer is (in terms of network latency) and manages splitting the requests across peers and combining the responses.

Session Implementation

Timers

The session maintains two timers:

idle timer (1 second by default)
- triggered if no block has been received in the idle interval
- broadcasts the wantlist to all connected peers
- searches for more peers that have the first CID in the wantlist (eg by asking the DHT)
  The idle interval triggers after 1 second by default. Once at least one peer has responded to a request, the idle interval is set to 500ms + (3 x the average latency), with an increasing back-off each time the idle timer is triggered (if no block was received in the interim).
periodic search timer (1 minute by default)
- triggered periodically
- broadcasts a random CID from the wantlist to all connected peers
- searches for more peers that have the random CID (eg by asking the DHT)

Request sharding

Peer list

As peers are discovered, they are added to a peer list. Each request to a peer is timed, and the latency for the peer is adjusted according to the formula:
latency = <current latency> * 0.5 + <new latency> * 0.5

Live wants

The session maintains a list of "live" wants, ie requests for a CID that have not yet been fulfilled. There can be up to 32 live wants at a time.

Requests are made when

a new request for block CIDs is made via GetBlock() or GetBlocks()
- the free space in the live want queue is filled (up to 32 slots)
- any CIDs that don't fit are put into a secondary queue
a response comes in
- the CIDs from the response are removed from the live want queue
- the live want queue is filled with CIDs from the secondary queue

The CIDs that were added to the live want queue are sent out as a want request

Request Splitting

The first want request is broadcast to all peers that this node is connected to. As responses come in, the peers are added to the peer list in order of latency.

Once there are peers in the peer list, any subsequent want request

selects the "closest" (by latency) peers from the peer list, up to a maximum of 32
splits the peers into groups
splits the wanted CIDs into groups
sends each CID group to the peers in a peer group

For example if there are 8 peers (A - H) and 10 CIDs (0 - 9), and the split factor is 3:

cid0 cid3 cid6 cid9  ->  A D G
cid1 cid4 cid7       ->  B E H
cid2 cid5 cid8       ->  C F

Often more than one peer will have the same block (eg peers A & D might both have cid0). When the local node receives a block, it broadcasts a cancel message for that block CID to all connected peers. However the cancel may not be processed by the recipient peer before it has sent out the block, meaning the local node will receive a duplicate of the block. The local node keeps track of the ratio of duplicates / received blocks and adjusts the split factor:

If the ratio goes above 0.4 (a lot of duplicates) the split factor is increased. This means that the same CID will be sent to less peers, eg with a split factor of 5 for the above example:

  cid0 cid5  ->  A F
  cid1 cid6  ->  B G
  cid2 cid7  ->  C
  cid3 cid8  ->  D
  cid4 cid9  ->  E

If the ratio goes below 0.2 (very few duplicates) the split factor is decreased. This means that the same CID will be sent to more peers, eg with a split factor of 2 for the above example:

  cid0 cid2 cid4 cid6 cid8  ->  A C E G
  cid1 cid3 cid5 cid7 cid9  ->  B D F H

The split factor is initially set to 2 and can range from 1 - 16.

Currently the peer list is ordered linearly by latency, however there is a WIP PR to adjust the list order probabilistically according to latency.

Deduplicate Bitswap.GetBlocks with Session.GetBlocks

We need to be careful to avoid introducing a bunch of overhead but, ideally, Bitswap.GetBlocks would just call bs.NewSession(ctx).GetBlocks(ctx, cid).

Tests are unreliable

Particularly sessions tests, but there are various tests that fail CI somewhat frequently. We should to do a reliablility pass by running each individual test ~100 times locally -- https://varunksaini.com/blog/run-go-test-multiple-times/

Tag peers in request queue

We should tag peers in the request queue using the connection manager.

Wantlist Race B

Forked from: #99

Problem

Peer A connects to peer B.
Peer A sends its wantlist to peer B.
Peer A disconnects from peer B.
Peer B notices the disconnect, forgets peer A's wantlist.
Peer A immediately reconnects to peer B.
Peer A notices the disconnect, sees that it still has a connection, doesn't resend it's wantlist.

Solution

The only thing I can think of is the following: whenever the stream we're using to send wantlists closes, we (a) open a new one and (b) send the complete wantlist. Unfortunately, I really wanted to eventually move away from keeping a stream open entirely but, we'll, I can't think of a better solution.

Should the client repeat the broadcast block request?

scene

I have two nodes, which are in different LANs. They use a server on the public network for port forwarding, so that the two nodes are connected. The connection network bandwidth is about 5M and it is not very stable.

The A node has a resource of about 200M. When the Node B obtains the resource through the ipfs gateway, it will block on some data blocks and cannot obtain data.

The wantlist of node A is always hold hash of these data blocks, and the ledger of node B also has wantlist for node A, indicating that node B has received all data block requests, but for some reason The Node B does not send the data block to the A node, causing the A node to wait for the response of these data blocks.

If the A node can repeat the request to broadcast these data blocks, the B node can respond correctly when it receives the data block request again.

I am using ipfs private network

ipfs version: 
go-ipfs version: 0.4.18-
Repo version: 7
System version: amd64/linux
Golang version: go1.11.1

my point of view

The client should repeatedly broadcast long waiting data block requests. In order to solve the problem that the nearby nodes have resources but cannot be quickly obtained.

Improve Request Sharding

#165 outlines session request sharding in Bitswap. Each session maintains a list of peers that have responded to a wantlist request with a block, ordered by latency. Requests are split such that groups of CIDs (wants) are sent to groups of peers. As responses come in, the session adjusts the split factor up or down to maintain a balance between redundancy and too many duplicates.

Responses are processed on a block-by-block basis

The live want queue has 32 slots. Once the queue is full, then each time a response block is processed

one slot in the live want queue opens up (the slot the response block CID occupied)
the slot is filled with one CID (want) from the secondary queue
the CID is sent out as a new want request

However the splitter is designed to work over groups of blocks, not a single block. So regardless of the split factor, blocks will always be sent to the same peers until the split factor changes.

For example if there are 8 peers (A - H) and the split factor is 3, the peers will be split into three groups. The CID will be sent to the peers in the first group and the other two groups will be disregarded:

cid        ->  A D G
<none>     ->  B E H
<none>     ->  C F

All blocks will be sent to peers A, D and G until the split factor changes. Regardless of the split factor, blocks will always be sent to peer A.

Shard wants on a streaming basis

It would be better for a few reasons if all blocks in the response were processed as a group, however quite often the response contains a small number of blocks anyway, so any solution should be able to handle a single block as well as several blocks.

There is a WIP PR to adjust the peer list order probabilistically, which should mitigate the problem, as the order of peers will change somewhat with each request. However it may be easier to reason about the problem if wants are instead sharded on a streaming basis.

Sharding should

Probabilistically select N peers to send the block to.
N is analogous to the split factor: it should vary to find a balance between redundancy and too many duplicates.
Select peers with low latency and high probability of having the target block.
Balance load so peers don't get over-loaded.
Ask nearby peers for blocks even if they didn't have any so far.
Asking nearby peers we are already connected to for a block is a relatively cheap operation. If several peers are downloading the same data, we want them to quickly start downloading from each other so as to
- reduce load on the "seed" peer and the network as a whole
- increase bandwidth (download from several peers at once)

Concurrent map access in Entry.SesTrk

This happens because we send shared *Entry objects to each message queue but each message queue then goes and modifies the associated sessions.

IMO, the correct fix is to decouple state tracking from these entries.

Adding configuration to allow BitSwap receiving not wanted blocks

The issue is that I want to use BitSwap not only to handle blocks it wants, but also any other sent block. Currently BitSwap checks every received block through WantManager's IsWanted() method and just skips it, if false.

In my opinion, this feature may add some flexibility to BitSwap and it could be done by adding global variable like var AllowNotWanted = false or even optioning through options pattern. I need this feature as soon as possible and there is no other way, like implementation substitution, to change behaviour except original code modification. I would be glad to contribute, but first you need to make a decision regarding to the necessity of the change.

Thank you for attention.

Deduplicate provider logic

We currently find providers in both the providerQueueManager and in the sessions. We should:

Improve the session's version of this to match what we have in the providerQueueManager. That is, we should be deduplicating and rate limiting requests.
Get rid of the duplicate logic. We probably don't even need the reboadcastWorker. Sessions should handle this for us.

Occasionally try to find additional providers even if we already have them.

Currently, once we've found at least one peer providing blocks for us, we stop looking no matter how slow that peer is. Instead of settling, we should occasionally search for better providers.

Use different types for our wantlists and peer wantlists

We use them differently and we can save some overhead by using a more compact datastore for tracking peer wants. Specifically, due to the lack of sessions, we can trivially use a sync.Pool to reuse entries.

Don't decrease split on unique message

We should decrease the split when we fail to receive a message. Otherwise, we'll get a minimum of 50% duplicate blocks if everyone has every block.

Avoid calling FindProviders

We currently call FindProviders even if we don't need to. We need to be less eager about this.

improper timer interval lead to out of memory and high load.

go-bitswap/session/session.go

Line 399 in 2c47a55

s.tick.Reset(provSearchDelay)

provSearchDelay is 1s, which is too short, can lead to high system load, and memory consumption when pin many rare resources, then ipfs daemon process hung and dead.

provSearchDelay should not be used here.

a quick fix
s.tick.Reset(time.Minute)

How to set node peer-ID with higher priority for specific hash-key...?

My workplace node (upload 800Mbps), my home(upload 10Mbps) node.
Both host the same content.
https://transparentgov.net/json2tree/ipfsList.html

slow node always been used, even the fast node available.
I want to use fast node first.
Can I specify the fast(high bandwidth) node have higher priority?

I try: ipfs dht provide hash-key...

But it does not work.

What this command really does?
Does it inform IPFS that, my node peer-ID provide this hash-key...?

How to set node peer-ID with higher priority for specific hash-key...?

Bitswap not sending wanted blocks

I'm following up on ipfs/js-ipfs#1874 and I'm seeing an issue with the go-ipfs nodes deployed as the preload nodes for js-ipfs.

The situation I'm seeing is:

browser js-ipfs node is connected to go-ipfs node over websockets (wss)
browser node adds CID to wantlist
bitswap message is sent to go-ipfs successfully (as far as I can tell)
no response from go-ipfs node with block data
browser node recieves multiple messages from the go-ipfs node with it's wantlist (so communication is possible)

We know that this node HAS this content because https://node0.preload.ipfs.io/api/v0/refs?r=true&arg=QmS8hKtny73KoTh8cL2xLoLHjMDjkG9WYF3AHAdd9PHkxo returns 200.

@scout found this is the syslogs for the go-ipfs node:

bitswap: #033[0mSendMsg errored but neither 'done' nor context.Done() were set #033[0;37mwantmanager.go:236#033[0m

Godocs, particularly about NewSession()

Many methods in this module have no attached documentation.

Particularly the NewSession() method is undocumented. How do sessions work in bitswap? Can I use this Fetcher just like the normal one? When should we create NewSessions? Is it meant to be a long lived, fetcher that we use for all the lifetime of the application? If so, how do I make bitswap just use it by default? If not, there does not seem to be a way to cancel sessions, how does that happen?

What is an Instance? What is a SessionGenerator? Are we supposed to make use of PeerConnected, PeerDisconnected ? Does this have bad effects when not?

Setup Circle CI

Where is this formula implemented?

As the above image show, where is the formula implemented? I read this module and didn't found.

Memory leak in the latency tracker.

No requests are running but we appear to be leaking a bunch of these.

leak.zip

Another bad log formatting in workers.go

https://github.com/ipfs/go-bitswap/blob/master/workers.go#L262 uses a formatting string, but calls the non-formatting method:

16:30:41.383 DEBUG    bitswap: failed to connect to provider %s: %s <peer.ID Qm*iX9c5z> context deadline exceeded workers.go:242

High Level Docs On How Sessions Work

As part of developing a high level bitswap documentation, it would be great to document what sessions are, how you use them, and the core strategies used by sessions to speed up block transfer. Could live in docs folder or README

Initial wantlist handling is racy

The race is as follows:

Start processing the "Connect" event.

Read the current broadcast wantlist in:

go-bitswap/bitswap.go

Lines 438 to 444 in 916de59

 // Connected/Disconnected warns bitswap about peer connections. 

 func (bs *Bitswap) PeerConnected(p peer.ID) { 

 initialWants := bs.wm.CurrentBroadcastWants() 

 bs.pm.Connected(p, initialWants) 

 bs.engine.PeerConnected(p) 

 }

Add a new wantlist entry.

Start the peer's message handler:

go-bitswap/peermanager/peermanager.go

Lines 158 to 170 in 916de59

 func (pm *PeerManager) startPeerHandler(p peer.ID, initialEntries []*wantlist.Entry) PeerQueue { 

 mq, ok := pm.peerQueues[p] 

 if ok { 

 mq.RefIncrement() 

 return nil 

 } 

 mq = pm.createPeerQueue(p) 

 pm.peerQueues[p] = mq 

 mq.Startup(pm.ctx, initialEntries) 

 return mq 

 }

In this last function, initialEntries will be missing the wantlist entry from step 3.

Don't discard unwanted blocks we actually want

Currently, we discard blocks if they're not in the wantlist. However, we should probably accept them anyways if they're in the interest set.

Fail to get block from connected peer

I have a node that is connected to another node <peer.ID Qm*ZtVMeB>, <peer.ID Qm*ZtVMeB> has a data block QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE, but the request has not been successful, the log is printed under loop mistake.

12:30:29.307  INFO    bitswap: want blocks: [QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE] wantmanager.go:77
12:30:29.308 DEBUG    bitswap: New Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:29.308 DEBUG    bitswap: Beginning Find Provider Request for cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:230
12:30:29.309 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*BvdXJv>: dial backoff providerquerymanager.go:242
12:30:29.313 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*DNvq9J>: dial backoff providerquerymanager.go:242
12:30:29.313 DEBUG    bitswap: Received provider (<peer.ID Qm*ZtVMeB>) for cid (QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE) providerquerymanager.go:323
12:30:29.391 DEBUG    bitswap: Finished Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:30.990  INFO    bitswap: want blocks: [QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE] wantmanager.go:77
12:30:30.990 DEBUG    bitswap: New Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:30.990 DEBUG    bitswap: Beginning Find Provider Request for cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:230
12:30:30.991 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*BvdXJv>: dial backoff providerquerymanager.go:242
12:30:30.993 DEBUG    bitswap: Received provider (<peer.ID Qm*ZtVMeB>) for cid (QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE) providerquerymanager.go:323
12:30:30.993 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*DNvq9J>: dial backoff providerquerymanager.go:242
12:30:31.831 DEBUG    bitswap: Finished Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:32.677  INFO    bitswap: want blocks: [QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE] wantmanager.go:77
12:30:32.677 DEBUG    bitswap: New Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
12:30:32.678 DEBUG    bitswap: Beginning Find Provider Request for cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:230
12:30:32.678 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*BvdXJv>: dial backoff providerquerymanager.go:242
12:30:32.683 DEBUG    bitswap: Received provider (<peer.ID Qm*ZtVMeB>) for cid (QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE) providerquerymanager.go:323
12:30:32.683 DEBUG    bitswap: failed to connect to provider <peer.ID Qm*DNvq9J>: dial backoff providerquerymanager.go:242
12:30:33.097 DEBUG    bitswap: Finished Provider Query on cid: QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE providerquerymanager.go:323
......

Executing the following command on the <peer.ID Qm*ZtVMeB> node is expected output

 > ipfs bitswap wantlist -p QmWrvYBRZm2jeffxyVtomoa3e8amVMXVyzzkdKXkzHRUV1
QmUmeNvU1HN5tLk8T6ihrVRbHuF2EGw4rF1UBqoGVudFXE

I am using a private network， the ipfs verion:

ipfs version --all
go-ipfs version: 0.4.19-
Repo version: 7
System version: amd64/darwin
Golang version: go1.12

Remove the public CancelWants method

This should be private. I believe this exists because go-ipfs used to provide a way to cancel wants but we removed that as it was error prone.

Users who don't want something should just stop asking for it...

Limit the number of retries when sending messages

We should give up if we keep failing instead of retrying until we fail to connect.

go-bitswap/messagequeue/messagequeue.go

Lines 164 to 170 in 916de59

 // send wantlist updates 

 for { // try to send this message until we fail. 

 if mq.attemptSendAndRecovery(ctx, wlm) { 

 return 

 } 

 } 

 }

Reject unwanted blocks

As far as I can tell, we don't actually do this.

Data races in go-bitswap

I hit these when running my tests, but I've only seen it once.

==================
WARNING: DATA RACE
Read at 0x00c02961bea0 by goroutine 429:
  github.com/ipfs/go-bitswap/network.(*netNotifiee).Connected()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/network/ipfs_impl.go:218 +0x6b
  github.com/libp2p/go-libp2p-swarm.(*Swarm).addConn.func1()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:221 +0x72
  github.com/libp2p/go-libp2p-swarm.(*Swarm).notifyAll.func1()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:452 +0x71

Previous write at 0x00c02961bea0 by goroutine 193:
  github.com/ipfs/go-bitswap/network.(*impl).SetDelegate()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/network/ipfs_impl.go:138 +0x3e
  github.com/ipfs/go-bitswap.New()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/bitswap.go:121 +0xf59
  github.com/hsanjuan/ipfs-lite.New()
      /home/hector/go/pkg/mod/github.com/hsanjuan/[email protected]/ipfs.go:91 +0x76a
  github.com/ipfs/ipfs-cluster/consensus/crdt.(*Consensus).setup()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/consensus/crdt/consensus.go:142 +0x843

Goroutine 429 (running) created at:
  github.com/libp2p/go-libp2p-swarm.(*Swarm).notifyAll()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:450 +0x1a9
  github.com/libp2p/go-libp2p-swarm.(*Swarm).addConn()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:220 +0x5a5
  github.com/libp2p/go-libp2p-swarm.(*Swarm).dial()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm_dial.go:333 +0xb32
  github.com/libp2p/go-libp2p-swarm.(*Swarm).doDial()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm_dial.go:248 +0x23e
  github.com/libp2p/go-libp2p-swarm.(*Swarm).doDial-fm()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm_dial.go:233 +0x70
  github.com/libp2p/go-libp2p-swarm.(*activeDial).start()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/dial_sync.go:80 +0xb4

Goroutine 193 (running) created at:
  github.com/ipfs/ipfs-cluster/consensus/crdt.New()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/consensus/crdt/consensus.go:104 +0x548
  github.com/ipfs/ipfs-cluster.makeConsensus()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/ipfscluster_test.go:236 +0x1f9
  github.com/ipfs/ipfs-cluster.createComponents()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/ipfscluster_test.go:202 +0xf00
  github.com/ipfs/ipfs-cluster.createClusters()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/ipfscluster_test.go:331 +0x9db
  github.com/ipfs/ipfs-cluster.TestAdd()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/add_test.go:16 +0x70
  testing.tRunner()
      /usr/lib64/go/1.12/src/testing/testing.go:865 +0x163
==================

WARNING: DATA RACE
Read at 0x00c0001fbf40 by goroutine 429:
  github.com/ipfs/go-bitswap.(*Bitswap).PeerConnected()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/bitswap.go:332 +0x3e
  github.com/ipfs/go-bitswap/network.(*netNotifiee).Connected()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/network/ipfs_impl.go:218 +0x96
  github.com/libp2p/go-libp2p-swarm.(*Swarm).addConn.func1()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:221 +0x72
  github.com/libp2p/go-libp2p-swarm.(*Swarm).notifyAll.func1()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:452 +0x71

Previous write at 0x00c0001fbf40 by goroutine 193:
  github.com/ipfs/go-bitswap.New()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/bitswap.go:116 +0xbde
  github.com/hsanjuan/ipfs-lite.New()
      /home/hector/go/pkg/mod/github.com/hsanjuan/[email protected]/ipfs.go:91 +0x76a
  github.com/ipfs/ipfs-cluster/consensus/crdt.(*Consensus).setup()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/consensus/crdt/consensus.go:142 +0x843

Goroutine 429 (running) created at:
  github.com/libp2p/go-libp2p-swarm.(*Swarm).notifyAll()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:450 +0x1a9
  github.com/libp2p/go-libp2p-swarm.(*Swarm).addConn()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:220 +0x5a5
  github.com/libp2p/go-libp2p-swarm.(*Swarm).dial()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm_dial.go:333 +0xb32
  github.com/libp2p/go-libp2p-swarm.(*Swarm).doDial()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm_dial.go:248 +0x23e
  github.com/libp2p/go-libp2p-swarm.(*Swarm).doDial-fm()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm_dial.go:233 +0x70
  github.com/libp2p/go-libp2p-swarm.(*activeDial).start()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/dial_sync.go:80 +0xb4

Goroutine 193 (running) created at:
  github.com/ipfs/ipfs-cluster/consensus/crdt.New()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/consensus/crdt/consensus.go:104 +0x548
  github.com/ipfs/ipfs-cluster.makeConsensus()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/ipfscluster_test.go:236 +0x1f9
  github.com/ipfs/ipfs-cluster.createComponents()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/ipfscluster_test.go:202 +0xf00
  github.com/ipfs/ipfs-cluster.createClusters()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/ipfscluster_test.go:331 +0x9db
  github.com/ipfs/ipfs-cluster.TestAdd()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/add_test.go:16 +0x70
  testing.tRunner()
      /usr/lib64/go/1.12/src/testing/testing.go:865 +0x163
==================

==================
WARNING: DATA RACE
Read at 0x00c02961fc20 by goroutine 429:
  github.com/ipfs/go-bitswap/wantmanager.(*WantManager).Connected()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/wantmanager/wantmanager.go:149 +0x52
  github.com/ipfs/go-bitswap.(*Bitswap).PeerConnected()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/bitswap.go:332 +0x63
  github.com/ipfs/go-bitswap/network.(*netNotifiee).Connected()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/network/ipfs_impl.go:218 +0x96
  github.com/libp2p/go-libp2p-swarm.(*Swarm).addConn.func1()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:221 +0x72
  github.com/libp2p/go-libp2p-swarm.(*Swarm).notifyAll.func1()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:452 +0x71

Previous write at 0x00c02961fc20 by goroutine 193:
  github.com/ipfs/go-bitswap/wantmanager.New()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/wantmanager/wantmanager.go:67 +0x1d1
  github.com/ipfs/go-bitswap.New()
      /home/hector/go/pkg/mod/github.com/ipfs/[email protected]/bitswap.go:90 +0x601
  github.com/hsanjuan/ipfs-lite.New()
      /home/hector/go/pkg/mod/github.com/hsanjuan/[email protected]/ipfs.go:91 +0x76a
  github.com/ipfs/ipfs-cluster/consensus/crdt.(*Consensus).setup()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/consensus/crdt/consensus.go:142 +0x843

Goroutine 429 (running) created at:
  github.com/libp2p/go-libp2p-swarm.(*Swarm).notifyAll()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:450 +0x1a9
  github.com/libp2p/go-libp2p-swarm.(*Swarm).addConn()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm.go:220 +0x5a5
  github.com/libp2p/go-libp2p-swarm.(*Swarm).dial()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm_dial.go:333 +0xb32
  github.com/libp2p/go-libp2p-swarm.(*Swarm).doDial()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm_dial.go:248 +0x23e
  github.com/libp2p/go-libp2p-swarm.(*Swarm).doDial-fm()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/swarm_dial.go:233 +0x70
  github.com/libp2p/go-libp2p-swarm.(*activeDial).start()
      /home/hector/go/pkg/mod/github.com/libp2p/[email protected]/dial_sync.go:80 +0xb4

Goroutine 193 (running) created at:
  github.com/ipfs/ipfs-cluster/consensus/crdt.New()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/consensus/crdt/consensus.go:104 +0x548
  github.com/ipfs/ipfs-cluster.makeConsensus()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/ipfscluster_test.go:236 +0x1f9
  github.com/ipfs/ipfs-cluster.createComponents()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/ipfscluster_test.go:202 +0xf00
  github.com/ipfs/ipfs-cluster.createClusters()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/ipfscluster_test.go:331 +0x9db
  github.com/ipfs/ipfs-cluster.TestAdd()
      /home/hector/go/src/github.com/ipfs/ipfs-cluster/add_test.go:16 +0x70
  testing.tRunner()
      /usr/lib64/go/1.12/src/testing/testing.go:865 +0x163
==================

Example request

Bitswap is very useful for distribution system. I want to use it to make some interesting stuff.
but now I'm confused >.<
can you give me an example like go-libp2p-example did? or just more basically...
thank you.

Queue Memory Leak

As far as I can tell, we never actually remove peer request queues but let them build up over time. Obviously, this isn't good.

Realistic Virtual Test Network

We need a realistic virtual test network where we:

Can set latencies/bandwidth.
Can make some nodes unable to dial others.
Can run ~10,000 nodes.
Use the real DHT.

Honestly, we probably already have 1-3. We should be able to just use our existing mock network package with the real dht to run semi-realistic tests.

Improving Latency Measurement

#165 outlines latency measurement in Bitswap. The latency of the connection to a peer is maintained per-session, by measuring the time between a request for a CID and receipt of the corresponding block.

There are a few issues with measuring latency in this fashion:

Multiple sessions may concurrently request the same block.
For example:
- session A requests CID1
- <1 second>
- session B requests CID1
- <100 ms>
- the block for CID1 arrives (from the request by session A)
- session A records latency as <1s + 100ms>
- session B records latency as <100ms>
Incoming blocks are processed one at a time.
The latency measurement is adjusted for each block in a message (a message may contain many blocks).
If a peer doesn't have the block, it simply doesn't respond.
For broadcast requests, we ignore timeouts.
For targeted requests (requests sent to specific peers) we calculate latency based on the timeout. This isn't really accurate, as the peer may not have responded simply because it didn't have the block.
Ideally we would measure the "bandwidth delay product".
The bandwidth delay product is the <bandwidth of the connection> x <the latency>. It measures the amount of data that can fit in the pipe, and can be used to ensure that the pipe is always as full as possible.

Issues 1 & 2 can be addressed by measuring latency per-peer instead of per-session. This would also likely improve the accuracy of latency measurement as there would be a higher sample size.

Issue 3 can be addressed by either

changing the protocol such that if a peer doesn't have a CID it sends back a NOT_FOUND message
ignoring timeouts in latency calculations (this would be much simpler)

Issue 4 is more complex, and needs to consider transport specific nuances, such as TCP slow start. Latency may be a reasonable stand-in for now.

Add bandwidth limitations to benchmarks

Currently, benchmarks in dup_blocks_test.go (perhaps this should just become benchmarks.go) use testnet, which simply assigns a latency to each request for a block. This means you can in theory request an infinite amount of blocks from a single peer, they will only each be delayed by that peer's latency. In reality, presumably you would eventually fill the pipe in terms of bandwidth, causing additional delays. Moreover, your own peer might have a maximum bandwidth no matter whom it requests blocks from. We should add these facilities to testnet, and then write benchmarks that use them, to provide more accurate real world benchmarks for bitswap.

Actually connect to providers in sessions

When using bitswap sessions, we find providers, add the providers to the session, but never actually connect to them.

The basic fix is to call ConnectTo on each provider. A better fix would be to reconnect to peers in a session on demand. However, we may need to introduce some kind of backoff logic to do that properly.

Predict peers most likely to have blocks

Problem
Right now we don't make the best use of information we have about our historical bitswap interactions with peers when devising a set of peers to ask for blocks.

PR #8 has some important background information particularly this comment.

Proposal
Add behavior to bitswap that tracks received blocks and the peers they came from, which can subsequently be queried for an (ordered?) set of peers deemed the best candidates to ask for a given block.

Partition sessions into "good" and "bad" peers.

Related to #14.

One issue with the way sessions currently work is that a peer may have the root of a DAG and a subtree but may not have the entire thing. Given the way bitswap sessions currently work, this means we'll:

Add this peer to the session.
Continue asking this peer for pieces they don't have as we try to fetch the subtree they don't have.

One solution (that we briefly discussed in London) is to "tune" our session by partitioning peers into a good set and a bad set.

That is, we keep all peers in the session in an ordered list and, every time we receive a block, we promote that peer to the front of the list. We can then partition this list into "good" peers and "bad" peers:

If none of the good peers can respond to our request, we can move the partition to admit more and more bad peers until one of them responds.
If too many of the good peers are giving us blocks (too many duplicate blocks), we can move the partition to admit fewer good peers.

Unfortunately, this doesn't work well for random, parallel walks (e.g., EnumerateChildrenAsync) because we request blocks from random, unrelated parts of the graph at the same time.

However, there's a simple solution that we didn't consider in London: avoid doing this. That is, there's really no good performance reason for walking randomly. Instead, if we try to explore sequentially as much as possible, we may be able to better utilize sessions even without a more complicated dag-aware engine.

When I said "no good performance reason" I kind of lied. Given the above design, we could end up with one peer in the good set and a bunch of peers in the bad set that have different parts of the graph. A random walk is able to exploit this to download blocks in parallel but an in-order walk can't.

So, yeah. We still need a dag-aware bitswap engine but we may be able to get something pretty good without it.

High CPU usage when sending messages

For some reason, the MessageQuee.runQueue loop is consuming a bunch of CPU in the select. It looks like this is related to sending wantlist messages.

I'm guessing this is because we now no longer have (as far as I can tell) an overall max wantlist size. That means every session is sending wants to every peer when broadcasting. That means Requests*Peers operations.

I'm not sure how best to fix this. We can fix the CPU issue with better concurrent datastructures but I'm afraid we're likely sending massive wantlists. We almost want a per-peer rate-limit but I'm not sure how to best implement that.

But, before we can make any decisions here, we really need to know how much bitswap traffic this is causing.

remove interest cache

I believe this is a hold-over from a time when tofetch didn't provide a way to check "has". However, now that we have that, we can check exact membership instead of using a LRU cache.

(really, the LRU cache is likely to use more memory anyways as it uses interfaces all over the place).

TestRemovingPeersWhenManagerContextCancelled random failure

--- FAIL: TestRemovingPeersWhenManagerContextCancelled (0.01s)
    sessionmanager_test.go:154: received blocks for sessions after manager is shutdown
FAIL
FAIL	github.com/ipfs/go-bitswap/sessionmanager	0.069s

	// Connected/Disconnected warns bitswap about peer connections.
	func (bs *Bitswap) PeerConnected(p peer.ID) {
	initialWants := bs.wm.CurrentBroadcastWants()
	bs.pm.Connected(p, initialWants)
	bs.engine.PeerConnected(p)
	}

	func (pm PeerManager) startPeerHandler(p peer.ID, initialEntries []wantlist.Entry) PeerQueue {
	mq, ok := pm.peerQueues[p]
	if ok {
	mq.RefIncrement()
	return nil
	}

	mq = pm.createPeerQueue(p)
	pm.peerQueues[p] = mq
	mq.Startup(pm.ctx, initialEntries)
	return mq
	}

	// send wantlist updates
	for { // try to send this message until we fail.
	if mq.attemptSendAndRecovery(ctx, wlm) {
	return
	}
	}
	}