dht's People

Contributors

ruslanfedoseenko, shiyanhui

dht's Issues

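Much slower than the Python version

Compared with this Python implementation:

https://github.com/NanYoMy/DHT-simDHT

The difference is more than an order of magnitude. Leaving parsing aside, the Python crawler can reach more than ten million infohashes in three days. With this library I added OnGetPeers and issued the request inside it, and it is still far too slow.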

Code typo bug

In routingtable.go, line 223, inside the kbucket.Replace function: after the candidate node is inserted into the nodes list, there is no break.

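What is port 6060 for?

After starting, the program opens two ports: TCP 6060 and UDP 6881.

Port 6881 is easy to understand: it is used for communication between peers, for example to receive query requests.

But what is port 6060 for? In the code I only see an HTTP server started on port 6060, and I cannot tell what it is used for.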

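When will the third part of the crawler tutorial be out?

I have been learning about DHT recently and happened to come across the wiki you wrote. The tutorial is very good; it is the first BT/DHT tutorial I have seen that is pitched at the right depth.
When will the third part be published?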

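Collection gets slower and slower over time

After collecting a little under 300,000 records, the infohashes repeat very often, so fewer and fewer new ones are collected and progress keeps slowing down. Is there any way to fix this?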

About bucket splitting when inserting a node

} else if root.KBucket().prefix.Compare(nd.id, prefixLen-1) == 0 {

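routingtable.go, line 388: regarding the condition for splitting a bucket, shouldn't the bucket be split only when the current leaf node shares a prefix with the local node? The code instead compares the newly inserted node with the current leaf node.
PS: I saw that someone else raised the same question, and your answer was that this is meant to hold more nodes. But in that case, isn't the later else branch that adds the node to the candidates unreachable?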

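About NAT traversal

For reference only.
BEP 5 should come with a partial UDP hole-punching effect (for Address-Restricted cone NAT and Port-Restricted cone NAT). A node to which you have sent find_node or any other message should be able to get through the NAT when it sends you get_peers or announce_peer messages (within a certain time window, of course). A node that discovers you through another node's routing table and messages you directly cannot get through the NAT.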

The code is not very readable

Especially this function, which was painful to read:
func (wire *Wire) fetchMetadata(r Request)
...

A few suggestions

  1. Could the following functions be used to avoid reinventing the wheel?

binary.Read
io.ReadFull

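  2. Could things at the same level of abstraction be grouped together?

For example, the following is all at one level: send the handshake, read the handshake reply, send the extended handshake.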

	if sendHandshake(conn, infoHash, []byte(randomString(20))) != nil ||
		read(conn, 68, data) != nil ||
		onHandshake(data.Next(68)) != nil ||
		sendExtHandshake(conn) != nil {
		return
	}

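But what about the code below it? It reads the 4-byte header, then 1 byte, then another 1 byte. These are not at the same level! The 140-line function is dizzying to read (a single for loop whose context jumps around, with a pile of exposed details).

  3. Could error handling write a log entry, and could protocol comments and links be added?

Programs should be logically clear and clean and should state their intent plainly. This code was genuinely hard for me to read, so please excuse the complaint.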

No data received on Alibaba Cloud; log analysis shows the vast majority of errors are decode errors

I added some logging to krpc.go:

func handle(dht *DHT, pkt packet) {
	if len(dht.workerTokens) == dht.PacketWorkerLimit {

		fmt.Println("return from len(dht.workerTokens) == dht.PacketWorkerLimit")
		return
	}

	dht.workerTokens <- struct{}{}

	go func() {
		defer func() {
			<-dht.workerTokens
		}()

		if dht.blackList.in(pkt.raddr.IP.String(), pkt.raddr.Port) {

			fmt.Println("return from dht.blackList.in(pkt.raddr.IP.String(), pkt.raddr.Port)")
			return
		}

		data, err := Decode(pkt.data)
		if err != nil {

			fmt.Print("return from data, err := Decode(pkt.data)")
			fmt.Println(err)
			return
		}

		response, err := parseMessage(data)
		if err != nil {

			fmt.Print("return from response, err := parseMessage(data)")
			fmt.Println(err)
			return
		}

		if f, ok := handlers[response["y"].(string)]; ok {
			f(dht, pkt.raddr, response)
		}
	}()
}

Then I filtered the log with the following commands:


grep "Got a response" nohup_dht.logs | wc -l
grep "return from data, err := Decode(pkt.data)" nohup_dht.logs | wc -l
grep "return from response, err := parseMessage(data)" nohup_dht.logs | wc -l
grep "return from dht.blackList.in(pkt.raddr.IP.String(), pkt.raddr.Port)" nohup_dht.logs| wc -l  

The results, in the same order as the commands above:

0
620
10
22

After running for two minutes, the vast majority are decode errors:

return from data, err := Decode(pkt.data)invalid bencode when decode item

Not a single useful record was received.

Is there a problem with the decoding?

PS: the Linux binary was cross-compiled on a Mac with this command:

CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -o bin/exec_linux_dht src/main/main.go

CentOS version:

NAME="CentOS Linux"
VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

go env output on the Mac:

GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/xxx/Library/Caches/go-build"
GOEXE=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/xxx/godht:/usr/local/go/bin"
GORACE=""
GOROOT="/usr/local/Cellar/go/1.10/libexec"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.10/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/_b/_xrkt7216glfsz7z989ss7zm0000gn/T/go-build666755492=/tmp/go-build -gno-record-gcc-switches -fno-common"

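Could data structures such as the routing table be implemented as interfaces?

Some data structures in DHT and KRPC, such as the routing table (routingTable), are implemented with in-memory containers. Could these data structures be exposed as interfaces?

The reason I ask is a concern about scale: if the number of nodes grows very large (tens or hundreds of millions), several GB or even tens of GB of memory would be needed, and memory may run out.

With an interface, the storage could be customised as needed, for example by using Redis instead of memory.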
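
One possible shape for such an abstraction is sketched below. NodeStore and memoryStore are hypothetical names invented for this example; the dht package does not define them, and a Redis-backed type would only need to implement the same four methods.

```go
package main

import (
	"fmt"
	"sync"
)

// NodeStore is a hypothetical storage interface for node records.
// It is an illustration only and not part of the dht package.
type NodeStore interface {
	Put(id string, addr string)
	Get(id string) (addr string, ok bool)
	Delete(id string)
	Len() int
}

// memoryStore is an in-memory NodeStore guarded by a read-write mutex.
type memoryStore struct {
	mu    sync.RWMutex
	nodes map[string]string
}

func newMemoryStore() *memoryStore {
	return &memoryStore{nodes: make(map[string]string)}
}

func (s *memoryStore) Put(id, addr string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.nodes[id] = addr
}

func (s *memoryStore) Get(id string) (string, bool) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	addr, ok := s.nodes[id]
	return addr, ok
}

func (s *memoryStore) Delete(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.nodes, id)
}

func (s *memoryStore) Len() int {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return len(s.nodes)
}

// Compile-time check that memoryStore satisfies NodeStore.
var _ NodeStore = (*memoryStore)(nil)

func main() {
	var store NodeStore = newMemoryStore()
	store.Put("node-a", "203.0.113.7:6881") // documentation-range address
	addr, ok := store.Get("node-a")
	fmt.Println(addr, ok, store.Len())
}
```

Code that depends only on NodeStore would not need to change when the backing store is swapped. A real routing table also needs ordered traversal by prefix, which this sketch leaves out.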

The node splitting question

} else if root.KBucket().prefix.Compare(nd.id, prefixLen-1) == 0 {

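This seems to differ slightly from what your blog says:

The first case is when the current path is a prefix of this node's ID (note: not the key being inserted, but "my" own ID); in that case, split.

The code uses the node being inserted rather than the local node ID. My understanding is that it should be the local node ID.

Is my understanding wrong?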
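
The rule quoted from the blog matches BEP 5, where a full bucket is split only if the local node's own ID falls within the bucket's range. A minimal sketch of that check follows. The names bit, hasPrefix and shouldSplit are hypothetical, and the byte-slice representation of IDs and prefixes is an assumption, not the library's bitmap type.

```go
package main

import "fmt"

// bit returns the i-th bit of id, where bit 0 is the most significant bit.
func bit(id []byte, i int) byte {
	return (id[i/8] >> (7 - uint(i%8))) & 1
}

// hasPrefix reports whether the first n bits of id equal the first n bits
// of prefix. Both slices must hold at least n bits.
func hasPrefix(id, prefix []byte, n int) bool {
	for i := 0; i < n; i++ {
		if bit(id, i) != bit(prefix, i) {
			return false
		}
	}
	return true
}

// shouldSplit applies the rule from the quoted text: split a full bucket
// only when the bucket's prefix is a prefix of the local node's own ID.
func shouldSplit(ownID, bucketPrefix []byte, prefixLen int, bucketFull bool) bool {
	return bucketFull && hasPrefix(ownID, bucketPrefix, prefixLen)
}

func main() {
	ownID := []byte{0xA0}        // bits 1010 0000
	bucketPrefix := []byte{0x80} // bits 1000 0000

	// Bucket "1" covers the local ID, so a full bucket may split.
	fmt.Println(shouldSplit(ownID, bucketPrefix, 1, true))
	// Bucket "100" does not cover the local ID, so it is not split.
	fmt.Println(shouldSplit(ownID, bucketPrefix, 3, true))
}
```

Under this rule a full bucket that does not cover the local ID keeps its nodes, and a newcomer can only become a replacement candidate.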

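Following the demo, nothing is collected and there is no output at all. What is going on?

There is no output whether I run it locally or on a server.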

package main
import (
    "fmt"
    "github.com/shiyanhui/dht"
)

func main() {
    downloader := dht.NewWire(65536)
    go func() {
        // once we got the request result
        for resp := range downloader.Response() {
            fmt.Println(resp.InfoHash, resp.MetadataInfo)
        }
    }()
    go downloader.Run()

    config := dht.NewCrawlConfig()
    config.OnAnnouncePeer = func(infoHash, ip string, port int) {
        // request to download the metadata info
        downloader.Request([]byte(infoHash), ip, port)
    }
    d := dht.New(config)

    d.Run()
}

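How do I join the DHT network?

The krpc section of the tutorial 《一步一步教你写BT种子嗅探器-DHT篇》 says:
At the start you are not in the DHT network, and you need someone to introduce you; anyone already in the DHT can do it. Typically we send a find_node request to router.bittorrent.com:6881, dht.transmissionbt.com:6881 and so on, and then our DHT can start working.

This still seems to require a network node with a public IP. If that central node is shut down, wouldn't everything stop working just the same? I would appreciate an explanation. I would also like to ask how router.bittorrent.com:6881 introduces a new node into the DHT network.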
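
For reference, here is a sketch of the find_node request mentioned in the quoted tutorial, built as a bencoded KRPC dictionary in the form given by BEP 5. findNodeQuery is a hypothetical helper, and the IDs in main are the example values from BEP 5 rather than real node IDs.

```go
package main

import "fmt"

// findNodeQuery builds a bencoded KRPC find_node query.
// tid is the transaction ID; id and target should each be 20 bytes long.
// Hypothetical helper written for this example.
func findNodeQuery(tid, id, target string) string {
	// Dictionary keys appear in sorted order, as bencode requires:
	// "a" (arguments), "q" (query name), "t" (transaction ID), "y" (type).
	return fmt.Sprintf(
		"d1:ad2:id%d:%s6:target%d:%se1:q9:find_node1:t%d:%s1:y1:qe",
		len(id), id, len(target), target, len(tid), tid,
	)
}

func main() {
	query := findNodeQuery("aa", "abcdefghij0123456789", "mnopqrstuvwxyz123456")
	fmt.Println(query)
}
```

These bytes would be written to a UDP socket addressed to a bootstrap node. According to BEP 5, the reply carries compact node info for nodes close to the target, and later queries can go to those nodes directly instead of to the bootstrap node.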

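Port conflict question

Other DHT nodes learn my IP and port from the UDP packets I send.
If my server side and client side bind the same port, there is a conflict.
If they use different ports, then in other nodes' routing tables my DHT node's IP and port are the ones my client sent from.
Does that mean my client and server cannot run at the same time?
Or have I misunderstood something?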

Crawler mode speed

Even after increasing the connection limits, I notice that crawler mode gets only 60 peers/minute.
Is there a setting to increase the speed?
With another crawler I have, I can get 100,000/hour!

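Sharing Peers and BlockList in a distributed DHT crawler

I want to build a distributed DHT crawler that needs to share Peers and the BlockList, so that when one crawler discovers a Peer or puts a Node into the BlockList, the other crawlers know immediately.

If the two structs peersManager and blackList could use a custom storage backend, this would be easy to implement.

It would be great if syncedMap could be abstracted into an interface! Users could then replace the default syncedMap to implement shared Peers and BlockList.

@shiyanhui Are there any plans for this?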

Add a version string to all outgoing messages

BEP 5 has been updated to document the long-standing de facto standard of including a version string in RPC messages. This is an important feature for identifying implementations which may be lacking features or misbehaving. The relevant section of BEP 5:

A key v should be included in every message with a client version string. The string should be a two character client identifier registered in BEP 20 followed by a two character version identifier.

Note that the convention is for the client identifier to identify the DHT implementation rather than the client application. Thus the same identifier should be used regardless of which client is using this module.
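
A minimal sketch of what the request amounts to is shown below, assuming outgoing messages are built as map[string]interface{} before bencoding. withVersion is a hypothetical helper, and "XX" is a placeholder rather than an identifier registered in BEP 20.

```go
package main

import "fmt"

// withVersion returns a copy of msg with the "v" key set to a version string
// made of a two-character client identifier followed by a two-character
// version identifier. Hypothetical helper, not part of the dht package.
func withVersion(msg map[string]interface{}, clientID, version string) (map[string]interface{}, error) {
	if len(clientID) != 2 || len(version) != 2 {
		return nil, fmt.Errorf("client ID and version must be 2 characters each")
	}
	out := make(map[string]interface{}, len(msg)+1)
	for k, v := range msg {
		out[k] = v
	}
	out["v"] = clientID + version
	return out, nil
}

func main() {
	ping := map[string]interface{}{"t": "aa", "y": "q", "q": "ping"}
	// "XX" is a placeholder client identifier chosen for this example.
	msg, err := withVersion(ping, "XX", "01")
	fmt.Println(msg["v"], err)
}
```

Applying the helper at the single place where messages are encoded would cover queries, responses and errors alike.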

Fix config.KBucketSize Size

In line 74 of dht.go, and in other places concerning the max size, you are setting a value which overflows a 32-bit int.
Error: /gopath/src/github.com/shiyanhui/dht/dht.go:74: constant 4294967296 overflows int.
(This was run on an armv7 machine, a 32-bit architecture.)

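How do I store the results in a database?

As the title says: this prints straight to the screen, so how do I store the output in a database? Also, the JSON it produces is not standard. I am quite confused.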

find_node no response

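Hello, I was very excited after reading your two tutorials, so I tried writing a sniffer myself in Java.
The problem I ran into: I send find_node requests to all the nodes returned by the bootstrap node, yet I do not receive a single response. I have been stuck for a long time and cannot tell what went wrong.
My bdecode handling of the IP and port contained in the compact node info should not be wrong. I would appreciate your help.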
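
For comparison when checking the decoding, here is a sketch of how compact node info is laid out in BEP 5: each entry is 26 bytes, made of a 20-byte node ID, a 4-byte IPv4 address and a 2-byte port in network byte order. nodeInfo and decodeCompactNodes are hypothetical names. In languages with signed bytes, such as Java, the port bytes need to be masked before being combined.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"net"
)

// nodeInfo holds one decoded entry. Hypothetical type for this example.
type nodeInfo struct {
	ID   [20]byte
	Addr net.UDPAddr
}

// decodeCompactNodes splits b into 26-byte entries:
// 20-byte node ID, 4-byte IPv4 address, 2-byte big-endian port.
func decodeCompactNodes(b []byte) ([]nodeInfo, error) {
	const entryLen = 26
	if len(b)%entryLen != 0 {
		return nil, errors.New("compact node info length is not a multiple of 26")
	}
	nodes := make([]nodeInfo, 0, len(b)/entryLen)
	for off := 0; off < len(b); off += entryLen {
		var n nodeInfo
		copy(n.ID[:], b[off:off+20])
		n.Addr.IP = net.IPv4(b[off+20], b[off+21], b[off+22], b[off+23])
		n.Addr.Port = int(binary.BigEndian.Uint16(b[off+24 : off+26]))
		nodes = append(nodes, n)
	}
	return nodes, nil
}

func main() {
	// One entry: example ID, address 192.0.2.1, port bytes 0x1A 0xE1 (6881).
	raw := append([]byte("abcdefghij0123456789"), 192, 0, 2, 1, 0x1A, 0xE1)
	nodes, err := decodeCompactNodes(raw)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println(nodes[0].Addr.String())
}
```

If the decoded addresses look plausible and there are still no replies, the cause may lie elsewhere, for example in the encoding of the outgoing query.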

Memory problems?

I just got an OOM after running this for 2 hours; it was using well over 500MB of memory when I checked. Could you do some memory profiling?

Abnormal length from bitmap String()

The String() method in bitmap.go:

    for i := 0; i < div; i++ {
        buff[i] = fmt.Sprintf("%04b", bitmap.data[i])
    }

Since 1 byte = 8 bits, shouldn't the output format here be "%08b"?

How do I build the whole source?

I want to build it as a single binary executable, so I ran go build inside dht/, but it says: can't load package: package dht: cannot find package "dht" in any of: /usr/local/Cellar/go/1.8.1/libexec/src/dht (from $GOROOT) /Users/jintian/go/src/dht (from $GOPATH). What should I do?
