
Curve is a sandbox project hosted by the CNCF. It is cloud-native, high-performance, and easy to operate: an open-source distributed storage system for block and shared file storage.

Home Page: https://opencurve.io

License: Apache License 2.0

Languages: Starlark 1.32%, Shell 1.14%, Python 2.14%, Dockerfile 0.09%, C++ 89.10%, C 0.42%, SWIG 0.01%, Roff 0.06%, Makefile 0.06%, Go 4.95%, Jinja 0.35%, Java 0.34%
Topics: storage, distributed-systems, raft, sds, cloud-native-storage, high-performance, block-storage, filestorage, storage-engine, posix-compatible

curve's Issues

Stale reads on read chunk

Reading the code, I found that both ReadChunkRequest::Process and CopysetNode::on_apply use ConcurrentApplyModule. But ConcurrentApplyModule dispatches read requests and write requests for the same chunk to different task queues (different threads), so it seems this cannot provide the stale-read protection that ReadChunkRequest::Process claims to solve.
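If reads and writes for the same chunk can land in different queues, per-chunk ordering is indeed lost. A minimal sketch of the serialization the reporter expects (in Python for brevity; ConcurrentApplyModule itself is C++, and the hash-by-chunk-id rule here is an assumption about one possible fix, not Curve's actual code):

```python
import queue

class PerChunkDispatcher:
    """Route every request for a given chunk to one fixed worker queue,
    regardless of whether it is a read or a write, so a read can never
    be processed ahead of an earlier write to the same chunk."""

    def __init__(self, num_workers=4):
        self.queues = [queue.Queue() for _ in range(num_workers)]

    def queue_for(self, chunk_id):
        # Select by chunk id only -- NOT by request type.
        return self.queues[hash(chunk_id) % len(self.queues)]

    def submit(self, chunk_id, task):
        self.queue_for(chunk_id).put(task)

d = PerChunkDispatcher()
d.submit(42, "write")
d.submit(42, "read")
q = d.queue_for(42)
assert q.get() == "write" and q.get() == "read"  # FIFO within one chunk
```

If the dispatch key instead included the request type, the two requests above could land in different queues and execute in either order, which is exactly the stale-read hazard described.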

fio test drives chunkservers offline

Version

https://github.com/opencurve/curve/releases/tag/v1.0.0

Steps

Before the fio test, curve_ops_tool status showed no chunkserver, mds, or etcd offline. Then ran:
fio -direct=1 -iodepth=64 -thread -rw=randwrite -bs=4k -numjobs=4 -runtime=30 -group_reporting -name=test-curve -filename=/dev/nbd0 -ioengine=libaio -io_limit=400000G
There was only a small amount of I/O on the data disks.
Afterwards, curve_ops_tool status showed chunkservers offline:

cluster is not healthy
total copysets: 300, unhealthy copysets: 110, unhealthy_ratio: 36.6667%
...
chunkserver: total num = 36, online = 32, offline = 4(recoveringout = 0, chunkserverlist: [])
left size: min = 687GB, max = 688GB, average = 687.29GB, range = 1GB, variance = 0.21

The logs of the offline chunkservers look like this:

I 2020-11-18T02:09:45-0500 49594 chunkfile_pool.cpp:306] get chunk success! now pool size = 44017
W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/30235
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/30235
I 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:289] src path = /data/chunkserver0/chunkfilepool/30235, dist path = /data/chunkserver0/copysets/4294967448/data/chunk_57670
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver0/chunkfilepool/30235
W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/35831
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/35831
I 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:289] src path = /data/chunkserver0/chunkfilepool/35831, dist path = /data/chunkserver0/copysets/4294967448/data/chunk_57670
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver0/chunkfilepool/35831
W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/42129
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/42129
I 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:289] src path = /data/chunkserver0/chunkfilepool/42129, dist path = /data/chunkserver0/copysets/4294967448/data/chunk_57670
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver0/chunkfilepool/42129
W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/21533
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/21533
I 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:289] src path = /data/chunkserver0/chunkfilepool/21533, dist path = /data/chunkserver0/copysets/4294967448/data/chunk_57670
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver0/chunkfilepool/21533
W 2020-11-18T02:09:45-0500 49589 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver0/chunkfilepool/4215
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:368] file open failed, /data/chunkserver0/chunkfilepool/4215
I 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:289] src path = /data/chunkserver0/chunkfilepool/4215, dist path = /data/chunkserver0/copysets/4294967448/data/chunk_57670
E 2020-11-18T02:09:45-0500 49589 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver0/chunkfilepool/4215
E 2020-11-18T02:09:45-0500 49589 chunkserver_chunkfile.cpp:195] Error occured when create file. filepath = /data/chunkserver0/copysets/4294967448/data/chunk_57670
W 2020-11-18T02:09:45-0500 49589 chunkserver_datastore.cpp:197] Create chunk file failed.ChunkID = 57670, ErrorCode = 1
F 2020-11-18T02:09:45-0500 49589 op_request.cpp:479] write failed:  logic pool id: 1 copyset id: 152 chunkid: 57670 data size: 4096 data store return: 1
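The repeated "Too many open files" lines mean the chunkserver process exhausted its file-descriptor limit (RLIMIT_NOFILE) while moving chunks out of the chunkfile pool. A small sketch of checking and raising the soft limit before starting a service (Python for brevity; the threshold is illustrative, not a Curve default, and raising the ulimit may or may not be the right fix here):

```python
import resource

def ensure_nofile(min_needed):
    """Raise the soft RLIMIT_NOFILE toward the hard limit if it is below
    min_needed; return the (soft, hard) limits in effect afterwards."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < min_needed:
        # Without privileges we can only raise the soft limit up to the hard limit.
        target = min_needed if hard == resource.RLIM_INFINITY else min(min_needed, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        soft = target
    return soft, hard

soft, hard = ensure_nofile(1024)  # a busy chunkserver would want far more
```

Equivalently, `ulimit -n` in the service's start script (or `LimitNOFILE=` for a systemd unit) sets the same limit from outside the process.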

The affected chunkservers could not be restarted manually. I tried restarting the whole cluster:

ansible-playbook -i server.ini stop_curve.yml 
ansible-playbook -i server.ini start_curve.yml

After that, some chunkservers started, but others remained offline, with logs similar to the above:

I 2020-11-18T02:36:46-0500 103320 chunkfile_pool.cpp:306] get chunk success! now pool size = 44013
W 2020-11-18T02:36:46-0500 103315 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver4/chunkfilepool/13316
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:368] file open failed, /data/chunkserver4/chunkfilepool/13316
I 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:289] src path = /data/chunkserver4/chunkfilepool/13316, dist path = /data/chunkserver4/copysets/4294967520/data/chunk_91042
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver4/chunkfilepool/13316
W 2020-11-18T02:36:46-0500 103315 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver4/chunkfilepool/20387
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:368] file open failed, /data/chunkserver4/chunkfilepool/20387
I 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:289] src path = /data/chunkserver4/chunkfilepool/20387, dist path = /data/chunkserver4/copysets/4294967520/data/chunk_91042
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver4/chunkfilepool/20387
W 2020-11-18T02:36:46-0500 103315 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver4/chunkfilepool/25475
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:368] file open failed, /data/chunkserver4/chunkfilepool/25475
I 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:289] src path = /data/chunkserver4/chunkfilepool/25475, dist path = /data/chunkserver4/copysets/4294967520/data/chunk_91042
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver4/chunkfilepool/25475
W 2020-11-18T02:36:46-0500 103315 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver4/chunkfilepool/34096
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:368] file open failed, /data/chunkserver4/chunkfilepool/34096
I 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:289] src path = /data/chunkserver4/chunkfilepool/34096, dist path = /data/chunkserver4/copysets/4294967520/data/chunk_91042
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver4/chunkfilepool/34096
W 2020-11-18T02:36:46-0500 103315 ext4_filesystem_impl.cpp:142] open failed: Too many open files, file path = /data/chunkserver4/chunkfilepool/734
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:368] file open failed, /data/chunkserver4/chunkfilepool/734
I 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:289] src path = /data/chunkserver4/chunkfilepool/734, dist path = /data/chunkserver4/copysets/4294967520/data/chunk_91042
E 2020-11-18T02:36:46-0500 103315 chunkfile_pool.cpp:311] write metapage failed, /data/chunkserver4/chunkfilepool/734
E 2020-11-18T02:36:46-0500 103315 chunkserver_chunkfile.cpp:195] Error occured when create file. filepath = /data/chunkserver4/copysets/4294967520/data/chunk_91042
W 2020-11-18T02:36:46-0500 103315 chunkserver_datastore.cpp:197] Create chunk file failed.ChunkID = 91042, ErrorCode = 1
F 2020-11-18T02:36:46-0500 103315 op_request.cpp:532] write failed:  logic pool id: 1 copyset id: 224 chunkid: 91042 data size: 4096 data store return: 1

Both fio runs hit this problem, so performance testing is blocked. If there is another recommended way to test, please share it.
Also, is ansible-playbook -i server.ini clean_curve.yml the correct way to clean up the cluster? Sometimes redeploying after running it still fails, and I do not know which files are left behind.
The ansible configuration files are attached: config.zip

The deployment basically follows https://github.com/opencurve/curve/blob/master/docs/cn/deploy.md

Deployment with ansible fails with FAILED

Describe the bug
When deploying with ansible-playbook -i server.ini deploy_curve.yml, it reports:

TASK [generate_config : generate configuration file directly] *************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "checksum": "27c7b68395f392cdc4d364ba6afa06b577c925ff", "msg": "Destination /etc/curve not writable"}

...
fatal: [localhost]: FAILED! => {"changed": true, "cmd": "sudo cp /etc/curve/etcd.conf.yml /etc/curve/etcd.conf.yml.bak", "delta": "0:00:00.064517", "end": "2021-02-21 09:41:20.241484", "msg": "non-zero return code", "rc": 1, "start": "2021-02-21 09:41:20.176967", "stderr": "cp: cannot stat '/etc/curve/etcd.conf.yml': No such file or directory", "stderr_lines": ["cp: cannot stat '/etc/curve/etcd.conf.yml': No such file or directory"], "stdout": "", "stdout_lines": []}

The deployment then fails.

To Reproduce

  1. Deleting /etc/curve reproduces the problem. Both the ansible run and the rm were performed as root.

  2. After changing the owner of /etc/curve to curve, as the message suggests, the error goes away:

    chown -R curve:curve /etc/curve

Expected behavior

Fix this bug.

Versions

Built and deployed with the docker image opencurve/curveintegration:centos8.

Version: commit 1c81911 (HEAD -> master, origin/master, origin/HEAD)

Additional context/screenshots
(screenshot attached)

Cannot build successfully on Ubuntu

General Question

test/failpoint/failpoint_test.cpp:24:25: fatal error: fiu-control.h: No such file or directory
This file cannot be found anywhere in the source tree.
Do we need to build libfiu ourselves? I found nothing about it in the build files.

Build error running ./build.sh inside Docker

General Question

Following the official docs, I pulled the opencurve/curvebuild:centos8 image, then pulled the code and ran build.sh. The build fails with:

[280 / 833] 8 actions, 7 running
Compiling external/com_google_protobuf/src/google/protobuf/descriptor.cc; 2s processwrapper-sandbox
@com_google_protobuf//:protobuf; 1s processwrapper-sandbox
@com_google_protobuf//:protoc_lib; 1s processwrapper-sandbox
@com_google_protobuf//:protoc_lib; 1s processwrapper-sandbox
@com_google_protobuf//:protobuf; 0s processwrapper-sandbox
Compiling external/com_google_protobuf/src/google/protobuf/descriptor.pb.cc; 0s processwrapper-sandbox
Compiling external/com_google_protobuf/src/google/protobuf/descriptor_database.cc; 0s processwrapper-sandbox
[-----] Compiling external/com_google_protobuf/src/google/protobuf/util/internal/object_writer.cc

Server terminated abruptly (error code: 14, error message: '', log file: '/root/.cache/bazel/_bazel_root/6f5f4033910d16741199bab868564698/server/jvm.out')

build phase1 failed
[root@0e79895da77d curve]# cat /root/.cache/bazel/_bazel_root/6f5f4033910d16741199bab868564698/server/jvm.out
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by com.google.protobuf.UnsafeUtil (file:/root/.cache/bazel/_bazel_root/install/792a28b07894763eaa2bd870f8776b23/_embedded_binaries/A-server.jar) to field java.lang.String.value
WARNING: Please consider reporting this to the maintainers of com.google.protobuf.UnsafeUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release

Standalone deployment: formatting /dev/nbd0 fails

Describe the bug

  1. mkfs.ext4 /dev/nbd0 fails. Diagnostic output below:
  • root@ubuntu-xenial:/home/vagrant# curve_ops_tool status

Cluster status:
Get status metric from 127.0.0.1:8081 fail
No snapshot-clone-server is active
snapshot-clone-server 127.0.0.1:5556 is offline
cluster is not healthy
total copysets: 100, unhealthy copysets: 0, unhealthy_ratio: 0%
physical pool number: 1, logical pool number: 1
total space = 122021132GB, logical used = 24GB(0.00%, can be recycled = 0GB(0.00%)), physical used = 1GB(0.00%)

Client status:
nebd-server: version-0.1.0: 1

MDS status:
version: 0.1.0
current MDS: 127.0.0.1:6666
online mds list: 127.0.0.1:6666
offline mds list:

Etcd status:
version: 3.4.0
current etcd: 127.0.0.1:2379
online etcd list: 127.0.0.1:2379
offline etcd list:

SnapshotCloneServer status:
no version found!
GetAndCheckSnapshotCloneVersion fail
Get status metric from 127.0.0.1:8081 fail
current snapshot-clone-server:
online snapshot-clone-server list:
offline snapshot-clone-server list: 127.0.0.1:5556

ChunkServer status:
version: 0.1.0
chunkserver: total num = 3, online = 3, offline = 0(recoveringout = 0, chunkserverlist: [])
left size: min = 20278169GB, max = 56282713GB, average = 40673710.33GB, range = 36004544GB, variance = 227510007645070.22

  • root@ubuntu-xenial:/home/vagrant# curve-nbd list-mapped

id image device
18042 cbd:pool//test_curve_ /dev/nbd0

  • root@ubuntu-xenial:/home/vagrant# fdisk /dev/nbd0

Welcome to fdisk (util-linux 2.27.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

fdisk: cannot open /dev/nbd0: Input/output error

  • root@ubuntu-xenial:/home/vagrant# lsblk

NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 10G 0 disk
└─sda1 8:1 0 10G 0 part /
sdb 8:16 0 10M 0 disk
nbd0 43:0 0 10G 0 disk
root@ubuntu-xenial:/home/vagrant#

  1. Ubuntu 16.04, deployed following the standalone guide
    logicalpools/name was changed to 2
    curve branch: master

ChunkServer logs: server.tar.gz
/data/ logs: data.tar.gz

Mapping a volume fails

Problem description:
After creating a volume, mapping it fails with:
curve-nbd: kernel reported invalid size (85899345920, expected 10737418240)
curve-nbd: failed to map, status: Invalid argument

To Reproduce
Standalone deployment on a single physical machine, following https://github.com/opencurve/curve/blob/master/docs/cn/deploy.md#单机部署

Versions
OS: CentOS Linux release 7.3.1611 (Core)
gcc: 8.4.0
openssl: OpenSSL 1.1.1g 21 Apr 2020
git: 2.28.0
curve: v1.0.0-beta
nbd info: (screenshot attached)
Cluster status: (screenshot attached)

Volume creation and mapping: (screenshot attached)

nebd.zip

Cannot build on CentOS 7

Describe the bug
Cannot build on CentOS 7.

To Reproduce
bazel 1.2.1
bash ./mk-tar.sh

Expected behavior

Versions
OS: centos7
Compiler: gcc (GCC) 7.3.1
curve-mds:
curve-chunkserver:
curve-snapshotcloneserver:
curve-sdk:
nebd:
curve-nbd:

Additional context/screenshots
name 'http_archive' is not defined
name 'new_git_repository' is not defined (did you mean 'git_repository'?)

[C-Plan topic 2: bug hunt] Fix an initialization error

**Describe alternatives you've considered (optional)**

In /curve/test/chunkserver/multiple_copysets_io_test.cpp, function update_leader():

The leader field in the info structure is not explicitly initialized, so its initial value is 0. In update_leader, if the first if evaluates to false because the leader election failed, the function returns copyset->leader directly; but 0 is a valid leader value, so this causes an error.

So I think the initial value of copyset->leader should be set to -1, and the now-meaningless check in update_leader should be removed, because with -1 as the initial value that check would read out of bounds.
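The hazard described is a classic sentinel-value bug: the implicit zero default is also a legal leader id, so a failed election is indistinguishable from "peer 0 is the leader". A minimal sketch of the proposed fix (Python for brevity; names are illustrative, not the actual test fixture's):

```python
INVALID_LEADER = -1  # cannot collide with any real peer id, unlike 0

class Copyset:
    def __init__(self):
        self.leader = INVALID_LEADER  # explicit, instead of an implicit 0

def update_leader(copyset, elected=None):
    """Record the election result; keep the sentinel when election failed."""
    if elected is not None:
        copyset.leader = elected
    return copyset.leader

c = Copyset()
assert update_leader(c) == INVALID_LEADER  # failed election is now detectable
assert update_leader(c, 0) == 0            # peer id 0 is no longer ambiguous
```

As the report notes, any remaining code that indexes an array by the leader value must then guard against the -1 sentinel, or it will go out of bounds.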

Standalone deployment: deploying the cluster fails with "No daemon installed"

General Question

During the standalone deployment, deploying the cluster fails with the following error:

fatal: [localhost]: FAILED! => {"changed": true, "cmd": "sudo ./mds-daemon.sh start", "delta": "0:00:00.024325", "end": "2020-08-04 20:40:04.707755", "msg": "non-zero return code", "rc": 1, "start": "2020-08-04 20:40:04.683430", "stderr": "", "stderr_lines": [], "stdout": "No daemon installed", "stdout_lines": ["No daemon installed"]}

Looking back through the deployment log, the daemon installation check had already passed:

TASK [install_package : install daemon] ****************************************
included: /home/curve/curve/curve-ansible/roles/install_package/tasks/include/install_daemon.yml for localhost

TASK [install_package : determine if daemon installed] *************************
changed: [localhost]

TASK [install_package : set daemon_installed] **********************************
ok: [localhost]

I then inspected /home/curve/curve/curve-ansible/roles/install_package/tasks/include/install_daemon.yml and found that it runs the shell command daemon --version. Running daemon --version manually gives:

[curve@localhost ~]$ daemon --version
daemon-0.6.4

This shows daemon is installed, so why does the error above still occur?
Looking forward to your reply. Many thanks.

Standalone deployment succeeds, but creating a file fails

[root@057e65f0a884 curve-ansible]# curve_ops_tool status
Cluster status:
Copysets are not healthy!
Get status metric from 127.0.0.1:8081 fail
No snapshot-clone-server is active
snapshot-clone-server 127.0.0.1:5556 is offline
cluster is not healthy
total copysets: 100, unhealthy copysets: 13, unhealthy_ratio: 13%
physical pool number: 1, logical pool number: 1
total space = 0GB, logical used = 0GB(0.00%, can be recycled = 0GB(0.00%)), physical used = 0GB(0.00%)

Client status:

MDS status:
version: 9.9.9
current MDS: 127.0.0.1:6666
online mds list: 127.0.0.1:6666
offline mds list:

Etcd status:
version: 3.4.0
current etcd: 127.0.0.1:2379
online etcd list: 127.0.0.1:2379
offline etcd list:

SnapshotCloneServer status:
no version found!
GetAndCheckSnapshotCloneVersion fail
Get status metric from 127.0.0.1:8081 fail
current snapshot-clone-server:
online snapshot-clone-server list:
offline snapshot-clone-server list: 127.0.0.1:5556

ChunkServer status:
version: 9.9.9
chunkserver: total num = 3, online = 3, offline = 0(recoveringout = 0, chunkserverlist: [])
left size: min = 0GB, max = 0GB, average = 0.00GB, range = 0GB, variance = 0.00
[root@057e65f0a884 curve-ansible]# curve create --filename /test --length 10 --user root
E 2020-08-03T23:09:59.519749+0800 12349 server.cpp:994] Fail to listen 0.0.0.0:9000
E 2020-08-03T23:09:59.519836+0800 12349 server.cpp:1832] Fail to start dummy_server at port=9000
E 2020-08-03T23:09:59.529330+0800 12349 mds_client.cpp:314] CreateFile: filename = /test, owner = root, is nomalfile: 1, errocde = 4, error msg = kOwnerAuthFail, log id = 1
create fail, ret = -4
[root@057e65f0a884 curve-ansible]#

[root@057e65f0a884 curve-ansible]# ps -aux|grep chunkserver
root 6893 55.9 4.3 745128 88044 ? Sl 22:56 11:06 curve-chunkserver -bthread_concurrency=18 -raft_max_segment_size=8388608 -raft_max_install_snapshot_tasks_num=1 -raft_sync=true -conf=/etc/curve/chunkserver.conf -enableChunkfilepool=false -chunkFilePoolDir=./data/chunkserver0 -chunkFilePoolMetaPath=./data/chunkserver0/chunkfilepool.meta -chunkServerIp=127.0.0.1 -chunkServerPort=8200 -chunkServerMetaUri=local://./data/chunkserver0/chunkserver.dat -chunkServerStoreUri=local://./data/chunkserver0/ -copySetUri=local://./data/chunkserver0/copysets -raftSnapshotUri=curve://./data/chunkserver0/copysets -recycleUri=local://./data/chunkserver0/recycler -graceful_quit_on_sigterm=true -raft_sync_meta=true -raft_sync_segments=true -graceful_quit_on_sigterm=true -log_dir=./data/log/chunkserver0
root 6899 35.2 4.1 728712 84560 ? Sl 22:56 6:59 curve-chunkserver -bthread_concurrency=18 -raft_max_segment_size=8388608 -raft_max_install_snapshot_tasks_num=1 -raft_sync=true -conf=/etc/curve/chunkserver.conf -enableChunkfilepool=false -chunkFilePoolDir=./data/chunkserver1 -chunkFilePoolMetaPath=./data/chunkserver1/chunkfilepool.meta -chunkServerIp=127.0.0.1 -chunkServerPort=8201 -chunkServerMetaUri=local://./data/chunkserver1/chunkserver.dat -chunkServerStoreUri=local://./data/chunkserver1/ -copySetUri=local://./data/chunkserver1/copysets -raftSnapshotUri=curve://./data/chunkserver1/copysets -recycleUri=local://./data/chunkserver1/recycler -graceful_quit_on_sigterm=true -raft_sync_meta=true -raft_sync_segments=true -graceful_quit_on_sigterm=true -log_dir=./data/log/chunkserver1
root 6987 27.4 4.1 728696 85064 ? Sl 22:56 5:26 curve-chunkserver -bthread_concurrency=18 -raft_max_segment_size=8388608 -raft_max_install_snapshot_tasks_num=1 -raft_sync=true -conf=/etc/curve/chunkserver.conf -enableChunkfilepool=false -chunkFilePoolDir=./data/chunkserver2 -chunkFilePoolMetaPath=./data/chunkserver2/chunkfilepool.meta -chunkServerIp=127.0.0.1 -chunkServerPort=8202 -chunkServerMetaUri=local://./data/chunkserver2/chunkserver.dat -chunkServerStoreUri=local://./data/chunkserver2/ -copySetUri=local://./data/chunkserver2/copysets -raftSnapshotUri=curve://./data/chunkserver2/copysets -recycleUri=local://./data/chunkserver2/recycler -graceful_quit_on_sigterm=true -raft_sync_meta=true -raft_sync_segments=true -graceful_quit_on_sigterm=true -log_dir=./data/log/chunkserver2
root 12363 0.0 0.0 9180 1036 pts/0 R+ 23:16 0:00 grep --color=auto chunkserver

UUID is missing in fstab when using a disk partition as chunkserver

Describe the bug
If we use a disk partition as a chunkserver, for example:

xxx@curve-chunk-node2:~$ lsblk | grep chunk
├─sdu2 65:66 0 1.1T 0 part /data/chunkserver11

then after deployment the UUID is missing in /etc/fstab:
#curvefs
UUID= /data/chunkserver11 ext4 rw,errors=remount-ro 0 0

(screenshots attached)

./curve-chunkserver/home/nbs/chunkserver_deploy.sh: (screenshot attached)
This is because we don't handle the tree-drawing prefix of "├─sdu2" in the lsblk output.
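The fix amounts to stripping lsblk's tree-drawing characters before using the device name to look up its UUID. A sketch (Python for brevity; chunkserver_deploy.sh itself is shell, and the function name is illustrative):

```python
def strip_lsblk_prefix(name):
    """Drop the tree-drawing prefix lsblk prints for child devices,
    e.g. '├─sdu2' -> 'sdu2', '└─sda1' -> 'sda1'."""
    return name.lstrip("├└│─` -")

assert strip_lsblk_prefix("├─sdu2") == "sdu2"
assert strip_lsblk_prefix("└─sda1") == "sda1"
assert strip_lsblk_prefix("sdb") == "sdb"  # plain disks are untouched
```

With the clean name, something like `blkid -s UUID -o value /dev/sdu2` then yields the UUID to write into fstab.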

To Reproduce
see above.

Expected behavior
Support using a disk partition as a chunkserver, with the correct UUID written to fstab.

Versions
OS:
Compiler:
curve-mds:
curve-chunkserver:
curve-snapshotcloneserver:
curve-sdk:
nebd:
curve-nbd:

Additional context/screenshots

[C-Plan] Build and deployment

Part 1: Build

Based on the curvebuild image, pull the docker image and start it:

docker pull opencurve/curvebuild:centos8
docker run -it opencurve/curvebuild:centos8 /bin/bash

Hit a no-network problem, so add --net=host:

docker run --net=host -it opencurve/curvebuild:centos8 /bin/bash

The build and packaging went smoothly, with no problems: (screenshots attached)

A virtual machine on a personal PC was used; building and packaging took about one hour.

As shown, the tar files were packaged successfully: (screenshot attached)

Part 2: Standalone deployment

Based on the curveintegration image.

Pull and start the image:

docker run --cap-add=ALL -v /dev:/dev -v /lib/modules:/lib/modules --privileged -it opencurve/curveintegration:centos8 /bin/bash

No network inside the container, so add --net=host and docker run again:

docker run --net=host --cap-add=ALL -v /dev:/dev -v /lib/modules:/lib/modules --privileged -it opencurve/curveintegration:centos8 /bin/bash

Fetch the tar files and extract them.

Follow the documented steps.

Run the standalone deployment.

Deploy the cluster and start the services:

ansible-playbook -i server.ini deploy_curve.yml

Check the current cluster status:

curve_ops_tool status

(screenshot: chunkserver status)

ansible-playbook -i client.ini deploy_nebd.yml
ansible-playbook -i client.ini deploy_nbd.yml
ansible-playbook -i client.ini deploy_curve_sdk.yml

The first few deployment attempts kept failing while installing the NBD package; the cause was insufficient privileges the first time the nbd module was loaded. Thanks to @taohansi for the earlier workaround.

Verifying the standalone deployment

(screenshot: creating a curve volume)

The nbd0 volume is visible.

How does the client support hot upgrade?

How is "the client also supports hot upgrade, so the underlying version can be changed without users noticing" implemented? Is the client split into two processes (a lightweight, interface-only light client, and a core client holding the actual logic)?
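The two-process split the question hypothesizes can be sketched as a thin stub that only forwards requests over a unix socket to a long-lived core process; the core can then be replaced without the application relinking. This is purely an illustration of that hypothesized design (whether Curve's client actually works this way is exactly what the question asks):

```python
import os
import socket
import tempfile
import threading

def run_core(sock_path, ready):
    """The 'core client': owns the real logic, upgradable independently."""
    srv = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    srv.bind(sock_path)
    srv.listen(1)
    ready.set()
    conn, _ = srv.accept()
    req = conn.recv(1024)
    conn.sendall(b"v2:" + req)  # the "logic" lives on this side
    conn.close()
    srv.close()

def light_call(sock_path, payload):
    """The 'light client': interface only, just forwards to the core."""
    c = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    c.connect(sock_path)
    c.sendall(payload)
    resp = c.recv(1024)
    c.close()
    return resp

path = os.path.join(tempfile.mkdtemp(), "core.sock")
ready = threading.Event()
t = threading.Thread(target=run_core, args=(path, ready))
t.start()
ready.wait()
assert light_call(path, b"read") == b"v2:read"
t.join()
```

Upgrading then means restarting only the core process; the stub reconnects, and the application using it never changes.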

Multi-machine deployment fails

Describe the bug
Deployed on Debian 9.
During installation, the required libraries are checked. One check is for the package podlators-perl, but that package has been replaced by perl-modules-5.24, so both the check and the installation fail; roles/prepare_software_env/tasks/main.yml has to be edited manually.

To Reproduce

Expected behavior

Versions
OS: Debian
Compiler: gcc-6.3.0 g++
curve-mds:
curve-chunkserver:
curve-snapshotcloneserver:
curve-sdk:
nebd:
curve-nbd:

Additional context/screenshots
(screenshot attached)

[C-Plan topic 1: clean up TODOs in the code] Wrap the initialization of raft node options into a function

Describe the task you choose
Wrap the initialization of raft node options into a function:
// TODO(wudemiao): move this into nodeOptions' init, at src/chunkserver/copyset_node.cpp

Describe alternatives you've considered (optional)
I simply wrapped the code; I am not sure whether this is the right approach.

Additional context/screenshots

Running mk-tar.sh fails

Describe the bug

Running bash mk-tar.sh to build and package, step 7 (packaging the Python wheel) fails while packaging for py2. My environment has both Python 2 and Python 3 installed.

To Reproduce

Run bash mk-tar.sh on a machine that has both py2 and py3.

Expected behavior

Both the py2 and py3 packages should be produced, but in practice only the py3 one is built; the py2 one errors out.

Versions
OS: Ubuntu 18.04
Compiler: gcc 7.4.0 bazel 0.17.2
curve-mds: master
curve-chunkserver: master
curve-snapshotcloneserver: master
curve-sdk: master
nebd: master
curve-nbd: master

Additional context/screenshots

During the Python package build, the tmplib directory is recreated (https://github.com/opencurve/curve/blob/master/curvefs_python/configure.sh#L62-L73),
and the .so files are copied from bazel-bin into tmplib.

The first pass, packaging py3, works fine:
https://github.com/opencurve/curve/blob/master/mk-tar.sh#L90

But after the py3 package is built, bazel-bin has changed, so when py2 is packaged next the .so files it needs are no longer in bazel-bin, and the py2 packaging fails.

Suggestions

  1. Back up the .so files generated in bazel-bin before packaging. Patch: #216

  2. Alternatively, since py2 has officially reached end of life, perhaps curve should support py3 only?
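Suggestion 1 amounts to snapshotting the .so artifacts before the first wheel build invalidates bazel-bin. A sketch of that backup step (Python for brevity; the function name and the stand-in directory are illustrative, not what the patch actually does):

```python
import pathlib
import shutil
import tempfile

def backup_shared_objects(src_dir, backup_dir):
    """Copy every .so out of src_dir so it can be restored after a later
    bazel build (for a different Python) has rebuilt bazel-bin."""
    src, dst = pathlib.Path(src_dir), pathlib.Path(backup_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for so in sorted(src.rglob("*.so")):
        shutil.copy2(so, dst / so.name)
        copied.append(so.name)
    return copied

# Demo with a stand-in for bazel-bin:
src = pathlib.Path(tempfile.mkdtemp())
(src / "curvefs.so").write_bytes(b"\x7fELF")
assert backup_shared_objects(src, tempfile.mkdtemp()) == ["curvefs.so"]
```

The py2 wheel build would then copy from the backup directory instead of the by-then-rebuilt bazel-bin.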

Standalone deployment: nbd reads and writes hang

Describe the bug
After a standalone deployment, nbd reads and writes hang.

To Reproduce
Standalone deployment on CentOS 8.2; after mapping an nbd device, read and write /dev/nbd0.

Expected behavior
Reads and writes work normally.

Versions
OS: CentOS 8.2 x86_64
Compiler:
curve-mds: 1.1.0-beta+5d648c9ec
curve-chunkserver: 1.1.0-beta+5d648c9ec
curve-snapshotcloneserver:
curve-sdk: 1.1.0-beta+5d648c9ec
nebd: 1.1.0-beta+5d648c9ec
curve-nbd: 1.1.0-beta+5d648c9ec

Additional context/screenshots
W 2020-10-26T16:46:02.317500+0800 2468 replicator.cpp:299] Group 4294967317 fail to issue RPC to 10.202.91.10:8202:0 _consecutive_error_times=1, [E1008]Reached timeout=500ms @10.202.91.10:8202
W 2020-10-26T16:46:02.317519+0800 2468 replicator.cpp:299] Group 4294967354 fail to issue RPC to 10.202.91.10:8202:0 _consecutive_error_times=1, [E1008]Reached timeout=500ms @10.202.91.10:8202
W 2020-10-26T16:46:02.317534+0800 2468 replicator.cpp:299] Group 4294967319 fail to issue RPC to 10.202.91.10:8202:0 _consecutive_error_times=1, [E1008]Reached timeout=500ms @10.202.91.10:8202
W 2020-10-26T16:46:02.317579+0800 2479 replicator.cpp:299] Group 4294967394 fail to issue RPC to 10.202.91.10:8202:0 _consecutive_error_times=1, [E1008]Reached timeout=500ms @10.202.91.10:8202
W 2020-10-26T16:46:02.317595+0800 2479 replicator.cpp:299] Group 4294967303 fail to issue RPC to 10.202.91.10:8202:0 _consecutive_error_times=1, [E1008]Reached timeout=500ms @10.202.91.10:8202
W 2020-10-26T16:46:02.317729+0800 2476 replicator.cpp:299] Group 4294967352 fail to issue RPC to 10.202.91.10:8202:0 _consecutive_error_times=1, [E1008]Reached timeout=500ms @10.202.91.10:8202
W 2020-10-26T16:46:02.317749+0800 2476 replicator.cpp:299] Group 4294967332 fail to issue RPC to 10.202.91.10:8202:0 _consecutive_error_times=1, [E1008]Reached timeout=500ms @10.202.91.10:8202
W 2020-10-26T16:46:02.317764+0800 2476 replicator.cpp:299] Group 4294967305 fail to issue RPC to 10.202.91.10:8202:0 _consecutive_error_times=1, [E1008]Reached timeout=500ms @10.202.91.10:8202
W 2020-10-26T16:46:02.317780+0800 2476 replicator.cpp:299] Group 4294967302 fail to issue RPC to 10.202.91.10:8202:0 _consecutive_error_times=1, [E1008]Reached timeout=500ms @10.202.91.10:8202
W 2020-10-26T16:46:02.348995+0800 2475 node.cpp:1244] node 4294967358:10.202.91.10:8200:0 received invalid RequestVoteResponse from 10.202.91.10:8201:0 state not in CANDIDATE but LEADER
W 2020-10-26T16:46:02.349308+0800 2464 node.cpp:1316] node 4294967344:10.202.91.10:8200:0 received invalid PreVoteResponse from 10.202.91.10:8202:0 state not in STATE_FOLLOWER but CANDIDATE
W 2020-10-26T16:46:02.563308+0800 2468 node.cpp:1292] node 4294967381:10.202.91.10:8200:0 received RequestVoteResponse from 10.202.91.10:8202:0 error: [E1008]Reached timeout=1000ms @10.202.91.10:8202
W 2020-10-26T16:46:02.563849+0800 2479 node.cpp:1292] node 4294967373:10.202.91.10:8200:0 received RequestVoteResponse from 10.202.91.10:8202:0 error: [E1008]Reached timeout=1000ms @10.202.91.10:8202
W 2020-10-26T16:46:02.564920+0800 2466 node.cpp:1292] node 4294967348:10.202.91.10:8200:0 received RequestVoteResponse from 10.202.91.10:8202:0 error: [E1008]Reached timeout=1000ms @10.202.91.10:8202
W 2020-10-26T16:46:02.628770+0800 2469 node.cpp:1222] node 4294967316:10.202.91.10:8200:0 term 362 steps down when reaching vote timeout: fail to get quorum vote-granted
W 2020-10-26T16:46:02.670714+0800 2469 node.cpp:1316] node 4294967306:10.202.91.10:8200:0 received invalid PreVoteResponse from 10.202.91.10:8202:0 state not in STATE_FOLLOWER but CANDIDATE
W 2020-10-26T16:46:02.702837+0800 2465 node.cpp:2065] node 4294967351:10.202.91.10:8200:0 ignore stale AppendEntries from 10.202.91.10:8201:0 in term 353 current_term 354
W 2020-10-26T16:46:02.703104+0800 2471 node.cpp:1316] node 4294967321:10.202.91.10:8200:0 received invalid PreVoteResponse from 10.202.91.10:8202:0 state not in STATE_FOLLOWER but CANDIDATE
W 2020-10-26T16:46:02.714792+0800 2479 node.cpp:1292] node 4294967338:10.202.91.10:8200:0 received RequestVoteResponse from 10.202.91.10:8202:0 error: [E1008]Reached timeout=1000ms @10.202.91.10:8202
W 2020-10-26T16:46:02.714850+0800 2479 node.cpp:1292] node 4294967324:10.202.91.10:8200:0 received RequestVoteResponse from 10.202.91.10:8202:0 error: [E1008]Reached timeout=1000ms @10.202.91.10:8202
W 2020-10-26T16:46:02.810375+0800 2473 node.cpp:1316] node 4294967316:10.202.91.10:8200:0 received invalid PreVoteResponse from 10.202.91.10:8202:0 state not in STATE_FOLLOWER but CANDIDATE
W 2020-10-26T16:46:02.811153+0800 2463 node.cpp:1244] node 4294967306:10.202.91.10:8200:0 received invalid RequestVoteResponse from 10.202.91.10:8202:0 state not in CANDIDATE but LEADER
W 2020-10-26T16:46:02.816112+0800 2479 node.cpp:1292] node 4294967344:10.202.91.10:8200:0 received RequestVoteResponse from 10.202.91.10:8202:0 error: [E1008]Reached timeout=1000ms @10.202.91.10:8202
W 2020-10-26T16:46:02.816541+0800 2468 node.cpp:1292] node 4294967353:10.202.91.10:8200:0 received RequestVoteResponse from 10.202.91.10:8202:0 error: [E1008]Reached timeout=1000ms @10.202.91.10:8202
W 2020-10-26T16:46:02.844225+0800 2468 node.cpp:1222] node 4294967326:10.202.91.10:8200:0 term 360 steps down when reaching vote timeout: fail to get quorum vote-granted
W 2020-10-26T16:46:02.863354+0800 2468 node.cpp:1244] node 4294967321:10.202.91.10:8200:0 received invalid RequestVoteResponse from 10.202.91.10:8202:0 state not in CANDIDATE but LEADER
W 2020-10-26T16:46:03.010736+0800 2478 node.cpp:1244] node 4294967316:10.202.91.10:8200:0 received invalid RequestVoteResponse from 10.202.91.10:8202:0 state not in CANDIDATE but LEADER
W 2020-10-26T16:46:03.028254+0800 2473 node.cpp:1316] node 4294967326:10.202.91.10:8200:0 received invalid PreVoteResponse from 10.202.91.10:8202:0 state not in STATE_FOLLOWER but CANDIDATE
W 2020-10-26T16:46:03.127198+0800 2476 node.cpp:1244] node 4294967326:10.202.91.10:8200:0 received invalid RequestVoteResponse from 10.202.91.10:8201:0 state not in CANDIDATE but LEADER
W 2020-10-26T16:49:58.732772+0800 2469 baidu_rpc_protocol.cpp:255] Fail to write into Socket{id=85899347749 fd=191 addr=10.202.91.10:49188:8200} (0x7f02bfd83f40): Unknown error 1014 [1014]
W 2020-10-26T16:51:19.010679+0800 2462 baidu_rpc_protocol.cpp:255] Fail to write into Socket{id=51539609032 fd=191 addr=10.202.91.10:49506:8200} (0x7f02c07a58c0): Unknown error 1014 [1014]
W 2020-10-26T16:51:39.987674+0800 2465 baidu_rpc_protocol.cpp:255] Fail to write into Socket{id=60129543972 fd=191 addr=10.202.91.10:49610:8200} (0x7f02bfd83d00): Unknown error 1014 [1014]

*** Aborted at 1603702159 (unix time) try "date -d @1603702159" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGILL (@0x559f3295db4e) received by PID 3261 (TID 0x7ff2d1c95700) from PID 848681806; stack trace: ***
@ 0x7ff2df048dd0 (unknown)
@ 0x559f3295db4e curve::client::RequestSender::ReadChunk()
@ 0x559f329573dd ZNSt17_Function_handlerIFvPN6google8protobuf7ClosureESt10shared_ptrIN5curve6client13RequestSenderEEEZNS6_13CopysetClient9ReadChunkERKNS6_11ChunkIDInfoEmlmmRKNS6_17RequestSourceInfoES3_EUlS3_S8_E_E9_M_invokeERKSt9_Any_dataOS3_OS8
@ 0x559f32957faa curve::client::CopysetClient::DoRPCTask()
@ 0x559f329583b0 curve::client::CopysetClient::ReadChunk()
@ 0x559f32955a34 curve::client::RequestScheduler::ProcessOne()
@ 0x559f32955db2 curve::client::RequestScheduler::Process()
@ 0x7ff2ddc46b73 (unknown)
@ 0x7ff2df03e2de start_thread
@ 0x7ff2dd6a5e83 __GI___clone
@ 0x0 (unknown)
*** Aborted at 1603702162 (unix time) try "date -d @1603702162" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGILL (@0x558257b1cb4e) received by PID 3369 (TID 0x7f353cd42700) from PID 1471269710; stack trace: ***
@ 0x7f3549d63dd0 (unknown)
@ 0x558257b1cb4e curve::client::RequestSender::ReadChunk()
@ 0x558257b163dd ZNSt17_Function_handlerIFvPN6google8protobuf7ClosureESt10shared_ptrIN5curve6client13RequestSenderEEEZNS6_13CopysetClient9ReadChunkERKNS6_11ChunkIDInfoEmlmmRKNS6_17RequestSourceInfoES3_EUlS3_S8_E_E9_M_invokeERKSt9_Any_dataOS3_OS8
@ 0x558257b16faa curve::client::CopysetClient::DoRPCTask()
@ 0x558257b173b0 curve::client::CopysetClient::ReadChunk()
@ 0x558257b14a34 curve::client::RequestScheduler::ProcessOne()
@ 0x558257b14db2 curve::client::RequestScheduler::Process()
@ 0x7f3548961b73 (unknown)
@ 0x7f3549d592de start_thread
@ 0x7f35483c0e83 __GI___clone
@ 0x0 (unknown)
*** Aborted at 1603702163 (unix time) try "date -d @1603702163" if you are using GNU date ***
PC: @ 0x0 (unknown)
*** SIGILL (@0x5615cb349b4e) received by PID 3395 (TID 0x7fa7e6df5700) from PID 18446744072823806798; stack trace: ***
@ 0x7fa7f35a0dd0 (unknown)
@ 0x5615cb349b4e curve::client::RequestSender::ReadChunk()
@ 0x5615cb3433dd ZNSt17_Function_handlerIFvPN6google8protobuf7ClosureESt10shared_ptrIN5curve6client13RequestSenderEEEZNS6_13CopysetClient9ReadChunkERKNS6_11ChunkIDInfoEmlmmRKNS6_17RequestSourceInfoES3_EUlS3_S8_E_E9_M_invokeERKSt9_Any_dataOS3_OS8
@ 0x5615cb343faa curve::client::CopysetClient::DoRPCTask()
@ 0x5615cb3443b0 curve::client::CopysetClient::ReadChunk()
@ 0x5615cb341a34 curve::client::RequestScheduler::ProcessOne()
@ 0x5615cb341db2 curve::client::RequestScheduler::Process()
@ 0x7fa7f219eb73 (unknown)
@ 0x7fa7f35962de start_thread
@ 0x7fa7f1bfde83 __GI___clone
@ 0x0 (unknown)

Full logs are attached:
nbd-hung-logs.tar.gz

Part 2: retry logic

General Question

void ClientClosure::OnRetry() {
    MetricHelper::IncremFailRPCCount(fileMetric_, reqCtx_->optype_);
    // ------- condition 1: chunkserverOPMaxRetry = 3
    if (reqDone_->GetRetriedTimes() >= failReqOpt_.chunkserverOPMaxRetry) {
        reqDone_->SetFailed(status_);
        LOG(ERROR) << OpTypeToString(reqCtx_->optype_)
                   << " retried times exceeds"
                   << ", IO id = " << reqDone_->GetIOTracker()->GetID()
                   << ", request id = " << reqCtx_->id_;
        done_->Run();
        return;
    }
    // ------- condition 2: chunkserverMaxRetryTimesBeforeConsiderSuspend = 20
    // ------- My understanding: if reqDone_->GetRetriedTimes() >= failReqOpt_.chunkserverMaxRetryTimesBeforeConsiderSuspend holds,
    // ------- then condition 1 above must already have triggered
    if (!reqDone_->IsSuspendRPC() && reqDone_->GetRetriedTimes() >=
        failReqOpt_.chunkserverMaxRetryTimesBeforeConsiderSuspend) {
        reqDone_->SetSuspendRPCFlag();
        MetricHelper::IncremIOSuspendNum(fileMetric_);
        LOG(WARNING) << "IO Retried "
                    << failReqOpt_.chunkserverMaxRetryTimesBeforeConsiderSuspend
                    << " times, set suspend flag! " << *reqCtx_
                    << ", IO id = " << reqDone_->GetIOTracker()->GetID()
                    << ", request id = " << reqCtx_->id_;
    }

    PreProcessBeforeRetry(status_, cntlstatus_);
    SendRetryRequest();
}

If condition 2 holds, condition 1 must already have held, so execution never reaches condition 2. Is my understanding correct?
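The claimed ordering can be checked with a toy simulation of the two thresholds using the values quoted in the comments above (a sketch of the control flow only, not the real client code): with chunkserverOPMaxRetry = 3 the request fails permanently on the 3rd retry, so a suspend threshold of 20 is never reached.

```python
# Toy simulation of the two thresholds in ClientClosure::OnRetry.
# Names mirror the C++ config options; values are the ones quoted above.
CHUNKSERVER_OP_MAX_RETRY = 3
SUSPEND_THRESHOLD = 20  # chunkserverMaxRetryTimesBeforeConsiderSuspend

def on_retry(retried_times, suspended):
    """Return (terminal, suspended) for one pass through OnRetry."""
    if retried_times >= CHUNKSERVER_OP_MAX_RETRY:             # condition 1
        return True, suspended                                # fail, done_->Run()
    if not suspended and retried_times >= SUSPEND_THRESHOLD:  # condition 2
        suspended = True
    return False, suspended

suspended = False
for retries in range(100):
    terminal, suspended = on_retry(retries, suspended)
    if terminal:
        break

# With maxRetry (3) < suspendThreshold (20), the suspend flag is never set:
print(retries, suspended)  # -> 3 False
```

So the observation is right for these particular values: condition 2 is only reachable when chunkserverMaxRetryTimesBeforeConsiderSuspend is smaller than chunkserverOPMaxRetry.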

Is there an inconsistency in raft snapshots?

General Question

In the chunkserver raftsnapshot module, the on_snapshot_save method in copyset_node.cpp only lists all files of type CHUNKFILE and saves their names in the BRAFT_SNAPSHOT_META_FILE. During a later install_snapshot, the follower downloads this meta file, the chunkfiles under the data directory, and the snapshot files under the data directory from the leader. A snapshot file can be considered read-only, but a chunkfile is not: while the snapshot is being installed the chunkfile may change, which could leave the follower's data inconsistent with the leader's.

Is my understanding wrong, or is there some other mechanism that guarantees consistency here? Thanks for clarifying.
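For intuition, here is a minimal, hypothetical illustration (deliberately unrelated to curve's actual code) of the race the question describes: copying a file block by block while a writer is still mutating it can produce a copy that matches neither the old nor the new state.

```python
# Hypothetical torn-copy illustration: a "chunk" of 4 blocks is copied
# block by block while a writer concurrently rewrites every block from
# version "A" to version "B". The interleaving is made deterministic here.
chunk = ["A0", "A1", "A2", "A3"]

copy = []
for i in range(len(chunk)):
    copy.append(chunk[i])                  # copier reads block i
    if i == 1:                             # writer sneaks in mid-copy
        chunk[:] = ["B0", "B1", "B2", "B3"]

print(copy)  # -> ['A0', 'A1', 'B2', 'B3']: neither all-A nor all-B
```

Whether this matters in practice depends on what happens after the transfer, e.g. whether replaying the raft log from the snapshot's included index re-applies the writes that raced with the copy.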

Step 3 of standalone deployment fails

General Question

Following the standalone deployment guide: https://github.com/opencurve/curve/blob/master/docs/cn/deploy.md

Step 3, ansible-playbook -i server.ini deploy_curve.yml, fails with:
TASK [install_package : determine if etcd exists] **********************************************************************************************************************************************************
fatal: [localhost]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: ssh: connect to host 127.0.0.1 port 1046: Connection refused", "unreachable": true}

Didn't start curve-mds by daemon

fatal: [mds3]: FAILED! => {"changed": true, "cmd": "sudo ./mds-daemon.sh start", "delta": "0:00:03.162596", "end": "2021-01-14 15:07:15.075607", "msg": "non-zero return code", "rc": 1, "start": "2021-01-14 15:07:11.913011", "stderr": "", "stderr_lines": [], "stdout": "subnet: 172.22.12.0/24\nport: 6666\nDidn't start curve-mds by daemon", "stdout_lines": ["subnet: 172.22.12.0/24", "port: 6666", "Didn't start curve-mds by daemon"]}
Deploying the cluster via ansible, I hit this problem on both 1.0.1-rc0 and 1.1.0-beta.
It occurs at TASK [start_service : start by daemon]: the first run of the cluster-startup playbook hits this error and the script aborts, while a second run completes successfully.

Installation failed

General Question

During installation, creating files is repeatedly denied (permission denied).

【C-Plan】Build and deploy on Ubuntu 18.04

Describe the task you choose

Build and deploy curve on Ubuntu 18.04.

Building curve

Build environment: 4-core / 8 GB RAM, Ubuntu 18.04

First install the required library dependencies. The Dockerfile builds from a centos base image and installs a series of packages; on Ubuntu, the corresponding pre-built packages to install (apt-get install) are:

libssl-dev uuid-dev libfiu-dev libcurl4-openssl-dev zlib1g-dev libnl-3-dev libboost-dev libunwind8-dev libnl-genl-3-dev python-pip

Next, install bazel:

wget https://curve-build.nos-eastchina1.126.net/bazel-0.17.2-installer-linux-x86_64.sh
bash bazel-0.17.2-installer-linux-x86_64.sh

Installing the dependencies above and bazel both require sudo or the root user. After that, just build and package.

(screenshots)

Suggestion 1: do not enable go mod, and ideally do not have golang installed in the environment. The build downloads golang 1.12.8 and fetches all Go dependencies of etcd v3.4.0 locally as source code, compiling them from source by default. Because etcd once switched its import path between go.etcd.io/etcd and github.com/coreos/etcd, and because of grpc version conflicts, dependencies fetched via go mod will break the build.

Suggestion 2: on a second build you can comment out the make clean and make all at https://github.com/opencurve/curve/blob/master/build.sh#L94; once libetcdclient.so has been built it does not need to be rebuilt.

Run bash build.sh to start the build. Since no version was specified, packaging produces the following 4 files:

curve_9.9.9+5f9b5f68.tar.gz
curve-monitor_9.9.9+5f9b5f68.tar.gz
nbd_9.9.9+5f9b5f68.tar.gz
nebd_9.9.9+5f9b5f68.tar.gz

Deploying curve

On a single machine, deploy by following the deployment documentation.

(screenshot)

Suggestion 1: curve uses perf, but the Ubuntu repositories do not ship a package named linux-perf. On the stock Ubuntu 18.04 kernel, install:

sudo apt-get install linux-tools-common

After installation, running perf prints the expected usage output, confirming it works.
Then comment out the perf-install step at https://github.com/opencurve/curve/blob/master/curve-ansible/roles/prepare_software_env/tasks/main.yml#L48.

Suggestion 2: deployment permissions. For now I deploy directly as root; after creating a curve user, configuration directories such as /etc/curve and operations such as running nbd all hit permission problems. Presumably ansible still needs root privileges when running the predefined playbooks.

Suggestion 3: Python packages download slowly over the network, so consider adding a domestic pip mirror; etcd can also be downloaded manually to /tmp/etcd-v3.4.0.tar.gz, with the corresponding ansible configuration file adjusted.

【C-Plan】Build and deploy

First experience building and deploying

This issue is for the mandatory C-Plan task, covering deployment and build.
Problems encountered are described first, followed by screenshots of the successful result.

Mounting and building with opencurve/curveintegration:centos8

Deployment succeeded

(screenshots)

Issues encountered:

1)
When mapping, it reported that the image could not be opened, and the nebd-server process kept consuming 25% CPU (exactly 1 of 4 cores). From the logs, there was no nebd directory under /data; why it was missing is unclear. Running stop then start from the build directory did not fix it; deleting the nebd logs and then running deploy_nebd.yml allowed it to start.

I 2021-02-02T12:22:54.056986+0000 2000722 source_reader.cpp:59] SourceReader fdCloseThread run successfully
I 2021-02-02T12:22:54.056990+0000 2000722 nebd_server.cpp:60] NebdServer init curveRequestExecutor ok
W 2021-02-02T12:22:54.056999+0000 2000722 metafile_manager.cpp:142] File not exist: /data/nebd/nebdserver.meta
I 2021-02-02T12:22:54.057001+0000 2000722 metafile_manager.cpp:48] Init metafilemanager success.
I 2021-02-02T12:22:54.057020+0000 2000722 file_manager.cpp:84] Load file record finished.
I 2021-02-02T12:22:54.057021+0000 2000722 nebd_server.cpp:67] NebdServer init fileManager ok
I 2021-02-02T12:22:54.057047+0000 2000722 heartbeat_manager.cpp:42] Run heartbeat manager success.
I 2021-02-02T12:22:54.057049+0000 2000722 nebd_server.cpp:74] NebdServer init heartbeatManager ok
I 2021-02-02T12:22:54.057051+0000 2000722 nebd_server.cpp:76] NebdServer init ok
I 2021-02-02T12:22:54.057052+0000 2000722 nebd_server.cpp:78] nebd version: 9.9.9+984a60e7
E 2021-02-02T12:22:54.057240+0000 2000722 file_lock.cpp:38] open file failed, error = No such file or directory, filename = /data/nebd/nebd.sock.lock
E 2021-02-02T12:22:54.057245+0000 2000722 nebd_server.cpp:235] Address already in use
I 2021-02-02T12:22:54.057246+0000 2000722 file_manager.cpp:57] Stop file manager success.
I 2021-02-02T12:22:54.057404+0000 2000722 heartbeat_manager.cpp:48] Stopping heartbeat manager...
I 2021-02-02T12:22:54.057443+0000 2000722 heartbeat_manager.cpp:52] Stop heartbeat manager success.

2) curve_ops_tool delete -fileName=/test1 -userName=curve -forcedelete=true cannot delete the volume.
Workaround: drop -forcedelete=true, then go into the recycle bin and delete it there. Why does deleting files under recyclebin fail with userName curve but succeed with root?

【C-Plan】First experience building and deploying

First experience building and deploying

This issue is for the mandatory C-Plan task, covering deployment and build.
Problems encountered are described first, followed by screenshots of success.

Problems

1.
Running ansible-playbook -i client.ini deploy_nbd.yml hit a permission problem:
(screenshot)
Adding sudo made it succeed:

sudo ansible-playbook -i client.ini deploy_nbd.yml

Successful workflow

Note that curve provides two docker images, for development and deployment respectively; pull both:

docker pull opencurve/curvebuild:centos8
docker pull opencurve/curveintegration:centos8

Following the documentation, create the corresponding containers and run the documented steps.

Build and packaging succeeded:
(screenshots)

Deployment succeeded
Cluster status meets the documented requirements:
(screenshot)

Creation info of /test:
(screenshot)

An nbd0 volume was observed to have been added:
(screenshot)

【C-Plan task 3】Translate code comments & fix typos

Describe the task you choose

  • translate comments in these files:
    • src/client/libcurve_file.h
    • src/client/libcurve_file.cpp
    • include/client/libcurve.h

Describe alternatives you've considered (optional)

Additional context/screenshots

  • fix a typo (SNAPSTHO_FROZEN => SNAPSHOT_FROZEN) in these files:
    • curvefs_python/curvefs_tool.py
    • src/client/mds_client.cpp
    • include/client/libcurve.h

Standalone deployment fails with "could not find or access ../curve-mds/bin/"

General Question

Following the official documentation, I used ansible-playbook to deploy a single-node curve environment. At the install mds bin stage it reports could not find or access ../curve-mds/bin/, and deployment fails.
(screenshot)

Reading the script, the install mds bin stage copies the files under ../curve-mds/bin/ into /usr/bin, but there is no bin directory under ../curve-mds, only the two folders DEBIAN and home. So it appears the curve-mds folder is missing the mds component, and the script errors out because it cannot find the source files. But reading through the code I did not find any command that downloads or builds mds, so I would like to ask how to resolve this.

OS: CentOS-8.2.2004
Kernel: 4.18.0-193.el8.x86_64
openssl: 1.1.1g FIPS 21 Apr 2020
gcc: 8.4.1 20200928 (Red Hat 8.4.1-1)
curve: curve-1.0.4 (1.2.1-rc0 and 1.3.0-beta2 have the same problem)

Test case snapshot_server_concurrent_itest fails

Describe the bug
After curve builds successfully, running the test command (./bazel-bin/test/integration/snapshotcloneserver/snapshot_server_concurrent_itest) first hangs as in Figure 1 and finally fails as in Figure 2.

To Reproduce
1) Download: git clone https://github.com/opencurve/curve.git (master branch; last commit as in Figure 3)
2) Build the curve source
3) Start fakes3 (fakes3 -r /S3_DATA_DIR -p 9999 --license YOUR_LICENSE_KEY)
4) Run ./bazel-bin/test/integration/snapshotcloneserver/snapshot_server_concurrent_itest
Note: the final fakes3 interaction is shown in Figure 4.

Expected behavior
Fix this bug.

Versions
OS: centos8.1
Compiler: gcc version 8.3.1
fakes3: FakeS3 2.0.0

Additional context/screenshots
Figure 1: (screenshot)
Figure 2: (screenshot)
Figure 3: (screenshot)
Figure 4: (screenshot)

【C-Plan】Docker-based build and deployment

I. Build

  • Built inside docker on an Ubuntu system in a VM

  • 1. Pull the curvebuild and curveintegration images from the registry, used for building and deployment respectively:

    docker pull opencurve/curvebuild:centos8
    docker pull opencurve/curveintegration:centos8
    

(screenshot)

  • 2. Run the curvebuild image and enter the curve directory:

    docker run -it opencurve/curvebuild:centos8 /bin/bash

  • 3. Run the build command.

  • 4. After roughly an hour of waiting, the result:

(screenshot)

II. Deploy

  • Run the curveintegration image with the --net=host option so it can access the Internet:

    docker run --net=host --cap-add=ALL -v /dev:/dev -v /lib/modules:/lib/modules --privileged -it opencurve/curveintegration:centos8 /bin/bash
    
  • Fetch the tarballs and extract them:

    wget https://github.com/opencurve/curve/releases/download/v{version}/curve_{version}.tar.gz
    wget https://github.com/opencurve/curve/releases/download/v{version}/nbd_{version}.tar.gz
    wget https://github.com/opencurve/curve/releases/download/v{version}/nebd_{version}.tar.gz
    tar zxvf curve_{version}.tar.gz
    tar zxvf nbd_{version}.tar.gz
    tar zxvf nebd_{version}.tar.gz
    

    (Deployment itself went smoothly; the only issue was that using the latest v1.0.2-beta for {version} left unhealthy_copysets nonzero. Following other participants' deployment notes and switching to v1.0.1-rc0 solved it.)

(screenshot)

  • Cluster startup result:

(screenshot)

  • Install the Nebd service and the NBD package:

    ansible-playbook -i client.ini deploy_nebd.yml
    ansible-playbook -i client.ini deploy_nbd.yml
    ansible-playbook -i client.ini deploy_curve_sdk.yml
    
  • Create a CURVE volume; NBD0 becomes visible:

(screenshot)

Deployment under docker fails

General Question

Using the docker image recommended by https://github.com/opencurve/curve/blob/master/docs/cn/deploy.md#%E5%8D%95%E6%9C%BA%E9%83%A8%E7%BD%B2:
docker run --cap-add=ALL -v /dev:/dev -v /lib/modules:/lib/modules --privileged -it opencurve/curveintegration:centos8 /bin/bash

Running ansible finally fails as shown below. Does this mean the kernel here is too old and must be upgraded, e.g. to 4.18.0-193.el8.x86_64?

$ uname -a
Linux ec0a9b5f0083 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux  
$ ansible-playbook -i server.ini deploy_curve.yml  
...
TASK [check kernel version] **************************************************************************************************************************
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using `result|version_compare` instead use `result is version_compare`. This 
feature will be removed in version 2.9. Deprecation warnings can be disabled by setting deprecation_warnings=False in ansible.cfg.
fatal: [localhost]: FAILED! => {
    "assertion": "ansible_kernel|version_compare('3.15', '>=')", 
    "changed": false, 
    "evaluated_to": false
}

NO MORE HOSTS LEFT ***********************************************************************************************************************************
	to retry, use: --limit @/home/curve/curve/curve-ansible/deploy_curve.retry

PLAY RECAP *******************************************************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=1   
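The failing assertion compares the kernel release string against 3.15, and a container shares its host's kernel, so the check is really against the host's 3.10.0-957 kernel, not the centos8 userland. A rough, hypothetical re-implementation of the comparison (ansible's version_compare is more general, but behaves the same on these inputs) shows why one release fails and the other passes:

```python
# Hypothetical sketch of the playbook's
# "ansible_kernel|version_compare('3.15', '>=')" kernel check.
def version_at_least(kernel_release, minimum):
    """Compare the leading dotted-numeric part of a kernel release string."""
    numeric = kernel_release.split("-")[0]            # "3.10.0-957..." -> "3.10.0"
    parse = lambda s: tuple(int(x) for x in s.split("."))
    return parse(numeric) >= parse(minimum)

print(version_at_least("3.10.0-957.el7.x86_64", "3.15"))   # -> False
print(version_at_least("4.18.0-193.el8.x86_64", "3.15"))   # -> True
```

In other words, the fix is to upgrade the host kernel (or run on a host whose kernel is already >= 3.15), not to change the container image.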

Three-node deployment fails

General Question

Using https://github.com/opencurve/curve/releases/download/v1.0.0/curve_1.0.0+8b04e0ec.tar.gz:

Single-node deployment on CentOS 8 succeeds, but three-node deployment fails at the deploy etcd step.
Specifically: where is etcd.conf.yml generated and copied to the other hosts? On the two hosts other than the control machine, no generated etcd.conf.yml can be found.

############################## deploy etcd ##############################
- name: prepare etcd
  hosts: etcd
  any_errors_fatal: true
  gather_facts: no
  become: yes
  become_user: "{{ sudo_user }}"
  become_flags: -iu {{ sudo_user }}
  tags:
    - etcd
  roles:
    - { role: install_package, package_name: etcd, install_with_deb: false, tags: install_etcd } // I extracted the etcd download package and copied it to /usr/bin directly
    - { role: generate_config, template_name: etcd.conf.yml, conf_path: "{{ etcd_config_path }}", tags: generage_config } // three nodes: etcd1 (also the control machine) succeeds, etcd2 and etcd3 fail

ansible-playbook -i server.ini deploy_curve.yml fails as follows:

TASK [generate_config : generate configuration file directly] ****************************************************************************************
fatal: [etcd3]: FAILED! => {"changed": false, "checksum": "d0ddc59580bbb243f260acea51c4a870449ecaf8", "msg": "Destination /etc/curve not writable"}
...ignoring
fatal: [etcd1]: FAILED! => {"changed": false, "checksum": "ed3f202012e267b199f35bc0c2d106bfb436a6a7", "msg": "Destination /etc/curve not writable"}
...ignoring
fatal: [etcd2]: FAILED! => {"changed": false, "checksum": "e02c13a758f55ac58ccbf3be5443e10833a72d91", "msg": "Destination /etc/curve not writable"}
...ignoring

TASK [generate_config : generate configuration file at /tmp] *****************************************************************************************
changed: [etcd1]
changed: [etcd2]
changed: [etcd3]

TASK [generate_config : mv config file] **************************************************************************************************************
changed: [etcd1]
fatal: [etcd2]: FAILED! => {"changed": true, "cmd": "sudo mv /tmp/etcd.conf.yml /etc/curve/etcd.conf.yml", "delta": "0:00:00.016609", "end": "2020-11-13 04:07:44.967946", "msg": "non-zero return code", "rc": 1, "start": "2020-11-13 04:07:44.951337", "stderr": "mv: cannot stat '/tmp/etcd.conf.yml': No such file or directory", "stderr_lines": ["mv: cannot stat '/tmp/etcd.conf.yml': No such file or directory"], "stdout": "", "stdout_lines": []}
fatal: [etcd3]: FAILED! => {"changed": true, "cmd": "sudo mv /tmp/etcd.conf.yml /etc/curve/etcd.conf.yml", "delta": "0:00:01.017238", "end": "2020-11-13 04:07:45.976883", "msg": "non-zero return code", "rc": 1, "start": "2020-11-13 04:07:44.959645", "stderr": "mv: cannot stat '/tmp/etcd.conf.yml': No such file or directory", "stderr_lines": ["mv: cannot stat '/tmp/etcd.conf.yml': No such file or directory"], "stdout": "", "stdout_lines": []}

NO MORE HOSTS LEFT ***********************************************************************************************************************************
	to retry, use: --limit @/home/curve/curve/curve/curve-ansible/deploy_curve.retry

PLAY RECAP *******************************************************************************************************************************************
etcd1                      : ok=52   changed=14   unreachable=0    failed=0   
etcd2                      : ok=51   changed=13   unreachable=0    failed=1   
etcd3                      : ok=51   changed=13   unreachable=0    failed=1   
localhost                  : ok=35   changed=8    unreachable=0    failed=0   
mds1                       : ok=34   changed=8    unreachable=0    failed=0   
mds2                       : ok=34   changed=8    unreachable=0    failed=0   
mds3                       : ok=34   changed=8    unreachable=0    failed=0   
nginx1                     : ok=34   changed=8    unreachable=0    failed=0   
nginx2                     : ok=34   changed=8    unreachable=0    failed=0   
server1                    : ok=36   changed=8    unreachable=0    failed=0   
server2                    : ok=36   changed=8    unreachable=0    failed=0   
server3                    : ok=36   changed=8    unreachable=0    failed=0   
snap1                      : ok=34   changed=8    unreachable=0    failed=0   
snap2                      : ok=34   changed=8    unreachable=0    failed=0   
snap3                      : ok=34   changed=8    unreachable=0    failed=0   

And on etcd1 (the control machine), which reported success, the generated etcd.conf.yml shows the member name as etcd3???

$ sudo head /etc/curve/etcd.conf.yml 
# This is the configuration file for the etcd server.

# Human-readable name for this member.
name: etcd3      <------------???

# Path to the data directory.
data-dir: /etcd/data

# Path to the dedicated wal directory.
wal-dir: /etcd/wal

【C-Plan】Build and deploy

Test environment:

  • Hardware: ECS cloud server, 2 cores / 4 GB RAM
  • OS: CentOS 8 (built and deployed outside Docker)
  • Kernel: 4.18.0

Deployment
Followed the official tutorial step by step; the following issues came up:

  • GCC not installed. Fix: yum install gcc.
  • Downloading etcd-client timed out repeatedly. Fix: download etcd-client on another machine, upload it to /tmp on this one, and comment out the download step at lines 32-41 of /home/curve/curve/curve-ansible/roles/install_package/tasks/include/install_etcd.yml.
  • ./chunkserver_ctl.sh start all failed. Fix: modify chunkserver_ctl.sh to print, at line 164, the result of the curve-chunkserver invocation (i.e. $LD_PRELOAD); this showed libcurl-gnutls.so.4 was missing, so create a symlink under /usr/lib64: ln -s libcurl.so.4 libcurl-gnutls.so.4.

Verification:
(screenshot)
Write test:
(screenshot)

Build

  1. wget https://github.com/bazelbuild/bazel/releases/download/0.17.2/bazel-0.17.2-installer-linux-x86_64.sh to download and install bazel
  2. yum install git gcc-c++ make zlib zlib-devel openssl openssl-devel to install the dependencies, then git clone https://github.com/albertito/libfiu.git && make && make install (this dependency cannot be installed via yum)
  3. Run ./replace-curve-repo.sh && ./build.sh

Client Python API test

  1. As root, mkdir curve_test && cd curve_test, then copy /usr/curvefs into it: cp -r /usr/curvefs/ .
  2. Write a test case: vim main.py
    (screenshot)
  3. Run it:
    (screenshot)

Client C++ API test

  1. Under the curve directory, mkdir curvefs_cpp && cd curvefs_cpp, and copy curve/nebd/src/part2/BUILD into it
  2. Modify the BUILD file
    (screenshot)
  3. Write a test case (modeled on curvefs_python)
    (screenshot)
  4. Build and run:
    bazel build //curvefs_cpp:curvefs --copt -DHAVE_ZLIB=1 --compilation_mode=dbg -s --define=with_glog=true --define=libunwind=true --copt -DGFLAGS_NS=google --copt -Wno-error=format-security --copt -DUSE_BTHREAD_MUTEX
    bazel-bin/curvefs_cpp/curvefs
    (screenshot)

Can you explain chunkfilepool in more detail?

I saw this sentence in the introduction and am curious how it is done:

The state machine implementation uses a chunkfilepool (when the cluster is initialized, a specified proportion of the space is formatted into chunks), which brings the underlying write amplification down to 0.
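From the chunkserver logs quoted earlier ("get chunk success! now pool size = ...", followed by a move from chunkfilepool/<id> to copysets/.../chunk_<n>), the core idea appears to be pre-formatting chunk files once at init time and handing them out later via rename. A minimal, hypothetical sketch of that pattern (not curve's actual implementation; sizes and layout are invented):

```python
# Hypothetical chunk-file-pool sketch: chunk files are fully written once
# at cluster init, and "allocating" a chunk later is just a rename, so no
# new filesystem blocks are allocated on the write path.
import os
import tempfile

CHUNK_SIZE = 4 * 1024  # toy size; real chunks are much larger

def format_pool(pool_dir, count):
    """Pre-create `count` fully-written chunk files (done once, at init)."""
    os.makedirs(pool_dir, exist_ok=True)
    for i in range(count):
        with open(os.path.join(pool_dir, str(i)), "wb") as f:
            f.write(b"\0" * CHUNK_SIZE)  # blocks are really allocated here

def get_chunk(pool_dir, dst_path):
    """Take one pre-formatted file out of the pool via rename."""
    name = os.listdir(pool_dir)[0]
    os.rename(os.path.join(pool_dir, name), dst_path)  # metadata-only op

root = tempfile.mkdtemp()
pool = os.path.join(root, "chunkfilepool")
format_pool(pool, count=4)
get_chunk(pool, os.path.join(root, "chunk_1"))
print(len(os.listdir(pool)))  # -> 3 chunks left in the pool
```

Because every block of every chunk was written during formatting, later overwrites by the state machine land on already-allocated blocks, which is what makes the write amplification on the allocation path effectively zero.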

【C-Plan】Build and deploy

I. Build

1. Basic steps

The build went fairly smoothly: just follow the steps; it took a little over an hour in total.

A very smooth experience overall.

The build-success message:

(screenshot)

Packaging succeeded:

(screenshot)

II. Deploy

1. Basic steps

Deployment hit quite a few pitfalls along the way; results first:

Cluster deployed successfully:

(screenshot)

Cluster status check:

(screenshot)

nbd0 volume check:

(screenshot)

2. Pitfalls and fixes

  • Pick the right image

At first I extracted and deployed the freshly built tarballs directly inside the curvebuild image; setting up that environment took a lot of time.

Using the curveintegration image is by far the least hassle.

  • Switching to the curve user prints this warning:
-bash: /dev/null: Permission denied

This is a permission problem; setting the mode fixes it:

chmod 777 /dev/null
  • wget is slow and frequently interrupted

Switch to the multi-threaded downloader axel, which is much faster.

Pass -c to enable resume, so stalls to zero speed are no longer a worry.

  • Cluster deployment errors with: Curl error (28): Timeout was reached

A connection timeout, often appearing together with error 6; this is a repository mirror problem.

Switching to a working mirror fixes it.

  • Cluster deployment errors with: /usr/bin/python: No such file or directory

There is indeed no python under bin, only version-specific binaries.

Symlink python to python2.7:

ln -s /usr/bin/python2.7 /usr/bin/python
  • Cluster deployment errors with: Errors during downloading metadata for repository 'base'

Again a mirror problem.

Switching to a working mirror fixes it.

  • Cluster deployment errors with: dev/urandom not found

This seems to be a permission issue; urandom does exist under /dev.

Running the command with sudo fixes it.

chunkserver will exit with core dump when receive SIGINT

curve user feedback

Describe the bug

  1. kill -2 $chunkserverid in a normal environment
  2. a chunkserver core file is generated

To Reproduce
kill -2 $chunkserverid

Expected behavior
chunkserver exits normally

Versions
OS: debian9
Compiler: use curve release version
curve-mds: v1.3.0-beta
curve-chunkserver: v1.3.0-beta
curve-snapshotcloneserver: v1.3.0-beta
curve-sdk: v1.3.0-beta
nebd: v1.3.0-beta
curve-nbd: v1.3.0-beta

Additional context/screenshots
(screenshots)
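For reference, a common pattern for exiting cleanly on SIGINT is to install a handler that only sets a flag and let the main loop tear services down in order. This is a minimal, hypothetical sketch of the pattern in general (the chunkserver itself is C++, and its actual shutdown path may differ):

```python
# Hypothetical graceful-shutdown sketch: the SIGINT handler sets a flag
# instead of letting the default disposition kill the process mid-operation.
import os
import signal

stop_requested = False

def handle_sigint(signum, frame):
    global stop_requested
    stop_requested = True        # defer the actual shutdown to the main loop

signal.signal(signal.SIGINT, handle_sigint)

os.kill(os.getpid(), signal.SIGINT)   # simulate `kill -2 <pid>`

if stop_requested:
    # ... flush data, stop copyset services in order, then exit normally ...
    print("exiting cleanly")
```

With the default disposition left in place (or with cleanup done inside the handler itself), an interrupt arriving at the wrong moment can abort the process mid-write, which is consistent with the core dump reported above.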

ARM64 platform support

Is your feature request related to a problem?

No

Describe the solution you'd like

Hello Curve maintainers. I am an open-source developer from Huawei, and I would like to gauge the Curve community's interest in supporting the ARM64 architecture. I want to drive ARM64 support in the Curve community, with the following plan, which I would like to discuss with the experts here:

  1. Bring up ARM64 CI

    Curve's current CI platform is a self-hosted jenkins. We can donate ARM64 virtual machines to the community CI platform to enable ARM64 CI.

  2. Submit ARM64-support patches so that curve builds and tests pass on arm64

    I have already finished the build-related patch on a local arm64 machine: wangxiyuan@19b2a66. On top of this patch, curve builds successfully. Testing and fixing the test suite is still in progress.

  3. ARM64 release

    Once the build and test issues on arm64 are all fixed and ARM64 CI has run stably for a while, official arm64 binaries can be released.

The ARM platform is increasingly popular and gaining ground in personal PCs, servers, and cloud computing. Is the curve community interested in ARM64 support? If so, I can take on the related development as well as the long-term maintenance of the ARM CI. Looking forward to your reply; discussion is welcome. Thanks.

Describe alternatives you've considered

There are many options for ARM CI, such as Travis CI. But since curve runs a self-hosted CI platform, donating ARM64 machines seems the best option: it keeps CI unified and easier to control and maintain.

Additional context/screenshots

None

【C-Plan】Build and deploy

Environment

  • Cloud server: 1 CPU core, 2 GB RAM
  • OS: CentOS 8

Deployed without docker; curve version 1.0.2-rc0.

Problems and solutions

Problems

  1. Downloading etcd-client repeatedly timed out
  2. ./chunkserver_ctl.sh start all failed to start
  3. Permission problems
  4. An ansible task failed with:
    fatal: [localhost]: FAILED! => {
    "assertion": "ansible_kernel|version_compare('3.15', '>=')", 
    "changed": false, 
    "evaluated_to": false
    }
    

Solutions

  • Problems 1 and 2: see the fixes in #245;
  • Problem 3: add sudo;
  • Problem 4: upgrade the CentOS kernel and reboot; the kernel finally installed was 4.18.0-240.10.1.el8_3.x86_64.

Deployment result

Cluster status:
(screenshot)

Create a CURVE volume and mount it locally via NBD:
(screenshot)
