
Comments (4)

BlaineEXE commented on June 29, 2024

It is expected that basic Rook will underperform Ceph on bare metal. If the hardware for each test is exactly the same, the difference will be primarily due to Kubernetes' software network overlays, which increase latency significantly, and which will somewhat limit the bandwidth available to the Ceph cluster.

Installing a cluster with network.provider: host or network.provider: multus will allow Ceph to use the raw underlying networks and regain much of the lost performance.
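
For example, a minimal sketch of host networking in the CephCluster spec (assuming the standard rook-ceph namespace; note that Rook does not support changing the network provider on an already-running cluster, so this must be set at creation time):

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  network:
    # Bypass the Kubernetes CNI overlay and use the node's host network.
    # For multus, use provider: multus plus a selectors section instead.
    provider: host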

Some other performance impacts may be related to Ceph's ability to infer OSD and MDS memory-usage constraints from the resource limits/requests set on those pods. The equivalent constraints may not have been configured on the bare-metal Ceph cluster.
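
A minimal sketch of how those limits might be set (the values below are illustrative assumptions, not recommendations): OSD resources live under the CephCluster spec, while MDS resources are set on the CephFilesystem CR.

# cluster.yaml (excerpt)
spec:
  resources:
    osd:
      requests:
        cpu: "2"
        memory: "4Gi"
      limits:
        memory: "4Gi"    # Ceph sizes OSD memory targets from this

# filesystem.yaml (excerpt)
spec:
  metadataServer:
    resources:
      requests:
        cpu: "1"
        memory: "2Gi"
      limits:
        memory: "2Gi"    # likewise used to constrain MDS cache memory

With memory limits in place, Ceph can size its internal memory targets (e.g. osd_memory_target) accordingly, as described above.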


parth-gr commented on June 29, 2024

@kubecto you can use the direct-mount pod for these operations: https://rook.io/docs/rook/latest-release/Troubleshooting/direct-tools/
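
For example, assuming the stock manifest, which creates a rook-direct-mount Deployment in the rook-ceph namespace:

kubectl create -f deploy/examples/direct-mount.yaml
kubectl -n rook-ceph exec -it deploy/rook-direct-mount -- bash
# the usual ceph/rbd/mount commands are then available inside the pod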


kubecto commented on June 29, 2024

Following the instructions in the documentation, I ran RBD and CephFS performance tests against Ceph deployed on bare hosts and against Rook-Ceph. The results suggest Rook-Ceph performs much worse. Is there any way to optimize it?

Bare-host Ceph vs Rook-Ceph performance report
(screenshot of the comparison table omitted; detailed results follow below)

Bare-host RBD test case

modprobe rbd
ceph osd pool create testbench 100 100

rbd create image01 --size 1024 --pool testbench
rbd feature disable testbench/image01 object-map fast-diff deep-flatten
rbd map image01 --pool testbench --name client.admin
mkfs.ext4 /dev/rbd/testbench/image01
[root@ceph-1 ~]# mkdir /mnt/ceph-block-device
[root@ceph-1 ~]# mount  /dev/rbd/testbench/image01 /mnt/ceph-block-device
[root@ceph-1 ~]# cd /mnt/ceph-block-device && rbd bench --io-type write image01 --pool=testbench
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
  SEC       OPS   OPS/SEC   BYTES/SEC
    1     44000  44059.96  180469606.41
    2     87184  43621.71  178674541.77
    3    128096  42718.14  174973521.12
    4    172592  43162.70  176794399.18
    5    214448  42901.29  175723664.45
    6    256144  42428.71  173787979.89
elapsed:     6  ops:   262144  ops/sec: 42145.24  bytes/sec: 172626920.56

Rook-Ceph RBD block device performance test

kubectl create -f deploy/examples/direct-mount.yaml
rbd create replicapool/test --size 10
rbd info replicapool/test
rbd feature disable replicapool/test fast-diff deep-flatten object-map
rbd map replicapool/test
mkfs.ext4 -m0 /dev/rbd0
mkdir /tmp/rook-volume
mount /dev/rbd0 /tmp/rook-volume
cd /tmp/rook-volume && rbd bench --io-type write test --pool=replicapool
bench  type write io_size 4096 io_threads 16 bytes 1073741824 pattern sequential
  SEC       OPS   OPS/SEC   BYTES/SEC
  181    260944   1419.38   5.5 MiB/s
elapsed: 181   ops: 262144   ops/sec: 1441.08   bytes/sec: 5.6 MiB/s

Bare-host CephFS performance test

cat /etc/ceph/ceph.client.admin.keyring | grep key | awk '{print $3}' | base64
mkdir -p /mnt/mycephfs
mount -t ceph 10.102.26.31:6789,10.102.26.32:6789,10.102.26.33:6789:/ /mnt/mycephfs -o name=admin,secret=QVFDVHNlcGxlR3BRSWhBQWVaRWIzSzRHeUFxQmNiSE43RVVVdUE9PQo=
cd /mnt/mycephfs 
fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=4k --iodepth=32 --size=1G --runtime=60 --group_reporting=1

randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.7
Starting 1 process
randwrite: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [w(1)][100.0%][r=0KiB/s,w=20.4MiB/s][r=0,w=5223 IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=127064: Tue May 21 17:37:38 2024
  write: IOPS=5510, BW=21.5MiB/s (22.6MB/s)(1024MiB/47574msec)
    slat (usec): min=3, max=33254, avg=15.61, stdev=140.68
    clat (usec): min=1797, max=58190, avg=5789.73, stdev=2711.33
     lat (usec): min=1815, max=58219, avg=5805.53, stdev=2715.18
    clat percentiles (usec):
     |  1.00th=[ 3163],  5.00th=[ 3654], 10.00th=[ 3982], 20.00th=[ 4359],
     | 30.00th=[ 4686], 40.00th=[ 4948], 50.00th=[ 5276], 60.00th=[ 5604],
     | 70.00th=[ 5997], 80.00th=[ 6521], 90.00th=[ 7504], 95.00th=[ 8979],
     | 99.00th=[17957], 99.50th=[21890], 99.90th=[34341], 99.95th=[43254],
     | 99.99th=[52691]
   bw (  KiB/s): min=14536, max=27064, per=99.98%, avg=22035.17, stdev=2513.77, samples=95
   iops        : min= 3634, max= 6766, avg=5508.78, stdev=628.45, samples=95
  lat (msec)   : 2=0.01%, 4=10.54%, 10=85.73%, 20=3.01%, 50=0.70%
  lat (msec)   : 100=0.02%
  cpu          : usr=2.18%, sys=9.15%, ctx=115866, majf=0, minf=30
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,262144,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=21.5MiB/s (22.6MB/s), 21.5MiB/s-21.5MiB/s (22.6MB/s-22.6MB/s), io=1024MiB (1074MB), run=47574-47574msec

Disk stats (read/write):
  rbd1: ios=0/261729, merge=0/7535, ticks=0/1486709, in_queue=1472707, util=99.88%

Rook-Ceph CephFS performance test

# mkdir /tmp/registry
# grep mon_host /etc/ceph/ceph.conf | awk '{print $3}'
10.96.164.164:6789,10.96.32.39:6789,10.96.162.58:6789
# grep key /etc/ceph/keyring | awk '{print $3}'
AQBkDEtmTcGTIRAA+T90yknxF9MLlE36SjfLMA==
# mon_endpoints=$(grep mon_host /etc/ceph/ceph.conf | awk '{print $3}')
# my_secret=$(grep key /etc/ceph/keyring | awk '{print $3}')
# mount -t ceph -o mds_namespace=myfs,name=admin,secret=$my_secret $mon_endpoints:/ /tmp/registry

# cd /tmp/registry
# fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=4k --iodepth=32 --size=1G --runtime=60 --group_reporting=1
randwrite: (g=0): rw=randwrite, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=32
fio-3.19
Starting 1 process
randwrite: Laying out IO file (1 file / 1024MiB)
Jobs: 1 (f=1): [w(1)][100.0%][w=11.7MiB/s][w=2990 IOPS][eta 00m:00s]
randwrite: (groupid=0, jobs=1): err= 0: pid=2093: Wed May 22 02:37:50 2024
  write: IOPS=2905, BW=11.3MiB/s (11.9MB/s)(681MiB/60008msec); 0 zone resets
    slat (usec): min=4, max=16004, avg=11.75, stdev=40.54
    clat (usec): min=2382, max=91412, avg=11001.14, stdev=3988.86
     lat (usec): min=2389, max=91419, avg=11013.09, stdev=3989.07
    clat percentiles (usec):
     |  1.00th=[ 7111],  5.00th=[ 8291], 10.00th=[ 8717], 20.00th=[ 9241],
     | 30.00th=[ 9634], 40.00th=[10028], 50.00th=[10290], 60.00th=[10683],
     | 70.00th=[11207], 80.00th=[11863], 90.00th=[13304], 95.00th=[15008],
     | 99.00th=[26346], 99.50th=[31065], 99.90th=[71828], 99.95th=[85459],
     | 99.99th=[89654]
   bw (  KiB/s): min= 1632, max=15661, per=100.00%, avg=11622.39, stdev=1298.30, samples=119
   iops        : min=  408, max= 3915, avg=2905.58, stdev=324.56, samples=119
  lat (msec)   : 4=0.01%, 10=39.99%, 20=58.31%, 50=1.52%, 100=0.17%
  cpu          : usr=0.99%, sys=3.76%, ctx=46530, majf=0, minf=31
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=100.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.1%, 64=0.0%, >=64=0.0%
     issued rwts: total=0,174334,0,0 short=0,0,0,0 dropped=0,0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: bw=11.3MiB/s (11.9MB/s), 11.3MiB/s-11.3MiB/s (11.9MB/s-11.9MB/s), io=681MiB (714MB), run=60008-60008msec

OS (e.g. from /etc/os-release):

VERSION="7 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="7"
PRETTY_NAME="CentOS Linux 7 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:7"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-7"
CENTOS_MANTISBT_PROJECT_VERSION="7"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="7"

Kernel (e.g. uname -a):

uname -a
Linux ceph-1 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Cloud provider or hardware configuration:

vSphere, 4 vCPU / 8 GB RAM per node

The bare-host cluster was built with ceph-ansible: ceph version 14.2.22 (ca74598065096e6fcbd8433c8779a2be0c889351) nautilus (stable).
Three nodes; each node has three disks, two of which store data and one of which stores the OSD DB.

Rook cluster hardware configuration

4 vCPU / 8 GB RAM; one data disk, /dev/sdb, is used as the Ceph OSD data disk

# lsblk
NAME            MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda               8:0    0   150G  0 disk
├─sda1            8:1    0     1G  0 part /boot
└─sda2            8:2    0   149G  0 part
  ├─centos-root 253:0    0 141.1G  0 lvm  /
  └─centos-swap 253:1    0   7.9G  0 lvm
sdb               8:16   0    16G  0 disk
└─sdb1            8:17   0    16G  0 part
sr0              11:0    1  1024M  0 rom

Rook version (use rook version inside of a Rook Pod): rook-1.10.12
Storage backend version (e.g. for ceph do ceph -v): ceph version 17.2.5 (98318ae89f1a893a6ded3a640405cdbb33e08757) quincy (stable)
Kubernetes version (use kubectl version): 1.23.4
Kubernetes cluster type (e.g. Tectonic, GKE, OpenShift): vsphere
Storage backend status (e.g. for Ceph use ceph health in the Rook Ceph toolbox):

[rook@rook-ceph-tools-operator-image-84c58d79d8-pdtsg /]$ ceph -s
  cluster:
    id:     49639e22-ac64-449a-acb2-819ec3b9fb79
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c (age 42h)
    mgr: a(active, since 42h), standbys: b
    mds: 1/1 daemons up, 1 hot standby
    osd: 3 osds: 3 up (since 42h), 3 in (since 42h)

  data:
    volumes: 1/1 healthy
    pools:   3 pools, 49 pgs
    objects: 283 objects, 1.0 GiB
    usage:   4.2 GiB used, 44 GiB / 48 GiB avail
    pgs:     49 active+clean

  io:
    client:   853 B/s rd, 1 op/s rd, 0 op/s wr

