Git Product home page Git Product logo

Comments (14)

jacobsa avatar jacobsa commented on June 18, 2024

Setting up in us-central1-a on n1-standard-4 with image backports-debian-7-wheezy-v20150320 and standard persistent disk.

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

Setup notes

Create an instance with a built-in key that can access GCS:

gcloud compute instances create test --scopes storage-full --image debian-7-backports --machine-type n1-standard-4 --project github-jacobsa-gcloud --zone us-central1-a

Copy in key file:

gcloud compute --project github-jacobsa-gcloud copy-files --zone us-central1-a '/Users/jacobsa/Downloads/GCloud GitHub project-bc05fbd617c4.json' test:

ssh in and set up the shell:

gcloud compute --project github-jacobsa-gcloud ssh --zone us-central1-a test
set -o vi

Install and set up fuse:

sudo apt-get update && sudo apt-get install fuse
sudo adduser $USER fuse
sudo chmod g+rw /dev/fuse
sudo chgrp fuse /dev/fuse

Log out and back in so the group change takes effect:

exit
[...]
gcloud compute --project github-jacobsa-gcloud ssh --zone us-central1-a test
set -o vi

Install Go:

wget https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz

export GOPATH=$HOME/go
export PATH=$PATH:/usr/local/go/bin
export PATH=$PATH:$GOPATH/bin

Install Git:

sudo apt-get update && sudo apt-get -y install git-core

Install gcsfuse:

go get -v github.com/googlecloudplatform/gcsfuse

Test notes

Generate some data:

dd if=/dev/urandom of=foo bs=1048576 count=1     # 1 MiB
dd if=/dev/urandom of=foo bs=1048576 count=100   # 100 MiB
dd if=/dev/urandom of=foo bs=1048576 count=1024  # 1 GiB

gcsfuse

Mount gcsfuse:

mkdir -p mp
gcsfuse --bucket jacobsa-gcs-integration-test --key_file GCloud\ GitHub\ project-bc05fbd617c4.json mp &

Copy in the file:

rm -f mp/foo
time cp foo mp/

gsutil

gsutil rm -f gs://jacobsa-gcs-integration-test/foo
time gsutil cp foo gs://jacobsa-gcs-integration-test/

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

Results from a few runs

100 MiB

gcsfuse

real    0m8.240s
user    0m0.020s
sys     0m0.952s

real    0m6.245s
user    0m0.016s
sys     0m0.732s

real    0m6.353s
user    0m0.000s
sys     0m0.748s

gsutil

real    0m2.620s
user    0m1.548s
sys     0m0.672s

real    0m3.036s
user    0m1.552s
sys     0m0.700s

real    0m2.965s
user    0m1.384s
sys     0m0.684s

1 GiB

gcsfuse

real    1m5.938s
user    0m0.048s
sys     0m9.492s

real    1m1.468s
user    0m0.040s
sys     0m8.384s

gsutil

real    0m16.926s
user    0m6.420s
sys     0m2.832s

real    0m20.855s
user    0m6.408s
sys     0m2.928s

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

So pretty slow, but not as terribly slow as I had feared. Some things to look at:

  • Is it possible cp is doing something braindead here? Try using dd instead, where we can control the block size.
  • What proportion of the time spent is writing to the local file vs. uploading vs. everything else? Add logging to find out.
  • Confirm that the kernel is sending 4 KiB writes, which is probably terrible, then fix.

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

Trying with dd:

# <Generate 1 GiB foo>
rm -f mp/foo
time dd if=foo of=mp/foo bs=1048576

which gives:

real    0m58.100s
user    0m0.012s
sys     0m8.796s

which is not much different. Only slightly better with a block size of 10x that, on a single run:

real    0m53.033s
user    0m0.008s
sys     0m7.824s

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

Confirmed: writes are 4 KiB.

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

Hacked together a binary that uses bazilfuse.InitBigWrites in the response to fuse's init request, which appears to result in 32 KiB writes from the kernel.

New results for 1 GiB:

real    0m43.312s
user    0m0.052s
sys     0m2.656s

real    0m37.229s
user    0m0.064s
sys     0m1.980s

Better.

Surprisingly, InitWritebackCache plus big writes appears to make things much worse:

real    1m42.914s
user    0m0.012s
sys     0m0.220s

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

Hacked in some logging that shows that with 32 KiB most of the time is spent in ObjectProxy.Sync, which is doing the network write:

2015/03/25 05:05:47.256616 object_proxy.go:244: WriteAt called
2015/03/25 05:05:47.256658 object_proxy.go:412: ensureLocalFile called
2015/03/25 05:05:47.293666 object_proxy.go:423: ensureLocalFile returning
2015/03/25 05:05:47.293758 object_proxy.go:259: WriteAt returning
2015/03/25 05:05:52.133747 object_proxy.go:296: Sync called
2015/03/25 05:06:29.445792 object_proxy.go:342: Sync returning

real    0m42.493s
user    0m0.044s
sys     0m1.908s

Now that's actually a bit odd to me. What is our io.Copy doing that gsutil is not? Serving from a bytes.Reader rather than the file doesn't seem to make a difference.

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

FWIW, here is the CPU profile while calling CreateObject:

  flat  flat%   sum%        cum   cum%
8900ms 48.79% 48.79%     8900ms 48.79%  crypto/cipher.(*gcm).mul
2010ms 11.02% 59.81%     2010ms 11.02%  crypto/aes.encryptBlockAsm
1960ms 10.75% 70.56%     1960ms 10.75%  runtime.memmove
 760ms  4.17% 74.73%      760ms  4.17%  crypto/cipher.getUint64
 560ms  3.07% 77.80%    10200ms 55.92%  crypto/cipher.(*gcm).updateBlocks
 540ms  2.96% 80.76%      540ms  2.96%  runtime.futex
 520ms  2.85% 83.61%      560ms  3.07%  syscall.Syscall
 480ms  2.63% 86.24%     2640ms 14.47%  crypto/aes.(*aesCipher).Encrypt
 460ms  2.52% 88.76%     3760ms 20.61%  crypto/cipher.(*gcm).counterCrypt
 310ms  1.70% 90.46%      310ms  1.70%  crypto/cipher.fastXORWords

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

Although gcsfuse CPU usage hovers at about 50%, so that's probably not the bottleneck.

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

To do for alpha: Document these numbers, set performance expectations.

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

Switched to resumable object uploads in jacobsa/gcloud@3b1af22, which seems to make a bit of a difference to upload speed. As far as I can tell, this effect is responsible for most of the remaining delta with gsutil. (Ignoring the unavoidable double-copy overhead when staging the temporary file in gcsfuse.)

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

To do:

  1. Wait for #27 (increasing kernel write size).
  2. Try the test above again.
  3. Set performance expectations in the docs.
  4. Call it good enough or not. If good enough, close this bug. Either way remove from alpha milestone.

from gcsfuse.

jacobsa avatar jacobsa commented on June 18, 2024

Issue #27 is fixed.

Tried again with 1 GiB:

real    0m27.226s
user    0m0.064s
sys     0m2.152s

real    0m24.109s
user    0m0.040s
sys     0m1.736s

real    0m25.231s
user    0m0.048s
sys     0m1.616s

Much much better, and proabably good enough for now. Let's document that performance is comparable with gsutil, and wait for specific complaints.

from gcsfuse.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.