Comments (14)
Setting up in us-central1-a
on n1-standard-4
with image backports-debian-7-wheezy-v20150320
and standard persistent disk.
from gcsfuse.
Setup notes
Create an instance with a built-in key that can access GCS:
gcloud compute instances create test --scopes storage-full --image debian-7-backports --machine-type n1-standard-4 --project github-jacobsa-gcloud --zone us-central1-a
Copy in key file:
gcloud compute --project github-jacobsa-gcloud copy-files --zone us-central1-a '/Users/jacobsa/Downloads/GCloud GitHub project-bc05fbd617c4.json' test:
ssh in and set up the shell:
gcloud compute --project github-jacobsa-gcloud ssh --zone us-central1-a test
set -o vi
Install and set up fuse:
sudo apt-get update && sudo apt-get install fuse
sudo adduser $USER fuse
sudo chmod g+rw /dev/fuse
sudo chgrp fuse /dev/fuse
Log out and back in so the group change takes effect:
exit
[...]
gcloud compute --project github-jacobsa-gcloud ssh --zone us-central1-a test
set -o vi
Install Go:
wget https://storage.googleapis.com/golang/go1.4.2.linux-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.4.2.linux-amd64.tar.gz
export GOPATH=$HOME/go
export PATH=$PATH:/usr/local/go/bin
export PATH=$PATH:$GOPATH/bin
Install Git:
sudo apt-get update && sudo apt-get -y install git-core
Install gcsfuse:
go get -v github.com/googlecloudplatform/gcsfuse
Test notes
Generate some data:
dd if=/dev/urandom of=foo bs=1048576 count=1 # 1 MiB
dd if=/dev/urandom of=foo bs=1048576 count=100 # 100 MiB
dd if=/dev/urandom of=foo bs=1048576 count=1024 # 1 GiB
gcsfuse
Mount gcsfuse:
mkdir -p mp
gcsfuse --bucket jacobsa-gcs-integration-test --key_file GCloud\ GitHub\ project-bc05fbd617c4.json mp &
Copy in the file:
rm -f mp/foo
time cp foo mp/
gsutil
gsutil rm -f gs://jacobsa-gcs-integration-test/foo
time gsutil cp foo gs://jacobsa-gcs-integration-test/
from gcsfuse.
Results from a few runs
100 MiB
gcsfuse
real 0m8.240s
user 0m0.020s
sys 0m0.952s
real 0m6.245s
user 0m0.016s
sys 0m0.732s
real 0m6.353s
user 0m0.000s
sys 0m0.748s
gsutil
real 0m2.620s
user 0m1.548s
sys 0m0.672s
real 0m3.036s
user 0m1.552s
sys 0m0.700s
real 0m2.965s
user 0m1.384s
sys 0m0.684s
1 GiB
gcsfuse
real 1m5.938s
user 0m0.048s
sys 0m9.492s
real 1m1.468s
user 0m0.040s
sys 0m8.384s
gsutil
real 0m16.926s
user 0m6.420s
sys 0m2.832s
real 0m20.855s
user 0m6.408s
sys 0m2.928s
from gcsfuse.
So pretty slow, but not as terribly slow as I had feared. Some things to look at:
- Is it possible
cp
is doing something braindead here? Try usingdd
instead, where we can control the block size. - What proportion of the time spent is writing to the local file vs. uploading vs. everything else? Add logging to find out.
- Confirm that the kernel is sending 4 KiB writes, which is probably terrible, then fix.
from gcsfuse.
Trying with dd:
# <Generate 1 GiB foo>
rm -f mp/foo
time dd if=foo of=mp/foo bs=1048576
which gives:
real 0m58.100s
user 0m0.012s
sys 0m8.796s
which is not much different. Only slightly better with a block size of 10x that, on a single run:
real 0m53.033s
user 0m0.008s
sys 0m7.824s
from gcsfuse.
Confirmed: writes are 4 KiB.
from gcsfuse.
Hacked together a binary that uses bazilfuse.InitBigWrites
in the response to fuse's init request, which appears to result in 32 KiB writes from the kernel.
New results for 1 GiB:
real 0m43.312s
user 0m0.052s
sys 0m2.656s
real 0m37.229s
user 0m0.064s
sys 0m1.980s
Better.
Surprisingly, InitWritebackCache
plus big writes appears to make things much worse:
real 1m42.914s
user 0m0.012s
sys 0m0.220s
from gcsfuse.
Hacked in some logging that shows that with 32 KiB most of the time is spent in ObjectProxy.Sync, which is doing the network write:
2015/03/25 05:05:47.256616 object_proxy.go:244: WriteAt called
2015/03/25 05:05:47.256658 object_proxy.go:412: ensureLocalFile called
2015/03/25 05:05:47.293666 object_proxy.go:423: ensureLocalFile returning
2015/03/25 05:05:47.293758 object_proxy.go:259: WriteAt returning
2015/03/25 05:05:52.133747 object_proxy.go:296: Sync called
2015/03/25 05:06:29.445792 object_proxy.go:342: Sync returning
real 0m42.493s
user 0m0.044s
sys 0m1.908s
Now that's actually a bit odd to me. What is our io.Copy
doing that gsutil is not? Serving from a bytes.Reader
rather than the file doesn't seem to make a difference.
from gcsfuse.
FWIW, here is the CPU profile while calling CreateObject:
flat flat% sum% cum cum%
8900ms 48.79% 48.79% 8900ms 48.79% crypto/cipher.(*gcm).mul
2010ms 11.02% 59.81% 2010ms 11.02% crypto/aes.encryptBlockAsm
1960ms 10.75% 70.56% 1960ms 10.75% runtime.memmove
760ms 4.17% 74.73% 760ms 4.17% crypto/cipher.getUint64
560ms 3.07% 77.80% 10200ms 55.92% crypto/cipher.(*gcm).updateBlocks
540ms 2.96% 80.76% 540ms 2.96% runtime.futex
520ms 2.85% 83.61% 560ms 3.07% syscall.Syscall
480ms 2.63% 86.24% 2640ms 14.47% crypto/aes.(*aesCipher).Encrypt
460ms 2.52% 88.76% 3760ms 20.61% crypto/cipher.(*gcm).counterCrypt
310ms 1.70% 90.46% 310ms 1.70% crypto/cipher.fastXORWords
from gcsfuse.
Although gcsfuse CPU usage hovers at about 50%, so that's probably not the bottleneck.
from gcsfuse.
To do for alpha: Document these numbers, set performance expectations.
from gcsfuse.
Switched to resumable object uploads in jacobsa/gcloud@3b1af22, which seems to make a bit of a difference to upload speed. As far as I can tell, this effect is responsible for most of the remaining delta with gsutil. (Ignoring the unavoidable double-copy overhead when staging the temporary file in gcsfuse.)
from gcsfuse.
To do:
- Wait for #27 (increasing kernel write size).
- Try the test above again.
- Set performance expectations in the docs.
- Call it good enough or not. If good enough, close this bug. Either way remove from alpha milestone.
from gcsfuse.
Issue #27 is fixed.
Tried again with 1 GiB:
real 0m27.226s
user 0m0.064s
sys 0m2.152s
real 0m24.109s
user 0m0.040s
sys 0m1.736s
real 0m25.231s
user 0m0.048s
sys 0m1.616s
Much much better, and proabably good enough for now. Let's document that performance is comparable with gsutil, and wait for specific complaints.
from gcsfuse.
Related Issues (20)
- Creating files does not result in them being created in bucket HOT 14
- Advertised inode permissions don't do anything HOT 9
- Error Encountered When Mounting GCS Bucket using gcsfuse on Cloud Run following Official Tutorial HOT 29
- GCSFuse can no longer be installed from the official repo (502 Bad Gateway) HOT 4
- Affected by CVE-2023-39323 and CVE-2023-39320? HOT 4
- GCSFuse sidecar is not compatible with restricted Kubernetes pod security standard HOT 2
- MaxRetryDuration flag is not implemented
- Please upgrade Go / Go Libraries - multiple vulnerabilities detected HOT 2
- GCSFUSE hanging while trying to write large files (~6GB) HOT 19
- offline installation on rockly linux running as a gcp vm HOT 5
- GCSFuse process crashing without logging anything HOT 14
- Application default credentials aren't reloaded on expiration HOT 2
- No such host while creating mounts HOT 5
- gcsfuse not working with custom container in Vertex AI on NVIDIA_L4 HOT 9
- `--implicit-dirs` not able to be used in conjunction with `mount -t ...` HOT 5
- Gcsfuse error 429 to many requests. HOT 8
- Concurrent Map iteration and map write HOT 7
- Feature Request: --pre-create-implicit-dirs HOT 5
- How to make gcsfuse mount persistent in GCP Linux RHEL VM HOT 4
- GCSFuse performance on Vertex AI custom training job HOT 7
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from gcsfuse.