
juicefs's Introduction

JuiceFS Logo


JuiceFS is a high-performance POSIX file system released under Apache License 2.0, designed particularly for cloud-native environments. Data stored via JuiceFS is persisted in Object Storage (e.g. Amazon S3), while the corresponding metadata can be persisted in various compatible database engines such as Redis, MySQL, and TiKV, depending on the scenario and requirements.

With JuiceFS, massive cloud storage can be directly connected to big data, machine learning, artificial intelligence, and various application platforms in production environments. Without modifying any code, massive cloud storage can be used as efficiently as local storage.

📖 Document: Quick Start Guide

Highlighted Features

  1. Fully POSIX-compatible: Use it as a local file system, integrating seamlessly with existing applications without breaking business workflows.
  2. Fully Hadoop-compatible: JuiceFS' Hadoop Java SDK is compatible with Hadoop 2.x and Hadoop 3.x, as well as a variety of components in the Hadoop ecosystem.
  3. S3-compatible: JuiceFS' S3 Gateway provides an S3-compatible interface.
  4. Cloud Native: A Kubernetes CSI Driver is provided for easily using JuiceFS in Kubernetes.
  5. Shareable: JuiceFS is a shared file storage that can be read and written by thousands of clients.
  6. Strong Consistency: Confirmed modifications are immediately visible on all servers that mount the same file system.
  7. Outstanding Performance: Latency can be as low as a few milliseconds, and throughput can be scaled nearly without limit (depending on the Object Storage). Test results
  8. Data Encryption: Supports data encryption in transit and at rest (please refer to the guide for more information).
  9. Global File Locks: JuiceFS supports both BSD locks (flock) and POSIX record locks (fcntl).
  10. Data Compression: JuiceFS supports LZ4 or Zstandard to compress all your data (see the example below).
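
Compression is chosen when the file system is created. A minimal sketch, assuming a local Redis instance and an arbitrary volume name (zstd is assumed to be the flag value for Zstandard):

# Hypothetical volume name; --compress selects the algorithm used for all data blocks
./juicefs format --compress lz4 redis://127.0.0.1:6379/1 myjfs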

Architecture | Getting Started | Advanced Topics | POSIX Compatibility | Performance Benchmark | Supported Object Storage | Who is using | Roadmap | Reporting Issues | Contributing | Community | Usage Tracking | License | Credits | FAQ


Architecture

JuiceFS consists of three parts:

  1. JuiceFS Client: Coordinates the Object Storage and the metadata engine, and implements file system interfaces such as POSIX, Hadoop, Kubernetes, and the S3 gateway.
  2. Data Storage: Stores the file data, with support for a variety of storage media, e.g., local disk, public or private cloud Object Storage, and HDFS.
  3. Metadata Engine: Stores the corresponding metadata, such as file name, file size, permission group, creation and modification time, and directory structure, with support for different engines, e.g., Redis, MySQL, SQLite and TiKV.

JuiceFS Architecture

JuiceFS can store file system metadata on different metadata engines. Redis, for example, is a fast, open-source, in-memory key-value store that is particularly suitable for metadata; meanwhile, all file data is stored in Object Storage through the JuiceFS client. Learn more

data-structure-diagram

Each file stored in JuiceFS is split into fixed-size "Chunks", with a default upper limit of 64 MiB. Each Chunk is composed of one or more "Slices", whose lengths vary depending on how the file is written. Each Slice is in turn composed of fixed-size "Blocks", 4 MiB by default. The Blocks are what is ultimately stored in Object Storage; the metadata of the file and its Chunks, Slices, and Blocks is stored in the metadata engine by JuiceFS. Learn more
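
As a back-of-the-envelope illustration (not JuiceFS code), the chunk and block that a given byte offset falls into follow directly from the default sizes:

# Illustrative arithmetic only, using the default 64 MiB chunk and 4 MiB block sizes
offset=150000000                      # a byte offset within a file
chunk_size=$((64 * 1024 * 1024))
block_size=$((4 * 1024 * 1024))
echo "chunk index: $((offset / chunk_size))"
echo "block index within the chunk: $(((offset % chunk_size) / block_size))"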

How JuiceFS stores your files

When using JuiceFS, files are eventually split into Chunks, Slices and Blocks and stored in Object Storage. Therefore, the source files stored in JuiceFS cannot be found as-is in the file browser of the Object Storage platform; instead, the bucket contains only a chunks directory and a bunch of numbered directories and files. Don't panic! This is just the secret of JuiceFS' high-performance operation!
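
The warning logs quoted in the issues further down this page show what those object keys look like (e.g. chunks/0/12/12735_2_16777216 under the volume prefix). A hedged sketch of listing them, with hypothetical bucket and volume names:

# Hypothetical bucket/volume; the keys are numbered blocks, not your original files
aws s3 ls s3://mybucket/myvol/chunks/0/12/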

Getting Started

Before you begin, make sure you have:

  1. One supported metadata engine, see How to Set Up Metadata Engine
  2. One supported Object Storage for storing data blocks, see Supported Object Storage
  3. JuiceFS Client downloaded and installed

Please refer to the Quick Start Guide to start using JuiceFS right away!
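
As a minimal sketch of the steps involved, assuming a local Redis instance (the volume name and mount point are arbitrary):

# Create a file system backed by Redis metadata, then mount it in the background
./juicefs format redis://127.0.0.1:6379/1 myjfs
sudo ./juicefs mount -d redis://127.0.0.1:6379/1 /mnt/jfs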

Command Reference

Check out all the command line options in command reference.

Containers

JuiceFS can be used as a persistent volume for Docker and Podman, please check here for details.

Kubernetes

It is also very easy to use JuiceFS on Kubernetes. Please find more information here.

Hadoop Java SDK

If you want to use JuiceFS in Hadoop, check the Hadoop Java SDK.

Advanced Topics

Please refer to JuiceFS Document Center for more information.

POSIX Compatibility

JuiceFS has passed all of the compatibility tests (8813 in total) in the latest pjdfstest.

All tests successful.

Test Summary Report
-------------------
/root/soft/pjdfstest/tests/chown/00.t          (Wstat: 0 Tests: 1323 Failed: 0)
  TODO passed:   693, 697, 708-709, 714-715, 729, 733
Files=235, Tests=8813, 233 wallclock secs ( 2.77 usr  0.38 sys +  2.57 cusr  3.93 csys =  9.65 CPU)
Result: PASS

Aside from the POSIX features covered by pjdfstest, JuiceFS also provides:

  • Close-to-open consistency. Once a file is written and closed, the written data is guaranteed to be visible in subsequent opens and reads from any client. Within the same mount point, all written data can be read immediately.
  • Rename and all other metadata operations are atomic, guaranteed by the transactions of the supported metadata engines.
  • Opened files remain accessible after being unlinked, from the same mount point.
  • Mmap (tested with FSx).
  • Fallocate with punch hole support.
  • Extended attributes (xattr).
  • BSD locks (flock).
  • POSIX record locks (fcntl).

Performance Benchmark

Basic benchmark

JuiceFS provides a subcommand that can run a few basic benchmarks to help you understand how it performs in your environment:

JuiceFS Bench
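
A hedged invocation sketch (the mount point is an example):

# Run the built-in benchmark against a mounted JuiceFS volume
./juicefs bench /mnt/jfs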

Throughput

A sequential read/write benchmark has also been performed on JuiceFS, EFS and S3FS with fio.

Sequential Read Write Benchmark

The figure above shows that JuiceFS can provide 10X the throughput of the other two (see more details).
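
A sequential read run like this might be reproduced with fio as follows (the mount point is an example; the flags mirror the fio commands quoted in the issues below):

# Sequential read of a 4 GiB file in 4 MiB requests against a JuiceFS mount
fio --name=sequential-read --directory=/mnt/jfs --rw=read --refill_buffers --bs=4M --size=4G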

Metadata IOPS

A simple metadata benchmark has been performed on JuiceFS, EFS and S3FS with mdtest.

Metadata Benchmark

The result shows that JuiceFS can provide significantly more metadata IOPS than the other two (see more details).
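
mdtest is an MPI-based metadata benchmark; a hedged sketch of a small run (process count, directory and item count are arbitrary):

# Each of 4 processes creates, stats and removes 1000 items under the mount
mpirun -np 4 mdtest -d /mnt/jfs/mdtest -n 1000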

Analyze performance

See Real-Time Performance Monitoring if you encounter performance issues.

Supported Object Storage

  • Amazon S3 (and other S3 compatible Object Storage services)
  • Google Cloud Storage
  • Azure Blob Storage
  • Alibaba Cloud Object Storage Service (OSS)
  • Tencent Cloud Object Storage (COS)
  • Qiniu Cloud Object Storage (Kodo)
  • QingStor Object Storage
  • Ceph RGW
  • MinIO
  • Local disk
  • Redis
  • ...

JuiceFS supports numerous Object Storage services. Learn more.

Who is using

JuiceFS is production ready and runs on thousands of machines in production. A list of users has been assembled and documented here. JuiceFS also has several collaborative projects that integrate with other open source projects, which we have documented here. If you are also using JuiceFS, please feel free to let us know, and you are welcome to share your specific experience with everyone.

The storage format is stable, and will be supported by all future releases.

Roadmap

  • User and group quotas
  • Snapshots
  • Write once read many (WORM)

Reporting Issues

We use GitHub Issues to track community reported issues. You can also contact the community for any questions.

Contributing

Thank you for your contribution! Please refer to the JuiceFS Contributing Guide for more information.

Community

Welcome to join the Discussions and the Slack channel to connect with JuiceFS team members and other users.

Usage Tracking

JuiceFS collects anonymous usage data by default to help us better understand how the community is using it. Only core metrics (e.g. version number) are reported; user data and any other sensitive data are never included. The related code can be viewed here.

You can also disable reporting easily with the command line option --no-usage-report:

juicefs mount --no-usage-report

License

JuiceFS is open-sourced under Apache License 2.0, see LICENSE.

Credits

The design of JuiceFS was inspired by Google File System, HDFS and MooseFS. Thanks for their great work!

FAQ

Why doesn't JuiceFS support XXX Object Storage?

JuiceFS supports many Object Storage services. Please check out this list first. If the Object Storage you want to use is S3-compatible, you can treat it as S3. Otherwise, feel free to open an issue.

Can I use Redis Cluster as metadata engine?

Yes. Since v1.0.0 Beta3, JuiceFS supports using Redis Cluster as the metadata engine. Note, however, that Redis Cluster requires all keys in a transaction to be in the same hash slot, so a JuiceFS file system can only use one hash slot.
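
Redis pins all keys that share a {hash tag} to the same slot, which is how one file system's keys can stay in a single slot; the mapping can be inspected with redis-cli (the key names here are illustrative, not the real JuiceFS encoding):

# Keys with the same {hash tag} always map to the same cluster hash slot
redis-cli CLUSTER KEYSLOT '{myjfs}i1'
redis-cli CLUSTER KEYSLOT '{myjfs}i2'   # same slot as above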

See "Redis Best Practices" for more information.

What's the difference between JuiceFS and XXX?

See "Comparison with Others" for more information.

For more FAQs, please see the full list.

Stargazers over time

Star History Chart

juicefs's People

Contributors

201341, aixjing, caitinchen, chnliyong, davies, dependabot[bot], eryugey, hexilee, jiefenghuang, joyliuc, kyungwan-nam, polyrabbit, rayw000, sandyxsd, sanwan, showjason, solracsf, suave, suzaku, tangyoupeng, timfeirg, tonicmuroq, xiaogaozi, xyb, yuhr123, yujunz, yunhuichen, zhijian-pro, zhoucheng361, zwwhdls


juicefs's Issues

Handle Redis NOSCRIPT error

What would you like to be added:

Scripts loaded into the Redis script cache may be removed by the SCRIPT FLUSH command. In that case, a NOSCRIPT error is returned.
This error should be handled and the script reloaded.
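
The failure mode is easy to reproduce with redis-cli; the fix amounts to re-issuing SCRIPT LOAD (or falling back to EVAL) and retrying:

sha=$(redis-cli SCRIPT LOAD 'return 1')
redis-cli EVALSHA "$sha" 0     # (integer) 1
redis-cli SCRIPT FLUSH
redis-cli EVALSHA "$sha" 0     # NOSCRIPT error: the script must be reloaded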

Why is this needed:

We don't want a SCRIPT FLUSH to break our client.

Use Github Actions instead of Travis for CI/CD.

What would you like to be added:

Use Github Actions instead of Travis for CI/CD.

Why is this needed:

The ecosystem of GitHub Actions is much bigger than that of Travis; there are tons of pre-made actions ready to be reused.

Add a tool to do benchmark

What would you like to be added:

Add a tool to benchmark a mounted JuiceFS volume.

Why is this needed:

After a volume is mounted, we would like to know how it is performing.

dial tcp: too many colons in address (IPv6)

I noticed this in my logs from time to time. It doesn't seem to cause big problems, but this error looks curious to me:

2021/01/13 20:44:57.482340 juicefs[81861] <WARNING>: upload chunks/0/12/12735_2_16777216: RequestError: send request failed
caused by: Put "https://sana-store.s3.fr-par.scw.cloud/seedbox/chunks/0/12/12735_2_16777216": dial tcp: address 2001:bc8:1002::30:443: too many colons in address (try 1)
2021/01/13 20:46:37.409525 juicefs[81861] <WARNING>: upload chunks/0/12/12829_3_16777216: RequestError: send request failed
caused by: Put "https://sana-store.s3.fr-par.scw.cloud/seedbox/chunks/0/12/12829_3_16777216": dial tcp: address 2001:bc8:1002::30:443: too many colons in address (try 1)
2021/01/13 21:09:19.104197 juicefs[81861] <WARNING>: upload chunks/0/13/13873_1_16777216: RequestError: send request failed
caused by: Put "https://sana-store.s3.fr-par.scw.cloud/seedbox/chunks/0/13/13873_1_16777216": dial tcp: address 2001:bc8:1002::30:443: too many colons in address (try 1)
2021/01/13 21:22:25.961086 juicefs[81861] <WARNING>: upload chunks/0/14/14410_1_16777216: RequestError: send request failed
caused by: Put "https://sana-store.s3.fr-par.scw.cloud/seedbox/chunks/0/14/14410_1_16777216": dial tcp: address 2001:bc8:1002::30:443: too many colons in address (try 1)
2021/01/13 21:59:58.166127 juicefs[81861] <WARNING>: upload chunks/0/15/15977_3_16777216: RequestError: send request failed
caused by: Put "https://sana-store.s3.fr-par.scw.cloud/seedbox/chunks/0/15/15977_3_16777216": dial tcp: address 2001:bc8:1002::30:443: too many colons in address (try 1)

I'm using Scaleway, the endpoint has these addresses:

➜  ~ dog s3.fr-par.scw.cloud
A s3.fr-par.scw.cloud. 22h51m22s   62.210.134.176
➜  ~ dog s3.fr-par.scw.cloud AAAA
AAAA s3.fr-par.scw.cloud. 23h59m21s   2001:bc8:1002::30

Maybe it uses IPv4 most of the time and fails on IPv6?

Let me know if I can provide more info.

Environment:

  • JuiceFS version (use ./juicefs --version): juicefs version 0.9.1-24 (2021-01-13 3dc45dc)
  • Cloud provider or hardware configuration running JuiceFS: dedicated server
  • OS (e.g: cat /etc/os-release): Debian 10
  • Kernel (e.g. uname -a): Linux 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28)
  • Object storage (cloud provider and region): Scaleway fr-par (https://s3.fr-par.scw.cloud)
  • Redis info (version, cloud provider managed or self maintained): v5.0.3, self managed, installed from deb repo
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): local redis, 1 Gbps to object storage

Support directories with millions of files.

What would you like to be added:

Currently, we fetch the attributes of all files in a directory with a single batch request to Redis, which could be slow, fail, and block other requests.

We can split those into small batches, for example, 1000 per batch.
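
For illustration only (not the actual client code), the idea of bounding each request can be sketched with xargs, which caps every MGET at 1000 keys:

# Fetch attribute values in batches of 1000 keys per MGET instead of one huge request
redis-cli --scan --pattern 'i*' | xargs -n 1000 redis-cli MGET > /dev/null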

Why is this needed:

The number of files could be in the millions; we don't want people to be bitten by that.

Backlog

  • Call MGET with small batches #110
  • Use HSCAN instead of HGETALL #128

IPv6 cannot assign requested address

What happened:
I don't have an active IPv6 link and no IPv6 is configured on any interface.
I have this error log:

2021/01/23 19:12:06.391177 juicefs[15700] <WARNING>: upload chunks/0/0/1_0_4096: RequestError: send request failed
caused by: Put "https://dvc-juicefs-test.s3.fr-par.scw.cloud/test/chunks/0/0/1_0_4096": dial tcp [2001:bc8:1002::30]:443: connect: cannot assign requested address (try 1)

What you expected to happen:
Use IPv4 when no IPv6 is available.

Environment:

  • JuiceFS version (use ./juicefs --version): juicefs version dev (now HEAD)
  • Cloud provider or hardware configuration running JuiceFS: dedicated server
  • OS (e.g: cat /etc/os-release): Ubuntu 20.04
  • Kernel (e.g. uname -a): Linux 5.4.0-64-generic
  • Object storage (cloud provider and region): Scaleway fr-par
  • Redis info (version, cloud provider managed or self maintained): Simple docker redis container
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): localhost

In daemon mode, the log should be output to syslog by default

What would you like to be added:

In non-daemon mode, log output always goes to stderr; in daemon mode, log output should go to syslog by default, unless --nosyslog is specified.

Why is this needed:
I'm currently annoyed by the fact that the log output to Syslog is only available when the -quiet parameter is added.

Comprehensive User Guide

What would you like to be added:

Add a user guide including all the details of the command line arguments, especially for the object store.

Why is this needed:

People may find it difficult to connect JuiceFS with some object stores, for example Ceph; we should have a guide on how to specify the arguments for every object store we support.

Backlog

JuiceFS CSI plugin for Kubernetes

What would you like to be added:
CSI plugin for JuiceFS

Why is this needed:
As more and more applications run on Kubernetes, a CSI plugin is the de facto way to consume third-party storage like JuiceFS.

Performance: slow metadata service

What happened:
Copying a Linux kernel tree shows a few KB/sec of throughput at best; mdtest showed 5 - 10 transactions per second.
What you expected to happen:
10 MB/sec throughput, as for similar-class file systems copying a Linux kernel tree, and 1000 - 10000 transactions per second based on Redis performance.
How to reproduce it (as minimally and precisely as possible):
On a decent/non-virtual AWS EC2 instance, set up and run Redis; 22 ms of latency away, run JuiceFS mounted as a local directory and copy a recent Linux kernel tree using {rsync, cp, midnight commander, ...}.
Anything else we need to know?:
Great project/undertaking!
When the mounted directory is much 'closer' to Redis, the metadata service behaves fairly well. IO ops for larger files are OK, and for multi-GB files JuiceFS has excellent performance, although this has not been verified for multi-host mixed read/write scenarios.
Environment:

  • JuiceFS version (use ./juicefs --version): 0.9.3-211 (2021-01-28 9cdfa8a)
  • Cloud provider or hardware configuration running JuiceFS: a1.2xlarge
  • OS (e.g: cat /etc/os-release): ubuntu 20.04
  • Kernel (e.g. uname -a): 0.9.3-211 (2021-01-28 9cdfa8a)
  • Object storage (cloud provider and region): aws s3 us-east-1
  • Redis info (version, cloud provider managed or self maintained): 6.0.10
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): 500Mbit/sec bell fiber to redis, same for s3
  • Others:

Add a tool to analyze accesslog

What would you like to be added:

Add a tool that reads the access log and generates real-time metrics for the current workload, similar to top.

Why is this needed:

To understand the internal activities for current workload.

Measure code coverage by test

What would you like to be added:

Measure code coverage by test.

Why is this needed:

Code coverage is not perfect, but it's a reasonably good metric for quality.

Enable Redis client cache

What would you like to be added:

We can enable the Redis client cache via a command option, which could be useful for read-only workloads.
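
Redis 6 ships server-assisted client-side caching behind the CLIENT TRACKING command (RESP3 only); a hedged illustration with redis-cli:

# Requires Redis >= 6 and the RESP3 protocol (the -3 flag); applies to this connection
redis-cli -3 CLIENT TRACKING ON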

Why is this needed:

Faster is always good.

Hide secrets from logging

What would you like to be added:

We should remove secrets from the format string when logging it.

Why is this needed:

Secrets shouldn't be logged.

tracking deletion of chunks

What would you like to be added:

Add a list or queue to track unused chunks, so we can retry deletion if something fails.

Why is this needed:

To make sure that no object is leaked.

Report input/output error when start a rand write io test

What happened:

fio report:

big-file-multi-write: (groupid=0, jobs=1): err= 5 (file:io_u.c:1756, func=io_u error, error=Input/output error): pid=15850: Tue Jan 12 18:03:50 2021

juicefs report:

2021/01/12 18:03:25.169256 juicefs[6976] <WARNING>: compact 27806 0 with 20 slices: message 1001 is not supported
2021/01/12 18:03:25.216898 juicefs[6976] <WARNING>: compact 27806 1 with 20 slices: message 1001 is not supported
2021/01/12 18:03:25.379931 juicefs[6976] <ERROR>: error: redis: transaction failed
2021/01/12 18:03:25.380029 juicefs[6976] <WARNING>: write inode:27806 error: input/output error
2021/01/12 18:03:25.380042 juicefs[6976] <ERROR>: write inode:27806 indx:1  input/output error

What you expected to happen:

fio and juicefs have no error report.

How to reproduce it (as minimally and precisely as possible):

Start a rand write on juicefs:

 fio --name=big-file-multi-write  --rw=randwrite --refill_buffers --bs=4k --size=100M --numjobs=1 --end_fsync=1

Anything else we need to know?:

Environment:

  • JuiceFS version (use ./juicefs --version): juicefs version 0.9.1 (2021-01-10T16:31:23Z 1b9f6f4)
  • Cloud provider or hardware configuration running JuiceFS: Aliyun, 8core, 16GiB mem
  • OS (e.g: cat /etc/os-release): Ubuntu 18.04.3
  • Kernel (e.g. uname -a): 4.15.0-66-generic
  • Object storage (cloud provider and region): Aliyun OSS
  • Redis info (version, cloud provider managed or self maintained): Aliyun RDS Redis 5.x
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): Aliyun VPC
  • Others:

Fix problems found by goreport

What would you like to be added:

Fix some of the problems found by goreport.

Why is this needed:

goreport can help us improve code quality.

Cannot see the files in Finder if xattr is enabled.

What happened:
Cannot see any files in Finder on macOS unless --enable-xattr=false.

What you expected to happen:
View and access files using Finder.

How to reproduce it (as minimally and precisely as possible):

./juicefs format localhost test
sudo ./juicefs mount --enable-xattr=true localhost ~/jfs
cp test.jpg ~/jfs
echo 'hello' > ~/jfs/test.txt
mkdir ~/jfs/docs
ls ~/jfs
open ~/jfs

All files and directories are listed in the terminal, but only the docs directory is displayed in Finder.

Anything else we need to know?:
none

Environment:

  • JuiceFS version (use ./juicefs --version): 0.9.3-5 (2021-01-19 18baa89)
  • Cloud provider or hardware configuration running JuiceFS: MacBook Air (13-inch, 2017)
  • OS (e.g: cat /etc/os-release): macOS Catalina 10.15.7
  • Kernel (e.g. uname -a): Darwin Kernel Version 19.6.0
  • Object storage (cloud provider and region): none
  • Redis info (version, cloud provider managed or self maintained): Redis 6.0.5 (00000000/0) 64 bit
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): localhost
  • Others:
    • macFUSE 4.0.5

not able to use google storage due to possible parsing bug when using "juicefs format"

Hello,

I am trying to use a Google Cloud Storage bucket called juicefs.

I'm having problems providing the right command line syntax.
From juicefs format -h, and from looking at the source code as well as the documentation, it seems I need to provide the following options:

  • bucket
  • accesskey
  • secretkey
  • redisserver
  • directoryname
  • Maybe a region?

In google-language a bucket is referenced like this:
gs://juicefs

Entering --bucket gs://juicefs results in the following runtime error:

 ./juicefs format --storage gs --bucket gs://juicefs --accesskey Gxxxxxxxxxxxxxxxxxx --secretkey Oxxxxxxxxxxxxxxxxxxx redis://redis-master:6379/4 test3


2021/01/15 13:24:39.231337 juicefs[3546] <INFO>: Meta address: redis://redis-master:6379/4
panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
github.com/juicedata/juicesync/object.newGS(0x7fffedb1a878, 0xc, 0x7fffedb1a891, 0x13, 0x7fffedb1a8b1, 0x14, 0x2, 0x0, 0x2c, 0xc000226a50)
        /go/pkg/mod/github.com/juicedata/[email protected]/object/gs.go:145 +0x3e6
github.com/juicedata/juicesync/object.CreateStorage(0x7fffedb1a86c, 0x2, 0x7fffedb1a878, 0xc, 0x7fffedb1a891, 0x13, 0x7fffedb1a8b1, 0x14, 0x2c, 0xc000226a50, ...)
        /go/pkg/mod/github.com/juicedata/[email protected]/object/object_storage.go:123 +0x1ad
github.com/juicedata/juicefs/pkg/object.CreateStorage(0x7fffedb1a86c, 0x2, 0x7fffedb1a878, 0xc, 0x7fffedb1a891, 0x13, 0x7fffedb1a8b1, 0x14, 0xc00069f8f8, 0x45f8c7, ...)
        /go/src/github.com/juicedata/juicefs/pkg/object/interface.go:57 +0x26d
main.createStorage(0xc00069fb60, 0x1588442, 0x8, 0xc000226a20, 0x24)
        /go/src/github.com/juicedata/juicefs/cmd/format.go:55 +0xa5
main.format(0xc00011c100, 0x8, 0xe)
        /go/src/github.com/juicedata/juicefs/cmd/format.go:165 +0x8fe
github.com/urfave/cli/v2.(*Command).Run(0xc00059ad80, 0xc0000cbd80, 0x0, 0x0)
        /go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:163 +0x4ed
github.com/urfave/cli/v2.(*App).RunContext(0xc0004c91e0, 0x178d400, 0xc000044080, 0xc00003c0c0, 0xc, 0xc, 0x0, 0x0)
        /go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:313 +0x81f
github.com/urfave/cli/v2.(*App).Run(...)
        /go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:224
main.main()
        /go/src/github.com/juicedata/juicefs/cmd/main.go:78 +0x99c
root@juice-user-deployment-55c7f57c98-wnx96:/home#

Entering --bucket gs://juicefs. (with a dot at the end) lets the command run.

So after some trial and error I came up with a CLI command that seems to work.

Command:

./juicefs format --storage gs --bucket gs://juicefs. --accesskey Gxxxxxxxxxxxxxxxxxx --secretkey Oxxxxxxxxxxxxxxxxxxx redis://redis-master:6379/5 test3 --force

Output:

2021/01/15 13:17:27.406060 juicefs[3468] <INFO>: Meta address: redis://redis-master:6379/5
2021/01/15 13:17:27.412904 juicefs[3468] <INFO>: Data uses gs://juicefs/test3/
2021/01/15 13:17:27.543149 juicefs[3468] <INFO>: Volume is formatted as {Name:test3 UUID:31777d36-1fee-4ae1-aeec-81399dc19289 Storage:gs Bucket:gs://juicefs. AccessKey:Gxxxxxxxxxxxxxxxxxx SecretKey:removed BlockSize:4096 Compression:lz4 Partitions:0}

Is this the right way to use the cli?
After mounting the volume I see new inodes being created, but nothing gets synced to the cloud.

command:

mkdir juice
./juicefs mount redis://redis-master:6379/5 juice

output:

2021/01/15 13:21:38.852377 juicefs[3508] <INFO>: Meta address: redis://redis-master:6379/5
2021/01/15 13:21:38.859770 juicefs[3508] <INFO>: Data use gs://juicefs/test3/
2021/01/15 13:21:38.859798 juicefs[3508] <INFO>: mount volume test3 at juice
2021/01/15 13:21:38.859817 juicefs[3508] <INFO>: Cache: /var/jfsCache capacity: 1024 MB

Question: how does JuiceFS interact with S3?

Is it possible to get some more info on how JuiceFS interacts with S3?

Primarily, I'm interested in when a read and write occurs to S3 vs Redis?

Some background… whilst S3 is an amazing object store, its per-request costs ($5 per million writes and $0.40 per million reads) can be extremely expensive if you're dealing with lots of tiny objects.

With regards to the chunking/slicing that JuiceFS performs, does this mean writing a lot of small files at once results in only a few S3 put operations, and reading them back in write order would result in only a few reads?

Thanks in advance and I'm excited to keep an eye on this project. 👍

Check settings on Redis

What would you like to be added:

After connecting to Redis, check the following settings (a manual redis-cli sketch follows the list):

  1. AOF is ON
  2. RDB is ON
  3. not in cluster mode (cluster_enabled is 0)
  4. maxmemory_policy is set to noeviction
  5. show a warning if it's not replicated
  6. version should be >= 2.2
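
A sketch of checking these by hand with redis-cli (the client would run the equivalent checks after connecting):

redis-cli CONFIG GET appendonly                       # 1. expect "yes"
redis-cli CONFIG GET save                             # 2. expect a non-empty schedule
redis-cli INFO cluster | grep cluster_enabled         # 3. expect 0
redis-cli CONFIG GET maxmemory-policy                 # 4. expect "noeviction"
redis-cli INFO replication | grep connected_slaves    # 5. warn if 0
redis-cli INFO server | grep redis_version            # 6. expect >= 2.2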

Why is this needed:

Redis is responsible for the persistence of metadata, so it should NOT lose any data; otherwise the data in JuiceFS will be lost.

Refactor this repo to present a well documented Go library

What would you like to be added:

Refactor this repo to present a well documented Go library. This library should provide interfaces similar to os.Open so that we can use JuiceFS without actually mounting it.

Why is this needed:

  1. We can't easily mount a filesystem in some environments (e.g. testing / serverless)
  2. The Go implementation is already here in this repo

JuiceFS can't work on redis-enterprise

root@ubuntu:/# ./juicefs --debug format --storage=s3 --bucket=https://juicefs-peter-test.s3.us-east-1.amazonaws.com     --access-key={} --secret-key={}   redis-enterprise.default.svc.cluster.local:8001 data
2021/01/21 08:21:50.811888 juicefs[3018] <INFO>: Meta address: redis://redis-enterprise.default.svc.cluster.local:8001
2021/01/21 08:21:50.816244 juicefs[3018] <WARNING>: parse info: ERR command not found
2021/01/21 08:21:50.816410 juicefs[3018] <FATAL>: Meta is not available: create session: ERR command not found
root@ubuntu:/# redis-cli
Could not connect to Redis at 127.0.0.1:6379: Connection refused
not connected>
root@ubuntu:/# redis-cli -h redis-enterprise.default.svc.cluster.local -p 8001
redis-enterprise.default.svc.cluster.local:8001> ping
PONG
redis-enterprise.default.svc.cluster.local:8001>

Test it with xfstests

What would you like to be added:

Run integrity test using xfstests

Why is this needed:

We'd like to know more about the compatibility.

Crashed after mount

What happened:
Ran the mount command ./juicefs --trace mount localhost /root/jfs and got:

fatal error: unexpected signal during runtime execution
[signal SIGBUS: bus error code=0x2 addr=0x1bbdc44 pc=0x468d19]

runtime stack:
runtime.throw(0x15f4d1b, 0x2a)
        /usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:704 +0x4ac
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000000300, 0x0, 0x0, 0x7fffffff, 0x1617e68, 0x7f8b05ffa9a8, 0x0, ...)
        /usr/local/go/src/runtime/traceback.go:189 +0x2b9
runtime.copystack(0xc000000300, 0x40000)
        /usr/local/go/src/runtime/stack.go:910 +0x287
runtime.shrinkstack(0xc000000300)
        /usr/local/go/src/runtime/stack.go:1178 +0x13d
runtime.scanstack(0xc000000300, 0xc00005ae98)
        /usr/local/go/src/runtime/mgcmark.go:815 +0x56e
runtime.markroot.func1()
        /usr/local/go/src/runtime/mgcmark.go:245 +0xc6
runtime.markroot(0xc00005ae98, 0x14)
        /usr/local/go/src/runtime/mgcmark.go:218 +0x310
runtime.gcDrain(0xc00005ae98, 0x7)
        /usr/local/go/src/runtime/mgcmark.go:1109 +0x118
runtime.gcBgMarkWorker.func2()
        /usr/local/go/src/runtime/mgc.go:1981 +0x177
runtime.systemstack(0xc000102900)
        /usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1116

goroutine 7 [GC worker (idle)]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc0000d0f60 sp=0xc0000d0f58 pc=0x479ba0
runtime.gcBgMarkWorker(0xc000059800)
        /usr/local/go/src/runtime/mgc.go:1945 +0x1be fp=0xc0000d0fd8 sp=0xc0000d0f60 pc=0x428d1e
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0000d0fe0 sp=0xc0000d0fd8 pc=0x47b981
created by runtime.gcBgMarkStartWorkers
        /usr/local/go/src/runtime/mgc.go:1839 +0x77

goroutine 1 [GC assist marking (scan), locked to thread]:
bytes.makeSlice(0x200, 0x0, 0x0, 0x0)
        /usr/local/go/src/bytes/buffer.go:229 +0x73
bytes.(*Buffer).grow(0xc0006479b0, 0x200, 0x0)
        /usr/local/go/src/bytes/buffer.go:142 +0x156
bytes.(*Buffer).Grow(...)
        /usr/local/go/src/bytes/buffer.go:161
io/ioutil.readAll(0x17a80e0, 0xc000491600, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/io/ioutil/ioutil.go:34 +0xa5
io/ioutil.ReadAll(...)
        /usr/local/go/src/io/ioutil/ioutil.go:45
google.golang.org/protobuf/internal/impl.legacyLoadFileDesc(0x218d6c0, 0x181, 0x181, 0x1, 0x0)
        /root/go/pkg/mod/google.golang.org/[email protected]/internal/impl/legacy_file.go:54 +0x178
google.golang.org/protobuf/internal/impl.legacyLoadMessageDesc(0x17e4260, 0x142d4a0, 0x15f55a5, 0x2b, 0x0, 0x0)
        /root/go/pkg/mod/google.golang.org/[email protected]/internal/impl/legacy_message.go:131 +0x357
google.golang.org/protobuf/internal/impl.legacyLoadMessageInfo(0x17e4260, 0x142d4a0, 0x15f55a5, 0x2b, 0x0)
        /root/go/pkg/mod/google.golang.org/[email protected]/internal/impl/legacy_message.go:48 +0xbd
google.golang.org/protobuf/internal/impl.Export.LegacyMessageTypeOf(0x17c0d60, 0x0, 0x15f55a5, 0x2b, 0x0, 0x0)
        /root/go/pkg/mod/google.golang.org/[email protected]/internal/impl/legacy_export.go:33 +0xa5
github.com/golang/protobuf/proto.RegisterType(0x17c0d60, 0x0, 0x15f55a5, 0x2b)
        /root/go/pkg/mod/github.com/golang/[email protected]/proto/registry.go:186 +0x4d
github.com/colinmarc/hdfs/v2/internal/protocol/hadoop_common.init.22()
        /root/go/pkg/mod/github.com/colinmarc/hdfs/[email protected]/internal/protocol/hadoop_common/TraceAdmin.pb.go:160 +0x4f

goroutine 18 [sleep]:
time.Sleep(0x8bb2c97000)
        /usr/local/go/src/runtime/time.go:188 +0xbf
github.com/juicedata/juicefs/pkg/utils.init.0.func1()
        /mywork/juicefs/pkg/utils/alloc.go:65 +0x30
created by github.com/juicedata/juicefs/pkg/utils.init.0
        /mywork/juicefs/pkg/utils/alloc.go:63 +0x35

goroutine 19 [chan receive]:
github.com/baidubce/bce-sdk-go/util/log.NewLogger.func1(0xc0001163c0)
        /root/go/pkg/mod/github.com/baidubce/[email protected]/util/log/logger.go:362 +0x145
created by github.com/baidubce/bce-sdk-go/util/log.NewLogger
        /root/go/pkg/mod/github.com/baidubce/[email protected]/util/log/logger.go:359 +0xda

goroutine 21 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc00012ca00)
        /root/go/pkg/mod/[email protected]/stats/view/worker.go:154 +0x105
created by go.opencensus.io/stats/view.init.0
        /root/go/pkg/mod/[email protected]/stats/view/worker.go:32 +0x57

What you expected to happen:
The volume mounts normally.
How to reproduce it (as minimally and precisely as possible):
Run the mount command.
Anything else we need to know?:

Environment:

  • JuiceFS version (use ./juicefs --version):
  • 0.9.3-22 (2021-01-25 8fcee47)
  • Cloud provider or hardware configuration running JuiceFS:
  • OS (e.g: cat /etc/os-release):
  • NAME="CentOS Linux"
    VERSION="8 (Core)"
  • Kernel (e.g. uname -a):
  • 4.18.0-167.el8.x86_64 #10 SMP Fri Oct 30 14:35:31 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Object storage (cloud provider and region):
  • Redis info (version, cloud provider managed or self maintained):
    Redis server v=5.0.3 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=28849dbea6f07cc8
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage):
  • Others:

juicefs doesn't remove chunks from S3 after file rewrite

What happened:

juicefs doesn't remove chunks from S3 after file rewrite

What you expected to happen:

juicefs clean up chunks from the previous versions of the file

How to reproduce it (as minimally and precisely as possible):

juicefs mount -d localhost /s3storage
mkdir /s3storage/test/
for i in {1..100}; do dd if=/dev/urandom of=/s3storage/test/testrewrite bs=1M count=1; done

As a result, the S3 bucket has 106 objects and is 106 MB in size.

Anything else we need to know?:

Environment:

  • JuiceFS version (use ./juicefs --version): juicefs version dev (now HEAD) (but it's 0.9.3)
  • Cloud provider or hardware configuration running JuiceFS: DO Spaces
  • OS (e.g: cat /etc/os-release): 20.04.1 LTS
  • Kernel (e.g. uname -a): 5.4.0-51-generic
  • Object storage (cloud provider and region): DigitalOcean NYC
  • Redis info (version, cloud provider managed or self maintained): 5.0.7-2
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): local Redis

Use sync.Map instead of mutex

What would you like to be added:

Use sync.Map instead of mutexes. There are so many lock and unlock operations in our code that it's easy to forget an unlock and cause a deadlock. Instead of a mutex, use sync.Map and atomic variables. Take the following files for example.

prefetch.go
mem_cache.go
.....

Why is this needed:

Speed up using Lua script

What would you like to be added:

Currently, a lookup operation issues two Redis requests; we could reduce that to one request using a Lua script.

When Lua scripts are not supported by the Redis server, we should fall back to the current behavior.
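
A hedged sketch of the idea with redis-cli (the key layout here is illustrative, not the real JuiceFS encoding): resolve a name in a directory hash and fetch the inode attributes in one round trip:

# One EVAL replaces an HGET followed by a GET
redis-cli EVAL "
  local ino = redis.call('HGET', KEYS[1], ARGV[1])
  if not ino then return nil end
  return {ino, redis.call('GET', 'i' .. ino)}
" 1 d1 somefile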

Why is this needed:

Lookup() is called so frequently that we want it to be faster.

Metrics for JuiceFS

What would you like to be added:
A web UI for JuiceFS
Why is this needed:
It makes it easier to monitor file metadata and S3 usage for people who don't know how to use Redis and S3 clients.

Helm Chart for K8S deploy

What would you like to be added:
A Helm Chart for deploying JuiceFS in Kubernetes

Why is this needed:
In my situation, I want to use a Chart for easy deployment and upgrades.
If possible, I can work on this.

XAttr data still exists in meta server after removing the file.

What happened:
The XAttr data is still in Redis after removing the file.

What you expected to happen:
The XAttr data of a deleted file should also disappear.

How to reproduce it (as minimally and precisely as possible):
After running the following commands:

# running in macOS Catalina 
./juicefs format localhost test
sudo ./juicefs mount localhost ~/jfs
cp ~/*.jpg ~/jfs
rm -f ~/jfs/*.jpg

The keys I would expect to see in Redis are:

$ redis-cli keys "*" | sort
i1
nextchunk
nextinode
nextsession
sessions
setting
totalInodes
usedSpace

But there is a lot of junk XAttr data:

$ redis-cli keys "*" | sort
i1
nextchunk
nextinode
nextsession
sessions
setting
totalInodes
usedSpace
x2
x3
x4
x5
x6
x7
x8

Anything else we need to know?:

Environment:

  • JuiceFS version (use ./juicefs --version): 0.9.2 (2021-01-15 d0aa162)
  • Cloud provider or hardware configuration running JuiceFS: MacBook Air (13-inch, 2017)
  • OS (e.g: cat /etc/os-release): macOS 10.15.7
  • Kernel (e.g. uname -a): Darwin Kernel Version 19.6.0
  • Object storage (cloud provider and region): none
  • Redis info (version, cloud provider managed or self maintained): Redis 6.0.5 (00000000/0) 64 bit
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): localhost
  • Others: none

JuiceFS does not cancel ongoing prefetch requests after file is closed

What happened:
JuiceFS keeps transferring blocks for I/O operations that have been cancelled. During an fio read test of a 4 GB transfer, I cancelled the fio process at ~500 MB because it was way too slow; the JuiceFS process didn't react to the cancelled I/O test and instead kept transferring the blocks from the S3 endpoint.
What you expected to happen:
I expected JuiceFS to stop the I/O and reflect the most recent state. Instead JuiceFS continued the file transfer, ignoring the cancelled I/O request.
How to reproduce it (as minimally and precisely as possible):
format --compress none --force --access-key XXXXXXX --secret-key XXXXXXX --block-size 1024 --storage s3 --bucket=https://xxxxxxxxxx.s3.us-east-1.amazonaws.com REDIS-SEVER benchmark
juicefs mount --max-uploads=150 --io-retries=20 REDIS-SERVER /mnt/aws
fio --name=sequential-read --directory=/mnt/aws --rw=read --refill_buffers --bs=4M --size=4G

Anything else we need to know?:
This was done on a Lenovo X1 7th edition, 16 GB memory, i7-8665U 4-core processor, ethernet hooked up to a Linux router with a 500 Mbit/sec symmetric fiber optic internet connection to Bell Canada.
Environment:

  • JuiceFS version (use ./juicefs --version): juicefs version 0.9.3-34 (2021-01-26 15db788)

  • Cloud provider or hardware configuration running JuiceFS:

  • OS (e.g: cat /etc/os-release):
    NAME="Linux Mint"
    VERSION="20 (Ulyana)"
    ID=linuxmint
    ID_LIKE=ubuntu
    PRETTY_NAME="Linux Mint 20" VERSION_ID="20" UBUNTU_CODENAME=focal

  • Kernel (e.g. uname -a): Linux io 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Object storage (cloud provider and region): aws s3 us-east-1

  • Redis info (version, cloud provider managed or self maintained): Redis server v=5.0.7 sha=00000000:0 malloc=jemalloc-5.2.1 bits=64 build=636cde3b5c7a3923

  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): 500Mbit/sec fiber

  • Others:

Mount JuiceFS using /etc/fstab

What would you like to be added:

Mount JuiceFS using a rule defined in /etc/fstab, for example,

redis_host    /jfs       juicefs     _netdev     0  0

mount will find /sbin/mount.juicefs and run it as mount.juicefs redis_host /jfs -o _netdev.

We could translate these arguments into the format juicefs expects, at the beginning of juicefs (a sketch follows).
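
A hedged sketch of the pieces involved (paths are examples): mount(8) resolves the fstab type juicefs to /sbin/mount.juicefs, so a symlink plus the fstab line above would be enough once the argument translation exists:

# Assuming the juicefs binary lives in /usr/local/bin
sudo ln -s /usr/local/bin/juicefs /sbin/mount.juicefs
# With the /etc/fstab entry in place:
sudo mount /jfs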

Why is this needed:

We want to mount JuiceFS automatically after machine boot.

Question: Is it required using Redis with persistence enabled?

Since JuiceFS relies on Redis to store file metadata: is it required to enable persistence features such as RDB, AOF, or both, to make sure the metadata won't be lost once the Redis server gets restarted?

P.S. I can't log in to the Slack channel (error: <my-email> doesn't have an account on this workspace.) so I posted my question here.

Support path-style URL for S3 (or S3-compatible) storage

What would you like to be added:

Support path-style URL for S3 (or S3-compatible) storage

Why is this needed:

Currently, JuiceFS only supports virtual hosted-style URLs. The difference between virtual hosted-style and path-style is:

  • Virtual hosted-style: https://<bucket>.s3.<region>.amazonaws.com
  • Path-style: https://s3.<region>.amazonaws.com/<bucket>

Although AWS plans to deprecate path-style URLs in the future, some S3-compatible storage services still use path-style (e.g. Ceph RGW), so we may need to support this URL type (see the sketch below).
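
Under this proposal, formatting against a path-style endpoint might look like the following (hypothetical desired usage; bucket, keys and volume name are placeholders):

# Hypothetical: --bucket given as a path-style URL instead of virtual hosted-style
./juicefs format --storage s3 --bucket https://s3.us-east-1.amazonaws.com/mybucket \
    --access-key xxx --secret-key xxx redis://127.0.0.1:6379/1 myjfs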

Backup tool for meta

What would you like to be added:

Provide a tool to dump metadata in JSON format; then we can have another tool to assemble it and get the data back.
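
Until such a tool exists, a hedged interim approach is a plain Redis snapshot (the dump path is distro-dependent):

redis-cli BGSAVE
# Wait for LASTSAVE to advance, then copy the snapshot somewhere safe
cp /var/lib/redis/dump.rdb /backup/jfs-meta-$(date +%F).rdb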

Why is this needed:

In the worst case, if we lose Redis, we'd like to have a tool to get most of the data back from S3.

Fix errors reported by golangci-lint

What would you like to be added:

Fix errors reported by golangci-lint.

Why is this needed:

After fixing all the existing errors, golangci-lint can be added into the CI pipeline and pre-commit hook.

Add golangci-lint as a CI step

What would you like to be added:

Add golangci-lint as a CI step, preferably after #26 is fixed.

Why is this needed:

golangci-lint helps avoid error-prone patterns in code.

Remove Usage Tracking

For commercial products, tracking user data may be an understandable and acceptable thing. But for open source software... well, I think this will become a very controversial behavior (even if it just collects seemingly harmless data). I don't think people would want the JuiceFS process they run to send any data to endpoints that aren't part of their project.

Therefore, please consider removing any “tracking” behavior. ❤️
