
juicefs's Introduction

JuiceFS Logo


JuiceFS is a high-performance POSIX file system released under Apache License 2.0, designed particularly for cloud-native environments. Data stored via JuiceFS is persisted in Object Storage (e.g. Amazon S3), while the corresponding metadata can be persisted in various compatible database engines such as Redis, MySQL, and TiKV, depending on the scenario and requirements.

With JuiceFS, massive cloud storage can be directly connected to big data, machine learning, artificial intelligence, and various application platforms in production environments. Without modifying any code, massive cloud storage can be used as efficiently as local storage.

📖 Document: Quick Start Guide

Highlighted Features

  1. Fully POSIX-compatible: Use it as a local file system, integrating seamlessly with existing applications without breaking business workflows.
  2. Fully Hadoop-compatible: JuiceFS' Hadoop Java SDK is compatible with Hadoop 2.x and Hadoop 3.x, as well as a variety of components in the Hadoop ecosystem.
  3. S3-compatible: JuiceFS' S3 Gateway provides an S3-compatible interface.
  4. Cloud Native: A Kubernetes CSI Driver is provided for easily using JuiceFS in Kubernetes.
  5. Shareable: JuiceFS is a shared file storage that can be read and written by thousands of clients.
  6. Strong Consistency: Confirmed modifications are immediately visible on all servers that mount the same file system.
  7. Outstanding Performance: Latency can be as low as a few milliseconds, and throughput can be scaled nearly without limit (depending on the Object Storage). Test results
  8. Data Encryption: Supports data encryption in transit and at rest (please refer to the guide for more information).
  9. Global File Locks: JuiceFS supports both BSD locks (flock) and POSIX record locks (fcntl).
  10. Data Compression: JuiceFS supports LZ4 or Zstandard to compress all your data (see the example below).
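
Compression is chosen when the file system is created. A minimal sketch, assuming a local Redis instance and an arbitrary volume name (zstd is assumed to be the flag value for Zstandard):

# Hypothetical volume name; --compress selects the algorithm used for all data blocks
./juicefs format --compress lz4 redis://127.0.0.1:6379/1 myjfs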

Architecture | Getting Started | Advanced Topics | POSIX Compatibility | Performance Benchmark | Supported Object Storage | Who is using | Roadmap | Reporting Issues | Contributing | Community | Usage Tracking | License | Credits | FAQ


Architecture

JuiceFS consists of three parts:

  1. JuiceFS Client: Coordinates the Object Storage and the metadata engine, and implements file system interfaces such as POSIX, Hadoop, Kubernetes, and the S3 gateway.
  2. Data Storage: Stores the file data, with support for a variety of storage media, e.g., local disk, public or private cloud Object Storage, and HDFS.
  3. Metadata Engine: Stores the corresponding metadata, such as file name, file size, permission group, creation and modification time, and directory structure, with support for different engines, e.g., Redis, MySQL, SQLite and TiKV.

JuiceFS Architecture

JuiceFS can store file system metadata on different metadata engines. Redis, for example, is a fast, open-source, in-memory key-value store that is particularly suitable for metadata; meanwhile, all file data is stored in Object Storage through the JuiceFS client. Learn more

data-structure-diagram

Each file stored in JuiceFS is split into fixed-size "Chunks", with a default upper limit of 64 MiB. Each Chunk is composed of one or more "Slices", whose lengths vary depending on how the file is written. Each Slice is in turn composed of fixed-size "Blocks", 4 MiB by default. The Blocks are what is ultimately stored in Object Storage; the metadata of the file and its Chunks, Slices, and Blocks is stored in the metadata engine by JuiceFS. Learn more
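
As a back-of-the-envelope illustration (not JuiceFS code), the chunk and block that a given byte offset falls into follow directly from the default sizes:

# Illustrative arithmetic only, using the default 64 MiB chunk and 4 MiB block sizes
offset=150000000                      # a byte offset within a file
chunk_size=$((64 * 1024 * 1024))
block_size=$((4 * 1024 * 1024))
echo "chunk index: $((offset / chunk_size))"
echo "block index within the chunk: $(((offset % chunk_size) / block_size))"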

How JuiceFS stores your files

When using JuiceFS, files are eventually split into Chunks, Slices and Blocks and stored in Object Storage. Therefore, the source files stored in JuiceFS cannot be found as-is in the file browser of the Object Storage platform; instead, the bucket contains only a chunks directory and a bunch of numbered directories and files. Don't panic! This is just the secret of JuiceFS' high-performance operation!
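
The warning logs quoted in the issues further down this page show what those object keys look like (e.g. chunks/0/12/12735_2_16777216 under the volume prefix). A hedged sketch of listing them, with hypothetical bucket and volume names:

# Hypothetical bucket/volume; the keys are numbered blocks, not your original files
aws s3 ls s3://mybucket/myvol/chunks/0/12/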

Getting Started

Before you begin, make sure you have:

  1. One supported metadata engine, see How to Set Up Metadata Engine
  2. One supported Object Storage for storing data blocks, see Supported Object Storage
  3. JuiceFS Client downloaded and installed

Please refer to the Quick Start Guide to start using JuiceFS right away!
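
As a minimal sketch of the steps involved, assuming a local Redis instance (the volume name and mount point are arbitrary):

# Create a file system backed by Redis metadata, then mount it in the background
./juicefs format redis://127.0.0.1:6379/1 myjfs
sudo ./juicefs mount -d redis://127.0.0.1:6379/1 /mnt/jfs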

Command Reference

Check out all the command line options in command reference.

Containers

JuiceFS can be used as a persistent volume for Docker and Podman, please check here for details.

Kubernetes

It is also very easy to use JuiceFS on Kubernetes. Please find more information here.

Hadoop Java SDK

If you want to use JuiceFS in Hadoop, check the Hadoop Java SDK.

Advanced Topics

Please refer to JuiceFS Document Center for more information.

POSIX Compatibility

JuiceFS has passed all of the compatibility tests (8813 in total) in the latest pjdfstest.

All tests successful.

Test Summary Report
-------------------
/root/soft/pjdfstest/tests/chown/00.t          (Wstat: 0 Tests: 1323 Failed: 0)
  TODO passed:   693, 697, 708-709, 714-715, 729, 733
Files=235, Tests=8813, 233 wallclock secs ( 2.77 usr  0.38 sys +  2.57 cusr  3.93 csys =  9.65 CPU)
Result: PASS

Aside from the POSIX features covered by pjdfstest, JuiceFS also provides:

  • Close-to-open consistency. Once a file is written and closed, the written data is guaranteed to be visible in subsequent opens and reads from any client. Within the same mount point, all written data can be read immediately.
  • Rename and all other metadata operations are atomic, guaranteed by the transactions of the supported metadata engines.
  • Opened files remain accessible after being unlinked, from the same mount point.
  • Mmap (tested with FSx).
  • Fallocate with punch hole support.
  • Extended attributes (xattr).
  • BSD locks (flock).
  • POSIX record locks (fcntl).

Performance Benchmark

Basic benchmark

JuiceFS provides a subcommand that can run a few basic benchmarks to help you understand how it performs in your environment:

JuiceFS Bench
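
A hedged invocation sketch (the mount point is an example):

# Run the built-in benchmark against a mounted JuiceFS volume
./juicefs bench /mnt/jfs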

Throughput

A sequential read/write benchmark has also been performed on JuiceFS, EFS and S3FS with fio.

Sequential Read Write Benchmark

The figure above shows that JuiceFS can provide 10X the throughput of the other two (see more details).
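
A sequential read run like this might be reproduced with fio as follows (the mount point is an example; the flags mirror the fio commands quoted in the issues below):

# Sequential read of a 4 GiB file in 4 MiB requests against a JuiceFS mount
fio --name=sequential-read --directory=/mnt/jfs --rw=read --refill_buffers --bs=4M --size=4G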

Metadata IOPS

A simple metadata benchmark has been performed on JuiceFS, EFS and S3FS with mdtest.

Metadata Benchmark

The result shows that JuiceFS can provide significantly more metadata IOPS than the other two (see more details).
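
mdtest is an MPI-based metadata benchmark; a hedged sketch of a small run (process count, directory and item count are arbitrary):

# Each of 4 processes creates, stats and removes 1000 items under the mount
mpirun -np 4 mdtest -d /mnt/jfs/mdtest -n 1000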

Analyze performance

See Real-Time Performance Monitoring if you encounter performance issues.

Supported Object Storage

  • Amazon S3 (and other S3 compatible Object Storage services)
  • Google Cloud Storage
  • Azure Blob Storage
  • Alibaba Cloud Object Storage Service (OSS)
  • Tencent Cloud Object Storage (COS)
  • Qiniu Cloud Object Storage (Kodo)
  • QingStor Object Storage
  • Ceph RGW
  • MinIO
  • Local disk
  • Redis
  • ...

JuiceFS supports numerous Object Storage services. Learn more.

Who is using

JuiceFS is production ready and runs on thousands of machines in production. A list of users has been assembled and documented here. JuiceFS also has several collaborative projects that integrate with other open source projects, which we have documented here. If you are also using JuiceFS, please feel free to let us know, and you are welcome to share your specific experience with everyone.

The storage format is stable, and will be supported by all future releases.

Roadmap

  • User and group quotas
  • Snapshots
  • Write once read many (WORM)

Reporting Issues

We use GitHub Issues to track community reported issues. You can also contact the community for any questions.

Contributing

Thank you for your contribution! Please refer to the JuiceFS Contributing Guide for more information.

Community

Welcome to join the Discussions and the Slack channel to connect with JuiceFS team members and other users.

Usage Tracking

JuiceFS collects anonymous usage data by default to help us better understand how the community is using it. Only core metrics (e.g. version number) are reported; user data and any other sensitive data are never included. The related code can be viewed here.

You can also disable reporting easily with the command line option --no-usage-report:

juicefs mount --no-usage-report

License

JuiceFS is open-sourced under Apache License 2.0, see LICENSE.

Credits

The design of JuiceFS was inspired by Google File System, HDFS and MooseFS. Thanks for their great work!

FAQ

Why doesn't JuiceFS support XXX Object Storage?

JuiceFS supports many Object Storage services. Please check out this list first. If the Object Storage you want to use is S3-compatible, you can treat it as S3. Otherwise, feel free to open an issue.

Can I use Redis Cluster as metadata engine?

Yes. Since v1.0.0 Beta3, JuiceFS supports using Redis Cluster as the metadata engine. Note, however, that Redis Cluster requires all keys in a transaction to be in the same hash slot, so a JuiceFS file system can only use one hash slot.
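
Redis pins all keys that share a {hash tag} to the same slot, which is how one file system's keys can stay in a single slot; the mapping can be inspected with redis-cli (the key names here are illustrative, not the real JuiceFS encoding):

# Keys with the same {hash tag} always map to the same cluster hash slot
redis-cli CLUSTER KEYSLOT '{myjfs}i1'
redis-cli CLUSTER KEYSLOT '{myjfs}i2'   # same slot as above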

See "Redis Best Practices" for more information.

What's the difference between JuiceFS and XXX?

See "Comparison with Others" for more information.

For more FAQs, please see the full list.

Stargazers over time

Star History Chart

juicefs's People

Contributors

201341, aixjing, caitinchen, chnliyong, davies, dependabot[bot], eryugey, hexilee, jiefenghuang, joyliuc, kyungwan-nam, polyrabbit, rayw000, sandyxsd, sanwan, showjason, solracsf, suave, suzaku, tangyoupeng, timfeirg, tonicmuroq, xiaogaozi, xyb, yuhr123, yujunz, yunhuichen, zhijian-pro, zhoucheng361, zwwhdls


juicefs's Issues

Handle Redis NOSCRIPT error

What would you like to be added:

Scripts loaded into the Redis script cache may be removed by the SCRIPT FLUSH command. In that case, a NOSCRIPT error is returned.
This error should be handled and the script reloaded.
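
The failure mode is easy to reproduce with redis-cli; the fix amounts to re-issuing SCRIPT LOAD (or falling back to EVAL) and retrying:

sha=$(redis-cli SCRIPT LOAD 'return 1')
redis-cli EVALSHA "$sha" 0     # (integer) 1
redis-cli SCRIPT FLUSH
redis-cli EVALSHA "$sha" 0     # NOSCRIPT error: the script must be reloaded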

Why is this needed:

We don't want a SCRIPT FLUSH to break our client.

Use Github Actions instead of Travis for CI/CD.

What would you like to be added:

Use Github Actions instead of Travis for CI/CD.

Why is this needed:

The ecosystem of GitHub Actions is much bigger than that of Travis; there are tons of pre-made actions ready to be reused.

Add a tool to do benchmark

What would you like to be added:

Add a tool to benchmark a mounted JuiceFS volume.

Why is this needed:

After a volume is mounted, we would like to know how it is performing.

dial tcp: too many colons in address (IPv6)

I noticed this in my logs from time to time. It doesn't seem to cause big problems, but this error looks curious to me:

2021/01/13 20:44:57.482340 juicefs[81861] <WARNING>: upload chunks/0/12/12735_2_16777216: RequestError: send request failed
caused by: Put "https://sana-store.s3.fr-par.scw.cloud/seedbox/chunks/0/12/12735_2_16777216": dial tcp: address 2001:bc8:1002::30:443: too many colons in address (try 1)
2021/01/13 20:46:37.409525 juicefs[81861] <WARNING>: upload chunks/0/12/12829_3_16777216: RequestError: send request failed
caused by: Put "https://sana-store.s3.fr-par.scw.cloud/seedbox/chunks/0/12/12829_3_16777216": dial tcp: address 2001:bc8:1002::30:443: too many colons in address (try 1)
2021/01/13 21:09:19.104197 juicefs[81861] <WARNING>: upload chunks/0/13/13873_1_16777216: RequestError: send request failed
caused by: Put "https://sana-store.s3.fr-par.scw.cloud/seedbox/chunks/0/13/13873_1_16777216": dial tcp: address 2001:bc8:1002::30:443: too many colons in address (try 1)
2021/01/13 21:22:25.961086 juicefs[81861] <WARNING>: upload chunks/0/14/14410_1_16777216: RequestError: send request failed
caused by: Put "https://sana-store.s3.fr-par.scw.cloud/seedbox/chunks/0/14/14410_1_16777216": dial tcp: address 2001:bc8:1002::30:443: too many colons in address (try 1)
2021/01/13 21:59:58.166127 juicefs[81861] <WARNING>: upload chunks/0/15/15977_3_16777216: RequestError: send request failed
caused by: Put "https://sana-store.s3.fr-par.scw.cloud/seedbox/chunks/0/15/15977_3_16777216": dial tcp: address 2001:bc8:1002::30:443: too many colons in address (try 1)

I'm using Scaleway, the endpoint has these addresses:

➜  ~ dog s3.fr-par.scw.cloud
A s3.fr-par.scw.cloud. 22h51m22s   62.210.134.176
➜  ~ dog s3.fr-par.scw.cloud AAAA
AAAA s3.fr-par.scw.cloud. 23h59m21s   2001:bc8:1002::30

Maybe it uses IPv4 most of the time and fails on IPv6?

Let me know if I can provide more info.

Environment:

  • JuiceFS version (use ./juicefs --version): juicefs version 0.9.1-24 (2021-01-13 3dc45dc)
  • Cloud provider or hardware configuration running JuiceFS: dedicated server
  • OS (e.g: cat /etc/os-release): Debian 10
  • Kernel (e.g. uname -a): Linux 4.19.0-13-amd64 #1 SMP Debian 4.19.160-2 (2020-11-28)
  • Object storage (cloud provider and region): Scaleway fr-par (https://s3.fr-par.scw.cloud)
  • Redis info (version, cloud provider managed or self maintained): v5.0.3, self managed, installed from deb repo
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): local redis, 1 Gbps to object storage

Support directories with millions of files.

What would you like to be added:

Currently, we fetch the attributes of all files in a directory with a single batch request to Redis, which could be slow, fail, and block other requests.

We can split those into small batches, for example, 1000 per batch.
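
For illustration only (not the actual client code), the idea of bounding each request can be sketched with xargs, which caps every MGET at 1000 keys:

# Fetch attribute values in batches of 1000 keys per MGET instead of one huge request
redis-cli --scan --pattern 'i*' | xargs -n 1000 redis-cli MGET > /dev/null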

Why is this needed:

The number of files could be in the millions; we don't want people to be bitten by that.

Backlog

  • Call MGET with small batches #110
  • Use HSCAN instead of HGETALL #128

IPv6 cannot assign requested address

What happened:
I don't have an active IPv6 link and no IPv6 is configured on any interface.
I have this error log:

2021/01/23 19:12:06.391177 juicefs[15700] <WARNING>: upload chunks/0/0/1_0_4096: RequestError: send request failed
caused by: Put "https://dvc-juicefs-test.s3.fr-par.scw.cloud/test/chunks/0/0/1_0_4096": dial tcp [2001:bc8:1002::30]:443: connect: cannot assign requested address (try 1)

What you expected to happen:
Use IPv4 when no IPv6 is available.

Environment:

  • JuiceFS version (use ./juicefs --version): juicefs version dev (now HEAD)
  • Cloud provider or hardware configuration running JuiceFS: dedicated server
  • OS (e.g: cat /etc/os-release): Ubuntu 20.04
  • Kernel (e.g. uname -a): Linux 5.4.0-64-generic
  • Object storage (cloud provider and region): Scaleway fr-par
  • Redis info (version, cloud provider managed or self maintained): Simple docker redis container
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): localhost

In daemon mode, the log should be output to syslog by default

What would you like to be added:

In non-daemon mode, log output always goes to stderr; in daemon mode, log output should go to syslog by default, unless --nosyslog is specified.

Why is this needed:
I'm currently annoyed by the fact that the log output to Syslog is only available when the -quiet parameter is added.

Comprehensive User Guide

What would you like to be added:

Add a user guide including all the details of the command line arguments, especially for the object store.

Why is this needed:

People may find it difficult to connect JuiceFS with some object stores, for example Ceph; we should have a guide on how to specify the arguments for every object store we support.

Backlog

JuiceFS CSI plugin for Kubernetes

What would you like to be added:
CSI plugin for JuiceFS

Why is this needed:
As more and more applications run on Kubernetes, a CSI plugin is the de facto way to consume third-party storage like JuiceFS.

Performance: slow metadata service

What happened:
Copying a Linux kernel tree shows a few KB/sec of throughput at best; mdtest showed 5 - 10 transactions per second.
What you expected to happen:
10 MB/sec throughput, as for similar-class file systems copying a Linux kernel tree, and 1000 - 10000 transactions per second based on Redis performance.
How to reproduce it (as minimally and precisely as possible):
On a decent/non-virtual AWS EC2 instance, set up and run Redis; 22 ms of latency away, run JuiceFS mounted as a local directory and copy a recent Linux kernel tree using {rsync, cp, midnight commander, ...}.
Anything else we need to know?:
Great project/undertaking!
When the mounted directory is much 'closer' to Redis, the metadata service behaves fairly well. IO ops for larger files are OK, and for multi-GB files JuiceFS has excellent performance, although this has not been verified for multi-host mixed read/write scenarios.
Environment:

  • JuiceFS version (use ./juicefs --version): 0.9.3-211 (2021-01-28 9cdfa8a)
  • Cloud provider or hardware configuration running JuiceFS: a1.2xlarge
  • OS (e.g: cat /etc/os-release): ubuntu 20.04
  • Kernel (e.g. uname -a): 0.9.3-211 (2021-01-28 9cdfa8a)
  • Object storage (cloud provider and region): aws s3 us-east-1
  • Redis info (version, cloud provider managed or self maintained): 6.0.10
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): 500Mbit/sec bell fiber to redis, same for s3
  • Others:

Add a tool to analyze accesslog

What would you like to be added:

Add a tool that reads the access log and generates real-time metrics for the current workload, similar to top.

Why is this needed:

To understand the internal activities for current workload.

Measure code coverage by test

What would you like to be added:

Measure code coverage by test.

Why is this needed:

Code coverage is not perfect, but it's a reasonably good metric for quality.

Enable Redis client cache

What would you like to be added:

We can enable the Redis client cache via a command option, which could be useful for read-only workloads.
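
Redis 6 ships server-assisted client-side caching behind the CLIENT TRACKING command (RESP3 only); a hedged illustration with redis-cli:

# Requires Redis >= 6 and the RESP3 protocol (the -3 flag); applies to this connection
redis-cli -3 CLIENT TRACKING ON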

Why is this needed:

Faster is always good.

Hide secrets from logging

What would you like to be added:

We should remove secrets from the format string when logging it.

Why is this needed:

Secrets shouldn't be logged.

tracking deletion of chunks

What would you like to be added:

Add a list or queue to track unused chunks, so we can retry deletion if something fails.

Why is this needed:

To make sure that no object is leaked.

Report input/output error when start a rand write io test

What happened:

fio report:

big-file-multi-write: (groupid=0, jobs=1): err= 5 (file:io_u.c:1756, func=io_u error, error=Input/output error): pid=15850: Tue Jan 12 18:03:50 2021

juicefs report:

2021/01/12 18:03:25.169256 juicefs[6976] <WARNING>: compact 27806 0 with 20 slices: message 1001 is not supported
2021/01/12 18:03:25.216898 juicefs[6976] <WARNING>: compact 27806 1 with 20 slices: message 1001 is not supported
2021/01/12 18:03:25.379931 juicefs[6976] <ERROR>: error: redis: transaction failed
2021/01/12 18:03:25.380029 juicefs[6976] <WARNING>: write inode:27806 error: input/output error
2021/01/12 18:03:25.380042 juicefs[6976] <ERROR>: write inode:27806 indx:1  input/output error

What you expected to happen:

fio and juicefs have no error report.

How to reproduce it (as minimally and precisely as possible):

Start a rand write on juicefs:

 fio --name=big-file-multi-write  --rw=randwrite --refill_buffers --bs=4k --size=100M --numjobs=1 --end_fsync=1

Anything else we need to know?:

Environment:

  • JuiceFS version (use ./juicefs --version): juicefs version 0.9.1 (2021-01-10T16:31:23Z 1b9f6f4)
  • Cloud provider or hardware configuration running JuiceFS: Aliyun, 8core, 16GiB mem
  • OS (e.g: cat /etc/os-release): Ubuntu 18.04.3
  • Kernel (e.g. uname -a): 4.15.0-66-generic
  • Object storage (cloud provider and region): Aliyun OSS
  • Redis info (version, cloud provider managed or self maintained): Aliyun RDS Redis 5.x
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): Aliyun VPC
  • Others:

Fix problems found by goreport

What would you like to be added:

Fix some of the problems found by goreport.

Why is this needed:

goreport can help us improve code quality.

Cannot see the files in Finder if xattr is enabled.

What happened:
Cannot see any files in Finder on macOS unless --enable-xattr=false.

What you expected to happen:
View and access files using Finder.

How to reproduce it (as minimally and precisely as possible):

./juicefs format localhost test
sudo ./juicefs mount --enable-xattr=true localhost ~/jfs
cp test.jpg ~/jfs
echo 'hello' > ~/jfs/test.txt
mkdir ~/jfs/docs
ls ~/jfs
open ~/jfs

All files and directories are listed in the terminal, but only the docs directory is displayed in Finder.

Anything else we need to know?:
none

Environment:

  • JuiceFS version (use ./juicefs --version): 0.9.3-5 (2021-01-19 18baa89)
  • Cloud provider or hardware configuration running JuiceFS: MacBook Air (13-inch, 2017)
  • OS (e.g: cat /etc/os-release): macOS Catalina 10.15.7
  • Kernel (e.g. uname -a): Darwin Kernel Version 19.6.0
  • Object storage (cloud provider and region): none
  • Redis info (version, cloud provider managed or self maintained): Redis 6.0.5 (00000000/0) 64 bit
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): localhost
  • Others:
    • macFUSE 4.0.5

not able to use google storage due to possible parsing bug when using "juicefs format"

Hello,

I am trying to use a Google Cloud Storage bucket called juicefs.

I'm having problems providing the right command line syntax.
From juicefs format -h, and from looking at the source code as well as the documentation, it seems I need to provide the following options:

  • bucket
  • accesskey
  • secretkey
  • redisserver
  • directoryname
  • Maybe a region?

In google-language a bucket is referenced like this:
gs://juicefs

Entering --bucket gs://juicefs results in the following runtime error:

 ./juicefs format --storage gs --bucket gs://juicefs --accesskey Gxxxxxxxxxxxxxxxxxx --secretkey Oxxxxxxxxxxxxxxxxxxx redis://redis-master:6379/4 test3


2021/01/15 13:24:39.231337 juicefs[3546] <INFO>: Meta address: redis://redis-master:6379/4
panic: runtime error: index out of range [1] with length 1

goroutine 1 [running]:
github.com/juicedata/juicesync/object.newGS(0x7fffedb1a878, 0xc, 0x7fffedb1a891, 0x13, 0x7fffedb1a8b1, 0x14, 0x2, 0x0, 0x2c, 0xc000226a50)
        /go/pkg/mod/github.com/juicedata/[email protected]/object/gs.go:145 +0x3e6
github.com/juicedata/juicesync/object.CreateStorage(0x7fffedb1a86c, 0x2, 0x7fffedb1a878, 0xc, 0x7fffedb1a891, 0x13, 0x7fffedb1a8b1, 0x14, 0x2c, 0xc000226a50, ...)
        /go/pkg/mod/github.com/juicedata/[email protected]/object/object_storage.go:123 +0x1ad
github.com/juicedata/juicefs/pkg/object.CreateStorage(0x7fffedb1a86c, 0x2, 0x7fffedb1a878, 0xc, 0x7fffedb1a891, 0x13, 0x7fffedb1a8b1, 0x14, 0xc00069f8f8, 0x45f8c7, ...)
        /go/src/github.com/juicedata/juicefs/pkg/object/interface.go:57 +0x26d
main.createStorage(0xc00069fb60, 0x1588442, 0x8, 0xc000226a20, 0x24)
        /go/src/github.com/juicedata/juicefs/cmd/format.go:55 +0xa5
main.format(0xc00011c100, 0x8, 0xe)
        /go/src/github.com/juicedata/juicefs/cmd/format.go:165 +0x8fe
github.com/urfave/cli/v2.(*Command).Run(0xc00059ad80, 0xc0000cbd80, 0x0, 0x0)
        /go/pkg/mod/github.com/urfave/cli/[email protected]/command.go:163 +0x4ed
github.com/urfave/cli/v2.(*App).RunContext(0xc0004c91e0, 0x178d400, 0xc000044080, 0xc00003c0c0, 0xc, 0xc, 0x0, 0x0)
        /go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:313 +0x81f
github.com/urfave/cli/v2.(*App).Run(...)
        /go/pkg/mod/github.com/urfave/cli/[email protected]/app.go:224
main.main()
        /go/src/github.com/juicedata/juicefs/cmd/main.go:78 +0x99c
root@juice-user-deployment-55c7f57c98-wnx96:/home#

Entering --bucket gs://juicefs. (with a dot at the end) lets the command run.

So after some trial and error I came up with a CLI command that seems to work.

Command:

./juicefs format --storage gs --bucket gs://juicefs. --accesskey Gxxxxxxxxxxxxxxxxxx --secretkey Oxxxxxxxxxxxxxxxxxxx redis://redis-master:6379/5 test3 --force

Output:

2021/01/15 13:17:27.406060 juicefs[3468] <INFO>: Meta address: redis://redis-master:6379/5
2021/01/15 13:17:27.412904 juicefs[3468] <INFO>: Data uses gs://juicefs/test3/
2021/01/15 13:17:27.543149 juicefs[3468] <INFO>: Volume is formatted as {Name:test3 UUID:31777d36-1fee-4ae1-aeec-81399dc19289 Storage:gs Bucket:gs://juicefs. AccessKey:Gxxxxxxxxxxxxxxxxxx SecretKey:removed BlockSize:4096 Compression:lz4 Partitions:0}

Is this the right way to use the cli?
After mounting the volume I see new inodes being created, but nothing gets synced to the cloud.

command:

mkdir juice
./juicefs mount redis://redis-master:6379/5 juice

output:

2021/01/15 13:21:38.852377 juicefs[3508] <INFO>: Meta address: redis://redis-master:6379/5
2021/01/15 13:21:38.859770 juicefs[3508] <INFO>: Data use gs://juicefs/test3/
2021/01/15 13:21:38.859798 juicefs[3508] <INFO>: mount volume test3 at juice
2021/01/15 13:21:38.859817 juicefs[3508] <INFO>: Cache: /var/jfsCache capacity: 1024 MB

Question: how does JuiceFS interact with S3?

Is it possible to get some more info on how JuiceFS interacts with S3?

Primarily, I'm interested in when a read and write occurs to S3 vs Redis?

Some background… whilst S3 is an amazing object store, its per-request costs ($5 per million writes and $0.40 per million reads) can be extremely expensive if you're dealing with lots of tiny objects.

With regards to the chunking/slicing that JuiceFS performs, does this mean writing a lot of small files at once results in only a few S3 put operations, and reading them back in write order would result in only a few reads?

Thanks in advance and I'm excited to keep an eye on this project. 👍

Check settings on Redis

What would you like to be added:

After connecting to Redis, check the following settings (a manual redis-cli sketch follows the list):

  1. AOF is ON
  2. RDB is ON
  3. not in cluster mode (cluster_enabled is 0)
  4. maxmemory_policy is set to noeviction
  5. show a warning if it's not replicated
  6. version should be >= 2.2
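
A sketch of checking these by hand with redis-cli (the client would run the equivalent checks after connecting):

redis-cli CONFIG GET appendonly                       # 1. expect "yes"
redis-cli CONFIG GET save                             # 2. expect a non-empty schedule
redis-cli INFO cluster | grep cluster_enabled         # 3. expect 0
redis-cli CONFIG GET maxmemory-policy                 # 4. expect "noeviction"
redis-cli INFO replication | grep connected_slaves    # 5. warn if 0
redis-cli INFO server | grep redis_version            # 6. expect >= 2.2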

Why is this needed:

Redis is responsible for the persistence of metadata, so it should NOT lose any data; otherwise the data in JuiceFS will be lost.

Refactor this repo to present a well documented Go library

What would you like to be added:

Refactor this repo to present a well documented Go library. This library should provide interfaces similar to os.Open so that we can use JuiceFS without actually mounting it.

Why is this needed:

  1. We can't easily mount a filesystem in some environments (e.g. testing / serverless)
  2. The Go implementation is already here in this repo

JuiceFS can't work on redis-enterprise

root@ubuntu:/# ./juicefs --debug format --storage=s3 --bucket=https://juicefs-peter-test.s3.us-east-1.amazonaws.com     --access-key={} --secret-key={}   redis-enterprise.default.svc.cluster.local:8001 data
2021/01/21 08:21:50.811888 juicefs[3018] <INFO>: Meta address: redis://redis-enterprise.default.svc.cluster.local:8001
2021/01/21 08:21:50.816244 juicefs[3018] <WARNING>: parse info: ERR command not found
2021/01/21 08:21:50.816410 juicefs[3018] <FATAL>: Meta is not available: create session: ERR command not found
root@ubuntu:/# redis-cli
Could not connect to Redis at 127.0.0.1:6379: Connection refused
not connected>
root@ubuntu:/# redis-cli -h redis-enterprise.default.svc.cluster.local -p 8001
redis-enterprise.default.svc.cluster.local:8001> ping
PONG
redis-enterprise.default.svc.cluster.local:8001>

Test it with xfstests

What would you like to be added:

Run integrity test using xfstests

Why is this needed:

We'd like to know more about the compatibility.

Crashed after mount

What happened:
Ran the mount command ./juicefs --trace mount localhost /root/jfs and got:

fatal error: unexpected signal during runtime execution
[signal SIGBUS: bus error code=0x2 addr=0x1bbdc44 pc=0x468d19]

runtime stack:
runtime.throw(0x15f4d1b, 0x2a)
        /usr/local/go/src/runtime/panic.go:1116 +0x72
runtime.sigpanic()
        /usr/local/go/src/runtime/signal_unix.go:704 +0x4ac
runtime.gentraceback(0xffffffffffffffff, 0xffffffffffffffff, 0x0, 0xc000000300, 0x0, 0x0, 0x7fffffff, 0x1617e68, 0x7f8b05ffa9a8, 0x0, ...)
        /usr/local/go/src/runtime/traceback.go:189 +0x2b9
runtime.copystack(0xc000000300, 0x40000)
        /usr/local/go/src/runtime/stack.go:910 +0x287
runtime.shrinkstack(0xc000000300)
        /usr/local/go/src/runtime/stack.go:1178 +0x13d
runtime.scanstack(0xc000000300, 0xc00005ae98)
        /usr/local/go/src/runtime/mgcmark.go:815 +0x56e
runtime.markroot.func1()
        /usr/local/go/src/runtime/mgcmark.go:245 +0xc6
runtime.markroot(0xc00005ae98, 0x14)
        /usr/local/go/src/runtime/mgcmark.go:218 +0x310
runtime.gcDrain(0xc00005ae98, 0x7)
        /usr/local/go/src/runtime/mgcmark.go:1109 +0x118
runtime.gcBgMarkWorker.func2()
        /usr/local/go/src/runtime/mgc.go:1981 +0x177
runtime.systemstack(0xc000102900)
        /usr/local/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
        /usr/local/go/src/runtime/proc.go:1116

goroutine 7 [GC worker (idle)]:
runtime.systemstack_switch()
        /usr/local/go/src/runtime/asm_amd64.s:330 fp=0xc0000d0f60 sp=0xc0000d0f58 pc=0x479ba0
runtime.gcBgMarkWorker(0xc000059800)
        /usr/local/go/src/runtime/mgc.go:1945 +0x1be fp=0xc0000d0fd8 sp=0xc0000d0f60 pc=0x428d1e
runtime.goexit()
        /usr/local/go/src/runtime/asm_amd64.s:1374 +0x1 fp=0xc0000d0fe0 sp=0xc0000d0fd8 pc=0x47b981
created by runtime.gcBgMarkStartWorkers
        /usr/local/go/src/runtime/mgc.go:1839 +0x77

goroutine 1 [GC assist marking (scan), locked to thread]:
bytes.makeSlice(0x200, 0x0, 0x0, 0x0)
        /usr/local/go/src/bytes/buffer.go:229 +0x73
bytes.(*Buffer).grow(0xc0006479b0, 0x200, 0x0)
        /usr/local/go/src/bytes/buffer.go:142 +0x156
bytes.(*Buffer).Grow(...)
        /usr/local/go/src/bytes/buffer.go:161
io/ioutil.readAll(0x17a80e0, 0xc000491600, 0x200, 0x0, 0x0, 0x0, 0x0, 0x0)
        /usr/local/go/src/io/ioutil/ioutil.go:34 +0xa5
io/ioutil.ReadAll(...)
        /usr/local/go/src/io/ioutil/ioutil.go:45
google.golang.org/protobuf/internal/impl.legacyLoadFileDesc(0x218d6c0, 0x181, 0x181, 0x1, 0x0)
        /root/go/pkg/mod/google.golang.org/[email protected]/internal/impl/legacy_file.go:54 +0x178
google.golang.org/protobuf/internal/impl.legacyLoadMessageDesc(0x17e4260, 0x142d4a0, 0x15f55a5, 0x2b, 0x0, 0x0)
        /root/go/pkg/mod/google.golang.org/[email protected]/internal/impl/legacy_message.go:131 +0x357
google.golang.org/protobuf/internal/impl.legacyLoadMessageInfo(0x17e4260, 0x142d4a0, 0x15f55a5, 0x2b, 0x0)
        /root/go/pkg/mod/google.golang.org/[email protected]/internal/impl/legacy_message.go:48 +0xbd
google.golang.org/protobuf/internal/impl.Export.LegacyMessageTypeOf(0x17c0d60, 0x0, 0x15f55a5, 0x2b, 0x0, 0x0)
        /root/go/pkg/mod/google.golang.org/[email protected]/internal/impl/legacy_export.go:33 +0xa5
github.com/golang/protobuf/proto.RegisterType(0x17c0d60, 0x0, 0x15f55a5, 0x2b)
        /root/go/pkg/mod/github.com/golang/[email protected]/proto/registry.go:186 +0x4d
github.com/colinmarc/hdfs/v2/internal/protocol/hadoop_common.init.22()
        /root/go/pkg/mod/github.com/colinmarc/hdfs/[email protected]/internal/protocol/hadoop_common/TraceAdmin.pb.go:160 +0x4f

goroutine 18 [sleep]:
time.Sleep(0x8bb2c97000)
        /usr/local/go/src/runtime/time.go:188 +0xbf
github.com/juicedata/juicefs/pkg/utils.init.0.func1()
        /mywork/juicefs/pkg/utils/alloc.go:65 +0x30
created by github.com/juicedata/juicefs/pkg/utils.init.0
        /mywork/juicefs/pkg/utils/alloc.go:63 +0x35

goroutine 19 [chan receive]:
github.com/baidubce/bce-sdk-go/util/log.NewLogger.func1(0xc0001163c0)
        /root/go/pkg/mod/github.com/baidubce/[email protected]/util/log/logger.go:362 +0x145
created by github.com/baidubce/bce-sdk-go/util/log.NewLogger
        /root/go/pkg/mod/github.com/baidubce/[email protected]/util/log/logger.go:359 +0xda

goroutine 21 [select]:
go.opencensus.io/stats/view.(*worker).start(0xc00012ca00)
        /root/go/pkg/mod/[email protected]/stats/view/worker.go:154 +0x105
created by go.opencensus.io/stats/view.init.0
        /root/go/pkg/mod/[email protected]/stats/view/worker.go:32 +0x57

What you expected to happen:
The volume mounts normally.
How to reproduce it (as minimally and precisely as possible):
Run the mount command.
Anything else we need to know?:

Environment:

  • JuiceFS version (use ./juicefs --version):
  • 0.9.3-22 (2021-01-25 8fcee47)
  • Cloud provider or hardware configuration running JuiceFS:
  • OS (e.g: cat /etc/os-release):
  • NAME="CentOS Linux"
    VERSION="8 (Core)"
  • Kernel (e.g. uname -a):
  • 4.18.0-167.el8.x86_64 #10 SMP Fri Oct 30 14:35:31 CST 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Object storage (cloud provider and region):
  • Redis info (version, cloud provider managed or self maintained):
    Redis server v=5.0.3 sha=00000000:0 malloc=jemalloc-5.1.0 bits=64 build=28849dbea6f07cc8
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage):
  • Others:

juicefs doesn't remove chunks from S3 after file rewrite

What happened:

juicefs doesn't remove chunks from S3 after file rewrite

What you expected to happen:

juicefs clean up chunks from the previous versions of the file

How to reproduce it (as minimally and precisely as possible):

juicefs mount -d localhost /s3storage
mkdir /s3storage/test/
for i in {1..100}; do dd if=/dev/urandom of=/s3storage/test/testrewrite bs=1M count=1; done

As a result, the S3 bucket has 106 objects and is 106 MB in size.

Anything else we need to know?:

Environment:

  • JuiceFS version (use ./juicefs --version): juicefs version dev (now HEAD) (but it's 0.9.3)
  • Cloud provider or hardware configuration running JuiceFS: DO Spaces
  • OS (e.g: cat /etc/os-release): 20.04.1 LTS
  • Kernel (e.g. uname -a): 5.4.0-51-generic
  • Object storage (cloud provider and region): DigitalOcean NYC
  • Redis info (version, cloud provider managed or self maintained): 5.0.7-2
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): local Redis

Use sync.Map instead of mutex

What would you like to be added:

Use sync.Map instead of mutexes. There are so many lock and unlock operations in our code that it's easy to forget an unlock and cause a deadlock. Instead of a mutex, use sync.Map and atomic variables. Take the following files for example.

prefetch.go
mem_cache.go
.....

Why is this needed:

Speed up using Lua script

What would you like to be added:

Currently, a lookup operation issues two Redis requests; we could reduce that to one request using a Lua script.

When Lua scripts are not supported by the Redis server, we should fall back to the current behavior.
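
A hedged sketch of the idea with redis-cli (the key layout here is illustrative, not the real JuiceFS encoding): resolve a name in a directory hash and fetch the inode attributes in one round trip:

# One EVAL replaces an HGET followed by a GET
redis-cli EVAL "
  local ino = redis.call('HGET', KEYS[1], ARGV[1])
  if not ino then return nil end
  return {ino, redis.call('GET', 'i' .. ino)}
" 1 d1 somefile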

Why is this needed:

Lookup() is called so frequently that we want it to be faster.

Metrics for JuiceFS

What would you like to be added:
A web UI for JuiceFS
Why is this needed:
It makes it easier to monitor file metadata and S3 usage for people who don't know how to use Redis and S3 clients.

Helm Chart for K8S deploy

What would you like to be added:
A Helm Chart for deploying JuiceFS in Kubernetes

Why is this needed:
In my situation, I want to use a Chart for easy deployment and upgrades.
If possible, I can work on this.

XAttr data still exists in meta server after removing the file.

What happened:
The XAttr data is still in Redis after removing the file.

What you expected to happen:
The XAttr data of a deleted file should also disappear.

How to reproduce it (as minimally and precisely as possible):
After running the following commands:

# running in macOS Catalina 
./juicefs format localhost test
sudo ./juicefs mount localhost ~/jfs
cp ~/*.jpg ~/jfs
rm -f ~/jfs/*.jpg

The keys I would expect to see in Redis are:

$ redis-cli keys "*" | sort
i1
nextchunk
nextinode
nextsession
sessions
setting
totalInodes
usedSpace

But there is a lot of junk XAttr data:

$ redis-cli keys "*" | sort
i1
nextchunk
nextinode
nextsession
sessions
setting
totalInodes
usedSpace
x2
x3
x4
x5
x6
x7
x8

Anything else we need to know?:

Environment:

  • JuiceFS version (use ./juicefs --version): 0.9.2 (2021-01-15 d0aa162)
  • Cloud provider or hardware configuration running JuiceFS: MacBook Air (13-inch, 2017)
  • OS (e.g: cat /etc/os-release): macOS 10.15.7
  • Kernel (e.g. uname -a): Darwin Kernel Version 19.6.0
  • Object storage (cloud provider and region): none
  • Redis info (version, cloud provider managed or self maintained): Redis 6.0.5 (00000000/0) 64 bit
  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): localhost
  • Others: none

JuiceFS does not cancel ongoing prefetch requests after file is closed

What happened:
JuiceFS keeps transferring blocks for I/O operations that have been cancelled. During an fio read test of a 4 GB transfer, I cancelled the fio process at ~500 MB because it was way too slow; the JuiceFS process didn't react to the cancelled I/O test and instead kept transferring the blocks from the S3 endpoint.
What you expected to happen:
I expected JuiceFS to stop the I/O and reflect the most recent state. Instead JuiceFS continued the file transfer, ignoring the cancelled I/O request.
How to reproduce it (as minimally and precisely as possible):
format --compress none --force --access-key XXXXXXX --secret-key XXXXXXX --block-size 1024 --storage s3 --bucket=https://xxxxxxxxxx.s3.us-east-1.amazonaws.com REDIS-SEVER benchmark
juicefs mount --max-uploads=150 --io-retries=20 REDIS-SERVER /mnt/aws
fio --name=sequential-read --directory=/mnt/aws --rw=read --refill_buffers --bs=4M --size=4G

Anything else we need to know?:
This was done on a Lenovo X1 7th edition, 16 GB memory, i7-8665U 4-core processor, ethernet hooked up to a Linux router with a 500 Mbit/sec symmetric fiber optic internet connection to Bell Canada.
Environment:

  • JuiceFS version (use ./juicefs --version): juicefs version 0.9.3-34 (2021-01-26 15db788)

  • Cloud provider or hardware configuration running JuiceFS:

  • OS (e.g: cat /etc/os-release):
    NAME="Linux Mint"
    VERSION="20 (Ulyana)"
    ID=linuxmint
    ID_LIKE=ubuntu
    PRETTY_NAME="Linux Mint 20" VERSION_ID="20" UBUNTU_CODENAME=focal

  • Kernel (e.g. uname -a): Linux io 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:16:25 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

  • Object storage (cloud provider and region): aws s3 us-east-1

  • Redis info (version, cloud provider managed or self maintained): Redis server v=5.0.7 sha=00000000:0 malloc=jemalloc-5.2.1 bits=64 build=636cde3b5c7a3923

  • Network connectivity (JuiceFS to Redis, JuiceFS to object storage): 500Mbit/sec fiber

  • Others:

Mount JuiceFS using /etc/fstab

What would you like to be added:

Mount JuiceFS using a rule defined in /etc/fstab, for example,

redis_host    /jfs       juicefs     _netdev     0  0

mount will find /sbin/mount.juicefs and run it as mount.juicefs redis_host /jfs -o _netdev.

We could translate these arguments into the format juicefs expects, at the beginning of juicefs (a sketch follows).
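
A hedged sketch of the pieces involved (paths are examples): mount(8) resolves the fstab type juicefs to /sbin/mount.juicefs, so a symlink plus the fstab line above would be enough once the argument translation exists:

# Assuming the juicefs binary lives in /usr/local/bin
sudo ln -s /usr/local/bin/juicefs /sbin/mount.juicefs
# With the /etc/fstab entry in place:
sudo mount /jfs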

Why is this needed:

We want to mount JuiceFS automatically after machine boot.

Question: Is it required using Redis with persistence enabled?

Since JuiceFS relies on Redis to store file metadata: is it required to enable persistence features such as RDB, AOF, or both, to make sure the metadata won't be lost once the Redis server gets restarted?

P.S. I can't log in to the Slack channel (error: <my-email> doesn't have an account on this workspace.) so I posted my question here.

Support path-style URL for S3 (or S3-compatible) storage

What would you like to be added:

Support path-style URL for S3 (or S3-compatible) storage

Why is this needed:

Currently, JuiceFS only supports virtual hosted-style URLs. The difference between virtual hosted-style and path-style is:

  • Virtual hosted-style: https://<bucket>.s3.<region>.amazonaws.com
  • Path-style: https://s3.<region>.amazonaws.com/<bucket>

Although AWS plans to deprecate path-style URLs in the future, some S3-compatible storage services still use path-style (e.g. Ceph RGW), so we may need to support this URL type (see the sketch below).
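
Under this proposal, formatting against a path-style endpoint might look like the following (hypothetical desired usage; bucket, keys and volume name are placeholders):

# Hypothetical: --bucket given as a path-style URL instead of virtual hosted-style
./juicefs format --storage s3 --bucket https://s3.us-east-1.amazonaws.com/mybucket \
    --access-key xxx --secret-key xxx redis://127.0.0.1:6379/1 myjfs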

Backup tool for meta

What would you like to be added:

Provide a tool to dump metadata in JSON format; then we can have another tool to assemble it and get the data back.
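
Until such a tool exists, a hedged interim approach is a plain Redis snapshot (the dump path is distro-dependent):

redis-cli BGSAVE
# Wait for LASTSAVE to advance, then copy the snapshot somewhere safe
cp /var/lib/redis/dump.rdb /backup/jfs-meta-$(date +%F).rdb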

Why is this needed:

In the worst case, if we lose Redis, we'd like to have a tool to get most of the data back from S3.

Fix errors reported by golangci-lint

What would you like to be added:

Fix errors reported by golangci-lint.

Why is this needed:

After fixing all the existing errors, golangci-lint can be added into the CI pipeline and pre-commit hook.

Add golangci-lint as a CI step

What would you like to be added:

Add golangci-lint as a CI step, preferably after #26 is fixed.

Why is this needed:

golangci-lint helps avoid error-prone patterns in code.

Remove Usage Tracking

For commercial products, tracking user data may be an understandable and acceptable thing. But for open source software... well, I think this will become a very controversial behavior (even if it just collects seemingly harmless data). I don't think people would want the JuiceFS process they run to send any data to endpoints that aren't part of their project.

Therefore, please consider removing any “tracking” behavior. ❤️
