containerd / nydus-snapshotter
A containerd snapshotter with data deduplication and lazy loading in P2P fashion
Home Page: https://nydus.dev/
License: Apache License 2.0
How about adding a subcommand "containerd-nydus-grpc config" to generate a configuration file for containerd-nydus-grpc?
It could get authentication information from the docker/nerdctl configuration.
See point 3 at dragonflyoss/nydus#716
Previously, the commit "erofs: basic support for erofs + fscache daemon" added support for erofs with the fscache daemon.
The config field "DomainID" is unused, since the shared domain feature is not yet implemented in the Linux kernel.
Once the kernel implements this feature, nydus-snapshotter should adapt the DomainID field.
Currently, the nydus image puts an annotation on the manifest to track all blobs referenced by the bootstrap: nydus
This causes containerd's label KV size limit to be exceeded when acceld/buildkit writes a nydus manifest that includes a large number of blobs into the content store: containerd
I noticed that the blobs annotation is only used for blob cache GC in nydus-snapshotter: nydus-snapshotter
A feasible workaround is to use the nydusd socket API to get the blobs in use, instead of the list in the manifest annotation, so we can remove the annotation from acceld/buildkit.
systemctl restart nydus-snapshotter
Then nydusd is terminated, ending up with bash: /bin/ls: Transport endpoint is not connected
There is no need to run twice there.
So that we can be compatible with the OCI image design for the container ecosystem, for example implementing lazy loading for nydus images in buildkit.
Instead of hardcoding it or just relying on a fixed parameter.
When a container image is not used or referenced by a container, nydus-snapshotter is responsible for GC of the blobcache files. But restarting the snapshotter ends up with missing states, and no blobcache file GC is performed.
As an e2e test case comment says:
# After the snapshotter container is stopped, it seems that Nydusd doesn't umount it
# so we need to umount it here, otherwise you cannot delete this directory.
# Frankly, I don't know why Nydusd didn't clean up these resources.
sudo umount -f /var/lib/containerd-test/io.containerd.snapshotter.v1.nydus/mnt
The reason is that nydus-snapshotter always forks a nydusd when it starts, but exits leaving the nydusd unsignaled.
In fact, we can't just terminate a shared-mode nydusd when handling SIGINT and SIGTERM, since it may still serve container images while nydus-snapshotter restarts. But we can terminate nydusd when the snapshotter is aware that no container image is being served.
Currently, nydus snapshotter only supports fetching private registry auth creds from docker config.json. But in the k8s world, there is usually a secret of type kubernetes.io/dockerconfigjson; we could also fetch the creds from that secret by providing a kubeconfig to the snapshotter.
So that users can easily deploy and run it.
When nydusd is down unexpectedly, IsLikelyNotMountPoint cannot work as expected: stat on the fuse mountpoint fails, and the mountpoint cannot be cleaned up when the image is removed. So an IsLikelyNotMountPoint based on the findmnt command may be more suitable.
Here is a possible implementation:
func (m *Mounter) IsLikelyNotMountPoint(file string) (bool, error) {
	file, err := filepath.Abs(file)
	if err != nil {
		return true, err
	}
	cmdPath, err := exec.LookPath("findmnt")
	if err != nil {
		// no findmnt found, judge the mountpoint by device number instead
		log.L.Printf("no findmnt command found, use device to judge")
		return m.isLikelyNotMountPoint(file)
	}
	args := []string{"--types", "fuse", "-o", "target", "--noheadings", "--target", file}
	log.L.Printf("findmnt command: %v %v", cmdPath, args)
	out, err := exec.Command(cmdPath, args...).CombinedOutput()
	if err != nil {
		// if findmnt fails, just claim it's not a mount point
		return true, err
	}
	strOut := strings.TrimSuffix(string(out), "\n")
	log.L.Printf("IsLikelyNotMountPoint findmnt output: %v", strOut)
	if strOut == file {
		return false, nil
	}
	return true, nil
}

func (m *Mounter) isLikelyNotMountPoint(file string) (bool, error) {
	stat, err := os.Stat(file)
	if err != nil {
		return true, err
	}
	rootStat, err := os.Stat(filepath.Dir(strings.TrimSuffix(file, "/")))
	if err != nil {
		return true, err
	}
	// If the directory has a different device than its parent, it is a mountpoint.
	if stat.Sys().(*syscall.Stat_t).Dev != rootStat.Sys().(*syscall.Stat_t).Dev {
		return false, nil
	}
	return true, nil
}
For private images, the auth currently needs to be manually filled into the config template. This experience is very poor; we should support populating it from the docker auth config.
The config file generated by the snapshotter for nydusd looks strange: it is a mixture of fusedev/rafs and fscache settings.
Originally, the config file only configured rafs, not nydusd.
{
  "device": {
    "backend": {
      "type": "registry",
      "config": {
        "readahead": false,
        "host": "xxx.com",
        "repo": "foo/bar",
        "auth": "<AUTH>",
        "scheme": "https",
        "proxy": {
          "fallback": false
        },
        "timeout": 5,
        "connect_timeout": 5,
        "retry_limit": 2
      }
    },
    "cache": {
      "type": "blobcache",
      "config": {
        "work_dir": "/var/lib/containerd-nydus-grpc/cache",
        "disable_indexed_map": false
      }
    }
  },
  "mode": "direct",
  "digest_validate": false,
  "enable_xattr": true,
  "fs_prefetch": {
    "enable": true,
    "prefetch_all": true,
    "threads_count": 4,
    "merging_size": 0,
    "bandwidth_rate": 0
  },
  "type": "",
  "id": "",
  "domain_id": "",
  "config": {
    "id": "",
    "backend_type": "",
    "backend_config": {
      "readahead": false,
      "proxy": {
        "fallback": false
      }
    },
    "cache_type": "",
    "cache_config": {
      "work_dir": ""
    },
    "metadata_path": ""
  }
}
When removing container images with nerdctl image prune --all, only the data blob files are deleted while the meta files are left behind.
Sometimes we see a mount failure like:
failed to mount daemon AsQl4TceS5aaKIsu33yd6g: failed to shared mount: http response: 503, error code: , error message:
No error code, no error message, just the 503 status code.
Let nydus-snapshotter report more accurate disk usage.
When performing ctr snapshot --snapshotter nydus usage, the total usage of all layers is not accurate.
Nydus supports several different types of storage backend: "localfs", "oss" and "registry".
Since users may use different backends for different images to meet different scenarios, it'd be better to have nydus-snapshotter support multiple nydus configurations in one shot.
That being said, choosing a specific nydus config is not straightforward, as it depends on a container image's metadata.
Maybe we can leave an annotation in bootstrap.
When runtime starts nydusd, it has to provide necessary information to nydusd like configuration or auth.
The current Mount struct does not accommodate such information, so nydus has to provide a mount helper binary on the host.
Another method to address this is to pass the extra information to the runtime via the Mount struct.
At present, our snapshotter downloads the image through the Go standard http client. If the download fails, it does not retry; ideally we should retry internally. We could replace the standard http client with go-retryablehttp.
Unlike fusedev mode, nydusd here has no logging file to store messages. We'd better keep this consistent with fusedev mode; otherwise it is hard for us to maintain and investigate.
If nydus-snapshotter has been forcibly terminated, there may be intermediate files left on disk, causing inconsistent system state. So we should use the temp file + rename pattern to ensure atomic file operations.
The current CI is missing e2e tests. We should run end-to-end tests to make sure changes work.
Currently the downloading of layers (including the bootstrap layer) is done entirely by the snapshotter, not containerd, so we need to consider these:
Maybe we can support a CRI-like configuration like this:
Use reflink to optimize the file copy operation, for example to copy the stargz blob.meta file.
Now it's set to a fixed value of 10:
nydus-snapshotter/pkg/process/manager.go
Lines 133 to 138 in 672434e
In my environment, nydus-snapshotter's log size has reached 80GB. We should rotate it when it exceeds a certain size.
So nydus snapshotter can be started as a systemd-based service. Users won't need to restart it manually after rebooting.
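A minimal sketch of such a unit file; the binary path, config path, and flag below are assumptions to adjust to the actual installation:

```ini
# /etc/systemd/system/nydus-snapshotter.service (sketch; paths are assumptions)
[Unit]
Description=nydus snapshotter
After=network.target

[Service]
Type=simple
ExecStart=/usr/local/bin/containerd-nydus-grpc --config-path /etc/nydus/config.json
Restart=always

[Install]
WantedBy=multi-user.target
```

Then systemctl enable --now nydus-snapshotter would start it at boot.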
Once nydus-snapshotter has been started in either multiple or single daemon mode, the daemon mode cannot be changed from what was set at the last startup.
This is not very user-friendly.
At present, nydus-snapshotter is configured by its command line parameters, some of which are passed on to the nydusd daemon.
At the same time, users have to provide a nydusd JSON configuration template. It's a minimal version of the nydusd JSON configuration, which nydus-snapshotter enriches with necessary extra information like registry auth. This is not very friendly to end users, especially since some items in the JSON file might be overwritten by nydus-snapshotter.
On the other hand, nydusd's configuration file is going to evolve to its next version, which means nydus-snapshotter's configuration loading and parsing logic has to adapt to it. Moreover, with a configuration file we would not have to change the systemd service unit file when we want to change nydus-snapshotter's work mode and parameters.
I am proposing a TOML format nydus-snapshotter configuration file:
cleanup_on_close = false
enable_stargz = false
root = "/var/lib/containerd-nydus"
version = 1

[binaries]
nydusd_path = "/usr/local/bin/nydusd"
nydusimage_path = "/usr/local/bin/nydus-image"

[log]
# Snapshotter's log level
level = "info"
log_rotate_compress = true
log_rotate_local_time = true
log_rotate_max_age =
log_rotate_max_backups =
log_rotate_max_size =
log_to_stdout = false

[system]
collect_metrics = false
# Management API server unix domain socket path
socket =

[remote.auth]
enable_kubeconfig_keychain = false
kubeconfig_path = "/home/foo/.kube"

[snapshot]
enable_nydus_overlayfs = false
sync_remove = false

[daemon]
# fuse or fscache
fs_driver = "fuse"
# Specify nydusd log level
log_level = "info"
# How to process when daemon dies: "none", "restart", "failover"
recover_policy = "restart"
# Specify a configuration template file
template_path = ""

# Configuration of remote backend storage. fuse and fscache
# can share the same backend configuration.
[daemon.storage]
connect_timeout = 5
# NOTE: mirrors and proxy can't be set at the same time
mirrors = [{host = , headers = , auth_through = }]
# proxy =
disable_indexed_map = false
# Container images data can be cached locally
enable_cache = true
prefetch_config = {enable = true, threads_count = 8, merging_size = 1048576}
retry_limit = 2
scheme = "https"
timeout = 5
type = "registry"

[daemon.fuse]
# Loading rafs metadata mode
digest_validate = false
enable_xattr = true
iostats_files = false
mode = "direct"

# Nydusd works as a fscache/cachefiles userspace daemon
[daemon.fscache]
config = {cache_type = "fscache"}
type = "bootstrap"

[cache_manager]
enable = true
gc_period = "24h"

[image]
public_key_file = "/path/to/key/file"
validate_signature = true
As containerd config default produces a containerd.toml in version 2 format for the latest containerd, README.md should include how to set up the nydus working env with containerd.toml version 2.
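For containerd config version 2, registering an external snapshotter goes through the proxy_plugins section; a sketch, assuming the default nydus-snapshotter socket path shown below:

```toml
# /etc/containerd/config.toml
version = 2

[proxy_plugins]
  [proxy_plugins.nydus]
    type = "snapshot"
    address = "/run/containerd-nydus/containerd-nydus-grpc.sock"

[plugins."io.containerd.grpc.v1.cri".containerd]
  snapshotter = "nydus"
  # keep snapshot annotations so remote snapshotters see layer info
  disable_snapshot_annotations = false
```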
Now users can download nydus-snapshotter from image-service, but the nydus-snapshotter repo does not have any release yet.
Related to kata-containers/tests#4446
Duplicate one here from dragonflyoss/nydus#638
For now View doesn’t work as expected.
This issue is originally opened from dragonflyoss/nydus#239.
As nydus-snapshotter has migrated here, we also open an issue here to track the following events.
Nydusd is only started when preparing the uppermost writable layer, which means pulling images won't bring up nydusd.
This introduces container startup latency.
At present, nydusd's log is not output to the log file. When a panic occurs in nydusd, the panic information is lost. Therefore, stdout and stderr need to be output to the log file as well.
args = append(args, "--apisock", d.GetAPISock())
args = append(args, "--log-level", d.LogLevel)
if !d.LogToStdout {
	args = append(args, "--log-file", d.LogFile())
}
log.L.Infof("start nydus daemon: %s %s", m.nydusdBinaryPath, strings.Join(args, " "))
cmd := exec.Command(m.nydusdBinaryPath, args...)
if d.LogToStdout {
	cmd.Stdout = os.Stdout
	cmd.Stderr = os.Stdout
}
return cmd, nil
We can wrap the dest writer with bufio to improve performance when writing blobs to the containerd content store.
Somehow, nydusd fails at startup; for example, the cachefiles dev is busy. The snapshotter still records it into the DB and does not reap the zombie process.
Most errors returned in pkg/nydussdk/client.go don't have a context, so it's hard to determine where the error is returned.
Skip merging when the stargz image has only one layer; we could just copy that layer's bootstrap file to image.boot.
https://github.com/containerd/rust-extensions/tree/main/crates/snapshots
We'd better fill the "scheme": "https" field as the default in the daemon config, but still fall back to "scheme": "http" by pre-checking the registry server.
Nydusd supports starting in localfs mode; in other words, the blob layers are placed on the local file system in advance. But at present, we do not provide any auxiliary ability to put the blob layers into the corresponding directory, so we have to manually download the blob layers from the registry and extract them into that directory. Obviously, this is complicated to operate and maintain. We can support this scenario through nydus-snapshotter: the snapshotter downloads the blob layers from the registry and puts them into the blob cache directory configured by nydusd.
Support to pack container writable layer into nydus blob format, and generate a new image in containerd content store.
The snapshotter works as the server of the CRI image service and proxies the request to containerd, so that we can get the auth of the private registry.
The snapshotter had better have a mechanism to monitor the nydusd daemons. If a nydusd dies somehow, nydus-snapshotter should be notified.
Nydus-snapshotter's option config-path is required now, while actually only the registry/OSS auth has to be passed to nydusd.
But nydus-snapshotter can now take the auth from the local host's docker configuration, which means it can make up a complete JSON configuration file for nydusd itself. So end users could skip the configuration step entirely, which would be convenient.
&cli.StringFlag{
	Name:        "config-path",
	Required:    true,
	Usage:       "path to the configuration file",
	Destination: &args.ConfigPath,
},
sudo nerdctl --snapshotter nydus run -it --net none gechangwei/python:3.7-nydus bash
FATA[0001] wait until daemon ready by checking status: failed to check status: failed to create new nydus client: failed to build transport for nydus client: stat /var/lib/containerd-nydus-grpc/socket/1_jWaFHnQcezLGcQdXDfwg/api.sock: no such file or directory: unknown
In fact, the socket file exists.