nydus-snapshotter's Issues

Support shared domain for erofs + fscache daemon

Previously, the commit "erofs: basic support for erofs + fscache daemon" added support for erofs with the fscache daemon.
Since the shared domain feature is not yet implemented in the Linux kernel, the config field DomainID is unused.
Once the kernel implements this feature, nydus-snapshotter should adapt to use the DomainID field.

The blobs annotation in manifest should be deprecated

Currently, a nydus image puts an annotation in the manifest to track all blobs referenced by the bootstrap: nydus

This causes containerd's label key-value size limit to be exceeded when acceld/buildkit writes a nydus manifest that includes a large number of blobs into the content store: containerd

I noticed that the blobs annotation is only used for blob cache GC in nydus-snapshotter: nydus-snapshotter

A feasible workaround is to use the nydusd socket API to get the blobs in use, instead of the list in the manifest annotation, so that we can remove the annotation from acceld/buildkit.

sharedMode snapshotter doesn't shutdown nydusd when handling os signal INT and TERM

As an e2e test case comment says:

          # After the snapshotter container is stopped, it seems that Nydusd doesn't umount it
          # so we need to umount it here, otherwise you cannot delete this directory. 
          # Frankly, I don't know why Nydusd didn't clean up these resources.
          sudo umount -f /var/lib/containerd-test/io.containerd.snapshotter.v1.nydus/mnt

The reason is that nydus-snapshotter always forks a nydusd when it starts, but exits without signaling that nydusd.

In fact, we can't simply terminate the sharedMode nydusd when handling SIGINT and SIGTERM, since it can still serve container images while nydus-snapshotter restarts. But we can terminate nydusd when the snapshotter is aware that no container image is being served.

use `findmnt` to judge `IsLikelyNotMountPoint`

When nydusd goes down unexpectedly, IsLikelyNotMountPoint does not work as expected: stat on the FUSE mountpoint fails, so the FUSE mountpoint cannot be cleaned up when the image is removed. So I think it may be more suitable for IsLikelyNotMountPoint to use the findmnt command.

Something like this could serve as a reference:

func (m *Mounter) IsLikelyNotMountPoint(file string) (bool, error) {
	file, err := filepath.Abs(file)
	if err != nil {
		return true, err
	}
	cmdPath, err := exec.LookPath("findmnt")
	if err != nil {
		// no findmnt found, judge mountpoint by device
		log.L.Printf("no findmnt command found, use device to judge")
		return m.isLikelyNotMountPoint(file)
	}
	args := []string{"--types", "fuse", "-o", "target", "--noheadings", "--target", file}
	log.L.Printf("findmnt command: %v %v", cmdPath, args)

	out, err := exec.Command(cmdPath, args...).CombinedOutput()
	if err != nil {
		// if findmnt fails, just claim it's not a mount point
		return true, err
	}
	strOut := strings.TrimSuffix(string(out), "\n")
	log.L.Printf("IsLikelyNotMountPoint findmnt output: %v", strOut)
	if strOut == file {
		return false, nil
	}

	return true, nil
}

func (m *Mounter) isLikelyNotMountPoint(file string) (bool, error) {
	stat, err := os.Stat(file)
	if err != nil {
		return true, err
	}
	rootStat, err := os.Stat(filepath.Dir(strings.TrimSuffix(file, "/")))
	if err != nil {
		return true, err
	}
	// If the directory has a different device as parent, then it is a mountpoint.
	if stat.Sys().(*syscall.Stat_t).Dev != rootStat.Sys().(*syscall.Stat_t).Dev {
		return false, nil
	}

	return true, nil
}

Config file given to nydusd looks strange

The config file generated by the snapshotter and given to nydusd looks strange. It looks like a mixture of fusedev/rafs and fscache configuration.
Originally, the config file only configured rafs, not nydusd.

{
  "device": {
    "backend": {
      "type": "registry",
      "config": {
        "readahead": false,
        "host": "xxx.com",
        "repo": "foor/bar",
        "auth": "<AUTH>",
        "scheme": "https",
        "proxy": {
          "fallback": false
        },
        "timeout": 5,
        "connect_timeout": 5,
        "retry_limit": 2
      }
    },
    "cache": {
      "type": "blobcache",
      "config": {
        "work_dir": "/var/lib/containerd-nydus-grpc/cache",
        "disable_indexed_map": false
      }
    }
  },
  "mode": "direct",
  "digest_validate": false,
  "enable_xattr": true,
  "fs_prefetch": {
    "enable": true,
    "prefetch_all": true,
    "threads_count": 4,
    "merging_size": 0,
    "bandwidth_rate": 0
  },
  "type": "",
  "id": "",
  "domain_id": "",
  "config": {
    "id": "",
    "backend_type": "",
    "backend_config": {
      "readahead": false,
      "proxy": {
        "fallback": false
      }
    },
    "cache_type": "",
    "cache_config": {
      "work_dir": ""
    },
    "metadata_path": ""
  }
}

report nydus image usage

Let nydus-snapshotter report more accurate disk usage.

When running ctr snapshot --snapshotter nydus usage, the total usage of all layers is not accurate.

support different nydus configurations

Nydus supports several types of storage backend: "localfs", "oss" and "registry".

Since users may use different backends for different images to meet different scenarios, it would be better for nydus-snapshotter to support multiple nydus configurations at once.

That said, choosing a specific nydus config is not straightforward, as it depends on a container image's metadata.
Maybe we can leave an annotation in the bootstrap.

Try to enhance Mount struct of containerd

When the runtime starts nydusd, it has to provide nydusd with necessary information such as configuration or auth.
The current Mount struct does not accommodate such information, so nydus has to provide a mount helper binary on the host.
Another way to address this is to pass the necessary information to the runtime via the Mount struct.

Replace standard http client to go-retryablehttp

At present, the snapshotter downloads images through the Go standard http client, and a failed download is not retried. Ideally we should retry internally, which we can do by replacing the standard http client with go-retryablehttp.
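
To illustrate the behavior being asked for, here is a minimal sketch of a retrying GET built only on the Go standard library. It is a stand-in, not the go-retryablehttp API: the function getWithRetry, its parameters, and the retry policy (retry on transport errors and 5xx responses, doubling the wait each time) are assumptions made for this example. go-retryablehttp packages comparable logic behind a client type, so the snapshotter would not need to hand-roll a loop like this.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
	"time"
)

// getWithRetry (hypothetical helper) issues a GET and retries on transport
// errors and 5xx responses, doubling the wait between attempts.
// It performs at most retryMax+1 attempts.
func getWithRetry(client *http.Client, url string, retryMax int, wait time.Duration) ([]byte, error) {
	var lastErr error
	for attempt := 0; attempt <= retryMax; attempt++ {
		if attempt > 0 {
			time.Sleep(wait)
			wait *= 2
		}
		resp, err := client.Get(url)
		if err != nil {
			lastErr = err
			continue
		}
		body, err := io.ReadAll(resp.Body)
		resp.Body.Close()
		if err != nil {
			lastErr = err
			continue
		}
		if resp.StatusCode >= 500 {
			lastErr = fmt.Errorf("server error: %s", resp.Status)
			continue
		}
		if resp.StatusCode != http.StatusOK {
			// Client errors such as 401 or 404 are not worth retrying.
			return nil, fmt.Errorf("unexpected status: %s", resp.Status)
		}
		return body, nil
	}
	return nil, fmt.Errorf("giving up after %d attempts: %w", retryMax+1, lastErr)
}

// exampleRetry starts a local test server that fails twice before
// succeeding, and returns the body plus the number of requests it received.
func exampleRetry() (string, int, error) {
	var calls int32
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if atomic.AddInt32(&calls, 1) <= 2 {
			w.WriteHeader(http.StatusServiceUnavailable)
			return
		}
		fmt.Fprint(w, "layer-data")
	}))
	defer srv.Close()

	body, err := getWithRetry(srv.Client(), srv.URL, 3, 10*time.Millisecond)
	return string(body), int(atomic.LoadInt32(&calls)), err
}

func main() {
	body, calls, err := exampleRetry()
	fmt.Println(body, calls, err)
}
```

The retry count and wait here mirror the kind of knobs (retry limit, timeout) that already appear in the nydusd backend configuration, but the values are arbitrary.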

logs of nydusd in fscache mode is missing

Unlike fusedev mode, in which nydusd has a log file to store messages, fscache mode has none. We'd better keep this consistent with fusedev mode.
Otherwise it is hard for us to maintain and investigate.

Use temp file/rename to ensure atomic file ops

If nydus-snapshotter is forcibly terminated, intermediate files may be left on disk, causing an inconsistent system state. So we should write to a temp file and rename it to ensure atomic file operations.
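
A minimal sketch of the temp file/rename pattern in Go follows. The helper name writeFileAtomic and the file name daemon.json are made up for this example and are not taken from the nydus-snapshotter code. The key points are that the temp file is created in the same directory as the target (so the rename stays on one file system) and that it is synced before the rename.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeFileAtomic (hypothetical helper) writes data to a temporary file in
// the same directory as path and then renames it over path. On POSIX file
// systems a rename within one file system is atomic, so readers see either
// the old content or the new content, never a partially written file.
func writeFileAtomic(path string, data []byte, perm os.FileMode) (err error) {
	dir := filepath.Dir(path)
	tmp, err := os.CreateTemp(dir, filepath.Base(path)+".tmp-*")
	if err != nil {
		return fmt.Errorf("create temp file in %s: %w", dir, err)
	}
	// On any failure, close and remove the temp file so that no
	// intermediate file is left behind.
	defer func() {
		if err != nil {
			tmp.Close()
			os.Remove(tmp.Name())
		}
	}()

	if _, err = tmp.Write(data); err != nil {
		return fmt.Errorf("write %s: %w", tmp.Name(), err)
	}
	if err = tmp.Chmod(perm); err != nil {
		return fmt.Errorf("chmod %s: %w", tmp.Name(), err)
	}
	// Flush file content to disk before it becomes visible under path.
	if err = tmp.Sync(); err != nil {
		return fmt.Errorf("sync %s: %w", tmp.Name(), err)
	}
	if err = tmp.Close(); err != nil {
		return fmt.Errorf("close %s: %w", tmp.Name(), err)
	}
	if err = os.Rename(tmp.Name(), path); err != nil {
		return fmt.Errorf("rename %s to %s: %w", tmp.Name(), path, err)
	}
	return nil
}

// exampleAtomicWrite writes a file twice in a scratch directory and returns
// the final content.
func exampleAtomicWrite() (string, error) {
	dir, err := os.MkdirTemp("", "atomic-demo-")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "daemon.json")
	if err := writeFileAtomic(path, []byte(`{"id":"old"}`), 0o644); err != nil {
		return "", err
	}
	if err := writeFileAtomic(path, []byte(`{"id":"new"}`), 0o644); err != nil {
		return "", err
	}
	got, err := os.ReadFile(path)
	return string(got), err
}

func main() {
	content, err := exampleAtomicWrite()
	fmt.Println(content, err)
}
```

If the process is killed before the rename, the worst case is a stray temp file with a recognizable suffix, which a startup cleanup pass could remove; the target file itself is never left half-written.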

Add end-to-end CI

The current CI is missing e2e tests. We should run end-to-end tests to make sure changes work.

Provide CRI like configuration for registry access

Currently the downloading of layers (including the bootstrap layer) is done entirely by snapshotter, not containerd, so we need to consider these:

  1. registry mirror support;
  2. choose http/https registry scheme;
  3. client https tls cert support;
  4. retryable http request;
  5. registry auth for multiple registries;
  6. retryable multiple mirrors;

Maybe we can support a CRI-like configuration like this:

image

Implement hybrid mode of nydus-snapshotter

Once nydus-snapshotter has been started in multiple or single daemon mode, the daemon mode cannot be changed from the one set at the last startup.
This is not very user-friendly.

Introduce nydus-snapshotter toml configuration file

At present, nydus-snapshotter is configured by its command line parameters, some of which are passed to the nydusd daemon.
At the same time, users have to provide a nydusd JSON configuration template. It's a minimal version of the nydusd JSON configuration, which nydus-snapshotter enriches with necessary extra information like registry auth. This is not very friendly to end users, especially since some items in the JSON file might be overwritten by nydus-snapshotter.
On the other hand, nydusd's configuration file is going to evolve to its next version, which means nydus-snapshotter's configuration loading and parsing logic has to adapt to it. With a configuration file, we also don't have to change the systemd service unit file when we want to change nydus-snapshotter's work mode and parameters.

I am proposing a TOML format nydus-snapshotter configuration file:

cleanup_on_close = false
enable_stargz = false
root = "/var/lib/containerd-nydus"
version = 1

[binaries]
nydusd_path = "/usr/local/bin/nydusd"
nydusimage_path = "/usr/local/bin/nydus-image"

[log]
# Snapshotter's log level
level = "info"
log_rotate_compress = true
log_rotate_local_time = true
log_rotate_max_age = 
log_rotate_max_backups = 
log_rotate_max_size = 
log_to_stdout = false

[system]
collect_metrics = false
# Management API server unix domain socket path
socket = 

[remote.auth]
enable_kubeconfig_keychain = false
kubeconfig_path = "/home/foo/.kube"

[snapshot]
enable_nydus_overlayfs = false
sync_remove = false

[daemon]
# fuse or fscache
fs_driver = "fuse"
# Specify nydusd log level
log_level = "info"
# How to process when daemon dies: "none", "restart", "failover"
recover_policy = "restart"
# Specify a configuration template file
template_path = ""

# configuration of remote backend storage. fuse and fscache 
# can share the same backend configuration.
[daemon.storage]
connect_timeout = 5
#  NOTE: mirrors and proxy can't be set at the same time
mirrors = [{host = , headers = , auth_through = }]
# proxy =
disable_indexed_map = false
# container images data can be cached locally
enable_cache = true
prefetch_config = {enable = true, threads_count = 8, merging_size = 1048576}
retry_limit = 2
scheme = "https"
timeout = 5
type = "registry"

[daemon.fuse]
# loading rafs metadata mode
digest_validate = false
enable_xattr = true
iostats_files = false
mode = "direct"

# Nydusd works as a fscache/cachefiles userspace daemon
[daemon.fscache]
config = {cache_type = "fscache"}
type = "bootstrap"

[cache_manager]
enable = true
gc_period = "24h"

[image]
public_key_file = "/path/to/key/file"
validate_signature = true

containerd.toml version 2 support

Since containerd config default produces a version 2 containerd.toml for the latest containerd, README.md should explain how to set up the nydus working environment for containerd.toml version 2.

Log nydusd's stderr and stdout to nydus-snapshotter log

At present, nydusd's stdout and stderr are not written to the log file. When nydusd panics, the panic information is lost. Therefore, it is necessary to write stdout and stderr to the log file as well.

	args = append(args, "--apisock", d.GetAPISock())
	args = append(args, "--log-level", d.LogLevel)
	if !d.LogToStdout {
		args = append(args, "--log-file", d.LogFile())
	}

	log.L.Infof("start nydus daemon: %s %s", m.nydusdBinaryPath, strings.Join(args, " "))

	cmd := exec.Command(m.nydusdBinaryPath, args...)
	if d.LogToStdout {
		cmd.Stdout = os.Stdout
		cmd.Stderr = os.Stdout
	}
	return cmd, nil

nydus sdk lack of error context

Most errors returned in pkg/nydussdk/client.go carry no context, so it's hard to determine where an error originated.
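
One common way to add such context in Go is to wrap each error with fmt.Errorf and the %w verb at every layer. The sketch below shows the idea; checkSocket and newClient are hypothetical functions written for this example and are not the actual functions in pkg/nydussdk/client.go.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
)

// checkSocket (hypothetical) verifies that the API socket path exists and
// says which operation failed and on which path.
func checkSocket(path string) error {
	if _, err := os.Stat(path); err != nil {
		return fmt.Errorf("stat api socket %s: %w", path, err)
	}
	return nil
}

// newClient (hypothetical) adds its own layer of context on top.
func newClient(sock string) error {
	if err := checkSocket(sock); err != nil {
		return fmt.Errorf("build transport for nydus client: %w", err)
	}
	return nil
}

// exampleErrContext returns the full error message for a missing socket and
// whether the wrapped chain still matches fs.ErrNotExist.
func exampleErrContext() (string, bool) {
	err := newClient("/nonexistent/api.sock")
	if err == nil {
		return "", false
	}
	return err.Error(), errors.Is(err, fs.ErrNotExist)
}

func main() {
	msg, notExist := exampleErrContext()
	fmt.Println(msg)
	fmt.Println("is not-exist:", notExist)
}
```

Because %w keeps the original error in the chain, callers can still branch on the underlying cause with errors.Is or errors.As while the message records each step the error passed through.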

Support download blob layers to cache dir

nydusd can be started in localfs mode, in which the blob layers are placed on the local file system in advance. At present, however, we provide no helper for putting the blob layers into the corresponding directory, so users have to manually download them from the registry and extract them there. This is clearly cumbersome to operate and maintain. nydus-snapshotter could support this scenario by downloading the blob layers from the registry and putting them into the blob cache directory configured for nydusd.

Monitor nydusd

The snapshotter should have a mechanism to monitor nydusd processes. If a nydusd dies for some reason, nydus-snapshotter should be notified.
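
One possible shape for such a mechanism, for daemons that the snapshotter forked itself, is to wait on each child process in a goroutine and publish an event when it exits. The sketch below assumes that setup; the exitEvent type and the watch function are invented for this example, and a short-lived sh command stands in for nydusd. Daemons that outlive a snapshotter restart are no longer child processes, so they would need a different approach, which this sketch does not cover.

```go
package main

import (
	"fmt"
	"os/exec"
)

// exitEvent (hypothetical type) describes a daemon process that has exited.
type exitEvent struct {
	ID  string // daemon identifier chosen by the caller
	Err error  // nil on a clean exit, otherwise the error from Wait
}

// watch (hypothetical helper) waits for an already started command in a
// goroutine and sends an exitEvent once the process terminates.
func watch(id string, cmd *exec.Cmd, events chan<- exitEvent) {
	go func() {
		err := cmd.Wait()
		events <- exitEvent{ID: id, Err: err}
	}()
}

// exampleMonitor starts a process that exits with a non-zero status and
// returns the event produced by the watcher.
func exampleMonitor() (exitEvent, error) {
	// Stand-in for nydusd: a shell that exits immediately with status 3.
	cmd := exec.Command("sh", "-c", "exit 3")
	if err := cmd.Start(); err != nil {
		return exitEvent{}, err
	}
	events := make(chan exitEvent, 1)
	watch("daemon-1", cmd, events)
	return <-events, nil
}

func main() {
	ev, err := exampleMonitor()
	if err != nil {
		fmt.Println("start failed:", err)
		return
	}
	fmt.Printf("daemon %s exited: %v\n", ev.ID, ev.Err)
}
```

A consumer of the events channel could then apply a recovery policy, for example restarting the daemon or marking its snapshots as unavailable.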

Make nydus-snapshotter config-path as optional

nydus-snapshotter's config-path option is currently required, although only the registry/OSS auth actually has to be passed to nydusd.
But nydus-snapshotter can now take auth from the local host's docker configuration. This means nydus-snapshotter can build a complete JSON configuration file for nydusd by itself, so end users can conveniently skip the configuration step.

		&cli.StringFlag{
			Name:        "config-path",
			Required:    true,
			Usage:       "path to the configuration file",
			Destination: &args.ConfigPath,
		},

Snapshotter occasionally report error message

sudo nerdctl --snapshotter nydus   run -it --net none gechangwei/python:3.7-nydus bash
FATA[0001] wait until daemon ready by checking status: failed to check status: failed to create new nydus client: failed to build transport for nydus client: stat /var/lib/containerd-nydus-grpc/socket/1_jWaFHnQcezLGcQdXDfwg/api.sock: no such file or directory: unknown

In fact, the socket file exists.
