squashfs's Introduction

squashfs

A pure Go library for reading squashfs. There are currently no plans to add archive creation support, as it will almost always be better to just call mksquashfs. I could see some possible use cases, but I probably won't spend time on it unless it's requested (open a discussion if you want this feature).

The library has two parts. github.com/CalebQ42/squashfs is the easy-to-use part: it implements the io/fs interfaces and doesn't expose unnecessary information. 95% of the time this is the library you want. If you need lower-level access to the information, use github.com/CalebQ42/squashfs/low, where far more information is exposed.

Currently has support for reading squashfs files and extracting files and folders.

Special thanks to https://dr-emann.github.io/squashfs/ for some VERY important information in an easy to understand format. Thanks also to distri's squashfs library as I referenced it to figure some things out (and double check others).

FUSE

As of v1.0, FUSE capabilities have been moved to a separate library.

Limitations

  • No Xattr parsing.
  • Socket files are not extracted.
    • From my research, it seems like a socket file would be useless if it could be created.
  • Fifo files are ignored on darwin

Issues

  • Significantly slower than unsquashfs with nested images
    • This seems to be related to the above, along with the general optimization of unsquashfs and its compression libraries.
      • Not to mention it's written in C
    • Times seem to be largely dependent on file tree size and compression type.
      • My main testing image (~100MB) using Zstd takes about 5x longer.
      • An Arch Linux airootfs image (~780MB) using XZ compression with LZMA filters takes about 30x longer.
      • A Tensorflow docker image (~3.3GB) using Zstd takes about 12x longer.

Note: These numbers are using FastOptions(). DefaultOptions() takes about 2x longer.

Recommendations on Usage

Due to the above performance considerations, this library should only be used to access files within the archive without extraction, or to mount it via FUSE.

  • Neither of these use cases is largely affected by the issue above.

squashfs's Issues

Library re-organisation into pure squashfs library and fuse implementation

I'm considering using this library in rclone.

It works very well according to my tests.

However I find that it imports more stuff than I'd like ideally. Specifically to use the squashfs library as an archiver (like archive/zip) pulls in github.com/CalebQ42/fuse and github.com/seaweedfs/fuse which I don't need and will take up space in my binary.

It looks possible to re-arrange the library so it has an internal "pure" squashfs implementation then add the fuse implementation on top of that.

I think it could even be done in a backwards compatible way with some cunning use of type aliases, though it might be cleaner to declare a v2.

What do you think?
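As a rough illustration of the type-alias idea, here is a minimal sketch. The names coreReader, Reader, and describe are hypothetical and do not come from the library; the point is only that a Go alias (type A = B) names the same type rather than a new one, so existing callers keep compiling when the implementation moves to another package.

```go
package main

import "fmt"

// coreReader stands in for a hypothetical "pure" implementation that,
// in a real re-organisation, would live in its own package.
type coreReader struct{ name string }

// Name reports which archive the reader was opened on.
func (r *coreReader) Name() string { return r.name }

// Reader is an alias, not a new type: *Reader and *coreReader are
// identical, so code written against the old name keeps working.
type Reader = coreReader

// describe is written against the old public name.
func describe(r *Reader) string { return "reader for " + r.Name() }

func main() {
	r := &coreReader{name: "my.squashfs"}
	fmt.Println(describe(r)) // accepted without conversion
}
```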

extract a single file ?

How can I extract a specific file with this library? I have a big squashfs file, but I only want to extract a small file from it. We can do that with unsquashfs like this:

unsquashfs my.squashfs -extract-file meta/snap.yaml
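A minimal sketch of what single-file extraction could look like, assuming the archive is exposed as an io/fs file system (the README says the top-level package implements the io/fs interfaces). Here testing/fstest.MapFS stands in for the opened archive so the example runs without an image; extractFile and demoExtract are hypothetical helper names, not part of the library.

```go
package main

import (
	"fmt"
	"io"
	"io/fs"
	"os"
	"path"
	"path/filepath"
	"testing/fstest"
)

// extractFile copies one file out of any fs.FS into destDir, keeping its
// base name, and returns the path it wrote.
func extractFile(fsys fs.FS, name, destDir string) (string, error) {
	src, err := fsys.Open(name)
	if err != nil {
		return "", fmt.Errorf("open %s: %w", name, err)
	}
	defer src.Close()

	// fs.FS names are slash-separated, so take the base with package path.
	outPath := filepath.Join(destDir, path.Base(name))
	dst, err := os.Create(outPath)
	if err != nil {
		return "", err
	}
	if _, err := io.Copy(dst, src); err != nil {
		dst.Close()
		return "", err
	}
	return outPath, dst.Close()
}

// demoExtract runs extractFile against an in-memory stand-in archive and
// returns the contents of the extracted file.
func demoExtract() (string, error) {
	// Stand-in for the archive; a real caller would pass the squashfs FS.
	archive := fstest.MapFS{
		"meta/snap.yaml": {Data: []byte("name: demo\n")},
		"bin/big":        {Data: make([]byte, 1<<20)},
	}
	dir, err := os.MkdirTemp("", "extract")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)

	out, err := extractFile(archive, "meta/snap.yaml", dir)
	if err != nil {
		return "", err
	}
	data, err := os.ReadFile(out)
	return string(data), err
}

func main() {
	got, err := demoExtract()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(got)
}
```

With a real image, the fsys argument would be whatever fs.FS value the library hands back for the opened archive; the exact accessor depends on the library version.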

Wrong path separator used while reading the squashfs?

Hi there,

Thanks for your nifty library and it's been a real help for me.

However, something goes wrong when reading a squashfs on Windows hosts.

After some investigation, it seems line 55 here

squashfs/reader_fs.go

Lines 43 to 60 in 54d193a

func (f FS) OpenFile(name string) (*File, error) {
	name = filepath.Clean(name)
	if !fs.ValidPath(name) {
		return nil, &fs.PathError{
			Op:   "open",
			Path: name,
			Err:  fs.ErrInvalid,
		}
	}
	if name == "." || name == "" {
		return f.File, nil
	}
	split := strings.Split(name, "/")
	for i := range f.e {
		if f.e[i].Name != split[0] {
			continue
		}
		if len(split) > 1 && f.e[i].Type != inode.Dir {

should be changed to

	split := strings.Split(name, string(os.PathSeparator))

On Windows, filepath.Clean rewrites path separators to \ even if the path was passed with /.
So I think it's better to split the path by the OS-specific separator rather than hardcoding /.

What do you think?

BRs.
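An alternative to splitting on os.PathSeparator, sketched under the assumption that names reaching OpenFile follow the io/fs convention of slash-separated paths on every OS: clean the name with package path instead of path/filepath, so the separator never changes and splitting on "/" stays correct. splitFSPath is a hypothetical helper, not the library's code.

```go
package main

import (
	"fmt"
	"io/fs"
	"path"
	"strings"
)

// splitFSPath cleans an io/fs-style name and splits it into elements.
// path.Clean always works with "/", whereas filepath.Clean converts
// slashes to the host separator (a backslash on Windows).
func splitFSPath(name string) ([]string, error) {
	name = path.Clean(name)
	if !fs.ValidPath(name) {
		return nil, &fs.PathError{Op: "open", Path: name, Err: fs.ErrInvalid}
	}
	if name == "." {
		return nil, nil // the root directory has no elements
	}
	return strings.Split(name, "/"), nil
}

func main() {
	parts, err := splitFSPath("meta//snap.yaml")
	fmt.Println(parts, err)

	_, err = splitFSPath("../outside")
	fmt.Println(err != nil) // names escaping the root are rejected
}
```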

Possible to use this package for a Go-based AppImage runtime?

Due to libfuse2 vanishing from mainstream distributions, we are forced to look into alternatives.

One possible alternative would be to write a new AppImage runtime in Go. https://github.com/orivej/static-appimage/ has done it, but using zip rather than squashfs. https://github.com/kost/static-appimage/releases has binaries to try out.

I can see a couple of advantages when using Go for the AppImage runtime, but as most things, this is a question of tradeoffs.

Do you think it would be possible to make https://github.com/orivej/static-appimage/ work with squashfs using https://github.com/CalebQ42/squashfs? Would it be a huge undertaking?

Compilation issue in linux

Seeing the issue on linux.

Architecture: x86_64
goos: linux
goarch: amd64

github.com/CalebQ42/squashfs/fuse_linux.go:3:15: undefined: fuse

probably import "github.com/CalebQ42/fuse" is missing.

Compile-time error on MacOS

The library can't be used in macOS due to a missing fuse.ENODATA symbol.

Is it possible to replace it with something go-generic, like os.ErrNotExist?

# github.com/CalebQ42/squashfs
../../../go/pkg/mod/github.com/!caleb!q42/[email protected]/fuse2.go:103:19: undefined: fuse.ENODATA
../../../go/pkg/mod/github.com/!caleb!q42/[email protected]/fuse2.go:115:14: undefined: fuse.ENODATA
../../../go/pkg/mod/github.com/!caleb!q42/[email protected]/fuse3.go:101:19: undefined: fuse.ENODATA
../../../go/pkg/mod/github.com/!caleb!q42/[email protected]/fuse3.go:113:14: undefined: fuse.ENODAT

Optionally dereference symlinks

Thanks @CalebQ42 for this extremely useful library.

When I wrote go-appimage, one thing I would have loved to have is the option to follow (dereference) symlinks when extracting from squashfs. Sometimes you want to extract the symlinks as such; sometimes you want to extract the files/directories they point to.

I think the go-appimage code could be significantly cleaned up if this capability could become part of this library.

Error Opening fragmented files

Hi @CalebQ42, thanks again for the great module!

I'm troubleshooting an issue with Anchore Syft (anchore/syft#1150) via its underlying stereoscope module (anchore/stereoscope#140), which is a consumer of version 0.5.4 of this module. The issue seems to present itself when Open is called on an inode that references a fragment block. At the user level, this manifests itself as an error being returned from Open. In the specific example referenced by anchore/syft#1150, the error is:

open opt/intel/oneapi/compiler/2022.1.0/linux/lib/oclfpga/host/linux64/bin/perl/lib/5.30.3/pod/perlpod.pod: unexpected EOF

Having immersed myself in your code as well as https://dr-emann.github.io/squashfs/squashfs.html, I believe I've identified a few issues with fragment handling. I'll open a PR shortly with patches that seem to fix the issue.

Thanks!

Open fails with "[" filename

I'm getting an error when calling Open on a path obtained while running a WalkDir on a squashfs.FS. The path in question is a tad odd, namely, bin/[, but is otherwise unremarkable.

Having traced through the code, I think the problem is here:

if match, _ := path.Match(split[0], f.entries[i].Name); match {

At this point in the code, path.Match("[", "[") ends up being called, but this returns false for a match. I believe this is because [ is treated as a special character in Match. A direct check for equality may be more appropriate here? I will open a PR shortly with that for review.
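The behaviour described above can be reproduced with the standard library alone. In this sketch, globMatch and sameName are hypothetical wrappers; globMatch keeps the error that the quoted line discards, which shows that "[" is rejected as a malformed pattern rather than simply failing to match.

```go
package main

import (
	"fmt"
	"path"
)

// globMatch wraps path.Match but returns the error instead of dropping it.
func globMatch(pattern, name string) (bool, error) {
	return path.Match(pattern, name)
}

// sameName is the direct equality check suggested as a replacement.
func sameName(want, name string) bool {
	return want == name
}

func main() {
	ok, err := globMatch("[", "[")
	fmt.Println(ok, err) // no match, and err is path.ErrBadPattern

	fmt.Println(sameName("[", "[")) // true

	// Glob matching can also accept names that are not equal.
	ok, _ = globMatch("a*", "abc")
	fmt.Println(ok, sameName("a*", "abc"))
}
```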

Error decompressing files with lots of NULLs

Summary: A file with 321 NULLs unpacks as 128k of NULLs.

I've made a small test archive - unzip it to get bug.sqfs, which demonstrates the problem:

bug.sqfs.zip

go install github.com/CalebQ42/squashfs/...@main
go-unsquashfs bug.sqfs bug-out-go-unsquashfs
unsquashfs -d bug-out-unsquashfs bug.sqfs

Comparing the two extractions

$ ls -l bug-out-go-unsquashfs
total 128
-rw-rw-r-- 1 ncw ncw 131072 Aug 11 17:37 file2

vs

$ ls -l bug-out-unsquashfs
total 4
-rw-rw-r-- 1 ncw ncw 321 Sep  7  2020 file2

file2 is 321 bytes of NULLs, but go-unsquashfs seems to have unpacked it as 131072 bytes (so one block) of NULLs.

bug.sqfs was made like this, so with default options for mksquashfs.

mksquashfs source/ bug.sqfs
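The two listings are consistent with a whole 128 KiB block being written where only the file's 321 bytes were wanted. That diagnosis is an assumption drawn from the sizes alone, not something confirmed in this report. Under that assumption, the following sketch shows the arithmetic a reader needs: every block expands to the block size except the last, which is cut to the remainder of the file size. blockLengths is a hypothetical helper and ignores fragments.

```go
package main

import "fmt"

// blockLengths returns the number of bytes each data block of a file
// should contribute once decompressed (or zero-filled, for an all-zero
// block), so that the total equals fileSize.
func blockLengths(fileSize, blockSize uint64) []uint64 {
	if fileSize == 0 || blockSize == 0 {
		return nil
	}
	n := (fileSize + blockSize - 1) / blockSize // round up
	out := make([]uint64, n)
	for i := range out {
		out[i] = blockSize
	}
	// The final block is shorter unless the size is an exact multiple.
	if rem := fileSize % blockSize; rem != 0 {
		out[n-1] = rem
	}
	return out
}

func main() {
	const blockSize = 131072 // the block size seen in the listing above

	fmt.Println(blockLengths(321, blockSize))
	fmt.Println(blockLengths(131073, blockSize))
}
```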

Support reading "." for fs.FS

Thanks for this awesome library. It would be great to be able to read the current directory with "." on fs.FS. This would also allow using fs.WalkDir starting at the current directory.

I tried to implement it and provided a PR #6
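For context, this is the usage pattern that depends on "." being readable, shown as a minimal sketch. testing/fstest.MapFS stands in for the squashfs file system so the example runs on its own; listAll and demoFS are hypothetical helper names.

```go
package main

import (
	"fmt"
	"io/fs"
	"testing/fstest"
)

// listAll walks fsys from its root and returns every path visited.
// fs.WalkDir begins by looking up ".", so an fs.FS that cannot open "."
// fails before any entry is seen.
func listAll(fsys fs.FS) ([]string, error) {
	var paths []string
	err := fs.WalkDir(fsys, ".", func(p string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		paths = append(paths, p)
		return nil
	})
	return paths, err
}

// demoFS builds a small in-memory tree to walk.
func demoFS() fs.FS {
	return fstest.MapFS{
		"bin/[":          {Data: []byte("test")},
		"meta/snap.yaml": {Data: []byte("name: demo\n")},
	}
}

func main() {
	paths, err := listAll(demoFS())
	if err != nil {
		fmt.Println("walk failed:", err)
		return
	}
	for _, p := range paths {
		fmt.Println(p)
	}
}
```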

Error extracting

Hi

I am seeing an error while extracting.

"Unsupported file type. Inode type: 5"

go-lzo license is GPL-2.0

This library is declared as MIT; however, it depends (statically) on a library with a GPL-2.0 license (github.com/rasky/go-lzo). How much of a problem is this? I'm not certain... I'm not a lawyer 😄. But this is really easy to miss, and I wanted to make certain you were aware of it.

(squashfs is awesome btw ❤️!)

unexpected EOF errors when unpacking a large archive

When I use this program to unpack an archive (I wrote this to learn the squashfs API)

package main

import (
	"context"
	"fmt"
	"io"
	"io/fs"
	"log"
	"os"
	"path/filepath"
	"runtime"

	"github.com/CalebQ42/squashfs"
	"golang.org/x/sync/errgroup"
)

func unpackSquashfs(inputFile, outputDir string) (err error) {
	ctx := context.Background()
	f, err := os.Open(inputFile)
	if err != nil {
		return fmt.Errorf("squashfs open failed: %w", err)
	}
	defer func() {
		closeErr := f.Close()
		if err == nil {
			err = closeErr
		}
	}()

	rd, err := squashfs.NewReader(f)
	if err != nil {
		return fmt.Errorf("squashfs reader creation failed: %w", err)
	}

	g, gCtx := errgroup.WithContext(ctx)
	g.SetLimit(runtime.NumCPU())

	err = fs.WalkDir(rd.FS, ".", func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			return err
		}
		outPath := filepath.Join(outputDir, path)
		fmt.Printf("Writing %q\n", path)

		if d.IsDir() {
			return os.MkdirAll(outPath, 0777)
		}

		// Run file copies in parallel
		g.Go(func() error {
			if gCtx.Err() != nil {
				return nil
			}

			r, err := rd.FS.Open(path)
			if err != nil {
				return fmt.Errorf("unpack open source failed: %w", err)
			}
			defer r.Close()

			w, err := os.Create(outPath)
			if err != nil {
				return fmt.Errorf("unpack open destination failed: %w", err)
			}
			defer w.Close()

			_, err = io.Copy(w, r)
			if err != nil {
				return fmt.Errorf("unpack copy failed: %w", err)
			}
			return nil
		})

		return nil
	})
	if err != nil {
		return fmt.Errorf("squashfs read walk failed: %w", err)
	}

	err = g.Wait()
	if err != nil {
		return fmt.Errorf("squashfs copies failed: %w", err)
	}

	return nil

}

func main() {
	if len(os.Args) < 3 {
		log.Printf("Syntax: %s <image.sqfs> <directory>", os.Args[0])
		os.Exit(1)
	}
	imageFile, outputDirectory := os.Args[1], os.Args[2]

	err := unpackSquashfs(imageFile, outputDirectory)
	if err != nil {
		log.Fatalf("Failed: %v", err)
	}
}

then sometimes I get

Failed: squashfs read walk failed: readdir usr/local/lib/python3.11/dist-packages/tensorflow/include/external/gemmlowp/meta: unexpected EOF

I can reproduce this with go-unsquashfs but it appears to have a memory leak so unless you have a very large amount of RAM it won't finish. This probably deserves another issue!

Exactly which file returns the error varies

Failed: squashfs read walk failed: readdir usr/local/lib/python3.11/dist-packages/grpc/_cython: unexpected EOF

I tracked this error down to the binary.Read returning ErrUnexpectedEOF here

err = binary.Read(r, binary.LittleEndian, &h)

It does appear to be reading near the end of the file at the time the unexpected EOF is generated so I conjecture there is some padding at the end or something which is causing the binary.Read to read a few extra bytes.

I don't know why it is inconsistent though.
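For reference, binary.Read fills a fixed-size value with io.ReadFull semantics: it reports io.ErrUnexpectedEOF when the stream ends after some, but not all, of the bytes have been read, and plain io.EOF when nothing could be read. The sketch below shows this with a hypothetical 12-byte header struct; the field names are made up for illustration, and classify is a hypothetical helper.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
)

// header is a hypothetical fixed-size record of three 32-bit fields.
type header struct {
	Count uint32
	Start uint32
	Num   uint32
}

// classify decodes one header from b and names the outcome.
func classify(b []byte) string {
	var h header
	err := binary.Read(bytes.NewReader(b), binary.LittleEndian, &h)
	switch {
	case err == nil:
		return "ok"
	case errors.Is(err, io.ErrUnexpectedEOF):
		return "unexpected EOF" // stream ended part-way through the struct
	case errors.Is(err, io.EOF):
		return "EOF" // stream was already empty
	default:
		return err.Error()
	}
}

func main() {
	fmt.Println(classify(make([]byte, 12))) // full header available
	fmt.Println(classify(make([]byte, 5)))  // short read
	fmt.Println(classify(nil))              // nothing to read
}
```

So an unexpected EOF at this call means the reader handed back fewer bytes than one whole header, which fits a read that starts too close to the end of the available data.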

You can generate a squashfs which demonstrates the problem with this script

#! /usr/bin/bash
ORG=${ORG:-tensorflow}
IMG=${IMG:-tensorflow}
TAG=${TAG:-latest-gpu-jupyter}
docker export $(docker create $ORG/$IMG:$TAG) -o $IMG.tar.gz
mkdir -p $IMG && tar xf $IMG.tar.gz -C $IMG
[ -f $IMG.sqfs ] && rm $IMG.sqfs
mksquashfs $IMG $IMG.sqfs  -comp zstd -Xcompression-level 3 -b 1M -no-xattrs -all-root

If that doesn't demonstrate the problem I can send you one which does, but it is 3GB so will take a long time to upload with my puny internet!

PS Actually with the magic of docker you can probably generate an identical archive if I tell you the ID of the image I used is a2cf87758fef
