rimerosolutions / entrusted
Sanitize documents to safe PDFs, for active content removal
License: GNU General Public License v3.0
Since day one, a bunch of shell scripts have been used to create release artifacts.
Migrate to GitHub Actions for scalability and convenience.
There are a couple of things that need to happen. There were previous attempts to migrate to GitHub Actions, so technical details and considerations are well understood.
aarch64/arm64
As much as possible, everything is statically linked to avoid external dependencies, but that is mostly possible only with command-line applications.
libXcb, libXfixes, etc.
wayland or x11 libraries available?
A custom seccomp profile is used to reduce the allowed system calls (syscalls) in the sandbox container image. This has proven challenging since its implementation, as any issue with it is "opaque" and comes without much error detail.
Conversions succeed under Fedora for the standard test dataset. Fedora is a good test generally because when it works, it's almost guaranteed to also work under Windows and Mac OS. Furthermore, Fedora has selinux enabled by default and that can occasionally result in slightly different problems that are not seen on other operating systems (file permission issues while moving files from/to container volumes, etc.).
Tested conversions under a live CD of Fedora (amd64) and it failed abruptly without much information.
This is likely due to the custom seccomp profile again and we need to find, yet again, the missing syscalls.
seccomp profile.
strace aware sandbox container image to avoid spending too much time troubleshooting seccomp profiles, this is very annoying
Update/enhance the existing RELEASING file.
All conversions are tested with and without OCR (English only) using files in the test_data folder.
podman for the CLI, on a Live CD of Alpine Linux.
The Entrusted binaries provided in releases only run on Glibc systems.
I built a new virtual machine (Void Linux x86_64-musl) to assess a potential migration from xorg to Wayland.
Entrusted
Provide binaries that will run on Musl systems such as Alpine Linux, Void Linux Musl, Gentoo Musl builds, etc.
GitHub Actions to build for Musl systems (aarch64 and amd64)
-glibc and -musl suffixes)
Notes:
sed or perl)
FLTK-RS and fetch them by specifying the CFLTK_BUNDLE_URL environment variable to fetch remote artifacts
There's a desire to transition from podman to nerdctl on the Live CD, because it provides graceful apparmor support for rootless containers. The Live CD ISO image size should not get out of control though.
800 MB to 1.4 GB)
apparmor DEB packages
nerdctl related binaries
fuse-overlayfs as "containerd snapshotter"
Understand at a high level the nerdctl directory layout and typical storage size
Understand at a high level nerdctl snapshotters and relevant disk storage requirements
Evaluate opportunities to compress the container image "BLOB size" on the Live CD
Even though there are plans to migrate to GitHub Actions, compile times are annoying with Rust.
Reduce compile times both for local builds and later on GitHub Actions
axtloss found creative ways to package Entrusted 0.2.4 as a Flatpak (based on the Live CD ISO image).
A lot has changed since 0.2.4, and there will be more improvements in 0.3.1
0.2.4, ~670 MB targeted for 0.3.1)
0.3.1
Sync with axtloss to ensure that 0.3.1 is available as a Flatpak.
None anticipated so far, but there are 2 major updates in the Live CD (boot loader, custom kernel).
See #56
Dependencies are not explicit in RPM or DEB packages beyond glibc.
Note: In the upcoming version, there will also be a download artifact for systems that use musl instead of glibc (in the form of a tar.gz archive).
tar.gz artifacts to mention required shared libraries
The Live CD is based on Debian stable. The build process generates ISO images for amd64 and aarch64
The build process is a bit too complex to both maintain and support:
Try containerizing most of the build process similarly to Lima Alpine ISO build or other "Live CD kits".
We need a "ready-to-go" folder structure with the entire contents of $HOME/.local/share/containers in the Live CD.
$HOME/.local/share/containers
This is about requesting new builds (via GitHub Actions) with the latest changes prior to further testing of 0.3.1.
One of the key design principles for the Web interface was always to keep a familiar look and feel with the Desktop interface.
Not having "tabs" in the Web UI is an inconsistency. It also has other side effects as new conversion settings get introduced.
Implement tab containers for "Settings" and "Convert" (defaulting to "Convert") just like in the Desktop interface.
This is the default selected tab, just like for the Desktop client.
The user can optionally customize defaults prior to conversions in the "Settings" tab: there will likely be more settings over time.
gVisor is a container security platform. It could be nice to leverage it on the Live CD for additional security hardening.
This was tested a few times a while back and the performance was really bad with Podman under Linux ...
gVisor, under Podman (Linux): roughly 30 seconds
gVisor, under Podman (Linux), close to 28 minutes if I recall correctly
gVisor performance and publish non-scientific benchmarks for now
gVisor works in tandem with the seccomp security profile added in 0.3.0
gVisor could be an optional setting in the Web interface of the Live CD
By default, conversion results are stored in the same folder as the original input files. There's also a button for customizing the output location on a per file basis (save icons in the "file conversion queue").
Entrusted
It would be nice to be able to click directly on a link to open the PDF result directly from Entrusted.
Entrusted
is also likely the default PDF viewer for a given Desktop environment (Linux, Mac OS or Windows)
In the settings tab, the "Open resulting PDF with" option is removed.
In the file conversion queue, a new "Open" link is displayed upon successful processing. Please note that the existing "Logs" button is also streamlined to become more of a "hyperlink" widget.
The user interface seems less cluttered with buttons and is more streamlined with the Web interface look and feel.
A few manual tests are run prior to each release. In 0.3.0, a simple functional test was added in entrusted_client (Cucumber for Rust)
Automate the existing functional test via GitHub Actions.
Add new GitHub Actions workflow (amd64 only, excluding Windows due to execution errors on GitHub)
Docker as container solution)
Lima as container solution)
There were previous efforts to reduce the ISO image size (#35). This can be taken a little bit further with a custom kernel for which the installation size is smaller than the stock Debian Linux image (linux-image-amd64 or linux-image-arm64).
There were a few known challenges while attempting to build a tiny but functional kernel image
QEMU: "Existing boot services..."
The linux-image-cloud-* Debian packages are not suitable because they remove network drivers and we're stuck with only the loopback interface. We need a better middle ground for Entrusted.
Use a custom kernel to further reduce the Live CD ISO image size.
In the early days of Entrusted, its file processing times were comparable to Dangerzone.
Entrusted became much faster, parallel conversions were then considered not essential
Keep in mind that this would also require switching from "row indexes" to "document ids" when submitting files for processing... It is not technically difficult to write a quick and dirty initial implementation (fast mapping of IDs to widgets for progress notifications, etc.).
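The switch from row indexes to document IDs could be sketched roughly as follows. This is only an illustration: `ConversionQueue`, `submit`, `row_for` and `remove` are hypothetical names, not taken from the Entrusted code base, and the real UI would store widget handles rather than plain row numbers.

```rust
use std::collections::HashMap;

/// Hypothetical registry mapping stable document IDs to the queue row
/// (widget position) that displays progress for that document.
#[derive(Default)]
struct ConversionQueue {
    next_id: u64,
    rows_by_id: HashMap<u64, usize>,
}

impl ConversionQueue {
    /// Registers a document displayed at `row` and returns its stable ID.
    fn submit(&mut self, row: usize) -> u64 {
        let id = self.next_id;
        self.next_id += 1;
        self.rows_by_id.insert(id, row);
        id
    }

    /// Finds the row to update when a progress notification arrives for `id`.
    fn row_for(&self, id: u64) -> Option<usize> {
        self.rows_by_id.get(&id).copied()
    }

    /// Drops the mapping once a document leaves the queue.
    fn remove(&mut self, id: u64) -> Option<usize> {
        self.rows_by_id.remove(&id)
    }
}
```

With IDs, progress notifications from parallel conversions can arrive in any order: each one carries its document ID and the lookup finds the matching row, even after other entries have been removed.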
Docker, xywxs happens and the container crashes"
For every conversion I do, a cmd window opens (I believe to start the Docker container) and does not close, and the entrusted-gui begins Not Responding. I can then just close the cmd window and it opens a new one and continues with the page separation. At the end of the conversion, it says it has Failed and the Logs end with "Access is denied. (os error 5)." Even with this supposed failure, the PDF does come out the other end successfully.
I'm on Windows 10, using Docker (which I'm not very familiar with). I would like to say I appreciate entrusted because it handles very large PDFs and dangerzone does not seem to.
For a while, application components have been built with rust 1.64.0. As improvements and security fixes get incorporated, it's important to stay reasonably up to date.
Build projects with rust 1.67.0.
livecd, main, supportimages)
Make a couple of updates to existing documentation
Application bundles are sandboxed on Mac OS, which means that they can't typically access most resources outside the application itself. Via entitlements, applications can get additional permissions.
Launching external programs such as Docker, Podman or Lima doesn't seem to work anymore via a wrapper script. Essentially the previous trick (script run through login shell) doesn't seem to work for launching applications that are outside the application bundle.
Even if an application is within the user PATH, due to the Apple sandbox, Entrusted is not able to access it.
Converting /Users/me/Downloads/testdata/sample-odp.odp
No container runtime executable found!
Please install Docker or Lima, and make sure that it's running.
The error message displayed by the application is confusing from a user perspective.
me-iMac-Pro$ which lima
/Users/me/Tools/lima-0.16.0/bin/lima
The solution for now is to only support Docker Desktop. The assumption is that we're dealing with a standard installation with Docker.app dragged to the /Applications folder.
com.docker.docker)
NSTask doesn't help even with an absolute path to a given program. It also makes the code more complex.
NSWorkspace doesn't seem to allow low-level code (capture process output as it becomes available, etc.).
Once testing is complete and satisfactory
The program is unable to produce the final PDF result under Linux with Docker as container solution. The application fails with cryptic "permission denied" errors.
This can be reproduced by following the testing instructions for arm64 under amd64, using "user-built dev snapshots" ("0.2.6 dev").
entrusted-cli with any supported file type such as a PDF document (this will fail)
Docker command and remove the seccomp profile option (wrong syscalls on arm64)
Docker command
This has been detected in the upcoming 0.2.6 development builds and seems Docker specific under Linux. Most of the Linux testing has been happening only with Podman since the early days of this project.
/tmp/entrusted/safe to make it write-able for all users:
mkdir -p /tmp/entrusted/safe && chmod -R a+rw /tmp/entrusted/safe
Docker with the --privileged flag which is not desirable
Topic | Detail |
---|---|
Operating System | Alpine Linux Live CD |
CPU Architecture | aarch64 |
Container Solution | Docker |
Entrusted Version | 0.2.6 dev build |
entrusted, this differs from the user running inside the container solution (user ID permissions mismatch).
The root cause appears to be directory ownership issues and how they get mapped inside Docker. The simplest solution might be to ensure that the temporary directory mapped to the container volume is write-able by everybody (libc::chmod).
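A minimal sketch of that idea, assuming a Unix host. It uses the standard library's `set_permissions` instead of calling `libc::chmod` directly, and the function name `make_world_writable` is hypothetical. Mode `0o777` is an assumption: it also grants the execute (search) bit that a directory needs to be usable, whereas the shell workaround above uses `a+rw`.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::PermissionsExt;
use std::path::Path;

/// Creates `dir` if needed and makes it readable, writable and searchable
/// by every user, so the container user can write results into the volume.
fn make_world_writable(dir: &Path) -> io::Result<()> {
    fs::create_dir_all(dir)?;
    // Like chmod(2), this sets the mode explicitly and is not subject to the umask.
    fs::set_permissions(dir, fs::Permissions::from_mode(0o777))
}
```

A world-writable directory is a trade-off: it sidesteps the user ID mismatch without the `--privileged` flag, but any local user can then write to the directory.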
In the upcoming 0.3.1 release, the plan is to utilize a single boot manager (grub) instead of the previous combination of 2 boot managers (isolinux for BIOS and grub for UEFI). This has apparently introduced a bug, maybe after polishing the shell scripts.
The Live CD can no longer boot in BIOS mode on amd64/x86_64 systems.
The system boots successfully after the grub boot manager prompt.
The boot process fails with grub complaining that the prefix variable isn't set.
Welcome to GRUB!
../../grub-core/kern/dl.c:944:variable 'prefix' isn't setBacktrace (.text 0xa058 .data 0x15e2c)
As of now, there are 3 officially supported container solutions:
default instance for now
There's also hidden direct support for [nerdctl](https://github.com/containerd/nerdctl)
It is desirable to easily support additional container solutions such as Rancher Desktop and possibly many more. This request might not be worth the trouble:
As of now, the Linux GUI artifact is an AppImage file. This is achieved with linuxdeploy. The original goal was creating a fully self-contained binary, but this was never done properly (time constraints and current AppImage knowledge).
Do not create an AppImage file for the GUI, as most of the dependencies are not currently embedded anyway.
fuse2 vs fuse3 vs fuse).
fuse libraries are not resolved, the program will silently fail to start without notice, but that's a general problem.
musl
.spec file or deb control file.
Is it possible to build the iso only for Linux environment (as opposed to building also for Windows, etc.)?
If so, how to accomplish that?
Starting with release 0.2.4 (UI responsiveness "improvement"), a regression bug was introduced.
This translates into intermittent conversion failures because the PDF result might not be available anymore on container image volumes (process exits for a conversion and then the next conversion starts).
Please note that the issue seems to happen on Mac OS and Windows more often than under Linux (tiling window manager testing).
0.2.4 and above, constrained to the graphical Desktop interface.
Individually re-process the documents that failed to process.
This has been observed under Mac OS (recent versions), but it also happens on other operating systems such as Linux and probably Windows too.
Below is a screenshot
entrusted_client module).
SwingWorker in Java. That would ensure that all UI updates (and associated state modifications) are happening in the main UI thread.
From entrusted/ci_cd/live_cd, running the build.sh script gives this error:
Error: copying system image from manifest list: writing blob: adding layer with blob "sha256:xxxxxxxxxxxxxxa02d4eb6bb": processing tar file(potentially insufficient UIDs or GIDs available in user namespace (requested 0:42 for /etc/gshadow): Check /etc/subuid and /etc/subgid if configured locally and run "podman system migrate": lchown /etc/gshadow: invalid argument): exit status 1
+ retVal=125
+ '[' 125 '!=' 0 ']'
+ echo 'Could not build entrusted-cli and entrusted-webserver'
Could not build entrusted-cli and entrusted-webserver
Building from entrusted/ci_cd gives:
ERRO[0000] cannot find UID/GID for user $USER: no subuid ranges found for user $USER in /etc/subuid - check rootless mode in man pages.
WARN[0000] Using rootless single mapping into the namespace. This might break some images. Check /etc/subuid and /etc/subgid for adding sub*ids if not using a network user
Trying to pull docker.io/uycyjnzgntrn/rust-windows:1.67.0...
Getting image source signatures
Copying blob 7139ad4c9de2 done
Error: writing blob: adding layer with blob "sha256:xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxcf9702c6ae68b8c": processing tar file(potentially insufficient UIDs or GIDs available in user namespace (requested 0:42 for /etc/gshadow): Check /etc/subuid and /etc/subgid if configured locally and run "podman system migrate": lchown /etc/gshadow: invalid argument): exit status 1
WARN[0271] Failed to add pause process to systemd sandbox cgroup: Process org.freedesktop.systemd1 exited with status 1
+ retVal=125
+ '[' 125 -ne 0 ']'
+ echo 'Failure to build Windows binaries'
Failure to build Windows binaries
+ exit 1
+ retVal=1
+ '[' 1 -ne 0 ']'
+ echo 'Windows build failure'
Windows build failure
+ exit 1
This is on branch main.
Linux x86_64
Any ideas how to debug?
Since 0.3.1, release artifacts are assembled via the GitHub infrastructure (GitHub Actions), instead of using a local Ubuntu virtual machine. Please note that a few pre-existing shell scripts are also reused in GitHub Actions.
Existing shell scripts were built with an emphasis on supporting multiple architectures quickly (x86_64 and arm64)
On top of items below, it would be nice to come closer to "reproducible builds" (containerized builds only help with parts of that).
The current ISO CD release has a 30 second grub timeout at launch. An optional ISO release with a small or 0 timeout value, so that it launches straight away, would be very useful when packaging it as an application. This is an issue with the flatpak being developed for this app: https://github.com/axtloss/flatpaks/tree/main/com.github.rimerosolutions.entrusted
There was a discussion about Flatpak support and how to approach it going forward ( #42). This issue will actually implement the envisioned solution.
2 main problems
The release 0.3.0 of Entrusted runs a 5.x kernel for the Live CD. In development builds, a custom kernel has been introduced via the kernel-deblive-smallserver project but the version is 6.1.8.
Kernel 6.1 releases are now flagged as long term kernel releases and 6.1.8 is no longer the latest available version.
When 0.3.1 is released, ensure that it ships with the latest available kernel 6.1.x version.
6.1.x version
The current Live CD ISO image is roughly 800 MB. As functionality gets added, the ISO image size will grow.
For the Live CD, there's a correlation between the RAM allocated to the virtual machine and free space
75% of the RAM is allocated as free space on the Live CD (grub boot parameters: ramdisk-size, etc.)
1 GB of RAM allocated to a virtual machine, the application still runs pretty well. The intent is to keep RAM requirements as low as possible.
Reduce the ISO image size as much as possible while keeping it functional. Pay attention to aarch64/arm64 support to ensure that the ISO image doesn't hang as noticed in early testing stages.
zstd compression for the initramfs generation and the Live CD squashfs data
Each time that the sandbox container image changes (different distribution or release update), the custom seccomp profile might break. The sandbox container base OS was changed from Debian buster to Debian bookworm (latest current stable Debian version).
The conversion of Office documents crashes unexpectedly on arm64/aarch64, likely due to missing required syscalls.
The conversion succeeds for all supported documents on arm64, including Office documents.
The conversion of Office documents terminates abruptly, with the usual vague error messages.
This is the output of the underlying entrusted-cli command invocation. podman is invoked directly to get more insights about the failure. It seems that a missing syscall is preventing LibreOffice from copying files around?
localhost:~$ /usr/bin/podman run --rm --network none --cap-drop all --userns keep-id --security-opt no-new-privileges --security-opt seccomp=/tmp/entrusted/seccomp-entrusted-profile-0.3.1.json -v /home/entrusted/entrusted/test_data/sample-doc.doc:/tmp/input_file:Z -v /tmp/entrusted/safe:/safezone:Z -e ENTRUSTED_LANGID=en docker.io/uycyjnzgntrn/entrusted_container:0.3.1 /usr/local/bin/entrusted-container --visual-quality medium --log-format json
{"percent_complete":5,"data":"Converting to PDF using LibreOffice"}
LibreOffice 7.4 - Fatal Error: The application cannot be started.
User installation could not be completed.
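For readers who want to reproduce such an invocation programmatically, here is a rough sketch of how the argument list from the logged command could be assembled. The function name `sandbox_run_args` and its parameters are hypothetical and not taken from the Entrusted sources. The flags themselves are copied from the command above. For brevity the sketch omits the `-e ENTRUSTED_LANGID=en` option and the trailing `entrusted-container` command line.

```rust
/// Builds the arguments for `podman run`, mirroring the flags seen in the
/// logged command. Hypothetical helper, for illustration only.
fn sandbox_run_args(
    seccomp_profile: &str,
    input_file: &str,
    output_dir: &str,
    image: &str,
) -> Vec<String> {
    // Fixed hardening flags: no network, no capabilities, no privilege escalation.
    let mut args: Vec<String> = [
        "run", "--rm",
        "--network", "none",
        "--cap-drop", "all",
        "--userns", "keep-id",
        "--security-opt", "no-new-privileges",
    ]
    .iter()
    .map(|s| s.to_string())
    .collect();

    // Custom seccomp profile restricting the allowed syscalls.
    args.push("--security-opt".to_string());
    args.push(format!("seccomp={}", seccomp_profile));

    // Input document and output folder, relabelled for SELinux (":Z").
    args.push("-v".to_string());
    args.push(format!("{}:/tmp/input_file:Z", input_file));
    args.push("-v".to_string());
    args.push(format!("{}:/safezone:Z", output_dir));

    args.push(image.to_string());
    args
}
```

The resulting vector can be passed to `std::process::Command::new("podman").args(...)`. Leaving out the two seccomp entries is a quick way to check whether a failure comes from the custom profile.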
Entrusted has implemented its own file type detection business logic
libmagic bindings, a few pure Rust libraries, etc.)
The libraries that we care about need to perform mime detection based on file bytes rather than relying on file extensions.
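Detection based on file bytes typically means comparing the first bytes of a file against well-known signatures ("magic numbers"). The sketch below illustrates the principle only. The `FileKind` enum and `detect_kind` function are hypothetical names and do not reflect Entrusted's actual detection logic or any particular library's API.

```rust
/// Coarse file families recognised by this sketch (hypothetical names).
#[derive(Debug, PartialEq)]
enum FileKind {
    Pdf,
    Png,
    Jpeg,
    /// ZIP archives, which includes OOXML and OpenDocument files.
    ZipContainer,
    /// OLE2 compound files, used by legacy Office formats such as .doc.
    Ole2Container,
    Unknown,
}

/// Classifies content by its leading bytes, ignoring any file extension.
fn detect_kind(bytes: &[u8]) -> FileKind {
    if bytes.starts_with(b"%PDF-") {
        FileKind::Pdf
    } else if bytes.starts_with(b"\x89PNG\r\n\x1a\n") {
        FileKind::Png
    } else if bytes.starts_with(&[0xFF, 0xD8, 0xFF]) {
        FileKind::Jpeg
    } else if bytes.starts_with(b"PK\x03\x04") {
        FileKind::ZipContainer
    } else if bytes.starts_with(&[0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1]) {
        FileKind::Ole2Container
    } else {
        FileKind::Unknown
    }
}
```

Container formats need a second pass: telling a `.docx` from an `.odt` requires inspecting entries inside the ZIP archive, which is where a dedicated library earns its keep.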
There was a discussion about Flatpak support and how to approach it going forward ( #42). This issue will actually implement the envisioned solution.
entrusted-container) to accept command-line arguments
podman or docker
Even though the UI toolkit (FLTK) supports Wayland, this is not enabled at build time.
Enable Wayland support in UI builds as more people are using Wayland nowadays.
entrusted-client project to enable the use-wayland feature
fltk-rs version 1.3.29
During the migration of the CI/CD pipeline to GitHub Actions (#16), a random error has been observed for the amd64 ISO image.
If hardened_malloc is going to be a problem, consider a replacement such as mimalloc (secure mode).
Random error with GitHub actions dev build (amd64 ISO image)
There have been a few iterations of the build pipeline on GitHub but it never led to errors related to hardened_malloc
After observing the above issue, it was confirmed that disabling hardened_malloc makes the problem go away...
As I'm in the middle of implementing foundational support that will enable easier Flatpak packaging, there are a couple of things to reconsider.
Additional software installation for the user
Higher computing resources footprint
Lack of service management
gVisor is a container sandbox developed by Google that focuses on security, efficiency and ease of use.
gVisor has been preferred to a combination of seccomp (existing) and apparmor profiles:
apparmor support
apparmor settings
nerdctl supports apparmor with rootless containers
nerdctl rootless support
nerdctl than podman
Enable gVisor on the Live CD as it provides a rather complete out of the box container security solution.
gVisor
entrusted-webservice systemd service for gVisor support
seccomp support via environment variables
gVisor support via environment variables
entrusted-client component for gVisor support
tmpfs filesystem (5 MB) to account for LibreOffice setup, as it needs to create data in XDG_CONFIG_HOME...). This excludes images and PDF files processing.
userns flag when gVisor support is desired
There are a couple of hardening options implemented on the Live CD starting with 0.3.0
It would be nice to build on top of previous efforts with AppArmor, especially for the Live CD which is a fully controlled environment. A custom AppArmor security profile could be loaded during file processing.
AppArmor profile
AppArmor
entrusted-client component
security-opt flags to the container solution
All conversions are tested with and without OCR (English only) using files in the test_data folder.
podman for the CLI, on a Live CD of Alpine Linux (emulation with QEMU).
Just noticed some permission denied issues with Podman under Linux (Ubuntu), while trying to run the functional test of the application in entrusted_client (via cargo test). Per comments, this mostly requires documentation changes, but minor code changes will be reverted too (for clarity and to avoid forgetting about a few technical details).
entrusted_container (to be confirmed).
test_data folder with the CLI or Desktop GUI.
...
{"percent_complete":91,"data":"Collecting PDF pages"}
{"percent_complete":92,"data":"Updating bookmarks and page numbering"}
{"percent_complete":93,"data":"Processing PDF structure"}
{"percent_complete":94,"data":"Updating PDF dictionary"}
{"percent_complete":95,"data":"Combining PDF objects"}
{"percent_complete":96,"data":"Compressing PDF"}
{"percent_complete":98,"data":"Saving PDF"}
{"percent_complete":98,"data":"Failed to copy file from /tmp/safe-output-compressed.pdf to /entrusted/safe-output-compressed.pdf"}
{"percent_complete":99,"data":"Conversion failed with reason: Permission denied (os error 13)"}
{"percent_complete":100,"data":"Elapsed time: 0 hours 0 minutes 4 seconds"}
The program completes successfully.
The program fails while copying the PDF result to the mounted volume.
entrusted_client
Originally, Entrusted only supported x86_64/amd64 architectures. Then release artifacts for arm64/aarch64 became available as it was possible to generate relevant binaries "relatively easily".
The current naming conventions for release artifacts are a bit unorthodox; usually the version number comes right after the "artifactname" portion.
artifactname-osname-architecture-version.fileextension
entrusted-linux-aarch64-0.3.0.deb
Adopt the following naming pattern going forward:
artifactname-version-osname-architecture.fileextension
entrusted-0.3.0-linux-aarch64.deb
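The proposed pattern can be expressed as a small formatting helper. This is only a sketch: the function name `artifact_file_name` is hypothetical and the actual release scripts may assemble names differently.

```rust
/// Builds a release artifact file name following the proposed pattern:
/// artifactname-version-osname-architecture.fileextension
fn artifact_file_name(name: &str, version: &str, os: &str, arch: &str, ext: &str) -> String {
    format!("{}-{}-{}-{}.{}", name, version, os, arch, ext)
}
```

For example, `artifact_file_name("entrusted", "0.3.0", "linux", "aarch64", "deb")` yields `entrusted-0.3.0-linux-aarch64.deb`, matching the example above.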
It seems that there's another FLTK regression for drag and drop under Linux and maybe under other operating systems (Mac & Windows).
It's possible to drag and drop multiple files in the "file drop zone" (Red background).
This was only observed under Fedora with "Gnome files".
Apparently drag and drop only works for a single file with fltk-rs 1.4.8.
This was only observed under Fedora with "Gnome files".
Need to investigate to get a reasonable understanding of what could be wrong...
Cannot use fltk-rs 1.4.9+ yet, as its requirements have changed and there were failures with the windows build (from an Ubuntu VM). Need to investigate what needs to be done with the local CI/CD pipeline and the flow in GitHub Actions.
As software libraries and dependencies get upgraded, maintaining a custom seccomp profile has become a maintenance nightmare.
seccomp profile is neither quick nor trivial
seccomp profile
Disable the custom seccomp profile, as minimizing required syscalls is tedious and mistakes crash conversions.
seccomp profile json file
seccomp profile checks
seccomp profile
The current user interfaces are mostly based on previous Desktop interface implementations (many years ago) and the initial UI capabilities of Dangerzone.
Can the UI be simpler and yet flexible, and without adding too many screens?
I do not know the right answer at this time, as I built Entrusted only for myself originally.
Dangerzone, among other features
There's a feeling that it's potentially adding too many screens for a rather simple application.
For user settings, most users will never need the "advanced" settings.
There could be a different screen for selecting files and then another one during conversions, with the option to go back to add files?
Until the 0.3.2 release, all release builds were performed on an Ubuntu virtual machine. This relies on the rust-linux-darwin-builder container image for cross-compiling Mac OS binaries from Linux.
That build process doesn't seem to work anymore (arm64 only it seems):
rust-linux-darwin-builder
x86_64 and arm64).
The local build pipeline for macos fails for the arm64 architecture.
Run the script ./ci_cd/macos/build.sh from an Ubuntu virtual machine.
It seems that a few symbols are missing when compiling for arm64.