A long-running block I/O operation (we saw it with a long, slow TRIM operation, but expect anything longer than ~30s would do) can cause a Linux guest to attempt to stop and reset the AHCI controller.
The issue appears to be that the `SIGCONT` which is sent by `blockif_cancel`, and which is expected to be delivered via the mevent thread calling back into `blockif_sigcont_handler`, never arrives, and so `blockif_cancel` blocks forever. The interesting backtrace is:
```
thread #13: tid = 0x2d000a5, 0x00007fff8c0aac8a libsystem_kernel.dylib`__psynch_cvwait + 10, name = 'vcpu:1'
frame #0: 0x00007fff8c0aac8a libsystem_kernel.dylib`__psynch_cvwait + 10
frame #1: 0x00007fff8c19496a libsystem_pthread.dylib`_pthread_cond_wait + 712
frame #2: 0x000000010ea97316 com.docker.hyperkit`pci_ahci_write + 128 at block_if.c:899 [opt]
frame #3: 0x000000010ea97296 com.docker.hyperkit`pci_ahci_write [inlined] ahci_port_stop(p=<unavailable>) + 180 at pci_ahci.c:445 [opt]
frame #4: 0x000000010ea971e2 com.docker.hyperkit`pci_ahci_write [inlined] pci_ahci_port_write(sc=<unavailable>, offset=<unavailable>, value=<unavailable>) + 492 at pci_ahci.c:2051 [opt]
frame #5: 0x000000010ea96ff6 com.docker.hyperkit`pci_ahci_write(vcpu=<unavailable>, pi=<unavailable>, baridx=<unavailable>, offset=<unavailable>, size=<unavailable>, value=<unavailable>) + 498 at pci_ahci.c:2156 [opt]
frame #6: 0x000000010ea9c974 com.docker.hyperkit`pci_emul_mem_handler(vcpu=1, dir=<unavailable>, addr=<unavailable>, size=<unavailable>, val=<unavailable>, arg1=0x00007fd1ca7000a0, arg2=<unavailable>) + 284 at pci_emul.c:394 [opt]
frame #7: 0x000000010ea9686d com.docker.hyperkit`mem_write(unused=<unavailable>, vcpu=<unavailable>, gpa=<unavailable>, wval=<unavailable>, size=<unavailable>, arg=<unavailable>) + 47 at mem.c:151 [opt]
frame #8: 0x000000010ea9292b com.docker.hyperkit`vmm_emulate_instruction [inlined] emulate_mov(vm=<unavailable>, vcpuid=<unavailable>, gpa=<unavailable>, vie=<unavailable>, memread=<unavailable>, memwrite=(com.docker.hyperkit`mem_write at mem.c:147), arg=<unavailable>) + 442 at vmm_instruction_emul.c:497 [opt]
frame #9: 0x000000010ea92771 com.docker.hyperkit`vmm_emulate_instruction(vm=<unavailable>, vcpuid=<unavailable>, gpa=<unavailable>, vie=<unavailable>, paging=<unavailable>, memread=<unavailable>, memwrite=<unavailable>, memarg=<unavailable>) + 2930 at vmm_instruction_emul.c:1421 [opt]
frame #10: 0x000000010ea91739 com.docker.hyperkit`xh_vm_emulate_instruction(vcpu=<unavailable>, gpa=<unavailable>, vie=<unavailable>, paging=<unavailable>, memread=<unavailable>, memwrite=<unavailable>, memarg=<unavailable>) + 87 at vmm_api.c:828 [opt]
frame #11: 0x000000010eaab6d0 com.docker.hyperkit`vmexit_inst_emul [inlined] emulate_mem(paddr=<unavailable>, vie=0x000000010edcd670, paging=0x000000010edcd658) + 31 at mem.c:202 [opt]
frame #12: 0x000000010eaab6b1 com.docker.hyperkit`vmexit_inst_emul(vme=<unavailable>, pvcpu=<unavailable>) + 180 at hyperkit.c:486 [opt]
frame #13: 0x000000010eaabf3b com.docker.hyperkit`vcpu_thread [inlined] vcpu_loop(vcpu=1) + 116 at hyperkit.c:630 [opt]
frame #14: 0x000000010eaabec7 com.docker.hyperkit`vcpu_thread(param=<unavailable>) + 1076 at hyperkit.c:276 [opt]
frame #15: 0x00007fff8c193aab libsystem_pthread.dylib`_pthread_body + 180
frame #16: 0x00007fff8c1939f7 libsystem_pthread.dylib`_pthread_start + 286
frame #17: 0x00007fff8c193221 libsystem_pthread.dylib`thread_start + 13
```
This can be reproduced by e.g. introducing a `sleep(35)` into `blockif_proc`'s `BOP_DELETE` handler and then running `fstrim` on a filesystem from inside the guest. ijc@d8af9d6 contains some code to do that along with some debugging around the `blockif_cancel` code paths.
This seems likely to be down to a difference in the `kevent`/`kqueue` semantics between FreeBSD (where this code originates, via bhyve) and OSX. FreeBSD kqueue(2) and OSX kevent(2) differ in their descriptions of `EVFILT_SIGNAL` in that the OSX version says explicitly:

> Only signals sent to the process, not to a particular thread, will trigger the filter.

while the FreeBSD one does not. The signal is sent with `pthread_kill(be->be_tid, SIGCONT);` and so would be expected to be subject to this caveat.
However there is something I do not understand about the original code on FreeBSD which makes me reluctant to just start coding a fix.
There are 3 threads involved:

- The VCPU I/O emulation thread (`ioemu`), which runs the device model, e.g. the pci_ahci.c code.
- The block I/O thread (`blkio`), which performs the actual I/O against the underlying backend devices.
- The `mevent` thread, which listens for various events using kqueue/kevent. This is actually the process' original main thread, which calls `mevent_dispatch` after initialisation.
There are 2 sets of queues involved:

- The three blockif request queues (`bc->bc_freeq`, `bc->bc_pendq` and `bc->bc_busyq`) which between them contain the `BLOCKIF_MAXREQ` elements (of type `struct blockif_elem`) in `bc->bc_reqs`. These are protected by `bc->bc_mtx` and `bc->bc_cond`.
- The (global) `blockif_bse_head`, which contains a chain of `struct blockif_sig_elem *`, protected through the use of `atomic_cmpset_ptr`. Each `blockif_sig_elem` contains a `bse_mtx` and a `bse_cond` used for completion.
In normal processing the `ioemu` thread processes an MMIO (e.g. for pci_ahci.c, from `ahci_handle_rw`, `atapi_read`, `ahci_handle_dsm_trim` and others) and calls a helper function which passes a `blockif_req` to `blockif_request`, which calls `blockif_enqueue` to enqueue a `blockif_elem` and then pokes `bc->bc_cond`.
This wakes the `blkio` thread (a single thread here, though it can be multiple in bhyve upstream), which was waiting on `bc->bc_cond`. It then calls `blockif_dequeue`, which takes the `blockif_elem` and sets `be->be_status` to BUSY and `be->be_tid` to the thread which is claiming the work (that is, `blkio`).
The `blkio` thread then processes the I/O via `blockif_proc`, which issues various blocking system calls. When the I/O completes, an ioemu-provided callback is called (for pci_ahci.c this would be `ata_ioreq_cb` or `atapi_ioreq_cb`; these update the emulation state etc., i.e. marking the command complete). This callback is passed `err`, which is either 0 (success) or an errno value (failure). Finally the `blkio` thread calls `blockif_complete`, which frees the `blockif_elem`.
This all seems reasonable enough.
However upon cancellation, which happens in `blockif_cancel` (in the case of pci_ahci.c this is called from `ahci_port_stop` on the `ioemu` thread), things are more complex.
If the `blockif_request` is not active, that is, the corresponding `blockif_elem` has not been claimed by the `blkio` thread (it is still on `bc->bc_pendq`), then `blockif_cancel` simply calls `blockif_complete`. Also simple enough.
However if the `blockif_request` is active, that is, the corresponding `blockif_elem` is on `bc->bc_busyq` and therefore has a non-zero `be_tid` and a `be_status` of BUSY, then `blockif_cancel` allocates a new `struct blockif_sig_elem bse` on the stack and adds it to the global `blockif_bse_head` (using `atomic_cmpset_ptr`). It then sends a `SIGCONT` to `be_tid` with `pthread_kill(be->be_tid, SIGCONT);` and blocks waiting for the embedded `bse_cond` to be signalled and the `blockif_sig_elem` marked complete.
So far so good, but at this point I lose track of what is going on, because the `SIGCONT` is delivered via `kqueue` to the `mevent` thread and the `blockif_sigcont_handler` callback, not to the `blkio` thread. The callback handler does nothing other than walk the global list, marking each `blockif_sig_elem` complete and kicking the corresponding `bse_cond` (which wakes the `ioemu` thread). It takes no action WRT the `blkio` thread.
The only way I can see this working on FreeBSD is that receiving the `SIGCONT` causes the system call the `blkio` thread is currently in to return with `EINTR` (or similar), while the actual signal is delivered to another thread via the `kevent`. This has a subtle dependency on the ordering of the events (the system call must return before the signal handler callback is called) and is not something made clear in any of the documentation I've been able to find.
I'm also not sure what happens if the `blkio` thread is merely on its way to making the system call at the point where the cancellation signal arrives. It seems like it would block once it actually made the call. That might be harmless (since things rely on these calls returning via their normal return path to signal the error), or it might result in things not being cancelled as expected.
This needs more thought and investigation; I shall ask on the bhyve list, hence setting down my understanding here.