Single node cluster (snc) scripts for OpenShift 4

How to use?

  • Clone this repo: git clone https://github.com/code-ready/snc.git
  • cd <directory_to_cloned_repo>
  • ./snc.sh

How to create a disk image?

  • Wait for the snc.sh script to finish successfully.
  • Wait around 30 minutes for the cluster to settle.
  • ./createdisk.sh crc-tmp-install-data

Monitoring

The installation is a long process; it can take up to 45 minutes. You can monitor its progress with kubectl.

$ export KUBECONFIG=<directory_to_cloned_repo>/crc-tmp-install-data/auth/kubeconfig
$ kubectl get pods --all-namespaces

Building SNC for OKD 4

  • Before running ./snc.sh, you need to create a pull secret file, and set a couple of environment variables to override the default behavior.
  • Select the OKD 4 release that you want to build from: https://origin-release.apps.ci.l2s4.p1.openshiftapps.com
  • For example, to build release: 4.5.0-0.okd-2020-08-12-020541
# Create a pull secret file

cat << EOF > /tmp/pull_secret.json
{"auths":{"fake":{"auth": "Zm9vOmJhcgo="}}}
EOF

# Set environment for OKD build
export OKD_VERSION=4.5.0-0.okd-2020-08-12-020541
export OPENSHIFT_PULL_SECRET_PATH="/tmp/pull_secret.json"

# Build the Single Node cluster
./snc.sh
  • When the build is complete, create the disk image:
export BUNDLED_PULL_SECRET_PATH="/tmp/pull_secret.json"
./createdisk.sh crc-tmp-install-data
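
The fake pull secret works for OKD because its images need no Red Hat entitlements; the `auth` value is nothing more than base64 of a dummy `user:password` pair. A quick sanity check (assuming a standard `base64` utility):

```shell
# "Zm9vOmJhcgo=" is just base64 for "foo:bar" (plus a trailing newline).
auth="Zm9vOmJhcgo="
decoded=$(printf '%s' "$auth" | base64 --decode)
echo "$decoded"    # foo:bar
```

Any well-formed dummy credential would do; the registry entry named "fake" is never contacted.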

Creating container image for bundles

After running snc.sh/createdisk.sh, the generated bundles can be uploaded to a container registry using this command:

./gen-bundle-image.sh <version> <openshift/okd/podman>

Note: a GPG key is needed to sign the bundles before they are wrapped in a container image.

Troubleshooting

The OpenShift installer creates two VMs, and it is sometimes useful to ssh into them. Add the following lines to your ~/.ssh/config file; you can then run ssh master and ssh bootstrap.

Host master
    Hostname 192.168.126.11
    User core
    IdentityFile <directory_to_cloned_repo>/id_ecdsa_crc
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null

Host bootstrap
    Hostname 192.168.126.10
    User core
    IdentityFile <directory_to_cloned_repo>/id_ecdsa_crc
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null

Environment Variables

The following environment variables can be used to change the default values of bundle generation.

SNC_GENERATE_MACOS_BUNDLE: if set to 0, bundle generation for macOS is disabled; any other value enables it.
SNC_GENERATE_WINDOWS_BUNDLE: if set to 0, bundle generation for Windows is disabled; any other value enables it.
SNC_GENERATE_LINUX_BUNDLE: if set to 0, bundle generation for Linux is disabled; any other value enables it.
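
A minimal sketch of how a script can honor such a toggle (the variable name is from the list above; the conditional itself is illustrative, not the scripts' actual code):

```shell
# Skip macOS bundle generation when SNC_GENERATE_MACOS_BUNDLE=0.
# Unset or any other value means "enabled" (the documented default behavior).
SNC_GENERATE_MACOS_BUNDLE=0

if [ "${SNC_GENERATE_MACOS_BUNDLE:-1}" = "0" ]; then
    echo "skipping macOS bundle"
else
    echo "generating macOS bundle"
fi
```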

Please note the SNC project is provided “as-is” in this GitHub repository. At this time, it is not an officially supported Red Hat solution.

Contributors

adrianriobo, anjannath, bttnns, cardil, cfergeau, cgruver, danielmenezesbr, eranco74, guillaumerose, mtarsel, noseka1, prashanth684, praveenkumar, redbeam, wking, zeenix

Issues

Put VM image file last in the tar archive

Currently, the files in the archive are in a random order. I believe it should be possible to specify in which order they are by listing all the files explicitly on tar cmdline (instead of just the directory), or by using the -T argument to tar.
The benefit of doing that would be that it would make it much faster to access the smaller files as we would not need to skip the multi-GB VM image first.
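
Listing the members explicitly on the tar command line preserves that order in the archive, which is what makes this fix possible. A sketch with stand-in file names:

```shell
# Scratch bundle dir: a small metadata file and a stand-in for the VM image.
dir=$(mktemp -d)
echo '{}' > "$dir/crc-bundle-info.json"
touch "$dir/crc.qcow2"    # stands in for the multi-GB disk image

# Naming members explicitly fixes their order: metadata first, image last.
tar -C "$dir" -cf bundle.tar crc-bundle-info.json crc.qcow2

tar -tf bundle.tar        # lists crc-bundle-info.json before crc.qcow2
rm -rf "$dir" bundle.tar
```

Readers of the archive can then reach the small files without seeking past the image.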

Update kubelet.service in 99_master-kubelet-no-taint.yaml file.

We are using an older service file that will not work on the 4.2.0 side without changes: the crio service no longer runs by default and is instead pulled in by the Wants=rpc-statd.service crio.service and After=crio.service lines of the kubelet unit file.

The current kubelet unit file on the 4.2 side looks like this:

$ sudo systemctl cat kubelet
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service crio.service
After=crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --rotate-certificates \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider= \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
       \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --v=3 \

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Increase default disk size

Currently the disk we create has a 14GB partition/filesystem for user data, with about half of it free. This turns out to be too small for some use cases, and enlarging it can be tricky in some cases.
Since we are using dynamically allocated disk images, changing the size to 20GB should not make much difference on the overall size. qcow2, vmdk, and maybe vhdx all support this.
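
Dynamically allocated formats only consume disk space for data actually written, so growing the virtual size from 14GB to 20GB barely changes the file on disk. The same effect can be seen with a plain sparse file (GNU `truncate`/`stat` assumed; growing the qcow2 itself would be `qemu-img resize crc.qcow2 20G`):

```shell
# A 20GB sparse file: apparent size is 20 GiB, actual allocation is near zero.
truncate -s 20G sparse.img
stat -c %s sparse.img    # 21474836480 (apparent size in bytes)
du -h sparse.img         # ~0, since no blocks were written
rm sparse.img
```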

snc.sh fails due to missing metadata.json file

I followed the steps in the README and executed ./snc.sh after copying the openshift-install binary, but it fails with the output below:

$ ./snc.sh
/home/dshah/bin/oc
/home/dshah/src/snc/yq
/usr/bin/jq
DEBUG OpenShift Installer unreleased-master-1133-ga99bd59f377226102c146befca60e948f0c601c8 
DEBUG Built from commit a99bd59f377226102c146befca60e948f0c601c8 
FATAL Failed while preparing to destroy cluster: open crc-tmp-install-data/metadata.json: no such file or directory 
OpenShift pull secret must be specified through the OPENSHIFT_PULL_SECRET environment variable

There is indeed no metadata.json file:

$ ls -a crc-tmp-install-data
.  ..  .openshift_install.log

I'm trying to do this on Fedora 30 system. Also, what's OPENSHIFT_PULL_SECRET supposed to be set to?

Block the createdisk script if cert is not rotated for 30 days

Right now we don't have any check for cert rotation. We should first check that the cert was rotated successfully with at least 30 days of validity, and only then start the disk creation process; otherwise, error out and tell the user to wait. This will help our CI and also users who want to create the bundle using this script.
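
One way to implement such a check is to compare the certificate's `notAfter` date against the current time and require at least 30 days of validity. A sketch using GNU `date` arithmetic; the expiry string below is a placeholder, where a real script would obtain it from `openssl x509 -enddate -noout -in <cert.pem>`:

```shell
# Hypothetical expiry string, in the format openssl prints after "notAfter=".
not_after="Dec 31 23:59:59 2099 GMT"

expiry_epoch=$(date -d "$not_after" +%s)
now_epoch=$(date +%s)
days_left=$(( (expiry_epoch - now_epoch) / 86400 ))

if [ "$days_left" -lt 30 ]; then
    echo "cert expires in ${days_left} days; wait for rotation before createdisk"
else
    echo "cert valid for ${days_left} days; OK to create the disk"
fi
```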

Convert image to vhdx for use with Hyper-V

The image needs additional packages installed and must be converted to a dynamic virtual disk:

$ ${SSH} core@api.${CRC_VM_NAME}.${BASE_DOMAIN} rpm-ostree install hyperv-daemons
$ qemu-img convert -f qcow2 -O vhdx -o subformat=dynamic crc.qcow2 crc.vhdx

snc.sh fails with SSH "failed to create SSH client" error

Hi, I can't build the image for CRC to start with as when I run snc.sh I keep getting errors relating to SSH client:

time="2019-07-26T16:32:07+02:00" level=debug msg="Apply complete! Resources: 9 added, 0 changed, 0 destroyed."
time="2019-07-26T16:32:07+02:00" level=debug msg="OpenShift Installer unreleased-master-1440-g6e3bad2ccdc9960ecf5caf20e6b4fff03d00aadb"
time="2019-07-26T16:32:07+02:00" level=debug msg="Built from commit 6e3bad2ccdc9960ecf5caf20e6b4fff03d00aadb"
time="2019-07-26T16:32:07+02:00" level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.crc.testing:6443..."
time="2019-07-26T16:32:10+02:00" level=debug msg="Still waiting for the Kubernetes API: Get https://api.crc.testing:6443/version?timeout=32s: dial tcp 192.168.126.11:6443: connect: no route to host"
time="2019-07-26T16:32:15+02:00" level=debug msg="Still waiting for the Kubernetes API: Get https://api.crc.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
...
time="2019-07-26T17:01:40+02:00" level=debug msg="Still waiting for the Kubernetes API: the server could not find the requested resource"
time="2019-07-26T17:02:07+02:00" level=debug msg="Fetching \"Install Config\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="Loading \"Install Config\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="  Loading \"SSH Key\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="  Loading \"Base Domain\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="    Loading \"Platform\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="  Loading \"Cluster Name\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="    Loading \"Base Domain\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="  Loading \"Pull Secret\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="  Loading \"Platform\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="Using \"Install Config\" loaded from state file"
time="2019-07-26T17:02:07+02:00" level=debug msg="Reusing previously-fetched \"Install Config\""
time="2019-07-26T17:02:07+02:00" level=info msg="Pulling debug logs from the bootstrap machine"
time="2019-07-26T17:04:14+02:00" level=error msg="failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: dial tcp 31.199.53.9:22: connect: connection timed out"
time="2019-07-26T17:04:14+02:00" level=fatal msg="waiting for Kubernetes API: context deadline exceeded"
  • Setup:
$ ./openshift-install version
./openshift-install unreleased-master-1440-g6e3bad2ccdc9960ecf5caf20e6b4fff03d00aadb
built from commit 6e3bad2ccdc9960ecf5caf20e6b4fff03d00aadb
release image registry.svc.ci.openshift.org/origin/release:4.2

I'm on RHEL 7.6:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 7.6 (Maipo)
$ uname -a
Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Fri Apr 19 21:09:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

dep version:

dep version
dep:
 version     : v0.5.4
 build date  : 2019-07-01
 git hash    : 1f7c19e
 go version  : go1.12.6
 go compiler : gc
 platform    : linux/amd64
 features    : ImportDuringSolve=false
  • Steps to reproduce:

Go through:
https://github.com/code-ready/snc/blob/master/README.md
https://github.com/openshift/installer/blob/master/docs/dev/libvirt/README.md#one-time-setup
https://github.com/openshift/installer/blob/master/docs/dev/libvirt/README.md#build-the-installer

And finally:

$ cp <built_installer_binary> <directory_to_cloned_repo>
$ cd <directory_to_cloned_repo>
$ ./snc.sh
  • OpenShift installer
    openshift-installer was built from master branch sources as well, with TAGS=libvirt set.
    The pull secret was obtained from https://try.openshift.com and stored in an environment variable as expected.

I've checked my OS configuration with respect to virtualization and then installed the latest qemu release from RPMs, as the one delivered with RHEL 7.6 wasn't accepting one of the parameters passed by the shell script.
I've tested the libvirt/qemu setup by creating a RHEL 7 Workstation guest from an ISO via virt-install, and also by configuring and running minishift. Both tests were successful.

Is there any information from your side on this?
Thanks.

snc build: Unknown name requested, could not find keepalived-ipfailover in UpdatePayload

I installed CentOS 7 on the laptop along with all dependencies for openshift-install and the snc script
(adding the kvm-common repository to get the latest qemu-kvm).
When I run the snc script, I get an error that the master was not started within 30 minutes.

[user@ce23652 snc]$ sudo virsh list --all
[sudo] password for user: 
 Id    Name                           State
----------------------------------------------------
 3     crc-85mk2-bootstrap            running
 4     crc-85mk2-master-0             running
[user@ce23652 snc]$ sudo virsh console 3
Connected to domain crc-85mk2-bootstrap
Escape character is ^]
[ 3126.214688] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[ 3126.884085] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[ 3127.608045] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[ 3128.289552] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[user@ce23652 snc]$ sudo virsh console 4
Connected to domain crc-85mk2-master-0
Escape character is ^]
[**    ] A start job is running for Ignition (disks) (52min 7s / no limit)[ 3130.116850] ignition[626]: GET https://api-int.crc.testing:22623/config/master: attempt #629
[ 3130.133065] ignition[626]: GET error: Get https://api-int.crc.testing:22623/config/master: dial tcp 192.168.126.10:22623: connect: connection refused
[  *** ] A start job is running for Ignition (disks) (52min 9s / no limit)

My crc-xxx-bootstrap image does not start.

5b81 (image=quay.io/openshift-release-dev/ocp-release:4.1.9, name=stupefied_gagarin)
Aug 08 23:47:59 crc-8mjbc-bootstrap podman[2831]: 2019-08-08 23:47:59.806207983 +0000 UTC m=+0.303240560 container start ba321b1f5f60edd9596199dea4d33999a5d6b57cb9f3852e27d6741ecc645b81 (image=quay.io/openshift-release-dev/ocp-release:4.1.9, name=stupefied_gagarin)
Aug 08 23:47:59 crc-8mjbc-bootstrap podman[2831]: 2019-08-08 23:47:59.806261766 +0000 UTC m=+0.303294302 container attach ba321b1f5f60edd9596199dea4d33999a5d6b57cb9f3852e27d6741ecc645b81 (image=quay.io/openshift-release-dev/ocp-release:4.1.9, name=stupefied_gagarin)
Aug 08 23:47:59 crc-8mjbc-bootstrap bootkube.sh[1483]: F0808 23:47:59.947444       1 image.go:32] error: error: Unknown name requested, could not find keepalived-ipfailover in UpdatePayload
Aug 08 23:48:00 crc-8mjbc-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=255/n/a
Aug 08 23:48:00 crc-8mjbc-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
Aug 08 23:48:05 crc-8mjbc-bootstrap systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Aug 08 23:48:05 crc-8mjbc-bootstrap systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 1.
Aug 08 23:48:05 crc-8mjbc-bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Am I missing something?
What is the best OS to build the bundle on?

Remove pull secret from the disk image.

We need to remove the pull secret from the disk image so we can easily share it. Right now it is present in 2 different places, both of which should be scrubbed.

$ cat pull-secret.yaml 
apiVersion: v1
data:
  .dockerconfigjson: e30K
kind: Secret
metadata:
  name: pull-secret
  namespace: openshift-config
type: kubernetes.io/dockerconfigjson

$ oc get secrets pull-secret -n openshift-config -oyaml
  • /var/lib/kubelet/config.json
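
The in-cluster copy can be blanked by patching the secret with an empty docker config; the `e30K` value shown above is simply base64 for `{}` plus a newline. A hedged sketch (the `oc patch` call requires a live cluster; the in-VM /var/lib/kubelet/config.json would still need to be scrubbed separately):

```shell
# "e30K" decodes to "{}" -- an empty dockerconfigjson.
echo 'e30K' | base64 --decode    # prints {}

# Sketch: overwrite the cluster-wide pull secret with the empty config.
if command -v oc >/dev/null 2>&1; then
    oc patch secret pull-secret -n openshift-config --type merge \
        -p '{"data":{".dockerconfigjson":"e30K"}}'
fi
```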

Provide a way to undo changes to system

I spent 2 days trying to debug why https://github.com/openshift/installer stopped working for me, only to find out that a recent run of the SNC script had left behind /etc/NetworkManager/dnsmasq.d/openshift.conf, which takes precedence over the /etc/NetworkManager/conf.d/openshift.conf that the installer uses.

We should provide a way to undo this (or any other relevant host configurations SNC script does).
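
A minimal cleanup sketch; the path is the one named above, and the CLEANUP_FILES list is illustrative (a real implementation would track exactly what the script created):

```shell
# Remove host-level files the snc script may have left behind.
# CLEANUP_FILES is a hypothetical list, not the script's actual interface.
CLEANUP_FILES="/etc/NetworkManager/dnsmasq.d/openshift.conf"

for f in $CLEANUP_FILES; do
    if [ -e "$f" ]; then
        echo "removing $f"
        rm -f "$f"    # would typically need sudo on a real host
    fi
done
```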

Wait until the VM is shutdown after `virsh shutdown` call

createdisk.sh calls virsh shutdown to cleanly stop the crc VM. However, this call is asynchronous, so if we don't wait until the VM has fully shut down, the next step could end up working from a corrupted image (because it was being modified while we were reading it).
Calling virsh event would hopefully do that job for us without needing to poll. Alternatively virsh domstate will tell us if the VM is still running or not.
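
A polling fallback can be sketched as a generic wait loop; `vm_state` below is a stub standing in for `virsh domstate "$domain"` (the `virsh event` approach would avoid polling entirely):

```shell
# Poll until the domain reports "shut off", or give up after ~2 minutes.
vm_state() { echo "shut off"; }    # stub; real code would call: virsh domstate "$domain"

wait_for_shutdown() {
    tries=0
    until [ "$(vm_state)" = "shut off" ]; do
        tries=$((tries + 1))
        [ "$tries" -ge 120 ] && return 1    # timeout
        sleep 1
    done
    return 0
}

wait_for_shutdown && echo "VM is down; safe to read the disk image"
```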

How to create CRC bundle

Do I use SNC to build a bundle for CRC?

I'd like to use CRC, but it requires a bundle file which I don't see how to build.

Any help is appreciated.

J

Workaround hyperkit qcow2 bug

This is a workaround for an old issue that the project has been having, see crc-org/osp4#9
Some qcow2 images created by qemu-img cause a freeze in the hyperkit qcow2 implementation, see moby/hyperkit#221
Enabling lazy refcounts in the qcow2 image seems to avoid this issue, and this is a performance improvement which should not have many adverse effects (slower startup in case of improper VM shutdown), see https://lists.gnu.org/archive/html/qemu-devel/2012-06/msg03827.html

Issues with bundle creation targeting Windows 10 /Hyper-V

Hi,

I'm trying to build my own Hyper-V bundle and test it on Windows 10.
I get an error: id_rsa_crc: Access is denied
Do I need to set up any special rights on Windows 10?

λ  .\crc.exe version                                                                        
version: 0.89.1-alpha-4.1.6+3946ae0                                                         
C:\work\programs\crc            
λ  .\crc.exe start --log-level debug -b crc_hyperv_4.1.6.crcbundle                                                                                                             
INFO Checking if oc binary is cached                                                                                                                                           
DEBU oc binary already cached                                                                                                                                                  
INFO Checking if CRC bundle is cached in '$HOME/.crc'                                                                                                                          
INFO Check Windows 10 release                                                                                                                                                  
INFO Hyper-V installed and operational                                                                                                                                         
INFO Is user a member of the Hyper-V Administrators group                                                                                                                      
INFO Does the Hyper-V virtual switch exist                                                                                                                                     
INFO Extracting the crc_hyperv_4.1.6.crcbundle Bundle tarball ...                                                                                                              
ERRO Error occurred: Error to get bundle Metadata Error during extraction : open C:\Users\apetras\.crc\cache\crc_hyperv_2019-08-07\id_rsa_crc: Access is denied.               
C:\work\programs\crc                                                                                                                                                                                                            

Need update for crc-bundle-info file creation (extra slash)

update_json_description crc-tmp-install-data/
+ cat crc-tmp-install-data//crc-bundle-info.json
+ jq '.clusterInfo.masterHostname = "crc-fdzt7-master-0"'
+ jq '.clusterInfo.sshPrivateKeyFile = "id_rsa_crc"'
+ jq '.clusterInfo.kubeConfig = "kubeconfig"'
+ jq '.clusterInfo.kubeadminPasswordFile = "kubeadmin-password"'
+ jq '.storage.diskImages[0].name = "crc.qcow2"'
+ jq '.storage.diskImages[0].format = "qcow2"'
cat: crc-tmp-install-data//crc-bundle-info.json: No such file or directory
+ rm crc-tmp-install-data//crc-bundle-info.json
rm: cannot remove ‘crc-tmp-install-data//crc-bundle-info.json’: No such file or directory
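
The double slash comes from joining a user-supplied directory that already ends in `/` with `/crc-bundle-info.json`. Stripping any trailing slash with POSIX parameter expansion fixes the join:

```shell
# User passed "crc-tmp-install-data/"; a naive join yields a double slash.
dir="crc-tmp-install-data/"
echo "${dir}/crc-bundle-info.json"    # crc-tmp-install-data//crc-bundle-info.json

# Strip any trailing slash before joining.
dir="${dir%/}"
echo "${dir}/crc-bundle-info.json"    # crc-tmp-install-data/crc-bundle-info.json
```

The same expansion is safe when the argument has no trailing slash, so it can be applied unconditionally.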

Don't disable the marketplace operator

Right now we disable the marketplace operator by default, but it is not taking as much resource as we anticipated. Most operator developers need it during the development phase, and users need it to install operators as well.

2 routers being created with a single node

crc version
crc - Local OpenShift 4.x cluster
version: 0.87.0-alpha-4.1.0+3a5033a

There is a pending router-default-X pod after a successful deployment:

oc get pods --all-namespaces  | grep -v -E 'Running|Completed'
NAMESPACE                                               NAME                                                              READY   STATUS      RESTARTS   AGE
openshift-ingress                                       router-default-5fdbdc678-s8bn2                                    0/1     Pending     0          10d

oc get events
LAST SEEN   TYPE      REASON             OBJECT                               MESSAGE
5m8s        Normal    Pulled             pod/router-default-5fdbdc678-9xqcc   Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7afd1b6aace6db643532680ca61761cf66ded116f8f673ec89121dbd424b2a15" already present on machine
5m2s        Normal    Created            pod/router-default-5fdbdc678-9xqcc   Created container router
5m2s        Normal    Started            pod/router-default-5fdbdc678-9xqcc   Started container router
9d          Warning   FailedScheduling   pod/router-default-5fdbdc678-s8bn2   0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
2m8s        Warning   FailedScheduling   pod/router-default-5fdbdc678-s8bn2   0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.

As only a single node is deployed, ports 80/443 are already bound by the other router:

oc get pods
NAME                             READY   STATUS    RESTARTS   AGE
router-default-5fdbdc678-9xqcc   1/1     Running   0          10d
router-default-5fdbdc678-s8bn2   0/1     Pending   0          10

As a workaround, scale the replicas down to 1:

oc patch \
   --namespace=openshift-ingress-operator \
   --patch='{"spec": {"replicas": 1}}' \
   --type=merge \
   ingresscontroller/default

Update network config and add `api-int` entry for dns.

OpenShift now needs the api-int name to resolve as part of the network configuration; we need to add it to our libvirt template:

https://github.com/code-ready/snc/blob/master/crc_libvirt.template#L100-L101

<host ip='192.168.126.11'> 
 <hostname>api.ReplaceMeWithCorrectVmName.ReplaceMeWithCorrectBaseDomain</hostname>
 <hostname>api-int.ReplaceMeWithCorrectVmName.ReplaceMeWithCorrectBaseDomain</hostname>
 <hostname>etcd-0.ReplaceMeWithCorrectVmName.ReplaceMeWithCorrectBaseDomain</hostname>
</host>

snc.sh fails due to connection timeout

While executing ./snc.sh on an Ubuntu machine, the script fails with the following error:

ERROR                                              
ERROR Error: Unassigned variable                   
ERROR                                              
ERROR The input variable "libvirt_master_ips" has not been assigned a value. This is 
ERROR a bug in Terraform; please report it in a GitHub issue. 
ERROR                                              
ERROR                                              
ERROR Error: Unassigned variable                   
ERROR                                              
ERROR The input variable "libvirt_uri" has not been assigned a value. This is a bug 
ERROR in Terraform; please report it in a GitHub issue. 
ERROR                                              
ERROR                                              
ERROR Error: Unassigned variable                   
ERROR                                              
ERROR The input variable "libvirt_bootstrap_ip" has not been assigned a value. This 
ERROR is a bug in Terraform; please report it in a GitHub issue. 
ERROR                                              
ERROR                                              
ERROR Error: Unassigned variable                   
ERROR                                              
ERROR The input variable "libvirt_network_if" has not been assigned a value. This is 
ERROR a bug in Terraform; please report it in a GitHub issue. 
ERROR                                              
ERROR                                              
ERROR Error: Unassigned variable                   
ERROR                                              
ERROR The input variable "os_image" has not been assigned a value. This is a bug in 
ERROR Terraform; please report it in a GitHub issue. 
ERROR                                              
ERROR Failed to read tfstate: open /tmp/openshift-install-103064960/terraform.tfstate: no such file or directory 
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform 
This is known to fail with:
'pool master is not ready - timed out waiting for the condition'
see https://github.com/openshift/machine-config-operator/issues/579
ssh: Could not resolve hostname api.crc.testing: Name or service not known
error: unable to recognize "STDIN": Get https://api.crc.testing:6443/api?timeout=32s: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
Unable to connect to the server: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
(the last two messages repeat many more times)
Unable to connect to the server: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
Unable to connect to the server: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
Unable to connect to the server: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
Unable to connect to the server: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
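These lookup failures mean the host cannot resolve api.crc.testing through systemd-resolved (127.0.0.53). snc.sh normally wires this name up via the libvirt network and NetworkManager's dnsmasq; a minimal sketch of such an entry, where the file path and IP address are assumptions to verify against your own setup (e.g. with `virsh net-dhcp-leases`):

```
# /etc/NetworkManager/dnsmasq.d/crc-snc.conf  (assumed path)
# Resolve the cluster domain to the VM's address on the libvirt network.
# 192.168.126.11 is a typical snc master address; verify on your host.
address=/.crc.testing/192.168.126.11
```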

Add hyperkit support

hyperkit support needs additional files/metadata:

  • we need to get the correct vmlinuz and initramfs, probably extracting them from the VM image
  • crc-bundle-info.json needs to describe these files
  • then we'll either need to also extract the full kernel command line for use by hyperkit, or add the ostree hash, the disk UUID, and the boot 'index' to the metadata file. The full kernel command line is similar to CMDLINE="BOOT_IMAGE=/ostree/rhcos-$OSTREE_HASH/$KERNEL console=tty0 console=ttyS0,115200n8 rootflags=defaults,prjquota rw root=UUID=$ROOT_UUID ostree=/ostree/boot.0/rhcos/$OSTREE_HASH/0 coreos.oem.id=qemu ignition.platform.id=qemu" (the so-called 'boot index' is the '0' which appears twice in the ostree=... part). This data will go in crc-bundle-info.json
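A hedged sketch of how this metadata could look inside crc-bundle-info.json; the key names below are illustrative assumptions, not the established bundle format, and the hash/UUID placeholders must come from the actual image:

```
{
  "nodes": [
    {
      "kind": ["master"],
      "kernel": "vmlinuz",
      "initramfs": "initramfs.img",
      "kernelCmdLine": "BOOT_IMAGE=/ostree/rhcos-<OSTREE_HASH>/vmlinuz console=tty0 console=ttyS0,115200n8 rootflags=defaults,prjquota rw root=UUID=<ROOT_UUID> ostree=/ostree/boot.0/rhcos/<OSTREE_HASH>/0 coreos.oem.id=qemu ignition.platform.id=qemu"
    }
  ]
}
```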

Creates worker nodes

After running the provided script, the cluster comes up, but it has worker nodes. It's probably just a matter of setting replicas to 0 in the install config. I'm trying that right now.
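A sketch of the relevant install-config.yaml fragment, assuming the standard installer config layout (only the machine pools are shown here):

```
compute:
- name: worker
  replicas: 0    # do not create any worker machines
controlPlane:
  name: master
  replicas: 1    # single schedulable control-plane node
```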

Remove the state of pods and container before creating the diskimage

We need to remove all pods/containers before creating the disk image, so that when the image is later used by crc it does not carry over the state of a cluster that was shut down. This could be part of createdisk, since it runs just before the image is built; alternatively, we can create a separate script containing these steps (together with the deployment/replica changes) that runs after the cert rotation on the snc side but before createdisk.

=== On the VM ===

$ systemctl stop kubelet
$ crictl stopp $(crictl pods -q)
$ crictl rmp $(crictl pods -q)

Force certificate rotation after creating the cluster

We currently need to wait 24h after creating the cluster before getting certificates valid for one month. We should have a way of forcing the rotation of the certificates without waiting for 24h.

Experimenting with the commands below seemed to help:

# validity is 30 times the base
oc create -n openshift-config configmap unsupported-cert-rotation-config --from-literal='base=30s'

# forcing rotation
oc get secret -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | .!=null and fromdateiso8601<='$( date --date='+1year' +%s )') | "-n \(.metadata.namespace) \(.metadata.name)"' | xargs -n3 oc patch secret -p='{"metadata": {"annotations": {"auth.openshift.io/certificate-not-after": null}}}'

# Wait ~ 5-10 minutes

# Make sure at least the apiserver serving cert has 15 min validity (change your cluster name based on your kubeconfig)
openssl s_client -connect api.tnozicka-1.devcluster.openshift.com:6443 </dev/null | openssl x509 -noout -dates


# go back to normal certrotation setting
oc delete -n openshift-config configmap unsupported-cert-rotation-config

Project/namespace deletion goes into a hanging state.

On the CRC side, when a user creates a new project/namespace and then tries to delete it, the deletion hangs forever. Looking at the Kubernetes side, this occurs because of stale aggregated APIs backed by CRDs [0][1]. As part of CRC creation we scale down the cluster-wide monitoring operator [2], which, as I understand it, serves `v1beta1.metrics.k8s.io`. If we also delete that apiservice it should not break anything for users, and if a user wants to try cluster-wide monitoring it will be created again along with the operator.

[0] kubernetes/kubernetes#60807
[1] kubernetes/kubernetes#60807 (comment)
[2] https://github.com/code-ready/snc/blob/master/snc.sh#L184-L186
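A hedged sketch of the proposed cleanup, assuming the stale apiservice is the `v1beta1.metrics.k8s.io` one named above (to be run against the bundle cluster, after the monitoring operator is scaled down):

```shell
# Remove the stale aggregated API service so namespace deletion
# no longer blocks on an unreachable metrics backend; it is
# recreated if monitoring is scaled back up.
oc delete apiservice v1beta1.metrics.k8s.io
```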

Add virt-sparsify

$ qemu-img info crc.qcow2 
image: crc.qcow2
file format: qcow2
virtual size: 31G (33285996544 bytes)
disk size: 9.0G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: true
    refcount bits: 16
    corrupt: false

$ virt-sparsify ./crc.qcow2 ./crc_sparse.qcow2 --tmp tmp/
[   0.1] Create overlay file in tmp/ to protect source disk
[   0.1] Examine source disk
 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ --:--
[   5.9] Fill free space in /dev/sda2 with zero
[   6.7] Fill free space in /dev/sda3 with zero
 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
[  47.1] Copy to destination and make sparse
[  98.0] Sparsify operation completed with no errors.
virt-sparsify: Before deleting the old disk, carefully check that the 
target disk boots and works correctly.


$ qemu-img info crc_sparse.qcow2 
image: crc_sparse.qcow2
file format: qcow2
virtual size: 31G (33285996544 bytes)
disk size: 8.3G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
