Single node cluster (snc) scripts for OpenShift 4

How to use?

  • Clone this repo: git clone https://github.com/code-ready/snc.git
  • cd <directory_to_cloned_repo>
  • ./snc.sh

How to create a disk image?

  • Wait for the snc.sh script to finish successfully.
  • Wait around 30 minutes for the cluster to settle.
  • ./createdisk.sh crc-tmp-install-data

Monitoring

The installation is a long process; it can take up to 45 minutes. You can monitor its progress with kubectl.

$ export KUBECONFIG=<directory_to_cloned_repo>/crc-tmp-install-data/auth/kubeconfig
$ kubectl get pods --all-namespaces

Building SNC for OKD 4

  • Before running ./snc.sh, you need to create a pull secret file, and set a couple of environment variables to override the default behavior.
  • Select the OKD 4 release that you want to build from: https://origin-release.apps.ci.l2s4.p1.openshiftapps.com
  • For example, to build release: 4.5.0-0.okd-2020-08-12-020541
# Create a pull secret file

cat << EOF > /tmp/pull_secret.json
{"auths":{"fake":{"auth": "Zm9vOmJhcgo="}}}
EOF

# Set environment for OKD build
export OKD_VERSION=4.5.0-0.okd-2020-08-12-020541
export OPENSHIFT_PULL_SECRET_PATH="/tmp/pull_secret.json"

# Build the Single Node cluster
./snc.sh
  • When the build is complete, create the disk image:
export BUNDLED_PULL_SECRET_PATH="/tmp/pull_secret.json"
./createdisk.sh crc-tmp-install-data
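
The fake pull secret works for OKD because its images need no Red Hat entitlements; the `auth` value is nothing more than base64 of a dummy `user:password` pair. A quick sanity check (assuming a standard `base64` utility):

```shell
# "Zm9vOmJhcgo=" is just base64 for "foo:bar" (plus a trailing newline).
auth="Zm9vOmJhcgo="
decoded=$(printf '%s' "$auth" | base64 --decode)
echo "$decoded"    # foo:bar
```

Any well-formed dummy credential would do; the registry entry named "fake" is never contacted.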

Creating container image for bundles

After running snc.sh/createdisk.sh, the generated bundles can be uploaded to a container registry using this command:

./gen-bundle-image.sh <version> <openshift/okd/podman>

Note: a GPG key is needed to sign the bundles before they are wrapped in a container image.

Troubleshooting

The OpenShift installer creates two VMs, and it is sometimes useful to ssh into them. Add the following lines to your ~/.ssh/config file; you can then run ssh master and ssh bootstrap.

Host master
    Hostname 192.168.126.11
    User core
    IdentityFile <directory_to_cloned_repo>/id_ecdsa_crc
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null

Host bootstrap
    Hostname 192.168.126.10
    User core
    IdentityFile <directory_to_cloned_repo>/id_ecdsa_crc
    StrictHostKeyChecking no
    UserKnownHostsFile /dev/null

Environment Variables

The following environment variables can be used to change the default values of bundle generation.

SNC_GENERATE_MACOS_BUNDLE: if set to 0, bundle generation for macOS is disabled; any other value enables it.
SNC_GENERATE_WINDOWS_BUNDLE: if set to 0, bundle generation for Windows is disabled; any other value enables it.
SNC_GENERATE_LINUX_BUNDLE: if set to 0, bundle generation for Linux is disabled; any other value enables it.
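
A minimal sketch of how a script can honor such a toggle (the variable name is from the list above; the conditional itself is illustrative, not the scripts' actual code):

```shell
# Skip macOS bundle generation when SNC_GENERATE_MACOS_BUNDLE=0.
# Unset or any other value means "enabled" (the documented default behavior).
SNC_GENERATE_MACOS_BUNDLE=0

if [ "${SNC_GENERATE_MACOS_BUNDLE:-1}" = "0" ]; then
    echo "skipping macOS bundle"
else
    echo "generating macOS bundle"
fi
```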

Please note the SNC project is provided “as-is” in this GitHub repository. At this time, it is not an officially supported Red Hat solution.

Contributors

adrianriobo, anjannath, bttnns, cardil, cfergeau, cgruver, danielmenezesbr, eranco74, guillaumerose, mtarsel, noseka1, prashanth684, praveenkumar, redbeam, wking, zeenix

Issues

Put VM image file last in the tar archive

Currently, the files in the archive are in a random order. I believe it should be possible to specify in which order they are by listing all the files explicitly on tar cmdline (instead of just the directory), or by using the -T argument to tar.
The benefit of doing that would be that it would make it much faster to access the smaller files as we would not need to skip the multi-GB VM image first.
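
Listing the members explicitly on the tar command line preserves that order in the archive, which is what makes this fix possible. A sketch with stand-in file names:

```shell
# Scratch bundle dir: a small metadata file and a stand-in for the VM image.
dir=$(mktemp -d)
echo '{}' > "$dir/crc-bundle-info.json"
touch "$dir/crc.qcow2"    # stands in for the multi-GB disk image

# Naming members explicitly fixes their order: metadata first, image last.
tar -C "$dir" -cf bundle.tar crc-bundle-info.json crc.qcow2

tar -tf bundle.tar        # lists crc-bundle-info.json before crc.qcow2
rm -rf "$dir" bundle.tar
```

Readers of the archive can then reach the small files without seeking past the image.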

Update kubelet.service in 99_master-kubelet-no-taint.yaml file.

We are using an older service file that will not work on the 4.2.0 side without changes: the crio service no longer runs by default and is instead pulled in by the Wants=rpc-statd.service crio.service and After=crio.service lines of the kubelet unit file.

The current kubelet unit file on the 4.2 side looks like this:

$ sudo systemctl cat kubelet
# /etc/systemd/system/kubelet.service
[Unit]
Description=Kubernetes Kubelet
Wants=rpc-statd.service crio.service
After=crio.service

[Service]
Type=notify
ExecStartPre=/bin/mkdir --parents /etc/kubernetes/manifests
ExecStartPre=/bin/rm -f /var/lib/kubelet/cpu_manager_state
EnvironmentFile=/etc/os-release
EnvironmentFile=-/etc/kubernetes/kubelet-workaround
EnvironmentFile=-/etc/kubernetes/kubelet-env

ExecStart=/usr/bin/hyperkube \
    kubelet \
      --config=/etc/kubernetes/kubelet.conf \
      --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig \
      --rotate-certificates \
      --kubeconfig=/var/lib/kubelet/kubeconfig \
      --container-runtime=remote \
      --container-runtime-endpoint=/var/run/crio/crio.sock \
      --node-labels=node-role.kubernetes.io/master,node.openshift.io/os_id=${ID} \
      --minimum-container-ttl-duration=6m0s \
      --cloud-provider= \
      --volume-plugin-dir=/etc/kubernetes/kubelet-plugins/volume/exec \
       \
      --register-with-taints=node-role.kubernetes.io/master=:NoSchedule \
      --v=3 \

Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target

Increase default disk size

Currently the disk we create has a 14GB partition/filesystem for user data, with about half of it free. This turns out to be too small for some use cases, and enlarging it can be tricky in some cases.
Since we are using dynamically allocated disk images, changing the size to 20GB should not make much difference on the overall size. qcow2, vmdk, and maybe vhdx all support this.
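
Dynamically allocated formats only consume disk space for data actually written, so growing the virtual size from 14GB to 20GB barely changes the file on disk. The same effect can be seen with a plain sparse file (GNU `truncate`/`stat` assumed; growing the qcow2 itself would be `qemu-img resize crc.qcow2 20G`):

```shell
# A 20GB sparse file: apparent size is 20 GiB, actual allocation is near zero.
truncate -s 20G sparse.img
stat -c %s sparse.img    # 21474836480 (apparent size in bytes)
du -h sparse.img         # ~0, since no blocks were written
rm sparse.img
```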

snc.sh fails due to missing metadata.json file

I followed the steps in the README and executed ./snc.sh after copying the openshift-install binary, but it fails with the output below:

$ ./snc.sh
/home/dshah/bin/oc
/home/dshah/src/snc/yq
/usr/bin/jq
DEBUG OpenShift Installer unreleased-master-1133-ga99bd59f377226102c146befca60e948f0c601c8 
DEBUG Built from commit a99bd59f377226102c146befca60e948f0c601c8 
FATAL Failed while preparing to destroy cluster: open crc-tmp-install-data/metadata.json: no such file or directory 
OpenShift pull secret must be specified through the OPENSHIFT_PULL_SECRET environment variable

There is indeed no metadata.json file:

$ ls -a crc-tmp-install-data
.  ..  .openshift_install.log

I'm trying to do this on Fedora 30 system. Also, what's OPENSHIFT_PULL_SECRET supposed to be set to?

Block the createdisk script if cert is not rotated for 30 days

Right now we don't have any check for cert rotation. We should first check that the cert was rotated successfully with at least 30 days of validity, and only then start the disk creation process; otherwise, error out and tell the user to wait. This will help our CI and also users who want to create the bundle using this script.
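
One way to implement such a check is to compare the certificate's `notAfter` date against the current time and require at least 30 days of validity. A sketch using GNU `date` arithmetic; the expiry string below is a placeholder, where a real script would obtain it from `openssl x509 -enddate -noout -in <cert.pem>`:

```shell
# Hypothetical expiry string, in the format openssl prints after "notAfter=".
not_after="Dec 31 23:59:59 2099 GMT"

expiry_epoch=$(date -d "$not_after" +%s)
now_epoch=$(date +%s)
days_left=$(( (expiry_epoch - now_epoch) / 86400 ))

if [ "$days_left" -lt 30 ]; then
    echo "cert expires in ${days_left} days; wait for rotation before createdisk"
else
    echo "cert valid for ${days_left} days; OK to create the disk"
fi
```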

Convert image to vhdx for use with Hyper-V

The image needs additional packages installed and must be converted to a dynamic virtual disk:

$ ${SSH} core@api.${CRC_VM_NAME}.${BASE_DOMAIN} rpm-ostree install hyperv-daemons
$ qemu-img convert -f qcow2 -O vhdx -o subformat=dynamic crc.qcow2 crc.vhdx

snc.sh fails with SSH "failed to create SSH client" error

Hi, I can't build the image for CRC to start with as when I run snc.sh I keep getting errors relating to SSH client:

time="2019-07-26T16:32:07+02:00" level=debug msg="Apply complete! Resources: 9 added, 0 changed, 0 destroyed."
time="2019-07-26T16:32:07+02:00" level=debug msg="OpenShift Installer unreleased-master-1440-g6e3bad2ccdc9960ecf5caf20e6b4fff03d00aadb"
time="2019-07-26T16:32:07+02:00" level=debug msg="Built from commit 6e3bad2ccdc9960ecf5caf20e6b4fff03d00aadb"
time="2019-07-26T16:32:07+02:00" level=info msg="Waiting up to 30m0s for the Kubernetes API at https://api.crc.testing:6443..."
time="2019-07-26T16:32:10+02:00" level=debug msg="Still waiting for the Kubernetes API: Get https://api.crc.testing:6443/version?timeout=32s: dial tcp 192.168.126.11:6443: connect: no route to host"
time="2019-07-26T16:32:15+02:00" level=debug msg="Still waiting for the Kubernetes API: Get https://api.crc.testing:6443/version?timeout=32s: dial tcp 192.168.126.10:6443: connect: connection refused"
...
time="2019-07-26T17:01:40+02:00" level=debug msg="Still waiting for the Kubernetes API: the server could not find the requested resource"
time="2019-07-26T17:02:07+02:00" level=debug msg="Fetching \"Install Config\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="Loading \"Install Config\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="  Loading \"SSH Key\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="  Loading \"Base Domain\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="    Loading \"Platform\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="  Loading \"Cluster Name\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="    Loading \"Base Domain\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="  Loading \"Pull Secret\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="  Loading \"Platform\"..."
time="2019-07-26T17:02:07+02:00" level=debug msg="Using \"Install Config\" loaded from state file"
time="2019-07-26T17:02:07+02:00" level=debug msg="Reusing previously-fetched \"Install Config\""
time="2019-07-26T17:02:07+02:00" level=info msg="Pulling debug logs from the bootstrap machine"
time="2019-07-26T17:04:14+02:00" level=error msg="failed to create SSH client, ensure the proper ssh key is in your keyring or specify with --key: dial tcp 31.199.53.9:22: connect: connection timed out"
time="2019-07-26T17:04:14+02:00" level=fatal msg="waiting for Kubernetes API: context deadline exceeded"
  • Setup:
$ ./openshift-install version
./openshift-install unreleased-master-1440-g6e3bad2ccdc9960ecf5caf20e6b4fff03d00aadb
built from commit 6e3bad2ccdc9960ecf5caf20e6b4fff03d00aadb
release image registry.svc.ci.openshift.org/origin/release:4.2

I'm on RHEL 7.6:

$ cat /etc/redhat-release
Red Hat Enterprise Linux Workstation release 7.6 (Maipo)
$ uname -a
Linux 3.10.0-957.12.2.el7.x86_64 #1 SMP Fri Apr 19 21:09:07 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

dep version:

dep version
dep:
 version     : v0.5.4
 build date  : 2019-07-01
 git hash    : 1f7c19e
 go version  : go1.12.6
 go compiler : gc
 platform    : linux/amd64
 features    : ImportDuringSolve=false
  • Steps to reproduce:

Go through:
https://github.com/code-ready/snc/blob/master/README.md
https://github.com/openshift/installer/blob/master/docs/dev/libvirt/README.md#one-time-setup
https://github.com/openshift/installer/blob/master/docs/dev/libvirt/README.md#build-the-installer

And finally:

$ cp <built_installer_binary> <directory_to_cloned_repo>
$ cd <directory_to_cloned_repo>
$ ./snc.sh
  • OpenShift installer
    openshift-installer was built from master branch sources as well, with TAGS=libvirt set.
    The pull secret was obtained from https://try.openshift.com and stored in an environment variable as expected.

I've checked my OS configuration with respect to virtualization and then installed the latest qemu release from RPMs, as the one delivered with RHEL 7.6 wasn't accepting one of the parameters passed by the shell script.
I've tested the libvirt/qemu setup by creating a RHEL 7 Workstation guest from an ISO via virt-install, and also by configuring and running minishift. Both tests were successful.

Is there any information from your side on this?
Thanks.

snc build: Unknown name requested, could not find keepalived-ipfailover in UpdatePayload

I installed CentOS 7 on the laptop along with all dependencies for openshift-install and the snc script
(adding the kvm-common repository to get the latest qemu-kvm).
When I run the snc script, I get an error that the master was not started within 30 minutes.

[user@ce23652 snc]$ sudo virsh list --all
[sudo] password for user: 
 Id    Name                           State
----------------------------------------------------
 3     crc-85mk2-bootstrap            running
 4     crc-85mk2-master-0             running
[user@ce23652 snc]$ sudo virsh console 3
Connected to domain crc-85mk2-bootstrap
Escape character is ^]
[ 3126.214688] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[ 3126.884085] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[ 3127.608045] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[ 3128.289552] SELinux: mount invalid.  Same superblock, different security settings for (dev mqueue, type mqueue)
[user@ce23652 snc]$ sudo virsh console 4
Connected to domain crc-85mk2-master-0
Escape character is ^]
[**    ] A start job is running for Ignition (disks) (52min 7s / no limit)[ 3130.116850] ignition[626]: GET https://api-int.crc.testing:22623/config/master: attempt #629
[ 3130.133065] ignition[626]: GET error: Get https://api-int.crc.testing:22623/config/master: dial tcp 192.168.126.10:22623: connect: connection refused
[  *** ] A start job is running for Ignition (disks) (52min 9s / no limit)

My crc-xxx-bootstrap image does not start.

5b81 (image=quay.io/openshift-release-dev/ocp-release:4.1.9, name=stupefied_gagarin)
Aug 08 23:47:59 crc-8mjbc-bootstrap podman[2831]: 2019-08-08 23:47:59.806207983 +0000 UTC m=+0.303240560 container start ba321b1f5f60edd9596199dea4d33999a5d6b57cb9f3852e27d6741ecc645b81 (image=quay.io/openshift-release-dev/ocp-release:4.1.9, name=stupefied_gagarin)
Aug 08 23:47:59 crc-8mjbc-bootstrap podman[2831]: 2019-08-08 23:47:59.806261766 +0000 UTC m=+0.303294302 container attach ba321b1f5f60edd9596199dea4d33999a5d6b57cb9f3852e27d6741ecc645b81 (image=quay.io/openshift-release-dev/ocp-release:4.1.9, name=stupefied_gagarin)
Aug 08 23:47:59 crc-8mjbc-bootstrap bootkube.sh[1483]: F0808 23:47:59.947444       1 image.go:32] error: error: Unknown name requested, could not find keepalived-ipfailover in UpdatePayload
Aug 08 23:48:00 crc-8mjbc-bootstrap systemd[1]: bootkube.service: Main process exited, code=exited, status=255/n/a
Aug 08 23:48:00 crc-8mjbc-bootstrap systemd[1]: bootkube.service: Failed with result 'exit-code'.
Aug 08 23:48:05 crc-8mjbc-bootstrap systemd[1]: bootkube.service: Service RestartSec=5s expired, scheduling restart.
Aug 08 23:48:05 crc-8mjbc-bootstrap systemd[1]: bootkube.service: Scheduled restart job, restart counter is at 1.
Aug 08 23:48:05 crc-8mjbc-bootstrap systemd[1]: Stopped Bootstrap a Kubernetes cluster.
Am I missing something?
What is the best OS to build the bundle on?

Remove pull secret from the disk image.

We need to remove the pull secret from the disk image so we can easily share it. Right now it is present in 2 different places, both of which should be scrubbed.

$ cat pull-secret.yaml 
apiVersion: v1
data:
  .dockerconfigjson: e30K
kind: Secret
metadata:
  name: pull-secret
  namespace: openshift-config
type: kubernetes.io/dockerconfigjson

$ oc get secrets pull-secret -n openshift-config -oyaml
  • /var/lib/kubelet/config.json
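
The in-cluster copy can be blanked by patching the secret with an empty docker config; the `e30K` value shown above is simply base64 for `{}` plus a newline. A hedged sketch (the `oc patch` call requires a live cluster; the in-VM /var/lib/kubelet/config.json would still need to be scrubbed separately):

```shell
# "e30K" decodes to "{}" -- an empty dockerconfigjson.
echo 'e30K' | base64 --decode    # prints {}

# Sketch: overwrite the cluster-wide pull secret with the empty config.
if command -v oc >/dev/null 2>&1; then
    oc patch secret pull-secret -n openshift-config --type merge \
        -p '{"data":{".dockerconfigjson":"e30K"}}'
fi
```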

Provide a way to undo changes to system

I spent 2 days trying to debug why https://github.com/openshift/installer stopped working for me, only to find out that a recent run of the SNC script had left behind /etc/NetworkManager/dnsmasq.d/openshift.conf, which takes precedence over the /etc/NetworkManager/conf.d/openshift.conf that the installer uses.

We should provide a way to undo this (or any other relevant host configurations SNC script does).
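
A minimal cleanup sketch; the path is the one named above, and the CLEANUP_FILES list is illustrative (a real implementation would track exactly what the script created):

```shell
# Remove host-level files the snc script may have left behind.
# CLEANUP_FILES is a hypothetical list, not the script's actual interface.
CLEANUP_FILES="/etc/NetworkManager/dnsmasq.d/openshift.conf"

for f in $CLEANUP_FILES; do
    if [ -e "$f" ]; then
        echo "removing $f"
        rm -f "$f"    # would typically need sudo on a real host
    fi
done
```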

Wait until the VM is shutdown after `virsh shutdown` call

createdisk.sh calls virsh shutdown to cleanly stop the crc VM. However, this call is asynchronous, so if we don't wait until the VM has fully shut down, the next step could end up working from a corrupted image (because it was being modified while we were reading it).
Calling virsh event would hopefully do that job for us without needing to poll. Alternatively virsh domstate will tell us if the VM is still running or not.
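
A polling fallback can be sketched as a generic wait loop; `vm_state` below is a stub standing in for `virsh domstate "$domain"` (the `virsh event` approach would avoid polling entirely):

```shell
# Poll until the domain reports "shut off", or give up after ~2 minutes.
vm_state() { echo "shut off"; }    # stub; real code would call: virsh domstate "$domain"

wait_for_shutdown() {
    tries=0
    until [ "$(vm_state)" = "shut off" ]; do
        tries=$((tries + 1))
        [ "$tries" -ge 120 ] && return 1    # timeout
        sleep 1
    done
    return 0
}

wait_for_shutdown && echo "VM is down; safe to read the disk image"
```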

How to create CRC bundle

Do I use SNC to build a bundle for CRC?

I'd like to use CRC, but it requires a bundle file which I don't see how to build.

Any help is appreciated.

J

Workaround hyperkit qcow2 bug

This is a workaround for an old issue that the project has been having, see crc-org/osp4#9
Some qcow2 images created by qemu-img cause a freeze in the hyperkit qcow2 implementation, see moby/hyperkit#221
Enabling lazy refcounts in the qcow2 image seems to avoid this issue, and this is a performance improvement which should not have many adverse effects (slower startup in case of improper VM shutdown), see https://lists.gnu.org/archive/html/qemu-devel/2012-06/msg03827.html

Issues with bundle creation targeting Windows 10 /Hyper-V

Hi,

I'm trying to build my own Hyper-V bundle and test it on Windows 10.
I get an error: id_rsa_crc: Access is denied
Do I need to set up any special rights on Windows 10?

λ  .\crc.exe version                                                                        
version: 0.89.1-alpha-4.1.6+3946ae0                                                         
C:\work\programs\crc            
λ  .\crc.exe start --log-level debug -b crc_hyperv_4.1.6.crcbundle                                                                                                             
INFO Checking if oc binary is cached                                                                                                                                           
DEBU oc binary already cached                                                                                                                                                  
INFO Checking if CRC bundle is cached in '$HOME/.crc'                                                                                                                          
INFO Check Windows 10 release                                                                                                                                                  
INFO Hyper-V installed and operational                                                                                                                                         
INFO Is user a member of the Hyper-V Administrators group                                                                                                                      
INFO Does the Hyper-V virtual switch exist                                                                                                                                     
INFO Extracting the crc_hyperv_4.1.6.crcbundle Bundle tarball ...                                                                                                              
ERRO Error occurred: Error to get bundle Metadata Error during extraction : open C:\Users\apetras\.crc\cache\crc_hyperv_2019-08-07\id_rsa_crc: Access is denied.               
C:\work\programs\crc                                                                                                                                                                                                            

Need update for crc-bundle-info file creation (extra slash)

update_json_description crc-tmp-install-data/
+ cat crc-tmp-install-data//crc-bundle-info.json
+ jq '.clusterInfo.masterHostname = "crc-fdzt7-master-0"'
+ jq '.clusterInfo.sshPrivateKeyFile = "id_rsa_crc"'
+ jq '.clusterInfo.kubeConfig = "kubeconfig"'
+ jq '.clusterInfo.kubeadminPasswordFile = "kubeadmin-password"'
+ jq '.storage.diskImages[0].name = "crc.qcow2"'
+ jq '.storage.diskImages[0].format = "qcow2"'
cat: crc-tmp-install-data//crc-bundle-info.json: No such file or directory
+ rm crc-tmp-install-data//crc-bundle-info.json
rm: cannot remove ‘crc-tmp-install-data//crc-bundle-info.json’: No such file or directory
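
The double slash comes from joining a user-supplied directory that already ends in `/` with `/crc-bundle-info.json`. Stripping any trailing slash with POSIX parameter expansion fixes the join:

```shell
# User passed "crc-tmp-install-data/"; a naive join yields a double slash.
dir="crc-tmp-install-data/"
echo "${dir}/crc-bundle-info.json"    # crc-tmp-install-data//crc-bundle-info.json

# Strip any trailing slash before joining.
dir="${dir%/}"
echo "${dir}/crc-bundle-info.json"    # crc-tmp-install-data/crc-bundle-info.json
```

The same expansion is safe when the argument has no trailing slash, so it can be applied unconditionally.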

Don't disable the marketplace operator

Right now we disable the marketplace operator by default, but it is not taking as much resource as we anticipated. Most operator developers need it during the development phase, and users need it to install operators as well.

2 routers being created with a single node

crc version
crc - Local OpenShift 4.x cluster
version: 0.87.0-alpha-4.1.0+3a5033a

There is a pending router-default-X pod after a successful deployment:

oc get pods --all-namespaces  | grep -v -E 'Running|Completed'
NAMESPACE                                               NAME                                                              READY   STATUS      RESTARTS   AGE
openshift-ingress                                       router-default-5fdbdc678-s8bn2                                    0/1     Pending     0          10d

oc get events
LAST SEEN   TYPE      REASON             OBJECT                               MESSAGE
5m8s        Normal    Pulled             pod/router-default-5fdbdc678-9xqcc   Container image "quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7afd1b6aace6db643532680ca61761cf66ded116f8f673ec89121dbd424b2a15" already present on machine
5m2s        Normal    Created            pod/router-default-5fdbdc678-9xqcc   Created container router
5m2s        Normal    Started            pod/router-default-5fdbdc678-9xqcc   Started container router
9d          Warning   FailedScheduling   pod/router-default-5fdbdc678-s8bn2   0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.
2m8s        Warning   FailedScheduling   pod/router-default-5fdbdc678-s8bn2   0/1 nodes are available: 1 node(s) didn't have free ports for the requested pod ports.

As only a single node is deployed, ports 80/443 are already bound by the other router:

oc get pods
NAME                             READY   STATUS    RESTARTS   AGE
router-default-5fdbdc678-9xqcc   1/1     Running   0          10d
router-default-5fdbdc678-s8bn2   0/1     Pending   0          10

As a workaround, scale the replicas down to 1:

oc patch \
   --namespace=openshift-ingress-operator \
   --patch='{"spec": {"replicas": 1}}' \
   --type=merge \
   ingresscontroller/default

Update network config and add `api-int` entry for dns.

OpenShift now needs the api-int name to resolve as part of the network configuration; we need to add it to our libvirt template:

https://github.com/code-ready/snc/blob/master/crc_libvirt.template#L100-L101

<host ip='192.168.126.11'> 
 <hostname>api.ReplaceMeWithCorrectVmName.ReplaceMeWithCorrectBaseDomain</hostname>
 <hostname>api-int.ReplaceMeWithCorrectVmName.ReplaceMeWithCorrectBaseDomain</hostname>
 <hostname>etcd-0.ReplaceMeWithCorrectVmName.ReplaceMeWithCorrectBaseDomain</hostname>
</host>

snc.sh fails due to connection timeout

While executing ./snc.sh on an Ubuntu machine, the script fails with the following error:

ERROR                                              
ERROR Error: Unassigned variable                   
ERROR                                              
ERROR The input variable "libvirt_master_ips" has not been assigned a value. This is 
ERROR a bug in Terraform; please report it in a GitHub issue. 
ERROR                                              
ERROR                                              
ERROR Error: Unassigned variable                   
ERROR                                              
ERROR The input variable "libvirt_uri" has not been assigned a value. This is a bug 
ERROR in Terraform; please report it in a GitHub issue. 
ERROR                                              
ERROR                                              
ERROR Error: Unassigned variable                   
ERROR                                              
ERROR The input variable "libvirt_bootstrap_ip" has not been assigned a value. This 
ERROR is a bug in Terraform; please report it in a GitHub issue. 
ERROR                                              
ERROR                                              
ERROR Error: Unassigned variable                   
ERROR                                              
ERROR The input variable "libvirt_network_if" has not been assigned a value. This is 
ERROR a bug in Terraform; please report it in a GitHub issue. 
ERROR                                              
ERROR                                              
ERROR Error: Unassigned variable                   
ERROR                                              
ERROR The input variable "os_image" has not been assigned a value. This is a bug in 
ERROR Terraform; please report it in a GitHub issue. 
ERROR                                              
ERROR Failed to read tfstate: open /tmp/openshift-install-103064960/terraform.tfstate: no such file or directory 
FATAL failed to fetch Cluster: failed to generate asset "Cluster": failed to create cluster: failed to apply using Terraform 
This is known to fail with:
'pool master is not ready - timed out waiting for the condition'
see https://github.com/openshift/machine-config-operator/issues/579
ssh: Could not resolve hostname api.crc.testing: Name or service not known
error: unable to recognize "STDIN": Get https://api.crc.testing:6443/api?timeout=32s: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
Unable to connect to the server: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
(the last two messages repeat many more times)
Unable to connect to the server: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
Unable to connect to the server: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
Unable to connect to the server: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
Unable to connect to the server: dial tcp: lookup api.crc.testing on 127.0.0.53:53: no such host
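These lookup failures mean the host cannot resolve api.crc.testing through systemd-resolved (127.0.0.53). snc.sh normally wires this name up via the libvirt network and NetworkManager's dnsmasq; a minimal sketch of such an entry, where the file path and IP address are assumptions to verify against your own setup (e.g. with `virsh net-dhcp-leases`):

```
# /etc/NetworkManager/dnsmasq.d/crc-snc.conf  (assumed path)
# Resolve the cluster domain to the VM's address on the libvirt network.
# 192.168.126.11 is a typical snc master address; verify on your host.
address=/.crc.testing/192.168.126.11
```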

Add hyperkit support

hyperkit support needs additional files/metadata:

  • we need to get the correct vmlinuz and initramfs, probably extracting them from the VM image
  • crc-bundle-info.json needs to describe these files
  • then we'll either need to also extract the full kernel command line for use by hyperkit, or add the ostree hash, the disk UUID, and the boot 'index' to the metadata file. The full kernel command line is similar to CMDLINE="BOOT_IMAGE=/ostree/rhcos-$OSTREE_HASH/$KERNEL console=tty0 console=ttyS0,115200n8 rootflags=defaults,prjquota rw root=UUID=$ROOT_UUID ostree=/ostree/boot.0/rhcos/$OSTREE_HASH/0 coreos.oem.id=qemu ignition.platform.id=qemu" (the so-called 'boot index' is the '0' which appears twice in the ostree=... part). This data will go in crc-bundle-info.json
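A hedged sketch of how this metadata could look inside crc-bundle-info.json; the key names below are illustrative assumptions, not the established bundle format, and the hash/UUID placeholders must come from the actual image:

```
{
  "nodes": [
    {
      "kind": ["master"],
      "kernel": "vmlinuz",
      "initramfs": "initramfs.img",
      "kernelCmdLine": "BOOT_IMAGE=/ostree/rhcos-<OSTREE_HASH>/vmlinuz console=tty0 console=ttyS0,115200n8 rootflags=defaults,prjquota rw root=UUID=<ROOT_UUID> ostree=/ostree/boot.0/rhcos/<OSTREE_HASH>/0 coreos.oem.id=qemu ignition.platform.id=qemu"
    }
  ]
}
```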

Creates worker nodes

After running the provided script, the cluster comes up, but it has worker nodes. It's probably just a matter of setting replicas to 0 in the install config. I'm trying that right now.
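A sketch of the relevant install-config.yaml fragment, assuming the standard installer config layout (only the machine pools are shown here):

```
compute:
- name: worker
  replicas: 0    # do not create any worker machines
controlPlane:
  name: master
  replicas: 1    # single schedulable control-plane node
```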

Remove the state of pods and container before creating the diskimage

We need to remove all pods/containers before creating the disk image, so that when the image is later used by crc it does not carry over the state of a cluster that was shut down. This could be part of createdisk, since it runs just before the image is built; alternatively, we can create a separate script containing these steps (together with the deployment/replica changes) that runs after the cert rotation on the snc side but before createdisk.

=== On the VM ===

$ systemctl stop kubelet
$ crictl stopp $(crictl pods -q)
$ crictl rmp $(crictl pods -q)

Force certificate rotation after creating the cluster

We currently need to wait 24h after creating the cluster before getting certificates valid for one month. We should have a way of forcing the rotation of the certificates without waiting for 24h.

Experimenting with the commands below seemed to help:

# validity is 30 times the base
oc create -n openshift-config configmap unsupported-cert-rotation-config --from-literal='base=30s'

# forcing rotation
oc get secret -A -o json | jq -r '.items[] | select(.metadata.annotations."auth.openshift.io/certificate-not-after" | .!=null and fromdateiso8601<='$( date --date='+1year' +%s )') | "-n \(.metadata.namespace) \(.metadata.name)"' | xargs -n3 oc patch secret -p='{"metadata": {"annotations": {"auth.openshift.io/certificate-not-after": null}}}'

# Wait ~ 5-10 minutes

# Make sure at least the apiserver serving cert has 15 min validity (change your cluster name based on your kubeconfig)
openssl s_client -connect api.tnozicka-1.devcluster.openshift.com:6443 </dev/null | openssl x509 -noout -dates


# go back to normal certrotation setting
oc delete -n openshift-config configmap unsupported-cert-rotation-config

Project/namespace deletion goes into a hanging state.

On the CRC side, when a user creates a new project/namespace and then tries to delete it, the deletion hangs forever. Looking at the Kubernetes side, this occurs because of stale aggregated APIs backed by CRDs [0][1]. As part of CRC creation we scale down the cluster-wide monitoring operator [2], which, as I understand it, serves `v1beta1.metrics.k8s.io`. If we also delete that apiservice it should not break anything for users, and if a user wants to try cluster-wide monitoring it will be created again along with the operator.

[0] kubernetes/kubernetes#60807
[1] kubernetes/kubernetes#60807 (comment)
[2] https://github.com/code-ready/snc/blob/master/snc.sh#L184-L186
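A hedged sketch of the proposed cleanup, assuming the stale apiservice is the `v1beta1.metrics.k8s.io` one named above (to be run against the bundle cluster, after the monitoring operator is scaled down):

```shell
# Remove the stale aggregated API service so namespace deletion
# no longer blocks on an unreachable metrics backend; it is
# recreated if monitoring is scaled back up.
oc delete apiservice v1beta1.metrics.k8s.io
```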

Add virt-sparsify

$ qemu-img info crc.qcow2 
image: crc.qcow2
file format: qcow2
virtual size: 31G (33285996544 bytes)
disk size: 9.0G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: true
    refcount bits: 16
    corrupt: false

$ virt-sparsify ./crc.qcow2 ./crc_sparse.qcow2 --tmp tmp/
[   0.1] Create overlay file in tmp/ to protect source disk
[   0.1] Examine source disk
 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ --:--
[   5.9] Fill free space in /dev/sda2 with zero
[   6.7] Fill free space in /dev/sda3 with zero
 100% ⟦▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒▒⟧ 00:00
[  47.1] Copy to destination and make sparse
[  98.0] Sparsify operation completed with no errors.
virt-sparsify: Before deleting the old disk, carefully check that the 
target disk boots and works correctly.


$ qemu-img info crc_sparse.qcow2 
image: crc_sparse.qcow2
file format: qcow2
virtual size: 31G (33285996544 bytes)
disk size: 8.3G
cluster_size: 65536
Format specific information:
    compat: 1.1
    lazy refcounts: false
    refcount bits: 16
    corrupt: false
