
shipyard's People

Contributors

astoycos, aswinsuryan, billy99, davidohana, dependabot-preview[bot], dependabot[bot], dfarrell07, jaanki, maayanf24, mangelajo, maryamtahhan, maxbab, mkimuram, mkolesnik, pinikomarov, roytman, skitt, sridhargaddam, tpantelis, vthapar, yboaron


shipyard's Issues

Connector pod in E2E tests does not wait before retrying on some failures

Currently, as part of the e2e tests, as soon as the connector pod is
scheduled it tries to connect to the listener pod with a wait
interval configured in Config.ConnectionTimeout (defaults to 18 secs).

However, in some scenarios, if there is an error while accessing
the remote server, the current logic does not wait for those 18 secs
before retrying. As a result, all the CONN_TRIES attempts happen
one after the other and the test case is marked as failed.
This is seen with Globalnet jobs, where it takes time for the
ingress/egress rules to be programmed on the gateway nodes.
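
A minimal sketch of what the connector's command could do instead; the variable names (CONN_TRIES, RETRY_INTERVAL, REMOTE_IP, REMOTE_PORT, SEND_STRING) are illustrative, not necessarily what the framework uses:

 # Sleep between attempts so that later retries can succeed once the Globalnet
 # ingress/egress rules have been programmed on the gateway nodes.
 for i in $(seq "${CONN_TRIES:-7}"); do
     if echo "$SEND_STRING" | nc -w "${RETRY_INTERVAL:-18}" "$REMOTE_IP" "$REMOTE_PORT"; then
         break
     fi
     sleep "${RETRY_INTERVAL:-18}"
 done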

Example:
Waiting for the connector pod "tcp-check-pod5rfhb" to exit, returning what connector sent
INFO: Pod "tcp-check-pod5rfhb" output:
nc: 169.254.2.13 (169.254.2.13:1234): No route to host
nc: 169.254.2.13 (169.254.2.13:1234): No route to host
nc: 169.254.2.13 (169.254.2.13:1234): No route to host
nc: 169.254.2.13 (169.254.2.13:1234): No route to host
nc: 169.254.2.13 (169.254.2.13:1234): No route to host
nc: 169.254.2.13 (169.254.2.13:1234): No route to host
nc: 169.254.2.13 (169.254.2.13:1234): No route to host

Dependabot can't resolve your Go dependency files

Dependabot can't resolve your Go dependency files.

As a result, Dependabot couldn't update your dependencies.

The error Dependabot encountered was:

go: gopkg.in/inf.v0@&lt;version&gt;: unrecognized import path "gopkg.in/inf.v0" (parse https://gopkg.in/inf.v0?go-get=1: no go-import meta tags ())

If you think the above is an error on Dependabot's side please don't hesitate to get in touch - we'll do whatever we can to fix it.

View the update logs.

Lighthouse images are not pulled locally with kind when deploying from scratch

What happened:
When making a deployment from scratch (make deploy from https://github.com/submariner-io/submariner), while the Submariner-related images are not present in the local "docker images" output, the Lighthouse images (lighthouse-agent and lighthouse-coredns) are not pulled.

The lighthouse deployment pods have the "ImagePullBackOff" status.

$ kubectl --kubeconfig output/kubeconfigs/kind-config-cluster2 -n submariner-operator get pods
NAME                                            READY   STATUS             RESTARTS   AGE
submariner-gateway-zcrw9                        1/1     Running            0          109s
submariner-lighthouse-agent-ccdbc9659-t8bv5     0/1     ImagePullBackOff   0          108s
submariner-lighthouse-coredns-557485fbc-pn45l   0/1     ImagePullBackOff   0          107s
submariner-lighthouse-coredns-557485fbc-wwm89   0/1     ImagePullBackOff   0          107s
submariner-operator-6675977db7-l5nl7            1/1     Running            0          2m2s
submariner-routeagent-276tw                     1/1     Running            0          109s
submariner-routeagent-z89p5                     1/1     Running            0          108s
submariner-routeagent-zth62                     1/1     Running            0          108s

Deployments request the following images:

$ kubectl --kubeconfig output/kubeconfigs/kind-config-cluster2 -n submariner-operator describe deployment submariner-lighthouse-agent | grep -i image
    Image:      localhost:5000/lighthouse-agent:local

$ kubectl --kubeconfig output/kubeconfigs/kind-config-cluster2 -n submariner-operator describe deployment submariner-lighthouse-coredns | grep -i image
    Image:      localhost:5000/lighthouse-coredns:local

The "docker images" output (the Lighthouse images are missing):

$ docker images
REPOSITORY                                           TAG                 IMAGE ID            CREATED             SIZE
quay.io/submariner/submariner-networkplugin-syncer   dev                 3b82d8483bdc        4 minutes ago       110MB
quay.io/submariner/submariner-networkplugin-syncer   devel               3b82d8483bdc        4 minutes ago       110MB
submariner                                           master              f2814c74b8f7        11 minutes ago      800MB
quay.io/submariner/submariner-globalnet              <none>              460ad9783141        50 minutes ago      124MB
quay.io/submariner/submariner-globalnet              dev                 6b44c121f332        50 minutes ago      124MB
quay.io/submariner/submariner-globalnet              devel               6b44c121f332        50 minutes ago      124MB
localhost:5000/submariner-route-agent                local               ad6c3273b6d4        50 minutes ago      124MB
quay.io/submariner/submariner-route-agent            dev                 ad6c3273b6d4        50 minutes ago      124MB
quay.io/submariner/submariner-route-agent            devel               ad6c3273b6d4        50 minutes ago      124MB
quay.io/submariner/submariner-route-agent            <none>              1a3ee61d89bc        50 minutes ago      124MB
quay.io/submariner/submariner                        <none>              32c193e4b57b        50 minutes ago      245MB
localhost:5000/submariner                            local               c17d290c5a16        50 minutes ago      245MB
quay.io/submariner/submariner                        dev                 c17d290c5a16        50 minutes ago      245MB
quay.io/submariner/submariner                        devel               c17d290c5a16        50 minutes ago      245MB
localhost:5000/submariner-operator                   local               5dab07c69884        11 hours ago        10MB
quay.io/submariner/submariner-operator               dev                 5dab07c69884        11 hours ago        10MB
quay.io/submariner/submariner-operator               devel               5dab07c69884        11 hours ago        10MB
quay.io/submariner/shipyard-dapper-base              devel               45c574b97e56        17 hours ago        800MB
fedora                                               33                  b3048463dcef        5 days ago          175MB
registry.access.redhat.com/ubi8/ubi-minimal          latest              c103a05423dd        2 weeks ago         103MB
localhost:5000/nettest                               local               ed8d90d0ba28        3 weeks ago         24.3MB
quay.io/submariner/nettest                           dev                 ed8d90d0ba28        3 weeks ago         24.3MB
quay.io/submariner/nettest                           devel               ed8d90d0ba28        3 weeks ago         24.3MB
quay.io/submariner/submariner-networkplugin-syncer   <none>              1554c6f5835a        3 weeks ago         165MB
registry                                             2                   2d4f4b5309b1        5 months ago        26.2MB
kindest/node                                         v1.17.0             ec6ab22d89ef        10 months ago       1.23GB

If I pull the images manually, everything works.
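
For reference, the manual workaround amounts to pulling the upstream images and pushing them into the local kind registry; the quay.io source names below are an assumption based on how the other components are named:

 docker pull quay.io/submariner/lighthouse-agent:devel
 docker tag quay.io/submariner/lighthouse-agent:devel localhost:5000/lighthouse-agent:local
 docker push localhost:5000/lighthouse-agent:local
 # ...and the same for lighthouse-coredns.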

What you expected to happen:
The "make deploy" command should pull all requested images and prepare the environment.

How to reproduce it (as minimally and precisely as possible):
Delete all Submariner-related images from the local "docker images" list and run make deploy.

Environment:

  • Submariner version: v0.7.0
  • Kubectl version: Client Version: version.Info{Major:"1", Minor:"19", GitVersion:"v1.19.4", GitCommit:"d360454c9bcd1634cf4cc52d1867af5491dc9c5f", GitTreeState:"clean", BuildDate:"2020-11-11T13:17:17Z", GoVersion:"go1.15.2", Compiler:"gc", Platform:"linux/amd64"}
  • Kind version: 0.9.0
  • OS: Fedora 33
  • Kernel: Linux max 5.9.8-200.fc33.x86_64 #1 SMP Tue Nov 10 21:58:19 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

The docs troubleshooting guide mentions the Lighthouse image issue, but I think this should happen automatically.

Support FIPS mode on Red Hat-supported platforms

At some point we should support FIPS mode, which requires building Go binaries in such a way that they can use the system's OpenSSL libraries rather than Go's built-in crypto. This is possible using go-toolset, which is available on RHEL and can be enabled on UBI8 on hosts with an appropriate license:

 docker run -it --rm registry.access.redhat.com/ubi8/ubi:latest
 dnf install go-toolset

This currently provides Go 1.13, which is fine for our purposes.
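
A rough, hedged sketch of what the FIPS-friendly build could look like inside that container (the build target is illustrative, not Shipyard's actual build line):

 # go-toolset's Go can back its crypto with the system OpenSSL when cgo is
 # enabled, which is what FIPS mode requires.
 dnf install -y go-toolset
 CGO_ENABLED=1 go build ./...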

Provide Shipyard shared infra in all submariner-io repos

Currently, some repositories don't provide the shared Shipyard infrastructure:

It would be useful to be able to run local linting in those repos, via the shared Shipyard tooling, instead of relying on GHAs in CI.

While adding Shipyard infra to these projects, also update/refactor the (now-removed) docs about the process of adding Shipyard to repos.

Implement basic GH action

A basic action is needed for any kind deployment to work, since the default GitHub Actions VMs don't have enough free space.
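
For illustration only (not an existing Shipyard action), this is the kind of cleanup such an action could run on the default Ubuntu runners to free space before a kind deployment:

 # Remove large pre-installed toolchains the jobs don't need; the paths are the
 # usual ones on GitHub's Ubuntu runners and may change over time.
 sudo rm -rf /usr/share/dotnet /usr/local/lib/android /opt/ghc
 docker system prune -af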

Add capability to run consuming projects e2e

We need a way to run CI jobs that test that we don't break consuming projects.

One idea is to trigger such jobs in GHA by comments, e.g. "/testprojects" or something like that.

Move codegen logic to Shipyard

The codegen target is used in Submariner and was also copied into Lighthouse.

It would be better to place it in Shipyard.

Add globalnet flag to E2E Framework

The code to detect whether Globalnet is enabled resides in the submariner project. Instead of moving it to Shipyard, pass it as a flag on the go test command line, since the front-end script already knows whether Globalnet is enabled.
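
A hedged sketch of what that could look like; the -globalnet flag name is hypothetical and would have to be registered by the E2E framework via the flag package:

 # The front-end script already knows whether Globalnet is enabled, so it can
 # forward that to the test binary instead of the framework re-detecting it.
 go test ./test/e2e/... -args -globalnet=true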

RunNoConnectivityTest has very low timeout -> false positives

test/e2e/tcp/connectivity.go

func RunNoConnectivityTest(p ConnectivityTestParams) (*framework.NetworkPod, *framework.NetworkPod) {
	if p.ConnectionTimeout == 0 {
		p.ConnectionTimeout = 5

mangelajo (Member, 4 hours ago):
I guess this is the code we had, but while we are at refactors I don't want us to forget about it :)

Such a timeout is too low:

  • we create the listener with a 5 sec timeout,
  • we go to the 2nd cluster and create the connector with the same timeout,
  • but chances are that pod creation will take much longer than the connection attempts.

This leads to false positives; we should probably bump this to at least (an arbitrary) 30 seconds. I know it will make the test slower, but it will be less likely to produce false positives.

We could refine our testing images in the future for better handling of timeouts, etc.

globalnet: E2E not working

https://travis-ci.com/github/submariner-io/submariner-operator/jobs/313547481

Shipyard expects cluster X to be allocated CIDR 169.254.X.0/24. But this is incorrect, as cluster1, which acts as the broker, shouldn't be allocated any CIDR. Current allocations are:

  •  global_CIDRs['cluster2']='169.254.0.0/19'
  •  global_CIDRs['cluster3']='169.254.32.0/19'

These are being changed in submariner-io/submariner-operator#288 to

  •  global_CIDRs['cluster2']='169.254.0.0/24'
  •  global_CIDRs['cluster3']='169.254.1.0/24'

Shipyard should be modified to align with the new values. The actual CIDR depends on the globalnet-cluster-size passed at deploy-broker or join time, so either the expectation should be made dynamic, or it should use the newer values, which assume a cluster size of 256 (one /24 per cluster).
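
For illustration, assuming a cluster size of 256 (one /24 per non-broker cluster, matching the new operator defaults), the expected allocations could be generated rather than hard-coded:

 # Hypothetical sketch: cluster1 is the broker and gets no CIDR; cluster2 and
 # cluster3 get consecutive /24 blocks starting at 169.254.0.0.
 for i in 2 3; do
     echo "global_CIDRs['cluster$i']='169.254.$((i - 2)).0/24'"
 done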

Fix Polarion not reporting the Ginkgo tests' JUnit output

We want to import the junit.xml results generated by the E2E tests into Polarion.
There's a bug in the JUMP Polarion script that causes it to fail to parse the test messages into Polarion.

The required solution has two phases:

Phase 1: Create a temporary workaround that modifies the XML file so it can be read by the JUMP tool. This can be done in Python or in Bash.

Phase 2: A complete resolution in the Ginkgo test framework (Go), creating the missing element under the relevant section and submitting it as a PR (pull request).
This requires experience with the Ginkgo framework, to fix the issue and add unit tests (not related to Submariner), verify the PR, and get approval from the community.

Please go over the ticket, and try to reproduce the issue. Once reproduced, you can start fixing it.

-dp-context of E2E has different orders across projects

Shipyard adds -dp-context cluster1 -dp-context cluster2 -dp-context cluster3; this order is used by Lighthouse.

Submariner uses -dp-context cluster2 -dp-context cluster3 -dp-context cluster1.

Subctl consumes the e2e tests providing only two dp contexts.

We need to make them uniform so that:
* The Shipyard e2e script will work for all projects.
* Subctl can consume them all uniformly.

I created this issue, but probably the best option is to make them all uniform in the order Shipyard includes them today, which makes more sense.

This needs more thought.

submariner-io/lighthouse#151
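
For example, if everything adopted the order Shipyard already uses, every consumer would invoke the E2E tests with the same context list (illustrative invocation, not the exact script line):

 go test ./test/e2e/... -args -dp-context cluster1 -dp-context cluster2 -dp-context cluster3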

Add shared build target for images

Right now images are rebuilt every time, but we could rebuild them only when needed.

It would be best to put this logic in Shipyard and use it everywhere else.

E2E on submariner-operator fails (netshoot timeout setting)

submariner-io/submariner-operator#376

https://github.com/submariner-io/submariner-operator/pull/376/checks?check_run_id=671110418#step:5:8351

deployment.apps/netshoot created
Waiting for netshoot pods to be ready.
[submariner-operator]$ [cluster2] kubectl rollout status deploy/netshoot --timeout=
[submariner-operator]$ [cluster2] kubectl rollout status deploy/netshoot --timeout=
[submariner-operator]$ [cluster2] command kubectl --context=cluster2 rollout status deploy/netshoot --timeout=
[submariner-operator]$ [cluster2] kubectl --context=cluster2 rollout status deploy/netshoot --timeout=
Error: invalid argument "" for "--timeout" flag: time: invalid duration 
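
The empty value suggests the timeout variable isn't set on that code path; a minimal sketch of a guard (the variable name is illustrative, not necessarily the one the script uses):

 # Default the timeout when the caller doesn't provide one, so kubectl never
 # sees an empty --timeout= value.
 kubectl --context=cluster2 rollout status deploy/netshoot --timeout="${timeout:-5m}"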

reload-images restart=routeagent does not work consistently.

In the Submariner repo, we have a make target, "make reload-images restart=routeagent", which allows us to update the route-agent pods in the kind clusters and restart them at the same time. However, when we execute that command, not all route-agent pods get restarted.
This is because the submariner-operator overrides the changes made by the restart command shown below.

kubectl patch -n submariner-operator daemonset submariner-routeagent --type=json -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/imagePullPolicy", "value": "Always"}, {"op": "replace", "path": "/spec/template/metadata/labels/modified", "value": "1595246714"}]'

We need a more reliable way to support this target.

Old E2E namespaces should be removed or overridden

When running E2E tests (e.g. with subctl) multiple times, old E2E namespaces, which include Nginx, may be left behind for days without being deleted.
This eventually causes low disk space and memory on the cluster nodes, and then pods start to crash or get evicted:

$ kubectl get pods -n submariner-operator -o wide
NAME                                           READY   STATUS    RESTARTS   AGE     IP             NODE                             NOMINATED NODE   READINESS GATES
submariner-gateway-px2b9                       0/1     Evicted   0          4m      <none>         default-cl1-k7hcq-worker-g2cnf   <none>           <none>
submariner-globalnet-2b74f                     0/1     Evicted   0          3m59s   <none>         default-cl1-k7hcq-worker-g2cnf   <none>           <none>
submariner-lighthouse-agent-6bc4766f97-lbxc2   1/1     Running   0          3h36m   10.255.0.213   default-cl1-k7hcq-worker-hctn6   <none>           <none>
submariner-lighthouse-coredns-c88f64f5-q4hfp   1/1     Running   0          3h36m   10.255.0.215   default-cl1-k7hcq-worker-hctn6   <none>           <none>
submariner-lighthouse-coredns-c88f64f5-qhhrf   1/1     Running   0          3h36m   10.255.0.214   default-cl1-k7hcq-worker-hctn6   <none>           <none>
submariner-operator-dcbdf5669-vngwt            1/1     Running   0          3h36m   10.255.0.212   default-cl1-k7hcq-worker-hctn6   <none>           <none>
submariner-routeagent-2j8bx                    1/1     Running   0          3h36m   10.166.2.69    default-cl1-k7hcq-worker-hctn6   <none>           <none>
submariner-routeagent-9w9q4                    1/1     Running   0          3h36m   10.166.2.159   default-cl1-k7hcq-master-1       <none>           <none>
submariner-routeagent-cdjhs                    1/1     Running   0          3h36m   10.166.2.92    default-cl1-k7hcq-master-2       <none>           <none>
submariner-routeagent-htcbm                    0/1     Evicted   0          3m59s   <none>         default-cl1-k7hcq-worker-g2cnf   <none>           <none>
submariner-routeagent-s6xpd                    1/1     Running   0          3h36m   10.166.2.82    default-cl1-k7hcq-master-0       <none>           <none>


Originally posted by @sridhargaddam in submariner-io/submariner#913 (comment)
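
Until this is automated, a manual cleanup along these lines could help; it assumes the E2E namespaces follow the framework's "e2e-tests-" naming prefix, which should be verified before running:

 # Delete leftover E2E namespaces (prefix assumed, verify first).
 kubectl get ns -o name | grep '^namespace/e2e-tests-' | xargs -r kubectl delete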

E2E connector pod should wait until it's ready before running any tests

Currently, in the e2e dataplane tests, we create connector and listener pods which internally use the 'nc' utility to validate TCP connectivity. For the listener pod, we wait until it's ready, but the connector pod runs the connectivity test as soon as it's deployed. While this is okay for vanilla Submariner use cases, it's prone to failures with Globalnet jobs.

Basically, when using Globalnet, one has to wait until the globalIp is annotated on the Pod/Service before cross-cluster connectivity can be validated.
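
A rough bash approximation of the missing wait (in the Go framework this would be a wait on the pod's annotations); the submariner.io/globalIp annotation name is what Globalnet uses, but treat it as an assumption here, and $CONNECTOR_POD is a placeholder:

 # Block until the connector pod has been annotated with a global IP before
 # starting the nc-based connectivity check.
 until kubectl get pod "$CONNECTOR_POD" -o jsonpath='{.metadata.annotations.submariner\.io/globalIp}' | grep -q .; do
     sleep 1
 done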

Support 2 cluster deployments

Currently, if only 2 clusters are deployed, connectivity isn't checked. Support deploying just 2 clusters and check connectivity if they have Submariner.
This is related to #136, since if we can deploy the broker on a cluster where Submariner is deployed, we can save a cluster.

Shipyard & operator (and perhaps lighthouse) stand to benefit from this.
