
cloud-prepare's People

Contributors

aswinsuryan, dependabot[bot], dfarrell07, jaanki, maayanf24, mangelajo, mkolesnik, nyechiel, skeeey, skitt, sridhargaddam, submariner-bot, tpantelis, vthapar, yboaron

cloud-prepare's Issues

any plan to upgrade k8s package to above v0.19

Hi team, are there any plans to upgrade the k8s packages, such as client-go, to a version above v0.19?
The submariner-addon uses client-go v0.20.5, and importing cloud-prepare causes incompatibilities:

# github.com/submariner-io/admiral/pkg/resource
vendor/github.com/submariner-io/admiral/pkg/resource/dynamic.go:29:21: not enough arguments in call to d.client.Get
	have (string, "k8s.io/apimachinery/pkg/apis/meta/v1".GetOptions)
	want (context.Context, string, "k8s.io/apimachinery/pkg/apis/meta/v1".GetOptions, ...string)
vendor/github.com/submariner-io/admiral/pkg/resource/dynamic.go:38:24: not enough arguments in call to d.client.Create
	have (*unstructured.Unstructured, "k8s.io/apimachinery/pkg/apis/meta/v1".CreateOptions)

the dynamic client methods are changed from v0.19.

Deprecate non-dedicated gateways on 0.15

What would you like to be removed:
Let's deprecate this mode for 0.15 and remove it on 0.16

Why is this needed:
We're not actually testing or using non-dedicated gateway mode from cloud prepare.
Furthermore, a more K8s native approach is to deploy with LB mode which actually handles the cloud related operations properly.

From a technical debt perspective, the code is just sitting there and making cloud-prepare and subctl harder to maintain. With it removed, we'll have a lighter maintenance burden and could simplify the cloud prepare code even further.

Remove "generic cloud prepare" mode on 0.16

What would you like to be added:
Let's deprecate this mode for 0.15 and remove it on 0.16

Why is this needed:
We're not actually testing or using the "generic cloud prepare" mode.
Furthermore, we already have code for making sure gateways have labels in the join command, which makes much more sense. Having a duplicate code path just increases the maintenance burden without adding any value.
Additionally, this mode on cloud prepare adds no special additional capabilities beyond what subctl join already does or can do.

From a technical debt perspective, the code is just sitting there and making cloud-prepare and subctl harder to maintain. With it removed, we'll have a lighter maintenance burden and could simplify the cloud prepare code even further.

Cloud-prepare: Add support for configuring existing worker nodes as GW nodes on GCP

What would you like to be added:
We need support for configuring one (or more) of the worker nodes in the cluster as Submariner Gateway nodes.
The following configuration has to be done for the Gateway nodes.

  1. The GCP instance networkTag has to be updated to match the external firewall rules
  2. The GCP instance should be associated with a public IP
  3. The K8s worker node should be labeled with submariner.io/gateway=true

Note: The following issue handles the dedicated Gateway node use-case #91

cloud-prepare should support selecting a suitable GW instance type

What would you like to be added:
While installing dedicated Gateway nodes in some clouds, the region may not support the requested instance type.
For example, AWS us-west-1 does not support the m5n.large instance type. It would be useful if the cloud-prepare library supported a mode in which it chooses the most appropriate instance type available in the region.

Currently, the signature for the API is
NewOcpGatewayDeployer(cloud api.Cloud, msDeployer ocp.MachineSetDeployer, instanceType string)

We can modify the code to allow an empty instanceType and treat this as an option for cloud-prepare to choose the appropriate instance available in the region.
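The fallback described above could look roughly like the following. This is only a sketch under stated assumptions: pickInstanceType, the preference list and the offered set are hypothetical names invented for illustration, and in practice the set of offered types would come from the cloud provider's API rather than a hard-coded map.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoInstanceType is returned when none of the candidates is offered in the region.
var errNoInstanceType = errors.New("no suitable gateway instance type offered in region")

// pickInstanceType (hypothetical helper) returns requested when it is non-empty,
// because the caller made an explicit choice. Otherwise it returns the first
// entry of preferred that the region offers.
func pickInstanceType(requested string, preferred []string, offered map[string]bool) (string, error) {
	if requested != "" {
		return requested, nil
	}
	for _, candidate := range preferred {
		if offered[candidate] {
			return candidate, nil
		}
	}
	return "", errNoInstanceType
}

func main() {
	// Made-up data: a region that does not offer m5n.large.
	offered := map[string]bool{"c5.large": true, "m5.large": true}
	preferred := []string{"m5n.large", "c5d.large", "c5.large"}

	chosen, err := pickInstanceType("", preferred, offered)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("chosen instance type:", chosen)
}
```

An explicit instanceType is passed through unchanged, so existing callers would keep their current behavior.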

Allow scaling the number of gateway nodes above the number of Zones

What would you like to be added:

The cloud prepare implementation for GCP and Azure should allow scaling beyond the number of Zones.

Why is this needed:

Limiting the maximum number of gateway nodes to the number of Zones may restrict how many g/w nodes can be spawned if the region has too few Zones.
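One way to lift that limit is to assign gateways to Zones round-robin, so that a Zone can host more than one gateway. The sketch below only illustrates the idea; spreadGateways and the zone names are hypothetical and not taken from the cloud-prepare code.

```go
package main

import "fmt"

// spreadGateways (hypothetical helper) assigns count gateways to zones
// round-robin, so count may exceed len(zones). It returns how many gateways
// land in each zone.
func spreadGateways(zones []string, count int) map[string]int {
	perZone := make(map[string]int, len(zones))
	if len(zones) == 0 {
		return perZone
	}
	for i := 0; i < count; i++ {
		perZone[zones[i%len(zones)]]++
	}
	return perZone
}

func main() {
	// Made-up region with two zones and three requested gateways.
	fmt.Println(spreadGateways([]string{"zone-a", "zone-b"}, 3))
}
```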

GW security group not deleted after running 'subctl cloud cleanup aws'

After running the following commands on OCP 4.11 cluster on AWS:

  • subctl cloud prepare aws --ocp-metadata < metadatafile >
  • subctl cloud cleanup aws --ocp-metadata < metadatafile >
    check [1] for more details.

subctl version: v0.14.0-rc4

I can still see the submariner GW security group on AWS UI.

[1]

$ subctl cloud prepare aws --ocp-metadata aws-cluster-a/metadata.json
✓ Preparing AWS cloud for Submariner deployment
✓ Obtained infra ID "yb-awsa-9b2kp" and region "us-east-2" from OCP metadata file "aws-cluster-a/metadata.json"
✓ Initializing AWS connectivity
✓ Retrieving VPC ID
✓ Retrieved VPC ID vpc-0f50548218add38be
✓ Validating pre-requisites
✓ Validated pre-requisites
✓ Creating Submariner gateway security group
✓ Created Submariner gateway security group yb-awsa-9b2kp-submariner-gw-sg
✓ Adjusting public subnet yb-awsa-9b2kp-public-us-east-2a to support Submariner
✓ Adjusted public subnet yb-awsa-9b2kp-public-us-east-2a to support Submariner
✓ Deploying gateway node for public subnet yb-awsa-9b2kp-public-us-east-2a
✓ Deployed gateway node for public subnet yb-awsa-9b2kp-public-us-east-2a
✓ Retrieving VPC ID
✓ Retrieved VPC ID vpc-0f50548218add38be
✓ Validating pre-requisites
✓ Validated pre-requisites
✓ Opening port 4800 protocol udp for intra-cluster communications
✓ Opened port 4800 protocol udp for intra-cluster communications
$
$ subctl cloud cleanup aws --ocp-metadata aws-cluster-a/metadata.json
✓ Obtained infra ID "yb-awsa-9b2kp" and region "us-east-2" from OCP metadata file "aws-cluster-a/metadata.json"
✓ Initializing AWS connectivity
✓ Retrieving VPC ID
✓ Retrieved VPC ID vpc-0f50548218add38be
✓ Validating pre-requisites
✓ Validated pre-requisites
✓ Removing gateway node for public subnet yb-awsa-9b2kp-public-us-east-2a
✓ Removed gateway node for public subnet yb-awsa-9b2kp-public-us-east-2a
✓ Untagging public subnet yb-awsa-9b2kp-public-us-east-2a from supporting Submariner
✓ Untagged public subnet yb-awsa-9b2kp-public-us-east-2a from supporting Submariner
✓ Deleting Submariner gateway security group
✓ Deleted Submariner gateway security group
✓ Retrieving VPC ID
✓ Retrieved VPC ID vpc-0f50548218add38be
✓ Validating pre-requisites
✓ Validated pre-requisites
✓ Revoking intra-cluster communication permissions
✓ Revoked intra-cluster communication permissions

cloud-prepare: Extract GatewayDeployer interface

Following discussion with OCM consumers it seems that it would be better to extract gateway deployment to a different API.

The proposed API is:

type GatewayDeployInput struct {
	// List of ports to open externally so that Submariner can reach and be reached by other Submariners
	PublicPorts []PortSpec

	// Amount of gateways that are being deployed
	//  0 (AutoGateways) = Deploy gateways per the default deployer policy (Default if not specified)
	//
	// 1-* = Deploy the amount of gateways requested (May fail if there aren't enough public subnets)
	Gateways int
}

type GatewayDeployer interface {
	Deploy(input GatewayDeployInput, reporter Reporter) error
	Cleanup(reporter Reporter) error
}
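To illustrate how a deployer might interpret the Gateways field, here is a minimal sketch. The proposal does not define the default deployer policy, so defaultCount is an assumption, and resolveGatewayCount is a hypothetical helper rather than part of the proposed API.

```go
package main

import (
	"errors"
	"fmt"
)

// AutoGateways mirrors the "0 = default deployer policy" value from the proposal.
const AutoGateways = 0

// resolveGatewayCount (hypothetical helper) turns the requested Gateways value
// into a concrete number. defaultCount stands in for the default deployer
// policy, which the proposal leaves open.
func resolveGatewayCount(requested, defaultCount, publicSubnets int) (int, error) {
	if requested < 0 {
		return 0, errors.New("gateway count must not be negative")
	}
	count := requested
	if requested == AutoGateways {
		count = defaultCount
	}
	if count > publicSubnets {
		return 0, fmt.Errorf("requested %d gateways but only %d public subnets are available",
			count, publicSubnets)
	}
	return count, nil
}

func main() {
	// Made-up numbers: auto mode with a default of 1, then an oversized request.
	fmt.Println(resolveGatewayCount(AutoGateways, 1, 3))
	fmt.Println(resolveGatewayCount(4, 1, 3))
}
```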

What would you like to be added:

Why is this needed:

Deprecate "generic cloud prepare" mode on 0.15

What would you like to be added:
Let's deprecate this mode for 0.15 and remove it on 0.16

Why is this needed:
We're not actually testing or using the "generic cloud prepare" mode.
Furthermore, we already have code for making sure gateways have labels in the join command, which makes much more sense. Having a duplicate code path just increases the maintenance burden without adding any value.
Additionally, this mode on cloud prepare adds no special additional capabilities beyond what subctl join already does or can do.

From a technical debt perspective, the code is just sitting there and making cloud-prepare and subctl harder to maintain. With it removed, we'll have a lighter maintenance burden and could simplify the cloud prepare code even further.

AWS prepare without credentials present fails due to `no EC2 IMDS role found`

What happened:
Running cloud prepare for AWS via subctl, when no ~/.aws/credentials file is found, fails with:

 ✓ Preparing AWS cloud for Submariner deployment
 ✓ Obtained infra ID "mkolesni-subm-deb2-42pgb" and region "us-east-1" from OCP metadata file "mkolesni-subm-deb2/metadata.json"
 ✓ Initializing AWS connectivity
 ✗ Retrieving VPC ID 
 ✗ Unable to retrieve the VPC ID: error describing AWS VPCs: operation error EC2: DescribeVpcs, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, request canceled, context deadline exceeded
 ✗ Failed to prepare AWS cloud: unable to retrieve the VPC ID: error describing AWS VPCs: operation error EC2: DescribeVpcs, failed to sign request: failed to retrieve credentials: failed to refresh cached credentials, no EC2 IMDS role found, operation error ec2imds: GetMetadata, request canceled, context deadline exceeded

subctl version: devel

What you expected to happen:
It should present a clear error message
On 0.11.2 it used to present this message:

 ✗ Retrieving AWS credentials from your AWS configuration
 ✗ failed to read AWS credentials from /root/.aws/credentials: open /root/.aws/credentials: no such file or directory
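A clear message could be restored with a pre-flight check before the first AWS call. The following is only a sketch: checkAWSCredentials is a hypothetical helper, and it looks only at the AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY environment variables and the credentials file, which is a simplifying assumption because the AWS SDK can also obtain credentials from other sources.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// checkAWSCredentials (hypothetical helper) returns a readable error when
// neither the standard environment variables nor a credentials file are
// present. getenv is injected so the check is easy to test.
func checkAWSCredentials(getenv func(string) string, credentialsFile string) error {
	if getenv("AWS_ACCESS_KEY_ID") != "" && getenv("AWS_SECRET_ACCESS_KEY") != "" {
		return nil
	}
	if _, err := os.Stat(credentialsFile); err != nil {
		return fmt.Errorf("no AWS credentials found in the environment or in %s: %w",
			credentialsFile, err)
	}
	return nil
}

func main() {
	home, err := os.UserHomeDir()
	if err != nil {
		fmt.Println("cannot determine home directory:", err)
		return
	}
	if err := checkAWSCredentials(os.Getenv, filepath.Join(home, ".aws", "credentials")); err != nil {
		fmt.Println(err)
	}
}
```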

How to reproduce it (as minimally and precisely as possible):
Install openshift on AWS using openshift-installer: ./openshift-install create cluster
Run cloud prepare: subctl cloud prepare aws

Anything else we need to know?:

Environment:

  • Diagnose information (use subctl diagnose all):
  • Gather information (use subctl gather):
  • Cloud provider or hardware configuration: AWS
  • Install tools: openshift-install 4.10.16
  • Others: Happens on devel and on 0.12.1

Azure - ResourceNotFound during deployment of gateway

Running cloud prepare (on an OCP 4.12 install, if it matters), the preparation fails because the interfaces cannot be found:

[root@36efcb63aedb shipyard]# subctl cloud prepare azure --ocp-metadata output/ocp-cluster2/ --auth-file ~/.azure/osServicePrincipal.json
 ✓ Preparing Azure cloud for Submariner deployment 
 ✓ Obtained infra ID "mkolesni-testday-clus-6qgc4" and region "eastus" from OCP metadata file "output/ocp-cluster2/"
 ✓ Retrieving Azure credentials from your Azure authorization file "/root/.azure/osServicePrincipal.json"
 ✓ Initializing Azure connectivity 
 ✗ Deploying gateway node 
 ✗ Failed to open the Submariner gateway port for already existing nodes: error getting the interfaces "mkolesni-testday-clus-fgz8k-worker-eastus2-h2r5f-nic" from resource group "mkolesni-testday-clus-6qgc4-rg": GET https://management.azure.com/subscriptions/03e5f0ef-0741-442a-bc1b-ba34ceb3f63f/resourceGroups/mkolesni-testday-clus-6qgc4-rg/providers/Microsoft.Network/networkInterfaces/mkolesni-testday-clus-fgz8k-worker-eastus2-h2r5f-nic
--------------------------------------------------------------------------------
RESPONSE 404: 404 Not Found
ERROR CODE: ResourceNotFound
--------------------------------------------------------------------------------
{
  "error": {
    "code": "ResourceNotFound",
    "message": "The Resource 'Microsoft.Network/networkInterfaces/mkolesni-testday-clus-fgz8k-worker-eastus2-h2r5f-nic' under resource group 'mkolesni-testday-clus-6qgc4-rg' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"
  }
}
--------------------------------------------------------------------------------

 ✗ Failed to prepare Azure  cloud: Deployment failed : failed to open the Submariner gateway port for already existing nodes: error getting the interfaces "mkolesni-testday-clus-fgz8k-worker-eastus2-h2r5f-nic" from resource group "mkolesni-testday-clus-6qgc4-rg": GET https://management.azure.com/subscriptions/03e5f0ef-0741-442a-bc1b-ba34ceb3f63f/resourceGroups/mkolesni-testday-clus-6qgc4-rg/providers/Microsoft.Network/networkInterfaces/mkolesni-testday-clus-fgz8k-worker-eastus2-h2r5f-nic
--------------------------------------------------------------------------------
RESPONSE 404: 404 Not Found
ERROR CODE: ResourceNotFound
--------------------------------------------------------------------------------
{
  "error": {
    "code": "ResourceNotFound",
    "message": "The Resource 'Microsoft.Network/networkInterfaces/mkolesni-testday-clus-fgz8k-worker-eastus2-h2r5f-nic' under resource group 'mkolesni-testday-clus-6qgc4-rg' was not found. For more details please go to https://aka.ms/ARMResourceNotFoundFix"
  }
}
--------------------------------------------------------------------------------


subctl version: v0.15.0-m2

AWS: cloud-prepare fails if a security group already exists

What happened:

I deployed Submariner on a cluster, then removed it, and tried to prepare the cluster again for Submariner (the gateway node had been removed).

cloud-prepare failed with

 ✗ InvalidGroup.Duplicate: The security group 'skitt-1-c6nr4-submariner-gw-sg' already exists for VPC 'vpc-00b8d5512b0e9a056'
        status code: 400, request id: 770c894e-4bea-4ae1-abc6-c2fed115bb7e
Failed to prepare AWS cloud: InvalidGroup.Duplicate: The security group 'skitt-1-c6nr4-submariner-gw-sg' already exists for VPC 'vpc-00b8d5512b0e9a056'
        status code: 400, request id: 770c894e-4bea-4ae1-abc6-c2fed115bb7e

What you expected to happen:

The cluster to be set up for Submariner.

How to reproduce it (as minimally and precisely as possible):

Run cloud-prepare, delete the gateway node, run cloud-prepare again.

Anything else we need to know?:

Environment:

  • Diagnose information (use subctl diagnose all):
  • Gather information (use subctl gather):
  • Cloud provider or hardware configuration: AWS
  • Install tools: subctl (devel), openshift-installer 4.6.6
  • Others:

With a dedicated node, the security group is associated with the incorrect VM in cloud-prepare for OpenStack

What happened:
When using a dedicated g/w node, cloud-prepare for OpenStack associates the security group with the wrong VM: instead of the VM created for the dedicated g/w node, it gets associated with a different node.

What you expected to happen:
Security groups should be associated with the newly created node that has the g/w node tag.

How to reproduce it (as minimally and precisely as possible):
Run cloud prepare rhos with dedicated g/w node set to true.

Anything else we need to know?:

Environment:

  • Diagnose information (use subctl diagnose all):
  • Gather information (use subctl gather):
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:

cloud-prepare: better credentials

Currently we're using the openshift-machine-api user in AWS to run subctl cloud prepare/cleanup.
This user is created by the OpenShift installer and has some of the permissions we need, but is also lacking the following:

  • DeleteTags
  • AuthorizeSecurityGroupIngress
  • RevokeSecurityGroupIngress
  • CreateSecurityGroup
  • DeleteSecurityGroup
  • DescribeInstanceTypeOfferings

We have several alternatives to solve this:

  1. Receive credentials from the user (via command line, or read from AWS credentials file?)
  2. Get OpenShift to expand permissions in the user we're using (highly unlikely)
  3. Get OpenShift to create a user for Submariner during installation (also quite unlikely)
  4. Create the user ourselves somehow (another command? part of this command?)

cloud-prepare: Extract region & infra ID from a `metadata.json` file or K8s

What would you like to be added:
It would be more user-friendly to let users specify their metadata.json file and to extract the infra ID and region from there.

e.g.:
subctl cloud prepare aws --ocp-metadata /path/to/metadata.json

instead of:
subctl cloud prepare aws --infra-id <infraid> --region <region>

Alternatively we could perhaps extract it from the OpenShift installation (only if it's stored there).
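Extracting the two values could be sketched as follows. The field names infraID and aws.region are assumptions about the layout of the installer's metadata.json, and parseMetadata and the sample values are hypothetical; verify them against a real file before relying on this.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// ocpMetadata models only the fields needed here. The JSON keys are assumed,
// not confirmed by the issue text.
type ocpMetadata struct {
	InfraID string `json:"infraID"`
	AWS     struct {
		Region string `json:"region"`
	} `json:"aws"`
}

// parseMetadata (hypothetical helper) returns the infra ID and region found
// in the given metadata.json content.
func parseMetadata(data []byte) (string, string, error) {
	var md ocpMetadata
	if err := json.Unmarshal(data, &md); err != nil {
		return "", "", fmt.Errorf("parsing metadata: %w", err)
	}
	if md.InfraID == "" || md.AWS.Region == "" {
		return "", "", errors.New("metadata is missing infraID or aws.region")
	}
	return md.InfraID, md.AWS.Region, nil
}

func main() {
	// Made-up sample content standing in for a real metadata.json file.
	sample := []byte(`{"infraID": "example-abc12", "aws": {"region": "us-east-1"}}`)
	infraID, region, err := parseMetadata(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(infraID, region)
}
```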

Why is this needed:

OSP: Subctl cloud prepare not happening sometimes

When running "subctl cloud prepare osp ..." on a fresh cluster, it properly configures the gateway node, creates the necessary Security Group rules and associates the SG with the nodes. However, if one of the existing nodes is already labelled as a gateway node and we run "subctl cloud prepare osp ...", it assumes that the necessary steps are already done and does not associate the required Security Group rules with the nodes.

Workaround:

  • With subctl driven deployment: delete the submariner.io/gateway=true label and re-run the subctl cloud-prepare command.
  • With ACM: uninstall the submariner-addon from the ManagedCluster and re-install it.

Azure SDK migration

This week’s bump to the latest Azure SDK failed:

pkg/azure/azure.go:25:2: SA1019: package github.com/Azure/azure-sdk-for-go/services/network/mgmt/2021-03-01/network is deprecated: Please note, this package has been deprecated. A replacement package is available [github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork](https://pkg.go.dev/github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/network/armnetwork). We strongly encourage you to upgrade to continue receiving updates. See [Migration Guide](https://aka.ms/azsdk/golang/t2/migration) for guidance on upgrading. Refer to our [deprecation policy](https://azure.github.io/azure-sdk/policies_support.html) for more details. (staticcheck)
	"github.com/Azure/azure-sdk-for-go/services/network/mgmt/2021-03-01/network"
	^

See https://github.com/Azure/azure-sdk-for-go/blob/main/documentation/MIGRATION_GUIDE.md for details.

Remove non-dedicated gateways on 0.16

What would you like to be removed:
Let's deprecate this mode for 0.15 and remove it on 0.16

Why is this needed:
We're not actually testing or using non-dedicated gateway mode from cloud prepare.
Furthermore, a more K8s native approach is to deploy with LB mode which actually handles the cloud related operations properly.

From a technical debt perspective, the code is just sitting there and making cloud-prepare harder to maintain. With it removed, we'll have a lighter maintenance burden and could simplify the cloud prepare code even further.

In Openstack, cloud prepare does not create gateway node with OVN Kubernetes CNI

What happened:
In Openstack, cloud prepare does not create gateway node with OVN Kubernetes CNI

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):
Run cloud prepare in an OpenShift cluster with the OVNKubernetes CNI

Anything else we need to know?:
The error below is shown in the machineset controller

E0406 15:03:56.893497       1 controller.go:326]  "msg"="Reconciler error" "error"="error creating Openstack instance: error getting security groups: security group asuryanarhos-r7jdj-submariner-internal-sg not found" "controller"="machine-controller" "name"="asuryanarhos-r7jdj-submariner-gw-0-6vdcb" "namespace"="openshift-machine-api" "object"={"name":"asuryanarhos-r7jdj-submariner-gw-0-6vdcb","namespace":"openshift-machine-api"} "reconcileID"="26d333f5-695e-4213-807e-a24caf5a0b2e"
W0406 15:20:57.362743       1 controller.go:382] asuryanarhos-r7jdj-submariner-gw-0-6vdcb: failed to create machine: error creating Openstack instance: error getting security groups: security group asuryanarhos-r7jdj-submariner-internal-sg not found

Environment:

  • Diagnose information (use subctl diagnose all):
  • Gather information (use subctl gather):
  • Cloud provider or hardware configuration:
  • Install tools:
  • Others:
