nsx-operator's Introduction

NSX Operator

Overview

An operator that uses NSX network resources to manage networking for Kubernetes clusters.

Getting Started

Build nsx-operator with the following command:

make all

Documentation

nsx-operator currently supports reconciling the SecurityPolicy CRD. See the Security Policy document for details.

License

This repository is available under the Apache 2.0 license.

nsx-operator's People

Contributors

dantingl, dependabot[bot], ggverma, gran-vmv, heypnus, jwsui, liu4480, lxiaopei, seanpang-vmware, taozou1, timdengyun, wenqiq, wsquan171, zhengxiexie

nsx-operator's Issues

nsx/client GetClient can't be triggered twice

If GetClient is called repeatedly, it reports an error:

2024-02-04 03:19:37.781 ERROR util/utils.go:250 handle http response {"status": 403, "requestUrl": "https://10.221.156.13/api/v1/node/version", "response body": "{"module_name":"common-service","error_message":"Bad XSRF token","error_code":98}", "error": "received HTTP Error"}
github.com/vmware-tanzu/nsx-operator/pkg/nsx/util.HandleHTTPResponse
/root/nsx-operator/pkg/nsx/util/utils.go:250
github.com/vmware-tanzu/nsx-operator/pkg/nsx.(*Cluster).GetVersion
/root/nsx-operator/pkg/nsx/cluster.go:351
github.com/vmware-tanzu/nsx-operator/pkg/clean.httpQueryDLBResources
/root/nsx-operator/pkg/clean/clean_dlb.go:55
github.com/vmware-tanzu/nsx-operator/pkg/clean.CleanDLB
/root/nsx-operator/pkg/clean/clean_dlb.go:77
github.com/vmware-tanzu/nsx-operator/pkg/clean.Clean.func2
/root/nsx-operator/pkg/clean/clean.go:66
k8s.io/client-go/util/retry.OnError.func1
/root/go/pkg/mod/k8s.io/[email protected]/util/retry/util.go:51
k8s.io/apimachinery/pkg/util/wait.runConditionWithCrashProtection
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/wait.go:145
k8s.io/apimachinery/pkg/util/wait.ExponentialBackoff
/root/go/pkg/mod/k8s.io/[email protected]/pkg/util/wait/backoff.go:461
k8s.io/client-go/util/retry.OnError
/root/go/pkg/mod/k8s.io/[email protected]/util/retry/util.go:50
github.com/vmware-tanzu/nsx-operator/pkg/clean.Clean
/root/nsx-operator/pkg/clean/clean.go:59
main.main
/root/nsx-operator/cmd_clean/main.go:92
runtime.main
/usr/local/go/src/runtime/proc.go:267

Go and Dependabot think there is a v1.0.0 tag for pkg/apis

Dependabot opened a bump PR for:

I wanted to take a look at the diff and realised the repo does not have a v1.0.0 tag or pkg/apis/v1.0.0.

Taking a look at how Go resolves this:

For v1.0.0:

❯ go mod download -json github.com/vmware-tanzu/nsx-operator/pkg/apis@v1.0.0
{
        "Path": "github.com/vmware-tanzu/nsx-operator/pkg/apis",
        "Version": "v1.0.0",
        "Info": "/Users/schlotterc/go/pkg/mod/cache/download/github.com/vmware-tanzu/nsx-operator/pkg/apis/@v/v1.0.0.info",
        "GoMod": "/Users/schlotterc/go/pkg/mod/cache/download/github.com/vmware-tanzu/nsx-operator/pkg/apis/@v/v1.0.0.mod",
        "Zip": "/Users/schlotterc/go/pkg/mod/cache/download/github.com/vmware-tanzu/nsx-operator/pkg/apis/@v/v1.0.0.zip",
        "Dir": "/Users/schlotterc/go/pkg/mod/github.com/vmware-tanzu/nsx-operator/pkg/apis@v1.0.0",
        "Sum": "h1:jmHI88hySjGqkpc/QUmSY5G5SsDvoXxmKFEK/GmcHWs=",
        "GoModSum": "h1:ZR/7rewflpAhnswQ6NVkFN0JmaqHgmvDyFVsJLmZ+pw=",
        "Origin": {
                "VCS": "git",
                "URL": "https://github.com/vmware-tanzu/nsx-operator",
                "Subdir": "pkg/apis",
                "Hash": "553261d24be7d22d76251fae0cd85bb51be9bb9d",
                "Ref": "refs/tags/pkg/apis/v1.0.0"
        }
}

For v0.1.0:

❯ go mod download -json github.com/vmware-tanzu/nsx-operator/pkg/apis@v0.1.0
{
        "Path": "github.com/vmware-tanzu/nsx-operator/pkg/apis",
        "Version": "v0.1.0",
        "Info": "/Users/schlotterc/go/pkg/mod/cache/download/github.com/vmware-tanzu/nsx-operator/pkg/apis/@v/v0.1.0.info",
        "GoMod": "/Users/schlotterc/go/pkg/mod/cache/download/github.com/vmware-tanzu/nsx-operator/pkg/apis/@v/v0.1.0.mod",
        "Zip": "/Users/schlotterc/go/pkg/mod/cache/download/github.com/vmware-tanzu/nsx-operator/pkg/apis/@v/v0.1.0.zip",
        "Dir": "/Users/schlotterc/go/pkg/mod/github.com/vmware-tanzu/nsx-operator/pkg/apis@v0.1.0",
        "Sum": "h1:HdnQb/X9vJ8a5WQ03g/0nDr9igIIK1fF6wO5wOtkJT4=",
        "GoModSum": "h1:Q4JzNkNMvjo7pXtlB5/R3oME4Nhah7fAObWgghVmtxk=",
        "Origin": {
                "VCS": "git",
                "URL": "https://github.com/vmware-tanzu/nsx-operator",
                "Subdir": "pkg/apis",
                "Hash": "1269a61ff22c969923f260553d7961803e53f63e",
                "Ref": "refs/tags/pkg/apis/v0.1.0"
        }
}

If we now take a look at the Hash, which is the commit hash referenced:

  • v1.0.0: 553261d24be7d22d76251fae0cd85bb51be9bb9d:

    ❯ git show 553261d24be7d22d76251fae0cd85bb51be9bb9d
    commit 553261d24be7d22d76251fae0cd85bb51be9bb9d
    Merge: 1ef2441 1d51c28
    Author: zhengxiexie <[email protected]>
    Date:   Mon Oct 30 13:13:15 2023 +0800
    
        Merge pull request #288 from zhengxiexie/codegen_alpha2_vpc_dev
    
        Support codegen for v1alpha2
  • v0.1.0: 1269a61ff22c969923f260553d7961803e53f63e

    ❯ git show 1269a61ff22c969923f260553d7961803e53f63e
    commit 1269a61ff22c969923f260553d7961803e53f63e (tag: pkg/apis/v0.1.0)
    Author: XiaoPei Liu <[email protected]>
    Date:   Fri Dec 22 10:02:17 2023 +0800
    
    Change pkg/apis and pkg/client in go.mod
    
    In go.mod, change to use local pkg/apis and pkg/client.

We can see that v1.0.0 is actually older than v0.1.0. That commit also exists only on the vpc_dev branch, not on main.

Where did this tag come from? Was a v1.0.0 tag perhaps pushed by accident?

HA GC problem

When nsx-operator runs in HA mode, two pods can run GC simultaneously, which may leave the SubnetSets in an inconsistent state.
This problem was found by yifeng.xiao. It is not yet clear whether this is the direct cause, but it should be taken into consideration when refactoring the GC stage.

Remove rule index from SecurityPolicy rule id and group id

Currently, the generated SecurityPolicy rule ID is:
sp_spUID_ruleIndex_portIndex_portAddressIndex.

The group Id is:
spName-ruleIndex-src
or spName-ruleIndex-dst

For a named port, the group ID looks like:
sp_spUID_ruleIndex_portIndex_portAddressIndex_ipset

ruleIndex is the index of the rule in the SP CR. portIndex and portAddressIndex are computed from the rule's port list or from the generated named port list.

For example, if the named port http corresponds to ports 8080-8081 and ports 9090-9091, the port index and port address index pairs are:
0_0 -> 8080
0_1 -> 8081
1_0 -> 9090
1_1 -> 9091

The rule index, portIndex, and portAddressIndex change whenever a rule is added or deleted, which changes the generated rule and group IDs as well. This is not ideal: rule and group IDs should be unique based only on the rule and group contents.

So it is better to remove the rule index from the SecurityPolicy rule ID and group ID, so that each rule and group is identified by its contents.
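
One way to derive an ID from contents only is to hash a canonical form of the rule. The following Go sketch is an illustration under assumptions, not the operator's implementation: the Rule type, its fields, and the contentRuleID helper are hypothetical, and the choice of SHA-256 truncated to 16 hex characters is arbitrary.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// Rule is a hypothetical, simplified stand-in for one SecurityPolicy rule.
type Rule struct {
	Direction string
	Action    string
	Selectors []string // e.g. "app=coco"
	Ports     []int
}

// contentRuleID builds an ID from the policy UID and the rule contents.
// The rule's position in the rule list is not part of the input, so the
// ID stays the same when other rules are added or deleted.
func contentRuleID(spUID string, r Rule) string {
	canonical := fmt.Sprintf("%s|%s|%v|%v", r.Direction, r.Action, r.Selectors, r.Ports)
	sum := sha256.Sum256([]byte(canonical))
	return "sp_" + spUID + "_" + hex.EncodeToString(sum[:])[:16]
}

func main() {
	rule2 := Rule{Direction: "in", Action: "allow", Selectors: []string{"app=coco"}, Ports: []int{8282}}
	rule3 := Rule{Direction: "out", Action: "allow", Selectors: []string{"app=web"}, Ports: []int{3306}}

	// The IDs depend only on contents, not on list position.
	fmt.Println(contentRuleID("spUID", rule2))
	fmt.Println(contentRuleID("spUID", rule3))
}
```

A trade-off of this approach is that editing a rule changes its ID, so an update looks like a delete plus a create.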

Operator Config [DEFAULT] debug = True does not produce the expected DEBUG log output

Setting the config option debug = True does not work as expected. The logger output is the same as with the default level 0.

With the current logger, the user can pass the argument -log-level int to change the log level.
The default -log-level is 0, which means Info and Error messages are logged.

When -log-level is set to 1, Info, Error, and Debug messages are logged.
For example, when -log-level is 1, we can see DEBUG messages like:
2023-07-28 05:13:25.303 DEBUG cmd/main.go:59 starting NSX Operator

When -log-level is set to 2, Info, Error, and Debug messages are logged, along with more verbose messages.
For example:
2023-07-28 05:13:25.303 DEBUG cmd/main.go:59 starting NSX Operator
2023-07-28 05:13:25.303 LEVEL(-2) cmd/main.go:60 starting NSX Operator

Setting debug = True in the config file should therefore mean a log level of at least 1. The effective log level should be the larger of the two inputs: 1 (from debug = True) and the -log-level value set by the user.
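
The proposed rule can be sketched in Go as follows. The effectiveLogLevel helper and its parameter names are hypothetical and only illustrate the "take the larger value" logic; they are not taken from the nsx-operator code.

```go
package main

import "fmt"

// effectiveLogLevel combines the config option debug with the -log-level
// flag. debug = True implies a level of at least 1; a larger -log-level
// value set by the user still wins.
func effectiveLogLevel(debug bool, flagLevel int) int {
	level := flagLevel
	if debug && level < 1 {
		level = 1
	}
	return level
}

func main() {
	fmt.Println(effectiveLogLevel(false, 0)) // 0
	fmt.Println(effectiveLogLevel(true, 0))  // 1
	fmt.Println(effectiveLogLevel(true, 2))  // 2
}
```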

DHCP subnetport has ipAddress "/26"

status:
  attachment:
    id: 39383535-6338-4132-ad38-6537372d3432
  conditions:
  - lastTransitionTime: "2024-05-01T21:19:17Z"
    message: NSX subnet port has been successfully created/updated
    reason: NSX API returned 200 response code for PATCH
    status: "True"
    type: Ready
  networkInterfaceConfig:
    ipAddresses:
    - gateway: 172.26.1.1
      ipAddress: /26     # <----------------------------
    logicalSwitchUUID: 7c5f3494-545e-4162-bf56-b11e1260ae66

The ipAddress looks irregular. If we follow the T1 networking implementation, the ipAddress should be null, but that has the side effect that VM operator cannot calculate the netmask from ipAddress. According to the discussion, this side effect is acceptable on the VM operator side. Let's use this issue to track the fix.
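
To illustrate why a bare "/26" is a problem for consumers, here is a small Go sketch that derives a netmask from a CIDR-style ipAddress. It assumes the consumer parses the value as CIDR notation; how VM operator actually does this is not described in the issue. The netmaskFromCIDR helper and the address 172.26.1.10 are hypothetical.

```go
package main

import (
	"fmt"
	"net"
)

// netmaskFromCIDR returns the dotted-decimal netmask for an IPv4 address
// given in CIDR notation, for example "172.26.1.10/26".
func netmaskFromCIDR(cidr string) (string, error) {
	_, ipNet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", fmt.Errorf("parse ipAddress %q: %w", cidr, err)
	}
	return net.IP(ipNet.Mask).String(), nil
}

func main() {
	mask, err := netmaskFromCIDR("172.26.1.10/26")
	fmt.Println(mask, err) // 255.255.255.192 <nil>

	// A prefix length without an address is not valid CIDR notation,
	// so parsing fails instead of yielding a netmask.
	_, err = netmaskFromCIDR("/26")
	fmt.Println(err != nil) // true
}
```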

Remove reliance on vsphere-automation sdk fork

Currently nsx-operator replaces the vsphere-automation-sdk with a fork. This fork does not contain the same versions as its upstream counterpart. As a result, importing nsx-operator into a Go project fails with the following errors:

go: github.com/vmware/vsphere-automation-sdk-go/lib@v0.5.2: invalid version: unknown revision lib/v0.5.2
go: github.com/vmware/vsphere-automation-sdk-go/runtime@v0.5.2: invalid version: unknown revision runtime/v0.5.2

The workaround is that downstream projects need to add the following configuration into their repo to properly sync modules:

replace (
  github.com/vmware/vsphere-automation-sdk-go/lib => github.com/TaoZou1/vsphere-automation-sdk-go/lib v0.5.2
  github.com/vmware/vsphere-automation-sdk-go/runtime => github.com/TaoZou1/vsphere-automation-sdk-go/runtime v0.5.2
)

Ideally, this project should just move back onto the mainline repo. If this isn't possible, do not use the replace directive for dependencies. Use the fork directly so that transitive dependencies can be correctly calculated.

SecurityPolicy cannot delete a rule when the rule's index is not the last one

In certain cases, SecurityPolicy cannot delete a rule when the rule's index is not the last one.
For example, take the SP below:

apiVersion: nsx.vmware.com/v1alpha1
kind: SecurityPolicy
metadata:
  name: sp-app-access222
  namespace: ns-1
spec:
  priority: 0
  appliedTo:
    - podSelector:
        matchLabels:
          role: db
  rules:
    - direction: in  # rule1
      action: allow
      sources:
        - podSelector:
            matchLabels:
              app: coffee
      ports:
        - protocol: TCP
          port: 6001

    - direction: in  # rule2
      action: allow
      sources:
        - vmSelector:
            matchLabels:
              app: coco
      ports:
        - protocol: TCP
          port: 8282

    - direction: out  # rule3
      action: allow
      destinations:
        - namespaceSelector:
            matchLabels:
              ns-name: ns-2
          vmSelector:
            matchLabels:
              app: web
      ports:
        - protocol: TCP
          port: 3306

When the user tries to delete rule2, the deletion fails. The error shows that the source group in rule2 is still in use and cannot be deleted.

    - direction: in
      action: allow
      sources:
        - vmSelector:
            matchLabels:
              app: coco
      ports:
        - protocol: TCP
          port: 8282

This issue occurs because the original rule2 ID, sp_spUID_1_0_0, is regenerated for the original rule3 after the deletion, since we don't use a unique ID for each generated rule.
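
The ID reuse can be reproduced with a few lines of Go. This is only a sketch of the index-based naming scheme described in these issues; indexRuleID and idsFor are hypothetical helpers, and every rule is assumed to have a single port (portIndex and portAddressIndex both 0).

```go
package main

import "fmt"

// indexRuleID mimics the sp_spUID_ruleIndex_portIndex_portAddressIndex scheme.
func indexRuleID(spUID string, ruleIndex, portIndex, portAddressIndex int) string {
	return fmt.Sprintf("sp_%s_%d_%d_%d", spUID, ruleIndex, portIndex, portAddressIndex)
}

// idsFor assigns an index-based ID to each named rule, in list order.
func idsFor(spUID string, rules []string) map[string]string {
	ids := make(map[string]string, len(rules))
	for i, name := range rules {
		ids[name] = indexRuleID(spUID, i, 0, 0)
	}
	return ids
}

func main() {
	before := idsFor("spUID", []string{"rule1", "rule2", "rule3"})
	after := idsFor("spUID", []string{"rule1", "rule3"}) // rule2 deleted

	fmt.Println(before["rule2"]) // sp_spUID_1_0_0
	fmt.Println(after["rule3"])  // sp_spUID_1_0_0
}
```

After the deletion, rule3 takes over the ID that belonged to rule2, so the resources of the old rule2 still appear to be in use.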

Support NSXT backup/restore for NSXServiceAccount

When an Antrea cluster is created after a backup point, the PI/CCP is lost after NSXT is restored to that backup point.
nsx-operator needs to read the PI/CCP info from the CR and re-create them on NSXT.

Target release: 4.1.2

Remove controller-runtime dependency from the pkg/apis package

Hi team, can we remove the controller-runtime dependency in NSX Operator APIs: https://github.com/vmware-tanzu/nsx-operator/blob/main/pkg/apis/go.mod#L8?

Because vSphere CPI wants to catch up with new k8s releases faster in the future, we are working on removing the controller-runtime dependency, since it is not released in time with new k8s releases.

vSphere CPI now leverages the NSX APIs, and we found that the NSX APIs also depend on controller-runtime. We therefore hope that the NSX API package can remove that dependency as well.

Removing this dependency from the API package alone should be enough for vSphere CPI; the main module can stay untouched.

Refactor error info

Currently, log.Error is called everywhere, both in the inner layers (service and utils modules, etc.) and in the outer layer (controller module).
I suggest that the inner layers only wrap the error with err = fmt.Errorf("context info: %v", err) and do not call log.Error(err).
Call log.Error(err) only in the outer layer. This avoids a lot of redundant error output.

Make note that Subnet/SubnetSet for SubnetPorts are exclusive

Elaborating on a comment from @bryanv.

ref:

// SubnetPortSpec defines the desired state of SubnetPort.
type SubnetPortSpec struct {
	// Subnet defines the parent Subnet name of the SubnetPort.
	Subnet string `json:"subnet,omitempty"`
	// SubnetSet defines the parent SubnetSet name of the SubnetPort.
	SubnetSet string `json:"subnetSet,omitempty"`

Can we add comments indicating that Subnet and SubnetSet are mutually exclusive in a SubnetPortSpec? That is, if "Subnet" is set, "SubnetSet" may not also be set. The comments should also describe the expected failure case. Additionally, is the failure handled by a ValidatingWebhook, or will the SubnetPort resolve incorrectly? (A ValidatingWebhook that rejects the resource outright might be better.)
