raven's Issues

[feature request] Enhance ANP for yurt-tunnel in raven

This issue was split out of #31 and covers the ANP enhancement. It includes the following work:

  1. Rebase onto upstream ANP, including tunnel-agent and tunnel-server.
  2. Remove the interceptor in yurt-tunnel-server in order to reduce the complexity of yurt-tunnel.
  3. Implement the http.Handler interface on top of ANP; yurt-tunnel-server will call this handler to forward requests (see the sketch after this list).
  4. Implement the Agent interface on top of ANP; yurt-tunnel-agent will call this instance to handle requests from the tunnel-server.
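
For illustration, task 3 might look roughly like the sketch below. Everything except httputil.ReverseProxy and http.Transport is hypothetical: TunnelDialer stands in for whatever dial hook the ANP-based tunnel ends up exposing.

package tunnel

import (
	"context"
	"net"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// TunnelDialer is a hypothetical hook provided by the ANP-backed tunnel:
// it opens a connection to addr through the tunnel rather than directly.
type TunnelDialer func(ctx context.Context, network, addr string) (net.Conn, error)

// NewTunnelHandler wraps the tunnel in an http.Handler, so that
// yurt-tunnel-server only needs to call ServeHTTP to forward a request.
func NewTunnelHandler(target *url.URL, dial TunnelDialer) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(target)
	proxy.Transport = &http.Transport{
		// Route every outbound connection through the tunnel.
		DialContext: dial,
	}
	return proxy
}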

Helm install error: raven-agent-config cannot be created

The chart template renders the value unquoted:

forward-node-ip: {{ .Values.vpn.forwardNodeIP }}


Error message:

Error from server (BadRequest): error when creating "cm.yaml": ConfigMap in version "v1" cannot be handled as a ConfigMap: v1.ConfigMap.Data: ReadString: expects " or n, but found f, error found in #10 byte of ...|node-ip":false,"vpn-|..., bigger context ...|{"apiVersion":"v1","data":{"forward-node-ip":false,"vpn-driver":"libreswan"},"kind":"ConfigMap",|...

Reason

Running helm template produces:

# Source: raven-agent/templates/config.yaml
apiVersion: v1
data:
  vpn-driver: libreswan
  forward-node-ip: false
kind: ConfigMap
metadata:
  name: raven-agent-config
  namespace: kube-system

The rendered forward-node-ip: false should be forward-node-ip: "false", since ConfigMap data values must be strings.

Suggestion

Use quote in the Helm chart, and maybe a corresponding change in the operator for this configuration:

forward-node-ip: {{ .Values.vpn.forwardNodeIP | quote }}

Port conflict at startup

Log:

E0413 11:39:32.227102       1 options.go:90] "failed to new manager for raven agent controller" err="error listening on :8080: listen tcp :8080: listen: address already in use"
Error: failed to create manager: error listening on :8080: listen tcp :8080: listen: address already in use

The listening address should be configurable, e.g. via a command-line flag as sketched below.
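
A minimal sketch of such a flag, assuming the agent builds its manager with controller-runtime (the hard-coded :8080 is controller-runtime's default metrics address; the exact option name depends on the controller-runtime version):

package main

import (
	"flag"

	ctrl "sigs.k8s.io/controller-runtime"
)

func main() {
	// Let the operator choose the metrics address instead of hard-coding :8080.
	var metricsAddr string
	flag.StringVar(&metricsAddr, "metrics-bind-address", ":8080",
		"the address the metrics endpoint binds to")
	flag.Parse()

	mgr, err := ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		MetricsBindAddress: metricsAddr,
	})
	if err != nil {
		panic(err)
	}
	_ = mgr // register controllers and call mgr.Start(...) as before
}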

[Bug] EngineController `lastSeenNetwork` not set correctly

Why

  1. c.network is reassigned to a newly built value at the start of sync().
  2. At the end, c.lastSeenNetwork is set to c.network, so both fields end up pointing at the same new value.

func (c *EngineController) sync() error {
	gws, err := c.gatewayLister.List(labels.Everything())
	if err != nil {
		return err
	}
	// As we are going to rebuild a full state, so cleanup before proceeding.
	c.network = &types.Network{
		LocalEndpoint:   nil,
		RemoteEndpoints: make(map[types.GatewayName]*types.Endpoint),
		LocalNodeInfo:   make(map[types.NodeName]*v1alpha1.NodeInfo),
		RemoteNodeInfo:  make(map[types.NodeName]*v1alpha1.NodeInfo),
	}

	// ...

	// Only update lastSeenNetwork when all operations succeeded.
	c.lastSeenNetwork = c.network
	return nil
}

How to fix

Build the new state into a temporary variable and only assign it once all operations have succeeded, as sketched below.
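
A minimal sketch of that fix (assumed shape; the actual patch may differ in how lastSeenNetwork is published):

func (c *EngineController) sync() error {
	gws, err := c.gatewayLister.List(labels.Everything())
	if err != nil {
		return err
	}
	// Rebuild the full state into a temporary variable instead of
	// overwriting c.network before the operations below have succeeded.
	network := &types.Network{
		LocalEndpoint:   nil,
		RemoteEndpoints: make(map[types.GatewayName]*types.Endpoint),
		LocalNodeInfo:   make(map[types.NodeName]*v1alpha1.NodeInfo),
		RemoteNodeInfo:  make(map[types.NodeName]*v1alpha1.NodeInfo),
	}

	// ... populate network from gws and apply the changes ...

	// Publish the new state only when all operations succeeded.
	c.network = network
	c.lastSeenNetwork = network
	return nil
}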

[Raven-L7] Endpoints manager implementation

As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20220930-unifying-cloud-edge-comms.md mentions, we need to implement the endpoints manager in raven-controller-manager to handle l7 proxy server selection.

Generally, the raven l7 server is exposed as a Service on the cloud side. When users access the edge side through kubectl logs/exec or Prometheus/metrics-server, the request must first be intercepted by the l7 proxy server. Since the l7 proxy server is launched only on cloud gateway nodes and is implemented as part of raven-agent rather than as a standalone pod, we cannot rely on the Kubernetes Service mechanism to select raven l7 server endpoints directly. We therefore need an endpoints manager that takes responsibility for selecting the raven proxy server endpoints for the corresponding Service.

It mainly includes the tasks below (see the sketch after this list):

  1. Dynamically update the endpoints of the raven l7 proxy server Service according to the state of the l7 servers.
  2. When a user request arrives, select the proper l7 server endpoint to handle it.
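
A rough sketch of task 1 with client-go; the function name, its arguments, and the assumption that the Endpoints object already exists are all illustrative:

package manager

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// syncProxyEndpoints rewrites the Endpoints of the l7 proxy Service by hand:
// no pod selector can match a proxy server embedded in raven-agent.
func syncProxyEndpoints(ctx context.Context, client kubernetes.Interface,
	namespace, service string, readyServerIPs []string, port int32) error {
	var subsets []corev1.EndpointSubset
	if len(readyServerIPs) > 0 {
		addrs := make([]corev1.EndpointAddress, 0, len(readyServerIPs))
		for _, ip := range readyServerIPs {
			addrs = append(addrs, corev1.EndpointAddress{IP: ip})
		}
		subsets = append(subsets, corev1.EndpointSubset{
			Addresses: addrs,
			Ports:     []corev1.EndpointPort{{Port: port}},
		})
	}
	ep := &corev1.Endpoints{
		ObjectMeta: metav1.ObjectMeta{Name: service, Namespace: namespace},
		Subsets:    subsets,
	}
	// A real controller would create-or-update and handle conflicts.
	_, err := client.CoreV1().Endpoints(namespace).Update(ctx, ep, metav1.UpdateOptions{})
	return err
}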

Using VTI devices and Routes to manage VPN connections

Raven currently uses the subnet-based configuration of Libreswan to create VPN connections, which relies heavily on the "one-subnet-one-node" assumption. This causes trouble when the CNI implementation does not follow that assumption; many CNIs even allow a subnet to span multiple nodes.

Both Libreswan and WireGuard support a route-based management method. With it, raven could watch Pod IPs and use them to configure routes (policy routing + ipset + iptables marks) instead of relying on the per-node "subnets", as sketched below.
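
A speculative sketch of the routing half of that idea, assuming the vishvananda/netlink library and a pre-created vti0 device (the table number is made up; steering traffic into the table via policy rules, ipset, and fwmarks is omitted):

package routedriver

import (
	"net"

	"github.com/vishvananda/netlink"
)

const vpnRouteTable = 200 // hypothetical dedicated routing table

// addPodRoute installs a /32 route for one remote Pod IP towards the VTI
// device, so no per-subnet IPsec policy is needed.
func addPodRoute(podIP string) error {
	link, err := netlink.LinkByName("vti0")
	if err != nil {
		return err
	}
	_, dst, err := net.ParseCIDR(podIP + "/32")
	if err != nil {
		return err
	}
	// RouteReplace is idempotent, which suits a watch-driven reconciler.
	return netlink.RouteReplace(&netlink.Route{
		LinkIndex: link.Attrs().Index,
		Dst:       dst,
		Table:     vpnRouteTable,
	})
}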

[feature request] Support proxying requests through nodeIP

Raven currently supports forwarding requests through Services and pod IPs. It may be a good idea to also support forwarding requests through node IPs, so that kubectl logs/exec can work over raven when node IPs are not duplicated in the cluster.

When node IPs may be duplicated in the cluster, it is recommended to use nodeName to access edge nodes from the cloud for kubectl logs/exec; that will be supported in issue #70.

Not fully compatible with Calico

On clusters deployed with Calico as the network plugin, the pod CIDR is assigned by the IPPool as block CIDRs. Every node is bound to a BlockAffinity whose CIDR differs from node.spec.podCIDR, so the IPsec policies added based on node.spec.podCIDR are wrong.

[RavenL7 Proxy] Gateway pickup controller optimization

  1. Rename the Gateway controller to the GatewayPickup controller
  2. Select nodes to provide the tunnel or proxy service based on Spec.Endpoints
  3. Update the configuration of each ActiveEndpoint according to the global ConfigMap raven-cfg, Spec.ProxyConfig, and Spec.TunnelConfig
  4. Traverse the nodes in the local network domain and record their information in the Gateway status

Raven support for passing through NAT

In many edge computing cases, edge nodes must be deployed in a local NAT network. One or more edge nodes in this local NAT network have permission to access the remote cloud API server; however, the cloud nodes cannot initiate connections to the edge nodes.
As a professional network solution for edge computing, Raven needs to support this typical scenario as soon as possible.

[feature request] Merge yurt-tunnel-server/agent into raven, except ANP

This issue is part of #31. It covers merging the yurt-tunnel options/config into raven and includes the following work:

  1. Add a tunnel entry for yurt-tunnel-server and yurt-tunnel-agent
  2. Use node labels and raven gateway info to select whether yurt-tunnel-server or yurt-tunnel-agent starts up
  3. Migrate yurt-tunnel-server and yurt-tunnel-agent into raven, except the ANP part, which will be covered in #41
  4. Remove the iptables manager in yurt-tunnel-server
  5. Define two interfaces for the underlying request-forwarding implementation, like ANP

[Bug] Should configure loose mode reverse path filtering (rp_filter) for both the `flannel.1` and `raven` NICs

Raven currently uses VXLAN to forward cross-edge packets. In this setup, when using the flannel CNI, loose mode reverse path filtering is required on both the flannel.1 and raven interfaces; otherwise, non-gateway nodes will not be able to communicate with nodes behind other gateways.

https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
rp_filter - INTEGER
0 - No source validation.
1 - Strict mode as defined in RFC3704 Strict Reverse Path
Each incoming packet is tested against the FIB and if the interface
is not the best reverse path the packet check will fail.
By default failed packets are discarded.
2 - Loose mode as defined in RFC3704 Loose Reverse Path
Each incoming packet's source address is also tested against the FIB
and if the source address is not reachable via any interface
the packet check will fail.

rp_filter usually defaults to strict mode. In strict mode, the host checks whether the source of a received packet is reachable through the interface it arrived on, which blocks some of raven's traffic.

Here is an explanation. Assume we have two nodes in the local gateway and one node in the remote gateway:

  • Non-gateway node: flannel.1: 10.244.2.0/24, raven 240.168.2.202/8
  • Gateway node: flannel.1: 10.244.1.0/24, raven 240.168.1.202/8
  • Remote gateway node: flannel.1: 10.244.0.0/24, raven 240.168.0.202/8

When the non-gateway node sends packets to nodes behind the remote gateway, the packets are routed to the local gateway node and then go through the VPN tunnel.

Why loose mode reverse path filtering is needed for raven

Assume the gateway node receives a packet (src=10.244.2.0, dst=10.244.0.0) from the non-gateway node on the raven (240.168.1.202) NIC. It will find that the reverse path for 10.244.2.0 should be flannel.1, so the packet will be dropped. That is why loose mode reverse path filtering must be configured for raven.

Why loose mode reverse path filtering is needed for flannel.1

If the non-gateway node receives a packet (src=10.244.0.0, dst=10.244.2.0) from the local gateway node on the flannel.1 (10.244.2.0/24) NIC, it will find that the reverse path for 10.244.0.0 should be raven (240.168.2.202), so the packet will be dropped. That is why loose mode reverse path filtering must be configured for flannel.1.

How to fix

Set /proc/sys/net/ipv4/conf/{Interface}/rp_filter to 2 to enable loose mode:

echo 2 > /proc/sys/net/ipv4/conf/raven/rp_filter
echo 2 > /proc/sys/net/ipv4/conf/flannel.1/rp_filter
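
If raven is to apply this itself, here is a small sketch of doing so from the agent at startup (interface names hard-coded for brevity):

package main

import (
	"fmt"
	"os"
)

// setLooseRPFilter writes 2 (loose mode, per ip-sysctl.txt above) into the
// rp_filter sysctl of the given interface.
func setLooseRPFilter(iface string) error {
	path := fmt.Sprintf("/proc/sys/net/ipv4/conf/%s/rp_filter", iface)
	return os.WriteFile(path, []byte("2"), 0644)
}

func main() {
	for _, iface := range []string{"raven", "flannel.1"} {
		if err := setLooseRPFilter(iface); err != nil {
			fmt.Fprintf(os.Stderr, "set rp_filter for %s: %v\n", iface, err)
		}
	}
}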

[Question] How to avoid entering nodes manually?

If I understand the documentation correctly, nodes must be both labeled and entered into the gateway spec. The labeling of the nodes can be done when registering a node. After that I have to manually adjust the gateway spec. This makes automation very difficult.

Are there currently other ways to do this, e.g. only via the labels? What is the reason for the double entry?

[Raven-L7] DNS manager implementation

As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20220930-unifying-cloud-edge-comms.md mentions, we need to implement the DNS manager in raven-controller-manager for hostname DNS resolution.

It mainly includes the tasks below (see the sketch after this list):

  1. Create and maintain the map between node name and IP address. Cloud nodes and edge nodes need to be handled separately, because requests to cloud nodes are intra-domain communication, while requests to edge nodes are cross-domain communication.
  2. Record and generate the l7 proxy server LoadBalancer/EIP for the l7 proxy agent to connect to.

We can leverage most of the YurtTunnel DNS logic to implement this in raven.
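
A trivial sketch of task 1 in the spirit of the YurtTunnel DNS ConfigMap (all names here are illustrative): render node-name records in hosts format, pointing cloud node names at their real IPs and edge node names at the proxy address.

package manager

import (
	"fmt"
	"sort"
	"strings"
)

// renderHosts builds hosts-format records: cloud nodes resolve to their real
// IPs (intra-domain), edge nodes resolve to the l7 proxy (cross-domain).
func renderHosts(cloudNodes map[string]string, edgeNodes []string, proxyIP string) string {
	var b strings.Builder
	names := make([]string, 0, len(cloudNodes))
	for name := range cloudNodes {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic output keeps the ConfigMap stable
	for _, name := range names {
		fmt.Fprintf(&b, "%s\t%s\n", cloudNodes[name], name)
	}
	sorted := append([]string(nil), edgeNodes...)
	sort.Strings(sorted)
	for _, name := range sorted {
		fmt.Fprintf(&b, "%s\t%s\n", proxyIP, name)
	}
	return b.String()
}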

[Raven-L7] Raven l7 proxy implementation

As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20220930-unifying-cloud-edge-comms.md mentions, we need to implement the l7 proxy, which is responsible for establishing the layer-7 connection between gateway nodes. It mainly includes the tasks below:

  1. Implement the l7 proxy agent in the raven-agent of the edge gateway node
  2. Implement the l7 proxy server in the raven-agent of the cloud gateway node
  3. The l7 proxy agent connects to the LoadBalancer/EIP exposed by the cloud l7 proxy server and establishes a tunnel connection to it
  4. The cloud l7 proxy server forwards user requests to the corresponding l7 proxy agent according to the hostname map
  5. The edge l7 proxy agent forwards user requests to the corresponding kubelet port of a solo node or a node in a nodepool

Cannot get the public IP

When I deploy raven following the tutorial, the raven pod prints the error message error get public ip by any of the apis: [https://api.ipify.org https://api.my-ip.io/ip https://ip4.seeip.org].

I think it would be better to allow configuring the public IP via a node annotation, e.g. when the edge node is deployed in an internal network, as sketched below.
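
A hedged sketch of that suggestion; the annotation key below is made up for illustration, and only the fallback URL comes from the error message above:

package agent

import (
	"io"
	"net/http"
	"strings"

	corev1 "k8s.io/api/core/v1"
)

const publicIPAnnotation = "raven.openyurt.io/public-ip" // hypothetical key

// publicIP prefers an operator-provided annotation (useful for nodes on an
// internal network) and only falls back to asking a public echo service.
func publicIP(node *corev1.Node) (string, error) {
	if ip, ok := node.Annotations[publicIPAnnotation]; ok && ip != "" {
		return ip, nil
	}
	resp, err := http.Get("https://api.ipify.org")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(string(body)), nil
}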

Support deploying raven via Helm

As the title says, raven should be deployable via Helm. In addition, the raven Helm charts should be synced to the openyurtio/openyurt-helm repo automatically.

By the way, the raven-controller-manager component should also support deployment via Helm.

[feature request] Add reconciliation loop to check route entries and vpn connections periodically.

Currently, if a failure happens while setting up the route table or an IPsec connection, we requeue the event and retry later:

func (c *EngineController) handleEventErr(err error, event interface{}) {
	if err == nil {
		c.queue.Forget(event)
		return
	}
	if c.queue.NumRequeues(event) < maxRetries {
		klog.Infof("error syncing event %v: %v", event, err)
		c.queue.AddRateLimited(event)
		return
	}
	utilruntime.HandleError(err)
	klog.Infof("dropping event %q out of the queue: %v", event, err)
	c.queue.Forget(event)
}

This approach has some problems:

  1. The re-entered event may break correctness.
  2. If the max retries are exceeded, the event is lost forever.
  3. If rules are accidentally deleted by the user, there is no way to recover (short of restarting the agent).

To improve this, we need a reconciliation mechanism that periodically compares the current state of the node with the desired state (synced from the informer) and makes any required changes when they mismatch, as sketched below.
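
A minimal sketch of such a loop, assuming the wait helpers from k8s.io/apimachinery (the one-minute period is arbitrary):

import (
	"time"

	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/klog/v2"
)

func (c *EngineController) Run(stopCh <-chan struct{}) {
	// ... start informers and event-driven workers as before ...

	// Periodic reconciliation: rebuild the desired state from the informer
	// cache and repair any drift (e.g. manually deleted routes or IPsec
	// connections), independent of the event queue.
	go wait.Until(func() {
		if err := c.sync(); err != nil {
			klog.Errorf("periodic reconciliation failed: %v", err)
		}
	}, time.Minute, stopCh)

	<-stopCh
}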

[Question] How to build an experimental environment

Hi, I'm trying to learn raven. I want to know if there is a way to set up an experimental environment quickly.
I only have one PC and one cloud server, so it is difficult to simulate a real scenario with several edge nodes whose networks are not connected to each other.

Could I use kind, minikube, or some other method to achieve this?

[Bug] Avoid NAT when there is only a single node in the Gateway endpoints

Why

When there is only a single node in the Gateway endpoints, the NAT-avoidance rules are cleaned up:

// pkg/networkengine/routedriver/vxlan/vxlan.go L490~501
err = vx.iptables.NewChainIfNotExist(iptablesutil.NatTable, iptablesutil.RavenPostRoutingChain)
if err != nil {
	errList = errList.Append(fmt.Errorf("error create %s chain: %s", iptablesutil.PostRoutingChain, err))
}
err = vx.iptables.DeleteIfExists(iptablesutil.NatTable, iptablesutil.PostRoutingChain, "-m", "comment", "--comment", "raven traffic should skip NAT", "-o", "raven0", "-j", iptablesutil.RavenPostRoutingChain)
if err != nil {
	errList = errList.Append(fmt.Errorf("error deleting %s chain rule: %s", iptablesutil.PostRoutingChain, err))
}
err = vx.iptables.ClearAndDeleteChain(iptablesutil.NatTable, iptablesutil.RavenPostRoutingChain)
if err != nil {
	errList = errList.Append(fmt.Errorf("error deleting %s chain %s", iptablesutil.RavenPostRoutingChain, err))
}
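
A speculative sketch of the fix, reusing the helpers from the snippet above: keep the skip-NAT jump installed even for a single-node gateway instead of running the cleanup path (AppendUnique stands in for whatever idempotent insert helper the iptables wrapper offers):

// Ensure the chain and the skip-NAT jump exist rather than deleting them.
err = vx.iptables.NewChainIfNotExist(iptablesutil.NatTable, iptablesutil.RavenPostRoutingChain)
if err != nil {
	errList = errList.Append(fmt.Errorf("error creating %s chain: %s", iptablesutil.RavenPostRoutingChain, err))
}
err = vx.iptables.AppendUnique(iptablesutil.NatTable, iptablesutil.PostRoutingChain,
	"-m", "comment", "--comment", "raven traffic should skip NAT",
	"-o", "raven0", "-j", iptablesutil.RavenPostRoutingChain)
if err != nil {
	errList = errList.Append(fmt.Errorf("error appending %s chain rule: %s", iptablesutil.PostRoutingChain, err))
}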

[RavenL7 Proxy] GatewayInternalService controller implementation

As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20230613-raven-l7-proxy.md mentions, we need to implement the GatewayInternalService controller in yurt-manager for the cross-pool request-forwarding service within the cluster. All requests using nodename+port are resolved by CoreDNS to the address of the x-raven-proxy-internal-service.

  1. Create and maintain a Service named x-raven-proxy-internal-service, and manage its ports according to Spec.ProxyConfig in the Gateway CR
  2. Create and maintain an Endpoints object matching the Service named x-raven-proxy-internal-service, selecting the ActiveEndpoints (Type = Proxy) in gateways whose exposeType is LoadBalancer or Public as endpoints
  3. In the future, the service topology capability of yurthub can be used to automatically send cross-pool requests to gateway nodes in the local node pool

[Raven-L7] Gateway manager optimization

We need to optimize gateway manager to achieve the goal below:

  1. A gateway CR can be auto-created/deleted for solo edge nodes by default:
    1). For a solo edge node, treat it as a gateway node by default and try to launch the l7 proxy agent
    2). Whether to create a gateway CR to notify other gateways is TBD; it's better to make it transparent to users
  2. A gateway CR can be auto-created/deleted for nodepools by default:
    1). When a nodepool is created and some nodes are added into the pool, a default gateway is created for the pool
    2). When other nodes are added to or removed from the pool, the default gateway endpoints are updated dynamically
    3). When the nodepool is deleted, the corresponding gateway CR is deleted automatically from the cluster
    4). When users update the gateway of a pool, how to maintain it automatically is TBD

More technical details need to be nailed down for this feature. At the current stage we can implement step 1 for solo edge nodes first; how to optimize it for nodepool usage scenarios needs more investigation.

Add a featuregate framework

It should be possible to turn features on or off using the --feature-gates command-line flag. For example, the raven L7 feature includes many components: gateway manager, dns manager, cert manager, and so on. Their code can be committed but not executed initially, because it is guarded by the L7 feature gate being off.
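
A sketch using k8s.io/component-base/featuregate, the same machinery the kube components use for --feature-gates; the RavenL7Proxy gate name is illustrative only:

package features

import (
	utilruntime "k8s.io/apimachinery/pkg/util/runtime"
	"k8s.io/component-base/featuregate"
)

const (
	// RavenL7Proxy guards the L7 components (gateway manager, dns manager,
	// cert manager, ...) so their code can be merged while staying disabled.
	RavenL7Proxy featuregate.Feature = "RavenL7Proxy"
)

// DefaultMutableFeatureGate is bound to --feature-gates at startup via
// DefaultMutableFeatureGate.AddFlag(flagSet).
var DefaultMutableFeatureGate featuregate.MutableFeatureGate = featuregate.NewFeatureGate()

func init() {
	utilruntime.Must(DefaultMutableFeatureGate.Add(map[featuregate.Feature]featuregate.FeatureSpec{
		RavenL7Proxy: {Default: false, PreRelease: featuregate.Alpha},
	}))
}

Call sites would then guard the L7 code paths with if features.DefaultMutableFeatureGate.Enabled(features.RavenL7Proxy) { ... }.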

[RavenL7 Proxy] GatewayPublicService controller implementation

As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20230613-raven-l7-proxy.md mentions, we need to implement the GatewayPublicService controller in yurt-manager for cross-pool proxying when a LoadBalancer is used to expose the proxy server. The private gateway node actively establishes a long-lived connection with the public gateway node, and cross-pool requests are forwarded through the proxy server to the proxy client.

  1. Create and maintain a Service named x-raven-proxy-svc-${NodeName}, and manage its ports according to Spec.ProxyConfig in the Gateway CR
  2. Create and maintain an Endpoints object matching the Service named x-raven-proxy-svc-${NodeName}, selecting the ActiveEndpoints (Type = Proxy) in gateways whose exposeType is LoadBalancer as endpoints

[feature request] Support running raven in a Windows environment

  • Constraints:
  1. In a Windows environment, nodes in a NodePool use the host network to communicate with each other instead of flannel VXLAN; in other words, the flannel components will not be installed on edge nodes.
  2. In a Windows environment, pods only support hostNetwork mode, to reduce complexity.
