openyurtio / raven

Provide layer 3 and layer 7 network connectivity among pods in different physical regions.

License: Apache License 2.0
When raven is adopted in a production environment, the raven gateway agent should run in a highly available fashion, so that even if one raven gateway instance fails, cloud-edge or edge-edge communication is not affected.
Currently, ipvs DNAT does not walk the ip rule table, which causes traffic to the ipvs backend not to be redirected to the gateway node. We need to re-route the traffic after ipvs DNAT.
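One possible direction (a hedged sketch, not raven's actual implementation) is to restore the connection mark after DNAT and use a policy routing rule so marked traffic is looked up in a dedicated table pointing at the gateway. Shown dry-run style; the mark value, table number, and addresses are made up:

```shell
# Dry-run sketch: print the commands instead of executing them.
# Replace 'echo' with real execution (as root) when applying.
run() { echo "$@"; }

# Restore the conntrack mark so post-DNAT packets keep their routing mark.
run iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
# Re-route marked traffic through a dedicated table toward the gateway node.
run ip rule add fwmark 0x40 lookup 100
run ip route replace default via 240.168.1.201 dev raven0 table 100
```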
This issue was originally part of #31 and is used for the ANP enhancement. It includes the following work:
The related issue #16 has been moved to done in roadmap 0.2.
So if I want to deploy raven on my cluster, which uses kube-proxy in IPVS mode, which image should I choose? raven-agent:latest was last pushed 2 months ago; raven-agent:main seems to be the right image?
The architecture image in README.md shows a single Raven Controller Manager, but I found that every raven-agent has a controller manager.
Error from server (BadRequest): error when creating "cm.yaml": ConfigMap in version "v1" cannot be handled as a ConfigMap: v1.ConfigMap.Data: ReadString: expects " or n, but found f, error found in #10 byte of ...|node-ip":false,"vpn-|..., bigger context ...|{"apiVersion":"v1","data":{"forward-node-ip":false,"vpn-driver":"libreswan"},"kind":"ConfigMap",|...
Running helm template produces:

# Source: raven-agent/templates/config.yaml
apiVersion: v1
data:
  vpn-driver: libreswan
  forward-node-ip: false
kind: ConfigMap
metadata:
  name: raven-agent-config
  namespace: kube-system
The line forward-node-ip: false should be forward-node-ip: "false". Fix: use quote in the helm chart, and maybe make a corresponding change in the operator for this configuration:
forward-node-ip: {{ .Values.vpn.forwardNodeIP | quote }}
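On the operator side, the quoted value still arrives as a string, since ConfigMap data in Kubernetes is always map[string]string, so it needs explicit parsing. A minimal sketch (function name and fallback behavior are assumptions, not raven's actual code):

```go
package main

import (
	"fmt"
	"strconv"
)

// parseForwardNodeIP reads the "forward-node-ip" key from ConfigMap data
// (always map[string]string in Kubernetes) and parses it into a bool.
// It falls back to a default when the key is missing or malformed.
func parseForwardNodeIP(data map[string]string, def bool) bool {
	raw, ok := data["forward-node-ip"]
	if !ok {
		return def
	}
	v, err := strconv.ParseBool(raw)
	if err != nil {
		return def
	}
	return v
}

func main() {
	data := map[string]string{"vpn-driver": "libreswan", "forward-node-ip": "false"}
	fmt.Println(parseForwardNodeIP(data, true)) // the quoted "false" parses cleanly
}
```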
log
E0413 11:39:32.227102 1 options.go:90] "failed to new manager for raven agent controller" err="error listening on :8080: listen tcp :8080: listen: address already in use"
Error: failed to create manager: error listening on :8080: listen tcp :8080: listen: address already in use
The listening address should be modifiable.
c.network is set to the new value. c.lastSeenNetwork is set to c.network, so it is also set to the new value.

func (c *EngineController) sync() error {
	gws, err := c.gatewayLister.List(labels.Everything())
	if err != nil {
		return err
	}
	// We are going to rebuild the full state, so clean up before proceeding.
	c.network = &types.Network{
		LocalEndpoint:   nil,
		RemoteEndpoints: make(map[types.GatewayName]*types.Endpoint),
		LocalNodeInfo:   make(map[types.NodeName]*v1alpha1.NodeInfo),
		RemoteNodeInfo:  make(map[types.NodeName]*v1alpha1.NodeInfo),
	}
	// ...
	// Only update lastSeenNetwork when all operations succeeded.
	c.lastSeenNetwork = c.network
	return nil
}
Fix: build the new state in a temporary variable, and assign it to c.network only after all operations have succeeded.
As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20220930-unifying-cloud-edge-comms.md mentions, we need to implement the endpoints manager in raven-controller-manager to handle l7 proxy server selection.
Generally, the raven l7 server is exposed as a service on the cloud side. When users access the edge side through kubectl logs/exec or prometheus/metrics-server, the request first needs to be intercepted by the l7 proxy server. Since the l7 proxy server is launched only on cloud gateway nodes, and is implemented as part of raven-agent rather than as a standalone pod, we cannot leverage the Kubernetes service mechanism to select raven l7 server endpoints directly. So we need to implement an endpoints manager that takes responsibility for selecting the raven proxy server endpoints for the corresponding service.
It mainly includes the tasks below:
Raven currently uses the subnet-based configuration of Libreswan to create VPN connections, which strongly relies on the "one-subnet-one-node" assumption. This causes trouble when the CNI implementation does not obey that assumption; many CNIs even allow a subnet to span multiple nodes.
Both Libreswan and WireGuard support a route-based management method. Using this, raven could watch only the IPs of Pod objects and use them to configure routes (policy routing + ipset + iptables marks) instead of relying on the "subnets" of nodes.
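The route-based idea could be sketched as deriving host routes directly from pod IPs instead of node subnets. A rough sketch with hypothetical shapes (these maps and the output format are illustrations, not raven's actual types):

```go
package main

import (
	"fmt"
	"sort"
)

// podRoutes derives one host route (/32) per pod IP toward the remote
// gateway, instead of relying on a per-node subnet. podToNode maps pod IPs
// to their nodes; nodeToGateway tells which gateway serves each node.
func podRoutes(podToNode, nodeToGateway map[string]string, localGateway string) []string {
	var routes []string
	for ip, node := range podToNode {
		gw, ok := nodeToGateway[node]
		if !ok || gw == localGateway {
			continue // local traffic needs no cross-gateway route
		}
		routes = append(routes, fmt.Sprintf("%s/32 via gateway %s", ip, gw))
	}
	sort.Strings(routes)
	return routes
}

func main() {
	routes := podRoutes(
		map[string]string{"10.244.2.5": "edge-1", "10.244.0.7": "cloud-1"},
		map[string]string{"edge-1": "gw-edge", "cloud-1": "gw-cloud"},
		"gw-cloud",
	)
	fmt.Println(routes) // only the pod behind the remote gateway gets a route
}
```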
Replace the raven image repo in the template:
Line 77 in 5999735
Raven now supports forwarding requests through service and podIP. It may be a good idea to also support forwarding requests through nodeIP, so that the kubectl logs/exec command can work over raven when nodeIPs are not duplicated in the cluster.
When nodeIPs may be duplicated in the cluster, it is recommended to use nodeName to access edge nodes from the cloud for the kubectl logs/exec command; this feature will be supported in issue #70.
On a cluster deployed with Calico as the network plugin, the podCIDR is assigned by the ippool as a blockCIDR. Every node is bound to a blockaffinity whose CIDR is different from node.spec.podCIDR, so the IPsec policy added based on node.spec.podCIDR is not right.
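The mismatch can be demonstrated directly with made-up CIDRs: a policy built from node.spec.podCIDR fails to cover a pod IP actually allocated from the node's Calico block.

```go
package main

import (
	"fmt"
	"net"
)

// covered reports whether podIP falls inside policyCIDR; used below to show
// that an IPsec policy built from node.spec.podCIDR can miss real pod IPs.
func covered(policyCIDR, podIP string) bool {
	_, ipnet, err := net.ParseCIDR(policyCIDR)
	if err != nil {
		return false
	}
	return ipnet.Contains(net.ParseIP(podIP))
}

func main() {
	nodePodCIDR := "10.244.1.0/24" // node.spec.podCIDR (example value)
	blockCIDR := "10.244.7.64/26"  // CIDR from the node's blockaffinity (example)
	podIP := "10.244.7.70"         // pod IP actually allocated from the block
	// The podCIDR-based policy misses the pod; the block-based one covers it.
	fmt.Println(covered(nodePodCIDR, podIP), covered(blockCIDR, podIP))
}
```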
As we know, an SLB is very commonly used by end users on cloud nodes to expose public services. If we can support SLB for the gateway, end users can use an SLB to expose the gateway on cloud nodes to the gateways on edge nodes.
In many edge computing cases, the edge nodes must be deployed in a local NAT network. One or more edge nodes in this local NAT network have permission to access the remote cloud API server, but the cloud nodes cannot reach the edge nodes on their own initiative.
As a professional network solution for edge computing, Raven needs to support this typical scenario as soon as possible.
This issue is part of #31. It is used for merging the yurt-tunnel options/config into raven, and includes the following work:
To ensure the high availability of the raven l7 proxy, we need to launch more than one l7 proxy on the cloud side or in edge nodepools.
This issue can be combined with the l3 gateway HA feature: #39
For now, raven uses vxlan to forward cross-edge packets. Under such circumstances, when using the flannel CNI, loose mode reverse path filtering is required on both the flannel.1 and raven interfaces. Otherwise, a non-gateway node will not be able to communicate with nodes behind other gateways.
https://www.kernel.org/doc/Documentation/networking/ip-sysctl.txt
rp_filter - INTEGER
0 - No source validation.
1 - Strict mode as defined in RFC3704 Strict Reverse Path
Each incoming packet is tested against the FIB and if the interface
is not the best reverse path the packet check will fail.
By default failed packets are discarded.
2 - Loose mode as defined in RFC3704 Loose Reverse Path
Each incoming packet's source address is also tested against the FIB
and if the source address is not reachable via any interface
the packet check will fail.
Usually, rp_filter defaults to strict mode. In strict mode, the host first checks whether the source of a received packet is reachable through the interface it arrived on, which blocks some traffic.
Here are some explanations:
Assume we have 2 nodes in the local gateway and 1 node in the remote gateway.
When the non-gateway node sends packets to nodes behind the remote gateway, the packet is first routed to the local gateway node and then goes through the VPN tunnel.
why loose mode reverse path filtering is needed for raven
Let's assume the gateway node receives a packet (src=10.244.2.0, dst=10.244.0.0) from the non-gateway node on the raven (240.168.1.202) NIC. It will find that the reverse path for 10.244.2.0 should be flannel.1, so the packet will be dropped. That's why we should configure loose mode reverse path filtering for raven.
why loose mode reverse path filtering is needed for flannel.1
If the non-gateway node receives a packet (src=10.244.0.0, dst=10.244.2.0) from the local gateway node on the flannel.1 (10.244.2.0/24) NIC, it will find that the reverse path for 10.244.0.0 should be raven (240.168.2.202), so the packet will be dropped. That's why we should configure loose mode reverse path filtering for flannel.1.
how to fix
Set /proc/sys/net/ipv4/conf/{Interface}/rp_filter to 2 to enable loose mode:
echo 2 > /proc/sys/net/ipv4/conf/raven/rp_filter
echo 2 > /proc/sys/net/ipv4/conf/flannel.1/rp_filter
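To make the setting survive reboots, a sysctl drop-in can be used (the file name below is just an example). Note that sysctl key syntax treats "." as a separator, so an interface name containing a dot, like flannel.1, is written with "/" instead:

```
# /etc/sysctl.d/90-raven-rpfilter.conf (example file name)
# Loose-mode reverse path filtering for the raven and flannel.1 interfaces.
# Interface names containing "." are written with "/" in sysctl keys.
net.ipv4.conf.raven.rp_filter = 2
net.ipv4.conf.flannel/1.rp_filter = 2
```

Apply it without rebooting via `sysctl --system`.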
As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20220930-unifying-cloud-edge-comms.md mentions, we need to implement a cert manager for L7 connection establishment, which mainly includes 2 parts:
We can try to leverage the YurtTunnel cert manager logic to implement this feature in raven.
If I understand the documentation correctly, nodes must be both labeled and entered into the gateway spec. The labeling can be done when registering a node, but after that I have to adjust the gateway spec manually, which makes automation very difficult.
Are there currently other ways to do this, e.g. via the labels only? What is the reason for the double entry?
As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20220930-unifying-cloud-edge-comms.md mentions, we need to implement the DNS manager in raven-controller-manager for the hostname DNS parser.
It mainly includes the tasks below:
We can leverage most of the YurtTunnel DNS logic to implement it in raven.
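YurtTunnel's DNS approach maintains hosts-style records that map node hostnames to the tunnel/proxy address, which CoreDNS then serves. A minimal sketch of building such records; the input shape and the "ip<TAB>hostname" line format are assumptions modeled on that style, not raven's actual code:

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// buildDNSRecords renders "<proxyIP>\t<nodeName>" lines so that node
// hostnames resolve to the raven proxy address instead of an unreachable
// edge IP. Records are sorted for stable, diff-friendly output.
func buildDNSRecords(nodeToProxyIP map[string]string) string {
	lines := make([]string, 0, len(nodeToProxyIP))
	for node, ip := range nodeToProxyIP {
		lines = append(lines, fmt.Sprintf("%s\t%s", ip, node))
	}
	sort.Strings(lines)
	return strings.Join(lines, "\n")
}

func main() {
	fmt.Println(buildDNSRecords(map[string]string{
		"edge-node-1": "10.0.0.10",
		"edge-node-2": "10.0.0.10", // both edges resolve to the same proxy
	}))
}
```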
As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20220930-unifying-cloud-edge-comms.md mentions, we need to implement the l7 proxy, which is responsible for establishing layer 7 connections between gateway nodes. It mainly includes the tasks below:
When I deploy raven following the tutorial, raven's pod console prints the error message error get public ip by any of the apis: [https://api.ipify.org https://api.my-ip.io/ip https://ip4.seeip.org].
I think it would be better to allow configuring the public IP via a node annotation, regardless of whether the edge node is deployed in an internal network.
As the title says, raven should support deployment via helm. In addition, raven's helm charts should be synced to the openyurtio/openyurt-helm repo automatically.
BTW: the raven-controller-manager component should also support deployment via helm.
There is no arm64 image at https://hub.docker.com/r/openyurt/raven-agent
Currently, if a failure happens while setting up the route table or IPsec connection, we requeue the event and retry later.
raven/pkg/k8s/engine_controller.go
Lines 115 to 129 in a4ee18f
This approach has some problems.
To improve it, we need a reconciliation mechanism that periodically compares the node's current state with the desired state (synced from the informer) and makes any required changes when they mismatch.
Hi, I'm trying to learn raven. I want to know if there is any way to set up an experimental environment quickly.
I only have one PC and one cloud server, and it's difficult to simulate a real scenario with several edge nodes whose networks are not interconnected.
Could I use kind or minikube or some other method to achieve this?
When there is only a single node at the endpoints in the Gateway, the NAT avoidance rules are cleaned up.
// pkg/networkengine/routedriver/vxlan/vxlan.go L490~501
err = vx.iptables.NewChainIfNotExist(iptablesutil.NatTable, iptablesutil.RavenPostRoutingChain)
if err != nil {
	errList = errList.Append(fmt.Errorf("error creating %s chain: %s", iptablesutil.RavenPostRoutingChain, err))
}
err = vx.iptables.DeleteIfExists(iptablesutil.NatTable, iptablesutil.PostRoutingChain, "-m", "comment", "--comment", "raven traffic should skip NAT", "-o", "raven0", "-j", iptablesutil.RavenPostRoutingChain)
if err != nil {
	errList = errList.Append(fmt.Errorf("error deleting %s chain rule: %s", iptablesutil.PostRoutingChain, err))
}
err = vx.iptables.ClearAndDeleteChain(iptablesutil.NatTable, iptablesutil.RavenPostRoutingChain)
if err != nil {
	errList = errList.Append(fmt.Errorf("error deleting %s chain: %s", iptablesutil.RavenPostRoutingChain, err))
}
@BSWANG PTAL
As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20230613-raven-l7-proxy.md mentions, we need to implement the GatewayInternalService controller in yurt-manager for the cross-pool request forwarding service within the cluster. All requests using nodename+port are resolved by CoreDNS to the address of the x-raven-proxy-internal-service.
take over the capabilities of YurtTunnel
I think the client-go informer style is more appropriate than the Reconcile style for the raven-agent controller.
Line 41 in 6b14ecb
I suggest we replace the Reconcile style with the client-go informer style.
/kind feature
The gateway public IP may change; update it if necessary.
We need to optimize the gateway manager to achieve the goal below:
There are more technical details that need to be nailed down for this feature. At the current stage we can first implement step 1 for solo edge nodes; how to optimize it for nodepool scenarios needs more investigation.
Some features can be turned on or off using the --feature-gates command-line flag. For example, the raven L7 feature includes many components: gateway manager, dns manager, cert manager and so on. Their code can be committed but not executed initially, because it is guarded by the L7 feature gate being turned off.
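A minimal sketch of parsing a --feature-gates value into per-feature switches; the gate names and the parser itself are illustrative assumptions, not raven's actual implementation (Kubernetes components provide similar machinery in k8s.io/component-base):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseFeatureGates parses a "--feature-gates" style value such as
// "RavenL7Proxy=true,Foo=false" into a map, starting from defaults so
// unmentioned gates keep their initial (usually off) state.
func parseFeatureGates(spec string, defaults map[string]bool) (map[string]bool, error) {
	gates := make(map[string]bool, len(defaults))
	for k, v := range defaults {
		gates[k] = v
	}
	if spec == "" {
		return gates, nil
	}
	for _, kv := range strings.Split(spec, ",") {
		parts := strings.SplitN(kv, "=", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("invalid feature gate %q, want name=bool", kv)
		}
		on, err := strconv.ParseBool(parts[1])
		if err != nil {
			return nil, fmt.Errorf("invalid value for gate %q: %v", parts[0], err)
		}
		gates[strings.TrimSpace(parts[0])] = on
	}
	return gates, nil
}

func main() {
	gates, _ := parseFeatureGates("RavenL7Proxy=true", map[string]bool{"RavenL7Proxy": false})
	fmt.Println(gates["RavenL7Proxy"]) // the L7 components run only when gated on
}
```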
As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20230613-raven-l7-proxy.md mentions, we need to implement the GatewayPublicService controller in yurt-manager for the cross-pool proxy when a LoadBalancer is used to expose the proxy server. The private gateway node actively establishes a long-lived connection with the public gateway node, and cross-pool requests are forwarded through the proxy server to the proxy client.
As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20230613-raven-l7-proxy.md mentions, we need to extend raven's layer 7 proxy for cross-pool communication to accommodate multi-region IP conflict scenarios.
As the proposal https://github.com/openyurtio/openyurt/blob/master/docs/proposals/20230613-raven-l7-proxy.md mentions, we need to implement the DNS controller in yurt-manager for the hostname DNS parser.