Comments (17)
When attaching to the second class of networks, NAPI will select an IP address from the source network, save it in the "overlay_router" field, and use it as the next-hop address in the "routes" object. This IP address will be mapped in Portolan to a special MAC address recognized by the overlay devices. (See the Overlay Changes section for more here.)
Does this represent a special entry in the existing portolan_vnet_mac_ip table/bucket, with the existing schema? Or are we looking to create a new table/bucket, or modify the existing one?
from rfd.
In the future "remote" may be used to indicate other kinds of remote networks, possibly reachable through some kind of authenticated tunnel.
Instead of making remote a boolean, do we want to make it a type (e.g. xdc, tunnel)? Or perhaps a sub-object like so:
# sdc-napi /networks/410fc93e-957a-4344-9112-ec17d5a946b5 | json -H
{
  "uuid": "410fc93e-957a-4344-9112-ec17d5a946b5",
  "remote": {
    "type": "xdc",
    "uuid": "025133ae-d107-47ab-aa08-27bb5e16e699",
    "dc": "us-east-3"
  },
  "subnet": "10.0.34.0/24",
  "fabric": true,
  "vnet_id": 56634,
  "vlan_id": 23
}
from rfd.
Note: RFD 130 was published to spell out different types of remote networks.
from rfd.
Note that when a network appears in an "attached_networks" array, then it will also contain its own mirroring "attached_networks" entry, to guarantee that two networks are always mutually routable and help prevent users from accidentally configuring a network to pass traffic in one direction but forgetting to do so in the other.
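To make the mirroring guarantee concrete, here is a minimal sketch in C, assuming networks are identified by small integer indexes rather than UUIDs. Every name in it (`sketch_attachments_t`, `sketch_attach`, and so on) is hypothetical and not part of NAPI; it only illustrates writing both directions of an attachment in one step so that a one-way configuration cannot exist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_NETS 8   /* hypothetical table size */

/* sa_attached[a][b] is true when network a lists network b. */
typedef struct sketch_attachments {
    bool sa_attached[SKETCH_MAX_NETS][SKETCH_MAX_NETS];
} sketch_attachments_t;

/* Start with no attachments at all. */
void
sketch_init(sketch_attachments_t *sa)
{
    assert(sa != NULL);
    memset(sa, 0, sizeof (*sa));
}

/* Attach two networks, always writing both directions together. */
bool
sketch_attach(sketch_attachments_t *sa, size_t a, size_t b)
{
    assert(sa != NULL);
    if (a >= SKETCH_MAX_NETS || b >= SKETCH_MAX_NETS || a == b)
        return (false);
    sa->sa_attached[a][b] = true;
    sa->sa_attached[b][a] = true;
    return (true);
}

/* Check the invariant: every attachment has its mirror entry. */
bool
sketch_is_symmetric(const sketch_attachments_t *sa)
{
    assert(sa != NULL);
    for (size_t a = 0; a < SKETCH_MAX_NETS; a++) {
        for (size_t b = 0; b < SKETCH_MAX_NETS; b++) {
            if (sa->sa_attached[a][b] != sa->sa_attached[b][a])
                return (false);
        }
    }
    return (true);
}
```

Because the only way to set an entry is through `sketch_attach`, the symmetry check can never fail in this sketch; a real implementation would have to keep the same property across two separate network objects.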
Do remote networks have attached_networks in them? If so, what if the remote network is not Triton?
To get around these, we will use a special MAC address to determine whether we need to inspect the destination IP address (which we can then use to find the UL3 information), whether we need to rewrite the VL2 information, and what VNET identifier to use. We will also need to change the source MAC address to match the special MAC address being used on the destination fabric network.
Earlier you mention the overlay_router. The overlay router's MAC address is this special MAC address... that's how we know off-link vs. on-link in overlay and varpd. That connection should be spelled out?
from rfd.
Specifically, on each subnet an IP is allocated for the router, and ARP requests for this IP will return the MAC of the overlay router. Within the overlay module, an additional flag (OVERLAY_ENTRY_F_ROUTER) will be created. During outbound processing, when the target is looked up (by VL2 dest MAC), if the resulting overlay_target_entry_t has the OVERLAY_ENTRY_F_ROUTER flag set, that indicates the packet should be routed. VL2->UL3 lookup requests for this MAC from overlay cause varpd to return IN6ADDR_ANY for the UL3 address. This should keep things from looking too exotic from the instance's perspective.
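As a rough illustration of that outbound decision, here is a small C sketch. It assumes a flat table of entries and a made-up bit value for OVERLAY_ENTRY_F_ROUTER; the `sketch_*` types and functions are hypothetical stand-ins and do not reflect the real overlay_target_entry_t layout or the overlay module's hashing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical bit value; the flag is named above but its value is not. */
#define OVERLAY_ENTRY_F_ROUTER 0x1

/* Simplified stand-in for overlay_target_entry_t, not the real layout. */
typedef struct sketch_target_entry {
    uint8_t  ste_mac[6];    /* VL2 destination MAC */
    uint32_t ste_flags;
} sketch_target_entry_t;

typedef enum {
    SKETCH_SWITCH,  /* on-link: forward as a normal VL2 target */
    SKETCH_ROUTE,   /* router MAC: inspect the destination IP */
    SKETCH_MISS     /* unknown MAC: would trigger a varpd lookup */
} sketch_action_t;

/* Classify an outbound frame by its VL2 destination MAC. */
sketch_action_t
sketch_classify(const sketch_target_entry_t *tbl, size_t n,
    const uint8_t dmac[6])
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].ste_mac, dmac, 6) != 0)
            continue;
        if (tbl[i].ste_flags & OVERLAY_ENTRY_F_ROUTER)
            return (SKETCH_ROUTE);
        return (SKETCH_SWITCH);
    }
    return (SKETCH_MISS);
}
```

The point of the sketch is only that the routed-versus-switched decision hangs off a single flag on the entry found by destination MAC, so the instance sees an ordinary next-hop MAC.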
from rfd.
@rjloura I like the "remote" as the type of remote instead of a boolean. The object is nice, too, we would just need to figure out how searching on its subfields would work. Maybe something like:
# sdc-napi /networks?remote.type=fabric&remote.dc=us-east-3
Does this represent a special entry in the existing portolan_vnet_mac_ip table/bucket, with the existing schema? Or are we looking to create a new table/bucket, or modify the existing one?
This will probably have to be something special, given that we'll likely need to mark it in some way for Portolan. We'll want to figure out what our migration scheme for the new bucket layouts is. I think we can probably do something similar to how NAPI upgrades its other buckets today, but maybe not.
Earlier you mention the overlay_router. The overlay router's MAC address is this special MAC address... that's how we know off-link vs. on-link in overlay&varpd. That connection should be spelled out?
Yes, I'll try to make this clearer.
In the current proposed scheme, there is one router MAC address for a VNET ID.
Under this scheme, we need to inspect and use the sender's IP address in order
to determine what route to use, since there might be multiple routes matching
the destination. An alternative approach would be to assign a MAC address per
network and use that for disambiguation. We had discussed doing so initially
but were uncertain of how to propagate them out.
@rmustacc suggested having things work using the following steps:
- When NAPI sets up the IP-to-MAC mapping for Portolan, it marks the MAC
  address as a router MAC.
- The guest instance on a fabric will need to ARP for the MAC address of the
  next-hop IP, as usual.
- When varpd does the VL3 request, the response from Portolan will flag the MAC
  as a router MAC address. (We were planning on filling in all zeroes for the
  underlay address information in this case, but we should probably add a
  bitfield for IP/MAC properties that we can extend in the future.)
- varpd can then add the MAC address to the overlay device as a new router MAC
  if it's not already there.
- varpd could possibly kick off a bulk request for all destination routes for
  that MAC address.
- When overlay(5) sees a packet arrive for a router MAC address heading to a
  destination IP address that it doesn't recognize, then it can ask varpd to
  find it the route for that MAC address and destination IP (instead of the
  source IP and destination IP).
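The last step above can be sketched as a route cache keyed on the pair of router MAC and destination IP. This is only an illustration in C under several assumptions: IPv4 addresses in host byte order, a fixed-size array instead of a real hash, and longest-prefix matching to pick among overlapping routes (the thread does not say how ties would be resolved). All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ROUTES 16    /* hypothetical cache size */

typedef struct sketch_route {
    uint8_t  sr_rmac[6];    /* router MAC the guest ARPed for */
    uint32_t sr_prefix;     /* destination subnet, host byte order */
    uint8_t  sr_prefixlen;
    uint32_t sr_vnet;       /* VNET ID of the destination fabric */
} sketch_route_t;

typedef struct sketch_route_cache {
    sketch_route_t rc_routes[SKETCH_MAX_ROUTES];
    size_t rc_count;
} sketch_route_cache_t;

/* Build a netmask; a prefix length of 0 must not shift by 32 bits. */
static uint32_t
sketch_mask(uint8_t prefixlen)
{
    return (prefixlen == 0 ? 0 : 0xFFFFFFFFu << (32 - prefixlen));
}

/* Record one route, e.g. from a bulk reply for a router MAC. */
int
sketch_route_add(sketch_route_cache_t *rc, const uint8_t rmac[6],
    uint32_t prefix, uint8_t prefixlen, uint32_t vnet)
{
    assert(rc != NULL);
    if (rc->rc_count == SKETCH_MAX_ROUTES || prefixlen > 32)
        return (-1);
    sketch_route_t *r = &rc->rc_routes[rc->rc_count++];
    memcpy(r->sr_rmac, rmac, 6);
    r->sr_prefix = prefix & sketch_mask(prefixlen);
    r->sr_prefixlen = prefixlen;
    r->sr_vnet = vnet;
    return (0);
}

/* Find the most specific route for <router MAC, destination IP>. */
const sketch_route_t *
sketch_route_lookup(const sketch_route_cache_t *rc, const uint8_t rmac[6],
    uint32_t dst)
{
    const sketch_route_t *best = NULL;

    assert(rc != NULL);
    for (size_t i = 0; i < rc->rc_count; i++) {
        const sketch_route_t *r = &rc->rc_routes[i];
        if (memcmp(r->sr_rmac, rmac, 6) != 0)
            continue;
        if ((dst & sketch_mask(r->sr_prefixlen)) != r->sr_prefix)
            continue;
        if (best == NULL || r->sr_prefixlen > best->sr_prefixlen)
            best = r;
    }
    return (best);
}
```

Keying on the router MAC rather than the sender's IP is what lets two networks with overlapping destinations coexist: the same destination address looked up under a different router MAC simply misses.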
from rfd.
varpd could possibly kick off a bulk request for all destination routes for
that MAC address.
My first question would've been where do the answers get stored, until I read this:
When overlay(5) sees a packet arrive for a router MAC address heading to a
destination IP address that it doesn't recognize, then it can ask varpd to
find it the route for that MAC address and destination IP (instead of the
source IP and destination IP).
which suggests varpd is the correct place to cache this information.
This way the state in overlay reduces drastically; the only real changes manifest in overlay_targ_lookup_t and in overlay_targ_resp_t. For lookups, the destination IP needs to show up in an otl_l3req case. In a response, the overlay's target point, or an alternative specifically for off-link targets, needs to know a bit more to make a transmitted packet palatable for a target (remote VLAN, remote vnetid, src MAC, etc.).
from rfd.
A problem with this I mentioned in MM though is that it seems like one could subvert the attachment policy for an instance using static arp entries.
from rfd.
Let me dive deep and see if I can imagine an actual attack:
Consider a next-hop-IP x.y.z.N has blessed-MAC 0a:0b:0c:0d:0e:0f, which has reachability to victim-net. A rogue-VM on net a.b.c.Q can do route add x.y.z.0/24 -interface a.b.c.Q; route add victim-net x.y.z.N; arp -s x.y.z.N a:b:c:d:e:f. From then on out, the rogue-VM can attempt to reach victim-net.
1.) If a.b.c.Q is on a different vnetid, won't its overlay AND varpd state be distinct from an actual x.y.z.0/24 attachment, and therefore there would be no state (and portolan would reject based on vnetid)?
2.) If a.b.c.Q is the same-customer-but-rogue, could the source MAC be included in the lookup? (Or is that theoretically forgeable too using vnics?)
What's your threat model here?
from rfd.
Regarding question 2: Zones cannot change their MAC address. I'm not sure about bhyve/KVM instances yet, however.
from rfd.
I don't think you'd be able to get a reply, but the packet would still be received if overlay already has the destination information -- we don't do a varpd lookup on every packet (nor do I think we would want to).
The other thing I had mentioned was wondering whether the source MAC could be used to determine the source fabric, and how reliable that is. AFAIK, we don't currently allow MAC spoofing anywhere. @papertigers mentioned that there was discussion about possibly allowing it for running VRRP or similar. However, I'm not sure how permissive that'd be -- would it still be restricted to just the MAC(s) that could move around, or would it turn off all checking and allow an arbitrary MAC address to be set?
What I was thinking about was if you have something like:
typedef struct overlay_fabric_t {
	struct in6_addr ofb_ip;
	uint64_t ofb_vid;
	uint32_t ofb_dcid;
	uint16_t ofb_vlan;
	uint8_t ofb_prefixlen;
} overlay_fabric_t;

typedef struct overlay_fabric_entry_t {
	/* bookkeeping stuff for attachment, pointers to overlay_dev_t, etc */
	overlay_fabric_t ofe_fabric;
} overlay_fabric_entry_t;
There'll need to be a way for varpd to send these to overlay (likely new ioctls that add/remove as well as the attachment information for them).
Then each overlay_target_entry_t can add a field for its VL3 IP as well as a pointer to the correct overlay_fabric_entry_t for that target. Then each overlay_target_entry_t is hashed based on VL2 MAC and <fabric, VL3 IP>. If we had an overlay_target_entry_t for the source VL2 MAC (if a customer has multiple instances on the same CN, this is likely to happen anyway), we could do a lookup on the src VL2 MAC of the packet to determine the source fabric.
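Here is a minimal C sketch of that source-MAC lookup, extended with an attachment check to show how it could gate routing. It is an illustration under assumptions: the fabric struct is trimmed to three fields (no ofb_ip or ofb_prefixlen), entries live in a flat array rather than a hash, and the attachment list on each fabric entry is my guess at what the "attachment information" might look like. All `sketch_*` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ATTACHED 4   /* hypothetical limit */

/* Trimmed stand-in for overlay_fabric_t. */
typedef struct sketch_fabric {
    uint64_t sf_vid;
    uint32_t sf_dcid;
    uint16_t sf_vlan;
} sketch_fabric_t;

/* Fabric plus the fabrics it is allowed to route to (an assumption). */
typedef struct sketch_fabric_entry {
    sketch_fabric_t sfe_fabric;
    const struct sketch_fabric_entry *sfe_attached[SKETCH_MAX_ATTACHED];
    size_t sfe_nattached;
} sketch_fabric_entry_t;

/* Stand-in for overlay_target_entry_t with its fabric pointer. */
typedef struct sketch_target {
    uint8_t st_mac[6];
    const sketch_fabric_entry_t *st_fabric;
} sketch_target_t;

/* Determine the source fabric from the packet's source VL2 MAC. */
const sketch_fabric_entry_t *
sketch_src_fabric(const sketch_target_t *tbl, size_t n, const uint8_t smac[6])
{
    assert(tbl != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (memcmp(tbl[i].st_mac, smac, 6) == 0)
            return (tbl[i].st_fabric);
    }
    return (NULL);
}

/* Only route when the source fabric is attached to the destination. */
bool
sketch_may_route(const sketch_target_t *tbl, size_t n, const uint8_t smac[6],
    const sketch_fabric_entry_t *dst)
{
    const sketch_fabric_entry_t *src = sketch_src_fabric(tbl, n, smac);

    if (src == NULL || dst == NULL)
        return (false);
    for (size_t i = 0; i < src->sfe_nattached; i++) {
        if (src->sfe_attached[i] == dst)
            return (true);
    }
    return (false);
}
```

This only helps if the source MAC cannot be forged by the instance, which is exactly the open question about bhyve/KVM and any future MAC-spoofing allowance.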
from rfd.
I don't think you'd be able to get a reply, but the packet would still be received if overlay already has the destination information -- we don't do a varpd lookup on every packet (nor do I think we would want to).
Exploiting an existing destination only works for rogue-VM-same-user, as I understand it. Different user means different vnetid, and different state in overlay.
As for the new structures you propose, are those for varpd or for overlay?
from rfd.
Today it can't happen cross customer, but if we ever did support cross-customer routing, then it becomes a similar concern. Even for one customer though, I think it could still be a problem. If $BIG_CUSTOMER has multiple groups under the same account and saw traffic arrive on instances from things that shouldn't ever be able to reach it -- that's bound to raise some eyebrows.
The structures would be for overlay, though overlay_fabric_t might get shared with varpd for exchanging information about fabrics.
from rfd.
Would source-MAC checking/enforcement solve the $BIG_CUSTOMER problem? (And can a KVM or bhyve instance change their NIC's MAC w/o droppage?)
from rfd.
When a user informs NAPI that they want to attach networks to their fabric, the RFD currently outlines the following:
"attached_networks": [
  "f4104070-df1e-4c4a-891c-58951abd72e8",
  "103b4f01-b8bc-42a5-886a-0a680da22d20",
  "b1963383-6b1a-4025-b73d-a7fb43ff7624"
],
It seems that we will need more than just a uuid here. Perhaps a dcid + uuid would be better suited. If a user happens to get unlucky and napi uses the same uuid for different fabrics in different DCs, then we need a way to clue napi in on which network it is they are actually trying to add. It also probably makes sense to tell napi the DC and network uuid anyways so that it can create a local network that maps to the remote network more easily.
from rfd.
Yes, and I don't know :)
Fundamentally, the problem is we get an mblk_t and have to try to determine which fabric it originates from so we can determine the correct destination. Because this could have security implications (in that we don't want packets to be sent to things that shouldn't be able to receive them -- even if the receiver cannot reply), it seems like we'd like something that could not be subverted by root in the instance.
from rfd.
@papertigers you missed the level-of-indirection provided in the nascent remote-network-object described in 119's text. It maps <dcid, that-dc's-uuid> to a my-dc-UUID.
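For illustration, that indirection could look something like the following C sketch: a table that resolves <dcid, remote UUID> to a local UUID, so the same remote UUID in two DCs never collides. The table layout, the numeric dcid values, and all `sketch_*` names are assumptions made for the example, not the remote-network object's actual schema.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_UUID_LEN 37  /* 36 characters plus the terminating NUL */

/* One remote-network object: <dcid, remote UUID> -> local UUID. */
typedef struct sketch_remote_net {
    uint32_t srn_dcid;
    char srn_remote_uuid[SKETCH_UUID_LEN];
    char srn_local_uuid[SKETCH_UUID_LEN];
} sketch_remote_net_t;

/* Resolve a remote network to its local UUID, or NULL if unknown. */
const char *
sketch_local_uuid(const sketch_remote_net_t *tbl, size_t n, uint32_t dcid,
    const char *remote_uuid)
{
    assert(remote_uuid != NULL);
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].srn_dcid == dcid &&
            strcmp(tbl[i].srn_remote_uuid, remote_uuid) == 0)
            return (tbl[i].srn_local_uuid);
    }
    return (NULL);
}
```

With this shape, "attached_networks" can keep holding plain local UUIDs, because any DC ambiguity is resolved once, when the remote-network object is created.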
from rfd.