dash's People

Contributors

andriy-kokhan, aputriax, ashutosh-agrawal, byreal, chrispsommers, desaimg1, dgalan-xxia, jafingerhut, kcudnik, krisney-msft, lguohan, marian-pritsak, mariobaldi, mgheorghe, mhanif, microsoftopensource, mmiele, mukeshmv, murthyvijay, oleksandrivantsiv, prsunny, pterosaur, r12f, reshmaintel, sanjayth, taras-keryk-plv, vijasrin, vincent-xs, vmytnykx, yusefms06

dash's Issues

GUIDs used in DASH configuration example

https://github.com/Azure/DASH/blob/main/documentation/gnmi/design/dash-reference-config-example.md

I see in this document many entries like:
"253de6f9-37bd-40ce-9cb2-9715915941d3": {
"qos-id": "253de6f9-37bd-40ce-9cb2-9715915941d3",

In my opinion, the sample config in the documentation should use descriptive, human-readable IDs to make it easier to read and understand.

Also, any further user-facing APIs should preserve the names from the config: if I call a rule "dash-qos-rule-77", then a config get should return it as "dash-qos-rule-77", and showing detailed info on qos --rule-id "dash-qos-rule-77" should work.

Internally, the application may choose to use such GUIDs, but user-facing configuration should have human-readable IDs in the form of names: what you set is what you get.
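For illustration, a purely hypothetical rewrite of the fragment above using a descriptive name (the key "dash-qos-rule-77" is invented for this example):

"dash-qos-rule-77": {
"qos-id": "dash-qos-rule-77",

With such names, the config is self-documenting and round-trips cleanly through get/show commands.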

IPv6 IP Options extension headers

In bmv2 we currently do not support (cannot parse/process) extension headers. IPv6 allows multiple chained headers; in v4 and v6 we handle only the vanilla headers. This issue tracks supporting extension headers in bmv2.
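For context, a minimal Python sketch of what walking the extension-header chain involves (illustrative only; the simplified length handling below holds for hop-by-hop, routing and destination options, but real parsers must special-case AH and other types):

# IPv6 extension header protocol numbers: hop-by-hop (0), routing (43),
# fragment (44), destination options (60).
EXT_HEADERS = {0, 43, 44, 60}

def upper_layer_proto(next_header: int, payload: bytes) -> int:
    # Walk the chain until a non-extension header (e.g. 6 = TCP, 17 = UDP).
    off = 0
    while next_header in EXT_HEADERS:
        nh, hdr_ext_len = payload[off], payload[off + 1]
        # Fragment header is fixed at 8 bytes; others are (hdr_ext_len + 1) * 8.
        off += 8 if next_header == 44 else (hdr_ext_len + 1) * 8
        next_header = nh
    return next_header

A P4 parser has to express the same walk as explicit parser states, which is why extension-header support is non-trivial in bmv2.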

Mention of Azure in Glossary

I noticed that the Overlay definition (in the glossary) references Azure customers when discussing the APIs. DASH is not owned by Microsoft or Azure and will remain a public and open source of information. We should remove all references to Azure and use other words in their place, like customer SDN, SDN orchestrator, and so on.

DASH will be used in the enterprise and by other clouds, which will orchestrate the services that fit their own individual needs. Over time, we all hope that DASH will provide high-level guidance to the lower-level implementations of smart devices, getting the most out of them without overly complicating scenarios with infinite flexibility that leads to sub-optimal performance.

UDP Support; waiting on test infrastructure

Related information

Scope - P4 DASH pipeline. Requires definition of the UDP header in the overlay pipeline headers, proper parsing, and matching on L4 ports alongside TCP ports.

Notes

  • Needs to be coded in P4 behavioral model
  • No change in bmv2, just P4

DASH and SONiC parameter naming sync

In my opinion, there should be no difference between SONiC and SONiC-DASH for attributes and values that serve the same purpose/functionality. (DASH may have extra fields, which is fine, but common fields should be named the same.)

If SONiC calls it "packet action", why does DASH call it "action"?
If SONiC uses FORWARD, DROP, MIRROR, and so on, can SONiC-DASH use the same values instead of allow/deny?

The goal should be to be able to copy-paste a config and have it work (provided it uses parameters supported by both projects).

SONiC defines the ACL schema as:
https://github.com/Azure/SONiC/blob/master/doc/acl/ACL-High-Level-Design.md
"ACL_RULE_TABLE:0d41db739a2cc107:3f8a10ff": {
"priority" : "55",
"IP_PROTOCOL" : "TCP",
"SRC_IP" : "20.0.0.0/25",
"DST_IP" : "20.0.0.0/23",
"L4_SRC_PORT_RANGE: "1024-65535",
"L4_DST_PORT_RANGE: "80-89",
"PACKET_ACTION" : "FORWARD"
},

SONiC-DASH defines the ACL schema as:
https://github.com/Azure/DASH/blob/main/documentation/general/design/dash-sonic-hld.md
DASH_ACL_RULE:{{group_id}}|{{rule_num}}
key = DASH_ACL_RULE:group_id:rule_num ; unique rule num within the group.
; field = value
priority = INT32 value ; priority of the rule, lower the value, higher the priority
action = allow/deny
terminating = true/false ; if true, stop processing further rules
protocols = list of INT ',' separated; E.g. 6-tcp, 17-udp
src_addr = list of source ip prefixes ',' separated
dst_addr = list of destination ip prefixes ',' separated
src_port = list of range of source ports ',' separated
dst_port = list of range of destination ports ',' separated
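For illustration only, here is a hypothetical DASH rule spelled with the SONiC names and values this issue argues for (the field names are invented to make the point, not an agreed schema):

"DASH_ACL_RULE:group1:rule1": {
"PRIORITY" : "55",
"IP_PROTOCOL" : "6",
"SRC_IP" : "20.0.0.0/25",
"PACKET_ACTION" : "FORWARD"
}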

NVGRE or VxLAN or both

The hero test talks about VXLAN as the tunneling protocol:
https://github.com/Azure/DASH/blob/main/documentation/general/requirements/program-scale-testing-requirements-draft.md

Some design documents show NVGRE as the tunneling protocol:
https://github.com/Azure/DASH/blob/main/documentation/general/design/images/service_tunneling.png

What will be supported by DASH: NVGRE, VXLAN, or both?
Are any other tunneling protocols to be supported?

And to what extent: will VXLAN and NVGRE be fully supported per their specs?
https://datatracker.ietf.org/doc/html/rfc7348
https://datatracker.ietf.org/doc/html/rfc7637
....

Update HA HLD doc per latest design

Per HA WG meeting, this spec needs updating:
https://github.com/Azure/DASH/blob/main/documentation/high-avail/design/high-availability-and-scale.md

For example, instead of using longer AS paths (prepending) to establish preferred routes, the intent is to use load balancing (e.g. hash-based) from the ToR to the DPUs to balance active-active traffic. The default mode of operation is active-active.

The entire doc should be reviewed and refreshed to replace preliminary or obsolete concepts with the latest thinking from the SDN architects. In addition, some more details were explained by @mzms and we should attempt to capture these in writing. Michal proposed we create a "v2" version of the spec.

DASH SAI objects getting appended multiple times to header file

DASH SAI generator: every time "make sai" is run, the generator appends duplicate objects to the _sai_object_key_entry_t union.

:~/DASH/dash-pipeline$ grep "sai_direction_lookup_entry_t" SAI/SAI/inc/saiobject.h
    sai_direction_lookup_entry_t direction_lookup_entry;
    sai_direction_lookup_entry_t direction_lookup_entry;
    sai_direction_lookup_entry_t direction_lookup_entry;

Subsequent build of vnet_out fails:

In file included from /SAI/SAI/inc/sai.h:48,
                 from vnet_out.cpp:5:
/SAI/SAI/inc/saiobject.h:122:34: error: redeclaration of 'sai_direction_lookup_entry_t _sai_object_key_entry_t::direction_lookup_entry'
  122 |     sai_direction_lookup_entry_t direction_lookup_entry;
      |                                  ^~~~~~~~~~~~~~~~~~~~~~
/SAI/SAI/inc/saiobject.h:95:34: note: previous declaration 'sai_direction_lookup_entry_t _sai_object_key_entry_t::direction_lookup_entry'
   95 |     sai_direction_lookup_entry_t direction_lookup_entry;
      |                                  ^~~~~~~~~~~~~~~~~~~~~~
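A minimal sketch of the kind of guard that could make the generator idempotent (assuming sai_api_gen.py builds saiobject.h by appending declaration lines; the helper below is illustrative, not the generator's actual code):

def add_union_members(header_lines, members):
    # Idempotently append union member declarations.
    # members: iterable of (type_name, field_name) tuples, e.g.
    # ("sai_direction_lookup_entry_t", "direction_lookup_entry").
    # Re-running the generator then cannot duplicate members.
    existing = {line.strip() for line in header_lines}
    for type_name, field_name in members:
        decl = f"{type_name} {field_name};"
        if decl not in existing:
            header_lines.append("    " + decl)
    return header_lines

Alternatively, regenerating the header from scratch on every run (rather than editing it in place) avoids the problem entirely.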

Fix routing action in vnet-vnet document

In this section of the doc
https://github.com/mmiele/DASH/blob/cb3d87ddf6247093c67f0ca6970c000d4db3857c/documentation/vnet2vnet-service/design/vnet-to-vnet-service.md#routing-a-packet-to-address-10101

1. Perform LPM lookup.
2. Select routing table DASH_ROUTE_TABLE:10.1.0.0/24. The action type is vnet, the value is Vnet1, and overlay_ip=10.0.0.6.
3. Look up DASH_ROUTING_TYPE:vnet. The value for vnet is maprouting.

Points 2 and 3 above should have action type / routing type as vnet_direct instead of vnet.

Support for list-match type (P4) for ACLs, pending BMV2 fork

Related information

Scope - BMv2. DASH ACL defines two new match types, list and range_list. They need to be handled by the simulator so that it can match a packet field against a list of values (or ranges) and generate a hit if any of them matches.
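The intended semantics, sketched in Python for clarity (illustrative only; the real work is implementing this in the bmv2 match-unit code):

def list_match(field, values):
    # Hit if the field equals any value in the list.
    return field in values

def range_list_match(field, ranges):
    # Hit if the field falls inside any (lo, hi) range, inclusive.
    return any(lo <= field <= hi for lo, hi in ranges)

# e.g. range_list_match(8080, [(80, 89), (1024, 65535)]) -> True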

Dash build bmv2 container permission error

The bmv2 container image used in the DASH build requires write access to volumes mounted from the host machine. However, this Docker image comes with a hard-coded uid/gid assigned in the Dockerfile when the image is built, and the build workflow uses a pre-built bmv2 image pulled from the registry. When a container runs from this image, the build commands inside it run under that uid and try to write to the mounted volumes on the host, resulting in permission errors.

# make sai
Generate SAI library headers and implementation...
docker run -v /home/mukesh/dash/dash-pipeline/bmv2:/bmv2 -v /home/mukesh/dash/dash-pipeline/SAI:/SAI -v /home/mukesh/dash/dash-pipeline/tests:/tests --network=host --rm -it \
        --name build_sai-mukesh \
        -w /SAI chrissommers/dash-bmv2:pr127-220623 \
    ./generate_dash_api.sh
Directory ./lib will be deleted...
Traceback (most recent call last):
  File "./sai_api_gen.py", line 357, in <module>
    shutil.rmtree('./lib')
  File "/usr/lib/python3.8/shutil.py", line 718, in rmtree
    _rmtree_safe_fd(fd, path, onerror)
  File "/usr/lib/python3.8/shutil.py", line 675, in _rmtree_safe_fd
    onerror(os.unlink, fullname, sys.exc_info())
  File "/usr/lib/python3.8/shutil.py", line 673, in _rmtree_safe_fd
    os.unlink(entry.name, dir_fd=topfd)
PermissionError: [Errno 13] Permission denied: 'utils.cpp'
Makefile:85: recipe for target 'sai' failed
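One common workaround (untested here, and it assumes the image's build scripts tolerate running as an arbitrary uid) is to pass the host user's uid/gid when starting the container, e.g.:

docker run --user $(id -u):$(id -g) ... chrissommers/dash-bmv2:pr127-220623 ./generate_dash_api.sh

so that files written to the mounted volumes are owned by the invoking host user.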

Application table in P4 pipeline code

Not sure if this is the best place to ask these questions, but here goes.
(This is primarily addressed to Marian)

In the sirius_pipeline.p4 code, "table appliance" is defined. The table's key is an appliance_id. Do we expect that there are multiple appliance entries, and if so, where does the index come from? Or is this table really just a single entry with some global attributes?

One of the attributes is "neighbor_mac", which is used for the outer dmac when encapping the vxlan header stack onto the packet. I wanted to double-check that there is indeed one such mac address (and that putting the attribute here was not just a placeholder).

Thanks,
Bud

What is desired behavior for classifying non-first IPv4/IPv6 fragments?

When IPv4/IPv6 packets are fragmented, the first fragment contains the TCP/UDP L4 ports, and usually also the TCP flags fields [1]. Thus first IP fragments contain within themselves all the information required for performing ACL-like classification.

However, non-first IP fragments never contain the L4 ports. Many "stateless" ACL-like classifiers, i.e. those that don't do IP reassembly in the data plane, typically define explicit, separate, user-configured rules about what should be done with non-first IP fragments. Without a more stateful mechanism in the data plane, that is the best I think one can do.
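For concreteness, a minimal Python sketch (IPv4 only, illustrative) of how a stateless classifier distinguishes the cases:

import struct

def classify_ipv4_fragment(ip_header: bytes) -> str:
    # Bytes 6-7 of the IPv4 header hold the flags and fragment offset.
    flags_frag = struct.unpack("!H", ip_header[6:8])[0]
    more_fragments = bool(flags_frag & 0x2000)   # MF flag
    fragment_offset = flags_frag & 0x1FFF        # in 8-byte units
    if fragment_offset > 0:
        return "non-first-fragment"   # no L4 ports available
    if more_fragments:
        return "first-fragment"       # L4 ports usually present
    return "unfragmented"

Only the "non-first-fragment" case forces the user-configured fragment rules, since the L4 fields simply are not there to match on.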

If a flow cache is created after ACL-like classification, is the expectation that non-first fragments must somehow be associated with the first fragment, match the ACL rules that have L4 ports in them, and be treated the same way as the first fragment was? If yes, that is trickier to implement in low-cost-per-Gbps data planes: it will be impossible in some data planes and possible in others.

If it is acceptable for non-first IP fragments to be handled in a more expensive general purpose CPU core in software, that opens up much more precise fragment handling behavior possibilities.

If these questions do not make the issue clear, please ask in a follow up comment and I can give more details.

[1] Attackers can send only the first 8 bytes of a TCP header in a first fragment if they are trying to bypass matching of ACL-like classification on TCP flags, and most Cisco routers (for example) discard such first IP fragments if they contain only a partial TCP header. For more discussion on that corner case, see https://datatracker.ietf.org/doc/html/rfc1858

[info request] DASH interop with SONiC

In DASH we have static VXLAN configuration, and I assume the DASH appliance connects to SONiC switches.
Is VXLAN terminated on the SONiC switch?

What is the typical network topology with the VMs, servers, SONiC switches, DASH appliances, core router, and external connection to the outside internet?

What is the typical configuration done on each device (protocols, etc.)?

Improve drop handling

  • Need a consistent plan for packet drops.
    When we decide to drop a packet, do we just mark it for drop, or stop processing immediately with a "return"?
    We also need to consider when counters are incremented.

  • The vip table's deny action is logically @defaultonly,
    but making that change would currently break SAI generation due to the generation tool.
    If the SAI generation tool is enhanced, we could make this change.

  • The lookup in table eni_ether_address_map should drop the packet on a lookup miss.
    In general, the code needs more negative checking.

Extract default actions etc. from P4 model and add comments to generated SAI header files

Originally commented in #32 (review):

[SAI/sai_api_gen.py]
        for key in table[KEY_TAG]:
            sai_table_data['keys'].append(get_sai_key_data(program, key))

        for action in table['actions']:

Can you extract the P4 default action and render a suitable comment in the generated header file? For example, ACL default actions are deny, and this should appear in the generated .h. Likewise, if there are other attributes or subtleties buried in or implied by the P4 code, can we try to make them explicit in the generated .h header? If you cannot add this to the current PR, perhaps create an issue for it so we can track it. Thanks.
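A rough sketch of what this could look like in the generator (the JSON field names below are guesses at the P4Info layout and may differ across p4c versions; purely illustrative):

def default_action_comment(program, table):
    # Look up the table's const default action in the P4Info-derived JSON
    # and render it as a comment for the generated SAI header.
    # NOTE: 'constDefaultActionId' is an assumed key, not a confirmed schema.
    action_id = table.get('constDefaultActionId')
    if action_id is None:
        return ''
    for action in program['actions']:
        if action['preamble']['id'] == action_id:
            name = action['preamble']['name']
            return f"/* P4 default action: {name} (e.g. deny for ACL tables) */"
    return ''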

XN Tracking: xn in Simulator

Related information

Owner - NVidia. Scope - BMv2. Requires support for PNA table properties and externs such as idle_timeout_with_auto_delete, add_on_miss, default_action, add_entry, and set_entry_expire_time. This will allow automatic learning and expiry of entries.

IPv6 routing support

Related information

Scope - P4 DASH pipeline. Requires definition of the IPv6 header in the overlay pipeline headers, proper parsing, and matching on L3 addresses alongside IPv4 addresses.

@marian-pritsak to provide further information, then hand off to Hanif.

Notes

  • Needs to be coded in P4 behavioral model
  • No change in bmv2, just P4

Detailed requirements questions on some desired parameter ranges for a DASH device

If these are already documented in the DASH repo, my apologies for the noise in asking, and I would appreciate greatly a pointer to where they are already documented.

What is the maximum configurable timeout interval from seeing the first FIN in a TCP connection until the connection state is deleted?

Is there a maximum rate at which a DASH device must handle received IP fragments? Answers like "must be able to do so at 100% line rate of all received traffic" or "a maximum of 50,000 IP fragments per second; any more than that may be discarded without being processed" are two extreme possibilities, both of which help product designers decide how to handle fragments.

Simulator Development

Related information

Marian to provide guidance; the owner is NVidia.

Will every technology provider supply a Docker container, or will implementations be packaged differently?

To Be Documented: SDN Controller's knowledge of flows and its control of various counters

Based on the discussion in Community meeting today, please help confirm/clarify the following understanding:

  1. The SDN Controller does not need to know about dynamically learned flows. In other words, the SDN Controller won't poll the switch for flows or expect the switch to notify it when a new flow is learned or an existing flow is deleted.
  2. The SDN Controller won't poll the switch for per-flow statistics or expect the switch to stream per-flow statistics to it periodically or when a flow is deleted.
  3. The SDN Controller will explicitly create counters for specific entities. We discussed counters per ENI and per routing entry in today's meeting. I also see counters associated with CA-to-PA mappings and ACLs in the P4 model. Are there any other counters the SDN Controller will create?

Connection Tracking: Too loose, UDP and Aging

Does the model remove the flow entirely on the FIN flag, or does it wait for some time after the FIN so that the final ACKs can be forwarded? Is this too loose? The connection should be active only until an ACK packet in each direction covers the sequence number of the FIN packet in the other direction, or a timeout expires, whichever occurs first. I believe it would take ~7 states per direction to strictly track the TCP state. Should TCP connection tracking also support TCP window tracking?

What about support for UDP "connections" and the behavior of aging of both UDP and TCP flows?

What is the desired behavior when an ACL or route table configuration change potentially affects the forwarding of an established flow? The specifications describe a slow path and a fast path, where the ACL and route lookups are avoided in the fast path (for performance). The current behavioral model appears to execute the ACL and route lookups on each and every packet, so there is no behavioral difference between the slow path and the fast path. This implies that implementations with both a fast and a slow path must re-evaluate ACL/route lookups for flows whenever a configuration change occurs that may affect them. Is this correct? Should the behavioral model explicitly model a fast and a slow path?
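To make the stricter teardown concrete, a minimal Python sketch (illustrative only; it ignores sequence wraparound, RSTs and timeouts):

class TcpTeardownTracker:
    # A flow is considered closed only once each direction's FIN sequence
    # number has been ACKed by the opposite direction.
    def __init__(self):
        self.fin_seq = {0: None, 1: None}     # FIN seq seen per direction
        self.fin_acked = {0: False, 1: False}

    def on_packet(self, direction, seq, ack, fin_flag, ack_flag):
        if fin_flag:
            self.fin_seq[direction] = seq
        other = 1 - direction
        if ack_flag and self.fin_seq[other] is not None \
                and ack > self.fin_seq[other]:
            self.fin_acked[other] = True

    def closed(self):
        return self.fin_acked[0] and self.fin_acked[1]

Full state tracking (the ~7 states per direction mentioned above) would additionally distinguish FIN_WAIT, CLOSING, TIME_WAIT and so on.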

https://github.com/Azure/DASH/blob/6b2a638d6f620469fdff59716c65bb7286b5ef61/sirius-pipeline/sirius_conntrack.p4#L42

See PR PNA compatible connection tracking #21

IPv6 ACL support

  • Needs to be coded in P4 behavioral model
  • No change in bmv2, just P4

Notes

Hi Mohammad, the following tables will require IPv6 support:

  • sirius_outbound.routing
  • sirius_outbound.ca_to_pa
  • ACL stages (sirius_acl file)

All of those tables match on an IPv4 address. We need to support matching on either an IPv4 or an IPv6 address, depending on the packet's overlay ethertype.

Per Mario Baldi (Collaborator): We will also need IPv6 support in connection tracking, hence generally throughout sirius_conntrack.p4. My suggestion would be to first finalize connection tracking with the proper connection-removal behavior just for IPv4 (building on the current version of the code) and then later add IPv6 support.

Detail questions on dash-handling-fragmented-packets.md

These statements are made in the latest version of that file as of 2022-Feb-23:

  • If a subsequent packet arrives that is the start of a fragmented packet, the Frag ID must be used to create a new temporal flow that can be uniquely identified by the (Frag ID, DST, SRC) tuple.

  • If the connection is closed with the arrival of the FIN packet then all temporal flows must be closed as well.

The last statement seems to assume that temporal flows can be associated with a 5-tuple. That is true if the first IP fragment arriving for an original unfragmented packet contains the L4 header.

Is there a preferred behavior if the first IP fragment for a particular (Frag ID, DST, SRC) tuple is a non-first fragment, and contains no L4 header information, and thus cannot be associated with a particular 5-tuple (at least not yet)?

Translation: P4 Runtime layer SAI -> Simulator

Related information

NVidia - this is the 'glue' layer. The simulator exposes the P4Runtime API for configuration. The goal is to align it with the SAI API by providing a translation from SAI to P4Runtime.
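As a sketch of what the glue layer does per call (pure illustration; the IDs come from P4Info, and the real mapping would be generated from the P4 model rather than hand-written):

from p4.v1 import p4runtime_pb2

def sai_create_to_p4rt(table_id, match_fields, action_id, params):
    # Translate one SAI create call into a P4Runtime TableEntry.
    entry = p4runtime_pb2.TableEntry()
    entry.table_id = table_id                # table id from P4Info
    for field_id, value in match_fields:     # value: encoded bytes
        m = entry.match.add()
        m.field_id = field_id
        m.exact.value = value
    entry.action.action.action_id = action_id
    for param_id, value in params:
        p = entry.action.action.params.add()
        p.param_id = param_id
        p.value = value
    return entry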

Add additional HA requirements from https://github.com/Azure/DASH/pull/56/ into high-availability-and-scale.md

#56 contains the following content in the proposal document. If this is indeed the case, these requirements should go into https://github.com/Azure/DASH/blob/main/documentation/high-avail/design/high-availability-and-scale.md

Microsoft has provided some additional requirements:

  1. HA Interoperability is required between vendors
    • Pairing cards from different vendors is not the typical deployment, but must work
  2. The HA packet format and protocol must be public
    • This allows sniffed/mirrored HA messages to be analyzed
    • No vendor-private protocol is allowed
  3. The HA protocol for syncing active flows could have a base mode and optional modes
    • Additional modes could be defined, for example to reduce the PPS/bps needed for the active sync messages
    • A vendor only needs to support the base mode
    • Any optional modes must also be public
  4. The HA protocol does not need to reliably sync 100% of the flows between cards
    • Ideally all flows are synced, but it is OK if a small number of flows (hundreds out of tens of millions) are missed.

Questions on Program Scale Testing Requirements

The "Program Scale Testing Requirements for LAB Validation" document stated that
"b. Download new policies and delete old policies at a significant rate to ensure that CPS, Active Connections, Aging, and new
Policies are properly handled with the external memory, which is often the bottleneck for performance."

Please provide the following additional info:

  1. What will be the "significant rate" that we should expect for the policies being added and deleted?
  2. What are the expected behaviors of the new policies on the existing active connections? For example, if the new policy results in a deny action for the existing connections, should the impacted connections be removed? If yes, is there an expectation on the rate of connection removal?

Specify Inter-DPU HA flow sync communications requirements/restrictions

@mzms @lguohan Please state requirements/expectations for Inter-DPU HA flow sync communications:

  • All communication between cards travels over the datacenter network, i.e. DPU-ToR-DPU.
  • Are there any control-plane interactions, or is it DPU-DPU only?
  • What protocols are allowed (TCP/UDP, unicast/multicast)?
  • What IP address endpoints should be used (a new one; the same as the gNMI management address but a different port; etc.)?
  • What SLA can be guaranteed for the DPU-DPU path (e.g. will a burst of sync updates experience packet drops/throttling)?
  • Can vendors implement their own protocols, or must there be an interoperability standard? Is there room for both?
  • Please articulate continuous updating vs. live migration.

ixia-c tests can fail if protobuf already installed with incompatible version

Use of snappi package 0.7.37, as specified in https://github.com/Azure/DASH/blob/main/test/requirements.txt, can cause snappi client errors if an incompatible version of the protobuf Python package is already installed.

Per ixia-c support Slack channel (https://ixia-c.slack.com/archives/C021DU5026R/p1657894469742489?thread_ts=1657847300.692199&cid=C021DU5026R), this can be fixed by upgrading to snappi 0.7.38. I will file a PR to do so.
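Until that lands, a likely local workaround (assuming, per the Slack thread, that 0.7.38 pulls in a compatible protobuf) is simply:

python3 -m pip install snappi==0.7.38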

An example error output is shown below:

dash@chris-z4:~/chris-dash/DASH/dash-pipeline$ make run-all-tests 
# Ensure P4Runtime server is listening
t=5; \
while [ ${t} -ge 1 ]; do \
	if sudo lsof -i:9559 | grep LISTEN >/dev/null; then \
		break; \
	else \
		sleep 1; \
		t=`expr $t - 1`; \
	fi; \
done; \
docker exec -w /tests/vnet_out simple_switch-dash ./vnet_out
GRPC call SetForwardingPipelineConfig 0.0.0.0:9559 => /etc/dash/dash_pipeline.json, /etc/dash/dash_pipeline_p4rt.txt
GRPC call Write::add_one_entry OK: GRPC call Write::add_one_entry OK: GRPC call Write::add_one_entry OK: Done.
python3 -m pip install -r ../test/requirements.txt
Requirement already satisfied: snappi==0.7.37 in /home/dash/.local/lib/python3.8/site-packages (from -r ../test/requirements.txt (line 1)) (0.7.37)
Requirement already satisfied: pytest==6.0.1 in /home/dash/.local/lib/python3.8/site-packages (from -r ../test/requirements.txt (line 2)) (6.0.1)
Requirement already satisfied: urllib3 in /usr/lib/python3/dist-packages (from snappi==0.7.37->-r ../test/requirements.txt (line 1)) (1.25.8)
Requirement already satisfied: grpcio-tools==1.44.0; python_version > "2.7" in /home/dash/.local/lib/python3.8/site-packages (from snappi==0.7.37->-r ../test/requirements.txt (line 1)) (1.44.0)
Requirement already satisfied: PyYAML in /usr/lib/python3/dist-packages (from snappi==0.7.37->-r ../test/requirements.txt (line 1)) (5.3.1)
Requirement already satisfied: grpcio==1.44.0; python_version > "2.7" in /home/dash/.local/lib/python3.8/site-packages (from snappi==0.7.37->-r ../test/requirements.txt (line 1)) (1.44.0)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from snappi==0.7.37->-r ../test/requirements.txt (line 1)) (2.22.0)
Requirement already satisfied: pluggy<1.0,>=0.12 in /home/dash/.local/lib/python3.8/site-packages (from pytest==6.0.1->-r ../test/requirements.txt (line 2)) (0.13.1)
Requirement already satisfied: more-itertools>=4.0.0 in /usr/lib/python3/dist-packages (from pytest==6.0.1->-r ../test/requirements.txt (line 2)) (4.2.0)
Requirement already satisfied: toml in /home/dash/.local/lib/python3.8/site-packages (from pytest==6.0.1->-r ../test/requirements.txt (line 2)) (0.10.2)
Requirement already satisfied: attrs>=17.4.0 in /usr/lib/python3/dist-packages (from pytest==6.0.1->-r ../test/requirements.txt (line 2)) (19.3.0)
Requirement already satisfied: py>=1.8.2 in /home/dash/.local/lib/python3.8/site-packages (from pytest==6.0.1->-r ../test/requirements.txt (line 2)) (1.11.0)
Requirement already satisfied: packaging in /home/dash/.local/lib/python3.8/site-packages (from pytest==6.0.1->-r ../test/requirements.txt (line 2)) (21.3)
Requirement already satisfied: iniconfig in /home/dash/.local/lib/python3.8/site-packages (from pytest==6.0.1->-r ../test/requirements.txt (line 2)) (1.1.1)
Requirement already satisfied: setuptools in /usr/lib/python3/dist-packages (from grpcio-tools==1.44.0; python_version > "2.7"->snappi==0.7.37->-r ../test/requirements.txt (line 1)) (45.2.0)
Requirement already satisfied: protobuf<4.0dev,>=3.5.0.post1 in /usr/local/lib/python3.8/dist-packages (from grpcio-tools==1.44.0; python_version > "2.7"->snappi==0.7.37->-r ../test/requirements.txt (line 1)) (3.6.1)
Requirement already satisfied: six>=1.5.2 in /usr/lib/python3/dist-packages (from grpcio==1.44.0; python_version > "2.7"->snappi==0.7.37->-r ../test/requirements.txt (line 1)) (1.14.0)
Requirement already satisfied: pyparsing!=3.0.5,>=2.0.2 in /home/dash/.local/lib/python3.8/site-packages (from packaging->pytest==6.0.1->-r ../test/requirements.txt (line 2)) (3.0.9)
cd ../test/third-party/traffic_gen && ./deploy_ixiac.sh
.
deployment_traffic_engine_1_1 is up-to-date
deployment_controller_1 is up-to-date
deployment_traffic_engine_2_1 is up-to-date
# Ensure P4Runtime server is listening
t=5; \
while [ ${t} -ge 1 ]; do \
	if sudo lsof -i:9559 | grep LISTEN >/dev/null; then \
		break; \
	else \
		sleep 1; \
		t=`expr $t - 1`; \
	fi; \
done; \
docker exec -w /tests/init_switch simple_switch-dash ./init_switch
GRPC call SetForwardingPipelineConfig 0.0.0.0:9559 => /etc/dash/dash_pipeline.json, /etc/dash/dash_pipeline_p4rt.txt
Switch is initialized.
python3 -m pytest ../test/test-cases/bmv2_model/ -s
================================================================================================== test session starts ===================================================================================================
platform linux -- Python 3.8.10, pytest-6.0.1, py-1.11.0, pluggy-0.13.1
rootdir: /home/dash/chris-dash/DASH
collected 0 items / 1 error                                                                                                                                                                                              

========================================================================================================= ERRORS =========================================================================================================
____________________________________________________________________________ ERROR collecting test/test-cases/bmv2_model/test_hello_world.py _____________________________________________________________________________
../../../.local/lib/python3.8/site-packages/snappi/otg_pb2_grpc.py:7: in <module>
    import otg_pb2 as otg__pb2
E   ModuleNotFoundError: No module named 'otg_pb2'

During handling of the above exception, another exception occurred:
../test/test-cases/bmv2_model/test_hello_world.py:1: in <module>
    import snappi
../../../.local/lib/python3.8/site-packages/snappi/__init__.py:1: in <module>
    from .snappi import Config
../../../.local/lib/python3.8/site-packages/snappi/snappi.py:17: in <module>
    from snappi import otg_pb2_grpc as pb2_grpc
../../../.local/lib/python3.8/site-packages/snappi/otg_pb2_grpc.py:9: in <module>
    from snappi import otg_pb2 as otg__pb2
../../../.local/lib/python3.8/site-packages/snappi/otg_pb2.py:23: in <module>
    _CONFIG = DESCRIPTOR.message_types_by_name['Config']
E   AttributeError: 'NoneType' object has no attribute 'message_types_by_name'
==================================================================================================== warnings summary ====================================================================================================
/usr/local/lib/python3.8/dist-packages/google/protobuf/internal/containers.py:182
  /usr/local/lib/python3.8/dist-packages/google/protobuf/internal/containers.py:182: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
    MutableMapping = collections.MutableMapping

/usr/local/lib/python3.8/dist-packages/google/protobuf/internal/containers.py:340
  /usr/local/lib/python3.8/dist-packages/google/protobuf/internal/containers.py:340: DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated since Python 3.3, and in 3.10 it will stop working
    collections.MutableSequence.register(BaseContainer)

-- Docs: https://docs.pytest.org/en/stable/warnings.html
================================================================================================ short test summary info =================================================================================================
ERROR ../test/test-cases/bmv2_model/test_hello_world.py - AttributeError: 'NoneType' object has no attribute 'message_types_by_name'
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! Interrupted: 1 error during collection !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
============================================================================================== 2 warnings, 1 error in 0.98s ==============================================================================================
make: *** [Makefile:355: run-ixiac-test] Error 2

sirius-pipeline : utils.cpp: In function ‘p4::config::v1::P4Info parse_p4info(const char*)’: google::protobuf::io issue

Hello DASH community and @marian-pritsak
I followed the steps mentioned in https://github.com/Azure/DASH/blob/main/sirius-pipeline/README.md

  1. make clean >>> working fine
  2. make bmv2/sirius_pipeline.bmv2/sirius_pipeline.json >>> working fine
  3. make sai >>> not working well
    make sai has two steps: one updates the SAI folder, and the second compiles the created lib folder; the issue is in compiling the lib folder.

Issue in detail:
A)
Updated the SAI folder with the steps mentioned in generate_dash_api.sh, which is in the SAI folder:
sudo ./SAI/sai_api_gen.py bmv2/sirius_pipeline.bmv2/sirius_pipeline_p4rt.json --ignore-tables=appliance,eni_meter,slb_decap --overwrite=true dash
B)
Once the SAI folder is updated and the corresponding lib folder created, the next step is:
cd lib
sudo make, which results in the following issue:
$ sudo make
g++ -c -I ../SAI/inc/ -I ../SAI/experimental/ -fPIC -g utils.cpp saidash.cpp saidashacl.cpp saidashvnet.cpp
utils.cpp: In function ‘p4::config::v1::P4Info parse_p4info(const char*)’:
utils.cpp:62:27: error: ‘IstreamInputStream’ is not a member of ‘google::protobuf::io’; did you mean ‘CodedInputStream’?
62 | google::protobuf::io::IstreamInputStream istream_(&istream);
| ^~~~~~~~~~~~~~~~~~
| CodedInputStream
utils.cpp:63:42: error: ‘istream_’ was not declared in this scope; did you mean ‘istream’?
63 | google::protobuf::TextFormat::Parse(&istream_, &p4info);
| ^~~~~~~~
| istream
make: *** [Makefile:2: libsai.so] Error 1
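For what it's worth, IstreamInputStream is declared in the protobuf header google/protobuf/io/zero_copy_stream_impl.h, so one likely cause is that utils.cpp (or the protobuf version installed) only pulls in coded_stream.h. Checking that utils.cpp contains

#include <google/protobuf/io/zero_copy_stream_impl.h>

may be a useful first step. This is a guess from the error text, not a confirmed fix.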

ASK
Need help on how to resolve this issue; it is blocking the next steps, like:
make test
make run-test

Add BGP failover details in HA Spec

https://github.com/chrispsommers/DASH/blob/main/documentation/high-avail/design/high-availability-and-scale.md has statements describing failover mechanisms such as:

In case of failure the BGP routes from "SDN Appliance 1" (previously active) will be withdrawn and TOR will prefer "SDN Appliance 2" and redirect traffic there, ensuring continuous traffic and uninterrupted customer experience.

It would be helpful to add details such as:

  • Expand on "BGP routes ... will be withdrawn." There are assumptions in that statement, e.g. is BFD (Bidirectional Forwarding Detection) assumed? What are the BFD timer values (100 msec)?
  • Unplanned failover downtime is stated as < 2 sec, which is a system limit. What are the potential contributors to this downtime, and how shall we budget the downtime amongst them?

Open sourcing SDN (Bluebird) Agent

When does Microsoft plan to open source the SDN (Bluebird) agent? As part of the SONiC DASH, we will need the SDN agent for solution testing.

Definition of non-standard P4 constructs in sirius P4 code, and plan for how to run them?

Examples:

  • The new match kinds list and range_list
  • New keywords state_context and state_graph

To my knowledge, these are not supported by the latest open source P4 compiler and behavioral-model / BMv2 software switch.

Is there a plan to release an open source implementation of a P4 compiler and BMv2 (or another software switch) that can compile these programs?

Or perhaps to release alternate versions of these P4 programs, mechanically translated into P4 that can be compiled and run on an open source P4-programmable software switch?

Or some other plan to enable others to run the reference code in some way?

Testing: Discuss test plan for TCP State machine

Black box (as described in the Test HLD, config + packets in/out), vs White box to observe the State (e.g. of the TCP state machine).

Gerald asked if we could now perform sirius-pipeline TCP state-machine testing, e.g. by sending in packets in a specific sequence and reading the “state” of the TCP state machine at each transition to verify operation. Gerald feels this is necessary to qualify an implementation (e.g. in the lab; it doesn’t have to be done at full speed nor run in production with this observability). Lots of discussion; some comments:
o Chris – yes, you can send specific sequences of packets using the test framework (ixia-c/snappi), but there are no APIs to read TCP state-machine states in the existing design. All current APIs derive from the P4 model itself, so using this approach, we’d have to “model” such APIs as P4 registers or pseudo-tables. Also note that there is no stateful bmv2 implementation at this time; the current one is vanilla bmv2 from the p4lang repo.
o Marian – there are no APIs for this. Also since it would not be a production feature, this is additional work for vendors. Can’t we instead use black-box testing? This means sending in sequences of packets and testing the expected output, without explicitly reading internal states. (Gerald’s proposal is essentially “white-box” testing where we can read the internal state of the system.)
o Consensus was to continue this discussion in the DASH behavioral model WG.

Invalid match type: 'list' error when trying to run the bmv2 switch for the sirius_pipeline project

I followed the same steps as mentioned in the README at https://github.com/Azure/DASH/tree/main/sirius-pipeline

  • Build the environment >>>> worked well
    make docker
  • Build pipeline >>>> worked well
    make clean
    make bmv2/sirius_pipeline.bmv2/sirius_pipeline.json
  • Run software switch >>>> didn't work well
    make run-switch

When I debugged, the main issue is the one below:
$ p4c -b bmv2 bmv2/sirius_pipeline.p4 -o bmv2/sirius_pipeline.bmv2 >>>> worked well
$ simple_switch --log-console --interface 0@veth0 --interface 1@veth2 /bmv2/sirius_pipeline.bmv2/sirius_pipeline.json >>>> didn't work well
Calling target program-options parser
Invalid match type: 'list'

  1. I am attaching the compiled program output sirius_pipeline.json as sirius_pipeline.txt, because .json attachments are not allowed in this issue tracker.
    sirius_pipeline.txt

Could someone suggest a possible solution or workaround?

Improve SDN Features, Packet Transforms and Scale document

This issue collects information and related PRs intended to improve the document SDN Features, Packet Transforms and Scale.
Tasks that may produce related documentation:

  • Routing guidelines. WIP, see #67
  • Packet Flow and Transforms. WIP, see #114
  • VM to VM communication scenario in VNET. Work in draft form not yet available for PR
  • Support features such as Telemetry, Metering, Counters, Billing, Watchdogs, BGP?
  • Servicing
  • Update SDN Features, Packet Transforms and Scale accordingly. WIP.
