xilinx / accl
Alveo Collective Communication Library: MPI-like communication operations for Xilinx Alveo accelerators
Home Page: https://accl.readthedocs.io/
License: Apache License 2.0
The ACCL XRT test suite fails for large counts in the two bcast tests with root 1 (plain and compression).
ACCL was synthesized with both the TCP stack and the UDP stack from dev at 36eebbb. All other tests passed.
The error appears in both the TCP and the UDP version.
Setup:
Execution:
TCP version: mpirun -n 2 bin/test -f -x the.xclbin -t -b 2 -s 512
UDP version: mpirun -n 2 bin/test -f -x the.xclbin -u -b 2 -s 512
Output for the two failing tests:
[1,1]<stdout>:Start bcast test with root 1 ...
[1,0]<stdout>:Start bcast test with root 1 ...
[1,0]<stdout>:257th item is incorrect! (629.447388 != -480.259155)
[1,0]<stdout>:258th item is incorrect! (-729.046021 != -909.880920)
[1,0]<stdout>:259th item is incorrect! (811.583862 != 600.136963)
[1,0]<stdout>:260th item is incorrect! (670.017090 != 320.238892)
[1,0]<stdout>:261th item is incorrect! (-746.026367 != -137.172363)
[1,0]<stdout>:262th item is incorrect! (937.735596 != 499.881958)
[1,0]<stdout>:263th item is incorrect! (826.751709 != 821.295166)
[1,0]<stdout>:264th item is incorrect! (-557.931885 != -734.007935)
[1,0]<stdout>:265th item is incorrect! (264.718506 != -636.305969)
[1,0]<stdout>:266th item is incorrect! (-383.665894 != 964.721069)
[1,0]<stdout>:267th item is incorrect! (-804.919189 != -472.394165)
[1,0]<stdout>:268th item is incorrect! (94.441162 != -809.289673)
[1,0]<stdout>:269th item is incorrect! (-443.003540 != -708.921997)
[1,0]<stdout>:270th item is incorrect! (-623.236084 != -434.653381)
[1,0]<stdout>:271th item is incorrect! (93.762939 != -727.862915)
[1,0]<stdout>:272th item is incorrect! (985.762573 != 604.222900)
[1,0]<stdout>:273th item is incorrect! (915.013672 != 738.584351)
[1,0]<stdout>:274th item is incorrect! (992.922607 != -844.885925)
[1,0]<stdout>:275th item is incorrect! (929.776978 != 159.409180)
[1,0]<stdout>:276th item is incorrect! (935.389893 != 254.768677)
[1,0]<stdout>:277th item is incorrect! (-684.773804 != 99.720337)
[1,0]<stdout>:278th item is incorrect! (451.677979 != -983.812195)
[1,0]<stdout>:279th item is incorrect! (941.185547 != -710.090393)
[1,0]<stdout>:280th item is incorrect! (962.219360 != 360.574097)
[1,0]<stdout>:281th item is incorrect! (914.333984 != 706.062256)
[1,0]<stdout>:282th item is incorrect! (-780.276489 != 67.866211)
[1,0]<stdout>:283th item is incorrect! (-29.248718 != 244.110229)
[1,0]<stdout>:284th item is incorrect! (596.211670 != -122.666382)
[1,0]<stdout>:285th item is incorrect! (600.560913 != -298.095215)
[1,0]<stdout>:286th item is incorrect! (-405.941101 != -600.897583)
[1,0]<stdout>:287th item is incorrect! (-716.227295 != 26.499023)
[1,0]<stdout>:288th item is incorrect! (-990.433044 != -723.997314)
[1,0]<stdout>:289th item is incorrect! (-156.477478 != -196.383972)
[1,0]<stdout>:290th item is incorrect! (-775.070984 != -235.334106)
[1,0]<stdout>:291th item is incorrect! (831.471069 != -848.066650)
[1,0]<stdout>:292th item is incorrect! (279.526733 != 524.842651)
[1,0]<stdout>:293th item is incorrect! (584.414551 != -520.167664)
[1,0]<stdout>:294th item is incorrect! (756.861328 != -919.057800)
[1,0]<stdout>:295th item is incorrect! (918.984863 != -753.362122)
[1,0]<stdout>:296th item is incorrect! (7.325439 != -494.088043)
[1,0]<stdout>:297th item is incorrect! (311.481323 != -632.184448)
[1,0]<stdout>:298th item is incorrect! (595.857300 != 9.541992)
[1,0]<stdout>:299th item is incorrect! (-928.576660 != -520.094971)
[1,0]<stdout>:300th item is incorrect! (-277.411987 != 645.209961)
[1,0]<stdout>:301th item is incorrect! (698.258667 != -165.465881)
[1,0]<stdout>:302th item is incorrect! (-576.151367 != 963.446289)
[1,0]<stdout>:303th item is incorrect! (867.986450 != -900.691162)
[1,0]<stdout>:304th item is incorrect! (362.719116 != 646.910767)
[1,0]<stdout>:305th item is incorrect! (357.470215 != 805.432251)
[1,0]<stdout>:306th item is incorrect! (-202.522949 != -396.345398)
[1,0]<stdout>:307th item is incorrect! (515.480225 != 889.574463)
[1,0]<stdout>:308th item is incorrect! (481.294556 != -904.111450)
[1,0]<stdout>:309th item is incorrect! (486.264893 != -18.271790)
[1,0]<stdout>:310th item is incorrect! (-50.482605 != -504.304810)
[1,0]<stdout>:311th item is incorrect! (-215.545959 != -21.494751)
[1,0]<stdout>:312th item is incorrect! (-155.824646 != 88.112183)
[1,0]<stdout>:313th item is incorrect! (310.955811 != -324.561157)
[1,0]<stdout>:314th item is incorrect! (-652.269653 != 775.452271)
[1,0]<stdout>:315th item is incorrect! (-657.626587 != 800.107666)
[1,0]<stdout>:316th item is incorrect! (-396.173767 != -312.320923)
[1,0]<stdout>:317th item is incorrect! (412.092163 != -261.506409)
[1,0]<stdout>:318th item is incorrect! (594.559814 != -933.462891)
[1,0]<stdout>:319th item is incorrect! (-936.334290 != -777.594482)
[1,0]<stdout>:320th item is incorrect! (-366.899109 != -674.255737)
[1,0]<stdout>:321th item is incorrect! (-446.154053 != 560.504028)
[1,0]<stdout>:322th item is incorrect! (744.857666 != 754.727783)
[1,0]<stdout>:323th item is incorrect! (-907.657227 != -220.522339)
[1,0]<stdout>:324th item is incorrect! (-701.772034 != -579.396240)
[1,0]<stdout>:325th item is incorrect! (-805.736450 != -516.617432)
[1,0]<stdout>:326th item is incorrect! (988.136963 != -454.493958)
[1,0]<stdout>:327th item is incorrect! (646.915649 != -192.175720)
[1,0]<stdout>:328th item is incorrect! (643.806519 != -15.116028)
[1,0]<stdout>:329th item is incorrect! (389.657227 != -807.090942)
[1,0]<stdout>:330th item is incorrect! (-749.634460 != -553.160095)
[1,0]<stdout>:331th item is incorrect! (-365.801025 != -736.053406)
[1,0]<stdout>:332th item is incorrect! (527.500000 != -19.397034)
[1,0]<stdout>:333th item is incorrect! (900.444092 != 884.101196)
[1,0]<stdout>:334th item is incorrect! (-18.821899 != 909.886719)
[1,0]<stdout>:335th item is incorrect! (-931.107849 != 912.269165)
[1,0]<stdout>:336th item is incorrect! (327.211060 != 303.937256)
[1,0]<stdout>:337th item is incorrect! (-122.511292 != 150.417236)
[1,0]<stdout>:338th item is incorrect! (-748.206726 != 515.007202)
[1,0]<stdout>:339th item is incorrect! (-236.883118 != -880.440918)
[1,0]<stdout>:340th item is incorrect! (-579.581848 != -129.508606)
[1,0]<stdout>:341th item is incorrect! (531.033691 != -530.440186)
[1,0]<stdout>:342th item is incorrect! (-897.567139 != 105.762329)
[1,0]<stdout>:343th item is incorrect! (590.399902 != -293.682861)
[1,0]<stdout>:344th item is incorrect! (-927.117493 != -893.694885)
[1,0]<stdout>:345th item is incorrect! (-626.254761 != 642.388062)
[1,0]<stdout>:346th item is incorrect! (-182.537659 != -354.978516)
[1,0]<stdout>:347th item is incorrect! (-20.471191 != -969.193115)
[1,0]<stdout>:348th item is incorrect! (-84.021667 != -190.037231)
[1,0]<stdout>:349th item is incorrect! (-108.827576 != -913.952393)
[1,0]<stdout>:350th item is incorrect! (-24.862183 != 816.867188)
[1,0]<stdout>:351th item is incorrect! (292.625977 != -662.019897)
[1,0]<stdout>:352th item is incorrect! (587.949951 != 618.274414)
[1,0]<stdout>:353th item is incorrect! (418.729614 != 298.230957)
[1,0]<stdout>:354th item is incorrect! (841.749512 != -483.408203)
[1,0]<stdout>:355th item is incorrect! (509.373291 != 463.444824)
[1,0]<stdout>:356th item is incorrect! (615.062012 != -740.306946)
[1,0]<stdout>:357th item is incorrect! (-447.949829 != 295.491943)
[1,0]<stdout>:358th item is incorrect! (411.548462 != -13.346313)
[1,0]<stdout>:359th item is incorrect! (359.405396 != -98.152588)
[1,0]<stdout>:360th item is incorrect! (-994.363159 != -242.999146)
[1,0]<stdout>:361th item is incorrect! (310.196045 != 94.017700)
[1,0]<stdout>:362th item is incorrect! (421.407715 != 436.939941)
[1,0]<stdout>:363th item is incorrect! (-674.776489 != -407.358398)
[1,0]<stdout>:364th item is incorrect! (287.921875 != -424.390015)
[1,0]<stdout>:365th item is incorrect! (-762.004639 != 489.385620)
[1,0]<stdout>:366th item is incorrect! (-87.934387 != 246.871094)
[1,0]<stdout>:367th item is incorrect! (-3.271851 != -622.089966)
[1,0]<stdout>:368th item is incorrect! (547.834229 != 623.747925)
[1,0]<stdout>:369th item is incorrect! (919.487915 != 373.550903)
[1,0]<stdout>:370th item is incorrect! (147.509277 != -374.984009)
[1,0]<stdout>:371th item is incorrect! (-319.228516 != -632.977661)
[1,0]<stdout>:372th item is incorrect! (753.514893 != -228.578979)
[1,0]<stdout>:373th item is incorrect! (170.535400 != -263.030823)
[1,0]<stdout>:374th item is incorrect! (616.350952 != -312.140503)
[1,0]<stdout>:375th item is incorrect! (-552.376099 != 251.237183)
[1,0]<stdout>:376th item is incorrect! (-964.452209 != 631.538208)
[1,0]<stdout>:377th item is incorrect! (502.534180 != 560.454834)
[1,0]<stdout>:378th item is incorrect! (642.491943 != 338.570190)
[1,0]<stdout>:379th item is incorrect! (-489.809753 != -837.748474)
[1,0]<stdout>:380th item is incorrect! (641.681519 != -241.162292)
[1,0]<stdout>:381th item is incorrect! (11.914124 != 858.771973)
[1,0]<stdout>:382th item is incorrect! (880.148071 != -83.006226)
[1,0]<stdout>:383th item is incorrect! (398.153442 != 551.425293)
[1,0]<stdout>:384th item is incorrect! (-174.666931 != -392.772095)
[1,0]<stdout>:385th item is incorrect! (781.806519 != -26.416748)
[1,0]<stdout>:386th item is incorrect! (-153.669800 != 821.129883)
[1,0]<stdout>:387th item is incorrect! (918.582764 != -128.282837)
[1,0]<stdout>:388th item is incorrect! (161.913452 != -22.764587)
[1,0]<stdout>:389th item is incorrect! (94.431030 != -106.432495)
[1,0]<stdout>:390th item is incorrect! (-683.884827 != 511.579712)
[1,0]<stdout>:391th item is incorrect! (-722.751099 != -387.301025)
[1,0]<stdout>:392th item is incorrect! (523.462402 != -783.876160)
[1,0]<stdout>:393th item is incorrect! (-701.411987 != 17.017334)
[1,0]<stdout>:394th item is incorrect! (-539.687866 != -212.820312)
[1,0]<stdout>:395th item is incorrect! (-484.983521 != 21.543152)
[1,0]<stdout>:396th item is incorrect! (619.468994 != 740.373535)
[1,0]<stdout>:397th item is incorrect! (681.434570 != 635.255493)
[1,0]<stdout>:398th item is incorrect! (977.043213 != -278.278564)
[1,0]<stdout>:399th item is incorrect! (-491.435638 != 589.662842)
[1,0]<stdout>:400th item is incorrect! (-335.103455 != 834.235840)
[1,0]<stdout>:401th item is incorrect! (628.569580 != 288.636230)
[1,0]<stdout>:402th item is incorrect! (-400.336548 != 87.610962)
[1,0]<stdout>:403th item is incorrect! (-512.950073 != -242.781250)
[1,0]<stdout>:404th item is incorrect! (-972.921753 != -719.712280)
[1,0]<stdout>:405th item is incorrect! (858.527344 != 623.161011)
[1,0]<stdout>:406th item is incorrect! (-565.524292 != -600.254272)
[1,0]<stdout>:407th item is incorrect! (-300.032471 != 65.651123)
[1,0]<stdout>:408th item is incorrect! (814.729492 != 897.850098)
[1,0]<stdout>:409th item is incorrect! (-606.809509 != -298.545776)
[1,0]<stdout>:410th item is incorrect! (696.935547 != 980.219971)
[1,0]<stdout>:411th item is incorrect! (-497.832306 != 878.003174)
[1,0]<stdout>:412th item is incorrect! (910.035156 != -519.848145)
[1,0]<stdout>:413th item is incorrect! (232.089355 != 751.885620)
[1,0]<stdout>:414th item is incorrect! (557.795410 != -966.958862)
[1,0]<stdout>:415th item is incorrect! (-53.422302 != 100.312744)
[1,0]<stdout>:416th item is incorrect! (974.919189 != -222.769653)
[1,0]<stdout>:417th item is incorrect! (-296.680969 != 244.950195)
[1,0]<stdout>:418th item is incorrect! (-864.809265 != 559.378784)
[1,0]<stdout>:419th item is incorrect! (661.657227 != 174.089478)
[1,0]<stdout>:420th item is incorrect! (587.195190 != -46.723816)
[1,0]<stdout>:421th item is incorrect! (170.528198 != -584.515381)
[1,0]<stdout>:422th item is incorrect! (189.007202 != 80.276001)
[1,0]<stdout>:423th item is incorrect! (99.447266 != -397.507324)
[1,0]<stdout>:424th item is incorrect! (465.597412 != -963.098328)
[1,0]<stdout>:425th item is incorrect! (834.387329 != -58.153320)
[1,0]<stdout>:426th item is incorrect! (390.465698 != 800.366455)
[1,0]<stdout>:427th item is incorrect! (-428.321960 != -539.023682)
[1,0]<stdout>:428th item is incorrect! (359.639526 != -611.009277)
[1,0]<stdout>:429th item is incorrect! (514.400513 != 688.617554)
[1,0]<stdout>:430th item is incorrect! (-215.359070 != 771.996216)
[1,0]<stdout>:431th item is incorrect! (507.458252 != -610.471436)
[1,0]<stdout>:432th item is incorrect! (123.114990 != -117.553040)
[1,0]<stdout>:433th item is incorrect! (-239.108337 != -548.156433)
[1,0]<stdout>:434th item is incorrect! (-583.863892 != -704.341980)
[1,0]<stdout>:435th item is incorrect! (135.643188 != -658.583923)
[1,0]<stdout>:436th item is incorrect! (54.742920 != -520.995117)
[1,0]<stdout>:437th item is incorrect! (-848.291382 != -544.671387)
[1,0]<stdout>:438th item is incorrect! (-191.583008 != 599.306396)
[1,0]<stdout>:439th item is incorrect! (-892.099731 != -128.602600)
[1,0]<stdout>:440th item is incorrect! (-294.475220 != -53.969849)
[1,0]<stdout>:441th item is incorrect! (61.595093 != -377.795410)
[1,0]<stdout>:442th item is incorrect! (185.647705 != -820.353699)
[1,0]<stdout>:443th item is incorrect! (558.334473 != 846.759277)
[1,0]<stdout>:444th item is incorrect! (-287.309692 != 289.101074)
[1,0]<stdout>:445th item is incorrect! (868.021362 != -139.585205)
[1,0]<stdout>:446th item is incorrect! (929.932739 != 266.127319)
[1,0]<stdout>:447th item is incorrect! (-740.187622 != -630.367371)
[1,0]<stdout>:448th item is incorrect! (-691.123169 != 168.764526)
[1,0]<stdout>:449th item is incorrect! (137.647217 != 809.761841)
[1,0]<stdout>:450th item is incorrect! (-210.183533 != 453.308838)
[1,0]<stdout>:451th item is incorrect! (-61.218750 != 959.496704)
[1,0]<stdout>:452th item is incorrect! (-225.408203 != -290.723755)
[1,0]<stdout>:453th item is incorrect! (-976.195862 != -122.260010)
[1,0]<stdout>:454th item is incorrect! (453.909424 != 360.813232)
[1,0]<stdout>:455th item is incorrect! (-325.754700 != -777.761597)
[1,0]<stdout>:456th item is incorrect! (-222.860413 != 414.643066)
[1,0]<stdout>:457th item is incorrect! (-675.635376 != -483.870605)
[1,0]<stdout>:458th item is incorrect! (854.985718 != -675.343018)
[1,0]<stdout>:459th item is incorrect! (588.569092 != -182.560303)
[1,0]<stdout>:460th item is incorrect! (-127.764893 != -732.527222)
[1,0]<stdout>:461th item is incorrect! (-377.569885 != 189.792114)
[1,0]<stdout>:462th item is incorrect! (725.356323 != -100.887817)
[1,0]<stdout>:463th item is incorrect! (57.066284 != -475.576538)
[1,0]<stdout>:464th item is incorrect! (240.720093 != -915.891724)
[1,0]<stdout>:465th item is incorrect! (-668.702515 != 205.686157)
[1,0]<stdout>:466th item is incorrect! (-760.905640 != 594.728271)
[1,0]<stdout>:467th item is incorrect! (203.963867 != 422.431641)
[1,0]<stdout>:468th item is incorrect! (-56.086426 != -664.888428)
[1,0]<stdout>:469th item is incorrect! (-474.057434 != -556.506531)
[1,0]<stdout>:470th item is incorrect! (-319.560608 != 662.428589)
[1,0]<stdout>:471th item is incorrect! (308.158203 != -765.164673)
[1,0]<stdout>:472th item is incorrect! (59.683960 != -350.072937)
[1,0]<stdout>:473th item is incorrect! (378.429077 != -406.648254)
[1,0]<stdout>:474th item is incorrect! (432.201416 != 311.559814)
[1,0]<stdout>:475th item is incorrect! (496.303223 != -362.443359)
[1,0]<stdout>:476th item is incorrect! (976.758789 != -159.620117)
[1,0]<stdout>:477th item is incorrect! (-98.916809 != -151.666443)
[1,0]<stdout>:478th item is incorrect! (440.986816 != 563.818359)
[1,0]<stdout>:479th item is incorrect! (-832.357239 != 15.716553)
[1,0]<stdout>:480th item is incorrect! (825.155029 != -773.614929)
[1,0]<stdout>:481th item is incorrect! (-542.046082 != -828.968384)
[1,0]<stdout>:482th item is incorrect! (10.997070 != 987.069458)
[1,0]<stdout>:483th item is incorrect! (826.674683 != -475.035522)
[1,0]<stdout>:484th item is incorrect! (116.537598 != -636.854065)
[1,0]<stdout>:485th item is incorrect! (-695.243958 != 602.029175)
[1,0]<stdout>:486th item is incorrect! (6.380066 != 525.862793)
[1,0]<stdout>:487th item is incorrect! (651.634033 != -941.559448)
[1,0]<stdout>:488th item is incorrect! (-75.051636 != -468.158691)
[1,0]<stdout>:489th item is incorrect! (76.684814 != 857.708374)
[1,0]<stdout>:490th item is incorrect! (93.183838 != -511.723450)
[1,0]<stdout>:491th item is incorrect! (992.269409 != 460.661743)
[1,0]<stdout>:492th item is incorrect! (-104.831238 != -798.523010)
[1,0]<stdout>:493th item is incorrect! (-843.648926 != -22.782043)
[1,0]<stdout>:494th item is incorrect! (708.901978 != -311.344116)
[1,0]<stdout>:495th item is incorrect! (-114.643433 != 157.050171)
[1,0]<stdout>:496th item is incorrect! (208.463013 != -436.745361)
[1,0]<stdout>:497th item is incorrect! (-786.694458 != -525.432861)
[1,0]<stdout>:498th item is incorrect! (-2.911621 != 937.280762)
[1,0]<stdout>:499th item is incorrect! (923.796143 != -82.302307)
[1,0]<stdout>:500th item is incorrect! (959.851318 != -572.454468)
[1,0]<stdout>:501th item is incorrect! (-990.731567 != 926.177002)
[1,0]<stdout>:502th item is incorrect! (-931.365356 != 211.856323)
[1,0]<stdout>:503th item is incorrect! (549.820923 != 93.611450)
[1,0]<stdout>:504th item is incorrect! (954.004028 != -546.897339)
[1,0]<stdout>:505th item is incorrect! (634.606445 != 42.271606)
[1,0]<stdout>:506th item is incorrect! (-273.627075 != -620.189026)
[1,0]<stdout>:507th item is incorrect! (737.389404 != -536.811218)
[1,0]<stdout>:508th item is incorrect! (359.039429 != -196.857056)
[1,0]<stdout>:509th item is incorrect! (-831.128296 != -22.204529)
[1,0]<stdout>:510th item is incorrect! (-307.533203 != -173.418762)
[1,0]<stdout>:511th item is incorrect! (-200.434692 != 248.120239)
[1,0]<stdout>:512th item is incorrect! (711.750244 != -659.695435)
[1,0]<stdout>:256 errors!
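In the log above, every mismatch is reported by the `[1,0]` process, starting at item 257 and running through item 512, so the first 256 of the 512 items compare equal and the second half does not. A minimal sketch for checking this pattern in a saved log is shown below. It is a hypothetical helper, not part of ACCL or its test suite, and it assumes that the second number in the `[1,0]` prefix is the MPI rank and that mismatch lines keep the exact format shown in this issue.

```python
import re

# Matches lines such as:
#   [1,0]<stdout>:257th item is incorrect! (629.447388 != -480.259155)
# Assumption: the second number in the "[1,0]" prefix is the MPI rank.
MISMATCH = re.compile(
    r"\[\d+,(?P<rank>\d+)\]<stdout>:(?P<idx>\d+)th item is incorrect! "
    r"\((?P<left>-?\d+\.\d+) != (?P<right>-?\d+\.\d+)\)"
)


def summarize(log_text):
    """Summarize mismatch lines: reporting ranks, index range, count, contiguity."""
    hits = [MISMATCH.search(line) for line in log_text.splitlines()]
    hits = [m for m in hits if m]
    if not hits:
        return None
    idx = [int(m.group("idx")) for m in hits]
    return {
        "ranks": sorted({int(m.group("rank")) for m in hits}),
        "first": idx[0],
        "last": idx[-1],
        "count": len(idx),
        # True when the failing indices form one unbroken run.
        "contiguous": idx == list(range(idx[0], idx[0] + len(idx))),
    }


# Three lines copied from the log above, plus the summary line (which is ignored).
sample = """\
[1,0]<stdout>:257th item is incorrect! (629.447388 != -480.259155)
[1,0]<stdout>:258th item is incorrect! (-729.046021 != -909.880920)
[1,0]<stdout>:259th item is incorrect! (811.583862 != 600.136963)
[1,0]<stdout>:256 errors!
"""

result = summarize(sample)
print(result)
```

Run against the output of one test at a time, a single contiguous run from 257 to 512 with a count of 256 would confirm that only the second half of the buffer is affected.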
[1,1]<stdout>:Start bcast compression test with root 1 ...
[1,0]<stdout>:Start bcast compression test with root 1 ...
[1,0]<stdout>:257th item is incorrect! (629.500000 != -480.259155)
[1,0]<stdout>:258th item is incorrect! (-729.000000 != -909.880920)
[1,0]<stdout>:259th item is incorrect! (811.500000 != 600.136963)
[1,0]<stdout>:260th item is incorrect! (670.000000 != 320.238892)
[1,0]<stdout>:261th item is incorrect! (-746.000000 != -137.172363)
[1,0]<stdout>:262th item is incorrect! (937.500000 != 499.881958)
[1,0]<stdout>:263th item is incorrect! (827.000000 != 821.295166)
[1,0]<stdout>:264th item is incorrect! (-558.000000 != -734.007935)
[1,0]<stdout>:265th item is incorrect! (264.750000 != -636.305969)
[1,0]<stdout>:266th item is incorrect! (-383.750000 != 964.721069)
[1,0]<stdout>:267th item is incorrect! (-805.000000 != -472.394165)
[1,0]<stdout>:268th item is incorrect! (94.437500 != -809.289673)
[1,0]<stdout>:269th item is incorrect! (-443.000000 != -708.921997)
[1,0]<stdout>:270th item is incorrect! (-623.000000 != -434.653381)
[1,0]<stdout>:271th item is incorrect! (93.750000 != -727.862915)
[1,0]<stdout>:272th item is incorrect! (986.000000 != 604.222900)
[1,0]<stdout>:273th item is incorrect! (915.000000 != 738.584351)
[1,0]<stdout>:274th item is incorrect! (993.000000 != -844.885925)
[1,0]<stdout>:275th item is incorrect! (930.000000 != 159.409180)
[1,0]<stdout>:276th item is incorrect! (935.500000 != 254.768677)
[1,0]<stdout>:277th item is incorrect! (-685.000000 != 99.720337)
[1,0]<stdout>:278th item is incorrect! (451.750000 != -983.812195)
[1,0]<stdout>:279th item is incorrect! (941.000000 != -710.090393)
[1,0]<stdout>:280th item is incorrect! (962.000000 != 360.574097)
[1,0]<stdout>:281th item is incorrect! (914.500000 != 706.062256)
[1,0]<stdout>:282th item is incorrect! (-780.500000 != 67.866211)
[1,0]<stdout>:283th item is incorrect! (-29.250000 != 244.110229)
[1,0]<stdout>:284th item is incorrect! (596.000000 != -122.666382)
[1,0]<stdout>:285th item is incorrect! (600.500000 != -298.095215)
[1,0]<stdout>:286th item is incorrect! (-406.000000 != -600.897583)
[1,0]<stdout>:287th item is incorrect! (-716.000000 != 26.499023)
[1,0]<stdout>:288th item is incorrect! (-990.500000 != -723.997314)
[1,0]<stdout>:289th item is incorrect! (-156.500000 != -196.383972)
[1,0]<stdout>:290th item is incorrect! (-775.000000 != -235.334106)
[1,0]<stdout>:291th item is incorrect! (831.500000 != -848.066650)
[1,0]<stdout>:292th item is incorrect! (279.500000 != 524.842651)
[1,0]<stdout>:293th item is incorrect! (584.500000 != -520.167664)
[1,0]<stdout>:294th item is incorrect! (757.000000 != -919.057800)
[1,0]<stdout>:295th item is incorrect! (919.000000 != -753.362122)
[1,0]<stdout>:296th item is incorrect! (7.324219 != -494.088043)
[1,0]<stdout>:297th item is incorrect! (311.500000 != -632.184448)
[1,0]<stdout>:298th item is incorrect! (596.000000 != 9.541992)
[1,0]<stdout>:299th item is incorrect! (-928.500000 != -520.094971)
[1,0]<stdout>:300th item is incorrect! (-277.500000 != 645.209961)
[1,0]<stdout>:301th item is incorrect! (698.500000 != -165.465881)
[1,0]<stdout>:302th item is incorrect! (-576.000000 != 963.446289)
[1,0]<stdout>:303th item is incorrect! (868.000000 != -900.691162)
[1,0]<stdout>:304th item is incorrect! (362.750000 != 646.910767)
[1,0]<stdout>:305th item is incorrect! (357.500000 != 805.432251)
[1,0]<stdout>:306th item is incorrect! (-202.500000 != -396.345398)
[1,0]<stdout>:307th item is incorrect! (515.500000 != 889.574463)
[1,0]<stdout>:308th item is incorrect! (481.250000 != -904.111450)
[1,0]<stdout>:309th item is incorrect! (486.250000 != -18.271790)
[1,0]<stdout>:310th item is incorrect! (-50.468750 != -504.304810)
[1,0]<stdout>:311th item is incorrect! (-215.500000 != -21.494751)
[1,0]<stdout>:312th item is incorrect! (-155.875000 != 88.112183)
[1,0]<stdout>:313th item is incorrect! (311.000000 != -324.561157)
[1,0]<stdout>:314th item is incorrect! (-652.500000 != 775.452271)
[1,0]<stdout>:315th item is incorrect! (-657.500000 != 800.107666)
[1,0]<stdout>:316th item is incorrect! (-396.250000 != -312.320923)
[1,0]<stdout>:317th item is incorrect! (412.000000 != -261.506409)
[1,0]<stdout>:318th item is incorrect! (594.500000 != -933.462891)
[1,0]<stdout>:319th item is incorrect! (-936.500000 != -777.594482)
[1,0]<stdout>:320th item is incorrect! (-367.000000 != -674.255737)
[1,0]<stdout>:321th item is incorrect! (-446.250000 != 560.504028)
[1,0]<stdout>:322th item is incorrect! (745.000000 != 754.727783)
[1,0]<stdout>:323th item is incorrect! (-907.500000 != -220.522339)
[1,0]<stdout>:324th item is incorrect! (-702.000000 != -579.396240)
[1,0]<stdout>:325th item is incorrect! (-805.500000 != -516.617432)
[1,0]<stdout>:326th item is incorrect! (988.000000 != -454.493958)
[1,0]<stdout>:327th item is incorrect! (647.000000 != -192.175720)
[1,0]<stdout>:328th item is incorrect! (644.000000 != -15.116028)
[1,0]<stdout>:329th item is incorrect! (389.750000 != -807.090942)
[1,0]<stdout>:330th item is incorrect! (-749.500000 != -553.160095)
[1,0]<stdout>:331th item is incorrect! (-365.750000 != -736.053406)
[1,0]<stdout>:332th item is incorrect! (527.500000 != -19.397034)
[1,0]<stdout>:333th item is incorrect! (900.500000 != 884.101196)
[1,0]<stdout>:334th item is incorrect! (-18.828125 != 909.886719)
[1,0]<stdout>:335th item is incorrect! (-931.000000 != 912.269165)
[1,0]<stdout>:336th item is incorrect! (327.250000 != 303.937256)
[1,0]<stdout>:337th item is incorrect! (-122.500000 != 150.417236)
[1,0]<stdout>:338th item is incorrect! (-748.000000 != 515.007202)
[1,0]<stdout>:339th item is incorrect! (-236.875000 != -880.440918)
[1,0]<stdout>:340th item is incorrect! (-579.500000 != -129.508606)
[1,0]<stdout>:341th item is incorrect! (531.000000 != -530.440186)
[1,0]<stdout>:342th item is incorrect! (-897.500000 != 105.762329)
[1,0]<stdout>:343th item is incorrect! (590.500000 != -293.682861)
[1,0]<stdout>:344th item is incorrect! (-927.000000 != -893.694885)
[1,0]<stdout>:345th item is incorrect! (-626.500000 != 642.388062)
[1,0]<stdout>:346th item is incorrect! (-182.500000 != -354.978516)
[1,0]<stdout>:347th item is incorrect! (-20.468750 != -969.193115)
[1,0]<stdout>:348th item is incorrect! (-84.000000 != -190.037231)
[1,0]<stdout>:349th item is incorrect! (-108.812500 != -913.952393)
[1,0]<stdout>:350th item is incorrect! (-24.859375 != 816.867188)
[1,0]<stdout>:351th item is incorrect! (292.750000 != -662.019897)
[1,0]<stdout>:352th item is incorrect! (588.000000 != 618.274414)
[1,0]<stdout>:353th item is incorrect! (418.750000 != 298.230957)
[1,0]<stdout>:354th item is incorrect! (841.500000 != -483.408203)
[1,0]<stdout>:355th item is incorrect! (509.250000 != 463.444824)
[1,0]<stdout>:356th item is incorrect! (615.000000 != -740.306946)
[1,0]<stdout>:357th item is incorrect! (-448.000000 != 295.491943)
[1,0]<stdout>:358th item is incorrect! (411.500000 != -13.346313)
[1,0]<stdout>:359th item is incorrect! (359.500000 != -98.152588)
[1,0]<stdout>:360th item is incorrect! (-994.500000 != -242.999146)
[1,0]<stdout>:361th item is incorrect! (310.250000 != 94.017700)
[1,0]<stdout>:362th item is incorrect! (421.500000 != 436.939941)
[1,0]<stdout>:363th item is incorrect! (-675.000000 != -407.358398)
[1,0]<stdout>:364th item is incorrect! (288.000000 != -424.390015)
[1,0]<stdout>:365th item is incorrect! (-762.000000 != 489.385620)
[1,0]<stdout>:366th item is incorrect! (-87.937500 != 246.871094)
[1,0]<stdout>:367th item is incorrect! (-3.271484 != -622.089966)
[1,0]<stdout>:368th item is incorrect! (548.000000 != 623.747925)
[1,0]<stdout>:369th item is incorrect! (919.500000 != 373.550903)
[1,0]<stdout>:370th item is incorrect! (147.500000 != -374.984009)
[1,0]<stdout>:371th item is incorrect! (-319.250000 != -632.977661)
[1,0]<stdout>:372th item is incorrect! (753.500000 != -228.578979)
[1,0]<stdout>:373th item is incorrect! (170.500000 != -263.030823)
[1,0]<stdout>:374th item is incorrect! (616.500000 != -312.140503)
[1,0]<stdout>:375th item is incorrect! (-552.500000 != 251.237183)
[1,0]<stdout>:376th item is incorrect! (-964.500000 != 631.538208)
[1,0]<stdout>:377th item is incorrect! (502.500000 != 560.454834)
[1,0]<stdout>:378th item is incorrect! (642.500000 != 338.570190)
[1,0]<stdout>:379th item is incorrect! (-489.750000 != -837.748474)
[1,0]<stdout>:380th item is incorrect! (641.500000 != -241.162292)
[1,0]<stdout>:381th item is incorrect! (11.914062 != 858.771973)
[1,0]<stdout>:382th item is incorrect! (880.000000 != -83.006226)
[1,0]<stdout>:383th item is incorrect! (398.250000 != 551.425293)
[1,0]<stdout>:384th item is incorrect! (-174.625000 != -392.772095)
[1,0]<stdout>:385th item is incorrect! (782.000000 != -26.416748)
[1,0]<stdout>:386th item is incorrect! (-153.625000 != 821.129883)
[1,0]<stdout>:387th item is incorrect! (918.500000 != -128.282837)
[1,0]<stdout>:388th item is incorrect! (161.875000 != -22.764587)
[1,0]<stdout>:389th item is incorrect! (94.437500 != -106.432495)
[1,0]<stdout>:390th item is incorrect! (-684.000000 != 511.579712)
[1,0]<stdout>:391th item is incorrect! (-723.000000 != -387.301025)
[1,0]<stdout>:392th item is incorrect! (523.500000 != -783.876160)
[1,0]<stdout>:393th item is incorrect! (-701.500000 != 17.017334)
[1,0]<stdout>:394th item is incorrect! (-539.500000 != -212.820312)
[1,0]<stdout>:395th item is incorrect! (-485.000000 != 21.543152)
[1,0]<stdout>:396th item is incorrect! (619.500000 != 740.373535)
[1,0]<stdout>:397th item is incorrect! (681.500000 != 635.255493)
[1,0]<stdout>:398th item is incorrect! (977.000000 != -278.278564)
[1,0]<stdout>:399th item is incorrect! (-491.500000 != 589.662842)
[1,0]<stdout>:400th item is incorrect! (-335.000000 != 834.235840)
[1,0]<stdout>:401th item is incorrect! (628.500000 != 288.636230)
[1,0]<stdout>:402th item is incorrect! (-400.250000 != 87.610962)
[1,0]<stdout>:403th item is incorrect! (-513.000000 != -242.781250)
[1,0]<stdout>:404th item is incorrect! (-973.000000 != -719.712280)
[1,0]<stdout>:405th item is incorrect! (858.500000 != 623.161011)
[1,0]<stdout>:406th item is incorrect! (-565.500000 != -600.254272)
[1,0]<stdout>:407th item is incorrect! (-300.000000 != 65.651123)
[1,0]<stdout>:408th item is incorrect! (814.500000 != 897.850098)
[1,0]<stdout>:409th item is incorrect! (-607.000000 != -298.545776)
[1,0]<stdout>:410th item is incorrect! (697.000000 != 980.219971)
[1,0]<stdout>:411th item is incorrect! (-497.750000 != 878.003174)
[1,0]<stdout>:412th item is incorrect! (910.000000 != -519.848145)
[1,0]<stdout>:413th item is incorrect! (232.125000 != 751.885620)
[1,0]<stdout>:414th item is incorrect! (558.000000 != -966.958862)
[1,0]<stdout>:415th item is incorrect! (-53.437500 != 100.312744)
[1,0]<stdout>:416th item is incorrect! (975.000000 != -222.769653)
[1,0]<stdout>:417th item is incorrect! (-296.750000 != 244.950195)
[1,0]<stdout>:418th item is incorrect! (-865.000000 != 559.378784)
[1,0]<stdout>:419th item is incorrect! (661.500000 != 174.089478)
[1,0]<stdout>:420th item is incorrect! (587.000000 != -46.723816)
[1,0]<stdout>:421th item is incorrect! (170.500000 != -584.515381)
[1,0]<stdout>:422th item is incorrect! (189.000000 != 80.276001)
[1,0]<stdout>:423th item is incorrect! (99.437500 != -397.507324)
[1,0]<stdout>:424th item is incorrect! (465.500000 != -963.098328)
[1,0]<stdout>:425th item is incorrect! (834.500000 != -58.153320)
[1,0]<stdout>:426th item is incorrect! (390.500000 != 800.366455)
[1,0]<stdout>:427th item is incorrect! (-428.250000 != -539.023682)
[1,0]<stdout>:428th item is incorrect! (359.750000 != -611.009277)
[1,0]<stdout>:429th item is incorrect! (514.500000 != 688.617554)
[1,0]<stdout>:430th item is incorrect! (-215.375000 != 771.996216)
[1,0]<stdout>:431th item is incorrect! (507.500000 != -610.471436)
[1,0]<stdout>:432th item is incorrect! (123.125000 != -117.553040)
[1,0]<stdout>:433th item is incorrect! (-239.125000 != -548.156433)
[1,0]<stdout>:434th item is incorrect! (-584.000000 != -704.341980)
[1,0]<stdout>:435th item is incorrect! (135.625000 != -658.583923)
[1,0]<stdout>:436th item is incorrect! (54.750000 != -520.995117)
[1,0]<stdout>:437th item is incorrect! (-848.500000 != -544.671387)
[1,0]<stdout>:438th item is incorrect! (-191.625000 != 599.306396)
[1,0]<stdout>:439th item is incorrect! (-892.000000 != -128.602600)
[1,0]<stdout>:440th item is incorrect! (-294.500000 != -53.969849)
[1,0]<stdout>:441th item is incorrect! (61.593750 != -377.795410)
[1,0]<stdout>:442th item is incorrect! (185.625000 != -820.353699)
[1,0]<stdout>:443th item is incorrect! (558.500000 != 846.759277)
[1,0]<stdout>:444th item is incorrect! (-287.250000 != 289.101074)
[1,0]<stdout>:445th item is incorrect! (868.000000 != -139.585205)
[1,0]<stdout>:446th item is incorrect! (930.000000 != 266.127319)
[1,0]<stdout>:447th item is incorrect! (-740.000000 != -630.367371)
[1,0]<stdout>:448th item is incorrect! (-691.000000 != 168.764526)
[1,0]<stdout>:449th item is incorrect! (137.625000 != 809.761841)
[1,0]<stdout>:450th item is incorrect! (-210.125000 != 453.308838)
[1,0]<stdout>:451th item is incorrect! (-61.218750 != 959.496704)
[1,0]<stdout>:452th item is incorrect! (-225.375000 != -290.723755)
[1,0]<stdout>:453th item is incorrect! (-976.000000 != -122.260010)
[1,0]<stdout>:454th item is incorrect! (454.000000 != 360.813232)
[1,0]<stdout>:455th item is incorrect! (-325.750000 != -777.761597)
[1,0]<stdout>:456th item is incorrect! (-222.875000 != 414.643066)
[1,0]<stdout>:457th item is incorrect! (-675.500000 != -483.870605)
[1,0]<stdout>:458th item is incorrect! (855.000000 != -675.343018)
[1,0]<stdout>:459th item is incorrect! (588.500000 != -182.560303)
[1,0]<stdout>:460th item is incorrect! (-127.750000 != -732.527222)
[1,0]<stdout>:461th item is incorrect! (-377.500000 != 189.792114)
[1,0]<stdout>:462th item is incorrect! (725.500000 != -100.887817)
[1,0]<stdout>:463th item is incorrect! (57.062500 != -475.576538)
[1,0]<stdout>:464th item is incorrect! (240.750000 != -915.891724)
[1,0]<stdout>:465th item is incorrect! (-668.500000 != 205.686157)
[1,0]<stdout>:466th item is incorrect! (-761.000000 != 594.728271)
[1,0]<stdout>:467th item is incorrect! (204.000000 != 422.431641)
[1,0]<stdout>:468th item is incorrect! (-56.093750 != -664.888428)
[1,0]<stdout>:469th item is incorrect! (-474.000000 != -556.506531)
[1,0]<stdout>:470th item is incorrect! (-319.500000 != 662.428589)
[1,0]<stdout>:471th item is incorrect! (308.250000 != -765.164673)
[1,0]<stdout>:472th item is incorrect! (59.687500 != -350.072937)
[1,0]<stdout>:473th item is incorrect! (378.500000 != -406.648254)
[1,0]<stdout>:474th item is incorrect! (432.250000 != 311.559814)
[1,0]<stdout>:475th item is incorrect! (496.250000 != -362.443359)
[1,0]<stdout>:476th item is incorrect! (977.000000 != -159.620117)
[1,0]<stdout>:477th item is incorrect! (-98.937500 != -151.666443)
[1,0]<stdout>:478th item is incorrect! (441.000000 != 563.818359)
[1,0]<stdout>:479th item is incorrect! (-832.500000 != 15.716553)
[1,0]<stdout>:480th item is incorrect! (825.000000 != -773.614929)
[1,0]<stdout>:481th item is incorrect! (-542.000000 != -828.968384)
[1,0]<stdout>:482th item is incorrect! (11.000000 != 987.069458)
[1,0]<stdout>:483th item is incorrect! (826.500000 != -475.035522)
[1,0]<stdout>:484th item is incorrect! (116.562500 != -636.854065)
[1,0]<stdout>:485th item is incorrect! (-695.000000 != 602.029175)
[1,0]<stdout>:486th item is incorrect! (6.378906 != 525.862793)
[1,0]<stdout>:487th item is incorrect! (651.500000 != -941.559448)
[1,0]<stdout>:488th item is incorrect! (-75.062500 != -468.158691)
[1,0]<stdout>:489th item is incorrect! (76.687500 != 857.708374)
[1,0]<stdout>:490th item is incorrect! (93.187500 != -511.723450)
[1,0]<stdout>:491th item is incorrect! (992.500000 != 460.661743)
[1,0]<stdout>:492th item is incorrect! (-104.812500 != -798.523010)
[1,0]<stdout>:493th item is incorrect! (-843.500000 != -22.782043)
[1,0]<stdout>:494th item is incorrect! (709.000000 != -311.344116)
[1,0]<stdout>:495th item is incorrect! (-114.625000 != 157.050171)
[1,0]<stdout>:496th item is incorrect! (208.500000 != -436.745361)
[1,0]<stdout>:497th item is incorrect! (-786.500000 != -525.432861)
[1,0]<stdout>:498th item is incorrect! (-2.912109 != 937.280762)
[1,0]<stdout>:499th item is incorrect! (924.000000 != -82.302307)
[1,0]<stdout>:500th item is incorrect! (960.000000 != -572.454468)
[1,0]<stdout>:501th item is incorrect! (-990.500000 != 926.177002)
[1,0]<stdout>:502th item is incorrect! (-931.500000 != 211.856323)
[1,0]<stdout>:503th item is incorrect! (550.000000 != 93.611450)
[1,0]<stdout>:504th item is incorrect! (954.000000 != -546.897339)
[1,0]<stdout>:505th item is incorrect! (634.500000 != 42.271606)
[1,0]<stdout>:506th item is incorrect! (-273.750000 != -620.189026)
[1,0]<stdout>:507th item is incorrect! (737.500000 != -536.811218)
[1,0]<stdout>:508th item is incorrect! (359.000000 != -196.857056)
[1,0]<stdout>:509th item is incorrect! (-831.000000 != -22.204529)
[1,0]<stdout>:510th item is incorrect! (-307.500000 != -173.418762)
[1,0]<stdout>:511th item is incorrect! (-200.375000 != 248.120239)
[1,0]<stdout>:512th item is incorrect! (712.000000 != -659.695435)
[1,0]<stdout>:256 errors!
Add support for hierarchical collectives within the confines of fanin == 1. Some examples:
When using the XRT driver to create an ACCL buffer for simulation, all create_buffer calls except the one that takes an xrt::bo create a SimBuffer object whose SimBuffer::bo() call returns a nullptr.
This becomes a problem if buffers created this way are also used in a user kernel emulated with Vitis sw_emu.
In this case, an invalid address (nullptr) is passed to the user kernel, which leads to undefined behavior. In hardware execution, however, the same code works because a different buffer class is used underneath.
This is an inconsistency rather than a bug.
Assuming #3 is fixed and we can receive from multiple nodes simultaneously, add driver/firmware support for configuring and operating multiple communicators simultaneously.
The code below creates two FPGA buffers and syncs them to the device. However, the addresses obtained from tx_buf_network->bo() and rx_buf_network->bo() do not represent the physical FPGA memory address, so the network kernel cannot write to memory, or can only write a very small amount of data.
Buffer<int8_t> *tx_buf_network = new FPGABuffer<int8_t>(32*1024*1024, dataType::int8, device, networkmem);
Buffer<int8_t> *rx_buf_network = new FPGABuffer<int8_t>(32*1024*1024, dataType::int8, device, networkmem);
tx_buf_network->sync_to_device();
rx_buf_network->sync_to_device();
network_krnl(localFPGAIP, uint(rank), localFPGAIP, tx_buf_network->bo(), rx_buf_network->bo());
After changing the buffer instantiation to use the original XRT API, it works fine. The code is attached below:
auto tx_buf_network = xrt::bo(device, 8*1024*1024*sizeof(int8_t), networkmem);
tx_buf_network.sync(XCL_BO_SYNC_BO_TO_DEVICE);
auto rx_buf_network = xrt::bo(device, 8*1024*1024*sizeof(int8_t), networkmem);
rx_buf_network.sync(XCL_BO_SYNC_BO_TO_DEVICE);
network_krnl(localFPGAIP, uint(rank), localFPGAIP, tx_buf_network, rx_buf_network);
The following user kernel is used to schedule sends and receives from the PL:
#include "accl_hls.h"
void send_recv(const float *read_buffer, float *write_buffer, ap_uint<32> size, ap_uint<32> num_iterations,
               ap_uint<32> neighbor_rank, ap_uint<32> communicator_addr, ap_uint<32> datapath_cfg,
               STREAM<command_word> &cmd, STREAM<command_word> &sts) {
    accl_hls::ACCLCommand accl_cmd(cmd, sts, communicator_addr, datapath_cfg, 0, 0);
    for (int i = 0; i < num_iterations; i++) {
        accl_cmd.send(size, 0, neighbor_rank, (ap_uint<64>)read_buffer);
        accl_cmd.recv(size, 0, neighbor_rank, (ap_uint<64>)write_buffer);
    }
}
The user kernel is linked with the ACCL cclo and plugin kernels of the latest dev branch like this: https://github.com/XilinxDublinLabs/HPCBenchmarks/blob/accl/b_eff/settings/settings.link.xilinx.accl_pl.u55c.hbm.profile.ini
The execution of the design gets stuck when executing the send_recv kernel. Profiling data shows that the commands of the user kernel do not get passed to the client_arbiter and cclo:
Accelerator Monitor Counters (hex values are cycle count)
Compute Unit    Ends  Starts  Max Parallel Itr  Execution    Memory Stall  Pipe Stall  Stream Stall  Min Exec            Max Exec
ccl_offload_0   0     0       0                 0x0          0x0           0x0         0x0           0xffffffffffffffff  0x0
hostctrl_0      4     4       1                 0x6ab        0x0           0x0         0x0           0xc6                0x45d
networklayer_0  0     1       1                 0x27aad2d70  0x0           0x0         0x0           0xffffffffffffffff  0x0
sendrecv        0     1       1                 0x220eb9f1e  0x0           0x0         0x0           0xffffffffffffffff  0x0
cmac_0          0     0       0                 0x0          0x0           0x0         0x0           0xffffffffffffffff  0x0

AXI Stream Monitor Counters
Stream Master                     Stream Slave                   Num Trans.  Data kBytes  Busy Cycles  Stall Cycles  Starve Cycles
cmac_0/M_AXIS                     networklayer_0/S_AXIS_eth2nl   48          0.832        118          0             14
networklayer_0/M_AXIS_nl2eth      cmac_0/S_AXIS                  512         4.096        512          0             0
networklayer_0/M_AXIS_nl2sk       PIPE                           0           0.000        0            0             0
ccl_offload_0/m_axis_eth_tx_data  PIPE                           0           0.000        0            0             0
sendrecv/cmd                      client_arbiter/cmd_clients_1   0           0.000        0            0             0
ccl_offload_0/m_axis_call_ack     client_arbiter/ack_cclo        0           0.016        9142780514   0             9142780510
client_arbiter/ack_clients_0      hostctrl_0/sts                 0           0.016        9142809039   0             9142809035
client_arbiter/ack_clients_1      sendrecv/sts                   0           0.000        0            0             0
client_arbiter/cmd_cclo           ccl_offload_0/s_axis_call_req  4           0.240        76           16            0
hostctrl_0/cmd                    client_arbiter/cmd_clients_0   4           0.240        94           34            0
Currently we have a small exchange memory mapped both in the Microblaze and host address spaces to hold configuration data. The size of this memory is limiting, e.g. some users might need much more memory than others. To enable user-configurable and potentially very large exchange memory, we could read configuration from PLRAM. Performance implications must be considered.
The behaviour of ACCL is undefined when doing operations where the data being exchanged is larger than the size of the rx buffers. This is an easy mistake to overlook, and results in the CCLO hanging without clear cause. It would be helpful if the ACCL driver would issue a warning when trying to perform an operation that is too large to fit in the rx buffers.
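A minimal sketch of the kind of check the driver could perform before issuing an operation. The function names and the way the rx buffer size is passed in are assumptions made for illustration; they are not part of the existing ACCL driver API.

```cpp
#include <cstddef>
#include <iostream>

// Hypothetical helper (not an ACCL API): true if a transfer of `count`
// elements of `elem_bytes` bytes each does not fit into one rx buffer of
// `rx_buf_bytes` bytes. The division avoids overflow in count * elem_bytes.
bool exceeds_rx_buffer(std::size_t count, std::size_t elem_bytes,
                       std::size_t rx_buf_bytes) {
  if (elem_bytes == 0) return false;  // nothing to transfer
  return count > rx_buf_bytes / elem_bytes;
}

// Hypothetical wrapper: prints a warning instead of letting the call hang
// without a clear cause. Returns true if a warning was issued.
bool warn_if_too_large(std::size_t count, std::size_t elem_bytes,
                       std::size_t rx_buf_bytes) {
  if (!exceeds_rx_buffer(count, elem_bytes, rx_buf_bytes)) return false;
  std::cerr << "ACCL warning: " << count << " elements of " << elem_bytes
            << " bytes exceed the rx buffer size of " << rx_buf_bytes
            << " bytes\n";
  return true;
}
```

A driver could call such a check at the start of every primitive and collective, using the rx buffer size it was configured with.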
Observed on a test against the emulator, with 8 ranks. All tests run, some fail, test killed at the very end with:
[1,7]<stdout>:3 tests failed on rank 7 (skipped 1 tests).
[1,6]<stdout>:3 tests failed on rank 6 (skipped 1 tests).
[1,2]<stdout>:3 tests failed on rank 2 (skipped 1 tests).
[1,3]<stdout>:3 tests failed on rank 3 (skipped 1 tests).
[1,4]<stdout>:3 tests failed on rank 4 (skipped 1 tests).
[1,1]<stdout>:3 tests failed on rank 1 (skipped 1 tests).
[1,5]<stdout>:3 tests failed on rank 5 (skipped 1 tests).
[1,0]<stdout>:3 tests failed on rank 0 (skipped 1 tests).
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
A single kernel that sends commands to 3 or more CCLOs
The CCLOs are connected via an AXIS switch
The performance kernel can be connected to a timer to count the elapsed time between sending the command and receiving the ack
It can call the collective multiple times
This gets rid of PCIe communication
This is just a small usability bug I came across: most of the links in the section of the README that describes the repository structure are broken, and the structure seems to be outdated, especially for the demo /test directory.
HOST memory on selected shells allows direct DMA access into host DDR from the FPGA. Using HOST memory instead of FPGA DDR/HBM may reduce the latency of host-to-host collectives or primitives. Performance implications on throughput must be evaluated.
Add a new memory operation in the emulator and simulator that only specifies the memory address and size, and allocates the memory without initializing it. This will allow the use of bigger buffers without the large performance impact of sending large messages over the network.
Currently the XRT driver throws a "CCLO appears configured" error, even though reset_periph is called on deinit. This behavior is not observed in the pynq driver.
Add RDMA support in addition to TCP/UDP
To reproduce, run the tests against an emulator session with 8 ranks.
Some outputs are slightly different from the expected values:
[1,7]<stdout>:1th item is incorrect! (5035.578613 != 5035.579102)
[1,7]<stdout>:2th item is incorrect! (-5832.367676 != -5832.368164)
[1,7]<stdout>:3th item is incorrect! (6492.671387 != 6492.670898)
[1,7]<stdout>:6th item is incorrect! (7501.883789 != 7501.884766)
[1,7]<stdout>:7th item is incorrect! (6614.014648 != 6614.013672)
[1,7]<stdout>:13th item is incorrect! (-3544.027832 != -3544.028320)
[1,7]<stdout>:16th item is incorrect! (7886.101074 != 7886.100586)
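The mismatches above differ only around the seventh significant digit, which is about the resolution of single-precision floats. If the test compares values with exact equality (an assumption, the comparison code is not shown here), a tolerance-based comparison would accept such differences while still rejecting genuinely wrong data. The helper name and the tolerance values below are illustrative choices, not taken from the ACCL test suite.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical comparison helper: accepts two floats as equal if they differ
// by at most rel_tol relative to the larger magnitude, or by at most abs_tol
// for values close to zero. Both default tolerances are assumptions.
bool nearly_equal(float a, float b, float rel_tol = 1e-5f,
                  float abs_tol = 1e-6f) {
  const float diff = std::fabs(a - b);
  const float scale = std::max(std::fabs(a), std::fabs(b));
  return diff <= std::max(abs_tol, rel_tol * scale);
}
```

With these tolerances, the pair 5035.578613 and 5035.579102 from the log compares as equal, while the clearly wrong pairs from the bcast failure earlier on this page do not.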
I'm working on the integration of ACCL and OMPC. I am currently using the ACCL distributed emulation approach to start testing offloading computation to Alveo boards in a distributed system, with ACCL as the communication backend.
I've tried some scenarios:
Any scenario with 10 (or fewer) instances in total works fine (I can't test with 11 instances due to some integration constraints).
The stream ID seems to be reset somewhere during transmission. I added an extra test case to test/host/hls to show this behavior in this branch.
If stream_put is reading from a stream and the ACCL::dummy_buffer is used, the data type of the operation is not defined. Currently, a buffer with the required data type needs to be created and passed to the operation, although no buffers are used.
The same holds when sending data from a stream with send or when receiving data and forwarding it to a stream with recv.
Either we need an elegant way to define the data type for dummy buffers without having to allocate one manually, or we need separate function signatures for these kinds of operations.
The test script currently only tests a limited set of inputs, which sometimes results in bugs being merged into the project without us realizing. The script is also growing quite large and crashes on the first error. We should look into switching to a unit test framework like GoogleTest.
To have FPGAs talk to non-FPGA hosts we need a software implementation of the ACCL collectives protocol on top of TCP initially, then on top of RDMA.
The first execution of the test/host/xrt/test.cpp test works fine (using hardware). Subsequent runs of this test fail and show errors like the ones in the subsequent-run.log file (subsequent runs work fine in the pynq tri test). With the XRT driver, we are only able to execute the test successfully again after doing a hot reset on the board.
Enable compression e.g. FP16 transport/sums
Currently, we only have a single parameter to specify the source/destination tag of a message and the source/destination stream_id. This means that tag and stream_id have to match for every call.
However, the stream_id may also be used in applications to identify different user kernels. Forcing a mapping to a tag artificially reduces the number of valid tags for a message. Also, the tag range is larger than the stream_id range, which may lead to undefined behaviour in some cases.
Possible solutions could be:
recv and stream_put calls to allow the specification of the destination user kernel if the data is coming from a buffer.
A large part of the complexity of the firmware (and time spent in execution) is for managing the issuing and acknowledgement of DMA transfer segments. We should evaluate the feasibility and benefits of offloading the functionality to an HLS IP, which would reduce the firmware complexity and also improve latency, especially for small messages.
Implement a barrier with ACCL, to avoid having to go to system mpi for this functionality. Possibly reuse a ring collective with small dummy payload.
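One way to read the ring idea is a two-pass token scheme: a first pass around the ring confirms that every rank has arrived, and a second pass releases them. The sketch below only computes the message schedule as (source, destination) pairs; in a real implementation each pair would be an ACCL send/recv with a small dummy payload. The function name and the two-pass design are assumptions, not the project's implementation.

```cpp
#include <utility>
#include <vector>

// Message schedule (src, dst) for a two-pass ring barrier over n ranks.
// Pass 1: a token travels 0 -> 1 -> ... -> n-1 -> 0, so once rank 0 gets it
//         back, every rank has entered the barrier.
// Pass 2: a release travels 0 -> 1 -> ... -> n-1, so every rank may leave.
std::vector<std::pair<int, int>> ring_barrier_schedule(int n) {
  std::vector<std::pair<int, int>> msgs;
  if (n <= 1) return msgs;  // a single rank needs no messages
  for (int r = 0; r < n; ++r) msgs.emplace_back(r, (r + 1) % n);
  for (int r = 0; r + 1 < n; ++r) msgs.emplace_back(r, r + 1);
  return msgs;
}
```

This schedule uses 2n - 1 messages, and every rank receives from only one peer at a time, which fits a fan-in of 1.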
Reduce and Allreduce stall for large message sizes when using streaming sums. The problem is worse on 4 ranks than on 3.
The simulator does not write the results of allreduce back correctly.
The TCP stack stops responding to requests after a random amount of time, ranging from 3s up to about 1 minute. This may be related to internal time-outs in the TCP stack.
Currently, the simulator can run out of memory because memory cannot be reused.
This leads to failing tests for the C++ driver, because there are too many small allocations. A proper allocation scheme is required in the C++ driver to prevent this issue sustainably, both in the C++ tests and in applications using the C++ driver.
Currently the results of the reduce_scatter are ordered unintuitively. If the result array has n elements and there are n processes, rank i will receive the element at index (i + 1) % n. It would be more intuitive if rank i received index i instead.
Currently, all ACCL calls like send/recv require a reference to a BaseBuffer object as a mandatory argument. When streaming is used as input or output, this requires creating and passing a BaseBuffer that is not used at all.
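The ordering described for reduce_scatter can be written as a pair of index mappings, which may help when post-processing results until the ordering is changed. Both function names are made up for this illustration.

```cpp
// Index of the result element that `rank` receives under the ordering
// described in this issue, for n ranks and n result elements.
int current_index_of_rank(int rank, int n) { return (rank + 1) % n; }

// Inverse mapping: the rank that holds result element `index` under the
// same ordering.
int rank_holding_index(int index, int n) { return (index + n - 1) % n; }
```

For example, with n = 4 the last rank (3) receives index 0, and index 0 is accordingly found on rank 3.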
CCLO *ACCL::send(BaseBuffer &srcbuf, unsigned int count, unsigned int dst, unsigned int tag, communicatorId comm_id, bool from_fpga, streamFlags stream_flags, dataType compress_dtype, bool run_async, std::vector<CCLO *> waitfor)
There are different options to change/extend the API to overcome this issue:
ACCL::prepare_call but it would change the signatures for all calls.
ACCL::stream_send, ACCL::stream_recv ... that explicitly support streaming and only take the required arguments as input.
I personally would favor the last option because it is the most explicit. However, I am not sure if it is flexible enough to support further enhancements of ACCL in the future, since we would basically remove the user's control over the streaming flags.
I would also argue for a stream_recv call because the current way of handling communication via streams as one-sided communication is ambiguous. What happens in cases where two ranks send data to the same destination rank? The order in which the messages arrive is not defined, and there is no way to find out the current sender on the receiving side. Because of that, we need to use barriers to enforce the correct order. With the stream_recv calls we would be able to define the order in which the messages are received and handled by the user kernel.
I think all this needs further discussion to find the best solution. Maybe there are also technical restrictions I am not aware of.
Currently ACCL can only be called from the host. It's useful in some scenarios for Vitis kernels to call ACCL directly. For this, two things are required:
This issue is probably an edge case in the CCLO kernel, since both the XRT and PYNQ drivers for ACCL observe this issue.
In the XRT driver, the ACCL constructor currently requires the user to give the cclo and control kernel as xrt::ip and xrt::kernel objects.
But in some cases it may be easier for the user to assume default names for these kernels to allow reduction of the code required for initialization.
Therefore, an additional ACCL constructor taking the xrt::uuid instead of the kernels could be provided. The instantiation of the kernels is moved to the constructor itself. Optionally, the kernel names may be given as std::strings to the constructor.
Hi all,
I recently tried to compile a bitstream for the u250 (xilinx_u250_gen3x16_xdma_4_1_202210_1) and it fails during routing. The build was based on the last commit of the new_tcp branch (which is now merged into the dev branch).
The error message is the following:
ERROR: [Constraints 18-1000] Routing results verification failed due to partially-conflicted nets (Up to first 10 of violated nets): level0_i/level1/level1_i/ulp/arith_0/inst/dadd_64ns_64ns_64_5_full_dsp_1_U39/reduce_ops_dadd_64ns_64ns_64_5_full_dsp_1_ip_u/inst/i_synth/ADDSUB_OP.ADDSUB/LOGIC_SPEED.OP/B_IP_DELAY/i_pipe/b_frac_del[20] level0_i/level1/level1_i/ulp/arith_0/inst/dadd_64ns_64ns_64_5_full_dsp_1_U39/reduce_ops_dadd_64ns_64ns_64_5_full_dsp_1_ip_u/inst/i_synth/ADDSUB_OP.ADDSUB/LOGIC_SPEED.OP/B_IP_DELAY/i_pipe/b_frac_del[5] level0_i/level1/level1_i/ulp/arith_0/inst/fadd_32ns_32ns_32_7_full_dsp_1_U11/reduce_ops_fadd_32ns_32ns_32_7_full_dsp_1_ip_u/inst/i_synth/ADDSUB_OP.ADDSUB/DSP.OP/A_IP_DELAY/i_pipe/opt_has_pipe.first_q_reg[22]_0[5] level0_i/level1/level1_i/ulp/arith_0/inst/fadd_32ns_32ns_32_7_full_dsp_1_U11/reduce_ops_fadd_32ns_32ns_32_7_full_dsp_1_ip_u/inst/i_synth/ADDSUB_OP.ADDSUB/DSP.OP/DSP48E1_BODY.ALIGN_ADD/SUM_DELAY/i_pipe/opt_has_pipe.first_q_reg[25]_0 level0_i/level1/level1_i/ulp/arith_0/inst/fadd_32ns_32ns_32_7_full_dsp_1_U11/reduce_ops_fadd_32ns_32ns_32_7_full_dsp_1_ip_u/inst/i_synth/ADDSUB_OP.ADDSUB/DSP.OP/DSP48E1_BODY.ALIGN_ADD/SUM_DELAY/i_pipe/mant_norm[5] level0_i/level1/level1_i/ulp/memory_subsystem/inst/interconnect/interconnect_m00_axi_mem00/inst/m00_exit_pipeline/m00_exit/inst/aw_reg/skid_buffer[1144]_i_1__0_n_0 level0_i/level1/level1_i/ulp/memory_subsystem/inst/interconnect/interconnect_s06_axi/inst/m00_exit_pipeline/m00_exit/inst/ar_reg/Q[134] level0_i/level1/level1_i/ulp/arith_0/inst/dadd_64ns_64ns_64_5_full_dsp_1_U40/reduce_ops_dadd_64ns_64ns_64_5_full_dsp_1_ip_u/inst/i_synth/ADDSUB_OP.ADDSUB/LOGIC_SPEED.OP/ALIGN_BLK/FRAC_ADDSUB/DSP_ADD.FRAC_ADDSUB/i_no_versal_es1_workaround.DSP48E1_ADD.DSP48E1_ADD/i_no_versal_es1_workaround.DSP/P[26] 
level0_i/level1/level1_i/ulp/arith_0/inst/dadd_64ns_64ns_64_5_full_dsp_1_U40/reduce_ops_dadd_64ns_64ns_64_5_full_dsp_1_ip_u/inst/i_synth/ADDSUB_OP.ADDSUB/LOGIC_SPEED.OP/ALIGN_BLK/FRAC_ADDSUB/DSP_ADD.FRAC_ADDSUB/i_no_versal_es1_workaround.DSP48E1_ADD.DSP48E1_ADD/i_no_versal_es1_workaround.DSP/P[25] level0_i/level1/level1_i/ulp/arith_0/inst/dadd_64ns_64ns_64_5_full_dsp_1_U36/reduce_ops_dadd_64ns_64ns_64_5_full_dsp_1_ip_u/inst/i_synth/ADDSUB_OP.ADDSUB/LOGIC_SPEED.OP/ALIGN_BLK/FRAC_ADDSUB/DSP_ADD.FRAC_ADDSUB/i_no_versal_es1_workaround.DSP48E1_ADD.DSP48E1_ADD/i_no_versal_es1_workaround.DSP_7
Scatter gives wrong results when compression is used. This causes both accl.scatter and accl.reduce_scatter to fail.
Currently the receive pipeline supports fan-in == 1 and all collectives are ring-based as a result. Adding support for fan-in > 1 would enable tree collectives.
Currently it is not possible to have the same source and destination rank for send/recv.
A possible replacement for this call may be a copy from or to stream to buffer the data manually.
It would be useful to add support for in-place operators in the reduce collectives, so that you can pass the same buffer for input and output.
For some collectives it's impossible to use streaming input/output, for example reduce can't use a stream as output. For new users it might not be easy to find out where this feature is supported. It would be useful to have a table on readthedocs that shows for each argument of all collectives whether it supports streaming or not.
Application b_eff works in emulation and simulation as expected.
ACCL XRT tests succeed with the bitstream (except streaming send/recv which is not used by the application).
Application hangs on first receive when executed on hardware.
Run run_mpi.sh in the following directory: /proj/xlabs_t3/users/mariusm/runs/2022-10-20-b_eff_accl_test_u55c
Application repo: https://gitenterprise.xilinx.com/mariusm/HPCC_FPGA.git
Bitstream: /proj/xlabs_t3/users/mariusm/synth/benchmarks/b_eff/u55c_pl/build
stream_put requires a stream_id to be passed. This stream_id needs to be >= 9. In the C++ driver, 9 is subtracted from this id in this line, for non-obvious reasons.
This leads to the issue that if stream_put is called with stream_id=9 and source and destination rank are the same, it will send the data to stream_id=0.
The behavior is described in this test, which should pass in my eyes.
For send/recv, the stream_id/tag is used as expected, as can be seen in this test.
In my opinion, the behavior should be the same for both operations.
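The difference between the two operations can be modelled in a few lines. This is only a model of the behaviour described in this issue, with hypothetical function names; it is not the driver code itself.

```cpp
#include <stdexcept>

// Offset from the issue text: the C++ driver subtracts 9 from the stream_id
// passed to stream_put.
constexpr unsigned kStreamIdOffset = 9;

// Model of stream_put as described: ids below 9 are rejected and the
// remaining ids are shifted down, so stream_id 9 ends up at stream 0.
unsigned stream_put_target_stream(unsigned stream_id) {
  if (stream_id < kStreamIdOffset) {
    throw std::invalid_argument("stream_id must be >= 9");
  }
  return stream_id - kStreamIdOffset;
}

// Model of send/recv as described: the stream_id/tag is used unchanged.
unsigned send_target_stream(unsigned stream_id) { return stream_id; }
```

Under this model, the same value 9 addresses stream 0 through stream_put but stream 9 through send/recv, which is the inconsistency the issue points out.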
In the CCLO description, the streaming connections to user kernels are described in the text but not visualized in the figure.
Maybe this visualization could then be extended to describe the data path of streaming operations.
Buffers get allocated sequentially and never get de-allocated, which can create problems especially with the simulator, which has very little memory. There needs to be a method to delete buffers and reuse the freed memory for new buffers (i.e. a real allocator)
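A minimal sketch of what such an allocator could look like: a first-fit free list over a fixed address range that merges neighbouring free blocks on deallocation. The class name and interface are hypothetical and alignment is ignored; this is not the allocator used by ACCL.

```cpp
#include <cstdint>
#include <iterator>
#include <map>
#include <new>
#include <stdexcept>

// First-fit allocator over the address range [0, size). Hypothetical sketch.
class RangeAllocator {
public:
  explicit RangeAllocator(std::uint64_t size) {
    if (size > 0) free_[0] = size;
  }

  // Returns the start address of a block of `size` bytes, or throws
  // std::bad_alloc if no free block is large enough.
  std::uint64_t allocate(std::uint64_t size) {
    if (size == 0) throw std::invalid_argument("size must be > 0");
    for (auto it = free_.begin(); it != free_.end(); ++it) {
      if (it->second < size) continue;
      const std::uint64_t addr = it->first;
      const std::uint64_t remaining = it->second - size;
      free_.erase(it);
      if (remaining > 0) free_[addr + size] = remaining;
      used_[addr] = size;
      return addr;
    }
    throw std::bad_alloc();
  }

  // Frees the block starting at `addr` and merges it with free neighbours.
  void deallocate(std::uint64_t addr) {
    auto used = used_.find(addr);
    if (used == used_.end()) throw std::invalid_argument("unknown address");
    std::uint64_t start = addr;
    std::uint64_t len = used->second;
    used_.erase(used);
    auto next = free_.lower_bound(start);
    if (next != free_.begin()) {
      auto prev = std::prev(next);
      if (prev->first + prev->second == start) {  // merge with block before
        start = prev->first;
        len += prev->second;
        free_.erase(prev);
      }
    }
    if (next != free_.end() && start + len == next->first) {  // merge after
      len += next->second;
      free_.erase(next);
    }
    free_[start] = len;
  }

private:
  std::map<std::uint64_t, std::uint64_t> free_;  // start address -> length
  std::map<std::uint64_t, std::uint64_t> used_;  // start address -> length
};
```

Merging on deallocation keeps the free list from fragmenting into many tiny blocks, which matters most for the simulator with its small memory.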
In some specific development environments, the emulator and simulator hang on opening the ports to communicator ranks. This specifically happens when using the Python driver.
Provide alltoall collective in ACCL. Explore ring-based implementation for fanin==1 and broadcast-based implementations for fanin >= 1.
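As a starting point for the fan-in 1 case, the sketch below computes a shifted exchange schedule in which every rank receives from exactly one peer per step. This is one possible schedule chosen for illustration, not necessarily the ring implementation this issue has in mind, and the type and function names are made up.

```cpp
#include <vector>

// One step of the schedule for a single rank.
struct Exchange {
  int step;       // 1 .. n-1
  int send_to;    // peer that gets this rank's block in this step
  int recv_from;  // the only peer this rank receives from in this step
};

// All-to-all schedule for `rank` out of n ranks: in step s the rank sends to
// (rank + s) % n and receives from (rank - s + n) % n.
std::vector<Exchange> alltoall_schedule(int rank, int n) {
  std::vector<Exchange> schedule;
  for (int s = 1; s < n; ++s) {
    schedule.push_back({s, (rank + s) % n, (rank - s + n) % n});
  }
  return schedule;
}
```

After n - 1 steps every rank has exchanged one block with every other rank. The block destined for the rank itself is a local copy and is left out of the schedule.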
Emulation with >8 threads is not supported. Requested by 2 users.