Comments (13)
This instruction is implemented and handled by the decoder. There is a unit test for this instruction that passes. Also, the destination register is correctly identified as a scalar register, which prevents any address alignment checks that could trigger an illegal instruction.
Is it possible that previous code has caused an invalid configuration (with the vill
bit in vtype
being set)? Otherwise I do not see how Vicuna could flag this instruction as illegal.
from vicuna.
Thanks, I looked into it further. Here's some more details:
C code with intrinsics:
void vect_addReduction(unsigned int N, const int8_t *vec1, int16_t *scalarOut){
// set result vector to 0
size_t vl = vsetvl_e16m1(1);
vint16m1_t v2 = vmv_v_x_i16m1(0, vl);
while (N > 0) {
vl = vsetvl_e8m1(N);
vint8m2_t v1 = vle8_v_i8m2(vec1, vl);
v2 = vwredsum_vs_i8m2_i16m1(v2, v1, v2, vl);
vec1 += vl;
N -= vl;
}
// return result
*scalarOut = vmv_x_s_i16m1_i16(v2);
}
Produces this assembly:
00013d1a <vect_addReduction>:
13d1a: 4785 li a5,1
13d1c: 0087f7d7 vsetvli a5,a5,e16,m1,tu,mu
13d20: 0087f7d7 vsetvli a5,a5,e16,m1,tu,mu
13d24: 5e0030d7 vmv.v.i v1,0
13d28: e901 bnez a0,13d38 <vect_addReduction+0x1e>
13d2a: 00807057 vsetvli zero,zero,e16,m1,tu,mu
13d2e: 421027d7 vmv.x.s a5,v1
13d32: 00f61023 sh a5,0(a2)
13d36: 8082 ret
13d38: 000577d7 vsetvli a5,a0,e8,m1,tu,mu
13d3c: 0017f757 vsetvli a4,a5,e8,m2,tu,mu
13d40: 02058107 vle8.v v2,(a1)
13d44: 0017f757 vsetvli a4,a5,e8,m2,tu,mu
13d48: 9e103257 vmv1r.v v4,v1
13d4c: 95be add a1,a1,a5
13d4e: 8d1d sub a0,a0,a5
13d50: c62200d7 vwredsum.vs v1,v2,v4
13d54: bfd1 j 13d28 <vect_addReduction+0xe>
It's a little different from the original post, but the same instruction is causing an error: 421027d7 vmv.x.s a5,v1
Probing deeper, I see that these two settings
Line 390 in 52c9ee1
are
{VSEW_8, VSEW_16}
, and this leads us to check the LMUL settings atLine 407 in 52c9ee1
which are
{LMUL_2, LMUL_1}
. Since that combination is not in the case statement, vsew_q
becomes VSEW_INVALID
, and the whole instruction is flagged as invalid.
A few things to note:
- This simulates okay in Spike
- The C code is manually written, so I could be doing something wrong here (for example, do I need set v2 as the return value of this line when it is also in the dst argument?
v2 = vwredsum_vs_i8m2_i16m1(v2, v1, v2, vl);
) - This is using the GCC compiler, which is not as up-to-date as Clang; Clang produces assembly without this instruction, so maybe it's a GCC issue?
Let me know if there are specific register/signal values or waveforms you need and I'll post them. Thanks!
from vicuna.
Thanks for the detailed analysis!
The instruction at address 13d2a
(vsetvli zero,zero,e16,m1,tu,mu
) uses x0
as both the source and destination register, which allows to change the vtype
setting while retaining the current vector length vl
. According to the spec:
When rs1=x0 and rd=x0, the instruction operates as if the current vector length in vl is used as the AVL, and the resulting
value is written to vl, but not to a destination register. This form can only be used when VLMAX and hence vl is not actually
changed by the new SEW/LMUL ratio. Use of the instruction with a new SEW/LMUL ratio that would result in a change of
VLMAX is reserved. Implementations may set vill in this case.
At the time the instruction is executed vsew_q
is VSEW_8
and lmul_q
is LMUL_2
, which was probably set by vsetvli a4,a5,e8,m2,tu,mu
at address 13d44
(before jumping to 13d28
via the jump at 13d54
). The vsetvli
instruction at address 13d2a
now attempts to change the configuration to VSEW_16
and LMUL_1
.
Changing the element width from VSEW_8
to VSEW_16
means that each vector register element will occupy twice the space, effectively halving VLMAX
. This can be compensated by doubling the vector length multiplier. However, the length multiplier is reduced from LMUL_2
to LMUL_1
instead. Thus, the configuration change would effectively divide VLMAX
by four, which is reserved for this variant of vsetvli
. Therefore, Vicuna sets vsew_q
to VSEW_INVALID
(which corresponds to the vill
bit being set in the vtype
CSR) and any subsequent attempt to execute a vector instruction results in an illegal instruction exception.
Note that the spec allows but does not require implementations to set the vill
bit, which is probably why the issue does not appear in spike. Regardless, this sequence of vsetvli
instructions is reserved and the compiler should not produce such code.
from vicuna.
Thanks, this makes sense. At first glance, it seems the issue is in my code. When I return the result, I don't update the vector length or LMUL value. If I do, then the code runs on Vicuna as expected:
// return result
vl = vsetvl_e16m1(0);
*scalarOut = vmv_x_s_i16m1_i16(v2);
}
But on further inspection of the spec, it says:
Specifically,
The instructions ignore LMUL and vector register groups.
and
vmv.x.s performs its operation even if vstart ≥ vl or vl=0.
which suggest that the above considerations aren't valid for these instructions?
from vicuna.
I believe the intention with the C intrinsics for vector operations is that the compiler automatically inserts vsetvl
instructions as required in most cases. Your C code seems to be correct. gcc should not use a vsetvli
with rs1=x0 and rd=x0 unless it can be certain that VLMAX
is not being changed.
The vmv.x.s
instruction ignores LMUL but it does not ignore SEW, both of which are updated simultaneously by each vsetvl
instruction. Vicuna sets vsew_q
to VSEW_INVALID
to encode an illegal configuration (i.e. vill
bit set), as suggested by the spec:
A small implementation supporting ELEN=32 requires only seven bits of state in vtype: two bits for ma and ta, two bits for
vsew[1:0] and three bits for vlmul[2:0]. The illegal value represented by vill can be internally encoded using the illegal 64-bit
combination in vsew[1:0] without requiring an additional storage bit to hold vill.
Most vector instructions will fail if the configuration is invalid:
If the vill bit is set, then any attempt to execute a vector instruction that depends upon vtype will raise an illegal-
instruction exception.
The spec also notes that:
vset{i}vl{i} and whole-register loads, stores, and moves do not depend upon vtype.
Actually, Vicuna does not fully comply with the spec in this regard, since whole-register instructions are also flagged as illegal if the configuration is invalid. I will fix this.
from vicuna.
Thanks. It appears that Clang also outputs a vsetvli
with rs1=x0 and rd=x0, albeit for a different function in my code. This has the same issue, where VLMAX
is changed. I guess I'll have to file a bug with LLVM :(
https://godbolt.org/z/1a99z4obo
from vicuna.
from vicuna.
Craig replied almost instantly. I'll copy it here:
The instruction generated is vsetivli which is a different instruction than vsetvli. There's an extra 'i' in the middle. As far as I know vsetivli allows an immediate of 0. This should be different than the x0, x0 encoding for vsetvli.
Does this change things?
from vicuna.
Sorry, looking into it further, the original post used the vsetvli
instruction, which is different from the LLVM output using vsetivli
. I think the LLVM-compiled code is also failing my simulation, so let me confirm. If so, it's probably a different issue. I'll close this issue. It's likely a bug in GCC, but I think LLVM is the way to go for now.
from vicuna.
Thanks for the updates and sorry for the late reply!
It seems that Vicuna incorrectly interprets a vsetivli
with an immediate value of 0 the same a vsetvli
with x0
as source operand. I am working on fixing this.
from vicuna.
Awesome, thanks! Do you want a new issue for that bug?
from vicuna.
I am reopening this issue as a reminder and will close it once that is fixed.
from vicuna.
Vicuna's handling of the special AVL encodings should now be spec compliant.
from vicuna.
Related Issues (20)
- Vicuna accepts instructions for which source registers are not valid.
- Wrong result generated by multiply unit (probably control logic related) HOT 1
- Wrong operand for `vwmacc(u|us|su).vx`
- narrowing instructions are never popped from the instruction queue HOT 1
- Vicuna + Ibex and WFI
- No way to clear a cache error?
- fail to set VREG_W=2048 HOT 2
- Floating point support.
- Combinatorial Loop Alert while Generating bitstream for vicuna using CV32E40X as a scalar core
- 'Illegal Instruction' when executing sign and zero extend functions when destination LMUL=8
- Error in Questasim Simulation
- Reserved word not implemented: 'config' HOT 1
- Machine mode CSRs not accessible with Ibex host core when using Vicuna's verilated model HOT 1
- Certain applications when executed with dual and triple pipeline configurations on verilated model of Vicuna hang indefinittely
- Applications hang indefinitely on Verilator when size of data cache is different than 0 HOT 1
- Suggestion for vectorizing MaxPool and Convolution Layer HOT 1
- Rounding issue for `vasub(u).(vv|vx)`
- Tail-undisturbed policy violation for comparison instructions.
- Masking not working
- `vslidedown.(vx|vi)` issue when VLMAX is exceeded
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from vicuna.