My compiler produced the following instruction, which Vicuna reports as illegal. It si

This instruction is implemented and <a href="https://github.com/vproc/vicuna/blob/224f

<a class="issue-link js-issue-link" data-error-text="Failed to load title" data-id="11

Illegal instruction? about vicuna HOT 13 CLOSED

vproc commented on June 19, 2024

Illegal instruction?

from vicuna.

Comments (13)

michael-platzer commented on June 19, 2024

This instruction is implemented and handled by the decoder. There is a unit test for this instruction that passes. Also, the destination register is correctly identified as a scalar register, which prevents any address alignment checks that could trigger an illegal instruction.

Is it possible that previous code has caused an invalid configuration (with the vill bit in vtype being set)? Otherwise I do not see how Vicuna could flag this instruction as illegal.

from vicuna.

stevobailey commented on June 19, 2024

Thanks, I looked into it further. Here's some more details:

C code with intrinsics:

void vect_addReduction(unsigned int N, const int8_t *vec1, int16_t *scalarOut){
  // set result vector to 0
  size_t vl = vsetvl_e16m1(1);
  vint16m1_t v2 = vmv_v_x_i16m1(0, vl);
  
  while (N > 0) {
    vl = vsetvl_e8m1(N);
    vint8m2_t v1 = vle8_v_i8m2(vec1, vl);
    v2 = vwredsum_vs_i8m2_i16m1(v2, v1, v2, vl);
    vec1 += vl;
    N -= vl;
  }
  
  // return result
  *scalarOut = vmv_x_s_i16m1_i16(v2);
}

Produces this assembly:

00013d1a <vect_addReduction>:
   13d1a: 4785                  li  a5,1 
   13d1c: 0087f7d7            vsetvli a5,a5,e16,m1,tu,mu
   13d20: 0087f7d7            vsetvli a5,a5,e16,m1,tu,mu
   13d24: 5e0030d7            vmv.v.i v1,0 
   13d28: e901                  bnez  a0,13d38 <vect_addReduction+0x1e>
   13d2a: 00807057            vsetvli zero,zero,e16,m1,tu,mu
   13d2e: 421027d7            vmv.x.s a5,v1
   13d32: 00f61023            sh  a5,0(a2)
   13d36: 8082                  ret
   13d38: 000577d7            vsetvli a5,a0,e8,m1,tu,mu
   13d3c: 0017f757            vsetvli a4,a5,e8,m2,tu,mu
   13d40: 02058107            vle8.v  v2,(a1)
   13d44: 0017f757            vsetvli a4,a5,e8,m2,tu,mu
   13d48: 9e103257            vmv1r.v v4,v1
   13d4c: 95be                  add a1,a1,a5
   13d4e: 8d1d                  sub a0,a0,a5
   13d50: c62200d7            vwredsum.vs v1,v2,v4
   13d54: bfd1                  j 13d28 <vect_addReduction+0xe>

It's a little different from the original post, but the same instruction is causing an error: 421027d7 vmv.x.s a5,v1

Probing deeper, I see that these two settings

vicuna/rtl/vproc_core.sv

Line 390 in 52c9ee1

unique case ({vsew_q, dec_data_q.mode.cfg.vsew})

are {VSEW_8, VSEW_16}, and this leads us to check the LMUL settings at

vicuna/rtl/vproc_core.sv

Line 407 in 52c9ee1

unique case ({lmul_q, dec_data_q.mode.cfg.lmul})

which are {LMUL_2, LMUL_1}. Since that combination is not in the case statement, vsew_q becomes VSEW_INVALID, and the whole instruction is flagged as invalid.

A few things to note:

This simulates okay in Spike
The C code is manually written, so I could be doing something wrong here (for example, do I need set v2 as the return value of this line when it is also in the dst argument? v2 = vwredsum_vs_i8m2_i16m1(v2, v1, v2, vl);)
This is using the GCC compiler, which is not as up-to-date as Clang; Clang produces assembly without this instruction, so maybe it's a GCC issue?

Let me know if there are specific register/signal values or waveforms you need and I'll post them. Thanks!

from vicuna.

michael-platzer commented on June 19, 2024

Thanks for the detailed analysis!

The instruction at address 13d2a (vsetvli zero,zero,e16,m1,tu,mu) uses x0 as both the source and destination register, which allows to change the vtype setting while retaining the current vector length vl. According to the spec:

When rs1=x0 and rd=x0, the instruction operates as if the current vector length in vl is used as the AVL, and the resulting
value is written to vl, but not to a destination register. This form can only be used when VLMAX and hence vl is not actually
changed by the new SEW/LMUL ratio. Use of the instruction with a new SEW/LMUL ratio that would result in a change of
VLMAX is reserved. Implementations may set vill in this case.

At the time the instruction is executed vsew_q is VSEW_8 and lmul_q is LMUL_2, which was probably set by vsetvli a4,a5,e8,m2,tu,mu at address 13d44 (before jumping to 13d28 via the jump at 13d54). The vsetvli instruction at address 13d2a now attempts to change the configuration to VSEW_16 and LMUL_1.

Changing the element width from VSEW_8 to VSEW_16 means that each vector register element will occupy twice the space, effectively halving VLMAX. This can be compensated by doubling the vector length multiplier. However, the length multiplier is reduced from LMUL_2 to LMUL_1 instead. Thus, the configuration change would effectively divide VLMAX by four, which is reserved for this variant of vsetvli. Therefore, Vicuna sets vsew_q to VSEW_INVALID (which corresponds to the vill bit being set in the vtype CSR) and any subsequent attempt to execute a vector instruction results in an illegal instruction exception.

Note that the spec allows but does not require implementations to set the vill bit, which is probably why the issue does not appear in spike. Regardless, this sequence of vsetvli instructions is reserved and the compiler should not produce such code.

from vicuna.

stevobailey commented on June 19, 2024

Thanks, this makes sense. At first glance, it seems the issue is in my code. When I return the result, I don't update the vector length or LMUL value. If I do, then the code runs on Vicuna as expected:

  // return result
  vl = vsetvl_e16m1(0);
  *scalarOut = vmv_x_s_i16m1_i16(v2);
}

But on further inspection of the spec, it says:

Specifically,

The instructions ignore LMUL and vector register groups.

and

vmv.x.s performs its operation even if vstart ≥ vl or vl=0.

which suggest that the above considerations aren't valid for these instructions?

from vicuna.

michael-platzer commented on June 19, 2024

I believe the intention with the C intrinsics for vector operations is that the compiler automatically inserts vsetvl instructions as required in most cases. Your C code seems to be correct. gcc should not use a vsetvli with rs1=x0 and rd=x0 unless it can be certain that VLMAX is not being changed.

The vmv.x.s instruction ignores LMUL but it does not ignore SEW, both of which are updated simultaneously by each vsetvl instruction. Vicuna sets vsew_q to VSEW_INVALID to encode an illegal configuration (i.e. vill bit set), as suggested by the spec:

A small implementation supporting ELEN=32 requires only seven bits of state in vtype: two bits for ma and ta, two bits for
vsew[1:0] and three bits for vlmul[2:0]. The illegal value represented by vill can be internally encoded using the illegal 64-bit
combination in vsew[1:0] without requiring an additional storage bit to hold vill.

Most vector instructions will fail if the configuration is invalid:

If the vill bit is set, then any attempt to execute a vector instruction that depends upon vtype will raise an illegal-
instruction exception.

The spec also notes that:

vset{i}vl{i} and whole-register loads, stores, and moves do not depend upon vtype.

Actually, Vicuna does not fully comply with the spec in this regard, since whole-register instructions are also flagged as illegal if the configuration is invalid. I will fix this.

from vicuna.

stevobailey commented on June 19, 2024

Thanks. It appears that Clang also outputs a vsetvli with rs1=x0 and rd=x0, albeit for a different function in my code. This has the same issue, where VLMAX is changed. I guess I'll have to file a bug with LLVM :(

https://godbolt.org/z/1a99z4obo

from vicuna.

stevobailey commented on June 19, 2024

llvm/llvm-project#53577

from vicuna.

stevobailey commented on June 19, 2024

Craig replied almost instantly. I'll copy it here:

The instruction generated is vsetivli which is a different instruction than vsetvli. There's an extra 'i' in the middle. As far as I know vsetivli allows an immediate of 0. This should be different than the x0, x0 encoding for vsetvli.

Does this change things?

from vicuna.

stevobailey commented on June 19, 2024

Sorry, looking into it further, the original post used the vsetvli instruction, which is different from the LLVM output using vsetivli. I think the LLVM-compiled code is also failing my simulation, so let me confirm. If so, it's probably a different issue. I'll close this issue. It's likely a bug in GCC, but I think LLVM is the way to go for now.

from vicuna.

michael-platzer commented on June 19, 2024

Thanks for the updates and sorry for the late reply!

It seems that Vicuna incorrectly interprets a vsetivli with an immediate value of 0 the same a vsetvli with x0 as source operand. I am working on fixing this.

from vicuna.

stevobailey commented on June 19, 2024

Awesome, thanks! Do you want a new issue for that bug?

from vicuna.

michael-platzer commented on June 19, 2024

I am reopening this issue as a reminder and will close it once that is fixed.

from vicuna.

michael-platzer commented on June 19, 2024

Vicuna's handling of the special AVL encodings should now be spec compliant.

from vicuna.

Illegal instruction? about vicuna HOT 13 CLOSED

Comments (13)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent