arm-software / acle

Arm C Language Extensions (ACLE)

License: Other

Shell 13.88% Python 52.48% HTML 14.69% SCSS 7.58% TeX 10.13% Dockerfile 1.24%

acle's Issues

[proposal][FMV] Make target_clones and target_version mixable

As proposed in llvm/llvm-project#74358 (comment), let's make it possible to mix target_clones with target_version.

__attribute__((target_clones("dotprod", "aes"))) 
int callee(void) {
    return 42;
}

__attribute__((target_version("default"))) 
int callee(void) {
    return 0;
}

int caller(void) {
    return callee();
}

Optimize YAML content in the github.io dev branch

Merge all the YAML content (both the PDF and the HTML one) into the header of the md file of the specs, and remove the mechanisms that use the

We should check that we are happy with both the PDF and HTML rendering.

[BUG] __ARM_FEATURE_CRYPTO replacements have unclear descriptions

Describe the bug

In the Q2 2018 version of the document, __ARM_FEATURE_CRYPTO was deprecated in favor of finer-grained __ARM_FEATURE_AES and __ARM_FEATURE_SHA2 flags. But the ARM cryptography extension originally covered four features: FEAT_AES, FEAT_PMULL, FEAT_SHA1, and FEAT_SHA256.

Given how critical FEAT_PMULL is to AES-GCM, the most common AES mode, I assume the intent was to cover FEAT_PMULL under __ARM_FEATURE_AES. But ACLE just says "the AES Crypto instructions from Armv8-A are supported and intrinsics targeting them are available". But looking through "Arm Architecture Reference Manual; Armv8, for A-profile architecture", AES is originally defined under "Armv8.0 Cryptographic Extension", which covers all four. It later says:

From Armv8.2, an implementation of the Armv8.0 Cryptographic Extension can include either or both of:

The AES functionality, including support for multiplication of 64-bit polynomials. The
ID_AA64ISAR0_EL1.AES field indicates whether this functionality is supported.

The SHA1 and SHA2-256 functionality. The ID_AA64ISAR0_EL1.{SHA2, SHA1} fields indicate whether
this functionality is supported.

I assume this implies that "the AES crypto instructions" includes pmull? This is confusing because the phrase "AES crypto instructions" doesn't appear in the reference manual. Is the terminology defined in another document instead? Either way, this is confusing enough that I think it's simplest for ACLE to list this under __ARM_FEATURE_AES explicitly.

Next, ACLE says __ARM_FEATURE_SHA2 covers "the SHA1 & SHA2 Crypto instructions from Armv8-A are supported", so that covers FEAT_SHA1. This one's okay, but "SHA2 Crypto instructions" is an odd way to refer to SHA-256 because SHA-512 is also in the SHA-2 family. And indeed the ARM reference manual says "SHA2-256". Better to clarify this, especially given...

__ARM_FEATURE_SHA512 is defined in terms of "the SHA2 Crypto instructions from Armv8.2-A are supported and intrinsics targeting them". This is extra confusing because A2.3.1 of the reference manual refers to this as "FEAT_SHA512", "Advanced SIMD SHA512 instructions", and "SHA2-512 functionality". Any reason not to specify SHA-512? Defining it relative to v8.2 is especially confusing because it seems splitting v8.0's crypto extensions into AES/PMULL vs SHA-1/SHA-256 itself dates to v8.2.

Screenshots

n/a

Arm Neon Intrinsics guarded by `__ARM_FEATURE_SVE`?

In the 2021Q2 version of neon_intrinsics/advsimd.rst (in the Basic Intrinsics section) the documentation states: "The intrinsics in this section are guarded by the macro __ARM_FEATURE_SVE."

Cf. https://raw.githubusercontent.com/ARM-software/acle/6232924d3a1e5df0dab6dde5472d81a3ec03a391/neon_intrinsics/advsimd.rst

Is it possible that __ARM_NEON should be used instead?

[BUG] Phrase in "Name mangling" section is missing words

Name mangling
The "default" version is not mangled top of the language specific name mangling.

From comments by Sally Neale: "Are there missing words between 'mangled' and 'top'? Should it read: 'The "default" version is not mangled at the top of the language specific name mangling'?"

Improve github.io pages

After merging #65 , we need to:

#81
make sure that the CI and all contributors badges show up in the Home page.
Add a button that says "report a bug" that points at https://github.com/ARM-software/acle/issues (done in #86)
Add a button that says "contribute with a proposal" that points at the top folder CONTRIBUTING.md (done in #86 )
remove Table of Contents from main README.md (done in #82)
In the navbar, change Date to Release date. (done in #85)
When clicking on sections in the TOC, it would be good if the section would not disappear under the nav bar. (Proposed solution: https://stackoverflow.com/questions/10732690/offsetting-an-html-anchor-to-adjust-for-fixed-header) (done in #94: https://arm-software.github.io/acle/mve_intrinsics/mve.html#extract-one-element-from-vector )
Add a button to go back to the table of contents in the renderings of the specs (done in #118 )

[BUG] Error: General purpose registers may not be the same -- `vmov q3[2],q3[0],r2,r2'

I tried to optimize the kf_bfly4 function of opus_fft_impl in opus 1.3.1, but the compiler reported "General purpose registers may not be the same -- vmov..." when the following code snippet was added. The bug appears when gcc is run with O2 optimization and disappears with O0. How can I solve this? Thanks.
"
vst1q_s32(ai, Fout_4.val[0]);
vst1q_s32(ai+m1, Fout_4.val[1]);
vst1q_s32(ai+m2, Fout_4.val[2]);
vst1q_s32(ai+m3, Fout_4.val[3]);
"

[BUG] CMSE section 8.4.2: the example mentions soft floating-point ABI wrongly

Describe the bug

In CMSE section 8.4.2, the example states that some code sequence that clears FP registers is relevant due to the "soft float ABI".
This is wrong because FP registers aren't available in the soft float ABI to begin with.

Bug report by Thomas Grocutt.

Our commitment

We will work to solve the bug report in time for the upcoming
release. However, we would like to encourage you to submit the fix
yourself, if possible. If you intend to do so and this is your first
contribution, we recommend reading our contribution
guidelines.

[BUG] poly64 load intrinsics are not available on v7

The intrinsics vld1_p64 and vld1q_p64, as well as other intrinsics that construct poly64s like vcreate_p64 and vreinterpret_p64_s32 are not supported for v7, even though intrinsics accepting poly64 vectors as arguments like vadd_p64 are supported (with no way to construct their arguments).

acle/tools/intrinsic_db/advsimd.csv

Line 2077 in 6eb8516

 poly64x1_t vld1_p64(poly64_t const *ptr) ptr -> Xn LD1 {Vt.1D},[Xn] Vt.1D -> result A32/A64 

[proposal] Make intrinsics available regardless of feature macros

Availability of an intrinsic is sometimes gated on a feature macro such as __ARM_FEATURE_RNG. These macros are only turned on when the compilation unit is compiled for a given architecture, e.g. with the march flag.

A given function may be compiled for a specific target feature, with the availability check done at runtime.

__attribute__((target("rand")))
void foo(void){
// __rndr() can't be used here. 
}

Some implementation already allows such a behaviour to favour intrinsics use instead of inline assembly.
Some code manually enables these feature macros just to make the above example code compile.

Proposal: Make all intrinsics always available. Keep the feature macros to keep the indication of general use of the feature.

Compilers may still validate the availability of the instruction in a given context, but the developer remains responsible for its correct use.

[BUG] Navigation bar covering title.

Describe the bug

The pages on github.io do not render correctly on small screens. The navigation bar becomes too tall and covers the title of the page. See this address: https://arm-software.github.io/acle/morello/morello.html

Screenshots
I have attached a screenshot of the rendering I see on my 13in laptop.

improve the rendering of the tables of intrinsics

odd/even colors for the rows

Typo in vaddq_s16

In the vaddq_s16 intrinsic (https://github.com/ARM-software/acle/blob/main/tools/intrinsic_db/advsimd.csv#L20) the type for the b parameter is int16x8q_t; it should be int16x8_t.
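A typo like this could be caught mechanically. The following is only a sketch of a hypothetical lint, not something that exists in the repository: the regular expression and the function name `is_valid_vector_type` are assumptions, and the pattern is deliberately loose (it checks the general shape of a vector type name and that the total width is 64 or 128 bits).

```python
import re

# Hypothetical pattern for Advanced SIMD vector type names such as
# int16x8_t or float32x4x2_t. This is an assumption for illustration,
# not a rule taken from the ACLE repository.
VECTOR_TYPE = re.compile(
    r"(?:u?int|float|poly|bfloat)(8|16|32|64)x(\d+)(?:x[234])?_t"
)


def is_valid_vector_type(name: str) -> bool:
    """Return True if name looks like a 64-bit or 128-bit vector type."""
    match = VECTOR_TYPE.fullmatch(name)
    if match is None:
        return False
    element_bits = int(match.group(1))
    lanes = int(match.group(2))
    # A D register holds 64 bits and a Q register holds 128 bits.
    return element_bits * lanes in (64, 128)
```

Run over the type columns of advsimd.csv, a check of this kind would accept int16x8_t and reject int16x8q_t.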

Broken link on https://arm-software.github.io/acle/cmse/

https://arm-software.github.io/acle/cmse/ contains a broken link in the Defects reports section on the text Arm®v8-M Security Extensions - Requirements on Development Tools.

[BUG] No specification for __ARM_FEATURE_BF16

The macro __ARM_FEATURE_BF16 is referenced a couple of times in the ACLE specification, but there is no specification of exactly when this macro should be defined by the compiler. That leaves implementers to infer this. Given the text, a possible inference (though probably wrong) is that the compiler supplies the header arm_bf16.h.

There is a similar issue for __ARM_FEATURE_BF16_SCALAR_ARITHMETIC, which is referenced, but not defined (unlike the vector variant).

[CI] Detect broken internal links in markdown files generated from RST

The issue

The conversion from RST to Markdown has lost the ability to detect bugs in internal hyperlinks.

In the RST version, we were checking the internal hyperlinks via rstcheck. When going to markdown, sectioning anchors like ssec-whatever have lost meaning.

For example, the hyperlink ssec-whatever_ in the RST sources would point to the following anchor

.. _ssec-whatever:

The section title
----------------

The translation to markdown via pandoc has lost track of these hyperlinks, because in a pandoc (or GitHub) world the header in the example would have been converted into

# The section title

For this header, the internal hyperlink would take the form [link text](#the-section-title). Therefore, references that use the original anchor name from the RST file, like [link text](#ssec-whatever), have lost their target and don't map to the correct place in the docs. This can also be seen in issue #7.
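To make the mismatch concrete, here is a minimal sketch that derives anchors from Markdown headings and reports internal links with no matching heading. It is an approximation under stated assumptions: `heading_anchor` only mimics the general lowercase-and-hyphenate behaviour described above, not the exact rules of pandoc or GitHub, and both function names are hypothetical. It also does not skip fenced code blocks, so a `# comment` line inside a code block would be treated as a heading.

```python
import re


def heading_anchor(title: str) -> str:
    """Approximate the anchor generated for a Markdown heading."""
    slug = title.strip().lower()
    slug = re.sub(r"[^\w\s.-]", "", slug)  # drop punctuation other than _ - .
    slug = re.sub(r"\s+", "-", slug)       # whitespace runs become hyphens
    return slug


def dangling_links(markdown: str) -> list[str]:
    """Return internal link targets that match no heading anchor."""
    headings = re.findall(r"^#+\s+(.*)$", markdown, re.MULTILINE)
    anchors = {heading_anchor(title) for title in headings}
    targets = re.findall(r"\]\(#([^)]+)\)", markdown)
    return sorted({target for target in targets if target not in anchors})
```

For a document containing the heading `# The section title`, a link to `#the-section-title` resolves while a link to `#ssec-whatever` is reported as dangling.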

We use markdown-links-check to find broken links, but these internal link errors cannot be detected by markdown-links-check.

We know that pandoc doesn't emit warnings for missing internal hyperlinks, and we also know that pdflatex doesn't fail when these links do not resolve correctly. Therefore, we cannot expect either tool to fail when the internal links are missing.

However, we can run pandoc in --verbose mode when generating the PDFs, and we can capture the output of pdfTeX which is emitting the following warnings for missing links:

pdfTeX warning (dest): name{sec-NEON-intrinsics} has been referenced but does n
ot exist, replaced by a fixed one

Therefore, we can produce the list of broken links by extracting all the text inside the name{...} text.

One way to produce the full list of all the broken links is the following (this was run on acle.md as present in https://github.com/ValeriaDaneva/acle/blob/github.io-rst-to-md/main/acle.md, from #101 ):

 pandoc acle.md --verbose --fail-if-warnings -o acle.pdf 2>&1 | grep -E 'pdfTeX warning \(dest\): name{[^}]+}' | sed -E  's/.*name\{([^}]+)\}.*/\1/' | sort | uniq

Proposed solution

A bash script with a function that invokes the command used to generate the list of broken links for all the MDs of the specs, and that fails if that list is not empty. The script needs to be invoked in the workflow .github/workflows/ci.yml.
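The extraction step of the pipeline above can also be sketched in Python, which may be easier to unit-test than the grep/sed chain. This is an illustration only: the function name `broken_link_names` is hypothetical, and the sketch mirrors the shell pipeline as written, so like the pipeline it would miss a name that pdfTeX happens to wrap across two log lines.

```python
import re

# Same warning shape that the grep/sed pipeline above looks for.
WARNING = re.compile(r"pdfTeX warning \(dest\): name\{([^}]+)\}")


def broken_link_names(log: str) -> list[str]:
    """Return the sorted, de-duplicated link names reported by pdfTeX."""
    return sorted(set(WARNING.findall(log)))
```

A CI step could call this on the captured output of the pandoc run and fail when the returned list is not empty.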

[BUG] [PDF] Identifiers with underscores are not searchable inside paragraphs

Description

The identifiers with underscores that are typeset as inline code spans cannot be searched for.

The acle.pdf document contains many preprocessor macro names such as __ARM_FEATURE_CRC32 both inside standalone code blocks and surrounded by other text in a paragraph. These names are visually rendered with underscores, as expected. On the other hand, when searched for with Ctrl+F in a PDF viewer, such identifiers are only found if they are inside standalone blocks of code. This was tested in the Okular viewer and the internal PDF viewer integrated into Firefox 121 (snap package on Kubuntu 23.10), but according to this question on LaTeX StackExchange, this is a known issue affecting the underscore character in LaTeX output. The same issues exist in other released PDFs in this repository (maybe except for mve-<version>.pdf).

I do understand that the PDF format is not always copy-paste friendly, but searching for preprocessor macros seems to be generally useful and this issue seems to be quite easy to fix.

How to reproduce

Open acle.pdf from the ACLE 2023 Q2 release
Search for __ARM_FEATURE_CRC32
Search for __ARM_FEATURE_FP16_SCALAR_ARITHMETIC
In the Section 3.4.2 <arm_fp16.h>, copy the last line before the code block into a plain text editor

Observed behavior

__ARM_FEATURE_CRC32 is not found at all, __ARM_FEATURE_FP16_SCALAR_ARITHMETIC is found 3 times: twice in Section 3.4.2 and once in Section 13.2.5 (always inside a standalone code block).

The last line in section 3.4.2 is copied as

ARM FEATURE FP16 SCALAR ARITHMETIC feature macro should be tested before including the header:

(note missing underscores).

Expected behavior

All of the above occurrences are found as well as:

__ARM_FEATURE_CRC32 in Section 5.4.12, Section 8.8 as well as in the summary table in Section 5.12
__ARM_FEATURE_FP16_SCALAR_ARITHMETIC in Section 3.4.2 (right before the code block), 5.5.5.1 and others

The last line in section 3.4.2 is copied as

__ARM_FEATURE_FP16_SCALAR_ARITHMETIC feature macro should be tested before including the header:

Proposed fix

As suggested in the above question on LaTeX StackExchange, I managed to fix this issue by inserting the

\usepackage[T1]{fontenc}

line right after the \documentclass line in the tools/acle_template.tex file and rebuilding PDFs with build_with_docker.sh. I am not experienced enough in LaTeX to ensure this does not break anything but this looks like a working solution.

[BUG] neon_intrinsics: some fp16 convert intrinsics have type mismatches

The intrinsics for scalar converts between fp16 and integer/fixed-point values are currently specified to always use 16bit -> 16bit converts, regardless of the size of the integer/fixed-point value. For example:

acle/tools/intrinsic_db/advsimd.csv

Line 3795 in 6eb8516

float16_t vcvth_f16_s32(int32_t a) a -> Hn SCVTF Hd,Hn Hd -> result A32/A64

Using the instruction listed causes certain input values to be treated incorrectly. For the above example, an input int32 65504 produces an fp16 value of -32.0, instead of the expected 65504.0.

Instead, the above intrinsic should use the SCVTF Hd,Wn instruction, which better matches the input type.
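The -32.0 result follows from reading only the low 16 bits of the input as a signed integer, which is what a 16-bit source operand implies. A small sketch of that arithmetic (the helper name `as_signed16` is made up for illustration and models only the integer reinterpretation, not the instruction itself):

```python
def as_signed16(value: int) -> int:
    """Interpret the low 16 bits of value as a two's-complement integer."""
    low = value & 0xFFFF
    return low - 0x10000 if low & 0x8000 else low
```

65504 is 0xFFE0, whose top bit is set, so as a signed 16-bit value it reads as 65504 - 65536 = -32. Values that fit in 15 bits are unaffected, which is why the mismatch only shows up for some inputs.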

This applies to all scalar converts between fp16 and 32-bit/64-bit integer/fixed-point values:

float16_t vcvth_f16_s32
float16_t vcvth_f16_s64
float16_t vcvth_f16_u32
float16_t vcvth_f16_u64
int32_t vcvth_s32_f16
int64_t vcvth_s64_f16
uint32_t vcvth_u32_f16
uint64_t vcvth_u64_f16
float16_t vcvth_n_f16_s32
float16_t vcvth_n_f16_s64
float16_t vcvth_n_f16_u32
float16_t vcvth_n_f16_u64
int32_t vcvth_n_s32_f16
int64_t vcvth_n_s64_f16
uint32_t vcvth_n_u32_f16
uint64_t vcvth_n_u64_f16

Testing with two mainstream compilers (gcc and clang/llvm) shows that these intrinsics are often already generating the proposed instructions, rather than the instructions listed in the ACLE. In particular:
GCC (tested with 9.2.0) generates the proposed instructions for all of the intrinsics.
clang/llvm (tested with 14.0.0) generates the proposed instructions for the integer converts, but generates the ACLE instructions for the fixed-point converts.

[BUG] build-github-pages CI job is failing due to wrong Ruby version

Recent updates to the pages-gem repository, used by the build-github-pages CI job, have introduced a regression that makes the build fail due to an incompatible Ruby version.

Upstream is aware of this, and it seems to be non-trivial to make a local downstream fix. I therefore believe we can afford to wait until it's fixed by them.

[BUG] CMSE specs: Need examples using new Armv8.1-M instructions to get correct behaviour and show optimal new sequences.

Versions of the CMSE example code that use the new instructions in Armv8.1-M should be created. While some of the instructions (e.g. CLRM and VSCCLRM) just provide performance and code size improvements, correct usage of the new FPCXT payloads is required to fix an ABI issue.

For example:

Many examples (including sections 8.3.1, 8.3.2, 8.3.3, 8.4.2) have a long string of mov r?, #0 instructions to clear the registers. These can be replaced by the new CLRM clear multiple instruction.
The FPCXTS payload should be saved and restored around a Secure -> Non-secure call. This should be incorporated into the example shown in section 8.3.1
The FPCXTNS payload should be saved and restored on entry/exit from a cmse_nonsecure_entry function that either uses floating point, or calls another function that may use floating-point. This could be added to a new version of the example in section 8.4.2, although it might be cleaner to have a simpler example that doesn’t also involve stacked arguments.
Section 8.4.2 contains the following code to detect if an FP context is active, and to clear the registers if they contain secure data:

@14: check SFPA bit to see if FP is used
    mrs     r1, control
    tst     r1, #8
    bne     .LdoneFP
    @15: clear floating point caller-saved registers (soft ABI)
    mov     r1, #0
    vmov    s0, s1, r1, r1
    vmov    s2, s3, r1, r1
    ...
    vmov    s30, s31, r1, r1
    @16: clear floating point flags
    vmsr    fpscr, r1
.LdoneFP:

This code can be replaced by the new VSCCLRM instruction.

[BUG] Missing intrinsics for AArch32 instructions VMLA.F16 and VMLS.F16

Alongside VFMA.F16/VFMS.F16, AArch32 offers VMLA.F16/VMLS.F16 instructions, which perform a multiply-add operation with intermediate rounding. Importantly, the vector-by-vector lane form (e.g. VMLA.F16 Qd, Qn, Dm[x]) on AArch32 is supported only for VMLA/VMLS instructions, and not for VFMA/VFMS instructions.

The NEON intrinsics specification lacks intrinsics for the VMLA/VMLS instructions. In particular, this makes it impossible to achieve peak performance on half-precision matrix-matrix multiplication in AArch32 using NEON intrinsics, because the optimal implementation would use the VMLA.F16 Qd, Qn, Dm[x] instructions.

I request that the NEON specification be updated to include the following intrinsics for AArch32:

vmla_f16 (VMLA.F16 Dd, Dn, Dm)
vmls_f16 (VMLS.F16 Dd, Dn, Dm)
vmlaq_f16 (VMLA.F16 Qd, Qn, Qm)
vmlsq_f16 (VMLS.F16 Qd, Qn, Qm)
vmla_lane_f16 (VMLA.F16 Dd, Dn, Dm[x])
vmls_lane_f16 (VMLS.F16 Dd, Dn, Dm[x])
vmlaq_lane_f16/vmlaq_laneq_f16 (VMLA.F16 Qd, Qn, Dm[x])
vmlsq_lane_f16/vmlsq_laneq_f16 (VMLS.F16 Qd, Qn, Dm[x])

[morello] Reference the CHERI hybrid guide in the specs

The CHERI hybrid guide should be referenced from the ACLE. This guide was not available when we first published the Morello ACLE, but it is now.

[proposal] Add vector intrinsics for loading into lane 0 and setting other lanes to 0

It would be useful to have vector intrinsics that load lane 0 from memory and set the other elements to zero. E.g.:

int8x16_t vfoo_s8(const int8_t *) → LDR Bn, [Xn]
int16x8_t vfoo_s16(const int16_t *) → LDR Hn, [Xn]
….

The same thing would work for SVE.

GCC does at least optimise something like:

#include <arm_neon.h>

float32x2_t f(float32_t *ptr)
{
    float32x2_t vec = {};
    vec = vld1_lane_f32(ptr, vec, 0);
    vec = vld1_lane_f32(ptr + 2, vec, 1);
    return vec;
}

to:

        ldr     s0, [x0], 8
        ld1     {v0.s}[1], [x0]
        ret

and LLVM behaves similarly, but that seems a bit indirect.
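For clarity, the semantics being requested can be modelled in a few lines. This is only a reference model of the proposed behaviour, with vectors represented as Python lists; the name `load_lane0_zero_rest` is hypothetical, just like the `vfoo` placeholders above.

```python
def load_lane0_zero_rest(memory, lanes):
    """Model of the proposed intrinsic: lane 0 comes from memory, the rest are zero."""
    if lanes < 1:
        raise ValueError("a vector needs at least one lane")
    return [memory[0]] + [0] * (lanes - 1)
```

For a four-lane vector and memory starting with 7, the model yields [7, 0, 0, 0], which is the result a single scalar load into the low part of a vector register would give.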

[proposal] Clarify that AArch32 scalar polynomial types' underlying integer type is a recommendation

The definition of polynomial types in AArch32 is inconsistent between LLVM, gcc and ACLE.

At this point there's potentially high risk in changing the compilers' implementation as it may cause user code breakage.

The ACLE must be modified to state that the polynomial types are recommended to be unsigned, but ultimately it is up to the toolchains' discretion whether or not to follow it.

[proposal] Freeze __ARM_ARCH macro up until Armv8.x-A

LLVM and gcc have an older definition of this macro, thus they don't currently follow the ACLE.

Starting from Armv9-A, numerical comparisons break. For instance, 805 (Armv8.5-A) is greater than 9 (Armv9-A).

Action: freeze the macro definition up until Armv8.x-A, so that it must not be used from Armv9-A onwards. The ACLE spec must be updated to say so.

In addition, the ACLE specification must tell users not to use or rely on this macro to control the use of architecture features. They must use specific feature macros instead.
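The comparison problem can be shown with the two values quoted in this issue. The sketch below is only a model of a preprocessor test such as `__ARM_ARCH >= 805`; the constant and function names are made up for illustration.

```python
# Values as quoted in this issue for the older macro definition.
ARMV8_5_A = 805
ARMV9_A = 9


def naive_at_least(arch_value: int, required: int) -> bool:
    """Model of a numeric test like `__ARM_ARCH >= required`."""
    return arch_value >= required
```

With these encodings, `naive_at_least(ARMV9_A, ARMV8_5_A)` is false, so code guarded by such a test would be skipped on an Armv9-A target. That is the motivation for testing specific feature macros instead.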

Recognizing past contributors (pre Open Source release).

I am going to add acknowledgments for past contributors (pre-open source release) with @all-contributors through this issue.

[proposal] Add __ARM_FEATURE macros for more ISA features

I hear that there was previously a principle that we would only add __ARM_FEATURE macros for features that have associated C/C++ intrinsics. However:

This is inconvenient for users who want to use other features in inline asm.
Some features do not add new intrinsics or language constructs, but do improve the performance of existing constructs. Users might want to test for such features too.

There are already macros like __ARM_FEATURE_ATOMICS that have no associated intrinsics. #199 adds another case. There are probably more macros besides these two (I've not gone through all the features to check).

It would be good if we retroactively added macros for existing “relevant” ISA features, to improve consistency. This probably needs a policy decision about where the new line should be drawn. E.g.:

1. all features with a FEAT_ identification
2. all features that have associated instructions, system registers, DC encodings, etc.
3. all features that have associated instructions

(1) feels a bit overboard, but both (2) and (3) seem plausible.

[BUG] CMSE specs: S --> NS call handling

Describe the bug

In section 6.6 of CMSE, cmse_is_nsfptr() is used to determine whether the function pointer is Secure or Non-secure, and whether to cast the pointer back to a normal function pointer before the call. This isn't needed, and the associated comments are misleading. The underlying BLXNS instruction that's used to call a Non-secure function checks the LSB of the pointer to determine whether to actually transition between states or stay in the Secure state. So effectively the "only branch to non-secure state if cmse_is_nsfptr()" functionality is already built into the instruction. The code should look something like this:

void call_callback(void) {
    // As the fp pointer has the cmse_nonsecure_call attribute the compiler
    // will use a BLXNS instruction for the function call. This instruction
    // determines whether to perform a Secure --> Non-secure transition,
    // or stay in the Secure state based on the LSB of the pointer (as set by
    // cmse_nsfptr_create() when the callback function was passed to the
    // Secure state)
    fp();
}

ssec-ls64 link is dead

In paragraph https://github.com/ARM-software/acle/blob/main/main/acle.rst#armv8-7-a-load-store-64-byte-extension there's a link:

ssec-LS64 (https://github.com/ARM-software/acle/blob/main/main/acle.rst#ssec-ls64)

which leads to top of ACLE GitHub page instead of LS64 intrinsics (https://github.com/ARM-software/acle/blob/main/main/acle.rst#load-store-64-byte-intrinsics).

I guess this is a minor issue with incorrect paragraph linking?

[BUG] Document title in table within morello.md separates two lines of one cell into two cells

The title of the morello document shouldn't be going into a separate row, but stay inside the cell above it instead.

Link each intrinsic to the corresponding documentation in the interactive guide on developer.arm.com

We need to modify the markdown workflow of tools/gen-intrinsics-specs.py so that each intrinsic in the table produced by the script has a link to the corresponding page on developer.arm.com describing the intrinsic.

For example, an intrinsic like [__arm_]vclsq[_s8] in the MVE specs should link to https://developer.arm.com/architectures/instruction-sets/intrinsics/[__arm_]vclsq[_s8]

The format is pretty simple, all we need to do to create the link is to add the name of the intrinsic to the base url of the interactive guide:

https://developer.arm.com/architectures/instruction-sets/intrinsics/<intrinsic name>
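A minimal sketch of the link construction described above. The function names are hypothetical and are not taken from tools/gen-intrinsics-specs.py; the URL is built by plain concatenation, as in the example, and escaping the square brackets in the link label is an assumption about what the Markdown renderer needs.

```python
BASE_URL = "https://developer.arm.com/architectures/instruction-sets/intrinsics/"


def intrinsic_url(name: str) -> str:
    """Append the intrinsic name to the base URL of the interactive guide."""
    return BASE_URL + name


def intrinsic_markdown_link(name: str) -> str:
    """Build a Markdown link whose label is the intrinsic name."""
    # Escape square brackets so names like [__arm_]vclsq[_s8] stay literal
    # inside the link label (an assumption about the renderer).
    label = name.replace("[", "\\[").replace("]", "\\]")
    return f"[{label}]({intrinsic_url(name)})"
```

Whether the interactive guide expects the brackets to be percent-encoded is not stated here, so the sketch leaves the name untouched in the URL.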

[BUG] running build scripts creates directories owned by root

Running ./tools/build-pdfs.sh build etc will generate directories owned by root, including .sass-cache and _site.

[BUG]

Describe the bug

Some of the C code in the next-release branch of the main ACLE spec is not properly formatted (see screenshot).

The issue is due to the presence of a few spaces to the left of the three ` wrapping the code.

Screenshots

Fix changelog of the Main ACLE document for next release (2021Q4)

The item Introduce __ARM_FEATURE_PAUTH and __ARM_FEATURE_BTI in sections ssec-PAC and ssec-BTI respectively. at https://github.com/ARM-software/acle/blob/next-release/main/acle.rst#1919changes-between-acle-q2-2021-and-acle-q3-2021 should be part of a new section called Changes for next release

[BUG] Mis-formatting of "Availability of Armv8.4-A Advanced SIMD intrinsics" section contents

https://arm-software.github.io/acle/main/acle.html#availability-of-armv84-a-advanced-simd-intrinsics

The final bullet point has been mis-formatted as a table. Presumably this is because the raw text contains a `|` character in what was intended to be a regular expression.

[BUG] Extra PDF in the artifacts uploaded by the CI

Describe the bug
Runs like this one are producing a file tmp.pdf in the artifacts that is being generated by the script introduced to check the links in the PDF files.

Proposed fix
Delete the file at the end of the script, or avoid targeting the folder pdfs for producing the file.

Add additional item in check list for PR

We should add a checklist item that reminds contributors to add an item in the Changes for next release section.

[main] Move CDE intrinsics out of BETA stage

The CDE intrinsics are specified here: https://arm-software.github.io/acle/main/acle.html#custom-datapath-extension-1

This issue is to track what needs to be done and agreed to be able to move the intrinsics out of BETA quality.

[BUG] Missing counters in headings

Describe the bug

The counters of the html headers in the NEON specs are showing up in the table of contents, but not in the text of the headers in the body of the document.

Screenshots

Here are the headings without the counters, in the body of the doc.

Here are the items in the TOC, with the counters.

References
This problem has been detected while browsing the page at https://arm-software.github.io/acle/neon_intrinsics/advsimd.html#fp16-armv84-a

Wrong headings in the list of intrinsics of `mve.md`

@ValeriaDaneva, I have missed something in my review of #29.

I have manually created a PDF version of the MVE intrinsic specifications produced out of the changes in #29 and noticed that there is an issue with the headings of the list of intrinsics. To reproduce, run the following command:

pandoc --toc -o mve.pdf -V geometry:landscape  -V geometry:margin=1cm mve.md

You will notice that the sections in the List of intrinsics section need to be moved down in the hierarchy of the headings. This issue is also visible in the HTML output of github.io: https://arm-software.github.io/acle/mve_intrinsics/mve.html#list-of-intrinsics

[BUG] Fix acle.md HTML Formatting

The table headers aren't formatted properly and there is an extra row.

The list of intrinsics at the bottom isn't set as one per line.

Also it's worth checking if the header issue appears in other files and have it be corrected there as well.

[BUG] Architecture availability descriptions for RCPC extensions seem backwards

Describe the bug

In the table at https://github.com/ARM-software/acle/blob/main/main/acle.md#rcpc it says that FEAT_LRCPC is available from Armv8.4 and FEAT_LRCPC2 from Armv8.3, but I believe it should be the other way around: FEAT_LRCPC and the base LDAPR instructions were added in Armv8.3 and extended with the LDAPUR and STLUR variants in Armv8.4 (plus the usual backports to Armv8.2).

[BUG] Label in acle.md Doesn't Link to Anything

The label "[C++ #1]" in acle.md doesn't link to anything, not even previous versions of the document.

[proposal] New macro __ARM_ACLE_VERSION to use with the new __ARM_ACLE macro definition

With the new proposed __ARM_ACLE macro definition, we should consider the creation of a new utility macro to use with it:

#define __ARM_ACLE_VERSION(year, quarter, patch) ((year) * 100 + (quarter) * 10 + (patch))
#if __ARM_ACLE >= __ARM_ACLE_VERSION(2024, 1, 0)
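The arithmetic of the proposed macro can be checked with a small model. The function name `acle_version` is made up for illustration and simply mirrors the macro's formula.

```python
def acle_version(year: int, quarter: int, patch: int) -> int:
    """Mirror of the proposed __ARM_ACLE_VERSION(year, quarter, patch) formula."""
    return year * 100 + quarter * 10 + patch
```

With this encoding, 2024 Q1 patch 0 becomes 202410, and releases compare in chronological order as long as the quarter and patch each stay below 10.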

[BUG] __ARM_ACLE macro definition is outdated

The __ARM_ACLE macro definition still follows the old versioning model of the ACLE, as in major.minor.

It should be updated to reflect the new model of year+quarter.

[BUG] Wrong use of architecture macros

Describe the bug

Section 5.4.3 of acle-2022Q4-3.pdf appears to be confused. The document states

The common subset of the A, R and M profiles is indicated by
__ARM_ARCH == 7 && !defined (__ARM_ARCH_PROFILE)

Firstly, I believe the test for the common subset would be just

__ARM_ARCH == 7

and, secondly, all v7 cores should have a profile defined, so the test as shown would not pass for any compilation target.

usdot intrinsics missing AArch32 variants

None of the intrinsics for usdot should be A64 only. They should all be A64/A32.

https://github.com/ARM-software/acle/blob/main/tools/intrinsic_db/advsimd.csv#L4258

The (vector) variant clearly has a Q variant, and for the (by element) intrinsics the higher lanes are implemented on AArch32 by using the fact that the register file overlaps.

So if you want index [2,3], you remap the number and use the high half of the register.

For example, a Qn consists of Dn, Dn+1, so when you want index 2, you remap 2->0 and use register Dn+1, since the registers are always allocated in pairs.
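The remapping described above amounts to a divide-by-two with remainder. A sketch follows, where the function name `remap_lane` is hypothetical and the two-lanes-per-D-register layout follows the description in this issue:

```python
def remap_lane(index: int) -> tuple[int, int]:
    """Map a Q-register lane index (0-3) to (D-register offset, lane within D)."""
    if not 0 <= index <= 3:
        raise ValueError("lane index must be in the range 0-3")
    # Lanes 0-1 live in the low D register, lanes 2-3 in the high one.
    return divmod(index, 2)
```

Index 2 maps to offset 1 and lane 0, i.e. the high D register with the index remapped from 2 to 0, matching the example in the text.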

[BUG] __arm_mops_memset_tag should require __ARM_FEATURE_MTE

The __arm_mops_memset_tag intrinsic introduced for the Armv8.8-A/v9.3-A MOPS extension requires both the MOPS and MTE extensions to be executed, but at the moment the ACLE specifies that it only requires the __ARM_FEATURE_MOPS feature macro to be available.
To match its actual dependencies, the intrinsic should require both __ARM_FEATURE_MOPS and __ARM_FEATURE_MTE.

[BUG] Pull request template

Describe the bug

The copyright text appears in the description of the pull request (see screenshot).

Screenshots

[BUG] CMSE specs: Conditionality of FP register clearing shouldn't be described as an optimization

The end of section 8.4.2 of CMSE states:
"The instruction sequence between comment 14 and 15 is an optimization to skip clearing floating point registers if they are not used by the secure state. Removing these instructions is functionally equivalent but might create an unnecessary floating point context."

The instructions at 14 and 15 should be mandatory and not described as an optimization. Stating that removal of these instructions is functionally equivalent is also incorrect, given it could result in a floating point context being created. Inadvertently creating an FP context can have some serious side effects, including triggering UsageFaults to be raised.
