merryhime / dynarmic



An ARM dynamic recompiler.

License: BSD Zero Clause License

CMake 0.76% C++ 65.75% C 0.02% SourcePawn 0.96% POV-Ray SDL 32.39% Assembly 0.11%

jit arm compiler emulation runtime-compilation x86-64 cpp17

dynarmic's Introduction

Dynarmic

A dynamic recompiler for ARM.

Supported guest architectures

ARMv6K
ARMv7A
32-bit ARMv8
64-bit ARMv8

Supported host architectures

x86-64
64-bit ARMv8 (AArch64)

There are no plans to support x86-32.

Projects using Dynarmic

Alternatives to Dynarmic

If you are looking for a recompiler that you can use with minimal effort to run ARM executables on non-native platforms, we strongly recommend qemu-user-static, which can also be combined with Docker to provide a complete emulated environment. A complete plug-and-play solution is out of scope for this project.

Here are some projects with the same goals as dynarmic:

ChocolArm64 from Ryujinx - ARMv8 recompiler on top of RyuJIT
Unicorn - Recompiling multi-architecture CPU emulator, based on QEMU
SkyEye - Cached interpreter for ARM

More general alternatives:

tARMac - Tarmac's use of armlets was initial inspiration for us to use an intermediate representation
QEMU - Recompiling multi-architecture system emulator
VisUAL - Visual ARM UAL emulator intended for education
A wide variety of other recompilers, interpreters and emulators can be found embedded in other projects. Here are some we would recommend looking at:
- firebird's recompiler - Takes more of a call-threaded approach to recompilation
- higan's arm7tdmi emulator - Very clean code-style
- arm-js by ozaki-r - Emulates ARMv7A and some peripherals of Versatile Express, in the browser

Disadvantages of Dynarmic

In the pursuit of speed, some behavior that is not commonly depended upon is elided. As a result, this emulator does not fully match the specification.

Known examples:

Only user mode is emulated; there is no emulation of any other privilege level.
FPSR state is approximate.
Misaligned loads/stores are not appropriately trapped in certain cases.
Exclusive monitor behavior may not match any known physical processor.

As with most other hobby ARM emulation projects, no formal verification has been done. Use this code base at your own risk.

Documentation

Design documentation can be found at docs/Design.md.

Usage Example

Below is a minimal example. Bring your own memory system.

#include <array>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <exception>

#include "dynarmic/interface/A32/a32.h"
#include "dynarmic/interface/A32/config.h"

using u8 = std::uint8_t;
using u16 = std::uint16_t;
using u32 = std::uint32_t;
using u64 = std::uint64_t;

class MyEnvironment final : public Dynarmic::A32::UserCallbacks {
public:
    u64 ticks_left = 0;
    std::array<u8, 2048> memory{};

    u8 MemoryRead8(u32 vaddr) override {
        if (vaddr >= memory.size()) {
            return 0;
        }
        return memory[vaddr];
    }

    u16 MemoryRead16(u32 vaddr) override {
        return u16(MemoryRead8(vaddr)) | u16(MemoryRead8(vaddr + 1)) << 8;
    }

    u32 MemoryRead32(u32 vaddr) override {
        return u32(MemoryRead16(vaddr)) | u32(MemoryRead16(vaddr + 2)) << 16;
    }

    u64 MemoryRead64(u32 vaddr) override {
        return u64(MemoryRead32(vaddr)) | u64(MemoryRead32(vaddr + 4)) << 32;
    }

    void MemoryWrite8(u32 vaddr, u8 value) override {
        if (vaddr >= memory.size()) {
            return;
        }
        memory[vaddr] = value;
    }

    void MemoryWrite16(u32 vaddr, u16 value) override {
        MemoryWrite8(vaddr, u8(value));
        MemoryWrite8(vaddr + 1, u8(value >> 8));
    }

    void MemoryWrite32(u32 vaddr, u32 value) override {
        MemoryWrite16(vaddr, u16(value));
        MemoryWrite16(vaddr + 2, u16(value >> 16));
    }

    void MemoryWrite64(u32 vaddr, u64 value) override {
        MemoryWrite32(vaddr, u32(value));
        MemoryWrite32(vaddr + 4, u32(value >> 32));
    }

    void InterpreterFallback(u32 pc, size_t num_instructions) override {
        // This is never called in practice.
        std::terminate();
    }

    void CallSVC(u32 swi) override {
        // Do something.
    }

    void ExceptionRaised(u32 pc, Dynarmic::A32::Exception exception) override {
        // Do something.
    }

    void AddTicks(u64 ticks) override {
        if (ticks > ticks_left) {
            ticks_left = 0;
            return;
        }
        ticks_left -= ticks;
    }

    u64 GetTicksRemaining() override {
        return ticks_left;
    }
};

int main(int argc, char** argv) {
    MyEnvironment env;
    Dynarmic::A32::UserConfig user_config;
    user_config.callbacks = &env;
    Dynarmic::A32::Jit cpu{user_config};

    // Execute at least 1 instruction.
    // (Note: More than one instruction may be executed.)
    env.ticks_left = 1;

    // Write some code to memory.
    env.MemoryWrite16(0, 0x0088); // lsls r0, r1, #2
    env.MemoryWrite16(2, 0xE7FE); // b +#0 (infinite loop)

    // Setup registers.
    cpu.Regs()[0] = 1;
    cpu.Regs()[1] = 2;
    cpu.Regs()[15] = 0; // PC = 0
    cpu.SetCpsr(0x00000030); // Thumb mode

    // Execute!
    cpu.Run();

    // Here we would expect cpu.Regs()[0] == 8
    std::printf("R0: %u\n", cpu.Regs()[0]);

    return 0;
}

Legal

dynarmic is under a 0BSD license. See LICENSE.txt for more details.

dynarmic uses several other libraries, whose licenses are included below:

catch

Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license (the "Software") to use, reproduce, display, distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:

The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.

fmt

Copyright (c) 2012 - 2016, Victor Zverovich

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

mcl & oaknut

MIT License

Copyright (c) 2022 merryhime <https://mary.rs>

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

robin-map

MIT License

Copyright (c) 2017 Thibaut Goetghebuer-Planchon

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

xbyak

Copyright (c) 2007 MITSUNARI Shigeo
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
Neither the name of the copyright owner nor the names of its contributors may
be used to endorse or promote products derived from this software without
specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
THE POSSIBILITY OF SUCH DAMAGE.
-----------------------------------------------------------------------------
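Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

When redistributing source code, include the above copyright notice, this
list of conditions, and the following disclaimer.
When redistributing in binary form, include the above copyright notice, this
list of conditions, and the following disclaimer in the documentation and/or
other materials accompanying the distribution.
Neither the name of the copyright holder nor the names of the contributors
may be used to advertise or promote products derived from this software
without specific prior written permission.
This software is provided by the copyright holders and contributors "as is",
without any warranty, express or implied, including but not limited to the
implied warranties of merchantability and fitness for a particular purpose.
Regardless of the cause of the damage, and regardless of whether the basis of
liability is contract, strict liability, or tort (including negligence or
otherwise), and even if advised of the possibility of such damage, neither
the copyright holders nor the contributors shall be liable for any direct,
indirect, incidental, special, punitive, or consequential damages (including,
but not limited to, procurement of substitute goods or services; loss of use,
data, or profits; or business interruption) arising from the use of this
software.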

zydis

The MIT License (MIT)

Copyright (c) 2014-2020 Florian Bernd
Copyright (c) 2014-2020 Joel Höner

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.


dynarmic's Issues

Optimization: Constant Folding

This is partially implemented in a0e9417.

Optimization: Remove unnecessary materialization of constants

Keep track of materialized constants with the register allocator and re-use these.
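A rough sketch of the bookkeeping, assuming a toy allocator where host registers are plain integer indices. `ConstantCache` and its methods are hypothetical names for illustration, not dynarmic's register allocator API. The key invariant is that any write to a register must invalidate the constant recorded for it.

```cpp
#include <cstdint>
#include <optional>
#include <unordered_map>

// Hypothetical toy model: remembers which host register already holds a given
// constant so the emitter can reuse it instead of materializing it again.
class ConstantCache {
public:
    // Returns the host register already holding `value`, if any.
    std::optional<int> Lookup(std::uint64_t value) const {
        auto it = by_value.find(value);
        if (it == by_value.end()) {
            return std::nullopt;
        }
        return it->second;
    }

    // Records that `reg` now holds `value` (e.g. right after emitting `mov reg, imm`).
    void Record(int reg, std::uint64_t value) {
        Invalidate(reg);
        if (auto old = by_value.find(value); old != by_value.end()) {
            by_reg.erase(old->second);  // keep a single owner per constant
        }
        by_value[value] = reg;
        by_reg[reg] = value;
    }

    // Must be called whenever `reg` is overwritten with anything else.
    void Invalidate(int reg) {
        auto it = by_reg.find(reg);
        if (it == by_reg.end()) {
            return;
        }
        by_value.erase(it->second);
        by_reg.erase(it);
    }

private:
    std::unordered_map<std::uint64_t, int> by_value;  // constant -> register
    std::unordered_map<int, std::uint64_t> by_reg;    // register -> constant
};
```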

VFP: Bounce to Support Code

If the processor is not in runfast mode, there is a possibility of bouncing to VFP support code. This VFP support code could hypothetically do something other than IEEE floating point, depending on how the system has been set-up.

Note that this is a nice-to-have as there is very little practical use for this in the current use-cases of this library.

VFP: Optimization: Track if values have been default-NaN-ed

ARM has a default NaN mode. This means that any NaN produced by a VFP operation can only be 0x7fc00000 or 0x7ff8000000000000.

We currently check for NaNs after every single floating point operation. This is unnecessary as the actual bit-value of the NaN does not affect the result of most operations. The default NaN check can occur just before operations where it would be relevant (e.g.: transfer to guest GPRs or memory).

Please keep in mind: x86 (empirically) produces NaN values with the sign bit set. This differs from most other architectures which have the sign bit unset.
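A minimal sketch of deferring the check, assuming canonicalization happens only when a value becomes guest-visible. `ToGuestBits32` and `ToGuestBits64` are hypothetical helpers written in portable C++; in dynarmic this would be emitted host code rather than a C++ function.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// ARM default-NaN bit patterns quoted above.
constexpr std::uint32_t kDefaultNaN32 = 0x7fc00000;
constexpr std::uint64_t kDefaultNaN64 = 0x7ff8000000000000;

// Called only when a result leaves the FP pipeline (e.g. a transfer to a guest
// GPR or to memory). Intermediate values may carry any NaN payload, including
// x86's NaN with the sign bit set, without affecting most operations.
std::uint32_t ToGuestBits32(float value) {
    if (std::isnan(value)) {
        return kDefaultNaN32;
    }
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}

std::uint64_t ToGuestBits64(double value) {
    if (std::isnan(value)) {
        return kDefaultNaN64;
    }
    std::uint64_t bits;
    std::memcpy(&bits, &value, sizeof(bits));
    return bits;
}
```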

Inconsistent warnings across different versions of MSC

Related to citra-emu/citra#2377.

Essentially there are three possible solutions:

Use the /Wv:<version> compiler option to select the compiler version whose warnings are emitted.
Add a new CMake option, DYNARMIC_DEV, that enables warnings-as-errors, and enable it on CI and dev machines.
Ignore the problem.

Dynarmic in Citra Canary 1309 crashes on certain instructions

I am modifying a 3DS game, and due to some space constraints, I've had to employ some dirty ASM to fit my modifications in. One modification involves using the following instruction:

STREQD R3, [R0],-LR (opcode FE3F0000)

This instruction is reached by the PC but is not executed, because it shortly follows a BEQ that was not taken and the flags are not updated before this instruction is reached. When it is reached, I get an assertion failure at line 148 of dynarmic.cpp in Citra. This does not crash with the JIT off, and it does not crash on hardware.

I also tried (erroneously):

STRDLS R3, [R0], -LR (opcode FE3F0090)

which also crashed. Perhaps the problem has to do with the negative link register as the operand? LR's value would have been equivalent to the opcode of the instruction.

Check calculation of APSR.GE

Verify that the calculation of APSR.GE is correct for all instructions.
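One way to verify the GE calculation is against a scalar reference model. The sketch below follows our reading of the ARM ARM pseudocode for UADD8 and SADD8 only; other parallel instructions (e.g. USUB8, SASX) derive GE differently and would need their own models. The function names are hypothetical.

```cpp
#include <cstdint>

// UADD8: GE[i] is set when the unsigned addition of byte lane i carries out
// (i.e. the 9-bit sum is >= 0x100).
std::uint32_t Uadd8GE(std::uint32_t n, std::uint32_t m) {
    std::uint32_t ge = 0;
    for (int lane = 0; lane < 4; ++lane) {
        const std::uint32_t a = (n >> (8 * lane)) & 0xff;
        const std::uint32_t b = (m >> (8 * lane)) & 0xff;
        if (a + b >= 0x100) {
            ge |= 1u << lane;
        }
    }
    return ge;
}

// SADD8: GE[i] is set when the signed sum of byte lane i is >= 0.
std::uint32_t Sadd8GE(std::uint32_t n, std::uint32_t m) {
    std::uint32_t ge = 0;
    for (int lane = 0; lane < 4; ++lane) {
        const int a = static_cast<std::int8_t>((n >> (8 * lane)) & 0xff);
        const int b = static_cast<std::int8_t>((m >> (8 * lane)) & 0xff);
        if (a + b >= 0) {
            ge |= 1u << lane;
        }
    }
    return ge;
}
```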

Optimization: Improve FRSQRTE performance

https://github.com/MerryMage/dynarmic/blob/6c877ff8dbf06a80cbe413d7e7457a9cb1616d79/src/backend/x64/emit_x64_vector_floating_point.cpp#L966

Consider (and benchmark) one of the following strategies:

- Lookup table using e.g. vgatherdps
  Advantages: Smallest codesize.
  Disadvantages: Use of dcache.
- Using sqrtss and divss for accurate calculation.
  Advantages: Minimal dcache usage; smallest codesize of the remaining options.
  Disadvantages: Heavy latency cost.
- Using rsqrtss and a floating-point based Newton step.
  Advantages: Minimal dcache usage.
  Disadvantages: Larger codesize than above.
- Using rsqrtss and a fixed-point based Newton step.
  Advantages: Minimal dcache usage, no FP overhead.
  Disadvantages: Larger codesize than above.

Which method is most performant depends on whether we are dcache-bound or bound by host instruction decoding. A microbenchmark is therefore not adequate, since the first method would obviously win one; benchmarking should be done with an appropriate test application.

Why can't we use rsqrtss directly? rsqrtss does not have sufficient precision to round correctly to 9 significant bits of accuracy. vrsqrt14ss may have sufficient precision to not require a Newton step, but that instruction requires AVX-512.

What annoys me the most about this is that the guest application is likely to just pass the result of FRSQRTE through its own accuracy improvement via FRSQRTS, and likely doesn't care about the precision of the final couple of LSBs.

Verify all VFP instructions

Verification has only been done for VADD.

Optimization: Detect jump tables

This is a specialization of the optimization in #87.

[meta-issue] Optimizations

List of things to do in order of priority.

High

Return Stack Buffer

Medium

Local pc cache for instructions currently requiring ReturnToDispatch
Use host flags
Get/Set-elimination for Extended Registers (requires special handling)
Access page-table directly for memory access.

Low

Constant folding
Replace memory reads to read-only locations with immediates
Keep track of sign-extension, zero-extension and flush-to-zero (FTZ) state

Ideas

IR instruction merging
Basic blocks having multiple exits
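To illustrate the high-priority item, here is a toy return stack buffer. It assumes the host code for a return address can be looked up when the guest call is compiled; `ReturnStackBuffer` is a hypothetical name, and a real implementation would live in the JIT state and emitted code rather than in a C++ class.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// On a guest call (e.g. BL), push the expected return address together with
// the host code compiled for it. On a guest return (e.g. BX LR), pop: if the
// prediction matches, jump straight to the host code instead of returning to
// the dispatcher. Overflow silently overwrites the oldest entry.
class ReturnStackBuffer {
public:
    void Push(std::uint64_t guest_return_pc, const void* host_code) {
        entries[top] = {guest_return_pc, host_code};
        top = (top + 1) % entries.size();
    }

    // Returns the predicted host code if it matches `actual_pc`, else nullptr
    // (the caller then falls back to the dispatcher).
    const void* Pop(std::uint64_t actual_pc) {
        top = (top + entries.size() - 1) % entries.size();
        const Entry& entry = entries[top];
        return entry.guest_pc == actual_pc ? entry.host_code : nullptr;
    }

private:
    struct Entry {
        std::uint64_t guest_pc = ~std::uint64_t{0};  // sentinel: never matches
        const void* host_code = nullptr;
    };
    std::array<Entry, 8> entries{};
    std::size_t top = 0;
};
```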

Core.ARM Unicorn fallback @ 0xA170A34 for 1 instructions (instr = E7FFDEFE)

Core.ARM core\arm\dynarmic\arm_dynarmic_64.cpp:InterpreterFallback:70: Unicorn fallback @ 0xA170A34 for 1 instructions (instr = E7FFDEFE)

gear of unlimited 2 is stuck at the title screen in yuzu.

Feel free to close this if it is already known.

Issues with licensing as GPLv2 or later

Similar to citra-emu/citra#1279, the interpreter code is licensed under GPLv2, but several files in dynarmic claim to be licensed under GPLv2 or later. Once all of the interpreter fallbacks have been removed, it would be possible to have a release that's actually GPLv3-compatible.

Related to #69

-Werror build fails for Clang 5.0 or later (due to -Wunused-lambda-capture)

$ c++ -v
FreeBSD clang version 5.0.0 (tags/RELEASE_500/final 312559) (based on LLVM 5.0.0svn)
Target: x86_64-unknown-freebsd12.0
Thread model: posix
InstalledDir: /usr/bin

$ pkg install git cmake ninja boost-libs
$ git clone https://github.com/MerryMage/dynarmic
$ mkdir dynarmic_build; cd dynarmic_build
$ cmake -GNinja ../dynarmic
$ ninja
[...]
/tmp/dynarmic/src/frontend/translate/translate_arm/exception_generating.cpp:13:12: fatal error: lambda capture 'cond' is not used [-Wunused-lambda-capture]
    UNUSED(cond, imm12, imm4);
           ^
In file included from /tmp/dynarmic/src/frontend/disassembler/disassembler_thumb.cpp:16:
In file included from /tmp/dynarmic/src/./frontend/decoder/thumb16.h:14:
/tmp/dynarmic/src/./frontend/decoder/decoder_detail.h:118:25: fatal error: lambda capture 'arg_masks' is not used [-Wunused-lambda-capture]
            return [fn, arg_masks, arg_shifts](Visitor& v, opcode_type instruction) {
                        ^

Vectorize vectorized VFP instructions on host

There is no need to write a vectorizer in the backend.

Tasks:

Introduce IR instructions that support vectorization.
Implement a decomposition pass for backends that don't support vectorization.
Implement vectorized instructions in x64 backend.

Issues:

Strict IEEE exception flag bit support may not happen.

VFP: Incorrect behaviour when underflow occurs

Incorrect results are produced, and incorrect exception bits are set, when arithmetic underflow that is unrepresentable even by denormal numbers occurs.

Incorrect results: We currently incorrectly return negative zero in some cases. Hardware always returns a positive zero on underflow.

Incorrect exception bits: We currently set IXC when underflow occurs. This does not happen on hardware.

Tests can be found in #18 (comment).

Proposed Fix: (x64 backend)

Store mxcsr before an operation that can cause underflow that's unrepresentable by a denormal. Store mxcsr again after an operation that can cause underflow that's unrepresentable by a denormal. Mask bits as necessary, reload mxcsr.

TODO: Exclusive memory

Use page-tables instead of callbacks
Inline locking into emitted assembly
Don't use VAddr to key the monitor; use the underlying memory address instead, as a poor-man's substitute for PAddr.
- Consider providing a mode for fastmem-only users.
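A simplified sketch of a monitor keyed by the host address backing a guest location (the poor-man's PAddr described above). `ExclusiveMonitor` and `TryExclusiveStore` are hypothetical names. The sketch compares exact addresses instead of reservation granules and does not model ordinary stores clearing reservations.

```cpp
#include <array>
#include <cstddef>
#include <mutex>

template <std::size_t NumCores>
class ExclusiveMonitor {
public:
    // LDREX/LDXR: `core` reserves the location backed by `host_addr`.
    void Mark(std::size_t core, const void* host_addr) {
        std::lock_guard<std::mutex> lock(mutex);
        reservations[core] = host_addr;
    }

    // STREX/STXR: performs `store` only if `core` still holds the reservation.
    // Any attempt clears the caller's reservation; a successful store also
    // breaks every other core's reservation on the same location.
    template <typename StoreFn>
    bool TryExclusiveStore(std::size_t core, const void* host_addr, StoreFn&& store) {
        std::lock_guard<std::mutex> lock(mutex);
        const bool ok = host_addr != nullptr && reservations[core] == host_addr;
        reservations[core] = nullptr;
        if (ok) {
            store();  // the write happens while no other core can intervene
            for (auto& reservation : reservations) {
                if (reservation == host_addr) {
                    reservation = nullptr;
                }
            }
        }
        return ok;
    }

private:
    std::mutex mutex;
    std::array<const void*, NumCores> reservations{};
};
```

Because the key is a host pointer, two guest virtual aliases of the same page naturally share one reservation.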

AArch64: PRFUM causes exception 0

Occurs in yuzu with the game Axiom Verge.

Implement StandardFPSCRValue argument

Add a StandardFPSCRValue argument to all floating-point instructions.

This would replace the existing fpcr_controlled argument.

ARM64 Dynarec for Citra

To clarify: this thread is now dead: libretro/citra#54

Hi.

This is needed in order to have it running at (near) full speed on the following platforms:

Nintendo Switch (through Horizon OS and/or Lakka OS).

Nvidia Shield TV.

Mobile phones/tablets (Android and iOS devices).

Raspberry Pi 3B+.

If needed, a bounty can be created.

https://www.bountysource.com/issues/66791703-arm64-dynarec-for-citra

Warning C4265 when using <functional> header

Reported by @wwylele.

While Microsoft's policy is to make STL's headers clean at /W4, they have no such guarantee for off-by-default warnings like C4265. Reference.

Planned resolution:

Make codebase clean at /W4.
Enable C4265 at level 4 (workaround mentioned in link above).

x64: Optimization: Local PC cache

An instruction-local PC cache can be implemented for instructions that currently require ReturnToDispatch, to minimize returns to the dispatcher.
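A minimal sketch, assuming one small direct-mapped cache per emitted instruction site and 4-byte-aligned guest PCs. `PcCache` is a hypothetical name; on a miss the emitted code would still fall back to ReturnToDispatch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Direct-mapped cache from guest PC to host code pointer.
class PcCache {
public:
    // Returns cached host code for `pc`, or nullptr on a miss.
    const void* Lookup(std::uint64_t pc) const {
        const Entry& entry = entries[Index(pc)];
        return entry.pc == pc ? entry.host_code : nullptr;
    }

    // Fills (or replaces) the slot for `pc` after a dispatcher lookup.
    void Insert(std::uint64_t pc, const void* host_code) {
        entries[Index(pc)] = {pc, host_code};
    }

private:
    static constexpr std::size_t kSize = 64;  // must be a power of two

    static std::size_t Index(std::uint64_t pc) {
        return static_cast<std::size_t>(pc >> 2) & (kSize - 1);
    }

    struct Entry {
        std::uint64_t pc = ~std::uint64_t{0};  // sentinel: never matches
        const void* host_code = nullptr;
    };
    std::array<Entry, kSize> entries{};
};
```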

Unimplemented Instructions list

This list contains all the instructions which aren't recompiled by the JIT and are instead sent to the interpreter. To improve the performance of the JIT, these remaining instructions must be implemented.

Thumb

Exception generating

BKPT
UDF

Load/Store (System Instructions)

Parallel

QASX
QSAX
UQASX
UQSAX

Status Register Access

Get rid of `InterpreterFallback`

Would like the JIT to be standalone. This requires the remaining unimplemented instructions to be implemented.

Relatedly, we should have UndefinedInstruction and UnpredictableInstruction as callbacks to be called when an undefined or unpredictable instruction is executed.

Only builds on x86_64

For simplicity (system Boost 1.55 here) I'm building as part of Citra with dynarmic updated to git master. Both Clang 3.6 and GCC 5.4 fail at least on FreeBSD i386 9.3/10.1/11.0.

In file included from src/backend_x64/abi.cpp:19:
In file included from src/./backend_x64/abi.h:10:
src/./backend_x64/hostloc.h:93:8: fatal error: no type named 'Reg64' in namespace 'Xbyak'
Xbyak::Reg64 HostLocToReg64(HostLoc loc);
~~~~~~~^

or with add_definitions(-DXBYAK64=1)

In file included from src/backend_x64/hostloc.cpp:7:
In file included from src/./backend_x64/hostloc.h:8:
externals/xbyak/xbyak/xbyak.h:1282:54: fatal error: shift count >= width of type [-Wshift-count-overflow]
        static const size_t dummyAddr = (size_t(0x11223344) << 32) | 55667788;
                                                            ^  ~~

Note, aarch64 fails with the same error.

x64: DenormalsAreZero SSE 4.1 compatibility

Hello, my CPU is an old Q9300 (SSE 4.1).
Since the opcode linked below, "pcmpgtq", was introduced in SSE 4.2, I need an equivalent for SSE 4.1, because I receive the following error:

received signal SIGILL, Illegal instruction.

From what I understand, I should split the qword and compare the upper and lower dwords using "pcmpgtd", but I don't know how to do that with the library.

https://github.com/MerryMage/dynarmic/blob/8d1699ba2db216e569e998ea318d5cde47720e97/src/backend/x64/emit_x64_floating_point.cpp#L99

Not sure if you mind supporting SSE 4.1.
Thanks anyway for your precious time.
Karl
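The decomposition described above can be modelled in scalar C++ before writing the xbyak sequence. This is our own sketch, not dynarmic code, and `CmpGt64Via32` is a hypothetical name. The subtle part is that the low dwords must be compared as unsigned, which pcmpgtd (a signed compare) only does after flipping their sign bits.

```cpp
#include <cstdint>

// Signed 64-bit "greater than" built only from 32-bit compares, mirroring a
// per-lane pcmpgtd/pcmpeqd sequence. Returns all-ones on true, like pcmpgtq.
std::uint64_t CmpGt64Via32(std::int64_t a, std::int64_t b) {
    const auto ua = static_cast<std::uint64_t>(a);
    const auto ub = static_cast<std::uint64_t>(b);
    const auto a_hi = static_cast<std::int32_t>(ua >> 32);
    const auto b_hi = static_cast<std::int32_t>(ub >> 32);
    // Flip the sign bit so a signed compare orders the low halves as unsigned.
    const auto a_lo = static_cast<std::int32_t>(static_cast<std::uint32_t>(ua) ^ 0x80000000u);
    const auto b_lo = static_cast<std::int32_t>(static_cast<std::uint32_t>(ub) ^ 0x80000000u);
    const bool hi_gt = a_hi > b_hi;   // pcmpgtd on the high dwords
    const bool hi_eq = a_hi == b_hi;  // pcmpeqd on the high dwords
    const bool lo_gt = a_lo > b_lo;   // pcmpgtd on the biased low dwords
    return (hi_gt || (hi_eq && lo_gt)) ? ~std::uint64_t{0} : 0;
}
```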

yuzu crashes at boot in cricket 19

Providing the log below:
yuzu_log_cricket19.txt
Feel free to close if it is not dynarmic-related.

Doesn't build with system Boost 1.66

I'm testing a beta of Boost 1.66 downstream. 44e6ce3 fails to build after boostorg/optional@c695be11b56a, found via Citra package. Can you reproduce?

$ c++ -v
FreeBSD clang version 5.0.0 (tags/RELEASE_500/final 312559) (based on LLVM 5.0.0svn)
Target: x86_64-unknown-freebsd12.0
Thread model: posix
InstalledDir: /usr/bin

$ pkg install git cmake ninja boost-libs
$ git clone https://github.com/MerryMage/dynarmic
$ mkdir dynarmic_build; cd dynarmic_build
$ cmake -GNinja ../dynarmic
$ ninja
[1/36] Building CXX object src/CMakeFiles/dynarmi....dir/frontend/disassembler/disassembler_arm.cpp.o
FAILED: src/CMakeFiles/dynarmic.dir/frontend/disassembler/disassembler_arm.cpp.o
/usr/bin/c++  -DARCHITECTURE_x86_64=1 -DXBYAK_NO_OP_NAMES -I/tmp/dynarmic/src/../include -I/tmp/dynarmic/src/. -isystem /usr/local/include -I/tmp/dynarmic/externals/fmt -I/tmp/dynarmic/externals/xbyak/xbyak -Wall -Wextra -Wcast-qual -pedantic -pedantic-errors -Wfatal-errors -Wno-missing-braces -Werror -std=c++14 -MD -MT src/CMakeFiles/dynarmic.dir/frontend/disassembler/disassembler_arm.cpp.o -MF src/CMakeFiles/dynarmic.dir/frontend/disassembler/disassembler_arm.cpp.o.d -o src/CMakeFiles/dynarmic.dir/frontend/disassembler/disassembler_arm.cpp.o -c /tmp/dynarmic/src/frontend/disassembler/disassembler_arm.cpp
In file included from /tmp/dynarmic/src/frontend/disassembler/disassembler_arm.cpp:17:
/tmp/dynarmic/src/./frontend/decoder/vfp2.h:88:12: fatal error: no viable conversion from returned value of type 'optional<typename boost::decay<const Matcher<DisassemblerVisitor, unsigned int> &>::type>' to function return type 'optional<const VFP2Matcher<Dynarmic::Arm::DisassemblerVisitor> &>'
    return iter != table.end() ? boost::make_optional<const VFP2Matcher<V>&>(*iter) : boost::none;
           ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/tmp/dynarmic/src/frontend/disassembler/disassembler_arm.cpp:1073:28: note: in instantiation of function template specialization 'Dynarmic::Arm::DecodeVFP2<Dynarmic::Arm::DisassemblerVisitor>' requested here
    if (auto vfp_decoder = DecodeVFP2<DisassemblerVisitor>(instruction)) {
                           ^
/usr/local/include/boost/optional/detail/optional_reference_spec.hpp:127:5: note: candidate constructor not viable: no known conversion from 'optional<typename boost::decay<const Matcher<DisassemblerVisitor, unsigned int> &>::type>' (aka 'optional<Dynarmic::Arm::Matcher<Dynarmic::Arm::DisassemblerVisitor, unsigned int> >') to 'boost::none_t' for 1st argument
    optional(none_t) BOOST_NOEXCEPT : ptr_() {}
    ^
/usr/local/include/boost/optional/detail/optional_reference_spec.hpp:131:5: note: candidate constructor not viable: no known conversion from 'optional<typename boost::decay<const Matcher<DisassemblerVisitor, unsigned int> &>::type>' (aka 'optional<Dynarmic::Arm::Matcher<Dynarmic::Arm::DisassemblerVisitor, unsigned int> >') to 'const boost::optional<const Dynarmic::Arm::Matcher<Dynarmic::Arm::DisassemblerVisitor, unsigned int> &> &' for 1st argument
    optional(const optional& rhs) BOOST_NOEXCEPT : ptr_(rhs.get_ptr()) {}
    ^
/usr/local/include/boost/optional/detail/optional_reference_spec.hpp:165:5: note: candidate constructor not viable: no known conversion from 'optional<typename boost::decay<const Matcher<DisassemblerVisitor, unsigned int> &>::type>' (aka 'optional<Dynarmic::Arm::Matcher<Dynarmic::Arm::DisassemblerVisitor, unsigned int> >') to 'const Dynarmic::Arm::Matcher<Dynarmic::Arm::DisassemblerVisitor, unsigned int> &&' for 1st argument
    optional(T&& /* rhs */) BOOST_NOEXCEPT { detail::prevent_binding_rvalue<T&&>(); }
    ^
/usr/local/include/boost/optional/detail/optional_reference_spec.hpp:139:7: note: candidate template ignored: substitution failure [with U = boost::optional<Dynarmic::Arm::Matcher<Dynarmic::Arm::DisassemblerVisitor, unsigned int> >]: no type named 'type' in 'boost::enable_if_c<false, void>'
      optional(U& rhs, BOOST_DEDUCED_TYPENAME boost::enable_if_c<detail::is_same_decayed<T, U>::value && !detail::is_const_integral_bad_for_conversion<U>::value>::type* = 0) BOOST_NOEXCEPT
      ^                                                                                                                                                            ~~~~
/usr/local/include/boost/optional/detail/optional_reference_spec.hpp:168:65: note: candidate template ignored: disabled by 'enable_if' [with R = boost::optional<Dynarmic::Arm::Matcher<Dynarmic::Arm::DisassemblerVisitor, unsigned int> >]
        optional(R&& r, BOOST_DEDUCED_TYPENAME boost::enable_if<detail::no_unboxing_cond<T, R> >::type* = 0) BOOST_NOEXCEPT
                                                                ^
1 error generated.

Allow for W^X systems

Some systems (such as OpenBSD) enforce W^X strongly.

Tasks:

Unlock and lock code memory before and after writing to it respectively.
Xbyak provides CodeArray::protect for this purpose.
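A minimal POSIX sketch of the unlock/lock discipline. For illustration it calls mmap and mprotect directly instead of Xbyak's CodeArray::protect; `WxCodeBuffer` is a hypothetical name.

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <cstdint>

// The region is never writable and executable at the same time.
class WxCodeBuffer {
public:
    explicit WxCodeBuffer(std::size_t bytes) : size(bytes) {
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        mem = (p == MAP_FAILED) ? nullptr : static_cast<std::uint8_t*>(p);
    }
    ~WxCodeBuffer() {
        if (mem != nullptr) {
            munmap(mem, size);
        }
    }
    WxCodeBuffer(const WxCodeBuffer&) = delete;
    WxCodeBuffer& operator=(const WxCodeBuffer&) = delete;

    bool Valid() const { return mem != nullptr; }

    // Unlock: writable, not executable (call before emitting code).
    bool Unlock() { return mprotect(mem, size, PROT_READ | PROT_WRITE) == 0; }

    // Lock: executable, not writable (call after emitting code).
    bool Lock() { return mprotect(mem, size, PROT_READ | PROT_EXEC) == 0; }

    std::uint8_t* Data() { return mem; }

private:
    std::size_t size;
    std::uint8_t* mem = nullptr;
};
```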

backend_x64: Incorrect behaviour in MemoryRead

Reminder to self to fix this.

Relevant code.

Possible trashing of values in registers when cb.page_table != nullptr and the abort label was jumped to. This is due to an ABI violation (not saving caller-save registers). This case is triggered when the relevant page table entry (value in rax) is nullptr and fallback is required.

Appveyor CI

Breaking the build on Windows is probably bad.

Hint instructions

Do something with hint instructions, since they normally mark busy-wait loops.

Enabling this should be a JIT-time option in UserConfig. Options one could implement include firing a callback to ask the library user what to do next, advancing time to the next event, etc.

x64: Optimization: Delay memory writes to JitState

Keep values in host registers for as long as possible.

This allows for better code generation in some scenarios, like in those introduced by #88.

Note that the get-set elimination optimization is less general than this optimization.

Edit:

The load part of this optimization can be done at IR level. This would simplify the implementation.
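The IR-level load part could look roughly like this sketch, written against a toy IR. Inst, Opcode and the register operand are hypothetical and are not dynarmic's IR classes. The pass walks a block in order. A GetRegister whose register was already read or written earlier in the block is replaced by the known value, so only the first access actually loads from JitState.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

enum class Opcode { GetRegister, SetRegister, Add, Identity };

// Toy IR instruction: operands refer to earlier instructions by index.
struct Inst {
    Opcode op;
    int reg = -1;  // guest register for Get/SetRegister
    std::vector<std::size_t> args;
};

// Replaces redundant GetRegister instructions with Identity of the known value.
void EliminateRedundantLoads(std::vector<Inst>& block) {
    // For each guest register, the index of the instruction holding its current value.
    std::array<std::optional<std::size_t>, 16> known{};
    for (std::size_t i = 0; i < block.size(); ++i) {
        Inst& inst = block[i];
        if (inst.op == Opcode::GetRegister) {
            if (auto v = known[inst.reg]) {
                inst = Inst{Opcode::Identity, -1, {*v}};  // reuse the known value
            } else {
                known[inst.reg] = i;  // first load: later reads can reuse it
            }
        } else if (inst.op == Opcode::SetRegister) {
            known[inst.reg] = inst.args[0];  // the written value is now the register's value
        }
    }
}
```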

Integration Tests

Currently we possess a bunch of unit tests for every instruction, but there are still things we can't easily test to their full effect, such as instruction integration, optimization passes and the quality of generated blocks.

My current idea would be to generate a bunch of linker-independent code using C or direct assembly, and generate object files using GNU's assembler for the correct target architecture. The ideal test suite would create a stub client that loads the .data and .text sections, runs the code, and checks the end results of both the JIT and the interpreter.

Right now these kinds of tests are being done by hand, and it's very tedious work.
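The comparison step of such a suite could be as small as the sketch below. It assumes both backends can run the same loaded image to completion and expose their final register state. GuestState and DiffStates are hypothetical names, not existing dynarmic test helpers.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

struct GuestState {
    std::array<std::uint32_t, 16> regs{};
    std::uint32_t cpsr = 0;
};

// Returns a human-readable list of mismatches between the JIT and interpreter runs.
std::vector<std::string> DiffStates(const GuestState& jit, const GuestState& interp) {
    std::vector<std::string> diffs;
    for (std::size_t i = 0; i < jit.regs.size(); ++i) {
        if (jit.regs[i] != interp.regs[i]) {
            diffs.push_back("r" + std::to_string(i) + ": jit=" + std::to_string(jit.regs[i]) +
                            " interp=" + std::to_string(interp.regs[i]));
        }
    }
    if (jit.cpsr != interp.cpsr) {
        diffs.push_back("cpsr: jit=" + std::to_string(jit.cpsr) +
                        " interp=" + std::to_string(interp.cpsr));
    }
    return diffs;
}
```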

Ranged invalidation of code cache

Summary

The current implementation has a ClearCache() function which destroys the entire cache.
It would be nice to support more targeted invalidation of the cache for performance reasons.

Use case

Emulating a system which regularly modifies code.
A part of proper emulation for a memory management unit.
Helps with support for performant code breakpoints in the future.

Overview

Expose a new function on Dynarmic::Jit that would perform targeted invalidation. Input would either be a single address or a range of addresses in the form (start address, length).
Implement basic block invalidation in EmitX64. Note that this would require un-patching jump locations in other blocks. See EmitX64::Patch for details; unpatching would be largely the same, but in reverse.
Overwrite the x64 basic block in BlockOfCode with an appropriate string of 0xCCs.
(Optional) Destroy the entire cache if we run out of space. We currently allocate 128MB of code memory, this may not be sufficient if we're invalidating code blocks all the time.
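The lookup and overwrite steps might be sketched as follows. The sketch assumes blocks are tracked in a std::map keyed by guest start address, each holding an exclusive guest end address and an offset/size into a host code buffer. BlockInfo and InvalidateRange are hypothetical names, and unpatching of incoming jumps is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <vector>

struct BlockInfo {
    std::uint32_t guest_end;  // exclusive end of the guest code this block covers
    std::size_t host_offset;  // where the block's host code lives in the buffer
    std::size_t host_size;
};

// Invalidates every block whose guest range overlaps [start, start + length),
// overwriting its host code with 0xCC (int3) so stray jumps into it trap.
// A linear scan is used because a block starting before `start` may still overlap.
std::size_t InvalidateRange(std::map<std::uint32_t, BlockInfo>& blocks, std::vector<std::uint8_t>& code,
                            std::uint32_t start, std::uint32_t length) {
    const std::uint64_t end = std::uint64_t{start} + length;
    std::size_t removed = 0;
    for (auto it = blocks.begin(); it != blocks.end();) {
        const bool overlaps = it->first < end && it->second.guest_end > start;
        if (overlaps) {
            std::memset(code.data() + it->second.host_offset, 0xCC, it->second.host_size);
            it = blocks.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```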

A64: FDIV

Results do not agree between dynarmic and Unicorn. Requires investigation.

Verify return stack buffer is being used in all possible places

An RSB pop seems to be missing in thumb16_POP.

An RSB push should be performed when executing (in both ARM and Thumb modes):

BL instruction
BLX instruction

An RSB pop should be performed when executing:

MOV pc, lr instruction (ARM and Thumb)
LDR pc instruction (ARM)
LDM sp, {.. pc} instruction (ARM)
BX r14 instruction (ARM and Thumb)
POP instruction (Thumb)
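For reference, a return stack buffer is essentially a small circular stack of predicted return targets. This sketch assumes 8 entries and stores only guest return addresses; a real implementation would also cache the host code pointer. ReturnStackBuffer is a hypothetical name. A push happens on BL/BLX and a pop on any of the return idioms listed above. A popped prediction must still be compared with the actual target, and a mismatch falls back to normal dispatch.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

class ReturnStackBuffer {
public:
    static constexpr std::size_t kSize = 8;

    // Called on BL/BLX with the address of the instruction after the call.
    void Push(std::uint32_t return_address) {
        ptr_ = (ptr_ + 1) % kSize;  // the oldest entry is silently overwritten
        entries_[ptr_] = return_address;
    }

    // Called on a return; yields the predicted target (if any) and discards it.
    std::optional<std::uint32_t> Pop() {
        std::optional<std::uint32_t> predicted = entries_[ptr_];
        entries_[ptr_].reset();
        ptr_ = (ptr_ + kSize - 1) % kSize;
        return predicted;
    }

private:
    std::array<std::optional<std::uint32_t>, kSize> entries_{};
    std::size_t ptr_ = 0;
};
```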

Potential regression in yuzu with Minecraft: Story Mode

Commit 2f567fe seems to introduce a regression in Minecraft: Story Mode. Prior to this commit, the game would boot to the title screen. After this commit, the game seems to softlock at the first loading sequence.

x64: Optimization: Use host flags

Emitted code currently stores individual flags in their own host GPR registers. This is an absurd waste of registers.

Three things should be done about this:

Store the flags in a compact format instead of in individual registers. The lahf/sahf-compatible format (captured with lahf; seto al) is a potential candidate.
Use existing values in host flags instead of reloading them from stored values. An instruction sequence like add eax, 1; setc bl; bt eax, 0; adc eax, ecx is frankly absurd.
Move instructions that depend on flags closer to where flags are produced so that register storage becomes unnecessary.

Tangential issue: Semi-relatedly, the way CPSR is updated can be optimized. (For example, pext eax, eax, 0xC101; shl eax, 28)
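The pext example can be checked in portable C++. The sketch assumes the compact format is the result of lahf; seto al. Under that assumption eax holds SF at bit 15, ZF at bit 14, CF at bit 8 and OF at bit 0; bit 9 is the always-set reserved flag bit copied by lahf. The mask 0xC101 selects exactly those four bits. pext has no immediate form, so real code would load the mask into a register, or use _pext_u32 from <immintrin.h> on BMI2 hosts. Pext32, LahfSetoImage and ToCpsrNzcv are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Software equivalent of BMI2 pext: gathers the bits of src selected by mask
// into the low bits of the result, preserving their order.
std::uint32_t Pext32(std::uint32_t src, std::uint32_t mask) {
    std::uint32_t result = 0;
    std::uint32_t out_bit = 1;
    for (std::uint32_t bit = 1; bit != 0; bit <<= 1) {
        if (mask & bit) {
            if (src & bit) result |= out_bit;
            out_bit <<= 1;
        }
    }
    return result;
}

// Builds the eax image produced by `lahf; seto al` (AF and PF left clear).
std::uint32_t LahfSetoImage(bool sf, bool zf, bool cf, bool of) {
    const std::uint32_t ah = (static_cast<std::uint32_t>(sf) << 7) | (static_cast<std::uint32_t>(zf) << 6) |
                             (1u << 1) | static_cast<std::uint32_t>(cf);
    return (ah << 8) | static_cast<std::uint32_t>(of);
}

// Converts the compact host-flag image to ARM NZCV in CPSR bits 31..28.
std::uint32_t ToCpsrNzcv(std::uint32_t eax) {
    return Pext32(eax, 0xC101) << 28;  // bit 0 (OF)->V, 8 (CF)->C, 14 (ZF)->Z, 15 (SF)->N
}
```

After a subtraction, x86's CF is the inverse of ARM's C flag (borrow versus not-borrow). A real implementation would need to account for that before or after the extraction.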

RegAlloc: Automate transfer between GPR and XMM registers

There is currently quite a lot of manual transfer going on in emit_x64.cpp between GPRs and XMM registers, especially for the parallel instruction implementations. This is a wasteful use of FP bandwidth, especially if a series of parallel instructions is executed in a row.

The register allocator could be more intelligent in this respect.

Potential issues: The register allocator currently assumes that all Values under its management in XMM and GPR registers must be of type F32/F64 only and U8/U16/U32/U64 only respectively.
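The sketch below shows roughly what automating this could mean. The allocator records which register class currently holds a value and emits a transfer only when a use demands the other class. RegClass, ValueLocation, UseIn and CountTransfers are hypothetical, and emitted moves are only counted. Three consecutive XMM uses of a GPR value then cost one move, whereas moving to XMM and back inside each of three parallel instructions would cost six.

```cpp
#include <cassert>
#include <vector>

enum class RegClass { Gpr, Xmm };

// Hypothetical allocator-side record of where a value currently lives.
struct ValueLocation {
    RegClass current;
};

// Ensures the value is in `wanted`; returns true if a GPR<->XMM move
// (movd/movq) would have to be emitted.
bool UseIn(ValueLocation& loc, RegClass wanted) {
    if (loc.current == wanted) {
        return false;  // already in the right class, nothing to emit
    }
    loc.current = wanted;
    return true;
}

// Counts the transfers needed for a sequence of uses of a single value.
int CountTransfers(RegClass start, const std::vector<RegClass>& uses) {
    ValueLocation loc{start};
    int transfers = 0;
    for (RegClass use : uses) {
        transfers += UseIn(loc, use) ? 1 : 0;
    }
    return transfers;
}
```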

Optimization: Support non-canonical representation for values

Note to self.

This would be beneficial when the canonical representation is inefficient due to a constant need to convert to and from that representation.

Examples of this include:

GE Flags -> These could remain as expanded XMM masks until storage in CPSR is necessary.
NZCV Flags -> These could be kept in a lahf/sahf-compatible format until conversion is necessary.
In some parallel instructions, especially in the implementation of saturating instructions, supporting a non-canonical expanded representation can be helpful.
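To illustrate the GE example: CPSR keeps GE[3:0] in bits 19..16, one bit per byte lane. An expanded form instead gives each byte lane 0xFF or 0x00, the same shape as SIMD compare results. The sketch below converts in both directions using the hypothetical helpers ExpandGe, CompressGe and WithGe. The compress direction mirrors what pmovmskb does for the low four bytes.

```cpp
#include <cassert>
#include <cstdint>

// Expands GE[3:0] into a per-byte mask: byte i is 0xFF if GE[i] is set.
std::uint32_t ExpandGe(std::uint32_t ge) {
    std::uint32_t mask = 0;
    for (int i = 0; i < 4; ++i) {
        if (ge & (1u << i)) mask |= 0xFFu << (8 * i);
    }
    return mask;
}

// Compresses a per-byte mask back to GE[3:0] using each byte's top bit.
std::uint32_t CompressGe(std::uint32_t mask) {
    std::uint32_t ge = 0;
    for (int i = 0; i < 4; ++i) {
        if (mask & (0x80u << (8 * i))) ge |= 1u << i;
    }
    return ge;
}

// Stores GE[3:0] into CPSR bits 19..16, leaving the other bits untouched.
std::uint32_t WithGe(std::uint32_t cpsr, std::uint32_t ge) {
    return (cpsr & ~0x000F0000u) | ((ge & 0xFu) << 16);
}
```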

Optimization: Ranged invalidation

@Phanto-m has a neat but probably incorrect implementation of a better optimized ranged invalidation: Phanto-m@0b522d4

Suggestion:

While I'd been trying to avoid boost::icl, it probably does provide the easiest way to speed this up.

One would need to add a split_interval_map<u32, u64> block_interval_map to EmitX64. This would map block PC ranges to UniqueHashes.

The input would be a boost::icl::interval_set<u32> invalid_intervals.

One would find the list of invalidated blocks by computing block_interval_map & invalid_intervals, i.e. their intersection. One would then need to deduplicate this list of blocks (perhaps by inserting them into a std::set), then invalidate each block.
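For comparison, the same idea can be sketched with only the standard library, without Boost.ICL: intersect block ranges with the invalid intervals, then deduplicate the hits before invalidating. Intervals are assumed half-open [first, second). BlockRange, Overlaps and InvalidatedBlocks are hypothetical stand-ins for the split_interval_map entries. The nested loop is O(blocks × intervals), which an interval map would avoid.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>
#include <vector>

using Interval = std::pair<std::uint32_t, std::uint32_t>;  // half-open [first, second)

struct BlockRange {
    Interval pc_range;
    std::uint64_t unique_hash;
};

bool Overlaps(const Interval& a, const Interval& b) {
    return a.first < b.second && b.first < a.second;
}

// Returns each affected block exactly once, even if it overlaps several
// invalid intervals; the std::set does the deduplication.
std::set<std::uint64_t> InvalidatedBlocks(const std::vector<BlockRange>& blocks,
                                          const std::vector<Interval>& invalid_intervals) {
    std::set<std::uint64_t> result;
    for (const BlockRange& block : blocks) {
        for (const Interval& bad : invalid_intervals) {
            if (Overlaps(block.pc_range, bad)) {
                result.insert(block.unique_hash);
            }
        }
    }
    return result;
}
```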

boost::optional error on gcc 6.3.1, glibc 2.25

This prevents Citra from building on amd64:

externals/dynarmic/src/backend_x64/emit_x64.cpp: In member function ‘void Dynarmic::BackendX64::EmitX64::EmitCoprocLoadWords(Dynarmic::IR::Block&, Dynarmic::IR::Inst*)’:
externals/dynarmic/src/backend_x64/emit_x64.cpp:3178:25: error: ‘*((void*)& option +1)’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
     boost::optional<u8> option{has_option, coproc_info[5]};
                         ^~~~~~

Optimization: Multiple exits in basic blocks

Currently we only allow a single exit in a basic block.

Allowing for multiple exits would allow for longer basic blocks and the generation of better code.

Requirements:

A revamp/removal of the concept of terminal instructions is required.
A new IR instruction that conditionally jumps out of a block.

N.B.: These "longer basic blocks" no longer fit the definition of basic blocks, but are rather more like traces.

Edit: Dolphin calls this "conditional continue".
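A toy trace interpreter shows the difference. ExitIf is a hypothetical stand-in for the proposed IR instruction that conditionally jumps out of a block; AddImm, Op and RunTrace are likewise hypothetical. When an exit's condition holds, execution leaves with that exit's target. Otherwise it continues with the rest of the trace instead of the block ending there.

```cpp
#include <cassert>
#include <cstdint>
#include <variant>
#include <vector>

struct AddImm { int reg; std::int32_t imm; };      // reg += imm
struct ExitIf { int reg; std::uint32_t target; };  // leave the trace if reg == 0

using Op = std::variant<AddImm, ExitIf>;

// Runs a trace that may contain several exits; returns the guest PC to continue at.
std::uint32_t RunTrace(const std::vector<Op>& trace, std::uint32_t fallthrough_pc,
                       std::vector<std::int32_t>& regs) {
    for (const Op& op : trace) {
        if (const auto* add = std::get_if<AddImm>(&op)) {
            regs[add->reg] += add->imm;
        } else if (const auto* side_exit = std::get_if<ExitIf>(&op)) {
            if (regs[side_exit->reg] == 0) {
                return side_exit->target;  // conditional early exit
            }
        }
    }
    return fallthrough_pc;  // the trace's final terminal
}
```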

CPU JIT bug in Nintendogs + Cats: Toy Poodle & New Friends (US region)

Minimal changes needed to run the game (async events + stub mic:u)
https://github.com/mailwl/citra/tree/toys

With arm_dyncom, the game works (slowly).
With arm_dynarmic, the game jumps into the data section at 0x7711c4 at this line:
.text:00423C34 BLX R1 ; R1 should be 0x42f798
It looks like R1 contains the wrong address.
Full function:

.text:00423C20 sub_423C20
.text:00423C20                 STMFD           SP!, {R4,LR}
.text:00423C24                 LDR             R0, [R0,#4]
.text:00423C28                 LDR             R0, [R0,#0xA0]
.text:00423C2C                 LDR             R1, [R0]
.text:00423C30                 LDR             R1, [R1,#8]
.text:00423C34                 BLX             R1      ; R1 should be 0x42f798
.text:00423C38                 VLDR            S1, =0.001
.text:00423C3C                 VMUL.F32        S0, S0, S1
.text:00423C40                 LDMFD           SP!, {R4,PC}
.text:00423C40 ; End of function sub_423C20

Remove v6K-isms from IR

Prefer v8-isms.

The primary example of this is the fpscr_controlled parameter.

backend_x64: Implement a constant pool

We currently have a fixed number of constants, generated by BlockOfCode::GenConstants, available for use. This could be done in a more dynamic fashion: a fixed constant pool requires more maintenance overhead than simply specifying the constants we need inline in each Emit* function.

Use case:

The primary use case for constants is loading them into SSE/AVX registers and passing them as arguments to SSE/AVX operations.

Issues:

One may prefer to load constants by using instruction sequences instead of using memory bandwidth. Some constants one may want to use include:

// 0x0000000000000000 (SSE2)
pxor xmm0, xmm0

// 0xFFFFFFFFFFFFFFFF (SSE2)
pcmpeqw xmm0, xmm0

// 0x0101010101010101 (SSSE3)
pcmpeqw xmm0, xmm0
pabsb xmm0, xmm0

// 0x0101010101010101 (SSE2)
pcmpeqw xmm0, xmm0
psrlw xmm0, 15
packuswb xmm0, xmm0

// 0x8080808080808080 (SSE2)
pcmpeqw xmm0, xmm0
psllw xmm0, 15
packsswb xmm0, xmm0

// 0x80000000 (fp single sign) (SSE2)
pcmpeqw xmm0, xmm0
pslld xmm0, 31
// and similar with psllq for double

// 0x7FFFFFFF (fp single non-sign) (SSE2)
pcmpeqw xmm0, xmm0
psrld xmm0, 1
// and similar with psrlq for double

// Most other fp constants can be generated by 
// combinations of pslld and psrld.
// Exceptions are:
// -2147483648.0, 2147483647.0, 4294967295.0
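Sequences like these can be checked without an x86 host by emulating the relevant SSE2 operations lane by lane. Xmm and the helper functions below are hypothetical software models (little-endian lane order), written only to confirm what each sequence produces; they are not an existing API. They confirm that pcmpeqw; psrlw 15; packuswb yields 0x01 in every byte and pcmpeqw; psllw 15; packsswb yields 0x80 in every byte. By contrast, packsswb applied directly to all-ones words leaves every byte as 0xFF.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>

using Xmm = std::array<std::uint8_t, 16>;  // byte lanes, lane 0 = least significant

std::uint16_t Word(const Xmm& x, int i) {
    return static_cast<std::uint16_t>(x[2 * i] | (x[2 * i + 1] << 8));
}
void SetWord(Xmm& x, int i, std::uint16_t w) {
    x[2 * i] = static_cast<std::uint8_t>(w & 0xFF);
    x[2 * i + 1] = static_cast<std::uint8_t>(w >> 8);
}

// pcmpeqw x, x: every word equals itself, so the result is all ones
// regardless of the register's previous contents.
void PcmpeqwSelf(Xmm& x) { x.fill(0xFF); }

void Psrlw(Xmm& x, int n) {
    for (int i = 0; i < 8; ++i) SetWord(x, i, static_cast<std::uint16_t>(Word(x, i) >> n));
}
void Psllw(Xmm& x, int n) {
    for (int i = 0; i < 8; ++i) SetWord(x, i, static_cast<std::uint16_t>(Word(x, i) << n));
}

// packuswb x, x: each signed word saturated to an unsigned byte; the low and
// high halves of the result both come from x.
void PackuswbSelf(Xmm& x) {
    Xmm r{};
    for (int i = 0; i < 8; ++i) {
        const int w = static_cast<std::int16_t>(Word(x, i));
        r[i] = r[i + 8] = static_cast<std::uint8_t>(std::clamp(w, 0, 255));
    }
    x = r;
}

// packsswb x, x: each signed word saturated to a signed byte.
void PacksswbSelf(Xmm& x) {
    Xmm r{};
    for (int i = 0; i < 8; ++i) {
        const int w = static_cast<std::int16_t>(Word(x, i));
        r[i] = r[i + 8] = static_cast<std::uint8_t>(std::clamp(w, -128, 127));
    }
    x = r;
}

bool AllBytes(const Xmm& x, std::uint8_t v) {
    return std::all_of(x.begin(), x.end(), [v](std::uint8_t b) { return b == v; });
}
```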

A64: Vector floating point NaN handling

NaN handling isn't accurate enough at the moment.

Example testcase:

TEST_CASE("A64: FABD", "[a64]") {
    TestEnv env;
    Dynarmic::A64::Jit jit{Dynarmic::A64::UserConfig{&env}};

    env.code_mem[0] = 0x6eb5d556; // FABD.4S V22, V10, V21
    env.code_mem[1] = 0x14000000; // B .

    jit.SetPC(0);
    jit.SetVector(10, {0xb4858ac77ff39a87, 0x9fce5e14c4873176});
    jit.SetVector(21, {0x56d3f085ff890e2b, 0x6e4b0a41801a2d00});

    env.ticks_left = 2;
    jit.Run();

    REQUIRE(jit.GetVector(22) == Vector{0x56d3f0857fc90e2b, 0x6e4b0a4144873175});
}
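The expected value in lane 0 follows from the AArch64 NaN-propagation rule: the FPProcessNaNs step of the Arm pseudocode, with FPCR.DN clear. A signalling NaN operand takes priority over a quiet one, and the chosen NaN is returned with its quiet bit set. Here operand 1 (0x7ff39a87) is a quiet NaN and operand 2 (0xff890e2b) is a signalling NaN. The subtraction therefore yields 0xffc90e2b, and FABD's absolute value clears the sign, giving 0x7fc90e2b. A single-precision sketch of that rule, using hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

constexpr std::uint32_t kQuietBit = 0x00400000;

bool IsNaN(std::uint32_t f) { return (f & 0x7F800000) == 0x7F800000 && (f & 0x007FFFFF) != 0; }
bool IsSignalingNaN(std::uint32_t f) { return IsNaN(f) && (f & kQuietBit) == 0; }

// NaN propagation for two single-precision operands (FPCR.DN = 0): signalling
// NaNs win over quiet NaNs, operand 1 wins ties, and the result is quieted.
std::optional<std::uint32_t> ProcessNaNs(std::uint32_t op1, std::uint32_t op2) {
    if (IsSignalingNaN(op1)) return op1 | kQuietBit;
    if (IsSignalingNaN(op2)) return op2 | kQuietBit;
    if (IsNaN(op1)) return op1;
    if (IsNaN(op2)) return op2;
    return std::nullopt;  // no NaN input: the arithmetic result is computed normally
}

// FABD with a NaN input: the propagated NaN with its sign bit cleared (FPAbs).
std::optional<std::uint32_t> FabdNaNResult(std::uint32_t op1, std::uint32_t op2) {
    if (auto nan = ProcessNaNs(op1, op2)) return *nan & 0x7FFFFFFF;
    return std::nullopt;
}
```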

Detect overflow of `BlockOfCode`

Overflow occurs when near_code_ptr gets close to far_code_begin or far_code_ptr gets close to the end of the allocated region.

When this occurs a cache clear should be performed.
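A check along these lines could run before emitting each block. The layout follows the description above. Near code grows from the start of the region towards far_code_begin, and far code grows from far_code_begin towards the end of the region. Positions are byte offsets from the start of the region for simplicity. CodeSpace, NeedsCacheClear and the kMaxBlockSize margin are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical worst-case size of one emitted block, used as a safety margin.
constexpr std::size_t kMaxBlockSize = 64 * 1024;

struct CodeSpace {
    std::size_t near_code_ptr;   // next free byte in the near region
    std::size_t far_code_begin;  // start of the far region (end of the near region)
    std::size_t far_code_ptr;    // next free byte in the far region
    std::size_t total_size;      // end of the allocated region

    // True if either region is too close to its limit to safely emit another block.
    bool NeedsCacheClear() const {
        return far_code_begin - near_code_ptr < kMaxBlockSize ||
               total_size - far_code_ptr < kMaxBlockSize;
    }
};
```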

Relicensing Permission: GPLv2+ → 0BSD

tl;dr: Please reply to this issue with a statement saying "Yes, I give my permission to relicense my contributions to dynarmic under 0BSD."

I am planning on relicensing dynarmic under a permissive license (0BSD). Since dynarmic contains contributions from more people than myself, I am seeking permission to do so from all who have contributed. You can do so by replying to this GitHub issue.

Why switch from GPL?

Relicensing from GPL will allow dynarmic to be used in more contexts and by more people – GPL essentially forces users of this library to license the resultant combined binary under the GPL.

Since 0BSD is a very liberal license, I do not foresee having to relicense again in the future.

What if someone doesn't agree?

I hope that people will agree to this, but that is by no means a certainty, nor is everyone guaranteed to be contactable. I will remove code made by people who do not agree or are uncontactable, and expunge their commits from the git history.