
uam's Introduction

UAM - deko3d shader compiler

Usage: uam [options] file
Options:
  -o, --out=<file>   Specifies the output deko3d shader module file (.dksh)
  -r, --raw=<file>   Specifies the file to which output raw Maxwell bytecode
  -t, --tgsi=<file>  Specifies the file to which output intermediary TGSI code
  -s, --stage=<name> Specifies the pipeline stage of the shader
                     (vert, tess_ctrl, tess_eval, geom, frag, comp)
  -v, --version      Displays version information

UAM is the shader compiler designed to produce precompiled DKSH shaders usable with the deko3d graphics API, specifically for the Nvidia Tegra X1 processor found inside the Nintendo Switch.

UAM is based on mesa's GLSL parser and TGSI infrastructure, as well as nouveau's nv50_ir code generation backend. As such, it inherits all the capabilities and the feature set (GLSL extension support) offered by mesa/nouveau for GM20x GPUs. In addition, there are a number of customizations and codegen improvements that produce code better suited for use with deko3d.

Differences with standard GL and mesa/nouveau

  • The DEKO3D preprocessor symbol is defined, with a value of 100.
  • UBO, SSBO, sampler and image bindings are required to be explicit (i.e. layout (binding = N)), and they have a one-to-one correspondence with deko3d bindings. Failure to specify explicit bindings will result in an error.
  • There is support for 16 UBOs, 16 SSBOs, 32 "samplers" (combined image+sampler handle), and 8 images for each and every shader stage, with binding IDs ranging from zero to the corresponding limit minus one. Note, however, that due to hardware limitations, only compute stage UBO bindings 0-5 are natively supported, while 6-15 are emulated as "SSBOs".
  • Default uniforms outside UBO blocks (which end up in the internal driver const buffer) are detected, however they are reported as an error due to lack of support in both DKSH and deko3d for retrieving the location of and setting these uniforms.
  • Internal deko3d constbuf layout and numbering schemes are used, as opposed to nouveau's.
  • gl_FragCoord always uses the Y axis convention specified in the flags during the creation of a deko3d device. layout (origin_upper_left) has no effect whatsoever and produces a warning, while layout (pixel_center_integer) is not supported at all and produces an error.
  • Integer divisions and modulo operations with non-constant divisors decay to floating point division, and generate a warning. Well-written shaders should avoid these operations for performance and accuracy reasons. (Also note that unmodified nouveau, in order to comply with the GL standard, emulates integer division/modulo with a software routine that has been removed in UAM.)
  • 64-bit floating point divisions and square roots can only be approximated with native hardware instructions. This results in loss of accuracy, and as such these operations should be avoided, and they generate a warning as well. (Also note that likewise, unmodified nouveau uses a software routine that has been removed in UAM)
  • Transform feedback is not supported.
  • GLSL shader subroutines (ARB_shader_subroutine) are not supported.
  • There is no concept of shader linking. Separable programs (ARB_separate_shader_objects) are always in effect.
  • The compiler is based on mesa 19.0.8 sources; however several cherrypicked bugfixes from mesa 19.1 and up have been applied.
  • Numerous codegen differences:
    • Added Maxwell dual issue scheduling support based on the groundwork laid out by karolherbst's dual_issue_v3 branch, and enhanced with new experimental findings.
    • Removed bound checks in SSBO accesses.
    • Removed bound checks in atomic accesses.
    • Removed bound checks in image accesses.
    • Multisampled texture lookups use optimized bitwise logic with hardcoded sample positions instead of requiring helper data in the driver constbuf.
    • Multisampled image operations use TXQ instead of requiring helper data in the driver constbuf.
    • Non-bindless image operations are supported natively instead of being emulated with bindless operations.
    • SSBO size calculations use unsigned math instead of signed math, which results in better codegen.
    • ballotARB() called with a constant argument now results in optimal codegen using the PT predicate register.
    • Bugfixes:
      • Bindless texture queries were broken.
      • IMAD instruction encoding with negated operands was broken.
    • Minor changes done to match properties observed in official shader code:
      • MOV Rd,RZ is now preferred to MOV32I Rd,0.
      • LDG/STG instructions are used for SSBO accesses instead of LD/ST.
      • Shader programs are properly padded out to a size that is a multiple of 64 bytes.
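
Three of the rules in the list above can be sketched in a few lines of Python: the per-stage binding limits (including the native/emulated split for compute stage UBOs), the 64-byte padding of shader programs, and the accuracy loss when an integer division decays to floating point. This is only an illustration and not UAM's code. All function names are hypothetical, and the division model (convert both operands to 32-bit floats, divide, truncate) is an assumption made to show why accuracy suffers; the instruction sequence UAM actually emits may differ.

```python
import struct

# Per-stage binding limits quoted in the list above.
BINDING_LIMITS = {"ubo": 16, "ssbo": 16, "sampler": 32, "image": 8}


def check_binding(kind: str, binding: int) -> None:
    """Raise ValueError if `binding` is outside 0 .. limit-1 for `kind`."""
    limit = BINDING_LIMITS[kind]
    if not 0 <= binding < limit:
        raise ValueError(f"{kind} binding {binding} out of range 0..{limit - 1}")


def compute_ubo_is_native(binding: int) -> bool:
    """Compute stage UBO bindings 0-5 are native; 6-15 are emulated as SSBOs."""
    check_binding("ubo", binding)
    return binding <= 5


def padded_size(size: int, alignment: int = 64) -> int:
    """Round a program size in bytes up to the next multiple of `alignment`."""
    return (size + alignment - 1) // alignment * alignment


def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float_int_div(a: int, b: int) -> int:
    """ASSUMPTION: model the decay as convert-to-f32, divide, truncate."""
    quotient = to_f32(to_f32(float(a)) / to_f32(float(b)))
    return int(quotient)
```

In this model, float_int_div(16777217, 1) returns 16777216 rather than 16777217, because 16777217 has no exact 32-bit float representation. Small operands such as float_int_div(7, 2) still give the expected 3.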

uam's People

Contributors

fincs


uam's Issues

Scheduling, texture names, optimization, and other questions.

Hello!

I've been trying to use Uam to reinsert customized shaders into Smash Brothers Ultimate. (All the shaders are in raw binary btw, and I've been using the associated option)

While your tool has been a godsend, I do have quite a few questions on how it works, its limitations, etc. If you have a better way for me to reach ya, please let me know, but for now:

ONE - SCHEDULING;

This is the big one. Now, Uam is definitely really helpful. I tried to write a simple shader a while back, by converting my own code to raw bytes- but figuring out how to write the scheduling instructions kinda killed it for me. And if I ignored them, it gave me weird and awful results in game.

But when I tried using Uam, the scheduling instructions seemed to work like a dream.

However...it isn't perfect, and I'm unsure of its limitations.

I've recently annotated a giant shader with hundreds of variables, and as a test, I modified it to fit within the length limit, before converting it via Uam. And after some tweaking to get the texture names and calls to c1 right, I put it back into the game.

And....it MOSTLY worked like I expected it to.

Mostly being the operative word.

While in general it was what I expected, with big enough models, some of it would have...the best way I can describe it, is static. It looks similar to Z-fighting (though I doubt it is Z-fighting, given I didn't do any model editing), and is something I think I saw when I was trying to write the binary manually.

In other words, while it's mostly accurate, that "static" of sorts is still there, which I think is likely to do with scheduling.

Let me be clear: I am not mocking Uam for being bad at scheduling or anything. It's still a godsend, something I could only dream of writing, and it's really difficult in general to not only reverse engineer formats, but make your own compilers.

However, given that, I do want to ask about well....the more specific strengths and weaknesses, boundaries and limitations, of Uam's scheduling.
Currently the only way for me to figure out those things is by trial and error, so I'd like to ask about them directly. What can it do, what can't it do, and what are the best ways to make sure no scheduling issues occur?

That leads us to

TWO: OPTIMIZATION

All the questions I have about the scheduling also apply to optimization.

Smash Ultimate's Shaders have a length limit per shader, and nobody has figured out how to change said limit yet. And usually, converting the binary to glsl, and then converting that glsl back to binary (via Uam) ends up with binary that's longer than the original.

Which again, I fully get. We aren't going to get as optimized as the original tools for a LONG time, that's a given.

However, when I was cutting down the annotated code mentioned earlier in order to fit within said length limit (by making it only use 1 channel from the actual color texture), I noticed that it was still a bit above what I expected from Uam in terms of size.

But on a hunch, I tried rewriting it - specifically by making more things use FMA - and it started to shrink down. Eventually, to the point that I was able to get it back to the length limit.

But like scheduling, that begs the question of how it works. What are the best optimizations for the compiler? What tips should I know when writing code for it? What does it already do in terms of optimization? What shouldn't I do? Etc etc etc.

Like with scheduling, the only way I can tell is through trial and error, so I would rather try asking directly.

Again, not knocking Uam here, I just want to ask for the best methods to use in terms of that.

and finally (for now)

THREE: TEXTURE NAMES

For some reason, Uam does seem to mess up the texture names. The binary to glsl converter I'm working with, ryujinx.shadertools, has all texture names converted to "fp_tex_tcb_X", with X being the value it's marked with in the code. But when I compile something with Uam and view it in envydis or convert it via ryujinx.shadertools, it has the format of "1AY", with Y being a completely different value.

Now, while I got that fp_tex_tcb_X format wouldn't be read, I then tried tex_X, textureX, and texture_X, and it still didn't work. So I have no clue what I should write in order for uam to compile it with the correct values.

So I do want to ask how that works, and how I should name em in order to get those correct values.

CONCLUSION:

I'm sorry for the length of this, I am....very long winded as a person. But yeah, I wanted to actually just try to ask all this and hope I get an answer. Please LMK if there is a better way to contact you or ask this sorta thing. Sent ya a twitter follow, if that helps.

And apologies if there are other people working on Uam besides Fincs. They're the only one I saw listed besides DevKitPro themselves, but yeah.

Either way, thank you, and I hope to hear back from you soon.

-Trainmaster
