Git Product home page Git Product logo

Comments (16)

yoanlcq avatar yoanlcq commented on August 11, 2024 4

I tried to understand the issue with the cube sample. It turns out that sceGumPerspective() is the culprit in this particular case, because the remainder of the code works just fine.
Since the sample links against pspgum and not pspgum_vfpu, the source for sceGumPerspective() is the following :

void sceGumPerspective(float fovy, float aspect, float near, float far)
{
	float angle = (fovy / 2) * (GU_PI/180.0f);
	float cotangent = cosf(angle) / sinf(angle);
	float delta_z = near-far;
	register ScePspFMatrix4* t __asm("a0") = GUM_ALIGNED_MATRIX();

	memset(t,0,sizeof(ScePspFMatrix4));
	t->x.x = cotangent / aspect;
	t->y.y = cotangent;
	t->z.z = (far + near) / delta_z; // -(far + near) / delta_z
	t->w.z = 2.0f * (far * near) / delta_z; // -2 * (far * near) / delta_z
	t->z.w = -1.0f;
	t->w.w = 0.0f;

	sceGumMultMatrix(t);
}

With

// these macros are because GCC cannot handle aligned matrices declared on the stack
#define GUM_ALIGNED_MATRIX() (ScePspFMatrix4*)((((unsigned int)alloca(sizeof(ScePspFMatrix4)+64)) + 63) & ~63)
#define GUM_ALIGNED_VECTOR() (ScePspFVector4*)((((unsigned int)alloca(sizeof(ScePspFVector4)+64)) + 63) & ~63)

Debugging in psp-gdb with PSPLINK shows that a store exception happens at the assignment immediately following the call to memset(). From the disassembly, I'm assuming that alloca() did resolve to GCC's __builtin_alloca(), because there's no jal alloca instruction. I also see that the call to memset() was indeed compiled as a library call, and was not inlined by GCC.

Here's the thing: newlib's memset() clobbers the a0 register, meaning that t points to garbage after the call. PSPSDK libc's memset() does not; I've reached this conclusion by looking at the disassembly of both functions in psp-gdb.

It looks to me like the fix would be to remove the __asm("a0") part when declaring t. Why would that be so important? In fact I don't see the point in doing an alloca() either. ScePspFMatrix4 is declared with an alignment of 16. If "GCC cannot handle aligned matrices declared on the stack" as the comment says, then isn't all user code screwed anyway? That looks unlikely to me.

What I would do is simply declare t just like a regular variable, and spare the call to memset() altogether via zero-initialization. Is there something critical I'm missing?

Worth noting is that sceGumOrtho() does a similar trick.

from pspsdk.

albe avatar albe commented on August 11, 2024 4

"raphael" here from old ps2dev days. I spent a couple years playing with digging into the VFPU opcodes and also added some GE related code to the pspsdk.

From what I recall from back then (that's been ages 😱), GCC had a couple of bugs, especially with alignment of variables on stack. Since the VFPU, but also the GE was pretty strict with the alignment requirements, this hack with aligning the matrix with alloca was necessary, though I'm not sure why the tacking onto the "a0" register was in place.

from pspsdk.

dialupnoises avatar dialupnoises commented on August 11, 2024 2

Can confirm that removing that __asm("a0") bit fixes the issue.

from pspsdk.

carstene1ns avatar carstene1ns commented on August 11, 2024

Just a quick guess:
Did you build them with a (recent) toolchain version that has the fix for double precision crash?
See pspdev/psptoolchain#85 and pspdev/psptoolchain#65 for reference.

from pspsdk.

ijacquez avatar ijacquez commented on August 11, 2024

I just double checked, and I'm at a435ee0df19d8e07fa4c0a01f1f47fb7aeca834d, which is the HEAD.

from pspsdk.

yoanlcq avatar yoanlcq commented on August 11, 2024

This looks similar to the situation I was in when I started out (that is, recently), and I remember that setting USE_PSPSDK_LIBC = 1 in the Makefiles (or passing it as argument to make) did the trick for me.
Some seemingly basic floating-point operations caused a crash otherwise (these were easy to figure out by sprinkling printf() calls through one of the samples).

I didn't dig too deep and this does sound like a bug, but hopefully that could help.

(edit: I'm always building with PSP_FW_VERSION = 635.)

from pspsdk.

ijacquez avatar ijacquez commented on August 11, 2024

Thanks for that. I'll give it a try.

from pspsdk.

ijacquez avatar ijacquez commented on August 11, 2024

First run of cube.prx and a few other tests I've marked as failed do work.

Yeah, this may be a bug. Can anyone confirm?

from pspsdk.

carstene1ns avatar carstene1ns commented on August 11, 2024

So, something wrong with our newlib? Very strange.

from pspsdk.

ijacquez avatar ijacquez commented on August 11, 2024

I haven't checked, but Newlib being built with floating point support?

from pspsdk.

ijacquez avatar ijacquez commented on August 11, 2024

Yeah, this all seems rather odd. I'm fairly sure that __attribute__ ((aligned(64))) would be sufficient for ScePspFVector4?

from pspsdk.

yoanlcq avatar yoanlcq commented on August 11, 2024

I'm fairly sure that attribute ((aligned(64))) would be sufficient for ScePspFVector4?

Maybe you meant ScePspFMatrix4? Such a requirement on ScePspFVector4 would forbid contiguous arrays of it... Or maybe you mean putting the attribute at the declaration site instead of the type itself, which looks reasonable (way better than resorting to alloca()... yikes!)

From what I gather, 16-byte alignment is a requirement for VFPU "unaligned" load/stores (ulv.q/usv.q), and 64-byte alignment is a requirement for VFPU "aligned" load/stores (lv.q/sv.q) (none of which pspgum actually needs....)

There are these pieces of info in the chapter 4 of Hitmen's Yet Another PSP Documentation:

VFPU load/store instructions seem to support only 16-byte-aligned accesses (similiar to Altivec and SSE).

But right below, in the section for lv and sv (Load/Store Vector Quadword) instructions :

Final Address needs to be 64-byte aligned.

That second statement was confusing to me at first, but I think I "got it" by looking at how pspgum_vfpu.c manages its loads and stores.

About the a0 stuff, I'm surprised GCC doesn't do "the right thing" by saving it or something, if it knows that the ABI allows clobbering it, but oh well. This makes me wonder which other pieces of code are doing such a thing in the SDK, probably accidents waiting to happen.
In any case let's just remove that __asm("a0") and call it a day; that's the only problematic part really x).

(For posterity, I think this issue's title should be changed to match what we know now :) )

from pspsdk.

ijacquez avatar ijacquez commented on August 11, 2024

Yes, I referred to the wrong one. Adding the attribute to the declaration site is preferred.

I'll run a few tests, removing the need for the a0 register.

from pspsdk.

ijacquez avatar ijacquez commented on August 11, 2024

Thanks!

Do we at least know when that GCC bug occurred? If there is still a need for backwards compatibility with older versions of GCC, then we could wrap these workarounds around a pre-processor GCC version check.

from pspsdk.

albe avatar albe commented on August 11, 2024

Uhhh, I don't really remember which GCC version exactly I used back then. I roughly remember something 4.0 or 4.1 at least at some point, which would roughly fit https://gcc.gnu.org/releases.html and the fact that I did a lot of that deeper stuff in 2006-07 (see e.g. http://lukasz.dk/mirror/forums.ps2dev.org/viewtopic35a4.html?t=6929#47791)

But if the removal of just the __asm("a0") part fixes things, that should be fine from my gut-feeling.

from pspsdk.

carstene1ns avatar carstene1ns commented on August 11, 2024

I have pushed a commit that has the removes the assembly register call. Tested some examples in PPSSPP (cube example even on hardware) and all worked.
Would be nice if we can get some other hacks (like the alloca() stuff mentioned earlier) out of the SDK in the future, but we need some contributors.

Closing for now.

from pspsdk.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.