
Comments (19)

blairmacintyre commented on August 26, 2024

We can't get rid of transform now because of backwards compatibility, but it's not the primary way we should be exposing this data to users as they will mostly care about positions and normals.

The API hasn't been approved. We should focus on making it correct, since we aren't breaking anything approved.

In that light: the "correct" API does not return a matrix, since the underlying conceptual model does not return a pose.

I would support having a utility method that returns a pose, because that is a common use case (as a programmer, having to take the point+normal and generate a pose is "a bit of a PITA"), but framework and app programmers should really be thinking about the fact that this matrix is partially arbitrary, and they could instead pick the extra axes based on their application-specific needs.
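
That utility is easy to sketch, and the sketch makes the arbitrariness concrete: everything below the choice of helper vector is forced by the point and normal, and nothing else is. A minimal, hypothetical version (names invented; plain arrays instead of DOMPoint, column-major output matching XRRigidTransform.matrix layout):

```javascript
function cross(a, b) {
  return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]];
}

function normalize(v) {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0]/len, v[1]/len, v[2]/len];
}

// Build a 4x4 column-major pose matrix from a hit point and surface normal.
// The X and Z columns are the "partially arbitrary" part: any helper vector
// not parallel to the normal produces a valid (but different) pose.
function poseFromPointAndNormal(point, normal) {
  const y = normalize(normal);
  // Arbitrary choice: world-up, unless the normal is (nearly) vertical.
  const helper = Math.abs(y[1]) < 0.99 ? [0, 1, 0] : [1, 0, 0];
  const x = normalize(cross(helper, y));
  const z = cross(x, y); // completes a right-handed frame
  return [
    x[0], x[1], x[2], 0,
    y[0], y[1], y[2], 0,
    z[0], z[1], z[2], 0,
    point[0], point[1], point[2], 1,
  ];
}
```

An application with specific needs would swap the helper vector for something of its own (the view direction, a world axis, the input ray), which is exactly the "pick the extra axes yourself" point above.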

from hit-test.

bialpio commented on August 26, 2024

I think it's still better to present the result as a pose, even if the Z and X axes are unconstrained. My goal here was to make it familiar & consistent with frame.getPose(spaceA, spaceB). In addition, poses are very explicit about being expressed relative to reference spaces & carry additional information with them (emulatedPosition), whereas I thought the intent of the XRRay was to be a mathematical concept (similar to XRRigidTransform).

I think we can explicitly specify that X and Z axes are undefined on the returned pose. The WPTs will need to be relaxed in this case, but that should be ok.

blairmacintyre commented on August 26, 2024

I think the issue is really interesting, and I agree with both of you. 😄

I'd want getPose because that's such a common use, and if the Z and X axes are arbitrary, many folks will be just as happy letting the system choose them for them rather than doing the obvious math to do it themselves.

Perhaps simply doing what a physics collision system would do, and adding getHitPoint and getNormal (which are the two things you are encapsulating in your getRay), would make sense. There is the collision point, there is the surface normal at the point of collision, and there is the system's "best guess at a reasonable pose"...

Manishearth commented on August 26, 2024

Hmm, emulatedPosition is a good point.

I think XRRays are also relative to a space given the way we supply them. I think that while symmetry is valuable, it's not worth it to make the API overall worse to get that symmetry, since now the normal vector is not directly exposed. But this is kind of academic anyway since we can't make this breaking change.

Could you update the relevant tests? We basically need a hit_test_transform_equal function that checks if positions are the same and the orientations transform the y-axis the same way.
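
A sketch of what that relaxed comparison could look like (hypothetical helper name; transforms modeled here as plain {position, orientation} objects with [x, y, z, w] quaternions rather than real XRRigidTransforms):

```javascript
// Rotate vector v by quaternion q = [x, y, z, w] using
// v' = v + 2*w*(q_xyz × v) + 2*q_xyz × (q_xyz × v).
function rotateByQuaternion(q, v) {
  const [qx, qy, qz, qw] = q;
  const tx = 2 * (qy*v[2] - qz*v[1]);
  const ty = 2 * (qz*v[0] - qx*v[2]);
  const tz = 2 * (qx*v[1] - qy*v[0]);
  return [
    v[0] + qw*tx + qy*tz - qz*ty,
    v[1] + qw*ty + qz*tx - qx*tz,
    v[2] + qw*tz + qx*ty - qy*tx,
  ];
}

// Two hit test transforms are "equal" for WPT purposes if positions match and
// both orientations map +Y to the same direction; rotation about Y is ignored.
function hitTestTransformEqual(a, b, epsilon = 1e-5) {
  const positionsMatch =
    a.position.every((p, i) => Math.abs(p - b.position[i]) < epsilon);
  const yA = rotateByQuaternion(a.orientation, [0, 1, 0]);
  const yB = rotateByQuaternion(b.orientation, [0, 1, 0]);
  return positionsMatch && yA.every((c, i) => Math.abs(c - yB[i]) < epsilon);
}
```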

Manishearth commented on August 26, 2024

Perhaps simply doing what a physics collision system would do, and adding getHitPoint and getNormal (which are the two things you are encapsulating in your getRay), would make sense. There is the collision point, there is the surface normal at the point of collision, and there is the system's "best guess at a reasonable pose"...

I like this, this is similar to the Unity API.

What if we created an XRHitTestPose: XRPose with these APIs (as attributes, not methods)? We continue to allow implementors to arbitrarily pick how X/Z get rotated in the transform field.

The tests can be updated to check these and we can have additional checks that ensure that the transform correctly matches the point/normal.

bialpio commented on August 26, 2024

I think that the result of a hit test is semantically closer to XRPose (position and orientation relative to some space) than to XRRay (infinite half-line expressed in some space). There is a physical similarity between an XRPose with 2 unconstrained axes and an XRRay, but to me it seems like not a valid enough reason to use XRRay to represent the results of a hit test. XRHitTestPose sounds like something worth considering, but my approach to the API was to expose capabilities that are not already available - the plane normal is already encapsulated in the pose and can be extracted in multiple ways. There are a lot of helpers we could add to the spec; I'm just not convinced that's where our current efforts should go.

blairmacintyre commented on August 26, 2024

I am not super-happy with XRRay over XRPose because (as you say @bialpio) it's not really "correct" either.

My preference would be to update HitTestResult with a getHitLocation and getHitNormal (or similar names) to retrieve the collision point and normal, and define the pose relative to them (as discussed above).

I think many people will want the pose, because a common use is to just display some content at the location of the hit, and letting the API choose the two undefined axes is just as good as (and even simpler than) doing it yourself. But there will be other uses that really want the point and vector, so exposing them directly seems right.

(pardon my crap IDL, I'm sure what I put below isn't exactly right, but you see what I'm saying -- perhaps I should be passing in a space, for example).

//
// Hit Test Results
//
[SecureContext, Exposed=Window]
interface XRHitTestResult {
  XRPose? getPose(XRSpace baseSpace);
  DOMPointReadOnly? getPosition();
  DOMPointReadOnly? getNormal();
};

Manishearth commented on August 26, 2024

Yeah to be clear I'm past suggesting XRRay, I prefer Blair's idea, though I think it should be on a subclass of XRPose, not on XRHitTestResult, since XRHitTestResult doesn't have a "default space" to relate to.

interface XRHitTestPose: XRPose {
  DOMPointReadOnly? getPosition();
  DOMPointReadOnly? getNormal();
};

blairmacintyre commented on August 26, 2024

I was suggesting the other way to avoid creating a pose (with "junk" in it) that we don't need.

I admit I was also avoiding having to waste "work" computing (and allocating) poses that you don't want or need. By the time you create a pose, the normal is just the Y vector and the position is well defined too. This concern is probably not a big deal (since there won't be piles of hit tests per frame).

Manishearth commented on August 26, 2024

I was suggesting the other way to avoid creating a pose (with "junk" in it) that we don't need.

Attributes can be lazily evaluated. Implementation-wise, this would mean that the transform is stored as nullable and constructed when necessary based on the type.
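
For illustration, lazy evaluation of the transform could look like this (all names invented; buildTransformFromPointAndNormal stands in for the real matrix construction, and the counter just demonstrates the laziness):

```javascript
let buildCount = 0; // instrumentation: how many times the transform was built

// Stand-in for the real pose-matrix construction from point + normal.
function buildTransformFromPointAndNormal(point, normal) {
  buildCount++;
  return { point, normal };
}

// Sketch of a hypothetical XRHitTestPose whose transform attribute is stored
// as nullable and only constructed on first read, then cached.
class XRHitTestPoseSketch {
  constructor(point, normal) {
    this.point = point;
    this.normal = normal;
    this._transform = null;
  }
  get transform() {
    if (this._transform === null) {
      this._transform = buildTransformFromPointAndNormal(this.point, this.normal);
    }
    return this._transform;
  }
}
```

Callers that only read point/normal never pay for the matrix; the allocation concern raised above only applies to code that actually touches transform.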

The thing you proposed can work if getPosition() and getNormal() have an XRSpace argument. In that case I think it makes more sense to me, and I have no strong preference.


@bialpio @toji which do you prefer:

interface XRHitTestResult {
  XRPose? getPose(XRSpace baseSpace);
  DOMPointReadOnly? getPosition(XRSpace baseSpace);
  DOMPointReadOnly? getNormal(XRSpace baseSpace);
};

vs

interface XRHitTestResult {
  XRHitTestPose? getPose(XRSpace baseSpace);
};

interface XRHitTestPose : XRPose {
  readonly attribute DOMPointReadOnly position;
  readonly attribute DOMPointReadOnly normal;
};

bialpio commented on August 26, 2024

I actually prefer another option, which is to refrain from adding helpers until we get strong signals that they are needed. I'm worried that this is a slippery slope and we'll turn this (and the main spec) into a bunch of mathematical utilities. Especially given that it seems to me that both of those methods could be written as one-liners in JS, something along the lines of:

function getPosition(pose) {
  return pose.transform.position;
  // alternatively:
  // return DOMMatrix.fromFloat32Array(pose.transform.matrix).transformPoint(new DOMPoint(0, 0, 0, 1));
}

function getNormal(pose) {
  return DOMMatrix.fromFloat32Array(pose.transform.matrix).transformPoint(new DOMPoint(0, 1, 0, 0));
}

Manishearth commented on August 26, 2024

I don't consider these helpers: I consider these to be the primary API we should have had in the first place, with .transform being a helper, if anything. We can't get rid of transform now because of backwards compatibility, but it's not the primary way we should be exposing this data to users as they will mostly care about positions and normals.

Manishearth commented on August 26, 2024

/agenda to figure out the best path forward here

kearwood commented on August 26, 2024

Exposing a matrix in the hit test results, essentially, is requesting additional information from the platform -- orientation around the surface normal vector. Some reasonable algorithms could be explored to make this useful, such as ensuring that an object is rendered facing the device or in a way that is accessible given the user's preferences.

The problem with this, however, is that it assumes that hit testing is used primarily for placement of objects.

Many consider a hit test as a low-level concept, which can be used as a primitive composing a variety of higher-level algorithms, not just placement of an object on a surface. For example, hit tests can provide world understanding for AI actors that navigate autonomously with path-finding algorithms. They can also be used to help select appropriate impulse response samples for processing of audio reflections / reverb or for rendering large-scale-ambient-occlusion effects.

Perhaps we could say that if the intent is to have the API provide a ready-to-use matrix, positioning an object in a useful way in space, that such higher-level logic be activated by requesting an XRAnchor with XRHitTestResult.createAnchor.

Unlike hit tests, anchors have the added responsibility of providing consistent orientation over multiple frames, and thus expose more information.

Manishearth commented on August 26, 2024

This was discussed in the call today. A major new point brought up was that hit testing is often used for orienting objects placed by the user. I'm not going to fully articulate that position for fear of accidentally misrepresenting it (but would appreciate if someone who is in that camp does so).

Currently, because a pose has additional degrees of freedom, implementations might differ on how these degrees of freedom are picked. Indeed, ARCore and thus the Chromium implementation currently constrains this with "X+ is perpendicular to the cast ray and parallel to the physical surface centered around the hit test", which is not always unique and thus it's not possible to replicate the exact behavior across browsers.

The status quo of it being underspecified is not good. If content is blindly using it, it will break when used on a browser other than the one it was tested on. If content is converting it to a ray before recreating the transform it needs, there was little point to exposing the transform in the first place!

I feel like we are generally in agreement that this should at least be specified better. Correct me if I'm wrong.

We should definitely push forward on picking a specified behavior here.


Now, my unease with actually specifying it and having it be the primary entry point for the API is that I don't think there's a logical way to specify it that is actually generally useful. One logical way to specify it would be to align the X or Z axis with one of the reference space axes (basically giving you a notion of "lie flat"), except there's no indication as to which reference space to use -- the hit test was likely created with an input space and won't know whether to pick local, local-floor, unbounded, or bounded-floor.

Except, as @thetuvix brought up during the call, while this makes sense for placing objects on the floor, you probably want something different for placing objects on a wall! And if we pick something different, what do we do with planes that are at an angle?

Plus, X needs to be perpendicular to Y. You can take your desired rough z-direction vector and cross product twice (X' = Y ⨯ X ⨯ Y) to extend it to be perpendicular, unless the two vectors are parallel. You can still "pick something" in this case, but ultimately the vector space is not a smooth one (it can't ever be due to the singularity introduced by the cross product!), so you'll have very jumpy behavior for the user if they orient themselves such that the rough Z direction is parallel to the normal, or, if the rough Z direction is defined in such a way that it can never be parallel to the normal (as it is in ARCore), they can then be in a situation where they orient themselves such that the Z direction is ambiguous. Ultimately, it's mathematically impossible to pick a smooth vector space for this. (This might be a more general concern -- I don't see applications being able to fix this either, but they have some freedom to paper over it in the way they'd find the smoothest for their experience)

So I'm skeptical that we can pick a decent choice of transform here that does what the user will expect for all kinds of planes, and even if we do, it will be very jumpy around a particular kind of configuration.

Personally I think most applications will want either "roughly aligned to world-space, but flat on surface", or "roughly aligned to input ray, but flat on surface" for "placing on floor", and a similar option for walls except instead of being "flat on" (y parallel to normal), it's "flat against" (z parallel to normal). This is a four-way ambiguity, not counting the fact that we don't know which world space the application wishes to use here (it can also be "facing the user"!).

My impression is that the ideal system here for placing objects seems to be some kind of smooth transition based on normal angle that's essentially about "tip the object in the direction that causes one of its coordinate planes to stick to the hit tested plane with the least angular displacement", but this doesn't fix the two-way ambiguity involved in picking where the object's coordinate planes are locked to, and that's application choice.


Given the above, to me it seems likely for applications to not like the choices we're making, and this makes me uncomfortable with transform as the primary route for applications. I would prefer to advertise position and normal as a primary route, with the people using transform being ones that explicitly look for it and thus understand what it does (and we explicitly spec out a behavior for it). Currently the discovery path leads people straight to transform, which feels like we're creating a pit of failure with potentially unexpected behavior around walls, etc.

transform strongly feels like a higher-level utility function to me here, and the only reason I think we should have it at all is backwards compatibility. The "this is how you place objects" argument in favor of it seems to make it fall squarely in the space of it being a high-level utility; frameworks should be figuring that out.

bialpio commented on August 26, 2024

The problem with this; however, is that it assumes that hit testing is used primarily for placement of objects.

Many consider a hit test as a low-level concept, which can be used as a primitive composing a variety of higher-level algorithms, not just placement of an object on a surface. For example, hit tests can provide world understanding for AI actors that navigate autonomously with path-finding algorithms. They can also be used to help select appropriate impulse response samples for processing of audio reflections / reverb or for rendering large-scale-ambient-occlusion effects.
(...)

Correct, the API (as per explainer) focuses mainly on object placement use case. Other use cases could potentially be solved by it (maybe a bit awkwardly given the subscription-based nature of the API, as with the agent navigating the environment), but I mainly looked at it through the lens of placing objects. Tying it into the anchors API is also a good side effect of the current approach, but I did not consider it when writing the spec.

I feel like we are generally in agreement that this should at least be specified better. Correct me if I'm wrong.

Agreed, this should be fully specified to enable cross-browser compat and it needs to happen irrespective of what we decide for additional properties on the hit test result, so let's start with that.

So I'm skeptical that we can pick a decent choice of transform here that does what the user will expect for all kinds of planes, and even if we do, it will be very jumpy around a particular kind of configuration.

It may not be possible in a general case (as mentioned during the call), but I think it'll be valuable to use something that works fairly well in the common case, which to me is placing an object on the floor.

I feel like we had a similar conversation when introducing a matrix property on the ray, and it's actually a good thing we have done that since now we can leverage it to specify the algorithm here as well, roughly:

  1. Let X, Y, Z be the axes of the coordinate system coord defined by the matrix attribute of the ray used for the hit test - from the definition of the matrix, Z should be pointing towards the ray origin. Let p be the intersection point, and normal be the surface normal at the intersection point.
  2. Calculate the angle alpha between the plane defined by normal & Z and the YZ plane. If normal = Z, set alpha to 0 (normal already lies on the YZ plane, so the next step is a no-op).
  3. Rotate the coordinate system coord by alpha around the Z axis, yielding coord' with axes X', Y' and Z. normal now lies on the Y'Z plane. Note: this step should probably ensure that normal lies above or on the ZX' plane - calculating alpha appropriately should take care of that.
  4. Calculate the angle beta between Y' and normal.
  5. Rotate the coordinate system coord' by beta around the X' axis, yielding coord" with axes X', Y" and Z'. Y" is now equal to normal.
  6. Construct a pose from the coord" coordinate system, placing it at p.

I believe this will work (assuming I have not messed up the math, please proofread), is resilient against the edge case of normal being equal to initial Z axis, and will make the pose's Z axis point roughly towards the origin of the ray (matching Chrome's behavior) w/o relying on any data not already available to hit test.
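
If I'm reading the steps right, the net effect admits a closed form: X' ends up perpendicular to both the normal and the ray's Z axis, Y" becomes the normal, and Z' completes a right-handed frame pointing roughly back toward the ray origin. A rough sketch under those assumptions (not the spec text; plain arrays, column-major output like XRRigidTransform.matrix, and my guess at the sign conventions):

```javascript
function cross(a, b) {
  return [a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0]];
}

function normalize(v) {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0]/len, v[1]/len, v[2]/len];
}

// rayZ: the ray's Z axis (pointing toward the ray origin), rayX: the ray's
// X axis (used only for the normal = Z edge case), p: intersection point,
// normal: surface normal at p. Returns a column-major 4x4 pose matrix.
function hitTestPose(rayZ, rayX, p, normal) {
  const y = normalize(normal);
  let x = cross(y, rayZ); // steps 2-3: X' perpendicular to both normal and Z
  if (Math.hypot(x[0], x[1], x[2]) < 1e-6) {
    x = rayX; // normal = ±Z edge case: alpha = 0, keep the ray's own X axis
  } else {
    x = normalize(x);
  }
  const z = cross(x, y); // steps 4-5: Z' completes the right-handed frame,
                         // equal to rayZ projected perpendicular to normal
  return [
    x[0], x[1], x[2], 0,
    y[0], y[1], y[2], 0,
    z[0], z[1], z[2], 0,
    p[0], p[1], p[2], 1,
  ];
}
```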

Manishearth commented on August 26, 2024

I think this will work conceptually as a choice of transform given that backwards compatibility forces us to make one.

It still has a patched singularity around the "normal = Z" case, which means that the placed object ghost will rotate erratically around that point -- look at what happens when a normal that is slightly offset from Z swings across normal=Z to be slightly offset from z on the other side: the rotation component will abruptly turn by 180 degrees.

As I said before it's impossible to make a choice that doesn't have this problem of erratic behavior (the problem we're trying to solve here can be easily reduced to one that is proven unsolvable by the hairy ball theorem!).

This isn't my primary concern, however; applications are going to have to deal with this too -- there are stateful solutions that are possible (which we can't use), but overall this is a hard problem which we provably cannot solve and which applications will find really hard to solve. Many will probably decide to live with it.


My primary concern is that we're not solving the general case here, and it's probably really hard to solve the general case well. We're exposing an attribute based on a known subset of use cases, which will behave quite badly the moment the user lifts their hand and has the ray hit a wall.

I'm not convinced that "objects on the floor" is so overwhelmingly in the majority of use cases for this as to justify a single transform attribute that is geared to that use case only. To me this feels very much like a utility, one that really belongs in a library alongside other utilities for different kinds of object placement (ones which handle walls! ones which somewhat handle both via the "tip object" scheme I described!), and the reason we can't remove it is backwards compatibility. I still feel uncomfortable about this being the primary path for surfacing the hit test normal.

So I still think that the best path forward is to spec transform the way you did (or some alternate way producing similar results), and then surface position/normal directly, with documentation pointing to those first, and transform very much being a "read the warning labels first" API.

Manishearth commented on August 26, 2024

@bialpio could you open a PR for standardizing your proposed choice of transform? I think this would at least fix the part of this issue that isn't contentious -- this API needs to be standardized.

One thing we should check: ARKit exposes a pose as well, does anyone know what the details behind that pose are? Is it the same choice as ARCore?

bialpio commented on August 26, 2024

@bialpio could you open a PR for standardizing your proposed choice of transform? I think this would at least fix the part of this issue that isn't contentious -- this API needs to be standardized.

On it!

One thing we should check: ARKit exposes a pose as well, does anyone know what the details behind that pose are? Is it the same choice as ARCore?

I took a quick look at ARKit's hit test result - there seem to be 3 different ways to obtain the information about the pose out of it (all matrix-based):

  • worldTransform
  • localTransform
  • anchor.transform

The documentation does not seem to explicitly state anything about the properties of the returned matrix, which is not ideal. I might be missing something here though.
