We can't get rid of transform now because of backwards compatibility, but it's not the primary way we should be exposing this data to users as they will mostly care about positions and normals.
The API hasn't been approved. We should focus on making it correct, since we aren't breaking anything approved.
In that light: the "correct" API does not return a matrix, since the underlying conceptual model does not return a pose.
I would support having a utility method that returns a pose, because that is a common use case (as a programmer, having to take the point+normal and generate a pose is "a bit of a PITA"), but framework and app programmers should really be thinking about the fact that this matrix is partially arbitrary, and they could instead pick the extra axes based on their application-specific needs.
from hit-test.
I think it's still better to present the result as a pose, even if the Z and X axes are unconstrained. My goal here was to make it familiar & consistent with frame.getPose(spaceA, spaceB). In addition, poses are very explicit about being expressed relative to reference spaces & carry additional information with them (emulatedPosition), whereas I thought the intent of the XRRay was to be a mathematical concept (similar to XRRigidTransform).
I think we can explicitly specify that X and Z axes are undefined on the returned pose. The WPTs will need to be relaxed in this case, but that should be ok.
from hit-test.
I think the issue is really interesting, and I agree with both of you. 😄
I'd want getPose because that's such a common use, and if the Z and X axes are arbitrary, many folks will be just as happy letting the system choose them for them than doing the obvious math to do it themselves.
Perhaps simply doing what a physics collision system would do, and adding getHitPoint and getNormal (which are the two things you are encapsulating in your getRay) would make sense. There is the collision point, and there is the surface normal at the point of collision, and there is the system's "best guess at a reasonable pose"...
from hit-test.
Hmm, emulatedPosition is a good point.
I think XRRays are also relative to a space given the way we supply them. I think that while symmetry is valuable, it's not worth it to make the API overall worse to get that symmetry, since now the normal vector is not directly exposed. But this is kind of academic anyway since we can't make this breaking change.
Could you update the relevant tests? We basically need a hit_test_transform_equal function that checks if positions are the same and the orientations transform the y-axis the same way.
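A sketch of what such a helper could look like (plain objects stand in for real XRRigidTransforms, and the tolerance value is my own assumption):

```javascript
// Rotate vector v by unit quaternion q ({x, y, z, w}):
// v' = v + 2w(q x v) + 2 q x (q x v).
function rotate_vector_by_quaternion(q, v) {
  const tx = 2 * (q.y * v.z - q.z * v.y);
  const ty = 2 * (q.z * v.x - q.x * v.z);
  const tz = 2 * (q.x * v.y - q.y * v.x);
  return {
    x: v.x + q.w * tx + (q.y * tz - q.z * ty),
    y: v.y + q.w * ty + (q.z * tx - q.x * tz),
    z: v.z + q.w * tz + (q.x * ty - q.y * tx),
  };
}

const EPSILON = 1e-5;

function vectors_close(a, b) {
  return Math.abs(a.x - b.x) < EPSILON
      && Math.abs(a.y - b.y) < EPSILON
      && Math.abs(a.z - b.z) < EPSILON;
}

// Positions must match (up to tolerance); orientations only need to map the
// +Y axis (the surface normal) to the same direction - the rotation around Y
// is the unconstrained part and is deliberately ignored.
function hit_test_transform_equal(t1, t2) {
  const y_axis = { x: 0, y: 1, z: 0 };
  return vectors_close(t1.position, t2.position)
      && vectors_close(rotate_vector_by_quaternion(t1.orientation, y_axis),
                       rotate_vector_by_quaternion(t2.orientation, y_axis));
}
```

Two transforms that differ only by a twist around the normal then compare equal, which is exactly the freedom being left to implementations.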
from hit-test.
Perhaps simply doing what a physics collision system would do, and adding getHitPoint and getNormal (which are the two things you are encapsulating in your getRay) would make sense. There is the collision point, and there is the surface normal at the point of collision, and there is the system's "best guess at a reasonable pose"...
I like this, this is similar to the Unity API.
What if we created an XRHitTestPose : XRPose with these APIs (as attributes, not methods)? We continue to allow implementors to arbitrarily pick how X/Z get rotated in the transform field.
The tests can be updated to check these and we can have additional checks that ensure that the transform correctly matches the point/normal.
from hit-test.
I think that the result of a hit test is semantically closer to XRPose (position and orientation relative to some space) than to XRRay (infinite half-line expressed in some space). There is a physical similarity between XRPose with 2 unconstrained axes and XRRay, but to me it seems like not a valid enough reason to use XRRay to represent the results of hit test. XRHitTestPose sounds like something worth considering, but my approach to the APIs was to expose capabilities that are not available - the plane normal is already encapsulated in the pose and can be extracted in multiple ways. There are a lot of helpers we could add to the spec; I'm just not convinced that's where our current efforts should go.
from hit-test.
I am not super-happy with XRRay over XRPose because (as you say @bialpio) it's not really "correct" either.
My preference would be to update HitTestResult with a getHitLocation and getHitNormal (or similar names) to retrieve the collision point and normal, and define the pose relative to them (as discussed above).
I think many people will want the pose, because a common use is to just display some content at the location of the hit, and letting the API choose the undefined two axes is just as good (and simpler, even) than doing it yourself. But, there will be other uses that really want the point and vector, so giving it directly, seems right.
(pardon my crap IDL, I'm sure what I put below isn't exactly right, but you see what I'm saying -- perhaps I should be passing in a space, for example).
//
// Hit Test Results
//
[SecureContext, Exposed=Window]
interface XRHitTestResult {
  XRPose? getPose(XRSpace baseSpace);
  DOMPointReadOnly? getPosition();
  DOMPointReadOnly? getNormal();
};
from hit-test.
Yeah, to be clear, I'm past suggesting XRRay; I prefer Blair's idea, though I think it should be on a subclass of XRPose, not on XRHitTestResult, since XRHitTestResult doesn't have a "default space" to relate to.
interface XRHitTestPose : XRPose {
  DOMPointReadOnly? getPosition();
  DOMPointReadOnly? getNormal();
};
from hit-test.
I was suggesting the other way to avoid creating a pose (with "junk" in it) that we don't need.
I admit I was also trying to avoid having to waste "work" computing (and allocating) poses that you don't want or need. By the time you create a pose, the normal is just the Y vector and the position is well defined too. This concern is probably not a big deal (since there won't be piles of hit tests per frame).
from hit-test.
I was suggesting the other way to avoid creating a pose (with "junk" in it) that we don't need.
Attributes can be lazily evaluated. Implementation-wise this would mean that the transform is stored as nullable, and constructed when necessary based on the type.
The thing you proposed can work if getPosition() and getNormal() have an XRSpace argument. In that case I think it makes more sense to me, and I have no strong preference.
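The lazy-evaluation idea, sketched as a hypothetical class (the class name and the stand-in matrix builder are mine, not part of any proposal):

```javascript
let matrix_builds = 0; // counter, only here to demonstrate laziness

// Stand-in for constructing a real 4x4 transform from point + normal.
function build_transform(position, normal) {
  matrix_builds++;
  return { position, normal };
}

// Hypothetical hit-test pose: position/normal are stored directly, while the
// transform attribute is stored as nullable and built on first access.
class XRHitTestPoseSketch {
  constructor(position, normal) {
    this.position = position;
    this.normal = normal;
    this._transform = null;
  }
  get transform() {
    if (this._transform === null) {
      this._transform = build_transform(this.position, this.normal);
    }
    return this._transform;
  }
}
```

Apps that only read position/normal never pay for the matrix; reading transform twice builds it once.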
@bialpio @toji which do you prefer:
interface XRHitTestResult {
  XRPose? getPose(XRSpace baseSpace);
  DOMPointReadOnly? getPosition(XRSpace baseSpace);
  DOMPointReadOnly? getNormal(XRSpace baseSpace);
};
vs
interface XRHitTestResult {
  XRHitTestPose? getPose(XRSpace baseSpace);
};
interface XRHitTestPose : XRPose {
  readonly attribute DOMPointReadOnly position;
  readonly attribute DOMPointReadOnly normal;
};
from hit-test.
I actually prefer another option, which is to refrain from adding helpers until we get strong signals that they are needed. I'm worried that this is a slippery slope and we'll turn this (and the main spec) into a bunch of mathematical utilities. Especially given that it seems to me that both of those methods could be written as one-liners in JS, something along the lines of:
function getPosition(pose) {
  return pose.transform.position;
  // alternatively:
  // return DOMMatrix.fromFloat32Array(pose.transform.matrix).transformPoint(new DOMPoint(0, 0, 0, 1));
}

function getNormal(pose) {
  return DOMMatrix.fromFloat32Array(pose.transform.matrix).transformPoint(new DOMPoint(0, 1, 0, 0));
}
from hit-test.
I don't consider these helpers: I consider these to be the primary API we should have had in the first place, with .transform being a helper, if anything. We can't get rid of transform now because of backwards compatibility, but it's not the primary way we should be exposing this data to users as they will mostly care about positions and normals.
from hit-test.
/agenda to figure out the best path forward here
from hit-test.
Exposing a matrix in the hit test results, essentially, is requesting additional information from the platform -- orientation around the surface normal vector. Some reasonable algorithms could be explored to make this useful, such as ensuring that an object is rendered facing the device or in a way that is accessible given the user's preferences.
The problem with this, however, is that it assumes that hit testing is used primarily for placement of objects.
Many consider a hit test as a low-level concept, which can be used as a primitive composing a variety of higher-level algorithms, not just placement of an object on a surface. For example, hit tests can provide world understanding for AI actors that navigate autonomously with path-finding algorithms. They can also be used to help select appropriate impulse response samples for processing of audio reflections / reverb or for rendering large-scale-ambient-occlusion effects.
Perhaps we could say that if the intent is to have the API provide a ready-to-use matrix, positioning an object in a useful way in space, that such higher-level logic be activated by requesting an XRAnchor with XRHitTestResult.createAnchor.
Unlike hit tests, anchors have the added responsibility of providing consistent orientation over multiple frames, and thus expose more information.
from hit-test.
This was discussed in the call today. A major new point brought up was that hit testing is often used for orienting objects placed by the user. I'm not going to fully articulate that position for fear of accidentally misrepresenting it (but would appreciate if someone who is in that camp does so).
Currently, because a pose has additional degrees of freedom, implementations might differ on how these degrees of freedom are picked. Indeed, ARCore and thus the Chromium implementation currently constrains this with "X+ is perpendicular to the cast ray and parallel to the physical surface centered around the hit test", which is not always unique and thus it's not possible to replicate the exact behavior across browsers.
The status quo of it being underspecified is not good. If content is blindly using it, it will break when used on a browser other than the one it was tested on. If content is converting it to a ray before recreating the transform it needs, there was little point to exposing the transform in the first place!
I feel like we are generally in agreement that this should at least be specified better. Correct me if I'm wrong.
We should definitely push forward on picking a specified behavior.
Now, my unease with actually specifying it and having it be the primary entry point for the API is that I don't think there's a logical way to specify it that is actually generally useful. One logical way to specify it would be to align the X or Z axis with one of the reference space axes (basically giving you a notion of "lie flat"), except there's no indication as to which reference space to use -- the hit test was likely created with an input space and won't know whether to pick local, local-floor, unbounded, or bounded-floor.
Except, as @thetuvix brought up during the call, while this makes sense for placing objects on the floor, you probably want something different for placing objects on a wall! And if we pick something different, what do we do with planes that are at an angle?
Plus, X needs to be perpendicular to Y. You can take your desired rough z-direction vector and cross product twice (X' = Y ⨯ X ⨯ Y) to extend it to be perpendicular, unless the two vectors are parallel. You can still "pick something" in this case, but ultimately the vector space is not a smooth one (it can't ever be due to the singularity introduced by the cross product!), so you'll have very jumpy behavior for the user if they orient themselves such that the rough Z direction is parallel to the normal, or, if the rough Z direction is defined in such a way that it can never be parallel to the normal (as it is in ARCore), they can then be in a situation where they orient themselves such that the Z direction is ambiguous. Ultimately, it's mathematically impossible to pick a smooth vector space for this. (This might be a more general concern -- I don't see applications being able to fix this either, but they have some freedom to paper over it in the way they'd find the smoothest for their experience.)
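The double-cross-product trick reads like this in code (a sketch; vectors are plain [x, y, z] arrays and the names are mine):

```javascript
function cross(a, b) {
  return [a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0]];
}

// X' = Y x (X x Y): for unit Y this equals X - (X . Y) Y, i.e. the component
// of the rough X direction perpendicular to Y. Degenerates when X is
// parallel to Y - the singularity discussed above.
function make_perpendicular(roughX, y) {
  const xPrime = cross(y, cross(roughX, y));
  const len = Math.hypot(xPrime[0], xPrime[1], xPrime[2]);
  if (len < 1e-6) {
    return null; // parallel case: the caller has to "pick something"
  }
  return [xPrime[0] / len, xPrime[1] / len, xPrime[2] / len];
}
```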
So I'm skeptical that we can pick a decent choice of transform here that does what the user will expect for all kinds of planes, and even if we do, it will be very jumpy around a particular kind of configuration.
Personally I think most applications will want either "roughly aligned to world-space, but flat on surface", or "roughly aligned to input ray, but flat on surface" for "placing on floor", and a similar option for walls except instead of being "flat on" (y parallel to normal), it's "flat against" (z parallel to normal). This is a four-way ambiguity, not counting the fact that we don't know which world space the application wishes to use here (it can also be "facing the user"!).
My impression is that the ideal system here for placing objects seems to be some kind of smooth transition based on normal angle that's essentially about "tip the object in the direction that causes one of its coordinate planes to stick to the hit tested plane with the least angular displacement", but this doesn't fix the two-way ambiguity involved in picking where the object's coordinate planes are locked to, and that's application choice.
Given the above, it seems likely to me that applications will not like the choices we're making, and this makes me uncomfortable with transform as the primary route for applications. I would prefer to advertise position and normal as the primary route, with the people using transform being ones that explicitly look for it and thus understand what it does (and we explicitly spec out a behavior for it). Currently the discovery path leads people straight to transform, which feels like we're creating a pit of failure with potentially unexpected behavior around walls, etc.
transform strongly feels like a higher-level utility function to me here, and the only reason I think we should have it at all is backwards compatibility. The "this is how you place objects" argument in favor of it seems to make it fall squarely in the space of it being a high-level utility; frameworks should be figuring that out.
from hit-test.
The problem with this, however, is that it assumes that hit testing is used primarily for placement of objects.
Many consider a hit test as a low-level concept, which can be used as a primitive composing a variety of higher-level algorithms, not just placement of an object on a surface. For example, hit tests can provide world understanding for AI actors that navigate autonomously with path-finding algorithms. They can also be used to help select appropriate impulse response samples for processing of audio reflections / reverb or for rendering large-scale-ambient-occlusion effects.
(...)
Correct, the API (as per the explainer) focuses mainly on the object placement use case. Other use cases could potentially be solved by it (maybe a bit awkwardly given the subscription-based nature of the API, as with the agent navigating the environment), but I mainly looked at it through the lens of placing objects. Tying it into the anchors API is also a good side effect of the current approach, but I did not consider it when writing the spec.
I feel like we are generally in agreement that this should at least be specified better. Correct me if I'm wrong.
Agreed, this should be fully specified to enable cross-browser compat and it needs to happen irrespective of what we decide for additional properties on the hit test result, so let's start with that.
So I'm skeptical that we can pick a decent choice of transform here that does what the user will expect for all kinds of planes, and even if we do, it will be very jumpy around a particular kind of configuration.
It may not be possible in a general case (as mentioned during the call), but I think it'll be valuable to use something that works fairly well in the common case, which to me is placing an object on the floor.
I feel like we had a similar conversation when introducing a matrix property on the ray, and it's actually a good thing we have done that since now we can leverage it to specify the algorithm here as well, roughly:
- Let X, Y, Z be the axes of the coordinate system coord defined by the matrix attribute of the ray used for the hit test - from the definition of the matrix, Z should be pointing towards the ray origin. Let p be the intersection point, and normal be the surface normal at the intersection point.
- Calculate the angle alpha between the plane defined by normal & Z and the YZ plane. If normal = Z, set alpha to 0 (normal already lies on the YZ plane so the next step is a no-op).
- Rotate the coordinate system coord by alpha around the Z axis, yielding coord' with axes X', Y' and Z. normal now lies on the Y'Z plane. Note: this step should probably ensure that normal lies above or on the ZX' plane - calculating alpha appropriately should take care of that.
- Calculate the angle beta between Y' and normal.
- Rotate the coordinate system coord' by beta around the X' axis, yielding coord" with axes X', Y" and Z'. Y" now is equal to normal.
- Construct a pose from the coord" coordinate system, placing it at p.
I believe this will work (assuming I have not messed up the math, please proofread), is resilient against the edge case of normal being equal to initial Z axis, and will make the pose's Z axis point roughly towards the origin of the ray (matching Chrome's behavior) w/o relying on any data not already available to hit test.
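For proofreading purposes, here is my own reformulation of the end result of those steps as a direct basis construction (orthonormalization instead of the two explicit rotations; vectors are plain unit [x, y, z] arrays) - it should yield Y" = normal with Z' pointing as close to the original Z as the constraint allows:

```javascript
function dot(a, b) { return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]; }
function cross(a, b) {
  return [a[1] * b[2] - a[2] * b[1],
          a[2] * b[0] - a[0] * b[2],
          a[0] * b[1] - a[1] * b[0]];
}
function normalize(v) {
  const len = Math.hypot(v[0], v[1], v[2]);
  return [v[0] / len, v[1] / len, v[2] / len];
}

// rayZ: unit Z axis of the ray's coordinate system (points back toward the
// ray origin), normal: unit surface normal, p: intersection point.
function hit_test_basis(rayZ, normal, p) {
  // Remove the component of rayZ along the normal, keeping Z' as close to
  // the original Z as possible while perpendicular to Y" = normal.
  const d = dot(rayZ, normal);
  let zProj = [rayZ[0] - d * normal[0],
               rayZ[1] - d * normal[1],
               rayZ[2] - d * normal[2]];
  if (Math.hypot(zProj[0], zProj[1], zProj[2]) < 1e-6) {
    // The patched singularity (normal = +-Z): any perpendicular Z works.
    zProj = cross(normal, [1, 0, 0]);
    if (Math.hypot(zProj[0], zProj[1], zProj[2]) < 1e-6) {
      zProj = cross(normal, [0, 1, 0]);
    }
  }
  const zAxis = normalize(zProj);
  const xAxis = cross(normal, zAxis); // right-handed basis: X = Y x Z
  return { position: p, xAxis, yAxis: normal, zAxis };
}
```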
from hit-test.
I think this will work conceptually as a choice of transform given that backwards compatibility forces us to make one.
It still has a patched singularity around the "normal = Z" case, which means that the placed object ghost will rotate erratically around that point -- look at what happens when a normal that is slightly offset from Z swings across normal=Z to be slightly offset from z on the other side: the rotation component will abruptly turn by 180 degrees.
As I said before it's impossible to make a choice that doesn't have this problem of erratic behavior (the problem we're trying to solve here can be easily reduced to one that is proven unsolvable by the hairy ball theorem!).
This isn't my primary concern, however; applications are going to have to deal with this too -- there are stateful solutions that are possible (which we can't use), but overall this is a hard problem which we provably cannot solve and applications will find really hard to solve. Many will probably decide to live with it.
My primary concern is that we're not solving the general case here, and it's probably really hard to solve the general case well. We're exposing an attribute based on a known subset of use cases, which will behave quite badly the moment the user lifts their hand and has the ray hit a wall.
I'm not convinced that "objects on the floor" is so overwhelmingly in the majority of use cases for this as to justify a single transform attribute that is geared to that use case only. To me this feels very much like a utility, one that really belongs in a library alongside other utilities for different kinds of object placement (ones which handle walls! ones which somewhat handle both via the "tip object" scheme I described!), and the reason we can't remove it is backwards compatibility. I still feel uncomfortable about this being the primary path for surfacing the hit test normal.
So I still think that the best path forward is to spec transform the way you did (or some alternate way producing similar results), and then surface position/normal directly, with documentation pointing to those first, and transform very much being a "read the warning labels first" API.
from hit-test.
@bialpio could you open a PR for standardizing your proposed choice of transform? I think this would at least fix the part of this issue that isn't contentious -- this API needs to be standardized.
One thing we should check: ARKit exposes a pose as well, does anyone know what the details behind that pose are? Is it the same choice as ARCore?
from hit-test.
@bialpio could you open a PR for standardizing your proposed choice of transform? I think this would at least fix the part of this issue that isn't contentious -- this API needs to be standardized.
On it!
One thing we should check: ARKit exposes a pose as well, does anyone know what the details behind that pose are? Is it the same choice as ARCore?
I took a quick look at ARKit's hit test result - there seem to be 3 different ways to obtain the information about the pose out of it (all matrix-based):
- worldTransform
- localTransform
- anchor.transform
The documentation does not seem to explicitly state anything about the properties of the returned matrix, which is not ideal. I might be missing something here though.
from hit-test.