Comments (12)
Regarding the first question about dimensions, your understanding is correct.
Regarding the second question:
- Your assumption is incorrect. (dx, dz, dr) is part of the output and is continuously generated by the model.
- In our experiments, we used the (dx, dz, dr) paths from the test data. In practical applications, this would be entered by hand, e.g., by setting (dx, dz, dr) = (0, 0, 0) for standing still, or by specifying a curve in some editor. Other possibilities are to copy and loop some snippets with the desired stepping behaviour from the original data, or to have it generated procedurally. It's really up to you and your application.
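To make the options above concrete, a hand-made control path is just a (frames × 3) array of per-frame deltas. Here is a minimal NumPy sketch (the frame rate, units, and snippet values are assumptions for illustration, not taken from the repository):

```python
import numpy as np

fps = 20                    # assumed frame rate
n_frames = 4 * fps          # e.g. an 80-frame, 4-second clip

# Standing still: zero translation and rotation deltas at every frame.
standstill = np.zeros((n_frames, 3))   # columns: (dx, dz, dr)

def loop_snippet(snippet, n_frames):
    """Tile a short (dx, dz, dr) snippet, e.g. cut from the original
    data, until it covers n_frames.  Because these are per-frame
    deltas, looping them yields a continuous path with no jump."""
    reps = -(-n_frames // len(snippet))  # ceiling division
    return np.tile(snippet, (reps, 1))[:n_frames]

# A made-up 30-frame stepping snippet: slow forward drift, no turning.
snippet = np.tile([0.0, 0.05, 0.0], (30, 1))
walk = loop_snippet(snippet, n_frames)
```

The same pattern works for any path you can express per frame, e.g. a curve sampled from an editor.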
from stylegestures.
In fact, these are two separate processing methods. "absolute_translation_deltas" is part of the original PyMO library (see https://github.com/omimo/PyMO), and "pos_rot_deltas" was added by me. I have not analysed "absolute_translation_deltas" in depth, but I think you are right that it centers the motion without taking rotation into account. "pos_rot_deltas" splits the hip motion into six "root"-centric coordinates local to a ground-projected, forward-directed "root" coordinate system, and three coordinates describing the delta translation and y-rotation (dx, dz, dr) of this "root" coordinate system.
The code for creating the FB-C condition may be missing, but it is very simple to reproduce: just take the last three features (dx, dz, dr) from the processed output features and move them to the input features.
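If the processed features live in NumPy arrays, that move amounts to a couple of slicing operations. A sketch under assumed feature dimensions (the real ones depend on the dataset and are not taken from the repository):

```python
import numpy as np

# Assumed dimensions for illustration only.
n_frames, n_ctrl, n_pose = 80, 27, 45
ctrl = np.random.randn(n_frames, n_ctrl)      # per-frame control (e.g. speech)
out = np.random.randn(n_frames, n_pose + 3)   # poses with (dx, dz, dr) last

# FB-U keeps the deltas in the output features:
fbu_input, fbu_output = ctrl, out

# FB-C moves the last three columns (dx, dz, dr) into the input features:
fbc_input = np.concatenate([ctrl, out[:, -3:]], axis=1)
fbc_output = out[:, :-3]
```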
To my humble understanding, "absolute_translation_deltas" only tries to introduce some features for body movement, without any feature for body rotation? But "pos_rot_deltas" tries to cover both movement and rotation. Is that correct?
The three deltas correspond to the change in forward and lateral displacement of the root node in a body-centric coordinate system, as well as the change in heading (a rotation), between frames. This is described in Section 4.1 of our MoGlow paper on arXiv and in Habibie et al. (2017). Even though I did not write that code, I feel confident in saying that the between-frame displacements are the "absolute_translation_deltas", and the changes in heading are the "pos_rot_deltas".
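For intuition, extracting such deltas from a root trajectory might look roughly like the sketch below. This is my own reconstruction of the idea, not the PyMO/StyleGestures code, and the sign and axis conventions are assumptions:

```python
import numpy as np

def root_deltas(pos_xz, heading):
    """Per-frame (dx, dz, dr): the ground-plane displacement of the
    root, rotated into the previous frame's heading-aligned (body)
    coordinates, plus the change in heading between frames."""
    d_world = np.diff(pos_xz, axis=0)   # world-space displacement per frame
    h = heading[:-1]                    # heading at the earlier frame
    dx = np.cos(h) * d_world[:, 0] + np.sin(h) * d_world[:, 1]
    dz = -np.sin(h) * d_world[:, 0] + np.cos(h) * d_world[:, 1]
    dr = np.diff(heading)
    return np.stack([dx, dz, dr], axis=1)
```

A straight walk with constant heading then yields constant dx and zero dz and dr, as expected.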
In your paper, you mentioned that you have two models: one is FB-U, the other is FB-C. My question is: where can I find example code for how you construct the control vector?
The difference between FB-U and FB-C is that the "pos_rot_deltas" and "absolute_translation_deltas" are control-signal inputs (concatenated with the other per-frame control information) for FB-C, whereas they instead constitute model outputs for FB-U (concatenated with the per-frame pose information). This means that the code that constructs the "pos_rot_deltas" and "absolute_translation_deltas" is part of the code that creates the control-signal vectors for FB-C, and is also needed for FB-U, although FB-U uses the same control signal as MG.
While I do not know where in the code the concatenations mentioned above occur, this might give you an idea of where to search for the relevant code; for instance, it must come after those deltas are created. (And if you cannot find the code you need, re-adding a tensor concatenation should not be too difficult either.)
Thanks for correcting my misunderstanding, Simon.
Gustav,
Thanks for all the comments, very helpful,
Kelvin
In fact, these are two separate processing methods. "absolute_translation_deltas" is part of the original PyMO library (see https://github.com/omimo/PyMO), and "pos_rot_deltas" was added by me. I have not analysed "absolute_translation_deltas" in depth, but I think you are right that it centers the motion without taking rotation into account. "pos_rot_deltas" splits the hip motion into six "root"-centric coordinates local to a ground-projected, forward-directed "root" coordinate system, and three coordinates describing the delta translation and y-rotation (dx, dz, dr) of this "root" coordinate system.
The code for creating the FB-C condition may be missing, but it is very simple to reproduce: just take the last three features (dx, dz, dr) from the processed output features and move them to the input features.
Simon,
Thanks so much, I understand most of your comments.
You know what, in my head I had always thought you were the owner of PyMO :-). I think you did an excellent job extending it.
Just a little question on the last sentence in your comments: "Just take the last three features (dx, dz, dr) from the processed output features and move them to the input features." Could you give a little more detail here? I cannot understand what you mean by "processed output features" and "the input features".
Thanks a lot,
Kelvin
"Just take the last three features (dx, dz, dr) from the processed output features and move them to the input features." Could you give a little more detail here? I cannot understand what you mean by "processed output features" and "the input features".
As defined in our research paper, a MoGlow system takes a sequence c of control vectors c_t as input and produces output motion as a sequence of individual poses x_t (here, underscores denote subscripts). My best understanding of what Simon is saying is that you should take the three last elements of pos_rot_deltas for every frame t and append them to the existing control vector at that frame to obtain the complete control vector c_t, in order to replicate the setup of FB-C used in training and synthesis.
When Simon explicitly says that you should "move" the three elements, I think he means that these three elements of pos_rot_deltas should not be included in the output vector x_t in training and synthesis (so disable any code that concatenates those three elements to it, if such code exists). It is only relevant to include those elements of pos_rot_deltas in the pose vectors x_t when replicating the FB-U system.
I hope this answers your question.
Dear Ghenter,
With a more detailed reading of the code and with your great comments, I now have a better understanding of my previous question about Simon's statement "Just take the last three features (dx, dz, dr) from the processed output features and move them to the input features."
For training, this method is clear, but for synthesis, how did you construct those three elements (dx, dz and dr)? For example, if I want to generate a 4-second full-body clip, how should I construct the sequence of (dx, dz, dr) (length = 4 s × 20 fps = 80)?
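One way to sanity-check a hand-made 80-frame (dx, dz, dr) sequence is to integrate the deltas back into a world-space root path and inspect or plot it. A sketch of that integration (my own, with assumed rotation conventions; not code from the repository):

```python
import numpy as np

def integrate_deltas(deltas, x0=0.0, z0=0.0, r0=0.0):
    """Accumulate per-frame (dx, dz, dr) into a world-space root path,
    rotating each body-frame step by the current heading."""
    x, z, r = x0, z0, r0
    path = [(x, z, r)]
    for dx, dz, dr in deltas:
        x += np.cos(r) * dx - np.sin(r) * dz
        z += np.sin(r) * dx + np.cos(r) * dz
        r += dr
        path.append((x, z, r))
    return np.array(path)

# 80 frames of constant forward stepping with no turning:
path = integrate_deltas(np.tile([0.1, 0.0, 0.0], (80, 1)))
```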
Look forward to your sharing,
Have a nice day,
Dear Gustav and Simon,
Two other related questions:
My first question is about the train_input and train_output for each of those two models:
1> FB-U, in my current understanding and guessing:
train_input: 27 dims of mel-spectrum
train_output: emp of selected joints + (dx, dz, dr)
2> FB-C, in my current understanding and guessing:
train_input: 27 dims of mel-spectrum + (dx, dz, dr)
train_output: emp of selected joints
My second question is how you constructed the control signal in the synthesis phase for FB-U and FB-C; to make it clear, I read Section 4.5 (full-body synthesis) of your paper.
1> FB-U, in my current understanding and guessing:
Did you copy the (dx, dz, dr) through from the original BVH? (https://youtu.be/egf3tjbWBQE?t=150)
2> FB-C, in my current understanding and guessing:
This is unclear to me; please teach me how you did that. Looking at the video, the controlled foot-stepping and body stance look great and very natural. (https://youtu.be/egf3tjbWBQE?t=163)
Look forward to your sharing, thanks in advance,
Kelvin
Simon,
Thanks so much for correcting my understanding of the logic. On style control and also FB-C, your work is very practical in terms of meeting the real requirements of a talking avatar.
Can I ask one more visualization-related question at a high level?
Eventually the synthesized body should have consistent facial expressions and synchronized lip movement, so I have actually been working on face-mesh generation for a while (I extracted face meshes from video and generated face meshes according to speech; in one of my chats with Taras, I attached a demo of my face-mesh generation system: https://user-images.githubusercontent.com/10486482/103012202-31dda980-4576-11eb-963d-5bf89224833e.mp4).
But a problem on my mind is how to integrate the generated face mesh with the animated character. Do you have any experience or vision on the best way to support the face or lips, please?
Have a nice day,
Kelvin
A problem on my mind is how to integrate the generated face mesh with the animated character. Do you have any experience or vision on the best way to support the face or lips, please?
I don't know what the best way to do this is. In terms of face motion and face meshes, my relevant experience is limited to being part of one specific research paper called Let's Face It, published at IVA last year. There, we applied motion-generation methods to face meshes parameterised using FLAME, although the lip-sync we used was taken from another source and not generated by the method presented in the paper. Since lip movements are fairly well determined by acoustics, I think a deterministic (i.e., non-probabilistic) method could be used to solve that problem, although MoGlow should in principle work too. Either way, we did not combine face meshes with body motion in that work, and I don't have sufficient experience with 3D graphics to tell you how to do that.
Gustav,
Thanks a lot for the sharing,
I think combining facial synthesis and gesture synthesis makes a lot of sense. I searched the internet for a while and was surprised not to find an ongoing project that addresses those two targets in one system, though there is certainly a lot of lip-syncing related work.
When I get more free time, I would like to try MoGlow on lip-syncing/face generation; I expect more vivid results compared with deterministic methods.
Cheers,
Kelvin
Related Issues (20)
- Did you experiment with different batch sizes in training? HOT 2
- Strange results when I train with multiple GPUs. HOT 2
- bvh files with fixed frames
- Difference between time_steps and seqlen? HOT 2
- Possible bug when computing the log-det of Jacobian for affine coupling HOT 2
- About datasets HOT 1
- For the freshmen about Gesture Generation HOT 2
- Questions about the latent random variable Z HOT 2
- The Python version
- About the Cuda version HOT 4
- About the swapaxes for self.x and self.cond HOT 6
- This dataset link doesn't seem to work. HOT 2
- The dataset are inconsistent HOT 10
- Excuse me, where can I find the dataset used by Example3 in readme? HOT 11
- This is the curve when I use two different data sets for training, and the parameters are the same. It can be roughly seen that the loss in Figure 1 will be lower than that in Figure 2, which can indicate that the performance of the first trained model will be better? HOT 6
- How to apply the output file(*.bvh) to 3D model file(*.3ds) HOT 3
- Some questions about the style control HOT 2
- Some questions about the style control HOT 1
- About the dataset HOT 2
- The trained model posture shakes badly. What might be the cause? Is there any way to solve this problem? HOT 2