Comments (12)
Regarding the first question about dimensions, your understanding is correct.
Regarding the second question:
- Your assumption is incorrect. (dx, dz, dr) is part of the output and is continuously generated by the model.
- In our experiments, we used the (dx, dz, dr) paths from the test data. In practical applications, this would be entered by hand, e.g., by setting (dx, dz, dr) = (0, 0, 0) for standing still, or by specifying a curve in some editor. Other possibilities are to copy and loop some snippets with the desired stepping behaviour from the original data, or to have it generated procedurally. It's really up to you and your application.
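To make the options above concrete, a hand-made control path is just a (frames × 3) array of per-frame deltas. Here is a minimal NumPy sketch (the frame rate, units, and snippet values are assumptions for illustration, not taken from the repository):

```python
import numpy as np

fps = 20                    # assumed frame rate
n_frames = 4 * fps          # e.g. an 80-frame, 4-second clip

# Standing still: zero translation and rotation deltas at every frame.
standstill = np.zeros((n_frames, 3))   # columns: (dx, dz, dr)

def loop_snippet(snippet, n_frames):
    """Tile a short (dx, dz, dr) snippet, e.g. cut from the original
    data, until it covers n_frames.  Because these are per-frame
    deltas, looping them yields a continuous path with no jump."""
    reps = -(-n_frames // len(snippet))  # ceiling division
    return np.tile(snippet, (reps, 1))[:n_frames]

# A made-up 30-frame stepping snippet: slow forward drift, no turning.
snippet = np.tile([0.0, 0.05, 0.0], (30, 1))
walk = loop_snippet(snippet, n_frames)
```

The same pattern works for any path you can express per frame, e.g. a curve sampled from an editor.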
from stylegestures.
In fact, these are two separate processing methods. "absolute_translation_deltas" is part of the original PyMO library (see https://github.com/omimo/PyMO), and "pos_rot_deltas" was added by me. I have not analysed "absolute_translation_deltas" in depth, but I think you are right that it centers the motion without taking rotation into account. "pos_rot_deltas" splits the hip motion into six "root"-centric coordinates local to a ground-projected, forward-directed "root" coordinate system, and three coordinates describing the delta translation and y-rotation (dx, dz, dr) of this "root" coordinate system.
The code for creating the FB-C condition may be missing, but it is very simple to reproduce: just take the last three features (dx, dz, dr) from the processed output features and move them to the input features.
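If the processed features live in NumPy arrays, that move amounts to a couple of slicing operations. A sketch under assumed feature dimensions (the real ones depend on the dataset and are not taken from the repository):

```python
import numpy as np

# Assumed dimensions for illustration only.
n_frames, n_ctrl, n_pose = 80, 27, 45
ctrl = np.random.randn(n_frames, n_ctrl)      # per-frame control (e.g. speech)
out = np.random.randn(n_frames, n_pose + 3)   # poses with (dx, dz, dr) last

# FB-U keeps the deltas in the output features:
fbu_input, fbu_output = ctrl, out

# FB-C moves the last three columns (dx, dz, dr) into the input features:
fbc_input = np.concatenate([ctrl, out[:, -3:]], axis=1)
fbc_output = out[:, :-3]
```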
To my humble understanding, "absolute_translation_deltas" only tries to introduce some features for body movement, without any feature for body rotation? But "pos_rot_deltas" tries to cover both movement and rotation. Is that correct?
The three deltas correspond to the change in forward and lateral displacement of the root node in a body-centric coordinate system, as well as the change in heading (a rotation), between frames. This is described in Section 4.1 of our MoGlow paper on arXiv and in Habibie et al. (2017). Even though I did not write that code, I feel confident in saying that the between-frame displacements are the "absolute_translation_deltas", and the changes in heading are the "pos_rot_deltas".
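For intuition, extracting such deltas from a root trajectory might look roughly like the sketch below. This is my own reconstruction of the idea, not the PyMO/StyleGestures code, and the sign and axis conventions are assumptions:

```python
import numpy as np

def root_deltas(pos_xz, heading):
    """Per-frame (dx, dz, dr): the ground-plane displacement of the
    root, rotated into the previous frame's heading-aligned (body)
    coordinates, plus the change in heading between frames."""
    d_world = np.diff(pos_xz, axis=0)   # world-space displacement per frame
    h = heading[:-1]                    # heading at the earlier frame
    dx = np.cos(h) * d_world[:, 0] + np.sin(h) * d_world[:, 1]
    dz = -np.sin(h) * d_world[:, 0] + np.cos(h) * d_world[:, 1]
    dr = np.diff(heading)
    return np.stack([dx, dz, dr], axis=1)
```

A straight walk with constant heading then yields constant dx and zero dz and dr, as expected.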
In your paper, you mentioned that you have two models: one is FB-U, the other is FB-C. My question is: where can I find example code for how you construct the control vector?
The difference between FB-U and FB-C is that the "pos_rot_deltas" and "absolute_translation_deltas" are control-signal inputs (concatenated with the other per-frame control information) for FB-C, whereas they instead constitute model outputs for FB-U (concatenated with the per-frame pose information). This means that the code that constructs the "pos_rot_deltas" and "absolute_translation_deltas" is part of the code that creates the control-signal vectors for FB-C, and is also needed for FB-U, although FB-U uses the same control signal as MG.
While I do not know where in the code the concatenations mentioned above occur, this might give you an idea of where to search for the relevant code; for instance, it must come after those deltas are created. (And if you cannot find the code you need, re-adding a tensor concatenation should not be too difficult either.)
Thanks for correcting my misunderstanding, Simon.
Gustav,
Thanks for all the comments, very helpful,
Kelvin
In fact, these are two separate processing methods. "absolute_translation_deltas" is part of the original PyMO library (see https://github.com/omimo/PyMO), and "pos_rot_deltas" was added by me. I have not analysed "absolute_translation_deltas" in depth, but I think you are right that it centers the motion without taking rotation into account. "pos_rot_deltas" splits the hip motion into six "root"-centric coordinates local to a ground-projected, forward-directed "root" coordinate system, and three coordinates describing the delta translation and y-rotation (dx, dz, dr) of this "root" coordinate system.
The code for creating the FB-C condition may be missing, but it is very simple to reproduce: just take the last three features (dx, dz, dr) from the processed output features and move them to the input features.
Simon,
Thanks so much, I understand most of your comments.
You know what, in my head I had always thought you were the owner of PyMO :-). I think you did an excellent job extending it.
Just a little question on the last sentence in your comments: "Just take the last three features (dx, dz, dr) from the processed output features and move them to the input features." Could you give a little more detail here? I cannot understand what you mean by "processed output features" and "the input features".
Thanks a lot,
Kelvin
"Just take the last three features (dx, dz, dr) from the processed output features and move them to the input features." Could you give a little more detail here? I cannot understand what you mean by "processed output features" and "the input features".
As defined in our research paper, a MoGlow system takes a sequence c of control vectors c_t as input and produces output motion as a sequence of individual poses x_t (here, underscores denote subscripts). My best understanding of what Simon is saying is that you should take the three last elements of pos_rot_deltas for every frame t and append them to the existing control vector at that frame to obtain the complete control vector c_t, in order to replicate the setup of FB-C used in training and synthesis.
When Simon explicitly says that you should "move" the three elements, I think he means that these three elements of pos_rot_deltas should not be included in the output vector x_t in training and synthesis (so disable any code that concatenates those three elements to it, if such code exists). It is only relevant to include those elements of pos_rot_deltas in the pose vectors x_t when replicating the FB-U system.
I hope this answers your question.
Dear Ghenter,
With a more detailed reading of the code and with your great comments, I now have a better understanding of my previous question about Simon's statement "Just take the last three features (dx, dz, dr) from the processed output features and move them to the input features."
For training, this method is clear, but for synthesis, how did you construct those three elements (dx, dz and dr)? For example, if I want to generate a 4-second full-body clip, how should I construct the sequence of (dx, dz, dr) (length = 4 s × 20 fps = 80)?
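One way to sanity-check a hand-made 80-frame (dx, dz, dr) sequence is to integrate the deltas back into a world-space root path and inspect or plot it. A sketch of that integration (my own, with assumed rotation conventions; not code from the repository):

```python
import numpy as np

def integrate_deltas(deltas, x0=0.0, z0=0.0, r0=0.0):
    """Accumulate per-frame (dx, dz, dr) into a world-space root path,
    rotating each body-frame step by the current heading."""
    x, z, r = x0, z0, r0
    path = [(x, z, r)]
    for dx, dz, dr in deltas:
        x += np.cos(r) * dx - np.sin(r) * dz
        z += np.sin(r) * dx + np.cos(r) * dz
        r += dr
        path.append((x, z, r))
    return np.array(path)

# 80 frames of constant forward stepping with no turning:
path = integrate_deltas(np.tile([0.1, 0.0, 0.0], (80, 1)))
```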
Look forward to your sharing,
Have a nice day,
Dear Gustav and Simon,
Two other related questions:
My first question is about the train_input and train_output for each of those two models:
1> FB-U, in my current understanding and guessing:
train_input: 27 dims of mel-spectrum
train_output: emp of selected joints + (dx, dz, dr)
2> FB-C, in my current understanding and guessing:
train_input: 27 dims of mel-spectrum + (dx, dz, dr)
train_output: emp of selected joints
My second question is how you constructed the control signal in the synthesis phase for FB-U and FB-C; to make it clear, I read Section 4.5 (full-body synthesis) of your paper.
1> FB-U, in my current understanding and guessing:
Did you copy the (dx, dz, dr) through from the original BVH? (https://youtu.be/egf3tjbWBQE?t=150)
2> FB-C, in my current understanding and guessing:
This is unclear to me; please teach me how you did that. Looking at the video, the controlled foot-stepping and body stance look great and very natural. (https://youtu.be/egf3tjbWBQE?t=163)
Look forward to your sharing, thanks in advance,
Kelvin
Simon,
Thanks so much for correcting my understanding of the logic. On style control and also FB-C, your work is very practical in terms of meeting the real requirements of a talking avatar.
Can I ask one more visualization-related question at a high level?
Eventually the synthesized body should have consistent facial expressions and synchronized lip movement, so I have actually been working on face-mesh generation for a while (I extracted face meshes from video and generated face meshes according to speech; in one of my chats with Taras, I attached a demo of my face-mesh generation system: https://user-images.githubusercontent.com/10486482/103012202-31dda980-4576-11eb-963d-5bf89224833e.mp4).
But a problem on my mind is how to integrate the generated face mesh with the animated character. Do you have any experience or vision on the best way to support the face or lips, please?
Have a nice day,
Kelvin
A problem on my mind is how to integrate the generated face mesh with the animated character. Do you have any experience or vision on the best way to support the face or lips, please?
I don't know what the best way to do this is. In terms of face motion and face meshes, my relevant experience is limited to being part of one specific research paper called Let's Face It, published at IVA last year. There, we applied motion-generation methods to face meshes parameterised using FLAME, although the lip-sync we used was taken from another source and not generated by the method presented in the paper. Since lip movements are fairly well determined by acoustics, I think a deterministic (i.e., non-probabilistic) method could be used to solve that problem, although MoGlow should in principle work too. Either way, we did not combine face meshes with body motion in that work, and I don't have sufficient experience with 3D graphics to tell you how to do that.
Gustav,
Thanks a lot for the sharing,
I think combining facial synthesis and gesture synthesis makes a lot of sense. I searched the internet for a while and was surprised not to find an ongoing project that addresses those two targets in one system, though there is certainly a lot of lip-syncing related work.
When I get more free time, I would like to try MoGlow on lip-syncing/face generation; I expect more vivid results compared with deterministic methods.
Cheers,
Kelvin
Related Issues (20)
- Did you experiment with different batch sizes in training? HOT 2
- Strange results when I train with multiple GPUs. HOT 2
- bvh files with fixed frames
- Difference between time_steps and seqlen? HOT 2
- Possible bug when computing the log-det of Jacobian for affine coupling HOT 2
- About datasets HOT 1
- For the freshmen about Gesture Generation HOT 2
- Questions about the latent random variable Z HOT 2
- The Python version
- About the Cuda version HOT 4
- About the swapaxes for self.x and self.cond HOT 6
- This dataset link doesn't seem to work. HOT 2
- The dataset are inconsistent HOT 10
- Excuse me, where can I find the dataset used by Example3 in readme? HOT 11
- This is the curve when I use two different data sets for training, and the parameters are the same. It can be roughly seen that the loss in Figure 1 will be lower than that in Figure 2, which can indicate that the performance of the first trained model will be better? HOT 6
- How to apply the output file(*.bvh) to 3D model file(*.3ds) HOT 3
- Some questions about the style control HOT 2
- Some questions about the style control HOT 1
- About the dataset HOT 2
- The trained model posture shakes badly. What might be the cause? Is there any way to solve this problem? HOT 2