
Comments (63)

liyingliu avatar liyingliu commented on May 6, 2024 1

Hi, sorry for the missing information in my previous comment.

Yes, I am using KITTI, eigen split, with the data generation code from vid2depth, as struct2depth does.

Yes, I have created a "possibly mobile" mask for each image. I am using the same masks as struct2depth (each object has a different object ID, and the objects are tracked across each three-frame sequence). I am using Mask R-CNN to obtain the masks. Also, I have set boxify=True, so as I understand it the masks become bounding boxes.
For your information, I also attached the TensorBoard image of the variable seg_stack at line 172 of model.py.
[Screenshot: seg_stack as shown in TensorBoard]
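
For reference, here is a minimal numpy sketch of what I understand boxify=True to do, i.e. replace every instance mask by its filled bounding box. This is my own paraphrase, not the repo's exact implementation:

import numpy as np

def boxify(seg, background_id=0):
  """Replaces each object's mask in a segmentation map by its filled bounding box.

  seg: integer array of shape [H, W] with one object ID per pixel (assumption).
  """
  boxed = np.full_like(seg, background_id)
  for obj_id in np.unique(seg):
    if obj_id == background_id:
      continue
    rows, cols = np.where(seg == obj_id)
    boxed[rows.min():rows.max() + 1, cols.min():cols.max() + 1] = obj_id
  return boxed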


liyingliu avatar liyingliu commented on May 6, 2024 1

Hi @gariel-google, I am also evaluating the egomotion prediction: I use inference_egomotion to obtain the egomotion and SfMLearner's evaluation to compute the 5-point/3-point ATE.

Since I am evaluating my own trained model (the one with Abs_Rel=0.147, trained on the eigen split training set), and the eigen split training set has overlapping frames with odometry sequences 09 and 10, the ATE should be reasonably good. However, the result I got is quite bad compared to what you stated in the paper. (I apologize if I am evaluating the egomotion prediction wrongly.)

         Seq. 09  Seq. 10
5-point  0.0296   0.0245
3-point  0.0212   0.0180

For your information, I also attached the plotted trajectories.
[Trajectory plot: seq 09]
[Trajectory plot: seq 10]

Maybe we could exchange the odometry evaluation result as well?
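
For reference, this is roughly how I compute the snippet ATE, following my reading of the SfMLearner kitti_eval code (the exact script may differ slightly): align the first frames, fit a single least-squares scale, then take the RMSE over the snippet.

import numpy as np

def compute_ate(gt_xyz, pred_xyz):
  """Absolute trajectory error over one N-frame snippet (N = 3 or 5).

  gt_xyz, pred_xyz: arrays of shape [N, 3] holding the (x, y, z) positions.
  """
  # Align the first frames.
  pred_xyz = pred_xyz + (gt_xyz[0] - pred_xyz[0])
  # Least-squares scale between the scale-less prediction and the ground truth.
  scale = np.sum(gt_xyz * pred_xyz) / np.sum(pred_xyz ** 2)
  alignment_error = pred_xyz * scale - gt_xyz
  return np.sqrt(np.sum(alignment_error ** 2)) / gt_xyz.shape[0]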


gariel-google avatar gariel-google commented on May 6, 2024 1


gariel-google avatar gariel-google commented on May 6, 2024 1


player1321 avatar player1321 commented on May 6, 2024 1

@gariel-google Thanks for your reply.
Actually I have trouble generating the numbers. I'm using this evaluation tool, and the visualization looks fine but the numbers are very different from yours.
Here are the visualization results:
[Trajectory plot: Seq-09]
[Trajectory plot: Seq-10]

And here are the numbers:

  • KITTI

              Trans. err. (%)  Rot. err. (deg/100m)  ATE (m)  RPE (m)  RPE (deg)
    Seq. 09   6.572            2.254                 40.574   0.068    0.101
    Seq. 10   13.659           3.904                 145.783  0.100    0.097

  • KITTI + Cityscapes

              Trans. err. (%)  Rot. err. (deg/100m)  ATE (m)  RPE (m)  RPE (deg)
    Seq. 09   7.727            2.220                 45.602   0.088    0.093
    Seq. 10   13.032           2.529                 88.142   0.104    0.082

It seems that this tool's definition of the ATE is not the same as yours. Could you share your evaluation tools, or recommend a reference for the definitions?


gariel-google avatar gariel-google commented on May 6, 2024 1

Thanks for pointing this out. This seems to be a bug on our side, then. I will look into it, but it may take some time till I can get to it and debug. Sorry about that :-)


gariel-google avatar gariel-google commented on May 6, 2024


gariel-google avatar gariel-google commented on May 6, 2024


liyingliu avatar liyingliu commented on May 6, 2024

Thanks for the reply.

Yes, for depth inference, I am using the link you mentioned in your previous comment.

In Figure 5 of the paper, for "Evaluated on KITTI", the training converges at around 1 million training images. If we assume that batch_size=4 with learning_rate=0.0002 converges similarly: my training reached its best checkpoint, with Abs_Rel=0.147, at around the 370k-th step (370k steps × 4 = 1.48 million images). Can I therefore conclude that the pretrained ImageNet checkpoint has a large impact (0.147 vs. 0.128) on the result?

Looking forward to your release and thanks for the efforts.


gariel-google avatar gariel-google commented on May 6, 2024


liyingliu avatar liyingliu commented on May 6, 2024

Hi, it sounds great to me. Let's do it together.


happinessoverdue avatar happinessoverdue commented on May 6, 2024

The paper says that the videos were selected from 3,079 YouTube8M videos labeled 'Quadcopter'. Will their IDs be made public soon?
I also realize that it takes a lot of time to process so many videos into three-frame splits and to generate their masks and alignment, so will the pretrained YouTube8M checkpoint also be released soon?
I also notice that the current release does not yet support initialization from the ResNet-18 checkpoint pretrained on ImageNet; I'm trying to write code to implement that, since struct2depth has a similar code organization...


gariel-google avatar gariel-google commented on May 6, 2024


liyingliu avatar liyingliu commented on May 6, 2024

Yes, exchanging results should be enough for comparing odometry evaluations. Thanks again for the effort.


gariel-google avatar gariel-google commented on May 6, 2024


liyingliu avatar liyingliu commented on May 6, 2024

Thanks for the release. Yes, I am ready to try! However, I noticed that only the data files are released, and as I understand it, to restore a model in TensorFlow we need three files (correct me if I am wrong): index, data, and meta. Could you release the complete checkpoints?
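
For context, this is the kind of restore I have in mind; with only the .data-00000-of-00001 file and no .index (and, if the graph is imported rather than rebuilt, no .meta), tf.train.Saver cannot map the variables. The paths below are placeholders, just to illustrate the three-file layout:

import tensorflow as tf  # TF1-style API, as in the repo

# A checkpoint "model-248900" is expected to consist of three files:
#   model-248900.index
#   model-248900.data-00000-of-00001
#   model-248900.meta   (only needed when the graph itself is imported)
ckpt = 'path_to_kitti_learned_intrinsics/model-248900'

# Option 1: the graph is already built in code (e.g. by model.Model);
# only the .index and .data files are needed then.
saver = tf.train.Saver()

# Option 2: import the graph from the .meta file as well.
# saver = tf.train.import_meta_graph(ckpt + '.meta')

with tf.Session() as sess:
  saver.restore(sess, ckpt)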


gariel-google avatar gariel-google commented on May 6, 2024


liyingliu avatar liyingliu commented on May 6, 2024

Hi @gariel-google, thanks for the work. The new zip files work for me. I have tested the checkpoint trained on KITTI. The following is what I have:

  • The depth result differs when running inference with different batch sizes:

                   abs_rel  sq_rel  rms     log_rms  a1      a2      a3
    batch_size=1   0.1262   0.9462  5.2214  0.2086   0.8470  0.9475  0.9774
    batch_size=16  0.1305   1.0186  5.3237  0.2136   0.8389  0.9430  0.9751

With batch_size=1 we get the same result, so we should be using the same evaluation metrics. However, the depth output is not consistent when batch_size changes. Is it the same for you? Where does the variation come from?

  • Odometry result (ATE) when running inference with batch_size=1:

             seq_09  seq_10
    5-point  0.0231  0.0195
    3-point  0.0170  0.0149

The plotted trajectories:
[Trajectory plot: seq_09]
[Trajectory plot: seq_10]
The odometry result looks quite bad to me. Do you get the same result? Since the eigen split training set overlaps with odometry sequences 09 and 10, shouldn't the ATE be better than what you stated in the paper (0.0231 vs 0.012 and 0.0195 vs 0.010)?


gariel-google avatar gariel-google commented on May 6, 2024


liyingliu avatar liyingliu commented on May 6, 2024

Hi @gariel-google, sorry for the late reply, and thanks for your explanations.
Being able to reproduce the training would be a nice start for me. Thanks again for the work.
Also, thank you in advance for releasing the odometry checkpoint and the respective inferred trajectories.


buaafish avatar buaafish commented on May 6, 2024

Hi @gariel-google, thanks for your checkpoint. But when I test intrinsics inference with your KITTI checkpoint, the intrinsic matrix is not right.
The input is one pair of KITTI images like this:
[Image: KITTI frame pair]

We use the top two images to infer the intrinsic matrix:

    [[119.80293    0.        702.5139  ]
     [  0.         74.126114 -29.449604]
     [  0.          0.          1.      ]]

But the ground-truth intrinsic matrix is:

    [[241.67446312   0.         204.16801031]
     [  0.         246.28486827  59.000832  ]
     [  0.           0.           1.      ]]

Why is the intrinsic matrix not right?
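
In case it helps to rule out preprocessing, this is what I would expect the network input to look like: images scaled to [0, 1] before the optional ImageNet normalization. This is a hedged sketch (mirroring the two-frames-side-by-side layout used later in this thread); whether the mean/std subtraction belongs in the feed or inside the graph depends on how the test graph is built:

import numpy as np
from PIL import Image

# Standard ImageNet statistics; I assume reader.IMAGENET_MEAN and
# reader.IMAGENET_SD in the repo hold the same values.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_SD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def load_image_pair(filepath):
  """Loads two 416x128 frames stored side by side and normalizes them."""
  strip = np.array(Image.open(filepath).convert('RGB'), dtype=np.float32) / 255.0
  img1, img2 = strip[:, :416, :], strip[:, 416:832, :]
  img1 = (img1 - IMAGENET_MEAN) / IMAGENET_SD
  img2 = (img2 - IMAGENET_MEAN) / IMAGENET_SD
  return img1[np.newaxis], img2[np.newaxis]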


gariel-google avatar gariel-google commented on May 6, 2024


buaafish avatar buaafish commented on May 6, 2024

@gariel-google

  1. Indeed, I normalized the images incorrectly (0-255). I then added the imagenet_norm operation to your test code.

The inference code looks like this:

  def _build_egomotion_test_graph(self):
    """Builds graph for inference of egomotion given two images."""
    with tf.variable_scope(tf.get_variable_scope(), reuse=tf.AUTO_REUSE):
      self._image1 = tf.placeholder(
          tf.float32, [self.batch_size, self.img_height, self.img_width, 3],
          name='image1')
      self._image2 = tf.placeholder(
          tf.float32, [self.batch_size, self.img_height, self.img_width, 3],
          name='image2')
      if self.imagenet_norm:
        self._image1 = (self._image1 - reader.IMAGENET_MEAN) / reader.IMAGENET_SD
        self._image2 = (self._image2 - reader.IMAGENET_MEAN) / reader.IMAGENET_SD
        
      # The "compute_loss" scope is needed for the checkpoint to load properly.
      with tf.name_scope('compute_loss'):
        rot, trans, _, mat = motion_prediction_net.motion_field_net(
            images=tf.concat([self._image1, self._image2], axis=-1))
        inv_rot, inv_trans, _, inv_mat = (
            motion_prediction_net.motion_field_net(
                images=tf.concat([self._image2, self._image1], axis=-1)))
      intrinsic_mat = 0.5 * (mat + inv_mat)
      rot = transform_utils.matrix_from_angles(rot)
      inv_rot = transform_utils.matrix_from_angles(inv_rot)
      trans = tf.squeeze(trans, axis=(1, 2))
      inv_trans = tf.squeeze(inv_trans, axis=(1, 2))

      # rot and inv_rot should be the inverses on of the other, but in reality
      # they slightly differ. Averaging rot and inv(inv_rot) gives a better
      # estimator for the rotation. Similarly, trans and rot*inv_trans should
      # be the negatives one of the other, so we average rot*inv_trans and trans
      # to get a better estimator. TODO(gariel): Check if there's an estimator
      # with less variance.
      self.rot = 0.5 * (tf.linalg.inv(inv_rot) + rot)
      self.trans = 0.5 * (-tf.squeeze(
          tf.matmul(self.rot, tf.expand_dims(inv_trans, -1)), axis=-1) + trans)
      self.inf_intrinsic_mat = intrinsic_mat
      
  def inference_egomotion(self, image1, image2, sess):
    return sess.run([self.rot, self.trans, self.inf_intrinsic_mat],
                    feed_dict={
                        self._image1: image1,
                        self._image2: image2
                    })

I modified your code as follows:

      if self.imagenet_norm:
        self._image1 = (self._image1 - reader.IMAGENET_MEAN) / reader.IMAGENET_SD
        self._image2 = (self._image2 - reader.IMAGENET_MEAN) / reader.IMAGENET_SD
      intrinsic_mat = 0.5 * (mat + inv_mat)

Then I read RGB images and feed them to image1 and image2.

The intrinsic matrix is still not right:

    [[375.67065    0.       -579.6008 ]
     [  0.        89.65622   -65.86415]
     [  0.         0.          1.     ]]


buaafish avatar buaafish commented on May 6, 2024

@gariel-google The test code is like this:

def main(_):
  seed = FLAGS.seed
  tf.set_random_seed(seed)
  np.random.seed(seed)
  random.seed(seed)

  if not gfile.Exists(FLAGS.checkpoint_dir):
    gfile.MakeDirs(FLAGS.checkpoint_dir)

  test_model = model.Model(
      boxify=FLAGS.boxify,
      data_dir=FLAGS.data_dir,
      file_extension=FLAGS.file_extension,
      is_training=False,
      foreground_dilation=FLAGS.foreground_dilation,
      learn_intrinsics=FLAGS.learn_intrinsics,
      learning_rate=FLAGS.learning_rate,
      reconstr_weight=FLAGS.reconstr_weight,
      smooth_weight=FLAGS.smooth_weight,
      ssim_weight=FLAGS.ssim_weight,
      translation_consistency_weight=FLAGS.translation_consistency_weight,
      rotation_consistency_weight=FLAGS.rotation_consistency_weight,
      batch_size=FLAGS.batch_size,
      img_height=FLAGS.img_height,
      img_width=FLAGS.img_width,
      weight_reg=FLAGS.weight_reg,
      depth_consistency_loss_weight=FLAGS.depth_consistency_loss_weight,
      queue_size=FLAGS.queue_size,
      input_file=FLAGS.input_file)
  
  _test(test_model, FLAGS.checkpoint_dir)

def readImages(path, subdir, name):
  filename = name+".png"
  filepath = os.path.join(path, subdir, filename)
  im = Image.open(filepath)
  im_array = np.array(im)
  img1 = im_array[:, 0:416, :]
  img2 = im_array[:, 416:832, :]
  return img1[np.newaxis, :, :, :], img2[np.newaxis, :, :, :]


def readMat(path, subdir, name):
  filename = name+"_cam.txt"
  filepath = os.path.join(path, subdir, filename)
  data_temp=[]
  with open(filepath) as fdata:
    line=fdata.readline()
    data_temp.append([float(i) for i in line.split(',')])
  return np.array(data_temp).reshape((3,3))
 
def readFileList(list_data_dir):
  with gfile.Open(list_data_dir) as f:
    frames = f.readlines()
    frames = [k.rstrip() for k in frames]
  subfolders = [x.split(' ')[0] for x in frames]
  frame_ids = [x.split(' ')[1] for x in frames]
  return subfolders, frame_ids


def _test(test_model, checkpoint_dir):
  checkpointpath = "./pretrained/cityscapes_kitti_learned_intrinsics/"
  
  saver = tf.train.import_meta_graph(checkpointpath+'model-1000977.meta')
  checkpoint = checkpointpath+"model-1000977"
  with tf.device('/cpu:0'):
    with tf.Session() as sess:
      sess.run(tf.local_variables_initializer())
      sess.run(tf.global_variables_initializer())
      logging.info('Loading checkpoint...')
      saver.restore(sess, checkpoint)
      logging.info('Reading data...')
      path = "./kitti/format_data"
      list_data_dir = "test.txt"
      subfolders, frame_ids = readFileList(list_data_dir)
      for (subdir, name) in zip(subfolders, frame_ids):  
        img1, img2 = readImages(path, subdir, name)
        logging.info('Start testing...')
        ret = test_model.inference_egomotion(img1, img2,sess)
        print(ret[2])
        mat = readMat(path, subdir, name)
        print(mat)
        logging.info('End testing...')
            
if __name__ == '__main__':
  app.run(main)


gariel-google avatar gariel-google commented on May 6, 2024


liyingliu avatar liyingliu commented on May 6, 2024

@gariel-google Understood, and thanks! Looking forward to exchanging the odometry results.


gariel-google avatar gariel-google commented on May 6, 2024


cognitiveRobot avatar cognitiveRobot commented on May 6, 2024

@liyingliu We just added the code for initialization from ImageNet, as well as some corrections in the hyperparameters for training. Unfortunately I was unable to obtain clearance to release the specific ImageNet checkpoint itself yet - sorry about that, things sometimes get more bureaucratic than expected.

@buaafish Thanks for sharing your code; it's not easy for me, though, to spot a bug if there is one. Is there a chance you can tell me whether you were able to reproduce the depth inference metrics and/or whether the trajectories look reasonable? The intrinsic matrix is so far off that I still suspect there is some sort of crude error somewhere.

My next steps are to release the checkpoints we used for calculating odometry, with learned and given intrinsics, as well as the respective odometry trajectories. Then I can try to add a small piece of code for generating Fig. 9 in the paper for the intrinsics, which should hopefully resolve the intrinsics issue. Thank you all for your help debugging this; our goal is that everyone will be able to reproduce our results.


@gariel-google did you get clearance to release the specific ImageNet checkpoint? I want to try with that.
In my case, it is not learning if I train without any checkpoint. If I provide a checkpoint from https://github.com/google-research/google-research/tree/master/depth_from_video_in_the_wild#pretrained-checkpoints-and-respective-depth-metrics
I get the following

I1111 21:44:55.547858 140513105188608 train.py:167] Attempting to resume training from depth_from_video_in_the_wild/kitti_learned_intrinsics/...
I1111 21:44:55.548219 140513105188608 train.py:169] Last checkpoint found: None
I1111 21:44:55.548329 140513105188608 train.py:176] Training...

It is not learning. I tried learning_rate from 1e-4 to 1e-6 and batch_size 4 and 2.
Any thoughts? Thanks.


liyingliu avatar liyingliu commented on May 6, 2024

@cognitiveRobot If your question is about restoring and continuing to train from the checkpoint that the author provided, then you could try adding a file named "checkpoint" to your checkpoint folder (the folder containing the .index, .meta, and .data-xxxx files). The content of the "checkpoint" file can be the following:

    model_checkpoint_path: "path_to_kitti_learned_intrinsics/model-248900"


gariel-google avatar gariel-google commented on May 6, 2024


gariel-google avatar gariel-google commented on May 6, 2024


cognitiveRobot avatar cognitiveRobot commented on May 6, 2024

@gariel-google, thanks. I will test. :)


liyingliu avatar liyingliu commented on May 6, 2024

Regarding the batch size, we tested at 1. If batch-normalization is replaced everywhere by randomized layer normalization, the inference results do not depend on the batch size, as it should be. Due to an oversight, when we were obtaining the results for the paper, we left a few batch normalization layers in place. We fixed that since, but to be compatible with the checkpoints used for the paper, we needed to leave the batch-norms there, hence the dependence on the batch size.

@gariel-google Hi, I notice that for batch normalization, is_train is set to False in the code during inference. Therefore, in this case, the depth result shouldn't depend on the batch size.

In another experiment, I trained a model with all the batch normalization replaced by randomized layer normalization. However, when I evaluated this checkpoint, the depth results were still inconsistent across batch sizes. Could randomized layer normalization be the cause of the inconsistent results? Can you help explain this? Thanks.
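
For reference, this is how I currently understand randomized layer normalization from the paper: plain per-sample normalization, except that during training the statistics are perturbed by multiplicative Gaussian noise. The exact axes, noise placement and stddev below are assumptions on my part, not the repo's code:

import tensorflow as tf  # TF1-style API, as in the repo

def randomized_layer_norm(x, is_train, stddev=0.5):
  """Per-sample normalization whose statistics get multiplicative noise in training.

  x: [B, H, W, C] tensor. The statistics are computed per sample, so with
  is_train=False the output should not depend on the batch size.
  """
  mean, variance = tf.nn.moments(x, axes=[1, 2], keep_dims=True)
  if is_train:
    mean *= 1.0 + tf.truncated_normal(tf.shape(mean), stddev=stddev)
    variance *= 1.0 + tf.truncated_normal(tf.shape(variance), stddev=stddev)
  return (x - mean) / tf.sqrt(variance + 1e-3)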


gariel-google avatar gariel-google commented on May 6, 2024


hyeokhyen avatar hyeokhyen commented on May 6, 2024

Any news on the model used for the paper that can predict the intrinsic parameters correctly?


gariel-google avatar gariel-google commented on May 6, 2024


StephenStorm avatar StephenStorm commented on May 6, 2024

@liyingliu
Sorry to bother you, but I have some questions. I tried to infer the depth map using the existing cityscapes_kitti checkpoint, but the depth values I read directly from the '.npy' file were far from the real depth, both for my own images and for Cityscapes images. Did I do something wrong, or are further operations required to obtain true depth values? Thank you very much.
I used 'inference.py' from https://github.com/tensorflow/models/blob/master/research/struct2depth/inference.py;
img_width and img_height are the defaults (416, 128).


liyingliu avatar liyingliu commented on May 6, 2024

Hi, there is an unknown scale factor between the depth predicted by the network and the real depth. You need to multiply the predicted depth by this scale factor to obtain true depth values. A common choice is the median of the ground truth divided by the median of your prediction.
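
Concretely, the usual median scaling looks like this; a short numpy sketch (the depth range cap is the one commonly used for KITTI evaluation and is an assumption here):

import numpy as np

def scale_to_gt(pred_depth, gt_depth, min_depth=1e-3, max_depth=80.0):
  """Rescales a scale-less depth prediction so its median matches the ground truth."""
  mask = (gt_depth > min_depth) & (gt_depth < max_depth)
  scale = np.median(gt_depth[mask]) / np.median(pred_depth[mask])
  return pred_depth * scale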


gariel-google avatar gariel-google commented on May 6, 2024


StephenStorm avatar StephenStorm commented on May 6, 2024

@liyingliu @gariel-google
Thank you both very much for your patient guidance; I will try again based on your suggestions.
If I encounter other problems, I will consult you again.
@gariel-google
By the way, I don't quite understand what you mean by "observing strong discrepancies even beyond that global factor". Do you mean whether the scale factor fails to apply uniformly to the entire image?
Thank you both again.


player1321 avatar player1321 commented on May 6, 2024

@gariel-google
Hello, I'm new to this field.
I tried to run trajectory_inference.py and evaluate the odometry result, but the output format looks different from the ground truth downloaded from the KITTI odometry benchmark.
Will you release an example of how to convert between the formats? Or maybe I missed one in this repo?


gariel-google avatar gariel-google commented on May 6, 2024

@StephenStorm by "observing strong discrepancies even beyond that global factor" I mean: If you multiply the predicted depth by a factor such that its median matches the median groundtruth depth, do you still see significant discrepancies?

@player1321 In the KITTI ground-truth format, each line is a flattened 3x4 pose matrix whose last column is the (x, y, z) position of the car, if I'm not mistaken. This code generates the inferred (x, y, z)-s of the trajectory.
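
For comparison against the KITTI odometry ground truth, here is a small sketch for extracting the (x, y, z) positions from a ground-truth pose file (each line is a 3x4 camera-to-world matrix flattened row-major); the file name is just an example:

import numpy as np

def load_kitti_gt_positions(pose_file):
  """Returns the [N, 3] positions from a KITTI odometry ground-truth file.

  Each line holds a 3x4 transform in row-major order, so the translation is
  the last column, i.e. elements 3, 7 and 11 of the line.
  """
  poses = np.loadtxt(pose_file).reshape(-1, 3, 4)
  return poses[:, :, 3]

xyz = load_kitti_gt_positions('09.txt')  # e.g. sequence 09 of the odometry benchmark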


player1321 avatar player1321 commented on May 6, 2024

@gariel-google Thanks a lot for your patient guidance.
I tested your released models. It seems that the model trained for odometry works well on trajectory inference but not so well on depth estimation, and the model trained on Cityscapes & KITTI works well on depth estimation but not so well on trajectory inference.
Is this caused by the gap between Cityscapes and KITTI? Or are there some tricks for improving depth estimation and trajectory inference respectively?


gariel-google avatar gariel-google commented on May 6, 2024

@player1321 I looked up the checkpoint we used for KITTI odometry (with given intrinsics), and its depth prediction metric is 0.1321, which is indeed worse than the KITTI-only depth error that we report in the paper for given intrinsics (0.129). Is that your concern? We did observe that odometry results tend to improve the longer we train, whereas depth results tend to become slightly worse and noisier beyond some point. We did not try to evaluate the Cityscapes + KITTI checkpoints for odometry, and I don't know how they would perform.

Would you like to share your numbers on both evaluations?


frobinet avatar frobinet commented on May 6, 2024

@gariel-google Thanks for sharing the code and helping us reproduce the results. I'm able to reproduce figures similar to those in the paper using the odometry checkpoints, but the scale seems to be wrong. Is the egomotion network supposed to output positions at real-world scale directly, or is it assumed that we perform a scaling as postprocessing? If so, which type of scaling is used in the paper?

EDIT: From the looks of it, I think the scale-7dof scaling technique is used (see https://github.com/Huangying-Zhan/kitti-odom-eval)


frobinet avatar frobinet commented on May 6, 2024

Also, I realized that the link to the given-intrinsics KITTI odometry weights is wrong: it references the Cityscapes model from the depth table right above it: https://www.googleapis.com/download/storage/v1/b/gresearch/o/depth_from_video_in_the_wild%2Fcheckpoints%2Fcityscapes_learned_intrinsics.zip?generation=1566493765410932&alt=media


gariel-google avatar gariel-google commented on May 6, 2024

@player1321 The definition of ATE we used follows Zhou et al. Our ATE eval is based on theirs, which is given here: https://github.com/tinghuiz/SfMLearner/tree/master/kitti_eval. The numbers are typically on the order of 10^-2. Yours are in meters and are large-ish, so indeed it's probably not the same definition.

Regarding the translation error, we didn't check it for the Cityscapes+KITTI checkpoint, and while your numbers are different, they are not far from ours, assuming you tested the checkpoints with learned intrinsics (right?).

@frobinet The odometry predictions are scale-less, just like the depth predictions. We normalized the entire trajectory by its length. That is, we scaled the predicted trajectory uniformly until its total length was identical to the GT length.
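
In code, the length normalization described above could look like this (a sketch of my reading of it, not the exact evaluation script; anchoring the scaling at the first frame is an assumption):

import numpy as np

def normalize_trajectory_scale(pred_xyz, gt_xyz):
  """Uniformly rescales the predicted trajectory so its total path length matches GT."""
  def path_length(xyz):
    return np.sum(np.linalg.norm(np.diff(xyz, axis=0), axis=1))
  scale = path_length(gt_xyz) / path_length(pred_xyz)
  # Scale the displacements around the starting point so the origin stays fixed.
  return (pred_xyz - pred_xyz[0]) * scale + pred_xyz[0]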


gariel-google avatar gariel-google commented on May 6, 2024

@frobinet I'll have a look at the model links and get back to you, thanks for pointing this out.


frobinet avatar frobinet commented on May 6, 2024

@gariel-google Thanks for helping with this! Any news about releasing the weights for the given intrinsics odometry model?


player1321 avatar player1321 commented on May 6, 2024

@gariel-google Thanks for your guidance! It's very helpful!
You are right, I'm using checkpoints with learned intrinsics.
I have a new question now:
[Figure: predicted residual translation field compared with the paper]
In your paper, you show that the network can carve the silhouettes of people out of a rough mask, but in my tests this only happens in a few cases; in most cases the output just covers the whole region of the car, i.e. the network cannot capture the residual translation correctly.
Does this happen in your tests?


NHirose avatar NHirose commented on May 6, 2024

@gariel-google Thanks for sharing the code and helping us.
Even though I am using your pretrained model (learned intrinsics), I cannot reproduce the egomotion results.
I got 0.0259 on seq 09 and 0.0210 on seq 10, which are much worse than the values in your paper.

Can you provide trajectories with poses and/or code to reproduce the same values?
The shared trajectories only include XYZ positions. If you could add the poses, it would be very helpful for reproducing the results.


gariel-google avatar gariel-google commented on May 6, 2024

Sorry for the delayed response.

@NHirose we released the trajectories here - please see the table below the title, under the links "trajectory".

@player1321 We didn't evaluate quantitatively the prediction of residual motion. Qualitatively it looks good in most cases - I know it sounds hand-wavy, but unfortunately there is no number that I can quote to support this quantitatively.


NHirose avatar NHirose commented on May 6, 2024

@gariel-google Thank you for your reply. However, your released trajectory files only include the XYZ positions. I additionally need the roll, pitch, and yaw angles to reproduce the values in your paper.

Alternatively, can you provide the evaluation script used to produce the egomotion values in your paper? That would help me find the differences!


adizhol avatar adizhol commented on May 6, 2024

Hi,

I'm getting an error when loading the EuRoC MAV checkpoint (depth_from_video_in_the_wild_euroc_ckpt_MachineHallAll) for training.

  saver = train_model.saver
  with sv.managed_session(config=config) as sess:
      saver.restore(sess, 'depth_from_video_in_the_wild_euroc_ckpt_MachineHallAll/model-1797000')

When using the same code with a checkpoint saved after training from scratch, there are no errors.

Key MotionFieldNet/compute_loss/MotionFieldNet_2/Conv1/Relu/MotionBottleneck/weights not found in checkpoint
[[node save/RestoreV2 (defined at /depth_from_video_in_the_wild/model.py:117) ]]

@gariel-google


gariel-google avatar gariel-google commented on May 6, 2024


adizhol avatar adizhol commented on May 6, 2024


mathmax12 avatar mathmax12 commented on May 6, 2024

@gariel-google @adizhol I am facing the same issue, i.e., the "cityscapes and kitti" checkpoints work well with model.py, but the EuRoC one complains.

sess = tf.Session()
saver = tf.train.Saver()
saver.restore(sess, 'euroc_ckpt_ViconRoom1-01/model-2091000')
NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key MotionFieldNet/CameraIntrinsics/foci/biases not found in checkpoint
	 [[node save_3/RestoreV2 (defined at <ipython-input-13-a5df47d040a7>:2) ]]

Are there any updates about this?

Thanks


adizhol avatar adizhol commented on May 6, 2024

After training on custom data, I get different depth during training and during inference (on the same images).
During training the depth is smooth, while at inference it is very rough and inaccurate.
I'm using the inference code from struct2depth's inference.py.
Has anyone experienced something like this?

Update:
I just saw that I had messages like

    util.py:198] Did not find var depth_prediction/conv2_2/bn_1/moving_mean in checkpoint

    The following variables in the checkpoint were not loaded:
    util.py:210] MotionFieldNet/compute_loss/MotionFieldNet_2/Conv1/Relu/MotionBottleneck/weights

Update:
I changed

    vars_to_restore = util.get_vars_to_save_and_restore(model_ckpt)

to:

    vars_to_restore = [v for v in tf.trainable_variables()]

and the error/warning is gone, but the problem still exists.

Update:
Using batch normalization or randomized layer normalization in "train" mode during inference yields results like those seen during training.
I don't understand why you add the Gaussian noise only when is_train=True.
In the code you ramp up the stddev of the noise for the randomized layer normalization, but you don't mention anything about that in the paper.

Also, during inference you infer on a flipped image and take the minimum with the non-flipped image.
This should be used only if the model was trained with flip_mode != none.
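
For reference, the flip-and-take-the-minimum inference step I am referring to looks roughly like this (a numpy sketch of my understanding, not the repo's code; depth_fn stands for whatever runs the depth network on a single image):

import numpy as np

def infer_depth_with_flip(depth_fn, image):
  """Runs depth inference on an image and its left-right flip, keeping the minimum.

  depth_fn: callable mapping an [H, W, 3] image to an [H, W] depth map (assumed).
  """
  depth = depth_fn(image)
  depth_flipped = depth_fn(image[:, ::-1, :])[:, ::-1]  # flip the prediction back
  return np.minimum(depth, depth_flipped)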


VladimirYugay avatar VladimirYugay commented on May 6, 2024

Hey there, I'm still facing this:

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key MotionFieldNet/CameraIntrinsics/foci/biases not found in checkpoint
[[node save_3/RestoreV2 (defined at :2) ]]

Currently, I'm using the latest code version:

    vars_to_restore = [
        v for v in tf.trainable_variables()
        if v.op.name.startswith(DEPTH_SCOPE + '/conv')
    ]
    vars_to_restore = {v.op.name[len(DEPTH_SCOPE) + 1:]: v for v in vars_to_restore}
@adizhol


jaysonph avatar jaysonph commented on May 6, 2024

@VladimirYugay @adizhol @mathmax12

Run tf.reset_default_graph() before restoring the checkpoint to cope with the error below

NotFoundError (see above for traceback): Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:

Key MotionFieldNet/CameraIntrinsics/foci/biases not found in checkpoint
[[node save_3/RestoreV2 (defined at :2) ]]
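
That is, something along these lines (a sketch; the checkpoint path is the one from the comments above, and the model graph must be rebuilt before creating the Saver):

import tensorflow as tf

tf.reset_default_graph()  # drop any previously built graph and its savers
# ... rebuild the inference model here, so the graph matches the checkpoint ...
saver = tf.train.Saver()
with tf.Session() as sess:
  saver.restore(sess, 'euroc_ckpt_ViconRoom1-01/model-2091000')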


Hzy98 avatar Hzy98 commented on May 6, 2024

@cognitiveRobot If your question is to restore and train the checkpoint that the author provided. Then, you could try to add a file named "checkpoint" in your checkpoint folder (the folder contains .index, .meta .data-xxxx). The content in "checkpoint" file can be the following: model_checkpoint_path:"path_to_kitti_learned_intrinsics/model-248900"

I have downloaded the checkpoints provided by the author, extracted them, put them in the folder, and also added a "checkpoint" file as you said. In run.sh, I wrote the following:

    --imagenet_ckpt=/root/depth_from_video_in_the_wild/MY_IMAGENET_CHECKPOINT/model-248900

But it still fails:
Restoring from checkpoint failed. This is most likely due to a Variable name or other graph key that is missing from the checkpoint. Please ensure that you have not altered the graph expected based on the checkpoint. Original error:
2 root error(s) found.
(0) Not found: Key conv1/bn/beta not found in checkpoint
[[node save_1/RestoreV2 (defined at /miniconda3/envs/huawei/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]
[[save_1/RestoreV2/_1067]]
(1) Not found: Key conv1/bn/beta not found in checkpoint
[[node save_1/RestoreV2 (defined at /miniconda3/envs/huawei/lib/python3.7/site-packages/tensorflow_core/python/framework/ops.py:1748) ]]

Is my path wrong? Can you help me? Thank you.


VladimirYugay avatar VladimirYugay commented on May 6, 2024

@Hzy98 Check this; it's a refactored version of this paper's code plus fixes.


Hzy98 avatar Hzy98 commented on May 6, 2024

@Hzy98 Check this; it's a refactored version of this paper's code plus fixes.

Thank you. I'll try this.

