
Comments (39)

fbas-est avatar fbas-est commented on July 3, 2024

Here is the code for visualizing:
visualize.txt
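
Roughly, the visualization does the usual thing: take the corners of the model's 3D bounding box, transform them with the estimated rotation and translation, project them with the camera intrinsics, and draw the edges with OpenCV. A minimal sketch of that idea (see the attached file for the actual code):

import numpy as np
import cv2

def draw_3d_bbox(img, model_points, R, t, fx, fy, cx, cy):
    # corners of the model's axis-aligned 3D bounding box (model frame)
    mins, maxs = model_points.min(axis=0), model_points.max(axis=0)
    corners = np.array([[x, y, z] for x in (mins[0], maxs[0])
                                  for y in (mins[1], maxs[1])
                                  for z in (mins[2], maxs[2])])
    cam_pts = corners @ R.T + t                                  # model frame -> camera frame
    u = (cam_pts[:, 0] * fx / cam_pts[:, 2] + cx).astype(int)    # pinhole projection
    v = (cam_pts[:, 1] * fy / cam_pts[:, 2] + cy).astype(int)
    # the 12 edges of the box, as pairs of corner indices
    edges = [(0, 1), (0, 2), (0, 4), (1, 3), (1, 5), (2, 3),
             (2, 6), (3, 7), (4, 5), (4, 6), (5, 7), (6, 7)]
    for a, b in edges:
        cv2.line(img, (u[a], v[a]), (u[b], v[b]), (0, 255, 0), 2)   # green box
    return img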

@Xushuangyin
Did you produce the 10,000 pictures from one video or from different videos? In my case I used different videos due to RAM limitations.
The problem was that every video produced point clouds with different rotation and translation matrices, so the model could not use the same mesh for the whole combined dataset.


jc0725 avatar jc0725 commented on July 3, 2024

@fbas-est
Hello. This is unrelated to your question, but I am also trying to use DenseFusion on my own dataset.
May I ask what your environment settings are (CUDA version, etc.), and what steps you took to get it working with your own dataset?
Thank you in advance.


Xushuangyin avatar Xushuangyin commented on July 3, 2024

Hello, I'm also making my own dataset for training, using a RealSense camera to estimate object poses, and I've run into some problems. Would it be convenient to exchange contact information? My WeChat is 18845107925.


fbas-est avatar fbas-est commented on July 3, 2024

@jc0725
Hello, I use CUDA 10.1 and PyTorch 1.6.
To build my dataset I used ObjectDatasetTools; the source code is on GitHub: https://github.com/F2Wang/ObjectDatasetTools
To make it work I converted the dataset into the same format as DenseFusion's LineMOD dataset.
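
For reference, the layout I converted to roughly mirrors DenseFusion's Linemod_preprocessed dataset (double-check the exact names against the repo's dataset code):

data/01/rgb/0000.png ...      color images
data/01/depth/0000.png ...    depth images
data/01/mask/0000.png ...     segmentation masks
data/01/gt.yml                per-frame rotation, translation and 2D bounding box
data/01/info.yml              camera intrinsics and depth scale
data/01/train.txt, test.txt   lists of frame indices
models/obj_01.ply             object mesh (vertices in millimetres in the original LineMOD)
models/models_info.yml        model extents / diameters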


jc0725 avatar jc0725 commented on July 3, 2024

@fbas-est
Thank you for your response.
May I ask how you trained the SegNet for LINEMOD? Did you change the "--dataset_root" directory to LINEMOD instead of YCB in ./vanilla_segmentation/train.py ?

Also, after training, what script did you run to get the 6DoF results?

I apologize if my questions are quite elementary.


fbas-est avatar fbas-est commented on July 3, 2024

@jc0725
Yes. I also changed dataset.py a bit so it works with my dataset.
For the 6DoF results I ran a slightly different version of eval_linemod.py, with some functions added for visualizing the 3D bounding box.
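
As a rough sketch, the two commands look like this (paths are placeholders and the exact flag names should be checked against the scripts in your copy of the repo):

python3 ./vanilla_segmentation/train.py --dataset_root ./Linemod_preprocessed
python3 ./tools/eval_linemod.py --dataset_root ./Linemod_preprocessed --model <pose_model.pth> --refine_model <pose_refine_model.pth>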


jc0725 avatar jc0725 commented on July 3, 2024

@fbas-est
Would it be possible for you to upload your working code to your repository so that I can clone it?


Xushuangyin avatar Xushuangyin commented on July 3, 2024

Thank you very much for your reply. I also used ObjectDatasetTools to make my own dataset. I made 10,000 pictures of a single object, but after training for 20 epochs the estimated pose changes a lot when I run the model on the object. May I ask how many epochs you trained for, and how you got the green bounding box in your video? Thank you. @fbas-est


Xushuangyin avatar Xushuangyin commented on July 3, 2024
(video attachment: 3d844a26c702f624fea6619a37124476.mp4)


Xushuangyin avatar Xushuangyin commented on July 3, 2024

I made the 10,000 pictures from different videos; if there are too many pictures, the program reports an error. I made my own object mesh. How can I solve the problem you described? @fbas-est


Xushuangyin avatar Xushuangyin commented on July 3, 2024

Thank you very much for your code! @fbas-est


jc0725 avatar jc0725 commented on July 3, 2024

@fbas-est
Thank you very much. I will let you know if I am able to make any improvements or if I come up with any suggestions for improved accuracy on your project.


fbas-est avatar fbas-est commented on July 3, 2024

@Xushuangyin
I suggest starting by finding a way to render the point cloud onto the labeled dataset's color images (drawing the 3D bounding box alone won't work as a check). If the target point cloud (the point cloud used as the label) is not accurate, then the network won't work.
If that turns out to be the problem, then for every video you collected you need to change the transforms in transforms.npy so that they all use one mesh as the reference, and then label the data with that mesh.
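
Something like this is enough for the check (a minimal sketch, assuming a pinhole camera and a labelled point cloud already expressed in the camera frame):

import numpy as np

def overlay_cloud(color_img, cloud, fx, fy, cx, cy):
    # color_img: H x W x 3 numpy array; cloud: N x 3 points in the camera frame,
    # in whatever units the intrinsics expect
    z = cloud[:, 2]
    u = (cloud[:, 0] * fx / z + cx).astype(int)
    v = (cloud[:, 1] * fy / z + cy).astype(int)
    h, w = color_img.shape[:2]
    keep = (z > 0) & (u >= 0) & (u < w) & (v >= 0) & (v < h)
    color_img[v[keep], u[keep]] = (0, 0, 255)   # mark the projected points in red (BGR)
    return color_img

If the projected points do not land on the object in the labelled color images, the poses in transforms.npy (or the mesh itself) are off and the network cannot learn from those labels.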


Xushuangyin avatar Xushuangyin commented on July 3, 2024


orangeRobot990 avatar orangeRobot990 commented on July 3, 2024

Do you resize the images during inference?
I get strange convolution errors:

RuntimeError: Calculated padded input size per channel: (6 x 320). Kernel size: (7 x 7). Kernel size can't be greater than actual input size

RuntimeError: Calculated padded input size per channel: (6 x 287). Kernel size: (7 x 7). Kernel size can't be greater than actual input size

It's different each time, so I guess it's the image or mask size? Where should I resize?
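
A quick way to check whether it is the crop: the error says the patch reaching the 7x7 convolution is only 6 pixels tall, so the bounding box computed from the mask is probably degenerate for some frames. Something like this in the dataset code (using the rmin/rmax/cmin/cmax it already computes) would confirm it:

crop_h, crop_w = rmax - rmin, cmax - cmin
if crop_h < 7 or crop_w < 7:
    # degenerate crop: skip the frame or fix its mask instead of feeding it to the network
    print('bad crop at index', index, (crop_h, crop_w))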

@Xushuangyin @fbas-est
thank you


Xushuangyin avatar Xushuangyin commented on July 3, 2024


an99990 avatar an99990 commented on July 3, 2024

Hi @Xushuangyin, thank you for responding. I actually found the source of the problem: I was transposing the array incorrectly.


an99990 avatar an99990 commented on July 3, 2024

Right now @Xushuangyin I am having issues with NaN values during training after removing the /1000, since my depth and other measurements are in meters.

I also reduced the learning rate, but I still get NaNs.


an99990 avatar an99990 commented on July 3, 2024

@Xushuangyin so now I just get huge values. I confirmed that my meshes are in meters, so I removed the /1000.


Full code here


import torch.utils.data as data
from PIL import Image
import os
import os.path
import errno
import torch
import json
import codecs
import numpy as np
import sys
import torchvision.transforms as transforms
import argparse
import time
import random
import numpy.ma as ma
import copy
import scipy.misc
import scipy.io as scio
import yaml
import cv2


class PoseDataset(data.Dataset):
    def __init__(self, mode, num, add_noise, root, noise_trans, refine):
        self.objlist = [0, 1]
        self.mode = mode

        self.list_rgb = []
        self.list_depth = []
        self.list_label = []
        self.list_obj = []
        self.list_rank = []
        self.meta = {}
        self.pt = {}
        self.root = root
        self.noise_trans = noise_trans
        self.refine = refine
        min = 1000


        item_count = 0
        for item in self.objlist:
            if self.mode == 'train':
                input_file = open('{0}/data/{1}/train.txt'.format(self.root, '%d' % item))
            else:
                input_file = open('{0}/data/{1}/test.txt'.format(self.root, '%d' % item))
            while 1:
                item_count += 1
                input_line = input_file.readline()
                if self.mode == 'test' and item_count % 10 != 0:
                    continue
                if not input_line:
                    break
                if input_line[-1:] == '\n':
                    input_line = input_line[:-1]
                self.list_rgb.append('{0}/data/{1}/rgb/{2}.jpg'.format(self.root, '%d' % item, input_line))
                self.list_depth.append('{0}/data/{1}/depth/{2}.png'.format(self.root, '%d' % item, input_line))
                if self.mode == 'eval':
                    self.list_label.append('{0}/segnet_results/{1}_label/{2}_label.png'.format(self.root, '%d' % item, input_line))
                else:
                    self.list_label.append('{0}/data/{1}/mask/{2}.png'.format(self.root, '%d' % item, input_line))
                
                self.list_obj.append(item)
                self.list_rank.append(int(input_line))

            meta_file = open('{0}/data/{1}/gt.yml'.format(self.root, '%d' % item), 'r')
            self.meta[item] = yaml.safe_load(meta_file)
            self.pt[item] = npy_vtx('{0}/models/{1}.npy'.format(self.root, '%d' % item))

            if len(self.pt[item]) < min:
                min = len(self.pt[item])
            
            print("Object {0} buffer loaded".format(item))

        self.length = len(self.list_rgb)
        self.num_pt_mesh_small = min
        
        # retrieved from /usr/local/zed/settings according to 
        # https://support.stereolabs.com/hc/en-us/articles/360007497173-What-is-the-calibration-file-
        self.cam_cx = 1080.47
        self.cam_cy = 613.322
        self.cam_fx = 1057.8
        self.cam_fy = 1056.61


        self.num = num
        self.add_noise = add_noise
        self.trancolor = transforms.ColorJitter(0.2, 0.2, 0.2, 0.05)
        self.norm = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
        self.border_list = [-1, 40, 80, 120, 160, 200, 240, 280, 320, 360, 400, 440, 480, 520, 560, 600, 640, 680]
        self.num_pt_mesh_large = 500
        # self.num_pt_mesh_small = 100
        self.symmetry_obj_idx = []

    def __getitem__(self, index):
        img = Image.open(self.list_rgb[index])
        ori_img = np.array(img)
        depth = np.array(Image.open(self.list_depth[index]))
        label = np.array(Image.open(self.list_label[index]))


        self.height, self.width, _ = np.shape(img)

        # xmap holds the row index of every pixel, ymap the column index
        self.xmap = np.array([[j for i in range(self.width)] for j in range(self.height)])
        self.ymap = np.array([[i for i in range(self.width)] for j in range(self.height)])

        # # removing alpha channel
        if np.shape(label)[-1] == 4 :
            label = label[:,:,:-1] 

        obj = self.list_obj[index]
        rank = self.list_rank[index]        

        if obj == 2:
            for i in range(0, len(self.meta[obj][rank])):
                if self.meta[obj][rank][i]['obj_id'] == 2:
                    meta = self.meta[obj][rank][i]
                    break
        else:
            meta = self.meta[obj][rank][0]
        #return array of bools
        mask_depth = ma.getmaskarray(ma.masked_not_equal(depth, 0))
        if self.mode == 'eval':
            mask_label = ma.getmaskarray(ma.masked_equal(label, np.array(255)))
        else:
            mask_label = ma.getmaskarray(ma.masked_equal(label, np.array([255, 255, 255])))[:, :, 0]
        
        mask = mask_label * mask_depth

        if self.add_noise:
            img = self.trancolor(img)

        # remove alpha channel
        img = np.array(img)[:, :, :3]
        img = np.transpose(img, (2, 0, 1))
        img_masked = img

        if self.mode == 'eval':
            rmin, rmax, cmin, cmax = get_bbox(mask_to_bbox(mask_label))
        else: #obj_bb: [minX, minY, widhtOfBbx, heigthOfBbx]
            rmin, rmax, cmin, cmax = get_bbox(meta['obj_bb'])

        img_masked = img_masked[:, rmin:rmax, cmin:cmax]
        # p_img = np.transpose(img_masked, (1, 2, 0))
        # cv2.imwrite('{0}_input.png'.format(index), p_img)

        choose = mask[rmin:rmax, cmin:cmax].flatten().nonzero()[0]
        if len(choose) == 0:
            cc = torch.LongTensor([0])
            return(cc, cc, cc, cc, cc, cc)

        if len(choose) > self.num:
            c_mask = np.zeros(len(choose), dtype=int)
            c_mask[:self.num] = 1
            np.random.shuffle(c_mask)
            choose = choose[c_mask.nonzero()]
        else:
            choose = np.pad(choose, (0, self.num - len(choose)), 'wrap')
        
        depth_masked = depth[rmin:rmax, cmin:cmax].flatten()[choose][:, np.newaxis].astype(np.float32)
        xmap_masked = self.xmap[rmin:rmax, cmin:cmax].flatten()[choose][:, np.newaxis].astype(np.float32)
        ymap_masked = self.ymap[rmin:rmax, cmin:cmax].flatten()[choose][:, np.newaxis].astype(np.float32)
        choose = np.array([choose])

        cam_scale = 1.0
        pt2 = depth_masked / cam_scale
        pt0 = (ymap_masked - self.cam_cx) * pt2 / self.cam_fx
        pt1 = (xmap_masked - self.cam_cy) * pt2 / self.cam_fy
        cloud = np.concatenate((pt0, pt1, pt2), axis=1)
        # cloud = cloud / 1000.0
        cloud = cloud 

        #fw = open('evaluation_result/{0}_cld.xyz'.format(index), 'w')
        #for it in cloud:
        #    fw.write('{0} {1} {2}\n'.format(it[0], it[1], it[2]))
        #fw.close()

        # model_points = self.pt[obj] / 1000.0
        model_points = self.pt[obj]
        dellist = [j for j in range(0, len(model_points))]
        dellist = random.sample(dellist, len(model_points) - self.num_pt_mesh_small)
        model_points = np.delete(model_points, dellist, axis=0)

        target_r = np.resize(np.array(meta['cam_R_m2c']), (3, 3))
        target_t = np.array(meta['cam_t_m2c'])
        add_t = np.array([random.uniform(-self.noise_trans, self.noise_trans) for i in range(3)])

        if self.add_noise:
            cloud = np.add(cloud, add_t)

        #fw = open('evaluation_result/{0}_model_points.xyz'.format(index), 'w')
        #for it in model_points:
        #    fw.write('{0} {1} {2}\n'.format(it[0], it[1], it[2]))
        #fw.close()

        target = np.dot(model_points, target_r.T)
        # if self.add_noise:
        #     target = np.add(target, target_t / 1000.0 + add_t)
        #     out_t = target_t / 1000.0 + add_t
        # else:
        #     target = np.add(target, target_t / 1000.0)
        #     out_t = target_t / 1000.0


        if self.add_noise:
            target = np.add(target, target_t + add_t)
            out_t = target_t + add_t
        else:
            target = np.add(target, target_t)
            out_t = target_t 
        #fw = open('evaluation_result/{0}_tar.xyz'.format(index), 'w')
        #for it in target:
        #    fw.write('{0} {1} {2}\n'.format(it[0], it[1], it[2]))
        #fw.close()

        # np.shape(cloud) (500, 3)
        # np.shape(choose) (1, 500)
        # np.shape(img_masked) (3, 120, 80)
        # np.shape(target) (24, 3)
        # np.shape(model_points) (24, 3)
  
        return torch.from_numpy(cloud.astype(np.float32)), \
               torch.LongTensor(choose.astype(np.int32)), \
               self.norm(torch.from_numpy(img_masked.astype(np.float32))), \
               torch.from_numpy(target.astype(np.float32)), \
               torch.from_numpy(model_points.astype(np.float32)), \
               torch.LongTensor([self.objlist.index(obj)])

    def __len__(self):
        return self.length

    def get_sym_list(self):
        return self.symmetry_obj_idx

    def get_num_points_mesh(self):
        if self.refine:
            return self.num_pt_mesh_large
        else:
            return self.num_pt_mesh_small

border_list = [-1, 40, 80, 120, 160, 200, 240, 280, 320, 360, 400, 440, 480, 520, 560, 600, 640, 680]

def mask_to_bbox(mask):
    mask = mask.astype(np.uint8)
    contours, _ = cv2.findContours(mask, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)


    x = 0
    y = 0
    w = 0
    h = 0
    for contour in contours:
        tmp_x, tmp_y, tmp_w, tmp_h = cv2.boundingRect(contour)
        if tmp_w * tmp_h > w * h:
            x = tmp_x
            y = tmp_y
            w = tmp_w
            h = tmp_h
    return [x, y, w, h]


def get_bbox(bbox):
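    # NOTE: the hard-coded limits below assume images of at most 540 rows x 960 columns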
    bbx = [bbox[1], bbox[1] + bbox[3], bbox[0], bbox[0] + bbox[2]]
    if bbx[0] < 0:
        bbx[0] = 0
    if bbx[1] >= 540:
        bbx[1] = 539
    if bbx[2] < 0:
        bbx[2] = 0
    if bbx[3] >= 960:
        bbx[3] = 959                
    rmin, rmax, cmin, cmax = bbx[0], bbx[1], bbx[2], bbx[3]
    r_b = rmax - rmin
    for tt in range(len(border_list)):
        if r_b > border_list[tt] and r_b < border_list[tt + 1]:
            r_b = border_list[tt + 1]
            break
    c_b = cmax - cmin
    for tt in range(len(border_list)):
        if c_b > border_list[tt] and c_b < border_list[tt + 1]:
            c_b = border_list[tt + 1]
            break
    center = [int((rmin + rmax) / 2), int((cmin + cmax) / 2)]
    rmin = center[0] - int(r_b / 2)
    rmax = center[0] + int(r_b / 2)
    cmin = center[1] - int(c_b / 2)
    cmax = center[1] + int(c_b / 2)
    if rmin < 0:
        delt = -rmin
        rmin = 0
        rmax += delt
    if cmin < 0:
        delt = -cmin
        cmin = 0
        cmax += delt
    if rmax > 540:
        delt = rmax - 540
        rmax = 540
        rmin -= delt
    if cmax > 960:
        delt = cmax - 960
        cmax = 960
        cmin -= delt
    return rmin, rmax, cmin, cmax


def ply_vtx(path):
    f = open(path)
    assert f.readline().strip() == "ply"
    f.readline()
    f.readline()
    N = int(f.readline().split()[-1])
    while f.readline().strip() != "end_header":
        continue
    pts = []
    for _ in range(N):
        pts.append(np.float32(f.readline().split()[:3]))
    return np.array(pts)

def npy_vtx(path):
    return np.load(path,allow_pickle=True)

Thank you for your help @Xushuangyin


orangeRobot990 avatar orangeRobot990 commented on July 3, 2024

Hey @fbas-est, I'm having issues with my training as well. Did you notice anything odd in your average distance when you removed the /1000? Did you remove it anywhere other than dataset.py?

Thank you @Xushuangyin and @an99990, I solved the array issue. Now I have problems with training and I am getting NaNs too, because my data is in meters.
Thanks for any help.


Xushuangyin avatar Xushuangyin commented on July 3, 2024

You should change these two lines of code like this:
cam_scale = 0.001
pt2 = depth_masked * cam_scale
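
The point is that the point cloud, the model points and the translation all have to end up in the same unit. A rough sketch of the two consistent setups, assuming the depth images are stored in millimetres (check your own recorder's depth scale):

# everything in metres (meshes already in metres, as in an99990's case)
cam_scale = 0.001
pt2 = depth_masked * cam_scale        # depth: mm -> m
model_points = self.pt[obj]           # no /1000; target_t from gt.yml must also be in metres

# original LineMOD convention: depth and meshes in millimetres, divided by 1000 at the end
cam_scale = 1.0
pt2 = depth_masked / cam_scale
cloud = cloud / 1000.0
model_points = self.pt[obj] / 1000.0  # target_t is divided by 1000 as well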


Xushuangyin avatar Xushuangyin commented on July 3, 2024

Because my cam_scale is 0.001, the code I modified looks like this:
@an99990 @orangeRobot990
(screenshot of the modified code attached)


an99990 avatar an99990 commented on July 3, 2024

Thank you so much @Xushuangyin, I was finally able to get results using cam_scale = 0.001 and without dividing by 1000 in __getitem__. I will start another training run with the correct values. Thank you so much!


jc0725 avatar jc0725 commented on July 3, 2024

Hello. May I ask how any of you were able to train SegNet on your custom dataset?
It seems like the provided code is for the YCB format and not the LineMOD format.

My guess is that I would have to run the SegNet train.py separately for each of the individual LineMOD objects.


Xushuangyin avatar Xushuangyin commented on July 3, 2024


jc0725 avatar jc0725 commented on July 3, 2024

Thank you for your response.
Do you mean that you didn't train SegNet?


Xushuangyin avatar Xushuangyin commented on July 3, 2024

I trained on 300 pictures of a single object using SegNet. @jc0725


jc0725 avatar jc0725 commented on July 3, 2024

@Xushuangyin
Thank you for clarifying!
Also, were you able to successfully visualize the bounding box using the visualize.py code provided by @fbas-est ?


fbas-est avatar fbas-est commented on July 3, 2024

@an99990
Hello, I saw that you are using a ZED camera, and from the intrinsics I assume you didn't train the model on 480p images.
Did you successfully train the model at a higher resolution?


an99990 avatar an99990 commented on July 3, 2024

@fbas-est I generated the images from Unity. The images are 560 x 940, if I remember correctly. My poses do not seem to be quite correct though. Here's an image from inference. I might create a dataset with images from the ZED camera. The camera in Unity didn't have the same intrinsics as the ZED, so that might be why my results aren't precise. I also never reached the refinement step during training.



fbas-est avatar fbas-est commented on July 3, 2024

@an99990 Yes, that is probably the issue. The ZED camera comes with four built-in calibrations, the smallest being for 672x376 images. If you train the network with synthetic data, I guess you have to replicate the images that your camera captures.

May I ask how you created the synthetic dataset?


an99990 avatar an99990 commented on July 3, 2024

I have a Unity project that creates datasets in LineMOD format. I can't share it though, since it belongs to the company. :/


jc0725 avatar jc0725 commented on July 3, 2024

May I ask how any of you were able to output and save the vanilla_segmentation label png files?


XLXIAOLONG avatar XLXIAOLONG commented on July 3, 2024

@an99990 Hello, I made a LineMOD dataset with ObjectDatasetTools. In eval_linemod.py its success rate is 0.9285, but when I visualize it, the points seem to be in the wrong place. Can you give me some advice? Thank you in advance!
(screenshot: 2022-05-17 21-55-54)


an99990 avatar an99990 commented on July 3, 2024

Have you played with cam_scale? I had to change it to 1000; try different values. It looks like it's bigger than your object.


XLXIAOLONG avatar XLXIAOLONG commented on July 3, 2024

Have you played with cam_scale? I had to change it to 1000; try different values. It looks like it's bigger than your object.

@an99990 Thanks for your reply. I made the dataset with a RealSense. I changed cam_scale to the camera's own value, like this:
cam_scale = 0.0002500000118743628
pt2 = depth_masked * cam_scale
pt0 = (ymap_masked - self.cam_cx) * pt2 / self.cam_fx
pt1 = (xmap_masked - self.cam_cy) * pt2 / self.cam_fy
cloud = np.concatenate((pt0, pt1, pt2), axis=1)
# cloud = cloud / 1000.0
# print(cloud.max())
cloud = cloud

0.0002500000118743628 is the depth scale of the RealSense camera.
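
For reference, this value can be read from the device with pyrealsense2 instead of hard-coding it (a minimal sketch):

import pyrealsense2 as rs

pipeline = rs.pipeline()
profile = pipeline.start(rs.config())
# metres per depth unit, e.g. 0.00025 or 0.001 depending on the device
depth_scale = profile.get_device().first_depth_sensor().get_depth_scale()
pipeline.stop()
print('depth scale:', depth_scale)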


Windson9 avatar Windson9 commented on July 3, 2024

Hi @Xushuangyin and @an99990. I hope you are doing well. I am trying to train this model on my custom dataset. Can you please share if you were able to successfully train the model? Can you share the results if possible? Thanks.


nanxiaoyixuan avatar nanxiaoyixuan commented on July 3, 2024

@jc0725 Hi, I also built my own LineMOD dataset, and when I debug, the line input_file = open('{0}/data/{1}/train.txt'.format(self.root, '%02d' % item)) raises the error "No such file or directory: 'datasets/linemod/Linemod_preprocessed/data/01/train.txt'", so I can't step through the rest of the code in the debugger.
However, training through the command bash ./experiments/scripts/train_linemod.sh works and this error does not appear. Have you run into this situation? Is there a solution?
Thank you very much for your reply.


nanxiaoyixuan avatar nanxiaoyixuan commented on July 3, 2024

@fbas-est Hi, I also built my own LineMOD dataset, and when I debug, the line input_file = open('{0}/data/{1}/train.txt'.format(self.root, '%02d' % item)) raises the error "No such file or directory: 'datasets/linemod/Linemod_preprocessed/data/01/train.txt'", so I can't step through the rest of the code in the debugger.
However, training through the command bash ./experiments/scripts/train_linemod.sh works and this error does not appear. Have you run into this situation? Is there a solution?
Thank you very much for your reply.

