Hi, I am trying to render (x,y,z) axis based on network output inste

Drawing axis based on yaw, pitch, roll about img2pose HOT 7 CLOSED

vitoralbiero commented on May 23, 2024


Comments (7)

vladimirmujagic commented on May 23, 2024

It works well once I add your code. I changed my code by adding:

                r, rendered_frame, poses, bboxes = process_image_img2pose(frame, model, renderer, transform, threshold, threed_points)
                for i, pose in enumerate(poses):
                    bbox = bboxes[i]
                    pitch, yaw, roll, _, _, scale = pose
                    tdx = bbox[0] + ((bbox[2] - bbox[0]) / 2)
                    tdy = bbox[1] + ((bbox[3] - bbox[1]) / 2)
                    rendered_frame = draw_axis(np.asarray(rendered_frame), yaw, pitch, roll, tdx=tdx, tdy=tdy, size=1000 / scale)


vitoralbiero commented on May 23, 2024

Yes, the order is pitch, yaw, roll, horizontal translation, vertical translation, and scale.
From your pose example it looks like you are doing it right, but double-check that you pass the pose mean and standard deviation when creating the model, or set them afterwards.
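The ordering described above can be made explicit with a small helper. This is only a sketch: `Pose6DoF` and `unpack_pose` are hypothetical names, not part of img2pose, and the field order is taken directly from the comment above.

```python
from typing import NamedTuple, Sequence


class Pose6DoF(NamedTuple):
    """Hypothetical container; field order follows the comment above."""
    pitch: float
    yaw: float
    roll: float
    tx: float      # horizontal translation
    ty: float      # vertical translation
    scale: float


def unpack_pose(pose: Sequence[float]) -> Pose6DoF:
    """Name the six values of one predicted pose vector."""
    if len(pose) != 6:
        raise ValueError(f"expected 6 pose values, got {len(pose)}")
    return Pose6DoF(*(float(v) for v in pose))
```

Naming the fields avoids mixing up yaw and pitch when the values are later passed to a drawing function that takes them in a different order.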


vladimirmujagic commented on May 23, 2024

Thanks for the quick response.

This is how I load and prepare the model:

def load():
    renderer = Renderer(
        vertices_path="/app/detectors/img2pose/pose_references/vertices_trans.npy",
        triangles_path="/app/detectors/img2pose/pose_references/triangles.npy"
    )

    threed_points = np.load('/app/detectors/img2pose/pose_references/reference_3d_5_points_trans.npy')

    transform = transforms.Compose([transforms.ToTensor()])

    DEPTH = 18
    MAX_SIZE = 1400
    MIN_SIZE = 600

    POSE_MEAN = "/app/detectors/img2pose/models/WIDER_train_pose_mean_v1.npy"
    POSE_STDDEV = "/app/detectors/img2pose/models/WIDER_train_pose_stddev_v1.npy"
    MODEL_PATH = "/app/detectors/img2pose/models/img2pose_v1.pth"

    pose_mean = np.load(POSE_MEAN)
    pose_stddev = np.load(POSE_STDDEV)

    img2pose_model = img2poseModel(
        DEPTH, MIN_SIZE, MAX_SIZE,
        pose_mean=pose_mean, pose_stddev=pose_stddev,
        threed_68_points=threed_points,
    )
    load_model(img2pose_model.fpn_model, MODEL_PATH, cpu_mode=str(img2pose_model.device) == "cpu", model_only=True)
    img2pose_model.evaluate()
    threshold = 0.9

    return renderer, img2pose_model, transform, threshold, threed_points

This is how I process the current frame:

def process_image_img2pose(frame, img2pose_model, renderer, transform, threshold, threed_points):
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img = Image.fromarray(frame)

    (w, h) = img.size
    image_intrinsics = np.array([[w + h, 0, w // 2], [0, w + h, h // 2], [0, 0, 1]])

    res = img2pose_model.predict([transform(img)])[0]

    all_bboxes = res["boxes"].cpu().numpy().astype('float')

    poses = []
    bboxes = []
    for i in range(len(all_bboxes)):
        if res["scores"][i] > threshold:
            bbox = all_bboxes[i]
            pose_pred = res["dofs"].cpu().numpy()[i].astype('float')
            pose_pred = pose_pred.squeeze()

            poses.append(pose_pred)
            bboxes.append(bbox)

    aligned_faces = align_faces_lm(threed_points, img, poses)
    if not aligned_faces:
        aligned_faces = []

    return aligned_faces, render_plot(img.copy(), poses, bboxes, renderer), poses
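The score filtering in the loop above can also be written with a boolean mask, which keeps boxes and poses aligned and converts the arrays only once. This is a sketch under assumptions: `filter_detections` is a hypothetical helper, and it expects inputs already converted to NumPy (for example via `.cpu().numpy()`), with each pose holding six values.

```python
import numpy as np


def filter_detections(boxes, scores, dofs, threshold=0.9):
    """Keep only detections whose score exceeds the threshold.

    Hypothetical helper: inputs are assumed to be array-likes already moved
    off the GPU; dofs may be shaped (N, 6) or (N, 1, 6).
    """
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float)
    # Flatten any singleton dimension so every pose is a row of 6 values.
    dofs = np.asarray(dofs, dtype=float).reshape(-1, 6)

    keep = scores > threshold
    return boxes[keep], dofs[keep]
```

Returning the kept boxes together with the poses makes it straightforward to compute a per-face drawing origin afterwards.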

Video processing

            while capture.isOpened():
                detections = []

                ret, frame = capture.read()
                if not ret:
                    break

                r, rendered_frame, poses = process_image_img2pose(frame, model, renderer, transform, threshold, threed_points)
                for i, pose in enumerate(poses):
                    print(pose[0:3])
                    draw_axis(rendered_frame, pose[0:3], np.mean(r[i], axis=0))

                for lms in r:
                    for lm in lms:
                        point = (int(lm[0]), int(lm[1]))
                        cv2.circle(rendered_frame, point, 3, (255, 100, 100), 1, cv2.LINE_AA)

                result = {
                    'lms': [lms.tolist() for lms in r],
                    'pose': [p.tolist() for p in poses]
                }
                json_result[frame_id] = result

                sink.write(rendered_frame)
                frame_id += 1

            with open(json_out_path, 'w') as json_file:
                json.dump(json_result, json_file, indent=2)

I am trying to process some videos quickly for qualitative assessment on my data. Currently I can produce pretty good videos containing the 5 points and the face mask, but I would like to have the axes as well.

I will try to use img2pose as the base detector for auto-annotation with multiple detectors, since it's quite robust according to my current experiments.


vitoralbiero commented on May 23, 2024

No problem!
Everything looks good in the snippets you sent. And I believe the draw axis code will work as well.
Just one thing: if you care about the bbox at all, pass the 68-point 3D reference instead of the 5-point one in "threed_68_points=threed_points,", as the bbox will then capture the face better.

In early experiments, I used the following code to draw the axes:

from math import cos, sin

import cv2


def draw_axis(img, yaw, pitch, roll, tdx=None, tdy=None, size=50):
    # math.cos / math.sin expect the angles in radians.
    yaw = -yaw

    # Default the axis origin to the image center when none is given.
    if tdx is None or tdy is None:
        height, width = img.shape[:2]
        tdx = width / 2
        tdy = height / 2

    # X-Axis pointing to right drawn in red
    x1 = size * (cos(yaw) * cos(roll)) + tdx
    y1 = size * (cos(pitch) * sin(roll) + cos(roll) * sin(pitch) * sin(yaw)) + tdy

    # Y-Axis | drawn in green
    x2 = size * (-cos(yaw) * sin(roll)) + tdx
    y2 = size * (cos(pitch) * cos(roll) - sin(pitch) * sin(yaw) * sin(roll)) + tdy

    # Z-Axis (out of the screen) drawn in blue
    x3 = size * (sin(yaw)) + tdx
    y3 = size * (-cos(yaw) * sin(pitch)) + tdy

    # Color tuples follow OpenCV's BGR order; on an RGB image red and blue swap.
    origin = (int(tdx), int(tdy))
    cv2.line(img, origin, (int(x1), int(y1)), (0, 0, 255), 3)
    cv2.line(img, origin, (int(x2), int(y2)), (0, 255, 0), 3)
    cv2.line(img, origin, (int(x3), int(y3)), (255, 0, 0), 2)

    return img

Called like this:

pitch, yaw, roll, _, _, scale = pose
tdx = bbox[0] + ((bbox[2] - bbox[0]) / 2)
tdy = bbox[1] + ((bbox[3] - bbox[1]) / 2)
res_img = draw_axis(np.asarray(img), yaw, pitch, roll, tdx=tdx, tdy=tdy, size=1000 / scale)
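To sanity-check the projection without OpenCV, the endpoint math and the bbox-center computation can be pulled into pure functions. This is a sketch only: `bbox_center` and `axis_endpoints` are hypothetical names that reuse the formulas from the snippet above, and the angles are assumed to be in radians because they go straight into `math.cos` and `math.sin`.

```python
from math import cos, sin


def bbox_center(bbox):
    """Center of an (x_min, y_min, x_max, y_max) box, used as the axis origin."""
    x_min, y_min, x_max, y_max = bbox[:4]
    return x_min + (x_max - x_min) / 2, y_min + (y_max - y_min) / 2


def axis_endpoints(yaw, pitch, roll, tdx, tdy, size):
    """2D endpoints of the X, Y and Z axes, using the same formulas as above."""
    yaw = -yaw
    x_axis = (
        size * (cos(yaw) * cos(roll)) + tdx,
        size * (cos(pitch) * sin(roll) + cos(roll) * sin(pitch) * sin(yaw)) + tdy,
    )
    y_axis = (
        size * (-cos(yaw) * sin(roll)) + tdx,
        size * (cos(pitch) * cos(roll) - sin(pitch) * sin(yaw) * sin(roll)) + tdy,
    )
    z_axis = (
        size * sin(yaw) + tdx,
        size * (-cos(yaw) * sin(pitch)) + tdy,
    )
    return x_axis, y_axis, z_axis
```

With all three angles at zero, the X axis points right, the Y axis points down (image coordinates), and the Z axis collapses onto the origin, which is a quick way to confirm the sign conventions before drawing on real frames.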


vladimirmujagic commented on May 23, 2024

Thank you very much, I will try your code and post the results.


vladimirmujagic commented on May 23, 2024

Regarding the bounding box, I need the 5-point format and conversion to WIDER FACE so I can retrain some models.


vitoralbiero commented on May 23, 2024

> Regarding the bounding box, I need the 5-point format and conversion to WIDER FACE so I can retrain some models.

Yes, you can still use the 5 points for that, but change this part so that the output bbox captures more of the face:

threed_68_points = np.load('/app/detectors/img2pose/pose_references/reference_3d_68_points_trans.npy')

img2pose_model = img2poseModel(
        DEPTH, MIN_SIZE, MAX_SIZE,
        pose_mean=pose_mean, pose_stddev=pose_stddev,
        threed_68_points=threed_68_points,
    )

Then, you can continue to give the 5 pts version to

aligned_faces = align_faces_lm(threed_points, img, poses)
