

vladimirmujagic commented on May 23, 2024

It looks like it works well when I add your code. I changed my code by adding:

                r, rendered_frame, poses, bboxes = process_image_img2pose(frame, model, renderer, transform, threshold, threed_points)
                for i, pose in enumerate(poses):
                    bbox = bboxes[i]
                    pitch, yaw, roll, _, _, scale = pose
                    tdx = bbox[0] + ((bbox[2] - bbox[0]) / 2)
                    tdy = bbox[1] + ((bbox[3] - bbox[1]) / 2)
                    rendered_frame = draw_axis(np.asarray(rendered_frame), yaw, pitch, roll, tdx=tdx, tdy=tdy, size=1000 / scale)

from img2pose.

vitoralbiero commented on May 23, 2024

Yes, the order is pitch, yaw, roll, horizontal translation, vertical translation, and scale.
From your pose example, it looks like you are doing it right, but double-check that you pass the pose mean and standard deviation when creating the model (or set them afterwards).
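Why the mean and standard deviation matter can be sketched as follows, assuming the network predicts each of the six DoF values in standardized form, (pose - mean) / stddev, using the arrays loaded from the WIDER_train_pose_*.npy files. Under that assumption, the raw outputs only become usable angles and translations after the inverse mapping. `denormalize_pose` is a hypothetical helper for illustration, not part of img2pose:

```python
import numpy as np

def denormalize_pose(raw_dofs, pose_mean, pose_stddev):
    """Map standardized 6DoF outputs back to pose units.

    ASSUMPTION: the model outputs (pose - mean) / stddev per component,
    so the inverse is raw * stddev + mean. Works for one pose of shape (6,)
    or a batch of shape (N, 6) via broadcasting.
    """
    raw = np.asarray(raw_dofs, dtype=float)
    return raw * np.asarray(pose_stddev, dtype=float) + np.asarray(pose_mean, dtype=float)
```

If the mean and standard deviation were left out, the six values would stay in the normalized space under this assumption, which is why it is worth checking that they are passed to the model.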


vladimirmujagic commented on May 23, 2024

Thanks for the quick response.

This is how I load and prepare the model:

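import numpy as np
from torchvision import transforms
# img2pose repo modules (import paths assumed from the repo layout)
from img2pose import img2poseModel
from model_loader import load_model
from utils.renderer import Renderer

def load():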
    renderer = Renderer(
        vertices_path="/app/detectors/img2pose/pose_references/vertices_trans.npy",
        triangles_path="/app/detectors/img2pose/pose_references/triangles.npy"
    )

    threed_points = np.load('/app/detectors/img2pose/pose_references/reference_3d_5_points_trans.npy')

    transform = transforms.Compose([transforms.ToTensor()])

    DEPTH = 18
    MAX_SIZE = 1400
    MIN_SIZE = 600

    POSE_MEAN = "/app/detectors/img2pose/models/WIDER_train_pose_mean_v1.npy"
    POSE_STDDEV = "/app/detectors/img2pose/models/WIDER_train_pose_stddev_v1.npy"
    MODEL_PATH = "/app/detectors/img2pose/models/img2pose_v1.pth"

    pose_mean = np.load(POSE_MEAN)
    pose_stddev = np.load(POSE_STDDEV)

    img2pose_model = img2poseModel(
        DEPTH, MIN_SIZE, MAX_SIZE,
        pose_mean=pose_mean, pose_stddev=pose_stddev,
        threed_68_points=threed_points,
    )
    load_model(img2pose_model.fpn_model, MODEL_PATH, cpu_mode=str(img2pose_model.device) == "cpu", model_only=True)
    img2pose_model.evaluate()
    threshold = 0.9

    return renderer, img2pose_model, transform, threshold, threed_points

This is how I process the current frame:

import cv2
import numpy as np
from PIL import Image

def process_image_img2pose(frame, img2pose_model, renderer, transform, threshold, threed_points):
    frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    img = Image.fromarray(frame)

    (w, h) = img.size
    image_intrinsics = np.array([[w + h, 0, w // 2], [0, w + h, h // 2], [0, 0, 1]])

    res = img2pose_model.predict([transform(img)])[0]

    all_bboxes = res["boxes"].cpu().numpy().astype('float')

    poses = []
    bboxes = []
    for i in range(len(all_bboxes)):
        if res["scores"][i] > threshold:
            bbox = all_bboxes[i]
            pose_pred = res["dofs"].cpu().numpy()[i].astype('float')
            pose_pred = pose_pred.squeeze()

            poses.append(pose_pred)
            bboxes.append(bbox)

    aligned_faces = align_faces_lm(threed_points, img, poses)
    if not aligned_faces:
        aligned_faces = []

    return aligned_faces, render_plot(img.copy(), poses, bboxes, renderer), poses
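The filtering loop above calls `res["dofs"].cpu().numpy()` once per detection. A minimal sketch of the same score filter using a NumPy boolean mask, which converts each tensor only once; `filter_detections` is a hypothetical helper, and it assumes six DoF values per face:

```python
import numpy as np

def filter_detections(boxes, scores, dofs, threshold):
    """Keep detections whose score exceeds threshold, without a Python loop.

    boxes: (N, 4) array, scores: (N,) array, dofs: N poses of six values
    each (extra singleton dimensions are flattened away).
    """
    scores = np.asarray(scores, dtype=float)
    boxes = np.asarray(boxes, dtype=float).reshape(scores.shape[0], 4)
    dofs = np.asarray(dofs, dtype=float).reshape(scores.shape[0], 6)
    keep = scores > threshold  # boolean mask, one entry per detection
    return boxes[keep], dofs[keep]
```

It would be called as `filter_detections(res["boxes"].cpu().numpy(), res["scores"].cpu().numpy(), res["dofs"].cpu().numpy(), threshold)`, and the resulting rows play the role of `bboxes` and `poses` in the original loop.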

Video processing

            while capture.isOpened():
                detections = []

                ret, frame = capture.read()
                if not ret:
                    break

                r, rendered_frame, poses = process_image_img2pose(frame, model, renderer, transform, threshold, threed_points)
                for i, pose in enumerate(poses):
                    print(pose[0:3])
                    draw_axis(rendered_frame, pose[0:3], np.mean(r[i], axis=0))

                for lms in r:
                    for lm in lms:
                        point = (int(lm[0]), int(lm[1]))
                        cv2.circle(rendered_frame, point, 3, (255, 100, 100), 1, cv2.LINE_AA)

                result = {
                    'lms': [lms.tolist() for lms in r],
                    'pose': [p.tolist() for p in poses]
                }
                json_result[frame_id] = result

                sink.write(rendered_frame)
                frame_id += 1

            with open(json_out_path, 'w') as json_file:
                json.dump(json_result, json_file, indent=2)
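One detail when reading the output back: assuming `frame_id` is an integer counter (as `frame_id += 1` suggests), `json.dump` writes the keys of `json_result` as strings, so frame 0 comes back as `"0"`. A small sketch that restores integer keys after loading; `restore_frame_keys` is a hypothetical helper:

```python
import json

def restore_frame_keys(loaded):
    """Convert the string keys written by json.dump back to int frame ids."""
    return {int(frame_id): result for frame_id, result in loaded.items()}

# Round trip: integer keys come back as strings until restored.
roundtrip = json.loads(json.dumps({0: {"pose": []}, 1: {"pose": []}}))
print(list(roundtrip))                      # ['0', '1']
print(list(restore_frame_keys(roundtrip)))  # [0, 1]
```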

I am trying to process some videos quickly for a qualitative assessment on my data. Currently I can produce pretty good videos with the 5 points and the face mask, but I would like to have the pose axes as well.

I will try to use img2pose as the base detector for auto-annotation with multiple detectors, since it is quite robust in my experiments so far.


vitoralbiero commented on May 23, 2024

No problem!
Everything looks good in the snippets you sent, and I believe the draw-axis code will work as well.
Just one thing: if you care about the bbox at all, pass the 68-point 3D reference instead of the 5-point one in "threed_68_points=threed_points", because the bbox will then capture the face better.
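The advice implies that the box is derived from the 3D reference points projected into the image with the predicted pose, so the 5 points (typically eyes, nose tip and mouth corners) give a tighter box than the 68-point outline. A sketch of that idea under a plain pinhole model, with intrinsics built as in process_image_img2pose (focal length w + h, principal point at the image center). Treating the first three DoF values as an axis-angle rotation vector is an assumption, and all helper names are hypothetical; this is not img2pose's internal code:

```python
import numpy as np

def rotation_from_rotvec(rotvec):
    """Rodrigues' formula: axis-angle vector -> 3x3 rotation matrix."""
    rotvec = np.asarray(rotvec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta < 1e-12:
        return np.eye(3)
    k = rotvec / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)

def project_points(points_3d, rotvec, translation, intrinsics):
    """Project Nx3 model points into the image with a pinhole camera."""
    R = rotation_from_rotvec(rotvec)
    cam = np.asarray(points_3d, dtype=float) @ R.T + np.asarray(translation, dtype=float)
    uvw = cam @ np.asarray(intrinsics, dtype=float).T
    return uvw[:, :2] / uvw[:, 2:3]  # divide by depth

def bbox_from_points(points_2d):
    """Tight [x1, y1, x2, y2] box around the projected points."""
    pts = np.asarray(points_2d, dtype=float)
    return np.concatenate([pts.min(axis=0), pts.max(axis=0)])

# Usage: a 640x480 frame, intrinsics as in process_image_img2pose.
w, h = 640, 480
K_cam = np.array([[w + h, 0, w // 2], [0, w + h, h // 2], [0, 0, 1]])
pts = project_points([[0, 0, 0], [1, 1, 0]], [0, 0, 0], [0, 0, 10], K_cam)
print(bbox_from_points(pts))  # [320. 240. 432. 352.]
```

Feeding more (and more widely spread) reference points into `bbox_from_points` can only enlarge the box, which matches the maintainer's point about the 68-point reference.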

In early experiments, I used the following code to draw the axes:

import cv2
from math import cos, sin

def draw_axis(img, yaw, pitch, roll, tdx=None, tdy=None, size=50):
    yaw = -yaw

    # Use the image center as the axis origin unless both tdx and tdy are given
    if tdx is None or tdy is None:
        height, width = img.shape[:2]
        tdx = width / 2
        tdy = height / 2

    # X-Axis pointing to right drawn in red
    x1 = size * (cos(yaw) * cos(roll)) + tdx
    y1 = size * (cos(pitch) * sin(roll) + cos(roll) * sin(pitch) * sin(yaw)) + tdy

    # Y-Axis | drawn in green
    x2 = size * (-cos(yaw) * sin(roll)) + tdx
    y2 = size * (cos(pitch) * cos(roll) - sin(pitch) * sin(yaw) * sin(roll)) + tdy

    # Z-Axis (out of the screen) drawn in blue
    x3 = size * (sin(yaw)) + tdx
    y3 = size * (-cos(yaw) * sin(pitch)) + tdy

    cv2.line(img, (int(tdx), int(tdy)), (int(x1),int(y1)),(0,0,255),3)
    cv2.line(img, (int(tdx), int(tdy)), (int(x2),int(y2)),(0,255,0),3)
    cv2.line(img, (int(tdx), int(tdy)), (int(x3),int(y3)),(255,0,0),2)

    return img

Calling it like this:

pitch, yaw, roll, _, _, scale = pose
tdx = bbox[0] + ((bbox[2] - bbox[0]) / 2)
tdy = bbox[1] + ((bbox[3] - bbox[1]) / 2)
res_img = draw_axis(np.asarray(img), yaw, pitch, roll, tdx=tdx, tdy=tdy, size=1000 / scale)
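If the axes should also be stored in the per-frame JSON (next to 'lms' and 'pose') or checked without a display, the same math can be separated from the OpenCV drawing. A sketch that reproduces the draw_axis formulas and the call above; `axis_endpoints` is a hypothetical helper, and it assumes the pose order (pitch, yaw, roll, tx, ty, scale) stated earlier in the thread and a nonzero scale:

```python
from math import cos, sin

def axis_endpoints(pose, bbox):
    """Return the bbox center and the X, Y, Z axis end points.

    Same formulas as draw_axis, without drawing. pose is
    (pitch, yaw, roll, tx, ty, scale); bbox is [x1, y1, x2, y2].
    """
    pitch, yaw, roll, _, _, scale = pose
    tdx = bbox[0] + (bbox[2] - bbox[0]) / 2
    tdy = bbox[1] + (bbox[3] - bbox[1]) / 2
    size = 1000 / scale  # axis length, as in the call above
    yaw = -yaw  # same sign flip as draw_axis
    x_axis = (size * (cos(yaw) * cos(roll)) + tdx,
              size * (cos(pitch) * sin(roll) + cos(roll) * sin(pitch) * sin(yaw)) + tdy)
    y_axis = (size * (-cos(yaw) * sin(roll)) + tdx,
              size * (cos(pitch) * cos(roll) - sin(pitch) * sin(yaw) * sin(roll)) + tdy)
    z_axis = (size * sin(yaw) + tdx,
              size * (-cos(yaw) * sin(pitch)) + tdy)
    return (tdx, tdy), x_axis, y_axis, z_axis
```

Each returned pair can be passed to `cv2.line` after converting to `int`, exactly as draw_axis does.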


vladimirmujagic commented on May 23, 2024

Thank you very much, I will try your code and post the results.


vladimirmujagic commented on May 23, 2024

Regarding the bounding box, I need the 5-point format and a conversion to WIDER FACE so I can retrain some models.


vitoralbiero commented on May 23, 2024

> Regarding the bounding box, I need 5pts format and conversion to widerface so i can retrain some models

Yes, you can still use the 5 points for that, but change this part so that the output bbox captures more of the face:

threed_68_points = np.load('/app/detectors/img2pose/pose_references/reference_3d_68_points_trans.npy')

img2pose_model = img2poseModel(
    DEPTH, MIN_SIZE, MAX_SIZE,
    pose_mean=pose_mean, pose_stddev=pose_stddev,
    threed_68_points=threed_68_points,
)

Then you can keep passing the 5-point version to:

aligned_faces = align_faces_lm(threed_points, img, poses)
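For the WIDER FACE conversion mentioned above, a minimal sketch of turning one detection into a text label line. The box part follows WIDER FACE's x, y, width, height convention; appending the five landmark pairs afterwards is an ASSUMED layout, so adapt it to whatever loader your training code expects. The landmarks could be one entry of `r` from align_faces_lm, which the video loop treats as a list of (x, y) points. `to_widerface_line` is a hypothetical helper:

```python
def to_widerface_line(bbox, landmarks):
    """Format one face as 'x y w h' followed by five (x, y) landmark pairs.

    ASSUMED layout: adapt it to the loader of the model being retrained.
    bbox is [x1, y1, x2, y2]; landmarks is a sequence of five (x, y) points.
    """
    x1, y1, x2, y2 = bbox
    if len(landmarks) != 5:
        raise ValueError("expected 5 landmarks")
    values = [x1, y1, x2 - x1, y2 - y1]  # corner format -> x, y, width, height
    for x, y in landmarks:
        values.extend([x, y])
    return " ".join(f"{v:.1f}" for v in values)
```

For example, a box `[10, 20, 50, 80]` becomes `10.0 20.0 40.0 60.0` followed by the ten landmark coordinates.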

