wzzheng / idml

[TPAMI 2023] Official implementation of Introspective Deep Metric Learning.

Python 100.00%

image-retrieval metric-learning

idml's Introduction

Hi there 👋

I'm Wenzhao Zheng, a postdoctoral fellow at BAIR, UC Berkeley, working with Prof. Kurt Keutzer. I received my Ph.D. and B.S. from Tsinghua University, supervised by Jie Zhou and Jiwen Lu.

Previous Efforts

We built the first academic surround-camera 3D occupancy prediction model, TPVFormer 🎉.

3D Scene Representation: SurroundDepth -> TPVFormer -> SurroundOcc -> SelfOcc -> GaussianFormer
End-to-End Autonomous Driving: BEVerse -> OccWorld -> GenAD -> S³Gaussian -> OccSora

Current Interests

🦙Large Models + 🚙Autonomous Driving -> 🤖AGI

🦙 Large Models: Efficient/Small LLMs, Multimodal Models, Video Generation Models, Large Action Models...
🚙 Autonomous Driving: 3D Occupancy Prediction, End-to-End Driving, World Models, 3D Scene Reconstruction...

Collaborations

If you want to work with me (in person or remotely) at 🐻UC Berkeley (co-supervised by Prof. Kurt Keutzer), 💜Tsinghua University (co-supervised by Prof. Jiwen Lu), and/or 🔴Peking University (co-supervised by Prof. Shanghang Zhang), feel free to drop me an email at [email protected]. I can provide GPU resources if we are a good fit.

idml's Issues

How does the model learn uncertainty?

Hi Chengkun, your work is very novel. I have a question: how does the model learn uncertainty, given that no ground-truth uncertainty labels are provided?

Image augmentation

Hi, thank you for your interesting research.

I have some questions regarding the data augmentation:

In the paper, you state: "The training images were first resized to 256 × 256 and then augmented with random cropping to 224×224 as well as random horizontal flipping with the probability of 50%." However, the code appears to use the default dataset augmentation from the ProxyAnchor code, which does not resize images to 256 × 256 when is_train=True. Can you clarify this?
The paper does not mention the test-time transform. Is the test data augmentation the same as the one in the ProxyAnchor code?
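
For reference, the training augmentation quoted from the paper can be sketched as below. This is a minimal illustration only, not the authors' code and not the ProxyAnchor pipeline: the function names `resize_nearest` and `train_augment` are hypothetical, and nearest-neighbour resizing with NumPy is assumed here just to keep the example self-contained (the interpolation mode used in the paper is not stated in this issue).

```python
import numpy as np


def resize_nearest(img, size):
    """Nearest-neighbour resize of an H x W x C array to size x size (assumed mode)."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # source row for each output row
    cols = np.arange(size) * w // size  # source column for each output column
    return img[rows][:, cols]


def train_augment(img, rng, resize=256, crop=224, flip_prob=0.5):
    """Resize, random-crop, then randomly flip, following the quoted description."""
    img = resize_nearest(img, resize)
    # Pick the top-left corner of the crop uniformly among all valid positions.
    top = rng.integers(0, resize - crop + 1)
    left = rng.integers(0, resize - crop + 1)
    img = img[top:top + crop, left:left + crop]
    # Flip along the width axis with probability flip_prob.
    if rng.random() < flip_prob:
        img = img[:, ::-1]
    return img


rng = np.random.default_rng(0)
image = np.zeros((300, 400, 3), dtype=np.uint8)  # dummy input image
augmented = train_augment(image, rng)
print(augmented.shape)  # (224, 224, 3)
```

Comparing the output of a sketch like this with the transform actually built when is_train=True is one way to check whether the resize step is present.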

Has this work been accepted by any conference, such as ECCV or CVPR?

May I ask what's the difference between your work and Proxy Synthesis?

Hi,
I think your work is promising and interesting. May I kindly ask what the difference is between your work and Proxy Synthesis [1]? It seems that both approaches look for a linear interpolation between different classes.

[1] Gu et al. "Proxy synthesis: Learning with synthetic classes for deep metric learning." AAAI 2021
github.com/navervision/proxy-synthesis

Thank you so much.

An error occurred while running

Running image_retrieval/code/train.py fails at the opt.step() line with the following error:
local variable 'beta1' referenced before assignment
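
The report does not include a traceback, so the cause is not confirmed. As a general illustration, Python raises this kind of error when a local variable is assigned only inside a loop or branch that never runs. The sketch below reproduces that pattern with a hypothetical `toy_step` function; it is not the optimizer's actual code, and the assumption that skipped parameters (for example, ones without gradients) are involved is only a guess.

```python
def toy_step(params, betas=(0.9, 0.999)):
    """Toy stand-in for an optimizer step; NOT the real library code."""
    for p in params:
        if p["grad"] is None:
            continue  # parameters without a gradient are skipped
        beta1, beta2 = betas  # bound only if some parameter has a gradient
    return beta1  # raises UnboundLocalError if every parameter was skipped


try:
    toy_step([{"grad": None}])
except UnboundLocalError as exc:
    message = str(exc)  # exact wording depends on the Python version
    print(message)
```

If the same pattern applies here, checking that the parameters passed to the optimizer actually receive gradients before opt.step() would be a reasonable first diagnostic.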

Question about tau and gamma.

Thank you for your interesting research.

I would be very happy if you can answer some of my questions:

You have this paragraph in your paper:

"In addition, we fixed τ = 5 and set γ to 0, 1, 2, 3, 4 for training. The experimental results vary on the two datasets.
Specifically, our framework achieves the best performance when γ = 0 on the CUB-200-2011 dataset while γ = 3 on the Cars196 dataset. This indicates that the metric is more discreet when comparing images on the Cars196 dataset."

However, on the left of Figures 4(a) and 4(b), the highest Recall@1 corresponds to gamma = 2 for both the CUB-200-2011 and Cars196 datasets. What explains this discrepancy?

In your ProxyAnchor loss, you set gamma = 4 and tau = 5. Is this the hyperparameter setting that you found gives the highest Recall@1?
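
For readers comparing hyperparameter settings, Recall@1 can be computed roughly as in the sketch below. This is an illustrative implementation, not the repository's evaluation code: the function name `recall_at_k` is hypothetical, and cosine similarity on L2-normalised embeddings is an assumption.

```python
import numpy as np


def recall_at_k(embeddings, labels, k=1):
    """Fraction of queries with a same-label sample among their k nearest neighbours."""
    emb = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # Assumption: cosine similarity, i.e. dot products of L2-normalised vectors.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T
    np.fill_diagonal(sim, -np.inf)  # a query must not retrieve itself
    topk = np.argsort(-sim, axis=1)[:, :k]  # indices of the k most similar samples
    hits = (labels[topk] == labels[:, None]).any(axis=1)
    return hits.mean()


emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
print(recall_at_k(emb, labels, k=1))  # 1.0
```

Evaluating each (gamma, tau) pair with the same function on the same test split makes the comparison between the text and the figures easier to reproduce.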
