idml's Introduction

Hi there 👋

I'm Wenzhao Zheng, a postdoctoral fellow at BAIR, UC Berkeley, working with Prof. Kurt Keutzer. I received my Ph.D. and B.S. from Tsinghua University, supervised by Jie Zhou and Jiwen Lu.

Previous Efforts

We built the first academic surround-camera 3D occupancy prediction model, TPVFormer🎉.

Current Interests

🦙Large Models + 🚙Autonomous Driving -> 🤖AGI

  • 🦙 Large Models: Efficient/Small LLMs, Multimodal Models, Video Generation Models, Large Action Models...
  • 🚙 Autonomous Driving: 3D Occupancy Prediction, End-to-End Driving, World Models, 3D Scene Reconstruction...

Collaborations

If you want to work with me (in person or remotely) at 🐻UC Berkeley (co-supervised by Prof. Kurt Keutzer), 💜Tsinghua University (co-supervised by Prof. Jiwen Lu), and/or 🔴Peking University (co-supervised by Prof. Shanghang Zhang), feel free to drop me an email at [email protected]. I can provide GPU resources if we are a good fit.

idml's People

Contributors

wangck20, wzzheng


idml's Issues

How does the model learn uncertainty?

Hi Chengkun, your work is very novel. I have a question: how does the model learn uncertainty when no ground-truth uncertainty labels are provided?

Image augmentation

Hi, thank you for your interesting research.

I have some questions regarding the data augmentation:

  1. In the paper, you state: "The training images were first resized to 256 × 256 and then augmented with random cropping to 224×224 as well as random horizontal flipping with the probability of 50%." But in the code, I see that you use the default dataset augmentation from the ProxyAnchor code, which does not resize images to 256 × 256 when is_train=True. Can you clarify this?
  2. The paper does not mention the test-time augmentation. Is it the same as the one in the ProxyAnchor code?
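
For reference, the training pipeline quoted from the paper (resize to 256 × 256, random crop to 224 × 224, horizontal flip with probability 0.5) can be sketched roughly as below. This is a standard-library illustration on a nested-list "image", not the repository's or ProxyAnchor's code; the function names and the nearest-neighbour resize are assumptions made only to keep the example self-contained.

```python
import random

def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize of a 2D list-of-lists image (assumed method)."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def random_crop(img, size, rng):
    """Cut a size x size window at a uniformly random position."""
    h, w = len(img), len(img[0])
    top = rng.randint(0, h - size)    # randint is inclusive on both ends
    left = rng.randint(0, w - size)
    return [row[left:left + size] for row in img[top:top + size]]

def random_hflip(img, rng, p=0.5):
    """Mirror the image left-to-right with probability p."""
    if rng.random() < p:
        return [row[::-1] for row in img]
    return img

def train_augment(img, rng, resize_to=256, crop_to=224):
    """Hypothetical training transform following the order quoted from the paper."""
    img = resize_nearest(img, resize_to, resize_to)
    img = random_crop(img, crop_to, rng)
    return random_hflip(img, rng)
```

Whether the released code actually applies the resize step during training is exactly what this issue asks, so the sketch only mirrors the paper's description.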

May I ask what's the difference between your work and Proxy Synthesis?

Hi,
I think your work is promising and interesting. May I kindly ask what the difference is between your work and Proxy Synthesis [1]? It seems both works try to search for a linear interpretation between different classes.

[1] Gu et al. "Proxy synthesis: Learning with synthetic classes for deep metric learning." AAAI 2021
github.com/navervision/proxy-synthesis

Thank you so much.

An error occurred while running

When running image_retrieval/code/train.py, an error is raised at the opt.step() line:
local variable 'beta1' referenced before assignment
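
This message is Python's UnboundLocalError (newer Python versions word it differently). One common way it arises is when a variable is bound only inside a loop or branch that never executes, for example when no parameter in a group has a gradient. The toy functions below illustrate that pattern and one way to avoid it; they are hypothetical and are not taken from train.py or from any optimizer library, so treat this as a possible cause rather than a diagnosis.

```python
def buggy_step(group):
    """Toy update: beta1 is bound only if at least one parameter has a grad."""
    grads = []
    for p in group["params"]:
        if p["grad"] is None:
            continue
        grads.append(p["grad"])
        beta1, beta2 = group["betas"]  # never reached when every grad is None
    weight = 1 - beta1  # raises UnboundLocalError if the loop body was skipped
    return [weight * g for g in grads]

def fixed_step(group):
    """Same toy update, but the hyperparameters are read before the loop."""
    beta1, beta2 = group["betas"]
    grads = [p["grad"] for p in group["params"] if p["grad"] is not None]
    weight = 1 - beta1
    return [weight * g for g in grads]
```

If this pattern is the cause, checking that the parameters passed to the optimizer actually receive gradients before opt.step() would be a reasonable first step.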

Question about tau and gamma.

Thank you for your interesting research.

I would be very happy if you can answer some of my questions:

  1. You have this paragraph in your paper:

"In addition, we fixed τ = 5 and set γ to 0, 1, 2, 3, 4 for training. The experimental results vary on the two datasets.
Specifically, our framework achieves the best performance when γ = 0 on the CUB-200-2011 dataset while γ = 3 on the Cars196 dataset. This indicates that the metric is more discreet when comparing images on the Cars196 dataset."

But on the left of Figures 4(a) and 4(b), the highest Recall@1 corresponds to γ = 2 for both the CUB-200-2011 and Cars196 datasets. Why is this different?

  2. In your ProxyAnchor loss, you set γ = 4 and τ = 5. Is this the set of hyperparameters that you found gives the highest Recall@1?
