Discrepancy between results reported in CQL and D4RL papers

farama-foundation commented on July 17, 2024

Discrepancy between results reported in CQL and D4RL papers

Comments (5)

aviralkumar2907 commented on July 17, 2024

The numbers in the NeurIPS version of the CQL paper (https://proceedings.neurips.cc/paper/2020/file/0d2b2061826a5df3221116a5085a6052-Paper.pdf) should be used as the reference; that is the version containing the table you mentioned. The original (older) version of the CQL paper matches the D4RL paper. We are in the process of fixing GitHub issues in D4RL and will report the updated numbers in the next update.

aviralkumar2907 commented on July 17, 2024

Hi, CQL reported numbers from the first arXiv version of the D4RL paper, which (for BEAR) have since improved in the newer version of D4RL. We will update the baseline numbers in CQL, so the results in D4RL should be used as the reference. I think the difference is mainly in the BEAR numbers, which changed when we moved to a better BEAR implementation.

yifan123 commented on July 17, 2024

Hi, I also cross-checked the CQL scores reported in D4RL (arXiv-v4).
1. The mismatches have not been fixed @IcarusWizard @justinjfu @aviralkumar2907
2. Between Table 2 and Table 3 of D4RL, there are also a few mismatches for the same env.
For BC, 923 / 3234 = 29, which matches Table 2.
For CQL, 2557 / 3234 = 79, not the 58 reported in Table 2.

Task	Source	SAC	BC	CQL
hopper-medium	D4RL (arXiv-v4) Table 2, normalized score	100	29.0	58
hopper-medium	D4RL (arXiv-v4) Table 3, un-normalized score	3234.3	923.5	2557.3
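The check above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic in this comment: it divides each raw return by the SAC return from Table 3. The function name `percent_of_reference` and the optional `baseline_score` argument are made up for illustration, and the default of 0 is an assumption; if the paper's normalization subtracts a random-policy score, pass that value instead.

```python
def percent_of_reference(raw_score, reference_score, baseline_score=0.0):
    """Express raw_score as a percentage of reference_score.

    baseline_score is a hypothetical knob (default 0, as in the comment's
    arithmetic); it is subtracted from both scores before dividing.
    """
    return 100.0 * (raw_score - baseline_score) / (reference_score - baseline_score)


# Un-normalized hopper-medium returns quoted from D4RL (arXiv-v4) Table 3.
SAC_RETURN = 3234.3
raw_returns = {"BC": 923.5, "CQL": 2557.3}

# Normalized hopper-medium scores quoted from D4RL (arXiv-v4) Table 2.
reported = {"BC": 29.0, "CQL": 58.0}

for name, raw in raw_returns.items():
    recomputed = percent_of_reference(raw, SAC_RETURN)
    print(f"{name}: recomputed {recomputed:.1f}, Table 2 reports {reported[name]:.1f}")
```

With these inputs the BC value lands close to the reported 29, while the CQL value comes out near 79 rather than 58, which is the mismatch described above.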

rasoolfa commented on July 17, 2024

Thanks for your response.
One more question: were the hyperparameters tuned per environment and data setting, or was a single set of hyperparameters used for all environments?

IcarusWizard commented on July 17, 2024

Hi, I just cross-checked the CQL scores reported in the D4RL (arXiv-v4) and CQL (NeurIPS) papers, and there are a few mismatches.

Task	D4RL (arXiv-v4)	CQL (NeurIPS)
walker2d-medium	79.2	74.5
hopper-medium	58.0	86.6
walker2d-medium-replay	26.7	32.6

I hope you can clarify which one can be correctly used as a reference. Especially for hopper-medium, since the difference is huge.
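To put numbers on the gaps, here is a small sketch that subtracts the D4RL (arXiv-v4) score from the CQL (NeurIPS) score for each task in the table above. The variable names are illustrative only; the scores are copied from the table.

```python
# (D4RL arXiv-v4 score, CQL NeurIPS score) for each task, from the table above.
scores = {
    "walker2d-medium": (79.2, 74.5),
    "hopper-medium": (58.0, 86.6),
    "walker2d-medium-replay": (26.7, 32.6),
}

# Signed gap: CQL (NeurIPS) minus D4RL (arXiv-v4), rounded to one decimal.
gaps = {task: round(cql - d4rl, 1) for task, (d4rl, cql) in scores.items()}

# Task with the largest absolute gap.
largest = max(gaps, key=lambda task: abs(gaps[task]))

for task, gap in gaps.items():
    print(f"{task}: {gap:+.1f}")
print("largest gap:", largest)
```

The hopper-medium gap is several times larger than the other two, which is why that row matters most when picking a reference.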
