kevslinger / dtqn

Deep Transformer Q-Networks for Partially Observable Reinforcement Learning

License: MIT License

Python 100.00%

dtqn's Introduction

Hi there 👋, I'm Kevin Esslinger! But you can call me Kev

Want to chat? Feel free to send me a message 📬, a follow 🐦, an invitation to connect 📥, or an email 📧! (Or all of the above 😁)

Bio

👨‍💻 I'm working as a Graduate Software Developer at Flow Traders in Amsterdam, The Netherlands.
🎓 Master of Science in Computer Science from Northeastern University's Khoury College of Computer Sciences in Boston, Massachusetts. Graduated December 2022. Co-advised by Chris Amato and Robert Platt; worked on using transformers to solve challenging, partially observable tasks with reinforcement learning.
🎓 Bachelor of Science in Computer Science and Mathematics with a minor in Data Science from Temple University in Philadelphia, Pennsylvania. Graduated Summa Cum Laude in May 2020
🦅 Achieved the rank of Eagle Scout

My favourite technologies:

Programming languages:
Text editors:
Machine learning libraries:
Operating system:

News

📓 My first paper, Deep Transformer Q-Networks for Partially Observable Reinforcement Learning, was published at the NeurIPS 2022 Workshop on Foundation Models for Decision Making! It's publicly available on arXiv, and the code for the paper is available in my DTQN repo.

More about me

🚴‍♂️ Casual city biker
🎮 Teamfight Tactics and Magic: the Gathering player
🏀 76ers fan
🏈 Ravens and Eagles fan
☕ Latte art amateur
📜 Check out my resume here

dtqn's People

Contributors

Stargazers

Watchers

Forkers

timckai mahyardana lyu-xg mhahn0106 marisgg ibagur eejuncao richardjozsa sci-i toughstyle darcstar-solutions-tech hust1booze alireza-ebrahimi-ai dianabessie rknssang tttonyalpha jonywang1775 supercodeai paugarriga22

dtqn's Issues

Error running run.py

(DTQN) mds@mds:~/DTQN$ python -u "/home/mds/DTQN/run.py"
Loading using gym.make
Environment with id D not found.
Loading using YAML
Traceback (most recent call last):
  File "/home/mds/DTQN/utils/env_processing.py", line 34, in make_env
    env = gym.make(id_or_path)
  File "/home/mds/anaconda3/envs/DTQN/lib/python3.8/site-packages/gym/envs/registration.py", line 142, in make
    return registry.make(id, **kwargs)
  File "/home/mds/anaconda3/envs/DTQN/lib/python3.8/site-packages/gym/envs/registration.py", line 86, in make
    spec = self.spec(path)
  File "/home/mds/anaconda3/envs/DTQN/lib/python3.8/site-packages/gym/envs/registration.py", line 115, in spec
    raise error.Error('Attempted to look up malformed environment ID: {}. (Currently all IDs must be of the form {}.)'.format(id.encode('utf-8'), env_id_re.pattern))
gym.error.Error: Attempted to look up malformed environment ID: b'D'. (Currently all IDs must be of the form ^(?:[\w:-]+/)?([\w:.-]+)-v(\d+)$.)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/mds/DTQN/run.py", line 533, in <module>
    run_experiment(get_args())
  File "/home/mds/DTQN/run.py", line 415, in run_experiment
    envs.append(env_processing.make_env(env_str))
  File "/home/mds/DTQN/utils/env_processing.py", line 39, in make_env
    inner_env = factory_env_from_yaml(
  File "/home/mds/anaconda3/envs/DTQN/lib/python3.8/site-packages/gym_gridverse/envs/yaml/factory.py", line 243, in factory_env_from_yaml
    with open(path) as f:
FileNotFoundError: [Errno 2] No such file or directory: '/home/mds/DTQN/envs/gridverse/D'

When I executed run.py, I got the error above. I think I installed everything correctly, but it may have been installed in the wrong path. I don't understand where envs/gridverse/D comes from.
Please help me execute run.py.
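One plausible cause, offered as an assumption and not a confirmed diagnosis: `run.py` builds one environment per `env_str` (line 415 in the traceback), and the requested ID is just "D". That is the first character of a name like "DiscreteCarFlag-v0". This pattern shows up when a list-valued option declared with `nargs="+"` has a plain string as its default. When the flag is omitted, argparse returns a string default unchanged, and looping over a string yields single characters. The sketch below reproduces this with a hypothetical `make_parser` helper. It is not `run.py`'s actual argument definition.

```python
import argparse


def make_parser(default):
    """Hypothetical parser with a list-valued --envs flag (not run.py's actual code)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--envs", nargs="+", default=default)
    return parser


# A plain-string default is returned unchanged when --envs is omitted...
buggy = make_parser("DiscreteCarFlag-v0").parse_args([])
# ...so looping over it visits one character at a time: "D", "i", "s", ...
print([env_str for env_str in buggy.envs][:3])

# A list default, or passing --envs explicitly, yields whole environment names.
fixed = make_parser(["DiscreteCarFlag-v0"]).parse_args([])
explicit = make_parser("DiscreteCarFlag-v0").parse_args(["--envs", "DiscreteCarFlag-v0"])
print(fixed.envs, explicit.envs)
```

If this assumption holds, passing `--envs` explicitly on the command line avoids the problem.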

Can this project run on Windows?

It seems that you run this project on Linux. When I tried it on Windows and ran the command "pip install -r requirements.txt", I got the following error output:

Building wheels for collected packages: nle
Building wheel for nle (pyproject.toml) ... error
error: subprocess-exited-with-error

× Building wheel for nle (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [41 lines of output]
fatal: not a git repository (or any of the parent directories): .git
Building wheel nle-0.8.1
running bdist_wheel
running build
running build_py
creating build
creating build\lib.win-amd64-cpython-38
creating build\lib.win-amd64-cpython-38\nle
copying nle\version.py -> build\lib.win-amd64-cpython-38\nle
copying nle\__init__.py -> build\lib.win-amd64-cpython-38\nle
creating build\lib.win-amd64-cpython-38\nle\env
copying nle\env\base.py -> build\lib.win-amd64-cpython-38\nle\env
copying nle\env\tasks.py -> build\lib.win-amd64-cpython-38\nle\env
copying nle\env\__init__.py -> build\lib.win-amd64-cpython-38\nle\env
creating build\lib.win-amd64-cpython-38\nle\nethack
copying nle\nethack\actions.py -> build\lib.win-amd64-cpython-38\nle\nethack
copying nle\nethack\nethack.py -> build\lib.win-amd64-cpython-38\nle\nethack
copying nle\nethack\__init__.py -> build\lib.win-amd64-cpython-38\nle\nethack
creating build\lib.win-amd64-cpython-38\nle\agent
copying nle\agent\agent.py -> build\lib.win-amd64-cpython-38\nle\agent
copying nle\agent\vtrace.py -> build\lib.win-amd64-cpython-38\nle\agent
copying nle\agent\__init__.py -> build\lib.win-amd64-cpython-38\nle\agent
creating build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\check_nethack_speed.py -> build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\collect_env.py -> build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\play.py -> build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\plot.py -> build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\read_heaplog.py -> build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\read_tty.py -> build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\test_raw_nethack.py -> build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\ttyplay.py -> build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\ttyplay2.py -> build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\ttyrec.py -> build\lib.win-amd64-cpython-38\nle\scripts
copying nle\scripts\__init__.py -> build\lib.win-amd64-cpython-38\nle\scripts
creating build\lib.win-amd64-cpython-38\nle\tests
copying nle\tests\test_envs.py -> build\lib.win-amd64-cpython-38\nle\tests
copying nle\tests\test_nethack.py -> build\lib.win-amd64-cpython-38\nle\tests
copying nle\tests\test_profile.py -> build\lib.win-amd64-cpython-38\nle\tests
copying nle\tests\test_system.py -> build\lib.win-amd64-cpython-38\nle\tests
running build_ext
error: [WinError 2] The system cannot find the file specified.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for nle
Failed to build nle
ERROR: Could not build wheels for nle, which is required to install pyproject.toml-based projects
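For context, `[WinError 2] The system cannot find the file specified.` is how Windows reports a `FileNotFoundError`. `subprocess` raises this error when the executable it is asked to launch cannot be found. The log does not say which tool nle's `build_ext` step tried to run. A missing build tool such as CMake is an assumption here, not something the output confirms.

The sketch below reproduces the same class of failure portably, using a deliberately nonexistent tool name. It also shows a `shutil.which` pre-check. Both helpers are hypothetical and not part of nle or this repository.

```python
import shutil
import subprocess


def missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def try_launch(cmd):
    """Run `cmd`; return None on success or the errno if the executable is missing."""
    try:
        subprocess.run(cmd, check=True)
    except FileNotFoundError as exc:
        # Windows prints this as "[WinError 2] The system cannot find the file
        # specified"; Linux/macOS print "[Errno 2] No such file or directory".
        return exc.errno
    return None


fake_tool = "definitely-not-a-real-build-tool"
print(missing_tools([fake_tool]))  # the fake tool is reported as missing
print(try_launch([fake_tool]))     # errno 2 (ENOENT)
```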

paper reproduction in gridverse 7x7, 9x9

Hi,

First of all, I want to commend you on your great idea and thank you for sharing your code.

After reading your paper and being impressed, I have been trying to reproduce the results using your code.

However, after conducting several experiments with the main branch, the performance seems to be unstable.

I have been working with GridVerse 7x7, 9x9, and other configurations, but there are many cases where the success rate drops sharply.

I found out from the issue board that the version used in the paper is in the 'paper' branch.

Could you please explain any major changes between these two branches?

Question about transformer with DQN

Hi, Kev,
I'm glad to have come across your work on DTQN.
I am very curious why there is so little work combining Transformers and DQN, given that both techniques emerged quite early.
I expected a lot of work in this area, but there isn't much.
As far as I know, there may be just one paper before yours, 'TRANSFORMER BASED REINFORCEMENT LEARNING FOR GAMES', and in that paper DTQN does not perform as well as DRQN.
Do you have any insight into this?

`in-embed` instead of `inembed`?

In the README,
the flag is given as --inembed, but shouldn't it be --in-embed?
In that case, the command becomes:

python run.py --envs DiscreteCarFlag-v0 --in-embed 128 --disable-wandb --verbose
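Which spelling works depends on how the checked-out `run.py` declares the option. Other commands on this page use `--inembed`, including one run from a `DTQN-paper` checkout, so the branches may differ. Note that argparse's prefix matching would not accept `--inembed` for a flag named `--in-embed`, because one is not a prefix of the other.

The sketch below is hypothetical and is not the repository's code. It shows that argparse accepts several option strings for one argument, so both spellings can map to the same `in_embed` destination. The default of 64 is an arbitrary placeholder.

```python
import argparse


def make_parser():
    """Hypothetical parser accepting both spellings of the embedding-size flag."""
    parser = argparse.ArgumentParser()
    # Both option strings write to the same destination, args.in_embed.
    # The default of 64 is a placeholder, not the repository's default.
    parser.add_argument("--in-embed", "--inembed", dest="in_embed", type=int, default=64)
    return parser


parser = make_parser()
print(parser.parse_args(["--in-embed", "128"]).in_embed)
print(parser.parse_args(["--inembed", "128"]).in_embed)
```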

I'm trying to connect the DTQN network to SUMO. What are the inputs of DTQN?

Hello again.
As I mentioned in the title, I'm trying to connect the DTQN network to SUMO as an environment, and I want to know the inputs of DTQN.
From reading the paper, I think the inputs are observations, but I don't know their format.
I'm struggling to find the input format of DTQN. Which Python file should I check? Could you tell me where it is, please?
Furthermore, I'm trying to use numeric values as states, e.g., the relative distance between the agent and other cars, the agent's velocity, and the current lane number.
I wonder whether the DTQN architecture can handle such a state as an observation, because DQN's inputs seem to be sequences of images. Are DTQN's inputs sequences of images too?
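The DTQN paper describes a network that conditions on a window of the agent's most recent observations rather than on a single frame. The tracebacks further down this page also show observations passing through a linear layer in `representations.py`. That suggests a numeric feature vector, not only an image, can serve as an observation. Where exactly `run.py` assembles that window is not shown here.

The pure-Python sketch below illustrates the input shape under stated assumptions. A hypothetical `ObservationContext` keeps the last `context_len` feature vectors, left-pads with zeros early in an episode, and returns a `(context_len, obs_dim)` nested list. The padding scheme and names are illustrative, not the repository's.

```python
from collections import deque


class ObservationContext:
    """Hypothetical rolling window of the last `context_len` observations."""

    def __init__(self, context_len, obs_dim, pad_value=0.0):
        self.context_len = context_len
        self.obs_dim = obs_dim
        self.pad_value = pad_value
        # deque with maxlen silently drops the oldest observation when full.
        self.buffer = deque(maxlen=context_len)

    def reset(self, first_obs):
        """Start a new episode with its first observation."""
        self.buffer.clear()
        self.add(first_obs)

    def add(self, obs):
        if len(obs) != self.obs_dim:
            raise ValueError(f"expected {self.obs_dim} features, got {len(obs)}")
        self.buffer.append([float(x) for x in obs])

    def as_input(self):
        """Return a (context_len, obs_dim) nested list, left-padded early in an episode."""
        n_pad = self.context_len - len(self.buffer)
        padding = [[self.pad_value] * self.obs_dim for _ in range(n_pad)]
        return padding + [list(obs) for obs in self.buffer]


# Illustrative SUMO-style features: [relative distance, velocity, lane index]
ctx = ObservationContext(context_len=4, obs_dim=3)
ctx.reset([12.5, 8.0, 1])
ctx.add([11.0, 8.4, 1])
window = ctx.as_input()
print(len(window), len(window[0]))
```

Feeding the lane number as a plain float is a simplification. A one-hot encoding may suit a categorical feature better.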

Reproduction of GridVerse results

Hi!

Thanks for sharing your intriguing ideas on how to set up a transformer-based memory DRL algorithm. I'm interested in how the transformer interface works during inference and optimization, so I started by simply trying to reproduce the GridVerse results stated in your README.

I'm currently running 3 repetitions of this experiment:

python run.py --env gv_memory.7x7.yaml --inembed 128 --disable-wandb --verbose

So far, the success rate has stayed at zero throughout training.

[ December 15, 14:00:03 ] Training Steps: 699000, Success Rate: 0.00, Return: -25.00, Episode Length: 500.00, Hours: 3.77
[ December 15, 14:00:23 ] Training Steps: 700000, Success Rate: 0.00, Return: -25.00, Episode Length: 500.00, Hours: 3.78
[ December 15, 14:00:42 ] Training Steps: 701000, Success Rate: 0.00, Return: -25.00, Episode Length: 500.00, Hours: 3.78
[ December 15, 14:01:02 ] Training Steps: 702000, Success Rate: 0.00, Return: -25.00, Episode Length: 500.00, Hours: 3.79

I'm pretty sure I missed something. It would be great if you could help.

edit:
Training on 5x5 looks pretty volatile in comparison to the reported results.

[ December 15, 14:10:52 ] Training Steps: 891000, Success Rate: 0.30, Return: -2.24, Episode Length: 4.90, Hours: 3.92
[ December 15, 14:11:04 ] Training Steps: 892000, Success Rate: 0.70, Return: 1.78, Episode Length: 4.40, Hours: 3.92
[ December 15, 14:11:17 ] Training Steps: 893000, Success Rate: 0.40, Return: -1.20, Episode Length: 4.00, Hours: 3.92
[ December 15, 14:11:30 ] Training Steps: 894000, Success Rate: 0.70, Return: -0.42, Episode Length: 48.40, Hours: 3.93
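Single evaluation lines like these are noisy. When comparing a run against reported curves, averaging the success rate over a window of evaluations (and across seeds) gives a steadier picture. The regular expression below is inferred from the log lines above and is an assumption, not a documented log format. The four sample lines are copied from the 5x5 log.

```python
import re
from statistics import mean

# Pattern inferred from the log lines above; the other fields are ignored.
LOG_RE = re.compile(r"Training Steps: (\d+), Success Rate: ([\d.]+)")


def success_rates(lines):
    """Extract (training_step, success_rate) pairs from log lines."""
    pairs = []
    for line in lines:
        match = LOG_RE.search(line)
        if match:
            pairs.append((int(match.group(1)), float(match.group(2))))
    return pairs


log = [
    "[ December 15, 14:10:52 ] Training Steps: 891000, Success Rate: 0.30, Return: -2.24, Episode Length: 4.90, Hours: 3.92",
    "[ December 15, 14:11:04 ] Training Steps: 892000, Success Rate: 0.70, Return: 1.78, Episode Length: 4.40, Hours: 3.92",
    "[ December 15, 14:11:17 ] Training Steps: 893000, Success Rate: 0.40, Return: -1.20, Episode Length: 4.00, Hours: 3.92",
    "[ December 15, 14:11:30 ] Training Steps: 894000, Success Rate: 0.70, Return: -0.42, Episode Length: 48.40, Hours: 3.93",
]
pairs = success_rates(log)
avg = mean(rate for _, rate in pairs)
print(f"mean success rate over {len(pairs)} evaluations: {avg:.3f}")
```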

Replacing `done` with `truncated` and `terminated`

Hey Kevin,
I hope you are doing well. I noticed a small bug: the step function returns only obs, reward, done, info instead of obs, reward, terminated, truncated, info. I came across an article from Gymnasium that emphasises the need for both terminated and truncated. Can I help update the codebase?
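The distinction matters for Q-learning targets. A true terminal state should not bootstrap from the next state's value, but an episode cut off by a time limit still should. The sketch below uses hypothetical helpers and is not the repository's code. It assumes an older gym `TimeLimit` wrapper that marks cutoffs with `info["TimeLimit.truncated"]`.

```python
def split_done(done, info):
    """Convert an old-style `done` flag into (terminated, truncated)."""
    # Older gym TimeLimit wrappers set this key when the step limit ends an episode.
    truncated = bool(info.get("TimeLimit.truncated", False))
    terminated = bool(done) and not truncated
    return terminated, truncated


def td_target(reward, next_q_values, terminated, gamma=0.99):
    """One-step Q-learning target; only a real terminal state stops bootstrapping."""
    bootstrap = 0.0 if terminated else gamma * max(next_q_values)
    return reward + bootstrap


# Time-limit cutoff: the episode is over, but the target still bootstraps.
terminated, truncated = split_done(True, {"TimeLimit.truncated": True})
print(terminated, truncated, td_target(1.0, [1.0, 2.0], terminated))

# Genuine terminal state: no bootstrapping.
terminated, truncated = split_done(True, {})
print(terminated, truncated, td_target(1.0, [1.0, 2.0], terminated))
```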

Getting an error stating environment D does not exist

I tried to run the basic version of the code (python run.py) without installing any of the additional packages and got this error:

WARNING: ``gym_gridverse`` is not installed. This means you cannot run an experiment with the `gv_*` domains.
WARNING: ``gym_gridverse`` is not installed. This means you cannot run an experiment with the gv_*.yaml domains.
WARNING: ``gym_pomdps`` is not installed. This means you cannot run an experiment with the HeavenHell or Hallway domain. 
WARNING: ``mini_hack`` is not installed. This means you cannot run an experiment with any of the MH- domains.
Loading using gym.make
Environment with id D not found.
Loading using YAML
Traceback (most recent call last):
  File "/...../DTQN-main/utils/env_processing.py", line 34, in make_env
    env = gym.make(id_or_path)
  File "/......./lib/python3.10/site-packages/gym/envs/registration.py", line 569, in make
    _check_version_exists(ns, name, version)
  File "/......./lib/python3.10/site-packages/gym/envs/registration.py", line 219, in _check_version_exists
    _check_name_exists(ns, name)
  File "/....../lib/python3.10/site-packages/gym/envs/registration.py", line 197, in _check_name_exists
    raise error.NameNotFound(
gym.error.NameNotFound: Environment D doesn't exist. 

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/....../DTQN-main/run.py", line 533, in <module>
    run_experiment(get_args())
  File "/......./DTQN-main/run.py", line 415, in run_experiment
    envs.append(env_processing.make_env(env_str))
  File "/....../DTQN-main/utils/env_processing.py", line 39, in make_env
    inner_env = factory_env_from_yaml(
NameError: name 'factory_env_from_yaml' is not defined

After installing gridverse, I ran python3 run.py --envs DiscreteCarFlags-v0 --device mps, and it ran successfully.
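The second traceback ends in a `NameError` rather than a clear message. The YAML fallback calls `factory_env_from_yaml` even though `gym_gridverse` failed to import, as the warnings at the top of this report say.

A hedged sketch of a clearer guard follows. `load_env_from_yaml` is a hypothetical name. Only the `gym_gridverse.envs.yaml.factory` module path and the `factory_env_from_yaml` function come from the first traceback on this page.

```python
try:
    # Optional dependency; module path as it appears in the earlier traceback.
    from gym_gridverse.envs.yaml.factory import factory_env_from_yaml
except ImportError:
    factory_env_from_yaml = None


def load_env_from_yaml(path):
    """Hypothetical YAML loader that fails with an actionable message."""
    if factory_env_from_yaml is None:
        raise ImportError(
            f"Cannot load {path!r}: gym_gridverse is not installed, "
            "so YAML (gv_*.yaml) environments are unavailable."
        )
    return factory_env_from_yaml(path)
```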

Rendering issue - ImportError

Hey Kevin,

In the DiscreteCarFlag-v0 environment with gym==0.18.0, running with the --render flag produces the following traceback:

Loading using gym.make
Traceback (most recent call last):
  File "run.py", line 333, in <module>
    run_experiment(parser.parse_args())
  File "run.py", line 21, in run_experiment
    env = env_processing.make_env(args.env)
  File "/home/hp/Desktop/ashok/DTQN-paper/utils/env_processing.py", line 46, in make_env
    env = gym.make(id_or_path)
  File "/home/hp/miniconda3/envs/dtqn/lib/python3.8/site-packages/gym/envs/registration.py", line 145, in make
    return registry.make(id, **kwargs)
  File "/home/hp/miniconda3/envs/dtqn/lib/python3.8/site-packages/gym/envs/registration.py", line 90, in make
    env = spec.make(**kwargs)
  File "/home/hp/miniconda3/envs/dtqn/lib/python3.8/site-packages/gym/envs/registration.py", line 59, in make
    cls = load(self.entry_point)
  File "/home/hp/miniconda3/envs/dtqn/lib/python3.8/site-packages/gym/envs/registration.py", line 18, in load
    mod = importlib.import_module(mod_name)
  File "/home/hp/miniconda3/envs/dtqn/lib/python3.8/importlib/__init__.py", line 127, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 1014, in _gcd_import
  File "<frozen importlib._bootstrap>", line 991, in _find_and_load
  File "<frozen importlib._bootstrap>", line 975, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 671, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 843, in exec_module
  File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
  File "/home/hp/Desktop/ashok/DTQN-paper/envs/car_flag.py", line 12, in <module>
    from gym.utils import pyglet_rendering as visualize
ImportError: cannot import name 'pyglet_rendering' from 'gym.utils' (/home/hp/miniconda3/envs/dtqn/lib/python3.8/site-packages/gym/utils/__init__.py)

The solution is to replace

from gym.utils import pyglet_rendering as visualize

with:

from gym.envs.classic_control import rendering as visualize
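The right module depends on the installed gym version, so one hedged option is to try both paths in order. The helper below is a sketch, not code from the repository. `load_first_available` is a hypothetical name; only the two module paths come from this issue. It catches only `ImportError`, so other failures (for example, a missing display when pyglet loads) will still propagate.

```python
import importlib

# Module paths taken from this issue: the first is what car_flag.py imports,
# the second is the replacement suggested for gym==0.18.0.
RENDERING_CANDIDATES = (
    "gym.utils.pyglet_rendering",
    "gym.envs.classic_control.rendering",
)


def load_first_available(candidates):
    """Import and return the first module in `candidates` that imports cleanly."""
    failures = []
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:
            failures.append(f"{name}: {exc}")
    raise ImportError("none of the candidate modules could be imported:\n" + "\n".join(failures))
```

In `car_flag.py` this would be used as `visualize = load_first_available(RENDERING_CANDIDATES)` in place of the fixed import.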

Matrix size mismatch error

Hey Kevin, I am facing the following error when running DQN on the heavenhell environment.

Error log:

$ python run.py --env POMDP-heavenhell_3-episodic-v0 --inembed 64 --model DQN --verbose --seed 1 --disable-wandb
Loading using gym.make
Loading using gym.make
[ July 12, 22:58:08 ] Creating DQN with 5132 parameters
Traceback (most recent call last):
  File "run.py", line 333, in <module>
    run_experiment(parser.parse_args())
  File "run.py", line 115, in run_experiment
    agent.train()
  File "/home/cse/Desktop/ashok/DTQN-paper/dtqn/agents/dqn.py", line 189, in train
    q_values = self.policy_network(obss).gather(1, actions).squeeze()
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cse/Desktop/ashok/DTQN-paper/dtqn/networks/dqn.py", line 47, in forward
    return self.ffn(self.obs_embed(x))
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cse/Desktop/ashok/DTQN-paper/dtqn/networks/representations.py", line 12, in forward
    return self.embedding(obs)
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/modules/container.py", line 217, in forward
    input = module(input)
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/cse/.local/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x256 and 8x64)
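Reading the message: `F.linear` multiplies the input (`mat1`, here 1x256) by the transposed weight (`mat2`, here 8x64). So the layer expects rows of 8 features and produces 64 outputs, which matches `--inembed 64`. A single row of 256 values suggests the batch of observations was flattened before reaching the embedding. Since 256 = 32 × 8, this would fit a batch of 32 eight-feature observations, but that batch size is an inference, not something the log states.

The pure-Python sketch below restates the shape rule and the reshape that would restore one row per observation. `linear_output_shape` and `unflatten` are hypothetical helpers.

```python
def linear_output_shape(input_shape, in_features, out_features):
    """Shape rule of a linear layer: (..., in_features) -> (..., out_features)."""
    if input_shape[-1] != in_features:
        raise ValueError(
            f"last dimension {input_shape[-1]} does not match in_features={in_features}"
        )
    return tuple(input_shape[:-1]) + (out_features,)


def unflatten(flat, obs_dim):
    """Split one flat row back into rows of `obs_dim` features each."""
    if len(flat) % obs_dim != 0:
        raise ValueError(f"{len(flat)} values cannot be split into rows of {obs_dim}")
    return [flat[i:i + obs_dim] for i in range(0, len(flat), obs_dim)]


rows = unflatten(list(range(256)), obs_dim=8)
print(len(rows), linear_output_shape((len(rows), 8), in_features=8, out_features=64))
```

In PyTorch, the corresponding change would be a `reshape(-1, obs_dim)` wherever the unintended flattening happens, assuming the flattening is indeed unintended.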