
rl4co's Introduction


RL4CO has been accepted as an oral presentation at the NeurIPS 2023 GLFrontiers Workshop! 🎉


An extensive Reinforcement Learning (RL) for Combinatorial Optimization (CO) benchmark. Our goal is to provide a unified framework for RL-based CO algorithms, and to facilitate reproducible research in this field, decoupling the science from the engineering.

RL4CO is built upon:

  • TorchRL: official PyTorch framework for RL algorithms and vectorized environments on GPUs
  • TensorDict: a library to easily handle heterogeneous data such as states, actions and rewards
  • PyTorch Lightning: a lightweight PyTorch wrapper for high-performance AI research
  • Hydra: a framework for elegantly configuring complex applications

RL4CO Overview

We provide several utilities and a modular design. For autoregressive policies, we modularize reusable components such as environment embeddings, which can easily be swapped to solve new problems.
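For instance, a custom initial embedding can be a small nn.Module that maps raw problem features to the hidden space. The sketch below is illustrative only: the class name, the "locs" key, and the way such a module would be passed to a policy are assumptions rather than the library's confirmed API.

import torch.nn as nn

class MyInitEmbedding(nn.Module):
    """Hypothetical init embedding: projects 2D node coordinates to the hidden space."""

    def __init__(self, embed_dim: int = 128):
        super().__init__()
        self.proj = nn.Linear(2, embed_dim)

    def forward(self, td):
        # td is a TensorDict holding the problem instance; "locs" is assumed to be
        # the key for node coordinates with shape [batch, num_loc, 2]
        return self.proj(td["locs"])

Such a module could then be swapped in for the default environment embedding of an autoregressive policy without touching the rest of the model.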

RL4CO Policy

Getting started


RL4CO is now available for installation on pip!

pip install rl4co

To get started, we recommend checking out our quickstart notebook or the minimalistic example below.

Install from source

This command installs the bleeding-edge main version, which is useful for staying up to date with the latest developments - for instance, if a bug has been fixed since the last official release but a new release hasn't been rolled out yet:

pip install -U git+https://github.com/ai4co/rl4co.git

Local install and development

If you want to develop RL4CO, we recommend installing it locally with pip in editable mode:

git clone https://github.com/ai4co/rl4co && cd rl4co
pip install -e .

We recommend using a virtual environment such as conda to install rl4co locally.

Usage

Train model with default configuration (AM on TSP environment):

python run.py

Tip

You may check out this notebook to get started with Hydra!

Change experiment settings

Train model with chosen experiment configuration from configs/experiment/

python run.py experiment=routing/am env=tsp env.num_loc=50 model.optimizer_kwargs.lr=2e-4

Here you may change the environment, e.g. with env=cvrp on the command line, or by modifying the corresponding experiment file, e.g. configs/experiment/routing/am.yaml.

Disable logging
python run.py experiment=routing/am logger=none '~callbacks.learning_rate_monitor'

Note that ~ is used to disable a callback that would need a logger.

Create a sweep over hyperparameters (-m for multirun)
python run.py -m experiment=routing/am  model.optimizer.lr=1e-3,1e-4,1e-5

Minimalistic Example

Here is a minimalistic example training the Attention Model with greedy rollout baseline on TSP in less than 30 lines of code:

from rl4co.envs import TSPEnv
from rl4co.models import AttentionModel
from rl4co.utils import RL4COTrainer

# Environment, Model, and Lightning Module
env = TSPEnv(num_loc=20)
model = AttentionModel(env,
                       baseline="rollout",
                       train_data_size=100_000,
                       test_data_size=10_000,
                       optimizer_kwargs={'lr': 1e-4}
                       )

# Trainer
trainer = RL4COTrainer(max_epochs=3)

# Fit the model
trainer.fit(model)

# Test the model
trainer.test(model)
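After training, the policy can also be rolled out greedily on fresh instances. A rough sketch is below, following the quickstart notebook; the exact call signature (e.g. the phase and decode_type arguments) may differ across versions, so treat it as an assumption rather than a fixed API:

# Greedy evaluation on newly sampled instances (sketch)
td_init = env.reset(batch_size=[16]).to(model.device)
out = model.policy(td_init.clone(), env, phase="test", decode_type="greedy")
print(out["reward"].mean())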

Other examples can be found on the documentation!

Testing

Run tests with pytest from the root directory:

pytest tests

Known Bugs

Bugs installing PyTorch Geometric (PyG)

Installing PyG via Conda seems to update Torch itself. We have found that this update introduces some bugs with torchrl. At this moment, we recommend installing PyG with Pip:

pip install torch_geometric

Contributing

Have a suggestion, request, or found a bug? Feel free to open an issue or submit a pull request. If you would like to contribute, please check out our contribution guidelines here. We welcome and look forward to all contributions to RL4CO!

We are also on Slack if you have any questions or would like to discuss RL4CO with us. We are open to collaborations and would love to hear from you 🚀

Contributors

Citation

If you find RL4CO valuable for your research or applied projects, please consider citing it:

@inproceedings{berto2023rl4co,
    title={{RL}4{CO}: a Unified Reinforcement Learning for Combinatorial Optimization Library},
    author={Federico Berto and Chuanbo Hua and Junyoung Park and Minsu Kim and Hyeonah Kim and Jiwoo Son and Haeyeon Kim and Joungho Kim and Jinkyoo Park},
    booktitle={NeurIPS 2023 Workshop: New Frontiers in Graph Learning},
    year={2023},
    url={https://openreview.net/forum?id=YXSJxi8dOV},
    note={\url{https://github.com/ai4co/rl4co}}
}

Join us

Slack

We invite you to join our AI4CO community, an open research group in Artificial Intelligence (AI) for Combinatorial Optimization (CO)!

rl4co's People

Contributors

bokveizen, cbhua, eltociear, fedebotu, furffico, haimrich, henry-yeh, hyeok9855, hyeonahkimm, junyoungpark, ltluttmann, ngastzepeda, tycbony, zymrael


rl4co's Issues

[Refactoring] even tighter integration with Lightning

@cbhua @Junyoungpark

As suggested by @Junyoungpark , we could reduce the complexity by having the RL model directly handled by PyTorch Lightning, which is what happens in the scheme below:


At the moment, we have the following levels:

└── RL4COLitModule    <- PyTorch Lightning Module
    └──Model          <- e.g. `AttentionModel` (=RL)
        └── Policy    <- e.g. `AttentionModelPolicy`

However, this could be simplified as

└── RLModel         <- PyTorch Lightning Module  (=RL, e.g. REINFORCE)
        └── Policy  <- e.g. `AttentionModel`

This would also allow for easier implementation of e.g. PPO, since the inner optimization loop would be done directly in PyTorch Lightning. Moreover, we would not need to have callbacks in RL4COLitModule to the models themselves / baselines, since everything would be integrated into a single module 🚀

[Notice] Updating to new TorchRL and TensorDict

The new TorchRL and TensorDict versions have been officially released! We will be able to use Python 3.11 as well 🚀

However, there are some backward compatibility issues between the current RL4CO version and the new TorchRL. Moreover, it looks like the newest version needs PyTorch 2.1 to work for the time being (issue here, linking this one as well).

In the following couple of weeks, we will work towards the next release to address the new compatibilities and make RL4CO more efficient! Stay tuned


For the time being

To install RL4CO now, make sure TorchRL and Tensordict are at v0.1.1 (for Windows: PyTorch should be <2.1 for this version of TorchRL):

pip install torchrl==0.1.1 tensordict==0.1.1 rl4co==0.2.3

[BUG] RuntimeError in minimalistic example

Thanks for the amazing repo! I encountered a bug when running the minimalistic example given in the README:

Describe the bug

RuntimeError: It looks like your LightningModule has parameters that were not used in producing the loss returned by training_step. If this is intentional, you must enable the detection of unused parameters in DDP, either by setting the string value `strategy='ddp_find_unused_parameters_true'` or by setting the flag in the strategy with `strategy=DDPStrategy(find_unused_parameters=True)`.


To Reproduce

minimalistic example in README.md

Questions

I found that the bug can be fixed by adding DDPStrategy(find_unused_parameters=True) as per the suggestion:

import lightning as L
from lightning.pytorch.strategies import DDPStrategy

trainer = L.Trainer(
    max_epochs=1,  # only few epochs
    accelerator="gpu",  # use GPU if available, else you can use others as "cpu"
    logger=None,  # can replace with WandbLogger, TensorBoardLogger, etc.
    precision="16-mixed",  # Lightning will handle faster training with mixed precision
    gradient_clip_val=1.0,  # clip gradients to avoid exploding gradients
    reload_dataloaders_every_n_epochs=1,  # necessary for sampling new data
    strategy=DDPStrategy(find_unused_parameters=True),  # TODO can we add it?
)

But I'm wondering whether the training can still work well. Is there any bad effect introduced by DDPStrategy(find_unused_parameters=True)?

[Feature] Make RL4CO Lightning Module work adaptively with or without Hydra

It would be good to have the module adaptively take either:

  1. Configuration and instantiate with hydra.utils.instantiate
  2. Env and model previously initialized

This would be very useful for people who do not know how to use Hydra, since it might be easier to just pass the model manually instead.
The config could still be passed as a DictConfig; a rough sketch of the idea is below.
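A minimal sketch of this adaptive behavior, with hypothetical names (build_module is not part of RL4CO; it only illustrates the idea):

import hydra
from omegaconf import DictConfig

def build_module(cfg_or_env, model=None):
    """Accept either a DictConfig (case 1) or an already-initialized env (case 2)."""
    if isinstance(cfg_or_env, DictConfig):
        # case 1: instantiate env and model from the configuration
        env = hydra.utils.instantiate(cfg_or_env.env)
        model = hydra.utils.instantiate(cfg_or_env.model, env=env)
    else:
        # case 2: env and model were initialized manually by the user
        env = cfg_or_env
        assert model is not None, "pass an initialized model together with the env"
    return env, model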

[Doc] Documentation with environment descriptions

It would be good to have a document/readme/notebook describing each environment's problem in detail, since there are many details that should be made clear. For example: can the salesman in the PCTSP visit the depot multiple times? Should the salesman always return to the depot in every problem (theoretically yes)? Etc.

This could also include some details of our implementation to help people understand the code better, for example when we mark an episode as done in the PCTSP.

This is temporarily low priority until we finish the repo implementation (I can help during the paper-writing period).

DPP environment

@kim960121 As we discussed, having variants to make the DPP problem more challenging would be great. An EDA environment such as DPP will be an impactful addition to our collection.
Here is a tutorial to get started 🚀

[Feature] Adaptation methods

@Leaveson Leaving this here for tracking it
It would be great to have adaptation methods:

  • EAS
  • (SGBS - or maybe not, because of the simulation component, which is not entirely within the scope of RL)
  • Meta-SAGE

[BUG] `glimpse_val` not considered

Right now, in the models, glimpse_val is not considered and is instead substituted with glimpse_key here.
This does not seem to affect training too much, but it should be fixed nonetheless.

[Help] The results about the sdvrp environment

Hi, when I test the SDVRP env with the config below, the results do not look right. I would expect the vehicle to be able to revisit the same target location multiple times, but the results and printed actions show that each target location is visited only once.
Also, when I set min_demand to 0 and train the model, it sometimes raises an error because the start_loc demand becomes negative.
We appreciate your work and look forward to your reply; it would also help to have more tutorials and examples on how to load a trained model, since we are unfamiliar with your structure.
sdvrpenv = SDVRPEnv(num_loc=20,
                    min_loc=0,
                    max_loc=1,
                    min_demand=1,
                    max_demand=10,
                    vehicle_capacity=1.0)

Support for PyTorch < 2.0

At the moment, as shown here, we are using scaled_dot_product_attention from PyTorch, which requires PyTorch >= 2.0. Since this is the only point of incompatibility, we can simply refactor scaled_dot_product_attention so that, if it cannot be imported, we run our own implementation / run compiled FlashAttention.

  • Implement scaled_dot_product_attention in PyTorch < 2.0
    Upon availability, run FlashAttention
    Note: we do not run FlashAttention on availability due to too many checks, although we have this version in previous commits. Might do it in the future
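A minimal sketch of such a fallback for PyTorch < 2.0 (not the repo's implementation; dropout and the is_causal option are omitted for brevity):

import math
import torch

def scaled_dot_product_attention(q, k, v, attn_mask=None):
    # q, k, v: [..., seq_len, head_dim]
    scores = q @ k.transpose(-2, -1) / math.sqrt(q.size(-1))
    if attn_mask is not None:
        # boolean mask: True = attend, False = block (mirrors the PyTorch convention)
        scores = scores.masked_fill(~attn_mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v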

[BugFix] check the MDAM model for the VRP and OP

The MDAM model's original implementation has different processing steps for TSP versus VRP and OP. In our implementation we have the problem context and it runs without errors, but I have to double-check whether MDAM works properly for TSP and VRP. Mostly it is about the depot information (for the VRP problem, the depot information is handled separately from the agents' information).

  • TSP: done, works the same with the original code;
  • CVRP;
  • PCTSP;
  • OP;

Environments other than routing

Other environments besides routing problems in TorchRL to consider:


[Extra]
To be decided: it would be great to add some others if time allows, such as:

The following may not be very meaningful to us:

  • Knapsack problem
  • Maximum vertex covering

Profile of our RL4CO vs original attention model

As far as we tested, we have the following at the moment:

  • For small problems (e.g. CVRP 20), our implementation has basically the same running time as the original AM
  • For larger problems, the original implementation becomes faster. But why? Here are some possible reasons:
  1. PyTorch Lightning has some slight overhead, but it is generally negligible
  2. Perhaps TensorDicts and their dataloading
  3. Maybe the models have some bottlenecks

We would like to benchmark our implementation carefully - that is, use a profiler to understand exactly where the bottleneck is and possibly solve it; a rough sketch of how one might profile a rollout is below.
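For example, one could wrap a single policy rollout in PyTorch's profiler. A rough sketch (batch size, problem size, and the policy call signature are illustrative assumptions):

import torch
from torch.profiler import ProfilerActivity, profile

from rl4co.envs import TSPEnv
from rl4co.models import AttentionModel

env = TSPEnv(num_loc=50)
model = AttentionModel(env)
td = env.reset(batch_size=[512])

with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
    with torch.no_grad():
        model.policy(td, env, decode_type="sampling")

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=15))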


Note that if we use FlashAttention and fp16-mixed precision during training, then we can do better than the original, so in practice there should still be speedups. But it would be great to claim we are always the better implementation


If you have some ideas, feel free to contribute ;) @cbhua @Junyoungpark

Does rl4co's POMO support SDVRP, OP?

Thank you for your excellent work!
Does RL4CO also support solving other CO problems besides TSP and CVRP? For instance, SDVRP, OP, PCTSP, which are applicable to the AM model.
When I attempted POMO on OP, I encountered this error message:

current_total_prize = td["current_total_prize"] + gather_by_index(
)

../aten/src/ATen/native/cuda/ScatterGatherKernel.cu:144: operator(): block: [3228,0,0], thread: [0,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
(the same assertion is repeated for threads [1,0,0] through [31,0,0] of block [3228,0,0])

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "rl4co/rl4co/rl4co/utils/utils.py", line 36, in wrap
    metric_dict, object_dict = task_func(cfg=cfg)
  File "rl4co/rl4co/rl4co/tasks/train.py", line 77, in run
    trainer.fit(model=model, ckpt_path=cfg.get("ckpt_path"))
  File "rl4co/rl4co/rl4co/utils/trainer.py", line 145, in fit
    super().fit(
  File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 544, in fit
    call._call_and_handle_interrupt(
  File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/trainer/call.py", line 68, in _call_and_handle_interrupt
    trainer._teardown()
  File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/trainer/trainer.py", line 1012, in _teardown
    self.strategy.teardown()
  File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/strategies/ddp.py", line 405, in teardown
    super().teardown()
  File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/strategies/parallel.py", line 127, in teardown
    super().teardown()
  File "/opt/conda/lib/python3.8/site-packages/lightning/pytorch/strategies/strategy.py", line 528, in teardown
    self.lightning_module.cpu()
  File "/opt/conda/lib/python3.8/site-packages/lightning/fabric/utilities/device_dtype_mixin.py", line 79, in cpu
    return super().cpu()
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 967, in cpu
    return self._apply(lambda t: t.cpu())
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 810, in _apply
    module._apply(fn)
  [Previous line repeated 1 more time]
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 833, in _apply
    param_applied = fn(param)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 967, in <lambda>
    return self._apply(lambda t: t.cpu())
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

To Reproduce

I modified two lines of config/experiment/base.yaml as follows (the model and env overrides):

defaults:
  - override /model: pomo.yaml
  - override /env: op.yaml
  - override /callbacks: default.yaml
  - override /trainer: default.yaml
  - override /logger: wandb.yaml

[Minor] stochastic prize collecting travelling salesman problem (SPCTSP) feature embedding problem

Definition of the SPCTSP [1]:

In the SPCTSP, the expected node prize is known upfront, but the real collected prize only becomes known upon visitation.

i.e., for each node, we know the expectation of the prize as $\rho_i$; when the salesman travels to this node, we will generate the real prize by $\rho_i^*\sim\mathrm{Uniform}(0, 2\rho_i)$ and then follow the PCTSP rule to continue.

In this case, different from the PCTSP, the encoder embedding information should contain the observation of all node locations and the expectations of each node.
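A minimal sketch of this prize-revelation rule (illustrative only, not the repo's implementation):

import torch

num_nodes = 20
expected_prize = torch.rand(num_nodes)  # rho_i, known to the encoder upfront
# revealed only upon visiting node i:
real_prize = torch.rand(num_nodes) * 2 * expected_prize  # rho_i* ~ Uniform(0, 2 * rho_i)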

The attention-learn-to-route repository does not seem to implement this problem's environment. I may need a discussion before opening the pull request.

[1] Kool, Wouter, Herke Van Hoof, and Max Welling. "Attention, learn to solve routing problems!." arXiv preprint arXiv:1803.08475 (2018).

SDVRP environment

@cbhua for the SDVRP environment, we may need to separate embeddings / context from normal VRP (so maybe make a SDVRPInitEmbedding)

In the original code here they have the allow_partial variable; (EDIT) this should be covered by SDVRPDynamicEmbedding

  • Make the model compatible with SDVRP and CVRP at the same time (maybe add some checks/context etc)
  • Test out against original implementation and see if it trains
  • For @fedebotu : make sure also the final training loop works. Right now there is a small bug in CVRP, possibly due to reward sign

Fix batching / unbatching

Batching / unbatching needs to be checked and fixed. We need to make sure that

x_new = repeat_batch(x, n)
x_unrepeat = unrepeat_batch(x, n)
assert torch.allclose(x, x_unrepeat[:, 0])

This of course depends on the (un)repeat_batch functions
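A sketch of what such helpers could look like (hypothetical names mirroring the issue, not the repo's actual functions), so that the round-trip invariant above holds:

import torch

def repeat_batch(x: torch.Tensor, n: int) -> torch.Tensor:
    # [b, ...] -> [b * n, ...], each sample repeated n times contiguously
    return x.unsqueeze(1).expand(x.shape[0], n, *x.shape[1:]).reshape(-1, *x.shape[1:])

def unrepeat_batch(x: torch.Tensor, n: int) -> torch.Tensor:
    # [b * n, ...] -> [b, n, ...]
    return x.view(-1, n, *x.shape[1:])

x = torch.randn(4, 3)
assert torch.allclose(x, unrepeat_batch(repeat_batch(x, 5), 5)[:, 0])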

[Refactoring][Minor] split the decoder and the policy for the mdam model

The first submitted version of the MDAM model works properly, but the code is massive: the decoding process is included in the policy. Do a refactoring to split the decoder and the policy, to make the structure clearer and meet our flexibility requirements.

This task will not take long; I will use one or two hours today (Saturday) to finish it.

PPO implementation

As discussed today, let's make this work so we can showcase another RL algorithm for training the models

Mystery: `einops` works, but `view` does not?

We encountered the following problem:

import torch
from einops import rearrange

num_heads = 8

a = torch.randn(512, 20, 128)

# einops
a_einops = rearrange(a, 'b n (h d) -> b h n d', h=num_heads)

# torch
batch_size, length, hidden_dim = a.shape
a_torch = a.view(batch_size, num_heads, length, -1)

print(a_einops.shape)
print(a_torch.shape)

print(torch.allclose(a_einops, a_torch))

False

Why is this the case? By substituting view with einops, Attention works as it should
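For what it is worth, my understanding (not a confirmed fix in the repo): the einops pattern first splits the last dimension into (h, d) and then permutes the axes, whereas view(batch_size, num_heads, length, -1) regroups the flat memory without any permutation, so elements end up in different positions. A self-contained check:

import torch
from einops import rearrange

num_heads = 8
a = torch.randn(512, 20, 128)

a_einops = rearrange(a, "b n (h d) -> b h n d", h=num_heads)
# split the last dim first, then swap the n and h axes:
a_torch_fixed = a.view(512, 20, num_heads, -1).transpose(1, 2)
print(torch.allclose(a_einops, a_torch_fixed))  # True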

Deprecate `init_obs` and improve dataloading efficiency

Right now we are loading data with init_obs instead of the environments' native reset function, for efficiency reasons.

This is because stacking TensorDicts is very slow for some reason (see notebooks), while re-creating them on the fly is very fast

  • RL4COBaseEnv
  • Refactor models
  • Refactor environments

[BUG] SharedBaseline on_dim default value should be -1

Describe the bug

I think the default value of on_dim here in SharedBaseline should be -1.

If not, it raises an error.

To Reproduce

Simply change the baseline and run

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have provided a minimal working example to reproduce the bug (required)

[BUG] PDPEnv is not importable

Describe the bug

PDPEnv is not importable.

Python 3.9.16 (main, Mar  8 2023, 14:00:05)
[GCC 11.2.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from rl4co.envs import PDPEnv
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/silab9/proj/rl4co/rl4co/envs/__init__.py", line 1, in <module>
    from rl4co.envs.base import RL4COEnvBase
  File "/home/silab9/proj/rl4co/rl4co/envs/base.py", line 9, in <module>
    from rl4co.data.dataset import TensorDictDataset
ModuleNotFoundError: No module named 'rl4co.data'
>>>

To Reproduce

Steps to reproduce the behavior.


pip install -e .

from rl4co.envs import PDPEnv

Expected behavior

PDPEnv should be importable.


Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have provided a minimal working example to reproduce the bug (required)

[BUG] When I run the SDVRP example, it raises an error

Hello, I found a bug when running the SDVRP problem; here are my settings:
config = DictConfig(
    {
        "data": {
            "train_size": 10000,
            "val_size": 100,
            "test_size": 2,
            "batch_size": 50,
            "generate_data": True,
        },
        "optimizer": {"lr": 1e-4},
        "path": {"data_dir": "data111/"},
    }
)

# Environment, Model, and Lightning Module
sdvrpenv = SDVRPEnv(
    num_loc=20,
    min_loc=0,
    max_loc=1,
    min_demand=1,
    max_demand=10,
    vehicle_capacity=1.0,
    capacity=1.0,
    # train_file="tsp/tsp20_test_seed1234.npz",
    # val_file="tsp/tsp20_test_seed1234.npz",
    # test_file="tsp/tsp20_test_seed1234.npz",
    seed=None,
    device="cuda",
)
model = AttentionModel(sdvrpenv)
lit_module = RL4COLitModule(config, sdvrpenv, model)

# Trainer
trainer = L.Trainer(
    max_epochs=3,  # only few epochs
    accelerator="gpu",  # use GPU if available, else you can use others as "cpu"
    logger=None,  # can replace with WandbLogger, TensorBoardLogger, etc.
    precision="16-mixed",  # Lightning will handle faster training with mixed precision
    gradient_clip_val=1.0,  # clip gradients to avoid exploding gradients
    reload_dataloaders_every_n_epochs=1,  # necessary for sampling new data
)

trainer.fit(lit_module)
trainer.test(lit_module)

Then it raises an error at: glimpse_k = cached.glimpse_key + glimpse_key_dynamic
RuntimeError: The size of tensor a (21) must match the size of tensor b (20) at non-singleton dimension 1
It looks like the SDVRP env may not be complete? When I ran an older version, I found this env was not finished either.
Thanks for reading.

[Feature] Multi-agent environments with `min-max` objective

As discussed yesterday with @alstn12088, we could also add mTSP and mPDP, and include Equity-Transformer in the benchmark @Leaveson.

I have already made a free-style mTSP implementation here.
As for context and embeddings, you may find them here.
Finally, this notebook shows a simple training run for mTSP - note that most probably you can add your knowledge and make it work better!

Extending to mPDP: I can make the environment; basically, just extend mTSP to deal with coupled nodes (we have already implemented PDP here).
These multi-agent environments with a min-max objective would be a fine addition to our collection :)

  • mTSP
  • mPDP

[Feature enhancement] Edge handling in GCNEnc

I found that the current implementation of GCNEnc cannot support cases where we test the model on inputs it was not trained on (e.g. a different number of nodes).
This could be critical, especially when we measure the generalization performance of the trained model.

An easy fix: inside the forward, when the given x.shape[1] != self.edge_index[0].max() (or something smarter), we can reconstruct the edges from the inputs; a rough sketch follows the link below.

https://github.com/kaist-silab/rl4co/blob/daf558a8ae42a4c64ba8464135c3af85423ad469/rl4co/models/nn/graph/gcn.py#LL52C39-L52C49
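A rough sketch of rebuilding a fully connected edge_index on the fly (hypothetical helper and attribute names, just to illustrate the fix):

import torch

def fully_connected_edge_index(num_nodes: int, device=None) -> torch.Tensor:
    # all directed edges (i, j) with i != j, shape [2, num_nodes * (num_nodes - 1)]
    idx = torch.arange(num_nodes, device=device)
    src, dst = torch.meshgrid(idx, idx, indexing="ij")
    mask = src != dst
    return torch.stack([src[mask], dst[mask]], dim=0)

# inside the encoder's forward (hypothetical attribute names):
# if x.shape[1] != self.num_nodes:
#     self.edge_index = fully_connected_edge_index(x.shape[1], device=x.device)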

[BUG] Can't install rl4co in CPython3.11

Describe the bug

Can't install rl4co with pip install rl4co or pip install -e . in CPython3.11.

This is because torchrl is not supported in CPython3.11 (see this torchrl issue).

To Reproduce

Please refer to the torchrl issue.


Reason and Possible fixes

Maybe specify the right python version that works (e.g. 3.10)?

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have provided a minimal working example to reproduce the bug (required)

Remove `log_cost` option

At the moment, we can choose whether to log the cost or the reward in PyTorch Lightning here, although with some tricks.

However, I believe this is not a good option: model_checkpoint is known to crash if we do not log the correct value (reward or cost). The end-user experience would be simpler if we just used the reward for all environments, even though it would be -cost for some routing problems such as the TSP

TorchRL support

Hi guys!
Great work on the library.
Just leaving this here to let you know that we're happy to support the effort in any way we can: if you spot issues or missing features in torchrl / tensordict just let us know.
If you'd like to be integrated in some design decisions feel free to let us know too! We'll ping you when relevant PRs are out.

[Minor] Rename `observation` as `loc`

Observation is generally fine, but it may be misleading: the td already carries observations in the form of a TensorDict, so we would be better off renaming observation to loc

[BUG] possible bug in Rollout Baseline

In both mTSP and PDP, with the rollout baseline, we may get exploding behavior (the loss increases after some time).
I suspect this may be due to gradient clipping by PyTorch Lightning, so we may have to investigate

[BUG] torchrl error

Describe the bug

When running the code in the quickstart, an error occurs:

ImportError: /usr/local/lib/python3.10/dist-packages/torchrl/_torchrl.so: undefined symbol: _ZN5torch8autograd13_wrap_outputsERKSt6vectorIN2at6TensorESaIS3_EERKSt13unordered_setIPN3c1010TensorImplESt4hashISB_ESt8equal_toISB_ESaISB_EESJ_NS9_8ArrayRefINS9_8optionalIS3_EEEERKSt10shared_ptrINS0_4NodeEESt8functionIFS5_S5_S5_EESJ_

I tested it locally, and the same issue occurs.
My environment:

python 3.11
rl4co 0.2.0
pytorch 2.0.1

To Reproduce

Just open your link

Additional context

I remember that the code could still run in the older version.

Checklist

  • I have checked that there is no similar issue in the repo (required)
  • I have provided a minimal working example to reproduce the bug (required)

Rename `base` to `policy`

Rename base to policy for clarity.

  • Policy becomes the backbone of the model
  • Full model is Policy + REINFORCE Baseline (or full algorithm considering the RL training method)

[BUG] POMO `unbatchify` bug

This happens with the latest version of the code, and only during validation/test, for the following line:

aug_reward = unbatchify(max_reward, self.num_augment)
  File "/home/botu/Dev/rl4co/rl4co/utils/ops.py", line 19, in unbatchify
    return x.view(repeats, s[0] // repeats, *s[1:]).permute(1, 0, *range(2, len(s) + 1))
RuntimeError: shape '[8, 4, 8]' is invalid for input of size 288

We need to check the shapes, but possibly not modify the batchify/unbatchify functions. Better to do this after the new testing features have been added

Additional features

As @alstn12088 and I discussed today, here are some features to add. Adding this issue to track them

  • Greedy-only inference mode; in SymNCO, equivalent to setting the reward to [:, 0, 0]
  • Sampling mode for all models during inference: enlarge the batch to e.g. 128 and take the best out of all samples -> this will be done directly in the eval script
  • Additional testing metrics: take the best reward out of only the augmentations of SymNCO (perhaps; minor, since we would need a matrix of things to do)
  • Adjust the SoftMax temperature during testing only (see the sketch below)
  • Change POMO to 6 encoder layers by default (minor, maybe not fair for comparing with other models)
  • Add counters for the number of steps, samples, and augmented samples
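Regarding the SoftMax temperature point above, a minimal illustration of temperature-scaled sampling (generic, not the repo's decoding code):

import torch

def sample_with_temperature(logits: torch.Tensor, temperature: float = 1.0) -> torch.Tensor:
    # temperature < 1 -> sharper (closer to greedy); temperature > 1 -> more exploration
    probs = torch.softmax(logits / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1).squeeze(-1)

logits = torch.randn(4, 20)  # [batch, num_actions]
actions = sample_with_temperature(logits, temperature=0.5)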

[Feature] Various encoder implementations

Implement different encoders to show that our repository is flexible; this could add more impact than going straight for MatNet.

Tasks:

  • MPNN encoder;
  • GCN encoder;

Other encoders for future work.
