Comments (6)
Do you mean adding exploration noise during training rollouts? It's a bit tricky and one of the heftier technical debts of tianshou. If that's what you mean, I'll write a short explanation of the current status here.
from tianshou.
Thank you for your prompt and professional reply!
As I understand it, tianshou.trainer.BaseTrainer.train_step() uses train_collector.collect() to gather training data from the environment. The usage is shown in the code below (tianshou/trainer/base.py line 436):
result = self.train_collector.collect(
    n_step=self.step_per_collect,
    n_episode=self.episode_per_collect,
)
Inside collect() there is a piece of code as follows (tianshou/data/collector.py line 285):
if random:
    try:
        act_sample = [self._action_space[i].sample() for i in ready_env_ids]
    except TypeError:  # envpool's action space is not for per-env
        act_sample = [self._action_space.sample() for _ in ready_env_ids]
    act_sample = self.policy.map_action_inverse(act_sample)  # type: ignore
    self.data.update(act=act_sample)
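The per-env sampling with its envpool fallback can be sketched in plain Python. MockSpace and sample_random_actions below are illustrative stand-ins, not tianshou classes; the try/except mirrors the logic in the snippet above:

```python
import random

class MockSpace:
    """Stand-in for a gym-style action space with a sample() method."""
    def __init__(self, n):
        self.n = n

    def sample(self):
        return random.randrange(self.n)

def sample_random_actions(action_space, ready_env_ids):
    # A vectorized env usually exposes one space per sub-env, so
    # indexing works; envpool exposes a single shared space that
    # cannot be indexed, which raises TypeError.
    try:
        return [action_space[i].sample() for i in ready_env_ids]
    except TypeError:  # single shared space (envpool-style)
        return [action_space.sample() for _ in ready_env_ids]

# Per-env spaces: indexing works, one action per ready env.
per_env = [MockSpace(4) for _ in range(3)]
print(len(sample_random_actions(per_env, [0, 2])))  # 2

# Shared space: indexing raises TypeError, the fallback kicks in.
shared = MockSpace(4)
print(len(sample_random_actions(shared, [0, 1, 2])))  # 3
```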
My question is how to pass the random parameter to the collect() function in train_step when using the BaseTrainer class. The trainer is used at the outermost layer as follows, and I can't find a way to pass in a random flag:
result = OffpolicyTrainer(
    policy=policy,
    train_collector=train_collector,
    test_collector=test_collector,
    max_epoch=args.epoch,
    step_per_epoch=args.step_per_epoch,
    step_per_collect=args.step_per_collect,
    episode_per_test=args.test_num,
    batch_size=args.batch_size,
    update_per_step=args.update_per_step,
    stop_fn=stop_fn,
    train_fn=train_fn,
    test_fn=test_fn,
    save_best_fn=save_best_fn,
    logger=logger,
).run()
I see, I'll check it and write an explanation tomorrow morning. Thanks for the good question; we'll add some documentation (or maybe even a small refactoring, if possible) to make that clearer.
Thanks!
May I ask what your intended use-case is? If it is collecting warm-up steps for OffPolicy algorithms, these are currently collected before the Collector is handed to the Trainer. Have a look at the examples for algorithms like TD3 or SAC.
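The warm-up pattern described here (random collection happens before the collector is handed to the trainer) can be illustrated with a stand-alone sketch. MockCollector is a hypothetical stand-in that only mocks the control flow, not tianshou's actual Collector:

```python
class MockCollector:
    """Minimal stand-in for a collector: records how each step was taken."""
    def __init__(self):
        self.buffer = []

    def collect(self, n_step, random=False):
        # random=True stands in for sampling actions from the action
        # space instead of querying the policy.
        for _ in range(n_step):
            self.buffer.append("random" if random else "policy")

train_collector = MockCollector()

# 1) Warm-up: fill the replay buffer with purely random transitions
#    *before* the collector is handed to the trainer.
train_collector.collect(n_step=1000, random=True)

# 2) Only then construct the trainer with the pre-filled collector,
#    e.g. OffpolicyTrainer(train_collector=train_collector, ...).
#    Subsequent train steps collect with the policy.
train_collector.collect(n_step=10)  # what a train step would do

print(train_collector.buffer.count("random"))  # 1000
print(train_collector.buffer.count("policy"))  # 10
```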
Sorry, the last days were very full, only got around to it now.
- When instantiating BaseTrainer directly, you will have to create a Collector and pass it as train_collector. Then you can pass random=True when instantiating the corresponding Collector instance.
- Currently the train_step interface doesn't allow any additional controls. We might adjust it in the near future. If you want to hack your way through it, you could manually adjust the wrapped train_collector. So you could do
my_trainer.train_collector.random = True
# Do something that you want
my_trainer.train_step()
my_trainer.train_collector.random = False
While it's not pretty, it will work. If you want to use the high-level interfaces, such hacking would be more difficult, but probably still possible.
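If you go with that hack, a try/finally (wrapped in a context manager here) guarantees the flag is reset even when the train step raises. This is a generic stdlib sketch; random_exploration and Dummy are hypothetical names, and it only assumes an object exposing a mutable random attribute:

```python
from contextlib import contextmanager

@contextmanager
def random_exploration(collector):
    """Temporarily force random action sampling on a collector-like
    object exposing a mutable `random` attribute (hypothetical)."""
    collector.random = True
    try:
        yield collector
    finally:
        # Reset even if the body raises, so later train steps go
        # back to policy-driven action selection.
        collector.random = False

class Dummy:
    """Stand-in for my_trainer.train_collector."""
    random = False

trainer_collector = Dummy()
with random_exploration(trainer_collector):
    assert trainer_collector.random  # random exploration is on here
    # my_trainer.train_step() would go here
print(trainer_collector.random)  # False
```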
To better understand how we can adjust the interfaces for improving user experience, it would be great to understand your use case, as @maxhuettenrauch has pointed out.
Hope this answer helps you :) @DLoveS1314