hmomin / finenvs
Fast Parallel Simulation of Financial Time Series Environments for Reinforcement Learning
License: GNU General Public License v3.0
I'm having trouble getting SAC to learn Cartpole effectively. Below is sample output from one of the better trials; in most trials, it can't even break above a total reward of 10.
Also, there is a memory leak somewhere that triggers after about 2.2 million samples for me: based on the error message, it looks like it results from not detaching the output of `means = self.forward(states)` in the actor file, but I'll let you see to it.
Importing module 'gym_37' (/home/momin/Documents/isaacgym/python/isaacgym/_bindings/linux-x86_64/gym_37.so)
Setting GYM_USD_PLUG_INFO_PATH to /home/momin/Documents/isaacgym/python/isaacgym/_bindings/linux-x86_64/usd/plugInfo.json
PyTorch version 1.10.2+cu113
Device count 1
/home/momin/Documents/isaacgym/python/isaacgym/_bindings/src/gymtorch
Using /home/momin/.cache/torch_extensions/py37_cu113 as PyTorch extensions root...
Emitting ninja build file /home/momin/.cache/torch_extensions/py37_cu113/gymtorch/build.ninja...
Building extension module gymtorch...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module gymtorch...
/home/momin/anaconda3/envs/rlgpu/lib/python3.7/site-packages/gym/spaces/box.py:112: UserWarning: WARN: Box bound precision lowered by casting to float32
logger.warn(f"Box bound precision lowered by casting to {self.dtype}")
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
num samples: 48640 - evaluation return: 42.664555 - mean training return: 19.240843 - std dev training return: 21.298613
num samples: 75264 - evaluation return: 17.117111 - mean training return: 37.337551 - std dev training return: 24.090464
num samples: 96256 - evaluation return: 9.022424 - mean training return: 34.875938 - std dev training return: 26.308626
num samples: 116224 - evaluation return: 7.290071 - mean training return: 33.133209 - std dev training return: 25.416498
num samples: 136192 - evaluation return: 8.945022 - mean training return: 32.170544 - std dev training return: 25.197166
num samples: 157184 - evaluation return: 11.210396 - mean training return: 32.678501 - std dev training return: 22.590132
num samples: 178176 - evaluation return: 8.651594 - mean training return: 33.387737 - std dev training return: 24.583792
num samples: 203264 - evaluation return: 19.072805 - mean training return: 33.129204 - std dev training return: 22.507452
num samples: 224256 - evaluation return: 10.693352 - mean training return: 33.824829 - std dev training return: 22.631231
num samples: 242688 - evaluation return: 7.536970 - mean training return: 35.717468 - std dev training return: 26.653385
num samples: 263168 - evaluation return: 9.159291 - mean training return: 39.677971 - std dev training return: 26.461964
num samples: 284672 - evaluation return: 13.753101 - mean training return: 38.719288 - std dev training return: 25.779493
num samples: 306688 - evaluation return: 11.256489 - mean training return: 41.800442 - std dev training return: 27.398203
num samples: 326656 - evaluation return: 7.761618 - mean training return: 41.407909 - std dev training return: 30.619041
num samples: 349184 - evaluation return: 12.650271 - mean training return: 39.605946 - std dev training return: 26.463860
num samples: 368640 - evaluation return: 7.406431 - mean training return: 43.363243 - std dev training return: 31.207508
num samples: 391168 - evaluation return: 13.261372 - mean training return: 40.798859 - std dev training return: 28.053375
num samples: 411136 - evaluation return: 8.921464 - mean training return: 47.900318 - std dev training return: 34.384724
num samples: 431616 - evaluation return: 10.965508 - mean training return: 41.790977 - std dev training return: 30.708284
num samples: 459264 - evaluation return: 24.945358 - mean training return: 42.953075 - std dev training return: 33.013519
num samples: 480256 - evaluation return: 10.809840 - mean training return: 41.681019 - std dev training return: 29.536516
num samples: 500736 - evaluation return: 10.699786 - mean training return: 42.309608 - std dev training return: 30.666838
num samples: 519680 - evaluation return: 7.754902 - mean training return: 38.211960 - std dev training return: 29.142509
num samples: 538112 - evaluation return: 7.239990 - mean training return: 40.609222 - std dev training return: 30.046165
num samples: 557568 - evaluation return: 7.792455 - mean training return: 41.511486 - std dev training return: 29.129290
num samples: 579584 - evaluation return: 15.246098 - mean training return: 43.569519 - std dev training return: 30.588730
num samples: 598528 - evaluation return: 6.837287 - mean training return: 44.639370 - std dev training return: 32.618675
num samples: 616960 - evaluation return: 8.379328 - mean training return: 44.910011 - std dev training return: 30.007103
num samples: 635392 - evaluation return: 7.636024 - mean training return: 41.894653 - std dev training return: 29.709085
num samples: 654848 - evaluation return: 10.105590 - mean training return: 42.790398 - std dev training return: 29.942787
num samples: 674304 - evaluation return: 9.454279 - mean training return: 42.145344 - std dev training return: 30.102503
num samples: 696832 - evaluation return: 15.113770 - mean training return: 42.603848 - std dev training return: 26.551289
num samples: 743936 - evaluation return: 69.809341 - mean training return: 44.716377 - std dev training return: 31.713352
num samples: 765952 - evaluation return: 14.800228 - mean training return: 48.687096 - std dev training return: 35.265812
num samples: 784384 - evaluation return: 8.898764 - mean training return: 44.878021 - std dev training return: 29.701893
num samples: 810496 - evaluation return: 22.929743 - mean training return: 42.003948 - std dev training return: 29.101030
num samples: 830464 - evaluation return: 8.730614 - mean training return: 46.895416 - std dev training return: 30.673267
num samples: 850944 - evaluation return: 10.750460 - mean training return: 44.366295 - std dev training return: 32.735119
num samples: 869376 - evaluation return: 7.646038 - mean training return: 42.031437 - std dev training return: 30.974838
num samples: 888320 - evaluation return: 8.660542 - mean training return: 45.897411 - std dev training return: 35.273087
num samples: 910336 - evaluation return: 14.657757 - mean training return: 42.573399 - std dev training return: 29.213062
num samples: 951808 - evaluation return: 59.844833 - mean training return: 44.369228 - std dev training return: 32.552788
num samples: 972288 - evaluation return: 10.970460 - mean training return: 42.581337 - std dev training return: 26.832909
num samples: 990208 - evaluation return: 8.688063 - mean training return: 42.989204 - std dev training return: 27.803591
num samples: 1009664 - evaluation return: 10.115323 - mean training return: 44.869339 - std dev training return: 32.852955
num samples: 1028608 - evaluation return: 7.315423 - mean training return: 41.035736 - std dev training return: 32.797501
num samples: 1051648 - evaluation return: 17.410482 - mean training return: 43.608242 - std dev training return: 32.394970
num samples: 1070080 - evaluation return: 8.257707 - mean training return: 44.351231 - std dev training return: 29.345806
num samples: 1089024 - evaluation return: 7.072944 - mean training return: 44.150719 - std dev training return: 31.034515
num samples: 1107968 - evaluation return: 7.315763 - mean training return: 45.740803 - std dev training return: 29.843706
num samples: 1126912 - evaluation return: 8.030341 - mean training return: 48.802032 - std dev training return: 32.735664
num samples: 1147904 - evaluation return: 12.481560 - mean training return: 46.902039 - std dev training return: 30.762377
num samples: 1165824 - evaluation return: 7.350004 - mean training return: 49.774536 - std dev training return: 34.108013
num samples: 1184768 - evaluation return: 8.855827 - mean training return: 48.475475 - std dev training return: 33.205433
num samples: 1203200 - evaluation return: 6.800958 - mean training return: 43.822147 - std dev training return: 27.918304
num samples: 1249280 - evaluation return: 60.188492 - mean training return: 48.652798 - std dev training return: 32.888950
num samples: 1267200 - evaluation return: 7.280651 - mean training return: 43.635883 - std dev training return: 29.472729
num samples: 1314816 - evaluation return: 68.751907 - mean training return: 45.681065 - std dev training return: 32.199825
num samples: 1334784 - evaluation return: 10.479751 - mean training return: 46.177887 - std dev training return: 33.436707
num samples: 1363456 - evaluation return: 27.123913 - mean training return: 45.143280 - std dev training return: 32.398781
num samples: 1382912 - evaluation return: 10.328647 - mean training return: 43.507858 - std dev training return: 35.936104
num samples: 1401856 - evaluation return: 7.638084 - mean training return: 44.668758 - std dev training return: 31.289669
num samples: 1432576 - evaluation return: 32.943344 - mean training return: 44.900688 - std dev training return: 31.360880
num samples: 1452544 - evaluation return: 9.221864 - mean training return: 41.564133 - std dev training return: 27.927759
num samples: 1473536 - evaluation return: 11.704432 - mean training return: 48.011837 - std dev training return: 34.653778
num samples: 1496064 - evaluation return: 15.954937 - mean training return: 50.346596 - std dev training return: 35.377712
num samples: 1514496 - evaluation return: 8.035228 - mean training return: 47.771240 - std dev training return: 33.395077
num samples: 1533952 - evaluation return: 8.281386 - mean training return: 41.216488 - std dev training return: 30.946314
num samples: 1553408 - evaluation return: 10.508433 - mean training return: 44.966591 - std dev training return: 29.735842
num samples: 1575936 - evaluation return: 16.217566 - mean training return: 44.983177 - std dev training return: 33.251244
num samples: 1594368 - evaluation return: 8.081646 - mean training return: 44.372837 - std dev training return: 34.626404
num samples: 1615872 - evaluation return: 12.964675 - mean training return: 45.627056 - std dev training return: 30.419598
num samples: 1636864 - evaluation return: 11.474453 - mean training return: 44.596386 - std dev training return: 30.335295
num samples: 1656320 - evaluation return: 9.743287 - mean training return: 48.475723 - std dev training return: 34.589176
num samples: 1676288 - evaluation return: 9.889929 - mean training return: 45.983326 - std dev training return: 32.190174
num samples: 1706496 - evaluation return: 31.201733 - mean training return: 44.044250 - std dev training return: 29.999483
num samples: 1751552 - evaluation return: 67.660858 - mean training return: 47.377201 - std dev training return: 33.140793
num samples: 1771520 - evaluation return: 11.536253 - mean training return: 48.449409 - std dev training return: 33.765598
num samples: 1788928 - evaluation return: 7.400703 - mean training return: 46.131039 - std dev training return: 34.952114
num samples: 1810944 - evaluation return: 14.474745 - mean training return: 42.301899 - std dev training return: 35.764179
num samples: 1831936 - evaluation return: 10.688743 - mean training return: 46.439331 - std dev training return: 31.543478
num samples: 1860096 - evaluation return: 26.456173 - mean training return: 45.223267 - std dev training return: 31.939152
num samples: 1881600 - evaluation return: 10.944726 - mean training return: 44.479214 - std dev training return: 27.474779
num samples: 1905152 - evaluation return: 18.194380 - mean training return: 50.965813 - std dev training return: 35.656303
num samples: 1925120 - evaluation return: 9.182425 - mean training return: 46.331676 - std dev training return: 33.462132
num samples: 1943040 - evaluation return: 7.370675 - mean training return: 49.256039 - std dev training return: 32.469273
num samples: 1963520 - evaluation return: 10.486354 - mean training return: 46.099960 - std dev training return: 32.586018
num samples: 1980416 - evaluation return: 6.084354 - mean training return: 45.396633 - std dev training return: 33.130478
num samples: 2000384 - evaluation return: 11.129582 - mean training return: 45.089146 - std dev training return: 33.360134
num samples: 2021376 - evaluation return: 11.500680 - mean training return: 46.762657 - std dev training return: 32.203106
num samples: 2042880 - evaluation return: 12.560404 - mean training return: 49.993290 - std dev training return: 33.039204
num samples: 2061824 - evaluation return: 9.376321 - mean training return: 45.685165 - std dev training return: 33.842228
num samples: 2081792 - evaluation return: 11.399903 - mean training return: 44.692337 - std dev training return: 33.136116
num samples: 2101760 - evaluation return: 10.711651 - mean training return: 44.341721 - std dev training return: 31.529741
num samples: 2128896 - evaluation return: 24.776752 - mean training return: 47.567459 - std dev training return: 30.484776
num samples: 2200064 - evaluation return: 87.675751 - mean training return: 46.126362 - std dev training return: 31.622690
Traceback (most recent call last):
File "examples/SAC_MLP_Isaac_Gym.py", line 30, in <module>
train_SAC_MLP_on_environiment("Cartpole")
File "examples/SAC_MLP_Isaac_Gym.py", line 20, in train_SAC_MLP_on_environiment
actions = agent.step(states)
File "/home/momin/Documents/GitHub/FinEnvs/finenvs/agents/SAC/SAC_agent.py", line 117, in step
actions = self.actor.get_actions_and_log_probs(states)[0]
File "/home/momin/Documents/GitHub/FinEnvs/finenvs/agents/SAC/actor.py", line 58, in get_actions_and_log_probs
distribution = self.get_distribution(states)
File "/home/momin/Documents/GitHub/FinEnvs/finenvs/agents/SAC/actor.py", line 51, in get_distribution
mean = self.forward(states)
File "/home/momin/Documents/GitHub/FinEnvs/finenvs/agents/networks/multilayer_perceptron.py", line 26, in forward
return self.network(inputs)
File "/home/momin/anaconda3/envs/rlgpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/momin/anaconda3/envs/rlgpu/lib/python3.7/site-packages/torch/nn/modules/container.py", line 141, in forward
input = module(input)
File "/home/momin/anaconda3/envs/rlgpu/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/momin/anaconda3/envs/rlgpu/lib/python3.7/site-packages/torch/nn/modules/activation.py", line 499, in forward
return F.elu(input, self.alpha, self.inplace)
File "/home/momin/anaconda3/envs/rlgpu/lib/python3.7/site-packages/torch/nn/functional.py", line 1391, in elu
result = torch._C._nn.elu(input, alpha)
RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 0; 10.75 GiB total capacity; 8.52 GiB already allocated; 31.38 MiB free; 8.60 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
real 0m36.206s
user 0m37.513s
sys 0m3.575s
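For what it's worth, the OOM pattern above (steady growth until allocation fails during a rollout `forward`) is consistent with autograd graphs being retained on actions collected during environment stepping. A minimal sketch of the usual fix, wrapping rollout action selection in `torch.no_grad()` so no graph is built — note the class and method names here are illustrative, not the repo's exact API:

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Toy SAC-style actor; structure is assumed, not copied from finenvs."""

    def __init__(self, obs_dim: int, act_dim: int):
        super().__init__()
        self.network = nn.Sequential(
            nn.Linear(obs_dim, 64),
            nn.ELU(),
            nn.Linear(64, act_dim),
        )

    def forward(self, states: torch.Tensor) -> torch.Tensor:
        return self.network(states)

    @torch.no_grad()  # rollout actions never need gradients, so build no graph
    def get_rollout_actions(self, states: torch.Tensor) -> torch.Tensor:
        means = self.forward(states)
        # SAC samples from a squashed Gaussian; std omitted to keep the sketch short
        return torch.tanh(means)
```

With the decorator in place, tensors returned during stepping have `requires_grad=False` and their intermediate activations are freed immediately, so memory stays flat regardless of how many samples are collected; gradients are still computed normally inside the training update, which calls `forward` outside the decorated path.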