Thanks for the very interesting project and for your contribution to NNUE training.
python train.py total_3m_d14.bin total_3m_d16_nnue.bin --lambda 1.0 --val_check_interval 2000 --threads 2 --batch-size 16384 --progress_bar_refresh_rate 20
RuntimeError: Pinned memory requires CUDA. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -INCLUDE:?warp_size@cuda@at@@YAHXZ in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using link on your binary to see if there is a dependency on *_cuda.dll library.
I am running Windows 10 and I have installed CUDA. The full output of the command is below:
(env) C:\Users\volodymyr\Documents\Sources\nnue-pytorch>python train.py total_3m_d14.bin total_3m_d16_nnue.bin --lambda 1.0 --val_check_interval 2000 --threads 2 --batch-size 16384 --progress_bar_refresh_rate 20
Feature set: HalfKP^
Num real features: 41024
Num virtual features: 704
Num features: 41728
Training with total_3m_d14.bin validating with total_3m_d16_nnue.bin
Seed 42
Using batch size 16384
Smart fen skipping: False
Random fen skipping: 0
limiting torch to 2 threads.
Using log dir logs/
C:\Users\volodymyr\Documents\Sources\nnue-pytorch\env\lib\site-packages\pytorch_lightning\utilities\distributed.py:49: UserWarning: ModelCheckpoint(save_last=True, monitor=None) is a redundant configuration. You can save the last checkpoint with ModelCheckpoint(save_top_k=None, monitor=None).
warnings.warn(*args, **kwargs)
GPU available: False, used: False
TPU available: None, using: 0 TPU cores
Using c++ data loader
Ranger optimizer loaded.
Gradient Centralization usage = True
GC applied to both conv and fc layers
| Name | Type | Params
----------------------------------
0 | input | Linear | 10.7 M
1 | l1 | Linear | 16.4 K
2 | l2 | Linear | 1.1 K
3 | output | Linear | 33
----------------------------------
10.7 M Trainable params
0 Non-trainable params
10.7 M Total params
Validation sanity check: 0it [00:00, ?it/s]Traceback (most recent call last):
File "train.py", line 99, in <module>
main()
File "train.py", line 96, in main
trainer.fit(nnue, train, val)
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 470, in fit
results = self.accelerator_backend.train()
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\env\lib\site-packages\pytorch_lightning\accelerators\cpu_accelerator.py", line 62, in train
results = self.train_or_test()
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\env\lib\site-packages\pytorch_lightning\accelerators\accelerator.py", line 69, in train_or_test
results = self.trainer.train()
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 492, in train
self.run_sanity_check(self.get_model())
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 690, in run_sanity_check
_, eval_results = self.run_evaluation(test_mode=False, max_batches=self.num_sanity_val_batches)
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\env\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 593, in run_evaluation
for batch_idx, batch in enumerate(dataloader):
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\env\lib\site-packages\torch\utils\data\dataloader.py", line 435, in __next__
data = self._next_data()
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\env\lib\site-packages\torch\utils\data\dataloader.py", line 475, in _next_data
data = self._dataset_fetcher.fetch(index) # may raise StopIteration
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\env\lib\site-packages\torch\utils\data\_utils\fetch.py", line 46, in fetch
data = self.dataset[possibly_batched_index]
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\nnue_dataset.py", line 151, in __getitem__
return next(self.iter)
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\nnue_dataset.py", line 89, in __next__
tensors = v.contents.get_tensors(self.device)
File "C:\Users\volodymyr\Documents\Sources\nnue-pytorch\nnue_dataset.py", line 32, in get_tensors
white_values = torch.from_numpy(np.ctypeslib.as_array(self.white_values, shape=(self.num_active_white_features,))).pin_memory().to(device=device, non_blocking=True)
RuntimeError: Pinned memory requires CUDA. PyTorch splits its backend into two shared libraries: a CPU library and a CUDA library; this error has occurred because you are trying to use some CUDA functionality, but the CUDA library has not been loaded by the dynamic linker for some reason. The CUDA library MUST be loaded, EVEN IF you don't directly use any symbols from the CUDA library! One common culprit is a lack of -INCLUDE:?warp_size@cuda@at@@YAHXZ in your link arguments; many dynamic linkers will delete dynamic library dependencies if you don't depend on any of their symbols. You can check if this has occurred by using link on your binary to see if there is a dependency on *_cuda.dll library.
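For what it's worth, the failing call appears to be the unconditional .pin_memory() in nnue_dataset.py's get_tensors, which CPU-only PyTorch builds reject. A possible workaround (my own sketch, not an official fix; maybe_pin_and_move is a hypothetical helper, written duck-typed so it only relies on the tensor's pin_memory()/to() methods) would be to pin host memory only when the target device is CUDA:

```python
def maybe_pin_and_move(tensor, device):
    """Hypothetical guard for the .pin_memory() call in get_tensors.

    CPU-only PyTorch builds raise 'Pinned memory requires CUDA', so we
    only pin when the target device is a CUDA device. `tensor` can be
    any object exposing torch.Tensor-style pin_memory() and to().
    """
    if str(device).startswith("cuda"):
        # Pinned host memory is what makes the non_blocking copy async.
        return tensor.pin_memory().to(device=device, non_blocking=True)
    # On CPU there is nothing to pin; a plain .to() suffices.
    return tensor.to(device=device)
```

With a guard like this, CPU-only runs would fall back to a plain copy while CUDA runs keep the pinned, non-blocking transfer.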
Thanks for any help or hints on this issue.
With best regards,
Volodymyr