I installed CUDA 10.0 and built the ManagedCUDA DLLs (x64, Release).
Before I started training, I copied several files into the same directory as "Seq2SeqConsole.exe". This is the log from the training run:
info,2/11/2019 11:17:18 AM Command Line = '-TaskName train -WordVectorSize 50 -HiddenSize 50 -LearningRate 0.1 -ModelFilePath alarm.model -SrcVocab data_vocab.source -TgtVocab data_vocab.target -SrcLang en -TgtLang lf -TrainCorpusPath C:/Users/vlad/Downloads -ArchType 0 -Depth 1'
info,2/11/2019 11:17:18 AM Source Language = 'en'
info,2/11/2019 11:17:18 AM Target Language = 'lf'
info,2/11/2019 11:17:18 AM SSE Enable = 'True'
info,2/11/2019 11:17:18 AM SSE Size = '256'
info,2/11/2019 11:17:18 AM Processor counter = '8'
info,2/11/2019 11:17:18 AM Hidden Size = '50'
info,2/11/2019 11:17:18 AM Word Vector Size = '50'
info,2/11/2019 11:17:18 AM Learning Rate = '0.1'
info,2/11/2019 11:17:18 AM Network Layer = '1'
info,2/11/2019 11:17:18 AM Gradient Clip = '5'
info,2/11/2019 11:17:18 AM Dropout Ratio = '0.1'
info,2/11/2019 11:17:18 AM Batch Size = '1'
info,2/11/2019 11:17:18 AM Arch Type = 'GPU_CUDA'
info,2/11/2019 11:17:18 AM Device Ids = '0'
info,2/11/2019 11:17:18 AM Loading model from 'alarm.model'...
info,2/11/2019 11:17:18 AM Initialize device '0'
Precompiling GatherScatterKernels
Precompiling Im2ColKernels
Precompiling IndexSelectKernels
Precompiling ReduceDimIndexKernels
Precompiling CudaReduceAllKernels
Precompiling CudaReduceKernels
Precompiling ElementwiseKernels
Precompiling FillCopyKernels
Precompiling SoftmaxKernels
Precompiling SpatialMaxPoolKernels
Precompiling VarStdKernels
info,2/11/2019 11:24:37 AM Loading model from 'alarm.model'...
info,2/11/2019 11:24:37 AM Initializing weights...
info,2/11/2019 11:24:37 AM Initializing weights for device '0'
info,2/11/2019 11:24:37 AM Initializing encoders and decoders for device '0'...
info,2/11/2019 11:24:37 AM Start to train...
info,2/11/2019 11:24:37 AM Shuffling training corpus...
info,2/11/2019 11:24:37 AM Base learning rate is '0.1' at epoch '0'
info,2/11/2019 11:24:37 AM Cleaning cache of weights optmiazation.'
info,2/11/2019 11:24:37 AM Start to process training corpus.
info,2/11/2019 11:24:37 AM Shuffling training corpus...
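Then it hangs. The Precompiling steps were also slow: judging by the timestamps, about seven minutes passed between "Initialize device '0'" (11:17:18) and the next log line (11:24:37). While it is stuck, one CPU core is at 100% load and 29 GB (!) of RAM is in use. My system has 64 GB of RAM, so running out of memory does not appear to be the issue.
Note: CPU-only training works just fine.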
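To make it easier to see where the time goes, here is a small Python sketch (not part of Seq2SeqSharp; the helper names `parse_timestamp` and `largest_gap` are my own) that finds the longest pause between timestamped lines in a log like the one above. It assumes the timestamps are in US month/day/year order with a 12-hour clock, as they appear in my log, and it skips lines without a timestamp such as the "Precompiling ..." ones.

```python
from datetime import datetime

def parse_timestamp(line):
    """Return the datetime from a line like 'info,2/11/2019 11:17:18 AM ...', else None."""
    if not line.startswith("info,"):
        return None
    # Date, time and AM/PM are the first three space-separated fields after 'info,'.
    parts = line[len("info,"):].split(" ", 3)
    if len(parts) < 3:
        return None
    try:
        # Assumption: month/day/year order, 12-hour clock.
        return datetime.strptime(" ".join(parts[:3]), "%m/%d/%Y %I:%M:%S %p")
    except ValueError:
        return None

def largest_gap(lines):
    """Return (seconds, line_before, line_after) for the longest pause in the log."""
    stamped = [(parse_timestamp(l), l) for l in lines]
    stamped = [(t, l) for t, l in stamped if t is not None]
    best = (0.0, None, None)
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = (t1 - t0).total_seconds()
        if gap > best[0]:
            best = (gap, l0, l1)
    return best

# A few lines copied from the log above.
sample = [
    "info,2/11/2019 11:17:18 AM Initialize device '0'",
    "Precompiling GatherScatterKernels",
    "Precompiling VarStdKernels",
    "info,2/11/2019 11:24:37 AM Loading model from 'alarm.model'...",
]

seconds, before, after = largest_gap(sample)
print(f"{seconds:.0f} s between {before!r} and {after!r}")
```

For these lines the gap is 439 seconds (7 min 19 s), all of it spent in the kernel precompilation after "Initialize device '0'". The final hang does not show up this way, because no further line is logged after the last "Shuffling training corpus...".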