When I try to compile with a command like this: scons -j


How to use the GPU to speed up clstm? (tmbdev/clstm)

Comments (19)

ASDen commented on September 23, 2024

One thing to note that is not directly mentioned: the minimum required CUDA compute capability is 3.0.
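To make the requirement concrete, here is a minimal sketch of the version comparison in plain C++. The 3.0 threshold is taken from this comment only; the function names are hypothetical and not part of clstm. In a real CUDA program the two numbers would come from the `major` and `minor` fields of `cudaDeviceProp`, as filled in by `cudaGetDeviceProperties()`.

```cpp
#include <string>

// Minimum compute capability quoted in this thread (3.0). Assumption: this
// value comes from the comment, not from the clstm build scripts.
constexpr int kMinMajor = 3;
constexpr int kMinMinor = 0;

// Returns true when (major, minor) is at least the required minimum.
// Hypothetical helper: in CUDA code, pass cudaDeviceProp::major and
// cudaDeviceProp::minor here.
constexpr bool meets_min_compute_capability(int major, int minor) {
    return major > kMinMajor || (major == kMinMajor && minor >= kMinMinor);
}

// Builds a short human-readable verdict, e.g. for a startup log line.
std::string describe_device(int major, int minor) {
    std::string cc = std::to_string(major) + "." + std::to_string(minor);
    return meets_min_compute_capability(major, minor)
               ? "compute capability " + cc + ": OK"
               : "compute capability " + cc + ": too old";
}
```

For example, a device reporting 2.1 would be rejected by this check, while 3.0 or 5.2 would pass.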


Halfish commented on September 23, 2024

@ASDen Yes, the GPU in my school lab is powerful enough. But I wonder how to enable the GPU, because I get a compile error when I run scons gpu=1 clstmocrtrain.
Here is the nvcc.log:

In file included from clstm_compute_cuda.cc:9:0:
clstm_compute.cc:49:0: warning: "CLSTM_ALL_TENSOR" redefined [enabled by default]
 #define CLSTM_ALL_TENSOR
 ^
<command-line>:0:0: note: this is the location of the previous definition
utils.h(230): warning: statement is unreachable

utils.h(247): warning: statement is unreachable

/usr/local/include/eigen3/Eigen/src/Core/util/Memory.h(585): warning: calling a __host__ function from a __host__ __device__ function is not allowed
          detected during:
            instantiation of "void Eigen::internal::smart_copy_helper<T, false>::run(const T *, const T *, T *) [with T=ocropus::IndexPair]" 
(575): here
            instantiation of "void Eigen::internal::smart_copy(const T *, const T *, T *) [with T=ocropus::IndexPair]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Core/util/EmulateArray.h(114): here
            instantiation of "Eigen::array<T, n>::array(std::initializer_list<T>) [with T=ocropus::IndexPair, n=1UL]" 
clstm_compute.cc(101): here

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionMapper.h(251): warning: dynamic initialization in unreachable code
          detected during:
            instantiation of "Eigen::internal::BaseTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment>::Packet Eigen::internal::BaseTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment>::loadPacket(Index, Index) const [with Scalar=ocropus::Float, Index=Eigen::DenseIndex, side=1, Tensor=Eigen::TensorEvaluator<const ocropus::TensorMap2, Eigen::GpuDevice>, nocontract_t=Eigen::array<Eigen::DenseIndex, 1UL>, contract_t=Eigen::array<Eigen::DenseIndex, 1UL>, packet_size=4, inner_dim_contiguous=true, inner_dim_reordered=false, Alignment=0, AlignmentType=0]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(572): here
            instantiation of "void Eigen::EigenFloatContractionKernelInternal16x16<Index,LhsMapper,RhsMapper,OutputMapper,CHECK_LHS_BOUNDARY,CHECK_RHS_BOUNDARY>(LhsMapper, RhsMapper, OutputMapper, float2 (*)[16], float2 (*)[8], Index, Index, Index, Index, Index) [with Index=Eigen::DenseIndex, LhsMapper=Eigen::internal::TensorContractionInputMapper<ocropus::Float, Eigen::DenseIndex, 1, Eigen::TensorEvaluator<const ocropus::TensorMap2, Eigen::GpuDevice>, Eigen::array<Eigen::DenseIndex, 1UL>, Eigen::array<Eigen::DenseIndex, 1UL>, 4, true, false, 0>, RhsMapper=Eigen::internal::TensorContractionInputMapper<ocropus::Float, Eigen::DenseIndex, 0, Eigen::TensorEvaluator<const ocropus::TensorMap2, Eigen::GpuDevice>, Eigen::array<Eigen::DenseIndex, 1UL>, Eigen::array<Eigen::DenseIndex, 1UL>, 4, true, true, 0>, OutputMapper=Eigen::internal::blas_data_mapper<ocropus::Float, Eigen::DenseIndex, 0, 0>, CHECK_LHS_BOUNDARY=false, CHECK_RHS_BOUNDARY=false]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1190): here
            instantiation of "void Eigen::EigenFloatContractionKernel16x16(LhsMapper, RhsMapper, OutputMapper, Index, Index, Index) [with Index=Eigen::DenseIndex, LhsMapper=Eigen::internal::TensorContractionInputMapper<ocropus::Float, Eigen::DenseIndex, 1, Eigen::TensorEvaluator<const ocropus::TensorMap2, Eigen::GpuDevice>, Eigen::array<Eigen::DenseIndex, 1UL>, Eigen::array<Eigen::DenseIndex, 1UL>, 4, true, false, 0>, RhsMapper=Eigen::internal::TensorContractionInputMapper<ocropus::Float, Eigen::DenseIndex, 0, Eigen::TensorEvaluator<const ocropus::TensorMap2, Eigen::GpuDevice>, Eigen::array<Eigen::DenseIndex, 1UL>, Eigen::array<Eigen::DenseIndex, 1UL>, 4, true, true, 0>, OutputMapper=Eigen::internal::blas_data_mapper<ocropus::Float, Eigen::DenseIndex, 0, 0>]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1363): here
            instantiation of "void Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::evalTyped<lhs_inner_dim_contiguous,rhs_inner_dim_contiguous,rhs_inner_dim_reordered,Alignment>(Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::Scalar *) const [with Indices=const ocropus::Axes1, LeftArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const ocropus::TensorMap2, lhs_inner_dim_contiguous=true, rhs_inner_dim_contiguous=true, rhs_inner_dim_reordered=true, Alignment=0]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1281): here
            instantiation of "void Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::evalTo(Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::Scalar *) const [with Indices=const ocropus::Axes1, LeftArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const ocropus::TensorMap2]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1268): here
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::Scalar *) [with Indices=const ocropus::Axes1, LeftArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const ocropus::TensorMap2]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(131): here
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with LeftArgType=Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>, Device=Eigen::GpuDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(219): here
            instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, false>::run(const Expression &, const Eigen::GpuDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>>]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35): here
            instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator=(const OtherDerived &) [with ExpressionType=Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, DeviceType=ocropus::Device, OtherDerived=Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>]" 
clstm_compute.cc(286): here

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionMapper.h(251): warning: dynamic initialization in unreachable code
          detected during:
            instantiation of "Eigen::internal::BaseTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment>::Packet Eigen::internal::BaseTensorContractionMapper<Scalar, Index, side, Tensor, nocontract_t, contract_t, packet_size, inner_dim_contiguous, inner_dim_reordered, Alignment>::loadPacket(Index, Index) const [with Scalar=ocropus::Float, Index=Eigen::DenseIndex, side=0, Tensor=Eigen::TensorEvaluator<const ocropus::TensorMap2, Eigen::GpuDevice>, nocontract_t=Eigen::array<Eigen::DenseIndex, 1UL>, contract_t=Eigen::array<Eigen::DenseIndex, 1UL>, packet_size=4, inner_dim_contiguous=true, inner_dim_reordered=false, Alignment=0, AlignmentType=0]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(580): here
            instantiation of "void Eigen::EigenFloatContractionKernelInternal16x16<Index,LhsMapper,RhsMapper,OutputMapper,CHECK_LHS_BOUNDARY,CHECK_RHS_BOUNDARY>(LhsMapper, RhsMapper, OutputMapper, float2 (*)[16], float2 (*)[8], Index, Index, Index, Index, Index) [with Index=Eigen::DenseIndex, LhsMapper=Eigen::internal::TensorContractionInputMapper<ocropus::Float, Eigen::DenseIndex, 1, Eigen::TensorEvaluator<const ocropus::TensorMap2, Eigen::GpuDevice>, Eigen::array<Eigen::DenseIndex, 1UL>, Eigen::array<Eigen::DenseIndex, 1UL>, 4, true, false, 0>, RhsMapper=Eigen::internal::TensorContractionInputMapper<ocropus::Float, Eigen::DenseIndex, 0, Eigen::TensorEvaluator<const ocropus::TensorMap2, Eigen::GpuDevice>, Eigen::array<Eigen::DenseIndex, 1UL>, Eigen::array<Eigen::DenseIndex, 1UL>, 4, true, false, 0>, OutputMapper=Eigen::internal::blas_data_mapper<ocropus::Float, Eigen::DenseIndex, 0, 0>, CHECK_LHS_BOUNDARY=false, CHECK_RHS_BOUNDARY=false]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1190): here
            instantiation of "void Eigen::EigenFloatContractionKernel16x16(LhsMapper, RhsMapper, OutputMapper, Index, Index, Index) [with Index=Eigen::DenseIndex, LhsMapper=Eigen::internal::TensorContractionInputMapper<ocropus::Float, Eigen::DenseIndex, 1, Eigen::TensorEvaluator<const ocropus::TensorMap2, Eigen::GpuDevice>, Eigen::array<Eigen::DenseIndex, 1UL>, Eigen::array<Eigen::DenseIndex, 1UL>, 4, true, false, 0>, RhsMapper=Eigen::internal::TensorContractionInputMapper<ocropus::Float, Eigen::DenseIndex, 0, Eigen::TensorEvaluator<const ocropus::TensorMap2, Eigen::GpuDevice>, Eigen::array<Eigen::DenseIndex, 1UL>, Eigen::array<Eigen::DenseIndex, 1UL>, 4, true, false, 0>, OutputMapper=Eigen::internal::blas_data_mapper<ocropus::Float, Eigen::DenseIndex, 0, 0>]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1363): here
            instantiation of "void Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::evalTyped<lhs_inner_dim_contiguous,rhs_inner_dim_contiguous,rhs_inner_dim_reordered,Alignment>(Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::Scalar *) const [with Indices=const ocropus::Axes1, LeftArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const ocropus::TensorMap2, lhs_inner_dim_contiguous=true, rhs_inner_dim_contiguous=true, rhs_inner_dim_reordered=false, Alignment=0]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1284): here
            instantiation of "void Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::evalTo(Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::Scalar *) const [with Indices=const ocropus::Axes1, LeftArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const ocropus::TensorMap2]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1268): here
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::Scalar *) [with Indices=const ocropus::Axes1, LeftArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const ocropus::TensorMap2]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(131): here
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with LeftArgType=Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>, Device=Eigen::GpuDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(219): here
            instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, false>::run(const Expression &, const Eigen::GpuDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>>]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35): here
            instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator=(const OtherDerived &) [with ExpressionType=Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, DeviceType=ocropus::Device, OtherDerived=Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>]" 
clstm_compute.cc(286): here

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1268): warning: calling a __host__ function from a __host__ __device__ function is not allowed
          detected during:
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::Scalar *) [with Indices=const ocropus::Axes1, LeftArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const ocropus::TensorMap2]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(131): here
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with LeftArgType=Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>, Device=Eigen::GpuDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(219): here
            instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, false>::run(const Expression &, const Eigen::GpuDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>>]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35): here
            instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator=(const OtherDerived &) [with ExpressionType=Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, DeviceType=ocropus::Device, OtherDerived=Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>]" 
clstm_compute.cc(286): here

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1272): warning: calling a __host__ function from a __host__ __device__ function is not allowed
          detected during:
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorContractionOp<Indices, LeftArgType, RightArgType>, Eigen::GpuDevice>::Scalar *) [with Indices=const ocropus::Axes1, LeftArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const ocropus::TensorMap2]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(131): here
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with LeftArgType=Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, RightArgType=const Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>, Device=Eigen::GpuDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(219): here
            instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::GpuDevice, false>::run(const Expression &, const Eigen::GpuDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>>]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(35): here
            instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator=(const OtherDerived &) [with ExpressionType=Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, DeviceType=ocropus::Device, OtherDerived=Eigen::TensorContractionOp<const ocropus::Axes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, const ocropus::TensorMap2>]" 
clstm_compute.cc(286): here

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorIndexList.h(535): warning: calling a constexpr __host__ function("run") from a __host__ __device__ function("indices_statically_known_to_increase") is not allowed. The experimental flag '--expt-relaxed-constexpr' can be used to allow this.
          detected during:
            instantiation of "__nv_bool Eigen::internal::indices_statically_known_to_increase<T>() [with T=const ocropus::Indexes1]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(94): here
            instantiation of class "Eigen::internal::are_inner_most_dims<ReducedDims, NumTensorDims, 0> [with ReducedDims=const ocropus::Indexes1, NumTensorDims=2]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(434): here
            instantiation of class "Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType>, Device> [with Op=Eigen::internal::SumReducer<ocropus::Float>, Dims=const ocropus::Indexes1, ArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, Device=Eigen::GpuDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorEvaluator.h(314): here
            instantiation of class "Eigen::TensorEvaluator<const Eigen::TensorCwiseBinaryOp<BinaryOp, LeftArgType, RightArgType>, Device> [with BinaryOp=Eigen::internal::scalar_sum_op<ocropus::Float>, LeftArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 1, 0, Eigen::DenseIndex>, 0>, RightArgType=const Eigen::TensorReductionOp<Eigen::internal::SumReducer<ocropus::Float>, const ocropus::Indexes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>>, Device=Eigen::GpuDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(97): here
            instantiation of class "Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device> [with LeftArgType=Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 1, 0, Eigen::DenseIndex>, 0>, RightArgType=const Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<ocropus::Float>, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 1, 0, Eigen::DenseIndex>, 0>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<ocropus::Float>, const ocropus::Indexes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>>>, Device=Eigen::GpuDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorForwardDeclarations.h(83): here
            instantiation of class "Eigen::internal::IsVectorizable<Eigen::GpuDevice, Expression> [with Expression=const Eigen::TensorAssignOp<Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 1, 0, Eigen::DenseIndex>, 0>, const Eigen::TensorCwiseBinaryOp<Eigen::internal::scalar_sum_op<ocropus::Float>, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 1, 0, Eigen::DenseIndex>, 0>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<ocropus::Float>, const ocropus::Indexes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>>>>]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorForwardDeclarations.h(88): here
            processing of template argument list for "Eigen::internal::TensorExecutor" based on template arguments <const Assign, ocropus::Device> 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorDevice.h(46): here
            instantiation of "Eigen::TensorDevice<ExpressionType, DeviceType> &Eigen::TensorDevice<ExpressionType, DeviceType>::operator+=(const OtherDerived &) [with ExpressionType=Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 1, 0, Eigen::DenseIndex>, 0>, DeviceType=ocropus::Device, OtherDerived=Eigen::TensorReductionOp<Eigen::internal::SumReducer<ocropus::Float>, const ocropus::Indexes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>>]" 
clstm_compute.cc(298): here

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(547): warning: calling a __host__ function from a __host__ __device__ function is not allowed
          detected during:
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType>, Device>::CoeffReturnType *) [with Op=Eigen::internal::SumReducer<ocropus::Float>, Dims=const ocropus::Indexes1, ArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, Device=Eigen::DefaultDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(131): here
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with LeftArgType=Eigen::Tensor<ocropus::Float, 1, 0, Eigen::DenseIndex>, RightArgType=const Eigen::TensorReductionOp<Eigen::internal::SumReducer<ocropus::Float>, const ocropus::Indexes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>>, Device=Eigen::DefaultDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(57): here
            instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::DefaultDevice, true>::run(const Expression &, const Eigen::DefaultDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::Tensor<ocropus::Float, 1, 0, Eigen::DenseIndex>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<ocropus::Float>, const ocropus::Indexes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>>>]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/Tensor.h(392): here
            instantiation of "Eigen::Tensor<Scalar_, NumIndices_, Options_, IndexType_>::Tensor(const Eigen::TensorBase<OtherDerived, 0> &) [with Scalar_=ocropus::Float, NumIndices_=1, Options_=0, IndexType_=Eigen::DenseIndex, OtherDerived=Eigen::TensorReductionOp<Eigen::internal::SumReducer<ocropus::Float>, const ocropus::Indexes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>>]" 
clstm_compute.cc(335): here

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(564): warning: calling a __host__ function from a __host__ __device__ function is not allowed
          detected during:
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorReductionOp<Op, Dims, ArgType>, Device>::CoeffReturnType *) [with Op=Eigen::internal::SumReducer<ocropus::Float>, Dims=const ocropus::Indexes1, ArgType=const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>, Device=Eigen::DefaultDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorAssign.h(131): here
            instantiation of "__nv_bool Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::evalSubExprsIfNeeded(Eigen::TensorEvaluator<const Eigen::TensorAssignOp<LeftArgType, RightArgType>, Device>::Scalar *) [with LeftArgType=Eigen::Tensor<ocropus::Float, 1, 0, Eigen::DenseIndex>, RightArgType=const Eigen::TensorReductionOp<Eigen::internal::SumReducer<ocropus::Float>, const ocropus::Indexes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>>, Device=Eigen::DefaultDevice]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorExecutor.h(57): here
            instantiation of "void Eigen::internal::TensorExecutor<Expression, Eigen::DefaultDevice, true>::run(const Expression &, const Eigen::DefaultDevice &) [with Expression=const Eigen::TensorAssignOp<Eigen::Tensor<ocropus::Float, 1, 0, Eigen::DenseIndex>, const Eigen::TensorReductionOp<Eigen::internal::SumReducer<ocropus::Float>, const ocropus::Indexes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>>>]" 
/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/Tensor.h(392): here
            instantiation of "Eigen::Tensor<Scalar_, NumIndices_, Options_, IndexType_>::Tensor(const Eigen::TensorBase<OtherDerived, 0> &) [with Scalar_=ocropus::Float, NumIndices_=1, Options_=0, IndexType_=Eigen::DenseIndex, OtherDerived=Eigen::TensorReductionOp<Eigen::internal::SumReducer<ocropus::Float>, const ocropus::Indexes1, const Eigen::TensorMap<Eigen::Tensor<ocropus::Float, 2, 0, Eigen::DenseIndex>, 0>>]" 
clstm_compute.cc(335): here

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1281): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)1, (bool)1, (bool)1, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1284): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)1, (bool)1, (bool)0, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1289): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)1, (bool)0, (bool)1, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1292): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)1, (bool)0, (bool)0, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1299): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)0, (bool)1, (bool)1, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1302): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)0, (bool)1, (bool)0, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1307): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)0, (bool)0, (bool)1, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1310): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)0, (bool)0, (bool)0, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1281): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)1, (bool)1, (bool)1, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1284): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)1, (bool)1, (bool)0, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1289): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)1, (bool)0, (bool)1, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1292): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)1, (bool)0, (bool)0, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1299): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)0, (bool)1, (bool)1, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1302): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)0, (bool)1, (bool)0, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1307): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)0, (bool)0, (bool)1, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorContractionCuda.h(1310): warning: calling a __host__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalTyped<(bool)0, (bool)0, (bool)0, (int)0> ") from a __host__ __device__ function("Eigen::TensorEvaluator<const Eigen::TensorContractionOp<const Eigen::array<Eigen::IndexPair<int> , (unsigned long)1ul> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> , const Eigen::TensorMap<Eigen::Tensor<float, (int)2, (int)0, long> , (int)0> > , Eigen::GpuDevice> ::evalSubExprsIfNeeded") is not allowed

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(349): error: identifier "__T216" is undefined in device code

/usr/local/include/eigen3/unsupported/Eigen/CXX11/src/Tensor/TensorReduction.h(359): error: identifier "__T217" is undefined in device code

2 errors detected in the compilation of "/tmp/tmpxft_00003910_00000000-8_clstm_compute_cuda.cpp2.i".

haihaoshen commented on September 23, 2024

Any idea how to enable the GPU when building clstm?

Halfish commented on September 23, 2024

No idea.

melody-rain commented on September 23, 2024

I got the same error.
It cannot find __T216 in TensorReduction.h.

melody-rain commented on September 23, 2024

Check the line assert(false && "Not implemented"); in TensorReduction.h.

What is false && "Not implemented"? I just commented out that line of code and it compiles successfully.

melody-rain commented on September 23, 2024

Updating Eigen seems to fix the problem.

chinakook commented on September 23, 2024

It may really not have been implemented, as the assertion says.

mattndu commented on September 23, 2024

Has anyone had success getting this to work? I successfully compiled it for GPU, but it doesn't train: I get a segmentation fault.

If anyone has made this work, what kind of training times are you seeing?

This library looks really cool. Unfortunately, seeing reports of 50+ days of training time (for the Japanese data) makes me think it might be practically unusable.

chinakook commented on September 23, 2024

I think the mini-batch size still cannot be set in clstm, so the GPU version may not be faster than the CPU one. If you want to accelerate training, try Torch with cuDNN v5, where training is 5x faster with a batch size of 64.

PedroBarcha commented on September 23, 2024

I've got the same issue. Have you found a solution?
@Halfish, @mattndu

mattndu commented on September 23, 2024

I switched over to this. It's a bitch to set up, but it works great. It drives my GPU at 70% or so, but it's still a little slow. Probably 3-4x faster than clstmocr though. I think it's just inherently slow due to the serial nature of the training:

https://github.com/dmlc/mxnet/tree/master/example/warpctc

bluefa1con commented on September 23, 2024

@mattndu hey, mind giving a heads-up on setting it up? The input format and requirements?
I got this error compiling it. clstm is painfully slow for me, so I want something that uses the GPU.

/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(36): error: identifier "__builtin_ia32_monitorx" is undefined

/usr/lib/gcc/x86_64-linux-gnu/5/include/mwaitxintrin.h(42): error: identifier "__builtin_ia32_mwaitx" is undefined

2 errors detected in the compilation of "/tmp/tmpxft_000035a0_00000000-16_reduce.compute_52.cpp1.ii".
CMake Error at warpctc_generated_reduce.cu.o.cmake:266 (message):
Error generating file
/home/user/Workspace/warpctc/warp-ctc/build/CMakeFiles/warpctc.dir/src/./warpctc_generated_reduce.cu.o

CMakeFiles/warpctc.dir/build.make:70: recipe for target 'CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o' failed
make[2]: *** [CMakeFiles/warpctc.dir/src/warpctc_generated_reduce.cu.o] Error 1
CMakeFiles/Makefile2:141: recipe for target 'CMakeFiles/warpctc.dir/all' failed
make[1]: *** [CMakeFiles/warpctc.dir/all] Error 2
Makefile:127: recipe for target 'all' failed
make: *** [all] Error 2

UPDATE: Use these flags in CMakeCache.txt: "-D_MWAITXINTRIN_H_INCLUDED -D_FORCE_INLINES -D__STRICT_ANSI__"

bluefa1con commented on September 23, 2024

@mattndu hey man, not to bother you about this, but I ran into a few errors while getting it to compile.
They're mostly related to gcc. How did you fix them?
/usr/include/c++/5/bits/stl_iterator_base_types.h(156): error: name followed by "::" must be a class or namespace name
detected during:
instantiation of class "std::__iterator_traits<_Iterator, void> [with _Iterator=int]"
(163): here

moucmou commented on September 23, 2024

How did you succeed in the end? I also tried using the GPU to speed up training, and I also failed!!

my gpu driver

and my cuda
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2017 NVIDIA Corporation
Built on Fri_Sep__1_21:08:03_CDT_2017
Cuda compilation tools, release 9.0, V9.0.176

Can anyone help me? Sincere thanks!!

melody-rain commented on September 23, 2024

@moucmou there is no need to use this library any longer. Many deep learning frameworks support LSTM and can be used instead.

amitdo commented on September 23, 2024

@melody-rain,
You have a typo in your profile page. It should be 'Computer vision and deep learning'.

melody-rain commented on September 23, 2024

@amitdo Thanks : )

mittagessen commented on September 23, 2024

@moucmou if you want a mostly out-of-the-box OCR with GPU acceleration, you could use @andbue's kraken branch, which uses PyTorch instead of clstm. IIRC its performance isn't great, as it still trains using SGD, but to be frank, when you do OCR with some hacked-together TensorFlow scripts it probably won't be significantly faster, especially when figuring in compile/model loading times.
