llnl / hiop
HPC solver for nonlinear optimization problems
License: Other
Implementing and adding a "no line search" option would be very helpful.
Method hiopVectorPar::logBarrier is a reduction, but it sums only over local vector elements and not over all MPI ranks. Is this method intended for use in distributed-memory cases?
For a minimal example, I set up matrices such that W may be a clone of A:
hiop::hiopMatrixDense A(M_local, N_global, partition, comm);
hiop::hiopMatrixDense X(N_global, N_global, partition, comm);
hiop::hiopMatrixDense* W = A.alloc_clone();
A.setToConstant(1.);
W->setToConstant(1.);
X.setToConstant(1.);
// Beta = 0 to just test matmul portion
A.timesMat(0., *W, 1., X);
I expect W to have all its elements set to N_global:
// W = 0 * W + A * X
double expected = 1. * 1. * N_global;
This succeeds when running on a single machine. However, when I attempt to run this in an MPI environment, the following assertion in timesMat_local fails:
assert(W.n_local==W.n_global && "requested multiplication should be done in parallel using timesMat");
The Magma solver interface is implemented in hiopLinSolver.hpp, which pollutes the HiOp API. One consequence of that is that any application depending on HiOp will depend on the Magma API and will need the Magma include files to build (if HiOp is built with Magma). Perhaps it would be a good idea to move the implementation of the Magma interface to a separate compilation unit (a *.cpp file)?
When building with MPI on and GPU off, all tests pass as expected. With MPI and GPU both on, however, all of the NlpMDS tests fail:
The following tests FAILED:
11 - NlpMixedDenseSparse4_1 (Failed)
12 - NlpMixedDenseSparse4_2 (Failed)
13 - NlpMixedDenseSparse5_1 (Failed)
The method hiopMatrixDense::assertSymmetry checks symmetry of the local data block and likely won't work with distributed-memory partitioned matrices. It would be good to clarify this in the documentation.
CMake cannot find the installed LAPACK on Lassen. During configuration, it shows:
Found LAPACK libraries: /usr/lib64/libessl.so;/usr/lib64/libblas.so;
However, it gives the following error when building the code:
hiopKKTLinSys.cpp:(.text+0x7c24): undefined reference to `dposvx_'
The same error occurs when a different version of LAPACK is loaded via the "module" command.
Hi,
I'm using the HiOp library together with MFEM to solve an optimization problem that involves the solution of a PDE. The optimization works very well, and I get convergence in a few iterations. However, I receive the following warning after each iteration and I don't know what it means or how to remove it:
"On entry to DPOSVXEpsilon1Full parameter number 6 had an illegal value"
Thanks for your time,
Jesus
It would be desirable to build the shared HiOp library with runtime paths to its dependencies, similar to how Ipopt is configured to build.
I am trying to access the constraint multipliers through solution_callback, but the multipliers array lam (and even the constraints array g) is NULL. Below is the debugging frame, which shows that g and lam passed to solution_callback are NULL.
Process 56591 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x00000001001411f5 libopflow.dylib`OPFLOWHIOPInterface::solution_callback(this=0x000000010460a430, status=Solve_Success, n=24, xsol=0x000000010460d140, z_L=0x000000010460dd10, z_U=0x000000010460de10, m=36, g=0x0000000000000000, lam=0x0000000000000000, obj_value=5297.4067102865956) at opflow-hiop.cpp:441:18
This is with running the NewtonMDS solver. Am I missing a setting, or does this need to be added to HiOp?
Parts of the NLP preprocessing code need to be revisited since they are "wired" to work only on the CPU. We need to either use generic hiopVector kernels or port these parts using RAJA. This is especially pervasive in fixed-variable removal.
It would be good to add "gpu" as a "compute_mode" option in HiOp. However, this option is not independent of the "mem_space" selection. Here is a summary of compute-mode and memory-space option compatibilities:
We can simply document this and expect the user to select a meaningful combination, or we can add some logic to the HiOp options class that would warn the user when an incompatible combination is selected and fall back to the next best thing; a sketch of such logic follows the notes below.
If HiOp is built without RAJA support, only "default" memory space is available.
Options "um", "pinned" or "device" are available only if HiOp is built with GPU support (for now it's CUDA only) turned on.
HiOp uses an array of row pointers to access dense matrix data through its interface (e.g., constraint Jacobian data). Perhaps a more flexible solution, especially for GPU implementations, would be to pass a raw pointer to the matrix data. Such a change would imply that all matrix rows are stored in a contiguous memory block. This assumption is actually made in several places in the HiOp code, so passing a pointer to the data block instead of an array of row pointers would make it clearer to the user how to store Jacobian data, in addition to being more GPU friendly. The two access conventions are contrasted below.
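For illustration, the two access conventions differ as follows (function names are illustrative, not HiOp's API):

// Current convention: an array of row pointers.
double get_via_row_pointers(double** M, int i, int j)
{
  return M[i][j];
}

// Proposed convention: a raw pointer to a contiguous row-major block,
// where row i starts at offset i*ncols.
double get_via_raw_pointer(const double* data, int ncols, int i, int j)
{
  return data[i*ncols + j];
}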
A C/Fortran interface to this would be great. I think it could be added in a similar way that Ipopt has done it.
The functions addToSymDenseMatrixUpperTriangle and transAddToSymDenseMatrixUpperTriangle for sparse matrices have never been used.
The optimization routines under 'hiopKKTLinMDS' only need to use this function for its dense part (see here), and we always assume the sparse part of HessMDS is a diagonal matrix (in order to compute its inverse efficiently).
New functions addToSymSparseMatrixUpperTriangle may be required for the sparse linear algebra implementation, which is handled in #85. The RAJA variants may be implemented later.
MA86 Z headers are not working with C++ when std::complex includes are present.
A temporary, dirty-ish fix below: edit hsl_mc69z.hpp and hsl_ma86z.hpp as in steps 2, 3, and 4.
1. Locate the line
#include <complex.h>
2. After it, add these lines:
#ifdef __cplusplus
extern "C" {
#endif
3. Replace "complex" with "_Complex" wherever it occurs in "double complex" or "complex double".
4. Before the last #endif, insert these lines:
#ifdef __cplusplus
}
#endif
For the function transAddToSymDenseMatrixUpperTriangle, it is unclear from the current documentation which matrix should be transposed. From what is in the kernel, it appears that the symmetric sparse triplet matrix is the one being transposed. If this is the case, then this may be related to issue #77. If it is the output matrix that should be transposed, then the kernel should be modified and the documentation updated accordingly.
It might be helpful to consider a coding style that makes class member variables distinguishable (e.g., ending each member variable name with _). IMHO it would significantly improve readability of the code, especially in functions that are a couple of hundred lines long and operate on a few dozen variables. Considering the size of the HiOp code, making wholesale changes would require some work, but even setting style guidelines for future contributions would be quite helpful.
In hiopMatrixMDS::copyFrom, we have:
virtual void copyFrom(const hiopMatrixMDS& m)
{
  mSp->copyFrom(*m.mSp);
  mDe->copyFrom(*m.mDe);
}
Yet hiopMatrixSparseTriplet::copyFrom has:
void hiopMatrixSparseTriplet::copyFrom(const hiopMatrixSparseTriplet& dm)
{
  assert(false && "this is to be implemented - method def too vague for now");
}
Therefore this method will always fail. Should it not be a method of hiopMatrixMDS, or should there be an implementation of hiopMatrixSparseTriplet::copyFrom that does not immediately assert? A possible implementation is sketched below.
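For reference, a minimal sketch of a non-asserting implementation, assuming triplet storage in member arrays iRow_, jCol_, and values_ (names borrowed from the timesVec kernel quoted later on this page) and matrices of identical dimensions and nonzero counts:

#include <cassert>
#include <cstring>

// Hypothetical implementation; member names nrows_, ncols_, nnz_, iRow_,
// jCol_, and values_ are assumptions, not HiOp's verified members.
void hiopMatrixSparseTriplet::copyFrom(const hiopMatrixSparseTriplet& src)
{
  assert(nrows_ == src.nrows_ && ncols_ == src.ncols_ && nnz_ == src.nnz_);
  memcpy(iRow_,   src.iRow_,   nnz_ * sizeof(int));
  memcpy(jCol_,   src.jCol_,   nnz_ * sizeof(int));
  memcpy(values_, src.values_, nnz_ * sizeof(double));
}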
Hey,
Just ran into this one.
In hiopNlpFormulation.cpp, you create a hiop logger with the verbosity level as a parameter. However, you immediately grab the verbosity level option from the default-constructed hiopOptions, which prevents anyone from actually specifying the verbosity level:
options = new hiopOptions(/*filename=NULL*/);
hiopOutVerbosity hov = (hiopOutVerbosity) options->GetInteger("verbosity_level");
log = new hiopLogger(this, hov, stdout);
Method hiopVectorPar::logBarrier seems to accumulate error and fails unit tests. In the current unit test the error is of order 1e-11, whereas the expected accuracy is around machine precision (as is the case for all the other vector kernel tests).
Consider modifying the sequential algorithm to use Kahan summation, and check whether optimization at -O2 or -O3 would destroy the precision restoration in Kahan summation.
According to @cnpetra, for the logBarrier method accuracy is more important than performance, because it directly affects the convergence of the overall algorithm. A sketch of compensated summation follows.
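A minimal sketch of compensated (Kahan) summation for the local log-barrier term; the loop body follows the description above, not HiOp's actual implementation:

#include <cmath>

double log_barrier_local(const double* x, int n_local)
{
  double sum = 0.0;
  double c   = 0.0;                 // running compensation for lost low-order bits
  for(int i = 0; i < n_local; ++i) {
    double y = std::log(x[i]) - c;
    double t = sum + y;             // low-order bits of y are lost here...
    c = (t - sum) - y;              // ...and recovered algebraically here
    sum = t;
  }
  return sum;
}

Note that reassociating optimizations (e.g., -ffast-math, and possibly higher -O levels on some compilers) can cancel the compensation term, which is exactly the concern raised above.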
It seems there may be a bug in the MPI part of the hiopMatrixDenseRowMajor::timesMatTrans function. Perhaps this line should instead be:
double* Wglob= W.new_mxnlocal_buff(); //[n2Red];
The local buffers of this and the W matrix are not necessarily the same.
In README_developers.md, the required tests involve using clang tools for address sanitization. Clang does not work so readily with OpenMP, which is now a dependency for the new RAJA linear algebra library. To run these, we have to disable HIOP_USE_RAJA. @cnpetra, would you still like to have clang tool checks for the non-RAJA parts of HiOp, or should we pursue other tests as prerequisites for submitting PRs?
When testing the shiftRows method of hiopMatrixDense, I am running into a segfault on our Power9 systems:
233 A.shiftRows(shift);
(gdb)
Program received signal SIGSEGV, Segmentation fault.
0x0000000010017474 in hiop::hiopMatrixDense::shiftRows (this=0x7fffffffe410, shift=4) at /people/manc568/projects/hiop/src/LinAlg/hiopMatrix.cpp:256
256 assert(test1==M[shift<0?0:m_local][0] && "a different copy technique than memcpy is needed on this system");
Should we add a fallback method in case memcpy does not work? How do you suggest we test this method? @cnpetra
When running HiOp with MAGMA, the following output is being printed to stdout.
GPU FACT in 0.0493771 sec at TFlops: 0.000712128
GPU FACT in 0.0434769 sec at TFlops: 0.00080877
GPU FACT in 0.0436565 sec at TFlops: 0.000805443
GPU FACT in 0.043938 sec at TFlops: 0.000800282
GPU FACT in 0.0430558 sec at TFlops: 0.00081668
GPU FACT in 0.0432639 sec at TFlops: 0.000812752
GPU FACT in 0.0433745 sec at TFlops: 0.00081068
GPU FACT in 0.0432896 sec at TFlops: 0.000812269
GPU FACT in 0.0591214 sec at TFlops: 0.000594756
GPU FACT in 0.0609099 sec at TFlops: 0.000577292
I'd like this output to be suppressed by default and perhaps enabled based on the verbosity level. I think it is printed via this line. A possible fix is sketched below.
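One possible fix, sketched under the assumption that this code has access to HiOp's logger (the nlp->log->printf(...) calls seen elsewhere in HiOp) and that hovScalars is an appropriate verbosity level for this message, would be to route the output through the logger instead of printing it unconditionally:

// Hypothetical: emit the timing message only at sufficiently high verbosity;
// variable names are illustrative.
nlp->log->printf(hovScalars, "GPU FACT in %g sec at TFlops: %g\n",
                 time_seconds, tflops);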
Setting up three matrices like so:
hiop::hiopMatrixDense A(M_global, K_global, k_partition, comm);
hiop::hiopMatrixDense M(K_global, N_global, n_partition, comm);
hiop::hiopMatrixDense W(M_global, N_global, n_partition, comm);
I then set values and attempt to call timesMat:
A.setToConstant(A_val);
W.setToConstant(W_val);
M.setToConstant(M_val);
A.timesMat(beta, W, alpha, M);
real_type expected = (beta * W_val) + (alpha * A_val * M_val * K_global);
const int fail = verifyAnswer(&W, expected);
This results in a segfault.
In our internal branch, the linear algebra factory has a method (createMatrixSparse) to create an instance of the abstract class hiopMatrixSparse, choosing the appropriate implementation (e.g., RAJA vs. default).
Seeing that there are other implementations of sparse matrices, how should we alter the factory class API to create other kinds of sparse matrices?
A few options: createMatrixSparseSym, createMatrixSparse, and others (a sketch is given below). CC @pelesh
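A hypothetical shape for such a factory API, with one entry point per mathematical kind of matrix and the memory-space string selecting between the RAJA and default implementations (class names come from this page; the constructor signatures are assumptions):

class LinearAlgebraFactory
{
public:
  static hiopMatrixSparse* createMatrixSparse(const std::string& mem_space,
                                              int rows, int cols, int nnz)
  {
    // "default" selects the plain CPU implementation; anything else selects
    // the RAJA implementation parameterized by the memory space.
    if(mem_space == "default")
      return new hiopMatrixSparseTriplet(rows, cols, nnz);
    return new hiopMatrixRajaSparseTriplet(rows, cols, nnz, mem_space);
  }

  // Separate entry point for symmetric storage, per the options listed above.
  static hiopMatrixSparse* createMatrixSparseSym(const std::string& mem_space,
                                                 int n, int nnz);
};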
Should we consider moving the implementation of the solve() method below from hiopLinSolver.hpp to a source file? Simplifying the API would likely help porting to GPU and managing compile-time dependencies. See also #43.
void solve(hiopVector& x_)
{
  assert(M.n() == M.m());
  assert(x_.get_size() == M.n());
  int N = M.n(), LDA = N, info;
  if(N == 0) return;

  hiopVectorPar* x = dynamic_cast<hiopVectorPar*>(&x_);
  assert(x != NULL);

  char uplo = 'L'; // M is upper in C++ so it's lower in fortran
  int NRHS = 1, LDB = N;
  DSYTRS(&uplo, &N, &NRHS, M.local_buffer(), &LDA, ipiv, x->local_data(), &LDB, &info);
  if(info < 0) {
    nlp->log->printf(hovError, "hiopLinSolverIndefDenseLapack: DSYTRS returned error %d\n", info);
    assert(false);
  } else if(info > 0) {
    nlp->log->printf(hovError, "hiopLinSolverIndefDenseLapack: DSYTRS returned error %d\n", info);
  }
}
I'm getting linker errors of the following type:
[ 68%] Linking CXX executable nlpDenseCons_ex3.exe
Undefined symbols for architecture x86_64:
"daxpy", referenced from:
hiop::hiopVectorPar::axpy(double, hiop::hiopVector const&) in libhiop.a(hiopVector.cpp.o)
hiop::hiopMatrixDense::addMatrix(double, hiop::hiopMatrix const&) in libhiop.a(hiopMatrix.cpp.o)
OpenBLAS, LAPACK, and OpenMPI are installed.
This is the cmake output:
cmake_out.txt
Finally, I have a good description of this case. The MOI interface works fine using the dense algebra. With the sparse one, I have the following issue.
I pass this sparsity pattern to Hiop:
iHSS = Int32[0, 1, 2, 6, 9, 15, 18, 7, ..., 14, 16, 21, 22, 23, 9, 13, 14, 18, 22, 23]
jHSS = Int32[0, 1, 2, 6, 6, 6, 6,..., 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23
The relevant entries are only the first 3.
At the first iteration on the IEEE 9-bus case, with 3 generators and the lambdas initialized to 0, I get the following Hessian:
nzval = [2200.0, 1700.0, 2450.0, 0.0, ..., -0.0, 0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0]
Because the multipliers lambda are 0, this seems right to me, given there are 3 generators.
Hiop then aborts with this message:
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
0 8.3631250e+03 1.550e+00 3.530e+03 -1.00 0.000e+00 0.000e+00 -(-)
[Warning] hiopLinSolverIndefDense error: 11 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] hiopLinSolverIndefDense error: 11 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
julia: /scratch/mschanen/git/hiop/src/Optimization/hiopPDPerturbation.hpp:184: bool hiop::hiopPDPerturbation::compute_perturb_singularity(double&, double&, double&, double&): Assertion `delta_cc == 0. && delta_cd == 0.' failed.
If I set the multipliers to nonzero at the first iteration, it works fine, and the problem even converges to the right solution. So I'm wondering what is going on. Is this expected behavior?
Thanks!
Building HiOp with default options on macOS 10.12.6 and running the tests yields errors for the nlpDenseCons_ex1 tests. Log files can be found at: https://gist.github.com/goxberry/8bdc80e6dcd4d15ed0a7c5130009d6aa
The configuration I'm using is built by spack, so I have some flexibility in choosing libraries, but all of these libraries are included via RPATH directives. My impression is that linking isn't an issue, but I could be wrong about that.
It might be helpful to specify and better document how to select a vector pattern. Currently the pattern is selected by a vector of 1.0s and 0.0s in double precision. The rules are a little vague as to what happens when a pattern vector element is neither one nor zero. Possible solutions I see are:
The argument related to GPU implementations is something like this: if pattern vector elements are either one or zero, then instead of using conditionals, which could cause warp divergence, one could simply do an elementwise multiply with the pattern vector to select the pattern. For memory-bound computations, this could lead to a better-performing implementation, I think. A comparison sketch follows.
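For illustration, here are the two ways of applying a 0/1 pattern vector in a simple axpy-style kernel (function and variable names are illustrative, not HiOp's API):

void axpy_w_pattern_branching(int n, double alpha,
                              const double* x, const double* pattern, double* y)
{
  for(int i = 0; i < n; ++i)
    if(pattern[i] == 1.0)   // conditional: may cause warp divergence on GPU
      y[i] += alpha * x[i];
}

void axpy_w_pattern_multiply(int n, double alpha,
                             const double* x, const double* pattern, double* y)
{
  for(int i = 0; i < n; ++i)
    y[i] += alpha * x[i] * pattern[i]; // branch-free: relies on pattern being exactly 0 or 1
}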
Method axdzpy_w_pattern is implemented in class hiopVectorPar, but there is no abstract interface for it in the (almost) pure virtual class hiopVector. Is this method needed?
In a couple of places, (essential) code is placed inside assert(...). This causes undefined behavior and/or crashes when HiOp is compiled with NDEBUG; an illustration follows. @junkudo
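A minimal illustration of the pitfall, with a hypothetical function standing in for the essential code:

#include <cassert>

int do_essential_work();   // hypothetical function with a needed side effect

void bad()
{
  assert(do_essential_work() == 0);  // the whole call is compiled out under NDEBUG
}

void good()
{
  int ierr = do_essential_work();    // the side effect always happens
  assert(ierr == 0);                 // only the check disappears under NDEBUG
}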
My HiOp build from the dev/NewtonMDS branch fails at the compile stage with the message:
make[2]: *** No rule to make target `/usr/lib64/libopenblas.so -lmagma -L/share/apps/cuda/9.2/lib64 -lculibos -lcublas -lcublasLt -lnvblas -lcusparse -lcudart -lcudadevrt', needed by `src/LinAlg/test_hiopLinAlgComplex.exe'. Stop.
It seems that there is some mix-up with the CMake paths. I used the following configuration for the build:
CC=mpicc CXX=mpicxx FC=mpif90 cmake \
-DHIOP_USE_MPI=1 \
-DHIOP_USE_GPU=1 \
-DHIOP_MAGMA_DIR="/.../exasgd/newell/magma" \
-DCMAKE_INSTALL_PREFIX=$HIOP_DIR \
../hiop
I used cmake 3.13.4, gcc 7.4.0, openmpi 3.1.3, cuda 9.2, OpenBLAS 0.3.3, and LAPACK 3.4.2 for the build. The configure step works fine, but something goes wrong with the complex linear algebra compilation.
Verbose output gives me this:
make -f src/LinAlg/CMakeFiles/test_hiopLinAlgComplex.exe.dir/build.make src/LinAlg/CMakeFiles/test_hiopLinAlgComplex.exe.dir/depend
make[2]: Entering directory `/.../exasgd/src/hiop/build_newell'
cd /.../exasgd/src/hiop/build_newell && /.../cmake/3.13.4/bin/cmake -E cmake_depends "Unix Makefiles" /.../exasgd/src/hiop/hiop /.../exasgd/src/hiop/hiop/src/LinAlg /.../exasgd/src/hiop/build_newell /.../exasgd/src/hiop/build_newell/src/LinAlg /.../exasgd/src/hiop/build_newell/src/LinAlg/CMakeFiles/test_hiopLinAlgComplex.exe.dir/DependInfo.cmake --color=
make[2]: Leaving directory `/.../exasgd/src/hiop/build_newell'
make -f src/LinAlg/CMakeFiles/test_hiopLinAlgComplex.exe.dir/build.make src/LinAlg/CMakeFiles/test_hiopLinAlgComplex.exe.dir/build
make[2]: Entering directory `/.../exasgd/src/hiop/build_newell'
make[2]: *** No rule to make target `/usr/lib64/libopenblas.so -L/.../exasgd/newell/magma/lib -lmagma -L/.../cuda/9.2/lib64 -lculibos -lcublas -lcublasLt -lnvblas -lcusparse -lcudart -lcudadevrt', needed by `src/LinAlg/test_hiopLinAlgComplex.exe'. Stop.
Consider implementing the loop that counts negative eigenvalues in the hiopKKTLinSysCompressedMDSXYcYd class as a vector kernel and adding it to the abstract vector interface. Otherwise, the method hiopKKTLinSysCompressedMDSXYcYd::update depends on an implementation detail of the Hxs_ vector.
This loop will not run on the GPU, for example. A sketch of such a kernel is given below.
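A sketch of such a kernel as a RAJA reduction; the raw pointer stands in for access to the local array of Hxs_, and the CUDA policies are placeholders for whatever policies HiOp's RAJA build uses:

#include <RAJA/RAJA.hpp>

// Counts entries below zero; runs on the device under the chosen policies.
int count_negative(const double* data, int n)
{
  RAJA::ReduceSum<RAJA::cuda_reduce, int> num_neg(0);
  RAJA::forall<RAJA::cuda_exec<128>>(RAJA::RangeSegment(0, n),
    [=] RAJA_DEVICE (RAJA::Index_type i) {
      if(data[i] < 0.0)
        num_neg += 1;
    });
  return num_neg.get();
}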
There may be an issue in the Magma no-pivoting solver interface, or even a bug in Magma. The other possibility is that Example 4 cannot be solved when using a no-pivoting linear solver. When switching between hybrid and cpu modes in Example 4, the number of iterations and the convergence rate change.
It seems that the Magma no-pivoting solver diverges. I am not sure, though, whether the cpu compute mode uses the no-pivoting function or always uses Bunch-Kaufman.
Below is the Example 4 output for hybrid and cpu compute modes, respectively.
hybrid compute mode - Magma factorization and Lapack solve:
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
0 3.9800005e+02 4.990e+02 4.000e+00 -1.00 0.000e+00 0.000e+00 -(-)
[Warning] KKT_MDS_XYcYd linsys: MagmaNopiv size 503 (403 cons) (safe_mode=0)
1 3.6824691e+02 5.536e+02 3.990e+00 -1.00 1.579e-03 2.372e-03 1(s)
2 3.3431951e+02 5.126e+02 3.989e+00 -1.00 6.706e-05 4.510e-05 1(s)
3 6.2429733e+02 4.067e+02 3.987e+00 -1.00 6.218e-05 6.111e-05 1(s)
4 1.5031313e+03 3.637e+02 1.647e+01 -1.00 1.336e-04 3.048e-05 1(s)
[Warning] Requesting additional accuracy and stability from the KKT linear system at iteration 4 (safe mode ON)
[Warning] KKT_MDS_XYcYd linsys: MagmaBuKa size 503 (403 cons) (safe_mode=1)
5 1.0674827e+02 1.112e+02 1.509e+01 -1.00 3.784e-01 6.944e-01 1(s)
6 5.0647226e+00 5.952e+01 1.038e+01 -1.00 3.490e-01 4.646e-01 1(s)
7 4.1846536e+00 5.900e+01 9.514e+00 -1.00 4.810e-02 8.682e-03 1(s)
8 4.1624661e+00 5.893e+01 6.328e+00 -1.00 9.055e-03 1.154e-03 1(s)
9 -4.9917037e+01 2.292e-01 1.064e+01 -1.00 6.468e-03 9.961e-01 1(s)
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
10 -4.9924417e+01 2.221e-01 2.031e+01 -1.00 7.325e-01 3.102e-02 1(s)
11 -4.3510181e+01 1.070e-12 1.171e+00 -1.00 8.872e-01 1.000e+00 1(s)
12 -4.3197637e+01 4.252e-14 1.000e-06 -1.00 1.000e+00 1.000e+00 1(f)
13 -4.9686273e+01 1.054e-12 1.748e+00 -2.55 9.219e-01 1.000e+00 1(f)
14 -4.9973780e+01 1.013e-13 2.828e-08 -2.55 1.000e+00 1.000e+00 1(f)
15 -4.9992605e+01 3.642e-14 1.504e-09 -3.82 1.000e+00 1.000e+00 1(f)
16 -4.9993471e+01 8.882e-16 1.504e-09 -3.82 1.000e+00 1.000e+00 1(f)
17 -4.9993739e+01 2.442e-15 1.729e-03 -5.73 9.710e-01 1.000e+00 1(f)
18 -4.9994734e+01 1.179e-13 1.845e-11 -5.73 1.000e+00 1.000e+00 1(f)
19 -4.9994724e+01 2.887e-15 1.845e-11 -5.73 1.000e+00 1.000e+00 1(f)
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
20 -4.9994868e+01 6.661e-16 3.077e-04 -6.00 8.548e-01 1.000e+00 1(f)
21 -4.9994888e+01 6.883e-14 1.000e-11 -6.00 1.000e+00 1.000e+00 1(f)
Successfull termination.
Total time 2.521 sec
Hiop internal time: total 2.515 sec avg iter 0.120 sec
internal total std dev across ranks 0.000 percent
Fcn/deriv time: total=0.004 sec ( obj=0.000 grad=0.000 cons=0.001 Jac=0.002 Hess=0.001)
Fcn/deriv total std dev across ranks 0.000 percent
Fcn/deriv #: obj 56 grad 22 eq cons 57 ineq cons 57 eq Jac 22 ineq Jac 22
Total KKT time 2.506 sec
update init 1.725sec update linsys 0.077 sec fact 0.655 sec
solve rhs-manip 0.007 sec triangular solve 0.042 sec
cpu compute mode - Lapack factorization and solve:
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
0 3.9800005e+02 4.990e+02 4.000e+00 -1.00 0.000e+00 0.000e+00 -(-)
1 1.8397700e+01 1.818e+02 1.457e+00 -1.00 3.341e-01 6.357e-01 1(s)
2 -4.3390041e+01 5.305e+01 4.253e-01 -1.00 7.654e-01 7.081e-01 1(s)
3 -4.9598500e+01 6.175e-13 2.539e-01 -1.00 8.152e-01 1.000e+00 1(s)
4 -4.9406538e+01 5.160e-13 3.078e+01 -1.00 9.388e-01 5.909e-02 1(f)
5 -4.3566387e+01 3.950e-13 1.000e-06 -1.00 1.000e+00 1.000e+00 1(f)
6 -4.9711480e+01 6.206e-13 1.601e+00 -2.55 9.261e-01 1.000e+00 1(f)
7 -4.9974904e+01 1.774e-13 2.828e-08 -2.55 1.000e+00 1.000e+00 1(f)
8 -4.9992701e+01 3.353e-14 1.504e-09 -3.82 1.000e+00 1.000e+00 1(f)
9 -4.9993494e+01 1.643e-14 1.504e-09 -3.82 1.000e+00 1.000e+00 1(f)
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
10 -4.9993744e+01 1.310e-14 1.732e-03 -5.73 9.710e-01 1.000e+00 1(f)
11 -4.9994740e+01 2.887e-15 1.845e-11 -5.73 1.000e+00 1.000e+00 1(f)
12 -4.9994726e+01 1.377e-14 1.845e-11 -5.73 1.000e+00 1.000e+00 1(f)
13 -4.9994868e+01 7.772e-15 3.029e-04 -6.00 8.509e-01 1.000e+00 1(f)
14 -4.9994888e+01 2.941e-12 1.000e-11 -6.00 1.000e+00 1.000e+00 1(f)
Successfull termination.
Total time 0.131 sec
Hiop internal time: total 0.128 sec avg iter 0.009 sec
internal total std dev across ranks 0.000 percent
Fcn/deriv time: total=0.002 sec ( obj=0.000 grad=0.000 cons=0.000 Jac=0.001 Hess=0.001)
Fcn/deriv total std dev across ranks 0.000 percent
Fcn/deriv #: obj 15 grad 15 eq cons 16 ineq cons 16 eq Jac 15 ineq Jac 15
Total KKT time 0.123 sec
update init 0.009sec update linsys 0.049 sec fact 0.059 sec
solve rhs-manip 0.004 sec triangular solve 0.002 sec
addToSymDenseMatrixUpperTriangle and transAddToSymDenseMatrixUpperTriangle for the symmetric sparse triplet classes both seem not to take into account the symmetric nature of the matrices when adding them to the output matrices.
For reference, timesVec has the following section of code that takes this into account:
y[iRow_[i]] += alpha * x[jCol_[i]] * values_[i];
if(iRow_[i]!=jCol_[i])
  y[jCol_[i]] += alpha * x[iRow_[i]] * values_[i];
A way of fixing this issue would be to have the existing addToSymDenseMatrixUpperTriangle look something like the following, with a similar fix for transAddToSymDenseMatrixUpperTriangle:
void hiopMatrixSymSparseTriplet::addToSymDenseMatrixUpperTriangle(int row_start, int col_start,
                                                                  double alpha, hiopMatrixDense& W) const
{
  assert(row_start>=0 && row_start+nrows<=W.m());
  assert(col_start>=0 && col_start+ncols<=W.n());
  assert(W.n()==W.m());
  double** WM = W.get_M();
  for(int it=0; it<nnz; it++) {
    assert(iRow[it]<=jCol[it] && "sparse symmetric matrices should contain only upper triangular entries");
    int i = iRow[it]+row_start;
    int j = jCol[it]+col_start;
    assert(i<W.m() && j<W.n()); assert(i>=0 && j>=0);
    assert(i<=j && "symMatrices not aligned; source entries need to map inside the upper triangular part of destination");
    WM[i][j] += alpha*values[it];
    if(iRow[it] != jCol[it]) {
      // mirror the off-diagonal entry across the diagonal
      i = jCol[it]+row_start;
      j = iRow[it]+col_start;
      assert(i<W.m() && j<W.n()); assert(i>=0 && j>=0);
      assert(i<=j && "symMatrices not aligned; source entries need to map inside the upper triangular part of destination");
      WM[i][j] += alpha*values[it];
    }
  }
}
If this fix is not implemented, only one half of the symmetric sparse matrix will be added to the destination matrix every time this function is called.
It seems that the local variables howManyToCopy and howManyToCopyDest are flipped around in function hiopVectorPar::startingAtCopyFromStartingAt. howManyToCopy is the number of elements at the destination that will be overwritten, and howManyToCopyDest is the number of source elements that will be written to the destination. The function asserts
assert(howManyToCopy <= howManyToCopyDest);
which is, I think, the opposite of what is intended (despite the names of these variables suggesting otherwise).
Also, the function arguments seem to have misleading meanings, since start_idx_src is the destination offset and start_idx_dest is the source offset.
This (possible) bug does not affect the code, because at the only call site the source and destination are of the same size and both offsets are zero. A sketch of the intended semantics follows.
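For clarity, a sketch of the intended semantics with unambiguous names (illustrative only; HiOp's actual signature differs):

#include <cassert>
#include <cstring>

// The number of elements copied comes from the source, and the assert
// checks that they fit at the destination.
void copy_from_starting_at(double* dest, int dest_size, int dest_offset,
                           const double* src, int src_size, int src_offset)
{
  int how_many_to_copy = src_size - src_offset;        // elements taken from the source
  assert(how_many_to_copy <= dest_size - dest_offset); // must fit at the destination
  memcpy(dest + dest_offset, src + src_offset, how_many_to_copy * sizeof(double));
}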
Heavy use is made of copyFrom and similar methods, yet the abstract interface does not require a hiopMatrix implementation to provide this method. In contrast, hiopVector does have copyTo and copyFrom methods in its abstract interface. I believe this could lead to implementation-specific code, which will cause problems when we attempt to migrate to other implementations of hiopMatrix.
Review the RAJA kernels (currently in the raja-dev branch) and flag potential bottlenecks. The purpose of this review is pre-screening of potential performance issues, to give us a heads-up on what to pay attention to when profiling the performance.
The current implementation of the RAJA kernels was done with the objective of ensuring accurate computations. Some kernels, such as hiopVectorRajaPar::projectIntoBounds, are implemented in a way that is not quite "GPU friendly". Help from the RAJA developers in identifying other potential bottlenecks, along with suggestions on how to implement these kernels better, is very much appreciated.
RAJA kernels are implemented in the following HiOp classes:
hiopVectorRajaPar
hiopMatrixRajaDense
hiopMatrixRajaSparseTriplet
hiopMatrixRajaSymSparseTriplet
Currently, RAJA kernels run only within unit tests. See tests in:
testVector.cpp
testMatrix.cpp
testMatrixSparse.cpp
The existing LSQ duals calculator needs to be revisited and re-engineered to work with the generic hiopNlpFormulation. Currently it assumes a hiopNlpDenseCons NLP:
hiop/src/Optimization/hiopDualsUpdater.cpp
Line 111 in 2252b90
It seems that the *SymDenseMatrixUpperTriangle methods add elements of a rectangular matrix (pointed to by this) into the upper triangular part of the matrix W (passed as the input argument). The methods use only local data indices and may not work for distributed-memory partitioned matrices unless the caller provides row and column start indices that guarantee data is written to the upper triangular part of W.
It would be good to better document the preconditions for calling these functions, as they are nontrivial. Are these methods intended for use when both, neither, or only W is MPI-partitioned?
The nlpMDS_ex5.exe test fails to converge on an Intel platform with a Volta GPU. HiOp is built with GPU support and Kron reduction enabled. When running with 1 MPI rank on one GPU device, the following error message is obtained:
...
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] KKT_MDS_XYcYd linsys: Detected negative eigenvalues in (1,1) sparse block.
83 -1.3559463e+03 1.260e-02 5.259e-03 -2.55 1.000e+00 1.000e+00 1(f)
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] KKT_MDS_XYcYd linsys: Detected negative eigenvalues in (1,1) sparse block.
84 -1.3560463e+03 4.662e-02 2.022e-03 -3.82 1.000e+00 1.000e+00 1(h)
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] KKT_MDS_XYcYd linsys: Detected negative eigenvalues in (1,1) sparse block.
Panic: minimum step size reached. The problem may be infeasible or the gradient inaccurate. Will exit here.
85 -1.3560463e+03 4.662e-02 2.028e-03 -3.82 1.000e+00 5.551e-17 54(?)
Couldn't solve the problem.
Linesearch returned unsuccessfully (small step). Probable cause: inaccurate gradients/Jacobians or infeasible problem.
Total time 6.014 sec
Hiop internal time: total 6.002 sec avg iter 0.071 sec
internal total std dev across ranks 0.000 percent
Fcn/deriv time: total=0.005 sec ( obj=0.001 grad=0.000 cons=0.001 Jac=0.002 Hess=0.001)
Fcn/deriv total std dev across ranks 0.000 percent
Fcn/deriv #: obj 172 grad 86 eq cons 173 ineq cons 173 eq Jac 86 ineq Jac 86
Total KKT time 5.986 sec
update init 0.001sec update linsys 0.293 sec fact 5.673 sec
solve rhs-manip 0.013 sec triangular solve 0.005 sec
solve4 trouble: returned -4 (with objective is -1.356046289760e+03)
srun: error: dl08: task 0: Exited with exit code 255
The following dependencies were used to build HiOp:
$ module list
Currently Loaded Modulefiles:
1) gcc/7.3.0 3) cmake/3.15.3 5) metis/5.1.0
2) cuda/10.2.89 4) openmpi/3.1.3 6) magma/2.5.2_cuda10.2
Please let me know what additional data would be helpful.
Two reports of build failure came in over the last week via email. Coincidentally, I've just encountered the exact same problem on Summit (using cmake 3.9.2):
CMake Error at CMakeLists.txt:7 (cmake_policy):
Policy "CMP0074" is not known to this version of CMake.
-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is GNU 4.8.5
(...)
-- Found LAPACK libraries: /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-9.1.0/openblas-0.3.9-aymovpat33osbzgh5gsmhyvstsol4sfp/lib/libopenblas.so;/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-9.1.0/openblas-0.3.9-aymovpat33osbzgh5gsmhyvstsol4sfp/lib/libopenblas.so
CMake Error at src/Optimization/CMakeLists.txt:2 (target_link_libraries):
Object library target "hiopOptimization" may not link to anything.
-- Configuring incomplete, errors occurred!
See also "/ccs/home/cpetra/work/projects/hiop/build/CMakeFiles/CMakeOutput.log".
See also "/ccs/home/cpetra/work/projects/hiop/build/CMakeFiles/CMakeError.log".
Some of the methods of the hiopMatrixSparseTriplet class have an argument that is a reference to a specific matrix implementation (see, e.g., the transAddToSymDenseMatrixUpperTriangle method). This could potentially lead to cumbersome solutions when porting this class to hardware accelerators (e.g., GPU).
A minimally invasive way to go about this would be to add an enum to the matrix base class with matrix type IDs, as well as a virtual method to return the matrix type ID. Something similar was done in SUNDIALS.
With such a modification, the method transAddToSymDenseMatrixUpperTriangle can take a reference to the virtual hiopMatrix class as the input argument, and then check in the implementation whether a compatible matrix type was passed. The implementation of transAddToSymDenseMatrixUpperTriangle can then select the computation specific to the matrix layout, or throw an exception if the matrix type is incompatible.
This could keep the API cleaner and provide more extensibility. The downside of this approach is that passing an incompatible matrix type would be caught at runtime instead of at compile time. A more comprehensive solution would be to use template parameters to specify the matrix layout, but that would require more significant changes to the code. A sketch of the enum approach follows.
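A sketch of the enum-based type-ID approach described above, modeled loosely on the SUNDIALS pattern; all names and signatures are illustrative, not HiOp's actual API:

#include <cassert>

// Hypothetical type IDs in the matrix base class.
enum hiopMatrixTypeID { MatrixDense, MatrixSparseTriplet, MatrixSymSparseTriplet };

class hiopMatrix
{
public:
  virtual ~hiopMatrix() {}
  virtual hiopMatrixTypeID matrix_type_id() const = 0;
  // ... rest of the abstract interface ...
};

void hiopMatrixSparseTriplet::transAddToSymDenseMatrixUpperTriangle(
  int row_start, int col_start, double alpha, hiopMatrix& W) const
{
  // A runtime check replaces the compile-time guarantee of a concrete reference.
  assert(W.matrix_type_id() == MatrixDense && "incompatible matrix type");
  hiopMatrixDense& Wd = static_cast<hiopMatrixDense&>(W);
  // ... layout-specific computation using Wd ...
}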
With cmake -DHIOP_USE_MPI=NO .. I get the compilation issue below:
[ 86%] Building CXX object tests/CMakeFiles/testMatrix.dir/testMatrix.cpp.o
In file included from /ccs/home/cpetra/work/projects/hiop/tests/LinAlg/matrixTestsDense.hpp:59:0,
from /ccs/home/cpetra/work/projects/hiop/tests/testMatrix.cpp:61:
/ccs/home/cpetra/work/projects/hiop/tests/LinAlg/matrixTests.hpp:63:20: fatal error: optional: No such file or directory
#include <optional>
^
compilation terminated.
make[2]: *** [tests/CMakeFiles/testMatrix.dir/testMatrix.cpp.o] Error 1
make[1]: *** [tests/CMakeFiles/testMatrix.dir/all] Error 2
make: *** [all] Error 2
I am installing updated HiOp versions on our PNNL machines. Would now be an appropriate time to update the tags to v0.3? Updating the tags after every large PR would be helpful in tracking versions. I think v0.2 points at c52a6f6, which is from last December.