llnl / hiop
HPC solver for nonlinear optimization problems
License: Other
Implementing and adding a "no line search" option would be very helpful.
Method hiopVectorPar::logBarrier is a reduction, but it sums only over local vector elements and not over all MPI ranks. Is this method intended for use in distributed-memory cases?
For a minimal example, I set up matrices such that W may be a clone of A:
hiop::hiopMatrixDense A(M_local, N_global, partition, comm);
hiop::hiopMatrixDense X(N_global, N_global, partition, comm);
hiop::hiopMatrixDense* W = A.alloc_clone();
A.setToConstant(1.);
W->setToConstant(1.);
X.setToConstant(1.);
// Beta = 0 to just test matmul portion
A.timesMat(0., *W, 1., X);
I expect W to have all its elements set to N_global:
// W = 0 * W + A * X
double expected = 1. * 1. * N_global;
This succeeds when running on a single machine. However, when I attempt to run this in an MPI environment, the following assertion in timesMat_local fails:
assert(W.n_local==W.n_global && "requested multiplication should be done in parallel using timesMat");
The Magma solver interface is implemented in hiopLinSolver.hpp, which pollutes the HiOp API. One consequence of that is that any application depending on HiOp will depend on the Magma API and will need the Magma include files to build (if HiOp is built with Magma). Perhaps it would be a good idea to move the implementation of the Magma interface to a separate compilation unit (a *.cpp file)?
When building with MPI on and GPU off, all tests pass as expected. With MPI and GPU both on, however, all of the NlpMDS tests fail:
The following tests FAILED:
11 - NlpMixedDenseSparse4_1 (Failed)
12 - NlpMixedDenseSparse4_2 (Failed)
13 - NlpMixedDenseSparse5_1 (Failed)
The method hiopMatrixDense::assertSymmetry checks symmetry of the local data block and likely won't work with distributed-memory partitioned matrices. It would be good to clarify this in the documentation.
CMake cannot find the installed LAPACK on Lassen. During configuration, it shows:
Found LAPACK libraries: /usr/lib64/libessl.so;/usr/lib64/libblas.so;
However, it gives the following error when building the code:
hiopKKTLinSys.cpp:(.text+0x7c24): undefined reference to `dposvx_'
The same error occurs when a different version of LAPACK is loaded via the "module" command.
Hi,
I'm using the HiOp library together with MFEM to solve an optimization problem that involves the solution of a PDE. The optimization works very well, and I get convergence in a few iterations. However, I receive the following warning after each iteration and I don't know what it means or how to remove it:
"On entry to DPOSVXEpsilon1Full parameter number 6 had an illegal value"
Thanks for your time,
Jesus
It would be desirable to build the shared HiOp library with runtime paths to its dependencies, similar to how Ipopt is configured to build.
I am trying to access the constraint multipliers through solution_callback, but the multipliers array lam (and even the constraints array g) is NULL. Below is the debugging frame, which shows that g and lam passed to solution_callback are NULL.
Process 56591 stopped
* thread #1, queue = 'com.apple.main-thread', stop reason = breakpoint 1.1
frame #0: 0x00000001001411f5 libopflow.dylib`OPFLOWHIOPInterface::solution_callback(this=0x000000010460a430, status=Solve_Success, n=24, xsol=0x000000010460d140, z_L=0x000000010460dd10, z_U=0x000000010460de10, m=36, g=0x0000000000000000, lam=0x0000000000000000, obj_value=5297.4067102865956) at opflow-hiop.cpp:441:18
This is with running the NewtonMDS solver. Am I missing a setting, or does this need to be added to HiOp?
Parts of the NLP preprocessing code need to be revisited since they are "wired" to work only on the CPU. We need to either use generic hiopVector kernels or port these parts using RAJA. This is especially pervasive in fixed-variable removal.
It would be good to add "gpu" as a "compute_mode" option in HiOp. However, this option is not independent of the "mem_space" selection. Here is a summary of compute-mode and memory-space option compatibilities:
We can simply document this and expect the user to select a meaningful combination, or we can add some logic to the HiOp options class that would warn the user when an incompatible combination is selected and fall back to the next best thing; a sketch of such logic follows the notes below.
If HiOp is built without RAJA support, only "default" memory space is available.
Options "um", "pinned" or "device" are available only if HiOp is built with GPU support (for now it's CUDA only) turned on.
HiOp uses an array of row pointers to access dense matrix data through its interface (e.g., constraint Jacobian data). Perhaps a more flexible solution, especially for GPU implementations, would be to pass a raw pointer to the matrix data. Such a change would imply that all matrix rows are stored in a contiguous memory block. This assumption is actually made in several places in the HiOp code, so passing a pointer to the data block instead of an array of row pointers would make it clearer to the user how to store Jacobian data, in addition to being more GPU friendly. The two access conventions are contrasted below.
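For illustration, the two access conventions differ as follows (function names are illustrative, not HiOp's API):

// Current convention: an array of row pointers.
double get_via_row_pointers(double** M, int i, int j)
{
  return M[i][j];
}

// Proposed convention: a raw pointer to a contiguous row-major block,
// where row i starts at offset i*ncols.
double get_via_raw_pointer(const double* data, int ncols, int i, int j)
{
  return data[i*ncols + j];
}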
A C/Fortran interface to this would be great. I think it could be added in a similar way that Ipopt has done it.
The functions addToSymDenseMatrixUpperTriangle and transAddToSymDenseMatrixUpperTriangle for sparse matrices have never been used.
The optimization routines under 'hiopKKTLinMDS' only need to use this function for its dense part (see here), and we always assume the sparse part of HessMDS is a diagonal matrix (in order to compute its inverse efficiently).
New functions addToSymSparseMatrixUpperTriangle may be required for the sparse linear algebra implementation, which is handled in #85. The RAJA variants may be implemented later.
MA86 Z headers are not working with C++ when std::complex includes are present.
A temporary, dirty-ish fix below: edit hsl_mc69z.hpp and hsl_ma86z.hpp as in steps 2, 3, and 4.
1. Locate the line
#include <complex.h>
2. After it, add these lines:
#ifdef __cplusplus
extern "C" {
#endif
3. Replace "complex" with "_Complex" wherever it occurs in "double complex" or "complex double".
4. Before the last #endif, insert these lines:
#ifdef __cplusplus
}
#endif
For the function transAddToSymDenseMatrixUpperTriangle, it is unclear from the current documentation which matrix should be transposed. From what is in the kernel, it appears that the symmetric sparse triplet matrix is the one being transposed. If this is the case, then this may be related to issue #77. If it is the output matrix that should be transposed, then the kernel should be modified and the documentation updated accordingly.
It might be helpful to consider a coding style that makes class member variables distinguishable (e.g., ending each member variable name with _). IMHO it would significantly improve readability of the code, especially in functions that are a couple of hundred lines long and operate on a few dozen variables. Considering the size of the HiOp code, making wholesale changes would require some work, but even setting style guidelines for future contributions would be quite helpful.
In hiopMatrixMDS::copyFrom, we have:
virtual void copyFrom(const hiopMatrixMDS& m)
{
  mSp->copyFrom(*m.mSp);
  mDe->copyFrom(*m.mDe);
}
Yet hiopMatrixSparseTriplet::copyFrom has:
void hiopMatrixSparseTriplet::copyFrom(const hiopMatrixSparseTriplet& dm)
{
  assert(false && "this is to be implemented - method def too vague for now");
}
Therefore this method will always fail. Should it not be a method of hiopMatrixMDS, or should there be an implementation of hiopMatrixSparseTriplet::copyFrom that does not immediately assert? A possible implementation is sketched below.
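For reference, a minimal sketch of a non-asserting implementation, assuming triplet storage in member arrays iRow_, jCol_, and values_ (names borrowed from the timesVec kernel quoted later on this page) and matrices of identical dimensions and nonzero counts:

#include <cassert>
#include <cstring>

// Hypothetical implementation; member names nrows_, ncols_, nnz_, iRow_,
// jCol_, and values_ are assumptions, not HiOp's verified members.
void hiopMatrixSparseTriplet::copyFrom(const hiopMatrixSparseTriplet& src)
{
  assert(nrows_ == src.nrows_ && ncols_ == src.ncols_ && nnz_ == src.nnz_);
  memcpy(iRow_,   src.iRow_,   nnz_ * sizeof(int));
  memcpy(jCol_,   src.jCol_,   nnz_ * sizeof(int));
  memcpy(values_, src.values_, nnz_ * sizeof(double));
}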
Hey,
Just ran into this one.
In hiopNlpFormulation.cpp, you create a hiop logger with the verbosity level as a parameter. However, you immediately grab the verbosity level option from the default-constructed hiopOptions, which prevents anyone from actually specifying the verbosity level:
options = new hiopOptions(/*filename=NULL*/);
hiopOutVerbosity hov = (hiopOutVerbosity) options->GetInteger("verbosity_level");
log = new hiopLogger(this, hov, stdout);
Method hiopVectorPar::logBarrier seems to accumulate error and fails unit tests. In the current unit test the error is of order 1e-11, whereas the expected accuracy is around machine precision (as is the case for all the other vector kernel tests).
Consider modifying the sequential algorithm to use Kahan summation, and check whether optimization at -O2 or -O3 would destroy the precision restoration in Kahan summation.
According to @cnpetra, for the logBarrier method accuracy is more important than performance, because it directly affects the convergence of the overall algorithm. A sketch of compensated summation follows.
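A minimal sketch of compensated (Kahan) summation for the local log-barrier term; the loop body follows the description above, not HiOp's actual implementation:

#include <cmath>

double log_barrier_local(const double* x, int n_local)
{
  double sum = 0.0;
  double c   = 0.0;                 // running compensation for lost low-order bits
  for(int i = 0; i < n_local; ++i) {
    double y = std::log(x[i]) - c;
    double t = sum + y;             // low-order bits of y are lost here...
    c = (t - sum) - y;              // ...and recovered algebraically here
    sum = t;
  }
  return sum;
}

Note that reassociating optimizations (e.g., -ffast-math, and possibly higher -O levels on some compilers) can cancel the compensation term, which is exactly the concern raised above.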
It seems there may be a bug in the MPI part of the hiopMatrixDenseRowMajor::timesMatTrans function. Perhaps this line should instead be:
double* Wglob= W.new_mxnlocal_buff(); //[n2Red];
The local buffers of this and the W matrix are not necessarily the same.
In README_developers.md, the required tests involve using clang tools for address sanitization. Clang does not work so readily with OpenMP, which is now a dependency for the new RAJA linear algebra library. To run these, we have to disable HIOP_USE_RAJA. @cnpetra, would you still like to have clang tool checks for the non-RAJA parts of HiOp, or should we pursue other tests as prerequisites for submitting PRs?
When testing the shiftRows method of hiopMatrixDense, I am running into a segfault on our Power9 systems:
233 A.shiftRows(shift);
(gdb)
Program received signal SIGSEGV, Segmentation fault.
0x0000000010017474 in hiop::hiopMatrixDense::shiftRows (this=0x7fffffffe410, shift=4) at /people/manc568/projects/hiop/src/LinAlg/hiopMatrix.cpp:256
256 assert(test1==M[shift<0?0:m_local][0] && "a different copy technique than memcpy is needed on this system");
Should we add a fallback method in case memcpy does not work? How do you suggest we test this method? @cnpetra
When running HiOp with MAGMA, the following output is being printed to stdout.
GPU FACT in 0.0493771 sec at TFlops: 0.000712128
GPU FACT in 0.0434769 sec at TFlops: 0.00080877
GPU FACT in 0.0436565 sec at TFlops: 0.000805443
GPU FACT in 0.043938 sec at TFlops: 0.000800282
GPU FACT in 0.0430558 sec at TFlops: 0.00081668
GPU FACT in 0.0432639 sec at TFlops: 0.000812752
GPU FACT in 0.0433745 sec at TFlops: 0.00081068
GPU FACT in 0.0432896 sec at TFlops: 0.000812269
GPU FACT in 0.0591214 sec at TFlops: 0.000594756
GPU FACT in 0.0609099 sec at TFlops: 0.000577292
I'd like this output to be suppressed by default and perhaps enabled based on the verbosity level. I think it is printed via this line. A possible fix is sketched below.
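One possible fix, sketched under the assumption that this code has access to HiOp's logger (the nlp->log->printf(...) calls seen elsewhere in HiOp) and that hovScalars is an appropriate verbosity level for this message, would be to route the output through the logger instead of printing it unconditionally:

// Hypothetical: emit the timing message only at sufficiently high verbosity;
// variable names are illustrative.
nlp->log->printf(hovScalars, "GPU FACT in %g sec at TFlops: %g\n",
                 time_seconds, tflops);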
Setting up three matrices like so:
hiop::hiopMatrixDense A(M_global, K_global, k_partition, comm);
hiop::hiopMatrixDense M(K_global, N_global, n_partition, comm);
hiop::hiopMatrixDense W(M_global, N_global, n_partition, comm);
I then set values and attempt to call timesMat:
A.setToConstant(A_val);
W.setToConstant(W_val);
M.setToConstant(M_val);
A.timesMat(beta, W, alpha, M);
real_type expected = (beta * W_val) + (alpha * A_val * M_val * K_global);
const int fail = verifyAnswer(&W, expected);
This results in a segfault.
In our internal branch, the linear algebra factory has a method (createMatrixSparse) to create an instance of the abstract class hiopMatrixSparse, choosing the appropriate implementation (e.g., RAJA vs. default).
Seeing that there are other implementations of sparse matrices, how should we alter the factory class API to create other kinds of sparse matrices?
A few options: createMatrixSparseSym, createMatrixSparse, and others (a sketch is given below). CC @pelesh
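A hypothetical shape for such a factory API, with one entry point per mathematical kind of matrix and the memory-space string selecting between the RAJA and default implementations (class names come from this page; the constructor signatures are assumptions):

class LinearAlgebraFactory
{
public:
  static hiopMatrixSparse* createMatrixSparse(const std::string& mem_space,
                                              int rows, int cols, int nnz)
  {
    // "default" selects the plain CPU implementation; anything else selects
    // the RAJA implementation parameterized by the memory space.
    if(mem_space == "default")
      return new hiopMatrixSparseTriplet(rows, cols, nnz);
    return new hiopMatrixRajaSparseTriplet(rows, cols, nnz, mem_space);
  }

  // Separate entry point for symmetric storage, per the options listed above.
  static hiopMatrixSparse* createMatrixSparseSym(const std::string& mem_space,
                                                 int n, int nnz);
};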
Should we consider moving the implementation of the solve() method below from hiopLinSolver.hpp to a source file? Simplifying the API would likely help porting to GPU and managing compile-time dependencies. See also #43.
void solve(hiopVector& x_)
{
  assert(M.n() == M.m());
  assert(x_.get_size() == M.n());
  int N = M.n(), LDA = N, info;
  if(N == 0) return;

  hiopVectorPar* x = dynamic_cast<hiopVectorPar*>(&x_);
  assert(x != NULL);

  char uplo = 'L'; // M is upper in C++ so it's lower in fortran
  int NRHS = 1, LDB = N;
  DSYTRS(&uplo, &N, &NRHS, M.local_buffer(), &LDA, ipiv, x->local_data(), &LDB, &info);
  if(info < 0) {
    nlp->log->printf(hovError, "hiopLinSolverIndefDenseLapack: DSYTRS returned error %d\n", info);
    assert(false);
  } else if(info > 0) {
    nlp->log->printf(hovError, "hiopLinSolverIndefDenseLapack: DSYTRS returned error %d\n", info);
  }
}
I'm getting linker errors of the following type:
[ 68%] Linking CXX executable nlpDenseCons_ex3.exe
Undefined symbols for architecture x86_64:
"daxpy", referenced from:
hiop::hiopVectorPar::axpy(double, hiop::hiopVector const&) in libhiop.a(hiopVector.cpp.o)
hiop::hiopMatrixDense::addMatrix(double, hiop::hiopMatrix const&) in libhiop.a(hiopMatrix.cpp.o)
OpenBLAS, LAPACK, and OpenMPI are installed.
This is the cmake output:
cmake_out.txt
Finally, I have a good description of this case. The MOI interface works fine using the dense algebra. With the sparse one, I have the following issue.
I pass this sparsity pattern to Hiop:
iHSS = Int32[0, 1, 2, 6, 9, 15, 18, 7, ..., 14, 16, 21, 22, 23, 9, 13, 14, 18, 22, 23]
jHSS = Int32[0, 1, 2, 6, 6, 6, 6,..., 22, 22, 22, 22, 22, 22, 22, 23, 23, 23, 23, 23, 23
The relevant entries are only the first 3.
At the first iteration on the IEEE 9-bus case, with 3 generators and the lambdas initialized to 0, I get the following Hessian:
nzval = [2200.0, 1700.0, 2450.0, 0.0, ..., -0.0, 0.0, -0.0, 0.0, 0.0, 0.0, -0.0, -0.0, 0.0]
Because the multipliers lambda are 0, this seems right to me, given there are 3 generators.
Hiop then aborts with this message:
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
0 8.3631250e+03 1.550e+00 3.530e+03 -1.00 0.000e+00 0.000e+00 -(-)
[Warning] hiopLinSolverIndefDense error: 11 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] hiopLinSolverIndefDense error: 11 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
julia: /scratch/mschanen/git/hiop/src/Optimization/hiopPDPerturbation.hpp:184: bool hiop::hiopPDPerturbation::compute_perturb_singularity(double&, double&, double&, double&): Assertion `delta_cc == 0. && delta_cd == 0.' failed.
If I set the multipliers to nonzero at the first iteration, it works fine, and the problem even converges to the right solution. So I'm wondering what is going on. Is this expected behavior?
Thanks!
Building HiOp with default options on macOS 10.12.6 and running the tests yields errors for the nlpDenseCons_ex1 tests. Log files can be found at: https://gist.github.com/goxberry/8bdc80e6dcd4d15ed0a7c5130009d6aa
The configuration I'm using is built by spack, so I have some flexibility in choosing libraries, but all of these libraries are included via RPATH directives. My impression is that linking isn't an issue, but I could be wrong about that.
It might be helpful to specify and better document how to select a vector pattern. Currently the pattern is selected by a vector of 1.0s and 0.0s in double precision. The rules are a little vague as to what happens when a pattern vector element is neither one nor zero. Possible solutions I see are:
The argument related to GPU implementations is something like this: if pattern vector elements are either one or zero, then instead of using conditionals, which could cause warp divergence, one could simply do an elementwise multiply with the pattern vector to select the pattern. For memory-bound computations, this could lead to a better-performing implementation, I think. A comparison sketch follows.
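For illustration, here are the two ways of applying a 0/1 pattern vector in a simple axpy-style kernel (function and variable names are illustrative, not HiOp's API):

void axpy_w_pattern_branching(int n, double alpha,
                              const double* x, const double* pattern, double* y)
{
  for(int i = 0; i < n; ++i)
    if(pattern[i] == 1.0)   // conditional: may cause warp divergence on GPU
      y[i] += alpha * x[i];
}

void axpy_w_pattern_multiply(int n, double alpha,
                             const double* x, const double* pattern, double* y)
{
  for(int i = 0; i < n; ++i)
    y[i] += alpha * x[i] * pattern[i]; // branch-free: relies on pattern being exactly 0 or 1
}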
Method axdzpy_w_pattern is implemented in class hiopVectorPar, but there is no abstract interface for it in the (almost) pure virtual class hiopVector. Is this method needed?
In a couple of places, (essential) code is placed inside assert(...). This causes undefined behavior and/or crashes when HiOp is compiled with NDEBUG; an illustration follows. @junkudo
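A minimal illustration of the pitfall, with a hypothetical function standing in for the essential code:

#include <cassert>

int do_essential_work();   // hypothetical function with a needed side effect

void bad()
{
  assert(do_essential_work() == 0);  // the whole call is compiled out under NDEBUG
}

void good()
{
  int ierr = do_essential_work();    // the side effect always happens
  assert(ierr == 0);                 // only the check disappears under NDEBUG
}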
My HiOp build from the dev/NewtonMDS branch fails at the compile stage with the message:
make[2]: *** No rule to make target `/usr/lib64/libopenblas.so -lmagma -L/share/apps/cuda/9.2/lib64 -lculibos -lcublas -lcublasLt -lnvblas -lcusparse -lcudart -lcudadevrt', needed by `src/LinAlg/test_hiopLinAlgComplex.exe'. Stop.
It seems that there is some mix-up with the CMake paths. I used the following configuration for the build:
CC=mpicc CXX=mpicxx FC=mpif90 cmake \
-DHIOP_USE_MPI=1 \
-DHIOP_USE_GPU=1 \
-DHIOP_MAGMA_DIR="/.../exasgd/newell/magma" \
-DCMAKE_INSTALL_PREFIX=$HIOP_DIR \
../hiop
I used cmake 3.13.4, gcc 7.4.0, openmpi 3.1.3, cuda 9.2, OpenBLAS 0.3.3, and LAPACK 3.4.2 for the build. The configure step works fine, but something goes wrong with the complex linear algebra compilation.
Verbose output gives me this:
make -f src/LinAlg/CMakeFiles/test_hiopLinAlgComplex.exe.dir/build.make src/LinAlg/CMakeFiles/test_hiopLinAlgComplex.exe.dir/depend
make[2]: Entering directory `/.../exasgd/src/hiop/build_newell'
cd /.../exasgd/src/hiop/build_newell && /.../cmake/3.13.4/bin/cmake -E cmake_depends "Unix Makefiles" /.../exasgd/src/hiop/hiop /.../exasgd/src/hiop/hiop/src/LinAlg /.../exasgd/src/hiop/build_newell /.../exasgd/src/hiop/build_newell/src/LinAlg /.../exasgd/src/hiop/build_newell/src/LinAlg/CMakeFiles/test_hiopLinAlgComplex.exe.dir/DependInfo.cmake --color=
make[2]: Leaving directory `/.../exasgd/src/hiop/build_newell'
make -f src/LinAlg/CMakeFiles/test_hiopLinAlgComplex.exe.dir/build.make src/LinAlg/CMakeFiles/test_hiopLinAlgComplex.exe.dir/build
make[2]: Entering directory `/.../exasgd/src/hiop/build_newell'
make[2]: *** No rule to make target `/usr/lib64/libopenblas.so -L/.../exasgd/newell/magma/lib -lmagma -L/.../cuda/9.2/lib64 -lculibos -lcublas -lcublasLt -lnvblas -lcusparse -lcudart -lcudadevrt', needed by `src/LinAlg/test_hiopLinAlgComplex.exe'. Stop.
Consider implementing the loop that counts negative eigenvalues in the hiopKKTLinSysCompressedMDSXYcYd class as a vector kernel and adding it to the abstract vector interface. Otherwise, the method hiopKKTLinSysCompressedMDSXYcYd::update depends on an implementation detail of the Hxs_ vector.
This loop will not run on the GPU, for example. A sketch of such a kernel is given below.
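A sketch of such a kernel as a RAJA reduction; the raw pointer stands in for access to the local array of Hxs_, and the CUDA policies are placeholders for whatever policies HiOp's RAJA build uses:

#include <RAJA/RAJA.hpp>

// Counts entries below zero; runs on the device under the chosen policies.
int count_negative(const double* data, int n)
{
  RAJA::ReduceSum<RAJA::cuda_reduce, int> num_neg(0);
  RAJA::forall<RAJA::cuda_exec<128>>(RAJA::RangeSegment(0, n),
    [=] RAJA_DEVICE (RAJA::Index_type i) {
      if(data[i] < 0.0)
        num_neg += 1;
    });
  return num_neg.get();
}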
There may be an issue in the Magma no-pivoting solver interface, or even a bug in Magma. The other possibility is that Example 4 cannot be solved when using a no-pivoting linear solver. When switching between hybrid and cpu modes in Example 4, the number of iterations and the convergence rate change.
It seems that the Magma no-pivoting solver diverges. I am not sure, though, whether the cpu compute mode uses the no-pivoting function or always uses Bunch-Kaufman.
Below is the Example 4 output for hybrid and cpu compute modes, respectively.
hybrid compute mode - Magma factorization and Lapack solve:
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
0 3.9800005e+02 4.990e+02 4.000e+00 -1.00 0.000e+00 0.000e+00 -(-)
[Warning] KKT_MDS_XYcYd linsys: MagmaNopiv size 503 (403 cons) (safe_mode=0)
1 3.6824691e+02 5.536e+02 3.990e+00 -1.00 1.579e-03 2.372e-03 1(s)
2 3.3431951e+02 5.126e+02 3.989e+00 -1.00 6.706e-05 4.510e-05 1(s)
3 6.2429733e+02 4.067e+02 3.987e+00 -1.00 6.218e-05 6.111e-05 1(s)
4 1.5031313e+03 3.637e+02 1.647e+01 -1.00 1.336e-04 3.048e-05 1(s)
[Warning] Requesting additional accuracy and stability from the KKT linear system at iteration 4 (safe mode ON)
[Warning] KKT_MDS_XYcYd linsys: MagmaBuKa size 503 (403 cons) (safe_mode=1)
5 1.0674827e+02 1.112e+02 1.509e+01 -1.00 3.784e-01 6.944e-01 1(s)
6 5.0647226e+00 5.952e+01 1.038e+01 -1.00 3.490e-01 4.646e-01 1(s)
7 4.1846536e+00 5.900e+01 9.514e+00 -1.00 4.810e-02 8.682e-03 1(s)
8 4.1624661e+00 5.893e+01 6.328e+00 -1.00 9.055e-03 1.154e-03 1(s)
9 -4.9917037e+01 2.292e-01 1.064e+01 -1.00 6.468e-03 9.961e-01 1(s)
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
10 -4.9924417e+01 2.221e-01 2.031e+01 -1.00 7.325e-01 3.102e-02 1(s)
11 -4.3510181e+01 1.070e-12 1.171e+00 -1.00 8.872e-01 1.000e+00 1(s)
12 -4.3197637e+01 4.252e-14 1.000e-06 -1.00 1.000e+00 1.000e+00 1(f)
13 -4.9686273e+01 1.054e-12 1.748e+00 -2.55 9.219e-01 1.000e+00 1(f)
14 -4.9973780e+01 1.013e-13 2.828e-08 -2.55 1.000e+00 1.000e+00 1(f)
15 -4.9992605e+01 3.642e-14 1.504e-09 -3.82 1.000e+00 1.000e+00 1(f)
16 -4.9993471e+01 8.882e-16 1.504e-09 -3.82 1.000e+00 1.000e+00 1(f)
17 -4.9993739e+01 2.442e-15 1.729e-03 -5.73 9.710e-01 1.000e+00 1(f)
18 -4.9994734e+01 1.179e-13 1.845e-11 -5.73 1.000e+00 1.000e+00 1(f)
19 -4.9994724e+01 2.887e-15 1.845e-11 -5.73 1.000e+00 1.000e+00 1(f)
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
20 -4.9994868e+01 6.661e-16 3.077e-04 -6.00 8.548e-01 1.000e+00 1(f)
21 -4.9994888e+01 6.883e-14 1.000e-11 -6.00 1.000e+00 1.000e+00 1(f)
Successfull termination.
Total time 2.521 sec
Hiop internal time: total 2.515 sec avg iter 0.120 sec
internal total std dev across ranks 0.000 percent
Fcn/deriv time: total=0.004 sec ( obj=0.000 grad=0.000 cons=0.001 Jac=0.002 Hess=0.001)
Fcn/deriv total std dev across ranks 0.000 percent
Fcn/deriv #: obj 56 grad 22 eq cons 57 ineq cons 57 eq Jac 22 ineq Jac 22
Total KKT time 2.506 sec
update init 1.725sec update linsys 0.077 sec fact 0.655 sec
solve rhs-manip 0.007 sec triangular solve 0.042 sec
cpu compute mode - Lapack factorization and solve:
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
0 3.9800005e+02 4.990e+02 4.000e+00 -1.00 0.000e+00 0.000e+00 -(-)
1 1.8397700e+01 1.818e+02 1.457e+00 -1.00 3.341e-01 6.357e-01 1(s)
2 -4.3390041e+01 5.305e+01 4.253e-01 -1.00 7.654e-01 7.081e-01 1(s)
3 -4.9598500e+01 6.175e-13 2.539e-01 -1.00 8.152e-01 1.000e+00 1(s)
4 -4.9406538e+01 5.160e-13 3.078e+01 -1.00 9.388e-01 5.909e-02 1(f)
5 -4.3566387e+01 3.950e-13 1.000e-06 -1.00 1.000e+00 1.000e+00 1(f)
6 -4.9711480e+01 6.206e-13 1.601e+00 -2.55 9.261e-01 1.000e+00 1(f)
7 -4.9974904e+01 1.774e-13 2.828e-08 -2.55 1.000e+00 1.000e+00 1(f)
8 -4.9992701e+01 3.353e-14 1.504e-09 -3.82 1.000e+00 1.000e+00 1(f)
9 -4.9993494e+01 1.643e-14 1.504e-09 -3.82 1.000e+00 1.000e+00 1(f)
iter objective inf_pr inf_du lg(mu) alpha_du alpha_pr linesrch
10 -4.9993744e+01 1.310e-14 1.732e-03 -5.73 9.710e-01 1.000e+00 1(f)
11 -4.9994740e+01 2.887e-15 1.845e-11 -5.73 1.000e+00 1.000e+00 1(f)
12 -4.9994726e+01 1.377e-14 1.845e-11 -5.73 1.000e+00 1.000e+00 1(f)
13 -4.9994868e+01 7.772e-15 3.029e-04 -6.00 8.509e-01 1.000e+00 1(f)
14 -4.9994888e+01 2.941e-12 1.000e-11 -6.00 1.000e+00 1.000e+00 1(f)
Successfull termination.
Total time 0.131 sec
Hiop internal time: total 0.128 sec avg iter 0.009 sec
internal total std dev across ranks 0.000 percent
Fcn/deriv time: total=0.002 sec ( obj=0.000 grad=0.000 cons=0.000 Jac=0.001 Hess=0.001)
Fcn/deriv total std dev across ranks 0.000 percent
Fcn/deriv #: obj 15 grad 15 eq cons 16 ineq cons 16 eq Jac 15 ineq Jac 15
Total KKT time 0.123 sec
update init 0.009sec update linsys 0.049 sec fact 0.059 sec
solve rhs-manip 0.004 sec triangular solve 0.002 sec
addToSymDenseMatrixUpperTriangle and transAddToSymDenseMatrixUpperTriangle for the symmetric sparse triplet classes both seem not to take into account the symmetric nature of the matrices when adding them to the output matrices.
For reference, timesVec has the following section of code that takes this into account:
y[iRow_[i]] += alpha * x[jCol_[i]] * values_[i];
if(iRow_[i]!=jCol_[i])
  y[jCol_[i]] += alpha * x[iRow_[i]] * values_[i];
A way of fixing this issue would be to have the existing addToSymDenseMatrixUpperTriangle look something like the following, with a similar fix for transAddToSymDenseMatrixUpperTriangle:
void hiopMatrixSymSparseTriplet::addToSymDenseMatrixUpperTriangle(int row_start, int col_start,
                                                                  double alpha, hiopMatrixDense& W) const
{
  assert(row_start>=0 && row_start+nrows<=W.m());
  assert(col_start>=0 && col_start+ncols<=W.n());
  assert(W.n()==W.m());
  double** WM = W.get_M();
  for(int it=0; it<nnz; it++) {
    assert(iRow[it]<=jCol[it] && "sparse symmetric matrices should contain only upper triangular entries");
    int i = iRow[it]+row_start;
    int j = jCol[it]+col_start;
    assert(i<W.m() && j<W.n()); assert(i>=0 && j>=0);
    assert(i<=j && "symMatrices not aligned; source entries need to map inside the upper triangular part of destination");
    WM[i][j] += alpha*values[it];
    if(iRow[it] != jCol[it]) {
      // mirror the off-diagonal entry across the diagonal
      i = jCol[it]+row_start;
      j = iRow[it]+col_start;
      assert(i<W.m() && j<W.n()); assert(i>=0 && j>=0);
      assert(i<=j && "symMatrices not aligned; source entries need to map inside the upper triangular part of destination");
      WM[i][j] += alpha*values[it];
    }
  }
}
If this fix is not implemented, only one half of the symmetric sparse matrix will be added to the destination matrix every time this function is called.
It seems that the local variables howManyToCopy and howManyToCopyDest are flipped around in function hiopVectorPar::startingAtCopyFromStartingAt. howManyToCopy is the number of elements at the destination that will be overwritten, and howManyToCopyDest is the number of source elements that will be written to the destination. The function asserts
assert(howManyToCopy <= howManyToCopyDest);
which is, I think, the opposite of what is intended (despite the names of these variables suggesting otherwise).
Also, the function arguments seem to have misleading meanings, since start_idx_src is the destination offset and start_idx_dest is the source offset.
This (possible) bug does not affect the code, because at the only call site the source and destination are of the same size and both offsets are zero. A sketch of the intended semantics follows.
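For clarity, a sketch of the intended semantics with unambiguous names (illustrative only; HiOp's actual signature differs):

#include <cassert>
#include <cstring>

// The number of elements copied comes from the source, and the assert
// checks that they fit at the destination.
void copy_from_starting_at(double* dest, int dest_size, int dest_offset,
                           const double* src, int src_size, int src_offset)
{
  int how_many_to_copy = src_size - src_offset;        // elements taken from the source
  assert(how_many_to_copy <= dest_size - dest_offset); // must fit at the destination
  memcpy(dest + dest_offset, src + src_offset, how_many_to_copy * sizeof(double));
}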
Heavy use is made of copyFrom and similar methods, yet the abstract interface does not require a hiopMatrix implementation to provide this method. In contrast, hiopVector does have copyTo and copyFrom methods in its abstract interface. I believe this could lead to implementation-specific code, which will cause problems when we attempt to migrate to other implementations of hiopMatrix.
Review the RAJA kernels (currently in the raja-dev branch) and flag potential bottlenecks. The purpose of this review is pre-screening of potential performance issues, to give us a heads-up on what to pay attention to when profiling the performance.
The current implementation of the RAJA kernels was done with the objective of ensuring accurate computations. Some kernels, such as hiopVectorRajaPar::projectIntoBounds, are implemented in a way that is not quite "GPU friendly". Help from the RAJA developers in identifying other potential bottlenecks, along with suggestions on how to implement these kernels better, is very much appreciated.
RAJA kernels are implemented in the following HiOp classes:
hiopVectorRajaPar
hiopMatrixRajaDense
hiopMatrixRajaSparseTriplet
hiopMatrixRajaSymSparseTriplet
Currently, RAJA kernels run only within unit tests. See tests in:
testVector.cpp
testMatrix.cpp
testMatrixSparse.cpp
The existing LSQ duals calculator needs to be revisited and re-engineered to work with the generic hiopNlpFormulation. Currently it assumes a hiopNlpDenseCons NLP:
hiop/src/Optimization/hiopDualsUpdater.cpp
Line 111 in 2252b90
It seems that the *SymDenseMatrixUpperTriangle methods add elements of a rectangular matrix (pointed to by this) into the upper triangular part of the matrix W (passed as the input argument). The methods use only local data indices and may not work for distributed-memory partitioned matrices unless the caller provides row and column start indices that guarantee data is written to the upper triangular part of W.
It would be good to better document the preconditions for calling these functions, as they are nontrivial. Are these methods intended for use when both, neither, or only W is MPI-partitioned?
The nlpMDS_ex5.exe test fails to converge on an Intel platform with a Volta GPU. HiOp is built with GPU support and Kron reduction enabled. When running with 1 MPI rank on one GPU device, the following error message is obtained:
...
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] KKT_MDS_XYcYd linsys: Detected negative eigenvalues in (1,1) sparse block.
83 -1.3559463e+03 1.260e-02 5.259e-03 -2.55 1.000e+00 1.000e+00 1(f)
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] KKT_MDS_XYcYd linsys: Detected negative eigenvalues in (1,1) sparse block.
84 -1.3560463e+03 4.662e-02 2.022e-03 -3.82 1.000e+00 1.000e+00 1(h)
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] hiopLinSolverMagmaBuka error: 191 entry in the factorization's diagonal
is exactly zero. Division by zero will occur if it a solve is attempted.
[Warning] KKT_MDS_XYcYd linsys: Detected negative eigenvalues in (1,1) sparse block.
Panic: minimum step size reached. The problem may be infeasible or the gradient inaccurate. Will exit here.
85 -1.3560463e+03 4.662e-02 2.028e-03 -3.82 1.000e+00 5.551e-17 54(?)
Couldn't solve the problem.
Linesearch returned unsuccessfully (small step). Probable cause: inaccurate gradients/Jacobians or infeasible problem.
Total time 6.014 sec
Hiop internal time: total 6.002 sec avg iter 0.071 sec
internal total std dev across ranks 0.000 percent
Fcn/deriv time: total=0.005 sec ( obj=0.001 grad=0.000 cons=0.001 Jac=0.002 Hess=0.001)
Fcn/deriv total std dev across ranks 0.000 percent
Fcn/deriv #: obj 172 grad 86 eq cons 173 ineq cons 173 eq Jac 86 ineq Jac 86
Total KKT time 5.986 sec
update init 0.001sec update linsys 0.293 sec fact 5.673 sec
solve rhs-manip 0.013 sec triangular solve 0.005 sec
solve4 trouble: returned -4 (with objective is -1.356046289760e+03)
srun: error: dl08: task 0: Exited with exit code 255
The following dependencies were used to build HiOp:
$ module list
Currently Loaded Modulefiles:
1) gcc/7.3.0 3) cmake/3.15.3 5) metis/5.1.0
2) cuda/10.2.89 4) openmpi/3.1.3 6) magma/2.5.2_cuda10.2
Please let me know what additional data would be helpful.
Two reports of build failure came in over the last week via email. Coincidentally, I've just encountered the exact same problem on Summit (using cmake 3.9.2):
CMake Error at CMakeLists.txt:7 (cmake_policy):
Policy "CMP0074" is not known to this version of CMake.
-- The C compiler identification is GNU 4.8.5
-- The CXX compiler identification is GNU 4.8.5
(...)
-- Found LAPACK libraries: /autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-9.1.0/openblas-0.3.9-aymovpat33osbzgh5gsmhyvstsol4sfp/lib/libopenblas.so;/autofs/nccs-svm1_sw/summit/.swci/1-compute/opt/spack/20180914/linux-rhel7-ppc64le/gcc-9.1.0/openblas-0.3.9-aymovpat33osbzgh5gsmhyvstsol4sfp/lib/libopenblas.so
CMake Error at src/Optimization/CMakeLists.txt:2 (target_link_libraries):
Object library target "hiopOptimization" may not link to anything.
-- Configuring incomplete, errors occurred!
See also "/ccs/home/cpetra/work/projects/hiop/build/CMakeFiles/CMakeOutput.log".
See also "/ccs/home/cpetra/work/projects/hiop/build/CMakeFiles/CMakeError.log".
Some of the methods of the hiopMatrixSparseTriplet class have an argument that is a reference to a specific matrix implementation (see, e.g., the transAddToSymDenseMatrixUpperTriangle method). This could potentially lead to cumbersome solutions when porting this class to hardware accelerators (e.g., GPU).
A minimally invasive way to go about this would be to add an enum to the matrix base class with matrix type IDs, as well as a virtual method to return the matrix type ID. Something similar was done in SUNDIALS.
With such a modification, the method transAddToSymDenseMatrixUpperTriangle can take a reference to the virtual hiopMatrix class as the input argument, and then check in the implementation whether a compatible matrix type was passed. The implementation of transAddToSymDenseMatrixUpperTriangle can then select the computation specific to the matrix layout, or throw an exception if the matrix type is incompatible.
This could keep the API cleaner and provide more extensibility. The downside of this approach is that passing an incompatible matrix type would be caught at runtime instead of at compile time. A more comprehensive solution would be to use template parameters to specify the matrix layout, but that would require more significant changes to the code. A sketch of the enum approach follows.
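A sketch of the enum-based type-ID approach described above, modeled loosely on the SUNDIALS pattern; all names and signatures are illustrative, not HiOp's actual API:

#include <cassert>

// Hypothetical type IDs in the matrix base class.
enum hiopMatrixTypeID { MatrixDense, MatrixSparseTriplet, MatrixSymSparseTriplet };

class hiopMatrix
{
public:
  virtual ~hiopMatrix() {}
  virtual hiopMatrixTypeID matrix_type_id() const = 0;
  // ... rest of the abstract interface ...
};

void hiopMatrixSparseTriplet::transAddToSymDenseMatrixUpperTriangle(
  int row_start, int col_start, double alpha, hiopMatrix& W) const
{
  // A runtime check replaces the compile-time guarantee of a concrete reference.
  assert(W.matrix_type_id() == MatrixDense && "incompatible matrix type");
  hiopMatrixDense& Wd = static_cast<hiopMatrixDense&>(W);
  // ... layout-specific computation using Wd ...
}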
With cmake -DHIOP_USE_MPI=NO .. I get the compilation issue below:
[ 86%] Building CXX object tests/CMakeFiles/testMatrix.dir/testMatrix.cpp.o
In file included from /ccs/home/cpetra/work/projects/hiop/tests/LinAlg/matrixTestsDense.hpp:59:0,
from /ccs/home/cpetra/work/projects/hiop/tests/testMatrix.cpp:61:
/ccs/home/cpetra/work/projects/hiop/tests/LinAlg/matrixTests.hpp:63:20: fatal error: optional: No such file or directory
#include <optional>
^
compilation terminated.
make[2]: *** [tests/CMakeFiles/testMatrix.dir/testMatrix.cpp.o] Error 1
make[1]: *** [tests/CMakeFiles/testMatrix.dir/all] Error 2
make: *** [all] Error 2
I am installing updated HiOp versions on our PNNL machines. Would now be an appropriate time to update the tags to v0.3? Updating the tags after every large PR would be helpful in tracking versions. I think v0.2 points at c52a6f6, which is from last December.