
cktso's Issues

Need some help on weird outputs of the benchmark demo.

Hi Dr. Chen, we recently ran some benchmark tests on cktso. The test matrices are drawn from real circuit designs, with dimensions ranging from 1.0E+04 up to 1.3E+07. We used the benchmark demos from both NICSLU and CKTSO and ran into some weird behavior. When we
run ./benchmark add20.mtx #nthreads the output is fine, something like the following:
Analysis time = 4900 us.
Factorization average time = 100 us, min time = 82 us.
Refactorization average time = 49 us, min time = 47 us.
Solve average time = 10 us, min time = 9 us.
Residual = 2.47485e-10.
Transposed solve average time = 12 us, min time = 11 us.
Residual = 2.44494e-10.
NNZ(L) = 9867, NNZ(U) = 7472.
Factorization flops = 133187, solve flops = 32283.
Determinent = 5.86668*10^(-3351).
Memory usage = 646989 bytes, max memory usage = 646989 bytes.

However, taking CKTSO as an example, if we run ./benchmark ourcircuitmatrix #nthreads, we get something like this:
Analysis time = 0 us.
Factorization average time = 0 us, min time = 0 us.
Refactorization average time = 0 us, min time = 0 us.
Solve average time = 0 us, min time = 0 us.
Residual = 7062.9.
Transposed solve average time = 0 us, min time = 0 us.
Residual = 7062.9.
NNZ(L) = 0, NNZ(U) = 0.
Factorization flops = 106382044954745, solve flops = 10.
Determinent = 4.67441e-310*10^(6.91969e-310).
Memory usage = 1480 bytes, max memory usage = 61772 bytes.

I really don't know how to tune the numerous parameters in CKTSO or NICSLU. Can you please give me some advice on tuning the solver? Other popular direct solvers such as KLU and PARDISO solved all of our test cases without much tuning, so I'm confused by these weird results.
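For context, a residual of 7062.9 together with NNZ(L) = NNZ(U) = 0 suggests the factorization never produced valid factors, so the returned x is garbage rather than merely inaccurate. Before tuning parameters, it can help to verify the solve independently; below is a minimal sketch (not using CKTSO itself) of the relative residual the benchmark is reporting, for a toy matrix in compressed-row form:

```python
# Relative residual ||Ax - b||_inf / ||b||_inf for a tiny CSR matrix.
# ap: row pointers, ai: column indices, ax: values (0-based, as CKTSO expects).

def csr_matvec(n, ap, ai, ax, x):
    """y = A @ x for an n-by-n CSR matrix."""
    y = [0.0] * n
    for i in range(n):
        for k in range(ap[i], ap[i + 1]):
            y[i] += ax[k] * x[ai[k]]
    return y

def rel_residual(n, ap, ai, ax, x, b):
    y = csr_matvec(n, ap, ai, ax, x)
    num = max(abs(y[i] - b[i]) for i in range(n))
    den = max(abs(v) for v in b)
    return num / den

# 2x2 example: A = [[2, 0], [1, 3]], b = [4, 7], exact solution x = [2, 5/3]
ap, ai, ax = [0, 1, 3], [0, 0, 1], [2.0, 1.0, 3.0]
print(rel_residual(2, ap, ai, ax, [2.0, 5.0 / 3.0], [4.0, 7.0]))  # ~0
```

A residual near machine precision means the solve succeeded; a residual of order 1e+3, as in the output above, means the factors are unusable regardless of timing.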

Helper function for the case when B is an identity matrix

When the system is relatively large (e.g. B is 20000 × 20000), creating (and allocating) such an identity matrix takes almost as long as factoring and solving the system (benchmarked in Julia). Is it possible to provide a helper function, so that we could simply pass the sparse matrix A and a preallocated output buffer B? This scenario seems quite common in numerical computation.
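For reference, the identity itself need not be expensive: in compressed-column form it is just three O(n) arrays, with no dense storage at all. A minimal sketch (illustrative, not CKTSO's internal representation):

```python
def csc_identity(n):
    """CSC arrays (0-based) for the n-by-n identity: O(n) memory.
    Column j holds a single nonzero 1.0 at row j."""
    ap = list(range(n + 1))  # column pointers: exactly one entry per column
    ai = list(range(n))      # row indices: the diagonal
    ax = [1.0] * n           # values
    return ap, ai, ax

ap, ai, ax = csc_identity(4)
print(ap)  # [0, 1, 2, 3, 4]
print(ai)  # [0, 1, 2, 3]
```

A solver-side helper could skip even this allocation by treating the RHS as implicitly identity, which is presumably what the request amounts to.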

Support inplace for CKTSO_L_SolveMV

Can we have an in-place version of CKTSO_L_SolveMV? The goal is to reduce memory allocation and usage (and thus improve performance). Right now we need to allocate x first in order to call CKTSO_L_SolveMV(id, nb, b, x, transpose); could the logic be optimized so that the memory of b can be reused?
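The request is for x and b to alias. For triangular solves this is generally feasible, since substitution only reads entries that are already finalized; a minimal dense sketch (not CKTSO code) of forward substitution overwriting b:

```python
def lower_solve_inplace(L, b):
    """Solve L y = b for lower-triangular L, overwriting b with y.
    No extra vector is needed: computing b[i] only reads the
    already-finished entries b[0..i-1]."""
    n = len(b)
    for i in range(n):
        for j in range(i):
            b[i] -= L[i][j] * b[j]
        b[i] /= L[i][i]
    return b

L = [[2.0, 0.0],
     [1.0, 4.0]]
b = [4.0, 10.0]
lower_solve_inplace(L, b)
print(b)  # [2.0, 2.0]
```

Whether CKTSO's blocked/parallel solve kernels permit the same aliasing internally is for the author to confirm.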

Support single precision

KLU does not support single precision, and it seems cktso does not either. How hard would it be to add support for single precision? I also wonder why most sparse direct solvers do not support single precision.
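One likely reason (my understanding, not the author's statement): circuit matrices are often severely ill-conditioned, and single-precision pivots can lose all significant digits. The standard compromise in the literature is mixed-precision iterative refinement: factor cheaply in low precision, then recover double-precision accuracy with residual corrections. A toy sketch that simulates "single precision" by rounding every operation to float32:

```python
import struct

def to_f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

def solve_f32(a, b):
    """'Single precision' solve of the 1x1 system a*x = b:
    inputs and the division result are all rounded to float32."""
    return to_f32(to_f32(b) / to_f32(a))

def refine(a, b, iters=3):
    """Mixed-precision iterative refinement: residual in double,
    correction solved in single."""
    x = solve_f32(a, b)
    for _ in range(iters):
        r = b - a * x          # residual computed in full (double) precision
        x += solve_f32(a, r)   # cheap low-precision correction solve
    return x

err = abs(refine(3.0, 1.0) - 1.0 / 3.0)
print(err)  # far below single-precision round-off (~6e-8)
```

This only pays off when the matrix is well-enough conditioned for the low-precision factors to be a usable preconditioner, which may be exactly what fails on hard circuit matrices.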

FPGA utilization?

I know this may sound like something too much to be done in the near future, but have you considered utilizing cloud FPGA services to achieve more parallel speedups? Do you have any experience in this field?
I've recently read a paper by Tarek Nechma, who claims to have had success with it - though on local FPGA hardware.
Thank you for any answer or hint.

Native support when b is a 2D matrix

When b is actually a 2D matrix, the factorization cost becomes negligible (since the factorization only has to be done once). In this case, does cktso natively support b as a sparse or dense 2D matrix? If so, does cktso also solve the columns of b in parallel?
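The pattern being asked about can be sketched generically (a toy dense LU, not CKTSO's algorithm): factor A once, then solve each column of B against the stored factors; the column solves are independent, so they parallelize trivially.

```python
def lu(A):
    """In-place Doolittle LU without pivoting (fine for this diagonally
    dominant toy; real solvers pivot). A ends up holding L (unit diagonal,
    strictly below) and U (on and above the diagonal)."""
    n = len(A)
    for k in range(n):
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A

def solve(F, b):
    """Solve A x = b given the packed LU factors F."""
    n = len(b)
    y = b[:]
    for i in range(n):                # forward: L y = b
        for j in range(i):
            y[i] -= F[i][j] * y[j]
    for i in reversed(range(n)):      # backward: U x = y
        for j in range(i + 1, n):
            y[i] -= F[i][j] * y[j]
        y[i] /= F[i][i]
    return y

A = [[4.0, 1.0],
     [1.0, 3.0]]
B_cols = [[5.0, 4.0], [6.0, 7.0]]     # two right-hand sides, as columns
F = lu([row[:] for row in A])         # factor ONCE
X = [solve(F, b) for b in B_cols]     # columns independent -> parallelizable
print(X)  # [[1.0, 1.0], [1.0, 2.0]]
```

Whether CKTSO exposes such a multi-RHS entry point natively (and threads over columns) is the question for the author.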

Paper access

It seems the referenced paper is not yet indexed by IEEE. Would it be possible to provide it here if it has already been accepted and published? You could add a footnote indicating that the copyright belongs to IEEE, so it can be shared here legally.

Always returns -8 from factorization

I'm trying to use the DLL directly from Julia; however, the factorization function always returns -8, while the others seem to work fine (return 0). Here is my script:

using CEnum
using SparseArrays

const _libcktso = joinpath(dirname(@__FILE__),"..","cktso","win10_x64","cktso_l.dll")

mutable struct __cktso_l_dummy end

const ICktSo_L = Ptr{__cktso_l_dummy}

function CKTSO_L_CreateSolver(inst, iparm, oparm)
    ccall((:CKTSO_L_CreateSolver, _libcktso), Cint, (Ptr{ICktSo_L}, Ptr{Ptr{Cint}}, Ptr{Ptr{Clonglong}}), inst, iparm, oparm)
end

function CKTSO_L_DestroySolver(inst)
    ccall((:CKTSO_L_DestroySolver, _libcktso), Cint, (ICktSo_L,), inst)
end

function CKTSO_L_Analyze(inst, is_complex, n, ap, ai, ax, threads)
    ccall((:CKTSO_L_Analyze, _libcktso), Cint, (ICktSo_L, Bool, Clonglong, Ptr{Clonglong}, Ptr{Clonglong}, Ptr{Cdouble}, Cint), inst, is_complex, n, ap, ai, ax, threads)
end

function CKTSO_L_Factorize(inst, ax, fast)
    ccall((:CKTSO_L_Factorize, _libcktso), Cint, (ICktSo_L, Ptr{Cdouble}, Bool), inst, ax, fast)
end

function CKTSO_L_CleanUpGarbage(inst)
    ccall((:CKTSO_L_CleanUpGarbage, _libcktso), Cint, (ICktSo_L,), inst)
end

function CKTSO_L_Determinant(inst, mantissa, exponent)
    ccall((:CKTSO_L_Determinant, _libcktso), Cint, (ICktSo_L, Ptr{Cdouble}, Ptr{Cdouble}), inst, mantissa, exponent)
end

a = Ref{ICktSo_L}(0)
b = Cint[]
c = Clonglong[]
solver = CKTSO_L_CreateSolver(a, b, c)

A = sprand(100, 100, 0.01)
# make sure the diagonal is 1
for i in 1:100
    A[i, i] = 1
end

ap = Clonglong.(A.colptr) .- 1;  # -1 because the indices in Julia are 1-based
ai = Clonglong.(A.rowval) .- 1;  #
ax = Cdouble.(A.nzval);

CKTSO_L_Analyze(a[], Bool(false), Clonglong(0),  pointer_from_objref(ap), pointer_from_objref(ai), pointer_from_objref(ax), Cint(0)) # return 0

res = CKTSO_L_Factorize(a[], pointer_from_objref(ax), Bool(false))  # return -8

display(res)
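An outside observation, hedged because it cannot be tested here against the real cktso_l.dll: two details in the script look suspicious, and either could plausibly hand the library invalid data. First, CKTSO_L_Analyze is passed Clonglong(0) as the matrix dimension instead of 100. Second, pointer_from_objref on a Julia Array returns the address of the array header, not of its elements; for a Ptr{T} parameter, ccall already converts an Array argument to a pointer to its data (and GC-roots it for the call), so the arrays should be passed directly. A corrected sketch of the two calls:

```julia
# Hypothetical fix (untested against the real DLL): pass the true dimension
# and let ccall convert the arrays to data pointers itself.
# pointer_from_objref(ap) points at the Array *header*, not its elements,
# so the original calls would have given CKTSO garbage index arrays.
n = Clonglong(100)
CKTSO_L_Analyze(a[], false, n, ap, ai, ax, Cint(0))
res = CKTSO_L_Factorize(a[], ax, false)
display(res)
```

Whether -8 specifically maps to "invalid input" is something only the CKTSO documentation or the author can confirm.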

Parallel performance

Dear Mr. Chen,
we have tested your CKTSO matrix solver in our circuit simulation software and are generally pleased with the performance. We tried it on AMD/Intel Windows platforms, with CSR-formatted matrices (row-major).
However, we did not achieve any multi-thread performance improvement over single-thread mode, only a slowdown.
We ran transient simulations - many refactorization and solve calls.
We believe we are using the library as recommended in the user guide.

To check that we are doing everything properly, could you help with the following:

  • if we send you a number of exported MTX files with RHS vectors, could you evaluate whether there is a multi-thread speed gain in your test environment? Do you need anything else?
  • are there any statistics calls in the DLL, or output values, that show whether CKTSO actually decided to use multiple threads or fell back to sequential solving?

Thank you and best regards,
Gergely

Support custom or natural ordering

Relevant to #2: since we are dealing with a b whose number of columns may greatly exceed the number of rows, factorization time no longer matters; what matters is the quality of the factorization - more specifically, the number of fill-ins, which can make as much as a 10% difference in total computation time. If possible, we would like to use our own optimized ordering and skip the built-in AMD or any other minimum-degree algorithm.
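Mechanically, applying a custom ordering just means factorizing the symmetrically permuted matrix P A Pᵀ instead of A. A small dense sketch (illustrative; real solvers permute the compressed index arrays directly) using the classic arrowhead example, where ordering alone decides whether the factors fill in completely or not at all:

```python
def permute_symmetric(A, perm):
    """Return P A P^T: row/column i of the result is row/column perm[i] of A.
    This is the transformation a fill-reducing (or custom) ordering applies
    before factorization."""
    n = len(A)
    return [[A[perm[i]][perm[j]] for j in range(n)] for i in range(n)]

# Arrowhead matrix: eliminating vertex 0 first fills in everything;
# a custom ordering that eliminates it LAST keeps the factors sparse.
A = [[4.0, 1.0, 1.0],
     [1.0, 4.0, 0.0],
     [1.0, 0.0, 4.0]]
B = permute_symmetric(A, [1, 2, 0])
print(B)  # [[4.0, 0.0, 1.0], [0.0, 4.0, 1.0], [1.0, 1.0, 4.0]]
```

Supporting this would presumably just mean an analysis-time option that accepts a user permutation (or the natural order) in place of the built-in ordering.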

Support in-place transpose `x` in `CKTSO_L_MV`

I am not sure whether it is feasible, but I wonder if we could add an additional argument to CKTSO_L_MV, such as transpose, so that if x is a square matrix we could get a transposed x directly. The benefit is that the user may need to run transpose(x) immediately after solving Ax = B (2D); an in-place transpose would greatly reduce memory allocation and time cost when x is very large.
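For a square x the transpose itself needs no second buffer at all; swapping across the diagonal is O(1) extra memory. A minimal sketch of the operation being requested:

```python
def transpose_inplace(x):
    """Transpose a square matrix in place by swapping entries across the
    diagonal: O(1) extra memory instead of a second n*n allocation."""
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            x[i][j], x[j][i] = x[j][i], x[i][j]
    return x

x = [[1, 2],
     [3, 4]]
transpose_inplace(x)
print(x)  # [[1, 3], [2, 4]]
```

Fusing this into the solve call (the proposed transpose flag) would save one pass over x on top of the saved allocation.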
