Git Product home page Git Product logo

femex's Introduction

femex v.2.1.0

Research use for coding with PDE/FEM in 2D. Tested on Matlab 2013a with gcc-4.8 and gfortran-4.4.

System Requirements:

  • g++/gcc-4.7+ (for c++11)
  • gfortran-4.7+ and gfortran-4.4
  • Matlab 2013a+(gcc-4.4+ compatible)

if you would like to use AGMG-3.0 as solver, gfortran-4.4 is also needed. Thanks to ABI(http://www.fortran90.org/src/faq.html#abi), gfortran 4.7+ can link libraries on compiled files from gfortran 4.4, since the library is backward compatible.

ILUPACK is already included for a choice of solvers.

How to use:

  • For those who hadn't change the default gcc/gfortran .so files within Matlab, you should PRELOAD the system's gcc/gfortran libs first before starting Matlab.
  • femex only works in shell mode, window mode will report TLS problem.

In your shell, type in something like:

export LD_PRELOAD=/usr/lib/gcc/x86_64-linux-gnu/4.8/libstdc++.so:/usr/lib/gcc/x86_64-linux-gnu/4.8/libgcc_s.so:/usr/lib/gcc/x86_64-linux-gnu/4.8/libgfortran.so

then start Matlab using

matlab -nodesktop -nosplash
  • Back up Matlab's out-of-date libstdc++ dynamic library files, try to use system's library.

  • Modify Makefile.in to set MATLAB_ROOT to the directory which includes bin.

  • Simply add femex directory to current workspace path.

  • Follow the example to set up boundary conditions.

Note:

  • Since femex is built with Matlab's mex interface, it will need license for use which is not included.

femex's People

Contributors

gaz3ll3 avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar

femex's Issues

Domain decomp

Since our problem works on a geometric shape. When doing iteration for optimization, instead of doing BFGS or something else directly on the nodes/elements(fine nodes), probably we need a rough idea on doing optimization on several sub-domains. Then just like multi-level method, move our solution towards the optimal one.

Problem:

  • Algorithm on how to divide nodes/elements(Aggregates)
  • Integral difficulties, maybe only a few levels can be made. Decomp depends on element's shape, if our mesh is not structured, then we probably need a sorting on location of elements, and get a quad-tree like thing.
  • After the last level, we leave the optimization with element-wise level(piecewise constant) solution, then throw it into node-based iteration would require 3x or more memory.
  • According to above, the convergence rate is very important to us. Since coarse mesh gives a better rate but worse precision, slow convergence at the end would be the case for multilevel method.

use raw pointer instead of eigen

the bbfmm part or kifmm use Eigen to provide most linear algebra operations, but the operations are limited to multiplications and divisions. We probably should use raw pointers instead.

MKL can run in serialized mode here, since all matrices are quite small. Not needed to run in parallel.

exprtk takes too long to compile

Since we only need double precision floating point operations, one way is to build the header with template initiated with double type. Then link all things with the library. Compile process will be great reduced.

Parallelize ray-triangle algorithm

container unordered_set needs to be allocate memory in the process, which might cause problem.

a workaround is to split the all nodes into serveral buckets and each bucket will take a thread to work on.

Now the problem does not need to be solved, a 70+ seconds warming up is fine.

update readme

Need to set preload library in order to run with un-modified matlab.

other interfaces

separate mex and C++ codes into two parts.

Get Julia/Python interfaces for that. Since Matlab is not desirable sometimes, importing too many libraries.

Both can use cffi or similar wrapper or swig.

treecode update

previous treecode is slow and not accurate enough, needs to update.

DOM memory efficiency

The Raylet structure defined as

typedef struct Raylet {
    int32_t elem;
    double first[2];
    double second[2];
} Raylet;

and sizeof gives 40 bytes(since there is padding __int64). first and second take 32 bytes.

line integral not accurate

divide and conquer is good, at last step, has to consider several cases, 9 cases.

x0/side ? x1/side
y0/side ? y1/side

the corner cases are not easy to handle since there are two intersections. In this case, we divide and conquer again. to reduce to the easier 5 cases.

lumped mass with coefficient

The lumped mass is directly extracted from original matrix, that will involve some thin layer elements outside, though it will not harm the result.

Accelerate DOM

  • use OPENMP.
  • optimize some parts. throw the intersecting boundary part away may help.

DOM matrix approach

Now using matrix to calculate the solution instead of iteration.

Set up: 75 s

Iteration takes around 5~6 s per iteration.
Matrix uses 6 s for building the matrix. And around 10 s to solve (or faster).

If matrix method is faster then iteration method will take less iterations.

Find a way to make the solving process faster, since umfpack gives accurate solution to 1e-13 or so, we only need accuracy at 1e-8. Some iteration methods might help. (AMG already tried, it requires sparse structure of the matrix, it will take more time and memory on amg.)

GMRES is the next to be tried.

regularization term

implement global regularization term, associate with a weight function.

roughly speaking, TV regularization term takes form in

\int \grad f \cdot \grad f dx

which is actually a stiffness matrix. however, it might only exists on boundary or part of the domain, depends on the problem. Now, we just take stiffness.

OpenMP support revisit

Now can add OpenMP support to all other modules. However, meshing process is not necessay to be parallelized. And only some assembly process has to be faster.

Newton's method

sometimes the equation might not be linear, it is advisable to make it run with Newton's method if it is stable.

New storage

Since each nodes is stored twice in our model.

Now I think first intersection is enough.

typedef struct Raylet {
    int32_t elem;
    double first[2];
} Raylet;

which is 8 + 16 = 24 bytes(with padding). But have to set to guardian Raylet at last.

Assorted Linear Solvers

Now considering shift pyamg to C/C++ + mex combination, since Pyamg is totally serial, we could for sure avoid the annoying MPI problem with MATLAB.

TODO LIST: Some built-in helper functions

In PDEcs module,

there should be some functions like:

  • Integral over boundary
    • Specifiy all boundary or part of boundary
  • Integral over domain
    • Domain still cannot specify sub-domains
  • Solver specification
    • Use umfpack
    • Use AMG as preconditioner
      • Usually, it will be slower if matrix size is not large, in Matlab, there is no way to run a huge matrix, anyway.
  • Extract Neumann data
  • Extract Dirichlet data (however, this is not necessary)
  • Regularization terms
    • Total variation
    • Tikhonov
    • TSVD
    • Iterative
  • Gradient for the regularization terms
  • Objective function evaluation expenses estimate

lumped mass

use lumped mass to replace actual mass matrix to accelerate.

Assorted Linear Solvers

Need assorted solvers, now it uses MATLAB 's UMFPACK sparse solver, with reasonable performance.

ILUPACK's AMG solver is under integration into the project.

Considering using gmres or other Krylov methods.

Considering interfacing with Pardiso's solver, if MKL is present.

output format

If it is a vector, then column vector should be default.

Multigrid

For the solver, a refinement method is needed for meshing, along with upward projection operator and downward operator, i.e. mapping between grids.

Warning: this would take large memory for storing mesh mapping. Refining mesh now takes use of triangle's -r option, which reuse the previous nodes, but not the edges.

2016a

the libraries loading is not working for 2016a.

parallelize fmm

currently fmm are running serially. Now there are two options to do it.

  • cilkplus. supported by gcc-5/g++-5 and icc/icpc, easy to program and spawn new task.
  • tbb.
    • upward pass operates on source, bottom-up, on each level, the tasks are independent, which can run in parallel.
    • downward pass operates on tree, top-bottom, on each level, the tasks are also independent.

The concurrent queue should be a good choice.

  • each task may add new tasks append on end
  • when meeting current task, it should have all information needed from previous tasks.

use FDM to solve PAT

since FEM does not work well. The good news is FDM will be much more easier to implement and easy to use. Again, the PML is used, otherwise, we need a larger domain instead of [0, 1] x [0, 1] domain to let the wave pass through without bounce. consider the bouncing path, we do not need it to be too large.

delete quadtree

FMM is expected to be an more efficient version of treecode.

quadtree is not going to be in femex anymore.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.