Comments (6)
This is due to the serial implementation using the same code as in parallel. For example, it calculates the dual graph for partitioning the mesh (not needed in serial).
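For readers unfamiliar with the term, the dual graph mentioned above connects two cells whenever they share a facet. The following is only a minimal pure-Python sketch of that idea for simplex cells, not the DOLFINx implementation; the function name `dual_graph` and its arguments are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def dual_graph(cells, facet_size):
    """Return {cell index: sorted list of cells sharing a facet with it}.

    `cells` is a list of vertex-index tuples for simplex cells and
    `facet_size` is the number of vertices per facet
    (2 for triangles, 3 for tetrahedra).
    """
    # Map each facet (as a sorted vertex tuple) to the cells containing it.
    facet_to_cells = defaultdict(list)
    for c, vertices in enumerate(cells):
        for facet in combinations(sorted(vertices), facet_size):
            facet_to_cells[facet].append(c)

    # Two cells are neighbours in the dual graph if they share a facet.
    graph = {c: set() for c in range(len(cells))}
    for sharing in facet_to_cells.values():
        for a, b in combinations(sharing, 2):
            graph[a].add(b)
            graph[b].add(a)
    return {c: sorted(nbrs) for c, nbrs in graph.items()}


# Unit square split into two triangles along the diagonal 0-2.
triangles = [(0, 1, 2), (0, 2, 3)]
print(dual_graph(triangles, facet_size=2))
```

For a 100 x 100 x 100 cube of tetrahedra this kind of facet matching touches every cell, which is why skipping it in serial is attractive.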
from dolfinx.
Confirming that this is still an issue:
DOLFINx:
time python3 -c "import dolfinx; from mpi4py import MPI; dolfinx.UnitCubeMesh(MPI.COMM_WORLD, 100, 100, 100)"
real 0m11.179s
user 0m9.546s
sys 0m1.611s
DOLFIN:
fenics@f596c6c6e934:/root/shared$ time python3 -c"from dolfin import *; UnitCubeMesh(MPI.comm_world, 100, 100, 100)"
real 0m1.046s
user 0m0.907s
sys 0m0.140s
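Note that the shell `time` figures cover the whole process, including interpreter start-up and module imports. Whether that contributes meaningfully to the gap is not established here; a sketch like the following could be used to time only the mesh-creation call. The helper name `timed` is hypothetical, and a stand-in workload is used so that the snippet runs without either library installed.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed wall seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Stand-in workload; replace with the mesh-creation call being measured.
result, seconds = timed(sum, range(1_000_000))
print(f"{seconds:.6f} s")
```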
from dolfinx.
To avoid computing the dual graph in serial we can call a custom partitioner, which sets the destination of all cells to process 0 (in serial):
from mpi4py import MPI
import dolfinx
import numpy


def serial_partitioner(mpi_comm, nparts, tdim, cells, ghost_mode):
    # Send every cell to process 0, so no partitioning graph is needed
    dest = numpy.zeros(cells.num_nodes, dtype=numpy.int32)
    return dolfinx.cpp.graph.AdjacencyList_int32(dest)


mesh = dolfinx.UnitCubeMesh(
    MPI.COMM_WORLD, 100, 100, 100, partitioner=serial_partitioner)
dolfinx.list_timings(MPI.COMM_WORLD, [dolfinx.TimingType.wall])
With custom partitioner:
real 0m6.091s
user 0m4.249s
sys 0m2.458s
[MPI_AVG] Summary of timings                                  |  reps  wall avg  wall tot
-----------------------------------------------------------------------------------------
Build BoxMesh                                                 |     1  4.254135  4.254135
Build dofmap data                                             |     1  1.060930  1.060930
Compute SCOTCH graph re-ordering                              |     1  0.141512  0.141512
Compute dof reordering map                                    |     1  0.666798  0.666798
Compute local-to-local map                                    |     1  0.068058  0.068058
Compute-local-to-global links for global/local adjacency list |     1  0.042887  0.042887
Distribute in graph creation AdjacencyList                    |     1  0.509226  0.509226
Fetch float data from remote processes                        |     1  0.029057  0.029057
Init dofmap from element dofmap                               |     1  0.343332  0.343332
SCOTCH: call SCOTCH_graphBuild                                |     1  0.000490  0.000490
SCOTCH: call SCOTCH_graphOrder                                |     1  0.121626  0.121626
TOPOLOGY: Create sets                                         |     1  0.735610  0.735610
With standard partitioner:
real 0m9.116s
user 0m6.854s
sys 0m2.900s
[MPI_AVG] Summary of timings                                  |  reps  wall avg  wall tot
-----------------------------------------------------------------------------------------
Build BoxMesh                                                 |     1  7.280152  7.280152
Build dofmap data                                             |     1  1.074936  1.074936
Compute SCOTCH graph re-ordering                              |     1  0.140453  0.140453
Compute dof reordering map                                    |     1  0.675271  0.675271
Compute graph partition (SCOTCH)                              |     1  0.338212  0.338212
Compute local part of mesh dual graph                         |     1  2.617081  2.617081
Compute local-to-local map                                    |     1  0.069505  0.069505
Compute non-local part of mesh dual graph                     |     1  0.047709  0.047709
Compute-local-to-global links for global/local adjacency list |     1  0.044120  0.044120
Distribute in graph creation AdjacencyList                    |     1  0.515640  0.515640
Extract partition boundaries from SCOTCH graph                |     1  0.029006  0.029006
Fetch float data from remote processes                        |     1  0.032896  0.032896
Get SCOTCH graph data                                         |     1  0.000000  0.000000
Init dofmap from element dofmap                               |     1  0.348799  0.348799
SCOTCH: call SCOTCH_dgraphBuild                               |     1  0.003080  0.003080
SCOTCH: call SCOTCH_dgraphHalo                                |     1  0.035761  0.035761
SCOTCH: call SCOTCH_dgraphPart                                |     1  0.190264  0.190264
SCOTCH: call SCOTCH_graphBuild                                |     1  0.000497  0.000497
SCOTCH: call SCOTCH_graphOrder                                |     1  0.120793  0.120793
TOPOLOGY: Create sets                                         |     1  0.739001  0.739001
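As a quick check on where the saving comes from, the arithmetic below compares the drop in "Build BoxMesh" between the two tables above with the entries that only appear in the standard run. The numbers are copied from the tables; treating the three listed timers as the skipped work is an assumption, since the SCOTCH sub-timers may be nested inside "Compute graph partition (SCOTCH)".

```python
# Wall times in seconds, copied from the two timing tables above.
build_custom = 4.254135     # Build BoxMesh, custom partitioner
build_standard = 7.280152   # Build BoxMesh, standard partitioner
dual_local = 2.617081       # Compute local part of mesh dual graph
dual_nonlocal = 0.047709    # Compute non-local part of mesh dual graph
partition = 0.338212        # Compute graph partition (SCOTCH)

# Time saved in mesh construction by the custom partitioner.
saved = build_standard - build_custom
# Work that the custom partitioner skips (assumed non-overlapping timers).
skipped = dual_local + dual_nonlocal + partition

print(f"Build BoxMesh saving:      {saved:.3f} s")
print(f"Dual graph + partitioning: {skipped:.3f} s")
print(f"Share of saving explained: {skipped / saved:.1%}")
```

Under that assumption, almost all of the roughly 3 s saving is the dual graph computation and the partitioning itself, with the local dual graph dominating.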
from dolfinx.
Updated syntax:
time python -c "from dolfinx.mesh import create_unit_cube, CellType; from mpi4py import MPI; create_unit_cube(MPI.COMM_WORLD, 100, 100, 100, cell_type=CellType.tetrahedron)"
from dolfinx.
def serial_partitioner(mpi_comm, nparts, tdim, cells, ghost_mode):
    dest = numpy.zeros(cells.num_nodes, dtype=numpy.int32)
    return dolfinx.cpp.graph.AdjacencyList_int32(dest)
New syntax:
import dolfinx
import numpy as np

def serial_partitioner(comm, n, m, topo):
    dest = np.zeros(topo.num_nodes, dtype=np.int32)
    return dolfinx.cpp.graph.AdjacencyList_int32(dest)
from dolfinx.
OK, so we could (automatically) just call this "null" partitioner when running in serial. It knocks off about 25% of the time: on my Mac, it goes down from about 11 s to 8 s.
However, if we look at the timings with dolfinx.list_timings, we see:
Compute local part of mesh dual graph | 1 2.886924 2.886924
Topology: create | 1 3.307674 3.307674
The local dual graph is still computed, because it is used for reordering. Probably this didn't happen in old dolfin, which is why it is so fast to create a simple mesh. I really wonder if we shouldn't just close this issue as "won't fix"...
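The idea of switching to the "null" partitioner automatically in serial could be sketched as below. This is only an illustration of the selection logic, not DOLFINx code: `select_partitioner`, `null_partitioner` and `default_partitioner` are hypothetical names, and the partitioners are simplified stand-ins that take a cell count rather than the real partitioner arguments.

```python
def select_partitioner(comm_size, serial_partitioner, parallel_partitioner):
    """Pick the "null" partitioner on one process, the default otherwise."""
    return serial_partitioner if comm_size == 1 else parallel_partitioner


def null_partitioner(num_cells):
    # Stand-in: every cell is sent to rank 0, so no graph is needed.
    return [0] * num_cells


def default_partitioner(num_cells):
    # Placeholder for a real graph partitioner (hypothetical).
    raise NotImplementedError("graph partitioning is not sketched here")


# With mpi4py, comm_size would be MPI.COMM_WORLD.size; 1 is hard-coded here.
chosen = select_partitioner(1, null_partitioner, default_partitioner)
print(chosen(4))
```

As the timings above show, this would only remove the partitioning cost; the local dual graph used for reordering would still be computed.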
from dolfinx.