Comments (74)
Call graph with https://github.com/jrfonseca/gprof2dot
Okay this is just too much fun
from sverchok.
Total exec time: NodeTree.001 1.192e-01
DAG build time: 4.996e-04 0.4%
Nodes:
SvRxNumberLinspace: Linspace 5.717e-04 0.5%
SvRxNumberLinspace: Linspace.001 1.887e-03 1.6%
SvRxNumberLinspace: Linspace.002 1.819e-02 15.3%
SvRxDebugPrint: Debug Print 6.776e-02 56.8%
SvRxNumberLinspace: Linspace.003 3.020e-02 25.3%
Node call time total: 1.186e-01 99.5%
Functions:
Debug Print 11 6.656e-02 55.8%
Linspace 123 4.270e-02 35.8%
Function call total time 1.093e-01 91.6%
No sorting etc. is being done right now.
It should also be made optional from the UI.
This format makes it quite clear in which nodes time is spent, as well as how much time is spent there.
The overhead for recursing and building argument lists etc. surprises me by being close to 50% of total runtime (node total vs function call total).
Mesh out is also worth noticing: note the difference between total node time and the actual call (which just does some appends).
Of course the star is Transform, called 20 times to move the vertices, yet still less than 1% of runtime.
So the biggest gains are in the recursion code and possibly the @generator function. This surprises me a bit, but it is good to know; before doing this, knowing this was basically impossible.
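The kind of per-call overhead being discussed (a dispatch layer that rebuilds an argument list around every node function call) can be illustrated with a toy benchmark. All names here are made up for illustration, not actual sverchok-redux internals.

```python
import timeit

def node_func(x):
    return x + 1

def make_dispatcher(fn):
    # stand-in for the recursion layer that builds an argument list
    # before every node call
    def dispatch(*args):
        built = [a for a in args]  # "build argument list"
        return fn(*built)
    return dispatch

dispatched = make_dispatcher(node_func)

t_direct = timeit.timeit(lambda: node_func(1), number=50_000)
t_dispatched = timeit.timeit(lambda: dispatched(1), number=50_000)
# the wrapper adds a constant per-call cost, which dominates
# when the wrapped work itself is tiny
```

When the wrapped function does almost nothing (as with small node graphs), this fixed dispatch cost can plausibly reach the kind of percentages seen above.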
I won't go all micro-optimizing them quite yet, but it is worth spending some time on for sure.
@zeffii
If you have time, something like this would be fantastic for sorting out where the time is spent.
nortikin/sverchok#500
how about I push the bgl stub for stethoscope, and then make something that parses that time info into a neatly drawn bgl time-stats thing, with one or two modes..
That would be great
Right now it is very stupid and just adds time stamps. show_timings does some more useful parsing of them.
The order of timestamps is as follows; all times are absolute.
NodeTree, start
DAG, start
DAG, stop
Node0, start
Func, start
Func, stop
# repeated N times
Node0, stop
Node1, start
# etc
NodeTree, stop
[('NodeTree', 52.14670506666667),
('DAG', 52.146708053333334),
('DAG', 52.14710613333333),
('SvRxNumberLinspace: Linspace.004', 52.147112533333335),
('Linspace', 52.14722389333333),
('Linspace', 52.14733141333333),
('SvRxNumberLinspace: Linspace.004', 52.147355733333335),
('SvRxNodeTrig: Trig', 52.14735957333333),
('Sincos', 52.14742528),
('Sincos', 52.14744064)]
....
('SvRxGeneratorTopology: Topology', 52.16390869333333),
('SvRxNodeMeshOut: Mesh out', 52.16391424),
('Mesh out', 52.16398592),
('Mesh out', 52.16398976),
('SvRxNodeMeshOut: Mesh out', 52.17281706666667),
('NodeTree', 52.1728192)]
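A minimal sketch of how such paired absolute timestamps could be folded into per-label durations. The helper name is hypothetical; show_timings may work differently.

```python
from collections import defaultdict

def fold_timings(stamps):
    """Fold (label, absolute_time) stamps into total seconds per label.

    Assumes each label alternates start/stop, as in the dump above;
    nesting is fine because the labels differ between levels.
    """
    open_at = {}
    totals = defaultdict(float)
    for label, t in stamps:
        if label in open_at:
            totals[label] += t - open_at.pop(label)
        else:
            open_at[label] = t
    return dict(totals)

stamps = [('NodeTree', 0.0), ('DAG', 0.1), ('DAG', 0.3),
          ('Linspace', 0.4), ('Linspace', 0.9), ('NodeTree', 1.0)]
totals = fold_timings(stamps)
```

With repeated function calls (e.g. the "Mesh out 3" entries) the durations simply accumulate per label, which matches the per-function totals in the tables above.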
The green is the time spent for each node: time in the x direction, number of function calls in y.
The red is, in x, the time spent for each function call.
Observations
- Yeah, obviously the drawing needs some refining.
- The observed overhead with parsing etc. was mostly due to Mesh out doing its work in stop(self), which is accounted for in the node time but not in the function call time.
- OpenGL is fun
- Every call within a node could be done in parallel
The overhead for parsing/recursion etc. seems to be around 10% in this example, which seems reasonable.
Obviously some limits have to be placed on the drawing, both on the total and on how much of each node to show, and it should be scaled reasonably.
Somewhat changed formatting
Total exec time: NodeTree 0.062013
DAG build time: 0.001204 1.9%
Nodes:
SvRxNumberLinspace: Linspace.004 0.000566 0.9%
SvRxNodeTrig: Trig 0.000262 0.4%
SvRxNodeMath: Math 0.000259 0.4%
SvRxNodeVectorIn: Vector In 0.000468 0.8%
SvRxVirtualNode: VNode<Vector In> 0.000258 0.4%
SvRxNodeCreateMatrix: Create Matrix 0.010574 17.1%
SvRxNodeCircle: Circle 0.001040 1.7%
SvRxMatrixTransform: Transform 0.001379 2.2%
SvRxGeneratorTopology: Topology 0.014357 23.2%
SvRxNodeMeshOut: Mesh out 0.031510 50.8%
Node call time total: 0.060672 97.8%
Functions:
Add 1 0.000051 0.1%
Circle 1 0.000714 1.2%
Create Matrix 1 0.010027 16.2%
Cylinder 1 0.013970 22.5%
Linspace 1 0.000256 0.4%
Mesh out 3 0.031244 50.4%
Sincos 1 0.000034 0.1%
Transform 20 0.000475 0.8%
Vector In 2 0.000157 0.3%
Function call total time 0.060672 91.8%
Comparable old layout:
Processed Range Float in: 0.0003
Processed Math.001 in: 0.0005
Processed Vector in.001 in: 0.0004
Processed Circle in: 0.0005
Processed Math in: 0.0003
Processed Math.002 in: 0.0003
Processed Vector in in: 0.0005
Processed Matrix in in: 0.0020
Processed Matrix Apply in: 0.0028
Processed UV Connection in: 0.0076
Processed Viewer BMeshMK2 in: 0.0197
Node set updated in: 0.0348 seconds
This is a whole issue by itself... For later
The nodes that make sense to optimize in this layout are clearly:
- Mesh out - vertices .for_each I guess
- Topology - numpyified
- Create Matrix - needs to be investigated
openGL is fun when you can start something fresh :) not sure i'd want to mess around with hover-over actions in openGL.. for those I would still defer to spawning a browser / js file reading some json.
but as it stands it's going to be relatively informative already with minor additions.
The text info is probably more useful than the graph, but it was helpful in realizing some things.
But dumping it to json and a zoomable view sounds like a better proposition for in-depth analysis.
What troubles me is that we are slower than the old Sverchok. I would like to look into whether that is a systematic issue or not.
one thing that I think might be an issue (and I found this in the stethoscope node) is if you do, say, a print(self) in the __init__, I was expecting to only get that printed once... but it is printed... 100s of times, before the node is ever attached to anything.
Yeah, making the node layout cached is an issue to address soon. But building links, initializing all nodes etc. takes 1-2% of time for a small layout.
It is accounted for under DAG build time.
100s of times does sound excessive however, will look into it.
As it stands now it should be printed once before the node is connected.
i was probably seeing things then..
no.. if you move your mouse over the node's ui (properties) it will trigger
Okay, will look into that.
i can only imagine that this causes retrigs we aren't even aware of?
Most definitely.
Ok, now I realize what happens. I was a bit stupid.
It is due to the UI draw call being dependent on it... I wasn't happy with that but didn't see a clean solution. Will get back to it tomorrow.
It would impact interactive performance somewhat, but not the execution.
The function responsible is draw_buttons below. When I wrote this it was before stateful, and it was basically just a dictionary lookup; I wasn't happy with it then either but moved on. It is quick enough anyway.
But the stateful class, which does the lookup, instantiates the class for each draw call, when the class alone contains sufficient information for the lookup.
def draw_buttons(self, context, layout):
    props = self.compile().properties
    for name in props.keys():
        layout.prop(self, name)

def compile(self):
    return _node_funcs[self.bl_idname]

# but in the stateful class
def compile(self):
    return NodeStateful.get_cls(self.bl_idname)(self)

@staticmethod
def get_cls(bl_idname):
    return _node_classes[bl_idname]
@zeffii
Ok took care of that problem, it now uses the actual class instead of an instance.
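The difference, roughly: the class-level property table is enough for the draw lookup, so no instance has to be created on every UI redraw. Class names and properties below are illustrative stand-ins, not the actual sverchok-redux code.

```python
_node_classes = {}

class ExampleStateful:
    # class-level property table; sufficient for the draw lookup
    properties = {'radius': None, 'count': None}

    def __init__(self, node):
        # per-draw instantiation: before the fix, this ran on every redraw
        self.node = node

_node_classes['SvRxExample'] = ExampleStateful

def get_cls(bl_idname):
    return _node_classes[bl_idname]

# before: a fresh instance per draw call
props_before = get_cls('SvRxExample')(object()).properties
# after: the class alone is queried, no instantiation per redraw
props_after = get_cls('SvRxExample').properties
```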
not sure how much time i'll have for this over the next two days . but will keep an eye on the changes
I will also have less time, but I want to make this viable; will try to tick off as many boxes as I can.
Made an np version for the cylinder faces/edges, not tested for optimization yet.
Cylinder went from around 20% of runtime to less than 2%.
Total exec time: NodeTree.002 0.027394
DAG build time: 0.000438 1.6%
Nodes:
SvRxNumberLinspace: Linspace.004 0.000204 0.7%
SvRxNodeTrig: Trig 0.000094 0.3%
SvRxNodeMath: Math 0.000090 0.3%
SvRxNodeVectorIn: Vector In 0.000134 0.5%
SvRxVirtualNode: VNode<Vector In> 0.000075 0.3%
SvRxNodeCreateMatrix: Create Matrix 0.007697 28.1%
SvRxNodeCircle: Circle 0.000239 0.9%
SvRxMatrixTransform: Transform 0.001235 4.5%
SvRxGeneratorTopology: Topology 0.000478 1.7%
SvRxNodeMeshOut: Mesh out 0.016669 60.8%
Node call time total: 0.026916 98.3%
Functions:
Add 1 0.000016 0.1%
Circle 1 0.000162 0.6%
Create Matrix 1 0.007425 27.1%
Cylinder 1 0.000340 1.2%
Linspace 1 0.000093 0.3%
Mesh out 3 0.016617 60.7%
Sincos 1 0.000016 0.1%
Transform 100 0.000439 1.6%
Vector In 2 0.000052 0.2%
Function call total time 0.025158 91.8%
Processed Range Float in: 0.0001
Processed Math in: 0.0002
Processed Math.002 in: 0.0002
Processed Vector in in: 0.0002
Processed Math.001 in: 0.0002
Processed Vector in.001 in: 0.0002
Processed Matrix in in: 0.0015
Processed Circle in: 0.0001
Processed Matrix Apply in: 0.0025
Processed UV Connection in: 0.0052
Processed Viewer BMeshMK2 in: 0.0144
Node set updated in: 0.0247 seconds
I think we can expect something similar from updating Create Matrix.
Mesh out can be redone in a similar fashion to the built-in .from_pydata, which is also true for the old one.
anything we can backport will be a bonus :)
# ? mesh.polygons.foreach_set('vertices', flat_face_keys) using loops?
mesh.edges.foreach_set('vertices', flat_edge_keys)
mesh.vertices.foreach_set('co', flat_verts)
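foreach_set wants flat sequences, so the nested pydata lists have to be flattened first. A plain-Python sketch of that step, runnable outside Blender (names are illustrative):

```python
def flatten(nested):
    """Flatten one level of nesting: [[x, y, z], ...] -> [x, y, z, ...],
    the shape that mesh.vertices.foreach_set('co', ...) expects."""
    return [value for item in nested for value in item]

verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
edges = [(0, 1), (1, 2)]

flat_verts = flatten(verts)  # x, y, z runs for foreach_set('co', ...)
flat_edges = flatten(edges)  # index pairs for foreach_set('vertices', ...)
```

Faces are trickier since polygons are variable-length; that is presumably why the polygons line above is still a question mark (it has to go through loops).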
For the old one, using .from_pydata and mesh.validate might be good enough.
Of course have to clear the mesh first.
#51 looks good.
For super profiling (perhaps excessive)
https://gist.github.com/ly29/ef8fb787457291bf8e3ad92f18c67ecf
https://docs.python.org/3.5/library/profile.html
I didn't know it would work inside of Blender. Anyway, see results in the "NodeTree Profile" text in Blender.
ng = D.node_groups['NodeTree']
ng.profile_execute()
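profile_execute presumably wraps the tree execution in cProfile; a generic sketch of such a helper is below. The actual sverchok-redux method also writes the report into a Blender text block, which this sketch skips.

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile and return (result, stats_text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn(*args, **kwargs)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats('cumulative').print_stats(10)  # top 10 by cumulative time
    return result, buffer.getvalue()

result, report = profile_call(sum, range(1000))
```

Dumping the stats with `stats.dump_stats(path)` instead produces the pstats file that gprof2dot consumes, as in the shell command further down.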
So with #51 done, the method that takes the longest time is Create Matrix.
SvRxNumberLinspace: Linspace.004 0.000171 1.0%
SvRxNodeTrig: Trig 0.000061 0.4%
SvRxNodeMath: Math 0.000055 0.3%
SvRxNodeVectorIn: Vector In 0.000083 0.5%
SvRxVirtualNode: VNode<Vector In> 0.000058 0.3%
SvRxNodeCreateMatrix: Create Matrix 0.008902 52.1%
SvRxNodeCircle: Circle 0.000253 1.5%
SvRxMatrixTransform: Transform 0.001233 7.2%
SvRxGeneratorTopology: Topology 0.000421 2.5%
SvRxNodeRxMeshOut: RxMesh out 0.005521 32.3%
It looks so innocent!
@node_func(bl_idname="SvRxNodeCreateMatrix")
@generator
def create_matrix(location: Vector = (0.0, 0.0, 0.0, 1.0),
                  scale: Vector = (1.0, 1.0, 1.0, 0.0),
                  rotation: Vector = (0.0, 0.0, 1.0, 0.0),
                  angle: Float = 0.0
                  ) -> [Matrix]:
    t = translation_matrix(location)
    s = np.identity(4)
    for i in range(3):
        s[i, i] = scale[i]
    r = rotation_matrix(np.radians(angle), rotation)
    mat = concatenate_matrices(t, s, r)
    return mat
It would be cool if just the selected nodes were profiled, or maybe just the active one.
So rotation_matrix takes a lot of time. If somebody wants to dig into that I would be happy...
def rotation_matrix(angle, direction, point=None):
    """Return matrix to rotate about axis defined by point and direction.
    ...
    """
    sina = math.sin(angle)
    cosa = math.cos(angle)
    direction = unit_vector(direction[:3])
    # rotation matrix around unit vector
    R = numpy.diag([cosa, cosa, cosa])
    R += numpy.outer(direction, direction) * (1.0 - cosa)
    direction *= sina
    R += numpy.array([[ 0.0, -direction[2], direction[1]],
                      [ direction[2], 0.0, -direction[0]],
                      [-direction[1], direction[0], 0.0]])
    M = numpy.identity(4)
    M[:3, :3] = R
    if point is not None:
        # rotation not around origin
        point = numpy.array(point[:3], dtype=numpy.float64, copy=False)
        M[:3, 3] = point - numpy.dot(R, point)
    return M
Actually, looking at the above it became clear it was basically the Rodrigues rotation formula spelled out and precomputed to save annoying writing.
Googling that led to Stack Overflow. Adapted to 4x4 directly.
http://mathworld.wolfram.com/RodriguesRotationFormula.html
http://stackoverflow.com/questions/6802577/python-rotation-of-3d-vector
def rotation_matrix2(axis, theta):
    """
    Return the rotation matrix associated with counterclockwise rotation about
    the given axis by theta radians.
    Adapted from
    http://stackoverflow.com/questions/6802577/python-rotation-of-3d-vector
    """
    axis = axis[:3]
    axis = axis / math.sqrt(np.dot(axis, axis))
    a = math.cos(theta / 2.0)
    b, c, d = -axis * math.sin(theta / 2.0)
    aa, bb, cc, dd = a*a, b*b, c*c, d*d
    bc, ad, ac, ab, bd, cd = b*c, a*d, a*c, a*b, b*d, c*d
    return np.array([[aa+bb-cc-dd, 2*(bc+ad), 2*(bd-ac), 0],
                     [2*(bc-ad), aa+cc-bb-dd, 2*(cd+ab), 0],
                     [2*(bd+ac), 2*(cd-ab), aa+dd-bb-cc, 0],
                     [0, 0, 0, 1]])
Where number 2 uses the expanded version
SvRxNodeCreateMatrix: Create Matrix 0.013027 28.5%
SvRxNodeCreateMatrix2: Create Matrix2 0.004904 10.7%
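As a sanity check on the quaternion-based version: a 90° counterclockwise turn about +z should send the x axis to the y axis. The function is reproduced here so the snippet is self-contained.

```python
import math

import numpy as np

def rotation_matrix2(axis, theta):
    """Quaternion-derived 4x4 rotation matrix, as in the snippet above."""
    axis = np.asarray(axis[:3], dtype=float)
    axis = axis / math.sqrt(np.dot(axis, axis))
    a = math.cos(theta / 2.0)
    b, c, d = -axis * math.sin(theta / 2.0)
    aa, bb, cc, dd = a*a, b*b, c*c, d*d
    bc, ad, ac, ab, bd, cd = b*c, a*d, a*c, a*b, b*d, c*d
    return np.array([[aa+bb-cc-dd, 2*(bc+ad), 2*(bd-ac), 0],
                     [2*(bc-ad), aa+cc-bb-dd, 2*(cd+ab), 0],
                     [2*(bd+ac), 2*(cd-ab), aa+dd-bb-cc, 0],
                     [0, 0, 0, 1]])

# 90 degrees counterclockwise about +z sends the x axis to the y axis
v = rotation_matrix2((0.0, 0.0, 1.0), math.pi / 2) @ np.array([1.0, 0.0, 0.0, 1.0])
# v is approximately (0, 1, 0, 1)
```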
Okay, enough of playing with optimizations for now.
Similar layout with the old Sverchok
Old, where the update list is also built.
So this is interesting: it is easy to see where time is spent. sv_deep_copy would be the easiest way to speed up the old sverchok; all nodes that don't modify input lists should turn it off with deepcopy=False. Maybe it would be even quicker just to recalculate things instead of reusing data.
These graphs are excellent tools to study where time is spent. Do look closer.
maybe it would be even quicker just to recalculate things instead of reusing data.
faster vs more memory ?
The reason for sv_deep_copy is to not use changed memory, but most nodes, even though they don't change the incoming lists, don't set the option deepcopy=False; arguably that should be the default.
In the simple example 30% of the time is spent in sv_deep_copy, which most nodes don't need.
But changing this leads to subtle and hard-to-track-down bugs...
So if 30% of the time is spent copying data, then we could perhaps recalculate the same data when needed. It is a thought, not a definitive statement.
Fastest would be if nodes didn't change incoming data, but that is tricky to verify.
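The bug class that sv_deep_copy guards against, in miniature (a toy example, not sverchok code): a node function that mutates its input corrupts the upstream data that other nodes still hold.

```python
from copy import deepcopy

def scale_in_place(data, factor):
    # a badly behaved node function that mutates its input lists
    for row in data:
        for i, value in enumerate(row):
            row[i] = value * factor
    return data

upstream = [[1, 2], [3, 4]]

safe = scale_in_place(deepcopy(upstream), 10)  # upstream untouched
assert upstream == [[1, 2], [3, 4]]

unsafe = scale_in_place(upstream, 10)          # upstream silently corrupted
assert upstream == [[10, 20], [30, 40]]
```

Skipping the copy (deepcopy=False) is only safe when the node provably never mutates, which is exactly what is hard to verify across many node implementations.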
we could set it for those nodes that are used most. maybe it makes a dent
Especially late in the node chain, where the amount of data tends to be larger, like viewer bmesh.
I don't mind doing another pass on the n1 nodes.
Have a deep look at this image: the time in () is the time inside the function itself; without parentheses it is the time spent calling other functions plus the time inside, so to speak.
https://cloud.githubusercontent.com/assets/6241382/23023577/98bcd27c-f456-11e6-8599-35c578ad9c2d.png
bmeshviewer geometry_from_sockets and get ..ho ho ho ho
like you say, at the end of the node tree deepcopy=False should almost be compulsory.
Those graphs are truly fantastic tools
We should do some for advanced layouts; this was a really simple one.
a bit more complex :)
Will test that :)
100x100 torus
Added topology and split the list to use it...
Total exec time: NodeTree 0.113370
DAG build time: 0.000867 0.8%
Nodes:
SvRxNodeIntValue: Int input.001 0.000049 0.0%
SvRxNodeTrig: Trig.002 0.000061 0.1%
SvRxNodeMath: Math.003 0.000043 0.0%
SvRxNodeIntValue: Int input 0.000024 0.0%
SvRxNodeMath: Math.002 0.000025 0.0%
SvRxNodeTrig: Trig.001 0.000037 0.0%
SvRxNodeMath: Math 0.000030 0.0%
SvRxNumberLinspace: Linspace.001 0.000115 0.1%
SvRxNodeMath: Math.001 0.000028 0.0%
SvRxNodeTrig: Trig.003 0.000035 0.0%
SvRxNodeMath: Math.007 0.000048 0.0%
SvRxNumberLinspace: Linspace 0.000087 0.1%
SvRxNodeMath: Math.008 0.000047 0.0%
SvRxListRepeat: Repeat 0.000076 0.1%
SvRxListRepeat: Repeat.001 0.000077 0.1%
SvRxNodeTrig: Trig.004 0.000035 0.0%
SvRxNodeTrig: Trig 0.000257 0.2%
SvRxListRepeat: Repeat.002 0.000061 0.1%
SvRxNodeMath: Math.006 0.000047 0.0%
SvRxNodeMath: Math.005 0.000045 0.0%
SvRxNodeVectorIn: Vector In 0.000320 0.3%
SvRxNodeListSplit: Split 0.000497 0.4%
SvRxGeneratorTopology: Topology 0.000834 0.7%
SvRxNodeMeshOut: Mesh out 0.109570 96.6%
Node call time total: 0.112449 99.2%
Functions:
Add 1 0.000006 0.0%
Cosine 1 0.000006 0.0%
Div 2 0.000026 0.0%
Int input 6 0.000023 0.0%
Linspace 2 0.000127 0.1%
Mesh out 3 0.109513 96.6%
Mul 3 0.000034 0.0%
Np Repeat 1 0.000045 0.0%
Np Tile 2 0.000071 0.1%
Sincos 1 0.000217 0.2%
Sine 1 0.000006 0.0%
Split 1 0.000227 0.2%
Sub 2 0.000007 0.0%
Tau 2 0.000038 0.0%
Torus 1 0.000657 0.6%
Vector In 1 0.000250 0.2%
Function call total time 0.111253 98.1%
Versus the dedicated torus node:
Total exec time: NodeTree.001 0.071630
DAG build time: 0.000626 0.9%
Nodes:
SvRxNodeGenTorus: Torus 0.007764 10.8%
SvRxNodeMeshOut: Mesh out 0.063184 88.2%
Node call time total: 0.070948 99.0%
Functions:
Mesh out 3 0.062956 87.9%
Torus 1 0.007110 9.9%
Function call total time 0.070065 97.8%
Torus 3
torus_3.zip
Requires an update to work.
Here the math nodes don't even contribute before mesh out...
Switched around to use the quicker mesh output; it can still crash Blender, so I'm holding off a bit on making it the default.
D.node_groups['NodeTree'].profile_execute(pstat_file='c:\\Users\\USERNAME\\pstat_file')
And then in a shell:
gprof2dot -f pstats pstat_file | "C:\Program Files (x86)\Graphviz2.38\bin\dot.exe" -Tpng -o torus.png
Not really friends with the shell on Windows...
@ly29 shell for windows? then investigate xonsh, it's a shell with python support :)
is the reason everything ahead of the mesh output is so lightweight because it's all different views of the same data, and 'light copies' (if that's a thing)... is the np stuff lazy until it reaches a final point?
quite happy to see the complex setup and the more 'optimal' node to be in the same order of magnitude time wise.
to be brutally honest, i think writing the numpy directly is less of a headache than wiring the nodes.
Yeah it is quite easy to write np directly.
Np is very quick :)
Every node produces a new copy as a result, but incoming data isn't copied, just stored.
The above is extra quick because it causes minimal recursion.