Comments (74)
Call graph with https://github.com/jrfonseca/gprof2dot
Okay this is just too much fun
from sverchok.
Total exec time: NodeTree.001 1.192e-01
DAG build time: 4.996e-04 0.4%
Nodes:
SvRxNumberLinspace: Linspace 5.717e-04 0.5%
SvRxNumberLinspace: Linspace.001 1.887e-03 1.6%
SvRxNumberLinspace: Linspace.002 1.819e-02 15.3%
SvRxDebugPrint: Debug Print 6.776e-02 56.8%
SvRxNumberLinspace: Linspace.003 3.020e-02 25.3%
Node call time total: 1.186e-01 99.5%
Functions:
Debug Print 11 6.656e-02 55.8%
Linspace 123 4.270e-02 35.8%
Function call total time 1.093e-01 91.6%
No sorting etc. is being done right now.
It should also be made optional from the UI.
This format makes it quite clear in which nodes time is spent, as well as how much time is spent there.
The overhead for recursing and building argument lists etc. surprises me by being close to 50% of total runtime (node total vs function call total).
Mesh out is also worth noticing: note the difference between total node time and the actual call (which just does some appends).
Of course the star is Transform, called 20 times to move the vertices, yet still less than 1% of runtime.
So the biggest gains are in the recursion code and possibly the @generator function. This surprises me a bit, but it is good to know; before doing this, knowing this was basically impossible.
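The kind of per-call overhead being discussed (a dispatch layer that rebuilds an argument list around every node function call) can be illustrated with a toy benchmark. All names here are made up for illustration, not actual sverchok-redux internals.

```python
import timeit

def node_func(x):
    return x + 1

def make_dispatcher(fn):
    # stand-in for the recursion layer that builds an argument list
    # before every node call
    def dispatch(*args):
        built = [a for a in args]  # "build argument list"
        return fn(*built)
    return dispatch

dispatched = make_dispatcher(node_func)

t_direct = timeit.timeit(lambda: node_func(1), number=50_000)
t_dispatched = timeit.timeit(lambda: dispatched(1), number=50_000)
# the wrapper adds a constant per-call cost, which dominates
# when the wrapped work itself is tiny
```

When the wrapped function does almost nothing (as with small node graphs), this fixed dispatch cost can plausibly reach the kind of percentages seen above.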
I won't go all micro-optimizing them quite yet, but it is worth spending some time on for sure.
@zeffii
If you have time, something like this would be fantastic for sorting out where the time is spent.
nortikin/sverchok#500
how about I push the bgl stub for stethoscope, and then make something that parses that time info into a neatly drawn bgl time-stats thing, with one or two modes..
That would be great
Right now it is very stupid and just adds time stamps. show_timings does some more useful parsing of them.
The order of timestamps is as follows; all times are absolute.
NodeTree, start
DAG, start
DAG, stop
Node0, start
Func, start
Func, stop
# repeated N times
Node0, stop
Node1, start
# etc
NodeTree, stop
[('NodeTree', 52.14670506666667),
('DAG', 52.146708053333334),
('DAG', 52.14710613333333),
('SvRxNumberLinspace: Linspace.004', 52.147112533333335),
('Linspace', 52.14722389333333),
('Linspace', 52.14733141333333),
('SvRxNumberLinspace: Linspace.004', 52.147355733333335),
('SvRxNodeTrig: Trig', 52.14735957333333),
('Sincos', 52.14742528),
('Sincos', 52.14744064)]
....
('SvRxGeneratorTopology: Topology', 52.16390869333333),
('SvRxNodeMeshOut: Mesh out', 52.16391424),
('Mesh out', 52.16398592),
('Mesh out', 52.16398976),
('SvRxNodeMeshOut: Mesh out', 52.17281706666667),
('NodeTree', 52.1728192)]
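A minimal sketch of how such paired absolute timestamps could be folded into per-label durations. The helper name is hypothetical; show_timings may work differently.

```python
from collections import defaultdict

def fold_timings(stamps):
    """Fold (label, absolute_time) stamps into total seconds per label.

    Assumes each label alternates start/stop, as in the dump above;
    nesting is fine because the labels differ between levels.
    """
    open_at = {}
    totals = defaultdict(float)
    for label, t in stamps:
        if label in open_at:
            totals[label] += t - open_at.pop(label)
        else:
            open_at[label] = t
    return dict(totals)

stamps = [('NodeTree', 0.0), ('DAG', 0.1), ('DAG', 0.3),
          ('Linspace', 0.4), ('Linspace', 0.9), ('NodeTree', 1.0)]
totals = fold_timings(stamps)
```

With repeated function calls (e.g. the "Mesh out 3" entries) the durations simply accumulate per label, which matches the per-function totals in the tables above.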
The green is the time spent for each node: time in the x direction, number of function calls in y.
The red is, in x, the time spent for each function call.
Observations
- Yeah, obviously the drawing needs some refining.
- The observed overhead with parsing etc. was mostly due to Mesh out doing its work in stop(self), which is accounted for in the node time but not in the function call time.
- OpenGL is fun
- Every call within a node could be done in parallel
The overhead for parsing/recursion etc. seems to be around 10% in this example, which seems reasonable.
Obviously some limits have to be placed on the drawing, both on the total and on how much of each node to show, and it should be scaled reasonably.
Somewhat changed formatting
Total exec time: NodeTree 0.062013
DAG build time: 0.001204 1.9%
Nodes:
SvRxNumberLinspace: Linspace.004 0.000566 0.9%
SvRxNodeTrig: Trig 0.000262 0.4%
SvRxNodeMath: Math 0.000259 0.4%
SvRxNodeVectorIn: Vector In 0.000468 0.8%
SvRxVirtualNode: VNode<Vector In> 0.000258 0.4%
SvRxNodeCreateMatrix: Create Matrix 0.010574 17.1%
SvRxNodeCircle: Circle 0.001040 1.7%
SvRxMatrixTransform: Transform 0.001379 2.2%
SvRxGeneratorTopology: Topology 0.014357 23.2%
SvRxNodeMeshOut: Mesh out 0.031510 50.8%
Node call time total: 0.060672 97.8%
Functions:
Add 1 0.000051 0.1%
Circle 1 0.000714 1.2%
Create Matrix 1 0.010027 16.2%
Cylinder 1 0.013970 22.5%
Linspace 1 0.000256 0.4%
Mesh out 3 0.031244 50.4%
Sincos 1 0.000034 0.1%
Transform 20 0.000475 0.8%
Vector In 2 0.000157 0.3%
Function call total time 0.060672 91.8%
Comparable old layout:
Processed Range Float in: 0.0003
Processed Math.001 in: 0.0005
Processed Vector in.001 in: 0.0004
Processed Circle in: 0.0005
Processed Math in: 0.0003
Processed Math.002 in: 0.0003
Processed Vector in in: 0.0005
Processed Matrix in in: 0.0020
Processed Matrix Apply in: 0.0028
Processed UV Connection in: 0.0076
Processed Viewer BMeshMK2 in: 0.0197
Node set updated in: 0.0348 seconds
This is a whole issue by itself... For later
The nodes that make sense to optimize in this layout are clearly:
- Mesh out - vertices .for_each I guess
- Topology - numpyified
- Create Matrix - needs to be investigated
openGL is fun when you can start something fresh :) not sure i'd want to mess around with hover-over actions in openGL.. for those I would still defer to spawning a browser / js file reading some json.
but as it stands it's going to be relatively informative already with minor additions.
The text info is probably more useful than the graph, but it was helpful in realizing some things.
But dumping it to json and a zoomable view sounds like a better proposition for in-depth analysis.
What troubles me is that we are slower than the old Sverchok. I would like to look into whether that is a systematic issue or not.
one thing that I think might be an issue (and I found this in the stethoscope node) is if you do, say, a print(self) in the __init__, I was expecting to only get that printed once... but it is printed... 100s of times, before the node is ever attached to anything.
Yeah, making the node layout cached is an issue to address soon. But building links, initializing all nodes etc. takes 1-2% of time for a small layout.
It is accounted for under DAG build time.
100s of times does sound excessive however, will look into it.
As it stands now it should be printed once before the node is connected.
i was probably seeing things then..
no.. if you move your mouse over the node's ui (properties) it will trigger
Okay, will look into that.
i can only imagine that this causes retrigs we aren't even aware of?
Most definitely.
Ok, now I realize what happens. I was a bit stupid.
It is due to the UI draw call being dependent on it... I wasn't happy with that but didn't see a clean solution. Will get back to it tomorrow.
It would impact interactive performance somewhat, but not the execution.
The function responsible is draw_buttons below. When I wrote this it was before stateful, and it was basically just a dictionary lookup; I wasn't happy with it then either but moved on. It is quick enough anyway.
But the stateful class, which does the lookup, instantiates the class for each draw call, when the class alone contains sufficient information for the lookup.
def draw_buttons(self, context, layout):
    props = self.compile().properties
    for name in props.keys():
        layout.prop(self, name)

def compile(self):
    return _node_funcs[self.bl_idname]

# but in the stateful class
def compile(self):
    return NodeStateful.get_cls(self.bl_idname)(self)

@staticmethod
def get_cls(bl_idname):
    return _node_classes[bl_idname]
@zeffii
Ok took care of that problem, it now uses the actual class instead of an instance.
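The difference, roughly: the class-level property table is enough for the draw lookup, so no instance has to be created on every UI redraw. Class names and properties below are illustrative stand-ins, not the actual sverchok-redux code.

```python
_node_classes = {}

class ExampleStateful:
    # class-level property table; sufficient for the draw lookup
    properties = {'radius': None, 'count': None}

    def __init__(self, node):
        # per-draw instantiation: before the fix, this ran on every redraw
        self.node = node

_node_classes['SvRxExample'] = ExampleStateful

def get_cls(bl_idname):
    return _node_classes[bl_idname]

# before: a fresh instance per draw call
props_before = get_cls('SvRxExample')(object()).properties
# after: the class alone is queried, no instantiation per redraw
props_after = get_cls('SvRxExample').properties
```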
not sure how much time i'll have for this over the next two days . but will keep an eye on the changes
I will also have less time, but I want to make this viable; will try to tick off as many boxes as I can.
Made an np version for the cylinder faces/edges, not tested for optimization yet.
Cylinder went from around 20% of runtime to less than 2%.
Total exec time: NodeTree.002 0.027394
DAG build time: 0.000438 1.6%
Nodes:
SvRxNumberLinspace: Linspace.004 0.000204 0.7%
SvRxNodeTrig: Trig 0.000094 0.3%
SvRxNodeMath: Math 0.000090 0.3%
SvRxNodeVectorIn: Vector In 0.000134 0.5%
SvRxVirtualNode: VNode<Vector In> 0.000075 0.3%
SvRxNodeCreateMatrix: Create Matrix 0.007697 28.1%
SvRxNodeCircle: Circle 0.000239 0.9%
SvRxMatrixTransform: Transform 0.001235 4.5%
SvRxGeneratorTopology: Topology 0.000478 1.7%
SvRxNodeMeshOut: Mesh out 0.016669 60.8%
Node call time total: 0.026916 98.3%
Functions:
Add 1 0.000016 0.1%
Circle 1 0.000162 0.6%
Create Matrix 1 0.007425 27.1%
Cylinder 1 0.000340 1.2%
Linspace 1 0.000093 0.3%
Mesh out 3 0.016617 60.7%
Sincos 1 0.000016 0.1%
Transform 100 0.000439 1.6%
Vector In 2 0.000052 0.2%
Function call total time 0.025158 91.8%
Processed Range Float in: 0.0001
Processed Math in: 0.0002
Processed Math.002 in: 0.0002
Processed Vector in in: 0.0002
Processed Math.001 in: 0.0002
Processed Vector in.001 in: 0.0002
Processed Matrix in in: 0.0015
Processed Circle in: 0.0001
Processed Matrix Apply in: 0.0025
Processed UV Connection in: 0.0052
Processed Viewer BMeshMK2 in: 0.0144
Node set updated in: 0.0247 seconds
I think we can expect something similar from updating Create Matrix.
Mesh out can be redone in a similar fashion to the built-in .from_pydata, which is also true for the old one.
anything we can backport will be a bonus :)
# ? mesh.polygons.foreach_set('vertices', flat_face_keys) using loops?
mesh.edges.foreach_set('vertices', flat_edge_keys)
mesh.vertices.foreach_set('co', flat_verts)
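foreach_set wants flat sequences, so the nested pydata lists have to be flattened first. A plain-Python sketch of that step, runnable outside Blender (names are illustrative):

```python
def flatten(nested):
    """Flatten one level of nesting: [[x, y, z], ...] -> [x, y, z, ...],
    the shape that mesh.vertices.foreach_set('co', ...) expects."""
    return [value for item in nested for value in item]

verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
edges = [(0, 1), (1, 2)]

flat_verts = flatten(verts)  # x, y, z runs for foreach_set('co', ...)
flat_edges = flatten(edges)  # index pairs for foreach_set('vertices', ...)
```

Faces are trickier since polygons are variable-length; that is presumably why the polygons line above is still a question mark (it has to go through loops).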
For the old one, using .from_pydata and mesh.validate might be good enough.
Of course have to clear the mesh first.
#51 looks good.
For super profiling (perhaps excessive)
https://gist.github.com/ly29/ef8fb787457291bf8e3ad92f18c67ecf
https://docs.python.org/3.5/library/profile.html
I didn't know it would work inside of Blender. Anyway, see results in the "NodeTree Profile" text in Blender.
ng = D.node_groups['NodeTree']
ng.profile_execute()
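profile_execute presumably wraps the tree execution in cProfile; a generic sketch of such a helper is below. The actual sverchok-redux method also writes the report into a Blender text block, which this sketch skips.

```python
import cProfile
import io
import pstats

def profile_call(fn, *args, **kwargs):
    """Run fn under cProfile and return (result, stats_text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = fn(*args, **kwargs)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats('cumulative').print_stats(10)  # top 10 by cumulative time
    return result, buffer.getvalue()

result, report = profile_call(sum, range(1000))
```

Dumping the stats with `stats.dump_stats(path)` instead produces the pstats file that gprof2dot consumes, as in the shell command further down.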
So with #51 done, the method that takes the longest time is Create Matrix.
SvRxNumberLinspace: Linspace.004 0.000171 1.0%
SvRxNodeTrig: Trig 0.000061 0.4%
SvRxNodeMath: Math 0.000055 0.3%
SvRxNodeVectorIn: Vector In 0.000083 0.5%
SvRxVirtualNode: VNode<Vector In> 0.000058 0.3%
SvRxNodeCreateMatrix: Create Matrix 0.008902 52.1%
SvRxNodeCircle: Circle 0.000253 1.5%
SvRxMatrixTransform: Transform 0.001233 7.2%
SvRxGeneratorTopology: Topology 0.000421 2.5%
SvRxNodeRxMeshOut: RxMesh out 0.005521 32.3%
It looks so innocent!
@node_func(bl_idname="SvRxNodeCreateMatrix")
@generator
def create_matrix(location: Vector = (0.0, 0.0, 0.0, 1.0),
                  scale: Vector = (1.0, 1.0, 1.0, 0.0),
                  rotation: Vector = (0.0, 0.0, 1.0, 0.0),
                  angle: Float = 0.0
                  ) -> [Matrix]:
    t = translation_matrix(location)
    s = np.identity(4)
    for i in range(3):
        s[i, i] = scale[i]
    r = rotation_matrix(np.radians(angle), rotation)
    mat = concatenate_matrices(t, s, r)
    return mat
It would be cool if just the selected nodes were profiled, or maybe just the active one.
So rotation_matrix takes a lot of time. If somebody wants to dig into that I would be happy...
def rotation_matrix(angle, direction, point=None):
    """Return matrix to rotate about axis defined by point and direction.
    ...
    """
    sina = math.sin(angle)
    cosa = math.cos(angle)
    direction = unit_vector(direction[:3])
    # rotation matrix around unit vector
    R = numpy.diag([cosa, cosa, cosa])
    R += numpy.outer(direction, direction) * (1.0 - cosa)
    direction *= sina
    R += numpy.array([[ 0.0, -direction[2], direction[1]],
                      [ direction[2], 0.0, -direction[0]],
                      [-direction[1], direction[0], 0.0]])
    M = numpy.identity(4)
    M[:3, :3] = R
    if point is not None:
        # rotation not around origin
        point = numpy.array(point[:3], dtype=numpy.float64, copy=False)
        M[:3, 3] = point - numpy.dot(R, point)
    return M
Actually, looking at the above it became clear it was basically the Rodrigues rotation formula spelled out and precomputed to save annoying writing.
Googling that led to Stack Overflow. Adapted to 4x4 directly.
http://mathworld.wolfram.com/RodriguesRotationFormula.html
http://stackoverflow.com/questions/6802577/python-rotation-of-3d-vector
def rotation_matrix2(axis, theta):
    """
    Return the rotation matrix associated with counterclockwise rotation about
    the given axis by theta radians.
    Adapted from
    http://stackoverflow.com/questions/6802577/python-rotation-of-3d-vector
    """
    axis = axis[:3]
    axis = axis / math.sqrt(np.dot(axis, axis))
    a = math.cos(theta / 2.0)
    b, c, d = -axis * math.sin(theta / 2.0)
    aa, bb, cc, dd = a*a, b*b, c*c, d*d
    bc, ad, ac, ab, bd, cd = b*c, a*d, a*c, a*b, b*d, c*d
    return np.array([[aa+bb-cc-dd, 2*(bc+ad), 2*(bd-ac), 0],
                     [2*(bc-ad), aa+cc-bb-dd, 2*(cd+ab), 0],
                     [2*(bd+ac), 2*(cd-ab), aa+dd-bb-cc, 0],
                     [0, 0, 0, 1]])
Where number 2 uses the expanded version
SvRxNodeCreateMatrix: Create Matrix 0.013027 28.5%
SvRxNodeCreateMatrix2: Create Matrix2 0.004904 10.7%
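As a sanity check on the quaternion-based version: a 90° counterclockwise turn about +z should send the x axis to the y axis. The function is reproduced here so the snippet is self-contained.

```python
import math

import numpy as np

def rotation_matrix2(axis, theta):
    """Quaternion-derived 4x4 rotation matrix, as in the snippet above."""
    axis = np.asarray(axis[:3], dtype=float)
    axis = axis / math.sqrt(np.dot(axis, axis))
    a = math.cos(theta / 2.0)
    b, c, d = -axis * math.sin(theta / 2.0)
    aa, bb, cc, dd = a*a, b*b, c*c, d*d
    bc, ad, ac, ab, bd, cd = b*c, a*d, a*c, a*b, b*d, c*d
    return np.array([[aa+bb-cc-dd, 2*(bc+ad), 2*(bd-ac), 0],
                     [2*(bc-ad), aa+cc-bb-dd, 2*(cd+ab), 0],
                     [2*(bd+ac), 2*(cd-ab), aa+dd-bb-cc, 0],
                     [0, 0, 0, 1]])

# 90 degrees counterclockwise about +z sends the x axis to the y axis
v = rotation_matrix2((0.0, 0.0, 1.0), math.pi / 2) @ np.array([1.0, 0.0, 0.0, 1.0])
# v is approximately (0, 1, 0, 1)
```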
Okay, enough of playing with optimizations for now.
Similar layout with the old Sverchok
Old, where the update list is also built.
So this is interesting: it is easy to see where time is spent. sv_deep_copy would be the easiest way to speed up the old sverchok; all nodes that don't modify input lists should turn it off with deepcopy=False. Maybe it would be even quicker just to recalculate things instead of reusing data.
These graphs are excellent tools to study where time is spent. Do look closer.
maybe it would be even quicker just to recalculate things instead of reusing data.
faster vs more memory ?
The reason for sv_deep_copy is to not use changed memory, but most nodes, even though they don't change the incoming lists, don't set the option deepcopy=False; arguably that should be the default.
In the simple example 30% of the time is spent in sv_deep_copy, which most nodes don't need.
But changing this leads to subtle and hard-to-track-down bugs...
So if 30% of the time is spent copying data, then we could perhaps recalculate the same data when needed. It is a thought, not a definitive statement.
Fastest would be if nodes didn't change incoming data, but that is tricky to verify.
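The bug class that sv_deep_copy guards against, in miniature (a toy example, not sverchok code): a node function that mutates its input corrupts the upstream data that other nodes still hold.

```python
from copy import deepcopy

def scale_in_place(data, factor):
    # a badly behaved node function that mutates its input lists
    for row in data:
        for i, value in enumerate(row):
            row[i] = value * factor
    return data

upstream = [[1, 2], [3, 4]]

safe = scale_in_place(deepcopy(upstream), 10)  # upstream untouched
assert upstream == [[1, 2], [3, 4]]

unsafe = scale_in_place(upstream, 10)          # upstream silently corrupted
assert upstream == [[10, 20], [30, 40]]
```

Skipping the copy (deepcopy=False) is only safe when the node provably never mutates, which is exactly what is hard to verify across many node implementations.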
we could set it for those nodes that are used most. maybe it makes a dent
Especially late in the node chain, where the amount of data tends to be larger, like viewer bmesh.
I don't mind doing another pass on the n1 nodes.
Have a deep look at this image: the time in () is the time inside the function itself; without parentheses it is the time spent calling other functions plus the time inside, so to speak.
https://cloud.githubusercontent.com/assets/6241382/23023577/98bcd27c-f456-11e6-8599-35c578ad9c2d.png
bmeshviewer geometry_from_sockets and get ..ho ho ho ho
like you say, at the end of the node tree deepcopy=False should almost be compulsory.
Those graphs are truly fantastic tools
We should do some for advanced layouts; this was a really simple one.
a bit more complex :)
Will test that :)
100x100 torus
Added topology and split the list to use it...
Total exec time: NodeTree 0.113370
DAG build time: 0.000867 0.8%
Nodes:
SvRxNodeIntValue: Int input.001 0.000049 0.0%
SvRxNodeTrig: Trig.002 0.000061 0.1%
SvRxNodeMath: Math.003 0.000043 0.0%
SvRxNodeIntValue: Int input 0.000024 0.0%
SvRxNodeMath: Math.002 0.000025 0.0%
SvRxNodeTrig: Trig.001 0.000037 0.0%
SvRxNodeMath: Math 0.000030 0.0%
SvRxNumberLinspace: Linspace.001 0.000115 0.1%
SvRxNodeMath: Math.001 0.000028 0.0%
SvRxNodeTrig: Trig.003 0.000035 0.0%
SvRxNodeMath: Math.007 0.000048 0.0%
SvRxNumberLinspace: Linspace 0.000087 0.1%
SvRxNodeMath: Math.008 0.000047 0.0%
SvRxListRepeat: Repeat 0.000076 0.1%
SvRxListRepeat: Repeat.001 0.000077 0.1%
SvRxNodeTrig: Trig.004 0.000035 0.0%
SvRxNodeTrig: Trig 0.000257 0.2%
SvRxListRepeat: Repeat.002 0.000061 0.1%
SvRxNodeMath: Math.006 0.000047 0.0%
SvRxNodeMath: Math.005 0.000045 0.0%
SvRxNodeVectorIn: Vector In 0.000320 0.3%
SvRxNodeListSplit: Split 0.000497 0.4%
SvRxGeneratorTopology: Topology 0.000834 0.7%
SvRxNodeMeshOut: Mesh out 0.109570 96.6%
Node call time total: 0.112449 99.2%
Functions:
Add 1 0.000006 0.0%
Cosine 1 0.000006 0.0%
Div 2 0.000026 0.0%
Int input 6 0.000023 0.0%
Linspace 2 0.000127 0.1%
Mesh out 3 0.109513 96.6%
Mul 3 0.000034 0.0%
Np Repeat 1 0.000045 0.0%
Np Tile 2 0.000071 0.1%
Sincos 1 0.000217 0.2%
Sine 1 0.000006 0.0%
Split 1 0.000227 0.2%
Sub 2 0.000007 0.0%
Tau 2 0.000038 0.0%
Torus 1 0.000657 0.6%
Vector In 1 0.000250 0.2%
Function call total time 0.111253 98.1%
Versus the dedicated torus node:
Total exec time: NodeTree.001 0.071630
DAG build time: 0.000626 0.9%
Nodes:
SvRxNodeGenTorus: Torus 0.007764 10.8%
SvRxNodeMeshOut: Mesh out 0.063184 88.2%
Node call time total: 0.070948 99.0%
Functions:
Mesh out 3 0.062956 87.9%
Torus 1 0.007110 9.9%
Function call total time 0.070065 97.8%
Torus 3
torus_3.zip
Requires an update to work.
Here the math nodes don't even contribute before mesh out...
Switched around to use the quicker mesh output; it can still crash Blender, so I'm holding off a bit on making it the default.
D.node_groups['NodeTree'].profile_execute(pstat_file='c:\\Users\\USERNAME\\pstat_file')
And then in a shell:
gprof2dot -f pstats pstat_file | "C:\Program Files (x86)\Graphviz2.38\bin\dot.exe" -Tpng -o torus.png
Not really friends with the shell on Windows...
@ly29 shell for windows? then investigate xonsh, it's a shell with python support :)
is the reason everything ahead of the mesh output is so lightweight because it's all different views of the same data, and 'light copies' (if that's a thing)... is the np stuff lazy until it reaches a final point?
quite happy to see the complex setup and the more 'optimal' node to be in the same order of magnitude time wise.
to be brutally honest, i think writing the numpy directly is less of a headache than wiring the nodes.
Yeah it is quite easy to write np directly.
Np is very quick :)
Every node produces a new copy as a result, but incoming data isn't copied, just stored.
The above is extra quick because it causes minimal recursion.