the application client front end to my graph database is single threaded. so in situat

Fastest Way to Exfiltrate a Matrix (graphblas issue, closed)

victorstewart commented on August 14, 2024

Fastest Way to Exfiltrate a Matrix

from graphblas.

Comments (3)

DrTimothyAldenDavis commented on August 14, 2024

The timings are misleading in this example. You've built the matrix with setElement, which is the slower way to do it, and it also leaves the matrix as a pile of unsorted tuples. If you want to time just the dup, you would need to put a GrB_wait outside the timer, first. Otherwise, GrB_Matrix_dup must first call GrB_wait itself on the matrix, which is doing the work of GrB_Matrix_build on the unsorted tuples from setElement.

Serialization is likely the best way to send a matrix across a network. That's what it's designed for. An alternative would be to use no compression at all, which leads to a faster serialize time, but requires more bytes to send. Another alternative would be to use GxB_Matrix_unpack to unpack the matrix in O(1) time, transmit the pieces, and then GxB_Matrix_pack it again.


victorstewart commented on August 14, 2024

magic!!! i knew about needing to wait for operations to execute of course but i didn't realize it applied to setElement as well.

this order of magnitude drop in time cost buys me a lot of time until i have to worry about this problem again. thanks so much.

i see now that the serialization without compression is essentially an allocate + memcpy as well. so i'll just copy the matrix on the "main thread" then serialize it with GxB_COMPRESSION_ZSTD on a worker thread.

the memory consumption of the matrix also fell to 37.6% of the original after GrB_MATERIALIZE. that's only ~8 bytes per uint8_t element instead of the ~22 bytes i assumed before. the GrB_Matrix is now about 2.8x the size of the compressed serialized matrix, whereas before it was about 7.4x, so the gap is ~2.7x smaller. that makes it much easier to justify overcommitting the database's memory to cover the transient memory needed to copy then compress.

i need to preserve the matrices so the unpack way isn't an option, but its time cost would be the same as these other ways.

root@clr-b5df9984821e4d129387e172044f5754~ # ./graphreplication.test
milliseconds to GrB_MATERIALIZE: 3354ms
1000000x1000000 matrix with 100000000 elements consumes 770.5MB
milliseconds to allocate + copy the matrix: 76ms
milliseconds to allocate + serialize the matrix with compression: 776ms
size of serialized compressed matrix: 275.6MB
milliseconds to allocate + serialize the matrix NO compression: 72ms
size of serialized matrix without compression: 770.5MB
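The per-element and ratio figures quoted in this comment can be checked from the output above. The sketch below assumes that "MB" in the output means MiB (2^20 bytes), which is what makes 770.5MB over 100000000 elements come out slightly above 8 bytes each. The pre-materialize size is not printed, so it is estimated here from the stated 37.6% figure. All function names are hypothetical.

```c
#include <stdio.h>

/* Bytes per stored element for a size given in MiB (assumed unit). */
double bytes_per_element(double size_mib, double nvals)
{
    return size_mib * 1024.0 * 1024.0 / nvals;
}

/* Size before GrB_MATERIALIZE, estimated from the fraction remaining. */
double size_before_materialize(double size_after_mib, double fraction)
{
    return size_after_mib / fraction;
}

/* Prints the figures discussed in the thread from the reported sizes. */
void print_memory_summary(void)
{
    const double nvals = 100000000.0;    /* elements in the matrix      */
    const double matrix_mib = 770.5;     /* materialized GrB_Matrix     */
    const double compressed_mib = 275.6; /* compressed serialized blob  */
    const double before_mib = size_before_materialize(matrix_mib, 0.376);

    printf("bytes/element after:  %.2f\n",
           bytes_per_element(matrix_mib, nvals));     /* about 8.1  */
    printf("bytes/element before: %.2f\n",
           bytes_per_element(before_mib, nvals));     /* about 21.5 */
    printf("matrix/compressed after:  %.2fx\n",
           matrix_mib / compressed_mib);              /* about 2.8  */
    printf("matrix/compressed before: %.2fx\n",
           before_mib / compressed_mib);              /* about 7.4  */
}
```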


victorstewart commented on August 14, 2024

circling back on this. i realized the optimal solution is to fork the process to get copy-on-write matrices, and then serialize on that new process. this requires no downtime and the absolute minimal extra memory cost.
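A minimal POSIX sketch of the fork idea, using a plain byte buffer in place of a GrB_Matrix and a checksum in place of serialization. Both substitutions, and the names `checksum` and `snapshot_checksum`, are assumptions made for illustration. The child sees the memory as it was at the moment of `fork`, so the parent can keep modifying the live data while the child works on the snapshot. Pages are only physically copied when one side writes to them.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Stand-in for "serialize the matrix": a checksum over the buffer. */
uint64_t checksum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < len; i++) sum = sum * 31 + data[i];
    return sum;
}

/* Forks a child that checksums its copy-on-write snapshot of data.
 * The parent overwrites the live buffer BEFORE letting the child read,
 * to show that the child still sees the contents as of the fork.
 * Stores the child's result in *out. Returns 0 on success, -1 on error. */
int snapshot_checksum(uint8_t *data, size_t len, uint64_t *out)
{
    int go[2], result[2];
    if (pipe(go) != 0) return -1;
    if (pipe(result) != 0) { close(go[0]); close(go[1]); return -1; }

    pid_t pid = fork();
    if (pid < 0) {
        close(go[0]); close(go[1]); close(result[0]); close(result[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wait for the parent's signal, then read the snapshot. */
        char token;
        close(go[1]);
        close(result[0]);
        if (read(go[0], &token, 1) != 1) _exit(1);
        uint64_t sum = checksum(data, len);
        if (write(result[1], &sum, sizeof sum) != (ssize_t) sizeof sum)
            _exit(1);
        _exit(0);
    }

    /* Parent: keep mutating the live data, then release the child. */
    close(go[0]);
    close(result[1]);
    memset(data, 0xFF, len);
    int ok = (write(go[1], "g", 1) == 1);
    ok = ok && read(result[0], out, sizeof *out) == (ssize_t) sizeof *out;
    close(go[1]);
    close(result[0]);

    int status = 0;
    if (waitpid(pid, &status, 0) < 0) ok = 0;
    return (ok && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

One caveat worth checking before applying this to a real GraphBLAS process: after `fork`, only the calling thread exists in the child, so if the parent has already started worker threads (for example through OpenMP), whether library calls are safe in the child is something to verify rather than assume.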
