It would be convenient if hsds could "scale down" to

Make hsds an application about hsds HOT 10 CLOSED

hdfgroup commented on May 16, 2024

Make hsds an application

from hsds.

Comments (10)

jreadey commented on May 16, 2024

Are you thinking of something like this: https://github.com/HDFGroup/hsds/blob/master/docs/design/direct_access/direct_access.md?

t20100 commented on May 16, 2024

Yes pretty much, except that it could be run as a standalone server rather than being integrated into h5pyd, but I guess both can be achieved at once.

jreadey commented on May 16, 2024

This is implemented in PR #35, merged into master.

jreadey commented on May 16, 2024

Hey @t20100 - I'm not that familiar with aiohttp.web.AppRunner, but it looks like it runs the server within the same process.
I think it would be more efficient to run head, SN, and DN nodes in their own processes so that the server could utilize multiple cores.

What would you think about kicking off a subprocess for each SN, DN node? (the head node could live in the parent process)
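A minimal sketch of that idea, assuming the parent process launches one child per node with `subprocess.Popen`. The child here is only a stand-in that prints its name and PID; the node names, the counts, and the `spawn_nodes`/`wait_for_nodes` helpers are hypothetical and not part of hsds:

```python
import os
import subprocess
import sys

# Stand-in for a real node entry point: the child only reports its name and PID.
CHILD_CODE = "import os, sys; print(sys.argv[1], os.getpid())"


def spawn_nodes(sn_count=1, dn_count=4):
    """Start one child process per SN/DN node and return (name, Popen) pairs."""
    names = [f"sn{i}" for i in range(sn_count)]
    names += [f"dn{i}" for i in range(dn_count)]
    procs = []
    for name in names:
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE, name],
            stdout=subprocess.PIPE,
            text=True,
        )
        procs.append((name, proc))
    return procs


def wait_for_nodes(procs):
    """Wait for every child and return {name: (child_pid, exit_code)}."""
    results = {}
    for name, proc in procs:
        out, _ = proc.communicate()
        _, pid = out.split()
        results[name] = (int(pid), proc.returncode)
    return results
```

Each child is a separate OS process, so the scheduler is free to place it on its own core; the parent (where the head node could live) only keeps the `Popen` handles.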

t20100 commented on May 16, 2024

Hi,

Yes, as it stands everything runs in the same process. I expect it would be simple to spawn subprocesses instead (the most complicated part is probably passing the config to the subprocesses).
But I usually try to avoid subprocesses when I can. What do you think subprocesses would improve?

jreadey commented on May 16, 2024

Thanks for confirming...

Here's an example where running in a subprocess would speed things up:

Say you have one SN worker and 4 DN workers. The SN gets a request to read a dataset selection crossing 10 chunks. The chunk read requests get spread out over the 4 DN workers. Each DN worker needs to fetch its chunk from S3, which is async, so that's not a big deal to do in one process (I think). But any decompression will hog the CPU and block the other workers.

With Docker I observe that we can get close to 100% CPU for each of 4 containers. For HSDS apps, being able to run the processes on their own cores should be faster (YMMV).

t20100 commented on May 16, 2024

OK, I thought no lengthy operation was done in either the DN or the SN loop, but if that is the case then yes, multiprocessing or maybe multithreading would be better.

BTW, when I did it, I set DN and SN count to one when started as application: https://github.com/HDFGroup/hsds/blob/master/hsds/app.py#L67 . That would need to be changed as well.

t20100 commented on May 16, 2024

I gave it a try with threads and more or less managed to make it pass the tests for 1 SN and 1 DN, but not with several of each... Where I stopped, I guessed that each DN should have a different config/env var DN_PORT.
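To illustrate the guess about `DN_PORT`, here is a small sketch that builds one environment per DN node, each with its own port. Only the `DN_PORT` name comes from the discussion; the `dn_environments` helper and the base port value are assumptions made up for the example:

```python
import os


def dn_environments(dn_count, base_port=6101, base_env=None):
    """Return one environment dict per DN node, each with a distinct DN_PORT.

    base_port is an arbitrary, hypothetical starting value.
    """
    base = dict(os.environ if base_env is None else base_env)
    envs = []
    for i in range(dn_count):
        env = dict(base)  # copy so the nodes do not share one dict
        env["DN_PORT"] = str(base_port + i)  # environment values must be strings
        envs.append(env)
    return envs
```

Each dict could then be passed as the `env` argument when launching a node in its own process.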

The way the config is handled has changed (now in a yaml file), and the application would need to be updated accordingly.

I can't spend much time on this now, but I can come back to it later.

Also, related to this but doable independently: wouldn't it make sense to run the compression/decompression in a ThreadPoolExecutor (see https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.loop.run_in_executor) so as to take advantage of multiple cores even from a single DN node?
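Roughly what I have in mind, as a sketch only: `zlib` stands in for whatever codec the DN actually uses, and `decompress_chunks`, the chunk sizes, and the pool size are made up for the example. This helps only if the codec releases the GIL while it works, which I believe CPython's `zlib` does:

```python
import asyncio
import zlib
from concurrent.futures import ThreadPoolExecutor


async def decompress_chunks(compressed_chunks, executor):
    """Decompress chunks in worker threads so the event loop stays responsive."""
    loop = asyncio.get_running_loop()
    futures = [
        loop.run_in_executor(executor, zlib.decompress, data)
        for data in compressed_chunks
    ]
    # gather preserves the order of the input chunks
    return await asyncio.gather(*futures)


def main():
    """Round-trip ten fake chunks and report whether they came back intact."""
    chunks = [bytes([i]) * 100_000 for i in range(10)]
    compressed = [zlib.compress(c) for c in chunks]
    with ThreadPoolExecutor(max_workers=4) as executor:
        restored = asyncio.run(decompress_chunks(compressed, executor))
    return restored == chunks
```

While the worker threads decompress, the event loop remains free to serve other requests on the same node.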

jreadey commented on May 16, 2024

Maybe whether or not to use subprocesses should be an additional command-line argument. I can look at this myself later today or tomorrow.

Running compression/decompression in a thread pool is an idea, but my assumption for most deployments is that the number of DN nodes would be set to the number of cores. In that case, having multiple threads per node may not help (and could actually hurt performance).

t20100 commented on May 16, 2024

Yes, you don't want to have more threads+processes running at the same time than the number of cores.
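As a rough sketch of that rule, the thread pool size per DN could be derived from the core count so that the total number of workers stays within it. The `threads_per_dn` helper is hypothetical, and it ignores the SN and head processes for simplicity:

```python
import os


def threads_per_dn(dn_count, cpu_count=None):
    """Return a per-DN thread count so dn_count * threads stays within the cores.

    Falls back to one thread per DN when there are at least as many DN nodes
    as cores.
    """
    cores = cpu_count or os.cpu_count() or 1
    return max(1, cores // dn_count)
```

With as many DN nodes as cores this gives one thread each, which matches the case where an extra thread pool would not help.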
