bradbell / at_cascade Goto Github PK


Cascading Dismod_at Analysis From Parent To Child Regions

Home Page: https://at-cascade.readthedocs.io

Shell 1.86% Python 98.14%

at_cascade's People

garland-culbreth bradley-m-bell

at_cascade's Issues

predict_all generates multiprocessing jobs that can fail due to long directory names

Multiprocessing fails when long directory names are passed. Because of the shared infrastructure we work with, and the additional directories that at_cascade generates, a path that exceeds the limit (around 100 characters) is likely to be passed from time to time.

Traceback example:

Process SyncManager-3:
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 592, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 156, in __init__
    self.listener = Listener(address=address, backlog=16)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 447, in __init__
    self._listener = SocketListener(address, family, backlog)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 590, in __init__
    self._socket.bind(address)
OSError: AF_UNIX path too long
Traceback (most recent call last):
  File "", line 1, in
  File "/{path redacted}/at_cascade/csv/predict.py", line 867, in predict
    predict_all(fit_dir, sim_dir,
  File "/{path redacted}/at_cascade/csv/predict.py", line 531, in predict_all
    manager = multiprocessing.Manager()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 567, in start
    self._address = reader.recv()
                    ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 382, in _recv
    raise EOFError
EOFError

The failure appears to originate in the predict_all() function:
https://github.com/bradbell/at_cascade/blob/499a7b680a387469af2a659c7b747acea69ed0f3/at_cascade/csv/predict.py#L440C4-L440C11

I suspect a quick solution would be to set the working directory to the fit directory and replace {fit_dir} with "./", so that all paths are relative subdirectories of fit_dir. I would still need to test whether changing the working directory, or replacing every instance of {fit_dir} in that function, causes other problems.
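
The working-directory idea above could be sketched roughly as follows. This is only an illustration: working_directory and run_relative_to are hypothetical helpers, not part of at_cascade, and whether relative paths actually avoid the "AF_UNIX path too long" error would still need testing.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


def run_relative_to(fit_dir, work):
    """Call work('.') while fit_dir is the current working directory.

    work is a hypothetical callable that takes the directory to operate on;
    passing '.' keeps every path it builds short and relative to fit_dir.
    """
    with working_directory(fit_dir):
        return work(".")
```

For example, run_relative_to(fit_dir, os.listdir) lists the contents of fit_dir using only the relative path ".". One thing to check when testing: Python's multiprocessing creates the manager's socket under the temporary directory chosen by tempfile, so the length of that directory (for example one set through TMPDIR) may matter as well as fit_dir.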

csv.predict: pre_user_csv.py: Cannot find a file

at_cascade-2024.1.30:
It appears that sometimes one of the prediction jobs does not create its output files and pre_user_csv crashes when it tries to use those files. I have seen this twice and decided to report it the second time.

In the case below, there is a Begin line for 124_Central_Latin_Am.female but no End line for that job.

Predict: n_predict = 7, n_spawn = 3
Begin: 20:08:57: predict 1_Earth.both
Begin: 20:08:57: predict 103_Latin_Am_Caribbean.female
Begin: 20:08:57: predict 103_Latin_Am_Caribbean.male
Begin: 20:08:57: predict 124_Central_Latin_Am.female
End:   01:46:31: predict 103_Latin_Am_Caribbean.female 1/7
Begin: 01:46:31: predict 124_Central_Latin_Am.male
End:   01:46:37: predict 103_Latin_Am_Caribbean.male 2/7
Begin: 01:46:37: predict 130_Mexico.female
End:   01:46:50: predict 1_Earth.both 3/7
Begin: 01:46:50: predict 130_Mexico.male
End:   02:31:41: predict 124_Central_Latin_Am.male 4/7
End:   02:32:51: predict 130_Mexico.male 5/7
End:   02:33:08: predict 130_Mexico.female 6/7
Traceback (most recent call last):
  File "/home/bradbell/trash/./run_cascade.py", line 18, in <module>
    at_cascade.csv.predict(fit_dir, sim_dir, start_job_name, max_node_depth)
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/predict.py", line 446, in predict
    at_cascade.csv.pre_parallel(
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/pre_parallel.py", line 285, in pre_parallel
    at_cascade.csv.pre_user_csv(
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/pre_user_csv.py", line 157, in pre_user_csv
    assert os.path.isfile(file_name)
AssertionError
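
A job that began but never ended can be picked out of a log like the one above mechanically. The following is a minimal sketch that assumes the Begin/End line layout shown in this report; unfinished_jobs is a hypothetical helper, not an at_cascade function.

```python
import re

# Matches lines such as "Begin: 20:08:57: predict 1_Earth.both" and
# "End:   01:46:31: predict 103_Latin_Am_Caribbean.female 1/7".
LINE_PATTERN = re.compile(
    r"^(Begin|End):\s+\d{2}:\d{2}:\d{2}:\s+predict\s+(\S+)"
)


def unfinished_jobs(log_text):
    """Return job names that have a Begin line but no End line, in log order."""
    begun = []
    ended = set()
    for line in log_text.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match is None:
            continue  # skip the summary line, tracebacks, and blank lines
        kind, job_name = match.groups()
        if kind == "Begin":
            begun.append(job_name)
        else:
            ended.add(job_name)
    return [name for name in begun if name not in ended]


log = """\
Begin: 20:08:57: predict 1_Earth.both
Begin: 20:08:57: predict 124_Central_Latin_Am.female
End:   01:46:50: predict 1_Earth.both 3/7
"""
print(unfinished_jobs(log))  # ['124_Central_Latin_Am.female']
```

Running such a check before pre_user_csv is called would turn the bare AssertionError into a message that names the job whose output is missing.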

Bug: s_out points don't match with max_fit

A fit produced s_out points that did not match the specified max_fit setting.
The path to the folder where this occurred is in the Slack thread. I am submitting this as a bug report following this morning's meeting.

db2csv command fails because avgint table moves after sample table creation error

Impacted versions

dismod_at version: dismod_at-20231229
at_cascade version: 2023.12.22

Description of issue

When at_cascade.csv.fit() fails during sample table creation, the avgint table is still moved to c_shift_avgint after the error occurs. Subsequent db2csv commands then fail, because db2csv expects a non-empty avgint table.

Screenshot of a log table from a node where this occurred:

This might be preventable by changing the table moving behavior to leave avgint in place if an error occurs during sample table creation.
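
The suggested behavior could look something like the sketch below. It assumes the database is an SQLite file (as dismod_at databases are) and uses a hypothetical helper, move_avgint_if_sample_ok, together with a hypothetical create_sample callable standing in for the sample-table step. It is not the code at_cascade actually uses to move tables.

```python
import sqlite3


def move_avgint_if_sample_ok(connection, create_sample):
    """Rename avgint to c_shift_avgint only if create_sample succeeds.

    create_sample is a hypothetical callable standing in for the
    sample-table step; it receives the open connection and raises on failure.
    Returns True if the table was moved, False if it was left in place.
    """
    try:
        create_sample(connection)
    except Exception:
        # Leave avgint where it is so later db2csv calls can still find it.
        return False
    connection.execute("ALTER TABLE avgint RENAME TO c_shift_avgint")
    connection.commit()
    return True
```

A real fix would presumably also record the error in the log table rather than silently returning False; that part is omitted here to keep the sketch short.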
