bradbell / at_cascade
Cascading Dismod_at Analysis From Parent To Child Regions
Home Page: https://at-cascade.readthedocs.io
Multiprocessing fails when a long directory name is passed. Because of the shared infrastructure we work with, and the additional subdirectories that at-cascade generates, a directory path that exceeds the limit (around 100 characters) will likely be passed from time to time.
Traceback example:
Process SyncManager-3:
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 592, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 156, in __init__
    self.listener = Listener(address=address, backlog=16)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 447, in __init__
    self._listener = SocketListener(address, family, backlog)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 590, in __init__
    self._socket.bind(address)
OSError: AF_UNIX path too long
Traceback (most recent call last):
  File "", line 1, in
  File "/{path redacted}/at_cascade/csv/predict.py", line 867, in predict
    predict_all(fit_dir, sim_dir,
  File "/{path redacted}/at_cascade/csv/predict.py", line 531, in predict_all
    manager = multiprocessing.Manager()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 567, in start
    self._address = reader.recv()
                    ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 382, in _recv
    raise EOFError
EOFError
The issue appears to originate in the predict_all() function:
https://github.com/bradbell/at_cascade/blob/499a7b680a387469af2a659c7b747acea69ed0f3/at_cascade/csv/predict.py#L440C4-L440C11
I suspect a quick solution would be to set the working directory to the fit directory and replace {fit_dir} with "./", so that the function uses subdirectories relative to fit_dir. This would need testing to confirm that neither changing the working directory nor replacing every instance of {fit_dir} in that function causes problems.
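A minimal sketch of the working-directory idea suggested above, not the code in at_cascade. The helper name working_directory is hypothetical, and whether running predict_all() this way actually avoids the AF_UNIX error is untested, because the socket address is chosen inside multiprocessing rather than by the caller. Python 3.11 and later also provide contextlib.chdir, which does the same job.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily make path the current directory (hypothetical helper)."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the caller's directory, even if the body raises.
        os.chdir(previous)
```

Inside a block such as `with working_directory(fit_dir):`, paths previously written as f"{fit_dir}/..." could be written as "./...", which keeps the strings handled by the function short.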
at_cascade-2024.1.30:
It appears that sometimes one of the prediction jobs does not create its output files and pre_user_csv crashes when it tries to use those files. I have seen this twice and decided to report it the second time.
In the case below, there is a begin for 124_Central_Latin_Am.female but there is no end notice for this job.
Predict: n_predict = 7, n_spawn = 3
Begin: 20:08:57: predict 1_Earth.both
Begin: 20:08:57: predict 103_Latin_Am_Caribbean.female
Begin: 20:08:57: predict 103_Latin_Am_Caribbean.male
Begin: 20:08:57: predict 124_Central_Latin_Am.female
End: 01:46:31: predict 103_Latin_Am_Caribbean.female 1/7
Begin: 01:46:31: predict 124_Central_Latin_Am.male
End: 01:46:37: predict 103_Latin_Am_Caribbean.male 2/7
Begin: 01:46:37: predict 130_Mexico.female
End: 01:46:50: predict 1_Earth.both 3/7
Begin: 01:46:50: predict 130_Mexico.male
End: 02:31:41: predict 124_Central_Latin_Am.male 4/7
End: 02:32:51: predict 130_Mexico.male 5/7
End: 02:33:08: predict 130_Mexico.female 6/7
Traceback (most recent call last):
  File "/home/bradbell/trash/./run_cascade.py", line 18, in <module>
    at_cascade.csv.predict(fit_dir, sim_dir, start_job_name, max_node_depth)
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/predict.py", line 446, in predict
    at_cascade.csv.pre_parallel(
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/pre_parallel.py", line 285, in pre_parallel
    at_cascade.csv.pre_user_csv(
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/pre_user_csv.py", line 157, in pre_user_csv
    assert os.path.isfile(file_name)
AssertionError
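The job that never finished can be picked out of a log like the one above mechanically. The following is a rough sketch that assumes the Begin/End line format shown in this report; the function name unfinished_jobs and the regular expression are illustrative and not part of at_cascade.

```python
import re

# Matches lines such as "Begin: 20:08:57: predict 1_Earth.both" and
# "End: 01:46:31: predict 103_Latin_Am_Caribbean.female 1/7".
LOG_LINE = re.compile(r"^(Begin|End):\s+\d{2}:\d{2}:\d{2}:\s+predict\s+(\S+)")


def unfinished_jobs(log_text):
    """Return job names that have a Begin line but no End line."""
    begun = []
    ended = set()
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip the summary line and any traceback lines
        kind, job_name = match.groups()
        if kind == "Begin":
            begun.append(job_name)
        else:
            ended.add(job_name)
    return [job_name for job_name in begun if job_name not in ended]
```

Applied to the log in this report, the only job with a Begin line and no End line is 124_Central_Latin_Am.female.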
A fit had s_out points which didn't match up with the max_fit setting specified.
The path to the folder where this issue occurred is in the Slack thread; I am submitting this as a bug report following this morning's meeting.
dismod_at version: dismod_at-20231229
at_cascade version: 2023.12.22
When at_cascade.csv.fit() errors and fails during sample table creation, the avgint table is still moved to c_shift_avgint after the error occurs. This causes subsequent db2csv commands to fail because that command expects a non-empty avgint table.
Screenshot of a log table from a node where this occurred:
This might be preventable by changing the table moving behavior to leave avgint in place if an error occurs during sample table creation.
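The behavior proposed above could look roughly like the sketch below. It rests on assumptions: that "moving" the table amounts to renaming it in the SQLite database, and that sample creation signals failure by raising an exception. The function move_avgint_after_sample and its create_sample argument are hypothetical and do not reflect the actual at_cascade code.

```python
import sqlite3


def move_avgint_after_sample(connection, create_sample):
    """Rename avgint to c_shift_avgint only if create_sample succeeds.

    create_sample is a hypothetical callable standing in for the step
    that creates the sample table; it is assumed to raise on failure.
    """
    try:
        create_sample(connection)
    except Exception:
        # Sample creation failed: leave avgint in place so that later
        # commands still find it. The caller should report the error.
        return False
    connection.execute("ALTER TABLE avgint RENAME TO c_shift_avgint")
    connection.commit()
    return True
```

With this ordering, a failure during sample table creation leaves the database in the state that db2csv expects, and the rename happens only on the success path.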