at_cascade's People

Contributors

bradbell, garland-culbreth

at_cascade's Issues

predict_all generates multiprocessing jobs that can fail due to long directory names

Multiprocessing fails when a long directory name is passed to predict_all. Because of the shared infrastructure we work with, plus the additional subdirectories that at_cascade generates, a path that exceeds the AF_UNIX socket path limit (around 100 characters) is likely to be passed from time to time.

Traceback example:

Process SyncManager-3:
Traceback (most recent call last):
  File "/usr/lib/python3.11/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.11/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 592, in _run_server
    server = cls._Server(registry, address, authkey, serializer)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 156, in __init__
    self.listener = Listener(address=address, backlog=16)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 447, in __init__
    self._listener = SocketListener(address, family, backlog)
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 590, in __init__
    self._socket.bind(address)
OSError: AF_UNIX path too long
Traceback (most recent call last):
  File "", line 1, in
  File "/{path redacted}/at_cascade/csv/predict.py", line 867, in predict
    predict_all(fit_dir, sim_dir,
  File "/{path redacted}/at_cascade/csv/predict.py", line 531, in predict_all
    manager = multiprocessing.Manager()
              ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/usr/lib/python3.11/multiprocessing/managers.py", line 567, in start
    self._address = reader.recv()
                    ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 249, in recv
    buf = self._recv_bytes()
          ^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 413, in _recv_bytes
    buf = self._recv(4)
          ^^^^^^^^^^^^^
  File "/usr/lib/python3.11/multiprocessing/connection.py", line 382, in _recv
    raise EOFError
EOFError

The issue appears to originate in the predict_all() function:
https://github.com/bradbell/at_cascade/blob/499a7b680a387469af2a659c7b747acea69ed0f3/at_cascade/csv/predict.py#L440C4-L440C11

I suspect a quick solution would be to set the working directory to the fit directory and replace {fit_dir} with "./", so that relative subdirectories of fit_dir are used. This still needs testing to check whether changing the working directory, or replacing every instance of {fit_dir} in that function, causes other problems.
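
The idea of switching to the fit directory and using relative paths can be tried outside at_cascade with a plain AF_UNIX socket. The following is a minimal sketch, not at_cascade code: the helpers working_directory, can_bind_unix_socket and demo are hypothetical names, and the long directory is a stand-in for a long fit_dir. Whether the same trick helps multiprocessing.Manager() depends on how the manager's socket address is chosen, which the traceback above does not show.

```python
import contextlib
import os
import socket
import tempfile


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the working directory, restoring it on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)


def can_bind_unix_socket(address):
    """Return True if an AF_UNIX socket can be bound at address."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        sock.bind(address)
    except OSError:
        # Includes "AF_UNIX path too long"
        return False
    finally:
        sock.close()
    # Binding created a socket file; remove it again
    os.unlink(address)
    return True


def demo():
    """Compare an absolute and a relative bind inside a long directory."""
    with tempfile.TemporaryDirectory() as top:
        # Hypothetical stand-in for a long fit_dir (well over 108 characters)
        fit_dir = os.path.join(top, "a" * 60, "b" * 60)
        os.makedirs(fit_dir)
        absolute_ok = can_bind_unix_socket(os.path.join(fit_dir, "sock"))
        with working_directory(fit_dir):
            relative_ok = can_bind_unix_socket("./sock")
        return absolute_ok, relative_ok
```

On Linux, where the socket path is limited to 108 bytes, demo() should report that the absolute bind fails while the relative bind succeeds. The context manager restores the previous working directory even if the body raises, which limits the side effects of changing directories inside a library function.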

csv.predict: pre_user_csv.py: Cannot find a file

at_cascade-2024.1.30:
It appears that sometimes one of the prediction jobs does not create its output files, and pre_user_csv then crashes when it tries to use them. I have seen this twice and decided to report it the second time.

In the case below, there is a Begin notice for 124_Central_Latin_Am.female but no End notice for this job.

Predict: n_predict = 7, n_spawn = 3
Begin: 20:08:57: predict 1_Earth.both
Begin: 20:08:57: predict 103_Latin_Am_Caribbean.female
Begin: 20:08:57: predict 103_Latin_Am_Caribbean.male
Begin: 20:08:57: predict 124_Central_Latin_Am.female
End:   01:46:31: predict 103_Latin_Am_Caribbean.female 1/7
Begin: 01:46:31: predict 124_Central_Latin_Am.male
End:   01:46:37: predict 103_Latin_Am_Caribbean.male 2/7
Begin: 01:46:37: predict 130_Mexico.female
End:   01:46:50: predict 1_Earth.both 3/7
Begin: 01:46:50: predict 130_Mexico.male
End:   02:31:41: predict 124_Central_Latin_Am.male 4/7
End:   02:32:51: predict 130_Mexico.male 5/7
End:   02:33:08: predict 130_Mexico.female 6/7
Traceback (most recent call last):
  File "/home/bradbell/trash/./run_cascade.py", line 18, in <module>
    at_cascade.csv.predict(fit_dir, sim_dir, start_job_name, max_node_depth)
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/predict.py", line 446, in predict
    at_cascade.csv.pre_parallel(
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/pre_parallel.py", line 285, in pre_parallel
    at_cascade.csv.pre_user_csv(
  File "/home/bradbell/.local/lib/python3.12/site-packages/at_cascade/csv/pre_user_csv.py", line 157, in pre_user_csv
    assert os.path.isfile(file_name)
AssertionError
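
A job that begins but never reports an end can be picked out of this kind of log mechanically. The following is a minimal sketch that assumes the Begin/End line format shown above; unfinished_jobs and LINE_RE are hypothetical names and are not part of at_cascade.

```python
import re

# Matches lines such as
#   Begin: 20:08:57: predict 1_Earth.both
#   End:   01:46:31: predict 103_Latin_Am_Caribbean.female 1/7
LINE_RE = re.compile(r"^(Begin|End):\s+\d\d:\d\d:\d\d:\s+predict\s+(\S+)")


def unfinished_jobs(log_text):
    """Return job names with a Begin line but no End line, in Begin order."""
    begun = []
    ended = set()
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            # Not a Begin/End line (for example, traceback output)
            continue
        kind, job_name = match.groups()
        if kind == "Begin":
            begun.append(job_name)
        else:
            ended.add(job_name)
    return [job_name for job_name in begun if job_name not in ended]
```

Applied to the output above, this returns only 124_Central_Latin_Am.female, the job whose output files pre_user_csv could not find.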

Bug: s_out points don't match with max_fit

A fit had s_out points that did not match the specified max_fit setting.
The path to the folder where this occurred is in the Slack thread; I am submitting this as a bug report following this morning's meeting.

db2csv command fails because avgint table moves after sample table creation error

Impacted versions

dismod_at version: dismod_at-20231229
at_cascade version: 2023.12.22

Description of issue

When at_cascade.csv.fit() fails with an error during sample table creation, the avgint table is still moved to c_shift_avgint after the error occurs. This causes subsequent db2csv commands to fail, because db2csv expects a non-empty avgint table.

[Screenshot of a log table from a node where this occurred: Screenshot 2024-01-29 at 15 49 16]

This might be preventable by changing the table moving behavior to leave avgint in place if an error occurs during sample table creation.
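
The suggested behavior can be sketched with sqlite3. This is only an illustration of the control flow, not the at_cascade implementation: shift_avgint, table_names and the create_sample_table callable are hypothetical, and only the table names avgint and c_shift_avgint come from the description above.

```python
import sqlite3


def table_names(connection):
    """Return the set of table names in the database."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    )
    return {name for (name,) in rows}


def shift_avgint(connection, create_sample_table):
    """Rename avgint to c_shift_avgint only if sample creation succeeds.

    create_sample_table is a hypothetical callable that takes the
    connection and raises an exception on failure.
    Returns True if the table was moved, False otherwise.
    """
    try:
        create_sample_table(connection)
    except Exception as error:
        # Leave avgint where it is so that later commands can still read it
        print(f"sample table creation failed, avgint left in place: {error}")
        return False
    connection.execute("ALTER TABLE avgint RENAME TO c_shift_avgint")
    connection.commit()
    return True
```

With this ordering, a failure during sample table creation leaves the database in the state that later commands such as db2csv expect, at the cost of the avgint table not reflecting the shift for that node.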
