taccjm's Issues

Sessions Dying

Investigate why sessions are becoming inactive and add logging that captures the cause. Implement JM-specific checks in the heartbeat. Example failure:

ERROR:taccjm.TACCJobManager:list_files - Unknown error trying to access .: SSH session not active
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/wsgiref/handlers.py", line 137, in run
    self.result = application(self.environ, self.start_response)
  File "/usr/local/lib/python3.9/site-packages/falcon/api.py", line 269, in __call__
    responder(req, resp, **params)
  File "/usr/local/lib/python3.9/site-packages/hug/interface.py", line 947, in __call__
    raise exception
  File "/usr/local/lib/python3.9/site-packages/hug/interface.py", line 918, in __call__
    self.call_function(input_parameters), context, request, response, **kwargs
  File "/usr/local/lib/python3.9/site-packages/hug/interface.py", line 840, in call_function
    return self.interface(**parameters)
  File "/usr/local/lib/python3.9/site-packages/hug/interface.py", line 129, in __call__
    return __hug_internal_self._function(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/taccjm/taccjm_server.py", line 163, in list_files
    files = JM[jm_id].list_files(path=path)
  File "/usr/local/lib/python3.9/site-packages/taccjm/TACCJobManager.py", line 492, in list_files
    raise e
  File "/usr/local/lib/python3.9/site-packages/taccjm/TACCJobManager.py", line 459, in list_files
    with self._client.open_sftp() as sftp:
  File "/usr/local/lib/python3.9/site-packages/paramiko/client.py", line 558, in open_sftp
    return self._transport.open_sftp_client()
  File "/usr/local/lib/python3.9/site-packages/paramiko/transport.py", line 1142, in open_sftp_client
    return SFTPClient.from_transport(self)
  File "/usr/local/lib/python3.9/site-packages/paramiko/sftp_client.py", line 164, in from_transport
    chan = t.open_session(
  File "/usr/local/lib/python3.9/site-packages/paramiko/transport.py", line 920, in open_session
    return self.open_channel(
  File "/usr/local/lib/python3.9/site-packages/paramiko/transport.py", line 1014, in open_channel
    raise SSHException("SSH session not active")
paramiko.ssh_exception.SSHException: SSH session not active
127.0.0.1 - - [30/Sep/2022 21:40:19] "GET /l1/files/list HTTP/1.1" 500 59
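
One possible shape for such a heartbeat check, as a rough sketch only. It assumes each job manager keeps a paramiko SSHClient in a `_client` attribute (the traceback shows `self._client.open_sftp()`). `get_transport()` and `is_active()` are paramiko calls; the `session_status` and `heartbeat` helpers, the status strings, and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("taccjm.heartbeat")  # hypothetical logger name


def session_status(client):
    """Classify an SSH client as 'active', 'inactive', or 'no-transport'.

    `client` is expected to behave like paramiko.SSHClient: get_transport()
    returns None when there is no connection, otherwise a Transport whose
    is_active() reports whether the session is still open.
    """
    transport = client.get_transport()
    if transport is None:
        return "no-transport"
    if not transport.is_active():
        return "inactive"
    return "active"


def heartbeat(job_managers):
    """Log and return the ids of job managers whose SSH session is not active.

    `job_managers` maps a jm_id to an object with a `_client` attribute
    (an assumption based on the traceback above).
    """
    dead = []
    for jm_id, jm in job_managers.items():
        status = session_status(jm._client)
        if status != "active":
            logger.warning("job manager %s: SSH session %s", jm_id, status)
            dead.append(jm_id)
    return dead
```

Logging the status per job manager on every heartbeat would at least narrow down when a session dropped, even if it does not yet explain why.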

Downloading files from public directories

TACCJM throws an error on downloading a file from any public directory that a user has read but not write access to:

stdout : tar (child): /work2/06307/clos21/pub/adcirc/inputs/ShinnecockInlet/mesh/def.tar.gz: Cannot open: Permission denied

This bug happens because TACCJM tars whatever you want to download before downloading it. However, it creates the archive in the same directory as the data, which in this case is a public directory that only I have write access to.

The fix is to have TACCJM create the archive in a separate temporary directory (say, the JM's trash directory) and then download the tarred file from there. This is also a good implementation change because partial files left behind by failed download attempts are then cleaned up automatically along with the trash.
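
A minimal sketch of how the remote tar command could be built so that the archive lands in a staging directory instead of next to the data. The `build_tar_command` helper and the example paths are hypothetical, not TACCJM's actual code; the tar flags used (`-c`, `-z`, `-f`, `-C`) are standard.

```python
import posixpath
import shlex


def build_tar_command(remote_path, staging_dir):
    """Return (archive_path, command) for tarring remote_path into staging_dir.

    -C changes into the parent of remote_path so the archive holds only the
    final path component, and the archive itself is written to staging_dir,
    so no write access to the source directory is needed.
    """
    remote_path = remote_path.rstrip("/")
    parent, name = posixpath.split(remote_path)
    archive = posixpath.join(staging_dir, name + ".tar.gz")
    cmd = "tar -czf {} -C {} {}".format(
        shlex.quote(archive), shlex.quote(parent or "."), shlex.quote(name)
    )
    return archive, cmd


# Hypothetical paths, for illustration only.
archive, cmd = build_tar_command("/work/pub/mesh", "/scratch/user/trash")
print(cmd)
```

Because the archive only ever exists in the staging directory, a failed transfer leaves nothing behind in the public directory.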

:bug: Run Script not working with args specified

run_script error
---------------------------------------------------------------------------
TACCJMError                               Traceback (most recent call last)
Input In [63], in <module>
----> 1 res = tjm.run_script('l1', 'adcirc_compile', args=["v55.01", "https://github.com/cdelcastillo21/adcirc-cg.git", "1"])

File ~/repos/taccjm/src/taccjm/taccjm.py:1297, in run_script(jm_id, script_name, job_id, args)
   1295     e.message = f"run_script error"
   1296     logger.error(e.message)
-> 1297     raise e
   1299 return res

File ~/repos/taccjm/src/taccjm/taccjm.py:1293, in run_script(jm_id, script_name, job_id, args)
   1291 data = {'script_name': script_name, 'job_id': job_id,  'args': args}
   1292 try:
-> 1293     res = api_call('PUT', f"{jm_id}/scripts/run", data)
   1294 except TACCJMError as e:
   1295     e.message = f"run_script error"

File ~/repos/taccjm/src/taccjm/taccjm.py:177, in api_call(http_method, end_point, data)
    175     return json.loads(res.text)
    176 else:
--> 177     raise TACCJMError(res)

TACCJMError: args : 'list' object is not callable

TACCJM version 0.0.2

:bug: List Files Return Type

The list_files routine of the TACCJobManager class returns a list of dictionaries with file info, but claims to return a list of strings, which is what list_files in taccjm_server.py expects.
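
Until the two sides agree, the server could normalize either shape. This is a sketch only: the `file_names` helper is hypothetical, and the `filename` key is an assumption about what the file-info dictionaries contain.

```python
def file_names(entries, key="filename"):
    """Return plain file names from a list of strings or of file-info dicts.

    The default key is an assumption; pass whichever key the dicts
    actually use for the file name.
    """
    names = []
    for entry in entries:
        if isinstance(entry, str):
            names.append(entry)
        else:
            names.append(entry[key])
    return names
```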

Unable to parse json errors

Getting unhelpful 'unable to parse json errors' messages whenever a TACCJMError occurs. Fix this so the messages say what actually went wrong.

>>> job = tjm.deploy_job('l1', local_job_dir='/Users/carlos/repos/pyadcirc/apps/adcirc',
...                      proj_config_file='/Users/carlos/repos/pyadcirc/apps/adcirc/ls6.ini')
deploy_job error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/carlos/repos/taccjm/src/taccjm/taccjm.py", line 857, in deploy_job
    raise e
  File "/Users/carlos/repos/taccjm/src/taccjm/taccjm.py", line 853, in deploy_job
    res = api_call('POST', f"{jm_id}/jobs/deploy", data)
  File "/Users/carlos/repos/taccjm/src/taccjm/taccjm.py", line 177, in api_call
    raise TACCJMError(res)
taccjm.exceptions.TACCJMError: <Response [500]> unable to parse json errors

<Response [500]> unable to parse json errors
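
One way the error message could be built so that the response body is never lost, sketched under assumptions. The `describe_error` helper is hypothetical, and the top-level `errors` key is an assumption about how the server wraps its error payloads.

```python
import json


def describe_error(status_code, body):
    """Build a readable error message from an HTTP status and response body.

    Tries to decode the body as JSON first; if that fails, falls back to the
    raw text instead of hiding it behind a generic parse error.
    """
    try:
        payload = json.loads(body)
    except (TypeError, ValueError):
        # Not JSON (e.g. an HTML error page) or no body at all.
        text = (body or "").strip() or "<empty body>"
        return "HTTP {}: {}".format(status_code, text)
    # Assumed convention: error details live under an "errors" key.
    if isinstance(payload, dict) and "errors" in payload:
        payload = payload["errors"]
    return "HTTP {}: {}".format(status_code, json.dumps(payload, sort_keys=True))
```

With a requests-style response object `res`, this would be called as `describe_error(res.status_code, res.text)` when constructing the exception.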

Fix Handling of Stale/Errored Job Managers

There is currently no way to clean up stale/dead job managers, and the error message shown to the user is not helpful (just a general 500 server error). Need to fix.

DAG Simulation Framework

Implement DAG Simulation Framework.

1.) Base simulation class should have a 'parent' field, with parent simulation.
2.) _dag object -> Graph, _sims dictionary -> Maps simulation ID/name to DAG point.
3.) Entrypoint for sim is a call to a task list, that has DAG built into dependencies.
4.) run() into job.

Ideally want a framework that allows for the following (example):

sim = ADCPREP(parent=None)
sim = PADCIRC(parent=sim)
sim = ADCIRCOutputCompress(parent=sim)

sim.run()

i.e. - Can chain simulations.

5.) TaskQueue handles execution of the DAG.
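
A rough sketch of the chaining idea, using the standard-library graphlib (Python 3.9+) for the ordering. Everything here is an assumption about a design that does not exist yet: the `execute` method is a placeholder, the three subclasses are empty stand-ins for the names in the example, and keying the graph by class name means a chain cannot contain the same class twice.

```python
from graphlib import TopologicalSorter  # Python 3.9+


class Simulation:
    """Base simulation with an optional parent that must run first."""

    def __init__(self, parent=None):
        self.parent = parent
        self.name = type(self).__name__

    def _chain(self):
        """Map each simulation name in the chain to its simulation object."""
        sims = {}
        sim = self
        while sim is not None:
            sims[sim.name] = sim
            sim = sim.parent
        return sims

    def _dag(self):
        """Map each simulation name to the set of names it depends on."""
        return {
            name: ({sim.parent.name} if sim.parent else set())
            for name, sim in self._chain().items()
        }

    def execute(self):
        """Placeholder for the real work; subclasses would override this."""
        return self.name

    def run(self):
        """Execute every simulation in the chain in dependency order."""
        sims = self._chain()
        order = TopologicalSorter(self._dag()).static_order()
        return [sims[name].execute() for name in order]


class ADCPREP(Simulation):
    pass


class PADCIRC(Simulation):
    pass


class ADCIRCOutputCompress(Simulation):
    pass


sim = ADCPREP(parent=None)
sim = PADCIRC(parent=sim)
sim = ADCIRCOutputCompress(parent=sim)
print(sim.run())
```

In this sketch run() walks the chain directly; in the proposed framework that step would presumably be handed to the TaskQueue instead.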

DesignSafe Initializing TACCJM in Jupyter Envs

Error when trying to initialize tjm processes from within a Jupyter environment. This example is from trying to initialize a JM within DesignSafe:

   1819     if errno_num != 0:
   1820         err_msg = os.strerror(errno_num)
-> 1821     raise child_exception_type(errno_num, err_msg, err_filename)
   1822 raise child_exception_type(err_msg)

FileNotFoundError: [Errno 2] No such file or directory: 'hug'

It works if, in a separate terminal window within Jupyter, one navigates to the conda environment where it is installed and then initializes via a Python REPL.
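
The traceback suggests the `hug` executable is not on the PATH that the Jupyter kernel inherits. One possible workaround, sketched here as an assumption rather than a confirmed fix, is to also look next to the running interpreter, which is where console scripts usually sit in POSIX conda and virtualenv layouts. The `resolve_command` helper is hypothetical; `shutil.which` and its `path` argument are standard library.

```python
import os
import shutil
import sys


def resolve_command(name, extra_dirs=None):
    """Return the full path to `name`, or None if it cannot be found.

    Searches `extra_dirs` first, then the directory holding the running
    interpreter, and finally the inherited PATH.
    """
    dirs = list(extra_dirs or [])
    dirs.append(os.path.dirname(sys.executable))
    dirs.extend(os.environ.get("PATH", "").split(os.pathsep))
    search_path = os.pathsep.join(d for d in dirs if d)
    return shutil.which(name, path=search_path)


hug_path = resolve_command("hug")
if hug_path is None:
    print("hug not found; is it installed in this environment?")
```

Launching the server through the resolved full path, rather than the bare command name, would also allow a clearer error message when the executable is missing.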
