
chitin's People

Contributors

samstudio8

Forkers

unix0000 gsc0107

chitin's Issues

Dashboard

  • Ongoing experiments
  • Latest errors and file handler warnings

Is multiprocessing a good idea anyway?

I'm not even sure if we want a multiprocessed shell. Sure, it can run jobs in parallel and is trivial to script, and also needs no configuration. But does anyone want this? Is it because I just don't want to hammer at GNU Parallel? I don't know.

Apply command to...

I've been generating big runs of data in directories named with UUIDs, plus a lookup file that maps each UUID to the parameters that were used to generate those files. This has been pretty handy because all the files are uniquely identifiable and don't feature parameters that later become unhelpful, deprecated, etc. This also means I'm not messing about with stupid folder hierarchies: the filesystem is a crap abstraction for representing experiment properties.

Because of this layout, I've found myself just applying operations to lists of UUID-named directories, so why not make this a part of chitin? We could just provide a script (or series of commands) and a bunch of UUIDs.
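For reference, this is roughly the loop I run by hand today; a minimal sketch outside of chitin, where the helper and the example bowtie2 command are made up for illustration:

```python
# Not chitin code -- just the manual workflow this issue wants to absorb:
# run the same command template inside each UUID-named directory.
import os
import subprocess


def apply_to_uuids(uuids, command_template, base_dir="."):
    """Run command_template (with {uuid} substituted) inside each UUID directory."""
    for u in uuids:
        workdir = os.path.join(base_dir, str(u))
        cmd = command_template.format(uuid=u)
        subprocess.check_call(cmd, shell=True, cwd=workdir)


# e.g. apply_to_uuids(open("uuids.txt").read().split(),
#                     "bowtie2 -x ref -U reads.fq -S {uuid}.sam")
```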

Should we be using `abspath`?

Currently chitin converts ALL paths to absolute paths, but if you wanted to wrap up your workspace and throw it onto another system, all your paths would suddenly be incorrect...
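One possible middle ground, sketched below: record paths relative to a workspace root and only resolve them when they are used. `workspace_root` is an assumed setting here, not an existing chitin option.

```python
# A minimal sketch of portable path handling, assuming some workspace_root
# setting exists; these helpers are illustrative, not part of chitin.
import os


def to_portable(path, workspace_root):
    """Record a path relative to the workspace so the workspace can move."""
    return os.path.relpath(os.path.abspath(path), start=workspace_root)


def to_local(portable_path, workspace_root):
    """Resolve a recorded path against the workspace's current location."""
    return os.path.normpath(os.path.join(workspace_root, portable_path))
```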

How to address the problem of multiple users?

File metadata is stored per user, in their local database, and changes to files are only monitored when they happen through chitin. So we have two points:

  • How to share file metadata between a group of users?
  • What happens when non-chitin users cause changes in the file system?

The first point falls in line with the future plan of permitting the database to live on a server instead of just locally. I suspect we may have some work to do to ensure that history is processed and stored in the correct order if multiple users do things at once, but I think this will be fine. Caching may also be necessary so users have some history data for when they are offline? But we are a while from this right now anyway.

The second is unlikely to be addressable in a fashion I would like. Right now, chitin will always raise a warning about files that have changed outside of its knowledge, which is reasonable. After all, that is what I care about more than the history: a user will now know if somebody has messed with a file. A potential way to catch this (on a shared computer at least) is to have an additional daemon or kernel module that captures some information - but the reason chitin works the way it does is because it seemed to be a rather easy way of getting this data in the first place! ;)

I would love to try and make a ZFS extension for this, but that's a long time away and possibly beyond my time and ability anyway.
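For context, the warning above boils down to a hash comparison against the last recorded state; a rough sketch of the idea (the helper names and the recorded-hash lookup are illustrative, not chitin's actual API):

```python
# Illustrative only: compare a file's current hash against the hash recorded
# the last time chitin touched it, and complain if they differ.
import hashlib
import os


def hash_path(path, block_size=65536):
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def check_integrity(path, recorded_hash):
    """Return a warning string if a file changed outside of chitin's knowledge."""
    if not os.path.exists(path):
        return "%s has disappeared since it was last seen" % path
    current = hash_path(path)
    if current != recorded_hash:
        return "%s has changed outside of chitin (was %s, now %s)" % (
            path, recorded_hash, current)
    return None
```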

sqlalchemy error upon launch

Getting an error when launching for the first time. Install (as --user) went smoothly.

```
Traceback (most recent call last):
  File "/homes/ccole/.local/bin/chitin", line 9, in <module>
    load_entry_point('chitin==0.0.1', 'console_scripts', 'chitin')()
  File "/sw/opt/python/2.7.3/lib/python2.7/site-packages/pkg_resources/__init__.py", line 542, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/sw/opt/python/2.7.3/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2569, in load_entry_point
    return ep.load()
  File "/sw/opt/python/2.7.3/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2229, in load
    return self.resolve()
  File "/sw/opt/python/2.7.3/lib/python2.7/site-packages/pkg_resources/__init__.py", line 2235, in resolve
    module = __import__(self.module_name, fromlist=['__name__'], level=0)
  File "build/bdist.linux-x86_64/egg/chitin/__init__.py", line 18, in <module>
    'BufrStubImagePlugin',
  File "build/bdist.linux-x86_64/egg/chitin/util.py", line 11, in <module>
  File "build/bdist.linux-x86_64/egg/chitin/record.py", line 6, in <module>
  File "/cluster/gjb_lab/ccole/.local/lib/python2.7/site-packages/Flask_SQLAlchemy-2.1-py2.7.egg/flask_sqlalchemy/__init__.py", line 25, in <module>
    from sqlalchemy import orm, event, inspect
ImportError: cannot import name inspect
```

Could be a dependency issue. What version requirements are there?
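Quite possibly: `sqlalchemy.inspect` only exists from SQLAlchemy 0.8, so an older system-wide SQLAlchemy would break Flask-SQLAlchemy 2.1 exactly like this. A hypothetical `setup.py` excerpt pinning sensible floors might look like the following (these bounds are a guess, not chitin's declared requirements):

```python
# Hypothetical setup.py excerpt -- the version floors are an assumption.
from setuptools import setup, find_packages

setup(
    name="chitin",
    version="0.0.1",
    packages=find_packages(),
    install_requires=[
        "Flask-SQLAlchemy>=2.1",
        "SQLAlchemy>=0.8",  # provides sqlalchemy.inspect
    ],
)
```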

multiprocessing tracking bug

For some stupid reason I thought adding multiprocessing to the mix was a great idea.
Now there is a ton of wonky shit going on simultaneously:

  • Potential to run sequential commands out-of-order
  • stderr and stdout interrupt the console
  • Zombie processes (ish)
  • Gross code, specifically to handle apparently incorrect attributes
  • Bash script variable capturing is broken (the context to pass back data is lost)
  • CPU usage wasted by empty zombies that poll and time out
  • Files edited by multiple processes will not be trackable to particular Events

`chitin` should be a lab book instance

Seeing as one can't change directory in this shell, it could be considered a manager for a given top-level analysis directory. We could store the JSON (soon to be an sqlite schema) in the same directory and switch to relative paths?
[See #4 ]

chitin3 Roadmap

In case you hadn't noticed, I trashed the entire chitin repo to make chitin2. It's about half as much garbage as last time and moves a little away from the idea of replacing your shell, towards wrapping a script to keep track of what happens inside it. I got overexcited in the last version and made chitin a clever, parallelised shell that sent commands to a remote machine and allowed any chitin-capable shell to download and process the jobs. At this point I realised I'd made a grid engine, so I've nuked the code base and started over: this time trying to remember that the goal of chitin is to be a watchful guardian of your filesystem.
For a reminder of my November 2016 tirade that caused chitin to come into existence, check my blog.

Pretty much all the cool features of chitin1 are missing, but I plan to bring them back:

  • filetype handlers (eg. catching the number of alignments in a BAM)
  • command handlers (eg. remembering the alignment rate from bowtie2 stderr)
  • fetch or put a resource (file) to any of your other machines if a command needs it

I've finally made a business decision about the metadata storage part of chitin. I don't like sqlite: the database gets big, slow and locked. Originally the metadata was to be presented in the terminal (and it was), but we've outgrown this by necessity (commands and resources are linked together and I want you to be able to click on them to find out stuff). Thus we're in your browser. The current version of chitin2 has an integrated webserver using Flask and SQLAlchemy, but this is troublesome for migration, and it was never my intention to bundle the shell-part and web-part together. Thus my roadmap includes:

  • Extracting all the web crap and deploying it to Django instead; I'll be making a chitin-server repo soon. Django is definitely OTT for this, but it's also wonderfully crafted, extendable, well-supported and has an excellent database migration system.

Additional ideas of things that are to come:

  • An arbitrary resource monitor can ping the server with CPU/RAM info every minute or so; we'll then graph these between the start and end of a command on its detail page (a rough sketch follows this list). Alternatively, we could keep track of the PID for a command and try to keep more specific numbers.
  • Leverage all of your hashes and sell chitin-coins
  • Associate ENA ids with the chitin:\ resource locator
  • Dumping generated graphs and some text data for an experiment to the server
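As promised above, a rough sketch of the resource monitor idea; the server endpoint and payload shape are made up, since chitin-server does not define this API yet:

```python
# Illustrative monitor: report CPU/RAM to the server once a minute.
import socket
import time

import psutil    # pip install psutil
import requests  # pip install requests

SERVER = "http://chitin-server.example/api/resource"  # hypothetical endpoint


def report_forever(interval=60):
    while True:
        payload = {
            "node": socket.gethostname(),
            "cpu_percent": psutil.cpu_percent(interval=1),
            "ram_percent": psutil.virtual_memory().percent,
            "timestamp": time.time(),
        }
        try:
            requests.post(SERVER, json=payload, timeout=10)
        except requests.RequestException:
            pass  # the monitor should never take a node down with it
        time.sleep(interval)


if __name__ == "__main__":
    report_forever()
```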

Early 2019 Stories

  • User should be able to see most recent commands run on a node
  • User should be able to see most recent resource changes on a node
  • User should be able to see a list of over-arching top-level "projects"
  • User should be able to see a list of "experiments" that belong to a project (eg. all the assemblies), ideally an API would be able to generate tables to present parameters/results

Late 2019 Stories

`EventSet` to house executions of `%script`

Seeing as we can now run bash scripts, it would be nice to group all of the Event objects (that is, the commands executed individually) together under some container. An EventSet seems like a reasonable solution. We can attach the input parameters, name, path and MD5 (or even a copy?) of the script in question to the EventSet such that it is available to all ItemEvents.

We could also have total_wall metadata and such, too.
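A minimal sketch of what the container could look like as plain SQLAlchemy declarative models; the table and column names are guesses at the shape, not the actual chitin schema:

```python
# Illustrative models only -- not chitin's real schema.
from sqlalchemy import Column, Float, ForeignKey, Integer, String
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship

Base = declarative_base()


class EventSet(Base):
    __tablename__ = "event_set"

    id = Column(Integer, primary_key=True)
    script_name = Column(String)
    script_path = Column(String)
    script_md5 = Column(String(32))
    params = Column(String)     # the input parameters passed to %script
    total_wall = Column(Float)  # e.g. summed wall time of the member Events

    events = relationship("Event", back_populates="event_set")


class Event(Base):
    __tablename__ = "event"

    id = Column(Integer, primary_key=True)
    cmd_str = Column(String)
    event_set_id = Column(Integer, ForeignKey("event_set.id"), nullable=True)

    event_set = relationship("EventSet", back_populates="events")
```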

Allow re-running of an experiment automatically

I've just had to re-run an experiment; I don't need to generate new data, but rather just an umbrella for a "new set of runs". It would be helpful if there was a CLI/API/Web option to request a new UUID to do this. Bonus points if we held a "parent" experiment or something.

We might need an intermediate class where Experiments have RunGroups with Runs, rather than Runs immediately belonging to an experiment.
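Sketching the intermediate layer with throwaway dataclasses (all names hypothetical): Experiments own RunGroups, RunGroups own Runs, and re-running just mints a new RunGroup that can point back at the group it repeats.

```python
# Illustrative structure only -- not how chitin models this today.
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Run:
    uuid: str = field(default_factory=lambda: str(uuid.uuid4()))
    params: dict = field(default_factory=dict)


@dataclass
class RunGroup:
    uuid: str = field(default_factory=lambda: str(uuid.uuid4()))
    parent: Optional["RunGroup"] = None  # the group this one re-runs, if any
    runs: List[Run] = field(default_factory=list)


@dataclass
class Experiment:
    name: str
    groups: List[RunGroup] = field(default_factory=list)

    def rerun(self, previous: Optional[RunGroup] = None) -> RunGroup:
        """Request a fresh umbrella UUID for a new set of runs."""
        group = RunGroup(parent=previous)
        self.groups.append(group)
        return group
```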

[tracking] Pain points

This is really a tracking issue for me that outlines the issues I have using chitin in my own workflow, but please feel free to add your own.

Make directory `Item`s more useful

Directory Items are somewhat useless. The hash of an Item that is also a directory was designed to detect changes in directories outside of chitin, but serves little purpose outside of the integrity check.

It would be much more useful if we could tie the hash of a directory to a group of Items. So I propose something like an ItemSet object that can represent a directory (or even a group of files belonging to a project, etc.). An Item could easily be in multiple ItemSets.

My ideas for "protecting" files could instead be applied to ItemSets (ie. flat out prevent clobbering).

An example of where this would be much more useful is tar: we already capture the directory hash, but cannot easily work out what the file state was at that particular hash.
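A sketch of how a directory hash could be tied to its members: hash each file, then hash the sorted (name, hash) pairs, so a directory-level hash always corresponds to a recoverable set of Item hashes. This is not the hashing scheme chitin currently uses.

```python
# Illustrative ItemSet-style hashing: the directory hash is derived from the
# member file hashes, so the file state at that hash can be reconstructed.
import hashlib
import os


def file_md5(path, block_size=65536):
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def itemset_hash(directory):
    """Return (directory_hash, {filename: file_hash}) for the files in a directory."""
    members = {}
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            members[name] = file_md5(path)

    h = hashlib.md5()
    for name, digest in sorted(members.items()):
        h.update(("%s\t%s\n" % (name, digest)).encode())
    return h.hexdigest(), members
```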

Have a FUSE backed file system

  • Enforce integrity rules with FUSE (ie. actually prevent writing, truncating) - see the sketch after this list
  • Catch copies and moves with less hassle
  • Provide useful magic files that can automatically output the results of %history, or create tar archives
  • Mirrored versions of directories
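A very rough sketch of the first point using fusepy (`pip install fusepy`); everything here is illustrative, chitin has no FUSE layer today and the mount paths are made up:

```python
# Pass reads through to a real directory, refuse anything that mutates it.
import errno
import os

from fuse import FUSE, FuseOSError, Operations


class GuardedFS(Operations):
    def __init__(self, root):
        self.root = root

    def _real(self, path):
        return os.path.join(self.root, path.lstrip("/"))

    def getattr(self, path, fh=None):
        st = os.lstat(self._real(path))
        return {key: getattr(st, key) for key in (
            "st_mode", "st_size", "st_uid", "st_gid",
            "st_atime", "st_mtime", "st_ctime", "st_nlink")}

    def readdir(self, path, fh):
        return [".", ".."] + os.listdir(self._real(path))

    def read(self, path, size, offset, fh):
        with open(self._real(path), "rb") as fh_real:
            fh_real.seek(offset)
            return fh_real.read(size)

    # The integrity rules: flat out refuse writes, truncates and deletes.
    def write(self, path, data, offset, fh):
        raise FuseOSError(errno.EACCES)

    def truncate(self, path, length, fh=None):
        raise FuseOSError(errno.EACCES)

    def unlink(self, path):
        raise FuseOSError(errno.EACCES)


if __name__ == "__main__":
    # Mirror /data/experiment read-only at /mnt/chitin (both paths made up).
    FUSE(GuardedFS("/data/experiment"), "/mnt/chitin", foreground=True)
```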

How to work with scripts?

Could we "load in" and parse a script such that we can read all the commands it contains? Do we need to?
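A crude sketch of "loading in" a script: read each non-comment line and split it into tokens so the commands could be inspected before running anything. Real shell parsing (pipes, subshells, heredocs) would need something like bashlex; the filename below is hypothetical.

```python
# Illustrative only: list the simple commands in a shell script.
import shlex


def commands_in_script(path):
    """Yield (lineno, argv) for each non-comment line in a shell script."""
    with open(path) as fh:
        for lineno, line in enumerate(fh, start=1):
            line = line.strip()
            if not line or line.startswith("#"):
                continue
            yield lineno, shlex.split(line, comments=True)


if __name__ == "__main__":
    for lineno, argv in commands_in_script("pipeline.sh"):
        print(lineno, argv)
```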
