openworm / owmeta Goto Github PK

View Code? Open in Web Editor NEW

147.0 32.0 49.0 5.08 MB

Unified, simple data access python library for data & facts about C. elegans anatomy

License: MIT License

Python 61.67% TeX 0.60% Prolog 36.98% Shell 0.75%

rdf metadata

owmeta's People

Contributors

Stargazers

Watchers

owmeta's Issues

Update readme.md and intro.rst for alpha0.5 branch

Draw examples out of the test cases you have built. Always good to update docs in sync with the update to the code!

Add cell division volume data

Part of #27

Collect division volume data (see the devoworm repo for possible data sources and contact @balicea for guidance in interpreting them)
Clean up collected data (extract, normalize, etc.)
Add a DataTranslator / DataSource in PyOpenWorm.data_trans which includes the data.

setup.py install error

Calling setup.py install --user in the main directory results in this:

...
Processing PyOpenWorm-0.0.1_alpha-py2.7.egg
Removing /home/markw/.local/lib/python2.7/site-packages/PyOpenWorm-0.0.1_alpha-py2.7.egg
Copying PyOpenWorm-0.0.1_alpha-py2.7.egg to /home/markw/.local/lib/python2.7/site-packages
PyOpenWorm 0.0.1-alpha is already the active version in easy-install.pth

Installed /home/markw/.local/lib/python2.7/site-packages/PyOpenWorm-0.0.1_alpha-py2.7.egg
Processing dependencies for PyOpenWorm==0.0.1-alpha
Traceback (most recent call last):
  File "setup.py", line 29, in <module>
    'Topic :: Scientific/Engineering']
  File "/usr/lib/python2.7/distutils/core.py", line 152, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "build/bdist.linux-x86_64/egg/setuptools/command/install.py", line 73, in run
  File "build/bdist.linux-x86_64/egg/setuptools/command/install.py", line 101, in do_egg_install
  File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 360, in run

  File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 584, in easy_install

  File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 635, in install_item

  File "build/bdist.linux-x86_64/egg/setuptools/command/easy_install.py", line 686, in process_distribution

  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 599, in resolve
    plugin_env, full_env=None, installer=None, fallback=True
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2439, in requires

  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2424, in _dep_map
    except ValueError:
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2914, in split_sections

  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2111, in yield_lines
    def _remove_md5_fragment(location):
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 2453, in _get_metadata
    @property
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1370, in get_metadata_lines
    )
  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1362, in get_metadata

  File "build/bdist.linux-x86_64/egg/pkg_resources.py", line 1427, in _get
    os.write(outf, self.loader.get_data(zip_path))
zipimport.ZipImportError: bad local file header in /home/markw/.local/lib/python2.7/site-packages/PyOpenWorm-0.0.1_alpha-py2.7.egg

implement Neuron.get_neighbors

Ideal spec:

Get a list of neighboring neurons.  

:param type: What kind of junction to look for.  
0=all, 1=gap junctions only, 2=all chemical synapses 3=incoming chemical synapses, 4=outgoing chemical synapses
 :returns: a list of neuron names
 :rtype: List

Stub is here: https://github.com/openworm/PyOpenWorm/blob/master/PyOpenWorm/neuron.py#L376

Have a single place in docs that describes the list of data sources

installing requirements.txt has errors

Here's the way it terminates--the text below is displayed in red:

Command C:\Python27\python.exe -c "import setuptools, tokenize;__file__='c:\\users\\central computers\\appdata\\local\\t
emp\\pip_build_Central Computers\\lxml\\setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace(
'\r\n', '\n'), __file__, 'exec'))" install --record "c:\users\central computers\appdata\local\temp\pip-gyputi-record\ins
tall-record.txt" --single-version-externally-managed --compile failed with error code 1 in c:\users\central computers\ap
pdata\local\temp\pip_build_Central Computers\lxml
Storing debug log for failure in C:\Users\Central Computers\pip\pip.log

Other errors occur before the process terminates. Should they be ignored?

Running setup.py install doesn't install requirements in Alpha0.5

For some reason running setup.py install, the requirements.txt packages are not actually installed. Tried this on a fresh virtualenv and it complained that networkx wasn't installed, even though it is in the requirements.txt file. Running pip install -r requirements.txt works, and have added this to the INSTALL.md instructions but this should be improved to happen in one step.

Add a separate 'update' database for testing

Add a database setup script

A script must be created to install the worm database in an appropriate location upon install of PyOpenWorm. Also, see: #17
Requirements:

The script must not be run by services such as Travis-CI or ReadTheDocs
The script should be run automatically during setup for a regular user
The database installed must correspond to the contents of the database serialization included in the distributed package.
The default configuration file must point to the installed database

Docs failing to build

Managed object-saving

Currently, the user must keep track of the graph of objects in order to save them. It would be preferable to track creation and linking of objects automatically.
The beginnings of the accounting system are included in dataObject.py as addToOpenSet and removeFromOpeSet methods.

Below, "object" refers to a data object as defined in dataObject.py.

The proposed algorithm has a notion of a "closed" and "open" set (if you're familiar with the mathematical notions of closed and open sets, forgive me for the confusion):

Objects are added to the open set on creation.
When an object is linked to another object, as through an object property or as any of the Property objects, it is removed from the open set and placed in the closed set.
Objects which have been added to the closed set cannot be again added to the closed set or removed from the open set.
To save all of the objects, each member in the open set is saved.

Requirements

It MUST be possible to create and delete objects without adding them to closed/open sets for purposes of testing.
It MAY be permissible to have objects normally saved on disconnect; however, if this is the case, then there MUST be a means of saving objects in advance of disconnect for the purpose of testing and multi-step scripts. Additionally, if objects are saved on disconnect, there must be an option to disable this behavior on and after connect.

In other words, there may be a coupling of this automated-saving with connect/disconnect, but the coupling MUST be optional, even if it is the default.
Objects SHOULD NOT be multiple-times included in dataset saved; however, such multiplicity does not affect the correctness of the algorithm.
The graph by which objects are saved MAY be that created by the existing triples calls.

If the existing machinery is replaced with, for example, a Visitor pattern, care must be taken to ensure that the resulting structure accounts for all of the uses of the triples graph. Leaving calls to linked the triples methods of linked objects would result in the multiplicity which we wish to avoid (as mentioned above).
Extraneous objects SHOULD not be created by the library as this would result in such objects being saved in the graph.

By extraneous, I mean that object does not contribute to the descriptions being written by the user--it is created for utility or transient functionality. It should not be necessary to create such objects in general, but if such a need seems to be uncommutable, the object must either never be added to the open set or immediately removed from the closed set upon creation.
It SHOULD provide implicit saving of the open set before the evaluation of any Query form statements.

An alternative to this would be to mandate separation of data definition from querying as is done in Prolog. This would make for a less natural description language.

Add biological entities to owmeta to support parameter changes for aging

Original issue: openworm/OpenWorm#176

Update the DB with new circuit information

This full circuit is not entirely represented in the library right now:

What the circuit should be:

What the circuit currently is in PyOpenWorm:

Get edge info
"RMGR", "URXR": {'synapse': 'GapJunction', 'neurotransmitter': 'FRMFemide_GJ', 'weight': '1'}
"RMGR", "ASKR": {'synapse': 'GapJunction', 'neurotransmitter': 'FRMFemide_GJ', 'weight': '1'}
"RMGR", "ASJR": None
"RMGR", "RMHR": {'synapse': 'GapJunction', 'neurotransmitter': 'FRMFemide_GJ', 'weight': '1'}
"RMGR", "AWBR": {'synapse': 'GapJunction', 'neurotransmitter': 'FRMFemide_GJ', 'weight': '1'}
"RMGR", "IL2R": {'synapse': 'GapJunction', 'neurotransmitter': 'FRMFemide_GJ', 'weight': '1'}
"RMGR", "ASHR": {'synapse': 'Send', 'neurotransmitter': 'FRMFemide', 'weight': '1'}
"RMGR", "ADLR": None

We should use this as a test case of updating the DB using the alpha0.5 methods and adding a reference to the paper

Flesh out additional tests to cover full new API revisions.

Progress with test.py has been great. Keep going to create tests to cover the rest of the new API in keeping with best test-driven development practice.

Create a dev release

Let's get this out so we can do a pip install

implement Neuron.add_reference

Spec is:

Add a reference that provides evidence of the relationship between this neuron and one of its elements.
    Example::
    >>>aval = PyOpenWorm.Neuron('AVAL')
    >>>aval.receptors()
    ['GLR-1', 'NMR-1', 'GLR-4', 'GLR-2', 'GGR-3', 'UNC-8', 'GLR-5', 'NMR-2']
    #look up what reference says this neuron has a receptor GLR-1
    >>>aval.get_reference(0,'GLR-1')
    None
    >>>aval.add_reference(0,'GLR-1', doi='125.41.3/ploscompbiol', pmid = '57182010')
    >>>aval.get_reference(0,'GLR-1')
    ['http://dx.doi.org/125.41.3/ploscompbiol', 'http://pubmedcentral.nih.gov/57182010']
    :param type: The kind of thing to add.  Valid options are: 0=receptor, 1=neighbor 
    :param item: Name of the item
    :param doi: A Digital Object Identifier (DOI) that provides evidence, optional
    :param pmid: A PubMed ID (PMID) that point to a paper that provides evidence, optional
    :param wormbaseid: An ID from WormBase that points to a record that provides evidence, optional

stub is here: https://github.com/openworm/PyOpenWorm/blob/master/PyOpenWorm/neuron.py#L289

ALFRED: an evolutionary approach to modeling embryo development

A possible approach to modeling development of the C. elegans embryo

Background:
I propose to model embryological development in C. elegans by beginning with a simpler problem - plant development - to create first an algorithmic structure homologous to the genetic program that drives plant development, and then generalize the algorithmic structure to enable it to cover the more complicated development of multicellular animals.

Around 1990, it occurred to me that the operons found in bacterial DNA constitute a universal computer language. This led to my collaboration and friendship with Dick Gordon. For nearly a decade I had developed and worked with a "genetic algorithm" that serves as a general problem solver, finding optimal solutions to complex problems in engineering. The genetic algorithm, "Generator", has been used for optimizing stock portfolios, designing lens systems, cracking encryption, and a host of other optimization tasks. It seemed to me that a computer language designed specifically to reflect the algorithmic structure of operons could be used to evolve algorithms to optimally model complex systems. Only a toy version of "Operon Language" was ever developed.

The current proposal is to use a somewhat higher-level approach, to model embryological development. A handful of elemental instructions, specifically designed to reflect the parallel development of cell lineages in an embryo as well as the physical and chemical properties of cells and their environments, will provide the basis for evolving an algorithm that models the actual development of the embryo as closely as possible using known experimental data.

An important advantage to having an algorithmic model is that the model can be used to identify potentially fertile areas of experimental research. The very nature of the algorithm and the way it emerges will make it able to adapt to accommodate new experimental data or offer alternative explanations for an observed spatiotemporal pattern in development.

ALFRED: proposed Algorithmic Language For Realistic Embryological Development
The proposed language, ALFRED, will consist of arbitrary sets of the following instructions:

· Replicate
· Die
· Update State
· Sense
· Signal

The instructions will operate on the following objects:
· Cells which have attributes of:
o pedigree
o state buffers representing:
§ xyz location
§ orientation
§ rigidity
§ size
§ shape
§ motiliity
§ etc.

Mechanical and chemical interactions between cells will be mediated by the Sense function.

There should also be an Overview module that, using cell states, provides information that can only be computed from a global perspective such as overall shape, distribution of stresses and strains in the embryo, and external influences. The Overview module will post information that is accessible to individual cells via the Sense function.

Adjunct to the Overview module will be a Presentation module through which the user can use visualization tools to watch the growth of a virtual embryo.

A computational cycle amounts to:

using the Sense function to gather relevant internal and external information for each cell,
updating cell states,
Dividing or Dying

Division can be symmetrical or asymmetrical. In the symmetrical case, Division amounts to creating a new cell that is identical in all respects to the parent cell, including the contents of each state buffer. In the asymmetrical case, the contents of each state buffer are copied with changes dictated by a Boolean function of internal state values.

Dying removes a cell from the population, and is dictated by a Boolean function of internal and external state values.

Although ALFRED can be used by a human programmer like any other programming language to write code line by line, I propose that a genetic algorithm be employed to evolve code that results in a model of the embryo.

The genetic algorithm will work as follows:

SETUP:
1a. create a population of Np quasi-randomly constructed instruction sets and Sense functions, the combination being called an "individual".
1b. Associate one precursor cell to each individual. The state values in the precursor cell all start at "zero".
CELL CYCLE
2a. Run the individuals in the population through one computational cell cycle for each precursor cell. Unless the cell Dies, it will be replicated to produce a descendent cell. The collection of cells descended from any one precursor cell is called an "embryo".
DEVELOPMENT
3a. Repeat CELL CYCLE, for a user-assigned number Nc of cycles.
SELECTION&VARIATION
4a. After Nc computational cycles, compare the states of the cells in each embryo to observed cell states at the same stage. The degree of matching is called "fitness". The user can specify a function to calculate fitness
4b. Rank the individuals in the population according to their "fitness".
4c. Create a new population by:
4ci. keeping the Nh highest-fitness individuals
4cii. replacing the Nl lowest-fitness individuals with randomly created individuals
4cii. generating enough new individuals to maintain the population at Np, by:
4cii1. selecting a "mother" individual from the current population, with probability proportional to the individual's fitness rank.
4cii2. selecting a "father" individual from the current population, also with probability proportional to the individual's fitness rank.
4cii3. creating a hybrid "daughter" individual from the "mother" and "father" by a user-specified process which will usually retain common features of the mother and father and quasi-randomly copy features from the mother and father that are not common to both. It helps a lot to choose a process that provides a high likelihood that the resulting daughter will have a reasonably high fitness.
4cii4. repeating steps i through iii above until the population reaches Np.
EVOLUTION
5a. Repeat CELL CYCLE, DEVELOPMENT, and SELECTION&VARIATION until a user-specified termination criterion is met. The criterion might be a number of evolution cycles, an amount of computational time, a rate of change of fitness value of the highest-fitness individual in the population over the past number Nx of evolution cycles, or attainment of a target level of fitness in any individual in the population.

The user will specify Nh, Nl, Np, and other control parameters via a Dashboard. The program should save the Nh top-fitness individuals for the user to study.

Discussion:
There is, of course, a lot to discuss and a lot of experimenting to do if ALFRED is to be implemented. One part of ALFRED will need concentrated attention: producing daughter individuals from parent individuals. The Overview, Presentation, and Dashboard modules can probably be worked on after the other parts are done.

I propose that ALFRED be used initially to model plant development. It should be relatively easy to evolve algorithms to construct satisfactory models of, for example, ferns, oak trees, pine trees, palm trees, ivy, roses, or carrots. Experience with these simpler examples should provide a basis for modifying ALFRED's structure to tackle C. elegans.

If ALFRED is pursued, the database tools currently under development in DevoWorm will be valuable in several ways:
· informing decisions that will need to be made in constructing ALFRED
· providing a resource ALFRED will use for calculating fitness while evolving models.

Rename PyOpenWorm folder to openworm

For a more pythonic:

import openworm
net = openworm.Network()

implement Neuron.get_connections

Spec is:

"""Get a list of Connections between this neuron and other neurons.  
    :param type: What kind of junction to look for.  
                    0=all, 1=gap junctions only, 2=all chemical synapses
                    3=incoming chemical synapses, 4=outgoing chemical synapses
       :returns: a list of PyOpenWorm.Connection objects
       :rtype: List
       """

Stub is here: https://github.com/openworm/PyOpenWorm/blob/master/PyOpenWorm/neuron.py#L386

Modify insert_worm.py to fix multiple RDF nodes per neuron

This test is failing, because the structure of the data is inconsistent.

The data is structured by insert_worm.py. Somehow, when this is run, there are two RDF nodes per neuron instead of one.

This issues is to understand why that is happening and fix insert_worm.py so that there is only one RDF node per neuron (but not losing any other data that we already have).

Add summary data of system-level C. elegans to openworm/owmeta-data bundle

Incorporate the facts from this spreadsheet into the openworm/owmeta-data bundle.

Requirements:

Must include references listed in the document as appropriate Document/Evidence
Must use DataSource / DataTranslator for translating from the spreadsheet
May use the spreadsheet directly from Google Sheets or download it
Must include the ranges for values, where listed, rather than individual values within those ranges
Must merge existing Document objects with the new ones created: search for documents already in the database and re-use the URIs.

Recommendations:

Should verify that the facts and figures are faithfully reproduced from the references.

alpha0.5: sensory() and interneurons() have wrong output

This is in the network.py object

>>> list(net.interneurons())
[]
>>> list(net.sensory())
[Neuron(name=PHAL, Neighbor(), Connection(), receptor=NLP-7), Neuron(name=PHAL, Neighbor(), Connection()), Neuron(name=PHAL, Neighbor(), Connection(), receptor=NLP-14)]

Enable python setup.py install to work

Currently Travis CI testing under the zodb-install and the zodb branches works but only with running python setup.py develop. Let's get python setup.py install to work as well.

Current error during python setup.py install in the zodb-install branch:

$ python setup.py install
Compiling C. elegans data
Traceback (most recent call last):
  File "setup.py", line 32, in <module>
    insert_worm.do_insert()
  File "/home/travis/build/openworm/PyOpenWorm/db/insert_worm.py", line 232, in do_insert
    P.connect(configFile='default.conf',do_logging=logging)
  File "/home/travis/build/openworm/PyOpenWorm/PyOpenWorm/__init__.py", line 134, in connect
    loadConfig(configFile)
  File "/home/travis/build/openworm/PyOpenWorm/PyOpenWorm/__init__.py", line 98, in loadConfig
    Configureable.conf = Data.open(f)
  File "/home/travis/build/openworm/PyOpenWorm/PyOpenWorm/data.py", line 221, in open
    Configureable.conf = Configure.open(file_name)
  File "/home/travis/build/openworm/PyOpenWorm/PyOpenWorm/configure.py", line 80, in open
    value = resource_filename(Requirement.parse("PyOpenWorm"), value)
  File "/home/travis/virtualenv/python2.7.8/lib/python2.7/site-packages/pkg_resources.py", line 895, in resource_filename
    return get_provider(package_or_requirement).get_resource_filename(
  File "/home/travis/virtualenv/python2.7.8/lib/python2.7/site-packages/pkg_resources.py", line 225, in get_provider
    return working_set.find(moduleOrReq) or require(str(moduleOrReq))[0]
  File "/home/travis/virtualenv/python2.7.8/lib/python2.7/site-packages/pkg_resources.py", line 685, in require
    needed = self.resolve(parse_requirements(requirements))
  File "/home/travis/virtualenv/python2.7.8/lib/python2.7/site-packages/pkg_resources.py", line 588, in resolve
    raise DistributionNotFound(req)
pkg_resources.DistributionNotFound: PyOpenWorm
The command "python setup.py install" failed and exited with 1 during .
Your build has been stopped.

Improve update performance

Currently, uploads are performed in small blocks due to limitations of the stack size during RDFLib's parse. This requires us to parse and compile the update statement for each block. This can be avoided by calling rdflib.graph.Graph.add() directly for each statement and adding a switch for remote stores (e.g., SPARQLUpdateStore), for which the cost of one add() is very expensive, to do SPARQL Update.

Add cell lineage info to PyOpenWorm

Cell lineage refers to the tree of cell divisions that occurs when an organism is growing from an embryo. We'd like to have a data structure that captures this for c. elegans. The data are out there -- now we'd like to put them into PyOpenWorm.

This is the first step in a greater project to be able to render lineage trees as "differentiation trees" for exploration and potentially simulation down the road.

Identify best data source to begin working with
Write a simple python script that pulls out the names of the cells and the cell division relationships and displays to console
Augment the script to dump the data into the PyOpenWorm SQLite3 DB.
Augment PyOpenWorm to pull this data out from the SQLite3 DB and construct an RDF graph out of it
Build accessor functions in PyOpenWorm that can pull the whole lineage into a networkX graph, and also can report lineage information when asked for a specific cell.

Original issue: openworm/OpenWorm#179

Do default config as part of .connect() in alpha0.5

The database load steps are:

  P.connect('PyOpenWorm/default.conf')
  P.config()['rdf.graph'].parse('OpenWormData/out.n3', format='n3')

Can the second step be incorporated under connect()? The second step doesn't seem to helpful to expose to the end user as they are not going to know enough to change it.

Add timeouts for metadata lookups in Evidence

alpha0.5 producing incorrect output compared to README.md

With a system where tests are passing, we are not getting sensible output from the examples.

This is most concerning and the highest priority issue to fix right now.

type() not working:

>>> net = P.Worm().get_neuron_network()
>>> list(net.aneuron('AVAL').type())
[]
>>> list(net.aneuron('DD5').type())
[]

wrong receptors coming out:
FROM README:

>>>list(net.aneuron('AVAL').receptors())
['GLR-1', 'NMR-1', 'GLR-4', 'GLR-2', 'GGR-3', 'UNC-8', 'GLR-5', 'NMR-2']

ACTUAL:

>>> list(net.aneuron('AVAL').receptors())
['FLP-18', 'FLP-1']

Readme says this will be True

>>> 'MDL08' in P.Worm().muscles()
False

No receptors coming out even though we say there are:

FROM README:

>>>ader = P.Neuron('ADER')
>>>list(ader.receptors())
['ACR-16', 'TYRA-3', 'DOP-2', 'EXP-1']

ACTUAL:

>>> ader = P.Neuron('ADER')
>>> list(ader.receptors())
[]

Omit identifiers for un-referenced properties

Properties have the useful feature that they can be referenced uniquely. However, the mechanism by which this is done results in an excess of triples which could otherwise be omitted.

Requirements

Must preserve a means of retrieving all properties of a given type.
Should only store a single triple for a property with a set value unless some other object references it.

Add RDFS and OWL reasoning to RDF graph

Some of the queries we perform in PyOpenWorm could be simplified by including a reasoner like FuXi. For instance, we currently insert an rdf:type triple for each super-class of a DataObject's type in order to utilize classes in our queries.

It would be preferable to augment the RDF graph with RDFS reasoning to infer that, for instance, a Neuron is a Cell, without inserting extra triples beyond a description of the class hiearchy (i.e., a rdfs:subClassOf link between Neuron and Cell). While it is be possible to modify individual queries to search the class hierarchy, by pushing this logic into a reasoner, we retain the simplified structure of our object -> triples -> query path in all present and future queries added to PyOpenWorm as well as custom queries devised by users of the library.

FuXi was chosen as a possible reasoner because it already interfaces with RDFLib.

See the old Fuxi docs for a quick look at how to interface FuXi with a graph.

Changes should be made in PyOpenWorm/data.py by replacing the value that goes into rdf.graph. See here. The graph can also be set directly. After calling PyOpenWorm.connect():

PyOpenWorm.Configureable.default['rdf.graph'] = <your rdflib graph>

Make examples directory human readable

Remove unnecessary files
README should explain each example
Examples should have human readable comments including a summary at the top and comments throughout that explain what is going on.

Improve the Cell morphology

The morphology query runs very slowly. This is probably a result of the OPTIONALs in the query.

alpha0.5: re-integrate the data repo

Versioning the data outside of the library is the opposite of what we want to achieve here. We want the data version tightly coupled to the library as a feature because then users know what data they have in a given version of the library. Decoupling it will create confusions and mismatches between which data version versus which library version, and is also likely to make queries fail. This is also the reason not to use an externally hosted database that may be changing over time. We are following a "data as code" idea here.

Please bring back any data you've put outside this repo or let me know where those calls are and I'll do it in my branch.

Add neuron cell receptor data

Part of #27

Force users to manually open the database in their own scripts

Some graph stores, Sleepycat in particular, require being opened before use and closed after. There's no reliable way that I know to register a clean-up function in Python without knowing the particular program that's running. Consequently, responsibility for closing the store must be pushed up to the user and opening the store should as well to remind the user to close it.

Failure to import PyOpenWorm in the alpha0.5 branch

In a fresh clone of the alpha 0.5 branch, the import of PyOpenWorm failed, as below:

import PyOpenWorm
...
import transaction
ImportError: No module named transaction

Change backend to pull from flat file in local memory rather than REST interface

Just to codify what we talked about with regards to the data process, let's keep the whole database as a flat file under the db/ directory, pull it in each time the library loads, and write it back out each time there is an update / edit to the database.

I noticed out.n4. Couple things:

I believe the extension is supposed to be .nq (ref)
Have you experimented with TriG as an alternative to N-Quads? It looks more concise. rdflib seems to have support for it

Fix import script to ensure single RDF nodes per neuron

The import script currently is creating different RDF nodes for information about the same neurons. This makes the graph hard to query due to redundant information because there is not a 1:1 relationship between a neuron and an RDF node that connects to all information about it. This is likely to lead to data appearing lost.

A test has been created that looks for this in the data and when it fails it reports the following RDF node count per known neuron:

{'VB4': 2, 'ADLL': 2, 'VD7': 2, 'HSNL': 2, 'SIBDR': 2, 'CEPVR': 2, 'PHCR': 2, 'VD8': 2, 'VA8': 2, 'AIYR': 2, 'M2R': 3, 'CEPVL': 2, 'DB1': 1, 'OLLR': 2, 'DB3': 1, 'DB2': 2, 'DB5': 2, 'AIYL': 2, 'DB7': 2, 'CANR_': 0, 'VB7': 2, 'URAVR': 2, 'NSMR': 2, 'ASKR': 2, 'RMFR': 2, 'AWCR': 2, 'AVAR': 2, 'RIS': 2, 'SMBVL': 2, 'RICR': 2, 'AVAL': 2, 'VA9': 2, 'AWCL': 2, 'AS11': 2, 'AS10': 2, 'VA2': 2, 'ASKL': 2, 'NSML': 2, 'RIH': 2, 'VA6': 2, 'VA7': 2, 'VA4': 2, 'VA5': 2, 'CEPDL': 2, 'URYVL': 2, 'PVQL': 2, 'FLPR': 2, 'AVJR': 2, 'URYDR': 2, 'IL2VR': 2, 'IL2VL': 2, 'URYDL': 2, 'FLPL': 2, 'AVJL': 2, 'URBR': 2, 'RMDR': 2, 'CEPDR': 2, 'AVBR': 2, 'VD13': 2, 'ASJL': 2, 'ALML': 2, 'PVQR': 2, 'AWBR': 2, 'IL2L': 2, 'RMHR': 2, 'AWBL': 2, 'IL2R': 2, 'ALMR': 2, 'ASJR': 2, 'DB4': 2, 'PVDL': 2, 'AVBL': 2, 'VD12': 2, 'ASER': 2, 'SAAVL': 2, 'RMGL': 2, 'SMDDL': 2, 'M5': 3, 'M4': 3, 'RID': 2, 'M1': 3, 'URXL': 2, 'ASEL': 2, 'ALA': 2, 'SMBDL': 2, 'VA11': 2, 'VA12': 2, 'ALNL': 2, 'HSNR': 2, 'RMDDR': 2, 'PLNL': 2, 'VB10': 2, 'DD3': 2, 'DD2': 2, 'ADAR': 2, 'DD6': 2, 'DD5': 2, 'ADLR': 2, 'M2L': 3, 'SMDDR': 2, 'ADAL': 2, 'DD4': 2, 'URXR': 2, 'PLNR': 2, 'PVDR': 2, 'DB6': 2, 'PHAL': 2, 'SABD': 2, 'ALNR': 2, 'VD10': 2, 'AUAL': 2, 'SABVR': 2, 'RIMR': 2, 'RMER': 2, 'AIAR': 2, 'RMEV': 2, 'RMEL': 2, 'AIAL': 2, 'RIML': 2, 'RMED': 2, 'VB9': 2, 'AUAR': 2, 'VD1': 2, 'VC1': 2, 'PVNR': 2, 'PLML': 2, 'VD5': 2, 'RICL': 2, 'AINR': 2, 'PVWL': 2, 'VD9': 2, 'SIBVL': 2, 'SAADR': 2, 'RIFR': 2, 'MI': 3, 'AVHL': 2, 'ADFR': 2, 'AVL': 2, 'AVM': 2, 'AVHR': 2, 'SAADL': 2, 'SIBVR': 2, 'AINL': 2, 'PVWR': 2, 'AVG': 2, 'PVNL': 2, 'PLMR': 2, 'VC2': 2, 'SIBDL': 2, 'IL1DL': 2, 'RIBR': 2, 'OLLL': 2, 'RIR': 2, 'IL1R': 2, 'IL1VL': 2, 'DVA': 2, 'SMBDR': 2, 'DVC': 2, 'DVB': 2, 'RMDDL': 2, 'SIADL': 2, 'PDEL': 2, 'IL1L': 2, 'VD4': 2, 'RIVR': 2, 'I1R': 2, 'VC3': 2, 'IL1DR': 2, 'OLQDR': 2, 'PVR': 2, 'PVT': 2, 'AVEL': 2, 'I1L': 2, 'IL2DL': 2, 'ASGL': 2, 'AIMR': 2, 'AIML': 2, 'ASGR': 2, 'IL2DR': 2, 'OLQVL': 2, 'AVER': 2, 'RIFL': 2, 'PVM': 2, 'OLQDL': 2, 'AS3': 2, 'AS2': 2, 'AS1': 2, 'URADL': 2, 'AS7': 2, 'AS6': 2, 'AS5': 2, 'AS4': 2, 'AS9': 2, 'AS8': 2, 'PVCR': 2, 'VD3': 2, 'SMDVR': 2, 'VD2': 2, 'LUAR': 2, 'VA3': 2, 'VA1': 2, 'URYVR': 2, 'RIPL': 2, 'VD11': 2, 'PDB': 2, 'PDA': 2, 'SMBVR': 2, 'BAGR': 2, 'AVFL': 1, 'SDQR': 2, 'PQR': 2, 'RIGL': 2, 'AFDL': 2, 'I2R': 2, 'VD6': 2, 'RMHL': 2, 'BDUR': 2, 'VB3': 2, 'BDUL': 2, 'VB1': 2, 'SMDVL': 2, 'I2L': 2, 'RMFL': 2, 'VB5': 2, 'AFDR': 2, 'RIGR': 2, 'SDQL': 2, 'VB8': 2, 'AVFR': 1, 'BAGL': 2, 'URADR': 2, 'AIBR': 2, 'CANL_': 0, 'AVKR': 2, 'OLQVR': 2, 'RMDVR': 2, 'M3R': 3, 'RIPR': 2, 'RMDVL': 2, 'AVKL': 2, 'PHAR': 2, 'PVCL': 2, 'AIBL': 2, 'URAVL': 2, 'AWAR': 2, 'ASIL': 2, 'ADER': 2, 'AQR': 2, 'VA10': 2, 'I3': 2, 'URBL': 2, 'I5': 2, 'I4': 2, 'PVPL': 2, 'I6': 2, 'PVPR': 2, 'SABVL': 2, 'SIADR': 2, 'M3L': 3, 'VC4': 2, 'VC5': 2, 'VC6': 2, 'PHCL': 2, 'ADEL': 2, 'RMGR': 2, 'RIVL': 2, 'PDER': 2, 'ASIR': 2, 'AWAL': 2, 'AIZL': 2, 'SIAVR': 2, 'VB6': 2, 'IL1VR': 2, 'ADFL': 2, 'RIBL': 2, 'DA4': 2, 'DA5': 2, 'DA6': 2, 'DA7': 2, 'DA1': 2, 'DA2': 2, 'DA3': 2, 'AIZR': 2, 'DA8': 2, 'DA9': 2, 'VB2': 2, 'ASHL': 2, 'SAAVR': 2, 'RIAR': 2, 'MCL': 3, 'VB11': 2, 'SIAVL': 2, 'AVDR': 2, 'PHBR': 1, 'MCR': 3, 'LUAL': 2, 'PHBL': 2, 'RMDL': 2, 'AVDL': 2, 'RIAL': 2, 'DD1': 2, 'ASHR': 2}

To see this, you'll need to take off the @unittest.expectedFailure annotation when you run the test. You can now run this test separate from the rest of the test suite with:

python tests/test.py DataIntegrityTest.testUniqueNeuronNode

We can close this issue when the test passes correctly.

Implement full Alpha 0.5 API

Losing track of what is implemented in the alpha0.5 branch and what is not. @mwatts15 could you check boxes here to communicate where we are at?

API list taken from here

Integrate doctest with unit tests

Doctest may be helpful in keeping the examples in documentation up-to-date with the source code.

Doctest should execute as part of the normal unit testing.

Make better use EAV structure

RDF gives us the ability to leave many values NULL for each object in the database. Currently we don't make good use of this facility, inputting a sort of NULL (called graph variables in the source code). Correcting this should be done in concert with improving the query functionality.

Acheiving this would drastically reduce the update time from the reduction in triples inserted.

Port PyOpenWorm to python3

Advantages:

Other parts of the project, such as movement_validation, are built in python3 - consistency is good.
It seems unlikely that python3 will be abandoned as a failed experiment at this point - given we want the usership of PyOpenWorm to increase over time, it seems better to write forward-compatible than backward-compatible code.
By importing from future, some python3-style code can be interpreted in python2: you can't do the same to interpret python2-style code in python3.
It's a newer version - all the changes which have been made have presumably been made for a reason, resulting in either cleaner, faster, or more robust code.

Difficulties:

There may well be 3rd party packages which don't support python 3... yet. Again, I find it highly unlikely that as time goes on, there will be any important piece of functionality offered by py2 packages which won't have a counterpart in py3. RDFlib and sqlite3 both seem to be supported. libNeuroML may not be, but having a dig through that code there doesn't seem to be any packages which wouldn't support it.

A 'port to python3' issue is the sort of thing which should be on the backlog of absolutely any maintained python project. Currently, it is just on the backlog, and under 'ideas'- surely we plan to do this at some point in the future, and planning for the future is what backlogs are for!

Implement muscle.innervatedBy() in alpha0.5

There's interest from @wilzh40 to see this accomplished for another part of the project. @mwatts15 do you have everything you need to do this?

CRITICAL:ZODB.FileStorage:/.../worm.fs Database records 35284834279 seconds in the future

This message prints when a ZODB store is reopened. Doesn't seem to affect data integrity.

Loading neurons with Neuron().load() is very slow

Tests are incredibly slow:

-----------------------------------------------------------------
Testing with tests/test_ZODB.conf
Ran 96 tests in 274.587s

OK (SKIP=8)
-----------------------------------------------------------------

Simple functions are incredibly slow (each takes more than 90 seconds to complete)

>>> net.aneuron('AVAL').connection.count('either',syntype='gapjunction')
80
>>> len(set(net.neurons()))
302

These are very fast in master

Asserts() is incredibly slow (more than 10 minutes to complete final command & then I got tired of waiting):

>>> e = P.Evidence(author='Sulston et al.', date='1983')
>>> e.asserts(P.Neuron(name="AVDL").lineageName("AB alaaapalr"))
asserts=lineageName=AB alaaapalr
>>> e.save()
>>> e0 = P.Evidence(author='Sulston et al.', date='1983')
>>> list(e0.asserts())
...? #what's the output?  Can't be bothered to wait for it...

Create data integrity tests that ensure db is sane

There is a start here
#44 is the result of not having this.

Initial tests should include:

Is there only 1 node in the db per neuron or is their more than one? Currently AVAL has 2 for example
Does every neuron have a type that returns a name?
Does every fact have evidence associated with it?

Simplify installation for alpha0.5

As reported on the openworm-discuss list folks are running into roadblocks installing this library due to a dependency on the BerkeleyDB being installed on their system and version mismatches. This is too high a bar for most users to have to worry about; they won't use this and they'll go elsewhere. We need to simplify the installation procedures so that pip install is really all that is needed.

One approach may be to look at adding a new RDFLib backend that is pure python. Some options include:

"_p_serial must be an 8-character bytes array" error with ZODB store

alpha0.5: remove need to wrap everything in list() or set()

Via the use of generators, we are forcing the end user of the api to use list() and set() methods to get basic info out like type() and name(). For example, from the current version of the readme:

  >>> list(net.aneuron('AVAL').name())
  ['AVAL']

  #list all known receptors
  >>> s = set(net.aneuron('AVAL').receptors())
  >>> s == set(['GLR-1', 'NMR-1', 'GLR-4', 'GLR-2', 'GGR-3', 'UNC-8', 'GLR-5', 'NMR-2'])
  True

  >>> list(net.aneuron('DD5').type())
  ['motor']
  >>> list(net.aneuron('PHAL').type())
  ['sensory']

This is not in keeping with the goals of simplicity of this package. If we need to expose the generators directly, let's make additional parallel methods for this, but let's keep the basic getters returning strings and lists.

To fix, I will follow this approach:

rename generator producing methods as private and suffix with "_helper"
wrap the generator producing method in the simple API call with the same list() or set() method that seems to be used in examples.

Comments on these changes are welcome.

Improve update performance, 2

The current method of doing updates calls save() on a top-level object which calls triples() recursively on properties and ObjectProperty values. Since objects can appear in multiple places in the tree of calls to triples(), we should prevent them from releasing their triples within a single save().

The solution should not use a global variable to indicate saved/unsaved status.

openworm / owmeta Goto Github PK

owmeta's People

Contributors

Stargazers

Watchers

Forkers

owmeta's Issues

Requirements

Requirements

Advantages:

Difficulties:

Recommend Projects

Recommend Topics

Recommend Org