transator-java's Introduction

TransATor

This repository contains the cheminformatics (PKS Structure Generator and Runner) and web (Web and REST) parts of TransATor. TransATor is a tool for identifying trans-AT PKS domains and then generating a first chemical hypothesis of the polyketide produced by the PKS enzyme in question. For more details, see the wiki.

transator-java's Issues

Tasks to be done for paper release

Adapt Java code to Python refactoring

The Java modules need to be able to read the new annotation file and use it to shape the PK molecule. 0.5 days used so far.

Release the current version as 1.1 (the one that works with the flagged Python 0.1).
Add ability to read the additional verification field for sequence features.
Define which sections of the Java code require the annotation data.
- Implement objects for CladeAnnotation.
- Implement ability to skip clades if they are not verified; use the highest ranking.
- Update some of the tests to the new cladification annotation.
Identify clades that produce stereochemistry-related exceptions with non-planar bonds.
- Move from CDK 1.5.10 to the newest CDK; John suggests that this might fix the issue.

Produce new cladification HMMER models and set up directories

Took 0.5 days to delve into code
Realised that we cannot use a single Newick file with the complete cladification. Instead, we need either a separate Newick file per clade or a FASTA-identifier-to-clade assignment file, compatible with the IDs used in the annotation file.
Go into code and check what is needed, fix some bugs found (0.25)
Document what is needed (including got refactoring part #4 ) (0.15)
Discuss needs with Eric (0.1)
Inspect new fasta to clade assignment file given by Eric (1 including fixes)
- Fix errors
Try setup scripts with new input from Eric and document process (0.5)
Finish documentation for setup.
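
As an illustration of the FASTA-identifier-to-clade assignment mentioned above, here is a minimal sketch of how such a file could be read and grouped by clade. The two-column, tab-separated layout, the function name `read_clade_assignment` and the example identifiers are assumptions made for this example; the actual file provided by Eric may be laid out differently.

```python
import csv
import io
from collections import defaultdict


def read_clade_assignment(handle):
    """Group FASTA identifiers by clade.

    Assumed (hypothetical) layout: <fasta_id> TAB <clade_id>, one pair per line.
    """
    clades = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        if len(row) != 2:
            raise ValueError(f"expected 2 columns, got {len(row)}: {row!r}")
        fasta_id, clade_id = (field.strip() for field in row)
        clades[clade_id].append(fasta_id)
    return dict(clades)


# Made-up identifiers, only to show the grouping.
example = io.StringIO("seq_001\tClade_1\nseq_002\tClade_2\nseq_003\tClade_1\n")
assignment = read_clade_assignment(example)
```

Grouping by clade first makes it straightforward to write one FASTA file per clade before building the per-clade HMMER models.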

Python refactoring to new annotation scheme

The Python part needs to be refactored to use the new annotation scheme. Currently, the Python modules read the following from the annotation file:

Clade identifier
Description of the clade

The current annotation file is expected to be in the same path as the HMMER model, with an .annot file extension. For backward compatibility, we could add a flag for the new annotation file and read it if provided; otherwise, we expect to find the previous annotation file in the expected location. The class currently responsible for reading the annotation is hmmer/core/ModelAnnotator.
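
The backward-compatible lookup described above could be sketched roughly as follows. The flag name `--annotation`, the function `resolve_annotation_path` and the example paths are hypothetical, and the sketch assumes that the legacy file shares the model's base name with the extension replaced by .annot, which may not match the existing code exactly.

```python
import argparse
from pathlib import Path


def resolve_annotation_path(model_path, new_annotation=None):
    """Return (path, scheme) for the annotation file to read.

    Use the new-scheme file when one is given; otherwise fall back to the
    legacy file next to the HMMER model (assumed: same name, .annot suffix).
    """
    if new_annotation is not None:
        return Path(new_annotation), "new"
    return Path(model_path).with_suffix(".annot"), "legacy"


parser = argparse.ArgumentParser()
parser.add_argument("--model", required=True)
parser.add_argument("--annotation", default=None)  # hypothetical flag name

# No --annotation given, so the legacy location is used.
args = parser.parse_args(["--model", "models/pks.hmm"])
path, scheme = resolve_annotation_path(args.model, args.annotation)
```

Returning the scheme alongside the path lets the caller pick the matching reader without inspecting the file.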

Write reader and related classes for new annotation scheme.
Add a backward-compatible ability to read the new annotation file, extracting from it the clade description that was previously obtained from the older annotation file.
- Override the constructor of the class that processes it (ModelAnnotator) to obtain these descriptions from a CladeAnnotation object if provided.
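
Python has no constructor overloading, so one way to approach the item above is an optional keyword argument. The sketch below uses a hypothetical stand-in class, `DescriptionLookup`, rather than the real ModelAnnotator, and models the CladeAnnotation data as a plain dictionary; both are assumptions for illustration only.

```python
class DescriptionLookup:
    """Hypothetical stand-in for the class that serves clade descriptions."""

    def __init__(self, legacy_descriptions=None, clade_annotation=None):
        if clade_annotation is not None:
            # New scheme: take the descriptions from the annotation data.
            self._descriptions = {
                clade_id: entry["description"]
                for clade_id, entry in clade_annotation.items()
            }
        elif legacy_descriptions is not None:
            # Old scheme: descriptions as read from the legacy annotation file.
            self._descriptions = dict(legacy_descriptions)
        else:
            raise ValueError("provide legacy_descriptions or clade_annotation")

    def describe(self, clade_id):
        """Return the description for a clade, or an empty string if unknown."""
        return self._descriptions.get(clade_id, "")


legacy = DescriptionLookup(legacy_descriptions={"Clade_1": "old text"})
new = DescriptionLookup(clade_annotation={"Clade_1": {"description": "new text"}})
```

Callers that only need `describe` do not have to know which annotation scheme was used.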

The Python code might need the following data from the annotation file:

Clade ID
Clade description as shown in tool
Mol file for monomer (this was previously based on the clade identifier, not anymore)
- ~~[ ] Make changes in code to use the mol file name given in the new annotation format, if provided.~~ Only needed in the Java part.
Postprocessor: This is used by the Java part and should be passed along.
VerificationDomains: This probably should be used within the Python part, to correct the annotation given to the Java part. I have my doubts here.
- Use annotation object with Domain_Verifier classes, instead of local loader previously implemented.
- Test DomainVerifier classes with Annotation reader.
- Invoke DomainVerifier classes from main script, to influence the resulting SeqObj's features.
  - Write test for SimpleFeatureWriter making sure that the verification column appears adequately, fix any issues
- Compare output to expected outputs for some sequences, fixing missing annotations that arise.
TerminationRule: This is used by the Java part and should be passed along.
NonElongating: This is used by the Java part and should be passed along.
VerificationDomainIsMandatory: ~~Used in the python part.~~
- ~~Should be used after calling the DomainVerifier classes, possibly to execute some changes (either remove the feature, which is preferred, or change it) if the verification fails.~~ Will be used only in the Java part, to decide whether to make use of the verifications done.
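
The fields listed above could be held in a single record per clade. The following is a minimal sketch; the class name `CladeAnnotationEntry`, the attribute types, the defaults and the `MolFile` key are assumptions, since the new annotation format is not spelled out here.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CladeAnnotationEntry:
    """One clade's annotation; types and defaults are assumed, not confirmed."""

    clade_id: str
    description: str
    mol_file: Optional[str] = None
    postprocessor: Optional[str] = None
    verification_domains: List[str] = field(default_factory=list)
    termination_rule: Optional[str] = None
    non_elongating: bool = False
    verification_domain_is_mandatory: bool = False

    def java_passthrough(self):
        """Fields that Python does not interpret and only hands on to Java."""
        return {
            "MolFile": self.mol_file,  # hypothetical key name
            "Postprocessor": self.postprocessor,
            "TerminationRule": self.termination_rule,
            "NonElongating": self.non_elongating,
            "VerificationDomainIsMandatory": self.verification_domain_is_mandatory,
        }


# Made-up values, only to show the shape of the record.
entry = CladeAnnotationEntry("Clade_1", "example clade", termination_rule="rule")
passthrough = entry.java_passthrough()
```

Keeping the pass-through fields in one method makes it explicit which values the Python part consumes itself (VerificationDomains) and which it merely forwards.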

This also means that these fields need to make it into the new file that Python writes for the Java-CDK part (features file), or that Python generates a simplified file for Java and Java reads all these from the annotation file. One way to go would be to combine in the feature file the fields produced by Python from the sequence search and all the elements read from the annotation file, to avoid the risk of the Java part running with an incorrect annotation file. All the fields produced through the sequence search are stored initially in qualifiers inside SeqFeatures objects, which go inside the SeqRecords returned by the FeatureMarker classes in Query.core. These are in turn written to the .feature file passed to Java by the SimpleFeatWriter class in SimpleFeatWriter.core. This could be the place to add all the annotation elements if a unified output is to be used.

Alternatively, the annotation file can be passed to Java, alongside the file with the results of the sequence searches and domain annotation alterations.
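
For the unified-output option, a rough sketch of a writer that merges sequence-search qualifiers with annotation fields is given below. Plain dictionaries stand in for the SeqFeature qualifiers, and the column names, the tab-separated layout and the function `write_unified_features` are assumptions; the real .feature format written by the existing writer class may differ.

```python
import csv
import io

# Hypothetical columns produced by the sequence search.
SEARCH_FIELDS = ["clade_id", "start", "end", "verification"]


def write_unified_features(features, annotations, handle, annotation_fields):
    """Write one row per feature: search qualifiers, then annotation fields."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(SEARCH_FIELDS + annotation_fields)
    for qualifiers in features:
        # Look up the annotation for this feature's clade; empty if missing.
        annotation = annotations.get(qualifiers["clade_id"], {})
        row = [qualifiers.get(name, "") for name in SEARCH_FIELDS]
        row += [annotation.get(name, "") for name in annotation_fields]
        writer.writerow(row)


# Made-up data, only to show the merged row layout.
features = [{"clade_id": "Clade_1", "start": 10, "end": 250, "verification": "ok"}]
annotations = {"Clade_1": {"TerminationRule": "rule", "NonElongating": False}}
out = io.StringIO()
write_unified_features(features, annotations, out, ["TerminationRule", "NonElongating"])
lines = out.getvalue().splitlines()
```

Because every row carries its own annotation values, the Java part would not need to open the annotation file at all, which is the risk the unified option is meant to remove.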

pcm32 / transator-java