The Python part needs to be refactored to use the new annotation scheme. Currently, the Python modules read the following from the annotation file:
- Clade identifier
- Description of the clade
The current annotation file is expected to be in the same path as the HMMER model, with an .annot file extension. For backward compatibility, we could add a flag for the new annotation file and read it if provided; otherwise, we would expect to find the previous annotation file in the expected location. The class currently responsible for reading the annotation is hmmer/core/ModelAnnotator.
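The fallback described above could look roughly like the sketch below. The function name `resolve_annotation_path` and its parameters are hypothetical, and it assumes that the legacy file replaces the model's extension with `.annot` (for example `pks.hmm` becomes `pks.annot`); if the legacy convention instead appends `.annot` to the full model file name, the last line would need to change.

```python
from pathlib import Path
from typing import Optional, Union

PathLike = Union[str, Path]


def resolve_annotation_path(
    model_path: PathLike,
    new_annotation: Optional[PathLike] = None,
) -> Path:
    """Pick the annotation file to read for a HMMER model.

    Hypothetical helper: if the new-format annotation file was given
    explicitly (e.g. through a command-line flag), use it; otherwise
    fall back to the legacy .annot file next to the model.
    """
    if new_annotation is not None:
        return Path(new_annotation)
    # Assumption: the legacy file shares the model's path and base name,
    # with the extension swapped for .annot.
    return Path(model_path).with_suffix(".annot")
```

With this shape, existing callers that never pass the new flag keep their current behaviour unchanged.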
The Python code might need the following data from the annotation file:
- Clade ID
- Clade description as shown in tool
- Mol file for monomer (this was previously based on the clade identifier, but not anymore)
[ ] Make changes in code to use the mol file name given in the new annotation format, if provided. Only needed in the Java part.
- Postprocessor: This is used by the Java part and should be passed along.
- VerificationDomains: This probably should be used within the Python part, to correct the annotation given to the Java part. I have my doubts here.
- TerminationRule: This is used by the Java part and should be passed along.
- NonElongating: This is used by the Java part and should be passed along.
- VerificationDomainIsMandatory:
Used in the Python part.
Should be used after calling the DomainVerifier classes, possibly to apply some changes if the verification fails (either remove the feature, which is preferred, or change it). Will be used only in the Java section, to decide whether to make use of the verifications done.
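As a way of making the field list above concrete, one annotation entry could be held in a small record like the one sketched here. The class name, attribute names, types, and defaults are all assumptions, since the new annotation format is not specified in this note; only the field names used as dictionary keys are taken from the list above. The sketch treats the mol file, Postprocessor, TerminationRule, and NonElongating as values that Python only forwards to Java.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CladeAnnotation:
    """Hypothetical in-memory form of one entry of the new annotation file."""

    clade_id: str
    description: str
    mol_file: Optional[str] = None
    postprocessor: Optional[str] = None
    verification_domains: List[str] = field(default_factory=list)
    termination_rule: Optional[str] = None
    non_elongating: bool = False
    verification_domain_is_mandatory: bool = False

    def passthrough_fields(self) -> Dict[str, str]:
        """Return the fields Python does not interpret and only forwards to Java."""
        candidates = {
            "MolFile": self.mol_file,
            "Postprocessor": self.postprocessor,
            "TerminationRule": self.termination_rule,
            "NonElongating": str(self.non_elongating),
        }
        # Optional fields missing from the annotation are simply not forwarded.
        return {key: value for key, value in candidates.items() if value is not None}
```

Keeping the pass-through fields behind one method would make it easy to move a field (for example VerificationDomainIsMandatory) between the Python and Java sides once that decision is settled.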
This also means that these fields need to make it into the new file that Python writes for the Java-CDK part (the features file), or that Python generates a simplified file for Java and Java reads all of these from the annotation file. One option would be to combine, in the feature file, the fields produced by Python from the sequence search with all the elements read from the annotation file, to avoid the risk of the Java part running with an incorrect annotation file. All the fields produced through the sequence search are initially stored in qualifiers inside SeqFeature objects, which go inside the SeqRecords returned by the FeatureMarker classes in Query.core. These are in turn written to the .feature file passed to Java by the SimpleFeatWriter class in SimpleFeatWriter.core. This could be the place to add all the annotation elements if a unified output is to be used.
Alternatively, the annotation file can be passed to Java, alongside the file with the results of the sequence searches and the domain annotation alterations.
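For the unified-output option, the merge step could be as small as the sketch below. It works on plain dictionaries shaped like feature qualifiers (qualifier name mapped to a list of string values) rather than on the real feature objects, and the function name and the choice to fail on a name clash are assumptions, not existing behaviour of SimpleFeatWriter.

```python
from typing import Dict, List, Mapping


def merge_annotation_into_qualifiers(
    qualifiers: Mapping[str, List[str]],
    annotation: Mapping[str, str],
) -> Dict[str, List[str]]:
    """Return a copy of the qualifiers with the annotation fields added.

    Hypothetical helper: search-derived qualifiers are kept untouched,
    and a name clash raises instead of silently overwriting a value.
    """
    merged = {key: list(values) for key, values in qualifiers.items()}
    for key, value in annotation.items():
        if key in merged:
            raise ValueError(f"annotation field {key!r} clashes with a search qualifier")
        # Qualifier values are stored as lists of strings.
        merged[key] = [value]
    return merged
```

Failing on a clash is a deliberate choice in this sketch: it surfaces naming conflicts between the two sources before the Java part ever reads the file.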