Hi Matt,
Sorry for the delay in posting. I have been working on the project a fair bit recently but have also made some mistakes, such as forgetting to save the Workspace, so I had to re-run some steps before I could properly compare the different trimming lengths and alignment settings. In any case, I have now moved on to Cnidaria.
I have been encountering a few errors. I have been running the Annelida code from GitHub (most recently pulled today), but with the Cnidaria reference sequences inserted. I've also been using maxiters = 3 and diags = TRUE for all alignments. The alignments mostly look very good. I will try another run with a different setting to see whether one remaining problem case can be resolved, but overall the alignments look good and close relatives are properly aligned with one another, which is the most important thing here.
However, I am getting errors in some other parts of the code.
The first one is at line 625 of the code. (Note that these line numbers are after my Cnidaria adjustments, such as inserting the Cnidaria reference sequences, so they differ slightly from the Annelida branch on GitHub, but only by a couple of lines.)
dnaStringSet4 <- foreach(i=1:nrow(dfRefSeq)) %do% subset(dnaStringSet4[[i]][-refSeqRemove[[i]]])
Error message:
Error in subset(dnaStringSet4[[i]][-refSeqRemove[[i]]]) :
task 4 failed - "subscript is a logical vector with out-of-bounds TRUE values"
I get the same message for the similar code a couple of lines later.
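One thing I want to rule out (not confirmed yet): if refSeqRemove[[i]] is empty for some class, for example because no reference sequence was matched, then the negative index removes everything rather than nothing. A toy R sketch, with a plain character vector standing in for one class's DNAStringSet (seqs and dropIndices are made-up names for illustration):

```r
# Toy stand-in: a character vector plays the role of one class's DNAStringSet
# (the real objects are not reproduced here; names are hypothetical).
seqs <- c("ACGT", "ACGA", "ACGC")

# A reference sequence found at position 2: the negative index drops it.
toRemove <- 2L
seqs[-toRemove]            # "ACGT" "ACGC"

# Nothing matched: -integer(0) is still integer(0), so the result is empty.
toRemove <- integer(0)
seqs[-toRemove]            # character(0), i.e. everything is dropped

# Guarded version: only apply the negative index when there is something to drop.
dropIndices <- function(x, idx) {
  if (length(idx) == 0) x else x[-idx]
}
dropIndices(seqs, integer(0))  # "ACGT" "ACGA" "ACGC"
dropIndices(seqs, 2L)          # "ACGT" "ACGC"
```

Running lengths(refSeqRemove) should show whether any entry, such as the 4th, has length 0. If so, swapping the subset() call for a guarded version like dropIndices(dnaStringSet4[[i]], refSeqRemove[[i]]) might avoid the error, though I haven't tried it on the real data.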
There is also an error at line 711:
dfPairingResultsL1L2$inGroupPairing <- rep(1:(nrow(dfPairingResultsL1L2)/2), each = 2)
Error message:
Error in `$<-.data.frame`(`*tmp*`, "inGroupPairing", value = c(1L, 1L, :
  replacement has 2320 rows, data has 2321
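The numbers in the message point to an odd row count, since the pairing step assumes every record has exactly one partner. A toy R illustration (made-up five-row data frame; dfToy and pairIds are hypothetical names) of how rep(1:(n / 2), each = 2) comes up one short when n is odd:

```r
# Hypothetical toy data frame with an odd number of rows (5 instead of 2321).
dfToy <- data.frame(record_id = paste0("r", 1:5))

n <- nrow(dfToy)                    # 5
pairIds <- rep(1:(n / 2), each = 2) # 1:2.5 stops at 2, so this is 1 1 2 2
length(pairIds)                     # 4, one short of nrow(dfToy)

# The same arithmetic with the real row count: 1:1160.5 stops at 1160.
length(rep(1:(2321 / 2), each = 2)) # 2320, as in the error message

# Cheap guard before assigning the pairing column.
if (n %% 2 != 0) {
  message("Odd number of rows (", n, "): at least one record has no pair partner.")
}
```

So it looks like one record in dfPairingResultsL1L2 has no partner, possibly a knock-on effect of the reference-sequence errors above; I haven't checked which record yet.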
There is also an error after line 716:
dfPairingResultsL1L2 <- (dfPairingResultsL1L2[,c("inGroupPairing","record_id","bin_uri","values",
"inGroupDistx1.3","medianLatAbs","medianLatMap","latMin","latMax",
"binSize","phylum_taxID","phylum_name","class_taxID",
"class_name","order_taxID","order_name","family_taxID",
"family_name","subfamily_taxID","subfamily_name",
"genus_taxID","genus_name","species_taxID","species_name",
"nucleotides","ind","medianLon")])
Error message:
Error in `[.data.frame`(dfPairingResultsL1L2, , c("inGroupPairing", "record_id", :
  undefined columns selected
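My guess (not yet checked) is that this follows from the line 711 failure: the inGroupPairing column was never created, so selecting it gives "undefined columns selected". A small R check on a made-up toy data frame (dfCols and reorderColumns are hypothetical names) that lists which requested columns are missing:

```r
# Hypothetical toy stand-in for dfPairingResultsL1L2, built without the
# inGroupPairing column (as if the assignment at line 711 had failed).
dfCols <- data.frame(record_id = c("r1", "r2"), bin_uri = c("b1", "b2"))
wanted <- c("inGroupPairing", "record_id", "bin_uri")

# Which requested columns are absent?
missingCols <- setdiff(wanted, names(dfCols))
missingCols                           # "inGroupPairing"

# Reorder only when every column exists; otherwise fail with a clearer message.
reorderColumns <- function(df, cols) {
  absent <- setdiff(cols, names(df))
  if (length(absent) > 0) {
    stop("Missing columns: ", paste(absent, collapse = ", "))
  }
  df[, cols, drop = FALSE]
}

reorderColumns(dfCols, c("bin_uri", "record_id"))  # works: both columns exist
```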
There are further error messages beyond these as well.
I have one idea to try: deleting the reference sequences for classes that didn't end up yielding any data. I will keep you posted on whether that works.
Cheers,
Sally