Currently, when creating the norm-count files from input BAM files, the user can choose to either keep or remove duplicates from the input. For amplicon-based methods we do not want to remove duplicates, but for hybridization-based methods we do.
I think the norm-count files should record how their counts were produced, especially whether the input was filtered for duplicates or not.
This would in turn enable automatic selection of only those controls that were calculated the same way as the sample we want to analyse.
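One way this could look is a single metadata line at the top of each norm-count file. The sketch below is only an illustration: the `#rmdups=true|false` header, the tab-separated row layout, and the function name `write_norm_counts` are all assumptions, not the tool's current format.

```python
import io


def write_norm_counts(handle, counts, rmdups):
    """Write a hypothetical metadata header, then one count row per region.

    The '#rmdups=' header line is an assumed convention for this sketch.
    """
    handle.write(f"#rmdups={'true' if rmdups else 'false'}\n")
    for region, value in counts:
        # Assumed layout: region name, a tab, then the normalised count.
        handle.write(f"{region}\t{value}\n")


# Example: write two made-up regions to an in-memory buffer.
buf = io.StringIO()
write_norm_counts(buf, [("chr1:100-200", 0.98), ("chr1:300-400", 1.02)], rmdups=True)
lines = buf.getvalue().splitlines()
```

Keeping the flag inside the file, rather than in the file name or a separate index, means a control cannot be separated from the record of how it was produced.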
For example, we could have a controls folder with 60 control samples: 30 from amplicon-based inputs and 30 from hybridization-based inputs.
When a user selects a new sample to analyse and includes the -rmdups parameter, the script can read the controls folder and select the best controls only from the compatible subset of 30.
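A minimal sketch of that selection step follows. It assumes each norm-count file starts with a hypothetical `#rmdups=true` or `#rmdups=false` line; that header, the `*.txt` file pattern, and both function names are invented for illustration and are not part of the existing tool.

```python
from pathlib import Path


def read_rmdups_flag(path):
    """Return True/False from an assumed '#rmdups=' first line, or None if absent."""
    with open(path) as handle:
        first = handle.readline().strip()
    if first.startswith("#rmdups="):
        return first.split("=", 1)[1] == "true"
    return None  # Older file without metadata: compatibility unknown.


def compatible_controls(controls_dir, sample_rmdups, pattern="*.txt"):
    """List control files whose duplicate handling matches the sample's."""
    return sorted(
        path
        for path in Path(controls_dir).glob(pattern)
        if read_rmdups_flag(path) == sample_rmdups
    )
```

Files without the header are skipped here rather than guessed at, so controls created before the change would need to be regenerated or labelled by hand. The existing best-control ranking would then run only on the returned list.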