Git Product home page Git Product logo

smallseq's People

Contributors

eyay avatar

Stargazers

 avatar

Watchers

 avatar  avatar  avatar  avatar  avatar

smallseq's Issues

Some questions/problems concerning the program pipeline

Hi eyay,

there are some things with the posted pipeline that I do not understand / do not work for me and it would be great if you could tell me, what I'm eventually doing wrong.

  1. Prerequisite "GenomeFetch.py": Looking briefly into the script it doesn't seem to support hg38 which you were using for the mapping. For what do you need the script?

  2. Remove UMI from reads: What is the difference between your script (remove_umi_from_fastq_v4.py) and the one from the umi_tools package (umi_tools extract)? The parallelization doesn't give me any speed advantage on the data?! Additionally, your output is a fq, why not compressing it which is supported by umi_tools extract?

  3. It seems that you are using also multi mapped reads in later steps of the analysis, don't you? If so, why can you use those and why didn't you change the outFilterMultimapNmax parameter from Star, as everything above 20 multi hits will be flagged as unmapped?!

3-4) Star mapping and clipped read removal: I'm missing a lot of file name conversions here. From the star command, you are creating generic output sam files, but in the clipped read removal step, you require bam files that are split into unique and multi-mappings with very rigid naming convention. Though I could create files that satisfy the name requirements (I split them based on the NH field and compress and sort them afterwards), I'm getting now an error I do not understand:

`
python /usr/local/smallseq/src/remove_softclipped_reads_parallelized_v2.py -i 01_Mapping -o 01_Mapping_clipped -p 6 -n 0 -t 3
Traceback (most recent call last):
File "/usr/local/smallseq/src/remove_softclipped_reads_parallelized_v2.py", line 93, in
Parallel(n_jobs=o.numCPU)(delayed(remove_clipped_reads)(sample) for sample in samplenames_with_fullpath)
File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 768, in call
self.retrieve()
File "/usr/local/lib/python2.7/dist-packages/joblib/parallel.py", line 719, in retrieve
raise exception
joblib.my_exceptions.JoblibSamtoolsError: JoblibSamtoolsError


Multiprocessing exception:
...........................................................................
/usr/local/smallseq/src/remove_softclipped_reads_parallelized_v2.py in ()
88 samplenames_with_fullpath.append(multi_bam)
89
90 path_outbam = os.path.join(o.outstardir, sample)
91 if not os.path.exists(path_outbam): safe_mkdir(path_outbam)
92
---> 93 Parallel(n_jobs=o.numCPU)(delayed(remove_clipped_reads)(sample) for sample in samplenames_with_fullpath)
94
95
96
97

...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in call(self=Parallel(n_jobs=6), iterable=<generator object >)
763 if pre_dispatch == "all" or n_jobs == 1:
764 # The iterable was consumed all at once by the above for loop.
765 # No need to wait for async callbacks to trigger to
766 # consumption.
767 self._iterating = False
--> 768 self.retrieve()
self.retrieve = <bound method Parallel.retrieve of Parallel(n_jobs=6)>
769 # Make sure that we get a last message telling us we are done
770 elapsed_time = time.time() - self._start_time
771 self._print('Done %3i out of %3i | elapsed: %s finished',
772 (len(self._output), len(self._output),


Sub-process traceback:

SamtoolsError Fri Nov 25 10:47:38 2016
PID: 12961 Python 2.7.6: /usr/bin/python
...........................................................................
/usr/local/lib/python2.7/dist-packages/joblib/parallel.py in call(self=<joblib.parallel.BatchedCalls object>)
126 def init(self, iterator_slice):
127 self.items = list(iterator_slice)
128 self._size = len(self.items)
129
130 def call(self):
--> 131 return [func(*args, **kwargs) for func, args, kwargs in self.items]
func =
args = ('01_Mapping/SRR3495785/SRR3495785_unique.bam',)
kwargs = {}
self.items = [(, ('01_Mapping/SRR3495785/SRR3495785_unique.bam',), {})]
132
133 def len(self):
134 return self._size
135

...........................................................................
/usr/local/smallseq/src/remove_softclipped_reads_parallelized_v2.py in remove_clipped_reads(inbam='01_Mapping/SRR3495785/SRR3495785_unique.bam')
56
57 outbam.write(read)
58 reads_after_clip_removal += 1
59 outbam.close()
60 #sort and index the final bam file
---> 61 pysam.sort(bam_out, bam_out_sorted)
bam_out = '01_Mapping_clipped/SRR3495785/SRR3495785_unique_tmp.bam'
bam_out_sorted = '01_Mapping_clipped/SRR3495785/SRR3495785_unique'
62 pysam.index(bam_out_sorted+".bam", template=inbamPysamObj)
63
64 print reads_after_clip_removal, total_reads
65 print "%s percentage of removed reads = %.2f" %(p[-1], 100*(1 -reads_after_clip_removal/total_reads))

...........................................................................
/usr/local/lib/python2.7/dist-packages/pysam/utils.py in call(self=<pysam.utils.PysamDispatcher object>, *args=('01_Mapping_clipped/SRR3495785/SRR3495785_unique_tmp.bam', '01_Mapping_clipped/SRR3495785/SRR3495785_unique'), **kwargs={})
70 "%s returned with error %i: "
71 "stdout=%s, stderr=%s" %
72 (self.collection,
73 retval,
74 stdout,
---> 75 stderr))
stderr = '[bam_sort] Use -T PREFIX / -o FILE to specify te... Reference sequence FASTA FILE [null]\n'
76
77 self.stderr = stderr
78
79 # call parser for stdout:

SamtoolsError: 'samtools returned with error 1: stdout=, stderr=[bam_sort] Use -T PREFIX / -o FILE to specify temporary and final output files\nUsage: samtools sort [options...] [in.bam]\nOptions:\n -l INT Set compression level, from 0 (uncompressed) to 9 (best)\n -m INT Set maximum memory per thread; suffix K/M/G recognized [768M]\n -n Sort by read name\n -o FILE Write final output to FILE rather than standard output\n -T PREFIX Write temporary files to PREFIX.nnnn.bam\n -@, --threads INT\n Set number of sorting and compression threads [1]\n --input-fmt-option OPT[=VAL]\n Specify a single input file format option in the form\n of OPTION or OPTION=VALUE\n -O, --output-fmt FORMAT[,OPT[=VAL]]...\n Specify output format (SAM, BAM, CRAM)\n --output-fmt-option OPT[=VAL]\n Specify a single output file format option in the form\n of OPTION or OPTION=VALUE\n --reference FILE\n Reference sequence FASTA FILE [null]\n'


`

Why is samtools sort called again? And why is it looking for a reference fasta - a header is supplied?

Any help is much appreciated.
Best regards

No annotated mol in final output

Hi,

Thank you for your hard work and clear instruction. I ran through the pipeline and got the final count table. But there is no annotated molecules:

#samples LIB4854636_SAM24358596
#unannotatedmolc 121188
#annotatedmolc 0

It should not be since I have spike-ins.

I checked the bam files and it looks like I have hits actually, but it did not show up in the final table:
Screen Shot 2019-06-18 at 3 45 11 PM

I suspect the problem is in the "Count molecules and merge into single file" step. I attached my GenePred file(modified from mirBase ff3 file).

mmu_2.txt

One thing i noticed is that my reads(around 20nt) are much shorter than yours. Any suggestions?

thanks and best,

Lily

Scripts do not work

Hi there,
I am not proficient with python and I am unable to create an exact environment as you have used.
We have used the method described in to the paper and we are trying to QC our work.
Is there a way to help me out?
Best
Theo

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.