dmnfarrell / smallrnaseq



Small RNA-seq analysis package

License: GNU General Public License v3.0

Python 97.30% R 1.10% CSS 1.60%

bioinformatics genomics mirna python rna-seq sequencing smallrna

smallrnaseq's People


Forkers

zhangyupisa ctoste xander-y bakerwm vallurumk neptune10000 amrr101 srikanthkris yujiandongbio standardgalactic genomicsnx wangshun1121 mandyzhang6 happyguoguoshu sajjadasaf lrguo1204 wangpanqiao argonought

smallrnaseq's Issues

Error saving de_genes_edger.csv

Rscript /home/sateeshp/.local/lib/python2.7/site-packages/smallrnaseq/DEanalysis.R de_counts.csv
Loading required package: limma

Traceback (most recent call last):
  File "/home/sateeshp/.local/bin/smallrnaseq", line 11, in <module>
    sys.exit(main())
  File "/home/sateeshp/.local/lib/python2.7/site-packages/smallrnaseq/app.py", line 484, in main
    diff_expression(options)
  File "/home/sateeshp/.local/lib/python2.7/site-packages/smallrnaseq/app.py", line 363, in diff_expression
    res.to_csv(os.path.join(path,'de_genes_edger.csv'), float_format='%.4g')
  File "/home/sateeshp/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 1745, in to_csv
    formatter.save()
  File "/home/sateeshp/.local/lib/python2.7/site-packages/pandas/io/formats/csvs.py", line 156, in save
    compression=self.compression)
  File "/home/sateeshp/.local/lib/python2.7/site-packages/pandas/io/common.py", line 397, in _get_handle
    f = open(path_or_buf, mode)
IOError: [Errno 2] No such file or directory: 'results2/de_genes_edger.csv'

Could this be an error caused by the float_format save option?

I would be happy to share the entire log if you need it; just let me know. Thank you.
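The traceback ends in `open()` raising `[Errno 2] No such file or directory` for `results2/de_genes_edger.csv`. That error usually means the folder `results2` does not exist, so `float_format` is probably not the culprit. A minimal sketch of a workaround when writing such a table yourself, assuming a pandas DataFrame like `res` in the traceback: create the folder before calling `to_csv`. `save_table` is a hypothetical helper, not part of smallrnaseq; the `isdir`/`makedirs` check is used because `exist_ok` is not available on the Python 2.7 shown in the log.

```python
import os
import tempfile

import pandas as pd


def save_table(df, path, filename, float_format="%.4g"):
    """Hypothetical helper: write df to path/filename, creating path first."""
    # Works on Python 2.7 and 3; os.makedirs(..., exist_ok=True) needs 3.2+
    if not os.path.isdir(path):
        os.makedirs(path)
    outfile = os.path.join(path, filename)
    df.to_csv(outfile, float_format=float_format)
    return outfile


# Usage: 'results2' does not exist yet inside the fresh temporary folder
res = pd.DataFrame({"logFC": [1.23456]}, index=["mir-a"])
outfile = save_table(res, os.path.join(tempfile.mkdtemp(), "results2"),
                     "de_genes_edger.csv")
```

From the shell, running `mkdir results2` before rerunning should have the same effect, assuming `results2` is the output folder named in your config.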

FutureWarning: Series.nonzero() is deprecated

/python3.6/site-packages/smallrnaseq/base.py:207: FutureWarning: Series.nonzero() is deprecated and will be removed in a future version.Use Series.to_numpy().nonzero() instead
  x['mean_norm'] = x[ncols].apply(lambda r: r[r.nonzero()[0]].mean(),1)

For future enhancement.
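The deprecated call is `r.nonzero()` inside the lambda on line 207 of `base.py`. As a sketch, assuming `x` is the counts DataFrame and `ncols` the list of normalized columns as in that line, the same "mean of the non-zero values in each row" can be computed without `nonzero()` by masking zeros to NaN. The small `x` below is made-up illustration data, and this is not necessarily how the package itself was later updated.

```python
import pandas as pd


def mean_of_nonzero(df, cols):
    """Row-wise mean over `cols`, ignoring zero entries (like the original lambda)."""
    sub = df[cols]
    # where() keeps non-zero values and turns zeros into NaN;
    # mean() skips NaN by default, so zeros do not pull the mean down
    return sub.where(sub != 0).mean(axis=1)


# Made-up example: the last row is all zeros, giving NaN as the original did
x = pd.DataFrame({"s1_norm": [10.0, 0.0, 0.0], "s2_norm": [30.0, 4.0, 0.0]})
ncols = ["s1_norm", "s2_norm"]
x["mean_norm"] = mean_of_nonzero(x, ncols)
```

Being vectorized, this also avoids calling a Python lambda once per row, which may help on large count tables.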

Which label corresponds to which sample?

Hi,
Thanks for a wonderful tool for sncRNA profiling. I have run the analysis successfully, but I was wondering how I can tell which label corresponds to which sample.

I have 12 samples, and in the end I only got the labels S01 to S12. Looking forward to your response, thanks in advance.

Cannot set sequence. Illegal character in string

Dear Damien,
I am running smallrnaseq and I got this message in the novel module of the pipeline.

predicting novel mirnas..
pooled 6 files into 2162077 unique reads
getting default classifier
finding read clusters
128485 read clusters in 1161901 reads
3031 clusters above reads cutoff
Cannot set sequence. Illegal character in string 'CUAAGAACACUACCCCACUUAAAAUGAUUUUAACUCCUUCACAUCCAAAACGGCAUUUCAUGGUGGCU5'
Cannot set sequence. Illegal character in string 'CUAAGAACACUACCCCACUUAAAAUGAUUUUAACUCCUUCACAUCCAAAACGGCAUUUCAUGGUGGCU559_'
Cannot set sequence. Illegal character in string 'CUAAGAACACUACCCCACUUAAAAUGAUUUUAACUCCUUCACAUCCAAAACGGCAUUUCAUGGUGGCU559_eno'
Cannot set sequence. Illegal character in string 'CUAAGAACACUACCCCACUUAAAAUGAUUUUAACUCCUUCACAUCCAAAACGGCAUUUCAUGGUGGCU559_enobkg'
Cannot set sequence. Illegal character in string 'CUAAGAACACUACCCCACUUAAAAUGAUUUUAACUCCUUCACAUCCAAAACGGCAUUUCAUGGUGGCU559_enobkgtB>'
Cannot set sequence. Illegal character in string 'CUAAGAACACUACCCCACUUAAAAUGAUUUUAACUCCUUCACAUCCAAAACGGCAUUUCAUGGUGGCU559_enobkgtB>AG'
Cannot set sequence. Illegal character in string 'CUAAGAACACUACCCCACUUAAAAUGAUUUUAACUCCUUCACAUCCAAAACGGCAUUUCAUGGUGGCU559_enobkgtB>AGUGG'
Cannot set sequence. Illegal character in string 'CUAAGAACACUACCCCACUUAAAAUGAUUUUAACUCCUUCACAUCCAAAACGGCAUUUCAUGGUGGCU559_enobkgtB>AGUGGCGU'
Cannot set sequence. Illegal character in string 'CUAAGAACACUACCCCACUUAAAAUGAUUUUAACUCCUUCACAUCCAAAACGGCAUUUCAUGGUGGCU559_enobkgtB>AGUGGCGUGCA'
Cannot set sequence. Illegal character in string 'CUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'UUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'UUUUUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'ACUUUUUUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'AGCACUUUUUUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'UUCAGCACUUUUUUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'UUUUUCAGCACUUUUUUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'UUCUUUUUCAGCACUUUUUUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'CCAUUCUUUUUCAGCACUUUUUUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'AUUCCAUUCUUUUUCAGCACUUUUUUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'AUGAUUCCAUUCUUUUUCAGCACUUUUUUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
Cannot set sequence. Illegal character in string 'CGAAUGAUUCCAUUCUUUUUCAGCACUUUUUUGCUCUG>Backbone_7077CCAAGCUGAAGUAUUGGCGCACUCACGGUG'
took 3173.892 seconds
no precursors found above cutoff
Could not find any novel mirnas.
There may not be sufficient aligned reads or the score cutoff is too high.

It seems that the name of the contig (Backbone_XXX) is attached to the sequence.
I am not sure, but maybe it is a formatting problem or something similar.
Any idea what I can do to solve this issue?
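The strings in the log look like a contig header (`>Backbone_7077`) fused into a sequence. One possible cause, and this is only a guess from the log, is a malformed reference FASTA (the `ref_fasta` file in the config), for example a sequence line and the next header not separated by a newline. The sketch scans a FASTA for sequence lines containing characters other than nucleotides; `find_bad_fasta_lines` is a hypothetical helper, not part of smallrnaseq, and the set of accepted characters is an assumption.

```python
import os
import re
import tempfile

# Accepted characters: A, C, G, T, U and N in either case. This is an
# assumption; genomes containing other IUPAC codes (R, Y, ...) are flagged too.
VALID_SEQ = re.compile(r"^[ACGTUNacgtun]*$")


def find_bad_fasta_lines(path):
    """Return (line_number, line) for every sequence line with illegal characters.

    A '>' in the middle of a line typically means a header was glued onto the
    preceding sequence, e.g. because a newline is missing.
    """
    bad = []
    with open(path) as handle:
        for num, line in enumerate(handle, start=1):
            line = line.rstrip("\r\n")  # also tolerate Windows line endings
            if not line or line.startswith(">"):
                continue  # blank line or a proper header line
            if not VALID_SEQ.match(line):
                bad.append((num, line))
    return bad


# Usage on a tiny made-up FASTA whose third line reproduces the pattern in the log
demo = os.path.join(tempfile.mkdtemp(), "demo.fa")
with open(demo, "w") as fh:
    fh.write(">Backbone_7076\nACGUACGU\nCUCUG>Backbone_7077CCAAGCUG\n")
problems = find_bad_fasta_lines(demo)
print(problems)  # [(3, 'CUCUG>Backbone_7077CCAAGCUG')]
```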

Docker?

Should there be a Docker image of this tool?

dict object has no attribute 'has_key'

Hi Damien,

I really appreciate your small RNA-seq workflow. I am trying to test it on a small RNA-seq dataset that is paired with a bulk RNA-seq dataset.

I am running into an error when I initialize the program as described in your YouTube video and tutorial webpage:

(RNAseqSTAR) patrickboada@patrickboada smallrna % smallrnaseq -c default.conf
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/3.8/bin/smallrnaseq", line 8, in <module>
    sys.exit(main())
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/smallrnaseq/app.py", line 560, in main
    config.write_default_config(conffile, defaults=config.baseoptions)
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/smallrnaseq/config.py", line 60, in write_default_config
    cp = create_config_parser_from_dict(defaults, ['base','novel','aligner','de'])
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/smallrnaseq/config.py", line 71, in create_config_parser_from_dict
    if not data.has_key(s):
AttributeError: 'dict' object has no attribute 'has_key'

Have you run into this error before?

Best,
Patrick
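`dict.has_key()` exists only in Python 2; it was removed in Python 3, and the traceback shows Python 3.8. The portable membership test is the `in` operator, which works in both versions. Below is a minimal sketch of a Python 3 compatible helper; it reuses the function name from the traceback for readability, but its body, and the assumption that `defaults` maps section names to dicts of options, are guesses rather than the package's actual code.

```python
import configparser


def create_config_parser_from_dict(data, sections):
    """Sketch: build a ConfigParser from {section: {option: value}}.

    The loop body is an assumption about what the original helper does;
    the relevant change is `s not in data` instead of `not data.has_key(s)`.
    """
    cp = configparser.ConfigParser()
    for s in sections:
        cp.add_section(s)
        if s not in data:  # dict.has_key() no longer exists in Python 3
            continue
        for name, val in data[s].items():
            cp.set(s, name, str(val))  # ConfigParser stores strings only
    return cp


# Made-up defaults; 'novel' and 'aligner' are absent and stay empty
defaults = {"base": {"output": "results", "cpus": 1}, "de": {"sep": ","}}
cp = create_config_parser_from_dict(defaults, ["base", "novel", "aligner", "de"])
```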

matplotlib tight_layout not applied

lib/python3.6/site-packages/matplotlib/tight_layout.py:181: UserWarning: Tight layout not applied. The bottom and top margins cannot be made large enough to accommodate all axes decorations. 
  warnings.warn('Tight layout not applied. '

This seems to be why the figure legend overlaps the plot.

Any suggestions on how to fix this?
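The warning means `tight_layout()` could not make room for all axes decorations, so the layout was left unchanged. It is not clear from the log which smallrnaseq plot triggers it. For plots you draw yourself from the output tables, one common matplotlib workaround is to place the legend outside the axes and let `savefig(..., bbox_inches="tight")` enlarge the saved image to include it; the data, labels and file name below are made up.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch also runs on servers
import matplotlib.pyplot as plt

# Made-up example data: counts of two hypothetical miRNAs in three samples
data = {"mir-a": [3, 5, 2], "mir-b": [1, 4, 6]}

fig, ax = plt.subplots(figsize=(6, 4))
for label, values in data.items():
    ax.plot(values, marker="o", label=label)
ax.set_xlabel("sample")
ax.set_ylabel("count")
# Anchor the legend's upper-left corner just outside the right edge of the axes
ax.legend(loc="upper left", bbox_to_anchor=(1.02, 1.0), borderaxespad=0)
# bbox_inches="tight" grows the saved image to include the outside legend,
# so tight_layout() is not needed here
outfile = os.path.join(tempfile.mkdtemp(), "legend_outside.png")
fig.savefig(outfile, bbox_inches="tight")
plt.close(fig)
```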

libblastinput.so issue

Hi, I have been running smallrnaseq with the default.conf options selected to run novel miRNA detection from the reference genome.

I keep getting this error - any ideas?

makeblastdb: error while loading shared libraries: libblastinput.so: cannot open shared object file: No such file or directory
Traceback (most recent call last):
  File "/snap/smallrnaseq/3/bin/smallrnaseq", line 11, in <module>
    load_entry_point('smallrnaseq==0.5.0', 'console_scripts', 'smallrnaseq')()
  File "/snap/smallrnaseq/3/lib/python3.5/site-packages/smallrnaseq/app.py", line 504, in main
    W.run()
  File "/snap/smallrnaseq/3/lib/python3.5/site-packages/smallrnaseq/app.py", line 101, in run
    self.map_mirnas()
  File "/snap/smallrnaseq/3/lib/python3.5/site-packages/smallrnaseq/app.py", line 217, in map_mirnas
    cpus=self.cpus)
  File "/snap/smallrnaseq/3/lib/python3.5/site-packages/smallrnaseq/novel.py", line 727, in find_mirnas
    new = find_from_known(new, species)
  File "/snap/smallrnaseq/3/lib/python3.5/site-packages/smallrnaseq/novel.py", line 760, in find_from_known
    utils.make_blastdb('temp.fa', title='mirbase-temp')
  File "/snap/smallrnaseq/3/lib/python3.5/site-packages/smallrnaseq/utils.py", line 142, in make_blastdb
    result = subprocess.check_output(cmd, shell=True, executable='/bin/bash')
  File "/snap/smallrnaseq/3/usr/lib/python3.5/subprocess.py", line 626, in check_output
    **kwargs).stdout
  File "/snap/smallrnaseq/3/usr/lib/python3.5/subprocess.py", line 708, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command 'makeblastdb -in temp.fa -dbtype nucl -out mirbase-temp' returned non-zero exit status 127

thanks
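Exit status 127 together with `error while loading shared libraries: libblastinput.so` suggests that the `makeblastdb` executable is found but one of its BLAST+ libraries is not, which may point to a packaging problem in the snap (a guess based on the `/snap/...` paths in the log). Installing BLAST+ separately (for example the `ncbi-blast+` package on Debian/Ubuntu) might help. A sketch for checking a tool outside the pipeline before a long run; `check_tool` is a hypothetical helper, and `-version` is the BLAST+ flag that prints the program version.

```python
import shutil
import subprocess


def check_tool(name, version_flag="-version"):
    """Return (ok, message) after trying to run `name version_flag`.

    A missing shared library usually makes the program exit with status 127
    before doing anything, as in the makeblastdb error above.
    """
    exe = shutil.which(name)
    if exe is None:
        return False, name + " not found on PATH"
    try:
        proc = subprocess.run([exe, version_flag], stdout=subprocess.PIPE,
                              stderr=subprocess.PIPE, universal_newlines=True)
    except OSError as err:
        return False, str(err)
    if proc.returncode != 0:
        return False, proc.stderr.strip() or "exit status %d" % proc.returncode
    return True, proc.stdout.strip()


ok, msg = check_tool("makeblastdb")
print(ok, msg)
```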

TypeError: plot_results() missing 1 required positional argument: 'path'

Dear Damien,
I am running smallrnaseq and got an error message in app.py. Could you help me?
smallrnaseq -c default.conf -r
Traceback (most recent call last):
  File "/home/zhangjun/miniconda3/bin/smallrnaseq", line 8, in <module>
    sys.exit(main())
  File "/home/zhangjun/miniconda3/lib/python3.7/site-packages/smallrnaseq/app.py", line 543, in main
    W.run()
  File "/home/zhangjun/miniconda3/lib/python3.7/site-packages/smallrnaseq/app.py", line 124, in run
    self.map_libraries()
  File "/home/zhangjun/miniconda3/lib/python3.7/site-packages/smallrnaseq/app.py", line 179, in map_libraries
    plot_results(res, out)
TypeError: plot_results() missing 1 required positional argument: 'path'
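The message means the `plot_results` function that app.py calls takes a third required argument, `path`, while line 179 passes only two. That looks like a mismatch inside the installed release (the definition changed but this call did not), so checking for a different smallrnaseq release, or passing the missing argument in a local copy of app.py, are possible workarounds. The sketch reproduces the error with a hypothetical stand-in; only the name `path` comes from the traceback, the other parameter names and values are assumptions.

```python
# Hypothetical stand-in: the real plot_results lives in smallrnaseq and its
# other parameter names are unknown; 'res' and 'out' mirror the call site.
def plot_results(res, out, path):
    return res, out, path


try:
    plot_results({"counts": []}, "results")  # two arguments, as in app.py line 179
except TypeError as err:
    message = str(err)

print(message)  # plot_results() missing 1 required positional argument: 'path'

# Supplying a third argument (a made-up folder name) satisfies the signature
result = plot_results({"counts": []}, "results", "results/plots")
```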

smallrnaseq output file data

I am trying this fabulous package for the identification and quantification of small ncRNAs. In the output file, each sample has two count columns: one is s0 and the other is s0.norm. I have two questions. First, which column should I use for downstream DESeq2 analysis? I assume it should be s0 (the raw counts); is that correct? Second, how is the s0.norm column calculated from the s0 column? Thanks in advance, and stay well.
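On the first question, DESeq2 expects un-normalized counts and estimates its own size factors, so the raw column (`s0`) is the appropriate input; the counts are stored as floats (e.g. `207622.0`), so round them to integers first. On the second, the `rna_counts.csv` excerpt quoted in the KeyError issue on this page lets you check the relationship empirically. The sketch below uses only those three rows: it shows that every `_norm` column equals its raw column times one constant per sample, i.e. a per-library scaling, and that `mean_norm` and `total_reads` are a plain row mean and row sum. How smallrnaseq derives each scale factor is not visible in these numbers and is not assumed here.

```python
import pandas as pd

# Three rows copied from the rna_counts.csv excerpt in the KeyError issue
counts = pd.DataFrame({
    "Index31s_CP_C": [207622.0, 150770.0, 236086.0],
    "Index32s_CP_C": [190965.0, 413600.0, 184428.0],
    "Index34s_CP_V": [239008.0, 87051.0, 216096.0],
    "Index35s_CP_V": [318060.0, 222064.0, 110362.0],
    "Index31s_CP_C_norm": [189478.31, 137594.5, 215454.9],
    "Index32s_CP_C_norm": [130573.11, 282800.71, 126103.41],
    "Index34s_CP_V_norm": [198559.29, 72318.85, 179524.82],
    "Index35s_CP_V_norm": [237604.68, 165891.48, 82445.22],
}, index=["URS000059E1FE_10116", "URS00003CC5C5_10116", "URS0000444A5C_10116"])

samples = [c for c in counts.columns if not c.endswith("_norm")]

# norm / raw for every row: one constant per sample means a single scale factor
# (roughly 0.9126, 0.6838, 0.8308 and 0.7470 for the four samples here)
ratios = pd.DataFrame({s: counts[s + "_norm"] / counts[s] for s in samples})
print(ratios.round(6))

# mean_norm and total_reads in the excerpt match a plain row mean / row sum
mean_norm = counts[[s + "_norm" for s in samples]].mean(axis=1)
total_reads = counts[samples].sum(axis=1)
```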

Installation error

I found that smallrnaseq depends on scikit-learn==0.19.1, while the latest version is 0.22.
I can install version 0.22 without errors, but unfortunately 0.19.1 does not seem to be compatible with my computer.
Here is part of the error report:

ERROR: Command errored out with exit status 1:
     command: /home/huangwb8/Downloads/python/bin/python3.7 -u -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-bxl8n81d/scikit-learn/setup.py'"'"'; __file__='"'"'/tmp/pip-install-bxl8n81d/scikit-learn/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' install --record /tmp/pip-record-jeq635gu/install-record.txt --single-version-externally-managed --compile
         cwd: /tmp/pip-install-bxl8n81d/scikit-learn/

Do you have any suggestions?

Error in collapsing the file

No matter which format (FASTQ or FASTA) I use for the input file listed in the config file, I always get the same error message about a missing file that should be created by your script, even when the collapsing step should not be necessary at all.

collapsing reads **.fa

Traceback (most recent call last):
  File "/home/.local/bin/smallrnaseq", line 11, in <module>
    load_entry_point('smallrnaseq==0.5.0', 'console_scripts', 'smallrnaseq')()
  File "/home/.local/lib/python2.7/site-packages/smallrnaseq/app.py", line 504, in main
    W.run()
  File "/home/.local/lib/python2.7/site-packages/smallrnaseq/app.py", line 105, in run
    self.map_genomic_features()
  File "/home/.local/lib/python2.7/site-packages/smallrnaseq/app.py", line 264, in map_genomic_features
    aligner_params=params)
  File "/home/.local/lib/python2.7/site-packages/smallrnaseq/base.py", line 411, in map_genome_features
    cfiles = collapse_files(files, outpath)
  File "/home/.local/lib/python2.7/site-packages/smallrnaseq/base.py", line 529, in collapse_files
    res = collapse_reads(f, outfile=collapsedfile, kwargs)
  File "/home/.local/lib/python2.7/site-packages/smallrnaseq/base.py", line 514, in collapse_reads
    utils.dataframe_to_fasta(df, idkey='read_id', outfile=outfile)
  File "/home/.local/lib/python2.7/site-packages/smallrnaseq/utils.py", line 198, in dataframe_to_fasta
    fastafile = open(outfile, "w")
IOError: [Errno 2] No such file or directory: 'results/temp/**.fa

The results directory exists, but the temp directory did not exist after the error occurred.
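The traceback ends in `open()` on a file under `results/temp`, which fails because the `temp` subfolder was never created, so the input format is probably not the issue. Creating `results/temp` by hand before the run may work around it. The sketch illustrates what collapsing does (counting identical reads and writing each unique sequence once) together with the missing directory step; `collapse_fastq` and its read-id format are hypothetical, not smallrnaseq's own functions.

```python
import collections
import os
import tempfile


def collapse_fastq(infile, outfile):
    """Hypothetical sketch: count identical reads in a FASTQ file and write each
    unique sequence once to a FASTA file, creating the output folder first.
    Assumes plain 4-line FASTQ records (no wrapped sequence lines).
    """
    counts = collections.Counter()
    with open(infile) as fh:
        for i, line in enumerate(fh):
            if i % 4 == 1:  # the sequence is the 2nd line of every record
                counts[line.strip()] += 1
    outdir = os.path.dirname(outfile)
    if outdir and not os.path.isdir(outdir):
        os.makedirs(outdir)  # the equivalent of the missing results/temp folder
    with open(outfile, "w") as out:
        # Most abundant first; the 'readN_xCOUNT' id format is made up
        for n, (seq, count) in enumerate(counts.most_common(), start=1):
            out.write(">read%d_x%d\n%s\n" % (n, count, seq))
    return counts


# Usage: 'results/temp' does not exist yet inside the fresh temporary folder
tmp = tempfile.mkdtemp()
fastq = os.path.join(tmp, "sample.fastq")
with open(fastq, "w") as fh:
    fh.write("@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nIIII\n@r3\nGGCC\n+\nIIII\n")
collapsed = os.path.join(tmp, "results", "temp", "sample.fa")
read_counts = collapse_fastq(fastq, collapsed)
```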

KeyError: "['sample_col' 'factor_col'] not in index"

I am trying to run the smallrnaseq pipeline for DE. Below are my files:

head rna_counts.csv

name,ref,Index31s_CP_C,Index32s_CP_C,Index34s_CP_V,Index35s_CP_V,Index31s_CP_C_norm,Index32s_CP_C_norm,Index34s_CP_V_norm,Index35s_CP_V_norm,total_reads,mean_norm
URS000059E1FE_10116,rattus_piRNA,207622.0,190965.0,239008.0,318060.0,189478.31,130573.11,198559.29,237604.68,955655.0,189053.84749999997
URS00003CC5C5_10116,rattus_piRNA,150770.0,413600.0,87051.0,222064.0,137594.5,282800.71,72318.85,165891.48,873485.0,164651.385
URS0000444A5C_10116,rattus_piRNA,236086.0,184428.0,216096.0,110362.0,215454.9,126103.41,179524.82,82445.22,746972.0,150882.0875

cat metadata.txt:

sample_s	group_s	replicate
Index31s_CP_C	control	s1
Index32s_CP_C	control	s2
Index34s_CP_V	vinclo	s1
Index35s_CP_V	vinclo	s2

config file:

[base]
filenames = Index31s_CP_C.fastq,Index32s_CP_C.fastq,Index34s_CP_V.fastq,Index35s_CP_V.fastq
path =
overwrite = 0
adapter =  AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
index_path = indexes
libraries = rattus_miRNA,rattus_piRNA
ref_fasta = genome.fa
features = rn6_genes.gtf
output = results
add_labels = 0
aligner = bowtie
mirna = 0
species = rno
pad5 = 3
pad3 = 5
verbose = 1
cpus = 8

[aligner]
default_params = -v 1 --best
mirna_params = -v 1 -a --best --strata --norc

[de]
sample_labels = metadata.txt
sep = tab #tab delimiter
count_file = rna_counts.csv
sample_col = sample_s
factors_col = group_s
conditions = control,vinclo
logfc_cutoff = 1.5

Here is the error I keep getting:

running differential expression
/home/sateeshp/.local/lib/python2.7/site-packages/smallrnaseq/app.py:334: ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.
  labels = pd.read_csv(labelsfile, sep=sep)
using these labels:
Traceback (most recent call last):
  File "/home/sateeshp/.local/bin/smallrnaseq", line 11, in <module>
    sys.exit(main())
  File "/home/sateeshp/.local/lib/python2.7/site-packages/smallrnaseq/app.py", line 484, in main
    diff_expression(options)
  File "/home/sateeshp/.local/lib/python2.7/site-packages/smallrnaseq/app.py", line 343, in diff_expression
    print (labels[[samplecol, factorcol]].sort_values(factorcol))
  File "/home/sateeshp/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "/home/sateeshp/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2726, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/home/sateeshp/.local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1327, in _convert_to_indexer
    .format(mask=objarr[mask]))
KeyError: "['sample_s' 'group_s'] not in index"
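The `ParserWarning` just before the traceback is a hint: pandas received a separator longer than one character and treated it as a regex. A plausible explanation, hedged because it depends on how smallrnaseq reads the config, is the inline comment in `sep = tab #tab delimiter`. Python's `configparser` does not strip inline `#` comments by default (and Python 2's `ConfigParser` does not strip them either), so the value becomes `tab #tab delimiter`, `metadata.txt` is read as a single column, and selecting `sample_s`/`group_s` raises the KeyError. The sketch reproduces this; `resolve_sep`, which maps the word `tab` to a tab character, is an assumption about smallrnaseq. The practical fix would be to delete the comment so the line reads `sep = tab`.

```python
import configparser
import io

import pandas as pd

CONF = "[de]\nsep = tab #tab delimiter\nsample_col = sample_s\nfactors_col = group_s\n"
METADATA = ("sample_s\tgroup_s\treplicate\n"
            "Index31s_CP_C\tcontrol\ts1\n"
            "Index32s_CP_C\tcontrol\ts2\n"
            "Index34s_CP_V\tvinclo\ts1\n"
            "Index35s_CP_V\tvinclo\ts2\n")


def resolve_sep(value):
    """Hypothetical: map the config word 'tab' to a real tab character."""
    return "\t" if value == "tab" else value


# Default parser: the inline '#' comment stays part of the value
default_cp = configparser.ConfigParser()
default_cp.read_string(CONF)
raw_sep = default_cp.get("de", "sep")  # 'tab #tab delimiter'
# Multi-character separator -> regex that never matches -> a single column,
# so bad[["sample_s", "group_s"]] raises KeyError
bad = pd.read_csv(io.StringIO(METADATA), sep=resolve_sep(raw_sep), engine="python")

# Stripping inline comments (or removing the comment from the file) restores 'tab'
clean_cp = configparser.ConfigParser(inline_comment_prefixes=("#", ";"))
clean_cp.read_string(CONF)
good = pd.read_csv(io.StringIO(METADATA), sep=resolve_sep(clean_cp.get("de", "sep")))
print(good[["sample_s", "group_s"]].sort_values("group_s"))
```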