lpantano / seqcluster


small RNA analysis from NGS data

Home Page: http://seqcluster.readthedocs.io

License: MIT License

Mathematica 24.80% CSS 2.37% HTML 57.23% JavaScript 0.26% Python 14.81% Shell 0.37% R 0.11% Dockerfile 0.06%

smallrna mirna trna report sequencing

seqcluster's Introduction

seqcluster

Ask questions in the repo's Gitter chat.

small RNA analysis from NGS data

Cite

Specific small-RNA signatures in the amygdala at premotor and motor stages of Parkinson's disease revealed by deep sequencing analysis. Pantano L, Friedländer MR, Escaramís G, Lizano E, Pallarès-Albanell J, Ferrer I, Estivill X, Martí E. Bioinformatics. 2015 Nov 2. pii: btv632. [Epub ahead of print] PMID: 26530722

A non-biased framework for the annotation and classification of the non-miRNA small RNA transcriptome. Pantano L, Estivill X, Martí E. Bioinformatics. 2011 Nov 15;27(22):3202-3. doi: 10.1093/bioinformatics/btr527. Epub 2011 Oct 5. PMID: 21976421

Quick start links

See installation at http://seqcluster.readthedocs.org/installation.html

Moreover, bcbio-nextgen provides a Python framework to run a whole pipeline for small RNA (miRNA + tRNA + piRNA + others).

An example of how to run with bcbio is here: http://seqcluster.readthedocs.org/example_pipeline.html#mirqc-data

In case you want to use seqcluster alone, a complete tutorial is here: http://seqcluster.readthedocs.org/getting_started.html#clustering-of-small-rna-sequences

Report

Seqcluster creates an HTML report: a table of all detected clusters, where you can open each cluster to get a complete description with expression profile figures, annotation details and sequence counts for each sample. An interactive report is also available to explore the expression profile, position on the genome and secondary structure. Learn more at http://seqcluster.readthedocs.org/more_outputs.html#report.

Contributors

Lorena Pantano (Bioinformatic Core, Harvard Chan School, Boston, USA)
Judith Flo Gaya (School of Engineering and Applied Sciences, Harvard University, Boston, USA)
Eulalia Marti Puig (Genomics and Disease, Center of Genomic Regulation, Barcelona, Spain)
Francisco Pantano Rubino: Architect
Steffen Möller (University of Rostock)

seqcluster's People


gitter-badger naumenko-sa b1234561 sinanugur ctoste dfajar2 senaj smoe bakerwm chapmanb gnetsanet lindanimoyo kkarolis druvus genesislearn standardgalactic

seqcluster's Issues

Regarding fastq quality after collapse

Hi lpantano,

I have a question about the quality scores in the fastq file produced by collapse.

Methods

The new quality values are the average of each of the sequence collapse.

I think that, for mapping, preserving the highest average quality score from the fastq file would be worth more than the average.
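
To make the two options in this request concrete, here is a minimal sketch that collapses identical reads either by averaging the quality per base or by keeping the quality string with the highest mean. It is not seqcluster's implementation: the function names, the Phred+33 encoding and the per-base averaging are all assumptions made for illustration.

```python
from collections import defaultdict


def mean_phred(qual, offset=33):
    """Average Phred score of one quality string (Phred+33 is an assumption)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def collapse_average(reads, offset=33):
    """Collapse identical sequences; each base gets the rounded mean quality."""
    groups = defaultdict(list)
    for seq, qual in reads:
        groups[seq].append(qual)
    collapsed = {}
    for seq, quals in groups.items():
        averaged = "".join(
            chr(round(sum(ord(c) - offset for c in column) / len(quals)) + offset)
            for column in zip(*quals)  # one column of characters per base position
        )
        collapsed[seq] = (len(quals), averaged)
    return collapsed


def collapse_best(reads, offset=33):
    """Collapse identical sequences; keep the quality string with the highest mean."""
    collapsed = {}
    for seq, qual in reads:
        count, best = collapsed.get(seq, (0, None))
        if best is None or mean_phred(qual, offset) > mean_phred(best, offset):
            best = qual
        collapsed[seq] = (count + 1, best)
    return collapsed


# Hypothetical reads: two copies of ACGT with Phred 40 ("I") and Phred 20 ("5").
reads = [("ACGT", "IIII"), ("ACGT", "5555"), ("TTTT", "####")]
print(collapse_average(reads))
print(collapse_best(reads))
```

With averaging, a sequence seen once at high quality and once at low quality ends up in the middle, whereas the second strategy keeps the best observation, which is the behaviour asked for above.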

Please clean-up or git-ignore data/examples/miraligner/sim_isomir_unique.fa

Hello,
This file is apparently auto-created during tests and does not go away when cleaning the source tree.
Cheers,
Steffen

python setup.py test - init.py argument missing

Hello,
I have checked out your devel branch. Invoking tests, with python 2.7 this shows

$ python setup.py test
running test
running egg_info
writing seqcluster.egg-info/PKG-INFO
writing top-level names to seqcluster.egg-info/top_level.txt
writing dependency_links to seqcluster.egg-info/dependency_links.txt
writing entry points to seqcluster.egg-info/entry_points.txt
reading manifest template 'MANIFEST.in'
writing manifest file 'seqcluster.egg-info/SOURCES.txt'
running build_ext
Traceback (most recent call last):
  File "setup.py", line 34, in <module>
    zip_safe=False)
  File "/usr/lib/python2.7/distutils/core.py", line 151, in setup
    dist.run_commands()
  File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
    cmd_obj.run()
  File "/home/moeller/.local/lib/python2.7/site-packages/setuptools/command/test.py", line 211, in run
    self.run_tests()
  File "/home/moeller/.local/lib/python2.7/site-packages/setuptools/command/test.py", line 234, in run_tests
    **exit_kwarg
  File "/usr/lib/python2.7/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/usr/lib/python2.7/unittest/main.py", line 149, in parseArgs
    self.createTests()
  File "/usr/lib/python2.7/unittest/main.py", line 158, in createTests
    self.module)
  File "/usr/lib/python2.7/unittest/loader.py", line 130, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/usr/lib/python2.7/unittest/loader.py", line 103, in loadTestsFromName
    return self.loadTestsFromModule(obj)
  File "/home/moeller/.local/lib/python2.7/site-packages/setuptools/command/test.py", line 43, in loadTestsFromModule
    tests.append(self.loadTestsFromName(submodule))
  File "/usr/lib/python2.7/unittest/loader.py", line 103, in loadTestsFromName
    return self.loadTestsFromModule(obj)
  File "/home/moeller/.local/lib/python2.7/site-packages/setuptools/command/test.py", line 29, in loadTestsFromModule
    tests.append(TestLoader.loadTestsFromModule(self, module))
  File "/usr/lib/python2.7/unittest/loader.py", line 65, in loadTestsFromModule
    tests.append(self.loadTestsFromTestCase(obj))
  File "/usr/lib/python2.7/unittest/loader.py", line 56, in loadTestsFromTestCase
    loaded_suite = self.suiteClass(map(testCaseClass, testCaseNames))
TypeError: __init__() takes at least 3 arguments (2 given)

while python 3.6.6 gives me

$ python3 setup.py test
running test
running egg_info
writing seqcluster.egg-info/PKG-INFO
writing dependency_links to seqcluster.egg-info/dependency_links.txt
writing entry points to seqcluster.egg-info/entry_points.txt
writing top-level names to seqcluster.egg-info/top_level.txt
reading manifest template 'MANIFEST.in'
writing manifest file 'seqcluster.egg-info/SOURCES.txt'
running build_ext
Traceback (most recent call last):
  File "setup.py", line 34, in <module>
    zip_safe=False)
  File "/usr/lib/python3/dist-packages/setuptools/__init__.py", line 129, in setup
    return distutils.core.setup(**attrs)
  File "/usr/lib/python3.6/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/usr/lib/python3.6/distutils/dist.py", line 955, in run_commands
    self.run_command(cmd)
  File "/usr/lib/python3.6/distutils/dist.py", line 974, in run_command
    cmd_obj.run()
  File "/usr/lib/python3/dist-packages/setuptools/command/test.py", line 226, in run
    self.run_tests()
  File "/usr/lib/python3/dist-packages/setuptools/command/test.py", line 248, in run_tests
    exit=False,
  File "/usr/lib/python3.6/unittest/main.py", line 94, in __init__
    self.parseArgs(argv)
  File "/usr/lib/python3.6/unittest/main.py", line 141, in parseArgs
    self.createTests()
  File "/usr/lib/python3.6/unittest/main.py", line 148, in createTests
    self.module)
  File "/usr/lib/python3.6/unittest/loader.py", line 219, in loadTestsFromNames
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/usr/lib/python3.6/unittest/loader.py", line 219, in <listcomp>
    suites = [self.loadTestsFromName(name, module) for name in names]
  File "/usr/lib/python3.6/unittest/loader.py", line 190, in loadTestsFromName
    return self.loadTestsFromModule(obj)
  File "/usr/lib/python3/dist-packages/setuptools/command/test.py", line 52, in loadTestsFromModule
    tests.append(self.loadTestsFromName(submodule))
  File "/usr/lib/python3.6/unittest/loader.py", line 190, in loadTestsFromName
    return self.loadTestsFromModule(obj)
  File "/usr/lib/python3/dist-packages/setuptools/command/test.py", line 38, in loadTestsFromModule
    tests.append(TestLoader.loadTestsFromModule(self, module))
  File "/usr/lib/python3.6/unittest/loader.py", line 123, in loadTestsFromModule
    tests.append(self.loadTestsFromTestCase(obj))
  File "/usr/lib/python3.6/unittest/loader.py", line 92, in loadTestsFromTestCase
    loaded_suite = self.suiteClass(map(testCaseClass, testCaseNames))
  File "/usr/lib/python3.6/unittest/suite.py", line 24, in __init__
    self.addTests(tests)
  File "/usr/lib/python3.6/unittest/suite.py", line 57, in addTests
    for test in tests:
TypeError: __init__() missing 1 required positional argument: 'exc_val'

Can you possibly reproduce that?

Cheers,
Steffen

Questions about using the seqcluster json output for some pipeline analysis

Hi there! I am currently using seqcluster to do some pipeline analysis on small RNAs, which requires me to extract some information that is embedded in the json output for seqcluster.

These are the two questions I have regarding the json file:

1. I would like to confirm that the order of the entries in the json file is as follows:
metacluster -> the loci information -> cluster ID -> chromosome -> start -> end -> strand -> number of sequences within the cluster?
2. What factors determine which small RNA sequences are reported in the json file, i.e. the ones with the most clusters?

Thank you. It would be great if you can help clarify these issues.
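
For anyone extracting the same fields, a small sketch of how such locus entries could be unpacked once the order is confirmed. The field order below is only the one proposed in this question (cluster ID, chromosome, start, end, strand, number of sequences); it is an unconfirmed assumption, and the `Locus` and `parse_loci` names are hypothetical. The sample values are taken from the "locus bigger > 500 nt" log lines of the `seqcluster report` issue on this page.

```python
from collections import namedtuple

# Field order as proposed in the question above; not confirmed by the maintainers.
Locus = namedtuple("Locus", "cluster_id chrom start end strand n_seqs")


def parse_loci(raw_loci):
    """Turn a list of six-element locus lists into named records."""
    return [Locus(*entry) for entry in raw_loci]


# One locus as it appears in the report log on this page.
raw = [[1108, "1", 207324614, 207326831, "+", 35]]
for locus in parse_loci(raw):
    print(locus.chrom, locus.strand, locus.end - locus.start, locus.n_seqs)
```

Named fields make downstream code less fragile than indexing by position, and a wrong assumption about the order only needs to be fixed in one place.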

Contribution by class

Hello,
I "successfully" ran the smallRNA-seq pipeline on my microRNA data and am now going through the report generation R script to check for quality and such.

I am seeing a few issues and was wondering if I could bug you about it:

1. Contribution by class
I read in counts.tsv as in the module below

clus <- read.table(file.path(seq_dir, "counts.tsv"),header=T,sep="\t",row.names=1, check.names = FALSE)
ann <- clus[,2]
toomany <- clus[,1]
clus_ma <- clus[,3:ncol(clus)]
clus_ma = clus_ma[,row.names(design)]

The annotation (column 3) appears to be all "|", so when I count the number of rRNA, tRNA and miRNA seen in the samples, I only get zeros. I checked the counts.tsv file (in final/project*/seqcluster/counts.tsv) and the "ann" column has no annotations. Is there a flag or step I need to incorporate in my pipeline for this to work?

2. pheatmap

### Clustering
{r mirna-clustering, eval=ismirbase}
counts = counts(obj)
dds = DESeqDataSetFromMatrix(counts[rowSums(counts>0)>3,], colData=design, design=~1)
vst = rlog(dds)

pheatmap(assay(vst), annotation_col = design, show_rownames = F, 
         clustering_distance_cols = "correlation",
         clustering_method = "ward.D")

When I try to run pheatmap I get the following error:
Error in cut.default(a, breaks = 100) : 'x' must be numeric

The vst = rlog(dds) call executes fine:

summary(vst)
        Length          Class           Mode 
          1198 DESeqTransform             S4

3. MDS plot

{r pca, eval=ncol(counts) > 1, eval=ismirbase}
mds(assay(vst), condition = design[,condition])

This gives me Error: could not find function "mds".
Should I be loading a specific library, or can I run the MDS plot through ggplot?

Thanks,
Arun

IsomirDataSeqFromFiles error

> obj <- IsomirDataSeqFromFiles(files = files[rownames(design)], design = design , header = T)
Error in initialize(value, ...) : 
  cannot use object of class “SummarizedExperiment0” in new():  class “IsomirDataSeq” does not extend that class

Hi lpantano,

Could you give me any suggestions about this error?

An error occurred when running the program

Hello, I want to use this tool for miRNA analysis. It has been installed, but the following error was reported when running the demo data. What should I do to solve this problem?

[2023-11-08T02:44Z] System YAML configuration: /home/wayenbio/bcbio/galaxy/bcbio_system.yaml.
[2023-11-08T02:44Z] Locale set to C.UTF-8.
[2023-11-08T02:44Z] Resource requests: atropos, picard; memory: 4.00, 4.00; cores: 16, 16
[2023-11-08T02:44Z] Configuring 1 jobs to run, using 1 cores each with 4.00g of memory reserved for each job
[2023-11-08T02:44Z] Timing: organize samples
[2023-11-08T02:44Z] multiprocessing: organize_samples
[2023-11-08T02:44Z] Using input YAML configuration: /home/wayenbio/rnaseq-seqc/mirqc_bcbio/config/mirqc_bcbio.yaml
[2023-11-08T02:44Z] Checking sample YAML configuration: /home/wayenbio/rnaseq-seqc/mirqc_bcbio/config/mirqc_bcbio.yaml
Traceback (most recent call last):
  File "/home/wayenbio/bcbio/anaconda/bin/bcbio_nextgen.py", line 245, in <module>
    main(**kwargs)
  File "/home/wayenbio/bcbio/anaconda/bin/bcbio_nextgen.py", line 46, in main
    run_main(**kwargs)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 50, in run_main
    fc_dir, run_info_yaml)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 91, in _run_toplevel
    for xs in pipeline(config, run_info_yaml, parallel, dirs, samples):
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 332, in smallrnaseqpipeline
    samples = rnaseq_prep_samples(config, run_info_yaml, parallel, dirs, samples)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/main.py", line 467, in rnaseq_prep_samples
    [x[0]["description"] for x in samples]]])
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 28, in run_parallel
    return run_multicore(fn, items, config, parallel=parallel)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/distributed/multi.py", line 86, in run_multicore
    for data in joblib.Parallel(parallel["num_jobs"], batch_size=1, backend="multiprocessing")(joblib.delayed(fn)(*x) for x in items):
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 1048, in __call__
    if self.dispatch_one_batch(iterator):
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 866, in dispatch_one_batch
    self._dispatch(tasks)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 784, in _dispatch
    job = self._backend.apply_async(batch, callback=cb)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 208, in apply_async
    result = ImmediateResult(func)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/joblib/_parallel_backends.py", line 572, in __init__
    self.results = batch()
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in __call__
    for func, args, kwargs in self.items]
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/joblib/parallel.py", line 263, in <listcomp>
    for func, args, kwargs in self.items]
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/utils.py", line 59, in wrapper
    return f(*args, **kwargs)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/distributed/multitasks.py", line 459, in organize_samples
    return run_info.organize(*args)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 81, in organize
    item = add_reference_resources(item, remote_retriever)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/run_info.py", line 177, in add_reference_resources
    data["dirs"]["galaxy"], data)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/genome.py", line 233, in get_refs
    galaxy_config, data)
  File "/home/wayenbio/bcbio/anaconda/lib/python3.7/site-packages/bcbio/pipeline/genome.py", line 180, in _get_ref_from_galaxy_loc
    (genome_build, os.path.normpath(loc_file)))
ValueError: Did not find genome build hg19 in bcbio installation: /home/wayenbio/bcbio/galaxy/tool-data/sam_fa_indices.loc
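
The final ValueError says that hg19 is not listed in sam_fa_indices.loc, so a first check is whether the build name appears in that file at all. The sketch below assumes the .loc file is plain text with tab-separated fields and `#` comment lines; the helper names are hypothetical and this is not how bcbio itself reads the file.

```python
def fields_in_loc(loc_path):
    """Collect every tab-separated field found on non-comment lines."""
    fields = set()
    with open(loc_path) as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            fields.update(line.split("\t"))
    return fields


def has_build(loc_path, build):
    """True if the build name appears as a field anywhere in the .loc file."""
    return build in fields_in_loc(loc_path)
```

Calling `has_build` with the path from the error message and `"hg19"` would return False if the genome data for that build was never installed, which would be consistent with the message above.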

counts should show specific names where the clusters map

Unfamiliar output from bcbio

I was running a smallRNA analysis on bcbio, and was expecting the output to be like previous analyses I have carried out, where each isomir is specified according to your naming convention. However, the results of my most recent analysis are the following, which aren't particularly informative:

counts.txt

Is this expected? And if it isn't what's likely to be causing it?

think about applying peak detection

https://gist.github.com/sixtenbe/1178136
https://pypi.python.org/pypi/pypeaks/0.2.6

Permission denied during collapse step

Hi Lorena!

I found a new issue with seqcluster using the following command line:
/usr/local/bin/seqcluster collapse -f file.fastq -o out_collapse

Here is the error message:

Traceback (most recent call last):
  File "/usr/local/bin/seqcluster", line 9, in <module>
    load_entry_point('seqcluster==1.2.4a0', 'console_scripts', 'seqcluster')()
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 549, in load_entry_point
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 2542, in load_entry_point
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 2202, in load
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/setuptools-20.3-py2.7.egg/pkg_resources/__init__.py", line 2208, in resolve
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/seqcluster/command_line.py", line 6, in <module>
    from create_report import report
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/seqcluster/create_report.py", line 7, in <module>
    _set_matplotlib_default_backend()
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/install.py", line 143, in _set_matplotlib_default_backend
    out_file.write(line)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/contextlib.py", line 24, in __exit__
    self.gen.next()
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/transaction.py", line 75, in file_transaction
    _move_tmp_files(safe, orig)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/transaction.py", line 92, in _move_tmp_files
    _move_file_with_sizecheck(safe, orig)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/bcbio/distributed/transaction.py", line 117, in _move_file_with_sizecheck
    shutil.move(tx_file, final_file)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/shutil.py", line 302, in move
    copy2(src, real_dst)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/shutil.py", line 130, in copy2
    copyfile(src, dst)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/shutil.py", line 83, in copyfile
    with open(dst, 'wb') as fdst:
IOError: [Errno 13] Permission denied: u'/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/matplotlib/mpl-data/matplotlibrc'

Do you know what could be the problem?

Thanks in advance,
Alexandra

report problem

Hi @naumenko-sa,

We can discuss the problem with the report here.

That report is a template that sadly won't work for every analysis. Can you tell me what you would like to get from your data? Since you only have 2 samples, you are probably interested in just a couple of figures, because we cannot do a lot with that number.

Some questions:

Does root_path point to the final folder?

And what do you get when you run list.files(file.path(root_path),pattern = "trimming_stats",recursive = T) inside R?

As I said, you will get little from this report. The most important part is the size distribution, which you can also see by opening the HTML in the multiqc folder. I plan to migrate almost all QC figures there during the summer, so this will get better.

If you give me more information about what you would like to have, I may be able to help.

cheers

error install

Command: bin/conda install seqcluster bcbio-nextgen -c bioconda
Fetching package metadata: ......
Solving package specifications: .
Error: Dependencies missing in current linux-64 channels:

bcbio-nextgen -> ipdb
bcbio-nextgen -> arrow

Did you mean one of these?

idba

You can search for this package on anaconda.org with

anaconda search -t conda arrow

(and similarly for the other packages)

You may need to install the anaconda-client command line client with

conda install anaconda-client

Does this mean I need ipdb and arrow? I installed them with pip, but the installation still fails.
How can I solve this?

issue in seqcluster prepare

Hi,
The command for running the "prepare" step was:

seqcluster prepare -c files-collapsed -o ${outdir}/prepd-seqs --minl 17 --minc 2 --maxl 45
The error message I got is:

Traceback (most recent call last):
  File "/wrk/kipokh/DONOTREMOVE/bioconda3_env/bioconda_tools/bin/seqcluster", line 11, in <module>
    load_entry_point('seqcluster==1.2.4a14', 'console_scripts', 'seqcluster')()
  File "/wrk/kipokh/DONOTREMOVE/bioconda3_env/bioconda_tools/lib/python3.6/site-packages/seqcluster/command_line.py", line 25, in main
    prepare(kwargs["args"])
  File "/wrk/kipokh/DONOTREMOVE/bioconda3_env/bioconda_tools/lib/python3.6/site-packages/seqcluster/prepare_data.py", line 38, in prepare
    seq_l, sample_l = _read_fastq_files(f, args)
  File "/wrk/kipokh/DONOTREMOVE/bioconda3_env/bioconda_tools/lib/python3.6/site-packages/seqcluster/prepare_data.py", line 103, in _read_fastq_files
    with open_fastq(cols[0]) as handle:
AttributeError: __enter__
Can you please tell me what I could do to solve this issue?

Thanks!
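
For context on the error itself: on Python versions like the 3.6 in this traceback, `AttributeError: __enter__` is what a `with` statement raises when the object it receives does not implement the context manager protocol. The sketch below shows one way a FASTQ opener can satisfy that protocol using `contextlib.contextmanager`. The `open_fastq` here is a hypothetical stand-in written for illustration, not the seqcluster function of the same name, and the actual cause inside seqcluster is not confirmed by this thread.

```python
import gzip
from contextlib import contextmanager


@contextmanager
def open_fastq(path):
    """Yield a text handle for a plain or gzip-compressed FASTQ file."""
    opener = gzip.open if path.endswith(".gz") else open
    handle = opener(path, "rt")
    try:
        yield handle
    finally:
        handle.close()  # always release the file, even if the body raises


def read_sequences(path):
    """Return the sequence line (second of every four lines) of each record."""
    with open_fastq(path) as handle:
        return [line.strip() for i, line in enumerate(handle) if i % 4 == 1]
```

Without the decorator, calling `open_fastq(path)` would return a bare generator, and using it in `with` would fail in the same way as the traceback above.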

html report

[x] matrix file
[ ] spread position from start => end
[ ] convert to pandas
[ ] summarize
[ ] plot with python

Unable to find documentation

Hello,
I am wondering about the following call to seqcluster that I found in a pipeline:
seqcluster collapse -f reads -m 1 --min_size 15 -o some_folder

What do the options -m and --min_size do?
Please let me know or point me to the documentation.

Thanks in advance!

add matrix similarity score

The idea is to merge loci by sequence similarity. This should be inside the reduce_loci function, in what is now called _solve_loci.

problem with "Prepare samples" step

Hello!

I'm trying to use seqcluster to handle multi-reads obtained from small RNA-seq.
When I use "seqcluster prepare", I get the output files log/, seqs.fastq, seqs.ma and stats_prepare.tsv.
But they do not contain any information.

seqs.fastq is totally empty;
seqs.ma contains only headers: id seq SCR1 SCR2 SCR7

stats_prepare.tsv contains :
total 1 SCR1
added 0 SCR1
total 1 SCR2
added 0 SCR2
total 1 SCR7
added 0 SCR7
What does this mean?

I would like to know exactly how to prepare my samples before the clustering step.
I read that seqs.ma is recommended for that.

Thanks for your help ! :)
Alexandra
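
When checking several samples at once, the three-column layout of stats_prepare.tsv shown above (label, count, sample) can be summarized with a few lines of Python. Reading "total" as sequences seen and "added" as sequences kept after filtering is an assumption based only on the labels, and the function names are hypothetical.

```python
from collections import defaultdict


def summarize_prepare_stats(lines):
    """Group 'label count sample' rows into {sample: {label: count}}."""
    stats = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 3:
            continue  # ignore blank or malformed rows
        label, count, sample = parts
        stats[sample][label] = int(count)
    return dict(stats)


def samples_without_reads(stats):
    """Samples for which nothing was added to the output."""
    return sorted(s for s, counts in stats.items() if counts.get("added", 0) == 0)


# Rows copied from the report above.
rows = [
    "total 1 SCR1", "added 0 SCR1",
    "total 1 SCR2", "added 0 SCR2",
    "total 1 SCR7", "added 0 SCR7",
]
print(samples_without_reads(summarize_prepare_stats(rows)))
```

Here every sample reports a total of 1 and nothing added, which suggests the input files were not read as expected rather than that the reads were filtered out.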

running from bcbio pipeline

Hi,
I like the report about mirQC you have shown in here: https://github.com/lpantano/mypubs/blob/master/srnaseq/mirqc/ready_report.md

Mostly, I wanted to create a barplot showing the contribution of each type of RNA. As suggested, I ran seqcluster from bcbio to achieve this. However, in the upload folder I only see an HTML report from fastq and nothing similar to what is shown on the above page. I modified the template at https://github.com/lpantano/seqcluster/blob/master/data/pipeline_example/mirqc/template.yaml to do this task. Am I missing something?

My second question is about this message in the log file:

[2018-03-14T14:28Z] multiprocessing: seqcluster_prepare
[2018-03-14T14:28Z] You didn't specify any other expression caller tool.You can add to the YAML file:expression_caller:[trna, seqcluster, mirdeep2]

Is there a way to use all the capabilities without manually specifying them?

Thanks

Collapse

I'm running seqcluster inside the provided docker container (https://hub.docker.com/r/bcbio/bcbio/)

When running the collapse script I encounter this error:

['collapse', '-f', '/seqdata/readscleaned/33_S1_body.fastq.clean', '-o', '/seqdata/collapsed']
INFO Run collapse
Traceback (most recent call last):
  File "/usr/local/share/bcbio-nextgen/anaconda/bin/seqcluster", line 9, in <module>
    load_entry_point('seqcluster==1.1.14', 'console_scripts', 'seqcluster')()
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/seqcluster/command_line.py", line 49, in main
    collapse_fastq(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/seqcluster/collapse.py", line 13, in collapse_fastq
    seqs = collapse(args.fastq)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/seqcluster/libs/fastq.py", line 11, in collapse
    with open_fastq(in_file) as handle:
AttributeError: __exit__

Any ideas what the problem could be?
Thanks

new alignment tool for srna

think about it: https://www.ncbi.nlm.nih.gov/pubmed/27307639

add length enrichment

cluster should show if there is any enrichment of small rna

issue in seqcluster collapser

Hi,
I have already installed bcbio and run cutadapt. When I run the collapse step, it does not work.

[ns_phd06@tll-rv1 bin]$ seqcluster collapse -f /data/GenhuaLab/zituo/BCBIOresults/sm0h_1.fastq -o /data/GenhuaLab/zituo/BCBIOresults/sm0h_1.collapser.fastq

Traceback (most recent call last):
  File "./seqcluster", line 11, in <module>
    load_entry_point('seqcluster==1.2.4a0', 'console_scripts', 'seqcluster')()
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 565, in load_entry_point
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2598, in load_entry_point
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2258, in load
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/setuptools-27.2.0-py2.7.egg/pkg_resources/__init__.py", line 2264, in resolve
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/seqcluster/command_line.py", line 5, in <module>
    from make_clusters import cluster
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/seqcluster/make_clusters.py", line 11, in <module>
    import pybedtools
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/__init__.py", line 12, in <module>
    from . import contrib
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/contrib/__init__.py", line 3, in <module>
    from . import venn_maker
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/contrib/venn_maker.py", line 12, in <module>
    from pybedtools import helpers
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/pybedtools/helpers.py", line 13, in <module>
    import pysam
  File "/data/GenhuaLab/zituo/local/share/bcbio/anaconda/lib/python2.7/site-packages/pysam/__init__.py", line 5, in <module>
    from pysam.libchtslib import *
ImportError: libhts.so.1: cannot open shared object file: No such file or directory

Please help me. Thanks!! @lpantano

miraligner Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1

$ cat .command.sh

#!/bin/bash -ue

/BiO/BioTools/bcbio/data/anaconda/bin/miraligner -Xms705m -Xmx4500m -freq -sub 1 -trim 3 -add 3 -s hsa -i SRR950876_trimmed.fq.gz-collapse -db /BiO/BioTools/bcbio/data/genomes/Hsapiens/hg19/srnaseq -o ./

$ cat .command.sh | sh
Format is not tabular,guessing fasta
species found
Go to mapping...
Mismatches: 1
Trimming: 3
Addition: 3
Species: hsa
Fri Jun 17 01:34:43 KST 2016

Reading reads
Number of reads to be mapped: 374831
Searching in precursors
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
at miraligner.tools.getFreq(tools.java:122)
at miraligner.map.readseq(map.java:315)
at miraligner.Main.main(Main.java:85)

$ head SRR950876_trimmed.fq.gz-collapse

1-1
AAAAAAAAAAAAAAAAAAAAAAA
2-1
AAAAAAAAAAAAAAAAAAAAAAAAA
3-1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
4-1
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
5-1
AAAAAAAAAAAAAAAAAAAAAAAGG

Could you check it? Thanks.

problem with "seqcluster report" command

Hello lpantano,

I had the following error launching the command "seqcluster report -j seqcluster.json -o Report -r ref.fasta" :

.................
INFO locus bigger > 500 nt, skipping: [[1108, u'1', 207324614, 207326831, u'+', 35]]
INFO locus bigger > 500 nt, skipping: [[3617, u'11', 62882013, 62882997, u'+', 11]]
INFO locus bigger > 500 nt, skipping: [[7088, u'15', 40038287, 40038885, u'-', 19]]
INFO locus bigger > 500 nt, skipping: [[16701, u'6', 26538052, 26687281, u'+', 659]]
['report', '-j', 'ClusteringStep/seqcluster.json', '-o', 'ReportStep', '-r', '/root/Homo_sapiens.GRCh38.dna.toplevel.cleaned.fasta']
Traceback (most recent call last):
  File "/usr/local/bin/seqcluster", line 9, in <module>
    load_entry_point('seqcluster==1.2.3a0', 'console_scripts', 'seqcluster')()
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/seqcluster/command_line.py", line 31, in main
    report(kwargs["args"])
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/seqcluster/create_report.py", line 32, in report
    data = make_profile(data, out_dir, args)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/seqcluster/libs/report.py", line 66, in make_profile
    data[0][c]['precursor'].update(run_rnafold(data[0][c]['precursor']['seq']))
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/site-packages/seqcluster/function/rnafold.py", line 12, in run_rnafold
    process = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/local/share/bcbio-nextgen/anaconda/lib/python2.7/subprocess.py", line 1343, in _execute_child
    raise child_exception
OSError: [Errno 7] Argument list too long

I don't understand how this error was triggered. How can I avoid this problem?

Thanks,
Alexandra
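
A note on the error: `OSError: [Errno 7] Argument list too long` is raised when the command line handed to a new process exceeds the operating system limit. The traceback shows it coming from `subprocess.Popen(cmd, ..., shell=True)` inside `run_rnafold`, so a plausible cause (not confirmed in this thread) is a very long precursor sequence embedded in the command string. A common way around that limit is to send the data on standard input instead. The sketch below uses a stand-in Python child process rather than RNAfold so that it runs anywhere; `run_with_stdin` is a hypothetical helper, not seqcluster code.

```python
import subprocess
import sys


def run_with_stdin(cmd, text):
    """Run cmd (a list of arguments) and pass text on standard input."""
    result = subprocess.run(
        cmd,
        input=text,               # data travels through a pipe, not the command line
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # exchange str instead of bytes
        check=True,               # raise if the child exits with an error
    )
    return result.stdout


# Stand-in child that reports how many characters it received.
child = [sys.executable, "-c", "import sys; print(len(sys.stdin.read()))"]
long_sequence = "ACGU" * 1000000
print(run_with_stdin(child, long_sequence).strip())
```

Passing the arguments as a list also avoids `shell=True`, so the sequence never has to be quoted or interpolated into a shell command.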

can python dateutils dependency go?

Hello,

The dateutils package does not ship with a well-defined license (jmcantrell/python-dateutils#3), even though that license demands that the source ships with the license text. As a result, that dependency cannot be redistributed, and neither can seqcluster as a reverse dependency.

I just had a look if I could come up with a patch to remove that dependency, but, well, I failed to find any import from it. Can that beast possibly just be removed from setup.py and the dockerfile?

Many thanks!
Steffen

Support for mouse genome

I am looking for a way to download data files for mouse mm10 genome. It seems that small RNA data for mm10 is not available from cloudbiolinux yet:
fab -f cloudbiolinux/data_fabfile.py -H localhost -c fabric.txt install_data_ggd:srnaseq,mm10

May I know if there is any way to run seqcluster for other model organisms such as mouse? Thank you very much and looking forward to your reply.

MIRALIGNER/question

Hi Lorena,

I hope this message finds you well.

I am working on miRNA-seq data and using your pipeline (https://seqcluster.readthedocs.io/mirna_annotation.html) to find isomiRs, but the final output shows no results, so I am wondering whether I am making a mistake. One issue: the pipeline page lists miRNA.str as a required file, which I am supposed to download from miRBase, but I could not find such a file there; the only file available is mature.fa. Is this the same? When I use mature.fa, I get an error saying that miRNA.str is missing.
So I changed the mature.fa file to mature.str, and it ran with no error but also with no results.

Can you please help me?

Thanks

command
(bcbio) ahmed@ahmed-Precision-5570:~$ java -jar /home/ahmed/tools/miraligner.jar -sub 1 -trim 3 -add 3 -s mmu -i /home/ahmed/data/PEHmiRNA/collapse/trimmed_001.R1_trimmed.fastq -db /home/ahmed/data/PEHmiRNA/DB -o /home/ahmed/data/PEHmiRNA/collapse/output_prefix
Format is not tabular,guessing fasta
species found
Go to mapping...
Mismatches: 1
Trimming: 3
Addition: 3
Species: mmu
Feb 20, 2024 2:02:32 PM miraligner.map readseq
INFO: Tue Feb 20 14:02:32 EST 2024

Feb 20, 2024 2:02:32 PM miraligner.map readseq
INFO: Tue Feb 20 14:02:32 EST 2024

Reading reads
Feb 20, 2024 2:02:33 PM miraligner.map readseq
INFO: Number of reads excluded because size: 590697
Feb 20, 2024 2:02:33 PM miraligner.map readseq
INFO: Number of reads excluded because size: 590697
Feb 20, 2024 2:02:33 PM miraligner.map readseq
INFO: Number of reads to be mapped: 154078
Feb 20, 2024 2:02:33 PM miraligner.map readseq
INFO: Number of reads to be mapped: 154078
Feb 20, 2024 2:02:33 PM miraligner.map readseq
INFO: Searching in precursors
Feb 20, 2024 2:02:33 PM miraligner.map readseq
INFO: Searching in precursors
Feb 20, 2024 2:02:36 PM miraligner.map readseq
INFO: Tue Feb 20 14:02:36 EST 2024
Feb 20, 2024 2:02:36 PM miraligner.map readseq
INFO: Tue Feb 20 14:02:36 EST 2024
Feb 20, 2024 2:02:36 PM miraligner.map readseq
INFO: Num reads annotated: 38720
Feb 20, 2024 2:02:36 PM miraligner.map readseq
INFO: Num reads annotated: 38720

seqcluster executable not found when testing

Hello again,
I had a closer look at the tests that miss an executable "seqcluster" script. I understand that this is created by setup.py at

      entry_points={
          'console_scripts': ['seqcluster=seqcluster.command_line:main', 'seqcluster_install=seqcluster.install:main'],
      },

The default build system of Debian then places this directly at
debian/python3-seqcluster-bin/usr/bin/seqcluster
with no other copy or footprint in your build / testing tree. Consequently, this leads to an error like

======================================================================
ERROR: Run miraligner analysis
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_analysis.py", line 107, in test_srnaseq_miraligner
    subprocess.check_call(cl)
  File "/usr/lib/python3.6/subprocess.py", line 286, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib/python3.6/subprocess.py", line 267, in call
    with Popen(*popenargs, **kwargs) as p:
  File "/usr/lib/python3.6/subprocess.py", line 709, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.6/subprocess.py", line 1344, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'seqcluster': 'seqcluster'
-------------------- >> begin captured stdout << ---------------------
seqcluster seqbuster --sps hsa --hairpin ../../data/examples/miraligner/hairpin.fa --mirna ../../data/examples/miraligner/miRNA.str --gtf ../../data/examples/miraligner/hsa.gff3 -o test_out_mirs_fasta --miraligner ../../data/examples/miraligner/sim_isomir.fa with workdir '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output' executed from '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output'.

--------------------- >> end captured stdout << ----------------------

I had added the extra information on the cwd to the output. I admit to also being a bit confused about the expectation that the seqcluster executable is in the path. For travis it may just be fine, but normally one would want to test in the tree. Manually, I have done the following as a bit of a workaround:

moeller@steffen-laptop-debian:~/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output$ PYTHONPATH=../.. ./seqcluster seqbuster --sps hsa --hairpin ../../data/examples/miraligner/hairpin.fa --mirna ../../data/examples/miraligner/miRNA.str --gtf ../../data/examples/miraligner/hsa.gff3 -o test_out_mirs_fasta --miraligner ../../data/examples/miraligner/sim_isomir.fa with workdir '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output' executed from '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output'.
Probably this will fail, you need bcbio-nextgen for many installation functions.
['seqbuster', '--sps', 'hsa', '--hairpin', '../../data/examples/miraligner/hairpin.fa', '--mirna', '../../data/examples/miraligner/miRNA.str', '--gtf', '../../data/examples/miraligner/hsa.gff3', '-o', 'test_out_mirs_fasta', '--miraligner', '../../data/examples/miraligner/sim_isomir.fa', 'with', 'workdir', '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output', 'executed', 'from', '/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/test/test_automated_output.']
INFO Run seqbuster
INFO Reading ../../data/examples/miraligner/sim_isomir.fa
sh: 1: miraligner: not found
INFO Running miraligner with ../../data/examples/miraligner/sim_isomir_unique.fa
INFO Hits: 21
INFO Valid hits (+/-3 reference miRNA): 21
Traceback (most recent call last):
 File "./seqcluster", line 11, in <module>
   load_entry_point('seqcluster==1.2.4a8', 'console_scripts', 'seqcluster')()
 File "/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/seqcluster/command_line.py", line 40, in main
   miraligner(kwargs["args"])
 File "/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/seqcluster/seqbuster/__init__.py", line 515, in miraligner
   _mirtop(out_files, args.hairpin, args.gtf, args.sps, args.out)
 File "/home/moeller/git/debian-med/seqcluster-smoe/seqcluster/seqcluster/seqbuster/__init__.py", line 393, in _mirtop
   reader(args)
 File "/usr/lib/python3/dist-packages/mirtop/gff/__init__.py", line 50, in reader
   out_dts[fn] = body.create(ann, database, sample, args)
 File "/usr/lib/python3/dist-packages/mirtop/gff/body.py", line 49, in create
   for r, read in reads.iteritems():
AttributeError: 'collections.defaultdict' object has no attribute 'iteritems'

There is another Python3 issue with iteritems (https://stackoverflow.com/questions/13998492/iteritems-in-python) but that aside, without the explicit path to the seqcluster library and the PYTHONPATH setting, this would not have executed in the first place.
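
The `iteritems` failure in the traceback above is the usual Python 2 to Python 3 change: dictionaries, including `collections.defaultdict`, no longer have `iteritems()`, and `items()` takes its place. A minimal sketch, with a made-up `reads` mapping standing in for the one used in mirtop's `body.py`:

```python
from collections import defaultdict

# Hypothetical stand-in for the `reads` mapping seen in the traceback.
reads = defaultdict(list)
reads["seq1"].append("hit_a")
reads["seq2"].append("hit_b")

# Python 3: items() replaces the removed iteritems().
pairs = [(r, read) for r, read in reads.items()]

# On Python 3 the old method is simply absent.
has_iteritems = hasattr(reads, "iteritems")
```

Whether the fix belongs in mirtop or in a newer release of it is not something this sketch can tell.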

Is there a way to avoid running those scripts? Maybe by invoking the respective internal functions directly? Also, bcbio is a reverse dependency of seqcluster from my point of view. Is a build (i.e. test) dependency on it avoidable?

Cheers,

Steffen
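
One way to test without a `seqcluster` script on the PATH might be to call the entry point named in setup.py (`seqcluster.command_line:main`) through the running interpreter. This is only a sketch, assuming `main()` reads its arguments from `sys.argv`; the helper name `in_tree_command` and the example paths are hypothetical:

```python
import os
import sys


def in_tree_command(args, source_root):
    """Build (cmd, env) that runs seqcluster's main() from a source checkout.

    Hypothetical helper: it calls the console_scripts entry point through
    the current interpreter instead of relying on an installed script.
    """
    code = "from seqcluster.command_line import main; main()"
    cmd = [sys.executable, "-c", code] + list(args)
    env = dict(os.environ)
    # Put the checkout first so the in-tree package wins over installed copies.
    existing = env.get("PYTHONPATH")
    if existing:
        env["PYTHONPATH"] = source_root + os.pathsep + existing
    else:
        env["PYTHONPATH"] = source_root
    return cmd, env


cmd, env = in_tree_command(["seqbuster", "--sps", "hsa"], "../..")
# A test could then run: subprocess.check_call(cmd, env=env)
```

This does not remove the need for external tools such as miraligner; it only avoids the lookup of the `seqcluster` executable.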

possible error with pandas 1.1

seqcluster/seqcluster/make_clusters.py

Line 50 in 38297c9

dt = pd.DataFrame({'sample': y.keys(), 'counts': y.values()})
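
The issue gives no error message, so the cause is a guess: on Python 3, `y.keys()` and `y.values()` are dict views rather than lists, and some pandas versions may not accept them as columns. Converting them to lists first should be safe either way. A minimal sketch, where the contents of `y` are made up:

```python
import pandas as pd

# Hypothetical counts per sample, standing in for `y` in make_clusters.py.
y = {"sample_a": 10, "sample_b": 3}

# Materialize the dict views as lists before handing them to pandas.
dt = pd.DataFrame({"sample": list(y.keys()), "counts": list(y.values())})
```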

peaks figure in report

http://nxn.se/post/97650612370/high-contrast-stacked-distribution-plots

figsize(6, 8)
xx = np.linspace(df.min().min(), df.max().max(), 256)
yy = None
offset = 0
offs = [offset]
for i in df:
    if yy is not None:
        offset += 0.5 * yy.max()
        offs.append(offset)

    s = df[i]
    f = kde.gaussian_kde(s)
    yy = f(xx)

    plt.fill_between(xx, yy + offset, 0 * yy + offset,
                     zorder=-i,
                     facecolor='k',
                     edgecolor='w',
                     lw=3)

    plt.axhline(y=offset,
                zorder=-i,
                linestyle='-',
                color='k',
                lw=4)

sns.despine(left=True, bottom=True)
plt.yticks(offs, ['Sample {}'.format(s + 1) for s in df.columns]);

plt.tight_layout()
plt.savefig('plots/3.png');

metadata_fn in report picks up operating system artifacts

Hello Lorena,

I edited the summary.csv file and produced a new file with a tilde. This messed up the downstream processing of the report. Maybe the pattern could be changed to prevent this? (I just deleted the other file as a fix)

metadata_fn = list.files(file.path(root_path), pattern = "summary.csv",recursive = T, full.names = T)

metadata_fn
[1] "/redacted/sampleconfig/final/2016-04-19_sampleconfig/report/summary.csv"
[2] "/redacted/sampleconfig/final/2016-04-19_sampleconfig/report/summary.csv~"
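
The `pattern` argument of R's `list.files` is a regular expression, so an unanchored `summary.csv` also matches the editor backup `summary.csv~`. An anchored pattern such as `summary\.csv$` would likely avoid that. The sketch below shows the same idea with Python's `re` module rather than R, on the two file names from the report:

```python
import re

# File names mirroring the two entries returned in the report above.
names = ["summary.csv", "summary.csv~"]

loose = re.compile(r"summary.csv")      # unanchored; "." matches any character
strict = re.compile(r"^summary\.csv$")  # anchored; literal dot

loose_hits = [n for n in names if loose.search(n)]
strict_hits = [n for n in names if strict.search(n)]
```

Here `loose_hits` keeps both names, while `strict_hits` keeps only `summary.csv`.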

NameError: global name 'gzip' is not defined

Hi,

I followed the instruction manual, but could not manage to run 'seqcluster':
[ssg29@login-sand1 ~]$ which seqcluster
~/Install/seqcluster/linuxbrew/bin/seqcluster
[ssg29@login-sand1 ~]$ seqcluster
Traceback (most recent call last):
  File "/home/ssg29/Install/seqcluster/linuxbrew/bin/seqcluster", line 4, in <module>
    from seqcluster.command_line import main
  File "/home/ssg29/Install/seqcluster/anaconda/lib/python2.7/site-packages/seqcluster/command_line.py", line 4, in <module>
    from prepare_data import prepare
  File "/home/ssg29/Install/seqcluster/anaconda/lib/python2.7/site-packages/seqcluster/prepare_data.py", line 8, in <module>
    from libs.classes import sequence_unique
  File "/home/ssg29/Install/seqcluster/anaconda/lib/python2.7/site-packages/seqcluster/libs/classes.py", line 6, in <module>
    import numpy as np
  File "/home/ssg29/.local/lib/python2.7/site-packages/numpy/__init__.py", line 170, in <module>
    from . import add_newdocs
  File "/home/ssg29/.local/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/home/ssg29/.local/lib/python2.7/site-packages/numpy/lib/__init__.py", line 8, in <module>
    from .type_check import *
  File "/home/ssg29/.local/lib/python2.7/site-packages/numpy/lib/type_check.py", line 11, in <module>
    import numpy.core.numeric as _nx
  File "/home/ssg29/.local/lib/python2.7/site-packages/numpy/core/__init__.py", line 6, in <module>
    from . import multiarray
ImportError: /home/ssg29/.local/lib/python2.7/site-packages/numpy/core/multiarray.so: undefined symbol: PyUnicodeUCS2_AsASCIIString

It seems anaconda picks up the local numpy package from ~/.local, as shown above.

If I manually export PYTHONPATH like:
PYTHONPATH=$HOME/Install/seqcluster/anaconda/lib/python2.7/site-packages:$PYTHONPATH

Then, seqcluster seems to work:
[ssg29@login-sand1 ~]$ seqcluster
use ['stats', 'collapse', 'prepare', 'predict', 'cluster', 'explore', 'report']

I wanted to use 'collapse', but with no luck:

[ssg29@login-sand1 ~]$ seqcluster collapse -f ~/results/SLX-9176.Homo_sapiens.v1/Trim/NEBsmRNA01/SLX-9176.NEBsmRNA01.C7TWVANXX.s_6.r_1_trimmed.fq.gz -o ~/results/SLX-9176.Homo_sapiens.v1/Trim/NEBsmRNA01/SLX-9176.NEBsmRNA01.C7TWVANXX.s_6.r_1_trimmed.collapsed
07/30/2015 03:31:13 PM INFO: Run collapse
Traceback (most recent call last):
  File "/home/ssg29/Install/seqcluster/linuxbrew/bin/seqcluster", line 6, in <module>
    sys.exit(main())
  File "/home/ssg29/Install/seqcluster/anaconda/lib/python2.7/site-packages/seqcluster/command_line.py", line 40, in main
    collapse_fastq(kwargs["args"])
  File "/home/ssg29/Install/seqcluster/anaconda/lib/python2.7/site-packages/seqcluster/collapse.py", line 14, in collapse_fastq
    seqs = collapse(args.fastq)
  File "/home/ssg29/Install/seqcluster/anaconda/lib/python2.7/site-packages/seqcluster/libs/fastq.py", line 10, in collapse
    with open_fastq(in_file) as handle:
  File "/home/ssg29/Install/seqcluster/anaconda/lib/python2.7/site-packages/seqcluster/libs/fastq.py", line 30, in open_fastq
    return gzip.open(in_file, 'rb')
NameError: global name 'gzip' is not defined

Any idea?

Sung
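
The `NameError` points at `gzip.open` being called in `libs/fastq.py` without `gzip` having been imported there. A minimal sketch of a helper with the import in place; only the `gzip.open(in_file, 'rb')` call comes from the traceback, the rest of `open_fastq` is a hypothetical re-creation:

```python
import gzip  # the import whose absence produces the NameError
import os
import tempfile


def open_fastq(in_file):
    """Open a FASTQ file for reading, handling gzip-compressed input."""
    if in_file.endswith(".gz"):
        return gzip.open(in_file, "rb")
    return open(in_file, "rb")


# Round trip on a throwaway file to show the helper reads .gz input.
path = os.path.join(tempfile.mkdtemp(), "reads.fq.gz")
with gzip.open(path, "wb") as out:
    out.write(b"@r1\nACGT\n+\nIIII\n")
with open_fastq(path) as handle:
    first_line = handle.readline()
```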

miRNA annotation with seqcluster from BAM file

Hello @lpantano,

I am a bit new to miRNA annotation, and I was trying to run the seqcluster subcommand seqbuster on my BAM file, which was aligned with STAR. My command line was pretty standard, following the doc:

/bcbio_anaconda_bin/seqcluster seqbuster --hairpin [hairpin.hsa.fa] --mirna [mirna.str] -sps hsa -o test [MY_BAM]

The error I got is "global name os is not defined" in the function _sam_to_bam within the __init__.py file.

So I went into that file and found you imported os.path as op. Simply changing this part didn't fix the problem. I further looked into this function, and the way I understand it is that if the input is with a "sam" suffix it will convert it to a "bam". But why in the if statement, it evaluates whether it ends with "bam"? I updated that to "sam", and this part passed. However, there was a new error from pysam.sort(). Please see the attached file. I feel it might be caused by pysam version?
error.txt

I also came across another error when I did not specify --hairpin and --mirna options, the program tried to download the two files, but ended with complaining about "gzip !$.gz". This can be avoided by downloading hairpin.fa and miRNA.str myself. I think you might want to know.

So how can I fix the first problem? I also saw a GTF option; do I need to prepare that file? If so, any suggestions on how I should prepare it? Thanks very much in advance for your help.

happy holidays!

Simo
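
The "global name os is not defined" error is consistent with how aliased imports work in Python: `import os.path as op` binds only the name `op`, so any later use of plain `os` fails unless `os` is imported separately. A minimal sketch that runs the import in a fresh namespace to show which names it creates:

```python
# Run the import in an empty namespace to see which names it binds.
namespace = {}
exec("import os.path as op", namespace)

binds_op = "op" in namespace  # the alias is bound
binds_os = "os" in namespace  # plain `os` is not bound
```

Here `binds_op` is `True` and `binds_os` is `False`, so adding a separate `import os` next to the aliased import would be the likely fix for that part. The pysam.sort() error is a separate matter that this sketch does not address.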

Bioconda -> Github is wrong link

Hi,

Just a heads up :)

link to seqcluster's github is misspelled.

https://github.com/lpantano/seqclsuter

Thought, you might want to change that ;)

No UMI transform specified, assuming pre-transformed data.

Hi,

I tried to repeat the small RNA seq pipeline from bcbio-nextgen that you posted (https://github.com/lpantano/seqcluster/tree/master/data/pipeline_example/mirqc) . However, I got the following error message:

No UMI transform specified, assuming pre-transformed data.
No UMI transform was specified, but /data/proj/mirnatest/raw/SRR950876.fastq.gz does not look pre-transformed.

Can you check for me please?

Thanks for your help!

Best regards,

Jianxiang

add json as results

TypeError: 'map' object is not subscriptable

Hi, I came across this error
TypeError: 'map' object is not subscriptable

INFO 1834545 Clusters read
INFO Creating meta-clusters based on shared sequences: 61291
99% (61244 of 61291) |################# | Elapsed Time: 7:19:39 ETA: 0:00:00
INFO 512 metaclusters from 33960 sequences
INFO 512 clusters found
INFO counts after: 19759492
INFO # sequences after: 61375
INFO Solving multi-mapping events in the network of clusters
INFO Number of loci: 1834545
N/A% (0 of 512) | | Elapsed Time: 0:00:00 ETA: --:--:--
Traceback (most recent call last):
  File "/scratch/luohao/software/seqcluster/bin/seqcluster", line 10, in <module>
    sys.exit(main())
  File "/scratch/luohao/software/seqcluster/lib/python3.6/site-packages/seqcluster/command_line.py", line 28, in main
    cluster(kwargs["args"])
  File "/scratch/luohao/software/seqcluster/lib/python3.6/site-packages/seqcluster/make_clusters.py", line 76, in cluster
    clusLred = _cleaning(clusL, args.dir_out)
  File "/scratch/luohao/software/seqcluster/lib/python3.6/site-packages/seqcluster/make_clusters.py", line 294, in _cleaning
    clus_obj = reduceloci(clusL, path)
  File "/scratch/luohao/software/seqcluster/lib/python3.6/site-packages/seqcluster/detect/metacluster.py", line 77, in reduceloci
    _write_cluster(c, clus_obj.clus, clus_obj.loci, n_cluster, path)
  File "/scratch/luohao/software/seqcluster/lib/python3.6/site-packages/seqcluster/detect/metacluster.py", line 102, in _write_cluster
    print("\t".join(pos[:4] + [str(len(cluster[idc].loci2seq[idl]))] + [pos[-1]]), file=out_handle, end="")
TypeError: 'map' object is not subscriptable
100% (61291 of 61291) |##################| Elapsed Time: 7:20:41 Time: 7:20:41
100% (512 of 512) |######################| Elapsed Time: 0:00:14 Time: 0:00:14

Thanks,
Luohao
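
This error is typical of Python 2 code running on Python 3: `map()` now returns an iterator, which cannot be sliced the way `pos[:4]` is in the traceback. Wrapping the call in `list()` restores the old behavior. A minimal sketch, where the input line and the way `pos` is built are made up for illustration:

```python
# Hypothetical tab-separated locus line; the real input is not shown in the issue.
line = "chr1\t100\t200\t+\tcluster7"

# Python 3: map() returns an iterator, so slicing it raises TypeError.
pos_lazy = map(str, line.split("\t"))
try:
    pos_lazy[:4]
    sliced_ok = True
except TypeError:
    sliced_ok = False

# Wrapping in list() makes the result subscriptable again.
pos = list(map(str, line.split("\t")))
fields = pos[:4] + [pos[-1]]
```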

Issue with IsomirDataSeqFromFiles

Hello,

I am trying to use the package on small RNA-seq data. I have used miraligner and got *.mirna output.
Now I want to use isoPlot() on our data. With the example files and the commands given below, it works fine.

path <- system.file("extra", package="isomiRs")
fn_list <- list.files(path, full.names = TRUE)
de <- data.frame(row.names=c("f1" , "f2"), condition = c("newborn", "newborn"))
ids <- IsomirDataSeqFromFiles(fn_list, design=de)

Now, I am trying the same for our data and it is showing the error as follows.

Error in FUN(X[[i]], ...) : 
  assay colnames() must be NULL or equal colData rownames()
In addition: Warning message:
In IsoCountsFromMatrix(listSamples, design) : No miRNA found.

Could you please suggest any changes I need to make? One observation: the frequency column in our data contains only 0. All the other columns are the same as in the example data. I also deleted the one additional precursor column, as it was not present in the example data.

add script to plot how similar are two clusters

amchart - redistributable in source tree?

Is ./misc/js/amcharts.js (from https://www.amcharts.com/download/) truly redistributable in your source tree? The license states "You can download and use all amCharts products for free. The only limitation of the free version is that a small amCharts logo will be displayed in the corner of your charts. If you’d rather have your charts without any branding, or you appreciate the software and would like to support its creators, please purchase a commercial license." It does not say that it may be redistributed. If I am getting this right then you only need this referenced in your reports and this could then also just point to the online version, right? Would you accept a patch to reference the remote version on the CDN instead of a local copy?
