SoberDog commented on June 24, 2024

I installed R and the R packages with conda. The full log is shown below:

##############################################################
#       SCASA V1.0.1
#       SINGLE CELL TRANSCRIPT QUANTIFICATION TOOL
#       Version Date: 2022-03-24
#       FOR ANY ISSUES, CONTACT: [email protected]
#       https://github.com/eudoraleer/scasa/
##############################################################

Directory ./ already exists. Writing into existing directory..
mkdir: cannot create directory ‘.//SCASA_My_Project_20221014093239/’: File exists

Preparing for alignment..
Indexing reference..
Directory .//SCASA_My_Project_20221014093239/0PRESETS//REF_INDEX/ already exists. Writing into existing directory..
Version Info: ### PLEASE UPGRADE SALMON ###
### A newer version of salmon with important bug fixes and improvements is available. ####
###
The newest version, available at https://github.com/COMBINE-lab/salmon/releases
contains new features, improvements, and bug fixes; please upgrade at your
earliest convenience.
###
Sign up for the salmon mailing list to hear about new versions, features and updates at:
https://oceangenomics.com/subscribe
###[2022-10-14 09:32:40.162] [jLog] [warning] The salmon index is being built without any decoy sequences.  It is recommended that decoy sequence (either computed auxiliary decoy sequence or the genome of the organism) be provided during indexing. Further details can be found at https://salmon.readthedocs.io/en/latest/salmon.html#preparing-transcriptome-indices-mapping-based-mode.
[2022-10-14 09:32:40.162] [jLog] [info] building index
out : .//SCASA_My_Project_20221014093239/0PRESETS//REF_INDEX/
[2022-10-14 09:32:40.181] [puff::index::jointLog] [info] Running fixFasta

[Step 1 of 4] : counting k-mers

[2022-10-14 09:32:44.999] [puff::index::jointLog] [warning] Removed 236 transcripts that were sequence duplicates of indexed transcripts.
[2022-10-14 09:32:44.999] [puff::index::jointLog] [warning] If you wish to retain duplicate transcripts, please use the `--keepDuplicates` flag
[2022-10-14 09:32:44.999] [puff::index::jointLog] [info] Replaced 4 non-ATCG nucleotides
[2022-10-14 09:32:44.999] [puff::index::jointLog] [info] Clipped poly-A tails from 11,186 transcripts
wrote 76267 cleaned references
[2022-10-14 09:32:45.302] [puff::index::jointLog] [info] Filter size not provided; estimating from number of distinct k-mers
[2022-10-14 09:32:47.287] [puff::index::jointLog] [info] ntHll estimated 85097693 distinct k-mers, setting filter size to 2^31
Threads = 2
Vertex length = 31
Hash functions = 5
Filter size = 2147483648
Capacity = 2
Files:
.//SCASA_My_Project_20221014093239/0PRESETS//REF_INDEX/ref_k31_fixed.fa
--------------------------------------------------------------------------------
Round 0, 0:2147483648
Pass    Filling Filtering
1       24      56
2       4       0
True junctions count = 277411
False junctions count = 439286
Hash table size = 716697
Candidate marks count = 4699425
--------------------------------------------------------------------------------
Reallocating bifurcations time: 0
True marks count: 3337299
Edges construction time: 3
--------------------------------------------------------------------------------
Distinct junctions = 277411

allowedIn: 12
Max Junction ID: 318881
seen.size():2551057 kmerInfo.size():318882
approximateContigTotalLength: 66002535
counters for complex kmers:
(prec>1 & succ>1)=26025 | (succ>1 & isStart)=69 | (prec>1 & isEnd)=67 | (isStart & isEnd)=10
contig count: 433949 element count: 98078572 complex nodes: 26171
# of ones in rank vector: 433948
[2022-10-14 09:34:23.329] [puff::index::jointLog] [info] Starting the Pufferfish indexing by reading the GFA binary file.
[2022-10-14 09:34:23.329] [puff::index::jointLog] [info] Setting the index/BinaryGfa directory .//SCASA_My_Project_20221014093239/0PRESETS//REF_INDEX
size = 98078572
-----------------------------------------
| Loading contigs | Time = 5.2253 ms
-----------------------------------------
size = 98078572
-----------------------------------------
| Loading contig boundaries | Time = 2.9247 ms
-----------------------------------------
Number of ones: 433948
Number of ones per inventory item: 512
Inventory entries filled: 848
433948
[2022-10-14 09:34:23.457] [puff::index::jointLog] [info] Done wrapping the rank vector with a rank9sel structure.
[2022-10-14 09:34:23.459] [puff::index::jointLog] [info] contig count for validation: 433,948
[2022-10-14 09:34:23.575] [puff::index::jointLog] [info] Total # of Contigs : 433,948
[2022-10-14 09:34:23.575] [puff::index::jointLog] [info] Total # of numerical Contigs : 433,948
[2022-10-14 09:34:23.593] [puff::index::jointLog] [info] Total # of contig vec entries: 3,427,302
[2022-10-14 09:34:23.593] [puff::index::jointLog] [info] bits per offset entry 22
[2022-10-14 09:34:23.640] [puff::index::jointLog] [info] Done constructing the contig vector. 433949
[2022-10-14 09:34:23.727] [puff::index::jointLog] [info] # segments = 433,948
[2022-10-14 09:34:23.727] [puff::index::jointLog] [info] total length = 98,078,572
[2022-10-14 09:34:23.742] [puff::index::jointLog] [info] Reading the reference files ...
[2022-10-14 09:34:24.195] [puff::index::jointLog] [info] positional integer width = 27
[2022-10-14 09:34:24.195] [puff::index::jointLog] [info] seqSize = 98,078,572
[2022-10-14 09:34:24.195] [puff::index::jointLog] [info] rankSize = 98,078,572
[2022-10-14 09:34:24.195] [puff::index::jointLog] [info] edgeVecSize = 0
[2022-10-14 09:34:24.195] [puff::index::jointLog] [info] num keys = 85,060,132
for info, total work write each  : 2.331    total work inram from level 3 : 4.322  total work raw : 25.000
[Building BooPHF]  99.9 %   elapsed:   0 min 6  sec   remaining:   0 min 0  sec
Bitarray       445693632  bits (100.00 %)   (array + ranks )
final hash             0  bits (0.00 %) (nb in final hash 0)
[2022-10-14 09:34:29.716] [puff::index::jointLog] [info] mphf size = 53.1308 MB
[2022-10-14 09:34:29.743] [puff::index::jointLog] [info] chunk size = 49,039,286
[2022-10-14 09:34:29.743] [puff::index::jointLog] [info] chunk 0 = [0, 49,039,286)
[2022-10-14 09:34:29.743] [puff::index::jointLog] [info] chunk 1 = [49,039,286, 98,078,542)
[2022-10-14 09:34:38.224] [puff::index::jointLog] [info] finished populating pos vector
[2022-10-14 09:34:38.224] [puff::index::jointLog] [info] writing index components
[2022-10-14 09:34:38.357] [puff::index::jointLog] [info] finished writing dense pufferfish index
[2022-10-14 09:34:38.384] [jLog] [info] done building index
Finnished indexing reference..
Begins pseudo-alignment..
nohup: redirecting stderr to stdout
Congratulations! Pseudo-alignment has completed in 30 seconds!
Scasa quantification has started..
Begin Scasa quantification for sample Sample_01_S1_L001..
Error in file(con, "r") : cannot open the connection
Calls: readLines -> file
In addition: Warning message:
In file(con, "r") :
  cannot open file './/SCASA_My_Project_20221014093239/1ALIGN//Sample_01_S1_L001_alignout/alevin/bfh.txt': No such file or directory
Execution halted
Loading required package: iterators
Loading required package: parallel
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
  cannot open compressed file '/storage/data/GYD/Softwares/scasa/Test_Dataset/SCASA_My_Project_20221014093239/2QUANT/Sample_01_S1_L001_quant/Sample_eqClass.RData', probable reason 'No such file or directory'
Execution halted
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
Calls: load -> readChar
In addition: Warning message:
In readChar(con, 5L, useBytes = TRUE) :
  cannot open compressed file './/SCASA_My_Project_20221014093239/2QUANT//Sample_01_S1_L001_quant//scasa_isoform_expression.RData', probable reason 'No such file or directory'
Execution halted
Congratulations! Scasa single cell RNA-Seq transcript quantification has completed in 30 seconds!
All done!

from scasa.

nghiavtr commented on June 24, 2024

Hi @SoberDog,

Thank you for using Scasa. Your alevin.log indicates a conflict between the cDNA file and the transcript-gene mapping file (https://github.com/eudoraleer/scasa/blob/main/scasa/REFERENCE/txp2gene_hg38.tsv). I guess you may have downloaded the latest version of hg38, which differs slightly from the older one.
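A quick way to confirm this kind of conflict before re-running the pipeline is to compare the transcript IDs in the cDNA FASTA headers with the first column of the mapping file. The sketch below is not part of Scasa and its function names are hypothetical; it assumes the mapping file is a two-column, tab-separated transcript-to-gene table with no header row, and that the transcript ID is the first whitespace-delimited token of each FASTA header.

```python
from typing import Dict, Iterable, List


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the transcript ID (first token after '>') of each FASTA header."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def txp2gene_ids(lines: Iterable[str]) -> List[str]:
    """Return column 1 (transcript ID) of an assumed two-column TSV; blank lines are skipped."""
    ids = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0]:
            ids.append(fields[0])
    return ids


def compare_ids(fasta_lines: Iterable[str], tsv_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Report transcripts that appear in one file but not in the other."""
    in_fasta = fasta_ids(fasta_lines)
    in_map = set(txp2gene_ids(tsv_lines))
    return {
        # FASTA order is kept so the first offenders are easy to inspect
        "missing_in_map": [tx for tx in in_fasta if tx not in in_map],
        "missing_in_fasta": sorted(in_map - set(in_fasta)),
    }
```

To run it on real files, pass open file handles, e.g. gzip.open("refMrna.fa.gz", "rt") for the compressed FASTA and open("txp2gene_hg38.tsv") for the mapping (paths are placeholders). IDs listed under missing_in_map are transcripts the mapping file does not know about.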

The simplest solution is to use the cDNA FASTA version we suggested (download it here: https://www.dropbox.com/s/xoa6yl562a5lv35/refMrna.fa.gz).

If you really need to use the latest version, you will have to prepare a new txp2gene_hg38.tsv file and follow the instructions in the wiki (https://github.com/eudoraleer/scasa/wiki) to generate a proper X-matrix to replace the one in the scasa/REFERENCE folder.
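If you do prepare a new mapping file, the transcript-to-gene pairs can be read straight from the GTF attribute column. The following is a minimal sketch under stated assumptions, not the procedure from the Scasa wiki: it assumes the target format is two tab-separated columns (transcript ID, gene ID), the layout salmon alevin accepts for its --tgMap option, and it only produces the mapping, not the X-matrix. The function names are hypothetical.

```python
import re
from typing import Dict, Iterable

# key "value" pairs in GTF column 9, e.g. transcript_id "PB.10003.6.p1";
_ATTR = re.compile(r'(\w+)\s+"([^"]*)"')


def txp2gene_from_gtf(lines: Iterable[str]) -> Dict[str, str]:
    """Map each transcript_id to its gene_id; comments and lines lacking either are skipped."""
    mapping: Dict[str, str] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # GTF has 9 tab-separated columns. Splitting on any whitespace with
        # maxsplit=8 keeps the attribute column intact and also tolerates
        # copies whose tabs were converted to spaces.
        fields = line.rstrip("\n").split(None, 8)
        if len(fields) < 9:
            continue
        attrs = dict(_ATTR.findall(fields[8]))
        tx, gene = attrs.get("transcript_id"), attrs.get("gene_id")
        if tx and gene:
            mapping[tx] = gene
    return mapping


def format_txp2gene(mapping: Dict[str, str]) -> str:
    """Render the mapping as two tab-separated columns (transcript, gene), sorted by transcript."""
    return "".join(f"{tx}\t{gene}\n" for tx, gene in sorted(mapping.items()))
```

Exon and CDS lines repeat the same transcript/gene pair, so they simply overwrite the entry with identical values; gene lines carry no transcript_id and are ignored.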

Best,
Nghia

ThepeachYolado commented on June 24, 2024

Did you solve it? I had the same problem. Thanks!

nghiavtr commented on June 24, 2024

Hi @yangwh1998 @SoberDog,

The issues you are seeing were most likely caused by incorrect settings or an unsuitable runtime environment. For convenience, we have built a Docker image for Scasa, with instructions on how to run it on the example data here: https://github.com/eudoraleer/scasa/blob/main/README.md#using-docker-to-run-scasa

With Docker, you don't have to install the R packages, salmon, and Scasa separately. Users just need to revise the settings and inputs in docker_params.sh. I hope this makes Scasa easier to use.

Best,
Nghia

SoberDog commented on June 24, 2024

Hi @nghiavtr,
Thanks for deploying the Docker image; I will try it.
Best Regards,
Soberdog

oucstar commented on June 24, 2024

@nghiavtr, sorry to bother you, but I have run into a problem with your software that I have not been able to solve. I am setting up a new X-matrix for my species' annotation file, and when I run Rscript $scasaPath/../aux/gen_tx2gene.R cdna=$refFile gtf=$gtfFile sqlite=$sqliteFile cdnaout=$refCleanFile out=$txp2geneFile, it always fails with:
not in gtf: 87794
Error in S4Vectors:::normarg_names(value, class(x), length(x)) :
attempt to set too many names (1) on GroupedIRanges object of length 0
Calls: names<- -> names<- -> names<- -> names<- ->
This is my gtf file:

lg3 transdecoder gene 18426480 18430230 . + . gene_id "PB.10003";
lg3 transdecoder transcript 18426480 18429150 . + . transcript_id "PB.10003.6.p1"; gene_id "PB.10003";
lg3 transdecoder exon 18426480 18426570 . + . transcript_id "PB.10003.6.p1"; gene_id "PB.10003";
lg3 transdecoder CDS 18426482 18426570 . + 0 transcript_id "PB.10003.6.p1"; gene_id "PB.10003";
lg3 transdecoder exon 18426689 18426803 . + . transcript_id "PB.10003.6.p1"; gene_id "PB.10003";
lg3 transdecoder CDS 18426689 18426803 . + 1 transcript_id "PB.10003.6.p1"; gene_id "PB.10003";
lg3 transdecoder exon 18427439 18427627 . + . transcript_id "PB.10003.6.p1"; gene_id "PB.10003";
lg3 transdecoder CDS 18427439 18427627 . + 0 transcript_id "PB.10003.6.p1"; gene_id "PB.10003";
lg3 transdecoder exon 18428648 18429150 . + . transcript_id "PB.10003.6.p1"; gene_id "PB.10003";
lg3 transdecoder CDS 18428648 18428749 . + 0 transcript_id "PB.10003.6.p1"; gene_id "PB.10003";
lg3 transdecoder three_prime_UTR 18428750 18429150 . + . transcript_id "PB.10003.6.p1"; gene_id "PB.10003";
lg3 transdecoder transcript 18426689 18430230 . + . transcript_id "PB.10003.7.p1"; gene_id "PB.10003";
lg3 transdecoder exon 18426689 18426803 . + . transcript_id "PB.10003.7.p1"; gene_id "PB.10003";
lg3 transdecoder CDS 18426690 18426803 . + 0 transcript_id "PB.10003.7.p1"; gene_id "PB.10003";
lg3 transdecoder exon 18427439 18427627 . + . transcript_id "PB.10003.7.p1"; gene_id "PB.10003";
lg3 transdecoder CDS 18427439 18427627 . + 0 transcript_id "PB.10003.7.p1"; gene_id "PB.10003";
lg3 transdecoder exon 18428648 18430230 . + . transcript_id "PB.10003.7.p1"; gene_id "PB.10003";
lg3 transdecoder CDS 18428648 18428749 . + 0 transcript_id "PB.10003.7.p1"; gene_id "PB.10003";
lg3 transdecoder three_prime_UTR 18428750 18430230 . + . transcript_id "PB.10003.7.p1"; gene_id "PB.10003";

Could you take a look at it for me? I'd appreciate it if you could help!

nghiavtr commented on June 24, 2024

Hi @oucstar,

It looks like some transcripts in the cDNA file are not found in the GTF file. Can you send me both the cDNA FASTA and GTF files (via email to [email protected]) so that I can take a closer look?
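One way to see which transcripts are affected is to list the cDNA FASTA IDs that never appear as a transcript_id in the GTF. This is a hypothetical helper, not the logic of gen_tx2gene.R, so its count need not match the "not in gtf" figure that script prints. It assumes the FASTA ID is the first token after ">" and that GTF attributes use the transcript_id "..." form shown in the file above.

```python
import re
from typing import Iterable, List, Set

# Captures the value of transcript_id "..." anywhere on a GTF line.
_TX_ID = re.compile(r'transcript_id\s+"([^"]+)"')


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return the first whitespace-delimited token of each FASTA header."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                ids.append(tokens[0])
    return ids


def gtf_transcript_ids(lines: Iterable[str]) -> Set[str]:
    """Collect every transcript_id value in a GTF, ignoring comment lines."""
    ids: Set[str] = set()
    for line in lines:
        if line.startswith("#"):
            continue
        match = _TX_ID.search(line)
        if match:
            ids.add(match.group(1))
    return ids


def missing_from_gtf(fasta_lines: Iterable[str], gtf_lines: Iterable[str]) -> List[str]:
    """Return FASTA transcript IDs (in file order) that have no transcript_id in the GTF."""
    known = gtf_transcript_ids(gtf_lines)
    return [tx for tx in fasta_ids(fasta_lines) if tx not in known]
```

For example, open both files (e.g. open("cdna.fa") and open("annotation.gtf"); the names are placeholders), pass the handles to missing_from_gtf, then inspect the length of the result and its first few entries. A systematic difference, such as a suffix present in one file and absent in the other, shows up immediately in those entries.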

Nghia

oucstar commented on June 24, 2024

@nghiavtr, sorry to bother you, but did you mean to mention me?

nghiavtr commented on June 24, 2024

Yes, @oucstar, sorry, I mentioned the wrong person. Just send me your files and I will check them.

Nghia

oucstar commented on June 24, 2024

@nghiavtr, OK, I will send them to you after sorting them out. Thank you for your warm reply!