gtz's People

Contributors

dengnan26567, kjq38159, xuxiali13, zhaolx01

gtz's Issues

fatal error:invalid gz file,exit

I hit this error while decompressing an ordinary gz file. The file itself is fine, just large (25 GB). Could you look into it?

$ ls -sh /resdata/SCT_DATA/data_deliver/JY_combined_R2.fastq.gz
25G /resdata/SCT_DATA/data_deliver/JY_combined_R2.fastq.gz
$ zcat /resdata/SCT_DATA/data_deliver/JY_combined_R2.fastq.gz | head
@E00552:208:HNJFYCCXY:8:1101:2351:1379 2:N:0:NGAGGCTG
NATCAGAATGAGCTGGTGGGAACCTTGGGCAGCCAAACGGAGCGGCGTTCTGCACCATGTCCCATCCAGTGCTGCGAATCCACGCCCCGCAGCCCTGCCCCCCCGCGACAGCTCACACCATGGCTCGAGGACAAGGTGTTATCCCGACAC
+
#---<A-FJF-F-7--7-7-7-7F<F-<J---7AAAF-7---A-F--7-7-7-<-7A-7AJJF-AA<7-7-7-A--7---7<A---7--7A<7-7---7-7F7-)-)7)7-A<)--AF----))-)-))))--<-----7---7-)))))
@E00552:208:HNJFYCCXY:8:1101:2392:1379 2:N:0:NGAGGCTG
NCTCATCCCAGCAGGCCCTCCCTTAGCTGAGGGAATTCTTTTTCCCCTCCCTCCACCGACAAATATTGACAGGCACCCACCGAGGATGTGCAGAGCTCAGCCGCGGCTGCGGGGACTCAATTTGCAACAGACATGGACTCCCCCCTCACG
+
#<-A-----7----7----7F---<-7-<--77-<--A<--7-<7<A--A7-7A7AJ<----7---7<--<FF-77-7-<J-A-77-7FAA----7-7A--<-)-)))-))))))-)7---------7------7--7-)))))))-)))
@E00552:208:HNJFYCCXY:8:1101:2514:1379 2:N:0:NGAGGCTG
NAAGAATTTCAAAGCCTTCGCTAGTCTCCGTATGGCCCGTGCCAACGCCCGGCTCTTCGGCATACGGGCAAAAAGAGCCAAGGAAGCCGCAGAACAGGATGTTGAAAAGAAAAAATAAAGCCCTCCTGGGGACTTGGAATCAAAAAAAAA

can you add a new feature

If I have many fq.gz files in a directory, I would like to be able to run:
gtz *.gz --ref /data.genome.fa
Of course, I can also achieve this with a shell for loop.
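Until gtz accepts multiple inputs, the for loop mentioned above is a workable stopgap. This is a minimal POSIX sh sketch that assumes the 'gtz FILE --ref FASTA' form from this request. 'compress_all' is a hypothetical helper name, and '/data.genome.fa' is the requester's example path.

```shell
#!/bin/sh
# Hypothetical helper (not a gtz feature): compress every *.gz file in the
# current directory with gtz, one file at a time, using the
# "gtz FILE --ref FASTA" form from the request above.
REF=/data.genome.fa   # the requester's example path; replace with your own

compress_all() {
  ref=${1:-$REF}
  rc=0
  for f in *.gz; do
    # With no matches the shell keeps the literal "*.gz"; skip it.
    [ -e "$f" ] || continue
    if ! gtz "$f" --ref "$ref"; then
      echo "gtz failed on $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Usage: run 'compress_all' or 'compress_all /path/to/genome.fa'. Files are processed one after another rather than in parallel, because another report on this page notes that a single gtz process already uses all available cores.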

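BAM files produced by the default hisat2 workflow cannot be compressed with gtz

Reference file: genome.fa (hg19)
Download: https://hgdownload.soe.ucsc.edu/goldenPath/hg19/bigZips/chromFa.tar.gz

Reference used for the hisat2 run: hg19 genome_tran
Download: https://ccb.jhu.edu/software/hisat2/manual.shtml

I generated the BAM file with the default hisat2 workflow (not with hisat2-gtz). When compressing it with gtz, I get the following error: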


prepare compression... 100%, cost 23s (18|5)
Enabling high-rate compression mode with /home/.config/gtz/genome.fa-D2A70550489DE356A2CD6BFC40711204.bam.rbin2( hardware speedup )
[                                                  ] 0%

RNAME "1" was not found in reference file: /home/Erythropoiesis_APA/HSC_APA/hsc_apa1/04-salmon/00-file/genome.fa
error: the reference file was detected to not match this BAM file!

'gtz -c' doesn't work

Hi
I tried 'gtz file.gz -c', but the output was written to a file (file.gz.gtz). With -c, gtz should write to stdout instead. Could you investigate this issue?
I am using gtz version PROFESSIONAL-3.0.0-V-2020-12-09 01:58:41.

Best regards,
Piroon

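Technical question about the ecosystem software

Hello, I'd like to ask how gtz is integrated with its ecosystem software.

My guess:
In the source code of tools such as bwa and tophat, you changed the input handling so that raw data detected as gtz format is decompressed first. After decompression, the data is in the fastq(.gz) format that the original tool already accepts.
Or were other parts modified as well?
Thanks!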

Urgent: error needs to be fixed

If I open two terminals on the same server at the same time and run 'gtz' in one of them, gtz cannot be run from the other.
Error message: -bash: gtz: command not found

How can I tell whether a compressed file is complete?

For example, suppose I am compressing A.fq.gz to A.fq.gz.gtz and a problem stops the job before it finishes normally. A.fq.gz.gtz has still been created, and its size alone does not tell us whether it is complete. Is there a tool that can tell me whether a compressed file is complete?

Suppose I compress a file with gtz and the process is cut off near the end without my noticing. The .gtz file has already been created. Although compression never finished, its size is close to normal. How can I check whether this .gtz file is complete?
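One generic way to avoid this ambiguity, independent of gtz, is to write the output under a temporary name. The file gets its final .gtz name only after gtz exits successfully. This POSIX sh sketch makes two assumptions: gtz returns a non-zero exit status when it fails, and it accepts 'gtz INPUT -o OUTPUT' as in the commands in other reports on this page. 'safe_gtz' is a hypothetical name. The wrapper does not check the compressed contents. It only guarantees that a file with the final name came from a run that exited cleanly.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of gtz): the output only receives its final
# .gtz name after gtz exits successfully.
# Assumptions: gtz returns a non-zero exit status on failure and accepts
# "gtz INPUT -o OUTPUT", as in the commands reported on this page.
safe_gtz() {
  src=$1
  final="$src.gtz"
  name=$(basename "$final")
  # Work directory next to the final file, so the mv below is a rename
  # within one filesystem.
  work=$(mktemp -d "$final.tmp.XXXXXX") || return 1
  if gtz "$src" -o "$work/$name"; then
    mv "$work/$name" "$final"
    rc=$?
  else
    rc=$?
    echo "gtz failed on $src (exit $rc); no $final written" >&2
  fi
  rm -rf "$work"
  return "$rc"
}
```

Usage: 'safe_gtz A.fq.gz' produces A.fq.gz.gtz only on success. If the shell itself is killed mid-run, the leftover A.fq.gz.gtz.tmp.* directory also marks the run as unfinished.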

gtz uses all the available cores

I'm testing gtz. It seems that gtz uses all the cores of my computer. It would be better if users could specify the number of threads gtz uses.
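Until a thread option exists, one Linux-only workaround is to restrict which CPUs gtz may run on, using taskset from util-linux. This is a sketch, not a gtz feature. gtz may still start one thread per detected core, but those threads are confined to the listed CPUs. 'run_gtz_on_cpus' is a hypothetical helper name.

```shell
#!/bin/sh
# Workaround sketch, not a gtz option: confine gtz to a CPU list with
# taskset (util-linux, Linux only). gtz may still start one thread per
# detected core, but those threads can only be scheduled on the listed CPUs.
run_gtz_on_cpus() {
  cpus=$1     # CPU list in taskset syntax, e.g. "0-7" or "0,2,4"
  shift       # the remaining arguments are passed to gtz unchanged
  taskset -c "$cpus" gtz "$@"
}
```

Usage: 'run_gtz_on_cpus 0-7 RNAseq_R1.fastq.gz -o RNAseq_R1.fastq.gz.gtz' limits gtz to the first eight CPUs.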

STAR with gtz is not working

According to the gtz README, STAR should work with gtz-compressed files, but it does not.

We downloaded the latest version of gtz and compressed a pair of RNA-seq files as follows, without a reference:
$ gtz RNAseq_R1.fastq.gz -o RNAseq_R1.fastq.gz.gtz
$ gtz RNAseq_R2.fastq.gz -o RNAseq_R2.fastq.gz.gtz

We then ran STAR directly on the compressed files:
$ STAR --genomeDir /path/to/star_index --readFilesIn RNAseq_R1.fastq.gz.gtz RNAseq_R2.fastq.gz.gtz --readFilesCommand gtz -r
but it did not work.
Which version of STAR is required for gtz-compressed files?

-- Jang-il Sohn

Installer modifies user's .bashrc without asking or warning the user

Running the GTZ installer "https://gtz.io/gtz_latest.run" modifies the user's .bashrc file. This change to the user's configuration happens silently, without asking, warning, or notifying the user.

A few reasons why this is a bad idea:

  1. It's extremely rude to silently mess with the user's configuration files.
  2. The user may prefer another shell, such as zsh (please don't get any ideas).
  3. The user may prefer to run gtz using the full path, or to configure their own aliases or symlinks.
  4. The user may already have another command with the same name.

Fortunately, the "&& source ~/.bashrc" in the installation instructions gives it away.

This happens as of GTZ version "PROFESSIONAL-2.1.3-V-2020-03-18 07:11:20"

GTX.Zip Professional stops working after 6 months

An installed copy of GTX.Zip Professional stops working after 6 months (I'm not sure whether this is exact or approximate). An expired GTZ shows this message:

Powered by GTXLab of Genetalks. (built in PROFESSIONAL-2.1.2-V-2019-11-13 01:02:13 )
Warning:Invalid certificate!
Warning:The expiration date is:20200511
Warning:Please update the program from https://github.com/Genetalks/gtz .If you would like to use an unrestricted version, please contact [email protected] .

Upon seeing this message, the user is expected to download and install the latest version. (It's also mentioned in the license).

Why is this a problem for any serious use of GTX.Zip Professional?

  1. Someone may use GTX.Zip Professional as part of a data analysis system. Such a system (possibly consisting of dozens of software tools) is tested and deployed in a production environment. It's easy to miss the expiration notice in the EULA when building such a system. A few months later, the system suddenly stops working. By then the people who designed the system may be unavailable, and the cost of investigation, fixes, and downtime may be high.

Therefore, GTX.Zip Professional is currently not suitable for data analysis pipelines (i.e., in industry).

  2. Many journals require reproducible data analysis protocols when publishing results. A data analysis protocol must include the versions of all software used. However, the exact version of GTZ used in the protocol will no longer be available for download by the time the paper is out. Even if a reader has the same version, it will have expired and stopped working by then.

Therefore GTX.Zip Professional is useless for reproducible science.

I don't know what benefits expiration brings to GTZ developers to make it worth rendering it useless for both science and industry. Perhaps it is intended to motivate the user to purchase a commercial license. This by itself is OK, the problem is that the current README.md never mentions the expiration.

To avoid misleading the users, freely downloadable GTX.Zip Professional must be clearly marked as "trial" and "6 month expiration" should be prominently mentioned in the README.md.

Suggest a parameter to decompress a gz file directly to fastq.

Here are some things I noticed:

  • gtz makes full use of multiple CPU cores when compressing/decompressing
  • there is a parameter --gz

So, as the title says: is there a plan to add a function to gtz that decompresses gz files to fastq using multiple cores?

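Question about compression ratio

I used the Professional edition to compress the provided sample.fq file. After compressing against a FASTA reference, the compressed file is about 10% of the original size (233007202 / 2183810290 bytes ≈ 10.7%), not the 2% stated in the introduction. What do I need to do to achieve the best compression ratio?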

$ gtz sample_bak.fq --ref GCF_000001405.37_GRCh38.p11_genomic.fna.gz
......
$ ls -l
-rw-r--r-- 1 charles charles 2183810290  6月 28 20:25 sample_bak.fq
-rw-r--r-- 1 charles charles  233007202  6月 29 10:35 sample_bak.fq.gtz

how to use standard input?

I tried the following command, but it didn't work:

zcat sample_*_R1_*.fastq.gz | ./gtz -o sample_R1.fastq.gtz

output:

Powered by GTXLab of Genetalks. (built in PUBLIC-1.0-V-2018-08-06 02:48:41 )
Compressor initializing ... 
nothing to compress.

weird gtz behavior

I'm trying to compress a fastq file with gtz. However, I got three completely different outcomes just by changing how the output is specified:

The first one was a success!
cat DMS_273.2_1.fastq | gtz -o ./G20481.DMS_273.2_1.fastq.gtz
Powered by GTXLab of Genetalks.
Compressor initializing ...
compressing ...
id: 375442529 / 3183320019
base: 819648004 / 8518710584
quality: 2158646143 / 8518710584
() source/compressed : 20468858971/3353746175. ratio : 16.385%
The cost time of compressing () is 00:06:11 (hh::mm:ss)

Compress finished, the total cost time is 00:06:11 (hh:mm:ss)

real 6m14.742s
user 134m18.192s
sys 1m8.160s
####################
This one caused an immediate error
cat DMS_273.2_1.fastq | gtz -o G20481.DMS_273.2_1.fastq.gtz
Powered by GTXLab of Genetalks.
Compressor initializing ...
gtz: line 8: 47524 Segmentation fault (core dumped) $basepath/_gtz $@

real 0m0.198s
user 0m0.004s
sys 0m0.000s
#####################
This one was aborted
cat DMS_273.2_1.fastq | gtz -c > G20481.DMS_273.2_1.fastq.gtz
Powered by GTXLab of Genetalks.
Compressor initializing ...
compressing ...
id: 375442529 / 3183320019
base: 819648212 / 8518710584
quality: 2158646143 / 8518710584
() source/compressed : 20468858971/3353746396. ratio : 16.385%
The cost time of compressing () is 00:07:01 (hh::mm:ss)

Compress finished, the total cost time is 00:07:02 (hh:mm:ss)
terminate called without an active exception
gtz: line 8: 23992 Aborted (core dumped) $basepath/_gtz $@

real 7m6.006s
user 136m5.360s
sys 1m6.200s

#############
For comparison, the same file compressed with pigz:
cat DMS_273.2_1.fastq | pigz -p 4 -c > G20481.DMS_273.2_1.fastq.gz

real 6m4.838s
user 23m46.496s
sys 0m29.420s

#############
These are the resulting files:
3.2G Apr 4 19:55 G20481.DMS_273.2_1.fastq.gtz
5.1G Apr 4 19:36 G20481.DMS_273.2_1.fastq.gz

The compression ratio is very good.

broken pipe issue

Hi,
I'm testing the gtz program.

While it was running, the job failed because of a broken pipe:

bwa-gtz mem -t 12 -M -R "@rg\tPL:Illumina\tID:A00552\tSM:NA12878\tLB:NA12878" genome.fa NA12878_R1.fastq.gtz NA12878_R2.fastq.gtz | samtools view -bSu - | samtools sort -@ 22 -m 8G - /gtz_test/NA12878_gtz_for_bwa_use_ref_Thead12.sorted

The error log is as follows:

[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (22, 336272, 211, 7)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (177, 235, 305)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 561)
[M::mem_pestat] mean and std.dev: (246.27, 108.03)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 689)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (335, 386, 447)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (111, 671)
[M::mem_pestat] mean and std.dev: (392.79, 84.98)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 783)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (18, 37, 70)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 174)
[M::mem_pestat] mean and std.dev: (45.20, 36.32)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 226)
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_process_seqs] Processed 794702 reads in 386.185 CPU sec, 31.998 real sec
qsub_script_noSGE.sh: line 7: 46647 Done(141) bwa-gtz mem -t 12 -M -R "@rg\tPL:Illumina\tID:A00552\tSM:NA12878\tLB:NA12878" genome.fa NA12878_R1.fastq.gtz NA12878_R2.fastq.gtz
46648 Broken pipe | samtools view -bSu -
46649 Killed | samtools sort -@ 22 -m 8G - /gtz_test/NA12878_gtz_for_bwa_use_ref_nosge_Thead1.sorted

If you know how to solve it, I would appreciate it if you let me know.

Compress bam file

Hi,

I tried to compress a bam file with the following code:

~/software/gtz/GTX.Zip/gtz CO43.bam

But I encountered this issue:

Powered by GTXLab of Genetalks. (built in STANDARD-v4.0.3 build time: 14:49:47 Sep 13 2022 license: v1.1.1 build 09132022.021212)

Edition
License expiration time: 2024-06-02 19:45:43

Compression capacity limit:
Total: 1 TB
Used: 1 GB 577.39 MB
Remaining: 1022 GB 446.61 MB

Start compression: 1 of 1
FileName: CO43.bam, CompressType: bam, Threads: 96, Verify: No

[ERROR] Catch signal 11, clear and exit
[ERROR] Exit with exception, remove temp file

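gtz v1.2.2: verification hangs when compressing in bin mode

Hello, I'm currently using gtz to compress raw wheat data. When the file is verified after compression, the process occasionally gets stuck at the verification step (so far seen only with v1.2.2, twice). top shows gtz still running, but the verification step has taken more than 1700 min, so I have to kill the process manually.
I can't yet tell what causes this, because compressing the same data again passed verification. I'm just reporting the problem I ran into.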

GTZ stuck on .bam file after 72%

gtz gets stuck, even after several retries. Am I doing something wrong?
Here are the details:

I ran the following command:
$ nohup /home/XX/.config/GTZ/gtz possorted_genome_bam.bam --ref ../ref/cellranger_custom_hg38_ref_with_full_car_and_5utr.tar.gz &

The percentage kept increasing and then stopped at this message (taken from the nohup.out file):

first time use this Fasta, need convert it to binary... 72%

I have deleted the output .gtz file and started again several times, but it still gets stuck.
This is what the working directory looks like:

XX@HH:/mnt/disks/sdb/bamfile$ ls -l
total 273824532
-rw------- 1 XX XX 1840055 Jul 15 19:54 nohup.out
-rw-r--r-- 1 XX XX 280394439771 May 19 00:20 possorted_genome_bam.bam
-rw-rw-r-- 1 XX XX 108 Jul 15 19:45 possorted_genome_bam.bam.gtz
XX@HH:/mnt/disks/sdb/bamfile$ ls -l ../ref
total 11120380
-rw-rw-r-- 1 XX XX 11387261138 Jul 15 07:50 cellranger_custom_hg38_ref_with_full_car_and_5utr.tar.gz

fastqc-gtz error

Hi,

I always get the following errors when running it on *.fq.gtz files:

ERROR:Error: test.1.fq.gtz format error!(magic num error)
Started analysis of test.1.fq.gtz
Please waiting...
Analysis complete for test.1.fq.gtz
Failed to process file test.1.fq.gtz
java.lang.ArrayIndexOutOfBoundsException: -1
at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.calculateDistribution(SequenceLengthDistribution.java:101)
at uk.ac.babraham.FastQC.Modules.SequenceLengthDistribution.raisesError(SequenceLengthDistribution.java:190)
at uk.ac.babraham.FastQC.Report.HTMLReportArchive.startDocument(HTMLReportArchive.java:336)
at uk.ac.babraham.FastQC.Report.HTMLReportArchive.<init>(HTMLReportArchive.java:84)
at uk.ac.babraham.FastQC.Analysis.OfflineRunner.analysisComplete(OfflineRunner.java:178)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:110)
at java.lang.Thread.run(Thread.java:748)

gtz itself is able to decompress the gtz files.

Any solutions? Thank you.

Jim

Certificate Expires

Same scenario as issue #20: the certificate for V3.0.1 has expired, while a newer version is unavailable...

GTZ failed during decompression process

This tool has very serious errors and perhaps should not have been released yet. I compressed my datasets with it and deleted the raw files, but the tool cannot decompress its own outputs. Fortunately, I only used public datasets this time. I can't imagine what would happen if it were applied to your own sequencing datasets without any other backups.

Here are the accession numbers of the libraries used:
ERR1276823
ERR1276808
ERR1276794
ERR1276796
ERR1276778
ERR1276771
ERR1276761
ERR1276759
ERR1276757
ERR1276743
ERR1276741
Almost all of them produce outputs of either zero or 250Mb.

I hope you can fix these problems.

Thanks for all your efforts of developing this tool.

Permitted to create Bioconda recipe?

Hello, I'm interested in creating a Bioconda recipe for GTZ, but I can't quite tell if the license permits this.

I noticed the license is very similar to a BSD-2 license, just without permission to redistribute with modification. Would a Bioconda recipe that takes the gtz binary, throws away the bundled lib directory, and instead installs Python 2.7 through Conda count as modification and therefore not be permitted?

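Could you add an index for pig (Sus scrofa)?

I previously used gtz to compress 20 TB of pig data, with a compression ratio of 16% (28% for gz). I'd like to try the current high-ratio compression mode. Would your team be able to add a pig index? Thanks!
The pig genome can be downloaded from: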
ftp://ftp.ensembl.org/pub/release-94/fasta/sus_scrofa/dna/Sus_scrofa.Sscrofa11.1.dna.toplevel.fa.gz