This repository contains a short description of the workflow for assembling and comparing genomes of copepods in the Leptodiaptomus sicilis group, which is undergoing ecological speciation.

License: MIT License

Shell 0.63% HTML 99.34% R 0.03%

genome-assembly-of-the-copepod-leptodiaptomus's Issues

denovo_map.pl: Aborted because the last command failed (Error: Unable to load data)

My problem is that I cannot get denovo_map.pl to finish running.

I am running Stacks on my local computer from a Docker container, using the script stacks.denovo_map.prueba.sh.

denovo_map.pl is the Stacks program that builds loci and calls SNPs de novo; it is used when there is no reference genome.

I want to run denovo_map.pl to identify SNPs in 93 samples, with 23 and 22 individuals from 4 populations. Before that, I wanted to run a test with 3 individuals from each population, but I could not complete the process even for these samples. The program starts to run, but when it moves on to the analysis of sample 2, the process stops and displays an abort message: denovo_map.pl: Aborted because the last command failed (1); see log file.

I first ran process_radtags, the program that demultiplexes the raw Illumina data and checks the reads for quality and for the cut sites of the restriction enzymes.

I ran the command as follows:

stacks process_radtags -P -p ../stacks/isuue2/GBS_raw/ --interleaved \
-b ../stacks/isuue2/barcodes_copes_iss.tsv -o ../stacks/isuue2/process_map_res/ \
-c -q -r --index_index --renz_1 mspI --renz_2 nsiI

The following image is an example of the raw data format and this is the barcode file that I use.

The output of process_radtags consists of 4 different files; the .1 and .2 files are the ones used to run denovo_map.pl.

I then ran denovo_map.pl:

stacks denovo_map.pl --samples ../stacks/isuue2/process_map_res/ \
--popmap ../stacks/isuue2/popmap_tarea_issue.tsv -o ../stacks/isuue2/denovo_map_re2 \
-M 3 -n 2 -m 3 -X "populations: -r 0.50 --min_maf 0.01 --genepop"

Here is the population map file I use.

The run then stops with the following message: denovo_map.pl: Aborted because the last command failed (1); see log file.

I looked into the error and saw that it could be caused by my computer's memory, but I also ran it on a cluster and got the same error. I also saw that it could be caused by the sample IDs, but I tried changing them and the process still does not finish. I am still confused, because I do not know whether I am using some command or parameter incorrectly.
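One thing that may be worth ruling out for the "Unable to load data" message is a mismatch between the sample names in the population map and the read files in the --samples directory. The sketch below is not part of Stacks: check_popmap_inputs is a hypothetical helper, and it assumes a tab-separated population map (sample name, then population) and paired read files named <sample>.1.fq.gz and <sample>.2.fq.gz. Adjust the suffix if the files are named differently.

```shell
#!/bin/sh
# check_popmap_inputs POPMAP SAMPLES_DIR
# Prints every expected read file that is missing or empty, and returns
# non-zero if at least one was found.
# Assumption: reads are named <sample>.1.fq.gz and <sample>.2.fq.gz.
check_popmap_inputs() (
    popmap=$1
    samples_dir=$2
    missing=0
    tab=$(printf '\t')
    cr=$(printf '\r')
    # Read "sample<TAB>population" lines, including a last line without newline.
    while IFS=$tab read -r sample pop || [ -n "$sample" ]; do
        # Drop a trailing carriage return left by Windows line endings.
        sample=${sample%"$cr"}
        # Skip blank lines and comment lines.
        case $sample in ''|\#*) continue ;; esac
        for mate in 1 2; do
            file=$samples_dir/$sample.$mate.fq.gz
            if [ ! -s "$file" ]; then
                echo "missing or empty: $file"
                missing=$((missing + 1))
            fi
        done
    done < "$popmap"
    [ "$missing" -eq 0 ]
)
```

It could be called as check_popmap_inputs ../stacks/isuue2/popmap_tarea_issue.tsv ../stacks/isuue2/process_map_res/ before launching denovo_map.pl. Any path it prints is a file that the population map implies but the directory does not contain; an empty result does not prove the inputs are fine, it only rules out this one cause.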

Describe which reads you use (Illumina, Nanopore)

Difficulties to use databases running BLAST

Do you need to create databases or is there a better option?

I created a database for copepods:

makeblastdb -in ../blastpacope/genomas_copepods/ncbi_dataset/data/db_all_copepods.fna -dbtype nucl -parse_seqids -out my_refrence2.fa

But the resulting database files are very large, and running BLAST against them is very slow.

To run BLAST I use:

blastn -db my_refrence2.fa -query ../blastpacope/minion/minion_carmen.fa -out results_allgenomes_tab.out -outfmt "6 sframe qseqid sseqid evalue pident mismatch"

The option -outfmt "6 <options>" writes the results as a tab-separated table. Although it is hard to see, the percent identity column is generally above 80%, so I think the sample may not be very contaminated. However, running BLAST against many databases will take longer and be more difficult, so I would like to know whether anyone knows an easier way to run BLAST.

Or would it be a better option to start testing assemblers with different parameters? The first draft assembly I have seems to be fragmented.
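Since the percent identity is hard to judge by eye in the tabular output, a small summary may help. The sketch below assumes the column order given in the -outfmt string of the blastn command, where pident is the fifth column. summarize_pident is a hypothetical helper name, and the default threshold of 80 is only the figure mentioned in this issue, not a recommended cutoff.

```shell
#!/bin/sh
# summarize_pident FILE [COLUMN] [THRESHOLD]
# Reports the number of hits, the mean percent identity, and how many hits
# are at or above THRESHOLD in a tab-separated BLAST table.
# Assumption: pident is in column 5, as in the -outfmt string used here.
summarize_pident() {
    awk -F '\t' -v col="${2:-5}" -v min="${3:-80}" '
        # Only count rows that actually have the requested column.
        NF >= col { n++; sum += $col; if ($col >= min) above++ }
        END {
            if (n == 0) { print "no hits"; exit 1 }
            printf "hits=%d mean_pident=%.2f at_or_above_%s=%d\n", n, sum / n, min, above
        }' "$1"
}
```

For the file produced above this would be summarize_pident results_allgenomes_tab.out. Passing a different column number as the second argument covers other -outfmt layouts.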

Database download in KRAKEN2

I want to find out whether common bacteria or viruses are present in my sequences, and for this I would like to use KRAKEN2 and its databases.

I am running KRAKEN2 from Docker, using the image tbattaglia/kraken2:latest.

The program itself runs correctly, but I could not download the databases, although I tried to follow the manual.

This is the help output of kraken2-build:

Usage: kraken2-build [task option] [options]

Task options (exactly one must be selected):
  --download-taxonomy        Download NCBI taxonomic information
  --download-library TYPE    Download partial library
                             (TYPE = one of "archaea", "bacteria", "plasmid",
                             "viral", "human", "fungi", "plant", "protozoa",
                             "nr", "nt", "env_nr", "env_nt", "UniVec",
                             "UniVec_Core")
  --special TYPE             Download and build a special database
                             (TYPE = one of "greengenes", "silva", "rdp")
  --add-to-library FILE      Add FILE to library
  --build                    Create DB from library
                             (requires taxonomy d/l'ed and at least one file
                             in library)
  --clean                    Remove unneeded files from a built database
  --standard                 Download and build default database
  --help                     Print this message
  --version                  Print version information

Options:
  --db NAME                  Kraken 2 DB name (mandatory except for
                             --help/--version)
  --threads #                Number of threads (def: 1)
  --kmer-len NUM             K-mer length in bp/aa (build task only;
                             def: 35 nt, 15 aa)
  --minimizer-len NUM        Minimizer length in bp/aa (build task only;
                             def: 31 nt, 12 aa)
  --minimizer-spaces NUM     Number of characters in minimizer that are
                             ignored in comparisons (build task only;
                             def: 7 nt, 0 aa)
  --protein                  Build a protein database for translated search
  --no-masking               Used with --standard/--download-library/
                             --add-to-library to avoid masking low-complexity
                             sequences prior to building; masking requires
                             dustmasker or segmasker to be installed in PATH,
                             which some users might not have.
  --max-db-size NUM          Maximum number of bytes for Kraken 2 hash table;
                             if the estimator determines more would normally be
                             needed, the reference library will be downsampled
                             to fit. (Used with --build/--standard/--special)
  --use-ftp                  Use FTP for downloading instead of RSYNC; used with
                             --download-library/--download-taxonomy/--standard.
  --skip-maps                Avoids downloading accession number to taxid maps,
                             used with --download-taxonomy.

I used this command line: kraken2-build --standard --db "bacteria"

But I get this error:

Downloading taxonomy tree data.../kraken2-2.0.8-beta/download_taxonomy.sh: line 27: rsync: command not found

I think the rsync command is needed to give data download permissions, but I am not sure whether I have to install it in the Docker image, or whether I only need to grant some permission from my computer or within the image itself.
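The "command not found" message indicates that the rsync program is not installed inside the container, rather than a permissions problem. According to the help text above, --use-ftp makes kraken2-build download over FTP instead of rsync, so one option is to add that flag whenever rsync is absent; the other is to install rsync in the image. A minimal sketch of the first option, where kraken2_download_args is a hypothetical helper and my_kraken_db is a placeholder database name:

```shell
#!/bin/sh
# kraken2_download_args: print "--use-ftp" when rsync is not on PATH,
# and print nothing when rsync is available.
kraken2_download_args() {
    if ! command -v rsync >/dev/null 2>&1; then
        printf '%s\n' "--use-ftp"
    fi
}

# Example use inside the container (placeholder database name, not run here):
#   db=my_kraken_db
#   kraken2-build --download-taxonomy --db "$db" $(kraken2_download_args)
#   kraken2-build --download-library bacteria --db "$db" $(kraken2_download_args)
#   kraken2-build --build --db "$db"
```

The commented commands use only task options listed in the help output. Whether the FTP route works from a given network is something to verify.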

better explain metadata

Your metadata directory should contain only metadata, but it currently also holds images and other files related to issues or class examples. Please clean this up and keep a separate directory for non-metadata files.

Comments to improve repo organization, scripts and final project

Your repo is looking good, but the following points need attention:

README: It would be clearer if you briefly stated the order in which your scripts need to be run. Note that every script in /bin should be mentioned in the README, which is currently not the case.
README: Not all of the programs you used are listed under "Software versions used"; for instance, bwa is missing.
README: Specify whether your Illumina data is single-end or paired-end, and whether it was already demultiplexed.
README: Briefly explain the contents of every file in /meta-data. For files with columns, briefly state what each column means.
Script DeNovo_assembly.sh line 43 and script pilon.sh line 6: mkadir is not a command; correct it to mkdir. Since the directories were not being created, also check that this was not causing an error.
Scripts: When you use mkdir in a script, it is good practice to add the -p flag so that the script can be run several times without failing because the directory already exists. Please use mkdir -p in all of your scripts that currently use mkdir.
Don't forget to add a short summary explaining your main results; this can be part of the README or a separate md file.
Don't forget to add an R figure and the code used to make it. This can be a simple plot of your number of reads or of any data you already have.
When making commits, don't forget to add short, relevant messages indicating what you changed.
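The mkdir -p recommendation can be sketched as follows. make_output_dirs is a hypothetical helper, and the directory names in the usage line are placeholders rather than the ones used in the repository scripts.

```shell
#!/bin/sh
# make_output_dirs DIR...
# Creates each directory, including missing parents. Because of -p, an
# existing directory is not an error, so the script can be rerun safely.
make_output_dirs() {
    for dir in "$@"; do
        mkdir -p "$dir" || return 1
    done
}
```

A script would then start with something like make_output_dirs ../results/assembly ../results/pilon, and running it a second time leaves the existing directories untouched.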

javierurban / genome-assembly-of-the-copepod-leptodiaptomus
