jcventerinstitute / pangenomepipeline Goto Github PK


PanGenomePipeline

License: GNU General Public License v2.0

Ruby 3.71% Perl 82.93% R 3.00% Shell 0.26% Pep8 1.83% Python 5.94% Dockerfile 0.08% Raku 0.47% C 1.79%

pangenomepipeline's People

Contributors

Stargazers

Watchers

Forkers

inmanjm erinbeck gregducq grangersutton jstnchou anshubhardwajcri wangdi2014 chandana277 wook2014 nbel15

pangenomepipeline's Issues

Why does the core_clusters.txt file contain blank cells in genome columns (columns >=8)?

Hi, I'm a relatively new user of PanOCT. In viewing some of the output files, I noticed that in the core_clusters.txt file, while the vast majority of rows have no blank cells in the genome columns (columns >= 8), a minority of rows have some blank cells. These empty cells seem to always fall in rows where the "attributes" column contains an "FS", "FG-In", or "FN".

Given the description of core_clusters.txt file in the OUT_FILE_DESCRIPTIONS.txt file ("core_clusters.txt: Tab delimited file. Lists only those clusters which have a representative from every genome in the analysis."), I'm confused how the core_clusters.txt file could have any empty cells.

I apologize if this is a novice question, but I can't seem to connect the dots by reading through the documentation on this site, the original 2012 PanOCT paper, or the 2018 pangenome pipeline paper.

Any guidance would be greatly appreciated. I have attached my core_clusters.txt file for reference.

core_clusters.txt
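To narrow down which rows are affected, a short script can list every row that has an empty cell in the genome columns. This is only a minimal sketch: it assumes the file is tab-delimited with genome columns starting at column 8 (as described above), the function name is made up for illustration, and it does not know whether the first row is a header.

```python
def rows_with_blank_genome_cells(lines, first_genome_col=8):
    """Map 1-based row numbers to the 1-based genome columns that are empty.

    Assumes tab-delimited input whose genome columns start at
    `first_genome_col` (column 8 in the question above).
    """
    result = {}
    for row_number, line in enumerate(lines, start=1):
        # Strip only the line ending so trailing empty cells are preserved.
        cells = line.rstrip("\r\n").split("\t")
        blanks = [
            col
            for col, cell in enumerate(cells, start=1)
            if col >= first_genome_col and cell.strip() == ""
        ]
        if blanks:
            result[row_number] = blanks
    return result


# Hypothetical two-row example: 7 leading columns, then 3 genome columns.
sample = [
    "\t".join(["1"] + ["x"] * 6 + ["g1", "g2", "g3"]) + "\n",
    "\t".join(["2"] + ["x"] * 6 + ["g1", "", "g3"]) + "\n",
]
print(rows_with_blank_genome_cells(sample))  # {2: [9]}
```

With a real file, pass the open file handle instead of `sample`, for example `rows_with_blank_genome_cells(open("core_clusters.txt"))`, and compare the reported rows against the attributes column to see whether they all carry "FS", "FG-In", or "FN".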

Problems with Quickstart

I use Docker on a Mac, but I have problems running it with the Quickstart commands. Are any changes needed when using a Mac?
In this step, I have problems with yum:

Build the docker image:
docker build -t jcvipangenome .

Problems with "running run_panoct.pl at line 1174"

Input file for script 'plot_pangenome.R'

Could anyone show an example or share a test file that can be used as input for the script 'plot_pangenome.R'?

Problems with "running annotation script at line 704" and "running run_panoct.pl at line 1174"

Hi, when I run the command:

perl "$Panoct_bin"/run_pangenome.pl --no_grid --blast_local --panoct_local --use_nuc --working_dir "$Panoct_out" --gb_dir "$Gbks" --panoct_verbose

The following problems are reported:
Problem running annotation script at line 704
Problem with running run_panoct.pl at line 1174

However, a results folder is produced which contains all the outputs I would expect from run_panoct.pl including the matchtables.

I'm not sure if this is something I should be concerned about. Any explanations as to why these problems are being reported would be greatly appreciated. Thanks!

python setup failed

I tried to build a docker image using the procedure clearly explained in the QUICKSTART file. Unfortunately the docker build fails returning:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-6DNwgF/numpy/
You are using pip version 8.1.2, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/sh -c pip install biopython' returned a non-zero code: 1

Any help would be appreciated.

Some runs complete with no reported PanGenomePipeline errors, but fail full conversion to PanACEA

I'm experiencing some PanGenomePipeline runs where the pipeline appears to complete normally (i.e., no errors are reported), but the runs cannot be fully converted into PanACEA output; only partial output is produced. I've been digging into the underlying scripts, but haven't been able to pin down the underlying cause.

There are a couple of things that might provide some clues. In problematic runs, the Core.att and Core.attfGI files contain multiple cores (the core being the first field in either file). I can't recall seeing multiple cores identified in any normally performing run; in normal runs, I've only seen a single core. The partial PanACEA conversion I get from the problematic runs only converts the data corresponding to the first core identified in the Core.att file.

Is it O.K. to have more than one core output in the Core.att and Core.attfGI files? If not, what might be causing multiple cores to be identified and output? If it is O.K., is the problem that PanACEA's script is not handling multiple cores correctly? Any assistance in tracking down the problem and getting things running normally would be greatly appreciated.
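For anyone who wants to check their own runs for the same symptom, here is a minimal sketch that counts how many distinct values appear in the first field of a file such as Core.att. It assumes the fields are tab-separated, which the report above does not confirm, and the function name is hypothetical.

```python
from collections import Counter


def count_first_field(lines, sep="\t"):
    """Count how often each first-field value occurs.

    `sep` is an assumption; change it if the file uses another delimiter.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        counts[line.split(sep, 1)[0]] += 1
    return counts


# Hypothetical records: two rows for one core, one row for another.
example = ["coreA\t1\n", "coreA\t2\n", "coreB\t3\n"]
counts = count_first_field(example)
print(len(counts), "distinct first-field values:", dict(counts))
```

More than one distinct value would correspond to the "multiple cores" case described above; a single value would match the runs that convert normally.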

run_pangenome.pl reports problems running run_panoct.pl

I executed the run_pangenome.pl pipeline from within a Docker container. The input files were a group of NCBI RefSeq GenBank-format files. The command run was:

/pangenome/bin/run_pangenome.pl --no_grid --gb_dir gb_files

This produced the following terminal messages:
Problem running annotation script at line 704
Problem with running run_panoct.pl at line 1174

Result files were generated, so I'm not sure whether this is a warning and the results are complete, or an error message indicating that the pipeline did not finish. I'm wondering if there is a file naming problem. The "get_max_target_seqs" subroutine at run_panoct.pl line 1174 works on a list of the input file names, and I see a note in the program that says "The display name must match the genome name found in the gene attribute file if running PanOCT." I haven't looked into the "annotation script at line 704" message, since I'm not sure which script is being referred to.

Thanks in advance for the help.
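One generic way to start telling a warning from a failure is to look at the process exit status together with the "Problem ..." lines. The sketch below is an assumption-laden diagnostic, not part of the pipeline: the helper name is hypothetical, and nothing above confirms that run_pangenome.pl returns a non-zero status when a step fails.

```python
import subprocess
import sys


def run_and_summarize(command):
    """Run `command` and return (exit_status, lines starting with 'Problem')."""
    completed = subprocess.run(command, capture_output=True, text=True)
    output = completed.stdout + completed.stderr
    problems = [line for line in output.splitlines() if line.startswith("Problem")]
    return completed.returncode, problems


# Stand-in child process that mimics one of the reported messages.
demo = [
    sys.executable,
    "-c",
    "import sys; print('Problem running annotation script at line 704', "
    "file=sys.stderr); sys.exit(3)",
]
status, problems = run_and_summarize(demo)
print(status, problems)
```

For the run described above, the command list would be `["/pangenome/bin/run_pangenome.pl", "--no_grid", "--gb_dir", "gb_files"]`. A zero status with "Problem" lines would suggest warnings; a non-zero status would suggest the pipeline stopped early.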

Increase the usage of combined assignment operators

👀 Some source code analysis tools can help to find opportunities for improving software components.
💭 I propose to increase the usage of combined operators accordingly.

Would you like to integrate anything from a transformation result which can be generated by a command like the following?
(👉 Please also check for questionable change suggestions because of an evolving search pattern.)

lokal$ perl -p -i.orig -0777 -e 's/(?<target>\$\S+)\s*=\s*\k<target>[ \t]*(?<operator>[+%^.x]|-(?!>)|&(?:&|.)?|\|(?:\||.)?|\*\*?|\/\/?|<<|>>|^.)/$+{target} $+{operator}=/gm' $(find ~/Projekte/PanGenomePipeline/lokal -name '*.pl' -o -name '*.pm')
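To illustrate the kind of rewrite this proposes, here is a deliberately simplified sketch using Python's `re` module. It is not the pattern from the command above: it only handles plain scalar variables and the operators `+`, `-`, `*`, and `.`, and the function name is made up for illustration.

```python
import re

# Simplified pattern: "$var = $var <op>" with <op> one of + - * .
# The lookahead skips "->", "**", "..", and already-combined forms.
PATTERN = re.compile(
    r"(?P<target>\$\w+)\s*=\s*(?P=target)[ \t]*(?P<operator>[-+*.])(?![-+*=>.])"
)


def to_combined_assignment(source):
    """Rewrite '$x = $x + ...' as '$x += ...' for the simple cases above."""
    return PATTERN.sub(r"\g<target> \g<operator>=", source)


print(to_combined_assignment("$count = $count + 1;"))   # $count += 1;
print(to_combined_assignment("$obj = $obj->next;"))     # unchanged
print(to_combined_assignment("$x = $x * 2 + 1;"))       # $x *= 2 + 1;
```

The last line shows why each suggestion needs review: `$x * 2 + 1` and `$x *= 2 + 1` do not compute the same value, because the combined operator applies to the whole right-hand side.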
