pangenomepipeline's People

Contributors

erinbeck, grangersutton, gregducq, idpcc, inmanjm, singhindresh, thclarke

pangenomepipeline's Issues

Why does the core_clusters.txt file contain blank cells in genome columns (columns >=8)?

Hi, I'm a relatively new user of PanOCT. In viewing some of the output files, I noticed that in the core_clusters.txt file, while the vast majority of rows have no blank cells in the genome columns (columns >= 8), a minority of rows have some blank cells. These empty cells seem to always fall in rows where the "attributes" column contains an "FS", "FG-In", or "FN".

Given the description of core_clusters.txt file in the OUT_FILE_DESCRIPTIONS.txt file ("core_clusters.txt: Tab delimited file. Lists only those clusters which have a representative from every genome in the analysis."), I'm confused how the core_clusters.txt file could have any empty cells.

I apologize if this is a novice question; I can't seem to connect the dots by reading through the documentation on this site, the original 2012 PanOCT paper, or the 2018 pangenome pipeline paper.

Any guidance would be greatly appreciated. I have attached my core_clusters.txt file for reference.

core_clusters.txt
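
One way to check the pattern described above (blank genome cells coinciding with particular attribute values) is to list every row with an empty cell in the genome columns, together with that row's leading fields. This is a minimal Python sketch, not part of the pipeline. It assumes only what the question states: the file is tab delimited and genome columns start at column 8. The function name and the sample rows are made up for illustration.

```python
import io

FIRST_GENOME_COL = 8  # 1-based; taken from the question ("columns >= 8")

def rows_with_blank_genome_cells(handle, first_genome_col=FIRST_GENOME_COL):
    """Yield (line_number, blank_column_numbers, leading_fields) for each
    row that has at least one empty cell in the genome columns."""
    for line_number, line in enumerate(handle, start=1):
        fields = line.rstrip("\n").split("\t")
        genome_cells = fields[first_genome_col - 1:]
        blanks = [
            first_genome_col + offset
            for offset, cell in enumerate(genome_cells)
            if not cell.strip()
        ]
        if blanks:
            # The leading fields include whichever column holds the attributes.
            yield line_number, blanks, fields[:first_genome_col - 1]

# Made-up two-row sample with three genome columns (columns 8 to 10).
sample = (
    "1\tc1\t\t\t\t\t\tg1_a\tg2_a\tg3_a\n"
    "2\tc2\t\t\t\tFS\t\tg1_b\t\tg3_b\n"
)

for line_number, blanks, leading in rows_with_blank_genome_cells(io.StringIO(sample)):
    print(line_number, blanks, leading)
```

To run it on a real file, replace `io.StringIO(sample)` with `open("core_clusters.txt")`. Rows whose trailing tabs have been stripped would be shorter than expected, and this sketch would not count the missing cells as blank.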

Problems with Quickstart

I use Docker on a Mac, but I have problems running the Quickstart commands. Are any changes needed on a Mac?
The following step fails with problems related to yum:

  1. Build the docker image:
    docker build -t jcvipangenome .

Problems with "running annotation script at line 704" and "running run_panoct.pl at line 1174"

Hi, when I run the command:

perl "$Panoct_bin"/run_pangenome.pl --no_grid --blast_local --panoct_local --use_nuc --working_dir "$Panoct_out" --gb_dir "$Gbks" --panoct_verbose

The following problems are reported:
Problem running annotation script at line 704
Problem with running run_panoct.pl at line 1174

However, a results folder is produced which contains all the outputs I would expect from run_panoct.pl including the matchtables.

I'm not sure if this is something I should be concerned about. Any explanations as to why these problems are being reported would be greatly appreciated. Thanks!

python setup failed

I tried to build a docker image using the procedure clearly explained in the QUICKSTART file. Unfortunately the docker build fails returning:

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-6DNwgF/numpy/
You are using pip version 8.1.2, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/sh -c pip install biopython' returned a non-zero code: 1

Any help would be appreciated.

Some runs complete with no reported PanGenomePipeline errors, but fail full conversion to PanACEA

Some of my PanGenomePipeline runs appear to complete normally (no errors are reported), yet they cannot be fully converted into PanACEA output; only partial output is produced. I've been digging into the underlying scripts but haven't been able to pin down the cause.

A couple of things might provide clues. In problematic runs, the Core.att and Core.attfGI files contain multiple cores (identified by the first field in either file). In normally performing runs, I have only ever seen a single core identified. The partial PanACEA conversion from the problematic runs covers only the data for the first core identified in the Core.att file.

Is it O.K. to have more than one core in the Core.att and Core.attfGI files? If not, what might be causing multiple cores to be identified and output? If it is O.K., is the problem that PanACEA's script does not handle multiple cores correctly? Any assistance in tracking down the problem and getting things running normally would be greatly appreciated.
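
A quick way to tell whether a given run has this symptom is to count the distinct values in the first field of Core.att (or Core.attfGI). This is a minimal Python sketch and not part of the pipeline or PanACEA. It assumes the fields are tab separated, which the report does not state. The function name and the sample lines are invented, and the real file layout may differ.

```python
from collections import Counter
import io

def count_first_field(handle, delimiter="\t"):
    """Count how many lines share each value of the first field.

    The delimiter is an assumption; adjust it if the file uses another one.
    """
    counts = Counter()
    for line in handle:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore empty lines
        counts[line.split(delimiter)[0]] += 1
    return counts

# Invented sample: two different first-field values, i.e. "multiple cores".
sample = "CORE1\tgene_a\nCORE1\tgene_b\nCORE2\tgene_c\n"

counts = count_first_field(io.StringIO(sample))
if len(counts) > 1:
    print("multiple first-field values:", dict(counts))
```

To check a real run, replace `io.StringIO(sample)` with `open("Core.att")`. More than one key in the result matches the symptom described above.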

run_pangenome.pl reports problems running run_panoct.pl

I executed the run_pangenome.pl pipeline from within a Docker container. The input files were a group of NCBI RefSeq files in GenBank format. The command run was:

/pangenome/bin/run_pangenome.pl --no_grid --gb_dir gb_files

This produced the following terminal messages:
Problem running annotation script at line 704
Problem with running run_panoct.pl at line 1174

Results files were generated, so I'm not sure whether this is a warning and the results are complete, or an error message indicating that the pipeline did not finish. I'm wondering if there is a file naming problem. The "get_max_target_seqs" subroutine at run_panoct.pl line 1174 works on a list of the input file names, and I see a note in the program that says "The display name must match the genome name found in the gene attribute file if running PanOCT." I haven't looked into the "annotation script at line 704" message, since I'm not sure which script it refers to.

Thanks in advance for the help.

Increase the usage of combined assignment operators

👀 Some source code analysis tools can help to find opportunities for improving software components.
💭 I propose to increase the usage of combined operators accordingly.

Would you like to integrate anything from a transformation result which can be generated by a command like the following?
(👉 Please check also for questionable change suggestions because of an evolving search pattern.)

lokal$ perl -p -i.orig -0777 -e 's/(?<target>\$\S+)\s*=\s*\k<target>[ \t]*(?<operator>[+%^.x]|-(?!>)|&(?:&|.)?|\|(?:\||.)?|\*\*?|\/\/?|<<|>>|^.)/$+{target} $+{operator}=/gm' $(find ~/Projekte/PanGenomePipeline/lokal -name '*.pl' -o -name '*.pm')
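
To illustrate the kind of rewrite the command above aims for, here is a deliberately simplified Python sketch. It is not equivalent to the Perl pattern in the command: it covers only a handful of operators, and the names `PATTERN` and `combine` are made up for this example.

```python
import re

# Simplified pattern: "$var = $var <op>" where <op> is one of a few binary
# operators. The trailing lookahead skips cases such as "$x->method",
# "$x++" and "$x .. $y", which must not be rewritten.
PATTERN = re.compile(
    r"(?P<target>\$\S+)\s*=\s*(?P=target)[ \t]*"
    r"(?P<operator>\*\*|\|\||&&|//|[-+*/%.])"
    r"(?![-+=>.*/|&])"
)

def combine(source):
    """Rewrite '$x = $x op ...' as '$x op= ...' wherever PATTERN matches."""
    return PATTERN.sub(r"\g<target> \g<operator>=", source)

print(combine("$count = $count + 1;"))   # $count += 1;
print(combine('$s = $s . "x";'))         # $s .= "x";
print(combine("$obj = $obj->next;"))     # unchanged: $obj = $obj->next;
```

A purely textual rewrite like this cannot see operator precedence. For example, `$x = $x * 2 + 1` would become `$x *= 2 + 1`, which computes a different value, so each suggested change needs to be reviewed by hand.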
