jcventerinstitute / pangenomepipeline
PanGenomePipeline
License: GNU General Public License v2.0
Hi, I'm a relatively new user of PanOCT. While looking through some of the output files, I noticed that in core_clusters.txt the vast majority of rows have no blank cells in the genome columns (columns >= 8), but a minority of rows have some blank cells. These empty cells always seem to fall in rows where the "attributes" column contains "FS", "FG-In", or "FN".
Given the description of core_clusters.txt in OUT_FILE_DESCRIPTIONS.txt ("core_clusters.txt: Tab delimited file. Lists only those clusters which have a representative from every genome in the analysis."), I'm confused about how core_clusters.txt could have any empty cells.
I apologize if this is a novice question; I can't seem to connect the dots from the documentation on this site, the original 2012 PanOCT paper, or the 2018 pangenome pipeline paper.
Any guidance would be greatly appreciated. I have attached my core_clusters.txt file for reference.
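One way to check whether every blank genome cell really coincides with an FS, FG-In, or FN attribute is to scan core_clusters.txt directly. Below is a minimal Python sketch, assuming (as described above) that genome columns start at column 8, counted from 1. The position of the "attributes" column is not stated here, so it is passed in as a parameter; the function name is hypothetical.

```python
import csv

# Assumption from the issue text: genome columns start at column 8 (1-based).
FIRST_GENOME_COL = 8


def rows_with_blank_genomes(lines, attributes_col):
    """Yield (row_number, attributes, blank_columns) for every row that has
    at least one empty genome cell.

    lines          -- iterable of tab-delimited lines from core_clusters.txt
    attributes_col -- 1-based position of the "attributes" column
                      (hypothetical parameter; check it against your file)
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row_number, row in enumerate(reader, start=1):
        genome_cells = row[FIRST_GENOME_COL - 1:]
        # 1-based column numbers of empty (or whitespace-only) genome cells
        blank_columns = [FIRST_GENOME_COL + i
                         for i, cell in enumerate(genome_cells)
                         if not cell.strip()]
        if blank_columns:
            attributes = row[attributes_col - 1] if len(row) >= attributes_col else ""
            yield row_number, attributes, blank_columns
```

For example, iterating over `rows_with_blank_genomes(open("core_clusters.txt"), attributes_col=N)` with the real column number substituted for `N` prints one tuple per affected row. If every reported attributes value contains FS, FG-In, or FN, that supports the pattern described above.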
I use Docker on a Mac, but I have problems running the Quickstart commands. Are any changes needed when using a Mac?
At this step, I have problems with yum.
Could anyone provide an example or share a test input file for running the script 'plot_pangenome.R'?
Hi, when I run the command:
perl "$Panoct_bin"/run_pangenome.pl --no_grid --blast_local --panoct_local --use_nuc --working_dir "$Panoct_out" --gb_dir "$Gbks" --panoct_verbose
The following problems are reported:
Problem running annotation script at line 704
Problem with running run_panoct.pl at line 1174
However, a results folder is produced which contains all the outputs I would expect from run_panoct.pl including the matchtables.
I'm not sure if this is something I should be concerned about. Any explanations as to why these problems are being reported would be greatly appreciated. Thanks!
I tried to build a Docker image using the procedure clearly explained in the QUICKSTART file. Unfortunately, the docker build fails, returning:
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-6DNwgF/numpy/
You are using pip version 8.1.2, however version 19.2.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
The command '/bin/sh -c pip install biopython' returned a non-zero code: 1
Any help would be appreciated.
I'm experiencing some PanGenomePipeline runs where the pipeline appears to complete normally (i.e., no errors are reported), but the results cannot be fully converted into PanACEA output; only partial output is produced. I've been digging into the underlying scripts but haven't been able to pin down the cause. A couple of things might provide clues. In the problematic runs, the Core.att and Core.attfGI files identify multiple cores (i.e., the first field in either file takes more than one value). I can't recall seeing multiple cores in any normally performing run; there, I've only seen a single core. The partial PanACEA conversion from the problematic runs covers only the data for the first core listed in Core.att.
Is it OK to have more than one core in the Core.att and Core.attfGI files? If not, what might cause multiple cores to be identified and output? If it is OK, is the problem that PanACEA's script does not handle multiple cores correctly? Any help tracking down the problem and getting things running normally would be greatly appreciated.
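To check how many cores a run produced before attempting the PanACEA conversion, one can count the distinct values in the first field. A minimal Python sketch, assuming Core.att (and Core.attfGI) are tab-delimited with the core identifier in the first field, as described above; the function name is hypothetical.

```python
from collections import Counter


def count_core_ids(lines):
    """Count lines per core identifier, taking the first tab-delimited
    field of each non-blank line as the core ID (assumed file layout)."""
    counts = Counter()
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        core_id = line.rstrip("\n").split("\t", 1)[0]
        counts[core_id] += 1
    return counts
```

With `counts = count_core_ids(open("Core.att"))`, a result where `len(counts) > 1` flags a run like the problematic ones; the same check applies to Core.attfGI.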
I executed the run_pangenome.pl pipeline from within a Docker container. The input files were a group of NCBI RefSeq GenBank-format files. The command run was:
This produced the following terminal messages:
Problem running annotation script at line 704
Problem with running run_panoct.pl at line 1174
Results files were generated, so I'm not sure whether this is a warning and the results are complete, or an error message indicating that the pipeline did not finish. I'm wondering if there is a file-naming problem. The "get_max_target_seqs" subroutine at run_panoct.pl line 1174 works on a list of the input file names, and I see a note in the program that says "The display name must match the genome name found in the gene attribute file if running PanOCT." I haven't looked into the "annotation script at line 704" message, since I'm not sure which script it refers to.
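The naming requirement quoted above can be checked outside the pipeline. A minimal Python sketch, assuming the gene attribute file is tab-delimited and that you know which column holds the genome name (passed as `genome_col`, a hypothetical parameter, since the column position is not stated here); the function name is also hypothetical.

```python
def find_name_mismatches(display_names, attribute_lines, genome_col):
    """Compare genome display names with the genome names in a gene
    attribute file.

    display_names   -- names used for the input genomes
    attribute_lines -- lines of a tab-delimited gene attribute file
    genome_col      -- 1-based column holding the genome name (assumption;
                       verify against your own attribute file)

    Returns (missing, extra): display names never seen in the attribute
    file, and attribute-file genome names with no matching display name.
    """
    att_genomes = set()
    for line in attribute_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= genome_col and fields[genome_col - 1]:
            att_genomes.add(fields[genome_col - 1])
    wanted = set(display_names)
    return sorted(wanted - att_genomes), sorted(att_genomes - wanted)
```

The comparison is deliberately exact: a stray character or changed separator (for example `Genome_B` versus `GenomeB`) is precisely the kind of mismatch the note warns about.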
Thanks in advance for the help.
Some source code analysis tools can help find opportunities for improving software components.
I propose increasing the use of combined assignment operators accordingly.
Would you like to integrate anything from a transformation result that can be generated by a command like the following?
(Please also check for questionable change suggestions caused by an evolving search pattern.)
lokal$ perl -p -i.orig -0777 -e 's/(?<target>\$\S+)\s*=\s*\k<target>[ \t]*(?<operator>[+%^.x]|-(?!>)|&(?:&|.)?|\|(?:\||.)?|\*\*?|\/\/?|<<|>>|^.)/$+{target} $+{operator}=/gm' $(find ~/Projekte/PanGenomePipeline/lokal -name '*.pl' -o -name '*.pm')
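The one-liner above can rewrite more than intended. For example, `$x = $x * 2 + 1` becomes `$x *= 2 + 1`, which changes the result because `*=` applies to the whole right-hand side, and `my $x = $x + 1` becomes `my $x += 1`, which operates on the newly declared variable. As an illustration only, here is a deliberately narrower Python sketch of the same idea. It is not the pattern proposed above and has not been run against the PanGenomePipeline sources; the function name is hypothetical.

```python
import re

# Conservative pattern (illustrative only):
#  - the variable must be a plain scalar such as $count (no subscripts),
#  - only + - * / . are rewritten,
#  - the right operand must be a single token that ends the statement, so
#    "$x = $x * 2 + 1;" (where "*=" would change precedence) is left alone,
#  - declarations such as "my $x = $x + 1;" are skipped.
COMBINABLE = re.compile(
    r"(?<!my )(?<!our )(?<!local )"
    r"(?P<target>\$\w+)\s*=\s*(?P=target)\s+(?P<op>[-+*/.])\s+"
    r"(?P<operand>[^\s;]+)\s*;"
)


def combine_assignments(source: str) -> str:
    """Rewrite '$x = $x OP y;' as '$x OP= y;' where it is clearly safe."""
    return COMBINABLE.sub(r"\g<target> \g<op>= \g<operand>;", source)
```

Trading coverage for safety means many valid candidates (hash elements, multi-token operands, code without spaces around operators) are skipped; those would still need manual review.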