jambler24 / gengraph
A repository for the GenGraph toolkit for the creation and manipulation of graph genomes
License: GNU General Public License v3.0
This function was created to test two things: building a similarity matrix based on shared nodes, and building a consensus genome from the nodes most often traversed in the graph. The consensus is weighted by the similarity matrix so that multiple closely related species are down-weighted.
The similarity matrix is not being created properly, and the resultant trees are not a true representation of the phylogeny.
I am having this problem. Can you help, please? Thank you.
devina@devina-HP-ProBook-450-G3:~/GenGraph$ python3 gengraphTool.py make_genome_graph --seq_file analysis.txt --out_file_name test --recreate_check
Traceback (most recent call last):
File "gengraphTool.py", line 83, in <module>
parsed_input_dict = parse_seq_file(args.seq_file)
File "/home/devina/GenGraph/gengraph.py", line 548, in parse_seq_file
A_seq_label_dict[a_seq_file['aln_name']] = a_seq_file['seq_name']
KeyError: 'seq_name'
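A KeyError on 'seq_name' suggests that the header of the sequence file was not parsed into the expected column names, for example because the columns are separated by spaces rather than tabs, or because the header line is missing. That diagnosis is an assumption; the traceback alone does not confirm it. The sketch below is not GenGraph's parse_seq_file. It is a hypothetical pre-check (read_seq_table and REQUIRED_COLUMNS are illustrative names) that assumes a tab-delimited file with the four column names used in the sequences.txt example further down, and reports a bad header before the pipeline starts.

```python
import csv
import io

# Column names taken from the sequences.txt example in this thread.
REQUIRED_COLUMNS = ("seq_name", "aln_name", "seq_path", "annotation_path")


def read_seq_table(handle):
    """Read a tab-delimited sequence table and fail early on a bad header."""
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        raise ValueError(
            f"seq file header is missing columns {missing}; found {header}"
        )
    return list(reader)


# A well-formed, tab-delimited table parses into one dict per row.
good = "seq_name\taln_name\tseq_path\tannotation_path\nH37Rv\tseq0\tH37Rv.fa\tNA\n"
rows = read_seq_table(io.StringIO(good))
print(rows[0]["seq_name"])

# A space-delimited header is read as a single column and is rejected.
bad = "seq_name aln_name seq_path annotation_path\nH37Rv seq0 H37Rv.fa NA\n"
try:
    read_seq_table(io.StringIO(bad))
except ValueError as err:
    print(err)
```

Running a check like this on analysis.txt before calling make_genome_graph would show whether the header is the cause.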
If the name of one isolate is a substring of the name of another isolate, an error results. For example, if one sequence is isolate "CDC1551" and another is "C", an error occurs during refine_initGraph. This is most likely due to the line
if isolate in data['ids']:
and should be replaced with a more strict check.
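One possible stricter check is sketched below. It assumes that data['ids'] is a comma-separated string of isolate names, which is not confirmed by the issue text; isolate_in_ids is a hypothetical helper, not an existing GenGraph function. The idea is to compare against whole entries instead of testing for a substring.

```python
def isolate_in_ids(isolate, ids, sep=","):
    """Return True only if `isolate` is a whole entry of the delimited `ids` string."""
    return isolate in [entry.strip() for entry in ids.split(sep)]


# Assumed layout of a node's 'ids' attribute: names joined by commas.
ids = "H37Rv,CDC1551"

substring_match = "C" in ids              # matches by accident, inside "CDC1551"
exact_match = isolate_in_ids("C", ids)    # no isolate is actually named "C"

print(substring_match, exact_match)
```

If the attribute turns out to be a list rather than a string, the plain `in` test is already exact and the bug would lie elsewhere.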
This tool could possibly benefit from a JIT compiler.
Hi!
I am quite new to this field and I am struggling to understand the difference between vg and this library. Can you help me?
Thank you in advance and thank you for this project!
The National Library of Medicine (https://www.ncbi.nlm.nih.gov/datasets) only allows you to download a list of genomes as a single FASTA file (extension .fna). I tried to provide those files as input, but it did not work.
Is this my fault? Have you encountered similar problems?
Thank you!
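Since the sequences.txt example in this thread lists one FASTA path per row, one workaround may be to split the combined .fna file into one file per record first. The sketch below is a hypothetical helper pair (split_multifasta and write_split_fasta are illustrative names, not part of GenGraph). It assumes every record in the download is one complete genome; if a genome is spread over several contigs, splitting per record would separate them and a different grouping would be needed.

```python
import io
import re
from pathlib import Path


def split_multifasta(handle):
    """Yield (record_id, fasta_text) for each record in a multi-FASTA stream."""
    record_id, lines = None, []
    for line in handle:
        if line.startswith(">"):
            if record_id is not None:
                yield record_id, "".join(lines)
            # Use the first whitespace-delimited token of the header as the id.
            tokens = line[1:].split()
            record_id = tokens[0] if tokens else "unnamed"
            lines = [line]
        elif record_id is not None:
            lines.append(line)
    if record_id is not None:
        yield record_id, "".join(lines)


def write_split_fasta(in_path, out_dir):
    """Write each record of `in_path` to its own .fa file inside `out_dir`."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(in_path) as handle:
        for record_id, text in split_multifasta(handle):
            # Replace characters that are awkward in file names.
            safe_name = re.sub(r"[^A-Za-z0-9._-]", "_", record_id)
            out_path = out_dir / f"{safe_name}.fa"
            out_path.write_text(text)
            written.append(out_path)
    return written


# Small in-memory demonstration with two made-up records.
demo = io.StringIO(">a first record\nACGT\nAC\n>b\nGG\n")
records = dict(split_multifasta(demo))
print(sorted(records))
```

Each resulting file can then be given its own row in the sequence table.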
When I run this command:
python ./gengraphTool.py make_genome_graph --seq_file TestGraphs/sequences.txt --out_file_name test
sequences.txt:
seq_name aln_name seq_path annotation_path
H37Rv seq0 /Users/filippo/Desktop/workspace/GenGraph/TestGraphs/H37Rv.fa NA
H37Rv1 seq1 /Users/filippo/Desktop/workspace/GenGraph/TestGraphs/H37Rv1.fa Na
H37Rv2 seq2 /Users/filippo/Desktop/workspace/GenGraph/TestGraphs/H37Rv2.fa N
I got the error:
progressiveMauve_call error: output of progressiveMauve empty
I fixed the error by changing line 2775 in gengraph.py:
old line: number_of_lines = 3
new line: number_of_lines = 2
But I am not sure this fix is correct.
Thank you!
Hi,
One question: why does this happen when I start running the program?
(base) devina@Devinas-MacBook-Pro ~ % python3 GenGraph/gengraphTool.py make_genome_graph --seq_file Documents/anagengraph.txt --out_file_name Documents/output
Conducting progressiveMauve
progressiveMauve
It gets stuck at this point.
I am using a Mac:
Processor: 2.7 GHz Intel Core i7
Memory: 16 GB
Input: two sequences, 4.5 MB each
Thank you for your help.
Devina
Hey, I am trying to run the example in your repo (sequences.txt), with some modifications, on my local system, but I am running into this problem.
$ python3 ./gengraphTool.py make_genome_graph --seq_file sequences.txt --out_file_name khush --recreate_check
Running GenGraph Toolkit
Creating genome graph
[OrderedDict([('seq_name', 'H37Rv'), ('aln_name', 'seq0'), ('seq_path', '/home/noob/Documents/IIITD/tavlab/strainflow/GenGraph-master/TestGraphs/H37Rv.fa'), ('annotation_path', 'NA')]), OrderedDict([('seq_name', 'H37Rv1'), ('aln_name', 'seq1'), ('seq_path', '/home/noob/Documents/IIITD/tavlab/strainflow/GenGraph-master/TestGraphs/H37Rv1.fa'), ('annotation_path', 'Na')]), OrderedDict([('seq_name', 'H37Rv2'), ('aln_name', 'seq2'), ('seq_path', '/home/noob/Documents/IIITD/tavlab/strainflow/GenGraph-master/TestGraphs/H37Rv2.fa'), ('annotation_path', 'Na')])]
Conducting progressiveMauve
progressiveMauve Complete
Traceback (most recent call last):
File "./gengraphTool.py", line 136, in <module>
genome_aln_graph = bbone_to_initGraph(bbone_file, parsed_input_dict)
File "/home/noob/Documents/IIITD/tavlab/strainflow/GenGraph-master/gengraph.py", line 1616, in bbone_to_initGraph
backbone_lol = input_parser(bbone_file)
File "/home/noob/Documents/IIITD/tavlab/strainflow/GenGraph-master/gengraph.py", line 1189, in input_parser
in_file = open(file_path, 'r')
FileNotFoundError: [Errno 2] No such file or directory: 'globalAlignment_khush.backbone'
Can you please help me out?
Also, I have one more question: can I build a directed de Bruijn graph using this library?
The current release still shows 'Welcome to GenGraph v0.1' in the help even though the Git release is tagged as 0.3.
It should be faster. This is related to the seq_recreate_check() function. Looking into it.
Hello, I'm a student trying to create a graph similar to Figure 3 of the GenGraph paper. I've been trying to get the code to run for more than a week and I keep getting one error after another. The latest one is this:
FileNotFoundError: [WinError 2] The system cannot find the file specified
Full command and output:
C:\Users\eros1\anaconda3\Lib\site-packages\GenGraph>python ./gengraphTool.py make_genome_graph --seq_file C:\Users\eros1\OneDrive\documents\Summer2022_Genome\E.coli_tab.txt --out_file_name test
Conducting progressiveMauve
({'seq0': 'K-12', 'seq1': 'Nissle-1917', 'seq2': 'O157:H7'}, {'K-12': 'C:/Users/eros1/OneDrive/Documents/Summer2022_Genome/E. Coli k-12.fasta', 'Nissle-1917': 'C:/Users/eros1/OneDrive/Documents/Summer2022_Genome/E. Coli Nissle 1917.fasta', 'O157:H7': 'C:/Users/eros1/OneDrive/Documents/Summer2022_Genome/E. Coli O157H7.fasta'}, ['C:/Users/eros1/OneDrive/Documents/Summer2022_Genome/E. Coli k-12.fasta', 'C:/Users/eros1/OneDrive/Documents/Summer2022_Genome/E. Coli Nissle 1917.fasta', 'C:/Users/eros1/OneDrive/Documents/Summer2022_Genome/E. Coli O157H7.fasta'], {'K-12': 'NA', 'Nissle-1917': 'NA', 'O157:H7': 'NA'})
Traceback (most recent call last):
File "./gengraphTool.py", line 87, in <module>
progressiveMauve_alignment(parsed_input_dict[2], args.out_file_name)
File "C:\Users\eros1\anaconda3\Lib\site-packages\GenGraph\gengraph.py", line 1949, in progressiveMauve_alignment
return call(progressiveMauve_call, stdout=open(os.devnull, 'wb'))
File "C:\Users\eros1\anaconda3\lib\subprocess.py", line 340, in call
with Popen(*popenargs, **kwargs) as p:
File "C:\Users\eros1\anaconda3\lib\subprocess.py", line 858, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "C:\Users\eros1\anaconda3\lib\subprocess.py", line 1311, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified
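In this traceback the error is raised inside CreateProcess, which usually means that the program passed to subprocess could not be found, rather than one of the input files. Since the failing call is progressiveMauve_alignment, a likely (but unconfirmed) cause is that progressiveMauve is not installed or not on PATH. The sketch below uses only the standard library to check this before running GenGraph; find_executable is a hypothetical helper name, and the executable name "progressiveMauve" is taken from the output above.

```python
import shutil


def find_executable(name="progressiveMauve"):
    """Return the full path of `name` if it is on PATH, else raise a clear error."""
    path = shutil.which(name)
    if path is None:
        raise FileNotFoundError(
            f"'{name}' was not found on PATH; install it or add its folder to PATH"
        )
    return path


try:
    print("Found:", find_executable())
except FileNotFoundError as err:
    print(err)
```

If the check fails, adding the folder that contains the progressiveMauve executable to PATH and opening a new terminal is the usual next step.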
As of networkx v2.4, G.node has been removed; use G.nodes instead:
https://networkx.github.io/documentation/stable/release/release_2.4.html
Code that still uses G.node now fails with an error.
Excess print statements need to be removed for a cleaner UI.
Hi,
Issue: Exception FileNotOpened thrown from
Unknown() in gnFileSource.cpp 67
Called by Unknown()
Traceback (most recent call last):
File "./GenGraph/gengraphTool.py", line 102, in <module>
genome_aln_graph = bbone_to_initGraph(bbone_file, parsed_input_dict)
File "/GenGraph/gengraph.py", line 830, in bbone_to_initGraph
iso_length = len(input_parser(input_dict[1][iso])[0]['DNA_seq'])
TypeError: 'NoneType' object is not subscriptable
Environment: Docker Toolbox on Windows 10
The node realignment step can be parallelised. More research is needed into the best way to do this.
Create a function to convert to and from the GFA format used by vg. The format is defined here:
https://github.com/GFA-spec/GFA-spec/blob/master/GFA1.md
This may be tricky as at first glance vg does not use a coordinate system for keeping track of relative nucleotide positions. Will look into this.
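As a starting point, the export direction could look roughly like the sketch below. It is not an existing GenGraph function: to_gfa1 is a hypothetical name, and the inputs (a dict of node name to sequence, and a list of node pairs) are assumed stand-ins for whatever the graph object provides. It writes only GFA1 header, segment (S) and link (L) lines, treats every link as forward-to-forward with a 0M overlap, and does not attempt to carry over per-isolate coordinates, which is the open question raised above.

```python
def to_gfa1(segments, links):
    """Serialise nodes and edges as minimal GFA1 text.

    segments: dict mapping node name -> nucleotide sequence
    links:    iterable of (from_node, to_node) pairs
    """
    lines = ["H\tVN:Z:1.0"]
    for name, sequence in segments.items():
        lines.append(f"S\t{name}\t{sequence}")
    for src, dst in links:
        # Forward orientation on both ends, no overlap between segments.
        lines.append(f"L\t{src}\t+\t{dst}\t+\t0M")
    return "\n".join(lines) + "\n"


gfa_text = to_gfa1(
    {"Aln_1": "ACGT", "Aln_2": "GG", "Aln_3": "TTA"},
    [("Aln_1", "Aln_2"), ("Aln_2", "Aln_3")],
)
print(gfa_text, end="")
```

Reverse-complement links and path (P) lines per isolate would still need to be designed before the output is useful to vg.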
In the fasta_alignment_to_subnet() function, there is a
copy.deepcopy(true_start)
that according to profiling is taking way too long. A suggested solution is using
g = cPickle.loads(cPickle.dumps(a, -1))
as suggested here:
https://stackoverflow.com/questions/24756712/deepcopy-is-extremely-slow
Will try this first, but otherwise the whole fasta_alignment_to_subnet() function could do with improvement.
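Note that cPickle exists only in Python 2; in Python 3 the pickle module uses its C implementation automatically when available. A minimal Python 3 version of the suggested round trip is sketched below. fast_deepcopy is a hypothetical helper name, the sample dictionary is made up, and the approach only works for picklable objects. Whether it is actually faster for the true_start object would need to be confirmed by profiling.

```python
import pickle


def fast_deepcopy(obj):
    """Deep-copy `obj` via a pickle round trip (object must be picklable)."""
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


# Made-up nested structure standing in for the object being copied.
original = {"ids": ["H37Rv", "CDC1551"], "pos": {"H37Rv": [1, 10]}}
clone = fast_deepcopy(original)

# Mutating the clone must leave the original untouched.
clone["pos"]["H37Rv"][0] = 99
print(original["pos"]["H37Rv"][0], clone["pos"]["H37Rv"][0])
```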
--scratch-path-1 is hard-coded. This needs to become relative.
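One way to avoid a hard-coded location is to create the scratch directory at run time and build the option from it. The sketch below is an illustration only: scratch_args is a hypothetical helper, it uses a temporary directory rather than a path relative to the working directory, and the `--scratch-path-1=<path>` form of the option is assumed from the flag name above and should be checked against the progressiveMauve help output.

```python
import os
import tempfile


def scratch_args(prefix="gengraph_scratch_"):
    """Create a fresh scratch directory and return it with the matching option."""
    scratch_dir = tempfile.mkdtemp(prefix=prefix)
    # Assumed option syntax; verify against the installed progressiveMauve.
    return scratch_dir, [f"--scratch-path-1={scratch_dir}"]


scratch_dir, extra_args = scratch_args()
print(extra_args)
```

The returned list could be appended to the progressiveMauve argument list, and the directory removed once the alignment has finished.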