The compacttree from niemasd

compacttree's Introduction

My name is (Alexander) Niema Moshiri, and I am an Associate Teaching Professor in the Computer Science & Engineering Department at the University of California, San Diego. I work on computational biology, with a research focus on viral phylogenetics and epidemiology. I also place a heavy emphasis on teaching, namely on the development of online educational content, primarily Massive Adaptive Interactive Texts (MAITs). You can find more information about me on my website or in my Curriculum Vitae (CV).

compacttree's People

Stargazers

Watchers

compacttree's Issues

Store nodes in level-order traversal

Right now, I'm storing nodes in in a pre-order traversal that mirrors parsing the Newick string from left to right. As a result, I'm able to load the tree without loading the entire Newick string into memory, and I'm able to perform pre-order and post-order traversals extremely quickly (pre-order: iterate from 0 to n-1; post-order: iterate from n-1 to 0)

However, in this representation, I need to store all of every node's children, since there's no generic pattern for finding the children of a given node. As a result, every node has an entire vector to store its children, where the size of that vector is equal to the total number of children it has (and the vector itself has additional memory overhead)

Instead, I should store the nodes in a level-order traversal so that 0 = root, then the next nodes are the children of the root, then the children of the first child of the root, then the children of the second child of the root, etc. In this representation, I can represent all of the children of a given node using just 2 numbers: the first and the last child (and everything in between can be assumed to also be a child)

To do this, I'll need to read the entire Newick string into memory up-front, and then I'll need to parse it from outside-in (rather than left-to-right). I was initially avoiding this to try to save memory, but for the Greengenes2 tree, the Newick string is ~500 MB, but peak memory usage to load it as a compact_tree object is almost 2 GB. Thus, given that this new representation will dramatically reduce memory usage by avoiding having all of those vector objects to store each node's children, I think it'll still be a net reduction in peak memory usage

Add optional argument to constructor for reserving number of nodes

Right now, I'm not reserving any space up-front, so the std::vectors resize as I load the tree. I should have an optional argument, e.g. reserve, that takes an unsigned integer and that reserves that much space up-front. Default value should be 0 and should do exactly what I'm doing now (not reserving memory up-front)

Recommend Projects

niemasd / compacttree Goto Github PK

compacttree's Introduction

compacttree's People

Stargazers

Watchers

compacttree's Issues

Store nodes in level-order traversal

Add optional argument to constructor for reserving number of nodes

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent