Git Product home page Git Product logo

compacttree's Introduction

My name is (Alexander) Niema Moshiri, and I am an Associate Teaching Professor in the Computer Science & Engineering Department at the University of California, San Diego. I work on computational biology, with a research focus on viral phylogenetics and epidemiology. I also place a heavy emphasis on teaching, namely on the development of online educational content, primarily Massive Adaptive Interactive Texts (MAITs). You can find more information about me on my website or in my Curriculum Vitae (CV).

compacttree's People

Stargazers

 avatar

Watchers

 avatar

compacttree's Issues

Store nodes in level-order traversal

Right now, I'm storing nodes in in a pre-order traversal that mirrors parsing the Newick string from left to right. As a result, I'm able to load the tree without loading the entire Newick string into memory, and I'm able to perform pre-order and post-order traversals extremely quickly (pre-order: iterate from 0 to n-1; post-order: iterate from n-1 to 0)

However, in this representation, I need to store all of every node's children, since there's no generic pattern for finding the children of a given node. As a result, every node has an entire vector to store its children, where the size of that vector is equal to the total number of children it has (and the vector itself has additional memory overhead)

Instead, I should store the nodes in a level-order traversal so that 0 = root, then the next nodes are the children of the root, then the children of the first child of the root, then the children of the second child of the root, etc. In this representation, I can represent all of the children of a given node using just 2 numbers: the first and the last child (and everything in between can be assumed to also be a child)

To do this, I'll need to read the entire Newick string into memory up-front, and then I'll need to parse it from outside-in (rather than left-to-right). I was initially avoiding this to try to save memory, but for the Greengenes2 tree, the Newick string is ~500 MB, but peak memory usage to load it as a compact_tree object is almost 2 GB. Thus, given that this new representation will dramatically reduce memory usage by avoiding having all of those vector objects to store each node's children, I think it'll still be a net reduction in peak memory usage

Add optional argument to constructor for reserving number of nodes

Right now, I'm not reserving any space up-front, so the std::vectors resize as I load the tree. I should have an optional argument, e.g. reserve, that takes an unsigned integer and that reserves that much space up-front. Default value should be 0 and should do exactly what I'm doing now (not reserving memory up-front)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.