MicrobeDB

How to access

MicrobeDB is distributed using the CERN VM File System (CVMFS). Docker and CSI deployment recipes are available in ./destinations. The recipes are executed by Terraform.

Docker may fail to unmount CVMFS during shutdown. If you encounter "transport endpoint is not connected" errors, run sudo fusermount -u ./microbedb/mount.

OSX Peculiarities

OSX does not natively support Docker; it runs Docker within a Linux virtual machine. This workaround means that only the most basic use cases are supported. In particular, mounting MicrobeDB via CVMFS in this setup fails with an error.

To work around this, CVMFS must be installed and configured manually:

  • Ensure that FUSE is enabled by running kextstat | grep -i fuse.
  • Download the CVMFS package, install the pkg, and reboot.
  • Copy ../destinations/docker/cvmfs.config to /etc/cvmfs/default.local.
  • Copy ./microbedb.brinkmanlab.ca.pub to /etc/cvmfs/keys/microbedb.brinkmanlab.ca.pub.
  • Ensure everything is configured properly by running sudo cvmfs_config chksetup.

You MUST mount the CVMFS repository under a shared folder as configured in your Docker settings for it to be accessible by Docker. By default /tmp should be included as a shared folder, so you can mount the repository to /tmp/microbedb. Ensure /tmp/microbedb exists and run sudo mount -t cvmfs microbedb.brinkmanlab.ca /tmp/microbedb.

Schema documentation

Run sqlite3 microbedb.sqlite '.schema' to view documentation of the various tables and columns. The assembly table is largely undocumented because NCBI does not document their data schemas.
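For scripted access, the same CREATE statements (with their inline comments) can be read from SQLite's sqlite_master table. The following is a minimal sketch in Python using only the standard library; the function name schema_statements is hypothetical, and the demo table is a stand-in, not part of the MicrobeDB schema.

```python
import sqlite3


def schema_statements(conn: sqlite3.Connection) -> list[str]:
    """Return the stored CREATE statements, inline comments included."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE sql IS NOT NULL ORDER BY name"
    )
    return [sql for (sql,) in rows]


if __name__ == "__main__":
    # Demo on an in-memory database with a hypothetical table;
    # this table is NOT part of the real MicrobeDB schema.
    demo = sqlite3.connect(":memory:")
    demo.execute(
        "CREATE TABLE example (\n"
        "    id INTEGER -- hypothetical documented column\n"
        ")"
    )
    print("\n".join(schema_statements(demo)))
```

To inspect the real database, pass sqlite3.connect("microbedb.sqlite") instead of the in-memory connection, assuming the file is in the current directory.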

Working with taxonomy data

Use a SQLite recursive query to determine whether a tax_id is a subclass of an ancestor. The following returns 1 if query_tax_id is a subclass of ancestor_tax_id, and no rows otherwise:

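-- Replace query_tax_id and ancestor_tax_id with the taxonomy IDs to compare.
WITH RECURSIVE subClassOf(n) AS (
    -- Start from the node being tested.
    VALUES (query_tax_id)
    UNION
    -- Walk up to each parent, stopping once the ancestor is reached.
    SELECT parent_tax_id
    FROM taxonomy_nodes,
         subClassOf
    WHERE taxonomy_nodes.tax_id = subClassOf.n
      AND taxonomy_nodes.tax_id != ancestor_tax_id
)
-- Yields one row containing 1 if the ancestor was found, otherwise no rows.
SELECT 1
FROM subClassOf
WHERE n = ancestor_tax_id
LIMIT 1;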
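The query above can also be run with bound parameters instead of editing the IDs into the SQL text. The following is a minimal sketch in Python, assuming only the taxonomy_nodes table with the tax_id and parent_tax_id columns used by the query. The helper names is_subclass_of and toy_taxonomy are hypothetical, and the toy IDs are made up for illustration; they are not real NCBI taxonomy IDs.

```python
import sqlite3

# Same recursive query as above, with named parameters for the two IDs.
SUBCLASS_QUERY = """
WITH RECURSIVE subClassOf(n) AS (
    VALUES (:query_tax_id)
    UNION
    SELECT parent_tax_id
    FROM taxonomy_nodes,
         subClassOf
    WHERE taxonomy_nodes.tax_id = subClassOf.n
      AND taxonomy_nodes.tax_id != :ancestor_tax_id
)
SELECT 1
FROM subClassOf
WHERE n = :ancestor_tax_id
LIMIT 1;
"""


def is_subclass_of(conn, query_tax_id, ancestor_tax_id):
    """Return True if the query returns a row, i.e. the ancestor was reached."""
    row = conn.execute(
        SUBCLASS_QUERY,
        {"query_tax_id": query_tax_id, "ancestor_tax_id": ancestor_tax_id},
    ).fetchone()
    return row is not None


def toy_taxonomy():
    """Build an in-memory stand-in for taxonomy_nodes with made-up IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE taxonomy_nodes (tax_id INTEGER PRIMARY KEY, parent_tax_id INTEGER)"
    )
    # Lineage 12 -> 11 -> 10 -> 1; node 20 is a sibling branch under the root.
    # The root (1) is its own parent; UNION deduplicates, so recursion stops there.
    conn.executemany(
        "INSERT INTO taxonomy_nodes VALUES (?, ?)",
        [(1, 1), (10, 1), (11, 10), (12, 11), (20, 1)],
    )
    return conn


if __name__ == "__main__":
    conn = toy_taxonomy()
    print(is_subclass_of(conn, 12, 10))
    print(is_subclass_of(conn, 12, 20))
```

Because the recursion starts from query_tax_id itself, a node is also reported as a subclass of itself.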

Build requirements

Ensure the find command supports -empty by running find --help | grep -e '-empty' (the -e keeps grep from treating the pattern as a set of options). The most recent CVMFS commit of the repository must be mounted on all compute nodes. cvmfs_config must be accessible on all compute nodes.
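The tool checks above could be scripted so they can be run on each compute node. The following is a rough sketch in Python using only the standard library; the function names are hypothetical. It tests -empty by running find against a temporary empty directory rather than parsing the help text, and it does not verify that the most recent CVMFS commit is mounted.

```python
import shutil
import subprocess
import tempfile


def missing_tools(tools=("find", "cvmfs_config")):
    """Return the names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def find_supports_empty() -> bool:
    """Return True if `find <dir> -empty` succeeds and lists an empty directory."""
    find = shutil.which("find")
    if find is None:
        return False
    with tempfile.TemporaryDirectory() as empty_dir:
        result = subprocess.run(
            [find, empty_dir, "-empty"], capture_output=True, text=True
        )
        return result.returncode == 0 and empty_dir in result.stdout


if __name__ == "__main__":
    print("missing tools:", missing_tools())
    print("find supports -empty:", find_supports_empty())
```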

Project Layout

  • destinations/* - Terraform modules that deploy a CVMFS client configured for MicrobeDB to various environments
  • update.sh - Script to sync data with NCBI for a CVMFS server
  • init_env.sh - Script to install dependencies for update.sh
  • fetch.sh - Executed by update.sh per chunk of datasets returned by Entrez
  • finalize.sh - Executed by update.sh once all invocations of fetch.sh have completed
  • resume.sh - Script to allow resuming execution of fetch.sh invocations in the event that any fail. This script is copied to the job directory by update.sh and is intended to be executed from there.
  • schema.sql - Database schema
  • temp_tables.sql - Temporary table schema used by fetch.sh
  • subclassOf.sh - Example utility to query database taxonomy data

Contributors

  • innovate-invent
