gabrieledcjr / dna_matcher_sigcse2020

This project forked from jhustles/dna_matcher_sigcse2020

Python program that accepts a DNA database CSV file and a target DNA sequence (text file) as command-line arguments, counts the number of consecutive STR matches, and returns "MATCH" or "NO MATCH".

Home Page: https://sigcse2020.sigcse.org/online/nifty.html#dnacs1

License: MIT License

Python 100.00%

Matching Short Tandem Repeats (STR) in Human DNA Using Python & The Command Line

I. Background

DNA is really just a sequence of molecules called nucleotides, arranged into a particular shape (a double helix). Each nucleotide of DNA contains one of four different bases: adenine (A), cytosine (C), guanine (G), or thymine (T). Every human cell has billions of these nucleotides arranged in sequence. Some portions of this sequence (i.e., the genome) are the same, or at least very similar, across almost all humans, but other portions of the sequence have a higher genetic diversity and thus vary more across the population.

One place where DNA tends to have high genetic diversity is in Short Tandem Repeats (STRs). An STR is a short sequence of DNA bases that tends to repeat consecutively numerous times at specific locations inside of a person’s DNA. The number of times any particular STR repeats varies a lot among individuals. In the DNA samples below, for example, Alice has the STR AGAT repeated four times in her DNA, while Bob has the same STR repeated five times.

[Figure: DNA samples from Alice and Bob, showing the repeated AGAT sequence]
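The idea of counting consecutive repeats can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual code: the function name `longest_run` and the two sample sequences are made up here, chosen only so that Alice's sequence contains four back-to-back copies of AGAT and Bob's contains five.

```python
def longest_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif in sequence."""
    if not motif:
        return 0  # an empty motif would otherwise loop forever
    best = 0
    step = len(motif)
    # Try every starting position, since a run can begin anywhere.
    for start in range(len(sequence)):
        count = 0
        pos = start
        # Keep stepping forward while the next chunk is another copy.
        while sequence[pos:pos + step] == motif:
            count += 1
            pos += step
        best = max(best, count)
    return best


# Hypothetical sequences, for illustration only.
alice = "CT" + "AGAT" * 4 + "GACTA"
bob = "CT" + "AGAT" * 5 + "TGA"

print(longest_run(alice, "AGAT"))  # 4
print(longest_run(bob, "AGAT"))    # 5
```

Checking every start position is quadratic in the worst case, but it stays correct even for motifs that overlap themselves, which a simple non-overlapping search could miscount.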

Using multiple STRs, rather than just one, can improve the accuracy of DNA profiling. If the probability that two people have the same number of repeats for a single STR is 5%, and the analyst looks at 10 different STRs, then the probability that two DNA samples match purely by chance is 0.05^10, or about 1 in 10 trillion (assuming all STRs are independent of each other). So if two DNA samples match in the number of repeats for each of the STRs, the analyst can be pretty confident they came from the same person. CODIS, the FBI’s DNA database, uses 20 different STRs as part of its DNA profiling process.
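The arithmetic behind the independence assumption can be checked directly. The sketch below multiplies the 5% single-STR probability by itself once per STR, using exact fractions to avoid floating-point rounding; the variable names are illustrative only.

```python
from fractions import Fraction

p_single = Fraction(5, 100)  # 5% chance two people share one STR count
n_strs = 10                  # number of independent STRs compared

# Independent events: multiply the per-STR probabilities together.
p_chance = p_single ** n_strs

print(p_chance)  # 1/10240000000000
print(f"about 1 in {p_chance.denominator:,}")
```

The result holds only if the STRs really are independent; correlated markers would make a chance match more likely than this product suggests.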

II. System Prerequisites

  • Python >= 3.7

III. Data Source

IV. The Target And Objective

The goal is to write a Python program that reads a target DNA sequence from a text file, loops through the DNA database, stored here as a flat CSV file, and returns either the matching person's name or "NO MATCH".
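One way to approach this goal is sketched below. It is not the repository's actual implementation. It assumes a CSV layout with a `name` column followed by one column per STR holding an integer repeat count; the helper names `longest_run` and `find_match` and the tiny in-memory database are hypothetical. In a real run, the two file paths would come from the command-line arguments rather than from `io.StringIO`.

```python
import csv
import io


def longest_run(sequence, motif):
    """Largest number of back-to-back copies of motif in sequence."""
    if not motif:
        return 0
    best, step = 0, len(motif)
    for start in range(len(sequence)):
        count, pos = 0, start
        while sequence[pos:pos + step] == motif:
            count += 1
            pos += step
        best = max(best, count)
    return best


def find_match(csv_file, sequence):
    """Return the name whose STR counts all equal those in sequence."""
    reader = csv.DictReader(csv_file)
    # Assumption: every column except "name" is an STR motif.
    strs = [f for f in (reader.fieldnames or []) if f != "name"]
    # Count each STR once in the target, then compare against every row.
    profile = {s: longest_run(sequence, s) for s in strs}
    for row in reader:
        if all(int(row[s]) == profile[s] for s in strs):
            return row["name"]
    return "NO MATCH"


# Made-up database and target sequence, for illustration only.
database = io.StringIO("name,AGAT,AATG\nAlice,4,2\nBob,5,1\n")
target = "CT" + "AGAT" * 5 + "GG" + "AATG" + "TC"

print(find_match(database, target))  # Bob
```

Computing the target's STR profile once up front keeps the per-person comparison cheap, since each database row then needs only integer comparisons.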

[Figures: examples from the DNA database of the US Department of Commerce, National Institute of Standards and Technology]

V. The Algorithm & Test Cases Results

If you want to run the test cases, they are in the "textcases.txt" file.

Personal Note

  • Hope you enjoyed it. Thank you for your time!

Author

  • Johneson Giang - Developer - Github

License

This project is licensed under the MIT License - see the LICENSE.md file for details

Acknowledgments

  • I definitely want to give a shout out to my dear teacher, mentor, and friend @CodingWithCorgis!
  • Shout out to David J. Malan & Brian Yu
