Bulk Primer Designer for PCR Amplification, with MySQL Database Tracking

Problem Statement

In metabolic engineering, optimizing biosynthetic pathways often involves comparing the performance of many isozymes for specific pathway reactions. This requires curating a list of genes, designing a unique set of oligonucleotide primers for each, and using PCR amplification to generate sufficient material for molecular cloning & strain engineering.
The primer design process can be time-consuming and repetitive, and it ultimately boils down to a loose algorithm involving calculation of biochemical properties and consideration of multiple constraints. This tool uses a simple multi-criteria decision analysis (MCDA) approach to automate primer design, saving time and reducing the manual design effort required for molecular cloning.

Overview

A containerized, full-stack webapp that designs oligonucleotide primers for PCR amplification of an input list of amplicon sequences.
- For streamlined deployment, simply install Docker, download the source code, modify one local file for login credentials, and run the app with a single terminal command.
The app features two web interfaces. The first is for uploading CSVs, viewing/downloading results, and triggering MySQL database tracking. The second serves as a database admin webpage for database management and running SQL queries.
- The app requires a two-column CSV file with headers 'amplicon name' and 'sequence' as input, and returns a zip file containing the input file, the scored list of all primer options considered, and the subsetted list of top-ranked optimal primers for each amplicon.
- The MySQL data model contains 3 tables (submissions, amplicons, and primers_all_options), and 1 filtered view (optimal_primers)
  - See init.sql file for more details
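The input format described above can be checked before any primer design is attempted. The following is a minimal sketch using only the Python standard library, not the app's actual validation code; the function name `read_amplicons` and the choice to raise `ValueError` are assumptions made for illustration.

```python
import csv
import io

# Header names taken from the input format described in this README.
REQUIRED_HEADERS = ["amplicon name", "sequence"]


def read_amplicons(csv_text):
    """Parse CSV text and return a list of (amplicon name, sequence) tuples.

    Hypothetical helper: raises ValueError when the header row or any data
    row does not match the expected two-column layout.
    """
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        raise ValueError("input file is empty")
    headers = [h.strip() for h in rows[0]]
    if headers != REQUIRED_HEADERS:
        raise ValueError(f"expected headers {REQUIRED_HEADERS}, got {headers}")
    amplicons = []
    for line_number, row in enumerate(rows[1:], start=2):
        if len(row) != 2:
            raise ValueError(f"row {line_number} has {len(row)} columns, expected 2")
        amplicons.append((row[0].strip(), row[1]))
    return amplicons
```

Calling `read_amplicons(open("data/example_input_file.csv").read())` would return one tuple per amplicon, or fail early with a readable message if the headers are wrong.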

Skills Highlighted

Tech stack: Docker (docker-compose & Dockerfile), Flask (python, pandas, Biopython, HTML, jinja, session), MySQL (database configuration with DDL, data loading with python mysql-connector, and querying with SQL)
Molecular biology: Encoding a complex biological formula (Modified Breslauer Melting Temperature) into python, forward & reverse primer generation through multi-criteria decision analysis (MCDA) considering Tm, GC%, and presence or absence of a GC clamp
Best practices: containerization & dependency management (docker-compose, Dockerfile, requirements.txt), secrets management (.env file), input validation (try/except), error handling (error.html with helpful error messages), comments & documentation (docstrings, inline comments, detailed README.md)

Project File Structure

/bulk-primer-designer/
|-- /data/ (for example files, output zip files, and db files)
|   |-- (/mysql_data/) - not in repo, but automatically generated upon initialization
|   |-- empty_template_input_file.csv
|   |-- example_input_file.csv
|-- /readme_resources/ 
|   |-- various .png files referenced in README
|-- /webapp/
|   |-- /templates/
|       |-- error.html
|       |-- index.html
|       |-- success.html
|   |-- app.py
|   |-- Dockerfile
|   |-- load_database.py
|   |-- primer_designer.py
|   |-- requirements.txt
|-- .gitignore
|-- docker-compose.yml
|-- env.txt (MUST BE CHANGED TO .ENV LOCALLY; REPLACE DEFAULT VALUES)
|-- init.sql
|-- LICENSE
|-- README.md

Requirements

Supported architectures: amd64, arm64v8
Ensure Docker is installed - follow the OS-specific Docker installation documentation
- Docker Desktop is recommended for macOS & Windows.
- Docker Desktop (or Docker Engine & Docker Compose) installed on 64-bit Linux should also work, but this has not yet been tested.

Installation

Clone or download this repo to copy the required files & file structure locally
- If git is configured locally, open a terminal window and run: git clone --depth 1 https://github.com/ron-yadin/bulk-primer-designer.git
- Alternatively, click green Code button > Download ZIP, then unzip the file locally
Convert the env.txt file into a configured .env file locally
- rename the env.txt template file as .env, and replace the default user name, passwords, database name, and flask secret key with custom secret values
- this is a security best practice to avoid publication of sensitive login information - the .env is in the .gitignore file, and will not be included in version control
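One possible way to produce a custom value for the Flask secret key (or a password) is Python's standard `secrets` module. This is only a sketch: the helper name `make_secret` is hypothetical, and the exact variable names to fill in are whatever env.txt defines.

```python
import secrets


def make_secret(n_bytes=32):
    """Return a random hex string (2 characters per byte) suitable for a secret value."""
    return secrets.token_hex(n_bytes)


# Print one value; paste it into the .env file in place of a default.
print(make_secret())
```

Run it once per secret so that each entry in the .env file gets a different value.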

Usage

Ensure Docker daemon is running locally (by starting Docker Desktop, for example)
- Can verify by running docker run hello-world in a terminal window and confirming "Hello from Docker!" message displayed.
Open a terminal window
Navigate to the project folder using cd local/path/to/bulk-primer-designer
- Update the path to match project folder location in local file system
Run the command: docker-compose up --build
- Optionally, add the -d flag to run in "detached mode" (in the background)
- The app takes ~20-40 seconds to initialize, perhaps longer upon the first use.
  - Typically this is how the final initialization line in the log looks, indicating the webapp is ready for use:
```
bulk-primer-designer-mysql-1 | 2023-12-31T20:22:02.531310Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.2.0'  socket: '/var/run/mysqld/mysqld.sock'  port: 3306  MySQL Community Server - GPL.
```
Open a web browser and visit localhost:5001 for the webapp user interface
- There will be a description, instructions, links to download an example input file & empty input file template, and a form to submit an input csv and submitter name.
- Upon submission and successful execution, a redirect to the successful results display page will occur, with a link to download the output zip file
Open a web browser and visit localhost:8080 for the MySQL database administration interface
- use the user name (MYSQL_USER) & password (MYSQL_PASSWORD) configured in the .env file to sign into the MySQL admin dashboard
- To inspect & query tables, click the database name (MYSQL_DATABASE) in the left panel. The tables will be listed, and the "SQL" option in the top navigation bar opens a box for entering queries
  - example SQL queries to run:
```
SELECT *
FROM submissions s
LEFT JOIN amplicons a ON s.submission_id = a.submission_id
LEFT JOIN optimal_primers op ON a.amplicon_id = op.amplicon_id;
```
```
SELECT *
FROM submissions s
LEFT JOIN amplicons a ON s.submission_id = a.submission_id
LEFT JOIN primers_all_options pao ON a.amplicon_id = pao.amplicon_id;
```
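To experiment with the join pattern above without the running stack, the sketch below rebuilds a toy version of the data model in an in-memory SQLite database. SQLite is a stand-in here, not what the app uses, and every column other than submission_id and amplicon_id (submitter, amplicon_name, direction, primer_sequence, primer_rank) is a hypothetical placeholder; the real schema lives in init.sql.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with toy tables and a rank-1 filtered view."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE submissions (submission_id INTEGER PRIMARY KEY, submitter TEXT);
        CREATE TABLE amplicons (
            amplicon_id INTEGER PRIMARY KEY, submission_id INTEGER, amplicon_name TEXT);
        CREATE TABLE primers_all_options (
            primer_id INTEGER PRIMARY KEY, amplicon_id INTEGER,
            direction TEXT, primer_sequence TEXT, primer_rank INTEGER);
        -- Assumed definition: the view keeps only the top-ranked option per group.
        CREATE VIEW optimal_primers AS
            SELECT * FROM primers_all_options WHERE primer_rank = 1;
        INSERT INTO submissions VALUES (1, 'demo');
        INSERT INTO amplicons VALUES (1, 1, 'geneA');
        INSERT INTO primers_all_options VALUES
            (1, 1, 'forward', 'ATGGCTAGCTAGGATCCGC', 1),
            (2, 1, 'forward', 'ATGGCTAGCTAGGATCCGCA', 2),
            (3, 1, 'reverse', 'TTAGCGGCCGCATTAGGCG', 1);
    """)
    return conn


def optimal_rows(conn):
    """Join submissions to amplicons to the optimal_primers view."""
    query = """
        SELECT s.submitter, a.amplicon_name, op.direction, op.primer_sequence
        FROM submissions s
        LEFT JOIN amplicons a ON s.submission_id = a.submission_id
        LEFT JOIN optimal_primers op ON a.amplicon_id = op.amplicon_id
        ORDER BY op.primer_id
    """
    return conn.execute(query).fetchall()
```

In this toy data, `optimal_rows(build_demo_db())` returns only the two rank-1 rows, which illustrates why querying the view is more convenient than filtering primers_all_options by hand.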

To stop the app:
- if running in "streaming mode" (without the "detached mode" flag -d), with the terminal window selected, press Ctrl+C, followed by running the command docker-compose down -v.
- If running in "detached mode", simply run the command docker-compose down -v in the terminal.
- Note: the data in the MySQL database persists between container restarts in a locally-mounted docker volume, automatically generating the /data/mysql_data directory
  - IMPORTANT: deleting the data/mysql_data/ directory will delete the data and reset the database upon re-initialization.

Notes

In addition to containing the data definition language (DDL) that configures the MySQL data model, the init.sql file includes the insertion of a few rows of example data upon initialization, to ensure the tables get created even in the absence of any user action.
- These lines can be removed if no example data is desired.
Successful submission and execution of the app generates a link to download the results zipfile, which is also saved to the /data/ directory.
- This results zip file contains the input file, the scored list of all primer options considered, and the subsetted list of top-ranked optimal primers for each amplicon
- The filename is YYYY-MM-DD_HH:MM:SS_<input_filename>.zip
All timestamps are localized to California time.
Input amplicon sequences are cleaned up after import, so "sequence" values are not case sensitive, and newline characters ("\n"), return characters ("\r"), and spaces are removed (i.e. copy+paste from a FASTA file is OK)
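The cleanup described in the last note could look roughly like the sketch below. The function name `clean_sequence` is hypothetical, and normalizing to uppercase is an assumption about how case insensitivity is achieved.

```python
def clean_sequence(raw):
    """Strip newlines, carriage returns, and spaces, then normalize case.

    Uppercasing is an assumed normalization; the README only states that
    input is not case sensitive.
    """
    for unwanted in ("\n", "\r", " "):
        raw = raw.replace(unwanted, "")
    return raw.upper()
```

For example, a sequence pasted across several lines of a FASTA record collapses into a single uninterrupted string of bases.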

Primer Design Logic Explained

For each input amplicon sequence:

All 8 sequence options from 19 to 26 base pairs in length are generated for both the forward and reverse primers
Each option is assessed for a GC clamp (terminal C or G), and assigned a binary GC clamp score of 0 or 1.
GC% is calculated for each primer option
Melting temperature is calculated for each primer option with a custom function (a modified Breslauer formula)
The distance between the primer option melting temperature and the target melting temperature of 62°C is calculated
The distance between the primer option GC% and the target GC% of 50% is calculated
A normalized melting temperature score & normalized GC% score (each between 0 and 1) is calculated for each option, based on the range within each primer option group
The total score is calculated by summing the GC clamp score, the normalized GC% score, and 2x the normalized melting temperature score (giving it a 2X weight, reflecting its importance for successful amplification)
The primer options are given a rank based on their total score within each primer group - this dataframe is returned as the all_options_ranked_df
A filtered subset of all the rank #1 options is then created - this dataframe is returned as the optimal_primer_results_df
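The steps above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the code in primer_designer.py: the melting temperature uses a simple GC-content approximation as a stand-in for the modified Breslauer calculation, the GC clamp is checked at the 3' end, options closer to the target are assumed to score higher, and all function names are hypothetical. Amplicons are assumed to be at least 26 bases long and already cleaned.

```python
TARGET_TM = 62.0  # degrees C, target from the README
TARGET_GC = 50.0  # percent, target from the README
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq):
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def simple_tm(seq):
    # Stand-in only: a common GC-content approximation for oligo Tm,
    # NOT the modified Breslauer formula the app uses.
    gc_count = sum(base in "GC" for base in seq)
    return 64.9 + 41.0 * (gc_count - 16.4) / len(seq)


def closeness_scores(distances):
    """Min-max normalize distances within a group so the closest option gets 1.0."""
    lo, hi = min(distances), max(distances)
    if hi == lo:
        return [1.0] * len(distances)
    return [1.0 - (d - lo) / (hi - lo) for d in distances]


def rank_primer_options(amplicon, direction):
    """Score and rank the 8 candidate primers (19 to 26 bases) for one direction."""
    template = amplicon if direction == "forward" else reverse_complement(amplicon)
    options = [template[:length] for length in range(19, 27)]
    tm_scores = closeness_scores([abs(simple_tm(p) - TARGET_TM) for p in options])
    gc_scores = closeness_scores([abs(gc_percent(p) - TARGET_GC) for p in options])
    scored = []
    for primer, tm_score, gc_score in zip(options, tm_scores, gc_scores):
        clamp = 1 if primer[-1] in "GC" else 0  # assumed: clamp at the 3' end
        total = clamp + gc_score + 2 * tm_score  # Tm carries a 2X weight
        scored.append({"primer": primer, "gc_clamp": clamp, "total": total})
    scored.sort(key=lambda row: row["total"], reverse=True)
    for rank, row in enumerate(scored, start=1):
        row["rank"] = rank
    return scored
```

The first element of the returned list is the rank #1 option for that primer group; collecting it for the forward and reverse directions of every amplicon mirrors the optimal-primer subset described above.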

Troubleshooting

This error may appear if Requirements step #1 was overlooked, and installation is attempted on an unsupported architecture

no matching manifest for <non-supported/system/architecture> in the manifest list entries

This error may appear if Installation step #2 was skipped or carried out incorrectly - ensure the env.txt file was converted successfully to a .env file, and is present in the project directory

Failed to load /Users/<username>/repos/bulk-primer-designer/.env: open /Users/<username>/repos/bulk-primer-designer/.env: no such file or directory

If issues arise in connecting to or executing actions with either user webapp or MySQL admin webapp, examine the container status & docker logs
- In a new terminal window, run docker ps. There should be 3 containers running
- If running in "detached mode", run docker logs <container id> for each container and check whether any exited with an error code. An exit code would indicate an issue, and the surrounding log lines provide information for further troubleshooting.
- If running in "streaming mode" (not detached), simply examine the logs in the terminal window for such error codes.
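The container check above can also be scripted. The sketch below is one possible approach, assuming the docker CLI is on the PATH; it lists all containers (including stopped ones) with `docker ps --all --format` and flags any whose status does not start with "Up". The function names are hypothetical.

```python
import subprocess


def parse_docker_ps(output):
    """Turn tab-separated 'name<TAB>status' lines into a {name: status} dict."""
    statuses = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, _, status = line.partition("\t")
        statuses[name] = status
    return statuses


def not_running(statuses):
    """Return the names of containers whose status does not start with 'Up'."""
    return [name for name, status in statuses.items() if not status.startswith("Up")]


def current_statuses():
    """Query the local Docker daemon; requires Docker to be installed and running."""
    result = subprocess.run(
        ["docker", "ps", "--all", "--format", "{{.Names}}\t{{.Status}}"],
        capture_output=True, text=True, check=True,
    )
    return parse_docker_ps(result.stdout)
```

If `not_running(current_statuses())` returns any names, those are the containers whose logs are worth reading first.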

Known Limitations & Next Steps

This is an early-stage, example project designed for the purpose of demonstrating basic data engineering & molecular biology skills as a part of my public portfolio - it has not been exhaustively tested, and there are certainly unhandled failure modes and error states that you may encounter while using.
- While I plan to continue improving this tool's robustness (feel free to file issues on github if bugs are encountered!), please be advised that this is meant for demonstration purposes only.
- The troubleshooting section is not comprehensive, and until further details can be added, it is recommended to enter any error messages encountered into Google or an LLM chatbot for further troubleshooting assistance.
Future steps include:
- Adding more comprehensive input validation checks & appropriate error handling
  - For instance: confirming that all input sequences consist only of the 4 valid DNA bases
- Developing a testing harness of unit & integration tests
- Configuring a CI/CD pipeline to robustly protect the functional main branch
- Using Terraform (or another Infrastructure as Code platform) to provision cloud services sufficient to host this webapp in the cloud, and make it securely available over the open internet.
