- In metabolic engineering, optimizing biosynthetic pathways often involves comparing the performance of many isozymes for specific pathway reactions. This requires curating a list of genes, designing a unique set of oligonucleotide primers for each, and using PCR amplification to generate sufficient material for molecular cloning & strain engineering.
- The primer design process can be time-consuming and repetitive, and it ultimately boils down to a loose algorithm involving calculation of biochemical properties and consideration of multiple constraints. This tool uses a simple multi-criteria decision analysis (MCDA) approach to automate primer design, saving time and reducing the manual design effort required for molecular cloning.
A containerized, full-stack webapp that designs oligonucleotide primers for PCR amplification of a user-supplied list of amplicon sequences.
- For streamlined deployment, simply install Docker, download the source code, modify one local file for login credentials, and run the app with a single terminal command.
The app features two web interfaces. The first is for uploading CSVs, viewing and downloading results, and triggering MySQL database tracking. The second is a database admin webpage for managing the database and running SQL queries.
- The app requires a two-column CSV file with headers 'amplicon name' and 'sequence' as input, and returns a zip file containing the input file, the scored list of all primer options considered, and the subsetted list of top-ranked optimal primers for each amplicon.
- The MySQL data model contains 3 tables (submissions, amplicons, and primers_all_options), and 1 filtered view (optimal_primers)
  - See the `init.sql` file for more details
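As an illustration of the two-column input format described above, the following minimal sketch builds a valid input CSV with Python's standard `csv` module. The helper name `build_input_csv`, the amplicon names, and the placeholder sequences are hypothetical; the exact header spelling should be checked against `example_input_file.csv` in the repo.

```python
import csv
import io

# Placeholder rows for illustration only (made-up names and sequences).
amplicons = [
    ("example_amplicon_1", "ATGC" * 15),
    ("example_amplicon_2", "GGCATTAC" * 8),
]


def build_input_csv(rows):
    """Return CSV text with the 'amplicon name' and 'sequence' headers."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["amplicon name", "sequence"])  # headers as stated in this README
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = build_input_csv(amplicons)
```

Writing `csv_text` to a `.csv` file should give a file in the shape the upload form expects, assuming the headers match the example file.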
- Tech stack: Docker (docker-compose & Dockerfile), Flask (python, pandas, Biopython, HTML, jinja, session), MySQL (database configuration with DDL, data loading with python mysql-connector, and querying with SQL)
- Molecular biology: Encoding a complex biological formula (Modified Breslauer Melting Temperature) into python, forward & reverse primer generation through multi-criteria decision analysis (MCDA) considering Tm, GC%, and presence or absence of a GC clamp
- Best practices: containerization & dependency management (docker-compose, Dockerfile, requirements.txt), secrets management (.env file), input validation (try/except), error handling (error.html with helpful error messages), comments & documentation (docstrings, inline comments, detailed README.md)
/bulk-primer-designer/
|-- /data/ (for example files, output zip files, and db files)
| |-- (/mysql_data/) - not in repo, but automatically generated upon initialization
| |-- empty_template_input_file.csv
| |-- example_input_file.csv
|-- /readme_resources/
| |-- various .png files referenced in README
|-- /webapp/
| |-- /templates/
| | |-- error.html
| | |-- index.html
| | |-- success.html
| |-- app.py
| |-- Dockerfile
| |-- load_database.py
| |-- primer_designer.py
| |-- requirements.txt
|-- .gitignore
|-- docker-compose.yml
|-- env.txt (MUST BE CHANGED TO .ENV LOCALLY; REPLACE DEFAULT VALUES)
|-- init.sql
|-- LICENSE
|-- README.md
- Supported architectures: `amd64`, `arm64v8`
- Ensure Docker is installed - follow OS-specific links to docker installation documentation
- Clone or download this repo to copy the required files & file structure locally
  - If Git is configured locally, open a terminal window and run: `git clone --depth 1 https://github.com/ron-yadin/bulk-primer-designer.git`
  - Alternatively, click the green `Code` button > `Download ZIP`, then unzip the file locally
- Convert the `env.txt` file into a configured `.env` file locally
  - Rename the `env.txt` template file as `.env`, and replace the default user name, passwords, database name, and Flask secret key with custom secret values
  - This is a security best practice to avoid publishing sensitive login information
  - The `.env` file is listed in `.gitignore` and will not be included in version control
- Ensure Docker daemon is running locally (by starting Docker Desktop, for example)
  - To verify, run `docker run hello-world` in a terminal window and confirm that the "Hello from Docker!" message is displayed.
- Open a terminal window
- Navigate to the project folder using `cd local/path/to/bulk-primer-designer`
  - Update the path to match the project folder location in the local file system
- Run the command: `docker-compose up --build`
  - Optionally, add the `-d` flag to run in "detached mode" (in the background)
  - The app takes ~20-40 seconds to initialize, perhaps longer on first use.
  - The final initialization line in the log typically looks like this, indicating the webapp is ready for use:
    `bulk-primer-designer-mysql-1 | 2023-12-31T20:22:02.531310Z 0 [System] [MY-010931] [Server] /usr/sbin/mysqld: ready for connections. Version: '8.2.0' socket: '/var/run/mysqld/mysqld.sock' port: 3306 MySQL Community Server - GPL.`
- Open a web browser and visit `localhost:5001` for the webapp user interface
  - There will be a description, instructions, links to download an example input file & empty input file template, and a form to submit an input CSV and submitter name.
  - Upon submission and successful execution, the app redirects to a results page with a link to download the output zip file.
- Open a web browser and visit `localhost:8080` for the MySQL database administration interface
  - Use the user name (`MYSQL_USER`) & password (`MYSQL_PASSWORD`) configured in the `.env` file to sign in to the MySQL admin dashboard
  - To inspect & query tables, click the database name (`MYSQL_DATABASE`) in the left panel. Tables will be shown, and the "SQL" option in the top navigation bar opens a box for entering queries
  - Example SQL queries to run:
    - `SELECT * FROM submissions s LEFT JOIN amplicons a on s.submission_id = a.submission_id LEFT JOIN optimal_primers op on a.amplicon_id = op.amplicon_id`
    - `SELECT * FROM submissions s LEFT JOIN amplicons a on s.submission_id = a.submission_id LEFT JOIN primers_all_options pao on a.amplicon_id = pao.amplicon_id`
- To stop the app:
  - If running in "streaming mode" (without the "detached mode" flag `-d`), select the terminal window and press `Ctrl+C`, then run the command `docker-compose down -v`.
  - If running in "detached mode", simply run the command `docker-compose down -v` in the terminal.
  - Note: the data in the MySQL database persists between container restarts in a locally mounted Docker volume, which automatically generates the /data/mysql_data directory
  - IMPORTANT: deleting the data/mysql_data/ directory will delete the data and reset the database upon re-initialization.
- In addition to containing the data definition language (DDL) that configures the MySQL data model, the `init.sql` file inserts a few rows of example data upon initialization, to ensure the tables get created even in the absence of any user action.
  - These lines can be removed if no example data is desired.
- Successful submission and execution of the app generates a link to download the results zipfile, which is also saved to the /data/ directory.
- This results zip file contains the input file, the scored list of all primer options considered, and the subsetted list of top-ranked optimal primers for each amplicon
- The filename is YYYY-MM-DD_HH:MM:SS_<input_filename>.zip
- All timestamps are localized to California time.
- Input amplicon sequences are cleaned up after import - so "sequence" values are not case sensitive, and newline characters ("\n"), return characters ("\r"), and spaces will be removed (i.e. copy+paste from fasta file is OK)
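The clean-up described in the last note can be sketched as follows. This is a minimal illustration, not the app's actual code: the function name `clean_sequence` is hypothetical, and normalizing to uppercase is an assumption about how case-insensitivity is handled.

```python
def clean_sequence(raw: str) -> str:
    """Strip newlines, carriage returns, and spaces, then uppercase the sequence."""
    for unwanted in ("\n", "\r", " "):
        raw = raw.replace(unwanted, "")
    return raw.upper()  # assumption: case is normalized to uppercase


# A sequence pasted from a FASTA file, with mixed case and line breaks.
cleaned = clean_sequence("atgc ggta\r\nccat\n")
print(cleaned)  # ATGCGGTACCAT
```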
For each input amplicon sequence:
- All 8 sequence options between 19 - 26 basepairs in length for both forward and reverse primers are generated
- Each option is assessed for a GC clamp (terminal C or G), and assigned a binary GC clamp score of 0 or 1.
- GC% is calculated for each primer option
- Melting temperature is calculated for each primer option with a custom function (the Modified Breslauer melting temperature formula)
- The distance between the primer option melting temperature and the target melting temperature of 62°C is calculated
- The distance between the primer option GC% and the target GC% of 50% is calculated
- A normalized melting temperature score & a normalized GC% score (each between 0 and 1) are calculated for each option, based on the range within each primer option group
- The total score is calculated by summing the GC clamp score, the normalized GC% score, and 2x the normalized melting temperature score (giving it a 2X weight, reflecting its importance for successful amplification)
- The primer options are ranked by total score within each primer group - this dataframe is returned as `all_options_ranked_df`
- A filtered subset of all the rank #1 options is then created - this dataframe is returned as `optimal_primer_results_df`
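The scoring steps above can be sketched in plain Python. This is a simplified illustration under several assumptions, not the code in `primer_designer.py`:

- All function names (`rank_primer_options`, `normalized_scores`, `placeholder_tm`) are hypothetical.
- `placeholder_tm` uses the simple Wallace rule as a stand-in, since the app's Modified Breslauer function is not reproduced here.
- A normalized score of 1 is assumed to mean "closest to target within the group".
- The GC clamp is assumed to be checked at the 3' end.
- Reverse primers are assumed to be taken from the reverse complement of the amplicon.
- Lists of dicts are used in place of pandas dataframes.

```python
def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gc_percent(seq):
    """Percentage of G and C bases in the sequence."""
    return 100.0 * sum(base in "GC" for base in seq) / len(seq)


def placeholder_tm(seq):
    """Stand-in Tm (Wallace rule); NOT the app's Modified Breslauer function."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def normalized_scores(distances):
    """Map distances to 0-1 scores within a group; the smallest distance scores 1."""
    low, high = min(distances), max(distances)
    if high == low:  # every option is equally far from the target
        return [1.0] * len(distances)
    return [1 - (d - low) / (high - low) for d in distances]


def rank_primer_options(template, tm_func=placeholder_tm,
                        target_tm=62.0, target_gc=50.0):
    """Score and rank the eight 19-26 nt prefixes of `template`."""
    options = [template[:n] for n in range(19, 27)]
    tm_scores = normalized_scores([abs(tm_func(p) - target_tm) for p in options])
    gc_scores = normalized_scores([abs(gc_percent(p) - target_gc) for p in options])
    rows = []
    for primer, tm_score, gc_score in zip(options, tm_scores, gc_scores):
        clamp = 1 if primer[-1] in "GC" else 0  # assumes the clamp is the 3' base
        rows.append({
            "primer": primer,
            "length": len(primer),
            "gc_clamp": clamp,
            # Tm carries a 2x weight, as described in the steps above.
            "total_score": clamp + gc_score + 2 * tm_score,
        })
    rows.sort(key=lambda row: row["total_score"], reverse=True)
    for rank, row in enumerate(rows, start=1):
        row["rank"] = rank
    return rows


# Arbitrary illustrative sequence (must be at least 26 bases long).
amplicon = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTC"
forward_ranked = rank_primer_options(amplicon)
reverse_ranked = rank_primer_options(reverse_complement(amplicon))
optimal = {"forward": forward_ranked[0], "reverse": reverse_ranked[0]}
```

Ties are broken here by sort order only. How the app itself handles tied scores is not stated in this README.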
- This error may appear if Requirements step #1 was overlooked, and installation is attempted on an unsupported architecture
no matching manifest for <non-supported/system/architecture> in the manifest list entries
- This error may appear if Installation step #2 was skipped or carried out incorrectly - ensure the `env.txt` file was successfully converted to a `.env` file, and is present in the project directory
Failed to load /Users/<username>/repos/bulk-primer-designer/.env: open /Users/<username>/repos/bulk-primer-designer/.env: no such file or directory
- If issues arise in connecting to or executing actions with either the user webapp or the MySQL admin webapp, examine the container status & docker logs
  - In a new terminal window, run `docker ps`. There should be 3 containers running
  - If running in "detached mode", run `docker logs <container id>` for each container and check whether any `exited with error code #`. This would indicate an issue, and provide information for further troubleshooting.
  - If running in "streaming mode" (not detached), simply examine the logs in the terminal window for such error codes.
- This is an early-stage, example project designed for the purpose of demonstrating basic data engineering & molecular biology skills as a part of my public portfolio - it has not been exhaustively tested, and there are certainly unhandled failure modes and error states that you may encounter while using.
- While I plan to continue improving this tool's robustness (feel free to file issues on github if bugs are encountered!), please be advised that this is meant for demonstration purposes only.
- The troubleshooting section is not comprehensive, and until further details can be added, it is recommended to enter any error messages encountered into Google or an LLM chatbot for further troubleshooting assistance.
- Future steps include:
- Add more comprehensive input validation checks & appropriate error handling
- For instance: confirming that all input sequences are comprised of the 4 valid DNA bases
- Develop a testing harness of unit & integration tests
- Configure a CI/CD pipeline to robustly protect the functional main branch
- Use Terraform (or another Infrastructure as Code platform) to provision cloud services sufficient to host this webapp in the cloud, and make it securely available over the open internet.
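As an example of the input validation proposed in the future steps, the following sketch checks that each sequence contains only the 4 valid DNA bases. It is a possible approach, not existing app code: the names `find_invalid_bases` and `validate_sequences` are hypothetical, and it assumes sequences have already been stripped of whitespace.

```python
VALID_BASES = frozenset("ACGT")


def find_invalid_bases(sequence: str) -> set:
    """Return the characters in `sequence` that are not A, C, G, or T."""
    return set(sequence.upper()) - VALID_BASES


def validate_sequences(rows):
    """Return a list of error messages for (name, sequence) pairs."""
    errors = []
    for name, sequence in rows:
        if not sequence:
            errors.append(f"{name}: sequence is empty")
            continue
        invalid = find_invalid_bases(sequence)
        if invalid:
            errors.append(f"{name}: invalid characters {sorted(invalid)}")
    return errors


errors = validate_sequences([
    ("ok_amplicon", "ATGCatgc"),
    ("bad_amplicon", "ATGNNX"),
])
print(errors)  # ["bad_amplicon: invalid characters ['N', 'X']"]
```

Returning every error at once, rather than raising on the first one, would let an error page list all problem rows in a single submission.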