Bioinformatics and Computational Molecular Biology
Algorithms and Hardware
Computer Code
C code and results files in support of the paper "Parallelization of Homology Search in Bioinformatics by Constraint of Substitution Matrices" which is currently under review.
Matlab code and database file in support of the paper "Covariance Searches for ncRNA Gene Finding" which is currently under review.
Matlab code and results files in support of the paper "Protein Classification Algorithm Suitable for Acceleration with Multimedia Instructions" which is currently under review.
Fuzzy-Based Protein Family Classification
Hydrophobicity-Correlation-Based Protein Family Classification
The following is only the core of a Smith-Waterman assembly language program for 32-bit (non-Thumb) ARM processors. It requires an ARM9 or later processor (it will not run on an ARM7). Its primary purpose is to determine the rate of data cache block misses when running the Smith-Waterman algorithm on an ARM922T CPU core. This processor core is designed to be part of a system-on-a-chip (SoC).
Smith-Waterman Assembly Code for ARM 32-bit
The following Matlab program takes a Protein Data Bank (PDB) file and converts it to a simpler format with one amino acid per row. Each row contains four entries: the one-letter amino acid code, followed by the x, y, and z coordinates of that residue's alpha carbon.
Protein Data Bank (PDB) to Simple Format Matlab Code
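The conversion described above can be sketched in a few lines. The following is a Python illustration, not the Matlab program linked on this page. It assumes the standard fixed-column layout of PDB ATOM records (atom name in columns 13-16, residue name in columns 18-20, coordinates in columns 31-54). The function name pdb_to_simple and the choice to map unknown residues to "X" are hypothetical.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def pdb_to_simple(pdb_lines):
    """Return one (code, x, y, z) tuple per alpha carbon in the ATOM records.

    Hypothetical helper: assumes fixed-column PDB ATOM records.
    """
    rows = []
    for line in pdb_lines:
        # Only ATOM records that are long enough to hold the coordinates.
        if not line.startswith("ATOM") or len(line) < 54:
            continue
        # Columns 13-16 (0-based slice 12:16) hold the atom name.
        if line[12:16].strip() != "CA":
            continue
        # Column 17 is the alternate-location flag; keep only the first one.
        if line[16] not in (" ", "A"):
            continue
        # Unknown residue names map to "X" (an assumption of this sketch).
        code = THREE_TO_ONE.get(line[17:20].strip(), "X")
        x = float(line[30:38])
        y = float(line[38:46])
        z = float(line[46:54])
        rows.append((code, x, y, z))
    return rows


# Made-up sample records (not taken from a real PDB entry).
sample = [
    "ATOM      1  N   GLY A   1      10.000  12.500   1.250",
    "ATOM      2  CA  GLY A   1      11.104  13.207   2.100",
    "ATOM      8  CA  SER A   2      14.250  11.020  -0.375",
]
for code, x, y, z in pdb_to_simple(sample):
    print(f"{code} {x:.3f} {y:.3f} {z:.3f}")
# prints:
# G 11.104 13.207 2.100
# S 14.250 11.020 -0.375
```

Slicing by column rather than splitting on whitespace matters here, because adjacent PDB fields can run together without a separating space.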
This code implements the Smith-Waterman algorithm in Matlab. The input file should be in FASTA format and contain two sequences. A sample input file and a sample output file are given. Edit the Matlab file to change the gap start penalty (g), the gap continuation penalty (c), or the name of the input/output file (fname), or to enter a substitution matrix other than the PAM250 matrix currently in the program. The output file name will be the same as the input file name, except with "_aligned" added to it.
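As a rough illustration of how the two gap parameters enter the computation, here is a Python sketch of Smith-Waterman local alignment scoring with affine gaps. It is not the Matlab program described above. It assumes that a gap of length k costs g + (k - 1) * c, and it uses a simple match/mismatch scoring function in place of the PAM250 matrix. The names parse_fasta, smith_waterman_score, and simple_score are hypothetical, and only the best score is computed (no traceback).

```python
def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([line[1:], ""])
        elif records:
            # Sequence data may span several lines; join them.
            records[-1][1] += line
    return [(header, seq) for header, seq in records]


def simple_score(x, y):
    """Stand-in for a substitution matrix: +2 for a match, -1 otherwise."""
    return 2 if x == y else -1


def smith_waterman_score(a, b, score, g, c):
    """Best local alignment score of a and b with affine gap penalties.

    Assumption: a gap of length k costs g + (k - 1) * c.
    """
    neg_inf = float("-inf")
    n, m = len(a), len(b)
    # H: best score of an alignment ending at (i, j).
    # E: best score ending at (i, j) with a gap in a (horizontal move).
    # F: best score ending at (i, j) with a gap in b (vertical move).
    H = [[0] * (m + 1) for _ in range(n + 1)]
    E = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    F = [[neg_inf] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Either open a new gap (cost g) or extend an existing one (cost c).
            E[i][j] = max(H[i][j - 1] - g, E[i][j - 1] - c)
            F[i][j] = max(H[i - 1][j] - g, F[i - 1][j] - c)
            diag = H[i - 1][j - 1] + score(a[i - 1], b[j - 1])
            # The 0 lets a local alignment restart anywhere.
            H[i][j] = max(0, diag, E[i][j], F[i][j])
            best = max(best, H[i][j])
    return best


fasta_text = ">seq1\nACGTACGT\n>seq2\nACGTGGACGT\n"
(_, s1), (_, s2) = parse_fasta(fasta_text)
print(smith_waterman_score(s1, s2, simple_score, g=3, c=1))  # prints 12
print(smith_waterman_score(s1, s2, simple_score, g=3, c=3))  # prints 10
```

In the example, the eight matching residues score 16, and the two-residue gap costs 3 + 1 = 4 when continuation is cheap, or 3 + 3 = 6 when it is as expensive as opening a gap. Lowering c relative to g therefore favors one long gap over several short ones.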
The code below implements, in Matlab, the Viterbi algorithm for finding the best fit of a target sequence to a hidden Markov model. Two sample model files are given: a null model and a reasonable model. Both sample models have 4 stages, and the string that best fits the reasonable model is "GSDG". The null model has equal probabilities on all transitions and equal probabilities for all amino acids at every match/mutate state. A sample target string and sample output are also given.
Sample 4-stage null model file
Sample model file with best fit to GSDG
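The Viterbi recursion itself can be sketched compactly. The Python below is an illustration only. It is not the Matlab code, and it does not read the model files listed above, whose format is not described on this page. It runs Viterbi in log space on a generic hidden Markov model. The toy model is hypothetical: four left-to-right states M1 to M4 that prefer G, S, D, and G, each with a self-loop, and with made-up probabilities (0.81 for the preferred residue, 0.01 for each other residue, 0.8 to advance, 0.2 to stay).

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best state path, its log probability) for a non-empty obs."""
    # V[t][s]: best log probability of any path that ends in s at position t.
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Pick the predecessor that maximizes the path score into s.
            prev = max(states, key=lambda p: V[t - 1][p] + log_trans[p][s])
            V[t][s] = V[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, V[-1][last]


# Hypothetical 4-stage toy model (all numbers are made up for illustration).
states = ["M1", "M2", "M3", "M4"]
preferred = dict(zip(states, "GSDG"))
log_start = {s: log(1.0 if s == "M1" else 0.0) for s in states}
log_trans = {p: {s: log(0.0) for s in states} for p in states}
for k in range(3):
    log_trans[states[k]][states[k]] = log(0.2)      # stay in the same stage
    log_trans[states[k]][states[k + 1]] = log(0.8)  # advance one stage
log_trans["M4"]["M4"] = log(1.0)
log_emit = {
    s: {a: log(0.81 if a == preferred[s] else 0.01) for a in AMINO_ACIDS}
    for s in states
}

path, score = viterbi("GSSDG", states, log_start, log_trans, log_emit)
print(path)  # prints ['M1', 'M2', 'M2', 'M3', 'M4']
```

Working in log space turns products of small probabilities into sums, which avoids numerical underflow on longer target sequences. In the example, the extra S is absorbed by the self-loop on M2 because emitting S from M3 or M4 would be far less likely.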
Boise State University College of Engineering
Boise State University Department of Electrical and Computer Engineering
This page created by Dr. Scott F. Smith
This page was last updated on 07 May 2006.