Application Architecture


Although currently engineered for a Windows environment, this is not inherently a Windows program, and the underlying ideas should be easily transferable to other operating systems. The code itself should port to a Unix/Sybase system with only small changes because of the well-known similarities between SQL Server and Sybase. Porting to other combinations of OS and database server may require some adaptation of the CGI/database communication layer.


The main function of this family of programs is to retrieve non-sequence data on the basis of sequence similarity. The approach works for data that is strongly associated with a specific DNA or protein sequence, often, but not necessarily, a gene. A managing database stores the relationship between retrievable data and sequence IDs, and a BLAST search is used to find the data items whose sequences are similar to a user's query sequence. The non-sequence data is then returned directly to the user.
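The relationship between sequence IDs and retrievable data can be pictured with a small C++ sketch. This is only an illustration: the application keeps this mapping in database tables, whereas the sketch uses an in-memory std::multimap as a stand-in, and the names SequenceIndex and itemsForHits, as well as the example IDs and file names in the usage note, are hypothetical.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the managing database:
// each sequence ID maps to one or more non-sequence data items.
using SequenceIndex = std::multimap<std::string, std::string>;

// Given the sequence IDs that a BLAST search reported as hits,
// collect the data items associated with those IDs, in hit order.
std::vector<std::string> itemsForHits(const SequenceIndex& index,
                                      const std::vector<std::string>& hitIds) {
    std::vector<std::string> items;
    for (const std::string& id : hitIds) {
        const auto range = index.equal_range(id);
        for (auto it = range.first; it != range.second; ++it) {
            items.push_back(it->second);
        }
    }
    return items;
}
```

For example, with an index holding {"seq1", "image_17.png"}, {"seq1", "notes_17.txt"} and {"seq2", "image_42.png"}, the hit list {"seq2", "seq1"} would yield image_42.png followed by the two seq1 items.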


The main web component of the project is a classic, stateless CGI program, written in C++. On the client side it communicates with the user's browser via a web server (Apache in this case). On the server side it communicates with a database server (SQL Server in this case) and, via a DOS system call, with NCBI BLAST.
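As a rough illustration of what "stateless CGI" means in C++, the sketch below parses a URL-encoded query string and writes a CGI response to an output stream. It is not taken from the application code: the function names, the form field name "seq", and the assumption that the form is submitted with GET (so the data arrives in the QUERY_STRING environment variable) are all hypothetical. A POST form would read the same encoding from standard input instead.

```cpp
#include <cctype>
#include <cstdlib>
#include <iostream>
#include <map>
#include <sstream>
#include <string>

// Decode '+' and %XX escapes in a URL-encoded form value.
std::string urlDecode(const std::string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const bool isEscape = in[i] == '%' && i + 2 < in.size()
            && std::isxdigit(static_cast<unsigned char>(in[i + 1]))
            && std::isxdigit(static_cast<unsigned char>(in[i + 2]));
        if (in[i] == '+') {
            out += ' ';
        } else if (isEscape) {
            out += static_cast<char>(std::stoi(in.substr(i + 1, 2), nullptr, 16));
            i += 2;  // skip the two hex digits just consumed
        } else {
            out += in[i];
        }
    }
    return out;
}

// Split "key=value&key=value" into a map of decoded fields.
std::map<std::string, std::string> parseQueryString(const std::string& qs) {
    std::map<std::string, std::string> params;
    std::istringstream stream(qs);
    std::string pair;
    while (std::getline(stream, pair, '&')) {
        if (pair.empty()) continue;
        const std::size_t eq = pair.find('=');
        const std::string key = urlDecode(pair.substr(0, eq));
        const std::string value =
            (eq == std::string::npos) ? std::string() : urlDecode(pair.substr(eq + 1));
        params[key] = value;
    }
    return params;
}

// Handle one request: emit a CGI header and an HTML body.
// Nothing is remembered between calls, so each request must carry all its data.
void handleRequest(const std::string& queryString, std::ostream& out) {
    const std::map<std::string, std::string> form = parseQueryString(queryString);
    const auto seq = form.find("seq");  // "seq" is a hypothetical field name
    const std::size_t length = (seq == form.end()) ? 0 : seq->second.size();
    out << "Content-Type: text/html\r\n\r\n";
    out << "<html><body><p>Received " << length << " residues.</p></body></html>\n";
}
```

In a real CGI executable, main would read the string with std::getenv("QUERY_STRING"), substitute an empty string if the variable is unset, and call handleRequest with std::cout.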

The program operates as follows. A query sequence from a user's browser is passed by Apache to the CGI program. The CGI program determines the appropriate BLAST command and target database(s), wraps up the user's sequence in a temporary .fasta file, and uses a DOS system call to run the BLAST command. The output of BLAST is redirected into a text file, which is parsed and loaded into a database table. The CGI program uses the BLAST data and any user input to determine which non-sequence data to return to the user's browser. See Figure-1.pdf for a schematic overview.
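The FASTA and system-call steps described above can be sketched as follows. This is a minimal illustration, not the application's own code: it assumes the legacy NCBI blastall executable (with its -p, -d and -i options) is on the path, and the function names, file names and the record ID "query" are hypothetical.

```cpp
#include <cassert>
#include <cstdlib>
#include <fstream>
#include <string>

// Format a sequence as a FASTA record, wrapping residues at `width` columns.
std::string toFasta(const std::string& id, const std::string& sequence,
                    std::size_t width = 60) {
    assert(width > 0);  // a zero width would never advance through the sequence
    std::string out = ">" + id + "\n";
    for (std::size_t pos = 0; pos < sequence.size(); pos += width) {
        out += sequence.substr(pos, width);
        out += '\n';
    }
    return out;
}

// Build a legacy `blastall` command line whose output is redirected to a file.
// Assumption: -p selects the program, -d the database, -i the query file.
std::string buildBlastCommand(const std::string& program,
                              const std::string& database,
                              const std::string& queryPath,
                              const std::string& outputPath) {
    return "blastall -p " + program + " -d " + database +
           " -i " + queryPath + " > " + outputPath;
}

// Write the query to a temporary FASTA file, then run BLAST through the shell.
// Returns -1 if the file cannot be written, otherwise whatever std::system reports.
int runBlast(const std::string& sequence, const std::string& program,
             const std::string& database, const std::string& queryPath,
             const std::string& outputPath) {
    std::ofstream fasta(queryPath);
    if (!fasta) return -1;
    fasta << toFasta("query", sequence);
    fasta.close();
    const std::string command =
        buildBlastCommand(program, database, queryPath, outputPath);
    return std::system(command.c_str());
}
```

Because the command is run through the shell, only the sequence should come from the user, and it travels inside the file rather than on the command line. The program and database names are best chosen by the CGI program from a fixed list.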

The C++ CGI code is compiled to make the executable search.exe (or whatever you wish to call it). This is put in the Apache cgi-bin directory, along with the BLAST binaries and matrices, and the BLAST databases relevant to your project. The database tables must be populated with your data, the functions and stored procedures built, and the database server and databases configured with the relevant permissions. The application login pages make an initial call to the CGI program, which then generates further pages on the fly as required.


There are several types and groups of code in the application. The code is laid out in folders to indicate the relationship between the different groups. Click on a link to go to any of these code folders (Apache application folders will open as index.html, and DOS batch files may attempt to run).

C++ code
application code
library code
library include files
stored procedures


Click on the following link to download a Zip file of the directory structure and code described above: download.