Sequence alignment with NCBI-BLAST search

This post illustrates how to install the NCBI BLAST package on your local computer, how to install private sequence databases for BLAST search, how to run a BLAST search against those databases, and how to write a parallel Python script for running BLAST searches on a computer cluster.

BLAST installation

Install BLAST software

The NCBI BLAST tools can be obtained from the software download page. In particular, I downloaded the source code package ncbi-blast-2.2.31+-src.tar.gz from the NCBI FTP server.
Instructions for the BLAST toolkit can be found in the NCBI books.
I will install BLAST on a Linux-based machine; the following instructions work for that setup.
Pick any location on disk and unpack the .tar.gz file with the following command

tar -xzvf ncbi-blast-2.2.31+-src.tar.gz
Change to the newly created directory, then enter its c++ subdirectory and configure the C++ BLAST package with the following command

cd c++; ./configure
Compile the C++ code with the following command

cd ReleaseMT/build; make all_r

Compilation takes a long time, so be patient.

Install BLAST database

This section shows how to build our own BLAST database of protein sequences and align amino acid sequences against it. We will use the makeblastdb application for this.
Further instructions can be found on the NCBI website.
The current project focuses on transporter proteins, so we download the data file of all transporter proteins from the transporter protein database TCDB.
The usage of makeblastdb is also documented in the BLAST package itself and can be printed with the following command

makeblastdb -help
To build a BLAST database from an arbitrary FASTA file, I use the following command

./makeblastdb -in tcdb -parse_seqids -dbtype prot

However, make sure that no sequence name is duplicated in the FASTA file, since makeblastdb with -parse_seqids expects every sequence identifier to be unique.
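The duplicate check can be done before calling makeblastdb. The following is a minimal sketch, not part of the BLAST package: the function names fasta_ids and duplicated_ids are hypothetical, and it assumes that the identifier of a record is the first whitespace-delimited token after the ">" character.

```python
from collections import Counter


def fasta_ids(lines):
    """Yield the identifier of each FASTA record.

    Assumption: the identifier is the first whitespace-delimited
    token on the header line, after the leading '>'.
    """
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            yield header.split()[0] if header else ""


def duplicated_ids(lines):
    """Return the sorted list of identifiers that occur more than once."""
    counts = Counter(fasta_ids(lines))
    return sorted(seq_id for seq_id, n in counts.items() if n > 1)


# Small in-memory example; the sequences are made up for illustration.
records = [">seq1 first", "MKV", ">seq2", "MLL", ">seq1 again", "MAA"]
print(duplicated_ids(records))  # prints ['seq1']
```

To check the real data file, pass an open file handle instead of a list, for example duplicated_ids(open("tcdb")), and rename or remove the reported records before building the database.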

Perform BLAST

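Perform the BLAST search with the following command, which reports hits with an E-value of at most 0.01, uses 4 threads, and writes tab-separated output (format 6) with the listed columns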

./blastp -evalue 0.01 -num_threads 4 -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore" -db tcdb -query tmp -out tmp.out
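The tabular output written to tmp.out has one hit per line, with the columns in the order given to -outfmt. As a minimal sketch of how the file could be read back in Python (the function name parse_hits and the choice of numeric types per column are my assumptions, not something BLAST provides):

```python
# Column names in the order passed to -outfmt in the blastp command above.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLUMNS = {"length", "mismatch", "gapopen",
               "qstart", "qend", "sstart", "send"}
FLOAT_COLUMNS = {"pident", "evalue", "bitscore"}


def parse_hits(lines):
    """Turn tab-separated BLAST output lines into a list of dicts."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        hit = dict(zip(COLUMNS, line.split("\t")))
        # Convert numeric columns; identifiers stay as strings.
        for name in INT_COLUMNS:
            hit[name] = int(hit[name])
        for name in FLOAT_COLUMNS:
            hit[name] = float(hit[name])
        hits.append(hit)
    return hits
```

A typical use would be hits = parse_hits(open("tmp.out")), followed by filtering, for example keeping only hits whose pident is above some threshold.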

Run BLAST in parallel

TO BE COMPLETED
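In the meantime, here is a minimal sketch of one possible approach, not the final script. It splits the query FASTA file into chunks and runs one blastp process per chunk on a single machine. All function names (split_fasta, blastp_command, run_parallel) and the ".partN" file naming are hypothetical, and the -num_threads flag is left out because the parallelism comes from running several processes. On a real cluster, each command would more likely be submitted to the job scheduler instead of being started locally.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def split_fasta(lines, n_chunks):
    """Distribute FASTA records round-robin over at most n_chunks lists."""
    chunks = [[] for _ in range(n_chunks)]
    index = -1
    for line in lines:
        if line.startswith(">"):
            index += 1  # a header line starts a new record
        if index >= 0:  # ignore anything before the first header
            chunks[index % n_chunks].append(line)
    return [chunk for chunk in chunks if chunk]


def blastp_command(query, out, db="tcdb", evalue="0.01"):
    """Build the blastp argument list for one chunk.

    The flags are the ones from the blastp command shown earlier.
    """
    outfmt = ("6 qseqid sseqid pident length mismatch gapopen "
              "qstart qend sstart send evalue bitscore")
    return ["./blastp", "-evalue", evalue, "-outfmt", outfmt,
            "-db", db, "-query", query, "-out", out]


def run_parallel(fasta_path, n_chunks=4):
    """Write one query file per chunk and run the blastp jobs concurrently.

    Returns the list of blastp exit codes, one per chunk.
    """
    with open(fasta_path) as handle:
        chunks = split_fasta(handle, n_chunks)
    commands = []
    for i, chunk in enumerate(chunks):
        query = "{}.part{}".format(fasta_path, i)
        with open(query, "w") as handle:
            handle.writelines(chunk)
        commands.append(blastp_command(query, query + ".out"))
    # Threads suffice here: each one only waits for an external process.
    with ThreadPoolExecutor(max_workers=len(commands) or 1) as pool:
        return list(pool.map(subprocess.call, commands))
```

Calling run_parallel("tmp") would then produce tmp.part0.out, tmp.part1.out and so on, which can be concatenated into a single result file because the tabular format has no header line.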

Hongyu Su 16 June 2015
