This post will illustrate how to install the NCBI BLAST package on your local computer, how to build private sequence databases for BLAST searches, how to run BLAST searches against those databases, and how to write a parallel Python script for running BLAST searches on a computer cluster.
The NCBI BLAST tools can be obtained from the software download page. In particular, I downloaded the source code package ncbi-blast-2.2.31+-src.tar.gz from the NCBI FTP server.
Documentation for the BLAST toolkit can be found in the NCBI Bookshelf.
I will install BLAST on a Linux-based machine; the following steps work for this purpose.
Choose a location on disk and unpack the .tar.gz file with the following command:
tar -xzvf ncbi-blast-2.2.31+-src.tar.gz
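If you would rather script the setup from Python, the unpacking step can be sketched with the standard-library tarfile module. This is a minimal sketch, not part of the original instructions: the function name unpack_source is hypothetical, and it mirrors tar -xzvf by listing each member before extracting.

```python
import tarfile


def unpack_source(archive: str, dest: str = ".") -> list[str]:
    """Extract a .tar.gz archive into dest (like `tar -xzvf`) and return member names."""
    with tarfile.open(archive, mode="r:gz") as tar:
        names = tar.getnames()
        # Print each member name, mimicking tar's -v (verbose) flag.
        for name in names:
            print(name)
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions support extraction filters; "data" rejects unsafe paths.
            tar.extractall(path=dest, filter="data")
        else:
            tar.extractall(path=dest)
    return names
```

For example, unpack_source("ncbi-blast-2.2.31+-src.tar.gz") unpacks the source package into the current directory.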
Change into the newly created directory and configure the C++ BLAST package with the following command:
cd c++; ./configure
Compile the C++ code with the following command:
cd ReleaseMT/build; make all_r
Compilation takes a long time, so be patient.
This section shows how to make our own BLAST database of protein sequences and align our amino acid sequences against it. We will mainly use the makeblastdb application to build the database.
More instructions can be found on the NCBI website.
The current project focuses on transporter proteins, so we download the FASTA file of all transporter protein sequences from the Transporter Classification Database (TCDB).
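Before building the database, it can help to check what the downloaded file contains. The sketch below is a minimal FASTA reader using only the standard library; it assumes the TCDB download is a plain-text FASTA file saved as tcdb (the name used in the makeblastdb command below), and the function name read_fasta is hypothetical.

```python
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA stream.

    The header is the text after '>' with surrounding whitespace removed;
    sequence lines are concatenated without newlines.
    """
    header = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:].strip()
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Counting the pairs yielded for an open("tcdb") handle gives the number of sequences in the download.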
Usage information for the makeblastdb command is also available from the BLAST package itself via the following command:
makeblastdb -help
To make a BLAST database from an arbitrary FASTA file (here, the TCDB file saved as tcdb), I use the following command:
./makeblastdb -in tcdb -parse_seqids -dbtype prot
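Since the goal is eventually to drive BLAST from a Python script, the same makeblastdb call can be wrapped with subprocess. This is a sketch under assumptions: the helper names makeblastdb_cmd and build_db are hypothetical, only the flags from the command above are used, and makeblastdb must either be on your PATH or be passed by path (for example ./makeblastdb as above).

```python
import subprocess


def makeblastdb_cmd(fasta: str, dbtype: str = "prot", exe: str = "makeblastdb") -> list[str]:
    """Build the argument list for the makeblastdb call shown above."""
    return [exe, "-in", fasta, "-parse_seqids", "-dbtype", dbtype]


def build_db(fasta: str, dbtype: str = "prot", exe: str = "makeblastdb") -> None:
    """Run makeblastdb; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(makeblastdb_cmd(fasta, dbtype, exe), check=True)
```

For example, build_db("tcdb", exe="./makeblastdb") reproduces the command above. Passing an argument list instead of a shell string avoids quoting problems with file names.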
Make sure, however, that the FASTA file contains no duplicated sequence names.
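A quick way to check this before running makeblastdb is to count the first whitespace-delimited word of each header line. This is a minimal sketch with a hypothetical function name; it assumes that word is the sequence name, so it catches exact duplicates but may miss names that makeblastdb parses to the same identifier.

```python
from collections import Counter
from typing import Iterable


def find_duplicate_ids(lines: Iterable[str]) -> dict[str, int]:
    """Return {identifier: count} for identifiers that appear more than once.

    The identifier is taken to be the first whitespace-delimited word after '>'.
    """
    ids = Counter()
    for line in lines:
        if line.startswith(">"):
            words = line[1:].split()
            if words:  # ignore empty header lines
                ids[words[0]] += 1
    return {seq_id: n for seq_id, n in ids.items() if n > 1}
```

Passing an open("tcdb") handle returns an empty dictionary when every sequence name is unique.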
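Perform the BLAST search with the following command. Here tmp is the query FASTA file, -db tcdb is the database built above, -evalue 0.01 reports only hits with an E-value of at most 0.01, -num_threads 4 runs the search on four threads, and -outfmt 6 writes tab-separated output with the listed columns to tmp.out: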
./blastp -evalue 0.01 -num_threads 4 -outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore" -db tcdb -query tmp -out tmp.out
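The -outfmt "6 ..." option above writes one tab-separated line per hit, with the columns in exactly the order listed. The sketch below reads tmp.out into a list of dictionaries keyed by those column names and converts the numeric fields; the function name parse_blast_tab is hypothetical, and it assumes no columns other than the twelve above were requested.

```python
from typing import Iterable

# Column names in the same order as the -outfmt "6 ..." specifier above.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_COLS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_COLS = {"pident", "evalue", "bitscore"}


def parse_blast_tab(lines: Iterable[str]) -> list[dict]:
    """Parse tabular BLAST output into dicts with numeric fields converted."""
    hits = []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = line.split("\t")
        if len(row) != len(COLUMNS):
            raise ValueError(f"expected {len(COLUMNS)} columns, got {len(row)}: {line!r}")
        hit = dict(zip(COLUMNS, row))
        for col in INT_COLS:
            hit[col] = int(hit[col])
        for col in FLOAT_COLS:
            hit[col] = float(hit[col])
        hits.append(hit)
    return hits
```

With the file opened as fh, parse_blast_tab(fh) returns one dictionary per hit, which makes it easy to filter by pident or keep the best bitscore per query.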