Hi Rob,

Update: You may want to check pdbaa.gz first. It seems to be the PDB subset of the nr database.

ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/pdbaa.gz

ftp://ftp.ncbi.nlm.nih.gov/blast/db/README says:

pdbaa.*tar.gz | protein sequences from pdb protein structures,
              | its parent database is nr.

Zhijie

===============

Hi Rob,

The BLAST nr database (FASTA format) can be downloaded from the NCBI FTP site:

ftp://ftp.ncbi.nlm.nih.gov/blast/db/FASTA/

As I remember, it is the nr.gz file. When unzipped, the file is called "nr". According to BLAST, the nr database does contain PDB entries:

http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE_TYPE=BlastDocs&DOC_TYPE=ProgSelectionGuide

It is significantly larger than the PDB data file you are currently using. You might consider extracting all the PDB sequences from it so that you do not need to go through all the non-PDB sequences.

Zhijie

-----Original Message-----
From: R.D. Oeffner
Sent: Friday, August 08, 2014 10:14 AM
To: [email protected]
Subject: [phenixbb] local BLAST server

Hi,

I'm in the process of installing a local BLAST server for running protein BLAST queries. As I understand it, I need a file with all the FASTA sequences as input for initially generating my local BLAST database. The one at ftp://ftp.rcsb.org/pub/pdb/derived_data/pdb_seqres.txt seems to contain redundant entries. Querying it produces many extra PDB chain IDs compared to a BLAST query on the NCBI web server.

Does anyone know where to get a non-redundant set of FASTA records so that I can create a database similar to the one used by NCBI?

Many thanks,

Rob

--
Robert Oeffner, Ph.D.
Research Associate, The Read Group
Department of Haematology,
Cambridge Institute for Medical Research
University of Cambridge
Cambridge Biomedical Campus
Wellcome Trust/MRC Building
Hills Road
Cambridge CB2 0XY

www.cimr.cam.ac.uk/investigators/read/index.html
tel: +44(0)1223 763234
mobile: +44(0)7712 887162
_______________________________________________
phenixbb mailing list
[email protected]
http://phenix-online.org/mailman/listinfo/phenixbb
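
The suggestion in the thread of pulling only the PDB sequences out of nr could be sketched roughly as below. This is a minimal illustration, not a tested pipeline: it assumes the nr headers mark PDB-derived entries with a "pdb|" token (as in the legacy NCBI defline style), and the function names and sample records are made up for the example. If pdbaa.gz already covers what is needed, this step may be unnecessary.

```python
import io


def iter_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_pdb(in_handle, out_handle, marker="pdb|"):
    """Copy records whose header contains `marker`; return how many were kept.

    The default marker is an assumption about the header format; check a few
    headers in the real file and adjust it if they look different.
    """
    kept = 0
    for header, seq in iter_fasta(in_handle):
        if marker in header:
            out_handle.write(f">{header}\n")
            # Re-wrap the sequence at 60 residues per line.
            for i in range(0, len(seq), 60):
                out_handle.write(seq[i:i + 60] + "\n")
            kept += 1
    return kept


# Tiny in-memory demo with invented records (not real database entries).
sample = io.StringIO(
    ">gi|1|pdb|1ABC|A invented chain\nMKV\nLLA\n"
    ">gi|2|ref|XP_000001.1| invented protein\nMSTT\n"
)
result = io.StringIO()
n_kept = filter_pdb(sample, result)
print(n_kept)
print(result.getvalue(), end="")
```

For the real file, the same function could be fed `gzip.open("nr.gz", "rt")` as input and an ordinary text file opened for writing as output (say pdb_only.fasta, a name chosen here for illustration). The filtered file could then be turned into a local database with something like `makeblastdb -in pdb_only.fasta -dbtype prot`.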