structure_search
Overview
Structure_search is a tool that quickly identifies structural and/or sequence homologs of an input PDB file in the Protein Data Bank. It uses the SARST algorithm, and a search against the whole PDB typically takes less than one second. An option lets users obtain a list of the ligands found in the PDB structures of those homologs.
Usage
- Identify and superpose homologous pdbs of mypdb.pdb
- phenix.structure_search mypdb.pdb
- Obtain a list of homologs of mypdb.pdb and all ligands found in the structures of those homologs
- phenix.structure_search mypdb.pdb get_ligand=True
- Use a local PDB mirror and obtain superposed homologs of mypdb.pdb
- phenix.structure_search mypdb.pdb PDB_MIRRORDIR=/path/to/pdb_mirror/top-level
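The invocations above share one pattern: the program name, the input file, then zero or more keyword=value pairs. When the tool is driven from a script, that pattern can be assembled programmatically. The following is a minimal Python sketch and not part of Phenix; build_command is a hypothetical helper, and only keywords listed in this document are used.

```python
import shlex


def build_command(pdb_file, **keywords):
    """Return the argument list for a phenix.structure_search run.

    build_command is a hypothetical helper for illustration only.
    Each keyword is rendered in the keyword=value form shown in Usage.
    """
    args = ["phenix.structure_search", pdb_file]
    for name, value in keywords.items():
        args.append(f"{name}={value}")
    return args


cmd = build_command("mypdb.pdb", get_ligand=True, get_pdb=5)
# Print the command line in a shell-safe form.
print(shlex.join(cmd))
# To actually run it (requires a working Phenix installation):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.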
More information can be found in the Output files and Using Local PDB mirror sections below.
Output files
- In addition to the screen output, these files contain the results of structure_search:
- output_(sequence/structure).txt: files containing the sequence/structure homologs of 'pdb_file', sorted by score.
- MyBlast_(sequence/structure).log: Standard BLAST output with selected pairwise alignments. NOTE: for structure alignment, the 'sequences' are structure-based Ramachandran codes (see reference), not one-letter amino acid codes.
- pdb_ligand.txt (if get_ligand=True): file containing all ligands found in all homologs from this search.
- superposed PDB files: found in the TEMPPDB_## subdirectory reported in the program output.
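After a run, a script may want to check which of the files listed above were actually written. The following is a small Python sketch under stated assumptions: expected_outputs and find_outputs are hypothetical helpers, and it is assumed (not confirmed by this document) that the output_prefix keyword replaces the leading "output" in the .txt file names. With sequence_only or structure_only, only one of each pair is likely to exist.

```python
import glob
import os


def expected_outputs(prefix="output", get_ligand=False):
    """List the result file names described in the Output files section.

    Assumption: `prefix` (the output_prefix keyword) replaces 'output'.
    """
    kinds = ("sequence", "structure")
    names = [f"{prefix}_{kind}.txt" for kind in kinds]
    names += [f"MyBlast_{kind}.log" for kind in kinds]
    if get_ligand:
        names.append("pdb_ligand.txt")
    return names


def find_outputs(run_dir=".", prefix="output", get_ligand=False):
    """Return (result files present, TEMPPDB_* subdirectories) in run_dir."""
    present = [
        name
        for name in expected_outputs(prefix, get_ligand)
        if os.path.isfile(os.path.join(run_dir, name))
    ]
    superposed_dirs = sorted(
        path
        for path in glob.glob(os.path.join(run_dir, "TEMPPDB_*"))
        if os.path.isdir(path)
    )
    return present, superposed_dirs


files, dirs = find_outputs(".")
print("result files:", files)
print("superposed PDB directories:", dirs)
```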
Using Local PDB mirror
- By default, the program retrieves the mmCIF files of homologous PDB entries from the RCSB server for downstream processing. Users may instead use a local PDB mirror if the environment variable PDB_MIRRORDIR is already defined in the shell that runs phenix.structure_search. Alternatively, users may define it on the command line or specify the path in the GUI. See more details below.
- PDB_MIRRORDIR: Defines the top level of the local PDB mirror. The program tries to retrieve mmCIF files from the local mirror as long as the path exists. Note that this assumes the directory tree under it follows that of the RCSB server; the program will try to access $PDB_MIRRORDIR/data/structures/divided/mmCIF. The program falls back to the RCSB server if the path contains errors.
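The lookup described above can be sketched in Python to check whether a local mirror is laid out as the program expects. This is an illustration, not the Phenix implementation: mirror_mmcif_path and locate_mmcif are hypothetical names, the two-character subdirectory (second and third characters of the PDB id) follows the keyword descriptions later in this document, and the <id>.cif.gz file name is an assumption based on the usual RCSB layout.

```python
import os

# Path components under the mirror's top level, as given in this document.
RCSB_MMCIF_LAYOUT = ("data", "structures", "divided", "mmCIF")


def mirror_mmcif_path(mirror_dir, pdb_id):
    """Build the expected mmCIF path for a 4-character PDB id.

    Assumption: files are named '<id>.cif.gz' inside a subdirectory
    named after the second and third characters of the id.
    """
    pdb_id = pdb_id.lower()
    if len(pdb_id) != 4:
        raise ValueError("expected a 4-character PDB id")
    return os.path.join(
        mirror_dir, *RCSB_MMCIF_LAYOUT, pdb_id[1:3], pdb_id + ".cif.gz"
    )


def locate_mmcif(pdb_id, mirror_dir=None):
    """Return ('mirror', path) if the file exists locally, else ('rcsb', None).

    Mirrors the documented fallback: use the local mirror when the path
    is valid, otherwise retrieve from the RCSB server.
    """
    mirror_dir = mirror_dir or os.environ.get("PDB_MIRRORDIR")
    if mirror_dir:
        path = mirror_mmcif_path(mirror_dir, pdb_id)
        if os.path.isfile(path):
            return ("mirror", path)
    return ("rcsb", None)


print(mirror_mmcif_path("/path/to/pdb_mirror/top-level", "1ABC"))
```

Running locate_mmcif for a few known entries before a large search is a quick way to confirm that PDB_MIRRORDIR points at the top level of the mirror rather than at one of its subdirectories.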
References
Lo WC, Huang PJ, Chang CH, Lyu PC. BMC Bioinformatics. 2007, 8:307
List of all available keywords
- structure_search
- pdb_file = None Enter a PDB file name
- sequence = None Optional Fasta sequence file. Only needed for a quick sequence search against RCSB without a PDB.
- output_prefix = 'output' Provide an output prefix if needed
- blastpath = None Enter path to blastall executable
- sequence_only = False Do a BLAST search against the PDBaa sequence database instead of
doing a Ramachandran-based structure search
- structure_only = False Do only a Ramachandran-based structure search.
- db_used = 'rcsb' Structure database used in search: rcsb, scop95, or AF2.
- db = 'rcsb' Database used in search: rcsb, scop95, or AF2.
- get_ligand = False Use get_ligand=True to retrieve ligands.
- get_ramacode_only = False Generate Rama code for input pdb/cif only.
This is for developers only.
- get_xml_only = False Get BLAST XML output returned as a string object.
No coordinate superposition will be performed. Developers only.
- use_pdb100aa = False Use PDB100 sequence database for sequence search.
- use_custom_db = False Use custom database specified by custom_db_files/custom_db_dir.
- custom_db_dir = None The directory of pdb/cif files to make custom database.
Default is current directory
- custom_db_files = None Filenames of the pdb/cif files, separated by spaces, for the database.
If none are specified, all pdb/cif files in custom_db_dir will be collected
- atom_selection = 'all' Choose part of the pdb used in the search (default=all).
for example: chain B, resseq 113:219, ... etc.
- get_pdb = 10 get_pdb=N will collect and superpose the top N
homologous pdbs. Use get_pdb=0 to disable this option.
- deposited_before = 0 Specify the latest year of matching structures to be considered
for scoring. Pdbs deposited after this year will be discarded.
- deposited_after = 0 Specify the earliest year of matching structures to be considered
for scoring. Pdbs deposited before this year will be discarded.
- batch_size = 0 Process the pdbs in batches of <batch_size> until <min_match>
hits are identified or until all <get_pdb> pdbs are processed
- min_match = 0 Finish structure_search when <min_match> matches are found.
Usually used with <trim_ends> to exit the search once suitable pdbs are found.
- keep_all_pdb = False Keep all the PDB files, including full PDB, PDB_Chain and
superposed PDB_Chain. Default is False which will keep only superposed
PDB_Chain files in the directory specified in the output message.
- trim_ends = False Remove terminal residues of hit pdbs extending beyond those
of the target pdb.
- write_pdb = True Set to False if no output pdb file is needed. Sometimes useful
when using Structure_Search within another program and only pdb objects need to be
passed.
- write_results = True Set to False if no output results/log files are needed. Useful
when calling Structure_Search within another program and only pdb objects need to be
passed.
- trim_hit_pdb = False Remove extra domains, extended loops, and unfit portions
of hit pdbs after they are superposed onto the target pdb.
- pickle_hits = False Pickle blast hit results from xml output.
- coot_display = False Display output pdb files in Coot (default is False).
- ask_coot = True Prompt for Coot display options.
- PDB_MIRRORDIR = None Enter the top directory of the local RCSB PDB mirror. The program
will try to retrieve PDBs and/or structure factors from this mirror first.
Note that this assumes the directory trees under it follow those in RCSB, with
pdb files as 'pdb####.ent.gz' in the PDB_MIRRORDIR/data/structures/divided/pdb directory.
If you use PDB's rsync script, this variable would be the same as the $MIRRORDIR set
in the script.
- PDB_MIRROR_MMCIF = None Enter the parent directory of the mmcif files in the local PDB mirror.
MMCIFs will be retrieved from subdirectory ## where ## are the second and third letters
in the PDB id. This keyword should be $PDB_MIRRORDIR/data/structures/divided/mmcif directory.
- PDB_MIRROR_PDB = None Enter the parent directory of the PDB files in the local PDB mirror.
PDBs will be retrieved from subdirectory ## where ## are the second and third letters
in the PDB id. This keyword should be $PDB_MIRRORDIR/data/structures/divided/pdb directory.
We recommend setting PDB_MIRRORDIR, which takes care of PDB_MIRROR_PDB and the other
mirror keywords together. However, users may choose to specify PDB_MIRROR_PDB
directly.
- PDB_MIRROR_STRUCTURE_FACTORS = None Enter the parent directory of the structure factor files in the local PDB mirror.
Structure factors will be retrieved from subdirectory ## where ## are the second
and third letters in the PDB id. This keyword should be the same as the
$PDB_MIRRORDIR/data/structures/divided/structure_factors directory.
We recommend setting PDB_MIRRORDIR and it will take care of both PDB_MIRROR_PDB and
PDB_MIRROR_STRUCTURE_FACTORS together. However, users may choose to specify
PDB_MIRROR_STRUCTURE_FACTORS directly.
- local_pdb_dir = None Enter the path directly to your local PDB repository.
- verbose = False verbose output
- debug = False debugging output
- job_title = None Job title in PHENIX GUI, not used on command line
- gui GUI-specific parameter required for output directory
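The interplay of the get_pdb, batch_size, and min_match keywords in the list above can be illustrated with a short Python sketch. It models only the control flow described in the keyword text and is not the Phenix implementation: process_in_batches and is_match are hypothetical names, and treating a value of 0 as "disabled" for batch_size and min_match is an assumption drawn from their defaults.

```python
def process_in_batches(hits, is_match, get_pdb=10, batch_size=0, min_match=0):
    """Process the top `get_pdb` hits in batches, stopping early if possible.

    hits      : ranked list of candidate hits (best first)
    is_match  : hypothetical callable deciding whether a hit is acceptable
    Assumption: batch_size=0 means one single batch, min_match=0 means
    no early exit.
    """
    candidates = hits[:get_pdb]
    step = batch_size if batch_size > 0 else max(len(candidates), 1)
    matches = []
    for start in range(0, len(candidates), step):
        batch = candidates[start:start + step]
        matches.extend(hit for hit in batch if is_match(hit))
        # Stop once enough matches have been found.
        if min_match > 0 and len(matches) >= min_match:
            break
    return matches


# Toy example: integer "hits", where even numbers count as matches.
ranked_hits = list(range(10))
print(process_in_batches(ranked_hits, lambda h: h % 2 == 0,
                         get_pdb=10, batch_size=2, min_match=2))
```

In this toy run the loop stops after the second batch, because two matches have been collected by then; the remaining hits are never processed.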