Crystallographic refinement with Rosetta (phenix.rosetta_refine)

Contents

Overview

The program phenix.rosetta_refine provides a convenient wrapper for running Rosetta refinement of protein X-ray crystal structures (DiMaio et al. 2013), which integrates the Rosetta methods for conformational sampling with the X-ray targets, B-factor refinement, and map generation in phenix.refine (Afonine et al. 2012). This is intended for use in difficult cases, especially at low resolution where it combines a wide radius of convergence with excellent geometry, and for preparing crystal structures for further modelling in Rosetta.

Currently both the setup and execution of Rosetta refinement have some limitations; please read this entire document before attempting to use the program! Note that Rosetta does not currently run on Windows; you must have a Mac or Linux system to use this program.

Installation

First, you should install the appropriate version of Phenix. The binary installers should be sufficient; users of the Mac graphical package can set up the command-line environment with this command (replacing the version number as needed):

source /Applications/PHENIX-1.8.4-1496/Contents/phenix-1.8.4-1496/phenix_env.sh

To run phenix.rosetta_refine, you must have installed Rosetta, software developed from the Baker laboratory at the University of Washington.

See the central installation notes for Rosetta

Execution

Running phenix.rosetta_refine is superficially similar to running phenix.refine, but with a much more limited set of options. Required inputs are a PDB file, a data file (MTZ format), and CIF restraint files for any non-standard ligands. Other common parameters include the number of processors to use, the method for parallelizing Rosetta jobs, and the refinement protocol, for example:

phenix.rosetta_refine model.pdb data.mtz nproc=5 technology=sge protocol=hires

The default behavior is to run the full refinement protocol as described in the program citation, creating five unique models, and picking the model with the best free R-factor. This will then be passed through phenix.refine with a null strategy to optimize hydrogen and bulk solvent parameters. If desired, the argument postref=True can be given to enable additional cycles of refinement in phenix.refine, including both coordinates and B-factors with weight optimization.

The main choice of refinement strategy is between the "full" and "hires" protocols, both of which are described using the RosettaScripts syntax (Fleishman et al. 2011). The "hires" protocol is intended both for high-resolution structures and in general any structure that is already near convergence. The XML scripts for these methods are included in the Rosetta distribution. You may also create your own XML script with a customized protocol and pass that to phenix.rosetta_refine.

You can ask the program to parallelize jobs across multiple CPUs or several different queuing systems; the technology parameter determines the method used, and nproc indicates the number of jobs to run at once. The actual number of processes will be limited by the number_of_models parameter, which defaults to 5. Requesting more models may provide incremental improvements in some cases but we have found it rarely makes a difference of more than 1-2% in R-free.

Output will be to a new directory with a name such as "rosetta_1", with the directory number increasing sequentially as additional runs are created. Because phenix.refine is used as the final step, the output files will be identical to those output by that program.

Limitations

The program has several restrictions compared to phenix.refine:

  • Molecules are limited to the standard amino acids and a small number of common ions and ligands. If you have additional molecules in your model, these will be removed prior to Rosetta refinement and replaced afterwards. You must still supply and necessary CIFs to the program, as these will be used later when running the final phenix.refine step.
  • RNA is not supported.
  • Alternate conformations are not supported.
  • The B-factor refinement is limited to simply isotropic refinement at present; for more flexible parameterization you should run phenix.refine separately.
  • Rosetta scales poorly above approximately 1000 residues; although larger structures may work, the performance tends to be worse.
  • At present it is not possible to impose NCS restraints on structures in Rosetta.

Also note that the program is significantly more processor-intensive, although less so than MR-Rosetta (DiMaio et al. 2011). We recommend that you use a multiprocessor computer or queuing system for greatest efficiency.

Troubleshooting

This program is under development; please address any questions concerning its use to help@phenix-online.org. The following guidelines may be useful for best performance:

  • The default settings are optimized for difficult structures, where conventional refinement gets stuck quickly. The assumption is that fine details such as waters and ligand molecules have not yet been added. For now, the only alternative is the "high-resolution" protocol, which employs gentler optimization; additional refinement scripts will be provided in the future.
  • For optimal results, especially when preparing a structure for further Rosetta modelling, you may wish to design your own Rosetta refinement protocol using RosettaScripts.
  • Unlike methods such as simulated annealing with the default geometry restraints, Rosetta refinement will almost never improve the fit to the experimental data at the expense of model geometry. If your structure contains severe distortions as a result of aggressive optimization, Rosetta refinement may actually degrade the model.
  • While Rosetta refinement can improve structures that are outside the radius of convergence of traditional crystallographic refinement, it still requires that the molecule(s) be correctly placed in the unit cell. If you are uncertain of your molecular replacement solution, we recommend trying MR-Rosetta instead.
  • The tests performed in DiMaio et al. 2013 indicated that the best results may come from combining orthogonal refinement methods such as DEN (Schroder et al. 2010) with Rosetta.

References

List of all available keywords