Hi everyone,

The latest build, 1.15-3448, has an updated version of mmtbx.prepare_pdb_deposition that adds the entity, entity_poly, entity_poly_seq, struct_ref, and struct_ref_seq loops to an existing mmCIF model when a sequence file is provided. The struct_ref and struct_ref_seq loops are still under development, since they will require user input, and we're checking with the PDB whether they actually need to be defined in the file used for deposition.
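As a rough illustration of what the entity_poly_seq loop contains, here is a minimal sketch. It assumes a single protein entity whose sequence uses only the 20 standard amino acids. The helper name `entity_poly_seq_loop` and the lookup table are hypothetical and are not the mmtbx implementation.

```python
# Hypothetical sketch: build the text of an mmCIF _entity_poly_seq loop for
# one entity. Standard amino acids only; NOT the mmtbx implementation.
ONE_TO_THREE = {
    "A": "ALA", "R": "ARG", "N": "ASN", "D": "ASP", "C": "CYS",
    "Q": "GLN", "E": "GLU", "G": "GLY", "H": "HIS", "I": "ILE",
    "L": "LEU", "K": "LYS", "M": "MET", "F": "PHE", "P": "PRO",
    "S": "SER", "T": "THR", "W": "TRP", "Y": "TYR", "V": "VAL",
}


def entity_poly_seq_loop(entity_id, sequence):
    """Return an _entity_poly_seq loop for a one-letter sequence."""
    lines = [
        "loop_",
        "_entity_poly_seq.entity_id",
        "_entity_poly_seq.num",
        "_entity_poly_seq.mon_id",
        "_entity_poly_seq.hetero",
    ]
    for num, letter in enumerate(sequence, start=1):
        # Positions are numbered from 1; hetero "n" means a single
        # residue type occupies this position.
        lines.append(f"{entity_id} {num} {ONE_TO_THREE[letter]} n")
    return "\n".join(lines)


print(entity_poly_seq_loop(1, "MSEQ"))
```

Each sequence position becomes one row, which is why the tool needs the full canonical sequence and not just the residues modelled in the coordinates.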

The sequence should be the canonical sequence, with the one-letter code for each residue. If the model contains a non-standard residue, the program will align the provided sequences with the chains in the model and replace the one-letter code with the three-letter residue name in parentheses.
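The parenthesis substitution can be sketched as follows. The sketch assumes the model residues have already been matched one-to-one with the canonical sequence; the real tool performs a sequence alignment first, which is omitted here. It also uses a small, hypothetical parent map in place of the monomer library.

```python
# Hypothetical sketch: rewrite a canonical one-letter sequence in the
# PDB-style format, where non-standard residues appear as (XXX).
# Assumes model residue names are already aligned position-by-position
# with the sequence; the real tool aligns the sequences first.

# Illustrative subset: non-standard residue -> canonical one-letter parent.
NONSTANDARD_PARENT = {"MSE": "M", "SEP": "S", "TPO": "T", "PTR": "Y"}


def pdb_format_sequence(canonical, model_resnames):
    if len(canonical) != len(model_resnames):
        raise ValueError("sequence and model residues are not aligned 1:1")
    out = []
    for letter, resname in zip(canonical, model_resnames):
        parent = NONSTANDARD_PARENT.get(resname)
        if parent is None:
            out.append(letter)  # not in the map: keep the one-letter code
        elif parent == letter:
            out.append(f"({resname})")  # e.g. M -> (MSE)
        else:
            raise ValueError(f"{resname} is not a variant of {letter}")
    return "".join(out)


print(pdb_format_sequence("MSEQ", ["MSE", "SER", "GLU", "GLN"]))  # (MSE)SEQ
```

The user supplies only the canonical letter (M). The model decides whether that position is written as M or as (MSE).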

If the input model is in mmCIF format, its existing loops will be kept. However, a loop is kept only if the program does not write a loop with the same name: if the input file already has an entity_poly loop, for example, this program will overwrite it.
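The keep-or-overwrite rule amounts to a dictionary merge in which newly generated loops take precedence. The minimal sketch below models each mmCIF block as a plain dict from category name to loop contents. This representation is an assumption for illustration; the actual tool works on iotbx CIF objects.

```python
# Hypothetical sketch of the loop-merging rule: loops already in the input
# mmCIF are kept unless the program generates a loop with the same name,
# in which case the generated one replaces it.
def merge_loops(existing, generated):
    merged = dict(existing)   # start from everything in the input file
    merged.update(generated)  # generated loops overwrite same-named ones
    return merged


existing = {"_refine": "from phenix.refine", "_entity_poly": "old"}
generated = {"_entity_poly": "new", "_entity_poly_seq": "new"}
print(merge_loops(existing, generated))
```

Unrelated loops, such as the refinement statistics, pass through untouched. Only categories the program itself writes are replaced.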

This tool is still under development and will be incorporated into the GUI. If you encounter any issues, please let us know and, if possible, send us your input files.

Some known issues include:
- alignment will fail with alternate conformations in which each conformation is a different residue
- alignment problems with chains containing many "UNK" residues
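To screen a model for these two cases before running the tool, you could flag positions whose alternate conformations carry different residue names and count "UNK" positions. The minimal sketch below assumes residues are available as simple (chain_id, resseq, altloc, resname) tuples rather than an iotbx hierarchy. Both function names are hypothetical.

```python
from collections import defaultdict


# Hypothetical pre-check: find positions where alternate conformations are
# different residue types (e.g. altloc A = SER, altloc B = THR).
def mixed_altloc_positions(residues):
    names = defaultdict(set)
    for chain_id, resseq, _altloc, resname in residues:
        names[(chain_id, resseq)].add(resname)
    return sorted(pos for pos, resnames in names.items() if len(resnames) > 1)


# Hypothetical pre-check: number of distinct positions modelled as UNK.
def unk_count(residues):
    return len({(c, r) for c, r, _alt, name in residues if name == "UNK"})


residues = [
    ("A", 10, "", "GLY"),
    ("A", 11, "A", "SER"),
    ("A", 11, "B", "THR"),
    ("B", 1, "", "UNK"),
]
print(mixed_altloc_positions(residues))  # [('A', 11)]
print(unk_count(residues))  # 1
```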

Thanks!

--
Billy K. Poon
Research Scientist, Molecular Biophysics and Integrated Bioimaging
Lawrence Berkeley National Laboratory
1 Cyclotron Road, M/S 33R0345
Berkeley, CA 94720
Tel: (510) 486-5709
Fax: (510) 486-5909


On Sat, Jan 5, 2019 at 1:51 PM Billy Poon <BKPoon@lbl.gov> wrote:
Hi Bernhard,

It’s something I’ll be working on more in the first half of this year, and it will be related to a general reorganization of how the validation, table one, and file deposition tools work. Currently, each tool operates more or less independently, so the information is not handled consistently.

The plan is to do the final statistics calculation once with the validation tool and then export the statistics as a table one and, eventually, as a CIF file for deposition (the sequence would be added if a sequence file is available). The cryo-EM comprehensive validation tool already does this by exporting the statistics as a table. The X-ray/neutron comprehensive validation tool will be changed to follow the same approach.

The additional wrinkle with non-standard residues is that there are many such residues. Nigel has a way of combing through our monomer library to build the relationships (e.g. MSE is based on M). I’m working on a tool that uses those relationships so that the user provides only the canonical sequence (M) and we fill in the appropriate non-standard residue (MSE) in the CIF file. This way, users do not have to manually build the PDB-specific format of putting non-standard residues in parentheses.
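One way such relationships could be derived is by following parent-component links. In the PDB Chemical Component Dictionary, this link is recorded in `_chem_comp.mon_nstd_parent_comp_id`. The minimal sketch below follows those links to a standard residue and takes its one-letter code. The entries are a hand-written illustrative subset, and this is not the procedure Phenix actually uses.

```python
# Hypothetical sketch: map non-standard residues to the one-letter code of
# their standard parent by following parent-component links.
STANDARD = {"MET": "M", "SER": "S", "CYS": "C", "LYS": "K"}

# Illustrative subset of component -> parent component links.
PARENT = {"MSE": "MET", "SEP": "SER", "CSO": "CYS", "MLY": "LYS"}


def one_letter_parent(resname, max_depth=5):
    steps = 0
    while resname not in STANDARD:
        if resname not in PARENT or steps >= max_depth:
            return None  # no known relationship to a standard residue
        resname = PARENT[resname]
        steps += 1
    return STANDARD[resname]


relationships = {name: one_letter_parent(name) for name in PARENT}
print(relationships)  # {'MSE': 'M', 'SEP': 'S', 'CSO': 'C', 'MLY': 'K'}
```

The depth limit guards against accidental cycles in the parent links. A component with no path to a standard residue maps to None and would need manual handling.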

The next Phenix release is planned for the end of February or early March, and it will include at least a beta version of this tool.

On Fri, Jan 4, 2019 at 7:44 PM Bernhard Lechtenberg <blechtenberg@sbpdiscovery.org> wrote:
Hi Billy,

Do you have any updates on this? I just tried to use mmtbx.prepare_pdb_deposition with the .cif file from phenix.refine and FASTA sequences as input, followed by pdb_extract, to deposit several structures to the PDB. The .cif files were accepted by the PDB, but the refinement statistics were lost, and something with the structure factors also seemed wrong, as the validation reports did not contain metrics for R-free and RSRZ outliers. I don’t quite understand how the second problem happens, since mmtbx.prepare_pdb_deposition does not see the structure factors. However, when I skipped mmtbx.prepare_pdb_deposition, used the output .cif from phenix.refine directly in pdb_extract, and then uploaded that file to the PDB, both issues were fixed.

I first used an older phenix version (1.14rc2-3139) on a Mac, then upgraded to the latest nightly-built version (dev-3374), but the issue persisted.

Additionally, for one of my five structures, I had the same issue as Patrick described in October (see below), also with both versions of Phenix.

Bernhard

Bernhard C. Lechtenberg, PhD 
Postdoctoral Associate
Riedl Lab
Cancer Metabolism and Signaling Networks Program
NCI-Designated Cancer Center



10901 N. Torrey Pines Road, La Jolla, CA 92037

T  858.646.3100 ext. 4216  E blechtenberg@SBPdiscovery.org

Science Benefiting Patients®


On Oct 2, 2018, at 4:01 PM, Billy Poon <BKPoon@lbl.gov> wrote:

Hi Pat,

I'm in the process of reworking that tool since it is dropping some information from phenix.refine in the process of adding the sequence. Something should be available by the end of the week in a new build.



On Mon, Oct 1, 2018 at 8:41 AM Patrick Loll <pjloll@gmail.com> wrote:
Hi all,

Following the instructions given here:

https://www.phenix-online.org/documentation/overviews/xray-structure-deposition.html

I’m attempting to use mmtbx.prepare_pdb_deposition to insert sequence information into the mmCIF that contains the model coordinates. Unfortunately, the program fails with an error (shown below).

The sequence file is in FASTA format and contains an entry for each of the (4) chains in the AU, i.e.

>A
MSEQNCE…
>B
MSEQNCE…
etc.
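For reference, a file in that layout maps one header per chain to its sequence. The minimal parser sketch below uses only the standard library; `read_fasta` and `group_identical` are hypothetical helpers, not part of iotbx. It also groups chains whose sequences are identical, since identical polymer chains are normally described by a single mmCIF entity.

```python
# Hypothetical sketch: parse a FASTA file with one record per chain.
def read_fasta(text):
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()  # full header text is the record name
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequences may span several lines
    return {key: "".join(parts) for key, parts in records.items()}


# Group chain names that share an identical sequence.
def group_identical(records):
    groups = {}
    for chain, seq in records.items():
        groups.setdefault(seq, []).append(chain)
    return list(groups.values())


fasta = ">A\nMSEQ\n>B\nMSEQ\n>C\nGAVL\n"
records = read_fasta(fasta)
print(records)  # {'A': 'MSEQ', 'B': 'MSEQ', 'C': 'GAVL'}
print(group_identical(records))  # [['A', 'B'], ['C']]
```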

Any bright ideas?


============this is what happens (vide infra)====================================


[PJL-iMac:blahblah/PJL_final] loll% mmtbx.prepare_pdb_deposition   filename.cif   seq_name.fasta
Starting mmtbx.prepare_pdb_deposition
on Mon Oct  1 11:16:23 2018 by loll
===============================================================================
Processing files:
-------------------------------------------------------------------------------

  Found model, filename.cif
  Found sequence, seq_name.fasta

Processing PHIL parameters:
-------------------------------------------------------------------------------
  No PHIL parameters found
Final processed PHIL parameters:
-------------------------------------------------------------------------------
  data_manager {
    model {
      file = "filename.cif"
    }
    default_model = "filename.cif"
    sequence_files = "seq_name.fasta"
    default_sequence = "seq_name.fasta"
  }

Starting job
===============================================================================
Validating inputs
Using model: filename.cif
Using sequence: seq_name.fasta
Creating mmCIF block for sequence
Traceback (most recent call last):
  File "/Applications/phenix-1.14-3260/build/../modules/cctbx_project/mmtbx/command_line/prepare_pdb_deposition.py", line 9, in <module>
    run_program(program_class=prepare_pdb_deposition.Program)
  File "/Applications/phenix-1.14-3260/modules/cctbx_project/iotbx/cli_parser.py", line 71, in run_program
    task.run()
  File "/Applications/phenix-1.14-3260/modules/cctbx_project/mmtbx/programs/prepare_pdb_deposition.py", line 98, in run
    alignment_params=self.params.mmtbx.validation.sequence.sequence_alignment)
  File "/Applications/phenix-1.14-3260/modules/cctbx_project/iotbx/pdb/hierarchy.py", line 1190, in as_cif_block_with_sequence
    assert len(chain.residue_groups) + chain.n_missing_start + chain.n_missing_end == len(sequence)
AssertionError
(gouts of smoke, terrified squealing)
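The failing assertion in hierarchy.py requires an exact total: the residues present in the chain, plus those counted as missing at the start and the end, must equal the length of the supplied sequence. Judging only from that line, residues missing from the middle of a chain are not covered by any of the three terms. The stand-alone sketch below illustrates that arithmetic. It assumes residue numbers index directly into the supplied sequence, starting at 1, which the real code does not require because it uses an alignment. The `diagnose` helper and the example numbers are hypothetical.

```python
# Illustrative re-creation of the length check from the traceback,
# assuming residue numbers run 1..seq_len along the supplied sequence.
def diagnose(resseqs, seq_len):
    observed = sorted(set(resseqs))
    if not observed:
        raise ValueError("chain has no residues")
    n_missing_start = observed[0] - 1
    n_missing_end = seq_len - observed[-1]
    # Residues absent between the first and last observed residue.
    internal_gaps = (observed[-1] - observed[0] + 1) - len(observed)
    passes = len(observed) + n_missing_start + n_missing_end == seq_len
    return {"missing_start": n_missing_start, "missing_end": n_missing_end,
            "internal_gaps": internal_gaps, "passes": passes}


# 100-residue sequence, model covers residues 4-98: 95 + 3 + 2 == 100.
print(diagnose(range(4, 99), 100))
# Same chain with residues 50-53 disordered: 91 + 3 + 2 != 100.
print(diagnose([i for i in range(4, 99) if not 50 <= i <= 53], 100))
```

Under this assumption, the check passes exactly when there are no internal gaps. A mismatch between the model and the supplied sequence is another plausible way to trip it.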


The ‘No PHIL parameters found’ message is concerning, but the program clearly seems to be finding the input file names.

Suggestions welcome.

Thanks,

Pat

---------------------------------------------------------------------------------------
Patrick J. Loll, Ph. D. 
Professor of Biochemistry & Molecular Biology
Drexel University College of Medicine
Room 10-102 New College Building
245 N. 15th St., Mailstop 497
Philadelphia, PA  19102-1192  USA

(215) 762-7706
pjloll@gmail.com
pjl28@drexel.edu


_______________________________________________
phenixbb mailing list
phenixbb@phenix-online.org
http://phenix-online.org/mailman/listinfo/phenixbb
Unsubscribe: phenixbb-leave@phenix-online.org