Rant: B vs TLS, anisou, and PDB headers
Just spent an hour trawling docs, BBs (recent threads) and logs to figure out what the hell my B column is telling me (phenix vs refmac vs pdb).

Oh dear, it's a disaster area, quite Heisenbergian... the most important number (uncertainty) is itself unknowable:

* Phenix writes total ADP, Refmac writes residual ADP.
* Refmac writes a remark -- pdbdep strips it (!?!!?)
* Phenix writes no remark (I think?)
* Refmac writes different numbers to TLSOUT and pdb header (trace of S)
* Phenix duplicates the information in header (TLS) and ANISOU cards, the latter thereby making implicit what should be explicitly stored: how the ADPs are connected.
* Refmac, given phenix TLS-originating ANISOUs, flattens them into the first number, but does not remove them
* PDB does not care

I'd like to appeal for an urgent consensus -- which should be unusually easy, since it involves only two programs and one repository.

My strong recommendation, from first principles of usability: residual B into ATOM, no TLS in ANISOU, and the rest into the header. I know it's religious, but here's the reasoning:

==> the end-user looks *locally*, that's what ATOM and ANISOU are for.
==> global stuff (cell, symmetry, NCS, and yes, TLS) belongs in the header -- as does what's still missing, namely twinning, lattice modulations, scattering factors, and restraints.

Yes, we crystallographers want easy B-factor stats (phenix's reason), but then let's fix the analysis programs to look at the header as well. And yes, packing and internal motions (TLS) are all very important for analysis -- but that is why they should be explicit in the header, so that graphics tools have easy access to them.

End rant (but not end hope :)
phx.
Dear Frank,

it's no secret that phenix.refine ALWAYS writes the total B-factor into ATOM records; there are strong reasons for this, and it is clearly stated in the manual.

Reasons to write total B-factor:
1) Easy analysis (easy coloring by B-factor in graphics: no prior model manipulations are necessary);
2) All you need to reproduce the R-factors are the ATOM records and the structure factor formula (rather than ATOM records, plus a PDB header with TLS records that may sometimes be lost or manipulated, plus specific conversion programs to add the TLS contribution). Also note that not all programs extract TLS information from the PDB header to compute R-factors, but ALL programs can read ATOM records.
3) Residual B-factors should obey Hirshfeld's rigid bond test (minus deviations due to internal rotational degrees of freedom), so writing a flat distribution of residual B into the ATOM records is not really informative.

I'm sure I had more in mind, but this is what immediately comes to mind.

phenix.refine writes the complete TLS information into the PDB file header. This is not duplication but a way to compute the residual B-factors for those who really want to do this.

phenix.refine writes out a complete information set into the PDB file header under REMARK 3, ready to deposit into the PDB. It is up to the PDB how to treat this information.

When refining in phenix.refine, it is not assumed that the user jumps back and forth between refinement packages, so no special effort is made to ensure easy and straightforward transfer of refinement states / results between refinement packages.

Reasons to write out residual B-factor:
- I do not see any.

Thanks for bringing this up.

All the best!
Pavel.

---
Pavel V. Afonine, Ph.D.
Lawrence Berkeley National Lab, Berkeley CA, USA (http://www.lbl.gov/)
CCI: Computational Crystallography Initiative (http://cci.lbl.gov/)
PHENIX (http://phenix-online.org/)

On 3/29/2008 10:35 AM, Frank von Delft wrote:
> Just spent an hour trawling docs, BBs (recent threads) and logs to figure out what the hell my B column is telling me (phenix vs refmac vs pdb).
> [...]
_______________________________________________ phenixbb mailing list [email protected] http://www.phenix-online.org/mailman/listinfo/phenixbb
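For readers who have not met the rigid bond test mentioned in point 3 of Pavel's reply: it compares the mean-square displacement of two bonded atoms along their bond, which should be nearly equal if the bond is rigid. Below is a minimal Python sketch of that comparison, not the implementation of any refinement package. The function names `quad_form` and `rigid_bond_delta` are made up for illustration, and the inputs are assumed to be Cartesian coordinates in Å and symmetric 3x3 U tensors in Å^2.

```python
import math

def quad_form(u, n):
    """Mean-square displacement n^T U n along the unit vector n."""
    return sum(n[i] * u[i][j] * n[j] for i in range(3) for j in range(3))

def rigid_bond_delta(xyz_a, u_a, xyz_b, u_b):
    """Difference (in Å^2) between the displacements of atoms A and B along the A-B bond."""
    d = [b - a for a, b in zip(xyz_a, xyz_b)]
    length = math.sqrt(sum(c * c for c in d))
    n = [c / length for c in d]  # unit vector along the bond
    return quad_form(u_a, n) - quad_form(u_b, n)

# Two atoms bonded along x: very different anisotropy across the bond,
# but identical displacement along it, so the difference is zero.
u_a = [[0.02, 0.0, 0.0], [0.0, 0.05, 0.0], [0.0, 0.0, 0.05]]
u_b = [[0.02, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
print(rigid_bond_delta((0.0, 0.0, 0.0), u_a, (1.5, 0.0, 0.0), u_b))  # 0.0
```

A purely isotropic pair with different B values fails this test by construction, which is one way to read the remark that a flat distribution of residual B is not very informative.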
Dear Pavel and Frank,

It is my recollection that one of the primary goals in the creation of the PDB format was the interchange of information between software packages. While it has certainly failed to meet that (difficult) goal, it has been useful at least in the interchange between refinement programs, except in the cases where refinement package authors have ignored the specs.

That said, my reading of the specs for the PDB format is that the column in question on the ATOM and HETATM cards is to contain the isotropic B factor of the atom. There is already redundancy in the format, because if there is an ANISOU card defining the anisotropic B, the isotropic component is still listed on the ATOM card, not some residual quantity.

That said, I DON'T CARE. The most important thing is to get some consistency here so we can pick up a PDB file and have some idea what it contains. I think everyone agrees on the definitions of the elements of the TLS tensors, agrees on what an anisotropic B is, and agrees on what an isotropic B is. All this fight is over is how the numbers are arranged in a simple text file.

I have a collection of models I've pulled from the PDB that I can't figure out, and I'm usually pretty good at this stuff. What will the person do who pulls these models twenty years from now, when memories of the idiosyncrasies of today's Phenix and Refmac have been forgotten?

Please, could the authors of Phenix, Restrain, and Refmac get together and agree on something? I'm confident that the wwPDB would go with whatever is agreed upon.

Dale Tronrud

Pavel Afonine wrote:
> Dear Frank,
> it's not a secret that phenix.refine ALWAYS writes total B-factor into ATOM records, there are strong reasons for this and this is clearly stated in the manual.
> [...]
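The redundancy Dale describes, an ANISOU card alongside an isotropic value on the ATOM card, can be checked with the usual relation B_eq = (8 pi^2 / 3) (U11 + U22 + U33). The Python sketch below assumes the fixed-column ANISOU layout of the PDB format (six integers in columns 29-70, each U scaled by 10^4). The function name `b_eq_from_anisou` and the numbers in the example record are illustrative only.

```python
import math

def b_eq_from_anisou(line):
    """Equivalent isotropic B (Å^2) from one fixed-column ANISOU record."""
    # Columns 29-35, 36-42 and 43-49 hold U11, U22, U33 as integers (Å^2 x 10^4).
    u11, u22, u33 = (int(line[a:b]) * 1e-4 for a, b in ((28, 35), (35, 42), (42, 49)))
    return 8.0 * math.pi ** 2 * (u11 + u22 + u33) / 3.0

# Build the record with a format string so the column positions are explicit:
# serial, atom name, altLoc, residue name, chain, residue number, iCode, six U values.
record = "ANISOU%5d %-4s%1s%3s %1s%4d%1s %7d%7d%7d%7d%7d%7d" % (
    107, " N", "", "GLY", "A", 13, "", 2406, 1892, 1614, 198, 519, -328)

print(round(b_eq_from_anisou(record), 2))  # 15.56
```

If the B column of the matching ATOM card holds a total B, it should agree with this value to rounding; if it holds a residual B, it will not, and that mismatch is exactly the ambiguity under discussion.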
Hi Pavel,

All your reasons are there for the convenience of the *crystallographer*; mine are for the end user (=unsuspecting biologist) -- who doesn't know TLS even exists (none of us used to), never mind Hirshfeld's test and how it relates to TLS (I didn't), and certainly not how to run it (I still don't).

I'm latching onto Garib's philosophy (or how I understood it), which is intuitively extremely appealing: what I want to build, deposit, and see density of is the *protein molecule*. How that molecule breathes, packs, and misbehaves crystallographically I don't really care about, but it can be modelled and refined explicitly, as operations necessary to generate the observed diffraction. (Similar to what Sharp calculates: the phases of the *general protein*. How this protein changes from one derivative to the next is modelled explicitly and separately, and thereby refined.)

And so, the *protein molecule* is described by the ATOM/ANISOU cards, the rest in the header. It's probably separate internally in phenix.refine anyway, so don't mangle it all when you write it out.

That said, just my suggestion, and I agree with Dale: WHATEVER, JUST DO THE SAME THING.

phx.

Pavel Afonine wrote:
> Dear Frank,
> it's not a secret that phenix.refine ALWAYS writes total B-factor into ATOM records, there are strong reasons for this and this is clearly stated in the manual.
> [...]
Hi Frank,

> All your reasons are there for the convenience of the *crystallographer*, mine are for the end user (=unsuspecting biologist) -- who doesn't know TLS even exists (none of us used to), never mind about Hirshfeld's test and how it relates to TLS (I didn't), and certainly not how to run it (I still don't).

This is exactly what phenix.refine does: it puts everything together so you are not expected to have any knowledge about magic TLS matrices in the PDB file header, about the right programs to convert one into another, and so on. In contrast, if one splits things apart:
- you must know that what's in the ATOM record is incomplete;
- you must know that there are TLS matrices that you have to convert to the appropriate B and add to the residual ones;
- you must know that there are programs out there to do that;
- and you must know how to use these programs too.

So, having a complete record doesn't require any manipulation of the model (and so no extra knowledge).

Imagine the situation where you get a model with partial B-factors and another part encoded in the PDB header as TLS, and you want to do a refinement in SHELXL. In this case you will need to compute the total B to start with the correct values. In contrast, if the values are complete, you do not need to do anything.

In the end, what's important, I believe, is that the output information is clearly accompanied by an explanation of what it represents, and that there are tools available from both ends (phenix, ccp4) to easily go from partial to total and back. The rest is a matter of personal preference.

Cheers,
Pavel.

---
Pavel V. Afonine, Ph.D.
Lawrence Berkeley National Lab, Berkeley CA, USA (http://www.lbl.gov/)
CCI: Computational Crystallography Initiative (http://cci.lbl.gov/)
PHENIX (http://phenix-online.org/)
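The conversion Pavel lists (TLS matrices to a B contribution, added to the residual B) can be sketched as below. This is a minimal illustration using the standard rigid-body expression U_TLS = T + A L A^T + A S + S^T A^T, not the code of phenix or Refmac. It assumes T in Å^2, L in rad^2 and S in Å·rad, all in the same Cartesian frame as the coordinates; the values in a PDB header are normally given in degrees and would need converting first. The names `u_tls` and `b_total` are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]

def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]

def u_tls(T, L, S, xyz, origin):
    """Anisotropic U (Å^2) that a TLS group contributes to an atom at xyz."""
    x, y, z = (p - o for p, o in zip(xyz, origin))
    # A maps a small rotation vector lam to the displacement lam x r.
    A = [[0.0, z, -y], [-z, 0.0, x], [y, -x, 0.0]]
    At = transpose(A)
    ALAt = matmul(matmul(A, L), At)
    AS = matmul(A, S)
    StAt = matmul(transpose(S), At)
    return [[T[i][j] + ALAt[i][j] + AS[i][j] + StAt[i][j] for j in range(3)]
            for i in range(3)]

def b_total(b_residual, T, L, S, xyz, origin):
    """Total isotropic B = residual B + (8 pi^2 / 3) * trace(U_TLS)."""
    U = u_tls(T, L, S, xyz, origin)
    return b_residual + 8.0 * math.pi ** 2 * (U[0][0] + U[1][1] + U[2][2]) / 3.0

# Illustrative group: isotropic translation plus libration about z only.
zero = [[0.0] * 3 for _ in range(3)]
T = [[0.01, 0.0, 0.0], [0.0, 0.01, 0.0], [0.0, 0.0, 0.01]]
L = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.001]]
print(b_total(5.0, T, L, zero, (10.0, 0.0, 0.0), (0.0, 0.0, 0.0)))
```

Going the other way is the same arithmetic in reverse: subtract the TLS term from a total B to recover the residual value. Either direction needs the TLS origin and the atom-to-group assignment, which is why losing the header matters so much when the ATOM records hold residual B only.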
> This is exactly what phenix.refine does: it puts everything together so you are not expected to have any knowledge about magic TLS matrices in the PDB file header, about the right programs to convert one into another, and so on. In contrast, if one splits things apart:

Yes, but no non-crystallographer cares about the crystal -- only about the protein *in* the crystal.

> - you must know that what's in the ATOM record is incomplete;
> - you must know that there are TLS matrices that you have to convert to the appropriate B and add to the residual ones;
> - you must know that there are programs out there to do that;
> - and you must know how to use these programs too.
>
> So, having a complete record doesn't require any manipulation of the model (and so no extra knowledge).
>
> Imagine the situation where you get a model with partial B-factors and another part encoded in the PDB header as TLS, and you want to do a refinement in SHELXL. In this case you will need to compute the total B to start with the correct values. In contrast, if the values are complete, you do not need to do anything.

I honestly cannot imagine a non-crystallographer using SHELXL -- for anybody using that program, converting TLS to B is the *least* of their worries! (With all due respect, George :) It's a very small corner case; if people jump, it's to refmac.

phx.
On 3/29/2008 1:37 PM, Frank von Delft wrote:
>> This is exactly what phenix.refine does: it puts all together so you are not expected to have any knowledge about magic TLS matrices in PDB file header, about right programs to convert one into another and so on. In contrast, if one split things apart:
>
> Yes, but no non-crystallographer cares about the crystal -- only about the protein *in* the crystal.

IMHO, yes, exactly for this purpose there must be convenience tools for analysis of certain parts of the whole system. But the total database record should be complete and not split apart for a momentary purpose.

I think this could be an endless discussion, which I think I'm going to quit now -:)

Thanks all for your opinions and feedback. A few things I learned and will add to phenix.refine:
1) A clearer message in REMARK 3, placed next to the TLS records, explaining what is what;
2) An easy-to-use option in phenix.pdbtools to go back and forth between Btotal and residual components.

All these will appear in one of the next versions.

All the best and thanks again,
Pavel.

---
Pavel V. Afonine, Ph.D.
Lawrence Berkeley National Lab, Berkeley CA, USA (http://www.lbl.gov/)
CCI: Computational Crystallography Initiative (http://cci.lbl.gov/)
PHENIX (http://phenix-online.org/)
participants (3)
- Dale Tronrud
- Frank von Delft
- Pavel Afonine