I think it is very important to be able to include unknown atoms in a deposited PDB file (while echoing the caveat about flooding the structure with UNKs to lower the R-factor).
For one thing, these structures are produced not just for structure-factor calculation and validation. Many end users will never even bother to do a structure-factor calculation. It is important for the depositor to be able to refer to an unknown but likely significant ligand, and for the reader to be able to go and look at that position (ideally surrounded by electron density).
For another thing, the structure-factor calculation gives exactly the same result whether the dummy atoms are omitted, or are flagged with zero occupancy or atom type X so that they are ignored in the sf calculation. In the first case the person calculating structure factors can feel good because the results are exactly right for that model. In the second case he feels bad because he wasn't able to account for those atoms correctly. But the second case is actually the better model. Better to get a slightly wrong value for the better model than the correct result for the less good model, especially when the two results are exactly the same. Essentially we are faced with an insurmountable problem: we cannot do a proper job of calculating sf's because of the UNK atoms. Better to include them but ignore them in the sf calculation, I think, than to eliminate them and kid ourselves that now we have the right answer.
However, if the depositor has refined them (as suggested by the B-factors present in some of the files), and perhaps chosen an atom type that gives B-factors compatible with the surroundings, it should be possible to include the atom type so his R-factor can be reproduced. This runs the risk of someone over-interpreting the PDB file ("I thought I knew what the UNK residue is, but my candidate has 3 C and one N where the UNK has 4 C").
My 2 cents,
Ed
Pavel Afonine wrote:
Hi Frank,
thanks a lot for your feedback - as always very useful and critical which is great!
the 2nd-last one of the validation pack, where you recommend against the use of UNK atoms, but don't say why:
<snip>
Some programs and people tend to interpret unknown density using “dummy atoms”. In PDB files it typically looks like this:
ATOM 10 O UNK 2 6.348 -11.323 10.667 1.00 8.06 X
ATOM 11 O UNK 2 6.994 -12.600 10.740 1.00 7.16 X
ATOM 12 O UNK 2 6.028 -13.737 10.607 1.00 6.58 X
ATOM 13 DUM UNK 2 6.796 -15.043 10.583 1.00 8.28
ATOM 14 DUM UNK 2 5.099 -13.727 11.792 1.00 7.15
- *Do not deposit this in PDB*, especially if chemical element type is undefined (rightmost column)
</snip>
Sorry for not saying "why". If I ever show these slides again at some school, I promise to improve them to be as clear as possible.
The problems with records like:
ATOM 10 O UNK 2 6.348 -11.323 10.667 1.00 8.06 X
ATOM 10 O UNK 2 6.348 -11.323 10.667 1.00 8.06
ATOM 13 DUM UNK 2 6.796 -15.043 10.583 1.00 8.28
ATOM 13 DUM UNK 2 6.796 -15.043 10.583 1.00 8.28 X
are:
- the chemical element type (columns 77-78), which is what is used in the Fcalc calculation and may also provide the charge, is undefined (simply blank or "X"), so there is no way to include these dummy atoms in structure factor calculations;
- even if you have "O" as in the first example, this often contradicts the "X" in the rightmost column, so you have to resort to guesswork, which is not good when interpreting well-defined formatted data files. Plus, of course, there is no way to tell the charge;
- even if you have "O" as in the second example, the element type in the rightmost column is missing. The atom label alone is weak information: we cannot reliably extract the scattering type from it. The classic example is CA (calcium) versus CA (C-alpha);
- of course, we can make the program simply ignore these atoms (hm... sounds like bad practice: don't read it if you can't read it - this way we may end up being ignorant -:) ). But are we sure that the original program that placed these dummies was not using them in its Fcalc calculation? Or maybe it was using some default scattering factor for them? Which one: H or O or N (N is a better approximation than O)?
- furthermore, since we are lacking such a fundamental property of these dummy atoms as the scattering type, it is laughable to assign B-factors to them! Look through the PDB: you will find some smart-looking B-factors, such as 8.06 A**2 for a non-existent element X -:)
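The first three points can be illustrated with a minimal Python sketch that takes the element only from columns 77-78 of a fixed-width record and refuses to guess from the atom label. The names `element_of`, `make_record` and `UNDEFINED` are hypothetical, the demo coordinates are simply copied from the records above, and this is not how any particular program does it:

```python
# Hypothetical set of values treated as "element undefined"; "X" is the
# placeholder that appears in the records quoted above.
UNDEFINED = {"", "X"}


def element_of(record):
    """Return the element symbol from columns 77-78, or None if undefined."""
    # Pad short lines so that a missing element column reads as blank.
    field = record.ljust(80)[76:78].strip().upper()
    return None if field in UNDEFINED else field


def make_record(serial, name, element):
    """Build a fixed-width ATOM line for the demo (residue UNK 2)."""
    template = (
        "{:6s}{:5d} {:<4s}{:1s}{:3s} {:1s}{:4d}{:1s}   "
        "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}          {:>2s}"
    )
    return template.format(
        "ATOM", serial, name, "", "UNK", "", 2, "",
        6.348, -11.323, 10.667, 1.00, 8.06, element,
    )


if __name__ == "__main__":
    # Same atom label "CA", two different scattering types: only the
    # element column can tell C-alpha from calcium.
    for name, element in [("O", "X"), ("DUM", ""), ("CA", "C"), ("CA", "CA")]:
        line = make_record(10, name, element)
        print(repr(name), repr(element), "->", element_of(line))
```

A caller that gets `None` back has to decide explicitly what to do with the atom, rather than silently picking a scattering type from the label.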
In summary:
- do not put anything there hoping that future generations of smarter software will find out what it is;
- if you want to put something there (which actually has valid reasons: it will improve the overall map quality, which is good), then please define it properly.
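Why "properly define it" matters for Fcalc can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the scattering factor is replaced by the electron count (exact only at zero scattering angle), the coordinates are made-up fractional values, and `f_calc` and `ELECTRONS` are hypothetical names, not any program's API:

```python
import cmath
import math

# Crude stand-in for atomic scattering factors: the electron count.
# An assumption for illustration only, not what refinement programs use.
ELECTRONS = {"C": 6, "N": 7, "O": 8}


def f_calc(atoms, hkl, d_spacing):
    """Sum the atomic contributions to one structure factor F(hkl).

    atoms: iterable of (element, occupancy, B, (x, y, z)) with fractional
    coordinates; d_spacing is the resolution of the reflection in Angstrom.
    """
    h, k, l = hkl
    total = 0j
    for element, occ, b_iso, (x, y, z) in atoms:
        if element not in ELECTRONS:
            # An undefined element (e.g. "X") has no scattering type.
            raise ValueError(f"no scattering type for element {element!r}")
        # exp(-B sin^2(theta)/lambda^2) with sin(theta)/lambda = 1/(2d)
        debye_waller = math.exp(-b_iso / (4.0 * d_spacing ** 2))
        phase = cmath.exp(2j * math.pi * (h * x + k * y + l * z))
        total += occ * ELECTRONS[element] * debye_waller * phase
    return total


if __name__ == "__main__":
    # Made-up model: two known atoms plus one dummy position.
    known = [("C", 1.0, 10.0, (0.10, 0.20, 0.30)),
             ("N", 1.0, 12.0, (0.15, 0.25, 0.35))]
    dummy_xyz = (0.40, 0.10, 0.20)
    hkl, d = (1, 2, 3), 2.0

    print("omitted       ", f_calc(known, hkl, d))
    print("zero occupancy", f_calc(known + [("O", 0.0, 8.06, dummy_xyz)], hkl, d))
    print("dummy as O    ", f_calc(known + [("O", 1.0, 8.06, dummy_xyz)], hkl, d))
    print("dummy as N    ", f_calc(known + [("N", 1.0, 8.06, dummy_xyz)], hkl, d))
```

In this toy model a dummy atom at zero occupancy gives the same F as leaving it out, while treating it as O or as N gives different values, and an element "X" cannot be included at all.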
All the best! Pavel.
_______________________________________________
phenixbb mailing list
http://phenix-online.org/mailman/listinfo/phenixbb