Hi all, I have only started using PHENIX and have made an interesting observation. If I run my model through PHENIX refine (individual sites, individual ADPs, occupancies, optimized X-ray/stereochemistry weight, and optimized X-ray/ADP weight selected), I obtain R/Rfree = 0.15/0.18 and very nice geometry (the data extend to 1.7 Å). However, if I then take this refined model and run it through Refmac (without any refinement), I obtain R/Rfree = 0.19/0.21. Can anyone explain why there is such a big difference? Will this be an issue during PDB submission? Any and all comments are appreciated. Thank you all in advance,

Katarina M.
Take a look at what phenix.refine reports for the number of reflections used for refinement in the logfile - I believe it will discard reflections that are flagged as suspicious (I can't remember the reference; I think it's one of Randy Read's papers), and unless you're using a very recent version, it may also ignore reflections where F=0. There are probably half a dozen other reasons why the programs disagree, but those are less obvious and potentially much harder to detect.

If you're concerned about reproducibility, run phenix.model_vs_data with the reflections and PDB file, and make sure that the statistics it reports agree with what you're sending to the PDB. The program is well documented and mostly open-source, and is a good sanity check for data in the PDB. The PDB uses a much more primitive method to check R-factors; after they process your structure you'll get an email that includes something like this:

Structure factor validation

                      High_Res  Low_Res  Compl   Num_Ref  R_obs   R_work  R_free  Corr(Fo-Fc)
Reported PHENIX        2.500    20.005   81.28    15118   0.2183  0.2150  0.2884  N/A
SFCHECK without TLS    2.50     19.64    87.9      8004   0.2850  0.2850  N/A     0.8546
REFMAC without TLS     2.500    67.386   87.609    8937   0.234   0.2336  0.0000  0.918

The lack of TLS in their validation means that they'll even overestimate the R-factors of most structures refined with REFMAC! (Although in this case I think I left the ANISOU records in the deposited PDB file, so I'm not sure what they're doing wrong.) They will never complain about this, in my experience, so it's not something you need to worry about.

-Nat
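P.S. If you want to double-check the headline numbers yourself, the R-factor is simple to recompute from the amplitudes either program can export. A minimal sketch in plain numpy (toy data, and only a single overall scale; the bulk-solvent and anisotropic scale terms that real programs add are exactly where they diverge):

import numpy as np

def r_factor(f_obs, f_calc):
    # Conventional R = sum|Fobs - k*Fcalc| / sum(Fobs), with a single
    # overall scale k chosen by least squares.  Real programs refine
    # k plus bulk-solvent and anisotropic scaling terms as well.
    k = np.sum(f_obs * f_calc) / np.sum(f_calc * f_calc)
    return np.sum(np.abs(f_obs - k * f_calc)) / np.sum(f_obs)

# Toy amplitudes standing in for two programs' Fcalc for the SAME
# coordinates: slightly different scaling/solvent treatment alone
# gives a noticeably different R.
rng = np.random.default_rng(0)
f_obs = rng.gamma(2.0, 100.0, size=20000)
f_calc_a = f_obs * (1.0 + rng.normal(0.0, 0.15, size=f_obs.size))
f_calc_b = f_calc_a * (1.0 + rng.normal(0.0, 0.05, size=f_obs.size))

print(r_factor(f_obs, f_calc_a))   # one program's R
print(r_factor(f_obs, f_calc_b))   # the other's, a few points higher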
Hello,
If I run my model through PHENIX refine (individual sites, individual ADPs, occupancies, optimized X-ray/stereochemistry weight, and optimized X-ray/ADP weight selected), I obtain R/Rfree = 0.15/0.18 and very nice geometry (the data extend to 1.7 Å). However, if I then take this refined model and run it through Refmac (without any refinement), I obtain R/Rfree = 0.19/0.21.
sounds good to me, I would be worried otherwise.
Can anyone explain why there is such a big difference? Will this be an issue during PDB submission?
Have a look at this:

phenix.model_vs_data: a high-level tool for the calculation of crystallographic model and data statistics. P. V. Afonine, R. W. Grosse-Kunstleve, V. B. Chen, J. J. Headd, N. W. Moriarty, J. S. Richardson, D. C. Richardson, A. Urzhumtsev, P. H. Zwart and P. D. Adams. J. Appl. Cryst. (2010). 43, 669-676.
Take a look at what phenix.refine reports for the number of reflections used for refinement in the logfile - I believe it will discard reflections that are flagged as suspicious (can't remember the reference, I think it's one of Randy Read's papers),
It is typically from zero to a few dozen reflections, and unless their amplitudes are enormous (~1e+9), removing them is rarely visible in the R-factor (if you just re-compute it). Pavel.
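To put a rough number on that: with tens of thousands of reflections, rejecting a few dozen barely moves the sums that define R. A toy illustration (made-up data, not from any real structure):

import numpy as np

def r_factor(f_obs, f_calc):
    k = np.sum(f_obs * f_calc) / np.sum(f_calc * f_calc)
    return np.sum(np.abs(f_obs - k * f_calc)) / np.sum(f_obs)

rng = np.random.default_rng(1)
f_obs = rng.gamma(2.0, 100.0, size=30000)
f_calc = f_obs * (1.0 + rng.normal(0.0, 0.12, size=f_obs.size))

# Plant 50 gross outliers (think beamstop shadow or zingers), then
# compare R with and without them.
bad = rng.choice(f_obs.size, size=50, replace=False)
f_obs_bad = f_obs.copy()
f_obs_bad[bad] *= 5.0
keep = np.ones(f_obs.size, dtype=bool)
keep[bad] = False

print(r_factor(f_obs_bad, f_calc))             # outliers included
print(r_factor(f_obs_bad[keep], f_calc[keep])) # rejected: R shifts only in the third decimal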
Hi Pavel,

After reading the paper you suggested, I have to admit I am still confused by this discrepancy. In the section that describes specific reasons for R/Rfree discrepancies that could apply to my model, REFMAC and PHENIX supposedly use the same principles. So, do these differences come from slight variations in how, for example, bulk solvent and anisotropic scaling are treated in the two programs?

I am sorry for asking basic questions; please do not bother to reply if it is too silly.

Thanks,
Katarina
Hi Katarina,
After reading the paper you suggested, I have to admit I am still confused by this discrepancy. In the section that describes specific reasons for R/Rfree discrepancies that could apply to my model, REFMAC and PHENIX supposedly use the same principles.
Yes, in general, any crystallographic program in common use today relies on similar principles, which are outlined here (for example): http://www.phenix-online.org/presentations/latest/pavel_refinement_general.p...

But the devil is in the details, and there are a lot of them (especially the ones that make the available refinement programs different) -;) For example:

- outlier detection and removal: available in phenix.refine, not in Refmac;
- second-derivatives-based minimizer: used in Refmac, not in phenix.refine;
- anisotropic scaling applied to a different term;
- mask calculation parameters are quite different between the two programs;
- weight optimization is VERY different (in part because of the different minimizers used);
- phenix.refine uses an ML target for bulk solvent and scaling, while Refmac uses LS;
- the ML targets are parametrized differently;
- ...

I could name 100+ more differences, but it's 21:09 and dinner is still waiting for me -:) so I'll stop here.

We also know that even a small change somewhere can send refinement down a different pathway and leave the refined model in a different local minimum: http://www.phenix-online.org/presentations/latest/pavel_validation.pdf

Also, in developing phenix.refine we almost always try to implement the best currently available technology (or develop it ourselves), and we make sure it works the way we expect by re-refining the whole PDB (in fact, the subset for which experimental data are available).

So, having said all the above, I'm not too surprised that you are getting different results, and I'm not worried, since the results are better with PHENIX (I would ask for more details otherwise).
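To make the bulk-solvent point concrete: a common form is the flat (mask-based) solvent model, in which a B-attenuated copy of the solvent-mask structure factors is added to Fcalc before scaling, roughly F_model = k_total * (F_calc + k_sol * exp(-B_sol * s^2 / 4) * F_mask). The two programs compute the mask and fit (k_sol, B_sol) and the scales differently. A schematic numpy sketch of why that alone shifts amplitudes, and hence R, with no change to the coordinates (toy arrays, overall scale omitted):

import numpy as np

def f_model(f_calc, f_mask, s_sq, k_sol=0.35, b_sol=46.0):
    # Flat bulk-solvent model: add a B-smeared copy of the mask
    # structure factors to Fcalc in complex space.  s_sq = 1/d^2;
    # k_sol and B_sol are refinable parameters.
    return f_calc + k_sol * np.exp(-b_sol * s_sq / 4.0) * f_mask

rng = np.random.default_rng(2)
n = 5000
f_calc = rng.normal(size=n) + 1j * rng.normal(size=n)
f_mask = rng.normal(size=n) + 1j * rng.normal(size=n)
s_sq = rng.uniform(0.0, 0.35, size=n)   # up to 1/(1.7 A)^2 for 1.7 A data

# Same model, same mask, slightly different fitted (k_sol, B_sol):
amp_a = np.abs(f_model(f_calc, f_mask, s_sq, k_sol=0.35, b_sol=46.0))
amp_b = np.abs(f_model(f_calc, f_mask, s_sq, k_sol=0.30, b_sol=60.0))
print(np.median(np.abs(amp_a - amp_b) / amp_a))  # median fractional difference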
I am sorry for asking basic questions; please do not bother to reply if it is too silly.
Not a problem. Please keep asking as many questions as you need. All the best! Pavel.
Dear Pavel Afonine,
- outliers detection and removing: available in phenix.refine, not in Refmac;
I'm very interested in this matter. In my case, I found the following lines in the phenix.refine logfile:

  basic_wilson_outliers    = 53
  extreme_wilson_outliers  = 84
  beamstop_shadow_outliers = 49
  model_based_outliers     = 16
  total                    = 133

Could you point me to references for the theory behind these tests? I am especially worried about model_based_outliers - should I be concerned about this number?

Thanks in advance.

K. Yamashita
Hi Keitaro,

R. Read, Acta Cryst. (1999). D55, 1759-1764.

You can always turn this off using "main.outliers_rejection=False". Let us know if you have any questions.

Pavel.
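For intuition only (this follows the idea in Read's paper, not phenix.refine's actual code): normalize each amplitude to E = F / sqrt(<F^2>) and flag reflections whose tail probability under the acentric Wilson distribution, P(E >= e) = exp(-e^2), is vanishingly small. A minimal sketch, assuming a single resolution shell and acentric reflections only:

import numpy as np

def wilson_outlier_flags(f_obs, p_cutoff=1e-6):
    # Acentric Wilson statistics: P(E >= e) = exp(-e^2), so flagging
    # amplitudes with tail probability below p_cutoff reduces to a
    # threshold on E^2.  Real implementations work per resolution
    # shell and treat centric reflections with a different law.
    e_sq = f_obs**2 / np.mean(f_obs**2)   # crude one-shell normalization
    return e_sq > -np.log(p_cutoff)       # E^2 > ~13.8 for p = 1e-6

rng = np.random.default_rng(3)
f = rng.rayleigh(1.0, size=50000)   # ideal acentric amplitudes
f[:5] *= 6.0                        # plant a few spikes (zingers, ice rings, ...)
print(np.flatnonzero(wilson_outlier_flags(f)))  # mostly recovers the planted indices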
Dear Katarina
I would suggest you post the same question to the CCP4bb and hopefully Refmac users/experts will comment on this. It is always very difficult to understand what's going on without having a look at input/output files.
Best wishes
Roberto
Hi guys,

Just to clear up some points raised on the treatment of TLS during PDB deposition. It is not strictly the case that TLS is neglected during PDB deposition. The requirement for deposition now is that full ANISOU values have to be present if TLS has been used. In that case the TLS definitions are redundant, as the full description of the ADP model is provided by the ATOM and ANISOU records. There is therefore no absolute requirement for the TLS definitions in the header to be correctly read in order for the validation to proceed. This aids accurate validation of the model against the provided SF data, using EDS for example. The output with and without TLS was used historically to check whether the TLS definitions had been read correctly.

Having said that, the presence of TLS definitions is still informative for users of the coordinates, to check, for example, that a full anisotropic refinement has not been carried out. PDB curation involves checking the description of the TLS groups that have been chosen. So, for example, it is useful that the selection expressions do not refer to ranges of residues that do not exist (for example "RESID -99:9999" for a 1-100 residue protein), or to overlapping ranges (for example, a chain with TLS group 1 defined as "RESID 45:90" and TLS group 2 defined as "RESID 75:150").

Depending on the wwPDB deposition site, the validation programs may differ. PDBe uses an in-house version of the EDS server, which uses REFMAC with TLS taken into account. RCSB and PDBj run the particular program that was used in determining the structure for validation, in addition to a validation check using SFCHECK.

It is worth saying that the PDB sites are not attempting to completely reproduce the authors' R-factors, but instead to check for errors in the deposition process - for example, whether an incorrect SF file or inconsistent space group definitions have been uploaded.

You can check the details of the PHENIX header format for TLS at http://www.wwpdb.org/documentation/format32/remark3.html#Refinement%20using%...

Best regards,
Martyn

Martyn Symmons
wwPDB at EBI, Protein Data Bank in Europe (PDBe)
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK
Hi Martyn, thanks for your feedback - it is very much appreciated!
It is not strictly the case that TLS is neglected during PDB deposition.
This is in sync with my understanding of the current situation. It is really great! (Although I should re-run my tools through the whole PDB to get a quick view of the current state of affairs.)
The requirement for deposition now is that full ANISOU values have to be present if TLS has been used.
This is really great, too!
In that case the TLS definitions are redundant, as the full description of the ADP model is provided by the ATOM and ANISOU records.
No, this is not true. The TLS definitions record the partitioning of the model into TLS groups, which cannot be recovered from the ANISOU records alone with the current tools.
There is therefore no absolute requirement for the TLS definitions in the header to be correctly read in order for the validation to proceed.
This is true in the sense that you can re-calculate the R-factor (since the complete ANISOU records corresponding to Utotal (see the reference below) are present, so you can compute correct Fcalcs), but an inability to read this information correctly should be a BIG warning sign for everyone involved. Also, leaving out the TLS records results in an obvious loss of information about the TLS groups (at least the atom selections defining them). I don't see a reason why one would want to give up this information.
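For readers wondering how ANISOU records can encode the TLS contribution in the first place: the per-atom anisotropic U implied by a TLS group follows the Schomaker-Trueblood relation, U_TLS(r) = T + A L A^T + A S + S^T A^T, where A is the antisymmetric matrix built from the atom's position relative to the group origin. A schematic sketch (Cartesian tensors and one sign convention for S assumed; real code must also handle the origin choice and the PDB unit conventions - L is in deg^2 in REMARK 3, and ANISOU stores U scaled by 10^4):

import numpy as np

def skew(r):
    # Antisymmetric matrix A such that A @ lam == np.cross(lam, r).
    x, y, z = r
    return np.array([[0.0,   z,  -y],
                     [ -z, 0.0,   x],
                     [  y,  -x, 0.0]])

def u_from_tls(T, L, S, r, origin):
    # Schomaker-Trueblood: per-atom anisotropic U (3x3, Cartesian)
    # from the group tensors; T, U in A^2, L in rad^2, S = <lam t^T>.
    A = skew(np.asarray(r, float) - np.asarray(origin, float))
    return T + A @ L @ A.T + A @ S + (A @ S).T

# Pure libration about the x axis: an atom 10 A away along y picks up
# displacement along z, so U grows with distance from the axis.
T = np.zeros((3, 3))
L = np.diag([1e-3, 0.0, 0.0])
S = np.zeros((3, 3))
print(u_from_tls(T, L, S, r=(0.0, 10.0, 0.0), origin=(0.0, 0.0, 0.0)))
# -> only U33 is nonzero (~0.1 A^2 = L_xx * y^2)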
This aids accurate validation of the model against the provided SF data using EDS for example.
Absolutely true: the ability to reproduce the reported R-factors is closely and directly related to the ability to accurately validate the model and data. phenix.model_vs_data will do it almost unconditionally:

phenix.model_vs_data: a high-level tool for the calculation of crystallographic model and data statistics. P. V. Afonine, R. W. Grosse-Kunstleve, V. B. Chen, J. J. Headd, N. W. Moriarty, J. S. Richardson, D. C. Richardson, A. Urzhumtsev, P. H. Zwart and P. D. Adams. J. Appl. Cryst. (2010). 43, 669-676.
The output with and without TLS was used historically to check whether the TLS definitions had been read correctly.
I see, but there is more to it than just having a TLS hint in "REMARK 3"... The TLS records contain information about the TLS groups (the atom selections, at least) that, if removed, cannot be easily guessed.
Having said that the presence of TLS definitions is still informative for users of the coordinates to check that for example a full anisotropic refinement has not been carried out.
Well, "TLS refinement" = "Constrained anisotropic refinement", so I don't really understand what is "full anisotropic refinement". Also, what about performing TLS refinement on top of treating each atom moving anisotropically: see p. 24-31: http://www.phenix-online.org/newsletter/CCN_2010_07.pdf for some overview.
PDB curation involves checking the description of the TLS groups that have been chosen.
Great! Did you see the report that I sent a few months ago to those who might be interested? If not, I can re-send the old one, and meanwhile re-compute the most current version.
So, for example, it is useful that the selection expressions do not refer to ranges of residues that do not exist (for example "RESID -99:9999" for a 1-100 residue protein),
Absolutely true. This is what I pointed out in my report a few months ago.
or to overlapping ranges (for example, a chain with TLS group 1 defined as "RESID 45:90" and TLS group 2 defined as "RESID 75:150").
True.
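Both of these curation checks are mechanical enough to script. A minimal sketch (a hypothetical helper working on plain integer residue ranges only; real TLS selections also involve chain IDs, insertion codes, and so on):

def check_tls_ranges(groups, first_res, last_res):
    # groups: list of (start, end) residue ranges, one per TLS group.
    # Returns human-readable problems; an empty list means clean.
    problems = []
    for i, (a, b) in enumerate(groups, start=1):
        if a < first_res or b > last_res:
            problems.append(f"group {i}: RESID {a}:{b} falls outside "
                            f"the model ({first_res}-{last_res})")
    for i in range(len(groups)):
        for j in range(i + 1, len(groups)):
            (a1, b1), (a2, b2) = groups[i], groups[j]
            if max(a1, a2) <= min(b1, b2):   # closed intervals intersect
                problems.append(f"groups {i + 1} and {j + 1} overlap")
    return problems

# The two examples from the message above:
print(check_tls_ranges([(-99, 9999)], 1, 100))           # out-of-range selection
print(check_tls_ranges([(45, 90), (75, 150)], 1, 150))   # overlapping groups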
Depending on the wwPDB deposition site, the validation programs may differ.
Sure, the tools may vary under the requirement that the outcome must be the same.
PDBe uses an in-house version of the EDS server which uses REFMAC with TLS taken into account. RCSB and PDBj run the particular program that was used in determining the structure for validation, in addition to a validation check using SFCHECK.
Can you reproduce the reported R-factors of entries 2WYX or 2R24 using the tools described above? Let me know if not.
It is worth saying that the PDB sites are not attempting to completely reproduce the authors' R-factors,
This is very unfortunate, since there is no reason for the R-factors not to be reproducible to within 0.01%. If you can't reproduce them, then there is a problem either with the structure/data or with the software you use. Period. See:

phenix.model_vs_data: a high-level tool for the calculation of crystallographic model and data statistics. P. V. Afonine, R. W. Grosse-Kunstleve, V. B. Chen, J. J. Headd, N. W. Moriarty, J. S. Richardson, D. C. Richardson, A. Urzhumtsev, P. H. Zwart and P. D. Adams. J. Appl. Cryst. (2010). 43, 669-676.
but instead to check for errors in the deposition process.
Well, reproducing the R-factors is the very first sanity check to do BEFORE spending any time on lower-level details. Indeed, if something as gross as the R-factor doesn't reasonably match, there is no point in validating the fine details.
You can check the details of the PHENIX header format for TLS at http://www.wwpdb.org/documentation/format32/remark3.html#Refinement%20using%...
Thanks! This looks great. Just a minor question: what if I specify a TLS group as "chain A or chain a and resseq 123:456 and element N" (which I can potentially do in PHENIX, no problem)? All the best! Pavel.
participants (6)

- Katarina Moravcevic
- Keitaro Yamashita
- Martyn Symmons
- Nathaniel Echols
- Pavel Afonine
- Roberto Steiner