R value difference in model vs data vs refinement
Dear all,
I refined a structure in Phenix at quite good resolution (1.4 Å). After the last refinement I created the "Table 1" to extract all the data. Interestingly, I got a warning due to small changes in the R values:
R/Rfree after refinement: 15.46/18.16
R/Rfree from phenix.model_vs_data: 15.34/18.07
Interestingly, the latter is nearly exactly the value I got after the first OHS scattering optimization in Phenix (R/Rfree 15.43/18.03). The differences are of course very small and negligible, but I wonder why this happens, especially since the structure was refined with Phenix, in which case I would expect both values to match exactly.
Thank you very much in advance.
Best regards,
Christian
On Mon, Mar 19, 2012 at 11:20 AM, Christian Roth wrote:
> The differences are of course very small and negligible, but I wonder why this happens, especially since the structure was refined with Phenix, in which case I would expect both values to match exactly.
I have seen this quite a bit too in structures from the PDB; unfortunately, I do not have an explanation. Are you certain that you selected the same column labels for both refinement and model_vs_data?
-Nat
Hi Christian,
I keep seeing people say this, but I have never seen a case where the R-factors reported by phenix.refine and phenix.model_vs_data differ given identical inputs. We even have a specific test that asserts that the R-factors from both tools are identical (of course, this test cannot cover the full variety of situations). So if you send me the files I can use to reproduce your observation, that will help me find out what is going on.
Thanks,
Pavel
One other thought on this: which R-factors reported by model_vs_data
did you use? This will calculate R-factors with and without outlier
removal; the latter is equivalent to phenix.refine with default
settings, but it is maybe not the obvious one to use. A few
reflections with extreme values could account for the difference you
see.
-Nat
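Nat's remark that a few extreme reflections can shift the R-factor can be illustrated with a small sketch. It assumes the conventional definition R = sum(||Fobs| - |Fcalc||) / sum(|Fobs|) with the scale factor fixed at 1, uses made-up amplitudes, and drops zero-amplitude observations as a crude stand-in for outlier rejection. The criteria Phenix actually applies are not reproduced here.

```python
def r_factor(f_obs, f_calc):
    """Conventional R-factor with the overall scale factor assumed to be 1."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


# Hypothetical amplitudes; the last observation is a suspicious zero.
f_obs = [100.0, 80.0, 60.0, 40.0, 20.0, 0.0]
f_calc = [95.0, 84.0, 57.0, 42.0, 21.0, 30.0]

# R-factor over all reflections.
r_all = r_factor(f_obs, f_calc)

# Crude stand-in for outlier rejection: drop zero-amplitude observations.
kept = [(o, c) for o, c in zip(f_obs, f_calc) if o > 0.0]
r_filtered = r_factor([o for o, _ in kept], [c for _, c in kept])

print("R (all reflections):    %.4f" % r_all)
print("R (zeros filtered out): %.4f" % r_filtered)
```

A single rejected reflection changes the result here only because the toy data set is tiny. With tens of thousands of reflections the effect of a few outliers would be far smaller, which is the regime discussed in this thread.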
_______________________________________________
phenixbb mailing list
[email protected]
http://phenix-online.org/mailman/listinfo/phenixbb
In reply to Nathaniel Echols's message of Thursday, 22 March 2012, 19:46:15:
Hi Nat,
I am not sure; I looked into the polygon. The log states that the R values are calculated after a resolution and sigma cutoff has been applied. If I understood the log correctly, the values taken from the PDB header are without a sigma cutoff. Maybe that is the reason for the difference. Does model_vs_data print the values without cutoff somewhere in the log file? I did not find them. However, does this mean that a default cutoff is used up to the first OHS step in phenix.refine, and that no cutoff is used during the rest of the refinement?
Christian
In reply to Christian Roth's message of Thu, Mar 22, 2012 at 2:54 PM:
After spending some time looking at similar cases today, I am not sure myself what is going on. I do not think a sigma cutoff is applied, unless perhaps the PDB header indicates that one was used previously (this is a thoroughly antiquated practice). However, outlier filtering appears to be used throughout. I found one example where nearly 4000 reflections have amplitudes of zero (which is surely not correct) and are discarded as outliers in phenix.refine. This reduces R-free by 0.03. In model_vs_data, the same numbers appear twice:
  Model_vs_Data:
    r_work(re-computed) : 0.2030
    r_free(re-computed) : 0.2639
  ...
  After applying resolution and sigma cutoffs:
    n_refl_cutoff : 31257
    r_work_cutoff : 0.2030
    r_free_cutoff : 0.2639
But this totally contradicts what I told you earlier, sorry; I was assuming that they would be different.
I do have a general piece of advice, however: ignore the discrepancy, and just report the value that came out of refinement (because that is what will end up in the PDB). The difference in your case is relatively small, probably less than what you would see if you calculated R-factors with another program (Refmac, for instance), because of different implementations of bulk solvent correction, scaling, etc.* (Even different versions of Phenix are not guaranteed to yield identical R-factors, due to low-level changes.) Considering how difficult it can be to reproduce the statistics in published structures, a change of 0.0004 is not enough to worry about.
-Nat
* http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2906258/?tool=pubmed
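For anyone who wants to check such log output mechanically, here is a minimal sketch that pulls the `name : value` pairs out of the model_vs_data excerpt quoted in Nat's message and compares the re-computed and cutoff R-factors. The line breaks and indentation of the excerpt are assumptions, since the archive flattened it, and `parse_pairs` is a hypothetical helper, not part of Phenix.

```python
import re

# Excerpt from the message above; the exact layout is an assumption.
LOG_EXCERPT = """\
Model_vs_Data:
  r_work(re-computed) : 0.2030
  r_free(re-computed) : 0.2639
  ...
After applying resolution and sigma cutoffs:
  n_refl_cutoff : 31257
  r_work_cutoff : 0.2030
  r_free_cutoff : 0.2639
"""

# One whitespace-free label, a colon, then a plain number, and nothing else.
PAIR = re.compile(r"^\s*(\S+)\s*:\s*([0-9.]+)\s*$")


def parse_pairs(text):
    """Return {label: value} for every 'label : number' line in text."""
    values = {}
    for line in text.splitlines():
        match = PAIR.match(line)
        if match:
            values[match.group(1)] = float(match.group(2))
    return values


values = parse_pairs(LOG_EXCERPT)
same_work = values["r_work(re-computed)"] == values["r_work_cutoff"]
same_free = values["r_free(re-computed)"] == values["r_free_cutoff"]
print("r_work identical with and without cutoffs:", same_work)
print("r_free identical with and without cutoffs:", same_free)
```

Section headers such as `Model_vs_Data:` and the `...` line carry no number after a colon, so the pattern skips them.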
In reply to Nathaniel Echols's message of Thursday, 22 March 2012, 23:07:33:
Hi Nat,
I agree with you that the difference is very small and likely negligible. I just asked out of curiosity whether there might be a reason for this behaviour.
Christian
In reply to Christian Roth's message of Fri, Mar 23, 2012 at 9:49 AM:
The R-factor difference of 0.0004 is what you have to expect as the result of writing the coordinates, B-factors, and occupancies to the PDB file. In memory, floating-point numbers have >= 12 digits of precision (we use double precision for almost everything); in the PDB file you have only 7 digits for the coordinates and just 5 digits for the B-factors and occupancies.
Ralf
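Ralf's point about precision lost on writing can be made concrete with a short sketch. It assumes the fixed-width PDB ATOM-record fields (8.3 for each coordinate, 6.2 for occupancy and B-factor), which match the 7 and 5 digits he mentions. The input values and the `pdb_round_trip` helper are hypothetical, and no R-factor is recomputed here.

```python
def pdb_round_trip(x, y, z, occ, b):
    """Format five values as fixed-width PDB fields and parse them back."""
    text = "%8.3f%8.3f%8.3f%6.2f%6.2f" % (x, y, z, occ, b)
    xyz = [float(text[i:i + 8]) for i in (0, 8, 16)]
    occ_b = [float(text[24:30]), float(text[30:36])]
    return text, xyz + occ_b


# Hypothetical double-precision values as they might exist in memory.
in_memory = (12.3456789012, -7.0004999, 101.9995001, 0.666666, 23.4567891)

text, on_disk = pdb_round_trip(*in_memory)
errors = [abs(a - b) for a, b in zip(in_memory, on_disk)]

print("formatted fields:", repr(text))
for name, err in zip(("x", "y", "z", "occ", "B"), errors):
    print("%-3s round-trip error: %.7f" % (name, err))
```

Each coordinate can move by up to 0.0005 Å and each occupancy or B-factor by up to 0.005. Small shifts of this kind, spread over every atom, are the source of the R-factor changes in the fourth decimal place that Ralf describes.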
In reply to Ralf Grosse-Kunstleve's message of Friday, 23 March 2012, 18:23:13:
Hi,
I am sorry, I overlooked that 0.0004 is not the difference in my values. In my case the differences are Rref - Rmvs = 0.12 and Rfreeref - Rfreemvs = 0.09. These are probably not rounding errors, but may be due to outlier corrections done internally, as Nat mentioned. However, Pavel asked for the data and logs, so maybe he can finally tell me what the reason is. Perhaps I did something wrong with the parametrization in one of the jobs.
Christian
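The arithmetic behind this correction can be checked in a few lines. The sketch assumes that the values reported at the start of the thread are percentages, so that they must be divided by 100 before being compared with the fractional figure of 0.0004 discussed above.

```python
# Values from the first message, read as percentages (an assumption).
r_refine = {"r_work": 15.46, "r_free": 18.16}  # reported by phenix.refine
r_mvd = {"r_work": 15.34, "r_free": 18.07}     # reported by model_vs_data

# Differences in percentage points, rounded to the reported precision.
diff_percent = {k: round(r_refine[k] - r_mvd[k], 2) for k in r_refine}

# The same differences as fractions, comparable with the 0.0004 figure.
diff_fraction = {k: round(v / 100.0, 4) for k, v in diff_percent.items()}

for key in ("r_work", "r_free"):
    print("%s: %.2f percentage points = %.4f as a fraction"
          % (key, diff_percent[key], diff_fraction[key]))
```

On this reading the differences are 0.0012 and 0.0009 in fractional terms, roughly two to three times the 0.0004 figure, which is consistent with the remark that rounding alone probably does not explain them.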
In reply to Christian Roth's message of 3/23/12, 11:39 AM:
Hi Christian,
with my current working version of Phenix I am getting:
  phenix.refine:        r_work : 0.1486  r_free : 0.1752
  phenix.model_vs_data: r_work : 0.1485  r_free : 0.1752
My current working version will be available for download, maybe sometime next week. The difference of 0.1486 vs. 0.1485 may be due to:
1) loss of precision during formatting (write out/read in, as Ralf explained); and/or
2) phenix.refine idealizes X-H geometry (if "hydrogens.refine=riding"), and phenix.model_vs_data does not do this.
The larger difference that you reported with older versions may be due to minor inconsistencies in how the input data are handled and how scaling is performed. All these differences are rather cosmetic and should not create any problem. However, I totally agree that there is no place for them in a self-consistent system, so hopefully they will disappear very soon.
Pavel
Participants (4): Christian Roth, Nathaniel Echols, Pavel Afonine, Ralf Grosse-Kunstleve