Re: [phenixbb] Geometry Restraints - Anisotropic truncation
Hi Kendall,

Yes, I think you could use this kind of approach to make overall decisions of any kind, including those you suggest. I would not use Rsleep for anything at all, other than calculating a final number. I would use a fixed Rfree set (which could be a subset of the total free set or the whole set) for all such decision making. If a lot of such decisions are made with Rfree...then yes it would be good to have an Rsleep to make sure that everything is ok.

All the best,
Tom T

________________________________
From: Kendall Nettles [[email protected]]
Sent: Thursday, May 03, 2012 9:05 AM
To: Terwilliger, Thomas C; PHENIX user mailing list
Subject: Fwd: [phenixbb] Geometry Restraints - Anisotropic truncation

Hi Tom,

Do you think something like this could be used during refinement to identify the "best" resolution limits? If you have an Rsleep set would Rfree be sufficient for this? I imagine collecting data with a ring of noise and then let the optimal resolution be determined during refinement. My understanding of this is that the modern refinement algorithms can handle some noise in the reflections, but maybe this could be a way to optimize how much signal is needed to contribute in a positive fashion?

Kendall
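The idea of judging each resolution choice against one fixed free set can be sketched in a few lines. This is only an illustration, not anything taken from Phenix: the `Reflection` record, the `r_free` function and the simple `sum(Fo)/sum(Fc)` scale factor are assumptions made for the example. The point it shows is that the free reflections and the resolution range used for the comparison stay the same, whatever data went into the refinement being judged.

```python
from dataclasses import dataclass


@dataclass
class Reflection:
    """Hypothetical per-reflection record used only for this sketch."""
    d_spacing: float  # resolution of the reflection, in Angstrom
    f_obs: float      # observed amplitude
    f_calc: float     # amplitude calculated from the refined model
    is_free: bool     # True if the reflection is in the fixed free set


def r_free(reflections, d_min):
    """R over free reflections with d_spacing >= d_min.

    Uses a single overall scale k = sum(Fo) / sum(Fc), which is a
    simplification compared with what refinement programs do.
    """
    free = [r for r in reflections if r.is_free and r.d_spacing >= d_min]
    sum_fo = sum(r.f_obs for r in free)
    sum_fc = sum(r.f_calc for r in free)
    k = sum_fo / sum_fc
    return sum(abs(r.f_obs - k * r.f_calc) for r in free) / sum_fo


if __name__ == "__main__":
    # Made-up numbers, only to show the call.
    refinement_output = [
        Reflection(2.00, 100.0, 90.0, True),
        Reflection(1.50, 50.0, 60.0, True),
        Reflection(1.20, 10.0, 30.0, True),
        Reflection(2.00, 80.0, 10.0, False),
    ]
    print(r_free(refinement_output, d_min=1.25))
```

To compare two refinements that used different resolution limits, one would call `r_free` on each output with the same `d_min` and the same free flags, so that the two numbers are computed over identical reflections.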
The fact that the R value stats get better when you toss out data is NOT an indication that those data contain no signal. It simply indicates that that subset has a lower signal/noise than the remaining data. If you decide to throw away all data with less than average signal to noise you will get better and better R values until you have no data left at all!

Tests along the lines of what Tom has recommended are in the right direction, but they have already been done. I have unpublished work where I took a project with a 1.25 Å data set, as judged by I/sigI > 2 and near 100% completeness, and tested the addition of higher resolution data out to 1.1 Å with very poor stats on both counts. I found that the Rfree calculated only to 1.25 Å improved by adding the noisy data, and the esd's (I was using shelxl) dropped, indicating that the model was more precise. I performed the appropriate control to show that you couldn't just add any numbers out there; you had to use the measured numbers to get the improvements.

At the CCP4 meeting in January Kay Diederichs reported on work he has done with P. A. Karplus which was much more rigorous. They show that a lot of data beyond the usual cut-off limits is useful for improving the final model by several measures, and they have developed a tool for determining, on an objective level, at what resolution there is no longer signal. That resolution limit was found to be much higher than what we are used to, and our final R values will be higher as a consequence. But the models that result are better when assessed by properly controlled tests. This work will be in print shortly.

An important point is that the Fc's must never be used to judge the quality of the Fo's in a production environment. At the very least you have to recognize that you don't have reliable Fc's at the start of refinement and yet you need to decide what data to use. If all you are doing is changing your resolution limit after refinement to "clean up your stats" you are wasting your time. That sort of thing has nothing to do with building better models. The Diederichs and Karplus test looks directly at the F^2s in the unmerged data to see what signal is there.

None of this says anything about the merits of spherical versus elliptical cutoff surfaces. These tests only discuss the radius of whatever surface you choose. It seems to me that if the signal/noise ratio drops off faster in some directions than others, the point where there is no signal will differ too. Whatever those elliptical cutoff limits are, they should be much more generous than current practice and not determined by looking at R values.

Dale Tronrud
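Dale's first point, that R values keep improving as low signal/noise data are discarded even though those data carry signal, can be seen in a toy calculation. The sketch below is not from the thread. The simulated amplitudes, the error ranges, the 5% model error and the function names are all assumptions chosen for illustration. Every simulated observation is centered on its true value, so every reflection contains signal, and the model (the Fc's) is never changed.

```python
import random


def r_value(pairs):
    """Conventional R = sum(|Fo - Fc|) / sum(Fo) over (Fo, Fc) pairs."""
    return sum(abs(fo - fc) for fo, fc in pairs) / sum(fo for fo, _ in pairs)


def simulate_reflections(n=20000, seed=0):
    """Return (f_obs, sigma, f_calc) tuples for a made-up data set."""
    rng = random.Random(seed)
    reflections = []
    for _ in range(n):
        f_true = rng.uniform(10.0, 100.0)        # "true" amplitude
        sigma = rng.uniform(1.0, 20.0)           # measurement error
        f_obs = abs(rng.gauss(f_true, sigma))    # noisy but unbiased measurement
        f_calc = f_true * rng.gauss(1.0, 0.05)   # fixed model with ~5% error
        reflections.append((f_obs, sigma, f_calc))
    return reflections


def r_after_cutoff(reflections, min_snr):
    """Keep reflections with Fo/sigma >= min_snr; return (count, R)."""
    kept = [(fo, fc) for fo, sigma, fc in reflections if fo / sigma >= min_snr]
    return len(kept), r_value(kept)


if __name__ == "__main__":
    data = simulate_reflections()
    for cutoff in (0.0, 2.0, 5.0, 10.0):
        n_kept, r = r_after_cutoff(data, cutoff)
        print(f"Fo/sigma >= {cutoff:4.1f}: {n_kept:6d} reflections, R = {r:.3f}")
```

In this toy setup R falls with each stricter cutoff while the model stays exactly the same, which is why a lower R after truncation says nothing about whether the discarded reflections were useful.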
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb
On Thu, May 3, 2012 at 10:05 AM, Dale Tronrud wrote:
An important point is that the Fc's must never be used to judge the quality of the Fo's in a production environment.
I'm not sure that's completely fair - they can certainly be used to identify Fobs values that are wildly at odds with expectations. This is what Pavel does with outlier rejection in phenix.refine (I forget the exact reference for this protocol but I think it's Read 1999 or something like that). The difference, I think, is one of degree, and also that (at least in the current version) the outliers are recalculated at each cycle of refinement and never permanently excluded, unlike anisotropic truncation. -Nat
Yes, "never" is too big a word for this, but writing "nearly never" simply begs for explanation. The rejection test that Pavel and Randy perform is extremely conservative and individualistic. In addition, the last time I looked in a Phaser log file the excluded reflections were individually listed and, as you said, the exile is not permanent - there is a parole hearing every round of the calculation. This is very different from the mass rejections being discussed here.

Dale
Hi, I'm having the following problem. I've generated my Rfree set (1%) through Phenix with the option to convert flags to the CCP4 convention. My working set has flags as high as 100, and Buster does not want to accept that (limits are 0-99). How can I resolve this issue? Thanks, Karolina
Hi Karolina,

Personally, I would stick to the default Phenix (and CNS) behavior, 1 for test and 0 for work, and use that consistently. In the programs I know, there is a way to tell which reflections are work and which are free, so this is not a problem. If some programs don't like this, I would contact the respective authors of those programs for further assistance.

Pavel
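If switching back to the 0/1 convention is not an option, one possible workaround is to fold the out-of-range working-set flags back into the accepted range without touching the free set. The sketch below is only an illustration under loud assumptions: the function `fold_work_flags` is hypothetical and is not part of Phenix, CCP4 or Buster; it assumes the free reflections carry flag 0 (the usual CCP4-style convention), that flags are plain integers, and that the downstream program only cares about the split between flag 0 and everything else. Renumbering the working bins would matter if anything relies on the individual bin values, so check the relevant program documentation first.

```python
def fold_work_flags(flags, max_flag=99):
    """Return flags limited to 0..max_flag, keeping flag 0 (free) untouched.

    Assumes flag 0 marks the free set. Working-set flags are wrapped
    into 1..max_flag, so a working reflection can never become free.
    """
    folded = []
    for flag in flags:
        if flag == 0:
            folded.append(0)                            # free set stays free
        else:
            folded.append(1 + (flag - 1) % max_flag)    # work flags -> 1..max_flag
    return folded


if __name__ == "__main__":
    # Made-up flag column: 0 is free, everything else is working set.
    print(fold_work_flags([0, 1, 57, 99, 100, 0]))
```

Flags already inside 1..99 are left as they are; only values above the limit are renumbered, and the number of free reflections is unchanged.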
Dale, I completely agree that Rfree does not always correlate with the best models. Or maybe I should say that to see a correlation with other measures of model quality, I think you need differences in Rfree on the order of a couple of percent at least. My question was whether Rfree changes differently when you throw out reflections with different signal to noise ratios, and whether this difference might be a useful guide to selecting resolution. Are you saying that Rfree is completely irrelevant to selecting resolution, or that some combination of Rfree and other model quality measures should be considered together? I'm also confused about this:
An important point is that the Fc's must never be used to judge the quality of the Fo's in a production environment.
Does this mean that you shouldn't use maps to judge the resolution limits? I also don't understand how you can use the model as the judge but not the Fc. If there are some objective qualities of the model that can be used to determine resolution cut-offs, then it seems to me that it still might be possible to incorporate an automated procedure during the refinement. Do you envision that the optimal resolution limit might change over the course of refinement, as the phases improve?

Kendall
On 05/03/12 10:35, Kendall Nettles wrote:
Dale, I completely agree that Rfree does not always correlate with the best models. Or maybe I should say that to see a correlation with other measures of model quality, I think you need differences in Rfree on the order of a couple of percent at least.
My question was whether Rfree changes differently when you throw out reflections with different signal to noise ratios, and whether this difference might be a useful guide to selecting resolution. Are you saying that Rfree is completely irrelevant to selecting resolution, or that some combination of Rfree and other model quality measures should be considered together?
There are philosophical issues here as well as practical ones. On the practical side, you have to choose a resolution limit for your data long before you have a model good enough to start reliable free R calculations or "model quality" measures. To my mind these sorts of things can only be applied very late in the game, when you have already pretty much defined your model. Changing the resolution limit then is only useful to make the stats "look better". If you can "improve" your R's without changing the model you have not accomplished anything, since what people look for in the R values is an assessment of the quality of the model. I believe (back to philosophy) that you have to define your data set and then build a model consistent with it. Any large-scale modification of the data set after the fact is problematic.
I'm also confused about this:
An important point is that the Fc's must never be used to judge the quality of the Fo's in a production environment.
Does this mean that you shouldn't use maps to judge the resolution limits? I also don't understand how you can use the model as the judge but not the Fc. If there are some objective qualities of the model that can be used to determine resolution cut-offs, then it seems to me that it still might be possible to incorporate an automated procedure during the refinement. Do you envision that the optimal resolution limit might change over the course of refinement, as the phases improve?
Models, Fc's, maps, they are all the same to me. Each flows from the other. You can't criticize your data based on a map any more than the model.

I didn't say, however, that the map has to be calculated using all the data you used in refinement. The map is only a tool for presenting information to the human eye (in Coot rebuilding at least). We make all sorts of decisions when calculating maps, from sampling rates and contour levels to different sets of Fourier coefficients each with special properties. Certainly resolution limit can be one of these choices. But these choices are not permanent and they do not affect the subsequent refinement or assessment of the final model.

An example is the sharpening of low resolution maps. This technique can make the map more interpretable to the model builder, but the original structure factors used in refinement should never be "sharpened". Refinement will do just fine with a blurry data set and produce a model with suitably high B factors. To sharpen the Fobs's is deceptive, perhaps inadvertently, but deceptive nonetheless.

No, I don't think the resolution limit might change over the course of refinement. The data set contained signal before we started refinement and the amount of signal is exactly the same afterwards, presuming you left it alone. In the distant past one would start refinement using only low resolution data and increase the resolution as the size of the required shifts got smaller. It has been a long time since the refinement programs we use had a radius of convergence that poor. The maximum likelihood procedures used today go a long way toward properly weighting the (Fobs-Fcalc)'s to allow convergence without such crude manipulation of the data.

Dale
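Since the amount of signal is a property of the measurements and not of the model, it can be examined without any Fc's at all. Earlier in the thread the Diederichs and Karplus test is described only as looking directly at the F^2s in the unmerged data. The sketch below shows one simple model-free check in that spirit, a correlation between random half-datasets. Whether this matches what their tool computes is an assumption, and `make_shell`, `half_dataset_cc` and all the simulated numbers are made up for illustration.

```python
import random
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def half_dataset_cc(observations, seed=0):
    """Split each reflection's repeated measurements into two random
    halves, average each half, and correlate the two sets of averages."""
    rng = random.Random(seed)
    half1, half2 = [], []
    for values in observations.values():
        if len(values) < 2:
            continue                      # need at least one value per half
        shuffled = list(values)
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        half1.append(mean(shuffled[:mid]))
        half2.append(mean(shuffled[mid:]))
    return pearson(half1, half2)


def make_shell(n_refl, mean_intensity, noise_sigma, n_obs=4, seed=1):
    """Simulated resolution shell: reflection index -> list of intensities."""
    rng = random.Random(seed)
    shell = {}
    for i in range(n_refl):
        true_i = rng.expovariate(1.0 / mean_intensity) if mean_intensity > 0 else 0.0
        shell[i] = [rng.gauss(true_i, noise_sigma) for _ in range(n_obs)]
    return shell


if __name__ == "__main__":
    for label, mean_i, sigma in (("strong", 100.0, 10.0),
                                 ("weak", 100.0, 200.0),
                                 ("noise only", 0.0, 200.0)):
        cc = half_dataset_cc(make_shell(5000, mean_i, sigma))
        print(f"{label:10s} shell: half-dataset CC = {cc:.2f}")
```

In this simulation the weak shell, whose individual measurements are dominated by noise, still gives a clearly positive correlation, while the noise-only shell gives a value near zero. None of this depends on a model, which fits the point that the signal in the data does not change during refinement.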
participants (6)
- Dale Tronrud
- Karolina Michalska
- Kendall Nettles
- Nathaniel Echols
- Pavel Afonine
- Terwilliger, Thomas C