Dear Phil,

I don't have access to vol. 277 of Met Enz here, which is sad since I wrote a chapter in the thing, so I can't comment on Axel's exact words. I agree that the uncertainty with which a free R estimated from a subset of reflections predicts the mean free R over all possible test sets of that size is proportional to 1/N.

My point is that the pdf of the difference of two random variables is not the same as the pdf of either of them. You have to know the coupling between them to decide whether the difference has greater or less precision than the individual variables.

For example, let's assume two measurements, each with a precision of 1%, that differ by 0.5%. This small difference can be significant if the causes of the uncertainties are systematic, because the "errors" would cancel out.

I believe the fluctuations in the free R estimate due to the small size of the test set come from the particular indices in the test set and are therefore mainly systematic. The intensities of some reflections may have been measured better than others, and your test set may happen to be enriched with those, resulting in a slightly lower free R estimate. Since the test set is always the same during refinement, the benefit of this enrichment (or detriment, if you were unlucky) will be subtracted out when you compare the two free R estimates during your optimization.

I expect the difference between two free R estimates (using the same test set) will be a more precise indicator of the true change in free R than either estimate is in predicting the free R itself.

Dale

Phil Jeffrey wrote:
Although I am totally unclear why the form of the equation, which is just counting/Poisson statistics, doesn't also apply to the uncertainty of calculating the average percentage deviation over a test set of size N. For example a test set of N=1000 could be regarded as 100 different test sets of size N=10, and I think it's likely that the distribution of R-free for these 100 mini test sets would be Poisson in form centered around the R-free for the superset N=1000.
So, since the change in |Fo-Fc| for each reflection doesn't simply scale with wxc in structure refinement nor does it inevitably decrease for every reflection on structure improvement, the change in R-free for any given change in structure should be related to N in the form of the equation given, so does reflect the s.d. of R-free as an estimate for structure improvement.
No ?
If not, then I misinterpreted what Brunger meant on p.394 of his 1997 Met Enz paper, because that's certainly what it read like.
Phil
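The thought experiment above (a test set of N=1000 regarded as 100 mini test sets of N=10) can be tried on simulated data. What follows is only a minimal sketch: the amplitudes, the Gaussian relative error model, and the function names r_factor, simulate_amplitudes and mini_set_r_values are all assumptions made for illustration, not anything taken from a refinement program. It makes no claim about the exact form of the distribution; it only shows the mini-set R values scattering around the R of the full set, with less scatter for larger mini sets.

```python
import random
import statistics


def r_factor(f_obs, f_calc):
    """R = sum|Fo - Fc| / sum(Fo) over one set of reflections."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def simulate_amplitudes(n, rel_error=0.3, seed=0):
    """Hypothetical data: Fo uniform in [10, 100]; Fc is Fo with a
    Gaussian relative error (an assumed, purely illustrative model)."""
    rng = random.Random(seed)
    f_obs = [rng.uniform(10.0, 100.0) for _ in range(n)]
    f_calc = [o * (1.0 + rng.gauss(0.0, rel_error)) for o in f_obs]
    return f_obs, f_calc


def mini_set_r_values(f_obs, f_calc, size):
    """R for consecutive, non-overlapping chunks of `size` reflections."""
    return [
        r_factor(f_obs[i:i + size], f_calc[i:i + size])
        for i in range(0, len(f_obs) - size + 1, size)
    ]


f_obs, f_calc = simulate_amplitudes(1000)
r_full = r_factor(f_obs, f_calc)
r_mini_10 = mini_set_r_values(f_obs, f_calc, 10)    # 100 sets of 10
r_mini_100 = mini_set_r_values(f_obs, f_calc, 100)  # 10 sets of 100

print(f"R over all 1000 reflections: {r_full:.4f}")
print(f"mean of 100 mini-set R values: {statistics.mean(r_mini_10):.4f}")
print(f"s.d. for mini sets of 10:  {statistics.stdev(r_mini_10):.4f}")
print(f"s.d. for mini sets of 100: {statistics.stdev(r_mini_100):.4f}")
```

Changing `size` in the last calls gives a quick feel for how the scatter of R depends on the number of reflections in the set under this assumed error model.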
[email protected] wrote:
That uncertainty is not quite the same thing. What you describe is the uncertainty in the free R due to the small sampling size of the test set. It is the spread of free R's obtained when different test sets are chosen for the same project and circumstances, and is useful when comparing free R's calculated using different test sets, or, heaven forbid, trying to compare free R's from different crystals.
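The distinction drawn in this thread, between the spread of free R over different test sets and the precision of a difference between two free R values computed on the same test set, follows from var(A - B) = var(A) + var(B) - 2 cov(A, B): when the two estimates share most of their error, the covariance term removes it. Below is a rough numerical sketch of that idea. Everything in it is hypothetical: the pool of reflections, the two "models" A and B (B's residuals are assumed to be 95% of A's plus a little independent noise), and the helper names.

```python
import random
import statistics


def r_factor(f_obs, f_calc):
    """R = sum|Fo - Fc| / sum(Fo) over one set of reflections."""
    return sum(abs(o - c) for o, c in zip(f_obs, f_calc)) / sum(f_obs)


def simulate_pool(n_reflections=20000, seed=1):
    """Hypothetical pool: model B shares most of model A's
    reflection-specific error (assumed coupling), but is slightly better."""
    rng = random.Random(seed)
    f_obs, f_a, f_b = [], [], []
    for _ in range(n_reflections):
        fo = rng.uniform(10.0, 100.0)
        err = rng.gauss(0.0, 0.3)  # error tied to this reflection
        f_obs.append(fo)
        f_a.append(fo * (1.0 + err))
        f_b.append(fo * (1.0 + 0.95 * err + rng.gauss(0.0, 0.02)))
    return f_obs, f_a, f_b


def spread_over_test_sets(f_obs, f_a, f_b, test_size=1000, n_sets=200, seed=2):
    """Draw many random test sets; return free-R values for A, for B,
    and their per-test-set differences (same test set for both models)."""
    rng = random.Random(seed)
    r_a, r_b, diff = [], [], []
    for _ in range(n_sets):
        pick = rng.sample(range(len(f_obs)), test_size)
        fo = [f_obs[i] for i in pick]
        ra = r_factor(fo, [f_a[i] for i in pick])
        rb = r_factor(fo, [f_b[i] for i in pick])
        r_a.append(ra)
        r_b.append(rb)
        diff.append(ra - rb)
    return r_a, r_b, diff


f_obs, f_a, f_b = simulate_pool()
r_a, r_b, diff = spread_over_test_sets(f_obs, f_a, f_b)
sd_a = statistics.stdev(r_a)
sd_b = statistics.stdev(r_b)
sd_diff = statistics.stdev(diff)
mean_diff = statistics.mean(diff)

print(f"s.d. of free R (model A) over test sets: {sd_a:.5f}")
print(f"s.d. of free R (model B) over test sets: {sd_b:.5f}")
print(f"s.d. of the difference on a shared set:  {sd_diff:.5f}")
print(f"mean difference R_A - R_B:               {mean_diff:.5f}")
```

How much the difference gains in precision depends entirely on the assumed coupling between the two sets of residuals. If B's errors were drawn independently of A's, the difference would be less precise than either value, not more.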