Hi Mark,

all true statements, in general. These tools are not meant to label an outlier as 'wrong'. Instead, they are meant to alert a user to something unusual, prompt a closer look, and eventually lead to an explanation of the oddity (as a result of that closer look). Very much like in your example, if Polygon shows an outlier and you bring good arguments to explain it (such as a peculiarity of the data: anisotropy, I/sigma, Rmerge, etc.) then that's great and you are good to go.

The most common use case for Polygon is when someone uses a suboptimal refinement strategy, gets hugely unlikely refinement statistics (such as R=25% at 1A resolution), and that goes unnoticed and ends up in the database. One of my favorite examples is 1eic (1.4A, Rw=20%, Rf=25%). Polygon instantly tells you this is highly unusual. Applying a proper refinement protocol, I can trivially get Rw and Rf down to 14 and 17% (otherwise, I would not know that I could potentially do this!).

Resolution is used as the guide simply because it is easy for most users to grasp. Clearly, something like effective resolution (that accounts for data completeness, for example) may potentially be better. But if I say "2A resolution" most people will instantly know what I mean, while if I say "effective resolution is 2A" I will have to explain what I mean (and I'm sure not all will be patient enough to listen!).

All in all, I'd say Polygon is based on a collection of compromises and shortcuts to get something useful and easy to grasp quickly.

All the best,
Pavel

On 4/17/18 12:16, Mark A. White wrote:
Pavel,
I have an issue with the general use of these metrics as an "IQ score" for protein structures. They completely ignore the details of the experimental data and use one value, the maximum resolution, to set the bar. There are at least two reasons that this can be a poor choice. (1) Highly anisotropic data may go to 2.8A along one cell axis, but only to 3.4A along the other two. (2) The parameters used to cut the data. Previously an I/sigma~3 or an Rmerge~30% was considered the limit of usable data. Today many data sets use CC1/2>=0.5 as a cutoff, which will include significantly more high-resolution data and push the "Resolution" to a higher value. In both cases we are now comparing data sets with data out to I/sigma ~1 against older data sets with a cutoff I/sigma of ~3-5. These are not meaningful comparisons. If the software were to define a comparative resolution based on I/sigma and completeness, then these comparisons would be more meaningful.
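As a rough illustration of what a completeness-based "comparative resolution" could look like, here is a minimal sketch. It assumes one commonly used definition, d_eff = d_min * C^(-1/3), where C is the fractional completeness of the data to d_min. This is an assumption made for illustration only; it is not what Polygon does, it does not account for I/sigma, and the function name and input values are hypothetical.

```python
def effective_resolution(d_min, completeness):
    """Completeness-corrected resolution, d_eff = d_min * C**(-1/3).

    ASSUMPTION: this is one commonly used definition, chosen here only
    for illustration. d_min is the nominal high-resolution limit in
    Angstrom; completeness is the fraction (0 < C <= 1) of the unique
    reflections to d_min that were actually measured.
    """
    if not 0.0 < completeness <= 1.0:
        raise ValueError("completeness must be in the interval (0, 1]")
    return d_min * completeness ** (-1.0 / 3.0)


# Hypothetical values: data nominally cut at 2.8 A, but only 60% complete
# to that limit (for example because of strong anisotropy).
print(f"{effective_resolution(2.8, 0.60):.2f} A")
```

With complete data the function returns d_min unchanged; the lower the completeness, the coarser the effective resolution that would be used for comparison.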
If you want to reexamine the use of a single 'factor' in evaluating anything, I can highly recommend Stephen Jay Gould's The Mismeasure of Man. We need to examine the assumptions that are made in the creation of these metrics.
-- Yours sincerely,
Mark A. White, Ph.D. Associate Professor of Biochemistry and Molecular Biology, Manager, Sealy Center for Structural Biology and Molecular Biophysics Macromolecular X-ray Laboratory, Basic Science Building, Room 6.658A University of Texas Medical Branch Galveston, TX 77555-0647 mailto://[email protected] http://xray.utmb.edu
QQ: "I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail." - Abraham Maslow (1966)
-----Original Message-----
*From*: Pavel Afonine
*To*: Tanner, John J., [email protected]
*Subject*: Re: [phenixbb] R-factor expectations when translational pseudo symmetry is present
*Date*: Fri, 13 Apr 2018 11:11:59 -0700

Hi Jack,
The Polygon tool is designed to answer questions like "what Rwork, Rfree and Rfree-Rwork should I expect at this resolution?". If you are focusing on R-factors only, then you can get a quick idea using a command-line tool:
phenix.r_factor_statistics 2.25
Histogram of Rwork for models in PDB at resolution 2.15-2.35 A:
  0.123 - 0.144 : 36
  0.144 - 0.165 : 442
  0.165 - 0.187 : 1669
  0.187 - 0.208 : 2782
 *0.208 - 0.230 : 2023 <<< Your case*
 *0.230 - 0.251 : 812*
  0.251 - 0.273 : 165
  0.273 - 0.294 : 19
  0.294 - 0.316 : 5
  0.316 - 0.337 : 3

Histogram of Rfree for models in PDB at resolution 2.15-2.35 A:
  0.160 - 0.183 : 43
  0.183 - 0.207 : 405
  0.207 - 0.231 : 1485
  0.231 - 0.255 : 2759
 *0.255 - 0.278 : 2216 <<< Your case*
  0.278 - 0.302 : 861
  0.302 - 0.326 : 142
  0.326 - 0.350 : 36
  0.350 - 0.373 : 7
  0.373 - 0.397 : 2

Histogram of Rfree-Rwork for all model in PDB at resolution 2.15-2.35 A:
  0.001 - 0.011 : 55
  0.011 - 0.021 : 247
  0.021 - 0.031 : 782
  0.031 - 0.041 : 1597
 *0.041 - 0.050 : 2124 <<< Your case*
  0.050 - 0.060 : 1716
  0.060 - 0.070 : 912
  0.070 - 0.080 : 316
  0.080 - 0.090 : 131
  0.090 - 0.100 : 76

Number of structures considered: 7956
So it looks like the R-factors you have are what one would expect at this resolution.
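To put a number on "what one would expect", the histogram counts can be turned into a rough rank. The sketch below is only an illustration: the Rfree counts are copied from the output quoted above, Rfree=0.276 is the value from the original question, and the helper name `locate` is hypothetical (it is not part of Phenix).

```python
# Rfree histogram copied from the phenix.r_factor_statistics output quoted
# above (PDB models at 2.15-2.35 A): (bin_low, bin_high, count)
RFREE_BINS = [
    (0.160, 0.183, 43),
    (0.183, 0.207, 405),
    (0.207, 0.231, 1485),
    (0.231, 0.255, 2759),
    (0.255, 0.278, 2216),
    (0.278, 0.302, 861),
    (0.302, 0.326, 142),
    (0.326, 0.350, 36),
    (0.350, 0.373, 7),
    (0.373, 0.397, 2),
]


def locate(value, bins):
    """Return (bin index, fraction of entries in lower bins, fraction in this bin)."""
    total = sum(count for _, _, count in bins)
    below = 0
    for index, (low, high, count) in enumerate(bins):
        if low <= value < high:
            return index, below / total, count / total
        below += count
    raise ValueError("value lies outside the histogram range")


index, frac_below, frac_in_bin = locate(0.276, RFREE_BINS)
low, high, _ = RFREE_BINS[index]
print(f"Rfree bin {low:.3f}-{high:.3f}: {frac_below:.1%} of entries fall in "
      f"lower bins, {frac_in_bin:.1%} fall in this bin")
```

For Rfree=0.276 this lands in the 0.255-0.278 bin, with about 59% of the 7956 entries in lower bins and about 28% in the same bin, so the value sits in the populated part of the distribution rather than in a tail.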
Pavel
On 4/12/18 18:38, Tanner, John J. wrote:
Dear PhenixBB,
We have a crystal form that xtriage flags as having strong translational pseudo symmetry (Patterson peak 57% the height of the origin peak, p-value = 3E-5).
The space group is P21212. We can solve the structure with MR and refine to R=0.233 and R-free =0.276 at 2.25 Angstrom resolution. The maps look very good, but do not suggest major additional modeling that could be done to improve the structure and lower the R-factors. I know that one expects the R-factors from refinement to be higher when TPS is present, but my question is how high is too high? Has anyone done a study that shows the expectations for R-factors when TPS is present?
Thanks,
Jack
John J. Tanner Interim Chair, Department of Biochemistry Professor of Biochemistry and Chemistry Department of Biochemistry University of Missouri-Columbia 117 Schweitzer Hall 503 S College Avenue Columbia, MO 65211 Phone: 573-884-1280 Fax: 573-882-5635 Email: [email protected] http://faculty.missouri.edu/~tannerjj/tannergroup/tanner.html
Lab: Schlundt Annex rooms 3,6,9, 203B, 203C Office: Schlundt Annex 203A
_______________________________________________
phenixbb mailing list
[email protected]
http://phenix-online.org/mailman/listinfo/phenixbb
Unsubscribe: [email protected]