Re: [phenixbb] Adequate size for Free R test set?
Pavel,

Thanks so much for the suggestions. They are really helpful! A few questions relative to Phenix. Is there a way to check the thin resolution shells with Phenix? I created Free R flags in Phenix using the thin resolution shells option, but removed the 2000 limit and instead used 5%. Maybe I overdid it relative to your reply. Just wondering how to check each of these thin shells. Alternatively, does Phenix ensure one has an adequate number of test reflections in each "thin shell" if this option is set? If so, what should I use for the maximum number of reflections and/or %?

Thanks again!
Joe
___________________________________________________________
Joseph P. Noel, Ph.D.
Investigator, Howard Hughes Medical Institute
Professor, The Jack H. Skirball Center for Chemical Biology and Proteomics
The Salk Institute for Biological Studies
10010 North Torrey Pines Road
La Jolla, CA 92037 USA
Phone: (858) 453-4100 extension 1442
Cell: (858) 349-4700
Fax: (858) 597-0855
E-mail: [email protected]
Web Site (Salk): http://www.salk.edu/faculty/faculty_details.php?id=37
Web Site (HHMI): http://hhmi.org/research/investigators/noel.html
___________________________________________________________

On Aug 3, 2010, at 11:02 AM, [email protected] wrote:
Message: 1
Date: Tue, 03 Aug 2010 10:38:57 -0700
From: Pavel Afonine
To: [email protected]
Cc: Joseph Noel
Subject: Re: [phenixbb] Adequate size for Free R test set?
Message-ID: <[email protected]>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Hi Joe,
I think almost everyone has his/her own opinion on this... Here is what I think:
1) The test set should be such that each "relatively thin resolution shell" receives at least 50 reflections, and we empirically found that 150 is "good enough" within the phenix.refine framework. For the definition of a "relatively thin resolution shell" see: Lunin & Skovoroda. Acta Cryst. (1995). A51, 880-887. "R-free likelihood-based estimates of errors for phases calculated from atomic models".
This basically defines how many test reflections you need.
2) It is customary to set aside either 5 or 10% of the reflections for the test set, with a total maximum of 2000. These are all "magic numbers" that, I presume, more or less satisfy "1)", so they became widely used.
3) Presence of high-order NCS and selecting free-flags using the "thin shells" algorithm is a different story (Acta Cryst. (2006). D62, 227-238). It is good to do that because it removes the cross-talk between test and work reflections due to NCS, but at the same time it invalidates the requirement "1)". So, this is a gray area (for me at least).
4) Some people believe that the final refinement run should be done using all reflections, arguing that taking away 5-10% of the reflections for the test set worsens the maps. There is some truth in this: removing data does worsen the maps, but:
a) it is noticeable (in the sense that it can reduce the interpretability of some parts of the map) only in extreme cases of somewhat low resolution or low completeness data,
b) in most other cases it is simply negligible,
c) removing reflections randomly has a much smaller effect than removing them systematically (see page #40 here: http://www.phenix-online.org/presentations/latest/pavel_maps.pdf and some relevant references in the 2010 PHENIX paper in Acta D).
However, if you do that "final run", you will invalidate the final refinement statistics, Rfree and Rwork, and the final structure thus obtained can no longer have an Rfree associated with it.
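As a rough illustration of points 1) and 2), the arithmetic can be sketched in Python. This is only a sketch under stated assumptions: the function names and the shell counts tried are hypothetical, and it assumes shells of equal population, which simplifies the "relatively thin resolution shell" definition in the paper cited in 1). The 150,000 figure is the dataset size from the original question.

```python
# Thresholds quoted in point 1): at least 50 test reflections per thin
# shell, with 150 found empirically to be "good enough".
MIN_PER_SHELL = 50
GOOD_PER_SHELL = 150


def free_set_size(n_unique, fraction=0.05, max_free=2000):
    """Test-set size under the 'fraction or cap, whichever is smaller' rule.

    Pass max_free=None to remove the absolute cap.
    """
    n_free = int(round(n_unique * fraction))
    if max_free is not None:
        n_free = min(n_free, max_free)
    return n_free


def free_per_shell(n_free, n_shells):
    """Average test reflections per shell (ASSUMES equal-population shells)."""
    return n_free / n_shells


def verdict(per_shell):
    """Classify a per-shell count against the two thresholds from point 1)."""
    if per_shell >= GOOD_PER_SHELL:
        return "good enough"
    if per_shell >= MIN_PER_SHELL:
        return "above minimum"
    return "too few"


if __name__ == "__main__":
    n_unique = 150_000  # dataset size from the original question
    for cap in (2000, None):  # with and without the absolute limit
        n_free = free_set_size(n_unique, max_free=cap)
        for n_shells in (10, 20, 50):  # hypothetical shell counts
            per_shell = free_per_shell(n_free, n_shells)
            print(cap, n_free, n_shells, round(per_shell), verdict(per_shell))
```

With the 2000 cap, 20 equal shells would hold 100 test reflections each, above the minimum of 50 but below 150. Removing the cap gives 7500 test reflections, or 375 per shell.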
Pavel.
On 8/3/10 10:04 AM, Joseph Noel wrote:
Hi Folks,
It's been a while since I personally refined many structures. In the past, I used 5% of my unique reflections for the Free R test set as a default. I have a high resolution structure with 150,000 unique reflections and noticed that the Phenix defaults are 5% or 2000 reflections, whichever is smaller. What is the current consensus on an adequate number of unique reflections to use for cross-validation?
Thanks! Joe
P.S. I really, really love Phenix.
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb
On Tue, Aug 3, 2010 at 11:21 AM, Joseph Noel
Thanks so much for the suggestions. They are really helpful! A few questions relative to Phenix. Is there a way to check the thin resolution shells with Phenix? I created Free R flags in Phenix using the thin resolution shells option, but removed the 2000 limit and instead used 5%. Maybe I overdid it relative to your reply. Just wondering how to check each of these thin shells. Alternatively, does Phenix ensure one has an adequate number of test reflections in each "thin shell" if this option is set? If so, what should I use for the maximum number of reflections and/or %?
There isn't a good way to check the shells in the GUI - but all of the pieces are there, and I've been meaning to add some utilities for examining reflection files anyway. For now, on the command line, run this:

  iotbx.r_free_flags_accumulation data.mtz

and it will print out something like this:

  Number of work/free reflections by resolution:
                                            work   free  %free
    bin  1: 135.8135 - 4.3073 [7861/7861]   7663    198   2.5%
    bin  2:   4.3073 - 3.4187 [7682/7682]   7484    198   2.6%
    bin  3:   3.4187 - 2.9866 [7645/7645]   7444    201   2.6%
    bin  4:   2.9866 - 2.7135 [7605/7605]   7406    199   2.6%
    bin  5:   2.7135 - 2.5190 [7593/7593]   7395    198   2.6%
    bin  6:   2.5190 - 2.3704 [7618/7618]   7416    202   2.7%
    bin  7:   2.3704 - 2.2517 [7535/7535]   7337    198   2.6%
    bin  8:   2.2517 - 2.1537 [7573/7573]   7374    199   2.6%
    bin  9:   2.1537 - 2.0708 [7556/7556]   7354    202   2.7%
    bin 10:   2.0708 - 1.9993 [7547/7547]   7348    199   2.6%
    overall                                74221   1994   2.6%

This is a real example, using the default settings in the reflection file editor (20 thin shells for the test set). It looks much better than what I had originally done with this dataset five years ago using a different program, where the last shell had only 0.2% of reflections flagged. However, the code that does this isn't particularly sophisticated, so I would recommend double-checking the output for your data. I don't think you're going to do any harm by sticking with 5% and removing the absolute limit.

-Nat
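The double-check recommended above can be scripted. The following is a minimal sketch and not part of Phenix: it assumes the report uses the per-bin line layout shown in the example (the real output for other data or versions may differ), and the names `parse_bins` and `sparse_bins` are hypothetical. The default threshold of 50 is the per-shell minimum mentioned earlier in the thread.

```python
import re

# One line of the per-bin report, e.g.
# "bin  1: 135.8135 - 4.3073 [7861/7861] 7663 198 2.5%"
# ASSUMPTION: this layout is taken from the example output in the message.
BIN_LINE = re.compile(
    r"bin\s+(\d+):\s+([\d.]+)\s+-\s+([\d.]+)\s+"
    r"\[(\d+)/(\d+)\]\s+(\d+)\s+(\d+)\s+([\d.]+)%"
)

# Example report copied from the message above.
REPORT = """
Number of work/free reflections by resolution:
  bin  1: 135.8135 - 4.3073 [7861/7861] 7663 198 2.5%
  bin  2: 4.3073 - 3.4187 [7682/7682] 7484 198 2.6%
  bin  3: 3.4187 - 2.9866 [7645/7645] 7444 201 2.6%
  bin  4: 2.9866 - 2.7135 [7605/7605] 7406 199 2.6%
  bin  5: 2.7135 - 2.5190 [7593/7593] 7395 198 2.6%
  bin  6: 2.5190 - 2.3704 [7618/7618] 7416 202 2.7%
  bin  7: 2.3704 - 2.2517 [7535/7535] 7337 198 2.6%
  bin  8: 2.2517 - 2.1537 [7573/7573] 7374 199 2.6%
  bin  9: 2.1537 - 2.0708 [7556/7556] 7354 202 2.7%
  bin 10: 2.0708 - 1.9993 [7547/7547] 7348 199 2.6%
"""


def parse_bins(report):
    """Return one dict per resolution bin found in the report text."""
    bins = []
    for m in BIN_LINE.finditer(report):
        bins.append({
            "bin": int(m.group(1)),
            "d_max": float(m.group(2)),  # low-resolution limit of the bin
            "d_min": float(m.group(3)),  # high-resolution limit of the bin
            "work": int(m.group(6)),
            "free": int(m.group(7)),
        })
    return bins


def sparse_bins(bins, min_free=50):
    """Bins whose free-reflection count falls below min_free."""
    return [b for b in bins if b["free"] < min_free]


if __name__ == "__main__":
    bins = parse_bins(REPORT)
    print("bins parsed:", len(bins))
    print("total free:", sum(b["free"] for b in bins))
    for b in sparse_bins(bins):
        print("too few test reflections in bin", b["bin"], "->", b["free"])
```

To use it on real data, the report text would be captured from the command's output (for example by redirecting it to a file) and passed to `parse_bins` in place of `REPORT`.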
participants (2): Joseph Noel, Nathaniel Echols