Request for a fix in CC1/2 calculation in phenix.merging_statistics
Dear CCTBX developers, I found a big difference in CC1/2 value (say, 77.3 vs 68.3) between phenix.merging_statistics and XSCALE when more than one *anomalous* dataset was merged in XSCALE. phenix.merging_statistics seems to assume that equivalent (in case of anomalous, here I do not consider Friedel pairs as equivalent) reflections are sorted (equivalent reflections are not separeted) in the input file. Usually, this assumption should be correct. However, if multiple anomalous datasets were merged in XSCALE, this assumption is not correct. Because Friedel pairs are not necessarily separated. This is a real example (in P43212). These lines are extracted from .HKL file by XSCALE:
1 -41 1 1.749E+04 1.657E+05 2052.0 2467.7 78.8 -113.23 1 41 -1 -1 4.569E+03 6.014E+03 454.1 1112.8 132.2 116.90 1 41 1 -1 8.721E+03 2.637E+03 473.0 1064.4 130.4 116.11 2 41 1 1 7.634E+03 2.417E+03 498.2 1010.8 135.3 115.65 2 -1 -41 -1 4.847E+03 1.997E+03 1892.9 2543.0 85.9 -122.78 2
The last number indicates the dataset the reflection belongs to. Here, (1 -41 1), (41 -1 -1), (41, 1, 1), and (-1, -41, -1) are equivalent reflections, but (41, 1, -1) in the middle is not when anomalous data! Therefore, in phenix.merging_statistics, subsets calculation for (41,1,1) is performed twice (for first two and last two). When I changed the cctbx code to sort reflections after mapping to a.s.u., CC1/2 values by phenix.merging_statistics became closer to XSCALE. In merging_stats.__init__() in iotbx/merging_statistics.py, I inserted a line
array = array.sort("packed_indices") after array = array.customized_copy(anomalous_flag=anomalous).map_to_asu()
but I think there would be more correct way to fix this unexpected behavior. Best regards, Keitaro
On Sun, Oct 27, 2013 at 6:36 AM, Keitaro Yamashita < [email protected]> wrote:
When I changed the cctbx code to sort reflections after mapping to a.s.u., CC1/2 values by phenix.merging_statistics became closer to XSCALE.
In merging_stats.__init__() in iotbx/merging_statistics.py, I inserted a line
array = array.sort("packed_indices") after array = array.customized_copy(anomalous_flag=anomalous).map_to_asu()
but I think there would be more correct way to fix this unexpected behavior.
If you can think of a better method, feel free to implement it, but I think the above is quite appropriate. Assuming this passes the regression test (iotbx/regression/tst_merging_statistics.py), it's fine to check in. The only drawback is that it's probably inefficient (especially for arrays that are already sorted), but the entire program isn't very efficient to begin with, and a more comprehensive solution would require a ground-up approach mostly coded in C++. This will definitely become necessary at some point but none of us have time to spare right now. -Nat
Dear Nat,
Thank you very much for your reply. As the code passes the regression
test, I have committed this change. This sort() may slowdown the
runtime, but it seems comparable. For 3,099,125 reflections test case,
the difference was marginal (68.6 vs 68.9 sec).
Best regards,
Keitaro
2013/10/28 Nathaniel Echols
On Sun, Oct 27, 2013 at 6:36 AM, Keitaro Yamashita
wrote: When I changed the cctbx code to sort reflections after mapping to a.s.u., CC1/2 values by phenix.merging_statistics became closer to XSCALE.
In merging_stats.__init__() in iotbx/merging_statistics.py, I inserted a line
array = array.sort("packed_indices") after array = array.customized_copy(anomalous_flag=anomalous).map_to_asu()
but I think there would be more correct way to fix this unexpected behavior.
If you can think of a better method, feel free to implement it, but I think the above is quite appropriate. Assuming this passes the regression test (iotbx/regression/tst_merging_statistics.py), it's fine to check in. The only drawback is that it's probably inefficient (especially for arrays that are already sorted), but the entire program isn't very efficient to begin with, and a more comprehensive solution would require a ground-up approach mostly coded in C++. This will definitely become necessary at some point but none of us have time to spare right now.
-Nat
_______________________________________________ cctbxbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/cctbxbb
On Sun, Oct 27, 2013 at 7:25 PM, Keitaro Yamashita < [email protected]> wrote:
Thank you very much for your reply. As the code passes the regression test, I have committed this change. This sort() may slowdown the runtime, but it seems comparable. For 3,099,125 reflections test case, the difference was marginal (68.6 vs 68.9 sec).
This is within the margin of error on most systems. It's a good illustration of why the overall process needs optimization, though. -Nat
participants (2)
-
Keitaro Yamashita
-
Nathaniel Echols