Dear cctbx developers,

I am interested in the implementation of model-based reflection outlier rejection. While reading mmtbx/scaling/outlier_rejection.py (lines 244-351), I noticed what appears to be a discrepancy between what log_message explains and what the code actually does. The log_message in the code says:
Outliers are rejected on the basis of the assumption that a scaled log likelihood differnce 2(log[P(Fobs)]-log[P(Fmode)])/Q\" is distributed according to a Chi-square distribution (Q\" is equal to the second derivative of the log likelihood function of the mode of the distribution). The outlier threshold of the p-value relates to the p-value of the extreme value distribution of the chi-square distribution.
whereas the actual p_value is calculated for each hkl as

p_value = 1 - erf(sqrt(LLG))**N

where LLG = log p(F=Fbar | Fmodel) - log p(F=Fobs | Fmodel) and N is the number of reflections. Here, Fbar is the value of F that maximizes p(F | Fmodel). In particular, the Q\" term from the log message (the second derivative of the log likelihood at the mode, F=Fbar) does not appear in the actual calculation.

Could someone please explain the meaning of the actual calculation? Why is the square root taken, and why is the erf() result raised to the power of N?

Thank you very much,
Keitaro
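
P.S. To make the question concrete, here is a minimal sketch of the calculation as I understand it from the formula above. It is not the mmtbx code: the function name outlier_p_value and the arguments log_p_mode, log_p_obs and n_reflections are names I made up for illustration, and I am assuming the two log-probabilities are already available for a given hkl.

```python
import math


def outlier_p_value(log_p_mode, log_p_obs, n_reflections):
    """Sketch of p_value = 1 - erf(sqrt(LLG))**N for a single reflection.

    log_p_mode    : log p(F=Fbar | Fmodel), the log-probability at the mode
    log_p_obs     : log p(F=Fobs | Fmodel), the log-probability at Fobs
    n_reflections : N, the number of reflections

    All three names are hypothetical; they are not taken from mmtbx.
    """
    # LLG is non-negative by construction, because Fbar maximizes p(F | Fmodel).
    llg = log_p_mode - log_p_obs
    if llg < 0:
        raise ValueError("log_p_obs must not exceed log_p_mode")
    return 1.0 - math.erf(math.sqrt(llg)) ** n_reflections


if __name__ == "__main__":
    # Same LLG, different N: the p-value grows with the number of reflections.
    for n in (1, 1000, 100000):
        print(n, outlier_p_value(0.0, -8.0, n))
```

With this form, a reflection with Fobs = Fbar (LLG = 0) always gets p_value = 1, and for a fixed LLG the p_value increases as N increases.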