use_internal_variance in iotbx.merging_statistics
Dear Phenix/CCTBX developers,
iotbx/merging_statistics.py is used by phenix.merging_statistics, phenix.table_one, and so on. With the upgrade of Phenix from 1.10.1 to 1.11, the merging-statistics code changed significantly.
Previously, miller.array.merge_equivalents() was always called with use_internal_variance=False, which is consistent with XDS, Aimless and so on. Currently, use_internal_variance=True is the default, and users cannot change it (see below).
These changes were made by @afonine and @rjgildea in rev. 22973 (Sep 26, 2015) and 23961 (Mar 8, 2016). Could anyone explain why these changes were introduced?
https://sourceforge.net/p/cctbx/code/22973
https://sourceforge.net/p/cctbx/code/23961
My points are:
- We cannot actually control the use_internal_variance parameter, because it is not passed to merge_equivalents() in class filter_intensities_by_sigma.
- In previous versions, if I gave XDS output to phenix.merging_statistics, the values shown were calculated in the same way as XDS does; this is no longer the case in the current version.
- For users of, for example, phenix.table_one who expect the previous behavior, this can cause inconsistency: the statistics would not be consistent with the data used in refinement.
cf. the related discussion in cctbxbb: http://phenix-online.org/pipermail/cctbxbb/2012-October/000611.html
Best regards,
Keitaro
Dear Keitaro,
iotbx.merging_statistics does have an option to change the use_internal_variance parameter. In xia2 we use use_internal_variance=False, eliminate_sys_absent=False, n_bins=20 as our defaults when calculating merging statistics, which gives results comparable to those calculated by Aimless:
$ iotbx.merging_statistics
Usage: phenix.merging_statistics [data_file] [options...]
Calculate merging statistics for non-unique data, including R-merge, R-meas, R-pim, and redundancy. Any format supported by Phenix is allowed, including MTZ, unmerged Scalepack, or XDS/XSCALE (and possibly others). Data should already be on a common scale, but with individual observations unmerged.
Diederichs K & Karplus PA (1997) Nature Structural Biology 4:269-275 (with erratum in: Nat Struct Biol 1997 Jul;4(7):592)
Weiss MS (2001) J Appl Cryst 34:130-135.
Karplus PA & Diederichs K (2012) Science 336:1030-3.
Full parameters:
  file_name = None
  labels = None
  space_group = None
  unit_cell = None
  symmetry_file = None
  high_resolution = None
  low_resolution = None
  n_bins = 10
  extend_d_max_min = False
  anomalous = False
  sigma_filtering = *auto xds scala scalepack
    .help = "Determines how data are filtered by SigmaI and I/SigmaI. XDS"
            "discards reflections whose intensity after merging is less than"
            "-3*sigma, Scalepack uses the same cutoff before merging, and"
            "SCALA does not do any filtering. Reflections with negative SigmaI"
            "will always be discarded."
  use_internal_variance = True
  eliminate_sys_absent = True
  debug = False
  loggraph = False
  estimate_cutoffs = False
  job_title = None
    .help = "Job title in PHENIX GUI, not used on command line"
Below is my email to Pavel and Billy from when we discussed this issue by email a while back:
The difference between use_internal_variance=True/False is explained in Luc's document here:
libtbx.pdflatex $(libtbx.find_in_repositories cctbx/miller)/equivalent_reflection_merging.tex
Essentially, use_internal_variance=False uses only the unmerged sigmas to compute the merged sigmas, whereas use_internal_variance=True also considers the spread of the unmerged intensities. More precisely, use_internal_variance=True takes the larger of the variance coming from the spread of the intensities and the variance computed from the unmerged sigmas. As a result, use_internal_variance=True can only ever give the same or lower I/sigI than use_internal_variance=False. The relevant code in the cctbx is here:
https://sourceforge.net/p/cctbx/code/HEAD/tree/trunk/cctbx/miller/merge_equi...
Aimless has a similar option: setting SAMPLESD on the SDCORRECTION keyword is, I think, equivalent to use_internal_variance=True. The default behaviour of Aimless is equivalent to use_internal_variance=False:
http://www.mrc-lmb.cam.ac.uk/harry/pre/aimless.html#sdcorrection
"SAMPLESD is intended for very high multiplicity data such as XFEL serial data. The final SDs are estimated from the weighted population variance, assuming that the input sigma(I)^2 values are proportional to the true errors. This probably gives a more realistic estimate of the error in <I>. In this case refinement of the corrections is switched off unless explicitly requested."
I think that the "external" variance is probably better if the sigmas from the scaling program are reliable, or for low multiplicity data. For high multiplicity data, or if the sigmas from the scaling program are not reliable, the "internal" variance is probably better.
Cheers,
Richard
Dr Richard Gildea
Data Analysis Scientist
Tel: +441235 77 8078
Diamond Light Source Ltd.
Diamond House
Harwell Science & Innovation Campus
Didcot
Oxfordshire
OX11 0DE
_______________________________________________
cctbxbb mailing list
http://phenix-online.org/mailman/listinfo/cctbxbb
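Richard's description of the two modes can be illustrated with a small numerical sketch. It is not the cctbx implementation: it assumes standard inverse-variance weighting (w = 1/sigma^2), takes the "external" variance as 1/sum(w) and the "internal" variance as the weighted scatter about the mean divided by (n - 1), and the function name merge_observations is made up for this example. The authoritative formulas are in equivalent_reflection_merging.tex.

```python
import math


def merge_observations(intensities, sigmas, use_internal_variance=True):
    """Merge repeated observations of one reflection (illustrative sketch only)."""
    if not intensities or len(intensities) != len(sigmas):
        raise ValueError("need matching, non-empty intensity and sigma lists")
    # Assumption: inverse-variance weights, w_i = 1 / sigma_i^2.
    weights = [1.0 / (s * s) for s in sigmas]
    sum_w = sum(weights)
    mean = sum(w * i for w, i in zip(weights, intensities)) / sum_w
    # "External" variance: propagated from the unmerged sigmas only.
    external = 1.0 / sum_w
    variance = external
    n = len(intensities)
    if use_internal_variance and n > 1:
        # "Internal" variance: from the weighted spread of the observations.
        scatter = sum(w * (i - mean) ** 2 for w, i in zip(weights, intensities))
        internal = scatter / ((n - 1) * sum_w)
        # Take the larger of the two, as described in the message above.
        variance = max(external, internal)
    return mean, math.sqrt(variance)


if __name__ == "__main__":
    obs = [100.0, 120.0, 80.0]
    sig = [5.0, 5.0, 5.0]
    for flag in (False, True):
        mean, sigma = merge_observations(obs, sig, use_internal_variance=flag)
        print(flag, mean / sigma)
```

Because the internal branch takes the maximum of the two estimates, the merged sigma in this sketch can never be smaller than the external one, so I/sigI can only stay the same or drop when the flag is True.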
Dear Richard and everyone,
Thanks for your reply. What kind of input do you give to
iotbx.merging_statistics in xia2? For example, when an XDS file is given,
use_internal_variance=False is not passed to the merge_equivalents()
function. Please look at filter_intensities_by_sigma.__init__() in
iotbx/merging_statistics.py. When sigma_filtering == "xds" or
sigma_filtering == "scalepack", array_merged is recalculated using
merge_equivalents() with default arguments.
If nobody disagrees, I would like to commit a fix so that the
use_internal_variance variable is passed to all merge_equivalents()
calls.
I am afraid that the behavior in phenix-1.11 would be confusing.
In phenix.table_one (mmtbx/command_line/table_one.py),
use_internal_variance=False is the default. This will be OK with the fix
I suggested above.
Could it also be the default in phenix.merging_statistics, so that the
program behavior does not change between Phenix versions?
Best regards,
Keitaro
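The problem Keitaro describes is a common pattern: a callee has a default argument, and one code path in the caller forgets to forward the user's choice. The sketch below is a toy illustration of that pattern, not the code in iotbx/merging_statistics.py. The merge_equivalents() here is a stand-in that only reports which mode it received, and the two merge_with_filtering_* functions are hypothetical names invented for this example.

```python
def merge_equivalents(observations, use_internal_variance=True):
    """Toy stand-in for a merging routine: reports which mode it was asked to use."""
    return {
        "n_obs": len(observations),
        "use_internal_variance": use_internal_variance,
    }


def merge_with_filtering_buggy(observations, sigma_filtering, use_internal_variance):
    """Mimics the reported problem: one branch forgets to forward the flag."""
    if sigma_filtering in ("xds", "scalepack"):
        # Flag not forwarded, so the callee's default (True) silently wins.
        return merge_equivalents(observations)
    return merge_equivalents(observations, use_internal_variance=use_internal_variance)


def merge_with_filtering_fixed(observations, sigma_filtering, use_internal_variance):
    """Forwards the caller's choice on every code path."""
    # sigma_filtering would still select the filtering rule in real code;
    # here it simply no longer influences which variance mode is used.
    return merge_equivalents(observations, use_internal_variance=use_internal_variance)


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0]
    print(merge_with_filtering_buggy(data, "xds", use_internal_variance=False))
    print(merge_with_filtering_fixed(data, "xds", use_internal_variance=False))
```

Passing the flag by keyword on every call, as in the fixed version, makes this kind of omission easier to spot in review.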
Dear Keitaro,
I've made the change you suggested in merging_statistics.py. It looks like an oversight, which didn't affect xia2 since we always calculate merging statistics from a scaled but unmerged MTZ file, never from an XDS or Scalepack-format file.
As to what defaults Phenix uses, that is better left to one of the Phenix developers to comment on.
Cheers,
Richard
Dear Richard,
Thanks a lot. I hope some Phenix developer will make a comment.
Best regards,
Keitaro
Hi Keitaro,
In Phenix, we set use_internal_variance to false whenever possible. We had
users ask about the difference in merging statistics, which led to the
discussion about the parameter with Richard. We picked the default that is
mostly consistent with previous versions and added an option to change it,
but we must have missed certain cases where the parameter was not set.
Thanks for finding it!
phenix.merging_statistics is just a different way of calling the same code
as iotbx.merging_statistics, so the default value for use_internal_variance
is true on the command line. However, I explicitly set the default to false
in the GUI for phenix.merging_statistics. phenix.table_one also sets
use_internal_variance to false. However, looking through other code in the
Phenix tree, there are some instances where use_internal_variance is set to
false with no option to change it, so we will double check if that is the
behavior that we want.
--
Billy K. Poon
Research Scientist, Molecular Biophysics and Integrated Bioimaging
Lawrence Berkeley National Laboratory
1 Cyclotron Road, M/S 33R0345
Berkeley, CA 94720
Tel: (510) 486-5709
Fax: (510) 486-5909
Web: https://phenix-online.org
On Tue, Nov 1, 2016 at 7:06 AM, Keitaro Yamashita wrote:
Dear Richard,
Thanks a lot. I hope some Phenix developer will make a comment.
Best regards,
Keitaro
2016-11-01 20:19 GMT+09:00:
Dear Keitaro,
I've made the change you suggested in merging_statistics.py - it looks
As to what defaults Phenix uses, that is better left to one of the Phenix developers to comment on.
Cheers,
Richard
________________________________________
From: [email protected] [[email protected]] on behalf of Keitaro Yamashita [[email protected]]
Sent: 01 November 2016 10:41
To: cctbx mailing list
Subject: Re: [cctbxbb] use_internal_variance in iotbx.merging_statistics
Dear Richard and everyone,
Thanks for your reply. What kind of input do you give to iotbx.merging_statistics in xia2? For example, when XDS file is given, use_internal_variance=False is not passed to merge_equivalents() function. Please look at the lines of filter_intensities_by_sigma.__init__() in iotbx/merging_statistics.py. When sigma_filtering == "xds" or sigma_filtering == "scalepack", array_merged is recalculated using merge_equivalents() with default arguments.
If nobody disagrees, I would like to commit the fix so that use_internal_variance variable is passed to all merge_equivalents() function calls.
I am afraid that the behavior in the phenix-1.11 would be confusing. In phenix.table_one (mmtbx/command_line/table_one.py), use_internal_variance=False is default. This will be OK with the fix I suggested above.
Can it also be default in phenix.merging_statistics, not to change the program behavior through phenix versions?
Best regards,
Keitaro
As a side comment on what Aimless does, I’ve been looking at SD estimates and the SAMPLESD option in particular, in my development version (I think there is a bug in the current distributed version).
Using the internal variance does provide an alternative estimate of var(Imean), but doesn’t answer the question of how to weight the individual observations in determining the mean, which is not straightforward. The error model used in Aimless and XDS is purely dependent on intensity, so it takes no account of radiation damage, which is hard to do.
My development version of Aimless will by default “optimise” the error model (and I’m trying out alternative methods of doing this), even if SAMPLESD is switched on, and there is an analysis of the difference between the internal and external error estimates. I’m wondering about automatically defaulting to switching SAMPLESD on for high multiplicity (but how high?). Or doing some weighted mean between the two estimates.
Error models are hard :-(
Phil
On 1 Nov 2016, at 09:21, Richard Gildea wrote:
Dear Keitaro,
iotbx.merging_statistics does have the option to change the parameter use_internal_variance. In xia2 we use the defaults use_internal_variance=False, eliminate_sys_absent=False, n_bins=20, when calculating merging statistics which give comparable results to those calculate by Aimless:
$ iotbx.merging_statistics
Usage:
  phenix.merging_statistics [data_file] [options...]
Calculate merging statistics for non-unique data, including R-merge, R-meas,
R-pim, and redundancy. Any format supported by Phenix is allowed, including
MTZ, unmerged Scalepack, or XDS/XSCALE (and possibly others). Data should
already be on a common scale, but with individual observations unmerged.
Diederichs K & Karplus PA (1997) Nature Structural Biology 4:269-275
  (with erratum in: Nat Struct Biol 1997 Jul;4(7):592)
Weiss MS (2001) J Appl Cryst 34:130-135.
Karplus PA & Diederichs K (2012) Science 336:1030-3.
Full parameters:
file_name = None
labels = None
space_group = None
unit_cell = None
symmetry_file = None
high_resolution = None
low_resolution = None
n_bins = 10
extend_d_max_min = False
anomalous = False
sigma_filtering = *auto xds scala scalepack
  .help = "Determines how data are filtered by SigmaI and I/SigmaI. XDS"
          "discards reflections whose intensity after merging is less than"
          "-3*sigma, Scalepack uses the same cutoff before merging, and"
          "SCALA does not do any filtering. Reflections with negative SigmaI"
          "will always be discarded."
use_internal_variance = True
eliminate_sys_absent = True
debug = False
loggraph = False
estimate_cutoffs = False
job_title = None
  .help = "Job title in PHENIX GUI, not used on command line"
Below is my email to Pavel and Billy when we discussed this issue by email a while back:
The difference between use_internal_variance=True/False is explained in Luc's document here:
libtbx.pdflatex $(libtbx.find_in_repositories cctbx/miller)/equivalent_reflection_merging.tex
Essentially use_internal_variance=False uses only the unmerged sigmas to compute the merged sigmas, whereas use_internal_variance=True uses instead the spread of the unmerged intensities to compute the merged sigmas. Furthermore, use_internal_variance=True uses the largest of the variance coming from the spread of the intensities and that computed from the unmerged sigmas. As a result, use_internal_variance=True can only ever give lower I/sigI than use_internal_variance=False. The relevant code in the cctbx is here:
https://sourceforge.net/p/cctbx/code/HEAD/tree/trunk/cctbx/miller/merge_equi...
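The distinction described above can be sketched numerically for a single reflection. This is a minimal illustration using a standard inverse-variance weighted mean; the function name merge_one_reflection and the exact expressions for the two variances are assumptions for illustration and may differ in detail from what cctbx implements in merge_equivalents.h.

```python
from math import sqrt


def merge_one_reflection(intensities, sigmas, use_internal_variance):
    """Merge symmetry-equivalent observations of one reflection.

    Returns (merged intensity, merged sigma). Assumed formulas, not copied from cctbx.
    """
    weights = [1.0 / (s * s) for s in sigmas]
    sum_w = sum(weights)
    mean = sum(w * i for w, i in zip(weights, intensities)) / sum_w

    # "External" variance: propagated from the unmerged sigmas only.
    external_var = 1.0 / sum_w
    n = len(intensities)
    if not use_internal_variance or n < 2:
        return mean, sqrt(external_var)

    # "Internal" variance: from the weighted spread of the observations.
    spread = sum(w * (i - mean) ** 2 for w, i in zip(weights, intensities))
    internal_var = spread / ((n - 1) * sum_w)

    # Taking the larger of the two means the merged sigma can only grow,
    # so I/sigI can only stay the same or decrease.
    return mean, sqrt(max(internal_var, external_var))


if __name__ == "__main__":
    obs, sig = [100.0, 120.0, 80.0], [10.0, 10.0, 10.0]
    for flag in (False, True):
        i_mean, s_mean = merge_one_reflection(obs, sig, flag)
        print(flag, round(i_mean / s_mean, 2))
```

In this example the observations scatter more than their sigmas suggest, so the internal estimate gives a larger merged sigma and a lower I/sigI. When the scatter is smaller than the sigmas imply, both settings return the same value under these assumptions.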
Aimless has a similar option for the SDCORRECTION keyword, if you set the option SAMPLESD, which I think is equivalent to use_internal_variance=True. The default behaviour of Aimless is equivalent to use_internal_variance=False:
http://www.mrc-lmb.cam.ac.uk/harry/pre/aimless.html#sdcorrection
"SAMPLESD is intended for very high multiplicity data such as XFEL serial data. The final SDs are estimated from the weighted population variance, assuming that the input sigma(I)^2 values are proportional to the true errors. This probably gives a more realistic estimate of the error in <I>. In this case refinement of the corrections is switched off unless explicitly requested."
I think that the "external" variance is probably better if the sigmas from the scaling program are reliable, or for low multiplicity data. For high multiplicity data or if the sigmas from the scaling program are not reliable, then "internal" variance is probably better.
Cheers,
Richard
Dr Richard Gildea Data Analysis Scientist Tel: +441235 77 8078
Diamond Light Source Ltd. Diamond House Harwell Science & Innovation Campus Didcot Oxfordshire OX11 0DE
________________________________________
From: [email protected] [[email protected]] on behalf of Keitaro Yamashita [[email protected]]
Sent: 01 November 2016 07:23
To: cctbx mailing list
Subject: [cctbxbb] use_internal_variance in iotbx.merging_statistics
Dear Phenix/CCTBX developers,
iotbx/merging_statistics.py is used by phenix.merging_statistics, phenix.table_one, and so on. By upgrading phenix from 1.10.1 to 1.11, merging statistics-related codes were significantly changed.
Previously, miller.array.merge_equivalents() was always called with argument use_internal_variance=False, which is consistent with XDS, Aimless and so on. Currently, use_internal_variance=True is default, and cannot be changed by users (see below).
These changes were made by @afonine and @rjgildea in rev. 22973 (Sep 26, 2015) and 23961 (Mar 8, 2016). Could anyone explain why these changes were introduced?
https://sourceforge.net/p/cctbx/code/22973 https://sourceforge.net/p/cctbx/code/23961
My points are:
- We actually cannot control use_internal_variance= parameter because it is not passed to merge_equivalents() in class filter_intensities_by_sigma.
- In previous versions, if I gave XDS output to phenix.merging_statistics, values calculated in the same way (as XDS does) were shown; but not in the current version.
- For (for example) phenix.table_one users who expect this behavior, it can give inconsistency. The statistics would not be consistent with the data used in refinement.
cf. the related discussion in cctbxbb: http://phenix-online.org/pipermail/cctbxbb/2012-October/000611.html
Best regards, Keitaro
participants (4): Billy Poon, Keitaro Yamashita, Phil Evans, richard.gildea@diamond.ac.uk