Re: [cctbxbb] Unstable platform-dependent sort order for flex.sort_permutation


richard.gildea＠diamond.ac.uk

18 Nov 2016

4:10 p.m.

Hi Oleg,

The permutations would lead to the same sequence of Miller indices, but in the context of an unmerged reflection array (i.e. miller.array.sort_permutation), the data associated with those Miller indices would not necessarily be in the same order. As the dataset is then split into two half-datasets, this difference in sort order leads to a different value of the calculated correlation coefficient between those two half-datasets:

https://github.com/cctbx/cctbx_project/blob/master/cctbx/miller/__init__.py#...

Cheers,

Richard

Dr Richard Gildea
Data Analysis Scientist
Tel: +441235 77 8078
Diamond Light Source Ltd.
Diamond House
Harwell Science & Innovation Campus
Didcot
Oxfordshire
OX11 0DE

________________________________
From: [email protected] [[email protected]] on behalf of [email protected] [[email protected]]
Sent: 18 November 2016 15:59
To: cctbx mailing list
Subject: Re: [cctbxbb] Unstable platform-dependent sort order for flex.sort_permutation

Hi Richard,

It is a bug only if the permutations do not lead to the same sequence. Otherwise you cannot expect to get the same sorting permutations for collections with redundant data on different platforms, or even between different versions of compilers.

Cheers,

Oleg.

________________________________
From: [email protected] on behalf of [email protected]
Sent: 18 November 2016 14:12:29
To: [email protected]
Subject: [cctbxbb] Unstable platform-dependent sort order for flex.sort_permutation

Hi,

I've been trying to diagnose why the miller array cc_anom calculations seem to be platform-dependent. I've narrowed this down to a variation in the sort order returned by flex.sort_permutation, which is called from within miller.array.sort("packed_indices"). The following code demonstrates the problem and the different output I get on Mac and Linux:

Mac:

>>> from scitbx.array_family import flex
>>> a = flex.size_t([7, 1, 1, 5, 1, 3, 3, 7, 1, 7, 3, 3, 5, 7, 5, 5, 5, 1, 1, 1])
>>> print(list(flex.sort_permutation(a)))
[1, 2, 4, 8, 17, 18, 19, 5, 6, 10, 11, 3, 12, 14, 15, 16, 0, 7, 9, 13]

Linux:

>>> from scitbx.array_family import flex
>>> a = flex.size_t([7, 1, 1, 5, 1, 3, 3, 7, 1, 7, 3, 3, 5, 7, 5, 5, 5, 1, 1, 1])
>>> print(list(flex.sort_permutation(a)))
[19, 1, 2, 18, 4, 17, 8, 11, 10, 6, 5, 12, 14, 15, 16, 3, 9, 7, 13, 0]

Is it known/expected for the sort order to be platform-dependent, or is this a bug?

Here is the relevant code for flex.sort_permutation():

https://github.com/cctbx/cctbx_project/blob/master/scitbx/array_family/sort....

Cheers,

Richard

--
This e-mail and any attachments may contain confidential, copyright and or privileged material, and are for the use of the intended addressee only. If you are not the intended addressee or an authorised recipient of the addressee please notify us of receipt by returning the e-mail and do not use, copy, retain, distribute or disclose the information in or attached to the e-mail. Any opinions expressed within this e-mail are those of the individual and not necessarily of Diamond Light Source Ltd. Diamond Light Source Ltd. cannot guarantee that this e-mail or any attachments are free from viruses and we cannot accept liability for any damage which you may sustain as a result of software viruses which may be transmitted in or with the message. Diamond Light Source Limited (company no. 4375679). Registered in England and Wales with its registered office at Diamond House, Harwell Science and Innovation Campus, Didcot, Oxfordshire, OX11 0DE, United Kingdom

_______________________________________________
cctbxbb mailing list
[email protected]
http://phenix-online.org/mailman/listinfo/cctbxbb
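The two reported permutations can be compared without cctbx. The sketch below is plain Python and only uses the numbers quoted in the message; the `payload` labels are hypothetical stand-ins for the data attached to each observation. It checks that both permutations produce the same sorted values, that the attached data nevertheless ends up in a different order, and that the Mac result happens to coincide with a stable sort.

```python
# Input values and the two permutations quoted in the message above.
a = [7, 1, 1, 5, 1, 3, 3, 7, 1, 7, 3, 3, 5, 7, 5, 5, 5, 1, 1, 1]
perm_mac = [1, 2, 4, 8, 17, 18, 19, 5, 6, 10, 11, 3, 12, 14, 15, 16, 0, 7, 9, 13]
perm_linux = [19, 1, 2, 18, 4, 17, 8, 11, 10, 6, 5, 12, 14, 15, 16, 3, 9, 7, 13, 0]


def apply_permutation(values, perm):
    """Reorder `values` according to the index list `perm`."""
    return [values[i] for i in perm]


# Both permutations sort the values correctly.
same_values = apply_permutation(a, perm_mac) == apply_permutation(a, perm_linux) == sorted(a)

# Hypothetical per-observation payload (e.g. an intensity), labelled by input position.
payload = ["obs%d" % i for i in range(len(a))]
same_payload_order = apply_permutation(payload, perm_mac) == apply_permutation(payload, perm_linux)

# Python's sorted() is guaranteed to be stable, so ties keep their input order.
stable_perm = sorted(range(len(a)), key=lambda i: a[i])
mac_matches_stable = stable_perm == perm_mac

print(same_values, same_payload_order, mac_matches_stable)  # True False True
```

This agrees with both readings in the thread: the sorted keys are identical, but anything carried along with the keys is ordered differently within each group of equal values.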


Nicholas Sauter

18 Nov 2016

4:26 p.m.

Richard & Oleg,

I'm not in the office right now, but I think the concept you are looking for is the "stable sort", which sorts items but keeps equal elements in their original relative order. I believe the C++ standard library exposes this function, so it would be possible to create a corresponding flex.stable_sort_permutation for our array family.

Nick

Nicholas K. Sauter, Ph.D.
Senior Scientist, Molecular Biophysics & Integrated Bioimaging Division
Lawrence Berkeley National Laboratory
1 Cyclotron Rd., Bldg. 33R0345
Berkeley, CA 94720
(510) 486-5713
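As a rough illustration of the stable-sort idea, here is a pure-Python sketch. The function names `stable_sort_permutation` and `is_stable` are hypothetical and are not part of scitbx; the sketch relies only on the documented guarantee that Python's `sorted()` is stable (the C++ counterpart would be `std::stable_sort`).

```python
def stable_sort_permutation(values, reverse=False):
    """Return the indices that sort `values`, keeping tied entries in input order.

    Hypothetical helper for illustration only. sorted() is stable, and it
    preserves the input order of equal elements even when reverse=True.
    """
    return sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)


def is_stable(values, perm):
    """Check that `perm` sorts `values` ascending with ties in increasing index order."""
    for left, right in zip(perm, perm[1:]):
        if values[left] > values[right]:
            return False  # not sorted at all
        if values[left] == values[right] and left > right:
            return False  # sorted, but a tie was reordered
    return True


values = [3, 1, 3, 1, 2]
perm = stable_sort_permutation(values)
print(perm)                                # [1, 3, 4, 0, 2]
print(is_stable(values, perm))             # True
print(is_stable(values, [3, 1, 4, 2, 0]))  # False: valid sort, but ties swapped
```

The last line shows a permutation that an unstable sort is free to return: it sorts the values correctly, yet it is not the stable answer.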

oleg＠olexsys.org

4:30 p.m.

Hi Richard,

I guess you need to extend it to also use the reflection intensity in the comparator if you really need reproducible sets :).

Cheers,

Oleg.
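Oleg's suggestion amounts to sorting on a composite key. A minimal sketch in plain Python follows, with made-up packed indices and intensities; `sort_permutation_with_tiebreak` is a hypothetical name, not a scitbx function. Because the key is then unique for every observation (unless index and intensity both coincide, in which case the records are interchangeable), the resulting order of the data no longer depends on whether the underlying sort is stable.

```python
def sort_permutation_with_tiebreak(indices, intensities):
    """Indices that sort by `indices`, breaking ties by `intensities`.

    Hypothetical helper: the tuple key compares the index first and the
    intensity second.
    """
    return sorted(range(len(indices)), key=lambda i: (indices[i], intensities[i]))


# Made-up example data: packed Miller index and intensity for six observations.
packed = [7, 1, 1, 5, 1, 7]
intensity = [10.5, 3.2, 2.9, 8.0, 3.0, 9.8]

perm = sort_permutation_with_tiebreak(packed, intensity)
sorted_records = [(packed[i], intensity[i]) for i in perm]

print(perm)            # [2, 4, 1, 3, 5, 0]
print(sorted_records)  # [(1, 2.9), (1, 3.0), (1, 3.2), (5, 8.0), (7, 9.8), (7, 10.5)]
```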

Luc Bourhis

4:36 p.m.

>> The permutations would lead to the same sequence of miller indices, but in the context of an unmerged reflection array (i.e. miller.array.sort_permutation), then the data associated with those miller indices would not necessarily be in the same order. As the dataset is then split into two half-datasets, this difference in sort order leads to a different value of the calculated correlation coefficient between those two half datasets:

> I guess you need to extend it to also use the reflection intensity in the comparator if you really need reproducible sets :).

Interesting question! The simplest solution is definitely the one advocated by Nick: stable sort. But keeping the input order of the indices is arbitrary as far as the intensities are concerned, especially since that order is most likely quite arbitrary in the first place. Sorting first on the indices and then on the intensities: is that any less arbitrary for computing the correlation?
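The trade-off raised here can be made concrete with a small plain-Python sketch on made-up (index, intensity) records. A stable sort on the index alone reproduces whatever order the observations arrived in, while sorting on index and then intensity gives the same sequence for any input order.

```python
def stable_order(records):
    """Sort on the index only; tied records keep their input order."""
    return sorted(records, key=lambda r: r[0])


def index_then_intensity_order(records):
    """Sort on the index first and the intensity second (plain tuple comparison)."""
    return sorted(records)


# Made-up observations as (index, intensity); `reordered` holds the same
# observations in a different input order.
records = [(1, 3.2), (2, 8.0), (1, 2.9), (2, 7.5), (1, 3.0)]
reordered = list(reversed(records))

stable_depends_on_input = stable_order(records) != stable_order(reordered)
composite_is_invariant = (
    index_then_intensity_order(records) == index_then_intensity_order(reordered)
)

print(stable_depends_on_input, composite_is_invariant)  # True True
```

So a stable sort is reproducible across platforms for a given input file, but not across different orderings of the same observations. The composite key is reproducible in both senses, at the price of an ordering within each group that tracks the intensity, which only matters if the later split is not random.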

Luc Bourhis

4:45 p.m.

On 18 Nov 2016, at 17:10, [email protected] wrote:

> a different value of the calculated correlation coefficient between those two half datasets:

What does split_unmerged do again?

richard.gildea＠diamond.ac.uk

4:55 p.m.

Hi,

Thanks all for the responses. I think the simplest solution would be to add an option stable=False to flex.sort_permutation that can optionally use std::stable_sort in place of std::sort.

Luc: split_unmerged splits an unmerged dataset into two random half-datasets, in order to calculate the correlation coefficient between the two half-datasets:

https://github.com/cctbx/cctbx_project/blob/master/cctbx/miller/merge_equiva...

See also Karplus, P. A., & Diederichs, K. (2012). Linking crystallographic model and data quality. Science, 336(6084), 1030-1033:

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3457925/

Cheers,

Richard
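To see why the tie order matters for a half-dataset correlation, here is a simplified pure-Python sketch. It is not the cctbx split_unmerged implementation: `half_dataset_cc` and `pearson` are hypothetical helpers and the data are made up. The point is only that a seeded random generator selects positions, so the same seed applied to the same observations in a different order can put different observations into each half.

```python
import random
from collections import defaultdict


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def half_dataset_cc(indices, intensities, seed=0):
    """CC between the mean intensities of two random half-datasets (illustrative)."""
    rng = random.Random(seed)
    groups = defaultdict(list)
    for idx, value in zip(indices, intensities):
        groups[idx].append(value)  # observations kept in arrival order
    half1, half2 = [], []
    for idx in sorted(groups):
        obs = groups[idx]
        if len(obs) < 2:
            continue  # cannot split a single observation
        # The shuffled positions depend only on the seed; which observation
        # sits at each position depends on the input order.
        order = list(range(len(obs)))
        rng.shuffle(order)
        mid = len(obs) // 2
        first = [obs[i] for i in order[:mid]]
        second = [obs[i] for i in order[mid:]]
        half1.append(sum(first) / len(first))
        half2.append(sum(second) / len(second))
    return pearson(half1, half2)


# Made-up data: three unique indices with four observations each.
indices = [1] * 4 + [2] * 4 + [3] * 4
intensities = [9.0, 11.0, 10.5, 9.5, 52.0, 48.0, 50.5, 49.0, 101.0, 97.0, 103.0, 99.0]

cc_a = half_dataset_cc(indices, intensities, seed=42)
# Same observations and same seed, but reversed order within each group.
cc_b = half_dataset_cc(indices[::-1], intensities[::-1], seed=42)
print(cc_a, cc_b)  # the two values may differ
```

Running the function twice on identically ordered input with the same seed always gives the same value, which is why fixing the tie order is enough to make the result reproducible.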

Luc Bourhis

5:06 p.m.

On 18 Nov 2016, at 17:55, [email protected] wrote:

> Luc: split_unmerged splits an unmerged dataset into two random half datasets, in order to calculate the correlation coefficient between the two half datasets:

Oh, so you mean that, using the same random seed on macOS and on Linux, you did not get the same CC?

richard.gildea＠diamond.ac.uk

6:25 p.m.

Hi Luc,

Yes, the random numbers generated by the Mersenne Twister were the same; however, the input tmp_array was in a different sort order, which meant that the output half-datasets were platform-dependent:

https://github.com/cctbx/cctbx_project/blob/master/cctbx/miller/__init__.py#...

I have just committed the necessary changes to add the parameter stable (default False) to flex.sort_permutation(). miller.array.sort_permutation sets stable=True when calling flex.sort_permutation(). This looks to have made the CC1/2 calculations platform-independent.

Cheers,

Richard
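A hedged usage sketch of the new parameter follows. It assumes a cctbx build that includes the change described in this message and that `stable` is accepted as a keyword argument; neither the exact signature nor the default in later versions is confirmed by this thread. When scitbx is not importable, the sketch falls back to Python's stable `sorted()` so that it still runs.

```python
a_values = [7, 1, 1, 5, 1, 3, 3, 7, 1, 7, 3, 3, 5, 7, 5, 5, 5, 1, 1, 1]

try:
    # Assumption: this cctbx build has the `stable` parameter described above.
    from scitbx.array_family import flex
    perm = list(flex.sort_permutation(flex.size_t(a_values), stable=True))
except ImportError:
    # Fallback without cctbx: sorted() is stable, so ties keep their input order.
    perm = sorted(range(len(a_values)), key=lambda i: a_values[i])

print(perm)
```

With a stable sort, the expected result on any platform is the permutation that keeps equal values in input order, which is the one reported for Mac at the start of the thread.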

Luc Bourhis

6:41 p.m.

> I have just committed the necessary changes to add the parameter stable(=False) to flex.sort_permutation(). miller.array.sort_permutation sets stable=True when calling flex.sort_permutation(). This looks to have made the CC1/2 calculations platform-independent.

Thank you, that's a generally useful addition. Sorry for not getting the context of your question in the first place!

Phil Evans

20 Nov 2016

1:06 p.m.

You might be interested in an alternative method of calculating CC(1/2) from variances, rather than from explicit half-sets, described tersely in this paper:

1. Assmann G, Brehm W, Diederichs K. Identification of rogue datasets in serial crystallography. Journal of Applied Crystallography. 2016 Jun;49(3):1021–8.
