Best way to compare two structures using cctbx?
Hi, I think I've seen some code in cctbx which tries to put the similarity of two crystal structures into a number. Am I right on this? I want to have something similar: I'm interested in comparing the direct space atomic distances of two structures sharing the same unit cell parameters and space group and scatterers but different atomic coordinates for those. Now I'd like to have a mapping where I could compare the bond lengths between all scatterers which should eliminate any movements which might also be due to a shift in the origin of a P 1 cell. Also I'd like to have a value for mean, minimal and maximal displacements. The purpose of this is to check how good a structure approximation matches a reference structure. How would you suggest to take on this task? Or is there even some code related to this? Thanks in advance, Jan
Dear Jan,
this is an interesting problem and there are no perfect solutions. I'm not
currently a cctbx user so I can't say whether such an algorithm is
implemented but I done some work on this myself and can maybe point you in
the direction of some approaches you could take.
* Sorted distances (possibly the simplest approach)
1) Calculate all r_ij distances from all the atoms in your unit cell to all
neighbours within a cutoff.
2) Sort the list of distances.
3) To get the 'difference' between the two structures calculate the rms of
the differences in distances between the two lists.
Generally this approach works well although it can give false positives (by
virtue of the fact that it's throwing away a lot of information)
* Histogram radial distribution comparison (real space)
See: http://www.ncbi.nlm.nih.gov/pubmed/20720316 for details.
Basically calculate the radial distribution histogram for each species pair
and then use some method to calculate the difference between the histograms
(e.g. rms differences, cosine distance, etc.)
* Frequency space comparison
See http://arxiv.org/abs/0805.1418 for details.
Similar to calculating the r-factor although in this paper they use a more
sophisticated approach that I don't fully understand.
* Finding ideal transformation between structures
Hundt's approach (CMPZ):
http://scripts.iucr.org/cgi-bin/paper?S0021889805032450
Lonie's approach (XtalComp):
http://www.sciencedirect.com/science/article/pii/S0010465511003699
for details and
http://xtalopt.openmolecules.net/xtalcomp/xtalcomp.html
to try the algorithm for yourself.
Basically the idea is to find the more ideal transformation to take you
from one crystal structure to another. From the transformation you can
then pull out one number to get a difference or use it to map from one
system to another.
I think I'll stop there but I'd be curious to hear about what you find and
how you get on.
All the best,
-Martin
On 4 March 2013 12:06, Jan Marten Simons
Hi,
I think I've seen some code in cctbx which tries to put the similarity of two crystal structures into a number. Am I right on this?
I want to have something similar:
I'm interested in comparing the direct space atomic distances of two structures sharing the same unit cell parameters and space group and scatterers but different atomic coordinates for those. Now I'd like to have a mapping where I could compare the bond lengths between all scatterers which should eliminate any movements which might also be due to a shift in the origin of a P 1 cell. Also I'd like to have a value for mean, minimal and maximal displacements. The purpose of this is to check how good a structure approximation matches a reference structure.
How would you suggest to take on this task? Or is there even some code related to this?
Thanks in advance,
Jan _______________________________________________ cctbxbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/cctbxbb
-- Martin Uhrin Tel: +44 207 679 3466 Department of Physics & Astronomy Fax:+44 207 679 0595 University College London [email protected] Gower St, London, WC1E 6BT, U.K. http://www.cmmp.ucl.ac.uk
Hi Jan, the cctbx does not have a tool doing exactly what you describe. But there is cctbx.euclidean_model_matching, that we all call emma, which could be close enough. It tries to find the best isometry matching the biggest subset of the 1st structure onto the 2nd structure, and since an isometry preserves bond lengths, angles and even torsion angles, perhaps you may find that tool useful. Here is a very quick kickstarter: Starting from two instances of xray.structure xs1 and xs2 for the two structures you wish to compare, the following is all you need to do from cctbx import euclidean_model_matching as emma .... m = emma.model_matches( xs1.as_emma_model(), xs2.as_emma_model(), tolerance=0.5, break_if_match_with_no_singles=False ).refined_matches[0] Then m.rms gives the rms for the discrepancy between each set of coordinates, while m.rt gives you the symmetry that had to be applied to best match each structures. Then m.singles1 is the list of scatterers in the 1st structure that could not be matched to any scatterer in the 2nd one. Obviously, m.singles2 is the other-way around. Finally, m.pairs is the mapping established between a subset of the 1st structure and a subset of the 2nd structure. HtH, Luc
Am Montag 04 März 2013 16:18:03 schrieb Luc Bourhis:
Hi Jan,
the cctbx does not have a tool doing exactly what you describe. But there is cctbx.euclidean_model_matching, that we all call emma, which could be close enough. It tries to find the best isometry matching the biggest subset of the 1st structure onto the 2nd structure, and since an isometry preserves bond lengths, angles and even torsion angles, perhaps you may find that tool useful. Here is a very quick kickstarter:
Starting from two instances of xray.structure xs1 and xs2 for the two structures you wish to compare, the following is all you need to do
from cctbx import euclidean_model_matching as emma .... m = emma.model_matches( xs1.as_emma_model(), xs2.as_emma_model(), tolerance=0.5, break_if_match_with_no_singles=False ).refined_matches[0]
Then m.rms gives the rms for the discrepancy between each set of coordinates, while m.rt gives you the symmetry that had to be applied to best match each structures. Then m.singles1 is the list of scatterers in the 1st structure that could not be matched to any scatterer in the 2nd one. Obviously, m.singles2 is the other-way around. Finally, m.pairs is the mapping established between a subset of the 1st structure and a subset of the 2nd structure. Hi Luc,
Emma seems to work quite nicely although it could use some more documentation. Right now I wonder where one would put a cut-off on rms with respect to a) the structures disagree substantially b) the structures have the same scatterers but at completely different positions relative to each other or the symmetry elements c) the structures are quite similar (e.g. you only have to move a single scatterer a little) d) the structures are identical within the tolerance limit (apart from origin shifts). Is there a paper I could read for reference and can you give some rule of thumb rms intervals for the cases mentioned above? With regards, Dipl. Phys. Jan M. Simons Institute of Crystallography RWTH Aachen University
On 12 Mar 2013, at 16:56, Jan Marten Simons wrote:
Right now I wonder where one would put a cut-off on rms with respect to a) the structures disagree substantially b) the structures have the same scatterers but at completely different positions relative to each other or the symmetry elements c) the structures are quite similar (e.g. you only have to move a single scatterer a little) d) the structures are identical within the tolerance limit (apart from origin shifts).
Is there a paper I could read for reference and can you give some rule of thumb rms intervals for the cases mentioned above?
I don't know of any paper I am afraid. I think that you need to look at the distribution of the distances between the members of each matched pair. Then (c) would be characterised by one of those distance being much bigger than all the others whereas (d) would have a flat distribution near zero. I am not sure I understand what you mean by (b) and (a) on the other hand. Best wishes, Luc
There are a couple of pages on euclidean model matching in one of Ralf's
IUCr Computing Commission Newsletter articles:
http://www.iucr.org/__data/assets/pdf_file/0004/6367/iucrcompcomm_sep2005.pd...
Cheers,
Richard
On 12 March 2013 10:48, Luc Bourhis
On 12 Mar 2013, at 16:56, Jan Marten Simons wrote:
Right now I wonder where one would put a cut-off on rms with respect to a) the structures disagree substantially b) the structures have the same scatterers but at completely different positions relative to each other or the symmetry elements c) the structures are quite similar (e.g. you only have to move a single scatterer a little) d) the structures are identical within the tolerance limit (apart from origin shifts).
Is there a paper I could read for reference and can you give some rule of thumb rms intervals for the cases mentioned above?
I don't know of any paper I am afraid. I think that you need to look at the distribution of the distances between the members of each matched pair. Then (c) would be characterised by one of those distance being much bigger than all the others whereas (d) would have a flat distribution near zero. I am not sure I understand what you mean by (b) and (a) on the other hand.
Best wishes,
Luc
_______________________________________________ cctbxbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/cctbxbb
Am Montag 04 März 2013 16:18:03 schrieb Luc Bourhis:
Hi Jan,
the cctbx does not have a tool doing exactly what you describe. But there is cctbx.euclidean_model_matching, that we all call emma, which could be close enough. It tries to find the best isometry matching the biggest subset of the 1st structure onto the 2nd structure, and since an isometry preserves bond lengths, angles and even torsion angles, perhaps you may find that tool useful. Here is a very quick kickstarter:
Starting from two instances of xray.structure xs1 and xs2 for the two structures you wish to compare, the following is all you need to do
Seems I praised emma a little bit too soon, as this example leads to an exception: ----8<------------------------------------------------------------------ from cctbx import xray from cctbx import crystal from cctbx.array_family import flex from cctbx import euclidean_model_matching as emma xs1=xray.structure( crystal_symmetry=crystal.symmetry( unit_cell=(4.98599, 6.48179, 8.47619, 83, 109, 129), space_group_symbol="P -1"), scatterers=flex.xray_scatterer([ xray.scatterer( #0 label="C1", site=(0.710543, 0.440181, 0.852614), u=0.005000), xray.scatterer( #1 label="C2", site=(0.305308, 0.127783, 0.856840), u=0.005000)])) xs2=xray.structure( crystal_symmetry=crystal.symmetry( unit_cell=(4.98599, 6.48179, 8.47619, 83, 109, 129), space_group_symbol="P -1"), scatterers=flex.xray_scatterer([ xray.scatterer( #0 label="C1", site=(0.199543, 0.159467, 0.002565), u=0.001260), xray.scatterer( #1 label="C2", site=(0.205553, 0.153391, 0.002025), u=0.001260)])) m = emma.model_matches( xs1.as_emma_model(), xs2.as_emma_model(), tolerance=0.5, break_if_match_with_no_singles=False ).refined_matches[0] print(m.rms) ----8<------------------------------------------------------------------ I've got no idea as to why those two (random) structures cannot be compared. Have I found a bug in emma? With regards, Dipl. Phys. Jan M. Simons Institute of Crystallography RWTH Aachen University
On 12 Mar 2013, at 18:19, Jan Marten Simons wrote:
Seems I praised emma a little bit too soon, as this example leads to an exception:
[...] m = emma.model_matches( xs1.as_emma_model(), xs2.as_emma_model(), tolerance=0.5, break_if_match_with_no_singles=False ).refined_matches[0] <<<<<<<<<<<<<<<<<<<<<<<<
You get an "list index out of range" at the point I marked, don't you? That means emma has not found any match. You should always assert that .refined_matches is not empty in production code.
I've got no idea as to why those two (random) structures cannot be compared. Have I found a bug in emma?
In one of the structure the distance between the 2 scatterers is 1.75 whereas it is 0.06 in the other one. There is therefore no isometry that can map xs1 onto xs2 and emma behaves correctly here. Best wishes, Luc
Hi Jan, there is no dedicated tool that would do it in one go, but I think you can put something together from pieces scattered across cctbx and phenix code. Oh and by the way I don't know what is "direct space atomic distances "... First I would superpose two structures using a call like this (copied from phenix/phenix/command_line): from scitbx.math import superpose superpose.least_squares_fit(reference_sites = fixed_sites, other_sites = moving_sites) rmsd = fixed_sites.rms_difference(moving_sites) this assumes structures are aligned. Then just loop over bonds and collect bond distances into flex double array and then compute call .min_max_mean()... Pavel On 3/4/13 4:06 AM, Jan Marten Simons wrote:
Hi,
I think I've seen some code in cctbx which tries to put the similarity of two crystal structures into a number. Am I right on this?
I want to have something similar:
I'm interested in comparing the direct space atomic distances of two structures sharing the same unit cell parameters and space group and scatterers but different atomic coordinates for those. Now I'd like to have a mapping where I could compare the bond lengths between all scatterers which should eliminate any movements which might also be due to a shift in the origin of a P 1 cell. Also I'd like to have a value for mean, minimal and maximal displacements. The purpose of this is to check how good a structure approximation matches a reference structure.
How would you suggest to take on this task? Or is there even some code related to this?
Thanks in advance,
Jan _______________________________________________ cctbxbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/cctbxbb
participants (5)
-
Jan Marten Simons
-
Luc Bourhis
-
Martin Uhrin
-
Pavel Afonine
-
Richard Gildea