On Tue, Aug 12, 2014 at 4:14 PM, Michael Thompson wrote:
I'm interested in doing some comparisons of random half data sets, inspired by statistics like CC1/2, CC*, etc. Does phenix contain some tool to split unmerged data into 2 random sets? Thought I would ask before trying to write my own script.
Didn't see this before I replied off-list to your ccp4bb post, but for the record: you can do this very easily with CCTBX, for instance:

    from iotbx.file_reader import any_file
    from cctbx import miller
    from scitbx.array_family import flex

    keep_friedel_pairs_separate = False  # set to True to keep I+ and I- apart
    hkl_in = any_file("data.hkl")
    # as_miller_arrays() returns a list of arrays; this assumes the first
    # one holds the unmerged intensities
    i_obs = hkl_in.file_object.as_miller_arrays(merge_equivalents=False)[0]
    i_obs = i_obs.select(i_obs.sigmas() > 0)  # filter out bad sigmas
    if (not keep_friedel_pairs_separate) :
      i_obs = i_obs.as_non_anomalous_array().map_to_asu()
    split_datasets = miller.split_unmerged(
      unmerged_indices=i_obs.indices(),
      unmerged_data=i_obs.data(),
      unmerged_sigmas=i_obs.sigmas())
    data_1 = split_datasets.data_1
    data_2 = split_datasets.data_2
    cc = flex.linear_correlation(data_1, data_2).coefficient()

(Note that if you use an unmerged Scalepack file, you may need to supply the unit cell information separately since the format is broken.)

Note that this is already built in to the Miller array class, i.e. this would work:

    cc = i_obs.cc_one_half(anomalous_flag=keep_friedel_pairs_separate)

or this:

    cc_anom = i_obs.cc_anom()

although it is somewhat inefficient to call these repeatedly, since each call redoes the splitting.

-Nat
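For anyone who wants to see what the random half-set comparison amounts to without CCTBX installed, here is a minimal plain-Python sketch. It is not the CCTBX implementation: the function names split_half_means and pearson_cc are made up for illustration, the half-set means are unweighted (sigmas are ignored), and the input is assumed to be a list of (hkl, intensity) pairs already mapped to the asymmetric unit.

```python
import random
from collections import defaultdict
from math import sqrt


def split_half_means(observations, rng):
    """Randomly split each reflection's observations into two halves.

    observations: iterable of (hkl, intensity) pairs (hypothetical layout).
    Returns two lists of unweighted half-set mean intensities, one entry
    per unique hkl that has at least two observations.
    """
    by_hkl = defaultdict(list)
    for hkl, intensity in observations:
        by_hkl[hkl].append(intensity)
    half_1, half_2 = [], []
    for hkl in sorted(by_hkl):
        values = by_hkl[hkl]
        if len(values) < 2:
            continue  # a single observation cannot be split
        rng.shuffle(values)  # random assignment to the two halves
        mid = len(values) // 2
        half_1.append(sum(values[:mid]) / mid)
        half_2.append(sum(values[mid:]) / (len(values) - mid))
    return half_1, half_2


def pearson_cc(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)
```

Calling split_half_means(obs, random.Random(seed)) and passing the two returned lists to pearson_cc gives a CC1/2-style number for one random split; repeating with different seeds shows how much the value depends on the particular split.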