Hi James, thanks for the email and for sharing your observations!
Greetings all, and I hope this little observation helps improve things somehow.
I did not expect this result, but there it is. My MolProbity score goes from 0.7 to 1.9 after a run of phenix.geometry_minimization.
I started with an AMBER-minimized model (based on 1aho), and that got me my best MolProbity score so far (0.7). But even with hydrogens and waters removed, the geometry_minimization run increases the clashscore from 0 to 3.1, and Ramachandran favored drops from 98% to 88%, with one residue reaching the outlier level.
It is no secret that the 'standard geometry restraints' used in Phenix and similar programs (Refmac, etc.) are very simplistic. They are not aware of preferred main-chain conformations (the Ramachandran plot) or of favorable side-chain rotamer conformations. They don't even have any electrostatic/attraction terms -- only anti-bumping repulsion! Standard geometry restraints won't favor any NCI (non-covalent interaction) and will likely push interacting atoms apart rather than keep them close together and interacting.
With this in mind, any high-quality (high-resolution) atomic model, or one optimized using sufficiently high-level QM, is going to have more realistic geometry than the result of geometry regularization against a very simplistic restraint target. An example: https://journals.iucr.org/d/issues/2020/12/00/lp5048/lp5048.pdf and previous papers on the topic.
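To make the "only anti-bumping repulsion" point concrete, here is a minimal Python sketch contrasting a repulsion-only penalty with a Lennard-Jones 12-6 term. Both functional forms and all parameters (r0, r_min, weight, epsilon) are illustrative assumptions; they are not the actual nonbonded terms of Phenix, Refmac, or AMBER.

```python
def repulsion_only(r, r0=3.0, weight=1.0):
    """Hypothetical anti-bumping term: zero beyond r0, quartic penalty below it."""
    overlap = max(0.0, r0 - r)
    return weight * overlap ** 4


def lennard_jones(r, r_min=3.0, epsilon=0.2):
    """12-6 potential with its minimum (-epsilon) at r = r_min."""
    x = (r_min / r) ** 6
    return epsilon * (x * x - 2.0 * x)


def force(energy, r, h=1e-5):
    """-dE/dr by central difference; positive values push the atoms apart."""
    return -(energy(r + h) - energy(r - h)) / (2.0 * h)


# Compare the radial force on an atom pair at a few separations (Angstrom).
for r in (2.6, 3.0, 3.4, 4.0):
    print(
        f"r = {r:.1f} A   repulsion-only F = {force(repulsion_only, r):+.4f}"
        f"   Lennard-Jones F = {force(lennard_jones, r):+.4f}"
    )
```

Beyond the contact distance the repulsion-only term exerts no force at all, so nothing in the target holds an interacting pair together, whereas the 12-6 term pulls the pair back toward its minimum.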
Just for comparison, with refmac5 in "refi type ideal" mode I see the MolProbity score rise to 1.13, but the clashscore remains zero; some Ramas go from favored to allowed, but none rise to the level of outliers.
I believe this is because of the nature of the minimizer used. Refmac uses a second-derivative-based minimizer, which in a nutshell means it moves the model much less (just a bit, in the vicinity of a local minimum) than any program that uses gradients only (like Phenix).
Files and logs here: https://bl831.als.lbl.gov/~jamesh/bugreports/phenixmin_070721.tgz
I suspect this might have something to do with library values for main-chain bonds and angles? They do seem to vary between programs, with Phenix having the shortest CA-CA distance by up to 0.08 A. After running thorough minimization on a poly-A peptide I get:

bond    amber   refmac  phenix  shelxl  Stryer
C-N     1.330   1.339   1.331   1.325   1.32
N-CA    1.462   1.482   1.455   1.454   1.47
CA-C    1.542   1.534   1.521   1.546   1.53
CA-CA   3.862   3.874   3.794   3.854
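One way to put the values quoted above side by side is to compute, for each bond, the spread between the largest and smallest value and express it in units of a restraint sigma. The short Python sketch below does that; the sigma of 0.02 A is an assumed round figure for a typical bond restraint, not a value read from any program's library, and CA-CA is reported without a sigma ratio because it is not a directly restrained bond.

```python
SIGMA = 0.02  # Angstrom; assumed typical bond sigma, not taken from any library

# Bond lengths (Angstrom) after minimization of a poly-A peptide, as quoted above.
bonds = {
    "C-N":  {"amber": 1.330, "refmac": 1.339, "phenix": 1.331, "shelxl": 1.325, "Stryer": 1.32},
    "N-CA": {"amber": 1.462, "refmac": 1.482, "phenix": 1.455, "shelxl": 1.454, "Stryer": 1.47},
    "CA-C": {"amber": 1.542, "refmac": 1.534, "phenix": 1.521, "shelxl": 1.546, "Stryer": 1.53},
}

# CA-CA is a 1-3 type distance rather than a bond; Stryer lists no value for it.
ca_ca = {"amber": 3.862, "refmac": 3.874, "phenix": 3.794, "shelxl": 3.854}


def spread(values):
    """Difference between the largest and smallest value across sources."""
    return max(values.values()) - min(values.values())


for name, values in bonds.items():
    s = spread(values)
    print(f"{name:5s} spread = {s:.3f} A  ({s / SIGMA:.2f} x assumed sigma)")

print(f"CA-CA spread = {spread(ca_ca):.3f} A")
```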
So, which one is "right" ?
I'd say they are all the same, within their 'sigmas' which are from memory about 0.02 A:

elbow.where_is_that_cif_file phe

All the best!
Pavel