Re: [cctbxbb] Scons for python3 released
That's 3 weeks from now. If we do a 3 week freeze for every DIALS release, plus for phenix releases etc, cctbx would be trapped in perma-frost :)
Feel free to start on it now - if it causes issues I can always pick a stable point in the past to branch off from.
-Markus
On 10 Oct 2017 6:34 pm, Billy Poon
Hi Markus,
We plan to do a Phenix release in December, so I would just upgrade scons
first and then work on making the build system work for both Python 2 and
3. The steps for using scons 3.0.0 would be,
1) Use Python 2 to build Python 2 version of CCTBX (no work)
2) Use Python 3 to build Python 2 version of CCTBX (some work, some
disentangling required)
3) Use Python 3 to build Python 3 version of CCTBX (more work and will
probably take a while)
This would be done on macOS 10.9-10.12 (and probably 10.13), CentOS 5-7,
and probably Ubuntu 14.04-16.04. I'll need Rob's help with Windows.
--
Billy K. Poon
Research Scientist, Molecular Biophysics and Integrated Bioimaging
Lawrence Berkeley National Laboratory
1 Cyclotron Road, M/S 33R0345
Berkeley, CA 94720
Tel: (510) 486-5709
Fax: (510) 486-5909
Web: https://phenix-online.org
On Tue, Oct 10, 2017 at 9:47 AM,
On 10 Oct 2017 6:34 pm, Billy Poon
wrote: Great! There's a DIALS release on October 31, so I can start making changes after that.
On Tue, Oct 10, 2017 at 1:54 AM, R. D. Oeffner
wrote: http://scons.org/scons-300-is-available.html Perhaps start using this to ease upgrading cctbx to python 3
-- Robert Oeffner, Ph.D. Research Associate, The Read Group Department of Haematology, Cambridge Institute for Medical Research University of Cambridge Cambridge Biomedical Campus Wellcome Trust/MRC Building Hills Road Cambridge CB2 0XY
http://www.cimr.cam.ac.uk/investigators/read/index.html tel: +44(0)1223 763234
_______________________________________________ cctbxbb mailing list http://phenix-online.org/mailman/listinfo/cctbxbb
Hi All,
I spent a little bit of time looking at python3/libtbx so have some input
on this.
On Tue, Oct 10, 2017 at 6:16 PM, Billy Poon
1) Use Python 2 to build Python 2 version of CCTBX (no work)
This might not be as simple as "No Work" - cctbx is a few years behind on SCons versions (libtbx.scons --version suggests 2.2.0, from 2012) so there *might* be other issues upgrading the SCons version to 3.0, before trying python3.
I also feel that SCons-Python3 is something of a red herring - the only thing that non-python3-SCons prevents is a 100% python3-only codebase, and unless the plan is to migrate the entire codebase, including all downstream dependencies (like dials) to python3-only in one massive step (probably impossible), everything would need to be dual 2/3 first, and only then a decision taken on deprecating 2.7 support.
More usefully, outside of a small core of libtbx code, not many of the buildsystem files are bound to the main project so this shouldn't be too difficult. In fact, I've experimented with converting to CMake, and as one of the approaches I explored, I wrote a SCons-emulator that read and parsed the build *without* any scons/cctbx dependencies. To parse the entire "builder=dials" SCons-tree only required this subset of libtbx: https://github.com/ndevenish/read_scons/blob/master/tbx2cmake/import_env.py#L202-L235
(Note: my general CMake-work works but isn't complete/ready/documented for general viewing, and still much resembles a hacky project, but I thought the fact that this was sufficient to decouple the buildsystem is usefully illustrative of how simple the task might be)
Regarding general Python3 conversion, it's definitely not "Just changing the print statements". I undertook a study in August to convert libtbx (being the core that *everything* depends on) to dual python2/3 and IIRC got most of the tests working in python3. It's a couple of months out-of-date, but is probably useful as a benchmark of the effort required. The repository links are:
https://github.com/ndevenish/cctbx_project/tree/py3k-modernize
https://github.com/ndevenish/cctbx_project/tree/py3k
Probably best looked at with a graphical viewer to get a top-down view of the history.
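To illustrate the SCons-emulator idea mentioned above: a build script can be executed against stand-in objects that only record what would be built, so neither SCons nor libtbx has to be importable. This is a hypothetical sketch and not the code in read_scons; RecordingEnv, read_build_script and the example targets are all made up for the example, and only Import, Clone, SharedLibrary and Program mirror names that real SConscript files use.

```python
from __future__ import print_function


class RecordingEnv(object):
    """Stand-in for a build environment; records every builder call."""

    def __init__(self, targets):
        self.targets = targets

    def Clone(self):
        # Clones share the same record so that nothing gets lost.
        return RecordingEnv(self.targets)

    def __getattr__(self, name):
        # Any other attribute (SharedLibrary, Program, ...) is treated as a
        # builder whose call is recorded instead of executed.
        def builder(target, source, **kwargs):
            self.targets.append((name, target, list(source)))
        return builder


def read_build_script(text):
    """Execute build-script text and return the list of recorded targets."""
    targets = []
    exported = {"env": RecordingEnv(targets)}
    namespace = {}

    def Import(*names):
        # Mimic Import() by injecting the names into the script's namespace.
        for name in names:
            namespace[name] = exported[name]

    namespace["Import"] = Import
    exec(text, namespace)
    return targets


EXAMPLE = '''
Import("env")
local = env.Clone()
local.SharedLibrary(target="example_ext", source=["a.cpp", "b.cpp"])
local.Program(target="example_tool", source=["main.cpp"])
'''

targets = read_build_script(EXAMPLE)
for kind, target, source in targets:
    print(kind, target, source)
```

A real SConscript tree uses far more of the SCons API than this, so the sketch only shows why the coupling to the rest of the project can be small.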
My approach was to separate manual/automatic changes as follows:
1. Remove legacy code/modules - e.g. old compatibility. The Optik removal came from this. We don't want to spend mental effort converting absorbed external libraries from a decade ago (see also e.g. pexpect, subprocess_with_fixes)
2. Make some manual fixes [Expanded as we go on]
3. Use futurize and modernize to update idioms ONLY e.g. remove pre-2.7 deprecated ways of working. Each operation was done in a separate commit (so that changes are more visible and I thought people would have less objection than to a massive code-change dump), and each commit ran the test suite for libtbx. Some of the 'fixers' in each tool are complementary. If there are any problems with tests or automatic conversion, then fix the problem, put the fix into step 2, then start again. This step should be entirely scriptable. I had 17 commits for separate fixes in this chain.
This is where the py3k-modernize branch stops, and it should in principle be kept entirely safe to push back onto the python2-only repository. The next steps form the `py3k` branch (not being intended for direct pushing, it is a little less organised - some of my changes could definitely be moved to step 2):
4. Run 'modernize' to convert the codebase to as much python2/3 as possible. This introduces the dependency on 'six'
5. Run tests, implement various fixes, repeat. This work was ongoing when I stopped working on the study.
Various (non-exhaustive) problems found:
- cStringIO isn't handled automatically, so these need to be fixed manually ( e.g. https://github.com/ndevenish/cctbx_project/commit/c793eb58acc37c60360dccbbbdd5205504ec3f1a )
- Iterators needed to be fixed in cases where they were missed (next vs __next__)
- Rounding. Python3 uses 'Bankers Rounding' and there are formatting tests where this changes the output. I didn't know enough about the exact desired result to know the best way to fix this
- libtbx uses compiler.misc.mangle and I don't know why - this was always a private interface and was removed in 3.
- Moving print statements to functions - there were several failed tests relating to the old python2-print-soft-spacing behaviour, which was removed. Not too difficult, but definitely causes
- A couple of text/binary mode file issues, which seemed to be simple but may be more complicated than the test cases covered. I'd expect more issues with this in the format readers though.
I evaluated both the futurize (using the future library) and modernize (using the well known six library) tools, both being different approaches to 2to3, but for dual 2/3 codebases. I liked the approach of futurize to attempt to make code look as python3-idiomatic as possible, but some of the performance implications were slightly opaque: e.g. libtbx makes heavy use of cStringIO (presumably for a good reason), and futurize converted all of these back to using StringIO in the Python2 case, so I settled on modernize as I felt two different compatibility libraries would be messy. In either case, using the library means that you can identify exactly everywhere that needs to be removed when moving to python3 only.
My conclusions:
- Automatic tools are useful for the bulk of changes, but there are still lots of edge cases
- The complexity means that a phased approach is *absolutely* necessary - starting by converting the core to 2/3 and only moving to 3 once everything downstream is converted. Trying to convert everything at once would likely mean months of feature-freeze.
- A separate "Remove legacy" cleaning phase might be very useful, though obviously the domain of this could be endless
- SCons is probably the least important of the conversion worries
Nick
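Three of the problems listed above (cStringIO, next vs __next__, and rounding) can be reproduced in a few lines. This is only a sketch of common dual 2/3 patterns, not necessarily what the linked commits do; Countdown and round_half_up are hypothetical names, and whether "round half up" is the result the libtbx formatting tests actually want is an open question.

```python
from __future__ import print_function

from decimal import Decimal, ROUND_HALF_UP

# 1. StringIO: keep the C implementation on Python 2, fall back to io on 3.
try:
    from cStringIO import StringIO  # Python 2 only
except ImportError:
    from io import StringIO  # Python 3

buf = StringIO()
buf.write("hello")


# 2. Iterators: Python 3 looks up __next__, Python 2 looks up next.
class Countdown(object):
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        self.current -= 1
        return self.current + 1

    next = __next__  # alias so that Python 2 finds the method too


# 3. Rounding: Python 3's round() sends exact ties to the even neighbour.
#    If the old "ties away from zero" result is wanted, decimal can give it.
def round_half_up(value, ndigits=0):
    quantum = Decimal(10) ** -ndigits  # ndigits=2 -> Decimal("0.01")
    return float(Decimal(repr(value)).quantize(quantum, rounding=ROUND_HALF_UP))


print(buf.getvalue())
print(list(Countdown(3)))
# Python 3 prints "2 3.0" here; Python 2 prints "3.0 3.0".
print(round(2.5), round_half_up(2.5))
```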
Hi Nick and others,
That sounds like a great effort. A shame I didn't know about this. I have not had time to look in detail into your work but will nevertheless summarize my thoughts and the work I have been doing lately in an effort to move CCTBX to python3.
I am not sure why it would be a waste of time to use SCons3.0 with python3 as I think you are suggesting. To me it seems a necessary step in creating a codebase that runs on both python2 and python3. Do I understand correctly that as long as CCTBX code is changed to comply with python3 and remains python2 compliant then such a codebase can be used in place of the current python2 only codebase for derived projects such as Dials and Phenix? Assuming this is the case I think it is worth focusing just on CCTBX for now.
My own attempt at porting CCTBX to python3 consists of the following steps:
* Replace Scons2 with Scons3
* Update the subset of Boost sources to version 1.63
* Run futurize stage1 and stage2 on the CCTBX
* Build base components like libtiff, hdf5, python3.6 (+ add-on modules)
* Run bootstrap.py build with Python3.6 repeatedly and provide mock-up fixes to allow the build to continue.
This work is near completion in the sense that the sources now can build but are unlikely to pass tests due to the mock-up fixes, which often consist of replacing PyStringXXX functions with equivalent PyUnicodeXXX, PyBytestringXXX functions regardless of whether that is appropriate or not. These token fixes would also need to be guarded by #if PY_MAJOR_VERSION == 3 ... macros.
The sources are available on https://github.com/cctbx/cctbx_project/tree/Python3
The next steps are less well defined. One approach would be to set up a build system that migrates python2 code to python3 using the futurize script, then builds CCTBX and runs tests and presents build log files online as in http://cci-vm-6.lbl.gov:8010/one_line_per_build. With a hook to GitHub this could also be done on the fly as people commit code to CCTBX. This should encourage people to write code that runs on python2 as well as python3. Eventually once all tests for CCTBX pass we are done and can merge this codebase into the master branch.
Robert
On 17/10/2017 11:56, Nicholas Devenish wrote:
Hi Robert
I think having more than one person independently look at the p3 problem is no bad thing - also with the geography it would seem perfectly possible for you / Nick to meet up and compare notes on this - it’s certainly something I would support.
Clearly there are a lot of things which could get caught up in the net with the p3 update - for example build system discussions, cleaning out cruft that is in there to support python 2.3 etc… however I did not read that Nick thought SCons3 was a waste of time - I think he was getting at the point that this is part of the move, and that there is also a lot of related work. Also that having p2 / p3 support for at least a transition rather than the “full Brexit” of no longer supporting p2 beyond the first date where p3 works would be good. I could imagine this transition period being O(1000 days) even.
I think the migration process is going to be a complex one, but doable. One thing I think we do need is to make sure that the code base as pushed by developers remains compatible with p2 and p3 - so perhaps extending find_clutter to check for things which only work in one or the other? Then developers would learn the tricks themselves and (ideally) not push p2-only or p3-only code, at least until post-transition. This I would compare with the svn to git move, which caused some grumbling and a little confusion but was ultimately successful…
Hope this is constructive, cheerio
Graeme
On 17 Oct 2017, at 13:50, R.D. Oeffner
Hi,
Just to add to this. I think Graeme's find_clutter idea has merit, which could certainly check for
from __future__ import absolute_import, print_function
which would cover a lot of ground.
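A check of that kind could look roughly like the sketch below. It is standalone and does not use the real find_clutter code; missing_future_imports is a hypothetical helper, and it assumes the feature names absolute_import and print_function (the __future__ module has no feature called print_statement). A regular expression is used rather than the ast module so that files which are still Python 2 only can be scanned from either interpreter.

```python
from __future__ import print_function

import re

# Matches single-line "from __future__ import a, b" statements.
FUTURE_IMPORT = re.compile(r"^from\s+__future__\s+import\s+(.+)$", re.MULTILINE)


def missing_future_imports(source, required=("absolute_import", "print_function")):
    """Return the required __future__ features that source does not import."""
    found = set()
    for match in FUTURE_IMPORT.finditer(source):
        names = match.group(1).split("#")[0]  # drop a trailing comment
        for name in names.replace("(", "").replace(")", "").split(","):
            if name.strip():
                found.add(name.strip())
    return sorted(set(required) - found)


good = "from __future__ import absolute_import, print_function\nprint('ok')\n"
bad = "from __future__ import division\nprint 'only valid on Python 2'\n"

print(missing_future_imports(good))  # prints []
print(missing_future_imports(bad))   # prints ['absolute_import', 'print_function']
```

Imports that are wrapped over several lines inside parentheses are not handled by this sketch.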
I also found this worthwhile to read through: http://python-future.org/compatible_idioms.html
Especially handling things like xrange vs range should be done with a bit of thinking and when ancient code needs to be touched then it also presents an opportunity to make it clearer what it actually does. For example, the very first commit on the Python3 branch changed xrange->range here, and I wondered... https://github.com/cctbx/cctbx_project/commit/f10fd505841de372098bca83c40fc62211439f86 (untested)
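To make the xrange point concrete, here is a small sketch that is unrelated to the code in the linked commit. It assumes nothing beyond the standard library; the values list is made up. Rebinding range at module level is one way to keep lazy behaviour on Python 2, and the six and future libraries offer equivalents (from six.moves import range, from builtins import range).

```python
from __future__ import print_function

try:
    range = xrange  # Python 2: make range lazy, as it already is on Python 3
except NameError:
    pass  # Python 3: xrange does not exist and range is lazy

values = [0.5, 1.5, 2.5]

# Index-based loop as often found in older code.
scaled_old = []
for i in range(len(values)):
    scaled_old.append(values[i] * 2)

# The same thing without indices, which sidesteps range/xrange altogether.
scaled_new = [v * 2 for v in values]

# Where a real list of indices is needed, ask for one explicitly.
indices = list(range(len(values)))
indices.reverse()

print(scaled_old == scaled_new, indices)  # prints: True [2, 1, 0]
```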
Finally, I think doing all 2-3-compatible conversions in a branch, for example print -> print() as it is happening now, will be a nightmare to merge later because you will be touching large portions of a large number of files, but other development does not stop. And, let's be honest, nobody will review a 100k LoC change set anyway.
I would suggest we do those refactoring changes directly on master. A single type of change (i.e. print->print()) on a per-file basis along with "from __future__ import print_function" in say 30 files within one directory tree per commit? Much more manageable.
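For reference, one such per-file change might look like the sketch below, with the Python 2 statement forms kept as comments. The report function and the Collector class are hypothetical and only there to make the example self-contained; the __future__ feature is spelled print_function.

```python
from __future__ import print_function

import sys


class Collector(object):
    """Minimal file-like object: print() only needs a write() method."""

    def __init__(self):
        self.chunks = []

    def write(self, text):
        self.chunks.append(text)


def report(values, out=sys.stdout):
    # Python 2 statement:  print >> out, "n =", len(values)
    print("n =", len(values), file=out)
    # Python 2 statement:  print >> out, "values:",   (trailing comma)
    print("values:", end=" ", file=out)
    # Python 2 statement:  print >> out, ", ".join(...)
    print(", ".join(str(v) for v in values), file=out)


collector = Collector()
report([1, 2, 3], out=collector)
text = "".join(collector.chunks)
print(text, end="")
```

The trailing-comma form is where the old soft-space behaviour shows up, so those lines are the ones worth checking by hand after an automatic conversion.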
Oh, and we need the future module installed from bootstrap.
-Markus
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of [email protected]
Sent: 17 October 2017 15:20
To: [email protected]
Subject: Re: [cctbxbb] Scons for python3 released
Hi all,
I agree with everything and think that the Python 2/3 approach discussed
during the conference call and this message thread is the least disruptive
approach. And some additional points about testing in no particular order,
1) Updating SCons - I also tried building with SCons 3.0 and I just needed
to update some SConscript files. I only did this on CentOS 6, but I'll test
other operating systems before checking it in. I think anything affecting
building should be tested across multiple operating systems before check-in.
2) I think updating Python files to make them 2-3 compatible can be done by
modules. In theory, they're self-contained, so I agree that fixing things
in parts is more manageable. Since these fixes only affect Python files,
there should be fewer OS-specific issues, so we can catch those in the
nightly builds.
3) C++ changes - Folks in Berkeley can coordinate on this since we have
several people who know C++. We can also collaborate with the Diamond folks
on a separate branch. These fixes would need to be tested across multiple
OSes and compilers. Also, I would like these changes to be C++11 compliant.
Currently, CCTBX and Phenix (including DIALS) can be built with the C++11
standard, so let's keep it that way.
4) Boost - I also tested Boost 1.63 and there seem to be several
Boost-related test errors. These would need to be sorted out before
upgrading. Is there a particular feature in 1.65 that is very useful? Since
Boost is so fundamental, I would be more conservative on making changes to
it. Again, this would need to be tested on multiple OSes and compilers.
5) Reorganizing code - I support this as well, but we have to be careful
that we do not remove functionality. Also, if code is reorganized,
documentation should be added so that the Sphinx documentation can be more
complete.
Since we plan on a Phenix release in December, let's work on the Boost and
C++11 changes in January, but Python changes can go ahead as long as
nothing new breaks.
--
Billy K. Poon
On Wed, Oct 18, 2017 at 4:12 AM,
Hi,
Just to add to this. I think Graeme's find_clutter idea has merit, which could certainly check for from __future__ import absolute_import, print_statement which would cover a lot of ground.
I found this also a worthwhile to read through: http://python-future.org/ compatible_idioms.html Especially handling things like xrange vs range should be done with a bit of thinking and when ancient code needs to be touched then it also presents an opportunity to make it clearer what it actually does. For example, the very first commit on the Python3 branch changed xrange->range here, and I wondered... https://github.com/cctbx/cctbx_project/commit/ f10fd505841de372098bca83c40fc62211439f86 (untested)
Finally, I think doing all 2-3-compatible conversions in a branch, for example print -> print() as it is happening now, will be a nightmare to merge later because you will be touching large portions of a large numbers of files, but other development does not stop. And, let's be honest, nobody will review a 100k LoC change set anyway. I would suggest we do those refactoring changes directly on master. A single type of change (ie. print->print()) on a per-file basis along with "from __future__ import print_statement" in say 30 files within one directory tree per commit? Much more manageable.
Oh, and we need the future module installed from bootstrap.
-Markus
-----Original Message----- From: [email protected] [mailto:cctbxbb-bounces@ phenix-online.org] On Behalf Of [email protected] Sent: 17 October 2017 15:20 To: [email protected] Subject: Re: [cctbxbb] Scons for python3 released
Hi Robert
I think having more than one person independently look at the p3 problem is no bad thing - also with the geography it would seem perfectly possible for you / Nick to meet up and compare notes on this - it’s certainly something I would support.
Clearly there are a lot of things which could get caught up in the net with the p3 update - for example build system discussions, cleaning out cruft that is in there to support python 2.3 etc… however I did not read that Nick thought SCons3 was a waste of time - I think he was getting at the point that this is part of the move, and that there is also a lot of related work. Also that having p2 / p3 support for at least a transition rather than the “full Brexit” of no longer supporting p2 beyond the first date where p3 works would be good. I could imagine this transition period being O(1000 days) even.
I think the migration process is going to be a complex one, but doable. One thing I think we do need is to make sure that the code base as pushed by developers remains compatible with p2 and p3 - so perhaps extending find_clutter to check for things which only work in one or the other? Then developers would learn the tricks themselves and (ideally) not push p2-only or p3-only code, at least until post-transition. This I would compare with the svn to git move, which caused some grumbling and a little confusion but was ultimately successful…
Hope this is constructive, cheerio
Graeme
On 17 Oct 2017, at 13:50, R.D. Oeffner
> wrote: Hi Nick and others,
That sounds like a great effort. A shame I didn't know about this. I have not had time to look in detail into your work but will nevertheless summarize my thoughts and work I have been doing lately in an effort to move CCTBX to python3.
I am not sure why it would be a waste of time to use SCons3.0 with python3 as I think you are suggesting. To me it seems as a necessary step in creating a codebase that runs both on python2 and python3. Do I understand correctly that as long as CCTBX code is changed to comply with python3 and remain python2 compliant then such a codebase can be used in place of the current python2 only codebase for derived projects such as Dials and Phenix? Assuming this is the case I think it is worth focusing just on CCTBX only for now.
My own attempt in porting CCTBX to python3 constitutes of the following steps: * Replace Scons2 with Scons3 * Update the subset of Boost sources to version 1.63 * Run futurize stage1 and stage2 on the CCTBX * Build base components like libtiff, hdf5, python3.6 + add-on modules) * Run bootstrap.py build with Python3.6 repeatedly and provide mock-up fixes to allow the build to continue.
This work is almost near completion in the sense that the sources now can build but are unlikely to pass test due to the mock-up fixes which often constitutes of replacement of PyStringXXX functions with equivalent PyUnicodeXXX, PyBytestringXXX functions ignoring whether that is appropriate or not. These token fixes would also need to be guarded by #if PY_MAJOR_VERSION == 3 ... macros.
The sources are available on https://github.com/cctbx/ cctbx_project/tree/Python3
The next steps are less well defined. One approach would be to set up a build system that migrates python2 code to python3 using the futurize script, then builds CCTBX and runs test and presents build log files online as in http://cci-vm-6.lbl.gov:8010/one_line_per_build. With a hook to GitHub this could also be done on the fly as people commit code to CCTBX. This should encourage people to write code that runs on python2 as well as python3. Eventually once all tests for CCTBX pass we are done and can merge this codebase into the master branch.
Robert
On 17/10/2017 11:56, Nicholas Devenish wrote: Hi All, I spent a little bit of time looking at python3/libtbx so have some input on this. On Tue, Oct 10, 2017 at 6:16 PM, Billy Poon
> wrote: 1) Use Python 2 to build Python 2 version of CCTBX (no work) This might not be as simple as "No Work" - cctbx is a few years behind on SCons versions (libtbx.scons --version suggests 2.2.0, from 2012) so there *might* be other issues upgrading the SCons version to 3.0, before trying python3. I also feel that SCons-Python3 is something of a red herring - the only thing that non-python3-SCons prevents is an 100% python3-only codebase, and unless the plan is to migrate the entire codebase, including all downstream dependencies (like dials) to python3-only in one massive step (probably impossible), everything would need to be dual 2/3 first, and only then a decision taken on deprecating 2.7 support. More usefully, outside of a small core of libtbx code, not much of the buildsystem files are bound to the main project so this shouldn't be too difficult. In fact, I've experimented with converting to CMake, and as one of the approaches I explored, I wrote a SCons-emulator that read and parsed the build *without* any scons/cctbx dependencies. To parse the entire "builder=dials" SCons-tree only required this subset of libtbx: https://github.com/ndevenish/read_scons/blob/master/ tbx2cmake/import_env.py#L202-L235 [1] (Note: my general CMake-work works but isn't complete/ready/documented for general viewing, and still much resembles a hacky project, but I thought that this was sufficient to decouple the buildsystem is usefully illustrative of how simple the task might be) Regarding general Python3 conversion, it's definitely not "Just changing the print statements". I undertook a study in august to convert libtbx (being the core that *everything* depends on) to dual python2/3 and IIRC got most of the tests working in python3. It's a couple of months out-of-date, but is probably useful as a benchmark of the effort required. 
The repository links are:

https://github.com/ndevenish/cctbx_project/tree/py3k-modernize [2]
https://github.com/ndevenish/cctbx_project/tree/py3k [3]

Probably best looked at with a graphical viewer to get a top-down view of the history. My approach was to separate manual/automatic changes as follows:

1. Remove legacy code/modules - e.g. old compatibility. The Optik removal came from this. We don't want to spend mental effort converting absorbed external libraries from a decade ago (see also e.g. pexpect, subprocess_with_fixes)
2. Make some manual fixes [Expanded as we go on]
3. Use futurize and modernize to update idioms ONLY, e.g. remove pre-2.7 deprecated ways of working. Each operation was done in a separate commit (so that changes are more visible, and I thought people would have less objection than to a massive code-change dump), and each commit ran the test suite for libtbx. Some of the 'fixers' in each tool are complementary. If there are any problems with tests or automatic conversion, then fix the problem, put the fix into step 2, then start again. This step should be entirely scriptable. I had 17 commits for separate fixes in this chain.

This is where the py3k-modernize branch stops, and it should in principle be kept entirely safe to push back onto the python2-only repository. The next steps form the `py3k` branch (which, not being intended for direct pushing, is a little less organised - some of my changes could definitely be moved to step 2):

4. Run 'modernize' to convert the codebase to as much python2/3 as possible. This introduces the dependency on 'six'
5. Run tests, implement various fixes, repeat. This work was ongoing when I stopped working on the study.

Various (non-exhaustive) problems found:

- cStringIO isn't handled automatically, so these need to be fixed manually (e.g. https://github.com/ndevenish/cctbx_project/commit/c793eb58acc37c60360dccbbbdd5205504ec3f1a [4])
- Iterators needed to be fixed in cases where they were missed (next vs __next__)
- Rounding. Python3 uses 'banker's rounding' and there are formatting tests where this changes the output. I didn't know enough about the exact desired result to know the best way to fix this
- libtbx uses compiler.misc.mangle and I don't know why - this was always a private interface and was removed in 3.
- Moving print statements to functions - there were several failed tests relating to the old python2-print-soft-spacing behaviour, which was removed. Not too difficult, but definitely causes
- A couple of text/binary mode file issues, which seemed to be simple but may be more complicated than the test cases covered. I'd expect more issues with this in the format readers though.

I evaluated both the futurize (using the future library) and modernize (using the well-known six library) tools, both being different approaches to 2to3, but for dual 2/3 codebases. I liked the approach of futurize to attempt to make code look as python3-idiomatic as possible, but some of the performance implications were slightly opaque: e.g. libtbx makes heavy use of cStringIO (presumably for a good reason), and futurize converted all of these back to using StringIO in the Python2 case, so I settled on modernize, as I felt two different compatibility libraries would be messy. In either case, using the library means that you can identify exactly everywhere that needs to be removed when moving to python3 only.

My conclusions:

- Automatic tools are useful for the bulk of changes, but there are still lots of edge cases
- The complexity means that a phased approach is *absolutely* necessary - starting by converting the core to 2/3 and only moving to 3 once everything downstream is converted. Trying to convert everything at once would likely mean months of feature-freeze.
- A separate "Remove legacy" cleaning phase might be very useful, though obviously the domain of this could be endless
- SCons is probably the least important of the conversion worries

Nick

Links:
------
[1] https://github.com/ndevenish/read_scons/blob/master/tbx2cmake/import_env.py#L202-L235
[2] https://github.com/ndevenish/cctbx_project/tree/py3k-modernize
[3] https://github.com/ndevenish/cctbx_project/tree/py3k
[4] https://github.com/ndevenish/cctbx_project/commit/c793eb58acc37c60360dccbbbdd5205504ec3f1a
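As a rough illustration of three of the problems listed in the message above (cStringIO, next vs __next__, and rounding), a minimal dual Python 2/3 sketch might look like the following. The names Countdown and round_half_up are made up for this example and are not taken from libtbx or from the linked branches; the try/except import is one common pattern and is not necessarily what commit [4] does.

```python
# Illustrative only: Countdown and round_half_up are hypothetical names,
# not part of libtbx or of the branches discussed in this thread.
from decimal import Decimal, ROUND_HALF_UP

# cStringIO exists only in Python 2; fall back to io.StringIO on Python 3.
try:
    from cStringIO import StringIO
except ImportError:
    from io import StringIO


class Countdown(object):
    """Iterator usable from both Python 2 and Python 3."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):  # name looked up by Python 3
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__  # name looked up by Python 2


def round_half_up(value, ndigits=0):
    """Round ties away from zero, independent of the interpreter version."""
    quantum = Decimal(1).scaleb(-ndigits)  # e.g. ndigits=2 -> Decimal("0.01")
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP))


buf = StringIO()
buf.write("remaining: %s" % list(Countdown(3)))
```

For comparison, the built-in round(2.5) returns 3.0 under Python 2 but 2 under Python 3, whereas round_half_up(2.5) returns 3.0 under both. Whether round-half-up is actually the desired result for the affected formatting tests is the open question raised in the message, so this helper is only one possible choice.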
_______________________________________________ cctbxbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/cctbxbb
participants (5)
- Billy Poon
- Graeme.Winter@diamond.ac.uk
- markus.gerstel@diamond.ac.uk
- Nicholas Devenish
- R.D. Oeffner