Hi Rob,

I think this is true … sometimes. It sets up the qsub every time, but does not always use it - at least it works on my MacBook with no qsub ;-)

That said, the question remains why exception reports are bad for parallel map… we *are* using preserve_exception_message…

Cheers,
Graeme

On 3 Apr 2018, at 13:20, Dr. Robert Oeffner wrote:

Hi Graeme,
I just had a look at the code in dials/util/mp.py. It seems that you are using parallel_map() on a cluster via qsub. Unfortunately, multi_core_run() is not designed for that: it only runs on a single multi-core PC.
Sorry,
Rob
On 03/04/2018 12:44, [email protected] wrote:
Thanks Rob,

I could not dig out the thread (and the mail list thing does not have search that I could find).

I’ll talk to the crew about swapping this out for dials.* - though is possibly quite a big change?

Cheers,
Graeme

On 3 Apr 2018, at 12:26, Dr. Robert Oeffner
<[email protected]> wrote:

Hi Graeme,

I recall we've been here before:
http://phenix-online.org/pipermail/cctbxbb/2017-December/001807.html

I believe the solution is to use easy_mp.multi_core_run() instead of easy_mp.parallel_map(). The first function preserves the stack traces of individual processes, unlike easy_mp.parallel_map().

Regards,
Rob

On 03/04/2018 07:16, [email protected] wrote:

Folks,

Following up on user reports again of errors within easy_mp - all that gets logged is “something went wrong”, i.e.

Using multiprocessing with 10 parallel job(s)

Traceback (most recent call last):
  File "/home/user/bin/dials-installer/build/../modules/dials/command_line/integrate.py", line 613, in <module>
    halraiser(e)
  File "/home/user/bin/dials-installer/build/../modules/dials/command_line/integrate.py", line 611, in <module>
    script.run()
  File "/home/user/bin/dials-installer/build/../modules/dials/command_line/integrate.py", line 341, in run
    reflections = integrator.integrate()
  File "/home/user/bin/dials-installer/modules/dials/algorithms/integration/integrator.py", line 1214, in integrate
    self.reflections, _, time_info = processor.process()
  File "/home/user/bin/dials-installer/modules/dials/algorithms/integration/processor.py", line 271, in process
    preserve_exception_message = True)
  File "/home/user/bin/dials-installer/modules/dials/util/mp.py", line 171, in multi_node_parallel_map
    preserve_exception_message = preserve_exception_message)
  File "/home/user/bin/dials-installer/modules/dials/util/mp.py", line 53, in parallel_map
    preserve_exception_message = preserve_exception_message)
  File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/easy_mp.py", line 627, in parallel_map
    result = res()
  File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/result.py", line 119, in __call__
    self.traceback( exception = self.exception() )
  File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 115, in __call__
    self.raise_handler( exception = exception )
  File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/mainthread.py", line 100, in poll
    value = target( *args, **kwargs )
  File "/home/user/bin/dials-installer/modules/dials/util/mp.py", line 91, in __call__
    preserve_exception_message = self.preserve_exception_message)
  File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/easy_mp.py", line 627, in parallel_map
    result = res()
  File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/result.py", line 119, in __call__
    self.traceback( exception = self.exception() )
  File "/home/user/bin/dials-installer/modules/cctbx_project/libtbx/scheduling/stacktrace.py", line 86, in __call__
    raise exception
RuntimeError: Please report this error to [email protected]: exit code = -9

I forget why it was decided that keeping the proper stack trace was a bad thing, but could this be revisited? It would greatly help to see it in the output of the program (if, as is the case here, I do not have the user data).

My email-fu is not strong enough to dig out the previous conversation.

Cheers,
Graeme

--
Robert Oeffner, Ph.D.
Research Associate, The Read Group
Department of Haematology, Cambridge Institute for Medical Research
University of Cambridge
Cambridge Biomedical Campus
Wellcome Trust/MRC Building
Hills Road
Cambridge CB2 0XY
www.cimr.cam.ac.uk/investigators/read/index.html
tel: +44(0)1223 763234
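
A note on the final line of the traceback above: if libtbx reports the worker's exit status using Python's usual convention (an assumption here, not something confirmed in the thread), a negative value -N means the child process was terminated by signal N, so -9 would be SIGKILL. A process killed that way never gets the chance to raise a Python exception, so in that case there may be no worker stack trace to preserve at all. Below is a minimal Python 3 sketch of decoding such codes with the standard library only; describe_exit_code is a hypothetical helper name, not part of libtbx or dials.

```python
import signal
import subprocess
import sys


def describe_exit_code(code):
    """Describe a child's exit code (hypothetical helper, not a libtbx API).

    Python's subprocess and multiprocessing modules report a child that
    was terminated by signal N as exit code -N on POSIX systems.
    """
    if code is None:
        return "still running"
    if code < 0:
        try:
            name = signal.Signals(-code).name
        except ValueError:
            # Signal number not known on this platform.
            name = "unknown signal"
        return "killed by signal %d (%s)" % (-code, name)
    if code == 0:
        return "exited normally"
    return "exited with status %d" % code


if __name__ == "__main__":
    # Start a child that just sleeps, then kill it the hard way.
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    child.kill()  # sends SIGKILL on POSIX
    child.wait()
    print(describe_exit_code(child.returncode))
```

On Linux or macOS the demo at the bottom should report signal 9; on other platforms the numbers may differ.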
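
For ordinary Python exceptions raised inside a worker, the request in this thread (seeing the worker's own stack trace in the parent's output) can be illustrated with a generic pattern: format the traceback inside the worker, where the frames still exist, and send the text back with the result. The sketch below uses only the Python 3 standard library; it is not how easy_mp.parallel_map() or easy_mp.multi_core_run() are implemented, and the names call_with_traceback, map_with_tracebacks and reciprocal are made up for illustration.

```python
import multiprocessing
import traceback


def call_with_traceback(func, arg):
    """Run func(arg); return ("ok", result) or ("error", traceback text)."""
    try:
        return ("ok", func(arg))
    except Exception:
        # Format the traceback here, in the worker, while the frames exist.
        return ("error", traceback.format_exc())


def _worker(packed):
    # Pool.map passes a single argument, so unpack (func, arg) here.
    func, arg = packed
    return call_with_traceback(func, arg)


def map_with_tracebacks(func, args, processes=2):
    """Map func over args in a process pool.

    If any call failed, raise RuntimeError carrying the worker's traceback.
    func must be picklable, i.e. defined at module top level.
    """
    with multiprocessing.Pool(processes) as pool:
        outcomes = pool.map(_worker, [(func, a) for a in args])
    results = []
    for status, payload in outcomes:
        if status == "error":
            raise RuntimeError("worker failed:\n" + payload)
        results.append(payload)
    return results


def reciprocal(x):
    """Toy job that fails for x == 0."""
    return 1.0 / x


if __name__ == "__main__":
    print(map_with_tracebacks(reciprocal, [1, 2, 4]))
    try:
        map_with_tracebacks(reciprocal, [1, 0])
    except RuntimeError as err:
        # The message includes the ZeroDivisionError frames from the worker.
        print(err)
```

This only helps when the worker survives long enough to catch the exception; a worker that is killed by a signal returns nothing to format.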