parallel phenix.phaser on SGE cluster
Hello,
I am trying to run Phaser on an SGE cluster (qsub) using multiple processors (either phenix.phaser from the command line or Phaser-MR from the GUI), but the jobs do not parallelize. When I run Phaser-MR locally (same executable) it does parallelize (multiple python threads).
Any help or advice would be welcome.
Best regards, Lionel
Lionel,

It's not quite as easy as one would hope, and it has nothing to do with Phenix or Phaser. We make extensive use of multiprocessing across our cluster in our RAPD software, which is used at the beamline to speed up results for the user.

I set this up a while ago, but from what I remember, you first have to set up a 'parallel environment' (PE) in SGE. The options may be different depending on your version of SGE. We use 6.2u4. There are probably default PEs set up already, but they might not have the correct parameters. I created a new one called 'smp' with the number of 'slots' set to the number of cores in the cluster, or some other lower limit. (If you set 'slots' to 12 and you submit 5 jobs requiring 4 slots each, only three will run until one has finished and its slots are free again.) The 'allocation_rule' is set to '$pe_slots' so that a job can only use cores on a single node. There are other rules that might be better for what you want to do.

In our case, I set up different queues with different priorities that have access to specific PEs, depending on the jobs that are being submitted at the beamline. Your setup may not need this complexity. I would read through the huge SGE manual for details or search Oracle's website.

When you submit the job, make sure you add 'qsub ... -pe smp 1-4 ...', which tells SGE that your job needs 1-4 cores on a single node. You could also specify a single integer (4 instead of 1-4) to request exactly 4 slots. Obviously, you can modify these to suit your needs. After you submit the job, run 'qstat' and look at the last column, labeled 'slots', to see how many slots are reserved for the job. In your Phaser command, include 'JOBS 4' to match the number of slots you requested.

I am not sure how much this speeds up a single Phaser job, because there isn't a whole lot of code to parallelize in MR. Randy Read mentioned (either to me or on the BB, I don't remember) that everything in Phaser that could be parallelized already is.
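
The slot arithmetic and the matching of '-pe smp N' to 'JOBS N' described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names and the job script name 'run_phaser.sh' are hypothetical, the PE name 'smp' is the one Jon created on his own cluster, and nothing here is submitted to SGE.

```python
import shlex


def max_concurrent_jobs(pe_slots: int, slots_per_job: int) -> int:
    """How many jobs can hold slots in the PE at the same time."""
    if pe_slots < 0 or slots_per_job <= 0:
        raise ValueError("pe_slots must be >= 0 and slots_per_job must be > 0")
    return pe_slots // slots_per_job


def build_submission(slots: int, script: str, pe_name: str = "smp"):
    """Return (qsub argument list, Phaser keyword line) for one job.

    The same number is used for both, so the job never starts more
    workers than the slots it asked SGE for.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    qsub_args = ["qsub", "-pe", pe_name, str(slots), script]
    phaser_keyword = f"JOBS {slots}"
    return qsub_args, phaser_keyword


if __name__ == "__main__":
    # Jon's example: a PE with 12 slots and jobs asking for 4 slots each.
    print(max_concurrent_jobs(12, 4))  # 3

    # 'run_phaser.sh' is a hypothetical job script name.
    args, keyword = build_submission(4, "run_phaser.sh")
    print(shlex.join(args))  # qsub -pe smp 4 run_phaser.sh
    print(keyword)           # JOBS 4
```

Deriving both values from one number is the point of the sketch: if the slot request and the 'JOBS' setting drift apart, the job either wastes reserved slots or uses more cores than SGE accounted for.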
On a side note, if you write code in Python and start a new multiprocessing.Process(), it will automatically be launched on another core of the same node. You have to account for this when you request a specific number of slots at job submission, otherwise you could overload your cluster pretty quickly. Many programs have an optimal number of slots to request; requesting more will not make them run any faster, but it will limit the resources available to other jobs on the cluster. I assume Phaser is one of these programs.

Jon
--
Jonathan P. Schuermann, Ph.D.
Beamline Scientist, NE-CAT
Argonne National Laboratory, 436E
9700 S. Cass Ave.
Argonne, IL 60439
Email: [email protected]
Tel: (630) 252-0682
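
Jon's side note about Python multiprocessing and slot requests could look roughly like this in practice. It is a minimal sketch, not code from RAPD or Phenix: the helper name 'allowed_workers' is hypothetical, and it relies on SGE exporting the number of granted slots to the job as the NSLOTS environment variable. Outside SGE, where NSLOTS is unset, it falls back to a single worker.

```python
import multiprocessing
import os


def allowed_workers(requested: int, environ=None) -> int:
    """Cap the worker count at the number of slots SGE granted.

    NSLOTS is read from the environment; a missing or malformed
    value is treated as one slot.
    """
    env = os.environ if environ is None else environ
    try:
        granted = int(env.get("NSLOTS", "1"))
    except ValueError:
        granted = 1
    return max(1, min(requested, granted))


if __name__ == "__main__":
    # Ask for up to 4 workers, but never more than the granted slots.
    n_workers = allowed_workers(requested=4)
    with multiprocessing.Pool(processes=n_workers) as pool:
        # abs() stands in for real work in this illustration.
        print(pool.map(abs, [-3, -2, -1, 0]))
```

Capping the pool size this way keeps a job submitted with '-pe smp 2' from quietly occupying four cores.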
_______________________________________________ phenixbb mailing list [email protected] http://phenix-online.org/mailman/listinfo/phenixbb
On Wed, Jul 10, 2013 at 7:16 AM, L. Costenaro (IBB) wrote:
I am trying to run phaser in a SGE cluster (qsub) using multiple proc (either phenix.phaser from the command line or phaser-MR from the GUI), but the jobs do not parallelize. When I run the phaser-MR locally (same executable) it does parallelize (multiple python threads).
I'm a little confused - what parameters are you using and what behavior do you expect? As distributed, Phaser itself (i.e. the Phaser-MR or Phaser-EP GUIs, or phenix.phaser) will not run in parallel, regardless of whether you're running the jobs on a cluster or locally. If you want to use the OpenMP compiler extensions to parallelize single jobs, you need to build from source *without the Phenix GUI*, because OpenMP does not play nicely with Python multiprocessing (or vice-versa, I guess). I'm not sure what makes you think it's being run in parallel locally, but I suspect you're being confused by the fact that executing a job from the GUI starts a separate process.

The MR programs that run in parallel are a) phaser.MRage, Gabor's automated system which is designed to try many different models (including different model preparation protocols), and b) my somewhat obsolete "parallel Phaser" GUI, which is a much more minimalist approach. These can both execute on a cluster, but the individual jobs will still be single-processor. (Currently both are still flagged as "alpha" features in the GUI, although MRage will hopefully be a fully supported program in the near future.)

-Nat
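
The pattern Nat describes, many independent single-processor jobs rather than one multi-core job, can be illustrated with a short sketch that prepares one qsub command per search model. This is not how phaser.MRage or the "parallel Phaser" GUI work internally; the wrapper script 'run_phaser.sh', the model file names, and the 'mr_' job-name prefix are all hypothetical. The commands are only built, not executed.

```python
from pathlib import Path


def per_model_commands(models, script="run_phaser.sh"):
    """Build one single-slot qsub command per search model.

    Each command names the job after the model file (-N) and passes
    the model path to the (hypothetical) wrapper script as its argument.
    """
    commands = []
    for model in models:
        job_name = "mr_" + Path(model).stem
        commands.append(["qsub", "-N", job_name, script, str(model)])
    return commands


if __name__ == "__main__":
    # Hypothetical model files for illustration.
    for command in per_model_commands(["model_a.pdb", "model_b.pdb"]):
        print(" ".join(command))
```

Because no '-pe' request is made, each job takes a single slot and the scheduler spreads the jobs across whatever cores are free.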
Thanks for your replies, Nat and Jon.
Given Nat's answer, phenix.phaser is not (by default) built with OpenMP,
which explains the non-parallel behaviour of my phenix.phaser.
I think I was misled by the GUI, which actually shows 3 threads for
python (1 for phenix and, for some reason, apparently 2 for the phaser
GUI, one of which disappears quickly).
As I want to run Phaser with a few different models, but as quickly as
possible, I switched to the CCP4 Phaser, which parallelizes and runs well
on the cluster.
Best regards,
Lionel
Participants (3): Jon Schuermann, L. Costenaro (IBB), Nathaniel Echols