Cluster Computing

From Peyton Hall Documentation

No, we're not talking about star clusters (not this time, anyway).  We're talking about Beowulf clusters or, to use a more general term, "computer clusters".  These are not the same thing as what OIT calls its "computer labs".
== Introduction ==
So what is a cluster?  It's a set of machines, usually (but not necessarily) on a private network, attached to a dual-homed "master node".  Dual-homed means the node sits on two networks at the same time, and may even act as a router between them.  The master node allows logins, and it is where you set up your large parallel jobs.  Once a job is submitted, software on the master connects to the drones and runs the job there.  This software is designed to run programs fairly when resources are available for them, and to make sure nobody starts a job on the nodes your processes are already using, so that everyone's programs get a fair share of the machines.
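To make that concrete, here is a minimal sketch of the kind of job script you would hand to a batch scheduler on the master node.  It uses PBS-style directives purely as an example; the job name, resource requests, and program name are placeholders, and the exact syntax depends on which scheduler your cluster actually runs.

<pre>
#!/bin/bash
#PBS -N example_job            # name shown in the queue listing
#PBS -l nodes=2:ppn=4          # request 2 nodes with 4 processors each
#PBS -l walltime=01:00:00      # stop the job after 1 hour of wall-clock time
#PBS -j oe                     # merge stdout and stderr into one file

cd $PBS_O_WORKDIR              # start in the directory the job was submitted from
mpiexec -n 8 ./my_parallel_program   # 8 MPI processes = 2 nodes x 4 cores
</pre>

You would submit this from the master node with <code>qsub example_job.pbs</code> and check on it with <code>qstat</code>; the scheduler then finds free drones and starts the processes there.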
== Research Computing ==
Research Computing maintains many HPC clusters.  Information on them is available at [http://www.princeton.edu/researchcomputing/ their website].  There's also a page for [http://www.princeton.edu/researchcomputing/access/ prospective users] describing what is needed to get an account there.
== Hydra ==
We used to have a cluster of our own named "Hydra"; however, it was finally decommissioned in 2012, after being converted to a general Condor cluster and slowly dismantled.  The head node lives on (in spirit, at least) as the controller for our Condor infrastructure, which lets you submit jobs to run during idle cycles on desktops and other machines around the department.
=== Submitting jobs to Hydra ===
Hydra used [[Condor]] for job management, and the department's Condor infrastructure still does.  You'll find information about how to use it in the [[Condor|Condor article]].
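As a rough illustration, a Condor job is described by a small submit file rather than a PBS-style script.  The sketch below assumes a simple serial program; the program and file names are placeholders, and the [[Condor]] article is the place to look for how jobs should actually be set up here.

<pre>
# example.sub - minimal Condor submit description (all names are placeholders)
universe   = vanilla           # ordinary, single-process job
executable = my_program        # program to run on an idle machine
arguments  = input.dat
output     = my_program.out    # captured stdout
error      = my_program.err    # captured stderr
log        = my_program.log    # Condor's own record of what happened to the job
queue                          # submit one copy of the job
</pre>

You hand this to the pool with <code>condor_submit example.sub</code> and watch its progress with <code>condor_q</code>.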
[[Category:Cluster Computing]]