General UNIX info

From Peyton Hall Documentation

Here you'll find more generalized UNIX information, such as shells, common programs, etc.

Help! What is this Unix thing?!

For those people who have never seen a Linux or Unix system before, it can be a little daunting. Some commands you may have used in DOS (if you've even seen that!), such as 'cd', work the same, but other familiar ones like 'dir' may not exist at all and just produce a message such as "bash: dir: command not found" (the usual Unix command for listing files is 'ls'). The following books could be helpful to get you started with the basics of a Unix system. I have not read these myself, so I can't attest to their usefulness; however, books by this publisher (O'Reilly) are generally excellent resources.

An added benefit of O'Reilly books is that Princeton has a site license for the Safari Online system. This means that if you're using a computer on the Princeton campus, you can read many O'Reilly books online, without borrowing a copy from the library or purchasing a copy. If there is a box on the left side of the page that says "Search on Safari" and contains a link for the book title, you can click on the link to start reading the book online.

Unix Groups

Here's some information about using groups to organize collaborative efforts (when more than one person might be doing the work in a particular place).

How groups are defined

As new users are created, a new group is created at the same time, with the GID (group ID number) equal to the UID (user ID number) and the group name the same as the username. Matching them is only a convenience; they really could be anything.

Since a group is created for each user, and only that user is a member of it, it's safe to set your umask to 002. With that umask, new files are created with permissions "-rw-rw-r--". Why is that important? Because anyone in the file's group can modify it, so if the file belongs to group "ourproject", everyone in that group can edit it. Groups are also handy if you want people in a certain group to be able to read a file, but not everyone else ('chmod 640' will do this, giving "-rw-r-----").
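If you want to see this for yourself, here is a small sketch that runs in a throwaway temporary directory (the file and directory names are made up for the demo). It sets umask 002, creates a file and a directory, prints their permission strings, and then applies 'chmod 640':

```bash
#!/bin/bash
# Sketch: see what umask 002 and 'chmod 640' do to new files.
# Everything happens in a throwaway temporary directory.
cd "$(mktemp -d)" || exit 1

umask 002                       # clear only the "other write" bit
touch shared.txt                # files start at 666, minus the umask -> 664
mkdir shared_dir                # directories start at 777, minus the umask -> 775

# The first 10 characters of 'ls -l' are the type and permission string
# (some systems append '.' or '+' for SELinux labels or ACLs, so cut those off).
new_file_mode=$(ls -l shared.txt | cut -c1-10)
new_dir_mode=$(ls -ld shared_dir | cut -c1-10)
echo "new file: $new_file_mode   new directory: $new_dir_mode"

chmod 640 shared.txt            # owner read/write, group read-only, others nothing
restricted_mode=$(ls -l shared.txt | cut -c1-10)
echo "after chmod 640: $restricted_mode"
```

On a typical Linux system the new file shows up as -rw-rw-r--, the directory as drwxrwxr-x, and the file becomes -rw-r----- after the chmod.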

How do I take advantage of groups

As new users are created, their umask is set to 002, meaning all files/directories created will be group read/write as well as owner read/write (but still read-only to the world). This means you don't have to think about it.

So all you have to do is run 'chgrp groupname filename' and 'chmod 664 filename' and anyone in the group you named will be able to edit the file. You still retain ownership of it, which is important to remember in the case of quotas and such.
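A minimal sketch of those two commands. Since 'groupname' depends on your project, this demo assumes your own primary group (from 'id -gn') as a stand-in so it runs for anyone; in real use you would put the project group name there. Note that an ordinary user can only chgrp a file to a group they belong to, which 'id -Gn' lists:

```bash
#!/bin/bash
# Sketch: share one file with a group.
# In real use 'grp' would be your project group (e.g. ourproject); here your
# primary group stands in for it so the demo runs for anyone.
cd "$(mktemp -d)" || exit 1

id -Gn                                 # which groups am I in? (chgrp only works for these)
grp=$(id -gn 2>/dev/null || id -g)     # fall back to the numeric GID if it has no name

touch notes.txt
chgrp "$grp" notes.txt                 # give the file to the group
chmod 664 notes.txt                    # owner and group may read/write, others read
ls -l notes.txt                        # you are still the owner; only the group changed

# A numeric listing (-n) makes the group easy to compare against 'id -g'.
file_gid=$(ls -ln notes.txt | awk '{print $4}')
file_mode=$(ls -l notes.txt | cut -c1-10)
```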

What's with the SGID bit

SGID, or "Set Group ID", means just that: when you do something with a file/directory, set the group ID to X. In the case of directories, this is quite handy. Here's an example:

You are in a group called 'myproject' with 4 other people. You create a directory to hold all your work, and call it 'mydir'. Now, every time someone writes a file there, they have to remember to run 'chgrp myproject filename' or else others in the group won't be able to do anything with it, right? Wrong, that's where the SGID bit comes in handy.

Run 'chmod g+s mydir' to set the SGID bit on the directory (you could also add a leading 2 to the octal mode, for example 'chmod 2775 mydir' to make it drwxrwsr-x). From then on, every file created in this directory automatically gets the group ID of the directory, so as long as the directory belongs to group 'myproject', every new file created in it will too. Note that this only applies to newly created files: a file moved in with 'mv' keeps its original group, and copying with 'cp -p' or extracting with 'tar' may preserve the original group as well, so you may want to check your files now and then. Still, it saves you from having to remember to run 'chgrp' on every file you create.
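Here is a sketch of setting up such a directory. It assumes your own primary group as a stand-in for 'myproject' (so it runs for anyone) and works in a temporary directory. It also shows a Linux behavior worth knowing: new subdirectories inherit the SGID bit, so the whole tree keeps the group.

```bash
#!/bin/bash
# SGID directory demo. The text's group is 'myproject'; this sketch uses your
# primary group as a stand-in (you can only chgrp to groups you belong to).
cd "$(mktemp -d)" || exit 1
umask 002

mkdir mydir
chgrp "$(id -g)" mydir          # real use: chgrp myproject mydir
chmod 2775 mydir                # same as 'chmod 775 mydir' plus 'chmod g+s mydir'
ls -ld mydir                    # drwxrwsr-x  (the 's' is the SGID bit)

touch mydir/newfile             # new files take the directory's group
mkdir mydir/subdir              # on Linux, new subdirectories also get the SGID bit
ls -ln mydir

dir_mode=$(ls -ld mydir | cut -c1-10)
sub_mode=$(ls -ld mydir/subdir | cut -c1-10)
dir_gid=$(ls -lnd mydir | awk '{print $4}')
file_gid=$(ls -ln mydir/newfile | awk '{print $4}')
```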

Autologout in tcsh

tcsh is compiled by default to automagically log you out after 60 minutes of inactivity. If you dislike this behavior, add the line 'set autologout=0' to your .cshrc file. That will disable the autologout feature.

Quick Shell Exit

So you're too lazy to type 'exit' or 'logout' to leave your shell? So am I. So do like I did: in your .bashrc add the line 'ignoreeof=0', or in your .cshrc add 'set ignoreeof=1'. Then the Ctrl-D (^D) sequence will exit your shell immediately.

Running background and/or long-running jobs...

...on a single machine

Guidelines

There are a few guidelines for running a time-consuming program. First, such jobs should only be submitted to servers, not workstations (the distinction being that a workstation can and/or will have someone logged in at its console using the machine, while a server does not allow console logins). This keeps you from making the machine unusable for people logged in at the console trying to do work locally.

We have a list of available machines. Please note the 'Primary User' field: if it's not "dept", contact the user in question before using that machine. If it is "dept" and you want to be sure, contact help. As of this writing, coma.astro.princeton.edu and armstrong.astro.princeton.edu are the two best shared resources in the department. If you want to be even nicer, have a look at Condor, which is also more useful if you have many programs to run rather than a single long one.

PLEASE NOTE that you should NEVER submit jobs to a machine that is someone's workstation without asking them first! If you do, don't be surprised to find the job killed rather unceremoniously, and a rather unhappy user and sysadmin as well. Even in the middle of the night there are things that run, and other people may have already scheduled something for that time, so you should always ask the primary user of the machine before starting.

Getting Started

Before running a job on any machine, check its status with the 'top' program:

3:07pm up 22 days, 16:56, 6 users, load average: 0.98, 0.74, 0.48
82 processes: 79 sleeping, 3 running, 0 zombie, 0 stopped
CPU states: 0.6% user, 1.8% system, 0.7% nice, 0.4% idle
Mem: 62956K av, 61372K used, 1584K free, 13440K shrd, 2012K buff
Swap: 409616K av, 28440K used, 381176K free 20724K cached

Here you see that the machine has 62.9MB of total memory. 61.3MB is used and 1.5MB is free. It is also using 28MB of swap (disk space used when you run out of physical memory). In this case the machine should NOT be used to submit any batch job more than a couple of megabytes in size. If you submit a large job to this machine you will kill interactive use and your code won't run particularly fast either.
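If you'd rather check memory from a script than eyeball 'top', here is a minimal sketch. It assumes Linux (it reads /proc/meminfo, and the MemAvailable field needs a reasonably recent kernel), and the job size is a made-up number you would replace with what your program actually needs:

```bash
#!/bin/bash
# Non-interactive memory check before starting a big job (Linux only: reads
# /proc/meminfo). 'top -b -n 1 | head' or 'free -m' show similar numbers by hand.
mem_total_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_avail_kb=$(awk '/^MemAvailable:/ {print $2}' /proc/meminfo)
swap_total_kb=$(awk '/^SwapTotal:/ {print $2}' /proc/meminfo)
swap_free_kb=$(awk '/^SwapFree:/ {print $2}' /proc/meminfo)

echo "RAM: $((mem_total_kb / 1024)) MB total, $((mem_avail_kb / 1024)) MB available"
echo "Swap in use: $(( (swap_total_kb - swap_free_kb) / 1024 )) MB"

# Hypothetical job size; replace with what your program actually needs.
job_mb=500
if (( job_mb * 1024 > mem_avail_kb )); then
    echo "Not enough free memory for a ${job_mb} MB job here; pick another machine."
else
    echo "A ${job_mb} MB job should fit in memory."
fi
```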

Another factor in running jobs is how many CPUs the machine has. If it's a single CPU machine, and there's already a resource-hogging job running, then your job will not run very fast at all. However, this does not mean you can't run it, just that it won't perform as well as it might on a machine that's mostly idle. Again you can check this with the 'top' program to see what processes are taking up how much CPU time:

81 processes: 79 sleeping, 2 running, 0 zombie, 0 stopped
CPU states: 0.6% user, 1.8% system, 0.7% nice, 0.5% idle
Mem: 62956K av, 61720K used, 1236K free, 14324K shrd, 2060K buff
Swap: 409616K av, 28716K used, 380900K free 21452K cached
PID   USER PRI NI SIZE RSS  SHARE STAT LIB %CPU %MEM TIME COMMAND
18378 mway 10  19 1516 1516 872   R    0   92.7 2.4  2:43 lame
18471 mway 3   0  1128 1128 900   R    0   4.6  1.7  0:00 top
    1 root 0   0  124  72   52    S    0   0.0  0.1  0:46 init
    2 root 0   0  0    0    0     SW   0   0.0  0.0  0:06 kflushd

In this case you can see the user "mway" is running a job called 'lame', which is eating up 92.7% of the CPU's time. You might want to send the user a mail, asking how long their job is expected to run before you start another, as at this point a second intensive process will only serve to slow down both processes.
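A rough scripted version of the same check, as a sketch assuming Linux (it reads /proc/loadavg). The "busy" rule here is only a heuristic of mine, not a department policy: the 1-minute load average (the first number on top's "load average" line) is roughly how many processes want a CPU, so if adding one more job would push it past the number of CPUs, everything slows down.

```bash
#!/bin/bash
# Rough "is this machine busy?" check (Linux: reads /proc/loadavg).
cpus=$(getconf _NPROCESSORS_ONLN)
read -r load1 load5 load15 _ < /proc/loadavg
echo "CPUs: $cpus   load average: $load1 $load5 $load15"

# Bash arithmetic is integer-only, so let awk do the floating-point comparison.
# Heuristic: would one more CPU-bound job push the load past the CPU count?
busy=$(awk -v l="$load1" -v c="$cpus" 'BEGIN { print (l + 1 > c) ? "yes" : "no" }')
echo "busy: $busy"

# To see who is using the CPU (procps 'ps'), something like:
#   ps -eo pid,user,ni,pcpu,comm --sort=-pcpu | head -n 5
```

On the single-CPU example machine above (load 0.98), this would report busy: yes.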

Play 'nice'

You should always use the 'nice' command to run your jobs, so that other users of the machine are not adversely affected by your process. It lowers the scheduling priority of the process, so that someone logging in can still use the machine. Also, if you run a job with a low priority on a workstation that other people log in to (especially at the console), they will likely not even know your job is there, since it will happily give up cycles to them when needed and run when the machine is more 'idle'. To use 'nice' on a job, assuming your program is called 'cpuhog', run it with 'nice -n 19 cpuhog'.

If you want the job to keep running after you log out of the terminal, use 'nohup' to make it ignore the hangup signal (SIGHUP) sent when the shell closes: 'nohup nice -n 19 cpuhog &'. This appends the program's output to the file "nohup.out", runs it at niceness 19 (the lowest priority), and puts it in the background. You can choose a different output file by redirecting it: 'nohup nice -n 19 cpuhog > cpuhog.output &'.
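A small sketch of the nohup + nice pattern. 'cpuhog' is the text's placeholder, so the demo uses a stand-in job: 'nice' run with no arguments, which just prints the niceness it is running at. It runs in a temporary directory and waits for the job only so the result can be checked.

```bash
#!/bin/bash
# Sketch of the nohup + nice pattern. 'cpuhog' in the text is a placeholder;
# here the stand-in job is 'nice' with no arguments, which prints its niceness.
cd "$(mktemp -d)" || exit 1

# Real use would look like:
#   nohup nice -n 19 ./cpuhog > cpuhog.output 2>&1 &
nohup nice -n 19 nice > job.output &
job_pid=$!                      # PID of the background job (for renice or kill later)

wait "$job_pid"                 # demo only; a real job keeps running after you log out
cat job.output                  # the job reports niceness 19
```

Adding '2>&1' as in the commented line also sends error messages to the same file; when you redirect the output yourself, no nohup.out is created.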

But what if you've already started the job and decide you want to change its priority? Use the 'renice' command: 'renice +19 -p (process ID)'. This changes the niceness of that process to 19. Going the other way is more restricted: on most systems an ordinary user can only make a process nicer, so setting it back to zero (0, normal priority) usually requires root, and only root can set a priority below zero (which makes it run faster than other processes, since it has a higher scheduling priority).
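A sketch of renicing a job that is already running, with 'sleep' standing in for a real long job. Reading the niceness back from /proc assumes Linux; 'ps -o pid,ni,comm -p PID' is the more portable way to look at it by hand.

```bash
#!/bin/bash
# renice an already-running job. 'sleep' stands in for a real long job.
sleep 60 &
pid=$!

renice +19 -p "$pid"            # set the niceness of that one process to 19

# Check it: 'ps -o pid,ni,comm -p "$pid"' works on most systems; on Linux the
# niceness is also field 19 of /proc/<pid>/stat.
now=$(awk '{print $19}' "/proc/$pid/stat")
echo "niceness of $pid is now $now"

kill "$pid"                     # clean up the demo job
```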

...using Condor

Condor is a batch scheduling system which is perfect for single processor "serial" jobs. It allows you to submit jobs to be run, and will farm them out to idle processors in the department as it finds room.

Condor has its own topic.

Compiling tips

GCC

The flags -O, -O2, and -O3 turn on increasing levels of optimization and may help your code run MUCH faster. However, you are strongly advised to read the gcc man page before using the higher optimization levels, since some optimizations may break your code.

The flags -Wreturn-type and -Wformat warn when a function with a non-void return type can finish without returning a value, and check the argument lists of printf and scanf calls against their format strings.
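To see what these two warnings catch, here is a sketch that writes a tiny C file with one deliberate mistake of each kind and compiles it. The file name warn_demo.c is made up, and it assumes gcc is installed.

```bash
#!/bin/bash
# Compile a made-up C file with one bug for each warning flag.
cd "$(mktemp -d)" || exit 1
cat > warn_demo.c <<'EOF'
#include <stdio.h>

int half(int x)
{
    if (x > 0)
        return x / 2;
    /* bug: no value returned when x <= 0 (caught by -Wreturn-type) */
}

int main(void)
{
    double d = 2.5;
    printf("%d\n", d);   /* bug: %d expects an int, d is a double (caught by -Wformat) */
    return half(4);
}
EOF

# -c compiles only; warnings go to stderr, which we keep in a file.
gcc -O2 -Wreturn-type -Wformat -c warn_demo.c -o warn_demo.o 2> warnings.txt
cat warnings.txt
```

gcc should report "control reaches end of non-void function [-Wreturn-type]" for half() and a "[-Wformat=]" warning for the printf line.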

-Wunused and -Wuninitialized may also be useful, but they tend to produce extra warnings that you may not care about.

-e is useful for f77 (it extends the line length past the standard 72 columns), and -C is useful when debugging (it turns on array bounds checking).

'watch'ing output

Ever find yourself watching the status of a copy by typing 'df -m .' or 'du -ks .' over and over again? Save some trouble, and let Unix watch it for you. "watch -n1 'du -ks .'" will run the command 'du -ks .' once per second, and update the display with the output. Quite handy if you're repeatedly running a command because its output changes, and you need to be updated on it. The argument to the -n option is how often it should be re-run and updated in seconds, so you could make it -n 300 to update every 5 minutes. Watch a RAID system rebuild with 'watch -n1 cat /proc/mdstat'! Watch your files disappear after running a rm in the background with 'watch ls -l'! The possibilities are endless. Well, almost.

Stray ^Ms

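Here are a couple of ways to get rid of stray ^Ms (carriage returns) at the ends of lines in files that came to Unix from a Macintosh or other systems. According to Simon Dedeo, tr '\015' '\012' < draft.tex > draftnew.tex works well. Robert chimed in with a Perlish way that works in place (no need for draftnew.tex): perl -pi -e 'tr/\015/\012/' draft.tex or perl -pi -e 's/\015/\012/g' draft.tex. These turn every carriage return into a newline, which suits old Mac files that end each line with a bare carriage return. DOS/Windows files end each line with a carriage return followed by a newline, so for those delete the carriage returns instead (tr -d '\015' < draft.tex > draftnew.tex), or you will end up with a blank line after every line.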
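A quick sketch of both cases on made-up sample files in a temporary directory, so you can check the result with 'od -c' before touching your real files:

```bash
#!/bin/bash
# Line-ending cleanup demo on made-up sample files.
cd "$(mktemp -d)" || exit 1

printf 'line one\rline two\r' > mac.txt         # old Mac: CR only
printf 'line one\r\nline two\r\n' > dos.txt     # DOS/Windows: CR + LF

tr '\015' '\012' < mac.txt > mac_unix.txt       # turn each CR into a newline
tr -d '\015' < dos.txt > dos_unix.txt           # just delete the CRs

od -c mac_unix.txt                              # no \r left, one \n per line
od -c dos_unix.txt
```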

Trailing slash when using tab-completion of symlinks in bash

In older versions of bash, when you used tab-completion to complete the name of a symlink that points to a directory, it appended a slash ("/") to the end of the name. It no longer does this by default. If you'd like the old behavior back, edit your ~/.inputrc file and add the line 'set mark-symlinked-directories on'. The change takes effect the next time you start bash, or you can press Control-x then Control-r to make bash (readline, actually) re-read the .inputrc file immediately.
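A sketch of making that edit safely from the command line. One catch: if you have no ~/.inputrc yet, creating one means readline stops reading the system-wide /etc/inputrc, so the sketch pulls that file in with readline's $include directive first. It also honors the INPUTRC environment variable, which readline uses in place of ~/.inputrc when set.

```bash
#!/bin/bash
# Turn on mark-symlinked-directories without clobbering existing settings.
# Readline reads $INPUTRC if set, otherwise ~/.inputrc.
inputrc="${INPUTRC:-$HOME/.inputrc}"
setting='set mark-symlinked-directories on'

# A new ~/.inputrc replaces /etc/inputrc, so include the system file first.
if [ ! -e "$inputrc" ] && [ -r /etc/inputrc ]; then
    echo '$include /etc/inputrc' > "$inputrc"
fi

# Append the setting only if it isn't there already.
grep -qxF "$setting" "$inputrc" 2>/dev/null || echo "$setting" >> "$inputrc"
```

In an interactive shell you can also apply it right away with bind 'set mark-symlinked-directories on'.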

More in-depth topics

SSH

Passwords

rsync

Tarballs

Local installs