The SDSS for new members: A guide to the perplexed


Contents:

  • Learning about the data
  • Accessing the data
  • Getting involved in the collaboration

    This web guide is meant for members of institutions that have just joined the SDSS, or who are at long-term SDSS institutions but are just now getting involved in SDSS. It attempts to answer some basic questions about getting started with the project, accessing the data, and taking part in the collaboration. Questions and/or suggestions for further content in this directory should be directed to Michael Strauss. You can also ask Michael about updating the public SDSS website to reflect your institutional involvement.

    Access to links: Many of the links below are not public. There are two ways to get access to them.

  • You can give the IP address of your computer to Dave Johnston (for the Princeton-based addresses) or the FNAL webmaster (for the Fermilab-based addresses).
  • You can use the "standard" username and password. Ask any SDSS old-timer for these.

    It is useful, when thinking about the SDSS telescope and the collaboration, to remember that this is not an observatory, in the sense of a telescope facility to which one can apply for observing time. It is a survey facility, focussed wholly on the goals of the SDSS itself. Opportunities have occasionally arisen to consider new aspects of the observing strategy; these are formally vetted by the SDSS Advisory Council.

    The survey itself is the product of the whole collaboration, a notion which drives many of the policies outlined in detail below. Thus, for example, if you find a particularly unusual type of star in the database, it is not "your" discovery alone, but really a product of the collaboration as a whole.

    The present document is focussed largely on SDSS-I. Information on the proposed SDSS-II will be folded in over the next months.

    Learning about the SDSS and the data

    The SDSS is a fairly complicated system, and the data that come from it are similarly complicated. The following is a guide to what to read.

  • Survey Summary: The survey itself is described in capsule form in the paper by York et al. (2000). This is in fact an introduction to the Project Book (also known as the Black Book), which gives a fairly complete description of the technical aspects of the project as of late 1996.

  • Data Processing and Formats: A fairly complete description of the data processing and the formats of the outputs can be found in the EDR paper by Stoughton et al. (2002). This remains the best description available of the photometric and spectroscopic pipelines, as well as target selection algorithms for categories other than quasars, galaxies, and LRGs.

  • Technical Papers: Many of the pipelines and much of the SDSS hardware are described in a series of detailed technical papers.

  • Data Releases: The SDSS data have been released publicly in a series of data releases: in sequence, the Early Data Release (EDR; think of this as Data Release 0; it contained commissioning data), followed by Data Releases 1, 2, and 3 (DR1, DR2, and DR3). Each of the above links goes to the refereed paper describing the release. In addition, the cumulative wisdom about these releases is included in the DR3 public web site.

  • SDSS rules: Finally, we are a collaboration that is guided by rules. The SDSS Principles of Operation (PoO) is our overarching document. Below we describe some of the other sets of rules, such as publication and press release policies.

    Please consider the cumulative experience of the SDSS collaboration as a resource to tap when you have questions. Please do not hesitate to send out e-mail to the collaboration when questions arise.

    Accessing the data

    CAS and DAS: The SDSS data are formally made available to both the public and the collaboration via two avenues: the Data Archive Server (DAS) and the Catalog Archive Server (CAS). The DAS includes the flat files generated by the pipelines, such as the flat-fielded images, the catalogs of objects in each field, the spectra, and so on. The CAS organizes all this information using a sophisticated database. This database is loaded only every six months or so, so one cannot use the CAS to get data taken last week, for example.

    In particular, the CAS has recently been loaded with data taken through July 2004 (what will be DR4), which can be accessed here.

    The collaboration version of the DAS is linked off Fermilab's SDSS page.

    Public versions of the DAS and CAS can be found in a link off the DR3 web page.

    Auxiliary data: It should be noted that there are SDSS data taken before July 2003 that are not (yet) part of any public release:

  • Equatorial scans of low-latitude regions (the 'Orion scans').

  • Repeat exposures of spectroscopic plates. These are available in the DAS, but not in the CAS. They are planned to be released in DR4.

  • Special spectroscopic programs carried out in the Southern hemisphere. See the sdss-southern archive for a detailed description, and the postings by Jim Annis and Huan Lin for summaries. These are available in the DAS, but not in the CAS. They are planned to be released in DR4.

  • Repeat imaging exposures on the Celestial Equator. These are available in the DAS, but not in the CAS. There are plans for a 'runs database' where the outputs of these runs will be queryable from the CAS. There is also work underway to co-add these data and run the results through the photometric pipeline.

  • Imaging scans of M31, the Perseus Cluster, and the Taurus molecular cloud. These are available through the DAS. The M31 scans in particular have been the subject of many SDSS papers to date.

  • In a somewhat different category are results of quality assurance on the SDSS imaging and spectroscopy; draft documentation of the imaging exists.

    There are also several independent and less formal compilations of SDSS data around the collaboration. You can find them as a list of value-added catalogs. Some of these come in both public and private versions. Of particular interest will be the NYU Value-Added Galaxy Catalog, described in detail here, the SDSS quasar catalog, the Princeton reductions of the imaging and spectroscopic data, and the CMU-PITT spectroscopic galaxy and cluster catalogs.

    Getting involved in the collaboration

    Mail exploders and archives: The SDSS collaboration is vast, and the principal way we communicate is through a lengthy series of archived e-mail exploders. As the above link shows, there are many of them, but those new to the SDSS will want to join those of particular interest. Look at the archives to see which are active (some have become somewhat obsolete). Of particular interest:

  • sdss-general contains general announcements about the projects, project and paper announcements, and many other issues of broad interest. All SDSS participants should sign up for this one.

  • sdss-future is a forum for general discussions about SDSS-II. However, note that most of the technical discussion of SEGUE happens on sdss-stars and that on supernovae happens on sdss-sn.

  • There is a long list of science working group mailing lists, some of which are very active. Sign up for those which fit your scientific interests. Note that many of these mailing lists are specific to a working group, defined as a group of SDSS scientists interested in a broad area of scientific investigation.

  • Much of the discussion of problems with the data, and of software and database issues, can be found at sdss-archive.

  • If you want to follow a discussion of mountaintop issues, sign up for sdss-obs, while sdss-25mlog will send you the nightly observing logs.

    To sign up for any of these, you need to register as a new user. Once you're registered, a link off any of the mailing list indices will sign you up. To send an e-mail to any of these, address it simply to sdss-*name*@astro.princeton.edu, where *name* is of course the name of the archive in question. Your mail will go to all those subscribed to the list in question, and be archived for posterity. Use common sense to avoid inundating your fellow SDSS'ers with spam, but don't hesitate to post questions, results, etc.
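
    As a small illustration of the addressing rule above, the sketch below builds a posting address from a list name. Only the address pattern itself comes from this guide; the function name list_address is hypothetical, and accepting the full "sdss-" form as input is an assumption made for convenience.

```python
# The "sdss-" prefix and the domain follow the pattern given in this guide.
# The function name and its input handling are illustrative assumptions.
LIST_DOMAIN = "astro.princeton.edu"
LIST_PREFIX = "sdss-"


def list_address(name: str) -> str:
    """Return the posting address for the exploder called sdss-<name>."""
    short = name.strip()
    # Accept either the short name ("general") or the full one ("sdss-general").
    if short.startswith(LIST_PREFIX):
        short = short[len(LIST_PREFIX):]
    if not short:
        raise ValueError("mailing list name is empty")
    return f"{LIST_PREFIX}{short}@{LIST_DOMAIN}"


if __name__ == "__main__":
    for name in ("general", "sdss-archive", "obs"):
        print(list_address(name))
```

    Remember that you must be registered and subscribed before posting; the sketch only formats the address and does not check that a list of that name exists.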

    Project Announcements: An important aspect of SDSS is the opportunity to work together on matters of mutual scientific interest. The SDSS Principles of Operation (PoO) state that all projects carried out with proprietary SDSS data must be announced to the collaboration when they are begun. In practice, this is done on the web; note that this website also gives a full, searchable archive for previously posted projects. When a project is posted to this page, it is also sent out to sdss-general. Note that project announcements are editable, and should be kept up-to-date to make this a useful archive.

    There is nothing in the rules that precludes two groups of people from working on the same science. We only ask that these two groups keep in mutual communication, and work in a spirit of cooperation and collegiality. Note that PhD thesis projects are protected a bit more directly; they are listed separately here.

    Often, when people are exploring a new idea for a project, they will post their ideas to the relevant mailing list, to see if there are others with similar ideas. Such musings will usually eventually turn into a formal project announcement.

    Funding Proposals: The PoO also requires that proposals for funding science that will take advantage of proprietary SDSS data be posted to the collaboration as well. This posting is done on the web, where the list is archived in perpetuity.

    External Collaborators: Because of the proprietary nature of the SDSS data, one is not allowed to share the data with collaborators without SDSS data rights. There are times, however, when such external collaboration makes scientific sense, e.g., when a collaborator has access to follow-up telescope time that you can't get any other way, or is the world expert on the analysis that needs to be done. In such cases, you can apply for external collaborator status for this person. The application should go in before the project starts, should be sent to sdss-general and the Collaboration Council, and should clearly explain the relevant project, state clearly for whom the external collaborator request is made, state what expertise and/or external resources they bring to the project, explain the extent to which they need access to proprietary SDSS data, describe the timescale and publication plans for the project, and make it clear who is leading the project. You can expect a decision in a few weeks. More on external collaborator policy can be found here and here.

    Publication Policy: The SDSS publication policy is rather elaborate, but is worth reading in detail. We have also written a friendlier guide to explain these policies. The most important web site discussed there is the internal list of SDSS publications.

    Policy on authorship on SDSS publications is perhaps the major source of questions. As an SDSS participant, you have the right to ask for co-authorship on any SDSS paper to which you have made a substantial contribution. The definition of 'substantial' is vague, and one is expected to use common sense and good scientific judgement on this. Builders are SDSS scientists who have earned the right to put their name on any paper because of their extensive work on SDSS infrastructure. Again, SDSS participants who are not builders can also put their name on any paper to which they have made a direct contribution. All authorship, whether by builders or otherwise, of course comes with the responsibility of reading the paper thoroughly, and in some intellectual sense taking ownership of the scientific results therein.

    Press releases: An important part of SDSS public outreach is disseminating important results to the press. We have a policy describing how this is done. In practice, if you think you have a result appropriate for a press release, please contact the Project Spokesperson.

    Meetings: The collaboration regularly holds collaboration-wide meetings, each hosted by one of the SDSS member institutions (the last two of these have been at Pittsburgh and New Mexico State). These are a mixture of science talks, working group meetings, and technical discussions of hardware, software, and other collaboration issues. Definitely a good way for a new SDSS member to meet people and get a sense of the pulse of the project!

    Phonecons: Those people working on infrastructure find themselves communicating on a regular basis via phone conferences, announced via the appropriate e-mail exploder. A few examples of regular phone conferences will give you the flavor of these things; for the most part, they are open to all SDSS participants:

  • The SEGUE planning phone conference.
  • SDSS testing phone conference, mostly with discussions of problems with the data and the data releases.
  • The engineering and operations phone conference, including hardware issues at the mountain.
  • The CAS phone conference.

    Much of the tracking of problems discussed in these phonecons is done via the GNATS problem report system.

    Who are we? A semi-complete list of SDSS participants can be found in links off of the SDSS member page. If you're not there, please contact the webmaster. Another very useful link is the who's who list; add your name via the link at the top if you are not listed.

    Who is in charge? It is worth summarizing the management structure of SDSS. As described in the management chart, the SDSS is overseen by the Advisory Council (which is a subset of the ARC Board of Governors), which has overall responsibility for the financial health of the project. The project is run on a day-to-day basis by the Management Committee, consisting of the Project Director (Rich Kron), Project Manager (Bill Boroski), Project Scientist (Jim Gunn), and Project Spokesperson (Michael Strauss). The Collaboration Council (CoCo), which advises the Spokesperson on matters of policy and data rights, includes representatives from each of the SDSS institutions. More detail on management can be found at Bill Boroski's Management web page.

    For SDSS-II, there are project teams set up for each of the major components. These are headed by Connie Rockosi (Santa Cruz) and Brian Yanny (Fermilab) for SEGUE, Josh Frieman (Chicago) and Craig Hogan (Washington) for Supernovae, and Steve Kent (Fermilab) and Michael Strauss (Princeton) for Legacy.

    Finally, you will get out of the SDSS what you put into it. Sign up for mailing lists, read the archives, look for ways to get involved, start (and announce!) some projects, and do some wonderful science.