This document describes the basic restart procedures for OSIRIS. If the data-taking system hangs, begin with the simple restart. Reset the computers only if all else fails.
The basic OSIRIS system consists of three PCs, the Sun data-taking computer (ctioa1 or ctioa2), and the instrument itself. The instrument control software is called ICIMACS (Instrument Control and Image Acquisition System). The user interacts with the instrument entirely from the Sun (ctioa1 or ctioa2) through the Prospero instrument control program (unless there is a problem; read on...).
Two of the OSIRIS computers are in the console room, and both are PCs: the IC (Instrument Computer) and the WC (Workstation Computer). Each of these PCs runs DOS and a single executable program. The IC runs a program called "IC" and the WC runs "WC". The third PC, the IE (Instrument Electronics), is on board OSIRIS and handles low-level motor control.
The IC passes commands to the instrument and acquires image data from it via optical fibers. The WC handles communications between the PCs and the Sun, and between the PCs and the Telescope Control System (TCS).
If the IC appears to be the problem, you may try restarting the IC program. Type "exit" at the IC keyboard. When you are back at the DOS prompt, type "IC". When the IC has started again, type UARTINIT to initialize the communication ports in the HE. Look for a message indicating that the IC is talking to the instrument ("PONG Received from IE"). You should also see +SEQUENCER on the IC status display.
The WC can be restarted the same way: type "exit", then type "WC" at the DOS prompt. The WC then opens a telnet session with the Sun, which Prospero uses to communicate with the PCs.
If either the WC or the IC is completely hung and does not respond to keyboard input, you may reboot it with Ctrl-Alt-Del. If both computers are hung, reboot the IC first, then the WC.
If a soft reboot does not work, try cycling the power on the IC and WC. Turn off both computers, then turn on the IC. When the IC is back up and has rebooted, turn on the WC. After the WC has rebooted, run IC and WC on their respective computers.
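The escalation above runs from restarting the program, to Ctrl-Alt-Del, to a power cycle, and always ends with a Prospero startup. It can be written down as an ordered checklist. The Python sketch below only illustrates that ordering. The function restart_steps and the level names "program", "reboot", and "power" are hypothetical and are not part of ICIMACS. The step strings paraphrase this document rather than quoting any software.

```python
from typing import List, Set

# Escalation levels, mildest first (hypothetical labels, for illustration only).
LEVELS = ("program", "reboot", "power")


def restart_steps(level: str, hung: Set[str]) -> List[str]:
    """Return the operator steps for one escalation level, in order."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    steps: List[str] = []
    if level == "power":
        # A power cycle always involves both PCs, with the IC brought up first.
        steps += [
            "Turn off IC and WC",
            "Turn on IC and wait for it to reboot",
            "Turn on WC and wait for it to reboot",
            "Run IC on the IC",
            "Run WC on the WC",
        ]
    else:
        for pc in ("IC", "WC"):  # IC before WC, as the reboot procedure requires
            if pc not in hung:
                continue
            if level == "program":
                steps.append(f"{pc}: type exit, then {pc} at the DOS prompt")
                if pc == "IC":
                    steps.append("IC: type UARTINIT, check for 'PONG Received from IE'")
            else:
                steps.append(f"{pc}: press Ctrl-Alt-Del")
    # Every restart of the IC or WC is followed by a startup on Prospero.
    steps.append("Prospero: startup")
    return steps
```

For example, restart_steps("reboot", {"IC", "WC"}) lists the IC's Ctrl-Alt-Del before the WC's and ends with the Prospero startup.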
Always follow a restart of the IC or WC with a "startup" on Prospero.
If the IC cannot communicate with the IE, check that the flat "phone" cable between the HE and the IE is connected. You can also try moving this cable to another port on the HE (do not use port 1 until further notice). The cable must always be connected to COM1 on the IE. There could also be a fiber transmission problem (see below).
A HOSTS command on the IC will definitively show whether communication from the IC is reaching the IE.
Four mechanisms in the instrument have relative positions (xpupil, ypupil, camfocus, grattilt). These must be reset, or zeroed, after an IE power cycle. Always follow a restart of the IE with a reset of the xpupil, ypupil, and camfocus mechanisms. See "Resetting Mechanisms" below.
If the disks shared by the IC and WC fall out of synch, Prospero will print a line telling you to type "WC RESTART" at the Prospero prompt. You can try this once. The SYNC flags on the WC should go from - to +, and a startup on Prospero should then succeed. If not, type >IC req initdisk at the WC keyboard. This should force the disks to synch. Again, finish by typing startup at the Prospero prompt.
An analogous problem can occur with the disks shared by the WC and the Sun. These disks are synched by the Caliban daemon. As long as Caliban is running, the DNSYNC flag on the WC should be +. If it is not, it is usually best to cycle the power on (or reboot) the WC and restart Caliban on the Sun. Alternatively, you can issue the command >CB req initdisk at the WC to try to force the shared disks to synch. If the initdisk succeeds, do a startup on Prospero.
When starting up Prospero, you should see a message stating that Caliban is running and the data disks are synched. If Prospero complains that the data disks are not synched, follow the instructions printed in the command window. If all else fails, try to synch the disks by restarting the WC and IC as described above.
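The IC/WC resynch procedure forms a short escalation. First, type WC RESTART once at the Prospero prompt. Next, type >IC req initdisk at the WC keyboard. As a last resort, restart the IC and WC. In every case, finish with a startup. The Python sketch below only records that order. resync_ic_wc and its synced_after callback are hypothetical stand-ins for the operator checking whether the WC's SYNC flags have gone to +.

```python
from typing import Callable, List

# Commands from the procedure above, tried in order (WC RESTART only once).
RESYNC_COMMANDS = ("WC RESTART", ">IC req initdisk")


def resync_ic_wc(synced_after: Callable[[str], bool]) -> List[str]:
    """Return the commands issued while trying to resynch the IC/WC disks.

    synced_after(cmd) is a hypothetical hook: the operator issues cmd and
    reports whether the SYNC flags on the WC are now +.
    """
    issued: List[str] = []
    for cmd in RESYNC_COMMANDS:
        issued.append(cmd)
        if synced_after(cmd):
            break
    else:
        # Neither command restored synch: restart the IC, then the WC.
        issued.append("restart IC, then WC")
    issued.append("startup")  # always finish with startup at the Prospero prompt
    return issued
```

For example, resync_ic_wc(lambda cmd: False) returns all four entries, ending with startup.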
22 July 2000. As of this date, communications between the TCS and the WC on the 4m may become unstable. The underlying problem is that the TCS can be slow to answer requests from the WC, and the WC does not ignore late responses from the TCS. This tends to break the communication link and produces many errors on the WC screen. We believe the problem has been solved by making sure that the TCS router is not used with OSIRIS. This reduces the load on the TCS computer (Heurikon), so it is no longer late in responding to the WC. If the problem recurs even with the TCS router off, you can increase TCPPORTDELAY in the wc.ini file on the WC to 3 seconds. The system will then be stable. However, the delay is long enough that the TCS header information never reaches the IC, so it will not appear in the image header. The IC will also still send the message "Telescope controller not responding" when an exposure (GO) is started. In this case, the message means only that the TCS query has timed out because TCPPORTDELAY is long. It does not mean that the TCS has actually stopped responding. After modifying wc.ini, restart the WC and execute a startup on Prospero.
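The failure mode described above can be illustrated with a toy model in Python. This is a simplified sketch, not the WC's actual logic. The function answers and its parameters are hypothetical. The model makes three assumptions. Each request waits at most timeout seconds, standing in loosely for TCPPORTDELAY. The next request is sent period seconds later. Replies are queued in arrival order. A client that accepts whatever reply arrives next will pair a late reply with the wrong request, and the mismatch then carries over to every following request. A client that discards stale replies stays in step, and so does a client given a longer timeout.

```python
from collections import deque
from typing import List, Optional


def answers(latencies: List[float], period: float, timeout: float,
            drop_stale: bool) -> List[Optional[int]]:
    """For each request i, return the id of the reply the client accepts (None if none).

    Request i is sent at t = i * period and its reply arrives at
    t = i * period + latencies[i]. The client waits until t = i * period + timeout.
    """
    if not 0 < timeout < period:
        raise ValueError("this toy model assumes 0 < timeout < period")
    # Replies queued in the order they arrive: (arrival_time, request_id).
    inbox = deque(sorted((i * period + lat, i) for i, lat in enumerate(latencies)))
    accepted: List[Optional[int]] = []
    for i in range(len(latencies)):
        deadline = i * period + timeout
        choice: Optional[int] = None
        while inbox and inbox[0][0] <= deadline:
            _, rid = inbox.popleft()
            if drop_stale and rid < i:
                continue  # late reply to an earlier request: ignore it
            choice = rid  # accept this reply (the naive client accepts even stale ones)
            break
        accepted.append(choice)
    return accepted
```

Take latencies [0.5, 1.5, 0.5, 0.5], period=2.0 and timeout=1.0. The naive client returns [0, None, 1, 2], so every answer after the late one is off by one. The stale-dropping client returns [0, None, 2, 3]. Raising timeout to 1.9 lets even the naive client return [0, 1, 2, 3]. This is the same trade-off as lengthening TCPPORTDELAY.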
Corrupted images can also be caused by bad fiber connections, or dirty fiber connections. If corrupted images occur, inspect the fiber connections at the IC and the HE. Small "dust balls" can "grow" at the connections and foul the signal transmission.