South Pole Logbook

Search below for 'logbook_sop' for help on usage.



Apr 11, 2008

YAPO - yet another power outage


By: TeX
Time: 2008103 2043Z
Music: Siouxsie and the Banshees and listening to the LMR channel

At about 2037Z a power outage occurred; it makes a
fairly ominous noise here at the ICL. I think it is the HVAC going down.
Machines went on to UPS here from
20:37:10 to 20:38:03.

Checked around the building, nothing terrible happening,
called in that things seem OK.

Evidently a generator that had been switched to around 20:15 or so had
failed.

2045Z checking out the DAQ and TWR now.

	Looks like run failed:
	pdaq@sps-expcont[~] 20:46:03 (0)$ anvil check
	Passed 'sn', in state 'Ignored'
	Passed 'twr', in state 'Starting'
	Passed 'cluster', in state 'Ignored'
	Passed 'daq', in state 'Starting'
	Passed 'spade', in state 'Ignored'
	Passed 'pnf', in state 'Started'

	Run 110812 failed with:
	DAQRun [2008-04-11 20:37:21.987722] ** Run watchdog reports starving components:
	    stringHub->eventBuilder backEnd.NumReadoutsReceived not changing from 11043762
	DAQRun [2008-04-11 20:37:31.934917] ** Run watchdog reports starving components:
	    globalTrigger->eventBuilder backEnd.NumTriggerRequestsReceived not changing from 3379608
	    stringHub->eventBuilder backEnd.NumReadoutsReceived not changing from 11043762
	DAQRun [2008-04-11 20:37:40.346922]     3378980 physics events (1248.79 Hz), 9727921 moni events, 6078962 SN events, 6011732 tcals
	DAQRun [2008-04-11 20:37:41.866123] #48: eventBuilder inputs: Exception("stringHub->eventBuilder backEnd.NumReadoutsReceived is not changing") in check() (RunWatchdog.py:80) <- checkValues() (RunWatchdog.py:217) <- checkList() (RunWatchdog.py:192) <- checkComp() (RunWatchdog.py:401)
	DAQRun [2008-04-11 20:37:41.875543] ** Run watchdog reports stagnant components:
	    eventBuilder->dispatch backEnd.NumEventsSent not changing from 3378980
	DAQRun [2008-04-11 20:37:42.104140] Caught error in system, going to ERROR state...
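
The "not changing from N" messages above suggest the watchdog compares each counter against its previous poll. A minimal sketch of that idea follows. It is an assumption based only on the message text, not the actual RunWatchdog.py code, and the class and method names are hypothetical.

```python
class StagnationCheck:
    """Hypothetical sketch: flag a counter whose value is unchanged since the last poll."""

    def __init__(self):
        # Last value seen for each counter name.
        self._last = {}

    def update(self, name, value):
        """Record a new reading; return a message if the counter did not move."""
        prev = self._last.get(name)
        self._last[name] = value
        if prev is not None and value == prev:
            return f"{name} not changing from {value}"
        return None


check = StagnationCheck()
name = "stringHub->eventBuilder backEnd.NumReadoutsReceived"
check.update(name, 11043762)         # first poll, nothing to compare against
print(check.update(name, 11043762))  # second poll with the same value is flagged
```

The first reading only seeds the comparison, so a counter is never flagged until it has been polled twice.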


	testdaq@sps-testdaq02:domhub ic40 quickstatus
	DOMHub (v3.8)   using command: quickstatus
	Using entry for ic40 in ~/domhubConfig.dat.
	Using hubs: sps-amanda, sps-ichub21, sps-ichub29, sps-ichub30, sps-ichub38,
	sps-ichub39, sps-ichub40, sps-ichub49, sps-ichub50, sps-ichub59, sps-ichub58,
	sps-ichub67, sps-ichub66, sps-ichub74, sps-ichub73, sps-ichub65, sps-ichub72,
	sps-ichub78, sps-ichub48, sps-ichub57, sps-ichub47, sps-ichub46, sps-ichub56,
	sps-ithub01, sps-ithub02, sps-ithub03, sps-ithub04, sps-ichub63, sps-ichub64,
	sps-ichub55, sps-ichub71, sps-ichub70, sps-ichub76, sps-ithub05, sps-ithub06,
	sps-ichub77, sps-ichub69, sps-ichub75, sps-ichub60, sps-ichub68, sps-ichub61,
	sps-ichub62, sps-ichub52, sps-ichub44, sps-ichub53, sps-ichub54, sps-ichub45
	Waiting for 47 DOMHubs to finish...
	Waiting on 46 DOMHUBs...
	Waiting on 46 DOMHUBs...
	Waiting on 15 DOMHUBs...
	All DOMHubs have finished.


	SUMMARY
	------------------------------------------------------------------
	HUB:   AM 01 02 03 04 05 06 21 29 30 38 39 40 44 45 46 47 48 49 50
	COMM:   2 32 32 32  8 32 24 60 58 58 60 60 58 54 60 58 58 60 60 59
	------------------------------------------------------------------
	------------------------------------------------------------------
	HUB:   52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
	COMM:  60 58 59 60 60 60 60 60 60 60 60 60 60 60 56 60 60 55 60 59
	------------------------------------------------------------------
	------------------------------------------------------------------
	HUB:   72 73 74 75 76 77 78
	COMM:  58 60 59 60 60 58 60
	------------------------------------------------------------------

	HUBS= 47; COMM= 2527

	***  NO PROBLEMS FOUND  ***
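
To double-check the totals line against the table, the HUB/COMM rows can be summed by hand or with a short script. This is a sketch that assumes the summary layout shown above; `parse_summary` is a hypothetical helper, not part of the domhub tool.

```python
SUMMARY = """\
HUB:   AM 01 02 03 04 05 06 21 29 30 38 39 40 44 45 46 47 48 49 50
COMM:   2 32 32 32  8 32 24 60 58 58 60 60 58 54 60 58 58 60 60 59
HUB:   52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
COMM:  60 58 59 60 60 60 60 60 60 60 60 60 60 60 56 60 60 55 60 59
HUB:   72 73 74 75 76 77 78
COMM:  58 60 59 60 60 58 60
"""


def parse_summary(text):
    """Pair each hub label with its COMM count, in the order the rows appear."""
    hubs, comm = [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "HUB:":
            hubs.extend(fields[1:])
        elif fields[0] == "COMM:":
            comm.extend(int(x) for x in fields[1:])
    return dict(zip(hubs, comm))


counts = parse_summary(SUMMARY)
print(len(counts), sum(counts.values()))
```

The two numbers printed agree with the `HUBS= 47; COMM= 2527` line in the quickstatus output.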


2050Z	Stopping the runs since they are not working. Errors in the logs are
	not helpful, same as before:
		DAQRun [2008-04-11 20:49:37.802404] ** Run watchdog reports starving components:
		    globalTrigger->eventBuilder backEnd.NumTriggerRequestsReceived not changing from 0
		    stringHub->eventBuilder backEnd.NumReadoutsReceived not changing from 0
		DAQRun [2008-04-11 20:49:47.922898] ** Run watchdog reports starving components:
		    globalTrigger->eventBuilder backEnd.NumTriggerRequestsReceived not changing from 0
		    stringHub->eventBuilder backEnd.NumReadoutsReceived not changing from 0
		DAQRun [2008-04-11 20:49:57.683845]     0 physics events (0.00 Hz), 395690 moni events, 78242 SN events, 7040 tcals
		DAQRun [2008-04-11 20:49:59.226458] #4: eventBuilder inputs: Exception("globalTrigger->eventBuilder backEnd.NumTriggerRequestsReceived is not changing") in check() (RunWatchdog.py:80) <- checkValues() (RunWatchdog.py:217) <- checkList() (RunWatchdog.py:192) <- checkComp() (RunWatchdog.py:401)
		DAQRun [2008-04-11 20:49:59.250781] ** Run watchdog reports stagnant components:
		    eventBuilder->dispatch backEnd.NumEventsSent not changing from 0
		DAQRun [2008-04-11 20:49:59.452323] Caught error in system, going to ERROR state...

	Assuming it is TWR so trying standalone.
	pdaq@sps-expcont[~] 20:52:50 (0)$ ./starti3only
	Sub-system 'daq' are now under control of anvil.
	Sub-system 'twr' are now ignored by anvil.
	2008-04-11 20:52:57: Started 'sam' service with PID=9649
	2008-04-11 20:52:57: Parameters for this run set:
	2008-04-11 20:52:57:           Run Mode: PhysicsTrig
	2008-04-11 20:52:57:          DAQ Label: sps-IC40-massive-icetop-changes-V043-i3only
	2008-04-11 20:52:57:          JeB Label: I3DAQOnly (2)
	2008-04-11 20:52:57:          PnF Label: PhysicsFiltering (1)
	2008-04-11 20:52:57:         Run Length: 28800
	2008-04-11 20:52:57:     Number of Runs: 40000
	2008-04-11 20:52:57: Started 'run' service
	2008-04-11 20:52:57: Parameters for this run:
	2008-04-11 20:52:57:         Run Number: 110817
	2008-04-11 20:52:57:           Run Mode: PhysicsTrig
	2008-04-11 20:52:57:          DAQ Label: sps-IC40-massive-icetop-changes-V043-i3only
	2008-04-11 20:52:57:          JeB Label: I3DAQOnly (2)
	2008-04-11 20:52:57:          PnF Label: PhysicsFiltering (1)
	2008-04-11 20:52:57: Starting run 110817
	2008-04-11 20:52:58: Starting 'daq' sub-system
	2008-04-11 20:54:01: Started run 110817

	Confirmed events coming in:
	DAQRun [2008-04-11 20:53:15.626853] Configuring run set...
	DAQRun [2008-04-11 20:54:00.244945] Started run 110817 on run set 1
	DAQRun [2008-04-11 20:54:01.402466]     0 physics events (0.00 Hz), 43079 moni events, 2332 SN events, 7 tcals
	DAQRun [2008-04-11 20:54:30.789136]     36873 physics events (1121.12 Hz), 390954 moni events, 78119 SN events, 7032 tcals
	DAQRun [2008-04-11 20:55:00.940214]     74774 physics events (1186.12 Hz), 504819 moni events, 151842 SN events, 80546 tcals
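
A quick way to confirm that events keep flowing is to difference the physics-event counts between successive status lines. The sketch below assumes the DAQRun status-line format quoted above; the regular expression and function names are hypothetical, and the interval rate it computes is not necessarily how DAQRun derives the Hz figure it prints.

```python
import re
from datetime import datetime

# Pattern assumed from the DAQRun status lines quoted in this entry.
STATUS_RE = re.compile(
    r"DAQRun \[(?P<ts>[^\]]+)\]\s+(?P<physics>\d+) physics events "
    r"\((?P<rate>[\d.]+) Hz\)"
)


def parse_status(line):
    """Return (timestamp, physics_count, reported_rate), or None for other lines."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S.%f")
    return ts, int(m.group("physics")), float(m.group("rate"))


def interval_rates(lines):
    """Physics-event rate (Hz) between each pair of consecutive status lines."""
    reports = [r for r in map(parse_status, lines) if r is not None]
    rates = []
    for (t0, n0, _), (t1, n1, _) in zip(reports, reports[1:]):
        dt = (t1 - t0).total_seconds()
        rates.append((n1 - n0) / dt if dt > 0 else 0.0)
    return rates


LOG = [
    "DAQRun [2008-04-11 20:54:01.402466]     0 physics events (0.00 Hz), 43079 moni events, 2332 SN events, 7 tcals",
    "DAQRun [2008-04-11 20:54:30.789136]     36873 physics events (1121.12 Hz), 390954 moni events, 78119 SN events, 7032 tcals",
    "DAQRun [2008-04-11 20:55:00.940214]     74774 physics events (1186.12 Hz), 504819 moni events, 151842 SN events, 80546 tcals",
]

for rate in interval_rates(LOG):
    print(f"{rate:.1f} Hz")
```

A rate that stays well above zero from one report to the next is the sign that the run is healthy; a zero interval rate corresponds to the "not changing" condition seen in the failed runs.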


21:03Z	Emailed run coordination list.
	Starting to prep for trip to MAPO from ICL.

	

######## See AMANDA/2008103_TWR_down.txt for info on the MAPO situation.

	On the way back to the station, I went to go check on the DNF/hose
	reel. The DNF lights were on, but the hose reel beacon was not. When I got
	there, I saw that the controller said VR1.6 and nothing else. Only the
	24V light and internal heat on light were on.

	Went back to station, had to get something to eat, so did that quickly
	and talked to the UTs, who knew nothing about the reel.
	Told Ms. Hess about the situation and that I was bailing on the safety
	stuff to try to work this out.

	My systems were down, so I spent time bringing them up so I could
	get to the documentation (turns out my USB gig-e ethernet was
	partially dead).

	In the meantime, I emailed the north on the situation from the
	Raytheon computers. Saw a UT, who asked what he should do; I said to
	check out the circuit breakers, which should be in the Cheese Palace.

	Some time later, I was getting ready to go out to check on the DNFs
	to make sure the heaters were actually on and to go open up the hose
	reel controller box and power cycle it.

	Ethan tracked me down, he had been asleep which is why I wasn't
	able to get hold of him. I gave him an executive summary.

	As I was leaving the station, he radioed me to tell me the UTs
	had gotten the hose reel controller working, though no info on how.

	I checked the 3 DNFs, all working. Went to hose reel to confirm,
	and it appears to be working, perhaps even #3 is working. I took
	readings. They must have taken off the insulation as it was very
	loose. Driller's tape was not working at all, too cold I guess, so
	I rigged up a couple of webbing straps to help hold it together then
	used the blanket to put over it all.

	When I got back, I started a combined run. I had asked Ethan
	to do it, but the safety meeting must have gone on longer than
	expected.

	Now I need to report back on the hose reel readings, see ya later!


Edgar Nielsen | 11 Apr 2008 23:05 GMT | Power Outages | | permalink