Showing posts with label active. Show all posts

Friday, March 30, 2012

Question Title: 2000 SQL Cluster Failure (Active/Passive)

(Sorry to post this here, but there seems to be no activity on the
cluster newsgroup.)
We are having an issue: about every 3 months or so, our SQL cluster
(active/passive) fails and goes completely offline. The only recourse is to
power the boxes off and on to restore connectivity.
By the looks of it, the primary SQL node fails and dumps, and the
passive node senses the failure, comes online, brings up the resources and
starts SQL Server. Shortly after that, however, the SQL Server service fails and
the whole cluster goes down and is unreachable (via TCP or remote desktop). We
have to cycle the power.
I'd appreciate any advice or insight you can give me on this situation.
Details are below.
Have a good weekend!
James
Details:
SQL Server Enterprise (2000)
Build: 8.00.760 (SP3)
Windows Enterprise Server (2003)
Build: 5.2(3790)
Basic Timeline & Errors:
SQLN01:
No events written to the Application or System Windows event logs
No errors in the cluster log; only INFO entries logged
SQL Server Error Log: Error: 1203, Severity: 20, State 1
SQL Server Error Log: Process ID 58 attempting to unlock unowned resource
RID: 8:1:339:43
Those SQL errors are reported numerous times, and then:
SQL Server Error Log: SQL Server Assertion: File: <lckmgr.cpp>, line-4792,
Failed Assertion = 'lockFound ==TRUE'
SQL Server Error Log: Stack Signature for the dump is 0xFEDF6C17
SQL Server Error Log: Using 'dbghelp.dll' version '4.0.5'*Dump thread...
SQL Server Error Log: Login failed for user 'sa'
That repeats about 30 times, and that's all for the logs.
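The 1203 messages identify the lock by a resource string. In SQL Server lock output a RID resource is conventionally written as database ID:file ID:page ID:slot, so `RID: 8:1:339:43` would point at database 8, file 1, page 339, slot 43. Below is a minimal Python sketch for pulling those RIDs out of a saved copy of the error log and counting how often each one repeats. The function name `rid_counts` and the sample lines are illustrative assumptions, and the sketch assumes each message has been joined back onto one line (in the post it wraps).

```python
import re
from collections import Counter

# Matches "RID: dbid:fileid:pageid:slot" as it appears in the 1203 message text.
RID_PATTERN = re.compile(r"RID:\s*(\d+):(\d+):(\d+):(\d+)")

def rid_counts(log_lines):
    """Count how often each (db_id, file_id, page_id, slot) RID appears."""
    counts = Counter()
    for line in log_lines:
        match = RID_PATTERN.search(line)
        if match:
            counts[tuple(int(part) for part in match.groups())] += 1
    return counts

if __name__ == "__main__":
    # Hypothetical excerpt, with each message rejoined onto a single line.
    sample = [
        "Error: 1203, Severity: 20, State: 1",
        "Process ID 58 attempting to unlock unowned resource RID: 8:1:339:43",
        "Process ID 58 attempting to unlock unowned resource RID: 8:1:339:43",
    ]
    for (db_id, file_id, page_id, slot), n in rid_counts(sample).items():
        print(f"db={db_id} file={file_id} page={page_id} slot={slot}: {n} time(s)")
```

If the same RID keeps coming back across incidents, running `SELECT DB_NAME(8)` on the server would show which database that row lives in, which narrows down where to look.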
SQLN02:
Numerous System events record lost communication with the cluster and
SQLN02 being brought into active mode (Events 1123, 1209 and 1200).
System and Application events show the SQL services starting.
System Event: 1069 (Failover Mgr) Cluster resource "SQL Server" in Resource
Group XYZ failed
System Event: 7035 The SQLSERVERAGENT service successfully sent a stop control
System Event: 7036 The SQLSERVERAGENT service successfully stopped
System Event: Multiple events record successful startup of the SQL and Cluster
services, and then nothing in the System log until the "previous shutdown was
unexpected" event.
Application Event: 17052 [sqsrvres] CheckServiceAlive: Service is dead
Application Event: 17052 [sqsrvres] OnlineThread: service stopped while
waiting for QP.
Application Event: 102 SQLServerAgent service successfully stopped.
Application Event: Multiple events showing the restart of SQL Server, and then
nothing until the power is cycled.
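The cluster log excerpt from SQLN02 was cut down to WARN and ERR entries. Each entry starts with `process.thread::YYYY/MM/DD-HH:MM:SS.mmm` followed by a level, so a filter only needs to read that prefix. Below is a minimal Python sketch, assuming each entry has been joined back onto a single line (the copy in the post is wrapped). The state names in `RESOURCE_STATES` are taken to be the Windows `CLUSTER_RESOURCE_STATE` values (2 = online, 4 = failed), so `old state=2 new state=4` would mean the SQL Server resource went from online to failed. Treat that mapping, and the function names, as assumptions to verify against the cluster API documentation.

```python
import re

# Prefix of a cluster log entry, as seen in the excerpt:
#   <process id>.<thread id>::<YYYY/MM/DD-HH:MM:SS.mmm> <LEVEL> <message>
ENTRY = re.compile(
    r"^(?P<pid>[0-9a-fA-F]{8})\.(?P<tid>[0-9a-fA-F]{8})::"
    r"(?P<stamp>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+(?P<message>.*)$"
)

# Assumed mapping from CLUSTER_RESOURCE_STATE values to names.
RESOURCE_STATES = {0: "Inherited", 1: "Initializing", 2: "Online",
                   3: "Offline", 4: "Failed", 128: "Pending",
                   129: "OnlinePending", 130: "OfflinePending"}

TRANSITION = re.compile(r"old state=(\d+) new state=(\d+)")

def warnings_and_errors(lines):
    """Yield (timestamp, level, message) for WARN and ERR entries only."""
    for line in lines:
        match = ENTRY.match(line.strip())
        if match and match.group("level") in ("WARN", "ERR"):
            yield match.group("stamp"), match.group("level"), match.group("message")

def describe_transition(message):
    """Translate 'old state=N new state=M' into state names, if present."""
    match = TRANSITION.search(message)
    if not match:
        return None
    old, new = (int(value) for value in match.groups())
    return RESOURCE_STATES.get(old, str(old)), RESOURCE_STATES.get(new, str(new))

if __name__ == "__main__":
    sample = [
        "0000090c.000009c4::2005/05/19-20:17:23.946 WARN [FM] "
        "FmpHandleResourceTransition: Resource Name = "
        "eea79949-1b03-42e8-a690-9dc987e72063 [SQL Server] old state=2 new state=4",
        # Hypothetical INFO entry, which the filter drops.
        "00000bb8.00000bd4::2005/05/19-20:17:23.931 INFO informational entry",
    ]
    for stamp, level, message in warnings_and_errors(sample):
        print(stamp, level, describe_transition(message))
```

Filtering on the level field rather than searching for the words "WARN" or "ERR" anywhere in the line avoids false matches inside message text.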
From the cluster log on SQLN02 (WARN and ERR messages only):
0000090c.00000974::2005/05/19-20:16:02.815 WARN [NM] Communication was lost
with interface 2115bee3-ada3-4b23-94ab-67de328d0969 (node: SQLN01, network:
PUBLIC)
0000090c.00000a90::2005/05/19-20:16:02.815 WARN [NM] Updating local
connectivity info for network ac5dcdf2-4ec6-431c-8d0e-75a9f388f945.
00000bb8.00000d88::2005/05/19-20:16:28.895 WARN Physical Disk <Disk Q:>:
[DiskArb] Assume ownership of the device.
0000090c.00000a90::2005/05/19-20:16:28.911 WARN [NM] Leadership changed.
Cancelling connectivity report for network
ac5dcdf2-4ec6-431c-8d0e-75a9f388f945.
00000bb8.0000152c::2005/05/19-20:16:33.645 WARN Network Name <Cluster Name>:
Unable to read CreatingDC parameter, error=2
00000bb8.000004c0::2005/05/19-20:16:33.677 WARN Network Name <SQL Network
Name(VRSQL)>: Unable to read ResourceData parameter, error=2
00000bb8.000004c0::2005/05/19-20:16:33.770 WARN [ClNet] Tcpip is not bound
to adapter 0470CF36-FDC3-446D-8738-756DE859CB7A. (comment -> disabled network
adapter)
00000bb8.00000940::2005/05/19-20:16:38.286 WARN Physical Disk <Disk L:>:
[DiskArb] Assume ownership of the device.
00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>:
[sqsrvres] OnlineThread: service stopped while waiting for QP.
00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>:
[sqsrvres] OnlineThread: Error 1 bringing resource online.
00000bb8.00000bd4::2005/05/19-20:17:23.931 ERR SQL Server <SQL Server>:
[sqsrvres] CheckServiceAlive: Service is dead
0000090c.000009c4::2005/05/19-20:17:23.946 WARN [FM]
FmpHandleResourceTransition: Resource Name = eea79949-1b03-42e8-a690-9dc987e72063 [SQL Server] old state=2 new state=4

Yes, we run DBCC every weekend.
We also have AWE enabled, with 12 GB of RAM.
Looks like we will have to wait for a fix to SP4.
Thanks for your help!
James
"Mike Epprecht (SQL MVP)" wrote:
> Hi
> Have you run DBCC CheckDB on the databases?
> If you do not have more than 2 GB of RAM, you could install SP4 for SQL Server,
> as there was one known issue after SP3a that could have caused this error.
> Regards
> --
> Mike Epprecht, Microsoft SQL Server MVP
> Zurich, Switzerland
> MVP Program: http://www.microsoft.com/mvp
> Blog: http://www.msmvps.com/epprecht/

Hi
Have you run DBCC CHECKDB on the databases?
If you do not have more than 2 GB of RAM, you could install SP4 for SQL Server,
as there was one known issue after SP3a that could have caused this error.
Regards
--
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland
MVP Program: http://www.microsoft.com/mvp
Blog: http://www.msmvps.com/epprecht/
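The suggestion above hinges on which service pack each node is running. The build in the original post, 8.00.760, is the SP3/SP3a build, and SP4 reports 8.00.2039. Below is a small Python lookup of the commonly cited SQL Server 2000 builds, assuming the version string was read on each node with `SELECT SERVERPROPERTY('ProductVersion')`. The table is deliberately limited to a few well-known builds, anything else is reported as a hotfix or unlisted build, and the function names are hypothetical.

```python
# Commonly cited SQL Server 2000 service-pack build numbers (assumption:
# double-check against Microsoft's build list before relying on it).
SQL2000_SERVICE_PACKS = {
    194: "RTM",
    384: "SP1",
    760: "SP3/SP3a",
    2039: "SP4",
}

SP4_BUILD = 2039

def parse_build(product_version):
    """Return the build number from a version string such as '8.00.760'."""
    major, _minor, build = product_version.strip().split(".")[:3]
    if major != "8":
        raise ValueError(f"not a SQL Server 2000 version: {product_version!r}")
    return int(build)

def describe(product_version):
    """Label a SQL Server 2000 version and say whether it is at SP4 or later."""
    build = parse_build(product_version)
    label = SQL2000_SERVICE_PACKS.get(build, "hotfix or unlisted build")
    return label, build >= SP4_BUILD

if __name__ == "__main__":
    for version in ("8.00.760", "8.00.2039"):
        label, at_sp4 = describe(version)
        print(f"{version}: {label}, SP4 or later: {at_sp4}")
```

This only tells you where a node stands. Because the SP4 advice was conditional on having no more than 2 GB of RAM, and the poster runs 12 GB with AWE enabled, whether to upgrade is a separate decision.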

Question Title: 2000 SQL Cluster Failure (Active/Passive)

(sorry to post this here, but it looks like there is no activity on the
cluster newsgroup)
We are having an issue about every 3 months or so our SQL cluster
(active/passive) will fail & go completely offline. The only recourse is to
power off/on the boxes to restore connectivity.
By the looks of it, the primary SQL node will fail & dump, the second
passive node will sense the failure & come online, bring up the resources &
start up SQL, but shortly after that the SQL Service will fail & the whole
cluster will go down & is unreachable. (via tcp or desktop) We have to
cycle the power.
I appreciate any advise or insight you can give me on the is situation.
Details are below.
Have a good weekend!
James
Details:
SQL Server Enterprise (2000)
Build: 8.00.760 (SP3)
Windows Enterprise Server (2003)
Build: 5.2(3790)
Basic Timeline & Errors:
SQLN01:
No Events written to Application or System Windows Event Logs
No errors in the Cluster Log. Just INFO logged
SQL Server Error Log: Error: 1203, Severity: 20, State 1
SQL Server Error Log: Process ID 58 attempting to unlock unowned resource
RID: 8:1:339:43
Those SQL errors report numerous times & then:
SQL Server Error Log: SQL Server Assertion: File: <lckmgr.cpp>, line-4792,
Failed Assertion = 'lockFound ==TRUE'
SQL Server Error Log: Stack Signature for the dump is 0xFEDF6C17
SQL Server Error Log: Using 'dbghelp.dll' version '4.0.5'*Dump thread...
SQL Server Error Log: Login failed for user 'sa'
That repeats about 30 times & that all for the logs...
SQLN02:
Numerous System Events recorded of lost communication with cluster & bring
SQLN02 into active mode. (Event 1123, 1209 & 1200)
System & Application Events show start of SQL Services.
System Event: 1069 (Failover Mgr) Cluster resource "SQL Server" in Resource
Group XYZ failed
System Event: 7035 The SQLSERVERAGENT service successfully sent a stop contr
ol
System Event: 7036 The SQLSERVERAGENT service successfully stopped
System Event: Multiple Events record successfully startup of SQL & Cluster
Service & then nothing in system events until previous shutdown was
unexpected.
Application Event: 17052 [sqsrvres] CheckServiceAlive: Service is dead
Application Event: 17052 [sqsrvres] OnlineThread: service stopped while
waiting for QP.
Application Event: 102 SQLServerAgent service successfully stopped.
Application Event: Multiple Events showing the restart of SQL & then
nothing until the power is cycled.
From the Cluster logs on SQLN02: (only WARN or ERROR) messages
0000090c.00000974::2005/05/19-20:16:02.815 WARN [NM] Communication was l
ost
with interface 2115bee3-ada3-4b23-94ab-67de328d0969 (node: SQLN01, network:
PUBLIC)
0000090c.00000a90::2005/05/19-20:16:02.815 WARN [NM] Updating local
connectivity info for network ac5dcdf2-4ec6-431c-8d0e-75a9f388f945.
00000bb8.00000d88::2005/05/19-20:16:28.895 WARN Physical Disk <Disk Q:>:
[DiskArb] Assume ownership of the device.
0000090c.00000a90::2005/05/19-20:16:28.911 WARN [NM] Leadership changed.
Cancelling connectivity report for network
ac5dcdf2-4ec6-431c-8d0e-75a9f388f945.
00000bb8.0000152c::2005/05/19-20:16:33.645 WARN Network Name <Cluster Name>:
Unable to read CreatingDC parameter, error=2
00000bb8.000004c0::2005/05/19-20:16:33.677 WARN Network Name <SQL Network
Name(VRSQL)>: Unable to read ResourceData parameter, error=2
00000bb8.000004c0::2005/05/19-20:16:33.770 WARN [ClNet] Tcpip is not bou
nd
to adapter 0470CF36-FDC3-446D-8738-756DE859CB7A. (comment -> disabled networ
k
adapter)
00000bb8.00000940::2005/05/19-20:16:38.286 WARN Physical Disk <Disk L:>:
[DiskArb] Assume ownership of the device.
00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>:
[sqsrvres] OnlineThread: service stopped while waiting for QP.
00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>:
[sqsrvres] OnlineThread: Error 1 bringing resource online.
00000bb8.00000bd4::2005/05/19-20:17:23.931 ERR SQL Server <SQL Server>:
[sqsrvres] CheckServiceAlive: Service is dead
0000090c.000009c4::2005/05/19-20:17:23.946 WARN [FM]
FmpHandleResourceTransition: Resource Name =
eea79949-1b03-42e8-a690-9dc987e72063 [SQL Server] old state=2 new state=
4Yes we run DBCC every weekend.
We also have AWE enabled @. 12GB RAM.
Looks like we will have to wait for a fix to sp4.
Thanks for your help!
James
"Mike Epprecht (SQL MVP)" wrote:
[vbcol=seagreen]
> Hi
> Have you run DBCC CheckDB on the databases?
> If you do not have more than 2GB RAM, you could install SP4 for SQL Server
> as there was one know issue after SP3a that could have caused this error.
> Regards
> --
> Mike Epprecht, Microsoft SQL Server MVP
> Zurich, Switzerland
> MVP Program: http://www.microsoft.com/mvp
> Blog: http://www.msmvps.com/epprecht/
>
> "Death_n_Gravity" wrote:
>|||Hi
Have you run DBCC CheckDB on the databases?
If you do not have more than 2GB RAM, you could install SP4 for SQL Server
as there was one know issue after SP3a that could have caused this error.
Regards
--
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland
MVP Program: http://www.microsoft.com/mvp
Blog: http://www.msmvps.com/epprecht/
"Death_n_Gravity" wrote:
[vbcol=seagreen]
> (sorry to post this here, but it looks like there is no activity on the
> cluster newsgroup)
> We are having an issue about every 3 months or so our SQL cluster
> (active/passive) will fail & go completely offline. The only recourse is
to
> power off/on the boxes to restore connectivity.
> By the looks of it, the primary SQL node will fail & dump, the second
> passive node will sense the failure & come online, bring up the resources
&
> start up SQL, but shortly after that the SQL Service will fail & the whole
> cluster will go down & is unreachable. (via tcp or desktop) We have to
> cycle the power.
> I appreciate any advise or insight you can give me on the is situation.
> Details are below.
> Have a good weekend!
> James
>
> Details:
> SQL Server Enterprise (2000)
> Build: 8.00.760 (SP3)
> Windows Enterprise Server (2003)
> Build: 5.2(3790)
> Basic Timeline & Errors:
> SQLN01:
> No Events written to Application or System Windows Event Logs
> No errors in the Cluster Log. Just INFO logged
> SQL Server Error Log: Error: 1203, Severity: 20, State 1
> SQL Server Error Log: Process ID 58 attempting to unlock unowned resource
> RID: 8:1:339:43
> Those SQL errors report numerous times & then:
> SQL Server Error Log: SQL Server Assertion: File: <lckmgr.cpp>, line-4792
,
> Failed Assertion = 'lockFound ==TRUE'
> SQL Server Error Log: Stack Signature for the dump is 0xFEDF6C17
> SQL Server Error Log: Using 'dbghelp.dll' version '4.0.5'*Dump thread...
> SQL Server Error Log: Login failed for user 'sa'
> That repeats about 30 times & that all for the logs...
> SQLN02:
> Numerous System Events recorded of lost communication with cluster & bring
> SQLN02 into active mode. (Event 1123, 1209 & 1200)
> System & Application Events show start of SQL Services.
> System Event: 1069 (Failover Mgr) Cluster resource "SQL Server" in Resourc
e
> Group XYZ failed
> System Event: 7035 The SQLSERVERAGENT service successfully sent a stop con
trol
> System Event: 7036 The SQLSERVERAGENT service successfully stopped
> System Event: Multiple Events record successfully startup of SQL & Cluster
> Service & then nothing in system events until previous shutdown was
> unexpected.
> Application Event: 17052 [sqsrvres] CheckServiceAlive: Service is dea
d
> Application Event: 17052 [sqsrvres] OnlineThread: service stopped whi
le
> waiting for QP.
> Application Event: 102 SQLServerAgent service successfully stopped.
> Application Event: Multiple Events showing the restart of SQL & then
> nothing until the power is cycled.
> From the Cluster logs on SQLN02: (only WARN or ERROR) messages
> 0000090c.00000974::2005/05/19-20:16:02.815 WARN [NM] Communication was
lost
> with interface 2115bee3-ada3-4b23-94ab-67de328d0969 (node: SQLN01, network
:
> PUBLIC)
> 0000090c.00000a90::2005/05/19-20:16:02.815 WARN [NM] Updating local
> connectivity info for network ac5dcdf2-4ec6-431c-8d0e-75a9f388f945.
> 00000bb8.00000d88::2005/05/19-20:16:28.895 WARN Physical Disk <Disk Q:>:
> [DiskArb] Assume ownership of the device.
> 0000090c.00000a90::2005/05/19-20:16:28.911 WARN [NM] Leadership change
d.
> Cancelling connectivity report for network
> ac5dcdf2-4ec6-431c-8d0e-75a9f388f945.
> 00000bb8.0000152c::2005/05/19-20:16:33.645 WARN Network Name <Cluster Name
>:
> Unable to read CreatingDC parameter, error=2
> 00000bb8.000004c0::2005/05/19-20:16:33.677 WARN Network Name <SQL Network
> Name(VRSQL)>: Unable to read ResourceData parameter, error=2
> 00000bb8.000004c0::2005/05/19-20:16:33.770 WARN [ClNet] Tcpip is not b
ound
> to adapter 0470CF36-FDC3-446D-8738-756DE859CB7A. (comment -> disabled netw
ork
> adapter)
> 00000bb8.00000940::2005/05/19-20:16:38.286 WARN Physical Disk <Disk L:>:
> [DiskArb] Assume ownership of the device.
> 00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>:
> [sqsrvres] OnlineThread: service stopped while waiting for QP.
> 00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>:
> [sqsrvres] OnlineThread: Error 1 bringing resource online.
> 00000bb8.00000bd4::2005/05/19-20:17:23.931 ERR SQL Server <SQL Server>:
> [sqsrvres] CheckServiceAlive: Service is dead
> 0000090c.000009c4::2005/05/19-20:17:23.946 WARN [FM]
> FmpHandleResourceTransition: Resource Name =
> eea79949-1b03-42e8-a690-9dc987e72063 [SQL Server] old state=2 new state=4[/vbc
ol]

Question Title: 2000 SQL Cluster Failure (Active/Passive)

(sorry to post this here, but it looks like there is no activity on the
cluster newsgroup)
We are having an issue about every 3 months or so our SQL cluster
(active/passive) will fail & go completely offline. The only recourse is to
power off/on the boxes to restore connectivity.
By the looks of it, the primary SQL node will fail & dump, the second
passive node will sense the failure & come online, bring up the resources &
start up SQL, but shortly after that the SQL Service will fail & the whole
cluster will go down & is unreachable. (via tcp or desktop) We have to
cycle the power.
I appreciate any advise or insight you can give me on the is situation.
Details are below.
Have a good weekend!
James
Details:
SQL Server Enterprise (2000)
Build: 8.00.760 (SP3)
Windows Enterprise Server (2003)
Build: 5.2(3790)
Basic Timeline & Errors:
SQLN01:
No Events written to Application or System Windows Event Logs
No errors in the Cluster Log. Just INFO logged
SQL Server Error Log: Error: 1203, Severity: 20, State 1
SQL Server Error Log: Process ID 58 attempting to unlock unowned resource
RID: 8:1:339:43
Those SQL errors report numerous times & then:
SQL Server Error Log: SQL Server Assertion: File: <lckmgr.cpp>, line-4792,
Failed Assertion = 'lockFound ==TRUE'
SQL Server Error Log: Stack Signature for the dump is 0xFEDF6C17
SQL Server Error Log: Using 'dbghelp.dll' version '4.0.5'*Dump thread...
SQL Server Error Log: Login failed for user 'sa'
That repeats about 30 times & that all for the logs...
SQLN02:
Numerous System Events recorded of lost communication with cluster & bring
SQLN02 into active mode. (Event 1123, 1209 & 1200)
System & Application Events show start of SQL Services.
System Event: 1069 (Failover Mgr) Cluster resource "SQL Server" in Resource
Group XYZ failed
System Event: 7035 The SQLSERVERAGENT service successfully sent a stop control
System Event: 7036 The SQLSERVERAGENT service successfully stopped
System Event: Multiple Events record successfully startup of SQL & Cluster
Service & then nothing in system events until previous shutdown was
unexpected.
Application Event: 17052 [sqsrvres] CheckServiceAlive: Service is dead
Application Event: 17052 [sqsrvres] OnlineThread: service stopped while
waiting for QP.
Application Event: 102 SQLServerAgent service successfully stopped.
Application Event: Multiple Events showing the restart of SQL & then
nothing until the power is cycled.
From the Cluster logs on SQLN02: (only WARN or ERROR) messages
0000090c.00000974::2005/05/19-20:16:02.815 WARN [NM] Communication was lost
with interface 2115bee3-ada3-4b23-94ab-67de328d0969 (node: SQLN01, network:
PUBLIC)
0000090c.00000a90::2005/05/19-20:16:02.815 WARN [NM] Updating local
connectivity info for network ac5dcdf2-4ec6-431c-8d0e-75a9f388f945.
00000bb8.00000d88::2005/05/19-20:16:28.895 WARN Physical Disk <Disk Q:>:
[DiskArb] Assume ownership of the device.
0000090c.00000a90::2005/05/19-20:16:28.911 WARN [NM] Leadership changed.
Cancelling connectivity report for network
ac5dcdf2-4ec6-431c-8d0e-75a9f388f945.
00000bb8.0000152c::2005/05/19-20:16:33.645 WARN Network Name <Cluster Name>:
Unable to read CreatingDC parameter, error=2
00000bb8.000004c0::2005/05/19-20:16:33.677 WARN Network Name <SQL Network
Name(VRSQL)>: Unable to read ResourceData parameter, error=2
00000bb8.000004c0::2005/05/19-20:16:33.770 WARN [ClNet] Tcpip is not bound
to adapter 0470CF36-FDC3-446D-8738-756DE859CB7A. (comment -> disabled network
adapter)
00000bb8.00000940::2005/05/19-20:16:38.286 WARN Physical Disk <Disk L:>:
[DiskArb] Assume ownership of the device.
00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>:
[sqsrvres] OnlineThread: service stopped while waiting for QP.
00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>:
[sqsrvres] OnlineThread: Error 1 bringing resource online.
00000bb8.00000bd4::2005/05/19-20:17:23.931 ERR SQL Server <SQL Server>:
[sqsrvres] CheckServiceAlive: Service is dead
0000090c.000009c4::2005/05/19-20:17:23.946 WARN [FM]
FmpHandleResourceTransition: Resource Name =
eea79949-1b03-42e8-a690-9dc987e72063 [SQL Server] old state=2 new state=4
Yes we run DBCC every weekend.
We also have AWE enabled @. 12GB RAM.
Looks like we will have to wait for a fix to sp4.
Thanks for your help!
James
"Mike Epprecht (SQL MVP)" wrote:
[vbcol=seagreen]
> Hi
> Have you run DBCC CheckDB on the databases?
> If you do not have more than 2GB RAM, you could install SP4 for SQL Server
> as there was one know issue after SP3a that could have caused this error.
> Regards
> --
> Mike Epprecht, Microsoft SQL Server MVP
> Zurich, Switzerland
> MVP Program: http://www.microsoft.com/mvp
> Blog: http://www.msmvps.com/epprecht/
>
> "Death_n_Gravity" wrote:
|||Hi
Have you run DBCC CheckDB on the databases?
If you do not have more than 2GB RAM, you could install SP4 for SQL Server
as there was one know issue after SP3a that could have caused this error.
Regards
Mike Epprecht, Microsoft SQL Server MVP
Zurich, Switzerland
MVP Program: http://www.microsoft.com/mvp
Blog: http://www.msmvps.com/epprecht/
"Death_n_Gravity" wrote:

> (sorry to post this here, but it looks like there is no activity on the
> cluster newsgroup)
> We are having an issue about every 3 months or so our SQL cluster
> (active/passive) will fail & go completely offline. The only recourse is to
> power off/on the boxes to restore connectivity.
> By the looks of it, the primary SQL node will fail & dump, the second
> passive node will sense the failure & come online, bring up the resources &
> start up SQL, but shortly after that the SQL Service will fail & the whole
> cluster will go down & is unreachable. (via tcp or desktop) We have to
> cycle the power.
> I appreciate any advise or insight you can give me on the is situation.
> Details are below.
> Have a good weekend!
> James
>
> Details:
> SQL Server Enterprise (2000)
> Build: 8.00.760 (SP3)
> Windows Enterprise Server (2003)
> Build: 5.2(3790)
> Basic Timeline & Errors:
> SQLN01:
> No Events written to Application or System Windows Event Logs
> No errors in the Cluster Log. Just INFO logged
> SQL Server Error Log: Error: 1203, Severity: 20, State 1
> SQL Server Error Log: Process ID 58 attempting to unlock unowned resource
> RID: 8:1:339:43
> Those SQL errors report numerous times & then:
> SQL Server Error Log: SQL Server Assertion: File: <lckmgr.cpp>, line-4792,
> Failed Assertion = 'lockFound ==TRUE'
> SQL Server Error Log: Stack Signature for the dump is 0xFEDF6C17
> SQL Server Error Log: Using 'dbghelp.dll' version '4.0.5'*Dump thread...
> SQL Server Error Log: Login failed for user 'sa'
> That repeats about 30 times & that all for the logs...
> SQLN02:
> Numerous System Events recorded of lost communication with cluster & bring
> SQLN02 into active mode. (Event 1123, 1209 & 1200)
> System & Application Events show start of SQL Services.
> System Event: 1069 (Failover Mgr) Cluster resource "SQL Server" in Resource
> Group XYZ failed
> System Event: 7035 The SQLSERVERAGENT service successfully sent a stop control
> System Event: 7036 The SQLSERVERAGENT service successfully stopped
> System Event: Multiple Events record successfully startup of SQL & Cluster
> Service & then nothing in system events until previous shutdown was
> unexpected.
> Application Event: 17052 [sqsrvres] CheckServiceAlive: Service is dead
> Application Event: 17052 [sqsrvres] OnlineThread: service stopped while
> waiting for QP.
> Application Event: 102 SQLServerAgent service successfully stopped.
> Application Event: Multiple Events showing the restart of SQL & then
> nothing until the power is cycled.
> From the Cluster logs on SQLN02: (only WARN or ERROR) messages
> 0000090c.00000974::2005/05/19-20:16:02.815 WARN [NM] Communication was lost with interface 2115bee3-ada3-4b23-94ab-67de328d0969 (node: SQLN01, network: PUBLIC)
> 0000090c.00000a90::2005/05/19-20:16:02.815 WARN [NM] Updating local connectivity info for network ac5dcdf2-4ec6-431c-8d0e-75a9f388f945.
> 00000bb8.00000d88::2005/05/19-20:16:28.895 WARN Physical Disk <Disk Q:>: [DiskArb] Assume ownership of the device.
> 0000090c.00000a90::2005/05/19-20:16:28.911 WARN [NM] Leadership changed. Cancelling connectivity report for network ac5dcdf2-4ec6-431c-8d0e-75a9f388f945.
> 00000bb8.0000152c::2005/05/19-20:16:33.645 WARN Network Name <Cluster Name>: Unable to read CreatingDC parameter, error=2
> 00000bb8.000004c0::2005/05/19-20:16:33.677 WARN Network Name <SQL Network Name(VRSQL)>: Unable to read ResourceData parameter, error=2
> 00000bb8.000004c0::2005/05/19-20:16:33.770 WARN [ClNet] Tcpip is not bound to adapter 0470CF36-FDC3-446D-8738-756DE859CB7A. (comment -> disabled network adapter)
> 00000bb8.00000940::2005/05/19-20:16:38.286 WARN Physical Disk <Disk L:>: [DiskArb] Assume ownership of the device.
> 00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>: [sqsrvres] OnlineThread: service stopped while waiting for QP.
> 00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>: [sqsrvres] OnlineThread: Error 1 bringing resource online.
> 00000bb8.00000bd4::2005/05/19-20:17:23.931 ERR SQL Server <SQL Server>: [sqsrvres] CheckServiceAlive: Service is dead
> 0000090c.000009c4::2005/05/19-20:17:23.946 WARN [FM] FmpHandleResourceTransition: Resource Name = eea79949-1b03-42e8-a690-9dc987e72063 [SQL Server] old state=2 new state=4
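The SQLN02 excerpt above was filtered down to WARN and ERR entries. Below is a minimal Python sketch of that filtering. It assumes each cluster log entry sits on one line in the `process.thread::YYYY/MM/DD-HH:MM:SS.mmm LEVEL message` shape seen in the excerpt; the newsgroup wrapped them. The regex and the function name are illustrative, not part of any Microsoft tool.

```python
import re
from typing import Iterable, Iterator, Tuple

# Assumed entry layout, inferred from the excerpt above:
# "<process>.<thread>::<YYYY/MM/DD-HH:MM:SS.mmm> <LEVEL> <message>"
ENTRY = re.compile(
    r"^(?P<proc>[0-9a-fA-F]{8})\.(?P<thread>[0-9a-fA-F]{8})::"
    r"(?P<stamp>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>\w+)\s+(?P<msg>.*)$"
)


def warnings_and_errors(lines: Iterable[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (timestamp, level, message) for WARN and ERR entries only."""
    for line in lines:
        m = ENTRY.match(line.strip())
        # Lines that do not match the assumed layout are skipped silently.
        if m and m.group("level") in ("WARN", "ERR"):
            yield m.group("stamp"), m.group("level"), m.group("msg")


if __name__ == "__main__":
    sample = [
        "00000bb8.0000165c::2005/05/19-20:17:19.759 ERR SQL Server <SQL Server>: "
        "[sqsrvres] OnlineThread: Error 1 bringing resource online.",
        # Made-up INFO entry, only to show that it is filtered out.
        "0000090c.00000974::2005/05/19-20:16:00.000 INFO [NM] made-up informational line",
    ]
    for stamp, level, msg in warnings_and_errors(sample):
        print(stamp, level, msg)
```

Run against a whole cluster log file (for example `warnings_and_errors(open(path))`), this gives the same kind of WARN/ERR-only view quoted above.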

Monday, March 26, 2012

Question regarding configuration of an Active/Active cluster

Hey. I have an Active/Active cluster with SQL 2000 on the nodes right now. I have to migrate to SQL 2005 and need some help regarding service accounts.

Should I use the same account for the cluster service and the SQL Server/Agent services? If not, what permissions do I have to give the cluster service account in SQL? I have a cluster with 2 nodes. What permissions should the SQL account be given on the box? Should it be a local admin, or do I just supply the account during installation and let SQL Setup worry about granting permissions on the box? Thank you.

Tej,

According to MS:

- You should have different accounts for the cluster and SQL. The reason is that if the cluster account password changes, it does not affect SQL.

- You should create these accounts as normal domain user accounts, but give them admin privileges on the SQL servers. When installing/configuring SQL and cluster services, these accounts will be given the correct permissions.

Check out this document

www.windowscluster.com/msdocs/confclus.doc

Question regarding "New Role Assignment"

Hi All,
I have a SharePoint server and a report server on two different machines but
in the same domain. Our application categorizes users from Active Directory
into SharePoint site groups. For example, Marketing, Finance, and Manager are
site groups in our application.
Is it possible to add these site groups in "New Role Assignment" for a given
report so that each member of the site group can access that report?
Thanks in advance for any assistance provided.
Kunjal
|||Try using domain\Marketing
|||I'm not sure you can use SharePoint site groups in New Role Assignment, as
they are probably SharePoint-only.
But you can use AD groups in Reporting Services, like Dillig says. If you
base both SharePoint Sitegroups and your Reporting Services Roles on the
same AD groups, you should be fine. Add <domainname>\<groupname> to the
roles you need to add them to.
Kaisa M. Lindahl

Question RE /PAE In Active-Active Cluster

I have a Windows Server 2003 Enterprise Edition two-node cluster with 8 GB of
memory in each node.
On each active node, I am running a single named instance of SQL Server 2000
Enterprise Edition with the latest service packs.
The boot.ini on the servers is currently configured with the /PAE switch only.
Each SQL named instance is configured to manage memory dynamically. There are
no plans to add more memory to the servers.
I need to budget memory on each node so that, after a failover, either node
can run both SQL Server instances.
What impact does the /PAE switch have on this configuration, and would it be
better to use the /3GB switch instead? I believe that would give each SQL
Server 3 GB of memory when both run on the same node, with 1 GB left for
the OS.
Would the /PAE switch then not be necessary?
Thanks,
Tom
/3GB, /PAE, and AWE, plus setting your memory with sp_configure in SQL, are
all good ideas, but each has to be monitored and tested with your application,
hardware, and exact configuration. Have you asked the hardware vendor what
they suggest from a hardware standpoint?
I am sure a true SQL DBA/MVP like Geoff will follow up with a detailed SQL
explanation of the switches above.
Cheers,
Rod
MVP - Windows Server - Clustering
http://www.nw-america.com - Clustering Website
http://msmvps.com/clustering - Blog
http://www.clusterhelp.com - Cluster Training
|||Right now, your SQL Instances get about 1.6GB of physical RAM each. They
should increase memory usage until they get to that level then stay there
long term.
/PAE lets the Operating System see all the physical memory in the servers.
It is necessary anytime a 32-bit OS has more than 4 GB of physical RAM.
I would consider enabling AWE memory in SQL. If you do, you must set a maximum memory
value for SQL on a multi-instance cluster. Two SQL behaviors combine to
make this a necessity. First, AWE kills dynamic memory allocation. Memory
is allocated at service startup and never shrinks. Second, the maximum
memory value is set to the actual physical memory amount. SQL grabs
everything. You will need to set the actual memory values on both instances
so they can "stack" on the same server without overcommitting memory. Be
sure to leave some for the OS. I would recommend 3GB or so, but you can
watch the Memory | Available MBytes counter to check.
As for the /3GB switch, you could go that route. You won't replace the /PAE
switch, you will add this switch to the boot.ini startup options.
One option I have used in this situation was to set up the two instances
with asymmetrical memory allocation. If you know one server is under higher
load, you can allocate more memory to that instance. I ran a similar system
to yours with 4GB on one instance, 2GB on the other instance, and 2GB left
for the OS. The perfmon counter SQLServer:Buffer Manager | Page Life
Expectancy will let you know relative memory pressure. Be sure to use a
fairly long observation time with the server under normal workload to
determine true memory pressure.
Whichever way you go, you will need to set memory levels manually and monitor
to see if you have overcommitted memory. However, SQL loves memory so you may
see some performance gains. Use the Page Life Expectancy counter to see the
difference before and after.
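Geoff's "stack" rule comes down to simple arithmetic. After a failover, both instances run on one 8 GB node, so their max server memory values plus the OS reserve must fit in that node's RAM. Below is a minimal Python sketch of that budget. It assumes you choose the OS reserve and a relative weight per instance yourself; the function and instance names are hypothetical. The resulting numbers would then be applied to each instance with sp_configure 'max server memory (MB)'.

```python
from typing import Dict


def split_max_server_memory(node_ram_mb: int, os_reserve_mb: int,
                            weights: Dict[str, int]) -> Dict[str, int]:
    """Split the RAM left after the OS reserve across SQL Server instances.

    Each result is a candidate 'max server memory (MB)' value. Because all
    values are sized against ONE node, the instances still fit on that node
    when a failover stacks them there.
    """
    if os_reserve_mb >= node_ram_mb:
        raise ValueError("OS reserve leaves no memory for SQL Server")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    available = node_ram_mb - os_reserve_mb
    # Integer division rounds down, so the combined settings never exceed
    # the memory actually left over for SQL Server.
    return {name: available * w // total_weight for name, w in weights.items()}


if __name__ == "__main__":
    # 8 GB node, 2 GB left for the OS, busier instance weighted 2:1
    # (instance names are hypothetical).
    print(split_max_server_memory(8192, 2048, {"INST1": 2, "INST2": 1}))
```

With an 8192 MB node, a 2048 MB OS reserve, and a 2:1 weighting, this yields 4096 MB and 2048 MB, the asymmetrical split described above. A 1:1 weighting with a 3072 MB reserve gives 2560 MB to each instance.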
Geoff N. Hiten
Senior Database Administrator
Microsoft SQL Server MVP
|||The /3GB switch restricts the OS kernel to 1GB of virtual address space,
which allows applications to address up to 3GB of virtual address space. The
/PAE switch loads a different NT kernel that contains the code to address
RAM above 4GB.
The AWE setting in SQL Server works with the /PAE switch. If you enable AWE
but don't turn on the /PAE switch, then you cannot address the extended
memory space. So, in order to allow AWE to utilize the RAM above 4GB, you
need to turn on the /PAE switch as well.
In terms of running multiple instances on a single machine, you need to
balance the RAM. (If your business requirements will allow performance
degradation in the event of a failover, then you don't necessarily need to
do this.) You control the maximum amount of RAM that SQL Server will
address by setting the max server memory config option.
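The distinction here is between virtual address space, which /3GB changes, and physical memory, which /PAE lets the kernel address. Below is a small Python sketch of the standard 32-bit x86 arithmetic, for illustration only. These are architectural limits; the physical memory a particular Windows edition supports may be lower. A non-AWE SQL Server process is bounded by its user-mode virtual address space, and AWE is what lets it use physical memory beyond that.

```python
GB = 1024 ** 3


def user_mode_va_bytes(three_gb_switch: bool) -> int:
    """Virtual address space available to one 32-bit user-mode process."""
    total_va = 2 ** 32                      # 4 GB of virtual addresses in total
    kernel_va = 1 * GB if three_gb_switch else 2 * GB
    return total_va - kernel_va             # /3GB: 3 GB user, otherwise 2 GB


def max_physical_bytes(pae: bool) -> int:
    """Architectural physical-address limit of the 32-bit x86 kernel."""
    # PAE widens physical addresses from 32 to 36 bits; the limit a given
    # Windows edition actually supports may be lower.
    return 2 ** 36 if pae else 2 ** 32


if __name__ == "__main__":
    for switch in (False, True):
        label = "/3GB" if switch else "default"
        print(label, user_mode_va_bytes(switch) // GB, "GB user VA")
    print("PAE physical limit:", max_physical_bytes(True) // GB, "GB")
```

This is why /3GB does not replace /PAE. One switch moves the user/kernel split inside the 4 GB virtual space, and the other decides whether physical RAM above 4 GB is reachable at all.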
Mike
http://www.solidqualitylearning.com
Disclaimer: This communication is an original work and represents my sole
views on the subject. It does not represent the views of any other person
or entity either by inference or direct reference.

question part 2

Sorry, I should have added this to help:
Oldest active transaction:
    SPID (server process ID) : 144
    UID (user ID) : 6
    Name : implicit_transaction
    LSN : (106228:47115:1)
    Start time : Jan 28 2004 11:55:08:840AM
DBCC execution completed. If DBCC printed error messages, contact your system administrator.
P.S. Can anyone tell me how to see whether it is turned on or not, SQL
Server-wise? I know SET will turn it off and on, but how can I check its
current status? If I turn it off, will that apply to just THAT database or
to the whole server?
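On the P.S.: SET IMPLICIT_TRANSACTIONS affects only the current connection, not a database or the whole server. SQL Server exposes the connection's SET options as the @@OPTIONS bitmask, in which the value 2 is IMPLICIT_TRANSACTIONS. So `SELECT @@OPTIONS & 2` returns a nonzero value when the option is on, and DBCC USEROPTIONS also lists the active SET options. The Python sketch below only decodes an @@OPTIONS value that has already been read from the server; the helper names and the sample value are hypothetical.

```python
from typing import Dict, List

# Bit values of @@OPTIONS, one bit per SET option.
OPTION_BITS: Dict[int, str] = {
    1: "DISABLE_DEF_CNST_CHK",
    2: "IMPLICIT_TRANSACTIONS",
    4: "CURSOR_CLOSE_ON_COMMIT",
    8: "ANSI_WARNINGS",
    16: "ANSI_PADDING",
    32: "ANSI_NULLS",
    64: "ARITHABORT",
    128: "ARITHIGNORE",
    256: "QUOTED_IDENTIFIER",
    512: "NOCOUNT",
    1024: "ANSI_NULL_DFLT_ON",
    2048: "ANSI_NULL_DFLT_OFF",
    4096: "CONCAT_NULL_YIELDS_NULL",
    8192: "NUMERIC_ROUNDABORT",
    16384: "XACT_ABORT",
}

IMPLICIT_TRANSACTIONS = 2


def decode_options(value: int) -> List[str]:
    """Return the names of the SET options that are ON in an @@OPTIONS value."""
    return [name for bit, name in sorted(OPTION_BITS.items()) if value & bit]


def implicit_transactions_on(value: int) -> bool:
    """Same test as (@@OPTIONS & 2) <> 0 in T-SQL."""
    return bool(value & IMPLICIT_TRANSACTIONS)


if __name__ == "__main__":
    sample = 5498  # hypothetical @@OPTIONS value read from one connection
    print(implicit_transactions_on(sample), decode_options(sample))
```

Because the value is per connection, run the check on the same connection that issued the SET statement. Another session can report a different state.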

Monday, March 12, 2012

Question on OUTPUT feature of DML

I need to audit inserts, updates, and deletes from a set of active tables into audit tables. These tables have foreign key constraints defined with ON UPDATE CASCADE and ON DELETE CASCADE. I can explicitly code the DELETE/UPDATE on the parent table to OUTPUT to an audit table, but how do I OUTPUT the cascaded delete/update that happens on the child table because of the FK constraint, without resorting to triggers?

Thanks,

-chiraj

You might want to look into DML triggers for this kind of issue, although this is not something I have used much.


Dave

|||

Thanks for the response.

Using DML triggers is a no-brainer. I have used them all my life. I was wondering if it could be accomplished with the OUTPUT clause. I believe it is a limitation of the OUTPUT clause unless someone can show me otherwise.

Thanks,

-chiraj.