Database mirroring connection error 4 'An error occurred while receiving data: '10054(An existing connection was forcibly closed by the remote host.)'.'
Recently, while working on a DR test, I came across an issue I had not seen
before, and I could not find much help on Google. It concerns database
mirroring, and it should apply to AlwaysOn to some extent as well, since both
use endpoints to communicate and service accounts to authenticate the
connections between those endpoints.
Problem description:
As part of the DR exercise, we disabled mirroring on the MSSQL 2014
databases and brought the DR databases online. After the exercise, when we tried
to re-establish mirroring, we started getting the error below:
Primary logs:
2018-03-20 22:50:40.120 spid40s Database mirroring connection error 4 'An
error occurred while receiving data: '10054(An existing connection was forcibly
closed by the remote host.)'.' for 'TCP://XX.XX.XX.XX:7077'.
2018-03-20 22:50:55.780 spid22s Error: 1443, Severity: 16, State: 2.
As part of general troubleshooting, I stopped and started the endpoints on
both servers, with no luck. I also tried telnet from both servers; everything
was fine and no issues were found.
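The telnet check can also be scripted. Below is a minimal sketch in Python, assuming only the standard library; the function name endpoint_reachable is made up for illustration, and 7077 is the endpoint port that appears in the Primary log above. Keep in mind that a successful TCP connection only shows the port is open. It says nothing about whether the endpoint will accept the login, which is exactly what turned out to matter here.

```python
import socket


def endpoint_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    This is roughly what `telnet host port` tells you: the port is
    listening and reachable. It does not test authentication.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


# Example call (replace the masked address with the partner server):
# endpoint_reachable("XX.XX.XX.XX", 7077)
```

Run it from each server against the other one, the same way as the telnet test.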
Next I went through the error logs; as they say, good troubleshooting always
starts with the SQL Server error logs. In this case, because the issue is
related to mirroring, the error logs on both the Primary and DR servers have to
be checked, and we could not find any useful information in the Primary
logs.
When I checked the error logs on the Mirror (DR) server, I was surprised to
find handshake errors logged there. This helped me find the root cause and fix
the issue.
Mirror logs:
2018-03-20 22:54:15.160 Logon Database Mirroring login attempt failed
with error: 'Connection handshake failed. An OS call failed: (80090311)
0x80090311(No authority could be contacted for authentication.). State 67.'. [CLIENT: xx.xx.xx.x]
2018-03-20 22:54:20.200 Logon Database Mirroring login attempt failed
with error: 'Connection handshake failed. An OS call failed: (80090311)
0x80090311(No authority could be contacted for authentication.). State 67.'. [CLIENT: xx.xx.xx.x]
2018-03-20 23:06:55.010 Logon Error: 17806, Severity: 20, State: 14.
2018-03-20 23:06:55.010 Logon SSPI handshake failed with error code
0x8009030c, state 14 while establishing a connection with integrated security;
the connection has been closed. Reason: AcceptSecurityContext failed. The
Windows error code indicates the cause of failure. The logon atte
2018-03-20 23:06:55.010 Logon Error: 18452, Severity: 14, State: 1.
2018-03-20 23:06:55.010 Logon Login failed. The login is from an
untrusted domain and cannot be used with Windows authentication. [CLIENT: xx.xx.xx.x]
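When the log is long, it can help to pull out just the SSPI status codes and count them. The following is a rough sketch, not part of the original troubleshooting: the function name summarize_sspi_errors is hypothetical, and the symbolic names in the table are the Windows SSPI constants for the two codes seen in the Mirror log above, added here as an assumption about what is useful to display.

```python
import re
from collections import Counter

# Windows SSPI symbolic names for the two codes in the Mirror log above.
# Extend this table if other codes show up in your logs.
SSPI_CODES = {
    0x80090311: "SEC_E_NO_AUTHENTICATING_AUTHORITY",
    0x8009030C: "SEC_E_LOGON_DENIED",
}

# An 8-digit hex status code such as 0x80090311 or 0x8009030c.
HEX_CODE = re.compile(r"0x[0-9a-fA-F]{8}(?![0-9a-fA-F])")


def summarize_sspi_errors(log_text: str) -> dict[str, int]:
    """Count each 0x........ status code found in the error log text."""
    counts: Counter[str] = Counter()
    for match in HEX_CODE.finditer(log_text):
        code = int(match.group(), 16)
        name = SSPI_CODES.get(code, "unknown")
        counts[f"0x{code:08x} {name}"] += 1
    return dict(counts)
```

Feed it the text of the Mirror server's error log. Any count for these codes points at authentication between the servers rather than at the network.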
I then logged on to both servers using the service accounts to make sure
the service accounts were working. From the Primary server, both DB instances
were accessible; from the DR server, the Primary instance was not accessible.
This gave me some confidence that I was on the right path and
helped me fix the issue.
Possible solutions are:
1. If the service account password has changed, make sure to update it on both
SQL Servers and restart the services. This should solve the problem if it is
related to the service account password.
2. Sometimes this does not resolve the issue, for example if you are hosted on
complex domains, as happened in this case. The servers were hosted in one
domain and the service accounts used were from a different domain. To fix
this, we have to make sure a proper trust is established between the two
domains so that the authenticity of the service account can be validated. This
is handled by the AD admins. This resolved the issue for us.
Hope this helps someone like me 😊.