Best Practices: Do NOT Use External DNS Servers For Internal Servers' IP Configurations... Here's Why...

First, yes… I’ll let the cat out of the bag…

Figure 1: Bag Cat

…that this is not new information.  Y’all know (and have known) for some time that it’s never a good idea to use external DNS servers for internal servers’ IP configurations, but you may be asking yourself:

“But Dagan, what’s the harm?   I mean, if it’s not a Domain Controller, why shouldn’t I add an external DNS server as my tertiary DNS server in the server’s IP configuration?  You know… just in case the DCs decide to call Ralph on the porcelain telephone, they’ll still be able to get to the Internet… so who cares?”

Well, for one, the folks who can’t use Citrix.

Huh?

Let me explain by way of example.

I received a critical call at 3:00 PM today from a client who said, “Nobody can connect to Citrix… everybody is getting an ‘RPC server unavailable’ message!”

Looking back in the Event Logs, I saw the following error:

Event Type:       Error
Event Source:     Userenv
Event Category:   None
Event ID:         1053
Date:             12/5/2011
Time:             1:55:07 PM
User:             NT AUTHORITY\SYSTEM
Computer:         [REDACTED]
Description:
Windows cannot determine the user or computer name. (The RPC server is unavailable. ). Group Policy processing aborted.

I also saw a whole slew of errors pertaining to an inability “to start XXXXX.exe.  The RPC server is unavailable.”  It’s only after digging further that I found the “magic event” that pointed me in the networking direction (emphasis added):

Event Type:       Error
Event Source:     NETLOGON
Event Category:   None
Event ID:         5719
Date:             12/5/2011
Time:             2:56:58 PM
User:             N/A
Computer:         [REDACTED]
Description:
This computer was not able to set up a secure session with a domain controller in domain [REDACTED] due to the following:
The RPC server is unavailable.
This may lead to authentication problems. Make sure that this computer is connected to the network. If the problem persists, please contact your domain administrator.

ADDITIONAL INFO
If this computer is a domain controller for the specified domain, it sets up the secure session to the primary domain controller emulator in the specified domain. Otherwise, this computer sets up the secure session to any domain controller in the specified domain.

So, the server can’t contact any domain controllers in the domain.

Let’s ping the internal domain (yes, this company had matching internal and external domains… we’ll ignore this bad juju for now):

[REDACTED]>ping internaldomain.com

Pinging internaldomain.com [74.205.X.X] with 32 bytes of data:
Reply from 74.205.X.X: bytes=32 time=10ms TTL=246
Reply from 74.205.X.X: bytes=32 time=10ms TTL=246
Reply from 74.205.X.X: bytes=32 time=9ms TTL=246
Reply from 74.205.X.X: bytes=32 time=9ms TTL=246

Ping statistics for 74.205.X.X:
    Packets: Sent = 4, Received = 4, Lost = 0 (0% loss),
Approximate round trip times in milli-seconds:
    Minimum = 9ms, Maximum = 10ms, Average = 9ms

So, as we can see, this is resolving outside of the 192.168.X.X network to a public IP range of 74.205.X.X… that’s odd…

Even more odd… if I do an nslookup, the primary DNS server in the 192.168.X.X range responds happily with the correct information… so, that means one of two things:

Option 1: There’s a HOSTS file entry.

Option 2: It’s a cached DNS entry.

So, I checked %WINDIR%\System32\Drivers\etc\HOSTS, but it was unmodified.  Option 1 is out.
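If you'd rather not eyeball the HOSTS file, the same check can be sketched in a few lines of Python. This is only an illustration: the function name `find_hosts_entries` and the sample file contents are made up for this post, and the parsing assumes the usual HOSTS layout (an address, then one or more names, with `#` starting a comment).

```python
def find_hosts_entries(hosts_text, hostname):
    """Return the addresses that HOSTS-format text maps to hostname."""
    wanted = hostname.lower()
    matches = []
    for line in hosts_text.splitlines():
        # Everything after '#' is a comment, so drop it.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        address, names = fields[0], fields[1:]
        # Host names are case-insensitive.
        if wanted in (name.lower() for name in names):
            matches.append(address)
    return matches


# Hypothetical file contents, for illustration only.
sample = """
# Sample HOSTS file
127.0.0.1       localhost
# 10.1.2.3      internaldomain.com   (commented out, so it does not count)
"""

print(find_hosts_entries(sample, "internaldomain.com"))  # prints []
```

An empty list means the HOSTS file is not overriding the name, which is the "Option 1 is out" result.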

A quick check of the IP configuration (in the Advanced screen, no less) revealed the following DNS servers:

192.168.X.X

192.168.X.X

4.2.2.2
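Spotting the odd one out in a list like that is easy to automate. Here's a minimal sketch, assuming IPv4 only and assuming "internal" means the RFC 1918 private ranges. The function name `external_dns_servers` and the two 192.168.1.x addresses are hypothetical stand-ins (the real ones are redacted above).

```python
import ipaddress

# The three RFC 1918 private IPv4 ranges.
RFC1918 = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def external_dns_servers(servers):
    """Return the configured DNS servers that fall outside RFC 1918 space."""
    external = []
    for server in servers:
        address = ipaddress.ip_address(server)
        if not any(address in network for network in RFC1918):
            external.append(server)
    return external


# Hypothetical internal addresses plus the real external one from this post.
configured = ["192.168.1.10", "192.168.1.11", "4.2.2.2"]
print(external_dns_servers(configured))  # prints ['4.2.2.2']
```

Anything this returns on an internal, domain-joined server deserves a second look. If your internal network uses public address space, you'd need to swap in your own ranges.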

So, then the question becomes, “How did the first two DNS servers fail to the point where the DNS query hit the tertiary DNS server?”

A quick check of the patching/reboot schedules shows the problem… the Domain Controllers are rebooting within half an hour of each other, so it’s possible (though calamitously unlikely) that both servers were inaccessible either due to patching operations or reboot operations taking place simultaneously.
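To see how two reboots "within half an hour of each other" can leave a gap with no DC at all, here's a rough sketch. The maintenance windows below are invented for illustration (I'm not publishing the client's schedule), and `outage_overlap` is just a name I picked.

```python
from datetime import datetime


def outage_overlap(a_start, a_end, b_start, b_end):
    """Return the (start, end) span when both outages are active, or None."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    return (start, end) if start < end else None


# Hypothetical patch/reboot windows, not the client's real schedule.
dc1 = (datetime(2011, 12, 5, 3, 30), datetime(2011, 12, 5, 3, 55))
dc2 = (datetime(2011, 12, 5, 3, 45), datetime(2011, 12, 5, 4, 10))

both_down = outage_overlap(*dc1, *dc2)
if both_down:
    print("No DC reachable from", both_down[0].time(), "to", both_down[1].time())
```

Any non-empty overlap is a window in which every internal DNS server is down at once, so staggering the schedules until this returns `None` is the safer design.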

So, let’s fix this puppy…

Get it?  Moving on…

1) I removed the 4.2.2.2 tertiary DNS server entry from the IP configuration of the Citrix server

2) I ran an ipconfig /flushdns from the command line

3) I checked name resolution for both the internal domain and the primary DNS server… happily, we’re now getting a 192.168.X.X response (like we should)

4) I attempted to connect to the Citrix server and was able to get in
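The check in step 3 can be sketched as a small script, too. This is an assumption-laden illustration: `resolves_internally` is a made-up helper, and the stub resolvers return invented addresses so the example runs anywhere. By default it uses `socket.gethostbyname`, which asks the operating system's resolver rather than one specific DNS server.

```python
import ipaddress
import socket


def resolves_internally(hostname, resolve=socket.gethostbyname):
    """Return True if hostname resolves to a private-range address.

    resolve is injectable so the check can be exercised without a network.
    """
    address = ipaddress.ip_address(resolve(hostname))
    return address.is_private


# Stub resolvers with made-up answers, standing in for before and after the fix.
def before_fix(name):
    return "8.8.8.8"  # stand-in for "some public address"


def after_fix(name):
    return "192.168.1.10"  # hypothetical internal address


print(resolves_internally("internaldomain.com", before_fix))  # prints False
print(resolves_internally("internaldomain.com", after_fix))   # prints True
```

On the real server you would call `resolves_internally("internaldomain.com")` with no stub. Note that `is_private` covers more than 192.168.X.X (loopback and link-local addresses count as well), so treat a `True` as a hint, not proof.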

So, in a timeline:

· 3:49 AM: Both DCs go down at the same time, and the Citrix server resolves the internal domain to an external IP address on account of the tertiary DNS server being set to the external DNS IP address of 4.2.2.2… this entry gets cached until the TTL expires (evidently the internaldomain.com A record that 4.2.2.2 handed back carried a long TTL)

· 1:55 PM: The Citrix server starts quietly freaking out about the fact that it can’t contact the domain

· 2:57 PM: The Citrix server slams on the brakes and says, “Nope, I’m not running any more programs or allowing any more connections until you let me talk to my lawyer… erm, I mean, the domain!”

· 3:00 PM: The client calls to say that his Citrix server is down

· 3:35 PM: After the troubleshooting & changes 1-4 detailed above, the Citrix server started working again without a reboot
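Why did one bad answer at 3:49 AM still hurt almost twelve hours later? Here's a toy model of a TTL-based cache. It is not how the Windows DNS Client is actually implemented, and the 24-hour TTL and the addresses are assumptions for illustration (I never captured the real values).

```python
class TtlCache:
    """Toy name cache: an entry is served until its TTL runs out or it is flushed."""

    def __init__(self, clock):
        self._clock = clock      # callable returning the current time in seconds
        self._entries = {}       # name -> (address, expires_at)

    def put(self, name, address, ttl_seconds):
        self._entries[name] = (address, self._clock() + ttl_seconds)

    def get(self, name):
        entry = self._entries.get(name)
        if entry is None:
            return None
        address, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[name]  # expired, so the next lookup must re-query
            return None
        return address

    def flush(self):
        self._entries.clear()


now = [0]                                   # fake clock, in seconds
cache = TtlCache(lambda: now[0])

# The bad external answer gets cached (address and TTL are hypothetical).
cache.put("internaldomain.com", "8.8.8.8", ttl_seconds=24 * 3600)

now[0] += 12 * 3600                         # roughly twelve hours later
print(cache.get("internaldomain.com"))      # prints 8.8.8.8

cache.flush()                               # the toy equivalent of ipconfig /flushdns
print(cache.get("internaldomain.com"))      # prints None
```

The DCs coming back up does nothing for an entry that is already cached. Until the TTL expires or somebody flushes the cache, the stale answer keeps winning.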

…and in summary:

Please only use internal DNS server IP addresses on internal servers’ respective IP configurations.  Oh, and tip your wait staff.  :-)
