Wednesday, February 15, 2006

Newsletter #53 February 2006

I found this on and found it very interesting. I've copied what I see as the BIGGEST IT opportunity. But, go to the page and he has some superb tips on troubleshooting network problems.
Separate the C and Si Problems

I've solved a lot of network problems, but this one was a toughie.

"I've got a DHCP server that is delivering IP addresses to two segments. The systems on the same segment as the DHCP server are getting IP addresses with no trouble, but the systems on the other segment, none of them work!"

My first question (and probably yours, if you're a network techie) is, "does the router between the two segments pass DHCP requests?" (In geek-ese, you may know that the other way to say this is "does the router support RFC 1542 BOOTP forwarding?") Or alternatively, I ask, "is there a DHCP forwarder on the second segment?"

"Yes," the person replies, explaining that the router passes BOOTP packets.

Hmmm. So what else might it be? Check IP connectivity -- does the router block any particular port? If it's in a network with an Active Directory and the DHCP server is on a 2000 or 2003 server, has that server been authorized in AD? No port blocks, and yes, it's been authorized. That's when I realize that it was a stupid question -- if DHCP weren't working, the first segment wouldn't have IP addresses. Ah, but what if -- a eureka moment! -- somehow (1) the DHCP server hadn't been authorized for the past six days and for some reason all of the systems on the nearby segment still had lease time left but all of the ones on the second segment had their leases run out earlier, and so were the canaries in the coal mine? So I tell the person to try to do an IPCONFIG /RENEW on one system on each segment. The one of the first segment succeeds, the one on the second doesn't.

Ready for the answer? It's simple: the guy had no idea what the heck BOOTP forwarding was, figured that his router guys must have allowed for that -- after all, they did go to a CCNA boot camp -- and just told me what I wanted to hear. In other words, it is always possible that the carbon-based parts of the network ("C" is the symbol for the element carbon) don't report reliable information, and so the problem lay not in the silicon part of the network ("Si" is the symbol for the element silicon) but in the carbon component. To paraphrase Shakespeare, "the fault, dear Brutus, lay not in the chips but in the people."

Don't misunderstand me, I'm not saying that everyone lies or is incompetent. But I am saying that under stress people don't always think as clearly as they should, and that network support people have had a lot of new things thrown in their laps in the past few years -- remember when we "discovered" security in 2001, or that we all need database servers whether we want them or not in 2004? -- without receiving a concomitant increase in staffing. We're all just human. We make mistakes. Think about how we make silicon-based systems more reliable: we cluster them. The same thing works for carbon-based units: more eyeballs looking at a problem often make for a more quickly-solved problem.

And -- this is important -- remember that we techies tend to think of computer problems in terms of the silicon side sometimes more than we do the carbon side. In fact, sometimes we see the carbon side as being sort of minimal, and only relevant in a few cases. But if you sit back and think about most of the things that you have to fix, you'll end up seeing that most of those problems have a carbon component that is at least as important as the silicon component. I mean, Trojans don't write themselves, y'know?

I think that IT requires mostly people skills, so I dispute that geeks are *NOT* good with people. :P

update: corrected to include *NOT*, oops.

Newsletter #53 February 2006

No comments:

Post a Comment

All comments are moderated.

Note: Only a member of this blog may post a comment.