
Archive for Troubleshooting

Feb
14

Install or Uninstall of CAS Role Seems to Hang

by dave


Sometimes when you are installing or uninstalling the CAS role on an Exchange 2010 server, setup hangs right at the point where the GUI says:

performance counters for the client access server role are being configured

It is worth waiting up to 30 minutes, but if you’ve waited longer than that, the process will not finish. If you check the Application log at this point, you will see some errors related to Exchange performance counters. If you cancel setup using Task Manager and then restart the process, it will finish the second time without any significant issues. This has happened to me three times so far, once during installation and twice during uninstallation, and each time quitting the process and restarting it resolved the problem.

—–

So who wrote this blog and what do they do for a living anyway?
We’re Third Tier. We provide advanced Third Tier support for IT Professionals.
Categories : Dave Shackelford, Exchange
Jan
13

Recovering “Hidden” Disk Space Used on SBS 2008 C: partitions

by Eriq

One of the significant differences in the minimum specs for installing SBS 2008 versus SBS 2003 was the minimum size of the C: partition needed for installation and operation. SBS 2008 requires a minimum of 60GB in the install partition or it won’t go. Those of us who were used to fighting the 12GB C: partition implemented by OEM vendors in SBS 2003 initially looked at that and thought “yeah, that’s a good change.” Well, as it turns out, kinda like the 4GB RAM minimum spec, the 60GB C: partition may not be big enough after all.

If you ask around those who have been doing SBS 2008 deployments, one of the best practices adopted by most is to use the Move Data Wizards in the Server Storage tab of the SBS 2008 Console and get the key data components off the C: partition and onto another partition (Exchange, SharePoint, User’s folders, User’s redirected documents, and WSUS content). And if you take the step that some do of installing third-party software to a partition other than C:, we should be ending up with a fairly pristine C: partition with minimal dynamic data on it. In theory.

I’ve been deploying my SBS 2008 installs with a 100GB C: partition simply because I figured that over time, something would find a way to suck up all the space on C: and we’d eventually get to a point where we’d have to deal with resizing partitions or doing manual data cleanup. I didn’t expect that I’d hit that scenario just over a year after my first SBS 2008 production deployment.

In the last couple of weeks, my monitoring tools have started chirping about low disk space on C: on a couple of installs. Sure enough, one installation had 17GB remaining of a 100GB partition, another had 3.5GB remaining on an 80GB partition (my own production box, and yeah, it really needs an overhaul, but that’s another story). I started digging around and found the most common disk hog that’s been complained about across the net, the winsxs folder. Based on everything I’ve been able to read about winsxs, including a post from the Windows Server Core Team, that’s something that we’ll just have to live with, and really isn’t the point of this post anyway. Even so, on my boxes the winsxs folder only amounted to about 12GB (bigger than I’d like, but certainly not the primary culprit), which is only about 10% of my standard install C: space. Something else had been sucking away space and keeping it from me.

We use TreeSize from JAM Software as a standard utility on our server deployments to help monitor disk space usage, as this is something that comes up from time to time. [NOTE: this is not a specific endorsement of TreeSize, just a note that it's one of the many tools that we use in our operation.] So in the case of these low-free-space servers, I fired up TreeSize and went looking for the disk hog. Surprisingly, I couldn’t find it. I did clear up some areas that showed a larger-than-expected usage, but couldn’t find the smoking gun. A few weeks have gone by, and while I’ve been monitoring the state of these servers to ensure that free space didn’t get critically low, other tasks moved up on the priority list.

Then a discussion on one of my private lists cropped up regarding this exact topic, and I learned two valuable tidbits from that discussion.

The first is that in order for TreeSize to see the contents of ALL folders on the C: partition, it must be Run As Administrator. Upon reflection, this makes sense, but I know it’s catching a lot of experienced system admins off-guard. Some are advocating disabling UAC on the server to avoid this kind of issue, and I’m honestly not fully decided where I stand on that, so I won’t comment either way on that. But it does serve as a reminder that many system tools we may have been using for years on 2003 servers might not behave the same way under 2008 if you don’t use the almighty Run As Admin option.

The second is that the WSUS site in IIS has been logging an OBSCENE amount of data into the IIS logs folder. One of my servers had nearly 30GB (yes, that’s 30 gigabytes) of data in the WSUS log folder (C:\inetpub\logs\LogFiles\W3SVC1372222313). Another had just over 20GB. And in looking in the folder, I saw numerous DAILY log files that were well over 100MB each, with some well over 200MB each.
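To get a quick read on how bad a log folder is without clicking through it, a small script can total the files and list the oversized daily logs. This is only a rough sketch: the function name `summarize_logs` and the 100MB default are my own choices for illustration, and it assumes the logs use a `.log` extension and sit directly in the folder. Run it from an elevated prompt for the same reason TreeSize needs one.

```python
from pathlib import Path


def summarize_logs(folder, big_file_mb=100):
    """Return (total_bytes, big_files) for the *.log files directly in folder.

    big_files is a list of (file_name, size_in_bytes) for every log at or
    above big_file_mb megabytes. The 100MB default is an assumption based on
    the daily file sizes described in the post.
    """
    threshold = big_file_mb * 1024 * 1024
    total = 0
    big_files = []
    for path in sorted(Path(folder).glob("*.log")):
        size = path.stat().st_size
        total += size
        if size >= threshold:
            big_files.append((path.name, size))
    return total, big_files
```

Calling `summarize_logs(r"C:\inetpub\logs\LogFiles\W3SVC1372222313")` would report on the folder named above; the site ID in that folder name will differ from server to server.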

Once I cleared out the old log files (honestly, how far back am I going to need to look at WSUS logs anyway?) the free space on C: increased to a reasonable level, and my monitoring stopped yelling at me quite so often.

There are multiple lessons learned from this experience for me. The first is the whole reminder about Run As Administrator in the Server 2008 era. I’ve even taken to labeling some shortcuts with “Run As Administrator” in the icon name just to serve as a reminder. The second lesson is that 60GB is certainly NOT going to be sufficient as a minimum partition size on a production SBS 2008 server, even if all other data is moved off to different volumes (and I haven’t even covered the option of moving the WSUS SQL database files off of C: to another partition, which can’t be done through wizards but must be done by hand). With winsxs and the WSUS logs as two items that will definitely be grabbing disk space unexpectedly (well, it’s expected now anyway), we can be sure that over time there will be others. And as stated on the Core Team blog, you can only expect that winsxs will continue to grow over time. If it’s 12GB now, how large will it be in a couple of years? The third lesson is that some logging that happens automatically on the server probably should not just be left unchecked. If you enable SMTP logging (which I do and recommend for troubleshooting purposes), you should clean out old SMTP logs on a regular basis. Well, now you can add WSUS/IIS logs to that approach as well. There are numerous posts out there for ways to script this process, and I’m evaluating the approach we’re going to take within our operation to make this happen for our customer base.
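As one possible shape for that kind of cleanup script, here is a minimal sketch that finds log files older than a cutoff and, only when asked, deletes them. It is not the approach we have settled on. The function names, the 30-day retention period, and the `.log` extension are all assumptions for illustration, and it defaults to a dry run so nothing is removed by accident.

```python
import time
from pathlib import Path


def old_log_files(folder, max_age_days=30, now=None):
    """Return the *.log files in folder last modified more than max_age_days ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # 86400 seconds per day
    return sorted(
        p for p in Path(folder).glob("*.log") if p.stat().st_mtime < cutoff
    )


def purge_old_logs(folder, max_age_days=30, dry_run=True):
    """Report, and optionally delete, old log files.

    Returns (files, bytes_freed). With dry_run=True (the default) nothing is
    deleted; the return value only shows what would be removed.
    """
    victims = old_log_files(folder, max_age_days)
    freed = sum(p.stat().st_size for p in victims)
    if not dry_run:
        for p in victims:
            p.unlink()
    return victims, freed
```

Pointed at the WSUS IIS log folder or an SMTP log folder from a scheduled task, the dry-run output makes it easy to sanity check the retention period before switching to `dry_run=False`.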

If you’ve been struggling with low disk space issues on SBS 2008 C: partitions, hopefully this information will help you get a better handle on the immediate actions as well as the long term strategy that you’ll develop for your particular environment.

Categories : Eriq Neale, SBS 2008
Dec
18

Another reason SBCore could shut down your server

by Eriq

Earlier this month an associate pinged me about an unusual situation. He had an SBS 2003 server that was shutting itself down periodically, claiming that it was doing so because there was another SBS server in the domain. Well, this is expected behavior if there is, in fact, another SBS server in the domain, but this particular network had only one server, the SBS server, and not a single other server or history of another server in the network. Another unusual symptom of the behavior is that the server would remain up for a little over 24 hours before it would shut itself down because of the phantom SBS server. According to MS KB 925652 the SBS server will shut down every hour if it detects another SBS server in the domain, so clearly a different set of events was causing this behavior. The server was logging SBCore 1011 errors in the event logs, but only after the server had been online for about a day.

On a tip from a colleague at MS, we started to look for a possible memory leak in the system. I worked with my colleague to set up perfwiz and poolmon to try to identify the process (or processes) that were leaking. The theory was that a runaway leak could strip the server of valuable non-paged pool memory, which could cause the SBCore check to fail and generate the errors and shutdown event. I must admit, perfwiz and poolmon never were my strong points, so even after we got some results back, the review didn’t come up with a smoking gun.

Then my associate found a tip that I’d not heard of before, even though I regularly modify settings where this tip was found. He opened the Task Manager on the server, selected the Processes tab, then opened Select Columns under the View menu. In here, he enabled the “Memory – Non-paged Pool” column and then sorted the Task Manager process list by that column. Sure enough, he not only quickly found the culprit, but also could sit and watch the Non-paged Pool count grow steadily right before his eyes. The service causing the problem? spoolsv.exe, the print spooler service.
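If you would rather not sit and watch the column, the same "does it only ever grow?" test can be applied to readings you jot down over time. The sketch below is just that idea in code and not part of what we did on this server: the function name, the 1MB growth threshold, and the requirement of at least three readings are all assumptions, and it expects you to supply the non-paged pool byte counts yourself.

```python
def steady_growers(samples, min_growth_bytes=1_000_000):
    """Flag processes whose non-paged pool readings never shrink and keep growing.

    samples maps a process name to a list of byte counts taken in time order.
    Returns a list of (process_name, total_growth) sorted by growth, largest
    first. The threshold is an illustrative default, not a documented limit.
    """
    suspects = []
    for name, values in samples.items():
        if len(values) < 3:
            continue  # too few readings to call it a trend
        never_shrinks = all(b >= a for a, b in zip(values, values[1:]))
        growth = values[-1] - values[0]
        if never_shrinks and growth >= min_growth_bytes:
            suspects.append((name, growth))
    return sorted(suspects, key=lambda item: item[1], reverse=True)
```

A process whose readings bounce up and down is skipped; one that climbs on every reading, the way the spooler did here, is returned along with how much it grew.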

A quick bit of Googling on his part ultimately led him to this post from Tek-Tips which helped him identify the root cause of the problem: HP Standard TCP/IP ports for printers on the server. He changed the port types for the printers from HP Standard TCP/IP ports to Standard TCP/IP ports, and the server hasn’t shut down again since.

Turns out, there is a KB on this situation, too, MS KB 933999. And in going back and looking further, the server was logging the Srv 2019 errors in the event logs as well. Since we were sidetracked by the anomalous SBCore behavior, we overlooked the 2019 errors as a possible factor.

In the end, I learned two things from this. One, you can track non-paged pool memory usage in Task Manager (which really isn’t a *revelation* per se, just something that I wouldn’t have necessarily deliberately gone out and looked for), and two, memory leak issues can cause anomalous SBCore errors and the shutdown of an SBS server. The good news is that the server was shutting down “normally” because of the SBCore misfire instead of totally running out of non-paged pool memory and crashing, as MS KB 933999 points out can happen. Bottom line, customer happy, and tech support further educated!

Categories : Eriq Neale
Dec
9

Windows Activation Errors

by Eriq

One of the advantages of the activation process in newer versions of Windows is that you can install the OS in evaluation mode for 60 days without having to use a license key. Additionally, you can extend this evaluation for more than 60 days by following steps outlined in several public posts (I’m including this link to Sean Daniel’s post on this).

A critical step in this process, however, is the restart of the box AFTER the slmgr.vbs -rearm command has been run. If the system is NOT restarted after this process, some unusual behaviors can be observed. This post is to identify the specific errors that can result from this specific set of circumstances so that should someone run across this situation you can see what may be going on.

The Windows Activation Error from an slmgr -rearm without a restart.

I recently ran into this issue with an SBS 2008 server. When signing into the server, the above error dialog appeared on the server. Closing the error allowed continued normal use of the server, both from an interactive login point of view as well as from a remote resource use point of view. Checking the state of the activation window using the slmgr.vbs script generated the error below:

The error appears quickly (unlike the normal response of the slmgr.vbs script) and the key element is the error code. The 0xC004D302 indicates that an slmgr.vbs -rearm has been run, but the server has not been restarted. In the case of this system, a normal restart of the system returned the box to normal operation without Activation errors and slmgr.vbs ran correctly.

NOTE: This does not cover ALL possible causes for the Windows Activation Errors tied in with slmgr.vbs script errors. It is possible that this behavior could indicate other issues. But if you can log in and use the system “normally” after seeing this error (other activation errors prevent you from completing the login process and you never get to a desktop), chances are you just need to restart the server to return to normal behavior.
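If you capture the text of the slmgr.vbs error (for example by copying it out of the dialog), picking out the code is easy to automate. The sketch below assumes only that the error text contains the hex code somewhere; the function name and the lookup table are hypothetical, and the table holds just the one code discussed in this post.

```python
import re

# Only the code covered in this post; anything else is reported as unknown.
KNOWN_CODES = {
    "0xC004D302": "slmgr.vbs -rearm was run but the server has not been restarted",
}


def explain_activation_error(output_text):
    """Return (code, explanation) for the first 0x........ code found, else None."""
    match = re.search(r"0x[0-9A-Fa-f]{8}", output_text)
    if not match:
        return None
    code = "0x" + match.group(0)[2:].upper()
    return code, KNOWN_CODES.get(code, "not covered by this post")
```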

Categories : Eriq Neale
Aug
18

More Fun with SBS 2008 and Sharepoint Updates

by Eriq

Anyone who has been dealing with SBS 2008 for the last couple of months knows that there have been issues with recent Sharepoint and SBS 2008 updates:

Companyweb Inaccessible After Sharepoint 3.0 Service Pack 2

Files in Companyweb are Opening Read-Only After SBS 2008 UR2

Sharepoint Service 3 Search event errors after an SBS 2008 Update Rollup

Event 2436 for Sharepoint Services 3 Search

Bottom line, it’s not been an easy road. Fortunately, the SBS team have done a good job of documenting the issues as they come up. Unfortunately, not everything has been caught yet. As I found out this week.

I’ve had two new SBS 2008 deployments in the last two months. One a migration (won’t go there), and the other a clean install. Ironically, the clean install is the one that’s caused me the most grief. The initial install went smoothly, and we’ve been keeping up to date with all the updates. Based on the information above, we knew to install the Sharepoint 3 SP2 before installing SBS 2008 UR2, and flipped the database off of Read Only.

Yesterday, I went to create a new security group. I launched the Add Group Wizard from the SBS 2008 console and was immediately greeted with:

“Windows SBS 2008 Add Group Wizard has stopped working”

The first wizard screen never even launched. Of course, I started digging through the addgroup.log file in C:\Program Files\Windows Small Business Server\Logs, and found the following after hunting for several minutes:

An exception of type 'Type: System.Data.SqlClient.SqlException, System.Data, Version=2.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089' has occurred.

Message: Access to table dbo.Versions is blocked because the signature is not valid.

In the stack dump that followed, many of the references were to Sharepoint. “Ah ha!” I thought. “The Add Group Wizard also does some things in Sharepoint!” and I went off to look at Sharepoint. Sure enough, companyweb wouldn’t come up. So, I went back to  Companyweb Inaccessible After Sharepoint 3.0 Service Pack 2 and went through those steps again. I verified that the database was not read-only, then I went through and followed the steps to re-run the setup wizard from the command line. Uh, oh, got errors. Fortunately, the psconfig command had me look at the PSCDiagnostics log in C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\LOGS. Unfortunately, those logs didn’t really tell me anything useful. What I found was this:

08/17/2009 17:12:59  1  ERR        One or more configuration tasks has failed to execute

08/17/2009 17:12:59  1  INF        Entering function TaskDriver.Stop

08/17/2009 17:12:59  1  INF          Entering function StringResourceManager.GetResourceString

08/17/2009 17:12:59  1  INF            Resource id to be retrieved is PostSetupConfigurationFailedEventLog for language English (United States)

08/17/2009 17:12:59  1  INF            Resource retrieved id PostSetupConfigurationFailedEventLog is Configuration of SharePoint Products and Technologies failed.  Configuration must be performed in order for this product to operate properly.  To diagnose the problem, review the extended error information located at {0}, fix the problem, and run this configuration wizard again.

08/17/2009 17:12:59  1  INF          Leaving function StringResourceManager.GetResourceString

08/17/2009 17:12:59  1  ERR          Configuration of SharePoint Products and Technologies failed.  Configuration must be performed in order for this product to operate properly.  To diagnose the problem, review the extended error information located at C:\Program Files\Common Files\Microsoft Shared\Web Server Extensions\12\LOGS\PSCDiagnostics_8_17_2009_17_7_9_507_298886299.log, fix the problem, and run this configuration wizard again.

I actually found a reference to the solution in the comments in the  Companyweb Inaccessible After Sharepoint 3.0 Service Pack 2 post. Not directly, but one of the comments mentions that an account name was changed after the initial setup. I haven’t renamed any accounts, but I was reminded that I was running the psconfig command under a different account than had been used to initially install the Sharepoint SP2 update. I logged out of that account and logged back in with the account that was used to install the update, and the psconfig command completed successfully.

Woo hoo! Got it working! Only, http://companyweb and the Sharepoint Central Administration 3.0 sites still would not come up. I once again connected to the database via SQL Management Studio (reminder: run that with elevated permissions or you’ll never authenticate successfully) and verified that it was not read only. And the services were running. I checked the web site configuration in IIS and found the issue – all of the web sites had stopped. That’s when I remembered getting all the alerts overnight about the World Wide Web Publishing Service and the TS Gateway service being stopped. I had started them again first thing this morning and promptly forgot about them. Sure enough, when I checked again, they were both stopped (not surprised that the TS Gateway service stopped since it’s dependent upon the WWW Publishing service). I started both services and both companyweb and Sharepoint Central Administration were back online.

And I was able to finally add the one security group I needed to get added.

Takeaways from this process that aren’t documented in the SBS blog posts:

  1. If the Sharepoint SP2 update doesn’t take the first time and you need to run the psconfig command manually to complete the install, make sure you are running the command from the same user account that was used to attempt to install SP2 in the first place.
  2. Note that the psconfig command stops the World Wide Web Publishing Service (and TS Gateway) and does NOT restart them automatically.
Categories : Eriq Neale, SBS 2008
Jun
10

Getting your IP back

by Eriq

So you’re having trouble getting to the Internet? Can’t ping the Internet gateway? Can’t ping your own IP address? Have network adapters that refuse to enable or disable? Could be a corrupt IP stack. You can take a look at MSKB 299357, or you can follow these steps:

  1. Make sure you’re logged in with a local administrator account.
  2. Open a command prompt.
  3. Run the following command:
    netsh int ip reset logfile.txt
    where logfile.txt is the name of a file where the command can write its output.
  4. When the command completes, run it again with a different filename for the output file.
  5. When that run completes, run it one more time, again with a different filename for the log file.
  6. Restart the computer in Safe Mode with Networking.

This will reset the TCP/IP settings back to sane defaults, which means all adapters in the computer will be set for DHCP. If you’re doing this on an SBS server, restarting in Safe Mode with Networking is absolutely crucial in order to avoid the dreaded 30 minute reboot. When the computer comes back up, set the network settings as needed, then reboot normally.

You may still have other issues, but these steps will get you a nice, clean, DHCP-enabled set of network adapters in the system.
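For anyone who wants the three runs spelled out, this tiny sketch just builds the command lines with a different log file name each time. The function name and the `ipreset` file name prefix are made up for illustration; the script does not run anything itself, so paste the output into the elevated command prompt from step 2.

```python
def reset_commands(runs=3, prefix="ipreset"):
    """Build the netsh reset command lines, one distinct log file per run."""
    return [f"netsh int ip reset {prefix}{n}.txt" for n in range(1, runs + 1)]


for command in reset_commands():
    print(command)
```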

Categories : Eriq Neale
Apr
22

SSL Certificate Validation

by Eriq

I put up a post this morning regarding SSL certificate request validation over on the Third Tier web site. If you’ve been wondering how SSL certificates work in SBS 2008 or if you’re about to renew an SSL certificate on an SBS 2003 box, you might want to check out that post.

Categories : Eriq Neale, SBS 2008
Apr
20

Troubleshooting Tale: Remote Access Loss on Server

by Eriq

You can almost always count on interesting things happening during Update Weekend. Sometimes a patch will yield unexpected results, sometimes you lose access to the server after initiating a restart (and yet the server doesn’t actually restart), and so on. Well, this past weekend was no different, but the types of issues encountered were.

As such, I’m going to start a new series of posts in the vein of demonstrating how troubleshooting was approached during a particular situation to help others identify other possible troubleshooting steps or avenues when encountering problems. We’ll start with a rather typical behavior (restarted a server remotely and could not get access back to the server when it should have come up) that had a very unusual root problem.

As mentioned, this started when I lost access to the server in question following a remote restart request. When doing updates, we always do a clean restart of the system prior to installing updates to make sure the server will come up cleanly, so if there are problems, we know they’re NOT related to the updates. Anyway, I restarted this server in question Saturday morning at 8:30am, and by 9:00am I knew it wasn’t coming back. Not only could I not connect via RDP, but telnet to port 25 to check SMTP was also failing, so the server was pretty clearly not coming back.
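The "can I reach it on RDP or SMTP" check is easy to script if you want it to run while you wait for a restart. This is a minimal sketch, not the tool I used that morning: the function name and the five second timeout are my own choices, and it only tests whether a TCP connection can be opened, which is the same thing the telnet test tells you.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, unreachable, or the name did not resolve.
        return False
```

Calling `port_is_open("server.example.com", 25)` tests SMTP and port 3389 tests RDP; the host name here is a placeholder for your own server.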

I was able to reach a contact for this customer and got someone on site to take a look. Maybe it received a shutdown command instead of a restart, maybe they lost power, whatever. The on-site contact was able to log into the server, but it was running really slowly. We checked the basics. Did it have a valid IP address? It did. Was the server able to ping the default gateway? It could. Was the server able to ping www.google.com? It could not. Hm. Sounds like a DNS issue. I asked the on-site person to open the Services control panel, and it took about 5 minutes to open. Not good. At that point, I arranged for an on-site visit myself.

When I arrived, the server was running very sluggishly. I confirmed the tests we had already done: ipconfig is correct, basic networking is working (can ping the gateway and other internal resources by IP), but DNS was failing. I tried an nslookup and the DNS server timed out. OK, sounds like the DNS service isn’t running. Looked in the open Services console, and sure enough the DNS Server Service is in a Starting, but not Started, state. That’s when I noticed that a number of Automatic services were not started, including (but not limited to) DHCP server, Event Log, Terminal Services, SMTP, WINS Server, and a few others.

OK, so that explains why the server can’t get out to the Internet, and why I couldn’t remotely access the server. Now what? Let’s try to start some of the services and see if it’s just a startup glitch that kept them from launching at boot. I started with DHCP simply so we could get workstations back up if needed. DHCP Server wouldn’t start because one of its service dependencies didn’t start. OK, that’s another step towards the solution. Let’s look at the dependencies for the DHCP Server service and the other services that didn’t start and find a common service.

After looking at the dependencies for most of the services, the common thread is the EventLog service. So if we can get the EventLog service running, we’ll probably get several of the others started. Next step, let’s try to reboot into Safe Mode and see if that alters the behavior. So, we restart the server in Safe Mode with Networking, and have the same problems. EventLog and other services that should start in Safe Mode are not starting. At this point we reboot back into normal mode and troubleshoot from there.
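Hunting for the common dependency by eye works, but it is really just a set intersection. Here is a small sketch of that reasoning. The function name is hypothetical, and the dependency lists in the example are made up purely to show the mechanics; read the real ones from the Dependencies tab of each service.

```python
def common_dependencies(dependencies, failed):
    """Return the dependencies shared by every failed service.

    dependencies maps a service name to the list of services it depends on.
    failed is the list of services that did not start. Failed services with
    no entry in dependencies are ignored.
    """
    sets = [set(dependencies[name]) for name in failed if name in dependencies]
    if not sets:
        return set()
    return set.intersection(*sets)


# Illustrative data only: these lists are NOT real Windows dependency data.
example = {
    "DHCP Server": ["Service A", "EventLog"],
    "DNS Server": ["Service B", "EventLog"],
    "WINS Server": ["Service A", "Service B", "EventLog"],
}
print(common_dependencies(example, ["DHCP Server", "DNS Server", "WINS Server"]))
```

Whatever survives the intersection is the first service to chase, which in this case was EventLog.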

So it’s possible that a corrupt event log file might be keeping the service from starting. So I went into C:\WINDOWS\system32\config and moved the event log files (*.evt) to a different directory and tried to start the EventLog service. It failed to come up, but only 4 log files got created, and I moved 8 or 9 out of the folder. Hm. What’s the last log that was created? The DNS log. Let’s take a look in the event viewer and see which logfile might be causing the problem.

Boom, that’s when I found the issue. Even though the event viewer couldn’t display the contents of the log files (since the service wasn’t started), I could see all the logs it wanted to display, and that’s when I found the errant log entry. One of the log files had a name that started with FSSCRM and looked more like an error message than a legitimate event log title. Since the event log service loads its component logs from the registry, I opened regedit and browsed to the HKLM\SYSTEM\CurrentControlSet\Services\Eventlog. Sure enough, I see a Key with the unusual name in there, and when I look at the values in that key, they point to places on the server that don’t exist. I saved the key to a registry file (just in case) and then deleted the key and closed the registry editor. When I attempted to launch the EventLog service again, it fired right up. As did all of the related services. Of course, we did another full reboot of the system to make sure all services started as expected, and sure enough they did.

While I still have no idea how this key got into the registry, or if it was a valid key that somehow got corrupted, we got the server back online and the system running, giving me time to do some research to see what service might have been associated with that erroneous log setting. But it also serves as a lesson that just because something looks like a networking problem doesn’t mean that it’s truly a networking problem at the core. And also another good reason why you shouldn’t go mucking around in the registry without good reason. One small incorrectly-formatted registry value effectively brought down this server, at least from the business owner’s perspective.

Categories : Eriq Neale
Apr
15

Remotely Installing This Month’s ISA Update

by Eriq

Just a heads-up for those of you who remotely install security updates for your customers. This month includes an update for ISA, and if you don’t know about it beforehand, you could end up in a bit of a jam.

As expected, when installing the ISA update, access to the Internet through the server is interrupted. Unlike some previous updates, however, when the installation of this update completes, Internet access is NOT restored. You don’t get Internet back until you restart the server.

So if you don’t have some mechanism in place for restarting the server automatically after updates install, you could find yourself, and your customer, in a rather unexpected place.

Categories : Eriq Neale
Apr
9

Troubleshooting Delayed Message Delivery in Exchange

by Eriq

As more and more anti-spam solutions start doing “interesting” things with SMTP and mail delivery, there is an increased chance of users reporting that mail messages to certain domains are delayed. Unlike a full non-delivery report (NDR) which will list the SMTP error codes for easy identification of the reason for the rejection, a delayed delivery report could be the result of an Internet connection issue, spam filter, offline server, or any number of other causes. The remainder of this post details how to track down possible causes for Internet delivery issues.

First, start with Exchange System Manager. After you open Exchange System Manager, expand Servers, expand the server, then select Queues.

Viewing the SMTP queues in ESM

Look for the connector with the domain that you are having trouble sending to. In the image above, it’s the last queue in the list. We can tell from ESM that there is a problem with this queue because it shows to be in a Retry status under the State column. And when you select the queue, look under Additional Queue Information at the bottom of the screen and you’ll see the result of the last connection attempt. In this case, we can see that the connection was dropped by the remote host. So, in this case, we were able to connect to the remote mail host, which rules out internet connectivity issues, and now we need to see why the remote host is dropping the connection. Before we can do that, we need a couple of other pieces of information.

If you double-click on the connector for the problematic domain, you will get the Find Messages window to open. Click on the Find Now button to see all the messages that are stuck in the queue:

Using Find Messages to view the hung messages in the queue

In this example, we can see two messages that have been sent by the Administrator account are waiting in a Retry state in the queue. Now, we need one more piece of information, so double-click one of the messages.

Looking for the recipient in the hung message

If you look in the Recipients block, you can see the e-mail address of the recipient for this message. Remember that for later.

Next, we want to look in the SMTP logs to see if the remote server sent a valid SMTP code before it dropped the connection. Usually, when a remote host drops a connection, the SMTP service on the Exchange server does not log the code sent by the remote host before the connection is dropped, but we might get lucky. So, let’s open the LogFiles folder and see what the SMTP logs have to say. Open the start menu and enter the path to the LogFiles folder, usually C:\WINDOWS\system32\LogFiles

Opening the LogFiles folder

Now, if SMTP logging has been enabled on your server, you will have an SMTPSVC1 or similarly-named folder inside of the LogFiles folder.

SMTPSVC1 folder missing from LogFiles

In this example, we can see that the SMTP service has not had logging enabled. No worries, we can quickly and easily enable logging for our testing. Go back into ESM, expand Protocols under the server, expand SMTP, right-click on the Default SMTP Virtual Server, and select Properties.

Opening the properties of the Default SMTP Virtual Server

Once you open the Properties, turn on the Enable Logging checkbox, then select Microsoft IIS Log File Format from the Active Log Format drop-down menu.

Enable the Microsoft IIS Log format logging

Close the Properties window and stop and restart the SMTP service on the server. You will probably need to force the connection again after you restart the SMTP service to ensure that SMTP makes another delivery attempt on the messages. Back in the Queues node, right-click on the problematic SMTP connector and select Force Connection.

Forcing the SMTP connector to retry a connection

After the connection attempts and fails, you can go into the SMTPSVC1 folder that now appears under the LogFiles folder and open the log file to review the connection. If you already had logging enabled, you can force the connection attempt and then open the most recent SMTP log file to look for the connection data.

Here is the pertinent information from the log file in this example:

71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 31, 0, 117, 0, 0, -, -, 220 xx.com Microsoft ESMTP MAIL Service, Version: 6.0.3790.3959
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 31, 0, 4, 0, 0, EHLO, -, yy.com,
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 62, 0, 45, 0, 0, -, -, 250-xx.com Hello [70.n.n.n.n],
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 62, 0, 4, 0, 0, MAIL, -, FROM:<Administrator@yy.com>,
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 78, 0, 59, 0, 0, -, -, 250 2.1.0 Administrator@yy.com....Sender OK,
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 78, 0, 4, 0, 0, RCPT, -, TO:<mm@xx.com>,
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 15, 0, 117, 0, 0, -, -, 220 xx.com Microsoft ESMTP MAIL Service, Version: 6.0.3790.3959
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 15, 0, 4, 0, 0, EHLO, -, yy.com,
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 47, 0, 45, 0, 0, -, -, 250-xx.com Hello [70.n.n.n.n],
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 47, 0, 4, 0, 0, MAIL, -, FROM:<Administrator@yy.com>,
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 78, 0, 59, 0, 0, -, -, 250 2.1.0 Administrator@yy.com....Sender OK,
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 78, 0, 4, 0, 0, RCPT, -, TO:<mm@xx.com>,

As suspected, the dropped connection from the remote site does not give us a complete SMTP transaction log on our Exchange server. We see the initial connection attempt, the EHLO command our server sends, the MAIL command our server sends, and the RCPT command our server sends. After that, the connection is reset by the other end, and the SMTP process on our server does not capture the information. Not to worry, we can still get that information. How? Telnet.
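On a busy server, spotting the command that never got an answer can be tedious, so here is a rough Python sketch that does it for log lines shaped like the ones above. It is an illustration only. The field positions are assumptions read off this sample (event name in the second field, SMTP verb in the thirteenth, reply text last), the function names are made up, and treating a fresh 220 greeting as "the previous command went unanswered" is a heuristic of this sketch, not something the SMTP service reports.

```python
SAMPLE = """\
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 31, 0, 117, 0, 0, -, -, 220 xx.com Microsoft ESMTP MAIL Service, Version: 6.0.3790.3959
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 31, 0, 4, 0, 0, EHLO, -, yy.com,
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 62, 0, 45, 0, 0, -, -, 250-xx.com Hello [70.n.n.n.n],
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 62, 0, 4, 0, 0, MAIL, -, FROM:<Administrator@yy.com>,
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 78, 0, 59, 0, 0, -, -, 250 2.1.0 Administrator@yy.com....Sender OK,
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:33, SMTPSVC1, SERVER, -, 78, 0, 4, 0, 0, RCPT, -, TO:<mm@xx.com>,
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 15, 0, 117, 0, 0, -, -, 220 xx.com Microsoft ESMTP MAIL Service, Version: 6.0.3790.3959
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 15, 0, 4, 0, 0, EHLO, -, yy.com,
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 47, 0, 45, 0, 0, -, -, 250-xx.com Hello [70.n.n.n.n],
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 47, 0, 4, 0, 0, MAIL, -, FROM:<Administrator@yy.com>,
71.n.n.n, OutboundConnectionResponse, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 78, 0, 59, 0, 0, -, -, 250 2.1.0 Administrator@yy.com....Sender OK,
71.n.n.n, OutboundConnectionCommand, z/z/2009, 17:34:44, SMTPSVC1, SERVER, -, 78, 0, 4, 0, 0, RCPT, -, TO:<mm@xx.com>,
"""


def parse_line(line):
    """Split one log line into the fields we care about, or return None."""
    # Split at most 14 times: the final field (reply text) may contain commas.
    parts = line.split(",", 14)
    if len(parts) < 15:
        return None
    parts = [p.strip() for p in parts]
    return {
        "event": parts[1],   # OutboundConnectionCommand / ...Response
        "time": parts[3],
        "verb": parts[12],   # EHLO, MAIL, RCPT, or "-" for responses
        "text": parts[14].rstrip(", "),
    }


def unanswered_commands(lines):
    """Return (time, verb) for each command that never received a reply."""
    unanswered = []
    pending = None
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue
        if entry["event"] == "OutboundConnectionCommand":
            if pending is not None:
                unanswered.append((pending["time"], pending["verb"]))
            pending = entry
        elif entry["event"] == "OutboundConnectionResponse":
            # Heuristic: a 220 greeting means a new connection started,
            # so whatever command was pending never got its answer.
            if pending is not None and entry["text"].startswith("220"):
                unanswered.append((pending["time"], pending["verb"]))
            pending = None
    if pending is not None:
        unanswered.append((pending["time"], pending["verb"]))
    return unanswered
```

Run against the sample, `unanswered_commands(SAMPLE.splitlines())` flags the RCPT command in both delivery attempts, which matches what we read from the log by eye.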

Open a command prompt on your server. Run the nslookup command. At the nslookup prompt, enter set type=mx and press Enter. Then enter the domain name of the site you are trying to send to and press Enter. You’ll get a response similar to:

Reading the results from the nslookup command
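If you need to repeat this lookup for several domains, the nslookup step can be wrapped in a few lines of Python. This is a sketch under assumptions: it expects the Windows nslookup output style, where each record appears as `MX preference = N, mail exchanger = host`, and the function names are invented for this example. Other platforms print MX records differently, so the pattern would need adjusting there.

```python
import re
import subprocess

# Assumed Windows nslookup line format:
#   example.com   MX preference = 10, mail exchanger = mail.example.com
MX_LINE = re.compile(r"MX preference = (\d+), mail exchanger = (\S+)")


def parse_mx(nslookup_output):
    """Return (preference, host) pairs sorted by preference, lowest first."""
    records = [
        (int(pref), host.rstrip("."))
        for pref, host in MX_LINE.findall(nslookup_output)
    ]
    return sorted(records)


def lookup_mx(domain):
    """Run nslookup for MX records and parse whatever it prints."""
    result = subprocess.run(
        ["nslookup", "-type=mx", domain],
        capture_output=True,
        text=True,
        timeout=30,
    )
    return parse_mx(result.stdout)
```

The first entry returned by `lookup_mx` is the lowest-preference mail exchanger, which is normally the host to use for the telnet test.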

The key piece of information needed is the mail exchanger, which will be the last item listed in the response. Make note of that server name. Now, in the same command prompt, type telnet mailserver 25, where mailserver is the name of the server you identified from the nslookup command. When the connection is made, type ehlo and press Enter. You should get a response similar to:

Connecting to the remote mail server

Now, type the following commands and press Enter after each one. You will use the FROM address that you got from the Find Now search in the ESM Queues, and you will use the TO address that you got earlier as well.

mail from: sender@domain.com
rcpt to: recipient@domain.com

In our case, we get our answer as soon as we provide the recipient’s address:

Responses from the remote SMTP server

The remote mail server responds to the rcpt command with a 550 5.7.1 response, indicating that it will not accept the message. In this case, the remote host is using Trend Micro’s Email Reputation service, and that service, for whatever reason, has denied access for the sender to send mail to that recipient.
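The same manual test can be scripted with Python's standard `smtplib` module, which is handy when you have to check more than one recipient. This is a minimal sketch and not the method used in this walkthrough: `probe_recipient` and `check` are hypothetical helper names, and the host and addresses you pass in are your own values.

```python
import smtplib


def probe_recipient(session, sender, recipient):
    """Send EHLO, MAIL FROM and RCPT TO; return (step, code, text) per reply.

    Stops at the first 4xx/5xx reply or when the server drops the connection.
    """
    steps = (
        ("EHLO", lambda: session.ehlo()),
        ("MAIL FROM", lambda: session.mail(sender)),
        ("RCPT TO", lambda: session.rcpt(recipient)),
    )
    replies = []
    for name, send in steps:
        try:
            code, text = send()
        except smtplib.SMTPServerDisconnected:
            replies.append((name, None, "connection dropped before a reply was read"))
            break
        if isinstance(text, bytes):
            text = text.decode("ascii", "replace")
        replies.append((name, code, text))
        if code >= 400:
            break
    return replies


def check(host, sender, recipient):
    """Open a connection to host on port 25 and print each reply."""
    with smtplib.SMTP(host, 25, timeout=30) as session:
        for name, code, text in probe_recipient(session, sender, recipient):
            print(name, code, text)
```

Call `check` with the mail exchanger name from the nslookup step and the FROM and TO addresses from the queue. In a case like ours, the last line printed would be the RCPT TO step carrying the 550 reply. No message body is ever sent, because the script stops before the DATA command.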

Unfortunately, because the remote server issues the response and then immediately drops the connection, the sending server never has an opportunity to log the response. The message goes into a retry state, and the server continues to try to deliver it until the timeout value is reached (72 hours by default in Exchange). At that point, the sender gets an NDR indicating that the message could not be delivered within the timeout window. The NDR doesn’t tell the sender that the message was blocked by a spam filter, so their only real recourse, without our troubleshooting, is to contact the recipient some other way and let the recipient know that the e-mail did not get through.

I’m afraid that this type of SMTP behavior is only going to become more common, meaning that we will likely get called into action to figure out why a message never got delivered. So long as we have access to the sending mail server, it’s not that hard to figure out. Just follow these steps to find the SMTP code returned by the receiving mail server, and you can then continue troubleshooting from there.

Categories : Eriq Neale
Support

Third Tier provides advanced support services to IT Professionals. Learn about what we do at http://www.thirdtier.net or click on the support icon below to chat with one of our support representatives.

Third Tier
Copyright © 2012 All Rights Reserved