Posts Tagged ‘Troubleshooting’

ESXi: issues with NFS datastore. Where do I put my tcpdump?

February 4th, 2011 1 comment

ESXi over NFS works just great!

But what if you have an issue with NFS and you need a network dump? 

In ESXi you typically don’t have a local datastore where you can write the files from the network dump, and your NFS datastore is not available!

Before running into the data centre to plug in a USB disk, or even better a SCSI disk, you might want to try this. ;-)

One trick that worked out pretty well for me, with a little help from a Linux machine, is to send the tcpdump output to a FIFO and then, from a remote host (it might be a VM on a different ESXi host), cat the FIFO over SSH into a local file.
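If you want to see the pattern in isolation before touching a host, here is a minimal local sketch. It needs no ESXi at all: printf stands in for tcpdump-uw (the writer) and a background cat stands in for the ssh session (the reader). The temporary directory and the file names are made up for the illustration.

```shell
#!/bin/sh
# Local stand-in for the ESXi workflow: one process writes into a named
# pipe, another drains it into a regular file. All paths are illustrative.
workdir=$(mktemp -d)
fifo="$workdir/pipe.dmp"
out="$workdir/capture.cap"

mkfifo "$fifo"

# Reader (plays the role of: ssh root@host "cat /tmp/pipe.dmp" > file).
# Opening the pipe blocks until a writer shows up.
cat "$fifo" > "$out" &
reader_pid=$!

# Writer (plays the role of: tcpdump-uw ... -w /tmp/pipe.dmp).
printf 'fake packet data\n' > "$fifo"

# When the writer closes the pipe the reader sees EOF and exits.
wait "$reader_pid"

captured=$(cat "$out")
echo "$captured"
rm -rf "$workdir"
```

Nothing is stored at the pipe's location: the data only passes through it, which is why the trick works on a host with no writable datastore.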

How To:
On the ESXi host, log on via SSH as root and create a named pipe:

root@yourESXihost# mkfifo /tmp/pipe.dmp

and from a remote Linux machine launch the following:

you@yourlinuxhost > ssh root@youresxihost "cat /tmp/pipe.dmp" > capture-for-wireshark.cap

Now, from a new SSH session to ESXi as root, launch:

root@yourESXihost# tcpdump-uw -n -s 1524 -i vmk# -w /tmp/pipe.dmp

OR even better from the remote machine:

you@yourlinuxhost > ssh root@youresxihost "tcpdump-uw -n -s 1524 -i vmk# -w /tmp/pipe.dmp"
(replace the # with the proper vmk port number)

Reproduce your issue and, when you have finished, just hit “Control+C” to stop the network dump and the cat.
Now you can open your file directly in Wireshark (that’s what I use at least!)
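To avoid mistyping the interface when you repeat the capture, the command can be generated by a small helper on the Linux side. This is only a sketch: capture_cmd is a hypothetical name of mine, and the flags are exactly the ones used above (-n: no name resolution, -s 1524: capture up to 1524 bytes of each packet, -i: interface, -w: write raw packets to the given file, here the named pipe).

```shell
#!/bin/sh
# capture_cmd is a hypothetical helper, not part of ESXi or tcpdump.
# It prints the tcpdump-uw command line for a given vmk number and
# refuses anything that is not a plain number.
capture_cmd() {
    case "$1" in
        ''|*[!0-9]*)
            echo "usage: capture_cmd <vmk number>" >&2
            return 1
            ;;
    esac
    echo "tcpdump-uw -n -s 1524 -i vmk$1 -w /tmp/pipe.dmp"
}

capture_cmd 0
```

You could then start the capture with something like ssh root@youresxihost "$(capture_cmd 1)", using whichever vmk number carries your NFS traffic.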

This little trick can of course also be used to troubleshoot network problems in a VM, by dumping the traffic for the entire dvPortGroup from a vmk# NIC. You just need to make sure that the VM’s vNIC and the vmk# NIC are connected to the same dvPortGroup, and you must remember to allow promiscuous mode (not allowed by default).

Good Luck!

Please note: your network can be very chatty, so the file can grow very fast, and/or your ESXi host might not like the tcpdump. Use it at your own risk and only if you really know what you are doing!

Undocumented Equallogic CLI Commands

June 26th, 2009 2 comments

Equallogics are very nice boxes: fast, robust and very scalable (linearly! Adding an enclosure adds processing power, spindles and cache). They don’t have licenses to enable features; WYSIWYG!
But sometimes they are a bit of a “black box”. This has been greatly improved by the release of the Equallogic SAN HQ software.
It would be nice, however, if they supported synchronous replication between two groups (they do support asynchronous replication, though), and if they were a bit more flexible on the networking side, for example by supporting VLAN tagging.

For the people who want a bit more insight:

SSH into your Equallogic group, log in and enter “support”.

Be aware of the following message!

You are running a support command, which is normally restricted to PS Series Technical Support personnel. Do not use without instruction from Technical Support.

When running “cachetool”:

[Screenshot: eql-cachetool]

When running “netstat -i”:

IP Statistics:
4137170846 total packets received
183707 total bad packets drop
0 bad header checksums
0 with size smaller than minimum
0 with data size < data length
0 with length > max ip packet size
0 with header length < data size
0 with data length < header length
0 with bad options
0 with incorrect version number
0 fragments received
0 fragments dropped (dup or out of space)
0 malformed fragments dropped
0 fragments dropped after timeout
0 packets reassembled ok
4136987139 packets for this host
0 packets for unknown/unsupported protocol
0 packets forwarded (0 packets fast forwarded)
183707 packets not forwardable
0 redirects sent
5530887305 packets sent from this host
0 packets sent with fabricated ip header
0 output packets dropped due to no bufs, etc.
47 output packets discarded due to no route
0 output datagrams fragmented
0 fragments created
0 datagrams that can’t be fragmented
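To put those counters in proportion, here is a small sketch that runs on any Linux box, not on the array itself. It pulls two of the numbers out of the text above and works out the share of received packets that were dropped. The heredoc only holds a few lines copied from the output; on a real system you would paste in the whole thing.

```shell
#!/bin/sh
# A few counters copied from the output above (abbreviated on purpose).
stats=$(cat <<'EOF'
4137170846 total packets received
183707 total bad packets drop
183707 packets not forwardable
5530887305 packets sent from this host
EOF
)

# Take the first field of the matching line, kept as text.
total=$(printf '%s\n' "$stats" | awk '/total packets received/ {print $1}')
bad=$(printf '%s\n' "$stats" | awk '/total bad packets drop/ {print $1}')

# awk does the division in floating point; sh arithmetic is integer-only.
pct=$(LC_ALL=C awk -v b="$bad" -v t="$total" 'BEGIN {printf "%.4f", b / t * 100}')

echo "received=$total bad=$bad dropped=${pct}%"
```

In this particular output the “bad packets” counter equals the “not forwardable” counter, which suggests (but does not prove) that the drops are packets that were not addressed to the array in the first place.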

There are more commands to discover. Try TAB completion and mind the difference between “Bad Command” and “Ambiguous command”. These commands are not shown when using “help”, and most have a -? or -h option.

AGAIN: ONLY DO THIS ON TEST SYSTEMS, AND ONLY WHEN YOU KNOW WHAT YOU’RE DOING!… Don’t come whining here if stuff breaks…

DFSR Debug Logging Explained

June 18th, 2009 No comments

While troubleshooting some DFSR today, I came across this very nice and detailed post from the Directory Services Team.

From: http://blogs.technet.com/askds/archive/2009/03/23/understanding-dfsr-debug-logging-part-1-logging-levels-log-format-guid-s.aspx

Ned here again. Today begins a 21-part series on using the DFSR debug logs to further your understanding of Distributed File System Replication. While there are specific troubleshooting scenarios that will be covered, the most important part of understanding any products logging is making sure you are comfortable with it before you have errors. That way you have some point of reference if things go wrong.

As you can probably guess, these posts were a long time in development. They are based on an internal DFSR whitepaper I have worked on for six months, and which went through review by a number of excellent folks here in Support, Field Engineering, and the Product Group itself. Except for the removal of all private source code references, this series is otherwise unchanged.

I’ll start with a couple posts on the logs themselves, how they are formatted, how they can be controlled, etc. Then I’ll dig into scenarios in detail, for both Windows Server 2003 R2 and Windows Server 2008. Don’t feel like you have to read and memorize everything – this series is a reference guide as well.

Understanding DFSR debug logging (Part 1: Logging Levels, Log Format, GUID’s)
Understanding DFSR debug logging (Part 2: Nested Fields, Module ID’s)
Understanding DFSR debug logging (Part 3: The Log Scenario Format, File Added to Replicated Folder on Windows Server 2008)
Understanding DFSR debug logging (Part 4: A Very Small File Added to Replicated Folder on Windows Server 2008)
Understanding DFSR debug logging (Part 5: File Modified on Windows Server 2003 R2)
Understanding DFSR debug logging (Part 6: Microsoft Office Word 97-2003 File Modified on Windows Server 2008)
Understanding DFSR debug logging (Part 7: Microsoft Office Word 2007 File Modified on Windows Server 2008)
Understanding DFSR debug logging (Part 8: File Deleted from Windows Server 2003 R2)
Understanding DFSR debug logging (Part 9: File is Renamed on Windows Server 2003 R2)
Understanding DFSR debug logging (Part 10: File Conflicted between two Windows Server 2008)
Understanding DFSR debug logging (Part 11: Directory created on Windows Server 2003 R2)
Understanding DFSR debug logging (Part 12: Domain Controller Bind and Config Polling on Windows Server 2008)
Understanding DFSR debug logging (Part 13: A New Replication Group and Replicated Folder between two Windows Server 2008 members)
Understanding DFSR debug logging (Part 14: A sharing violation due to a file locked upstream between two Windows Server 2008)
Understanding DFSR debug logging (Part 15: Pre-Seeded Data Usage during Initial Sync)
Understanding DFSR debug logging (Part 16: File modification with RDC in very granular detail (uses debug severity 5))
Understanding DFSR debug logging (Part 17: Replication failing because of blocked RPC ports (uses debug severity 5))
Understanding DFSR debug logging (Part 18: LDAP queries failing due to network (uses debug severity 5))
Understanding DFSR debug logging (Part 19: File Blocked Inbound by a File Screen Filter Driver (uses debug severity 5))
Understanding DFSR debug logging (Part 20: Skipped temporary and filtered files (uses debug severity 5))
Understanding DFSR debug logging (Part 21: File replication performance from throttling (uses debug severity 5))


Dtrace for Windows? Windows Performance Toolkit

June 17th, 2009 No comments

So you have performance troubles on Windows, and you have probably already pulled the Sysinternals tools from the shelf. But did you already know about the Windows Performance Toolkit for hardcore performance troubleshooting?

This toolkit has three tools:

xperf.exe – Captures traces, post-processes them for use on any machine, and supports command-line (action-based) trace analysis.

xperfview.exe – Visual Trace Analysis Tool – Presents trace content in the form of interactive graphs and summary tables

xbootmgr.exe – Automates on/off state transitions and captures traces during these transitions.

So what do these tools do?

Performance Analyzer is built on top of the Event Tracing for Windows (ETW) infrastructure. ETW enables Windows and applications to efficiently generate events, which can be enabled and disabled at any time without requiring system or process restarts. ETW collects requested kernel events and saves them to one or more files referred to as “trace files” or “traces.” These kernel events provide extensive details about the operation of the system. Some of the most important and useful kernel events available for capture and analysis are context switches, interrupts, deferred procedure calls, process and thread creation and destruction, disk I/Os, hard faults, processor P-State transitions, and registry operations, though there are many others.

One of the great features of ETW, supported in WPT, is the support of symbol decoding, sample profiling, and capture of call stacks on kernel events. These features provide very rich and detailed views into the system operation. WPT also supports automated perf testing. Specifically, xperf is designed for scripting from the command line and can be employed in automated performance gating infrastructures (it is the core of Windows PerfGates). xperf can also dump the trace data to an ANSI text file, which allows you to write your own trace processing tools that can look for performance problems and regressions from previous tests.

More info:

http://blogs.msdn.com/ntdebugging/archive/2008/04/03/windows-performance-toolkit-xperf.aspx
http://msdn.microsoft.com/en-us/performance/cc825801.aspx
http://download.microsoft.com/download/5/E/6/5E66B27B-988B-4F50-AF3A-C2FF1E62180F/COR-T594_WH08.pptx

Download the tools here:
