Analysing and Debugging Memory Dumps (STOP Errors) With WinDbg
For the last 9 years
I've been working for
systems integrators that
have always had Microsoft
Premier support contracts.
So whenever I've had major
server issues, I've just
flicked the dumps off to
Microsoft for analysis.
Well...a couple
of years ago I thought
that I'd stop being so
lazy and learn how to
do it myself. And do you
know what? It is so easy
to do. We had an issue
on a customer's site several
months ago where their
Citrix PS4 servers (Windows
2003) were intermittently
blue screening. As part
of our build process, a
16MB pagefile is placed
on the System drive and
the servers are set to
provide a Small (mini)
memory dump. So we were
already getting memory
dumps from these blue screens.


I installed the "Debugging
Tools for Windows" from
here: http://www.microsoft.com/whdc/devtools/debugging/installx86.mspx
Then ran up WinDbg (pronounced
WinDebug).

Go to File > Symbol
File Path...
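Type in SRV*V:\symbols*http://msdl.microsoft.com/download/symbols
Note: The prefix is SRV (symbol server), not SVR. The output
captured further down was recorded with SVR in the path, which
is the most likely reason it complains that the kernel symbols
are wrong. Also ensure you set the drive path correctly. In
my case I was using V.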
This allows WinDbg to
download the symbols needed
to help analyse the dump.
Select OK
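Since a single transposed letter in the symbol path is enough to stop the symbols loading, here is a minimal Python sketch that builds the path and sanity-checks the prefix. The function names are my own invention for illustration, and the list of accepted prefixes (SRV, SYMSRV, CACHE) is an assumption about what the debugger recognises, so treat it as a rough check only.

```python
# Hypothetical helpers for building and checking a WinDbg symbol path.
DEFAULT_SYMBOL_SERVER = "http://msdl.microsoft.com/download/symbols"

# Assumption: these are the element prefixes that use the '*' syntax.
KNOWN_PREFIXES = ("SRV", "SYMSRV", "CACHE")


def build_symbol_path(cache_dir, server=DEFAULT_SYMBOL_SERVER):
    """Return a path of the form SRV*<local cache>*<symbol server>."""
    cache_dir = cache_dir.rstrip("\\/")
    if not cache_dir:
        raise ValueError("cache_dir must not be empty")
    return "SRV*{}*{}".format(cache_dir, server)


def check_symbol_path(path):
    """Return a list of complaints about elements with an unknown prefix."""
    problems = []
    for element in path.split(";"):
        element = element.strip()
        if "*" not in element:
            continue  # a plain directory, nothing to check
        prefix = element.split("*", 1)[0]
        if prefix.upper() not in KNOWN_PREFIXES:
            problems.append("unknown prefix {!r} in {!r}".format(prefix, element))
    return problems


print(build_symbol_path("V:\\symbols"))
print(check_symbol_path("SVR*V:\\symbols*" + DEFAULT_SYMBOL_SERVER))
```

The second call reports the SVR element, which is exactly the kind of typo that is easy to miss in the dialog box.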
Go to File > Open Crash
Dump...
On Windows 2003 servers,
the mini crash dumps are
found in the %SystemRoot%\Minidump
folder, which is U:\Windows\Minidump
in my case.
Open the relevant minidump.
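If a server has blue screened several times, it helps to see the dumps newest first before picking one. A small sketch, assuming Python is available on the machine where the dumps live; find_minidumps is a hypothetical helper name, and the fallback to C:\Windows is my assumption for when %SystemRoot% is not set.

```python
import os
from pathlib import Path


def find_minidumps(folder=None):
    """Return the .dmp files in the minidump folder, newest first."""
    if folder is None:
        # Assumption: fall back to C:\Windows when %SystemRoot% is not set.
        system_root = os.environ.get("SystemRoot", r"C:\Windows")
        folder = Path(system_root) / "Minidump"
    folder = Path(folder)
    if not folder.is_dir():
        return []
    dumps = [
        entry for entry in folder.iterdir()
        if entry.is_file() and entry.suffix.lower() == ".dmp"
    ]
    # Sort by last-modified time so the most recent crash comes first.
    return sorted(dumps, key=lambda entry: entry.stat().st_mtime, reverse=True)
```

Calling find_minidumps(r"U:\Windows\Minidump") would list the dumps for a layout like mine. Reading that folder may need administrative rights.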
Then we get lots of good
information in the WinDbg
window.
----------------------------Beginning----------------------------------
Microsoft (R) Windows Debugger Version 6.6.0003.5
Copyright (c) Microsoft Corporation. All rights reserved.
Loading Dump File [U:\WINDOWS\Minidump\Mini071006-04.dmp]
Mini Kernel Dump File: Only registers and stack trace are available
Symbol search path is: SVR*V:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is:
Unable to load image ntoskrnl.exe, Win32 error 2
*** WARNING: Unable to verify timestamp for ntoskrnl.exe
*** ERROR: Module load completed but symbols could not be loaded for ntoskrnl.exe
Windows Server 2003 Kernel Version 3790 (Service Pack 1) MP (4 procs) Free x86 compatible
Product: Server, suite: TerminalServer
Kernel base = 0x80800000 PsLoadedModuleList = 0x808af988
Debug session time: Mon Jul 10 16:41:43.015 2006 (GMT+8)
System Uptime: 0 days 0:02:47.656
Unable to load image ntoskrnl.exe, Win32 error 2
*** WARNING: Unable to verify timestamp for ntoskrnl.exe
*** ERROR: Module load completed but symbols could not be loaded for ntoskrnl.exe
Loading Kernel Symbols
.................................................................................................................
Loading User Symbols
Loading unloaded module list
.....
Unable to load image cdm.sys, Win32 error 2
*** WARNING: Unable to verify timestamp for cdm.sys
*** ERROR: Module load completed but symbols could not be loaded for cdm.sys
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
Use !analyze -v to get detailed debugging information.
BugCheck 10000050, {f000b1eb, 0, f62b28ec, 2}
***** Kernel symbols are WRONG. Please fix symbols to do analysis.
*************************************************************************
***                                                                   ***
***                                                                   ***
***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information. Contact the group that       ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: nt!_KPRCB                                     ***
***                                                                   ***
*************************************************************************
*************************************************************************
***                                                                   ***
***                                                                   ***
***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information. Contact the group that       ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: nt!_KPRCB                                     ***
***                                                                   ***
*************************************************************************
Probably caused by : cdm.sys ( cdm+78ec )
Followup: MachineOwner
---------------------------------End-----------------------------------------------------
See the line
above..."Probably caused
by : cdm.sys". It's giving us a
hint already :)
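The "BugCheck" line in that summary is worth picking apart, because the four values in braces are the STOP code's arguments. Here is a rough Python sketch that parses it. The names are hypothetical, and masking with 0x0FFFFFFF to get from 10000050 to the plain STOP code 50 is my assumption, based on the summary showing 10000050 for a crash that the verbose analysis labels (50). The meaning of the arguments applies to STOP 0x50 only.

```python
import re

# Matches e.g. "BugCheck 10000050, {f000b1eb, 0, f62b28ec, 2}", even if the
# line has been wrapped somewhere inside the braces.
BUGCHECK_RE = re.compile(
    r"BugCheck\s+([0-9A-Fa-f]+),\s*\{\s*([0-9A-Fa-f]+),\s*([0-9A-Fa-f]+),"
    r"\s*([0-9A-Fa-f]+),\s*([0-9A-Fa-f]+)\s*\}"
)


def parse_bugcheck(text):
    """Return the STOP code and its four arguments as integers, or None."""
    match = BUGCHECK_RE.search(text)
    if match is None:
        return None
    code, *args = (int(group, 16) for group in match.groups())
    # Assumption: the high nibble only marks the dump type, so mask it off.
    return {"code": code, "base_code": code & 0x0FFFFFFF, "args": args}


def describe_page_fault(args):
    """Describe the arguments of a STOP 0x50 in plain words."""
    address, operation, instruction = args[0], args[1], args[2]
    kind = {0: "read", 1: "write"}.get(operation, "unknown operation")
    text = "{} of address {:08x}".format(kind, address)
    if instruction:
        text += " by instruction at {:08x}".format(instruction)
    return text


info = parse_bugcheck("BugCheck 10000050, {f000b1eb, 0, f62b28ec, 2}")
print(hex(info["base_code"]), describe_page_fault(info["args"]))
```

For the dump in this post that describes a read of f000b1eb by the instruction at f62b28ec, which lines up with the cdm+78ec address in the "Probably caused by" line.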
Now we need to analyse
it. Notice the "1:
kd>" prompt in the lower
left corner of the command
window? This is where we
type in commands.

In the command window,
type !analyze -v
Please note the American
spelling of analyse.
This performs an analysis
with full verbose display
of data, which is used
for extracting as much
information as possible.
----------------------------Beginning----------------------------------
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************
PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced. This cannot be protected by try-except,
it must be protected by a Probe. Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: f000b1eb, memory referenced.
Arg2: 00000000, value 0 = read operation, 1 = write operation.
Arg3: f62b28ec, If non-zero, the instruction address which referenced the bad memory
      address.
Arg4: 00000002, (reserved)
Debugging Details:
------------------
***** Kernel symbols are WRONG. Please fix symbols to do analysis.
*************************************************************************
***                                                                   ***
***                                                                   ***
***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information. Contact the group that       ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: nt!_KPRCB                                     ***
***                                                                   ***
*************************************************************************
*************************************************************************
***                                                                   ***
***                                                                   ***
***    Your debugger is not using the correct symbols                 ***
***                                                                   ***
***    In order for this command to work properly, your symbol path   ***
***    must point to .pdb files that have full type information.      ***
***                                                                   ***
***    Certain .pdb files (such as the public OS symbols) do not      ***
***    contain the required information. Contact the group that       ***
***    provided you with these symbols if you need this command to    ***
***    work.                                                          ***
***                                                                   ***
***    Type referenced: nt!_KPRCB                                     ***
***                                                                   ***
*************************************************************************
MODULE_NAME: cdm
FAULTING_MODULE: 80800000 nt
DEBUG_FLR_IMAGE_TIMESTAMP: 43d682a0
READ_ADDRESS: unable to get nt!MmSpecialPoolStart
unable to get nt!MmSpecialPoolEnd
unable to get nt!MmPoolCodeStart
unable to get nt!MmPoolCodeEnd
f000b1eb
FAULTING_IP:
cdm+78ec
f62b28ec 8b44817c mov eax,[ecx+eax*4+0x7c]
MM_INTERNAL_CODE: 2
CUSTOMER_CRASH_COUNT: 4
DEFAULT_BUCKET_ID: DRIVER_FAULT_SERVER_MINIDUMP
BUGCHECK_STR: 0x50
LAST_CONTROL_TRANSFER: from f62d5270 to f62b28ec
STACK_TEXT:
WARNING: Stack unwind information not available. Following frames may be wrong.
f48f3968 f62d5270 887a3008 88952e88 f48f3a9c cdm+0x78ec
f48f3a5c 8083f9d0 8a1f7580 887a3008 887a3008 cdm+0x2a270
f48f3a70 8092e269 f48f3c18 8a1f7568 00000000 nt+0x3f9d0
f48f3b58 80936caa 8a1f7580 00000000 888a0488 nt+0x12e269
f48f3bd8 80936aa5 00000000 f48f3c18 00000040 nt+0x136caa
f48f3c2c 80936f27 00000000 00000000 3000f001 nt+0x136aa5
f48f3ca8 80936ff8 0105f654 00100080 0105f63c nt+0x136f27
f48f3d04 8093d023 0105f654 00100080 0105f63c nt+0x136ff8
f48f3d44 80834d3f 0105f654 00100080 0105f63c nt+0x13d023
f48f3d64 7c82ed54 badb0d00 0105f60c 00000000 nt+0x34d3f
f48f3d68 badb0d00 0105f60c 00000000 00000000 0x7c82ed54
f48f3d6c 0105f60c 00000000 00000000 00000000 0xbadb0d00
f48f3d70 00000000 00000000 00000000 00000000 0x105f60c
STACK_COMMAND: .bugcheck ; kb
FOLLOWUP_IP:
cdm+78ec
f62b28ec 8b44817c mov eax,[ecx+eax*4+0x7c]
FAULTING_SOURCE_CODE:
SYMBOL_STACK_INDEX: 0
FOLLOWUP_NAME: MachineOwner
SYMBOL_NAME: cdm+78ec
IMAGE_NAME: cdm.sys
BUCKET_ID: WRONG_SYMBOLS
Followup: MachineOwner
---------------------------------End-----------------------------------------------------
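This points to the cdm.sys
driver as the cause of the blue
screens: the faulting instruction
(FAULTING_IP) is at cdm+78ec and
IMAGE_NAME is cdm.sys. Because the
symbols did not load, treat it as a
strong lead rather than proof.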
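When you have a pile of dumps, you can save the !analyze -v output to a text file and pull out just the interesting fields. A minimal Python sketch, assuming the saved output has one field or one stack frame per line; summarize_analysis and the choice of fields are mine, not anything WinDbg provides.

```python
import re
from collections import Counter

# Fields worth pulling out of the verbose analysis (my own selection).
FIELDS = ("MODULE_NAME", "IMAGE_NAME", "BUGCHECK_STR",
          "DEFAULT_BUCKET_ID", "BUCKET_ID")

# A stack frame ends in "module+0xoffset" or "module!symbol+0xoffset".
FRAME_RE = re.compile(
    r"(?<!\S)([A-Za-z_]\w*)(?:!\S+?)?\+0x[0-9a-fA-F]+\s*$", re.MULTILINE
)


def summarize_analysis(text):
    """Collect key fields and count how often each module is on the stack."""
    summary = {}
    for name in FIELDS:
        # Anchoring at line start keeps BUCKET_ID apart from DEFAULT_BUCKET_ID.
        match = re.search(r"^{}:\s*(\S+)".format(name), text, re.MULTILINE)
        if match:
            summary[name] = match.group(1)
    summary["stack_modules"] = dict(Counter(FRAME_RE.findall(text)))
    return summary


sample = """MODULE_NAME: cdm
DEFAULT_BUCKET_ID: DRIVER_FAULT_SERVER_MINIDUMP
BUGCHECK_STR: 0x50
STACK_TEXT:
f48f3968 f62d5270 887a3008 88952e88 f48f3a9c cdm+0x78ec
f48f3a5c 8083f9d0 8a1f7580 887a3008 887a3008 cdm+0x2a270
f48f3a70 8092e269 f48f3c18 8a1f7568 00000000 nt+0x3f9d0
f48f3d68 badb0d00 0105f60c 00000000 00000000 0x7c82ed54
IMAGE_NAME: cdm.sys
BUCKET_ID: WRONG_SYMBOLS
"""
print(summarize_analysis(sample))
```

A third-party driver sitting at the top of the stack, as cdm does here, is usually the first thing to research.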
Type q in the command
window to quit.
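The same open, analyse, quit sequence can also be run without the GUI using kd.exe, the console debugger that ships in the same Debugging Tools package. This is a sketch only: it assumes kd.exe is on the PATH, and the wrapper function names are hypothetical. The -z (dump file), -y (symbol path) and -c (initial command) switches are the debugger's own command-line options.

```python
import subprocess


def build_kd_command(dump_file, symbol_path, kd_exe="kd.exe"):
    """Build the argument list that opens a dump, analyses it and quits."""
    return [
        kd_exe,
        "-z", str(dump_file),     # the crash dump to open
        "-y", symbol_path,        # same string as in File > Symbol File Path
        "-c", "!analyze -v; q",   # run the analysis, then quit
    ]


def analyze_dump(dump_file, symbol_path, kd_exe="kd.exe"):
    """Run kd against one dump and return whatever it printed."""
    result = subprocess.run(
        build_kd_command(dump_file, symbol_path, kd_exe),
        capture_output=True, text=True, check=False,
    )
    return result.stdout
```

For example, analyze_dump(r"U:\WINDOWS\Minidump\Mini071006-04.dmp", r"SRV*V:\symbols*http://msdl.microsoft.com/download/symbols") should return the same text that appears in the WinDbg command window.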
So now I just had to do
some research on the cdm.sys
driver. From experience
I know that this is a Citrix
driver used for the client
drive mapping process,
but if you Google it, you
will find that information
anyway. So I then went
searching through the Citrix
KB and Forums, and found
the following hotfix.
Hotfix PSE400R01W2K3064 - For Citrix Presentation
Server 4.0 for Windows
Server 2003
Three of the listed fixes
are:
31. Servers may experience
a fatal error, displaying
a blue screen on CDM.sys
during heavy utilization.
The issue is found when
the driver verifier is
being used.
42. Servers experience
a fatal error, displaying
a blue screen on CDM.sys.
This occurs when an application
is accessing drive A
in a session using "A:" rather
than "A:\" or
if the application is
the first process to
access a client drive
in a session.
45. Servers are trapping
in CDM.sys with the following
STOP error message:
- DRIVER_VERIFIER_DETECTED_VIOLATION
(c4)
This patch was deployed
immediately, and there
have been no more blue
screens since. So I resolved
in 2 hours what could
have turned out to be a
24 to 48 hour process going
through Microsoft Premier
Support. The customer was
chuffed, and so was I.
If you want to play around
and learn how to do this,
you can get a program called
"NotMyFault", which was
developed by Mark Russinovich,
formerly of Winternals. It can
deliberately create some
serious issues for you
to practice on.

Get it from here: http://swatrant.blogspot.com/2005/12/notmyfault-fault-maker.html
I hope this info is not
only helpful, but gives
you confidence to tackle
these issues yourself.