Tag Archives: WinDbg

Examining PDB files with DBH

Wow, it’s been a ridiculously long time since I’ve blogged. I think it’s time I put something up just to break the curse, and this seemed like a good, and hopefully useful, place to start. Time to polish some of these dusty drafts into published gems.

Ever been in that situation where you (or someone else) finds that Visual Studio just won’t set a breakpoint in some source code that you’re sure should be being used? You’ll see the hollow breakpoint icon and something like ‘The breakpoint will not currently be hit. No symbols have been loaded for this document’.
Continue reading Examining PDB files with DBH

.NET 4.0 Type Equivalence causes BadImageFormatException

I recently discovered a nasty backward compatibility problem with the new type equivalence feature in .NET 4.0. Luckily it’s relatively difficult to hit it if you’re in a pure-C# environment, but if you happen to generate any assemblies directly using IL, you should watch out. Read on for all the gory details.
Continue reading .NET 4.0 Type Equivalence causes BadImageFormatException

Modifying the VC runtime to get better heap allocation stack traces

Today I was trying to track down some – how can I put this politely – “unusual” memory usage in some unmanaged code running inside Excel. I broke out WinDbg and tried the usual suspects to get an idea of how memory was being used. Unfortunately, the way that msvcr80.dll is built stopped me from getting decent stack traces for the allocations, so I decided to try and rebuild it with a fix to remedy the situation.
Continue reading Modifying the VC runtime to get better heap allocation stack traces

Beware of using stack-based COM objects from .NET

There are all sorts of nasty things to be aware of if you’re mixing reference-counted COM objects with garbage-collected .NET. For instance, if you’re implementing COM objects in C++ then you’re free to allocate them anywhere you like; on the heap or perhaps on the stack if you know they’re only used in some specific scope.

But what happens if during the lifetime of that stack based COM object, it gets used from .NET? A runtime callable wrapper (RCW) will be created around the object. And this RCW expects to be able to keep the underlying object alive by incrementing its reference count. Of course, the stack-based object will soon go out of scope, and regardless of its reference count the object will be destroyed and the pointer that the RCW contains will no longer be valid. It points into the stack, so when the RCW gets cleaned-up, the CLR will call via this pointer into memory that contains garbage and you’ll get something nasty like an access violation or illegal instruction exception.

Continue reading Beware of using stack-based COM objects from .NET

Getting IUnknown from __ComObject

I’m working in an environment with a lot of mixed managed (F#) and unmanaged (C++ COM) code. One of the big problems with this is the mix of lifetime management techniques; .NET uses garbage collection while COM relies on reference counting. Furthermore .NET garbage collection is somewhat non-deterministic, which adds further complexity.

So quite often in our mixed code-base, we find that the .NET garbage collection process doesn’t kick in when we need it to. For instance, when we’ve allocated a lot of memory in the COM code that .NET isn’t aware of. Memory exhaustion has to get pretty bad for the GC to occur at any other time than during a .NET allocation, either the system-wide low-memory event has to be signalled or an OutOfMemoryException needs to be thrown. In both of these cases it’s probably too late to do anything about it.

In this case it’s extremely useful to be able to see what .NET objects are still alive, and what COM objects they’re hanging on to. Unfortunately this isn’t as easy as it might seem.
Continue reading Getting IUnknown from __ComObject

Finding the largest free block of address space

BuildingsI’ve been seeing problems recently with fragmented virtual address space. During the lifetime of a process, bits and pieces of memory are allocated throughout the 2GB 32-bit address space to such an extent that large contiguous blocks of free space are no longer available. If anything subsequently requires a large block of memory (like, for instance, a somewhat out-of-date version of the GHC runtime), it will fail to get it.

It’s obvious looking at the output from VMmap or windbg’s !address command what the largest contiguous block is, e.g.

0:008> !address -summary
Largest free region: Base 07300000 - Size 63ed0000 (1637184 KB)

But what if you need that number in order to make a decision at run-time? For instance, to decide whether your process is in a fit state to continue, or if it should instead commit hara-kiri. In that case, you need to access the information programmatically. That’s where the immensely useful VirtualQueryEx function comes in…
Continue reading Finding the largest free block of address space

Diagnosing out of memory errors with VMMap

The other day a colleague pointed me to a new tool from Mark Russinovich et al (ex-SysInternals) that turns out to be very useful for diagnosing virtual memory/address space exhaustion issues. I thought I’d describe it here, and give some – hopefully useful – extra information on what it reports.

(I had problems with WordPress choking on this long post, so I’ve split it into 2 parts. This the first part, the second part is here).

Continue reading Diagnosing out of memory errors with VMMap

WinDbg !locks command broken

It seems that the extremely useful !locks command is broken in, the current and previous release of WinDbg from the debugging tools for Windows.

You’ll get errors like:

0:007> !locks
NTSDEXTS: Unable to resolve ntdll!RTL_CRITICAL_SECTION_DEBUG type
NTSDEXTS: Please check your symbols

The suggested solution seems to be to roll-back to version, available from here, or you can just replace the version of ntsdexts.dll in the c:\program files\debugging tools for windows (x86)\winxp directory with the one from the earlier release.

Judging by the error message, I’m guessing that the new version may work if you happen to be using a debug (checked) build of the Windows kernel, but I don’t have one around to try it with.

Troubleshooting an InvalidProgramException

One of the developers who uses the .NET/COM interop DLL that I wrote recently informed me that when they used it in a certain way, they would get a fatal System.InvalidProgramException. Nasty. So I set about seeing how I could find out exactly why this was happening: here are the gory details…

To spoil the ending somewhat, we already knew the precise situation that was causing the problem. This is different to the average issue, where most of the time’s spent in isolating the conditions that cause it. In this case we had an excellent positive and negative repro case.

The users were attempting to use the legacy .local approach to changing DLL probing. This was originally implemented around the time of Windows 2000, before the current side-by-side system existed, and basically short-circuited the normal DLL search order to inject the current directory as the first potential location. See the DLL redirection MSDN page for more information.

There was already a version of the assembly in question installed in the GAC, they just wanted to use the .local file to override it. The trouble was, with certain builds of the DLL (we’ll call it XYZ.dll) in the directory along with the redirection file (appname.exe.local), an fatal InvalidProgramException would be generated by the .NET execution engine. With other builds of the same DLL it worked. Mysterious.

So the first thing to do was to try and find out exactly where the exception was occurring. I ramped up WinDbg, and took a look at what exceptions were being thrown:

(42c.1e90): CLR notification exception - code e0444143 (first chance)
(42c.1e90): Unknown exception - code 02345678 (first chance)
(42c.1e90): C++ EH exception - code e06d7363 (first chance)
(42c.1e90): C++ EH exception - code e06d7363 (first chance)
(42c.1e90): C++ EH exception - code e06d7363 (first chance)
(42c.1e90): CLR exception - code e0434f4d (first chance)

The last one generated a second chance exception and the exit. I changed the exception filter set to stop on .NET notifications – which are generated when assemblies are loaded – and for C++ exceptions:

0:000> sxe clrn
0:000> sxe eh

After restarting and running on from the CLR notification (generated whenever an assembly is loaded) I got a stack trace on the C++ exception that was being generated within the CLR:

0:000> kP
ChildEBP RetAddr
0012eea4 78158e89 KERNEL32!RaiseException+0x53
0012eedc 79fce58e MSVCR80!_CxxThrowException+0x46
0012ef0c 7a015309 mscorwks!RealCOMPlusThrow+0xd8
0012f2ac 79f0e795 mscorwks!UnsafeJitFunction+0x365
0012f350 79e87f52 mscorwks!MethodDesc::MakeJitWorker+0x1c1
0012f3a8 79e8809e mscorwks!MethodDesc::DoPrestub+0x486
0012f3f8 00341f3e mscorwks!PreStubWorker+0xeb

Hmmmm, interesting, so it looks like the JIT compiler is failing. I also noticed that there was a strange exception that I didn’t recognise:

(42c.1e90): Unknown exception - code 02345678 (first chance)

You’ve got to love those “made up” numbers. I added an exception filter for it and restarted again (good job this failure was quick to manifest).

Now I got an even more precise fault location:

0:000> kP
ChildEBP RetAddr
0012ec44 790af56c KERNEL32!RaiseException+0x53
0012ec60 7909deea mscorjit!getJit+0x38
0012ec7c 79066362 mscorjit!Compiler::fgFindJumpTargets+0x33d
0012ece0 790614c6 mscorjit!Compiler::fgFindBasicBlocks+0x4d
0012ed20 79061236 mscorjit!Compiler::compCompile+0x2bf
0012edb4 7906118c mscorjit!jitNativeCode+0xb8
0012edec 79f0f9cf mscorjit!CILJit::compileMethod+0x3d
0012ee58 79f0f945 mscorwks!invokeCompileMethodHelper+0x72
0012ee9c 79f0f8da mscorwks!invokeCompileMethod+0x31
0012eef4 79f0ea33 mscorwks!CallCompileMethodWithSEHWrapper+0x84
0012f2ac 79f0e795 mscorwks!UnsafeJitFunction+0x230
0012f350 79e87f52 mscorwks!MethodDesc::MakeJitWorker+0x1c1
0012f3a8 79e8809e mscorwks!MethodDesc::DoPrestub+0x486
0012f3f8 00341f3e mscorwks!PreStubWorker+0xeb

So now its looking like something’s going wrong deep down in the JITter. After stepping through the disassembly for a while I decided to have a look at precisely what method was being compiled at this point. You can use the DumpMD command from the sos extension to examine the method, which is the first parameter to UnsafeJitFunction (you can see parameters in the stack trace by using kb):

0:000> !DumpMD 0161ad08
Method Name: XXX()
Class: 01670a40
MethodTable: 0161ad18
mdToken: 0601626b
Module: 00d3323c
IsJitted: no
m_CodeOrIL: ffffffff

Now this is suspicious; the method as listed (XXX – name changed to protect the innocent) isn’t what we were expecting. In fact, it’s completely different. Let’s see what IL is being JITted:

0:000> !DumpIL 161ad08
error decoding IL

OK, so this is completely broken metadata.

Now, given that we were using the .local, we would have expected to see the DLL being loaded from the same directory as the application, so lets take a look and see what we’ve got loaded:

0:000> !DumpDomain
System Domain: 7a3bc8b8
Assembly: 001abe28 [C:\WINNT\assembly\GAC_MSIL\XYZ.DLL]
ClassLoader: 001aa0b0
SecurityDescriptor: 001aa018
Module Name
00d3323c C:\WINNT\assembly\GAC_MSIL\XYZ.DLL

So this looks like the culprit; we’re still loading the version from the GAC, even though there’s a .local file present. Just to verify this I set a breakpoint on CreateFileW to show me which DLLs were being loaded and by what:

0:000> bu KERNEL32!CreateFileW "dpu (@esp+4) L1"

This tells the debugger to display the unicode string that is the first parameter to CreateFileW whenever it’s hit.
There are obviously lots of DLLs loaded, even in this simple case, but eventually it led me to the point where the GAC version was being loaded:

0012cf44 001ac5a0 "C:\WINNT\assembly\GAC_MSIL\XYZ.dll"
eax=00000000 ebx=00000001 ecx=79e7f683 edx=7c90eb94 esi=001ac4e0 edi=7c80ac0f
eip=7c810760 esp=0012cf40 ebp=0012cf64 iopl=0 nv up ei pl nz na po nc
cs=001b ss=0023 ds=0023 es=0023 fs=003b gs=0000 efl=00000202
7c810760 8bff mov edi,edi

0:000> kp L30
ChildEBP RetAddr
0012cf3c 79e90f12 KERNEL32!CreateFileW
0012cf64 79f29306 mscorwks!WszCreateFile+0x72
0012cfac 79ed27ac mscorwks!PEImage::GetFileHandle+0x60
0012cfdc 79e981ae mscorwks!PEImage::GetLayoutInternal+0xcf
0012d020 79e98e69 mscorwks!PEImage::GetLayout+0x8a
0012d0bc 79e98934 mscorwks!RuntimeOpenImageInternal+0x103
0012d104 79e988a9 mscorwks!GetAssemblyMDInternalImportEx+0x9d
0012d11c 79ec314d mscorwks!CreateMetaDataImport+0x16
0012d13c 79ec30ee mscorwks!CAssemblyManifestImport::Init+0x35
0012d164 79ed39ca mscorwks!CreateAssemblyManifestImport+0x53
0012d180 79ed3943 mscorwks!CreateAssemblyFromManifestFile+0x48
0012d3e8 79ed380f mscorwks!CheckExistsInGAC+0x179
0012d418 79ed320a mscorwks!CreateAssemblyFromCacheLookup+0x9b

So then I waited until the exception was fired, and used !dlls to show me what other DLLs were loaded:

0:000> !dlls
0x00243a68: C:\Temp\interopx\XYZ.dll
Base 0x036a0000 EntryPoint 0x00000000 Size 0x0095c000
Flags 0x90404000 LoadCount 0x00000001 TlsIndex 0x00000000

Ouch. We’ve loaded some metadata from the c:\winnt\assembly\GAC_MSIL\XYZ.dll version, and then we’ve actually loaded the c:\temp\interopx\XYZ.dll module. It turned out that in some cases the metadata matched, and we got away with it, and at other times (for different builds) it didn’t and we hit the InvalidProgramException.

So, the moral of the story is, don’t mix .local and the .NET assembly loader. And if you do, prepare to break out your WinDbg skills.