Beware of using stack-based COM objects from .NET

There are all sorts of nasty things to be aware of if you’re mixing reference-counted COM objects with garbage-collected .NET. For instance, if you’re implementing COM objects in C++, you’re free to allocate them anywhere you like: on the heap, or perhaps on the stack if you know they’re only used within some specific scope.

But what happens if, during the lifetime of that stack-based COM object, it gets used from .NET? A runtime callable wrapper (RCW) will be created around the object, and the RCW expects to keep the underlying object alive by incrementing its reference count. The stack-based object will soon go out of scope, though, and it will be destroyed regardless of its reference count, leaving the RCW holding a pointer into the stack that is no longer valid. When the RCW is eventually cleaned up, the CLR will call Release through this pointer, treating whatever garbage now occupies that stack memory as the object and its vtable, and you’ll get something nasty like an access violation or an illegal instruction exception.
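
To make the lifetime problem concrete, here’s a minimal sketch in portable C++ (no COM or Windows headers) of an object whose reference count is still nonzero when it dies. Counted and Wrapper are hypothetical stand-ins for MyClass and the RCW, not real COM or CLR types; the point is only that a reference count can’t extend the lifetime of an object the compiler destroys at the end of its scope.

```cpp
#include <cassert>

// Hypothetical stand-in for a reference-counted COM object such as MyClass.
// Release() never deletes: the object's lifetime is its enclosing scope.
struct Counted {
    long refs = 0;
    long* refsAtDestruction;  // where to record the count when destroyed

    explicit Counted(long* sink) : refsAtDestruction(sink) {}
    ~Counted() { *refsAtDestruction = refs; }
    long AddRef() { return ++refs; }
    long Release() { return --refs; }
};

// Hypothetical stand-in for an RCW: it takes a reference and assumes that
// reference keeps the target alive.
struct Wrapper {
    Counted* target = nullptr;
    void Attach(Counted* p) { p->AddRef(); target = p; }
};

// Returns the reference count the stack object still had when its destructor
// ran. After this returns, w.target dangles; calling w.target->Release() would
// be undefined behaviour, which is effectively what the CLR does at cleanup.
long RefsWhenStackObjectDies(Wrapper& w) {
    long refsAtDeath = -1;
    {
        Counted obj(&refsAtDeath);  // lives only inside this block
        w.Attach(&obj);             // the wrapper's reference: count is 1
    }                               // destroyed here, whatever the count
    return refsAtDeath;
}
```

RefsWhenStackObjectDies returns 1: the object was destroyed while the wrapper still believed it held a reference.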

It’s fairly easy to reproduce this, and it’s useful to see where the CLR blows up and how we can identify a stack-based object as the cause.

Let’s start by creating a simple pseudo-COM object that implements just the bare minimum to be usable:

class MyClass : public IUnknown
{
public:
	MyClass():l(0) {}
	STDMETHOD_(ULONG, AddRef)() { return InterlockedIncrement(&l); }
	STDMETHOD_(ULONG, Release)() { return InterlockedDecrement(&l); } // never deletes: the object lives on the stack
	STDMETHOD(QueryInterface)(REFIID iid, void ** ppvObject)
	{
		if (iid == IID_IUnknown)
		{
			*ppvObject = this;
			AddRef();
			return S_OK;
		}
		*ppvObject = NULL;
		return E_NOINTERFACE;
	}
	long l;
};

We’ll also need a COM-visible .NET object that will use our object. It doesn’t actually need to be COM-visible, but in my opinion that’s the easiest way to access it from C++.

I’ve written the .NET object in F#. It’s a trivial class that has a single interface, with a single method that takes the object we pass to it and prints its type. This is enough for the RCW to be created.

open System
open System.Runtime.InteropServices

module Module1 =
    type public IConsumer = 
        abstract member UseObject : o:obj -> unit
    type public Consumer() =
        interface IConsumer with
            member this.UseObject (o:obj) =
                Console.WriteLine (sprintf "%A" (o.GetType()))

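We can compile this into a DLL, then run regasm with the /tlb switch to register the types and generate a type library (TLB):

fsc --target:library -o:obj\Debug\testStackObjectsFs.dll Module1.fs
regasm /tlb:testStackObjectsFs.tlb obj\Debug\testStackObjectsFs.dll

Without /codebase (or installing the assembly into the GAC), the CLR will only be able to load the assembly at activation time if the DLL sits alongside the test executable.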

That can be #imported back into our test harness:

#import "testStackObjectsFs.tlb"

Now we’re ready to put together some code that creates an instance of our object on the stack and passes it to our .NET component:

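void Foo()
{
	// Create an instance of our "COM object" on the stack
	MyClass obj;
	// Create a managed object
	testStackObjectsFs::IConsumerPtr mgd(__uuidof(testStackObjectsFs::Consumer));

	// and pass our COM object to it (the obj parameter is exposed to COM as a VARIANT)
	mgd->UseObject(_variant_t(static_cast<IUnknown*>(&obj)));
}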

int _tmain(int argc, _TCHAR* argv[])
{
	// Initialise the COM runtime; for our purposes it doesn't
	// matter which threading model we use
	CoInitialize(NULL);

	// Call a separate function, to ensure stack-based objects
	// are out-of-scope on return.
	Foo();

	// Wait for some input
	_getch();
	return 0;
}

Now, if you run this from within Visual Studio and you’re vigilant (and you haven’t set your debugger to stop on access violations), you’ll notice this in the output window after the return statement executes:

The thread 'Win32 Thread' (0x15b0) has exited with code 11001 (0x2af9).
The thread 'Win32 Thread' (0x1110) has exited with code 0 (0x0).
First-chance exception at 0x00850a2b in testStackObjects.exe: 0xC0000005: Access violation reading location 0x00850a2b.
The thread 'DebuggerRCThread::ThreadProcStatic' (0x1534) has exited with code 0 (0x0).
The thread 'RPC Callback Thread' (0x12b8) has exited with code 0 (0x0).

Let’s fire up WinDbg, attach to the process (that _getch comes in useful here) and find out what’s going on in more detail.

If we let the app run to the point of failure in WinDbg, we can see that the CLR was in the act of shutting down when it caused the exception:

0:002> kp
ChildEBP RetAddr
WARNING: Frame IP not in any known module. Following frames may be wrong.
00dae3fc 79f4c1b5 0xe06ff8
00dae450 79f4c26c mscorwks!ReleaseTransitionHelper+0x5f
00dae494 79f4c2d0 mscorwks!SafeReleaseHelper+0x8c
00dae4c8 79faaa01 mscorwks!SafeRelease+0x2f
00dae4fc 79faa7c8 mscorwks!IUnkEntry::Free+0x68
00dae510 79faa91d mscorwks!RCW::ReleaseAllInterfaces+0x18
00dae540 79faa949 mscorwks!RCW::ReleaseAllInterfacesCallBack+0xbd
00dae570 7a0792ac mscorwks!RCW::Cleanup+0x22
00dae57c 7a079714 mscorwks!RCWCleanupList::ReleaseRCWListRaw+0x16
00dae5ac 7a0797df mscorwks!RCWCleanupList::ReleaseRCWListInCorrectCtx+0xdf
00dae5fc 79fdc140 mscorwks!RCWCleanupList::CleanupAllWrappers+0x77
00dafe90 79fdc7aa mscorwks!RCWCache::ReleaseWrappersWorker+0x103
00dafed8 79fd9f95 mscorwks!ReleaseRCWsInCaches+0x27
00dafee0 79f3c76a mscorwks!InnerCoEEShutDownCOM+0x1e
00daff14 79f92015 mscorwks!WKS::GCHeap::FinalizerThreadStart+0x1fc
00daffb4 7c80b683 mscorwks!Thread::intermediateThreadProc+0x49
00daffec 00000000 kernel32!BaseThreadStart+0x37

Essentially it’s cleaning up the currently unused RCWs – including our malformed one – and as part of doing this, it’s calling Release on the underlying COM object, via the mscorwks!SafeRelease function. SafeRelease wraps the call to potentially (and definitely, in this case) dangerous unmanaged code with various exception handlers, enabling it to silently handle access violations.

If we run the app again, and this time break in while it’s waiting for the keypress (before it attempts to clean up the RCWs), we can examine the wrapper ourselves, using the approach I set out in an earlier post.

List all of the untyped COM object wrappers:

0:002> !dumpheap -type System.__ComObject
 Address       MT     Size
01418628 79306e60       16     
total 1 objects
      MT    Count    TotalSize Class Name
79306e60        1           16 System.__ComObject
Total 1 objects

Use the address of the object to obtain its object header:

0:002> dd 1418628-4 L1
01418624 08000002
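
The header value 08000002 packs a flag and an index together. As a minimal sketch, assuming the bit layout defined in the CLR’s syncblk.h as I understand it (bit 0x08000000 means the low bits hold a hash code or sync block index, bit 0x04000000 means they hold a hash code, and the index occupies the low 26 bits), decoding it looks like this. The function name is hypothetical; the constant names are borrowed from syncblk.h.

```cpp
#include <cassert>
#include <cstdint>

// Bit layout of the object header as I understand it from the CLR's syncblk.h
// (assumption: the 32-bit CLR 2.0 layout seen in this dump).
const std::uint32_t BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX = 0x08000000;
const std::uint32_t BIT_SBLK_IS_HASHCODE             = 0x04000000;
const std::uint32_t MASK_SYNCBLOCKINDEX              = (1u << 26) - 1;  // 0x03FFFFFF

// Returns the sync block index held in an object header. In this sketch,
// 0 means "no index": either the flag is clear or the bits hold a hash code.
std::uint32_t SyncBlockIndexFromHeader(std::uint32_t header) {
    if ((header & BIT_SBLK_IS_HASH_OR_SYNCBLKINDEX) == 0) return 0;
    if (header & BIT_SBLK_IS_HASHCODE) return 0;
    return header & MASK_SYNCBLOCKINDEX;
}
```

For 08000002 this yields 2, the index that gets passed to !syncblk.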

Use the sync block index from the header (the low bits, 2 in this case) to get the sync block:

0:002> !syncblk 2
Index SyncBlock MonitorHeld Recursion Owning Thread Info  SyncBlock Owner
    2 001e4d9c            0         0 00000000     none    01418628 System.__ComObject
Total           2
CCW             0
RCW             0
ComClassFactory 0
Free            0

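Get the address of the RCW from the sync block. This takes two hops, through offset 0x1c of the sync block and then offset 0xc of the structure it points to; these are internal offsets for this 32-bit build of the CLR and may well differ in other versions: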

0:008> dd 001e4d9c+1c L1
001e4db8 001e7dc8
0:008> dd 001e7dc8+c L1
001e7dd4 001de828

And dump out the relevant parts of the RCW: the object’s vtable at offset 0x88, and its IUnknown pointer at offset 0x64:

0:008> dds 001de828+88 L1
001de8b0 0041ac78 testStackObjects!MyClass::`vftable'
0:008> dds 001de828+64 L1
001de88c 0012fe7c
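
The two hops and the reads from the RCW are just pointer arithmetic plus dereferences, the same thing WinDbg’s poi() operator does. Here’s a sketch that replays the walk over a snapshot of the values printed above. Snapshot, poi (named after the WinDbg operator) and WalkFromSyncBlock are hypothetical helpers, and the offsets are simply the ones used in this session, not a documented layout.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>

// A snapshot of 32-bit values read with dd/dds, keyed by address.
// Only the words the walk needs have to be recorded.
using Snapshot = std::map<std::uint32_t, std::uint32_t>;

// Like WinDbg's poi(): read the pointer-sized value stored at addr.
std::uint32_t poi(const Snapshot& mem, std::uint32_t addr) {
    auto it = mem.find(addr);
    if (it == mem.end()) throw std::out_of_range("address not in snapshot");
    return it->second;
}

struct RcwFields { std::uint32_t rcw, vtable, iunknown; };

// Walk from the sync block to the RCW, then read the two fields of interest.
// Offsets 0x1c, 0xc, 0x88 and 0x64 are those used in the session above and
// are specific to this 32-bit mscorwks build.
RcwFields WalkFromSyncBlock(const Snapshot& mem, std::uint32_t syncBlock) {
    std::uint32_t hop1 = poi(mem, syncBlock + 0x1c);  // structure hanging off the sync block
    std::uint32_t rcw = poi(mem, hop1 + 0xc);         // the RCW itself
    return { rcw, poi(mem, rcw + 0x88), poi(mem, rcw + 0x64) };
}
```

Seeded with the four words printed above and starting from the sync block at 001e4d9c, the walk yields the RCW at 001de828, the MyClass vtable at 0041ac78 and the stack address 0012fe7c.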

We can use !address to do a quick sanity check on the pointer and confirm what we know to be the case, that it’s stack memory:

0:008> !address 0012fe7c
    00030000 : 00124000 - 0000c000
                    Type     00020000 MEM_PRIVATE
                    Protect  00000004 PAGE_READWRITE
                    State    00001000 MEM_COMMIT
                    Usage    RegionUsageStack
                    Pid.Tid  490.13dc
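
Reading the first line of that output as allocation base, region base and region size (my reading of this older !address format), the stack region runs from 00124000 up to, but not including, 00124000 + c000 = 00130000. The IUnknown pointer 0012fe7c falls inside it, 0x184 bytes below the top. The check itself is a one-liner; InRegion is a hypothetical helper:

```cpp
#include <cassert>
#include <cstdint>

// True if addr lies in [base, base + size), the half-open range that
// !address describes as "base - size". Written as addr - base < size so the
// upper bound can't overflow.
bool InRegion(std::uint32_t addr, std::uint32_t base, std::uint32_t size) {
    return addr >= base && addr - base < size;
}
```
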

If we let the app run on to the point of failure, we can clearly see the address of the object being passed as an argument to mscorwks!IUnkEntry::Free.

So the moral of the story is: don’t pretend some arbitrary piece of stack memory is a real, reference-counted COM object. You may be saving the cost of a heap allocation, but even if your app works OK today, it may not tomorrow, when someone introduces a piece of .NET code somewhere in your object graph.
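
For comparison, here’s a minimal portable sketch of the conventional approach: allocate the object on the heap and have the final Release() delete it, so that whoever holds the last reference, an RCW included, decides when it dies. HeapCounted is a hypothetical class, std::atomic stands in for InterlockedIncrement/InterlockedDecrement, and the live-instance counter exists only so the behaviour can be checked.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical heap-only, self-deleting reference-counted object.
class HeapCounted {
public:
    // The only way to create one; the caller starts out owning one reference.
    static HeapCounted* Create(int* liveCount) { return new HeapCounted(liveCount); }

    unsigned long AddRef() { return ++refs_; }
    unsigned long Release() {
        unsigned long remaining = --refs_;
        if (remaining == 0) delete this;  // the last reference frees the object
        return remaining;
    }

private:
    explicit HeapCounted(int* live) : refs_(1), live_(live) { ++*live_; }
    ~HeapCounted() { --*live_; }  // private: no stack instances, no direct delete

    std::atomic<unsigned long> refs_;
    int* live_;  // test scaffolding: counts live instances
};
```

The creator calls Release() when it’s finished with the object; if a wrapper has taken its own reference, the object survives until that reference is released too.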

Bonus Extra Content

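As a bonus tip, here are a couple of WinDbg breakpoints that can be used to dump each RCW as it’s created and destroyed. The addresses are specific to the build of mscorwks.dll used here, so you’ll need to find the equivalent locations in your own version of the CLR.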

bu 79faa974 "dds @ecx L23; g"
bu 79faa538 "dd @esp+20 L1; dds poi(@esp+20)+88 L1; g"