I recently discovered a nasty backward compatibility problem with the new type equivalence feature in .NET 4.0. Luckily it’s relatively difficult to hit it if you’re in a pure-C# environment, but if you happen to generate any assemblies directly using IL, you should watch out. Read on for all the gory details.
Continue reading .NET 4.0 Type Equivalence causes BadImageFormatException
Category Archives: COM
Beware of using stack-based COM objects from .NET
There are all sorts of nasty things to be aware of if you’re mixing reference-counted COM objects with garbage-collected .NET. For instance, if you’re implementing COM objects in C++ then you’re free to allocate them anywhere you like; on the heap or perhaps on the stack if you know they’re only used in some specific scope.
But what happens if during the lifetime of that stack based COM object, it gets used from .NET? A runtime callable wrapper (RCW) will be created around the object. And this RCW expects to be able to keep the underlying object alive by incrementing its reference count. Of course, the stack-based object will soon go out of scope, and regardless of its reference count the object will be destroyed and the pointer that the RCW contains will no longer be valid. It points into the stack, so when the RCW gets cleaned-up, the CLR will call via this pointer into memory that contains garbage and you’ll get something nasty like an access violation or illegal instruction exception.
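To make the failure mode concrete, here is a minimal, portable sketch. It uses neither real COM nor the CLR: FakeUnknown, StackObject, FakeWrapper and refsLeftWhenScopeEnds are hypothetical names invented purely for illustration, with FakeWrapper standing in for the RCW. It shows only that a non-zero reference count does nothing to stop a stack object being destroyed at the end of its scope.

```cpp
#include <cassert>

// Hypothetical stand-in for IUnknown's reference counting (not the real COM interface).
struct FakeUnknown {
    virtual unsigned long AddRef() = 0;
    virtual unsigned long Release() = 0;
    virtual ~FakeUnknown() {}
};

// Records the reference count observed by the most recent StackObject destructor.
static unsigned long g_refsAtDestruction = 0;

class StackObject : public FakeUnknown {
public:
    StackObject() : refs_(1) {}
    ~StackObject() { g_refsAtDestruction = refs_; }
    unsigned long AddRef() override { return ++refs_; }
    // A stack-allocated object cannot "delete this" at zero, so Release only counts.
    unsigned long Release() override { assert(refs_ > 0); return --refs_; }
private:
    unsigned long refs_;
};

// Plays the part of the RCW: takes a reference and holds on to the raw pointer.
struct FakeWrapper {
    FakeUnknown* held = nullptr;
    void Wrap(FakeUnknown* p) { p->AddRef(); held = p; }
};

// Returns the reference count that was still outstanding when the stack
// object went out of scope while the wrapper was holding it.
unsigned long refsLeftWhenScopeEnds(FakeWrapper& wrapper) {
    {
        StackObject obj;      // lives on the stack
        wrapper.Wrap(&obj);   // the wrapper believes AddRef keeps obj alive
    }                         // obj is destroyed here, whatever its count says
    // wrapper.held now dangles; calling through it would be undefined behaviour.
    return g_refsAtDestruction;
}
```

Calling refsLeftWhenScopeEnds with a fresh FakeWrapper reports a count of 2 (the initial reference plus the wrapper’s), yet the object is already gone. In the real scenario it is the CLR’s later Release through that stale pointer that blows up.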
Getting IUnknown from __ComObject
I’m working in an environment with a lot of mixed managed (F#) and unmanaged (C++ COM) code. One of the big problems with this is the mix of lifetime management techniques; .NET uses garbage collection while COM relies on reference counting. Furthermore .NET garbage collection is somewhat non-deterministic, which adds further complexity.
So quite often in our mixed code-base, we find that the .NET garbage collection process doesn’t kick in when we need it to. For instance, when we’ve allocated a lot of memory in the COM code that .NET isn’t aware of. Memory exhaustion has to get pretty bad for a GC to occur at any time other than during a .NET allocation: either the system-wide low-memory event has to be signalled or an OutOfMemoryException needs to be thrown. In both of these cases it’s probably too late to do anything about it.
In this case it’s extremely useful to be able to see what .NET objects are still alive, and what COM objects they’re hanging on to. Unfortunately this isn’t as easy as it might seem.
Getting .NET type information in the unmanaged world
One of the tools that I write and maintain displays type information for COM objects hidden behind “handles” in Excel spreadsheets. The underlying objects can either support an interface that allows them to be richly rendered to XML, or the viewer will fall-back to using metadata and displaying the supported interfaces and their properties and methods. It will also invoke parameterless property getters – making the assumption that doing so won’t change the state of the object – and display the returned value. This is a useful way of getting some visibility on otherwise completely opaque values.
In order to obtain the type information about the COM objects, the tool uses type libraries, and the associated ITypeLib and ITypeInfo interfaces, which, with a little effort, can be used to iterate over all the coclasses, interfaces and functions in the library. But the difficulty lies in obtaining a type library when all you’re given is an already-instantiated object. In theory, COM allows you to know no more about an object than what interfaces it supports. But in practice, there are a variety of ways you can circumvent this and get to the type information.
For unmanaged COM objects you can use the information in the registry (or SxS configuration) to find either the server (DLL) that contains a TLB embedded as a resource, or the type library filename itself. I won’t go into that now; there’s plenty of information about the location of these common registry keys elsewhere on the internet.
But for managed COM objects – well, COM callable wrappers (CCWs) – you have a different problem: registry scraping will never work and there may not even be an associated type library. The InprocServer32 registry entry always points to mscoree.dll, which obviously doesn’t have an embedded type library, and unless you’ve registered the assembly with /tlb (which is a pain) then you won’t have entries under HKEY_CLASSES_ROOT\Typelib and a TLB file to load.
So, if you’re in the unmanaged world, and all you’ve got is a pointer to a live CCW, what can you do?
Well, the easiest thing is to use IProvideClassInfo. This is supported by all CCWs, and provides a way to get an auto-generated (by the CLR) ITypeInfo implementation for the managed class. In fact, this is what I actually used to implement the solution eventually, but along the way I discovered some other interesting aspects of the CCW.
There is another interface that it supports: _Object, the unmanaged version of System.Object, which supports basic .NET functionality such as ToString and GetType. I couldn’t find it declared anywhere in the Platform or .NET SDK headers, so I put together a version that I could use from C++:
struct __declspec(uuid("{65074F7F-63C0-304E-AF0A-D51741CB4A8D}")) Object : public IDispatch
{
public:
    // We don't actually call these methods, doing so seems to return
    // COR_E_INVALIDOPERATION. Instead we just use the IDispatch::Invoke
    // and use the DISPID of the methods.
    virtual HRESULT STDMETHODCALLTYPE ToString(BSTR *) = 0;
    virtual HRESULT STDMETHODCALLTYPE Equals(VARIANT, VARIANT_BOOL *) = 0;
    virtual HRESULT STDMETHODCALLTYPE GetHashCode(long *) = 0;
    virtual HRESULT STDMETHODCALLTYPE GetType(mscorlib::_Type **) = 0;
};
Despite the presence of the virtual functions in this “interface”, we’re not actually going to call them. Instead we’ll call through the IDispatch that it derives from. It may be possible to use them directly, but see the comment describing what happened when I tried it. Calling via IDispatch may seem slightly odd, because the object itself claims not to support it (QueryInterface returns E_NOINTERFACE).
The methods on the _Object interface have well-known DISPIDs:
ToString | 0x00000000 |
Equals | 0x60020001 |
GetHashCode | 0x60020002 |
GetType | 0x60020003 |
So we can use that to invoke the GetType method:
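// No arguments, named or otherwise; initialise every member so that
// rgvarg and rgdispidNamedArgs don't contain garbage.
DISPPARAMS parms = { NULL, NULL, 0, 0 };
_variant_t vType;
hr = pObject->Invoke(0x60020003, IID_NULL, 0, DISPATCH_METHOD, &parms, &vType, NULL, NULL);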
And we get back a _Type interface that allows us to navigate around the type information in the same way as we can with System.Type! Just #import mscorlib.tlb and you get all the interfaces you need to e.g. iterate over all the interfaces implemented by a type, and invoke a function on them:
#import <mscorlib.tlb> rename("ReportEvent","xReportEvent")
…
mscorlib::_TypePtr t(V_UNKNOWN(&vType));
CComSafeArray<LPUNKNOWN> saInterfaces(t->GetInterfaces());
…
mscorlib::_TypePtr tInterface((LPUNKNOWN)saInterfaces.GetAt(n));
…
result = tInterface->InvokeMember(_bstr_t("Function"),
(mscorlib::BindingFlags)
(mscorlib::BindingFlags_GetProperty +
mscorlib::BindingFlags_InvokeMethod +
mscorlib::BindingFlags_Public +
mscorlib::BindingFlags_Instance +
mscorlib::BindingFlags_IgnoreCase),
NULL, _variant_t(punk), NULL, NULL, NULL, NULL);
So this turns out to be quite nice: you can get rich managed type information even if you’re running in the unmanaged world.
Beware cached IDispatch
I’ve kinda given it away there with the title, but we had an interesting set of symptoms exhibited the other day while trying to call a function in an Excel workbook via F#. It appeared that the function being called would fail depending on what had been called previously. Very odd.
A bit of background: as you may know, if you add functions to the worksheet or workbook code in Excel then they appear as callable methods on the objects themselves. This is achieved with the use of dynamic dispatch and IDispatch. For example, creating a workbook with this function in its VBA code:
Public Function Foo() As Double
Foo = 100
End Function
Means you can call it like this:
MsgBox CStr(ThisWorkbook.Foo)
As well as being able to call it like this from within the Excel session (i.e. in other VBA code in the process), you can also access it externally using the COM object model that Office applications expose. For instance, you can use VBScript:
Dim excel, wkb
Set excel = CreateObject("Excel.Application")
Set wkb = excel.Workbooks.Open("a.xls")
WScript.Echo wkb.Foo
Or, more interesting, using F#:
let excel = new Excel.ApplicationClass()
let wkb = excel.Workbooks.Open(@"c:\a.xls")
wkb.GetType().InvokeMember("Foo", BindingFlags.InvokeMethod, null, wkb, null)
The key part here is that we’re using wkb.GetType() to get a managed representation of the unmanaged Excel COM interface. Under the covers this is creating a runtime callable wrapper (RCW) to wrap the worksheet COM object.
However, the problem we were seeing was that opening multiple sheets resulted in failures to call the method in certain situations. Although the VBA signature was exactly the same in all of the sheets, it seemed that opening b.xls after a.xls, would fail; returning null when we expected it to return a value. If we opened c.xls after b.xls, it would fail in a different way; never actually making it to the body of the function. Very odd.
My first suspicion was that it was somehow related to COM object vs .NET object lifetime. This is quite a common problem when invoking Excel from managed code. It’s bad mixing COM and .NET anyway; generally deterministic, reference-counted lifetime semantics don’t play well with the garbage collector. Throwing an app with a full-blown UI being managed as a COM object into the mix just complicates matters further. Anyway, it’s been widely discussed, so I won’t say any more about it here; suffice to say that calling Marshal.ReleaseComObject and GC.Collect got us to the point where we could see the Excel process terminating, so we knew that the failure wasn’t due to some state being cached inside there.
So we concentrated on different aspects of the problem:
- Given the pattern of failures, it seemed that the order of opening the sheets and calling affected the outcome. This hinted that something was persisting between calls, but not on the client (Excel) side.
- The code had previously worked when written in VBScript, so there was nothing intrinsically wrong with the operations we were performing.
This seemed to strongly indicate that something was being cached at the .NET level. And the major difference between the .NET code and the VBScript was that the former used GetType() on the worksheet object to get its managed representation, while the latter used the IDispatch directly.
So it looked like GetType() was caching some information about the particular IDispatch implementation that it encountered first, then reusing that for subsequent worksheet implementations which actually had a slightly different layout, i.e. although they also had the Foo function which we were trying to call, they had a different set of other dynamic functions too.
After a bit of digging about I uncovered this gem: the mapping between interface pointers and runtime callable wrappers, which seemed to describe exactly what we were seeing. The first time through the loop, the runtime is asked to get the type and is given a COM pointer, for which it creates an RCW that we can use to invoke Foo. The second time through the loop, the runtime thinks it’s seen the object before, so rather than perform the expensive operation of creating the RCW again, it just returns the original one. The problem is, the underlying COM object is different!
So, in order to prevent the runtime from trying to cache the RCW, we need to use Marshal.GetUniqueObjectForIUnknown, and that does the trick nicely. We first need to get an IUnknown for our object, then convert that back into an object, which is actually an RCW:
let wkb = Marshal.GetUniqueObjectForIUnknown(Marshal.GetIUnknownForObject(wkb))
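Although it’s less efficient, at least the code works, and it finally allows us to call the dynamic methods on the workbook object from F# .NET. Note that Marshal.GetIUnknownForObject increments the reference count on the pointer it returns, so in production code that pointer should be kept in a variable and handed to Marshal.Release once the new wrapper has been created.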
It will be interesting to see how dynamic in .NET 4.0 addresses this kind of issue.
Static libraries are Evil
In my opinion.
Why? Well, because it’s too easy to use them as an excuse for not defining your shared library interfaces properly.
The reason this is on my mind recently is that several hundred, yes, you heard that right, several hundred DLLs have been released by my group over the last, ooh, 10 years or so. They are all still in use. Each of them has burned into it a copy of the library that deals with interfacing with Excel. That means each of these has its own little internal copy of what was the state-of-the-art at the time it was built. The problem with that is that the state-of-the-art moves on. And how do you go about updating the DLLs that are already in production? You have to re-release them. In an environment where these DLLs are used for marking the profit and loss on a large derivatives trading book, that’s not a small undertaking. And it’s made worse if, say, the DLL in question was last released with a different version of the compiler.
My approach would be to refactor this shared static library (.lib) into a stand-alone DLL.
At this point, people start saying “oh, but then you’ve got a single point of failure, if you release a broken version of that DLL, everything will stop working!”. Not exactly a compelling argument. If the functionality of the DLL is well defined, and there are well known entry points it should be easy to put together a comprehensive black-box test suite. In fact we already do that with all our other DLLs (COM servers). The fact that this shared library *isn’t* a DLL has meant that it’s fallen through the testing cracks; another good reason to refactor it.
The internal interface to the shared library is already relatively well defined. It has a set of header files that define all of the functions and classes that are consumed by others. It’s a relatively small step to compile it as a DLL, rather than a static library. The problem then becomes one of maintenance, dealing with the inevitable changes to the external interface in a backwardly compatible way.
And that’s the problem. It requires some effort. Elsewhere in our codebase we use COM as a magic cure-all for avoiding having to deal with versioning: interface immutability rules. All interfaces are public, no published interface ever changes, object identity is based purely on interfaces supported. If you haven’t got these crutches to rely on, then you have to enforce the rules yourself, which can be both logistically and technically difficult when you’re dealing with C++.
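As a rough idea of what “enforcing the rules yourself” might look like in plain C++, here is a minimal sketch that borrows the COM habits described above: published interfaces are frozen, new functionality arrives in a new interface, and the DLL exposes a single versioned entry point. Every name in it (IBridgeV1, IBridgeV2, BridgeImpl, CreateBridge) is hypothetical and has nothing to do with our actual library, and it assumes caller and DLL agree on vtable layout, which is exactly the kind of thing that becomes difficult across compiler versions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical published interface, version 1: frozen once released.
struct IBridgeV1 {
    virtual int ApiVersion() const = 0;
    virtual double Scale(double value) const = 0;
    virtual void Destroy() = 0;   // the DLL frees what the DLL allocated
protected:
    ~IBridgeV1() {}               // callers must go through Destroy()
};

// New functionality goes into a new interface rather than changing V1.
struct IBridgeV2 : IBridgeV1 {
    virtual double Offset(double value) const = 0;
protected:
    ~IBridgeV2() {}
};

namespace {
// Private implementation; never visible to callers.
class BridgeImpl final : public IBridgeV2 {
public:
    explicit BridgeImpl(int version) : version_(version) {}
    int ApiVersion() const override { return version_; }
    double Scale(double value) const override { return value * 2.0; }
    double Offset(double value) const override { return value + 1.0; }
    void Destroy() override { delete this; }
private:
    int version_;
};
}

// The single entry point (it would be exported from the DLL in a real build).
// Callers state the interface version they were compiled against.
extern "C" IBridgeV1* CreateBridge(std::uint32_t requestedVersion) {
    if (requestedVersion == 1 || requestedVersion == 2)
        return new BridgeImpl(static_cast<int>(requestedVersion));
    return nullptr;   // unknown version: fail cleanly rather than guess
}

// Example: an old client asking for version 1 keeps working unchanged.
void exampleBridgeUsage() {
    IBridgeV1* bridge = CreateBridge(1);
    assert(bridge != nullptr && bridge->ApiVersion() == 1);
    bridge->Destroy();
}
```

A client built against version 2 would request 2 and cast the result to IBridgeV2; a client built years ago keeps asking for 1 and never notices that the DLL behind it has been re-released.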
But it’s not impossible. And I really think it would be better than having hundreds of DLLs all containing subtly different versions of the same code, and being unable to change behaviour across the board without having to build, test and release them all.
Maybe you’ve got a different opinion?
F# – A little gotcha with GuidAttribute
Be careful when using the [<Guid("…")>] attribute on your COM-visible classes in F#. If you mistakenly use the curly-bracket delimited format for the GUID, regasm will silently, yes, silently, fail to add any CLSID entries for your class. That means it will be cocreatable by the prog ID, but not the CLSID. Ouch.
No doubt this will be addressed in the CTP release, due in a couple of weeks.
Getting COM registration data with the activation context API
When moving code from “traditional” registry-based COM to shiny new side-by-side, registration-free COM, there are a few places where you might need an analog for things like looking up a DLL name from a prog ID. E.g. in the registry-based world, you can go from a Prog ID for a class to its physical DLL filename by doing this:
- Get CLSID from the ProgID
- Look up filename in HKCR\CLSID\{clsguid}\InprocServer32
Now obviously, if you attempt this on a machine where the components are only being used via SxS (i.e. have never been registered) the CLSIDFromProgID step will work – OLE32.DLL, which implements this function, is aware of SxS – but the subsequent steps won’t because the information isn’t in the registry.
I guess this is really breaking because we’re taking advantage of our knowledge of COM internals, rather than going via the official APIs. Although as far as I know the “correct” way of doing the progid->(type library) DLL mapping is via IProvideClassInfo, but that relies on the COM objects supporting this interface, and unfortunately we have, err, several – hundred – that don’t.
So, I set about looking to see if there was a way to get this information using the activation context APIs. All the information is there in the manifest, so how do we get it – without something nasty like querying the manifest XML directly?
There are only a handful of functions in the ActCtx API, and one of them – FindActCtxSectionGuid – looks relevant. By calling this with the ACTIVATION_CONTEXT_SECTION_COM_SERVER_REDIRECTION flag, it looks like we can get some data from the manifest based on our COM object CLSID.
Here’s the problem. The returned data from this function is an ACTCTX_SECTION_KEYED_DATA structure, and as far as I can tell this is essentially an opaque blob, with a couple of length indicators. I couldn’t find any documentation about what the lpData member was supposed to point to (if you know any better, please let me know)!
I decided to break out WinDbg, and see what OLE32!CLSIDFromProgID did, as I assumed that this must be doing something similar. It was! In fact, it was calling FindActCtxSectionString to map the prog ID to a CLSID, then using this in a call to FindActCtxSectionGuid. After a bit of disassembly, and some staring at the memory window in Visual Studio, I got a good enough idea of the contents to be able to figure out how the data referenced the filename:
typedef struct tagSECTION_DATA
{
    DWORD dwSize; // 0x78 (120) structure size?
    DWORD _2; // 0x00
    DWORD dwSectionType; // 0x04 (ACTIVATION_CONTEXT_SECTION_COM_SERVER_REDIRECTION)
    GUID clsid; // CLSID of class?
    GUID _5; // Some other GUID
    GUID _6; // CLSID of class again…?
    GUID _7; // NULL
    DWORD dwFileNameLength; // file name size in bytes
    DWORD dwFileNameSectionOffset; // file name offset into data.lpSectionBase
    DWORD dwProgIDLength; // progid size in bytes
    DWORD dwProgIDOffset; // offset from start of this structure to progid (0x78)
    BYTE _8[28]; // Unknown
    // Prog ID string follows
} SECTION_DATA;
So now you can cast the data member to this structure and easily extract the filename, voila!
ACTCTX_SECTION_KEYED_DATA data = {};
data.cbSize = sizeof(data);
if (!FindActCtxSectionGuid(FIND_ACTCTX_SECTION_KEY_RETURN_HACTCTX,
                           NULL,
                           ACTIVATION_CONTEXT_SECTION_COM_SERVER_REDIRECTION,
                           &guid,
                           &data))
{
    return GetLastError();
}
std::wstring filename;
if (data.ulDataFormatVersion == 1) // Fail-safe in case internal format changes…?
{
    // Cast returned data to our structure type
    SECTION_DATA *pdata = static_cast<SECTION_DATA *>(data.lpData);
    // DLL filename can be found in the section base data at specified offset
    filename = reinterpret_cast<const wchar_t *>(
        static_cast<const BYTE *>(data.lpSectionBase) + pdata->dwFileNameSectionOffset);
}
// FIND_ACTCTX_SECTION_KEY_RETURN_HACTCTX hands back a referenced handle in
// data.hActCtx, which the caller has to release
ReleaseActCtx(data.hActCtx);
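The code above builds the string straight from a pointer, so it relies on the filename being NUL-terminated inside the section. Since the reverse-engineered structure also appears to carry a byte length (dwFileNameLength), a slightly more defensive read is possible. The sketch below is portable and does not call the activation context API at all: readSectionString and makeFakeSection are hypothetical helpers, char16_t stands in for Windows’ 16-bit wchar_t, and whether the real length field includes a terminator is something I have not verified.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Reads a UTF-16 string of byteLength bytes starting byteOffset bytes into a
// section buffer. Returns an empty string if the range is odd-sized or falls
// outside the buffer, instead of reading past the end.
std::u16string readSectionString(const std::vector<std::uint8_t>& section,
                                 std::uint32_t byteOffset,
                                 std::uint32_t byteLength) {
    if (byteLength % sizeof(char16_t) != 0) return std::u16string();
    if (byteOffset > section.size() || byteLength > section.size() - byteOffset)
        return std::u16string();
    std::u16string result(byteLength / sizeof(char16_t), u'\0');
    // memcpy avoids making alignment assumptions about the raw buffer.
    if (byteLength != 0)
        std::memcpy(&result[0], section.data() + byteOffset, byteLength);
    return result;
}

// Builds a fake section for illustration: zero padding followed by the text.
std::vector<std::uint8_t> makeFakeSection(std::size_t padding, const std::u16string& text) {
    std::vector<std::uint8_t> section(padding + text.size() * sizeof(char16_t), 0);
    if (!text.empty())
        std::memcpy(section.data() + padding, text.data(), text.size() * sizeof(char16_t));
    return section;
}

// Example: the offset and length play the roles of dwFileNameSectionOffset
// and dwFileNameLength from the structure above.
void exampleSectionRead() {
    const std::vector<std::uint8_t> section = makeFakeSection(8, u"server.dll");
    assert(readSectionString(section, 8, 20) == u"server.dll");
    assert(readSectionString(section, 8, 22).empty());   // runs off the end
}
```

In the real code the buffer would be data.lpSectionBase with data.ulSectionTotalLength as its size, which gives the bounds check something to compare against.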
So the SxS compliant version of the CLSID to filename/typelibrary is:
- Get CLSID from the ProgID
- Look up GUID in current activation context using FindActCtxSectionGuid
- Decipher returned data to get filename
I’m sure this will break horribly when they change the internal format of the activation context data (in fact, it’s probably different now between XP and Vista – many things in the SxS world are), but hopefully we can use the ulDataFormatVersion to do some basic sanity checking.
Now if only some of those useful looking functions in sxs.dll were documented…
Tracking COM memory using IMallocSpy
One of my aims at work is to simplify the development and testing regime as much as possible, and this mostly consists of making sure we’re using the most appropriate tools and technologies wherever possible. In the context of correctness checking this involves determining whether the time we spend installing, configuring and chasing down false positives in third-party tools such as Purify outweighs their benefits.
As a rule I’d always favour using built-in, vendor provided hooks rather than a bolt-on product, and as such I was interested in what we could do with the IMallocSpy functionality in the COM runtime, as our group’s software is almost all COM based.
It turned up some interesting things…
The first was the behaviour of CoRevokeMallocSpy. Stupidly I originally neglected to check the return value, and then when I did, found it was returning E_ACCESSDENIED. It turns out that this means there are still allocations that occurred when the spy was active that haven’t been freed. Given that my use of the spy was to check for leaked allocations in the first place, this was a bit annoying; at least, it meant I couldn’t overload the lifetime of my spy object, to, say, dump a list of leaks when it was destroyed. If there were any leaks it never got destroyed!
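One way around this is to keep the leak bookkeeping in an object that is separate from the spy and can be queried explicitly, rather than relying on the spy’s destructor. The following is a portable sketch of that bookkeeping only. AllocationLedger is a hypothetical class, not part of COM; in a real spy its two hooks would be called from IMallocSpy::PostAlloc and IMallocSpy::PreFree, and reallocation is ignored here for brevity.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical, portable stand-in for the bookkeeping a malloc spy might do.
class AllocationLedger {
public:
    // Call where IMallocSpy::PostAlloc reports a successful allocation.
    void OnAlloc(const void* p, std::size_t size) {
        if (p != nullptr) live_[p] = size;
    }
    // Call where IMallocSpy::PreFree reports a block about to be freed.
    void OnFree(const void* p) { live_.erase(p); }

    // Number of blocks allocated but not yet freed.
    std::size_t LiveBlocks() const { return live_.size(); }

    // Total size in bytes of the blocks allocated but not yet freed.
    std::size_t LiveBytes() const {
        std::size_t total = 0;
        for (const auto& entry : live_) total += entry.second;
        return total;
    }
private:
    std::map<const void*, std::size_t> live_;   // address -> size
};

// Example: two allocations, one freed, so a single 44-byte block remains.
void exampleLedgerUsage() {
    AllocationLedger ledger;
    int a = 0, b = 0;              // addresses used purely as distinct keys
    ledger.OnAlloc(&a, 44);
    ledger.OnAlloc(&b, 16);
    ledger.OnFree(&b);
    assert(ledger.LiveBlocks() == 1 && ledger.LiveBytes() == 44);
}
```

With the ledger owned by the test harness rather than by the spy, the leak report can be produced whether or not CoRevokeMallocSpy succeeds.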
The next problem was that it seemed whenever I had an app that called the apparently innocuous CLSIDFromProgID, it would leak memory. For example:
addr: 0x0015eb20: size: 0x2c (44)
contents:
c0 31 fd 76 30 c8 15 00 01 00 00 00 01 00 12 00 .1.v0...........
28 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01 f0 ad ba bc ec 15 00 94 ec 15 00 ............
[0x77583315] CSpyMalloc_Alloc+0x49
[0x774fd073] CoTaskMemAlloc+0x13
[0x76fd18fb] operator new+0xe
[0x76fd5c6b] StgDatabase::InitClbFile+0x2e
[0x76fdc190] StgDatabase::InitDatabase+0x623c
[0x770076fa] OpenComponentLibraryEx+0x3e
[0x77005306] OpenComponentLibraryTS+0x1a
[0x76fd954d] _RegGetICR+0x761f
[0x76fd1f24] CoRegGetICR+0xffff877d
[0x76fd6a20] IsSelfRegProgID+0x65
[0x76fd80f9] CComCLBCatalog::GetClassInfoFromProgId+0x1783
[0x77518a6d] CComCatalog::GetClassInfoFromProgId+0x100
[0x77518964] CComCatalog::GetClassInfoFromProgId+0x1e
[0x775188a0] CLSIDFromProgID+0x76
[0x004120f5] wmain+0xa5
[0x004173a6] __tmainCRTStartup+0x1a6
[0x004171ed] wmainCRTStartup+0xd
[0x7c816fd7] BaseProcessStart+0x23
Looking at the stack trace, it seemed there was some kind of internal caching going on, but what was confusing me was that I was under the impression that all memory allocated by the COM runtime would be freed by the time CoUninitialize was done. After all, you can’t make any further COM calls after this point. If you don’t believe me, just try using a static CComPtr, and see what happens in DllMainCRTStartup when your app exits.
After a bit of poking about with WinDbg (thank goodness we get decent symbols for the OS now), I could see that some kind of “database” was being created within CLBCATQ.DLL:
0:000> k
ChildEBP RetAddr
0012faf4 770076fa CLBCATQ!StgDatabase::InitDatabase
0012fb18 77005306 CLBCATQ!OpenComponentLibraryEx+0x3e
0012fb34 76fd954d CLBCATQ!OpenComponentLibraryTS+0x1a
0012fdd0 76fd1f24 CLBCATQ!_RegGetICR+0x205
0012fdf0 76fd6a20 CLBCATQ!CoRegGetICR+0x29
0012fe48 76fd80f9 CLBCATQ!IsSelfRegProgID+0x6b
0012fe88 77518a6d CLBCATQ!CComCLBCatalog::GetClassInfoFromProgId+0x51
0012fec0 77518964 ole32!CComCatalog::GetClassInfoFromProgId+0x149
0012fee0 775188a0 ole32!CComCatalog::GetClassInfoFromProgId+0x1e
0012ff0c 00401340 ole32!CLSIDFromProgID+0x95
0012ff7c 00401cae testleakcheck!wmain+0x60
0012ffc0 7c816fd7 testleakcheck!__tmainCRTStartup+0x10f [f:\rtm\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 583]
0012fff0 00000000 kernel32!BaseProcessStart+0x23
I could see by looking at the exports that there was a function called CoRegCleanup in CLBCATQ.DLL that looked like it could be used to free up this storage before I did my leak checking. Calling it by dynamically getting the function pointer using GetProcAddress did make a difference, but there was still some memory not freed, and I didn’t feel comfortable using an undocumented function in this way.
Then I remembered the magical OANOCACHE environment variable.
This is used to tell the COM runtime not to cache memory used for BSTRs, VARIANTs, SAFEARRAYs, or anything else allocated using CoTaskMemAlloc. So, I set the variable, re-ran the test and voila! the apparent leaks disappeared. There must be something in CLBCATQ that detects the environment variable and disables its internal cache.
So the moral of the story is: if you’re attempting to reliably track memory usage with IMallocSpy, remember to make sure you have OANOCACHE set, otherwise you will always end up with memory not being freed until late into process teardown.
Bug in _com_ptr_t::QueryStdInterfaces
Just thought I’d bring people’s attention to a bug in the COM support classes that ship with Visual Studio 2005/VC8.
It’s a fairly unusual edge case (at least, if you’re not being passed VARIANTs from a script language), where the _com_ptr_t helper class attempts to extract a standard COM interface (IUnknown or IDispatch) from a VARIANT that’s being passed “byref”, i.e. has a type or’ed with VT_BYREF.
In this case the code in comip.h uses VariantChangeType to “dereference” the pointer – convert it from, say, VT_DISPATCH|VT_BYREF to VT_DISPATCH – but if this succeeds, it then proceeds to use the wrong local variable, varSrc, rather than the varDest that it’s just converted. The effect is that it causes an access violation.
Here’s the code snippet in question:
// Try to extract either IDispatch* or an IUnknown* from
// the VARIANT
//
HRESULT QueryStdInterfaces(const _variant_t& varSrc) throw()
{
if (V_VT(&varSrc) == VT_DISPATCH) {
return _QueryInterface(V_DISPATCH(&varSrc));
}
if (V_VT(&varSrc) == VT_UNKNOWN) {
return _QueryInterface(V_UNKNOWN(&varSrc));
}
// We have something other than an IUnknown or an IDispatch.
// Can we convert it to either one of these?
// Try IDispatch first
//
VARIANT varDest;
VariantInit(&varDest);
HRESULT hr = VariantChangeType(&varDest, const_cast<VARIANT*>(static_cast<const VARIANT*>(&varSrc)), 0, VT_DISPATCH);
if (SUCCEEDED(hr)) {
hr = _QueryInterface(V_DISPATCH(&varSrc)); // Should be &varDest
}
if (hr == E_NOINTERFACE) {
// That failed … so try IUnknown
//
VariantInit(&varDest);
hr = VariantChangeType(&varDest, const_cast<VARIANT*>(static_cast<const VARIANT*>(&varSrc)), 0, VT_UNKNOWN);
if (SUCCEEDED(hr)) {
hr = _QueryInterface(V_UNKNOWN(&varSrc)); // Should be &varDest
}
}
VariantClear(&varDest);
return hr;
}
Looks like it’s the same in Visual Studio 2005 SP1, but Microsoft are aware of the problem, and hopefully there’ll be a patch soon.
Hope this saves you some time!