
Getting .NET type information in the unmanaged world

One of the tools that I write and maintain displays type information for COM objects hidden behind “handles” in Excel spreadsheets. The underlying objects can either support an interface that allows them to be richly rendered to XML, or the viewer will fall back to using metadata, displaying the supported interfaces and their properties and methods. It will also invoke parameterless property getters – on the assumption that doing so won’t change the state of the object – and display the returned value. This is a useful way of getting some visibility into otherwise completely opaque values.

In order to obtain the type information about the COM objects, the tool uses type libraries, and the associated ITypeLib and ITypeInfo interfaces, which, with a little effort, can be used to iterate over all the coclasses, interfaces and functions in the library. But the difficulty lies in obtaining a type library when all you’re given is an already-instantiated object. In theory, COM allows you to know no more about an object than what interfaces it supports. But in practice, there are a variety of ways you can circumvent this and get to the type information.

For unmanaged COM objects you can use the information in the registry (or SxS configuration) to find either the server (DLL) that contains a TLB embedded as a resource, or the type library file itself. I won’t go into that now; there’s plenty of information about the location of these common registry keys elsewhere on the internet.

But for managed COM objects – well, COM callable wrappers (CCWs) – you have a different problem: registry scraping will never work and there may not even be an associated type library. The InprocServer32 registry entry always points to mscoree.dll, which obviously doesn’t have an embedded type library, and unless you’ve registered the assembly with /tlb (which is a pain) then you won’t have entries under HKEY_CLASSES_ROOT\Typelib and a TLB file to load.

So, if you’re in the unmanaged world, and all you’ve got is a pointer to a live CCW, what can you do?

Well, the easiest thing is to use IProvideClassInfo. This is supported by all CCWs, and provides a way to get an auto-generated (by the CLR) ITypeInfo implementation for the managed class. In fact, this is what I actually used to implement the solution eventually, but along the way I discovered some other interesting aspects of the CCW.

There is another interface that it supports: _Object, the unmanaged version of System.Object, which supports basic .NET functionality such as ToString and GetType. I couldn’t find it declared anywhere in the Platform or .NET SDK headers, so I put together a version that I could use from C++:

struct __declspec(uuid("{65074F7F-63C0-304E-AF0A-D51741CB4A8D}")) Object : public IDispatch
{
public:
    // We don't actually call these methods, doing so seems to return
    // COR_E_INVALIDOPERATION. Instead we just use the IDispatch::Invoke
    // and use the DISPID of the methods.
    virtual HRESULT STDMETHODCALLTYPE ToString(BSTR *) = 0;
    virtual HRESULT STDMETHODCALLTYPE Equals(VARIANT, VARIANT_BOOL *) = 0;
    virtual HRESULT STDMETHODCALLTYPE GetHashCode(long *) = 0;
    virtual HRESULT STDMETHODCALLTYPE GetType(mscorlib::_Type **) = 0;
};

Despite the presence of the virtual functions in this “interface”, we’re not actually going to call them. Instead we’ll call through the IDispatch that it derives from. It may be possible to use them directly, but see the comment in the declaration above for what happened when I tried. Calling via IDispatch may seem slightly odd, because the object itself claims not to support IDispatch (QueryInterface returns E_NOINTERFACE).

The methods on the _Object interface have well-known DISPIDs:

ToString 0x00000000
Equals 0x60020001
GetHashCode 0x60020002
GetType 0x60020003
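As a small convenience, the table above could be captured as named constants so that call sites do not carry raw hex literals. This is only a sketch: the namespace `object_dispid` and the helpers `find_object_dispid` and `get_type_dispid` are hypothetical names, and `std::int32_t` stands in for the COM `DISPID` type so the snippet compiles without the Windows headers.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical names; only the numeric values come from the table above.
// DISPID is a 32-bit signed integer in COM, so std::int32_t stands in for it.
namespace object_dispid {
const std::int32_t ToString    = 0x00000000;
const std::int32_t Equals      = 0x60020001;
const std::int32_t GetHashCode = 0x60020002;
const std::int32_t GetType     = 0x60020003;
}

// Looks up the DISPID for a method name (case-sensitive).
// Returns true and sets `out` on success; returns false and leaves `out`
// untouched if the name is not one of the four methods.
inline bool find_object_dispid(const std::string& name, std::int32_t& out) {
    if (name == "ToString")    { out = object_dispid::ToString;    return true; }
    if (name == "Equals")      { out = object_dispid::Equals;      return true; }
    if (name == "GetHashCode") { out = object_dispid::GetHashCode; return true; }
    if (name == "GetType")     { out = object_dispid::GetType;     return true; }
    return false;
}

// Example use: resolve the DISPID of GetType before handing it to Invoke.
inline std::int32_t get_type_dispid() {
    std::int32_t id = 0;
    const bool found = find_object_dispid("GetType", id);
    assert(found);
    (void)found;  // avoid an unused-variable warning when NDEBUG is defined
    return id;
}
```

In real code the constant would simply be passed as the first argument to IDispatch::Invoke in place of the literal.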

So we can use that to invoke the GetType method:

// No positional or named arguments; zero the pointers so nothing is left uninitialised.
DISPPARAMS parms = { NULL, NULL, 0, 0 };
_variant_t vType;
HRESULT hr = pObject->Invoke(0x60020003, IID_NULL, 0, DISPATCH_METHOD, &parms, &vType, NULL, NULL);

And we get back a _Type interface that allows us to navigate around the type information in the same way as we can with System.Type! Just #import mscorlib.tlb and you get all the interfaces you need to e.g. iterate over all the interfaces implemented by a type, and invoke a function on them:

#import <mscorlib.tlb> rename("ReportEvent","xReportEvent")

mscorlib::_TypePtr t(V_UNKNOWN(&vType));
CComSafeArray<LPUNKNOWN> saInterfaces(t->GetInterfaces());
mscorlib::_TypePtr tInterface((LPUNKNOWN)saInterfaces.GetAt(n));
result = tInterface->InvokeMember(_bstr_t("Function"),
    (mscorlib::BindingFlags)
    (mscorlib::BindingFlags_GetProperty |
     mscorlib::BindingFlags_InvokeMethod |
     mscorlib::BindingFlags_Public |
     mscorlib::BindingFlags_Instance |
     mscorlib::BindingFlags_IgnoreCase),
    NULL, _variant_t(punk), NULL, NULL, NULL, NULL);

So this turns out to be quite nice: you can get rich managed type information even if you’re running in the unmanaged world.

Beware cached IDispatch

I’ve kinda given it away there with the title, but we had an interesting set of symptoms exhibited the other day while trying to call a function in an Excel workbook via F#. It appeared that the function being called would fail depending on what had been called previously. Very odd.

A bit of background: as you may know, if you add functions to the worksheet or workbook code in Excel then they appear as callable methods on the objects themselves. This is achieved with the use of dynamic dispatch and IDispatch. For example, creating a workbook with this function in its VBA code:

Public Function Foo() As Double
    Foo = 100
End Function

Means you can call it like this:

MsgBox CStr(ThisWorkbook.Foo)

As well as being able to call it like this from within the Excel session (i.e. in other VBA code in the process), you can also access it externally using the COM object model that Office applications expose. For instance, you can use VBScript:

Dim excel, wkb
Set excel = CreateObject("Excel.Application")
Set wkb = excel.Workbooks.Open("a.xls")
WScript.Echo wkb.Foo

Or, more interestingly, from F#:

    let excel = new Excel.ApplicationClass()
    let wkb = excel.Workbooks.Open(@"c:\a.xls")
    wkb.GetType().InvokeMember("Foo", BindingFlags.InvokeMethod, null, wkb, null)

The key part here is that we’re using wkb.GetType() to get a managed representation of the unmanaged Excel COM interface. Under the covers this is creating a runtime callable wrapper (RCW) to wrap the workbook COM object.

However, the problem we were seeing was that opening multiple workbooks resulted in failures to call the method in certain situations. Although the VBA signature was exactly the same in all of the workbooks, opening b.xls after a.xls would fail, returning null when we expected it to return a value. If we opened c.xls after b.xls, it would fail in a different way, never actually making it to the body of the function. Very odd.

My first suspicion was that it was somehow related to COM object vs .NET object lifetime. This is quite a common problem when invoking Excel from managed code. Mixing COM and .NET is awkward anyway; deterministic, reference-counted lifetime semantics generally don’t play well with the garbage collector. Throwing an app with a full-blown UI being managed as a COM object into the mix just complicates matters further. Anyway, it’s been widely discussed, so I won’t say any more about it here; suffice it to say that calling Marshal.ReleaseComObject and GC.Collect got us to the point where we could see the Excel process terminating, so we knew that the failure wasn’t due to some state being cached inside there.

So we concentrated on different aspects of the problem:

  • Given the pattern of failures, it seemed that the order of opening the workbooks and calling the function affected the outcome. This hinted that something was persisting between calls, but not on the Excel side.
  • The code had previously worked when written in VBScript, so there was nothing intrinsically wrong with the operations we were performing.

This seemed to strongly indicate that something was being cached at the .NET level. And the major difference between the .NET code and the VBScript was that the former called GetType() on the workbook object to get its managed representation, while the latter used the IDispatch directly.

So it looked like GetType() was caching some information about the first IDispatch implementation it encountered, then reusing that for subsequent workbooks whose implementations actually had a slightly different layout: although they also had the Foo function which we were trying to call, they had a different set of other dynamic functions too.

After a bit of digging about I uncovered this gem: the mapping between interface pointers and runtime callable wrappers, which seemed to describe exactly what we were seeing. The first time through the loop, the runtime is asked to get the type and is given a COM pointer, for which it creates an RCW that we can use to invoke Foo. The second time through the loop, the runtime thinks it’s seen the object before, so rather than perform the expensive operation of creating the RCW again, it just returns the original one. The problem is, the underlying COM object is different!

So, in order to prevent the runtime from reusing the cached RCW, we need to use Marshal.GetUniqueObjectForIUnknown, which does the trick nicely. We first need to get an IUnknown for our object, then convert that back into an object, which is actually an RCW:

    let wkb = Marshal.GetUniqueObjectForIUnknown(Marshal.GetIUnknownForObject(wkb))

Although it’s less efficient, at least the code works, and it finally allows us to call the dynamic methods on the workbook object from F#.
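To make the failure mode concrete, here is a toy model in portable C++ of a wrapper cache keyed by object identity. It is not how the CLR implements RCWs; `Wrapper`, `WrapperCache`, `make_unique_wrapper` and `demo_stale_wrapper` are all hypothetical names, and the integer "identity" merely stands in for whatever the runtime uses to decide it has seen an object before. It only illustrates why a cached wrapper can miss members of a different object, and why a freshly built wrapper does not.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <set>
#include <string>

// Toy wrapper: snapshots the callable member names when it is created.
struct Wrapper {
    std::set<std::string> methods;
    bool can_call(const std::string& name) const {
        return methods.count(name) != 0;
    }
};

// Toy cache keyed by an identity value (an assumption of this model).
class WrapperCache {
public:
    // Returns the cached wrapper for `identity` if present; otherwise builds,
    // stores and returns a new one. A hit ignores `methods` entirely, which is
    // exactly how a stale wrapper can be handed back.
    std::shared_ptr<Wrapper> get_or_create(std::uintptr_t identity,
                                           const std::set<std::string>& methods) {
        auto it = cache_.find(identity);
        if (it != cache_.end()) return it->second;
        auto w = std::make_shared<Wrapper>(Wrapper{methods});
        cache_[identity] = w;
        return w;
    }

private:
    std::map<std::uintptr_t, std::shared_ptr<Wrapper>> cache_;
};

// Always builds a fresh wrapper without consulting any cache.
inline std::shared_ptr<Wrapper> make_unique_wrapper(const std::set<std::string>& methods) {
    return std::make_shared<Wrapper>(Wrapper{methods});
}

// Walks through the stale-wrapper failure and the workaround.
inline void demo_stale_wrapper() {
    WrapperCache cache;
    // First object: exposes Foo and Bar.
    auto first = cache.get_or_create(0x1000, {"Foo", "Bar"});
    // Second object, judged to have the same identity, exposes Foo and Baz.
    auto second = cache.get_or_create(0x1000, {"Foo", "Baz"});
    assert(first == second);           // the cached wrapper was reused
    assert(!second->can_call("Baz"));  // so the new member is invisible
    // Bypassing the cache yields a wrapper that matches the actual object.
    auto fresh = make_unique_wrapper({"Foo", "Baz"});
    assert(fresh->can_call("Baz"));
}
```

The trade-off is the same one noted above: the uncached path pays the cost of building a wrapper every time.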

It will be interesting to see how dynamic in .NET 4.0 addresses this kind of issue.