All posts by ian

VBA is a dynamic language too!

There’s lots of talk about dynamic languages recently, what with the ubiquity of JavaScript, the rise and rise of Python, Ruby et al., and now Microsoft jumping into the fray with the DLR – the dynamic language runtime – to make creating a dynamic language almost as easy as using one.

But sometimes we forget the old boys sitting quietly in the corner, like poor, aged VBA. Once the poster boy for scripting, and still prevalent in even the most recent versions of the Microsoft Office apps, it’s now really just biding its time, waiting for its inevitable replacement by some .NET-based upstart.

One of the things people often talk about with JavaScript is that you can dynamically add properties to an object at runtime, e.g.:

var circ = new Object();
circ.hittest = function(x,y) { return false; };

But, people often forget that you can do a similar thing with VBA in Excel. Yes, VBA.
For instance, add a function to a worksheet object:

  • Open the VBA editor (Alt+F11)
  • Double click on Sheet1
  • Add some code:

Sub NewMethod()
    MsgBox "I got called!"
End Sub

Now it’s possible to call this method as if it existed directly on the Worksheet class! For example, with some code in the Workbook like this:

Sub CallIt()
    ' NewMethod isn't declared on the Worksheet class,
    ' but late binding finds it at runtime
    Sheet1.NewMethod
End Sub

So there we go, VBA was dynamic (thanks to IDispatch) before it was even cool…

Back to Mac

Well, I’ve finally got round to getting my hands on a Mac – a shiny 17″ MacBook Pro, no less – after last being heavily involved in Mac development about a decade ago, when it was all about the somewhat basic (at least in today’s terms) MPW – the Macintosh Programmer’s Workshop. Check it out if you’re doing any retro Mac OS 7 coding and aren’t scared of an environment without IntelliSense! It was a blessed relief when Metrowerks came along with CodeWarrior and brought Mac development out of the dark ages. Gutted: I’ve just discovered that they actually discontinued it in July 2005. Still, having just run up Xcode, I can safely say it’s living on in spirit.

More by luck than judgment, I picked up the machine a couple of days after the launch of Leopard, so got that included – and it’s a good job, because one of the killer features for me was Boot Camp, which comes as standard with the new OS. Don’t worry, I won’t be going into excruciating detail about the other new features – mostly because I haven’t used previous versions of the OS enough to make a worthwhile comparison.

Despite that, I still think it’s a stark reminder of how good an OS can be: a solid foundation combined with a consistent, cutting-edge UI experience.

But I’m off to boot back into Windows for a quick bit of HL2 EP2.

CUDA – nVidia’s API for GPGPU

Today I was at the Oxford eResearch Center for an nVidia CUDA workshop. CUDA is their new API for facilitating scientific computing on GPUs (graphics processing units).

For a while now there’s been some interest within the computer science research community in the possible application of GPUs to general-purpose computing tasks. Well, it’s more accurate to call them scientific computing tasks, I guess, as these are typically embarrassingly parallel numerical applications like Monte Carlo simulations.

nVidia are targeting CUDA directly at potential users in the financial, oil & gas and aerospace industries. These are areas where they’ve never had a significant presence, having instead focused on the consumer gaming and graphics workstation markets, which have served them very well. But now they’re looking to exploit the diminishing gains in recent generations of CPUs by instead helping people squeeze maximum bang-for-the-buck from the highly threaded, parallel, multi-processor nature of commodity GPUs.

The presentations were very interesting, ranging from almost-marketing-level discussions of the product road maps to real nuts-and-bolts optimisation talks. Mark Harris demonstrated successive iterations of a reduction algorithm that eventually approached the limit of memory bandwidth (78 GB/s), albeit with some pretty nasty loop unrolling and templating!

From my perspective the biggest drawback of the current hardware and API is that it only supports single-precision floating point. Unfortunately everything in my world uses double-precision maths, and although it’s possible to convert on entry/exit from the GPU API, this adds significant overhead. Of course, it should be possible to reduce the numerical range the algorithm has to deal with in order to avoid the need for double precision at all, but this is a bit more re-engineering than I can justify. Even the forthcoming on-chip double implementation will suffer a quite significant slow-down compared to the single-precision version – but even that should be an order of magnitude faster than non-GPU code, so this shouldn’t be too serious.

Another interesting aspect of the technology is the use of an “intermediate” form of assembled code: PTX files. These are generated from the C-like .CU source files, and are then turned into card-specific machine code on the hardware. This gives nVidia a degree of freedom to change the on-chip architecture, instruction set etc. without breaking existing applications.

If you’re interested in keeping in touch with news of the various GPGPU users in academia and industry, take a look at the community site started by Mark Harris, who was one of the presenters today.

Dumping Excel XLL add-in calls

Using WinDbg it’s possible to get a dump of each XLL call made by Excel as it calculates. If you’re using Excel 2003, create the following breakpoint, which dumps the symbol at eax+4 (the entry point that is about to be called) and then continues:

bu EXCEL!MdCallBack+0xa880 "dds @eax+0x4 L1; g"

You’ll need to adjust the offset for other versions of Excel – and I haven’t tried it yet with 2007. Assuming you’ve got symbols available for the XLLs being called, you’ll get something like this:

0013bc50 15109730 addin1!addin1_function1
0013bc50 12c0e918 addin2!addin2_anotherfunctions

This technique can be useful when troubleshooting – to identify the last add-in call made before a failure, perhaps – and it’s also quite interesting to just watch and see the pattern in which your XLL UDFs get called.

(My) Essential Visual Studio add-ins

Thought you might like to know about a few Visual Studio tools and add-ins that I regularly use, and find very useful:

  • Copy as HTML

Fantastic little tool that lets you copy formatted text from the VS editor as HTML fragments. You can’t beat syntax highlighting to ease code readability, and this plug-in is great for adding code to blog entries, etc. Written by Colin Coller.

  • VC++ Code Snippets

C++ has missed out on a few of the productivity boosters available for the managed languages in VS2005; one of the most annoying omissions is code snippets – mainly because it seems so arbitrary, as it’s not as if snippets need reflection or some other CLR-specific feature. Anyway, it turns out that the feature is available, as part of the “PowerToys” package. As well as being good for day-to-day productivity, code snippets are also a really powerful demo tool, allowing you to give the appearance that you’re writing code “live” when it’s really just glorified copy and pasting.

  • XPathMania

I do quite a lot of work with XML, and this little utility proves very useful. I used to use XMLSpy, but it was really too overblown when all I wanted was its ability to quickly run an XPath query and see what it returns. This add-in allows me to do that directly in Visual Studio… lovely! It was put together by Don Demsak.

Getting COM registration data with the activation context API

When moving code from “traditional” registry-based COM to shiny new side-by-side, registration-free COM, there are a few places where you might need an analogue for things like looking up a DLL name from a ProgID. E.g. in the registry-based world, you can go from a ProgID for a class to its physical DLL filename by doing this:

  • Get CLSID from the ProgID
  • Look up filename in HKCR\CLSID\{clsguid}\InprocServer32

Now obviously, if you attempt this on a machine where the components are only being used via SxS (i.e. have never been registered) the CLSIDFromProgID step will work – OLE32.DLL, which implements this function, is aware of SxS – but the subsequent steps won’t because the information isn’t in the registry.

I guess this is really breaking because we’re taking advantage of our knowledge of COM internals, rather than going via the official APIs. As far as I know, the “correct” way of doing the ProgID-to-(type library) DLL mapping is via IProvideClassInfo, but that relies on the COM objects supporting this interface, and unfortunately we have, err, several – hundred – that don’t.

So, I set about looking to see if there was a way to get this information using the activation context APIs. All the information is there in the manifest, so how do we get it – without something nasty like querying the manifest XML directly?

There are only a handful of functions in the ActCtx API, and one of them – FindActCtxSectionGuid – looks relevant. By calling this with the ACTIVATION_CONTEXT_SECTION_COM_SERVER_REDIRECTION flag, it looks like we can get some data from the manifest based on our COM object CLSID.

Here’s the problem: the data returned from this function is an ACTCTX_SECTION_KEYED_DATA structure, and as far as I can tell this is essentially an opaque blob with a couple of length indicators. I couldn’t find any documentation about what the lpData member is supposed to point to (if you know better, please let me know)!

I decided to break out WinDbg, and see what OLE32!CLSIDFromProgID did, as I assumed that this must be doing something similar. It was! In fact, it was calling FindActCtxSectionString to map the prog ID to a CLSID, then using this in a call to FindActCtxSectionGuid. After a bit of disassembly, and some staring at the memory window in Visual Studio, I got a good enough idea of the contents to be able to figure out how the data referenced the filename:

typedef struct tagSECTION_DATA
{
    DWORD dwSize;                  // 0x78 (120) structure size?
    DWORD _2;                      // 0x00
    GUID  clsid;                   // CLSID of class?
    GUID  _5;                      // Some other GUID
    GUID  _6;                      // CLSID of class again…?
    GUID  _7;                      // NULL
    DWORD dwFileNameLength;        // file name size in bytes
    DWORD dwFileNameSectionOffset; // file name offset into data.lpSectionBase
    DWORD dwProgIDLength;          // progid size in bytes
    DWORD dwProgIDOffset;          // offset from start of this structure to progid (0x78)
    BYTE  _8[28];                  // Unknown
    // Prog ID string follows
} SECTION_DATA;


So now you can cast the data member to this structure and easily extract the filename, voila!


ACTCTX_SECTION_KEYED_DATA data = { 0 };
data.cbSize = sizeof(data);

// Look up our class's CLSID in the current activation context
if(!::FindActCtxSectionGuid(0, NULL,
    ACTIVATION_CONTEXT_SECTION_COM_SERVER_REDIRECTION,
    &clsid, &data))
{
    return GetLastError();
}

if(data.ulDataFormatVersion == 1) // Fail-safe in case internal format changes…?
{
    // Cast returned data to our structure type
    SECTION_DATA *pdata = static_cast<SECTION_DATA *>(data.lpData);

    // DLL filename can be found in the section base data at specified offset
    std::wstring filename(
        reinterpret_cast<wchar_t *>((BYTE *)data.lpSectionBase + pdata->dwFileNameSectionOffset));
}


So the SxS-compliant version of the ProgID-to-filename/type library lookup is:

  • Get CLSID from the ProgID
  • Look up GUID in current activation context using FindActCtxSectionGuid
  • Decipher returned data to get filename

I’m sure this will break horribly when they change the internal format of the activation context data (in fact, it’s probably different now between XP and Vista – many things in the SxS world are), but hopefully we can use the ulDataFormatVersion to do some basic sanity checking.

Now if only some of those useful looking functions in sxs.dll were documented…

Process and thread tokens

I’ve recently been trying to work out what was going on with some ASP.NET/COM interop issues at work. It turned out to be due to the differences between OpenProcessToken and OpenThreadToken.

It might sound obvious in hindsight, but I haven’t worked a great deal with the security model within Windows, so I’m not particularly au fait with it all. Plus, given that it was a DLL, called from COM, called from IIS, it wasn’t particularly easy to debug.

The system I work on uses an access control mechanism whereby users have to be members of a certain NT domain group in order to use it. At certain points in the process, the security DLL checks for this by getting the current token, using GetTokenInformation and iterating over the group SIDs it contains (it was written before CheckTokenMembership was available).

The trouble is, when running under IIS and ASP.NET, the check was always failing. Even though the appropriate user identity was being passed through, by impersonation, from the client, it wasn’t working. Hmmm.

It turned out that the validation code was using OpenProcessToken, but of course, the impersonation happens at thread level. You can impersonate as much as you want, but the process access token always contains the original token (for the Network user in my case), not the one with the groups for the user you’re interested in.

By changing the code to use OpenThreadToken and passing FALSE for the OpenAsSelf parameter, you can get the properly impersonated access token. Ahhh.

Tracking COM memory using IMallocSpy



One of my aims at work is to simplify the development and testing regime as much as possible, and this mostly consists of making sure we’re using the most appropriate tools and technologies wherever possible. In the context of correctness checking this involves determining whether the time we spend installing, configuring and chasing down false positives in third-party tools such as Purify outweighs their benefits.

As a rule I’d always favour using built-in, vendor provided hooks rather than a bolt-on product, and as such I was interested in what we could do with the IMallocSpy functionality in the COM runtime, as our group’s software is almost all COM based.

It turned up some interesting things…

The first was the behaviour of CoRevokeMallocSpy. Stupidly I originally neglected to check the return value, and then when I did, found it was returning E_ACCESSDENIED. It turns out that this means there are still allocations that occurred when the spy was active that haven’t been freed. Given that my use of the spy was to check for leaked allocations in the first place, this was a bit annoying; at least, it meant I couldn’t overload the lifetime of my spy object, to, say, dump a list of leaks when it was destroyed. If there were any leaks it never got destroyed!

The next problem was that it seemed whenever I had an app that called the apparently innocuous CLSIDFromProgID, it would leak memory. For example:

addr: 0x0015eb20:
size: 0x2c (44)
c0 31 fd 76 30 c8 15 00 01 00 00 00 01 00 12 00 .1.v0...........
28 00 10 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
01 f0 ad ba bc ec 15 00 94 ec 15 00             ............
[0x77583315] CSpyMalloc_Alloc+0x49
[0x774fd073] CoTaskMemAlloc+0x13
[0x76fd18fb] operator new+0xe
[0x76fd5c6b] StgDatabase::InitClbFile+0x2e
[0x76fdc190] StgDatabase::InitDatabase+0x623c
[0x770076fa] OpenComponentLibraryEx+0x3e
[0x77005306] OpenComponentLibraryTS+0x1a
[0x76fd954d] _RegGetICR+0x761f
[0x76fd1f24] CoRegGetICR+0xffff877d
[0x76fd6a20] IsSelfRegProgID+0x65
[0x76fd80f9] CComCLBCatalog::GetClassInfoFromProgId+0x1783
[0x77518a6d] CComCatalog::GetClassInfoFromProgId+0x100
[0x77518964] CComCatalog::GetClassInfoFromProgId+0x1e
[0x775188a0] CLSIDFromProgID+0x76
[0x004120f5] wmain+0xa5
[0x004173a6] __tmainCRTStartup+0x1a6
[0x004171ed] wmainCRTStartup+0xd
[0x7c816fd7] BaseProcessStart+0x23

Looking at the stack trace, it seemed there was some kind of internal caching going on, but what was confusing me was that I was under the impression that all memory allocated by the COM runtime would be freed by the time CoUninitialize was done. After all, you can’t make any further COM calls after this point. If you don’t believe me, just try using a static CComPtr, and see what happens in DllMainCRTStartup when your app exits.

After a bit of poking about with WinDbg (thank goodness we get decent symbols for the OS now), I could see that some kind of “database” was being created within CLBCATQ.DLL:

0:000> k
ChildEBP RetAddr
0012faf4 770076fa CLBCATQ!StgDatabase::InitDatabase
0012fb18 77005306 CLBCATQ!OpenComponentLibraryEx+0x3e
0012fb34 76fd954d CLBCATQ!OpenComponentLibraryTS+0x1a
0012fdd0 76fd1f24 CLBCATQ!_RegGetICR+0x205
0012fdf0 76fd6a20 CLBCATQ!CoRegGetICR+0x29
0012fe48 76fd80f9 CLBCATQ!IsSelfRegProgID+0x6b
0012fe88 77518a6d CLBCATQ!CComCLBCatalog::GetClassInfoFromProgId+0x51
0012fec0 77518964 ole32!CComCatalog::GetClassInfoFromProgId+0x149
0012fee0 775188a0 ole32!CComCatalog::GetClassInfoFromProgId+0x1e
0012ff0c 00401340 ole32!CLSIDFromProgID+0x95
0012ff7c 00401cae testleakcheck!wmain+0x60
0012ffc0 7c816fd7 testleakcheck!__tmainCRTStartup+0x10f [f:\rtm\vctools\crt_bld\self_x86\crt\src\crtexe.c @ 583]
0012fff0 00000000 kernel32!BaseProcessStart+0x23

I could see by looking at the exports that there was a function called CoRegCleanup in CLBCATQ.DLL that looked like it could be used to free up this storage before I did my leak checking. Calling it by dynamically getting the function pointer using GetProcAddress did make a difference, but there was still some memory not freed, and I didn’t feel comfortable using an undocumented function in this way.

Then I remembered the magical OANOCACHE environment variable.

This is used to tell the COM runtime not to cache memory used for BSTRs, VARIANTs, SAFEARRAYs, or anything else allocated using CoTaskMemAlloc. So, I set the variable, re-ran the test and voila! The apparent leaks disappeared. There must be something in CLBCATQ that detects the environment variable and disables its internal cache.

So the moral of the story is: if you’re attempting to reliably track memory usage with IMallocSpy, remember to make sure you have OANOCACHE set, otherwise you’ll always end up with memory not being freed until late in process teardown.

Exposing static libraries to .NET using C++/CLI



I’ve been looking recently at how to make unmanaged C++ code in static libraries available to code written in .NET languages.

There isn’t any direct way of calling into a C++ static library (.lib) from C# code, but this isn’t surprising, as the two are very different in terms of how they’re compiled and linked to form an executable. This is where managed C++ – or, to give it its official name, C++/CLI – comes in; it does know how to link managed and unmanaged modules within the same binary. So, if you’ve got a static library that you want to make available to .NET clients, there are a couple of options:


  1. Build the .lib into a standalone unmanaged DLL, and call into it using P/Invoke
  2. Build the .lib into a DLL using managed C++

The problem with the first approach is that, especially if you have a complicated method signature, you’re likely to have some complicated marshalling instructions wherever you declare it using [DllImport]. Also, if you plan on updating your DLL at some point, you’ll quickly find yourself back in the pre-COM world of DLL hell – version management becomes a headache.

The second approach gives you some further options about how you expose the functionality to callers:

  1. Unmanaged entry points via the normal C++ __declspec(dllexport) and/or .DEF file mechanisms
  2. Managed classes that are publicly visible and consumable by .NET clients, using public ref class Foo.
  3. A combination of the two!

Exposing managed classes is very straightforward. Assuming you’ve already thrown the /clr switch on the project (the binary/wrapper project, not the static library), you can write code like:

public ref class Foo
{
public:
    static double Bar(array<double,2> ^ConstraintsLHS, // matrix
        array<double> ^ConstraintsRHS) // vector
    {
        // TODO: call into unmanaged static library function.
        // "IJW" interop will deal with marshalling.
        return -1;
    }
};

This has the advantage that it’s much more natural for .NET clients to consume your code. They can pass managed types to the function, and the “It Just Works” interop in the C++ code will deal with the conversion at the point that you call into the unmanaged function in the static library. It’s also possible to do some more specific marshaling of your own; for instance pinning input arrays to avoid additional copies.

Also users can add references directly to your managed DLL, because it’s an assembly just like any other, and take advantage of all the usual assembly versioning controls.

If you also want to allow unmanaged clients to call your DLL in a natural way, you can use #pragma managed to temporarily switch modes when you define your function:

#pragma managed(push, off)

__declspec(dllexport) double Foo_Bar()
{
    // TODO: call into static library function.
    // No marshalling/transition required if called by unmanaged code.
    return -1;
}

#pragma managed(pop)

Now your function will be callable without incurring a managed/unmanaged transition, although it’s worth noting that your DLL will still have a dependency on MSCOREE.DLL, so it will need to be present on machines where it’s used.

So as you can see, C++/CLI gives you a variety of ways of exposing legacy C++ code in an easily callable way to both managed and unmanaged clients. This gives the developer a high degree of control over how and when marshalling and transitions occur.

PLINQ – Parallel LINQ

One of the most interesting things in terms of performance that comes out of the move to a more declarative style when using LINQ, is the ability to easily parallelise your computations.

This benefit has been widely understood for some time in the functional programming world, where side-effect-free functions can be exploited in combination with higher-order functions like map and reduce to process data in parallel. Joel Spolsky gives a good background to that here, in the context of the Google MapReduce framework.

Well, now it looks like Microsoft is finally catching up with the parallel data processing meme. CLR and concurrency expert Joe Duffy is lead developer on the parallel LINQ (PLINQ) project, which hopes to automatically parallelise LINQ queries across multiple cores or processors. Joe’s excellent technical blog contains some tidbits of PLINQ information (as well as a lot of detailed concurrency-oriented posts), including a link to his presentation from the Declarative Aspects of Multicore Programming conference, which includes both a high-level overview and some interesting implementation details.