Tag Archives: IL

Mixing it up: when F# meets C#

Simples?
If it were a perfect world, we’d all exist in a happy little bubble of our favourite programming language and you’d never have to worry about the nasty details of interacting with something written by – gasp – someone else in a – double-gasp – different language. But unfortunately that’s precisely what we have to do all the time. And that means that one day all of your fancy-pants algorithmic, highly parallel, functionally pure F# code is going to meet the world of “enterprise” C# development head-on.

Of course the idiomatic way to avoid problems at the boundary between your F# code and the outside world is to ensure that you only expose a small set of compatible types. This works pretty well if your clients are also .NET languages. For instance you can do things like exposing your collections as seq, rather than say, a native F# list, and this will mean your collections can be consumed as IEnumerable. The only problem is it means you’ve got the added burden of maintaining this mapping layer, because you’ll no doubt want to use the F# “native” types internally.
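As a rough sketch of that boundary (the Catalogue module and everything in it are invented for illustration, not taken from any real API), the F# list stays internal and callers only ever see a seq:

```fsharp
module Catalogue =
    // Internal representation: a native F# list, convenient inside F# code.
    let private titles : string list = [ "Expert F#"; "CLR via C#" ]

    // Public surface: seq<string> appears to a C# caller as IEnumerable<string>.
    let getTitles () : seq<string> = Seq.ofList titles

// Consumers only rely on the sequence interface.
let count = Catalogue.getTitles () |> Seq.length
```

The mapping here is a single conversion, but every exposed collection needs one, which is the maintenance burden mentioned above.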

So, what options do we have if some of our F# types happen to leak into our public API? Luckily, lots. Let’s take a look at how some of the common F# constructs can be called from C#.

Public static fields gone from F# 2.0

There have been quite a few changes in F# version 2.0, which shipped as the first “official” version of the language as part of Visual Studio 2010. Most of the changes are detailed in various release notes on Don Syme’s blog and other places, but unfortunately one of the more important changes passed me by, and it turned out to be quite significant in the context of WPF development: public static fields are no longer supported. But what does this mean?

The change

The change itself is simple: static fields can no longer be public. Static fields can still be created, but they must be private.

In pre-2.0 versions of F# it was possible to declare a static field on a type like this:

type MyType() =
    [<DefaultValue>]
    static val mutable public MyProperty : int

Resulting in a type containing a static field, as you’d expect:

    .field public static int32 MyProperty

The code now generates the compiler error:

error FS0881: Static 'val' fields in types must be mutable, private and marked with the '[<DefaultValue>]' attribute. They are initialized to the 'null' or 'zero' value for their type. Consider also using a 'static let mutable' binding in a class type.

Notice that there are already some gnarly aspects to the definition of the field, notably the use of the [<DefaultValue>] attribute, which indicates that the field is not explicitly initialised. This gives us a hint that there might be some inherent problems.

Why do we even need them?

In a word or three: WPF dependency properties.

The recommended way of implementing dependency properties in other .NET languages is to use public static fields, e.g. in C#:

    public static readonly DependencyProperty MyPropertyProperty =
        DependencyProperty.Register( 
          "MyProperty", typeof(int), typeof(MyType)); 

This is one of the few places where C# is terser than F#. Here we can declare and define the value in one line (ish), whereas F# requires a separate declaration of the field (as above) then initialisation in the static constructor (static do).

Why was it removed?

I got the answer from the horse’s mouth. Don Syme said that:

We deliberately removed the ability to create public static fields in Beta2/RC, because of issues associated with initialization of the fields (i.e. ensuring the “static do” bindings are run correctly, and if they are run, then they are run for the whole file, in the right order, with no deadlocks or uninitialized access when initialization occurs from multiple threads).

You can imagine how this would be a problem; there would need to be a way of ensuring that whichever static field was accessed it caused the static constructor to run, which may itself access static fields. All pretty nasty. In fact, Don mentioned that C# suffers from much the same synchronisation issues, but just tends to be used in a way that means it’s less likely to be noticed!

The alternative

At least in theory it’s possible to create a type that uses static properties rather than static fields to store its registered DependencyProperty information.

type Foo () =
    inherit FrameworkElement()
    // Private static field; starts as null until it is assigned
    [<DefaultValue>]
    static val mutable private _myPropertyInternal : DependencyProperty
    static member MyProperty with get () = Foo._myPropertyInternal

Whether or not this works depends very much on how calling code accesses DependencyProperty information. If it uses reflection to access the field directly it will obviously fail. But if it uses a more robust/flexible method then it should be OK. Empirically it seems that the WPF framework code itself does the latter, for instance when it’s instantiating objects from XAML, so it works properly regardless of whether the type exposes a field or a property.

The conclusion

So the conclusion is “wontfix”: the behaviour is by design. Unfortunately it has the effect that it’s no longer possible to create a type with an identical IL “signature” in both F# and C#. It seems a bit of a shame, but I guess the trade-off is that we’re protected against insidious initialisation issues, so it will probably turn out to be the right thing in the long term.

.NET 4.0 Type Equivalence causes BadImageFormatException

I recently discovered a nasty backward compatibility problem with the new type equivalence feature in .NET 4.0. Luckily it’s relatively difficult to hit it if you’re in a pure-C# environment, but if you happen to generate any assemblies directly using IL, you should watch out. Read on for all the gory details.

IL analysis using F#

I recently needed to determine which functions were called by some of our F# code. Naively, you can use an existing tool like ildasm to disassemble a .NET DLL and then search the resulting IL source code for references. The obvious problem here, though, is that you’re going to include all references whether or not they’re actually called. In some circumstances this isn’t too bad, but in our case we pull in a great deal of shared library code, so you’re likely to get lots of false positives.

There are some other options to more accurately determine whether the method you’re interested in is actually called: run the code, or “almost” run it, by simulating the operation of the CLR. To radically understate: this is quite a lot of work. Yet another option is to statically analyse the original source code. This is generally easier than dynamic evaluation, but there are some serious and well-known problems with doing it exhaustively, which can result in the complexity eventually converging with that of full dynamic analysis.

So broadly, we have 3 types of approaches:


Approach            Implementation    Accuracy
Disassembly         Easy              Superset
Dynamic analysis    Hard              Exact
Static analysis     Medium            Medium

Anyone for a trade-off? Unsurprisingly I decided to look at implementing the third option. Although static analysis is normally performed on the source code itself, it’s actually easier for us to use the generated IL, which certainly requires less gnarly parsing. We can also take some short cuts based on the fact that we’re analysing F# code; more on that later.

We can use F#’s discriminated unions – a type that is constructed from one of many possible options – to describe the universe of IL instructions in a pretty concise way, e.g. (a partial example):

type inst =
    | Nop
    | Break
    | Ldarg_0
    | Ldc_i4 of int32
    | Newobj of meth
and field = FieldInfo
and meth = MethodBase
and typ = Type

This allows us to construct instances of inst by doing something like this in fsi (F# interactive):

> let i = Ldc_i4 2;;
val i : inst

You may have noticed that as well as the instructions that take simple types like int32, we also have ones that accept meth, which is an alias for System.Reflection.MethodBase, the base class for all methods, including constructors, which is what’s used to construct a Newobj.

Now that we have this discriminated union defined, we need a way to build instances of it. In the IL byte stream, each instruction starts with an opcode, which fits in an unsigned 16-bit integer. Firstly we need to get the raw bytes representing the IL. Using Reflection, it’s fairly easy given m of type MethodInfo:

    let body = m.GetMethodBody()
    let ilbytes = body.GetILAsByteArray()
    let ms = new IO.MemoryStream(ilbytes)
    ...
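For anyone who wants to try this step in isolation, here is a minimal self-contained sketch. The choice of StringBuilder.Clear as the method to inspect and the helper name ilBytesOf are my own assumptions for illustration; any method with an IL body would do.

```fsharp
open System.IO
open System.Reflection

// Returns the raw IL bytes of a method, or an empty array when the method
// has no IL body (abstract or extern methods, for example).
let ilBytesOf (m : MethodInfo) : byte [] =
    match m.GetMethodBody() with
    | null -> Array.empty
    | body -> body.GetILAsByteArray()

// Illustrative target only: a public framework method with a single overload.
let m = typeof<System.Text.StringBuilder>.GetMethod("Clear")
let ilbytes = ilBytesOf m
let ms = new MemoryStream(ilbytes)

printfn "%s has %d bytes of IL" m.Name ilbytes.Length
```

The null check matters in practice, because GetMethodBody returns null for methods that have no managed body.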

So now we have a stream of bytes, and we can use functions from System.IO to extract information in various sized pieces:

    let getByte _  = (byte (ms.ReadByte()))
    let i2 _ = readInt16 ms
    let i4 _ = readInt32 ms
    ...

As Harry Hill would say: “well, you get the idea with that”. It’s worth noting that these functions have a dummy argument (indicated by the underscore). This is required because they have a side effect – reading from the stream, changing its state – which is not obvious to the compiler; if we omitted the argument, each definition would be a plain value, so the read would happen only once. Although adding the dummy arg is required, it does have the unfortunate consequence that we have to pass something (normally unit) which can look a little ugly in the normally terse F# world.
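The difference between a value and a function taking unit is easy to see on a tiny hand-written byte stream. This is only a sketch: the readInt16 and readInt32 helpers used above aren’t shown here, so I’ve assumed a System.IO.BinaryReader in their place (it reads little-endian values, which matches how IL operands are encoded), and the bytes are made up for the example.

```fsharp
open System.IO

// Hand-written IL for illustration: nop, ldc.i4 42, ret.
let ms = new MemoryStream([| 0x00uy; 0x20uy; 0x2Auy; 0x00uy; 0x00uy; 0x00uy; 0x2Auy |])
let reader = new BinaryReader(ms)

// No argument: this is a value. The read happens once, right here,
// and firstOnce refers to that single result from then on.
let firstOnce = reader.ReadByte()

// With a unit argument each of these is a function, so every call
// performs a fresh read and advances the stream.
let getByte () = reader.ReadByte()
let i2 () = reader.ReadInt16()
let i4 () = reader.ReadInt32()

let opcode = getByte ()   // the ldc.i4 opcode byte
let operand = i4 ()       // its 4-byte little-endian operand
let last = getByte ()     // the ret opcode byte

printfn "first=0x%02x opcode=0x%02x operand=%d last=0x%02x" firstOnce opcode operand last
```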

As the ECMA CIL spec describes, IL opcodes consist of either 1 or 2 bytes; in the 2-byte case the first byte is always 0xFE. Now we can begin to implement something serious. Given ms of type MemoryStream we can write something that will convert it to instructions:

    match ms.ReadByte() with
    | 0xFE as hb ->
        // Two-byte instruction: 0xFE prefix, then a second opcode byte
        let lb = getByte()
        let i = ((uint16 hb) <<< 8) + (uint16 lb)
        match i with
        | 0xfe01us -> Ceq
        | _ -> failwithf "unhandled opcode 0x%04x" i   // other cases omitted
    | b ->
        match b with
        | 0x0 -> Nop
        | 0x1f -> Ldc_i4_s (getByte())
        | 0x20 -> Ldc_i4 (i4())
        | 0x73 -> Newobj (meth())
        | _ -> failwithf "unhandled opcode 0x%02x" b   // other cases omitted
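To make the shape of the decoder concrete, here is a cut-down, runnable sketch of the same idea. The Inst type and the decode function are illustrative stand-ins rather than the real code: only a handful of opcodes are handled, Unknown is my own addition, and operands that need metadata resolution (such as the method token for Newobj) are left out entirely.

```fsharp
open System.IO

// A deliberately tiny instruction set for the sketch.
type Inst =
    | Nop
    | Ldc_i4_s of sbyte
    | Ldc_i4 of int32
    | Add
    | Ret
    | Ceq
    | Unknown of uint16

// Second byte of a two-byte opcode (the first byte was 0xFE).
let decodeTwoByte (lb : byte) : Inst =
    match lb with
    | 0x01uy -> Ceq
    | _ -> Unknown (0xFE00us ||| uint16 lb)

// Decodes a byte array into instructions. An Unknown opcode that carries an
// operand would desynchronise the stream; a real decoder must know every
// opcode's operand size.
let decode (il : byte []) : Inst list =
    use reader = new BinaryReader(new MemoryStream(il))
    let atEnd () = reader.BaseStream.Position >= reader.BaseStream.Length
    let rec loop acc =
        if atEnd () then List.rev acc
        else
            let inst =
                match reader.ReadByte() with
                | 0xFEuy -> decodeTwoByte (reader.ReadByte())
                | 0x00uy -> Nop
                | 0x1Fuy -> Ldc_i4_s (reader.ReadSByte())
                | 0x20uy -> Ldc_i4 (reader.ReadInt32())
                | 0x58uy -> Add
                | 0x2Auy -> Ret
                | b -> Unknown (uint16 b)
            loop (inst :: acc)
    loop []

// Hand-assembled sample: nop; ldc.i4.s 5; ldc.i4 42; add; ceq; ret
let sample =
    [| 0x00uy; 0x1Fuy; 0x05uy; 0x20uy; 0x2Auy; 0x00uy; 0x00uy; 0x00uy
       0x58uy; 0xFEuy; 0x01uy; 0x2Auy |]
let decoded = decode sample
```

Returning Unknown instead of throwing is a design choice for exploration; for real analysis, failing fast on an unrecognised opcode is safer.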

So we now have a function that will go from a method to a list of opcodes and operands (MethodBase -> inst []). These are essentially the same steps we would perform if we were writing an interpreter for a textual language; taking the source and transforming it into an abstract syntax tree. In that case it’s a tree rather than a list, but the next step is pretty much the same anyway: we pattern match over it. This is the point where we can decide how we want to interpret the instruction stream.

        insts
        |> List.iter (fun inst ->
            match inst with
            | Newobj(meth) ->
                printf "NEW: %s.%s\n" meth.DeclaringType.Namespace meth.DeclaringType.Name
            | _ ->
                ())

Here we need to make some compromises based on the problem domain. I’m not trying to create a general purpose static analyser, but one that will work on object code in a certain format – that generated by the F# compiler. As such we make some assumptions and use some knowledge about the internals of the compiler to get the result we’re after. To be specific we’re relying on the fact that the compiler generates types for closures, and we assume that closures will always be called, even though in reality they needn’t be.

So based on this, we can put together something that, given an entry point – a particular method on a type – can recurse through the code, following references to other methods and types via the Newobj, Call, Calli and Callvirt instructions. This will build up a graph of all types referenced directly from our starting point. We also use our intimate knowledge of the purpose of F#’s FastFunc type (from which all functions are derived) and always follow its Invoke method if we find an instance of that type, even if it’s not directly referenced.
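The traversal itself is the simple part. As a sketch, assuming a hypothetical calleesOf function that returns the methods referenced by a given method (in the real tool that information would come from the decoded Newobj, Call, Calli and Callvirt instructions), the reachable set can be built like this. The toy graph of method names is invented for the example.

```fsharp
// Depth-first walk from an entry point, collecting everything reachable.
// The "seen" set stops us looping forever on recursive or mutually
// recursive methods.
let reachable (calleesOf : string -> string list) (entry : string) : Set<string> =
    let rec visit (seen : Set<string>) (node : string) =
        if Set.contains node seen then seen
        else List.fold visit (Set.add node seen) (calleesOf node)
    visit Set.empty entry

// Toy call graph standing in for real IL analysis.
let toyGraph =
    Map.ofList
        [ ("Main", [ "Parse"; "Run" ])
          ("Parse", [ "Lex" ])
          ("Run", [ "Parse"; "Run" ])
          ("Lex", [])
          ("Unused", [ "Lex" ]) ]

let calleesOf name =
    Map.tryFind name toyGraph |> Option.defaultValue []

let called = reachable calleesOf "Main"
let isBuggyFunctionCalled = Set.contains "Unused" called
```

Asking whether a particular function is called then reduces to a set membership test on the result.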

There are some major caveats. Anything accessed purely via reflection will not be detected. And polymorphic objects passed in and accessed via interfaces will also be missed. Also, I don’t attempt to do full flow analysis; e.g. following branch instructions etc, as this isn’t a common pattern in fsc-generated IL.

Luckily in the particular cases I’m looking at, these shortcomings don’t have a significant impact. Instead, we end up with a reasonably straightforward and useful way of determining whether a particular function is called. It’s already been used in anger to determine whether a buggy function was referenced from some release-candidate software.

As a little post-script: rather than writing your own library from the ground-up to do this, there are some “off-the-shelf” solutions that you can try. Notably the recently released CCI, a common compiler infrastructure out of Microsoft Research, that allows you to reverse engineer IL metadata. I haven’t had a chance to have a good look at this yet, but it seems to do what we need for call graph analysis. There’s also an API called AbstractIL – in the absil.dll assembly – that ships with and is used internally by the F# compiler toolset. This looks extremely powerful, but the API is complex and the documentation is poor. Depending on exactly what your motivation is for looking at this stuff, it’s worth checking if these ready-made libraries will do what you need.

Verifying dynamically generated IL

It’s safe to assume that when you use the C#, F# or (heaven forfend) VB.NET compilers, the IL generated for you will be correct. But if you’re using Reflection.Emit to generate code “by hand” in a dynamic method or assembly, it can be difficult to identify problems with the IL you emit. In the majority of cases the runtime will simply throw an InvalidProgramException. This is, of course, exactly as you’d expect, as the JIT compiler (which generates architecture-specific machine code from the IL) is intended to be highly performant rather than robust to errors, which should’ve been dealt with earlier in the tool chain.

So what tools can you use to troubleshoot problems with dynamic IL? In a word: peverify.