Prove It To Yourself With The CIL

23 June 2019


This post gives an introduction to CIL, along with some examples of where it might help when teaching your friends.

I’m a natural sceptic, a characteristic that has its ups and downs - I will freely admit that I don’t know if the earth really is round because I haven’t seen the whole thing yet (much to the delight of some colleagues). On the positive side of things, a sceptic is often willing to dig a little deeper to be sure an answer is correct, which can lead to many happy hours poking around blogs, documentation and sometimes even Stack Overflow.

This scepticism really hit me when I was shoring up my knowledge of the fundamentals of computer science in anticipation of a friend spending the day in my front room for a computer science 101 boot camp (we did have a reason – the day wasn’t just for fun). How was I to prove any bold claims about programming? Claims such as: the C# conditional AND operator is short-circuiting, and the compiler adds a default constructor if we don’t specify one. I’d always accepted these as gospel, but had I ever bothered to check for myself? I couldn’t remember. Enter one of my favourite sayings.

Read the ~~code~~ CIL.

Common Intermediate Language (CIL) is the language spat out by the C# compiler when we smash that F5 button. It is a low-level language that resembles a half-half curry of assembler and high-level languages. It is a platform and CPU agnostic language that will be just-in-time compiled to real machine instructions by a platform specific Common Language Runtime (CLR) at a later time.

CIL targets a stack-based machine, meaning most instructions will either push a value onto the stack or pop one from it. The operands of instructions are (usually) stored on the stack. I want to stress that we don’t have to be CIL experts to use it for reasoning about our .NET code - we just need to keep an open mind and try to avoid getting bogged down in unnecessary details. If you want to code along I’d highly suggest downloading LINQPad, whose IL tab shows the CIL generated when we run a snippet and was used to write this post. Documentation on each instruction can be found here, or hover over each instruction for more detail. It should also be noted that all examples are compiled without optimisations and in LINQPad 5.
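If you don’t have LINQPad to hand, reflection offers a rough alternative. The sketch below assumes a throwaway method called AddTen (a hypothetical helper, not something from this post’s examples) and dumps its raw CIL bytes. You get opcodes as hex rather than friendly mnemonics, and the exact bytes depend on your compiler and optimisation settings, so no expected output is shown.

```csharp
using System;
using System.Reflection;

public static class IlDump
{
    // Hypothetical method whose CIL we want to inspect.
    public static int AddTen(int i) => i + 10;

    // Returns the raw CIL bytes of a method body via reflection.
    public static byte[] GetIl(MethodInfo method) =>
        method.GetMethodBody().GetILAsByteArray();

    public static void Main()
    {
        MethodInfo method = typeof(IlDump).GetMethod(nameof(AddTen));

        // Prints the bytes as hex pairs, e.g. "02-1F-0A-..." style output.
        Console.WriteLine(BitConverter.ToString(GetIl(method)));
    }
}
```

The final byte should be 2A, the encoding of ret, which is a handy sanity check that you really are looking at a method body.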

Example - Declaring and initializing a variable

    int i = 128;
    // Push the value 128 (0x00000080) onto the stack
    IL_0001: ldc.i4 80 00 00 00
    // Store our 128 at location 0 (where the compiler has chosen for i to live).
    IL_0006: stloc.0 // i

    System.Console.WriteLine(i + 10);
    // Push the value of i back onto the stack.
    IL_0007: ldloc.0 // i
    // Push 10 (0xA) onto the stack
    IL_0008: ldc.i4.s 0A
    // Add the two top values, leaving the result on top of the stack
    IL_000A: add
    // Call the static System.Console.WriteLine method, notice we’re passing a
    // parameter by pushing its value onto the stack before a call instruction.
    IL_000B: call System.Console.WriteLine

There are generally two or three parts on each CIL line (from left to right):

IL_XXXX - This is a label which can be used to refer to lines, you can mostly ignore these or think of them as line numbers.
The instruction, for example "ldc.i4" and "stloc.0" in fig1.
The operands to the instruction, for example "80 00 00 00" in fig1 is the operand to the load constant instruction. Operands might also live on the stack, as with the add instruction.
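To make the push/pop behaviour concrete, here is a toy evaluator for a handful of CIL-like instructions. It is a minimal sketch and nothing like how the CLR really executes code: the instruction names are simplified, the "print" instruction is a made-up stand-in for the call to Console.WriteLine, and every name in it is hypothetical.

```csharp
using System;
using System.Collections.Generic;

// A toy stack machine, for illustration only (not how the CLR works).
public static class ToyStackMachine
{
    public static List<int> Run(IEnumerable<(string Op, int Arg)> program)
    {
        var stack = new Stack<int>();   // the evaluation stack
        var locals = new int[4];        // local variable slots
        var output = new List<int>();   // stands in for Console.WriteLine

        foreach (var (op, arg) in program)
        {
            switch (op)
            {
                case "ldc":   stack.Push(arg); break;                   // push a constant
                case "stloc": locals[arg] = stack.Pop(); break;         // pop into a local
                case "ldloc": stack.Push(locals[arg]); break;           // push a local
                case "add":   stack.Push(stack.Pop() + stack.Pop()); break; // pop two, push sum
                case "print": output.Add(stack.Pop()); break;           // "call" consuming one argument
                default: throw new InvalidOperationException($"Unknown instruction: {op}");
            }
        }
        return output;
    }

    public static void Main()
    {
        var program = new (string Op, int Arg)[]
        {
            ("ldc", 128), ("stloc", 0),   // int i = 128;
            ("ldloc", 0), ("ldc", 10),    // push i, then push 10
            ("add", 0), ("print", 0),     // i + 10, then "WriteLine"
        };

        foreach (int value in Run(program))
            Console.WriteLine(value);     // prints 138
    }
}
```

Notice that add and print take no operand of their own here (the 0 is ignored); they find everything they need on the stack, just like their CIL counterparts.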

Example - Calling a method. Notice how i + 10 is evaluated before the call instruction.

    System.Console.WriteLine(i + 10);
    // Push the value of i back onto the stack.
    IL_0007: ldloc.0 // i
    // Push 10 (0xA) onto the stack
    IL_0008: ldc.i4.s 0A
    // Add the two top values, leaving the result on top of the stack
    IL_000A: add
    // Call the static System.Console.WriteLine method, notice we’re passing a
    // parameter by pushing its value onto the stack before a call instruction.
    IL_000B: call System.Console.WriteLine

So, cool, but there’s nothing too interesting going on here: the CIL and C# resemble each other very closely. Let’s look at some more examples…

Claim: The C# compiler adds a default, parameter-less constructor to a class if we don’t explicitly specify one.

Don’t believe me? I can prove it…

    class MyClass { }

    // Compiles into...
    MyClass..ctor:
    // Push argument 0 onto the stack. Any CIL instance method has an argument 0,
    // which is a pointer to the current object the method is being called upon ('this' in C#).
    IL_0000: ldarg.0
    // Call our base class constructor on this.
    IL_0001: call System.Object..ctor
    IL_0006: nop
    // Return from our method.
    IL_0007: ret

Well would you look at that, with one simple class we can gain some insight into what the C# compiler does for us behind the scenes. We’ve convinced ourselves that the compiler really is adding in a default constructor - MyClass..ctor, it’s right there! In the past, to show this I might have given some wishy-washy reasoning such as “well look – we can initialize this class without an explicit constructor, so there probably has been one added”. Now I can show it really is there!

Another insight we can gain from this example is the call to the System.Object constructor. Anyone who has sat through OOP-101 can tell you constructors are called down the inheritance hierarchy, from the least derived (always System.Object, in C#) to the most derived (assuming single inheritance!). Now I can show this concept to all my friends without wasting time writing a class and a superclass that both print out their class name in the constructor. Just read the CIL and we’ll have time for a pint too.

In summary, we can see the embellishments made by the compiler to our class:

    class MyClass : object
    {
        public MyClass() : base() { }
    }
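If your friend still isn’t convinced by the CIL, reflection gives a second opinion at runtime. This is a minimal sketch: the DefaultConstructorCheck class and its PublicConstructors helper are names made up for illustration, while GetConstructors, GetParameters and BaseType are standard System.Reflection members.

```csharp
using System;
using System.Reflection;

class MyClass { }

public static class DefaultConstructorCheck
{
    // Lists the public instance constructors the compiler emitted for a type.
    public static ConstructorInfo[] PublicConstructors(Type type) =>
        type.GetConstructors(BindingFlags.Public | BindingFlags.Instance);

    public static void Main()
    {
        foreach (ConstructorInfo ctor in PublicConstructors(typeof(MyClass)))
        {
            // ctor.Name is the special name ".ctor", matching MyClass..ctor in the CIL.
            Console.WriteLine($"{ctor.DeclaringType.Name}{ctor.Name} takes {ctor.GetParameters().Length} parameter(s)");
        }

        // The implicit base class the generated constructor calls into.
        Console.WriteLine(typeof(MyClass).BaseType);
    }
}
```

We never wrote a constructor, yet reflection reports exactly one, taking no parameters, on a class whose base type is System.Object.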

Claim: The C# conditional AND operator is short-circuiting.

Don’t believe me? I can prove it… sometimes…

An operator is thought of as short-circuiting if it does not necessarily have to evaluate all its operands. Take the conditional AND operator, which is elegantly described in the C# 5.0 specification as:

Conditional AND (x && y): 	Evaluates y only if x is true

Everyone’s favourite demonstration that captures this short-circuiting nature is to write a couple of side-effecting methods, call them in place of the operands x and y above, then observe…

    void Main()
    {
        bool p = SideEffectOne() && SideEffectTwo();
    }

    static bool SideEffectOne()
    {
        System.Console.WriteLine("SideEffectOne");
        return false;
    }

    static bool SideEffectTwo()
    {
        System.Console.WriteLine("SideEffectTwo");
        return true;
    }

The example above only writes the string literal “SideEffectOne” to the console meaning SideEffectTwo was never evaluated or invoked, but maybe we just got lucky. Better explore the CIL to be sure… (notice how 0 represents false)

    // call our first method, leaving its result on the stack top
    IL_0001: call UserQuery.SideEffectOne
    // pop the result from the stack and jump to IL_000F if it was false
    IL_0006: brfalse.s IL_000F
    // call our second method, leaving its result on the stack top
    IL_0008: call UserQuery.SideEffectTwo
    // always jump to IL_0010 (no value popped!)
    IL_000D: br.s IL_0010
    // load integer value of 0 onto the stack top
    IL_000F: ldc.i4.0
    // store the value at the top of the stack into the memory chosen for p
    IL_0010: stloc.0 // p

Perhaps it’s just me, but I love how devious the compiler gets here. It has pulled a fast one – generating code to evaluate a conditional AND without using a single CIL "and" instruction. Let’s convince ourselves this is all above-board.

Even though SideEffectOne is guaranteed to return false, the compiler has generated code covering both possibilities. fig7 considers the scenario where SideEffectOne returns false, fig8 true. In both examples the state of the stack is shown on the far right.

    IL_0001: call SideEffectOne      [0]
    IL_0006: brfalse.s IL_000F       []
    IL_000F: ldc.i4.0                [0]
    IL_0010: stloc.0 // p = 0        []

We can see from the first path the compiler was paying attention in its first Boolean Algebra lecture – stylishly utilizing the identity 0 ∧ p ≡ 0, or in English, anything ANDed together with false will always equal false. If the result of SideEffectOne is false, the short-circuiting nature of the conditional AND operator comes into play – the variable p will always be set to false and SideEffectTwo will never be called, due to the brfalse.s skipping right over the call.

    IL_0001: call SideEffectOne      [1]
    IL_0006: brfalse.s IL_000F       []
    IL_0008: call SideEffectTwo      [p]
    IL_000D: br.s IL_0010            [p]
    IL_0010: stloc.0 // p            []

When SideEffectOne returns true we suddenly start to care about the return value of SideEffectTwo – that’s all we care about, in fact. There’s no short-circuiting involved, however we do get to browse the compiler’s bag of Boolean tricks one more time – specifically the identity 1 ∧ p ≡ p. Anything ANDed with true is simply itself, meaning p will always be assigned the value returned by SideEffectTwo, regardless of which way the coin landed.

Recalling what the C# specification has to say about conditional AND: “Evaluates y only if x is true”, we can see this perfectly describes the two paths through our CIL instructions – the second call instruction is only executed if the first returns true. A similar argument can be made about the conditional OR operator, which is of course left as an exercise to the reader.
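For anyone attempting that exercise, here is one possible starting point in the old-fashioned side-effect style, which you can then paste into LINQPad to inspect the CIL. It is only a sketch: the ConditionalOrDemo class and its Record helper are hypothetical names. The identities to look for are the mirror images of the ones above: 1 ∨ p ≡ 1 and 0 ∨ p ≡ p.

```csharp
using System;
using System.Collections.Generic;

public static class ConditionalOrDemo
{
    // Records which operands were actually evaluated.
    public static readonly List<string> Calls = new List<string>();

    // Side-effecting operand: logs its name, then returns the given value.
    static bool Record(string name, bool result)
    {
        Calls.Add(name);
        return result;
    }

    public static bool Evaluate(bool left, bool right)
    {
        Calls.Clear();
        return Record("left", left) || Record("right", right);
    }

    public static void Main()
    {
        Evaluate(true, false);
        Console.WriteLine(string.Join(", ", Calls));   // left

        Evaluate(false, true);
        Console.WriteLine(string.Join(", ", Calls));   // left, right
    }
}
```

The right-hand operand is only evaluated when the left one is false, which is the conditional OR counterpart of “Evaluates y only if x is true”.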

A strange counter example…

Compiling the following snippet:

    bool p = ...;
    bool q = ...;
    bool r = p && q;

    // p and q setup here…
    // push p onto stack top
    IL_0005: ldloc.0 // p      [p]
    // push q onto stack top
    IL_0006: ldloc.1 // q      [p, q]
    // AND together top two stack values
    IL_0007: and               [p && q]
    // Store our result into the memory location assigned to r
    IL_0008: stloc.2 // r      []

We see that both p and q are evaluated without any short-circuiting. I suppose the compiler skips short-circuiting here as it would add an extra branch instruction, increasing the size and complexity of the generated code. This only works because evaluating both operands (just reading two locals) has no side effects. For completeness’ sake, here’s the CIL generated by LINQPad 4.59.00 demonstrating the extra branch instruction. Look familiar?

    IL_0005: ldloc.0 // p
    IL_0006: brfalse.s IL_000B
    IL_0008: ldloc.1 // q
    IL_0009: br.s IL_000D
    IL_000B: ldc.i4.0
    IL_000D: stloc.2 // r

Claim: Named parameters are evaluated in the order specified at the calling site (rather than the order on the method signature), from left to right.

Don’t believe me? You get the picture…

Introduced in C# 4, named parameters allow a programmer to specify a parameter name at the call site of a method invocation, which in turn allows parameters to be passed to the method in a different order to the method definition. The very verbose and uncool example I would usually come up with to demonstrate the order of evaluation without whipping out the CIL-scope was:

    void Main()
    {
        Method(p1: GetParameter(1), p2: GetParameter(2));
        Method(p2: GetParameter(2), p1: GetParameter(1));
    }

    static void Method(int p1, int p2)
    {
        System.Console.WriteLine($"p1 = {p1}, p2 = {p2}");
    }

    static int GetParameter(int num)
    {
        System.Console.WriteLine($"Evaluating {num}");
        return num;
    }

If you got to this line without reading the code above, excellent work – who has time to read tens of lines when one will do? If you did read the above, you can pretty much forget it… At least we can appreciate how much time creating smaller examples and looking at the CIL saves us.

Parameters are passed to a method on the stack, pushed in signature order so that the last parameter ends up on top; e.g. string.Equals(a, b) will require b on the stack top, and a one below (top of the stack is to the right, remember).

What if we reverse the order of a and b?

We can see at a glance from fig13 below that System.Environment.NewLine is evaluated before System.Environment.UserName, despite the method signature being the opposite.

    // call the getter for NewLine leaving the result on the stack top.
    // this will be parameter b
    IL_0000: call System.Environment.get_NewLine    [“\n”]
    // store the result into location 0
    IL_0005: stloc.0                                []
    // load parameter a
    IL_0006: call System.Environment.get_UserName   [“Ned”]
    // load parameter b again, notice a is still on the stack.
    IL_000B: ldloc.0                                [“Ned”, “\n”]
    // call string.Equals with our two parameters on the stack in order.
    IL_000C: call System.String.Equals              [false]

Notice how when Equals is called both stacks are identical, but some extra gunk is needed in the second example to get the stack into the correct state. The compiler can’t guarantee that the get_NewLine method has no side-effects, so is forced to evaluate parameter b before a, giving us two extra instructions - stloc and ldloc.

The stloc is included (conceptually) to temporarily store the result of get_NewLine, since it needs to be evaluated first but pushed onto the stack as Equals’ second parameter, b. The ldloc instruction resurrects our get_NewLine result and pushes it back onto the stack as the second string.Equals parameter.
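For reference, the two call sites being compared would look roughly like this in C#. This is a sketch under assumptions: the wrapper class and method names are made up, and the calls are inferred from the CIL above rather than copied from an original listing. The parameter names a and b are those of the static string.Equals(string, string) method.

```csharp
using System;

public static class NamedArgumentCallSites
{
    // Arguments in signature order: UserName (a) is evaluated first, then NewLine (b).
    public static bool InSignatureOrder() =>
        string.Equals(a: Environment.UserName, b: Environment.NewLine);

    // Arguments swapped at the call site: NewLine (b) is evaluated first,
    // so the compiler has to park it in a temporary before evaluating a.
    public static bool Swapped() =>
        string.Equals(b: Environment.NewLine, a: Environment.UserName);

    public static void Main()
    {
        Console.WriteLine(InSignatureOrder());
        Console.WriteLine(Swapped());
    }
}
```

Both calls hand string.Equals the same two values in the same positions, so they return the same result; only the order in which the getters run differs.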

So, there you have it, next time your friend asks you how named parameters are evaluated you can do better than just telling them and in about as much time. Your most perceptive friend might then inquire…

“It seems to me that there’s some overhead here at the CIL level of passing parameters to methods out of order due to our extra copy instructions, so you should avoid supplying parameters out of order... right?”

That was certainly my first thought, but from reading around online I couldn’t find much conclusive evidence either way. This extra copy does sound like the kind of thing that can be optimized out when a machine is free to use registers rather than a stack to pass parameters.

In summary, I think it’s well worth getting to grips with basic CIL when you’re showing someone the ropes - you can keep the code examples concise and hopefully provide a deeper level of understanding. This brings back a memory from years ago when I was watching a lecture from Stanford, introducing programming with Java – the prof made a comment along the lines of “Java gets translated into Java bytecode which not a lot of people in the world can read, and those people are weird”. Anyone fancy getting weird with me?