Before the release of .NET in 2002, compilation produced binary files that could run only on the computer where they were compiled and on computers of similar architecture. Those files, which took the form of executables (.exe) or dynamic link libraries (.dll), ran directly on the operating system. If they needed a service, such as memory management or access to a port, they sent requests to the operating system, typically through Win32 API calls. This is what is referred to as unmanaged code, or native code: code compiled to machine instructions for a specific platform, which runs without an intermediate runtime.
.NET changed things. Although .NET compilation produces portable executable (PE) files with the same extensions as unmanaged Windows binaries (.exe or .dll), the contents are very different. A .NET binary is called an assembly, and unlike its unmanaged counterpart, which holds platform-specific machine code, it holds platform-neutral code together with metadata that describes it. Specifically, it is made up of four elements: the assembly manifest, type metadata, MSIL code, and optionally a set of resources. MSIL is a low-level programming language, reminiscent of Java bytecode, and it is the output of all .NET-aware compilers. For example, if a C# compiler, a VB compiler, and an F# compiler each processed equivalent source code in their respective languages, the three assemblies emitted would contain very similar IL, and each could run on machines of different architectures. This is possible because the CLR (Common Language Runtime) component of .NET translates MSIL into native machine code for the processor it is running on. So with .NET, the programmer is freed from much machine-specific, low-level work and can focus more on the application. The .NET platform is a layer resting on top of the operating system that separates the application from the underlying mechanics.
As mentioned earlier, a .NET assembly can consist of four elements:
- The assembly manifest, which contains metadata specifying the properties of the assembly.
- Type metadata.
- MSIL (Microsoft Intermediate Language) code that implements the types. It is generated by a .NET compiler from source code written in a .NET language.
- A set of resources, which is non-executable data stored in an assembly.
Assembly manifest
The assembly manifest contains information about the identity of the assembly. If the assembly is single-file, the manifest is part of the portable executable (PE) file and shares space with the Microsoft Intermediate Language code. If it is a multi-file assembly, the manifest usually exists as a standalone PE file that contains only assembly manifest information.
The following list describes the information contained in the assembly manifest. The first four items (assembly name, version number, culture, and strong name information) make up the assembly’s identity.
- Assembly name: A string that specifies the name of the assembly.
- Version number: The major and minor version numbers, followed by the build and revision numbers.
- Culture: Information on the culture or language that the assembly supports.
- Strong name information: The public key from the publisher, if the assembly has been given a strong name.
- List of files: A hash of each file contained in the assembly, and a file name.
- Type reference information: Information the CLR uses to map a type reference to the file that contains its definition and implementation.
- Referenced assemblies: The list of other assemblies that are statically referenced by the assembly.
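As a rough illustration of the list above, the identity items and the referenced assemblies can be read back at runtime through reflection. The following is a minimal sketch, not a definitive tool: the class name ManifestSketch and the helper DescribeIdentity are hypothetical names chosen for this example, while the reflection calls themselves (Assembly.GetName, AssemblyName.GetPublicKeyToken, Assembly.GetReferencedAssemblies) are standard .NET APIs.

```csharp
using System;
using System.Reflection;

// Hypothetical helper class, introduced only for this illustration.
public static class ManifestSketch
{
    // Formats the identity portion of an assembly's manifest as one line:
    // name, version, culture, and (if present) the strong-name key token.
    public static string DescribeIdentity(Assembly assembly)
    {
        AssemblyName name = assembly.GetName();
        byte[] token = name.GetPublicKeyToken() ?? Array.Empty<byte>();

        string culture = string.IsNullOrEmpty(name.CultureName) ? "neutral" : name.CultureName;
        string strongName = token.Length == 0 ? "none" : BitConverter.ToString(token);

        return $"{name.Name}, Version={name.Version}, Culture={culture}, PublicKeyToken={strongName}";
    }

    public static void Main()
    {
        Assembly assembly = typeof(ManifestSketch).Assembly;
        Console.WriteLine(DescribeIdentity(assembly));

        // The manifest also records the assemblies this one statically references.
        foreach (AssemblyName reference in assembly.GetReferencedAssemblies())
        {
            Console.WriteLine($"  references {reference.Name} {reference.Version}");
        }
    }
}
```

Running this against an assembly without a strong name should report "none" for the key token; the exact version and reference list depend on the project being built.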
Type metadata
Whereas the manifest describes the assembly itself, the type metadata describes the contents of the assembly. More specifically, it describes in detail every type within the binary. The types can be built-in, such as int, double, or string, or user-defined, such as classes, interfaces, enums, and structs. In addition to the name of each type, the type metadata records its containing namespace, its base class, the interfaces it implements, its visibility, and its members, including each method’s parameters and each property’s type.
The type metadata is generated automatically by the compiler from the source files and embedded in the output file, that is, the .NET assembly.
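Because the type metadata travels with the assembly, it can be inspected at runtime through reflection. The sketch below is only an illustration: IShape, Circle, and TypeMetadataSketch are hypothetical types invented for this example, and the output format is arbitrary. It reads back a type's name, base class, implemented interfaces, visibility, and method signatures.

```csharp
using System;
using System.Linq;
using System.Reflection;

// Hypothetical user-defined types, present only so there is metadata to inspect.
public interface IShape
{
    double Area();
}

public class Circle : IShape
{
    public double Radius { get; set; }

    public double Area() => Math.PI * Radius * Radius;

    public Circle Scale(double factor) => new Circle { Radius = Radius * factor };
}

public static class TypeMetadataSketch
{
    // Summarizes a type's name, base class, interfaces, and visibility.
    public static string Describe(Type type)
    {
        string interfaces = string.Join(", ", type.GetInterfaces().Select(i => i.Name));
        return $"{type.FullName} : {type.BaseType?.Name} [{interfaces}] public={type.IsPublic}";
    }

    public static void Main()
    {
        Type type = typeof(Circle);
        Console.WriteLine(Describe(type));

        // List the public instance methods declared on the type itself,
        // together with their return types and parameters.
        BindingFlags flags = BindingFlags.Public | BindingFlags.Instance | BindingFlags.DeclaredOnly;
        foreach (MethodInfo method in type.GetMethods(flags))
        {
            string parameters = string.Join(", ",
                method.GetParameters().Select(p => $"{p.ParameterType.Name} {p.Name}"));
            Console.WriteLine($"  {method.ReturnType.Name} {method.Name}({parameters})");
        }
    }
}
```

Note that the property Radius shows up as a pair of accessor methods (get_Radius and set_Radius), which is how properties are represented in the metadata.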
MSIL (Microsoft Intermediate Language)
The build process of a .NET application converts the source code files of the managed languages involved (C#, VB, F#, etc.) into .NET assemblies that contain MSIL code, or IL for short. IL is a programming language in its own right, just like C#. But one would probably not want to write an entire .NET application in IL, because it is low-level and far removed from the readable syntax of high-level languages like C#.
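To give a feel for how low-level IL is, the following sketch compiles a trivial C# method and then reads back the raw IL bytes the compiler emitted for it. The names IlSketch, Add, and GetAddIl are hypothetical, and the instruction listing in the comment describes what an optimized build typically produces; a debug build usually contains extra instructions, so the exact bytes printed are not guaranteed.

```csharp
using System;
using System.Reflection;

// Hypothetical class, introduced only for this illustration.
public static class IlSketch
{
    // A trivial method whose compiled IL is easy to read.
    public static int Add(int a, int b)
    {
        return a + b;
    }

    // Returns the raw IL bytes that the compiler emitted for Add.
    public static byte[] GetAddIl()
    {
        MethodInfo method = typeof(IlSketch).GetMethod(nameof(Add))
            ?? throw new InvalidOperationException("Add was not found.");
        MethodBody body = method.GetMethodBody()
            ?? throw new InvalidOperationException("No IL body is available.");
        return body.GetILAsByteArray() ?? Array.Empty<byte>();
    }

    public static void Main()
    {
        // In an optimized build the body is typically four one-byte instructions:
        //   ldarg.0 (0x02)  push the first argument onto the evaluation stack
        //   ldarg.1 (0x03)  push the second argument
        //   add     (0x58)  pop both values and push their sum
        //   ret     (0x2A)  return the value on top of the stack
        // A debug build usually adds nop and local-variable instructions.
        Console.WriteLine(BitConverter.ToString(GetAddIl()));
    }
}
```

The stack-based style visible here (push operands, apply an operation, return the result) is what separates IL from the expression syntax of C#.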
Compiling high-level source code into IL has several benefits. One benefit is .NET “multilingualism”. Since all .NET-aware compilers emit the same IL instruction set, a .NET application can have components written in multiple languages. Another benefit is platform-agnosticism. IL code is compiled on the fly into platform-specific CPU instructions that are optimized for the underlying architecture. This is done by the CLR’s JIT (just-in-time) compiler. Because each JIT compiler is tailored to its target environment, a .NET application deployed to a mobile device can be compiled to run within tight memory constraints, while one deployed to a back-end company server can take advantage of a high-memory environment.
Learning IL may not seem necessary, but there are advantages to understanding IL grammar:
- A .NET binary might need to be modified to interoperate better with COM components.
- It might be desirable to tinker with the Common Type System. This is not supported by high-level languages like C#, but it is possible with IL.
- Dynamic assemblies, which can exist purely in memory, can be built at runtime using the System.Reflection.Emit namespace. This can be a useful technique for tool builders.
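The last point can be sketched as follows. This is a minimal example under stated assumptions rather than a recommended design: the names EmitSketch, BuildAdd, DynamicSketch, MainModule, and Calculator are all hypothetical, and the assembly is created with AssemblyBuilderAccess.Run, so it lives only in memory. The example emits the IL for a static Add method by hand and then invokes it.

```csharp
using System;
using System.Reflection;
using System.Reflection.Emit;

// Hypothetical class, introduced only for this illustration.
public static class EmitSketch
{
    // Builds an in-memory assembly containing Calculator.Add(int, int)
    // and returns a MethodInfo for the emitted method.
    public static MethodInfo BuildAdd()
    {
        AssemblyBuilder assembly = AssemblyBuilder.DefineDynamicAssembly(
            new AssemblyName("DynamicSketch"), AssemblyBuilderAccess.Run);
        ModuleBuilder module = assembly.DefineDynamicModule("MainModule");
        TypeBuilder type = module.DefineType("Calculator", TypeAttributes.Public | TypeAttributes.Class);

        MethodBuilder method = type.DefineMethod(
            "Add",
            MethodAttributes.Public | MethodAttributes.Static,
            typeof(int),
            new[] { typeof(int), typeof(int) });

        // Hand-written IL: push both arguments, add them, return the result.
        ILGenerator il = method.GetILGenerator();
        il.Emit(OpCodes.Ldarg_0);
        il.Emit(OpCodes.Ldarg_1);
        il.Emit(OpCodes.Add);
        il.Emit(OpCodes.Ret);

        Type created = type.CreateType()
            ?? throw new InvalidOperationException("The type could not be created.");
        return created.GetMethod("Add")
            ?? throw new InvalidOperationException("Add was not found.");
    }

    public static void Main()
    {
        MethodInfo add = BuildAdd();
        object result = add.Invoke(null, new object[] { 2, 3 })
            ?? throw new InvalidOperationException("Add returned no value.");
        Console.WriteLine(result); // prints 5
    }
}
```

Writing the four Emit calls requires knowing which IL instructions to use and in what order, which is where familiarity with IL grammar pays off.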