The purpose of this document is to introduce developers and other technical staff to the most important piece of Microsoft's .NET strategy, the Common Language Runtime.
This piece is intended to be a resource to the following audiences:
Managers, developers, systems administrators and others with a technical background who want to understand what the CLR is, and how it fits into Microsoft's .NET strategy
After successfully completing this tutorial, you should be able to:
Prerequisites: familiarity with basic programming concepts.
Microsoft's .NET is a broad family of products representing the company's next generation of services, software, and development tools. At the core of the .NET strategy lives the Common Language Runtime. The CLR is a platform for software development that provides services by consuming metadata. It is standards-based and component-oriented. As with any platform, the important pieces are the runtime, the libraries that provide access to it, and the languages that can target the platform.
The aim of this tutorial is to provide a foundation for forming and answering questions about the technical aspects of the CLR. We will examine this technology at a high level, defining and touching on each of the core aspects - runtime, libraries, and languages - in turn. Additionally, we'll look at the extensive support the CLR gives to standards-based and component-oriented software development.
We'll start with a broad overview of the CLR - what are the major pieces, what was the motivation for moving to this new model, and what benefits it provides. Then we'll dive down to cover some aspects of the CLR in greater depth. References to sources for further, more detailed study will be given throughout.
The Common Language Runtime is the core of Microsoft's .NET vision.
The .NET vision was officially introduced at the Microsoft Professional Developers Conference (PDC) in Orlando, Florida, in July 2000, although at the time much of the documentation referred to it as "Next Generation Windows Services." Since the PDC, Microsoft has continued to expand the list of products and services associated with the .NET name.
True to their tradition of coining vague marketing terms (think ActiveX - did anyone ever figure out exactly what that meant?), Microsoft has applied the moniker ".NET" to everything from the next version of the Windows operating system to development tools.
This effort on Microsoft's part to frame everything from mice to FoxPro in terms of .NET is actually a good sign: it indicates to consumers such as you and me that Microsoft is serious about the product, that it represents a core part of their strategy, and that they are making a fundamental and massive shift. In the same way that they did with COM in the mid-1990s and with the Internet in later years, Microsoft is "betting the company" on this new technology.
But what exactly is .NET? Although the precise meaning can be a little hard to isolate by reading the prolific marketing literature, a little digging reveals that .NET is in fact Microsoft's grand strategy for how all of their software, systems, and services will fit together. It includes development tools (like the new version of Visual Studio, dubbed Visual Studio.NET), future versions of their Windows operating systems, new Internet-based services (like a stepped-up version of their Passport web authentication service), and an entirely new beast called the Common Language Runtime.
The Common Language Runtime is the single most important piece of the .NET product strategy, because it is in essence the engine that pulls the train - the CLR is how developers will write software in the brave new .NET world (see figure 1). For that reason, this writing will focus on the CLR exclusively.

The CLR is a development platform. Like any platform, it provides a runtime, defines functionality in some libraries, and supports a set of programming languages.
The CLR is a platform for developing applications. A platform is a set of programmatic services, exposed through some API to developers using one or more languages. Development generally targets a single platform; when I write a program using Visual Basic, I say that I'm writing it for Windows, my target platform. The forms and controls that I develop won't run directly on, say, Apple's Mac OS X.
The CLR is not an operating system in the strict sense of the term - it does not, for example, provide a file system, relying instead on the underlying OS (such as Windows) to implement that feature. The CLR is, however, a platform, and in much the same way that code written for Unix will not run on Windows, code must specifically target the CLR. Don't panic, though, because there's plenty of consideration given to interoperating with existing, non-CLR code. You'll still be able to use your existing COM objects and DLLs while taking advantage of the new features of CLR development.
The Common Language Runtime is Microsoft's development platform of the future. In Microsoft's vision of the world, most future software will be written to make use of the CLR features. We'll be looking at what the CLR provides you so you can decide for yourself whether the advantages outweigh the costs. Now, assuming that you agree that this new platform offers significant advantages over your current platform, you might wonder, "What's the big picture? What things do I need to learn in order to develop for the CLR?"
When I approach any new platform, be it a new operating system, the CLR, or even an application suite that allows automation of its features, like Microsoft Office or SAS, I mentally break down the feature set of the platform into three fundamental areas: the runtime that the platform offers, the libraries it defines, and the languages I'm going to use. These aspects of the platform overlap (see figure 2), and understanding each of them and the ways in which they interact is crucial to becoming an effective CLR programmer.

A runtime provides services to software that you write. The CLR provides a runtime.
Let's start with the first and arguably most important of the three fundamental areas in figure 2, the runtime. A runtime is a piece of code, written by the platform vendor, which provides your code with a set of services. What sorts of services? Well, it depends on the platform - it might be anything from checking security for you to implementing a file system to providing access to some piece of hardware.
I like to think of a runtime as a sort of butler for my code. My code asks the runtime to do things, and the runtime goes off and does them. This is nice because it means I can concentrate on writing code that has to do with the problem I'm trying to solve - like providing a system that approves customers for mortgages over the web - and not on writing a whole lot of low-level grungy code having to do with things like interacting with hardware.
Almost every program written these days takes advantage of some sort of runtime. Very few programmers start a project by writing their own file system or database engine. Rather, they make use of already-written pieces of software, like Windows 2000 or Oracle. Both of these platforms (yep, they're both platforms) have a runtime that provides services. In the case of Windows 2000, the runtime is the operating system kernel, and it provides services like thread management. In the case of Oracle, the runtime is the database engine, and the services include things like a SQL engine and transactions. We say that we write code that "runs on" a platform because it uses the services provided by the platform. The CLR provides these services using a layered architecture that is shown in figure 3. In some cases our code uses the CLR directly and in others indirectly.

Some other examples of a runtime you may have used before are the Visual Basic Runtime, the SQL Server Engine, and the COM+/MTS runtime. Each of these provides services in a generic way so that you can rely on the runtime to do the work, rather than having to write the code yourself every time.
The CLR is a runtime. In fact, that's what the "R" stands for. The documentation refers to this as the execution engine, but that's just another name for the same thing. I'll just call it "the runtime" or "the CLR" from here on out, and we'll know we're talking about the CLR's execution engine. We'll talk extensively about what services the runtime provides as we get into more detail later on.
The job of the CLR is to watch over your code, taking care of its needs and wants. In fact, code that takes advantage of the CLR is said to be "managed code", because the runtime is taking care of it, making sure that it has everything it needs and ensuring that it doesn't do anything it's not supposed to. "Unmanaged code" is the code that either doesn't know about or doesn't want to use the runtime. Both types of code are shown in figure 3.
In figure 3, we can see the CLR[1] sitting just below our code. Note that the CLR requires the presence of some operating system - it is not an operating system itself. Today that means Windows 2000, Windows NT4, Windows 98, Windows ME, or Windows XP must be installed on the machine where you want to use the CLR, but eventually support for other operating systems may be developed[2].
Note also that unmanaged and managed code can coexist on the same system, and can communicate with each other. This is important to preserve your investment in any code you have today that doesn't use the CLR. We'll talk in more detail about this later on.
The CLR's Base Class Library allows us to interact with the runtime, and provides additional useful functionality.
In order for my code to take full advantage of the runtime, I need a way to interact with the CLR. If I were programming unmanaged code where the operating system was the one providing the services I needed (e.g. drawing windows as part of a user interface), I would use a set of functions provided for me in the operating system's API, or Application Programming Interface. These are represented by the vertical arrows shown in figure 3.
The CLR Software Development Kit (SDK) provides an API. Because the CLR favors object-oriented programming (although other types of programming are supported), this API takes the form of a set of classes. Collectively, they are referred to as the Base Class Library, or BCL. Through the classes in the BCL[3], we can interact with the runtime, influencing the way that the runtime's services are provided to us.
That, however, is not all that the BCL gives us. In addition to giving us an "in" to the runtime, the BCL classes provide a large number of useful utilities. These include things like a new database access library, ASP.NET, and an XML parser with support for the latest XML specifications. There are literally thousands of classes in the BCL, so a complete listing is beyond the scope of this article.
We'll come back and look in more detail at some of the functionality provided by the BCL in the second half of this writing. For complete information, check out the documentation at http://msdn.microsoft.com/library under the heading ".NET Development".
The CLR supports programming in one of about two dozen languages. The most popular of these are likely to be C# and Visual Basic.NET.
Having a set of libraries and a runtime is great, but neither one of them does me any good if I can't write programs to take advantage of them. In order to do that, we need to use some programming language with a compiler that is runtime-aware.
Microsoft currently lists over twenty different languages with which it will be possible to write software that targets the CLR[4]. Microsoft themselves ship support for five languages with the SDK: C#, Visual Basic.NET, IL, C++, and JScript.NET. Of these, C# and Visual Basic.NET are likely to be the languages most often used to develop software for this new platform.
C# (pronounced "see-sharp") is a new language with a C and C++ heritage. Developers familiar with either existing language or with Java will find the syntax very familiar.
Visual Basic.NET is an updated version of Microsoft's most popular programming language. The syntax differs significantly from that of VB6, and support has been added for a raft of object-oriented mechanisms not available in VB6. Moving code from VB6 to VB.NET in order to take advantage of the CLR will in most cases be a significant porting effort, even with the automated help that Visual Studio.NET provides[6] [7].
Having a runtime, providing libraries, and supporting a set of programming languages is something that all platforms do. What other things does the CLR enable that I can't easily do with straight-ahead, unmanaged Windows programming? In two areas, the CLR really shines: its support for component-based programming, and its extensive use of open, standards-based technologies.
The CLR has extensive support for component-based programming.
A fairly recent trend in software development is that of component-oriented programming, although the idea behind component development is not a new one. In fact, it's not even unique to software engineering: we stole it from the hardware guys. The basic concept is that systems are built out of discrete parts that can be assembled to make a larger whole. This is similar to the way, for example, your CD player is composed of a number of computer chips wired together with a laser and an LCD display.
The idea in software engineering is to leverage this same pattern. Software systems can be built from software components. A software component is a discrete piece of functionality that can be plugged into different applications. For example, I might develop a calendar component that allows the user to pick a day of the month. It could then be used in a Visual Basic application, on a web page, or in a Microsoft Word document. Of course, components do not have to be visual - a reusable piece of logic for calculating sales tax could also be written as a component. The important idea is that they're independent pieces that can be plugged together in a large system.
Component development has numerous attractive benefits. For one, the ability to assemble existing parts into a system offers the hope of true code reuse, a long sought-after goal. And even if you write all your own code from scratch, organizing it as components can ease maintenance by allowing independent bug fixes to be deployed.
The CLR is, from the ground up, a component-oriented platform. Every piece of CLR functionality that we run must belong to a component - an assembly in CLR parlance. Further, this component-orientation is deeply rooted in the mechanisms of the CLR, to the point where the security subsystem and the loader (among others) have the notion of component baked in.
The CLR supports several standards, such as XML and SOAP. The CLR has itself been submitted as an open standard.
Although the heady days of buying stock solely based on the presence of the letter "e" or the suffix ".com" have come and gone, there's no question that the Internet offers new opportunities for businesses. Systems that take advantage of what a global communication network can offer are here to stay.
Without standards, however, this would not be possible. A standard is a document that describes a convention or protocol that everyone agrees to follow. One example of a standard is HTTP - the Hypertext Transfer Protocol. Without HTTP, you wouldn't be able to walk up to any website with any browser and view the contents. Imagine if I could only view Linux websites with a Netscape browser running on the same version of Linux. It would severely limit the utility of the web. Because we have standards like HTTP, broad reach is possible.
Support for standards is of key importance to developers creating software today. The CLR recognizes this, and provides explicit support for standards-based computing, making extensive use of things like XML[8] and SOAP[9].
In addition to helping you write applications that conform to accepted standards - meaning that your applications have a better chance of being able to interoperate with other applications over the Internet - it turns out that the CLR itself is an implementation of a standard.
Microsoft has submitted the CLI - the Common Language Infrastructure - to ECMA, an international standards organization. The CLI is the specification that describes the CLR. It can be found at http://www.ecma-international.org/ [10]. Additionally, the C# language has been submitted as a proposed standard to the same body.
This is very good news for those interested in interoperability. With the CLI in place, it will be possible for other vendors to provide compatible runtimes on other platforms, making it possible for the code that you develop on Windows to run on, say, Linux. An open-source version of the CLR for Linux is already under development[11].
The CLR has its origin in current Windows technologies. Examining the history of component software helps us understand the CLR.
We've painted a picture of the CLR in broad strokes. It's a platform for software development that provides a runtime, offers a set of classes in the Base Class Library, and supports programming in any of several languages. The runtime supports component-oriented, standards-based programming, and it provides a number of services to code.
But what exactly are these services that the CLR provides? So far, we've been pretty vague on this point, but the time has come to turn on the spotlight and take a closer look. In order that we can better understand why the CLR does what it does, let's examine where it came from.
DLLs were the original component technology for Windows.
Interestingly, we've had a platform for doing component-oriented programming for quite some time; namely, Microsoft Windows[12]. One of the first forms of the software component was the DLL. Dynamic Link Libraries provided a way to assemble discrete chunks of code into a functioning system. They even have a place to list other components that they require in something called an import table. This is useful for determining what other pieces of code need to be present on the system in order for this component to run. Think of it as a set of assembly instructions to the consumer of the DLL.
Additionally, a DLL contains information about what functions it implements. This information goes into an export table, and consists of a function name (such as f) along with a relative virtual address (RVA). The RVA allows us to load a DLL into any location in memory. This is important, as the author of the DLL does not know at compile time what this address might be. How could they? We may have multiple DLLs running in the same process at any given time, and they all need to load at unique addresses.
Without knowing at compile time all of the DLLs that might load into the same process with us (and by extension all of the processes that we might load into), there is no way for the author of a DLL to make assumptions about absolute locations in memory. Fortunately, Windows knows about DLLs, and the operating system's loader takes care of this problem by walking through the list of RVAs at runtime and adjusting the code to correctly refer to memory locations it wants to use.
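The address arithmetic behind RVAs can be sketched in a few lines of Python. This is a toy model, not the real Windows loader: the export table contents, the function names, and the base addresses are all made-up values, and the `resolve` helper is hypothetical.

```python
# Toy model of an export table: function name -> relative virtual address (RVA).
# The names and offsets are hypothetical, chosen only for illustration.
EXPORT_TABLE = {"f": 0x1000, "g": 0x1A40}


def resolve(base_address, name):
    """Return the absolute address of an exported function.

    The RVA is an offset from wherever the loader placed the DLL, so the
    absolute address is only known once the base address is known.
    """
    return base_address + EXPORT_TABLE[name]


# The same DLL loaded at two different (made-up) base addresses.
print(hex(resolve(0x10000000, "f")))
print(hex(resolve(0x6E000000, "f")))
```

The DLL author only ever writes down the offsets; the base address is supplied at load time, which is why the same file can land anywhere in memory.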
The import and export tables are both examples of metadata. Metadata is data about data. The data in this case is the code inside the DLL, and the data about it describes things like what functions are available in the DLL and what other libraries it depends on. The Windows runtime (the loader) uses this metadata to provide a service - dynamic loading.
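To make "metadata drives a service" a little more concrete, here is a minimal sketch of a dependency check that looks only at a component's import list. The `DllMetadata` class, the `missing_dependencies` function, and the file names are assumptions made for illustration; the real tables are binary structures inside the DLL file.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class DllMetadata:
    """Hypothetical stand-in for the metadata a DLL carries."""
    name: str
    imports: List[str] = field(default_factory=list)  # DLLs this one needs
    exports: List[str] = field(default_factory=list)  # functions it offers


def missing_dependencies(dll: DllMetadata, installed: Set[str]) -> List[str]:
    """List the imported DLLs that are not present on the system."""
    return [dep for dep in dll.imports if dep not in installed]


report = DllMetadata(
    "report.dll",
    imports=["kernel32.dll", "charts.dll"],
    exports=["render"],
)
print(missing_dependencies(report, {"kernel32.dll", "user32.dll"}))
```

Note that the check never looks at the code inside the component. Everything it needs is in the metadata.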
Remember the term "metadata". It's an important one that we'll be making use of a lot from here on.
Note figure 4 - we see the DLL, which contains both code (in the form of a series of x86 assembly instructions) and metadata.

The important thing to understand here is not the intricate details of RVAs or how the Windows loader works. The important thing is that we, as programmers, don't have to write code that worries about details like where we're going to wind up in memory. We just hand the system some metadata and it "does the right thing." As we'll see, the idea of using metadata to drive a runtime that provides services is a very powerful one. It's an idea of which the CLR makes extensive use.
While DLLs were a step in the right direction, they had a number of shortcomings. For one thing, an application could find them only if they were placed in one of a particular set of directories. More limiting, however, was the fact that they were very difficult to use across language boundaries. That is, if I wanted to write some functionality in C++ and put it into a DLL, there were lots of things I had to watch out for if I wanted to be able to use it from Visual Basic. In fact, when trying to describe the programming interface to the DLL in terms of object-oriented concepts like classes and interfaces, it was often the case that even two C++ compilers from different vendors could not interoperate using DLLs.
Enter COM.
COM has advantages over DLLs as a component technology.
The Component Object Model - COM - attempted to solve the problems faced by DLLs by defining a binary standard for interoperability between languages. Any compiler that was able to emit code that could consume and produce a certain memory structure called a vtable (for "virtual table") could both call and be called by code produced by any other compiler that followed the same standard.
These vtables defined interfaces, the fundamental unit of interoperability in COM. All functionality that COM clients or servers wanted to consume or provide was done in terms of interfaces. This made interoperability possible by hiding details of implementation that languages could not agree on, like memory layout or object lifetime (who cleans up memory and when?). COM essentially consists of the specification of these vtable layouts plus a few rules about how to use them.
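The vtable idea can be modeled loosely in Python as a list of functions in an agreed order. This is only an analogy: a real vtable is an array of function pointers in memory, and the `ICalculator` interface, the slot constants, and both functions below are invented for this sketch.

```python
# Slot positions agreed on by everyone who uses the (hypothetical) ICalculator
# interface. Callers rely only on these positions, never on the object's layout.
SLOT_ADD = 0
SLOT_NEGATE = 1


def make_calculator():
    """One implementation; its internal state stays hidden from callers."""
    state = {"calls": 0}

    def add(a, b):
        state["calls"] += 1
        return a + b

    def negate(a):
        state["calls"] += 1
        return -a

    return [add, negate]  # the "vtable": functions in the agreed order


def client(vtable):
    """Works with any object whose vtable follows the agreed slot order."""
    total = vtable[SLOT_ADD](2, 3)
    return vtable[SLOT_NEGATE](total)


print(client(make_calculator()))
```

The client knows nothing about how the calculator stores its state. It only knows which slot holds which method, and that is the whole contract.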
COM made component-based programming a reality. By offering a technology that made it possible (with some effort) for VB developers to talk to code written in C++ and vice versa, COM increased the potential market for any given component, which made truly reusable code more economically viable. Even where reuse was not the goal, having a cross-language integration technology meant that the appropriate tool could be chosen for the task at hand.
COM provided more than just a certain degree of language interoperability, however. The ability to transparently make calls to objects living on remote servers without writing any grungy network code is one example. COM also provided, through an interface called IDispatch, the ability for languages to make calls to objects without having knowledge of them beforehand. This was particularly valuable in scripting scenarios such as ASP or browser scripts, where the code being executed is never compiled, but rather interpreted at runtime by the web server or browser.
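Calling by name can be sketched as a two-step lookup, loosely modeled on the GetIDsOfNames and Invoke methods of IDispatch. The `Dispatchable` class, its members, and `call_by_name` are hypothetical; this is a simplified illustration, not the COM API.

```python
class Dispatchable:
    """Toy object that can be called by name, in the spirit of IDispatch."""

    def __init__(self):
        # Method table built once; the position in the list serves as the id.
        self._methods = [("Greet", self._greet), ("Add", self._add)]

    def _greet(self, who):
        return "Hello, " + who

    def _add(self, a, b):
        return a + b

    def get_id_of_name(self, name):
        """Map a method name to a numeric id (compare GetIDsOfNames)."""
        for dispid, (known, _) in enumerate(self._methods):
            if known == name:
                return dispid
        raise LookupError("unknown member: " + name)

    def invoke(self, dispid, *args):
        """Call the method with the given id (compare Invoke)."""
        return self._methods[dispid][1](*args)


def call_by_name(obj, name, *args):
    """What a script engine does: look up the name at run time, then invoke."""
    return obj.invoke(obj.get_id_of_name(name), *args)


print(call_by_name(Dispatchable(), "Add", 2, 3))
```

The caller needs no compile-time knowledge of the object. A misspelled name is discovered only when the call is attempted, which is the usual trade-off of late binding.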
These services are - exactly as was the case with DLLs - driven by metadata. In the case of COM, this metadata takes the form of a type library. Type libraries are consumed by the COM runtime to provide transparent remoting, IDispatch invocation, and other services. The type library has in it a description of the set of interfaces - or types - that our component implements. This description includes a listing of every parameter of every method of every interface in the component. It is this information that the COM runtime uses to do things like synthesize the network proxies that are used when making remote calls to objects on other machines.
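As a rough illustration of how a description of methods and parameters is enough to synthesize a proxy, consider the sketch below. The `TYPE_INFO` dictionary stands in for a type library, and the `ILoanService` interface, the JSON message format, and the `send` callback are all assumptions. Real COM remoting uses its own binary wire format.

```python
import json

# Hypothetical interface description, playing the role of a type library:
# interface name -> method name -> ordered parameter names.
TYPE_INFO = {"ILoanService": {"Quote": ["amount", "years"]}}


class Proxy:
    """Empty shell; its methods are attached from the interface description."""


def make_proxy(interface, send):
    """Build a proxy whose methods package each call as a message for `send`."""
    proxy = Proxy()
    for method, params in TYPE_INFO[interface].items():
        # Default arguments capture the current method and parameter list.
        def stub(*args, _method=method, _params=params):
            if len(args) != len(_params):
                raise TypeError(_method + " expects " + str(len(_params)) + " arguments")
            message = json.dumps({"method": _method, "args": dict(zip(_params, args))})
            return send(message)  # stand-in for a network round trip
        setattr(proxy, method, stub)
    return proxy


sent = []


def fake_network(message):
    """Records the message instead of sending it anywhere."""
    sent.append(message)
    return "queued"


service = make_proxy("ILoanService", fake_network)
print(service.Quote(150000, 30))
print(sent[0])
```

Nobody wrote a `Quote` method by hand. It was generated from the description, which is the point: the richer the metadata, the more the runtime can do on our behalf.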
Because COM provides more services than we get when writing plain DLLs, this runtime is larger and restricts somewhat the things we can do. But in general programmers are better off taking advantage of existing services in a runtime that has been carefully architected and tested many times over, rather than inventing everything from scratch themselves every time.
This idea - put more functionality in the runtime so that developers can focus on the parts of the problem relevant to their business - is one of the reasons COM was so successful. The designers of the CLR were well aware of this, and took this idea to its next logical step, as we will see.
COM has several shortcomings as a component technology. Most of these can be attributed to its lack of sufficiently detailed metadata.
Observe figure 5, which shows a typical COM component. It contains metadata that describes what lives inside the component. This consists of a type library - which we just discussed - along with our old friend the import table and the x86 code that implements the component. Note that the import table is not part of the COM type information; it's simply there because we happen to be using a DLL to house our component, and all DLLs can have an import table. (Recall that the import table contains the list of other DLLs we require to be on the system in order for our component to run.)

This is arguably a deficiency in COM. By not including in the type library information about what other COM components our component requires, the architects of COM complicated the business of deployment. Does my component require ADO[13] to be installed on the system? If so, what version?
One might think that this information could be stored in the DLL's import table. However, that would force the use of two separate APIs to ask questions about the same component, which seems inelegant at best. In any event, none of the mainstream tools support placing COM dependency information in the import table, so the point is largely moot.
To add insult to injury, we find that the type information[14] contained in the type libraries is lacking in several other respects.
Type libraries are incomplete. As a COM developer, it is possible for me to implement interfaces that cannot be described completely in a type library. Those familiar with the size_is attribute, for example, know that it cannot appear in a type library. The size_is attribute is used to describe certain types of arrays that can appear as method parameters. Again - the intricate details aren't important. What matters is that there's more going on than we can easily discover, and that makes working with certain components difficult.
Type libraries are nonextensible. There are a number of interesting applications that we could develop if we could embed our own information into the type library. Again, the idea of driving systems with metadata is a powerful one, and it would be nice if we could take advantage of this paradigm ourselves. However, type libraries are largely nonextensible, and embedding your own information into them is essentially impossible.
Type libraries contain no information about implementation. The real killer, however, is the fact that type libraries contain descriptions only of a component's interfaces. Interfaces describe only a set of methods that a component agrees to implement - they do not say anything about how that component is implemented internally. This information is critical, however, if we would like the runtime to take care of details such as memory management and persistence[15]. Without it, every object is forced to explicitly provide this functionality.
COM has limited support for versioning. COM does provide a fairly rudimentary versioning mechanism[16], but in practice versioning and deployment issues are quite difficult to deal with. It is virtually impossible to deploy an upgrade of a single, widely used component[17] without causing a cascading ripple of broken applications.
The CLR overcomes the shortcomings of COM by providing high-quality metadata. This metadata is used to drive the services provided by the .NET runtime.
While the announcement of the CLR and the surrounding family of .NET technologies came at the 2000 Professional Developers Conference, it's clear that Microsoft has been aware of the limitations of COM for quite some time. A two-part article in Microsoft Systems Journal by Mary Kirtland in late 1997 outlines a technology that bears a remarkable resemblance to what we have in the CLR today[18].
Some brief terminology: A CLR component is called an assembly. An assembly has a manifest, where some of the metadata for the assembly is kept.
The CLR provides fixes for all of the problems that type libraries had[19].
CLR type information contains information about referenced components. If assembly A is built to use assembly B, this fact is recorded in the manifest of assembly A.
We store not only the name of the assembly on which we are dependent, but optionally also information about things like the particular version of the assembly we require, the culture-specific version (e.g. US English, Canadian French, etc.), and even a token identifying the author of the component. Because of this, complex analysis of the deployment dependencies of an assembly is possible. Combining this with the ability of the CLR to simultaneously deploy multiple versions of a component on the same system will hopefully free us from the phenomenon known as DLL Hell[20].
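The kind of dependency analysis this makes possible can be sketched as a walk over manifests. The `AssemblyRef` fields mirror the items just listed (name, version, culture, author token), but the class itself, the `MANIFESTS` table, and every assembly name in it are hypothetical. This is not the CLR's manifest format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssemblyRef:
    """One reference recorded in a (hypothetical) manifest."""
    name: str
    version: str = ""           # optional, e.g. "1.2.0.0"
    culture: str = ""           # optional, e.g. "en-US"
    public_key_token: str = ""  # optional token identifying the author


# Made-up manifests: assembly name -> the references it records.
MANIFESTS = {
    "LoanApp": [AssemblyRef("Rates", "1.2.0.0"), AssemblyRef("Reports", "2.0.0.0", "en-US")],
    "Rates": [AssemblyRef("MathLib", "1.0.0.0")],
    "Reports": [AssemblyRef("MathLib", "1.0.0.0")],
    "MathLib": [],
}


def deployment_closure(root):
    """Return every (name, version) pair needed to deploy `root`."""
    needed = set()
    pending = list(MANIFESTS[root])
    while pending:
        ref = pending.pop()
        key = (ref.name, ref.version)
        if key in needed:
            continue  # already visited through another path
        needed.add(key)
        pending.extend(MANIFESTS.get(ref.name, []))
    return sorted(needed)


for name, version in deployment_closure("LoanApp"):
    print(name, version)
```

Because each reference names a specific version, the walk answers not only "what do I need?" but also "which version of it?", which is the question COM left unanswered.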
A picture of two CLR assemblies, one with a reference to the other, appears in figure 6 [21].

CLR type information is complete. Every parameter of every method of every interface of every class is completely described in the metadata.
CLR type information is extensible. If the complete description of all types in a CLR component is not in and of itself sufficient for your needs, you can attach your own custom information through the use of custom attributes. You can use this, for example, to drive a validation framework that will examine instances of classes and either accept or reject them based on business rules embedded in the metadata itself, such as "the amount property of the Loan class should be no less than $100,000, and no greater than $216,000."
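The loan rule quoted above can be approximated in Python, whose dataclasses let us attach arbitrary metadata to a field. This is an analogy to CLR custom attributes, not the CLR mechanism itself. The `Loan` class, the `min` and `max` keys, and the `validate` function are assumptions made for this sketch.

```python
from dataclasses import dataclass, field, fields


@dataclass
class Loan:
    # The limits live in metadata attached to the field, not in validation code.
    amount: int = field(metadata={"min": 100_000, "max": 216_000})


def validate(instance):
    """Accept or reject an instance using only the rules found in its metadata."""
    errors = []
    for f in fields(instance):
        value = getattr(instance, f.name)
        if "min" in f.metadata and value < f.metadata["min"]:
            errors.append(f"{f.name} is below {f.metadata['min']}")
        if "max" in f.metadata and value > f.metadata["max"]:
            errors.append(f"{f.name} is above {f.metadata['max']}")
    return errors


print(validate(Loan(150_000)))
print(validate(Loan(50_000)))
```

The `validate` function knows nothing about loans. Changing the business rule means changing the metadata on the class, and the same validator works for any other class annotated the same way.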
CLR type information contains details about the internal implementation of classes. Since this allows the CLR access to details like how a particular object is laid out in memory, we can now rely on the runtime to provide services like persistence, rather than implement the same code redundantly in every object.
CLR type information has strong versioning capabilities. Every CLR assembly has a version, and the runtime knows how to use this metadata to ensure that you get either the version you ask for, or a compatible version. Further, updates of components can be deployed without interfering with existing versions.
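A side-by-side version lookup might look something like the following sketch. The compatibility rule used here (same major version number, newest wins) is purely an assumption made for illustration. It is not the CLR's actual binding policy, and `INSTALLED`, `parse`, and `bind` are hypothetical names.

```python
def parse(version):
    """Turn "1.2.0.0" into (1, 2, 0, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


# Made-up machine state: several versions of one assembly installed side by side.
INSTALLED = {"Rates": ["1.0.0.0", "1.2.0.0", "2.0.0.0"]}


def bind(name, requested):
    """Pick the requested version, else the newest with the same major number.

    The "same major number" rule is an assumption for this sketch only.
    """
    available = INSTALLED.get(name, [])
    if requested in available:
        return requested
    major = parse(requested)[0]
    compatible = [v for v in available if parse(v)[0] == major]
    if not compatible:
        raise LookupError("no compatible version of " + name + " for " + requested)
    return max(compatible, key=parse)


print(bind("Rates", "1.2.0.0"))
print(bind("Rates", "1.1.0.0"))
```

Since versions 1.x and 2.0 sit next to each other, installing 2.0.0.0 for a new application does not disturb an older application that still asks for 1.2.0.0.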
The differences between CLR and COM metadata are summarized in figure 7.

CLR components use metadata for everything - including describing implementation that would normally take the form of machine code. The metadata that fills this role is called IL, for Intermediate Language.
CLR metadata goes beyond simply describing what a component looks like (classes, interfaces, etc.). Astute readers will have noticed that figure 6, where our CLR component is depicted, does not contain any place for x86 assembly code! And in fact none appears in a CLR assembly.
So what's going on here? How does a component technology do us any good if there are no executable instructions in it? The answer lies in something called Just-In-Time compilation, which we'll talk about momentarily. First, let's go back and take a closer look at our CLR components.
If we take another look at figure 6, we see that the CLR components contain something called IL. This stands for Intermediate Language, and it is merely another form of metadata. This particular type of metadata is meant not for describing things like method parameters and interfaces, but rather for describing the program logic that implements the component.
The CLR SDK (available at http://msdn.microsoft.com) includes in it a tool called ILDASM - for IL disassembler - which we can use to view the contents of a CLR assembly, including the IL that a CLR-targeted compiler emits when run against a source code file. An example of some IL appears in figure 8.

If you've ever seen some flavor of assembly language, you'll find that the code shown in figure 8 looks familiar. It is not, however, like a typical assembly language in that it is not closely related to the instruction set of some particular CPU. Instead, it is meant to be a generic description of the steps needed to implement a particular method of a particular class. In a very real sense, it is metadata - information about the implementation of a component.
Properly speaking, this assembly language is called MSIL or CIL, for "Microsoft" or "Common" Intermediate Language. Why two names? Recall that Microsoft is submitting the CLR to ECMA as a set of standards. Although this assembly language started life as MSIL, the vendor's name had to be dropped when it was submitted to ECMA. Hence, CIL.
Machines cannot run IL directly. A process known as JIT compilation turns IL into executable code.
We could consider IL to be the machine language of the CLR. Whatever we call it, it's metadata that describes implementation. But because it's not x86 assembly instructions, like we stored in COM and vanilla DLLs, it cannot be executed directly by a computer. We rely on yet another service of the platform to make this magic happen: Just-In-Time Compilation.
Just-In-Time compilation (JIT for short) is the process by which the runtime examines the IL in our assemblies and creates code that can be executed by whatever processor we happen to be running on. The "Just-In-Time" comes from the fact that the runtime performs this compilation at runtime, every time the component is loaded into a new process[22] [23]. This is true compilation - the IL is turned into actual machine code a method at a time. Interpretation - reading one IL instruction at a time and executing it - never occurs in code targeting the CLR.
This is an important point. Because the code is compiled - converted to machine code en masse - rather than read and executed one IL instruction after another, performance of code in the CLR should be quite good.
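The difference between interpreting and compiling can be sketched with a toy stack-based instruction set. This is an analogy written in Python, not real CIL and not how the CLR's JIT compiler is built: the opcodes (load_arg, push, add, mul, ret) and the functions interpret and jit_compile are invented for illustration. "Compilation" here simply means translating the whole method once into a function of the host language, which later calls reuse.

```python
# Toy illustration only: a made-up stack "IL", not real CIL.
# Method body for f(x) = (x + 2) * 3
METHOD_IL = [
    ("load_arg", 0),
    ("push", 2),
    ("add", None),
    ("push", 3),
    ("mul", None),
    ("ret", None),
]


def interpret(il, args):
    """Read one instruction at a time and execute it (what the CLR does NOT do)."""
    stack = []
    for op, operand in il:
        if op == "load_arg":
            stack.append(args[operand])
        elif op == "push":
            stack.append(operand)
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "ret":
            return stack.pop()
    raise ValueError("method fell off the end without ret")


def jit_compile(il, name="jitted"):
    """Translate the whole method once into a host-language function."""
    stack = []  # compile-time stack of expression strings, not runtime values
    for op, operand in il:
        if op == "load_arg":
            stack.append(f"args[{operand}]")
        elif op == "push":
            stack.append(repr(operand))
        elif op in ("add", "mul"):
            b, a = stack.pop(), stack.pop()
            sign = "+" if op == "add" else "*"
            stack.append(f"({a} {sign} {b})")
        elif op == "ret":
            source = f"def {name}(*args):\n    return {stack.pop()}\n"
            namespace = {}
            exec(compile(source, "<jit>", "exec"), namespace)
            return namespace[name]
    raise ValueError("method fell off the end without ret")


compiled_method = jit_compile(METHOD_IL)          # translation happens once
results = [compiled_method(x) for x in range(3)]  # later calls reuse the result
```

Both paths compute the same answers. The difference is that interpret walks the instruction list on every call, while jit_compile pays the translation cost once per method.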
In fact, JIT code could even outperform unmanaged code in some situations, because it knows things that a normal compiler doesn't, like exactly what processor (Pentium II? Pentium IV? AMD Athlon?) the code is executing on. Each processor chip has its own extensions to the standard x86 instruction set, and by using them, the JIT compiler may be able to produce more efficient code[24].
Some of you may remember what a pain it was to move 16-bit Windows 3.1 code to the 32-bit Windows NT or Windows 95 platform. A similar ordeal faces us in the coming years as new 64-bit processors become available. Anyone coding the old way, where your development tools create components that contain actual machine instructions, will need to rewrite everything to take advantage of the new machines. However, anyone who makes use of JIT compilation technology can rely on the runtime to emit the correct code - your components don't have to change at all. The only thing we need to update is the JIT compiler itself, rather than hundreds or thousands of individual components. This is one of the biggest advantages of adding this level of indirection between what our compilers produce and what eventually gets executed by the target computer.
While JIT compilation theoretically[25] gives us the ability to write code that can run on a variety of platforms, there are other great benefits to this approach as well. One of these is something known as code verification. Because we have metadata about the implementation - CIL - and a runtime that understands it, we can get something we never had in COM: an assurance that the code doesn't do anything it shouldn't be allowed to.
The key here is that it's the runtime that's generating the code that gets run by the machine, not the compiler. So the runtime can check to see if the code that you just downloaded from http://www.evilcode.com/ is trying to do something you don't want it to do, like reading from a memory location that it doesn't own or writing to the tenth element of an array that only holds three things.
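To make the idea of checking code before it runs concrete, here is a minimal sketch of a verifier for a toy instruction set. It is written in Python as an analogy only: the opcodes, the STACK_EFFECT table, and the verify function are invented for illustration, and the real CLR verification rules are far richer. The point is that because the runtime sees a description of the implementation, it can reject a method before any of it executes.

```python
# Toy illustration only: invented opcodes, not real CIL verification rules.
STACK_EFFECT = {          # opcode -> (values popped, values pushed)
    "push": (0, 1),
    "load_arg": (0, 1),
    "add": (2, 1),
    "mul": (2, 1),
    "ret": (1, 0),
}


def verify(il, arg_count):
    """Return a list of problems found; an empty list means the method passes."""
    problems = []
    depth = 0  # how many values would be on the stack at this point
    for index, (op, operand) in enumerate(il):
        if op not in STACK_EFFECT:
            problems.append(f"{index}: unknown opcode {op!r}")
            continue
        if op == "load_arg" and not (isinstance(operand, int) and 0 <= operand < arg_count):
            problems.append(f"{index}: argument {operand} does not exist")
        pops, pushes = STACK_EFFECT[op]
        if depth < pops:
            problems.append(f"{index}: {op} needs {pops} stack values, found {depth}")
            depth = 0
        else:
            depth -= pops
        depth += pushes
    if not il or il[-1][0] != "ret":
        problems.append("method does not end with ret")
    return problems


good = [("load_arg", 0), ("push", 2), ("add", None), ("ret", None)]
# Reads an argument that was never passed, then pops more than the stack holds.
bad = [("load_arg", 9), ("add", None), ("ret", None)]
```

A loader built on this sketch would call verify first and refuse to translate or run any method for which the returned list is not empty.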
The CLR exposes other services besides JIT Compilation. These too are driven by metadata.
So what have we got? The CLR, as a natural evolution of COM and of the DLL technology that COM replaced, represents a continuation of two trends. The first is the increasing amount of metadata that components contain. Figure 9 illustrates this. If DLLs were 90% executable code and 10% metadata, and COM components were 50-50, one might initially expect CLR components to be 25% code and 75% metadata, or even 10% code and 90% metadata. But as we've seen, the surprising fact is that CLR components (by virtue of the fact that their implementation is described in IL) are 100% metadata!

The other trend that the CLR carries forward is the use of this metadata by an increasingly capable runtime to provide services. These trends are illustrated in figure 9.
Some of these services we've mentioned already, like code verification and JIT compilation. But there are a variety of other aspects to CLR programming that bear examination. Let's look at two of the important ones: memory management and security.
The CLR manages both allocation and deallocation of memory. This eliminates two of the largest sources of programmer error: leaks and memory corruption.
Programmers with a background in C++ are used to operating fairly close to the hardware. It is their responsibility to track all resources they use and to clean up after themselves. These resources include things like files, database connections, and memory.
Manual memory management is a source of a great many errors in traditional C++ programming, as evidenced by the fact that most users are painfully familiar with the dialog box shown in figure 10.

Every request for a piece of memory (allocation) must be exactly matched with one request to return that memory to the list of unused locations (deallocation). While this seems reasonable, in practice it turns out to be quite difficult to program correctly, and most non-trivial programs contain at least one location where this does not happen.
One of two things can occur when allocation and deallocation requests are not matched. If no deallocation occurs for a given allocation, that memory is still considered by the system to be in use, even though it could be gainfully employed doing something else. This is called a leak. Given enough leaks, a process will eventually use up all available memory and fail. This is shown in figure 11 - the program is using only three chunks of memory, but the system thinks that many more are in use.

The other thing that might happen is that a programmer might deallocate a piece of memory and then use it again. This is strictly verboten, as the system may have handed out that memory to someone else. If you are very lucky, using a piece of memory that you have already deallocated will result in the dialog box shown in figure 10. If you are unlucky, you will modify a piece of memory that someone else is using, almost assuredly in a way that they are not expecting. These types of errors are very difficult to locate and diagnose, since they may not manifest any symptoms until minutes or hours after the incorrect modification.
Figure 12 depicts this situation. Component A has indicated to the system that it is done using the indicated piece of memory, but continues to use it anyway. When component B comes along and asks for some memory, it may be handed that same location. Now the two components are using the same piece of memory, most likely for different purposes, almost certainly with disastrous results.
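Both failure modes can be reproduced with a toy heap. The sketch below is a simulation in Python, with a hypothetical ManualHeap class standing in for a C++-style allocator; it is not how any real allocator is implemented. It shows a leak (allocations that are never returned) and the aliasing described above, where component A keeps writing to a slot that has since been handed to component B.

```python
class ManualHeap:
    """Toy fixed-size heap where callers must pair every allocate() with a free()."""

    def __init__(self, slot_count):
        self.slots = [None] * slot_count
        # Reversed so that pop() hands out slot 0 first.
        self.free_list = list(range(slot_count - 1, -1, -1))

    def allocate(self):
        if not self.free_list:
            raise MemoryError("out of memory")
        return self.free_list.pop()

    def free(self, slot):
        # Nothing stops the caller from using `slot` after this point.
        self.free_list.append(slot)

    def in_use(self):
        return len(self.slots) - len(self.free_list)


# Leak: allocations that are never freed stay "in use" forever.
leaky = ManualHeap(slot_count=4)
for _ in range(3):
    leaky.allocate()          # result discarded, so free() can never be called
leaked_slots = leaky.in_use()

# Use after free: A releases its slot but keeps writing to it.
heap = ManualHeap(slot_count=4)
slot_a = heap.allocate()
heap.free(slot_a)             # component A says it is done...
slot_b = heap.allocate()      # ...so component B is handed the same slot
heap.slots[slot_b] = "B's account balance"
heap.slots[slot_a] = "A's stale write"   # A uses the memory anyway
corrupted_value = heap.slots[slot_b]     # B now reads A's data
```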

The CLR addresses both of these problems. The latter - accessing memory that has been deallocated - is not possible in verified code. In fact, assuring that this sort of thing doesn't happen is arguably the main point of verification.
To address leaky code, the CLR uses something called garbage collection. Garbage collection, or GC, is a service provided by the runtime based on - you guessed it - metadata.
Because CLR components are fully described by metadata, and in fact consist of nothing but metadata, the runtime has all the information it needs to do allocation. Specifically, when code wants to create a new object, rather than doing it ourselves (as in traditional Windows programming) or asking the component to do it for us (as occurs in COM), we rely on the runtime. The runtime looks at the metadata that describes the class we're creating an instance of and peels a chunk of free memory off. It knows how much we need because all the details of the object including implementation details like private members are listed in the metadata.
The interesting bit is, when we are done using a piece of memory[26] we don't have to do anything. Since we went to the runtime to get that memory, it knows we have it. And since the runtime has metadata that completely describes us, it also knows when we're done using the memory! Periodically and when required, the system will perform a garbage collection cycle, where all unused memory will be reclaimed and made available to be handed out in subsequent allocation requests. Thus we have the picture shown in figure 13. It's essentially the same as the leaky code in figure 11, but now the runtime will reclaim the memory that we "leak."
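The reclaiming step can be sketched as a simple mark-and-sweep pass over a heap that the runtime owns. This is a simplified model in Python under stated assumptions: the ManagedHeap class, its roots set, and the integer object ids are all hypothetical, and the CLR's actual collector is considerably more sophisticated. What the sketch does show is the key idea from the text: because the runtime performed every allocation and knows which objects reference which, it can work out what is no longer reachable.

```python
class ManagedHeap:
    """Toy runtime-owned heap: it performs every allocation, so it knows every object."""

    def __init__(self):
        self.objects = {}      # object id -> list of ids this object references
        self.roots = set()     # ids reachable directly from running code
        self._next_id = 0

    def allocate(self, references=()):
        object_id = self._next_id
        self._next_id += 1
        self.objects[object_id] = list(references)
        return object_id

    def collect(self):
        """Mark everything reachable from the roots, then sweep the rest."""
        marked = set()
        pending = list(self.roots)
        while pending:
            object_id = pending.pop()
            if object_id in marked:
                continue
            marked.add(object_id)
            pending.extend(self.objects[object_id])
        garbage = [oid for oid in self.objects if oid not in marked]
        for oid in garbage:
            del self.objects[oid]
        return len(garbage)


heap = ManagedHeap()
leaf = heap.allocate()
parent = heap.allocate(references=[leaf])
orphan = heap.allocate()      # never rooted: the kind of "leak" shown in figure 11
heap.roots.add(parent)

reclaimed = heap.collect()    # the orphan is reclaimed; parent and leaf survive
```

Note that the program never calls anything like free. Dropping the last reference to an object (here, removing it from roots) is all it takes for a later collection to reclaim it.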

The CLR introduces a new security model that provides finer control over what is and is not allowed.
The model that we use for security in Windows is fundamentally a process-based one. That is, all code that runs inside the same process is assigned all the same privileges. While this may have worked fine in the days when all we ever ran were monolithic applications that we installed from a floppy, it has some shortcomings in the Internet age.
Consider ActiveX controls[27]. These components are typically downloaded from some website and run in a browser. If that browser was launched by a network administrator, the ActiveX control has free rein over your corporate network. While we utilize digital signatures to ensure that the code came from whom we think it did, all this truly provides us with is a name to put on the lawsuit after the damage has been done. This problematic approach is depicted in figure 14.

The approach that the CLR takes to security is fundamentally different. Rather than assign privileges to a component based on what process it runs in, security uses evidence about the code to assign permissions on a component-by-component basis. Evidence consists of information gathered about the code at load time, and can include things like the website from which the component was obtained. This improved security model is shown in figure 15.

Using this much finer-grained model, it is possible for us to make security statements such as, "Code obtained from the following sites should be allowed to access the local filesystem, but code obtained from any other site should not." Role-based checks are also supported, for example saying that only administrators can use this particular component.
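A statement like the one quoted above can be modeled as a small policy function. The following is a sketch in Python, not the CLR security API: the permission names, the TRUSTED_SITES set, the example.com hosts, and the functions permissions_for and demand are all hypothetical. It shows permissions being assigned per component from evidence (the site of origin), rather than per process.

```python
from urllib.parse import urlparse

# Hypothetical policy: which origin sites are trusted with file access.
TRUSTED_SITES = {"intranet.example.com", "partner.example.com"}
DEFAULT_PERMISSIONS = frozenset({"execute"})
TRUSTED_PERMISSIONS = frozenset({"execute", "file_io"})


def permissions_for(evidence):
    """Map evidence gathered at load time to a permission set for one component."""
    host = urlparse(evidence["origin_url"]).hostname
    return TRUSTED_PERMISSIONS if host in TRUSTED_SITES else DEFAULT_PERMISSIONS


def demand(component_permissions, required):
    """Raise if the calling component was not granted the required permission."""
    if required not in component_permissions:
        raise PermissionError(f"permission {required!r} not granted")


trusted = permissions_for({"origin_url": "http://intranet.example.com/tools/report.dll"})
untrusted = permissions_for({"origin_url": "http://www.evilcode.com/free-game.dll"})
```

Two components loaded into the same process end up with different permission sets, so a call such as demand(untrusted, "file_io") fails even though demand(trusted, "file_io") succeeds.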
On top of this, the runtime will check code to see if it has been tampered with. The metadata in a CLR component can contain a digital signature. A digital signature is a series of numbers that can be used to verify that the component a) was written by who you think it was, and b) hasn't been tampered with[28]. If someone tries to alter a piece of code - say to change an account transfer to send money to their account instead of the intended one - the CLR will step in and prevent the altered code from running.
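The tamper check can be illustrated with a keyed hash. This is a deliberately simplified sketch in Python: real digital signatures use a public/private key pair so that anyone can verify without being able to sign, whereas the shared secret PUBLISHER_KEY below is only a stand-in. The functions sign and load_component are hypothetical names, not part of the CLR.

```python
import hashlib
import hmac

PUBLISHER_KEY = b"hypothetical-publisher-secret"   # stand-in for a real private key


def sign(component_bytes, key=PUBLISHER_KEY):
    """Produce a signature that depends on every byte of the component."""
    return hmac.new(key, component_bytes, hashlib.sha256).digest()


def load_component(component_bytes, signature, key=PUBLISHER_KEY):
    """Refuse to load a component whose bytes no longer match its signature."""
    expected = sign(component_bytes, key)
    if not hmac.compare_digest(expected, signature):
        raise ValueError("component has been tampered with; refusing to load")
    return component_bytes


original = b"transfer(amount, to_account=intended)"
signature = sign(original)
# Someone redirects the transfer after the component was signed.
tampered = original.replace(b"intended", b"attacker")
```

Loading original with its signature succeeds, while loading tampered with the same signature raises an error, because changing even one byte changes the expected signature.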
If you are interested in learning more about security, check out DevelopMentor's course "Essential Windows 2000 Security". It discusses security in both the traditional Windows model and in the new world of CLR-based code.
The Base Class Library is a set of classes that provide useful functionality to CLR programmers.
Every platform has a runtime, and every platform provides a set of libraries that give programmers access to the capabilities of the runtime. Recall that the BCL is the set of classes that let us access the CLR. It provides over 3000 classes - all CLR library functionality is provided in an object-oriented manner - that do everything from parsing XML to matching regular expressions.
Because there is so much functionality in the BCL, finding what you want could be quite difficult if everything were just piled in a heap. Fortunately, the classes in the library are arranged in a hierarchical structure through namespaces. A namespace is nothing more than a prefix that a group of classes share. So, for example, many of the classes that have to do with processing XML documents start with "System.Xml," like System.Xml.XmlDocument. We would say that XmlDocument lives in the System.Xml namespace.
There are many namespaces in the BCL. Let's look at a few of the more interesting ones.
The System namespace contains the core classes of the BCL.
The System namespace contains many of the core classes that the rest of the BCL is defined in terms of. Most important of these are the built-in types such as System.String, System.Int32, System.DateTime, and about twenty others. These built-in types provide a framework that greatly simplifies the task of component integration.
With COM, we saw the popularity of cross-technology component integration skyrocket. COM classes written in C++ could be used by VB programmers and vice versa. As anyone who has ever tried to do this from the C++ side can attest, however, the differences between the way that VB represents strings and the way that C++ programmers do so make for a lot of code that does nothing but convert from one to the other and back again. And strings are not the only type for which this conversion is required.
The fundamental issue here is that VB and C++ do not have a common type system. In fact, there are three type systems: VB, C++, and COM. See figure 16 for an illustration of the boundary crossings involved.

Because the BCL defines a set of standard types that all users of the CLR can agree on, there's no need to convert between your string representation and mine - we simply use System.String and don't worry about it. This scenario is shown in figure 17. This unified type system should ease multiteam development and component integration burdens, as developers can stop writing reams of conversion code and focus on implementing the logic relevant to the business problem they are trying to solve.
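To give a feel for the conversion code that a shared type system makes unnecessary, here is a sketch of the two string layouts side by side. It is written in Python with simplifying assumptions: the VB side is modeled as a BSTR-style buffer (4-byte length prefix, UTF-16 text, 2-byte terminator), the C++ side as a null-terminated ASCII string, and the function names are invented for illustration.

```python
import struct


def c_string_to_bstr(c_string):
    """Convert a null-terminated single-byte string to a BSTR-style buffer."""
    text = c_string.split(b"\x00", 1)[0].decode("ascii")
    data = text.encode("utf-16-le")
    # Length prefix counts the bytes of text, not including the terminator.
    return struct.pack("<I", len(data)) + data + b"\x00\x00"


def bstr_to_c_string(bstr):
    """Convert a BSTR-style buffer back to a null-terminated single-byte string."""
    (byte_length,) = struct.unpack_from("<I", bstr, 0)
    text = bstr[4:4 + byte_length].decode("utf-16-le")
    return text.encode("ascii") + b"\x00"


c_side = b"Loan\x00"
vb_side = c_string_to_bstr(c_side)
round_trip = bstr_to_c_string(vb_side)
```

Every call across the boundary needs a pair of functions like these for each type that differs. When both sides agree on a single string type, the conversion step disappears.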

The classes that support ASP.NET - the replacement for Microsoft's ASP technology - live in the System.Web namespace.
Enabling component-oriented development is one major goal of the CLR. Another is enabling distributed programming. There are several popular models for building Internet systems today. Arguably the most successful of these is Microsoft's Active Server Page (ASP) technology.
The System.Web namespace contains classes that provide the framework for the replacement of ASP. Dubbed ASP.NET, this will likely be the first place many developers will encounter the CLR, as ASP.NET represents one of the most compelling reasons to use this new platform.
Among the many new capabilities that ASP.NET delivers into the hands of web developers are:
The CLR has built-in support for many XML technologies. The classes that support this functionality live in the System.Xml namespace.
ASP.NET will be used mostly for distributed applications where what's at the other end of the connection is a browser. More and more often, however, systems being built are intended to provide information not to a user sitting in front of Internet Explorer somewhere, but rather to another business system. For example, an inventory management system may automatically connect to a supplier's system using the Internet to request shipment of some replacement parts.
Because of the huge variety of systems that want to communicate - Win32, Sun, Mainframe, etc. - some mechanism for transferring information in a mutually understandable manner is essential. XML, the Extensible Markup Language, has emerged as the standard way to represent information in a platform-neutral manner.
Recognizing that XML is the lingua franca of distributed systems, the BCL includes an XML parser in the System.Xml namespace. Classes such as XmlReader, XmlWriter and XmlDocument provide support for both reading and writing XML to any of a variety of media. Support for other XML technologies such as XPath, XSLT and XML Schemas is present as well.
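As a rough picture of what reading and writing XML through a document-style API looks like, here is a sketch using Python's standard xml.etree.ElementTree module. It is an analogy for the kind of work XmlDocument does, not the System.Xml API itself, and the order message and its element names are hypothetical, loosely based on the inventory scenario above.

```python
import xml.etree.ElementTree as ET

# Hypothetical message an inventory system might send to a supplier.
ORDER_XML = """<order supplier="Acme">
  <part id="A-100" quantity="3"/>
  <part id="B-200" quantity="5"/>
</order>"""


def total_quantity(xml_text):
    """Read the document and add up the requested quantities."""
    root = ET.fromstring(xml_text)
    return sum(int(part.get("quantity")) for part in root.findall("part"))


def add_part(xml_text, part_id, quantity):
    """Modify the document in memory and write it back out as text."""
    root = ET.fromstring(xml_text)
    ET.SubElement(root, "part", {"id": part_id, "quantity": str(quantity)})
    return ET.tostring(root, encoding="unicode")


updated = add_part(ORDER_XML, "C-300", 2)
```

Because both systems only need to agree on the text of the message, the supplier's side could be written on any platform that has an XML parser.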
The CLR supports remote method invocation via the System.Runtime.Remoting namespace. This provides capabilities similar to what DCOM provided for COM programmers, although without some of the limitations of that technology.
COM included a facility for calling methods of objects residing in other processes or even in processes on other machines. This mechanism was called remoting, and in COM it relied on a network protocol called DCOM, for Distributed COM. It was difficult to build large-scale distributed systems based on DCOM for a number of reasons, not least of which was the fact that DCOM is not an open protocol.
Remoting in the CLR is supported by the classes that live in the System.Runtime.Remoting namespace. The remoting architecture is very flexible, supporting extensibility at almost every point. This allows developers to implement their own protocols and perform pre- and post-processing of remote calls.
While such extensibility is handy or even occasionally indispensable, in most situations developers will want to rely on one of the two different protocols that are available out of the box. One uses the Internet standard protocol HTTP to make remote method calls as XML messages in the SOAP format[29]. The other uses a binary format layered directly on top of low-level TCP/IP communications. The latter is more efficient, but the former is generally more appropriate when attempting to communicate through a firewall. Both provide significant advantages over DCOM when building large-scale systems by giving developers a variety of options for activation semantics and load balancing.
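The trade-off between the two styles of protocol can be sketched by encoding the same method call both ways. This is an illustration in Python only: neither encoding below is the real SOAP envelope or the CLR's binary wire format, and the method name GetQuantity and all four functions are hypothetical.

```python
import struct
import xml.etree.ElementTree as ET


def encode_as_xml(method, arg):
    """Text encoding in the spirit of a SOAP call (not a real SOAP envelope)."""
    call = ET.Element("call", {"method": method})
    ET.SubElement(call, "arg").text = str(arg)
    return ET.tostring(call)


def decode_xml(payload):
    call = ET.fromstring(payload)
    return call.get("method"), int(call.find("arg").text)


def encode_as_binary(method, arg):
    """Compact encoding: 1-byte name length, name bytes, 4-byte signed integer."""
    name = method.encode("ascii")
    return struct.pack(f"<B{len(name)}si", len(name), name, arg)


def decode_binary(payload):
    name_length = payload[0]
    name, arg = struct.unpack_from(f"<{name_length}si", payload, 1)
    return name.decode("ascii"), arg


xml_message = encode_as_xml("GetQuantity", 42)
binary_message = encode_as_binary("GetQuantity", 42)
```

The binary message is smaller and cheaper to parse, which is the efficiency argument made above. The XML message is plain text that any platform can read, which is what makes it a better fit for crossing organizational boundaries.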
The CLR has excellent support for working with legacy code.
While the CLR presents an exciting new environment in which to develop applications, businesses cannot realistically be expected to rewrite all of their code to take advantage of it immediately. Tested and working systems, developed using current technologies like COM, need to be accessible from CLR code. Similarly, it would be great if we could use new, CLR-based code from our existing systems.
The term for code that runs under the auspices of the Common Language Runtime - taking advantage of things like garbage collection, CLR security, and remoting - is managed code. Code that runs outside the CLR (e.g. all COM code, regular DLLs, VB6 code, etc.) is referred to as unmanaged code.
The CLR supports calls between managed and unmanaged code in three different ways.
The integration support between managed and unmanaged code is fully extensible, and in the rare cases where the default handling of transitions is not acceptable, hooks are provided for full customization.
MTS and COM+ brought support for services such as automatic transaction management to COM. The CLR does not provide these services natively, but instead provides them through easy interoperability with COM+.
Although the runtime has an extensive list of services that it provides based on metadata, it doesn't do everything just yet. Services such as automatic transactions and object pooling are not currently provided directly by the CLR, although they likely will be in a future release of the runtime. Instead, the CLR provides support for obtaining these services from COM+ and MTS. The support is deeply integrated into the runtime, and includes things like the automatic creation of COM+ applications.
If you are interested in learning more, check out the documentation for System.EnterpriseServices.ServicedComponent, or read Tim Ewald's October 2001 article on this topic in MSDN Magazine.
The CLR provides an API for accessing data in a manner similar to ADO. These APIs are collectively referred to as ADO.NET, and the classes that support them live in the System.Data namespace.
Microsoft has a penchant for introducing a new data access technology every few years - ODBC, RDO, OLEDB, ADO, etc. Well, it's been a while since the last one, so...enter the classes in the System.Data namespace.
There's a good reason for Microsoft to introduce a new data access mechanism. While the interoperability facilities of the runtime are quite sophisticated, there is still a cost associated with calling into unmanaged code. So a CLR-based implementation for SQL Server is provided. Providers for other database engines are sure to follow. Additionally, there are classes for accessing any OLEDB-compliant data source from managed code as well. Both sets of classes are collectively referred to as ADO.NET.
Beyond simply recasting the object models of ADO and OLEDB in terms that are more compatible with the new platform, there are some fundamental shifts in the way that data access is provided. For one, support in ADO.NET for server-side cursors has been dropped. Server-side database cursors were not generally appropriate for the types of web-based, large-scale systems that people build, and developers often mistakenly used them without being aware of the penalty in performance they incurred.
Another fundamental difference between the new and old world is the level of support for XML integration. While the old ADO Recordset object supported saving arbitrary query results to an XML document, the DataSet object goes far beyond this simple capability. The DataSet allows information from relational tables, XML documents, and other sources to be combined in a single collection. It supports advanced capabilities like the preservation of table relationships and the automatic generation of XML Schema information describing the contents of the DataSet.
A table comparing the capabilities of ADO and ADO.NET is shown in figure 18.

Those interested in data access technologies in the managed world are encouraged to check out DevelopMentor's Essential Web Services.NET, Essential ASP.NET, and Developing SQL Server Applications.
The Base Class Library contains a large number of useful classes.
The BCL is extensive, and a complete description of all the areas that it covers wouldn't fit here. I'll just mention a few of the other namespaces and what they provide.
* System.IO provides support for stream-based IO operations like file access.
* System.Net supports client-side network programming via sockets or HTTP.
* System.Messaging provides a managed interface to the Microsoft Message Queue (MSMQ).
* System.Threading provides support for writing highly concurrent, multithreaded applications.
* System.Text supplies classes for dealing with text in a variety of formats.
* System.Collections provides built-in support for things like lists, arrays, and hashtables.
There are many other namespaces, and dozens of classes in each. Fortunately, the use of namespaces makes finding the functionality you're looking for fairly intuitive.
Microsoft's .NET is a product family and a strategy covering most if not all of the company's present and future products. At the core of this strategy is a new programming platform called the Common Language Runtime. The CLR exists to give developers a new, more capable environment for writing component-oriented, standards-based software.
CLR programmers deal with three things when writing for this new platform: a set of services provided by a runtime, a suite of classes known as the Base Class Library, and one or more of the languages that support the CLR.
Most of the services that the runtime provides are driven by metadata. Metadata is the information that describes what is in a component. The CLR metadata goes well beyond the metadata available in previous technologies such as COM, allowing for a range of new services and features. Some of these include JIT compilation, automatic memory management, and a new security model.
Overall, the features of the CLR exist to make the developer more productive. By freeing programmers from many details of software development, it allows them to focus on the business problem at hand.
[1] The CLR runtime lives in a DLL called MSCOREE.DLL, which stands for Microsoft Common Object Runtime Execution Engine. "Common Object Runtime," or COR, is one of the many names this technology has had during its lifetime. Others include Next Generation Windows Services (NGWS), the Universal Runtime (URT), Lightning, COM+, and COM+ 2.0. There's a t-shirt waiting to happen here.
[2] In fact, there is an open source effort to port the CLR to Linux underway right now. Get more info at http://www.go-mono.org
[3] The term FCL (Framework Class Libraries) is also used to refer to these classes, depending on which piece of documentation you happen to be looking at.
[4] http://msdn.microsoft.com/vstudio/partners/language/default.asp
[5] Those interested in learning more about C# (and the CLR in general) can also take DevelopMentor's "Essential .NET : Component Development with C#".
[6] DevelopMentor offers a class, "Essential .NET: building applications and components with VB.NET". Also check out the book VB.NET Programming with the Public Beta, by Hollis & Lhotka, on Wrox Press.
[7] Another interesting note is that support for VBScript has been dropped. As we'll discuss in the sections on MSIL and JIT compilation, all code that executes under the CLR is compiled. Given that the primary use of VBScript is interpreted ASP pages, and since ASP.NET pages can be written in any CLR-compliant language, VBScript has been replaced by VB.NET, the new version of Visual Basic.
[8] XML, the Extensible Markup Language, is defined at http://www.w3.org/XML. For a gentler introduction, see DevelopMentor's XML tutorial.
[9] SOAP (the Simple Object Access Protocol) is an XML protocol that has been widely adopted as an interoperability specification for exchanging messages between platforms. If you are interested in SOAP for reasons of interoperating with other platforms, and want to know more about how to achieve this using the .NET technologies, Essential Web Services.NET is for you. Also check out http://www.develop.com/soap for a more extensive list of SOAP resources.
[10] Relevant specifications are also mirrored at http://msdn.microsoft.com/net/ecma/default.asp.
[11] The FSF implementation will be led by Ximian and will be called Mono. Intel has announced a similar effort. More information is available at http://www.go-mono.com and http://sourceforge.net/projects/ocl.
[12] Which is not to say that other operating systems don't offer the same functionality. They do, but as we'll see the CLR integrates these capabilities to a level not seen before in any OS.
[13] The ActiveX Data Objects, Microsoft's Visual Basic and script-friendly database access API.
[14] Type information is simply a description of the classes, interfaces, methods, functions, data structures, and other programming constructs defined within a component. For example, a component used in finance might define a Loan class that had things like an Amount, a Borrower Name, etc. If we wrote the Loan class in Visual Basic, a description of the Loan class would appear in the type library that VB automatically generates for us. This is one example of type information.
[15] Persistence is the saving of an object or set of objects to a medium such as a file or database.
[16] The mechanism can be summed up as, "Once you define an interface, don't ever change it."
[17] For example, ADO. Installation of the Microsoft Data Access Components, of which ADO is a part, commonly requires retesting and often redeploying every application on a machine.
[18] Of course, Mary refers to it as COM+, the name of the technology at the time. See footnote 1.
[19] It is a rule in software engineering that a product fixes all problems in the previous version, waxes your car, and walks the dog.
[20] This usually manifests itself as "I installed ADO 21.53 service pack 9 to make App Y work, and it broke App Z."
[21] Notice that CLR components are generally stored in Portable Executable (PE) files - which is to say DLLs and EXEs - the same way that COM components are. However, the DLL or EXE is mostly just a wrapper around the new metadata format.
[22] Technically it happens once per App Domain, which is the CLR equivalent of an operating system process.
[23] This compilation step can also be performed once at the time when the component is installed on the target machine. This is known as "pre-JITting", despite the contradiction in terms.
[24] Of course, actual performance will probably be slightly slower than code written in C++, simply because the runtime does more for us, and that takes extra CPU cycles. In any event, it was somewhat pointless to measure performance in a Beta product, since optimization generally occurs very late in the product cycle. Now that .NET has been released, expect to see people start to perform these tests.
[25] It's theoretical as long as the only implementations of the CLI are available from Microsoft.
[26] And an object is just a piece of memory, after all.
[27] An ActiveX control is a type of COM component, generally downloaded from the Internet and run in a browser such as Internet Explorer. They usually implement some small piece of reusable user interface code, such as a calendar or a stopwatch. Because of the current security model in Windows, however, there is no guarantee that they don't also do something else, like email the contents of your hard drive to someone.
[28] Two excellent resources for learning more about digital signatures and about computer security in general are Bruce Schneier's Applied Cryptography: Protocols, Algorithms, and Source Code in C, 2nd Edition (John Wiley & Sons), and Keith Brown's Programming Windows Security (Addison Wesley).
[29] If you are interested in SOAP for reasons of interoperating with other platforms, and want to know more about how to achieve this using the .NET technologies, Essential Web Services.NET is for you.
[30] BCL documentation is installed with the .NET SDK, or can be found online here.
 
Copyright © 2000 Ninestein Technologies
Last modified: March 16, 2004