To get the most out of this article you should already have a basic understanding of the IDisposable pattern. Dan Rigsby has recently published a great article on how to implement the IDisposable pattern. You will find complete code snippets and a download link to the source code on his blog.
There is a misconception that IDisposable is used only when a class declares unmanaged resources. The MSDN Library documentation is partly to blame for this, because it mentions only “unmanaged resources” rather than both managed and unmanaged resources. Here is what the MSDN Library says about the IDisposable interface:
“Defines a method to release allocated unmanaged resources.”
The definition only covers half the story.
I’ve recently written an article, Designing rich classes with Smarties 2008, in which I talked about ways to make a class rich. One way is to provide services to the users of the class by implementing common interfaces such as IDisposable, ISerializable and ICloneable.
IDisposable also has a role in the C# language itself. One of the cool things you can do in C# is declare an object in a using statement; when the statement’s scope ends, the object is disposed automatically.
Example:
using (Class1 obj = new Class1())
{
}
The using statement only accepts types that implement the IDisposable interface; a public (Public in Visual Basic) Dispose() method on its own is not enough, and the compiler rejects such a type. I think this on its own supports my theory that there is nothing wrong with implementing IDisposable on any type you declare. By doing so you ensure that users of the type can take advantage of the “using” keyword in C#.
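To make the “using” behaviour concrete, here is a minimal sketch of what the statement roughly expands to. The Class1 body, the IsDisposed property and the UsingDemo class are my own illustration (they are not part of the original example); the try/finally shape follows the C# language specification.

```csharp
using System;

// Hypothetical type for illustration; it only records that Dispose() ran.
public sealed class Class1 : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        this.IsDisposed = true;
    }
}

public static class UsingDemo
{
    // The form shown in the article.
    public static Class1 WithUsing()
    {
        Class1 escaped;
        using (Class1 obj = new Class1())
        {
            escaped = obj; // keep a reference so the caller can inspect it
        }
        return escaped;
    }

    // Roughly what the compiler generates for the method above.
    public static Class1 WithTryFinally()
    {
        Class1 obj = new Class1();
        try
        {
            // body of the using block goes here
        }
        finally
        {
            if (obj != null)
            {
                ((IDisposable)obj).Dispose();
            }
        }
        return obj;
    }

    public static void Main()
    {
        Console.WriteLine(WithUsing().IsDisposed);      // True
        Console.WriteLine(WithTryFinally().IsDisposed); // True
    }
}
```

Because the call to Dispose() sits in a finally block, it runs even when the body throws, which is the main reason to prefer “using” over calling Dispose() by hand.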
As part of my research for this article I looked for patterns in the .NET Framework. Up to Framework 2.0, Microsoft appeared to use IDisposable only where unmanaged resources or unsafe types were involved (remember that the majority of Framework 2.0 types were ported from 1.1), but I started noticing two things in Framework 3.0 and higher. The first was that IDisposable appeared to be implemented by types that have no unmanaged resources or unsafe types. The second was the pattern I use with collection types (types that extend the generic Collection class), which I’ll explain shortly.
This is my conceptual understanding of how GC (Garbage Collection) works in .NET...
When an application or module (a single .dll) is launched, the OS creates a process and loads the application or module into it. The process is a boundary that knows which memory addresses it can use; a process cannot use the memory addresses of another process on the same machine. When the application is written in a low-level programming language such as C++, creating the process is entirely up to the OS.
In a C++ environment each application is responsible for releasing the memory it consumes. Failing to do so results in memory leaks, and eventually the machine runs out of memory for new processes. C++ developers have to write extra code to release everything they consume, which makes developing C++ applications very difficult.
.NET developers don’t have the same issues because the .NET Framework itself creates the processes for assemblies and can therefore monitor and manage them. This frees .NET developers from writing routines to release the memory they have consumed.
To reclaim used memory the .NET Framework has a service called the GC (garbage collector) that scans the memory from time to time and reclaims space that is no longer in use. I really don’t know the exact rules that decide when the GC starts scanning; as far as I know it is driven mainly by how much memory has been allocated rather than by a fixed timer. One thing I do know is that the GC is designed to receive low-memory notifications from the OS, which kick-start the scanning process.
In .NET, when you assign null (Nothing in Visual Basic) to a variable you remove that reference to the object, but the object’s data is not removed from memory (the heap) yet. I have assumed there is a table (graph) that holds such memory addresses for the GC to use. On its next scan the GC uses the table (graph) to destroy the object’s data, provided nothing else still refers to it, and to remove the pointers that the object held to other objects in memory.
What about when you don’t assign null (Nothing in Visual Basic) to objects when you are done using them? In this case the GC examines all allocated memory addresses within your application domain to see whether each object in memory is referenced by other objects. When an object is referenced by another object there is a pointer for this reference in the memory table. If no pointers are found for an object and the object is out of scope, the object is destroyed.
As you can see, checking the entire memory table (graph) for objects that were never assigned to null would take longer than processing memory addresses that are already marked for reclaiming. In terms of ticks you would rarely notice the difference between the two, but if your application stores a large amount of data you might notice sluggish performance from time to time while the GC is at work.
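A small experiment can make the idea of “still referenced” versus “no longer referenced” more tangible. The sketch below is only an illustration: ReachabilityDemo, s_root and Allocate are names I made up, and forcing a collection with GC.Collect() is something you would normally not do in production code. The runtime does not strictly guarantee when an unreferenced object is reclaimed, so treat the second result as typical rather than certain.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class ReachabilityDemo
{
    // A static field keeps whatever it refers to reachable.
    private static object s_root;

    // Allocates in a separate, non-inlined method so that no local
    // variable of the caller keeps the object alive by accident.
    [MethodImpl(MethodImplOptions.NoInlining)]
    private static WeakReference Allocate(bool keepRooted)
    {
        object data = new byte[1024];
        s_root = keepRooted ? data : null;
        return new WeakReference(data); // a weak reference does not keep data alive
    }

    public static bool IsAliveAfterCollect(bool keepRooted)
    {
        WeakReference weak = Allocate(keepRooted);
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
        return weak.IsAlive;
    }

    public static void Main()
    {
        // Still referenced through s_root, so it survives the collection.
        Console.WriteLine("referenced:   " + IsAliveAfterCollect(true));
        // Nothing refers to it any more, so it is normally reclaimed.
        Console.WriteLine("unreferenced: " + IsAliveAfterCollect(false));
    }
}
```

The first line always reports True. The second usually reports False, which shows that it is the absence of references, not the act of writing null somewhere, that makes an object eligible for collection.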
You can also think of the IDisposable interface as a way to help the GC do its job quicker by assigning null (Nothing in Visual Basic) to all declared fields. There are a few more valid reasons, which I’ll try to cover in this article.
Now that I have talked about the very basic concept of how the GC works, we can move on to the pattern that I use for types that extend the generic Collection class.
Please examine the following type;
public class EmployeesCollection
    : Collection<Employees>
{
    public EmployeesCollection()
    {
    }
}
The class above represents a collection of Employees objects.
In database-driven applications the user might run a query that fetches hundreds of Employees records from the data store. If the user repeats the same action, there is a good chance that more memory is consumed before the GC has had a chance to clean up after the previous fetch. The process hosting the application can then run out of memory addresses to use, and the application finally crashes.
These days newly built machines have over 1 GB of memory. A single 32-bit process can normally address up to 2 GB of memory. That is a lot of space for data, but the EmployeesCollection object is not going to be the only object that resides in the application’s process at any given time.
It is a good software development principle to take the necessary measures as soon as an object reaches the end of its lifetime. I will show you another real-world example so that you can see how important this principle is, even on the .NET platform with the GC at your disposal.
Now, let’s see the IDisposable pattern in action. Please assume the Employees class already implements the IDisposable interface. The IDisposable pattern for the EmployeesCollection would look like the snippet below:
private void Dispose(bool disposing)
{
    if (!this._Disposed)
    {
        if (disposing)
        {
            foreach (Employees emp in base.Items)
            {
                emp.Dispose();
            }
            base.Clear();
        }
    }
    this._Disposed = true;
}
public void Dispose()
{
    this.Dispose(true);
    GC.SuppressFinalize(this);
}
The important point is in the innermost if statement. Please observe how each Employees object gets disposed and then the collection’s Clear() method is called. With a single call to the Dispose() method you can now take care of everything.
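For readers who want to try the pattern, here is a self-contained sketch of the whole collection. The Employees stub with its IsDisposed property, the null check inside the loop and the CollectionDemo class are my own additions for illustration; the real entity class is assumed to do meaningful work in its Dispose() method.

```csharp
using System;
using System.Collections.ObjectModel;

// Stand-in for the article's entity type (hypothetical body).
public class Employees : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        this.IsDisposed = true;
    }
}

public class EmployeesCollection : Collection<Employees>, IDisposable
{
    private bool _Disposed = false;

    protected virtual void Dispose(bool disposing)
    {
        if (!this._Disposed)
        {
            if (disposing)
            {
                // Dispose every item first, then empty the collection.
                foreach (Employees emp in base.Items)
                {
                    if (emp != null)
                    {
                        emp.Dispose();
                    }
                }
                base.Clear();
            }
            this._Disposed = true;
        }
    }

    public void Dispose()
    {
        this.Dispose(true);
        GC.SuppressFinalize(this);
    }
}

public static class CollectionDemo
{
    public static void Main()
    {
        Employees first = new Employees();
        EmployeesCollection list = new EmployeesCollection();
        list.Add(first);
        list.Add(new Employees());

        list.Dispose();

        Console.WriteLine(first.IsDisposed); // True
        Console.WriteLine(list.Count);       // 0
    }
}
```

Note that this design assumes the collection owns its items. If the same Employees objects are also held elsewhere, disposing the collection disposes them for everyone.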
To complement this article I have added support (C# only) to Smarties 2008’s Smart IDisposable command for creating the foreach loop shown in the snippet above. Release 1.3.6, which I plan to publish on the same day as this article, will include this feature.
Finally, I’m going to talk about another scenario where implementing the IDisposable interface becomes a necessity, using a real-world example.
Smarties 2008 has a command called Regionize This that organises members within designated #region directives. The Regionize This command can be executed from various locations, including the Solution menu in Solution Explorer. Once executed, it traverses the projects and their project items.
Regionize This performs a relatively complex process and consumes a relatively large amount of data while it works. With a large solution such as Microsoft Enterprise Library, which has over 3,000 .cs files, it takes a great deal of care not to run out of memory while the Regionize process is traversing and processing all the files. One way to do this is to handle a single file (Project Item) at a time and dispose virtually all the types that the Regionize process used for that file. I couldn’t simply rely on the GC to clean up all the types consumed for each .cs file, because the Regionize process starts instantiating the next set of types as soon as it completes the current one.
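The “one file at a time” idea can be sketched as follows. This is not the Smarties 2008 implementation; FileWorkspace, Process and TraversalDemo are hypothetical names that only show the shape of the loop, and the LiveCount counter exists purely so the effect can be observed.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical per-file working set.
public sealed class FileWorkspace : IDisposable
{
    // Number of workspaces that have been created but not yet disposed.
    public static int LiveCount;

    private List<string> _members = new List<string>();

    public FileWorkspace(string path)
    {
        LiveCount++;
        this._members.Add(path);
    }

    public void Process()
    {
        // Parsing and reorganising the file's members would happen here.
    }

    public void Dispose()
    {
        if (this._members != null)
        {
            this._members.Clear();
            this._members = null;
            LiveCount--;
        }
    }
}

public static class TraversalDemo
{
    // Returns the highest number of workspaces alive at any one time.
    public static int ProcessAll(IEnumerable<string> paths)
    {
        int peak = 0;
        foreach (string path in paths)
        {
            // The workspace is released before the next file is touched.
            using (FileWorkspace ws = new FileWorkspace(path))
            {
                ws.Process();
                peak = Math.Max(peak, FileWorkspace.LiveCount);
            }
        }
        return peak;
    }

    public static void Main()
    {
        int peak = ProcessAll(new string[] { "a.cs", "b.cs", "c.cs" });
        Console.WriteLine(peak);                    // 1
        Console.WriteLine(FileWorkspace.LiveCount); // 0
    }
}
```

However many files are traversed, at most one working set is held at a time, so what the process keeps alive stays roughly constant instead of growing with the size of the solution.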
Now we have reached the real question: when is it safe to call a Dispose() method? The answer really depends on how a type is designed and how it is used within an application, so I can only offer you guidelines for spotting the hidden implications.
Please examine the following class:
public class ClassA : IDisposable
{
    private string m_Field1 = null;
    private int m_Field2 = 0;
    private bool _Disposed = false;
    public string Field1
    {
        get
        {
            return this.m_Field1;
        }
        set
        {
            this.m_Field1 = value;
        }
    }
    public int Field2
    {
        get
        {
            return this.m_Field2;
        }
        set
        {
            this.m_Field2 = value;
        }
    }
    protected virtual void Dispose(bool disposing)
    {
        if (!this._Disposed)
        {
            if (disposing)
            {
                this.m_Field1 = null;
            }
        }
        this._Disposed = true;
    }
    public void Dispose()
    {
        this.Dispose(true);
        GC.SuppressFinalize(this);
    }
    ~ClassA()
    {
        this.Dispose(false);
    }
}
ClassA does not declare any reference-type fields that implement the IDisposable pattern, so its overloaded Dispose(bool) method has no implications. Please examine ClassB now:
public class ClassB : IDisposable
{
    private ClassA m_Data = null;
    [System.NonSerialized]
    private bool _Disposed = false;
    public ClassB()
    {
        this.m_Data = new ClassA();
    }
    public ClassA Data
    {
        get
        {
            return this.m_Data;
        }
    }
    public void Dispose()
    {
        this.Dispose(true);
        GC.SuppressFinalize(this);
    }
    ~ClassB()
    {
        this.Dispose(false);
    }
    protected virtual void Dispose(bool disposing)
    {
        if (!this._Disposed)
        {
            if (disposing)
            {
                if (this.m_Data != null)
                {
                    ((IDisposable)this.m_Data).Dispose();
                }
                this.m_Data = null;
            }
        }
        this._Disposed = true;
    }
}
ClassB declares a reference-type field (m_Data) whose type implements the IDisposable interface. You need to pay close attention to how ClassB is designed. First, the Data property is read-only (ReadOnly in Visual Basic), so there is no chance that it is assigned a ClassA instance created elsewhere. Second, ClassB has no constructor that takes a ClassA parameter, and the m_Data field is instantiated in the default constructor. Once again the Dispose() method of ClassB can be called without any implications.
Please examine ClassD now. I do know that ‘C’ comes after ‘B’ in the English alphabet, but ClassC is so ugly to type :)
public class ClassD : IDisposable
{
    private ClassA m_Data1 = null;
    private ClassB m_Data2 = null;
    [System.NonSerialized]
    private bool _Disposed = false;
    public ClassD()
    {
        this.m_Data1 = new ClassA();
    }
    public ClassD(ClassB data2)
        : this()
    {
        this.m_Data2 = data2;
    }
    public ClassA Data1
    {
        get
        {
            return this.m_Data1;
        }
    }
    public ClassB Data2
    {
        get
        {
            return this.m_Data2;
        }
        set
        {
            this.m_Data2 = value;
        }
    }
    protected virtual void Dispose(bool disposing)
    {
        if (!this._Disposed)
        {
            if (disposing)
            {
                if (this.m_Data1 != null)
                {
                    ((IDisposable)this.m_Data1).Dispose();
                }
                this.m_Data1 = null;
                if (this.m_Data2 != null)
                {
                    ((IDisposable)this.m_Data2).Dispose();
                }
                this.m_Data2 = null;
            }
        }
        this._Disposed = true;
    }
    public void Dispose()
    {
        this.Dispose(true);
        GC.SuppressFinalize(this);
    }
    ~ClassD()
    {
        this.Dispose(false);
    }
}
ClassD’s design is similar to ClassB’s, with the exception of the m_Data2 member, which can be assigned via the constructor or the setter of the Data2 property. The overloaded Dispose(bool) method calls the Dispose() method of m_Data2 if it is not null. This is the scenario you need to watch, because you can easily cause a runtime exception in your application.
When a type’s member can be assigned in this manner, it most likely (not always) means that you will pass ClassD a reference to an object that was created, and is still being used, elsewhere. Please examine the code snippet below to see what I mean:
ClassB objB = new ClassB();
ClassD objD = new ClassD(objB);
objD.Dispose();
// objD has already disposed objB, so objB.Data now returns null.
// Using objA after this line throws a NullReferenceException.
ClassA objA = objB.Data;
Here is the correct version of the overloaded Dispose(bool) method:
protected virtual void Dispose(bool disposing)
{
    if (!this._Disposed)
    {
        if (disposing)
        {
            if (this.m_Data1 != null)
            {
                ((IDisposable)this.m_Data1).Dispose();
            }
            this.m_Data1 = null;
            this.m_Data2 = null;
        }
    }
    this._Disposed = true;
}
We simply assign null to the m_Data2 field without disposing it. In the real world the object that was instantiated first can then be safely disposed by the code that created it, as shown in the following snippet:
ClassB objB = new ClassB();
ClassD objD = new ClassD(objB);
objD.Dispose();
ClassA objA = objB.Data;
objA.Dispose();
// objA belongs to objB and is already disposed; disposing objB is still safe
objB.Dispose();
As you have seen, how a type is constructed determines how safe or unsafe it is to call the Dispose() method of reference-type fields that implement the IDisposable pattern. Perhaps contrary to what you assumed previously, the key to a trouble-free IDisposable implementation is actually in the overloaded Dispose(bool) method.
You need to be aware of how reference-type fields are assigned or instantiated in types that implement IDisposable, and then decide what measures to take. The Smart Interface command of Smarties 2008 can never correctly determine a type’s design, so it prompts the developer to select the fields that implement the IDisposable pattern; the developer knows which fields are safe to dispose.
The only other thing you need to watch is not to call a type’s Dispose() method too early, while copies of the same reference are still being used.
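One way to make the ownership decision explicit, rather than leaving it to convention, is to let the caller state it. The sketch below is a variation of my own, not something Smarties 2008 generates: Resource, Holder and the ownsData parameter are hypothetical names. The idea is similar in spirit to the leaveOpen parameter that some framework stream wrappers accept.

```csharp
using System;

// Hypothetical disposable dependency.
public sealed class Resource : IDisposable
{
    public bool IsDisposed { get; private set; }

    public void Dispose()
    {
        this.IsDisposed = true;
    }
}

// Hypothetical variation on ClassD: the caller says who owns the object.
public sealed class Holder : IDisposable
{
    private Resource _data;
    private readonly bool _ownsData;
    private bool _disposed;

    public Holder(Resource data, bool ownsData)
    {
        this._data = data;
        this._ownsData = ownsData;
    }

    public void Dispose()
    {
        if (this._disposed)
        {
            return;
        }
        // Dispose the dependency only when this instance owns it.
        if (this._ownsData && this._data != null)
        {
            this._data.Dispose();
        }
        this._data = null;
        this._disposed = true;
    }
}

public static class OwnershipDemo
{
    public static void Main()
    {
        Resource shared = new Resource();
        using (new Holder(shared, false)) { }
        Console.WriteLine(shared.IsDisposed); // False

        Resource owned = new Resource();
        using (new Holder(owned, true)) { }
        Console.WriteLine(owned.IsDisposed);  // True
    }
}
```

With the flag set to false the holder only drops its reference, which matches the corrected ClassD; with the flag set to true it behaves like ClassB, which creates and therefore owns its field.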
Conclusion
I hope this article has cleared up some of the confusion or doubts that you had about the IDisposable interface. I leave you with the key points of this article so that you can come back and remind yourself of them from time to time:
- Try to see IDisposable interface as a service to a type.
- Collection types that hold Entities or Business Objects must implement IDisposable interface.
- To take advantage of “using” in C# make sure types are Disposable.
- Types that are used in recursive or traversal processes must implement IDisposable to release resources as soon as the type’s lifetime ends.
- The key to implementing IDisposable safely is the overloaded Dispose(bool) method and a good look at how the type is designed.
- Ensure the Dispose() method is not called too early when copies of the same reference are still being used.
- IDisposable can ease the job of the GC.
- Consider assigning null (Nothing in Visual Basic) to references more often when the type does not implement IDisposable, e.g. the string (String in Visual Basic) type.
- IDisposable is not just for unmanaged resources.
Added the rest on 20 April, 2008
After Greg’s discussion I felt I needed to add more details to clarify certain parts of this article.
One of the things that I touched on was how the GC is able to clean up memory. I stated that since the .NET runtime launches the processes, the GC can monitor and clean up the memory.
The question that you need to ask yourself is this:
If the GC is so clever, why doesn’t it clean up all the processes on your machine, even those that are not compiled for .NET?
Well, it can’t, because managed code is executed very differently. In the .NET environment, assemblies are loaded into Application Domains, which in turn live inside a process. Here is what the MSDN Library says about Application Domains:
"Application domains provide a more secure and versatile unit of processing that the common language runtime can use to provide isolation between applications. You can run several application domains in a single process with the same level of isolation that would exist in separate processes, but without incurring the additional overhead of making cross-process calls or switching between processes. The ability to run multiple applications within a single process dramatically increases server scalability."
In essence, an Application Domain has to be created within the process first to host the assembly (Process Module) that is being loaded.
I have also stated that, unlike a C++ application, when a .NET application is launched the framework runtime creates the process, not the OS. I agree that this statement can be a bit confusing to some readers. Here is what I meant by it.
To understand this we first need to understand .NET assemblies, which are the fundamental unit of deployment. Unlike applications written in C++, .NET assemblies cannot be executed by the OS “directly”, but they contain information about how they can be loaded into a runtime (VM). Let’s call it the “loader” information that the OS can understand. I once read about this in the early days of .NET but I can’t remember exactly what it is called now. It is true that once you pass 40 your memory starts rusting :)
Here is what MSDN says about assemblies:
"Assemblies are the building blocks of .NET Framework applications; they form the fundamental unit of deployment, version control, reuse, activation scoping, and security permissions. An assembly is a collection of types and resources that are built to work together and form a logical unit of functionality. An assembly provides the common language runtime with the information it needs to be aware of type implementations. To the runtime, a type does not exist outside the context of an assembly."
The way I understand everything up to this point is that executable assemblies (*.exe) are packed with types and resources that can be loaded, executed, and managed in an environment called the .NET Framework. They have no characteristics in common with the native applications we have known from the past. The OS has no knowledge of how to execute (run) them, so no process is created “directly” from the call to (launch of) the executable assembly. The only thing the OS can do is pass the assembly to its runtime.
The framework itself is written in C++ to create an environment for managed code. It calls Windows APIs to do many things, e.g. to create a new process or to unload a process. In that sense every process created to host an assembly, known as a Process Module, is created by the OS, but via a call from the framework (runtime).
Something extra about GC…
The GC is not as active in the background as you might think; at least, by the time an application is terminated (unloaded) the GC still has a few jobs to finish in the memory clean-up process. One way to observe this is to insert breakpoints into the Dispose() method of types that implement IDisposable and then unload the running application. I’m not too sure which part of the framework is responsible for these calls, but I doubt the call is from the GC. If you do know, then please let us all know.
Picture a web hosting server, if you will: if the GC were meant to be that active, all the processes running on the server would require extra processing power to handle the GC activity.
You have to forgive me for not being good at putting down what I really want to say. I have only recently come out of my shell after a number of years working in the software development field. I will only get better at writing my next articles for you :)