Mapping Dependencies with Mono Cecil
Posted by Tom on 2012-04-15 10:12
So a while back I got curious about how a .NET assembly would look if you could map its internal structure. There is code out there to create maps of references at the assembly level, but I wanted something more fine-grained. If a class A contains a reference to class B then we can say that A depends on B. Now we look at B and see that it references C and D, and eventually we can build a directed graph of all these dependencies.
I decided to take Mono Cecil for a spin this time, simply in order to try something new. With hindsight I don't think it would have been half as easy to catch the unassigned local variable case using .NET reflection since it doesn't have any real abstraction at the IL level. But I'm getting ahead of myself. Let's start from the top.
Enumerating Dependencies
First of all: what form can these references take? At the class level:
- Inheritance and Implementation - Each class can inherit once and implement interfaces to its heart's content, so we need to look for both of these when we check each class.
- Properties and fields - Pretty obvious here. We'll need to check the types on each property and field.
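Here's a rough sketch of what the class-level checks could look like with Cecil. The ClassLevelDependencies name is made up for this example and isn't taken from the project. One assumption to watch: on Cecil 0.10 and later each interface entry is wrapped in an InterfaceImplementation, whereas on the older 0.9.x releases the Interfaces collection holds the TypeReference directly, so the interface loop needs adjusting on older versions.

```csharp
using System.Collections.Generic;
using Mono.Cecil;

// Hypothetical helper: the name is illustrative, not from the project.
static class ClassLevelDependencies
{
    // Yields every type mentioned in the class declaration itself:
    // base type, implemented interfaces, field types and property types.
    public static IEnumerable<TypeReference> Collect(TypeDefinition type)
    {
        // BaseType is null for interfaces and for System.Object.
        if (type.BaseType != null)
            yield return type.BaseType;

        // Assumes Cecil 0.10+; on 0.9.x yield the entry itself instead.
        foreach (var implementation in type.Interfaces)
            yield return implementation.InterfaceType;

        foreach (var field in type.Fields)
            yield return field.FieldType;

        foreach (var property in type.Properties)
            yield return property.PropertyType;
    }
}
```

Nothing here deduplicates or filters. It just reports what the declaration mentions and leaves the bookkeeping to the caller.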
At the method level we've got a few things to check:
- Parameters
- Local variables
- Unassigned local variables
I'm going to expound on that last one because it doesn't make any sense, but I've spent the last five minutes wracking my brains for a better term and I'm now bored and stumped, so it's example time.
class TestClass
{
    void TestCase()
    {
        // Both objects are held in locals, so both types appear as local variable types.
        var foo = new HereIsMyClass();
        var bar = new HereIsMyReader(foo);
    }
}
This acts as expected. However . . .
class TestClass
{
    void TestCase()
    {
        // HereIsMyClass is created inline, so no local variable of that type exists.
        var boo = new HereIsMyReader(new HereIsMyClass());
    }
}
In the first example the HereIsMyClass dependency will be picked up due to the local variable, but in the second it won't. The HereIsMyClass class itself will still be processed since it's on the parameter list of the HereIsMyReader constructor, but we will miss the association between TestClass and HereIsMyClass if we only check for local variables. Curses!
So instead we're grinding through all of the instructions in every method. Nasty, but I can't find a neater alternative. And to be honest once we're doing that we don't need to check parameters and local variables any more either since they'll be picked up while looking for the last case.
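As a minimal sketch of that instruction grind, assuming the only operands worth looking at are type, method and field references: every IL instruction in Cecil carries an Operand, and a newobj for HereIsMyClass shows up as a MethodReference whose DeclaringType is the class we were missing. MethodBodyDependencies is a hypothetical name. One caveat I'm fairly sure of: a parameter or local whose type never appears as an operand (an unused parameter, say) won't be seen here, so keeping the cheap signature checks as a safety net may still be worthwhile.

```csharp
using System.Collections.Generic;
using Mono.Cecil;
using Mono.Cecil.Cil;

// Hypothetical helper: the name is illustrative, not from the project.
static class MethodBodyDependencies
{
    // Yields the types referenced by the operands of a method's IL.
    public static IEnumerable<TypeReference> Collect(MethodDefinition method)
    {
        // Abstract, extern and interface methods carry no IL body.
        if (!method.HasBody)
            yield break;

        foreach (Instruction instruction in method.Body.Instructions)
        {
            switch (instruction.Operand)
            {
                case TypeReference type:      // e.g. castclass, box, newarr
                    yield return type;
                    break;
                case MethodReference called:  // e.g. call, callvirt, newobj
                    yield return called.DeclaringType;
                    break;
                case FieldReference field:    // e.g. ldfld, stfld, ldsfld
                    yield return field.DeclaringType;
                    yield return field.FieldType;
                    break;
            }
        }
    }
}
```

Instructions with no operand, or with an operand of some other kind (a string, a branch target), simply fall through the switch.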
And now for some cases best categorised under 'other' . . .
Attributes
Attributes can be specified at global scope (to specify attributes on the containing assembly or module) and for type-declarations (Section 9.5), class-member-declarations (Section 10.2), interface-member-declarations (Section 13.2), struct-member-declarations (Section 11.2), enum-member-declarations (Section 14.3), accessor-declarations for properties (Section 10.6.2), event-accessor-declarations (Section 10.7.1), and formal-parameter-lists (Section 10.5.1).
So, basically, anywhere. Less facetiously, we're going to have to check for attributes on classes, properties, fields and methods.
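A sketch of the attribute check, assuming we only care about the four places just listed. Cecil exposes CustomAttributes on anything that implements its ICustomAttributeProvider interface, so the members can all go through one loop. AttributeDependencies is a made-up name for this example.

```csharp
using System.Collections.Generic;
using Mono.Cecil;

// Hypothetical helper: the name is illustrative, not from the project.
static class AttributeDependencies
{
    // Yields the attribute types applied to a class and to its
    // fields, properties and methods.
    public static IEnumerable<TypeReference> Collect(TypeDefinition type)
    {
        // Mono.Cecil.ICustomAttributeProvider, not the System.Reflection one.
        var owners = new List<ICustomAttributeProvider> { type };
        owners.AddRange(type.Fields);
        owners.AddRange(type.Properties);
        owners.AddRange(type.Methods);

        foreach (var owner in owners)
            foreach (var attribute in owner.CustomAttributes)
                yield return attribute.AttributeType;
    }
}
```

Parameters, return values and events can carry attributes too. This sketch skips them to stay in line with the list above.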
Generics
The next irritating corner case that needs handling: generics. Checking the GenericParameters property of the class or method will give us back a list of constructed generic types and that is all well and good, but say one of those parameters is akin to the deliciously convoluted example below.
string, IList<Tuple<int, Tuple<MyClass, string, decimal>, SomeOtherClass>>>>
We're going to have to walk a tree in order to get back all of the relevant types. Once we get a generic parameter we're going to need to recurse into each of its own generic parameters to make sure we've caught them all.
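The tree walk might look something like this. As far as I can tell, a constructed generic such as IList<MyClass> surfaces in Cecil as a GenericInstanceType whose GenericArguments hold the concrete types, while GenericParameters holds the declared placeholders like T. Treat that split as my reading of the API rather than gospel. GenericTypeWalker is a hypothetical name.

```csharp
using System.Collections.Generic;
using Mono.Cecil;

// Hypothetical helper: the name is illustrative, not from the project.
static class GenericTypeWalker
{
    // Flattens a possibly nested generic type into the plain types it mentions.
    public static IEnumerable<TypeReference> Flatten(TypeReference type)
    {
        if (type is GenericInstanceType generic)
        {
            // The open generic itself, e.g. IList`1.
            yield return generic.ElementType;

            // Recurse, since each argument may be generic too.
            foreach (var argument in generic.GenericArguments)
                foreach (var nested in Flatten(argument))
                    yield return nested;
        }
        else if (type is TypeSpecification wrapper)
        {
            // Arrays, pointers and by-ref types wrap an element type.
            foreach (var nested in Flatten(wrapper.ElementType))
                yield return nested;
        }
        else if (!(type is GenericParameter))
        {
            // Skip placeholders such as T, which name no real type.
            yield return type;
        }
    }
}
```

Feeding every type found by the other checks through Flatten before recording it means the nested arguments get picked up no matter where the generic came from.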
Compiler Generated Types
In some situations (anonymous types and LINQ spring to mind) the compiler itself will create types. These don't have any relevance to the job at hand, so let's filter them out. This is simply a case of checking to see if a type has the CompilerGeneratedAttribute assigned. Please note that I've not actually done this yet, but it's the next job on the list!
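Since this bit isn't written yet, take the following as a guess at what the filter could look like rather than what the project does. It compares attribute names as strings so that nothing has to be resolved. CompilerGeneratedFilter is a made-up name.

```csharp
using System.Linq;
using Mono.Cecil;

// Hypothetical helper: the name is illustrative, not from the project.
static class CompilerGeneratedFilter
{
    const string AttributeName =
        "System.Runtime.CompilerServices.CompilerGeneratedAttribute";

    // True when the type is marked [CompilerGenerated], as closure
    // classes, iterator state machines and anonymous types are.
    public static bool IsCompilerGenerated(TypeDefinition type)
    {
        return type.CustomAttributes
            .Any(attribute => attribute.AttributeType.FullName == AttributeName);
    }
}
```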
The Code!
As usual all of this is on GitHub under the charming name Spaghetti Detector. Given an assembly, it will enumerate the classes in it and check them for references to other types using the above strategy, until it builds up a map of the module as a whole. It'll happily jump into other assemblies, and by default it filters out the System and Microsoft namespaces, or we'd be here all day. It can also be limited to only chase references to a certain depth.
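To give a flavour of the overall walk, here's a simplified sketch and not the code from the repository. It does a breadth-first traversal from a single root type with a depth limit and the System/Microsoft filter. DependencyMap, Build and the references callback are all hypothetical: the callback stands in for whichever checks you use to list the types a class mentions.

```csharp
using System;
using System.Collections.Generic;
using Mono.Cecil;

// Hypothetical sketch: names and shape are illustrative, not from the project.
static class DependencyMap
{
    // Returns a map from each visited type name to the names it depends on.
    public static Dictionary<string, HashSet<string>> Build(
        TypeDefinition root,
        Func<TypeDefinition, IEnumerable<TypeReference>> references,
        int maxDepth)
    {
        var edges = new Dictionary<string, HashSet<string>>();
        var queue = new Queue<(TypeDefinition Type, int Depth)>();
        queue.Enqueue((root, 0));

        while (queue.Count > 0)
        {
            var (type, depth) = queue.Dequeue();
            if (edges.ContainsKey(type.FullName))
                continue; // already visited, which also breaks cycles

            var targets = new HashSet<string>();
            edges[type.FullName] = targets;

            foreach (var reference in references(type))
            {
                if (IsFiltered(reference))
                    continue;
                targets.Add(reference.FullName);

                if (depth >= maxDepth)
                    continue; // record the edge but don't chase it further

                // Resolve may load another assembly, and can throw if
                // that assembly cannot be found.
                var definition = reference.Resolve();
                if (definition != null)
                    queue.Enqueue((definition, depth + 1));
            }
        }
        return edges;
    }

    // Framework types are skipped so the map stays readable.
    static bool IsFiltered(TypeReference type)
    {
        return type.FullName.StartsWith("System.", StringComparison.Ordinal)
            || type.FullName.StartsWith("Microsoft.", StringComparison.Ordinal);
    }
}
```

The visited check doubles as cycle protection, which matters because A referencing B referencing A is entirely normal.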
Finally, if the JSON output of the sample application seems a little unusual it's because I'm serialising into a format designed to be loaded by another piece of code I wrote around the same time: a force-based graph renderer in JavaScript. Next post (once I stop playing Tribes Ascend ;-) will probably be exploring some of the implementation details of that, since it kept me amused for a while and got me started with the Raphaël SVG JavaScript library.
Thoughts on Cecil
Unless you really get down into the IL of your target code then I don't think you'll see a lot of difference between .NET Reflection and Mono Cecil. They're both well equipped for code inspection. Conceptually I still find Cecil's distinction between a TypeDefinition and a TypeReference a little hard to grasp, but it started to make more sense once I began to think of it as similar to the relationship between classes and objects. I dunno. Maybe I'm just being fick.
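For what it's worth, the way I'd illustrate the distinction in code: a TypeReference is just a name pointing at a type that may live in another assembly, and calling Resolve() on it hands back the TypeDefinition with the actual members. That reading is my own assumption about the design, not official wording, and TypeLookup is a made-up name.

```csharp
using Mono.Cecil;

// Hypothetical helper: the name is illustrative, not from the project.
static class TypeLookup
{
    // A reference only knows the name; the definition knows the contents.
    public static string Describe(TypeReference reference)
    {
        // Resolve follows the reference to wherever the type is defined.
        // It can return null, or throw if the assembly can't be located.
        TypeDefinition definition = reference.Resolve();
        if (definition == null)
            return reference.FullName + " (unresolved)";

        return reference.FullName + " is defined in " + definition.Module.Name
            + " with " + definition.Methods.Count + " methods";
    }
}
```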
There doesn't seem to be any documentation, which is a bit hairy for a project of this size and scope, but the community is very active and most of the big questions have already been answered on the forums.
The only genuine problem I ran into was using NUnit with Cecil on 64 bit Win7, which eventually led me to the linked Gist and was all fixed in short order. The VS solution files are right there in the source downloads, so rebuilding using the Microsoft .NET compiler is crazy simple.
To be continued . . .
If anyone knows of a slicker alternative (what I really want is a TypeDefinition.GetReferencedTypes or something) then please drop me a line. In the meantime, hopefully this will save someone else some work. I'm sure there are some more zany cases that I haven't covered. It's bounded by my imagination, and my knowledge of IL is woefully inadequate.
That said, this project has piqued my interest in code analysis (and it has some interesting visualisation opportunities on the side) so expect more!