In my previous post I talked about reducing code pollution by ‘private helper methods’. I think, however, that code pollution comes in different shapes and styles; covering the entire subject in a single post would be absolute madness.

In this article I will shed some light on my thoughts on anonymous event handlers, and when to use them. I will use an ASP.NET Web Forms scenario because it is a notorious generator of event handlers. It’s also a facilitator for piles of horrible, non-generic, non-reusable code in improperly educated hands, but that’s a different story. Anyway, don’t let the scenario scare you off if you haven’t touched Web Forms or ASP.NET whatsoever, because it’s still pure C# we’re talking about here!

I am, however, referring to version 4.0 of the .NET Framework (and higher), by which time the (anonymous) delegates had gone through a significant maturity process. If you’re targeting version 2.0 or higher of the .NET Framework you’re still capable of pulling off the tricks I’m about to demonstrate, but you will have to do some things manually that the EventHandler and Action classes will happily do for you on version 4.0. So, if version 4.0 or higher is not within your grasp, this excellent article might do you some good!

Right, let’s begin! Have you ever encountered code that resembles something like this?

protected void Page_Init(object sender, EventArgs e)
{
   btnReload.Click += btnReload_Click;
   refresh();
}

private void refresh()
{
   grid1.DataBind();
   grid2.DataBind();
}

private void btnReload_Click(object sender, EventArgs e)
{
   refresh();
}

Stop looking for errors; it just works. Also, it does its thing properly, right? But still, why pollute the class body with an event handler method if its sole purpose is nothing more than calling an already defined method? OK, it’s not the same as a ‘helper method’, and implementing event handlers as class members surely isn’t a bad thing. But, come on! This handler does nothing interesting! The meat here is located in the ‘refresh()’-method!

I think there is room for improvement here. So, what about this?

protected void Page_Init(object sender, EventArgs e)
{
   refresh();
   btnReload.Click += new EventHandler((object x_sender, EventArgs x_e) => { refresh(); });
}

private void refresh()
{
   grid1.DataBind();
   grid2.DataBind();
}

That is technically the same implementation, right? Just with less code.

The EventHandler that is instantiated here is a delegate, which of course is nothing but a reference to a method. The method that the EventHandler represents is declared *inside* the delegate instantiation; it does not have an identity such as a method name in the codebase. (It’s anonymous, get it?)
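
By the way, the explicit ‘new EventHandler(…)’ wrapper and the explicitly typed parameters aren’t strictly required; the compiler can infer all of that. Just as a side-by-side illustration (reusing the names from the example above), the terser lambda form and the older C# 2.0 anonymous delegate syntax would look something like this:

// Terser lambda form; the compiler infers the parameter types and
// wraps the lambda in an EventHandler delegate for us.
btnReload.Click += (sender2, args) => refresh();

// Roughly the same thing, written with the C# 2.0-era anonymous delegate syntax.
btnReload.Click += delegate(object sender2, EventArgs args) { refresh(); };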

If you truly want to go crazy, you could also anonymize the ‘refresh()’-method. Like this:

protected void Page_Init(object sender, EventArgs e)
{
   // Scope the anonymous method declaration so that it cannot be used elsewhere in this method.
   {
      Action refresh = new Action(() => { grid1.DataBind(); grid2.DataBind(); });
      refresh();
      btnReload.Click += new EventHandler((object x_sender, EventArgs x_e) => { refresh(); });
   }
}

What I have done here is anonymize the ‘refresh()’-method with the Action-class, which is also just a fancy wrapper for an anonymous delegate implementation. Now, the class is void (pun intended!) of pollution by context-specific class members.

When you write blocks like this, always make sure that you scope the declaration. It makes sure that the ‘refresh()’-method cannot be called anywhere else within the Page_Init event handler. Scoping also puts emphasis on ‘keeping it all together’, which is a nice-to-have upon initial code inspection.

So, when is this pattern justified? I usually go by these rules:

  • Both the ‘refresh’-method and any event handler implementations that call it are declared exclusively in the Page_Init-method, and not anywhere else in the class;
  • The event handler does nothing but call the ‘refresh’-method;
  • The logic that is being executed is so context-specific and trivial that implementing it in a dedicated class would not be worth the effort.

Also, don’t go anonymizing everything everywhere! Just do it where it feels right, such as in specific cases like this. If, for instance, the event handler did not exclusively call the ‘refresh()’-method but also mutated some data elsewhere, then I’d already reconsider the anonymization we’ve done here.
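
To make that concrete with a purely hypothetical sketch (the ‘auditLog’ collection below is made up for the sake of the example): a handler like this does more than delegating to ‘refresh()’, so I’d rather give it a name of its own than bury it inside Page_Init.

// Hypothetical counter-example: this handler also mutates state elsewhere,
// so anonymizing it would hide that side effect inside Page_Init.
btnReload.Click += new EventHandler((object x_sender, EventArgs x_e) =>
{
   auditLog.Add("Grids reloaded at " + DateTime.Now); // extra side effect
   refresh();
});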

Try to use it wisely and experiment with it, you’ll be glad you did!

Choosing the right accessibility for class members such as properties or methods is usually a no-brainer. You start off from private and work your way up to protected, internal, protected internal and public; whatever suits your needs. And of course, there are some gotchas, such as never exposing your fields publicly. I rarely encounter code written by developers who have issues with this concept, but what I do encounter a lot is the abundant declaration of private ‘helper methods’. I recognize them by a pattern that more or less looks like this:

public sealed class SomeKindOfProcessor
{
   public void DoStuff(int p)
   {
      string str = doSomething(p);
      DateTime dt = workSomeMore(str);
      phaseThree(dt);
   }

   private static string doSomething(int p) { /* Complex junk... */ }
   private static DateTime workSomeMore(string p) { /* Complex junk... */ }
   private static void phaseThree(DateTime p) { /* Complex junk... */ }
}

I’m not very surprised to encounter this type of code in (mostly legacy) code bases; classes like these are often created by folks who are fresh out of procedural programming languages and still need to find their way into the Object Oriented Programming paradigm. I’m sure that that’s a probable cause, because I used to be one of ’em! The whole idea behind ‘everything is an object’, ‘closed to modification, open for extension’ and ‘program to an interface, not an implementation’ is hard to grasp at that point, and it’s easy enough to bring procedural programming to C# whilst turning a completely blind eye to its OO-features.

But when you inherit code like this, you’re still stuck with it. And it’s very likely that you can’t just embark on a project in which you treat these ‘god objects’ to a proper Object Oriented implementation. So how do you clean up this pollution without breaking anything?

We’ll get onto that in a minute. First, let’s examine the problem a bit deeper. I consider a method a ‘helper’ if it conforms to the following characteristics:

  • It’s marked private or protected;
  • It does not change the state of the object it is a member of.

Concerning the latter characteristic: if the developer was really well-intentioned, he’d have made the helper method static, but more often than not I come across instance helper methods.

These helper methods are often written with the best intentions; mostly to provide reusable code, but sometimes also just to reduce the length of the method that calls them. Especially the latter type of method usually fits in just a single context: the one which caused the method to be written in the first place.

For me, the biggest issue with this type of method is that it pollutes the class body. Upon first inspection of the class members, I get to see three methods which appear to serve some generic purpose, but in reality they are bound to a very specific context (and worthless anywhere else).

So, what options do you have? The most logical step of course is to write a new class, which merely consists of the three helper methods.

internal static class DoStuffHelper
{
   public static string DoSomething(int p) { /* Complex junk... */ }
   public static DateTime WorkSomeMore(string p) { /* Complex junk... */ }
   public static void PhaseThree(DateTime p) { /* Complex junk... */ }
}

public sealed class SomeKindOfProcessor
{
   public void DoStuff(int p)
   {
      string str = DoStuffHelper.DoSomething(p);
      DateTime dt = DoStuffHelper.WorkSomeMore(str);
      DoStuffHelper.PhaseThree(dt);
   }
}

That’s what it says in the book, right? Write classes for everything! Yes, but to me, this is not a great solution either. Instead of polluting the ‘SomeKindOfProcessor’-class with context-specific methods, we have now polluted the assembly with a context-specific class.

Luckily, C# supports nested types, which allows us to write something like this:

public sealed class SomeKindOfProcessor
{
   public void DoStuff(int p)
   {
      string str = DoStuffHelper.DoSomething(p);
      DateTime dt = DoStuffHelper.WorkSomeMore(str);
      DoStuffHelper.PhaseThree(dt);
   }

   private static class DoStuffHelper
   {
      public static string DoSomething(int p) { /* Complex junk... */ }
      public static DateTime WorkSomeMore(string p) { /* Complex junk... */ }
      public static void PhaseThree(DateTime p) { /* Complex junk... */ }
   }
}

This is a great solution, because the context-specific helper methods are:

  • …grouped by context which is described by the nested class;
  • …unavailable anywhere outside the class that owns the nested class.

This will at least clean up the body of the ‘SomeKindOfProcessor’-class. You still need to investigate what further actions are necessary to improve the code at this stage, but at least you’ve got a means to organize things without touching a single line of business logic.

There is a bonus involved with this strategy: as soon as you discover a purpose for the ‘DoStuffHelper’-class *outside* of the ‘SomeKindOfProcessor’-class, all you need to do is raise its accessibility level and move it outside the ‘SomeKindOfProcessor’-class. After all, once it is being used in different scenarios it is no longer a context-specific class.

I stumbled across an interesting situation whilst working with generics in C#:

// This class will not compile, because the type 
// parameter TType is not exclusively constrained 
// to reference types. NULL is not valid for value 
// types. 
// The compiler interprets this as a violation.
public class GenericInstanceGenerator<TType> where TType : new()
{
   public TType Generate(bool generateNull)
   {
      if (generateNull)
         return null; // <- This will not compile.
      else
         return new TType();
   }
}

The compiler does not allow returning NULL here because it is unknown whether TType is a value type or a reference type. Value types cannot be NULL, so the generic nature of TType would be broken if this compiled. One solution is to introduce an additional constraint on the TType parameter, indicating that we’re dealing with a reference type:

// This piece of code compiles. The TType type parameter has been 
// constrained to class types. (In other words: reference types)
// This works, but in this state the class can not be used with 
// value types.
public class GenericInstanceGenerator<TType>
   where TType : class, new()
{
   public TType Generate(bool generateNull)
   {
      if (generateNull)
         return null;
      else
         return new TType();
   }
}

This is a viable solution, but now you have lost support for value types (structs) such as integers and booleans. This might defeat the purpose of your type parameter usage in the first place! So, do we have any options left? The default-keyword comes to the rescue!

// This is where the ‘default’-keyword comes in. 
// The class now supports initialization of default 
// values for any type parameter that is passed on. 
// Note that some names have been changed; we don’t 
// have a ‘generateNull’-parameter anymore because 
// that doesn’t cover the case with value types.
public class GenericInstanceGenerator<TType> 
   where TType : new()
{
   public TType Generate(bool generateDefault)
   {
      if (generateDefault)
         return default(TType);
      else
         return new TType();
   }
}

So why does this build? The default-keyword here produces a value of the TType type in its uninitialized (default) state. If TType is a reference type, a NULL reference is returned. But if TType is, for instance, a numeric or boolean type (which are value types, not reference types) then it respectively returns 0 or false.

If that doesn’t make sense to you, let’s compare the output of the GenericInstanceGenerator-class with uninitialized variables.

// [Using GenericInstanceGenerator with value types]
int intOne; // <- Not initialized; as a field it would default to 0 (an unassigned local can't be read).
int intTwo = new GenericInstanceGenerator<int>().Generate(true); // <- Initialized with 'default'. The value is 0.
int intThree = new GenericInstanceGenerator<int>().Generate(false); // <- Initialized with the default constructor. The value is 0.

// [Using GenericInstanceGenerator with reference types]
object objectOne; // <- Not initialized; as a field it would default to NULL.
object objectTwo = new GenericInstanceGenerator<object>().Generate(true); // <- Initialized with 'default'. The value is NULL.
object objectThree = new GenericInstanceGenerator<object>().Generate(false); // <- Initialized with the default constructor. The value is *not* NULL; it's a reference to an instance of the 'object'-class.

As you can see, default produces a value of TType that is equal to declaring (but not initializing) a variable of TType. And that’s actually what I *really* wanted to do with the class I pictured in the first image of this post.
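
If you want to convince yourself of that equivalence, here’s a quick standalone check (not part of the original example, but easy to reproduce) showing that ‘default(T)’ and ‘new T()’ agree for value types and disagree for reference types:

// For value types, 'default(T)' and the parameterless constructor produce the
// same (zero-initialized) value; for reference types, 'default(T)' is null
// while 'new T()' creates an actual instance.
Console.WriteLine(default(int) == new int());           // True  (both are 0)
Console.WriteLine(default(DateTime) == new DateTime()); // True  (both are 01-01-0001 00:00:00)
Console.WriteLine(default(object) == new object());     // False (null versus a real instance)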

In an ideal situation, a record stored in a table represents a single fact in the real world. Unfortunately, as long as relational databases have to deal with human data input they also need to deal with human mistakes, such as duplicate records. Consider the following table:

Presenting the ‘Person’-table. It has two unique indexes: the primary key on ‘PersonId’, and an additional unique index on ‘Email’.

In this table, named ‘Person’, we introduce a unique index based on the ‘Email’-column. This index prevents the introduction of multiple ‘Person’-records that use the same e-mail address. Applying a unique index on an ‘Email’-column is a pretty solid defense mechanism against multiple records that represent the same real-world entity, also known as duplicate records. In this case, it’s a good barrier against multiple ‘Person’-records that represent the same person. It’s obviously not perfect, but that’s a different story.

There’s nothing wrong with this solution. But there’s a catch: what if countless ‘Person’-records are about to be imported from an external system, and they lack e-mail data? You could try turning the ‘Email’-column into a nullable column but that won’t get you anywhere. NULL still counts as a value within the unique index (in SQL Server, at least), so attempts at inserting multiple records with a NULL value in the ‘Email’-column will fail due to violation of the unique index.

Technically speaking, with the single table-approach, there are two possible solutions:

  • Drop the unique index;
  • Discard all records that have no e-mail address.

You might find the latter solution acceptable in certain scenarios but most developers would probably go with the former. At least, they would if they are not aware of the fact that there’s another way out:

An ‘attribute table’-approach: the ‘Person’-table is now accompanied by the ‘PersonEmail’-table. A record in ‘Person’ may be accompanied by a single ‘PersonEmail’-record. The ‘PersonEmail’-table gets two unique indexes: the ‘PersonId’-based primary key, and an additional unique index on ‘Email’.

There you go. The ‘PersonEmail’ table will have a unique index on its ‘Email’-column, enforcing the uniqueness of any e-mail address associated with a person. The primary key ‘PersonId’ in the ‘PersonEmail’-table will make sure that a ‘Person’-record can only be represented once by a ‘PersonEmail’-record. Persons that don’t have an e-mail address will simply not be represented by a record in the ‘PersonEmail’-table.

In other words: goal achieved. That wraps it up for the relational model! It’s obvious that in the object model, ‘Email’ can still be represented as a property in a hypothetical ‘Person’ data transfer class:

In the object model, representing the ‘attribute table’-approach can be executed easily in a single data transfer class.
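
Since the original image isn’t reproduced here, a rough sketch of what I mean (the property names are merely illustrative):

// Illustrative sketch: one 'Person' data transfer class. The e-mail address
// simply remains null for persons without a 'PersonEmail'-record.
public sealed class Person
{
   public int PersonId { get; set; }
   public string Name { get; set; }
   public string Email { get; set; } // null when no 'PersonEmail'-record exists
}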

If that’s not acceptable to you, you can also mirror the relational model explicitly and give ‘PersonEmail’ its own data transfer class.

If you’re dealing with an object-oriented language that doesn’t implement nullable strings, or you feel that the ‘attribute table’-approach should be represented explicitly in the object model, nothing is holding you back from being honest about it and implementing two data transfer classes.
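
Again purely as an illustrative sketch (not taken from the original image), that explicit representation could look like this:

// Illustrative sketch: mirror the relational model with two data transfer classes.
public sealed class Person
{
   public int PersonId { get; set; }
   public string Name { get; set; }
}

public sealed class PersonEmail
{
   public int PersonId { get; set; } // primary key, also identifies the 'Person'-record
   public string Email { get; set; } // covered by the unique index
}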

Obviously, the scenario I’m discussing here is just an example. The same solution could be applied to other columns that belong in indexes, such as:

  • Telephone numbers;
  • Zipcodes;
  • Social security numbers;
  • (Possibly) anything else found in table indexes.

On a final note: try to avoid nullable columns in the relational model as much as you can. Nullable equals optional, which means you have to cover both the case in which the data is present and the case in which it is absent. That usually leads to more complexity in whatever business logic you’re implementing on top of your model.

In business applications that are backed by relational databases, chances are that the object model is not exactly the same as its underlying relational model. If you are a developer, you might have stumbled across this statement. But what does it really mean? In my brief career as a software engineer it has taken me quite some time to understand where the two models differ, and sometimes I had to learn it the hard way.

In this article I’m going to share my findings on the usage of boolean properties in data transfer classes in the object model, and how blindly mirroring them in the relational model can sometimes lead to difficulties. Of course, I will also present a solution for this.

Let me start out with the basic idea of tackling the object-model-versus-relational-model ‘problem’. I presume you are already familiar with this concept, but let me sum it up for context’s sake.

When a new data entity is to be introduced to an application, it usually starts out with the definition of its data structure. For example: if I want to work with personal information such as names, birthdates and/or addresses I simply need to introduce an entity called ‘Person’. It has to be represented by a table with a corresponding set of columns.

A table called ‘person’ and a data transfer class which can be used to represent records in the aforementioned table.

In order to work with (data transfer) object representations of records in this table, it’s easy enough to introduce a new class to the codebase. Every column in the table is represented by an equally named and typed property in its representing class. The class is instantiated and consumed by some kind of data access layer for some CRUD-operations and that’s pretty much it. For most of us.
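
Since the image isn’t reproduced here, a minimal sketch of such a mirrored pair (the column and property names are purely illustrative):

// Illustrative sketch: every column of the hypothetical 'Person'-table is
// mirrored by an equally named and typed property.
public sealed class Person
{
   public int PersonId { get; set; }       // column 'PersonId' (primary key)
   public string FirstName { get; set; }   // column 'FirstName'
   public string LastName { get; set; }    // column 'LastName'
   public DateTime BirthDate { get; set; } // column 'BirthDate'
}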

To some developers the whole object-model-versus-relational-model issue may appear to be resolved at this stage. It certainly is in most cases, but I have been fortunate enough to stumble across situations in which this solution won’t suffice.

Consider the following situation: In the object model, a change occurs which involves the addition of a boolean property called ‘IsManager’ to the ‘Person’-class:

A change caused the introduction of the ‘IsManager’-property to the ‘Person’ data transfer class.

It’s very tempting to mirror the changes made in the object model into the relational model like this:

Mirroring the change in the data transfer class is the easiest thing to do, but it’s not necessarily the best.

…but this introduces a new problem: what if a new entity, called ‘ManagementReport’, is introduced in the object model and you want to exclude the possibility of having ‘ManagementReport’-records in the database that are linked to non-managers?

In an entity relation diagram, the business rule ‘ManagementReports only belong to managers’ will not be directly visible.

To prevent ‘ManagementReport’-records from pointing to ‘non-managers’ in the database, the following solutions may seem obvious:

  • Write more business logic and validation in the codebase;
  • Write a set of check constraints in the database.

Both solutions are bad news: more code equals more bugfixing and additional maintenance headaches. But did you know that there’s a third option, one that not only enforces the business rule we’ve just specified surrounding the ‘ManagementReport’-entity, but also explains it when the relational model is expressed in an entity relation diagram?

If you’re not aware of this third option, then this article is definitely for you.

I promised that the third solution would explain itself in an entity relation diagram, so without further ado:

‘Managers’ are data attributes of ‘Persons’, but *not* data objects themselves.

Note the absence of the ‘IsManager’-bit column. ‘IsManager’ may still be expressed as a boolean value in the object model, but in the relational model it is based on whether a ‘Person’-record has its primary key represented in the ‘Manager’-table. Also note that the ‘Manager’-table does not have an identity column (or surrogate key); records in this table are simply not allowed to exist unless they represent a record in the ‘Person’-table.
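
To make that a bit more tangible, here is a minimal mapping sketch (the helper and its parameters are hypothetical, assuming a ‘Person’ class that still carries the ‘IsManager’-property in the object model):

// Illustrative sketch: 'IsManager' is not stored as a bit column; it is derived
// from whether the person's primary key appears in the 'Manager'-table.
public static Person ToPerson(int personId, string name, ISet<int> managerPersonIds)
{
   return new Person
   {
      PersonId = personId,
      Name = name,
      IsManager = managerPersonIds.Contains(personId) // 'managerPersonIds' holds the primary keys found in the 'Manager'-table
   };
}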

And that’s exactly what we wanted in the first place: ‘Manager’ is an attribute, not an entity by itself.

I think that lessons like these on relational modeling should be standard material for most developers, but in my experience this isn’t always the case. I can’t blame them: there are some (ORM) tools out there that completely disregard these features of relational modeling.

Whenever I come across situations in which entities are assigned special attributes, I usually look into solutions like this. Attributes such as ‘IsManager’ on an entity named ‘Person’ are usually bound to involve special logic when it comes down to relational data.

It’s also not always applicable or necessary to implement this type of modeling technique. But if you’re specifically targeting a relational database with your application and you are concerned about data validation and code maintainability then this might be a suitable solution.