Tuesday, August 27, 2013

C# 4.0's New Features Explained

Introduction

The Beta for Visual Studio 2010 is upon us and included is the CTP of C# 4.0. While C# 4.0 does not represent a radical departure from the previous version, there are some key features that should be understood thoroughly in order to take advantage of their true potential.

Background

The white paper for C# 4.0's features does a good job of explaining the changes in the language. I thought, however, that some larger code samples and historical perspective would help people (especially new developers) in understanding why things have changed.

Feature Categories

Microsoft breaks the new features into the following four categories so I will maintain the pattern:
  • Named and Optional Parameters
  • Dynamic Support
  • Variance
  • COM Interop

Conventions

Some of the examples assume the following classes are defined:
public class Person
{
    public string FirstName { get; set; }
    public string LastName { get; set; }
}

public class Customer : Person
{
    public int CustomerId { get; set; }
    public void Process() { ... }
}

public class SalesRep : Person
{
    public int SalesRepId { get; set; }
    public void SellStuff() { ... }
}

Named and Optional Parameters

We'll start off with one of the easier features to explain. In fact, if you have ever used Visual Basic, then you are probably already familiar with it.

Optional Parameters

Support for optional parameters allows you to give a method parameter a default value so that you do not have to specify it every time you call the method. This comes in handy when you have overloaded methods that are chained together.

The Old Way

public void Process( string data )
{
    Process( data, false );
}

public void Process( string data, bool ignoreWS )
{
    Process( data, ignoreWS, null );
}

public void Process( string data, bool ignoreWS, ArrayList moreData )
{
    // Actual work done here
}
The reason for overloading Process in this way is to avoid always having to include "false, null" in the third method call. Suppose 99% of the time there will not be 'moreData' provided. It seems ridiculous to type and pass null so many times.
// These 3 calls are equivalent
Process( "foo", false, null );
Process( "foo", false );
Process( "foo" );

The New Way

public void Process( string data, bool ignoreWS = false, ArrayList moreData = null )
{
    // Actual work done here
}
// Note: data must always be provided because it does not have a default value
Now we have one method instead of three, but the three ways we called Process above are still valid and still equivalent.
ArrayList myArrayList = new ArrayList();
Process( "foo" ); // valid
Process( "foo", true ); // valid
Process( "foo", false, myArrayList ); // valid
Process( "foo", myArrayList ); // Invalid! See next section
Awesome, one less thing VB programmers can brag about having to themselves. I haven't mentioned it up to this point, but Microsoft has explicitly declared that VB and C# will be "co-evolving" so the number of disparate features is guaranteed to shrink over time. I would like to think this will render the VB vs. C# question moot, but I'm sure people will still find a way to argue about it. ;-)

Named Parameters

In the last example, we saw that the following call was invalid:
Process( "foo", myArrayList ); // Invalid!
But if the boolean ignoreWS is optional, why can't we just omit it? Readability and maintainability are one reason, but the main one is that it can become impossible to know which parameter you are specifying. If you had two parameters of the same type, or if one of the parameters was "object" or some other base class or interface, the compiler would not know which parameter you are sending. Imagine a method with ten optional parameters and you give it a single ArrayList. Since an ArrayList is also an object, an IList, and an IEnumerable, it is impossible to determine how to use it. Yes, the compiler could just pick the first valid option for each parameter (or a more complex system could be used), but this would become impossible for people to maintain and would cause countless programming mistakes.
Named parameters provide the solution:
ArrayList myArrayList = new ArrayList();
Process( "foo", true ); // valid, moreData omitted
Process( "foo", true, myArrayList ); // valid
Process( "foo", moreData: myArrayList); // valid, ignoreWS omitted
Process( "foo", moreData: myArrayList, ignoreWS: false ); // valid, but silly
As long as a parameter has a default value, it can be omitted, and you can just supply the parameters you want via their name. Note in the second line above, the 'true' value for ignoreWS did not have to be named since it is the next logical parameter.
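To see the two features working together, here is a minimal runnable sketch. The Process method below is a hypothetical stand-in for the one in the article: rather than doing any real work, it just reports which values it received, so you can see what each call style actually passes.

```csharp
using System;
using System.Collections;

public static class OptionalDemo
{
    // Hypothetical stand-in for the article's Process method. It only reports
    // the argument values it was called with.
    public static string Process(string data, bool ignoreWS = false, ArrayList moreData = null)
    {
        return string.Format("data={0}, ignoreWS={1}, moreData={2}",
            data, ignoreWS, moreData == null ? "null" : moreData.Count + " item(s)");
    }

    public static void Main()
    {
        ArrayList extras = new ArrayList { 1, 2 };

        Console.WriteLine(Process("foo"));                    // both defaults used
        Console.WriteLine(Process("foo", true));              // positional, moreData defaulted
        Console.WriteLine(Process("foo", moreData: extras));  // ignoreWS skipped by name
        // Once every argument is named, the order no longer matters.
        Console.WriteLine(Process(moreData: extras, data: "foo", ignoreWS: true));
    }
}
```

Note that positional arguments still have to come first; named arguments are what let you skip over an optional parameter in the middle of the list.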

Dynamic Support

OK, I'm sure we all have had to deal with code similar to the following:
public object GetCustomer()
{
    Customer cust = new Customer();
    ...
    return cust;
}
...
Customer cust = GetCustomer() as Customer;
if( cust != null )
{
    cust.FirstName = "foo";
}
Note the GetCustomer method returns object instead of Customer. Code like this is frustrating because you know it returns a Customer; it always has and it always will. Unfortunately, the coder chose to return object and you can't change it because it modifies the public contract and could potentially break legacy software.
Another instance in which you will be dealing with an object that you know is another type is Reflection.
Type myType = typeof( Customer );
ConstructorInfo consInfo = myType.GetConstructor(new Type[]{});
object cust = consInfo.Invoke(new object[]{});
((Customer)cust).FirstName = "foo";
Because Reflection can act on any type, ConstructorInfo.Invoke() must return object. Like the first example, this forces you to cast the object. Now, consider the situation where you can't, or don't want to, cast the object. Perhaps, the code author is always changing the name of the type or creating different versions (e.g., 'Customer2'), but the properties and methods stay the same. The examples above assume you, as the programmer, have knowledge of what the true type is. What if you didn't? What if you had to use Reflection to find and invoke methods? What if the object being returned was coming from IronPython, JavaScript, COM, or some other non-statically typed environment?

Enter 'dynamic'

The dynamic keyword is new to C# 4.0, and is used to tell the compiler that a variable's type can change or that it is not known until runtime. Think of it as being able to interact with an Object without having to cast it.

dynamic cust = GetCustomer();
cust.FirstName = "foo"; // works as expected
cust.Process(); // works as expected
cust.MissingMethod(); // No method found!
Notice we did not need to cast nor declare cust as type Customer. Because we declared it dynamic, the runtime takes over and then searches and sets the FirstName property for us. Now, of course, when you are using a dynamic variable, you are giving up compiler type checking. This means the call cust.MissingMethod() will compile and not fail until runtime. The result of this operation is a RuntimeBinderException because MissingMethod is not defined on the Customer class.
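For comparison, here is a small sketch that sets the same property twice, once through Reflection and once through dynamic, and then shows the missing-method failure being caught. The Customer class is trimmed down to one property, and the helper method names are made up for this illustration. It assumes the project references the Microsoft.CSharp assembly, which is where RuntimeBinderException lives.

```csharp
using System;
using System.Reflection;
using Microsoft.CSharp.RuntimeBinder;

public class Customer
{
    public string FirstName { get; set; }
}

public static class DynamicDemo
{
    // Mirrors the legacy method in the article: the static return type hides the real type.
    public static object GetCustomer()
    {
        return new Customer();
    }

    // Reflection: look the property up by name and set it by hand.
    public static string SetNameViaReflection(object target, string name)
    {
        PropertyInfo prop = target.GetType().GetProperty("FirstName");
        prop.SetValue(target, name, null);
        return (string)prop.GetValue(target, null);
    }

    // dynamic: the same late-bound lookup, written as ordinary member access.
    public static string SetNameViaDynamic(object target, string name)
    {
        dynamic d = target;
        d.FirstName = name;
        return d.FirstName;
    }

    // Returns true when calling a member that does not exist fails at run time.
    public static bool MissingMethodThrows(object target)
    {
        dynamic d = target;
        try
        {
            d.MissingMethod();   // compiles; only resolved when this line executes
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }

    public static void Main()
    {
        Console.WriteLine(SetNameViaReflection(GetCustomer(), "foo"));
        Console.WriteLine(SetNameViaDynamic(GetCustomer(), "bar"));
        Console.WriteLine(MissingMethodThrows(GetCustomer()));
    }
}
```

Both helpers end up doing a lookup by name at run time; the dynamic version simply lets the compiler generate that plumbing for you.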
The example above shows how dynamic works when calling methods and properties. Another powerful (and potentially dangerous) feature is being able to reuse variables for different types of data. I'm sure the Python, Ruby, and Perl programmers out there can think of a million ways to take advantage of this, but I've been using C# so long that it just feels "wrong" to me.
dynamic foo = 123;
foo = "bar";
OK, so you most likely will not be writing code like the above very often. There may be times, however, when variable reuse can come in handy or clean up a dirty piece of legacy code. One simple case I run into often is constantly having to cast between decimal and double.
decimal foo = GetDecimalValue();
foo = foo / 2.5; // Does not compile
foo = Math.Sqrt(foo); // Does not compile
string bar = foo.ToString("c");
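The second line does not compile because 2.5 is typed as a double, and the third does not compile because Math.Sqrt expects a double. Obviously, all you have to do is cast and/or change your variable type, but there may be situations where dynamic makes sense to use. Be aware that dynamic defers the usual type rules to run time rather than relaxing them: dividing a decimal by a double is still an error, only now it surfaces as a RuntimeBinderException instead of a compiler error. One explicit conversion is therefore still needed, but after that the same variable can simply hold the double.
dynamic foo = GetDecimalValue(); // still returns a decimal
foo = (double)foo / 2.5; // explicit conversion; foo now holds a double
foo = Math.Sqrt(foo); // bound at run time to Math.Sqrt(double)
string bar = foo.ToString("c");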

Update

After some great questions and feedback, I realized I need to clarify a couple points I made above. When you use the dynamic keyword, you are invoking the new Dynamic Language Runtime libraries (DLR) in the .NET framework. There is plenty of information about the DLR out there, and I am not covering it in this article. Also, when possible, you should always cast your objects and take advantage of type checking. The examples above were meant to show how dynamic works and how you can create an example to test it. Over time, I'm sure best practices will emerge; I am making no attempt to create recommendations on the use of the DLR or dynamic.
Also, since publishing the initial version of this article, I have learned that if the object you declared as dynamic is a plain CLR object, Reflection will be used to locate members and not the DLR. Again, I am not attempting to make a deep dive into this subject, so please check other information sources if this interests you.

Switching Between Static and Dynamic

It should be apparent that 'switching' an object from being statically typed to dynamic is easy. After all, how hard is it to 'lose' information? Well, it turns out that going from dynamic to static is just as easy.
Customer cust = new Customer();
dynamic dynCust = cust; // static to dynamic, easy enough
dynCust.FirstName = "foo";
Customer newCustRef = dynCust; // Works because dynCust is a Customer
Person person = dynCust; // works because Customer inherits from Person
SalesRep rep = dynCust; // throws RuntimeBinderException
Note that in the example above, no matter how many different ways we reference it, we only have one Customer object (cust).
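If you want to try this yourself, the following sketch wraps the conversions in two small helper methods (the method names are invented for the illustration). It checks that every reference really points at the same object, and that the bad conversion fails at run time rather than at compile time.

```csharp
using System;
using Microsoft.CSharp.RuntimeBinder;

public class Person { public string FirstName { get; set; } }
public class Customer : Person { public int CustomerId { get; set; } }
public class SalesRep : Person { public int SalesRepId { get; set; } }

public static class ConversionDemo
{
    // Attempts the dynamic-to-static assignment and reports whether it worked.
    public static bool ConvertsToSalesRep(dynamic value)
    {
        try
        {
            SalesRep rep = value;   // implicit conversion, checked at run time
            return rep != null;
        }
        catch (RuntimeBinderException)
        {
            return false;           // the runtime type is not a SalesRep
        }
    }

    // Static -> dynamic -> static again: all references share one object.
    public static bool AllReferencesShareOneObject()
    {
        Customer cust = new Customer();
        dynamic dynCust = cust;
        dynCust.FirstName = "foo";
        Customer newCustRef = dynCust;
        Person person = dynCust;
        return object.ReferenceEquals(cust, newCustRef)
            && object.ReferenceEquals(cust, person)
            && cust.FirstName == "foo";
    }

    public static void Main()
    {
        Console.WriteLine(AllReferencesShareOneObject());
        Console.WriteLine(ConvertsToSalesRep(new SalesRep()));
        Console.WriteLine(ConvertsToSalesRep(new Customer()));
    }
}
```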

Functions

When you return something from a dynamic function call, indexer, etc., the result is always dynamic. Note that you can, of course, cast the result to a known type, but the object still starts out dynamic.

dynamic cust = GetCustomer();
string first = cust.FirstName; // conversion occurs
dynamic id = cust.CustomerId; // no conversion
object last = cust.LastName; //conversion occurs
There are, of course, a few missing features when it comes to dynamic types. Among them are:
  • Extension methods are not supported
  • Anonymous functions cannot be used as parameters
We will have to wait for the final version to see what other features get added or removed.
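Both limitations have simple workarounds, sketched below under the assumption that the dynamic value is really a List<int> (the helper names are made up). Casting back to a static type restores extension-method lookup, and giving the lambda an explicit delegate type lets it be passed to a dynamic call.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.CSharp.RuntimeBinder;

public static class DynamicLimitsDemo
{
    // Extension-method syntax on a dynamic receiver compiles but fails at run time,
    // because the binder only considers members declared on List<int> itself.
    public static bool ExtensionSyntaxThrows(dynamic numbers)
    {
        try
        {
            numbers.First();
            return false;
        }
        catch (RuntimeBinderException)
        {
            return true;
        }
    }

    // Workaround 1: cast to a static type, then call the extension method normally.
    public static int FirstViaCast(dynamic numbers)
    {
        return ((IEnumerable<int>)numbers).First();
    }

    // Workaround 2: a bare lambda is a compile error here, so give it a delegate type.
    public static int FindViaTypedLambda(dynamic numbers)
    {
        return numbers.Find((Predicate<int>)(n => n > 1));
    }

    public static void Main()
    {
        List<int> numbers = new List<int> { 1, 2, 3 };
        Console.WriteLine(ExtensionSyntaxThrows(numbers));
        Console.WriteLine(FirstViaCast(numbers));
        Console.WriteLine(FindViaTypedLambda(numbers));
    }
}
```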

Variance

OK, a quick quiz. Is the following legal in .NET?
// Example stolen from the whitepaper  ;-)
IList<string> strings = new List<string>();
IList<object> objects = strings;
I think most of us, at first, would answer 'yes' because a string is an object. But the question we should be asking ourselves is: Is a -list- of strings a -list- of objects? To take it further: Is a -strongly typed- list of strings a -strongly typed- list of objects? When phrased that way, it's easier to understand why the answer to the question is 'no'. If the above example were legal, the following line would also compile:
objects.Add(123);
Oops, we just inserted the integer value 123 into a List<string>. Remember, the list contents were never copied; we simply have two references to the same list. There is one case, however, in which this kind of conversion should be allowed. If the list is read-only, then we should be allowed to view the contents any (type legal) way we want.

Co and Contra Variance

From Wikipedia:
Within the type system of a programming language, a type conversion operator is:
  • covariant if it preserves the ordering, ≤, of types, which orders types from more specific to more generic;
  • contravariant if it reverses this ordering, which orders types from more generic to more specific;
  • invariant if neither of these apply.
Ordinary reference conversions in C# are, of course, covariant, meaning a Customer is a Person and can always be referenced as one. There are lots of discussions on this topic, and I will not cover it here. The changes in C# 4.0 only involve typed (generic) interfaces and delegates in situations like in the example above. In order to support co- and contravariance, typed interfaces are going to be given 'input' and 'output' sides. So, to make the example above legal, IList must be declared in the following manner:
public interface IList<out T> : ICollection<T>, IEnumerable<T>, IEnumerable
{
    ...
}
Notice the use of the out keyword. This is essentially saying the IList is readonly and it is safe to refer to an IList<string> as an IList<object>. Now, of course, IList is not going to be defined this way; it must support having items added to it. A better example to consider is IEnumerable which should be, and is, readonly.
public interface IEnumerable<out T> : IEnumerable
{
    IEnumerator<T> GetEnumerator();
}
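Here is a minimal sketch of what the out modifier on IEnumerable buys you in practice. The trimmed-down Person and Customer classes and the JoinNames helper are assumptions made for the illustration; the conversion itself requires a C# 4.0 (or later) compiler targeting .NET 4.0 or later.

```csharp
using System;
using System.Collections.Generic;

public class Person { public string FirstName { get; set; } }
public class Customer : Person { }

public static class CovarianceDemo
{
    // Accepts any sequence of Person. It only reads from the sequence,
    // which is exactly what makes the covariant conversion safe.
    public static string JoinNames(IEnumerable<Person> people)
    {
        List<string> names = new List<string>();
        foreach (Person p in people)
        {
            names.Add(p.FirstName);
        }
        return string.Join(", ", names);
    }

    public static void Main()
    {
        List<Customer> customers = new List<Customer>
        {
            new Customer { FirstName = "Ann" },
            new Customer { FirstName = "Bob" }
        };

        // IEnumerable<Customer> converts to IEnumerable<Person>; nothing is copied.
        IEnumerable<Person> people = customers;

        Console.WriteLine(JoinNames(people));
        Console.WriteLine(JoinNames(customers));   // the conversion also applies to arguments
        Console.WriteLine(object.ReferenceEquals(people, customers));
    }
}
```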
Using out to basically mean 'read only' is straightforward, but when is using the in keyword to make something 'write only' useful? Well, it actually becomes useful in situations where a generic argument is expected and only used internally by the method. IComparer is the canonical example.
public interface IComparer<in T>
{
    int Compare(T left, T right); // interface members take no access modifier
}
As you can see, we can't get back an item of type T. Even though the Compare method could potentially act on the left and right arguments, it is kept within the method so it is a 'black hole' to clients that use the interface.
To continue the example above, this means that an IComparer<object> can be used in the place of an IComparer<string>. The C# 4.0 whitepaper sums the reason up nicely: 'If a comparer can compare any two objects, it can certainly also compare two strings'. This is counter-intuitive (or maybe contra-intuitive) because if a method expects a string, you can't give it an object.
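A short sketch makes the whitepaper's point concrete. TextLengthComparer is a hypothetical comparer written for this illustration; it can order any two objects, so in C# 4.0 it can be handed to code that wants an IComparer<string>. It assumes neither argument is null.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical comparer: orders any two objects by the length of their ToString() text.
public class TextLengthComparer : IComparer<object>
{
    public int Compare(object left, object right)
    {
        return left.ToString().Length.CompareTo(right.ToString().Length);
    }
}

public static class ContravarianceDemo
{
    public static List<string> SortByLength(IEnumerable<string> words)
    {
        List<string> sorted = new List<string>(words);

        // IComparer<object> converts to IComparer<string> because T is marked 'in'.
        IComparer<string> comparer = new TextLengthComparer();

        sorted.Sort(comparer);   // List<string>.Sort expects an IComparer<string>
        return sorted;
    }

    public static void Main()
    {
        List<string> result = SortByLength(new[] { "ccc", "a", "bb" });
        Console.WriteLine(string.Join(", ", result));
    }
}
```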

Putting it Together

OK, comparing strings and objects is great, but I think a somewhat realistic example might help clarify how the new variance keywords are used. This first example demonstrates the effects of the redefined IEnumerable interface in C# 4.0. In .NET 3.5, line 3 below does not compile, failing with the error: 'cannot convert from List<Customer> to IEnumerable<Person>'. As stated above, this seems 'wrong' because a Customer is a Person. In .NET 4.0, however, this exact same code compiles without any changes because IEnumerable is now defined with the out modifier.
MyInterface<Customer> customers = new MyClass<Customer>();
List<Person> people = new List<Person>();
people.AddRange(customers.GetAllTs()); // no in 3.5, yes in 4.0
people.Add(customers.GetAllTs()[0]); // yes in both
...
interface MyInterface<T>
{
    List<T> GetAllTs();
}
public class MyClass<T> : MyInterface<T>
{
    public List<T> GetAllTs()
    {
        return _data;
    }
    private List<T> _data = new List<T>();
}
This next example demonstrates how you can take advantage of the out keyword. In .NET 3.5, line 3 compiles, but line 4 does not with the same 'cannot convert' error. To make this work in .NET 4.0, simply change the declaration of MyInterface to interface MyInterface<out T>. Notice that in line 4, T is Person, but we are passing the Customer version of the class and interface.
MyInterface<Person> people = new MyClass<Person>();
MyInterface<Customer> customers = new MyClass<Customer>();
FooClass<Person>.GetThirdItem(people);
FooClass<Person>.GetThirdItem(customers);
...
public class FooClass<T>
{
    public static T GetThirdItem(MyInterface<T> foo)
    {
        return foo.GetItemAt(2);
    }
}
public interface MyInterface<out T>
{
    T GetItemAt(int index);
}
public class MyClass<T> : MyInterface<T>
{
    public T GetItemAt(int index)
    {
        return _data[index];
    }
    private List<T> _data = new List<T>();
}
This final example demonstrates the wacky logic of contravariance. Notice that we put a SalesRep 'inside' our Person interface. This isn't a problem because a SalesRep is a Person. Where it gets interesting is when we pass the MyInterface<Person> to FooClass<Customer>. In essence, we have 'inserted' a SalesRep into an interface declared to work with only Customers! In .NET 3.5, line 5 does not compile, as expected. By adding the in keyword to our interface declaration in .NET 4.0, everything works fine because we are 'agreeing' to treat everything as a Person internally and not expose the internal data (which might be that SalesRep).
MyInterface<Customer> customer = new MyClass<Customer>();
MyInterface<Person> person = new MyClass<Person>();
person.SetItem(new SalesRep());
FooClass<Customer>.Process(customer);
FooClass<Customer>.Process(person);
...
public class FooClass<T>
{
    public static void Process(MyInterface<T> obj)
    {
    }
}
public interface MyInterface<in T>
{
    void SetItem(T obj);
    void Copy(T obj);
}
public class MyClass<T> : MyInterface<T>
{
    public void SetItem(T obj)
    {
        _item = obj;
    }
    private T _item;
    public void Copy(T obj)
    {
    }
}
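As mentioned at the start of this section, variance in C# 4.0 covers generic delegates as well as interfaces. The sketch below uses the framework's Func and Action delegates, which are declared with out and in respectively in .NET 4.0; the trimmed-down classes and the method names are assumptions made for the illustration.

```csharp
using System;

public class Person { public string FirstName { get; set; } }
public class Customer : Person { }

public static class DelegateVarianceDemo
{
    public static string LastGreeting;

    public static Customer MakeCustomer()
    {
        return new Customer { FirstName = "Ann" };
    }

    public static void Greet(Person p)
    {
        LastGreeting = "Hello, " + p.FirstName;
    }

    public static string Run()
    {
        // Covariance: something that produces Customers also produces Persons.
        Func<Customer> customerFactory = MakeCustomer;
        Func<Person> personFactory = customerFactory;

        // Contravariance: something that accepts any Person also accepts Customers.
        Action<Person> greetPerson = Greet;
        Action<Customer> greetCustomer = greetPerson;

        greetCustomer(customerFactory());
        return LastGreeting + " / " + (personFactory() is Customer);
    }

    public static void Main()
    {
        Console.WriteLine(Run());
    }
}
```

The pattern is the same as with the interfaces above: out positions only ever hand values back to the caller, and in positions only ever receive them.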

COM Interop

This is by far the area in which I have the least experience; however, I'm sure we have all had to interact with Microsoft Office at one point and make calls like this:
// Code simplified for this example
using System.Reflection; // Missing.Value is defined here
using Microsoft.Office.Interop.Word;

object foo = "MyFile.txt";
object bar = Missing.Value;
object optional = Missing.Value;

Document doc = (Document)Application.GetDocument(ref foo, ref bar, ref optional);
doc.CheckSpelling(ref optional, ref optional, ref optional, ref optional);
There are (at least) three problems with the code above. First, you have to declare all your variables as objects and pass them with the ref keyword. Second, you can't omit parameters and must also pass the Missing.Value even if you are not using the parameter. And third, behind the scenes, you are using huge (in file size) interop assemblies just to make one method call.
C# 4.0 will allow you to write the code above in a much simpler form that ends up looking almost exactly like 'normal' C# code. This is accomplished by using some of the features already discussed; namely dynamic support and optional parameters.
// Again, simplified for example.
using Microsoft.Office.Interop.Word;

var doc = Application.GetDocument("MyFile.txt");
doc.CheckSpelling();
What will also happen behind the scenes is that the interop assembly that is generated will only include the interop code you are actually using in your application. This will cut down on application size tremendously. My apologies in advance for this weak COM example, but I hope it got the point across.
