Thursday 16 October 2008

Programming C++/CLI (VS2005/2008)

Lack of Automation, Plethora of Design Choices

C++/CLI (formerly, and sometimes still, referred to as "Managed C++") is a potent technology for making C++ code accessible from C# client programs. Fortunately or unfortunately, .NET does not have any javah-style tools that can automatically generate managed C++ headers for a given C# source file. So for now, you have to write your headers manually. But Managed C++ also allows much greater flexibility than Java in writing managed/unmanaged interfaces, and thus a greater variety of designs is possible.

The main keywords in C++/CLI are as follows.

  1. ref -- as in "public ref class" declares a managed class or struct.
  2. hat operator (^) -- e.g. in return types such as System::String^, denotes a handle to an object on the managed heap (this is a separate heap maintained by the CLR, which garbage collects it asynchronously).
  3. percent operator -- this denotes a tracking reference and can apply to ValueTypes (e.g. int%) or reference types (e.g. String^%) to denote a modifiable parameter passed by reference.
  4. gcnew -- allocates an object on the managed heap and returns a handle to it, e.g. when returning managed objects: return gcnew String("kill germs"); or, if converting from an STL string, you might do something like this: return gcnew String( stlString.c_str() ).
To compile a C++ program as managed C++ you need to use the /clr compile flag. (You can set compile flags in Project Properties -> Config -> C++ -> Command Line.)
Arguments, Return Types and the Blittability of Data Types: Strings and Arrays are non-blittable!
To make sense to C#, all return types must be CLR types: reference types are returned as handles to the managed heap using the hat operator, while value types such as int or double are returned by value. Inputs, though, can be C data types, e.g. pointers, which are legal in C# provided they are used in an unsafe context and any pointer into a heap-allocated (and hence managed) object is pinned (i.e. the object is not shifted around by the garbage collector). It is a design decision whether to allow unmanaged data as inputs into your managed C++ interface.
Certain data types are blittable across language boundaries; the standard ones for C++/C# are enumerated in this article on blittable and non-blittable types. These include System.Single, System.Double, System.Byte and System.IntPtr but NOT System.String, System.Array or System.Char. STRINGS AND ARRAYS REQUIRE SPECIAL CODE WHEN MOVING BETWEEN C# AND C++!!!
Core C# Keywords for interacting with C++/CLI are as follows:
  1. fixed - used to create pinning pointers. Multiple pointers of the same type can be "fixed" in the same clause, e.g. fixed (byte* src=array1, dest=array2) {...}. Otherwise use nested fixed statements. The effect of "fixed" is similar to GCHandle.Alloc(some_array, GCHandleType.Pinned), except that the pin lasts only for the duration of the block and no handle has to be freed.
  2. GCHandle - see fixed; provides a way to access a managed object from unmanaged memory. Four kinds of GCHandle can be created: a Pinned handle (commonly used), a Normal handle, a Weak handle and a WeakTrackResurrection handle. GCHandle is a value type; the reason for this is that the lifetime of a GCHandle is controlled by Alloc and Free and not by the garbage collector. Don't get too excited though - ARRAYS of GCHandles still need to be created on the managed heap!
Examples
Also, take a look at this MSDN example. Two other fundamental concepts in Managed C++ are pinning pointers (pin_ptr) and interior pointers (interior_ptr). Interior pointers are pointers into the CLI heap which point to managed objects (or members thereof) and thus have a magical "dynamic" property which normal pointers do not have (and do not need), namely tracking objects as they move through the heap. As we will discuss later, care must be taken when converting between interior pointers and pinning pointers!
System.Array class
It is worth knowing this class well, since it is the base class for all arrays in the CLR! Avoid provoking errors like "Add is not a method of System::Array". Also, as we know, arrays do not transmit seamlessly between C# and C++. Let's get down to business. Array is a "ref" class and an "abstract" class - only instantiable in its derived form. Despite this you can't derive from Array directly. If you try you get the error "cannot derive from special class 'System.Array'". Useful methods and properties:
  1. Length (property). 32 bit integer representing the total number of elements across all dimensions. For a jagged array, call it d, the number of elements allocated for the first vector is d[0].Length; for a rectangular array use d.GetLength(0). For a 64 bit count we need to use LongLength.
Multidimensional Arrays, Two Syntaxes
In C# the basic syntax for defining a (rectangular) multidimensional array is as follows:
int[,] myArray = new int[2,2]; myArray[0,0] = 1; myArray[0,1] = 2;
But this is not the only syntax. There is also a syntax for jagged, or non-rectangular, arrays. It looks like C++ (although you can't define arrays in C++ as below, you need subscripts, and brackets following the variable name rather than preceding it, but that's by the by):
double[][] locationCodes = new double[2][]; locationCodes[0] = new double[3];
These jagged arrays can be represented in Managed C++ using the templated form of the Array class: array<array<double>^>^ locationCodes = gcnew array<array<double>^>(num_arrays);
Can I pin a multidimensional array?
The short answer, for an array of arrays, is no. A jagged multidimensional array contains System.Arrays, which are non-blittable types, and .NET DOES NOT ALLOW PINNING OF ARRAYS OF NON-BLITTABLE TYPES (GCHandle will generate a run-time exception). Pinning an array of primitive types is not a problem.
If you pin an array of arrays by a) pinning the subarrays and b) pinning the main array (using a pinning pointer, as GCHandle won't allow pinning of arrays of non-blittable types), your program will compile. However, passing in your multidimensional array from C# to Managed C++ and then forwarding it to a native C++ function (which copies the arrays) may result in a System.AccessViolationException: Attempted to read or write protected memory.
Let's work on the idea of pinning subarrays for a moment. How do we store the pinned pointers? In an array perhaps? An STL vector of pinned pointers will create pointers to pinned pointers, and this is illegal - "pointer to pinned pointer disallowed by CLR". A native array defined using [] is illegal too - since "a native array cannot contain a managed type". A managed array (array<pin_ptr<double>>^) is illegal - "a managed array cannot have this type". Neither can you have an array of references to pin_ptrs, because both pointers and references to pinned pointers are illegal. The bottom line is you can't store pinned pointers in a collection. What you can store, though, are GCHandles. So, to create an array of pinned pointers, the trick is: gcnew an array of GCHandles, use Length to find the number of subarrays in the outer array, loop from i=0 up to that number, and use GCHandle::Alloc with GCHandleType::Pinned to pin the vector at each index (remembering to Free each handle when you are done). You will then need to use pin_ptr to pin the array of pointers (GCHandle won't work since it doesn't pin arrays of non-blittables).
GCHandles and pin_ptr are synergistic. What can a GCHandle do that a pin_ptr cannot do? Answer: GCHandles can be stored in collections. What can a pin_ptr do that a GCHandle cannot do? Answer: a pin_ptr can point to an array of a non-blittable type.
Copying Arrays
Given the complexities of using pin_ptrs and GCHandles, an easier alternative when interfacing with C and C++ functions is to simply copy managed arrays into their unmanaged equivalents. Here Marshal.Copy proves useful. (Marshal is one of the core classes in System.Runtime.InteropServices.)
Marshal.Copy copies data from managed arrays to unmanaged pointers and vice versa. Only problem is - it's designed for one-dimensional arrays. To copy multidimensional arrays into unmanaged memory, you need to write your own custom marshaller. This is easy provided you know how to program in C (knowledge of simple facts like - malloc returns void*, I need to cast it to a pointer to my specific data type e.g. int* or double*). Knowledge of C# is not enough, you need to know BOTH C AND C++ to be an awesome interop specialist. There is no restriction on the use of malloc and free from within managed C++ classes - use them to your advantage.
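To make the custom marshaller idea concrete, here is a minimal sketch of the native half only, written in plain C++ that should also build under /clr. The function name copy_rows_to_native and the demo are made up for illustration; how each source row pointer is obtained from the managed side (for instance from a pinned subarray) is assumed and not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical helper (not part of any library): copies `rows` row vectors of
// `cols` doubles each into one contiguous malloc'd block, laid out row by row.
// Returns NULL if the allocation fails; the caller releases the block with free().
double* copy_rows_to_native(const double* const* src, std::size_t rows, std::size_t cols)
{
    assert(src != NULL);
    // malloc returns void*, so cast it to a pointer to the element type we need.
    double* dest = static_cast<double*>(std::malloc(rows * cols * sizeof(double)));
    if (dest == NULL) {
        return NULL;
    }
    for (std::size_t r = 0; r < rows; ++r) {
        // Each row is copied as an ordinary one-dimensional block.
        std::memcpy(dest + r * cols, src[r], cols * sizeof(double));
    }
    return dest;
}

// Small demonstration: returns true if every checked element survived the copy.
bool demo_copy_rows()
{
    const double row0[3] = { 1.0, 2.0, 3.0 };
    const double row1[3] = { 4.0, 5.0, 6.0 };
    const double* rows[2] = { row0, row1 };

    double* flat = copy_rows_to_native(rows, 2, 3);
    if (flat == NULL) {
        return false;
    }
    const bool ok = flat[0] == 1.0 && flat[2] == 3.0 && flat[3] == 4.0 && flat[5] == 6.0;
    std::free(flat);  // matches the malloc above
    return ok;
}
```

The sketch assumes every row has the same length; a truly jagged array would need a per-row length and a running offset instead of r * cols.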
C++/CLI MindMatter
Herb Sutter's blog - http://blogs.gotdotnet.com/hsutter/. Herb Sutter is the cool dude who brought us GOTW and joined MSFT's Developer and Platform Evangelism Division in 2002 to create more awesome C++ experiences in Visual Studio. Another very good post on malloc, new and custom memory allocation (useful in MC++) can be found here. And another interesting blog post on design choices in interop here.
C++/CLI is also formalised in an ECMA standard, number 372.
C++/CLI Programming Style (typedefs and pointer conversions)
If you decide to use typedefs for managed types, e.g. typedef pin_ptr<const int> pin_constint32; typedef pin_ptr<const long> pin_constlong; the types used for managed and unmanaged code must exactly match the function signatures. For example, a pinned pointer to an int is not equivalent to a pinned pointer to a long, which is not equivalent to a pinned pointer to a const long, which is not equivalent to a pinned pointer to a const long long. For interop to work, matches must be exact... otherwise you will get a lot of errors converting from interior pointers to pinned pointers.
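The same strictness can be seen on the native side. The sketch below is plain C++ and uses static_assert and <type_traits>, which are C++11 features and so newer than VS2005/2008; first_element is a made-up function used only for illustration.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical native function that expects a pointer to const long.
long first_element(const long* data)
{
    return data[0];
}

// Each pair below names two distinct types, even on platforms where
// int and long happen to have the same size.
static_assert(!std::is_same<int, long>::value, "int and long are distinct");
static_assert(!std::is_same<long, const long>::value, "const changes the type");
static_assert(!std::is_same<const long, const long long>::value, "long and long long are distinct");
static_assert(!std::is_same<const int*, const long*>::value, "the pointer types differ too");

// Returns true when the exactly matching pointer type is passed through.
bool demo_exact_match()
{
    const long values[2] = { 7L, 8L };
    // const int* wrong = values;  // would not compile: const long* does not convert to const int*
    assert(first_element(values) == 7L);
    return first_element(values + 1) == 8L;
}
```

Because none of these pointer types convert implicitly into one another, a typedef for a pin_ptr has to name exactly the element type that the native signature expects.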
Differences between C++ and Managed C++ (C++/CLI)
You can't supply default arguments to member functions of a managed type (compiler error C3222).
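One common workaround is to replace the default argument with a forwarding overload. The sketch below shows the pattern in plain C++ so that it can be compiled anywhere; Scaler and Scale are invented names, and in C++/CLI the class would presumably be declared as a ref class.

```cpp
#include <cassert>

// Hypothetical class used only to show the overload pattern.
class Scaler
{
public:
    // Instead of: double Scale(double value, double factor = 2.0);
    // the one-argument overload forwards the default explicitly.
    double Scale(double value) { return Scale(value, 2.0); }

    double Scale(double value, double factor) { return value * factor; }
};

// Returns true if both overloads behave as expected.
bool demo_overloads()
{
    Scaler s;
    assert(s.Scale(3.0) == 6.0);       // default factor applied by the forwarding overload
    return s.Scale(3.0, 10.0) == 30.0; // explicit factor
}
```

Overloads also show up as separate methods to C# callers, so the default stays visible from the client side.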
Care with the Delete Operator
Calling delete twice on the same pointer officially results in undefined behaviour. In a debug build under MSVC it typically results in the message: "Debug Assertion Failed! Expression: _BLOCK_TYPE_IS_VALID(pHead->nBlockUse)". Calling delete on null pointers is safe, though, so it is good practice to set pointers to NULL after delete.
A double delete may happen in a ref class if a destructor and finalizer both delete the same unmanaged pointer.
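A minimal sketch of one way to guard against this: route both cleanup paths through a single function that deletes the pointer and then sets it to NULL. This is a plain C++ model rather than a real ref class; the member function Finalize merely stands in for a C++/CLI finalizer, and all names are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Plain C++ model of a class that owns an unmanaged buffer.
class NativeBufferOwner
{
public:
    NativeBufferOwner() : buffer_(new int[4]), releaseCount_(0) {}

    ~NativeBufferOwner() { Release(); }  // stands in for the destructor path
    void Finalize() { Release(); }       // stands in for the finalizer path

    int ReleaseCount() const { return releaseCount_; }

private:
    // Single cleanup routine shared by both paths.
    void Release()
    {
        if (buffer_ != NULL) {
            ++releaseCount_;  // counts real releases only
        }
        delete[] buffer_;     // deleting a null pointer is a no-op
        buffer_ = NULL;       // makes any later call harmless
    }

    int* buffer_;
    int releaseCount_;

    // Copying is disabled so two owners can never share one buffer.
    NativeBufferOwner(const NativeBufferOwner&);
    NativeBufferOwner& operator=(const NativeBufferOwner&);
};

// Returns true if repeated cleanup released the buffer exactly once.
bool demo_single_release()
{
    NativeBufferOwner owner;
    owner.Finalize();
    owner.Finalize();                 // second call finds NULL and does nothing
    assert(owner.ReleaseCount() == 1);
    return owner.ReleaseCount() == 1; // the destructor will also find NULL
}
```

In a real ref class the usual arrangement is for the destructor to call the finalizer, so that the cleanup code exists in only one place.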
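Deterministic and Nondeterministic Finalization
"ref" classes in C++/CLI that define a destructor implement IDisposable (automatically), and the (deterministic) destructor is called when obj.Dispose() is called. The finalizer, by contrast, runs nondeterministically, whenever the garbage collector gets around to reclaiming the object.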
Criticism of CLI
There are a number of objections to CLI.
