Template Metaprogramming with C++

Home / Template Metaprogramming with C++

Template Metaprogramming with C++

November 24, 2015 | Article | 1 Comment

Template metaprogramming is a metaprogramming technique in which templates are used by a compiler to generate temporary source code, which is merged by the compiler with the rest of the source code and then compiled. The output of these templates include compile-time constants, data structures, and complete functions. The use of templates can be thought of as compile-time execution.

The Components

Using Template Metaprogramming technique requires two distinct operations: a template must be defined, and a defined templates must be instantiated. The template definition describes the generic form of the generated source code, and the instantiation cause a specific set of source code to be generated from the generic form in the template.

Template metaprogramming is generally Turing-complete, meaning that any computation expressible by a computer program can be computed, in some form, by a template metaprogram.

The key to template metaprogramming is expansion and precalculated value for constant literals.

Let see some function in non-template C++:

unsigned int factorial(unsigned int n) {
	return (n==0) ? 1 : n*factorial(n-1);
}
const int x = factorial(4); // equal to (4*3*2*1) == 24
const int y = factorial(0); // equal to 0! = 1

With the source code like above, the code will be executed runtime to determine factorial value of the literals 4 and 0.

By using template metaprogramming and template specialization to provide the ending condition for recursion, the factorials used in the program, ignoring any factorial not used, can be calculated at compile time by:

template <int N>
	struct Factorial {
	enum { value = N * Factorial<N-1>::value };
};
template <>
struct Factorial<0> {
	enum { value = 1 };
};
const int x = Factorial<4>::value;
const int y = Factorial<0>::value;

The code above calculates the factorial value of literals 4 and at compile time and uses the results as if they were precalculated constants. To be able to use templates in this manner, the compiler must know the value of its parameters at compile time, which has the natural precondition that Factorial<X>::value can only be used if X is known at compile time.

This articles would intent to give an introduction to template metaprogramming in C++.

The Different Kind of Metatemplates

For a simplicity, we make two kinds on metatemplates – ones that calculate constant value and ones that produce code. Note that the first kind never produce instructions that are executed at runtime.

Templates that Calculate a Value

Assume we want to calculate the number of bits set in an byte:

int bits_set(unsigned char byte) {
	int count = 0;
	for (int i = 0; i < 8; i++)
		if ( (0x1L << i) & byte )
			count++;
	return count;
}

In case where the byte is known at compile time this can also be done by the compiler:

template< unsigned char byte > class BITS_SET {
public:
	enum {
		B0 = (byte & 0x01) ? 1:0,
		B1 = (byte & 0x02) ? 1:0,
		B2 = (byte & 0x04) ? 1:0,
		B3 = (byte & 0x08) ? 1:0,
		B4 = (byte & 0x10) ? 1:0,
		B5 = (byte & 0x20) ? 1:0,
		B6 = (byte & 0x40) ? 1:0,
		B7 = (byte & 0x80) ? 1:0
	};
	enum { RESULT = B0+B1+B2+B3+B4+B5+B6+B7 };
}

We use an enum for temporary variables as well as for the result since they are easier to use and enumerators have the type of const int.

We can now use BITS_SET<15>::RESULT and get the constant 4 in result. In this case the compiler evaluate the line enum { RESULT = B0+B1+B2+B3+B4+B5+B6+B7}; to enum 1+1+1+1+0+0+0+0}; and finally to enum {RESULT = 4};

We also rely on TMP to calculate a value using loop. With TMP we rely on recursion which can naturally suit TMP. The following code is a compile-time factorial calculator:

template< int i >
class FACTOR{
public:
      enum {RESULT = i * FACTOR<I-1>::RESULT};
};

class FACTOR< 1 >{
  public:
      enum {RESULT = 1};
};

When we write something like

int j = FACTOR< 5 >::RESULT;

somewhere in our code the compiler will generate something like this:

mov DWORD PTR _j$[ebp], 120 ; 00000078H - a constant value!

As we instantiate FACTOR<5> the definition of this class depends on FACTOR<4>, which in turn depend on FACTOR<3> and so on until the compiler reach template-specialization FACTOR<1>. All the class is created by recursion and done by compile while the final program just contain a constant.

Template that Unroll Loops/ Specialize Functions

Template Metaprograms can generate useful code when interpreted by the compiler. Examples: a massively inlined algorithm that has its loops unrolled. This gives impact in large speed increase in the application.
Let see following code that do sum of number 1 to 1000:

int sum = 0;
for(int i = 1; i<=1000; i++) {
	sum += 1;
}

If you see, actually we did 2000 addition, rather than 1000. We have to increment i by one for each loop. In addition we perform a thousend test operations on the variable i. Another way to write the code would be:

int sum = 0;
sum += 1;
sum += 2;
...
sum += 1000;

This is the way template metaprogram would expand a loop. But the tradeoff is not small. The code size increase, to put it simply we could take a performance hit by increasing the number of page faults. In practice, the code is often invoked multiple times and already loaded in cache.

Loop unrolling

We can defined loop unrolling using recursive templates, similiar to calculating a value:

template< int i >
class LOOP{
  public:
    static inline void EXEC(){
      cout << "A-" << i << " ";
            LOOP< i-1 >::EXEC();
       cout << "B-" << i << " ";
    }
};
class LOOP< 0 >{
  public:
    static inline void EXEC(){
      cout << "A-" << i;
      cout << "\n";
       cout << "B-" << i;
    }
};

The output of LOOP< 8 >::EXEC() is:

A-8 A-7 A-6 A-5 A-4 A-3 A-2 A-1 A-0

B-0 B-1 B-2 B-3 B-4 B-5- B-6 B-7 B-8

Again the thing to notice is that there is no loop in the resulting binary code. The loop “unrolls” by itself to produce the code.

Another tricks:

IF – Conditional statement

template< bool Condition >
class IF {
public:
    static inline void EXEC(){
    cout << "Statement is true";
    }
};

class IF< false > {
public:
    static inline void EXEC(){
    cout << "Statement is false";
    }
};

SWITCH – Conditional statement

template< int _case >
class SWITCH {
public:
    static inline void EXEC(){
        cout << " SWITCH - default ";
    }
};

class SWITCH< 1 > {
    public:
    static inline void EXEC(){
        cout << " SWITCH - 1 ";
    }
};

class SWITCH< 2 > {
    public:
    static inline void EXEC(){
        cout << " SWITCH - 2 ";
    }
};

The example of usage of both classes:

SWITCH< 2 > mySwitch;
mySwitch.EXEC();
IF< false >::EXEC();
mySqitch.EXEC();

The output will be: “ SWITCH – 2 Statement is false SWITCH – 2

Using Meta-Metatemplates

It is possible to define a generic template for a special kind of operation, like an if- or for-statement. Let’s call it meta-metatemplates since the operation is defined in a class outside the template itseld. Let’s have simplest codes:

template< bool Condition, class THEN, class ELSE > struct IF {
    template< bool Condition > struct selector
    { typedef THEN SELECT_CLASS; };

    struct selector< false >
    { typedef ELSE SELECT_CLASS; };

    typedef selector< Condition >::SELECT_CLASS RESULT;
};

Example of usage:

struct THEN
{
 static int func()
 {cout << "Inside THEN";
  return 42;
 }
};

struct ELSE
{
 static int func()
 {cout << "Inside ELSE";
  return 0;
 }
};

int main(int argc, char* argv[])
{
 int result = IF< 4 == sizeof(int), THEN, ELSE >::RESULT::func();
 cout << " - returning: " << result;
}

On 32-bit architectures this will print “Inside THEN – returning: 42” to standard output. Please note that if func() is not defined insed ELSE this would be a simple compile-time assert breaking compilation on 4 != sizeof(int).

Suggestions

As template metaprogrammin is a great technique when used correctly, the failure can gives disadvantage as program might result in code bloat and performance decrease. Below are some advice to use TMP:

Use it when:

  • A Macro is not enough. Doing something more complex than a macro and we need it expanded before compiled.
  • Using recursive function with a predetermined number of loops. The overhead of function calls and setting up stack variables can be avoided and runtime will significantly decrease.
  • Using loops that can be unrolled at compile time. For example hash-algorithms like MD5 and SHA1 contains well-defined block processing loops that can be unrolled with TMP.
  • When calculating constant values. Having constant that depend on other constant in our program.
  • When the program should be portable to other platforms. TMP might be alternative

Don’t use it when:

  • A macro will do. In most cases a macro will be enough and often easier to understood.
  • Want a small executables. This is mislead as templates in general and TMP in particular will in often increase the code size.
  • The program already takes a long time to compile. You got extra compile time then.

References: http://www.codeproject.com/Articles/3743/A-gentle-introduction-to-Template-Metaprogramming

, ,

About Author

about author

xathrya

A man who is obsessed to low level technology.

1 Comment
  1. Implementation of Template Metaprogramming: CRC Checker - Xathrya.ID

    […] This article is intent as a complement for article Template Metaprogramming in C++. […]

Leave a Reply

Your email address will not be published. Required fields are marked *

Social media & sharing icons powered by UltimatelySocial