
Templates are a feature of the C++ programming language that allows functions and classes to operate with generic types. This allows a function or class to work on many different data types without being rewritten for each one. The C++ Standard Library makes heavy use of templates in the part known as the Standard Template Library (STL).

Defining a template is like defining a blueprint. When we need a specific instance of a function or class, that instance can be made from the generic blueprint. However, the common C++ procedure of putting the declaration in a header file and the implementation in a source file cannot be applied to templates. This behavior is the same for all conforming compilers.

In this article we will discuss how to separate the declaration and implementation of a template entity (class or function). The article is divided into three sections: how templates are parsed, the compilation issue, and the solution methods.

How Templates are Parsed

Unlike other code, templates are parsed not once, but twice. This process is explicitly defined in the C++ standard, and although some compilers ignore this, they are, in effect, non-compliant and may behave differently from what this article describes.

So what are those two processes? They are, let’s say, Point of Declaration (PoD) and Point of Instantiation (PoI).

1. Point of Declaration (PoD)

During the first parse, at the Point of Declaration, the compiler checks the syntax of the template and matches it against the C++ grammar. At this stage the compiler does not consider the dependent types (the template parameters that form types within the template).

In the real world, we can think of this phase as checking the grammar of a paragraph without checking the meaning of the words (the semantics). Grammatically the paragraph can be correct, yet the arrangement of words may have no useful meaning. During the grammar-checking phase we don't care about the meaning of the words; our only concern is correct syntax.

Now, consider the following template code:

template <typename T>
void foo (T const & t)
{
    t.bar();
}

This is syntactically correct. However, at this point we have no idea what the dependent type T is, so we just assume that for every possible T it is correct to call the member bar() on it. Of course, if type T doesn't have this member then we have a problem, but until we know what type T is we don't know whether there is a problem, so this code is OK for now.

2. Point of Instantiation (PoI)

At the PoD we have declared a template, which gives us the blueprint. The instantiation process starts at the Point of Instantiation. At this point we actually define a concrete type from our template. So consider these two concrete instantiations of the template defined above:

foo(1); // this will fail the 2nd pass because an int (1 is an int) does not have a member function called bar()
foo(b); // Assuming b has a member function called bar this instantiation is fine

It is perfectly legal to define a template that won't be correct under all circumstances of instantiation. Since code for a template is not generated unless it is instantiated, the compiler won't complain unless we try to instantiate it.

At this point, the semantics are checked against the known dependent type to make sure that the generated code will be correct. To do this, the compiler must be able to see the full definition of the template. If the template is defined in a different place from where it is being instantiated, the compiler has no way to perform this check. Thus, the template won't be instantiated and an error will be reported.

Remember that the compiler can only see and process one translation unit (module / source file) at a time. So if the template is used in only one translation unit and the template is defined in that same translation unit, no problem arises.

Now let's recap what we have so far. If the template definition is in translation unit A and we try to instantiate it in translation unit B, the compiler won't be able to instantiate the template because it can't see the full definition, and this results in linker errors (undefined symbols). If everything is in one place then it will work, but that is not a good way to write templates.

Template Compilation Issue

From the first section we know that the template definition should be visible in the same translation unit as the instantiation. Thus, the following code works:

/** BigFile.cpp **/
#ifndef _FOO_HPP_
#define _FOO_HPP_

template<typename T>
class Foo {
public:
   Foo() { }
   void setValue (T obj_i) { }
   T getValue () { return m_Obj; }

private:
   T m_Obj;
};

#endif

The above code is legal, but we can't afford to do this every time. Sooner or later we'll probably end up using the template in other translation units, because it is highly unlikely (although not impossible) that we'd go to all the effort of creating a generic template class/function that we'll only ever use in one place.

Our first effort to separate the declaration and implementation (or definition) would be:

/** Foo.hpp **/
#ifndef _FOO_HPP_
#define _FOO_HPP_

template<typename T>
class Foo {
public:
   Foo();
   void setValue (T obj_i);
   T getValue ();

private:
   T m_Obj;
};

#endif

and

#include "Foo.hpp"

Foo::Foo()
{
}

void Foo::setValue (T obj_i)
{
}

T Foo::getValue()
{
   return m_Obj;
}

Now if we try to implement the template class like a normal class implementation, as shown above, it will generate a set of compilation errors.

To compile the class without any errors, we need to prefix every definition in the source file (.cpp file) with the template declaration:

#include "Foo.hpp"

template <typename T>
Foo<T>::Foo()
{
}

template <typename T>
void Foo<T>::setValue (T obj_i)
{
}

template <typename T>
T Foo<T>::getValue()
{
   return m_Obj;
}

We now pass the compilation phase (the PoD). However, the next problem is linking. With the above code, after resolving all the compilation errors, we face a linker error when we create an object of this class in any file other than Foo.cpp.

For example, we have client source code which instantiates the template (in a different translation unit):

/** Client.cpp **/
#include "Foo.hpp"

int main() {
   Foo<int> temp;

   return 0;
}

We know that the declaration+definition and the instantiation are in different translation units. The client code in Client.cpp can't access the template implementation source (Foo.cpp), so the compiler refuses to instantiate the template; it has no idea how to construct the Foo member functions. And if we have put the implementation in a source file and made it a separate part of the project, the compiler won't be able to find it when it is trying to compile the client source file.

The inclusion of the Foo header file won't be sufficient here. It only tells the compiler how to allocate the object data and how to build calls to the member functions, not how to build the member functions themselves. The compiler won't complain; it will assume that these functions are provided elsewhere and leave it to the linker to find them. So, when it's time to link, you will get "unresolved references" to any of the class member functions that are not defined "inline" in the class definition.

So how do we structure our code so that the compiler can see the definition of the template in every translation unit where it is instantiated? The solution is really quite simple: put the template's definition somewhere that is visible to all PoIs. The methods for doing that are described in the next section.

Solution Methods

There are several methods to solve the presented problem. We can pick any of them, depending on which is suitable for our application design.

Method 1

We can create an object of the template class in the same source file where it is implemented (Foo.cpp). Then there is no need to link the object creation code with its actual implementation in some other file. This causes the compiler to compile these particular types, so the associated class member functions will be available at link time. Here is the sample code:

/** Foo.hpp **/
#ifndef _FOO_HPP_
#define _FOO_HPP_

template<typename T>
class Foo {
public:
   Foo();
   void setValue (T obj_i);
   T getValue ();

private:
   T m_Obj;
};

#endif

and

#include "Foo.hpp"

Foo::Foo()
{
}

void Foo::setValue (T obj_i)
{
}

T Foo::getValue()
{
   return m_Obj;
}

/* We define a function but no need to call this temporary function.
 * It is used only to avoid link error
 *
void TemporaryFunction()
{
   Foo<int> temp;
}

also the client:

/** Client.cpp **/
#include "Foo.hpp"

int main() {
   Foo<int> temp;
   temp.setValue(3);
   int value = temp.getValue();
   return 0;
}

The temporary function in Foo.cpp solves the link error. There is no need to call this function; its only purpose is to force the compiler to instantiate Foo<int> inside Foo.cpp's translation unit.
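A closely related variant, which is standard C++ although not shown in the listing above, is an explicit instantiation at the bottom of Foo.cpp; it achieves the same effect without a dummy function:

// Explicit instantiation: forces the compiler to generate all members of
// Foo<int> in this translation unit.
template class Foo<int>;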

Method 2

In this method we include source file that implements our template in our client source file.

/** Foo.hpp **/
#ifndef _FOO_HPP_
#define _FOO_HPP_

template<typename T>
class Foo {
public:
   Foo();
   void setValue (T obj_i);
   T getValue ();

private:
   T m_Obj;
};

#endif

and

#include "Foo.hpp"

Foo::Foo()
{
}

void Foo::setValue (T obj_i)
{
}

T Foo::getValue()
{
   return m_Obj;
}

also the client:

/** Client.cpp **/
#include "Foo.hpp"
#include "Foo.cpp"

int main() {
   Foo<int> temp;
   temp.setValue(3);
   int value = temp.getValue();
   return 0;
}

Method 3

In this method we include the source file that implements our template class (Foo.cpp) at the end of the header file that declares the template class (Foo.hpp), and remove the source file from the project (not from the folder).

/** Foo.hpp **/
#ifndef _FOO_HPP_
#define _FOO_HPP_

template<typename T>
class Foo {
public:
   Foo();
   void setValue (T obj_i);
   T getValue ();

private:
   T m_Obj;
};

#include "Foo.cpp"

#endif

and

/** Foo.cpp **/
template <typename T>
Foo<T>::Foo()
{
}

template <typename T>
void Foo<T>::setValue (T obj_i)
{
}

template <typename T>
T Foo<T>::getValue()
{
   return m_Obj;
}

also the client:

/** Client.cpp **/
#include "Foo.hpp"

int main() {
   Foo<int> temp;
   temp.setValue(3);
   int value = temp.getValue();
   return 0;
}

Ten C++11 Features You Should Know and Use

December 11, 2015 | Article | No Comments

This article is a summary of several articles, each discussing an individual subject.

There are lots of new additions to the C++ language and standard library since the C++11 standard passed. However, I believe some of these new features should become routine for all C++ developers.

Features we are talking about:

  1. auto & decltype
  2. nullptr
  3. Range-based for loops
  4. Override and final
  5. Strongly-typed enums
  6. Smart pointers
  7. Lambdas
  8. non-member begin() and end()
  9. static_assert and type traits
  10. Move semantics

auto & decltype

More: Improved Typed Reference in C++11: auto, decltype, and new function declaration syntax

Before the C++11 era, the keyword auto was used as a storage duration specifier. In the new standard, C++ clearly defines its purpose to be type inference. The keyword auto is a placeholder for a type, telling the compiler it has to deduce the actual type of a variable that is being declared from its initializer.

auto I = 42;        // I is an int
auto J = 42LL;      // J is a long long
auto P = new foo(); // P is a foo* (pointer to foo)

Using auto means writing fewer type names. A very convenient use of auto is inferring the type of an iterator:

std::map<std::string, std::vector<int>> myMap;
for (auto it = begin(myMap); it != end(myMap); ++it)
{
//...
}

Here we save a lot of work by letting the compiler deduce the type of it.

decltype, on the other hand, is a handy keyword to get the type of an expression. Let's inspect what's going on in this code:

short a = 10;
long b = 655351334;
decltype(a+b) c = 5;

std::cout << sizeof(a) << " " << sizeof(b) << " " << sizeof(c) << std::endl;

When we execute this code with a proper C++11 compiler, variable c gets the type of the expression used for the summation of a and b. We know that when a short is added to a long, the short operand is automatically converted to a type large enough to hold the result, and decltype gives us that type.

As said before, decltype is used to get the type of an expression, therefore it is valid to do this:

int function(int a, int b)
{
	return a * b;
}

int main()
{
	decltype(function(1, 2)) c = 10; // the call is not evaluated; only its type is used

	return 0;
}

As long as the expression involved is valid.

Now, in C++11 we have a new function declaration syntax. This syntax leverages the power of both auto and decltype. Note that in C++11 auto cannot be used alone as the return type of a function, so we must have a trailing return type. In this case auto does not tell the compiler it has to infer the type; it only instructs it to look for the return type at the end of the function.

template <typename T1, typename T2>
auto compose(T1 t1, T2 t2) -> decltype (t1 + t2)
{
	return t1+t2;
}

In above snippet, we compose the return type of function from the type of operator + that sums the values of types T1 and T2.

nullptr

More: Nullptr, Strongly typed Enumerations, and Cstdint

Since the inception of C++, zero has been used as the value of null pointers. This is a direct influence from the C language. This scheme has drawbacks because the null pointer constant 0 is also an integer, which leads to unwanted implicit conversions and ambiguities with integral types.

void function(int a);
void function(void* a);

function(NULL);

Now, which one is being called? A smart compiler will give an error saying "the call is ambiguous".

C++11 gives a solution for this. The keyword nullptr denotes a value of type std::nullptr_t that represents the null pointer literal. Implicit conversions exist from nullptr to the null pointer value of any pointer type and any pointer-to-member type, and also to bool (as false), but no implicit conversion to integral types exists.

void foo(int* p) {}

void bar(std::shared_ptr<int> p) {}

int* p1 = NULL;
int* p2 = nullptr;   
if(p1 == p2)
{
}

foo(nullptr);
bar(nullptr);

bool f = nullptr;
int i = nullptr; // error: A native nullptr can only be converted to bool or, using reinterpret_cast, to an integral type

Using 0 is still valid for backward compatibility.

Range-based for loops

More: C++ Ranged Based Loop

Ever wondered how we could write a foreach statement in C++? Joy for us, because C++11 augments the for statement to support it. Using this foreach paradigm we can iterate over collections. In the new form, it is possible to iterate over C-like arrays, initializer lists, and anything for which the non-member begin() and end() functions are overloaded.

std::map<std::string, std::vector<int>> map;
std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
map["one"] = v;

for(const auto& kvp : map) 
{
  std::cout << kvp.first << std::endl;

  for(auto v : kvp.second)
  {
     std::cout << v << std::endl;
  }
}

int arr[] = {1,2,3,4,5};
for(int& e : arr) 
{
  e = e*e;
}

The syntax is not that different from the "normal" for statement; okay, it is a little different.

The for syntax in this paradigm is:

for (type iterateVariable : collection)
{
// ...
}

Override and final

In C++, there isn't a mandatory mechanism to mark virtual methods as overridden in a derived class. The virtual keyword is optional, and that makes reading code a bit harder because we may have to look through the top of the hierarchy to check whether the method is virtual.

class Base 
{
public:
   virtual void f(float);
};

class Derived : public Base
{
public:
   virtual void f(int);
};

Derived::f is supposed to override Base::f. However, the signatures differ, one takes a float and one takes an int, therefore Derived::f is just another method with the same name (an overload) and not an override. We may call f() through a pointer to Base and expect Derived::f to be called, but it is Base::f that gets called.

C++11 provides syntax to solve this problem.

class Base 
{
public:
   virtual void f(float);
};

class Derived : public Base
{
public:
   virtual void f(int) override;
};

The keyword override forces the compiler to check the base class(es) to see if there is a virtual function with this exact signature. When we compile this code, it triggers a compile error because the function that is supposed to override the base class method has a different signature.

On the other hand if you want to make a method impossible to override any more (down the hierarchy) mark it as final. That can be in the base class, or any derived class. If it’s in a derived class, we can use both the override and final specifiers.

class Base 
{
public:
   virtual void f(float);
};

class Derived : public Base
{
public:
   virtual void f(float) override final;
};

class F : public Derived
{
public:
   virtual void f(float) override;
};

The last class will not compile; the compiler reports something like: a function declared as 'final' cannot be overridden by 'F::f'.

Strongly-typed enums

More: Nullptr, Strongly typed Enumerations, and Cstdint

Traditional enums in C++ have some drawbacks: they export their enumerators into the surrounding scope (which can lead to name collisions if two different enums in the same scope define enumerators with the same name), they are implicitly converted to integral types, and they cannot have a user-specified underlying type.

C++11 introduces a new category of enums, called strongly-typed enums. They are specified with the enum class keywords; they do not export their enumerators into the surrounding scope, they are no longer implicitly converted to integral types, and they can have a user-specified underlying type.

enum class Options { None, One, All };
Options o = Options::All;
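For instance, the underlying type is specified after a colon, and any conversion to an integer has to be explicit. A small sketch (the Status type is just illustrative):

enum class Status : unsigned char { Ok, Warning, Error };

Status s = Status::Warning;
// int n = s;                    // error: no implicit conversion to int
int n = static_cast<int>(s);     // explicit conversion works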

Smart pointers

All these smart pointers are declared in the header <memory>.

In this article we will only mention the smart pointers with reference counting and automatic releasing of owned memory that are available:

  • unique_ptr: should be used when ownership of a memory resource does not have to be shared (it doesn’t have a copy constructor), but it can be transferred to another unique_ptr (move constructor exists).
  • shared_ptr: should be used when ownership of a memory resource should be shared (hence the name).
  • weak_ptr: holds a reference to an object managed by a shared_ptr, but does not contribute to the reference count; it is used to break dependency cycles (think of a tree where the parent holds an owning reference (shared_ptr) to its children, but the children also must hold a reference to the parent; if this second reference was also an owning one, a cycle would be created and no object would ever be released).

The library type auto_ptr is now obsolete and should no longer be used.

The first example below shows unique_ptr usage. If we want to transfer ownership of an object to another unique_ptr, we use std::move. After the ownership transfer, the smart pointer that ceded ownership becomes null and get() returns nullptr.

void foo(int* p)
{
   std::cout << *p << std::endl;
}
std::unique_ptr<int> p1(new int(42));
std::unique_ptr<int> p2 = std::move(p1); // transfer ownership

if(p1)
  foo(p1.get());

(*p2)++;

if(p2)
  foo(p2.get());

The second example shows shared_ptr. Usage is similar, though the semantics are different since ownership is shared.

void foo(int* p)
{
   std::cout << *p << std::endl;
}
void bar(std::shared_ptr<int> p)
{
   ++(*p);
}
std::shared_ptr<int> p1(new int(42));
std::shared_ptr<int> p2 = p1;

foo(p2.get());
bar(p1);   
foo(p2.get());

We can also write an equivalent expression for the first declaration as:

auto p3 = std::make_shared<int>(42);

make_shared<T> is a non-member function and has the advantage of allocating memory for the shared object and the smart pointer's control block with a single allocation, as opposed to the explicit construction of a shared_ptr via the constructor, which requires at least two allocations. In addition to the possible overhead, there can be situations where memory leaks occur because of that. In the next example a memory leak could occur if seed() throws an exception.

void foo(std::shared_ptr<int> p, int init)
{
   *p = init;
}
foo(std::shared_ptr<int>(new int(42)), seed());

No such problem exists if we use make_shared.
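In other words, the call above can be rewritten with make_shared (same foo and seed as in the example):

foo(std::make_shared<int>(42), seed()); // one allocation, and no leak if seed() throws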

The last sample shows the usage of weak_ptr. Notice that you must always get a shared_ptr to the referred object by calling lock() in order to access the object.

auto p = std::make_shared<int>(42);
std::weak_ptr<int> wp = p;

{
  auto sp = wp.lock();
  std::cout << *sp << std::endl;
}

p.reset();

if(wp.expired())
  std::cout << "expired" << std::endl;

Lambdas

More: Guide to Lambda Closure in C++11

A lambda is an anonymous function. It is a powerful feature borrowed from functional programming that in turn enables other features and powers libraries. We can use a lambda wherever a function object, a functor, or a std::function is expected.

std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);

std::for_each(std::begin(v), std::end(v), [](int n) {std::cout << n << std::endl;});

auto is_odd = [](int n) {return n%2==1;};
auto pos = std::find_if(std::begin(v), std::end(v), is_odd);
if(pos != std::end(v))
  std::cout << *pos << std::endl;

A bit trickier are recursive lambdas. Imagine a lambda that represents a Fibonacci function. If you attempt to write it using auto, you get a compilation error:

auto fib = [&fib](int n) {return n < 2 ? 1 : fib(n-1) + fib(n-2);};

The problem is that auto means the type of the object is inferred from its initializer, yet the initializer contains a reference to the lambda itself and therefore needs to know its type already. This is a cyclic problem. The key is to break the dependency cycle and explicitly specify the function's type using std::function.

std::function<int(int)> lfib = [&lfib](int n) {return n < 2 ? 1 : lfib(n-1) + lfib(n-2);};

non-member begin() and end()

Two new additions to the standard library, begin() and end(), give new flexibility. They promote uniformity and consistency and enable more generic programming that works with all STL containers. These two functions are overloadable and can be extended to work with any type, including C-like arrays.

Let's take an example. We want to print the elements of a C-like array and then its first odd element.

int arr[] = {1,2,3};
std::for_each(&arr[0], &arr[0]+sizeof(arr)/sizeof(arr[0]), [](int n) {std::cout << n << std::endl;});

auto is_odd = [](int n) {return n%2==1;};
auto begin = &arr[0];
auto end = &arr[0]+sizeof(arr)/sizeof(arr[0]);
auto pos = std::find_if(begin, end, is_odd);
if(pos != end)
  std::cout << *pos << std::endl;

With non-member begin() and end() it could be put as this:

int arr[] = {1,2,3};
std::for_each(std::begin(arr), std::end(arr), [](int n) {std::cout << n << std::endl;});

auto is_odd = [](int n) {return n%2==1;};
auto pos = std::find_if(std::begin(arr), std::end(arr), is_odd);
if(pos != std::end(arr))
  std::cout << *pos << std::endl;

This is basically identical code to the std::vector version. That means we can write a single generic method for all types supported by begin() and end().

template <typename Iterator>
void bar(Iterator begin, Iterator end) 
{
   std::for_each(begin, end, [](int n) {std::cout << n << std::endl;});

   auto is_odd = [](int n) {return n%2==1;};
   auto pos = std::find_if(begin, end, is_odd);
   if(pos != end)
      std::cout << *pos << std::endl;
}

template <typename C>
void foo(C c)
{
   bar(std::begin(c), std::end(c));
}

template <typename T, size_t N>
void foo(T(&arr)[N])
{
   bar(std::begin(arr), std::end(arr));
}

int arr[] = {1,2,3};
foo(arr);

std::vector<int> v;
v.push_back(1);
v.push_back(2);
v.push_back(3);
foo(v);
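Because range-based for (and unqualified calls) find begin() and end() through argument-dependent lookup, we can also provide them for our own types. A minimal sketch, where IntBox is a made-up type rather than anything from the standard library:

// Hypothetical user-defined type with non-member begin()/end()
struct IntBox
{
   int values[3];
};

int* begin(IntBox& b) { return b.values; }
int* end(IntBox& b)   { return b.values + 3; }

IntBox box = {{1, 2, 3}};
for(int n : box)                   // begin()/end() found via argument-dependent lookup
   std::cout << n << std::endl;

bar(begin(box), end(box));         // reuses the generic bar() defined above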

static_assert and type traits

static_assert performs an assertion check at compile-time. If the assertion is true, nothing happens. If the assertion is false, the compiler displays the specified error message.

template <typename T, size_t Size>
class Vector
{
   static_assert(Size > 3, "Size is too small");
   T _points[Size];
};

int main()
{
   Vector<int, 16> a1;    // OK, Size is greater than 3
   Vector<double, 2> a2;  // compile error: Size is too small
   return 0;
}

static_assert becomes more useful when used together with type traits. These are a series of classes that provide information about types at compile time. They are available in the <type_traits> header. There are several categories of classes in this header: helper classes, for creating compile-time constants, type traits classes, to get type information at compile time, and type transformation classes, for getting new types by applying transformation on existing types.

In the following example function add is supposed to work only with integral types.

template <typename T1, typename T2>
auto add(T1 t1, T2 t2) -> decltype(t1 + t2)
{
   return t1 + t2;
}

However, there are no compiler errors if one writes

std::cout << add(1, 3.14) << std::endl;
std::cout << add("one", 2) << std::endl;

The program actually prints 4.14 and “e”. But if we add some compile-time asserts, both these lines would generate compiler errors.

template <typename T1, typename T2>
auto add(T1 t1, T2 t2) -> decltype(t1 + t2)
{
   static_assert(std::is_integral<T1>::value, "Type T1 must be integral");
   static_assert(std::is_integral<T2>::value, "Type T2 must be integral");

   return t1 + t2;
}

Move semantics

More: Move Semantics and rvalue references in C++11

C++11 has introduced the concept of rvalue references (specified with &&) to differentiate a reference to an lvalue or an rvalue. An lvalue is an object that has a name, while an rvalue is an object that does not have a name (temporary object). The move semantics allow modifying rvalues (previously considered immutable and indistinguishable from const& types).

A C++ class/struct has some implicitly generated member functions: a default constructor (only if another constructor is not explicitly defined), a copy constructor, a destructor and a copy assignment operator. The copy constructor and the copy assignment operator perform a member-wise (shallow) copy, i.e. they copy the members one by one. That means if you have a class that contains pointers to some objects, they just copy the values of the pointers and not the objects they point to. This might be OK in some cases, but for many cases you actually want a deep copy, meaning that you want to copy the objects the pointers refer to, and not the values of the pointers. In this case you have to explicitly write a copy constructor and copy assignment operator that perform a deep copy.

What if the object you initialize or copy from is an rvalue (a temporary)? You still have to copy its value, but soon after, the rvalue goes away. That means an overhead of operations, including allocations and memory copying, that should not really be necessary.

Enter the move constructor and move assignment operator. These two special functions take a T&& argument, which is an rvalue reference. Knowing that fact, they can modify the object, for example by "stealing" the objects its pointers refer to. For instance, a container implementation (such as a vector or a queue) may have a pointer to an array of elements. When an object is instantiated from a temporary, instead of allocating another array, copying the values from the temporary, and then deleting the temporary's memory when it is destroyed, we just copy the value of the pointer that refers to the allocated array, thus saving an allocation, the copying of a sequence of elements, and a later deallocation.

The following example shows a dummy buffer implementation. The buffer is identified by a name (just for the sake of making a point revealed below), has a pointer (wrapped in a std::unique_ptr) to an array of elements of type T, and a variable that holds the size of the array.

template <typename T>
class Buffer 
{
   std::string          _name;
   size_t               _size;
   std::unique_ptr<T[]> _buffer;

public:
   // default constructor
   Buffer():
      _size(16),
      _buffer(new T[16])
   {}

   // constructor
   Buffer(const std::string& name, size_t size):
      _name(name),
      _size(size),
      _buffer(new T[size])
   {}

   // copy constructor
   Buffer(const Buffer& copy):
      _name(copy._name),
      _size(copy._size),
      _buffer(new T[copy._size])
   {
      T* source = copy._buffer.get();
      T* dest = _buffer.get();
      std::copy(source, source + copy._size, dest);
   }

   // copy assignment operator
   Buffer& operator=(const Buffer& copy)
   {
      if(this != &copy)
      {
         _name = copy._name;

         if(_size != copy._size)
         {
            _buffer = nullptr;
            _size = copy._size;
            _buffer = _size > 0 ? new T[_size] : nullptr;
         }

         T* source = copy._buffer.get();
         T* dest = _buffer.get();
         std::copy(source, source + copy._size, dest);
      }

      return *this;
   }

   // move constructor
   Buffer(Buffer&& temp):
      _name(std::move(temp._name)),
      _size(temp._size),
      _buffer(std::move(temp._buffer))
   {
      temp._buffer = nullptr;
      temp._size = 0;
   }

   // move assignment operator
   Buffer& operator=(Buffer&& temp)
   {
      assert(this != &temp); // assert if this is not a temporary

      _buffer = nullptr;
      _size = temp._size;
      _buffer = std::move(temp._buffer);

      _name = std::move(temp._name);

      temp._buffer = nullptr;
      temp._size = 0;

      return *this;
   }
};

template <typename T>
Buffer<T> getBuffer(const std::string& name) 
{
   Buffer<T> b(name, 128);
   return b;
}
int main()
{
   Buffer<int> b1;
   Buffer<int> b2("buf2", 64);
   Buffer<int> b3 = b2;
   Buffer<int> b4 = getBuffer<int>("buf4");
   b1 = getBuffer<int>("buf5");
   return 0;
}

The copy constructor and copy assignment operator should look familiar. What is new in C++11 are the move constructor and move assignment operator, implemented in the spirit of the aforementioned move semantics. If you run this code you'll see that when b4 is constructed, the move constructor is called. Also, when b1 is assigned a value, the move assignment operator is called. The reason is that the value returned by getBuffer() is a temporary, i.e. an rvalue.

You probably noticed the use of std::move in the move constructor, when initializing the name variable and the pointer to the buffer. The name is actually a string, and std::string also implements move semantics; the same goes for std::unique_ptr. However, if we just said _name(temp._name), the copy constructor would have been called. For _buffer that would not even have been possible, because std::unique_ptr does not have a copy constructor. So why wasn't the move constructor for std::string called in that case? Because even if the object the move constructor for Buffer is called with is an rvalue, inside the constructor it is actually an lvalue. Why? Because it has a name, "temp", and a named object is an lvalue. To make it an rvalue again (and be able to invoke the appropriate move constructor) one must use std::move. This function just turns an lvalue reference into an rvalue reference.

An alternative implementation:

template <typename T>
class Buffer
{
   std::string          _name;
   size_t               _size;
   std::unique_ptr<T[]> _buffer;

public:
   // constructor
   Buffer(const std::string& name = "", size_t size = 16):
      _name(name),
      _size(size),
      _buffer(size? new T[size] : nullptr)
   {}

   // copy constructor
   Buffer(const Buffer& copy):
      _name(copy._name),
      _size(copy._size),
      _buffer(copy._size? new T[copy._size] : nullptr)
   {
      T* source = copy._buffer.get();
      T* dest = _buffer.get();
      std::copy(source, source + copy._size, dest);
   }

   // copy assignment operator
   Buffer& operator=(Buffer copy)
   {
       swap(*this, copy);
       return *this;
   }

   // move constructor
   Buffer(Buffer&& temp):Buffer()
   {
      swap(*this, temp);
   }

   friend void swap(Buffer& first, Buffer& second) noexcept
   {
       using std::swap;
       swap(first._name  , second._name);
       swap(first._size  , second._size);
       swap(first._buffer, second._buffer);
   }
};

JavaScript: String Objects

December 11, 2015 | Article | No Comments

This article is a supplement for Compact Tutorial on JavaScript Basic.

We can create a string primitive by giving it some characters to hold.

var myPrimitiveString = "Xathrya Sabertooth";

A String object does things slightly differently, not only allowing us to store characters, but also providing a way to manipulate and change those characters.

Creating String Object

We declare a new variable and assign it a new string primitive to initialize it. Now we use typeof to check what kind of data the variable myPrimitiveString holds:

<HTML>
<BODY>
<SCRIPT type="text/javascript">
    var myPrimitiveString = "Xathrya Sabertooth";
    document.write (typeof(myPrimitiveString));
</SCRIPT>
</BODY>
</HTML>

We can still use the String object's methods on it, though. JavaScript will simply convert the string primitive to a temporary String object, use the method on it, and then change the data type back to string. Now let's try using the length property of the String object.

<HTML>
<BODY>
<SCRIPT type="text/javascript">
    var myPrimitiveString = "Xathrya Sabertooth";
    document.write ( typeof( myPrimitiveString ) );
    document.write ( "<br>" );
    document.write ( myPrimitiveString.length );
    document.write ( "<br>" );
    document.write ( typeof( myPrimitiveString ) );
</SCRIPT>
</BODY>
</HTML>

Which should give this result:

string
18
string

So, myPrimitiveString is still holding a string primitive after the temporary conversion. You can also create String objects explicitly, using the new keyword together with the String() constructor.

<HTML>
<BODY>
<SCRIPT type="text/javascript">
    var myObjectString = new String("Xathrya Sabertooth");
    document.write ( typeof( myObjectString) );
    document.write ( "<br>" );
    document.write ( myObjectString.length );
    document.write ( "<br>" );
    document.write ( typeof( myObjectString) );
</SCRIPT>
</BODY>
</HTML>

Which should give this result:

object
18
object

The only difference between this script and the previous one is that myObjectString is a new object, explicitly created and supplied with some characters.

var myObjectString = new String("Xathrya Sabertooth");

The result of checking the length property is the same whether we create the String object implicitly or explicitly. The only real difference is that creating the String object explicitly is marginally more efficient if we're going to be using the same String object again and again. Explicitly creating String objects also helps prevent the JavaScript interpreter from getting confused between numbers and strings, as it can do.

String Object’s Methods

The String object has a lot of methods, so we will limit ourselves to two of them: the indexOf() and substring() methods. A more complete list of the String object's properties and methods can be found in the handout section.

A string is made of characters (it is a sequence of characters). Each of these characters is given a zero-based index, so the first character's position has the index 0, the second 1, and so on. The method indexOf() finds and returns the index at which a substring begins (and the lastIndexOf() method returns the position of the last occurrence of the substring).

We can use this method, for example, to check the position (and existence) of the @ symbol when we ask the user to input an email address. There is no guarantee that the email address is valid, but at least it goes some way in that direction.

We will use the prompt() function to obtain the user's e-mail address and then check the input for the @ symbol, returning the index of the symbol using indexOf().

<html>
<body>
<script type="text/javascript">
    var userEmail = prompt("Please enter your email address ", "" );
    document.write( userEmail.indexOf( "@" ) );
</script>
</body>
</html>

If the @ is not found, -1 is written to the page. As long as the character is in the string, indexOf() will return its index, which is something greater than -1.
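A simple way to act on that value is to compare it against -1 (a sketch; the messages are just illustrative):

<html>
<body>
<script type="text/javascript">
    var userEmail = prompt("Please enter your email address ", "");
    if (userEmail.indexOf("@") == -1) {
        document.write("That does not look like an email address.");
    } else {
        document.write("Thanks, the address contains an @ symbol.");
    }
</script>
</body>
</html>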

The substring() method carves one string out of another string. It takes the start index and the end index of the substring as parameters. We can return everything from the start index to the end of the string by leaving off the second argument.

So, to extract all the characters from the third character (at index 2) to the sixth character (index 5), we can write:

<html>
<body>
<script type="text/javascript">
    var characterName = "I am Xathrya Sabertooth";
    var lastNameIndex = characterName.indexOf( "Xathrya " ) + 8;
    var lastName = characterName.substring( lastNameIndex );
    document.write( lastName );
</script>
</body>
</html>

We are extracting Sabertooth from the string in the variable characterName. We first find the start of the first name with indexOf() and add 8 to it to find the index of the last name (because "Xathrya " is 8 characters long). The result is stored in lastNameIndex. Using that value, we extract the substring of the last name, from lastNameIndex with no final index specified, so the rest of the characters in the string are returned.
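For completeness, the two-argument form described earlier (from index 2 up to, but not including, index 6) looks like this:

<html>
<body>
<script type="text/javascript">
    var characterName = "I am Xathrya Sabertooth";
    var fragment = characterName.substring(2, 6); // characters at indices 2 to 5: "am X"
    document.write(fragment);
</script>
</body>
</html>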

Handouts

  1. JavaScript String Properties and Methods List (PDF)

NodeJS HTTPS

December 11, 2015 | Article | No Comments

HTTPS, or HTTP Secure, is the HTTP protocol over TLS. It is the secure version of HTTP and is implemented as a separate module in Node. The API itself is very similar to the HTTP one, with some small differences.

HTTPS Server

To create a server, we can do:

var https = require('https'),
    fs = require('fs');

var options = {
    key: fs.readFileSync('/path/to/server/private-key.pem'),
    cert: fs.readFileSync('/path/to/server/certificate.pem')
};

https.createServer(options, function(req,res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    res.end('Hello World!');
}).listen(443); // pick any port; 443 is the HTTPS default

So here, the first argument to https.createServer is an options object that, much like in the TLS module, provides the private key and the certificate strings.

HTTPS Client

To make an HTTPS request, we must use the https module:

var https = require('https'),
    options = {
        host: 'encrypted.google.com',
        port: 443,
        path: '/',
        method: 'GET'
    };

var req = https.request(options, function(res) {
    console.log("statusCode: ", res.statusCode);
    console.log("headers: ", res.headers);

    res.on('data', function(d) {
        process.stdout.write(d);
    });
});
req.end();

Here are the options we can change:

  • port: the port of the host to make the request to. Defaults to 443.
  • key: the client private key string to use for SSL. Defaults to null.
  • cert: the client certificate to use. Defaults to null.
  • ca: an authority certificate or array of authority certificates to check the remote host against.

We may want to use the key and cert options if the server needs to verify the client.
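For instance, a client that presents its own key and certificate, and validates the server against a specific CA, could be set up like this (a sketch; the paths are placeholders):

var https = require('https'),
    fs = require('fs');

var options = {
    host: 'encrypted.google.com',
    port: 443,
    path: '/',
    method: 'GET',
    key: fs.readFileSync('/path/to/client/private-key.pem'),
    cert: fs.readFileSync('/path/to/client/certificate.pem'),
    ca: [fs.readFileSync('/path/to/ca-certificate.pem')]
};

https.request(options, function(res) {
    console.log("statusCode: ", res.statusCode);
}).end();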

Much like the http module, this module also offers a shortcut https.get method that can be used like this:

var https = require('https');
var options = { host: 'encrypted.google.com', path: '/' };
https.get(options, function(res) {
    res.on('data', function(d) {
        console.log(d.toString());
    });
});

NodeJS TLS / SSL

December 11, 2015 | Article | No Comments

Transport Layer Security (TLS) is the successor of the Secure Sockets Layer (SSL) protocol. The technology allows client/server applications to communicate across a network in a way designed to prevent eavesdropping and tampering. TLS and SSL encrypt the segments of network connections above the transport layer, enabling both privacy and message authentication.

Node uses OpenSSL to provide TLS and/or SSL encrypted stream communication.

TLS is a standard based on the earlier SSL specification. In fact, TLS 1.0 is also known as SSL 3.1, and the latest version (TLS 1.2) is also known as SSL 3.3. From now on we will use TLS instead of SSL.

Public / Private Keys

The keys used in TLS come in public/private pairs. A public key is a key which is shared openly and is used by the other party to encrypt the data they want to send to us. The private key, as implied by the name, is known only by us or our machine. This key is used to decrypt the messages sent by the other machine.

Private Key

Each client and server must have a private key. A private key can be generated by openssl utility on the command line by:

openssl genrsa -out private-key.pem 1024

This should create a file named private-key.pem with our private key.

Public Key

All servers and some clients need to have a certificate. Certificates are public keys signed by a Certificate Authority, or self-signed. The first step to getting a certificate is to create a "Certificate Signing Request" (CSR) file. This can be done with:

openssl req -new -key private-key.pem -out csr.pem

This will create a CSR named csr.pem using our generated key (private-key.pem). When you are asked for some data, answer it. It will be written into the certificate.

The purpose of the CSR is to request a certificate. That is, if we want a CA (Certificate Authority) to sign our certificate, we can give this file to them to process and they will give us back a certificate.

Alternatively, we can create a self-signed certificate with the CSR we have:

openssl x509 -req -in csr.pem -signkey private-key.pem -out certificate.pem

Thus we have our certificate file certificate.pem

TLS Client

We can connect to a TLS server using something like this:

var tls = require('tls'),
    fs = require('fs'),
    port = 3000,
    host = '192.168.1.135',
    options = {
        key: fs.readFileSync('/path/to/private-key.pem'),
        cert: fs.readFileSync('/path/to/certificate.pem')
    };

var client = tls.connect(port, host, options, function() {
    console.log('connected');
    if (client.authorized) {
        console.log('authorized: ' + client.authorized);
        client.on('data', function(data) {
            client.write(data);    // Just send data back to server
        });
    } else {
        console.log('connection not authorized: ' + client.authorizationError);
    }
});

First we need to inform Node of the client private key and client certificate, which should be strings. We read the .pem files into memory using the synchronous version of fs.readFile, fs.readFileSync.

Then we connect to the server. tls.connect returns a CryptoStream object, which we can use normally as a ReadStream and WriteStream. We then wait for data from the server as we would on a ReadStream, and send it back to the server.

TLS Server

A TLS server is a subclass of net.Server. With it, we can do everything we can do with a net.Server, except that we are doing it over a secure connection.

var tls = require('tls'),
    fs = require('fs'),
    options = {
        key: fs.readFileSync('/path/to/private-key.pem'),
        cert: fs.readFileSync('/path/to/certificate.pem')
    };

tls.createServer(options, function(s) {
    s.pipe(s);
}).listen(4004);

Besides the key and cert options, tls.createServer also accepts the following (an example follows the list):

  • requestCert: if true, the server will request a certificate from clients that connect and attempt to verify that certificate. The default value is false.
  • rejectUnauthorized: If true, the server will reject any connection which is not authorized with the list of supplied CAs. This option only has effect if requestCert is true. The default value is false.
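For example, a server that requires clients to present a certificate we trust could look like this (a sketch; the paths are placeholders):

var tls = require('tls'),
    fs = require('fs');

var options = {
    key: fs.readFileSync('/path/to/private-key.pem'),
    cert: fs.readFileSync('/path/to/certificate.pem'),
    ca: [fs.readFileSync('/path/to/client-certificate.pem')], // certificates we accept from clients
    requestCert: true,
    rejectUnauthorized: true
};

tls.createServer(options, function(s) {
    console.log('client authorized: ' + s.authorized);
    s.pipe(s);
}).listen(4004);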

Verification

On both the client and the server APIs, the stream has a property named authorized. This is a boolean indicating if the client was verified by one of the certificate authorities you are using, or one that they delegate to. If s.authorized is false, then s.authorizationError contains the description of how the authorization failed.

NodeJS Streaming HTTP Chunked Responses

December 9, 2015 | Article | No Comments

NodeJS is extremely streamable, and that includes HTTP responses. HTTP being a first-class protocol in Node makes HTTP responses streamable in a very convenient way.

HTTP chunked encoding allows a server to keep sending data to the client without ever sending the body size. Unless we specify a "Content-Length" header, the Node HTTP server sends the header

Transfer-Encoding: chunked

to the client, which makes it wait for a final chunk with length 0 before treating the response as terminated.

This can be useful for streaming data – text, audio, video – or any other into the HTTP client.

Streaming Example

Here we are going to code an example that pipes the output of a child process into the client:

var spawn = require('child_process').spawn;

require('http').createServer(function(req, res) {
    var child = spawn('tail', ['-f', '/var/log/system.log']);
    child.stdout.pipe(res);
    res.on('close', function() {
        child.kill();
    });
}).listen(4000);

Here we are creating an HTTP server and binding it to port 4000.

When there is a new request, we launch a new child process by executing the command "tail -f /var/log/system.log", whose output is piped into the response.

When the response is closed (because the browser window was closed, or the network connection was severed, etc.), we kill the child process so that it does not hang around indefinitely.
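For comparison, here is a minimal hand-rolled chunked response with no child process involved (the interval and the number of chunks are arbitrary):

require('http').createServer(function(req, res) {
    res.writeHead(200, {'Content-Type': 'text/plain'});
    var count = 0;
    var interval = setInterval(function() {
        res.write('chunk ' + (++count) + '\n'); // each write is sent as one chunk
        if (count === 5) {
            clearInterval(interval);
            res.end(); // sends the final zero-length chunk and terminates the response
        }
    }, 1000);
}).listen(4001);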

NodeJS Child Processes

December 9, 2015 | Article | No Comments

A child process is a process created by another process (the parent process). The child inherits most of its parent's attributes, such as file descriptors.

In Node, we can spawn child processes, which can be other Node processes or any process we can launch from the command line. For that we have to provide the command and the arguments to execute it. We can either spawn the process and live alongside it (spawn), or wait until it exits (exec).

Executing Command

We can launch another process and wait for it to finish like this:

var exec = require('child_process').exec;

exec('cat *.js | wc -l', function(err, stdout, stderr) {
    if (err) {
        console.log('child process exited with error code ' + err.code);
        return;
    }
    console.log(stdout);
});

Here we are executing a command, represented as a string just like what we would type on the terminal, as the first argument of exec(). Our command is "cat *.js | wc -l", which pipes two commands. The first command prints out the content of every file with a .js extension; the second retrieves the data from the pipe and counts the lines. The second argument of exec() is a callback which will be invoked once the command has finished.

If the child process returned an error code, the first argument of the callback will contain an instance of Error, with the code property set to the child exit code.

If not, the output of stdout and stderr will be collected and offered to us as strings.

We can also pass an optional options argument between the command and the callback function:

var options = { timeout: 10000 };
exec('cat *.js | wc -l', options, function (err, stdout, stderr) { ... });

The available options are:

  • encoding: the expected encoding of the child output. Defaults to 'utf8'.
  • timeout: the timeout in milliseconds for the execution of the command. Defaults to 0, which means no timeout.
  • maxBuffer: the maximum size of the output allowed on stdout or stderr. If exceeded, the child is killed. Defaults to 200 * 1024.
  • killSignal: the signal to be sent to the child if it times out or exceeds the output buffers. Identified as a string.
  • cwd: the current working directory the command will operate in.
  • env: environment variables to be passed to the child process. Defaults to null.

For the killSignal option, we can pass a string identifying the name of the signal we wish to send to the target process. Signals are identified in Node as strings.
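As a concrete (illustrative) example, the following kills the child with SIGKILL if it runs longer than a second or produces more than 1 MB of output:

var exec = require('child_process').exec;

var options = {
    timeout: 1000,           // kill the child after 1 second...
    maxBuffer: 1024 * 1024,  // ...or after 1 MB of output on stdout/stderr
    killSignal: 'SIGKILL'    // the signal used in both cases
};

exec('yes', options, function(err, stdout, stderr) {
    if (err) {
        console.log('child was killed, signal: ' + err.signal);
        return;
    }
    console.log(stdout);
});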

Spawning Processes

In the previous section we saw that we can execute a process and wait for it to finish. In Node we can also spawn a new child process that runs concurrently, using the child_process.spawn function.

var spawn = require('child_process').spawn;

var child = spawn('tail', ['-f', 'file.txt']);
child.stdout.on('data', function(data) {
    console.log('stdout: ' + data);
});
child.stderr.on('data', function(data) {
    console.log('stderr: ' + data);
});

Here we are spawning a child process to run the "tail" command. The tail command needs some arguments, therefore we pass an array of strings as the second argument of spawn(). Here tail receives the arguments "-f" and "file.txt", which makes it monitor the file "file.txt" (if it exists) and output every new piece of data appended to it to stdout.

We are also listening to the child's stdout and printing its output. So here, in this case, we are piping the changes to the "file.txt" file into our Node application. We also print out the stderr stream.

Killing Process

We can (and should) eventually kill a child process by calling the kill method on the child object. Otherwise, it may become a zombie process.

var spawn = require('child_process').spawn;

var child = spawn('tail', ['-f', 'file.txt']);
child.stdout.on('data', function(data) {
    console.log('stdout: ' + data);
    child.kill();
});

In UNIX, this sends a SIGTERM signal to the child process.

We can also send another signal to the child process. You need to specify it inside the kill call like this:

child.kill('SIGKILL');

NodeJS Datagrams (UDP)

December 9, 2015 | Article | No Comments

UDP is a connectionless protocol that does not provide the delivery guarantees that TCP does. When sending UDP packets, there is no guarantee of the order of the packets and no guarantee that all packets will arrive. This means packets may arrive out of order or may be missing altogether.

On the other hand, UDP can be quite useful in certain cases, like when we want to broadcast data, when we don't need strict quality of delivery and sequencing, or even when we don't know the addresses of our peers.

NodeJS has ‘dgram’ module to support Datagram transmission.

Datagram Server

A server listening to UDP port can be:

var dgram = require('dgram');

var server = dgram.createSocket('udp4');
server.on('message', function(message, rinfo) {
    console.log('server got message: ' + message +
                ' from ' + rinfo.address + ':' + rinfo.port);
});

server.on('listening', function() {
    var address = server.address();
    console.log('server listening on ' + address.address +
                ':' + address.port);
});

server.bind(4002);

The createSocket function accepts the socket type as the first argument, which can be either “udp4” (UDP over IPv4), “udp6” (UDP over IPv6) or “unix_dgram” (UDP over UNIX domain socket).

When you run the script, you will see the server address and port printed, and the server will then wait for messages.

We can test it using a tool like netcat:

echo 'hello' | netcat -c -u -w 1 localhost 4002

This sends a UDP packet containing "hello" to localhost port 4002, which our program listens to. You should then see server output like:

server got message: hello
from 127.0.0.1:54950

Datagram Client

To create an UDP client to send UDP packets, we can do something like:

var dgram = require('dgram');

var client = dgram.createSocket('udp4');

var message = new Buffer('this is a message');
client.send(message, 0, message.length, 4002, 'localhost');
client.close();

Here we are creating a client using the same createSocket function we used to create the server, with the difference that we don't bind.

You have to be careful not to change the buffer you pass to client.send before the message has been sent. If you need to know when your message has been flushed to the kernel, you should pass a callback function; once it is called, the buffer may be reused.

client.send(message, 0, message.length, 4002, 'localhost', function() {
    // buffer can be reused now
});

Since we are not binding, the message is sent from random UDP port. If we want to send from a specific port, we use client.bind(port).

var dgram = require('dgram');

var client = dgram.createSocket('udp4');

var message = new Buffer('this is a message');
client.bind(4001);
client.send(message, 0, message.length, 4002, 'localhost');
client.close();

The port binding on the client really mixes what a server and client are, but it can be useful for maintaining conversations like this:

var dgram = require('dgram');

var client = dgram.createSocket('udp4');

var message = new Buffer('this is a message');
client.bind(4001);
client.send(message, 0, message.length, 4002, 'localhost');
client.on('message', function(message, rinfo) {
    console.log('and got the response: ' + message);
    client.close();
});

Here we are sending a message and also listening to messages. When we receive one message we close the client.

Don't forget that UDP is unreliable. Whatever protocol we devise on top of it, messages may be lost or arrive out of order.

Datagram Multicast

One of the interesting uses of UDP is to distribute a message to several nodes using only one network message. This is multicast. Message multicasting can be useful when we don't need to know the addresses of all the peers. Peers just have to "tune in" and listen to the multicast channel.

Nodes can report their interest in listening to certain multicast channels by "tuning" into those channels. In IP addressing there is a space reserved for multicast addresses. In IPv4 the range is between 224.0.0.0 and 239.255.255.255, but some of these are reserved. 224.0.0.0 through 224.0.0.255 is reserved for local purposes (such as administrative and maintenance tasks) and the range 239.0.0.0 to 239.255.255.255 has also been reserved for "administrative scoping".

Receiving Multicast Message

To join a multicast address, for example 230.1.2.3, we can do something like this:

var server = require('dgram').createSocket('udp4');

server.on('message', function(message, rinfo) {
    console.log('server got message: ' + message + ' from ' + 
                 rinfo.address + ':' + rinfo.port);
});

server.addMembership('230.1.2.3');
server.bind(4002);

We tell the kernel that this UDP socket should receive multicast messages sent to the multicast address 230.1.2.3. When calling addMembership, we can pass the listening interface as an optional second argument. If omitted, Node will try to listen on every public interface.

Then we can test the server using netcat like this:

echo 'hello' | netcat -c -u -w 1 230.1.2.3 4002

Sending Multicast Message

To send a multicast message we simply have to specify the multicast address:

var dgram = require('dgram');

var client = dgram.createSocket('udp4');

var message = new Buffer('this is a message');
client.setMulticastTTL(10);
client.send(message, 0, message.length, 4002, '230.1.2.3');
client.close();

Here, besides sending the message, we first set the multicast time-to-live to 10 (an arbitrary value here). This TTL tells the network how many hops (routers) the packet can travel through before it is discarded. Every time a UDP packet travels through a hop, the TTL counter is decremented, and once 0 is reached the packet is discarded.

NodeJS UNIX Sockets

December 9, 2015 | Article | No Comments

A UNIX socket or UNIX domain socket, also known as an IPC socket (inter-process communication socket), is a data communications endpoint for exchanging data between processes executing within the same host operating system. Similar functionality is provided by named pipes, but UNIX domain sockets may be created as connection-mode or connectionless.

UNIX domain sockets use the file system as their address name space. They are referenced by processes as inodes in the file system. This allows two processes to open the same socket in order to communicate. However, the communication occurs entirely within the operating system kernel.

Server

Node's net.Server class supports not only TCP sockets, but also UNIX domain sockets.

To create a UNIX socket server, we create a normal net.Server but then make it listen to a file path instead of a port.

var net = require('net');

var server = net.createServer(function(socket) {
    // got a client connection here
});
server.listen('/path/to/socket');

UNIX domain socket servers present exactly the same API as TCP servers.

If you are doing inter-process communication that is local to host, consider using UNIX domain sockets instead of TCP sockets, as they should perform much better. For instance, when connecting node to a front-end web-server that stays on the same machine, choosing UNIX domain sockets is generally preferable.

Client

Connecting to a UNIX socket server can be done by using net.createConnection as when connecting to a TCP server. The difference is in the argument, a socket path is passed in instead of a port.

var net = require('net');
var conn = net.createConnection('/path/to/socket');
conn.on('connect', function() {
    console.log('connected to unix socket server');
});

Passing File Descriptors Around

UNIX sockets have an interesting feature that allows us to pass file descriptors from one process to another. In UNIX, a file descriptor can be a pointer to an open file or a network connection, so this technique can be used to share files and network connections between processes.

For instance, to grab the file descriptor from a file read stream we would use the fd attribute like this:

var fs = require('fs');
var readStream = fs.createReadStream('/etc/passwd', {flags: 'r'});
// note: readStream.fd is only populated once the stream has emitted its 'open' event
var fileDescriptor = readStream.fd;

and then we can pass it into a UNIX socket using the second or third argument of socket.write like this:

var socket = ...
// assuming it is UTF-8
socket.write('some string', fileDescriptor);

// specifying the encoding
socket.write('453d9ea499aa8247a54c951', 'base64', fileDescriptor);

On the other end, we can receive a file descriptor by listening to the “fd” event like this:

var socket = ...
socket.on('fd', function(fileDescriptor) {
    // now I have a file descriptor
});

We can then perform various Node API operations, depending on the type of file descriptor.

Read or Write into File

If it’s a file-system file descriptor, we can use the Node low-level “fs” module API to read or write data.

var fs = require('fs');
var socket = ...
socket.on('fd', function(fileDescriptor) {
    // write some
    var writeBuffer = new Buffer("here is my string");
    fs.write(fileDescriptor, writeBuffer, 0, writeBuffer.length, null,
        function(err, written) {
            if (err) { console.log(err); return; }
        });

    // read some
    var readBuffer = new Buffer(1024);
    fs.read(fileDescriptor, readBuffer, 0, readBuffer.length, 0,
        function(err, bytesRead) {
            if (err) { console.log(err); return; }
            console.log('read ' + bytesRead + ' bytes:');
            console.log(readBuffer.slice(0, bytesRead));
        });
});

We should be careful about the file open mode: if the file was opened with the “r” flag, no write operation can be performed on it.
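
As a sketch of the alternative (assuming a hypothetical /tmp/shared-file that already exists), the sending side could open the file with the “r+” flag so that both reads and writes are allowed on the descriptor it passes along:

var fs = require('fs');

// 'r+' opens the file for both reading and writing (the file must already exist)
fs.open('/tmp/shared-file', 'r+', function(err, fileDescriptor) {
    if (err) { console.log(err); return; }
    // fileDescriptor can now be passed over the UNIX socket as shown above
});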

Listen to the Server Socket

As another example of sharing a file descriptor between processes: if the file descriptor passed in is a server socket, we can create a server on the receiving end and associate it with the new file descriptor by calling the server.listenFD method like this:

var server = require('http').createServer(function(req, res) {
    res.end('Hello World!');
});

var socket = ...
socket.on('fd', function(fileDescriptor) {
    server.listenFD(fileDescriptor);
});

We can use listenFD() on an “http” or “net” server. In fact, on anything that descends from net.Server.

NodeJS TCP

December 9, 2015 | Uncategorized | No Comments

TCP, or Transmission Control Protocol, is a connection-oriented protocol for data communication and transmission over a network. It provides reliability for the transmitted data, so the data sent or received is guaranteed to arrive intact and in the correct order.

Node has a first-class HTTP module implementation, but this descends from the “bare-bones” TCP module. As such, everything described here also applies to every class descending from the net module.

TCP Server

We can create TCP servers and clients using the “net” module.

Here is how we create a TCP server:

require('net').createServer(function(socket) {
    // new connection
    socket.on('data', function(data) {
        // got data
    });

    socket.on('end', function() {
        // connection closed
    });

    socket.write('Some string');
}).listen(4001);

Here our server is created using the “net” module and listens on port 4001 (to distinguish it from our HTTP server on port 4000). Our callback is invoked every time a new connection arrives, which is indicated by the “connection” event.

On this socket object, we can then listen to “data” events when we get a chunk of data and to the “end” event when that connection is closed.

Listening

As we saw, after the server is created, we can bind it to a specific TCP port.

var port = 4001;
var host = '0.0.0.0';
server.listen(port, host);

The second argument (host) is optional. If omitted, the server will accept connections directed to any IP address.

This method is asynchronous. To be notified when the server is really bound we have to pass a callback.

//-- With host specified
server.listen(port, host, function() {
    console.log('server listening on port ' + port);
});

//-- Without host specified
server.listen(port, function() {
    console.log('server listening on port ' + port);
});

Write Data

We can pass in a string or buffer to be sent through the socket. If a string is passed in, we can specify an encoding as a second argument. If no encoding is specified, Node will assume UTF-8. The operation is much like in the HTTP module.

var flush = socket.write('453d9ea499aa8247a54c951', 'base64');

The socket object is an instance of net.Socket, which is a WriteStream, so the write method returns a boolean saying whether the data was flushed to the kernel or not.
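
A minimal sketch of reacting to that return value (the buffer size here is an arbitrary assumption):

// write a large buffer and respect back-pressure
var bigBuffer = new Buffer(1024 * 1024);    // an arbitrary 1 MB buffer
var flushed = socket.write(bigBuffer);
if (!flushed) {
    // the data was queued in user memory; wait for 'drain' before writing more
    socket.once('drain', function() {
        console.log('kernel buffer drained, safe to write again');
    });
}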

We can also pass in a callback. This callback will be invoked when data is finally written out.

// with encoding specified
var flush = socket.write('453d9ea499aa8247a54c951', 'base64', function(){
    // flushed
});

// Assuming UTF-8
var flush = socket.write('Heihoo!', function(){
    // flushed
});

.end()

Method .end() is used to end the connection. This will send the TCP FIN packet, notifying the other end that this end wants to close the connection.

However, we can still get “data” events after we have issued this, simply because there might still be some data in transit, or the other end might insist on sending us some more data.

In this method, we can also pass in some final data to be sent:

socket.end('Bye bye!');

Other Methods

The socket object is an instance of net.Socket and implements the WriteStream and ReadStream interfaces, so all of those methods, such as pause() and resume(), are available. We can also bind to the “drain” event like any other stream object.
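
For instance, here is a small sketch (using the socket object from the server example) that throttles a chatty peer for one second:

socket.pause();                  // stop emitting 'data' events
setTimeout(function() {
    socket.resume();             // start receiving data again after one second
}, 1000);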

Idle Sockets

A socket can be in an idle state, meaning that no data has been received for some time. When this happens, we can be notified by calling setTimeout():

var timeout = 60000;    // 1 minute
socket.setTimeout(timeout);
socket.on('timeout', function() {
    socket.write('idle timeout, disconnecting, bye!');
    socket.end();
});

or in shorter form:

socket.setTimeout(60000, function() {
    socket.end('idle timeout, disconnecting, bye!');
});

Keep-Alive

Keep-alive is a mechanism for preventing an otherwise idle connection from timing out. The concept is very simple: when we set up a TCP connection, a set of timers is associated with it, and some of them deal with the keep-alive procedure. When the keep-alive timer reaches zero, we send our peer a keep-alive probe packet with no data in it and the ACK flag turned on.

In Node, all this functionality has been simplified, so we can enable keep-alive by invoking:

socket.setKeepAlive(true);

We can also specify the delay between the last packet received and the next keep-alive probe as the second argument to the call:

socket.setKeepAlive(true, 10000);    // 10 seconds

Delay or No Delay

When sending off TCP packets, the kernel buffers data before sending it off and uses the Nagle algorithm to decide when to actually send the data. If you wish to turn this off and demand that the data gets sent immediately after write commands, use:

socket.setNoDelay(true);

Of course, we can turn the buffering back on by simply invoking it with a false value.
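
That is simply:

socket.setNoDelay(false);    // re-enable the kernel's Nagle buffering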

Connection Close

The server.close() method closes the server, preventing it from accepting new connections. This method is asynchronous, and the server will emit the “close” event when it is actually closed:

var server = ...
server.close();
server.on('close', function() {
    console.log('server closed!');
});

TCP Client

We can create a TCP client which connects to a TCP server using the “net” module.

var net = require('net');
var port = 4001;
var host = 'www.google.com';
var conn = net.createConnection(port, host);

Here, if we omit the host when creating the connection, it will default to localhost.
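
For example:

var conn = net.createConnection(4001);    // connects to localhost:4001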

Then we can listen for data.

conn.on('data', function(data) {
    console.log('some data has arrived');
});

or send some data.

conn.write('I send you some string');

or end it (closing the connection).

conn.end();

and also listen to the “close” event (emitted whether the connection was closed by us or by the peer):

conn.on('close', function() {
    console.log('connection closed');
});

The socket conforms to the ReadStream and WriteStream interfaces, so we can use all of the previously described methods on it.
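
For instance, a minimal sketch (reusing the conn object from above and an assumed output path) that pipes everything the server sends into a local file:

var fs = require('fs');

// conn is a readable stream, so it can be piped like any other stream
conn.pipe(fs.createWriteStream('/tmp/server-output.log'));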

Error Handling

When handling a socket on the client or the server, we can (and should) handle the errors by listening to the “error” event.

Here is a simple template of how to do it:

require('net').createServer(function(socket) {
    socket.on('error', function(error) {
        // do something
    });
});

If we don’t catch the error, it becomes an uncaught exception and Node will terminate the current process. Unless that is what we want, we should handle the errors.
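
The same applies on the client side; here is a minimal sketch (reusing the port from the earlier examples):

var net = require('net');
var conn = net.createConnection(4001, 'localhost');

conn.on('error', function(error) {
    // for example ECONNREFUSED if no server is listening on that port
    console.log('connection error: ' + error.message);
});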
