Select Page

Categories and sub categories. Directories and sub directories. Building a tree of nodes that have parents is a common problem these days.

There are two solutions to this problem, but unfortunately, the inefficient one is often used by programmers, due to the lack of time, or to inexperience.

Imagine, for the sake of clarity, a simple dataset that goes as follow:

  0 => Array(
    [name] => Node 0
    [parent] => null
  1 => Array(
    [name] => Node 1
    [parent] => 4
  2 => Array(
    [name] => Node 2
    [parent] => 8

The common approach is to use a recursive function that finds all the root nodes, then all their sub nodes, then all the sub nodes of their sub nodes, and so on.

function mapTree($dataset, $parent=null) {
  $tree = array();
  foreach ($dataset as $id=>$node) {
    if ($node['parent'] !== $parent) continue;
    $node['children'] = mapTree($dataset, $id);
    $tree[$id] = $node;

  return $tree;

The problem with this method is that, in terms of calculation, the problem complexity is exponential: you need to search all the dataset for children nodes for each node. Thus, building a tree of 1,000 nodes is 100 times more complicated than building a tree of 100 nodes, because you will have to search 10 times more nodes, 10 times more often.

A more efficient way to build the tree is to use dereferencing. This allows to map the whole tree in a single pass, without recursion. The problem complexity then becomes linear rather than exponential, and 10 times more nodes equals to 10 times more time (although the time required by PHP to find a specific index of an array isn’t taken into account here).

function mapTree($dataset) {
  $tree = array();
  foreach ($dataset as $id=>&$node) {
    if ($node['parent'] === null) { // root node
      $tree[$id] = &$node;
    } else { // sub node
      if (!isset($dataset[$node['parent']]['children'])) {
         $dataset[$node['parent']]['childs'] = array();
      $dataset[$node['parent']]['children'][$id] = &$node;
  return $tree;