Monday, August 1, 2016

The Tortoise and The Hare - Array_splice vs Array_pop in PHP

Sometimes, the most succinct and elegant piece of code can fall short on other factors like speed of execution due to the hidden complexity of the underlying code powering a high level language like PHP.  This happened to me today while I was refactoring a data processing routine.  For a fleeting moment, I experienced a sense of accomplishment having managed to reduce the number of lines of code in the routine to make it more readable and maintainable.  Unfortunately, that gleeful moment only lasted until I ran the routine against a data set.  To my utter surprise, the code ran 900 times slower than before!

Through the process of elimination, I tracked the culprit down to the PHP function array_splice.


 array array_splice ( array &$input , int $offset [, int $length = 0 [, mixed $replacement = array() ]] )


From the PHP manual, the array_splice function

"Removes the elements designated by offset and length from the input array, and replaces them with the elements of the replacement array, if supplied."

For my task, I needed to process the elements in an array starting from the end with a set of N elements at a time so array_splice seemed like the perfect function to use.  The code I had before refactoring was similar to the following stripped down snippet:


 // initialize arrays with 50,000 elements
 $arr = array_fill(0, 50000, 'abc');

 // remove last 5 elements at a time using array_pop
 while (count($arr)) {
    $subArr = [];
    for ($i = 0; $i < 5; $i++) {
        $subArr[] = array_pop($arr);
    }
 }


In the snippet above, N is set to 5 (i.e. remove 5 elements at a time), but in reality, N can change inside the while-loop so the PHP function array_chunk is not applicable.  Also, array_slice isn't appropriate because I needed the input array to be truncated inside the while-loop.

The following refactored version replaces the ugly inner for-loop with a single array_splice statement.


 // initialize arrays with 50,000 elements
 $arr = array_fill(0, 50000, 'abc');

 // remove last 5 elements at a time using array_splice
 while (($len = count($arr)) > 0) {
    $subArr = array_splice($arr, $len - 5);
 }


Don't you agree the refactored version is simpler and easier to read?  At least I thought so.  Unfortunately, while the version before refactoring took < 0.02 second to complete, the refactored version took 18 seconds in Zend PHP 5.6.  Furthermore, as the number of elements in the array scales up, the execution time for array_splice is non-linear while array_pop is more or less linear.  This is illustrated in the following graph by comparing the running time of the two code snippets above over an increasing array size.


Execution Time of Array_splice vs Array_pop


My first thought was that perhaps arrays are implemented as singly linked lists in PHP so every time array_splice() is called, the linked list has to traverse from start until the given offset is reached near the end of the list.  This would explain the non-linear time performance.  If that's the case, if I use array_splice() to remove the first N elements instead of the last N elements as in the following code snippet, the time performance should be linear.


 // initialize arrays with 50,000 elements
 $arr = array_fill(0, 50000, 'abc');

 // remove first 5 elements at a time using array_splice
 while (($len = count($arr)) > 0) {
    $subArr = array_splice($arr, 0, 5);
 }


Strangely, when I ran the test, the run time is almost identical to the graph above so I'm at a loss. I have only tested this with Zend PHP 5.6.  I don't have Zend PHP 7 or HHVM at my disposal but I wonder how they stack up.