In PHP, arrays are used for virtually everything (I’ll spare you my rant on that), so you’re likely to use them abundantly in your scripts. The issue we’re going to focus on here isn’t with the arrays themselves; it’s the simple fact that you’re probably going to have to iterate over them to get at the data you want. PHP offers a wide range of ways to do so, so which should we use?
First, let’s generate an array to work with. We’re going for quantity here, not size of data. Therefore, we’ll simply generate an array containing a list of integers. PHP offers a quick and easy way to do this using the array_fill() function:
```php
// generate an array containing 100,000 elements, each with the value of 1
$data = array_fill( 0, 100000, 1 );
```
We use array_fill() rather than range() here because we will be summing up the values in the examples below. range() generates incrementing values, and their sum (5,000,050,000 for 1 through 100,000) exceeds the maximum integer on 32-bit systems, which leads to integer overflows and inconsistent results across architectures.
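To see the problem concretely, here is a minimal sketch (not one of the article’s benchmarks) that sums range( 1, 100000 ) and reports how PHP stored the result. The sum is 5,000,050,000, which fits in a 64-bit integer but exceeds the 32-bit maximum of 2,147,483,647, so the type of the result depends on the build.

```php
<?php
// Sum 1..100000, the incrementing data range() would give us.
$sum = array_sum( range( 1, 100000 ) );

// Closed form n(n + 1) / 2 for comparison: 5,000,050,000.
$expected = 100000 * 100001 / 2;

printf( "sum: %.0f (matches n(n+1)/2: %s)\n", $sum, $sum == $expected ? 'yes' : 'no' );
printf( "PHP_INT_MAX: %d\n", PHP_INT_MAX );

// A 64-bit build keeps an int; a 32-bit build (PHP_INT_MAX = 2147483647)
// overflows and silently switches to a float.
printf( "stored as: %s\n", is_int( $sum ) ? 'int' : 'float' );
```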
Note: This article assumes that the reader is running at least PHP 5.0.0. If you are running an older version, you should understand that official support for PHP 4 ended back in 2008, and you should consider upgrading immediately.
for loop
Those coming from a procedural background may be most familiar with the for loop as a means of iterating through an array. Since it is so common, let’s take a look at that first:
```php
$time_start = microtime( true );

$total = 0;

// loop through each of the array items and sum them up
for ( $i = 0; $i < count( $data ); $i++ )
{
    $total += $data[ $i ];
}

// calculate total running time and output the result
$time_end = ( microtime( true ) - $time_start );
printf( "Total: %d\nfor loop time: %fs\n", $total, $time_end );
```
Let’s run that test and see what type of results we get:
Total: 100000
for loop time: 0.127712s
Ouch. The time returned by the above test depends on the speed of your system, but on mine, that’s a lot of time to spend simply adding up the elements of an array. Let’s see how we may be able to improve that a bit.
The first thing that may jump out is the call to count() in the condition of the for statement:
```php
for ( $i = 0; $i < count( $data ); $i++ )
```
It may help to understand exactly how PHP executes the above statement. The initialization portion ($i = 0) is executed once, before any iterations begin. The step, which is the $i++ part, is executed after each iteration. The test expression (the middle part) is evaluated before each iteration to determine whether the loop should continue.
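The evaluation order can be made visible with a small sketch. The log_step() helper below is hypothetical (not a PHP built-in): it records a label and returns its value unchanged, so wrapping each part of a two-iteration for statement shows when that part runs.

```php
<?php
// Hypothetical helper: appends $label to $log and returns $value unchanged.
function log_step( array &$log, $label, $value )
{
    $log[] = $label;
    return $value;
}

$log = array();

for ( $i = log_step( $log, 'init', 0 ); log_step( $log, 'test', $i < 2 ); $i = log_step( $log, 'step', $i + 1 ) )
{
    $log[] = "body $i";
}

echo implode( ', ', $log ), "\n";
// init, test, body 0, step, test, body 1, step, test
```

Note that the test runs three times for two iterations: once before each pass through the body, plus the final check that ends the loop.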
Alright, that’s pretty straightforward. But therein lies the issue: count() is executed on every iteration, because PHP does not cache the result of that function call. That gets pretty expensive, considering that the function is called no fewer than 100,000 times (100,001, to be exact, since the test runs once more before the loop exits).
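A quick way to confirm this is to wrap count() in a function that tallies its own calls. counted_count() and the $calls counter below are hypothetical names used only for this sketch.

```php
<?php
// Hypothetical wrapper around count() that records how often it runs.
$calls = 0;

function counted_count( array $arr )
{
    global $calls;
    $calls++;
    return count( $arr );
}

$data  = array_fill( 0, 100000, 1 );
$total = 0;

// Same loop as above, with count() swapped for the tallying wrapper.
for ( $i = 0; $i < counted_count( $data ); $i++ )
{
    $total += $data[ $i ];
}

printf( "Total: %d\ncount() calls: %d\n", $total, $calls );
// Total: 100000
// count() calls: 100001
```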
Let’s see what happens when we change it up a bit:
```php
// ...

// loop through each of the array items and sum them up
for ( $i = 0, $count = count( $data ); $i < $count; $i++ )
{
    $total += $data[ $i ];
}

// ...
```
Total: 100000
for loop time: 0.046609s
Wow – that is a huge improvement! We nearly tripled the speed of the iteration simply by having the conditional check a precalculated value, rather than calling the function with each iteration. That’s pretty good, right?
No; it’s still pretty slow. We can do much better.
while loop and current()
Some people may prefer to use a while loop in order to loop through the elements of an array. One of the most common ways of doing that is to use the current() function in conjunction with next():
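```php
// loop through each of the array items and sum them up
// (current() returns false past the end of the array, so this loop also
// stops early at any falsy element such as 0; that is safe here only
// because every value is 1)
while ( $val = current( $data ) )
{
    $total += $val;
    next( $data );
}
```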
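One caveat worth knowing about this idiom: because the loop condition is the value returned by current(), the loop ends at the first falsy element (0, an empty string, null, or false), not just at the end of the array. This sketch, using small made-up data, contrasts it with a variant that tests key() instead, which returns null only once the internal pointer has moved past the last element.

```php
<?php
// Made-up sample data containing falsy values (0) to expose the pitfall.
$data = array( 1, 0, 2, 0, 3 );

// Naive version: stops at the first falsy value, not at the end of the array.
$naive = 0;
reset( $data );
while ( $val = current( $data ) )
{
    $naive += $val;
    next( $data );
}

// Safer version: key() returns null only once the pointer is past the end.
$safe = 0;
reset( $data );
while ( key( $data ) !== null )
{
    $safe += current( $data );
    next( $data );
}

printf( "naive: %d\nsafe: %d\n", $naive, $safe );
// naive: 1
// safe: 6
```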
So how does that fare speed-wise? It certainly looks a little less complicated than the for loop, so that means it’s faster, right?
Total: 100000
for loop time: 0.175094s
Yikes. We took a step in the wrong direction. Let’s pretend we didn’t see this.
foreach loop
PHP 4 introduced the foreach construct, which provides a much simpler means of iterating through an array. Surely, since it was designed for arrays, it should be faster than for:
```php
// loop through each of the array items and sum them up
foreach ( $data as $val )
{
    $total += $val;
}
```
Notice that, in the above snippet, we also avoid using a temporary variable $i to store the current index. PHP undoubtedly tracks the position internally, but that bookkeeping happens in compiled, optimized C code rather than in interpreted, sluggish PHP code, so that alone gives us a bit of a benefit, however slight.
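Not having to manage $i has a correctness benefit as well as a speed one. The index-based for loop assumes the keys run from 0 to count() - 1, which no longer holds once an element has been removed with unset(); foreach simply visits whatever elements exist. A small sketch with made-up data:

```php
<?php
// Keys stop being 0..n-1 once an element is removed.
$data = array( 1, 1, 1, 1 );
unset( $data[ 1 ] ); // keys are now 0, 2, 3

// Index-based loop: visits 0, 1, 2; key 1 is missing and key 3 is never reached.
$for_total = 0;
for ( $i = 0, $count = count( $data ); $i < $count; $i++ )
{
    $for_total += isset( $data[ $i ] ) ? $data[ $i ] : 0;
}

// foreach visits exactly the elements that exist.
$foreach_total = 0;
foreach ( $data as $val )
{
    $foreach_total += $val;
}

printf( "for: %d\nforeach: %d\n", $for_total, $foreach_total );
// for: 2
// foreach: 3
```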
Let’s see the result:
Total: 100000
for loop time: 0.037076s
Not too bad: we managed to shave roughly 20% off the time of the optimized for loop (0.046609s down to 0.037076s). But I’m not satisfied.
The Problem
What do all of the above have in common? They’re looping structures. Great, so what’s the issue? If you’re coming from a compiled, high-performance language like C, there is no issue. Let’s see how long it’d take to iterate through an array containing 100,000 elements in C:
```c
#include <stdio.h>
#include <sys/time.h>

#define DATALEN 100000

int main()
{
    int data[ DATALEN ];
    register long i;
    unsigned long total = 0;

    struct timeval time_start, time_end;
    long elapsed, uelapsed;
    double elapsed_total;

    // create our data array
    for ( i = 0; i < DATALEN; data[ i++ ] = 1 );

    // get current time
    gettimeofday( &time_start, NULL );

    // loop through the array summing up the values
    for ( i = 0; i < DATALEN; i++ )
    {
        total += data[ i ];
    }

    // get total running time
    gettimeofday( &time_end, NULL );
    elapsed       = ( time_end.tv_sec - time_start.tv_sec );
    uelapsed      = ( time_end.tv_usec - time_start.tv_usec );
    elapsed_total = ( elapsed + ( (double)( uelapsed ) / 1000000 ) );

    // output total and loop time (%lu matches the unsigned long total)
    printf( "Total: %lu\nloop time: %fs\n", total, elapsed_total );

    return 0;
}
```
Total: 100000
loop time: 0.000273s
Well, then. The point here is that PHP is a slow language, especially for looping. Wherever possible, you want to avoid looping in PHP code.
But you need to iterate through your arrays somehow! Does that mean you are stuck with PHP’s slow loops?
Use Built-In Functions
Fortunately, there are certain functions available to you that can drastically improve the speed of array processing. In fact, PHP has a whole slew of functions devoted to processing and altering arrays.
Programmers coming from other languages such as C may prefer to write their own loops for processing data, because a hand-written loop is faster than a function call. That is not true at all in PHP; in fact, much of PHP’s power comes from its built-in functions. PHP itself and its libraries are written in C, so if you can use a built-in PHP function that does the looping for you, that loop will execute much more quickly.
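As a small illustration of that advice, here are two hand-written loops next to the built-in functions that do the same work. max() and in_array() are standard PHP functions; the sample data is made up, and no timings are claimed here. The point is only that each loop has a drop-in built-in replacement.

```php
<?php
$data = array( 3, 9, 4, 1 );

// Hand-written loops: find the largest value and check whether 4 is present.
$max   = $data[ 0 ];
$found = false;
foreach ( $data as $val )
{
    if ( $val > $max )
    {
        $max = $val;
    }
    if ( $val === 4 )
    {
        $found = true;
    }
}

// The built-in equivalents do the same looping inside the engine's C code.
$builtin_max   = max( $data );
$builtin_found = in_array( 4, $data, true );

printf( "max: %d / %d\nfound: %s / %s\n", $max, $builtin_max,
    $found ? 'yes' : 'no', $builtin_found ? 'yes' : 'no' );
```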
Let’s take another look at our above example – summing up 100,000 values of an array. Is there a PHP function that could do this for us? array_sum() perhaps?
```php
$time_start = microtime( true );

// sum up the values in the array
$total = array_sum( $data );

// calculate total running time and output the result
$time_end = ( microtime( true ) - $time_start );
printf( "Total: %d\nfor loop time: %fs\n", $total, $time_end );
```
Above, we are making a single function call to array_sum() rather than going in a loop and manually adding everything up. Let’s see how this method does:
Total: 100000
for loop time: 0.012571s
Not bad. Using everything discussed here, we’ve cut our time down from the original 0.127712s to a mere 0.012571s. It’s still no C code, but summing the array is now about 10 times faster than it was originally. It’s still written in PHP; we’re just using the tools we were given.
But wait – I said that PHP’s libraries were written in C. Why isn’t array_sum() just as fast as the C example provided above? Unfortunately, we’re still dealing with PHP’s variables and other internal representations here. It’s not just jumping to a position in memory like it does with an array in C. The array_sum() function still has to scan the PHP array to retrieve all the values. There’s no getting away from that.
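Since every example repeats the same microtime() bookkeeping, it can be convenient to wrap it in a helper when comparing approaches side by side. time_it() below is a hypothetical helper, not part of PHP. It relies on closures, so it needs PHP 5.3 or later, and each measurement includes one extra closure call, which is negligible next to 100,000 iterations.

```php
<?php
// Hypothetical helper: runs $fn once, prints its result and wall-clock time.
function time_it( $label, $fn )
{
    $start  = microtime( true );
    $result = $fn();
    printf( "%s: %d in %fs\n", $label, $result, microtime( true ) - $start );
    return $result;
}

$data = array_fill( 0, 100000, 1 );

$foreach_total = time_it( 'foreach', function () use ( $data ) {
    $total = 0;
    foreach ( $data as $val )
    {
        $total += $val;
    }
    return $total;
} );

$builtin_total = time_it( 'array_sum', function () use ( $data ) {
    return array_sum( $data );
} );
```

Timings will vary by machine, so compare the two lines against each other rather than against the figures in this article.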
Expense of Function Calls
One must also be mindful of what their loop is doing. Take a look at the following example. It calls a function in order to count all the values in an array that are equal to 2.
```php
<?php

function check_value( $val )
{
    return ( $val === 2 ) ? 1 : 0;
}

// generate an array with 1/3 values containing 1, the rest containing 2
$data = ( array_fill( 0, 10000, 1 ) + array_fill( 10000, 20000, 2 ) );

$time_start = microtime( true );

// count the values in the array that are equal to 2
$count = 0;
foreach ( $data as $val )
{
    $count += check_value( $val );
}

// calculate total running time and output the result
$time_end = ( microtime( true ) - $time_start );
printf( "Count: %d\nfor loop time: %fs\n", $count, $time_end );
```
Count: 20000
for loop time: 0.043597s
Above, we called the check_value() function 30,000 times. Functions do an excellent job at improving code clarity and reducing duplicate code. However, function and method calls are expensive. Consider what happens when we get rid of the function and do all the processing in the body of the loop:
```php
// count the values in the array that are equal to 2
$count = 0;
foreach ( $data as $val )
{
    if ( $val === 2 )
    {
        $count++;
    }
}
```
If we run it again using that loop instead of the previous one, we get the following result:
Count: 20000
for loop time: 0.013397s
That’s a significant speed improvement. Now, I’m not advocating that you get rid of all of your functions (though I would welcome preprocessor macros in PHP). However, consider the performance implications of excessive function or method calls where performance matters. In most situations, the legibility you gain will outweigh the performance cost.
It should also be noted that under many circumstances, the overhead of the function call may be insignificant. By the numbers above, it works out to roughly one microsecond per call (about 0.030s spread over 30,000 calls); if your function takes 0.3 seconds to execute, you’re not going to care about that overhead.
Memory Tradeoff & Apparent Processing
Let’s continue with the example from the previous section (counting the number of elements in an array that contain the value 2). PHP does not contain a function that can explicitly search an array for the given value and return the number of results. It does, however, have something very similar – array_count_values().
The array_count_values() function will loop through the array and maintain a count of each of the values. It will then return an array containing the results. Therefore, in our above example, it would return an array much like the following:
```php
array(
    1 => 10000, // there are 10,000 1s in our data array
    2 => 20000, // there are 20,000 2s in our data array
);
```
That’s a bit more information than we want. What if we had an array containing hundreds of different values? One might assume that counting every distinct value takes longer than counting a single one (it shouldn’t, since every element has to be visited either way), but the resulting array would account for a bit more memory, even though it does give us the answer we are looking for. Which is the more elegant solution?
When attempting to optimize code, developers often have to choose between CPU time and memory footprint, and due to language limitations, this is one of those situations. Let’s take a look at how much time it takes for PHP to count the values for us:
```php
// count the values in the array that are equal to 2
$count_data = array_count_values( $data );
$count      = $count_data[ 2 ];

// output memory consumption
printf( "Memory Consumption: %d bytes\n", memory_get_usage( true ) );
```
Memory Consumption: 6815744 bytes
Count: 20000
for loop time: 0.003805s
Even though we’re gathering more information, it’s still 3.5x faster than counting only a single value manually. In fact, I’m willing to bet that it even consumed less memory than our previous example, simply because we aren’t dealing with userland variables and function calls:
(Our example using check_value(), with memory output)
Memory Consumption: 7077888 bytes
Count: 20000
for loop time: 0.043128s
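It seems that counting a single value using our check_value() function used 262,144 more bytes than array_count_values()! Damn you, PHP! One caveat: memory_get_usage( true ) reports the memory PHP has reserved from the system, which the PHP 5.2+ memory manager allocates in 256 KiB segments by default, so the difference here is exactly one segment; memory_get_usage() without the argument gives a finer-grained figure. Yes, it’s unfortunate that considerations like this must be made, but you’d be surprised how many internal functions really do provide both memory and CPU benefits even though they “do more work”. Userland code consumes considerable amounts of memory.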
That isn’t to say that array_count_values() can’t consume more memory. As I said previously, if it returns an array with thousands of distinct values, that is going to offset the memory savings.
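The tradeoff can be sketched directly. The example below rebuilds the article’s data and compares array_count_values() with another built-in route to the same answer: counting the keys returned by array_keys() with a search value and strict comparison. That alternative is not benchmarked in this article, and the byte figures printed by memory_get_usage() vary by PHP version and build, so none are shown. Note the isset() guard: if no element equals 2, array_count_values() simply has no key 2.

```php
<?php
// The article's data: 10,000 ones followed by 20,000 twos.
$data = array_fill( 0, 10000, 1 ) + array_fill( 10000, 20000, 2 );

// array_count_values(): a small result array, one entry per distinct value.
$before           = memory_get_usage();
$count_data       = array_count_values( $data );
$via_count_values = isset( $count_data[ 2 ] ) ? $count_data[ 2 ] : 0;
printf( "array_count_values: %d (~%d bytes)\n", $via_count_values, memory_get_usage() - $before );

// array_keys() with a search value: a result array holding every matching key.
$before         = memory_get_usage();
$keys           = array_keys( $data, 2, true );
$via_array_keys = count( $keys );
printf( "array_keys: %d (~%d bytes)\n", $via_array_keys, memory_get_usage() - $before );
```

With only two distinct values, the array_count_values() result holds two entries while the array_keys() result holds 20,000 keys; with thousands of distinct values the balance shifts, as described above.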
ArrayObjects and Iterators
PHP 5 began a strong move toward object-oriented programming. It provides a number of powerful components that deal with iteration, specifically Traversable objects and Iterators. They benefit developers in the sense that they make code much easier to understand and allow for powerful OO implementations, but do they provide any performance benefit?
Firstly, let’s see what happens when we run the same foreach loop over an ArrayObject rather than a normal array:
```php
// generate an array containing 100,000 elements, each with the value of 1
$data = new ArrayObject( array_fill( 0, 100000, 1 ) );
```
Total: 100000
loop time: 0.061663s
This compares with our foreach example’s result of 0.037076s, so we did lose a bit of time there. However, that does not mean you shouldn’t use ArrayObjects. In fact, I use them a great deal in OO scripts, because ArrayObjects coupled with Iterators allow for powerful designs that aren’t quite as elegant in procedural code. However, when performance is a requirement, I have to steer clear of them.
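If you already hold an ArrayObject but need a built-in function in a hot spot, one option, not measured in this article, is to take a plain-array copy with getArrayCopy() and pass that to array_sum(). The copy costs memory and time of its own, so this only pays off when the built-in saves more than the copy costs.

```php
<?php
$data = new ArrayObject( array_fill( 0, 100000, 1 ) );

// array_sum() needs a plain array, so copy the ArrayObject's storage first.
$total = array_sum( $data->getArrayCopy() );

// The ArrayObject is unchanged and still works with foreach elsewhere.
$loop_total = 0;
foreach ( $data as $val )
{
    $loop_total += $val;
}

printf( "array_sum: %d\nforeach: %d\n", $total, $loop_total );
```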
You will find that the Iterator object provides very little benefit over a conventional foreach loop as well. The only time I make use of an Iterator is when it contains additional logic that determines what results it should return (e.g. to iterate only over odd numbers), or if the implementation allows for an Iterator to be passed in at runtime to determine what results should be returned.
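The two cases described above can be sketched together: a function that accepts any Traversable, and a caller that decides at runtime whether to pass the full iterator or one filtered down to odd numbers. sum_values() is a hypothetical name; CallbackFilterIterator is part of SPL but requires PHP 5.4 or later.

```php
<?php
// Hypothetical function: sums whatever the iterator yields, so the caller
// decides which values are included.
function sum_values( Traversable $values )
{
    $total = 0;
    foreach ( $values as $val )
    {
        $total += $val;
    }
    return $total;
}

$data = new ArrayObject( range( 1, 10 ) );

// Everything: 1 + 2 + ... + 10
$all = sum_values( $data->getIterator() );

// Only odd numbers, chosen at runtime by wrapping the iterator.
$odd_only = new CallbackFilterIterator( $data->getIterator(), function ( $current ) {
    return $current % 2 === 1;
} );
$odd = sum_values( $odd_only );

printf( "all: %d\nodd: %d\n", $all, $odd );
// all: 55
// odd: 25
```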
```php
// loop through each of the array items and sum them up
$iterator = $data->getIterator();
while ( $iterator->valid() )
{
    $total += $iterator->current();
    $iterator->next();
}
```
Total: 100000
loop time: 0.291542s
In fact, that’s our slowest method yet. So, just be wise when weighing the costs and benefits. If performance doesn’t matter – go ahead. Use whatever implementation is best for you. That’s why you have the option. If performance is a factor, don’t even glance at those objects. Maybe future versions of PHP will enhance the functionality to the point where their speeds are equivalent to the loop structures.
Wall Of Text Summary
PHP provides a number of ways to iterate through arrays, and rest assured that we haven’t nearly covered them all here. Out of the loop structures tested here, foreach is the fastest for iteration. However, it comes up short when compared to built-in functions designed for array processing. Where possible, those built-in functions should be used, even where they may at first glance seem slower. They often provide a very strong performance benefit, both CPU- and memory-wise.
PHP 5 introduced a number of new means of iteration through the use of objects. Certain implementations may improve code legibility and enable the developer to implement some fairly powerful object-oriented designs, but when performance is a factor, you want to steer clear of them.
Hopefully this article helped to clear up some questions. I expect not to see any more for loops with count() in the loop test expression!
Tagged: Development, PHP

