My intention with my "Adventures in Configuring HipHopVM" article was to describe my experiences in the hope that they might be useful to someone else who is trying to get HHVM running. Maybe I could save someone some time and headache. I didn't do any rigorous performance benchmarking, but I did post the observation that one particularly slow page I have, which was loading in around 5 seconds on my development machine under PHP, would load in around 0.6 seconds under HHVM. HHVM is wickedly fast, but I didn't think much about it because it wasn't what I was focusing on. I was just happy my code was running on HHVM at all.
Imagine my surprise when the article was shared by the HipHopVM Facebook page. From there it found its way to Reddit. Then an article was written about it on the HighScalability blog. A number of readers seemed to latch on to the speed observation.
They say you should never read the comments because only pain and darkness await you there. But I did not heed this sage warning and went obsessively reading every comment I could find. My world darkened and I began to despair.
One comment posted to the HighScalability blog article stuck with me:
"How does it compare to a modern php (5.4, 5.5) best-practice implementation? This seems a very roundabout way to improve performance, verging on architecture astronautics, unless he is trying to get a job at facebook. ... We have to take his word for everything."
This got me thinking. While the focus of the article was on getting HHVM working with a non-trivial codebase, he does have a point that there's no way to independently verify the result since my code is not open source yet, and may never be.
My platform is a mix of SQL, NoSQL on top of MySQL (akin to the way FriendFeed implemented a NoSQL layer), tons of classes, a custom XML domain-specific language parser, an expression evaluator and a lot of I/O. As such it has many external dependencies that are greatly affected by configuration changes and by what else is happening on the machine. What it does not do is strictly isolate the core PHP language itself so that one can make a clear apples-to-apples comparison between the PHP and HHVM language implementations. We could do something similar to compare the full stack, but that would then involve comparing the HipHopVM stack to other components such as Apache.
Interestingly, the commenter went on to state that "modern php is not slow".
In computing as in motorcycling, if you have never gone fast then slow may seem fast. PHP is an interpreter. Of course it's slow.
As a general rule, I don't believe in benchmarks. They give you little insight into how a given thing is going to perform in your use case. I can tell you, subjectively, across everything I have run through it, HHVM is "way faster" than PHP ranging from 2X to 9X faster depending on the situation. For most scenarios I've tried it seems to sit somewhere between 2X and 3X faster.
I don't specialize in benchmarking but I have spent some time trying to think of a real world problem one could use as a fairly decent benchmark that would have a few qualities:
- it would be something useful.
- it would test the speed of the core language without involving extensions and it would not be I/O bound.
- it would clearly demonstrate how hhvm can allow PHP to be used in problem domains where it currently makes no sense to do so.
- it would be something that can be independently verified.
I thought about some code I evaluated at one point when I was trying to increase the performance of my expression parser: an LALR(1) parser generator for PHP. It allows you to build robust parsers in PHP much the way you can in C/C++ using Yet Another Compiler Compiler (YACC) or its wonderfully named cousin Bison. Unfortunately, the parsers generated using this tool were unbelievably slow. I wondered if maybe one of the examples bundled with it could be used as a decent benchmark.
It turns out that one of the examples involves a generated parser for a custom expression language which is then used to evaluate a bunch of expressions. It's all in code and involves no I/O or external dependencies. It's also algorithmically intense and will work a good percentage of the core features of the language.
"Perfect", I thought as I went to modify the test to add some timing.
Reproducing the LALR(1) in PHP Performance Test
To reproduce this test, download Ppage, the PHP Parser Generator project, from: http://www.synflag.com/de/pub/products/ppage.php. The code is in English. The example code in question is in ppage/examples/expressionLanguage/test.php.
The intent of this test is to isolate the PHP runtime as much as possible, so any errors or I/O need to be disabled. Luckily this is easy to accomplish.
The code generates two warnings which are annoying in a loop and will affect results. Add "@" error suppression to each. The two warnings are:
PHP Notice: Uninitialized string offset: 21 in ppage/examples/expressionLanguage/ELLexer.o.inc.php on line 401
PHP Notice: Undefined index: codeopen in ppage/lib/ParserMachine.o.inc.php on line 74
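As a rough illustration of what the "@" operator does for these two kinds of notice, here is a minimal sketch. The variables $text and $options are made-up stand-ins; the actual Ppage lines are not reproduced here.

```php
<?php
// Hypothetical stand-ins for the values behind the two notices.
$text = 'short';                  // a string shorter than the offset being read
$options = array('name' => 'x');  // an array without the 'codeopen' key

// Without "@", each of these reads emits a notice (a warning on PHP 8).
// With "@", the message is suppressed and the read yields a fallback value.
$char = @$text[21];               // empty string
$open = @$options['codeopen'];    // null

var_dump($char, $open);
```

An isset() check would avoid the notice without suppressing anything, but "@" is the smaller change for a throwaway benchmark.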
There is an intentional parse error which generates a message on line 76. Comment that out.
I altered test.php to remove the print statements and to execute all steps in a loop of 1000 iterations.
Alter test.php and replace everything after line 569 with:
$startTime = microtime(true);
$counter = 0;
// Repeat the whole lex/parse/evaluate cycle 1000 times.
while ( $counter++ < 1000 )
{
    $i = 0;
    $call = new ELCallBack();
    // Run every bundled expression through a fresh lexer and parser.
    foreach ($exprs as $expr)
    {
        $elex = new ELLexer($expr);
        $parser = new ParserMachine ($elex, $parsertable, $parserproductions);
        $result = $call->getAST();
    }
}
$endTime = microtime(true);
$time = $endTime - $startTime;
print( "Time elapsed: " . $time . "\n");
I'm running 64-bit Ubuntu 13.10 on a cheap, several-year-old Dell with an Intel(R) Core(TM)2 Duo CPU E6750 @ 2.66GHz and 4 GB of RAM.
yml@humility:expressionLanguage$ php --version
PHP 5.5.3-1ubuntu2 (cli) (built: Oct 9 2013 14:49:12)
Copyright (c) 1997-2013 The PHP Group
Zend Engine v2.5.0, Copyright (c) 1998-2013 Zend Technologies
with Zend OPcache v7.0.3-dev, Copyright (c) 1999-2013, by Zend Technologies
Stock Ubuntu 13.10 configuration using mod_php and Apache.
yml@humility:expressionLanguage$ hhvm --version
HipHop VM v2.3.0-dev (rel)
Repo schema: 5a59384aec1e7bb9b316f27ae87d6c2a5cd57f76
Configuration per my previous article. The build in question is not a debugging build and is one I compiled myself. (takes forever)
LALR(1) Test Results
I first ran the test on the command line using PHP:
yml@humility:expressionLanguage$ php ./test.php
Time elapsed: 5.0141038894653
then followed by HHVM:
yml@humility:expressionLanguage$ hhvm test.php
Time elapsed: 5.5141990184784
At this point I was a little disheartened. I had expected HHVM to be faster in all cases. I had been told by the HHVM developers that hhvm first profiles the execution before invoking the JIT. I believe they said it happens after 10 or so requests. Figuring that maybe this is not invoked when run as a command line interpreter, I repeated the experiment in the browser running the test 15 times. This should give the JIT compiler a chance to get involved.
Time elapsed: 5.0715029239655
Time elapsed: 5.067526102066
Time elapsed: 5.0686640739441
Time elapsed: 5.1444919109344
Time elapsed: 5.0798120498657
Time elapsed: 5.0674719810486
Time elapsed: 5.0758039951324
Time elapsed: 5.0680429935455
Time elapsed: 5.0671808719635
Time elapsed: 5.1225011348724
Time elapsed: 5.0637631416321
Time elapsed: 5.0711719989777
Time elapsed: 5.0798349380493
Time elapsed: 5.0700869560242
Time elapsed: 5.0699179172516
Time elapsed: 5.9072468280792
Time elapsed: 5.9045979976654
Time elapsed: 5.8999648094177
Time elapsed: 5.8969769477844
Time elapsed: 5.8965940475464
Time elapsed: 5.8854668140411
Time elapsed: 5.8934369087219
Time elapsed: 5.8918490409851
Time elapsed: 5.8934841156006
Time elapsed: 5.8858242034912
Time elapsed: 5.8903369903564
Time elapsed: 1.5902738571167
Time elapsed: 1.5128128528595
Time elapsed: 1.5166771411896
Time elapsed: 1.5109701156616
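One can clearly see where the JIT was invoked: the times drop from about 5.9 seconds to about 1.5 seconds. Compared with the runs at around 5.07 seconds, that is, in this case, a 70% reduction in execution time. Not as dramatic as the 0.6 seconds I saw on my slow page, but still impressive.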
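The arithmetic behind that figure, and the point where the times fall off, can be checked with a small sketch. The helper names percentReduction and findDropIndex are my own, and the 0.5 threshold is an arbitrary assumption; the timings are the last fifteen listed above, rounded to four decimals.

```php
<?php
// Percentage by which $after is lower than $before.
function percentReduction($before, $after)
{
    return ($before - $after) / $before * 100.0;
}

// Index of the first run that took less than $ratio times the previous run,
// or -1 if there is no such drop. The 0.5 default is an arbitrary choice.
function findDropIndex(array $timings, $ratio = 0.5)
{
    for ($i = 1; $i < count($timings); $i++) {
        if ($timings[$i] < $timings[$i - 1] * $ratio) {
            return $i;
        }
    }
    return -1;
}

// The last fifteen timings listed above, rounded to four decimals.
$runs = array(
    5.9072, 5.9046, 5.9000, 5.8970, 5.8966,
    5.8855, 5.8934, 5.8918, 5.8935, 5.8858,
    5.8903, 1.5903, 1.5128, 1.5167, 1.5110,
);

$drop = findDropIndex($runs);
printf("Drop first seen at run %d\n", $drop + 1);
printf("Reduction vs. 5.07 s: %.1f%%\n", percentReduction(5.07, 1.51));
```

With these numbers the drop shows up at the twelfth run, which fits the "10 or so requests" I had been told about.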
There are many other aspects of the system that could be tested to paint a more detailed picture of the performance differences between PHP and HipHopVM. One can lose a lifetime trying to benchmark every feature of a given platform and emerge from the exercise no wiser.
HHVM vm-perf Tests
HipHopVM includes a few intense benchmarks that exercise narrower feature sets. These can be found in the github repository.
Out of curiosity, I decided to experiment with a few of these.
I used the same configuration as above, except that for these I used wget in a loop on the local machine and stopped it after 15 requests, as in:
while true; do time wget -q http://localhost/vm-perf/fibr.php; done
The tests run long enough that the overhead of processing the tests and returning the result represents a negligible percentage.
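For anyone who would rather collect the per-request times in PHP than read them off the terminal, a bounded version of the same loop might look like the sketch below. The timeRequests helper is hypothetical, and the commented-out usage assumes allow_url_fopen is enabled so that file_get_contents can fetch a URL.

```php
<?php
// Call $fetch $count times and return the wall-clock seconds each call took.
function timeRequests($fetch, $count)
{
    $timings = array();
    for ($i = 0; $i < $count; $i++) {
        $start = microtime(true);
        call_user_func($fetch);
        $timings[] = microtime(true) - $start;
    }
    return $timings;
}

// Example use against the local server (assumes allow_url_fopen is enabled):
// $timings = timeRequests(function () {
//     file_get_contents('http://localhost/vm-perf/fibr.php');
// }, 15);
// foreach ($timings as $n => $seconds) {
//     printf("request %d: %.3f s\n", $n + 1, $seconds);
// }
```

Passing the request in as a callable keeps the timing loop separate from how the page is actually fetched.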
This test computes the n'th Fibonacci number recursively up to 35.
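The bundled test is not reproduced here, but the general shape of a naive recursive Fibonacci benchmark is roughly the following. This is my own sketch, not the code from the HHVM repository, and it uses a smaller limit than 35 so that it finishes quickly.

```php
<?php
// Naive doubly recursive Fibonacci: fib(0) = 0, fib(1) = 1.
function fib($n)
{
    return $n < 2 ? $n : fib($n - 1) + fib($n - 2);
}

// Assumption: 30 keeps this sketch quick; the test described above goes up to 35.
$limit = 30;

$start = microtime(true);
for ($n = 0; $n <= $limit; $n++) {
    fib($n);
}
printf("Time elapsed: %.3f s\n", microtime(true) - $start);
```

Nearly all of the time goes into function calls and integer addition, which is what makes this kind of test a narrow probe of the runtime. The observations that follow refer to the bundled test, not to this sketch.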
This ran pretty consistently around:
Before the JIT kicked in it was slower than PHP.
But once the JIT kicked in:
This test is designed to be used as a benchmark and contains a number of different sorts and calculations.
This is the one test case I've been able to find where PHP outperforms HHVM.
I found this test interesting in that the JIT had a profoundly negative impact in this one scenario. I suspect this is probably due to some issue that will get resolved. The HHVM team is constantly pushing the envelope of performance and I expect they will find many more ways of improving this platform as time continues.
As a general rule, benchmarks don't tell you much about how a given technology is going to perform in your particular use case. They just make an argument that it might bear investigation. Is PHP slow? Slow is relative. It is clearly much slower than HHVM in a wide range of use cases. Of course we would expect this, since PHP is an interpreter and HHVM compiles to native code through its JIT.
In my particular use case, which is likely representative of a wide range of web application platforms, HHVM represents a vast performance improvement over PHP without requiring me to change my code.
Because of this constantly improving speed and the multi-threaded nature of the hhvm server, I suspect that the range of problems that PHP can be used to solve will expand.
Personally, I would love to see more multi-threaded features exposed in PHP. Yes, there is already some basic multithreading support available. But it would be awesome if one could launch long-lived services in separate threads. For particularly large and heavy frameworks like my own, this would be a boon. Build up your application object trees once and then reuse them across requests, you know, like real server software.
Such support would also open a host of interesting possibilities. AMQP/STOMP in PHP? XMPP? NodePHP?