Better performance testing

My latest problem to solve was performance, particularly in the algorithm that calculates lighting. The tools I had were severely lacking, so when I made a tweak somewhere, it wasn’t clear whether it made things better or made no difference at all.

The main problem was that, to assess performance, I had my general visual impression of “how fast it went”, which is of course flawed, and an FPS counter, which oscillates so much that it’s not much help either. To make matters worse, different scenarios affect performance differently. Standing still is considerably faster than moving around close to the screen center, which in turn is ridiculously faster than having to scroll, so it’s hard to get a good measure of performance differences.

After thinking about this for a while, I found a workaround that’d let me be a bit more scientific in my measurements. Basically, I’d have the character automatically follow a certain walking path once the engine loaded, and I’d time how long it took to do that. This is also not 100% exact, as there is a setTimeout with 0 delay between frames, which can make things vary, but it’s reasonably realistic, and if I let it run a few times, I can get a decent measure, much more scientific than what I had so far at least.
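A minimal sketch of that kind of timed walk, assuming a list of path steps and a per-frame callback. The names `timeWalk`, `path`, `stepFrame`, `now` and `schedule` are made up for illustration and are not the engine’s real API; the clock and the scheduler are parameters only so the harness can be exercised outside a browser. Reporting several runs as a mean plus a sample standard deviation is also an assumption about how to express the spread.

```javascript
// Hypothetical add-on harness: walks a scripted path one step per frame
// and reports how long the whole walk took.
function timeWalk(path, stepFrame, onDone, now, schedule) {
  now = now || Date.now;
  // Default scheduler: the same 0-delay setTimeout between frames the post mentions.
  schedule = schedule || function (fn) { setTimeout(fn, 0); };
  var start = now();
  var i = 0;
  function frame() {
    if (i >= path.length) {
      onDone(now() - start); // total time for the walk, in the clock's units (ms)
      return;
    }
    stepFrame(path[i]); // stand-in for "move the character and redraw"
    i += 1;
    schedule(frame);
  }
  frame();
}

// Summarize several runs as mean and sample standard deviation.
function summarize(times) {
  var n = times.length;
  var mean = times.reduce(function (a, b) { return a + b; }, 0) / n;
  var squares = times.reduce(function (a, t) {
    return a + (t - mean) * (t - mean);
  }, 0);
  var sd = n > 1 ? Math.sqrt(squares / (n - 1)) : 0;
  return { mean: mean, sd: sd };
}
```

Running `timeWalk` a handful of times per version and feeding the totals to `summarize` gives one comparable number per version and device, plus a rough idea of how noisy it is.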

I can obviously still use the timing function from my early experiments, where I turn the engine off, run one function in a loop thousands of times, and time it. I’m going to do that in some cases, but I also want a way to measure the performance of “the whole thing”.
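For comparison, the “one function in a loop” style of measurement could look roughly like this. `timeLoop` is a hypothetical name, and the injectable `now` clock is an assumption added so the sketch can be checked without real timing.

```javascript
// Hypothetical micro-benchmark: call `fn` `iterations` times and
// return the elapsed time according to `now` (Date.now by default).
function timeLoop(fn, iterations, now) {
  now = now || Date.now;
  var start = now();
  for (var i = 0; i < iterations; i++) {
    fn(i);
  }
  return now() - start;
}
```

This isolates a single function, which is exactly why it can’t replace the full walk: it says nothing about drawing, scrolling, or the pauses between frames.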

Coding that was reasonably simple, and I intentionally coded it in a way that makes it easy to “add on” to the existing code (God, I love JavaScript). That way, I can add it to older versions of the code, and check whether some of the changes I made (particularly the ones where I was following the profilers, which made my code somewhat uglier) actually made a difference, and whether that difference is big enough to justify the uglier code.

In the table below, you’ll find the results of running all the relevant versions of the code, in each of the browsers I’m testing. You’ll notice I made “lighting on” and “lighting off” tests. Lighting off means that the global light level is the maximum, in which case the code knows it doesn’t need to care about darkening stuff, and it completely bypasses all the lighting and lightmap calculations. I did this because I believe the lighting code is part of the performance problem, and I want to measure to what extent.

Times with lighting (ms)

| Version | Desktop Chrome | Desktop Firefox | Laptop Chrome | Laptop Firefox | HTC Desire S Portrait | HTC Desire S Landscape | iPhone 4 Portrait |
|---|---|---|---|---|---|---|---|
| 1 | 24954 ±26 | 23264 ±80 | 38934 ±106 | 44639 ±298 | 91412 ±3945 | 90239 ±495 | 176235 ±3842 |
| 2 | 13577 ±114 | 8026 ±47 | 34776 ±174 | 28582 ±149 | 75171 ±1257 | 75891 ±1342 | 133525 ±1643 |
| 3 | 13655 ±104 | 7985 ±48 | 35336 ±98 | 29336 ±42 | 74852 ±362 | 83093 ±146 | 131648 ±865 |
| 4 | 14889 ±66 | 19095 ±205 | 38262 ±248 | 35366 ±72 | 92587 ±386 | 100320 ±326 | 189879 ±1482 |
| 5 | 14817 ±49 | 17039 ±240 | 38533 ±217 | 34861 ±79 | 90699 ±608 | - | 180984 ±2326 |
| 6 | 14922 ±46 | 17842 ±192 | 38716 ±207 | 36049 ±146 | 94242 ±1103 | - | 201195 ±1992 |

Times without lighting (ms)

| Version | Desktop Chrome | Desktop Firefox | Laptop Chrome | Laptop Firefox | HTC Desire S Portrait | HTC Desire S Landscape | iPhone 4 Portrait |
|---|---|---|---|---|---|---|---|
| 1 | 24313 ±37 | 21117 ±190 | 37905 ±172 | 38476 ±91 | 62009 ±507 | 64704 ±552 | 96330 ±296 |
| 2 | 12877 ±126 | 5585 ±240 | 34577 ±270 | 24609 ±62 | 47220 ±217 | 53905 ±739 | 65196 ±690 |
| 3 | 12829 ±117 | 5795 ±267 | 34513 ±270 | 23971 ±66 | 47460 ±171 | 52511 ±292 | 65979 ±1485 |
| 4 | 14090 ±65 | 10828 ±185 | 37473 ±183 | 29390 ±240 | 68764 ±286 | - | 115799 ±493 |
| 5 | 13995 ±64 | 9154 ±204 | 36726 ±139 | 29172 ±140 | 66950 ±552 | - | 109763 ±599 |
| 6 | 14029 ±66 | 9141 ±229 | 36394 ±254 | 29408 ±93 | 66718 ±250 | - | 110656 ±1570 |

(I’m sorry for the holes, testing this took forever, so I didn’t finish all the “landscape” versions)

Versions of the code:

  • Code1: First version that included following a walking path. Includes lighting, right before doing delta drawing.
  • Code2: First version with delta drawing, incompletely done (it would pan the existing canvas and draw invalidated cells, but wouldn’t invalidate the new cols/rows when scrolling, nor the lightmap). It’s not useful to compare against Code1, because it’s very incomplete, but it’s useful to compare against the next several versions, as I add things one by one, to see how long each of those takes.
  • Code3: As part of delta drawing, I had to adjust the viewport to the screen better, to be able to draw only a few columns/rows when scrolling. This is that change; I included this version of the code to see how much it had impacted performance. Verdict: Not much really.
  • Code4: Delta drawing: Invalidate new columns/rows that show up while scrolling, to “fill the black gaps at the borders”
  • Code5: At one point I started playing around with the Chrome and Firebug Profilers, and I made a bunch of tweaks to the code based on where I was finding bottlenecks. These basically made the code a bit uglier, and inlined a few things, to gain performance. At the time, I wasn’t sure how effective they were. Verdict: More effective than I thought.
  • Code6: The final missing change for delta drawing to be 100% done: when the lightmap changes, invalidate the cells whose lighting changed. This was a good hit to performance. Now I need to figure out whether the impact comes from finding which cells to invalidate, or from actually having to draw more cells. That’s pretty easy to figure out.
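The two invalidation steps described for Code4 and Code6 can be sketched as follows. This is not the engine’s code: the function names, the idea of measuring scroll in whole cells, and the flat-array lightmap are all assumptions made to keep the example small.

```javascript
// Indices of the columns (or rows) that come into view when the viewport
// scrolls by `delta` cells along an axis that shows `size` cells.
function exposedLines(size, delta) {
  var n = Math.min(Math.abs(delta), size);
  var start = delta > 0 ? size - n : 0; // scrolling forward exposes the far edge
  var out = [];
  for (var i = 0; i < n; i++) {
    out.push(start + i);
  }
  return out;
}

// Code4 idea: after panning the existing canvas, only the newly exposed
// columns and rows need to be drawn from scratch.
function cellsToInvalidateOnScroll(cols, rows, dx, dy) {
  return { cols: exposedLines(cols, dx), rows: exposedLines(rows, dy) };
}

// Code6 idea: compare the old and new lightmaps (assumed here to be flat
// arrays with one light value per cell) and return the indices that changed.
function changedLightCells(prev, next) {
  var changed = [];
  for (var i = 0; i < next.length; i++) {
    if (prev[i] !== next[i]) {
      changed.push(i);
    }
  }
  return changed;
}
```

For example, scrolling a 10×8 viewport one cell to the right would invalidate only column 9, while a light change touching a single cell would invalidate just that cell’s index. Timing `changedLightCells` on its own versus the extra drawing it triggers is one way to answer the open question in the Code6 note.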

Hardware:

  • Desktop: Quad-core i7-2600 @ 3.4 GHz, 8 GB RAM, running at 1920×1080 (both Firefox and Chrome results shown)
  • Laptop: Three-year-old Dell laptop: Core 2 Duo @ 2.00 GHz, 3 GB RAM (both Firefox and Chrome results shown)
  • Android: HTC Desire S, 768 MB RAM, Android v2.3.3 (480×720)
  • iPhone: iPhone 4, running inside Safari (not standalone app) (640×832)

Interesting things to note:

  1. First of all… How did I not have this before? I feel so idiotic. Just as I was running the tests, giving cursory glances to the data, I started seeing so many obvious patterns emerge that I wanted to kick myself. Having a standard performance benchmark to run on all your devices (and actually running it all the time) is way more fundamental than I would’ve expected, mainly because performance-wise, browsers are way more different than I expected.
  2. Comparing like with like, lighting vs. non-lighting: lighting doesn’t take a lot of time in Chrome, but it does take a lot of time in Firefox, and it takes a huge amount of time on the cell phones. This was a big red herring for me, and the main reason why not having this test framework before was a huge mistake. When I did lighting, I checked how it affected performance, but just in Chrome, and since it was really fast, the price I was paying in render time was absolutely worth it to get the gorgeous effect. Testing in other browsers shows I have considerably less leeway in what I can do in the lighting department. I really need to improve that.
  3. The performance gains from the first stage of delta drawing (which was thoroughly incomplete, and by far the fastest version of it) were far bigger on my desktop (for both Chrome and FF) than on all the others. In the worst case (the iPhone), delta drawing as of versions 4 to 6 is actually a bit slower than just drawing the whole frame every time. I’m not sure what this means. Probably the ratio between the cost of blitting and the cost of the extra JS processing I need to do on invalidated cells is completely different on the iPhone than in Chrome on my desktop (which would mean blitting on my desktop is stupidly slow compared to JS processing; it could be…). I’m going to keep delta drawing anyway, since it’s stupidly fast if you’re not scrolling, and I can work on improving the performance of the code I’m running every frame, but this was a big disappointment to find out.
  4. The little tweaks I made based on the profiler’s indications were a nice win. Not in Chrome, and I now know that my computer was the worst possible choice to run the profiler on, but the tradeoff of performance vs. slightly uglier code definitely paid off. I’ll be doing considerably more of this, if I can. By that I mean… on my computer, I took the profiler as far as I could, until it started giving me stupid, incorrect data. Hopefully running it on my laptop will give me new useful information.
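Point 2 can be put into numbers straight from the version 6 rows of the two tables. The sketch below assumes the missing column in those rows is the HTC landscape one, as the note about the holes suggests; `lightingOverheadPercent` is a name made up for this example.

```javascript
// Version 6 totals in ms, copied from the "with lighting" and
// "without lighting" tables (landscape column omitted: no data).
var withLighting = {
  'Desktop Chrome': 14922,
  'Desktop Firefox': 17842,
  'Laptop Chrome': 38716,
  'Laptop Firefox': 36049,
  'HTC Desire S Portrait': 94242,
  'iPhone 4 Portrait': 201195
};
var withoutLighting = {
  'Desktop Chrome': 14029,
  'Desktop Firefox': 9141,
  'Laptop Chrome': 36394,
  'Laptop Firefox': 29408,
  'HTC Desire S Portrait': 66718,
  'iPhone 4 Portrait': 110656
};

// Extra time spent with lighting on, as a rounded percentage of the
// lighting-off time, per platform.
function lightingOverheadPercent(lit, unlit) {
  var out = {};
  Object.keys(lit).forEach(function (platform) {
    out[platform] = Math.round((lit[platform] / unlit[platform] - 1) * 100);
  });
  return out;
}
```

With those figures, lighting adds about 6% in Chrome on both machines, 23% in Firefox on the laptop, 41% on the HTC, 82% on the iPhone, and 95% in Firefox on the desktop.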

The main thing I got out of all this is: a lot of the things I tested and found to make no difference only made no difference in Chrome, or on my beast of a computer. They do make a big difference in other browsers/platforms.

For example, I had a bunch of constants to turn things on/off (for testing/debugging purposes), like drawing the FPS, drawing viewport data, etc. This meant a bunch of “if we should do x, do it” checks which, with the flags off, didn’t do anything. Just checking the flag, in Chrome, didn’t make any difference at all, so I never bothered with it. When I removed those checks, my test runs on the iPhone without lighting went from 110s to 93s. That’s a pretty fucking big change, whereas in Chrome the difference was absolutely zero.
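The post only says the checks were removed. One possible way to keep debug switches without testing them every frame is to resolve them once at startup into a list of steps, as in this sketch. The flag names and step names are hypothetical, and whether this is actually cheaper on a given phone would have to go through the same benchmark.

```javascript
// Build the per-frame step list once, so the frame loop itself contains
// no "if we should do x" checks. All names here are illustrative.
function buildFrameSteps(flags, steps) {
  var list = [steps.update, steps.draw];
  if (flags.drawFps) {
    list.push(steps.drawFps);
  }
  if (flags.drawViewportData) {
    list.push(steps.drawViewportData);
  }
  return list;
}

// The frame loop just runs whatever was selected at startup.
function runFrame(list) {
  for (var i = 0; i < list.length; i++) {
    list[i]();
  }
}
```

With every debug flag off, `runFrame` only ever touches the update and draw steps.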

The part that sucks is that this means I need to revisit pretty much *all* my assumptions, everything, from the beginning. All the little benchmark tests I made for “unit things” need to be re-run, in all platforms.

Shit.

The second big thing I learned is that lighting is stupidly fast on my computer, and stupidly slow on all the others. That’s fine: it wasn’t written to be fast initially, it was written to be somewhat elegant, and I never revisited that because it seemed fast enough (on my computer). I’ll be working *a lot* on that now.

The upside is that it’s pretty obvious I’ll be able to squeeze a good amount of extra performance out of these…


So the moral is… Different devices / browsers are not just a matter of faster or slower. Some things will be faster or slower relative to other things. On my desktop, it seems like JS execution is stupidly fast compared to drawing to the screen. On the cell phones, it’s exactly the opposite. Things that didn’t make any difference at all on my computer made a huge difference on the mobiles.

So benchmark on all the devices. All the time. Every time.

Performance problems

It’s time to really get into performance improvements. I did a bunch of performance experiments before I started, and I sanity-checked a thing or two while I was going, but I never really did serious work on the codebase, always counting on invalidation drawing to solve all my problems.

It didn’t.

Time to roll up the sleeves…

The first stop was obviously playing around a bit with the Chrome and Firefox profilers. I did some inlining of simple functions I was calling a lot. This did improve the performance of the caller routines, in theory, according to the profilers. However, since the FPS oscillates a lot when scrolling, and maxes out at 250 when standing still, the effect is not very visible. It seemed to be somewhat visible on mobile, but it wasn’t huge. Or maybe it was wishful thinking.

Also, after the first few improvements, the profilers started giving me weird data.

Chrome is telling me most of the time (20% of the total, with the other 80% belonging to “(program)”) is spent in a function that barely does anything. If I wrap the contents of that function into another function, with “(function() {})();”, the inner function uses almost zero time, as I’d expect, and the outer one still reports 20% (“self” time, not “total”). If I comment out the contents, it still takes 20% of the time, and it’s not even being called that much, compared to others… So that doesn’t make any sense.

Firefox was giving me reasonable data at first, but when I turned on lighting, I got a tremendous slowdown with the profiler on (which I didn’t get without lighting, and turning on lighting barely slowed the code down at all without the profiler), and it reported times that are clearly off compared to “profiler off”.

And the ones I wanted the most, the profilers for Android and iPhone, well… They don’t seem to exist, so not much help there…

Now, Chrome and Firefox run more than fast enough, but on the Android and the iPhone, it not only runs slower, it’s also visibly obvious that the lightmap calculation is a problem. The lightmap gets recalculated every time a light source changes cells (or changes power), and since the character holds a “torch”, it happens every time you cross a cell boundary. On cell phones, there is a very clear pause when that happens. If you walk a long straight line, you see the character literally stop and stutter every time it passes from one cell to another.
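The recalculation trigger described here can be sketched as a small change detector. The light source shape (`x`, `y` in pixels, plus `power`), the `cellSize` parameter and the function names are assumptions for illustration, not the engine’s real data model.

```javascript
// Reduce a light source to the things that matter for the lightmap:
// which cell it is in, and how strong it is.
function lightKey(light, cellSize) {
  var col = Math.floor(light.x / cellSize);
  var row = Math.floor(light.y / cellSize);
  return col + ',' + row + ',' + light.power;
}

// Returns a function that reports whether the lightmap needs recalculating,
// i.e. whether any light changed cell or power since the previous call.
function makeLightmapTracker(cellSize) {
  var lastKeys = null;
  return function needsRecalc(lights) {
    var keys = lights.map(function (light) {
      return lightKey(light, cellSize);
    }).join('|');
    var changed = keys !== lastKeys;
    lastKeys = keys;
    return changed;
  };
}
```

With a torch-carrying character, `needsRecalc` turns true exactly when the character crosses a cell boundary, which matches where the stutter shows up on the phones. Building the key string is itself per-frame work, so it would deserve its own measurement on mobile.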

So the lightmap calculation is heavy. This doesn’t surprise me, given that I made the simplest possible implementation of it, and it’s not exactly written in the fastest possible way. What does surprise me is that after making it, I did check how it impacted performance (in Chrome), and it didn’t. At all.

But without a profiler on mobile, with untrustworthy profilers on desktop, and with a fluctuating FPS meter as my only measure of performance, it’s kind of hard to make improvements. I made a bunch of little changes, and I have no idea whether they actually made a difference or not. Sometimes it seems they did, sometimes it seems they didn’t… It’s very frustrating and it feels very hit and miss.

My general feeling at this point is of disappointment (given, mainly, that my magic card didn’t work as I expected). We’ll see what happens, but it seems HTML5 games with a lot of screen action are still not as viable as I expected on mobile. I’ll have to keep experimenting.
Puzzlers and the like should be very viable, though, and those are the types of games I’m most interested in anyway. So we’ll see…

One thing I did, at this point, was Google around for other “HTML5 games”, to see if at least other people had been able to make cool-looking, fairly intensive games in HTML5 that’d run fast enough on my mobiles.
I didn’t find anything even remotely close to what I’m trying to do. Nothing. At all. It’s either tech demos, crappy-looking stuff, very, very basic things, or Flash. Lots and lots of Flash that for some reason says it’s “HTML5”.
That’s both encouraging and discouraging.