Overview of memory allocation
What follows is an analysis of the heap allocation pattern for a few pages containing different kinds of contents, obtained using Valgrind's massif tool. MxLauncher was modified to wait for the page to load, then wait 5 seconds, then quit. The base command line used for obtaining the measurements follows:
valgrind --smc-check=all --threshold=0.1 --alloc-fn='g_malloc' --alloc-fn='g_malloc0' --alloc-fn='g_realloc' --alloc-fn='g_slice_alloc' --alloc-fn='g_slice_alloc0' --alloc-fn='slab_allocator_alloc_chunk' --alloc-fn='alloc_small' --alloc-fn='ralloc_size' --alloc-fn='resize' --alloc-fn='ra_alloc_reg_set' --alloc-fn='alloc_texmat_data' --alloc-fn='alloc_texgen_data' --alloc-fn='ft_mem_qalloc' --alloc-fn='_mesa_align_malloc' --alloc-fn='_mesa_vector4f_alloc' --alloc-fn='_mesa_align_calloc' --alloc-fn='_tnl_init_vertices' --alloc-fn='rzalloc_size' --alloc-fn='WTF::fastMalloc(unsigned long)' --alloc-fn='WTF::fastZeroedMalloc(unsigned long)' --alloc-fn='WTF::tryFastMalloc(unsigned long)' --alloc-fn='WTF::fastRealloc(void*, unsigned long)' --alloc-fn='WTF::VectorBufferBase<char>::allocateBuffer(unsigned long)' --alloc-fn='WTF::Vector<char, unsigned long>::expandCapacity(unsigned long)' --alloc-fn='WTF::VectorBufferBase<JSC::ValueProfile>::allocateBuffer(unsigned long)' --alloc-fn='JSC::CodeBlock::operator new(unsigned long)' --alloc-fn='_default_mem_new_block' --tool=massif ./Programs/MxLauncher <url>
This is a reading of the memory allocation for the test browser when loading Google's main page, which is a relatively simple page. The big red rectangle at the bottom, taking up 13MB of memory, is the GL context allocation performed by cogl at startup. It happens very early in the process lifetime and remains the same throughout the program's execution; it is not affected by the creation of the WebView actor and its sub-actors (such as the tiles for the backing store). This suggests the GL stack pre-allocates more memory than strictly necessary and then hands it out as needed. In addition to those 13MB, about 3MB more are associated with internal memory used by the GL stack, coming from the _swsetup_CreateContext and brwNewProgram functions. The allocation does not seem excessive, and it is outside of WebKit Clutter's control. However, tuning the GL stack might be worth investigating in future performance research.
The light red that goes on top of the graph represents over-allocation done by the memory allocation functions. It is very common for memory allocation functions to ask the operating system for a much larger chunk than the one requested by the API user, and to hand it out piece by piece as the API user asks for more. This optimization has two main goals: to reduce memory fragmentation, which makes the application's usage pattern more CPU cache-friendly, and to reduce the number of system calls performed by the application. System calls are usually expensive, since they require transitions to and from kernel space, so this optimization is found in pretty much all operating systems and basic system libraries.
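The over-allocation pattern described above can be illustrated with a minimal arena sketch. It is not the code of any allocator named in this document; the `Arena` type and the `arena_*` functions are hypothetical names chosen for illustration. The sketch reserves one large chunk with a single request and then hands out pieces of it, so `capacity` roughly corresponds to heap cost and `used` to the memory actually handed out.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical arena: one large chunk requested up front, handed out piecemeal. */
typedef struct {
    unsigned char *base;  /* start of the chunk obtained with one request */
    size_t capacity;      /* total bytes reserved */
    size_t used;          /* bytes handed out so far */
} Arena;

/* Reserve the whole chunk with a single request. Returns 0 on success, -1 on failure. */
static int arena_init(Arena *a, size_t capacity)
{
    a->base = malloc(capacity);
    a->capacity = a->base ? capacity : 0;
    a->used = 0;
    return a->base ? 0 : -1;
}

/* Hand out `size` bytes from the chunk without asking the system again. */
static void *arena_alloc(Arena *a, size_t size)
{
    /* Round up to a multiple of 16 so successive pointers stay aligned. */
    size_t aligned = (size + 15u) & ~(size_t)15u;
    if (aligned < size || aligned > a->capacity - a->used)
        return NULL;  /* arithmetic overflow or chunk exhausted */
    void *p = a->base + a->used;
    a->used += aligned;
    return p;
}

/* Give the whole chunk back at once. */
static void arena_destroy(Arena *a)
{
    free(a->base);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

With a 1024-byte arena, two requests of 10 and 20 bytes consume 48 bytes of the chunk after rounding, while the remaining capacity stays reserved but unused. That unused remainder is the kind of memory that would appear as over-allocation in a heap profile.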
This is a reading of the test browser loading image search. The overall memory allocation looks very similar to the simple page case. There are two interesting changes, though. This time we can see the JSC JIT showing up in our graph - using roughly 3MB, indicating some of the JS code in this page is executed many times, causing JSC to consider it a hot path and worthy of being compiled. The second, and expected, change is the presence of WebCore::ImageFrame::setSize(), used by the image decoding/rendering code in WebKit to create and expand the buffer used to hold image data. There are several images in this page, so that is expected. The allocated memory does not seem to be excessive.
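To give a feel for why decoded images show up in the graph, the following sketch estimates the size of a decoded frame buffer. It assumes 4 bytes per pixel (8-bit RGBA), which is an assumption made for illustration and is not taken from the measurements above; `decoded_frame_bytes` is a hypothetical helper, not a WebKit function.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: decoded frames are stored as 8-bit RGBA, 4 bytes per pixel. */
#define BYTES_PER_PIXEL 4u

/* Bytes needed for a decoded width x height frame, or 0 if the size overflows. */
static size_t decoded_frame_bytes(size_t width, size_t height)
{
    if (width != 0 && height > SIZE_MAX / width)
        return 0;  /* width * height would overflow */
    size_t pixels = width * height;
    if (pixels > SIZE_MAX / BYTES_PER_PIXEL)
        return 0;  /* pixels * 4 would overflow */
    return pixels * BYTES_PER_PIXEL;
}
```

Under that assumption, a 1024x768 image needs 3,145,728 bytes (3 MiB) once decoded, regardless of how small the compressed file is, so a page with several images can plausibly account for a few megabytes.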
This reading comes from loading this blog post. Notice there is an HTML5 video tag in this post, but the video is not playing. Furthermore, the video tag has the preload attribute set to none. That means video should not start downloading before the play button is pressed. What we see, though, is that GStreamer does start preloading the video: the orange area of the graph showing a 10MB allocation comes from GStreamer allocating buffers for video data it is downloading.
That is a bug in the GStreamer-based implementation of the HTML5 preload attribute, even though the specification makes the attribute a hint that user agents may ignore. For Apertis it is probably a good idea to ignore preload completely, but in the opposite direction of what is currently being done, i.e., never preload.
One more thing of note in this graph is the yellow spike. That is caused by the JPEG decoder allocating memory to decode and render the JPEG image the blog post uses as the poster for the video tag. That memory is quickly reclaimed so it is not something to worry about. The decoded image, however, may be cached for a long time, to make repainting less costly. That cache and the trade-offs involved will be discussed later in this document.
This reading comes from loading this WebGL sample. The memory allocated by the GL stack is bigger in this case because a different platform was used (EGL instead of GLX). Also of note, the light blue area is memory used by the tiled backing store tiles, which did not show up before. The big orange area represents memory allocated for the images used as textures in the WebGL sample. As discussed before, the internal memory cache used by WebKit to store the decoded images will be looked into in more detail later in this document.
WebCore has a memory cache that sits on top of the disk cache, to serve resources even faster when necessary. There is also what is called the page cache, which keeps a number of visited pages parsed and laid out, for a very fast load when the back button is hit. Note that not all pages benefit from the page cache: there are rules that dictate the cacheability of a given page. The cache sizes can be customized, and the caches can be disabled as well. Investigation was performed loading the following image-heavy pages and measuring memory usage with massif:
The above reading comes from loading all those pages with the caches enabled. The resulting graph shows that memory usage keeps growing, and the measurement ends a few kilobytes short of 100MB heap cost, with 60MB actually in use.
The second reading, taken with the caches disabled, shows a peak very similar to the one in the reading with caches enabled, but part of that memory is quickly reclaimed, and the final measurement shows around 93MB heap cost, with about 46MB actually in use. That is a bit more than 20% saved by disabling the caches. Although the heap cost is not that much lower, the savings on live bytes already represent a better chance of not triggering swap for pages that are actually being used.
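The savings figure can be checked with a short calculation. The sketch below uses only the rounded numbers quoted in the two readings (60MB and 46MB in use, roughly 100MB and 93MB heap cost); `percent_saved` is a hypothetical helper introduced for illustration.

```c
/* Percentage reduction when going from `before` to `after` (same unit). */
static double percent_saved(double before, double after)
{
    if (before <= 0.0)
        return 0.0;  /* avoid dividing by zero or a negative baseline */
    return (before - after) / before * 100.0;
}
```

With these inputs, `percent_saved(60.0, 46.0)` gives about 23%, which matches "a bit more than 20%" for memory in use, while `percent_saved(100.0, 93.0)` gives about 7% for heap cost.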
SVG is a big chunk of code in WebKit, and disabling it represents savings of around 15% in binary size and around 12% fewer internal symbols for the library. While this also represents small savings in terms of memory usage, load time and symbol resolution time are what benefit the most. Smaller code size may also help with cache efficiency. While SVG is one of the features that people associate with the modern web, its actual usage has been very limited up to now. Most mobile OSes today seem to enable SVG in their WebKit builds, but disabling it is an option that may be considered.
WebKit Clutter provides the webkit_set_cache_model API to let applications specify what kind of usage patterns they have. Setting the cache model to WEBKIT_CACHE_MODEL_DOCUMENT_VIEWER disables both the page cache and the memory cache. Although the more conceptually correct choice would seem to be WEBKIT_CACHE_MODEL_WEB_BROWSER, Collabora recommends adopting the document viewer cache model for the Apertis browser, given the results above. This cache model should also be adopted for the web runtime.
Collabora also recommends disabling media preload completely. Given that the preload attribute is just a hint that does not need to be honored by the user agent, it makes sense to avoid any pre-buffering on memory-constrained devices such as Apertis. Collabora added a new enable-media-preload setting to WebKitWebSettings that can be set to FALSE to disable preloading of media completely.
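The intended effect of the setting can be sketched as a small decision function. This is an illustration of the recommended policy only, not the WebKit Clutter implementation: `PreloadHint` and `effective_preload` are hypothetical names, and the three enum values simply mirror the values the HTML5 preload attribute can take (none, metadata, auto).

```c
#include <stdbool.h>

/* Values the HTML5 preload attribute can take. */
typedef enum {
    PRELOAD_NONE,
    PRELOAD_METADATA,
    PRELOAD_AUTO
} PreloadHint;

/*
 * Hypothetical helper: decide what to preload given the hint from the page
 * and the value of a setting such as enable-media-preload.
 */
static PreloadHint effective_preload(PreloadHint page_hint, bool enable_media_preload)
{
    if (!enable_media_preload)
        return PRELOAD_NONE;  /* setting disabled: never buffer ahead of playback */
    return page_hint;         /* otherwise follow the hint given by the page */
}
```

With the setting disabled, the result is always `PRELOAD_NONE`, whatever the page requests, so no media buffers would be allocated before the user presses play.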