During the OpenStack summit of May 2015 in Vancouver, the OpenStack Telemetry community team ran a session for operators to provide feedback. One of the main issues operators raised was the polling that Ceilometer performed against Nova to gather instance information. It had a highly negative impact on Nova API CPU usage, because it retrieved all the information about instances at regular intervals.

Indeed, it turns out that Nova does not optimize the retrieval of this information (a few rows in a database) and does not use a cache. Fortunately, Nova does provide a way to poll more efficiently with the Changes-Since request parameter, which limits the response to instances that have changed since a given time.

As a result of this discovery, the Telemetry team wrote a blueprint named “resource-metadata-caching”, targeting the implementation of a local in-memory cache in Ceilometer and the use of the Changes-Since parameter. This blueprint was completed by Jason Myers during the Liberty development cycle and is therefore part of the final version of Ceilometer released for the Liberty cycle.
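To make the idea concrete, here is a minimal Python sketch of a local in-memory cache refreshed with a changes-since style query. It is not the code that landed in Ceilometer: the InstanceCache class and the list_servers callable are hypothetical names introduced for illustration, and the sketch assumes that the server listing reports removed instances with a DELETED status when a changes-since filter is given.

```python
from datetime import datetime, timezone


class InstanceCache:
    """Hypothetical in-memory cache of instance metadata, refreshed incrementally."""

    def __init__(self, list_servers):
        # list_servers(changes_since=...) is an assumed callable that wraps the
        # Nova server listing. Passing None means "return every instance".
        self._list_servers = list_servers
        self._instances = {}    # instance id -> metadata dict
        self._last_poll = None  # time of the previous successful refresh

    def refresh(self, now=None):
        # Take the timestamp before the request so that changes made while
        # the request is in flight are picked up by the next refresh.
        now = now or datetime.now(timezone.utc)
        changed = self._list_servers(changes_since=self._last_poll)
        for server in changed:
            if server.get("status") == "DELETED":
                # Assumption: deleted instances are reported with this status.
                self._instances.pop(server["id"], None)
            else:
                self._instances[server["id"]] = server
        self._last_poll = now
        return list(self._instances.values())
```

Only the first call asks for the full list of instances. Each later call asks only for what changed since the previous poll and merges the result into the cache, so the cost of a polling cycle follows the number of changes rather than the total number of instances.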

Recently, the Red Hat PerformanceQE team decided to run a test for the upcoming version of the Red Hat Enterprise Linux OpenStack Platform 8, which will be based on community OpenStack “Liberty,” in order to see how this blueprint improved the performance of Ceilometer and Nova. We deployed the current shipping version, RHEL OpenStack Platform 7, as well as the current beta version of RHEL OpenStack Platform 8, and ran the same polling tests while measuring the CPU usage of both platforms. The test environment consisted of 10 compute nodes and 200 virtual machines, with polling set to every 5 seconds.

[Figure: Ceilometer_perf_test]

As the graph clearly demonstrates, the performance improvement for Ceilometer alone is quite significant, decreasing the CPU usage by 70%. It may be possible to further improve the Nova side as well, in order to reduce the CPU usage while retrieving such information.

This blueprint is a great example of how the community Telemetry team addresses problems reported by downstream operators and pays attention to how the community and customers use the software it produces. This change is currently available in OpenStack Liberty, which will be included as part of RHEL OpenStack Platform 8.