Ceilometer is a tool that collects usage and performance data, while Heat orchestrates complex deployments on top of OpenStack. Heat aims to autoscale its deployments, scaling up when they're running hot and scaling back when idle.
Ceilometer has access to the data that drives those scaling decisions and can trigger the appropriate actions in Heat. The two OpenStack projects meet in an alarming API that Ceilometer provides and Heat consumes.
Slides presented at the Fall OpenStack Design Summit in Hong Kong
2. Speakers
● Nick Barcet co-founded the Ceilometer project at the Folsom summit and led the project through incubation
● Julien Danjou has been a core Ceilometer contributor from the outset, taking over the PTL reins for Havana
● Eoghan Glynn drove the addition of the Alarming feature to Ceilometer over the Havana cycle
3. Two seemingly disjoint projects intersect
● Heat is a template-driven orchestration engine
○ automates complex deployments via declarative configuration
● Ceilometer is a metering infrastructure
○ collects data measuring resource usage and performance
● Appear on the surface to have minimal commonality ...
12. Ceilometer to the rescue!
● compute agent already collects most relevant stats from outside the instance
● API service exposes aggregation over the evaluation window
● define new API exposing alarm lifecycle
● provide new service to evaluate alarms against their defined rules
● additional service driving asynchronous notifications when alarms fire
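The evaluation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Ceilometer's evaluator: the function names, the state strings and the shape of `rule` are all hypothetical, and `period_stats` stands in for the per-period aggregates that the API service would return for the evaluation window.

```python
import operator

# Hypothetical sketch; names and state strings are illustrative assumptions,
# not Ceilometer's evaluator code.
COMPARATORS = {"gt": operator.gt, "ge": operator.ge, "lt": operator.lt,
               "le": operator.le, "eq": operator.eq, "ne": operator.ne}


def evaluate_alarm(period_stats, threshold, comparison_operator,
                   evaluation_periods, current_state="ok"):
    """Return the alarm state after looking at the most recent periods.

    period_stats holds one aggregated value (e.g. an average) per period,
    oldest first.
    """
    window = period_stats[-evaluation_periods:]
    if len(window) < evaluation_periods:
        return "insufficient data"
    compare = COMPARATORS[comparison_operator]
    breaches = [compare(value, threshold) for value in window]
    if all(breaches):
        return "alarm"
    if not any(breaches):
        return "ok"
    # Mixed window: neither clearly breaching nor clearly healthy.
    return current_state


def run_cycle(current_state, period_stats, rule, notify):
    """Evaluate one rule and call notify() only on a transition into alarm."""
    new_state = evaluate_alarm(period_stats, rule["threshold"],
                               rule["comparison_operator"],
                               rule["evaluation_periods"], current_state)
    if new_state == "alarm" and current_state != "alarm":
        notify(rule["alarm_actions"])
    return new_state
```

Keeping `notify` as a callback mirrors the split on the slide between the service that evaluates alarms and the service that delivers notifications.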
13. How it all hangs together
{ "AWSTemplateFormatVersion" : "2010-09-09",
  "Parameters": { "VolumeSize" : { … } },
  "Mappings": {
    "Flavor2Arch" : { "tiny": { "Arch" : "64" }, ... } },
  "Resources": {
    "MyInstance" : {
      "Type" : "AWS::EC2::Instance",
      "Properties" : { "Volumes" : […] }
    } },
  "Outputs": { "DNS" : { "Value" : { … } } } }
Added to the template:
● alarms bounding busy/idleness of instances
● membership of autoscale group represented via user metadata
● alarm actions refer to scale up/down policies
● action URLs are pre-signed
● policies define adjustment step size & cooldown period
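To illustrate how an adjustment step and a cooldown period interact, here is a small Python sketch. It is not Heat's implementation: the `ScalingPolicy` class, its field names, and the `min_size`/`max_size` bounds are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingPolicy:
    """Hypothetical policy: change group size by a fixed step, then cool down."""
    adjustment: int                      # e.g. +1 to scale up, -1 to scale down
    cooldown: float                      # seconds during which triggers are ignored
    last_fired: Optional[float] = None   # time of the last applied adjustment

    def apply(self, current_size, now, min_size=1, max_size=10):
        """Return the new group size, or the current one while cooling down."""
        if self.last_fired is not None and now - self.last_fired < self.cooldown:
            return current_size
        # Clamp to the (assumed) group bounds.
        new_size = max(min_size, min(max_size, current_size + self.adjustment))
        self.last_fired = now
        return new_size
```

With a 60-second cooldown, a second alarm arriving 30 seconds after the first leaves the group size unchanged, which keeps a still-breaching alarm from adding instances before the previous one has had time to take load.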
14. How it all hangs together
"CPUAlarmHigh": {
"Type": "OS::Metering::Alarm",
"Properties": {
"meter_name": "cpu_util", "threshold": "75",
"evaluation_periods": "5", "period": "60",
"statistic": "avg", "comparison_operator": "gt",
"description": "Scale-up if CPU > 75% for 300s",
"alarm_actions":[…"ScaleUpPolicy", "AlarmUrl"…],
"matching_metadata": {
"metadata.user_metadata.server_group":
"MyWebServerGroup"
}}}
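The `matching_metadata` entry above restricts the alarm to samples from instances in the named group, and the description's 300s is `evaluation_periods` times `period`. The Python sketch below shows one way such matching could work. It is an assumption-laden illustration: the helper names and the nested shape of `sample` are hypothetical, not Ceilometer's storage format.

```python
def flatten(nested, prefix=""):
    """Flatten nested dicts into dotted keys, e.g. {'a': {'b': 1}} -> {'a.b': 1}."""
    flat = {}
    for key, value in nested.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, dotted))
        else:
            flat[dotted] = value
    return flat


def matches(sample, matching_metadata):
    """True if every dotted key in matching_metadata equals the sample's value."""
    flat = flatten(sample)
    return all(flat.get(key) == wanted
               for key, wanted in matching_metadata.items())


def window_seconds(rule):
    """Length of the evaluation window: number of periods times period length."""
    return int(rule["evaluation_periods"]) * int(rule["period"])
```

For example, a sample carrying `{"metadata": {"user_metadata": {"server_group": "MyWebServerGroup"}}}` matches the rule shown on the slide, while a sample from another group does not.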
15. How it all hangs together
[Diagram: Heat Engine injects user metadata into the Instance in my_stack.]
16. How it all hangs together
[Diagram: Heat Engine injects user metadata into the Instance in my_stack and creates alarms through the Ceilometer API service.]
17. How it all hangs together
[Diagram: Heat Engine injects user metadata into the Instance in my_stack and creates alarms through the Ceilometer API service; the Ceilometer Compute Agent monitors instances.]
18. How it all hangs together
[Diagram: Heat Engine injects user metadata into the Instance in my_stack and creates alarms through the Ceilometer API service; the Compute Agent monitors instances; the Alarm evaluator triggers the alarm.]
19. How it all hangs together
[Diagram: Ceilometer (API, Compute, alarming) raises alarms to Heat Engine, which injects user metadata and scales out my_stack to three Instances.]
20. How it all hangs together
[Diagram: Ceilometer (API, Compute, alarming) raises alarms to Heat Engine, which injects user metadata and scales out my_stack to five Instances.]
21. How it all hangs together
[Diagram: inside Ceilometer, the Compute Agent reports samples from the Instance in my_stack to the Meter store, the API service provides alarm rules, and the Alarm evaluator queries stats; Heat Engine is shown alongside Ceilometer.]
22. Lessons learned
Keys to successful inter-project interactions:
● buy-in from stakeholders on both sides
● early validation and proof-points
● protect consuming project from churn during the development cycle
● split deliverables into bite-sized separately consumable chunks
23. Future directions
● expand metering coverage to also capture:
○ memory utilization %
○ LBaaS statistics
○ network & disk I/O rates
● add combination alarm support to Heat templates
○ allow thresholds over multiple metrics to be modeled
● exclude low-quality datapoints
○ avoid scaling when only outliers have reported metrics
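Two of the ideas above, combination alarms and excluding low-quality datapoints, can be sketched in Python. Both functions are hypothetical illustrations of the concepts, not a proposed Ceilometer design: the state strings, the `operator` argument, and the `min_fraction` cut-off based on the median sample count are assumptions.

```python
from statistics import median


def combine(states, operator="and"):
    """Combine the states of several underlying alarms into one state."""
    if operator not in ("and", "or"):
        raise ValueError("operator must be 'and' or 'or'")
    if not states:
        raise ValueError("at least one underlying alarm state is required")
    fired = [state == "alarm" for state in states]
    if operator == "or" and any(fired):
        return "alarm"
    if operator == "and" and all(fired):
        return "alarm"
    if "insufficient data" in states:
        return "insufficient data"
    return "ok"


def drop_sparse_periods(periods, min_fraction=0.5):
    """Keep (value, sample_count) pairs whose count is not far below typical.

    A period backed by very few samples is treated as low quality and
    excluded, so a lone outlier cannot drive a scaling decision by itself.
    """
    if not periods:
        return []
    typical = median(count for _, count in periods)
    return [(value, count) for value, count in periods
            if count >= min_fraction * typical]
```

For instance, `combine(["alarm", "ok"])` stays at "ok" because both thresholds must be breached under "and", whereas the same states combined with "or" yield "alarm".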
24. Future directions
● monitor baremetal via IPMI or SNMP
○ autoscale groups of hosts managed as Ironic instances
● constrain alarms for time-of-day or day-of-week
○ e.g. set the bar higher on weekends, lower on weekdays
● decouple autoscaling usage from Heat templates
● authenticate webhook calls with keystone trusts
○ avoid ec2-signer use without the keystone EC2 tokens extension
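The time-of-day and day-of-week constraint could look something like the Python sketch below. It only illustrates the idea on the slide: the function names and the choice to model a constraint as a half-open time window are assumptions, not an existing Ceilometer feature.

```python
from datetime import datetime, time


def threshold_for(when, weekday_threshold, weekend_threshold):
    """Pick a threshold by day of week (Saturday and Sunday count as weekend)."""
    return weekend_threshold if when.weekday() >= 5 else weekday_threshold


def within_window(when, start, end):
    """True if when's time of day falls in [start, end).

    A window whose start is later than its end wraps over midnight.
    """
    current = when.time()
    if start <= end:
        return start <= current < end
    return current >= start or current < end
```

Following the slide's example, calling `threshold_for` with a weekend threshold of 90 and a weekday threshold of 75 sets the bar higher on weekends; `within_window` could gate whether the alarm is evaluated at all.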
25. Further questions?
● Chat on Freenode:
○ #openstack-metering
○ #heat
● Mail the dev list:
○ openstack-dev@lists.openstack.org
● Harangue us via Launchpad:
○ https://launchpad.net/ceilometer/+filebug