If it's not metered, it's not cloud

Introduction to OpenStack Ceilometer metering

Rich Bowen - rbowen@redhat.com

@rbowen

@rdocommunity

http://openstack.redhat.com/

SLIDES ARE AT: http://tm3.org/ceilometer

Ceilometer

Measures the height of the clouds.

Ceilometer

Why?

Elastic is great, but if you don't know how far it's being stretched, you'll find out the hard way when it snaps.

Infinite resources are a nice illusion, but an illusion all the same.

Alarms trigger when a threshold is reached, and can take action based on that event.

Mission

The project aims to become the infrastructure to collect measurements within OpenStack so that no two agents would need to be written to collect the same data. Its primary targets are monitoring and metering, but the framework should be easily expandable to collect for other needs. To that effect, Ceilometer should be able to share collected data with a variety of consumers.

Definitions

Metering:

Measure and record what's happening

Monitoring:

Notify ("alarm") when one of the meters reaches a threshold

Monitoring

Alarms

Capacity planning vs. point-in-time monitoring

You'll need to think ahead about what you want to be notified about, since you only get notified about things you've set alarms for. That is, you don't know you need an alarm until the first time you need it.

API

Installation

There isn't time to go through manual installation here. It's on the wiki.

RDO

sudo yum install -y http://rdo.fedorapeople.org/rdo-release.rpm
sudo yum install -y openstack-packstack
packstack --allinone

This is not a product pitch. RDO is a community project sponsored by Red Hat, but isn't a product.

Configuration

< interval: 600
> interval: 60

Configuration

Database

Auth

curl -i 'http://192.168.0.177:5000/v2.0/tokens' \
  -X POST -H "Content-Type: application/json" \
  -H "Accept: application/json" \
  -d '{"auth": {"tenantName": "admin",
     "passwordCredentials": {"username":
     "admin", "password": "885d4b669cc04a1b"}}}'
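The token the API calls need has to be pulled out of the JSON that comes back from that request. A minimal Python sketch of the same request, assuming the Keystone v2.0 response layout (token at access.token.id); the function names are made up for illustration:

```python
import json
from urllib.request import Request


def build_token_request(auth_url, tenant, username, password):
    """Build the same POST the curl command sends (nothing is sent here)."""
    body = {
        "auth": {
            "tenantName": tenant,
            "passwordCredentials": {"username": username, "password": password},
        }
    }
    return Request(
        auth_url.rstrip("/") + "/tokens",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


def extract_token(response_body):
    """Return the token ID, assuming the v2.0 layout access.token.id."""
    return json.loads(response_body)["access"]["token"]["id"]
```

Pass the request to urllib.request.urlopen() to actually send it, then hand the response body to extract_token() and use the result as the X-Auth-Token header.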

API

Meters

curl -H 'X-Auth-Token: <inserttokenhere>' \
  "http://localhost:8777/v2/meters"
{
  'resource_id' => '06ac2e66591345e1a46f7a2193aeabf9',
  'user_id' => ${\$VAR1->[0]{'user_id'}},
  'type' => 'gauge',
  'meter_id' => 'MDZhYzJlNjY1OTEzNDVlMWE0NmY3YTIxOTNhZWFiZjkrc3RvcmFnZS5vYmplY3Rz',
  'name' => 'storage.objects',
  'project_id' => '06ac2e66591345e1a46f7a2193aeabf9',
  'source' => 'openstack',
  'unit' => 'object'
},
{
  'type' => 'gauge',
  'user_id' => ${\$VAR1->[0]{'user_id'}},
  'resource_id' => '06ac2e66591345e1a46f7a2193aeabf9',
  'name' => 'storage.objects.size',
  'meter_id' => 'MDZhYzJlNjY1OTEzNDVlMWE0NmY3YTIxOTNhZWFiZjkrc3RvcmFnZS5vYmplY3RzLnNpemU=',
  'project_id' => '06ac2e66591345e1a46f7a2193aeabf9',
  'source' => 'openstack',
  'unit' => 'B'
},
  ...
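The meter_id values in that listing look opaque, but they appear to be nothing more than Base64 of the resource ID, a plus sign, and the meter name. A small Python sketch that decodes them (the helper name is made up for illustration):

```python
import base64


def decode_meter_id(meter_id):
    """Split a Base64 meter_id into (resource_id, meter_name)."""
    decoded = base64.b64decode(meter_id).decode("utf-8")
    # Assumption: the first "+" separates the resource ID from the meter name.
    resource_id, _, name = decoded.partition("+")
    return resource_id, name


first = "MDZhYzJlNjY1OTEzNDVlMWE0NmY3YTIxOTNhZWFiZjkrc3RvcmFnZS5vYmplY3Rz"
print(decode_meter_id(first))
# ('06ac2e66591345e1a46f7a2193aeabf9', 'storage.objects')
```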

Or, from the CLI

Ceilometer CLI

<subcommand>
  alarm-combination-create
                      Create a new alarm based on state of other alarms.
  alarm-combination-update
                      Update an existing alarm based on state of other
                      alarms.
  alarm-create        Create a new alarm (Deprecated).
  alarm-delete        Delete an alarm.
  alarm-history       Display the change history of an alarm.
  alarm-list          List the user's alarms.
  alarm-show          Show an alarm.
  alarm-state-get     Get the state of an alarm.
  alarm-state-set     Set the state of an alarm.
  alarm-threshold-create
                      Create a new alarm based on computed statistics.

Ceilometer CLI, cont'd

alarm-threshold-update
                    Update an existing alarm based on computed statistics.
alarm-update        Update an existing alarm.
meter-list          List the user's meters.
resource-list       List the resources.
resource-show       Show the resource.
sample-create       Create a sample.
sample-list         List the samples for this meters.
statistics          List the statistics for this meter.
help                Display help about this program or one of its
                    subcommands.

Meter by name

What can I measure?

# ceilometer meter-list | grep cpu

+----------+------------+------+--------------------------------------+----------------------------------+----------------------------------+
| Name     | Type       | Unit | Resource ID                          | User ID                          | Project ID                       |
+----------+------------+------+--------------------------------------+----------------------------------+----------------------------------+
| cpu      | cumulative | ns   | a940742a-e5ea-4ca5-8b72-3d69a40bb90f | dc65900640f54c6fa79460e419b236b8 | 457eaeb167394dff9b436367d97d218f |
| cpu_util | gauge      | %    | a940742a-e5ea-4ca5-8b72-3d69a40bb90f | dc65900640f54c6fa79460e419b236b8 | 457eaeb167394dff9b436367d97d218f |
| vcpus    | gauge      | vcpu | 0efbc49a-3b8e-46f0-b1ed-4ae350660338 | dbf119d6fb5b491aaf31efd8e6561354 | 3f5a0852f30c464c8127f0c8a2baa57a |
| vcpus    | gauge      | vcpu | 36907e0e-0beb-46d8-854f-eb91c886a6dc | dbf119d6fb5b491aaf31efd8e6561354 | 3f5a0852f30c464c8127f0c8a2baa57a |
| vcpus    | gauge      | vcpu | d02ca358-c5b9-49f5-b912-3a30d1ce5514 | b1d175a25e254b6bbc70a72ad3fe63e0 | 9296ff92d66e452682c721fe2dafe0f5 |
| vcpus    | gauge      | vcpu | e21b3a26-7f11-4d55-8886-e5a9cca16316 | f6da3a210df3406e929683cc07358d2c | 876c28a7f2ea4e59964b5e06e0f32972 |

Statistics

Any samples about tiny instances, in June.

GET /v2/meters/instance:m1.tiny
q: [{"field": "timestamp",
   "op": "ge",
   "value": "2013-06-01T00:00:00"},
   {"field": "timestamp",
   "op": "lt",
   "value": "2013-07-01T00:00:00"},
   {"field": "project_id",
   "op": "eq",
   "value": "8d6057bc-5b90-4296-afe0-84acaa2ef909"}]

Aggregated statistics for those samples

GET /v2/meters/instance:m1.tiny/statistics
q: [{"field": "timestamp",
   "op": "ge",
   "value": "2013-06-01T00:00:00"},
   {"field": "timestamp",
   "op": "lt",
   "value": "2013-07-01T00:00:00"},
   {"field": "project_id",
   "op": "eq",
   "value": "8d6057bc-5b90-4296-afe0-84acaa2ef909"}]

CPU utilization on this resource in June.

GET /v2/meters/cpu_util/statistics
q: [{"field": "timestamp",
    "op": "ge",
    "value": "2013-06-01T00:00:00"},
    {"field": "timestamp",
    "op": "lt",
    "value": "2013-07-01T00:00:00"},
    {"field": "resource_id",
    "op": "eq",
    "value": "64da755c-9120-4236-bee1-54acafe24980"}]

10-minute aggregates

GET /v2/meters/cpu_util/statistics
q: [{"field": "timestamp",
   "op": "ge",
   "value": "2013-06-01T00:00:00"},
   {"field": "timestamp",
   "op": "lt",
   "value": "2013-07-01T00:00:00"},
   {"field": "resource_id",
   "op": "eq",
   "value": "64da755c-9120-4236-bee1-54acafe24980"}]
period: 600
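The period argument asks for one set of statistics per time bucket instead of one for the whole range. A rough Python sketch of that grouping, as a simplified model and not Ceilometer's own code (the function name and the sample values in the usage note are made up):

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_by_period(samples, period, start):
    """Group (timestamp, volume) samples into buckets of `period` seconds."""
    buckets = defaultdict(list)
    for timestamp, volume in samples:
        index = int((timestamp - start).total_seconds() // period)
        buckets[index].append(volume)
    stats = []
    for index in sorted(buckets):
        volumes = buckets[index]
        stats.append({
            "period_start": start + timedelta(seconds=index * period),
            "count": len(volumes),
            "min": min(volumes),
            "max": max(volumes),
            "sum": sum(volumes),
            "avg": sum(volumes) / len(volumes),
        })
    return stats
```

With period=600, samples taken at 00:00 and 00:05 land in the first bucket and a sample at 00:10 starts the second.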

Syntax

If you're not familiar with the syntax:

GET /v2/meters/cpu_util/statistics
q: [{"field": "timestamp",
   "op": "ge",
   "value": "2013-06-01T00:00:00"},
   {"field": "timestamp",
   "op": "lt",
   "value": "2013-07-01T00:00:00"},
   {"field": "resource_id",
   "op": "eq",
   "value": "64da755c-9120-4236-bee1-54acafe24980"}]
period: 600

So, a URL might look like:

http://192.168.0.177:8777/v2/meters/memory/statistics
    ?q.field=resource_id&q.op=eq&q.value=
    36907e0e-0beb-46d8-854f-eb91c886a6dc
    &q.field=timestamp&q.op=ge
    &q.value=2011-06-01T00:00:00
    &q.field=timestamp&q.op=lt
    &q.value=2014-07-01T00:00:00
    &period=600
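Building that query string by hand gets tedious quickly. A minimal Python sketch that turns the q list into the repeated q.field / q.op / q.value parameters, using only the standard library (the function name is made up for illustration):

```python
from urllib.parse import urlencode


def build_statistics_url(base, meter, filters, period=None):
    """Build a /v2/meters/<meter>/statistics URL from a list of filters."""
    params = []
    for f in filters:
        # Each filter becomes three parameters, in the same order as the q list.
        params.append(("q.field", f["field"]))
        params.append(("q.op", f["op"]))
        params.append(("q.value", f["value"]))
    if period is not None:
        params.append(("period", str(period)))
    # safe=":" leaves the colons in timestamps unescaped, as in the slide.
    return "%s/v2/meters/%s/statistics?%s" % (base, meter, urlencode(params, safe=":"))


url = build_statistics_url(
    "http://192.168.0.177:8777",
    "memory",
    [
        {"field": "resource_id", "op": "eq",
         "value": "36907e0e-0beb-46d8-854f-eb91c886a6dc"},
        {"field": "timestamp", "op": "ge", "value": "2011-06-01T00:00:00"},
        {"field": "timestamp", "op": "lt", "value": "2014-07-01T00:00:00"},
    ],
    period=600,
)
print(url)
```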

Stats output

Output will vary depending on what you're querying:

'duration_start' => '2013-11-19T20:43:50.013000',
'duration_end' => '2013-11-25T17:10:40.135000',
'duration' => '505610.122',
'period_start' => '2013-11-19T20:43:50.013000',
'period_end' => '2013-11-19T20:43:50.013000',
'period' => 0,
'groupby' => undef,
'unit' => 'MB',
'count' => 8,
'sum' => '4096',
'avg' => '512',
'min' => '512',
'max' => '512',
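The fields in that output are related in a simple way: duration is the number of seconds between duration_start and duration_end, and avg is sum divided by count. A quick Python check against the numbers shown (the helper name is made up for illustration):

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S.%f"


def check_statistics(stats):
    """Recompute duration (seconds) and avg from the other fields."""
    start = datetime.strptime(stats["duration_start"], FMT)
    end = datetime.strptime(stats["duration_end"], FMT)
    duration = (end - start).total_seconds()
    avg = float(stats["sum"]) / stats["count"]
    return duration, avg


duration, avg = check_statistics({
    "duration_start": "2013-11-19T20:43:50.013000",
    "duration_end": "2013-11-25T17:10:40.135000",
    "count": 8,
    "sum": "4096",
})
print(duration, avg)  # about 505610.122 seconds, and 512.0
```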

Stats

# ceilometer sample-list -m cpu
+--------------------------------------+------+------------+---------------+------+---------------------+
| Resource ID                          | Name | Type       | Volume        | Unit | Timestamp           |
+--------------------------------------+------+------------+---------------+------+---------------------+
| a940742a-e5ea-4ca5-8b72-3d69a40bb90f | cpu  | cumulative | 71250000000.0 | ns   | 2014-01-22T16:00:05 |
| a940742a-e5ea-4ca5-8b72-3d69a40bb90f | cpu  | cumulative | 65300000000.0 | ns   | 2014-01-22T15:59:05 |
| a940742a-e5ea-4ca5-8b72-3d69a40bb90f | cpu  | cumulative | 59390000000.0 | ns   | 2014-01-22T15:58:05 |
| a940742a-e5ea-4ca5-8b72-3d69a40bb90f | cpu  | cumulative | 52990000000.0 | ns   | 2014-01-22T15:57:06 |
| a940742a-e5ea-4ca5-8b72-3d69a40bb90f | cpu  | cumulative | 13430000000.0 | ns   | 2014-01-22T15:50:40 |
+--------------------------------------+------+------------+---------------+------+---------------------+

Or, from the API ...

$ ./statistics -m \
    YTk0MDc0MmEtZTVlYS00Y2E1LThiNzItM2Q2OWE0MGJiOTBmK2NwdQ==

Fetching http://192.168.0.177:8777/v2/meters/cpu/statistics
?q.field=resource_id&q.op=eq
&q.value=a940742a-e5ea-4ca5-8b72-3d69a40bb90f

(That was just a shell script that made that curl request.)

'unit' => 'ns',
'period' => 0,
'groupby' => undef,
'period_start' => '2014-01-22T15:50:40',
'period_end' => '2014-01-22T15:50:40',
'duration_start' => '2014-01-22T15:50:40',
'duration_end' => '2014-01-22T16:03:05',
'duration' => '745',
'count' => 8,
'sum' => '512140000000',
'min' => '13430000000',
'max' => '89030000000',
'avg' => '64017500000'

Billing

Units

Type

Each meter also has a type:

Cumulative:

Increasing over time (instance hours, CPU usage)

Gauge:

Discrete items (floating IPs, image uploads) and fluctuating values (disk I/O)

Delta:

Changing over time (e.g., change in number of routers)
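The type matters when you read the numbers. A cumulative meter such as cpu only ever grows, so the interesting figure is the difference between two samples divided by the time between them. A Python sketch using two rows from the cpu sample listing earlier; the single-vCPU default is an assumption, and the function name is made up:

```python
from datetime import datetime

FMT = "%Y-%m-%dT%H:%M:%S"


def cpu_util_percent(earlier, later, vcpus=1):
    """Turn two cumulative CPU-time samples (timestamp, ns) into a percentage."""
    (t0, ns0), (t1, ns1) = earlier, later
    elapsed_ns = (t1 - t0).total_seconds() * 1e9
    # CPU time consumed, as a share of the CPU time that was available.
    return 100.0 * (ns1 - ns0) / (elapsed_ns * vcpus)


earlier = (datetime.strptime("2014-01-22T15:59:05", FMT), 65300000000.0)
later = (datetime.strptime("2014-01-22T16:00:05", FMT), 71250000000.0)
print(cpu_util_percent(earlier, later))
```

For those two samples that works out to about 9.9% of one vCPU: 5.95 seconds of CPU time in a 60-second window.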

Range

You can specify a range of time using the timestamp arguments in the query:

# ceilometer statistics -m cpu -q \
"timestamp>2014-01-22T16:05:00;timestamp<2014-01-22T17:05:00"

Output is intended to be human-readable rather than machine-readable, so the API may be better suited to automated reporting.

+--------+---------------------+---------------------+-------+----------------+----------------+--------------+----------------+----------+
| Period | Period Start        | Period End          | Count | Min            | Max            | Sum          | Avg            | Duration |
+--------+---------------------+---------------------+-------+----------------+----------------+--------------+----------------+----------+
| 0      | 2014-01-22T17:04:05 | 2014-01-22T17:04:05 | 60    | 101730000000.0 | 467430000000.0 | 1.700835e+13 | 283472500000.0 | 3540.0   |
+--------+---------------------+---------------------+-------+----------------+----------------+--------------+----------------+----------+
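The -q string passed to the CLI in the range query carries the same information as the q list the API takes. A rough Python sketch of that mapping; the operator names match the ones used in the API examples, but the parser itself is an illustration and not the client's actual code:

```python
import re

# Map CLI comparison symbols to the API's "op" names.
OPS = {"<=": "le", ">=": "ge", "!=": "ne", "<": "lt", ">": "gt", "=": "eq"}
TERM = re.compile(r"^([^<>=!]+)(<=|>=|!=|<|>|=)(.+)$")


def parse_query(query):
    """Turn 'field<op>value;field<op>value' into a list of API filters."""
    filters = []
    for term in query.split(";"):
        match = TERM.match(term.strip())
        if match is None:
            raise ValueError("cannot parse query term: %r" % term)
        field, symbol, value = match.groups()
        filters.append({"field": field, "op": OPS[symbol], "value": value})
    return filters


print(parse_query("timestamp>2014-01-22T16:05:00;timestamp<2014-01-22T17:05:00"))
```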

Alarms

ceilometer alarm-threshold-create \
 --name high_cpu_alarm \
 --description 'instance running hot'  \
 --meter-name cpu_util  --threshold 75.00 \
 --comparison-operator gt  --statistic avg \
 --period 60 --evaluation-periods 1 \
 --alarm-action 'log://' \
 --query resource_id=$INSTANCE_ID
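Roughly, those flags say: take the avg of cpu_util over each 60-second period, and go to alarm when the last 1 period is gt 75. A simplified Python model of that decision, not Ceilometer's evaluator code (the function name and the keep-current-state rule for mixed results are assumptions):

```python
import operator

COMPARATORS = {
    "gt": operator.gt, "ge": operator.ge,
    "lt": operator.lt, "le": operator.le,
    "eq": operator.eq, "ne": operator.ne,
}


def evaluate(period_values, threshold, comparison="gt",
             evaluation_periods=1, current_state="insufficient data"):
    """Decide an alarm state from the per-period statistic values."""
    recent = period_values[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "insufficient data"
    compare = COMPARATORS[comparison]
    breaches = [compare(value, threshold) for value in recent]
    if all(breaches):
        return "alarm"
    if not any(breaches):
        return "ok"
    # Mixed results: assume the alarm stays in whatever state it was in.
    return current_state
```

Raising --evaluation-periods means the condition has to hold for that many periods in a row before the alarm fires, which filters out short spikes.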

Trigger

--meter-name cpu_util  --threshold 75.00 \
--comparison-operator gt  --statistic avg \
--period 60 --evaluation-periods 1 \

[root@localhost ~(keystone_admin)]# ceilometer alarm-list
+--------------------------------------+----------------+-------------------+---------+------------+-------------------------------+
| Alarm ID                             | Name           | State             | Enabled | Continuous | Alarm condition               |
+--------------------------------------+----------------+-------------------+---------+------------+-------------------------------+
| bea23f44-16e0-4faf-9d85-e04d8c3bc13d | high_cpu_alarm | insufficient data | True    | False      | cpu_util > 5.0 during 1 x 60s |
+--------------------------------------+----------------+-------------------+---------+------------+-------------------------------+

Then

# tail /var/log/ceilometer/alarm-notifier.log
...
2014-01-22 13:29:15.989 16226 INFO
ceilometer.alarm.notifier.log
[req-ecbc19b0-d9bb-4f6e-9a2a-5e3a519f112c None None]
Notifying alarm bea23f44-16e0-4faf-9d85-e04d8c3bc13d
from insufficient data to alarm with action
SplitResult(scheme=u'log', netloc=u'', path=u'',
query='', fragment='') because Transition to alarm
due to 1 samples outside threshold, most recent: 10.2

Actions

--alarm-action 'http://localhost/alarm'
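An http:// action needs something listening at that URL. A bare-bones Python receiver, as a sketch only: it stores whatever JSON body gets POSTed and makes no assumption about which fields the notification contains. The names and port 8080 are made up for illustration; the slide's URL implies port 80.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # notifications seen so far, kept in memory for the demo


def record_notification(path, body):
    """Parse a POSTed body (bytes) and remember it; non-JSON is kept raw."""
    try:
        payload = json.loads(body.decode("utf-8")) if body else {}
    except ValueError:
        payload = {"raw": body.decode("utf-8", "replace")}
    received.append((path, payload))
    return payload


class AlarmHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        record_notification(self.path, self.rfile.read(length))
        self.send_response(200)
        self.end_headers()


def serve(host="127.0.0.1", port=8080):
    """Block forever, accepting alarm notifications."""
    HTTPServer((host, port), AlarmHandler).serve_forever()
```

Call serve() and point --alarm-action at http://localhost:8080/alarm to watch state changes arrive.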

Heat

And so much more

Would be nice ...

Ceilometer, Nagios, whatever

More Info

Finis

Rich Bowen

rbowen@redhat.com

@rbowen

@rdocommunity

http://tm3.org/ceilometer
