Recovering from a cinder RBD host failure

OpenStack Havana Cinder volumes associated with an RBD Ceph pool are bound to a cinder-volume host.

$ cinder service-list --host
|     Binary    |          Host         | Zone |  Status | State |
| cinder-volume | | ovh  | enabled |   up  |

A volume created on this host is permanently associated with it:

$ mysql -e "select host from volumes where deleted = 0 and display_name = ''" cinder
| host                  |
| |

If the host fails, any attempt to detach the volume fails because cinder-api cannot reach the cinder-volume service on that host and the RPC call times out:

2014-05-04 17:50:59.928 15128 TRACE cinder.api.middleware.fault Timeout: Timeout while
   waiting on RPC response - topic: "",
   RPC method: "terminate_connection" info: ""

The failed cinder host is first disabled so the scheduler will no longer try to access it:

$ cinder service-disable cinder-volume

The host column of the volume is then updated in the database to point to another cinder-volume host configured with access to the same Ceph pool:

$ mysql -e "update volumes set host = '' \
   where deleted = 0 and display_name = ''" cinder
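
When the failed host owned many volumes, updating them one display_name at a time gets tedious. The following is a minimal sketch, not a tested recovery tool, of the same update applied to every non-deleted volume bound to the failed host. It is a variation on the query above: it selects on the host column instead of display_name. To stay self-contained it runs against an in-memory sqlite3 table that only mimics the three columns used in this post; a real cinder database is MySQL, where the driver would typically expect %s placeholders instead of ?. The function name reassign_volumes and the hostnames failed.example.com, spare.example.com and other.example.com are made up for the illustration.

```python
import sqlite3


def reassign_volumes(conn, failed_host, new_host):
    """Point every non-deleted volume bound to failed_host at new_host.

    Returns the number of rows that were updated.
    """
    cur = conn.execute(
        "UPDATE volumes SET host = ? WHERE deleted = 0 AND host = ?",
        (new_host, failed_host),
    )
    conn.commit()
    return cur.rowcount


def demo():
    # In-memory stand-in for the cinder database, reduced to the three
    # columns used in the queries of this post (an assumption: the real
    # volumes table has many more columns).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE volumes (display_name TEXT, host TEXT, deleted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO volumes VALUES (?, ?, ?)",
        [
            ("vol-a", "failed.example.com", 0),
            ("vol-b", "failed.example.com", 0),
            ("vol-old", "failed.example.com", 1),  # already deleted, left alone
            ("vol-c", "other.example.com", 0),     # different host, left alone
        ],
    )
    moved = reassign_volumes(conn, "failed.example.com", "spare.example.com")
    rows = conn.execute(
        "SELECT display_name, host FROM volumes ORDER BY display_name"
    ).fetchall()
    return moved, rows


if __name__ == "__main__":
    moved, rows = demo()
    print("volumes reassigned:", moved)
    for name, host in rows:
        print(name, host)
```

Running a SELECT with the same WHERE clause first is a cheap way to check which volumes would be touched before committing the update.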

3 Responses to Recovering from a cinder RBD host failure

  1. You could cheat and set the cinder agent name in cinder.conf (host =) to a static string, shared amongst all your cinder hosts.

  2. Just set:

    host = volume

    in the cinder.conf file of every volume agent, and you should be good to go. Please note that you still have to manually update the database to use the new name for existing volumes (only once, though).
    That’s what Bloomberg’s cookbook does, for example:

    Thank you Jordan for the tip ;-)
