Temporarily disable Ceph scrubbing to resolve high IO load

In a Ceph cluster with low bandwidth, the root disk of an OpenStack instance became extremely slow during days.

When an OSD is scrubbing a placement group, it has a significant impact on performances and this is expected, for a short while. In this case, however it slowed down to the point where the OSD was marked down because it did not reply in time:

2014-07-30 06:43:27.331776 7fcd69ccc700  1
   mon.bm0015@0(leader).osd e287968
   we have enough reports/reporters to mark osd.12 down

To get out of this situation, both scrub and deep scrub were deactivated with:

root@bm0015:~# ceph osd set noscrub
set noscrub
root@bm0015:~# ceph osd set nodeep-scrub
set nodeep-scrub

After a day, as the IO load remained stable confirming that no other factor was causing it, scrubbing was re-activated. The context causing the excessive IO load was changed and it did not repeat itself after another 24 hours, although scrubbing was confirmed to resume when examining the logs on the same machine:

2014-07-31 15:29:54.783491 7ffa77d68700  0 log [INF] : 7.19 deep-scrub ok
2014-07-31 15:29:57.935632 7ffa77d68700  0 log [INF] : 3.5f deep-scrub ok
2014-07-31 15:37:23.553460 7ffa77d68700  0 log [INF] : 7.1c deep-scrub ok
2014-07-31 15:37:39.344618 7ffa77d68700  0 log [INF] : 3.22 deep-scrub ok
2014-08-01 03:25:05.247201 7ffa77d68700  0 log [INF] : 3.46 deep-scrub ok
This entry was posted in ceph. Bookmark the permalink.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>