Author Archives: Loic Dachary

An algorithm to fix uneven CRUSH distributions in Ceph

The current CRUSH implementation in Ceph does not always provide an even distribution. The most common cause of unevenness is when only a few thousand PGs, or fewer, are mapped. This does not provide enough samples and the variations can be … Continue reading
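To see why a small sample drifts, here is a toy Python sketch (a hash-based placement stand-in, not the post's algorithm or CRUSH itself): the fewer PGs are mapped, the further the worst OSD ends up from its fair share.

import hashlib

def map_pg(pg, num_osds):
    # Hypothetical stand-in for CRUSH: a stable hash of the PG number.
    return int(hashlib.sha256(str(pg).encode()).hexdigest(), 16) % num_osds

def worst_deviation(num_pgs, num_osds=10):
    counts = [0] * num_osds
    for pg in range(num_pgs):
        counts[map_pg(pg, num_osds)] += 1
    expected = num_pgs / num_osds
    return max(abs(c - expected) / expected for c in counts)

for num_pgs in (1024, 10240, 102400):
    print("%6d PGs -> worst OSD is %.1f%% away from its fair share"
          % (num_pgs, 100 * worst_deviation(num_pgs)))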

Posted in ceph, crush, libcrush | Leave a comment

Ceph space lost due to overweight CRUSH items

When a CRUSH bucket contains five Ceph OSDs with the following weights (osd.0: 5, osd.1: 1, osd.2: 1, osd.3: 1, osd.4: 1), 20% of the space in osd.0 will never be used by a pool with two replicas. The … Continue reading
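The 20% figure can be checked with a quick back-of-the-envelope computation; the Python sketch below assumes exactly two replicas and the weights quoted above.

# With two replicas, every byte stored on osd.0 has its second copy on one of
# the other OSDs, so osd.0 can never hold more data than the others combined.
weights = {'osd.0': 5, 'osd.1': 1, 'osd.2': 1, 'osd.3': 1, 'osd.4': 1}

biggest = max(weights, key=weights.get)
others = sum(w for name, w in weights.items() if name != biggest)
usable = min(weights[biggest], others)
lost = 1 - usable / weights[biggest]
print("%s: %.0f%% of its space can never be used" % (biggest, 100 * lost))  # -> 20%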

Posted in ceph, crush | Leave a comment

Understanding liquid democracy

I have a hard time explaining the idea of liquid democracy, and it is not for lack of trying. Perhaps the recursive nature of vote delegation does not come naturally to non-computer-scientists. On the occasion of the period between the two rounds of the … Continue reading

Posted in Liquid Democracy | Leave a comment

Ceph full ratio and uneven CRUSH distributions

A common CRUSH rule in Ceph is step chooseleaf firstn 0 type host, meaning Placement Groups (PGs) will place their replicas on different hosts so the cluster can sustain the failure of any host without losing data. The missing replicas are … Continue reading
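One way to picture how the full ratio interacts with an uneven distribution, as a Python sketch (the PG counts are made up, not taken from the post): the cluster is effectively full as soon as the busiest OSD reaches the full ratio.

full_ratio = 0.95
pgs_per_osd = [98, 102, 95, 110, 95]   # hypothetical counts on equal-sized OSDs

total_pgs = sum(pgs_per_osd)
num_osds = len(pgs_per_osd)
# Data spreads proportionally to PG counts, so the busiest OSD fills first.
usable_fraction = full_ratio * total_pgs / (max(pgs_per_osd) * num_osds)
print("usable fraction of the raw capacity: %.1f%%" % (100 * usable_fraction))
# about 86% here, instead of the 95% a perfectly even distribution would allow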

Posted in ceph, crush, libcrush | Leave a comment

Improving PGs distribution with CRUSH weight sets

In a Ceph cluster with a single pool of 1024 Placement Groups (PGs), the PG distribution among devices will not be as expected (see Predicting Ceph PG placement for details about this uneven distribution). In the following, the difference between … Continue reading
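The kind of difference being measured can be sketched in a few lines of Python; the PG counts below are hypothetical and the weights uniform.

weights = {'osd.0': 1.0, 'osd.1': 1.0, 'osd.2': 1.0, 'osd.3': 1.0}
pg_counts = {'osd.0': 260, 'osd.1': 270, 'osd.2': 240, 'osd.3': 254}

total_weight = sum(weights.values())
total_pgs = sum(pg_counts.values())
for osd in sorted(weights):
    expected = total_pgs * weights[osd] / total_weight
    delta = 100 * (pg_counts[osd] - expected) / expected
    print("%s expected %.1f PGs, has %d (%+.1f%%)"
          % (osd, expected, pg_counts[osd], delta))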

Posted in ceph, crush, libcrush | Leave a comment

Faster Ceph CRUSH computation with smaller buckets

The CRUSH function maps Ceph placement groups (PGs) and objects to OSDs. It is used extensively in Ceph clients and daemons, as well as in the Linux kernel modules, and its CPU cost should be kept to a minimum. It … Continue reading
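A rough cost model (not actual CRUSH code) shows why smaller buckets help: with straw2 buckets, choosing an item draws a value for every item in the bucket, so nesting OSDs under intermediate buckets reduces the work per mapping. The 32x32 layout below is an arbitrary example.

num_osds = 1024

flat = num_osds                 # a single bucket holding every OSD
hosts, per_host = 32, 32        # hypothetical layout: 32 hosts with 32 OSDs each
nested = hosts + per_host       # pick a host first, then an OSD inside it

print("draws per replica, flat bucket:    %d" % flat)    # 1024
print("draws per replica, nested buckets: %d" % nested)  # 64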

Posted in ceph, libcrush | 1 Comment

Predicting Ceph PG placement

When creating a new Ceph pool, deciding on the number of PGs requires some thinking to ensure there are a few hundred PGs per OSD. The distribution can be verified with crush analyze as follows: $ crush analyze --rule data … Continue reading
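The rule of thumb behind "a few hundred PGs per OSD" boils down to a one-line computation; the pool parameters below are made up for illustration.

pg_num = 2048       # PGs in the pool
replicas = 3        # replication count of the pool
num_osds = 30       # OSDs the rule can choose from

pgs_per_osd = pg_num * replicas / num_osds
print("about %.0f PGs per OSD" % pgs_per_osd)   # roughly 205 here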

Posted in ceph | 2 Comments

How many objects will move when changing a crushmap?

After a crushmap is changed (e.g. addition/removal of devices, modification of weights or tunables), objects may move from one device to another. The crush compare command can be used to show what would happen for a given rule and replication … Continue reading
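A simplified version of what such a comparison does, as a Python sketch: map every PG with the old and the new layout and count how many land elsewhere. The placement function is a hypothetical stand-in, not CRUSH, and a naive modulo hash like this one moves far more data than CRUSH would.

import hashlib

def mapping(pg, osds):
    # Hypothetical placement function, not CRUSH.
    h = int(hashlib.sha256(str(pg).encode()).hexdigest(), 16)
    return osds[h % len(osds)]

old_osds = ['osd.0', 'osd.1', 'osd.2', 'osd.3']
new_osds = old_osds + ['osd.4']     # e.g. one device added
num_pgs = 1024

moved = sum(1 for pg in range(num_pgs)
            if mapping(pg, old_osds) != mapping(pg, new_osds))
print("%.1f%% of the PGs (and their objects) would move" % (100 * moved / num_pgs))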

Posted in ceph | Leave a comment

Predicting which Ceph OSD will fill up first

When a device is added to Ceph, it is assigned a weight that reflects its capacity. For instance, if osd.1 is a 1TB disk, its weight will be 1.0, and if osd.2 is a 4TB disk, its weight will be … Continue reading
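The prediction amounts to comparing each device's expected share of the data with its capacity; the Python sketch below uses hypothetical PG counts.

osds = {
    # name: (capacity in TB, i.e. CRUSH weight, expected PG count)
    'osd.1': (1.0, 110),
    'osd.2': (4.0, 400),
    'osd.3': (4.0, 450),
}

def fill_pressure(item):
    capacity, pgs = item[1]
    return pgs / capacity   # data per unit of capacity

first_full = max(osds.items(), key=fill_pressure)[0]
print("first OSD expected to reach the full ratio:", first_full)   # osd.3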

Posted in ceph | Leave a comment

Logging udev events at boot time

Adapted from Peter Rajnoha's post: create a special systemd unit to monitor udev during boot:

cat > /etc/systemd/system/systemd-udev-monitor.service <<EOF
[Unit]
Description=udev Monitoring
DefaultDependencies=no
Wants=systemd-udevd.service
After=systemd-udevd-control.socket systemd-udevd-kernel.socket
Before=sysinit.target systemd-udev-trigger.service
[Service]
Type=simple
ExecStart=/usr/bin/sh -c "/usr/sbin/udevadm monitor --udev --env > /udev_monitor.log"
[Install]
WantedBy=sysinit.target … Continue reading

Posted in Uncategorized | Leave a comment