Welcome to the Free Software contributions diary of Loïc Dachary. Although the posts look like blog entries, they really are technical reports about the work done during the day. They are meant to be used as a reference by co-developers and managers.
The jerasure library is the default erasure code plugin of Ceph. The gf-complete companion library supports SSE optimizations at compile time, when the compiler provides them (-msse4.2 etc.). The jerasure (and gf-complete with it) plugin is compiled multiple times with various levels of SSE features:
- jerasure_sse4 uses SSE4.2, SSE4.1, SSSE3, SSE3, SSE2, SSE
- jerasure_sse3 uses SSSE3, SSE3, SSE2, SSE
- jerasure_generic uses no SSE instructions
When an OSD loads the jerasure plugin, the CPU features are probed and the appropriate plugin is selected depending on their availability.
The gf-complete source code is cleanly divided into functions that take advantage of specific SSE features. It should be easy to use the ifunc attribute to semi-manually select each function individually, at runtime and without performance penalty (because the choice is made the first time the function is called and recorded for later calls). With such fine grained selection, there would be no need to compile three plugins because each function would be compiled with exactly the set of flags it needs.
The Ceph erasure code plugin must run on Intel CPUs that have no SSE4.2 support. A qemu VM is run without SSE4.2 support:
qemu-system-x86_64 -machine accel=kvm:tcg -m 2048 \
-drive file=server.img -boot c \
-display sdl \
-net nic -net user,hostfwd=tcp::2222-:22 \
-fsdev local,security_model=passthrough,id=fsdev0,path=~/ceph \
-device virtio-9p-pci,fsdev=fsdev0,mount_tag=hostshare
The qemu CPU has no SSE4.2 although the native CPU has it:
$ grep sse4.2 /proc/cpuinfo | wc -l
$ ssh -p 2222 firstname.lastname@example.org grep sse4.2 /proc/cpuinfo | wc -l
The local development directory is a Plan 9 folder shared over Virtio mounted inside the VM:
sudo mount -t 9p -o trans=virtio,version=9p2000.L hostshare /home/loic/ceph
and the functional test is run to assert that encoding and decoding an object succeed:
$ cd /home/loic/ceph/src
[----------] Global test environment tear-down
[==========] 16 tests from 8 test cases ran. (30 ms total)
[ PASSED ] 16 tests.
$ rbd create --size $((1024 * 1024 * 1024 * 1024)) tiny
$ rbd info tiny
rbd image 'tiny':
size 1024 PB in 274877906944 objects
order 22 (4096 kB objects)
Note: rbd rm tiny will take a long time.
Gource is run on the Ceph git repository for each of the 192 developers who contributed to its development over the past six years. Their footprint is the last image of a video clip created from all the commits they authored.
Posted in ceph, git, gource
The gf-complete and jerasure libraries implement the erasure code functions used in Ceph. They were copied into Ceph in 2013 because there were no reference repositories at the time. The copy was removed from the Ceph repository and replaced by git submodules to decouple the release cycles.
The AMT of an ASRock Q87M motherboard is configured to enable remote power control (power cycle) and display of the BIOS and the console. It is a cheap alternative to iLO or IPMI that can be used with Free Software. AMT is a feature of vPro that was available in 2011 with some Sandy Bridge chipsets. It is included in many of the more recent Haswell chipsets.
The following is a screenshot of vinagre connected to the AMT VNC server displaying the BIOS of the ASRock Q87M motherboard.
Erasure code also means RAID5, which lets you lose a hard disk without losing your data. From the user's point of view, the concept is simple and useful, but for the person in charge of designing the software that does the work, it is a headache. Three-disk RAID5 enclosures can be found in any shop: when one of the disks stops working, you replace it and the files are still there. One could imagine the same thing with six disks, two of which fail simultaneously. But no: instead of relying on a XOR operation that can be learned in five minutes, it takes Galois fields, a solid mathematical background and a lot of computation. To make matters worse, in a distributed storage system such as Ceph, disks are often temporarily disconnected because of network unavailability.
The Ceph erasure code plugin benchmarks for jerasure version 1 are compared with those obtained after an upgrade to jerasure version 2, using the same command on the same hardware.
- Encoding: 5.2GB/s which is ~20% better than 4.2GB/s
- Decoding: no processing necessary (because the code is systematic)
- Recovering the loss of one OSD: 11.3GB/s which is ~13% better than 10GB/s
- Recovering the loss of two OSDs: 4.42GB/s which is ~38% better than 3.2GB/s
The relevant lines from the full output of the benchmark are:
seconds    KB       plugin    k  m  workload  iterations  size     erasures
0.088136   1048576  jerasure  6  2  decode    1024        1048576  1
0.226118   1048576  jerasure  6  2  decode    1024        1048576  2
0.191825   1048576  jerasure  6  2  encode    1024        1048576  0
The improvements are likely to be greater for larger K+M values.
The OpenStack Foundation is delivering a training program to accelerate the speed at which new OpenStack developers are successful at integrating their own roadmap into that of the OpenStack project. If you're a new OpenStack contributor or plan on becoming one soon, you should sign up for the next OpenStack Upstream Training in Atlanta, May 10-11. Participation is also strongly advised for first-time attendees of the OpenStack Design Summit.