Welcome to the Free Software contributions diary of Loïc Dachary. Although the posts look like blog entries, they really are technical reports about the work done during the day. They are meant to be used as a reference by co-developers and managers. Erasure Code Patents StreamScale.

Using a cloud image with kvm

It would be convenient to have a virt-builder oneliner such as

$ virt-builder --arch i386 --ssh-inject ~/.ssh/id_rsa.pub fedora-21

to get an image suitable to run and login with

$ qemu-kvm -m 1024 -net user,hostfwd=tcp::2222-:22 \
  -drive file=fedora-21.qcow2 &
$ ssh -p 2222 localhost grep PRETTY /etc/os-release
PRETTY_NAME="Fedora 21 (Twenty One)"

Docker users have a simpler form because there is no need to ssh to enter the container:

$ docker run fedora:21 grep PRETTY /etc/os-release
PRETTY_NAME="Fedora 21 (Twenty One)"

It is not currently possible to use virt-builder as described above because

  • the set of images available by default is limited (no i386 architecture for instance)
  • the –inject-ssh option is only available in the development version

The libguestfs.org toolbox can however be used to implement a script modifying images prepared for the cloud (see ubuntu cloud images for instance):

  • wget the image
    wget -O my.img http://cloud-images.ubuntu.com/trusty/current/trusty-server-cloudimg-i386-disk1.img
    
  • create a config-drive for cloud-init to feed it the ssh public key.
    mkdir -p config-drive
    cat > config-drive/user-data <<EOF
    #cloud-config
    ssh_authorized_keys:
     - $(cat ~/.ssh/id_rsa.pub)
    chpasswd: { expire: False }
    EOF
    cat > config-drive/meta-data <<EOF
    instance-id: iid-123459
    local-hostname: testhost
    EOF
    ( cd config-drive ; LIBGUESTFS_BACKEND=direct virt-make-fs \
      --type=msdos --label=cidata .  ../config-drive.img )
    
  • launch the image with the config drive attached and it will be auto detected
    qemu-kvm -m 1024 -net user,hostfwd=tcp::2222-:22 \
      -drive file=my.img -drive config-drive.img
    

Continue reading

Posted in Uncategorized | 2 Comments

Ceph make check in a ram disk

When running tests from the Ceph sources, the disk is used intensively and a ram disk can be used to reduce the latency. The kernel must be rebooted to set the ramdisk maximum size to 16GB. For instance on Ubuntu 14.04 in /etc/default/grub (the module name which could be rb or brd depending).

GRUB_CMDLINE_LINUX="brd.rd_size=16777216" # 16GB in KB

the grub configuration must then be updated with

sudo update-grub

After reboot the ram disk is formatted as an ext4 file system and mounted:

$ cat /sys/module/brd/parameters/rd_size
16777216
$ sudo mkfs -t ext4 /dev/ram1
$ sudo mount /dev/ram1 /srv
$ df -h /srv
Filesystem      Size  Used Avail Use% Mounted on
/dev/ram1        16G   44M   15G   1% /srv
$ free -g
             total       used       free     shared    buffers     cached
Mem:            31          0         31          0          0          0
-/+ buffers/cache:          0         31

Cloning ceph, compiling and running tests should now take less than 15 minutes with

$ git clone https://github.com/ceph/ceph
$ cd ceph
$ ./run-make-check.sh

When the ram disk is umounted, some of the memory used by the ram disk is still in use

$ free -g
             total       used       free     shared    buffers     cached
Mem:            31         27          4          0          0         17
-/+ buffers/cache:          9         22
$ sudo umount /srv
$ free -g
             total       used       free     shared    buffers     cached
Mem:            31         18         13          0          0          8
-/+ buffers/cache:          9         22

It can be flushed with

$ sudo blockdev --flushbufs /dev/ram1
$ free -g
             total       used       free     shared    buffers     cached
Mem:            31          9         22          0          0          8
-/+ buffers/cache:          0         31

Continue reading

Posted in ceph | Leave a comment

Upgrade nodejs on Ubuntu 14.04

To run gh a version of nodejs more recent than the one packaged by default on Ubuntu 14.04 is required:

$ apt-cache policy nodejs
nodejs:
  Installed: 0.10.25~dfsg2-2ubuntu1
  Candidate: 0.10.25~dfsg2-2ubuntu1
  Version table:
 *** 0.10.25~dfsg2-2ubuntu1 0
        500 http://fr.archive.ubuntu.com/ubuntu/ trusty/universe amd64 Packages
        100 /var/lib/dpkg/status
$ gh watch
fatal: Please update your NodeJS version: http://nodejs.org/download

The recommended way to upgrade is currently broken and the following can be used instead:

sudo add-apt-repository 'deb https://deb.nodesource.com/node trusty main'
sudo apt-get update
sudo apt-get install nodejs

If either apt-get update or apt-get install fail with a message like SSL: certificate subject name:

...
Err https://deb.nodesource.com trusty/main amd64 Packages
  SSL: certificate subject name (login.meteornetworks.com) does not match target host name 'deb.nodesource.com'
Ign http://ceph.com trusty/main Translation-en
Err https://deb.nodesource.com trusty/main i386 Packages
  SSL: certificate subject name (login.meteornetworks.com) does not match target host name 'deb.nodesource.com'
Ign https://deb.nodesource.com trusty/main Translation-en_US
Ign https://deb.nodesource.com trusty/main Translation-en
Ign http://get.docker.io docker/main Translation-en_US
Ign http://get.docker.io docker/main Translation-en
W: Failed to fetch https://deb.nodesource.com/node/dists/trusty/main/binary-amd64/Packages  SSL: certificate subject name (login.meteornetworks.com) does not match target host name 'deb.nodesource.com'
W: Failed to fetch https://deb.nodesource.com/node/dists/trusty/main/binary-i386/Packages  SSL: certificate subject name (login.meteornetworks.com) does not match target host name 'deb.nodesource.com'
E: Some index files failed to download. They have been ignored, or old ones used instead.

The following will fix it:

echo 'Acquire::https::deb.nodesource.com::Verify-Peer "false";' > /etc/apt/apt.conf.d/99verify

Alternatively a version of gh that does not require a recent version of nodejs can be installed with

sudo npm install -g gh@1.9.4
Posted in Uncategorized | Leave a comment

Provisionning a teuthology target with a given kernel

When a teuthology target (i.e. machine) is provisioned with teuthology-lock for the purpose of testing Ceph, there is no way to choose the kernel. But it can be installed afterwards using the following:

cat > kernel.yaml <<EOF
interactive-on-error: true
roles:
- - mon.a
  - client.0
kernel:
   branch: testing
tasks:
- interactive:
EOF

Assuming the target on which the new kernel is to be installed is vpm083, running

$ teuthology  --owner loic@dachary.org \
  kernel.yaml <(teuthology-lock --list-targets vpm083)
...
2015-03-09 17:47 INFO:teuthology.task.internal:Starting timer...
2015-03-09 17:47 INFO:teuthology.run_tasks:Running task interactive...
Ceph test interactive mode, use ctx to interact with the cluster
>>>

will install an alternate kernel and reboot the machine:

[ubuntu@vpm083 ~]$ uname -a
Linux vpm083 3.19.0-ceph-00029-gaf5b96e #1 SMP Thu Mar 5 01:04:25 GNU/Linux
[ubuntu@vpm083 ~]$ lsb_release -a
LSB Version:	:base-4.0-amd64:base-4.0-noarch:
Distributor ID:	RedHatEnterpriseServer
Description:  release 6.5 (Santiago)
Release:	6.5
Codename:	Santiago

Command line arguments to the kernel may be added to /boot/grub/grub.conf. For instance loop.max_part=16 to allow partition creation on /dev/loop devices:

default=0
timeout=5
splashimage=(hd0,0)/boot/grub/splash.xpm.gz
hiddenmenu
title rhel-6.5-cloudinit (3.19.0-ceph-00029-gaf5b96e)
        root (hd0,0)
        kernel /boot/vmlinuz-3.19.0 ro root=LABEL=79d3d2d4  loop.max_part=16
        initrd /boot/initramfs-3.19.0.img
Posted in ceph | Leave a comment

Ceph OSD uuid conversion to OSD id and vice versa

When handling a Ceph OSD, it is convenient to assign it a symbolic name that can be chosen even before it is created. That’s what the uuid argument for ceph osd create is for. Without a uuid argument, a random uuid will be assigned to the OSD and can be used later. Since the ceph osd create uuid is idempotent, it can also be used to lookup the id of a given OSD.

$ osd_uuid=b2e780fc-ec82-4a91-a29d-20cd9159e5f6
# convert the OSD uuid into an OSD id
$ ceph osd create $osd_uuid
0
# convert the OSD id into an OSD uuid
$ ./ceph --format json osd dump | jq '.osds[] | select(.osd==0) | .uuid'
"b2e780fc-ec82-4a91-a29d-20cd9159e5f6"
Posted in ceph | Leave a comment

Re-schedule failed teuthology jobs

The Ceph integration tests may fail because of environmental problems (network not available, packages not built, etc.). If six jobs failed out of seventy, these failed test can be re-run instead of re-scheduling the whole suite. It can be done using the **–filter** option of teuthology-suite with a comma separated list of the job description that failed.
The job description can either be copy/pasted from the web interface or extracted from the paddles json output with:

$ run=loic-2015-03-03_12:46:38-rgw-firefly-backports---basic-multi
$ paddles=http://paddles.front.sepia.ceph.com
$ eval filter=$(curl --silent $paddles/runs/$run/jobs/?status=fail |
  jq '.[].description' | \
  while read description ; do echo -n $description, ; done | \
  sed -e 's/,$//')

Where the paddles URL outputs a json description of each job of the form:

[
  {
    "os_type": "ubuntu",
    "nuke_on_error": true,
    "status": "pass",
    "failure_reason": null,
    "success": true,
...
    "description": "rgw/multifs/{clusters/fixed-2.yaml}"
  },
  {
    "os_type": "ubuntu",
...

The jobs/?status=fail part of the URL selects the jobs with "success":false. The jq expression displays the description field (.[].description), one by line. These lines are aggregated into a comma separated list (while read description ; do echo -n $description, ; done) and the trailing comma is stripped (sed -e 's/,$//'). The filter variable is set to the resulting line and evaled to get rid of the quotes (eval filter=$(..)).
The command used to schedule the entire suite can be re-used by adding the --filter="$filter" argument and will only run the failed jobs.

$ ./virtualenv/bin/teuthology-suite --filter="$filter" \
  --priority 101 --suite rgw --suite-branch firefly \
  --machine-type plana,burnupi,mira \
  --distro ubuntu --email loic@dachary.org \
  --owner loic@dachary.org  \
  --ceph firefly-backports
...
Suite rgw in suites/rgw scheduled 6 jobs.
Suite rgw in suites/rgw -- 56 jobs were filtered out.
Posted in ceph | Leave a comment

HOWTO extract a stack trace from teuthology (take 2)

When a Ceph teuthology integration test fails (for instance a rados jobs), it will collect core dumps which can be downloaded from the same directory where the logs and config.yaml files can be found, under the remote/mira076/coredump directory.
The binary from which the core dump comes from can be displayed with:

$ file 1425077911.7304.core
ELF 64-bit LSB  core file x86-64, version 1, from 'ceph-osd -f -i 3'

The teuthology logs contains command lines that can be used to install the corresponding binaries:

$ echo deb http://gitbuilder.ceph.com/ceph-deb-trusty-x86_64-basic\
/sha1/e54834bfac3c38562987730b317cb1944a96005b trusty main | \
  sudo tee /etc/apt/sources.list.d/ceph.list
$ sudo apt-get update
$ sudo DEBIAN_FRONTEND=noninteractive apt-get -y --force-yes \
  -o Dpkg::Options::="--force-confdef" \
  -o Dpkg::Options::="--force-confold" install \
  ceph=0.80.8-75-ge54834b-1trusty \
  ceph-dbg=0.80.8-75-ge54834b-1trusty

The ceph-dbg package contains debug symbols that will automatically be used by gdb(1):

$ gdb /usr/bin/ceph-osd 1425077911.7304.core
...
Reading symbols from /usr/bin/ceph-osd...
Reading symbols from /usr/lib/debug//usr/bin/ceph-osd...done.
...
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `ceph-osd -f -i 3'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f59d6e9af07 in _dl_map_object_deps at dl-deps.c:528
(gdb) bt
#0  0x00007f59d6e9af07 in _dl_map_object_deps at dl-deps.c:528
#1  0x00007f59d6ea1aab in dl_open_worker at dl-open.c:272
#2  0x00007f59d6e9cff4 in _dl_catch_error  at dl-error.c:187
#3  0x00007f59d6ea13bb in _dl_open    at dl-addr.c:61
#5  __GI__dl_addr at dl-addr.c:137
#6  0x00007f59c06dcbc0 in ?? ()
#7  0x00007f59d70b11c8 in _r_debug ()
#8  0x00007f59c06dcba0 in ?? ()
#9  0x00007f59c06dcbb0 in ?? ()
#10 0x00007f59c06dcb90 in ?? ()
#11 0x00007f59c06dca94 in ?? ()
#12 0x0000000000000000 in ?? ()
Posted in ceph | Leave a comment

An example of controlled technical debt

When I started working to help with Ceph backports, I was not familiar with the workflow (who does what, when and why) or the conventions (referencing commits from redmine issues, the redmine backport field, …). I felt the need for scripts to help me cross reference information (from git, github and redmine) and consolidate them into an inventory which I could use as a central point to measure progress and find what needed to be done. But I was not able to formulate this in so many words and at the beginning it was little more than a vague feeling that I would quickly be lost if I did not write down my findings. I chose to write a script, with no tests and no structure, to do things like matching a pull request with a redmine issue when the only clue was a Fixes: #XXX embedded in the comment one of the commits.

After a few weeks the script grew into a 500 lines monstrosity, extremely useful and quite impossible to maintain in the long run. My excuse was that I had no clue what I needed to begin with and that I could not have understood the backport workflow without this script. After the first backport release was declared ready, I stopped adding functionalities and re-started from scratch what became the ceph-workbench backport sub command.

This refactor was done without modifying the behavior of the original script (there were only a few occurrences where it was impossible to preserve). The architecture of the script was completely new: the original script was a near linear sequence of operations with only global variables. The quick summary is that the script pulls information from a few sources (one class for redmine, one for gitlab, one for git), cross reference them with ad-hoc methods and display them into rdoc pages to be displayed in the wiki.

Writing unit tests helped proceed incrementally, pulling one code snippet after the other and checking they were not broken by the refactor. Instead of unit testing the top level command, integration tests were written and run via tox, using real gitlab and redmine instances as fixtures running in docker containers. It will help when adding new use cases such as scrapping the ceph-qa mailing list to match teuthology job failures with the corresponding redmine issue or interpreting the Backport: field in commit messages.

Posted in Uncategorized | Leave a comment

How was a cherry-pick conflict resolved ?

When a git cherry-pick fails because of a conflict, it can be resolved and committed. The reviewer is reminded that a conflict had to be resolved by the Conflicts section at the end of the message body:

commit 7b8e5c99a4a40ae788ad29e36b0d714f529b12eb
Author: John Spray 
Date:   Tue May 20 16:25:19 2014 +0100
...
    Signed-off-by: John Spray 
    (cherry picked from commit 1d9e4ac2e2bedfd40ee2d91a4a6098150af9b5df)
    Conflicts:
    	src/crush/CrushWrapper.h

The difference between the original commit and the cherry-picked commit including the conflict resolution can be displayed with:

commit=7b8e5c99a4a40ae788ad29e36b0d714f529b12eb
picked_from=$(git show --no-patch --pretty=%b $commit  |
  perl -ne 'print if(s/.*cherry picked from commit (\w+).*/$1/)')
diff -u --ignore-matching-lines '^[^+-]' \
   < (git show $picked_from) <(git show $commit)

Continue reading

Posted in git | Leave a comment

Script to enable redmine REST API

When redmine is installed in a container (as a test fixture for instance) with

$ docker run --name=redmine -d -p 10080:80 \
  -v $(pwd)/data/redmine:/home/redmine/data \
  -v /var/run/docker.sock:/run/docker.sock \
  -v $(which docker):/bin/docker  sameersbn/redmine:2.6.1

the following script can be used to enable the REST API with redmine-enable-rest-api.py http://localhost:10080 admin admin

import sys
import re
import requests

def params(page):
    (csrf_token,)=re.findall(r'meta content="(.*?)" name="csrf-token"', page.text)
    (csrf_param,)=re.findall(r'meta content="(.*?)" name="csrf-param"', page.text)
    return {csrf_param:csrf_token}

def enable_rest_api(url, user, password):
    s = requests.Session()

    p = params(s.get(url + '/login'))
    p.update({"username":user, "password":password})
    s.post(url + '/login', params=p)

    p = params(s.get(url + '/settings?tab=authentication'))
    p.update({'settings[rest_api_enabled]':'1'})
    s.post(url + '/settings/edit?tab=authentication', params=p)

enable_rest_api(*sys.argv[1:])
Posted in redmine | Leave a comment