<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Loïc Dachary</title>
	<atom:link href="http://dachary.org/?feed=rss2" rel="self" type="application/rss+xml" />
	<link>http://dachary.org</link>
	<description>Free Software Developer Journey</description>
	<lastBuildDate>Fri, 24 May 2013 07:52:29 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.4</generator>
		<item>
		<title>Bio++: efficient, extensible libraries and tools for computational molecular evolution</title>
		<link>http://dachary.org/?p=2002</link>
		<comments>http://dachary.org/?p=2002#comments</comments>
		<pubDate>Wed, 22 May 2013 07:33:25 +0000</pubDate>
		<dc:creator>Loic Dachary</dc:creator>
				<category><![CDATA[Bio++]]></category>
		<category><![CDATA[debian]]></category>

		<guid isPermaLink="false">http://dachary.org/?p=2002</guid>
		<description><![CDATA[Efficient algorithms and programs for the analysis of the ever-growing amount of biological sequence data are strongly needed in the genomics era. The pace at which new data and methodologies are generated calls for the use of pre-existing, optimized – &#8230; <a href="http://dachary.org/?p=2002">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>Efficient algorithms and programs for the analysis of the ever-growing amount of biological sequence data are strongly needed in the genomics era. The pace at which new data and methodologies are generated calls for the use of pre-existing, optimized – yet extensible – code, typically distributed as libraries or packages. This motivated the Bio++ project, aiming at developing a set of C++ libraries for sequence analysis, phylogenetics, population genetics and molecular evolution. The main attractiveness of Bio++ is the extensibility and reusability of its components through its object-oriented design, without compromising on the computer-efficiency of the underlying methods. We present here the second major release of the libraries, which provides an extended set of classes and methods. These extensions notably provide built-in access to sequence databases and new data structures for handling and manipulating sequences from the omics era, such as multiple genome alignments and sequencing reads libraries. More complex models of sequence evolution, such as mixture models and generic n-tuples alphabets, are also included.<br />
The <a href="http://mbe.oxfordjournals.org/content/early/2013/05/21/molbev.mst097.abstract?keytype=ref&#038;ijkey=tpNducmIGMDMEdH">article was published on May 21st, 2013</a>. Read the full article: <a href='http://dachary.org/wp-uploads/2013/05/biopp.pdf'>Bio++: efficient, extensible libraries and tools for computational molecular evolution</a></p>
]]></content:encoded>
			<wfw:commentRss>http://dachary.org/?feed=rss2&#038;p=2002</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Virtualizing legacy hardware in OpenStack</title>
		<link>http://dachary.org/?p=1991</link>
		<comments>http://dachary.org/?p=1991#comments</comments>
		<pubDate>Sun, 19 May 2013 22:43:01 +0000</pubDate>
		<dc:creator>Loic Dachary</dc:creator>
				<category><![CDATA[Essex]]></category>
		<category><![CDATA[debian]]></category>
		<category><![CDATA[openstack]]></category>

		<guid isPermaLink="false">http://dachary.org/?p=1991</guid>
		<description><![CDATA[A five-year-old machine is being decommissioned. It hosts fourteen vservers on Debian GNU/Linux lenny, running a 2.6.26-2-vserver-686-bigmem Linux kernel. The April non-profit relies on these services (mediawiki, pad, mumble, etc.) for the benefit of its &#8230; <a href="http://dachary.org/?p=1991">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A five-year-old machine is being decommissioned. It hosts fourteen <a href="http://linux-vserver.org/">vservers</a> on <a href="http://www.debian.org/releases/lenny/">Debian GNU/Linux lenny</a>, running a <b>2.6.26-2-vserver-686-bigmem</b> Linux kernel. The <a href="http://april.org/">April</a> non-profit relies on these services (mediawiki, pad, mumble, etc.) for the benefit of its 5,000 members and many working groups. Instead of migrating each vserver individually to an OpenStack instance, it was decided that the whole vserver host would be copied over to a single OpenStack instance.<br />
The old hardware has 8GB of RAM, a 150GB disk and a dual Xeon totaling 8 cores. The munin statistics show that no additional memory is needed, the disk is half full and an average of one core is used at all times. An OpenStack instance with 8GB of RAM, a 150GB disk and two cores is prepared. The instance will be booted from a 150GB volume placed <a href="http://dachary.org/?p=1082">on the same hardware</a> to get maximum disk I/O speed.<br />
After the volume is created, it is mounted from the OpenStack node and the disk of the old machine is rsync&#8217;ed to it. The instance is then booted after modifying a few files such as fstab. The OpenStack node is in the same rack and on the same switch as the old hardware. The IP is removed from the interface of the old hardware and bound to the OpenStack instance. Because the instance runs on nova-network with multi-host activated, the IP is bound to the interface of the OpenStack node, which can take over immediately. The public interface of the node is set as an ARP proxy to advertise the bridge to which the instance is connected. The security groups of the instance are effectively disabled (by opening all protocols and ports) because a firewall is running in the instance.<br />
<span id="more-1991"></span></p>
<h3>Collocated hardware</h3>
<p>The OpenStack cluster used to migrate the legacy hardware is configured to allow the collocation of instances and volumes <a href="http://dachary.org/?p=1082">on the same hardware</a>. One OpenStack availability zone groups hardware that is located in the same rack and connected to the same switch as the legacy hardware. This allows for a migration that does not involve changing the IP of the machine. If the OpenStack nodes were located in a different <a href="http://en.wikipedia.org/wiki/Autonomous_System_%28Internet%29">autonomous system</a>, a DNS change would be necessary and would require additional preparations.</p>
<h3>Maintenance LAN connection</h3>
<p>The primary IP address used by the legacy hardware is also used by a number of services provided by the vservers it hosts. Moving this IP address to the OpenStack instance would mean losing access to the legacy hardware, with no way to fall back should something unexpected happen. Because both machines involved in the migration are connected to the same switch and use the same VLAN, an additional IP address is manually added to each of them to preserve communications:</p>
<pre>
ns1 : ip addr add 10.222.222.1/24 dev eth0
yopo : ip addr add 10.222.222.2/24 dev eth0
</pre>
<h3>Preparations</h3>
<p>In the following, <b>desktop</b> is any machine on which there are enough credentials to either connect to the legacy machine using ssh, run nova commands or EC2 commands targeting the OpenStack cluster, <b>yopo</b> is the OpenStack node, <b>ns1</b> is the legacy hardware.</p>
<pre>
desktop: euca-create-volume --zone bm0008 --size 150
+----+-----------+--------------+------+-------------+-------------+
| ID |   Status  | Display Name | Size | Volume Type | Attached to |
+----+-----------+--------------+------+-------------+-------------+
| 3f | available | None         | 150  | None        |             |
+----+-----------+--------------+------+-------------+-------------+
desktop: nova volume-list | grep " $(printf "%d" 0x3f) "
| 63 | available      | None             | 150  | None        |                                      |
</pre>
<p><b>bm0008</b> is the availability zone matching the OpenStack node known as <b>yopo</b>. Note that <strong>euca-create-volume</strong>, which is an EC2 command, reports the volume id as a hexadecimal number, whereas <b>nova volume-list</b> shows it as a decimal number. The hexadecimal form is used to name the LVs of the LVM backend. A partition table is then created on the 150GB volume and configured to have a single primary partition taking all the space.</p>
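<p>The relation between the two forms of the volume id and the device names used in the commands of this post can be sketched in Python as follows. This is only an illustration: the function names are made up, and the rule used for the partition suffix (a <b>p</b> is inserted only when the name ends with a digit) is an assumption about the default behavior of kpartx that should be checked against the installed version.</p>

```python
def lv_name(volume_id):
    """Name of the LV backing a volume, given the decimal id shown by nova."""
    return "volume-%08x" % volume_id


def mapper_path(vg, lv, partition=None):
    """Device-mapper path of an LV, optionally of a partition mapped by kpartx."""
    # device-mapper doubles every hyphen found inside the VG and LV names
    name = "%s-%s" % (vg.replace("-", "--"), lv.replace("-", "--"))
    if partition is not None:
        # Assumption: kpartx inserts "p" only when the name ends with a digit
        delimiter = "p" if name[-1].isdigit() else ""
        name += "%s%d" % (delimiter, partition)
    return "/dev/mapper/" + name


print(lv_name(int("3f", 16)))
print(mapper_path("vg", lv_name(63), partition=1))
```

<p>With the id <b>63</b> used in this post, the two lines printed match the LV name and the partition device used by the commands that follow.</p>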
<pre>
yopo: kpartx -av /dev/vg/volume-0000003f
yopo: mkfs.ext3 /dev/mapper/vg-volume--0000003f1
yopo: mount /dev/mapper/vg-volume--0000003f1 /mnt
yopo: rsync -i --exclude=/etc/fstab --exclude=70-persistent-net.rules \
 --exclude=/boot/grub \
 --exclude=/srv/backup \
  --exclude=/var/cache \
  --exclude=/var/lib/backuppc \
  --exclude=/var/tmp \
  --exclude=/proc \
  --exclude=/sys -avHS --delete --numeric-ids 10.222.222.1:/ /mnt/
</pre>
<p>The partition is formatted with <b>ext3</b> instead of <b>ext4</b> to avoid any issues: the lenny installation on <b>ns1</b> only uses ext3. A copy of the <b>ns1</b> disk is made, excluding files that will either be replaced or contain data not worth replicating.</p>
<pre>
yopo: echo 'proc /proc proc defaults 0 0' > /mnt/etc/fstab
yopo: echo '/dev/vda1 / ext3 defaults,errors=remount-ro 0 1' >> /mnt/etc/fstab
</pre>
<p>The <b>fstab</b> is rewritten entirely to take into account the presence of a single partition (as opposed to seven on <b>ns1</b>) and a device name starting with <b>/dev/vd</b> instead of <b>/dev/hd</b> or <b>/dev/sd</b>.</p>
<pre>
yopo: cp /mnt/boot/vmlinuz-2.6.26-2-vserver-686-bigmem /tmp
yopo: cp /mnt/boot/initrd.img-2.6.26-2-vserver-686-bigmem /tmp
yopo: umount /mnt
yopo: kpartx -dv /dev/vg/volume-0000003f
yopo: sed -i -e 's:kopt=.*:kopt=root=/dev/vda1:' \
 -e 's/default=.*/default=0/' \
 -e 's/groot=.*/groot=(hd0,0)/' /boot/grub/menu.lst
yopo: echo '(hd0) /dev/vda' > /mnt/boot/grub/device.map
yopo: kvm -m 1024 -drive file=/dev/mapper/vg-volume--0000003f,if=virtio,index=0 \
  -boot c -initrd /tmp/initrd.img-2.6.26-2-vserver-686-bigmem\
   -kernel /tmp/vmlinuz-2.6.26-2-vserver-686-bigmem -append 'root=/dev/vda1' \
  -net nic -net user -nographic -curses -monitor unix:/tmp/file.mon,server,nowait
curses: grub-install /dev/vda
curses: update-grub
curses: halt
</pre>
<p>Grub is installed on the disk by using <b>kvm</b> to actually boot the instance, using a curses based console instead of a VGA console. The <b>menu.lst</b> and the <b>device.map</b> are edited to reflect the changes to the disk and the partition table. The kernel and initrd are copied out of the file system imported from <b>ns1</b> and given as arguments to <b>kvm</b>, so that it boots under conditions that are close to those existing on the legacy hardware. Once the machine is successfully booted, <b>grub-install</b> and <b>update-grub</b> are called to allow <b>kvm</b> to boot without an external kernel. It can be verified with:</p>
<pre>
yopo: kvm -m 1024 -drive file=/dev/mapper/vg-volume--0000003f,if=virtio,index=0 \
  -boot c -net nic -net user -nographic \
  -curses -monitor unix:/tmp/file.mon,server,nowait
</pre>
<h3>Routing the public IP</h3>
<p>The legacy installation for <b>ns1</b> does not obtain its IP address from DHCP and may contain a number of occurrences of this IP in various configuration files. The OpenStack node is configured to add a route dedicated to this IP by adding the following to <b>/etc/rc.local</b>.</p>
<pre>
brctl addbr br2004
ip link set br2004 up
ip r add 88.191.240.4/32 dev br2004
</pre>
<p>The <b>br2004</b> bridge is dedicated to the tenant used to run the OpenStack instance, as shown by <b>2004</b>:</p>
<pre>
desktop: keystone tenant-list | grep ' april '
| 7c918c873280465da3785f5699d48316 | april           | True    |
desktop: nova-manage network list | grep 7c918c873280465da3785f5699d48316
5 10.145.4.0/24 None 10.145.4.3 None None 2004 7c918c873280465da3785f5699d48316 20941588-2c35-40b3-9ecb-af87cadae446
</pre>
<p>The bridge can be created before OpenStack runs so that the public IP can be routed to it. The existing router will be used by OpenStack.</p>
<h3>Migrating</h3>
<p>The rsync command shown above is run to update the copy, without stopping any service.</p>
<pre>
yopo: ssh 10.222.222.1 ip addr del 88.191.250.4/27 dev eth0
yopo: ssh 10.222.222.1 /etc/init.d/util-vserver stop
</pre>
<p>The rsync command is run again after stopping all vservers on <b>ns1</b> and removing the IP from the interface.</p>
<pre>
yopo: umount /mnt
yopo: kpartx -dv /dev/vg/volume-0000003f
desktop: ssh controller.vm.april-int nova boot \
 --image 'CirrOS 0.3' \
 --block_device_mapping vda=63::0:0 \
 --flavor e.1-cpu.0GB-disk.8GB-ram \
 --key_name loic --availability_zone=bm0008 ns1 --poll
</pre>
<p>The partition is unmounted and the instance booted from the volume. It should recover as if a power failure happened.</p>
<pre>
yopo: ip r add 88.191.240.4/32 dev br2004
</pre>
<p>After the public IP is routed to the bridge <b>br2004</b> to which the newly created instance is connected, the services should be up and communicating properly.</p>
<h3>Set up the ARP proxy</h3>
<p>The interface of the OpenStack node that is used for floating IPs must be configured as an ARP proxy.</p>
<pre>
echo 1 > /proc/sys/net/ipv4/conf/eth0/proxy_arp
echo 1 > /proc/sys/net/ipv4/conf/br2004/proxy_arp
</pre>
<p>These lines are appended to <b>/etc/rc.local</b> so that they are run at boot time. The switch to which both machines are connected has an ARP cache. It needs to be cleared so that the switch notices that packets must be sent to another MAC.</p>
<pre>
ip addr add 88.191.250.4/32 dev eth0
arping -U 88.191.240.4 -I eth0
ip addr del 88.191.250.4/32 dev eth0
</pre>
]]></content:encoded>
			<wfw:commentRss>http://dachary.org/?feed=rss2&#038;p=1991</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>OpenStack Upstream University training</title>
		<link>http://dachary.org/?p=1978</link>
		<comments>http://dachary.org/?p=1978#comments</comments>
		<pubDate>Tue, 14 May 2013 19:49:27 +0000</pubDate>
		<dc:creator>Loic Dachary</dc:creator>
				<category><![CDATA[Upstream University]]></category>
		<category><![CDATA[openstack]]></category>

		<guid isPermaLink="false">http://dachary.org/?p=1978</guid>
		<description><![CDATA[Upstream University training for OpenStack contributors includes a live session where students contribute to a Lego town. They have to comply with the coding standards imposed by the existing buildings. More than fifteen participants created an impressive city within a &#8230; <a href="http://dachary.org/?p=1978">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://upstream-university.org/">Upstream University</a> <a href="http://www.youtube.com/watch?v=27RFn6ASemk">training for OpenStack contributors</a> includes a live session where students contribute to a Lego town. They have to comply with the coding standards imposed by the existing buildings. More than fifteen participants created an impressive city within a few hours during the session held in May 2013. The images speak for themselves. The <a href="http://upstream-university.org/apply/">next sessions</a> will be in Paris in June and Portland in July.<br />
<a href="http://dachary.org/wp-uploads/2013/05/upun-lego-07.jpg"><img src="http://dachary.org/wp-uploads/2013/05/upun-lego-07.jpg" alt="" title="upun-lego-07" width="800" height="450" class="alignleft size-full wp-image-1985" /></a><br />
<span id="more-1978"></span><br />
<a href="http://dachary.org/wp-uploads/2013/05/upun-lego-06.jpg"><img src="http://dachary.org/wp-uploads/2013/05/upun-lego-06.jpg" alt="" title="upun-lego-06" class="alignleft size-full wp-image-1984" /></a><br />
<a href="http://dachary.org/wp-uploads/2013/05/upun-lego-05.jpg"><img src="http://dachary.org/wp-uploads/2013/05/upun-lego-05.jpg" alt="" title="upun-lego-05"  class="alignleft size-full wp-image-1984" /></a><br />
<a href="http://dachary.org/wp-uploads/2013/05/upun-lego-04.jpg"><img src="http://dachary.org/wp-uploads/2013/05/upun-lego-04.jpg" alt="" title="upun-lego-04" class="alignleft size-full wp-image-1984" /></a><br />
<a href="http://dachary.org/wp-uploads/2013/05/upun-lego-03.jpg"><img src="http://dachary.org/wp-uploads/2013/05/upun-lego-03.jpg" alt="" title="upun-lego-03" class="alignleft size-full wp-image-1984" /></a><br />
<a href="http://dachary.org/wp-uploads/2013/05/upun-lego-02.jpg"><img src="http://dachary.org/wp-uploads/2013/05/upun-lego-02.jpg" alt="" title="upun-lego-02" class="alignleft size-full wp-image-1984" /></a><br />
<a href="http://dachary.org/wp-uploads/2013/05/upun-lego-01.jpg"><img src="http://dachary.org/wp-uploads/2013/05/upun-lego-01.jpg" alt="" title="upun-lego-01"  class="alignleft size-full wp-image-1984" /></a></p>
]]></content:encoded>
			<wfw:commentRss>http://dachary.org/?feed=rss2&#038;p=1978</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Installing ceph with ceph-deploy</title>
		<link>http://dachary.org/?p=1971</link>
		<comments>http://dachary.org/?p=1971#comments</comments>
		<pubDate>Mon, 13 May 2013 09:16:55 +0000</pubDate>
		<dc:creator>Loic Dachary</dc:creator>
				<category><![CDATA[Raring]]></category>
		<category><![CDATA[Ubuntu]]></category>
		<category><![CDATA[ceph]]></category>

		<guid isPermaLink="false">http://dachary.org/?p=1971</guid>
		<description><![CDATA[A ceph-deploy package is created for Ubuntu raring and installed with dpkg -i ceph-deploy_0.0.1-1_all.deb A ssh key is generated without a password and copied over to the root .ssh/authorized_keys file of each host on which ceph-deploy will act: # ssh-keygen &#8230; <a href="http://dachary.org/?p=1971">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A <a href="https://github.com/ceph/ceph-deploy">ceph-deploy</a> package is created for <a href="http://releases.ubuntu.com/raring/">Ubuntu raring</a> and installed with</p>
<pre>
dpkg -i ceph-deploy_0.0.1-1_all.deb
</pre>
<p>An ssh key is generated without a passphrase and copied over to the root <b>.ssh/authorized_keys</b> file of each host on which ceph-deploy will act:</p>
<pre>
# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
ca:1f:c3:ce:8d:7e:27:54:71:3b:d7:31:32:14:ba:68 root@bm0014.the.re
The key's randomart image is:
+--[ RSA 2048]----+
|            .o.  |
|            oo.o |
|           . oo.+|
|          . o o o|
|        SE o   o |
|     . o. .      |
|      o +.       |
|       + =o .    |
|       .*..o     |
+-----------------+
# for i in 12 14 15
do
 ssh bm00$i.the.re cat \>\> .ssh/authorized_keys < .ssh/id_rsa.pub
done
</pre>
<p>Each host is installed with Ubuntu raring and has a spare, unused, disk at <b>/dev/sdb</b>. The ceph packages are installed with:</p>
<pre>
ceph-deploy  install bm0012.the.re bm0014.the.re bm0015.the.re
</pre>
<p>The short version of each FQDN is added to <b>/etc/hosts</b> on each host, because ceph-deploy will assume that it exists:</p>
<pre>
for host in bm0012.the.re bm0014.the.re bm0015.the.re
do
 getent hosts bm0012.the.re bm0014.the.re bm0015.the.re | \
   sed -e 's/\.the\.re//' | ssh $host cat \>\> /etc/hosts
done
</pre>
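<p>The text transformation done by the <b>sed</b> expression above can be sketched in Python as follows. The function name and the addresses in the example are hypothetical (they are taken from a range reserved for documentation) and only the first occurrence of the domain is removed on each line, as <b>sed</b> does without the <b>g</b> flag.</p>

```python
import re


def short_hosts(getent_output, domain="the.re"):
    """Strip the domain from each line of `getent hosts` output."""
    suffix = re.escape("." + domain)
    lines = []
    for line in getent_output.splitlines():
        # like sed 's/\.the\.re//': only the first match of each line is removed
        lines.append(re.sub(suffix, "", line, count=1))
    return "\n".join(lines)


# hypothetical addresses, for illustration only
example = "192.0.2.12      bm0012.the.re\n192.0.2.14      bm0014.the.re"
print(short_hosts(example))
```

<p>Each resulting line associates the IP address with the short host name and can be appended to <b>/etc/hosts</b>.</p>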
<p>The ceph cluster configuration is created with:</p>
<pre>
# ceph-deploy new bm0012.the.re bm0014.the.re bm0015.the.re
</pre>
<p>and the corresponding <strong>mon</strong>s are deployed with</p>
<pre>
ceph-deploy mon create bm0012.the.re bm0014.the.re bm0015.the.re
</pre>
<p>Even after the command returns, it takes a few seconds for the keys to be generated on each host: the <b>ceph-mon</b> process shows when it is complete. Before creating the <b>osd</b>, the keys are obtained from a <strong>mon</strong> with:</p>
<pre>
ceph-deploy gatherkeys bm0012.the.re
</pre>
<p>The <strong>osd</strong>s are then created with:</p>
<pre>
ceph-deploy osd create bm0012.the.re:/dev/sdb  bm0014.the.re:/dev/sdb  bm0015.the.re:/dev/sdb
</pre>
<p>After a few seconds the cluster stabilizes, as shown with </p>
<pre>
# ceph -s
   health HEALTH_OK
   monmap e1: 3 mons at {bm0012=188.165:6789/0,bm0014=188.165:6789/0,bm0015=188.165:6789/0}, election epoch 24, quorum 0,1,2 bm0012,bm0014,bm0015
   osdmap e14: 3 osds: 3 up, 3 in
    pgmap v106: 192 pgs: 192 active+clean; 0 bytes data, 118 MB used, 5583 GB / 5583 GB avail
   mdsmap e1: 0/0/1 up
</pre>
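<p>When the installation is scripted, the same check can be done on the text printed by <b>ceph -s</b>. The following Python sketch assumes the output format shown above, which may differ in other versions of ceph, and the function names are made up for the illustration.</p>

```python
import re


def ceph_summary(status_text):
    """Extract the health and the osd counts from the output of `ceph -s`."""
    health = re.search(r"^\s*health\s+(\S+)", status_text, re.MULTILINE)
    osds = re.search(r"(\d+) osds: (\d+) up, (\d+) in", status_text)
    return {
        "health": health.group(1) if health else None,
        "osds": tuple(int(n) for n in osds.groups()) if osds else None,
    }


def is_stable(status_text):
    """True if the cluster is healthy and every osd is both up and in."""
    summary = ceph_summary(status_text)
    if summary["health"] != "HEALTH_OK" or summary["osds"] is None:
        return False
    total, up, inside = summary["osds"]
    return total == up == inside


sample = "   health HEALTH_OK\n   osdmap e14: 3 osds: 3 up, 3 in\n"
print(is_stable(sample))
```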
<p>A 10GB RBD is created, mounted and destroyed with:</p>
<pre>
# rbd create --size 10240 test1
# rbd map test1 --pool rbd
# mkfs.ext4 /dev/rbd/rbd/test1
# mount /dev/rbd/rbd/test1 /mnt
# df -h /mnt
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd1       9.8G   23M  9.2G   1% /mnt
# umount /mnt
# rbd unmap /dev/rbd/rbd/test1
# rbd rm test1
Removing image: 100% complete...done.
</pre>
<p><span id="more-1971"></span></p>
<h3>Ubuntu raring package</h3>
<p>A <a href="https://github.com/ceph/ceph-deploy/pull/10">series of patches</a> fix minor build and deploy problems for the <a href="https://github.com/ceph/ceph-deploy">ceph-deploy</a> package:</p>
<ul>
<li>The Debian packages need python-setuptools as a build dependency.</li>
<li>python-pushy is added to the list of packages required to run ceph-deploy when installed on Debian.</li>
<li>The list of paths added by ceph-deploy does not cover all the deployment scenarios. In particular, when installed from a package it will end up in /usr/lib/python2.7/dist-packages/ceph_deploy. The error message is removed: the <b>from</b> import statement will fail by itself if it does not find the module.</li>
<li>The missing python-setuptools runtime dependency is added to debian/control.</li>
</ul>
<h3>Resetting the installation</h3>
<p>To restart from scratch (i.e. discarding all data and all installation parameters), uninstall the software with</p>
<pre>
ceph-deploy uninstall bm0012.the.re bm0014.the.re bm0015.the.re
</pre>
<p>and purge any leftovers with</p>
<pre>
for host in bm0012.the.re bm0014.the.re bm0015.the.re
do
 ssh $host apt-get remove --purge ceph ceph-common ceph-mds
done
</pre>
<p>Remove the configuration files and data files with</p>
<pre>
for host in bm0012.the.re bm0014.the.re bm0015.the.re
do
 ssh $host rm -fr /etc/ceph /var/lib/ceph
done
</pre>
<p>Reset the disk with </p>
<pre>
for host in bm0012.the.re bm0014.the.re bm0015.the.re
do
 ssh $host &lt;&lt;EOF
umount /dev/sdb1
dd if=/dev/zero of=/dev/sdb bs=1024k count=100
sgdisk -g --clear /dev/sdb
EOF
done
</pre>
]]></content:encoded>
			<wfw:commentRss>http://dachary.org/?feed=rss2&#038;p=1971</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Disaster recovery on host failure in OpenStack</title>
		<link>http://dachary.org/?p=1961</link>
		<comments>http://dachary.org/?p=1961#comments</comments>
		<pubDate>Sat, 11 May 2013 14:09:26 +0000</pubDate>
		<dc:creator>Loic Dachary</dc:creator>
				<category><![CDATA[Essex]]></category>
		<category><![CDATA[debian]]></category>
		<category><![CDATA[openstack]]></category>

		<guid isPermaLink="false">http://dachary.org/?p=1961</guid>
		<description><![CDATA[The host bm0002.the.re becomes unavailable because of a partial disk failure on an Essex based OpenStack cluster using LVM based volumes and multi-host nova-network. The host had daily backups using rsync / and each LV was copied and compressed. Although &#8230; <a href="http://dachary.org/?p=1961">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The host bm0002.the.re becomes unavailable because of a partial disk failure on an Essex based OpenStack cluster using LVM based volumes and multi-host nova-network. The host had daily backups using <b>rsync /</b> and each LV was copied and compressed. Although the disk is failing badly, the host is not down and some reads can still be done. The nova services are shut down, the host is disabled using <b>nova-manage</b> and an attempt is made to recover from the partially damaged disks and LVs when that leads to better results than reverting to yesterday&#8217;s backup.<br />
<span id="more-1961"></span></p>
<h3>restoring an instance from backup</h3>
<p>The host is marked as unavailable</p>
<pre>
nova-manage service disable --host=bm0002.the.re --service=nova-compute
nova-manage service disable --host=bm0002.the.re --service=nova-network
nova-manage service disable --host=bm0002.the.re --service=nova-volume
</pre>
<p>and shows as such when listed</p>
<pre>
# nova-manage service list --host=bm0002.the.re
Binary           Host    Zone Status     State Updated_At
nova-compute     bm0002.the.re  bm0002  disabled   XXX   2013-05-11 09:18:25
nova-network     bm0002.the.re  bm0002  disabled   XXX   2013-05-11 09:18:30
nova-volume      bm0002.the.re  bm0002  disabled   XXX   2013-05-11 09:18:33
</pre>
<p>It can be removed completely later by modifying the mysql database directly. The april-ci instance was running on bm0002.the.re:</p>
<pre>
# nova list --name april-ci
+--------------------------------------+----------+---------+--------------------------------------+
|                  ID                  |   Name   |  Status |               Networks               |
+--------------------------------------+----------+---------+--------------------------------------+
| 4e8a8126-b27d-4c9e-abeb-4dc574c54254 | april-ci | SHUTOFF | novanetwork=10.145.9.5, 176.31.18.26 |
+--------------------------------------+----------+---------+--------------------------------------+
</pre>
<p>It is artificially moved to a host that is enabled:</p>
<pre>
mysql -e "update instances set host = 'bm0001.the.re', availability_zone = 'bm0001' where hostname = 'april-ci'" nova
</pre>
<p>and deleted</p>
<pre>
nova delete april-ci
</pre>
<p>Assuming the content of the failed host was backed up entirely (i.e. rsync /), the april-ci disk is located using the <b>id</b> shown above in the output of <b>nova list</b></p>
<pre>
# grep 4dc574c54254 /var/lib/nova/instances/*/*.xml
/var/lib/nova/instances/instance-000001de/libvirt.xml:    &lt;uuid&gt;4e8a8126-b27d-4c9e-abeb-4dc574c54254&lt;/uuid&gt;
</pre>
<p>and the corresponding disk is turned into a minimal file system</p>
<pre>
chroot /backup/bm0002.the.re
mount -t proc none /proc
qemu-nbd --port 20000 /var/lib/nova/instances/instance-000001de/disk &#038;
nbd-client localhost 20000 /dev/nbd0
pv /dev/nbd0 > april-ci.april-ci.img
fsck -fy $(pwd)/april-ci.april-ci.img
resize2fs -M april-ci.april-ci.img
exit
</pre>
<p>and uploaded to glance, using the same kernel and initrd, as shown with <b>nova image-show original-image-of-april-ci</b></p>
<pre>
glance add name="april-ci-2013-05-11" disk_format=ami container_format=ami \
 kernel_id=2e714ea3-45e5-4bb8-ab5d-92bfff64ad28 \
 ramdisk_id=6458acca-24ef-4568-bb2b-e52322a5a11c < /backup/bm0002.the.re/april-ci.april-ci.img
</pre>
<p>It is then booted again using the same flavor</p>
<pre>
nova boot --image 'april-ci-2013-05-11' \
  --flavor e.1-cpu.10GB-disk.1GB-ram \
  --key_name loic --availability_zone=bm0001 --poll april-ci
</pre>
<h3>recovering from a partially damaged logical volume</h3>
<p>A 30GB volume contains bad blocks toward the end (after 26GB) but it was not full. A <b>fsck</b> is run on a copy of the disk to check how much the recovery process would lose. It turns out to be less than a hundred files in a non-critical area. A new disk of the same size is allocated on another machine with</p>
<pre>
# euca-create-volume --zone bm0001 --size 30
VOLUME  vol-0000005b    30      bm0001  creating        2013-05-11T11:22:19.889Z
</pre>
<p>and the content of the damaged volume is copied over, until it fails with an I/O error.</p>
<pre>
ssh -A root@bm0001.the.re
ssh bm0002.the.re pv /dev/nova-volumes/volume-00000143 | \
 pv > /dev/nova-volumes/volume-0000005b
</pre>
<p>and it is repaired</p>
<pre>
fsck -fy /dev/nova-volumes/volume-0000005b
</pre>
<p>The volume residing on the failed host is removed directly from the database</p>
<pre>
mysql -e "update volumes set deleted = 1 where id = $(printf "%d" 0x143)" nova
</pre>
<h3>recovering from a partially damaged instance disk</h3>
<p>An instance disk has a few failed blocks and may be recovered if the others are copied over. Because <b>rsync</b> is more resilient to I/O errors than <b>dd</b> or <b>pv</b>, it is used to recover as much as possible with:</p>
<pre>
# ssh -A root@bm0002.the.re
# rsync --inplace --progress /var/lib/nova/instances/instance-00000089/disk root@bm0001.the.re:/backup/bm0002.the.re/var/lib/nova/instances/instance-00000089/disk
  1843396608 100%    8.41MB/s    0:03:28 (xfer#1, to-check=0/1)
rsync: read errors mapping "/mnt/var/lib/nova/instances/instance-00000089/disk": Input/output error (5)
WARNING: disk failed verification -- update retained (will try again).
disk
  1843396608 100%   37.37MB/s    0:00:47 (xfer#2, to-check=0/1)
rsync: read errors mapping "/var/lib/nova/instances/instance-00000089/disk": Input/output error (5)
ERROR: disk failed verification -- update retained.
sent 1843836447 bytes  received 858892 bytes  7000741.32 bytes/sec
total size is 1843396608  speedup is 1.00
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1070) [sender=3.0.9]
</pre>
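<p>When even <b>rsync</b> gives up, another option is to copy the disk block by block and replace each unreadable block with zeros, in the spirit of <b>dd conv=noerror,sync</b>. This is not what was done here: the following Python sketch only illustrates the idea, and the function and its <b>read_block</b> argument are made up for the example. <b>read_block(offset, size)</b> is assumed to return exactly <b>size</b> bytes or to raise <b>OSError</b> when the block cannot be read.</p>

```python
def salvage(read_block, total_size, out, block_size=4096):
    """Copy total_size bytes to out, zero-filling the blocks that cannot be read.

    Returns the list of offsets of the blocks that were replaced with zeros.
    """
    bad = []
    offset = 0
    while offset < total_size:
        size = min(block_size, total_size - offset)
        try:
            data = read_block(offset, size)
        except OSError:
            # unreadable block: keep the offsets aligned by writing zeros
            data = b"\0" * size
            bad.append(offset)
        out.write(data)
        offset += size
    return bad
```

<p>For a real disk image, <b>read_block</b> could for instance be built on <b>os.pread</b>. The list of bad offsets that is returned tells which areas of the file system <b>fsck</b> will have to cope with.</p>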
<p>It is then turned into a file using <b>nbd</b> as shown above and checked for errors:</p>
<pre>
# fsck -fy $(pwd)/openstack.jenkins.img
fsck from util-linux 2.20.1
e2fsck 1.42.5 (29-Jul-2012)
/openstack.jenkins.img: recovering journal
Clearing orphaned inode 117551 (uid=0, gid=0, mode=0100644, size=0)
Clearing orphaned inode 9764 (uid=0, gid=0, mode=0100644, size=1393052)
Clearing orphaned inode 9765 (uid=0, gid=0, mode=0100644, size=302040)
Clearing orphaned inode 7050 (uid=105, gid=109, mode=0100644, size=0)
Clearing orphaned inode 8841 (uid=0, gid=0, mode=0100644, size=81800)
Clearing orphaned inode 10235 (uid=0, gid=0, mode=0100644, size=253328)
Clearing orphaned inode 10240 (uid=0, gid=0, mode=0100644, size=180624)
Clearing orphaned inode 8840 (uid=0, gid=0, mode=0100644, size=874608)
Clearing orphaned inode 6469 (uid=0, gid=0, mode=0100755, size=1245180)
Clearing orphaned inode 10739 (uid=0, gid=0, mode=0100644, size=18192)
Clearing orphaned inode 10927 (uid=0, gid=0, mode=0100644, size=19908)
Clearing orphaned inode 10754 (uid=0, gid=0, mode=0100644, size=100820)
Clearing orphaned inode 10738 (uid=0, gid=0, mode=0100644, size=11468)
Clearing orphaned inode 10926 (uid=0, gid=0, mode=0100644, size=31568)
Clearing orphaned inode 10956 (uid=0, gid=0, mode=0100644, size=18780)
Clearing orphaned inode 10958 (uid=0, gid=0, mode=0100644, size=22312)
Clearing orphaned inode 10723 (uid=0, gid=0, mode=0100644, size=13976)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
Free blocks count wrong (2299561, counted=2283092).
Fix? yes
Free inodes count wrong (538192, counted=534536).
Fix? yes
/openstack.jenkins.img: ***** FILE SYSTEM WAS MODIFIED *****
/openstack.jenkins.img: 52984/587520 files (0.3% non-contiguous), 338348/2621440 blocks
</pre>
<p>If the loss is smaller than what reverting to yesterday's backup would cost, the instance is rebooted using this copy.</p>
]]></content:encoded>
			<wfw:commentRss>http://dachary.org/?feed=rss2&#038;p=1961</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Minimal DNS spoofing daemon</title>
		<link>http://dachary.org/?p=1947</link>
		<comments>http://dachary.org/?p=1947#comments</comments>
		<pubDate>Fri, 03 May 2013 16:31:27 +0000</pubDate>
		<dc:creator>Loic Dachary</dc:creator>
				<category><![CDATA[DNS]]></category>

		<guid isPermaLink="false">http://dachary.org/?p=1947</guid>
		<description><![CDATA[When running tests in a controlled environment, it should be possible to spoof the domain names. For instance foo.com could be mapped into slow.novalocal, an OpenStack instance responding very slowly to simulate timeouts. A twisted based spoofing DNS reverse proxy &#8230; <a href="http://dachary.org/?p=1947">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>When running tests in a controlled environment, it should be possible to spoof the domain names. For instance <b>foo.com</b> could be mapped to <b>slow.novalocal</b>, an OpenStack instance responding very slowly to simulate timeouts. A <a href="http://twistedmatrix.com/">twisted</a>-based <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/dns_spoof.py">spoofing DNS reverse proxy</a> is implemented to transparently resolve domain names to the IP addresses of other domain names, using a Python dictionary such as:</p>
<pre>
fqdn2fqdn = {
    'foo.com': 'foo.me',
    'bar.com': 'bar.me',
}
</pre>
<p>It will map <b>foo.com</b> to <b>foo.me</b> as follows:</p>
<pre>
$ sudo python dns_spoof.py 8.8.8.8 &#038;
$ ping -c 1 foo.me
PING foo.me (91.185.200.115) 56(84) bytes of data.
64 bytes from 91.185.200.115: icmp_req=1 ttl=47 time=42.2 ms
--- foo.me ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 42.268/42.268/42.268/0.000 ms
$ ping -c 1 foo.com
PING foo.com (91.185.200.115) 56(84) bytes of data.
64 bytes from 91.185.200.115: icmp_req=1 ttl=47 time=42.2 ms
--- foo.com ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 42.290/42.290/42.290/0.000 ms
</pre>
<p><strong>Update May 10, 2013:</strong> an easier solution is to <a href="http://jpmens.net/2011/04/26/how-to-configure-your-bind-resolvers-to-lie-using-response-policy-zones-rpz/">configure your BIND resolvers to lie using Response Policy Zones (RPZ)</a>. Thanks to S. Bortzmeyer for pointing <a href="http://www.bortzmeyer.org/rpz-faire-mentir-resolveur-dns.html">in the right direction</a>.<br />
<span id="more-1947"></span></p>
<h3>OpenStack based test environment</h3>
<p>In an integration environment where tests scripts boot instances using</p>
<pre>
nova boot foo
</pre>
<p>the associated domain name will be <b>foo.novalocal</b> by default but the associated private IP is not known in advance. When testing the integration of a puppet manifest designed to deploy this machine to serve <b>foo.com</b>, the manifest will typically contain:</p>
<pre>
node 'foo.com', 'foo.novalocal' {
...
}
</pre>
<p>so that it deploys on the production machine (<b>foo.com</b>) and the test machine (<b>foo.novalocal</b>). If the integration test checks <b>foo.com</b> availability, the corresponding nagios command will try to access <b>foo.com</b> and not <b>foo.novalocal</b>.<br />
Although it would be possible to avoid using FQDN such as <b>foo.com</b>, there are a number of circumstances where the software will make it inconvenient or the operator will just forget about this rule.</p>
<h3>spoofing DNS requests</h3>
<p>To transparently ensure that all FQDN used in puppet manifests resolve to private IPs matching a <b>*.novalocal</b> name, the manifests are <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/generate_dnsspoof_config">parsed</a> to create a table mapping each node name to the <b>.novalocal</b> FQDN found in the same <b>node</b> stanza. This dictionary can then be used by a daemon to spoof each DNS request to <b>foo.com</b> into a request to <b>foo.me</b>.</p>
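<p>The table construction described above can be sketched as follows. This is a minimal illustration and not the linked script: the regular expressions, the <b>build_spoof_map</b> function name and the sample manifest are assumptions made for the example, and it only handles node names that are quoted on a single line.</p>

```python
import re

# Hypothetical helper: matches the header of a node stanza such as
#   node 'foo.com', 'foo.novalocal' {
NODE_RE = re.compile(r"^\s*node\s+(.+?)\s*\{", re.MULTILINE)
NAME_RE = re.compile(r"'([^']+)'")


def build_spoof_map(manifest):
    """Map each node name to the .novalocal name found in the same stanza."""
    fqdn2fqdn = {}
    for header in NODE_RE.findall(manifest):
        names = NAME_RE.findall(header)
        local = [name for name in names if name.endswith('.novalocal')]
        if not local:
            continue  # no test machine in this stanza: nothing to spoof
        for name in names:
            if name not in local:
                fqdn2fqdn[name] = local[0]
    return fqdn2fqdn


# Sample manifest made up for the example.
manifest = """
node 'foo.com', 'foo.novalocal' {
}
node 'bar.com' {
}
"""
spoof_map = build_spoof_map(manifest)
print(spoof_map)
```

<p>The stanza for <b>bar.com</b> has no <b>.novalocal</b> name, so it is left out of the table and would be resolved normally.</p>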
<h3>spoofing DNS server</h3>
<p>The DNS <a href="http://twistedmatrix.com/trac/browser/tags/releases/twisted-12.1.0/twisted/names/__init__.py">server and client</a> library provided with <a href="http://twistedmatrix.com/">twisted</a> is used to implement a <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/dns_spoof.py">spoofing DNS reverse proxy</a>.<br />
It <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/dns_spoof.py#L60">reads</a> a <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/spoof_map.py">python file mapping FQDN to their spoofed equivalent</a>.<br />
When an <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/dns_spoof.py#L32">incoming DNS request is received</a>, the FQDN is  <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/dns_spoof.py#L36">substituted</a> if it is <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/dns_spoof.py#L34">found in the spoof map</a>.<br />
The spoofed name is saved in the <b>spoofed</b> variable and <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/dns_spoof.py#L41">included in the closure</a> added to the <a href="http://twistedmatrix.com/documents/12.2.0/core/howto/defer.html">deferred</a> created to forward the query to the authoritative DNS.<br />
When the answer is received, the spoofed name is <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/dns_spoof.py#L48">restored</a> so that the original requester does not notice the difference. The <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/dns_spoof.py#L55">name is also restored in case of an error</a>; otherwise the client would not be notified of the error, because it would refer to a FQDN that is not known to the original client.</p>
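<p>The substitute-then-restore logic can be illustrated without twisted. The sketch below is not the code of <b>dns_spoof.py</b>: <b>resolve_spoofed</b>, <b>fake_upstream</b> and the 192.0.2.10 address are hypothetical names and values chosen for the example, and a plain Python callable stands in for the authoritative DNS.</p>

```python
def resolve_spoofed(name, fqdn2fqdn, upstream):
    """Resolve name through upstream, substituting it if it is in the spoof map."""
    spoofed = fqdn2fqdn.get(name, name)
    try:
        answers = upstream(spoofed)
    except LookupError as error:
        # Restore the original name in the error as well, so that the
        # requester never sees a name it did not ask for.
        raise LookupError(str(error).replace(spoofed, name))
    # Restore the original name in every answer.
    return [(name, address) for (_, address) in answers]


def fake_upstream(name):
    """Stand-in for the authoritative DNS (assumption made for this sketch)."""
    table = {'foo.me': '192.0.2.10'}
    if name not in table:
        raise LookupError('unknown name ' + name)
    return [(name, table[name])]


fqdn2fqdn = {'foo.com': 'foo.me', 'bar.com': 'bar.me'}
print(resolve_spoofed('foo.com', fqdn2fqdn, fake_upstream))
```

<p>Asking for <b>bar.com</b> with this stand-in raises an error that mentions <b>bar.com</b> and not <b>bar.me</b>, which is the behavior the error handler is meant to provide.</p>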
<h3>testing the DNS server</h3>
<p>The <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/dns_spoof.py#L59">standalone</a> python script is isolated to allow for inclusion by the <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py">test</a> script. It is run as follows:</p>
<pre>
$ trial test.py
test
  DnsSpoofTestCase
    testAddressRecord_one ...                                    [OK]
    testAddressRecord_two ...                                    [OK]
    testSpoofedRecord_fail ...                                   [OK]
    testSpoofedRecord_one ...                                    [OK]
    testSpoofedRecord_two ...                                    [OK]
---------------------------------------------------------------------
Ran 5 tests in 0.017s
PASSED (successes=5)
</pre>
<p><a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L62">Before each test is run</a>, a <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L82">DNS server is run</a> to provide information for a <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L46">pre-defined set of FQDN</a>. Another DNS server is run <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L103">based on the DNSSpoofFactory</a> class, which <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L84">is configured to forward to the first DNS</a>.<br />
The first DNS is tested to check that it resolves <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L126">one.my-domain.com</a> and <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L132">two.my-domain.com</a>. The spoofing DNS is tested to check that it resolves <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L138">one.my-domain.com</a> to the same IP as <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L144">two.my-domain.com</a> because the <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L34">spoof map</a> asks for it:</p>
<pre>
fqdn2fqdn = { 'two.my-domain.com': 'one.my-domain.com' }
</pre>
<p>The error handler is <a href="https://agir.april.org/projects/admins/repository/revisions/e34a82625a72d684f4f2011daa27e639dbb01a51/entry/puppetmaster/modules/april_ci_dns/files/test.py#L151">checked to raise a DNSNameError</a> if trying to resolve a hostname that is unknown to the authoritative DNS.</p>
<h3>other DNS servers</h3>
<p>bind9 could be used to create zones in which some hostnames (<b>foo.com</b> for instance) are CNAME to the corresponding private hostname (<b>foo.novalocal</b>). The puppet module creating these CNAME could even be merged into the puppet module <a href="http://www.bind9.net/manual/bind/9.3.2/Bv9ARM.ch06.html#view_statement_grammar">bind9 view</a>. But the bind9 configuration needed to achieve this is either difficult or does not exist.<br />
The <a href="http://www.unbound.net/documentation/unbound.conf.html">unbound</a>, <a href="http://www.thekelleys.org.uk/dnsmasq/doc.html">dnsmasq</a> and <a href="http://ettercap.github.io/ettercap/">ettercap</a> DNS servers do not support CNAME: they all require that the IP of a hostname is known in advance, which is not the case when launching an instance in OpenStack.</p>
]]></content:encoded>
			<wfw:commentRss>http://dachary.org/?feed=rss2&#038;p=1947</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>nova-network debugging tips</title>
		<link>http://dachary.org/?p=1929</link>
		<comments>http://dachary.org/?p=1929#comments</comments>
		<pubDate>Mon, 04 Mar 2013 19:50:17 +0000</pubDate>
		<dc:creator>Loic Dachary</dc:creator>
				<category><![CDATA[Folsom]]></category>
		<category><![CDATA[debian]]></category>
		<category><![CDATA[openstack]]></category>

		<guid isPermaLink="false">http://dachary.org/?p=1929</guid>
		<description><![CDATA[A single machine is installed with Debian GNU/Linux OpenStack Folsom. Four instances are created and it turns out that nova-network is configured with the wrong public interface. It can be fixed without shutting down the instance: nova suspend target1 The &#8230; <a href="http://dachary.org/?p=1929">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>A single machine is installed with <a href="http://openstack.dachary.org/2013-02-20/README.html">Debian GNU/Linux OpenStack Folsom</a>. Four instances are created and it turns out that <strong>nova-network</strong> is configured with the wrong public interface. It can be fixed without shutting down the instance:</p>
<pre>
nova suspend target1
</pre>
<p>The instance is suspended to disk (as if it was a laptop) and the corresponding <strong>KVM</strong> process is killed. While the instance is suspended, <strong>nova-network</strong> can be stopped.</p>
<pre>
/etc/init.d/nova-network stop
</pre>
<p>The source of the problem was a typo in the public interface leading to an incorrect <strong>VLAN</strong> interface</p>
<pre>
13: vlan100@eth2: <BROADCAST,MULTICAST,PROMISC,M-DOWN> mtu 1500 qdisc noqueue state DOWN mode DEFAULT
    link/ether fa:16:3e:54:5b:57 brd ff:ff:ff:ff:ff:ff
</pre>
<p>it can be fixed in the <b>/etc/nova/nova.conf</b> configuration file at the line:</p>
<pre>
public_interface = eth3
</pre>
<p>The incorrect <strong>VLAN</strong> interface is manually deleted and <b>nova-network</b> can be restarted. The instance is then resumed with</p>
<pre>
nova resume target1
</pre>
<p>and <b>nova-network</b> will automatically re-create the <b>VLAN</b> interface.<br />
<span id="more-1929"></span></p>
<h3>fixing nova-network configuration and restarting the service</h3>
<p>When the public IP of the bare metal host is not configured properly in <strong>/etc/nova/nova.conf</strong></p>
<pre>
my_ip = 192.168.20.10
</pre>
<p><b>nova-network</b> will create an incorrect <b>SNAT</b> iptables rule</p>
<pre>
0 0 SNAT all  --  any tun0 10.20.0.0/16 anywhere to:<b>192.168.20.10</b>
</pre>
<p>When the <b>my_ip</b> line is fixed, <b>/etc/init.d/nova-network</b> can safely be restarted, even when instances are running on the bare metal. It will not disrupt their connections and the iptables rule will be updated as expected.</p>
<h3>modifying the interfaces and restarting the service</h3>
<p>Some problems cannot be fixed by simply modifying the <b>/etc/nova/nova.conf</b> file and the <strong>VLAN</strong> interface must be deleted manually. When the public interface is wrongly configured:</p>
<pre>
public_interface = eth2
</pre>
<p>and an instance has been created on the bare metal, a <strong>VLAN</strong> interface and a bridge are created:</p>
<pre>
# ip link show vlan100
13: vlan100@<strong>eth2</strong>: <BROADCAST,MULTICAST,PROMISC,M-DOWN> mtu 1500 qdisc noqueue state DOWN mode DEFAULT
    link/ether fa:16:3e:54:5b:57 brd ff:ff:ff:ff:ff:ff
# brctl show br100
bridge name     bridge id               STP enabled     interfaces
br100           8000.fa163e6e08de       no              vlan100
                                                        vnet0
                                                        vnet1
                                                        vnet2
                                                        vnet3
</pre>
<p>If the configuration file is fixed to use <b>eth3</b> instead of <b>eth2</b></p>
<pre>
public_interface = eth3
</pre>
<p>restarting <b>nova-network</b> will <strong>not</strong> change the interface to which <b>vlan100</b> is attached. Assuming the instances bound to the bridge are as follows:</p>
<pre>
# nova list
+--------------------------------------+------------+--------+---------------------+
| ID                                   | Name       | Status | Networks            |
+--------------------------------------+------------+--------+---------------------+
| 5e263310-a578-4653-bb48-697cca589297 | target1    | ACTIVE | private_0=10.20.0.6 |
| b108007d-d7e4-4289-83a8-a72280541eb2 | target2    | ACTIVE | private_0=10.20.0.7 |
| 8b97f260-f888-49a2-8f80-065ea49ea3b6 | target3    | ACTIVE | private_0=10.20.0.8 |
| 585ad852-2d34-45a5-8036-607fa0087511 | teuthology | ACTIVE | private_0=10.20.0.5 |
+--------------------------------------+------------+--------+---------------------+
</pre>
<p>they can be temporarily suspended with:</p>
<pre>
# nova suspend target1
# nova suspend target2
# nova suspend target3
# nova suspend teuthology
# nova list
+--------------------------------------+------------+-----------+---------------------+
| ID                                   | Name       | Status    | Networks            |
+--------------------------------------+------------+-----------+---------------------+
| 5e263310-a578-4653-bb48-697cca589297 | target1    | SUSPENDED | private_0=10.20.0.6 |
| b108007d-d7e4-4289-83a8-a72280541eb2 | target2    | SUSPENDED | private_0=10.20.0.7 |
| 8b97f260-f888-49a2-8f80-065ea49ea3b6 | target3    | SUSPENDED | private_0=10.20.0.8 |
| 585ad852-2d34-45a5-8036-607fa0087511 | teuthology | SUSPENDED | private_0=10.20.0.5 |
+--------------------------------------+------------+-----------+---------------------+
</pre>
<p>This kills the KVM process running each instance and saves its state to disk, the equivalent of a laptop <strong>suspend to disk</strong>. The bridge shows they are no longer attached to it:</p>
<pre>
# brctl show br100
bridge name     bridge id               STP enabled     interfaces
br100           8000.fa163e6e08de       no              vlan100
</pre>
<p>The bridge and the VLAN interface are manually deleted</p>
<pre>
# ip link set br100 down
# brctl delbr br100
# ip link delete vlan100
</pre>
<p>When <b>nova-network</b> is stopped, the <b>dnsmasq</b> process persists and it will not notice when a new bridge is created. It must be killed so that <b>nova-network</b> restarts it after re-creating the bridge.</p>
<pre>
pkill dnsmasq
</pre>
<p>If <b>dnsmasq</b> is not killed, the instance will resume properly but will lose its IP after trying to renew the DHCP lease. The instances can be resumed after starting <b>nova-network</b>:</p>
<pre>
# /etc/init.d/nova-network start
# nova resume target1
# nova resume target2
# nova resume target3
# nova resume teuthology
</pre>
<p>The VLAN interface is created as a side effect of starting the first instance:</p>
<pre>
# ip link show vlan100
13: vlan100@eth3: <BROADCAST,MULTICAST,PROMISC,M-DOWN> mtu 1500 qdisc noqueue state DOWN mode DEFAULT
    link/ether fa:16:3e:54:5b:57 brd ff:ff:ff:ff:ff:ff
</pre>
<p>and the instances will not notice the difference.</p>
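<p>The order of the steps matters, so it may help to see the whole procedure as one sequence. The sketch below only builds the list of commands quoted in this post and prints it; it runs nothing. The <b>recovery_plan</b> function and its parameters are hypothetical, and the sketch assumes <b>/etc/nova/nova.conf</b> is edited by hand while <b>nova-network</b> is stopped.</p>

```python
def recovery_plan(instances, bridge='br100', vlan='vlan100'):
    """Return, in order, the commands used to re-create the VLAN interface."""
    # 1. Suspend every instance attached to the bridge.
    plan = ['nova suspend ' + name for name in instances]
    plan += [
        # 2. Stop the service; nova.conf is fixed by hand at this point.
        '/etc/init.d/nova-network stop',
        # 3. Delete the bridge and the VLAN bound to the wrong interface.
        'ip link set ' + bridge + ' down',
        'brctl delbr ' + bridge,
        'ip link delete ' + vlan,
        # 4. Kill the dnsmasq left behind so that it is restarted.
        'pkill dnsmasq',
        # 5. Start the service again.
        '/etc/init.d/nova-network start',
    ]
    # 6. Resume the instances; the VLAN interface is re-created on the way.
    plan += ['nova resume ' + name for name in instances]
    return plan


for command in recovery_plan(['target1', 'target2', 'target3', 'teuthology']):
    print(command)
```

<p>Printing the plan instead of executing it leaves room to review each command before typing it on the bare metal host.</p>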
]]></content:encoded>
			<wfw:commentRss>http://dachary.org/?feed=rss2&#038;p=1929</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>ceph internals : buffer lists</title>
		<link>http://dachary.org/?p=1904</link>
		<comments>http://dachary.org/?p=1904#comments</comments>
		<pubDate>Fri, 22 Feb 2013 09:28:20 +0000</pubDate>
		<dc:creator>Loic Dachary</dc:creator>
				<category><![CDATA[ceph]]></category>

		<guid isPermaLink="false">http://dachary.org/?p=1904</guid>
		<description><![CDATA[The ceph buffers are used to process data in memory. For instance, when a FileStore handles an OP_WRITE transaction it writes a list of buffers to disk. +---------+ &#124; +-----+ &#124; list ptr &#124; &#124; &#124; &#124; +----------+ +-----+ &#124; &#8230; <a href="http://dachary.org/?p=1904">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://ceph.com/">ceph</a> <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/common/buffer.cc">buffers</a> are used to process data in memory. For instance, when a <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/os/FileStore.cc#L2345">FileStore handles an OP_WRITE transaction</a> it <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/os/FileStore.cc#L2904">writes a list of buffers</a> to disk.</p>
<pre>
                                             +---------+
                                             | +-----+ |
    list              ptr                    | |     | |
 +----------+       +-----+                  | |     | |
 | append_  >------->     >-------------------->     | |
 |  buffer  |       +-----+                  | |     | |
 +----------+                        ptr     | |     | |
 |   _len   |      list            +-----+   | |     | |
 +----------+    +------+     ,--->+     >----->     | |
 | _buffers >---->      >-----     +-----+   | +-----+ |
 +----------+    +----^-+     \      ptr     |   raw   |
 |  last_p  |        /         `-->+-----+   | +-----+ |
 +--------+-+       /              +     >----->     | |
          |       ,-          ,--->+-----+   | |     | |
          |      /        ,---               | |     | |
          |     /     ,---                   | |     | |
        +-v--+-^--+--^+-------+              | |     | |
        | bl | ls | p | p_off >--------------->|     | |
        +----+----+-----+-----+              | +-----+ |
        |               | off >------------->|   raw   |
        +---------------+-----+              |         |
              iterator                       +---------+
</pre>
<p>The actual data is stored in <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L117">buffer::raw</a> opaque objects. They are accessed through a <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L144">buffer::ptr</a>. A <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L220">buffer::list</a> is a sequential list of <strong>buffer::ptr</strong> which can be used as if it was a contiguous data area although it can be spread over many <strong>buffer::raw</strong> containers, as represented by the rectangle enclosing the two <strong>buffer::raw</strong> objects in the above drawing. The <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L228">buffer::list::iterator</a> can be used to walk each character of the <strong>buffer::list</strong> as follows:</p>
<pre>
  bufferlist bl;
  bl.append("ABC", 3);
  {
    bufferlist::iterator i(&#038;bl);
    ++i;
    EXPECT_EQ('B', *i);
  }
</pre>
<p><span id="more-1904"></span></p>
<h3>documentation and unit tests</h3>
<p>The ultimate documentation for <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/common/buffer.cc">buffer.cc</a> and <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h">buffer.h</a> are the <a href="https://github.com/ceph/ceph/blob/master/src/test/bufferlist.cc">unit tests</a> that demonstrate how it actually works. This document is a short guide designed to provide an overview and does not attempt to cover everything.</p>
<h3>buffer::ptr and buffer::raw</h3>
<p>The <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/common/buffer.cc#L57">buffer::raw</a> is where the data is actually stored. It is allocated with <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/common/buffer.cc#L88">malloc</a>, <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/common/buffer.cc#L192">new</a> or reusing <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/common/buffer.cc#L216">a pointer provided by the caller</a>. A variant of the <b>malloc</b> constructor provides an area that is <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/common/buffer.cc#L216">aligned</a> on <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/page.h#L14">CEPH_PAGE_SIZE</a>. The address of the allocated memory will be a multiple of CEPH_PAGE_SIZE, which must be a power of two and a multiple of sizeof(void *).</p>
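<p>The alignment constraint is plain arithmetic and can be checked with a few lines of Python. This is only an illustration of the rule stated above, not ceph code: the helper names are made up and the 4096 page size is an assumption for the example, since the actual value of CEPH_PAGE_SIZE depends on the platform.</p>

```python
import ctypes

PAGE_SIZE = 4096  # assumed value for the example, not read from ceph


def is_valid_alignment(alignment):
    """True if alignment is a power of two and a multiple of sizeof(void *)."""
    pointer_size = ctypes.sizeof(ctypes.c_void_p)
    # A positive power of two has exactly one bit set.
    power_of_two = alignment >= 1 and bin(alignment).count('1') == 1
    return power_of_two and alignment % pointer_size == 0


def is_aligned(address, alignment):
    """True if address is a multiple of alignment."""
    return address % alignment == 0


print(is_valid_alignment(PAGE_SIZE))
print(is_aligned(3 * PAGE_SIZE, PAGE_SIZE))
print(is_aligned(3 * PAGE_SIZE + 1, PAGE_SIZE))
```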
<pre>

 +-----------+                +-----+
 |           |                |     |
 |  offset   +----------------+     |
 |           |                |     |
 |  length   +----            |     |
 |           |    \-------    |     |
 +-----------+            \---+     |
 |   ptr     |                +-----+
 +-----------+                | raw |
                              +-----+
</pre>
<p>The <b>buffer::raw</b> area can only be accessed through the <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L144">buffer::ptr</a>. It addresses the <b>buffer::raw</b> bytes in the range <b>[offset,offset+length[</b>. The <b>buffer::ptr</b> methods are very flexible and mostly designed to be used to implement <b>buffer::list</b>. The constructors can allocate a <b>buffer::raw</b> area with <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L153">ptr(unsigned l)</a> or be assigned an existing <b>buffer::raw</b> with <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L152">ptr(raw *r)</a>, as in <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/common/buffer.cc#L761">buffer::list::rebuild()</a>.<br />
Bytes can be <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L208">copied in</a> or <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L190">copied out</a> within the <b>[offset,offset+length[</b> range. If the underlying <b>buffer::raw</b> extends beyond <b>offset+length</b> (as reported by <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L182">unused_tail_length()</a>), bytes can be <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L207">appended</a>. </p>
<pre>
    bufferptr ptr(2);
    ptr.set_length(0);
    ptr.append('A');
    EXPECT_EQ((unsigned)1, ptr.length());
    EXPECT_EQ('A', ptr[0]);
    ptr.append("B", (unsigned)1);
    EXPECT_EQ((unsigned)2, ptr.length());
    EXPECT_EQ('B', ptr[1]);
</pre>
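<p>The window that a <b>buffer::ptr</b> keeps over a <b>buffer::raw</b> can be modelled in Python to make the role of <b>offset</b>, <b>length</b> and the unused tail explicit. This is a toy model written for this explanation, not a binding to ceph: the <b>Raw</b> and <b>Ptr</b> classes are hypothetical and only mimic the behavior shown in the unit test above.</p>

```python
class Raw:
    """Toy stand-in for buffer::raw: a fixed-size byte area."""

    def __init__(self, size):
        self.data = bytearray(size)


class Ptr:
    """Toy stand-in for buffer::ptr: a window [offset, offset+length[ on a Raw."""

    def __init__(self, raw, offset=0, length=None):
        self.raw = raw
        self.offset = offset
        self.length = len(raw.data) - offset if length is None else length

    def unused_tail_length(self):
        # Bytes of the raw area located after the end of the window.
        return len(self.raw.data) - (self.offset + self.length)

    def append(self, data):
        # Appending only works while the raw area extends past the window.
        if len(data) > self.unused_tail_length():
            raise ValueError('not enough room in the raw area')
        end = self.offset + self.length
        self.raw.data[end:end + len(data)] = data
        self.length += len(data)

    def __getitem__(self, index):
        if index not in range(self.length):
            raise IndexError('index outside of the window')
        return self.raw.data[self.offset + index]


ptr = Ptr(Raw(2))
ptr.length = 0      # same effect as ptr.set_length(0) in the test above
ptr.append(b'A')
ptr.append(b'B')
print(chr(ptr[0]), chr(ptr[1]), ptr.unused_tail_length())
```

<p>Once the window reaches the end of the raw area, a further append has nowhere to go, which is why the model raises an error in that case.</p>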
<h3>buffer::list and buffer::list::iterator</h3>
<p>A <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L299">buffer::list</a> is a <a href="http://www.cplusplus.com/reference/list/list/">list</a> of <b>buffer::ptr</b>, as shown below.</p>
<pre>
                                             +---------+
                                             | +-----+ |
    list                                     | |     | |
 +----------+                                | |     | |
 | append_  |                                | |     | |
 |  buffer  |                                | |     | |
 +----------+                        ptr     | |     | |
 |   _len   |      list            +-----+   | |     | |
 +----------+    +------+     ,--->+     >----->     | |
 | _buffers >---->      >-----     +-----+   | +-----+ |
 +----------+    +----^-+     \      ptr     |   raw   |
 |  last_p  |                  `-->+-----+   | +-----+ |
 +----------+                      |     >----->     | |
                                   +-----+   | |     | |
                                             | |     | |
                                             | |     | |
                                             | |     | |
                                             | |     | |
                                             | +-----+ |
                                             |   raw   |
                                             |         |
                                             +---------+
</pre>
<p>The <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L406">operator[]</a> abstracts it.</p>
<pre>
  bufferlist bl;
  bl.append('A');
  bufferlist other;
  other.append('B');
  bl.append(other);
  EXPECT_EQ((unsigned)2, bl.buffers().size());
  EXPECT_EQ('B', bl[1]);
</pre>
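<p>How an index is translated into a position inside one of the segments can be sketched in Python. The <b>BufferList</b> class below is a hypothetical toy model, not the ceph implementation: each segment is a plain <b>bytes</b> object standing in for a <b>buffer::ptr</b>, and <b>rebuild</b> simply joins the segments into one.</p>

```python
class BufferList:
    """Toy stand-in for buffer::list: a sequence of byte segments."""

    def __init__(self):
        self.buffers = []

    def append(self, data):
        if isinstance(data, BufferList):
            # Appending a list adds its segments, they are not merged.
            self.buffers.extend(data.buffers)
        else:
            self.buffers.append(bytes(data))

    def __len__(self):
        return sum(len(segment) for segment in self.buffers)

    def __getitem__(self, index):
        if index not in range(len(self)):
            raise IndexError('index outside of the list')
        # Walk the segments, subtracting their length until the index fits.
        for segment in self.buffers:
            if index in range(len(segment)):
                return segment[index]
            index -= len(segment)

    def rebuild(self):
        # Coalesce everything into a single segment.
        self.buffers = [b''.join(self.buffers)]


bl = BufferList()
bl.append(b'A')
other = BufferList()
other.append(b'B')
bl.append(other)
print(len(bl.buffers), chr(bl[1]))
```

<p>The caller indexes the list as if it were contiguous, while the data stays in two separate segments until <b>rebuild</b> is called.</p>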
<p>The <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L228">buffer::list::iterator</a> class provides some of the <a href="http://www.cplusplus.com/reference/iterator/">usual iterator behavior</a> and is used in the <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/common/buffer.cc#L680">buffer::list::contents_equal(ceph::buffer::list&#038; other)</a> method. </p>
<pre>
                                             +---------+
                                             | +-----+ |
    list                                     | |     | |
 +----------+                                | |     | |
 | append_  |                                | |     | |
 |  buffer  |                                | |     | |
 +----------+                        ptr     | |     | |
 |   _len   |      list            +-----+   | |     | |
 +----------+    +------+     ,--->+     >----->     | |
 | _buffers >---->      >-----     +-----+   | +-----+ |
 +----------+    +----^-+     \      ptr     |   raw   |
 |  last_p  |        /         `-->+-----+   | +-----+ |
 +--------+-+       /              +     >----->     | |
          |       ,-          ,--->+-----+   | |     | |
          |      /        ,---               | |     | |
          |     /     ,---                   | |     | |
        +-v--+-^--+--^+-------+              | |     | |
        | bl | ls | p | p_off >--------------->|     | |
        +----+----+-----+-----+              | +-----+ |
        |               | off >------------->|   raw   |
        +---------------+-----+              |         |
              iterator                       +---------+
</pre>
<p>The <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L229">bl</a> data member points to the <b>buffer::list</b>. The <b>ls</b> data member is used to avoid dereferencing a pointer and is equivalent to <b>bl->_buffers</b>. The <b>p</b> data member is a <b>std::list&lt;ptr&gt;::iterator</b> used to walk the list of <b>buffer::ptr</b>. The <b>p_off</b> data member is the offset at which the iterator currently is, within the <b>buffer::raw</b> pointed to by <b>p</b>. The <b>off</b> data member is the offset of the iterator, as if there was only one <b>buffer::raw</b>.<br />
Although <b>buffer::list::iterator</b> exposes <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L282">copy in and out</a> methods, they are designed to be used as supporting methods for the <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L386">corresponding copy in and out</a> methods from the <b>buffer::list</b> class. </p>
<pre>
{
    bufferlist bl;
    bufferlist dest;
    const char *expected = "ABC";
    bl.append(expected);
    bl.copy(1, 2, dest);
    EXPECT_EQ(0, ::memcmp(expected + 1, dest.c_str(), 2));
}
{
    bufferlist bl;
    bl.append("XXX");
    bl.copy_in(1, 2, "AB");
    EXPECT_EQ(0, ::memcmp("XAB", bl.c_str(), 3));
}
</pre>
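<p>The bookkeeping done by the iterator data members described above (<b>ls</b>, <b>p</b>, <b>p_off</b> and <b>off</b>) can be sketched in Python. The <b>SegmentIterator</b> class is a hypothetical model written for this explanation: <b>p</b> is an index into a Python list instead of a <b>std::list</b> iterator, and the segments are plain <b>bytes</b> objects.</p>

```python
class SegmentIterator:
    """Toy stand-in for buffer::list::iterator over a list of byte segments."""

    def __init__(self, segments):
        self.ls = segments  # the list of segments being walked
        self.p = 0          # index of the current segment
        self.p_off = 0      # offset within the current segment
        self.off = 0        # offset as if the data was contiguous
        self._normalize()

    def _normalize(self):
        # Move to the next segment when the current one is exhausted.
        while self.p != len(self.ls) and self.p_off == len(self.ls[self.p]):
            self.p += 1
            self.p_off = 0

    def advance(self, count=1):
        for _ in range(count):
            if self.p == len(self.ls):
                raise IndexError('end of list')
            self.p_off += 1
            self.off += 1
            self._normalize()

    def current(self):
        return self.ls[self.p][self.p_off]


i = SegmentIterator([b'AB', b'C'])
i.advance(2)
print(i.p, i.p_off, i.off, chr(i.current()))
```

<p>After two steps the iterator has crossed into the second segment: <b>p_off</b> is back to zero while <b>off</b> keeps counting from the start of the list.</p>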
<p>The internal representation of the <b>buffer::list</b> can be <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L369">rebuilt</a> to use a single <b>buffer::raw</b>.</p>
<pre>
  {
    bufferlist bl;
    const std::string str(CEPH_PAGE_SIZE, 'X');
    bl.append(str.c_str(), str.size());
    bl.append(str.c_str(), str.size());
    EXPECT_EQ((unsigned)2, bl.buffers().size());
    bl.rebuild();
    EXPECT_EQ((unsigned)1, bl.buffers().size());
  }
</pre>
<p>It can also be rebuilt to only use <b>buffer::raw</b> that are <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L370">aligned on CEPH_PAGE_SIZE</a>.</p>
<pre>
  {
    bufferlist bl;
    {
      bufferptr ptr(CEPH_PAGE_SIZE + 1);
      ptr.set_offset(1);
      ptr.set_length(CEPH_PAGE_SIZE);
      bl.append(ptr);
    }
    EXPECT_EQ((unsigned)1, bl.buffers().size());
    EXPECT_FALSE(bl.is_page_aligned());
    bl.rebuild_page_aligned();
    EXPECT_TRUE(bl.is_page_aligned());
    EXPECT_EQ((unsigned)1, bl.buffers().size());
  }
</pre>
<p>The content of the <b>buffer::list</b> can be <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L418">read from a file</a> or <a href="https://github.com/ceph/ceph/blob/02a353e5940e003cfcdffc77920a6b518d581d95/src/include/buffer.h#L420">written to a file</a>, either with a file descriptor or a path.</p>
<pre>
{
  bufferlist bl;
  bl.append("ABC");
  EXPECT_EQ(0, bl.write_file("testfile"));
}
{
  std::string error;
  bufferlist bl;
  ::system("echo ABC > testfile");
  EXPECT_EQ(0, bl.read_file("testfile", &#038;error));
  EXPECT_EQ((unsigned)4, bl.length());
  std::string actual(bl.c_str(), bl.length());
  EXPECT_EQ("ABC\n", actual);
}
</pre>
]]></content:encoded>
			<wfw:commentRss>http://dachary.org/?feed=rss2&#038;p=1904</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Upstream University at the OpenStack summit</title>
		<link>http://dachary.org/?p=1846</link>
		<comments>http://dachary.org/?p=1846#comments</comments>
		<pubDate>Mon, 11 Feb 2013 08:29:25 +0000</pubDate>
		<dc:creator>Loic Dachary</dc:creator>
				<category><![CDATA[Upstream University]]></category>
		<category><![CDATA[openstack]]></category>

		<guid isPermaLink="false">http://dachary.org/?p=1846</guid>
		<description><![CDATA[What if contributing to OpenStack was made a lot easier by a few days of training? You could get this training at Upstream University, which was created shortly after the OpenStack design summit, in April 2012, with this sole goal &#8230; <a href="http://dachary.org/?p=1846">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://dachary.org/wp-uploads/2013/01/logo_small.png"><img src="http://dachary.org/wp-uploads/2013/01/logo_small.png" alt="Upstream University" title="logo_small" width="100" height="97" class="alignleft size-full wp-image-1864" /></a> What if contributing to <a href="http://openstack.org/">OpenStack</a> was made a lot easier by a few days of training? You could get this training at <a href="http://upstream-university.org/">Upstream University</a>, which was created shortly after the OpenStack design summit, in April 2012, with the sole goal of improving developers&#8217; contribution skills. Upstream University has since coached new OpenStack contributors from <a href="http://enovance.com/">eNovance</a> and <a href="http://cloudwatt.fr/">Cloudwatt</a>, as well as developers for the Linux kernel and many others. <a href="http://dachary.org/wp-uploads/2013/01/openstack.png"><img src="http://dachary.org/wp-uploads/2013/01/openstack.png" alt="OpenStack" title="openstack" width="216" height="216" class="alignright size-full wp-image-1867" /></a><br />
To celebrate its first year, Upstream University <a href="http://upstream-university.org/apply/">is organizing a session</a> in advance of the next <a href="http://www.openstack.org/summit/portland-2013/">OpenStack summit</a>, in Portland. If you can fly in two days ahead of the event to spend the weekend improving your OpenStack contribution skills, please consider <a href="http://upstream-university.org/apply/">submitting an application</a> to attend the workshop. This is a one-time offer for free training.<br />
<span id="more-1846"></span><br />
<a href="http://dachary.org/wp-uploads/2013/01/rms.jpg"><img src="http://dachary.org/wp-uploads/2013/01/rms.jpg" alt="Richard Stallman" title="rms" width="150" height="150" class="alignleft size-full wp-image-1880" /></a> Contributing to Free Software is easy. So easy, in fact, that we forget that it&#8217;s worth training for it.  It&#8217;s just like running, which is also easy, but which you&#8217;d better train for, if your plan is to go to the Olympics or get sponsored.  If you want to be an efficient and productive Free Software contributor, you need to be familiar with certain strategies and have certain communication skills. We are not all born hackers like Richard Stallman, but we can get inspiration from him.  With his active support, Upstream University has designed a unique training program that can benefit all developers.<br />
While a lecture about Free Software contribution could be entertaining or even inspirational,  there is no substitute for learning by example. In this workshop, each student enters the training program with an actual bug or blueprint from the OpenStack bug tracker&#8212;a problem that matters to them, either because their company has a deadline or because they like to solve problems. They are then assigned a simple task: to get their work accepted upstream.<br />
This is something that wouldn&#8217;t be possible with traditional training formats, as it often takes weeks for a contribution to get accepted. Instead the curriculum is divided into two parts: one live and one online. The first two days of training are live: a mentor explains how to approach the contribution, and advises the students on the best strategies. The highlight of the training is a simulation of agile methods applied to upstream contribution, involving a lot of Lego bricks. The picture below was taken during last week&#8217;s training for <a href="http://cloudwatt.fr/">Cloudwatt</a> developers; the kanban is visible on the paper board. The result was a French Riviera village with a beach, a café, and waves: a great source of inspiration when the temperature outside is below zero.<br />
<a href="http://dachary.org/wp-uploads/2013/01/libre-agile.jpg"><img src="http://dachary.org/wp-uploads/2013/01/libre-agile-1024x856.jpg" alt="Agile applied to Free Software contribution" title="libre-agile" width="640" height="535" class="alignleft size-large wp-image-1849" /></a><br />
The online mentoring that follows the live part is where the actual learning experience happens. The student meets with the mentor every other day to explain the progress of their contribution and discuss how to improve it. It is not only about making it easier for the developer to resolve problems, ranging from keeping a patch minimal to kindly  asking a reviewer for attention; it&#8217;s also about making life upstream easier, because contributions with a higher  quality standard mean less work for everyone else.<br />
Although Upstream University was originally designed to help companies improve their contribution skills and reduce the <a href="http://blogs.gnome.org/bolsh/2011/09/01/the-cost-of-going-it-alone/">cost of going it alone</a>, it was successfully integrated into the university at <a href="http://www.univ-littoral.fr/">ULCO</a> in 2012. OpenStack has plenty of low-hanging fruit that appeals to its students, who have all become registered contributors during the online mentoring sessions. Two universities will run Upstream University courses focused on OpenStack in 2013, and we are eager to provide training material if another opportunity presents itself.</p>
]]></content:encoded>
			<wfw:commentRss>http://dachary.org/?feed=rss2&#038;p=1846</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Chaining extended attributes in ceph</title>
		<link>http://dachary.org/?p=1893</link>
		<comments>http://dachary.org/?p=1893#comments</comments>
		<pubDate>Fri, 08 Feb 2013 12:00:57 +0000</pubDate>
		<dc:creator>Loic Dachary</dc:creator>
				<category><![CDATA[ceph]]></category>

		<guid isPermaLink="false">http://dachary.org/?p=1893</guid>
		<description><![CDATA[Ceph uses extended file attributes to store file metadata. It is a list of key / value pairs. Some file system implementations do not allow storing more than 2048 characters in the value associated with a key. To &#8230; <a href="http://dachary.org/?p=1893">Continue reading <span class="meta-nav">&#8594;</span></a>]]></description>
			<content:encoded><![CDATA[<p><a href="http://ceph.com/">Ceph</a> uses <a href="http://en.wikipedia.org/wiki/Extended_file_attributes">extended file attributes</a> to store file metadata. It is a list of <strong>key</strong> / <strong>value</strong> pairs. Some file system implementations do not allow storing more than <a href="https://github.com/ceph/ceph/blob/0db11c79c26a1f3a926f95475a1446f6044d9b87/src/os/chain_xattr.h#L12">2048 characters</a> in the <strong>value</strong> associated with a <strong>key</strong>. To overcome this limitation, Ceph implements <a href="https://github.com/ceph/ceph/blob/0db11c79c26a1f3a926f95475a1446f6044d9b87/src/os/chain_xattr.cc">chained extended attributes</a>.<br />
A <strong>value</strong> that is 5120 characters long will be stored in three separate attributes:</p>
<ul>
<li>user.<strong>key</strong> : first 2048 characters</li>
<li>user.<strong>key</strong>@1 : next 2048 characters</li>
<li>user.<strong>key</strong>@2 : last 1024 characters</li>
</ul>
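<p>The layout above can be sketched in a few lines of C++. This is only an illustration of the naming scheme, not the code from <strong>chain_xattr.cc</strong>: the names <strong>kChunkSize</strong>, <strong>chunk_name</strong> and <strong>split_value</strong> are hypothetical, and the sketch assumes that the attribute name contains no <strong>@</strong> that would need escaping.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical constant: the 2048-character limit quoted in the text.
const std::size_t kChunkSize = 2048;

// Hypothetical helper: the first chunk keeps the plain name,
// the following ones get an "@<index>" suffix ("user.key@1", ...).
std::string chunk_name(const std::string &name, std::size_t index) {
  if (index == 0)
    return name;
  return name + "@" + std::to_string(index);
}

// Hypothetical helper: split a value into (attribute name, chunk) pairs.
// An empty value yields no chunk at all in this sketch.
std::vector<std::pair<std::string, std::string>>
split_value(const std::string &name, const std::string &value,
            std::size_t chunk_size = kChunkSize) {
  assert(chunk_size > 0);
  std::vector<std::pair<std::string, std::string>> chunks;
  std::size_t index = 0;
  for (std::size_t pos = 0; pos < value.size(); pos += chunk_size, ++index) {
    // substr() clamps the length, so the last chunk may be shorter.
    chunks.emplace_back(chunk_name(name, index),
                        value.substr(pos, chunk_size));
  }
  return chunks;
}
```

<p>Calling <strong>split_value("user.key", std::string(5120, 'x'))</strong> yields three pairs named user.key, user.key@1 and user.key@2, holding 2048, 2048 and 1024 characters.</p>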
<p>The proposed <a href="https://github.com/ceph/ceph/pull/40">unit tests</a> may be used as documentation describing in detail how it behaves from the caller&#8217;s point of view.<br />
<span id="more-1893"></span></p>
<h3>coverage</h3>
<p>The <a href="https://github.com/ceph/ceph/pull/40">unit tests</a> cover most (more than 93%) of the <a href="https://github.com/ceph/ceph/blob/0db11c79c26a1f3a926f95475a1446f6044d9b87/src/os/chain_xattr.cc">chained extended attributes</a> implementation. The following functions are tested:</p>
<ul>
<li>int chain_getxattr(const char *fn, const char *name, void *val, size_t size);</li>
<li>int chain_fgetxattr(int fd, const char *name, void *val, size_t size);</li>
<li>int chain_setxattr(const char *fn, const char *name, const void *val, size_t size);</li>
<li>int chain_fsetxattr(int fd, const char *name, const void *val, size_t size);</li>
<li>int chain_listxattr(const char *fn, char *names, size_t len);</li>
<li>int chain_flistxattr(int fd, char *names, size_t len);</li>
<li>int chain_removexattr(const char *fn, const char *name);</li>
<li>int chain_fremovexattr(int fd, const char *name);</li>
</ul>
<h3>untested lines</h3>
<p>The function <strong>translate_raw_name</strong> substitutes <strong>@@</strong> with <strong>@</strong>. When the trailing character is a single <strong>@</strong>, it breaks out of the loop. However, such an occurrence cannot be created by <strong>chain_setxattr</strong> because it always creates pairs of <strong>@</strong>. Instead of silently breaking out of the loop, the function should probably return an error so that the caller can ignore it.</p>
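<p>A minimal sketch of what returning an error could look like is shown below. It is not the actual <strong>translate_raw_name</strong>: the function <strong>unescape_name</strong> and its signature are hypothetical, and the sketch assumes that a single <strong>@</strong> followed by any other character starts a chunk suffix such as <strong>@1</strong>, which ends the name.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: copy raw into *name, turning "@@" into "@".
// Returns false when raw ends with a dangling '@', so that the
// caller can decide to ignore the attribute.
bool unescape_name(const std::string &raw, std::string *name) {
  assert(name != nullptr);
  name->clear();
  for (std::size_t i = 0; i < raw.size(); ++i) {
    if (raw[i] == '@') {
      if (i + 1 >= raw.size())
        return false;       // trailing '@': malformed name
      if (raw[i + 1] != '@')
        return true;        // assumed chunk suffix ("@1"): name ends here
      ++i;                  // "@@": skip the escape, keep one '@'
    }
    name->push_back(raw[i]);
  }
  return true;
}
```

<p>With this sketch, user.a@@b is translated to user.a@b, user.key@1 to user.key, and user.key@ is reported as an error instead of being silently truncated.</p>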
<p>The function <strong>chain_fgetxattr_len</strong> may return an error if <strong>fgetxattr</strong> fails. However, it is only called after another attr function has returned success, and the tests cannot create the conditions under which it would fail.</p>
<p>The function <strong>chain_fsetxattr</strong> contains untested code that is being <a href="http://marc.info/?l=ceph-devel&#038;m=136027076615853&#038;w=4">discussed on the mailing list</a>.</p>
<h3>dependency on the underlying filesystem</h3>
<p>If the file system in which the tests are run does not support extended attributes, the tests are skipped. The detection uses the same logic as the one implemented in <strong>FileStore::_detect_fs</strong>. The tests should be run as part of teuthology to <a href="http://marc.info/?l=ceph-devel&#038;m=136015147332122&#038;w=4">check how they behave when used with the supported file systems</a>.</p>
<h3>minimizing the noise</h3>
<p>The output of the tests is silenced to reduce noise when testing assertions (except for the <strong>dout_emergency</strong> function, whose output cannot be controlled).</p>
]]></content:encoded>
			<wfw:commentRss>http://dachary.org/?feed=rss2&#038;p=1893</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
