Shattered Crystalline Matrixhttps://mithrandi.net/index.atom2018-05-19T08:14:05ZmithrandiNikolaBuild artifacts with pexhttps://mithrandi.net/blog/build-artifacts-with-pex/2018-05-19T08:01:03Z2018-05-19T08:01:03Zmithrandi<div><p>For Continuous Integration in my Python application (as opposed to library)
projects, I generally want to run my tests as well as build a Docker image,
making use of the same artifacts and environment for both: I want to test what
I'm actually deploying, rather than something merely similar. Previously this meant running <tt class="docutils literal">pip install</tt> twice: once into a virtualenv to run the tests, then again in the Docker image build. Sharing the pip cache between these steps speeds things up a lot, but a fair amount of time still goes to network round trips and dependency resolution.</p>
<p>Now that <a class="reference external" href="https://github.com/pantsbuild/pex/issues/222#issuecomment-389387644">pex works with PyPy</a>, I have developed a slightly better workflow
for this. Briefly, <a class="reference external" href="https://github.com/pantsbuild/pex">Pex</a> is a tool for assembling a Python application
into a single runnable file that embeds the application's dependencies; at
runtime, the dependencies are imported directly from the zip archive or extracted to a temporary location as
necessary to run the application.</p>
<p>The workflow:</p>
<ol class="arabic simple">
<li><a class="reference external" href="https://github.com/fusionapp/fusion-index/blob/b607b5359ad70da3eb4e8a360efd2d7e6b307940/.drone.yml#L23-L33">Build a pex file</a>.</li>
<li><a class="reference external" href="https://github.com/fusionapp/fusion-index/blob/b607b5359ad70da3eb4e8a360efd2d7e6b307940/.drone.yml#L34">Run the tests</a> against the pex file.</li>
<li><a class="reference external" href="https://github.com/fusionapp/fusion-index/blob/master/Dockerfile#L3">Copy the pex</a> into the Docker image.</li>
</ol>
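As a rough sketch, those three steps might look like the following in a CI script. The application name, entry point, and paths here are hypothetical, not taken from the linked configuration:

```shell
# 1. Build a pex file embedding the application and all of its dependencies
#    (-e names a made-up entry point; -o sets the output file).
pex . -r requirements.txt -e myapp.main -o dist/myapp.pex

# 2. Run the tests against the artifact itself; PEX_MODULE overrides the
#    built-in entry point, assuming pytest was included in the dependencies.
PEX_MODULE=pytest ./dist/myapp.pex tests/

# 3. The Dockerfile then only needs to copy the pex in, e.g.:
#      COPY dist/myapp.pex /usr/local/bin/myapp
#      ENTRYPOINT ["/usr/local/bin/myapp"]
docker build --tag=myorg/myapp .
```

The point of step 2 is that the bits under test are byte-for-byte the bits that end up in the image.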
<p>This is similar to what I would do with a language like Go or Haskell that
would produce a single executable as part of the build process.</p></div>Adventures in deployment, reduxhttps://mithrandi.net/blog/adventures-in-deployment-redux/2018-01-19T21:52:04Z2018-01-19T21:52:04Zmithrandi<div><p>Just a brief update: I've moved on from the hackish deployment strategy
described in an <a class="reference external" href="https://mithrandi.net/blog/2015/06/adventures-in-deployment-with-propellor-docker-and-fabric/">older post</a>; we now have <a class="reference external" href="http://docs.drone.io/">Drone</a> for CI, building Docker images,
pushing them to a registry, and deploying them on <a class="reference external" href="https://rancher.com/">Rancher</a>. You can see an
example of the <a class="reference external" href="https://github.com/fusionapp/clj-documint/blob/e254a1ab1eeba58f53d815d7a16514179d6fe57b/.drone.yml">Drone pipeline</a> to do this in our GitHub repos.</p>
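A Drone pipeline along these lines might be sketched as below. The step images, registry, and service names are placeholders (the Rancher step uses a community plugin); the real pipeline is in the linked repo:

```yaml
pipeline:
  build:
    image: docker:17.05
    commands:
      - docker build --tag=registry.example.com/myorg/myapp .
  publish:
    image: plugins/docker
    repo: registry.example.com/myorg/myapp
    tags:
      - ${DRONE_COMMIT_SHA}
  deploy:
    image: peloton/drone-rancher
    url: https://rancher.example.com
    service: myapp/myapp
    docker_image: registry.example.com/myorg/myapp:${DRONE_COMMIT_SHA}
```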
<p>This is all I'm going to say on the topic, because that's all there really is
to it. If you're using Kubernetes, I strongly suggest you look into <a class="reference external" href="https://github.com/weaveworks/flux">Weave Flux</a>
for deployment; this is eventually where I want to end up, but migrating to
Kubernetes is hard for us at the moment.</p></div>Running away againhttps://mithrandi.net/blog/running-away-again/2016-11-10T23:12:40Z2016-11-10T23:12:40Zmithrandi<div><p>[I originally posted this on Facebook; I've disabled comments here because I
don't think I can cope with them over here on my blog, but if you really need
to comment you can find me on Facebook or elsewhere]</p>
<p>I've run away several times in my life.</p>
<p>The very first time was as a teenager; I don't even recall exactly what it was
that triggered it; I think it was anxiety over a compulsory school camp that I
didn't want to attend. I didn't have a plan, I didn't even have any real
intention to actually run away for real, I just felt like I was trapped and
needed to escape / lash out somehow. "Running away" amounted to declaring my
intent, leaving the house, hiding somewhere nearby while everyone frantically
searched for me, and eventually returning a few hours later once I had calmed
down a bit. Of course, at the time, I wasn't really capable of fully
understanding the situation; I didn't understand the destructive mental paths
that lead to being unable to cope, I didn't really care how much emotional
stress I was causing to others, and I didn't realise that had I been able to
communicate better about how I was feeling, I could probably have dealt with
things much better. In short: I was wrong to run away this time. Then again, I
was just a teenager, so I think I can be forgiven for failing at those things.</p>
<p>Later on in life, running away took on a more sophisticated form. Once I gained
sufficient control over my life (both in the legal and the financial sense), I
took steps to cut certain people out of it. I have few regrets about the
process (which was staggered, not something I did all at once); some
relationships and interactions are simply not healthy or beneficial for the
parties involved, and not having to deal with the constant stress of those
interactions definitely improved my mental health and my life in general by
quite a bit. In short: I think I was right to run away this time. But in the
end, it was still running away; I couldn't cope with the situation, so I chose
not to cope with it.</p>
<p>And now…recent events surrounding the 2016 US elections are slowly producing
that rising sense of panic and inability to cope again. The elections
themselves are not the trigger; rather, it is the conversations they have
sparked which have led me to realise certain things about relatives,
acquaintances, and friends. As most of you probably know, I'm no stranger to
intellectual debate / conflict; this isn't about that. I can quite happily
handle interactions with someone I disagree with on politics, economics, or
what have you; most of the time, I'm happy to discuss / argue the matter for
hours, unpacking our disagreement into differences in assumptions vs.
differences in reasoning, and ending with a better understanding of both sides
even if my mind remains unchanged.</p>
<p>But this is different: this is a conflict over whether or not we should care
about other people. Caring about other people is something I learned as an
incontrovertible truth from childhood, not a selective privilege to be extended
to only some and not others. I now realise that many around me do not feel the
same way; they feel that some people are not deserving of their care, for
whatever reason, and…I don't know how to deal with this.</p>
<p>I feel like running away this time would be a mistake, that I need to stay and
engage with these people, attempt to change their minds; but I have no idea how
to do this, and I fear the poisoning of either my own mind, or my
relationships, as a result of this. I fear that to maintain these relationships
as-is, with no condemnation, will result in sending the message that I support
something I do not. I fear that such condemnation would inevitably lead to the
destruction of those relationships anyway, accomplishing nothing good in the
end. I fear that by running away, I am losing the opportunity to do something
about all of this. I feel guilty because of my fear; there are others who do
not even have the option of running away, others who are the direct target of
the uncaring and the hatred, rather than mere bystanders who can leave at any
time. How much worse must their fear be than mine?</p>
<p>And so here I am, trapped in a seemingly endless mental loop, afraid to run and
afraid not to run.</p></div>Progress on txacmehttps://mithrandi.net/blog/progress-on-txacme/2016-08-15T17:40:28Z2016-08-15T17:40:28Zmithrandi<div><p>The initial release of <a class="reference external" href="https://github.com/mithrandi/txacme">txacme</a> was perhaps a quieter affair than I was hoping
for, but the response has been enthusiastic from the few people that did take
notice. Work has been progressing on the next release (which will be 0.9.1; I'm
not at all ready to pronounce the API fully baked yet), which will have a
number of minor improvements such as better documentation, but the headline
feature is probably <tt class="docutils literal"><span class="pre">dns-01</span></tt> challenge support.</p>
<p>There is currently a <a class="reference external" href="https://github.com/mithrandi/txacme/issues/59">PR</a> up with a <a class="reference external" href="https://libcloud.apache.org/">libcloud</a>-based implementation that
works <a class="footnote-reference" href="https://mithrandi.net/blog/progress-on-txacme/#id3" id="id1">[1]</a>; I also have plans for a <a class="reference external" href="https://github.com/mithrandi/txacme/issues/62">twistd plugin</a> to run a standalone
issuing service, but I'm not yet sure if this will make it into <a class="reference external" href="https://github.com/mithrandi/txacme/milestone/2">0.9.1</a>.</p>
<p>Implementing a <tt class="docutils literal"><span class="pre">dns-01</span></tt> challenge responder revealed some deficiencies in the
<tt class="docutils literal">IResponder</tt> interface that made it necessary to expand it; this is exactly
the sort of API change that I was expecting to crop up, and one of the reasons
why I'm holding off on a 1.0.0 release for the time being. My guess is that
I'll make it to at least 0.9.5 or thereabouts before 1.0.0 comes out.</p>
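For reference, the record a <tt class="docutils literal"><span class="pre">dns-01</span></tt> responder must publish is fully determined by the ACME key authorization: a TXT record under the <tt class="docutils literal"><span class="pre">_acme-challenge</span></tt> label whose value is the unpadded URL-safe base64 encoding of the SHA-256 digest of the key authorization. A minimal sketch of that computation (this is illustrative, not txacme's actual code):

```python
import hashlib
from base64 import urlsafe_b64encode

def dns01_txt_record(domain, key_authorization):
    """Compute the (name, value) pair for a dns-01 challenge TXT record.

    The value is the unpadded URL-safe base64 encoding of the SHA-256
    digest of the key authorization, per the ACME specification.
    """
    digest = hashlib.sha256(key_authorization.encode('ascii')).digest()
    value = urlsafe_b64encode(digest).rstrip(b'=').decode('ascii')
    return u'_acme-challenge.' + domain, value
```

Whatever shape the responder interface ends up with, it ultimately has to get this pair into the zone before validation and remove it afterwards, which is why DNS driver quirks (like the Route 53 formatting issue below) matter so much.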
<p class="rubric">Footnotes</p>
<table class="docutils footnote" frame="void" id="id3" rules="none">
<colgroup><col class="label"><col></colgroup>
<tbody valign="top">
<tr><td class="label"><a class="fn-backref" href="https://mithrandi.net/blog/progress-on-txacme/#id1">[1]</a></td><td>Well... mostly. libcloud does synchronous network I/O so it needs to be
invoked in a thread pool, and some of the DNS drivers are buggy (e.g. the
Route 53 driver does not format TXT records how Route 53 expects them),
but I have a passing integration test using Rackspace Cloud DNS!</td></tr>
</tbody>
</table></div>WordPress to Nikolahttps://mithrandi.net/blog/wordpress-to-nikola/2016-07-24T17:35:33Z2016-07-24T17:35:33Zmithrandi<div><p>As you may or may not have noticed, my blog looks substantially different now.
This is because I have migrated from <a class="reference external" href="https://wordpress.org/">WordPress</a> to <a class="reference external" href="https://getnikola.com/">Nikola</a>. Using the
WordPress importer worked out reasonably well, although I needed to fix up the
code blocks myself. Going forward I'm just going to use the standard ReST
format for writing posts.</p>
<p>I'm also hosting the site on S3 with CloudFront in front, rather than a dinky
Linode, so the site should be a lot faster now. However, there's probably some
stuff broken that I didn't notice; please yell if you see anything.</p></div>Adventures in deployment with Propellor, Docker, and Fabrichttps://mithrandi.net/blog/2015/06/adventures-in-deployment-with-propellor-docker-and-fabric/2015-06-01T22:27:00Z2015-06-01T22:27:00Zmithrandi<div><h2>Preface</h2>
<p>After playing around with Docker a bit, I decided that it would make an ideal deployment platform for my work services (previously we were using some ad-hoc isolation based on unix users and not much else). While Docker’s security is…suspect…compared to a complete virtualization solution (see Xen), I’m less concerned with complete isolation between my services than with things like easy resource limits and imaging. You can build this yourself out of cgroups, chroot, etc., but in the end you’re just reinventing the Docker wheel, so I went with Docker instead.</p>
<p>However, Docker by itself is not a complete solution. You still need some way to configure the Docker host, and you also need to build Docker images, so I added Propellor (which I recently discovered) and Fabric to the mix.</p>
<h2>Propellor</h2>
<p><a href="http://propellor.branchable.com/">Propellor</a> is a configuration management system (in the sense of Puppet, Chef, Salt, et al.) written in Haskell, where your configuration itself is Haskell code. For someone coming from a systems administration background, the flexibility and breadth offered by a real programming language like Haskell may be quite daunting, but as a programmer, I find it far more straightforward to just write code that does what I want, extracting common pieces into functions and so on. Our previous incarnation of things used Puppet for configuration management, but it always felt very awkward to work with; another problem is that Puppet was introduced after a bunch of the infrastructure was in place, meaning a lot of things were not actually managed by Puppet because somebody forgot. Propellor was used to configure a new server from scratch, ensuring that nothing was done ad-hoc, and while I won’t go into too much detail about Propellor, I am liking it a lot so far.</p>
<p>The role of Propellor in the new order is to configure things to provide the expected platform. This includes installing Docker, installing admin user accounts, SSH keys, groups, and so on.</p>
<h2>Docker</h2>
<p>The Docker workflow I adopted is based on the one <a href="https://glyph.twistedmatrix.com/2015/03/docker-deploy-double-dutch.html">described by Glyph</a>. I would strongly recommend you go read his excellent post for the long explanation, but the short version is that instead of building a single container image, you instead build three: A “build” container used to produce the built artifacts from your sources (eg. Python wheels, Java/Clojure JARs), a “run” container which is built by installing the artifacts produced by running the “build” container, and thus does not need to contain your toolchain and -dev packages (keeping the size down), and a “base” container which contains the things shared by the “build” and “run” containers, allowing for even more efficiency of disk usage.</p>
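To make the three-image split concrete, the Dockerfiles might look something like the sketch below. The image and package names are invented for illustration; see Glyph's post and the linked repo for the real thing:

```dockerfile
# base.docker — everything shared by build and run: interpreter + runtime libs
FROM debian:jessie
RUN apt-get update && apt-get install -y --no-install-recommends python2.7 python-pip

# build.docker — FROM the base image, adds the toolchain; *running* this
# container (with source and wheelhouse volumes mounted) produces the wheels
FROM myorg/myapp-base
RUN apt-get update && apt-get install -y --no-install-recommends build-essential python-dev
CMD ["pip", "wheel", "--wheel-dir=/wheelhouse", "/application"]

# run.docker — FROM the base image again, installs only the built artifacts,
# so the toolchain never ends up in the deployed image
FROM myorg/myapp-base
COPY wheelhouse /wheelhouse
RUN pip install --no-index --find-links=/wheelhouse myapp
CMD ["myapp"]
```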
<p>While I can’t show the Docker bits for our proprietary codebases, you can see the bits for <a href="https://github.com/fusionapp/documint/tree/master/docker">one of our free software codebases</a>, including instructions for building and running the images. The relative simplicity of the <code>.docker</code> files is no accident; rather than trying to shoehorn any complex build processes into the Docker image build, all of the heavy lifting is done by standard build and install tools (in the case of Documint: apt/dpkg, pip, and setuptools). Following this principle will save you a lot of pain and tears.</p>
<h2>Fabric</h2>
<p>The steps outlined for building the Docker images are relatively straightforward, but copy/pasting shell command lines from a README into a terminal is still not a great experience. In addition, our developers are typically working from internet connections where downloading multi-megabyte Docker images / packages / etc. is a somewhat tedious experience, and uploading the resulting images is ten times worse (literally ten times worse; my connection at home is 10M down / 1M up ADSL, for example). Rather than doing this locally, this should instead run on one of our servers which has much better connectivity and a stable / well-defined platform configuration (thanks to Propellor). So now the process would be “copy/paste shell command lines from a README into an ssh session” — no thanks. (For comparison, our current deployment processes use some ad-hoc shell scripts lying around on the relevant servers; a bit better than copy/pasting into an ssh session, but not by much.)</p>
<p>At this point, <a href="http://blog.froztbyte.net/">froztbyte</a> reminded me of Fabric (which I knew about previously, but hadn’t thought of in this context). So instead I wrote some fairly simple Fabric tasks to automate the process of building new containers, and also deploying them. For final production use, I will probably be setting up a scheduled task that automatically deploys from our “prod” branch (much like our current workflow does), but for testing purposes, we want a deploy to happen whenever somebody merges something into our testing release branch (eventually I’d like to deploy test environments on demand for separate branches, but this poses some challenges which are outside of the scope of this blog post). I could build some automated deployment system triggered by webhooks from BitBucket (where our private source code is hosted), but since everyone with access to merge things into that branch also has direct SSH access to our servers, Fabric was the easiest solution; no need to add another pile of moving parts to the system.</p>
<p>My Fabric tasks look like this (censored slightly to remove hostnames):</p>
<div class="code-unformatted"><pre class="code literal-block">from fabric.api import cd, hosts, run, settings


@hosts('my.uat.host')
def build_uat_documint():
    with settings(warn_only=True):
        if run('test -d /srv/build/documint').failed:
            run('git clone --quiet -- https://github.com/fusionapp/documint.git /srv/build/documint')
    with cd('/srv/build/documint'):
        run('git pull --quiet')
        run('docker build --tag=fusionapp/documint-base --file=docker/base.docker .')
        run('docker build --tag=fusionapp/documint-build --file=docker/build.docker .')
        run('docker run --rm --tty --interactive --volume=&quot;/srv/build/documint:/application&quot; --volume=&quot;/srv/build/documint/wheelhouse:/wheelhouse&quot; fusionapp/documint-build')
        run('cp /srv/build/clj-neon/src/target/uberjar/clj-neon-*-standalone.jar bin/clj-neon.jar')
        run('docker build --tag=fusionapp/documint --file=docker/run.docker .')


@hosts('my.uat.host')
def deploy_uat_documint():
    with settings(warn_only=True):
        run('docker stop --time=30 documint')
        run('docker rm --volumes --force documint')
        run('docker run --detach --restart=always --name=documint --publish=8750:8750 fusionapp/documint')
</pre></div>
<p>Developers can now deploy a new version of Documint (for example) by simply running <code>fab build_uat_documint deploy_uat_documint</code>. Incidentally, the unit tests are run during the container build (from the <code>.docker</code> file), so deploying a busted code version by accident shouldn’t happen.</p></div><div><h2>Preface</h2>
<p>After playing around with Docker a bit, I decided that it would make an ideal deployment platform for my work services (previously we were using some ad-hoc isolation using unix users and not much else). While Docker’s security is…suspect…compared to a complete virtualization solution (see Xen), I’m not so much worried about complete isolation between my services, as things like easy resource limits and imaging. You can build this yourself out of cgroups, chroot, etc. but in the end you’re just reinventing the Docker wheel, so I went with Docker instead.</p>
<p>However, Docker by itself is not a complete solution. You still need some way to configure the Docker host, and you also need to build Docker images, so I added Propellor (which I recently discovered) and Fabric to the mix.</p>
<h2>Propellor</h2>
<p><a href="http://propellor.branchable.com/">Propellor</a> is a configuration management system (in the sense of Puppet, Chef, Salt, et al.) written in Haskell, where your configuration itself is Haskell code. For someone coming from a systems administration background, the flexibility and breadth offered by a real programming language like Haskell may be quite daunting, but as a programmer, I find it far more straightforward to just write code that does what I want, extracting common pieces into functions and so on. Our previous incarnation of things used Puppet for configuration management, but it always felt very awkward to work with; another problem is that Puppet was introduced after a bunch of the infrastructure was in place, meaning a lot of things were not actually managed by Puppet because somebody forgot. Propellor was used to configure a new server from scratch, ensuring that nothing was done ad-hoc, and while I won’t go into too much detail about Propellor, I am liking it a lot so far.</p>
<p>The role of Propellor in the new order is to configure things to provide the expected platform. This includes installing Docker, installing admin user accounts, SSH keys, groups, and so on.</p>
<h2>Docker</h2>
<p>The Docker workflow I adopted is based on the one <a href="https://glyph.twistedmatrix.com/2015/03/docker-deploy-double-dutch.html">described by Glyph</a>. I would strongly recommend you go read his excellent post for the long explanation, but the short version is that instead of building a single container image, you instead build three: A “build” container used to produce the built artifacts from your sources (eg. Python wheels, Java/Clojure JARs), a “run” container which is built by installing the artifacts produced by running the “build” container, and thus does not need to contain your toolchain and -dev packages (keeping the size down), and a “base” container which contains the things shared by the “build” and “run” containers, allowing for even more efficiency of disk usage.</p>
<p>While I can’t show the Docker bits for our proprietary codebases, you can see the bits for <a href="https://github.com/fusionapp/documint/tree/master/docker">one of our free software codebases</a>, including instructions for building and running the images. The relative simplicity of the <code>.docker</code> files is no accident; rather than trying to shoehorn any complex build processes into the Docker image build, all of the heavy lifting is done by standard build and install tools (in the case of Documint: apt/dpkg, pip, and setuptools). Following this principle will save you a lot of pain and tears.</p>
<h2>Fabric</h2>
<p>The steps outlined for building the Docker images are relatively straightforward, but copy/pasting shell command lines from a README into a terminal is still not a great experience. In addition, our developers are typically working from internet connections where downloading multi-megabyte Docker images / packages / etc. is a somewhat tedious experience, and uploading the resulting images is ten times worse (literally ten times worse; my connection at home is 10M down / 1M up ADSL, for example). Rather than doing this locally, this should instead run on one of our servers which has much better connectivity and a stable / well-defined platform configuration (thanks to Propellor). So now the process would be “copy/paste shell command lines from a README into an ssh session” — no thanks. (For comparison, our current deployment processes use some ad-hoc shell scripts lying around on the relevant servers; a bit better than copy/pasting into an ssh session, but not by much.)</p>
<p>At this point, <a href="http://blog.froztbyte.net/">froztbyte</a> reminded me of Fabric (which I knew about previously, but hadn’t thought of in this context). So instead I wrote some fairly simple Fabric tasks to automate the process of building new containers, and also deploying them. For final production use, I will probably be setting up a scheduled task that automatically deploys from our “prod” branch (much like our current workflow does), but for testing purposes, we want a deploy to happen whenever somebody merges something into our testing release branch (eventually I’d like to deploy test environments on demand for separate branches, but this poses some challenges which are outside of the scope of this blog post). I could build some automated deployment system triggered by webhooks from BitBucket (where our private source code is hosted), but since everyone with access to merge things into that branch also has direct SSH access to our servers, Fabric was the easiest solution; no need to add another pile of moving parts to the system.</p>
<p>My Fabric tasks look like this (censored slightly to remove hostnames):</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span>@hosts('my.uat.host')
def build_uat_documint():
    with settings(warn_only=True):
        if run('test -d /srv/build/documint').failed:
            run('git clone --quiet -- https://github.com/fusionapp/documint.git /srv/build/documint')
    with cd('/srv/build/documint'):
        run('git pull --quiet')
        run('docker build --tag=fusionapp/documint-base --file=docker/base.docker .')
        run('docker build --tag=fusionapp/documint-build --file=docker/build.docker .')
        run('docker run --rm --tty --interactive --volume=&quot;/srv/build/documint:/application&quot; --volume=&quot;/srv/build/documint/wheelhouse:/wheelhouse&quot; fusionapp/documint-build')
        run('cp /srv/build/clj-neon/src/target/uberjar/clj-neon-*-standalone.jar bin/clj-neon.jar')
        run('docker build --tag=fusionapp/documint --file=docker/run.docker .')


@hosts('my.uat.host')
def deploy_uat_documint():
    with settings(warn_only=True):
        run('docker stop --time=30 documint')
        run('docker rm --volumes --force documint')
    run('docker run --detach --restart=always --name=documint --publish=8750:8750 fusionapp/documint')
</pre></div>
<p>Developers can now deploy a new version of Documint (for example) by simply running <code>fab build_uat_documint deploy_uat_documint</code>. Incidentally, the unit tests are run during the container build (from the <code>.docker</code> file), so deploying a busted code version by accident shouldn’t happen.</p></div>Axiom benchmark results on PyPy 2.5.0https://mithrandi.net/blog/2015/03/axiom-benchmark-results-on-pypy-2-5-0/2015-03-07T13:22:56Z2015-03-07T13:22:56Zmithrandi<div><p>This is a followup to a post I made about 1.5 years ago, <a href="https://mithrandi.net/blog/2013/08/axiom-benchmark-results-on-pypy/">benchmarking Axiom on PyPy 2.1.0</a>. Not too much has changed in Axiom since then (we fixed two nasty bugs that mainly affected PyPy, but I don’t expect those changes to have had much impact on performance), but PyPy (now at 2.5.0) has had plenty of work done on it since then, so let’s see what that means for Axiom performance!</p>
<p>Unlike my previous post, I’m basically just going to show the results here without much commentary:</p>
<p><img src="https://docs.google.com/a/mithrandi.net/spreadsheet/oimg?key=0Aqhx69EOZxPFdGFfbjUzQldhUlpwYWJDUTlfdFAxVXc&oid=4&zx=m1u084kzaksu" alt="Graph of Axiom performance"></p>
<p>A few notes:</p>
<ul>
<li>I didn’t redo the old benchmark results, but the hardware/software I ran the benchmarks on is not significantly different, so I think the results are still valid as far as broad comparisons go (as you can see, the CPython results match fairly closely).</li>
<li>The benchmark harness I’m using now is improved over the last time, using some statistical techniques to determine how long to run the benchmark, rather than relying on some hardcoded values to achieve JIT warmup and performance stability. Still could use some work (eg. outputting a kernel density estimate / error bars, rather than just a single mean time value).</li>
<li>There is one new benchmark relative to the last time, <code>powerup-loading</code>; PyPy really shines here, cutting out a ton of overhead. There’s still room for a few more benchmarks of critical functions such as actually running and loading query results (as opposed to just constructing query objects).</li>
<li>The branch I used to run these benchmarks is <a href="https://github.com/twisted/axiom/tree/42-better-benchmarks">available on Github</a>.</li>
<li>The horizontal axis is cut off at 1.0 so you can’t actually see how <code>store-opening</code> lines up, but the raw data shows that PyPy 2.1.0 took about 53% longer on this benchmark, while PyPy 2.5.0 only takes about 2% longer.</li>
</ul></div>BitBucket migrationhttps://mithrandi.net/blog/2013/09/bitbucket-migration/2013-09-28T22:59:42Z2013-09-28T22:59:42Zmithrandi<div><p>At work, we are currently using Launchpad for project hosting of our
proprietary codebase. Launchpad charges $250/year/project for hosting of
proprietary projects which is a little steep, and Launchpad/bzr has been
falling behind the alternatives in terms of tooling / development / support, so
when our Launchpad subscription came up for renewal at the beginning of the
month, this caused our somewhat vague plans to switch to something else to
crystallize.</p>
<p>I initially assumed that Github/git would be the obvious way to go, but after
looking into BitBucket/hg, I was pleasantly surprised to discover that things
like hosted CI were available there too. Nobody on our team is much of a git
enthusiast to begin with, so using hg seemed like a far more attractive option.
This meant figuring out how to do two things: 1) migrate all of our existing
bugs, and 2) migrate our existing bzr branches to hg. The former proved to be
relatively straightforward: <a class="reference external" href="http://jonathan.jsphere.com/">Jonathan</a> wrote a script that used launchpadlib to
access the Launchpad API, retrieve the bug data + metadata, and write it out in
the BitBucket import format (more on this in another post, or on Jonathan's
blog, depending on if I can convince him to write it up or not).</p>
<p>The bzr to hg conversion turned out to be a little more complex. A simple "hg
convert" of our trunk branch worked surprisingly well; the trunk history
converted correctly (at least as far as I could tell), but more (pleasantly)
surprisingly, the branches which were merged into trunk were also reconstructed
by the conversion, along with the merges. The conversion relies on the bzr
branch nick; this works somewhat like hg branches (it's associated with a
commit at the time that you commit), but as bzr does not place as much
importance on this label as hg, it is more likely to be wrong by accident (by
default the branch nick is just taken from the last component of the path to
the branch you are committing in, I believe, and in our case I suspect nobody
has ever set the branch nick manually). Among other things, this resulted in 4
different branch names for "trunk", as well as some other oddities like
misspelled feature branch names.</p>
<p>(As an aside, I'd like to mention that <tt class="docutils literal">hg log</tt> has far more utility than
<tt class="docutils literal">bzr log</tt>, mostly due to the "revsets" feature. Almost all of the inspection
I did while debugging the conversion was done on the converted repo using hg,
not on the original bzr repo, simply because it was far easier to get the
information that way.)</p>
<p>A "branchmap" file solved the problem with differing branches; mapping the
different names for "trunk" to "default" made the revision history graph look a
lot more reasonable than when I originally did the conversion. I also switched
to using <tt class="docutils literal"><span class="pre">--datesort</span></tt> for the conversion at this point; the documentation
warns that this may produce a much larger repository than <tt class="docutils literal"><span class="pre">--branchsort</span></tt>
(the default), but in my case, the size difference was trivial. I suspect this may
only apply in scenarios with back-and-forth merges between long-lived branches,
rather than the short-lived topic branches that form the majority of our
workflow. I also created an "authormap" file at this point to reconcile
differing author identities over the history of our repository. The bzr author
identity is a full name/email (eg. "Tristan Seligmann
<<a class="reference external" href="mailto:mithrandi@mithrandi.net">mithrandi@mithrandi.net</a>>"), but again, there were various historical oddities
here; BitBucket also has the ability to map author identities to BitBucket
users, but I decided normalizing during the conversion was a good idea anyway.</p>
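<p>For reference, here is a sketch of what such map files look like (the branch names below are hypothetical, purely for illustration): each line of a <tt class="docutils literal">branchmap</tt> maps a source branch name to a destination branch name.</p>

```
trunk default
Fusion default
fix-speling fix-spelling
```

<p>An <tt class="docutils literal">authormap</tt> instead uses lines of the form <tt class="docutils literal">source = destination</tt>, eg. <tt class="docutils literal">mithrandi = Tristan Seligmann &lt;mithrandi@mithrandi.net&gt;</tt>.</p>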
<p>The biggest problem I had to deal with (although this was actually one of the
first problems I noticed) was that all of these merged branches were still
open. Mercurial has the concept of "open" and "closed" branches, with closed
branches being hidden by default in most places. A "closed" branch is simply
one whose head revision is marked as closing the branch; which of course, none
of my branches had, due to being converted from bzr which does not have an
equivalent concept. Committing a closing revision to each branch was simple
enough to script, but that only led to more difficulties: 1) a gigantic pile
of noise revisions in the history, and 2) a bunch of dangling heads as the new
"close" revision was not part of the merge to trunk. Scripting a merge of all
of the dangling heads would have produced even more noise, so I looked for a
different solution.</p>
<p>Eventually I ran across a <a class="reference external" href="http://www.selenic.com/pipermail/mercurial-devel/2013-January/047450.html">patch</a> on the Mercurial mailing list; unfortunately
the thread in which it was posted never went anywhere, but the patch still
worked. What this patch allowed me to do was after the initial conversion, run
another hg-to-hg conversion in which I marked the last branch revision before
the merge to trunk as closing the branch. The full conversion process now
looked something like this:</p>
<pre class="code bash"><a name="rest_code_51dcd4538b274beca1e36dc5ef2c9eee-1"></a>hg convert --datesort --branchmap branchmap --authormap authormap Fusion Fusion-hg-unspliced
<a name="rest_code_51dcd4538b274beca1e36dc5ef2c9eee-2"></a><span class="nb">cd</span> Fusion-hg-unspliced
<a name="rest_code_51dcd4538b274beca1e36dc5ef2c9eee-3"></a>hg log --template <span class="s2">"{node} close\n"</span> -r <span class="s2">"head() and not branch(default)"</span> > ../splicemap
<a name="rest_code_51dcd4538b274beca1e36dc5ef2c9eee-4"></a><span class="nb">cd</span> ..
<a name="rest_code_51dcd4538b274beca1e36dc5ef2c9eee-5"></a><span class="nv">PYTHONPATH</span><span class="o">=</span><span class="nv">$HOME</span>/hg-patched python <span class="nv">$HOME</span>/hg-patched/hg convert --splicemap splicemap Fusion-hg-unspliced Fusion-hg
</pre><p>This was good enough for a trunk conversion, but what about open branches that
aren't yet merged into trunk? We could have held off until we were able to
merge all of these branches, but that seemed like a lot of work (although we
did merge as many outstanding branches as possible). Fortunately hg convert can
operate in an incremental way; during the conversion, the mapping from source
revs to destination revs is stored in dest/.hg/shamap; the only wrinkle was my
two-stage conversion process. What I needed was a way to map the original bzr
revisions to the hg revisions in the <em>second</em> repository. In order to
accomplish this, I wrote a small Python script to merge the two mappings:</p>
<script src="https://gist.github.com/a71bffcdc4e42496e39a6a659aa1fe7f.js"></script><noscript><pre class="literal-block">
import sys

revmap2 = {}
with open(sys.argv[2]) as fMap2:
    for line in fMap2:
        a, b = line.rstrip('\n').split(' ')
        revmap2[a] = b
with open(sys.argv[1]) as fMap1:
    for line in fMap1:
        a, b = line.rstrip('\n').split(' ')
        print '%s %s' % (a, revmap2[b])
</pre>
</noscript><p>With the help of this script, I could now convert other branches:</p>
<pre class="code bash"><a name="rest_code_12a10c7df7e2413db6528e2c8eeaba56-1"></a><span class="c1"># Only take the bzr revisions</span>
<a name="rest_code_12a10c7df7e2413db6528e2c8eeaba56-2"></a>grep <span class="s1">'@'</span> Fusion-hg-unspliced/.hg/shamap > shamap
<a name="rest_code_12a10c7df7e2413db6528e2c8eeaba56-3"></a>python mergemaps.py shamap Fusion-hg/.hg/shamap > shamap-spliced
<a name="rest_code_12a10c7df7e2413db6528e2c8eeaba56-4"></a>mv shamap-spliced Fusion-hg/.hg/shamap
<a name="rest_code_12a10c7df7e2413db6528e2c8eeaba56-5"></a>
<a name="rest_code_12a10c7df7e2413db6528e2c8eeaba56-6"></a><span class="c1"># Now let's convert a branch</span>
<a name="rest_code_12a10c7df7e2413db6528e2c8eeaba56-7"></a>hg convert --branchmap branchmap --authormap authormap Fusion-some-branch Fusion-hg
</pre><p>In summary, the process, while hardly trivial, worked out a lot better than I
had initially expected.</p>
<p>EDIT: I forgot to mention in the original draft: We first started thinking
about moving away from Launchpad at the beginning of September, and completed
the migration in the last week, so the entire process took us less than a month
of part-time discussion / work.</p></div>Axiom benchmark results on PyPyhttps://mithrandi.net/blog/2013/08/axiom-benchmark-results-on-pypy/2013-08-04T05:01:17Z2013-08-04T05:01:17Zmithrandi<div><p><strong>EDIT:</strong> <a href="https://mithrandi.net/blog/2015/03/axiom-benchmark-results-on-pypy-2-5-0/">Updated version now available</a>.</p>
<p><strong>EDIT:</strong> Fixed the issue with the store-opening benchmark.</p>
<p>Axiom conveniently includes a few microbenchmarks; I thought I’d use them to give an idea of the speed increase made possible by running Axiom on PyPy. In order to do this, however, I’m going to have to modify the benchmarks a little. To understand why this is necessary, one has to understand how PyPy achieves the speed it does: namely, through the use of JIT (Just-In-Time) compilation techniques. In short, these techniques mean that PyPy is compiling code during the execution of a program; it does this “just in time” to run the code (or actually, if I understand correctly, in some cases only after the code has been run).</p>
<p>This means that when a PyPy program has just started up, there is a lot of performance overhead in the form of the time taken up by JIT compilation running, as well as time taken up by code being interpreted slowly because it has not yet been compiled. While this performance hit is quite significant for command-line tools and other short-lived programs, many applications making use of Axiom are long-lived server processes; for these, any startup overhead is mostly unimportant, and the performance that interests us is the performance achieved once the startup cost has already been paid.</p>
<p>The Axiom microbenchmarks mostly take the form of performing a certain operation N times, recording the time taken, then dividing that time by N to get an average time per single operation. I have made two modifications to the microbenchmarks in order to demonstrate the performance on PyPy: first, I have increased the value of “N”; second, I have modified the benchmarks to run the entire benchmark twice, throwing away the results from the first run and only reporting the second run. This serves to exclude startup/“warmup” costs from the benchmark.</p>
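<p>The double-run approach can be sketched as a small helper (an illustrative sketch only; the function names here are mine, not the actual Axiom benchmark code):</p>

```python
import time

def bench(work, n):
    """Run work() n times and return the mean time per call."""
    start = time.time()
    for _ in range(n):
        work()
    return (time.time() - start) / n

def bench_warm(work, n):
    """Run the whole benchmark twice, discarding the first run, so that
    JIT compilation and other warmup costs are excluded from the result."""
    bench(work, n)         # warmup run: result thrown away
    return bench(work, n)  # measured run
```

<p>On CPython the extra run is redundant but harmless; on PyPy it gives the JIT a chance to compile the hot code before the measured run.</p>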
<p>All of the results below are from my desktop machine running Debian unstable on amd64, CPython 2.7.5, and PyPy 2.1.0 on a Core i7-2600K running at 3.40GHz. I tried to keep the system mostly quiet during benchmarking, but I did have a web browser and other typical desktop applications running at the same time. Here’s a graph of the results; see the rest of the post for the details, especially regarding the store-opening benchmark (which is actually slower on PyPy).</p>
<p>[graph removed, see the <a href="https://mithrandi.net/blog/2015/03/axiom-benchmark-results-on-pypy-2-5-0/">new post</a> instead]</p>
<p>To get an example of how much of a difference this makes, let’s take a look at the first benchmark I’m going to run, <code>item-creation 15</code>. This benchmark constructs an <code>Item</code> type with 15 <code>integer</code> attributes, then runs 10 transactions where each transaction creates 1000 items of that type. In its initial form, the results look like this:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span>mithrandi@lorien&gt; python item-creation 15
0.000164939785004
mithrandi@lorien&gt; pypy item-creation 15
0.000301389718056
</pre></div>
<p>That’s about 165µs per item creation on CPython, and 301µs on PyPy, nearly 83% slower; not exactly what we were hoping for. If I increase the length of the outer loop (number of transactions) from 10 to 1000, and introduce the double benchmark run, the results look a lot more encouraging:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span>mithrandi@lorien&gt; python item-creation 15
0.000159110188484
mithrandi@lorien&gt; pypy item-creation 15
8.7410929203e-05
</pre></div>
<p>That’s about 159µs per item creation on CPython, and only 87µs on PyPy; that’s a 45% speed increase. The PyPy speed-up is welcome, but it’s also interesting to note that CPython benefits slightly from the changes to the benchmark. I don’t have any immediate explanation for why this might be, but the difference is only about 3%, so it doesn’t matter too much.</p>
<p>The second benchmark is <code>inmemory-setting</code>. This benchmark constructs 10,000 items with 5 <code>inmemory</code> attributes (actually, the number of attributes is hardcoded, due to a limitation in the benchmark code), and then times how long it takes to set all 5 attributes to new values on each of the 10,000 items. I decreased the number of items to 1000, wrapped a loop around the attribute setting to repeat it 1000 times, and introduced the double benchmark run:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span>mithrandi@lorien&gt; python inmemory-setting
4.86490821838e-07
mithrandi@lorien&gt; pypy inmemory-setting
1.28742599487e-07
</pre></div>
<p>That’s 486ns to set an attribute on CPython, and 129ns on PyPy, for a 74% speed increase. Note that this benchmark is extremely sensitive to small fluctuations since the operation being measured is such a fast one, so the results can vary a fair amount between benchmarks run. For interest’s sake, I repeated the benchmark except with a normal Python class substituted for <code>Item</code>, in order to compare the overhead of setting an <code>inmemory</code> attribute as compared with normal Python attribute access. The result was 61ns to set an attribute on CPython (making an inmemory attribute about 700% slower), and 2ns on PyPy (inmemory is 5700% slower). The speed difference on PyPy is more to do with how fast setting a normal attribute is on PyPy, than to do with Axiom being slow.</p>
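<p>The comparison against a plain Python class can be reproduced with something along these lines (the <code>Plain</code> class and helper name are hypothetical; the real benchmark sets <code>inmemory</code> attributes on an Axiom <code>Item</code> instead):</p>

```python
import time

class Plain(object):
    """A normal Python class, used as the baseline against which the
    overhead of Axiom's inmemory attributes is compared."""

def time_attribute_set(obj, n=100000):
    # Time n attribute assignments; return the mean time per assignment.
    start = time.time()
    for i in range(n):
        obj.value = i
    return (time.time() - start) / n
```

<p>As with the other benchmarks, on PyPy this should be run once for warmup and a second time for the reported measurement.</p>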
<p>The third benchmark is <code>integer-setting</code>. This benchmark is similar to <code>inmemory-setting</code> except that it uses <code>integer</code> attributes instead of <code>inmemory</code> attributes. I performed the same modifications, except with an outer loop of 100 iterations:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python integer-setting
<span class="lineno">2 </span>1.23480038643e-05
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy integer-setting
<span class="lineno">4 </span>3.80326986313e-06
</pre></div>
<p>That’s 12.3µs to set an attribute on CPython, and 3.8µs on PyPy, a 69% speed increase.</p>
<p>The fourth benchmark is <code>item-loading 15</code>. This benchmark creates 10,000 items with 15 <code>integer</code> attributes each, then times how long it takes to load an item from the database. On CPython, the items are deallocated and removed from the item cache immediately thanks to refcounting, but on PyPy a <code>gc.collect()</code> after creating the items is necessary to force them to be garbage collected. In addition, I increased the number of items to 100,000 and introduced the double benchmark run:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python item-loading 15
<span class="lineno">2 </span>9.09668397903e-05
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy item-loading 15
<span class="lineno">4 </span>5.70205903053e-05
</pre></div>
<p>That’s 90µs to load an item on CPython, and 57µs on PyPy, for a modest 37% speed increase.</p>
<p>The fifth benchmark is <code>multiquery-creation 5 15</code>. This benchmark constructs (but does not run) an Axiom query involving 5 different types, each with 15 attributes (such a query requires Axiom to construct SQL that mentions each item table, and each column in those tables) 10,000 times. I increased the number of queries constructed to 100,000 and introduced the double benchmark run:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python multiquery-creation 5 15
<span class="lineno">2 </span>5.5426299572e-05
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy multiquery-creation 5 15
<span class="lineno">4 </span>7.98981904984e-06
</pre></div>
<p>55µs to construct a query on CPython; 8µs on PyPy; 86% speed increase.</p>
<p>The sixth benchmark is <code>query-creation 15</code>. This benchmark is the same as <code>multiquery-creation</code>, except for queries involving only a single item type. I increased the number of queries constructed to 1,000,000 and introduced the double benchmark run:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python query-creation 15
<span class="lineno">2 </span>1.548528409e-05
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy query-creation 15
<span class="lineno">4 </span>1.56546807289e-06
</pre></div>
<p>15.5µs to construct a query on CPython; 1.6µs on PyPy; 90% speed increase.</p>
<p>The final benchmark is <code>store-opening 20 15</code>. This benchmark simply times how long it takes to open a store containing 20 different item types, each with 15 attributes (opening a store requires Axiom to load the schema from the database, among other things). I increased the number of iterations from 100 to 10,000; due to a <a href="https://bugs.launchpad.net/divmod-axiom/+bug/1208092">bug in Axiom</a>, the benchmark will run out of file descriptors partway, so I had to work around this. I also introduced the double benchmark run:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python store-opening 20 15
<span class="lineno">2 </span>0.00140788140297
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy store-opening 20 15
<span class="lineno">4 </span>0.00202187280655
</pre></div>
<p>1.41ms to open a store on CPython; 2.02ms on PyPy; 44% slowdown. I’m not sure what the cause of the slowdown is.</p>
<p>A bzr branch containing all of my modifications is available at <code><a href="https://bazaar.launchpad.net/~mithrandi/divmod.org/pypy-benchmarking/files">lp:~mithrandi/divmod.org/pypy-benchmarking</a></code>.</p></div><div><p><strong>EDIT:</strong> <a href="https://mithrandi.net/blog/2015/03/axiom-benchmark-results-on-pypy-2-5-0/">Updated version now available</a>.</p>
<p><strong>EDIT:</strong> Fixed the issue with the store-opening benchmark.</p>
<p>Axiom conveniently includes a few microbenchmarks; I thought I’d use them to give an idea of the speed increase made possible by running Axiom on PyPy. In order to do this, however, I’m going to have to modify the benchmarks a little.</p>
<p>To understand why this is necessary, one has to understand how PyPy achieves the speed it does: namely, through the use of JIT (Just-In-Time) compilation techniques. In short, these techniques mean that PyPy compiles code during the execution of a program; it does this “just in time” to run the code (or actually, if I understand correctly, in some cases only after the code has been run). This means that when a PyPy program has just started up, there is a lot of performance overhead: time taken up by the JIT compiler running, as well as time taken up by code being interpreted slowly because it has not yet been compiled. While this performance hit is quite significant for command-line tools and other short-lived programs, many applications making use of Axiom are long-lived server processes; for these, any startup overhead is mostly unimportant, and the performance that interests us is the performance achieved once the startup cost has already been paid.</p>
<p>The Axiom microbenchmarks mostly take the form of performing a certain operation N times, recording the time taken, then dividing that time by N to get an average time per single operation. I have made two modifications to the microbenchmarks in order to demonstrate the performance on PyPy: first, I have increased the value of “N”; second, I have modified the benchmarks to run the entire benchmark twice, throwing away the results from the first run and only reporting the second run. This serves to exclude startup/“warmup” costs from the benchmark.</p>
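<p>The double-run pattern is easy to sketch with nothing but the standard library. This is a simplified illustration of the shape of the modification, not the actual Axiom benchmark code; the <code>bench</code> function and iteration count here are hypothetical placeholders:</p>

```python
import time

def average_time(bench, n):
    """Run bench() n times and return the average seconds per call."""
    start = time.time()
    for _ in range(n):
        bench()
    return (time.time() - start) / n

def benchmark(bench, n):
    """Run the whole benchmark twice, discarding the first (warmup) run
    so that JIT compilation costs don't pollute the reported result."""
    average_time(bench, n)          # warmup run: result thrown away
    return average_time(bench, n)   # the run we actually report

def bench():
    sum(range(100))  # stand-in for the real per-operation work

print(benchmark(bench, 1000))
```

<p>On CPython the warmup run is wasted effort, but on PyPy it gives the JIT a chance to compile the hot loop before any timing happens.</p>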
<p>All of the results below are from my desktop machine running Debian unstable on amd64, CPython 2.7.5, and PyPy 2.1.0 on a Core i7-2600K running at 3.40GHz. I tried to keep the system mostly quiet during benchmarking, but I did have a web browser and other typical desktop applications running at the same time. Here’s a graph of the results; see the rest of the post for the details, especially regarding the store-opening benchmark (which is actually slower on PyPy).</p>
<p>[graph removed, see the <a href="https://mithrandi.net/blog/2015/03/axiom-benchmark-results-on-pypy-2-5-0/">new post</a> instead]</p>
<p>To get an example of how much of a difference this makes, let’s take a look at the first benchmark I’m going to run, <code>item-creation 15</code>. This benchmark constructs an <code>Item</code> type with 15 <code>integer</code> attributes, then runs 10 transactions where each transaction creates 1000 items of that type. In its initial form, the results look like this:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python item-creation 15
<span class="lineno">2 </span>0.000164939785004
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy item-creation 15
<span class="lineno">4 </span>0.000301389718056
</pre></div>
<p>That’s about 165µs per item creation on CPython, and 301µs on PyPy, nearly 83% slower; not exactly what we were hoping for. If I increase the length of the outer loop (number of transactions) from 10 to 1000, and introduce the double benchmark run, the results look a lot more encouraging:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python item-creation 15
<span class="lineno">2 </span>0.000159110188484
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy item-creation 15
<span class="lineno">4 </span>8.7410929203e-05
</pre></div>
<p>That’s about 159µs per item creation on CPython, and only 87µs on PyPy; that’s a 45% speed increase. The PyPy speed-up is welcome, but it’s also interesting to note that CPython benefits slightly from the changes to the benchmark. I don’t have any immediate explanation for why this might be, but the difference is only about 3%, so it doesn’t matter too much.</p>
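<p>For clarity, the speed-increase percentages throughout this post are the reduction in per-operation time relative to CPython; for this benchmark:</p>

```python
t_cpython = 0.000159110188484  # seconds per item creation on CPython
t_pypy = 8.7410929203e-05      # seconds per item creation on PyPy

# Speed increase: the fraction of the CPython time that PyPy shaves off.
speedup = (t_cpython - t_pypy) / t_cpython
print(round(speedup * 100))  # -> 45
```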
<p>The second benchmark is <code>inmemory-setting</code>. This benchmark constructs 10,000 items with 5 <code>inmemory</code> attributes (actually, the number of attributes is hardcoded, due to a limitation in the benchmark code), and then times how long it takes to set all 5 attributes to new values on each of the 10,000 items. I decreased the number of items to 1000, wrapped a loop around the attribute setting to repeat it 1000 times, and introduced the double benchmark run:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python inmemory-setting
<span class="lineno">2 </span>4.86490821838e-07
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy inmemory-setting
<span class="lineno">4 </span>1.28742599487e-07
</pre></div>
<p>That’s 486ns to set an attribute on CPython, and 129ns on PyPy, for a 74% speed increase. Note that this benchmark is extremely sensitive to small fluctuations, since the operation being measured is such a fast one, so the results can vary a fair amount between benchmark runs. For interest’s sake, I repeated the benchmark with a normal Python class substituted for <code>Item</code>, in order to compare the overhead of setting an <code>inmemory</code> attribute with that of normal Python attribute access. The result was 61ns to set an attribute on CPython (making an <code>inmemory</code> attribute about 700% slower) and 2ns on PyPy (<code>inmemory</code> is 5700% slower). The speed difference on PyPy has more to do with how fast setting a normal attribute is on PyPy than with Axiom being slow.</p>
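<p>The plain-class half of that comparison can be reproduced with <code>timeit</code> from the standard library; this sketch measures only the normal-attribute case (measuring the <code>inmemory</code> case requires a real Axiom <code>Item</code> subclass and a store, which I've left out):</p>

```python
import timeit

class Plain(object):
    pass

obj = Plain()

# Average seconds for one attribute assignment on a normal Python class.
n = 1000000
per_set = timeit.timeit(lambda: setattr(obj, 'x', 1), number=n) / n
print(per_set)  # average seconds per plain attribute assignment
```

<p>Because a single assignment is so fast, the per-call overhead of the lambda and <code>setattr</code> indirection is itself significant; the benchmark numbers above should be read as rough magnitudes rather than precise costs.</p>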
<p>The third benchmark is <code>integer-setting</code>. This benchmark is similar to <code>inmemory-setting</code> except that it uses <code>integer</code> attributes instead of <code>inmemory</code> attributes. I performed the same modifications, except with an outer loop of 100 iterations:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python integer-setting
<span class="lineno">2 </span>1.23480038643e-05
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy integer-setting
<span class="lineno">4 </span>3.80326986313e-06
</pre></div>
<p>That’s 12.3µs to set an attribute on CPython, and 3.8µs on PyPy, a 69% speed increase.</p>
<p>The fourth benchmark is <code>item-loading 15</code>. This benchmark creates 10,000 items with 15 <code>integer</code> attributes each, then times how long it takes to load an item from the database. On CPython, the items are deallocated and removed from the item cache immediately thanks to refcounting, but on PyPy a <code>gc.collect()</code> after creating the items is necessary to force them to be garbage collected. In addition, I increased the number of items to 100,000 and introduced the double benchmark run:</p>
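<p>The garbage-collection tweak is small; the idea is simply to force a collection after setup so that the timing loop starts from a clean item cache on PyPy’s non-refcounting collector. The names below are illustrative, not the benchmark’s actual code:</p>

```python
import gc

def setup_items():
    # Stand-in for creating the 100,000 Items; the real benchmark uses Axiom.
    items = [object() for _ in range(10000)]
    del items
    # On CPython, refcounting deallocates the items as soon as the list is
    # dropped; on PyPy they linger (and stay in the item cache) until the
    # next collection, so force one explicitly before timing the loads.
    collected = gc.collect()
    return collected

print(setup_items())
```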
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python item-loading 15
<span class="lineno">2 </span>9.09668397903e-05
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy item-loading 15
<span class="lineno">4 </span>5.70205903053e-05
</pre></div>
<p>That’s 90µs to load an item on CPython, and 57µs on PyPy, for a modest 37% speed increase.</p>
<p>The fifth benchmark is <code>multiquery-creation 5 15</code>. This benchmark constructs (but does not run) 10,000 Axiom queries, each involving 5 different types with 15 attributes apiece; such a query requires Axiom to construct SQL that mentions each item table, and each column in those tables. I increased the number of queries constructed to 100,000 and introduced the double benchmark run:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python multiquery-creation 5 15
<span class="lineno">2 </span>5.5426299572e-05
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy multiquery-creation 5 15
<span class="lineno">4 </span>7.98981904984e-06
</pre></div>
<p>55µs to construct a query on CPython; 8µs on PyPy; 86% speed increase.</p>
<p>The sixth benchmark is <code>query-creation 15</code>. This benchmark is the same as <code>multiquery-creation</code>, except for queries involving only a single item type. I increased the number of queries constructed to 1,000,000 and introduced the double benchmark run:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python query-creation 15
<span class="lineno">2 </span>1.548528409e-05
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy query-creation 15
<span class="lineno">4 </span>1.56546807289e-06
</pre></div>
<p>15.5µs to construct a query on CPython; 1.6µs on PyPy; 90% speed increase.</p>
<p>The final benchmark is <code>store-opening 20 15</code>. This benchmark simply times how long it takes to open a store containing 20 different item types, each with 15 attributes (opening a store requires Axiom to load the schema from the database, among other things). I increased the number of iterations from 100 to 10,000; due to a <a href="https://bugs.launchpad.net/divmod-axiom/+bug/1208092">bug in Axiom</a>, the benchmark would run out of file descriptors partway through, so I had to work around this. I also introduced the double benchmark run:</p>
<div class="code-unformatted"><pre class="code literal-block"><span></span><span class="lineno">1 </span>mithrandi@lorien&gt; python store-opening 20 15
<span class="lineno">2 </span>0.00140788140297
<span class="lineno">3 </span>mithrandi@lorien&gt; pypy store-opening 20 15
<span class="lineno">4 </span>0.00202187280655
</pre></div>
<p>1.41ms to open a store on CPython; 2.02ms on PyPy; 44% slowdown. I’m not sure what the cause of the slowdown is.</p>
<p>A bzr branch containing all of my modifications is available at <code><a href="https://bazaar.launchpad.net/~mithrandi/divmod.org/pypy-benchmarking/files">lp:~mithrandi/divmod.org/pypy-benchmarking</a></code>.</p></div>Divmod / PyPy status updatehttps://mithrandi.net/blog/2013/08/divmod-pypy-status-update/2013-08-04T02:18:31Z2013-08-04T02:18:31Zmithrandi<div><p>Just a quick status update:</p>
<ul>
<li>Epsilon test suite passes on PyPy.</li>
<li>Nevow test suite passes on PyPy.</li>
<li>Axiom (trunk) test suite has two failures on PyPy, fixed by <a href="https://code.launchpad.net/~divmod-dev/divmod.org/829869-explicit-type-dropping/+merge/72276">this branch</a> (which just deletes the tests); I don’t expect this to affect any application code.</li>
<li>Mantissa test suite mostly fails. This is due to modules that indirectly import <code>xmantissa.terminal</code> which imports PyCrypto directly or indirectly (via <code>twisted.conch</code>) — PyCrypto does not build on PyPy.</li>
<li>I haven’t looked at Combinator; the test suite has a whole bunch of test failures on CPython due to a change in Subversion, and there’s no real reason to run Combinator with PyPy (I would expect it to be slower than on CPython).</li>
<li>I haven’t looked at Quotient or Imaginary yet, as they depend on Mantissa.</li>
<li>I haven’t looked at Hyperbola, Prime, Reverend, or Sine — I doubt anyone cares about these (I don’t even know what Prime does, and it doesn’t have a test suite).</li>
</ul>
<p>The next thing I’m going to work on is <a href="https://bugs.launchpad.net/mantissa/+bug/830343">making the Mantissa dependency on PyCrypto optional</a>; while having an SSH server is nice, there is plenty of functionality in Mantissa that does not depend on or interact with the SSH server in any way, so it’ll still be useful for most applications. With any luck, once this is fixed, the entire test suite will pass; it’s hard to predict given how little of the test suite is currently runnable.</p></div>