May 2017 – Grinding Gears

In this Testermonial post, FreeAgent’s resident test engineer describes why we’ve rebranded ‘QA’ and why what we call things matters.

As mentioned in my previous Testermonial, my only gripe when starting at FreeAgent — and a very minor one at that — was the rather entrenched use of the term quality assurance or QA in the development and release process to describe the pre-release testing phase which occurs before deploying a user story or bug fix to production.

Last week, in anticipation of the launch of an overall ‘FreeAgent Test Strategy’ in the next week or so (blog post to follow), I took the bold step of rebranding this release phase to the more aptly named pre-production testing, or PPT, phase. (We could nickname it the ‘PowerPoint’ phase, if we really have to.) You might be asking yourself why we should even bother changing the name of something that is arguably well understood and defined. It is by no means a pedantic or frivolous move, and I would like to take the opportunity below to describe my rationale.

In 2015 I had the good fortune to attend a keynote address by the software testing thought leader, Michael Bolton, at the Ministry of Testing’s TestBash conference in Brighton. His talk explored the things that people often say in software development environments, what they might have meant to say, and why it all matters. It was something I had been thinking about for a while and his talk really resonated with me and informed a lot of how I use language when I talk about testing and software development.

George Orwell, in his essay Politics and the English Language, stated “[Language] becomes ugly and inaccurate because our thoughts are foolish, but the slovenliness of our language makes it easier for us to have foolish thoughts. The point is that the process is reversible.” What I believe Orwell is saying here is that because we use language as scaffolding for both our reasoning and for our understanding, it is of paramount importance, from a psychological perspective, that the language we use accurately describes what we are doing and what we are trying to achieve. Misusing language can indeed foster a culture of ineffectiveness and dysfunction, which was one of Bolton’s main points in his address.

The usage of QA in its current context is problematic for a number of reasons:
‘Quality’ is by its nature subjective; it is a measure of value to some person or people (who matter) and decisions about it are usually emotional and/or political.
Different stakeholders will perceive the same product as having different levels of quality.
At the end of the day we are trying to create value for our customers, but since we are not our customers, we are not in the position to assure that (try as we might).
Ignoring the subjectiveness of ‘quality’ for a minute, can we really assure it? Maybe we could say if you run this exact thing, with this browser, with this much memory, at this time of day it will probably work like this, but are we really accounting for all the variables in play, and does that assurance really tell us the whole story?

What then are we doing and trying to achieve in the release phase we previously called QA? We are performing a combination of verification, validation, and regression testing in order to elicit information that can help inform us and stakeholders about risks and impacts, and whether we have built the intended thing in an acceptable way — and without unexpected consequences — before we release it to our customers. Simply put: We are performing pre-production testing.

QA nae mair! Long live PPT!

OpenVPN is a wonderfully flexible piece of software in anyone’s toolkit, but recently we found a sharp edge that wasn’t the most obvious thing to work around.

After spinning up a new VPN server we wanted to add username/password authentication against an external source. Looking at the OpenVPN documentation, the --auth-user-pass-verify <script> flag provides this functionality. Writing the script for this was easy enough — read the credentials from a temporary file OpenVPN hands us and exit with the appropriate exit code depending on whether authentication should pass or fail.
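For illustration, a first version of such a script might look like the sketch below. It assumes OpenVPN is using the via-file method, so the script receives the path of a temporary file holding the username on the first line and the password on the second. The credentials_valid? check and its hard-coded values are hypothetical stand-ins for the call to the external source, not our real script.

```ruby
#!/usr/bin/env ruby
# Sketch of an exit-code based --auth-user-pass-verify script (via-file method).

# Read the username (line 1) and password (line 2) from the temporary file.
def read_credentials(path)
  username, password = File.readlines(path, chomp: true)
  [username.to_s, password.to_s]
end

# Hypothetical check: a real script would ask the external source here.
def credentials_valid?(username, password)
  username == "dave" && password == "s3cret"
end

# Exit code 0 tells OpenVPN to accept the client, anything else rejects it.
def exit_code_for(path)
  username, password = read_credentials(path)
  credentials_valid?(username, password) ? 0 : 1
end

# OpenVPN passes the temporary file as the first argument.
# Without an argument (for example when the file is loaded in a test) nothing runs.
exit exit_code_for(ARGV[0]) unless ARGV.empty?
```

Everything between OpenVPN starting the script and the script exiting happens while OpenVPN waits, which is what matters for the rest of this story.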

Shortly after letting users loose on it we noticed that the VPN kept reconnecting for some people, and occasionally packet loss was being observed over the VPN specifically. Looking into it, we discovered that the OpenVPN process was blocked and not forwarding any traffic during the time our authentication script took to execute.

As soon as the script finished, OpenVPN behaved as expected. Given our script was talking to a service on another node, its runtime wasn’t super quick, so people would generally notice a few seconds of packet loss every time someone authenticated.

64 bytes from 192.168.2.1: icmp_seq=24 ttl=251 time=21.535 ms
# Another user starts authentication to the VPN
Request timeout for icmp_seq 25
Request timeout for icmp_seq 26
Request timeout for icmp_seq 27
Request timeout for icmp_seq 28
Request timeout for icmp_seq 29
# Authentication script completes, and OpenVPN process plays catchup
64 bytes from 192.168.2.1: icmp_seq=25 ttl=251 time=5673.376 ms
64 bytes from 192.168.2.1: icmp_seq=26 ttl=251 time=4670.284 ms
64 bytes from 192.168.2.1: icmp_seq=27 ttl=251 time=3667.352 ms
64 bytes from 192.168.2.1: icmp_seq=28 ttl=251 time=2662.066 ms
64 bytes from 192.168.2.1: icmp_seq=29 ttl=251 time=1661.908 ms
64 bytes from 192.168.2.1: icmp_seq=30 ttl=251 time=660.354 ms
64 bytes from 192.168.2.1: icmp_seq=31 ttl=251 time=22.217 ms
64 bytes from 192.168.2.1: icmp_seq=32 ttl=251 time=20.984 ms
64 bytes from 192.168.2.1: icmp_seq=33 ttl=251 time=21.381 ms

As is normal with an open source project, we promptly grabbed the OpenVPN source code to see how and where the auth-user-pass-verify script was called. After digging down through the call tree, we ended up in openvpn_execve, which handles calling external commands. It is intended to behave like system() and therefore blocks the process while waiting for the external command to return. This works quite well when the script is doing something quick on the local node, but becomes a blocker (pun intended) when the network is involved.
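The difference between a system()-style call and starting a child without waiting can be seen in a small Ruby model. This is not OpenVPN's C code. It only times the two approaches, using sleep 1 as an assumed stand-in for a slow authentication script.

```ruby
# Model of a blocking (system-style) call versus starting a child and returning.
SLOW_SCRIPT = %w[sleep 1].freeze # stand-in for a slow authentication script

# Measure how long the caller is held up by the given block.
def seconds_taken
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

# system waits for the command to finish, so the caller stalls for its whole runtime.
def run_blocking(command)
  seconds_taken { system(*command) }
end

# spawn returns once the child has started; detach reaps it in the background.
def run_detached(command)
  seconds_taken { Process.detach(Process.spawn(*command)) }
end

puts format("system: %.2fs", run_blocking(SLOW_SCRIPT))
puts format("spawn:  %.2fs", run_detached(SLOW_SCRIPT))
```

In the blocking case the caller can do nothing else for the full runtime of the command, which matches the ping output above, where no packets were forwarded until the script returned.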

Thinking there must be a better solution to this than “suck it up”, after some quick searching we turned up this bug from five years ago which mentions a C plugin API. After some more looking, we eventually ran across the IPTables & Okta OpenVPN plugins, both of which use the C API from a shared library to interact with external scripts without blocking. With this in mind, we went away and wrote a more generic plugin that just calls an external script, much like --auth-user-pass-verify – but importantly does so without blocking the main process whilst the script runs.

The API is documented in openvpn-plugin.h, and only three functions need implementing to have a working plugin. These are:

openvpn_plugin_open_v3

This needs to set the type_mask to tell OpenVPN which API calls the plugin wishes to hook into. In our case we just hook into OPENVPN_PLUGIN_AUTH_USER_PASS_VERIFY. It also needs to set retptr->handle, which is a struct in which the plugin can store any context it needs. (We store the pointer to the OpenVPN logger and the path of our external script.)

openvpn_plugin_func_v3

This is invoked to handle all the API hooks you’ve asked OpenVPN to let you intercept. In our case this is where we tell OpenVPN we’re going to defer authentication handling, and write the authentication result to the auth_control_file once we’ve worked out whether the attempt can proceed. This is also where we fork our child process to exec the external script without blocking the main process.

openvpn_plugin_close_v1

This is called before OpenVPN exits or reloads the plugin, so that the plugin can tidy up any resources it holds.

We also defined openvpn_plugin_min_version_required_v1 to make sure the v3 API was available for the plugin to use. Our generic auth-script-openvpn plugin is available on GitHub.

The script we had needed amending slightly too: instead of signalling success or failure of the authentication attempt via exit codes, it now uses the deferred authentication mechanism. When the script is called, OpenVPN generates a temporary file and passes its path along in the auth_control_file environment variable. The script is then expected to write a single character to the file to signify whether the authentication succeeds or fails. (OpenVPN watches the file for changes in the meantime, until hand-window seconds have passed, at which point the authentication attempt is considered a failure.)
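To make the hand-off concrete, here is a toy Ruby model of the waiting side. It is not how OpenVPN is implemented. The polling loop, the poll interval and the two-second window are assumptions chosen purely to illustrate the behaviour described above: a "1" or "0" in the file settles the attempt, and no answer within the window counts as a failure.

```ruby
require "tempfile"

# Toy model of the waiting side: poll the control file until a verdict appears
# or the window (standing in for hand-window) runs out.
def wait_for_verdict(control_file, window_seconds, poll_interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + window_seconds
  while Process.clock_gettime(Process::CLOCK_MONOTONIC) < deadline
    case File.read(control_file)[0]
    when "1" then return :success
    when "0" then return :failure
    end
    sleep poll_interval
  end
  :failure # no verdict in time counts as a failed attempt
end

control = Tempfile.new("auth_control")

# Stand-in for the external script: decide after a short delay, then write one character.
writer = Thread.new do
  sleep 0.2
  File.write(control.path, "1")
end

result = wait_for_verdict(control.path, 2)
writer.join
puts result
```

The point of the model is that the waiting side stays free to do other work between checks, rather than stalling until the script has made up its mind.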

As an example, let’s pretend we wanted to authenticate anyone called dave to our VPN. We could have a script like this to do the heavy lifting for us:

#!/usr/bin/env ruby

if ENV["username"] =~ /dave/i
  value = "1" # Success
else
  value = "0" # Failure
end

File.open(ENV["auth_control_file"], "w+") do |f|
  f.puts(value)
end

(For a fuller-featured example, including comparing static challenge strings, please see the example external script in the repo.)
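One thing worth considering once the check talks to another node: if the script crashes or hangs before writing anything, the client is left waiting until the hand-window timeout expires. A possible variation, sketched below, always writes a verdict and treats errors and slow lookups as failures. The verdict_for helper, the five-second limit and the slow_lookup lambda are all hypothetical and not part of the plugin or its example script.

```ruby
#!/usr/bin/env ruby
require "timeout"

# Run the (possibly slow) check and always return a verdict character.
def verdict_for(username, limit_seconds:, &check)
  ok = Timeout.timeout(limit_seconds) { check.call(username) }
  ok ? "1" : "0"
rescue StandardError
  "0" # timeouts and errors are treated as a failed attempt
end

# Write the single verdict character to the control file.
def write_verdict(path, value)
  File.open(path, "w") { |f| f.puts(value) }
end

# Hypothetical stand-in for the call to the external service.
slow_lookup = lambda do |username|
  sleep 0.1
  username.to_s.casecmp?("dave")
end

# Only act when OpenVPN (via the plugin) has told us where to write the result.
if ENV["auth_control_file"]
  verdict = verdict_for(ENV["username"], limit_seconds: 5, &slow_lookup)
  write_verdict(ENV["auth_control_file"], verdict)
end
```

Choosing a limit comfortably below hand-window means a slow backend produces a prompt rejection instead of a client left hanging.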

Once the plugin is built and you have a working script, the only thing that remains is to hook it all together and make users’ lives easier once more. This requires telling OpenVPN where to find the plugin, and telling the plugin where to find the external script to invoke:

plugin /opt/local/lib/openvpn/plugins/auth-script.so /opt/local/etc/openvpn/auth-script

Now this VPN’s traffic is no longer interrupted whilst users are authenticating, and everyone is happy.

[If you like solving problems like this and you’re looking for a new career challenge, we’d love to hear from you! https://www.freeagent.com/company/careers]
