It is rather common practice to include version numbers directly in the metadata files or sources that are stored
inside a code repository. To list a few examples: npm makes it part of its regular workflow (the version field in
package.json), and Python's setuptools usually involves various hacks, be it plain imports, reading a file, or anything
else. Even in its more declarative approach (setup.cfg), support is limited to string literals and to reading a file
or a module attribute.
OK, so we have a version number in a configuration file, in some other special file, or directly in the code. What's so
wrong about that? The natural enemy of this blog: duplication. In this case, duplication of responsibilities.
There's a rather high chance that you are using Git to track changes. If not Git, then Mercurial, SVN, Fossil, Darcs,
or really any other version control system; distributed or not, it doesn't matter. What matters is that these tools are
designed to help you control different versions of your software. When you add your software's version to the source
code of said software, you create a new, independent layer of versioning.
Now, not everyone is a minimalist, and the mere threat of an additional entity handling the same thing might not scare
you. The same goes for duplicating the version data. The real problems begin when the version data is not actually
duplicated between the VCS and the source code. Git's native commit identifier, the SHA hash, doesn't really fit
distribution use, where Semantic Versioning makes much more sense to users. The same point stands for other VCSes.
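To illustrate the difference: two commit hashes tell a user nothing about which build is newer or whether it is safe to upgrade, while a Semantic Versioning number encodes both. Below is a minimal C++ sketch of SemVer core precedence; it deliberately ignores pre-release and build metadata, and the names SemVer and is_compatible_upgrade are hypothetical.

```cpp
#include <tuple>

// Core of a Semantic Versioning number; pre-release and build metadata
// are deliberately left out of this sketch.
struct SemVer {
    unsigned major;
    unsigned minor;
    unsigned patch;
};

// SemVer core precedence: compare major, then minor, then patch, numerically.
// Numeric comparison matters: 1.10.0 is newer than 1.2.3, although a plain
// string comparison of "1.10.0" and "1.2.3" says otherwise.
bool operator<(const SemVer& a, const SemVer& b) {
    return std::tie(a.major, a.minor, a.patch) < std::tie(b.major, b.minor, b.patch);
}

// Same major version and no downgrade: under SemVer the upgrade is expected
// to be backwards compatible. Major version 0 promises nothing, so this
// sketch never treats it as compatible.
bool is_compatible_upgrade(const SemVer& from, const SemVer& to) {
    return from.major != 0 && from.major == to.major && !(to < from);
}
```

Nothing comparable can be computed from two SHA hashes alone; answering the same question requires walking the commit graph.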
We end up with two distinct layers of versioning, where one controls the other. This usually leads to very awkward
workflows. In a commercial project I worked on, we wanted to mark the mainline branch as unstable and make that visible
through the version number. For released (and validated; the whole workflow was heavily oriented around paperwork)
software we wanted a regular version number. This resulted in an interesting process required for backporting fixes:
find the merge base between the release branch and mainline, make the fix, merge it into mainline, merge the release
branch into the fix branch, increment the version file, and merge into the release branch.
Another tendency I see as a result of this duplication: the versions are out of sync or simply meaningless. Let's
consider two approaches to incrementing the version number in a file: at the release, or right after it. In the first
one, the release begins with a commit that increments the version number. From then until the next release, the file is
out of sync, because every subsequent change moves the code further away from the version the file describes. In the
second approach, you create the release (deploy the application and whatnot) and then increment the version number to
whatever the next release is expected to be. This requires either strict control over which changes get merged or good
fortune-telling skills; otherwise the predicted number is meaningless, because you can't guarantee whether the next
release will be a major, minor, or patch one.
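The prediction problem becomes obvious once the SemVer increment rule is written down: the next number is a function of the kind of change, which is exactly what isn't known until the changes are merged. A minimal C++ sketch of those rules, ignoring pre-release tags; SemVer, Change and bump are hypothetical names.

```cpp
// SemVer core (pre-release and build metadata omitted for brevity).
struct SemVer {
    unsigned major;
    unsigned minor;
    unsigned patch;
};

// The kind of change a release contains; this is what has to be known
// (or guessed) before the next version number can be written down.
enum class Change { Patch, Minor, Major };

// SemVer increment rules: bumping a component resets every component
// to its right to zero.
SemVer bump(SemVer v, Change kind) {
    switch (kind) {
    case Change::Major: return {v.major + 1, 0, 0};
    case Change::Minor: return {v.major, v.minor + 1, 0};
    case Change::Patch: return {v.major, v.minor, v.patch + 1};
    }
    return v; // not reached for valid Change values
}
```

Right after releasing 1.4.2, the next version could be 1.4.3, 1.5.0 or 2.0.0 depending on `kind`, and the post-release approach has to commit to one of them before any of those changes exist.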
Happily for us, most VCSes have built-in functionality to help us manage version numbers that are meaningful in the
context of distribution and deployment. They are called tags (or, rarely, baselines). With them we can mark arbitrary
repository states with arbitrary strings that can later be referenced. Usually, they are used to mark the commits that
particular releases originated from. Sometimes it might even be the same commit that incremented the version number in a
file. We're not doing that last part. Instead, we want to tag a commit and read the tag from our build, distribution,
deployment, or packaging system.
Some of them support it better, some worse. Luckily, a good chunk of them allow arbitrary logic to be executed, so we
can implement it ourselves. Additionally, there's a good chance that the VCS provides some kind of helper for producing
a human-readable, tag-driven version description. In the case of Git, there is git-describe(1), which fits this use case
directly. We can just call it from CMake or setup.py and read its output. Very often we will be forced to generate
full-fledged files, so be ready to do that.
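A build script would typically capture the output of something like `git describe --tags --long --dirty`: `--tags` also considers lightweight tags (by default only annotated ones are used), `--long` always prints the commit count and abbreviated hash even on a tagged commit, and `--dirty` appends `-dirty` when the working tree has uncommitted changes. The following C++ sketch shows how such a string could be split into its parts; the Describe struct and parse_describe function are hypothetical names, not part of any library.

```cpp
#include <optional>
#include <string>

// Parts of `git describe --tags --long --dirty` output,
// e.g. "v1.2.3-4-gabcdef0-dirty".
struct Describe {
    std::string tag;            // nearest tag reachable from HEAD
    unsigned long distance = 0; // commits on top of that tag
    std::string hash;           // abbreviated commit hash, without the 'g'
    bool dirty = false;         // uncommitted changes in the working tree
};

// Parses from the right, because tag names may themselves contain '-'.
std::optional<Describe> parse_describe(std::string s) {
    Describe d;
    const std::string suffix = "-dirty";
    if (s.size() > suffix.size() &&
        s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0) {
        d.dirty = true;
        s.erase(s.size() - suffix.size());
    }

    // "-g<hash>" is the last component.
    const auto hash_dash = s.rfind('-');
    if (hash_dash == std::string::npos || hash_dash == 0 ||
        hash_dash + 1 >= s.size() || s[hash_dash + 1] != 'g')
        return std::nullopt;
    d.hash = s.substr(hash_dash + 2);
    if (d.hash.empty()) return std::nullopt;

    // "-<count>" precedes it; everything before that is the tag.
    const auto count_dash = s.rfind('-', hash_dash - 1);
    if (count_dash == std::string::npos || count_dash == 0) return std::nullopt;
    const std::string count = s.substr(count_dash + 1, hash_dash - count_dash - 1);
    if (count.empty()) return std::nullopt;
    for (char c : count) {
        if (c < '0' || c > '9') return std::nullopt;
        d.distance = d.distance * 10 + static_cast<unsigned long>(c - '0');
    }

    d.tag = s.substr(0, count_dash);
    return d;
}
```

Without `--long`, a commit that is exactly tagged yields only the tag (e.g. `v1.2.3`), which this sketch rejects on purpose; a real implementation might treat that case as a clean release instead.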
Example CMake Project
Let's consider a C++ project built with CMake. It is intended to be packaged, but it should also support installation
directly from the repository. The program itself is rather simple: it prints out its version. Every single time:
```cpp
#include <iostream>

#include "version.h"

int main(int, char*[])
{
    // version::full is defined in version.cpp, which CMake generates at configure time.
    std::cout << version::full << std::endl;
}
```
The version.h header is just a namespace with an extern declaration of the version string. To provide an actual value
for the version number, let's write a version.cpp.in that CMake will process to generate the actual source file added to
the target:
```cpp
namespace version {
const char* full = "@VERSION@";
}
```
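For reference, a minimal version.h matching the template above might look like the sketch below. The post only describes it in words, so the include guard name is an assumption, and the definition at the bottom is a hypothetical stand-in for the generated version.cpp, present only so the snippet compiles on its own.

```cpp
// version.h (sketch)
#ifndef VERSION_H
#define VERSION_H

namespace version {
// Defined in the version.cpp that CMake generates from version.cpp.in.
extern const char* full;
}

#endif

// Hypothetical stand-in for the generated version.cpp, included here only so
// that this snippet compiles on its own; the value is a made-up example.
namespace version {
const char* full = "v1.2.3-4-gabcdef0";
}
```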
The @VERSION@ pattern is recognized by CMake's configure_file, which we will use to generate the actual source code
intended for compilation, just like we planned:
```cmake
add_executable(version-print main.cpp)
# Sets VERSION from git describe; not a built-in CMake command (see below).
git_describe(VERSION)
configure_file(version.cpp.in ${CMAKE_CURRENT_BINARY_DIR}/version.cpp @ONLY)
target_sources(version-print PRIVATE ${CMAKE_CURRENT_BINARY_DIR}/version.cpp)
```
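To make the configure_file step less magical, here is a toy C++ model of what it does to version.cpp.in with @ONLY: each @NAME@ is replaced with the value of the CMake variable NAME, an unset variable expands to an empty string, and ${NAME} references are left untouched. It is a simplification (real CMake accepts more characters in variable names and also handles #cmakedefine lines), and configure_at_only and is_var_name are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <string>

// True if s looks like a variable name usable between the @ signs
// (simplified here to letters, digits and underscores).
bool is_var_name(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s)
        if (!std::isalnum(c) && c != '_') return false;
    return true;
}

// Replaces @NAME@ with vars[NAME] (empty if unset); ${NAME} stays as-is.
std::string configure_at_only(const std::string& in,
                              const std::map<std::string, std::string>& vars) {
    std::string out;
    std::size_t pos = 0;
    while (true) {
        const std::size_t open = in.find('@', pos);
        const std::size_t close =
            open == std::string::npos ? std::string::npos : in.find('@', open + 1);
        if (close == std::string::npos) {
            out.append(in, pos, std::string::npos); // copy the rest verbatim
            return out;
        }
        const std::string name = in.substr(open + 1, close - open - 1);
        if (!is_var_name(name)) {
            // Not a reference: keep the first '@' and rescan from the second.
            out.append(in, pos, close - pos);
            pos = close;
            continue;
        }
        out.append(in, pos, open - pos);
        const auto it = vars.find(name);
        if (it != vars.end()) out += it->second; // unset variables expand to ""
        pos = close + 1;
    }
}
```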
Now, sadly, CMake does not support this out of the box, so you need to implement git_describe (or something similar)
yourself or get an external module for it. There is one in Ryan A. Pavlik's repository, and I implemented one for
Starshatter at some point.
They are rather easy and fun to write (although there is a variety of edge cases), but it's still a bit of a shame that
they are not part of core CMake. Note that one of these edge cases is triggering regeneration of the output file:
configure_file is a configure-time action, so a change in the output of git_describe (a new commit or tag, for example)
may or may not be detected when you simply rebuild. Be wary.
Alternatively, you can use a preprocessor define to, well, define the version number. I use this approach with a
separate file that defines the variable, for the sake of dependency checks and recompilation.
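When the version lives in a generated header, the detail that keeps dependency checks sane is to rewrite that file only when its content actually changes, so an unchanged version does not touch the timestamp and trigger recompilation. configure_file already behaves this way, but a custom generation step has to do it by hand. A minimal C++ sketch of such a step, assuming the version string contains no characters that would need escaping inside a string literal; VERSION_FULL, render_version_header and write_if_changed are hypothetical names.

```cpp
#include <fstream>
#include <ios>
#include <sstream>
#include <string>

// Renders a tiny header that defines the version as a macro.
// The macro name VERSION_FULL is hypothetical.
std::string render_version_header(const std::string& version) {
    return "#pragma once\n#define VERSION_FULL \"" + version + "\"\n";
}

// Writes content to path only if it differs from what is already there,
// so an unchanged version leaves the file (and its timestamp) alone.
// Returns true if the file was rewritten successfully.
bool write_if_changed(const std::string& path, const std::string& content) {
    {
        std::ifstream in(path, std::ios::binary);
        if (in) {
            std::ostringstream current;
            current << in.rdbuf();
            if (current.str() == content) return false;
        }
    } // input stream closed before the file is reopened for writing
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    out << content;
    out.close();
    return !out.fail();
}
```

Sources that need the version include the generated header, so only they are rebuilt when the tag-derived version actually changes.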
Final Thoughts
Of course, full-fledged support would be much nicer and more stable, but, surprisingly, we're not there yet.
Well, some of us are. In contrast to the examples above, consider the Go programming language, which recommends this
method of versioning: module versions come from tags in the repository. Interestingly, Go also uses code repositories as
a means of package distribution, so any argument that a file with a version is needed because the repository is used
for distribution is baseless.
What should you take away from this post? Next time you start a project, consider keeping the meaningful version only
in the VCS. If your build/distribution/whatever-else system does not fully support it out of the box, try implementing
it. Once you have a working implementation, push it upstream. Who knows, maybe some day we will have consistent support
across the whole ecosystem. For now, back to experimenting, and until next time!