<!doctype html>
<html lang="en">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="aki">
<meta name="tags" content="template, configure, cat, shell, sh, envsubst, sed">
<meta name="published-on" content="2023-03-26T18:05:00+02:00">
<link rel="icon" type="image/png" href="favicon.png">
<link rel="stylesheet" href="style.css">

<title>How to Generate Files From Templates in Shell</title>

<header>
<nav><a href="https://ignore.pl">ignore.pl</a></nav>
<time>26 March 2023</time>
<h1>How to Generate Files From Templates in Shell</h1>
</header>

<article>
<p>This is a total rewrite of an old article. I no longer liked it and my methods have changed.
<p>Generating or "configuring" files from a template is a common task. A prime example among the tools I usually use is
<a href="https://cmake.org/cmake/help/latest/command/configure_file.html">configure_file</a> in CMake. Another example
would be service configuration files sitting in a staging location before getting deployed (e.g., version-controlled
configs for an e-mail or web server). Today let's focus on these kinds of uses and not on verbose template engines for,
e.g., HTML.
<p>Common notations to mark replacement spots in templates are: <code>$VARIABLE</code>, <code>${VARIABLE}</code>, and
<code>@VARIABLE@</code>. The first two come directly from the shell notation for variable substitution. The last one
exists precisely to differ from them: it is meant for templates whose generated output uses <code>$</code> as part of
its own syntax.

<h2>Using shell itself</h2>
<p>POSIX-compliant shells support a mechanism called
<a href="https://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap02.html#tag_18_07_04">heredoc</a>. We can
use it in combination with <b>cat</b>(1):

<pre>
#!/bin/sh
cat &lt;&lt;CONTENT
server {
	listen 80;
	server_name $USER.$DOMAIN;
	root /srv/http/$USER/public;
}
CONTENT
</pre>

<p>This approach has an obvious problem: it isn't really the template that I promised. Instead, it is a script that
generates the intended output. Someone could also argue that this is a useless use of <b>cat</b>.
<p>Still, <b>cat</b> and heredocs give us a lot of flexibility. We can wrap some content with a common header and footer
if we want to:

<pre>
#!/bin/sh
cat /dev/fd/3 "$@" /dev/fd/4 3&lt;&lt;HEAD 4&lt;&lt;FOOT
&lt;!doctype html&gt;
&lt;html lang="en"&gt;
HEAD
&lt;script src="script.js"&gt;&lt;/script&gt;
FOOT
</pre>
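
<p>One detail worth keeping in mind with heredocs is how the delimiter is written. With a plain delimiter the shell
expands variables in the body, and a literal <code>$</code> has to be escaped as <code>\$</code>. With a quoted
delimiter nothing is expanded at all. A minimal sketch of both, where <code>SITE_USER</code>, <code>SITE_DOMAIN</code>,
and the two function names are made up for illustration:

```shell
#!/bin/sh
# Hypothetical values, set here only to keep the sketch self-contained.
SITE_USER=aki
SITE_DOMAIN=example.tld

# Unquoted delimiter: the shell expands variables, and \$ yields a literal $.
expanded() {
	cat <<CONTENT
server_name $SITE_USER.$SITE_DOMAIN;
try_files \$uri =404;
CONTENT
}

# Quoted delimiter: nothing is expanded, the body is passed through verbatim.
verbatim() {
	cat <<'CONTENT'
server_name $SITE_USER.$SITE_DOMAIN;
try_files $uri =404;
CONTENT
}
```

<p>Calling <code>expanded</code> prints the substituted <code>server_name</code> line and keeps <code>$uri</code>
intact, while <code>verbatim</code> prints the body exactly as written.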

<h2>Using envsubst</h2>
<p>If we want a real template instead of an executable script, we can use <b>envsubst</b>(1), which ships with GNU
gettext. This tool is extremely straightforward to use: feed the template to standard input and get the substituted
text on standard output. Given a template saved as <code>nginx.conf.in</code>:

<pre>
server {
	listen 80;
	server_name $USER.$DOMAIN;
	root /srv/http/$USER/public;
}
</pre>

<p>Then:

<pre>
$ export DOMAIN=example.tld
$ envsubst &lt;nginx.conf.in
server {
	listen 80;
	server_name aki.example.tld;
	root /srv/http/aki/public;
}
</pre>

<img src="how_to_generate_files_from_templates_in_shell-1.png" alt="shell substitution">

<p><b>Envsubst</b> supports <code>${VARIABLE}</code>, too.
<p>A major potential problem with <b>envsubst</b> is that it substitutes everything as it goes. It doesn't matter
whether the variable exists in the environment or not: an unset variable is replaced with an empty string. This is the
usual expected behaviour of a shell, but it is not well suited for output that uses <code>$</code> in a meaningful way.
We can partially work around it using the <i>SHELL-FORMAT</i> argument:

<pre>
$ envsubst <em>'$USER, $DOMAIN'</em> &lt;nginx.conf.in &gt;nginx.conf
</pre>

<p>This limits the substitutions to the selected variables. The exact format of this argument is not important;
whatever is a conformant variable reference will work: <code>'$USER$DOMAIN'</code>, <code>'$USER $DOMAIN'</code>,
<code>'$USER,$DOMAIN'</code>, and the first example are all equivalent. Just remember to keep the shell from expanding
the variables itself before <b>envsubst</b> sees them, and to pass them all as one argument (hence the single quotes).
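
<p>To see what the restriction buys us, here is a small sketch in which the template also contains nginx's own
<code>$uri</code>. The template text, the values, and the <code>render</code> function are assumptions made up for this
example; only <code>$USER</code> and <code>$DOMAIN</code> are listed, so <code>$uri</code> should pass through
untouched:

```shell
#!/bin/sh
# Sketch only: the template text and the values are made up for illustration.
# nginx's own $uri must survive, so only $USER and $DOMAIN are listed.
render() {
	USER=aki DOMAIN=example.tld envsubst '$USER $DOMAIN' <<'TEMPLATE'
server {
	server_name $USER.$DOMAIN;
	root /srv/http/${USER}/public;
	try_files $uri =404;
}
TEMPLATE
}
```

<p>The heredoc delimiter is quoted for the same reason the <i>SHELL-FORMAT</i> argument is: the shell must not expand
anything before <b>envsubst</b> gets to read it.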

<h2>Using sed</h2>
<p>Finally, we can use <b>sed</b>(1) to gain even more control over what happens to our templates. This comes at a
cost: <b>sed</b> does not have access to environment variables on its own. Usually, we can find it used like this:

<pre>
$ sed "s/@VERSION@/$VERSION/g" &lt;version.h.in &gt;version.h
</pre>

<p>The shell will substitute <code>$VERSION</code> with the value of the variable, and every occurrence of
<code>@VERSION@</code> in the template file will be replaced. Note that <b>sed</b> can replace anything it wants;
<code>@</code> is used here to make the match stricter and the template more readable.
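
<p>One caveat with pasting a value straight into the <code>s</code> command: a value that contains <code>/</code>,
<code>&amp;</code>, or a backslash (a branch name like <code>feature/x</code>, for example) will break the expression or
change its meaning. A minimal sketch of one way around it, assuming single-line values; <code>escape_replacement</code>
and <code>substitute</code> are hypothetical helper names, not standard tools:

```shell
#!/bin/sh
# escape_replacement VALUE: hypothetical helper that prefixes every
# backslash, slash, and ampersand with a backslash, so that VALUE is taken
# literally on the replacement side of s/.../.../. Newlines are not handled.
escape_replacement() {
	printf '%s\n' "$1" | sed 's,[\\/&],\\&,g'
}

# substitute NAME VALUE: replace every @NAME@ on standard input with VALUE.
substitute() {
	sed "s/@$1@/$(escape_replacement "$2")/g"
}
```

<p>Usage would then look like <code>substitute VERSION "$VERSION"</code> with the template on standard input and the
result on standard output.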
<p>If we feel like over-engineering, we can generate a script for <b>sed</b> and use that instead. The variable values
may come from anywhere at that point; let's use shell:

<pre>
#!/bin/sh
if tag=$(git describe --tags --exact-match); then
	echo "s/@VERSION@/$tag/g"
else
	echo "s/@VERSION@/@BRANCH@-@HASH@/g"
fi
echo "s/@HASH@/$(git rev-parse --short HEAD)/g"
echo "s/@BRANCH@/$(git symbolic-ref --short HEAD || echo detached)/g"
</pre>

<p>Here, rather than using <b>cat</b>, I used <code>echo</code>. Depending on the state of the repository it runs in,
the script may output something similar to:

<pre>
s/@VERSION@/@BRANCH@-@HASH@/g
s/@HASH@/4242424/g
s/@BRANCH@/nightly/g
</pre>

<p>We can save that output as <code>subst.sed</code> and then feed it to <b>sed</b>:

<pre>
$ sed -f <em>subst.sed</em> &lt;version.h.in &gt;version.h
</pre>

<p>Now, if we want to over-engineer it for real, let's put it into a Makefile:

<pre>
subst.sed: subst.sed.sh
	./$&lt; &gt;$@

%.h: %.h.in subst.sed
	sed -f subst.sed &lt;$&lt; &gt;$@
</pre>
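
<p>One thing <b>sed</b> will not tell us is that a placeholder was misspelled: the text simply passes through
unchanged. A minimal sketch of a check that could run after generation, assuming placeholders are always uppercase
names wrapped in <code>@</code>; <code>check_placeholders</code> is a hypothetical helper:

```shell
#!/bin/sh
# check_placeholders FILE: hypothetical helper, not part of the setup above.
# Prints (to stderr) every line of FILE that still contains an @NAME@
# placeholder and returns non-zero if there is at least one.
check_placeholders() {
	if grep -n '@[A-Z_][A-Z0-9_]*@' "$1" >&2; then
		echo "$1: unsubstituted placeholders remain" >&2
		return 1
	fi
	return 0
}
```

<p>The pattern is deliberately narrow. Output that legitimately contains uppercase text between two <code>@</code>
characters would trigger a false alarm, so treat it as a starting point.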

<h2>Other Alternatives</h2>
<p>Otherwise, one could use <b>perl</b>(1), <b>python</b>(1), <b>awk</b>(1), or maybe shell's <code>eval</code> if
feeling adventurous (and malicious, I guess). CMake's <code>configure_file</code> is very nice but is limited to
CMake. I'm starting to feel like making such a utility could be a nice weekend project after a beer or two.
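
<p>For a rough idea of what the <b>awk</b> route could look like, here is a sketch that replaces
<code>@NAME@</code> with the value of the environment variable <code>NAME</code> and leaves unknown names alone. It is
only an illustration under those assumptions, not a finished utility; <code>render_at</code> is a made-up name:

```shell
#!/bin/sh
# render_at: hypothetical filter. Reads a template on standard input and
# replaces each @NAME@ with the environment variable NAME when it is set.
# Placeholders without a matching variable are copied through unchanged.
render_at() {
	awk '
	{
		out = ""
		line = $0
		while (match(line, /@[A-Za-z_][A-Za-z0-9_]*@/)) {
			name = substr(line, RSTART + 1, RLENGTH - 2)
			out = out substr(line, 1, RSTART - 1)
			if (name in ENVIRON)
				out = out ENVIRON[name]
			else
				out = out substr(line, RSTART, RLENGTH)
			line = substr(line, RSTART + RLENGTH)
		}
		print out line
	}'
}
```

<p>Because the value is appended as a plain string, it needs no escaping, and <code>$</code> in the template is never
touched. The variables have to be exported for <b>awk</b> to see them.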
</article>
<script src="https://stats.ignore.pl/track.js"></script>