<!doctype html>
<html lang="en">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="aki">
<meta name="tags" content="Lua, serialization, markup">
<meta name="published-on" content="2024-08-28T13:09:26+02:00">
<link rel="icon" type="image/png" href="favicon.png">
<link rel="stylesheet" href="style.css">
<title>Lua as Human-Readable Serialization Format</title>
<header>
<nav><a href="https://ignore.pl">ignore.pl</a></nav>
<time>28 August 2024</time>
<h1>Lua as Human-Readable Serialization Format</h1>
</header>
<article>
<p>It was that time of year again and I was asked to prepare a definition for
<a href="https://clang.llvm.org/docs/ClangFormat.html">clang-format</a>. I had a style guideline document to work with,
so the task was rather straightforward. Similarly to the Google C++ Style Guide, it asked for headers to be grouped into:
C system and standard headers, C++ standard library headers, other library headers, and project headers. And so, my first
thought was to use the default <i>Regroup</i> behaviour.
<p>Until I noticed that they use angle brackets for other libraries together with an <i>.h</i> extension. Of course,
such includes matched the default C-group regex. The strictest and the second easiest solution here is to make the regex
contain a list of alternatives with all of the headers. This requires some maintenance and I decided to have a bit of
fun with it.
<p>There is a somewhat common practice for serializing data for later use in Lua scripts that looks similar to this:
<pre>
return {
name = "Henry",
position = {x=0, y=0},
}
</pre>
<p>This makes use of how <a href="https://www.lua.org/manual/5.4/manual.html#6.3">modules</a> and importing them work.
In short, the module script is executed and the value of its final <code>return</code> is used as the value of the
"module". In this case the script is a lone return statement with no logic involved. This somewhat declarative style is
purely conventional. More commonly the returned value is a table of functions, exactly what we would consider a "normal
module", or a class.
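<p>As a minimal sketch of that mechanism, the snippet below registers the table under a hypothetical module name
<code>henry</code> through <code>package.preload</code>, which here stands in for a <i>henry.lua</i> file found on the
module path, and then imports it with <code>require</code>:
<pre>
-- Stand-in for a henry.lua file; the module name "henry" is made up for this example.
package.preload["henry"] = function()
  return {
    name = "Henry",
    position = {x=0, y=0},
  }
end

-- require runs the loader and hands back whatever it returned.
local henry = require("henry")
print(henry.name, henry.position.x, henry.position.y)

-- The value is cached, so a second require yields the very same table.
local again = require("henry")
print(again == henry)
</pre>
<p>The second <code>require</code> returns the cached table, so data loaded this way is evaluated once per Lua state.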
<p>In the example above we are not really sure what kind of thing we are dealing with. To stay consistent we could add a
<code>type = "slime",</code> to the table. Now the reader would know what they are dealing with. Of course, the scheme
would remain assumed on the user side. Building an object from there could be a straightforward table lookup.
Alternatively, we could rely on
<a href="https://www.lua.org/manual/5.4/manual.html#3.4.10">function call syntactic sugar</a> and
prefix the definitions with types:
<pre>
return Slime{
name = "Henry",
position = Vec2{x=0, y=0},
}
</pre>
<p>This too is a table lookup, but the responsibility shifts a bit from the user implementation to the execution
environment. Loading becomes trickier this way, but the extra preparation can be desirable, e.g., to cause
errors in an unknown or otherwise unintended environment. If we simply want to make it run, defining global callables
<code>Slime</code> and <code>Vec2</code> is enough here.</p>
<img src="lua_as_human_readable_serialization_format-1.png" alt="henry the slime">
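<p>A rough sketch of the "simply make it run" variant, assuming Lua 5.2 or newer (where <code>load</code> accepts a
string). The <code>tagged</code> helper and the <code>type</code> field it sets are made up for this example, and the
string stands in for the contents of a data file:
<pre>
-- Stand-in for the contents of a data file; a real loader would read it from disk.
local source = [[
return Slime{
  name = "Henry",
  position = Vec2{x=0, y=0},
}
]]

-- Hypothetical constructor factory: tags the given table with a type name.
local function tagged(type_name)
  return function(fields)
    fields.type = type_name
    return fields
  end
end

-- Global callables, as the definition expects to find them.
Slime = tagged("slime")
Vec2 = tagged("vec2")

-- Compile the chunk, run it, and keep the returned value.
local henry = assert(load(source))()
print(henry.type, henry.name, henry.position.type)
</pre>
<p>Without the two globals in place the chunk fails with an attempt to call a nil value, which is the kind of error in
an unintended environment mentioned above.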
<p>It is somewhat similar to some use cases from the <a href="https://www.lua.org/history.html">history of Lua</a>. The
main common part is the syntactic sugar, but this feels like a stretch as it can be observed quite often in "regular" Lua
code. Let's go one step further and remove the <code>return</code>:
<pre>
Slime {
name = "Henry",
position = Vec2{x=0, y=0},
}
</pre>
<p>It now looks like some generic markup language. But the module loading mechanism will no longer work for us, because
the script no longer returns anything. Instead, the reader needs to use
<a href="https://www.lua.org/manual/5.4/manual.html#pdf-load">load</a>. It conveniently has an
option to specify the execution environment. Additionally, some mechanism for tracking top-level statements is needed.
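<p>One possible shape of such a reader, as a sketch only and assuming Lua 5.2 or newer for the four-argument
<code>load</code>. The <code>read</code> function, the chunk name, and the second slime are made up for this example.
The environment contains nothing but the two constructors, and <code>Slime</code> records every top-level item as a
side effect:
<pre>
-- Stand-in for the contents of a data file; "Greta" is a made-up second item.
local source = [[
Slime {
  name = "Henry",
  position = Vec2{x=0, y=0},
}
Slime {
  name = "Greta",
  position = Vec2{x=3, y=1},
}
]]

-- Hypothetical reader: runs the text in a dedicated environment and collects the items.
local function read(text)
  local items = {}
  local env = {
    Vec2 = function(fields)
      return fields
    end,
    -- Tracking of top-level statements: each call appends to the loader state.
    Slime = function(fields)
      fields.type = "slime"
      items[#items + 1] = fields
      return fields
    end,
  }
  -- Mode "t" rejects precompiled chunks; env replaces the global environment of the chunk.
  local chunk = assert(load(text, "=slimes", "t", env))
  chunk()
  return items
end

local slimes = read(source)
print(#slimes, slimes[1].name, slimes[2].name)
</pre>
<p>Since the environment has no access to the standard library, the data file cannot call anything but the names the
reader chose to expose.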
<p>This approach is somewhat similar to what <a href="https://premake.github.io/">Premake</a> does. Surprisingly, this
is also pretty close to the regular register-event-callback approach for plugin systems (e.g., in
<a href="https://github.com/martanne/vis">vis</a>). How so? The "tracking top-level statements" results in a
side effect in some global or loader state. In the callback approach, it's the event dispatcher or otherwise the plugin
system state that fulfils a similar role. Additionally, the API is usually exposed through the environment (and not,
e.g., through a user function argument, or with the plugin returned as a module that is never able to interact with the
API directly).
<p>For my standards definitions I settled on the last style. It allows for multiple items without indentation and I liked
the idea at the time. It allowed me to play around and neatly layer parser, environment, and model. The final definitions
looked like this:
<pre>
scheme "headers/1"
aliases "ANSI C" {"ANSI X3.159-1989", "C89", "C90", "ISO/IEC 9899:1990"}
headers "ANSI C" {
"assert.h",
"ctype.h",
...
}
</pre>
<p>It allowed for more complex structures with <code>include</code> and <code>remove</code>, for example:
<pre>
headers "C++20" {
include "C++17",
remove "ciso646",
"concepts",
...
}
</pre>
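<p>To illustrate how a statement like <code>headers "C++20" {...}</code> can be evaluated, here is a simplified sketch.
It is not the implementation from the linked repository: the <code>read</code> and <code>sorted</code> helpers, the
<code>op</code> markers, and the shortened header lists are all made up for this example, and it assumes that an included
standard is defined earlier in the same file. The first call takes the name and returns a function that takes the table:
<pre>
-- Shortened, made-up definitions; the real lists are much longer.
local source = [[
headers "C++17" {
  "any",
  "ciso646",
}
headers "C++20" {
  include "C++17",
  remove "ciso646",
  "concepts",
}
]]

-- Hypothetical reader: returns a map from standard name to a set of headers.
local function read(text)
  local standards = {}
  local env = {}
  -- include and remove only produce markers; headers interprets them in order.
  function env.include(name)
    return {op = "include", name = name}
  end
  function env.remove(name)
    return {op = "remove", name = name}
  end
  function env.headers(standard)
    return function(entries)
      local set = {}
      for _, entry in ipairs(entries) do
        if type(entry) == "string" then
          set[entry] = true
        elseif entry.op == "include" then
          local base = assert(standards[entry.name], "unknown standard: " .. entry.name)
          for header in pairs(base) do
            set[header] = true
          end
        elseif entry.op == "remove" then
          set[entry.name] = nil
        end
      end
      standards[standard] = set
    end
  end
  assert(load(text, "=definitions", "t", env))()
  return standards
end

-- Turns a set into a sorted list for stable output.
local function sorted(set)
  local list = {}
  for header in pairs(set) do
    list[#list + 1] = header
  end
  table.sort(list)
  return list
end

local standards = read(source)
print(table.concat(sorted(standards["C++20"]), "\n"))
</pre>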
<p>See <a href="https://git.ignore.pl/headers/">headers</a> for the full source code. The command allowed me to get the
list of headers and join them into a regex:
<pre>
$ headers C11 POSIX
aio.h
arpa/inet.h
assert.h
...
</pre>
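<p>The join itself could be a few lines of Lua. This is only a sketch with a hypothetical <code>alternation</code>
helper, and it assumes the result is meant to be embedded in a larger pattern that supplies the brackets and anchors:
<pre>
-- Hypothetical helper: joins header names into a group of regex alternatives.
local function alternation(headers)
  local escaped = {}
  for index, header in ipairs(headers) do
    -- Escape everything except letters, digits, slash, and underscore.
    -- The parentheses keep only the first result of gsub, the escaped string.
    escaped[index] = (header:gsub("[^%w/_]", "\\%0"))
  end
  return "(" .. table.concat(escaped, "|") .. ")"
end

print(alternation({"aio.h", "arpa/inet.h", "assert.h"}))
</pre>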
<p>Except that I never joined them into a regex. After all considerations and some discussions we decided to use
<i>Preserve</i> instead of <i>Regroup</i>, so that we wouldn't need to bother with any of the costs of grouping includes
automatically.
<p>I feel like re-implementing this in SQL.
</article>
<script src="https://stats.ignore.pl/track.js"></script>