Lua as Human-Readable Serialization Format

It was that time of year again and I was asked to prepare a definition for clang-format. I had a style guideline document to work with, so the task was rather straightforward. Similarly to Google's C++ Style Guide, it asked for headers to be grouped into: C system and standard headers, C++ standard library headers, other library headers, and project headers. And so, my first thought was to use the default Regroup behaviour.

Then I noticed that they use angle brackets together with an .h extension for other libraries. Of course, these matched the default regex for the C group. The strictest, and the second easiest, solution here is to make the regex a list of alternatives that names every header. This requires some maintenance, and I decided to have a bit of fun with it.

There is a somewhat common practice for serializing data for later use in Lua scripts that looks similar to this:

return {
	name = "Henry",
	position = {x=0, y=0},
}

This makes use of how modules and importing them work. In short, the module script is interpreted and the value of its final return is used as the value of the "module". In this case the script is a lone return statement with no logic involved. This declarative-looking style is purely conventional. More commonly the returned value is a table of functions, which is what we would consider a "normal module", or a class.
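A minimal sketch of that mechanism, assuming Lua 5.2 or later. To keep it self-contained, the file contents are held in a string and compiled with load; the file name henry.lua is hypothetical. require does essentially the same with a file it finds on package.path, and additionally caches the result.

```lua
-- Contents of a hypothetical data file, e.g. henry.lua.
local source = [[
return {
	name = "Henry",
	position = {x=0, y=0},
}
]]

-- load compiles the text into a function without running it.
-- "t" restricts it to text chunks (no precompiled bytecode).
local chunk = assert(load(source, "henry.lua", "t"))

-- Calling the chunk runs it; the value of its final return is the "module".
local henry = chunk()

print(henry.name, henry.position.x, henry.position.y)
```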

In the example above we cannot really tell what kind of thing we are dealing with. To stay consistent we could add type = "slime", to the table. Now the reader would know what they are dealing with. Of course, the scheme would still be assumed on the user side. Building an object from there could be a straightforward table lookup. Alternatively, we could rely on the function call syntactic sugar and prefix the definitions with types:

return Slime{
	name = "Henry",
	position = Vec2{x=0, y=0},
}

This too is a table lookup, but the responsibility shifts a bit from the user implementation to the execution environment. Loading becomes trickier this way, but the extra preparation can be desirable, e.g., to cause errors in an unknown or otherwise unintended environment. If we simply want to make it run, defining global callables Slime and Vec2 is enough here.
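Slime{...} is sugar for Slime({...}), so the data file now calls whatever Slime and Vec2 happen to be in its environment. A minimal sketch, assuming Lua 5.2 or later; the two constructors are hypothetical and only tag the table they receive. With require they would have to be real globals; here they are passed as the environment argument of load instead.

```lua
local source = [[
return Slime{
	name = "Henry",
	position = Vec2{x=0, y=0},
}
]]

-- Hypothetical constructors, for illustration only.
local env = {
	Slime = function(t) t.type = "slime"; return t end,
	Vec2  = function(t) t.type = "vec2";  return t end,
}

-- The fourth argument of load becomes the chunk's global environment,
-- so the names Slime and Vec2 are looked up in env.
local chunk = assert(load(source, "henry.lua", "t", env))
local henry = chunk()
print(henry.type, henry.position.type)

-- In an environment without those names the chunk still compiles,
-- but running it fails: Slime is nil, and calling nil is an error.
local bare = assert(load(source, "henry.lua", "t", {}))
print(pcall(bare))
```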

[Image: Henry the slime]

It is somewhat similar to some use cases from the history of Lua. The main common part is the syntactic sugar, but this feels like a stretch as the sugar can be observed quite often in "regular" Lua code. Let's go one step further and remove the return:

Slime {
	name = "Henry",
	position = Vec2{x=0, y=0},
}

It now looks like some generic markup language. But the module loading mechanism will no longer work for us, because the script no longer returns anything. Instead, the reader needs to use load. Conveniently, it has an option to specify the execution environment (since Lua 5.2). Additionally, a mechanism for tracking top-level statements in one way or another is needed.
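One way to do the tracking is to let each top-level call append to a list owned by the loader. A minimal sketch, assuming Lua 5.2 or later; the read function and the second slime are made up for illustration.

```lua
local source = [[
Slime {
	name = "Henry",
	position = Vec2{x=0, y=0},
}
Slime {
	name = "Henrietta",
	position = Vec2{x=1, y=2},
}
]]

-- Hypothetical loader: runs the text and collects every Slime statement.
local function read(text)
	local items = {}
	local env = {
		Vec2 = function(t) return t end,
		Slime = function(t)
			t.type = "slime"
			items[#items + 1] = t -- side effect: remember this statement
		end,
	}
	local chunk = assert(load(text, "slimes", "t", env))
	chunk() -- nothing is returned; the result lives in items
	return items
end

local slimes = read(source)
for i, slime in ipairs(slimes) do
	print(i, slime.name, slime.position.x, slime.position.y)
end
```

Since env contains nothing else, the data file cannot reach print, os, or any other standard global.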

This approach is somewhat similar to what Premake does. Surprisingly, it is also pretty close to the regular register-event-callback approach of plugin systems (e.g., in vis). How so? "Tracking top-level statements" results in a side effect on some global or loader state. In the callback approach, it is the event dispatcher, or the plugin system state in general, that fulfils a similar role. Additionally, the API is usually exposed through the environment (and not, e.g., through a user function argument, with the plugin returned as a module that is never able to interact with the API directly).
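The parallel can be sketched as follows, assuming Lua 5.2 or later. The on, log, and emit names are invented for this example and are not the API of vis or of any other real plugin system.

```lua
-- A hypothetical plugin: top-level statements that register callbacks.
local plugin_source = [[
on("open", function(name) log("opened " .. name) end)
on("close", function(name) log("closed " .. name) end)
]]

local handlers = {} -- dispatcher state, filled as a side effect of loading
local messages = {} -- stands in for whatever the host does with log output

-- The API is exposed through the environment, not through arguments.
local env = {
	on = function(event, handler)
		handlers[event] = handlers[event] or {}
		table.insert(handlers[event], handler)
	end,
	log = function(message) table.insert(messages, message) end,
}

-- Loading the plugin runs its top-level statements once.
assert(load(plugin_source, "plugin", "t", env))()

-- Later, the host dispatches events to whatever was registered.
local function emit(event, ...)
	for _, handler in ipairs(handlers[event] or {}) do
		handler(...)
	end
end

emit("open", "notes.txt")
print(messages[1])
```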

For my standards definitions I settled on the last style. It allows for multiple items without indentation, and I liked the idea at the time. It allowed me to play around and neatly layer the parser, environment, and model. The final definitions looked like this:

scheme "headers/1"
aliases "ANSI C" {"ANSI X3.159-1989", "C89", "C90", "ISO/IEC 9899:1990"}
headers "ANSI C" {
	"assert.h",
	"ctype.h",
	...
}
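Lua parses aliases "ANSI C" {...} as aliases("ANSI C")({...}), so each such keyword can be a function that returns another function. The following is a sketch of how an environment for these definitions might be built, assuming Lua 5.2 or later; it is not the code from the actual project, the model layout is invented, and the lists are shortened.

```lua
local source = [[
scheme "headers/1"
aliases "ANSI C" {"C89", "C90"}
headers "ANSI C" {
	"assert.h",
	"ctype.h",
}
]]

-- Hypothetical reader: fills a plain table (the model) from the definitions.
local function read(text)
	local model = {scheme = nil, aliases = {}, headers = {}}
	local env = {
		scheme = function(name) model.scheme = name end,
		-- First call receives the standard's name, second call the list.
		aliases = function(standard)
			return function(names)
				for _, alias in ipairs(names) do
					model.aliases[alias] = standard
				end
			end
		end,
		headers = function(standard)
			return function(list) model.headers[standard] = list end
		end,
	}
	assert(load(text, "definitions", "t", env))()
	return model
end

local model = read(source)
print(model.scheme, model.aliases["C90"], #model.headers["ANSI C"])
```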

It allowed for more complex structures with include and remove, for example:

headers "C++20" {
	include "C++17",
	remove "ciso646",
	"concepts",
	...
}
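One possible way to support include and remove is to have them return small marker tables and to apply those markers when a standard is resolved. This is a sketch under that assumption (Lua 5.2 or later), not the project's actual implementation; the header lists are deliberately shortened and the read, resolve, and sorted helpers are hypothetical.

```lua
local source = [[
headers "C++17" {
	"ciso646",
	"vector",
}
headers "C++20" {
	include "C++17",
	remove "ciso646",
	"concepts",
}
]]

local function read(text)
	local definitions = {}
	local env = {
		headers = function(standard)
			return function(list) definitions[standard] = list end
		end,
		-- include and remove only produce markers; resolve applies them.
		include = function(standard) return {op = "include", name = standard} end,
		remove = function(header) return {op = "remove", name = header} end,
	}
	assert(load(text, "definitions", "t", env))()
	return definitions
end

-- Walks the entries in order and builds a set of header names.
local function resolve(definitions, standard, set)
	set = set or {}
	local list = assert(definitions[standard], "unknown standard: " .. standard)
	for _, entry in ipairs(list) do
		if type(entry) == "string" then
			set[entry] = true
		elseif entry.op == "include" then
			resolve(definitions, entry.name, set)
		elseif entry.op == "remove" then
			set[entry.name] = nil
		end
	end
	return set
end

local function sorted(set)
	local names = {}
	for name in pairs(set) do names[#names + 1] = name end
	table.sort(names)
	return names
end

local result = sorted(resolve(read(source), "C++20"))
print(table.concat(result, "\n"))
```

Because entries are applied in order, a remove only affects headers added before it. The sketch does not guard against standards that include each other in a cycle.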

See headers for the full source code. The command allowed me to get the list of headers and join them into a regex:

$ headers C11 POSIX
aio.h
arpa/inet.h
assert.h
...

Except that I never joined them into a regex. After some consideration and discussion we decided to use Preserve instead of Regroup, so that we wouldn't need to bother with any of the costs of grouping includes automatically.
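For completeness, the joining step that was never written could have looked roughly like this. It is a sketch with hypothetical helper names, assuming the resulting regex is matched against the include text together with its angle brackets, which is how clang-format's IncludeCategories patterns are usually written.

```lua
-- Escapes characters that are special in an extended regular expression.
local function escape(name)
	local escaped = name:gsub("[%^%$%.%*%+%?%(%)%[%]{}|\\]", "\\%0")
	return escaped -- drop the second return value of gsub (the match count)
end

-- Joins header names into one alternation anchored at both ends.
local function to_regex(names)
	local parts = {}
	for i, name in ipairs(names) do
		parts[i] = escape(name)
	end
	return "^<(" .. table.concat(parts, "|") .. ")>$"
end

local regex = to_regex{"aio.h", "arpa/inet.h", "assert.h"}
print(regex)
```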

I feel like re-implementing this in SQL.