From cce90246757b4567888de0889663e3e3c53dca40 Mon Sep 17 00:00:00 2001
From: Aki
Date: Wed, 28 Aug 2024 13:09:26 +0200
Subject: Published Lua serialization thing

Lua as Human-Readable Serialization Format


It was that time of year again, and I was asked to prepare a definition for clang-format. I had a style guide document to work with, so the task was rather straightforward. Similarly to the Google C++ Style Guide, it required headers to be grouped into: C system and standard headers, C++ standard library headers, other libraries' headers, and project headers. So my first thought was to use IncludeBlocks: Regroup with the default include categories.

That was until I noticed that they use angle brackets together with a .h extension for other libraries' headers. Of course, these matched the default C-group regex. The strictest, and the second easiest, solution here is to make the regex a list of alternatives that names every C header explicitly. This requires some maintenance, so I decided to have a bit of fun with it.
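Here is a minimal sketch of that strict approach: escape each header name and join the names into one anchored alternation. It assumes that clang-format matches the include spelling including its angle brackets, as the Google style's default `^<.*\.h>` category suggests. `escape_regex` and `headers_regex` are hypothetical helper names.

```lua
-- Backslash-escape characters that are special in extended regular
-- expressions (assumption: IncludeCategories use an ERE-like dialect).
local function escape_regex(s)
  -- Parentheses keep only the first result of gsub (the new string).
  return (s:gsub("[%.%[%]%(%)%*%+%?%{%}%|%^%$\\]", "\\%0"))
end

-- Join header names into a single anchored alternation such as
-- ^<(assert\.h|ctype\.h)>$ that matches only the listed headers.
local function headers_regex(names)
  local parts = {}
  for i, name in ipairs(names) do
    parts[i] = escape_regex(name)
  end
  return "^<(" .. table.concat(parts, "|") .. ")>$"
end

print(headers_regex({ "assert.h", "ctype.h", "arpa/inet.h" }))
-- prints: ^<(assert\.h|ctype\.h|arpa/inet\.h)>$
```

The maintenance cost is the list itself. Every new header has to be added somewhere before it lands in the right group.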

A somewhat common practice for serializing data for later use in Lua scripts looks similar to this:

return {
	name = "Henry",
	position = {x=0, y=0},
}

This relies on how modules, and importing them, work. In short, the module script is executed, and the value of its return statement becomes the value of the "module" (what require hands back). In this case, the script is a lone return statement with no logic involved. This somewhat declarative style is purely a convention. More commonly, the returned value is a table of functions: exactly what we would consider a "normal module", or a class.
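To keep the illustration self-contained, the sketch below evaluates the same text with `load` instead of `require`, so no file is needed. The chunk's return value is the data. It assumes Lua 5.2 or newer, where `load` accepts a mode and an environment table.

```lua
local source = [[
return {
	name = "Henry",
	position = {x=0, y=0},
}
]]

-- Mode "t" accepts only text (no precompiled bytecode), and the empty
-- environment table keeps the data file away from all globals.
local chunk = assert(load(source, "=henry", "t", {}))

-- Running the chunk yields whatever its return statement produces.
local henry = chunk()
print(henry.name .. " at " .. henry.position.x .. "," .. henry.position.y)
-- prints: Henry at 0,0
```

Saved as henry.lua somewhere on the module path, `require("henry")` would return the same table. The difference is that it would run with the regular global environment.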

In the example above, we cannot really tell what kind of thing we are dealing with. To stay consistent, we could add type = "slime", to the table. Now the reader would know what they are dealing with. Of course, the scheme would still be assumed on the user side. Building an object from there could be a straightforward table lookup. Alternatively, we could rely on the function call syntactic sugar (f{...} is the same as f({...})) and prefix the definitions with types:

return Slime{
	name = "Henry",
	position = Vec2{x=0, y=0},
}

This too is a table lookup, but the responsibility shifts a bit from the user implementation to the execution environment. Loading becomes trickier this way, but the added setup can be desirable, e.g., to cause errors in an unknown or otherwise unintended environment. If we simply want to make it run, defining global callables named Slime and Vec2 is enough.

[Image: Henry the slime]
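Here is a minimal sketch of "making it run", assuming Lua 5.2 or newer. Instead of real globals, `Slime` and `Vec2` are placed in the environment table passed to `load`. Both constructors are hypothetical. Because `Slime{...}` is sugar for `Slime({...})`, each one is an ordinary function that takes a table. Loading the same text with an empty environment fails, which is the kind of "unintended environment" error described above.

```lua
-- Hypothetical constructors; Vec2{x=0, y=0} calls Vec2({x=0, y=0}).
local function Vec2(t)
  return { x = t.x or 0, y = t.y or 0 }
end

local function Slime(t)
  return { type = "slime", name = t.name, position = t.position or Vec2{} }
end

local source = [[
return Slime{
	name = "Henry",
	position = Vec2{x=0, y=0},
}
]]

-- The execution environment decides what Slime and Vec2 mean.
local env = { Slime = Slime, Vec2 = Vec2 }
local henry = assert(load(source, "=henry", "t", env))()
print(henry.type .. ": " .. henry.name)  -- prints: slime: Henry

-- Compiling succeeds without the constructors, but running the chunk
-- raises an error because they are nil in this environment.
local ok = pcall(assert(load(source, "=henry", "t", {})))
print(ok)  -- prints: false
```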

It is somewhat similar to some use cases from the history of Lua. The main thing in common is the syntactic sugar, but that feels like a stretch, as the same sugar shows up quite often in "regular" Lua code. Let's go one step further and remove the return:

Slime {
	name = "Henry",
	position = Vec2{x=0, y=0},
}

It now looks like a generic markup language. However, the module loading mechanism no longer works for us, since the script returns nothing. Instead, the loading code needs to use load, which conveniently has an option to specify the execution environment. Additionally, some mechanism for tracking top-level statements is needed.
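The following is a minimal sketch of such a loader, assuming Lua 5.2 or newer. The hypothetical constructors record each `Slime {...}` in a list owned by the loader. This lets a file describe several entities without a return statement. The second entity ("Jake") is made up for the example.

```lua
-- Sketch assuming Lua 5.2+: the script returns nothing, so every
-- Slime {...} call records its result in loader-owned state.
local function load_entities(source)
  local entities = {}  -- filled as a side effect of running the script
  local env = {
    Vec2 = function(t)
      return { x = t.x or 0, y = t.y or 0 }
    end,
    Slime = function(t)
      local entity = { type = "slime", name = t.name, position = t.position }
      table.insert(entities, entity)  -- record every Slime the script creates
      return entity
    end,
  }
  assert(load(source, "=entities", "t", env))()
  return entities
end

local entities = load_entities([[
Slime {
	name = "Henry",
	position = Vec2{x=0, y=0},
}
Slime {
	name = "Jake",
	position = Vec2{x=3, y=1},
}
]])

print(#entities .. " entities, last: " .. entities[2].name)
-- prints: 2 entities, last: Jake
```

Consecutive `Slime {...}` statements need no separators. An identifier cannot continue the previous call expression, so Lua starts a new statement.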

This approach is somewhat similar to what Premake does. Surprisingly, it is also pretty close to the common register-event-callback approach used by plugin systems (e.g., in vis). How so? "Tracking top-level statements" results in a side effect on some global or loader state. In the callback approach, the event dispatcher, or the plugin system state in general, fulfils a similar role. Additionally, the API is usually exposed through the environment. It is not, e.g., passed as an argument to a user function, and the plugin is not a returned module that can never interact with the API directly.
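To make the parallel concrete, here is a hypothetical plugin host in the register-event-callback style. It is not the API of vis or any other real editor, and it assumes Lua 5.2 or newer. Running the plugin only registers a callback; the handler table plays the same role as the entity list in a loader.

```lua
-- Hypothetical plugin host; `on` and `say` are illustrative names.
local handlers = {}  -- event name -> list of callbacks (dispatcher state)
local log = {}

-- The API lives in the plugin's environment, not in function arguments.
local env = {
  on = function(event, fn)
    handlers[event] = handlers[event] or {}
    table.insert(handlers[event], fn)
  end,
  say = function(msg)
    table.insert(log, msg)
  end,
}

-- Call every callback registered for an event, in registration order.
local function emit(event, ...)
  for _, fn in ipairs(handlers[event] or {}) do
    fn(...)
  end
end

-- The plugin's only top-level statement is a registration side effect.
local plugin = [[
on("open", function(name)
	say("opened " .. name)
end)
]]
assert(load(plugin, "=plugin", "t", env))()

emit("open", "notes.txt")
print(log[1])  -- prints: opened notes.txt
```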

For my standards definitions, I settled on the last style. It allows multiple items without indentation, and I liked the idea at the time. It also let me play around and neatly layer the parser, the environment, and the model. The final definitions looked like this:

scheme "headers/1"
aliases "ANSI C" {"ANSI X3.159-1989", "C89", "C90", "ISO/IEC 9899:1990"}
headers "ANSI C" {
	"assert.h",
	"ctype.h",
	...
}
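Here is a minimal sketch of an environment layer for definitions like these, assuming Lua 5.2 or newer. `headers "ANSI C" {...}` parses as `headers("ANSI C")({...})`, so each command takes the name and returns a function that receives the table. The model layout and the scheme check are assumptions, not the actual headers implementation. The header list is cut down to two entries.

```lua
-- Hypothetical loader: commands write into a plain model table.
local function load_definitions(source)
  local model = { scheme = nil, aliases = {}, headers = {} }
  local env = {
    -- Refuse files written for a different scheme version.
    scheme = function(name)
      if name ~= "headers/1" then
        error("unsupported scheme: " .. tostring(name))
      end
      model.scheme = name
    end,
    -- aliases "target" {...}: map every alias name to the target name.
    aliases = function(target)
      return function(names)
        for _, alias in ipairs(names) do
          model.aliases[alias] = target
        end
      end
    end,
    -- headers "name" {...}: store the list under the standard's name.
    headers = function(name)
      return function(list)
        model.headers[name] = list
      end
    end,
  }
  assert(load(source, "=definitions", "t", env))()
  return model
end

local model = load_definitions([[
scheme "headers/1"
aliases "ANSI C" {"ANSI X3.159-1989", "C89", "C90", "ISO/IEC 9899:1990"}
headers "ANSI C" {
	"assert.h",
	"ctype.h",
}
]])

print(string.format("%s %s %d", model.scheme, model.aliases["C89"],
  #model.headers["ANSI C"]))
-- prints: headers/1 ANSI C 2
```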

It also allowed for more complex structures with include and remove, for example:

headers "C++20" {
	include "C++17",
	remove "ciso646",
	"concepts",
	...
}
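Inside the table constructor, `include "C++17"` and `remove "ciso646"` are ordinary calls, so they can return tagged markers that get resolved afterwards. Below is a hypothetical resolver, not the actual headers implementation, that applies entries in order and rejects include cycles. The C++17 and C++20 lists are abbreviated and illustrative.

```lua
-- Hypothetical marker constructors used inside a headers table.
local function include(name) return { op = "include", name = name } end
local function remove(name) return { op = "remove", name = name } end

-- Resolve a standard into a set of header names, applying entries in order.
local function resolve(defs, name, visiting)
  visiting = visiting or {}
  local entries = defs[name]
  if not entries then error("unknown standard: " .. name) end
  if visiting[name] then error("include cycle at " .. name) end
  visiting[name] = true
  local set = {}
  for _, entry in ipairs(entries) do
    if type(entry) == "string" then
      set[entry] = true                      -- plain header: add it
    elseif entry.op == "include" then
      for header in pairs(resolve(defs, entry.name, visiting)) do
        set[header] = true                   -- merge the included standard
      end
    elseif entry.op == "remove" then
      set[entry.name] = nil                  -- drop a header added earlier
    end
  end
  visiting[name] = nil
  return set
end

-- Abbreviated, illustrative lists.
local defs = {
  ["C++17"] = { "ciso646", "optional", "variant" },
  ["C++20"] = { include "C++17", remove "ciso646", "concepts" },
}

local cxx20 = resolve(defs, "C++20")
print(cxx20.concepts, cxx20.optional, cxx20.ciso646)
```

Because entries are processed in order, `remove` only affects headers added before it. That matches the include-then-remove layout of the definition.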

See headers for the full source code. The command let me get the list of headers and join them into a regex:

$ headers C11 POSIX
aio.h
arpa/inet.h
assert.h
...
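The query behind a command like `headers C11 POSIX` could be as simple as resolving aliases, taking the union of the lists, and sorting. This is a hypothetical sketch over a hand-written model. The `query` name, the alias entry, and the header lists are illustrative only.

```lua
-- Hypothetical query: resolve each name through the alias table, take
-- the union of the header lists, and return the result sorted.
local function query(model, names)
  local set = {}
  for _, name in ipairs(names) do
    local canonical = model.aliases[name] or name
    local list = model.headers[canonical]
    if not list then error("unknown standard: " .. name) end
    for _, header in ipairs(list) do
      set[header] = true  -- a set removes duplicates such as assert.h
    end
  end
  local result = {}
  for header in pairs(set) do
    result[#result + 1] = header
  end
  table.sort(result)  -- plain string order, as in the listing above
  return result
end

local model = {
  aliases = { ["ISO/IEC 9899:2011"] = "C11" },
  headers = {
    C11 = { "assert.h", "stdatomic.h" },
    POSIX = { "aio.h", "arpa/inet.h", "assert.h" },
  },
}

for _, header in ipairs(query(model, { "C11", "POSIX" })) do
  print(header)
end
-- prints:
-- aio.h
-- arpa/inet.h
-- assert.h
-- stdatomic.h
```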

Except that I never joined them into a regex. After weighing everything and some discussion, we decided to use IncludeBlocks: Preserve instead of Regroup, so that we wouldn't need to bother with any of the costs of grouping includes automatically.

I feel like re-implementing this in SQL.
