-rw-r--r-- environments_in_lua_5_2_and_beyond.html | 172
-rw-r--r-- lua_as_human_readable_serialization_format.html | 26
2 files changed, 126 insertions, 72 deletions
diff --git a/environments_in_lua_5_2_and_beyond.html b/environments_in_lua_5_2_and_beyond.html
index 3c6469c..722645a 100644
--- a/environments_in_lua_5_2_and_beyond.html
+++ b/environments_in_lua_5_2_and_beyond.html
@@ -17,80 +17,134 @@
 </header>
 <article>
-<p>Environments are a way of dealing with various problems. Or creating them entirely on your own. Primarily, they are
-used to isolate a selected part of a program. As Lua is meant to be used as an embedded language, you may find yourself
-wanting to separate user created addons from more internal scripts. In short: sandboxing and overall security.
-<p>While I will focus on environments in Lua I won't go too deeply into the implementation details. I'll try to focus
-more on design and general overview with minor thoughts on syntax and the inner workings. I think you might find this
-text interesting no matter if you know or are interested in Lua itself.
-<p>Previously, we had <code><a href="https://www.lua.org/manual/5.1/manual.html#pdf-setfenv">setfenv</a></code> and
-related functions to do exactly that. With some tricks, debug library, and faith, you could do real magic, which is kind
-of cool. However, magic can easily become arcane, thus unclear.
-<p>With Lua 5.2 <code>setfenv</code> and related were removed in favour of a new approach. This one uses a simple local
-variable with name <code>_ENV</code>. Luckily, this approach can also be fun. It has other benefits over the old one,
-but the goal here is not a comparison.
-<p>One more thing before we hop into examples: I strongly encourage to read some documents on Lambda Calculus. They
-will give you quite good overview over what is happening with closures/anonymous functions/lambda expressions/whatever
-they are called. It's a good foundation and an entertaining exercise. This way you can quickly draw similarities,
-considering that the Calculus is easier to grasp than the most of language-syntax-related shenanigans.</p>
+<p>Environments in Lua can be used to isolate selected parts of a program, expose an API, <a href="">create a markup
+format</a>, widely modify behaviour at runtime, and many more. They can also create a lot of problems. Overall, we can
+summarize it as "sandboxing."
+<p>I will focus on Lua, but I won't go too deep into language or implementation details. Instead, I want to provide an
+overview on design and essentials with minor points on syntax and the inner workings. This way I aim to make this
+document useful to you whether you are or not a Lua user.
+<p>Before Lua 5.2, we had <code><a href="https://www.lua.org/manual/5.1/manual.html#pdf-setfenv">setfenv</a></code>
+family of functions for operating on function environments directly. With some tricks and debug library we could do real
+magic. This approach was very versatile but could also very easily become unclear. It was also almost completely
+disconnected from the language.
+<p>In Lua 5.2 <code>setfenv</code> family was removed in favour of a new approach centred around <code>_ENV</code>
+variable. This approach is usually called <em>Lexical Environments</em> and it is the main focus of this document.
+<p>I strongly encourage everyone to learn lambda calculus. Some vocabulary will be borrowed from it and the design and
+behaviour of Lua resembles it at many levels. Additionally, it gives a good overview over what is happening with
+closures, anonymous functions, lambda expressions, or whatever else a programming language can call an ad hoc function.
+It may also serve as a way to build better intuition about scoping rules in different languages.</p>
 <img src="environments_in_lua_5_2_and_beyond-1.png" alt="lambda">
 <h2>Terms</h2>
-<p>Now then, let's have a simple example to define few selected terms:</p>
+<p>Now then, let's consider a code sample and define selected terms:</p>
+<style>
+.merged pre {
+  margin-top: 0;
+  margin-bottom: 0;
+  padding-top: 0;
+  padding-bottom: 0;
+}
+.merged pre:first-child { padding-top: 0.4em; }
+.merged pre:last-child { padding-bottom: 0.4em; }
+span.decl { font-weight: bold; }
+span.use { text-decoration: underline; }
+span.a { background-color: #c7cdee; }
+span.b { background-color: #f6f4cd; }
+span.f { background-color: #c38df9; }
+</style>
+<div class="merged">
+<pre style="background: #c7cdee;">
+local <span class="decl">OFFSET_ERROR</span> = 0.97731
+local function <span class="decl">calibrate</span> (<span class="decl">value</span>, <span class="decl">ratio</span>, <span class="decl">offset</span>)
+</pre>
+<pre style="background: #f6f4cd; border-left: 1rem solid #c7cdee; padding-left: 0;">
+  local <span class="decl">real_offset</span> = <span class="use">offset</span> * <span class="use">OFFSET_ERROR</span>
+  <span class="use">print</span>("offset:", <span class="use">real_offset</span>)
+  return <span class="use">value</span> * <span class="use">ratio</span> + <span class="use">real_offset</span>
+</pre>
+<pre style="background: #c7cdee;">
+end
+</pre>
+</div>
+<p>Blocks marked with blue and yellow are called "chunks" in Lua. In this case blue is also considered as a special case
+of a "main chunk" assuming it is a file fed into Lua. Yellow block is body of <b>calibrate</b> function. We can call it
+"function body" and it is still considered as a "chunk". Yellow is not a "main chunk."
+<p>In general, chunks are also lexical scopes for variables and it's fine to use these terms interchangeably. I'll do
+that. Differentiation between these may matter, but only in the context of Lua loading internals as "chunk" has a more
+specific meaning in there.
+<p>I marked names of variables that are being declared or assigned in bold. Names that are underlined are being used or
+otherwise referenced. And so we can observe <code>OFFSET_ERROR</code> being declared as a local and assigned a value in
+the first line and then used in multiplication in third. In the second line variable <code>calibrate</code> is assigned
+with a function as its value. These two syntax forms are explained in detail in
+<a href="https://www.lua.org/manual/5.4/manual.html#3.3.7">local declaration</a> and
+<a href="https://www.lua.org/manual/5.4/manual.html#3.4.11">function definition</a>.
+<p>Both <code>OFFSET_ERROR</code> and <code>calibrate</code> use <code>local</code> keyword making these names available
+only to the main chunk and its nested blocks. Of course, this makes them available in <b>calibrate</b>'s body. There's a
+little bit more to how syntax around local function definition works but no meaningful problems occur in this particular
+case, so I'll ignore it.
+<p>Function arguments are automatically declared as local variables in their respective function scope. Other than that
+from variables in function body we have <code>real_offset</code> local declaration. Similarly, because it is local, it
+will be available only to function body and its nested blocks. This means main chunk cannot access it.
+<p>Now, let's do a simple exercise and colour code variable references and their declarations:
 <pre>
--- Start of chunk's body
-local OFFSET_ERROR = 0.97731
-local
-function calibrate (value, ratio, offset)
-  -- Start of function's body; not part of chunk's body
-  local real_offset = offset * OFFSET_ERROR
-  print("offset:", real_offset)
-  return value * ration + real_offset
-  -- End of function's body
+local <span class="a">OFFSET_ERROR</span> = 0.97731
+local function calibrate (<span class="b">value</span>, <span class="b">ratio</span>, <span class="b">offset</span>)
+  local <span class="b">real_offset</span> = <span class="b use">offset</span> * <span class="a use">OFFSET_ERROR</span>
+  <span class="f use">print</span>("offset:", <span class="b use">real_offset</span>)
+  return <span class="b use">value</span> * <span class="b use">ratio</span> + <span class="b use">real_offset</span>
 end
--- End of chunk's body
 </pre>
-<p>Surprisingly, there's a lot of going on in here. First off, we have two scopes: "chunk's body", and "function's
-body". "Chunk's body" has two local variables: <code>OFFSET_ERROR</code> (that acts as a constant), and
-<code>calibrate</code> (a function). In turn, "function's body" has four local variables: <code>value</code>,
-<code>ratio</code>, <code>offset</code> (those three are the arguments for the function), and <code>real_offset</code>
-(a temporary variable I added just to show that function body may also have explicit local variable). We will call all
-of those variables exactly in the way I already did: <em>local variables</em>.
-<p>In addition to the local variables, "function's body" also refers to two other names. First one is
-<code>OFFSET_ERROR</code>. We already know this one; it's a local variable from the chunk. A smaller scope that is
-inside another scope can refer to their local variables as they want. They are called <em>upvalues</em> then. This works
-on any level, no matter how deeply the scopes are nested, as long as it makes sense. It doesn't work the other way:
-outer scope referring to local variable in inner scope is a no-no.
-<p>Second external reference in "function's body" is <code>print</code>. We don't see it defined anywhere as a local
-variable. Commonly such variables are called <em>globals</em> or global variables. That's how we will call them.
-<p>Here's the part that can be slightly ambiguous: definition of "environment". I can see myself include in it all three
-types of variables I mentioned, only upvalues and globals, or just globals. Depending on the situation, all of these
-cases may be fitting, and looking at how other languages use the term, and how it's similar to bound variables in the
-Lambda Calculus; they are all good explanations. The Lua itself uses "environment" only to refer to the method that
-resolves references to globals.
+<p>Variables are used in three different ways here:
+<dl>
+<dt><code>value</code>, <code>ratio</code>, <code>offset</code>, <code>real_offset</code>
+<dd>are used and declared inside function body
+<dt><code>OFFSET_ERROR</code><dd>is used inside function body but is declared in the outer scope
+<dt><code>print</code><dd>is used inside function body but is never visibly declared
+</dl>
+<p>First case is our ideal case. Keyword <code>local</code> is telling us exactly what is happening: variables are
+declared and used only within the function body. They are local variables. They don't spill anywhere else unless they
+are passed to a function as an argument. <code>OFFSET_ERROR</code> does spill but only to <b>calibrate</b>'s body
+because it is a nested scope.
+<p>They behave similarly to <em>bound variables</em> from lambda calculus. <code>OFFSET_ERROR</code> is a bit closer to
+them in principle, but the idea is that source of their value is exactly known.
+<p>On the other hand, <code>print</code> behaves like a <em>free variable</em>. It is never declared as local and by
+default Lua considers such variables <em>global</em>. When global is encountered, <em>environment</em> is queried to
+provide a value for it.
+
+
 <h2>Upvalues</h2>
-<p>That being said, let's talk about the upvalues for a moment before we go to the globals. Upvalues are the closest
-thing you can have to a bound variable from the Lambda Calculus. They always bind "by reference", simply because the
-variable name is a direct reference to the variable and it's value. Whatever that means, see for yourself:</p>
+<p>Before jumping into <em>environments</em>, let's introduce one last term and talk about how <code>OFFSET_ERROR</code>
+propagates internally. Whenever a local from an outer scope is referenced in a chunk it's called an <em>upvalue</em>.
+Once referenced, they are bound to that particular chunk "by reference" and stored so they continue to live with it.
+<p>Upvalues implement core principle of closures in Lua. Consider two counters:
 <pre>
-local counter = 0
-local
-function increment ()
-  counter = counter + 1
+local init = 0
+local function new_counter ()
+  local x = init
+  return function ()
+    x = x + 1
+    return x
+  end
 end
-increment()
-increment()
-increment()
-print(counter)
+local a = new_counter()
+local b = new_counter()
+print(a()) -- 1
+print(a()) -- 2
+print(a()) -- 3
+print(b()) -- 1
 </pre>
-<p>This example will print out "3". Upvalues also play huge part in garbage collection due to their nature but that's
-none of the concerns of this article.
+<p>There are two instances of <code>x</code> created one for each call of <code>new_counter</code>. Each is bound to the
+anonymous function which is in turn returned. This sequence can be also interpreted as a construction of function-like
+object.
+<p><code>Init</code> is bound only to the <code>new_counter</code> and is not bound to the anonymous function.
+
+
 <h2>Environment</h2>
 <p>Back to the main topic! Quick reminder: in Lua "environment" is used to resolve references to names that are not
 local variables or upvalues. In other words it's a way to deal with free variables when the program executes.
 <p>"Environments" are associative tables. They link global variable names to actual variables. The environments
 themselves are bound to functions as upvalues called <code>_ENV</code>, whenever they are needed. It's done implicitly;
 quietly in the background. This means that the <code>calibrate</code> function from the first example actually has two
-upvalues: <code>OFFEST_ERROR</code>, and <code>_ENV</code>. <code>_ENV</code> by default took as its value a table that
+upvalues: <code>OFFSET_ERROR</code>, and <code>_ENV</code>. <code>_ENV</code> by default took as its value a table that
 was used as global environment at that time. If <code>calibrate</code> wouldn't use <code>print</code>,
 <code>_ENV</code> wouldn't be there at all.
 <p>This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables
diff --git a/lua_as_human_readable_serialization_format.html b/lua_as_human_readable_serialization_format.html
index 1802aa1..75e6117 100644
--- a/lua_as_human_readable_serialization_format.html
+++ b/lua_as_human_readable_serialization_format.html
@@ -33,10 +33,10 @@ return {
   position = {x=0, y=0},
 }
 </pre>
-<p>This makes use of how <a href="https://www.lua.org/manual/5.4/manual.html#6.3">modules</a> and importing them works.
-In short, module script is interpreted and value of the final <code>return</code> is used as the value for the "module".
-In this case the script is a lone return-statement with no logic involved. This somewhat declarative-like style is
-purely conventional. More commonly the returned value is a table of functions, exactly what we would consider a "normal
+<p>This makes use of how <a href="https://www.lua.org/manual/5.4/manual.html#6.3">modules</a> are imported. In short,
+module script is interpreted and value of the final <code>return</code> is used as the value for the "module". In this
+case the script is a lone return-statement with no logic involved. This somewhat declarative-like style is purely
+conventional. More commonly the returned value is a table of functions, exactly what we would consider a "normal
 module", or a class.
 <p>In the example above we are not really sure what kind of thing we are dealing with. To stay consistent we could add a
 <code>type = "slime",</code> to the table. Now reader would know what they are dealing with. Of course, scheme would
@@ -63,15 +63,15 @@ Slime {
   position = Vec2{x=0, y=0},
 }
 </pre>
-<p>It now looks like some generic markup language. But the module loading mechanism will no longer work for us. Instead
-reader needs to use <a href="https://www.lua.org/manual/5.4/manual.html#pdf-load">load</a>. It conveniently has an
-option to specify execution environment. Additionally, a mechanism for tracking top-level statements in one way or
-another is needed.
-<p>This approach is somewhat similar to what <a href="https://premake.github.io/">Premake</a> does. Surprisingly, this
-is also pretty close to regular register-event-callback approach for plugin systems (e.g., in
-<a href="https://github.com/martanne/vis">vis</a>). How so? The "tracking top-level statements" will result in a
-side-effect in some global or loader state. In callback approach, it's the event dispatcher or otherwise plugin system
-state that fulfils similar role. Additionally, API is usually exposed through environment (and not e.g., user function
+<p>It now looks like generic markup language but the module loading mechanism will no longer work for us. Instead
+reader needs to <a href="https://www.lua.org/manual/5.4/manual.html#pdf-load">load</a> it and execute. Functions
+handling data types here are expected to cause side-effects in order to record the entries. This may be coupled with a
+way of detecting top-level statements. Load conveniently has an option to specify execution environment.
+<p>This approach is somewhat similar to what <a href="https://premake.github.io/">Premake</a> does. And surprisingly,
+this is also pretty close to a regular register-event-callback approach that some plugin systems use (e.g.,
+<a href="https://github.com/martanne/vis">vis</a>). How so? Here, point is to modify loader state as side-effect. In
+callback approach, when plugin is loading it has access to register itself for certain events, modifying the plugin or
+event system's state. Additionally, API is usually exposed through an environment (and not e.g., user function
 argument and plugin returned as module never able to directly interact with the API).
 <p>For my standards definitions I settled for the last style. It allows for multiple items without indention and I liked
 the idea at the time. It allowed me to play around and neatly layer parser, environment, and model. Final definitions
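
Note on the mechanism this change documents: both articles rely on free names being resolved through an environment table, either supplied to load or declared as a local named _ENV. The following is only a minimal sketch of that idea, assuming Lua 5.2 or newer. The data script, the Slime and Vec2 handlers, the name field, the entries list, and the lookup function are hypothetical names made up for illustration; they are not taken from the articles' actual loader.

```lua
-- Minimal sketch, assuming Lua 5.2+ (load accepts an env argument and a
-- local named _ENV is honoured). All names below are illustrative only.
local source = [[
Slime {
  name = "green",
  position = Vec2{x=0, y=0},
}
]]

local entries = {}

-- The environment table is the only source of free names for the chunk.
local env = {
  Vec2 = function (t) t.type = "vec2"; return t end,
  Slime = function (t)
    t.type = "slime"
    entries[#entries + 1] = t -- side effect: record the top-level entry
    return t
  end,
}

-- load(chunk, chunkname, mode, env): mode "t" accepts text chunks only.
local chunk = assert(load(source, "=data", "t", env))
chunk()

-- A local named _ENV redirects free names inside its scope.
local function lookup ()
  local _ENV = { greeting = "hi" }
  return greeting, print -- print is not in this table, so it is nil
end

local greeting, missing = lookup()
```

After the chunk runs, entries holds one table tagged as a slime with a nested vec2 position, and lookup returns "hi" and nil. Because the loaded script sees only Slime and Vec2, anything else it tries to reference resolves to nil, which is the sandboxing effect the first article describes.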