-rw-r--r-- environments_in_lua_5_2_and_beyond.html | 172
-rw-r--r-- lua_as_human_readable_serialization_format.html | 26
2 files changed, 126 insertions, 72 deletions
diff --git a/environments_in_lua_5_2_and_beyond.html b/environments_in_lua_5_2_and_beyond.html
index 3c6469c..722645a 100644
--- a/environments_in_lua_5_2_and_beyond.html
+++ b/environments_in_lua_5_2_and_beyond.html
@@ -17,80 +17,134 @@
</header>
<article>
-<p>Environments are a way of dealing with various problems. Or creating them entirely on your own. Primarily, they are
-used to isolate a selected part of a program. As Lua is meant to be used as an embedded language, you may find yourself
-wanting to separate user created addons from more internal scripts. In short: sandboxing and overall security.
-<p>While I will focus on environments in Lua I won't go too deeply into the implementation details. I'll try to focus
-more on design and general overview with minor thoughts on syntax and the inner workings. I think you might find this
-text interesting no matter if you know or are interested in Lua itself.
-<p>Previously, we had <code><a href="https://www.lua.org/manual/5.1/manual.html#pdf-setfenv">setfenv</a></code> and
-related functions to do exactly that. With some tricks, debug library, and faith, you could do real magic, which is kind
-of cool. However, magic can easily become arcane, thus unclear.
-<p>With Lua 5.2 <code>setfenv</code> and related were removed in favour of a new approach. This one uses a simple local
-variable with name <code>_ENV</code>. Luckily, this approach can also be fun. It has other benefits over the old one,
-but the goal here is not a comparison.
-<p>One more thing before we hop into examples: I strongly encourage to read some documents on Lambda Calculus. They
-will give you quite good overview over what is happening with closures/anonymous functions/lambda expressions/whatever
-they are called. It's a good foundation and an entertaining exercise. This way you can quickly draw similarities,
-considering that the Calculus is easier to grasp than the most of language-syntax-related shenanigans.</p>
+<p>Environments in Lua can be used to isolate selected parts of a program, expose an API, <a href="">create a markup
+format</a>, broadly modify behaviour at runtime, and much more. They can also create a lot of problems. Overall, we can
+summarize these uses as "sandboxing."
+<p>I will focus on Lua, but I won't go too deep into language or implementation details. Instead, I want to provide an
+overview of the design and essentials with minor points on syntax and the inner workings. This way I aim to make this
+document useful to you whether or not you are a Lua user.
+<p>Before Lua 5.2, we had the <code><a href="https://www.lua.org/manual/5.1/manual.html#pdf-setfenv">setfenv</a></code>
+family of functions for operating on function environments directly. With some tricks and the debug library we could do
+real magic. This approach was very versatile but could also very easily become unclear. It was also almost completely
+disconnected from the language.
+<p>In Lua 5.2 the <code>setfenv</code> family was removed in favour of a new approach centred around the <code>_ENV</code>
+variable. This approach is usually called <em>Lexical Environments</em> and it is the main focus of this document.
+<p>I strongly encourage everyone to learn lambda calculus. Some vocabulary will be borrowed from it and the design and
+behaviour of Lua resemble it at many levels. Additionally, it gives a good overview of what is happening with
+closures, anonymous functions, lambda expressions, or whatever else a programming language can call an ad hoc function.
+It may also serve as a way to build better intuition about scoping rules in different languages.</p>
<img src="environments_in_lua_5_2_and_beyond-1.png" alt="lambda">
<h2>Terms</h2>
-<p>Now then, let's have a simple example to define few selected terms:</p>
+<p>Now then, let's consider a code sample and define selected terms:</p>
+<style>
+.merged pre {
+ margin-top: 0;
+ margin-bottom: 0;
+ padding-top: 0;
+ padding-bottom: 0;
+}
+.merged pre:first-child { padding-top: 0.4em; }
+.merged pre:last-child { padding-bottom: 0.4em; }
+span.decl { font-weight: bold; }
+span.use { text-decoration: underline; }
+span.a { background-color: #c7cdee; }
+span.b { background-color: #f6f4cd; }
+span.f { background-color: #c38df9; }
+</style>
+<div class="merged">
+<pre style="background: #c7cdee;">
+local <span class="decl">OFFSET_ERROR</span> = 0.97731
+local function <span class="decl">calibrate</span> (<span class="decl">value</span>, <span class="decl">ratio</span>, <span class="decl">offset</span>)
+</pre>
+<pre style="background: #f6f4cd; border-left: 1rem solid #c7cdee; padding-left: 0;">
+ local <span class="decl">real_offset</span> = <span class="use">offset</span> * <span class="use">OFFSET_ERROR</span>
+ <span class="use">print</span>("offset:", <span class="use">real_offset</span>)
+ return <span class="use">value</span> * <span class="use">ratio</span> + <span class="use">real_offset</span>
+</pre>
+<pre style="background: #c7cdee;">
+end
+</pre>
+</div>
+<p>Blocks marked with blue and yellow are called "chunks" in Lua. In this case blue is also the special case of a
+"main chunk", assuming it is a file fed into Lua. The yellow block is the body of the <b>calibrate</b> function. We can
+call it a "function body" and it is still considered a "chunk". Yellow is not a "main chunk."
+<p>In general, chunks are also lexical scopes for variables and it's fine to use these terms interchangeably. I'll do
+that. The distinction between them may matter, but only in the context of Lua loading internals, where "chunk" has a
+more specific meaning.
+<p>I marked names of variables that are being declared or assigned in bold. Names that are underlined are being used or
+otherwise referenced. And so we can observe <code>OFFSET_ERROR</code> being declared as a local and assigned a value in
+the first line and then used in a multiplication in the third. In the second line the variable <code>calibrate</code> is
+assigned a function as its value. These two syntax forms are explained in detail in
+<a href="https://www.lua.org/manual/5.4/manual.html#3.3.7">local declaration</a> and
+<a href="https://www.lua.org/manual/5.4/manual.html#3.4.11">function definition</a>.
+<p>Both <code>OFFSET_ERROR</code> and <code>calibrate</code> use the <code>local</code> keyword, making these names
+available only to the main chunk and its nested blocks. Of course, this makes them available in <b>calibrate</b>'s body.
+There's a little bit more to how the syntax around local function definitions works but no meaningful problems occur in
+this particular case, so I'll ignore it.
+<p>Function arguments are automatically declared as local variables in their respective function scope. Apart from
+them, the only variable declared in the function body is the local <code>real_offset</code>. Similarly, because it is
+local, it will be available only to the function body and its nested blocks. This means the main chunk cannot access it.
+<p>Now, let's do a simple exercise and colour code variable references and their declarations:
<pre>
--- Start of chunk's body
-local OFFSET_ERROR = 0.97731
-local
-function calibrate (value, ratio, offset)
- -- Start of function's body; not part of chunk's body
- local real_offset = offset * OFFSET_ERROR
- print("offset:", real_offset)
- return value * ration + real_offset
- -- End of function's body
+local <span class="a">OFFSET_ERROR</span> = 0.97731
+local function calibrate (<span class="b">value</span>, <span class="b">ratio</span>, <span class="b">offset</span>)
+ local <span class="b">real_offset</span> = <span class="b use">offset</span> * <span class="a use">OFFSET_ERROR</span>
+ <span class="f use">print</span>("offset:", <span class="b use">real_offset</span>)
+ return <span class="b use">value</span> * <span class="b use">ratio</span> + <span class="b use">real_offset</span>
end
--- End of chunk's body
</pre>
-<p>Surprisingly, there's a lot of going on in here. First off, we have two scopes: "chunk's body", and "function's
-body". "Chunk's body" has two local variables: <code>OFFSET_ERROR</code> (that acts as a constant), and
-<code>calibrate</code> (a function). In turn, "function's body" has four local variables: <code>value</code>,
-<code>ratio</code>, <code>offset</code> (those three are the arguments for the function), and <code>real_offset</code>
-(a temporary variable I added just to show that function body may also have explicit local variable). We will call all
-of those variables exactly in the way I already did: <em>local variables</em>.
-<p>In addition to the local variables, "function's body" also refers to two other names. First one is
-<code>OFFSET_ERROR</code>. We already know this one; it's a local variable from the chunk. A smaller scope that is
-inside another scope can refer to their local variables as they want. They are called <em>upvalues</em> then. This works
-on any level, no matter how deeply the scopes are nested, as long as it makes sense. It doesn't work the other way:
-outer scope referring to local variable in inner scope is a no-no.
-<p>Second external reference in "function's body" is <code>print</code>. We don't see it defined anywhere as a local
-variable. Commonly such variables are called <em>globals</em> or global variables. That's how we will call them.
-<p>Here's the part that can be slightly ambiguous: definition of "environment". I can see myself include in it all three
-types of variables I mentioned, only upvalues and globals, or just globals. Depending on the situation, all of these
-cases may be fitting, and looking at how other languages use the term, and how it's similar to bound variables in the
-Lambda Calculus; they are all good explanations. The Lua itself uses "environment" only to refer to the method that
-resolves references to globals.
+<p>Variables are used in three different ways here:
+<dl>
+<dt><code>value</code>, <code>ratio</code>, <code>offset</code>, <code>real_offset</code>
+<dd>are used and declared inside function body
+<dt><code>OFFSET_ERROR</code><dd>is used inside function body but is declared in the outer scope
+<dt><code>print</code><dd>is used inside function body but is never visibly declared
+</dl>
+<p>The first case is our ideal case. The keyword <code>local</code> tells us exactly what is happening: variables are
+declared and used only within the function body. They are local variables. They don't spill anywhere else unless they
+are passed to a function as an argument. <code>OFFSET_ERROR</code> does spill but only to <b>calibrate</b>'s body
+because it is a nested scope.
+<p>They behave similarly to <em>bound variables</em> from lambda calculus. <code>OFFSET_ERROR</code> is a bit closer to
+them in principle, but the idea is that the source of their value is exactly known.
+<p>On the other hand, <code>print</code> behaves like a <em>free variable</em>. It is never declared as local and by
+default Lua considers such variables <em>global</em>. When a global is encountered, the <em>environment</em> is queried
+to provide a value for it.
+
+
<h2>Upvalues</h2>
-<p>That being said, let's talk about the upvalues for a moment before we go to the globals. Upvalues are the closest
-thing you can have to a bound variable from the Lambda Calculus. They always bind "by reference", simply because the
-variable name is a direct reference to the variable and it's value. Whatever that means, see for yourself:</p>
+<p>Before jumping into <em>environments</em>, let's introduce one last term and talk about how <code>OFFSET_ERROR</code>
+propagates internally. Whenever a local from an outer scope is referenced in a chunk, it's called an <em>upvalue</em>.
+Once referenced, upvalues are bound to that particular chunk "by reference" and stored so they continue to live with it.
+<p>Upvalues implement the core principle of closures in Lua. Consider two counters:
<pre>
-local counter = 0
-local
-function increment ()
- counter = counter + 1
+local init = 0
+local function new_counter ()
+ local x = init
+ return function ()
+ x = x + 1
+ return x
+ end
end
-increment()
-increment()
-increment()
-print(counter)
+local a = new_counter()
+local b = new_counter()
+print(a()) -- 1
+print(a()) -- 2
+print(a()) -- 3
+print(b()) -- 1
</pre>
-<p>This example will print out "3". Upvalues also play huge part in garbage collection due to their nature but that's
-none of the concerns of this article.
+<p>Two instances of <code>x</code> are created, one for each call of <code>new_counter</code>. Each is bound to the
+anonymous function which is in turn returned. This sequence can also be interpreted as the construction of a
+function-like object.
+<p><code>init</code> is bound only to <code>new_counter</code> and is not bound to the anonymous function.
+
+
<h2>Environment</h2>
<p>Back to the main topic! Quick reminder: in Lua "environment" is used to resolve references to names that are not
local variables or upvalues. In other words it's a way to deal with free variables when the program executes.
<p>"Environments" are associative tables. They link global variable names to actual variables. The environments
themselves are bound to functions as upvalues called <code>_ENV</code>, whenever they are needed. It's done implicitly;
quietly in the background. This means that the <code>calibrate</code> function from the first example actually has two
-upvalues: <code>OFFEST_ERROR</code>, and <code>_ENV</code>. <code>_ENV</code> by default took as its value a table that
+upvalues: <code>OFFSET_ERROR</code>, and <code>_ENV</code>. <code>_ENV</code> by default took as its value a table that
was used as the global environment at that time. If <code>calibrate</code> didn't use <code>print</code>,
<code>_ENV</code> wouldn't be there at all.
<p>This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables
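A minimal sketch of the claim in the hunk above that <code>calibrate</code> carries two upvalues. It assumes Lua 5.2 or later and a chunk compiled with debug information, since upvalue names are not available in stripped code. The helper `upvalue_names` and the function `add` are hypothetical names introduced only for this illustration and are not part of the patch; `debug.getupvalue` is the standard library call.

```lua
-- Assumption: Lua 5.2+ with debug information present.
local OFFSET_ERROR = 0.97731
local function calibrate (value, ratio, offset)
  local real_offset = offset * OFFSET_ERROR
  print("offset:", real_offset)          -- compiled as _ENV.print(...)
  return value * ratio + real_offset
end

-- Hypothetical helper: collect the names of all upvalues of f into a set.
local function upvalue_names (f)
  local names = {}
  local i = 1
  while true do
    local name = debug.getupvalue(f, i)  -- returns nothing past the last upvalue
    if not name then break end
    names[name] = true
    i = i + 1
  end
  return names
end

-- Hypothetical function that touches neither outer locals nor globals.
local function add (a, b)
  return a + b
end

local calibrate_upvalues = upvalue_names(calibrate)
local add_upvalues = upvalue_names(add)
```

Here `calibrate_upvalues` should contain both `OFFSET_ERROR` and `_ENV`, while `add_upvalues` should be empty because `add` never refers to a global, so nothing forces `_ENV` to be captured.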
diff --git a/lua_as_human_readable_serialization_format.html b/lua_as_human_readable_serialization_format.html
index 1802aa1..75e6117 100644
--- a/lua_as_human_readable_serialization_format.html
+++ b/lua_as_human_readable_serialization_format.html
@@ -33,10 +33,10 @@ return {
position = {x=0, y=0},
}
</pre>
-<p>This makes use of how <a href="https://www.lua.org/manual/5.4/manual.html#6.3">modules</a> and importing them works.
-In short, module script is interpreted and value of the final <code>return</code> is used as the value for the "module".
-In this case the script is a lone return-statement with no logic involved. This somewhat declarative-like style is
-purely conventional. More commonly the returned value is a table of functions, exactly what we would consider a "normal
+<p>This makes use of how <a href="https://www.lua.org/manual/5.4/manual.html#6.3">modules</a> are imported. In short,
+the module script is interpreted and the value of the final <code>return</code> is used as the value for the "module".
+In this case the script is a lone return statement with no logic involved. This somewhat declarative style is purely
+conventional. More commonly the returned value is a table of functions, exactly what we would consider a "normal
module", or a class.
<p>In the example above we are not really sure what kind of thing we are dealing with. To stay consistent we could add a
<code>type = "slime",</code> to the table. Now reader would know what they are dealing with. Of course, scheme would
@@ -63,15 +63,15 @@ Slime {
position = Vec2{x=0, y=0},
}
</pre>
-<p>It now looks like some generic markup language. But the module loading mechanism will no longer work for us. Instead
-reader needs to use <a href="https://www.lua.org/manual/5.4/manual.html#pdf-load">load</a>. It conveniently has an
-option to specify execution environment. Additionally, a mechanism for tracking top-level statements in one way or
-another is needed.
-<p>This approach is somewhat similar to what <a href="https://premake.github.io/">Premake</a> does. Surprisingly, this
-is also pretty close to regular register-event-callback approach for plugin systems (e.g., in
-<a href="https://github.com/martanne/vis">vis</a>). How so? The "tracking top-level statements" will result in a
-side-effect in some global or loader state. In callback approach, it's the event dispatcher or otherwise plugin system
-state that fulfils similar role. Additionally, API is usually exposed through environment (and not e.g., user function
+<p>It now looks like a generic markup language, but the module loading mechanism will no longer work for us. Instead,
+the reader needs to <a href="https://www.lua.org/manual/5.4/manual.html#pdf-load">load</a> it and execute it. Functions
+handling data types here are expected to cause side effects in order to record the entries. This may be coupled with a
+way of detecting top-level statements. Load conveniently has an option to specify the execution environment.
+<p>This approach is somewhat similar to what <a href="https://premake.github.io/">Premake</a> does. And surprisingly,
+this is also pretty close to a regular register-event-callback approach that some plugin systems use (e.g.,
+<a href="https://github.com/martanne/vis">vis</a>). How so? Here, the point is to modify loader state as a side effect.
+In the callback approach, a plugin that is loading can register itself for certain events, modifying the plugin or
+event system's state. Additionally, the API is usually exposed through an environment (and not e.g., user function
argument and plugin returned as module never able to directly interact with the API).
<p>For my standards definitions I settled for the last style. It allows for multiple items without indention and I liked
the idea at the time. It allowed me to play around and neatly layer parser, environment, and model. Final definitions
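A minimal sketch of the loading approach described in the hunk above, assuming Lua 5.2 or later, where `load` accepts an environment table as its fourth argument. The names `source`, `entries`, and `env`, the chunk name `"=definitions"`, and the bodies of `Slime` and `Vec2` are assumptions made for illustration; they are not taken from the patch or from the author's actual loader.

```lua
-- Assumption: Lua 5.2+; everything below is an illustrative reader, not the author's.
local source = [[
Slime {
  position = Vec2{x=0, y=0},
}
]]

-- Loader state that the type-handling functions modify as a side effect.
local entries = {}

-- The environment is the whole API visible to the loaded text.
local env = {
  Vec2 = function (t)
    return { x = t.x, y = t.y }
  end,
  Slime = function (t)
    t.type = "slime"
    entries[#entries + 1] = t            -- record the top-level entry
  end,
}

-- Mode "t" accepts only text chunks; env replaces the global environment.
local chunk = assert(load(source, "=definitions", "t", env))
chunk()
```

After `chunk()` runs, `entries` holds one table tagged as a slime. Because `env` contains only `Slime` and `Vec2`, a definition file that tried to call `print` or `os.execute` would fail with an error instead of reaching the real globals.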