path: root/environments_in_lua_5_2_and_beyond.html
author Aki <please@ignore.pl> 2024-11-13 23:17:13 +0100
committer Aki <please@ignore.pl> 2024-11-13 23:19:03 +0100
commit 907144ec08a506bcd03053770f8b481045bca848
tree 6f309ceea67349545c58a646af2c2d5d136df38f /environments_in_lua_5_2_and_beyond.html
parent bd159f4c18ec8b193e797aedfa271550e4a98177
Rewrote introduction sections of environments in Lua 5.2 (HEAD, master)
I'm still not happy with the state of this article, but it is a bit better now. The flow is quite bad, there are too many pointless expressions, and too little good information. However, it still seems good enough to stay, but I'll keep an eye on it.
Diffstat (limited to 'environments_in_lua_5_2_and_beyond.html')
-rw-r--r-- environments_in_lua_5_2_and_beyond.html | 172
1 files changed, 113 insertions, 59 deletions
diff --git a/environments_in_lua_5_2_and_beyond.html b/environments_in_lua_5_2_and_beyond.html
index 3c6469c..722645a 100644
--- a/environments_in_lua_5_2_and_beyond.html
+++ b/environments_in_lua_5_2_and_beyond.html
@@ -17,80 +17,134 @@
</header>
<article>
-<p>Environments are a way of dealing with various problems. Or creating them entirely on your own. Primarily, they are
-used to isolate a selected part of a program. As Lua is meant to be used as an embedded language, you may find yourself
-wanting to separate user created addons from more internal scripts. In short: sandboxing and overall security.
-<p>While I will focus on environments in Lua I won't go too deeply into the implementation details. I'll try to focus
-more on design and general overview with minor thoughts on syntax and the inner workings. I think you might find this
-text interesting no matter if you know or are interested in Lua itself.
-<p>Previously, we had <code><a href="https://www.lua.org/manual/5.1/manual.html#pdf-setfenv">setfenv</a></code> and
-related functions to do exactly that. With some tricks, debug library, and faith, you could do real magic, which is kind
-of cool. However, magic can easily become arcane, thus unclear.
-<p>With Lua 5.2 <code>setfenv</code> and related were removed in favour of a new approach. This one uses a simple local
-variable with name <code>_ENV</code>. Luckily, this approach can also be fun. It has other benefits over the old one,
-but the goal here is not a comparison.
-<p>One more thing before we hop into examples: I strongly encourage to read some documents on Lambda Calculus. They
-will give you quite good overview over what is happening with closures/anonymous functions/lambda expressions/whatever
-they are called. It's a good foundation and an entertaining exercise. This way you can quickly draw similarities,
-considering that the Calculus is easier to grasp than the most of language-syntax-related shenanigans.</p>
+<p>Environments in Lua can be used to isolate selected parts of a program, expose an API, <a href="">create a markup
+format</a>, widely modify behaviour at runtime, and much more. They can also create a lot of problems. Overall, we can
+summarize it as "sandboxing."
+<p>I will focus on Lua, but I won't go too deep into language or implementation details. Instead, I want to provide an
+overview of design and essentials with minor points on syntax and the inner workings. This way I aim to make this
+document useful to you whether or not you are a Lua user.
+<p>Before Lua 5.2, we had the <code><a href="https://www.lua.org/manual/5.1/manual.html#pdf-setfenv">setfenv</a></code>
+family of functions for operating on function environments directly. With some tricks and the debug library we could do
+real magic. This approach was very versatile but could also very easily become unclear. It was also almost completely
+disconnected from the language.
+<p>In Lua 5.2 the <code>setfenv</code> family was removed in favour of a new approach centred around the
+<code>_ENV</code> variable. This approach is usually called <em>Lexical Environments</em> and it is the main focus of this document.
+<p>I strongly encourage everyone to learn lambda calculus. Some vocabulary will be borrowed from it and the design and
+behaviour of Lua resemble it at many levels. Additionally, it gives a good overview of what is happening with
+closures, anonymous functions, lambda expressions, or whatever else a programming language can call an ad hoc function.
+It may also serve as a way to build better intuition about scoping rules in different languages.</p>
<img src="environments_in_lua_5_2_and_beyond-1.png" alt="lambda">
<h2>Terms</h2>
-<p>Now then, let's have a simple example to define few selected terms:</p>
+<p>Now then, let's consider a code sample and define selected terms:</p>
+<style>
+.merged pre {
+ margin-top: 0;
+ margin-bottom: 0;
+ padding-top: 0;
+ padding-bottom: 0;
+}
+.merged pre:first-child { padding-top: 0.4em; }
+.merged pre:last-child { padding-bottom: 0.4em; }
+span.decl { font-weight: bold; }
+span.use { text-decoration: underline; }
+span.a { background-color: #c7cdee; }
+span.b { background-color: #f6f4cd; }
+span.f { background-color: #c38df9; }
+</style>
+<div class="merged">
+<pre style="background: #c7cdee;">
+local <span class="decl">OFFSET_ERROR</span> = 0.97731
+local function <span class="decl">calibrate</span> (<span class="decl">value</span>, <span class="decl">ratio</span>, <span class="decl">offset</span>)
+</pre>
+<pre style="background: #f6f4cd; border-left: 1rem solid #c7cdee; padding-left: 0;">
+ local <span class="decl">real_offset</span> = <span class="use">offset</span> * <span class="use">OFFSET_ERROR</span>
+ <span class="use">print</span>("offset:", <span class="use">real_offset</span>)
+ return <span class="use">value</span> * <span class="use">ratio</span> + <span class="use">real_offset</span>
+</pre>
+<pre style="background: #c7cdee;">
+end
+</pre>
+</div>
+<p>Blocks marked with blue and yellow are called "chunks" in Lua. In this case blue is also a special case, the
+"main chunk", assuming it is a file fed into Lua. The yellow block is the body of the <b>calibrate</b> function. We can
+call it a "function body" and it is still considered a "chunk". Yellow is not a "main chunk."
+<p>In general, chunks are also lexical scopes for variables and it's fine to use these terms interchangeably. I'll do
+that. The distinction between them may matter, but only in the context of Lua's loading internals, where "chunk" has a
+more specific meaning.
+<p>I marked names of variables that are being declared or assigned in bold. Names that are underlined are being used or
+otherwise referenced. And so we can observe <code>OFFSET_ERROR</code> being declared as a local and assigned a value in
+the first line and then used in multiplication in the third. In the second line the variable <code>calibrate</code> is
+assigned a function as its value. These two syntax forms are explained in detail in
+<a href="https://www.lua.org/manual/5.4/manual.html#3.3.7">local declaration</a> and
+<a href="https://www.lua.org/manual/5.4/manual.html#3.4.11">function definition</a>.
+<p>Both <code>OFFSET_ERROR</code> and <code>calibrate</code> use the <code>local</code> keyword, making these names
+available only to the main chunk and its nested blocks. Of course, this makes them available in <b>calibrate</b>'s body. There's a
+little bit more to how syntax around local function definition works but no meaningful problems occur in this particular
+case, so I'll ignore it.
+<p>Function arguments are automatically declared as local variables in their respective function scope. Other than
+those, the only variable declared in the function body is the local <code>real_offset</code>. Similarly, because it is
+local, it will be available only to the function body and its nested blocks. This means the main chunk cannot access it.
+<p>Now, let's do a simple exercise and colour code variable references and their declarations:
<pre>
--- Start of chunk's body
-local OFFSET_ERROR = 0.97731
-local
-function calibrate (value, ratio, offset)
- -- Start of function's body; not part of chunk's body
- local real_offset = offset * OFFSET_ERROR
- print("offset:", real_offset)
- return value * ration + real_offset
- -- End of function's body
+local <span class="a">OFFSET_ERROR</span> = 0.97731
+local function calibrate (<span class="b">value</span>, <span class="b">ratio</span>, <span class="b">offset</span>)
+ local <span class="b">real_offset</span> = <span class="b use">offset</span> * <span class="a use">OFFSET_ERROR</span>
+ <span class="f use">print</span>("offset:", <span class="b use">real_offset</span>)
+ return <span class="b use">value</span> * <span class="b use">ratio</span> + <span class="b use">real_offset</span>
end
--- End of chunk's body
</pre>
-<p>Surprisingly, there's a lot of going on in here. First off, we have two scopes: "chunk's body", and "function's
-body". "Chunk's body" has two local variables: <code>OFFSET_ERROR</code> (that acts as a constant), and
-<code>calibrate</code> (a function). In turn, "function's body" has four local variables: <code>value</code>,
-<code>ratio</code>, <code>offset</code> (those three are the arguments for the function), and <code>real_offset</code>
-(a temporary variable I added just to show that function body may also have explicit local variable). We will call all
-of those variables exactly in the way I already did: <em>local variables</em>.
-<p>In addition to the local variables, "function's body" also refers to two other names. First one is
-<code>OFFSET_ERROR</code>. We already know this one; it's a local variable from the chunk. A smaller scope that is
-inside another scope can refer to their local variables as they want. They are called <em>upvalues</em> then. This works
-on any level, no matter how deeply the scopes are nested, as long as it makes sense. It doesn't work the other way:
-outer scope referring to local variable in inner scope is a no-no.
-<p>Second external reference in "function's body" is <code>print</code>. We don't see it defined anywhere as a local
-variable. Commonly such variables are called <em>globals</em> or global variables. That's how we will call them.
-<p>Here's the part that can be slightly ambiguous: definition of "environment". I can see myself include in it all three
-types of variables I mentioned, only upvalues and globals, or just globals. Depending on the situation, all of these
-cases may be fitting, and looking at how other languages use the term, and how it's similar to bound variables in the
-Lambda Calculus; they are all good explanations. The Lua itself uses "environment" only to refer to the method that
-resolves references to globals.
+<p>Variables are used in three different ways here:
+<dl>
+<dt><code>value</code>, <code>ratio</code>, <code>offset</code>, <code>real_offset</code>
+<dd>are used and declared inside function body
+<dt><code>OFFSET_ERROR</code><dd>is used inside function body but is declared in the outer scope
+<dt><code>print</code><dd>is used inside function body but is never visibly declared
+</dl>
+<p>The first case is our ideal case. The keyword <code>local</code> tells us exactly what is happening: variables are
+declared and used only within the function body. They are local variables. They don't spill anywhere else unless they
+are passed to a function as an argument. <code>OFFSET_ERROR</code> does spill, but only to <b>calibrate</b>'s body,
+because it is a nested scope.
+<p>They behave similarly to <em>bound variables</em> from lambda calculus. <code>OFFSET_ERROR</code> is a bit closer to
+them in principle, but the idea is that the source of their value is exactly known.
+<p>On the other hand, <code>print</code> behaves like a <em>free variable</em>. It is never declared as local and by
+default Lua considers such variables <em>global</em>. When a global is encountered, the <em>environment</em> is queried
+to provide a value for it.
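As a minimal sketch of that last point, assuming Lua 5.2 or newer, we can ask the environment table directly whether it knows a name. The helper <code>in_environment</code> and the global <code>SCALE</code> are hypothetical names made up for this illustration.

```lua
-- Assumes Lua 5.2 or newer, where the _ENV variable exists.
local OFFSET_ERROR = 0.97731   -- local: lives in the scope, not in the environment
SCALE = 2                      -- hypothetical global: stored in the environment table

-- Hypothetical helper: ask the environment for a name directly.
local function in_environment (name)
	return _ENV[name] ~= nil
end

print(in_environment("OFFSET_ERROR"))  -- false
print(in_environment("SCALE"))         -- true
print(in_environment("print"))         -- true
```

Locals never show up in the environment table, while <code>print</code> and any other global are plain entries in it.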
+
+
<h2>Upvalues</h2>
-<p>That being said, let's talk about the upvalues for a moment before we go to the globals. Upvalues are the closest
-thing you can have to a bound variable from the Lambda Calculus. They always bind "by reference", simply because the
-variable name is a direct reference to the variable and it's value. Whatever that means, see for yourself:</p>
+<p>Before jumping into <em>environments</em>, let's introduce one last term and talk about how <code>OFFSET_ERROR</code>
+propagates internally. Whenever a local from an outer scope is referenced in a chunk, it's called an <em>upvalue</em>.
+Once referenced, an upvalue is bound to that particular chunk "by reference" and stored so it continues to live with it.
+<p>Upvalues implement the core principle of closures in Lua. Consider two counters:
<pre>
-local counter = 0
-local
-function increment ()
- counter = counter + 1
+local init = 0
+local function new_counter ()
+ local x = init
+ return function ()
+ x = x + 1
+ return x
+ end
end
-increment()
-increment()
-increment()
-print(counter)
+local a = new_counter()
+local b = new_counter()
+print(a()) -- 1
+print(a()) -- 2
+print(a()) -- 3
+print(b()) -- 1
</pre>
-<p>This example will print out "3". Upvalues also play huge part in garbage collection due to their nature but that's
-none of the concerns of this article.
+<p>There are two instances of <code>x</code> created, one for each call of <code>new_counter</code>. Each is bound to the
+anonymous function which is in turn returned. This sequence can also be interpreted as the construction of a
+function-like object.
+<p><code>init</code> is bound only to <code>new_counter</code> and is not bound to the anonymous function.
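To make the "by reference" part more tangible, here is a small sketch in which two closures share one upvalue. The names <code>new_pair</code>, <code>increment</code>, and <code>read</code> are hypothetical and exist only for this illustration.

```lua
-- Two closures created in the same call capture the very same variable x.
local function new_pair ()
	local x = 0
	local function increment ()
		x = x + 1
	end
	local function read ()
		return x
	end
	return increment, read
end

local increment, read = new_pair()
increment()
increment()
print(read()) -- 2
```

If <code>x</code> were copied into each closure instead, <code>read</code> would keep returning 0.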
+
+
<h2>Environment</h2>
<p>Back to the main topic! Quick reminder: in Lua "environment" is used to resolve references to names that are not
local variables or upvalues. In other words it's a way to deal with free variables when the program executes.
<p>"Environments" are associative tables. They link global variable names to actual variables. The environments
themselves are bound to functions as upvalues called <code>_ENV</code>, whenever they are needed. It's done implicitly;
quietly in the background. This means that the <code>calibrate</code> function from the first example actually has two
-upvalues: <code>OFFEST_ERROR</code>, and <code>_ENV</code>. <code>_ENV</code> by default took as its value a table that
+upvalues: <code>OFFSET_ERROR</code>, and <code>_ENV</code>. <code>_ENV</code> by default took as its value a table that
was used as the global environment at that time. If <code>calibrate</code> didn't use <code>print</code>,
<code>_ENV</code> wouldn't be there at all.
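One way to check this claim is to list the upvalue names with the debug library. This is only a sketch: it assumes Lua 5.2 or newer, an available <code>debug</code> library, and a chunk loaded with debug information. The helper <code>upvalue_names</code> is a hypothetical name.

```lua
local OFFSET_ERROR = 0.97731
local function calibrate (value, ratio, offset)
	local real_offset = offset * OFFSET_ERROR
	print("offset:", real_offset)
	return value * ratio + real_offset
end

-- Hypothetical helper: collect the names of all upvalues of a function.
local function upvalue_names (fn)
	local names = {}
	local i = 1
	while true do
		-- debug.getupvalue returns nothing useful once the index runs past the last upvalue
		local name = debug.getupvalue(fn, i)
		if name == nil then
			break
		end
		names[name] = true
		i = i + 1
	end
	return names
end

local names = upvalue_names(calibrate)
print(names.OFFSET_ERROR, names._ENV) -- true	true
```

I collect the names into a set instead of relying on their order, since the order of upvalues is an implementation detail.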
<p>This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables