Diffstat (limited to 'environments_in_lua_5_2_and_beyond.html')
-rw-r--r-- environments_in_lua_5_2_and_beyond.html | 155
1 files changed, 155 insertions, 0 deletions
diff --git a/environments_in_lua_5_2_and_beyond.html b/environments_in_lua_5_2_and_beyond.html
new file mode 100644
index 0000000..fc2b9b7
--- /dev/null
+++ b/environments_in_lua_5_2_and_beyond.html
@@ -0,0 +1,155 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="Lua, setfenv, _ENV, sandbox, environment, tutorial">
+<link rel="icon" type="image/png" href="cylo.png">
+<link rel="stylesheet" type="text/css" href="style.css">
+
+<title>Environments in Lua 5.2 and Beyond</title>
+
+<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
+
+<article>
+<h1>Environments in Lua 5.2 and Beyond</h1>
+<p class="subtitle">Published on 2020-07-04 20:39:00+02:00
+<p>Environments are a way of dealing with various problems. Or creating them entirely on your own. Primarily, they are
+used to isolate a selected part of a program. As Lua is meant to be used as an embedded language, you may find yourself
+wanting to separate user-created addons from more internal scripts. In short: sandboxing and overall security.
+<p>While I will focus on environments in Lua, I won't go too deeply into the implementation details. I'll try to focus
+more on design and general overview, with minor thoughts on syntax and the inner workings. I think you might find this
+text interesting whether or not you know, or care about, Lua itself.
+<p>Previously, we had <code><a href="https://www.lua.org/manual/5.1/manual.html#pdf-setfenv">setfenv</a></code> and
+related functions to do exactly that. With some tricks, the debug library, and faith, you could do real magic, which is
+kind of cool. However, magic can easily become arcane, thus unclear.
+<p>With Lua 5.2, <code>setfenv</code> and its relatives were removed in favour of a new approach. This one uses a simple
+local variable named <code>_ENV</code>. Luckily, this approach can also be fun. It has other benefits over the old one,
+but the goal here is not a comparison.
+<p>One more thing before we hop into examples: I strongly encourage you to read some documents on Lambda Calculus. They
+will give you quite a good overview of what is happening with closures/anonymous functions/lambda expressions/whatever
+they are called. It's a good foundation and an entertaining exercise. This way you can quickly draw similarities,
+considering that the Calculus is easier to grasp than most language-syntax-related shenanigans.</p>
+<img src="environments_in_lua_5_2_and_beyond-1.png" alt="lambda">
+<h2>Terms</h2>
+<p>Now then, let's have a simple example to define a few selected terms:</p>
+<pre>
+-- Start of chunk's body
+local OFFSET_ERROR = 0.97731
+local
+function calibrate (value, ratio, offset)
+ -- Start of function's body; not part of chunk's body
+ local real_offset = offset * OFFSET_ERROR
+ print("offset:", real_offset)
+ return value * ratio + real_offset
+ -- End of function's body
+end
+-- End of chunk's body
+</pre>
+<p>Surprisingly, there's a lot going on in here. First off, we have two scopes: "chunk's body" and "function's
+body". "Chunk's body" has two local variables: <code>OFFSET_ERROR</code> (which acts as a constant) and
+<code>calibrate</code> (a function). In turn, "function's body" has four local variables: <code>value</code>,
+<code>ratio</code>, <code>offset</code> (those three are the arguments of the function), and <code>real_offset</code>
+(a temporary variable I added just to show that a function body may also have explicit local variables). We will call
+all of those variables exactly the way I already did: <em>local variables</em>.
+<p>In addition to the local variables, "function's body" also refers to two other names. The first one is
+<code>OFFSET_ERROR</code>. We already know this one; it's a local variable from the chunk. A scope nested inside
+another scope can freely refer to the local variables of the enclosing scope. Seen from the inner scope, such variables
+are called <em>upvalues</em>. This works on any level, no matter how deeply the scopes are nested, as long as the
+variable was declared before the inner scope begins. It doesn't work the other way: an outer scope referring to a local
+variable of an inner scope is a no-no.
+<p>The second external reference in "function's body" is <code>print</code>. We don't see it defined anywhere as a
+local variable. Such variables are commonly called <em>globals</em> or global variables. That's what we will call them.
+<p>Here's the part that can be slightly ambiguous: the definition of "environment". I can see myself including in it all
+three types of variables I mentioned, only upvalues and globals, or just globals. Depending on the situation, any of
+these may be fitting. Looking at how other languages use the term, and at how similar it is to bound variables in the
+Lambda Calculus, they are all good explanations. Lua itself uses "environment" only to refer to the mechanism that
+resolves references to globals.
+<h2>Upvalues</h2>
+<p>That being said, let's talk about upvalues for a moment before we go to the globals. Upvalues are the closest
+thing you can have to a bound variable from the Lambda Calculus. They always bind "by reference", simply because the
+variable name is a direct reference to the variable and its value. Whatever that means, see for yourself:</p>
+<pre>
+local counter = 0
+local
+function increment ()
+ counter = counter + 1
+end
+increment()
+increment()
+increment()
+print(counter)
+</pre>
+<p>This example will print out "3". Upvalues also play a huge part in garbage collection due to their nature, but
+that's not a concern of this article.
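+<p>To make "by reference" a bit more concrete, here is a minimal sketch in which two closures share one upvalue. The
+names <code>make_counter</code>, <code>increment</code>, and <code>read</code> are made up for this illustration:

```lua
-- Hypothetical helper: returns two closures bound to the same local variable.
local function make_counter()
  local count = 0 -- becomes an upvalue of both functions below
  local function increment()
    count = count + 1
  end
  local function read()
    return count
  end
  return increment, read
end

local increment, read = make_counter()
increment()
increment()

-- A second call creates a fresh, independent `count`.
local _, other_read = make_counter()
```

+<p>Here <code>read()</code> sees both increments because both closures refer to the very same variable, not to a copy of
+its value, while <code>other_read()</code> is bound to a separate one.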
+<h2>Environment</h2>
+<p>Back to the main topic! Quick reminder: in Lua, "environment" is used to resolve references to names that are not
+local variables or upvalues. In other words, it's a way to deal with free variables when the program executes.
+<p>"Environments" are associative tables. They link global variable names to actual values. The environments
+themselves are bound to functions as upvalues called <code>_ENV</code>, whenever they are needed. It's done implicitly;
+quietly in the background. This means that the <code>calibrate</code> function from the first example actually has two
+upvalues: <code>OFFSET_ERROR</code> and <code>_ENV</code>. By default, <code>_ENV</code> took as its value the table
+that was used as the global environment at that time. If <code>calibrate</code> didn't use <code>print</code>,
+<code>_ENV</code> wouldn't be there at all.
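+<p>If you want to check this yourself, the standard <code>debug.getupvalue</code> function can list the upvalues of a
+function. The sketch below assumes Lua 5.2 or later and a chunk run from source with debug information intact;
+<code>upvalue_names</code> and <code>add</code> are helper names made up for this illustration:

```lua
local OFFSET_ERROR = 0.97731

local function calibrate(value, ratio, offset)
  local real_offset = offset * OFFSET_ERROR
  print("offset:", real_offset)
  return value * ratio + real_offset
end

-- Uses only its own arguments: no globals, so no _ENV is needed.
local function add(a, b)
  return a + b
end

-- Hypothetical helper: collects the upvalue names of fn into a set.
local function upvalue_names(fn)
  local names = {}
  local index = 1
  while true do
    local name = debug.getupvalue(fn, index)
    if not name then break end -- past the last upvalue
    names[name] = true
    index = index + 1
  end
  return names
end

local calibrate_upvalues = upvalue_names(calibrate) -- OFFSET_ERROR and _ENV
local add_upvalues = upvalue_names(add)             -- empty
```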
+<p>This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables
+themselves.</p>
+<pre>
+local
+function hello ()
+ print "Hello!"
+end
+local _ENV = {print = function () end}
+hello()
+</pre>
+<p>We create a simple local function that is meant to print out "Hello!" to standard output. After that, we replace the
+current environment with a new one that contains a <code>print</code> function that does nothing. If we call
+<code>hello</code>, it still prints out "Hello!" like it was meant to. That's because it's bound to the original
+environment, not the new one.
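+<p>For contrast, a function defined <em>after</em> a new <code>_ENV</code> comes into scope does pick up the new table.
+A small sketch, assuming Lua 5.2 or later; the global <code>greeting</code> and the functions <code>before</code> and
+<code>after</code> are names invented for this illustration:

```lua
greeting = "original" -- a global, stored in the original environment

local function before()
  return greeting -- resolved as _ENV.greeting, with _ENV being the original table
end

local after
do
  local _ENV = { greeting = "replaced" } -- shadows the environment inside this block only
  after = function()
    return greeting -- resolved through the new table
  end
end
```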
+<p>In the meantime, you might have noticed that environments may appear somewhat interchangeable with upvalues. That's
+correct to some extent, and it's because of the things I've already mentioned: ambiguity is one, dealing with free
+variables while being a bound variable is two. In program execution, bound variables (in our terms: upvalues) are there
+to deal with free variables, and here we are doing the same thing.</p>
+<img src="environments_in_lua_5_2_and_beyond-2.png" alt="fat bird">
+<h2>Usage</h2>
+<p>Yeah, that's all of the explanation there was. I could sum it up as: "environments are tables bound to upvalues that
+resolve free variables". Cool, how and when can we use them?
+<p>The most common use case is sandboxing or, more generally, limiting the things available to scripts. Let's say we
+develop a program that uses Lua as a scripting language. We load all default modules from Lua for ourselves:
+<code>io</code>, <code>debug</code>, <code>string</code>, whatever we want. However, we don't want to expose all of them
+to external scripts. To do so, we prepare a table that will act as an environment for them and simply assign it as the
+<code>_ENV</code> upvalue, most likely through the <code>load</code> or <code>loadfile</code> function:</p>
+<pre>
+local end_user_env = {
+ print = print
+}
+-- assert() surfaces load errors; mode "t" rejects precompiled bytecode
+local script = assert(loadfile("external.lua", "t", end_user_env))
+script()
+</pre>
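+<p>The same idea can be tried without an external file by passing a string to <code>load</code>. This is only a sketch,
+assuming Lua 5.2 or later; the chunk name <code>=user_script</code> and the script contents are invented for this
+illustration:

```lua
local end_user_env = { print = print }

-- Stand-in for the contents of an external script.
local source = [[
  leaked = 42                   -- a "global" write from the script's point of view
  return io == nil, type(print) -- io was never exposed; print was
]]

-- Mode "t" accepts only text chunks; the last argument becomes the chunk's _ENV.
local script = assert(load(source, "=user_script", "t", end_user_env))
local io_hidden, print_type = script()
```

+<p>Global writes made by the script end up in <code>end_user_env</code>, so they don't pollute the host's global table.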
+<p>Of course, you can do that from the C API, too. This requires us to acknowledge one more thing: upvalues are stored
+as a list and are indexed. For regular functions, the <code>_ENV</code> upvalue might be in any place of this list. For
+main chunks, loaded external scripts, or the "chunk" from the first example, <code>_ENV</code> is expected to be first
+on the list.</p>
+<pre>
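+luaL_loadfile(L, "external.lua"); /* pushes the compiled chunk; check the result in real code */
+lua_newtable(L); /* the new environment table */
+lua_pushliteral(L, "print");
+lua_getglobal(L, "print");
+lua_settable(L, -3); /* env.print = print */
+lua_setupvalue(L, -2, 1); /* pops env and sets it as upvalue 1 (_ENV) of the chunk */
+lua_call(L, 0, 0);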
+</pre>
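+<p>For comparison, the same upvalue assignment can be sketched from the Lua side with <code>debug.setupvalue</code>,
+which mirrors <code>lua_setupvalue</code>. This assumes Lua 5.2 or later and that the debug library is available; the
+chunk contents and the <code>secret</code> field are invented for this illustration:

```lua
-- A loaded chunk always has _ENV as its first upvalue.
local script = assert(load("return secret"))

local env = { secret = "visible only through env" }

-- debug.setupvalue returns the name of the upvalue it assigned.
local upvalue_name = debug.setupvalue(script, 1, env)
local result = script()
```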
+<p>It could also be done by prepending <code>local _ENV = end_user_env</code> to the external script before loading it,
+but that's a hassle:</p>
+<pre>
+local file = assert(io.open("env.lua")) -- fail early if the file is missing
+local content = file:read "*a"
+file:close()
+content = "local _ENV = {print = print}\n" .. content
+local script = assert(load(content)) -- fail early on syntax errors
+script()
+</pre>
+<p>This method of environment manipulation can also be used for more in-line changes, as seen in one of the
+previous examples. This is the new way of making magic tricks now that <code>setfenv</code> is gone. I'll leave this as
+a topic for another time. I think the examples above are sufficient for now. From here I can expand to magic or
+sandboxing details.
+</article>
+<script src="https://stats.ignore.pl/track.js"></script>