Diffstat (limited to 'environments_in_lua_5_2_and_beyond.html')
-rw-r--r-- | environments_in_lua_5_2_and_beyond.html | 155 |
1 files changed, 155 insertions, 0 deletions
diff --git a/environments_in_lua_5_2_and_beyond.html b/environments_in_lua_5_2_and_beyond.html
new file mode 100644
index 0000000..fc2b9b7
--- /dev/null
+++ b/environments_in_lua_5_2_and_beyond.html
@@ -0,0 +1,155 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="Lua, setfenv, _ENV, sandbox, environment, tutorial">
+<link rel="icon" type="image/png" href="cylo.png">
+<link rel="stylesheet" type="text/css" href="style.css">
+
+<title>Environments in Lua 5.2 and Beyond</title>
+
+<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
+
+<article>
+<h1>Environments in Lua 5.2 and Beyond</h1>
+<p class="subtitle">Published on 2020-07-04 20:39:00+02:00
+<p>Environments are a way of dealing with various problems. Or creating them entirely on your own. Primarily, they are
+used to isolate a selected part of a program. As Lua is meant to be used as an embedded language, you may find yourself
+wanting to separate user-created addons from more internal scripts. In short: sandboxing and overall security.
+<p>While I will focus on environments in Lua, I won't go too deeply into the implementation details. I'll try to focus
+more on design and general overview with minor thoughts on syntax and the inner workings. I think you might find this
+text interesting whether or not you know or are interested in Lua itself.
+<p>Previously, we had <code><a href="https://www.lua.org/manual/5.1/manual.html#pdf-setfenv">setfenv</a></code> and
+related functions to do exactly that. With some tricks, the debug library, and faith, you could do real magic, which is kind
+of cool. However, magic can easily become arcane, and thus unclear.
+<p>With Lua 5.2, <code>setfenv</code> and related functions were removed in favour of a new approach. This one uses a simple local
+variable named <code>_ENV</code>. Luckily, this approach can also be fun. It has other benefits over the old one,
+but the goal here is not a comparison.
+<p>One more thing before we hop into examples: I strongly encourage you to read some documents on Lambda Calculus. They
+will give you quite a good overview of what is happening with closures/anonymous functions/lambda expressions/whatever
+they are called. It's a good foundation and an entertaining exercise. This way you can quickly draw similarities,
+considering that the Calculus is easier to grasp than most language-syntax-related shenanigans.</p>
+<img src="environments_in_lua_5_2_and_beyond-1.png" alt="lambda">
+<h2>Terms</h2>
+<p>Now then, let's have a simple example to define a few selected terms:</p>
+<pre>
+-- Start of chunk's body
+local OFFSET_ERROR = 0.97731
+local
+function calibrate (value, ratio, offset)
+    -- Start of function's body; not part of chunk's body
+    local real_offset = offset * OFFSET_ERROR
+    print("offset:", real_offset)
+    return value * ratio + real_offset
+    -- End of function's body
+end
+-- End of chunk's body
+</pre>
+<p>Surprisingly, there's a lot going on in here. First off, we have two scopes: "chunk's body", and "function's
+body". "Chunk's body" has two local variables: <code>OFFSET_ERROR</code> (that acts as a constant), and
+<code>calibrate</code> (a function). In turn, "function's body" has four local variables: <code>value</code>,
+<code>ratio</code>, <code>offset</code> (those three are the arguments for the function), and <code>real_offset</code>
+(a temporary variable I added just to show that a function body may also have explicit local variables). We will call all
+of those variables exactly the way I already did: <em>local variables</em>.
+<p>In addition to the local variables, "function's body" also refers to two other names. The first one is
+<code>OFFSET_ERROR</code>. We already know this one; it's a local variable from the chunk. A smaller scope that is
+inside another scope can refer to the outer scope's local variables freely. They are called <em>upvalues</em> then. This works
+on any level, no matter how deeply the scopes are nested, as long as it makes sense. It doesn't work the other way:
+an outer scope referring to a local variable in an inner scope is a no-no.
+<p>The second external reference in "function's body" is <code>print</code>. We don't see it defined anywhere as a local
+variable. Commonly such variables are called <em>globals</em> or global variables. That's what we will call them.
+<p>Here's the part that can be slightly ambiguous: the definition of "environment". I can see myself including in it all three
+types of variables I mentioned, only upvalues and globals, or just globals. Depending on the situation, all of these
+cases may be fitting, and looking at how other languages use the term, and how it's similar to bound variables in the
+Lambda Calculus, they are all good explanations. Lua itself uses "environment" only to refer to the mechanism that
+resolves references to globals.
+<h2>Upvalues</h2>
+<p>That being said, let's talk about the upvalues for a moment before we go to the globals. Upvalues are the closest
+thing you can have to a bound variable from the Lambda Calculus. They always bind "by reference", simply because the
+variable name is a direct reference to the variable and its value. Whatever that means, see for yourself:</p>
+<pre>
+local counter = 0
+local
+function increment ()
+    counter = counter + 1
+end
+increment()
+increment()
+increment()
+print(counter)
+</pre>
+<p>This example will print out "3". Upvalues also play a huge part in garbage collection due to their nature, but that's
+not a concern of this article.
+<h2>Environment</h2>
+<p>Back to the main topic! Quick reminder: in Lua "environment" is used to resolve references to names that are not
+local variables or upvalues. In other words, it's a way to deal with free variables when the program executes.
+<p>"Environments" are associative tables. They link global variable names to actual variables. The environments
+themselves are bound to functions as upvalues called <code>_ENV</code>, whenever they are needed. It's done implicitly;
+quietly in the background. This means that the <code>calibrate</code> function from the first example actually has two
+upvalues: <code>OFFSET_ERROR</code>, and <code>_ENV</code>. <code>_ENV</code> by default took as its value the table that
+was used as the global environment at that time. If <code>calibrate</code> didn't use <code>print</code>,
+<code>_ENV</code> wouldn't be there at all.
+<p>This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables
+themselves.</p>
+<pre>
+local
+function hello ()
+    print "Hello!"
+end
+local _ENV = {print = function () end}
+hello()
+</pre>
+<p>We create a simple local function that is meant to print out "Hello!" to standard output. After that we overwrite
+the current environment with a new one that contains a <code>print</code> function that does nothing. If we call
+<code>hello</code>, it still prints out "Hello!" like it was meant to. That's because it's bound to the original
+environment, not the new one.
+<p>In the meantime you might have noticed that environments may appear somewhat interchangeable with upvalues. That's
+correct to some extent, and it's because of the things I've already mentioned: ambiguity is one, dealing with free
+variables while being a bound variable is two. In program execution bound variables (in our terms: upvalues) are there
+to deal with free variables, and here we are doing the same thing.</p>
+<img src="environments_in_lua_5_2_and_beyond-2.png" alt="fat bird">
+<h2>Usage</h2>
+<p>Yeah, that's all of the explanation there was. I could sum it up as: "environments are tables bound to upvalues that
+resolve free variables". Cool, how and when can we use them?
+<p>The most common use case is sandboxing or more generally: limiting things available to scripts. Let's say we develop
+a program that uses Lua as a scripting language. We load all default modules from Lua for ourselves: <code>io</code>,
+<code>debug</code>, <code>string</code>, whatever we want. However, we don't want to expose all of them to external
+scripts. To do so, we prepare a table that will act as an environment for them and simply assign it as the
+<code>_ENV</code> upvalue. Most likely through the <code>load</code> or <code>loadfile</code> function:</p>
+<pre>
+local end_user_env = {
+    print = print
+}
+local script = loadfile("external.lua", nil, end_user_env)
+script()
+</pre>
+<p>Of course, you can do that from the C API, too. This requires us to acknowledge one more thing: upvalues are stored as
+a list and are indexed. For regular functions the <code>_ENV</code> upvalue might be in any place of this list. For main
+chunks, loaded external scripts, or the "chunk" from the first example <code>_ENV</code> is expected to be first on the
+list.</p>
+<pre>
+luaL_loadfile(L, "external.lua");
+lua_newtable(L);
+lua_pushliteral(L, "print");
+lua_getglobal(L, "print");
+lua_settable(L, -3);
+lua_setupvalue(L, -2, 1);
+lua_call(L, 0, 0);
+</pre>
+<p>It could also be done by prepending <code>local _ENV = end_user_env</code> to the external script before loading it,
+but that's a hassle:</p>
+<pre>
+local file = io.open "env.lua"
+local content = file:read "*a"
+file:close()
+content = "local _ENV = {print = print}\n" .. content
+local script = load(content)
+script()
+</pre>
+<p>This method of environment manipulation can be used in other cases for more in-line changes as seen in one of the
+previous examples. This is the new way of making magic tricks now that <code>setfenv</code> is gone. I'll leave this as a
+topic for another time. I think the examples above are sufficient for now. From here I can expand to magic or sandboxing
+details.
+</article>
+<script src="https://stats.ignore.pl/track.js"></script>