<!doctype html>
<html lang="en">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="aki">
<meta name="tags" content="Lua, setfenv, _ENV, sandbox, environment, tutorial">
<meta name="published-on" content="2020-07-04T20:39:00+02:00">
<link rel="icon" type="image/png" href="favicon.png">
<link rel="stylesheet" type="text/css" href="style.css">

<title>Environments in Lua 5.2 and Beyond</title>

<header>
<nav><a href="https://ignore.pl">ignore.pl</a></nav>
<time>4 July 2020</time>
<h1>Environments in Lua 5.2 and Beyond</h1>
</header>

<article>
<p>Environments are a way of dealing with various problems. Or of creating them entirely on your own. Primarily, they are
used to isolate a selected part of a program. As Lua is meant to be used as an embedded language, you may find yourself
wanting to separate user-created addons from more internal scripts. In short: sandboxing and overall security.
<p>While I will focus on environments in Lua, I won't go too deeply into the implementation details. I'll try to focus
more on design and a general overview, with minor thoughts on syntax and the inner workings. I think you might find this
text interesting whether or not you know or care about Lua itself.
<p>Previously, we had <code><a href="https://www.lua.org/manual/5.1/manual.html#pdf-setfenv">setfenv</a></code> and
related functions to do exactly that. With some tricks, the debug library, and faith, you could do real magic, which is
kind of cool. However, magic can easily become arcane, and thus unclear.
<p>With Lua 5.2, <code>setfenv</code> and its relatives were removed in favour of a new approach. This one uses a simple
local variable named <code>_ENV</code>. Luckily, this approach can also be fun. It has other benefits over the old one,
but the goal here is not a comparison.
<p>One more thing before we hop into examples: I strongly encourage you to read some documents on Lambda Calculus. They
will give you quite a good overview of what is happening with closures/anonymous functions/lambda expressions/whatever
they are called. It's a good foundation and an entertaining exercise. This way you can quickly draw similarities,
considering that the Calculus is easier to grasp than most language-syntax-related shenanigans.</p>
<img src="environments_in_lua_5_2_and_beyond-1.png" alt="lambda">
<h2>Terms</h2>
<p>Now then, let's have a simple example to define a few selected terms:</p>
<pre>
-- Start of chunk's body
local OFFSET_ERROR = 0.97731
local
function calibrate (value, ratio, offset)
	-- Start of function's body; not part of chunk's body
	local real_offset = offset * OFFSET_ERROR
	print("offset:", real_offset)
	return value * ratio + real_offset
	-- End of function's body
end
-- End of chunk's body
</pre>
<p>Surprisingly, there's a lot going on in here. First off, we have two scopes: "chunk's body" and "function's
body". "Chunk's body" has two local variables: <code>OFFSET_ERROR</code> (which acts as a constant) and
<code>calibrate</code> (a function). In turn, "function's body" has four local variables: <code>value</code>,
<code>ratio</code>, <code>offset</code> (those three are the arguments of the function), and <code>real_offset</code>
(a temporary variable I added just to show that a function body may also have explicit local variables). We will call
all of those variables exactly the way I already did: <em>local variables</em>.
<p>In addition to the local variables, "function's body" also refers to two other names. The first one is
<code>OFFSET_ERROR</code>. We already know this one; it's a local variable from the chunk. A scope nested inside another
scope can freely refer to the outer scope's local variables. Seen from the inner scope, they are called
<em>upvalues</em>. This works on any level, no matter how deeply the scopes are nested, as long as the variable was
declared before the inner scope that uses it. It doesn't work the other way: an outer scope referring to a local
variable of an inner scope is a no-no.
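<p>A minimal sketch of that nesting rule, assuming Lua 5.2 or newer. The names <code>base</code>, <code>step</code>,
<code>outer</code>, and <code>inner</code> are made up for illustration:</p>
<pre>
local base = 10

local function outer ()
	local step = 1
	local function inner ()
		-- base lives two scopes up and step one scope up; both are upvalues here
		return base + step
	end
	return inner()
end

print(outer())   -- prints 11
print(step)      -- prints nil: step is not visible out here, so the name is looked up as a global
</pre>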
<p>The second external reference in "function's body" is <code>print</code>. We don't see it defined anywhere as a local
variable. Such variables are commonly called <em>globals</em> or global variables. That's what we will call them.
<p>Here's the part that can be slightly ambiguous: the definition of "environment". I could see it covering all three
types of variables I mentioned, only upvalues and globals, or just globals. Depending on the situation, each of these
may be fitting. Looking at how other languages use the term, and at how similar it is to bound variables in the
Lambda Calculus, they are all good explanations. Lua itself uses "environment" only for the mechanism that
resolves references to globals.
<h2>Upvalues</h2>
<p>That being said, let's talk about the upvalues for a moment before we go to the globals. Upvalues are the closest
thing you can have to a bound variable from the Lambda Calculus. They always bind "by reference", simply because the
variable name is a direct reference to the variable and its value. Whatever that means, see for yourself:</p>
<pre>
local counter = 0
local
function increment ()
	counter = counter + 1
end
increment()
increment()
increment()
print(counter)
</pre>
<p>This example will print out "3". Due to their nature, upvalues also play a huge part in garbage collection, but that
is outside the scope of this article.
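<p>To make "by reference" a bit more tangible, here is a small sketch (assuming Lua 5.2 or newer) in which two closures
share one upvalue. The names <code>make_counter</code>, <code>bump</code>, and <code>current</code> are hypothetical:</p>
<pre>
local function make_counter ()
	local count = 0
	local function bump ()
		count = count + 1    -- writes to the shared variable, not to a private copy
	end
	local function current ()
		return count         -- reads the very same variable
	end
	return bump, current
end

local bump, current = make_counter()
bump()
bump()
print(current())   -- prints 2

-- Every call to make_counter creates a fresh variable, so a second pair starts from zero
local other_bump, other_current = make_counter()
print(other_current())   -- prints 0
</pre>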
<h2>Environment</h2>
<p>Back to the main topic! Quick reminder: in Lua, "environment" is used to resolve references to names that are not
local variables or upvalues. In other words, it's a way to deal with free variables when the program executes.
<p>"Environments" are associative tables. They link global variable names to actual variables. The environments
themselves are bound to functions as upvalues called <code>_ENV</code>, whenever they are needed. It's done implicitly;
quietly in the background. This means that the <code>calibrate</code> function from the first example actually has two
upvalues: <code>OFFSET_ERROR</code> and <code>_ENV</code>. By default, <code>_ENV</code> took as its value the table
that was used as the global environment at that time. If <code>calibrate</code> didn't use <code>print</code>,
<code>_ENV</code> wouldn't be there at all.
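<p>One way to check this for yourself is to list upvalue names with the standard <code>debug.getupvalue</code>. This is
only a sketch: it assumes Lua 5.2 or newer and a chunk that still carries debug information (the default), and the
helpers <code>upvalue_names</code> and <code>add</code> are made up for illustration:</p>
<pre>
local OFFSET_ERROR = 0.97731

local function calibrate (value, ratio, offset)
	local real_offset = offset * OFFSET_ERROR
	print("offset:", real_offset)   -- compiled as _ENV.print("offset:", real_offset)
	return value * ratio + real_offset
end

local function add (a, b)           -- refers to no global names at all
	return a + b
end

-- Walk the upvalue list of fn until debug.getupvalue runs out of entries
local function upvalue_names (fn)
	local names = {}
	local index = 1
	while true do
		local name = debug.getupvalue(fn, index)
		if name == nil then break end
		names[index] = name
		index = index + 1
	end
	return names
end

print(table.concat(upvalue_names(calibrate), ", "))   -- lists both OFFSET_ERROR and _ENV
print(#upvalue_names(add))                            -- prints 0
</pre>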
<p>This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables
themselves.</p>
<pre>
local
function hello ()
	print "Hello!"
end
local _ENV = {print = function () end}
hello()
</pre>
<p>We create a simple local function that is meant to print out "Hello!" to standard output. After that we replace the
current environment with a new one that contains a <code>print</code> function that does nothing. If we call
<code>hello</code>, it still prints out "Hello!" like it was meant to. That's because it's bound to the original
environment, not the new one.
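<p>The reverse also holds: a function created while a different <code>_ENV</code> is in scope binds to that one. A
minimal sketch, assuming Lua 5.2 or newer. The global <code>greeting</code> and both function names are made up for
illustration, and the <code>do</code> block is there only to limit how far the replacement reaches:</p>
<pre>
greeting = "original"            -- stored in the chunk's environment

local function outer ()
	return greeting              -- resolved through the chunk's _ENV
end

local inner
do
	local _ENV = {greeting = "replaced"}
	inner = function ()
		return greeting          -- resolved through the local _ENV declared above
	end
end

print(outer(), inner())          -- prints "original" and "replaced"
</pre>
<p>Each function keeps consulting the table that was visible as <code>_ENV</code> where it was written, no matter where
it is called from.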
<p>In the meantime, you might have noticed that environments may appear somewhat interchangeable with upvalues. That's
correct to some extent, and it's because of the things I've already mentioned: ambiguity is one, dealing with free
variables while being a bound variable is two. In program execution, bound variables (in our terms: upvalues) are there
to deal with free variables, and here we are doing the same thing.</p>
<img src="environments_in_lua_5_2_and_beyond-2.png" alt="fat bird">
<h2>Usage</h2>
<p>Yeah, that's all of the explanation there is. I could sum it up as: "environments are tables bound to upvalues that
resolve free variables". Cool, so how and when can we use them?
<p>The most common use case is sandboxing or, more generally, limiting the things available to scripts. Let's say we
develop a program that uses Lua as a scripting language. We load all default modules from Lua for ourselves:
<code>io</code>, <code>debug</code>, <code>string</code>, whatever we want. However, we don't want to expose all of them
to external scripts. To prevent that, we prepare a table that will act as their environment and simply assign it as the
<code>_ENV</code> upvalue, most likely through the <code>load</code> or <code>loadfile</code> function:</p>
<pre>
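-- Only the names listed here will be visible to the external script
local end_user_env = {
	print = print
}
-- loadfile returns nil plus a message on failure, so assert guards the call below
local script = assert(loadfile("external.lua", nil, end_user_env))
script()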
</pre>
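<p>The same idea can be tried without any file on disk by handing a string to <code>load</code>. This is a sketch under
the assumption of Lua 5.2 or newer; the two source strings merely stand in for an external script:</p>
<pre>
local end_user_env = {
	print = print
}

-- Stands in for the contents of an external script
local source = [[
	return io == nil, os == nil, type(print)
]]

-- load(chunk, chunkname, mode, env): mode "t" accepts text chunks only
local script = assert(load(source, "external", "t", end_user_env))
print(script())   -- prints true, true, and "function"

-- A script reaching for something outside of its environment fails at run time
local rude = assert(load('io.write("hi")', "rude", "t", end_user_env))
local ok, message = pcall(rude)
print(ok)         -- prints false; message describes the attempt to index a nil value
</pre>
<p>Note that the script compiles just fine. The missing <code>io</code> only shows up when the chunk runs and the lookup
in <code>end_user_env</code> comes back empty.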
<p>Of course, you can do that from the C API, too. This requires us to acknowledge one more thing: upvalues are stored
as a list and are indexed. For regular functions, the <code>_ENV</code> upvalue might be at any position in this list.
For main chunks, loaded external scripts, or the "chunk" from the first example, <code>_ENV</code> is expected to be
first on the list.</p>
<pre>
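luaL_loadfile(L, "external.lua"); /* stack: chunk (check the return value in real code) */
lua_newtable(L);                  /* stack: chunk, env */
lua_pushliteral(L, "print");
lua_getglobal(L, "print");
lua_settable(L, -3);              /* env.print = print; stack: chunk, env */
lua_setupvalue(L, -2, 1);         /* pops env and stores it in the chunk's first upvalue */
lua_call(L, 0, 0);                /* runs the chunk with the restricted environment */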
</pre>
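<p>The indexed upvalue list can also be poked at from Lua itself through the debug library. The following is a rough
counterpart of the C snippet, assuming Lua 5.2 or newer; the chunk text and the <code>answer</code> field are made up
for illustration:</p>
<pre>
-- Loaded without an explicit environment, so it starts out bound to the global one
local script = assert(load("return answer", "external", "t"))

-- For a loaded chunk the first upvalue is the environment
local name = debug.getupvalue(script, 1)

-- Replace the table stored in that upvalue; only this chunk is affected
debug.setupvalue(script, 1, {answer = 42})

print(name, script())   -- prints _ENV and 42
</pre>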
<p>It could also be done by prepending <code>local _ENV = end_user_env</code> to the external script before loading it,
but that's a hassle:</p>
<pre>
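-- io.open and load return nil plus a message on failure; assert turns that into an error
local file = assert(io.open "env.lua")
local content = file:read "*a"
file:close()
-- The prepended line shadows _ENV for everything that follows in the script
content = "local _ENV = {print = print}\n" .. content
local script = assert(load(content))
script()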
</pre>
<p>Declaring a local <code>_ENV</code> like this can also be used for more in-line changes, as seen in one of the
previous examples. This is the new way of doing magic tricks now that <code>setfenv</code> is gone. I'll leave this as a
topic for another time. I think the examples above are sufficient for now. From here I can expand into magic or
sandboxing details.
</article>
<script src="https://stats.ignore.pl/track.js"></script>