<!doctype html>
<html lang="en">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="aki">
<meta name="tags" content="Lua, setfenv, _ENV, sandbox, environment, tutorial">
<meta name="published-on" content="2020-07-04T20:39:00+02:00">
<link rel="icon" type="image/png" href="favicon.png">
<link rel="stylesheet" type="text/css" href="style.css">

<title>Environments in Lua 5.2 and Beyond</title>

<header>
<nav><a href="https://ignore.pl">ignore.pl</a></nav>
<time>4 July 2020</time>
<h1>Environments in Lua 5.2 and Beyond</h1>
</header>

<article>
<p>Environments in Lua can be used to isolate selected parts of a program, expose an API, <a href="">create a markup
format</a>, widely modify behaviour at runtime, and much more. They can also create a lot of problems. Overall, we can
summarize all of it as "sandboxing."
<p>I will focus on Lua, but I won't go too deep into language or implementation details. Instead, I want to provide an
overview of the design and essentials with minor points on syntax and the inner workings. This way I aim to make this
document useful to you whether or not you are a Lua user.
<p>Before Lua 5.2, we had the <code><a href="https://www.lua.org/manual/5.1/manual.html#pdf-setfenv">setfenv</a></code>
family of functions for operating on function environments directly. With some tricks and the debug library we could do
real magic. This approach was very versatile but could also very easily become unclear. It was also almost completely
disconnected from the language itself.
<p>In Lua 5.2 the <code>setfenv</code> family was removed in favour of a new approach centred around the
<code>_ENV</code> variable. This approach is usually called <em>Lexical Environments</em> and it is the main focus of
this document.
<p>I strongly encourage everyone to learn lambda calculus. Some vocabulary will be borrowed from it, and the design and
behaviour of Lua resemble it at many levels. Additionally, it gives a good overview of what is happening with
closures, anonymous functions, lambda expressions, or whatever else a programming language calls an ad hoc function.
It may also serve as a way to build better intuition about scoping rules in different languages.</p>
<img src="environments_in_lua_5_2_and_beyond-1.png" alt="lambda">
<h2>Terms</h2>
<p>Now then, let's consider a code sample and define selected terms:</p>
<style>
.merged pre {
	margin-top: 0;
	margin-bottom: 0;
	padding-top: 0;
	padding-bottom: 0;
}
.merged pre:first-child { padding-top: 0.4em; }
.merged pre:last-child { padding-bottom: 0.4em; }
span.decl { font-weight: bold; }
span.use { text-decoration: underline; }
span.a { background-color: #c7cdee; }
span.b { background-color: #f6f4cd; }
span.f { background-color: #c38df9; }
</style>
<div class="merged">
<pre style="background: #c7cdee;">
local <span class="decl">OFFSET_ERROR</span> = 0.97731
local function <span class="decl">calibrate</span> (<span class="decl">value</span>, <span class="decl">ratio</span>, <span class="decl">offset</span>)
</pre>
<pre style="background: #f6f4cd; border-left: 1rem solid #c7cdee; padding-left: 0;">
	local <span class="decl">real_offset</span> = <span class="use">offset</span> * <span class="use">OFFSET_ERROR</span>
	<span class="use">print</span>("offset:", <span class="use">real_offset</span>)
	return <span class="use">value</span> * <span class="use">ratio</span> + <span class="use">real_offset</span>
</pre>
<pre style="background: #c7cdee;">
end
</pre>
</div>
<p>I'll call the blocks marked with blue and yellow "chunks." Blue is the special case of a "main chunk," assuming it
is a file fed into Lua. Yellow is the body of the <b>calibrate</b> function. We can call it a "function body," and I'll
still treat it as a "chunk," but it is not a "main chunk."
<p>Strictly speaking, the reference manual reserves "chunk" for a unit of compilation and calls a function body a
"block." Both are lexical scopes for variables, so I'll use these terms interchangeably. The distinction matters
mostly in the context of Lua loading internals, where "chunk" has the more specific meaning.
<p>I marked the names of variables that are being declared or assigned in bold. Names that are underlined are being
used or otherwise referenced. And so we can observe <code>OFFSET_ERROR</code> being declared as a local and assigned a
value in the first line and then used in a multiplication in the third. In the second line the variable
<code>calibrate</code> is assigned a function as its value. These two syntax forms are explained in detail in
<a href="https://www.lua.org/manual/5.4/manual.html#3.3.7">local declaration</a> and
<a href="https://www.lua.org/manual/5.4/manual.html#3.4.11">function definition</a>.
<p>Both <code>OFFSET_ERROR</code> and <code>calibrate</code> use the <code>local</code> keyword, making these names
available only to the main chunk and its nested blocks. Of course, this makes them available in <b>calibrate</b>'s
body. There's a little bit more to how the syntax around local function definitions works, but no meaningful problems
occur in this particular case, so I'll ignore it.
<p>Function parameters are automatically declared as local variables in their respective function scope. The only other
variable declared in the function body is the local <code>real_offset</code>. Similarly, because it is local, it is
available only to the function body and its nested blocks. This means the main chunk cannot access it.
<p>Now, let's do a simple exercise and colour-code variable references and their declarations:
<pre>
local <span class="a">OFFSET_ERROR</span> = 0.97731
local function calibrate (<span class="b">value</span>, <span class="b">ratio</span>, <span class="b">offset</span>)
	local <span class="b">real_offset</span> = <span class="b use">offset</span> * <span class="a use">OFFSET_ERROR</span>
	<span class="f use">print</span>("offset:", <span class="b use">real_offset</span>)
	return <span class="b use">value</span> * <span class="b use">ratio</span> + <span class="b use">real_offset</span>
end
</pre>
<p>Variables are used in three different ways here:
<dl>
<dt><code>value</code>, <code>ratio</code>, <code>offset</code>, <code>real_offset</code>
<dd>are used and declared inside function body
<dt><code>OFFSET_ERROR</code><dd>is used inside function body but is declared in the outer scope
<dt><code>print</code><dd>is used inside function body but is never visibly declared
</dl>
<p>The first case is our ideal case. The keyword <code>local</code> tells us exactly what is happening: variables are
declared and used only within the function body. They are local variables. They don't spill anywhere else unless they
are passed to a function as an argument. <code>OFFSET_ERROR</code> does spill, but only to <b>calibrate</b>'s body
because it is a nested scope.
<p>These locals behave similarly to <em>bound variables</em> from lambda calculus. <code>OFFSET_ERROR</code> is
arguably a bit closer to them in principle, but the idea is the same: the source of each value is exactly known.
<p>On the other hand, <code>print</code> behaves like a <em>free variable</em>. It is never declared as local, and by
default Lua considers such variables <em>global</em>. When a global is encountered, the <em>environment</em> is queried
to provide a value for it.
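<p>To make the lookup concrete, here is a minimal sketch for Lua 5.2 or newer, where the environment is reachable under
the name <code>_ENV</code> (discussed in the Environment section). It assumes a stock interpreter without any "strict
globals" module loaded. The names <code>greeting</code> and <code>never_assigned</code> are made up for this
illustration.</p>
<pre>
-- A free name is looked up as a field of the environment table.
assert(print == _ENV.print)

-- Assigning to an undeclared name writes into the same table.
greeting = "hi"
assert(_ENV.greeting == "hi")

-- A name that was never assigned simply resolves to nil.
assert(never_assigned == nil)
</pre>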


<h2>Upvalues</h2>
<p>Before jumping into <em>environments</em>, let's introduce one last term and talk about how <code>OFFSET_ERROR</code>
propagates internally. Whenever a local from an outer scope is referenced in a function body, it's called an
<em>upvalue</em> of that function. Once referenced, it is bound to that particular function "by reference" and stored
so that it continues to live with it.
<p>Upvalues implement the core principle of closures in Lua. Consider two counters:
<pre>
local init = 0
local function new_counter ()
	local x = init
	return function ()
		x = x + 1
		return x
	end
end
local a = new_counter()
local b = new_counter()
print(a())  -- 1
print(a())  -- 2
print(a())  -- 3
print(b())  -- 1
</pre>
<p>Two instances of <code>x</code> are created, one for each call of <code>new_counter</code>. Each is bound to the
anonymous function that is in turn returned. This sequence can also be interpreted as the construction of a
function-like object.
<p><code>init</code> is bound only to <code>new_counter</code> and is not bound to the anonymous function.
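<p>The "by reference" part can be seen when two functions share one upvalue. This is only a sketch;
<code>new_cell</code>, <code>get</code>, and <code>set</code> are names invented for the illustration:</p>
<pre>
local function new_cell ()
	local value = 0
	local function get () return value end
	local function set (new_value) value = new_value end
	return get, set
end
local get, set = new_cell()
set(10)
print(get())  -- 10
</pre>
<p>Both closures refer to the same <code>value</code> variable, not to a copy made when they were created.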


<h2>Environment</h2>
<p>Back to the main topic! Quick reminder: in Lua the "environment" is used to resolve references to names that are not
local variables or upvalues. In other words, it's a way to deal with free variables when the program executes.
<p>"Environments" are ordinary tables used as associative arrays. They map global variable names to their values. The
environments themselves are bound to functions as upvalues called <code>_ENV</code> whenever they are needed. It's done
implicitly; quietly in the background. This means that the <code>calibrate</code> function from the first example
actually has two upvalues: <code>OFFSET_ERROR</code> and <code>_ENV</code>. By default, <code>_ENV</code> took as its
value the table that was used as the global environment at that time. If <code>calibrate</code> didn't use
<code>print</code>, <code>_ENV</code> wouldn't be there at all.
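<p>One way to check this, assuming the <code>debug</code> library is available and the code was not stripped of debug
information, is to list upvalue names with <code>debug.getupvalue</code>. The helper <code>upvalue_names</code> and the
function <code>pure</code> are hypothetical names used only here:</p>
<pre>
local OFFSET_ERROR = 0.97731
local function calibrate (value, ratio, offset)
	local real_offset = offset * OFFSET_ERROR
	print("offset:", real_offset)
	return value * ratio + real_offset
end
local function pure (value)
	return value + 1  -- no globals and no outer locals are referenced
end

-- Collects the names of all upvalues of fn, sorted so that the
-- result does not depend on their internal order.
local function upvalue_names (fn)
	local names = {}
	local index = 1
	while true do
		local name = debug.getupvalue(fn, index)
		if not name then break end
		names[#names + 1] = name
		index = index + 1
	end
	table.sort(names)
	return names
end

print(table.concat(upvalue_names(calibrate), ", "))  -- OFFSET_ERROR, _ENV
print(#upvalue_names(pure))                          -- 0
</pre>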
<p>This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables
themselves.</p>
<pre>
local function hello ()
	print "Hello!"
end
local _ENV = {print = function () end}
hello()
</pre>
<p>We create a simple local function that is meant to print out "Hello!" to standard output. After that we shadow the
current environment with a new one that contains a <code>print</code> function that does nothing. If we call
<code>hello</code>, it still prints out "Hello!" like it was meant to. That's because it is bound to the original
environment, not the new one.
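<p>The same rule works in the other direction: code compiled inside the scope of a new <code>_ENV</code> does use it.
A small sketch, where <code>log</code> is a hypothetical table used to record calls:</p>
<pre>
local log = {}
do
	local _ENV = {print = function (text) log[#log + 1] = text end}
	print("captured")  -- compiled as _ENV.print, using the new table
end
print("visible")  -- back in the original environment
assert(#log == 1 and log[1] == "captured")
</pre>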
<p>In the meantime you might have noticed that environments may appear somewhat interchangeable with upvalues. That's
correct to some extent, and it's because of the things I've already mentioned: the ambiguity of the terms is one, and
dealing with free variables while being a bound variable is two. In program execution bound variables (in our terms:
upvalues) are there to deal with free variables, and here we are doing the same thing.</p>
<img src="environments_in_lua_5_2_and_beyond-2.png" alt="fat bird">
<h2>Usage</h2>
<p>Yeah, that's all of the explanation there was. I could sum it up as: "environments are tables bound to upvalues that
resolve free variables." Cool, how and when can we use them?
<p>The most common use case is sandboxing or, more generally, limiting the things available to scripts. Let's say we
develop a program that uses Lua as a scripting language. We load all default modules from Lua for ourselves:
<code>io</code>, <code>debug</code>, <code>string</code>, whatever we want. However, we don't want to expose all of them
to external scripts. To do so, we prepare a table that will act as an environment for them and simply assign it as
their <code>_ENV</code> upvalue, most likely through the <code>load</code> or <code>loadfile</code> function:</p>
<pre>
local end_user_env = {
	print = print
}
local script = assert(loadfile("external.lua", nil, end_user_env))  -- raise the load error, if any
script()
</pre>
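<p>The same idea can be tried without an external file by passing a string to <code>load</code>. This is a sketch only;
the script text, the chunk name <code>=external</code>, and the <code>output</code> table are made up for the
example:</p>
<pre>
local output = {}
local end_user_env = {
	-- Record what the script prints instead of writing to standard output.
	print = function (...) output[#output + 1] = table.concat({...}, " ") end,
}
local source = [[
print("hello", "from", "script")
return io == nil, os == nil
]]
local script = assert(load(source, "=external", "t", end_user_env))
local io_hidden, os_hidden = script()
assert(io_hidden and os_hidden)  -- the script could not see io or os
assert(output[1] == "hello from script")
</pre>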
<p>Of course, you can do that from the C API, too. This requires us to acknowledge one more thing: upvalues are stored
as a list and are indexed. For regular functions the <code>_ENV</code> upvalue might be at any position in this list.
For main chunks, loaded external scripts, or the "chunk" from the first example, <code>_ENV</code> is expected to be
first on the list.</p>
<pre>
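luaL_loadfile(L, "external.lua");  /* stack: chunk (error handling omitted) */
lua_newtable(L);                   /* stack: chunk, env */
lua_pushliteral(L, "print");
lua_getglobal(L, "print");
lua_settable(L, -3);               /* env.print = print; stack: chunk, env */
lua_setupvalue(L, -2, 1);          /* pops env and sets it as upvalue 1 (_ENV) */
lua_call(L, 0, 0);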
</pre>
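<p>For comparison, roughly the same steps can be sketched in Lua itself with <code>debug.setupvalue</code>, assuming
the <code>debug</code> library is available. The global name <code>answer</code> is invented for the example:</p>
<pre>
local script = assert(load("return answer"))
-- A freshly loaded chunk has _ENV as its first upvalue.
assert(debug.getupvalue(script, 1) == "_ENV")
debug.setupvalue(script, 1, {answer = 42})
print(script())  -- 42
</pre>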
<p>It could also be done by prepending <code>local _ENV = end_user_env</code> to the external script before loading it,
but that's a hassle:</p>
<pre>
local file = assert(io.open "env.lua")  -- fail early if the file is missing
local content = file:read "*a"
file:close()
content = "local _ENV = {print = print}\n" .. content
local script = assert(load(content))    -- raise syntax errors, if any
script()
</pre>
<p>This method of environment manipulation can be used in other cases for more in-line changes, as seen in one of the
previous examples. This is the new way of making magic tricks now that <code>setfenv</code> is gone. I'll leave this as
a topic for another time. I think the examples above are sufficient for now. From here I can expand into magic or
sandboxing details.
</article>
<script src="https://stats.ignore.pl/track.js"></script>