Environments in Lua 5.2 and Beyond
Published on 2020-07-04 20:39:00+02:00
Environments are a way of dealing with various problems. Or creating them entirely on your own. Primarily, they are
used to isolate a selected part of a program. As Lua is meant to be used as an embedded language, you may find yourself
wanting to separate user-created addons from more internal scripts. In short: sandboxing and overall security.
While I will focus on environments in Lua, I won't go too deeply into the implementation details. I'll try to focus
more on design and a general overview, with minor thoughts on syntax and the inner workings. I think you might find this
text interesting even if you don't know Lua or aren't particularly interested in it.
Previously, we had setfenv and related functions to do exactly that. With some tricks, the debug library, and faith,
you could do real magic, which is kind of cool. However, magic can easily become arcane, and thus unclear.
With Lua 5.2, setfenv and related functions were removed in favour of a new approach. This one uses a simple local
variable named _ENV. Luckily, this approach can also be fun. It has other benefits over the old one,
but the goal here is not a comparison.
One more thing before we hop into examples: I strongly encourage you to read some documents on Lambda Calculus. They
will give you quite a good overview of what is happening with closures/anonymous functions/lambda expressions/whatever
they are called. It's a good foundation and an entertaining exercise. This way you can quickly draw similarities,
considering that the Calculus is easier to grasp than most language-syntax-related shenanigans.
Terms
Now then, let's have a simple example to define a few selected terms:
-- Start of chunk's body
local OFFSET_ERROR = 0.97731
local function calibrate (value, ratio, offset)
    -- Start of function's body; not part of chunk's body
    local real_offset = offset * OFFSET_ERROR
    print("offset:", real_offset)
    return value * ratio + real_offset
    -- End of function's body
end
-- End of chunk's body
Surprisingly, there's a lot going on in here. First off, we have two scopes: "chunk's body" and "function's
body". "Chunk's body" has two local variables: OFFSET_ERROR (which acts as a constant) and calibrate (a function).
In turn, "function's body" has four local variables: value, ratio, offset (those three are the function's arguments),
and real_offset (a temporary variable I added just to show that a function body may also have explicit local
variables). We will call all of those variables exactly the way I already did: local variables.
In addition to the local variables, "function's body" also refers to two other names. The first one is
OFFSET_ERROR. We already know this one; it's a local variable from the chunk. A scope nested inside another scope
can freely refer to the outer scope's local variables. Seen from the inner scope, such variables are called upvalues.
This works on any level, no matter how deeply the scopes are nested, as long as it makes sense. It doesn't work the
other way: an outer scope referring to a local variable of an inner scope is a no-no.
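To make the nesting rule concrete, here is a minimal sketch. The names make_scaler, scale, inner_scope and hidden are made up for this illustration and are not part of the original example.

```lua
local function make_scaler(factor)          -- `factor` is local to make_scaler
    return function(list)                   -- first nested scope
        local out = {}
        for i, v in ipairs(list) do
            -- second nested scope: `v` and `factor` are both upvalues here
            local scale = function() return v * factor end
            out[i] = scale()
        end
        return out
    end
end

local double = make_scaler(2)
local result = double({1, 2, 3})
print(result[1], result[2], result[3])

-- The other direction does not work: `hidden` is local to `inner_scope`,
-- so outside of it the name is looked up as a global instead.
local function inner_scope()
    local hidden = "only visible inside"
    return hidden
end
inner_scope()
print(hidden)  -- nil, assuming no global named `hidden` exists
```

The innermost function reaches factor through two levels of nesting, while the last line quietly falls back to a global lookup.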
The second external reference in "function's body" is print. We don't see it defined anywhere as a local
variable. Such variables are commonly called globals or global variables. That's what we will call them.
Here's the part that can be slightly ambiguous: the definition of "environment". I can see myself including in it all
three types of variables I mentioned, only upvalues and globals, or just globals. Depending on the situation, any of
these may fit; considering how other languages use the term and how similar it is to bound variables in the
Lambda Calculus, they are all good explanations. Lua itself uses "environment" only to refer to the mechanism that
resolves references to globals.
Upvalues
That being said, let's talk about upvalues for a moment before we go to the globals. Upvalues are the closest
thing you can have to a bound variable from the Lambda Calculus. They always bind "by reference", simply because the
variable name is a direct reference to the variable and its value. Whatever that means, see for yourself:
local counter = 0
local function increment ()
    counter = counter + 1
end
increment()
increment()
increment()
print(counter)
This example will print out "3". Upvalues also play a huge part in garbage collection due to their nature, but that's
outside the scope of this article.
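Binding "by reference" also means that two closures created in the same scope share one variable, while every new call of the enclosing function creates a fresh one. A small sketch, with make_counter, current and other invented for this illustration:

```lua
local function make_counter()
    local count = 0                       -- one variable per call of make_counter
    local function increment()
        count = count + 1                 -- writes to the shared upvalue
    end
    local function current()
        return count                      -- reads the very same upvalue
    end
    return increment, current
end

local increment, current = make_counter()
increment()
increment()
print(current())   -- 2: both closures see the same `count`

local _, other = make_counter()
print(other())     -- 0: a separate call has its own `count`
```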
Environment
Back to the main topic! Quick reminder: in Lua, "environment" is used to resolve references to names that are not
local variables or upvalues. In other words, it's a way to deal with free variables when the program executes.
"Environments" are associative tables. They link global variable names to actual variables. The environments
themselves are bound to functions as upvalues called _ENV, whenever they are needed. It's done implicitly,
quietly in the background. This means that the calibrate function from the first example actually has two
upvalues: OFFSET_ERROR and _ENV. By default, _ENV holds the table that was the global environment when the chunk
was loaded. If calibrate didn't use print, _ENV wouldn't be there at all.
This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables
themselves.
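If you want to check this rather than take my word for it, the debug library can list a function's upvalues. The sketch below assumes a stock Lua 5.2 or newer interpreter with the debug library loaded and debug information not stripped; upvalue_names and pure are helper names made up for this illustration.

```lua
local OFFSET_ERROR = 0.97731

local function calibrate(value, ratio, offset)
    local real_offset = offset * OFFSET_ERROR
    print("offset:", real_offset)
    return value * ratio + real_offset
end

-- Uses no globals at all, so it should not need _ENV.
local function pure(a, b)
    return a + b
end

-- Collects the names of all upvalues of function `f`.
local function upvalue_names(f)
    local names = {}
    local i = 1
    while true do
        local name = debug.getupvalue(f, i)
        if not name then break end
        names[#names + 1] = name
        i = i + 1
    end
    return names
end

print(table.concat(upvalue_names(calibrate), ", "))  -- lists OFFSET_ERROR and _ENV
print(#upvalue_names(pure))                          -- 0
```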
local function hello ()
    print "Hello!"
end
local _ENV = {print = function () end}
hello()
We create a simple local function that is meant to print out "Hello!" to standard output. After that, we replace the
current environment with a new one whose print function does nothing. If we call hello, it still prints out
"Hello!" like it was meant to. That's because it's bound to the original environment, not the new one.
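The flip side is that a function defined while the new _ENV is in scope binds to the new table. Here is a minimal sketch of both cases side by side; greeting, before and after are names invented for this illustration, and the do ... end block is only there to keep the replacement environment from swallowing the rest of the chunk.

```lua
greeting = "original"            -- a global, stored in the original environment

local function before()
    return greeting              -- resolved through the original _ENV
end

local after
do
    -- From here until `end`, free names resolve through this table.
    local _ENV = {greeting = "replacement"}
    after = function()
        return greeting          -- resolved through the new _ENV
    end
end

print(before())   -- original
print(after())    -- replacement
```

Since _ENV is an ordinary local variable, it follows ordinary scoping rules: once the block ends, print is reachable again.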
In the meantime, you might have noticed that environments may appear somewhat interchangeable with upvalues. That's
correct to some extent, and it's because of the things I've already mentioned: ambiguity is one, dealing with free
variables while being a bound variable is two. In program execution, bound variables (in our terms: upvalues) are there
to deal with free variables, and here we are doing the same thing.
Usage
Yeah, that's all the explanation there is. I could sum it up as: "environments are tables bound to upvalues that
resolve free variables". Cool, but how and when can we use them?
The most common use case is sandboxing or, more generally, limiting what is available to scripts. Let's say we develop
a program that uses Lua as a scripting language. We load all the default Lua modules for ourselves: io, debug,
string, whatever we want. However, we don't want to expose all of them to external scripts. To achieve that, we
prepare a table that will act as an environment for them and simply assign it as their _ENV upvalue, most likely
through the load or loadfile function:
local end_user_env = {
    print = print
}
local script = loadfile("external.lua", nil, end_user_env)
script()
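To see what the loaded script can and cannot reach, here is a self-contained sketch of the same idea. It uses load with an in-memory string instead of a file so that it runs on its own; the chunk name "=external" and the global named leaked are assumptions made for this illustration.

```lua
local end_user_env = {
    print = print
}

-- Stand-in for the contents of an external script.
local source = [[
    print("io is", io)             -- io was never exposed, so this is nil
    leaked = 123                   -- lands in end_user_env, not in our globals
    return io == nil and os == nil
]]

local script, err = load(source, "=external", "t", end_user_env)
assert(script, err)

local ok, result = pcall(script)
print(ok, result)                  -- true  true
print(end_user_env.leaked)         -- 123
print(leaked)                      -- nil: the host's globals stay untouched
```

Keep in mind that anything placed in the table is shared with the script as-is, so this only limits which names are visible.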
Of course, you can do that from the C API, too. This requires us to acknowledge one more thing: upvalues are stored as
a list and are indexed. For regular functions, the _ENV upvalue might be anywhere in this list. For main
chunks, loaded external scripts, or the "chunk" from the first example, _ENV is expected to be first on the
list.
luaL_loadfile(L, "external.lua");
lua_newtable(L);
lua_pushliteral(L, "print");
lua_getglobal(L, "print");
lua_settable(L, -3);
lua_setupvalue(L, -2, 1);
lua_call(L, 0, 0);
It could also be done by prepending local _ENV = end_user_env to the external script before loading it,
but that's a hassle:
local file = io.open "env.lua"
local content = file:read "*a"
file:close()
content = "local _ENV = {print = print}\n" .. content
local script = load(content)
script()
This method of environment manipulation can be used in other cases for more in-line changes, as seen in one of the
previous examples. This is the new way of doing magic tricks now that setfenv is gone. I'll leave this as a
topic for another time. I think the examples above are sufficient for now. From here I can expand into magic or
sandboxing details.