Environments in Lua can be used to isolate selected parts of a program, expose an API, create a markup
format, broadly modify behaviour at runtime, and much more. They can also create a lot of problems. Overall, we can
summarize it as "sandboxing."
I will focus on Lua, but I won't go too deep into language or implementation details. Instead, I want to provide an
overview of the design and essentials, with minor points on syntax and the inner workings. This way I aim to make this
document useful to you whether or not you are a Lua user.
Before Lua 5.2, we had the setfenv family of functions for operating on function environments directly. With some
tricks and the debug library we could do real magic. This approach was very versatile but could also very easily
become unclear. It was also almost completely disconnected from the language.
In Lua 5.2 the setfenv family was removed in favour of a new approach centred around the _ENV variable. This approach
is usually called Lexical Environments and it is the main focus of this document.
I strongly encourage everyone to learn lambda calculus. Some vocabulary will be borrowed from it and the design and
behaviour of Lua resemble it at many levels. Additionally, it gives a good overview of what is happening with
closures, anonymous functions, lambda expressions, or whatever else a programming language can call an ad hoc function.
It may also serve as a way to build better intuition about scoping rules in different languages.
Terms
Now then, let's consider a code sample and define selected terms:
local OFFSET_ERROR = 0.97731
local function calibrate (value, ratio, offset)
    local real_offset = offset * OFFSET_ERROR
    print("offset:", real_offset)
    return value * ratio + real_offset
end
The sample is made of two nested blocks, marked with blue and yellow, which are called "chunks" in Lua. Blue covers the
whole listing and is also considered a special case, a "main chunk", assuming it is a file fed into Lua. Yellow covers
the body of the calibrate function. We can call it a "function body" and it is still considered a "chunk", but it is
not a "main chunk."
In general, chunks are also lexical scopes for variables and it's fine to use these terms interchangeably. I'll do
that. The distinction between them may matter, but only in the context of Lua loading internals, where "chunk" has a
more specific meaning.
I marked names of variables that are being declared or assigned in bold. Names that are underlined are being used or
otherwise referenced. And so we can observe OFFSET_ERROR being declared as a local and assigned a value in the first
line and then used in a multiplication in the third. In the second line the variable calibrate is assigned a function
as its value. These two syntax forms are explained in detail under "local declaration" and "function definition".
Both OFFSET_ERROR and calibrate use the local keyword, making these names available only to the main chunk and its
nested blocks. Of course, this makes them available in calibrate's body. There's a little bit more to how the syntax
around local function definition works, but no meaningful problems occur in this particular case, so I'll ignore it.
Function arguments are automatically declared as local variables in their respective function scope. Apart from them,
the only other variable in the function body is the real_offset local declaration. Similarly, because it is local, it
will be available only to the function body and its nested blocks. This means the main chunk cannot access it.
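A minimal sketch of that scoping rule, reusing the names from the sample above. It assumes the code runs as a main chunk and that nothing else has defined a global called real_offset; the print call is left out of calibrate only to keep the output short.

```lua
local OFFSET_ERROR = 0.97731

local function calibrate (value, ratio, offset)
    -- real_offset is local to this function body
    local real_offset = offset * OFFSET_ERROR
    return value * ratio + real_offset
end

-- Out here the name real_offset does not refer to the local above.
-- Lua treats it as a global, and since no such global was set it reads as nil.
print(real_offset)              --> nil
print(calibrate(2, 3, 0) == 6)  --> true
```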
Now, let's do a simple exercise and colour code variable references and their declarations:
local OFFSET_ERROR = 0.97731
local function calibrate (value, ratio, offset)
    local real_offset = offset * OFFSET_ERROR
    print("offset:", real_offset)
    return value * ratio + real_offset
end
Variables are used in three different ways here:
value, ratio, offset, real_offset - are used and declared inside the function body
OFFSET_ERROR - is used inside the function body but is declared in the outer scope
print - is used inside the function body but is never visibly declared
The first case is our ideal case. The keyword local is telling us exactly what is happening: variables are declared
and used only within the function body. They are local variables. They don't spill anywhere else unless they are
passed to a function as an argument. OFFSET_ERROR does spill, but only into calibrate's body because it is a nested
scope.
They behave similarly to bound variables from lambda calculus. OFFSET_ERROR is a bit closer to them in principle, but
the idea is that the source of their value is exactly known.
On the other hand, print behaves like a free variable. It is never declared as local, and by default Lua considers
such variables global. When a global is encountered, the environment is queried to provide a value for it.
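To make that lookup concrete, here is a small sketch. It assumes Lua 5.2 or later, where the current environment is reachable under the name _ENV (covered in the Environment section); threshold and undeclared_name are made-up names used only for illustration.

```lua
-- No `local` keyword: this assignment creates a global.
threshold = 10

-- Reading a global is the same as indexing the environment table.
print(threshold == _ENV.threshold)  --> true
print(print == _ENV.print)          --> true

-- A name that was never assigned is simply a missing key, so it reads as nil.
print(undeclared_name)              --> nil
```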
Upvalues
Before jumping into environments, let's introduce one last term and talk about how OFFSET_ERROR propagates
internally. Whenever a local from an outer scope is referenced in a chunk, it's called an upvalue. Once referenced,
upvalues are bound to that particular chunk "by reference" and stored so they continue to live with it. Upvalues
implement the core principle of closures in Lua. Consider two counters:
local init = 0
local function new_counter ()
    local x = init
    return function ()
        x = x + 1
        return x
    end
end
local a = new_counter()
local b = new_counter()
print(a()) -- 1
print(a()) -- 2
print(a()) -- 3
print(b()) -- 1
There are two instances of x, one created for each call of new_counter. Each is bound to the anonymous function,
which is in turn returned. This sequence can also be interpreted as the construction of a function-like object.
init is bound only to new_counter and is not bound to the anonymous function.
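The "by reference" part is easier to see when two functions capture the same local. The sketch below is an illustration only; new_shared_counter, increment and read are hypothetical names that do not appear in the counter example.

```lua
local function new_shared_counter ()
    local x = 0
    -- Both closures capture the same x, not a copy of its value.
    local function increment () x = x + 1 end
    local function read () return x end
    return increment, read
end

local increment, read = new_shared_counter()
increment()
increment()
print(read())        --> 2

-- A second call creates a fresh x that is unaffected by the first pair.
local _, other_read = new_shared_counter()
print(other_read())  --> 0
```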
Environment
Back to the main topic! Quick reminder: in Lua "environment" is used to resolve references to names that are not
local variables or upvalues. In other words it's a way to deal with free variables when the program executes.
"Environments" are associative tables. They link global variable names to actual variables. The environments
themselves are bound to functions as upvalues called _ENV, whenever they are needed. It's done implicitly; quietly in
the background. This means that the calibrate function from the first example actually has two upvalues: OFFSET_ERROR
and _ENV. By default, _ENV took as its value the table that was used as the global environment at that time. If
calibrate didn't use print, _ENV wouldn't be there at all.
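One way to check this claim is to list a function's upvalues with the standard debug library. This is a sketch that assumes Lua 5.2 or later and that debug information has not been stripped; upvalue_names and pure are hypothetical helpers written for this illustration.

```lua
local OFFSET_ERROR = 0.97731

local function calibrate (value, ratio, offset)
    local real_offset = offset * OFFSET_ERROR
    print("offset:", real_offset)
    return value * ratio + real_offset
end

-- Collect the names of all upvalues of f into a set.
local function upvalue_names (f)
    local names = {}
    local i = 1
    while true do
        local name = debug.getupvalue(f, i)
        if not name then break end
        names[name] = true
        i = i + 1
    end
    return names
end

local names = upvalue_names(calibrate)
print(names.OFFSET_ERROR, names._ENV)  --> true    true

-- A function that touches neither outer locals nor globals has no upvalues.
local function pure (a, b) return a + b end
print(next(upvalue_names(pure)))       --> nil
```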
This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables
themselves.
local function hello ()
    print "Hello!"
end
local _ENV = {print = function () end}
hello()
We create a simple local function that is meant to print "Hello!" to standard output. After that we overwrite the
current environment with a new one that contains a print function that does nothing. If we call hello, it still
prints "Hello!" like it was meant to. That's because it's bound to the original environment, not the new one.
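The reverse also holds: a function defined while the new _ENV is in scope resolves its free names through the new table. A minimal sketch, assuming Lua 5.2 or later; greeting, before and after are made-up names, and the do block is there only so that print stays reachable afterwards.

```lua
greeting = "original"  -- a global, stored in the original environment

local function before ()
    return greeting    -- resolved through the original _ENV
end

local after
do
    -- Until the end of this block, free names resolve through this table.
    local _ENV = { greeting = "replaced" }
    after = function ()
        return greeting
    end
end

print(before())  --> original
print(after())   --> replaced
```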
In the meantime you might have noticed that environments may appear somewhat interchangeable with upvalues. That's
correct to some extent, and it's because of the things I've already mentioned: ambiguity is one, dealing with free
variables while being a bound variable is two. In program execution bound variables (in our terms: upvalues) are there
to deal with free variables, and here we are doing the same thing.
Usage
Yeah, that's all of the explanation there was. I could sum it up as: "environments are tables bound to upvalues that
resolve free variables". Cool, how and when can we use them?
The most common use case is sandboxing or, more generally, limiting the things available to scripts. Let's say we
develop a program that uses Lua as a scripting language. We load all default modules from Lua for ourselves: io,
debug, string, whatever we want. However, we don't want to expose all of them to external scripts. To do so, we
prepare a table that will act as an environment for them and simply assign it as the _ENV upvalue, most likely through
the load or loadfile function:
local end_user_env = {
    print = print
}
-- loadfile returns nil and an error message on failure, so fail early
local script = assert(loadfile("external.lua", nil, end_user_env))
script()
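For a version that runs without a separate file, the same idea can be sketched with load and a string. This assumes Lua 5.2 or later; the source strings and the chunk names "external" and "forbidden" are hypothetical stand-ins for a real script.

```lua
local end_user_env = { print = print }

-- Stand-in for the contents of an external script.
local source = [[
    return io == nil, print ~= nil
]]

-- The fourth argument becomes the chunk's _ENV upvalue.
local script = assert(load(source, "external", "t", end_user_env))
local io_hidden, print_visible = script()
print(io_hidden, print_visible)  --> true    true

-- Anything not listed in the table is nil inside the script,
-- so using it raises an error that pcall can catch.
local forbidden = assert(load("return os.time()", "forbidden", "t", end_user_env))
local ok = pcall(forbidden)
print(ok)                        --> false
```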
Of course, you can do that from the C API, too. This requires us to acknowledge one more thing: upvalues are stored
as a list and are indexed. For regular functions the _ENV upvalue might be at any place in this list. For main chunks,
loaded external scripts, or the "chunk" from the first example, _ENV is expected to be first on the list.
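luaL_loadfile(L, "external.lua");  /* stack: chunk (error check omitted) */
lua_newtable(L);                   /* stack: chunk, env */
lua_pushliteral(L, "print");       /* stack: chunk, env, "print" */
lua_getglobal(L, "print");         /* stack: chunk, env, "print", print */
lua_settable(L, -3);               /* env.print = print; stack: chunk, env */
lua_setupvalue(L, -2, 1);          /* pops env into upvalue 1 (_ENV) of chunk */
lua_call(L, 0, 0);                 /* run the chunk with the new environment */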
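The same upvalue replacement can be sketched from Lua itself with the debug library, which makes the "first on the list" point easy to observe. This assumes Lua 5.2 or later with debug information intact; the chunk source and the name answer are made up for the illustration.

```lua
-- A freshly loaded chunk that reads one global.
local chunk = assert(load("return answer", "example", "t"))

-- For a loaded chunk, upvalue 1 is its environment.
local name = debug.getupvalue(chunk, 1)
print(name)     --> _ENV

-- Swap the environment for a table of our own, then run the chunk.
debug.setupvalue(chunk, 1, { answer = 42 })
print(chunk())  --> 42
```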
It could also be done by prepending local _ENV = end_user_env to the external script before loading it, but that's a
hassle:
-- io.open and load both return nil plus an error message on failure
local file = assert(io.open "env.lua")
local content = file:read "*a"
file:close()
content = "local _ENV = {print = print}\n" .. content
local script = assert(load(content))
script()
This method of environment manipulation can be used in other cases for more in-line changes, as seen in one of the
previous examples. This is the new way of doing magic tricks now that setfenv is gone. I'll leave this as a topic for
another time. I think the examples above are sufficient for now. From here I can expand to magic or sandboxing
details.