Environments in Lua 5.2 and Beyond

Published on 2020-07-04 20:39:00+02:00

Environments are a way of dealing with various problems. Or creating them entirely on your own. Primarily, they are used to isolate a selected part of a program. As Lua is meant to be used as an embedded language, you may find yourself wanting to separate user-created addons from more internal scripts. In short: sandboxing and overall security.

While I will focus on environments in Lua, I won't go too deeply into the implementation details. I'll try to focus more on design and a general overview, with minor thoughts on syntax and the inner workings. I think you might find this text interesting whether or not you know Lua or care about it.

Previously, we had setfenv and related functions to do exactly that. With some tricks, the debug library, and faith, you could do real magic, which is kind of cool. However, magic can easily become arcane, and thus unclear.

With Lua 5.2, setfenv and its relatives were removed in favour of a new approach. This one uses a simple local variable named _ENV. Luckily, this approach can also be fun. It has other benefits over the old one, but the goal here is not a comparison.

One more thing before we hop into examples: I strongly encourage you to read some documents on the Lambda Calculus. They will give you quite a good overview of what is happening with closures/anonymous functions/lambda expressions/whatever they are called. It's a good foundation and an entertaining exercise. This way you can quickly draw similarities, considering that the Calculus is easier to grasp than most language-syntax-related shenanigans.

Terms

Now then, let's have a simple example to define a few selected terms:

-- Start of chunk's body
local OFFSET_ERROR = 0.97731
local
function calibrate (value, ratio, offset)
	-- Start of function's body; not part of chunk's body
	local real_offset = offset * OFFSET_ERROR
	print("offset:", real_offset)
	return value * ratio + real_offset
	-- End of function's body
end
-- End of chunk's body

Surprisingly, there's a lot going on in here. First off, we have two scopes: "chunk's body" and "function's body". "Chunk's body" has two local variables: OFFSET_ERROR (that acts as a constant) and calibrate (a function). In turn, "function's body" has four local variables: value, ratio, offset (those three are the arguments of the function), and real_offset (a temporary variable I added just to show that a function body may also have explicit local variables). We will call all of those variables exactly the way I already did: local variables.

In addition to the local variables, "function's body" also refers to two other names. The first one is OFFSET_ERROR. We already know this one; it's a local variable from the chunk. A scope nested inside another scope can freely refer to the outer scope's local variables. Seen from the inner scope, they are called upvalues. This works on any level, no matter how deeply the scopes are nested, as long as it makes sense. It doesn't work the other way: an outer scope referring to a local variable of an inner scope is a no-no.
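To see the nesting rule in action, here is a small sketch. The names unit, make_formatter and format_length are made up for this illustration and are not part of the calibrate example.

```lua
local unit = "mm"  -- local variable of the chunk

local function make_formatter (precision)
	-- `precision` is a local variable here, `unit` is an upvalue
	local pattern = "%." .. precision .. "f %s"
	return function (value)
		-- two levels deep: both `pattern` and `unit` are upvalues
		local inner_only = string.format(pattern, value, unit)
		return inner_only
	end
end

local format_length = make_formatter(2)
local formatted = format_length(3.14159)
print(formatted)   -- 3.14 mm

-- Out here `inner_only` is not a local variable of any enclosing scope,
-- so the name is looked up as a global, and no such global exists.
print(inner_only)  -- nil
```

Note that the last line is not an error. The outer scope simply cannot see the inner local, so the name falls through to the globals.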

The second external reference in "function's body" is print. We don't see it defined anywhere as a local variable. Commonly such variables are called globals or global variables. That's what we will call them.

Here's the part that can be slightly ambiguous: the definition of "environment". I could see it covering all three kinds of variables I mentioned, only upvalues and globals, or just globals. Depending on the situation any of these may fit, and judging by how other languages use the term and by how similar it is to bound variables in the Lambda Calculus, they are all reasonable explanations. Lua itself uses "environment" only for the mechanism that resolves references to globals.

Upvalues

That being said, let's talk about the upvalues for a moment before we go to the globals. Upvalues are the closest thing you can have to a bound variable from the Lambda Calculus. They always bind "by reference", simply because the variable name is a direct reference to the variable and its value. Whatever that means, see for yourself:

local counter = 0
local
function increment ()
	counter = counter + 1
end
increment()
increment()
increment()
print(counter)

This example will print out "3". Upvalues also play a huge part in garbage collection due to their nature, but that's not a concern of this article.
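Binding "by reference" also means that several functions can share one and the same variable. Here is a minimal sketch with a hypothetical make_counter helper: both returned functions see the same count, while a second call to make_counter gets a fresh one.

```lua
local function make_counter ()
	local count = 0
	local function increment ()
		count = count + 1
	end
	local function read ()
		return count
	end
	-- both closures capture the very same `count` variable
	return increment, read
end

local increment, read = make_counter()
increment()
increment()

-- a separate call creates a separate `count`
local other_increment, other_read = make_counter()
other_increment()

print(read(), other_read())  -- 2	1
```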

Environment

Back to the main topic! Quick reminder: in Lua, "environment" is used to resolve references to names that are not local variables or upvalues. In other words, it's a way to deal with free variables when the program executes.

"Environments" are associative tables. They link global variable names to actual variables. The environments themselves are bound to functions as upvalues called _ENV, whenever they are needed. It's done implicitly; quietly in the background. This means that the calibrate function from the first example actually has two upvalues: OFFSET_ERROR and _ENV. By default, _ENV took as its value the table that was used as the global environment at that time. If calibrate didn't use print, _ENV wouldn't be there at all.
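You don't have to take my word for it. The sketch below uses debug.getupvalue from the standard debug library to collect the upvalue names of a function. The upvalue_names helper and the add function are made up for this illustration, and the names are only available when the chunk was not stripped of debug information.

```lua
local OFFSET_ERROR = 0.97731

local function calibrate (value, ratio, offset)
	local real_offset = offset * OFFSET_ERROR
	print("offset:", real_offset)
	return value * ratio + real_offset
end

-- uses neither globals nor outer locals
local function add (a, b)
	return a + b
end

-- hypothetical helper: returns a set of upvalue names of `fn`
local function upvalue_names (fn)
	local names = {}
	local index = 1
	while true do
		local name = debug.getupvalue(fn, index)
		if not name then break end
		names[name] = true
		index = index + 1
	end
	return names
end

local calibrate_upvalues = upvalue_names(calibrate)
local add_upvalues = upvalue_names(add)

print(calibrate_upvalues.OFFSET_ERROR, calibrate_upvalues._ENV)  -- true	true
print(add_upvalues._ENV)                                         -- nil
```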

This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables themselves.

local
function hello ()
	print "Hello!"
end
local _ENV = {print = function () end}
hello()

We create a simple local function that is meant to print out "Hello!" to standard output. After that we shadow the current environment with a new one that contains a print function that does nothing. If we call hello, it still prints out "Hello!" like it was meant to. That's because it's bound to the original environment, not the new one.
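One detail worth spelling out: the example above declares a new local variable that happens to be named _ENV. Assigning to the existing _ENV is a different story, because then the function and the assignment refer to the same variable. A small sketch of the difference, with made-up names:

```lua
local shadowed, assigned

do
	local _ENV = { name = "original" }
	local function read ()
		return name  -- resolved through the _ENV declared just above
	end

	do
		-- a brand new variable; `read` knows nothing about it
		local _ENV = { name = "replacement" }
		shadowed = read()
	end

	-- assignment to the very variable that `read` captured
	_ENV = { name = "replacement" }
	assigned = read()
end

print(shadowed, assigned)  -- original	replacement
```

This is the same "by reference" behaviour that any other upvalue has.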

In the meantime you might have noticed that environments may appear somewhat interchangeable with upvalues. That's correct to some extent, and it's because of the things I've already mentioned: ambiguity is one, dealing with free variables while being a bound variable is two. In program execution bound variables (in our terms: upvalues) are there to deal with free variables, and here we are doing the same thing.
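To make this concrete: a free name is just a field access on whatever _ENV is visible at that point. In the sketch below the names env, answer and question are made up for illustration.

```lua
local env = { answer = 42 }
local result

do
	local _ENV = env
	result = answer      -- same as: result = _ENV.answer
	question = "6 x 7"   -- same as: _ENV.question = "6 x 7"
end

-- outside of the block `question` is looked up in the original environment
print(result, env.question, question)  -- 42	6 x 7	nil
```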

Usage

Yeah, that's all the explanation there was. I could sum it up as: "environments are tables bound to upvalues that resolve free variables". Cool, how and when can we use them?

The most common use case is sandboxing or, more generally, limiting the things available to scripts. Let's say we develop a program that uses Lua as a scripting language. We load all the default modules from Lua for ourselves: io, debug, string, whatever we want. However, we don't want to expose all of them to external scripts. To do so, we prepare a table that will act as an environment for them and simply assign it as the _ENV upvalue, most likely through the load or loadfile function:

local end_user_env = {
	print = print
}
local script = assert(loadfile("external.lua", nil, end_user_env))
script()
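The same idea can be tried without a file on disk by passing a string to load. This is only a sketch: the chunk name "=user_script" and the contents of the environment are made up for illustration, and type is exposed only so that the script can report what it sees.

```lua
local end_user_env = {
	print = print,
	type = type
}

local source = [[
	return type(print), type(io), type(os)
]]

-- "t" restricts load to text chunks, so precompiled bytecode is rejected
local script = assert(load(source, "=user_script", "t", end_user_env))
local print_type, io_type, os_type = script()

print(print_type, io_type, os_type)  -- function	nil	nil
```

Keep in mind that this only limits which names the script can resolve. Whatever you put into the table is fully available to the script, so a real sandbox needs more care than this.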

Of course, you can do that from the C API, too. This requires us to acknowledge one more thing: upvalues are stored as a list and are indexed. For regular functions the _ENV upvalue might be at any position in this list. For main chunks, loaded external scripts, or the "chunk" from the first example, _ENV is expected to be first on the list.

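luaL_loadfile(L, "external.lua");  /* stack: chunk */
lua_newtable(L);                   /* stack: chunk, env */
lua_pushliteral(L, "print");       /* stack: chunk, env, "print" */
lua_getglobal(L, "print");         /* stack: chunk, env, "print", print */
lua_settable(L, -3);               /* env.print = print; stack: chunk, env */
lua_setupvalue(L, -2, 1);          /* upvalue 1 (_ENV) of chunk = env; pops env */
lua_call(L, 0, 0);                 /* runs and pops the chunk */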
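The same trick is available from Lua itself through debug.setupvalue, which mirrors lua_setupvalue. A minimal sketch, relying on _ENV being the first upvalue of a loaded chunk; the chunk contents and the "=user_script" name are made up for illustration:

```lua
local script = assert(load("return type ~= nil, print ~= nil", "=user_script", "t"))

local end_user_env = { print = print }

-- replace upvalue number 1 of the chunk; the call returns the upvalue's name
local replaced = debug.setupvalue(script, 1, end_user_env)
local has_type, has_print = script()

print(replaced, has_type, has_print)  -- _ENV	false	true
```

In a real program I would rather pass the environment straight to load. Reaching for the debug library is mostly useful when the function already exists and you can't load it again.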

It could also be done by prepending local _ENV = end_user_env to the external script before loading it, but that's a hassle:

local file = assert(io.open "external.lua")
local content = file:read "*a"
file:close()
content = "local _ENV = {print = print}\n" .. content
local script = assert(load(content))
script()

This method of environment manipulation can be used in other cases for more in-line changes, as seen in one of the previous examples. This is the new way of doing magic tricks now that setfenv is gone. I'll leave this as a topic for another time. I think the examples above are sufficient for now. From here I can expand into magic or sandboxing details.