From ad76e9b885c9b9692074cf5b8b880cb79f8a48e0 Mon Sep 17 00:00:00 2001
From: Aki
Date: Sun, 25 Jul 2021 19:17:40 +0200
Subject: Initialized website as git repository

---
 environments_in_lua_5_2_and_beyond.html | 155 ++++++++++++++++++++++++++++++++
 1 file changed, 155 insertions(+)
 create mode 100644 environments_in_lua_5_2_and_beyond.html

diff --git a/environments_in_lua_5_2_and_beyond.html b/environments_in_lua_5_2_and_beyond.html
new file mode 100644
index 0000000..fc2b9b7
--- /dev/null
+++ b/environments_in_lua_5_2_and_beyond.html
@@ -0,0 +1,155 @@
+

Environments in Lua 5.2 and Beyond

+

Published on 2020-07-04 20:39:00+02:00 +

Environments are a way of dealing with various problems, or of creating them entirely on your own. Primarily, they are used to isolate a selected part of a program. As Lua is meant to be used as an embedded language, you may find yourself wanting to separate user-created addons from more internal scripts. In short: sandboxing and overall security.

While I will focus on environments in Lua, I won't go too deeply into the implementation details. I'll try to focus more on design and a general overview, with minor thoughts on syntax and the inner workings. I think you might find this text interesting whether or not you know Lua or care about it.

Previously, we had setfenv and related functions to do exactly that. With some tricks, the debug library, and faith, you could do real magic, which is kind of cool. However, magic can easily become arcane, and thus unclear.

With Lua 5.2, setfenv and its relatives were removed in favour of a new approach. This one uses a simple local variable named _ENV. Luckily, this approach can also be fun. It has other benefits over the old one, but the goal here is not a comparison.

One more thing before we hop into examples: I strongly encourage you to read some documents on Lambda Calculus. They will give you quite a good overview of what is happening with closures/anonymous functions/lambda expressions/whatever they are called. It's a good foundation and an entertaining exercise. This way you can quickly draw parallels, considering that the Calculus is easier to grasp than most language-syntax-related shenanigans.

+lambda +

Terms

+

Now then, let's have a simple example to define a few selected terms:

+
+-- Start of chunk's body
+local OFFSET_ERROR = 0.97731
+local
+function calibrate (value, ratio, offset)
+	-- Start of function's body; not part of chunk's body
+	local real_offset = offset * OFFSET_ERROR
+	print("offset:", real_offset)
+	return value * ratio + real_offset
+	-- End of function's body
+end
+-- End of chunk's body
+
+

Surprisingly, there's a lot going on in here. First off, we have two scopes: "chunk's body" and "function's body". "Chunk's body" has two local variables: OFFSET_ERROR (which acts as a constant) and calibrate (a function). In turn, "function's body" has four local variables: value, ratio, offset (those three are the function's arguments), and real_offset (a temporary variable I added just to show that a function body may also have explicit local variables). We will call all of those variables exactly what I already did: local variables.

In addition to the local variables, "function's body" also refers to two other names. The first one is OFFSET_ERROR. We already know this one; it's a local variable from the chunk. A scope nested inside another scope can freely refer to the outer scope's local variables. From the inner scope's point of view they are called upvalues. This works on any level, no matter how deeply the scopes are nested. It doesn't work the other way: an outer scope referring to a local variable of an inner scope is a no-no.
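To make the nesting rule a bit more tangible, here is a minimal sketch. All names in it (outer_value, middle_value, level_one, level_two) are made up for this illustration and are not part of the example above.

```lua
local outer_value = "chunk"

local function level_one()
	local middle_value = "level one"
	local function level_two()
		-- Both names are upvalues here, although they come from different depths.
		return outer_value .. " / " .. middle_value
	end
	return level_two()
end

local result = level_one()

-- Outside level_one the name middle_value is not a local in scope,
-- so Lua treats it as a global, which is nil here.
local leaked = middle_value
```

The inner function sees everything above it, while the chunk never sees the locals of level_one.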

The second external reference in "function's body" is print. We don't see it defined anywhere as a local variable. Such variables are commonly called globals or global variables. That's what we will call them.

Here's the part that can be slightly ambiguous: the definition of "environment". I can see it covering all three types of variables I mentioned, only upvalues and globals, or just globals. Depending on the situation, any of these may fit, and given how other languages use the term and how similar it is to bound variables in the Lambda Calculus, they are all good explanations. Lua itself uses "environment" only for the mechanism that resolves references to globals.

Upvalues

+

That being said, let's talk about upvalues for a moment before we go to the globals. Upvalues are the closest thing you can have to a bound variable from the Lambda Calculus. They always bind "by reference", simply because the variable name is a direct reference to the variable and its value. Whatever that means, see for yourself:

+
+local counter = 0
+local
+function increment ()
+	counter = counter + 1
+end
+increment()
+increment()
+increment()
+print(counter)
+
+

This example will print out "3". Due to their nature, upvalues also play a huge part in garbage collection, but that's outside the scope of this article.
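Binding "by reference" also means that two functions closing over the same variable share it, while every call that creates a fresh local gets a fresh variable. A small sketch, where make_counter and the other names are hypothetical:

```lua
local function make_counter()
	local count = 0
	-- Both closures refer to the very same count variable.
	local function increment() count = count + 1 end
	local function read() return count end
	return increment, read
end

local inc_a, read_a = make_counter()
local inc_b, read_b = make_counter()  -- a separate count variable

inc_a()
inc_a()
inc_b()

local a, b = read_a(), read_b()  -- a is 2, b is 1
```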

Environment

+

Back to the main topic! Quick reminder: in Lua, "environment" is used to resolve references to names that are not local variables or upvalues. In other words, it's a way to deal with free variables when the program executes.
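In Lua 5.2 and later this resolution can be spelled out by hand: a reference to a global name is treated as a field of the table held in _ENV. A minimal sketch, assuming it runs as an ordinary main chunk with the standard libraries loaded; answer is a made-up global used only here:

```lua
answer = 42                  -- treated as _ENV.answer = 42
local via_env = _ENV.answer  -- the explicit spelling of a global lookup

-- print is not special either; it is just another field of the same table.
local same_print = (print == _ENV.print)

-- In a regular main chunk _ENV starts out as the global table, also known as _G.
local same_table = (_ENV == _G)
```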

"Environments" are associative tables. They link global variable names to actual variables. The environments themselves are bound to functions as upvalues called _ENV, whenever they are needed. It's done implicitly; quietly in the background. This means that the calibrate function from the first example actually has two upvalues: OFFSET_ERROR and _ENV. By default, _ENV took as its value the table that was used as the global environment at that time. If calibrate didn't use print, _ENV wouldn't be there at all.

This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables themselves.
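You don't have to take my word for it: the debug library can list a function's upvalues. The sketch below assumes the code is loaded from source (so names are not stripped) and reuses calibrate from the first example. The quiet_calibrate variant and the upvalue_names helper are hypothetical additions for this illustration.

```lua
local OFFSET_ERROR = 0.97731

local function calibrate(value, ratio, offset)
	local real_offset = offset * OFFSET_ERROR
	print("offset:", real_offset)
	return value * ratio + real_offset
end

-- Same maths, but no reference to any global name.
local function quiet_calibrate(value, ratio, offset)
	return value * ratio + offset * OFFSET_ERROR
end

-- Collects upvalue names of fn, sorted so the result does not depend on their order.
local function upvalue_names(fn)
	local names = {}
	local index = 1
	while true do
		local name = debug.getupvalue(fn, index)
		if name == nil then break end
		names[#names + 1] = name
		index = index + 1
	end
	table.sort(names)
	return table.concat(names, ",")
end

local with_print = upvalue_names(calibrate)
local without_print = upvalue_names(quiet_calibrate)
```

Only the function that mentions a global ends up with _ENV among its upvalues.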

+
+local
+function hello ()
+	print "Hello!"
+end
+local _ENV = {print = function () end}
+hello()
+
+

We create a simple local function that is meant to print out "Hello!" to standard output. After that we shadow the current environment with a new one whose print function does nothing. If we call hello, it still prints out "Hello!" like it was meant to. That's because it's bound to the original environment, not the new one.
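The opposite case is worth a look as well: a function defined after the new _ENV is declared binds to the new table. This is a minimal sketch; the global greeting and the functions before and after are made up, and the do block only limits how far the new _ENV reaches.

```lua
greeting = "original"  -- a global in the original environment

local function before()
	return greeting  -- resolved through the original _ENV
end

local after
do
	local _ENV = {greeting = "replacement"}
	after = function ()
		return greeting  -- resolved through the table declared just above
	end
end

local first, second = before(), after()
```

Each function keeps using the environment that was in scope where it was written, no matter where it is called from.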

In the meantime, you might have noticed that environments may appear somewhat interchangeable with upvalues. That's correct to some extent, and it's because of the things I've already mentioned: ambiguity is one, dealing with free variables while being a bound variable is two. In program execution, bound variables (in our terms: upvalues) are there to deal with free variables, and here we are doing the same thing.

+fat bird +

Usage

+

Yeah, that's all the explanation there is. I could sum it up as: "environments are tables bound to upvalues that resolve free variables". Cool, so how and when can we use them?

The most common use case is sandboxing or, more generally, limiting the things available to scripts. Let's say we develop a program that uses Lua as a scripting language. We load all the default Lua modules for ourselves: io, debug, string, whatever we want. However, we don't want to expose all of them to external scripts. To do so, we prepare a table that will act as an environment for them and simply assign it as the _ENV upvalue, most likely through the load or loadfile function:

+
+local end_user_env = {
+	print = print
+}
+-- loadfile returns nil plus a message on failure; assert turns that into an error
+local script = assert(loadfile("external.lua", nil, end_user_env))
+script()
+
+
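To try this without a separate file, the same idea can be sketched with load and a string chunk. Everything named here (untrusted_source, captured, the log function) is a stand-in invented for the illustration. The "t" mode argument restricts load to text chunks.

```lua
-- Stand-in for the contents of an external script.
local untrusted_source = [[
	log("io is " .. type(io))
	return type(os)
]]

local captured = {}
local end_user_env = {
	type = type,
	-- Hypothetical logging function exposed to the script instead of print.
	log = function (message) captured[#captured + 1] = message end,
}

local script = assert(load(untrusted_source, "=sandboxed", "t", end_user_env))
local ok, os_type = pcall(script)
```

Inside the script, io and os are simply missing, because the only names it can reach are the ones placed in end_user_env.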

Of course, you can do that from the C API, too. This requires us to acknowledge one more thing: upvalues are stored as a list and are indexed. For regular functions, the _ENV upvalue might be at any position in this list. For main chunks, loaded external scripts, or the "chunk" from the first example, _ENV is expected to be first on the list.

+
+luaL_loadfile(L, "external.lua");
+lua_newtable(L);
+lua_pushliteral(L, "print");
+lua_getglobal(L, "print");
+lua_settable(L, -3);
+lua_setupvalue(L, -2, 1);
+lua_call(L, 0, 0);
+
+
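The same upvalue juggling is available from Lua through the debug library. The sketch below is a rough stand-in for the removed setfenv, not a faithful replacement: set_environment is a hypothetical helper, and because upvalues are shared, using it on a closure that shares _ENV with its surroundings would change the environment for all of them. A freshly loaded chunk has its own _ENV, so it is safe here.

```lua
-- Finds the upvalue named _ENV, wherever it sits on the list, and replaces its value.
local function set_environment(fn, env)
	local index = 1
	while true do
		local name = debug.getupvalue(fn, index)
		if name == nil then
			return false  -- fn does not use any globals, so it has no _ENV
		end
		if name == "_ENV" then
			debug.setupvalue(fn, index, env)
			return true
		end
		index = index + 1
	end
end

local chunk = assert(load("return label"))
local changed = set_environment(chunk, {label = "sandboxed"})
local result = chunk()
```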

It could also be done by prepending local _ENV = end_user_env to the external script before loading it, but that's a hassle:

+
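+-- assert reports a missing file or a syntax error instead of failing later on nil
+local file = assert(io.open("env.lua"))
+local content = file:read "*a"
+file:close()
+content = "local _ENV = {print = print}\n" .. content
+local script = assert(load(content))
+script()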
+
+

This method of environment manipulation can also be used for more in-line changes, as seen in one of the previous examples. It's the new way of doing magic tricks now that setfenv is gone. I'll leave that as a topic for another time. I think the examples above are sufficient for now. From here I can expand into magic or sandboxing details.

+ -- cgit v1.1