From 907144ec08a506bcd03053770f8b481045bca848 Mon Sep 17 00:00:00 2001
From: Aki
Date: Wed, 13 Nov 2024 23:17:13 +0100
Subject: Rewrote introduction sections of environments in Lua 5.2

I'm still not happy with the state of this article, but it is a bit better now. The flow is quite bad, there are too many pointless expressions, and too little good information. However, it still seems good enough to stay, but I'll keep an eye on it.
---
 environments_in_lua_5_2_and_beyond.html | 172 +++++++++++++++++++++-----------
 1 file changed, 113 insertions(+), 59 deletions(-)

diff --git a/environments_in_lua_5_2_and_beyond.html b/environments_in_lua_5_2_and_beyond.html
index 3c6469c..722645a 100644
--- a/environments_in_lua_5_2_and_beyond.html
+++ b/environments_in_lua_5_2_and_beyond.html
@@ -17,80 +17,134 @@
-

Environments are a way of dealing with various problems. Or creating them entirely on your own. Primarily, they are -used to isolate a selected part of a program. As Lua is meant to be used as an embedded language, you may find yourself -wanting to separate user created addons from more internal scripts. In short: sandboxing and overall security. -

While I will focus on environments in Lua I won't go too deeply into the implementation details. I'll try to focus -more on design and general overview with minor thoughts on syntax and the inner workings. I think you might find this -text interesting no matter if you know or are interested in Lua itself. -

Previously, we had setfenv and -related functions to do exactly that. With some tricks, debug library, and faith, you could do real magic, which is kind -of cool. However, magic can easily become arcane, thus unclear. -

With Lua 5.2 setfenv and related were removed in favour of a new approach. This one uses a simple local -variable with name _ENV. Luckily, this approach can also be fun. It has other benefits over the old one, -but the goal here is not a comparison. -

One more thing before we hop into examples: I strongly encourage to read some documents on Lambda Calculus. They -will give you quite good overview over what is happening with closures/anonymous functions/lambda expressions/whatever -they are called. It's a good foundation and an entertaining exercise. This way you can quickly draw similarities, -considering that the Calculus is easier to grasp than the most of language-syntax-related shenanigans.

+

Environments in Lua can be used to isolate selected parts of a program, expose an API, create a markup +format, broadly modify behaviour at runtime, and much more. They can also create a lot of problems. Overall, we can +summarize these uses as "sandboxing." +

I will focus on Lua, but I won't go too deep into language or implementation details. Instead, I want to provide an +overview of the design and essentials, with minor points on syntax and the inner workings. This way I aim to make this +document useful to you whether or not you are a Lua user. +

Before Lua 5.2, we had the setfenv +family of functions for operating on function environments directly. With some tricks and the debug library we could do real +magic. This approach was very versatile but could also very easily become unclear. It was also almost completely +disconnected from the language. +

In Lua 5.2, the setfenv family was removed in favour of a new approach centred around the _ENV +variable. This approach is usually called Lexical Environments and it is the main focus of this document. +

I strongly encourage everyone to learn lambda calculus. Some vocabulary will be borrowed from it, and the design and +behaviour of Lua resemble it at many levels. Additionally, it gives a good overview of what is happening with +closures, anonymous functions, lambda expressions, or whatever else a programming language can call an ad hoc function. +It may also serve as a way to build better intuition about scoping rules in different languages.

lambda

Terms

-

Now then, let's have a simple example to define few selected terms:

+

Now then, let's consider a code sample and define selected terms:

+ +
+
+local OFFSET_ERROR = 0.97731
+local function calibrate (value, ratio, offset)
+
+
+	local real_offset = offset * OFFSET_ERROR
+	print("offset:", real_offset)
+	return value * ratio + real_offset
+
+
+end
+
+
+

The blocks marked in blue and yellow are called "chunks" in Lua. In this case, blue is also the special case +of a "main chunk", assuming it is a file fed into Lua. The yellow block is the body of the calibrate function. We can call it +a "function body", and it is still considered a "chunk". Yellow is not a "main chunk." +

In general, chunks are also lexical scopes for variables, and it's fine to use these terms interchangeably. I'll do +that. The distinction between them may matter, but only in the context of Lua's loading internals, where "chunk" has a more +specific meaning. +

I marked the names of variables that are being declared or assigned in bold. Names that are underlined are being used or +otherwise referenced. And so we can observe OFFSET_ERROR being declared as a local and assigned a value in +the first line, and then used in a multiplication in the third. In the second line, the variable calibrate is assigned +a function as its value. These two syntax forms are explained in detail in +local declaration and +function definition. +

Both OFFSET_ERROR and calibrate use the local keyword, making these names available +only to the main chunk and its nested blocks. Of course, this makes them available in calibrate's body. There's a +little bit more to how the syntax around local function definitions works, but no meaningful problems occur in this particular +case, so I'll ignore it. +

Function arguments are automatically declared as local variables in their respective function scope. Besides those, +the function body contains the local declaration of real_offset. Similarly, because it is local, it +will be available only to the function body and its nested blocks. This means the main chunk cannot access it. +

Now, let's do a simple exercise and colour-code variable references and their declarations:

--- Start of chunk's body
-local OFFSET_ERROR = 0.97731
-local
-function calibrate (value, ratio, offset)
-	-- Start of function's body; not part of chunk's body
-	local real_offset = offset * OFFSET_ERROR
-	print("offset:", real_offset)
-	return value * ration + real_offset
-	-- End of function's body
+local OFFSET_ERROR = 0.97731
+local function calibrate (value, ratio, offset)
+	local real_offset = offset * OFFSET_ERROR
+	print("offset:", real_offset)
+	return value * ratio + real_offset
 end
--- End of chunk's body
 
-

Surprisingly, there's a lot of going on in here. First off, we have two scopes: "chunk's body", and "function's -body". "Chunk's body" has two local variables: OFFSET_ERROR (that acts as a constant), and -calibrate (a function). In turn, "function's body" has four local variables: value, -ratio, offset (those three are the arguments for the function), and real_offset -(a temporary variable I added just to show that function body may also have explicit local variable). We will call all -of those variables exactly in the way I already did: local variables. -

In addition to the local variables, "function's body" also refers to two other names. First one is -OFFSET_ERROR. We already know this one; it's a local variable from the chunk. A smaller scope that is -inside another scope can refer to their local variables as they want. They are called upvalues then. This works -on any level, no matter how deeply the scopes are nested, as long as it makes sense. It doesn't work the other way: -outer scope referring to local variable in inner scope is a no-no. -

Second external reference in "function's body" is print. We don't see it defined anywhere as a local -variable. Commonly such variables are called globals or global variables. That's how we will call them. -

Here's the part that can be slightly ambiguous: definition of "environment". I can see myself include in it all three -types of variables I mentioned, only upvalues and globals, or just globals. Depending on the situation, all of these -cases may be fitting, and looking at how other languages use the term, and how it's similar to bound variables in the -Lambda Calculus; they are all good explanations. The Lua itself uses "environment" only to refer to the method that -resolves references to globals. +

Variables are used in three different ways here: +

+
value, ratio, offset, real_offset +
are used and declared inside the function body +
OFFSET_ERROR
is used inside the function body but is declared in the outer scope +
print
is used inside the function body but is never visibly declared +
+

The first case is our ideal case. The local keyword tells us exactly what is happening: variables are +declared and used only within the function body. They are local variables. They don't spill anywhere else unless they +are passed to a function as an argument. OFFSET_ERROR does spill, but only to calibrate's body, +because it is a nested scope. +

Local variables behave similarly to bound variables from lambda calculus. OFFSET_ERROR is a bit closer to +them in principle, but the idea is that the source of their value is exactly known. +

On the other hand, print behaves like a free variable. It is never declared as local, and by +default Lua considers such variables global. When a global is encountered, the environment is queried to +provide a value for it. + +
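As a minimal sketch of that lookup, assuming Lua 5.2 or newer (where a free name is compiled as a field access on _ENV) and the default global environment. The names answer and never_assigned_name are hypothetical and exist only for this illustration.

```lua
-- `print` is never declared local, so it is looked up in the environment.
local same_function = (print == _ENV.print)

-- Assigning to an undeclared name stores the value in the environment table.
answer = 42
local stored = _ENV.answer

-- Reading a name that was never assigned yields nil instead of an error.
local missing = never_assigned_name

print(same_function, stored, missing)  -- true, 42, nil (tab separated)
```

Nothing here is special to print: any name without a visible local declaration goes through the same table lookup.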

Upvalues

-

That being said, let's talk about the upvalues for a moment before we go to the globals. Upvalues are the closest -thing you can have to a bound variable from the Lambda Calculus. They always bind "by reference", simply because the -variable name is a direct reference to the variable and it's value. Whatever that means, see for yourself:

+

Before jumping into environments, let's introduce one last term and talk about how OFFSET_ERROR +propagates internally. Whenever a local from an outer scope is referenced in a chunk, it's called an upvalue. +Once referenced, it is bound to that particular chunk "by reference" and stored so that it continues to live with it. +

Upvalues implement the core principle of closures in Lua. Consider two counters:

-local counter = 0
-local
-function increment ()
-	counter = counter + 1
+local init = 0
+local function new_counter ()
+	local x = init
+	return function ()
+		x = x + 1
+		return x
+	end
 end
-increment()
-increment()
-increment()
-print(counter)
+local a = new_counter()
+local b = new_counter()
+print(a())  -- 1
+print(a())  -- 2
+print(a())  -- 3
+print(b())  -- 1
 
-

This example will print out "3". Upvalues also play huge part in garbage collection due to their nature but that's -none of the concerns of this article. +

Two instances of x are created, one for each call of new_counter. Each is bound to the +anonymous function, which is in turn returned. This sequence can also be interpreted as the construction of a function-like +object. +

The init variable is bound only to new_counter and is not bound to the anonymous function. + +
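The "by reference" binding can be sketched with two closures that share one local. The names new_pair, increment, and read are hypothetical, introduced only for this illustration.

```lua
local function new_pair ()
	local x = 0
	-- Both inner functions capture the same `x` as an upvalue.
	local function increment () x = x + 1 end
	local function read () return x end
	return increment, read
end

local increment, read = new_pair()
increment()
increment()
local result = read()
print(result)  -- 2
```

Both functions see the same x rather than private copies, so changes made through increment are visible through read.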

Environment

Back to the main topic! Quick reminder: in Lua "environment" is used to resolve references to names that are not local variables or upvalues. In other words it's a way to deal with free variables when the program executes.

"Environments" are associative tables. They link global variable names to actual variables. The environments themselves are bound to functions as upvalues called _ENV, whenever they are needed. It's done implicitly; quietly in the background. This means that the calibrate function from the first example actually has two -upvalues: OFFEST_ERROR, and _ENV. _ENV by default took as its value a table that +upvalues: OFFSET_ERROR, and _ENV. _ENV by default took as its value a table that was used as global environment at that time. If calibrate wouldn't use print, _ENV wouldn't be there at all.

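One way to check the claim about calibrate's upvalues is to list them with debug.getupvalue. This is only a sketch, assuming Lua 5.2 or newer and a chunk loaded with debug information intact. The helpers upvalue_names and add are hypothetical names introduced for this illustration.

```lua
local OFFSET_ERROR = 0.97731
local function calibrate (value, ratio, offset)
	local real_offset = offset * OFFSET_ERROR
	print("offset:", real_offset)
	return value * ratio + real_offset
end

-- Hypothetical function that uses neither globals nor outer locals.
local function add (a, b)
	return a + b
end

-- Collect the names of all upvalues of function `f`.
local function upvalue_names (f)
	local names = {}
	local i = 1
	while true do
		local name = debug.getupvalue(f, i)
		if name == nil then break end
		names[#names + 1] = name
		i = i + 1
	end
	return names
end

local calibrate_upvalues = table.concat(upvalue_names(calibrate), ", ")
local add_upvalue_count = #upvalue_names(add)
print(calibrate_upvalues)  -- lists OFFSET_ERROR and _ENV
print(add_upvalue_count)   -- 0
```

Since add refers to no free names, it gets no _ENV upvalue at all, which matches the remark about calibrate without print.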
This is quite important, so let me repeat. Environments are here to deal with free variables, but are bound variables
-- cgit v1.1