author | Aki <please@ignore.pl> | 2022-04-28 19:00:46 +0200 |
---|---|---|
committer | Aki <please@ignore.pl> | 2022-04-28 19:00:46 +0200 |
commit | 59cf5e9bcbd1e4195b680d8dc33c5f52a7cb6560 (patch) | |
tree | 348f9c1e389ffeac428c098a23f444a52ae4c796 /different_ways_of_making_errors.html | |
parent | 1486910d47075b65976195af75e904a32a7634aa (diff) | |
Published different ways of making errors
Diffstat (limited to 'different_ways_of_making_errors.html')
-rw-r--r-- | different_ways_of_making_errors.html | 384 |
1 files changed, 384 insertions, 0 deletions
diff --git a/different_ways_of_making_errors.html b/different_ways_of_making_errors.html
new file mode 100644
index 0000000..42a039b
--- /dev/null
+++ b/different_ways_of_making_errors.html
@@ -0,0 +1,384 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="programming, tutorial, error handling, errors, exceptions">
+<link rel="icon" type="image/png" href="cylo.png">
+<link rel="stylesheet" href="style.css">
+
+<title>Different Ways of Making Errors</title>
+
+<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
+
+<article>
+<h1>Different Ways of Making Errors</h1>
+<p class="subtitle">Published on 2022-04-28 19:00:00+02:00
+<p>Errors are a key component of writing software - programs, libraries, scripts, you name it. We need to check for
+them, catch them, mitigate them, log them, and finally create them. In this article, I want to give you an overview of
+various methods of that last activity - creating errors - sometimes also called raising or throwing, especially when
+the errors are called exceptions. On that note...
+<p>Yeah, before we dive into the topic, let's make it clear: I'll use "error" and "exception" here almost
+interchangeably. This is because I'm talking here about the abstract case rather than the thing that is used to
+represent it. If you are coming from e.g., Java you may find this confusing, because these two names are used
+differently. On the other hand, if you are coming from e.g., Python you might be wondering why I am even writing this
+paragraph.
+<p>Now then, as I was writing this article the classification of these methods evolved quite a few times.
+I doubt this is the final form, and depending on the feedback and my future endeavours I hope that I'll continue to
+make this list better.</p>
+<img src="different_ways_of_making_errors-1.png" alt="red flags">
+
+
+<h2>Returning the Error</h2>
+<p>Let's start with something dead simple. In this method we indicate an occurrence of the error simply by returning a
+value from the function or the program. The ways of doing that differ mostly in terms of types and error
+details.
+
+<h3>Returning Boolean</h3>
+<p>I don't think it gets any easier than that. As long as you don't need to return any meaningful value from the
+function and you only want to indicate whether the function passed or failed, this will do. Just make the
+function return a boolean that answers "Did the function pass?" or "Did the function fail?"
+<p>This method is sometimes used together with some bigger state, especially if the executed function is a member of
+some class. An example of such an approach can be found in Qt.
+
+<h3>Returning Error Codes</h3>
+<p>If you still don't need to return any meaningful value, but you want to differentiate between errors, you can encode
+them with non-zero numbers. This time the value answers the question "What error occurred?" where zero means none.
+<p>The most common example of this approach is the classic shell, where the result of the last command is stored inside
+the <code>$?</code> variable:
+<pre>
+$ ls real_file
+real_file
+$ echo $?
+0
+$ ls does_not_exist
+ls: cannot access 'does_not_exist': No such file or directory
+$ echo $?
+2
+</pre>
+<p>Interestingly, shells implement if statements where <code>0</code> is interpreted as the positive case:
+<pre>
+$ if ls real_file; then
+> echo "True branch with $?"
+> else
+> echo "False branch with $?"
+> fi
+real_file
+True branch with 0
+</pre>
+
+<h3>Returning Error Objects</h3>
+<p>But you don't need to use numbers necessarily.
+The only requirement is that you are able to
+represent all possible cases, including a situation in which no error occurred.
+<p>The Go programming language cleverly uses its core mechanics to deal with errors: tuple-like multiple return
+values, nil values and interfaces. A function that wants to raise an error should return an object that fulfills the
+special <b>error</b> interface, which requires an <b>Error()</b> method to be present. In case the function does not
+want to report anything, it can just return <code>nil</code> instead. In simplified code it looks like this:
+<pre>
+package main
+
+import "fmt"
+
+type TooLargeError int64
+
+func (err TooLargeError) Error() string {
+	return fmt.Sprintf("For some reason number is too large: %d", int64(err))
+}
+
+func CheckNumber(value int64) error {
+	if value > 10 {
+		return TooLargeError(value)
+	}
+	return nil
+}
+
+func main() {
+	err := CheckNumber(4)
+	if err != nil {
+		fmt.Println(err)
+	}
+	err = CheckNumber(14)
+	if err != nil {
+		fmt.Println(err)
+	}
+}
+</pre>
+<p>It is worth noting here the difference between shell and Go errors in terms of boolean logic. Depending on your
+style, e.g., prevalence of early returns, you may want to consider whether to assign a positive or a negative boolean
+value to the case in which no error occurred. Both are viable.
+
+<h3>Returning an Invalid Value</h3>
+<p>What happens if you want to return a meaningful value from the same function?
+<p>In the case of shell, the return value is rarely used to store the actual result, because that's the usual role of
+the standard output stream. And in the above example of Go, the language has very good built-in support for returning
+several values at once, so a function can just return a nilable error <em>and</em> the desired thing.
+<p>The approach of Go can be used in many other languages, with or without syntactic support, but what if you are
+forced to return a single primitive object from the function?
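+<p>Before getting to that, here is a rough sketch of how the Go approach might look in a language without dedicated
+syntax for it, in this case Python. The function <code>parse_port</code> and its error messages are made up for
+illustration; the only point is that the caller receives the value and the error side by side:

```python
def parse_port(text):
    """Hypothetical parser that returns a (value, error) pair.

    Exactly one element of the pair is None, mirroring Go's (value, err).
    """
    if not text.isdecimal():
        return None, f"not a number: {text!r}"
    port = int(text)
    if port > 65535:
        return None, f"out of range: {port}"
    return port, None


port, err = parse_port("8080")
if err is not None:
    print("error:", err)
else:
    print("port:", port)
```

+<p>The caller is expected to check the error before touching the value, much like in the Go example. Now, back to the
+question of returning just a single primitive value.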
+<p>Well, you can reverse the Error Codes approach by dedicating one or more of the possible values to indicating
+errors. Sometimes selecting those values can be straightforward - for instance when the domain already has an
+invalid space. Consider sizes, which are usually represented with zero and positive integers: if you use a signed
+integer as the return value, then you have all of the negative numbers available to represent errors.</p>
+<img src="different_ways_of_making_errors-2.png" alt="ruler with negative length">
+<p>This is the approach used by <strong>read</strong>(2). When successful, the function returns the number of bytes
+read, but on error it returns <code>-1</code> and sets a special global <b>errno</b>(3) to a value that describes what
+exact error occurred:
+<pre>
+char buffer[1024];
+ssize_t bytes = read(fd, buffer, sizeof(buffer));
+if (bytes < 0)
+    perror("read()"); // Reads errno and prints description of error
+else
+    do_something(buffer, bytes);
+</pre>
+<p>Note that I previously wrote that you can dedicate one <em>or more values</em>. Although I never found confirmation
+in the POSIX standard, the most likely reason for read not using more negative numbers to indicate errors is to have
+a consistent interface for retrieving error details. Not all of the functions in the standard have enough available
+values to indicate all the needed errors.
+<p>Anyway, sometimes you have enough values to use but you choose not to use them, and sometimes you may be forced to
+use a single value. Sometimes you may even need to create your own constraints and rules in order to indicate an error.
+An example of that is memory allocation with <b>malloc</b>(3), which returns <code>NULL</code> in case of errors:
+<pre>
+void* buffer = malloc(4096);
+if (NULL == buffer)
+    perror("malloc()"); // In case of malloc it's always ENOMEM, really
+free(buffer);
+</pre>
+<p>C and C++ standards (for <code>NULL</code> and <code>nullptr</code>) try very hard to define those two as null
+pointer constants, forcing compiler and platform implementations into guaranteeing that these will never point to any
+real object and, hopefully, cause some segmentation faults here and there.
+
+<h3>Returning Wrapped Values</h3>
+<p>Instead of bundling the error with the value like Go did, you can wrap the value with an object that will optionally
+indicate an error. This method is a simplified approach taken from functional programming languages that make heavy use
+of monads. The two are quite similar, with the main difference being the flow of the error handling. The wrapper can be
+tailored for errors, or things like <b>Either</b> from Haskell or <b>std::variant</b> from C++ can be used.
+<p>A naive interface of a tailored wrapper could look like this:
+<pre>
+template<typename T, typename E=const char*>
+struct Result {
+    Result(T value);
+    Result(T value, E message);
+    T m_value;
+    E m_message;
+    bool is_ok() const;
+};
+</pre>
+<p>And used similarly to this:
+<pre>
+Result<int> add_two(int value) {
+    if (value > 10)
+        return Result<int>(value, "i can't, it's too large");
+    return value + 2;
+}
+
+int main() {
+    for (int i = 8; i < 12; ++i) {
+        const auto number = add_two(i);
+        std::cout << i << ": ";
+        if (!number.is_ok())
+            std::cout << number.m_message;
+        else
+            std::cout << number.m_value;
+        std::cout << std::endl;
+    }
+}
+</pre>
+<p>There is a very similar case to this one, but instead of the value being wrapped, the value itself contains a flag
+that indicates its validity. This second approach is sometimes called a <em>zombie object</em>. An example use of this
+approach would be streams from the C++ standard library.
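+<p>To make the zombie object idea a bit more concrete, here is a minimal sketch in Python. <code>Reader</code> is a
+hypothetical class invented for this illustration; it only imitates the way a C++ stream keeps a failure flag and
+ignores further reads once that flag is set:

```python
class Reader:
    """Hypothetical reader that turns into a 'zombie' after the first failure."""

    def __init__(self, lines):
        self._lines = list(lines)
        self.failed = False

    def read_int(self):
        # A zombie stays a zombie: once failed, every read is a no-op.
        if self.failed:
            return 0
        if not self._lines:
            self.failed = True
            return 0
        try:
            return int(self._lines.pop(0))
        except ValueError:
            self.failed = True
            return 0


reader = Reader(["1", "2", "oops", "4"])
values = [reader.read_int() for _ in range(4)]
if reader.failed:
    print("reading failed, got only:", values)
```

+<p>The caller can perform a whole series of operations and check the flag once at the end, at the price of having to
+remember to check it at all.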
+<p>Implementations that are more on the monad-like side may allow the user to bind functions to wrappers depending on
+their state. This is very notably used in JavaScript's promises:
+<pre>
+fetch("https://ignore.pl/example.json")
+    .then(response => response.json())
+    .then(console.log)
+    .catch(error => console.log("Error!", error));
+</pre>
+
+
+<h2>Terminating the Process</h2>
+<p>In the scope of a single function we can use a technique called <em>early return</em> to finish the faulty
+execution. For example:
+<pre>
+struct Message*
+new_message() {
+    struct Message* msg = malloc(sizeof(struct Message));
+    if (NULL == msg)
+        return NULL;
+    const int res = initialize_message(msg);
+    if (-1 == res) {
+        free(msg);
+        return NULL;
+    }
+    return msg;
+}
+</pre>
+<img src="different_ways_of_making_errors-3.png" alt="killing will commence">
+<p>Without going deep into a discussion about whether early returns are good or bad (and I recall a few heated
+discussions about it), you can see that there is one major flaw in it - it operates just on a
+single level: in functions. Now, one way to overcome this limitation is going full nuclear.
+<p>When encountering a critical problem and operating in a Unix-like environment you can simply terminate the process.
+In order to show a distinct death condition you can use the standard error stream or the return code.
+<p>To do that you can use <b>exit</b>(3) in C, <b>sys.exit</b> in Python, <b>exit</b> or <b>die</b> in PHP, and other
+equivalent functions in other languages. Some of them allow you to provide something to print out or a return code, and
+some don't. In C, you can often see:
+<pre>
+noreturn void
+panic(const char* fmt, ...) {
+    va_list args;
+    va_start(args, fmt);
+    vdprintf(2, fmt, args);
+    va_end(args);
+    exit(1);
+}
+</pre>
+<p>This will format and print the provided message to the error stream and then terminate the process returning
+<code>1</code>.
+As I mentioned earlier, this is pretty much returning an error as a value, just doing it earlier than normal execution
+would. Thanks to the secondary output "lane" - the standard error stream - we can provide the details of the error.
+This could be compared to the tuple solution from earlier to some extent.
+<p>Because this method terminates the entire process, it does not fit very well within bigger pieces of
+software that rely a lot on their own interfaces and control flow. It shines when dealing with critical errors or when
+working with a set of smaller programs that are running in a shell environment.
+
+
+<h2>Throwing Exceptions</h2>
+<p>When you want your program to be long-living and able to recover from various failures, terminating everything is
+simply unacceptable and a different solution is needed. To be on the strict side of controlling the flow you may choose
+to simply chain returning the error from the functions in the stack one by one. This is the path that e.g., Go chose.
+The other way is a little looser. It uses a secondary output lane to return the error and traverses the call-stack
+until the error is handled. In case the developer did not expect the error and nothing handles it, it may fall back to
+terminating the process. The process of traversing the call-stack is usually called <em>stack unwinding</em>.
+<p>This method involves pushing errors into the second output lane - usually called <em>throwing</em> or
+<em>raising</em> - and a way of limiting the unwinding and reading the pushed error - usually implemented by code blocks
+or statements that are marked with a <em>try</em> keyword together with either <em>catch</em> or <em>except</em>.
+<p>When you need to raise errors of different severities and want to terminate some selected part of your execution,
+consider using <em>exceptions</em>.
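+<p>Stack unwinding can be observed directly. In the following sketch (Python, with function names made up for the
+occasion) the error is raised in the innermost function, skips the rest of every function on its way up, and is caught
+only where a handler was installed:

```python
trace = []


def inner():
    trace.append("inner: raising")
    raise ValueError("broken input")


def middle():
    inner()
    # Never reached: the exception leaves this function immediately.
    trace.append("middle: after inner")


def outer():
    try:
        middle()
    except ValueError as err:
        # Unwinding stops here, at the first matching handler.
        trace.append(f"outer: handled {err}")


outer()
print("\n".join(trace))
```

+<p>Note that <code>middle</code> knows nothing about the error. That is the "loose" part: the function in between does
+not have to pass anything along by hand.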
+<p>Exceptions and exception-like interfaces are implemented in a wide selection of programming languages, for example in
+Python:
+<pre>
+import requests
+from requests.exceptions import HTTPError, RetryError
+
+def get(url, max_attempts=4):
+    last = None
+    for _ in range(max_attempts):
+        try:
+            response = requests.get(url)
+            response.raise_for_status()  # Raises HTTPError for 4xx and 5xx
+            return response
+        except HTTPError as err:
+            if err.response.status_code == 404:
+                raise  # No point in retrying, re-raise as is
+            last = err
+    raise RetryError from last
+</pre>
+<p>Or C++:
+<pre>
+#include <vector>
+
+int
+check_one(int x) {
+    if (x < 3)
+        throw "too little";
+    return x;
+}
+
+int
+maybe_find(const std::vector<int>& numbers) {
+    int attempts_left = 3;
+    for (int i : numbers) {
+        try {
+            return check_one(i);
+        }
+        catch (const char* err) {
+            if (attempts_left > 0) {
+                attempts_left--;
+                continue;
+            }
+            break;
+        }
+    }
+    throw "not found";
+}
+</pre>
+<p>There are a lot of flavours to the exceptions, but they generally tend towards the description I provided above. They
+also usually use similar syntax with only small adjustments. Some of them, like Python, limit the objects that can be
+raised as exceptions to classes derived from some base exception. Others, like C++ in the example above, let the user
+throw anything they want.
+<p>Sometimes they are not syntactically implemented in the language, but instead they are implemented through functions.
+Consider Lua as an example:
+<pre>
+function check_one(x)
+    if x < 3 then
+        error("too little")
+    end
+    return x
+end
+
+function maybe_find()
+    local attempts_left = 3
+    for _, i in ipairs({1, 2, 3}) do
+        local ok, res = pcall(check_one, i)
+        if ok then
+            return res
+        end
+        if attempts_left > 0 then
+            attempts_left = attempts_left - 1
+        else
+            break
+        end
+    end
+    error("not found")
+end
+</pre>
+<p>By wrapping a function call with <b>pcall</b> you get an additional return value that is a boolean that indicates
+whether the function executed successfully or not. You also limit the propagation of errors created with <b>error</b>
+to that protected call scope.
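+<p>The same trick works in languages that do have exception syntax. As a rough sketch, a <b>pcall</b>-like helper can
+be written in Python in a few lines. The helper below is only an illustration and not part of any standard library;
+it turns a raised exception back into a returned pair:

```python
def pcall(func, *args, **kwargs):
    """Illustrative helper: call func and return (ok, result_or_error)."""
    try:
        return True, func(*args, **kwargs)
    except Exception as err:
        # The error stops propagating here and becomes an ordinary value.
        return False, err


def check_one(x):
    if x < 3:
        raise ValueError("too little")
    return x


for number in (5, 1):
    ok, res = pcall(check_one, number)
    if ok:
        print(number, "passed with", res)
    else:
        print(number, "failed with", res)
```

+<p>Catching <code>Exception</code> rather than everything leaves things like <code>KeyboardInterrupt</code> alone, so
+the protected call does not swallow a request to stop the program.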
+
+
+<h2>Signals</h2>
+<p>As a bonus, let's talk about POSIX signals. You won't see them being used too often for pure error handling, at
+least not directly. They can be placed somewhere between terminating the process and exceptions, as they allow the
+programmer to attempt a recovery, but are not very good at handling scopes and can have only one main entry point for
+the fault branch.
+<p>Signals can also be used by the operating system to report selected errors in execution, for example access to
+an invalid memory reference delivers <code>SIGSEGV</code>. Consider an example:
+<pre>
+#include <setjmp.h>
+#include <signal.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+sigjmp_buf env;
+void
+handle(int sig) {
+    siglongjmp(env, 1);
+}
+
+int
+main(void) {
+    signal(SIGSEGV, handle);
+    char* ptr = NULL;
+    if (sigsetjmp(env, 1))
+        ptr = malloc(1);
+    printf("%p\n", ptr);
+    *ptr = 10;
+}
+</pre>
+<p>When compiled and run, it will print out:
+<pre>
+(nil)
+0x555fb9a7c6b0
+</pre>
+<p>Of course, the second address may vary.
+<p>The problem with signals is that they require a good amount of attention, especially when referencing sources over
+the Internet. Even this example is not portable because it uses <b>signal</b>(2) and not <b>sigaction</b>(2).
+<p>Obviously, you are not limited to segmentation faults. You can use <code>SIGABRT</code> with <b>abort</b>(3) or any
+other signal.
+
+
+<h2>Final Notes</h2>
+<img src="different_ways_of_making_errors-4.png" alt="escape route">
+<p>Anything else? Probably yes. I tried to note similarities between the methods and mention some derivatives, but the
+chances that I did not miss anything are rather thin. I think that there are some basic characteristics to be observed
+among all (or some) of them.
+<p>With the common goal of reporting an error, the first step is usually decoupling successful and failed execution
+branches. One way involves creating values that are clearly defined as invalid and then dealing with them using usual
+condition blocks (or statements).
+The other way involves jumping around the program or unwinding the stack.
+<p>The other step is describing the error to the user. This is optional, as in some cases the program or function is
+answering a general question (e.g., "Did it fail?"). These details can be passed to the user via the actual return value
+or some secondary output lane like a global variable, the standard error stream, or throwing/raising.
+<p>This summary may sound obvious, but I still think it is worthwhile to think about the reasons that are behind the
+basic behaviours that we use each day. This is especially interesting from the programming language perspective, where
+these days everything is pretty much the same. Maybe a simple change in some assumptions could start a breakthrough.
+Even if not, then just practicing and gaining knowledge should be a good enough reason to explore foundations.
+
+
+</article>
+
+<script src="https://stats.ignore.pl/track.js"></script>