Published different ways of making errors

author: Aki <please@ignore.pl> 2022-04-28 19:00:46 +0200
committer: Aki <please@ignore.pl> 2022-04-28 19:00:46 +0200
commit: 59cf5e9bcbd1e4195b680d8dc33c5f52a7cb6560 (patch)
tree: 348f9c1e389ffeac428c098a23f444a52ae4c796 /different_ways_of_making_errors.html
parent: 1486910d47075b65976195af75e904a32a7634aa (diff)
download: ignore.pl-59cf5e9bcbd1e4195b680d8dc33c5f52a7cb6560.zip
ignore.pl-59cf5e9bcbd1e4195b680d8dc33c5f52a7cb6560.tar.gz
ignore.pl-59cf5e9bcbd1e4195b680d8dc33c5f52a7cb6560.tar.bz2
1 files changed, 384 insertions, 0 deletions
diff --git a/different_ways_of_making_errors.html b/different_ways_of_making_errors.html
new file mode 100644
index 0000000..42a039b
--- /dev/null
+++ b/different_ways_of_making_errors.html
@@ -0,0 +1,384 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="programming, tutorial, error handling, errors, exceptions">
+<link rel="icon" type="image/png" href="cylo.png">
+<link rel="stylesheet" href="style.css">
+
+<title>Different Ways of Making Errors</title>
+
+<nav><p><a href="https://ignore.pl">ignore.pl</a></p></nav>
+
+<article>
+<h1>Different Ways of Making Errors</h1>
+<p class="subtitle">Published on 2022-04-28 19:00:00+02:00
+<p>Errors are a key component of writing software - programs, libraries, scripts, you name it. We need to check for
+them, catch them, mitigate, log, and finally create. In this article, I want to give you an overview of various methods
+of that last activity - creating errors - sometimes also called raising or throwing, especially when the errors are
+called exceptions. On that note...
+<p>Yeah, before we dive into the topic, let's make it clear: I'll use "error" and "exception" here almost
+interchangeably. This is because I'm talking here about the abstract case rather than the thing that is used to
+represent it. If you are coming from, e.g., Java, you may find this confusing, because these two names are used
+differently there. On the other hand, if you are coming from, e.g., Python, you might be wondering why I am even
+writing this paragraph.
+<p>Now then, as I was writing this article the classification of these methods evolved quite a few times. I doubt this
+is the final form and depending on the feedback and my future endeavours I hope that I'll continue to make this list
+better.</p>
+<img src="different_ways_of_making_errors-1.png" alt="red flags">
+
+
+<h2>Returning the Error</h2>
+<p>Let's start with something dead simple. In this method we indicate an occurrence of the error simply by returning a
+value from the function or the program. There are different ways of doing that, and they differ mostly in the types
+used and in how much detail about the error they carry.
+
+<h3>Returning Boolean</h3>
+<p>I don't think it gets any easier than that. As long as you don't need to return any meaningful value from the
+function and you only want to indicate whether the function passed or failed, this will do. Just make the
+function return a boolean that answers "Did the function pass?" or "Did the function fail?"
+<p>This method is sometimes used together with some bigger state, especially if the executed function is a member of
+some class. An example of such an approach can be found in Qt.
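+<p>A minimal sketch of that pairing in C++, assuming a made-up <b>Connection</b> class (this is not Qt's interface,
+only the same general shape): the method returns a boolean and the details stay in the object until asked for.
+<pre>
+#include &lt;string&gt;
+
+// Hypothetical class used only for illustration.
+class Connection {
+public:
+    // Answers "Did the function pass?"
+    bool open(const std::string&amp; host) {
+        if (host.empty()) {
+            m_error = "host must not be empty";
+            return false;
+        }
+        m_error.clear();
+        return true;
+    }
+
+    // The bigger state: description of the last failure, empty if there was none.
+    const std::string&amp; error_string() const { return m_error; }
+
+private:
+    std::string m_error;
+};
+</pre>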
+
+<h3>Returning Error Codes</h3>
+<p>If you still don't need to return any meaningful value, but you want to differentiate between errors, you can encode
+them with non-zero numbers. This time the value answers the question "What error occurred?" where zero means none.
+<p>The most common example of this approach is classic shell, where the result of the last command is stored inside the
+<code>$?</code> variable:
+<pre>
+$ ls real_file
+real_file
+$ echo $?
+0
+$ ls does_not_exist
+ls: cannot access 'does_not_exist': No such file or directory
+$ echo $?
+2
+</pre>
+<p>Interestingly, shells implement if statements where <code>0</code> is interpreted as the positive case:
+<pre>
+$ if ls real_file; then
+>    echo "True branch with $?"
+> else
+>    echo "False branch with $?"
+> fi
+real_file
+True branch with 0
+</pre>
+
+<h3>Returning Error Objects</h3>
+<p>But you don't need to use numbers necessarily. The only requirement is that you remember about the ability to
+represent all possible cases, including a situation in which no error occurred.
+<p>The Go programming language cleverly uses its core mechanics to deal with errors: multiple return values, nil
+values and interfaces. A function that wants to raise an error should return an object that fulfills a special
+<b>error</b> interface that requires an <b>Error()</b> method to be present. In case the function does not want to
+report anything, it can just return <code>nil</code> instead. In simplified code it looks like this:
+<pre>
+type TooLargeError int64
+
+func (err TooLargeError) Error() string {
+    return fmt.Sprintf("For some reason number is too large: %d", err)
+}
+
+func CheckNumber(value int64) error {
+    if value &gt; 10 {
+        return TooLargeError(value)
+    }
+    return nil
+}
+
+func main() {
+    err := CheckNumber(4)
+    if err != nil {
+        fmt.Println(err)
+    }
+    err = CheckNumber(14)
+    if err != nil {
+        fmt.Println(err)
+    }
+}
+</pre>
+<p>It is worth noting here the difference between shell and Go errors in terms of boolean logic. Depending on your
+style, e.g., prevalence of early returns, you may want to consider whether to assign a positive or a negative boolean
+value to the case in which the error did not occur. Both are viable.
+
+<h3>Returning an Invalid Value</h3>
+<p>What happens if you want to return a meaningful value from the same function?
+<p>In the case of shell, the return value is rarely used to store the actual result, because that's the usual role of
+the standard output stream. And in the above example of Go, the language has very good built-in support for multiple
+return values, so a function can just return a nilable error <em>and</em> the desired thing.
+<p>The approach of Go can be used in many other languages, with or without syntactic support, but what if you are
+forced to return a single primitive object from the function?
+<p>Well, you can reverse the Error Codes approach by dedicating one or more of the possible values to indicating
+errors. Sometimes selecting those values can be straightforward - for instance when the domain already has an
+invalid space. Consider sizes, which are usually represented with zero and positive integers. If you use a signed
+integer as the return value, then you will have all of the negative numbers available to represent errors.</p>
+<img src="different_ways_of_making_errors-2.png" alt="ruler with negative length">
+<p>This is the approach used by <strong>read</strong>(2). When successful, the function returns the number of bytes
+read, but on error it returns <code>-1</code> and sets a special global <b>errno</b>(3) to a value that describes
+which exact error occurred:
+<pre>
+char buffer[1024];
+ssize_t bytes = read(fd, buffer, 1024);
+if (bytes &lt; 0)
+    perror("read()");  // Reads errno and prints description of error
+else
+    do_something(buffer, bytes);
+</pre>
+<p>Note that I previously wrote that you can dedicate one <em>or more values</em>. Although I never found confirmation
+in the POSIX standard, the only likely reason for read not using more negative numbers to indicate errors is to have
+a consistent interface to retrieve error details. Not all of the functions in the standard have enough available
+values to indicate all the needed errors.
+<p>Anyway, sometimes you have enough values to use but you choose not to use them, and sometimes you may be forced to
+use a single value. Sometimes you may even need to create your own constraints and rules in order to indicate an error.
+An example of that is memory allocation with <b>malloc</b>(3) that returns <code>NULL</code> in case of errors:
+<pre>
+void* buffer = malloc(4096);
+if (NULL == buffer)
+    perror("malloc()");  // In case of malloc it's always ENOMEM, really
+free(buffer);
+</pre>
+<p>C and C++ standards (for <code>NULL</code> and <code>nullptr</code>) try very hard to define those two as null
+pointer constants forcing compiler and platform implementations into guaranteeing that these will never point to any
+real object and hopefully cause some segmentation faults here and there.
+
+<h3>Returning Wrapped Values</h3>
+<p>Instead of bundling the error with the value like Go does, you can wrap the value with an object that will
+optionally indicate an error. This method is a simplified approach taken from functional programming languages that
+make heavy use of monads. The two are quite similar, with the main difference being the flow of the error handling.
+The wrapper can be tailored for errors, or a generic one like <b>Either</b> from Haskell or <b>std::variant</b> from
+C++ can be used.
+<p>A naive interface of a tailored wrapper could look like this:
+<pre>
+template&lt;typename T, typename E=const char*&gt;
+struct Result {
+    Result(T value);
+    Result(T value, E message);
+    T m_value;
+    E m_message;
+    bool is_ok() const;
+};
+</pre>
+<p>And used similarly to this:
+<pre>
+Result&lt;int&gt; add_two(int value) {
+    if (value &gt; 10)
+        return Result&lt;int&gt;(value, "i can't, it's too large");
+    return value + 2;
+}
+
+int main() {
+    for (int i = 8; i &lt; 12; ++i) {
+        const auto number = add_two(i);
+        std::cout &lt;&lt; i &lt;&lt; ": ";
+        if (!number.is_ok())
+            std::cout &lt;&lt; number.m_message;
+        else
+            std::cout &lt;&lt; number.m_value;
+        std::cout &lt;&lt; std::endl;
+    }
+}
+</pre>
+<p>There is a very similar case to this one, but instead of the value being wrapped, the object itself contains a flag
+that indicates its validity. This second approach is sometimes called <em>zombie object</em>. An example use of this
+approach would be the streams from the C++ standard library.
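+<p>A small sketch of how such a stream behaves, with a hypothetical <b>parse_or</b> helper: after a failed
+extraction the stream object is still there, but it reports itself as invalid when tested.
+<pre>
+#include &lt;sstream&gt;
+#include &lt;string&gt;
+
+// Hypothetical helper: returns the parsed number or the fallback.
+int parse_or(const std::string&amp; text, int fallback) {
+    std::istringstream stream(text);
+    int value = 0;
+    stream &gt;&gt; value;
+    if (!stream)  // failbit is set, the stream is now a "zombie"
+        return fallback;
+    return value;
+}
+</pre>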
+<p>Implementations that are more on the monad-like side may allow user to bind functions to wrappers depending on their
+state. This is very notably used in JavaScript's promises:
+<pre>
+fetch("https://ignore.pl/example.json")
+    .then(response =&gt; response.json())
+    .then(console.log)
+    .catch(error =&gt; console.log("Error!", error));
+</pre>
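+<p>Going back to the generic wrappers for a moment: a rough sketch of an <b>add_two</b> built on <b>std::variant</b>
+from C++17 instead of a tailored type. The <b>Outcome</b> alias and the <b>is_ok</b> helper are made-up names for
+this example only.
+<pre>
+#include &lt;string&gt;
+#include &lt;variant&gt;
+
+// Hypothetical alias: holds either the result or a description of the error.
+using Outcome = std::variant&lt;int, std::string&gt;;
+
+Outcome add_two(int value) {
+    if (value &gt; 10)
+        return std::string("i can't, it's too large");
+    return value + 2;
+}
+
+// True when the variant currently holds the result and not the error.
+bool is_ok(const Outcome&amp; outcome) {
+    return std::holds_alternative&lt;int&gt;(outcome);
+}
+</pre>
+<p>The caller then checks <b>is_ok</b> and reads the matching alternative with <b>std::get</b>.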
+
+
+<h2>Terminating the Process</h2>
+<p>Within the scope of a single function we can use a technique called <em>early return</em> to finish the faulty
+execution. For example:
+<pre>
+struct Message*
+new_message() {
+    struct Message* msg = malloc(sizeof(struct Message));
+    if (NULL == msg)
+        return NULL;
+    const int res = initialize_message(msg);
+    if (-1 == res) {
+        free(msg);
+        return NULL;
+    }
+    return msg;
+}
+</pre>
+<img src="different_ways_of_making_errors-3.png" alt="killing will commence">
+<p>Without going deep into a discussion about whether early returns are good or bad (and I recall a few heated
+discussions about it), you can see that there is one major flaw in it - it operates on just a single level: in
+functions. Now, one way to overcome this limitation is going full nuclear.
+<p>When encountering a critical problem and operating in a Unix-like environment you can simply terminate the process.
+In order to show a distinct death condition you can use the standard error stream or the return code.
+<p>To do that you can use <b>exit</b>(3) in C, <b>sys.exit</b> in Python, <b>exit</b> or <b>die</b> in PHP, and other
+equivalent functions in other languages. Some of them allow you to provide something to print out or return code, and
+some don't. In C, you can often see:
+<pre>
+noreturn void
+panic(const char* fmt, ...) {
+    va_list args;
+    va_start(args, fmt);
+    vdprintf(2, fmt, args);
+    va_end(args);
+    exit(1);
+}
+</pre>
+<p>This will format and print the provided message to the error stream and then terminate the process returning
+<code>1</code>. Like I mentioned earlier, this is pretty much returning an error as a value and doing that earlier
+than in a normal execution. Thanks to the secondary output "lane" - the standard error stream - we can provide the
+details of the error. This could be compared to the tuple solution from earlier to some extent.
+<p>Due to the fact that this method terminates the entire process, it does not fit very well within bigger pieces of
+software that rely a lot on their own interfaces and control flow. It shines when dealing with critical errors or when
+working with a set of smaller programs that are running in a shell environment.
+
+
+<h2>Throwing Exceptions</h2>
+<p>When you want your program to be long-lived and able to recover from various failures, terminating everything is
+simply unacceptable and a different solution is needed. To be on the strict side of controlling the flow you may choose
+to simply chain returning the error from the functions in the stack one by one. This is the path that, e.g., Go chose.
+The other way is a little more loose. It uses a secondary output lane to return the error and traverses the call stack
+until the error is handled. In case the developer did not handle the error anywhere, it may fall back to
+terminating the process. The process of traversing the call stack is usually called <em>stack unwinding</em>.
+<p>This method involves pushing errors into the second output lane - usually called <em>throwing</em> or
+<em>raising</em>, and a way of limiting the unwinding and reading the pushed error - usually implemented by code blocks
+or statements that are marked with a <em>try</em> keyword together with either <em>catch</em> or <em>except</em>.
+<p>When you need to raise errors of different severities and want to terminate some selected part of your execution
+consider using <em>exceptions</em>.
+<p>Exceptions and exception-like interfaces are implemented in a wide selection of programming languages, for example in
+Python:
+<pre>
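+import requests
+from requests.exceptions import HTTPError, RetryError
+
+def get(url, max_attempts=4):
+    last = None
+    for _ in range(max_attempts):
+        try:
+            response = requests.get(url)
+            response.raise_for_status()  # requests.get alone does not raise on 4xx/5xx
+            return response
+        except HTTPError as err:
+            if err.response.status_code == 404:
+                raise  # not found is final, retrying will not help
+            last = err
+    raise RetryError from last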
+</pre>
+<p>Or C++:
+<pre>
+int
+check_one(int x) {
+    if (x &lt; 3)
+        throw "too little";
+    return x;
+}
+
+int
+maybe_find(std::vector&lt;int&gt; numbers) {
+    int attempts_left = 3;
+    for (int i : numbers) {
+        try {
+            return check_one(i);
+        }
+        catch (const char* err) {
+            if (attempts_left &gt; 0) {
+                attempts_left--;
+                continue;
+            }
+            break;
+        }
+    }
+    throw "not found";
+}
+</pre>
+<p>There are a lot of flavours to the exceptions, but they generally tend towards the description I provided above. They
+also usually use similar syntax with only small adjustments. Some of them, like Python, limit the objects that can be
+raised as exceptions to classes derived from some base exception. Others, like C++ in the example above, let the user
+throw anything they want.
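+<p>Even so, a common convention in C++ is to throw types derived from <b>std::exception</b>, so that a single handler
+can catch them by reference. A sketch with a hypothetical <b>TooLittle</b> error and a made-up <b>describe</b>
+helper:
+<pre>
+#include &lt;stdexcept&gt;
+#include &lt;string&gt;
+
+// Hypothetical error type used only for illustration.
+struct TooLittle : std::runtime_error {
+    explicit TooLittle(int value)
+        : std::runtime_error("too little: " + std::to_string(value)) {}
+};
+
+int check_one(int x) {
+    if (x &lt; 3)
+        throw TooLittle(x);
+    return x;
+}
+
+// Returns the description of the error, or an empty string if there was none.
+std::string describe(int x) {
+    try {
+        check_one(x);
+    }
+    catch (const std::exception&amp; err) {
+        return err.what();
+    }
+    return "";
+}
+</pre>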
+<p>Sometimes exceptions are not part of the syntax of the language, but are implemented through functions instead.
+Consider Lua as an example:
+<pre>
+function check_one(x)
+    if x &lt; 3 then
+        error("too little")
+    end
+    return x
+end
+
+function maybe_find()
+    local attempts_left = 3
+    for _, i in ipairs({1, 2, 3}) do
+        local ok, res = pcall(check_one, i)
+        if ok then
+            return res
+        end
+        if attempts_left &gt; 0 then
+            attempts_left = attempts_left - 1
+        else
+            break
+        end
+    end
+    error("not found")
+end
+</pre>
+<p>By wrapping a function call with <b>pcall</b> you get an additional return value that is a boolean that indicates
+whether the function executed successfully or not. You also limit the propagation of errors created with <b>error</b>
+within that protected call scope.
+
+
+<h2>Signals</h2>
+<p>As a bonus, let's talk about POSIX signals. You won't see them being used too often for pure error handling, at
+least not directly. They can be placed somewhere between terminating the process and exceptions, as they allow the
+programmer to attempt a recovery, but they are not very good at handling scopes and can have only one main entry point
+for the fault branch.
+<p>Signals can also be used by the operating system to report selected errors in execution, for example an invalid
+memory reference delivers <code>SIGSEGV</code>. Consider an example:
+<pre>
+sigjmp_buf env;
+void
+handle(int sig) {
+    siglongjmp(env, 1);
+}
+
+int
+main(int argc, char* argv[]) {
+    signal(SIGSEGV, handle);
+    char* ptr = NULL;
+    if (sigsetjmp(env, 1))
+        ptr = malloc(1);
+    printf("%p\n", ptr);
+    *ptr = 10;
+}
+</pre>
+<p>When compiled with all necessary includes and run, it will print out:
+<pre>
+(nil)
+0x555fb9a7c6b0
+</pre>
+<p>Of course, the second address may vary.
+<p>The problem with signals is that they require a good amount of attention, especially when referencing sources over
+the Internet. Even this example is not portable because it uses <b>signal</b>(2) and not <b>sigaction</b>(2).
+<p>Obviously, you are not limited to segmentation fault. You can use <code>SIGABRT</code> with <b>abort</b>(3) or any
+other signal.
+
+
+<h2>Final Notes</h2>
+<img src="different_ways_of_making_errors-4.png" alt="escape route">
+<p>Anything else? Probably yes. I tried to note similarities between the methods and mention some derivatives, but the
+chances that I did not miss anything are rather slim. I think that there are some basic characteristics to be observed
+among all (or some) of them.
+<p>With the common goal of reporting an error the first step is usually decoupling successful and failed execution
+branches. One way involves creating values that are clearly defined as invalid and then dealing with them using usual
+condition blocks (or statements). The other way involves jumping around the program or unwinding the stack.
+<p>The other step is describing the error to the user. This is optional, as in some cases the program or function is
+answering a general question (e.g., "Did it fail?"). These details can be passed to the user via the actual return value
+or some secondary output lane like a global variable, the standard error stream, or throwing/raising.
+<p>This summary may sound obvious, but I still think it is worthwhile to think about the reasons behind the
+basic behaviours that we use each day. This is especially interesting from the programming language perspective, where
+these days everything is pretty much the same. Maybe a simple change in some assumptions could start a breakthrough.
+Even if not, then just practicing and gaining knowledge should be a good enough reason to explore foundations.
+
+
+</article>
+
+<script src="https://stats.ignore.pl/track.js"></script>