Diffstat (limited to 'different_ways_of_reporting_errors.html')
1 files changed, 386 insertions, 0 deletions
diff --git a/different_ways_of_reporting_errors.html b/different_ways_of_reporting_errors.html
new file mode 100644
index 0000000..4e8d07c
--- /dev/null
+++ b/different_ways_of_reporting_errors.html
@@ -0,0 +1,386 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="programming, tutorial, error handling, errors, exceptions">
+<link rel="icon" type="image/png" href="cylo.png">
+<link rel="stylesheet" href="style.css">
+<title>Different Ways of Reporting Errors</title>
+<nav><p><a href=""></a></p></nav>
+<h1>Different Ways of Reporting Errors</h1>
+<p class="subtitle">Published on 2022-04-28 19:00:00+02:00, last modified on 2022-06-17 22:27:00+02:00
+<p>Errors are a key component of writing software - programs, libraries, scripts, you name it. We need to check for
+them, catch them, mitigate, log, and finally create. In this article, I want to give you an overview of various methods
+of that last activity - creating errors - sometimes also called raising or throwing, especially when the errors are
+called exceptions. On that note...
+<p>Yeah, before we dive into the topic, let's make it clear: I'll use "error" and "exception" here almost
+interchangeably. This is because I'm talking about the abstract case rather than the thing that is used to
+represent it. If you are coming from, e.g., Java, you may find this confusing, because these two names are used
+differently there. On the other hand, if you are coming from, e.g., Python, you might be wondering why I am even writing this.
+<p>Now then, as I was writing this article the classification of these methods evolved quite a few times. I doubt this
+is the final form and depending on the feedback and my future endeavours I hope that I'll continue to make this list
+<img src="different_ways_of_reporting_errors-1.png" alt="red flags">
+<h2>Returning the Error</h2>
+<p>Let's start with something dead simple. In this method we indicate an occurrence of the error simply by returning a
+value from the function or the program. There are different ways of doing that mostly in terms of types and error
+<h3>Returning Boolean</h3>
+<p>I don't think it gets any easier than that. As long as you don't need to return any meaningful value from the
+function and only want to indicate whether it passed or failed, this will do. Just make the
+function return a boolean that answers "Did the function pass?" or "Did the function fail?"
+<p>This method is sometimes used together with some bigger state, especially if the executed function is a member of
+some class. An example of such an approach can be found in Qt.
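+<p>A minimal Python sketch of a boolean result paired with a bigger state. The <code>Document</code> class, its
+<code>save</code> method and the <code>last_error</code> attribute are hypothetical names made up for illustration;
+they are not taken from Qt or any other library.

```python
class Document:
    """Hypothetical class: save() reports success as a boolean."""

    def __init__(self):
        # The "bigger state" that describes the most recent failure
        self.last_error = ""

    def save(self, text, limit=10):
        # Answers the question "Did the function pass?"
        if len(text) > limit:
            self.last_error = "text is too long"
            return False
        self.last_error = ""
        return True


doc = Document()
if not doc.save("a very long piece of text"):
    print("save failed:", doc.last_error)
```

+<p>The boolean alone only says that something went wrong. The caller has to ask the object for the details.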
+<h3>Returning Error Codes</h3>
+<p>If you still don't need to return any meaningful value, but you want to differentiate between errors, you can encode
+them with non-zero numbers. This time the value answers the question "What error occurred?" where zero means none.
+<p>The most common example of this approach is the classic shell, where the exit status of the last command is stored
+in the <code>$?</code> variable:
$ ls real_file
real_file
$ echo $?
0
$ ls does_not_exist
ls: cannot access 'does_not_exist': No such file or directory
$ echo $?
2
+<p>Interestingly, shells implement if statements where <code>0</code> is interpreted as the positive case:
+$ if ls real_file; then
+> echo "True branch with $?"
+> else
+> echo "False branch with $?"
+> fi
+True branch with 0
+<p>An extreme example of throwing raw error codes at end users is Windows and its API. I'd encourage you to avoid
+going to such lengths.
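+<p>The same idea can be sketched inside a single program. The snippet below assumes a hypothetical
+<code>remove_user</code> function and a made-up <code>Status</code> enumeration; only the convention that zero
+means "no error" comes from the text above.

```python
from enum import IntEnum


class Status(IntEnum):
    OK = 0          # zero means that no error occurred
    NOT_FOUND = 1
    NO_ACCESS = 2


def remove_user(users, name, is_admin):
    # Hypothetical function that has nothing meaningful to return
    if not is_admin:
        return Status.NO_ACCESS
    if name not in users:
        return Status.NOT_FOUND
    users.remove(name)
    return Status.OK


users = ["aki", "bob"]
code = remove_user(users, "carol", is_admin=True)
if code != Status.OK:
    print("failed with code", int(code))
```

+<p>Naming the codes keeps the call sites readable while the values remain plain integers.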
+<h3>Returning Error Objects</h3>
+<p>But you don't necessarily need to use numbers. The only requirement is that the chosen type can
+represent all possible cases, including the situation in which no error occurred.
+<p>The Go programming language cleverly uses its core mechanics to deal with errors: tuples, nil values and
+interfaces. A function that wants to raise an error should return an object that fulfills a special <b>error</b>
+interface that requires an <b>Error()</b> method to be present. In case the function does not want to report
+anything, it can just return <code>nil</code> instead. In simplified code it looks like this:
package main

import "fmt"

type TooLargeError int64

func (err TooLargeError) Error() string {
    return fmt.Sprintf("For reason number is too large: %d", err)
}

// CheckNumber returns nil when there is nothing to report
func CheckNumber(value int64) error {
    if value &gt; 10 {
        return TooLargeError(value)
    }
    return nil
}

func main() {
    err := CheckNumber(4)
    if err != nil {
        fmt.Println(err)
    }
    err = CheckNumber(14)
    if err != nil {
        fmt.Println(err)
    }
}
+<p>It is worth noting here the difference between shell and Go errors in terms of boolean logic: the shell treats
+zero as the positive case, while in Go it is the non-nil value that signals a failure. Depending on your
+style, e.g., prevalence of early returns, you may want to consider whether to assign a positive or a negative boolean
+value to the case in which no error occurred. Both are viable.
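+<p>The pattern is not tied to Go. As a rough sketch, here is the same idea in Python, where <code>None</code> plays
+the role of <code>nil</code>. The names mirror the Go example and are illustrative only; note that
+<code>TooLargeError</code> is deliberately a plain object that gets returned, not an exception that gets raised.

```python
class TooLargeError:
    """A plain error object; deliberately not an exception."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return f"number is too large: {self.value}"


def check_number(value):
    if value > 10:
        return TooLargeError(value)
    return None  # plays the role of Go's nil


for number in (4, 14):
    err = check_number(number)
    if err is not None:
        print(err)
```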
+<h3>Returning an Invalid Value</h3>
+<p>What happens if you want to return a meaningful value from the same function?
+<p>In the case of the shell, the return value is rarely used to store the actual result, because that's the usual
+role of the standard output stream. And in the above example of Go, the language has very good built-in support for
+handling tuples, so a function can just return a nilable error <em>and</em> the desired thing.
+<p>The approach of Go can be used in many other languages, with or without syntactic support, but what if you are
+forced to return a single primitive object from the function?
+<p>Well, you can reverse the Error Codes approach by dedicating one or more of the possible values to indicating
+errors. Sometimes selecting those values can be straightforward - for instance when the domain already has an
+invalid space. Consider sizes, which are usually represented with zero and positive integers: if you use a signed
+integer as the return value, then you will have all of the negative numbers available to represent errors.</p>
+<img src="different_ways_of_reporting_errors-2.png" alt="ruler with negative length">
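+<p>Python's built-in <code>str.find</code> follows this convention: valid results are indices starting from zero,
+and <code>-1</code> is reserved for "not found". The short sketch below also shows a risk of the approach, since
+nothing stops the caller from using the reserved value as if it were a real result.

```python
text = "key=value"
position = text.find("=")   # index of the first match
missing = text.find(";")    # no match, so the reserved value is returned

if missing < 0:
    print("no semicolon found")
else:
    print("semicolon at", missing)

# The pitfall: -1 is also a valid index in Python, so forgetting the check
# above silently yields the last character instead of failing.
print(text[missing])
```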
+<p>This is the approach used by <strong>read</strong>(2). When successful, the function returns the number of bytes
+read, but on error it returns <code>-1</code> and sets a special global <b>errno</b>(3) to a value that describes
+what exact error occurred:
char buffer[1024];
ssize_t bytes = read(fd, buffer, sizeof(buffer));
if (bytes &lt; 0)
    perror("read()"); // Reads errno and prints description of error
else
    do_something(buffer, bytes);
+<p>Note that I previously wrote that you can dedicate one <em>or more values</em>. Although I never found
+confirmation in the POSIX standard, the only likely reason for read not using more negative numbers to indicate
+errors is to keep a consistent interface for retrieving error details. Not all of the functions in the standard have
+enough available values to indicate all the needed errors.
+<p>Anyway, sometimes you have enough values to use but you choose not to use them, and sometimes you may be forced to
+use a single value. Sometimes you may even need to create your own constraints and rules in order to indicate an error.
+An example of that is memory allocation with <b>malloc</b>(3) that returns <code>NULL</code> in case of errors:
+void* buffer = malloc(4096);
+if (NULL == buffer)
+ perror("malloc()"); // In case of malloc it's always ENOMEM, really
+<p>The C and C++ standards (for <code>NULL</code> and <code>nullptr</code>) try very hard to define those two as null
+pointer constants, forcing compiler and platform implementations into guaranteeing that these will never point to any
+real object and will hopefully cause some segmentation faults here and there.
+<h3>Returning Wrapped Values</h3>
+<p>Instead of bundling the error with the value in a tuple or some other container like Go did, you can wrap the
+value with an object that will optionally indicate the error. This method may vary from a simplified wrapper to a
+full-fledged monad. Depending on where you end up on this spectrum, the main difference will be the flow of error
+handling. You can use tailored wrappers or something more generic like <b>Either</b> from Haskell or
+<b>std::variant</b> from C++.
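+<p>A naive interface of a tailored wrapper could look like this:
template&lt;typename T, typename E=const char*&gt;
struct Result {
    Result(T value) : m_value(value), m_message() {}
    Result(T value, E message) : m_value(value), m_message(message) {}
    T m_value;
    E m_message;
    // Naive check: an empty (value-initialized) message means success
    bool is_ok() const { return m_message == E(); }
};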
+<p>And used similarly to this:
#include &lt;iostream&gt;

Result&lt;int&gt; add_two(int value) {
    if (value &gt; 10)
        return Result&lt;int&gt;(value, "i can't, it's too large");
    return value + 2;
}

int main() {
    for (int i = 8; i &lt; 12; ++i) {
        const auto number = add_two(i);
        std::cout &lt;&lt; i &lt;&lt; ": ";
        if (!number.is_ok())
            std::cout &lt;&lt; number.m_message;
        else
            std::cout &lt;&lt; number.m_value;
        std::cout &lt;&lt; std::endl;
    }
}
+<p>There is a very similar case to this one, but instead of the value being wrapped, the object itself contains a
+flag that indicates its validity. This second approach is sometimes called a <em>zombie object</em>. An example use
+of this approach would be the streams from the C++ standard library.
+<p>Implementations that are more on the monad-like side may allow the user to bind functions to wrappers depending on
+their state. This is very notably used in JavaScript's promises:
fetch(url) // url is a placeholder for the requested address
    .then(response =&gt; response.json())
    .then(console.log)
    .catch(error =&gt; console.log("Error!", error));
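+<p>To show what such binding could look like without promises, here is a small Python sketch. The
+<code>Result</code> class, its <code>then</code> method and the <code>parse</code> helper are hypothetical and
+written only for this illustration; this is one possible shape of the idea, not a reference implementation of a monad.

```python
class Result:
    """Hypothetical wrapper holding either a value or an error message."""

    def __init__(self, value=None, error=None):
        self.value = value
        self.error = error

    def is_ok(self):
        return self.error is None

    def then(self, function):
        # Run the next step only on success; otherwise pass the error along
        if not self.is_ok():
            return self
        return function(self.value)


def parse(text):
    if not text.isdigit():
        return Result(error=f"not a number: {text!r}")
    return Result(value=int(text))


def add_two(number):
    if number > 10:
        return Result(error="i can't, it's too large")
    return Result(value=number + 2)


for text in ("8", "11", "abc"):
    outcome = parse(text).then(add_two)
    print(text, "->", outcome.value if outcome.is_ok() else outcome.error)
```

+<p>The caller checks for the error once, at the end of the chain, instead of after every step.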
+<h2>Terminating the Process</h2>
+<p>Within the scope of a single function we can use a technique called <em>early return</em> to finish the faulty
+execution. For example, you could:
struct Message*
new_message() {
    struct Message* msg = malloc(sizeof(struct Message));
    if (NULL == msg)
        return NULL;
    const int res = initialize_message(msg);
    if (-1 == res) {
        free(msg);
        return NULL;
    }
    return msg;
}
+<img src="different_ways_of_reporting_errors-3.png" alt="killing will commence">
+<p>Without going deep into a discussion about whether early returns are good or bad (and I recall a few heated
+discussions about it), you can already see that the technique has one major flaw - it operates just on a
+single level: in functions. Now, one way to overcome this limitation is going full nuclear.
+<p>When encountering a critical problem and operating in a Unix-like environment, you can simply terminate the
+process. In order to show a distinct death condition, you can use the standard error stream or the return code.
+<p>To do that you can use <b>exit</b>(3) in C, <b>sys.exit</b> in Python, <b>exit</b> or <b>die</b> in PHP, and other
+equivalent functions in other languages. Some of them allow you to provide something to print out or return code, and
+some don't. In C, you can often see:
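#include &lt;stdarg.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;stdnoreturn.h&gt;

noreturn void
panic(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    vdprintf(2, fmt, args); // 2 is the descriptor of the standard error stream
    va_end(args);
    exit(1);
}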
+<p>This will format and print the provided message to the error stream and then terminate the process, returning
+<code>1</code>. This is pretty much returning an error as a value, just doing it earlier than a normal execution
+would. Thanks to the secondary output "lane" - the standard error stream - we can provide the details of the error.
+To some extent, this could be compared to the tuple solution from earlier.
+<p>Because this method terminates the entire process, it does not fit very well within bigger pieces of
+software that rely a lot on their own interfaces and control flow. It shines when dealing with critical errors or when
+working with a set of smaller programs that are running in a shell environment.
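+<p>For comparison, a Python counterpart of such a helper could look roughly like the sketch below. The
+<code>panic</code> and <code>load</code> names are made up for illustration; <code>sys.exit</code> and
+<code>sys.stderr</code> are the standard library pieces doing the actual work.

```python
import sys


def panic(message, code=1):
    # Details go to the standard error stream, the code goes to the parent
    print(message, file=sys.stderr)
    sys.exit(code)


def load(path):
    # Hypothetical caller: an empty path is treated as a critical error
    if not path:
        panic("load: empty path")
    return path
```

+<p>One detail worth knowing is that <code>sys.exit</code> works by raising <code>SystemExit</code>, so in Python
+even this method is built on top of the mechanism described in the next section.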
+<h2>Throwing Exceptions</h2>
+<p>When you want your program to be long-living and able to recover from various failures, terminating everything is
+simply unacceptable and a different solution is needed. To be on the strict side of controlling the flow, you may
+choose to simply chain returning the error from the functions in the stack one by one. This is the path that, e.g.,
+Go chose. The other way is a little more loose. It uses a secondary output lane to return the error and traverses the
+call-stack until the error is handled. In case the developer did not expect the error and nothing handles it, it may
+fall back to terminating the process. The process of traversing the call-stack is usually called
+<em>stack unwinding</em>.
+<p>This method involves pushing errors into the second output lane - usually called <em>throwing</em> or
+<em>raising</em>, and a way of limiting the unwinding and reading the pushed error - usually implemented by code blocks
+or statements that are marked with a <em>try</em> keyword together with either <em>catch</em> or <em>except</em>.
+<p>When you need to raise errors of different severities and want to terminate some selected part of your execution
+consider using <em>exceptions</em>.
+<p>Exceptions and exception-like interfaces are implemented in a wide selection of programming languages, for example
+in Python:
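import requests
from requests.exceptions import HTTPError, RetryError

def get(url, max_attempts=4):
    attempts = 0
    last = None
    while attempts &lt; max_attempts:
        try:
            response = requests.get(url)
            response.raise_for_status()  # Raises HTTPError for 4xx and 5xx
            return response
        except HTTPError as err:
            if err.response.status_code == 404:
                raise  # Retrying will not help, pass it up the stack
            last = err
            attempts += 1
    raise RetryError from last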
+<p>Or C++:
#include &lt;vector&gt;

int
check_one(int x) {
    if (x &lt; 3)
        throw "too little";
    return x;
}

int
maybe_find(std::vector&lt;int&gt; numbers) {
    int attempts_left = 3;
    for (int i : numbers) {
        try {
            return check_one(i);
        }
        catch (const char* err) {
            if (attempts_left &gt; 0) {
                attempts_left--;
                continue;
            }
            break;
        }
    }
    throw "not found";
}
+<p>There are a lot of flavours to the exceptions, but they generally tend towards the description I provided above. They
+also usually use similar syntax with only small adjustments. Some of them, like Python, limit the objects that can be
+raised as exceptions to classes derived from some base exception. Others, like C++ in the example above, let the user
+throw anything they want.
+<p>Sometimes they are not implemented syntactically in the language, but through functions instead.
+Consider Lua as an example:
function check_one(x)
    if x &lt; 3 then
        error("too little")
    end
    return x
end

function maybe_find()
    local attempts_left = 3
    for _, i in ipairs({1, 2, 3}) do
        local ok, res = pcall(check_one, i)
        if ok then
            return res
        end
        if attempts_left &gt; 0 then
            attempts_left = attempts_left - 1
        else
            break
        end
    end
    error("not found")
end
+<p>By wrapping a function call with <b>pcall</b> you get an additional return value: a boolean that indicates
+whether the function executed successfully or not. You also limit the propagation of errors created with
+<b>error</b> to that protected call scope.
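+<p>The relation between the two styles becomes clearer when a protected call is rebuilt on top of regular
+exceptions. Below is a rough Python sketch of that; this <code>pcall</code> is a hypothetical helper written for this
+article and only mimics the basic behaviour of the Lua function.

```python
def pcall(function, *args):
    """Rough counterpart of Lua's pcall: a sketch, not a full port."""
    try:
        return True, function(*args)
    except Exception as err:
        # The error stops here and becomes an ordinary return value
        return False, err


def check_one(x):
    if x < 3:
        raise ValueError("too little")
    return x


for i in (1, 3):
    ok, res = pcall(check_one, i)
    print(i, ok, res)
```

+<p>In other words, a protected call converts a thrown error back into a returned one.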
+<p>As a bonus, let's talk about POSIX signals. You won't see them being used too often for pure error handling, at
+least not directly. They can be placed somewhere between terminating the process and exceptions, as they allow the
+programmer to attempt a recovery, but they are not very good at handling scopes and can have only one main entry
+point for the fault branch.
+<p>Signals can also be used by the operating system to report selected errors in execution, for example an
+invalid memory access delivers <code>SIGSEGV</code>. Consider an example:
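#include &lt;setjmp.h&gt;
#include &lt;signal.h&gt;
#include &lt;stdio.h&gt;
#include &lt;stdlib.h&gt;

sigjmp_buf env;

void
handle(int sig) {
    siglongjmp(env, 1); // Jump back to the point saved by sigsetjmp
}

int
main(int argc, char* argv[]) {
    signal(SIGSEGV, handle);
    char* ptr = NULL;
    if (sigsetjmp(env, 1)) // Non-zero only when coming back from the handler
        ptr = malloc(1);
    printf("%p\n", ptr);
    *ptr = 10;
}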
+<p>When compiled with all necessary includes and run, it will print out:
+<p>Of course, the second address may vary.
+<p>The problem with signals is that they require a good amount of attention, especially when referencing sources
+from the Internet. Even this example is not portable because it uses <b>signal</b>(2) and not <b>sigaction</b>(2).
+<p>Obviously, you are not limited to segmentation faults. You can use <code>SIGABRT</code> with <b>abort</b>(3) or
+any other signal.
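+<p>Higher-level languages often bridge signals and exceptions. The Python sketch below assumes a POSIX system and
+Python 3.8 or newer (for <code>signal.raise_signal</code>), and it must run in the main thread. The
+<code>Interrupted</code> exception and the <code>handle</code> function are names made up for this illustration.

```python
import signal


class Interrupted(Exception):
    """Hypothetical exception used to carry the signal number."""


def handle(signum, frame):
    # Convert the signal into an exception so regular handling can take over
    raise Interrupted(signum)


# A single, process-wide entry point for the fault branch
signal.signal(signal.SIGUSR1, handle)

try:
    signal.raise_signal(signal.SIGUSR1)  # Deliver the signal to ourselves
    recovered = False
except Interrupted:
    recovered = True

print("recovered:", recovered)
```

+<p>The handler is still global, but the recovery code gets scoped by the usual <em>try</em> block.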
+<h2>Final Notes</h2>
+<img src="different_ways_of_reporting_errors-4.png" alt="escape route">
+<p>Anything else? Probably yes. I tried to note similarities between the methods and mention some derivatives, but the
+chances that I did not miss anything are rather slim. I think that there are some basic characteristics to be observed
+among all (or some) of them.
+<p>With the common goal of reporting an error the first step is usually decoupling successful and failed execution
+branches. One way involves creating values that are clearly defined as invalid and then dealing with them using usual
+condition blocks (or statements). The other way involves jumping around the program or unwinding the stack.
+<p>The other step is describing the error to the user. This is optional, as in some cases the program or function is
+answering a general question (e.g., "Did it fail?"). These details can be passed to the user via the actual return
+value or some secondary output lane such as a global variable, the standard error stream, or throwing/raising.
+<p>This summary may sound obvious, but I still think it is worthwhile to think about the reasons behind the
+basic behaviours that we use each day. This is especially interesting from a programming language perspective, where
+these days everything is pretty much the same. Maybe a simple change in some assumptions could start a breakthrough.
+Even if not, just practicing and gaining knowledge should be a good enough reason to explore the foundations.
+<script src=""></script>