<!doctype html>
<html lang="en">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1">
<meta name="author" content="aki">
<meta name="tags" content="programming, tutorial, error handling, errors, exceptions">
<meta name="published-on" content="2022-04-28T19:00:00+02:00">
<meta name="last-modified-on" content="2022-06-17T22:27:00+02:00">
<link rel="icon" type="image/png" href="favicon.png">
<link rel="stylesheet" href="style.css">

<title>Different Ways of Reporting Errors</title>

<header>
<nav><a href="https://ignore.pl">ignore.pl</a></nav>
<time>28 April 2022</time>
<h1>Different Ways of Reporting Errors</h1>
</header>

<article>
<p>Errors are a key component of writing software - programs, libraries, scripts, you name it. We need to check for
them, catch them, mitigate, log, and finally create. In this article, I want to give you an overview of various methods
of that last activity - creating errors - sometimes also called raising or throwing, especially when the errors are
called exceptions. On that note...
<p>Yeah, before we dive into the topic, let's make it clear: I'll use "error" and "exception" here almost
interchangeably. This is because I'm talking about the abstract case rather than the thing that is used to
represent it. If you are coming from, e.g., Java, you may find this confusing, because there these two names mean
different things. On the other hand, if you are coming from, e.g., Python, you might be wondering why I am even writing
this paragraph.
<p>Now then, as I was writing this article the classification of these methods evolved quite a few times. I doubt this
is the final form and depending on the feedback and my future endeavours I hope that I'll continue to make this list
better.</p>
<img src="different_ways_of_reporting_errors-1.png" alt="red flags">


<h2>Returning the Error</h2>
<p>Let's start with something dead simple. In this method we indicate an occurrence of the error simply by returning a
value from the function or the program. The variants differ mostly in the type of the returned value and in how much
detail about the error it carries.

<h3>Returning Boolean</h3>
<p>I don't think it gets any easier than that. As long as you don't need to return any meaningful value from the
function and you only want to indicate whether the function passed or failed, this will do. Just make the function
return a boolean that answers "Did the function pass?" or "Did the function fail?"
<p>This method is sometimes used together with some bigger state, especially if the executed function is a member of
some class: the boolean says that something went wrong and the object keeps the details. An example of such an approach
can be found in Qt.
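<p>A minimal sketch of the boolean-plus-state idea in Python. The <b>ConfigFile</b> class, its <b>load</b> method and
the <b>last_error</b> attribute are names I made up for illustration; they are not taken from Qt or any other library.

```python
class ConfigFile:
    """Hypothetical class that reports failure with a boolean and keeps details as state."""

    def __init__(self, path):
        self.path = path
        self.content = None
        self.last_error = ""  # Human-readable details of the most recent failure

    def load(self):
        """Return True on success and False on failure; details go to last_error."""
        try:
            with open(self.path, encoding="utf-8") as handle:
                self.content = handle.read()
        except OSError as err:
            self.last_error = str(err)
            return False
        self.last_error = ""
        return True


config = ConfigFile("/definitely/missing/example.conf")
if not config.load():
    print("could not load:", config.last_error)
```

<p>The caller only has to branch on the boolean, and asks the object for details when it actually cares about them.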

<h3>Returning Error Codes</h3>
<p>If you still don't need to return any meaningful value, but you want to differentiate between errors, you can encode
them with non-zero numbers. This time the value answers the question "What error occurred?" where zero means none.
<p>The most common example of this approach is classic shell, where the result of the last command is stored inside the
<code>$?</code> variable:
<pre>
$ ls real_file
real_file
$ echo $?
0
$ ls does_not_exist
ls: cannot access 'does_not_exist': No such file or directory
$ echo $?
2
</pre>
<p>Interestingly, shells implement if statements where <code>0</code> is interpreted as the positive case:
<pre>
$ if ls real_file; then
>    echo "True branch with $?"
> else
>    echo "False branch with $?"
> fi
real_file
True branch with 0
</pre>
<p>An extreme example of throwing raw error codes at end-users is Windows and its API. I'd encourage you to avoid going
to such lengths.
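<p>The same exit codes can be read from a program, not only from an interactive shell. A small sketch in Python,
assuming that starting a child interpreter through <b>sys.executable</b> is acceptable as a stand-in for a real
command:

```python
import subprocess
import sys

# Start a child Python interpreter that exits with code 2, similar to the failed ls above.
failed = subprocess.run([sys.executable, "-c", "import sys; sys.exit(2)"])
# And one that finishes normally, which yields code 0.
passed = subprocess.run([sys.executable, "-c", "pass"])

for result in (passed, failed):
    if result.returncode == 0:
        print("no error")
    else:
        print("error code", result.returncode)
```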

<h3>Returning Error Objects</h3>
<p>But you don't necessarily need to use numbers. The only requirement is that the returned type is able to represent
all possible cases, including the one in which no error occurred.
<p>The Go programming language cleverly uses its core mechanics to deal with errors: tuple-like multiple return values,
nil values and interfaces. A function that wants to raise an error should return an object that fulfills the special
<b>error</b> interface, which requires an <b>Error()</b> method to be present. In case the function does not want to
report anything, it can just return <code>nil</code> instead. In simplified code it looks like this:
<pre>
package main

import "fmt"

// TooLargeError carries the offending number as its own value.
type TooLargeError int64

func (err TooLargeError) Error() string {
    return fmt.Sprintf("For reason number is too large: %d", err)
}

func CheckNumber(value int64) error {
    if value &gt; 10 {
        return TooLargeError(value)
    }
    return nil
}

func main() {
    err := CheckNumber(4)
    if err != nil {
        fmt.Println(err)
    }
    err = CheckNumber(14)
    if err != nil {
        fmt.Println(err)
    }
}
</pre>
<p>It is worth noting here the difference between shell and Go errors in terms of boolean logic: in shell the "no
error" value counts as true, while in Go a check for a non-nil error is true when something failed. Depending on your
style, e.g., prevalence of early returns, you may want to consider whether to assign a positive or a negative boolean
value to the case in which no error occurred. Both are viable.
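<p>For comparison, a rough Python sketch of the same pattern. <b>TooLargeError</b> here is a plain, hypothetical class
that is returned rather than raised, and <code>None</code> plays the role of <code>nil</code>:

```python
class TooLargeError:
    """Hypothetical error object; it is returned, not raised."""

    def __init__(self, value):
        self.value = value

    def __str__(self):
        return f"number is too large: {self.value}"


def check_number(value):
    """Return an error object describing the problem, or None when there is none."""
    if value > 10:
        return TooLargeError(value)
    return None


for number in (4, 14):
    err = check_number(number)
    if err is not None:
        print(err)
```

<p>Since <code>None</code> is falsy in Python, this variant ends up on the Go side of the boolean logic: a truthy
result means that something failed.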

<h3>Returning an Invalid Value</h3>
<p>What happens if you want to return a meaningful value from the same function?
<p>In the case of shell the return value is rarely used to store the actual result, because that's the usual role of
the standard output stream. And in the above example of Go, the language has very good built-in support for handling
tuples, so a function can just return a nilable error <em>and</em> the desired thing.
<p>The approach of Go can be used in many other languages, with or without syntactic support, but what if you are
forced to return a single primitive object from the function?
<p>Well, you can reverse the Error Codes approach by dedicating one or more of the possible values to indicating
errors. Sometimes selecting those values is straightforward - for instance when the type can already hold values that
are invalid in the domain. Consider sizes, which are usually represented with zero and positive integers: if you use a
signed integer as the return value, then you have all of the negative numbers available to represent errors.</p>
<img src="different_ways_of_reporting_errors-2.png" alt="ruler with negative length">
<p>This is the approach used by <b>read</b>(2). When successful, the function returns the number of bytes read,
but on error it returns <code>-1</code> and sets the special variable <b>errno</b>(3) to a value that describes
exactly what error occurred:
<pre>
char buffer[1024];
ssize_t bytes = read(fd, buffer, sizeof(buffer));
if (bytes &lt; 0)
    perror("read()");  // Reads errno and prints description of error
else
    do_something(buffer, bytes);
</pre>
<p>Note that I previously wrote that you can dedicate one <em>or more values</em>. Although I never found confirmation
in the POSIX standard, the most likely reason for read not using more negative numbers to indicate errors is to have a
consistent interface for retrieving error details. Not all of the functions in the standard have enough spare values
to indicate all the needed errors.
<p>Anyway, sometimes you have enough values to use but you choose not to use them, and sometimes you may be forced to
use a single value. Sometimes you may even need to create your own constraints and rules in order to indicate an error.
An example of that is memory allocation with <b>malloc</b>(3) that returns <code>NULL</code> in case of errors:
<pre>
void* buffer = malloc(4096);
if (NULL == buffer)
    perror("malloc()");  // In case of malloc it's always ENOMEM, really
free(buffer);
</pre>
<p>The C and C++ standards try very hard to define <code>NULL</code> and <code>nullptr</code> as null pointer
constants, forcing compiler and platform implementations into guaranteeing that a null pointer never compares equal to
a pointer to any real object. Dereferencing one is undefined behaviour, which on typical platforms hopefully ends in a
segmentation fault rather than in silent corruption.
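<p>Invalid values are not limited to C. Python's <b>str.find</b> and <b>str.rfind</b> return <code>-1</code> when the
substring is not present. A short sketch, where <b>extension_of</b> is a helper I made up for illustration:

```python
def extension_of(filename):
    """Return the text after the last dot, or None if there is no dot."""
    position = filename.rfind(".")
    if position == -1:  # rfind never returns -1 for a real match, so it marks "not found"
        return None
    return filename[position + 1:]


print(extension_of("archive.tar.gz"))
print(extension_of("Makefile"))
```

<p>It also shows the weak side of this method: <code>-1</code> happens to be a perfectly valid index in Python, so
forgetting the check does not crash, it just quietly gives a wrong answer.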

<h3>Returning Wrapped Values</h3>
<p>Instead of bundling the error with the value in a tuple or some other container like Go does, you can wrap the value
with an object that will optionally indicate the error. This method may vary from a simplified wrapper to a
full-fledged monad. Depending on where you end up on this spectrum, the main difference will be the flow of error
handling. You can use tailored wrappers or something more generic like <b>Either</b> from Haskell or
<b>std::variant</b> from C++.
<p>A naive interface of a tailored wrapper could look like this:
<pre>
template&lt;typename T, typename E=const char*&gt;
struct Result {
    Result(T value);
    Result(T value, E message);
    T m_value;
    E m_message;
    bool is_ok() const;
};
</pre>
<p>And used similarly to this:
<pre>
Result&lt;int&gt; add_two(int value) {
    if (value &gt; 10)
        return Result&lt;int&gt;(value, "i can't, it's too large");
    return value + 2;
}

int main() {
    for (int i = 8; i &lt; 12; ++i) {
        const auto number = add_two(i);
        std::cout &lt;&lt; i;
        if (!number.is_ok())
            std::cout &lt;&lt; number.m_message;
        else
            std::cout &lt;&lt; number.m_value;
        std::cout &lt;&lt; std::endl;
    }
}
</pre>
<p>There is a very similar case to this one, but instead of the value being wrapped, the object itself contains a flag
that indicates its validity. This second approach is sometimes called a <em>zombie object</em>. An example use of this
approach would be the streams from the C++ standard library.
<p>Implementations that are more on the monad-like side may allow the user to bind functions to wrappers depending on
their state. This is very notably used in JavaScript's promises:
<pre>
fetch("https://ignore.pl/example.json")
    .then(response =&gt; response.json())
    .then(console.log)
    .catch(error =&gt; console.log("Error!", error));
</pre>
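<p>To make the binding part a bit more concrete, here is a sketch of a wrapper with a single <b>then</b> method in
Python. The <b>Result</b> class and its method names are hypothetical and only loosely mirror the C++ example and the
promise chain above:

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass(frozen=True)
class Result:
    """Hypothetical wrapper that holds either a value or an error message."""
    value: Any = None
    error: Optional[str] = None

    def is_ok(self):
        return self.error is None

    def then(self, function: Callable[[Any], "Result"]) -> "Result":
        """Apply function to the value, or pass the error along untouched."""
        if not self.is_ok():
            return self
        return function(self.value)


def add_two(value):
    if value > 10:
        return Result(error="i can't, it's too large")
    return Result(value + 2)


for i in range(8, 12):
    print(i, add_two(i).then(add_two))
```

<p>Once any step in the chain fails, the remaining functions are simply skipped and the first error comes out at the
end.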


<h2>Terminating the Process</h2>
<p>Within the scope of a single function we can use a technique called <em>early return</em> to finish the faulty
execution. For example:
<pre>
struct Message*
new_message() {
    struct Message* msg = malloc(sizeof(struct Message));
    if (NULL == msg)
        return NULL;
    const int res = initialize_message(msg);
    if (-1 == res) {
        free(msg);
        return NULL;
    }
    return msg;
}
</pre>
<img src="different_ways_of_reporting_errors-3.png" alt="killing will commence">
<p>Without going deep into a discussion about whether early returns are good or bad (and I recall a few heated
discussions about it), you can already see one major flaw in it - it operates just on a single level: in functions.
Every caller up the stack has to check the result and return early on its own. Now, one way to overcome this
limitation is going full nuclear.
<p>When encountering a critical problem and operating in a Unix-like environment you can simply terminate the process.
In order to show a distinct death condition you can use the standard error stream or the exit code.
<p>To do that you can use <b>exit</b>(3) in C, <b>sys.exit</b> in Python, <b>exit</b> or <b>die</b> in PHP, and
equivalent functions in other languages. Some of them allow you to provide something to print out or an exit code, and
some don't. In C, you can often see:
<pre>
noreturn void
panic(const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    vdprintf(2, fmt, args);
    va_end(args);
    exit(1);
}
</pre>
<p>This will format and print the provided message to the error stream and then terminate the process with exit code
<code>1</code>. This is pretty much an early return of an error value, just on the level of the whole process instead
of a single function. Thanks to the secondary output "lane" - the standard error stream - we can also provide the
details of the error. To some extent, this could be compared to the tuple solution from earlier.
<p>Due to the fact that this method terminates the entire process, it does not fit very well within bigger pieces of
software that rely a lot on their own interfaces and control flow. It shines when dealing with critical errors or when
working with a set of smaller programs that are running in a shell environment.
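<p>A comparable helper in Python could look roughly like the sketch below. Both <b>panic</b> and <b>parse_port</b> are
made-up names; the only library call involved is <b>sys.exit</b>:

```python
import sys


def panic(message, code=1):
    """Print message to the standard error stream and terminate with code."""
    print(message, file=sys.stderr)
    sys.exit(code)  # Raises SystemExit; the interpreter exits if nothing catches it


def parse_port(text):
    if not text.isdigit():
        panic(f"invalid port: {text!r}", code=2)
    return int(text)


print(parse_port("8080"))
```

<p>In Python, termination is itself implemented with an exception, so a caller higher in the stack is still able to
intercept it.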


<h2>Throwing Exceptions</h2>
<p>When you want your program to be long-living and able to recover from various failures, terminating everything is
simply unacceptable and a different solution is needed. To be on the strict side of controlling the flow you may choose
to simply chain returning the error from the functions in the stack one by one. This is the path that, e.g., Go chose.
The other way is a little more loose. It uses a secondary output lane to return the error and traverses the call stack
until the error is handled. In case no handler was provided by the developer, it may fall back to terminating the
process. The process of traversing the call stack is usually called <em>stack unwinding</em>.
<p>This method involves pushing errors into the second output lane - usually called <em>throwing</em> or
<em>raising</em>, and a way of limiting the unwinding and reading the pushed error - usually implemented by code blocks
or statements that are marked with a <em>try</em> keyword together with either <em>catch</em> or <em>except</em>.
<p>When you need to raise errors of different severities and want to terminate some selected part of your execution
consider using <em>exceptions</em>.
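<p>A tiny sketch of unwinding in Python, with made-up function names. The error is raised two calls below the place
that handles it, and the function in the middle only gets a chance to clean up:

```python
def parse(text):
    return int(text)  # Raises ValueError for text that is not a number


def load(text):
    try:
        return parse(text)
    finally:
        # Runs while the stack is being unwound, even though load() does not handle the error
        print("leaving load()")


def main():
    try:
        return load("not a number")
    except ValueError as err:
        # The unwinding stops here, two calls above the place that raised
        print("handled in main():", err)
        return None


main()
```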
<p>Exceptions and exception-like interfaces are implemented in a wide selection of programming languages, for example in
Python:
<pre>
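import requests
from requests.exceptions import HTTPError, RetryError

def get(url, max_attempts=4):
    last = None
    for _ in range(max_attempts):
        try:
            response = requests.get(url)
            response.raise_for_status()  # requests.get alone does not raise on 4xx/5xx
            return response
        except HTTPError as err:
            if err.response.status_code == 404:
                raise  # Retrying will not help, so let the caller deal with it
            last = err
    raise RetryError from last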
</pre>
<p>Or C++:
<pre>
int
check_one(int x) {
    if (x &lt; 3)
        throw "too little";
    return x;
}

int
maybe_find(std::vector&lt;int&gt; numbers) {
    int attempts_left = 3;
    for (int i : numbers) {
        try {
            return check_one(i);
        }
        catch (const char* err) {
            if (attempts_left &gt; 0) {
                attempts_left--;
                continue;
            }
            break;
        }
    }
    throw "not found";
}
</pre>
<p>There are a lot of flavours to the exceptions, but they generally tend towards the description I provided above. They
also usually use similar syntax with only small adjustments. Some of them, like Python, limit the objects that can be
raised as exceptions to classes derived from some base exception. Others, like C++ in the example above, let the user
throw anything they want.
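<p>The Python restriction is easy to observe. In the sketch below <b>TooLittle</b> is a hypothetical exception class,
and the attempt to raise a plain string ends with a <b>TypeError</b> instead:

```python
class TooLittle(Exception):
    """Hypothetical exception type; deriving from Exception makes it raisable."""


def check_one(x):
    if x < 3:
        raise TooLittle(x)
    return x


try:
    raise "too little"  # Not an exception object, so Python refuses to raise it
except TypeError as err:
    print("TypeError:", err)

try:
    check_one(1)
except TooLittle as err:
    print("TooLittle:", err)
```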
<p>Sometimes exceptions are not implemented in the syntax of the language but through functions instead. Consider Lua
as an example:
<pre>
function check_one(x)
    if x &lt; 3 then
        error("too little")
    end
    return x
end

function maybe_find()
    local attempts_left = 3
    for _, i in ipairs({1, 2, 3}) do
        local ok, res = pcall(check_one, i)
        if ok then
            return res
        end
        if attempts_left &gt; 0 then
            attempts_left = attempts_left - 1
        else
            break
        end
    end
    error("not found")
end
</pre>
<p>By wrapping a function call with <b>pcall</b> you get an additional return value that is a boolean that indicates
whether the function executed successfully or not. You also limit the propagation of errors created with <b>error</b>
within that protected call scope.
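<p>The same bridge between exceptions and returned values can be sketched in a language that has the syntax for
exceptions. Below is a hypothetical Python imitation of <b>pcall</b>; Python itself does not ship such a function:

```python
def pcall(function, *args):
    """Call function and report (True, result) or (False, error) instead of raising."""
    try:
        return True, function(*args)
    except Exception as err:
        return False, err


def check_one(x):
    if x < 3:
        raise ValueError("too little")
    return x


for candidate in (1, 3):
    ok, result = pcall(check_one, candidate)
    print(candidate, ok, result)
```

<p>In other words, a protected call turns the secondary output lane back into the tuple approach from the beginning of
the article.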


<h2>Signals</h2>
<p>As a bonus, let's talk about POSIX signals. You won't see them being used too often for pure error handling, at
least not directly. They can be placed somewhere between terminating the process and exceptions: they allow the
programmer to attempt a recovery, but they are not very good at handling scopes, and each signal can have only one
handler, which is the single entry point for the fault branch.
<p>Signals can also be used by the operating system to report selected errors in execution, for example an invalid
memory reference delivers <code>SIGSEGV</code>. Consider an example:
<pre>
sigjmp_buf env;
void
handle(int sig) {
    siglongjmp(env, 1);
}

int
main(int argc, char* argv[]) {
    signal(SIGSEGV, handle);
    char* ptr = NULL;
    if (sigsetjmp(env, 1))
        ptr = malloc(1);
    printf("%p\n", ptr);
    *ptr = 10;
}
</pre>
<p>When compiled with all necessary includes and run, it will print out:
<pre>
(nil)
0x555fb9a7c6b0
</pre>
<p>Of course, the second address may vary.
<p>The problem with signals is that they require a good amount of attention, especially when you rely on examples found
on the Internet. Even this example is not portable because it uses <b>signal</b>(2) and not <b>sigaction</b>(2).
<p>Obviously, you are not limited to segmentation fault. You can use <code>SIGABRT</code> with <b>abort</b>(3) or any
other signal.
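<p>Higher-level languages sometimes let you turn a signal into an exception. A sketch in Python, assuming a POSIX
system (for <code>SIGUSR1</code>) and Python 3.8 or newer (for <b>signal.raise_signal</b>); the <b>Interrupted</b>
class is a made-up name:

```python
import signal


class Interrupted(Exception):
    """Hypothetical exception used to leave the normal flow from a signal handler."""


def handle(signum, frame):
    # Python runs this handler in the main thread, between bytecode instructions
    raise Interrupted(signum)


signal.signal(signal.SIGUSR1, handle)

try:
    signal.raise_signal(signal.SIGUSR1)  # Stands in for a signal sent by another process
    print("not reached if the handler raised")
except Interrupted as err:
    print("recovered from signal", err.args[0])
```

<p>There is still only one handler per signal, but from that point the usual try and except scoping takes over.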


<h2>Final Notes</h2>
<img src="different_ways_of_reporting_errors-4.png" alt="escape route">
<p>Anything else? Probably yes. I tried to note similarities between the methods and mention some derivatives, but the
chances that I did not miss anything are rather slim. I think that there are some basic characteristics to be observed
among all (or some) of them.
<p>With the common goal of reporting an error the first step is usually decoupling successful and failed execution
branches. One way involves creating values that are clearly defined as invalid and then dealing with them using usual
condition blocks (or statements). The other way involves jumping around the program or unwinding the stack.
<p>The other step is describing the error to the user. This is optional, as in some cases the program or function is
answering a general question (e.g., "Did it fail?"). These details can be passed to the user via the actual return value
or some secondary output lane like a global variable, the standard error stream, or throwing/raising.
<p>This summary may sound obvious, but I still think it is worthwhile to think about the reasons that are behind the
basic behaviours that we use each day. This is especially interesting from the programming language perspective, where
these days everything is pretty much the same. Maybe a simple change in some assumptions could start a breakthrough.
Even if not, then just practicing and gaining knowledge should be a good enough reason to explore foundations.


</article>

<script src="https://stats.ignore.pl/track.js"></script>