author | Aki <please@ignore.pl> | 2024-06-15 16:27:01 +0200 |
---|---|---|
committer | Aki <please@ignore.pl> | 2024-06-15 16:27:50 +0200 |
commit | 6dd6798b14f7fe39a36cbb96348eb23b3e0fbc28 | |
tree | 2e4fb37890d9d1ca48df99f08e2682862393bc63 | |
parent | ac52c685ff4fd28ce7715ed4752819e9e4b3c988 | |
Published Sequential Bias
-rw-r--r-- | index.html | 2 |
-rw-r--r-- | sequential_bias-1.png | bin | 0 -> 3281 bytes |
-rw-r--r-- | sequential_bias.html | 85 |
3 files changed, 87 insertions, 0 deletions
@@ -19,6 +19,8 @@
 <section id="posts">
 <h2>posts</h2>
 <ul>
+<li> <a href="sequential_bias.html">Sequential Bias</a><br>
+ <time>15 June 2024</time>
 <li> <a href="i_made_more_alcohol_in_2023.html">I Made More Alcohol in 2023</a><br>
  <time>12 May 2024</time>
 <li> <a href="respect.html">Respect</a><br>
diff --git a/sequential_bias-1.png b/sequential_bias-1.png
Binary files differ
new file mode 100644
index 0000000..d863933
--- /dev/null
+++ b/sequential_bias-1.png
diff --git a/sequential_bias.html b/sequential_bias.html
new file mode 100644
index 0000000..e6b73cf
--- /dev/null
+++ b/sequential_bias.html
@@ -0,0 +1,85 @@
+<!doctype html>
+<html lang="en">
+<meta charset="utf-8">
+<meta name="viewport" content="width=device-width, initial-scale=1">
+<meta name="author" content="aki">
+<meta name="tags" content="design, programming">
+<meta name="published-on" content="2024-06-15T16:27:01+02:00">
+<link rel="icon" type="image/png" href="favicon.png">
+<link rel="stylesheet" href="style.css">
+
+<title>Sequential Bias</title>
+
+<header>
+<nav><a href="https://ignore.pl">ignore.pl</a></nav>
+<time>15 June 2024</time>
+<h1>Sequential Bias</h1>
+</header>
+
+<article>
+<p>When dealing with containers we may loosely focus on three major attributes:
+<ul>
+<li><strong>order</strong>, whenever elements or slots are ordered or not
+<li><strong>uniqueness</strong>, whenever one element can show up multiple times
+<li><strong>association</strong>, whenever slots are identifiable by another value
+</ul>
+<p>And so <i>a list</i> could be considered an ordered, non-unique, associative container while <i>a set</i> could be an
+unordered, unique and non-associative one.
+<p>Of course, there are two significant gotchas here. This is a simplified classification that completely forgets to
+consider operations supported by container and a number of abstract containers are defined exactly by that. And let's
+not forget about the structure and implementation details, especially for the very specialized data structures.
+<p>The other gotcha is that in normal conditions we would reserve "associative" for containers like <i>associative
+arrays</i> and not simple lists. Here, I decided to classify it as such, because it is indexed and accessible. <i>A
+stack</i>, where elements are not necessarily accessible unless they are at the top and where indexing is a hidden
+detail would be considered as non-associative in this scenario. In general, I'd consider order, indexing, accessibility,
+and association to play closely together.
+<p>Detailed nature of the classification method, while interesting, is not the topic for today. For today's purpose,
+this simplified approach seems to be coherent and sufficient.</p>
+<img src="sequential_bias-1.png" alt="an a container">
+<p>With that out of the way, let's talk about generating documents. A sudden change, I know.
+<p>In one of projects we were generating quite a lot of documents. In situations where we were handling data that had no
+significant order by itself I noticed I have tendency to sort it before putting it "on paper". Vague reason was that
+this way each revision of the document "felt similar". It felt similar because it was quicker to compare it by hand.
+This was reinforced by the documentation process which heavily relied on redlines and manual reviews. Consistent
+ordering across lifetime of a multi-revision document kept redline size small and easy to skim through.
+<p>But if I represent unsorted data as sorted in two different orders, is the data any different? No, because order does
+not matter. Why do I sort it then? Because it's easier to see differences. And why is that? Because I use tools (wetware
+or software) designed to handle sequential data. Why? Because these tools are commonly available.
+<p>The exact same thing happened when we were migrating data from the very same system into markdown text files tracked
+by git. Since both git and our brains were looking for ordered data, we exported ordered data.
+<p>Now, consider a Python script:
+<pre>
+def do_one_thing():
+    pass
+
+def do_another_thing():
+    pass
+
+def do_both_things():
+    do_one_thing()
+    do_another_thing()
+</pre>
+<p>If we move <code>do_both_things</code> to the top, will the script change? Yes. It's a list of statements. Python
+interprets the code sequentially and the code itself is represented as a text. But will it really change? No. It's a map
+of functions. Python will bind the names the same way in both cases and the behaviour will not change. Both answers can
+be reasoned to be true.
+<p>Of course, if we take a look at the AST, we will confirm that from parser standpoint the first is true. Body of a
+module is a list, so the order matters. This makes sense since Python allows to put behaviour at module level. What if
+we simulate this situation in different languages?
+<p>Take C++ next. One can shuffle things around but only after preparing enough declarations first. Funny enough this
+reinforces both sides equally. If you shuffle things without declarations, they will easily break - meaning the program
+changed. If you prepare all the declarations needed, you can shuffle or split things as much as you want - meaning the
+program does not change.
+<p>In C++ case, we could summarise it as legacy and compatibility reasons. Can I continue to analyse and reason it for
+every single case or will it make sense to use an unordered representation sometimes? How would changing the assumptions
+change the problems, tooling and processes? One way to find out. For programming languages, I currently see one concrete
+thing: semantic diff and merge tools. You may already find those in the wild, but it also sounds like a cool weekend
+project to work on. Moreover, on the more theoretical side of things, I think there are still things to consider
+regarding structure and representation. I'm preparing a post about those, but the summary is "I'm quite stupid."
+<p>Nonetheless, I plan to keep this little bias in mind. Whether it is a result of my brain, tools, legacy or anything
+else. If I perform actions on certain representation of data with all its constraints, I will get results for exactly
+that and not the actual abstract data that is being represented. That's why analysing and representing problems should
+always be on of the priorities.
+<!-- Well, duh... -->
+</article>
+<script src="https://stats.ignore.pl/track.js"></script>
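
Note on the Python example in the post: the "list of statements" versus "map of functions" reading can be checked with the standard `ast` module. The sketch below is not part of this commit. The helper names `body_as_list`, `body_as_map`, and `bound_names` are hypothetical and chosen only for illustration, and the comparison assumes a module that contains nothing but top-level function definitions, as in the post.

```python
import ast

# The script from the post, as written.
ORIGINAL = """
def do_one_thing():
    pass

def do_another_thing():
    pass

def do_both_things():
    do_one_thing()
    do_another_thing()
"""

# The same script with do_both_things moved to the top.
REORDERED = """
def do_both_things():
    do_one_thing()
    do_another_thing()

def do_one_thing():
    pass

def do_another_thing():
    pass
"""


def body_as_list(source):
    """Top-level function names in source order (the parser's view)."""
    return [node.name for node in ast.parse(source).body
            if isinstance(node, ast.FunctionDef)]


def body_as_map(source):
    """Top-level functions keyed by name; ast.dump omits line positions by default."""
    return {node.name: ast.dump(node) for node in ast.parse(source).body
            if isinstance(node, ast.FunctionDef)}


def bound_names(source):
    """Execute the module and collect the callables it binds."""
    namespace = {}
    exec(source, namespace)
    return {name for name, value in namespace.items()
            if callable(value) and not name.startswith("__")}


if __name__ == "__main__":
    # As a list, the two modules differ; as a map, they are the same.
    print(body_as_list(ORIGINAL) == body_as_list(REORDERED))
    print(body_as_map(ORIGINAL) == body_as_map(REORDERED))
    print(bound_names(ORIGINAL) == bound_names(REORDERED))
```

Comparing by name rather than by position is roughly what a semantic diff tool would have to do for this case. The map view stops being sufficient as soon as the module has behaviour at top level, which is the caveat the post itself raises.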