From 6dd6798b14f7fe39a36cbb96348eb23b3e0fbc28 Mon Sep 17 00:00:00 2001
From: Aki
Date: Sat, 15 Jun 2024 16:27:01 +0200
Subject: Published Sequential Bias
---
 sequential_bias.html | 85 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 85 insertions(+)
 create mode 100644 sequential_bias.html
Sequential Bias
When dealing with containers we may loosely focus on three major attributes:

- order: whether the elements form a meaningful sequence,
- uniqueness: whether duplicate elements are allowed,
- association: whether elements are addressable, by index or by key.

And so a list could be considered an ordered, non-unique, associative container, while a set could be an unordered, unique and non-associative one.
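As a rough sketch of these attributes in Python, using the built-in list and set:

```python
# A list is ordered, non-unique and (by index) associative.
items = ["b", "a", "b"]
assert items[0] == "b"           # accessible by index
assert items.count("b") == 2     # duplicates allowed
assert items != ["a", "b", "b"]  # order matters for equality

# A set is unordered, unique and non-associative.
unique_items = {"b", "a", "b"}
assert len(unique_items) == 2      # duplicates collapse
assert unique_items == {"a", "b"}  # order plays no role
```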

Of course, there are two significant gotchas here. This is a simplified classification that completely ignores the operations supported by a container, and a number of abstract containers are defined exactly by those. And let's not forget about structure and implementation details, especially for the very specialized data structures.

The other gotcha is that under normal circumstances we would reserve "associative" for containers like associative arrays, not simple lists. Here, I decided to classify a list as associative because it is indexed and accessible. A stack, where elements are not necessarily accessible unless they are at the top and where indexing is a hidden detail, would be considered non-associative in this scenario. In general, I'd consider order, indexing, accessibility and association to play closely together.
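To make the stack case concrete, here is a minimal sketch: a wrapper that exposes only push, pop and peek, so the internal indexing never leaks out and only the top element is ever accessible.

```python
class Stack:
    """Minimal stack: only the top element is accessible."""

    def __init__(self):
        self._items = []  # indexing is a hidden implementation detail

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def peek(self):
        return self._items[-1]


stack = Stack()
stack.push(1)
stack.push(2)
assert stack.peek() == 2  # only the top is visible
assert stack.pop() == 2
assert stack.pop() == 1
```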

The detailed nature of this classification method, while interesting, is not the topic for today. For today's purpose, this simplified approach seems coherent and sufficient.


With that out of the way, let's talk about generating documents. A sudden change, I know. +

In one of our projects we were generating quite a lot of documents. In situations where we were handling data that had no significant order by itself, I noticed I had a tendency to sort it before putting it "on paper". The vague reason was that this way each revision of the document "felt similar". It felt similar because it was quicker to compare by hand. This was reinforced by the documentation process, which heavily relied on redlines and manual reviews. Consistent ordering across the lifetime of a multi-revision document kept redlines small and easy to skim through.

But if I represent unsorted data as sorted in two different orders, is the data any different? No, because order does not matter. Why do I sort it then? Because it's easier to see differences. And why is that? Because I use tools (wetware or software) designed to handle sequential data. Why? Because these tools are commonly available.
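This is easy to simulate: two insertion orders of the same unordered data serialize to different text, while a canonical, sorted representation stays stable across revisions. A minimal sketch using JSON:

```python
import json

data_v1 = {"b": 2, "a": 1}
data_v2 = {"a": 1, "b": 2}  # same data, different insertion order

# Naive serialization preserves insertion order, so the text differs...
assert json.dumps(data_v1) != json.dumps(data_v2)

# ...while a canonical, sorted representation is identical for both,
# which keeps redlines and git diffs small across revisions.
assert json.dumps(data_v1, sort_keys=True) == json.dumps(data_v2, sort_keys=True)
```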

The exact same thing happened when we were migrating data from the very same system into markdown text files tracked by git. Since both git and our brains were looking for ordered data, we exported ordered data.

Now, consider a Python script:

def do_one_thing():
    pass


def do_another_thing():
    pass


def do_both_things():
    do_one_thing()
    do_another_thing()

If we move do_both_things to the top, will the script change? Yes: it's a list of statements. Python interprets the code sequentially and the code itself is represented as text. But will it really change? No: it's a map of functions. Python will bind the names the same way in both cases and the behaviour will not change. Both answers can be reasoned to be true.
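The "map of functions" answer can be checked directly. Below, two module texts differ only in definition order; executing both shows they bind the same names to the same behaviour (names inside do_both_things are resolved at call time, not at definition time):

```python
module_a = """
def do_one_thing(): return "one"
def do_another_thing(): return "another"
def do_both_things(): return (do_one_thing(), do_another_thing())
"""

# Same functions, with do_both_things moved to the top.
module_b = """
def do_both_things(): return (do_one_thing(), do_another_thing())
def do_one_thing(): return "one"
def do_another_thing(): return "another"
"""

ns_a, ns_b = {}, {}
exec(module_a, ns_a)
exec(module_b, ns_b)

# As text (a list of statements), the modules differ...
assert module_a != module_b
# ...but as a map of functions they behave identically.
assert ns_a["do_both_things"]() == ns_b["do_both_things"]()
```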

Of course, if we take a look at the AST, we can confirm that from the parser's standpoint the first answer is true. The body of a module is a list, so the order matters. This makes sense, since Python allows putting behaviour at module level. What if we simulate this situation in different languages?
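You can verify this in the standard library's ast module: ast.parse returns a Module whose body is literally a Python list of statements.

```python
import ast

source = "def f(): pass\ndef g(): pass\n"
module = ast.parse(source)

# The module body is a plain list, so order is part of the representation.
assert isinstance(module.body, list)
assert all(isinstance(node, ast.FunctionDef) for node in module.body)
assert [node.name for node in module.body] == ["f", "g"]
```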

Take C++ next. One can shuffle things around, but only after preparing enough declarations first. Funnily enough, this reinforces both sides equally. If you shuffle things without declarations, they will easily break, meaning the program changed. If you prepare all the declarations needed, you can shuffle or split things as much as you want, meaning the program does not change.

In the C++ case, we could summarise it as legacy and compatibility reasons. Can I continue to analyse and reason about every single case, or would it sometimes make sense to use an unordered representation? How would changing the assumptions change the problems, tooling and processes? One way to find out. For programming languages, I currently see one concrete thing: semantic diff and merge tools. You may already find those in the wild, but it also sounds like a cool weekend project to work on. Moreover, on the more theoretical side of things, I think there are still things to consider regarding structure and representation. I'm preparing a post about those, but the summary is "I'm quite stupid."
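As a toy sketch of what such a semantic diff might compare (assuming, for simplicity, that we only care about top-level function definitions): instead of diffing lines of text, diff the map from function name to function source, so pure reordering produces no difference.

```python
import ast


def function_map(source):
    """Map top-level function names to their unparsed source,
    ignoring where in the module each one appears."""
    tree = ast.parse(source)
    return {
        node.name: ast.unparse(node)
        for node in tree.body
        if isinstance(node, ast.FunctionDef)
    }


old = "def f():\n    return 1\n\ndef g():\n    return 2\n"
new = "def g():\n    return 2\n\ndef f():\n    return 1\n"  # reordered only

# A line-based diff reports changes; the map-based view sees none.
assert old != new
assert function_map(old) == function_map(new)
```

A real tool would also need to handle classes, module-level statements (where order genuinely matters) and renames, but the core idea is the same: compare the structure, not the text.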

Nonetheless, I plan to keep this little bias in mind, whether it is a result of my brain, my tools, legacy or anything else. If I perform actions on a certain representation of data, with all its constraints, I will get results for exactly that representation and not for the abstract data being represented. That's why analysing and representing problems should always be one of the priorities.
