Of Privacy and Traffic Tracking
Published on 2020-07-11 21:11:00+02:00
Over the past few weeks I have wondered whether anyone actually visits or reads this website. I started to worry that
someone might want to leave feedback in one way or another. I have plans to handle that, but I also have other
interests right now, so I decided to quickly set up something that will show me there is no need to worry or hurry.
I deployed the very first version today. I think I spent more time deploying it to the server than I put into
writing it (systemd had some life problems and I was extremely stubborn about preserving a 517-day uptime).
Anyway, don't expect too much from it.
The goals are quite clear: respect users' privacy while still collecting useful information, and filter the data as
early as possible to minimize what is stored. I'm not interested in big data or hard-core traffic analysis across huge
chunks of the Internet (sup, Google Analytics). I just want to know if there is someone who spent time reading what I wrote.
Ok, so what data do I collect right now?
- Address of the tracked page
- Address of the page that the user came from (referrer)
- Time and date when the user loaded the tracked page
- Time spent on the tracked page
- The lowest vertical position the user scrolled to
That's all. I don't collect any form of identification. Data that is stored is not even linked to the IP address that
sent it over. That's the point.
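To make the list above a bit more concrete, here is a rough sketch of what a single stored record could look like. The names VisitRecord and makeVisitRecord and all field names are made up for illustration and are not taken from the actual source code; rounding to whole seconds and pixels is also just an assumption about what "minimal" could mean.

```typescript
// Hypothetical record shape: exactly the five collected values, no identifier.
interface VisitRecord {
  page: string;          // address of the tracked page
  referrer: string;      // address of the page the user came from ("" if none)
  loadedAt: string;      // time and date of the page load, ISO 8601
  secondsOnPage: number; // time spent on the tracked page
  maxScrollY: number;    // lowest vertical position reached, in pixels
}

// Hypothetical helper: builds a record from raw values and coarsens them
// (whole seconds, whole pixels) so nothing more precise is ever stored.
function makeVisitRecord(
  page: string,
  referrer: string,
  loadedAt: Date,
  unloadedAt: Date,
  maxScrollY: number
): VisitRecord {
  const ms = Math.max(0, unloadedAt.getTime() - loadedAt.getTime());
  return {
    page,
    referrer,
    loadedAt: loadedAt.toISOString(),
    secondsOnPage: Math.round(ms / 1000),
    maxScrollY: Math.max(0, Math.round(maxScrollY)),
  };
}
```

Note that there is no field for an IP address, a cookie, or any other visitor identifier, so two records cannot be tied to the same person.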
In the future I would like to minimize data collection even further. I already mentioned early filtering, but there are
also some other improvements I would like to make. The current approaches are quite naive. For example, the time a user
spent looking at the page is calculated as the time from the load event to the beforeunload event.
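The naive measurement described above could be sketched roughly like this. It is not the actual implementation: trackVisit, WindowLike, Measurement and the report callback are hypothetical names, and the clock is passed in as a parameter only so that the sketch can be run outside a browser.

```typescript
// Minimal subset of the browser's window that the sketch needs (hypothetical
// interface, declared here so the example type-checks without DOM typings).
interface WindowLike {
  scrollY: number;
  addEventListener(type: string, listener: () => void): void;
}

// Hypothetical result type: time on page and deepest scroll position.
interface Measurement {
  msOnPage: number;
  maxScrollY: number;
}

// Naive approach: remember the time of the load event, keep the largest
// scrollY seen so far, and report both when beforeunload fires.
function trackVisit(
  win: WindowLike,
  now: () => number,
  report: (m: Measurement) => void
): void {
  let loadedAt: number | null = null;
  let maxScrollY = 0;

  win.addEventListener("load", () => {
    loadedAt = now();
    maxScrollY = Math.max(maxScrollY, win.scrollY);
  });

  win.addEventListener("scroll", () => {
    maxScrollY = Math.max(maxScrollY, win.scrollY);
  });

  win.addEventListener("beforeunload", () => {
    if (loadedAt === null) return; // user left before the page finished loading
    report({ msOnPage: now() - loadedAt, maxScrollY });
  });
}
```

In a browser this would be called along the lines of trackVisit(window, Date.now, send), where send is whatever function delivers the numbers to the server. The weakness is easy to see: a tab left open in the background for an hour counts as an hour of reading, which is why the approach is naive.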
The source code is available in a public Git repository.