2020-09-15

fill factor

I learned the other day that indexes in PostgreSQL do not point to individual rows, but to pages. These are fixed-size (usually 8Kb) spaces of memory.

Another thing that I learned was that when performing updates, what PostgreSQL really does is copy the data and then mark the out of date copy of the row as "dead". These dead copies are then cleaned up in an operation called 'vacuuming', which can be scheduled to run periodically or in parallel with the normal operation of the database.

This means that when you update some data in a row that has indexes pointing to some of its fields, you might end up copying that row to another page, which means you also have to update its indexes. Updating indexes gets more and more expensive as the database gets larger.

Fortunately, we can reduce the number of index updates by using a setting called "fill factor". This value determines how much space is left 'free' in each page when new rows are inserted. When a row is updated, its new copy is then inserted into the space 'next to' the old copy, which means that the index does not need to be updated.

How do you determine the right fill factor? As far as I know, this is a function of 1. how often vacuuming occurs 2. how often rows in the table are typically updated. Something that makes this tricky to determine by experimentation in production is that changing the fill factor essentially requires you to completely recreate the contents of the table. However, if you are able to model typical usage patterns then you could run a number of simulations to find an ideal value.

All of this is not particularly interesting unless it is directly useful to you, but I think it illustrates some general properties of software engineering that are important:

Knowing how your application will be used 'in the wild' has a direct impact on the design decisions you should take. It is not enough to just think about how concepts or systems are logically related to each other. In order to make something that will run efficiently, you should also consider the patterns or rhythm of usage.
It is worth learning about how the tools you use work at a deeper level. I have used PostgreSQL for several years but I never really looked into how it worked below the surface. I occasionally glimpsed at it by trying to interpret EXPLAIN ANALYZE queries, but in order to interpret them I had to basically take what other people said about "Seq Scan" and "Nested Loop" on faith. I still don't know that much about how this stuff works, but the idea that I could know about it feels really powerful.

writing your own html

If this website looks archaic or slightly ugly, there is a good reason! It is made up of 100% hand-made home-grown HTML. I have some reasons for doing it this way:

Work-life separation. Like a lot of professional software engineers, computing is also my hobby. When I write HTML, I don't have to think about a lot of the things that I think about when I am doing my day job.
Lower stress levels. Compared to working with that produces content for the web (be that a static site generator or single page web app framework), pure HTML does not do anything surprising. This is great, because surprises force me out of the mode of creating/imagining and into the mode of investigating/diagnosing bugs and eccentric behaviour. While I enjoy working in both modes, having to flip between them is kind of stressful.
Longevity. The "Lindy Effect" states that in the absence of any other evidence, the expected lifespan of a non-perishable thing (such as a web technology) is proportional to its current age. If I want to make something that lasts a long time, I should use the part of the Web that has lasted the longest - HTML.
html energy

[back]