2020-09-15

fill factor

I learned the other day that indexes in PostgreSQL do not point to individual rows, but to pages. These are fixed-size (usually 8Kb) spaces of memory.

Another thing that I learned was that when performing updates, what PostgreSQL really does is copy the data and then mark the out of date copy of the row as "dead". These dead copies are then cleaned up in an operation called 'vacuuming', which can be scheduled to run periodically or in parallel with the normal operation of the database.

This means that when you update some data in a row that has indexes pointing to some of its fields, you might end up copying that row to another page, which means you also have to update its indexes. Updating indexes gets more and more expensive as the database gets larger.

Fortunately, we can reduce the number of index updates by using a setting called "fill factor". This value determines how much space is left 'free' in each page when new rows are inserted. When a row is updated, its new copy is then inserted into the space 'next to' the old copy, which means that the index does not need to be updated.

How do you determine the right fill factor? As far as I know, this is a function of 1. how often vacuuming occurs 2. how often rows in the table are typically updated. Something that makes this tricky to determine by experimentation in production is that changing the fill factor essentially requires you to completely recreate the contents of the table. However, if you are able to model typical usage patterns then you could run a number of simulations to find an ideal value.

All of this is not particularly interesting unless it is directly useful to you, but I think it illustrates some general properties of software engineering that are important:

writing your own html

If this website looks archaic or slightly ugly, there is a good reason! It is made up of 100% hand-made home-grown HTML. I have some reasons for doing it this way:

[back]