Adventures with PostgreSQL large objects

Jun 30, 2021

Apart from the well-known TOAST functionality for storing large columns, PostgreSQL also contains a less-known large object facility. An in-depth explanation is found in the documentation, so suffice it to be said that large objects are stored the pg_largeobject and pg_largeobject_metadata system tables. They can be referenced by their oid and PostgreSQL provides a rather comprehensive set of manipulation functionalities, including importing from or exporting to files, random access, etc. That being said, they also come with a significant list of caveats, some of which will be given in more detail below.

Large objects are not automatically cleaned up

Since large objects are separate entities, they will not be removed once they are no longer referenced. In particular this also holds true when using some ORMs (e.g. Hibernate with @Lob), where each update will cause another version of the large object to be created. Over time this can lead to an accumulation of stale objects which need to be manually deleted using something like vacuumlo. To prevent this one needs to implement their own reference counting functionality and delete the large objects once the count reaches zero. A simple trigger for large objects that are only ever referenced by a single foreign key is given below.

create or replace function lo_cleanup()
  returns trigger as $$
declare
  col text;
  old_lo oid;
  new_lo oid;
begin
  col := tg_argv[0];

  if tg_op = 'UPDATE' then
      execute format('select $1.%i', col) into old_lo using old;
      execute format('select $1.%i', col) into new_lo using new;
  elsif tg_op = 'DELETE' then
      execute format('select $1.%i', col) into old_lo using old;
      new_lo := null;
  else
    raise exception 'lo_cleanup not defined for operation %', tg_op;
  end if;

  if (old_lo is distinct from new_lo and old_lo is not null)
  then
    perform lo_unlink(old_lo);
  end if;

  if tg_op = 'UPDATE' then
    return new;
  elsif tg_op = 'DELETE' then
    return old;
  end if;
end;
$$ language plpgsql;


create table tbl(loid oid);

create trigger t_lo_cleanup
before update or delete on tbl
for each row execute procedure lo_cleanup(loid);

Large objects are treated differently by `pg_dump`

When creating a database backup using pg_dump the large objects are read in 16kB chunks using the lo_read function (see pg_dump.c for details). This can become a problem because it introduces additional network roundtrips e.g. a 10MB large object will require 640 roundtrips. This is not a problem, as long as the roundtrip time is very low i.e. when pg_dump is running on the same machine or at the very least on the same local network. However if one were to run pg_dump over the internet (e.g. between AWS datacenters) this will cause it to slow down to a crawl. I currently know of no solution except for either recompiling pg_dump with a larger LOBBUFSIZE or running it on a machine with lower network latency.

Large objects have a separate permission model

Since large objects are not scoped to a table or schema, but rather to the entire database, they require their own permission model. Each large object has an owner and a set of permissions stored in the lomowner and lomacl columns of pg_largeobject_metadata respectively. Permissions from the schema or tables referencing the large objects are not carried over - they have to be granted separately for each large object.

Modifications require a separate lock per object

When updating a large number of large objects in a single transaction one will most likely at some point run into PostgreSQL announcing:

ERROR:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction.

In short this is due to PostgreSQL taking out a lock for every single large object that has been modified without being able to combine those into a table lock. As far as I could tell these locks are always required and cannot be eliminated by manually acquiring table-level locks on pg_largeobject and pg_largeobject_metadata. The only recourse I know of is to either increase max_locks_per_transaction or partition the updates into smaller batches that run in separate transactions.

Large objects are not automatically cleaned up

Large objects are treated differently by pg_dump

Large objects have a separate permission model

Modifications require a separate lock per object

Large objects are treated differently by `pg_dump`