Write Code In English Part III

28 November 2021

Most written communication is in natural language, so why wouldn’t we want to try to mirror that in our code?

This series presents small, concrete examples that might help your code read less like code. Today's topic is...

Table Aliases

There seems to be an unofficial rule that table aliases should be cryptic. I think that’s a lame convention. Here’s a typical select clause - have a think about what real-life information the query could be returning, given the column list:

SELECT R.Id, C.Name, R.Start, R.Finish FROM ...

Races at circuits? Rates for Currencies?? Records and Competitors??? All wrong… Here’s the full query:

SELECT R.Id, C.Name, R.Start, R.Finish
FROM ReservationRequest R
INNER JOIN Customer C ON C.Id = R.CustomerId
LEFT JOIN Booking B ON B.CustomerId = C.Id
WHERE R.Start < GETDATE()
  AND B.BookingId IS NULL

The query is part of a fictitious hotel’s reporting system - it lists reservation requests made by would-be customers that weren’t converted into bookings, and so did not make the hotel any money. None of that semantic information is conveyed by the table aliases. We tend to avoid single-letter identifiers for variables, so why should we treat our SQL queries any differently?

Here’s a version with more descriptive table aliases - easier to understand (in my opinion).

SELECT UnfulfilledReservation.Id, PotentialGuest.Name, UnfulfilledReservation.Start, UnfulfilledReservation.Finish
FROM ReservationRequest UnfulfilledReservation
INNER JOIN Customer PotentialGuest ON PotentialGuest.Id = UnfulfilledReservation.CustomerId
LEFT JOIN Booking ON Booking.CustomerId = PotentialGuest.Id
WHERE UnfulfilledReservation.Start < GETDATE()
  AND Booking.BookingId IS NULL

When quickly scanning the query one can infer more about why it was written, compared to the version with the semantically ambiguous aliases.

This technique is particularly useful for aliasing tables whose names are generic because they store multiple kinds of entity. As a concrete example, consider SQL Server’s sys.database_principals system view, which holds information about database users, roles and more. Sometimes we only care about one type of principal, such as roles. Take a query listing all roles created over a year ago.
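
A minimal sketch of what a query listing roles created over a year ago could look like. The alias DatabaseRole is my own choice for illustration; name, create_date and type are columns of sys.database_principals, and filtering on type = 'R' restricts the rows to database roles.

```sql
-- Sketch only: DatabaseRole is an illustrative alias, not a real object name.
SELECT DatabaseRole.name, DatabaseRole.create_date
FROM sys.database_principals DatabaseRole
WHERE DatabaseRole.type = 'R' -- 'R' = database role
  AND DatabaseRole.create_date < DATEADD(YEAR, -1, GETDATE())
```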

Nice and obvious from our column list alone - we’re selecting data about roles rather than the more generic 'principals'.

Another place this technique shines is when (if you’re unlucky enough) you are working with a "one true lookup table". I have often seen queries join such a table multiple times, with each join aliased with a different number to keep the aliases unique. As an example:

SELECT B.Id, B.Start, B.Finish, L1.Name, L2.Name, L3.Name
FROM Booking B
INNER JOIN Lookup L1 ON L1.Id = B.RoomTypeId
INNER JOIN Lookup L2 ON L2.Id = B.StatusId
INNER JOIN Lookup L3 ON L3.Id = B.RoomSizeId
WHERE B.Start BETWEEN GETDATE() AND GETDATE() + 7

The selected aliases L1, L2 and L3 provide no information about what each join represents in the real world. A reader has to put in the cognitive effort to work out that ‘L1 is representing the room types, L2 is representing booking status and L3 is representing the room sizes’. B isn’t great either. I wouldn’t want to impose this style on future readers of my code, so I would prefer to see:

SELECT ForthcomingBooking.Id, ForthcomingBooking.Start, ForthcomingBooking.Finish, RoomType.Name, BookingStatus.Name, RoomSize.Name
FROM Booking ForthcomingBooking
INNER JOIN Lookup RoomType ON RoomType.Id = ForthcomingBooking.RoomTypeId
INNER JOIN Lookup BookingStatus ON BookingStatus.Id = ForthcomingBooking.StatusId
INNER JOIN Lookup RoomSize ON RoomSize.Id = ForthcomingBooking.RoomSizeId
WHERE ForthcomingBooking.Start BETWEEN GETDATE() AND GETDATE() + 7

It's obvious just from scanning the selected column list that this query has something to do with upcoming bookings - a reader can then quickly decide whether to delve into the query or skip past it when skim-reading code. There’s almost no cognitive load in associating the RoomType alias with ‘Room Types’, compared to associating L1 with ‘Room Types’. The earlier obfuscated version would force many readers to stop and think before they could understand the query, which is the last thing anyone wants to be doing.

In summary, using semantic names for table aliases is one more tool in our clean-code toolbelts.