Saturday, December 29, 2018

The insane OOM (out of memory) Killer

In the late nineties I worked on AIX for the first time. Back in those days there were several flavours of Unix available, all with their differences and idiosyncrasies. Linux was a fledgling and fitted on just one CD. I came across a feature of AIX which I thought was crazy - the OOM (out of memory) killer. In this variant of Unix malloc always succeeded, even when there wasn't enough memory. The idea was that malloc returned a pointer to heap memory but the system wouldn't actually commit that memory until the first reference was made. At that point the memory had jolly well better be available. If it was then all well and good. If not then the OOM killer came into play. The OOM killer would choose a victim process and kill it. The result was that memory would be freed and the access occurring at the time would succeed. Sounds insane, right? Right. I laughed and thought that this one feature rendered AIX useless compared to the other Unixes and would lead to its demise. How wrong I was. Fast forward a few years. It was added to Solaris. Sigh. Fast forward to today. It has been added to Linux.
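For anyone who wants to see the "memory isn't really there until you touch it" behaviour for themselves, here is a minimal sketch. It is only an illustration under some assumptions: it uses an anonymous private mmap from Python rather than malloc itself, it assumes a Unix-like system where the resource module is available, and the function name touch_lazily and the 64 MiB size are my own inventions for the example. It reports the process's peak resident memory before mapping, after mapping, and after writing to every page.

```python
import mmap
import resource

SIZE = 64 * 1024 * 1024  # 64 MiB; an arbitrary size chosen for illustration


def peak_rss():
    """Peak resident set size of this process (KiB on Linux, bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


def touch_lazily(size=SIZE):
    """Map `size` bytes, then write to every page, sampling peak RSS as we go.

    Hypothetical helper for this post; returns (before, after_map, after_touch).
    """
    before = peak_rss()

    # Anonymous private mapping: the kernel hands out address space here,
    # and on a typical system does not back it with physical pages yet.
    region = mmap.mmap(-1, size, flags=mmap.MAP_PRIVATE)
    after_map = peak_rss()

    # The first write to each page is the point at which real memory is needed.
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1
    after_touch = peak_rss()

    region.close()
    return before, after_map, after_touch


if __name__ == "__main__":
    before, after_map, after_touch = touch_lazily()
    print("peak RSS before mapping:", before)
    print("peak RSS after mapping: ", after_map)
    print("peak RSS after touching:", after_touch)
```

On a system that allocates lazily I would expect the second figure to be close to the first and the third to be noticeably larger, though the exact numbers depend on the operating system and its overcommit settings. The jump happens at the moment of the write, which is exactly where the OOM killer would step in if the memory could not be found.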

The OOM killer is a kernel development that mirrors what happens when banks try to innovate. It's what I call "the conspiracy of crappiness". It goes like this: some group or other tries to innovate but comes up with a really bad idea that doesn't work well and that everyone hates. The competition discover the move and for some inexplicable reason they copy it. Now everyone hates the competition as well and none of the players can be distinguished in this area. Bank charges on current accounts are an example. So is charging for withdrawals at ATMs (although customers have objected so vehemently to that one that there has been some backpedalling). Well, in the world of Unix we now have the OOM killer.

There's a good article at LWN that explains why this is insane. There's another article that gives tips on how to mitigate the nastiness, but surely that is yet another testimony to the fact that it is nasty. I also came across this article that discusses the nastiness and has an excerpt of an amusing article that discusses the fairness, or otherwise, of how the victim is chosen. Here is the excerpt:


An aircraft company discovered that it was cheaper to fly its planes with less fuel on board. The planes would be lighter and use less fuel and money was saved. On rare occasions however the amount of fuel was insufficient, and the plane would crash. This problem was solved by the engineers of the company by the development of a special OOF (out-of-fuel) mechanism. In emergency cases a passenger was selected and thrown out of the plane. (When necessary, the procedure was repeated.) A large body of theory was developed and many publications were devoted to the problem of properly selecting the victim to be ejected. Should the victim be chosen at random? Or should one choose the heaviest person? Or the oldest? Should passengers pay in order not to be ejected, so that the victim would be the poorest on board? And if for example the heaviest person was chosen, should there be a special exception in case that was the pilot? Should first class passengers be exempted? Now that the OOF mechanism existed, it would be activated every now and then, and eject passengers even when there was no fuel shortage. The engineers are still studying precisely how this malfunction is caused.

Update: 27 March 2022

Since posting that aircraft analogy I have found an article on the perils of overcommit which gives a more dispassionate assessment, but still concludes that it is a terrible idea: https://www.etalabs.net/overcommit.html
