Upon migrating my blog to a Google Cloud Compute Engine VM (this very blog) I had a strange issue where it would freeze and become unresponsive. Only “Resetting” the VM would bring the blog back.
The findings were so interesting I thought I would share them.
Issue
It’s worth noting what the instance is (and importantly what resources it has been allocated), as this probably would not be an issue with a larger (and more expensive) instance. As of writing the instance is set to an “e2-micro” instance located in “us-central1”. The reason for this is you get one free instance in this region when using Google Cloud.
The “e2-micro” instance is one of three (“e2-micro”, “e2-small” & “e2-medium”) that are “shared core” CPU types. This means that they are given two CPUs and are allowed to burst for a percentage of time before throttling (see here for info). The “e2-micro” instance is allowed to burst for 30 seconds at 100% before throttling. It also has only 1GB of memory.
As you can see from the above CPU metrics, just before 6:00pm most of the metrics stop returning data. This data is gathered and returned by the Google “ops-agent” that is installed on the instance. It’s also worth noting that just after 5:30pm there is a little CPU growth before it plateaus again.
The Disk metrics show the load on the server increasing drastically at the same time. Note that this was only the OS disk (not the data disk).
The story is the same with the Process metrics. However, as you can see, there is a process for “unattended-upgrade” that was caught just before the server stopped logging. It might have been a fluke that this was caught, but the same general issue (server upgrades, that is) can be seen in the logs as well.
From the logging you can see two messages: “Starting Daily apt upgrade and clean activities…” and “Starting Clean php session files…”. The “DeadlineExceeded” messages after this simply indicate that the VM has frozen or crashed out.
I did some digging into this and found that by default the Ubuntu 22.04 images from Google have the auto upgrade setting enabled.
In the above image the auto upgrade settings are stored in a file under the “/etc/apt/apt.conf.d” directory. Setting the values for both of these to “0” will disable them.
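As a rough illustration of that change, here is a minimal Python sketch that sets both values to “0” in the text of such a config file. I am assuming the two keys are “APT::Periodic::Update-Package-Lists” and “APT::Periodic::Unattended-Upgrade”, which is what a stock Ubuntu install normally uses (usually in a file called “20auto-upgrades”), so check your own file first. The function name and the sample text are made up for the example, and it only works on a string rather than touching the real file.

```python
import re

# Assumption: these are the two APT::Periodic keys that control the
# automatic package list fetch and the unattended upgrade run.
PERIODIC_KEYS = ("Update-Package-Lists", "Unattended-Upgrade")


def disable_periodic(config_text: str) -> str:
    """Return config_text with the two APT::Periodic values set to "0"."""
    pattern = re.compile(
        r'^(\s*APT::Periodic::(?:%s)\s+)"[^"]*"(\s*;)' % "|".join(PERIODIC_KEYS),
        re.MULTILINE,
    )
    # Keep the key and the trailing semicolon, swap only the quoted value.
    return pattern.sub(r'\g<1>"0"\g<2>', config_text)


# Hypothetical file contents, used only to demonstrate the function.
sample = (
    'APT::Periodic::Update-Package-Lists "1";\n'
    'APT::Periodic::Unattended-Upgrade "1";\n'
)

print(disable_periodic(sample))
```

On the real VM you would read the file, pass its contents through the function and write the result back as root, or simply edit the two values by hand.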
To verify this I disabled the auto fetch and auto update settings for the VM to see if it would still have issues, and found that it did not. Manually updating the VM did not cause any issues either. I suspect there is a resource clash here, where the auto update uses more resources than are available while the VM is also hosting the blog and its associated resources.
Outcomes
Although this fixed the issue (for now), I still have to be careful with what I do with this VM. Using too much CPU (say for scaling images) can freeze it just as the auto updates did.
The documentation states that after the CPU burst period has been exhausted the instance goes back to its existing limits. However, when this server runs out of burst credits it seems like it does not have enough CPU resources to even run.
It kind of makes sense when you think about it though, as you are limited in CPU TIME not FREQUENCY. If you instead only need a lot of CPU for 15 seconds out of the 30 seconds, this is effectively a 50% CPU boost for 30 seconds.
If you’re not following, I think this instance is always technically CPU boosting in its “base” configuration. So it’s not using enough to get throttled in normal operation, but normal operation requires “some” boost (from the database probably) to run at all. This would also explain why “resetting” the instance brings it back, whilst waiting for it to recover on its own never works.
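To put numbers on that, here is a tiny Python sketch of the arithmetic. It is only my way of picturing it, not how Google actually accounts for burst time, and the function name and parameters are made up for the example.

```python
def average_utilisation(busy_seconds: float, window_seconds: float,
                        busy_level: float = 1.0) -> float:
    """Average CPU level over a window when the CPU runs at busy_level
    for busy_seconds and is idle for the rest of the window."""
    return busy_level * busy_seconds / window_seconds


# 15 seconds at 100% inside a 30 second window averages out to 50%.
print(average_utilisation(15, 30))  # 0.5
```

Running flat out for the whole 30 seconds gives 1.0, which is the case that uses up the full burst allowance.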