AliCloud (Aliyun) Server Crashes Fixed with the Help of Site24x7

An Australian client company with a subsidiary in China had set up hosting on AliCloud (Aliyun) for their local website. Since it was set up, the site had suffered frequent outages, usually during the night Australian time, and had to be rebooted most mornings.

We used Site24x7 website monitors (https://www.site24x7.com/website-monitoring.html) to keep an eye on the server and notify us of any outages. That became depressing reading because, despite scouring the logs, we could not find any events that triggered the outages:

Site24x7 website monitor of AliCloud server showing the site outages we had problems with

You can see from the above that in a month the site was down nearly 13% of the time, usually at night, and the frequency seemed to be picking up.

We knew that the server had far more capacity than it should have needed, as we had a very similar site running in other locations on AWS servers with far fewer resources, but we were struggling to find what was causing all these crashes.

Site24x7 also includes server monitors for various server types (https://www.site24x7.com/server-monitoring.html), which with a bit of work on the Linux command line are easily installed (https://www.site24x7.com/linux-monitoring.html).

Looking at the end of the month you can see the unhappy state of the server, which was always teetering on the border of failure. What this monitor did tell us, though, was exactly why. CPU was trickling along at a few percent, but memory was nearly always maxed out, and the crashes occurred at memory peaks, when memory usage hit 95%. That was despite the fact that the server had 2GB of RAM and WordPress was set to use 128MB of it:

Site24x7 Server Monitor showing the physical memory usage and outages

Site24x7 monitor showing memory and a server crash illustrating that the memory peaks before the server crashes.

Another graph in the server memory area showed the memory used, including both the Used Physical Memory and the Swap Memory. You can see from the graph below that swap memory was not being used at all.

Site24x7 Server monitor showing that the used swap memory was zero.

Although swap memory wasn’t being used, we could see that it was available from the Memory Breakup graph (below). You can see here that there is a full 2GB of swap memory available but unused at almost all times.

Site24x7 Server monitor showing that there was free swap memory available, although it was not being used.
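For a quick command-line cross-check of what the monitor graphs show, the same figures can be read from /proc/meminfo. The following is only a sketch: the function name mem_summary is made up for illustration, and "used" memory is taken here as MemTotal minus MemAvailable, which may not match how Site24x7 calculates its percentage.

```shell
#!/bin/sh
# Sketch only: summarise physical memory and swap usage from a
# meminfo-style file. Defaults to /proc/meminfo on the live system.
# mem_summary is a hypothetical helper name, not a standard tool.
mem_summary() {
    awk '
        /^MemTotal:/     { mt = $2 }   # total RAM in kB
        /^MemAvailable:/ { ma = $2 }   # RAM available to new workloads in kB
        /^SwapTotal:/    { st = $2 }   # configured swap in kB
        /^SwapFree:/     { sf = $2 }   # unused swap in kB
        END {
            if (mt > 0) printf "mem_used_pct=%d\n", (mt - ma) * 100 / mt
            printf "swap_total_kb=%d\nswap_used_kb=%d\n", st, st - sf
        }' "${1:-/proc/meminfo}"
}

# Report on the machine this runs on.
mem_summary
```

A high mem_used_pct alongside a non-zero swap_total_kb and a swap_used_kb of 0 would match the pattern in the graphs above: swap is present, but nothing is being moved into it.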

Solution

As root, use “swapon --show” to see if the swap file is on. In my case this confirmed that the swap was there, which matched what the Site24x7 Linux monitor had shown.
As the swap memory is there and we have 2GB allocated, the AliCloud/Aliyun install must have the swap file correctly configured; it is just not being used! To allow the server to use this memory we need to adjust the “swappiness” of the server.
You can check the current swappiness with “cat /proc/sys/vm/swappiness” and, even better, adjust it with “sudo sysctl vm.swappiness=50” (I found 50 to be a good number, but you can adjust to suit).
Leave this running for a few hours and check whether swap memory usage increases in the Site24x7 monitor. In my case that happened within minutes, and physical memory usage dropped dramatically as a result, as you can see below. Note that physical memory usage fell to around half of what we had often been running at previously. You can also see that the crashes stopped.
The last thing to do is to ensure that swappiness is set to the same value, in my case 50, on every reboot. To do this, edit the “sysctl.conf” file in the “/etc/” folder, add the line “vm.swappiness=50” and save the file.
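The persistence step can be sketched as a small shell helper. This is an illustration rather than the exact procedure used here: set_sysctl_line is a hypothetical function name, and the demo deliberately writes to a scratch file instead of the real /etc/sysctl.conf so it can be tried safely without root.

```shell
#!/bin/sh
# Sketch only: add or update a "key=value" line in a sysctl-style config
# file without creating duplicates. set_sysctl_line is a hypothetical helper.
set_sysctl_line() {
    ssl_key=$1
    ssl_value=$2
    ssl_conf=$3
    if grep -q "^[[:space:]]*${ssl_key}[[:space:]]*=" "$ssl_conf" 2>/dev/null; then
        # The key is already present: rewrite that line in place.
        # sed keeps a copy of the previous file as <file>.bak.
        sed -i.bak "s|^[[:space:]]*${ssl_key}[[:space:]]*=.*|${ssl_key}=${ssl_value}|" "$ssl_conf"
    else
        # The key is missing: append it.
        printf '%s=%s\n' "$ssl_key" "$ssl_value" >> "$ssl_conf"
    fi
}

# Demo against a scratch file (assumption: you test here first).
demo_conf=$(mktemp)
set_sysctl_line vm.swappiness 50 "$demo_conf"
cat "$demo_conf"
```

To apply it for real, run the function as root with /etc/sysctl.conf as the third argument, then run “sudo sysctl -p” to load the file without waiting for a reboot. The value 50 is simply the number that worked on this server.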

Site24x7 Monitor showing the activation of virtual memory after swappiness had been set.

Result

The results have been dramatic. The only outages we have seen since are the scheduled reboots I had put in place to restart the server in case of failure (now switched off):

Site24x7 report showing the reduced number of outages on the AliYun Server after the swap file had been activated.

Physical memory usage is now averaging about 65%, so we have plenty of available capacity, which is why we are seeing no crashes. Previously we had averaged around 85% usage.

Site24x7 memory usage graphs showing reduced usage of physical memory after swappiness was set

The likely cause of this issue is a misconfiguration of the Ubuntu machine image on AliCloud. This simple change fixes it, and the result is the fast machine you know is there.

I would also recommend the Site24x7 Linux Server monitor (https://www.site24x7.com/linux-monitoring.html). Without seeing the Site24x7 reports I would not have found this issue or been able to implement a fix so easily.
