7 Common Reasons For Office Downtime That Could Be Avoided

office downtime

Is there anything worse than when a critical application goes down? A lot of time is wasted getting to the bottom of the problem and rectifying it.

Office downtime can lead to a bad user experience, which can even result in lost customers. Ultimately, it costs your business money!

To prevent office downtime from happening, you need to understand the most common causes so you can put steps in place to prevent them from occurring.

So, let’s take a look at some of the common causes in further detail, as well as some tips on how to avoid office downtime.

 


Power failure – Let’s start with the most obvious cause of an application, or indeed your entire operation, shutting down: a lack of power.

This isn’t something that happens often, but when it does, the disruption is massive. You find yourself on the phone to the supplier, waiting for updates, which take ages.

You can then find your offices without power for the rest of the day while the utility company get to the bottom of the problem.

This is why it is a good idea to have a backup power source. Or, make connections with a company like Rental Power who offer generator hire so that you have an easy solution to the problem if it does occur.

 

Failure domains – Another common reason for office downtime is failure domains. A failure domain is a chunk of your infrastructure whose components can all fail together, at the same time.

Needless to say, this can be damaging to any business, and so it is important to think ahead of time if you are to mitigate this issue. Try to determine what areas could be your failure domains.

Once you have done this, you can keep your data and your backups in different failure domains, so that a single failure cannot take out both.
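As a rough illustration of that check, the sketch below lists which failure domains each component depends on and reports the ones two components share. The inventory, the component names, and the helper functions are all hypothetical; a real office would fill this in from its own buildings, racks, and power feeds.

```python
from typing import Dict, Set

# Hypothetical inventory: each component lists the failure domains it depends on.
INVENTORY: Dict[str, Set[str]] = {
    "primary-db": {"building-a", "rack-12", "power-feed-1"},
    "local-backup": {"building-a", "rack-14", "power-feed-1"},
    "offsite-backup": {"building-b", "rack-03", "power-feed-7"},
}


def shared_domains(inventory: Dict[str, Set[str]], first: str, second: str) -> Set[str]:
    """Return the failure domains that both components depend on."""
    return inventory[first] & inventory[second]


def is_isolated(inventory: Dict[str, Set[str]], first: str, second: str) -> bool:
    """True when no single failure domain can take out both components."""
    return not shared_domains(inventory, first, second)
```

In this made-up inventory, the local backup shares a building and a power feed with the primary database, so only the offsite backup would survive a failure of either.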

 

Bad deployment – Most experts agree that this is one of the main causes of outages at most businesses and organisations.

Deploying bad code is bound to result in an outage, and the best way to deal with this is to know how you are going to recover if this does happen.

Firstly, you need to know when a deploy is happening, and then you need to have a tried, tested, and true rollback process in place.

A progressive rollout can help by being slow enough to pick up on any issues that develop slowly. A good ramp-up strategy is to go from one per cent, to five per cent, to ten per cent, to 50 per cent, and finally, to 100 per cent.
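The ramp-up described above could be sketched as follows. This is a minimal sketch, not a full deployment tool: the function names are illustrative, and it assumes each user has a stable string ID that can be hashed into one of 100 buckets, so that a user who receives the new version at five per cent still receives it at ten per cent.

```python
import hashlib

# Ramp-up stages from the text, as percentages of traffic.
RAMP_STAGES = [1, 5, 10, 50, 100]


def bucket_for(user_id: str) -> int:
    """Map a user to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def in_rollout(user_id: str, percent: int) -> bool:
    """Return True if this user should receive the new version."""
    return bucket_for(user_id) < percent


def next_stage(current: int, healthy: bool) -> int:
    """Advance one stage when healthy; drop to 0 (full rollback) otherwise."""
    if not healthy:
        return 0
    later = [stage for stage in RAMP_STAGES if stage > current]
    return later[0] if later else current
```

How "healthy" is decided (error rates, latency, support tickets) is left open here; the point is only that each step up is gated on a check, and that a failed check leads straight back to the rollback process.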

 


Uneven sharding – This happens when one shard is more popular than the others, which makes it busier. Hot spotting can also be a cause of this.

You will need to reshard in order to fix this, which can be done with one of two approaches. The first is by utilising a shard map to figure out which shards will become the most popular.

The second option is to split shards when they have become too big.
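To make the second option concrete, here is a small sketch of a range-based shard map that splits a shard in two once it holds too many keys. The `ShardMap` class and its threshold are hypothetical, and the sketch assumes non-negative integer keys held in memory; a real datastore would also have to move the data between machines.

```python
from bisect import bisect_right


class ShardMap:
    """Maps non-negative integer keys to shards by range."""

    def __init__(self, max_keys_per_shard: int):
        self.max_keys = max_keys_per_shard
        self.starts = [0]      # sorted lower bound of each shard's key range
        self.keys = [set()]    # keys currently stored in each shard

    def shard_for(self, key: int) -> int:
        """Return the index of the shard whose range contains the key."""
        return bisect_right(self.starts, key) - 1

    def put(self, key: int) -> None:
        index = self.shard_for(key)
        self.keys[index].add(key)
        if len(self.keys[index]) > self.max_keys:
            self._split(index)

    def _split(self, index: int) -> None:
        """Split an oversized shard at its median key."""
        ordered = sorted(self.keys[index])
        middle = ordered[len(ordered) // 2]
        lower = {key for key in ordered if key < middle}
        upper = {key for key in ordered if key >= middle}
        self.keys[index] = lower
        self.starts.insert(index + 1, middle)
        self.keys.insert(index + 1, upper)
```

Splitting at the median key keeps the two halves roughly equal in size, although it does not by itself balance request traffic if a handful of keys are much hotter than the rest.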

 

Bad dependency – This happens when a dependency becomes incredibly slow and requests start to pile up behind it. This issue tends to occur when there are errors in how your application’s inputs and outputs are communicated.

So that your client does not overload your backend systems, it needs to be a defensive driver. You could also tackle this issue with dynamic load shedding.
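One common way to make a client "defensive" is a circuit breaker, which is a technique swapped in here for illustration rather than something the advice above names. The sketch below stops calling a dependency after a run of failures and fails fast until a cooldown has passed; the class names, thresholds, and the injectable `clock` parameter are all assumptions.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker refuses to call the dependency."""


class CircuitBreaker:
    def __init__(self, max_failures=3, cooldown_seconds=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # time the breaker last opened, or None if closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown_seconds:
                # Fail fast instead of piling more requests onto the dependency.
                raise CircuitOpenError("dependency is cooling down")
            # Cooldown elapsed: allow a trial call; one more failure re-opens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            raise
        self.failures = 0
        return result
```

A caller would wrap each request to the slow dependency in `breaker.call(...)` and treat `CircuitOpenError` as a signal to show a fallback rather than wait.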

Going forward though, it is a good idea to mirror a disaster scenario with massive and sudden load tests so that you can determine where the pain points are going to be.

 

Retry spikes – Once you begin to reject users, they are going to start retrying because they do not understand why they have been rejected.

Clients do not know the difference between a broken service and a single failure. The issue with this is that it can result in a cascading failure. So, how do you go about fixing this type of problem?

You will need to implement aggressive back-off. You also need to plan effectively for when the service is likely to be overloaded, so this is something that will need looking into in further detail.
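A back-off policy along these lines might look like the sketch below. It uses exponential back-off with random jitter, which is one common choice rather than the only one; the function names, the half-second base delay, and the 60-second cap are assumptions for illustration.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=60.0, rng=random.random):
    """Return a random delay up to base * 2**attempt seconds, capped at cap."""
    ceiling = min(cap, base * 2 ** attempt)
    return rng() * ceiling


def call_with_backoff(func, attempts=5, base=0.5, cap=60.0,
                      sleep=time.sleep, rng=random.random):
    """Call func, waiting longer after each failure; re-raise the last error."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(backoff_delay(attempt, base, cap, rng))
```

The random jitter matters as much as the growing delay: without it, every rejected client retries at the same moments, and the retry spike simply repeats itself.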

 


Overload – Last but not least, this blog post would not be complete without mentioning the issue of system overload. This is, quite simply, when capacity is exceeded by demand.

This is undoubtedly a common reason for office downtime problems because you are trying to do too much for what is available, or your popularity is exceeding the capability of your business.

One of the best approaches to deal with this is load shedding. This is a term used to describe shedding off the excess load, which will enable you to deal with this extra demand before it crushes your application.

When you hit capacity, before doing any work, return errors. You should calculate your service’s capacity so you can figure out what your application server can handle, and then you can dish out an error to every request that goes past this limit.
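A minimal sketch of that idea, assuming capacity is measured as the number of requests being handled at once: the `LoadShedder` class, the `handle` function, and the HTTP-style status codes are hypothetical, and the capacity figure would come from your own measurements.

```python
import threading


class LoadShedder:
    """Admits requests up to a fixed capacity and rejects the rest."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_flight = 0
        self.lock = threading.Lock()

    def try_acquire(self) -> bool:
        """Reserve a slot if one is free; return False when at capacity."""
        with self.lock:
            if self.in_flight >= self.capacity:
                return False
            self.in_flight += 1
            return True

    def release(self) -> None:
        with self.lock:
            self.in_flight -= 1


def handle(shedder: LoadShedder, work):
    """Return an error before doing any work when the server is full."""
    if not shedder.try_acquire():
        return 503, "overloaded, please retry later"
    try:
        return 200, work()
    finally:
        shedder.release()
```

The capacity check happens before `work()` is called, so a rejected request costs almost nothing, which is what lets the server keep serving the requests it has already accepted.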


Hopefully, you now have a better understanding of some of the most common causes of office downtime.

If you prepare for these scenarios before they happen, you can make sure you are best placed to deal with them when they do arise so that your business does not encounter too much downtime.

Or, better still, you can make sure such situations do not unfold in the first place.
