A well-developed methodology helps data center operators provide high service quality despite unforeseen situations and local constraints
In today’s hyper-connected and digitized environment, data center downtime can result in significant business loss. For this reason, one of the top priorities of CIOs (Chief Information Officers) is to ensure that systems and SLAs (Service Level Agreements) are in place that give peace of mind as far as data availability is concerned, so that the company can concentrate on its core business. For their part, most top-tier Internet Infrastructure companies, especially hyperscalers, strive to provide availability guarantees of at least 99.999 per cent to their customers. A 99.999 per cent availability, or “five 9s” as it is called in industry parlance, translates to an average downtime of just over five minutes per year [1].
While providing such a business guarantee on uptime and availability through SLAs is rather easy, achieving it operationally is not. Doing so year in, year out, at times in environments where the customer has operations across several countries, each with its own constraints and peculiarities, requires not only hard work but also the right systems, people, practices, and principles that can ensure an environment of high-quality service assurance.
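The arithmetic behind the “five 9s” figure is simple enough to sketch. The snippet below is a minimal illustration, not part of any SLA tooling: the function name `annual_downtime_minutes` and the 365-day year are assumptions made for the example. It converts an availability percentage into the downtime it permits per year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; assumes a non-leap year


def annual_downtime_minutes(availability_percent: float) -> float:
    """Return the downtime per year, in minutes, permitted by an availability target."""
    if not 0 <= availability_percent <= 100:
        raise ValueError("availability must be between 0 and 100 per cent")
    return (1 - availability_percent / 100) * MINUTES_PER_YEAR


# Compare three, four and five 9s.
for target in (99.9, 99.99, 99.999):
    print(f"{target}% availability allows {annual_downtime_minutes(target):.2f} minutes per year")
```

Under these assumptions, each additional 9 cuts the permitted downtime by a factor of ten, and five 9s works out to a little over five minutes per year.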
In a nutshell, there is a requirement for a culture of Operational Excellence (OE) throughout the company. OE allows organizations to quantifiably measure every aspect of their operations and improve these based on these measurements. This is a continuous process which successful companies use to improve their systems and processes by making sure all operational aspects are under control.
OE is by no means unique to the Internet Infrastructure and data center industry. It is a central pillar of today’s industrial and infrastructure landscape as organizations scramble to digitize and adopt Industry 4.0 methodologies [2]. However, it is especially critical for data centers because they are, by their very nature, secured and controlled environments for critical server, network and storage equipment. In today’s digitalized business environment, data centers support all aspects of the enterprise irrespective of the nature of the business, be it online sales, the public sector, or manufacturing that relies on a smooth digitalized supply chain.
For the data center industry, OE encompasses a host of measures that are required to avoid any form of unplanned outage that can affect service availability or the resiliency of the network. One could argue that OE is the raison d’être for data center companies because, without OE, they are just real estate firms with expensive gear.
Building blocks
Operational Excellence is the building block of the infrastructure that underpins the internet and all the applications, content and transactions that we have come to rely on and take for granted. Unplanned outages of any nature, however small or infrequent, undermine the reliability of the services that run on the internet.
It is worth noting that companies become colocation data center customers because they don’t want to manage the complexities of an in-house data center themselves. Dealing with the associated issues, such as temperature control, power requirements and service guarantees, would distract them from their core business [3].
Customers gravitate towards colocation data center providers who have the systems and quality assurance that come from a comprehensive OE culture, because then they don’t have to worry about potential outages caused by fluctuations in power, temperature and the countless other metrics that need to stay within limits in a modern data center.
While the approach followed by various companies to achieve OE may differ in emphasis and focus, all look to achieve some broad goals. These include ensuring a safety-first approach, optimization of processes using a skilled workforce, automated maintenance management, early risk mitigation, quality assurance and compliance, and a culture of continuous innovation [4].
Building a good OE system
How does one build a good OE system? First of all, one has to start from the stated outcome: what does OE mean in quantitative terms (downtime, variability, time to restore, and other parameters)? This comes from a deep understanding of what customers want now and what they are likely to want in the future. This is followed by a good understanding of the local realities: availability of talent, vendor quality, reliability of the infrastructure, and other parameters.
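As a rough illustration of what “quantitative terms” can look like, the sketch below summarizes a list of outage durations into total downtime, availability, mean time to restore, and a simple measure of variability. The function `summarize_outages`, the choice of population standard deviation for variability, and the sample durations are all hypothetical; they are not PDG’s definitions or data.

```python
from statistics import mean, pstdev

MINUTES_PER_YEAR = 365 * 24 * 60  # assumes a non-leap year


def summarize_outages(outage_minutes, period_minutes=MINUTES_PER_YEAR):
    """Summarize outage durations (in minutes) observed over one reporting period."""
    total = sum(outage_minutes)
    return {
        "total_downtime_min": total,
        "availability_pct": 100 * (1 - total / period_minutes),
        # Mean time to restore: average duration of an outage.
        "mean_time_to_restore_min": mean(outage_minutes) if outage_minutes else 0.0,
        # Variability: spread of restore times (population standard deviation).
        "restore_time_stdev_min": pstdev(outage_minutes) if outage_minutes else 0.0,
    }


# Made-up example: three short outages in one year.
sample_outages = [1.5, 0.5, 2.0]
for name, value in summarize_outages(sample_outages).items():
    print(f"{name}: {value:.5f}")
```

With targets expressed this way, the stated outcome can be compared against measured results period by period, which is what makes continuous improvement possible.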
Once the data is in, it helps companies to develop a comprehensive OE strategy – which, shorn of jargon, is nothing but the development of an approach towards delivering the desired outcome while being cognizant of local realities.
This strategy then has to be executed on a consistent basis through a combination of having the right people in place and making the appropriate investments to enable them to deliver OE.
Princeton Digital Group (PDG) follows what it calls the three Ps of OE: People, Principles and Practices.
People are the most critical element for ensuring OE. At a data center facility level, one would ideally want to remove dependence on human beings, because where there are humans, there are inevitably human errors. The reality, however, is that the application of OE is heavily dependent on talented, experienced people designing the appropriate contingencies and stress tests to ensure that the data center is operationally ready for all eventualities. The right people are a necessary condition for delivering OE.
Strong principles
In a multi-country, multi-site business like PDG’s, there is a need to develop strong principles that pervade the entire operational model. This is critical because the company’s customers expect consistency in data center services, whether in Shanghai, Mumbai, Jakarta or Singapore. These principles help teams make the right decisions when there is a need to make trade-offs or when there are resource constraints. While customization based on local country-level requirements is important, it is critical that OE principles are not diluted at any cost.
Practices are what drive the conversion of theory into reality at PDG, and they are not just internal but also apply to the company’s vendors and partners. Strong practices mean that an OE habit is developed throughout the organization as well as among vendors in the supply chain.
There are several challenges that are common to the data center business across countries – and certain challenges are country or even city specific. However, the one challenge that has been common across the board in the last year has been delivering OE during a worldwide pandemic.
The industry has experienced an unprecedented force majeure situation for more than 12 months now, and it has exacerbated everything that is hard about delivering OE in the data center business. PDG’s signal achievements during this year have been delivering an entirely new site in Shanghai and a significant expansion in Singapore while operating existing capacity in both Singapore and Indonesia, all the while delivering outstanding customer service and satisfaction.
About PDG
Princeton Digital Group (PDG) is a Warburg Pincus-backed investor, developer and operator of internet infrastructure. Our portfolio of data centers powers the expansion of hyperscalers and enterprises in the world’s fastest-growing digital economies. Our agility, speed and unmatched experience in scaling global internet infrastructure provide our partners and customers immediate access to growth opportunities across Asia.