SaaS platform re-build

This UK-based internet security customer had over 2,000 servers distributed across 14 data centres worldwide, from Australia and Hong Kong to the UK, the USA and mainland Europe. Its internet-based software service scanned email and websites for viruses, for inappropriate content such as images and words, and for spam. Customers of this SaaS provider ranged from SMEs to large corporate organisations and governments in most countries worldwide. Energy costs in the data centres were rising, and data centre space was at a premium. A software-based solution to the problem existed.

THE CHALLENGE

The company was winning a lot of new business, and every new win meant considering whether to ‘scale up’ its hardware. Scaling up usually required additional servers in more data centres, i.e. a bigger hardware footprint. There was, however, an alternative: re-factoring the code so that CPU and memory utilisation fell far enough to remove the need for additional hardware for several years. This would reduce capex and make processing much more efficient, giving clients a much better service.

Development was taking place in the UK, with testing in Canada, and the team had fallen behind schedule.

Once the software had passed functional and non-functional testing, it needed to be installed (‘cut over’) in data centres worldwide. Some data centres hosted key customers with huge email volumes; these were to be cut over last. Because a ‘big bang’ cut-over was too risky, a schedule setting out which towers (racks of servers) in which data centres would be cut over, and in what order, had to be carefully prioritised and signed off in advance.

ABOUT THE PROJECT

There were many competing priorities. Several key team members had been drawn into planning ‘phase 2’ while phase 1 was clearly in trouble. The test team in Canada needed to increase the pace of testing, yet the number of open defects was rising and defects were not being fixed as quickly as they could be. The team was not short of ‘smart’ people, but it needed a rigorous approach to resolving existing defects and establishing a solid, quality release baseline before touching other aspects of the software.

We added testers to the UK development team; together with the testing operation in Canada, this allowed extended testing cycles.

The safest way to proceed was to deliver the project in two-week increments, each with clear objectives that were achievable but stretched the team.

Close monitoring of progress and the quick removal of any constraints improved morale and gave the team the belief that everyone was pulling in the same direction.

OUR ROLE

Our role was to recover and reorganise the project, and to set clear priorities and clear timescales for completion. Step one involved reallocating all resources to phase 1 until zero defects had been reached. Only then would the team move on to step two: additional software development. We brought members of the operations function into a co-located team, which was given the task of creating a mini network operations centre (NOC). The NOC team could see the results of testing on overhead screens and highlight any unusual or unexpected results. This improved the understanding of the entire team.

As the data centre cut-over progressed, server utilisation dropped dramatically, with no errors. The impact on the company’s bottom line was that no hardware upgrades were forecast to be required for several years. The project was later deemed ‘the most successful project ever delivered’, and the commercial advantages were huge. Several months later, the company was sold in a trade sale to a large US corporation.