Folks, I hope everyone is having some quality time with their families and loved ones. I wanted to give you a quick update on the events of the 24th.
As you know, on the 14th we encountered a hardware malfunction at one of our Data Center vendors. The vendor remedied the situation in a matter of hours and we were back in full swing. We thought the issue was resolved for good. On the 24th at approximately 1 PM PST, our engineers saw the same problem with the same vendor starting to resurface. Over the course of the next three hours the vendor’s engineers came up with ideas and hypotheses on how to resolve the issue, but no one gave us a solid answer as to WHY it had reappeared. The issue was identified as a hardware problem, but it was not clear where exactly the failure was happening.
In order to put this uncertainty behind us, we decided to migrate our entire server array to a new region, which in essence means that we are now running the entire LivePOS system on a new set of hardware, leaving the “bad” hardware behind. Once the last store was confirmed closed, at approximately 9 PM PST, LivePOS engineers started to migrate the array as planned. The team worked throughout the night, and at approximately 2 PM the following day the migration was 97% complete and LivePOS.com, Dashboard and POS transactions were all back online. The remaining 3% was completed that night, bringing the service to 100%.
With the offline feature working as designed, most of you were ringing up customers and collecting payments without delays. However, some of you had to close and go home before all of your transactions had synced. When the stores re-opened Saturday morning, those sales automatically synced and posted on the correct date of the 24th, showing you a blue logo.
With monster companies like Facebook and Netflix (which also uses AWS) going down a few times every year, no cloud solution can be fully immune to this problem. It is a simple fact of life, and while such outages are rare, we at LivePOS want to make sure that we are better prepared if this ever happens again.
In the next few weeks we are going to revisit our offline procedures and add enhancements and capabilities to the LivePOS offline system. We will post enhancements and feature updates in our weekly Friday email newsletter. Also, where applicable, we will be reaching out to those of you who were affected more than others to offer some goodwill compensation.
As I indicated in my last posting, I have been with LivePOS for over 10 years now, and to my knowledge this is the first time in a decade (!) that we have had to deal with Data Center issues TWICE in the same month. While this is not comforting in any way, it helps to know that these events are rare. Very rare, and like Murphy’s law, they always show up at the worst time.
I want to thank you again for your patience and understanding. In the past few days I have received many customer emails indicating that while the downtime was unpleasant, they were thankful for the constant updates (via our status page) and quick remedy of the situation. Thank you all for your kind words.
I want to wish you and your family a great new year, full of health and wealth.
David Miller, Head of IT, LivePOS