In the fast-changing technology world, the rise of cloud hosting (IaaS, PaaS, SaaS) and the pay-as-you-go model is fascinating. But as the saying goes, "all that glitters is not gold", and the same applies here: there are pros and cons, hidden pitfalls, and challenges in using them. It is important for an organisation's IT community to identify and address those challenges before designing a solution architecture.
I have worked on various critical client systems, such as low-latency financial trading platforms, very high-volume transactional and batch processing systems, multi-terabyte data marts, and reporting and decision support systems. Sometimes these systems go wrong: performance goes downhill after a new release is deployed, after small code changes, or because of infrastructure issues. Some of the impacts can be disastrous for the organisation.
Recently I looked into a performance issue with one SQL statement dynamically generated by a reporting tool. This tool has hundreds of reports that are executed thousands of times a day. This particular report's SQL started behaving badly after a release was deployed on the reporting server. I found the report had been running overnight for almost 12 hours with no sign of ending, hammering the IO subsystem, so I decided to kill it.
When I looked at some of the process statistics to see what exactly it was doing, it blew my mind: the report had done a massive ~9 quintillion IOs. What! ~9 quintillion. All it was doing was physical reads, and it seemed to have gone into an infinite loop.
*** Physical read - an IO request that goes to disk to fetch the data.
*** 9,223,372,036,845,683,031 ( nine quintillion two hundred and twenty-three quadrillion three hundred and seventy-two trillion thirty-six billion eight hundred and forty-five million six hundred and eighty-three thousand and thirty-one )
Long-running reports are not a big deal; some of my clients have much longer-running reports when they do data mining or deep analysis on a very large data set. But for this report the data set was not that big: we are talking about a database of around 1 TB. Anyway, we later managed to fix the report by rewriting some of the SQL code.
Recently the client asked us to explore the option of moving to a cloud PaaS solution.
So I started working out how much this report would have cost if I had used a cloud provider's PaaS service. In this case I'm using AWS prices for comparison (nothing against AWS; they are awesome for the service they provide).
I looked into the AWS Aurora DB service charges for compute, IO requests and data transfer out of the cloud. Let's park compute and data transfer for now and just focus on IO cost.
AWS Aurora IO charge: $0.20 per 1 million IO requests (read/write)
So how much would this particular report have cost us?
Total IO (in millions of requests, rounded down) = 9,223,372,036,845,683,031 / 1,000,000 = ~9,223,372,036,845
IO Cost = 9,223,372,036,845 * $0.20 = $1,844,674,407,369
*** ( one trillion eight hundred and forty-four billion six hundred and seventy-four million four hundred and seven thousand three hundred and sixty-nine dollars )
Wow! $1.8 trillion for a single query, which could bankrupt any organisation, large or small.
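The arithmetic above can be reproduced in a few lines of Python. This is only a minimal sketch, assuming the $0.20 per 1 million IO requests rate quoted earlier; the name io_cost is mine and purely illustrative. It uses exact decimal arithmetic, so it keeps the fraction that the hand calculation rounds away and the result differs by a few cents.

```python
from decimal import Decimal

# Rate quoted in this article; always check the provider's current price list.
PRICE_PER_MILLION_IO = Decimal("0.20")


def io_cost(io_requests: int, price_per_million: Decimal = PRICE_PER_MILLION_IO) -> Decimal:
    """Return the IO charge in dollars, rounded to cents."""
    millions = Decimal(io_requests) / Decimal(1_000_000)
    return (millions * price_per_million).quantize(Decimal("0.01"))


# Physical reads recorded for the runaway report.
physical_reads = 9_223_372_036_845_683_031
cost = io_cost(physical_reads)
print(f"IO cost: ${cost:,}")
```

The same function can be pointed at the IO counts of ordinary reports to see what a normal day would cost.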
Knowing that the SQL was generated by a reporting tool in error, the cloud provider might not have charged my client for an unintentional mistake. But it is still a legitimate charge for the service used, and you are at the mercy of the cloud provider.
What is the solution?
To avoid similar pitfalls, each application and system needs to be thoroughly reviewed against the 5 pillars of cloud architecture:
1. Operational excellence
2. Security
3. Cost effectiveness
4. Resilience
5. Performance
Do this review before moving them to a cloud IaaS, PaaS or SaaS database solution, and compare and understand hidden costs of this kind.
You can do some basic checks and capture some basic statistics, such as:
- Data transfer in/out
- Total IOPS
- Compute
- Memory
- Cost
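Once those statistics are captured, they can be turned into a rough cost figure. The sketch below is one hypothetical way to do it: the class and function names are my own, the compute and data transfer rates are placeholders to be replaced with the provider's published prices, and I assume memory is priced as part of compute rather than separately.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class WorkloadStats:
    io_requests: int        # total read/write IO requests in the period
    compute_hours: Decimal  # instance hours in the period
    data_out_gb: Decimal    # data transferred out of the cloud, in GB


@dataclass
class Rates:
    per_million_io: Decimal    # e.g. the $0.20 quoted in this article
    per_compute_hour: Decimal  # placeholder, take from the provider's price list
    per_gb_out: Decimal        # placeholder, take from the provider's price list


def estimate_cost(stats: WorkloadStats, rates: Rates) -> dict:
    """Break the estimated charge down by IO, compute and data transfer."""
    io = Decimal(stats.io_requests) / Decimal(1_000_000) * rates.per_million_io
    compute = stats.compute_hours * rates.per_compute_hour
    transfer = stats.data_out_gb * rates.per_gb_out
    return {"io": io, "compute": compute, "data_out": transfer,
            "total": io + compute + transfer}


# Made-up numbers, for illustration only.
example = estimate_cost(
    WorkloadStats(io_requests=5_000_000, compute_hours=Decimal("10"), data_out_gb=Decimal("100")),
    Rates(per_million_io=Decimal("0.20"), per_compute_hour=Decimal("0.50"), per_gb_out=Decimal("0.10")),
)
print(example)
```

Running this per report, rather than per database, makes it easier to spot the one query that dominates the bill.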
Once you are happy and have moved to the cloud, also consider configuring the following on SaaS and PaaS solutions:
- Cost Cap at billing level
- Monitoring - CloudWatch
- Auto Scale Up/Down
- Memory (to avoid Physical IOs)
- Resource governance at the database level to restrict data fetches etc.
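To make the idea of a cap concrete, here is a small application-side sketch that converts a dollar budget into a maximum number of IO requests and flags a query that has gone past it. It is an illustration under my own assumptions, not a feature of any cloud provider or database: the function names are hypothetical and the default rate is the $0.20 per 1 million IO requests used earlier. In practice the database's own resource governance and the provider's billing alerts should do this job.

```python
from decimal import Decimal


def max_io_for_budget(budget_usd: str, price_per_million: str = "0.20") -> int:
    """Largest number of IO requests that fits within the dollar budget."""
    return int(Decimal(budget_usd) / Decimal(price_per_million) * 1_000_000)


def should_stop(io_so_far: int, budget_usd: str, price_per_million: str = "0.20") -> bool:
    """True when a query has already used more IO than the budget allows."""
    return io_so_far > max_io_for_budget(budget_usd, price_per_million)


# A $50 budget per report run is an arbitrary example value.
print(max_io_for_budget("50"))
print(should_stop(9_223_372_036_845_683_031, "50"))
```

A check like this, run against the session statistics every few minutes, would have stopped the runaway report long before the 12-hour mark.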