When does one scale?
When one hits a bottleneck. That’s when.
What is scaled in that scenario?
Only the instance running the service which has become the bottleneck.
(You don’t scale every instance of the whole goddamn architecture, do you?)

The respective services in an architecture will be scaled in the order in which their bottlenecks occur.

So in the long run, to forecast costs accurately and precisely, one has to predict when each service will become the bottleneck in the architecture, along with the order in which these services will exhaust their capacity.
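To make the "which service, in which order" idea concrete, here is a minimal sketch. It assumes each service's capacity can be expressed as active users per instance, which is a simplification; the service names and numbers are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Service:
    name: str
    users_per_instance: int  # assumed capacity of a single instance
    instances: int           # instances currently running


def bottleneck_order(services):
    """Return (name, user count at which capacity runs out), earliest first."""
    limits = [(s.name, s.users_per_instance * s.instances) for s in services]
    return sorted(limits, key=lambda pair: pair[1])


# Hypothetical architecture; every figure here is invented.
services = [
    Service("api", users_per_instance=2000, instances=2),
    Service("database", users_per_instance=10000, instances=1),
    Service("cache", users_per_instance=3000, instances=1),
]

order = bottleneck_order(services)
for name, limit in order:
    print(f"{name} becomes the bottleneck at about {limit} active users")
```

With these made-up numbers the cache runs out first, then the API, then the database. After each scaling step the list has to be recomputed, because the order can change.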

The only way, then, is to estimate each service's upper cap and link it directly to the number of active users (in the case of B2C). Then multiply for each service and sum it all.
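Roughly, the multiply-and-sum estimate looks like the sketch below. The per-instance capacities and monthly prices are placeholders I invented, not real AWS pricing, and the function name is hypothetical.

```python
import math


def forecast_monthly_cost(active_users, services):
    """Sum, over all services, the instances needed times the price per instance.

    `services` maps a service name to (users_per_instance, cost_per_instance).
    """
    total = 0.0
    for users_per_instance, cost_per_instance in services.values():
        # At least one instance is kept running even with zero users.
        instances = max(1, math.ceil(active_users / users_per_instance))
        total += instances * cost_per_instance
    return total


# Placeholder numbers: (assumed users per instance, assumed monthly cost).
services = {
    "api": (2000, 30.0),
    "database": (10000, 120.0),
    "cache": (3000, 25.0),
}

cost = forecast_monthly_cost(12000, services)
print(cost)  # 6*30 + 2*120 + 4*25 = 520.0
```

The estimate is only as good as the per-instance caps you feed in, which is exactly the weak point of the method.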


Or you could do performance testing: spend a lot of money and effort in the short run to get a better estimate.

I had a discussion about this with a Solutions Architect working at Amazon. I was hoping to learn a better way to forecast costs than multiplying for each service and then summing it all.
But I was advised to follow the same method I was already following.

For the whole architecture, calculate LCM for all services. And then multiply for the whole architecture.

For your services running on EC2, first figure out what the bottleneck is, depending on what you're running on the instance.
It may be CPU, RAM, disk space, or network throughput.
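One simple way to pick the instance-level bottleneck is to compare how much of each resource is already used and take the highest ratio. This is only a sketch: the metric names and figures are invented, and in practice you would pull the usage numbers from your monitoring.

```python
def find_bottleneck(usage, capacity):
    """Return the metric with the highest utilization and that utilization."""
    utilization = {metric: usage[metric] / capacity[metric] for metric in capacity}
    metric = max(utilization, key=utilization.get)
    return metric, utilization[metric]


# Hypothetical instance capacity and observed peak usage.
capacity = {"cpu_vcpus": 4, "ram_gib": 16, "disk_gib": 100, "network_gbps": 5}
usage = {"cpu_vcpus": 1.0, "ram_gib": 12, "disk_gib": 40, "network_gbps": 1}

metric, ratio = find_bottleneck(usage, capacity)
print(metric, ratio)  # ram_gib 0.75
```

Here RAM is the metric closest to its limit, so it decides when this instance has to be scaled.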

Now you have found the first bottleneck. Depending on how you decide to scale, this bottleneck might remain the same for the EC2 instance or might change.

So you see, first-level bottlenecks are the services themselves, and second-level bottlenecks are the respective metrics of the instances running these services.
And each can change at any point in time, as software is always alive. You might have a significant change in the architecture, which might increase or decrease the costs again.

Horrible method, if you ask me. Do you know a better one? Ping me!