Reading Ernie’s post today inspired me to write about something I’ve been wanting to write about for a while: how much of a performance impact you should expect from network latency.
Hypothetical Numbers
I’m going to throw out some web-application ‘back of the envelope’ numbers which I will then use for examples:
- The goal is to generate a page within 200-500ms.
- Network latency between the application server and DB server is 0.5-1ms.
There will be use-cases which have goals far more agressive than this, for example advertising server goals are closer to 50ms. I have also seen network latency well below 1ms. I encourage commenters to correct me where they disagree 😉
Analysis
So if we say that a typical web page requires 5-10 queries to render, you can see that the amount of time spent in network latency could actually be very low:
10 queries @ 1ms each = 10ms, or 5% of our 200ms goal. 10 queries @ 0.5ms each = 5ms, or 2.5% of our 200ms goal.
However, what I want to do is illustrate the effects of the N+1 query pattern. That is, some applications will naively repeat a query inside a looping structure. So using the numbers we have established, lets take a look at the difference between:
SELECT * FROM Country
– 239 rows in 1 query.SELECT * FROM Country WHERE code = '?'
– 239 queries with one row each.
Using performance_schema
+ ps_helper
, I can show the time that it takes to execute on the server:
mysql> select * from statement_analysis\G *************************** 4. row *************************** query: SELECT * FROM `country` full_scan: * exec_count: 3 err_count: 0 warn_count: 0 total_latency: 3.41 ms max_latency: 1.17 ms avg_latency: 1.14 ms rows_sent: 717 rows_sent_avg: 239 rows_scanned: 717 digest: 53567ecd08977b34a4532202a10871f4 *************************** 6. row *************************** query: SELECT * FROM `country` WHERE CODE = ? full_scan: exec_count: 5 err_count: 0 warn_count: 0 total_latency: 1.19 ms max_latency: 249.08 us avg_latency: 238.78 us rows_sent: 5 rows_sent_avg: 1 rows_scanned: 5 digest: cc32c7d6ec9e2803dea1ff95f458520a
Because the SELECT * FROM Country
retrieves every row, it is going to be a table scan (which is quite efficient in this case). We can see that the query time is 1.14ms to retrieve all 239 rows versus 0.239ms to retrieve just a single row.
Retrieving all 239 rows from MySQL as primary key lookups takes 0.239ms * 239 = 57ms.
However, once you account for 0.5ms network latency:
* The single step retrieval is 1.14 + 0.5 = 1.64ms.
* Row-at-a-time retrieval is 119.5ms + 57ms = 176.5ms.
And with 1ms network latency:
* The single step retrieval is 1.14 + 1 = 2.14ms.
* Row-at-a-time retrieval is 239ms + 57ms = 296ms.
This example is largely the “worst-case” situation, which I am using to illustrate an example. 2.14ms versus 296ms isn’t even comparable 🙂 To apply context – this difference will basically double my page generation goal of 200-500ms.
Merging Queries
My next example to run through, is what happens when we merge queries together. In this case retrieving multiple rows via primary key at once:
SELECT * FROM Country WHERE code IN ('AUS', 'USA', 'CAN');
instead of:
SELECT * FROM Country WHERE code = 'AUS'; SELECT * FROM Country WHERE code = 'USA'; SELECT * FROM Country WHERE code = 'CAN';
We have the time for the single row query above. Here is the timing information from ps_helper for the IN list query:
*************************** 8. row *************************** query: SELECT * FROM `Country` WHERE CODE IN (...) full_scan: exec_count: 4 err_count: 0 warn_count: 0 total_latency: 1.19 ms max_latency: 318.66 us avg_latency: 296.59 us rows_sent: 12 rows_sent_avg: 3 rows_scanned: 12 digest: b19ca11697506fac486dd35535c37c32
With 0.5ms network latency total time is:
* (0.29659ms + 1 roundtrip) = 0.8 ms total
* (0.23878ms x 3 + 3 roundtrips) = 2.2ms total
With 1ms network latency:
* (0.29659ms + 1 roundtrip) = 1.3 ms total
* (0.23878ms x 3 + 3 roundtrips) = 3.7ms total
So by merging 3 queries together, I very nearly received a 3x reduction in query time.
Another was to look at this, is that for simple queries the execution cost can be near-free on the MySQL side. Network latency really can matter.