New box plot for Luke Donald for the 2015 GP Season up to The Barclays
As above, but showing hover text.
- The minimum value in the data set is the leftmost point (seen as a vertical tick above). This corresponds to the best finish for the golfer in the results analysed.
- The maximum value in the data set is the rightmost point (also seen as a vertical tick above). This corresponds to the worst finish for the golfer in the results analysed.
- The average value in the data set is the red vertical line in the box. This corresponds to the average finish for the golfer in the results analysed.
- I have adapted the box itself to represent one standard deviation either side of the average value. In layman's terms, this is where most* data points in the data set fall. This corresponds to the majority of the golfer's finishing positions in the results analysed.
- The lines that join the box to the minimum and maximum values are known as whiskers. The longer a whisker is, the more of an aberration (outlier) the minimum/maximum value is compared to the majority of the data points. In this case, a long whisker indicates that the best/worst finish is significantly different from the golfer's usual finishing position in the results analysed.
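For anyone curious about the arithmetic, the values described in the list above can be sketched in a few lines of Python. This is purely illustrative and not the code used on the site: the function name `box_plot_values`, the sample finishing positions and the choice of the population standard deviation are all assumptions made for the example.

```python
from statistics import mean, pstdev


def box_plot_values(finishes):
    """Return the values an adapted box plot like this one is drawn from.

    `finishes` is a list of finishing positions (1 = first place).
    """
    average = mean(finishes)
    # Assumption: population standard deviation. The sample version
    # (statistics.stdev) would give a slightly wider box.
    std_dev = pstdev(finishes)
    return {
        "best": min(finishes),          # leftmost tick
        "worst": max(finishes),         # rightmost tick
        "average": average,             # red vertical line
        "box_low": average - std_dev,   # left edge of the box
        "box_high": average + std_dev,  # right edge of the box
    }


# Hypothetical finishing positions, not real results for any golfer.
values = box_plot_values([2, 4, 4, 4, 5, 5, 7, 9])
print(values)
```

With these made-up finishes, the average is 5 and the standard deviation is 2, so the box would run from 3 to 7, with whiskers out to the best finish of 2 and the worst finish of 9.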
Ideally, a golfer would have the lowest possible best and worst results. In the real world, however, most top golfers will have some mixture of good, so-so and bad results, especially over the course of a GP season or longer time frame. A box plot tells you immediately where a golfer usually finishes and how far his best and worst finishes are from normal. Looking at the screenshot above, you can see that the whisker from the box to the best result is pretty short, while the whisker to the worst result is very long. This tells us that Luke Donald has usually performed much closer to his best result than his worst one in the 2015 GP season and that his worst result was a significant aberration from his usual result.
The ideal box plot for golfer results is therefore one with a low best finish, a narrow box** (i.e. a low standard deviation, which means high consistency), a very short (or no) whisker to the left and a long whisker to the right. For example, the box plot for Jordan Spieth for the 2015 GP Season so far almost meets these criteria (his standard deviation is a bit high!). Since his results have been so good in the main, the lower end of the box encompasses his best finish of first (see note 2 below) and there is a long whisker from the box to his worst finish of 119th.
Hovering over the box plot will display the main values for the chart, as shown in the second screenshot above. Most of this information is also shown just above and below the box plot, but the hover text also displays the lower and upper bounds of the box (which are the average minus/plus the standard deviation).
Some notes on these new box plots:
- The box plot is hidden if the best finish is equal to the worst finish. In this case, the best, worst and average results are all the same, the standard deviation is zero and the box plot would be just a single vertical line!
- If the box bounds are lower than the best result or higher than the worst result, the box bounds are truncated at the best/worst result. This means if the average minus the standard deviation is less than the best result, the best result becomes the left side of the box (ditto for the other side of the box with the worst result). This is because the box represents where the majority of the golfer's results fall, and none of those results can be outside the bounds of his actual results! For example, for Jordan Spieth in 2015 so far, his average minus his standard deviation is a negative number, which obviously can't be a finishing position in a tournament and is less than his best actual finish of first. Therefore, his box begins at 1, his best result.
- I have noticed that the edge of the box plot is slightly off in certain rare situations and only when at least one end of the box is the same as the best/worst result. This is only a very minor issue and beyond my control. All attempts to remedy it resulted in a worse situation, so I left it as is.
- These charts have been implemented using a modestly sized JavaScript library and should not noticeably affect the page load time. If you experience any issues with this new functionality, please contact me immediately.
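The first two notes above boil down to a small rule, sketched here in Python. Again, this is only an illustration under stated assumptions and not the site's actual implementation: the function name `clamp_box` is made up, and the average and standard deviation in the example are hypothetical numbers chosen so that the lower bound goes negative.

```python
def clamp_box(best, worst, average, std_dev):
    """Return the (left, right) edges of the box, or None if the plot is hidden.

    The box spans one standard deviation either side of the average,
    truncated so that it never extends past the best or worst result.
    """
    if best == worst:
        # All results are identical, so there is nothing useful to draw.
        return None
    left = max(best, average - std_dev)    # truncate at the best result
    right = min(worst, average + std_dev)  # truncate at the worst result
    return (left, right)


# Hypothetical average and standard deviation (not real figures):
# 20 - 30 is negative, so the box starts at the best finish of 1 instead.
print(clamp_box(best=1, worst=119, average=20.0, std_dev=30.0))

# Best equals worst, so the box plot would be hidden.
print(clamp_box(best=5, worst=5, average=5.0, std_dev=0.0))
```

In the first call, the untruncated lower bound would be -10, which can't be a finishing position, so the left side of the box sits at 1 and there is no left whisker at all.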
A total of thirteen new box plots have been added to the pages on the site that show statistical information on a golfer's results in a certain collection of tournaments. Specifically, the box plots have been added to the following pages:
- Season Data page (1 box plot for season results)
- Golfer Data page (5 box plots for overall/regular/major/WGC/FedEx results in the golfer's career)
- Prediction Data page (7 box plots for GP season/last five events/course/tournament/last twelve similar events/similar weather/similar length course results)
This brings the total number of charts on the site to 433. I trust you will find these new box plots useful as a graphical representation of a golfer's performance in certain key areas. Just another way to make Golf Predictor even better!