The Empirical Histogram

There are 147 overlapping 15-day intervals between January 1 and August 18, 1999. Here is the histogram of the 147 corresponding 15-day percentage changes:



Figure 1. 15-day percentage changes, IBM, 1/1/99 – 8/18/99.
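A histogram like Figure 1 could be built along the lines of the following minimal Python sketch. It assumes that `closes` is a hypothetical list of daily closing prices in date order and that "15-day" means 15 trading days. The function names are illustrative, and NumPy's `np.histogram` does the binning.

```python
import numpy as np

def fifteen_day_changes(closes, horizon=15):
    """Percentage change from each close to the close `horizon` trading days later.

    Hypothetical helper: one change per overlapping interval, so a series of
    n closes yields n - horizon changes.
    """
    closes = np.asarray(closes, dtype=float)
    start = closes[:-horizon]
    end = closes[horizon:]
    return 100.0 * (end - start) / start

def empirical_histogram(changes, bins=20):
    """Bin the observed changes; returns (counts, bin_edges) from np.histogram."""
    return np.histogram(changes, bins=bins)
```

Under these assumptions, 147 changes would correspond to 162 daily closes, and `empirical_histogram(fifteen_day_changes(closes))` returns the bar heights and bin edges of the plot.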


Suppose the IBM 140 call expires in 15 days and you want the chance that this option will be in the money at some point during those 15 days. You would calculate as follows: out of the 147 15-day intervals, 66 showed a closing increase of at least 3.89% (the rise IBM needs to reach the 140 strike) at some point during the 15 days, and 66/147 = 44.9%.
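The counting rule in this paragraph can be sketched in Python. This is a minimal illustration only, assuming a hypothetical `closes` list of daily closing prices and a window of 15 trading days after each starting day. An interval counts as a hit if any close in the window is at least `threshold_pct` percent above the starting close, not just the final one.

```python
import numpy as np

def count_upside_hits(closes, threshold_pct, horizon=15):
    """Count starting days whose highest close over the next `horizon` trading
    days is at least `threshold_pct` percent above the starting close.

    Hypothetical helper; returns (hits, number_of_intervals).
    """
    closes = np.asarray(closes, dtype=float)
    n_intervals = len(closes) - horizon
    hits = 0
    for i in range(n_intervals):
        window = closes[i + 1 : i + 1 + horizon]  # the next `horizon` closes
        best_pct = 100.0 * (window.max() - closes[i]) / closes[i]
        if best_pct >= threshold_pct:
            hits += 1
    return hits, n_intervals
```

With the IBM data described above, `count_upside_hits(closes, 3.89)` would be expected to return 66 hits out of 147 intervals, and the probability estimate is their ratio.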

What kind of bias might this calculation introduce, and how might you correct for it? IBM has been generally rising since January 1, 1999, so the 15-day percentage changes are, on average, positive. Using them is therefore tantamount to assuming that this upward drift will continue, at least for the next 15 days.

You may not believe that IBM will continue its upward climb, or you may wish to adopt a neutral position. If you look at the lower tail of this histogram and count the 15-day intervals in which IBM declined by at least 3.89%, you will find 30 of them. This gives an estimate of 30/147 = 20.4% as the chance that IBM will decline by at least 3.89% before expiration. That is considerably smaller than the 44.9% estimate of the chance that IBM will rise by at least 3.89%, reflecting the histogram's bias toward the upside.
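The downside count mirrors the upside one. Here is a minimal sketch, again assuming a hypothetical `closes` list of daily closing prices and a 15-trading-day window: an interval counts if any close in the window is at least `threshold_pct` percent below the starting close.

```python
import numpy as np

def count_downside_hits(closes, threshold_pct, horizon=15):
    """Count starting days whose lowest close over the next `horizon` trading
    days is at least `threshold_pct` percent below the starting close.

    Hypothetical helper; returns (hits, number_of_intervals).
    """
    closes = np.asarray(closes, dtype=float)
    n_intervals = len(closes) - horizon
    hits = 0
    for i in range(n_intervals):
        window = closes[i + 1 : i + 1 + horizon]  # the next `horizon` closes
        worst_pct = 100.0 * (window.min() - closes[i]) / closes[i]
        if worst_pct <= -threshold_pct:
            hits += 1
    return hits, n_intervals
```

On the data behind Figure 1, this rule would be expected to give 30 hits out of 147 intervals.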

Under a neutral market these two chances, 44.9% and 20.4%, should be about equal, so a natural way to adjust your probability estimate is to average them: (44.9% + 20.4%)/2 = 32.7%. This 32.7% is admittedly an off-the-cuff estimate, but it is more reasonable than 44.9% if you do not want to assume that the uptrend will continue.
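The averaging step amounts to a couple of divisions. The following minimal sketch works directly from the interval counts quoted in the text (66 up, 30 down, 147 intervals), and the function name is hypothetical. Averaging the exact fractions gives 96/294, which rounds to the same 32.7%.

```python
def neutral_estimate(up_hits, down_hits, n_intervals):
    """Average the upside and downside hit rates to remove the drift bias."""
    p_up = up_hits / n_intervals
    p_down = down_hits / n_intervals
    return (p_up + p_down) / 2

print(round(100 * neutral_estimate(66, 30, 147), 1))  # → 32.7
```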

A longer history would also give a better estimate, provided that the behavior of IBM prices has not changed. The point to understand is that whatever number you finally adopt as a probability is only an estimate of future behavior. The only way to check its validity is to test the procedure you used to derive it against a sufficiently long history.

The histogram built from actual observed data is called the empirical histogram. In some ways it is the ultimate resource, but you have to use it with discretion.