Friday, August 6, 2010

Predictions of "Futures"

As I mentioned in an earlier post, one of our approaches to prediction is to gather, organize and present the predictions of others. Obviously, these aren't guarantees of future occurrences, but we feel there is value in having access to this type of information for a variety of reasons. Some of these are simply announcements of planned activities such as this news from March of future iPad availability:

http://www.9to5mac.com/ipad-april-3-pre-orders-march-25409682734

and some are mere speculation, such as when iPhones might be available on the Verizon network (January 2011?):

http://www.physorg.com/news197109182.html

In either case, if you are interested in a topic, you are probably particularly interested in associated likely future events.


When we harvest content, we capture both the publication date of the information and the event time of each event harvested. The event time can be in the past, present or future, depending on the linguistic context of the content, so a certain percentage of the events we harvest are forward-looking. The time span of a future event can vary widely, from a day to a month to a decade, depending on the precision of the prediction.
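To make the distinction between publication date and event time concrete, here is a minimal sketch in Python. The `HarvestedEvent` record, its field names and the sample dates are all invented for illustration; they are not our actual schema or data.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HarvestedEvent:
    """Hypothetical record; the field names are illustrative only."""
    published: date    # publication date of the source text
    event_start: date  # first day on which the event could occur
    event_end: date    # last day on which the event could occur (inclusive)

    def is_forward_looking(self) -> bool:
        # The whole event window lies after the publication date.
        return self.event_start > self.published

    def span_days(self) -> int:
        # Precision of the prediction: 1 means a single named day.
        return (self.event_end - self.event_start).days + 1


# Invented sample events: a single-day prediction, a month-long
# prediction, and a report about something that already happened.
events = [
    HarvestedEvent(date(2010, 3, 5), date(2010, 4, 3), date(2010, 4, 3)),
    HarvestedEvent(date(2010, 6, 29), date(2011, 1, 1), date(2011, 1, 31)),
    HarvestedEvent(date(2010, 7, 1), date(2010, 6, 30), date(2010, 6, 30)),
]

# Keep only forward-looking events with a time span of a single day.
single_day_future = [
    e for e in events if e.is_forward_looking() and e.span_days() == 1
]
```

Only the first sample event survives the filter: the second is forward-looking but spans a month, and the third describes the past.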


For this analysis, I used our news analytics web service interface to collect our forward-looking statements about S&P 500 companies with a time span of a single day. Note that I'm not discussing events occurring after today (July 28, 2010) but rather events predicted to occur after the publication date. With these events in hand, it is relatively straightforward to assess what happened on the markets on these "future" days. I looked at "future events" predicted from the beginning of 2009 to the present. For each company with at least 5 predicted events (futurecount > 4 in the model call below), I looked at the local standardized trading volume for each day. This measure standardizes the company's volume against its previous 20 days: it is the company's volume for the day minus its recent average volume, divided by the recent standard deviation of its volume. I also tried various other standardizations, but the change had little impact on the overall analysis. I fitted a model across all of the companies to test whether days with a predicted event had larger volume than days without one. On average, this turned out to be true. The output from the model fit in R is below.

Call:
lm(formula = LZvolume ~ future, data = datatable[datatable$futurecount > 4,])

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.045481   0.006486   7.013 2.36e-12 ***
future      0.403100   0.024010  16.789   <2e-16 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.718 on 75694 degrees of freedom
(67 observations deleted due to missingness)
Multiple R-squared: 0.00371, Adjusted R-squared: 0.003697
F-statistic: 281.9 on 1 and 75694 DF, p-value: < 2.2e-16
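For readers who want to see the response variable spelled out, the local standardization described above (presumably what LZvolume holds) can be sketched in Python as follows. The function name, the use of the sample standard deviation and the handling of the first 20 days are my assumptions for this sketch, not a description of the exact code behind the fit.

```python
from statistics import mean, stdev


def local_z_volume(volumes, window=20):
    """Standardize each day's volume against the previous `window` days.

    Returns a list aligned with `volumes`. Entries are None where fewer
    than `window` prior days exist or the prior volumes do not vary.
    """
    z = []
    for i, v in enumerate(volumes):
        if i < window:
            z.append(None)  # not enough history yet
            continue
        prior = volumes[i - window:i]  # the previous `window` days only
        sd = stdev(prior)              # sample standard deviation (assumed)
        z.append((v - mean(prior)) / sd if sd > 0 else None)
    return z


# Made-up volumes: 20 quiet days alternating 100 and 110, then a spike.
volumes = [100, 110] * 10 + [150]
z = local_z_volume(volumes)
spike_z = z[-1]  # how unusual the spike is relative to the prior 20 days
```

Because the day being scored is excluded from its own baseline, a spike does not dampen its own score.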


The key observation here is that the future estimate is positive, suggesting that on average the standardized volume is about 0.4 higher on days with a predicted event. Recall that with a standardized variable, roughly 95% of the data lies between -2 and 2 (96% in our case), so a change of 0.4 is about 10% of that range. The R-squared values here are quite low, suggesting that while on average a future event is associated with an increase in volume, the overall data is quite noisy.
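One way to read the coefficients: when the only predictor is a 0/1 indicator, a least-squares fit reduces to comparing group means. The intercept is the mean response on days without a predicted event, and the slope is the difference between the two group means. The Python sketch below shows this on made-up numbers; `fit_binary_ols` is a hypothetical helper, not the R code that produced the output above.

```python
from statistics import mean


def fit_binary_ols(y, flag):
    """Least-squares fit of y ~ flag, where every flag is 0 or 1.

    Returns (intercept, slope, r_squared).
    """
    y0 = [v for v, f in zip(y, flag) if f == 0]
    y1 = [v for v, f in zip(y, flag) if f == 1]
    intercept = mean(y0)          # mean response when flag == 0
    slope = mean(y1) - intercept  # difference between the group means
    grand = mean(y)
    ss_total = sum((v - grand) ** 2 for v in y)
    ss_resid = sum((v - (intercept + slope * f)) ** 2 for v, f in zip(y, flag))
    return intercept, slope, 1 - ss_resid / ss_total


# Made-up standardized volumes: five ordinary days, two "future event" days.
y = [-1.0, 0.0, 1.0, 0.5, -0.5, 0.4, 1.4]
flag = [0, 0, 0, 0, 0, 1, 1]
intercept, slope, r_squared = fit_binary_ols(y, flag)

# The "10% of the range" arithmetic: an effect of 0.4 on a -2 to 2 scale.
share_of_range = 0.4 / (2 - (-2))
```

A small R-squared next to a highly significant slope is not a contradiction: with tens of thousands of observations, a modest shift in the mean can be estimated precisely even though it explains little of the day-to-day variation.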

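We then looked at whether the trend held up across different industries and found that generally it did: the estimated effect is positive in every industry listed, although the Utilities estimate is not statistically significant at the 5% level (P-value 0.1712). The estimates and P-values for several industry categories are listed below.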


Industry                      Future effect (change in standardized volume)   P-value
Industrials                   0.51                                            3.03E-11
Health Care                   0.64                                            4.73E-09
Consumer Discretionary        0.51                                            2e-16
Information Technology        0.33                                            1.23E-14
Utilities                     0.32                                            0.1712
Financials                    0.25                                            1.10E-06
Materials                     0.85                                            4.10E-09
Consumer Staples              0.62                                            1.62E-10
Telecommunications Services   0.26                                            0.0173
Energy                        0.43                                            0.0023



I took a more visual approach to this analysis as well. For each company, I split the volumes into two groups, those with an associated future prediction and those without. I used a Wilcoxon test (similar in concept to a t-test, but less sensitive to outliers) to compare these two groups for each company.
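As a rough illustration of the per-company comparison, here is a simplified Wilcoxon rank-sum test in Python. It uses the normal approximation with average ranks for ties and applies no tie or continuity correction, so its P-values will differ somewhat from those of a full statistical package, especially for small groups. The function name and the sample data are made up.

```python
from math import erfc, sqrt


def rank_sum_p_value(a, b):
    """Two-sided Wilcoxon rank-sum P-value via the normal approximation.

    Tied values receive average ranks. No tie or continuity correction
    is applied, so treat the result as approximate.
    """
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(a), len(b)
    u = rank_sum_a - n1 * (n1 + 1) / 2           # Mann-Whitney U for group a
    mu = n1 * n2 / 2                             # mean of U under the null
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)   # sd of U under the null
    z = (u - mu) / sigma
    return erfc(abs(z) / sqrt(2))                # equals 2 * (1 - Phi(|z|))


# Made-up standardized volumes for one company.
event_days = [1.2, 0.8, 2.5, 1.9, 0.4, 3.1]
other_days = [-0.3, 0.1, -1.1, 0.6, -0.7, 0.2, -0.2, 0.9, -0.5, 0.0]
p = rank_sum_p_value(event_days, other_days)
```

Because the test only looks at ranks, a single enormous volume day counts no more than any other largest observation, which is why it is less sensitive to outliers than a t-test.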

Using Spotfire to display the set of P-values I obtained (one per company, for the statistical difference between the two groups) gives a histogram of P-values.

If there were no relationship between future events and our volume measure, we'd expect this histogram to be relatively flat, with roughly 5% of the tests having a P-value less than 5%. In contrast, we see about 35% of our companies having significant differences between volume on days with a predicted event and volume on days without one.
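The "relatively flat" expectation can be checked with a small Python sketch. Under the null hypothesis, P-values from a well-behaved test are approximately uniform on the interval from 0 to 1, so I stand in for them here with uniform random draws. The helper names, the bin count and the number of draws are arbitrary choices for illustration.

```python
import random


def p_value_histogram(p_values, bins=20):
    """Count P-values in equal-width bins covering 0 to 1."""
    counts = [0] * bins
    for p in p_values:
        idx = min(int(p * bins), bins - 1)  # keep p == 1.0 in the last bin
        counts[idx] += 1
    return counts


def share_significant(p_values, alpha=0.05):
    """Fraction of P-values strictly below alpha."""
    return sum(p < alpha for p in p_values) / len(p_values)


# Stand-in for "no relationship": 10,000 uniform draws with a fixed seed.
rng = random.Random(0)
null_p_values = [rng.random() for _ in range(10000)]

counts = p_value_histogram(null_p_values)
null_share = share_significant(null_p_values)
```

With 20 bins, the first bin covers P-values below 0.05, so a flat histogram puts about one twentieth of the tests there. A first bar holding about 35% of the companies is far taller than that.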

From our earlier analysis, the companies where future events are predictive of volume increases don't appear to be segmented by industry. So are they segmented by market cap, region, or news coverage? That will need to be the subject of a future post.
