Friday, March 4, 2011

Factor Modeling Media Analytic Data

At Recorded Future, we’re scouring the web for predictive signals in online content. Previously, we’ve covered our efforts at complex event modeling, and liquidity modeling using news flow information. Publicly, we’ve also touched briefly on some of our returns modeling - we’ve seen instances of particular blogs that seem to have superior predictive power in terms of their ability to write about stocks that will outperform.

Recently, we’ve expanded this approach to build a whole-market factor model that uses media analytic data to predict excess returns. Using aggregate data for the S&P 500, which is available to our API customers, we’ve built a number of factors derived from the online sentiment and momentum of S&P 500 constituents. In a time-series cross-sectional modeling environment, these factors show statistically robust predictive signals of market-relative returns over a 1-day to 1-week investment horizon.

Factor Examination
Let’s take a look at one such factor, which is based on sentiment and momentum. If we take this factor, and break it into deciles by day and then construct portfolios for each decile, we see the following cumulative continuous returns in these portfolios. We’ve included dividend-adjusted returns to the SPDR S&P 500 ETF (SPY) as a benchmark in bright orange.


You can see quite clearly that over the last two years, our top decile (in orange) has outperformed all other deciles in a fairly consistent manner. Meanwhile, the bottom three deciles (the three darkest shades of blue) have underperformed all other deciles, as well as the market. One thing to note is that this relationship is not strictly linear. For instance, our 2nd, 3rd, and 4th place deciles actually fall near the middle of the returns distribution, which may have something to do with the construction of this particular factor.
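For readers who want to see the mechanics, the decile construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the chart: the column names (`date`, `ticker`, `factor`, `fwd_ret`), the equal weighting within each decile, and the use of a simple next-period return are all hypothetical choices made for the example.

```python
import numpy as np
import pandas as pd


def decile_cumulative_returns(panel: pd.DataFrame) -> pd.DataFrame:
    """Cumulative continuous returns of daily factor-decile portfolios.

    Assumed (hypothetical) columns in `panel`:
      date    - observation date
      ticker  - security identifier
      factor  - factor value observed on that date
      fwd_ret - simple return earned over the following period
    """
    panel = panel.copy()
    by_day = panel.groupby("date")["factor"]

    # Rank securities within each day (ties broken by order of appearance),
    # then map ranks 1..n onto deciles 1..10, where 10 is the highest factor value.
    rank = by_day.rank(method="first")
    count = by_day.transform("count")
    panel["decile"] = np.ceil(10 * rank / count).astype(int)

    # Assumption: each decile portfolio is equal-weighted and rebalanced daily.
    daily = panel.groupby(["date", "decile"])["fwd_ret"].mean().unstack("decile")

    # Continuous (log) returns cumulate by simple summation.
    return np.log1p(daily).cumsum()
```

The result has one row per date and one column per decile, so plotting it alongside a benchmark series gives a chart of the kind discussed here.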

If we compare the portfolios to the performance of the S&P 500 over this period, we find that the portfolio in the top decile has a Beta of 1.08, assuming a risk-free rate of return roughly equivalent to that of T-bills over the period. It has a statistically significant annualized (continuous) Jensen’s alpha of +16% over the period. When we examine the bottom two deciles under the same assumption, we see that they are high Beta portfolios (1.37 and 1.34, respectively), but with statistically significant negative alphas of -42% and -26% annually, respectively. As you might imagine, constructing hedged portfolios out of the securities in these deciles provides some possibly compelling trading strategies.
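As a rough guide to how figures like these can be estimated, here is a minimal Python sketch of a standard CAPM regression: portfolio excess returns are regressed on market excess returns, the slope is Beta, and the intercept is Jensen's alpha. It is an assumption that an ordinary least squares fit of this kind matches the estimation used above. The function name, the daily frequency, and the 252-period annualization factor are likewise hypothetical choices for illustration.

```python
import numpy as np


def jensen_alpha_beta(port_ret, mkt_ret, rf_ret, periods_per_year=252):
    """Estimate Jensen's alpha and Beta by OLS on excess returns.

    All inputs are per-period continuous (log) returns; rf_ret may be a
    scalar or an array. periods_per_year=252 assumes daily data.
    """
    rf = np.asarray(rf_ret, dtype=float)
    y = np.asarray(port_ret, dtype=float) - rf  # portfolio excess return
    x = np.asarray(mkt_ret, dtype=float) - rf   # market excess return

    # Design matrix with an intercept column: y = alpha + beta * x + error
    X = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    alpha, beta = coef

    # Classical OLS standard error of the intercept, for a t-statistic on alpha.
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - 2)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_alpha = alpha / np.sqrt(cov[0, 0])

    return {
        # Continuous returns annualize by scaling with the number of periods.
        "alpha_annual": alpha * periods_per_year,
        "beta": beta,
        "t_alpha": t_alpha,
    }
```

A t-statistic on alpha well above 2 in absolute value is the usual informal threshold for calling the alpha statistically significant, although the classical standard errors used here ignore autocorrelation and heteroskedasticity.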

If you’d like to experiment with this approach yourself, we’ve made some R code available on our Google Code site that will pull in market data and Recorded Future data, and perform this sort of decile analysis on a factor of your choosing. You’ll need a Recorded Future API token to pull that data.

Soon, we’ll discuss including a factor like this in a portfolio built using other factors based on Recorded Future media analytic data, find out whether such a portfolio can stand up to trading costs, and evaluate its performance in an out-of-sample context.

Thursday, March 3, 2011

Turning Online Media into Big Data for Quants

We recently hosted a webcast discussing applications of the Recorded Future news analytics API for quantitative finance, and a big thanks goes out to everyone who joined us. The original presentation can be viewed here, and slides from the session, detailing how we turn online media into actionable data along with several case studies, are below: