Tuesday, June 8, 2010

Does Momentum Predict Higher Trading Volume?

[Originally posted at the Recorded Future blog]

Every day, billions of dollars change hands in the U.S. stock market. In a single trading day, a big company like Exxon Mobil might see $5 billion of its shares change hands. This figure may mean little to an individual investor looking to buy or sell a few shares on E*Trade, but it means a lot to a large institutional investor looking to unwind a $500 million position in a particular stock.

With a position that large, the investor is unlikely to be able to trade completely into or out of it in a single day without adversely affecting the stock's price. Instead, they are likely to work the trade over several days. But how much of the position should they trade on any one day? The answer depends in part on the investor's expectation of how much of the stock the broader market will trade.
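One common way to turn that expectation into a daily trade size, assumed here for illustration rather than taken from this post, is a participation-rate cap: trade no more than a fixed fraction of the market's expected daily dollar volume. The function name days_to_trade and the 10% cap are hypothetical.

```r
# Hypothetical helper: trading days needed to work a position if each day's
# trading is capped at a fixed share of the market's expected dollar volume.
days_to_trade <- function(position, expected_dv, participation = 0.10) {
  daily_capacity <- participation * expected_dv  # dollars allowed per day
  ceiling(position / daily_capacity)             # round up to whole days
}

# $500 million position, $300 million expected daily volume, 10% cap:
days_to_trade(500e6, 300e6, participation = 0.10)  # returns 17 (500 / 30 = 16.7, rounded up)
```

Doubling expected volume to $600 million a day would cut the schedule to 9 days, so a better volume forecast translates directly into a better trading schedule.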

So what drives expected trading volume? In part, trading volume over the previous day, and perhaps the previous month, informs expected near-term volume. Additionally, a significant piece of news about a company, such as a product release or an earnings announcement, may drive trading volume up as traders move shares around more actively and the market determines the “right” price for the stock.

News/media flow (remember that it is not only about news in its classic sense; it involves everything from regulatory filings to blogs) is typically difficult to quantify. What constitutes news? How much of it is out there? Is it actually relevant to a company's value, or is it just PR fluff? Additionally, obtaining historical news data and using it in a statistical model is often quite difficult.

Using data from Recorded Future's advanced platform for processing the semantic structure of the web, I have taken a simple autoregressive model of trading volume and augmented it with quantified information about news on companies in the S&P 500 Index. I will show that incorporating this information has a statistically significant effect in the model and may yield more accurate predictions of future trading volume.

Experimental Setup and Data


I obtained daily data on individual stocks from August 1, 2009 to April 10, 2010 from Yahoo! Finance with the help of the Rmetrics software package. Yahoo's historical quotes provide price and share volume information for each company on each date. Because the Volume Weighted Average Price (VWAP) was not publicly available, I estimate a company's daily dollar volume as its share volume × closing price. I used the timeSeries class on this data to calculate lagged values and trailing moving averages for each stock on each day.
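The original calculations used Rmetrics' timeSeries class. The base-R sketch below (all helper names are hypothetical) shows the same three quantities: dollar volume as shares × close, the previous day's value, and a trailing average over the n days before day t, which for n = 20 is the window SMA(DV, t-1, t-20) used in the model specification.

```r
# Dollar volume approximated as shares traded times closing price
# (a stand-in for VWAP-based dollar volume).
dollar_volume <- function(shares, close) shares * close

# Previous day's value; NA for the first observation.
lag1 <- function(x) c(NA, head(x, -1))

# Trailing simple moving average over the n days *before* day t,
# i.e. mean(x[t - n], ..., x[t - 1]); NA until n prior days exist.
trailing_sma <- function(x, n = 20) {
  out <- rep(NA_real_, length(x))
  for (t in seq_along(x)) {
    if (t > n) out[t] <- mean(x[(t - n):(t - 1)])
  }
  out
}

dv <- dollar_volume(shares = c(100, 200, 300), close = c(10, 10, 10))
lag1(dv)                 # NA 1000 2000
trailing_sma(dv, n = 2)  # NA NA 1500
```

Excluding day t from its own average keeps the regressor known before the day's trading starts.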

Using the Recorded Future API together with R, I was able to pull aggregate news information about all of the companies in the S&P 500 over the same period. This information included each company's “Momentum” on any particular day. Momentum can be thought of as an aggregate indicator of the news or “buzz” behind a company on a given day.

Using R, I then combined these data into a time-series cross-sectional (panel) dataset representing dollar volume and news Momentum for all companies in the S&P 500 over the period.
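One way to build such a panel in base R, assuming one price table and one Momentum table keyed by ticker and date (the tickers, column names and values below are hypothetical toy data), is a left join followed by a lag computed separately within each company:

```r
# Hypothetical toy inputs, one row per (ticker, date); the values are made up.
prices <- data.frame(
  ticker = c("AAA", "AAA", "BBB", "BBB"),
  date = as.Date(c("2010-04-08", "2010-04-09", "2010-04-08", "2010-04-09")),
  dollar_volume = c(1.2e9, 1.1e9, 8.0e8, 7.5e8),
  stringsAsFactors = FALSE
)
news <- data.frame(
  ticker = c("AAA", "AAA", "BBB"),
  date = as.Date(c("2010-04-08", "2010-04-09", "2010-04-09")),
  momentum = c(0.30, 0.45, 0.10),
  stringsAsFactors = FALSE
)

# Left join: keep every trading day, even when no Momentum value exists.
panel <- merge(prices, news, by = c("ticker", "date"), all.x = TRUE)
panel <- panel[order(panel$ticker, panel$date), ]
rownames(panel) <- NULL

# Lag Momentum within each company, so one firm's last day never
# leaks into the next firm's first day.
panel$lag_momentum <- ave(panel$momentum, panel$ticker,
                          FUN = function(v) c(NA, head(v, -1)))
panel
```

Whether a missing Momentum value should be treated as zero or dropped is a modeling choice this sketch leaves open.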

Model Specification


I propose a simple autoregressive model for predicting trading volume with a simple moving average term as follows:

$$DV_t = a\,DV_{t-1} + b\,\mathrm{SMA}(DV, t-1, t-20) + e_t$$

Here $DV_x$ is Dollar Volume at time $x$, $\mathrm{SMA}(DV, t-1, t-20)$ is the simple moving average of Dollar Volume over the 20 trading days from $t-20$ through $t-1$, and $e_t$ is the error term at time $t$.
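To see how the base model produces a forecast, the sketch below plugs the coefficient estimates reported under Experimental Results (a = 0.513351, b = 0.477892) into the equation for a hypothetical stock; the function name and the input volumes are made up.

```r
# One-step-ahead forecast from the base model (no intercept):
# DV_t = a * DV_{t-1} + b * SMA(DV, t-1, t-20)
forecast_dv <- function(prev_dv, sma20, a, b) a * prev_dv + b * sma20

# Hypothetical stock: $1.0 billion traded yesterday, $800 million average
# over the prior 20 days.
forecast_dv(prev_dv = 1.0e9, sma20 = 8.0e8, a = 0.513351, b = 0.477892)
# roughly 8.96e8, i.e. about $896 million
```

Because the two estimates sum to about 0.99, the forecast behaves almost like a weighted average of yesterday's dollar volume and its 20-day mean.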

I then augment this model with a Momentum term. Because raw Dollar Volume is highly variable across firms (it depends largely on each firm's equity market capitalization), I scale the previous day's Momentum by the moving average term, so that the news effect is proportional to a stock's typical dollar volume:

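$$DV_t = a\,DV_{t-1} + b\,\mathrm{SMA}(DV, t-1, t-20) + c\,\bigl(MO_{t-1}\cdot\mathrm{SMA}(DV, t-1, t-20)\bigr) + e_t$$

where $MO_{t-1}$ is the company's Momentum on day $t-1$.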
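A sketch of how the scaled Momentum regressor can be built and fit with lm, on simulated data with arbitrary true coefficients (a = 0.6, b = 0.3, c = 0.1). The name smaxlMo mirrors the regressor in the results below; that it holds exactly MO(t-1) × SMA(DV, t-1, t-20) is an inference from its name, not something the post states.

```r
set.seed(2)
n <- 5000
# Simulated stand-ins (hypothetical data, arbitrary true coefficients).
sma    <- rlnorm(n, meanlog = 0, sdlog = 1)          # SMA(DV, t-1, t-20)
ldv    <- sma * rlnorm(n, meanlog = 0, sdlog = 0.3)  # DV_{t-1}
lag_mo <- runif(n)                                   # MO_{t-1}

# Scaled Momentum regressor: previous-day Momentum times the moving average.
smaxlMo <- lag_mo * sma
dv  <- 0.6 * ldv + 0.3 * sma + 0.1 * smaxlMo + rnorm(n, sd = 0.1)
sim <- data.frame(dv, ldv, sma, smaxlMo)

# "0 +" removes the intercept so the fit matches the equation term for term.
mo_fit <- lm(dv ~ 0 + ldv + sma + smaxlMo, data = sim)
round(coef(mo_fit), 2)
```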


Experimental Results


I fit the two models with R's lm function (the 0 + in each formula omits the intercept, matching the equations above):

```
> dflm <- lm(Dollarvol ~ 0 + lDollarvol + smaDvol.Dollarvol, seriesdf)
> summary(dflm)

Call:
lm(formula = Dollarvol ~ 0 + lDollarvol + smaDvol.Dollarvol,
    data = seriesdf)

Residuals:
       Min         1Q     Median         3Q        Max
-5.060e+09 -2.277e+07 -2.686e+06  1.755e+07  1.597e+10

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
lDollarvol        0.513351   0.003237   158.6   <2e-16 ***
smaDvol.Dollarvol 0.477892   0.003600   132.7   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.71e+08 on 72110 degrees of freedom
Multiple R-squared: 0.8539,  Adjusted R-squared: 0.8539
F-statistic: 2.107e+05 on 2 and 72110 DF,  p-value: < 2.2e-16
```

```
> dflmMo <- lm(Dollarvol.1 ~ 0 + lDollarvol.1 + smaDvol.Dollarvol.1 + smaxlMo, seriesdf)
> summary(dflmMo)

Call:
lm(formula = Dollarvol.1 ~ 0 + lDollarvol.1 + smaDvol.Dollarvol.1 +
    smaxlMo, data = seriesdf)

Residuals:
       Min         1Q     Median         3Q        Max
-5.039e+09 -2.215e+07 -2.284e+06  1.813e+07  1.597e+10

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
lDollarvol.1        0.513193   0.003237  158.54  < 2e-16 ***
smaDvol.Dollarvol.1 0.471645   0.003817  123.56  < 2e-16 ***
smaxlMo             0.077162   0.015683    4.92 8.67e-07 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 170900000 on 72109 degrees of freedom
Multiple R-squared: 0.8539,  Adjusted R-squared: 0.8539
F-statistic: 1.405e+05 on 3 and 72109 DF,  p-value: < 2.2e-16
```

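We can see that the Momentum term is statistically significant at the 0.001 level (t = 4.92, p = 8.67e-07), and adding it slightly lowers the residual standard error (from 1.71e+08 to 1.709e+08). The gain in overall fit is small, however: the adjusted R-squared is 0.8539 for both models to four decimal places. Note also that because both models omit the intercept, R reports an uncentered R-squared, which is not directly comparable to the R-squared of a model with an intercept.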
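A complementary check, sketched here on simulated data rather than the post's dataset, is a nested-model F-test with R's anova() function. With a single added regressor, the F statistic equals the square of that regressor's t value, so the F-test agrees with the t-test on the Momentum coefficient.

```r
set.seed(3)
n <- 5000
# Simulated stand-ins (hypothetical data); the true Momentum effect is small.
sma     <- rlnorm(n)
ldv     <- sma * rlnorm(n, sdlog = 0.3)
smaxlMo <- runif(n) * sma
dv      <- 0.6 * ldv + 0.3 * sma + 0.05 * smaxlMo + rnorm(n, sd = 0.5)
sim     <- data.frame(dv, ldv, sma, smaxlMo)

base_fit <- lm(dv ~ 0 + ldv + sma, data = sim)
mo_fit   <- lm(dv ~ 0 + ldv + sma + smaxlMo, data = sim)

# Nested-model F-test: does adding the Momentum term cut the residual
# sum of squares by more than chance alone would?
cmp <- anova(base_fit, mo_fit)
print(cmp)

# For one added regressor, F equals the square of its t value.
t_mo <- summary(mo_fit)$coefficients["smaxlMo", "t value"]
all.equal(cmp$F[2], t_mo^2)

# Adjusted R-squared for each fit (uncentered, since there is no intercept).
c(base = summary(base_fit)$adj.r.squared,
  momentum = summary(mo_fit)$adj.r.squared)
```

In a large sample a small but real effect can be highly significant while barely moving adjusted R-squared, which is consistent with the pattern in the results above.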

Drawbacks


Note that both of these models are linear, when in fact the relationship between the lagged terms and current dollar volume may be non-linear. For example, because Dollar Volume is roughly lognormally distributed, a log-linear model may be more appropriate. Further, the error term may not be normally distributed and may be heteroskedastic; the residuals above are strongly right-skewed (a maximum of about 1.6e+10 against a minimum of about -5.0e+09). Either problem violates the model's assumptions, undermining the standard errors and significance tests and weakening the predictions.
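Both concerns can be probed in base R. The sketch below, on simulated lognormal volumes (not the post's data), fits a log-linear variant with an intercept and computes White (HC0) heteroskedasticity-robust standard errors directly from the sandwich formula, so it assumes no add-on package.

```r
set.seed(4)
n <- 5000
# Hypothetical data: log dollar volumes are normal, so volumes are lognormal.
log_sma <- rnorm(n, mean = 18, sd = 1.5)
log_ldv <- log_sma + rnorm(n, sd = 0.5)
log_dv  <- 0.6 * log_ldv + 0.4 * log_sma + rnorm(n, sd = 0.3)
sim <- data.frame(dv = exp(log_dv), ldv = exp(log_ldv), sma = exp(log_sma))

# Log-linear variant (with an intercept); zero-volume days would need
# separate handling before taking logs.
log_fit <- lm(log(dv) ~ log(ldv) + log(sma), data = sim)

# White (HC0) robust covariance: (X'X)^-1 X' diag(e^2) X (X'X)^-1
X     <- model.matrix(log_fit)
e     <- residuals(log_fit)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)  # sum over i of e_i^2 * x_i x_i'
robust_se <- sqrt(diag(bread %*% meat %*% bread))

cbind(estimate     = coef(log_fit),
      classical_se = sqrt(diag(vcov(log_fit))),
      robust_se    = robust_se)
```

If the robust and classical standard errors differ sharply on real data, that is a sign the constant-variance assumption behind the reported t values does not hold.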

Additionally, I have built these models on less than one year of data and have not taken into account other factors that may affect dollar volume. These include, but are not limited to, exchange/OTC market effects, non-linear market capitalization effects, industry effects, seasonal effects, and the effects of stocks with multiple share classes (e.g. Berkshire Hathaway). The relatively short time span also cannot capture the behavior of trading volume over a full economic cycle.
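Some of these omitted factors can enter the regression as dummy variables. As an illustration on simulated data (the sector names, their effects and the weekday labels are all made up), industry and day-of-week effects are added to lm as factors, each non-baseline level getting its own intercept shift:

```r
set.seed(5)
n <- 3000
sector_effect <- c(Energy = 0.5, Financials = 1.0, Technology = 0.0)  # made up
sim <- data.frame(
  sector  = sample(names(sector_effect), n, replace = TRUE),
  weekday = sample(c("Mon", "Tue", "Wed", "Thu", "Fri"), n, replace = TRUE),
  ldv     = rlnorm(n),
  sma     = rlnorm(n),
  stringsAsFactors = FALSE
)
sim$dv <- 0.6 * sim$ldv + 0.3 * sim$sma +
  unname(sector_effect[sim$sector]) + rnorm(n, sd = 0.2)

# factor() expands each label into dummy variables; the alphabetically first
# levels (Energy, Fri) are the baselines absorbed by the intercept.
fe_fit <- lm(dv ~ ldv + sma + factor(sector) + factor(weekday), data = sim)
round(coef(fe_fit), 2)
```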

What's Next?


While this is an interesting result in itself, there is much deeper analysis to be done. What about volatility and price? What about breaking down Momentum from news/media flow by type, such as mainstream media vs. blogs vs. government filings? What about exploring the effects of news/media Momentum on other asset classes?

If you'd like to try this yourself, contact us to gain access to our API!
