Friday, October 15, 2010

Minutes Away from Analyzing News Analytic Data


The Web Services API for Recorded Future is designed to answer some fairly sophisticated questions. Some people are using our API to integrate Recorded Future content into their internal proprietary applications. This blog has posted the results of several complex analyses, and a variety of code examples in both R and Python are available on our Google Code site.

But even if Web Services, JSON, and Python are not familiar words to you, it's pretty straightforward to get data and start modeling. I have an example R program that you can get up and running in a few minutes. It retrieves average sentiment and momentum for a company, as well as its market performance, since January of 2009. With this data, it's easy to start looking for relationships.

This will of course be easiest if you are familiar with R, but all you really need is to install R and download three packages: fImport, rjson, and RCurl. Once you have an R session running, you can enter these lines of code to download and install the required packages. (You only need to install once; in subsequent sessions, just use the library commands.)

> install.packages("rjson")
> install.packages("RCurl")
> install.packages("fImport")

> library(RCurl)
> library(rjson)
> library(fImport)
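
If you rerun the setup often, a small check can avoid reinstalling packages that are already present. The following is only a sketch: the helper name missingPackages is made up for illustration, and the install call is left commented out so that nothing is downloaded until you decide to.

```r
# Hypothetical helper: returns the subset of pkgs that cannot be loaded.
missingPackages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed <- c("rjson", "RCurl", "fImport")
toInstall <- missingPackages(needed)

# Uncomment to install whatever is missing:
# if (length(toInstall) > 0) install.packages(toInstall)
```

After this, the library commands above load the packages as usual.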

The rest of the code in the file sets up an R function, createDataSet (after you supply an access token for the Recorded Future API). This function accepts a stock ticker as input and retrieves Recorded Future data for the average sentiment and momentum for each day over an 18-month period. It also retrieves stock prices and trading volume for the ticker and for an ETF that tracks the S&P 500. From the retrieved data, it calculates a number of derived values, including:
  1. The difference between positive and negative sentiment
  2. The positive and negative sentiments multiplied by momentum
  3. The returns for the selected ticker and for the market index for the day after the news analytic metrics are calculated
  4. The market adjusted returns of the ticker
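
To make the four derived values concrete, here is a minimal sketch on a toy data frame. It follows the descriptions in the list above, not the actual body of createDataSet: the function name addDerivedColumns, the toy numbers, and the choice of next-day log returns are all assumptions made for illustration.

```r
# Toy inputs; every number here is made up for illustration.
toy <- data.frame(
  Momentum  = c(0.10, 0.20, 0.05),
  Positive  = c(0.08, 0.06, 0.04),
  Negative  = c(0.02, 0.01, 0.00),
  Close     = c(100, 102, 101),
  SPY.Close = c(50, 50.5, 50.5)
)

# Hypothetical helper that adds the derived columns described in the list.
addDerivedColumns <- function(d) {
  n <- nrow(d)
  # 1. Difference between positive and negative sentiment
  d$sentiment.difference <- d$Positive - d$Negative
  # 2. Positive and negative sentiment multiplied by momentum
  d$weighted.pos <- d$Positive * d$Momentum
  d$weighted.neg <- d$Negative * d$Momentum
  # 3. Next-day log return (assumed definition); the last row has no next day
  nextDayReturn <- function(p) c(log(p[-1] / p[-n]), NA)
  d$returns     <- nextDayReturn(d$Close)
  d$spy.returns <- nextDayReturn(d$SPY.Close)
  # 4. Market-adjusted return: ticker return minus index return
  d$marketAdjustedReturns <- d$returns - d$spy.returns
  d
}

derived <- addDerivedColumns(toy)
```

Rows are assumed to be sorted by date in ascending order; the final row gets NA for the return columns because no later price is available.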


Subscribers to the Recorded Future API can now easily create a simple news analytic data set for an individual ticker, say the ticker for Amazon.com, as follows:

> out<-createDataSet("AMZN")

yielding a data frame structured like this:

> tail(out)
           Day   Entity Count  Momentum  Positive   Negative Ticker
265 2009-09-22 33328212    64 0.0803539 0.0896110 0.01840840   AMZN
266 2009-09-23 33328212   157 0.1367590 0.0645906 0.01073010   AMZN
267 2009-09-24 33328212    34 0.0782548 0.0388494 0.00448654   AMZN
268 2009-09-25 33328212    76 0.1315870 0.0913595 0.01383090   AMZN
272 2009-09-29 33328212    52 0.0822941 0.1491080 0.00000000   AMZN
273 2009-09-30 33328212    64 0.0961285 0.0641504 0.04386140   AMZN

    sentiment.difference weighted.pos weighted.neg  Open  High   Low Close
265          -0.01840840  0.007200593 0.0014791867 91.46 94.19 91.10 93.75
266          -0.01073010  0.008833346 0.0014674377 92.82 94.50 92.22 92.38
267          -0.00448654  0.003040152 0.0003510933 92.00 92.71 90.77 92.11
268          -0.01383090  0.012021723 0.0018199666 91.44 92.25 89.75 90.52
272           0.00000000  0.012270709 0.0000000000 91.96 92.33 90.10 91.72
273          -0.04386140  0.006166682 0.0042163306 92.26 94.17 91.43 93.36

     Volume Adj.Close      returns SPY.Open SPY.High SPY.Low SPY.Close
265 8264900     93.75 -0.034619050   107.08   107.37  106.60    107.07
266 5685300     92.38  0.014721160   107.32   108.03  105.99    106.18
267 5075100     92.11  0.002926990   106.41   106.64  104.55    105.01
268 4256800     90.52  0.017412694   104.78   105.36  104.09    104.45
272 4393900     91.72  0.005328127   106.51   107.02  105.78    106.00
273 8539200     93.36 -0.017722530   106.36   106.46  104.62    105.59

    SPY.Volume SPY.Adj.Close  spy.returns marketAdjustedReturns weekday
265  143126700        105.43 -0.005802632          -0.028816418     Tue
266  225947400        104.55  0.008381800           0.006339360     Wed
267  228636800        103.40  0.011060464          -0.008133474     Thu
268  204059000        102.85  0.005333346           0.012079348     Fri
272  133733900        104.37  0.003061324           0.002266802     Tue
273  254383000        103.97  0.003839882          -0.021562412     Wed

Note that sentiment is divided into individual measures for positive and negative sentiment. We find each of these values interesting in its own right, so we model them separately.

With this data frame created, there are many exploratory possibilities. For example, we can create a linear model of the relationships between our news analytic metrics and market-adjusted returns:

> model.fit<-lm(marketAdjustedReturns~Momentum+Positive+Negative+
weighted.pos+weighted.neg,data=out)
> summary(model.fit)
Call:
lm(formula = marketAdjustedReturns ~ Momentum + Positive + Negative +
weighted.pos + weighted.neg, data = out)

Residuals:
      Min        1Q    Median        3Q       Max
-0.127379 -0.012972  0.002639  0.012891  0.057185

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)  -0.019392   0.009178  -2.113  0.03631 *
Momentum      0.171907   0.081764   2.102  0.03723 *
Positive      0.319033   0.101889   3.131  0.00210 **
Negative     -0.313860   0.228773  -1.372  0.17219
weighted.pos -3.435493   0.833410  -4.122 6.27e-05 ***
weighted.neg  4.197043   2.132710   1.968  0.05097 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02424 on 146 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.1495, Adjusted R-squared: 0.1204
F-statistic: 5.134 on 5 and 146 DF, p-value: 0.0002285

This model suggests relationships between market-adjusted returns and momentum, positive sentiment, and momentum-weighted positive sentiment, and perhaps a marginal relationship with momentum-weighted negative sentiment. At this point, we are just scratching the surface of the models we could build and interpret. Additionally, we could easily loop through a collection of tickers to generate a larger data set. The point is that once you have an access token, you are just minutes away from analyzing news analytics data in R.
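
Looping over several tickers could look something like the sketch below. The names buildPanel and fakeFetch are made up for illustration; fakeFetch stands in for createDataSet so the example runs without an access token, and the sketch assumes every call returns a data frame with the same columns.

```r
# Hypothetical wrapper: fetch one data frame per ticker and stack them.
# 'fetch' defaults to createDataSet, which must be defined in your session.
buildPanel <- function(tickers, fetch = createDataSet) {
  pieces <- lapply(tickers, function(tk) {
    # A failed download for one ticker should not abort the whole loop
    tryCatch(fetch(tk), error = function(e) NULL)
  })
  # Drop the failures and combine the rest row-wise
  do.call(rbind, Filter(Negate(is.null), pieces))
}

# Stand-in for createDataSet (made-up data, fails for the ticker "BAD")
fakeFetch <- function(tk) {
  if (tk == "BAD") stop("no data for this ticker")
  data.frame(Ticker = tk, Momentum = c(0.1, 0.2), stringsAsFactors = FALSE)
}

panel <- buildPanel(c("AMZN", "BAD", "MSFT"), fetch = fakeFetch)
```

With a real token you would call buildPanel with just the ticker vector. Because each result carries a Ticker column, the combined data frame can be modeled as a whole or split back out by company.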

