Monday, October 25, 2010

Live Webinar: Recorded Future News Analytics API for Quantitative Trading

When: Monday, November 1, 2010 at 11:00AM EST
Where: Web conference (register here)


On Monday, November 1, we're hosting a webcast to formally announce the launch of our Recorded Future news analytics API and to demonstrate a simple modeling approach that can be set up in less than five minutes! Join this short web seminar to see how our API data is used to support quantitative investment and trading strategies.

We’ll open by detailing the pricing and plans for our API product and then provide an in-depth presentation of Recorded Future’s temporal data and web service capabilities, led by our Chief Analytic Officer, Dr. Bill Ladd.

In a demonstration of the API, we’ll show the retrieval of Recorded Future data live in an R analytic environment, how to integrate with existing financial data, and an initial modeling exercise. Additionally, we’ll discuss how the API can be used for activities ranging from construction of alpha-generating signals to regime change detectors.

We will cover how, using computational linguistics, we extract and temporally index events, entities, and related measures from a wide variety of online media. This structured data identifies historical, current, and expected future events as well as associated statistical measures such as momentum and sentiment.

Register for the November 1 event!

Friday, October 15, 2010

Minutes Away from Analyzing News Analytic Data


The Web Services API for Recorded Future is designed to answer some fairly sophisticated questions. Some people are writing applications that use our API to integrate Recorded Future content into their internal proprietary applications. This blog has posted the results of several complex analyses, and a variety of code examples using both R and Python are available on our Google Code site.

But even if Web Services, JSON, and Python are not familiar words to you, it's pretty straightforward to get data and start modeling. I have an example R program that you can get up and running in a few minutes; it retrieves average sentiment and momentum for a company, as well as its market performance, since January of 2009. With this data in hand, it's very straightforward to start looking for relationships.

It will of course be easiest to do this if you are familiar with R, but you simply need to install R and download three packages: fImport, rjson, and RCurl. Once you have an R session running,
you can enter these lines of code to download and install the required packages. (You only need to install them once; in subsequent sessions, just use the library commands.)

> install.packages("rjson")
> install.packages("RCurl")
> install.packages("fImport")

> library(RCurl)
> library(rjson)
> library(fImport)

The rest of the code in the file sets up an R function createDataSet (after you supply an access token for the Recorded Future API). This function accepts a stock ticker as input and retrieves Recorded Future data for the average sentiment and momentum for each day over an 18 month period. It also retrieves stock prices and trading volume for the ticker and for an ETF that tracks the S&P 500. From the retrieved data, it calculates a number of derived values including:
  1. The difference between positive and negative sentiment
  2. The positive and negative sentiments multiplied by momentum
  3. The returns for the selected ticker and for the market index for the day after the news analytic metrics are calculated
  4. The market adjusted returns of the ticker
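
To give a sense of what that looks like, here is a minimal sketch of the derived-value step only. It assumes a data frame that already merges the Recorded Future metrics with the price history, using the column names shown in the output below; the helper name addDerivedValues is hypothetical, and the formulas simply follow the descriptions above rather than reproducing the posted file line for line.

# Hypothetical helper illustrating the derived-value step of createDataSet.
# It expects a data frame `rf` that already contains Positive, Negative,
# Momentum, Adj.Close, and SPY.Adj.Close columns. Illustrative only --
# see the posted file for the actual implementation.
addDerivedValues <- function(rf) {
  # 1. Difference between positive and negative sentiment
  rf$sentiment.difference <- rf$Positive - rf$Negative
  # 2. Positive and negative sentiments multiplied by momentum
  rf$weighted.pos <- rf$Positive * rf$Momentum
  rf$weighted.neg <- rf$Negative * rf$Momentum
  # 3. Log returns for the ticker and the market index on the following day
  rf$returns     <- c(diff(log(rf$Adj.Close)), NA)
  rf$spy.returns <- c(diff(log(rf$SPY.Adj.Close)), NA)
  # 4. Market adjusted returns of the ticker
  rf$marketAdjustedReturns <- rf$returns - rf$spy.returns
  rf
}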


Subscribers to the Recorded Future API can now easily create a simple news analytic data set for an individual ticker, say the ticker for Amazon.com, as follows:

> out<-createDataSet("AMZN")

yielding a data frame structured like this:

> tail(out)
Day Entity Count Momentum Positive Negative Ticker
265 2009-09-22 33328212 64 0.0803539 0.0896110 0.01840840 AMZN
266 2009-09-23 33328212 157 0.1367590 0.0645906 0.01073010 AMZN
267 2009-09-24 33328212 34 0.0782548 0.0388494 0.00448654 AMZN
268 2009-09-25 33328212 76 0.1315870 0.0913595 0.01383090 AMZN
272 2009-09-29 33328212 52 0.0822941 0.1491080 0.00000000 AMZN
273 2009-09-30 33328212 64 0.0961285 0.0641504 0.04386140 AMZN

sentiment.difference weighted.pos weighted.neg Open High Low Close
265 -0.01840840 0.007200593 0.0014791867 91.46 94.19 91.10 93.75
266 -0.01073010 0.008833346 0.0014674377 92.82 94.50 92.22 92.38
267 -0.00448654 0.003040152 0.0003510933 92.00 92.71 90.77 92.11
268 -0.01383090 0.012021723 0.0018199666 91.44 92.25 89.75 90.52
272 0.00000000 0.012270709 0.0000000000 91.96 92.33 90.10 91.72
273 -0.04386140 0.006166682 0.0042163306 92.26 94.17 91.43 93.36

Volume Adj.Close returns SPY.Open SPY.High SPY.Low SPY.Close
265 8264900 93.75 -0.034619050 107.08 107.37 106.60 107.07
266 5685300 92.38 0.014721160 107.32 108.03 105.99 106.18
267 5075100 92.11 0.002926990 106.41 106.64 104.55 105.01
268 4256800 90.52 0.017412694 104.78 105.36 104.09 104.45
272 4393900 91.72 0.005328127 106.51 107.02 105.78 106.00
273 8539200 93.36 -0.017722530 106.36 106.46 104.62 105.59

SPY.Volume SPY.Adj.Close spy.returns marketAdjustedReturns weekday
265 143126700 105.43 -0.005802632 -0.028816418 Tue
266 225947400 104.55 0.008381800 0.006339360 Wed
267 228636800 103.40 0.011060464 -0.008133474 Thu
268 204059000 102.85 0.005333346 0.012079348 Fri
272 133733900 104.37 0.003061324 0.002266802 Tue
273 254383000 103.97 0.003839882 -0.021562412 Wed

Note that sentiment is divided into individual measures for positive and negative sentiment. We find these values to be independently interesting and model them separately.

With this data frame created, there are many exploratory possibilities. For example, we can create a linear model of the relationships between our news analytic metrics and market adjusted returns:

> model.fit<-lm(marketAdjustedReturns~Momentum+Positive+Negative+
weighted.pos+weighted.neg,data=out)
> summary(model.fit)
Call:
lm(formula = marketAdjustedReturns ~ Momentum + Positive + Negative +
weighted.pos + weighted.neg, data = out)

Residuals:
Min 1Q Median 3Q Max
-0.127379 -0.012972 0.002639 0.012891 0.057185

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.019392 0.009178 -2.113 0.03631 *
Momentum 0.171907 0.081764 2.102 0.03723 *
Positive 0.319033 0.101889 3.131 0.00210 **
Negative -0.313860 0.228773 -1.372 0.17219
weighted.pos -3.435493 0.833410 -4.122 6.27e-05 ***
weighted.neg 4.197043 2.132710 1.968 0.05097 .
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.02424 on 146 degrees of freedom
(4 observations deleted due to missingness)
Multiple R-squared: 0.1495, Adjusted R-squared: 0.1204
F-statistic: 5.134 on 5 and 146 DF, p-value: 0.0002285

This model suggests relationships between market adjusted returns and momentum, positive sentiment, and momentum-weighted positive sentiment, along with perhaps a marginal relationship with momentum-weighted negative sentiment. At this point, we are just scratching the surface of the models we could build and interpret. Additionally, we could easily loop through a collection of tickers to generate a larger set of data, as sketched below. The point is that once you have an access token you are literally just minutes away from analyzing news analytics data in R.
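
For example, a pooled data set across several tickers might be built along the following lines. This is a sketch only: it assumes createDataSet is defined as above, and the ticker list is purely illustrative.

# Build one data frame covering several tickers by stacking the per-ticker
# results from createDataSet. The tickers chosen here are arbitrary examples.
> tickers <- c("AMZN", "AAPL", "GOOG", "MSFT")
> pooled <- do.call(rbind, lapply(tickers, createDataSet))

# The same linear model can then be fit on the pooled data:
> pooled.fit <- lm(marketAdjustedReturns~Momentum+Positive+Negative+
weighted.pos+weighted.neg,data=pooled)
> summary(pooled.fit)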


Friday, October 8, 2010

Recorded Future Day Trading

At Recorded Future, we often get asked how our news analytic technology can fit into an automated trading strategy. The answer: We offer a rich web service API that provides near real-time access to content as it is processed by our system. We've just posted some example R code to our Google Code site which illustrates how one might incorporate our API into this kind of strategy. At a high level, the code works as follows:

1) Load up a query which monitors for new content related to S&P 500 companies.
2) Every five minutes, poll the Recorded Future API for new content related to these companies (on the basis of the time the document was analyzed by our system).
3) If we see a new occurrence of one of these companies in source content, check to see whether that occurrence matches the following criteria:
  • Is this content truly relevant to the company at hand (using our new "relevance" score)?
  • Does the content have sufficient positive (and insufficient negative) sentiment associated with it?
  • Does the company in question have sufficiently high momentum?
  • Is the company NOT already in our portfolio?
4) If the occurrence matches those criteria, we execute a paper "buy" order based on the current stock price (using near real-time quotes from Google Finance).
5) At the end of the day, get the closing price for every stock in our paper portfolio and execute a paper sell at that price.
6) Calculate profits and losses on the basis of the trades made during the day.
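
The sketch below is a heavily simplified skeleton of that loop, not the posted code itself: the helpers marketIsOpen(), getNewCompanyMentions(), getLiveQuote(), and getClosingPrice() are placeholders for the Recorded Future API and Google Finance calls in the example on our Google Code site, and the thresholds are arbitrary.

# Simplified skeleton of the intraday paper-trading loop described above.
# All helper functions are placeholders; see the posted R code for the real calls.
portfolio <- data.frame(ticker = character(), trade_time = character(),
                        price = numeric(), stringsAsFactors = FALSE)

while (marketIsOpen()) {                               # placeholder market-hours check
  mentions <- getNewCompanyMentions()                  # new S&P 500 content since last poll
  for (m in mentions) {
    qualifies <- m$relevance > 0.5 &&                  # truly about this company
                 m$positive  > 0.10 &&                 # sufficient positive sentiment
                 m$negative  < 0.05 &&                 # insufficient negative sentiment
                 m$momentum  > 0.10 &&                 # sufficiently high momentum
                 !(m$ticker %in% portfolio$ticker)     # not already in the portfolio
    if (qualifies) {
      portfolio <- rbind(portfolio,
                         data.frame(ticker     = m$ticker,
                                    trade_time = format(Sys.time(), "%H-%M-%S"),
                                    price      = getLiveQuote(m$ticker),   # paper "buy"
                                    stringsAsFactors = FALSE))
    }
  }
  Sys.sleep(5 * 60)                                    # poll every five minutes
}

# At the end of the day, mark each paper position against its closing price
portfolio$close   <- sapply(portfolio$ticker, getClosingPrice)
portfolio$returns <- 100 * (portfolio$close - portfolio$price) / portfolio$price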

Let's have a look at the results of this strategy, which was run on live data from Friday, October 1, 2010:


   ticker trade_time   price   close   returns(%)
1  GE       09-55-00   16.48   16.36  -0.728155340
2  ORCL     09-55-00   27.37   27.24  -0.474972598
3  GOOG     09-55-00  528.17  525.62  -0.482799099
4  WFC      10-00-00   25.42   25.56   0.550747443
5  BAC      10-15-00   13.15   13.30   1.140684411
6  T        10-20-00   28.82   28.81  -0.034698126
7  MSFT     10-25-00   24.50   24.38  -0.489795918
8  AMZN     10-30-00  152.99  153.71   0.470618995
9  KFT      10-40-00   30.92   31.21   0.937904269
10 AAPL     10-45-00  283.13  282.52  -0.215448734
11 WMT      10-50-00   53.29   53.36   0.131356727
12 HPQ      11-00-00   40.86   40.77  -0.220264317
14 MOT      11-20-00    8.55    8.56   0.116959064
15 VZ       11-30-00   32.87   32.89   0.060845756
16 S        11-35-00    4.65    4.72   1.505376344
17 F        12-10-00   12.38   12.26  -0.969305331
18 YHOO     12-15-00   14.18   14.27   0.634696756
19 GS       13-05-40  147.44  147.70   0.176342919
20 INTC     13-25-49   19.29   19.32   0.155520995
21 C        15-01-14    4.09    4.09   0.000000000
22 IBM      15-51-26  135.65  135.64  -0.007371913
23 NYT      15-51-26    7.83    7.85   0.255427842



You can see that we executed 23 "buys" at various times throughout the day. Our average profit was +11bp per trade, with our best trade at +115bp and our worst at -96bp.

This is obviously a naive trading strategy, and customers are using much more sophisticated approaches. This approach does not account for trading costs, is based on an extremely small sample, and has no risk controls. The purpose of this example is to show how one could incorporate the Recorded Future API into a live trading strategy and to make available sample code for performing these operations. Via the API, we also offer this data historically for modeling and strategy building. Take a look at our Google Code site for more information about our API, and contact sales@recordedfuture.com to get access.