Company | Entity Instances | Market Cap ($B) |
Goldman Sachs Group | 91738 | 79.3 |
Citigroup Inc. | 44675 | 119.6 |
JPMorgan Chase & Co. | 35547 | 165.5 |
Bank of America Corp. | 35419 | 171.2 |
American International Group | 25774 | 5.6 |
Morgan Stanley | 23041 | 38.9 |
Moody's Corp | 14813 | 5.3 |
Wells Fargo | 14244 | 175.2 |
SLM Corporation | 10133 | 5.8 |
Prudential Financial | 8216 | 29.8 |
NASDAQ OMX Group | 5846 | 4.2 |
American Express | 4564 | 53.0 |
CME Group Inc. | 4187 | 21.8 |
MetLife Inc. | 3923 | 36.0 |
Bank of New York Mellon Corp. | 3489 | 37.5 |
Simon Property Group Inc | 3216 | 26.4 |
E-Trade | 2717 | 3.5 |
BB&T Corporation | 2706 | 24.6 |
NYSE Euronext | 2419 | 8.2 |
State Street Corp. | 2368 | 21.46 |
Regions Financial Corp. | 2246 | 10.5 |
PNC Financial Services | 1973 | 36 |
SunTrust Banks | 1863 | 15.9 |
AFLAC Inc. | 1844 | 23.5 |
Fifth Third Bancorp | 1792 | 11.9 |
Capital One Financial | 1646 | 21.1 |
Ameriprise Financial Inc. | 1503 | 11.9 |
Allstate Corp. | 1470 | 17.8 |
Northern Trust Corp. | 1460 | 13.4 |
U.S. Bancorp | 1423 | 51. |
It’s interesting to note that while there is some correlation between market cap and number of entity instances, it’s not as strong as I might have expected. Some of this is obviously due to the coverage of various aspects of the financial crisis. It might take a little more investigation to understand why U.S. Bancorp and Northern Trust have similar degrees of coverage despite a nearly four-fold difference in market cap.
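As a rough check on that impression, here is a minimal Python sketch that computes two correlation measures over the first ten rows of the table above. The helper names (`pearson`, `ranks`, `spearman`) are my own, and restricting to ten rows is only to keep the example short; treat it as an illustration rather than the analysis behind this post.

```python
import math

# (entity instances, market cap in billions) for the first ten table rows
rows = {
    "Goldman Sachs Group": (91738, 79.3),
    "Citigroup Inc.": (44675, 119.6),
    "JPMorgan Chase & Co.": (35547, 165.5),
    "Bank of America Corp.": (35419, 171.2),
    "American International Group": (25774, 5.6),
    "Morgan Stanley": (23041, 38.9),
    "Moody's Corp": (14813, 5.3),
    "Wells Fargo": (14244, 175.2),
    "SLM Corporation": (10133, 5.8),
    "Prudential Financial": (8216, 29.8),
}

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def ranks(values):
    """Rank values from 1..n (assumes no ties, which holds for these rows)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = float(rank)
    return result

def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))

counts = [c for c, _ in rows.values()]
caps = [m for _, m in rows.values()]
print("Pearson:", round(pearson(counts, caps), 2))
print("Spearman:", round(spearman(counts, caps), 2))
```

The rank-based version is included because a single very large count, such as Goldman’s, can dominate the Pearson figure.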
To compare companies, I’m going to look at weekly entity instance counts for each of these companies between May of 2009 and May of 2010. For example, a portion of the Goldman and Citigroup coverage is included in the table below (ranging from the 19th week of 2009 to the 17th week of 2010).
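To make the weekly grouping concrete, here is a minimal Python sketch that buckets individual mentions into weekly counts. It assumes ISO week numbering, which I haven’t confirmed matches the week convention used here, and the `mentions` list and `weekly_counts` function are hypothetical stand-ins for the real harvested data.

```python
from collections import Counter
from datetime import date

def weekly_counts(mentions):
    """Count entity instances per (company, ISO year, ISO week)."""
    counts = Counter()
    for company, day in mentions:
        iso_year, iso_week, _ = day.isocalendar()
        counts[(company, iso_year, iso_week)] += 1
    return counts

# Hypothetical mentions: (company, publication date)
mentions = [
    ("Goldman Sachs Group", date(2009, 5, 4)),
    ("Goldman Sachs Group", date(2009, 5, 8)),
    ("Citigroup Inc.", date(2009, 5, 6)),
    ("Goldman Sachs Group", date(2010, 4, 28)),
]

counts = weekly_counts(mentions)
for (company, year, week), n in sorted(counts.items()):
    print(company, year, week, n)
```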

You can easily see that the number of entity instances included from early 2009 is smaller than what we see in 2010. This is due to the increase in the number of sources from which we are now harvesting. If we were trying to compare Goldman coverage in 2010 to Goldman coverage in 2009, we would need to normalize for this difference in harvesting sources. However, since I am interested in comparing companies, I need to use a different normalization approach. For my purposes today, I want to standardize the counts for each company: for each company, I’m going to subtract the mean of its weekly levels and divide by their standard deviation. This generates a standardized measure of entity instances for each company over time. I do this because I want to compare companies based on the pattern of news flow for each company rather than the volume.
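A minimal Python sketch of that standardization follows. The company names and weekly counts are made up for illustration, and using the sample standard deviation (dividing by n - 1) is an assumption on my part.

```python
from statistics import mean, stdev

def standardize(weekly_counts):
    """Return z-scores: (count - mean) / standard deviation."""
    mu = mean(weekly_counts)
    sigma = stdev(weekly_counts)  # sample standard deviation (n - 1); an assumption
    if sigma == 0:
        raise ValueError("a constant series has no spread to standardize")
    return [(x - mu) / sigma for x in weekly_counts]

# Hypothetical weekly counts: same pattern, very different volume
weekly = {
    "HighVolumeCo": [1000, 1200, 3000, 1100],
    "LowVolumeCo": [10, 12, 30, 11],
}

standardized = {name: standardize(series) for name, series in weekly.items()}
for name, z in standardized.items():
    print(name, [round(v, 2) for v in z])
```

The two hypothetical companies differ in volume by a factor of 100 but end up with the same standardized series, which is the point: only the shape of the news flow survives.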
I’m going to use a hierarchical clustering approach for this comparison. (Specifically, I’m using the agnes algorithm from the cluster package in R with Euclidean distances and average linkage.) The results of the clustering are seen below.
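For readers who don’t use R, here is a small pure-Python sketch of agglomerative clustering with Euclidean distances and average linkage. It is not the agnes implementation, just a naive illustration of the same idea; the function names and the three-series `demo` input are hypothetical.

```python
import math
from itertools import combinations

def euclidean(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(series):
    """Agglomerative clustering with Euclidean distance and average linkage.

    `series` maps a label to a list of standardized weekly values.
    Returns the merges in order as (members_a, members_b, height).
    """
    clusters = [(name,) for name in series]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            # Average linkage: mean of all pairwise distances between members.
            total = sum(euclidean(series[p], series[q]) for p in a for q in b)
            d = total / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        merges.append((a, b, d))
        # Replace the two closest clusters with their union.
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges

# Hypothetical standardized series: A and B move together, C moves opposite
demo = {
    "A": [-1.0, 0.0, 1.0],
    "B": [-0.9, 0.0, 0.9],
    "C": [1.0, 0.0, -1.0],
}

merges = average_linkage(demo)
for a, b, height in merges:
    print(a, "+", b, "at height", round(height, 3))
```

Each merge height is the average distance between the two groups being joined, which is what a dendrogram plots on its vertical axis. This naive version recomputes every pairwise distance on each pass, which is fine for thirty companies but not for large inputs.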

Companies are organized here by the similarity of the pattern of their news flow over the year’s worth of data I’ve included. The degree of similarity between two branches is indicated by how far from the bottom of the graph they join: the closer the connection is to the bottom, the more similar the news patterns of the companies are. Thus AIG and Prudential are more similar to each other than CME and MetLife are. From here it would be interesting to look deeper into the data that drives both the expected and unexpected groupings.
This is just one approach to making this type of comparison. I might have used a different time grouping, a different clustering approach, or even a different comparison framework (e.g., principal component analysis). I just wanted to take a first look at an approach and share it with interested observers.
