Monday, January 9, 2017

January 2017 Data Update 1: The Promise and Perils of "Big Data"!

Each year, for the last 25 years, I have spent the first week playing Moneyball with financial data. I gather accounting and market data on all publicly traded companies, listed globally, and then try to extract whatever lessons I can from the data, to use in investing, corporate finance and valuation for the rest of the year. I report the data, classified by industry group and by country, on my website, in the hope that others might find it useful. While, like last year, I will be summarizing what I see in the data in a series of posts over the rest of January, I decided to use this one to provide some perspective, and a few cautionary notes, not only on my data but on numbers in general.

The Number Cruncher's Delusions
In an earlier post on narrative and numbers, I confessed that I am more naturally a number cruncher than a story teller and that I have learned through experience that focusing entirely on the numbers can lead you astray in valuation and investing. In fact, as you read my posts on what the numbers look like at the start of 2017, it is also worth noting that I am, like all number crunchers, susceptible to three delusions about data:
  1. Numbers are precise: I say, only half jokingly, that when a number cruncher is in doubt, his or her reaction is to add more decimals, in the hope that making a number look more precise will make it so. The truth is that numbers are only as precise as the process that delivers them, and in business, that process makes them imprecise. Thus, when you peruse the returns on capital or costs of capital that I will be estimating and reporting for both companies and industry groups, please do recognize that the former is an accounting number, where discretionary choices on expensing and depreciation can translate into big changes in returns on capital, and the latter is a market number, making it not only a moving target (as interest rates and risk premiums change) but also a function of my estimation choices, as well as errors in estimating risk premiums and risk parameters. 
  2. Numbers are objective: One of the resentments that number crunchers have about story tellers is that the latter indulge in flights of fancy and are unashamed about bringing their biases into their stories and, through them, into pricing and investing. The problem, though, is that numbers can be just as biased as stories, with the caveat that it is easier to hide biases with numbers. To give one example, one of the datasets that I will be updating has tax rates paid by US companies in 2016, and I provide three measures of effective tax rates, ranging from a simple average of effective tax rates across all companies in a sector, which yields the lowest values, to a weighted average effective tax rate computed only across money-making firms, which yields much higher values; the sketch after this list illustrates how much that choice can matter. If you are dead-set on making a case that US companies don't pay their fair share in taxes, you will report only the first number and not mention the rest, whereas if you want to show that US companies pay their fair share and more in taxes, you will go with the latter. It is for this reason that I will not claim to be unbiased (since no one is), but I will try to provide multiple measures of widely used variables and leave it to you to decide which one best fits your preconceptions. 
  3. Numbers put you in control: It is human nature to try to be in control and numbers serve us well, in that pursuit. As in other aspects of life, we seem to think that attaching a number to a volatile or uncontrollable variable brings it under control. So, at the risk of stating the obvious, let me say that measuring your return on invested capital is not going to turn bad projects into good ones, just as estimating your interest coverage ratio is not going to make it easier for you to make your interest payments. 
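To make the second of these delusions concrete, here is a minimal sketch, in Python, of how the choice of averaging method changes a sector's reported effective tax rate. The figures are made up purely for illustration, and the middle measure (a simple average across money-making firms only) is an assumption about one plausible intermediate calculation, not a description of the exact measures in my dataset.

```python
# Illustrative sketch only: how the averaging choice moves a sector's
# "effective tax rate". All company figures below are hypothetical.

companies = [
    # (effective tax rate, taxable income in $ millions)
    (0.10,   100.0),
    (0.35,   200.0),
    (0.00,  -150.0),   # money-losing firm, recorded with a zero tax rate
    (0.30,  2000.0),
]

# 1. Simple average across ALL firms (zero-tax, money-losing firms pull it down).
simple_all = sum(rate for rate, _ in companies) / len(companies)

# 2. Simple average across money-making firms only (an assumed middle measure).
profitable = [(rate, income) for rate, income in companies if income > 0]
simple_profitable = sum(rate for rate, _ in profitable) / len(profitable)

# 3. Weighted average across money-making firms:
#    aggregate taxes paid divided by aggregate taxable income.
weighted_profitable = (
    sum(rate * income for rate, income in profitable)
    / sum(income for _, income in profitable)
)

print(f"Simple average, all firms:          {simple_all:.1%}")           # ~18.8%
print(f"Simple average, profitable firms:   {simple_profitable:.1%}")    # 25.0%
print(f"Weighted average, profitable firms: {weighted_profitable:.1%}")  # ~29.6%
```

Three defensible calculations, three different answers; which one gets quoted often says more about the user's agenda than about the data.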
Don't get me wrong! I remain, at heart, a number cruncher but I have a more complicated, and healthier, relationship with data than I used to have. My faith in data has been tempered by my experiences with data, and especially so with the ease with which I have seen it bent to reflect the agenda of the user. I trust numbers, but only after I verify them, and I hope that you will do the same with the data that you find on my site.

A Big Data Skeptic
It is my experience with data that makes me skeptical about two of the hottest concepts in business, big data and data analytics, at least as a basis for making money. It is true that companies are collecting more data than ever before on almost every aspect of our lives, with the intent of using that data to make more money off us. Even in a capitalist society, I remain doubtful that big data will be monetized, for three reasons.
  1. Data is not information: Not all data is created equal. Data based on what you do is worth a lot more than data based on what you say you will do; a tweet that you are bullish on Apple, Twitter or the entire market is less useful data than a record of you buying Apple, Twitter or the entire market. This is a point worth remembering as the rush is on to combine social media data (from Twitter and Facebook) with financial data to create super databases. In addition, as we collect and store more data, it is worth noting that data is not information. In fact, if data analytics does its job, its focus will remain on converting data into information, rather than on generating neat-looking graphs and obscure statistics. 
  2. If everyone has it (data), no one has it: For data to have value, you have to have some degree of exclusivity in access to that data or a proprietary edge in processing it. It is one of the reasons that investors have been unable, for the most part, to convert increased access to financial data into investing profits.
  3. Not all data is actionable: To convert data into profits, you need to find a way to monetize whatever data edge you have acquired. For companies that offer products and services, this will take the form of modifying existing products and services, or coming up with new ones, in response to what you have learned from the data.
As you look at these three factors, it is easy to see why Netflix and Amazon have become illustrative examples of the benefits of big data. They get to observe us (as consumers) in action, Amazon watching what we buy and Netflix observing what we watch on our devices, and that information is not only proprietary but can be used both to modify product offerings and to nudge us to act in ways that benefit the companies. By the same token, you can also see why using big data as an investing advantage will, at best, provide a transitory edge, and why I feel no qualms about sharing my data. 

Data Details
If you choose to use any of my data, it behooves me to take you through the process by which I collect and analyze the data and offer some cautionary notes along the way. 
  1. Raw Data: The first step in the process is collecting the raw data, and I am deeply thankful to the data services that allow me to do this. I use S&P Capital IQ, Bloomberg and a host of specialized services (Moody's, PRS etc.). For company-specific data, the only criterion that I use for including a company is that it have a non-zero market capitalization, yielding a total of 42,678 firms on January 1, 2017. The data collected is as of January 1, 2017, with market data (stock prices, market capitalization and interest rates) being as of that date, but accounting data reflecting the most recent twelve months (which would be through September 30, 2016 for calendar year companies). 
  2. Classification: I classify these companies first by geographic group into five groups - the United States, Japan, Developed Europe (including the EU and Switzerland), Emerging Markets (including Eastern Europe, Asia, Africa and Latin America) and Australia/New Zealand/Canada, a somewhat arbitrary grouping that I am stuck with because of history.
    I also classify firms into 96 industry groups, built loosely on my raw data providers' industrial groupings and SIC codes. The number of firms in each industry group, broken down further by geographic grouping, can be found at this link and you can find the companies in each industry grouping at this link.
  3. Key numbers: I generally don't report much macroeconomic data (interest rates, inflation, GDP growth etc.), since there are much better sources for the data, with my favorite remaining FRED (the Federal Reserve data site in St. Louis). I update equity risk premiums not only for the US but for much of the world at the start of every year and will update them again in July 2017. Using the company data, I report on dozens of metrics at the industry group and geographic levels on profitability, cost of capital, relative risk and valuation ratios and you can find the entire listing here.
  4. Computational details: One of the lessons that I have learned from wrestling with the data is that computing even simple statistics requires making choices, which, in turn, can be affected by your biases. Just to provide an example, to compute the PE ratio for US steel companies, I can take a simple average of the PE ratios of the companies, but that will not only weight tiny companies and very large companies equally but will also eliminate any companies that have negative earnings from my sample (causing bias in my estimates). To get around this problem, for most of the industry average statistics, I aggregate values across companies and then compute ratios. With the PE ratio for US steel companies, for instance, I aggregate the net income of all steel companies (including money-losing companies) and the market capitalizations for the same companies and then divide the latter by the former to get the PE ratio; there is a sketch of this calculation after this list. Think of these averages, then, as weighted averages of all companies in each industry group, perhaps explaining why my numbers may be different from those reported by other services. 
  5. Reporting: I have wrestled with how best to report this data, so that you can find what you are looking for easily. I have not found the perfect template, but here is how you will find the data. For the current data (from January 2017), go to this link. You will see the data classified into risk, profitability, capital structure and dividend policy measures, reflecting my corporate finance focus, and then into pricing groups (earnings multiples, book value multiples and revenue multiples). I also keep archived data from prior years (going back to 1999) at this link. Unfortunately, since I have had to switch raw data providers multiple times in the last 20 years, the data is not perfectly comparable over time, as both industry groupings and data measures change over time. 
  6. Usage: There are two ways you can get the data. For the US data, I have html versions that you can see in your browser. For all of the data, I have Excel spreadsheets that you can download. I would strongly encourage you to use the latter rather than the former, since you can then manipulate and work with the data. If you have questions about any of the variables and how exactly I define them, try this link, where I summarize my computational details.
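As a companion to the computational details in item 4 above, here is a minimal sketch, in Python, of the difference between a simple average of PE ratios and an aggregated (value-weighted) PE ratio. The company figures are hypothetical; this illustrates the aggregation idea described above, not the actual code behind the datasets.

```python
# Illustrative sketch only: simple-average PE versus aggregated PE
# for a hypothetical industry group. Figures are made up.

companies = [
    # (market capitalization, net income), both in $ millions
    (4000.0,  250.0),
    ( 300.0,   10.0),
    ( 800.0,  -60.0),   # money-losing firm: a simple average of PEs drops it
]

# Simple average of PE ratios: equal-weights tiny and huge firms and
# silently excludes money-losing companies.
pe_ratios = [cap / income for cap, income in companies if income > 0]
simple_average_pe = sum(pe_ratios) / len(pe_ratios)

# Aggregated PE: total market capitalization divided by total net income,
# keeping the money-losing firm in both totals.
aggregate_pe = (
    sum(cap for cap, _ in companies)
    / sum(income for _, income in companies)
)

print(f"Simple average PE (profitable firms only): {simple_average_pe:.1f}")  # 23.0
print(f"Aggregated PE (all firms):                 {aggregate_pe:.1f}")       # 25.5
```

The aggregated number behaves like a value-weighted average, which is one reason my figures can differ noticeably from averages reported by services that use the simple approach.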
In Closing
I am a one-man operation and I am sure that there are datasets that I have not updated or where you find missing pieces. If you find any of these, please let me know, and I will try to fix them. I also don't see myself as a raw data provider, especially on a real-time basis and on individual companies. So, I don't plan to update this data over the course of the year, partly because industry averages should not have dramatic changes over a few months and partly because I have other stuff that I would rather do.

YouTube Video


Data Links
  1. Current Data on my website
  2. Archived Data on my website

1 comment:

Unknown said...

One version of the efficient market hypothesis goes (or went) like this: an individual investor cannot beat the market because prices reflect all known information.

My problem with that sentiment is that I don't believe prices reflect all known information...prices reflect the interpretation of (i.e., insights into) all known information. And that is a pretty big difference.

All investors had all public knowledge of, say, Coca-Cola back in the early 1990s. But it was Warren Buffett's insights that led him to understand the intrinsic value of Coca-Cola and make the home run investment that he did.

Just so with your notion that if everyone has data no one has data. The difference is what someone does (or can do) with the data, and how they interpret the results.

T.E. Lawrence said that nine tenths of tactics are certain enough to be taught in books, but the irrational tenth is like the kingfisher flashing across the pool, and that is the test of generals.

It is the same with data. Nine tenths (or more) of the available data and the things people do with it will be predictable and mundane. But the one to watch for is the kingfisher that sees something no one else sees, who can act effectively on that insight.