In this episode of Beneficial Intelligence, I discuss data hoarding. Gathering too much data costs money and doesn't add value. We think we need all this data to train our AI, but hoarding data is the wrong place to start.
Some say that "data is the new oil." That is a counterproductive and dangerous metaphor with no fewer than four problems:
- First, data is not fungible like oil is. One barrel of oil is just as valuable as the next barrel. But one data record does not have the same value as another data record.
- Second, data hoarding shows diminishing returns. The value of 100 million barrels of oil is 100 times the value of 1 million barrels. But the value of 100 million transaction records is not 100 times the value of 1 million transaction records.
- Third, the process of refining data into valuable business insight is not repeatable. Anybody can build an oil refinery. That's just a question of money. But extracting value from data is more art than science, and even with the best data scientists, you might still not be able to extract any value from your data.
- Fourth, the value density in data is very low. Everything in a barrel of oil becomes a useful product. But most data records do not provide any business insight.
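To make the diminishing-returns point concrete, here is a minimal sketch under one simplifying assumption: the only thing you want from your transaction records is an estimate of the average transaction amount. The precision of such an estimate improves with the square root of the number of records, so 100 times more data gives only about 10 times more precision. The function name `standard_error` and the value of `SIGMA` are hypothetical and chosen for illustration only.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of a sample mean computed from n independent records."""
    return sigma / math.sqrt(n)


# Assumed standard deviation of a single transaction amount (hypothetical value).
SIGMA = 50.0

small = standard_error(SIGMA, 1_000_000)    # 1 million records
large = standard_error(SIGMA, 100_000_000)  # 100 million records

# How much more precise the estimate becomes with 100 times more data.
improvement = small / large

print(f"Standard error with 1M records:   {small:.4f}")
print(f"Standard error with 100M records: {large:.4f}")
print(f"Precision improvement factor:     {improvement:.1f}")
```

Real business questions are rarely this simple, but the pattern is common: each additional record usually tells you less than the one before it.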
Gathering data in the hope of extracting value is putting the cart before the horse. The right way to work with data is to start with a business goal and a hypothesis about which data might provide insight. Gather that data, run the experiment, and evaluate. Don't just hoard data.
Beneficial Intelligence is a bi-weekly podcast with stories and pragmatic advice for CIOs, CTOs, and other IT leaders. To get in touch, please contact me at firstname.lastname@example.org