Excel - The Dirty Little Secret
In spite of ALL the hype in the data analytics space, the dirty little secret is that Excel remains the most powerful, most widely adopted BI tool in corporate America. Which makes perfect sense: it combines data storage, statistical functions, regular expressions, search, lookup, and graphing/charting in one package with a very low learning curve. But as with most dirty little secrets, there are reasons it’s being kept…
Before we get to those reasons, a little more on Excel. Prior to the early 1980s, keeping track of data was even more difficult than it is today. Granted, there was much less of it, but what there was had to be tracked by hand in ledgers and manipulated with complex equations on paper. Then in 1983 the world got Lotus 1-2-3; in 1987 Excel 2.0 launched for Windows, and the rest is history.
Jump to “The Internet” in the early 1990s - at least the World Wide Web part of the Internet that became universally available to anyone with a computer, a browser, and a dial-up modem - and the groundswell of easily created and accessed digital data began. And as most of us know, the swell has only continued up to this very minute and will continue indefinitely. For perspective, the World Economic Forum estimates that by 2025 the amount of data generated each day will reach 463 exabytes (one exabyte is 10^18 bytes - a billion gigabytes). For more perspective, all the words ever spoken by humans are estimated to total only about 5 exabytes.
So back to Excel and the big issues that corporate America knowingly faces by continuing to use it as its go-to BI tool:
- Data today is just too varied and voluminous for Excel to be effective as an enterprise-grade tool. A worksheet is finite: 1,048,576 rows, 16,384 columns, 32,767 characters per cell, plus hard caps on links and other objects (the first sketch after this list makes the size limits concrete).
- Excel can be a memory hog! With huge datasets, the memory and computational power required grow quickly, and a computer will grind to a halt well before the spreadsheet’s hard limits are ever reached.
- Spreadsheets have limited querying capabilities, and answering a question often means reformatting the sheet itself (the data) just to display results in a particular order. A database answers the same question with a query that leaves the stored data untouched (see the second sketch after this list).
- Spreadsheet data quality can be terrible: weak validation, duplicate entries, sparse data, typos, and no real version control.
- Excel offers few tools and features to keep data secure and reliable. For smaller collections of data it works just fine, but it becomes riskier and less dependable as volumes grow.
- Excel is suited for basic statistical analysis, such as linear regression - not complicated data science tasks.
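To make those size limits concrete, here is a minimal Python sketch. The constants are Microsoft’s documented per-worksheet maximums for modern .xlsx files; the check function and the sample numbers are ours, purely for illustration:

```python
# Documented per-worksheet maximums for .xlsx (Excel 2007 and later).
EXCEL_MAX_ROWS = 1_048_576       # 2**20 rows
EXCEL_MAX_COLS = 16_384          # 2**14 columns (the last column is "XFD")
EXCEL_MAX_CELL_CHARS = 32_767    # characters in a single cell

def fits_in_excel(n_rows: int, n_cols: int) -> bool:
    """True if a table of this shape fits on one worksheet
    (reserving a single row for headers)."""
    return n_rows + 1 <= EXCEL_MAX_ROWS and n_cols <= EXCEL_MAX_COLS

# A hypothetical event log at 2 million rows per day overflows
# a worksheet on day one, long before any analysis happens.
print(fits_in_excel(2_000_000, 12))  # False
```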
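And to show what “querying without reformatting” looks like, here is a sketch using Python’s built-in sqlite3 module; the sales table and its rows are invented for the example:

```python
import sqlite3

# An in-memory database with a hypothetical sales table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount REAL)")
con.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("East", 120.00), ("West", 340.50), ("East", 98.25), ("North", 210.00)],
)

# The ORDER BY clause sorts the *results*; the stored rows are never
# rearranged, unlike a spreadsheet that must be re-sorted in place.
for region, total in con.execute(
    "SELECT region, SUM(amount) AS total FROM sales "
    "GROUP BY region ORDER BY total DESC"
):
    print(region, total)
```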
To solve for every one of these issues, enterprises should look for opportunities to ditch the spreadsheets and embrace a modern data supply chain that securely ingests, stores, transforms, and exports any variety and volume of data. This “data first” approach ensures that data is treated as an organizational asset, maintains a high degree of quality, and supports a single version of the truth for data science, business intelligence, and self-service data exploration and discovery. It’s time to get the Excel skeleton out of the corporate closet and build a data supply chain infrastructure suitable for the realities of 2022.
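For a feel of what that ingest-validate-store flow can look like at its absolute smallest, here is a toy sketch in Python; the file layout, schema, and validation rules are placeholder assumptions, not a reference implementation:

```python
import csv
import sqlite3

def ingest_orders(csv_path: str, con: sqlite3.Connection) -> int:
    """Load a CSV of orders into SQLite, enforcing the basic quality
    checks (types, required fields, no duplicates) a spreadsheet skips."""
    con.execute(
        "CREATE TABLE IF NOT EXISTS orders (order_id TEXT PRIMARY KEY, amount REAL)"
    )
    loaded = 0
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            try:
                amount = float(row["amount"])       # reject non-numeric typos
                con.execute(
                    "INSERT INTO orders VALUES (?, ?)",
                    (row["order_id"], amount),      # PRIMARY KEY rejects duplicates
                )
                loaded += 1
            except (KeyError, ValueError, sqlite3.IntegrityError):
                continue  # a real pipeline would quarantine bad rows instead
    con.commit()
    return loaded
```

Once data lands in a governed store like this, every downstream consumer, from BI dashboards to data science notebooks, reads the same single version of the truth instead of a forked spreadsheet.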