About This Talk
At SmartData 2020, I participated as an expert panelist evaluating big data storage technologies. The session provided a structured overview of the storage landscape — helping practitioners navigate the growing number of options and make informed technology choices.
Key Ideas
No Silver Bullet — The fundamental principle: no single storage technology is universally optimal. Each technology excels in specific scenarios and struggles in others. The goal is matching technology to requirements, not finding the “best” database.
Technology Categories — The panel covered: traditional RDBMS (PostgreSQL, MySQL) for transactional workloads with analytical extensions; columnar OLAP engines (ClickHouse, Vertica) for fast analytical queries; MPP warehouses (Greenplum, Redshift) for complex transformations at scale; cloud-native platforms (Snowflake, BigQuery) for managed analytics; and distributed systems (Cassandra, HBase) for extreme scale with specific access patterns.
Decision Framework — Start with query patterns and latency requirements (OLTP vs. OLAP, real-time vs. batch). Filter by data volume (gigabytes vs. terabytes vs. petabytes). Then evaluate operational complexity (managed vs. self-hosted). Finally, consider cost structure (fixed vs. consumption-based).
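The steps above can be sketched as a small filtering function. This is a minimal illustration, not something from the talk: the function name, the 1 TB cutoff, and the category labels are my own assumptions, chosen to mirror the categories the panel covered.

```python
# A sketch of the decision framework: narrow candidates step by step.
# All thresholds and labels below are illustrative assumptions.

def pick_storage(workload: str,   # "oltp" or "olap" (step 1: query pattern)
                 data_gb: float,  # working data volume (step 2)
                 managed: bool,   # prefer a managed service? (step 3)
                 ) -> list[str]:
    """Return candidate technology categories for a workload."""
    # Step 1: query pattern. Transactional workloads point at RDBMS.
    if workload == "oltp":
        return ["RDBMS (PostgreSQL, MySQL)"]
    # Step 2: data volume. Below ~1 TB (an assumed cutoff), a single
    # columnar engine is usually enough; beyond that, MPP or distributed.
    if data_gb < 1_000:
        candidates = ["columnar OLAP (ClickHouse, Vertica)"]
    else:
        candidates = ["MPP warehouse (Greenplum, Redshift)",
                      "distributed (Cassandra, HBase)"]
    # Step 3: operational complexity. A managed preference puts
    # cloud-native platforms at the front of the list.
    if managed:
        candidates.insert(0, "cloud-native (Snowflake, BigQuery)")
    # Step 4 (cost structure) is left to the reader: it depends on
    # usage patterns, not on a simple threshold.
    return candidates

print(pick_storage("oltp", 50, managed=False))
print(pick_storage("olap", 5_000, managed=True))
```

The point of the sketch is the ordering: query pattern eliminates the most options first, and cost is evaluated last, only among technologies that already fit the technical requirements.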
Why It Matters
The storage technology landscape is overwhelming. This framework helps cut through the marketing noise and make decisions based on actual technical requirements rather than vendor claims or industry trends.