Which storage architectures best support mixed OLTP and OLAP big data workloads?

Mixed OLTP and OLAP workloads require storage architectures that balance low-latency transactional updates with high-throughput analytical scans. Systems designed around hybrid transactional/analytical processing (HTAP) aim to remove the traditional separation between OLTP and OLAP systems so that operational data can be queried analytically in near real time. Research by Michael Stonebraker at MIT and deployments such as SAP HANA, led by Hasso Plattner at SAP, indicate that combining in-memory techniques with careful layout choices can improve both throughput and latency. These approaches matter wherever a business needs immediate insight from live operational data, because they affect decision speed, customer experience, and regulatory reporting.

Storage layout strategies

A common practical answer is a dual-format or hybrid row/column architecture that stores recent and frequently updated records in a row-oriented, write-optimized form while exposing a columnar, read-optimized format for analytical queries. This design reduces write amplification for OLTP while enabling vectorized scans for OLAP. Systems that implement this pattern also rely on background transformation or real-time replication between formats, which introduces complexity and potential consistency trade-offs noted in studies by Andy Pavlo at Carnegie Mellon University. Latency-sensitive services such as e-commerce carts or payment processing favor row layouts for point updates, whereas aggregate reporting benefits from columnar compression and predicate pushdown.
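The dual-format idea can be illustrated with a toy in-memory store. This is a minimal sketch under stated assumptions, not how any particular product implements it: the class name `DualFormatStore`, its methods, and the explicit `merge()` call standing in for the background transformation are all hypothetical, and durability and concurrency control are left out entirely.

```python
class DualFormatStore:
    """Toy hybrid store: a row-oriented delta for writes, a columnar main for scans.

    All names here are illustrative assumptions, not a real system's API.
    """

    def __init__(self, columns):
        self.columns = list(columns)
        self.delta = {}                             # key -> row dict (write-optimized)
        self.main_keys = []                         # key stored at each columnar position
        self.main_index = {}                        # key -> position in the columnar main
        self.main = {c: [] for c in self.columns}   # column name -> list of values

    def upsert(self, key, row):
        # OLTP path: a point write touches only the row-oriented delta.
        self.delta[key] = dict(row)

    def get(self, key):
        # Point read: the delta holds the newest version, so check it first.
        if key in self.delta:
            return dict(self.delta[key])
        pos = self.main_index.get(key)
        if pos is None:
            return None
        return {c: self.main[c][pos] for c in self.columns}

    def merge(self):
        # Stand-in for the background transformation: fold delta rows into the columns.
        for key, row in self.delta.items():
            pos = self.main_index.get(key)
            if pos is None:
                self.main_index[key] = len(self.main_keys)
                self.main_keys.append(key)
                for c in self.columns:
                    self.main[c].append(row[c])
            else:
                for c in self.columns:
                    self.main[c][pos] = row[c]
        self.delta = {}

    def column_sum(self, column):
        # OLAP path: scan one column of the main, skip rows superseded in the delta,
        # then add the delta's versions so the result reflects the latest writes.
        total = sum(
            value
            for key, value in zip(self.main_keys, self.main[column])
            if key not in self.delta
        )
        return total + sum(row[column] for row in self.delta.values())


store = DualFormatStore(["amount", "qty"])
store.upsert(1, {"amount": 10, "qty": 1})
store.upsert(2, {"amount": 20, "qty": 2})
store.merge()                                # rows move to the columnar main
store.upsert(2, {"amount": 25, "qty": 2})    # later update lands in the delta
print(store.column_sum("amount"))            # scan combines main and delta
```

The sketch also shows where the consistency trade-off arises: until `merge()` runs, every analytical scan has to reconcile the main with the delta, so a larger delta makes scans slower, while merging more often competes with transactional work for resources.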

Underlying storage engines and consequences

Log-structured merge (LSM) trees and append-only write paths are common choices for write-heavy mixed workloads because they buffer writes in memory and flush them as sequential, immutable runs, keeping random I/O off the write path; background compaction then merges those runs. Compaction, however, creates variable I/O patterns that must be managed to avoid tail-latency spikes. In-memory primary storage paired with persistent columnar copies offers strong performance but increases memory footprint and cost, a material consideration for organizations in regions where hardware prices or energy constraints are limiting. Choosing a unified storage architecture also affects operational complexity, disaster recovery, and regulatory compliance: replication and snapshotting strategies must preserve transactional semantics while still enabling efficient analytical access.
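The write path and compaction step of a log-structured merge tree can be sketched in a few lines. This is a simplified illustration, assuming a single in-memory structure with no write-ahead log, no deletes (tombstones), and no leveling policy; the class `TinyLSM` and the `memtable_limit` parameter are hypothetical names chosen for the example.

```python
import bisect


class TinyLSM:
    """Toy LSM tree: an in-memory memtable plus immutable sorted runs.

    Illustrative only; real engines add a write-ahead log, tombstones, and levels.
    """

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit   # hypothetical flush threshold
        self.memtable = {}                     # absorbs writes in memory
        self.runs = []                         # sorted (key, value) lists, newest first

    def put(self, key, value):
        # Writes only touch the memtable; a full memtable is flushed as a new run.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.flush()

    def flush(self):
        # Emit the memtable as one sorted, immutable run (a sequential write on disk).
        if self.memtable:
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Reads check the memtable, then each run from newest to oldest.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None

    def compact(self):
        # Merge all runs into one, applying oldest first so newer values win.
        merged = {}
        for run in reversed(self.runs):
            merged.update(run)
        self.runs = [sorted(merged.items())] if merged else []


db = TinyLSM(memtable_limit=2)
db.put("a", 1)
db.put("b", 2)      # second write fills the memtable and triggers a flush
db.put("a", 3)
db.put("c", 4)      # another flush; "a" now exists in two runs
print(len(db.runs), db.get("a"))
db.compact()        # one run remains and the stale version of "a" is dropped
print(db.runs)
```

Before compaction, a read may have to probe every run, which is why read cost grows with the number of runs; compaction restores read performance but rewrites data in bulk, and that bulk rewrite is the source of the variable I/O pattern described above.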

Adopting hybrid storage is not a one-size-fits-all solution; it demands careful engineering of transaction isolation, background data movement, and resource isolation. When done correctly, as documented in academic and industrial work by researchers and practitioners at institutions such as MIT and SAP, hybrid architectures enable real-time analytics without sacrificing the responsiveness required by transactional systems.