Choosing a Database Based on Hype Is How Systems Fall Apart
by Arif Ikhsanudin, Backend Developer
The Pattern Is Consistent
MongoDB had its moment — "web scale," schema flexibility, JSON documents. Teams migrated from PostgreSQL to MongoDB for applications that were fundamentally relational, discovered that referential integrity, transactions, and complex queries were harder to implement correctly, and spent months adding back the consistency guarantees they had given up. Some migrated back.
Cassandra had its moment — "massive write throughput, linear horizontal scale." Teams adopted it for applications that wrote thousands of records per day, not millions per second, and discovered they had added significant operational complexity for a problem they did not have.
CockroachDB, PlanetScale, Fauna, SurrealDB: each has had or will have its moment. The adoption cycle is consistent. A conference talk demonstrates an impressive benchmark, a blog post explains why the new engine is the future, teams adopt it for new projects without fully understanding the trade-offs, and operational problems emerge that were not visible under benchmark conditions.
Why Benchmarks Mislead
Published database benchmarks are usually run under the conditions that make the benchmarked database look best. A benchmark demonstrating Cassandra's write throughput typically runs at high write concurrency, with simple key-value writes and eventual consistency. It does not test complex queries, secondary index performance under high read concurrency, or operational behavior during a node failure with coordinator-heavy workloads.
Real production workloads have messy access patterns. They have a mix of reads and writes. They have queries that were not anticipated at schema design time. They have operational requirements — backup, point-in-time recovery, monitoring, slow query analysis — that vary significantly across database engines.
What benchmarks show vs. what production reveals:
Benchmark conditions:
- Uniform key distribution
- Controlled concurrent clients
- Single table / simple schema
- No operational events (node failures, compaction)
- Measured at peak throughput
Production conditions:
- Hot keys (popular items get disproportionate traffic)
- Unpredictable concurrency spikes
- Complex schemas with joins and secondary indexes
- Operational events that impact performance
- Measured at p95/p99, not average
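The gap between an average and a tail percentile can be sketched in a few lines of Python. This is a toy simulation, not a benchmark: the latency ranges (1 to 3 ms for ordinary requests, 40 to 60 ms for requests that land on a contended hot key) and the 8% hot-key share are made-up numbers chosen only to show the shape of the problem, and simulate_latencies, percentile, and summarize are hypothetical helper names.

```python
import math
import random
import statistics


def simulate_latencies(n_requests, hot_fraction, seed=0):
    """Return simulated per-request latencies in milliseconds.

    Assumed model (illustrative only): a `hot_fraction` share of requests
    hits a contended hot key and is slow; everything else is fast.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    latencies = []
    for _ in range(n_requests):
        if rng.random() < hot_fraction:
            latencies.append(rng.uniform(40.0, 60.0))  # hot-key request
        else:
            latencies.append(rng.uniform(1.0, 3.0))    # ordinary request
    return latencies


def percentile(values, pct):
    """Nearest-rank percentile: the value at rank ceil(pct/100 * n)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def summarize(latencies):
    """Report the mean next to the median and tail percentiles."""
    return {
        "mean": statistics.fmean(latencies),
        "p50": percentile(latencies, 50),
        "p95": percentile(latencies, 95),
        "p99": percentile(latencies, 99),
    }


if __name__ == "__main__":
    for name, value in summarize(simulate_latencies(10_000, 0.08)).items():
        print(f"{name}: {value:.1f} ms")
```

With these assumed numbers the median stays between 1 and 3 ms and the mean stays in single digits, while p95 and p99 land in the hot-key range. A report that shows only the mean or the peak throughput would hide that.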
The Criteria for Database Selection
Data model fit. Does the database's native data model match your data? Relational for structured data with relationships and ad-hoc query needs. Key-value for high-throughput point lookups. Wide-column for time-series or access patterns partitioned by a single key. Document for genuinely schema-variable data. Using a document database for relational data or a key-value store for data requiring complex queries creates friction throughout the application.
Consistency model fit. What consistency guarantees do your transactions require? If you are handling money, reservations, or anything where double-writes or dirty reads cause real problems, you need serializable isolation or equivalent guarantees. Many NoSQL systems trade consistency for availability and throughput. That trade is fine if you understand it and your use case tolerates it. It is catastrophic if you discover it after the fact.
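As a minimal sketch of what a transactional guarantee buys you, the following uses SQLite from Python's standard library as a stand-in for a relational database. It illustrates atomicity only (a failed transfer leaves no partial write behind); it does not demonstrate isolation levels or concurrent access. The accounts table, the balances, and the open_ledger, transfer, and balances helpers are all hypothetical names invented for this example.

```python
import sqlite3


def open_ledger():
    """Create an in-memory database with two example accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts ("
        " id TEXT PRIMARY KEY,"
        " balance INTEGER NOT NULL CHECK (balance >= 0))"  # no overdrafts
    )
    conn.executemany(
        "INSERT INTO accounts (id, balance) VALUES (?, ?)",
        [("alice", 100), ("bob", 0)],
    )
    conn.commit()
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction.

    Returns True if both updates were committed, False if the transfer
    violated a constraint and was rolled back as a whole.
    """
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return {account_id: balance} for every account."""
    return dict(conn.execute("SELECT id, balance FROM accounts ORDER BY id"))


if __name__ == "__main__":
    ledger = open_ledger()
    print(transfer(ledger, "alice", "bob", 150), balances(ledger))
    print(transfer(ledger, "alice", "bob", 40), balances(ledger))
```

The credit is deliberately applied before the debit, so an overdraft attempt fails halfway through. Because both updates sit in one transaction, the credit is rolled back along with the failed debit. On a store without multi-statement transactions, the application has to detect and undo that half-applied write itself.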
Operational fit. Can your team run this in production? What does backup and recovery look like? What does a node failure look like? What monitoring exists? PostgreSQL has decades of operational tooling, documentation, and community knowledge. A newer engine may have better performance characteristics but far less operational tooling. Factor in the cost of building that operational knowledge, and the cost of the incidents you will have while building it.
Support and community. What happens when you hit an obscure bug? PostgreSQL has a large community, well-documented behavior, and decades of bug reports. A new database engine may have a limited community, incomplete documentation, and bugs that have not been encountered yet.
The Default That Is Usually Correct
PostgreSQL for primary data storage. Redis for caching and session storage. A specialized store only when PostgreSQL demonstrably does not fit the access pattern and the operational cost is justified.
This default handles the majority of real-world application data requirements. It is boring. Boring is good for databases. The excitement should be in the product, not the database engine.