Choosing the Right Data Type in SQL Is More Important Than You Think
by Arif Ikhsanudin, Backend Developer
The type choice you'll regret
You're building a financial application. You store monetary amounts as FLOAT. Everything works in development. In production, a settlement report shows $10,000.00 becoming $9,999.999999998 after a series of calculations. You've lost money — literally — due to floating-point rounding. Or you store timestamps as VARCHAR to "keep it flexible," and three months later your ORDER BY created_at returns chronologically wrong results because '2024-10-01' sorts before '2024-09-15' lexicographically... except it doesn't, so the bug shows up in a different comparison involving '2024-1-15' vs '2024-09-01'.
Data types are not just a storage detail. They define what operations are valid, what constraints are enforceable, and what the database will do when you ask it to compare, sort, or aggregate your data.
Numbers: the float trap
FLOAT and DOUBLE PRECISION are binary floating-point types (IEEE 754). They cannot exactly represent most decimal fractions. 0.1 + 0.2 is not 0.3 in binary floating-point — it's 0.30000000000000004.
For monetary values, rates, and anything where decimal precision matters, use DECIMAL(precision, scale) (also called NUMERIC):
-- Wrong: binary float, imprecise for currency
amount FLOAT
-- Wrong: integer cents works but loses semantic clarity and causes errors at boundaries
amount_cents BIGINT
-- Right: exact decimal arithmetic
amount DECIMAL(19, 4) -- 15 integer digits, 4 decimal places
DECIMAL(19, 4) handles amounts up to $999,999,999,999,999 with 4 decimal places — sufficient for most financial use cases including multi-currency with fractional conversions.
FLOAT is appropriate for scientific measurements, ML feature values, and other contexts where you care about magnitude and relative precision rather than exact decimal representation.
Timestamps: always use time zones
-- PostgreSQL: TIMESTAMPTZ stores UTC, displays in session time zone
-- TIMESTAMP stores whatever you give it, no time zone info
created_at TIMESTAMP -- ambiguous: what time zone?
created_at TIMESTAMPTZ -- unambiguous: always UTC internally
In PostgreSQL, TIMESTAMPTZ (timestamp with time zone) stores the moment in UTC and converts to the session time zone on display. TIMESTAMP has no time zone information — it stores whatever you insert. If your application servers are in different time zones, or if you ever change your server's time zone, TIMESTAMP columns will silently contain wrong data.
Use TIMESTAMPTZ for all event timestamps. Use DATE for dates where time of day is irrelevant (birthdays, invoice dates). Never store timestamps as VARCHAR or Unix epoch integers unless you have a very specific reason.
Text: size matters
TEXT (or CLOB) in PostgreSQL is variable-length with no practical size limit. VARCHAR(n) enforces a maximum length. CHAR(n) pads to exactly n characters.
-- Use TEXT when length is genuinely unbounded
body TEXT,
notes TEXT
-- Use VARCHAR(n) when length is a business rule
email VARCHAR(255), -- RFC 5321 limits to 254 characters
country_code CHAR(2), -- ISO 3166-1 alpha-2: always exactly 2 chars
postal_code VARCHAR(20) -- Variable by country, but bounded
CHAR(n) is almost always wrong — it pads with spaces, wastes storage for shorter values, and causes surprising equality comparison behavior. Use VARCHAR(n) for fixed-domain strings and TEXT for unbounded ones.
For emails specifically: store lowercased (case-insensitive comparison is expensive if not normalized at write time) and add a UNIQUE constraint.
Booleans vs status enums
-- Boolean: simple, but can't express a third state cleanly
is_active BOOLEAN
-- Enum: explicit set of valid states
status VARCHAR(20) CHECK (status IN ('draft', 'active', 'suspended', 'deleted'))
PostgreSQL supports native ENUM types, but VARCHAR with a CHECK constraint is more portable and easier to migrate (adding an enum value in PostgreSQL requires ALTER TYPE, which can cause issues in some migration tools). MySQL's ENUM has different semantics and storage characteristics.
The rule: use BOOLEAN for genuinely binary state. Use a constrained VARCHAR for anything with more than two states. Don't use TINYINT(1) as a boolean — it's a MySQL-ism that leaks into ORMs and confuses everyone.
Integer sizes
-- SMALLINT: 2 bytes, -32768 to 32767
-- INT / INTEGER: 4 bytes, -2.1B to 2.1B
-- BIGINT: 8 bytes, -9.2Q to 9.2Q
For primary keys: use BIGINT. The 4-byte INT limit (2.1 billion) sounds large but high-volume tables hit it. The storage cost difference (4 bytes per row) is trivial at scale. SERIAL and BIGSERIAL in PostgreSQL, AUTO_INCREMENT on BIGINT in MySQL.
For foreign keys: match exactly the type of the referenced primary key. A FK mismatch causes implicit casts that defeat indexes.
For small-domain counts or status codes: SMALLINT is appropriate and signals to developers that the value range is intentionally limited.
JSON: JSONB vs JSON in PostgreSQL
JSON stores the raw text. JSONB stores a parsed binary representation that supports indexing, key-existence operators, and path queries.
-- JSON: stores text, can't be indexed (except full-text)
metadata JSON
-- JSONB: parsed, indexable, supports @>, ?, #> operators
metadata JSONB
-- GIN index on JSONB for key/value queries
CREATE INDEX idx_products_metadata ON products USING GIN (metadata);
-- Query: find products where metadata contains a specific key-value pair
SELECT id FROM products WHERE metadata @> '{"category": "electronics"}';
Use JSONB unless you specifically need to preserve JSON key order or whitespace (almost never). The write cost of JSONB parsing is negligible for most workloads.
The type decision rule
For each column, ask: what is the actual domain of this data? What operations need to be correct (arithmetic, comparison, sorting)? What constraints should the database enforce? The type that accurately represents the domain and supports the needed operations is the right choice — not the most convenient one.