90. Design Database with SQL Commands
Design Database with SQL Commands
Design and implement a
Database that supports a small SQL-like API:
INSERT/UPSERT,
SELECT with
WHERE and
ORDER BY, and
DELETE.
The storage engine should be inspired by an
LSM (Log Structured Merge) design:
- All writes go to an in-memory Memtable.
- When the memtable grows beyond a threshold, it is flushed into an immutable on-disk-like structure (simulate as in-memory) called an SSTable.
- DELETE is implemented using tombstones (a deletion marker). Data is not physically removed immediately.
- SELECT and DELETE WHERE must merge results across the memtable and all SSTables, and must respect tombstones.
- When merging versions of the same primary key, the version with the greatest timestamp ≤ logical current time wins. Tombstones also participate in this rule.
currentTimestamp is provided only for write/delete operations (put, deleteWhere) and is monotonically non-decreasing across those calls.
select(...) uses the latest timestamp seen so far from successful put/deleteWhere calls as the logical current time.
WHERE Clause Grammar
Support a limited WHERE clause of the form:
column operator value Where:
operator is one of: =, !=, <, <=, >, >=
value is a raw string without spaces (for simplicity), for example age>=18 or name=amy
- Comparison rules:
- All operators use string comparison using ASCII / lexicographic order.
- This rule is followed even if the value looks numeric, e.g.
age>=18.
- If the WHERE clause is malformed or references an invalid column, treat it as invalid.
- Later API call wins.
Methods
Constructor
LSMDatabase(int memtableMaxEntries)
memtableMaxEntries > 0
- When the memtable entry count becomes strictly greater than
memtableMaxEntries, flush (move) it into a new SSTable.
- Memtable entry count is based on distinct primary keys currently present in the memtable.
Create Table
boolean createTable(String tableName, String primaryKeyColumn, List<String> columns)
tableName is non-empty.
primaryKeyColumn must be included in columns.
columns contains unique non-empty column names.
- Column names have characters only from the set
[a-z], [0-9], and -.
- Return
false if the table already exists or schema is invalid; otherwise create it and return true.
Put (Insert / Upsert)
boolean put(String tableName, String primaryKey, List<String> columnValues, long currentTimestamp)
currentTimestamp is non-decreasing across all put/deleteWhere calls.
primaryKey is non-empty.
columnValues is a List<String> of assignments formatted as col=value.
- Assignments in a single call must use valid existing columns and must not repeat the same column.
- Assignments may omit some columns (partial update); unspecified columns keep their previous value only if the row is currently visible.
- Assume all column values are strings (no numeric/boolean typing).
- If a tombstone exists for the key, a later
put at a higher timestamp revives the row.
- If the row is not currently visible (e.g., tombstoned), unspecified columns become empty strings on insert/revival.
- Return
false if table does not exist or any assignment/column is invalid; otherwise apply and return true.
Read
List<String> readFromBothTables(String tableName, String primaryKey)
- Reads the latest raw row snapshot for
primaryKey from:
- Memtable side only (index 0)
- SSTable side only, merged across all SSTables (index 1)
- Returns exactly 2 strings in format
"column0Value,column1Value,...,columnNValue" in schema order from createTable().
- If a value for some column is missing, use empty string
"".
- If no row exists on that side, return empty string
"" for that side.
- If the latest entry on a side is a tombstone, return empty string
"" for that side.
- If table does not exist, return
List.of("", "").
Delete with WHERE (Tombstones)
List<String> deleteWhere(String tableName, String whereClause, long currentTimestamp)
- Find all rows visible at
currentTimestamp that match whereClause, then write tombstones for their primary keys.
- Returns the list of primary keys tombstoned arranged in ascending lexicographic order.
- If
whereClause is empty, delete all visible rows in the table (still using tombstones).
- If
tableName does not exist or whereClause is invalid, return List.of().
Select with WHERE and ORDER BY
List<String> select(String tableName, String whereClause, String orderByColumn, boolean ascending, int limit)
- Returns rows visible at the logical current time after merging memtable + SSTables and applying tombstones.
whereClause may be empty (no filter) or follow the limited grammar described above.
orderByColumn may be empty. If provided, it must be a valid column.
- If
orderByColumn is empty, rows must be returned in primary key ascending (lexicographic) order (deterministic default).
- If
limit == 0, return all matching rows; otherwise return up to limit rows.
- Each row is encoded as a single string using schema column order:
col1=value1,col2=value2,... (Missing values must be encoded as empty strings, e.g. age=).
- Ordering rules (when
orderByColumn is provided):
- Primary sort:
orderByColumn, ascending/descending as requested.
- Tie-breaker: primary key ascending (lexicographic).
- If
tableName does not exist, whereClause is invalid, or orderByColumn is invalid, return List.of().
Constraints
1 ≤ memtableMaxEntries ≤ 200
1 ≤ numberOfTables ≤ 10
1 ≤ columns.size() ≤ 50
0 ≤ operations ≤ 2000 (total calls to put/deleteWhere/select)
0 ≤ currentTimestamp ≤ 10^9
currentTimestamp is non-decreasing across all successful/failed put/deleteWhere calls
0 ≤ limit ≤ 200000
- String lengths are reasonable (e.g., ≤ 100 chars).
Examples
Example 1: Schema Validation + Put Failures + Flush + Merge + Partial Upsert + readFromBothTables + WHERE (=, !=, >=) + ORDER BY (asc/desc/default) + LIMIT + Invalid Select/Delete
LSMDatabase db = new LSMDatabase(memtableMaxEntries=2) db.createTable(tableName="users", primaryKeyColumn="id", columns=List.of("id","name","age","city")) -> true db.createTable(tableName="users", primaryKeyColumn="id", columns=List.of("id","name","age","city")) -> false db.createTable(tableName="bad1", primaryKeyColumn="id", columns=List.of("name","age")) -> false db.createTable(tableName="bad2", primaryKeyColumn="id", columns=List.of("id","name","name")) -> false db.createTable(tableName="bad3", primaryKeyColumn="id", columns=List.of("id","naMe")) -> false db.put(tableName="missing", primaryKey="x1", columnValues=List.of("name=na"), currentTimestamp=1) -> false db.put(tableName="users", primaryKey="u0", columnValues=List.of("name"), currentTimestamp=2) -> false db.put(tableName="users", primaryKey="u0", columnValues=List.of("bad=1"), currentTimestamp=3) -> false db.put(tableName="users", primaryKey="u1", columnValues=List.of("name=amy","age=2","city=delhi"), currentTimestamp=4) -> true db.put(tableName="users", primaryKey="u2", columnValues=List.of("name=bob","age=10","city=agra"), currentTimestamp=5) -> true db.put(tableName="users", primaryKey="u3", columnValues=List.of("name=zoe","age=18","city=delhi"), currentTimestamp=6) -> true
(Now memtable has 3 distinct keys which is > 2, so it flushes into SSTable #1; memtable becomes empty.)
db.put(tableName="users", primaryKey="u1", columnValues=List.of("age=20"), currentTimestamp=7) -> true
(Partial upsert: u1 already exists and is visible at timestamp 7, so name/city remain from older visible row.)
db.put(tableName="users", primaryKey="u4", columnValues=List.of("name=kim","age=2"), currentTimestamp=8) -> true
(u4 has no city assignment; missing values are stored/returned as empty string.)
db.readFromBothTables(tableName="users", primaryKey="u1") -> List.of("u1,amy,20,delhi", "u1,amy,2,delhi") db.readFromBothTables(tableName="users", primaryKey="u2") -> List.of("", "u2,bob,10,agra") db.readFromBothTables(tableName="users", primaryKey="u4") -> List.of("u4,kim,2,", "") db.readFromBothTables(tableName="users", primaryKey="ux") -> List.of("", "") db.readFromBothTables(tableName="missing", primaryKey="u1") -> List.of("", "") db.put(tableName="users", primaryKey="u5", columnValues=List.of("name=lea","age=2","city=mumbai"), currentTimestamp=9) -> true
(Memtable keys become u1,u4,u5 => count 3 > 2, so it flushes into SSTable #2; memtable becomes empty.)
db.select(tableName="users", whereClause="age>=18", orderByColumn="age", ascending=true, limit=3) -> List.of( "id=u3,name=zoe,age=18,city=delhi", "id=u4,name=kim,age=2,city=", "id=u5,name=lea,age=2,city=mumbai" )
Explanation:
select reads at logical current time = latest successful write/delete timestamp = 9.
- WHERE uses lexicographic string comparison, so
"2" >= "18" is true (because '2' > '1').
- Matches are
u1(20), u3(18), u4(2), u5(2). Ordering by age ascending (lexicographic) gives "18", "2", "2", "20".
- Tie-breaker for
age="2" is primary key ascending, so u4 then u5. limit=3 returns first three rows.
db.select(tableName="users", whereClause="name=amy", orderByColumn="id", ascending=true, limit=0) -> List.of("id=u1,name=amy,age=20,city=delhi") db.select(tableName="users", whereClause="name!=amy", orderByColumn="id", ascending=true, limit=2) -> List.of("id=u2,name=bob,age=10,city=agra", "id=u3,name=zoe,age=18,city=delhi") db.select(tableName="users", whereClause="city=delhi", orderByColumn="", ascending=true, limit=0) -> List.of("id=u1,name=amy,age=20,city=delhi", "id=u3,name=zoe,age=18,city=delhi")
(Deterministic default order when orderByColumn is empty: primary key ascending.)
db.select(tableName="users", whereClause="", orderByColumn="name", ascending=false, limit=2) -> List.of("id=u3,name=zoe,age=18,city=delhi", "id=u5,name=lea,age=2,city=mumbai")
(Covers descending ORDER BY.)
db.select(tableName="users", whereClause="age>>18", orderByColumn="age", ascending=true, limit=0) -> List.of() db.select(tableName="users", whereClause="", orderByColumn="unknown", ascending=true, limit=0) -> List.of() db.select(tableName="missing", whereClause="", orderByColumn="", ascending=true, limit=0) -> List.of() db.deleteWhere(tableName="users", whereClause="age>>18", currentTimestamp=10) -> List.of() db.deleteWhere(tableName="missing", whereClause="", currentTimestamp=11) -> List.of()
Example 2: Multiple SSTables + Remaining WHERE Operators (<, <=, >) + Tombstones + No-Match Delete + Delete-All + Revival After Tombstone + Partial Update After Revival
LSMDatabase db = new LSMDatabase(memtableMaxEntries=1) db.createTable(tableName="kv", primaryKeyColumn="k", columns=List.of("k","v","score","tag")) -> true db.put(tableName="kv", primaryKey="a", columnValues=List.of("v=cat","score=5","tag=blue"), currentTimestamp=100) -> true db.put(tableName="kv", primaryKey="b", columnValues=List.of("v=dog","score=10","tag=blue"), currentTimestamp=101) -> true
(Memtable has 2 keys which is > 1, so it flushes into SSTable #1; memtable becomes empty.)
db.put(tableName="kv", primaryKey="c", columnValues=List.of("v=eel","score=7","tag=red"), currentTimestamp=102) -> true db.put(tableName="kv", primaryKey="a", columnValues=List.of("score=6"), currentTimestamp=103) -> true
(Partial upsert on visible row a; memtable becomes 2 keys > 1, so it flushes into SSTable #2.)
db.select(tableName="kv", whereClause="score<=7", orderByColumn="k", ascending=true, limit=0) -> List.of( "k=a,v=cat,score=6,tag=blue", "k=b,v=dog,score=10,tag=blue", "k=c,v=eel,score=7,tag=red" )
Explanation: lexicographic compare means "10" <= "7" is true (because '1' < '7'), so b also matches.
db.select(tableName="kv", whereClause="v<eel", orderByColumn="v", ascending=true, limit=0) -> List.of("k=a,v=cat,score=6,tag=blue", "k=b,v=dog,score=10,tag=blue") db.deleteWhere(tableName="kv", whereClause="tag=blue", currentTimestamp=104) -> List.of("a","b") db.select(tableName="kv", whereClause="", orderByColumn="k", ascending=true, limit=0) -> List.of("k=c,v=eel,score=7,tag=red") db.deleteWhere(tableName="kv", whereClause="k=z", currentTimestamp=105) -> List.of() db.deleteWhere(tableName="kv", whereClause="", currentTimestamp=106) -> List.of("c") db.select(tableName="kv", whereClause="", orderByColumn="k", ascending=true, limit=0) -> List.of() db.put(tableName="kv", primaryKey="b", columnValues=List.of("v=bee"), currentTimestamp=107) -> true
(Revival after tombstone: b was not visible immediately before timestamp 107, so omitted columns are empty.)
db.select(tableName="kv", whereClause="", orderByColumn="k", ascending=true, limit=0) -> List.of("k=b,v=bee,score=,tag=") db.put(tableName="kv", primaryKey="b", columnValues=List.of("tag=yellow"), currentTimestamp=108) -> true
(Partial update after revival: b is visible now, so v stays bee, score remains empty, tag becomes yellow.)
db.select(tableName="kv", whereClause="", orderByColumn="k", ascending=true, limit=0) -> List.of("k=b,v=bee,score=,tag=yellow") db.put(tableName="kv", primaryKey="d", columnValues=List.of("v=car","score=1","tag=green"), currentTimestamp=109) -> true db.put(tableName="kv", primaryKey="e", columnValues=List.of("v=zzz","score=2","tag=green"), currentTimestamp=110) -> true db.select(tableName="kv", whereClause="v>car", orderByColumn="v", ascending=true, limit=0) -> List.of("k=e,v=zzz,score=2,tag=green") db.deleteWhere(tableName="kv", whereClause="", currentTimestamp=111) -> List.of("b","d","e")
(Empty WHERE deletes all currently visible rows; returned primary keys are sorted lexicographically.)