90. Design Database with SQL Commands

Design Database with SQL Commands
Design and implement a Database that supports a small SQL-like API: INSERT/UPSERT, SELECT with WHERE and ORDER BY, and DELETE.

The storage engine should be inspired by an LSM (Log Structured Merge) design:
  • All writes go to an in-memory Memtable.
  • When the memtable grows beyond a threshold, it is flushed into an immutable on-disk-like structure (simulate as in-memory) called an SSTable.
  • DELETE is implemented using tombstones (a deletion marker). Data is not physically removed immediately.
  • SELECT and DELETE WHERE must merge results across the memtable and all SSTables, and must respect tombstones.
  • When merging versions of the same primary key, the version with the greatest timestamp ≤ logical current time wins. Tombstones also participate in this rule.
  • currentTimestamp is provided only for write/delete operations (put, deleteWhere) and is monotonically non-decreasing across those calls.
  • select(...) uses the latest timestamp seen so far from successful put/deleteWhere calls as the logical current time.

WHERE Clause Grammar

Support a limited WHERE clause of the form:
column operator value

Where:
  • operator is one of: =, !=, <, <=, >, >=
  • value is a raw string without spaces (for simplicity), for example age>=18 or name=amy
  • Comparison rules:
    • All operators use string comparison using ASCII / lexicographic order.
    • This rule is followed even if the value looks numeric, e.g. age>=18.
  • If the WHERE clause is malformed or references an invalid column, treat it as invalid.
  • Later API call wins.

Methods

Constructor

LSMDatabase(int memtableMaxEntries)
  • memtableMaxEntries > 0
  • When the memtable entry count becomes strictly greater than memtableMaxEntries, flush (move) it into a new SSTable.
  • Memtable entry count is based on distinct primary keys currently present in the memtable.

Create Table

boolean createTable(String tableName, String primaryKeyColumn, List<String> columns)
  • tableName is non-empty.
  • primaryKeyColumn must be included in columns.
  • columns contains unique non-empty column names.
  • Column names have characters only from the set [a-z], [0-9], and -.
  • Return false if the table already exists or schema is invalid; otherwise create it and return true.

Put (Insert / Upsert)

boolean put(String tableName, String primaryKey, List<String> columnValues, long currentTimestamp)
  • currentTimestamp is non-decreasing across all put/deleteWhere calls.
  • primaryKey is non-empty.
  • columnValues is a List<String> of assignments formatted as col=value.
  • Assignments in a single call must use valid existing columns and must not repeat the same column.
  • Assignments may omit some columns (partial update); unspecified columns keep their previous value only if the row is currently visible.
  • Assume all column values are strings (no numeric/boolean typing).
  • If a tombstone exists for the key, a later put at a higher timestamp revives the row.
  • If the row is not currently visible (e.g., tombstoned), unspecified columns become empty strings on insert/revival.
  • Return false if table does not exist or any assignment/column is invalid; otherwise apply and return true.

Read

List<String> readFromBothTables(String tableName, String primaryKey)
  • Reads the latest raw row snapshot for primaryKey from:
    • Memtable side only (index 0)
    • SSTable side only, merged across all SSTables (index 1)
  • Returns exactly 2 strings in format "column0Value,column1Value,...,columnNValue" in schema order from createTable().
  • If a value for some column is missing, use empty string "".
  • If no row exists on that side, return empty string "" for that side.
  • If the latest entry on a side is a tombstone, return empty string "" for that side.
  • If table does not exist, return List.of("", "").

Delete with WHERE (Tombstones)

List<String> deleteWhere(String tableName, String whereClause, long currentTimestamp)
  • Find all rows visible at currentTimestamp that match whereClause, then write tombstones for their primary keys.
  • Returns the list of primary keys tombstoned arranged in ascending lexicographic order.
  • If whereClause is empty, delete all visible rows in the table (still using tombstones).
  • If tableName does not exist or whereClause is invalid, return List.of().

Select with WHERE and ORDER BY

List<String> select(String tableName, String whereClause, String orderByColumn, boolean ascending, int limit)
  • Returns rows visible at the logical current time after merging memtable + SSTables and applying tombstones.
  • whereClause may be empty (no filter) or follow the limited grammar described above.
  • orderByColumn may be empty. If provided, it must be a valid column.
  • If orderByColumn is empty, rows must be returned in primary key ascending (lexicographic) order (deterministic default).
  • If limit == 0, return all matching rows; otherwise return up to limit rows.
  • Each row is encoded as a single string using schema column order: col1=value1,col2=value2,... (Missing values must be encoded as empty strings, e.g. age=).
  • Ordering rules (when orderByColumn is provided):
    • Primary sort: orderByColumn, ascending/descending as requested.
    • Tie-breaker: primary key ascending (lexicographic).
  • If tableName does not exist, whereClause is invalid, or orderByColumn is invalid, return List.of().

Constraints

  • 1 ≤ memtableMaxEntries ≤ 200
  • 1 ≤ numberOfTables ≤ 10
  • 1 ≤ columns.size() ≤ 50
  • 0 ≤ operations ≤ 2000 (total calls to put/deleteWhere/select)
  • 0 ≤ currentTimestamp ≤ 10^9
  • currentTimestamp is non-decreasing across all successful/failed put/deleteWhere calls
  • 0 ≤ limit ≤ 200000
  • String lengths are reasonable (e.g., ≤ 100 chars).

Examples

Example 1: Schema Validation + Put Failures + Flush + Merge + Partial Upsert + readFromBothTables + WHERE (=, !=, >=) + ORDER BY (asc/desc/default) + LIMIT + Invalid Select/Delete

LSMDatabase db = new LSMDatabase(memtableMaxEntries=2)
db.createTable(tableName="users", primaryKeyColumn="id", columns=List.of("id","name","age","city")) -> true
db.createTable(tableName="users", primaryKeyColumn="id", columns=List.of("id","name","age","city")) -> false
db.createTable(tableName="bad1", primaryKeyColumn="id", columns=List.of("name","age")) -> false
db.createTable(tableName="bad2", primaryKeyColumn="id", columns=List.of("id","name","name")) -> false
db.createTable(tableName="bad3", primaryKeyColumn="id", columns=List.of("id","naMe")) -> false

db.put(tableName="missing", primaryKey="x1", columnValues=List.of("name=na"), currentTimestamp=1) -> false
db.put(tableName="users", primaryKey="u0", columnValues=List.of("name"), currentTimestamp=2) -> false
db.put(tableName="users", primaryKey="u0", columnValues=List.of("bad=1"), currentTimestamp=3) -> false

db.put(tableName="users", primaryKey="u1", columnValues=List.of("name=amy","age=2","city=delhi"), currentTimestamp=4) -> true
db.put(tableName="users", primaryKey="u2", columnValues=List.of("name=bob","age=10","city=agra"), currentTimestamp=5) -> true
db.put(tableName="users", primaryKey="u3", columnValues=List.of("name=zoe","age=18","city=delhi"), currentTimestamp=6) -> true
(Now memtable has 3 distinct keys which is > 2, so it flushes into SSTable #1; memtable becomes empty.)

db.put(tableName="users", primaryKey="u1", columnValues=List.of("age=20"), currentTimestamp=7) -> true
(Partial upsert: u1 already exists and is visible at timestamp 7, so name/city remain from older visible row.)

db.put(tableName="users", primaryKey="u4", columnValues=List.of("name=kim","age=2"), currentTimestamp=8) -> true
(u4 has no city assignment; missing values are stored/returned as empty string.)


db.readFromBothTables(tableName="users", primaryKey="u1") -> List.of("u1,amy,20,delhi", "u1,amy,2,delhi")
db.readFromBothTables(tableName="users", primaryKey="u2") -> List.of("", "u2,bob,10,agra")
db.readFromBothTables(tableName="users", primaryKey="u4") -> List.of("u4,kim,2,", "")
db.readFromBothTables(tableName="users", primaryKey="ux") -> List.of("", "")
db.readFromBothTables(tableName="missing", primaryKey="u1") -> List.of("", "")

db.put(tableName="users", primaryKey="u5", columnValues=List.of("name=lea","age=2","city=mumbai"), currentTimestamp=9) -> true
(Memtable keys become u1,u4,u5 => count 3 > 2, so it flushes into SSTable #2; memtable becomes empty.)


db.select(tableName="users", whereClause="age>=18", orderByColumn="age", ascending=true, limit=3)
-> List.of( "id=u3,name=zoe,age=18,city=delhi", "id=u4,name=kim,age=2,city=", "id=u5,name=lea,age=2,city=mumbai" )
Explanation:
  • select reads at logical current time = latest successful write/delete timestamp = 9.
  • WHERE uses lexicographic string comparison, so "2" >= "18" is true (because '2' > '1').
  • Matches are u1(20), u3(18), u4(2), u5(2). Ordering by age ascending (lexicographic) gives "18", "2", "2", "20".
  • Tie-breaker for age="2" is primary key ascending, so u4 then u5. limit=3 returns first three rows.

db.select(tableName="users", whereClause="name=amy", orderByColumn="id", ascending=true, limit=0)
-> List.of("id=u1,name=amy,age=20,city=delhi")

db.select(tableName="users", whereClause="name!=amy", orderByColumn="id", ascending=true, limit=2)
-> List.of("id=u2,name=bob,age=10,city=agra", "id=u3,name=zoe,age=18,city=delhi")

db.select(tableName="users", whereClause="city=delhi", orderByColumn="", ascending=true, limit=0)
-> List.of("id=u1,name=amy,age=20,city=delhi", "id=u3,name=zoe,age=18,city=delhi")
(Deterministic default order when orderByColumn is empty: primary key ascending.)


db.select(tableName="users", whereClause="", orderByColumn="name", ascending=false, limit=2)
-> List.of("id=u3,name=zoe,age=18,city=delhi", "id=u5,name=lea,age=2,city=mumbai")
(Covers descending ORDER BY.)


db.select(tableName="users", whereClause="age>>18", orderByColumn="age", ascending=true, limit=0) -> List.of()
db.select(tableName="users", whereClause="", orderByColumn="unknown", ascending=true, limit=0) -> List.of()
db.select(tableName="missing", whereClause="", orderByColumn="", ascending=true, limit=0) -> List.of()
db.deleteWhere(tableName="users", whereClause="age>>18", currentTimestamp=10) -> List.of()
db.deleteWhere(tableName="missing", whereClause="", currentTimestamp=11) -> List.of()

Example 2: Multiple SSTables + Remaining WHERE Operators (<, <=, >) + Tombstones + No-Match Delete + Delete-All + Revival After Tombstone + Partial Update After Revival

LSMDatabase db = new LSMDatabase(memtableMaxEntries=1)
db.createTable(tableName="kv", primaryKeyColumn="k", columns=List.of("k","v","score","tag")) -> true

db.put(tableName="kv", primaryKey="a", columnValues=List.of("v=cat","score=5","tag=blue"), currentTimestamp=100) -> true
db.put(tableName="kv", primaryKey="b", columnValues=List.of("v=dog","score=10","tag=blue"), currentTimestamp=101) -> true
(Memtable has 2 keys which is > 1, so it flushes into SSTable #1; memtable becomes empty.)

db.put(tableName="kv", primaryKey="c", columnValues=List.of("v=eel","score=7","tag=red"), currentTimestamp=102) -> true
db.put(tableName="kv", primaryKey="a", columnValues=List.of("score=6"), currentTimestamp=103) -> true
(Partial upsert on visible row a; memtable becomes 2 keys > 1, so it flushes into SSTable #2.)


db.select(tableName="kv", whereClause="score<=7", orderByColumn="k", ascending=true, limit=0)
-> List.of( "k=a,v=cat,score=6,tag=blue", "k=b,v=dog,score=10,tag=blue", "k=c,v=eel,score=7,tag=red" )
Explanation: lexicographic compare means "10" <= "7" is true (because '1' < '7'), so b also matches.


db.select(tableName="kv", whereClause="v<eel", orderByColumn="v", ascending=true, limit=0)
-> List.of("k=a,v=cat,score=6,tag=blue", "k=b,v=dog,score=10,tag=blue")

db.deleteWhere(tableName="kv", whereClause="tag=blue", currentTimestamp=104) -> List.of("a","b")
db.select(tableName="kv", whereClause="", orderByColumn="k", ascending=true, limit=0)
-> List.of("k=c,v=eel,score=7,tag=red")

db.deleteWhere(tableName="kv", whereClause="k=z", currentTimestamp=105) -> List.of()
db.deleteWhere(tableName="kv", whereClause="", currentTimestamp=106) -> List.of("c")
db.select(tableName="kv", whereClause="", orderByColumn="k", ascending=true, limit=0) -> List.of()

db.put(tableName="kv", primaryKey="b", columnValues=List.of("v=bee"), currentTimestamp=107) -> true
(Revival after tombstone: b was not visible immediately before timestamp 107, so omitted columns are empty.)

db.select(tableName="kv", whereClause="", orderByColumn="k", ascending=true, limit=0)
-> List.of("k=b,v=bee,score=,tag=")

db.put(tableName="kv", primaryKey="b", columnValues=List.of("tag=yellow"), currentTimestamp=108) -> true
(Partial update after revival: b is visible now, so v stays bee, score remains empty, tag becomes yellow.)

db.select(tableName="kv", whereClause="", orderByColumn="k", ascending=true, limit=0)
-> List.of("k=b,v=bee,score=,tag=yellow")

db.put(tableName="kv", primaryKey="d", columnValues=List.of("v=car","score=1","tag=green"), currentTimestamp=109) -> true
db.put(tableName="kv", primaryKey="e", columnValues=List.of("v=zzz","score=2","tag=green"), currentTimestamp=110) -> true

db.select(tableName="kv", whereClause="v>car", orderByColumn="v", ascending=true, limit=0)
-> List.of("k=e,v=zzz,score=2,tag=green")

db.deleteWhere(tableName="kv", whereClause="", currentTimestamp=111) -> List.of("b","d","e")
(Empty WHERE deletes all currently visible rows; returned primary keys are sorted lexicographically.)




Please use Laptop/Desktop or any other large screen to add/edit code.