88. Design Resumable Iterator for Large Dataset

Design Resumable Iterator for Large Dataset
Design and implement a resumable iterator for a large dataset. The iterator must be able to pause mid-traversal and later resume from the exact same position.

The dataset may be too large to fit in memory, so the iterator must be memory-efficient and must not require loading the entire dataset at once.

Additionally, the iterator should support serializing its state (cursor) so it can be persisted across sessions.

Key Definitions

  • Dataset: an ordered collection of records, identified by integer datasetId and integer datasetVersion.
  • Record: a single item in the dataset, represented as a String.
  • Cursor / State Token: a String that encodes the iterator’s resume position and safety metadata.
  • Position: a 0-indexed integer representing the index of the next record that would be returned by next().
  • Dataset Change: if dataset content changes between pause and resume, the iterator must detect it using datasetVersion.

Method Signatures

Constructor

ResumableIterator(int datasetId, List<String> records, int datasetVersion)
  • datasetId uniquely identifies the dataset.
  • records represents the dataset records (in real systems this could be streamed; in this in-memory problem it is provided as a list).
  • datasetVersion is used to detect dataset changes between pause and resume.

Iteration

boolean hasNext()
  • Returns true if there is at least one more record to read.
  • If the iterator is currently paused, it may still return true if records remain.
String next()
  • Returns the next record and advances the iterator by 1 position.
  • If there is no next record, returns an empty string "".
  • If the iterator is paused, returns an empty string "" (must call resume(...) to continue).

State / Resuming

int position()
  • Returns the current 0-indexed integer position of the next record (the index of the next record that next() would return).
  • Does not change iterator state (does not pause).
  • This value must be the same position that is embedded inside the token returned by pause().
String pause()
  • Pauses the iterator and returns a serialized cursor token.
  • After pausing, next() must return "" until resume(...) succeeds.
  • Calling pause() multiple times should be safe and return a token for the same position.
  • The returned token must be formatted exactly as:
    • datasetId=<int>,version=<int>,position=<int>
    • Example: datasetId=144,version=16,position=501
  • position inside the token is the 0-indexed index of the next record.
  • version inside the token refers to the dataset's current datasetVersion.
boolean resume(String stateToken)
  • Attempts to resume the iterator from the provided cursor token.
  • Returns true if resume succeeds, otherwise false.
  • The token must be parsed using the required format: datasetId=<int>,version=<int>,position=<int>.
  • Resume must fail (return false) if:
    • The token is invalid/corrupted and cannot be parsed into three integer fields.
    • The token belongs to a different datasetId.
    • The token’s version does not match the current datasetVersion (dataset changed).
    • The token’s position points outside 0..records.size().
  • If resume succeeds, the iterator becomes unpaused and continues from that position.

Constraints

  • 1 ≤ records.size() ≤ 10^7 (conceptually large; your solution must not copy or buffer all records again).
  • 0 ≤ position ≤ records.size() (position is 0-indexed and refers to the next record).
  • 1 ≤ datasetId ≤ 10^9.
  • 1 ≤ datasetVersion ≤ 10^9.
  • 0 ≤ record.length() ≤ 10^4.
  • The iterator’s extra memory (excluding the provided dataset list) should be O(1).
  • Cursor token is a String but must follow the exact formatting rules described above.

Notes on Dataset Changes

  • Your cursor token must include enough information to detect changes safely (at minimum: datasetId, version, and position).
  • If the dataset changes between pause and resume, do not attempt to “best-effort” resume; return false.
  • Resuming at the end (position equals records.size()) is valid:
    • hasNext() returns false.
    • next() returns "".

Examples

Example 1: Basic iteration, position tracking, pause

  • ResumableIterator it = new ResumableIterator(datasetId=1, records=List.of("A","B","C","D"), datasetVersion=10)
  • it.hasNext()true
  • it.position()0
  • it.next()"A"
  • it.position()1
  • it.pause()"datasetId=1,version=10,position=1"
  • it.next()"" (paused; must resume to continue)

Example 2: Resume later using saved state token

  • String token = "datasetId=1,version=10,position=1"
  • ResumableIterator it2 = new ResumableIterator(datasetId=1, records=List.of("A","B","C","D"), datasetVersion=10)
  • it2.resume(stateToken=token)true
  • it2.next()"B"
  • it2.next()"C"
  • it2.position()3

Example 3: Resume fails when dataset version changes

  • String oldToken = "datasetId=1,version=10,position=2"
  • ResumableIterator it3 = new ResumableIterator(datasetId=1, records=List.of("A","B","X","C","D"), datasetVersion=11)
  • it3.resume(stateToken=oldToken)false (version mismatch: 10 vs 11)
  • it3.position()0 (unchanged because resume failed)
  • it3.next()"A"

Example 4: Resume at end of dataset

  • ResumableIterator it4 = new ResumableIterator(datasetId=2, records=List.of("P","Q"), datasetVersion=7)
  • it4.next()"P"
  • it4.next()"Q"
  • it4.hasNext()false
  • it4.position()2
  • String endToken = it4.pause()"datasetId=2,version=7,position=2"
  • ResumableIterator it5 = new ResumableIterator(datasetId=2, records=List.of("P","Q"), datasetVersion=7)
  • it5.resume(stateToken=endToken)true
  • it5.hasNext()false
  • it5.next()""
  • it5.position()2

Example 5: Invalid token (corrupted / wrong dataset)

  • ResumableIterator it6 = new ResumableIterator(datasetId=1, records=List.of("A","B","C"), datasetVersion=10)
  • it6.resume(stateToken="not-a-valid-token")false
  • it6.resume(stateToken="datasetId=999,version=10,position=1")false (datasetId mismatch)
  • it6.resume(stateToken="datasetId=1,version=10,position=999999")false (position out of bounds)
  • it6.position()0




Please use Laptop/Desktop or any other large screen to add/edit code.