88. Design Resumable Iterator for Large Dataset

Design Resumable Iterator for Large Dataset

Design and implement a resumable iterator for a large dataset. The iterator must be able to pause mid-traversal and later resume from the exact same position.

The dataset may be too large to fit in memory, so the iterator must be memory-efficient and must not require loading the entire dataset at once.

Additionally, the iterator should support serializing its state (cursor) so it can be persisted across sessions.

Key Definitions

Dataset: an ordered collection of records, identified by integer datasetId and integer datasetVersion.
Record: a single item in the dataset, represented as a String.
Cursor / State Token: a String that encodes the iterator’s resume position and safety metadata.
Position: a 0-indexed integer representing the index of the next record that would be returned by next().
Dataset Change: if dataset content changes between pause and resume, the iterator must detect it using datasetVersion.

Method Signatures

Constructor

ResumableIterator(int datasetId, List<String> records, int datasetVersion)

datasetId uniquely identifies the dataset.
records represents the dataset records (in real systems this could be streamed; in this in-memory problem it is provided as a list).
datasetVersion is used to detect dataset changes between pause and resume.

Iteration

boolean hasNext()

Returns true if there is at least one more record to read.
If the iterator is currently paused, it may still return true if records remain.

String next()

Returns the next record and advances the iterator by 1 position.
If there is no next record, returns an empty string "".
If the iterator is paused, returns an empty string "" (must call resume(...) to continue).

State / Resuming

int position()

Returns the current 0-indexed integer position of the next record (the index of the next record that next() would return).
Does not change iterator state (does not pause).
This value must be the same position that is embedded inside the token returned by pause().

String pause()

Pauses the iterator and returns a serialized cursor token.
After pausing, next() must return "" until resume(...) succeeds.
Calling pause() multiple times should be safe and return a token for the same position.
The returned token must be formatted exactly as:
- datasetId=<int>,version=<int>,position=<int>
- Example: datasetId=144,version=16,position=501
position inside the token is the 0-indexed index of the next record.
version inside the token refers to the dataset's current datasetVersion.

boolean resume(String stateToken)

Attempts to resume the iterator from the provided cursor token.
Returns true if resume succeeds, otherwise false.
The token must be parsed using the required format: datasetId=<int>,version=<int>,position=<int>.
Resume must fail (return false) if:
- The token is invalid/corrupted and cannot be parsed into three integer fields.
- The token belongs to a different datasetId.
- The token’s version does not match the current datasetVersion (dataset changed).
- The token’s position points outside 0..records.size().
If resume succeeds, the iterator becomes unpaused and continues from that position.

Constraints

1 ≤ records.size() ≤ 10^7 (conceptually large; your solution must not copy or buffer all records again).
0 ≤ position ≤ records.size() (position is 0-indexed and refers to the next record).
1 ≤ datasetId ≤ 10^9.
1 ≤ datasetVersion ≤ 10^9.
0 ≤ record.length() ≤ 10^4.
The iterator’s extra memory (excluding the provided dataset list) should be O(1).
Cursor token is a String but must follow the exact formatting rules described above.

Notes on Dataset Changes

Your cursor token must include enough information to detect changes safely (at minimum: datasetId, version, and position).
If the dataset changes between pause and resume, do not attempt to “best-effort” resume; return false.
Resuming at the end (position equals records.size()) is valid:
- hasNext() returns false.
- next() returns "".

Examples

Example 1: Basic iteration, position tracking, pause

ResumableIterator it = new ResumableIterator(datasetId=1, records=List.of("A","B","C","D"), datasetVersion=10)
it.hasNext() → true
it.position() → 0
it.next() → "A"
it.position() → 1
it.pause() → "datasetId=1,version=10,position=1"
it.next() → "" (paused; must resume to continue)

Example 2: Resume later using saved state token

String token = "datasetId=1,version=10,position=1"
ResumableIterator it2 = new ResumableIterator(datasetId=1, records=List.of("A","B","C","D"), datasetVersion=10)
it2.resume(stateToken=token) → true
it2.next() → "B"
it2.next() → "C"
it2.position() → 3

Example 3: Resume fails when dataset version changes

String oldToken = "datasetId=1,version=10,position=2"
ResumableIterator it3 = new ResumableIterator(datasetId=1, records=List.of("A","B","X","C","D"), datasetVersion=11)
it3.resume(stateToken=oldToken) → false (version mismatch: 10 vs 11)
it3.position() → 0 (unchanged because resume failed)
it3.next() → "A"

Example 4: Resume at end of dataset

ResumableIterator it4 = new ResumableIterator(datasetId=2, records=List.of("P","Q"), datasetVersion=7)
it4.next() → "P"
it4.next() → "Q"
it4.hasNext() → false
it4.position() → 2
String endToken = it4.pause() → "datasetId=2,version=7,position=2"
ResumableIterator it5 = new ResumableIterator(datasetId=2, records=List.of("P","Q"), datasetVersion=7)
it5.resume(stateToken=endToken) → true
it5.hasNext() → false
it5.next() → ""
it5.position() → 2

Example 5: Invalid token (corrupted / wrong dataset)

ResumableIterator it6 = new ResumableIterator(datasetId=1, records=List.of("A","B","C"), datasetVersion=10)
it6.resume(stateToken="not-a-valid-token") → false
it6.resume(stateToken="datasetId=999,version=10,position=1") → false (datasetId mismatch)
it6.resume(stateToken="datasetId=1,version=10,position=999999") → false (position out of bounds)
it6.position() → 0