perf: reduce chapter parsing time by ~10% via CSS caching and loop optimizations by jpirnay · Pull Request #1528 · crosspoint-reader/crosspoint-reader

jpirnay · 2026-03-28T19:52:43Z

Summary

What is the goal of this PR? Reduce the time spent parsing and laying out a chapter. Profiling identified index_split_007.html (~60 KB, 72 pages) from a representative epub as a worst-case target, with a baseline of ~2164 ms.
What changes are included?
- CSS style resolution cache per chapter (ChapterHtmlSlimParser)
- Inter-word gap pre-computation in both line-breaking paths (ParsedText)
- ASCII fast path in the characterData XML callback

Additional Context

Profiling methodology

Chapter-level and per-paragraph timing was added temporarily (and removed before this PR) to measure each phase:

Phase	Baseline	Final	Δ
Total	2164 ms	~1950 ms	−214 ms (−10%)
Layout (line-break + SD writes)	~1501 ms	~1501 ms	—
XML + misc	~663 ms	~452 ms	−211 ms

The layout figure is itself a compound cost: ~720 ms is SD card writes (~10 ms/page × 72 pages), ~780 ms is actual line-break computation. Neither was addressed in this PR.

The xml+misc figure was further broken down by callback:

Callback	Time
`startElement` pure logic	~95 ms
`characterData`	~65 ms
`endElement`	~17 ms
expat internals	~282 ms

What worked

CSS style resolution cache (−171 ms) — the largest gain. resolveStyle and parseInlineStyle were being called for every HTML element (~1000 per chapter), each doing string normalization, splitting, and multiple hash lookups. Per-chapter unordered_map caches keyed on tag|class and styleAttr eliminate repeat work. The maps are members of ChapterHtmlSlimParser and are discarded with the parser after each chapter.

Inter-word gap pre-computation (−23 ms) — getSpaceAdvance/getKerning were called inside the inner loop of both computeLineBreaks (DP) and computeHyphenatedLineBreaks (greedy). Moving them to a single pre-pass vector avoids redundant codepoint scanning and renderer calls. Applied to both paths; the hyphenated path also keeps the vector in sync when hyphenateWordAtIndex inserts a new word.

ASCII fast path in characterData (−11 ms) — for bytes > 0x20 && < 0x80, skip all multi-byte sequence checks (NBSP, NNBSP, BOM) and go directly to buffer append. Safe because all multi-byte UTF-8 sequences start with a byte ≥ 0x80.

What was investigated but did not help

Async SD writes — the ~720 ms of SD write time embedded in page completion is the single largest remaining cost. Addressing it would require making completePageFn asynchronous, which is a significant architectural change outside the scope of this PR. (Edit: simple buffering the write calls does not help as SdFat likely already buffers internally to 512-byte sector boundaries, so our many small writePod calls were accumulating in SdFat's sector buffer and only flushing when a sector boundary was crossed.)
Expat parser internals (~282 ms) — not addressable without replacing the XML parser.
Additional callback optimizations — profiling showed endElement (~17 ms) and the remaining startElement logic (~95 ms) have no obvious low-risk wins beyond what the CSS cache already provides.

AI Usage

Did you use AI tools to help write this code? YES

coderabbitai · 2026-03-28T19:52:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: da8fc4e3-d65f-447c-a961-f313a19be002

📥 Commits

Reviewing files that changed from the base of the PR and between 9f397e7 and 21da536.

📒 Files selected for processing (1)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

🚧 Files skipped from review as they are similar to previous changes (1)

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

📜 Recent review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: cppcheck
GitHub Check: build

📝 Walkthrough

Walkthrough

Precomputes inter-word gaps in ParsedText line-breaking/hyphenation to remove repeated renderer kerning/space calls, adds per-chapter caches for stylesheet and inline-style resolution in ChapterHtmlSlimParser, and introduces an ASCII fast path in character tokenization.

Changes

Cohort / File(s)	Summary
Line-breaking optimization `lib/Epub/Epub/ParsedText.cpp`	Added a precomputed `interWordGaps` vector used by `computeLineBreaks` and `computeHyphenatedLineBreaks` to avoid repeated `renderer.getSpaceAdvance`/`renderer.getKerning` calls inside O(n²) loops. When hyphenation splits a word, inserts a `0` gap placeholder to keep indices aligned with modified `words`/`wordWidths`.
HTML parsing & style caching `lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h`, `lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`	Added `cssStyleCache_` and `inlineStyleCache_` unordered_maps to cache `resolveStyle(name, class)` and parsed inline styles; cache only populated for fully-resolved styles. Applied cached styles when evaluating image display and sizing.
Tokenization fast path & logging `lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`	Introduced ASCII fast path for characterData bytes 0x21..0x7F to append directly to `partWordBuffer`, reducing UTF-8/whitespace handling on the hot path. Changed parse-time logging to compute and store elapsed ms in a `uint32_t totalTimeMs`.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

perf: Replace std::list with std::vector in text layout #1038 — Modifies the same ParsedText line-breaking and hyphenation code paths; likely overlaps with interWordGaps/hyphenation adjustments.
fix: Fix inter-word spacing rounding error in text layout #1311 — Also changes ParsedText spacing/line-breaking logic and interacts with inter-word gap computation.
fix: Fixed Image Sizing When No Width is Set #1002 — Changes ChapterHtmlSlimParser CSS resolution and inline-style merging; related to the new style/inline caches and image style handling.

Suggested reviewers

znelson
daveallie

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely summarizes the main changes: performance improvements via CSS caching and loop optimizations, with the specific ~10% improvement metric.
Description check	✅ Passed	The description is comprehensive and directly related to the changeset, detailing the goal, specific changes, profiling methodology, measured results, and investigations.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (3)

lib/Epub/Epub/ParsedText.cpp (2)

168-181: Extract shared inter-word-gap precompute into a helper to avoid drift.

The gap-building logic is now duplicated in two paths. A shared helper will keep behavior aligned and reduce maintenance risk.

♻️ Suggested refactor

+namespace {
+std::vector<int> buildInterWordGaps(const GfxRenderer& renderer, int fontId,
+                                    const std::vector<std::string>& words,
+                                    const std::vector<EpdFontFamily::Style>& wordStyles,
+                                    const std::vector<bool>& continuesVec) {
+  std::vector<int> gaps(words.size(), 0);
+  for (size_t j = 1; j < words.size(); ++j) {
+    gaps[j] = !continuesVec[j]
+                  ? renderer.getSpaceAdvance(fontId, lastCodepoint(words[j - 1]), firstCodepoint(words[j]),
+                                             wordStyles[j - 1])
+                  : renderer.getKerning(fontId, lastCodepoint(words[j - 1]), firstCodepoint(words[j]),
+                                        wordStyles[j - 1]);
+  }
+  return gaps;
+}
+}  // namespace

Then replace both local precompute blocks with:

-std::vector<int> interWordGaps(...);
-...
+std::vector<int> interWordGaps = buildInterWordGaps(renderer, fontId, words, wordStyles, continuesVec);

Also applies to: 293-307

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/Epub/Epub/ParsedText.cpp` around lines 168 - 181, The duplicated
inter-word-gap precompute should be extracted into a helper function to keep
behavior consistent; implement a function (e.g., computeInterWordGaps) that
takes fontId, totalWordCount, words, continuesVec, wordStyles and renderer and
returns the std::vector<int> interWordGaps by using
lastCodepoint(words[j-1])/firstCodepoint(words[j]) and calling
renderer.getSpaceAdvance or renderer.getKerning depending on continuesVec[j],
then replace both local blocks that build interWordGaps (the one using
continuesVec and renderer.getSpaceAdvance/getKerning) with calls to this helper
to avoid drift.

338-340: Guard the gap-vector sync invariant with assertions.

This insertion strategy is valid but depends on strict vector alignment invariants. Add explicit assert() checks after split+insert so future changes fail fast if alignment breaks.

🛡️ Suggested hardening

+#include <cassert>
...
       if (availableWidth > 0 &&
           hyphenateWordAtIndex(currentIndex, availableWidth, renderer, fontId, wordWidths, allowFallbackBreaks)) {
         // Keep interWordGaps in sync: insert placeholder for the new remainder word.
         // The remainder is always the first word on the next line so this slot is never read.
         interWordGaps.insert(interWordGaps.begin() + currentIndex + 1, 0);
+        assert(interWordGaps.size() == wordWidths.size());
+        assert(wordWidths.size() == words.size());
+        assert(wordStyles.size() == words.size());
+        assert(continuesVec.size() == words.size());

Based on learnings: In this codebase, assertions are always enabled and should be used to enforce invariants (side-effect free, inexpensive).

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/Epub/Epub/ParsedText.cpp` around lines 338 - 340, After inserting the
placeholder gap, add lightweight assert checks to enforce the inter-word gap
alignment: immediately after interWordGaps.insert(...) assert that currentIndex
+ 1 is a valid index (currentIndex + 1 < interWordGaps.size()) and that the
newly inserted slot equals 0 (interWordGaps[currentIndex + 1] == 0); also add an
assertion that the interWordGaps vector remains aligned with the corresponding
word/token container used in this routine (e.g., the words/wordSlots vector used
by ParsedText) so sizes match after the insert.

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp (1)

297-307: Reuse the already-resolved cssStyle in the <img> path.

For img, cssStyle already contains the same stylesheet + inline-style merge, so these extra cache probes cannot change behavior. Removing them trims per-image hot-path work and gives you one place to apply the cache policy consistently.

♻️ Suggested simplification

-      if (self->cssParser) {
-        std::string imgCacheKey("img|");
-        imgCacheKey += classAttr;
-        auto imgIt = self->cssStyleCache_.find(imgCacheKey);
-        if (imgIt == self->cssStyleCache_.end())
-          imgIt = self->cssStyleCache_.emplace(imgCacheKey, self->cssParser->resolveStyle("img", classAttr)).first;
-        CssStyle imgDisplayStyle = imgIt->second;
-        if (!styleAttr.empty()) {
-          auto it = self->inlineStyleCache_.find(styleAttr);
-          if (it == self->inlineStyleCache_.end())
-            it = self->inlineStyleCache_.emplace(styleAttr, CssParser::parseInlineStyle(styleAttr)).first;
-          imgDisplayStyle.applyOver(it->second);
-        }
+      {
+        const CssStyle& imgDisplayStyle = cssStyle;
         if (imgDisplayStyle.hasDisplay() && imgDisplayStyle.display == CssDisplay::None) {
           self->skipUntilDepth = self->depth;
           self->depth += 1;
           return;
         }
       }
@@
-                std::string imgCacheKey("img|");
-                imgCacheKey += classAttr;
-                auto imgStyleIt = self->cssParser ? self->cssStyleCache_.find(imgCacheKey) : self->cssStyleCache_.end();
-                if (self->cssParser && imgStyleIt == self->cssStyleCache_.end())
-                  imgStyleIt =
-                      self->cssStyleCache_.emplace(imgCacheKey, self->cssParser->resolveStyle("img", classAttr)).first;
-                CssStyle imgStyle = self->cssParser ? imgStyleIt->second : CssStyle{};
-                // Merge inline style (e.g. style="height: 2em") so it overrides stylesheet rules
-                if (!styleAttr.empty()) {
-                  auto it = self->inlineStyleCache_.find(styleAttr);
-                  if (it == self->inlineStyleCache_.end())
-                    it = self->inlineStyleCache_.emplace(styleAttr, CssParser::parseInlineStyle(styleAttr)).first;
-                  imgStyle.applyOver(it->second);
-                }
+                CssStyle imgStyle = cssStyle;

Also applies to: 352-364

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp` around lines 297 - 307, The
img handling duplicates work by re-resolving and re-caching styles even though
cssStyle already holds the resolved stylesheet+inline merge; update the <img>
path in ChapterHtmlSlimParser (where imgCacheKey, cssStyleCache_,
cssParser->resolveStyle("img", classAttr), inlineStyleCache_,
CssParser::parseInlineStyle and imgDisplayStyle.applyOver are used) to reuse the
existing cssStyle variable instead of creating imgCacheKey and probing
cssStyleCache_/inlineStyleCache_ again—remove the extra cache lookups and
resolve calls, and apply the same single cache/application policy already used
for other elements so style resolution for images is done in one place
consistently (also apply the same change in the similar block referenced at the
later img path).

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 189-197: The current code memoizes whatever
cssParser->resolveStyle(name, classAttr) returns, but that can be a transient
low-heap fallback (empty/default) and should not be cached; change the miss path
so you call cssParser->resolveStyle(...) into a temporary, then only insert into
cssStyleCache_ if the returned CssStyle is authoritative (e.g., non-default/has
rules) — otherwise skip emplacing and just use the temporary result so future
calls can re-resolve after memory recovers; reference cssStyleCache_,
cssParser->resolveStyle(name, classAttr) and the local cssStyle variable when
making this change.

---

Nitpick comments:
In `@lib/Epub/Epub/ParsedText.cpp`:
- Around line 168-181: The duplicated inter-word-gap precompute should be
extracted into a helper function to keep behavior consistent; implement a
function (e.g., computeInterWordGaps) that takes fontId, totalWordCount, words,
continuesVec, wordStyles and renderer and returns the std::vector<int>
interWordGaps by using lastCodepoint(words[j-1])/firstCodepoint(words[j]) and
calling renderer.getSpaceAdvance or renderer.getKerning depending on
continuesVec[j], then replace both local blocks that build interWordGaps (the
one using continuesVec and renderer.getSpaceAdvance/getKerning) with calls to
this helper to avoid drift.
- Around line 338-340: After inserting the placeholder gap, add lightweight
assert checks to enforce the inter-word gap alignment: immediately after
interWordGaps.insert(...) assert that currentIndex + 1 is a valid index
(currentIndex + 1 < interWordGaps.size()) and that the newly inserted slot
equals 0 (interWordGaps[currentIndex + 1] == 0); also add an assertion that the
interWordGaps vector remains aligned with the corresponding word/token container
used in this routine (e.g., the words/wordSlots vector used by ParsedText) so
sizes match after the insert.

In `@lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`:
- Around line 297-307: The img handling duplicates work by re-resolving and
re-caching styles even though cssStyle already holds the resolved
stylesheet+inline merge; update the <img> path in ChapterHtmlSlimParser (where
imgCacheKey, cssStyleCache_, cssParser->resolveStyle("img", classAttr),
inlineStyleCache_, CssParser::parseInlineStyle and imgDisplayStyle.applyOver are
used) to reuse the existing cssStyle variable instead of creating imgCacheKey
and probing cssStyleCache_/inlineStyleCache_ again—remove the extra cache
lookups and resolve calls, and apply the same single cache/application policy
already used for other elements so style resolution for images is done in one
place consistently (also apply the same change in the similar block referenced
at the later img path).

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 574ebba3-75c4-49c3-aeac-f75678807ea2

📥 Commits

Reviewing files that changed from the base of the PR and between 3448430 and 9f397e7.

📒 Files selected for processing (3)

lib/Epub/Epub/ParsedText.cpp
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h

📜 Review details

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: cppcheck
GitHub Check: build

🧰 Additional context used

🧠 Learnings (9)

📓 Common learnings

Learnt from: Uri-Tauber
Repo: crosspoint-reader/crosspoint-reader PR: 988
File: lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp:649-661
Timestamp: 2026-02-19T12:17:05.421Z
Learning: In ChapterHtmlSlimParser.cpp, when computing footnote word indices in endElement() for footnote links, the wordIndex must be cumulative across the 750-word mid-paragraph flush boundary. The correct calculation is: `wordIndex = wordsExtractedInBlock + currentTextBlock->size()`, not just `currentTextBlock->size()`. This ensures footnotes attach to the page containing their actual anchor word, even after layoutAndExtractLines(false) has extracted and removed earlier words from the block.

📚 Learning: 2026-02-19T12:17:05.421Z

Learnt from: Uri-Tauber
Repo: crosspoint-reader/crosspoint-reader PR: 988
File: lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp:649-661
Timestamp: 2026-02-19T12:17:05.421Z
Learning: In ChapterHtmlSlimParser.cpp, when computing footnote word indices in endElement() for footnote links, the wordIndex must be cumulative across the 750-word mid-paragraph flush boundary. The correct calculation is: `wordIndex = wordsExtractedInBlock + currentTextBlock->size()`, not just `currentTextBlock->size()`. This ensures footnotes attach to the page containing their actual anchor word, even after layoutAndExtractLines(false) has extracted and removed earlier words from the block.

Applied to files:

lib/Epub/Epub/ParsedText.cpp
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

📚 Learning: 2026-03-17T15:27:17.468Z

Learnt from: znelson
Repo: crosspoint-reader/crosspoint-reader PR: 1413
File: lib/EpdFont/EpdFont.cpp:34-36
Timestamp: 2026-03-17T15:27:17.468Z
Learning: In crosspoint-reader/crosspoint-reader, `EpdFont::getGlyph()` falls back to `getGlyph(REPLACEMENT_GLYPH)` (U+FFFD) before returning `nullptr`. All fonts in this project are guaranteed to include a U+FFFD replacement glyph, so the `!glyph` null branch in `EpdFont::getTextBounds()` (and similar rendering paths) is unreachable in practice. Do not flag stale-state issues in that branch (e.g., leftover `lastBaseAdvanceFP`/`lastBaseTop` after a null glyph) as bugs in future reviews.

Applied to files:

lib/Epub/Epub/ParsedText.cpp

📚 Learning: 2026-03-20T19:51:12.696Z

Learnt from: znelson
Repo: crosspoint-reader/crosspoint-reader PR: 1447
File: src/activities/reader/EpubReaderActivity.cpp:642-660
Timestamp: 2026-03-20T19:51:12.696Z
Learning: In crosspoint-reader/crosspoint-reader (EpubReaderActivity.cpp, prepareSection), when `epub->getTocIndexForSpineIndex(currentSpineIndex)` returns a negative value (pre-TOC spine with no TOC entry), `prepareSection` skips the entire chapter walk and leaves `chapterPageInfo` unset. The status bar correctly falls back to `section->pageCount` in this scenario. This means the `pageTocIndex == -1` path inside the `render()` per-page TOC update block is handled correctly by simply leaving `chapterPageInfo` unchanged — no explicit clear is needed.

Applied to files:

lib/Epub/Epub/ParsedText.cpp
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

📚 Learning: 2026-02-27T22:49:59.600Z

Learnt from: ngxson
Repo: crosspoint-reader/crosspoint-reader PR: 1218
File: src/activities/ActivityManager.cpp:254-265
Timestamp: 2026-02-27T22:49:59.600Z
Learning: In this codebase, assertions are always enabled (no NDEBUG). Use assert() to crash on programmer errors and surface logic bugs during development and in production builds. Do not rely on asserts for runtime error handling; they should enforce invariants that must always hold. Keep asserts side-effect free and inexpensive, and avoid relying on them for user-visible failures. Include <cassert> where appropriate and document the invariant being tested.

Applied to files:

lib/Epub/Epub/ParsedText.cpp
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

📚 Learning: 2026-03-02T10:14:16.036Z

Learnt from: Uri-Tauber
Repo: crosspoint-reader/crosspoint-reader PR: 1245
File: lib/Epub/Epub/Section.cpp:277-308
Timestamp: 2026-03-02T10:14:16.036Z
Learning: Guideline: Strengthen serialization::readString to defend against unbounded growth when reading from disk data. Implement and enforce a maximum allowed length (e.g., a configured or reasonable constant) and validate the incoming length before resizing or allocating. Audit all call sites (e.g., BookMetadataCache, TextBlock, KOReaderCredentialStore, Section cache readers) to ensure they do not rely on unbounded len-based resizing. If the readString API must remain, add internal safeguards (bounds checks, length validation, and error handling) so per-call-site validations are not required. Ensure Section cache files remain versioned (SECTION_FILE_VERSION) and parameter mismatches invalidate caches, but do not rely on unsafe allocations; prefer safe, bounded reads with explicit errors on invalid data.

Applied to files:

lib/Epub/Epub/ParsedText.cpp
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

📚 Learning: 2026-03-28T11:06:29.611Z

Learnt from: pablohc
Repo: crosspoint-reader/crosspoint-reader PR: 1488
File: src/activities/home/HomeActivity.cpp:92-95
Timestamp: 2026-03-28T11:06:29.611Z
Learning: When reviewing crosspoint-reader code, avoid flagging a missing `renderer.displayBuffer()` call immediately after `GUI.drawPopup()` / `BaseTheme::drawPopup()`: `BaseTheme::drawPopup()` already calls `renderer.displayBuffer()` before returning, so the popup is guaranteed to be flushed to the e-ink panel before subsequent blocking work begins. Conversely, do not require a `renderer.displayBuffer()` call after `fillPopupProgress()`; it intentionally does not flush, so intermediate progress-bar updates may not appear unless the update granularity warrants an explicit flush elsewhere.

Applied to files:

lib/Epub/Epub/ParsedText.cpp
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

📚 Learning: 2026-03-21T19:30:33.664Z

Learnt from: znelson
Repo: crosspoint-reader/crosspoint-reader PR: 1454
File: lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp:101-105
Timestamp: 2026-03-21T19:30:33.664Z
Learning: In `lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp`, the `blockStyleStack` is a fixed-capacity array (`MAX_BLOCK_STYLE_DEPTH`, currently 8). On overflow, `pushBlockStyle()` only logs an error and increments `blockStyleStackSize` without writing a new entry; `getAccumulatedBlockStyle()` returns the deepest stored accumulated entry as a fallback. This is intentional — real EPUBs are expected to have ≤ 8 block-style nesting levels. Inline styles use a `vector`-based stack and can nest more deeply. The overflow threshold or container type can be revisited if 8 proves insufficient in practice. Do not flag the overflow fallback behavior as a bug in future reviews.

Applied to files:

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

📚 Learning: 2026-03-02T21:18:25.387Z

Learnt from: Uri-Tauber
Repo: crosspoint-reader/crosspoint-reader PR: 1245
File: lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp:137-150
Timestamp: 2026-03-02T21:18:25.387Z
Learning: In ChapterHtmlSlimParser.cpp, pendingAnchorId is a single string (not a vector) because EPUB footnote targets are virtually always block-level elements (<aside>, <p>, <div>) that trigger startNewTextBlock() and flush the anchor before overwrite can occur. Multiple consecutive inline IDs would resolve to the same completedPageCount anyway, and vector allocation overhead is unnecessary for ESP32-C3. Only inline IDs in practice are footnote back-references (<a id="ref1">), which are never navigation targets.

Applied to files:

lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

🔇 Additional comments (1)

lib/Epub/Epub/ParsedText.cpp (1)

168-181: Nice optimization with preserved spacing semantics.

Good change: precomputing interWordGaps removes repeated renderer/codepoint work from hot loops while keeping the same space-vs-kerning decision model.

Also applies to: 199-200, 293-307, 322-323

Copilot

Pull request overview

This PR targets faster chapter parsing/layout by reducing repeated CSS resolution work, minimizing renderer calls in line-breaking loops, and adding a faster ASCII path for XML character data handling.

Changes:

Add per-chapter caches for resolved CSS styles and parsed inline styles in ChapterHtmlSlimParser.
Pre-compute inter-word gaps in both DP and hyphenation line-breaking paths to reduce repeated renderer/codepoint work.
Add an ASCII fast path in the XML characterData callback.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File	Description
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.h	Adds per-chapter `unordered_map` caches for stylesheet and inline-style resolution.
lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp	Uses the caches in `startElement`, adds an ASCII fast path in `characterData`, and minor logging refactor.
lib/Epub/Epub/ParsedText.cpp	Pre-computes inter-word spacing/kerning vectors to reduce inner-loop cost during line breaking.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-16T21:08:49Z

+        if (imgIt == self->cssStyleCache_.end())
+          imgIt = self->cssStyleCache_.emplace(imgCacheKey, self->cssParser->resolveStyle("img", classAttr)).first;
+        CssStyle imgDisplayStyle = imgIt->second;


The image-path style caching unconditionally stores the result of CssParser::resolveStyle("img", classAttr). But resolveStyle can intentionally return an empty CssStyle when free heap is below MIN_FREE_HEAP_FOR_CSS (see CssParser.cpp), and caching that empty result would make the style permanently empty for the rest of the chapter even if heap later recovers. Consider mirroring the earlier caching logic here (only cache when resolved.defined.anySet(), otherwise use the transient resolved value without inserting) so low-heap fallbacks don’t get memoized.

Suggested change

if (imgIt == self->cssStyleCache_.end())

imgIt = self->cssStyleCache_.emplace(imgCacheKey, self->cssParser->resolveStyle("img", classAttr)).first;

CssStyle imgDisplayStyle = imgIt->second;

CssStyle imgDisplayStyle;

if (imgIt == self->cssStyleCache_.end()) {

CssStyle resolved = self->cssParser->resolveStyle("img", classAttr);

if (resolved.defined.anySet())

imgIt = self->cssStyleCache_.emplace(imgCacheKey, resolved).first;

imgDisplayStyle = resolved;

} else {

imgDisplayStyle = imgIt->second;

}

Copilot · 2026-04-16T21:08:50Z

+                auto imgStyleIt = self->cssParser ? self->cssStyleCache_.find(imgCacheKey) : self->cssStyleCache_.end();
+                if (self->cssParser && imgStyleIt == self->cssStyleCache_.end())
+                  imgStyleIt =
+                      self->cssStyleCache_.emplace(imgCacheKey, self->cssParser->resolveStyle("img", classAttr)).first;
+                CssStyle imgStyle = self->cssParser ? imgStyleIt->second : CssStyle{};


Same concern as above: this imgStyle cache path always emplaces the result of resolveStyle into cssStyleCache_. If resolveStyle returned an empty style due to low heap, that fallback becomes permanently cached and will prevent later correct CSS height/width/display resolution for images of this class. Suggest applying the same ‘only cache when resolved.defined.anySet()’ policy (or otherwise avoid caching low-heap fallback results) here too.

Suggested change

auto imgStyleIt = self->cssParser ? self->cssStyleCache_.find(imgCacheKey) : self->cssStyleCache_.end();

if (self->cssParser && imgStyleIt == self->cssStyleCache_.end())

imgStyleIt =

self->cssStyleCache_.emplace(imgCacheKey, self->cssParser->resolveStyle("img", classAttr)).first;

CssStyle imgStyle = self->cssParser ? imgStyleIt->second : CssStyle{};

CssStyle imgStyle{};

if (self->cssParser) {

auto imgStyleIt = self->cssStyleCache_.find(imgCacheKey);

if (imgStyleIt != self->cssStyleCache_.end()) {

imgStyle = imgStyleIt->second;

} else {

CssStyle resolved = self->cssParser->resolveStyle("img", classAttr);

if (resolved.defined.anySet()) {

imgStyle = self->cssStyleCache_.emplace(imgCacheKey, resolved).first->second;

} else {

imgStyle = resolved;

}

}

}

znelson · 2026-04-21T22:41:34Z

After a careful look, I'm uneasy with this. Yes, some performance improvement, but unordered_map requires nodes to be heap allocated, and I worry about heap fragmentation as a result. If the only improvement is performance, I'm going to hold off until we have a clear owner for these improvements.

Add css cache

9f397e7

coderabbitai Bot reviewed Mar 28, 2026

View reviewed changes

Comment thread lib/Epub/Epub/parsers/ChapterHtmlSlimParser.cpp

Review comment

21da536

jpirnay closed this Apr 4, 2026

osteotek reopened this Apr 16, 2026

osteotek requested a review from Copilot April 16, 2026 21:03

Copilot started reviewing on behalf of osteotek April 16, 2026 21:03 View session

Copilot AI reviewed Apr 16, 2026

View reviewed changes

osteotek approved these changes Apr 16, 2026

View reviewed changes

znelson closed this Apr 21, 2026

jpirnay deleted the perf-htmlslim-parser branch May 7, 2026 17:21

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf: reduce chapter parsing time by ~10% via CSS caching and loop optimizations#1528

perf: reduce chapter parsing time by ~10% via CSS caching and loop optimizations#1528
jpirnay wants to merge 2 commits into
crosspoint-reader:masterfrom
jpirnay:perf-htmlslim-parser

jpirnay commented Mar 28, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot commented Mar 28, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Apr 16, 2026

Uh oh!

Copilot AI Apr 16, 2026

Uh oh!

znelson commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

-        if (imgIt == self->cssStyleCache_.end())
-          imgIt = self->cssStyleCache_.emplace(imgCacheKey, self->cssParser->resolveStyle("img", classAttr)).first;
-        CssStyle imgDisplayStyle = imgIt->second;
+        CssStyle imgDisplayStyle;
+        if (imgIt == self->cssStyleCache_.end()) {
+          CssStyle resolved = self->cssParser->resolveStyle("img", classAttr);
+          if (resolved.defined.anySet())
+            imgIt = self->cssStyleCache_.emplace(imgCacheKey, resolved).first;
+          imgDisplayStyle = resolved;
+        } else {
+          imgDisplayStyle = imgIt->second;
+        }

Uh oh!

Conversation

jpirnay commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Additional Context

Profiling methodology

What worked

What was investigated but did not help

AI Usage

Uh oh!

coderabbitai Bot commented Mar 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 16, 2026

Choose a reason for hiding this comment

Uh oh!

znelson commented Apr 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jpirnay commented Mar 28, 2026 •

edited

Loading

coderabbitai Bot commented Mar 28, 2026 •

edited

Loading