
feat(reporting): bound PDF compliance report memory and CPU #11160

Open
pedrooot wants to merge 3 commits into master from PROWLER-1733-optimize-pdf-report-generation-performance-api

@pedrooot (Member)

Problem

The scan-compliance-reports Celery task generates five PDFs per scan (ThreatScore, ENS, NIS2, CSA, CIS). On scans with hundreds of thousands of findings it ran the worker out of memory (OOM): a single check with thousands of findings forced ReportLab to resolve the layout of one giant LongTable, and FindingOutput.transform_api_finding ran pydantic v1 validation on every finding plus an N+1 lookup on resources and tags. The master function also re-initialised the Prowler provider five times, once per framework.

Changes

  • Chunk findings tables: _create_findings_tables returns 300-row sub-tables instead of one large LongTable, so ReportLab layout cost is bounded per Flowable.
  • Cap detail rows at 100 failed findings per check (tunable via the DJANGO_PDF_MAX_FINDINGS_PER_CHECK environment variable; set it to 0 to disable the cap). The PDF shows an in-banner "Showing first 100 of N failed findings" notice and points the reader to the CSV or JSON exports, which are never truncated.
  • Push the only_failed filter down to SQL: PASS findings are neither loaded from the DB nor transformed through pydantic.
  • Fix N+1: prefetch_related("resources", "resources__tags") in _load_findings_for_requirement_checks.
  • Initialise prowler_provider once per batch and propagate it to each framework wrapper instead of re-initialising it five times.
  • Evict the shared findings_cache between frameworks: drop the check_ids that no remaining framework still needs.
  • Use a single ROWBACKGROUNDS style command (a constant number of style entries regardless of row count) instead of N per-row BACKGROUND commands in create_data_table.
  • Structured logging via a new _log_phase context manager emitting phase_start/phase_end with scan_id, framework, elapsed_s, rss_kb, delta_rss_kb. Per-framework error paths now use logger.exception with scan_id/tenant_id.
  • Validate output_path early in _create_document.
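The chunking idea in the first bullet can be sketched in plain Python. This is a minimal sketch, not the PR's _create_findings_tables: the helper name chunk_rows is hypothetical, and in the real report each yielded block would presumably be wrapped in its own ReportLab table so that each Flowable's layout cost is bounded by the chunk size rather than by the total number of findings.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical default mirroring the 300-row sub-tables described in the PR.
CHUNK_SIZE = 300


def chunk_rows(
    header: Sequence[T], rows: Sequence[Sequence[T]], size: int = CHUNK_SIZE
) -> Iterator[List[Sequence[T]]]:
    """Yield table data blocks of at most `size` body rows, each prefixed with the header.

    Each block is meant to become one small table instead of a single giant one,
    so per-table work grows with `size`, not with len(rows).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        # Repeat the header so every sub-table is readable on its own page.
        yield [header, *rows[start:start + size]]
```

Repeating the header in each block is a design choice for readability when a page break falls between two sub-tables; it costs one extra row per chunk.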
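The per-check cap and its banner could look roughly like this. Only the variable name DJANGO_PDF_MAX_FINDINGS_PER_CHECK, the default of 100, the "0 disables" rule, and the banner wording come from the description; the function names and the fallback for invalid or negative values are assumptions for illustration.

```python
import os
from typing import List, Mapping, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")

# Default from the PR description; 0 disables the cap entirely.
DEFAULT_MAX_FINDINGS_PER_CHECK = 100
ENV_VAR = "DJANGO_PDF_MAX_FINDINGS_PER_CHECK"


def max_findings_per_check(env: Optional[Mapping[str, str]] = None) -> int:
    """Read the per-check cap (assumption: invalid or negative values use the default)."""
    source = os.environ if env is None else env
    try:
        value = int(source.get(ENV_VAR, DEFAULT_MAX_FINDINGS_PER_CHECK))
    except ValueError:
        return DEFAULT_MAX_FINDINGS_PER_CHECK
    return value if value >= 0 else DEFAULT_MAX_FINDINGS_PER_CHECK


def cap_failed_findings(
    findings: Sequence[T], cap: int
) -> Tuple[List[T], Optional[str]]:
    """Return the rows to render plus a truncation banner, or None if nothing was cut."""
    total = len(findings)
    if cap == 0 or total <= cap:
        return list(findings), None
    banner = f"Showing first {cap} of {total} failed findings"
    return list(findings[:cap]), banner
```

The savings come from applying the limit before each finding is transformed; slicing only after transforming every row would still pay the full validation cost, which is why a cap like this is most effective when pushed into the query.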
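Sharing one provider across the framework wrappers is plain dependency passing. In this sketch generate_compliance_reports, init_provider, and the wrapper callables are hypothetical stand-ins. It also assumes, as one reading of "per-framework error paths", that a failure in one framework is logged with logger.exception and does not stop the remaining frameworks.

```python
import logging
from typing import Any, Callable, Dict

logger = logging.getLogger("reports")


def generate_compliance_reports(
    scan_id: str,
    tenant_id: str,
    init_provider: Callable[[str], Any],
    wrappers: Dict[str, Callable[..., str]],
) -> Dict[str, str]:
    """Run every framework wrapper with one shared provider instance."""
    # Expensive provider setup happens once per batch, not once per framework.
    provider = init_provider(scan_id)
    results: Dict[str, str] = {}
    for framework, wrapper in wrappers.items():
        try:
            results[framework] = wrapper(scan_id, prowler_provider=provider)
        except Exception:
            # logger.exception logs at ERROR with the traceback; extra= adds context.
            logger.exception(
                "report generation failed for %s",
                framework,
                extra={"scan_id": scan_id, "tenant_id": tenant_id},
            )
    return results
```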
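Evicting entries from the shared findings_cache between frameworks amounts to computing which check IDs are still required by frameworks that have not run yet. The sketch below assumes the cache is a dict keyed by check ID and that each framework's check IDs are known up front; generate_all, load_findings, and render_framework are hypothetical names, not the PR's functions.

```python
from typing import Any, Callable, Dict, List, Sequence, Set, Tuple


def generate_all(
    frameworks: Sequence[Tuple[str, Set[str]]],
    load_findings: Callable[[str], List[Any]],
    render_framework: Callable[[str, Dict[str, List[Any]]], None],
) -> List[int]:
    """Render frameworks in order, evicting cache entries no later framework needs.

    Returns the cache size after each framework's eviction step.
    """
    findings_cache: Dict[str, List[Any]] = {}
    sizes: List[int] = []
    for index, (name, check_ids) in enumerate(frameworks):
        # Load only the checks an earlier framework has not already cached.
        for check_id in check_ids - set(findings_cache):
            findings_cache[check_id] = load_findings(check_id)
        render_framework(name, findings_cache)

        # Union of check IDs still required by the frameworks that remain.
        still_needed: Set[str] = set().union(*(ids for _, ids in frameworks[index + 1:]))
        for check_id in list(findings_cache):
            if check_id not in still_needed:
                del findings_cache[check_id]
        sizes.append(len(findings_cache))
    return sizes
```

Each check is loaded at most once, and peak cache size is bounded by what the current and remaining frameworks share, rather than by the union of all five frameworks.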
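A context manager in the spirit of _log_phase might look like the sketch below. The event names and fields (phase_start/phase_end, scan_id, framework, elapsed_s, rss_kb, delta_rss_kb) come from the description; everything else is an assumption. The name log_phase is hypothetical, RSS is read from /proc/self/status (VmRSS, Linux only, None elsewhere), and the fields are rendered as key=value text rather than through whatever structured-logging setup the project actually uses.

```python
import logging
import time
from contextlib import contextmanager
from typing import Iterator, Optional

logger = logging.getLogger("reports")


def current_rss_kb() -> Optional[int]:
    """Return the current resident set size in kB, or None where /proc is unavailable."""
    try:
        with open("/proc/self/status", encoding="utf-8", errors="replace") as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])  # reported as "<n> kB"
    except OSError:
        pass
    return None


@contextmanager
def log_phase(phase: str, **fields: object) -> Iterator[None]:
    """Log phase_start/phase_end around a block with elapsed time and RSS delta.

    phase_end is emitted even if the block raises; the exception still propagates.
    """
    context = " ".join(f"{key}={value}" for key, value in fields.items())
    rss_before = current_rss_kb()
    start = time.perf_counter()
    logger.info("phase_start phase=%s %s", phase, context)
    try:
        yield
    finally:
        rss_after = current_rss_kb()
        delta = (
            rss_after - rss_before
            if rss_after is not None and rss_before is not None
            else None
        )
        logger.info(
            "phase_end phase=%s %s elapsed_s=%.3f rss_kb=%s delta_rss_kb=%s",
            phase, context, time.perf_counter() - start, rss_after, delta,
        )
```

Using try/finally means a failing phase still reports its duration and memory growth, which is the information needed when tracing which framework pushed a worker toward OOM.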

Steps to review

Please add a detailed description of how to review this PR.

Checklist

Community Checklist
  • This feature/issue is listed in here or roadmap.prowler.com
  • Is it assigned to me, if not, request it via the issue/feature in here or Prowler Community Slack

SDK/CLI

  • Are there new checks included in this PR? Yes / No
    • If so, do we need to update permissions for the provider? Please review this carefully.

UI

  • All issue/task requirements work as expected on the UI
  • If this PR adds or updates npm dependencies, include package-health evidence (maintenance, popularity, known vulnerabilities, license, release age) and explain why existing/native alternatives are insufficient.
  • Screenshots/Video of the functionality flow (if applicable) - Mobile (X < 640px)
  • Screenshots/Video of the functionality flow (if applicable) - Tablet (640px < X < 1024px)
  • Screenshots/Video of the functionality flow (if applicable) - Desktop (X > 1024px)
  • Ensure new entries are added to CHANGELOG.md, if applicable.

API

  • All issue/task requirements work as expected on the API
  • Endpoint response output (if applicable)
  • EXPLAIN ANALYZE output for new/modified queries or indexes (if applicable)
  • Performance test results (if applicable)
  • Any other relevant evidence of the implementation (if applicable)
  • Verify if API specs need to be regenerated.
  • Check if version updates are required (e.g., specs, Poetry, etc.).
  • Ensure new entries are added to CHANGELOG.md, if applicable.

License

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@github-actions (bot) commented May 13, 2026

Conflict Markers Resolved

All conflict markers have been successfully resolved in this pull request.

@github-actions (bot) commented May 13, 2026

✅ All necessary CHANGELOG.md files have been updated.

with _log_phase("failing_phase", scan_id="s-2", framework="FW"):
    raise RuntimeError("boom")

messages = [r.getMessage() for r in caplog.records]