
disagg: Fix unexpected object storage usage caused by pre-lock residue (#10760) #10767

Open
ti-chi-bot wants to merge 1 commit into pingcap:release-nextgen-20251011 from ti-chi-bot:cherry-pick-10760-to-release-nextgen-20251011


This is an automated cherry-pick of #10760

What problem does this PR solve?

Issue Number: close #10763

Problem Summary:

  • In concurrent remote write paths, PageDirectory write-group semantics could cause follower writers to miss their own applied lock-id cleanup signals.
  • As a result, S3LockLocalManager.pre_lock_keys could remain resident and be repeatedly written into manifest locks.
  • S3GC then treated many obsolete objects as still protected, leading to long-term remote storage usage inflation.
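
The failure mode described above can be illustrated with a toy model. This is a minimal sketch and not TiFlash code: `ToyWriter`, `applyWriteGroup`, and `residueAfterGroupWrite` are hypothetical names, and the sketch assumes that the problematic behavior amounts to only the write-group owner receiving a cleanup signal.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical model of one writer in a write group.
struct ToyWriter
{
    std::vector<std::string> lock_ids; // ids this writer added to the pre-lock set
    std::vector<std::string> applied;  // cleanup signal handed back after apply
};

// The group owner (index 0) applies every queued writer's edits in one pass.
// When per_writer_signal is false, only the owner learns which ids were applied.
void applyWriteGroup(std::vector<ToyWriter> & group, bool per_writer_signal)
{
    for (std::size_t i = 0; i < group.size(); ++i)
    {
        if (i == 0 || per_writer_signal)
            group[i].applied = group[i].lock_ids;
    }
}

// Returns how many pre-lock keys stay resident after every writer runs its cleanup.
std::size_t residueAfterGroupWrite(std::size_t writers, bool per_writer_signal)
{
    std::set<std::string> pre_lock_keys;
    std::vector<ToyWriter> group(writers);
    for (std::size_t i = 0; i < writers; ++i)
    {
        group[i].lock_ids = {"lock_" + std::to_string(i)};
        pre_lock_keys.insert(group[i].lock_ids.begin(), group[i].lock_ids.end());
    }
    assert(pre_lock_keys.size() == writers); // every writer pre-locked one distinct key

    applyWriteGroup(group, per_writer_signal);

    // Each writer can only clean up the ids it was told about.
    for (const auto & writer : group)
        for (const auto & id : writer.applied)
            pre_lock_keys.erase(id);
    return pre_lock_keys.size();
}
```

In this model, every follower that gets no signal leaves its key behind, so the residue grows with the number of concurrent writers per group.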

What is changed and how it works?

disagg: eliminate pre-lock key residue that leads to unexpected OSS usage
  • End-to-end correctness fixes for lock lifecycle

    • PageDirectory::apply now returns writer-scoped applied_data_files for both write-group owner and followers, so each writer gets its own cleanup signal.
    • UniversalPageStorage::write uses those per-writer ids to clean pre-locks reliably after apply.
    • Added explicit failure cleanup path: cleanPreLockKeysOnWriteFailure(...) is invoked when remote write/apply fails.
    • createS3LockForWriteBatch was adjusted to avoid partial pre-lock residue on partial lock-creation failures (append to pre_lock_keys after lock-creation pass), and its return value is now aligned with "newly appended keys" semantics.
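
The lifecycle rules in this group can be sketched with a small stand-in class. This is an illustration under stated assumptions, not the actual `S3LockLocalManager`: `ToyLockManager`, `createLocks`, `cleanKeys`, `residentKeys`, and `toyRemoteWrite` are hypothetical names, and the `failing` parameter exists only to simulate a lock-creation failure.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a pre-lock manager.
class ToyLockManager
{
public:
    // Creates a lock for every key first, then appends to the pre-lock set.
    // Returns only the keys newly appended by this call.
    std::vector<std::string> createLocks(
        const std::vector<std::string> & keys,
        const std::set<std::string> & failing = {})
    {
        // Pass 1: lock creation. A throw here leaves the pre-lock set untouched.
        for (const auto & key : keys)
            if (failing.count(key) != 0)
                throw std::runtime_error("lock creation failed for " + key);

        // Pass 2: append after the whole creation pass succeeded.
        std::lock_guard<std::mutex> guard(mutex_);
        std::vector<std::string> appended;
        for (const auto & key : keys)
            if (pre_lock_keys_.insert(key).second)
                appended.push_back(key);
        return appended;
    }

    void cleanKeys(const std::vector<std::string> & keys)
    {
        std::lock_guard<std::mutex> guard(mutex_);
        for (const auto & key : keys)
            pre_lock_keys_.erase(key);
    }

    std::size_t residentKeys() const
    {
        std::lock_guard<std::mutex> guard(mutex_);
        return pre_lock_keys_.size();
    }

private:
    mutable std::mutex mutex_;
    std::set<std::string> pre_lock_keys_;
};

// Simulated remote write: pre-lock, apply, then clean up on success and on failure.
bool toyRemoteWrite(ToyLockManager & manager, const std::vector<std::string> & keys, bool apply_fails)
{
    const std::vector<std::string> appended = manager.createLocks(keys);
    try
    {
        if (apply_fails)
            throw std::runtime_error("apply failed");
        manager.cleanKeys(appended); // success path: this writer's own applied ids
        return true;
    }
    catch (const std::exception &)
    {
        manager.cleanKeys(appended); // failure path: explicit cleanup of this writer's keys
        return false;
    }
}
```

Deferring the append until the creation pass finishes is what keeps a partial failure from leaving half of a batch in the pre-lock set; returning only newly appended keys keeps one writer from cleaning keys that another writer still owns.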
  • Test coverage and regression guards

    • Added write-group concurrency tests in PageDirectory and UniversalPageStorage paths.
    • Added focused S3LockLocalManager tests for partial cleanup, failure cleanup, lock-return semantics, and partial-failure atomicity.
    • Updated SyncPoint-based async tests to use std::launch::async to avoid deferred scheduling risk.
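
The deferred scheduling risk comes from standard C++ behavior: `std::async` called without a launch policy may run the task lazily, only when `get()` or `wait()` is called, so a test that waits for the task to reach a sync point before calling `get()` could hang. The helper names below are hypothetical and only demonstrate the difference between the two policies.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Returns true if the task is stored as a deferred function, meaning it will
// not start until someone calls get() or wait() on the future.
bool isDeferred(std::launch policy)
{
    auto task = std::async(policy, [] { return 1; });
    const bool deferred
        = task.wait_for(std::chrono::seconds(0)) == std::future_status::deferred;
    task.get(); // run (or join) the task so the future is consumed
    return deferred;
}

// Returns true if the task body executed on a thread other than the caller's.
bool runsOnAnotherThread(std::launch policy)
{
    const auto caller = std::this_thread::get_id();
    auto task = std::async(policy, [caller] { return std::this_thread::get_id() != caller; });
    assert(task.valid());
    return task.get();
}
```

With `std::launch::async` the task starts on its own thread right away, which is what a test coordinating through sync points needs.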
  • Observability and operations improvements

    • Most observability changes are split into a separate PR, "disagg: Add O11y on object store usage summary of each tiflash store" (#10764), to keep the logical changes in this PR clean.
    • Added lock-manager metrics to track pre-lock residency and cleanup outcomes (hit/miss/remaining).
    • Added owner-only periodic S3 storage summary in S3GCManagerService.
    • Added per-store S3 summary gauge:
      • tiflash_storage_s3_store_summary_bytes{store_id, type=data_file_bytes|dt_file_bytes}
    • Added setting remote_summary_interval_seconds and wired it through TMTContext; <= 0 disables periodic summary task registration.
    • Updated Grafana panels for the new S3 summary metric.
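
The "<= 0 disables" rule for `remote_summary_interval_seconds` can be pictured as a small gate evaluated before the periodic task is registered. This is a hedged sketch: `toySummaryInterval` is a hypothetical helper and does not reflect how the setting is actually wired through `TMTContext`.

```cpp
#include <chrono>
#include <cstdint>
#include <optional>

// Hypothetical helper: maps the raw setting to an optional interval.
// An empty result means the periodic summary task should not be registered.
std::optional<std::chrono::seconds> toySummaryInterval(std::int64_t remote_summary_interval_seconds)
{
    if (remote_summary_interval_seconds <= 0)
        return std::nullopt; // zero or negative disables the summary task
    return std::chrono::seconds(remote_summary_interval_seconds);
}
```

A caller would register the task only when the returned optional holds a value.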

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
# Run chbenchmark workload and check the metrics of `prelock_keys` and OSS usage
tiup bench ch --host 10.2.12.81 -P 8081 --warehouses 8000 run -D chbenchmark8k -T 50 -t 0 --time 30m --ignore-error --queries q1
# Before the fix, from 23:29 to 00:00, the number of prelock_keys in memory would accumulate and increase with the write load; after the fix, from 02:00 to 02:30, there was no longer any persistent residue of prelock_keys in memory.
# You can also check the newly added Grafana panel "Remote Store Summary (Disagg arch)"
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Fix an issue in disaggregated remote-write paths where pre-lock keys could remain resident under write-group concurrency or partial failure, causing S3GC to retain obsolete objects and inflate remote storage usage. Also add configurable periodic S3 storage summary and per-store summary metrics.

Summary by CodeRabbit

  • Bug Fixes

    • Resolved S3 pre-lock key cleanup on write failures to prevent orphaned lock keys.
    • Improved remote write error handling with enhanced exception logging.
  • New Features

    • Added S3 lock manager metrics for monitoring lock creation, cleanup, and status.
    • Extended S3 store summary metrics tracking.
  • Improvements

    • Enhanced checkpoint operation logging for better visibility.
    • Refined concurrent write batch processing with improved lock key tracking.

ti-chi-bot added the release-note, size/XXL, and type/cherry-pick-for-release-nextgen-20251011 labels on Mar 23, 2026

ti-chi-bot bot commented Mar 23, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign guo-shaoge for approval. For more information see the Code Review Process.
Please ensure that each of them provides their approval before proceeding.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


ti-chi-bot bot commented Mar 23, 2026

This cherry pick PR is for a release branch and has not yet been approved by triage owners.
Adding the do-not-merge/cherry-pick-not-approved label.

To merge this cherry pick:

  1. It must be LGTMed and approved by the reviewers firstly.
  2. For pull requests to TiDB-x branches, it must have no failed tests.
  3. AFTER it has lgtm and approved labels, please wait for the cherry-pick merging approval from triage owners.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

coderabbitai bot commented Mar 23, 2026

Important

Review skipped

Auto reviews are disabled on base/target branches other than the default branch.

🗂️ Base branches to auto review (3)
  • release-8.5
  • release-7.5
  • release-8.1

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 8b49cbf3-ec25-4022-b6b2-7d6cd3fd05f6

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.
