CDC_Cleanup_LSN_Query

Snippet Name: CDC_Cleanup_LSN_Query

Tags:

Created Date: 2025-06-16 13:12:49
Last Modified Date: 2025-05-31 10:32:10

Dependencies:

DBA Notes:

Sample Output:

Title: CDC Cleanup LSN Query
Author: user
I have an Azure SQL database where i have enable Change Data Capture on nine tables. the history tables are getting very large. i think i need to runsys.sp_cdc_cleanup_change_table on each of the nine tables, but need help finding an LSN related to a date odler than, say 3 days

Author: ChatGPT
To clean up the Change Data Capture (CDC) history tables in Azure SQL Database, you need to use `sys.sp_cdc_cleanup_change_table`. However, this procedure requires the starting LSN (`low_water_mark`) to indicate where cleanup should start. To find the LSN corresponding to a specific date (e.g., 3 days ago), you can use the system function `sys.fn_cdc_map_time_to_lsn`.

Here???s the process:

### Steps to Find LSN for a Date
1. Use the `sys.fn_cdc_map_time_to_lsn` function to get the LSN for a specific date.
2. Pass that LSN into `sys.sp_cdc_cleanup_change_table` for each table.

### SQL Query to Find the LSN
Run this query to find the LSN for 3 days ago:

```sql
DECLARE @date datetime = DATEADD(DAY, -3, GETDATE()); -- 3 days ago
DECLARE @lsn binary(10);

SELECT @lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @date);

SELECT @lsn AS LSN_For_Three_Days_Ago;
```

### Execute Cleanup for Each CDC-Enabled Table
You???ll need to run the `sys.sp_cdc_cleanup_change_table` stored procedure for each of your CDC-enabled tables. Here's how you can loop through all nine tables dynamically:

```sql
DECLARE @date datetime = DATEADD(DAY, -3, GETDATE());
DECLARE @lsn binary(10);
DECLARE @table_name sysname;
DECLARE @schema_name sysname;
DECLARE @sql nvarchar(max);

-- Get the LSN for the specified date
SELECT @lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @date);

-- Loop through all CDC-enabled tables
DECLARE cdc_tables_cursor CURSOR FOR
SELECT schema_name, table_name
FROM cdc.change_tables;

OPEN cdc_tables_cursor;

FETCH NEXT FROM cdc_tables_cursor INTO @schema_name, @table_name;

WHILE @@FETCH_STATUS = 0
BEGIN
    SET @sql = N'EXEC sys.sp_cdc_cleanup_change_table @capture_instance = @ci, @low_water_mark = @lsn';
    SET @sql = REPLACE(@sql, '@ci', QUOTENAME(@schema_name + '_' + @table_name));
    SET @sql = REPLACE(@sql, '@lsn', CAST(@lsn AS NVARCHAR(MAX)));

    PRINT @sql; -- For review before executing
    EXEC sp_executesql @sql;

    FETCH NEXT FROM cdc_tables_cursor INTO @schema_name, @table_name;
END

CLOSE cdc_tables_cursor;
DEALLOCATE cdc_tables_cursor;
```

### Explanation
1. **Step 1**: Get the LSN for a date 3 days ago using `sys.fn_cdc_map_time_to_lsn`.
2. **Step 2**: Use a cursor to iterate over all CDC-enabled tables in `cdc.change_tables`.
3. **Step 3**: Execute `sys.sp_cdc_cleanup_change_table` for each table with the calculated LSN.

### Things to Consider
- **Review Generated SQL**: Use the `PRINT` statement to preview the commands before execution.
- **Impact on Performance**: Running the cleanup on large tables may take some time. Run it during low-activity periods.
- **Retention Policy**: After cleanup, consider adjusting the retention policy to reduce table size in the future using `sys.sp_cdc_enable_table`.

If you need help tweaking the script further, let me know!

Author: user
i had to fix the curor's query to get the table and schem name to what is below, but i get an error from inside the cursor: Msg 257, Level 16, State 3, Procedure sys.sp_cdc_cleanup_change_table, Line 0 [Batch Start Line 0]
Implicit conversion from data type nvarchar to binary is not allowed. Use the CONVERT function to run this query.

Author: ChatGPT
The error is occurring because the `@low_water_mark` parameter for `sys.sp_cdc_cleanup_change_table` requires a `binary(10)` value, but the value being passed is an `nvarchar`. You need to explicitly convert the LSN to `binary(10)` before passing it to the stored procedure.

### Corrected Script
Here is the updated version of the script that should fix both issues:

```sql
DECLARE @date datetime = DATEADD(DAY, -3, GETDATE());
DECLARE @lsn binary(10);
DECLARE @table_name sysname;
DECLARE @schema_name sysname;
DECLARE @capture_instance sysname;
DECLARE @sql nvarchar(max);

-- Get the LSN for the specified date
SELECT @lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @date);

-- Loop through all CDC-enabled tables
DECLARE cdc_tables_cursor CURSOR FOR
SELECT s.name AS schema_name, t.name AS table_name
FROM sys.tables t
JOIN sys.schemas s ON t.schema_id = s.schema_id
WHERE OBJECTPROPERTY(t.object_id, 'TableHasChangeTracking') = 1; -- List only CDC-enabled tables

OPEN cdc_tables_cursor;

FETCH NEXT FROM cdc_tables_cursor INTO @schema_name, @table_name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Form the capture instance name (usually formatted as schema_table)
    SET @capture_instance = @schema_name + '_' + @table_name;

    -- Execute the cleanup stored procedure
    EXEC sys.sp_cdc_cleanup_change_table 
        @capture_instance = @capture_instance, 
        @low_water_mark = @lsn;

    FETCH NEXT FROM cdc_tables_cursor INTO @schema_name, @table_name;
END

CLOSE cdc_tables_cursor;
DEALLOCATE cdc_tables_cursor;
```

### Key Fixes
1. **Proper Conversion**: The `@lsn` is correctly defined as `binary(10)` to match the data type expected by `sys.sp_cdc_cleanup_change_table`. No conversion to `nvarchar` is attempted in this version.
2. **Cursor Query**: The `cdc.change_tables` query has been replaced with a direct query from `sys.tables` joined with `sys.schemas`. It identifies CDC-enabled tables using the `TableHasChangeTracking` property.

### Explanation of Key Changes
- `@low_water_mark`: The LSN is passed directly as `binary(10)` without any conversion to `nvarchar`. This resolves the implicit conversion error.
- Cursor Query:
  - It fetches schema and table names from `sys.tables` and `sys.schemas`.
  - The condition `OBJECTPROPERTY(t.object_id, 'TableHasChangeTracking') = 1` ensures only CDC-enabled tables are included.

### Next Steps
- **Verify the Capture Instance Name**: Ensure that the format for the capture instance matches the naming convention in your system (e.g., `schema_table`).
- **Test with a Single Table**: Before running the cursor, test the `sys.sp_cdc_cleanup_change_table` procedure manually with one capture instance to confirm the correct behavior.
  
Let me know if you need additional adjustments!

Author: user
SELECT s.name, tb.name
--SELECT 
-- s.name AS Schema_Name, 
-- tb.name AS Table_Name,
-- tb.object_id, 
--tb.is_tracked_by_cdc
 FROM sys.tables tb
 INNER JOIN sys.schemas s on s.schema_id = tb.schema_id
 WHERE  tb.is_tracked_by_cdc = 1

Author: ChatGPT
This query looks correct for listing tables that are tracked by Change Data Capture (CDC) in your database. It retrieves the schema name and table name for all tables where CDC is enabled (`is_tracked_by_cdc = 1`).

### Full Query
Here is the cleaned-up and complete version of the query, including comments for clarity:

```sql
SELECT 
    s.name AS Schema_Name,   -- Schema name of the table
    tb.name AS Table_Name    -- Name of the CDC-enabled table
FROM sys.tables tb
INNER JOIN sys.schemas s 
    ON s.schema_id = tb.schema_id
WHERE tb.is_tracked_by_cdc = 1; -- Only include tables enabled for Change Data Capture
```

### Explanation
- **`sys.tables`**: Contains information about all user tables in the database.
- **`sys.schemas`**: Contains schema names, which are joined to get the full name of the table.
- **`is_tracked_by_cdc`**: A column in `sys.tables` that indicates whether CDC is enabled (`1` = enabled, `0` = not enabled).

### Integration with Cleanup Script
You can use this query in your cursor for iterating over CDC-enabled tables. Here's how it fits into the cleanup script:

```sql
DECLARE @date datetime = DATEADD(DAY, -3, GETDATE());
DECLARE @lsn binary(10);
DECLARE @table_name sysname;
DECLARE @schema_name sysname;
DECLARE @capture_instance sysname;

-- Get the LSN for the specified date
SELECT @lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @date);

-- Loop through all CDC-enabled tables
DECLARE cdc_tables_cursor CURSOR FOR
SELECT 
    s.name AS Schema_Name, 
    tb.name AS Table_Name
FROM sys.tables tb
INNER JOIN sys.schemas s 
    ON s.schema_id = tb.schema_id
WHERE tb.is_tracked_by_cdc = 1;

OPEN cdc_tables_cursor;

FETCH NEXT FROM cdc_tables_cursor INTO @schema_name, @table_name;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Form the capture instance name (usually formatted as schema_table)
    SET @capture_instance = @schema_name + '_' + @table_name;

    -- Execute the cleanup stored procedure
    EXEC sys.sp_cdc_cleanup_change_table 
        @capture_instance = @capture_instance, 
        @low_water_mark = @lsn;

    FETCH NEXT FROM cdc_tables_cursor INTO @schema_name, @table_name;
END

CLOSE cdc_tables_cursor;
DEALLOCATE cdc_tables_cursor;
```

### Output Example
When running the query, you should see output like this for CDC-enabled tables:
| Schema_Name | Table_Name    |
|-------------|---------------|
| Sales       | Orders        |
| HR          | Employees     |

If you're still encountering issues or need further help, let me know!

Author: user
that worked well. i changed the query a bit to get the explicitly named @capture_instnace isntead of calculating what it typically is if it followed a naming convention: DECLARE @date datetime = DATEADD(DAY, -3, GETDATE());
DECLARE @lsn binary(10);
DECLARE @table_name sysname;
DECLARE @schema_name sysname;
DECLARE @capture_instance sysname;

-- Get the LSN for the specified date
SELECT @lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @date);

-- Loop through all CDC-enabled tables
DECLARE cdc_tables_cursor CURSOR FOR
SELECT 
    s.name AS Schema_Name, 
    tb.name AS Table_Name,
    cd.capture_instance
FROM sys.tables tb
INNER JOIN sys.schemas s 
    ON s.schema_id = tb.schema_id
    inner join cdc.change_tables cd ON tb.object_id = cd.source_object_id
WHERE tb.is_tracked_by_cdc = 1;

OPEN cdc_tables_cursor;

FETCH NEXT FROM cdc_tables_cursor INTO @schema_name, @table_name,@capture_instance;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Form the capture instance name (usually formatted as schema_table)
   -- SET @capture_instance = @schema_name + '_' + @table_name;

    -- Execute the cleanup stored procedure
    EXEC sys.sp_cdc_cleanup_change_table 
        @capture_instance = @capture_instance, 
        @low_water_mark = @lsn;

    FETCH NEXT FROM cdc_tables_cursor INTO @schema_name, @table_name,@capture_instance;
END

CLOSE cdc_tables_cursor;
DEALLOCATE cdc_tables_cursor;


Author: ChatGPT
This is a great improvement! By explicitly retrieving the `@capture_instance` from the `cdc.change_tables` system table, you eliminate any assumptions about the naming convention. This approach is more reliable, especially in environments where the naming convention might not be consistent.

### Key Improvements:
1. **Direct Retrieval of `@capture_instance`**:
   - You now join `cdc.change_tables` on `source_object_id` to fetch the exact `capture_instance` for each table. This avoids potential mismatches due to naming inconsistencies.

2. **Simplified Cleanup Logic**:
   - By removing the need to manually calculate the `@capture_instance`, the logic becomes cleaner and less error-prone.

### Finalized Query Script:
Here???s the finalized version for reference:

```sql
DECLARE @date datetime = DATEADD(DAY, -3, GETDATE());
DECLARE @lsn binary(10);
DECLARE @table_name sysname;
DECLARE @schema_name sysname;
DECLARE @capture_instance sysname;

-- Get the LSN for the specified date
SELECT @lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @date);

-- Loop through all CDC-enabled tables
DECLARE cdc_tables_cursor CURSOR FOR
SELECT 
    s.name AS Schema_Name, 
    tb.name AS Table_Name,
    cd.capture_instance
FROM sys.tables tb
INNER JOIN sys.schemas s 
    ON s.schema_id = tb.schema_id
INNER JOIN cdc.change_tables cd 
    ON tb.object_id = cd.source_object_id
WHERE tb.is_tracked_by_cdc = 1;

OPEN cdc_tables_cursor;

FETCH NEXT FROM cdc_tables_cursor INTO @schema_name, @table_name, @capture_instance;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Execute the cleanup stored procedure
    EXEC sys.sp_cdc_cleanup_change_table 
        @capture_instance = @capture_instance, 
        @low_water_mark = @lsn;

    FETCH NEXT FROM cdc_tables_cursor INTO @schema_name, @table_name, @capture_instance;
END

CLOSE cdc_tables_cursor;
DEALLOCATE cdc_tables_cursor;
```

### Notes:
- **Join with `cdc.change_tables`**:
   - The `cdc.change_tables` table includes `capture_instance` and `source_object_id`, which directly ties to the `object_id` of the source table. This ensures accuracy when identifying the correct `capture_instance`.
   
- **Dynamic Cleanup**:
   - The cleanup now dynamically applies to all CDC-enabled tables without needing to rely on naming conventions.

### Example Output:
If you have tables like this:

| Schema_Name | Table_Name | Capture_Instance     |
|-------------|------------|----------------------|
| Sales       | Orders     | sales_orders         |
| HR          | Employees  | hr_employees         |

The procedure will clean up CDC data for these capture instances with the calculated `@lsn` threshold.

This is a robust and scalable solution. Let me know if you need further refinements! ????

Author: user
it worked without errors, but i still have one table with named cdc.REPORTING_MasterInvoices_CT that ahs 84Million rows before and after

Author: ChatGPT
If `sys.sp_cdc_cleanup_change_table` didn't remove any rows for `cdc.REPORTING_MasterInvoices_CT`, it???s likely because there are active CDC consumers (e.g., replication, ETL processes, or queries) that are preventing the cleanup. The procedure only removes rows up to the **minimum LSN of active consumers**.

### Troubleshooting Steps

#### 1. Check the Current Low LSN
Run the following query to check the **low watermark LSN** for the CDC cleanup process:

```sql
SELECT * 
FROM cdc.lsn_time_mapping;
```

Look for the **`tran_begin_time`** and corresponding **`tran_end_time`**. If there are no transactions eligible for cleanup older than your target date, this could explain why rows remain.

Alternatively, to directly query the oldest LSN in use:
```sql
SELECT 
    MIN(start_lsn) AS LowWatermarkLSN,
    capture_instance
FROM cdc.change_tables
GROUP BY capture_instance;
```

Ensure your `@lsn` for cleanup is older than this **low watermark LSN**.

#### 2. Check for Active Consumers
Active CDC consumers might hold LSNs that prevent cleanup. Check the CDC jobs and confirm if there???s any long-running process blocking the cleanup:

```sql
SELECT 
    capture_instance, 
    start_lsn, 
    end_lsn, 
    has_drop_pending 
FROM cdc.change_tables;
```

#### 3. Analyze the Retention Period for CDC
Ensure your CDC cleanup job isn't constrained by an excessively long retention period. The retention period determines how long the data is kept.

To check the retention settings:
```sql
EXEC sys.sp_cdc_help_change_data_capture;
```

Look for the `retention` column. The value is in minutes (e.g., `4320` means 3 days). If it's too high, consider reducing it to ensure older records are cleaned.

To modify the retention:
```sql
EXEC sys.sp_cdc_change_job 
    @job_type = N'cleanup', 
    @retention = 1440; -- Set retention to 1 day
```

#### 4. Force Cleanup for the Specific Table
If the cleanup is still not removing rows, and you're sure the rows are no longer needed, try directly targeting the specific capture instance. Here's how:

```sql
DECLARE @date datetime = DATEADD(DAY, -3, GETDATE());
DECLARE @lsn binary(10);

-- Get the LSN for the specific date
SELECT @lsn = sys.fn_cdc_map_time_to_lsn('smallest greater than or equal', @date);

-- Force cleanup on the specific capture instance
EXEC sys.sp_cdc_cleanup_change_table 
    @capture_instance = N'reporting_masterinvoices', 
    @low_water_mark = @lsn;
```

#### 5. Verify the Rows in the CDC Table
Inspect the CDC table itself to confirm the data range and whether rows are eligible for cleanup:
```sql
SELECT TOP 10 *
FROM cdc.REPORTING_MasterInvoices_CT
ORDER BY __$start_lsn ASC;

SELECT TOP 10 *
FROM cdc.REPORTING_MasterInvoices_CT
ORDER BY __$start_lsn DESC;
```

Check the range of LSNs and verify that your cleanup LSN falls within the range of the data.

---

### Key Points to Check
1. **Low Watermark LSN**: Cleanup won't remove rows older than the minimum LSN required by active CDC consumers.
2. **Retention Settings**: Adjust retention settings to allow for older rows to be removed.
3. **Active Consumers**: Ensure no active jobs or processes are holding onto LSNs.
4. **Correct `@low_water_mark`**: Ensure the `@low_water_mark` provided for the cleanup falls within the range of LSNs in the table.

---

### Next Steps
Run the above diagnostics to narrow down why rows remain in the `cdc.REPORTING_MasterInvoices_CT` table. Let me know what you find, and I can guide you further! ????


Leave a Reply