Snippet Name: Tracking_Table_Modifications_MySQL
Tags:
Created Date: 2025-06-16 13:12:40
Last Modified Date: 2025-05-31 10:32:18
Dependencies:
DBA Notes:
Sample Output:
Title: Tracking Table Modifications MySQL
Author: user
In MySQL, is there a way, similar to SQL Server statistics, to know how many rows were modified since the last ANALYZE TABLE was executed? That would help me identify if a table might benefit from ANALYZE TABLE commands.
Author: ChatGPT
MySQL does not track row-level modification counts since the last `ANALYZE TABLE` the same way SQL Server does with statistics (like `modification_counter` or `rows_modified`). However, there are a few workarounds and partial strategies depending on the **storage engine** and **MySQL version**.
---
### Key Considerations by Engine
#### 1. **InnoDB (most common engine)**
- InnoDB does not **expose** a per-table modification counter; it tracks one internally only to drive automatic stats recalculation.
- That **auto-recalculation** is triggered **by internal thresholds**: with persistent statistics (the default since 5.6), when roughly 10% of rows have changed; with transient statistics, after about 1/16 of the table is modified. You can at least see *when* stats were last recalculated, as shown below.
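A minimal check of the recalculation machinery, assuming placeholder names `your_db` / `my_table` (requires persistent statistics, `innodb_stats_persistent = ON`, and read access to the `mysql` schema):
```sql
-- Is automatic recalculation enabled?
SHOW VARIABLES LIKE 'innodb_stats_auto_recalc';

-- When were persistent statistics last rebuilt, and what row estimate
-- did InnoDB capture at that time?
SELECT database_name,
       table_name,
       last_update,   -- timestamp of the most recent stats recalculation
       n_rows         -- row estimate stored with the statistics
FROM mysql.innodb_table_stats
WHERE database_name = 'your_db'
  AND table_name    = 'my_table';
```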
#### 2. **Performance Schema (MySQL 8.0+)**
You can get general table-level statistics, but not modification deltas since last ANALYZE:
```sql
SELECT
OBJECT_SCHEMA,
OBJECT_NAME,
COUNT_STAR AS TotalAccesses,
SUM_TIMER_WAIT
FROM performance_schema.table_io_waits_summary_by_table
WHERE OBJECT_TYPE = 'TABLE'
ORDER BY COUNT_STAR DESC;
```
While this tracks I/O, not modifications specifically, it **helps identify heavily used tables**.
---
### Workaround: Manual Audit Table or Trigger-Based Counters
If you really need to know **how many rows were modified since the last `ANALYZE TABLE`**, here are your options:
#### Option 1: **Store `UPDATE_TIME` + Row Count**
```sql
SELECT
table_schema,
table_name,
update_time,
table_rows
FROM information_schema.tables
WHERE table_schema = 'your_db';
```
- `UPDATE_TIME` is reliable only for **MyISAM**; for InnoDB it is populated from MySQL 5.7 onward, but held in memory and reset on server restart.
- `TABLE_ROWS` for InnoDB is approximate.
> You can snapshot `TABLE_ROWS` before/after ANALYZE to detect large changes.
#### Option 2: **Create a "stats snapshot" table**
1. Run `ANALYZE TABLE` manually.
2. Insert into a helper table:
```sql
CREATE TABLE table_stats_snapshot (
table_name VARCHAR(128),
row_count BIGINT,
snapshot_time DATETIME
);
-- Manual snapshot
INSERT INTO table_stats_snapshot
SELECT
'my_table',
TABLE_ROWS,
NOW()
FROM information_schema.tables
WHERE table_schema = 'your_db' AND table_name = 'my_table';
```
3. Later, compare `TABLE_ROWS` with the stored snapshot to decide whether to run `ANALYZE TABLE`; a sketch of that comparison follows.
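A minimal comparison query, assuming the `table_stats_snapshot` helper above and a hypothetical 10% drift threshold:
```sql
-- Flag tables whose estimated row count drifted more than 10%
-- since the stored snapshot (TABLE_ROWS is approximate for InnoDB)
SELECT t.table_schema,
       t.table_name,
       s.row_count  AS snapshot_rows,
       t.table_rows AS current_rows
FROM information_schema.tables AS t
JOIN table_stats_snapshot AS s
  ON s.table_name = t.table_name
WHERE t.table_schema = 'your_db'
  AND ABS(IFNULL(t.table_rows, 0) - s.row_count)
      / GREATEST(s.row_count, 1) > 0.10;
```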
---
### Alternative Strategy: Rely on MySQL's Built-In Logic
MySQL (InnoDB) already uses **automatic statistics recalculation** heuristics:
- Triggered by DML changes on roughly 10% of rows.
- Adaptive depending on table size.
If you're okay with trusting MySQL's built-in logic, manual `ANALYZE TABLE` is rarely necessary unless:
- You're debugging poor query plans.
- You recently loaded massive amounts of data.
---
### Recommendation for DBA Automation
You could script a **periodic compare of `TABLE_ROWS`** and only run `ANALYZE TABLE` if row count changes by >X%.
Would you like a sample script (e.g., in SQL or Python) that logs row counts and flags tables for analysis?
Author: user
so I already have some logic that is just arbitrarily analyzing every hour; can you modify the procedure below to feature your logic, by factoring in from Performance Schema whether an ANALYZE TABLE event might be advantageous?
Author: ChatGPT
Sure! Please upload or paste the stored procedure you're using to run `ANALYZE TABLE` every hour, and I'll modify it to incorporate logic like:
- **Read counts and changes from `performance_schema.table_io_waits_summary_by_table`**.
- Optionally compare estimated `TABLE_ROWS` with a snapshot.
- Only call `ANALYZE TABLE` if the table has significant activity (e.g., high read/write ratio or row count change).
Once I see your current procedure, I'll tailor it with:
- Threshold settings.
- Efficient checks for InnoDB.
- Logging of when and why `ANALYZE TABLE` was triggered.
Please drop in the procedure text or file so I can get started.
Author: user
here is the procedure -- #################################################################################################
-- procedure
-- #################################################################################################
DROP PROCEDURE IF EXISTS `usp_dba_analyze_statistics_all_user_tables`;
DELIMITER $$
CREATE PROCEDURE `usp_dba_analyze_statistics_all_user_tables`()
BEGIN
-- ################################################################################################
-- This updates all statistics; anticipated to have an event run it anywhere from once per hour to once per day
-- ################################################################################################
DECLARE finished INTEGER DEFAULT 0;
DECLARE param_cmd VARCHAR(4000);
DECLARE curCommandLoops CURSOR FOR
-- ###############################################################################################
-- cursor definition
-- ###############################################################################################
SELECT
CONCAT ( 'ANALYZE TABLE `',t.table_schema,'`.`',t.table_name,'`;') AS cmd
FROM
information_schema.tables AS t
WHERE t.table_schema NOT IN ('sys','mysql','information_schema','performance_schema')
AND t.table_type = 'BASE TABLE' -- don't trip over views
ORDER BY
t.table_schema,
t.table_name ;
-- ###############################################################################################
-- declare NOT FOUND handler
DECLARE CONTINUE HANDLER
FOR NOT FOUND SET finished = 1;
OPEN curCommandLoops;
getData: LOOP
fetch curCommandLoops INTO param_cmd;
IF finished = 1
THEN
LEAVE getData;
END IF;
-- the MySQL equivalent of a dynamic SQL command
SET @cmd = param_cmd;  -- note: SET assigns; 'SELECT @cmd = param_cmd' would only compare
PREPARE dynamic_statement FROM @cmd;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
END LOOP getData;
CLOSE curCommandLoops;
END$$
DELIMITER ;
Author: ChatGPT
Thanks for sharing the procedure! Below is a **refactored version** of your `usp_dba_analyze_statistics_all_user_tables` that only executes `ANALYZE TABLE` if the table meets certain activity thresholds, using data from:
- `performance_schema.table_io_waits_summary_by_table` for read/write counts.
- Optional check on row count growth via `information_schema.tables`.
---
### Logic Added:
- Only `ANALYZE TABLE` if either of the following:
- `COUNT_STAR` (total reads + writes) > 10,000 (you can adjust this).
- `TABLE_ROWS` growth > 10% since the last snapshot (optional, with helper table).
---
### Modified Procedure:
```sql
DROP PROCEDURE IF EXISTS `usp_dba_analyze_statistics_all_user_tables`;
DELIMITER $$
CREATE PROCEDURE `usp_dba_analyze_statistics_all_user_tables`()
BEGIN
DECLARE finished INTEGER DEFAULT 0;
DECLARE schema_name VARCHAR(128);
DECLARE table_name VARCHAR(128);
DECLARE total_io BIGINT;
DECLARE param_cmd VARCHAR(4000);
-- Threshold for IO activity to justify ANALYZE
DECLARE io_threshold INT DEFAULT 10000;
-- Cursor definition using performance_schema stats
DECLARE cur CURSOR FOR
SELECT
ps.OBJECT_SCHEMA,
ps.OBJECT_NAME,
ps.COUNT_STAR
FROM performance_schema.table_io_waits_summary_by_table ps
JOIN information_schema.tables t
ON ps.OBJECT_SCHEMA = t.table_schema AND ps.OBJECT_NAME = t.table_name
WHERE
t.table_schema NOT IN ('sys','mysql','information_schema','performance_schema')
AND t.table_type = 'BASE TABLE'
AND ps.COUNT_STAR >= io_threshold
ORDER BY ps.OBJECT_SCHEMA, ps.OBJECT_NAME;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished = 1;
OPEN cur;
getTables: LOOP
FETCH cur INTO schema_name, table_name, total_io;
IF finished = 1 THEN
LEAVE getTables;
END IF;
-- Construct dynamic SQL
SET @param_cmd = CONCAT('ANALYZE TABLE `', schema_name, '`.`', table_name, '`;');
PREPARE dynamic_statement FROM @param_cmd;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
END LOOP;
CLOSE cur;
END$$
DELIMITER ;
```
---
### Optional Enhancements:
If you want to track row count changes over time to supplement the I/O threshold:
1. **Create a snapshot table**:
```sql
CREATE TABLE IF NOT EXISTS dba_table_stats_snapshot (
table_schema VARCHAR(128),
table_name VARCHAR(128),
table_rows BIGINT,
snapshot_time DATETIME DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (table_schema, table_name)
);
```
2. Insert snapshot periodically:
```sql
REPLACE INTO dba_table_stats_snapshot (table_schema, table_name, table_rows)
SELECT table_schema, table_name, table_rows
FROM information_schema.tables
WHERE table_type = 'BASE TABLE'
AND table_schema NOT IN ('sys','mysql','information_schema','performance_schema');
```
3. Modify the procedure logic to compare the current `TABLE_ROWS` to the previous value and trigger `ANALYZE` if the delta exceeds 10%; a sketch of that condition follows.
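A sketch of the in-loop condition, assuming the `dba_table_stats_snapshot` table above and the procedure's local variables (`schema_name`, `table_name`, `table_rows`):
```sql
-- Inside the cursor loop, after fetching the current row estimate:
-- look up the previous snapshot for this table
SET @prev_rows = (SELECT s.table_rows
                  FROM dba_table_stats_snapshot s
                  WHERE s.table_schema = schema_name
                    AND s.table_name   = table_name);

-- Analyze when there is no snapshot yet, or the estimate drifted > 10%
IF @prev_rows IS NULL
   OR ABS(table_rows - @prev_rows) / GREATEST(@prev_rows, 1) > 0.10 THEN
    SET @param_cmd = CONCAT('ANALYZE TABLE `', schema_name, '`.`', table_name, '`;');
    PREPARE dynamic_statement FROM @param_cmd;
    EXECUTE dynamic_statement;
    DEALLOCATE PREPARE dynamic_statement;
END IF;
```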
---
Would you like me to extend the procedure to include this **row count delta tracking** logic as well?
Author: user
let's make the io_threshold a sliding scale, based on the number of rows in the table, so tables with lots and lots of rows hit a lower relative threshold earlier than a flat 10%. How about a percentage of the square root of (number of rows in the table)?
Author: ChatGPT
Great idea: a **sliding-scale threshold** based on the **square root of the table's row count** is a smart way to scale I/O sensitivity across large vs. small tables.
Let's use this formula for the dynamic threshold per table:
```sql
IO_Threshold = ROUND(SQRT(TABLE_ROWS) * 0.10)
```
This means:
- A small table (100 rows) needs only 1 I/O operation to be eligible.
- A mid-size table (1,000,000 rows) needs ~100 I/O operations.
- A massive table (100,000,000 rows) needs ~1,000 I/O operations.
Because the threshold grows only with the square root of table size, it scales sub-linearly: well suited to long-tail table distributions. You can preview the thresholds it produces, as shown below.
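A quick sanity-check query to preview the per-table thresholds before deploying, assuming `your_db`:
```sql
-- Preview the sliding-scale threshold per table:
-- 10% of the square root of the estimated row count
SELECT table_schema,
       table_name,
       table_rows,
       ROUND(SQRT(GREATEST(IFNULL(table_rows, 1), 1)) * 0.10) AS io_threshold
FROM information_schema.tables
WHERE table_schema = 'your_db'
  AND table_type = 'BASE TABLE'
ORDER BY table_rows DESC;
```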
---
### Revised Stored Procedure with Dynamic Threshold:
```sql
DROP PROCEDURE IF EXISTS `usp_dba_analyze_statistics_all_user_tables`;
DELIMITER $$
CREATE PROCEDURE `usp_dba_analyze_statistics_all_user_tables`()
BEGIN
DECLARE finished INTEGER DEFAULT 0;
DECLARE schema_name VARCHAR(128);
DECLARE table_name VARCHAR(128);
DECLARE total_io BIGINT;
DECLARE table_rows BIGINT;
DECLARE io_threshold INT;
DECLARE param_cmd VARCHAR(4000);
-- Cursor grabs table IO + table row estimates
DECLARE cur CURSOR FOR
SELECT
ps.OBJECT_SCHEMA,
ps.OBJECT_NAME,
ps.COUNT_STAR,
t.TABLE_ROWS
FROM performance_schema.table_io_waits_summary_by_table ps
JOIN information_schema.tables t
ON ps.OBJECT_SCHEMA = t.table_schema AND ps.OBJECT_NAME = t.table_name
WHERE
t.table_schema NOT IN ('sys','mysql','information_schema','performance_schema')
AND t.table_type = 'BASE TABLE'
ORDER BY ps.OBJECT_SCHEMA, ps.OBJECT_NAME;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished = 1;
OPEN cur;
getTables: LOOP
FETCH cur INTO schema_name, table_name, total_io, table_rows;
IF finished = 1 THEN
LEAVE getTables;
END IF;
-- Handle NULLs from table_rows (can happen in InnoDB)
SET table_rows = IFNULL(table_rows, 0);
-- Calculate dynamic threshold: 10% of sqrt(row count)
SET io_threshold = ROUND(SQRT(GREATEST(table_rows, 1)) * 0.10);
-- Decide whether to analyze
IF total_io >= io_threshold THEN
SET @param_cmd = CONCAT('ANALYZE TABLE `', schema_name, '`.`', table_name, '`;');
PREPARE dynamic_statement FROM @param_cmd;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
END IF;
END LOOP;
CLOSE cur;
END$$
DELIMITER ;
```
---
### Notes:
- We use `GREATEST(table_rows, 1)` so the threshold never rounds down to zero on empty tables.
- This allows even **tiny tables** to be considered when touched.
- It's still conservative on **very large tables**, but more responsive than a flat 10% of rows.
Would you also like this procedure to **log which tables it analyzed**, for auditing or later performance review?
Author: user
can you help me understand what the total_io represents? Is it the number of bytes written, or the number of pages of data?
Author: ChatGPT
Great question!
The `total_io` value in the procedure (coming from `performance_schema.table_io_waits_summary_by_table.COUNT_STAR`) represents the **number of I/O *operations*** (wait events): **not** bytes, and **not** pages.
---
### Breakdown: What is `COUNT_STAR`?
In `performance_schema.table_io_waits_summary_by_table`, the relevant fields include:
| Column | Meaning |
|-----------------|-----------------------------------------------------------|
| `COUNT_STAR` | Total number of table I/O wait events (`COUNT_READ` + `COUNT_WRITE`) |
| `SUM_TIMER_WAIT`| Total time spent waiting on table I/O (in picoseconds) |
| `COUNT_READ`, `COUNT_WRITE`, `COUNT_FETCH`, `COUNT_INSERT`, `COUNT_UPDATE`, `COUNT_DELETE` | Per-operation counters; `COUNT_READ` = `COUNT_FETCH` and `COUNT_WRITE` = `COUNT_INSERT` + `COUNT_UPDATE` + `COUNT_DELETE` |
So:
- **`COUNT_STAR`** is a **raw operation counter**: one event per row operation.
- It increments **every time** MySQL touches a table row, via:
- `SELECT`, `INSERT`, `UPDATE`, `DELETE`, `REPLACE`
- full table scans as well as index lookups
- It does **not** tell you how **much** data (e.g., in bytes or pages) was read or written, just that **some I/O occurred**. You can inspect the individual counters directly, as shown below.
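A quick look at how the counters relate, assuming `your_db` / `my_table` as placeholders:
```sql
-- COUNT_STAR = COUNT_READ + COUNT_WRITE, and COUNT_READ = COUNT_FETCH
-- (fetch is the only read operation in this summary table)
SELECT OBJECT_SCHEMA,
       OBJECT_NAME,
       COUNT_STAR,
       COUNT_READ,
       COUNT_FETCH,
       COUNT_WRITE,
       COUNT_INSERT,
       COUNT_UPDATE,
       COUNT_DELETE
FROM performance_schema.table_io_waits_summary_by_table
WHERE OBJECT_SCHEMA = 'your_db'
  AND OBJECT_NAME   = 'my_table';
```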
---
### Example:
Imagine:
| Table | `TABLE_ROWS` | `COUNT_STAR` |
|-------------|--------------|--------------|
| `orders` | 10,000,000 | 1,000 |
| `audit_log` | 5,000 | 2,500 |
This means:
- `orders` had 1,000 I/O ops across 10M rows (relatively light).
- `audit_log` had 2,500 I/O ops across 5K rows (high I/O density).
Your dynamic threshold scaling is ideal in such a case: `audit_log` would likely hit the `ANALYZE` threshold sooner than `orders`.
---
### Want more granularity?
If you want to distinguish **reads vs writes**, the summary table also exposes `COUNT_READ`, `COUNT_WRITE`, `COUNT_FETCH`, `COUNT_INSERT`, `COUNT_UPDATE`, and `COUNT_DELETE`. Note that `COUNT_FETCH` equals `COUNT_READ` here (fetch is the only read operation), so a sum like
```sql
COUNT_READ + COUNT_WRITE + COUNT_FETCH
```
double-counts reads; `COUNT_READ + COUNT_WRITE` already covers everything.
Or even better, **weight** writes higher than reads if you're focused on update-heavy stats skew.
Would you like help adjusting the procedure to do that?
Author: user
yes please
Author: ChatGPT
Perfect! Here's an enhanced version of your procedure that:
1. **Separates out read, write, and fetch counts** from `performance_schema.table_io_waits_summary_by_table`.
2. **Weights them**, so that writes are considered more impactful (default: write = 2x read/fetch).
3. Computes a **weighted I/O score** and compares it to the threshold based on `√(table_rows) * 0.10`.
---
### Weighted I/O Logic
```sql
weighted_io = COUNT_READ * 1.0 + COUNT_WRITE * 2.0 + COUNT_FETCH * 1.0
```
You can adjust the weight factors (1.0, 2.0, 1.0) as needed. Keep in mind that `COUNT_FETCH` equals `COUNT_READ` in this summary table, so keeping both terms effectively weights reads at 2.0; drop one of them if you want reads weighted at 1.0.
---
### Modified Stored Procedure
```sql
DROP PROCEDURE IF EXISTS `usp_dba_analyze_statistics_all_user_tables`;
DELIMITER $$
CREATE PROCEDURE `usp_dba_analyze_statistics_all_user_tables`()
BEGIN
DECLARE finished INTEGER DEFAULT 0;
DECLARE schema_name VARCHAR(128);
DECLARE table_name VARCHAR(128);
DECLARE count_read BIGINT;
DECLARE count_write BIGINT;
DECLARE count_fetch BIGINT;
DECLARE table_rows BIGINT;
DECLARE io_threshold DOUBLE;
DECLARE weighted_io DOUBLE;
DECLARE param_cmd VARCHAR(4000);
-- Cursor grabs IO stats and row estimates
DECLARE cur CURSOR FOR
SELECT
ps.OBJECT_SCHEMA,
ps.OBJECT_NAME,
ps.COUNT_READ,
ps.COUNT_WRITE,
ps.COUNT_FETCH,
t.TABLE_ROWS
FROM performance_schema.table_io_waits_summary_by_table ps
JOIN information_schema.tables t
ON ps.OBJECT_SCHEMA = t.table_schema AND ps.OBJECT_NAME = t.table_name
WHERE
t.table_schema NOT IN ('sys','mysql','information_schema','performance_schema')
AND t.table_type = 'BASE TABLE'
ORDER BY ps.OBJECT_SCHEMA, ps.OBJECT_NAME;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished = 1;
OPEN cur;
getTables: LOOP
FETCH cur INTO schema_name, table_name, count_read, count_write, count_fetch, table_rows;
IF finished = 1 THEN
LEAVE getTables;
END IF;
-- Handle NULLs and floor for table row count
SET table_rows = IFNULL(table_rows, 0);
SET io_threshold = ROUND(SQRT(GREATEST(table_rows, 1)) * 0.10);
-- Compute weighted I/O activity
SET weighted_io = count_read * 1.0 + count_write * 2.0 + count_fetch * 1.0;
-- Execute only if weighted I/O exceeds the calculated threshold
IF weighted_io >= io_threshold THEN
SET @param_cmd = CONCAT('ANALYZE TABLE `', schema_name, '`.`', table_name, '`;');
PREPARE dynamic_statement FROM @param_cmd;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
END IF;
END LOOP;
CLOSE cur;
END$$
DELIMITER ;
```
---
### Summary of Behavior
| Metric | Weight |
|------------------|--------|
| `COUNT_READ` | 1.0 |
| `COUNT_FETCH` | 1.0 |
| `COUNT_WRITE` | 2.0 |
You can tweak these to prioritize the kinds of workload changes that most impact statistics quality; one alternative weighting is sketched below.
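If update/delete churn is what actually skews your statistics, a hypothetical alternative score (assuming the cursor is extended to also fetch `COUNT_INSERT`, `COUNT_UPDATE`, and `COUNT_DELETE` into matching local variables):
```sql
-- Updates and deletes reshape value distributions, inserts grow the
-- table, and reads never invalidate statistics, so weight accordingly
SET weighted_io = count_insert * 1.5
                + count_update * 2.0
                + count_delete * 2.0
                + count_read   * 0.1;
```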
Would you like this to also **log** which tables were analyzed, when, and why (e.g., I/O score vs threshold)?
Author: user
yes please
Author: ChatGPT
Great! Here's the final enhanced version of the procedure, which now includes:
---
### New Feature: Logging
We'll log:
- Schema + table name
- Weighted I/O
- Threshold used
- Timestamp of analysis
---
### 1. Create the Logging Table (run this once)
```sql
CREATE TABLE IF NOT EXISTS dba_analyze_statistics_log (
log_id INT AUTO_INCREMENT PRIMARY KEY,
table_schema VARCHAR(128),
table_name VARCHAR(128),
table_rows BIGINT,
count_read BIGINT,
count_write BIGINT,
count_fetch BIGINT,
weighted_io DOUBLE,
io_threshold DOUBLE,
analyze_executed TINYINT(1),
analyze_time DATETIME DEFAULT CURRENT_TIMESTAMP
);
```
---
### 2. Final Version of the Stored Procedure
```sql
DROP PROCEDURE IF EXISTS `usp_dba_analyze_statistics_all_user_tables`;
DELIMITER $$
CREATE PROCEDURE `usp_dba_analyze_statistics_all_user_tables`()
BEGIN
DECLARE finished INTEGER DEFAULT 0;
DECLARE schema_name VARCHAR(128);
DECLARE table_name VARCHAR(128);
DECLARE count_read BIGINT;
DECLARE count_write BIGINT;
DECLARE count_fetch BIGINT;
DECLARE table_rows BIGINT;
DECLARE io_threshold DOUBLE;
DECLARE weighted_io DOUBLE;
DECLARE analyze_needed TINYINT(1) DEFAULT 0;
DECLARE param_cmd VARCHAR(4000);
-- Cursor pulls table stats and I/O metrics
DECLARE cur CURSOR FOR
SELECT
ps.OBJECT_SCHEMA,
ps.OBJECT_NAME,
ps.COUNT_READ,
ps.COUNT_WRITE,
ps.COUNT_FETCH,
t.TABLE_ROWS
FROM performance_schema.table_io_waits_summary_by_table ps
JOIN information_schema.tables t
ON ps.OBJECT_SCHEMA = t.table_schema AND ps.OBJECT_NAME = t.table_name
WHERE
t.table_schema NOT IN ('sys','mysql','information_schema','performance_schema')
AND t.table_type = 'BASE TABLE'
ORDER BY ps.OBJECT_SCHEMA, ps.OBJECT_NAME;
DECLARE CONTINUE HANDLER FOR NOT FOUND SET finished = 1;
OPEN cur;
getTables: LOOP
FETCH cur INTO schema_name, table_name, count_read, count_write, count_fetch, table_rows;
IF finished = 1 THEN
LEAVE getTables;
END IF;
SET table_rows = IFNULL(table_rows, 0);
SET io_threshold = ROUND(SQRT(GREATEST(table_rows, 1)) * 0.10);
SET weighted_io = count_read * 1.0 + count_write * 2.0 + count_fetch * 1.0;
IF weighted_io >= io_threshold THEN
SET analyze_needed = 1;
SET @param_cmd = CONCAT('ANALYZE TABLE `', schema_name, '`.`', table_name, '`;');
PREPARE dynamic_statement FROM @param_cmd;
EXECUTE dynamic_statement;
DEALLOCATE PREPARE dynamic_statement;
ELSE
SET analyze_needed = 0;
END IF;
-- Log the decision and metrics
INSERT INTO dba_analyze_statistics_log (
table_schema,
table_name,
table_rows,
count_read,
count_write,
count_fetch,
weighted_io,
io_threshold,
analyze_executed
) VALUES (
schema_name,
table_name,
table_rows,
count_read,
count_write,
count_fetch,
weighted_io,
io_threshold,
analyze_needed
);
END LOOP;
CLOSE cur;
END$$
DELIMITER ;
```
---
### Example Output from `dba_analyze_statistics_log`:
| table_schema | table_name | weighted_io | io_threshold | analyze_executed |
|--------------|------------|-------------|--------------|------------------|
| `sales` | `orders` | 850.0 | 500 | 1 |
| `audit` | `log` | 50.0 | 100 | 0 |
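Once the log accumulates, a simple review query might look like this:
```sql
-- Most recent ANALYZE decisions, highest activity first
SELECT table_schema,
       table_name,
       weighted_io,
       io_threshold,
       analyze_executed,
       analyze_time
FROM dba_analyze_statistics_log
ORDER BY analyze_time DESC, weighted_io DESC
LIMIT 50;
```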
---
Let me know if you'd like:
- Email or Slack notifications on high-I/O tables.
- A daily summary report from this log.
- Truncation or retention policy on the log table.