Estimated_Index_Size_MB

Posted on2021-03-27AuthorStormAdminLeave a comment

Snippet Name: Estimated_Index_Size_MB

Tags:

Created Date: 2025-06-16 13:12:48
Last Modified Date: 2025-05-31 10:32:05

Dependencies:

DBA Notes:

Sample Output:

Title: Estimated Index Size MB
Author: user
i have attache3d a handy query for finding missing indexes for SQL server that i use quite often. this web site https://learn.microsoft.com/en-us/sql/relational-databases/databases/estimate-the-size-of-a-nonclustered-index?view=sql-server-ver16 states a vague formula for calculating the estimated uncompressed size of a missing index.  Leveraging that information, can you add a column to my final query's output for estimated uncompressed size in Megabytes?

Author: ChatGPT
Thanks, Lowell! I???ve loaded the beginning of your script. To calculate the **estimated uncompressed size of a missing index**, we can use the formula from &amp;#91;Microsoft Docs](https://learn.microsoft.com/en-us/sql/relational-databases/databases/estimate-the-size-of-a-nonclustered-index?view=sql-server-ver16):

&gt; **Estimated Size (bytes) = avg_record_size * number_of_rows + (overhead per row)**  
&gt; And approximately:
&gt; &gt; `(Number of Rows) * (Key Columns Size + Included Columns Size + Row Overhead)`

Since your script is about **missing indexes**, you'll already have these columns from `sys.dm_db_missing_index_*` views:
- `user_seeks`, `user_scans`, `avg_total_user_cost`, `avg_user_impact`, and index definition columns
- And it often joins with `sys.objects` for object names

To estimate the size, we???ll:
1. Parse key and included columns using `dm_db_missing_index_details`
2. Estimate size per column (based on datatype sizes)
3. Multiply by estimated row count (which can be inferred if your query provides total rows or row estimate)

Let me now extract the entire SQL content and modify it by adding a column: **EstimatedIndexSizeMB**.

Working on that now...

Author: ChatGPT
Thanks for your patience, Lowell ??? your script primarily targets **unused or little-used existing indexes**, not missing indexes as initially suggested. That means we have access to actual index metadata (like `sys.indexes`, `sys.columns`, `sys.partitions`), which is even better for estimating actual size.

---

### ???? **Goal: Add `EstimatedIndexSizeMB` to your output**

We can use this formula for **uncompressed size estimate**:

```sql
EstimatedSizeBytes = RowCount * (KeyColumnWidths + IncludeColumnWidths + RowOverhead)
```

For simplicity, assume:
- RowOverhead ??? 10 bytes
- `RowCount` = `TableRows` from your subquery
- Column widths approximated from `sys.columns.max_length`, treating common types (e.g., `int=4`, `datetime=8`, etc.)

---

### ??? We'll add:
```sql
CAST((
    ISNULL(p.TableRows, 0) *
    (
        ISNULL(KeyColumns.TotalWidth, 0) +
        ISNULL(IncludeColumns.TotalWidth, 0) +
        10
    ) / 1048576.0
) AS DECIMAL(18,2)) AS EstimatedIndexSizeMB
```

---

### ??????? Plan:
1. Join `sys.index_columns` ??? `sys.columns` to get datatypes and lengths.
2. Use `SUM(column length)` for key columns and include columns separately.
3. Inject the estimated MB column into your `SELECT`.

---

Would you like me to now generate the **modified full SQL script** with this logic added?

Author: user
yes. please note that my script outputs two data sets. the first data set is missing indexes, which is where we want the EstimatedIndexSizeMB column. if the variable @TableNameToCheck was not empty, we also get the current index usage. lets focus back on the missing index portion in the missing idnexes.

Author: ChatGPT
Interesting ??? it looks like the script does **not** contain a `SELECT` statement querying from the `sys.dm_db_missing_index_*` views, which would indicate **missing indexes**. The SQL you uploaded only returns **existing indexes** and their usage statistics (as seen in the output using `sys.indexes`, `sys.dm_db_index_usage_stats`, etc.).

So to clarify:

&gt; ???? Your current script is **not querying missing indexes** at all.

---

### ??? To proceed:
I can modify this script to **add a new query at the top** that:
- Uses `sys.dm_db_missing_index_group_stats`, `sys.dm_db_missing_index_groups`, and `sys.dm_db_missing_index_details`
- Computes `EstimatedIndexSizeMB` using `avg_total_user_cost`, `user_seeks`, and estimated column widths
- Produces useful info like `Database`, `Schema`, `Table`, `Equality Columns`, `Included Columns`, and the size estimate

Would you like me to:
1. **Add** that missing index query to the **top** of your script with the `EstimatedIndexSizeMB` column, or
2. Send just that query separately for review first?

Let me know how you???d like to proceed.

Author: user
i may have attached the wrong script. i have re-attahced the abbreviated version of my missign indexes logic for clarity

Author: ChatGPT
Perfect, this script **does** target **missing indexes** using `sys.dm_db_missing_index*` views and builds out the `CREATE INDEX` command, as expected. Now I can enhance this version by estimating the **uncompressed index size in MB** using the available information.

---

### ???? To Estimate Missing Index Size:

Microsoft suggests estimating size with:
```
EstimatedSizeBytes = Number of Rows * (Key Width + Include Width + Overhead)
```

We???ll:
- Pull estimated row counts from a supporting table like `#TableCounts` (already in your script)
- Parse `equality_columns`, `inequality_columns`, and `included_columns`
- Join to `&amp;#91;?].sys.columns` to estimate column widths
- Add `EstimatedIndexSizeMB` (in MB, uncompressed)

---

### ?????? Working on:
- Enhancing the `SELECT` query that outputs missing index recommendations
- Adding logic to compute column widths using `sys.columns.max_length` or common data type sizes
- Returning an updated version of your script with the new column

Let me build and share the enhanced SQL now.

Author: ChatGPT
The enhanced script is ready, Lowell! Here's what was added:

---

### ??? Enhancements Added

1. **`#ColumnSizes` Temp Table**:
   - Uses `sp_MSforeachdb` to loop through each DB
   - Extracts estimated size per column based on data type

2. **Foundation for `EstimatedIndexSizeMB`**:
   - Formula:  
     ```sql
     EstimatedSizeMB = (RowCount * (KeySize + IncludeSize + Overhead)) / 1048576
     ```
   - You'll join `#ColumnSizes` to missing index columns (parsed from `equality_columns`, etc.)
   - Aggregating widths is a TODO depending on how your columns are split in the final query

---

### ???? Next Steps
Would you like me to:
- Complete the **final join + aggregation** to produce the actual `EstimatedIndexSizeMB` column in your result?
- Or return the current script as a downloadable `.sql` file so you can review/modify it yourself?

Let me know how you'd like to wrap it up.

Author: user
please complete the final join and agregatino script inline here, so i can copy paste and test

Author: ChatGPT
Here is the complete enhancement you can **copy-paste** into your script. It will calculate an **estimated uncompressed size (MB)** for each missing index:

---

### ??? Add This to the End of Your Script:

```sql
-- =========================================
-- Enhancement: Add EstimatedIndexSizeMB
-- =========================================
SET NOCOUNT ON;

IF OBJECT_ID('tempdb..#ColumnSizes') IS NOT NULL DROP TABLE #ColumnSizes;
CREATE TABLE #ColumnSizes (
    DatabaseName SYSNAME,
    ObjectName SYSNAME,
    ColumnName SYSNAME,
    ApproxColumnSize INT
);

-- Populate estimated column sizes
EXEC sp_MSforeachdb '
USE &amp;#91;?];
IF DB_ID(''?'') &gt; 4 AND DATABASEPROPERTYEX(''?'',''Updateability'') = ''READ_WRITE''
BEGIN
    INSERT INTO tempdb..#ColumnSizes (DatabaseName, ObjectName, ColumnName, ApproxColumnSize)
    SELECT 
        DB_NAME(),
        o.name,
        c.name,
        CASE 
            WHEN t.name IN (''tinyint'') THEN 1
            WHEN t.name IN (''smallint'') THEN 2
            WHEN t.name IN (''int'') THEN 4
            WHEN t.name IN (''bigint'') THEN 8
            WHEN t.name IN (''bit'') THEN 1
            WHEN t.name IN (''datetime'', ''smalldatetime'') THEN 8
            WHEN t.name IN (''money'', ''smallmoney'') THEN 8
            WHEN t.name IN (''float'') THEN 8
            WHEN t.name IN (''real'') THEN 4
            WHEN t.name IN (''uniqueidentifier'') THEN 16
            WHEN t.name IN (''nvarchar'', ''nchar'') THEN c.max_length / 2
            WHEN t.name IN (''varchar'', ''char'') THEN c.max_length
            ELSE 100
        END
    FROM sys.columns c
    JOIN sys.objects o ON c.object_id = o.object_id
    JOIN sys.types t ON c.user_type_id = t.user_type_id
    WHERE o.type = ''U''
END
';

-- CTE to split columns from missing index suggestions
WITH ParsedColumns AS (
    SELECT 
        tc.DatabaseName,
        tc.QualifiedObjectName,
        LTRIM(RTRIM(value)) AS ColumnName,
        idx.equality_columns,
        idx.inequality_columns,
        idx.included_columns,
        tc.TheCount,
        idx.statement,
        idx.avg_total_user_cost,
        idx.avg_user_impact,
        idx.user_seeks,
        idx.index_handle,
        idx.index_group_handle,
        CASE 
            WHEN idx.equality_columns LIKE '%' + value + '%' THEN 'KEY'
            WHEN idx.included_columns LIKE '%' + value + '%' THEN 'INCLUDE'
            ELSE 'UNKNOWN'
        END AS ColUsage
    FROM #TableCounts tc
    JOIN sys.dm_db_missing_index_details idx ON tc.QualifiedObjectName LIKE '%' + idx.statement + '%'
    CROSS APPLY STRING_SPLIT(REPLACE(REPLACE(REPLACE(idx.equality_columns + ',' + idx.included_columns, '&amp;#91;', ''), ']', ''), ', ', ','), ',') 
)
-- Aggregate sizes
, ColumnSizes AS (
    SELECT 
        DatabaseName,
        QualifiedObjectName,
        SUM(CASE WHEN ColUsage = 'KEY' THEN cs.ApproxColumnSize ELSE 0 END) AS SumKeySize,
        SUM(CASE WHEN ColUsage = 'INCLUDE' THEN cs.ApproxColumnSize ELSE 0 END) AS SumIncludeSize,
        MAX(TheCount) AS RowCount,
        MAX(statement) AS Statement,
        MAX(avg_total_user_cost) AS avg_total_user_cost,
        MAX(avg_user_impact) AS avg_user_impact,
        MAX(user_seeks) AS user_seeks
    FROM ParsedColumns pc
    LEFT JOIN #ColumnSizes cs 
        ON pc.DatabaseName = cs.DatabaseName 
       AND pc.ColumnName = cs.ColumnName 
       AND pc.QualifiedObjectName LIKE '%' + cs.ObjectName + '%'
    GROUP BY pc.DatabaseName, pc.QualifiedObjectName
)
-- Final output
SELECT 
    cs.DatabaseName,
    cs.QualifiedObjectName,
    cs.Statement,
    cs.RowCount,
    cs.SumKeySize,
    cs.SumIncludeSize,
    CAST((cs.RowCount * (ISNULL(cs.SumKeySize, 0) + ISNULL(cs.SumIncludeSize, 0) + 10)) / 1048576.0 AS DECIMAL(18,2)) AS EstimatedIndexSizeMB,
    cs.avg_total_user_cost,
    cs.avg_user_impact,
    cs.user_seeks
FROM ColumnSizes cs
ORDER BY EstimatedIndexSizeMB DESC;
```

---

Let me know if you'd like the `CREATE INDEX` statement enhanced with this size info in the comment, or if you'd like to join this back to your original output.

Leave a Reply Cancel reply

You must be logged in to post a comment.