Very Slowly Changing Dimensions

– Traditional design patterns for Slowly Changing Dimensions (SCD) typically append extra attributes to the main table, which adds the overhead of maintaining those attributes. They work well when the attributes are few (e.g., a single flag for insert, delete, or update), when the dimension itself is relatively small, and when rows change fairly often. But if the dimension is relatively large, in rows and/or columns, and only a few occasional rows change per ETL run, the maintenance effort and the space taken by the SCD columns become an issue. Furthermore, with a single update flag, updating all columns of the dimension becomes a large overhead, while adding one update flag per attribute column adds far too many attributes to such a large dimension table, making it even bigger (double the size or more: N attributes + N attribute-update flags).

If you have a dimension that changes very slowly, is a relatively large table, and has potential updates on one of several columns, you may want to create a dimension-change table, where you store only the differences between the old and new versions of the data.
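
As a sketch, assuming a dimension dbo.DimCustomer with business key CustomerKey (all names here are illustrative, not from the original screenshots), such a change table could look like:

    CREATE TABLE dbo.DimCustomer_Changes (
        CustomerKey INT            NOT NULL,  -- business key of the affected row
        ChangeType  CHAR(1)        NOT NULL,  -- 'I' = insert, 'D' = delete, 'U' = update
        ColumnName  SYSNAME        NULL,      -- set only for updates
        OldValue    NVARCHAR(4000) NULL,
        NewValue    NVARCHAR(4000) NULL,
        DetectedAt  DATETIME2      NOT NULL DEFAULT SYSUTCDATETIME()
    );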

Build a component in SSIS that regularly rebuilds this history table:
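
For instance, the first step can be an Execute SQL Task that empties the change table at the start of every run (a minimal sketch, assuming the illustrative table from above):

    -- Start each ETL run with an empty change table
    IF OBJECT_ID(N'dbo.DimCustomer_Changes', N'U') IS NOT NULL
        TRUNCATE TABLE dbo.DimCustomer_Changes;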

Build logic that detects and flags the rows that need to be deleted or inserted, and the attributes that need to be updated:
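
One way to express this detection, assuming a staging copy stg.DimCustomer of the incoming data, is a FULL OUTER JOIN on the business key (a sketch with a single comparable attribute, City; NULL-safe comparisons are omitted for brevity):

    INSERT INTO dbo.DimCustomer_Changes
           (CustomerKey, ChangeType, ColumnName, OldValue, NewValue)
    SELECT COALESCE(s.CustomerKey, d.CustomerKey),
           CASE WHEN d.CustomerKey IS NULL THEN 'I'   -- new row: insert
                WHEN s.CustomerKey IS NULL THEN 'D'   -- vanished row: delete
                ELSE 'U' END,                         -- changed row: update
           CASE WHEN s.CustomerKey IS NOT NULL
                 AND d.CustomerKey IS NOT NULL THEN 'City' END,
           d.City,
           s.City
    FROM stg.DimCustomer AS s
    FULL OUTER JOIN dbo.DimCustomer AS d
         ON d.CustomerKey = s.CustomerKey
    WHERE d.CustomerKey IS NULL
       OR s.CustomerKey IS NULL
       OR s.City <> d.City;  -- repeat the comparison per attribute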

– Integrate this logic into ETL:
Build another task that pre-creates the history structure of your object if it does not already exist:
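
Since the destination may be dynamic, the Execute SQL Task can assemble the DDL from the table-name value; a sketch (in the package the name would come from the table-name parameter rather than a local variable):

    DECLARE @TableName SYSNAME = N'dbo.DimCustomer_Changes';  -- supplied by the package in practice
    DECLARE @sql NVARCHAR(MAX) = N'
    IF OBJECT_ID(N''' + @TableName + N''', N''U'') IS NULL
        CREATE TABLE ' + @TableName + N' (
            CustomerKey INT            NOT NULL,
            ChangeType  CHAR(1)        NOT NULL,
            ColumnName  SYSNAME        NULL,
            OldValue    NVARCHAR(4000) NULL,
            NewValue    NVARCHAR(4000) NULL
        );';
    EXEC sys.sp_executesql @sql;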

– Build another package that exports this into the (potentially dynamic) destination table:

– Build a Data Flow task with an OLE DB source that detects only one type of ETL change; here we pick insertion as an example.
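
The source query can simply filter the change table on the insert flag and pull the new rows from staging (a sketch; names assumed as above):

    -- Only the rows flagged as inserts
    SELECT s.*
    FROM stg.DimCustomer AS s
    JOIN dbo.DimCustomer_Changes AS c
      ON c.CustomerKey = s.CustomerKey
    WHERE c.ChangeType = 'I';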

– Add an OLE DB Command that inserts only the insert rows into the ETL Out table.
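
In the OLE DB Command, the statement uses positional ? markers that are mapped to the input columns on the Column Mappings tab (the destination name and columns are illustrative):

    INSERT INTO dbo.DimCustomer_Out (CustomerKey, FirstName, City)
    VALUES (?, ?, ?);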

To reuse your design pattern, make the table name dynamic: create a table-name parameter.

Then convert this query into a parameterized query and insert the table-name parameter into it.
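
As a sketch, an SSIS expression on a string variable (say User::InsertQuery, a name assumed here) can splice the parameter in:

    "INSERT INTO " + @[$Package::TableName] + " (CustomerKey, FirstName, City) VALUES (?, ?, ?)"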

Then, in the Data Flow task's Expressions property, assign this query variable to the SqlCommand property of the OLE DB Command.

(Or even faster: make this a fast load with an OLE DB destination and parameterize the table name, e.g. with the “Table name or view name variable – fast load” data access mode.)

– Repeat the same design pattern for Deletions.

– And repeat the same with an OLE DB Command that deletes from the table.
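
Again with a positional marker (a sketch; key column name assumed):

    DELETE FROM dbo.DimCustomer
    WHERE CustomerKey = ?;  -- ? mapped to the key of each row flagged as a delete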

– And parameterize this query so that the package dynamically chooses the name of the table.
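
For example, with the same assumed parameter name as before:

    "DELETE FROM " + @[$Package::TableName] + " WHERE CustomerKey = ?"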

– SQL Update:
– Build the same pattern for updates: a Data Flow task with an OLE DB source collecting the attribute that needs to be updated, and an OLE DB Command updating this column in the destination.
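
A sketch of the command, with one illustrative attribute:

    UPDATE dbo.DimCustomer
    SET City = ?            -- the changed attribute collected by the source
    WHERE CustomerKey = ?;  -- the business key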

– Parameterize this update query and insert the table-name parameter into it, as we did for the Insert and Delete tasks.

– Add one parameter (ColumnToUpdate) for the name of the column to change.

– Then make the name of the column to update dynamic in the source query as well:

– Parameterize your query: modify the expression to splice in the ColumnToUpdate value.
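
As a sketch (note that for the Foreach Loop below to assign the column name on each iteration, ColumnToUpdate needs to be a package variable, written here as User::ColumnToUpdate, since SSIS parameters are read-only at run time), the source query expression could be:

    "SELECT CustomerKey, NewValue FROM dbo.DimCustomer_Changes WHERE ChangeType = 'U' AND ColumnName = '" + @[User::ColumnToUpdate] + "'"

and the OLE DB Command expression:

    "UPDATE " + @[$Package::TableName] + " SET [" + @[User::ColumnToUpdate] + "] = ? WHERE CustomerKey = ?"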

– To save yourself from performing this task for each attribute, make a loop: build a Foreach Loop Container, give it the list of all table attributes that can potentially change (so exclude any keys), and put your update Data Flow task inside it.

– Ensure proper configuration of the “Variable Mappings” tab, so that each iteration writes the current attribute name into the ColumnToUpdate variable.

– Test all.

– Test it for one update:
