Comparing SSIS Lookup and SQL Joins


Motivating SSIS lookups

Denormalization is the most important step to create a Data warehouse. Fundamentally, we do so to populate surrogate key of a dimension in a fact table. However, we can perform denormalization in two ways viz. SQL joins or a lookup component in an ETL tool. In Microsoft SSIS, we have a Data Flow Task, which contains the Lookup component.

The three modes of SSIS Lookups

SSIS lookup component can be used in 3 modes viz. full cache, partial cache and no-cache. Read this article to know more: SSIS Look Up Component. In the article, we have an important point to note about SSIS lookups i.e. in full cache mode the string comparison is case-sensitive, which is generally not true in case of SQL joins(depends on collation).

Usually, SQL joins perform the case-insensitive comparison. Let’s take an example.

The Illustration contains a dimension table DimGeo with columns named GeoId and Geography. Furthermore, we have a staging table stgFactSales having a column called Geography. After denormalization using joins we get FactSales.  Note that DimGeo contains India in camel case whereas stgFactSales holds it in upper case. After denormalization, we get the GeoKey as ‘1’ for India.

Now let’s go to another illustration with Lookups.

After a lookup on Geography between DimGeo and stgFactSales, we get the following results.

We can see that GeoKey is populated as NULL for India.

The reason for Case Sensitive comparison in SSIS Lookup Full Cache mode

In full cache mode, the .NET based SSIS Engine carries out string comparison, while SQL Server Engine takes over in partial and no-cache mode. We cannot assert that whether this nuance is a boon or a bane since this is a call of the business requirement. Lastly, interchanging the joins and lookups blindly might lead to undesirable outcomes.

Read: What is a Data Hub: Concepts and Guidelines

2 thoughts on “Comparing SSIS Lookup and SQL Joins

Leave a Reply