Category: Big Data

Machine Learning System Design is a mindset

I was interviewing a candidate recently for a Machine Learning Engineering role. I asked some questions about Machine Learning Deployment and Drift. In general, the...

Analyzing Machine Learning Retraining Pipelines

A key requirement for successful MLOps practice is the ability to build and maintain reliable and repeatable training pipelines, also called Machine Learning Retraining...

Introducing Machine Learning System Design

Why do we need Machine Learning System Design? In their seminal paper, "Hidden Technical Debt in Machine Learning Systems", Google researchers point out that only a small...

Data Profiling in Power BI (using Azure Databricks)

Within Microsoft, there are two worlds, i.e., MS Azure and MS Office 365. They use two different Active Directories in the Microsoft world. Hence, they have their own tools to...

Motivating Entity Resolution for Data Science

Why Entity Resolution? Data is the new oil. Thus, analytical models are the new combustion engines. A combustion engine functions efficiently with good fuel. Similarly,...

An Introduction to Azure Synapse SQL

Evolution of Azure Synapse SQL: Azure Synapse was previously known as Azure SQL Data Warehouse. With the rebranding to Synapse, Microsoft added many more layers on top of...

Azure Databricks source in PowerBI

Microsoft PowerBI is a great tool for Data Visualization. It can connect to a variety of sources. However, databases remain a popular data source. But, what if you...

Overview of the exam DP-900 : Azure Data Fundamentals

Motivating DP-900 : Azure Data Fundamentals. Data Engineering is one of the fastest-growing career opportunities for people aspiring to a career in machine learning and AI....

Motivating Databricks Delta in Azure

Exploratory data analysis entails a lot of ad-hoc analysis. To do so, analysts have to rely on either databases or file systems such as data lakes. Now, to analyze these...

Tutorial: Hierarchical Clustering in Spark with Bisecting K-Means

In the previous article, we covered the standard K-Means Clustering technique on Spark. Read that article here: Tutorial : K-Means Clustering on Spark. In this article,...