Data Science for IoT



When I was struggling with my health, a statement (paraphrasing) from an entrepreneur got me thinking:

If you want to improve on something, start measuring that aspect of your life.

With advances in technology, wearable devices now capture some trace of your health. Similarly, a host of other devices track other aspects of your life. So how do these devices enhance your life? How do they collect, process, and analyze data for you? The answer lies in the Internet of Things, abbreviated as IoT, and more specifically in its sub-discipline, Data Science for IoT.

What is IoT? 

There was a time when the word ‘interaction’ was restricted to humans. Gradually, computers started interacting with each other, leading to the explosion of the Internet. Finally, we have reached a point where ‘devices’ interact with each other, ushering in the era of the Internet of Things, or IoT. The Internet of Things (IoT) is a distributed collection of devices with an internet connection. It includes computers, mobile devices, wearable devices, etc. However, the key revolution in IoT comes from devices with sensors. Sensors remove the need for humans to enter data, thus reducing human bias and error. Moreover, devices can measure signals that human senses cannot perceive.

However, all the data generated and processed is of little use if not leveraged for augmenting human life. This is where Data Science plays a role.

Role of Data Science in IoT 

As a saying often attributed to Charles Darwin goes, ‘It is not the strongest of the species that survives, nor the most intelligent that survives. It is the one that is most adaptable to change’. The ability to sense changes and process them enhances decision making. Devices sense tons of data through their sensors. All these signals need to be captured and processed efficiently via a robust, scalable infrastructure. Data Engineers/Architects set up this infrastructure to extract, process, and store this data. Furthermore, Data Scientists analyze this data, build reports, and train ML models on top of it.

Let us understand this with an article I wrote four years ago, named An Introduction to Azure IoT with Machine Learning. In that blog, I took the example of a system that monitors temperature and humidity. Imagine an industrial system where temperature and humidity need to stay within a range, and any potential anomaly needs to be detected and flagged. Here is the architecture diagram for the same:


In this system, we have a Raspberry Pi simulator that sends temperature and humidity values to an Azure IoT Hub. Azure IoT Hub interfaces with Azure Stream Analytics, which processes the data and stores it in Cosmos DB. The data in Cosmos DB is used to train an anomaly detection model, which Azure Stream Analytics then uses to detect anomalies in near real-time. Lastly, Power BI connects to Azure Stream Analytics to visualize the data in real-time.
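To make the first hop of this architecture concrete, here is a minimal sketch in Python of what a device simulator might emit. It does not talk to Azure at all: the device name sim-pi-01, the field names, and the acceptable ranges are assumptions made for illustration, and the simple range check stands in for the trained anomaly detection model described above.

```python
import json
import random

# Hypothetical operating ranges; the original article does not specify them.
TEMP_RANGE = (20.0, 30.0)      # degrees Celsius
HUMIDITY_RANGE = (40.0, 60.0)  # percent relative humidity


def make_reading(device_id, rng):
    """Simulate one telemetry message from a device (field names are assumed)."""
    return {
        "deviceId": device_id,
        "temperature": round(rng.uniform(18.0, 32.0), 2),
        "humidity": round(rng.uniform(35.0, 65.0), 2),
    }


def is_out_of_range(reading):
    """Return True when either value falls outside its allowed range."""
    temp_ok = TEMP_RANGE[0] <= reading["temperature"] <= TEMP_RANGE[1]
    humidity_ok = HUMIDITY_RANGE[0] <= reading["humidity"] <= HUMIDITY_RANGE[1]
    return not (temp_ok and humidity_ok)


def run(n=5, seed=0):
    """Generate n readings, tag each with a flag, and serialize them as JSON."""
    rng = random.Random(seed)
    messages = []
    for _ in range(n):
        reading = make_reading("sim-pi-01", rng)
        reading["outOfRange"] = is_out_of_range(reading)
        messages.append(json.dumps(reading))
    return messages


for message in run():
    print(message)
```

In a real deployment, each JSON message would be sent to the IoT gateway instead of being printed, and the flagging would happen downstream in the stream processing layer.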

IoT serving pattern

I chose this example to introduce the readers to a broader set of ideas about Data Engineering and Data Science for IoT. Data from devices/sensors is sent to an IoT gateway, Azure IoT Hub in this case. The IoT gateway streamlines the data and sends it to a stream processing service, Azure Stream Analytics in this case, which writes it to a data store, Cosmos DB in this case. This data can be used to build reports (batch or real-time) and train Machine Learning models. The book Fundamentals of Data Engineering by Joe Reis and Matt Housley calls this the IoT serving pattern for downstream use cases. Here is a flow from the book that broadly encompasses these ideas.

Importance of Data Science in IoT 

IoT and Data Science are amongst the most defining technologies of our age. The relation between the two has been very productive. Data Science enhances IoT, while IoT requirements have led to tremendous progress in Data Engineering, thus leading to better, faster Data Science.

The value proposition of IoT is largely defined by the data management and analytics around it. Hence, Data Engineering and Data Science in IoT are gaining wide traction. Insights generated from device data enable key decisions and timely actions. For example, Predictive Maintenance enables businesses to gauge potential failures early on, thus reducing downtime. Similarly, timely insights from your smartwatch let you make better health decisions.

Key Skills for IoT Data Scientist 

So far, we have covered the why and what of Data Science for the Internet of Things. The next logical questions are about the how. The aforementioned example gave a glimpse. Data Science for IoT opens up many areas in which to upskill. From big data to machine learning, architecting an IoT system presents a host of opportunities and challenges.

Here are some key skills of Data Engineering and Data Science in IoT.

Key Big Data Skill(s)

By now, you may have gauged that IoT requires a lot of Data Engineering effort. The high volume, velocity, and variety of data make it one of the most prominent use cases for Big Data. Amongst these skills, Stream Processing/Stream Analytics is the central piece of the puzzle. Data Engineers/Data Scientists for IoT must be able to handle and manipulate large streams of data and store them efficiently for further use.
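A common building block in stream analytics is the tumbling window: events are grouped into fixed, non-overlapping time buckets and aggregated per bucket. The sketch below shows the idea in plain Python over a small in-memory list. The function name and the (timestamp, value) event shape are assumptions for illustration; a real stream processing engine would compute this incrementally and also deal with late or out-of-order events.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds=60):
    """Average (timestamp, value) events per fixed, non-overlapping window.

    Returns a dict mapping each window's start time to the mean value
    of the events that fell inside it.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, value in events:
        # Integer division snaps each timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        sums[window_start] += value
        counts[window_start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}


# Timestamps are seconds since the start of the stream.
events = [(0, 20.0), (30, 22.0), (65, 30.0), (119, 32.0), (120, 25.0)]
print(tumbling_window_average(events))
# prints {0: 21.0, 60: 31.0, 120: 25.0}
```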

Key Machine Learning Skill(s)

Equally important are the Machine Learning skills. With large volumes of data coming in, faster decisions become imperative. Predictive Maintenance is a typical use case, where advance warnings about the health of your system could save costs. Anomaly Detection and Time Series Modeling come in handy.

Another key area of Machine Learning that could be useful is Reinforcement Learning. An agent can learn from its surroundings and react accordingly in real-time; self-driving cars are a case in point.

Differences Between Traditional Data Science and Data Science for IoT 

Now, what’s the difference between Traditional Data Science and Data Science for IoT?

Firstly, Traditional Data Science is largely dominated by batch processing, whereas IoT needs real-time processing. Batch processing leads to static, non-real-time reports, whereas IoT scenarios need dynamic, fast-changing visuals.

Traditional Data Science deals with all ranges of data volume, from low to high, whereas IoT systems generate massive amounts of data. Storing and querying these datasets is a different ball game in itself.

On the Machine Learning front, Traditional Data Science typically needs batch inferencing of models, whereas in IoT scenarios model inferencing needs to be real-time. Hence, the models should have low time complexity during serving.
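One way to picture "low time complexity during serving" is a detector that does constant work per reading. The sketch below uses Welford's online algorithm to maintain a running mean and variance, and flags a reading whose z-score exceeds a threshold. This is an illustrative technique chosen here, not the model from the Azure example; the class name, the threshold of 3.0, and the warm-up of 5 readings are all assumptions.

```python
import math


class OnlineAnomalyDetector:
    """Flags readings far from the running mean, using O(1) work per reading."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # z-score above which a reading is flagged
        self.warmup = warmup        # readings to observe before flagging anything
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations from the mean

    def update(self, value):
        """Score the value against past statistics, then fold it into them."""
        is_anomaly = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                is_anomaly = True
        # Welford's update: no history is stored, only three numbers.
        # Note that flagged values are still folded into the statistics.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return is_anomaly


detector = OnlineAnomalyDetector()
readings = [25.0, 25.2, 24.8, 25.1, 24.9, 25.0, 40.0]
print([detector.update(r) for r in readings])
# prints [False, False, False, False, False, False, True]
```

Because the detector keeps only a count, a mean, and a sum of squared deviations, its memory use and per-reading cost stay constant no matter how long the stream runs.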

Challenges of Data Science for IoT 

Any great technology brings along its own unique set of challenges. Data Science for IoT has its own share. Let’s look at them:

Data Management and Security 

With the increasing volume and velocity of data, data management is a challenge. Storage and compute get hard and expensive to manage. On the other hand, securing data that arrives at such speed and scale is challenging too. Cybersecurity incidents and data theft are real risks.

Scaling Issues 

As the number of users grows, so does the number of devices. The infrastructure needs to scale accordingly. Managing this scaling carries its own security and cost risks.

Data Analytics Skills 

Even if infrastructure, security, and scaling are taken care of, there is a serious dearth of skills for IoT Data Science. Real-time analytics, anomaly detection, and reinforcement learning are specialized skills that are not easy to find in the job market. Moreover, even more specialized skills like the following are scarce:

Edge Computing 

Imagine your device is located on a remote ship, which has intermittent access to the internet. Edge computing enables you to run your computing workloads on the remote device itself. However, it is easier said than done. It takes a great deal of skill and expertise to do so.
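One small piece of the edge puzzle is coping with the intermittent connection itself. A common approach is store-and-forward: the device buffers readings locally and uploads them when the link returns. The sketch below is a minimal illustration in Python. The class name, the send callback, and the buffer size are hypothetical, and a production system would also persist the buffer to disk and retry with backoff.

```python
from collections import deque


class StoreAndForwardBuffer:
    """Buffers readings on the device and uploads them when the link is up."""

    def __init__(self, send, max_size=1000):
        # send is a caller-supplied function that returns True on success.
        self.send = send
        # A bounded deque drops the oldest reading once max_size is reached.
        self.pending = deque(maxlen=max_size)

    def record(self, reading):
        """Store a reading locally, regardless of connectivity."""
        self.pending.append(reading)

    def flush(self):
        """Upload buffered readings in order, stopping at the first failure.

        Returns the number of readings successfully sent.
        """
        sent = 0
        while self.pending:
            if not self.send(self.pending[0]):
                break  # link is down; keep the rest for the next attempt
            self.pending.popleft()
            sent += 1
        return sent
```

The device would call record for every new reading and call flush periodically; readings that fail to upload simply stay in the buffer for the next attempt.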

Computer-aided Design 

Real-world devices are expensive. Hence, it is standard practice to create simulations. CAD software enables you to do so. However, such software and tools require specialized skills and are expensive, too.

IoT Computing Frameworks 

Lastly, IoT design requires specialized frameworks. It takes experience and expertise to use these computing frameworks.

Operating Costs 

All the above challenges naturally have a cost implication, thus making IoT and Data Science a costly affair.

Technologies for IoT Data Scientists 

Nonetheless, a wide variety of software tools is available for implementing a data stack for IoT. Especially with the emergence of cloud computing, these tools are accessible with relative ease. Kafka and Azure IoT Hub are popular IoT gateways, while Storm and Azure Stream Analytics are well-known stream processing engines. With cloud ML deployments, security and scalability are no longer such enormous challenges.

How Are Data Science and IoT Shaping the Future? 

With an increasing number of devices, data is growing at a humongous pace. We are surrounded by smart devices that help us make better decisions and lead better lives. That said, this growth will bring a host of new challenges on the scalability and security fronts. If these are dealt with well, autonomous systems will enhance lives by removing mundane tasks, allowing humans to live more productive and fulfilling lives.


Data Science and IoT are defining technologies of the 21st century. Their combination has already led us to Industry 4.0, the fourth Industrial Revolution. I hope this article motivates readers to take up careers in this exciting field and take technology forward. You can also read the same article on KnowledgeHut.

I am a Data Scientist with 6+ years of experience.
