Ground Truth: Towards Labeling On-Demand IoT Traffic
Abstract
A lack of transparency has accompanied the rapid proliferation of Internet of Things (IoT) devices. In response, a growing body of work classifies IoT device traffic to identify unexpected or surreptitious device activity. However, this work requires fine-grained labeled datasets of device activity. This paper proposes a holistic approach for IoT device traffic collection and automated event labeling. Our work paves the way for future research by thoroughly examining different techniques for synthesizing and labeling on-demand traffic from IoT sensors and actuators. To demonstrate this approach, we instrumented a smart home environment consisting of 57 IoT devices spanning cameras, doorbells, locks, alarm systems, lights, plugs, environmental sensors, and hubs. We release a sample dataset consisting of 16,576 labeled events over 467,883 network flows. Our results indicate that vendor APIs, trigger-action frameworks, and companion notifications can be used to generate scientifically valuable labeled datasets of IoT traffic and to automatically produce future datasets.