Learning Meaningful Representations of Surgical Motion Without Annotations
Superior technical skill in the operating room is associated with better patient outcomes [3,18], and at the core of surgical education is the belief that technical skill is improved through deliberate practice and appropriate feedback [10,17]. However, current standards for providing technical skills training are constrained by time, and most available methods for surgical skill assessment are subjective and global.

This has motivated interest in delivering targeted, automated assessment and feedback with machines, especially in robot-assisted surgery, during which high-quality surgical motion data can be captured transparently for analysis. Automated surgical activity recognition is an important precursor to achieving this goal. Indeed, in recent years, significant progress has been made in surgical activity recognition, especially within the context of simulated training [1,5,8], an important part of current training curricula. Though promising, these approaches have relied on large amounts of annotated data, which, unlike the surgical-motion data itself, must be provided manually by experts. This process is expensive, difficult, and error-prone.

In this report, we instead consider learning meaningful representations of surgical motion from motion data alone. The underlying idea is simple: if we can learn to reliably predict future motion from past motion, through a learned encoding, then this encoding must implicitly capture identifying characteristics of the underlying surgical activities. We demonstrate the feasibility of this idea; show that the obtained encodings correlate well with high-level activities; use them to achieve state-of-the-art performance for querying a database of surgical motion; and use them to improve surgical activity recognition when few annotated sequences are available.
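The core idea above — learning an encoding of past motion by training it to predict future motion — can be illustrated with a minimal sketch. The paper's approach presumably uses a learned neural model; here, purely for illustration, we substitute reduced-rank linear regression on toy data: a rank-constrained map from a window of past kinematics to a window of future kinematics, whose low-rank bottleneck plays the role of the learned encoding. All variable names, window sizes, and the synthetic "kinematics" signal are hypothetical and not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "kinematics": T timesteps of d-dimensional motion driven by a
# slowly drifting low-dimensional latent signal (a hypothetical
# stand-in for real surgical motion data).
T, d, latent = 2000, 8, 2
z = np.cumsum(rng.normal(size=(T, latent)), axis=0)   # slow latent drift
mix = rng.normal(size=(latent, d))
x = z @ mix + 0.1 * rng.normal(size=(T, d))           # observed motion

# Build (past window -> future window) training pairs.
P, F = 10, 5                                          # past / future lengths
X = np.stack([x[t - P:t].ravel() for t in range(P, T - F)])
Y = np.stack([x[t:t + F].ravel() for t in range(P, T - F)])

# Reduced-rank regression: ordinary least squares, then an SVD
# projection of the fitted outputs onto rank r. The r-dimensional
# bottleneck is the learned "encoding" of past motion.
r = 4
W_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
U, s, Vt = np.linalg.svd(X @ W_ols, full_matrices=False)
V_r = Vt[:r].T                                        # rank-r output basis
W_rr = W_ols @ V_r @ V_r.T                            # rank-constrained map

codes = X @ W_ols @ V_r                               # r-dim encodings
err_full = np.mean((X @ W_ols - Y) ** 2)              # unconstrained error
err_rr = np.mean((X @ W_rr - Y) ** 2)                 # bottlenecked error
print(codes.shape, round(err_full, 3), round(err_rr, 3))
```

Because the toy motion is driven by a low-dimensional latent signal, a small bottleneck loses little predictive accuracy, and the resulting codes summarize the activity state of the past window — the same intuition the paper exploits with a far more expressive predictor.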