Multi-sensor fusion (MSF) systems are widely deployed as the perception module in modern autonomous vehicles (AVs). Hence, their robustness to common and adversarial semantic transformations in the physical world (such as vehicle rotation and shifting) is critical to the safety of AVs. Prior work shows that MSF systems, though more robust than single-modal models, are still vulnerable to adversarial semantic transformations. Although some empirical defenses have been proposed, they can again be circumvented by adaptive attacks, and to date no certified defense has been studied for MSF. In this work, we propose COMMIT, the first robustness certification framework for certifying the robustness of MSF systems against semantic attacks. We propose a practical anisotropic noise mechanism that applies randomized smoothing to multi-modal data, a grid-based splitting method to characterize complex semantic transformations, and efficient algorithms to compute certification in terms of object detection accuracy and IoU lower bounds for large-scale MSF models. We provide a benchmark of certified robustness for different MSF models using COMMIT based on CARLA. We show that the certified robustness of MSF models is at least 48.39% higher than that of single-modal models, which confirms the advantages of MSF models. We believe our framework and benchmark will mark an important step towards certifiably robust AVs in practice.
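To make the anisotropic-noise idea concrete, below is a minimal, hypothetical sketch of randomized smoothing over multi-modal inputs: each modality receives Gaussian noise at its own scale, and the smoothed prediction is a Monte Carlo majority vote over the base model's outputs. The per-modality scales sigma_img and sigma_lidar, the base model f, and the voting scheme are illustrative assumptions for exposition, not COMMIT's actual certification mechanism.

```python
import numpy as np

def smooth_predict(f, image, lidar, sigma_img=0.25, sigma_lidar=0.05,
                   n_samples=1000, rng=None):
    """Monte Carlo estimate of a smoothed multi-modal classifier.

    Anisotropic smoothing: camera pixels and LiDAR coordinates live on
    very different numeric ranges, so each modality gets its own noise
    scale rather than a single isotropic sigma.
    """
    rng = np.random.default_rng() if rng is None else rng
    votes = {}
    for _ in range(n_samples):
        # Perturb each modality independently with its own Gaussian scale.
        noisy_img = image + rng.normal(0.0, sigma_img, size=image.shape)
        noisy_pts = lidar + rng.normal(0.0, sigma_lidar, size=lidar.shape)
        label = f(noisy_img, noisy_pts)  # base MSF model's top prediction
        votes[label] = votes.get(label, 0) + 1
    # Majority class and its empirical frequency (used to bound robustness).
    top = max(votes, key=votes.get)
    return top, votes[top] / n_samples

# Usage with a dummy base model, purely for illustration:
if __name__ == "__main__":
    dummy_f = lambda img, pts: int(img.mean() + pts.mean() > 0)
    img = np.zeros((32, 32, 3))
    pts = np.zeros((100, 3))
    print(smooth_predict(dummy_f, img, pts, n_samples=200))
```

In a certification setting, the empirical top-class frequency returned here would be replaced by a high-confidence lower bound (e.g., a Clopper-Pearson interval) before deriving a certified radius; that step is omitted in this sketch.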