RTFM: Towards Understanding Source Code using Natural Language Processing

Maximilian Galanis, Vincent Dietrich, Bernd Kast, Michael Fiegert


The manual configuration of today’s autonomous systems for new tasks is becoming increasingly difficult due to their complexity. One solution to this problem is to use planning algorithms that can automatically synthesize suitable data processing pipelines for the task at hand and thus simplify the configuration. Planners usually rely on models, which are created manually based on already existing methods. These methods are often provided as part of domain-specific code libraries. Therefore, using existing planners in new domains requires the manual creation of models based on the methods provided by other libraries. To facilitate this, we propose a system that automatically generates an abstract semantic model from C++ libraries. The necessary information is extracted from the library by combining static source code analysis of its header files with natural language processing (NLP) of its official documentation. We evaluate our approach on the perception domain with two popular libraries: HALCON and OpenCV. We also outline how the extracted models can be used with an existing planner to automatically configure data processing pipelines for the perception domain.
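To make the idea of combining header analysis with documentation analysis more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy setting: a regular expression stands in for real static analysis, a keyword lookup stands in for real NLP, and the `segment` declaration, its documentation strings, and all class and function names are hypothetical (they are not taken from HALCON or OpenCV).

```python
import re
from dataclasses import dataclass, field


@dataclass
class Parameter:
    name: str
    cpp_type: str
    role: str              # "input" or "output"
    description: str = ""


@dataclass
class OperatorModel:
    name: str
    parameters: list = field(default_factory=list)


def parse_declaration(declaration: str) -> OperatorModel:
    """Toy stand-in for static analysis of one header declaration.

    Handles only simple declarations: no default arguments, no function
    pointers, and no template types that contain commas.
    """
    match = re.search(r"(\w+)\s*\(([^()]*)\)\s*;", declaration)
    if match is None:
        raise ValueError(f"not a simple function declaration: {declaration!r}")
    model = OperatorModel(name=match.group(1))
    for raw in filter(None, (p.strip() for p in match.group(2).split(","))):
        name = re.search(r"(\w+)$", raw).group(1)
        cpp_type = raw[: -len(name)].strip()
        # Heuristic: a non-const reference or pointer is probably written to.
        writable = ("&" in cpp_type or "*" in cpp_type) and not cpp_type.startswith("const")
        model.parameters.append(
            Parameter(name, cpp_type, "output" if writable else "input")
        )
    return model


OUTPUT_CUES = ("output", "result", "destination")
INPUT_CUES = ("input", "source")


def refine_with_docs(model: OperatorModel, docs: dict) -> OperatorModel:
    """Toy stand-in for NLP: cue words in the documentation override the
    role guessed from the type alone."""
    for param in model.parameters:
        text = docs.get(param.name, "")
        param.description = text
        lowered = text.lower()
        if any(cue in lowered for cue in OUTPUT_CUES):
            param.role = "output"
        elif any(cue in lowered for cue in INPUT_CUES):
            param.role = "input"
    return model


# Hypothetical declaration and documentation, for illustration only.
declaration = "void segment(const Image& src, Image& mask, Image& labels);"
docs = {
    "src": "Source image.",
    "mask": "Optional input mask restricting the region.",
    "labels": "Output label image.",
}

header_only = parse_declaration(declaration)
combined = refine_with_docs(parse_declaration(declaration), docs)

for param in combined.parameters:
    print(f"{combined.name}.{param.name}: {param.cpp_type} -> {param.role}")
```

In this sketch the type alone suggests that `mask` is an output, because it is a non-const reference, while the documentation text marks it as an input. This is one plausible reason why the two information sources complement each other; a real system would need a proper C++ parser and a far more robust language model than keyword matching.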

