Large language models in software engineering
A systematic review and vision
DOI: https://doi.org/10.56916/jesi.v2i2.968

Keywords: Large Language Models, Software Engineering, Transformer, AI, PRISMA

Abstract
Large Language Models (LLMs) are deep learning models with very large parameter counts, pre-trained on vast amounts of data. This article provides an overview of the use of large language models in software engineering from January 2021 to February 2024. It surveys this emerging area while acknowledging that further research is needed to fully understand the issues, effects, and limitations of LLMs in the field. The article also highlights open research challenges in applying LLMs to the technical problems faced by software engineers. The distinctive capabilities of LLMs bring novelty and creativity to software engineering activities, including coding, design, requirements, repair, refactoring, performance improvement, documentation, and analytics. Our survey demonstrates the key role that reliable and efficient large language models play in software development and deployment.
Copyright (c) 2024 Nguyen Van Viet, Nguyen The Vinh

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.