Abstract
This paper proposes using synthetic training data generated by large language models to improve machine-learning classifiers for the Sustainable Development Goals (SDGs). It shows that supplementing existing training data with synthetic data produced by ChatGPT improves the performance of the SDGClassy classifier. Synthetic data are especially useful for building SDG classifiers because properly labeled data are scarce and the SDGs are complex and interconnected. Synthetic data thus enable more effective machine-learning applications in this context.
© United Nations
30 Nov 2023