CUSTOMIZED WEB APPLICATION FOR ADDRESSING LANGUAGE MODEL MISALIGNMENT THROUGH REINFORCEMENT LEARNING FROM HUMAN FEEDBACK

Authors

  • Chinonso Emmanuel Okeke, Department of Technical Education, Ignatius Ajuru University of Education, Port Harcourt, Rivers State, Nigeria
  • Emmanuel Joseph Essien, Department of Computer Science, Akwa Ibom State University, Mkpat Enin, Nigeria

DOI:

https://doi.org/10.5281/zenodo.14677340

Keywords:

Artificial Intelligence; ChatGPT; OpenAI; Reinforcement Learning; Human Feedback; InstructGPT models

Abstract

Recent years have seen rapid progress in artificial intelligence, which has led to tools such as OpenAI's ChatGPT. ChatGPT is built on OpenAI's GPT-3 family of large language models and is refined with supervised and reinforcement learning. Its goal is to produce text that cannot be distinguished from human-written text, and it can hold conversations with users in a clear and straightforward way. The technique employed is Reinforcement Learning from Human Feedback (RLHF): human input is combined with machine learning methods (supervised learning) to train the model, and the technique is applied during training to reduce biased, harmful, and false outputs. The resulting InstructGPT models are much better at following instructions than GPT-3. Above all, this work presents a customized ChatGPT web application that can fine-tune a given input and generate text that is high quality, harmless, truthful, and appropriate, without biased outputs. A key motivation for our work is to increase the helpfulness and truthfulness of outputs while mitigating the harms and biases of language models. In conclusion, our results show that RLHF techniques are effective at significantly improving the alignment of general-purpose AI systems with human intentions.
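
The reward-modeling step that RLHF pipelines typically rely on can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract gives no training details, so the pairwise preference loss below is assumed from the standard InstructGPT-style formulation, and the function names `preference_loss` and `mean_preference_loss` as well as the reward values are hypothetical.

```python
import math
from typing import Iterable, Tuple


def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    The loss is small when the reward model scores the human-preferred
    response above the rejected one, and large when the order is reversed.
    """
    margin = reward_chosen - reward_rejected
    # -log(sigmoid(m)) equals log(1 + exp(-m)); branch on the sign of the
    # margin so that exp() is never called on a large positive number.
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def mean_preference_loss(pairs: Iterable[Tuple[float, float]]) -> float:
    """Average the pairwise loss over (chosen, rejected) reward pairs."""
    pairs = list(pairs)
    return sum(preference_loss(c, r) for c, r in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical reward scores for three human-labelled comparisons.
    scores = [(1.2, 0.3), (0.1, 0.4), (2.0, -1.0)]
    print(f"mean preference loss: {mean_preference_loss(scores):.4f}")
```

In a full pipeline the scores would come from a learned reward model, and a policy would then be optimized against that model with a reinforcement learning algorithm; both parts are omitted here.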

Published

2025-01-17

Section

Articles