A WEB APPLICATION FOR CORRECTING LANGUAGE MODEL MISALIGNMENT THROUGH REINFORCEMENT LEARNING FROM HUMAN FEEDBACK

Authors

  • Chukwudi Daniel Okeke, Department of Computer Science, National Open University of Nigeria, Nigeria
  • Blessing Grace Udom, Department of Computer Science, Akwa Ibom State University, Mkpat Enin, Nigeria

Keywords:

Artificial Intelligence; ChatGPT; OpenAI; Reinforcement Learning; Human Feedback; InstructGPT models

Abstract

Recent years have seen tremendous progress in artificial intelligence, which has led to cutting-edge tools such as OpenAI's ChatGPT. ChatGPT is built on OpenAI's GPT-3 family of large language models and is refined using supervised and reinforcement learning methods. Its goal is to produce text that is indistinguishable from human-written text, and it can hold conversations with users in a remarkably clear and straightforward way. The technique employed is Reinforcement Learning from Human Feedback (RLHF), in which the model is trained using human input combined with machine learning methods (supervised learning). RLHF is applied during the training phases to reduce biased, harmful, and false outputs, and the resulting InstructGPT models are much better at following instructions than GPT-3. Above all, we present a customized ChatGPT web application that can fine-tune a given input and generate text that is high-quality, harmless, truthful, and appropriate, without biased outputs. A key motivation for our work is to increase the helpfulness and truthfulness of outputs while mitigating the harms and biases of language models. In conclusion, our results show that RLHF techniques are effective at significantly improving the alignment of general-purpose AI systems with human intentions.
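The abstract does not specify how human feedback is turned into a training signal. As a minimal sketch, assuming the InstructGPT-style RLHF formulation, two quantities are central: a pairwise preference loss that trains a reward model to score a human-preferred response above a rejected one, and a KL-penalized reward that keeps the fine-tuned policy close to the original supervised model during reinforcement learning. The function names and the default `beta` value below are hypothetical and illustrative only; they are not taken from the authors' application.

```python
import math


def pairwise_preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Reward-model loss for one human comparison: -log(sigmoid(r_chosen - r_rejected)).

    Hypothetical helper illustrating the InstructGPT-style pairwise loss.
    """
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); branch on the sign of m so exp() never overflows
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def kl_shaped_reward(
    reward: float,
    logp_policy: float,
    logp_reference: float,
    beta: float = 0.02,  # hypothetical KL coefficient, chosen only for illustration
) -> float:
    """Reward optimized during RL fine-tuning: r - beta * (log pi(y|x) - log pi_ref(y|x))."""
    return reward - beta * (logp_policy - logp_reference)
```

The loss shrinks as the reward model ranks the preferred response further above the rejected one (for equal scores it equals log 2), while the KL term penalizes responses that the fine-tuned policy makes much more likely than the reference model does, which discourages reward hacking.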

Published

2025-01-17

Section

Articles