GPT-4: Mitigating Risks and Enhancing Safety in AI Models



OpenAI Reveals GPT-4: A Multimodal AI Model That Exhibits Human-Level Performance on Professional and Academic Benchmarks


OpenAI has announced GPT-4, its latest deep learning model. This multimodal model accepts both image and text inputs and emits text outputs. While less capable than humans in many real-world scenarios, it exhibits human-level performance on various professional and academic benchmarks.


Professional and Academic Benchmarking:

One notable example of GPT-4's capabilities is its performance on a simulated bar exam, where it scored in roughly the top 10% of test takers; its predecessor, GPT-3.5, scored in roughly the bottom 10%. These results follow six months of iteratively aligning GPT-4 using lessons from OpenAI's adversarial testing program and from ChatGPT, yielding OpenAI's best-ever results (though still imperfect) on factuality, steerability, and refusing to go outside of guardrails.


Improved Deep Learning Stack and Supercomputer Design:

Over the past two years, OpenAI has rebuilt its entire deep learning stack and co-designed a supercomputer with Azure from the ground up for its workload. A year ago, the team trained GPT-3.5 as a first "test run" of the system, which helped identify and fix some bugs and improve their theoretical foundations. This preparation resulted in GPT-4 being the first large model whose training performance they were able to accurately predict ahead of time.


Reliable Scaling and Predicting Future Capabilities:

As OpenAI continues to focus on reliable scaling, it aims to refine its methodology so that it can predict and prepare for future capabilities increasingly far in advance. This approach is critical for safety, especially as AI models become more complex and sophisticated.
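To make "predicting future capabilities" concrete, here is a minimal sketch of the kind of extrapolation involved: fit a power law to losses measured on much smaller training runs, then use it to predict the loss of a far larger run. The compute and loss values below are illustrative assumptions, not OpenAI's actual measurements or methodology.

```python
# Minimal sketch: extrapolate final training loss from small-scale runs by
# fitting a power law in log-log space. All numbers are made up for illustration.
import numpy as np

compute = np.array([1e18, 1e19, 1e20, 1e21])   # hypothetical training FLOPs
loss = np.array([3.10, 2.65, 2.30, 2.05])      # hypothetical final losses

# Fit log(loss) = slope * log(compute) + intercept, i.e. a pure power law.
slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)

# Extrapolate to a much larger training budget.
target = 1e24
predicted = np.exp(slope * np.log(target) + intercept)
print(f"predicted loss at {target:.0e} FLOPs: {predicted:.2f}")
```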


Text Input Capability via ChatGPT and API Release:

OpenAI is releasing GPT-4's text input capability via ChatGPT and the API, with a waitlist. To prepare the image input capability for wider availability, OpenAI is collaborating closely with a single partner to start.
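For developers who clear the waitlist, a text-only request to GPT-4 is an ordinary chat completion call. The sketch below assumes the openai Python package (v1-style client) and an OPENAI_API_KEY environment variable; it is a minimal illustration, not an official quickstart.

```python
# Minimal sketch of a GPT-4 text request through the API (assumes waitlist access).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "user", "content": "Summarize GPT-4's new capabilities in two sentences."},
    ],
)
print(response.choices[0].message.content)
```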


Open-Sourcing OpenAI Evals Framework:

OpenAI is also open-sourcing OpenAI Evals, its framework for automated evaluation of AI model performance, so that anyone can report shortcomings in its models and help guide further improvements.
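In practice, a basic "match"-style eval in OpenAI Evals is driven by a JSONL dataset in which each line pairs chat-format input messages with an ideal answer. The sketch below writes such a file; the questions and file name are purely illustrative.

```python
# Minimal sketch of a JSONL dataset for a basic exact-match eval.
import json

samples = [
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is the capital of Latvia?"},
        ],
        "ideal": "Riga",
    },
    {
        "input": [
            {"role": "system", "content": "Answer with a single word."},
            {"role": "user", "content": "What is the chemical symbol for gold?"},
        ],
        "ideal": "Au",
    },
]

with open("samples.jsonl", "w") as f:  # illustrative file name
    for sample in samples:
        f.write(json.dumps(sample) + "\n")
```

Such a dataset is then registered with a short YAML entry in the evals repository and run from the command line with the framework's oaieval tool, which reports accuracy for each model version.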


Subtle Differences Between GPT-3.5 and GPT-4:

In a casual conversation, it may be challenging to distinguish between GPT-3.5 and GPT-4. The difference becomes evident when the task's complexity reaches a sufficient threshold. GPT-4 is more reliable, creative, and can handle much more nuanced instructions than its predecessor.


Benchmark Testing and Technical Report:

To understand the difference between the two models, OpenAI tested GPT-4 on a variety of benchmarks, including simulated exams originally designed for humans. It used the most recent publicly available tests or purchased 2022-2023 editions of practice exams, and did no specific training for these exams. While a minority of the exam problems were seen by the model during training, OpenAI believes the results are representative. For more information, check out their technical report.


GPT-4: The Future of Language Models:

GPT-4, the latest addition to the GPT family of language models, has taken the natural language processing world by storm. Developed by OpenAI, GPT-4 delivers strong performance on traditional machine learning benchmarks, surpassing existing large language models as well as most state-of-the-art models that rely on benchmark-specific training.


Multilingual Capabilities:

GPT-4's language prowess extends beyond English. On the MMLU benchmark, a suite of 14,000 multiple-choice problems spanning 57 subjects, GPT-4 outperforms GPT-3.5 and other large language models in 24 of the 26 languages tested, including low-resource languages such as Latvian, Welsh, and Swahili.


Visual Inputs:

GPT-4's capabilities also extend to vision tasks. It can accept prompts with text and images, generating natural language or code outputs over a range of domains, including documents with text and photographs, diagrams, or screenshots. Its performance on standard academic vision benchmarks is impressive, and OpenAI is continuously discovering new tasks that the model can tackle.
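Image input is not yet broadly available, but a multimodal prompt can be expressed through the same chat interface by mixing text and image parts within a single message. The sketch below follows the content-part format OpenAI later exposed for vision-capable models; the model name and its availability are assumptions rather than part of this announcement.

```python
# Speculative sketch of a mixed text-and-image prompt (assumes a vision-capable
# model is available to the caller; the model name here is an assumption).
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What is unusual about this image?"},
                {"type": "image_url", "image_url": {"url": "https://example.com/photo.jpg"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```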


Steerability:

OpenAI has been working on each aspect of defining AI behavior, including steerability. Rather than the fixed verbosity, tone, and style of the classic ChatGPT personality, developers (and soon ChatGPT users) can now prescribe their AI's style and task by describing those directions in the "system" message. This allows API users to significantly customize their users' experience within bounds. Adherence to those bounds is not yet perfect, but OpenAI is continually improving it.
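As a concrete illustration, the sketch below sets the model's persona and task in the system message before sending a user question. The Socratic-tutor framing is illustrative, and the call again assumes the v1-style openai Python client.

```python
# Minimal sketch of steering style and task via the "system" message.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {
            "role": "system",
            "content": (
                "You are a Socratic tutor. Never give the answer directly; "
                "ask guiding questions that help the student reason it out."
            ),
        },
        {"role": "user", "content": "How do I solve 3x + 5 = 20?"},
    ],
)
print(response.choices[0].message.content)
```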


Limitations:

Despite its capabilities, GPT-4 still has limitations. It can hallucinate facts and make reasoning errors, making it unreliable in high-stakes contexts. OpenAI recommends taking great care when using language model outputs, with the exact protocol (such as human review, grounding in additional context, or avoiding high-stakes uses altogether) matched to the needs of the specific use case.


OpenAI has made progress on reducing hallucinations relative to previous models, and GPT-4 scores 40% higher than GPT-3.5 on internal adversarial factuality evaluations. However, the model can still have various biases in its outputs, and it lacks knowledge of events that occurred after September 2021.


OpenAI positions GPT-4 as safer and more aligned with human values than its predecessors, but risks remain. GPT-4 can still generate harmful advice, buggy code, or inaccurate information, and while the pre-trained base model's confidence is reportedly well calibrated, the deployed model can still be confidently wrong in its predictions, not taking care to double-check work when it is likely to make a mistake. To mitigate these risks, OpenAI engaged over 50 experts to adversarially test the model and feed data back into improving its safety properties. GPT-4 also incorporates an additional safety reward signal during reinforcement learning training to reduce harmful outputs.
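OpenAI has not published the details of that reward signal, but conceptually it amounts to blending the usual helpfulness reward with a safety term that rewards correct refusals and penalizes harmful completions. The toy sketch below illustrates the idea only; every function name, input, and weight is an assumption, not OpenAI's implementation.

```python
# Toy illustration of combining a helpfulness reward with a safety reward signal
# during RLHF-style fine-tuning. All names and weights are assumptions.

def combined_reward(helpfulness_score: float,
                    unsafe_probability: float,
                    is_refusal: bool,
                    request_is_disallowed: bool,
                    safety_weight: float = 2.0) -> float:
    """Blend a helpfulness score with a safety term for one sampled completion."""
    if request_is_disallowed:
        # Reward refusing disallowed requests; penalize complying with them.
        safety_term = 1.0 if is_refusal else -unsafe_probability
    else:
        # For allowed requests, penalize needless refusals and unsafe content.
        safety_term = -1.0 if is_refusal else -unsafe_probability
    return helpfulness_score + safety_weight * safety_term

# Example: a compliant answer to a disallowed request is pushed strongly negative.
print(combined_reward(helpfulness_score=0.8, unsafe_probability=0.9,
                      is_refusal=False, request_is_disallowed=True))
```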


The training process for GPT-4 involves predicting the next word in a document using publicly available data, as well as data licensed by OpenAI. Fine-tuning the model's behavior is achieved through reinforcement learning with human feedback, although the model's capabilities are primarily derived from the pre-training process.
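At its core, that pre-training objective is ordinary next-token prediction: the model is penalized with a cross-entropy loss whenever it fails to predict the token that follows each position. The sketch below shows the objective on a toy embedding-plus-linear stand-in for a transformer, using PyTorch; GPT-4's actual architecture, data, and scale are not public.

```python
# Toy next-token prediction objective (the real model is a large transformer).
import torch
import torch.nn as nn

vocab_size, d_model = 1000, 64
embed = nn.Embedding(vocab_size, d_model)
lm_head = nn.Linear(d_model, vocab_size)

tokens = torch.randint(0, vocab_size, (2, 16))  # a toy batch of token IDs
hidden = embed(tokens)                          # stand-in for transformer hidden states
logits = lm_head(hidden)                        # scores over the vocabulary at each position

# Shift so that the prediction at position t is scored against the token at t+1.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),
    tokens[:, 1:].reshape(-1),
)
loss.backward()
print(float(loss))
```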


OpenAI has also focused on building a deep learning stack that scales predictably, using infrastructure and optimization methods that behave predictably across multiple scales. The organization has developed OpenAI Evals, a software framework that lets users create and run benchmarks for evaluating models like GPT-4 and track their performance across different versions. OpenAI has also been collaborating with external researchers to assess the potential social and economic impacts of GPT-4 and other AI systems.


While OpenAI has made significant progress in improving GPT-4's safety properties, there is still a risk of generating harmful outputs. As the "risk per token" of AI systems increases, it will become increasingly critical to achieve very high degrees of reliability in these safety interventions. OpenAI hopes to develop methods that provide better guidance about what to expect from future systems and to make this a common goal in the field.

Summary:

While GPT-4 offers new capabilities and improvements compared to its predecessors, it still poses risks and requires mitigation efforts to ensure safety and alignment with intended use. The training process involves a combination of pre-training and post-training techniques, including reinforcement learning with human feedback, to fine-tune the model's behavior. The predictability of scaling and development of better evaluation methods are crucial for ensuring safety in future AI systems. With the release of OpenAI Evals, researchers and developers can evaluate and track the performance of GPT-4 and other models more efficiently. As AI technology continues to advance, it is important to prioritize safety and alignment with human values to minimize potential harm and maximize benefits to society.