Nature Journal Highlights Challenges in Reporting Large Language Models in Research
A recent article in Nature examines the challenges of using large language models (LLMs) such as ChatGPT in scientific research. It emphasizes that these models are implemented, used, and reported inconsistently, which can lead to significant differences in research outcomes. The label 'ChatGPT' alone can refer to several underlying models, such as GPT-4 or GPT-4o, each of which exists in multiple versions. Those versions can differ significantly in performance depending on factors such as model size and access mode. The article also points out that providers of commercial models often do not disclose changes to them, which complicates reproducibility and verification. In addition, model outputs vary with specific parameter settings and prompt phrasing, and because these details often go unrecorded, results are difficult to compare across studies.
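
To make the reporting problem concrete, here is a minimal sketch of how a study might log the details that often go unrecorded: the product label, the exact model version, the access mode, the parameter settings, and the verbatim prompt. It is not taken from the Nature article. The class name `LLMRunRecord`, its field names, and every value in the example are hypothetical placeholders, and only the Python standard library is used.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field, replace


@dataclass
class LLMRunRecord:
    """Hypothetical record of one model call; all field names are illustrative."""

    product_label: str   # user-facing name, e.g. "ChatGPT"
    model_version: str   # exact version identifier as reported by the provider
    access_mode: str     # how the model was reached (placeholder values below)
    prompt: str          # verbatim prompt text, including punctuation
    parameters: dict = field(default_factory=dict)  # settings used for the call
    accessed_on: str = ""  # date of access, since providers may change models

    def to_json(self) -> str:
        # Sorted keys give a stable serialization for archiving and hashing.
        return json.dumps(asdict(self), sort_keys=True, ensure_ascii=False)

    def fingerprint(self) -> str:
        # Short hash of the full record; any change to a field changes it.
        return hashlib.sha256(self.to_json().encode("utf-8")).hexdigest()[:12]


if __name__ == "__main__":
    # All values are placeholders, not real settings from any study.
    run_a = LLMRunRecord(
        product_label="ChatGPT",
        model_version="placeholder-version-id",
        access_mode="API",
        prompt="Summarize the abstract.",
        parameters={"temperature": 0.0},
        accessed_on="2000-01-01",
    )
    # Same setup, but with a one-word change in prompt phrasing.
    run_b = replace(run_a, prompt="Summarise the abstract.")

    print(run_a.to_json())
    print(run_a.fingerprint() == run_b.fingerprint())
```

Storing such a record next to each output would let readers see whether two studies that both report using 'ChatGPT' actually used the same model version, settings, and prompt. It does not make a commercial model reproducible, since undisclosed provider-side changes remain outside the researcher's control.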