AI Time Machine Chatbot Developed Using Pre-1931 Data
A new trend in AI development is training large language models (LLMs) only on historical data. A recent example is 'talkie-1930-13b-base', a model trained exclusively on information available before 1931. Its training set consists of 260 billion tokens of pre-1931 English text, including books, newspapers, and scientific journals. The aim is to explore how an AI behaves when it is trained on data from a single historical period, free from modern influences. Two challenges are noted, however: impurities in the data, and anachronistic influences that may enter during training.
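
To make the data-impurity problem concrete, here is a minimal sketch of how a date cutoff and a simple anachronism check might be applied to a corpus. It is an illustration only, not the method used for 'talkie-1930-13b-base': the Document type, the filter_corpus and find_anachronisms helpers, and the watch list of terms are all hypothetical.

```python
import re
from dataclasses import dataclass

CUTOFF_YEAR = 1931  # keep only texts published before this year

# Hypothetical watch list: words that should not occur in genuine pre-1931 text.
ANACHRONISTIC_TERMS = frozenset({"internet", "smartphone", "transistor"})


@dataclass
class Document:
    title: str
    year: int  # publication year taken from the document's metadata
    text: str


def find_anachronisms(text):
    """Return the watch-list words found in the text, sorted alphabetically."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(words & ANACHRONISTIC_TERMS)


def filter_corpus(docs):
    """Split documents into (kept, rejected).

    A document is kept only if its metadata year is before the cutoff AND
    its text contains no watch-list word. The second check catches texts
    whose metadata is old but whose content was contaminated later, for
    example by a modern footer added during digitization.
    """
    kept, rejected = [], []
    for doc in docs:
        if doc.year < CUTOFF_YEAR and not find_anachronisms(doc.text):
            kept.append(doc)
        else:
            rejected.append(doc)
    return kept, rejected


if __name__ == "__main__":
    corpus = [
        Document("Radio notes", 1925, "The wireless set crackled all evening."),
        Document("Later essay", 1950, "The post-war economy recovered."),
        Document("Scanned novel", 1929, "Chapter One. Read more on the internet."),
    ]
    kept, rejected = filter_corpus(corpus)
    print("kept:", [d.title for d in kept])
    print("rejected:", [d.title for d in rejected])
```

A word list like this catches only the most obvious contamination. Metadata errors and subtler modern phrasing would need stronger checks, which is part of why a clean historical corpus is hard to build.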