Yes, Apple is also training on public web data

From the blog Birchtree
From Apple’s Machine Learning Research blog, Introducing Apple’s On-Device and Server Foundation Models:

We train our foundation models on licensed data, including data selected to enhance specific features, as well as publicly available data collected by our web-crawler, AppleBot. Web publishers have the option to opt