Eren Özkan

Paper Review: Apple Intelligence Foundation Models



The paper introduces the Apple Foundation Models (AFM), developed for iOS, iPadOS, and macOS. Apple's aim with these models is to serve users' everyday needs while adapting to their current activity on the device. Apple Intelligence includes features such as writing and refining text, summarization, image generation, and simplifying interactions between applications.




Architecture


The paper introduces two foundation models, AFM-on-device and AFM-server (the larger of the two). Both are built on a decoder-only transformer architecture.


The model dimensions are shown below.


Figure 1: Model dimensions

Compared with large-scale models such as GPT-3 or GPT-4, these models are compact to medium-sized. The parameter count, number of layers, and attention-head dimensions indicate that they target specific, optimized use cases. While GPT-3 and GPT-4 have many more parameters, the AFM models can offer advantages in computational efficiency and resource usage for such focused tasks.
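
As a rough illustration of how dimensions like these translate into parameter counts, the sketch below uses a common approximation for decoder-only transformers (about 12 · n_layers · d_model² weights in the attention and feed-forward blocks, plus the embedding table). The dimension values are placeholders rather than the actual AFM figures from Figure 1, and the formula ignores details such as grouped-query attention or gated feed-forward layers.

```python
# Rough parameter-count estimate for a generic decoder-only transformer.
# The dimensions below are illustrative placeholders, NOT the AFM values
# from Figure 1, and the formula ignores architecture-specific details
# (grouped-query attention, gated activations, biases, norms).

def estimate_params(n_layers: int, d_model: int, vocab_size: int) -> int:
    attention = 4 * d_model * d_model       # Q, K, V and output projections
    feed_forward = 8 * d_model * d_model    # two matrices with a 4x expansion
    embeddings = vocab_size * d_model       # token embedding table
    return n_layers * (attention + feed_forward) + embeddings

# Example with placeholder dimensions in the range of a compact on-device model
print(f"{estimate_params(n_layers=24, d_model=3072, vocab_size=49_000):,}")
```

Even with this crude estimate, the gap between an on-device-scale model and models with hundreds of billions of parameters is clear.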


Dataset

Apple states that its dataset follows a hybrid strategy combining human-annotated and synthetic data. The paper emphasizes multiple times that this data does not include any personal information from Apple users.

While preparing the dataset, Apple crawls publicly available web data with Applebot and allows web publishers to opt out of Applebot crawling via standard robots.txt directives.
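
As a small illustration of that opt-out mechanism, the sketch below parses a hypothetical robots.txt with Python's standard urllib.robotparser and checks whether Applebot may fetch a page. The site and rules are invented for the example; only the "Applebot" user-agent token comes from Apple's crawler documentation.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example site that opts out of Applebot.
# The domain and paths are invented for illustration.
robots_txt = """
User-agent: Applebot
Disallow: /
""".strip().splitlines()

parser = RobotFileParser()
parser.parse(robots_txt)

# A crawler respecting robots.txt would skip this page.
print(parser.can_fetch("Applebot", "https://example.com/articles/post-1"))  # False
```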


Tokenizer

Apple uses a byte-pair encoding (BPE) tokenizer for these models. BPE is also used by models such as GPT (BERT uses the related WordPiece algorithm). BPE can represent a wide variety of words efficiently by splitting each word into smaller subword units, rather than representing each word as a single token.


Example:

Word: "unhappiness"

  1. Split into characters: ['u', 'n', 'h', 'a', 'p', 'p', 'i', 'n', 'e', 's', 's']

  2. The most frequent adjacent pairs are found and merged, one pair at a time:


  • Most frequent: 'pp' -> ['u', 'n', 'h', 'a', 'pp', 'i', 'n', 'e', 's', 's']

  • Then: 'ss' -> ['u', 'n', 'h', 'a', 'pp', 'i', 'n', 'e', 'ss']

  • Then: 'in' -> ['u', 'n', 'h', 'a', 'pp', 'in', 'e', 'ss']

  • Then: 'ine' ('in' + 'e') -> ['u', 'n', 'h', 'a', 'pp', 'ine', 'ss']


As a result, the word "unhappiness" is tokenized into subunits like ['u', 'n', 'h', 'a', 'pp', 'ine', 'ss'] by the model.
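
To make the merging step concrete, here is a minimal Python sketch that applies a hand-picked list of pair merges to a word. In a real BPE tokenizer the merge rules are learned from corpus pair frequencies and applied in that learned order; the merges below are chosen only to reproduce the example above.

```python
# Minimal sketch of applying BPE merges to one word.
# The merge list is hand-picked to reproduce the "unhappiness" example;
# a real tokenizer learns these pairs and their order from corpus statistics.

def apply_bpe(word: str, merges: list[tuple[str, str]]) -> list[str]:
    tokens = list(word)  # start from individual characters
    for left, right in merges:
        merged, i = [], 0
        while i < len(tokens):
            # Merge the adjacent pair (left, right) wherever it occurs.
            if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (left, right):
                merged.append(left + right)
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

merges = [("p", "p"), ("s", "s"), ("i", "n"), ("in", "e")]
print(apply_bpe("unhappiness", merges))
# ['u', 'n', 'h', 'a', 'pp', 'ine', 'ss']
```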


Results

According to the results published in the article, the AFM models appear to be superior to all other models except LLaMA-3 and GPT-4.



Figure 2: Human evaluation

Both models seem to be quite successful in terms of general capabilities.

Figure 3: Comparing capabilities

According to human evaluations, both models also outperform the compared models on safety.


Figure 4: Human preference evaluation on safety prompts


Conclusion

The paper introduces the foundation models behind Apple Intelligence, developed for Apple's own devices. Based on the reported evaluations, the models compare favorably with other models of similar scale.


You can read the full paper via the reference below.



Apple. "Apple Intelligence Foundation Language Models." arXiv, 29 Jul. 2024, arxiv.org/abs/2407.21075.

