From WWDC 2024 and Apple Intelligence to the Open-Source Model Race: Uncovering the Story Behind Apple's Lightweight On-Device GenAI Language Model
CoreNet is the keyword.
At WWDC24, Apple revealed Apple Intelligence, a new personal intelligence system integrated into iPhone, iPad and Mac. It combines generative AI models with the user's personal context to deliver useful, relevant intelligence while protecting privacy through on-device processing.
The Apple Intelligence page provides more details on features like:
Intelligent writing assistance nearly everywhere you type
Generating original images from descriptions
Creating personalized Memoji (Genmoji)
Sketch-based image generation in Notes
Siri enhancements including typing, product knowledge, and cross-app actions
Strong privacy through on-device processing and Private Cloud Compute
However, what I find most impressive is the "open-source, on-device generative AI". On-device AI models, particularly generative AI (GenAI) models, are attracting significant attention and show real potential for addressing data-privacy concerns. One of the most pressing issues with third-party GenAI APIs is the risk of personal data leakage. By processing data locally on the user's device, on-device GenAI eliminates the need to send sensitive information to external servers, reducing the chances of unauthorized access and ensuring better data protection.
And what’s the framework behind this ground-breaking innovation from Apple?
This remarkable achievement from Apple is the culmination of years of cutting-edge research and development by Apple's AI teams. Central to this effort is CoreNet, Apple's open-source deep learning library that has been quietly evolving to enable the efficient training of foundational models like large language models (LLMs).
You can find the CoreNet library, with all training scripts and experiments, here
Most notably, CoreNet powers OpenELM, the latest open-source language model family at the heart of Apple's on-device GenAI capabilities. By leveraging novel techniques like layer-wise scaling to efficiently allocate parameters, OpenELM achieves industry-leading performance while remaining lightweight enough to run on consumer devices. The release of OpenELM, complete with code, pre-trained weights, and training recipes, marks a significant shift in Apple's AI strategy towards greater transparency and open research.
Now let’s take a deep dive into what’s inside OpenELM
OpenELM is a state-of-the-art open language model that efficiently allocates parameters within each transformer layer to enhance accuracy. Apple also provides the complete training and evaluation framework, including code, model weights, training logs, and configurations, to empower open research (which is super cool).
Let’s first look at their data:
Publicly available datasets (totaling ~1.8 trillion tokens).
Datasets include:
RefinedWeb (665B tokens)
Subsets of RedPajama, such as GitHub, Books, ArXiv, Wikipedia, and StackExchange (361B tokens total)
Deduplicated PILE (207B tokens)
Subsets of Dolma v1.6 like The Stack, Reddit, Wikipedia, etc. (580B tokens total)
Applied on-the-fly tokenization and data filtering during pre-training.
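To make "on-the-fly" concrete, here is a minimal sketch of a data loader that filters and tokenizes documents lazily during training instead of materializing a tokenized corpus on disk. This is my own illustration, not CoreNet's actual pipeline; the length threshold and the GPT-2 tokenizer stand-in are assumptions:

```python
from typing import Iterable, Iterator
from transformers import AutoTokenizer  # assumption: HF tokenizer as a stand-in

# Hypothetical filter threshold, not CoreNet's actual value.
MIN_CHARS = 200


def stream_tokenized(docs: Iterable[str], tokenizer, max_len: int = 2048) -> Iterator[list[int]]:
    """Lazily filter and tokenize raw documents during pre-training."""
    for text in docs:
        # On-the-fly filtering: drop documents that are too short.
        if len(text.strip()) < MIN_CHARS:
            continue
        # On-the-fly tokenization: nothing pre-tokenized is stored on disk.
        yield tokenizer(text, truncation=True, max_length=max_len)["input_ids"]


if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained("gpt2")  # illustrative tokenizer choice
    corpus = ["too short", "a longer pre-training document ... " * 50]
    for token_ids in stream_tokenized(corpus, tok):
        print(len(token_ids))
```

The payoff of this pattern is flexibility: you can change the tokenizer or the filters without re-processing 1.8 trillion tokens of raw data.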
Then, the choice of their model’s architecture and methods:
Decoder-only transformer architecture (like other GPT-style generative models)
Key architectural choices: no learnable biases, pre-normalization, rotary positional embeddings, grouped-query attention instead of multi-head attention, SwiGLU FFN instead of a vanilla FFN, and flash attention
Layer-wise scaling to efficiently allocate parameters non-uniformly across transformer layers (also referred to as block-wise scaling)
Scales the number of attention heads and the FFN dimension in each layer based on its depth
For a model with N transformer layers, layer i (0 ≤ i < N) uses the linearly interpolated scaling factors alpha_i = alpha_min + (alpha_max − alpha_min) · i / (N − 1) and beta_i = beta_min + (beta_max − beta_min) · i / (N − 1), in which alpha_min and alpha_max are the hyper-parameters that scale the number of attention heads, and beta_min and beta_max are the hyper-parameters that adjust the width of the FFN layers.
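Here is a minimal Python sketch of how such layer-wise scaling can be computed. The dimensions and hyper-parameter values below are made up for illustration; the real configurations live in the CoreNet repo and the OpenELM paper:

```python
def layer_wise_scaling(
    num_layers: int,
    d_model: int,
    d_head: int,
    alpha_min: float, alpha_max: float,  # scale the number of attention heads
    beta_min: float, beta_max: float,    # scale the FFN width
) -> list[dict]:
    """Linearly interpolate per-layer attention heads and FFN widths by depth."""
    configs = []
    for i in range(num_layers):
        t = i / (num_layers - 1)  # 0 at the first layer, 1 at the last
        alpha = alpha_min + (alpha_max - alpha_min) * t
        beta = beta_min + (beta_max - beta_min) * t
        configs.append({
            "layer": i,
            "num_heads": max(1, round(alpha * d_model / d_head)),
            "ffn_dim": round(beta * d_model),
        })
    return configs


# Illustrative values only, not OpenELM's actual hyper-parameters.
for cfg in layer_wise_scaling(num_layers=8, d_model=1024, d_head=64,
                              alpha_min=0.5, alpha_max=1.0,
                              beta_min=0.5, beta_max=4.0):
    print(cfg)
```

The effect: earlier layers stay narrow and cheap while later layers get more attention heads and wider FFNs, so a fixed parameter budget is spent where it contributes most to accuracy.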
Experiments, results & takeaways
The authors trained and evaluated OpenELM at four sizes: 270M, 450M, 1.1B, and 3B parameters.
Evaluated across standard zero-shot tasks, OpenLLM leaderboard tasks, and LLM360 leaderboard tasks (see the evaluation sketch after this list).
The 1.1B OpenELM outperforms comparable open models like the 1.2B OLMo by 2.36% in average accuracy while using 2× fewer pre-training tokens.
The largest 3B OpenELM also achieves strong results, e.g., 67.39% average accuracy on the standard zero-shot tasks.
Performance improves with model scale and training duration on most tasks.
Apple released the pre-trained checkpoints and the complete code and framework needed to reproduce these results.
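If you want to reproduce this kind of evaluation yourself, a common route is EleutherAI's lm-evaluation-harness. The snippet below is a hypothetical sketch assuming its v0.4-style Python API; the model id, the tokenizer pairing, and the task list are my assumptions, not the paper's exact setup:

```python
# Hypothetical evaluation sketch with EleutherAI's lm-evaluation-harness
# (v0.4-style API). Task list and model args are illustrative, not the
# exact configuration from the OpenELM paper.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    # Per the model card, OpenELM pairs with the (gated) Llama 2 tokenizer.
    model_args=(
        "pretrained=apple/OpenELM-270M,"
        "trust_remote_code=True,"
        "tokenizer=meta-llama/Llama-2-7b-hf"
    ),
    tasks=["arc_easy", "hellaswag", "piqa"],  # a few standard zero-shot tasks
    batch_size=8,
)
print(results["results"])
```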
The next steps for OpenELM include further improving model efficiency and accuracy, expanding the open resources released to the community, and conducting more comprehensive evaluations across a wider range of natural language tasks. Key areas of focus for future research are architectural enhancements, compute optimizations, and studying approaches to enable safe and responsible deployment of open language models like OpenELM.
The model weights are available here on Hugging Face.
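To try the weights yourself, here is a minimal generation sketch using the transformers library. OpenELM ships custom modeling code (hence trust_remote_code=True), and the model card pairs it with the Llama 2 tokenizer, which lives in a gated repo you must request access to; treat the exact ids below as assumptions to verify against the model card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# OpenELM ships custom modeling code, so trust_remote_code=True is required.
model = AutoModelForCausalLM.from_pretrained(
    "apple/OpenELM-270M", trust_remote_code=True
)

# Per the model card, OpenELM uses the Llama 2 tokenizer (gated repo).
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```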
Like this post? Subscribe and share it with your friends!