
Apple releases eight small AI language models aimed at on-device use.

OpenELM mirrors efforts by Microsoft to make useful small AI language models that run locally.

In the AI world, what might be called “small language models” have recently been gaining popularity because they can run on a local device instead of requiring data center-grade computers in the cloud. On Wednesday, Apple introduced a set of small source-available AI language models called OpenELM that are compact enough to run directly on a smartphone. They’re mainly proof-of-concept research models for now, but they could form the basis of Apple’s upcoming on-device AI offerings.

Apple’s new AI models, collectively named OpenELM for “Open-source Efficient Language Models,” are now available on Hugging Face under an Apple Sample Code License. Since the license carries some restrictions, it may not fit the commonly accepted definition of “open source,” but the source code for OpenELM is available.

On Tuesday, we covered Microsoft’s Phi-3 models, which aim to achieve something similar: a useful level of language understanding and processing performance in small AI models that can run locally. Phi-3-mini features 3.8 billion parameters, but some of Apple’s OpenELM models are much smaller, ranging from 270 million to 3 billion parameters across eight distinct models.

In comparison, the largest model yet released in Meta’s Llama 3 family includes 70 billion parameters (with a 400 billion version on the way), and OpenAI’s GPT-3 from 2020 shipped with 175 billion parameters. While parameter count serves as a rough measure of AI model capability and complexity, recent research has focused on making smaller AI language models as capable as larger ones were a few years ago.

The eight OpenELM models come in two flavors: four as “pretrained” (basically a raw, next-token prediction version of the model) and four as instruction-tuned (fine-tuned for instruction following, which is better suited to developing AI assistants and chatbots); a loading sketch follows the list:

  • OpenELM-270M
  • OpenELM-450M
  • OpenELM-1_1B
  • OpenELM-3B
  • OpenELM-270M-Instruct
  • OpenELM-450M-Instruct
  • OpenELM-1_1B-Instruct
  • OpenELM-3B-Instruct
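All eight checkpoints are hosted under Apple’s namespace on Hugging Face. Below is a minimal sketch of loading one of the instruction-tuned variants with the transformers library; the generation settings are illustrative, and the pairing with a Llama 2 tokenizer follows the model cards (that repository is gated), so treat the details as assumptions to verify against Apple’s own example scripts rather than an official recipe.

```python
# Minimal sketch (not Apple's official example): load an instruction-tuned
# OpenELM checkpoint from Hugging Face and generate a short completion.
# Assumes transformers and torch are installed; the OpenELM repos ship custom
# modeling code, so trust_remote_code=True is required. The tokenizer ID below
# follows the model cards (Llama 2's tokenizer, gated on Hugging Face).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-270M-Instruct"      # smallest instruction-tuned variant
tokenizer_id = "meta-llama/Llama-2-7b-hf"     # tokenizer referenced in the model cards

tokenizer = AutoTokenizer.from_pretrained(tokenizer_id)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

prompt = "Explain in one sentence why small language models matter for phones."
inputs = tokenizer(prompt, return_tensors="pt")

# Short greedy generation; a real assistant would tune sampling and prompting.
output_ids = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```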

OpenELM features a 2048-token maximum context window. The models were trained on publicly available datasets: RefinedWeb, a version of the PILE with duplications removed, a subset of RedPajama, and a subset of Dolma v1.6, which Apple says total around 1.8 trillion tokens of data. Tokens are fragmented representations of data that AI language models use for processing.
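To make the token arithmetic concrete, here is a small illustration of how a fixed 2048-token window constrains input. It uses the GPT-2 tokenizer purely as a stand-in (OpenELM’s own Llama-style tokenizer splits text differently), so the counts are illustrative rather than exact.

```python
# Illustration only: how a fixed 2048-token context window limits input length.
# The GPT-2 tokenizer is a convenient stand-in; OpenELM's tokenizer would
# produce different token counts for the same text.
from transformers import AutoTokenizer

MAX_CONTEXT = 2048                                  # OpenELM's maximum context window
tokenizer = AutoTokenizer.from_pretrained("gpt2")

text = "Small language models trade raw capability for on-device efficiency. " * 300
token_ids = tokenizer(text)["input_ids"]
print(f"{len(token_ids)} tokens before truncation")

# Anything beyond the window must be dropped or summarized before inference.
truncated = token_ids[:MAX_CONTEXT]
print(f"{len(truncated)} tokens fit within the context window")
```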

Apple claims that its approach to OpenELM includes a “layer-wise scaling strategy,” which reportedly allocates parameters more efficiently across each layer, saving computational resources while improving the model’s performance when trained on fewer tokens. According to Apple’s white paper, this strategy lets OpenELM achieve a 2.36 percent improvement in accuracy over Allen AI’s OLMo 1B while requiring half as many pre-training tokens.
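Layer-wise scaling can be pictured as interpolating each layer’s width between a small value near the input and a larger one near the output, rather than keeping every layer identical. The sketch below is a toy version of that idea with made-up numbers; it is not Apple’s published configuration.

```python
# Toy sketch of the layer-wise scaling idea: rather than giving every
# transformer layer the same width, scale the attention-head count and the
# feed-forward multiplier linearly from the first layer to the last.
# All numbers here are illustrative, not Apple's actual hyperparameters.

def layer_wise_scaling(num_layers: int,
                       min_heads: int, max_heads: int,
                       min_ffn_mult: float, max_ffn_mult: float):
    """Return (heads, ffn_multiplier) per layer, interpolated linearly."""
    configs = []
    for i in range(num_layers):
        t = i / max(num_layers - 1, 1)   # 0.0 at the first layer, 1.0 at the last
        heads = round(min_heads + t * (max_heads - min_heads))
        ffn_mult = min_ffn_mult + t * (max_ffn_mult - min_ffn_mult)
        configs.append((heads, ffn_mult))
    return configs

# Example: a 12-layer toy model whose later layers get more heads and wider
# FFNs, so parameters concentrate where they (reportedly) help most.
for layer, (heads, ffn_mult) in enumerate(layer_wise_scaling(12, 4, 12, 2.0, 4.0)):
    print(f"layer {layer:2d}: {heads} heads, FFN multiplier {ffn_mult:.2f}")
```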

A table comparing OpenELM with other small AI language models in a similar class, taken from Apple’s OpenELM research paper. Credit: Apple

Apple also made available the code for CoreNet, the library it used to train OpenELM, as well as reproducible training recipes that allow the weights (neural network files) to be replicated, which is so far unusual for a major tech company. Transparency is a key goal for the company, according to Apple’s OpenELM paper abstract: “The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks.”

By releasing the source code, model weights, and training materials, Apple says it aims to “empower and enrich the open research community.” However, it also cautions that since the models were trained on publicly sourced datasets, “there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts.”

The upcoming iOS 18 update, which is scheduled to be unveiled in June at WWDC, is rumored to include new AI features that use on-device processing to ensure user privacy. However, Apple may choose to work with Google or OpenAI to handle more complex, off-device AI processing to give Siri a long-overdue boost.
