
### “The Myth of ‘Open AI’: Debunking the Illusion”

Transparency isn’t enough to democratize the technology

A modest yet burgeoning technological movement posed a significant challenge to Microsoft at the start of the 21st century, when the internet was gaining momentum and the company was at its peak. The issue was a rival operating system, Linux, and the ethos of open-source technology it embodied: in contrast to expensive, specialized software such as Microsoft Windows and Office, Linux was free to use and modify. During that period, Microsoft’s CEO, Steve Ballmer, famously likened Linux to “a cancer that attaches itself in an intellectual property sense to everything it touches.”

Projects like Mozilla Firefox, the Android operating system, and Wikipedia exemplify the ethos of “open” initiatives, but the tech industry successfully transformed this democratic concept into a profitable venture. Eventually, open-source software became intricately woven into the fabric of the internet. Major corporations worth billions leverage free open-source technology to develop or enhance their proprietary products. Much of the internet’s foundation, including Big Tech platforms, devices, and the data servers that keep the world connected, relies heavily on open-source components, yet it draws users toward the industry’s most dominant players. It’s almost a given that running applications or hosting websites requires processing capability from cloud services operated by tech giants like Microsoft, Google, or Amazon.

The emerging landscape of generative artificial intelligence (AI) is facing a similar conundrum. As more individuals rely on AI solutions provided by major corporations, there is a diminishing understanding of, and control over, the technology’s inner workings. In response, a growing coalition of academics and businesses is advocating for open AI, distinct from OpenAI, the famously secretive company behind ChatGPT. The aim is to democratize a highly centralized technology that holds the potential to revolutionize various aspects of life, including work, politics, leisure, and even religion. The objective is to develop transparent models that the general populace can readily use, study, or replicate at an accessible cost. However, this movement risks being co-opted by Big Tech, much as the open-source revolution that preceded it was.

The epitome of this tension can be observed in Llama 2, the most renowned and contentious AI system developed by Meta, the corporate behemoth encompassing Facebook, Instagram, WhatsApp, and Threads. Llama 2, a large language model released recently, is available for both research and commercial purposes, albeit less potent than the models underlying ChatGPT and Google’s Bard. Although the model’s code is available for download, Meta imposes restrictions on its use. Developers must obtain explicit authorization from Meta to integrate Llama 2 into products with more than 700 million monthly active users, a threshold that could prevent platforms such as TikTok from adopting the technology, and the license limits using Llama 2 to improve other language models. Moreover, a significant portion of Llama’s development pipeline remains undisclosed; details of the model’s training data, in particular, are known only within Meta’s confines. Critics and independent programmers argue that this does not amount to true openness.

Nevertheless, startups, institutions, and organizations can download and deploy Llama 2 for a variety of purposes. They can partially scrutinize the origins of Llama 2’s capabilities and constraints, facilitating its integration into products—a feat more challenging with “closed” technologies like ChatGPT and Bard. Meta’s lenient policy aims to enable the AI community to advance AI applications securely and reliably, as per an official statement from the company.

Despite the limitations on usage, true openness of a generative-AI model requires more than just the final product. Understanding, replicating, or modifying AI demands access to training data, processing code, fine-tuning methodologies, and other crucial elements. Unlike traditional open-source software, which can be easily disseminated in a single .zip file, AI poses greater challenges in containment and accessibility. Udbhav Tiwari, the global product policy lead at Mozilla, highlighted that many projects labeled as “open” AI are not genuinely open-source at all. Global initiatives are underway to redefine the concept of “open-source” for AI. Some critics perceive these ostensibly accessible releases as instances of “open washing,” in which companies garner acclaim and research accolades without providing the information necessary for comprehensive study, replication, or competition with their models.

There are more openly available models that offer extensive insight into training methodologies and impose fewer usage constraints, but these are typically released by nonprofits and small startups. Even they struggle with the immense complexity and resource demands of generative AI. If traditional open-source software is a bicycle, simple to understand and modify, generative AI is a Tesla: repairing, let alone manufacturing, an advanced vehicle from its engineering blueprints is beyond the reach of most individuals. Similarly, when querying models like ChatGPT or Bard, the screen displays results powered by hundreds of millions of dollars’ worth of computing resources, not to mention investments in salaries, hardware, and ancillary expenses. Only tech giants and affiliated startups can afford such resources, limiting access to a select few.

Moreover, sustaining these models for a broad user base is cost-intensive. Nur Ahmed, an AI market researcher at the MIT Sloan School of Management, noted that universities, organizations, and startups generally lack the resources to develop such models independently. And as concerns mount that startups cannot compete with tech giants, venture capital for independent AI initiatives is becoming harder to secure.

Mohamed Abdalla, a systems scientist at the University of Toronto specializing in Big Tech’s influence on AI, contended, “You’re open-sourcing the information, the weights, or the code in some mixture. Not the data, not the infrastructure.” Outside the major corporations, few groups have the computational power and human expertise necessary to emerge as even minor competitors or to significantly shape the trajectory of AI development. Yet auditing “open” models also demands substantial resources; for instance, identifying images of child exploitation in the largest open-source image dataset used to train generative AI took nearly two years. Sarah Myers West, the managing director of the AI Now Institute, emphasized the stark gap between how much open-source initiatives are assumed to simplify access to AI and how far beyond most people’s reach the industry has actually moved.

Several initiatives are striving to shift AI capabilities away from tech giants and toward the broader public. A technology hub dedicated to advanced AI research has been established in Boston, and the federal government is laying the groundwork for a National AI Research Resource. Still, Yannis Paschalidis, a computer scientist at Boston University involved in the computing center, cautioned that, for now, “I don’t believe I can train a larger, more specialized model or develop the next iteration of ChatGPT with trillions of parameters.”

Researchers are also crafting smaller, open models that are more cost-effective to train and operate yet potent enough for various industrial applications. For instance, a group of researchers aiming to provide an open alternative to OpenAI’s closed GPT-3 founded the nonprofit research lab EleutherAI, which releases open-source AI models. Stella Biderman, the executive director of EleutherAI, explained, “We aimed to train models like this, gain a deeper understanding of their functioning, and make scaled-down versions accessible to the public.” Even so, many engineers, startups, nonprofits, and universities struggle to develop even smaller models without substantial grant funding, or else resort to building on models from wealthy corporations.

Nevertheless, tools that purportedly bolster the open-source arena can also benefit tech giants; for instance, Google and Meta have developed and maintain widely used, free machine-learning libraries. Meta CEO Mark Zuckerberg underscored the significance of providing such tools during an earnings call, stating that it has been crucial for enabling top developers across the industry to use the same tools Meta relies on internally. When AI projects are built with Meta’s tools, they are easily marketable and have the potential to draw users into Meta’s product ecosystem. When questioned about the financial incentives behind open AI libraries, a Meta spokesperson emphasized the company’s commitment to strategies that benefit both Meta and a vibrant AI ecosystem. By championing a particular form of “open” AI development, as certain tech executives have, companies may also be aiming to head off potential regulatory constraints. Nonetheless, due to resource constraints, such open projects are unlikely to pose a significant threat to leading AI enterprises.

The dominance of Silicon Valley in attracting talent and producing cutting-edge AI products directs research and innovation towards software architectures, applications, and tasks deemed most beneficial by these tech giants. This trend ultimately shapes the trajectory of AI research, as highlighted by Ahmed.

The technology sector presently thrives on scale, with large models running on corporate data servers in pursuit of marginal gains on specific benchmarks. An analysis of significant AI research papers from recent years revealed a predominant focus on functionality and innovation, while values such as “respect for individuals” and “justice” were notably absent. These scholarly works, in turn, steer which AI applications are incorporated into products and services. Abeba Birhane, an AI researcher at Mozilla and co-author of the review, noted that “the downstream repercussions could involve job denials or restricted housing opportunities for individuals.”

The tech industry has effectively set public expectations for the technology with generative AI: if ChatGPT epitomizes how language models should work, any deviation is deemed inadequate. However, constraining the development and application of generative AI in this way could prove limiting. Just as car buyers weigh factors beyond horsepower, such as design, mileage, safety features, and entertainment systems, users may prioritize equity and transparency in a chatbot over sheer performance. To truly harness the potential of open AI, there is a need not only to redefine open-source but also to reimagine what AI is and what it could look like.

Last modified: February 29, 2024