Written by 8:53 am AI, AI designs, AI Trend, Big Tech companies, Discussions, Giskard, Latest news

### Testing AI Models for Production with an Open-Source Framework

Giskard is a French startup working on an open-source testing framework for large language models.

A European startup, Giskard, is in the process of developing an open-source testing framework for large language models. This framework aims to alert developers to potential biases, security vulnerabilities, and the model’s capability to generate harmful or unsafe content. Despite the current hype surrounding AI technologies, the focus is shifting towards machine learning testing methodologies, especially with impending legislation such as the AI Act in the European Union and similar regulations in other jurisdictions. To avoid substantial penalties, companies engaged in developing artificial intelligence models must demonstrate compliance with regulations and mitigate associated risks.

Giskard, led by co-founder and CEO Alex Combessie, stands out as an early proponent of a testing tool that prioritizes effectiveness. Combessie, who previously worked at Dataiku concentrating on NLP integration, recognized the challenges in comparing supplier performance and the limitations in practical applications. The Giskard assessment framework comprises three main components. Firstly, the company offers an open-source Python library designed for retrieval-augmented generation (RAG) and large language model (LLM) projects, compatible with various machine learning platforms such as Hugging Face, MLFlow, Weights & Biases, PyTorch, Tensorflow, and Langchain. This library has gained popularity on platforms like GitHub.

Giskard facilitates the creation of a comprehensive test suite tailored to the model’s specific requirements post-initial setup. These tests cover a wide range of issues including performance evaluation, hallucinations, misinformation, biased outputs, data leaks, generation of harmful content, and rapid injections. Combessie emphasizes the increasing importance of social considerations and regulatory compliance alongside performance metrics for data scientists.

Developers can seamlessly integrate these tests into their continuous delivery and integration (CI/CD) pipeline to automatically run whenever a new code version is deployed. The tests are customized to simulate real-world scenarios, ensuring relevance and accuracy. Giskard collaborates with businesses to access relevant data repositories to enhance the testing process.

Giskard’s upcoming AI Quality Hub promises to facilitate comparative analysis of large language models, aiding organizations in ensuring regulatory compliance. The company’s future plans include the introduction of “LLMon,” a real-time monitoring tool to evaluate LLM responses for common issues like toxicity, hallucination, and fact-checking. Giskard is actively engaging with companies utilizing OpenAI’s APIs and LLMs, while also exploring partnerships with other industry players like Hugging Face and Anthropic.

In terms of regulatory compliance and use cases, Giskard is well-positioned to assist developers in navigating the complexities of controlling artificial designs, particularly those enriched with external data. As the company expands its team to meet the growing demand for machine learning models, it aims to establish itself as a leading provider in the market.

Visited 7 times, 1 visit(s) today
Last modified: November 14, 2023
Close Search Window
Close