QK Helps Leading Indian Insurer Evaluate its Gen AI-powered Chatbot

Overview

QK, a global leader in AI-powered reliability and quality engineering, collaborated with a leading Indian insurer to optimize the performance of its customer-facing AI-powered chatbot. The chatbot, powered by generative AI complemented with Retrieval-Augmented generation (RAG), aimed to enhance customer experience by providing instant, accurate, and personalized responses to inquiries related to our client’s insurance products and services.

0 +

Digital
modernization
projects delivered

0 %

Reduction in
regression testing
times

0 %

Reduction in
Quality Engineering
costs

Zero

Critical
production
defects

Client Overview

A leading Indian life insurance provider offering term and whole life protection, savings plans for education/marriage goals, retirement income products, and health coverage. The company drives digital transformation through AI and machine learning, deploying chatbots for real-time customer support, automated claims processing, and personalized policy recommendations. Its multi-channel network includes agent partnerships, online platforms, and bancassurance collaborations, supported by regional-language tools and mobile apps to ensure accessibility across urban and rural markets. Focused on innovation, it prioritizes inclusive growth and operational agility.

Business Objectives & Challenges

To improve customer experience and personalization for their product, our client designed an AI-powered chatbot harnessing generative AI and RAG to instantly and accurately answer customer queries about its range of products. Our client wanted to failproof the effectiveness and accuracy of the AI-powered chatbot while identifying areas of improvement to ensure it delivers value to their end users and augments their digital-first business vision.

Complex Functional Requirements:

The app needed to support symptom tracking, medication adherence, trigger management, and integration with wearable devices, making functional testing critical.

User Experience Optimization:

Many patients with chronic conditions are not highly tech-savvy, so the app needed to be intuitive and easy to navigate for users of all ages

Performance and Compatibility:

The app had to perform seamlessly across different devices (smartphones, tablets) and operating systems (Android, iOS), even under varying conditions such as low-battery or offline use

Data Security and Compliance:

Given the sensitive nature of healthcare data, the app needed to comply with HIPAA and other relevant regulations to ensure patient privacy and data security.

Real-World Usability:

The app had to be tested with real patients to ensure it met their needs and provided tangible benefits in managing their conditions.

Goals

Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry’s standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book.

Accelerate
time to market

Improve cost
efficiencies

Improve ROI for
investments made
in innovation

Access
scalability

The QK Strategy: AI-Augmented QE Framework for Chatbot Testing

After thoroughly assessing our client’s requirements and building on our in-house quality engineering of AI framework, we developed a comprehensive plan to assess the chatbot across six key performance dimensions, then deploy a two-phase testing approach to ensure peak performance, accuracy, and continuous improvement.

Chatbot Testing Performance Dimensions

QK’s chatbot testing framework defines six key areas to assess the quality engineering of AI-powered chatbot performance:

Query Understanding

This facet of quality engineering of AI deals with assessing the chatbot’s ability to accurately interpret and classify user queries, forming the foundation for delivering accurate and personalized responses.

Information Retrieval

Harnessing RAG, the effectiveness of the AI chatbot’s responses depended on its ability to effectively retrieve contextually relevant information from its knowledge base.

Response Generation

The final response generated through the interaction of RAG and LLM models required thorough evaluation to assess its accuracy, quality, and relevance to user queries.

Conversation Flow

Along with its responses, the AI chatbot was expected to maintain a natural and engaging flow, aligned with the brand’s customer-centric tone and style of communication.

Error Handling

To ensure optimal performance, the AI chatbot needed to be tested for managing exceptional cases, involving unexpected inputs, ambiguous queries, and out-of-scope requests.

Integration with Insurer’s Information Database

The AI chatbot’s success depended on its effective integration with the insurer database to pull and display correct data for generating relevant and accurate responses to user queries.

Two-phase Testing Approach

QK tested the functional performance of the Generative AI-powered chatbot using a two-phase testing approach including test data generation and test evaluation.

Phase 1: Test Data Generation

The test data generation phase was divided into 3 different stages:

Data Asset Creation

This step focused on creating a comprehensive dataset to test the Generative AI-powered chatbot. To accomplish this, we created a test dataset that comprised documents with annotated data points on questions, their corresponding expected answers, and validation data created by an AI data specialist. To ensure comprehensive test coverage, we used appropriate LLM models to generate multiple variations of the initial questions.

The human-preferred validation dataset, designed to provide the guardrails for the chatbot to refine its responses, was structured into three distinct classifications:

Positive examples showcasing questions with relevant answers in the documents
Negative examples showcasing questions with no corresponding answers in the documents
Edge cases tested the chatbot’s ability to handle ambiguous, open-ended, or reasoning-based queries.

Context Retrieval & Response Generation

The generated test data (user queries) was leveraged by an AI engineer to interact with the chatbot. For each user query, the AI engineer performed the following activities:

Triggered the chatbot to retrieve relevant document chunks (context) from the provided documents
Captured the retrieved context for later analysis
Instructed the chatbot to generate a response to the query

Test Data Preparation

Conducting the activities to test the chatbot for its retrieval and response efficiencies, the AI engineer complied the test data that included:

Original query
Retrieved contexts
Generated responses
Expected answers

Phase 2: Evaluation

The evaluation phase of the AI chatbot testing involved assessing the generated responses using intelligent automation in conjunction with manual verification to ensure scalable, accelerated, and accurate quality engineering.

AI-driven Evaluation

The first step in this phase used an AI model evaluation platform to calculate the following metrics for each validation set:

Context Relevance: The metric assessed whether the retrieved document chunks (contexts) were relevant to the query and contained information to answer the question. Using the platform each output was labelled as relevant or irrelevant.
Groundedness/Hallucination: This metric determined if the generated response was based on factual information from retrieved contexts or fabricated information created due to AI hallucination. The platform marked each output as factual or hallucinated in this context.
Answer Correctness: The metric evaluated whether the generated response matched the expected answer and labeled each output as correct or incorrect.

Manual Validation

The AI platform evaluation was followed by manual validation from our data specialist who reviewed the results for each test case. This process involved verifying the accuracy of the AI-generated evaluation labels, ensuring the reliability of the evaluation process.

Results

At this step of the evaluation, our data specialist and AI engineer came together to analyze the compiled evaluation results to identify patterns and trends in the chatbot’s performance across the various test cases.
The key areas of the analysis focused on benchmarking the AI performance based on:

The effectiveness of context retrieval in finding relevant information.
The prevalence of factual vs. hallucinatory responses.
The accuracy of the generated responses compared to expected answers.
The time taken to generate the response.

Our Impact

We conducted 424 tests blending positive, negative, and edge-case user queries to comprehensively evaluate the chatbot’s performance, effectiveness, and accuracy.

Combining the analysis results, we created a comprehensive test report summarizing project findings that highlighted the strengths and weaknesses of the RAG system along with actionable recommendations for improving the chatbot’s performance.

Our comprehensive report helped the insurer identify the following key details about the AI chatbot’s performance:

Recommendations

Based on the evaluation we made the following recommendations to our client:

Improvement in Answer Correctness: The chatbot generated 5.9% incorrect and 5% partially correct answers, missing the industry benchmark of greater than 90% correctness.
Elimination of Hallucination: We identified several instances of AI hallucination during our evaluation, where the chatbot provided fabricated information outside the scope of the provided documents. This creates a serious risk of customer misinformation, impacting satisfaction and potentially resulting in financial losses due to misrepresentation. We recommended our client prioritize the elimination of these hallucinations to mitigate these risks.
Accelerate Query Responses: The chatbot API took 10 seconds to respond to user queries, missing the industry-set benchmark of 5 seconds. We recommended our client reduce the query response time to 5 seconds or lower to avoid user frustration and drop-offs.

This case study showcases how QK’s comprehensive and intelligent quality engineering framework empowered a leading insurance company to failproof their AI chatbot performance. By providing detailed recommendations and insights on key chatbot performance parameters, we helped the client deliver enhanced customer experiences and optimize AI deployment to meet their business goals. The success of the project underscores the need for rigorous evaluation of RAG systems before deployment.

Want to accelerate your
digital banking
transformation journey?

Contact Us

With digital penetration skyrocketing in the Middle East, the BFSI industry continues to evolve to meet the changing demands of the digital-first customers in the region. The trend has resulted in exponential growth in digital banking services in the region, with a recent report estimating the sector to have grown at 52% between 2021 and 2023.

Our client, one of the top 10 largest banks in the UAE offering a full range of innovative retail and commercial banking services, wanted to capitalize on the exponentially growing sector in the region and proactively stay ahead of the fast-changing banking landscape. To accomplish its goal, the UAE banking giant was undertaking an IT modernization journey to futureproof its digital ecosystem for high-velocity innovation, enhanced reliability, and user-centric experiences.

Combining the trifecta of proprietary processes, expertise, and technology, QualityKiosk analyzed the bank’s requirements and established a Testing Center of Excellence (TCoE) to enable accelerated quality engineering at scale.

Leveraging an AI-first approach, the TCoE helped the banking giant:

Accelerate completion of 35+ digital modernization projects
Develop an AI-ready enterprise-wide testing framework
Reduce testing regression times by 70%
Enhance automation penetration by 70%
Reduce quality engineering costs by 20%

Download the complete case study today and access the roadmap to enable AI-powered enterprise-wide testing.

Leveraging an AI-first approach, the TCoE helped the banking giant:

Accelerate completion of 35+ digital modernization projects
Develop an AI-ready enterprise-wide testing framework
Reduce testing regression times by 70%
Enhance automation penetration by 70%
Reduce quality engineering costs by 20%

Download the complete case study today and access the roadmap to enable AI-powered enterprise-wide testing.

QK Helps Leading Indian Insurer Evaluate its Gen AI-powered Chatbot

Overview

Client Overview

Business Objectives & Challenges

Goals

The QK Strategy: AI-Augmented QE Framework for Chatbot Testing

Chatbot Testing Performance Dimensions

Two-phase Testing Approach

Phase 1: Test Data Generation

Phase 2: Evaluation

Results

Our Impact

Recommendations

Want to accelerate your
digital banking
transformation journey?

Contact Us

Download Case Study

Download Case Study

QK Helps Leading Indian Insurer Evaluate its Gen AI-powered Chatbot

Overview

Client Overview

Business Objectives & Challenges

Goals

The QK Strategy: AI-Augmented QE Framework for Chatbot Testing

Chatbot Testing Performance Dimensions

Two-phase Testing Approach

Phase 1: Test Data Generation

Phase 2: Evaluation

Results

Our Impact

Recommendations

Want to accelerate your digital banking transformation journey?

Contact Us

Download Case Study

Download Case Study

Want to accelerate your
digital banking
transformation journey?