# The AI Transparency Report: What Our AI Gets Right and Wrong
B Mohan
Published February 24, 2026 · Updated February 24, 2026 · 7 min read
Every AI company highlights what their product does well. Few publish honest assessments of where their product falls short. We think you deserve both.
We ran 1,000 simulated conversations through our AI agents across multiple industry templates — dental, real estate, restaurant, legal, and fitness — and analysed the results. This report shares what we found: what our AI handles well, where it struggles, and what we are actively working to improve.
## What Our AI Gets Right
### FAQ Responses (High Accuracy When Knowledge Base Is Populated)
When a business has populated its knowledge base thoroughly, our AI agents answer frequently asked questions with high accuracy. In our simulated tests, agents correctly answered straightforward FAQs — office hours, service descriptions, pricing ranges, location information — in the vast majority of cases where the relevant information existed in the knowledge base.
**Why this works well:** Our RAG (Retrieval-Augmented Generation) system retrieves relevant content from the knowledge base before generating a response. This means the AI is drawing from your actual business information rather than inventing answers. The agent acts more like a well-informed receptionist reading from your own materials than a general-purpose AI guessing at answers.
**Important caveat:** Accuracy drops noticeably when the knowledge base is sparse or poorly organised. If you upload a single paragraph about your services, the AI has very little to work with. Quality in equals quality out.
### Appointment Scheduling Queries
Our agents handle appointment scheduling inquiries effectively — collecting preferred dates, times, service types, and contact information. In our simulations, agents correctly captured all necessary scheduling information in the large majority of straightforward booking conversations.
**Why this works well:** Appointment scheduling follows a predictable pattern. The agent knows what information to collect and asks clarifying questions when details are missing.
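As an illustration of that pattern, scheduling can be modelled as slot-filling: the agent works through a fixed list of details and asks about whichever is still missing. The slot names and prompts below are hypothetical examples, not our production configuration.

```python
# Minimal slot-filling sketch: ask for the first missing booking detail.
# Slot names and prompts are illustrative, not production configuration.

SLOTS = ("date", "time", "service", "phone")

PROMPTS = {
    "date": "What date works best for you?",
    "time": "Do you prefer morning or afternoon?",
    "service": "Which service would you like to book?",
    "phone": "What's the best number to reach you on?",
}

def next_question(booking):
    """Return a clarifying question for the first missing slot, or None when complete."""
    for slot in SLOTS:
        if not booking.get(slot):
            return PROMPTS[slot]
    return None  # all scheduling details captured; hand off for confirmation
```

Because the slot list is fixed, the agent always knows what to ask next, which is exactly why this use case is so predictable.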
### Basic Lead Qualification
When configured with qualification criteria, our agents successfully identify and categorise leads based on budget range, timeline, service needs, and location. The agents ask relevant qualifying questions and tag leads appropriately for follow-up.
**Why this works well:** Lead qualification is fundamentally a structured information-gathering task, which plays to the AI's strengths.
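To make that concrete, a qualification rule can be sketched as a small function over the collected fields. The fields, thresholds, and tags below are invented for illustration and are not our production criteria.

```python
# Illustrative lead-qualification sketch. The fields, thresholds, and tags
# are hypothetical examples, not production criteria.

REQUIRED_FIELDS = ("budget", "timeline_days", "service", "location")

def qualify(lead, min_budget=500, max_timeline_days=30):
    """Tag a lead once every required field is collected; otherwise ask for the next one."""
    missing = [field for field in REQUIRED_FIELDS if lead.get(field) is None]
    if missing:
        return {"tag": "incomplete", "ask_next": missing[0]}
    if lead["budget"] >= min_budget and lead["timeline_days"] <= max_timeline_days:
        return {"tag": "hot"}      # fits budget and wants the service soon
    return {"tag": "nurture"}      # qualified, but lower-priority follow-up
```

The structured shape of the task, a fixed set of fields feeding a deterministic rule, is what makes it reliable for an AI agent.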
### After-Hours Responses
This is arguably our strongest use case. Our agents provide immediate, helpful responses at any hour, capturing leads and answering questions that would otherwise go unanswered until the next business day.
**Why this matters:** Research from InsideSales.com and MIT shows that lead contact rates drop dramatically after the first five minutes. An AI agent that responds instantly at 11 PM captures leads that would otherwise be lost to competitors who respond the next morning.
## What Our AI Gets Wrong
We believe this section is the most important part of this report. Understanding limitations helps you deploy AI agents effectively and set realistic expectations for your team and customers.
### Complex Multi-Step Reasoning
When conversations require the agent to hold multiple pieces of context, make conditional decisions, and navigate branching logic across many turns, accuracy degrades. For example, a conversation like "I need a root canal, but I only have insurance that covers 80%, my budget is $500 out of pocket, and I need it done before I travel on March 15th — can you find me a time that works?" involves multiple constraints that the agent sometimes handles incompletely.
**Observed behaviour:** The agent may address some constraints while overlooking others, or it may ask redundant clarifying questions because it loses track of information provided earlier in the conversation.
**Our approach:** For complex, multi-constraint requests, the agent is designed to capture all the information and route to a human team member rather than attempt to resolve everything autonomously. We believe this is the responsible approach.
### Highly Emotional or Sensitive Conversations
AI is not equipped to handle conversations that require genuine empathy, emotional intelligence, or crisis management. When a customer is upset, distressed, or dealing with a sensitive situation, scripted empathy from an AI can feel hollow and sometimes makes the situation worse.
**Observed behaviour:** The agent produces responses that are polite and technically appropriate but can feel generic or tone-deaf in emotionally charged situations.
**Our approach:** We configure agents to detect signals of emotional distress or frustration and escalate to a human team member promptly. The agent acknowledges the customer's feelings and clearly states that a real person will follow up. We do not pretend the AI can substitute for genuine human empathy.
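One simple version of such a distress detector is a keyword check, shown here purely for illustration: the word list is invented, and a production deployment would use a trained classifier rather than keyword matching.

```python
import re

# Toy distress detector for illustration only. The keyword list is invented;
# a real deployment would use a trained classifier, not keyword matching.

DISTRESS_SIGNALS = {"angry", "upset", "frustrated", "furious", "unacceptable", "complaint"}

HANDOFF_REPLY = ("I'm sorry you're having this experience. "
                 "A member of our team will follow up with you shortly.")

def should_escalate(message):
    """True if the message contains any distress keyword."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return bool(words & DISTRESS_SIGNALS)

def respond(message):
    """Escalate distressed messages; return None to continue automated handling."""
    return HANDOFF_REPLY if should_escalate(message) else None
```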
### Industry-Specific Jargon It Has Not Been Trained On
While our industry templates include common terminology, highly specialised or regional jargon can trip up the agent. Legal terms, medical subspecialty language, or local real estate terminology that is not in the knowledge base may be misunderstood or ignored.
**Observed behaviour:** The agent may misinterpret specialised terms, provide a generic response that does not address the specific question, or ask the user to rephrase in a way that feels frustrating to an expert.
**Our approach:** We encourage customers to add industry-specific terminology and definitions to their knowledge base. The more domain-specific content you provide, the better the agent performs. We are also continuously expanding our industry templates based on customer feedback.
### Situations Requiring Real-Time External Data
Our AI agents do not currently fetch real-time data from external sources during conversations. If a customer asks about today's stock price, current weather affecting a scheduled outdoor event, or live inventory availability from a third-party system, the agent cannot provide accurate real-time information.
**Observed behaviour:** The agent may attempt to answer with whatever information is in the knowledge base, which could be outdated, or it may correctly state that it does not have access to real-time information.
**Our approach:** We are building integration capabilities that will allow agents to pull real-time data from connected systems. Until those integrations are ready, agents are instructed to clearly state when they do not have access to current information and direct the customer to the appropriate source.
## Our Approach: RAG Grounding and Hallucination Reduction
The foundation of our accuracy is RAG — Retrieval-Augmented Generation. Here is how it works in practice:
1. **A customer sends a message.** The agent receives the question or request.
2. **The system searches the knowledge base.** Our retrieval system finds the most relevant documents, FAQ entries, or content from the business's uploaded materials.
3. **The AI generates a response grounded in retrieved content.** Instead of generating an answer from its general training data, the AI uses the retrieved content as its primary source.
4. **If no relevant content exists, the agent says so.** This is critical. Rather than guessing or hallucinating, the agent explicitly states that it does not have information on that topic and offers to connect the customer with a human.
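The four steps above can be sketched end to end. This is a deliberately simplified model: the retrieval here is keyword overlap, where the production system uses semantic vector search, and the knowledge-base entries and function names are illustrative only.

```python
import re

# Simplified RAG sketch. Keyword-overlap retrieval stands in for the
# production vector search; knowledge-base entries are invented examples.

KNOWLEDGE_BASE = [
    "Our office hours are Monday to Friday, 9 AM to 5 PM.",
    "We offer teeth cleaning, whitening, and root canal treatment.",
]

FALLBACK = ("I don't have information on that topic. "
            "Would you like me to connect you with a team member?")

def _words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, kb, min_overlap=2):
    """Step 2: rank knowledge-base entries by words shared with the question."""
    scored = sorted(kb, key=lambda entry: len(_words(question) & _words(entry)), reverse=True)
    return [entry for entry in scored if len(_words(question) & _words(entry)) >= min_overlap]

def answer(question, kb=KNOWLEDGE_BASE):
    """Steps 3-4: ground the reply in retrieved content, or say so and offer a human."""
    docs = retrieve(question, kb)
    if not docs:
        return FALLBACK    # step 4: admit the gap rather than guess
    return docs[0]         # step 3: respond from the best-matching entry
```

The key design choice is the explicit fallback: when retrieval returns nothing relevant, the agent declines to answer instead of generating from general knowledge.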
This approach significantly reduces hallucination compared to AI systems that answer from general knowledge. However, it is not a complete solution. RAG grounding does not eliminate all errors, and the quality of responses is directly tied to the quality and completeness of the knowledge base.
## Honest Limitations of AI in Customer-Facing Applications
We want to be direct about something the industry often glosses over: **AI is not a replacement for human judgment. It is a tool that handles routine inquiries so your team can focus on complex, high-value interactions.**
The Stanford Institute for Human-Centered Artificial Intelligence (HAI) has published extensive research on AI reliability. Their 2024 AI Index Report documents that while AI capabilities have improved dramatically, reliability in open-ended, customer-facing conversations remains a challenge across the industry. AI systems perform best on structured, predictable tasks and struggle with ambiguity, emotional nuance, and novel situations.
This aligns with our own observations. Our AI agents are at their best when handling the roughly 80% of customer interactions that are routine and predictable. They are at their weakest when conversations become unpredictable or emotional, or when they require information the system does not have.
## Our Improvement Roadmap
Here is what we are actively working on to improve accuracy and capabilities:
### Short Term (Next 3 Months)
### Medium Term (3-6 Months)
### Long Term (6-12 Months)
## What We Ask of You
If you are a current or prospective customer, here is how you can get the best results:
## Sources
B Mohan
Founder, Aditya Labs
Founder of Aditya Labs. Building AI-powered customer service tools to help small businesses capture every lead and never miss a customer inquiry. Based in Watford, UK.
Ready to build your AI agent?
Start free. No credit card required. Simple setup — no coding needed.
Get Started Free