# The AI Transparency Report: What Our AI Gets Right and Wrong
B Mohan
Published February 24, 2026 · Updated February 24, 2026 · 7 min read
Every AI company highlights what their product does well. Few publish honest assessments of where their product falls short. We think you deserve both.
We ran 1,000 simulated conversations through our AI agents across multiple industry templates — dental, real estate, restaurant, legal, and fitness — and analysed the results. This report shares what we found: what our AI handles well, where it struggles, and what we are actively working to improve.
## What Our AI Gets Right
### FAQ Responses (High Accuracy When Knowledge Base Is Populated)
When a business has populated its knowledge base thoroughly, our AI agents answer frequently asked questions with high accuracy. In our simulated tests, agents correctly answered straightforward FAQs — office hours, service descriptions, pricing ranges, location information — in the vast majority of cases where the relevant information existed in the knowledge base.
**Why this works well:** Our RAG (Retrieval-Augmented Generation) system retrieves relevant content from the knowledge base before generating a response. This means the AI is drawing from your actual business information rather than inventing answers. The agent acts more like a well-informed receptionist reading from your own materials than a general-purpose AI guessing at answers.
**Important caveat:** Accuracy drops noticeably when the knowledge base is sparse or poorly organised. If you upload a single paragraph about your services, the AI has very little to work with. Quality in equals quality out.
### Appointment Scheduling Queries
Our agents handle appointment scheduling inquiries effectively — collecting preferred dates, times, service types, and contact information. In our simulations, agents correctly captured all necessary scheduling information in the large majority of straightforward booking conversations.
**Why this works well:** Appointment scheduling follows a predictable pattern. The agent knows what information to collect and asks clarifying questions when details are missing.
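As an illustration of that pattern, scheduling can be modelled as slot-filling: the agent works through a fixed list of details and asks about whichever is still missing. The slot names and prompts below are hypothetical examples, not our production configuration.

```python
# Minimal slot-filling sketch: ask for the first missing booking detail.
# Slot names and prompts are illustrative, not production configuration.

SLOTS = ("date", "time", "service", "phone")

PROMPTS = {
    "date": "What date works best for you?",
    "time": "Do you prefer morning or afternoon?",
    "service": "Which service would you like to book?",
    "phone": "What's the best number to reach you on?",
}

def next_question(booking):
    """Return a clarifying question for the first missing slot, or None when complete."""
    for slot in SLOTS:
        if not booking.get(slot):
            return PROMPTS[slot]
    return None  # all scheduling details captured; hand off for confirmation
```

Because the slot list is fixed, the agent always knows what to ask next, which is exactly why this use case is so predictable.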
### Basic Lead Qualification
When configured with qualification criteria, our agents successfully identify and categorise leads based on budget range, timeline, service needs, and location. The agents ask relevant qualifying questions and tag leads appropriately for follow-up.
**Why this works well:** Lead qualification is fundamentally a structured information-gathering task, which plays to the AI's strengths.
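To make that concrete, a qualification rule can be sketched as a small function over the collected fields. The fields, thresholds, and tags below are invented for illustration and are not our production criteria.

```python
# Illustrative lead-qualification sketch. The fields, thresholds, and tags
# are hypothetical examples, not production criteria.

REQUIRED_FIELDS = ("budget", "timeline_days", "service", "location")

def qualify(lead, min_budget=500, max_timeline_days=30):
    """Tag a lead once every required field is collected; otherwise ask for the next one."""
    missing = [field for field in REQUIRED_FIELDS if lead.get(field) is None]
    if missing:
        return {"tag": "incomplete", "ask_next": missing[0]}
    if lead["budget"] >= min_budget and lead["timeline_days"] <= max_timeline_days:
        return {"tag": "hot"}      # fits budget and wants the service soon
    return {"tag": "nurture"}      # qualified, but lower-priority follow-up
```

The structured shape of the task, a fixed set of fields feeding a deterministic rule, is what makes it reliable for an AI agent.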
### After-Hours Responses
This is arguably our strongest use case. Our agents provide immediate, helpful responses at any hour, capturing leads and answering questions that would otherwise go unanswered until the next business day.
**Why this matters:** Research from InsideSales.com and MIT shows that lead contact rates drop dramatically after the first five minutes. An AI agent that responds instantly at 11 PM captures leads that would otherwise be lost to competitors who respond the next morning.
## What Our AI Gets Wrong
We believe this section is the most important part of this report. Understanding limitations helps you deploy AI agents effectively and set realistic expectations for your team and customers.
### Complex Multi-Step Reasoning
When conversations require the agent to hold multiple pieces of context, make conditional decisions, and navigate branching logic across many turns, accuracy degrades. For example, a conversation like "I need a root canal, but I only have insurance that covers 80%, my budget is $500 out of pocket, and I need it done before I travel on March 15th — can you find me a time that works?" involves multiple constraints that the agent sometimes handles incompletely.
**Observed behaviour:** The agent may address some constraints while overlooking others, or it may ask redundant clarifying questions because it loses track of information provided earlier in the conversation.
**Our approach:** For complex, multi-constraint requests, the agent is designed to capture all the information and route to a human team member rather than attempt to resolve everything autonomously. We believe this is the responsible approach.
### Highly Emotional or Sensitive Conversations
AI is not equipped to handle conversations that require genuine empathy, emotional intelligence, or crisis management. When a customer is upset, distressed, or dealing with a sensitive situation, scripted empathy from an AI can feel hollow and sometimes makes the situation worse.
**Observed behaviour:** The agent produces responses that are polite and technically appropriate but can feel generic or tone-deaf in emotionally charged situations.
**Our approach:** We configure agents to detect signals of emotional distress or frustration and escalate to a human team member promptly. The agent acknowledges the customer's feelings and clearly states that a real person will follow up. We do not pretend the AI can substitute for genuine human empathy.
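One simple version of such a distress detector is a keyword check, shown here purely for illustration: the word list is invented, and a production deployment would use a trained classifier rather than keyword matching.

```python
import re

# Toy distress detector for illustration only. The keyword list is invented;
# a real deployment would use a trained classifier, not keyword matching.

DISTRESS_SIGNALS = {"angry", "upset", "frustrated", "furious", "unacceptable", "complaint"}

HANDOFF_REPLY = ("I'm sorry you're having this experience. "
                 "A member of our team will follow up with you shortly.")

def should_escalate(message):
    """True if the message contains any distress keyword."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return bool(words & DISTRESS_SIGNALS)

def respond(message):
    """Escalate distressed messages; return None to continue automated handling."""
    return HANDOFF_REPLY if should_escalate(message) else None
```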
### Industry-Specific Jargon It Has Not Been Trained On
While our industry templates include common terminology, highly specialised or regional jargon can trip up the agent. Legal terms, medical subspecialty language, or local real estate terminology that is not in the knowledge base may be misunderstood or ignored.
**Observed behaviour:** The agent may misinterpret specialised terms, provide a generic response that does not address the specific question, or ask the user to rephrase in a way that feels frustrating to an expert.
**Our approach:** We encourage customers to add industry-specific terminology and definitions to their knowledge base. The more domain-specific content you provide, the better the agent performs. We are also continuously expanding our industry templates based on customer feedback.
### Situations Requiring Real-Time External Data
Our AI agents do not currently fetch real-time data from external sources during conversations. If a customer asks about today's stock price, current weather affecting a scheduled outdoor event, or live inventory availability from a third-party system, the agent cannot provide accurate real-time information.
**Observed behaviour:** The agent may attempt to answer with whatever information is in the knowledge base, which could be outdated, or it may correctly state that it does not have access to real-time information.
**Our approach:** We are building integration capabilities that will allow agents to pull real-time data from connected systems. Until those integrations are ready, agents are instructed to clearly state when they do not have access to current information and direct the customer to the appropriate source.
## Our Approach: RAG Grounding and Hallucination Reduction
The foundation of our accuracy is RAG — Retrieval-Augmented Generation. Here is how it works in practice:
1. **A customer sends a message.** The agent receives the question or request.
2. **The system searches the knowledge base.** Our retrieval system finds the most relevant documents, FAQ entries, or content from the business's uploaded materials.
3. **The AI generates a response grounded in retrieved content.** Instead of generating an answer from its general training data, the AI uses the retrieved content as its primary source.
4. **If no relevant content exists, the agent says so.** This is critical. Rather than guessing or hallucinating, the agent explicitly states that it does not have information on that topic and offers to connect the customer with a human.
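The four steps above can be sketched end to end. This is a deliberately simplified model: the retrieval here is keyword overlap, where the production system uses semantic vector search, and the knowledge-base entries and function names are illustrative only.

```python
import re

# Simplified RAG sketch. Keyword-overlap retrieval stands in for the
# production vector search; knowledge-base entries are invented examples.

KNOWLEDGE_BASE = [
    "Our office hours are Monday to Friday, 9 AM to 5 PM.",
    "We offer teeth cleaning, whitening, and root canal treatment.",
]

FALLBACK = ("I don't have information on that topic. "
            "Would you like me to connect you with a team member?")

def _words(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, kb, min_overlap=2):
    """Step 2: rank knowledge-base entries by words shared with the question."""
    scored = sorted(kb, key=lambda entry: len(_words(question) & _words(entry)), reverse=True)
    return [entry for entry in scored if len(_words(question) & _words(entry)) >= min_overlap]

def answer(question, kb=KNOWLEDGE_BASE):
    """Steps 3-4: ground the reply in retrieved content, or say so and offer a human."""
    docs = retrieve(question, kb)
    if not docs:
        return FALLBACK    # step 4: admit the gap rather than guess
    return docs[0]         # step 3: respond from the best-matching entry
```

The key design choice is the explicit fallback: when retrieval returns nothing relevant, the agent declines to answer instead of generating from general knowledge.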
This approach significantly reduces hallucination compared to AI systems that answer from general knowledge. However, it is not a complete solution. RAG grounding does not eliminate all errors, and the quality of responses is directly tied to the quality and completeness of the knowledge base.
## Honest Limitations of AI in Customer-Facing Applications
We want to be direct about something the industry often glosses over: **AI is not a replacement for human judgment. It is a tool that handles routine inquiries so your team can focus on complex, high-value interactions.**
The Stanford Institute for Human-Centered Artificial Intelligence (HAI) has published extensive research on AI reliability. Their 2024 AI Index Report documents that while AI capabilities have improved dramatically, reliability in open-ended, customer-facing conversations remains a challenge across the industry. AI systems perform best on structured, predictable tasks and struggle with ambiguity, emotional nuance, and novel situations.
This aligns with our own observations. Our AI agents are at their best when handling the roughly 80% of customer interactions that are routine and predictable. They are at their weakest when conversations become unpredictable or emotional, or when they require information the system does not have.
## Our Improvement Roadmap
Here is what we are actively working on to improve accuracy and capabilities:
### Short Term (Next 3 Months)
### Medium Term (3-6 Months)
### Long Term (6-12 Months)
## What We Ask of You
If you are a current or prospective customer, here is how you can get the best results:
## Sources
B Mohan
Founder, Aditya Labs
Founder of Aditya Labs. Building AI-powered customer service tools to help small businesses capture every lead and never miss a customer inquiry. Based in Watford, UK.
Ready to build your AI agent?
Start free. No credit card required. Simple setup — no coding needed.
Get Started Free