AI operational intelligence chatbot

Industry

Enterprise

Business Operations

Internal Tool

Skills

Prototyping

Conversational AI Design

AI-Native Product Thinking

Team

Carolyn Tung (Design & Development)

Travis Roe (Product Manager)

Abhinav Goyal (Head of Managed Services)

Timeline

Jan - Mar 2025

TL;DR

Overview

I designed EVA, a conversational ops agent that turns operational data into decision-ready answers, with built-in verification so leaders can validate metrics on demand. This was a 0→1 design project.

PROBLEM

Operational intelligence existed, but it wasn’t usable.

Operational intelligence for the Managed Services team was buried in opaque logic, neither easily accessible nor interpretable. Teams relied on daily exports and static reporting. Leaders needed to self-serve answers instantly, but lacked both on-demand access and trust in the numbers.

UNDERSTANDING USER NEEDS

User Research

My two primary users were Managed Services leadership and Ops Managers. My PM interviewed our key users to understand how leadership currently reviewed metrics and gathered direct stakeholder feedback. We found that leadership reviewed metrics through static reports, workbooks, and exports, but data entry was time-consuming and disorganized, so performance tracking was inconsistent.

  1. Trust was the bottleneck

Metrics were calculated using undocumented business rules and embedded in hidden worksheets. As a result, leadership lacked confidence in performance monitoring data.

  2. Leadership wanted on-demand answers based on existing data

A conversational chatbot could interpret data and establish trust better than a dashboard

Landscape Research

According to the Nielsen Norman Group (1, 2), users assume AI products will upsell them, viewing them as untrustworthy, inaccurate, and expensive due to hallucinations and the cost of tokens. For this project, I realized we needed to give users a meaningful degree of success within their credit plan before they hit their limits. That meant helping them prompt less, since increasing credit allowances would be expensive for EOX Vantage.

"AI features must solve real problems, not be implemented for novelty. Unnecessary AI chatbots and features can harm rather than help users."

Nielsen Norman Group

"Be skeptical of the marketing claims being made by AI tools designed for UX researchers. Many of these systems are not able to do everything they claim."

Nielsen Norman Group

Competitive Research

Perplexity was my North Star for chatbot interface design. By auditing high-caliber chatbots, I gained insight into how they decide which sources to prioritize (academic papers vs. news vs. Reddit), how they rank conflicting information, and what they do when sources contradict each other. Perplexity demonstrates how anchoring every response to verifiable sources, a substrate-level design decision about retrieval policy, establishes trust far beyond what surface-level UI changes can achieve.

Exploration

Scrapped

Raw text input

I played around with this, but ultimately decided against it in order to control model behavior and prevent off-domain or unverifiable responses

Result citations

  • Could help users track the data entries used in calculations, boosting trust and credibility

  • Prioritized for future phase

Implemented

Pre-configured questions

  • Reduced prompt ambiguity

  • Anchored queries to supported metrics (see the sketch below)
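
A minimal sketch of how question chips could anchor queries to supported metrics; the catalog entries, field names, and metric IDs below are illustrative assumptions, not EVA's actual schema:

```python
# Illustrative sketch: pre-configured question chips mapped to
# supported metrics. All names and values here are assumptions.
from dataclasses import dataclass

@dataclass(frozen=True)
class PreconfiguredQuestion:
    label: str           # text shown on the question chip
    metric_id: str       # supported metric the query is anchored to
    default_window: str  # one of "yesterday", "7d", "14d", "30d"

QUESTION_CATALOG = [
    PreconfiguredQuestion("How did ticket volume trend?", "ticket_volume", "7d"),
    PreconfiguredQuestion("What is our SLA compliance?", "sla_compliance", "30d"),
    PreconfiguredQuestion("How is the backlog changing?", "backlog_size", "14d"),
]

def to_query(q: PreconfiguredQuestion) -> dict:
    """Translate a chip tap into an unambiguous, supported query."""
    return {"metric": q.metric_id, "window": q.default_window}
```

The idea is that a chip tap always resolves to a query the backend can actually answer, which is what removes the ambiguity of free-form prompting.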

Chain-of-Thought disclosure

A thinking/reasoning experience that shows how the model calculates the data (sketched below)
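
One hedged way to model that disclosure is to stream labeled calculation steps alongside the final answer; the shapes below are assumptions for illustration, not EVA's implementation:

```python
# Illustrative sketch: surfacing a "thinking" disclosure as labeled
# calculation steps next to the answer. Structure is an assumption.
from dataclasses import dataclass, field

@dataclass
class ReasoningStep:
    label: str   # e.g. "Fetched 7d ticket counts"
    detail: str  # what the step actually did

@dataclass
class DisclosedAnswer:
    answer: str
    steps: list[ReasoningStep] = field(default_factory=list)

def render_disclosure(d: DisclosedAnswer) -> str:
    """Render the steps first, then the answer they support."""
    lines = [f"Thinking ({len(d.steps)} steps):"]
    lines += [f"  {i}. {s.label}: {s.detail}" for i, s in enumerate(d.steps, 1)]
    lines.append(f"Answer: {d.answer}")
    return "\n".join(lines)
```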

Opinionated answer structure

Chat responses followed a structured template (see the sketch after this list):

  1. Time-based comparison (yesterday / 7d / 14d / 30d)

  2. Trend interpretation

  3. Operational explanation (what likely changed)
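
As a sketch of that template, assuming a simple dataclass model (field names are illustrative, not EVA's actual API):

```python
# Minimal sketch of the opinionated answer structure: time-based
# comparison, trend interpretation, operational explanation.
from dataclasses import dataclass

WINDOWS = ("yesterday", "7d", "14d", "30d")

@dataclass
class MetricAnswer:
    metric_id: str
    comparisons: dict[str, float]  # value per time window, e.g. {"7d": 412.0}
    trend: str                     # e.g. "up 12% vs. the prior 7d"
    explanation: str               # what likely changed operationally

    def render(self) -> str:
        parts = [f"{w}: {self.comparisons[w]}" for w in WINDOWS if w in self.comparisons]
        return " | ".join(parts) + f"\nTrend: {self.trend}\nWhy: {self.explanation}"
```

Enforcing one shape for every answer is what makes responses scannable and comparable across metrics, rather than free-form model output.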

Threading history

Iterations

During review, we discovered that Ops Managers needed to ask micro- and mid-level questions, whereas leadership only needed to ask macro-level questions

01—Creating two role views for Managed Services leadership and for Ops Managers

02—Establishing trust & verifiability

We also discovered users needed to double-check the data used in each calculation. To meet this need, I extended the design so that any metric answer could display its calculation logic and source attribution.
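
A hedged sketch of what such a verifiable answer payload might look like; the record shapes and names are assumptions, not the shipped design:

```python
# Illustrative sketch: attaching calculation logic and source
# attribution to a metric answer so users can double-check it.
from dataclasses import dataclass

@dataclass
class SourceRef:
    system: str     # e.g. "ops_workbook"
    record_id: str  # the row or entry the number came from

@dataclass
class VerifiableMetric:
    value: float
    formula: str             # human-readable calculation logic
    sources: list[SourceRef]

def proof_view(m: VerifiableMetric) -> str:
    """Render the 'show your work' view behind a metric answer."""
    refs = ", ".join(f"{s.system}:{s.record_id}" for s in m.sources)
    return f"value = {m.value}\nformula: {m.formula}\nsources: {refs}"
```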

03—Recommended follow-up questions

Improving the success rate of the conversational experience while decreasing the org's token costs
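
One plausible mechanic, sketched under the assumption that follow-ups are drawn from the pre-configured catalog shown earlier (the ranking here is a placeholder):

```python
# Illustrative sketch: recommending follow-ups from the pre-configured
# catalog keeps the next turn anchored to supported metrics instead of
# expensive free-form prompting.
def recommend_followups(current_metric: str, catalog: list, k: int = 3) -> list:
    """Suggest up to k pre-configured questions about other metrics."""
    related = [q for q in catalog if q.metric_id != current_metric]
    return related[:k]  # a real system would rank by relevance
```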

TESTING

Success Criteria

Retrieval + grounding (RAG over structured data)

  • Reduced ad-hoc reporting requests to analysts

  • Faster leadership decision cycles (staffing, tooling, escalation response)
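
For context on the "RAG over structured data" approach named above, a minimal sketch under my own assumptions (function names and record shapes are illustrative): retrieve only the relevant metric records, then constrain the model to them.

```python
# Illustrative sketch of retrieval + grounding over structured data.
def retrieve_rows(store: dict, metric: str, window_days: int) -> list:
    """Pull only the structured records needed for this query."""
    return [r for r in store.get(metric, []) if r["age_days"] <= window_days]

def grounded_prompt(metric: str, rows: list) -> str:
    """Ground the model in the retrieved rows and nothing else."""
    context = "\n".join(str(r) for r in rows)
    return (
        f"Answer questions about '{metric}' using ONLY these records:\n"
        f"{context}\n"
        "If the records are insufficient, say so instead of guessing."
    )
```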

Higher trust signals

  • Proof Mode usage rate

Key Takeaways

Working through multiple iterations of the chatbot interface made it increasingly clear that it is better to be AI-second than AI-first. I sprinkled AI in where it could reduce frustration or speed up success, rather than automating everything without solving a real problem or user need.