Telecom AI Hits a Reliability Wall as Evaluation Becomes the Real Bottleneck - ngopihangat

Telecom companies have moved beyond early experimentation with large language models (LLMs) and are now confronting a more difficult question: how to deploy them safely inside real networks. The challenge is no longer centered on model capability alone. It is increasingly defined by whether these systems can meet strict reliability, compliance, and operational standards. Insights from South Korean startup Selectstar (operating globally under Datumo brand), following its work with global telecom operators and GSMA initiatives, point to a deeper constraint shaping the next phase of telecom AI adoption.

Telecom AI Has a Higher Cost of Failure

Telecommunications networks operate under stricter conditions than most enterprise AI environments. Systems are closely tied to real-time operations, regulated services, and sensitive data.

According to Michael Hwang, Vice President at Selectstar (Datumo),

“Telecom environments are fundamentally different because the “cost of failure” is higher as the systems are tightly coupled with real-time operations and regulated services.
Unlike many enterprise AI applications where there is room for trial and error, telecom AI demands a much higher threshold for reliability, as errors can directly compromise customer trust, service continuity, and regulatory compliance.”

This therefore changes how performance is defined. Telecom AI must deliver consistent and safe outputs under edge cases, supported by auditable and repeatable governance. General usefulness is no longer enough when outputs can directly influence network operations.

Selectstar (Datumo) insights show telecom AI is now constrained less by models than by the testing, governance, and evaluation needed to deploy them. — *Michael Hwang, Vice President at Selectstar (Datumo), briefing participants ahead of the Red Team Challenge. | Source: Selectstar (Datumo).*

The Hidden Risk Inside Telecom LLM Deployment

“High-impact failures often stem from misinformation forcing rather than a lack of domain knowledge.”

— Vice President Michael Hwang of Selectstar (Datumo)

One of the more critical findings from telecom AI red teaming is that failures often arise from how models respond under pressure, not only from gaps in knowledge.

Selectstar highlights “misinformation forcing” as a recurring pattern. In these cases, prompts are designed to push a model into agreeing with false statements.

Authority-based instructions can override guardrails when a user claims internal access or instructs the system to ignore previous rules. At the same time, technically detailed prompts using real telecom terminology can lead to confident hallucinations because they resemble legitimate expertise.

These behaviors create risk in telecom environments where incorrect outputs can influence operational decisions and compliance processes.

The Benchmark Push Behind Telecom AI Readiness

The emergence of telecom-specific benchmarking reflects limitations in existing evaluation approaches.

Selectstar notes that frontier models still struggle with telecom-specific tasks such as interpreting network data, understanding standards documentation, and supporting operational workflows.

This gap helps explain why only a small portion of generative AI deployments in telecom have reached network operations. According to Selectstar, only 16% of deployments have been applied in this area so far, indicating limited readiness for core infrastructure use.

Initiatives such as the GSMA-led Open Telco AI and its benchmark framework introduce structured evaluation across telecom-specific metrics. The goal is to move beyond demonstration performance toward measurable evidence of deployment readiness.

How Red Teaming Is Changing Telecom AI Testing

The Global AI Red Team Challenge at MWC 2026 provides insight into how operators are changing their approach.

More than 130 participants took part, with dozens submitting vulnerability analyses. The program was designed as a closed and improvement-focused exercise, allowing findings to feed directly into internal evaluation cycles.

Selectstar describes a shift toward building repeatable evaluation loops. These include converting observed attack patterns into regression tests, updating guardrails, and establishing consistent validation criteria across model updates.

In this context, the value of red teaming lies not in ranking models, but in strengthening systems over time.

Why AI Evaluation Is Becoming Telecom Infrastructure

“The primary bottleneck is increasingly shifting from model capability to the evaluation infrastructure and governance required to deploy AI safely at scale.”

— Vice President Michael Hwang of Selectstar (Datumo)

As deployment expands, evaluation is evolving into a core capability within telecom operations.

Selectstar explains that AI trustworthiness testing is moving from ad hoc processes to structured workflows. These include designing telecom-specific evaluation datasets, combining automated scoring with human review, and integrating red teaming into both development and post-deployment monitoring.

This shift places evaluation alongside other established operational layers such as security and observability. It reflects a broader need for systems that can continuously verify reliability rather than rely on one-time validation.

Production Changes the Rules: What Telecom AI Deployment Really Requires

The transition from experimentation to production introduces stricter requirements.

In early stages, teams often optimize for general performance or demonstration quality. In operational environments, telecom operators must define clear thresholds for deployment decisions.

This includes setting approval processes, defining evidence requirements, and planning for failure scenarios such as fallback behaviors and escalation paths.

Continuous post-deployment evaluation also becomes necessary to ensure that previously identified issues do not reappear as models and policies evolve.

How Selectstar (Datumo) Fits the Telecom AI Evaluation Layer

Selectstar (Datumo)’s role in global telecom initiatives highlights a specific entry point for Korean startups. With its participation in GSMA’s Open Telco AI alliance and involvement in benchmarking efforts as Korea’s only startup, Selectstar further adds clearer perspectives of where value is forming.

Rather than competing in large-scale model development, Selectstar operates in the evaluation and reliability layer of the AI stack. As telecom AI deployment becomes more constrained by testing and governance, this layer now gains strategic importance.

And so, for Korean startups, this suggests that specialized capabilities tied to deployment readiness can provide access to global infrastructure ecosystems.

Why AI Governance Now Shapes Telecom Deployment

The central constraint in telecom AI has now been shifting.

Selectstar notes that the primary bottleneck is moving away from model capability toward the evaluation infrastructure and governance required to deploy AI safely at scale.

Operators now face the challenge of defining what telecom-grade trustworthiness means in practice. This includes determining acceptable levels of safety, accuracy, and consistency, as well as embedding evaluation processes across the full lifecycle of a system.

The issue is no longer limited to building models that perform well in controlled settings. It is about proving that these systems can operate reliably under real-world conditions.

Telecom AI Now Needs Proof, Not Promises

Telecom AI is progressing, but its constraints are becoming more specific and more demanding.

The industry is moving toward a phase where deployment decisions depend on structured evaluation, repeatable validation, and clear governance frameworks. Systems must withstand adversarial inputs, integrate with operational workflows, and remain reliable over time.

This shift changes where value accumulates in the AI ecosystem. It also creates space for companies focused on evaluation and reliability to play a larger role in global infrastructure markets.

“The industry is shifting from AI experimentation to prioritizing trustworthiness evaluation as a prerequisite for deployment.”

— Vice President Michael Hwang of Selectstar (Datumo)

Key Takeaways on Telecom AI Reliability and Deployment

Telecom AI deployment is constrained by reliability, governance, and evaluation rather than model capability alone.
“Misinformation forcing” and prompt-based attacks represent key failure patterns in telecom LLM systems.
The GSMA Open Telco AI benchmark reflects the need for telecom-specific evaluation frameworks.
Operators are shifting toward continuous evaluation loops instead of one-time testing.
AI trustworthiness testing is becoming a core infrastructure layer within telecom operations.
Korean startups like Selectstar (Datumo) are gaining relevance by focusing on the evaluation and validation layer of AI systems.

🤝 Looking to connect with verified Korean companies building globally?
Explore curated company profiles and request direct introductions through beSUCCESS Connect.

– Stay Ahead in Korea’s Startup Scene –
Get real-time insights, funding updates, and policy shifts shaping Korea’s innovation ecosystem.
➡️ Follow ngopihangat on LinkedIn, X (Twitter), Threads, Bluesky, Telegram, Facebook, and WhatsApp Channel.

PakarPBN

A Private Blog Network (PBN) is a collection of websites that are controlled by a single individual or organization and used primarily to build backlinks to a “money site” in order to influence its ranking in search engines such as Google. The core idea behind a PBN is based on the importance of backlinks in Google’s ranking algorithm. Since Google views backlinks as signals of authority and trust, some website owners attempt to artificially create these signals through a controlled network of sites.

In a typical PBN setup, the owner acquires expired or aged domains that already have existing authority, backlinks, and history. These domains are rebuilt with new content and hosted separately, often using different IP addresses, hosting providers, themes, and ownership details to make them appear unrelated. Within the content published on these sites, links are strategically placed that point to the main website the owner wants to rank higher. By doing this, the owner attempts to pass link equity (also known as “link juice”) from the PBN sites to the target website.

The purpose of a PBN is to give the impression that the target website is naturally earning links from multiple independent sources. If done effectively, this can temporarily improve keyword rankings, increase organic visibility, and drive more traffic from search results.

Jasa Backlink

Download Anime Batch

Telecom AI Hits a Reliability Wall as Evaluation Becomes the Real Bottleneck – ngopihangat