Agentic AI for Industrial Monitoring Systems

Orchestrating Cognitive LLM Agents

LLM-based multi-agent systems still face several key challenges that limit their effectiveness and scalability. Chief among them is ensuring seamless communication and coordination among agents, since these systems involve complex interactions across varied tasks and domains. We have developed robust mechanisms for synchronization and negotiation that enable LLM conversational agents to understand and respond to each other’s actions in real time. We propose a novel multi-agent system architecture in which LLM-powered cognitive agents interact to analyze data effectively. Each agent performs a specialized role, such as communication, coordination, or code generation, enabling task distribution and cooperative problem-solving. Unlike monolithic AI systems, our architecture promotes modularity and scalability, so individual agents can evolve independently while still collaborating with the others.
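As a minimal Python sketch of the role-per-agent idea, the Agent and Orchestrator classes below are hypothetical illustrations (the LLM backend is abstracted as a plain callable), not the project's actual implementation.

from dataclasses import dataclass
from typing import Callable

# Stand-in for any text-in/text-out model backend; swap in a real client as needed.
LLMFn = Callable[[str], str]

@dataclass
class Agent:
    """One specialized role (e.g. coordinator, analyst, code generator)."""
    name: str
    system_prompt: str
    llm: LLMFn

    def act(self, message: str) -> str:
        # Each agent sees only its own role prompt plus the incoming message,
        # which keeps roles modular and independently replaceable.
        return self.llm(f"{self.system_prompt}\n\nTask:\n{message}")

class Orchestrator:
    """Routes a task through a pipeline of role-specialized agents."""
    def __init__(self, agents: list[Agent]):
        self.agents = agents

    def run(self, task: str) -> str:
        result = task
        for agent in self.agents:
            result = agent.act(result)  # each agent refines the previous output
        return result

if __name__ == "__main__":
    stub: LLMFn = lambda prompt: f"[response to {len(prompt)} chars]"  # replace with a real model
    pipeline = Orchestrator([
        Agent("coordinator", "Decompose the monitoring task into sub-tasks.", stub),
        Agent("analyst", "Analyze the sensor data described in the task.", stub),
        Agent("coder", "Generate Python code for the requested analysis.", stub),
    ])
    print(pipeline.run("Summarize anomalies in compressor vibration data."))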

Planning in LLM-based Cognitive Multi-agent Systems

Plan generation refers to selecting actions at a given moment from a set of possible actions within a defined environment and objective; it requires agents not only to perceive environmental conditions but also to account for other agents’ plans. We have developed novel planning approaches that anticipate and adapt to unexpected environmental changes, particularly in dynamic and data-centric settings. These strategies encompass both deriving plans for task-driven agents, leveraging different strategies toward a specific goal, and designing orchestration paradigms in which a single agent must optimally coordinate a group of distinct task-driven agents. Efficient plan generation is closely tied to memory systems, particularly when external modules provide plan guidelines elaborated by human specialists to aid the agent in predetermined scenarios. This approach can be extended to refine those guidelines according to each plan’s success or failure, thereby supporting the co-creation of plans between humans and LLM agents.
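The guideline-retrieval-and-feedback loop can be sketched as follows; the Guideline and Planner names, the scenario matching, and the multiplicative score update are illustrative assumptions rather than the method itself.

import random
from dataclasses import dataclass

@dataclass
class Guideline:
    """A human-authored plan template for a predetermined scenario."""
    scenario: str
    steps: list[str]
    score: float = 1.0  # updated from plan success or failure

class Planner:
    """Selects a plan for the current scenario, preferring guidelines that have
    succeeded before, and adapts their scores after execution."""
    def __init__(self, guidelines: list[Guideline]):
        self.guidelines = guidelines

    def plan(self, scenario: str) -> Guideline:
        matches = [g for g in self.guidelines if g.scenario == scenario]
        if not matches:  # unexpected scenario: fall back to a minimal plan
            return Guideline(scenario, steps=["escalate to human specialist"])
        # Sample proportionally to past success so weaker guidelines still get explored.
        weights = [g.score for g in matches]
        return random.choices(matches, weights=weights, k=1)[0]

    def feedback(self, guideline: Guideline, succeeded: bool) -> None:
        # Success reinforces the guideline; failure down-weights it.
        guideline.score *= 1.2 if succeeded else 0.8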

Memory in LLM Agents

LLM agents increasingly incorporate external memory to support long-term reasoning, adaptation, and task generalization. Prior research has examined reflective memory for self-evaluation and semantic memory for storing abstracted knowledge, but the interaction between these components has received limited attention. We introduce a unified memory architecture for LLM agents in which reflection and semantic memory co-evolve to improve agents’ lifelong learning capabilities. Our approach leverages reflection as a signal for curation of semantic memory consolidation, enabling selective strengthening, stabilization, and forgetting of stored long-term knowledge. Conversely, we propose a semantic-augmented reflection mechanism in which retrieved semantic knowledge enriches the agent’s self-critique, yielding more profound, more transferable insights. Our project further investigates approaches to forgetting in episodic memory and how to integrate distinct memory types via working memory mechanisms.
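A rough sketch of reflection-driven consolidation and forgetting, assuming a scalar reflection score and a strength threshold; the entry fields, decay rule, and helper names are hypothetical.

from dataclasses import dataclass
from typing import Callable

@dataclass
class SemanticEntry:
    """A piece of abstracted long-term knowledge with a consolidation strength."""
    content: str
    strength: float = 1.0

class SemanticMemory:
    def __init__(self, forget_threshold: float = 0.2):
        self.entries: list[SemanticEntry] = []
        self.forget_threshold = forget_threshold

    def consolidate(self, entry: SemanticEntry, reflection_score: float) -> None:
        # Reflection is the curation signal: a positive self-evaluation
        # strengthens the entry, a negative one weakens it.
        if entry not in self.entries:
            self.entries.append(entry)
        entry.strength += reflection_score

    def decay_and_forget(self, rate: float = 0.95) -> None:
        # Periodic decay implements selective forgetting of stale or weak knowledge.
        for e in self.entries:
            e.strength *= rate
        self.entries = [e for e in self.entries if e.strength >= self.forget_threshold]

    def retrieve(self, query: str, k: int = 3) -> list[SemanticEntry]:
        # Naive word-overlap scoring stands in for embedding-based retrieval.
        overlap = lambda e: len(set(query.lower().split()) & set(e.content.lower().split()))
        return sorted(self.entries, key=overlap, reverse=True)[:k]

def semantic_augmented_reflection(task_log: str, memory: SemanticMemory,
                                  llm: Callable[[str], str]) -> str:
    """Retrieved knowledge enriches the self-critique prompt (the converse direction)."""
    knowledge = "\n".join(f"- {e.content}" for e in memory.retrieve(task_log))
    return llm(f"Relevant knowledge:\n{knowledge}\n\nCritique this task execution:\n{task_log}")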

Language Model Refinement and Retrieval-Augmented Generation

Language model refinement focuses on adapting language models to specific domains or tasks through fine-tuning, enabling them to better capture domain-specific terminology, structures, and reasoning patterns. Fine-tuning can be applied both to embedding models, improving information retrieval by learning representations better aligned with domain-specific semantics, and to generative models, specializing them for particular tasks or areas of expertise by incorporating curated knowledge directly into the model parameters. A core objective of our work is to improve information retrieval quality, since retrieval effectiveness directly impacts downstream reasoning and generation.
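As an illustration, and assuming the sentence-transformers library with hypothetical in-domain query–passage pairs, an embedding model can be fine-tuned with in-batch negatives roughly as follows; this is a sketch, not our exact training setup.

from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, InputExample, losses

# Start from a general-purpose embedder and adapt it to in-domain pairs.
model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical (query, relevant passage) pairs drawn from domain documentation.
train_examples = [
    InputExample(texts=["first-stage separator high pressure alarm",
                        "Operating procedure for high-pressure alarms on the first-stage separator ..."]),
    InputExample(texts=["export gas compressor surge",
                        "Description of the anti-surge control loop of the export gas compressor ..."]),
]

train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
# In-batch negatives: every other passage in the batch acts as a negative example.
train_loss = losses.MultipleNegativesRankingLoss(model)

model.fit(train_objectives=[(train_dataloader, train_loss)], epochs=1, warmup_steps=10)
model.save("domain-tuned-embedder")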

Complementarily, Retrieval-Augmented Generation (RAG) integrates language models with external knowledge sources to ground generation in retrieved evidence, significantly reducing hallucinations and improving factual consistency. This paradigm is particularly effective in domains that are specialized, proprietary, or insufficiently covered during pretraining. Our research explores the refinement of individual RAG components, including retrievers and generators, as well as their joint optimization to strengthen end-to-end retrieval and generation quality.
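A minimal RAG sketch follows, with a toy lexical retriever standing in for a dense (possibly fine-tuned) retriever; the function names and prompt format are assumptions made for illustration.

from typing import Callable

LLMFn = Callable[[str], str]

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """Toy lexical retriever; in practice this would be an embedding-based retriever."""
    q = set(query.lower().split())
    return sorted(corpus, key=lambda d: len(q & set(d.lower().split())), reverse=True)[:k]

def rag_answer(query: str, corpus: list[str], llm: LLMFn) -> str:
    evidence = retrieve(query, corpus)
    prompt = (
        "Answer using only the evidence below; say 'unknown' if it is insufficient.\n\n"
        + "\n".join(f"- {doc}" for doc in evidence)
        + f"\n\nQuestion: {query}"
    )
    return llm(prompt)  # the generator is grounded in the retrieved evidence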

In addition to fine-tuning–based approaches, we investigate retrieval enhancement methods that require no model fine-tuning, such as LLM-driven query expansion and reformulation. These techniques aim to bridge lexical and semantic gaps between user queries and documents, improving recall and robustness in rapidly evolving domains. By combining fine-tuned and non-fine-tuned strategies, our approach provides flexible and scalable solutions for high-performance RAG systems in domain-specific settings that general pretraining knowledge does not cover.
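For example, LLM-driven query expansion can be combined with reciprocal rank fusion without touching any model weights; the prompt, the fusion constant, and the helper names below are hypothetical.

from collections import defaultdict
from typing import Callable

LLMFn = Callable[[str], str]

def expand_query(query: str, llm: LLMFn, n: int = 3) -> list[str]:
    """Ask the LLM for reformulations, one per line, and keep the original query."""
    prompt = f"Rewrite the query below in {n} different ways, one per line:\n{query}"
    lines = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return [query] + lines[:n]

def fused_retrieval(query: str, corpus: list[str], llm: LLMFn,
                    retrieve: Callable[[str, list[str]], list[str]], k: int = 60) -> list[str]:
    """Reciprocal rank fusion over the result lists of all reformulations."""
    scores: dict[str, float] = defaultdict(float)
    for q in expand_query(query, llm):
        for rank, doc in enumerate(retrieve(q, corpus)):
            scores[doc] += 1.0 / (k + rank + 1)
    return [doc for doc, _ in sorted(scores.items(), key=lambda x: x[1], reverse=True)]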

Systematic Evaluation of LLM-based Multi-agent Systems

The rapid evolution of LLM-based Multi-Agent Systems (MAS) has intensified the need for standardized assessment protocols, as current practices frequently lack the methodological rigor required to evaluate complex tasks. Our research addresses this gap by designing and developing a comprehensive technical framework for assessing LLM-based MAS. We have established a taxonomy that categorizes evaluation methods, metrics, and specialized tools. By differentiating between individual-agent competencies, such as reasoning and tool use, and collective emergent behaviors, such as coordination and communication efficiency, our study identifies the essential requirements for a robust evaluation framework. Our approach integrates Ethical, Legal, and Social Aspects (ELSA) to ensure that technical effectiveness is balanced with responsible AI principles. It moves beyond traditional “LLM-as-a-Judge” paradigms, providing a reproducible, domain-agnostic standard that enhances observability and traceability across diverse agentic ecosystems.
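A toy illustration of the individual-versus-collective split: the AgentTrace fields and the two metric functions are hypothetical stand-ins for the framework's actual metric catalog.

from dataclasses import dataclass

@dataclass
class AgentTrace:
    """Per-agent execution record collected from one MAS run."""
    agent: str
    tool_calls: int
    tool_errors: int
    messages_sent: int
    task_completed: bool

def individual_metrics(trace: AgentTrace) -> dict[str, float]:
    # Competencies of a single agent, e.g. reliable tool use.
    return {
        "tool_success_rate": 1.0 - trace.tool_errors / max(trace.tool_calls, 1),
        "task_completion": float(trace.task_completed),
    }

def collective_metrics(traces: list[AgentTrace]) -> dict[str, float]:
    # Emergent group behavior, e.g. how much messaging was needed per completed
    # task (a crude coordination-efficiency proxy).
    completed = sum(t.task_completed for t in traces)
    messages = sum(t.messages_sent for t in traces)
    return {
        "system_completion_rate": completed / max(len(traces), 1),
        "messages_per_completed_task": messages / max(completed, 1),
    }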

Oil & Gas Benchmark Construction

This initiative develops structured benchmarks and datasets to evaluate and calibrate the performance of large language models (LLMs) and multi-agent systems, as well as to support ontology engineering. It focuses on the oil and gas sector, specifically on Floating Production Storage and Offloading (FPSO) units, and aims to construct a representative set of real-world scenarios, operational decisions, and technical contexts that reflect the complexity of activities conducted on offshore production facilities.
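A possible item schema for such a benchmark is sketched below; the field names are hypothetical and serve only to make the scenario/decision/context structure concrete.

from dataclasses import dataclass, field

@dataclass
class FPSOBenchmarkItem:
    """One benchmark scenario grounded in FPSO operations (hypothetical schema)."""
    scenario: str                 # operational context, e.g. "high level alarm in the test separator"
    question: str                 # the decision or diagnosis the evaluated system must produce
    reference_answer: str         # expert-validated answer used for scoring
    supporting_documents: list[str] = field(default_factory=list)  # procedures, datasheets, sensor excerpts
    tags: list[str] = field(default_factory=list)                  # e.g. ["process safety", "ontology"]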

    © 2026 Agentic AI for Industrial Monitoring Systems.
    This is a research project webpage, and not an official Unicamp webpage.
    Its contents are entirely the responsibility of the project coordinators.