Week 10

Agentic AI, RAG & Advanced Research Tools

Harnesses, long-horizon reliability, MCP, agentic RAG, and the 2026 tool landscape

Twelve papers covering harness optimisation, agent benchmarks, the reliability-vs-accuracy distinction, long-horizon planning collapse, agentic RAG, and RAG evaluation.

Each entry links to the canonical version of the paper — on arXiv, the journal, or the publisher. Where a paper is paywalled, the DOI is given for UCT-library access.

10.1 · What Agents Are and What's New in 2026

Meta-Harness: End-to-End Optimization of Model Harnesses
Lee, Y., et al. (2026)
Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command-Line Interfaces
(2026)

10.2 · Failure Modes for Long-Horizon Tasks

Towards a Science of AI Agent Reliability
Rabanser, S., Kapoor, S., Kirgis, P., Liu, K., Utpala, S., & Narayanan, A. (2026) — Princeton CITP
Why Reasoning Fails to Plan: A Planning-Centric Analysis of Long-Horizon Decision Making in LLM Agents
Wang, Z., et al. (2026)
YC-Bench: Benchmarking AI Agents for Long-Term Planning and Consistent Execution
(2026)
Beyond pass@1: A Reliability Science Framework for Long-Horizon LLM Agents
(2026)
Why Language Models Hallucinate
Kalai, A. T., Nachum, O., Vempala, S. S., & Zhang, E. (2025) — also cited in Week 9

10.3 · The Current Tool Landscape and MCP

The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search
Sakana AI (2025)

10.4 · RAG in 2026

Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
Singh, A., Ehtesham, A., Kumar, S., Talaei Khoei, T., & Vasilakos, A. V. (2025)
Retrieval Augmented Generation or Long-Context LLMs? A Comprehensive Study and Hybrid Approach
Li, Z., Li, C., Zhang, M., Mei, Q., & Bendersky, M. (2024) — Google; EMNLP 2024
RAGAS: Automated Evaluation of Retrieval Augmented Generation
Es, S., James, J., Espinosa-Anke, L., & Schockaert, S. (2023)

10.5 · Advanced Research Tools — A Curated Tour

The Esethu Framework: Reimagining Sustainable Dataset Governance and Curation for Low-Resource Languages
Rajab, J., Aremu, A., Chimoto, E. A., et al. (2025) — also cited in Week 4

10.6 · Hands-On Activities and Assessment

Assessment design (the “Same Task, Three Ways” activity).