Week 8 Papers — Multimodal AI for Research

Ten papers covering chart understanding, VLM blind spots, scientific images, document OCR, transcription benchmarks, and long-context behaviour. Two additional references (ACM FAccT, SSRN) are link-only.

All PDF links point to raw.githubusercontent.com, so clicking one downloads the file directly. Source links go to the canonical version on arXiv, at the journal, or at the publisher.

8.1 · What Multimodal AI Can See, Hear, and Read

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Wang, Z., et al. (2024)

↓ Download PDF arXiv:2406.18521

Vision Language Models Are Blind: Failing to Translate Detailed Visual Features into Words

Rahmanzadehgervi, P., et al. (2024) — ACCV 2024

↓ Download PDF arXiv:2407.06581

8.2 · AI and Scientific Images

Hidden flaws behind expert-level accuracy of multimodal GPT-4 vision in medicine

Jin, Q., et al. (2024) — npj Digital Medicine

↓ Download PDF DOI:10.1038/s41746-024-01185-7

Efficient deep learning-based approach for malaria detection

Mujahid, M., et al. (2024) — Scientific Reports

↓ Download PDF DOI:10.1038/s41598-024-63831-0

8.3 · Document Intelligence

Benchmarking Large Language Models for Handwritten Text Recognition

Crosilla, G., Klic, L., & Colavizza, G. (2025)

↓ Download PDF arXiv:2503.15195

olmOCR 2: Unit Test Rewards for Document OCR

Poznanski, J., Soldaini, L., et al. (2025)

↓ Download PDF arXiv:2510.19817

8.4 · Transcription and Audio Analysis

Automatic Speech Recognition (ASR) for African Low-Resource Languages: A Systematic Literature Review

Imam, S. H., Belay, T. D., et al. (2025)

↓ Download PDF arXiv:2510.01145

Benchmarking Automatic Speech Recognition Models for African Languages

Nahabwe, A., Kagumire, S., et al. (2025) — Deep Learning Indaba 2025

↓ Download PDF arXiv:2512.10968

8.5 · Video and Multimodal Workflows

Lost in the Middle: How Language Models Use Long Contexts

Liu, N. F., Lin, K., Hewitt, J., Paranjape, A., Bevilacqua, M., Petroni, F., & Liang, P. (2024) — TACL 2024

↓ Download PDF arXiv:2307.03172

8.6 · Hands-On Activities and Assessment

Hydroxychloroquine and chloroquine prophylaxis for COVID-19

COPCOV Investigators (2024) — PLOS Medicine — activity 3 source paper

↓ Download PDF DOI:10.1371/journal.pmed.1004428

Linked but not redistributed

Koenecke, A., et al. (2024). Careless Whisper: Speech-to-Text Hallucination Harms. FAccT ’24. DOI:10.1145/3630106.3658975 (Section 8.4)

ACM Digital Library doesn’t expose a public PDF endpoint; click through or use ACM Open Access.

Friese, S. (2025). From Coding to Conversation: A New Methodological Framework for AI-Assisted Qualitative Analysis. SSRN:5232579 (Section 8.4)

SSRN is bot-protected; click through to read the paper there.