Image Support for TYPO3 MCP Server - Enabling AI-powered Media Management (Alexander Bernhardt)

Idea Title
Image Support for TYPO3 MCP Server - Enabling AI-powered Media Management

What is my idea about?
What is the problem you are solving?

With an MCP server, you can control content in TYPO3 entirely from a corresponding client (e.g. Claude Desktop).
Examples of this include:

  • Proofreading
  • SEO optimisation
  • ‘Translation’ into simple language
  • Complete rewriting of news items

We are developing a TYPO3 Model Context Protocol (MCP) server that enables LLMs to interact with TYPO3 installations.
The first version is available on Packagist: hn/typo3-mcp-server

The MCP server is already in production for our first customers.

Currently, the server lacks support for images and files from fileadmin. We want to extend it with comprehensive image handling capabilities, including intelligent image search based on detailed descriptions and metadata. This requires integrating or extending existing extensions for automatic alt-text generation and potentially adding features such as person recognition, so that AI assistants can find images through text-based search.

What do I want to achieve by the end of Q4 2025?
We would like to further develop our already implemented proof of concept and make it available as an extension for TYPO3 version 14. In doing so, we plan to deliver the following:

  • Full fileadmin/FAL integration in the TYPO3 MCP server
  • Implementation of AI-powered image description generation
  • Text-based image search capabilities for LLMs
  • Integration with existing TYPO3 extensions for alt-text generation
  • Documentation and examples for the community
  • A working prototype that can be used with Claude, ChatGPT and other LLMs

What is the potential impact of your idea for the overall goal?
This directly supports TYPO3’s AI strategy and the GenAI toolbox initiative. It enables content editors to use AI assistants that can intelligently work with media assets - for example, finding appropriate images when creating news articles about specific people or topics. This makes TYPO3 more competitive in the AI-driven CMS landscape and provides immediate value for agencies and their clients.

How does your idea align with the strategic goals for TYPO3 v14?
The project directly supports the GenAI toolbox goal by providing essential infrastructure for AI integration. MCP is becoming a standard for LLM-to-application communication, and having robust image support makes TYPO3 AI-ready. This positions TYPO3 as a forward-thinking CMS that embraces modern AI workflows while maintaining its strengths in content management.

Which budget do we need for this idea?
10,000 Euro

My Name
Alexander Bernhardt


How does this budget request align with the approach “Interfaces Instead of Integration” which was outlined by Frank Nägler earlier this year?

Short answer: Yes, our TYPO3 MCP Server follows TYPO3’s “interfaces, not integration” approach. We expose tools that respect Workspaces, permissions, and languages; we don’t bundle any AI provider.

The media challenge (why images are different): MCP is great for structured tool calls, but it isn’t a firehose for raw media. Clients like Claude Desktop enforce tight payload limits for MCP resources (~1 MB), and vision models typically accept only a small set of images per request, so “send 1,000 images and let the model figure it out” isn’t viable.

Our plan (still pure MCP):

  • Provide search tools that operate on TYPO3’s own FAL metadata (alt text, description, tags, location). The tools return compact IDs/URLs/metadata, not bulk pixels (see the sketch after this list).

  • Add preview/fetch for a small, selected subset when the assistant truly needs to look closely (thumbnails or single images), staying within client limits.

  • Allow write operations where applicable; raw file storage semantics differ from page/content versioning, so we scope features accordingly.
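To make the first point concrete, here is a minimal, illustrative sketch of what such a search tool could look like. The class name and method signature are hypothetical and not part of the existing hn/typo3-mcp-server codebase; the sketch only shows the idea of querying FAL’s sys_file_metadata table and returning compact results instead of image data.

```php
<?php
// Hypothetical FAL metadata search tool (illustration only).
// Queries sys_file_metadata and returns UIDs plus text metadata, no pixels.

use TYPO3\CMS\Core\Database\ConnectionPool;
use TYPO3\CMS\Core\Utility\GeneralUtility;

final class SearchImagesTool
{
    /**
     * @return array<int, array{uid: int, title: string|null, alternative: string|null}>
     */
    public function search(string $term, int $limit = 20): array
    {
        $queryBuilder = GeneralUtility::makeInstance(ConnectionPool::class)
            ->getQueryBuilderForTable('sys_file_metadata');

        $like = '%' . $queryBuilder->escapeLikeWildcards($term) . '%';

        $rows = $queryBuilder
            ->select('file', 'title', 'alternative', 'description')
            ->from('sys_file_metadata')
            ->where(
                $queryBuilder->expr()->or(
                    $queryBuilder->expr()->like('title', $queryBuilder->createNamedParameter($like)),
                    $queryBuilder->expr()->like('alternative', $queryBuilder->createNamedParameter($like)),
                    $queryBuilder->expr()->like('description', $queryBuilder->createNamedParameter($like))
                )
            )
            ->setMaxResults($limit)
            ->executeQuery()
            ->fetchAllAssociative();

        // Only compact identifiers and text are returned; the client can ask
        // for thumbnails of a handful of candidates in a second step.
        return array_map(static fn(array $row): array => [
            'uid' => (int)$row['file'],
            'title' => $row['title'],
            'alternative' => $row['alternative'],
        ], $rows);
    }
}
```

An MCP client would call such a tool with a search term and receive a short result list it can reason about before requesting any previews.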

Optional (recommended) enhancements:

  • We’ll recommend third-party extensions that enrich metadata (people/places, not just generic “portrait” captions). These are recommendations, not requirements, and keep the MCP layer provider-agnostic.

  • I’m keen to contribute to those metadata extensions so they produce more useful, searchable descriptions.

    • A third-party extension could use the MCP server tools to let an external model explore an image’s context, e.g. where the asset is used, which pages reference it, and the surrounding page copy, so it can generate richer, more discriminative metadata.

What this enables: Even without a metadata extension, an MCP client will be able to search by whatever metadata exists today, request previews of a few candidates, and then reference the chosen asset. With better metadata (via additional extensions), discovery will improve significantly, especially for people/places.
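For the preview step of that workflow, the sketch below shows one way a preview tool could stay within client payload limits by letting FAL generate a small thumbnail. Again, the class name, method, and size parameter are assumptions made for illustration, not existing code in hn/typo3-mcp-server.

```php
<?php
// Hypothetical preview tool (illustration only): returns a single,
// size-limited thumbnail for a file the assistant has already selected.

use TYPO3\CMS\Core\Resource\ProcessedFile;
use TYPO3\CMS\Core\Resource\ResourceFactory;
use TYPO3\CMS\Core\Utility\GeneralUtility;

final class PreviewImageTool
{
    /**
     * @return array{uid: int, mimeType: string, base64: string}
     */
    public function preview(int $fileUid, int $maxEdge = 512): array
    {
        $file = GeneralUtility::makeInstance(ResourceFactory::class)->getFileObject($fileUid);

        // Let FAL produce a small preview so the payload stays well below
        // the tight MCP resource limits mentioned above.
        $thumbnail = $file->process(
            ProcessedFile::CONTEXT_IMAGEPREVIEW,
            ['width' => $maxEdge, 'height' => $maxEdge]
        );

        return [
            'uid' => $file->getUid(),
            'mimeType' => $thumbnail->getMimeType(),
            'base64' => base64_encode($thumbnail->getContents()),
        ];
    }
}
```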