The TCA is a mess!

TL;DR: Manuel Selbach and I have requested the creation of a working group to completely refactor TCA, with the implicit goals of 1) reducing its size and complexity, 2) providing a fixed API built on formalised methods, and 3) separating schema, presentation and controls (access, etc.). We are fully aware of literally all points raised in this topic.

The definition of TCA is Table Configuration Array. This hints at the context in which TCA is used: it is solely for configuration.

The following is only a very rough run-through of the points.

Point 1

TCA is not intended to hold database schema information and a refactoring would not integrate schemas with TCA in any way. Rather, the goal would be to separate the two parts completely. TCA is for mapping and should remain mapping-oriented. Whether or not TCA is satisfactory to define schema is not so much the matter as making it better suited for defining mapping.

That being said - some of the information stored in TCA can be used to generate certain parts of the schema and this may be done as part of a TCA working group’s efforts.
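As a rough illustration of what "generating certain parts of the schema" could mean, here is a minimal sketch in Python (chosen only for brevity; real TCA is a PHP array). The table name `tx_example_item`, the helper names and the type-to-SQL mapping are all assumptions made up for this example and are not an existing TYPO3 API. Fields whose SQL definition cannot be derived are simply skipped, which mirrors the point that TCA alone will not yield a full schema.

```python
# Hypothetical, heavily simplified TCA fragment expressed as a Python dict.
TCA = {
    "tx_example_item": {
        "columns": {
            "title": {"config": {"type": "input", "max": 255}},
            "description": {"config": {"type": "text"}},
            "hidden": {"config": {"type": "check"}},
            "category": {"config": {"type": "select", "foreign_table": "sys_category"}},
            "custom": {"config": {"type": "user"}},
        }
    }
}


def column_sql(name, config):
    """Return an SQL column definition, or None if TCA alone is not enough.

    The mapping below is an assumption for illustration only.
    """
    field_type = config["type"]
    if field_type == "input":
        return f"{name} varchar({config.get('max', 255)}) DEFAULT '' NOT NULL"
    if field_type == "text":
        return f"{name} text"
    if field_type == "check":
        return f"{name} smallint DEFAULT 0 NOT NULL"
    if field_type == "select" and "foreign_table" in config:
        return f"{name} int DEFAULT 0 NOT NULL"
    return None  # not derivable from TCA alone


def partial_schema(tca, table):
    """Collect the column definitions that can be derived for one table."""
    lines = []
    for name, column in tca[table]["columns"].items():
        sql = column_sql(name, column["config"])
        if sql is not None:
            lines.append(sql)
    return lines


schema_lines = partial_schema(TCA, "tx_example_item")
```

The `custom` field yields no column definition here, so it would still have to be declared by hand in an SQL schema file.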

Point 2

TCA itself is not inefficient (it is merely metadata in array form and as such has no inherent “performance”). The way it is processed, however, is inefficient in some cases (cascading through relations in particular). Although the processing of input to map it into tables is not directly related to TCA itself, having a fully formalised API and approach to TCA would help with things like extracting query builder parts for creating joined queries, and with other optimisations.

Side note: Extbase’s persistence layer attempts to do some of these optimisations by analysing TCA and creating joined queries accordingly. Some parts of this logic are good, others are not. A more generalised approach may be desirable (and would be easier to achieve given a proper API around TCA).

Point 3

This is partially true; TCA does not currently define actual schema, but it does contain presentation-specific instructions (FormEngine configuration). These parts should be decoupled. It is however unlikely that it will be entirely possible (or reasonable) to generate schemas in full from TCA alone.

This is entirely the responsibility of the DataHandler and does not directly pertain to, or get facilitated by, refactoring TCA itself. At most, refactoring TCA could make it easier to write such integrations, since the mapping information can be made accessible through a proper API as opposed to having to be read through accessing a huge array (e.g. ability to extract all relations and their type, from A to B).
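To make the "extract all relations and their type, from A to B" idea a bit more concrete, here is a minimal sketch in Python (real TCA is a PHP array). `Relation`, `extract_relations` and `relations_between` are hypothetical names invented for this example, as are the `tx_example_*` tables. The sketch only looks at the `foreign_table` key and ignores other ways of declaring relations, such as `group` fields with `allowed`.

```python
from dataclasses import dataclass

# Hypothetical, simplified TCA with two relation-bearing fields.
TCA = {
    "tx_example_author": {
        "columns": {
            "name": {"config": {"type": "input"}},
            "books": {
                "config": {
                    "type": "inline",
                    "foreign_table": "tx_example_book",
                    "foreign_field": "author",
                }
            },
        }
    },
    "tx_example_book": {
        "columns": {
            "title": {"config": {"type": "input"}},
            "categories": {"config": {"type": "select", "foreign_table": "sys_category"}},
        }
    },
}


@dataclass(frozen=True)
class Relation:
    source_table: str
    source_field: str
    target_table: str
    kind: str  # the TCA field type that declares the relation


def extract_relations(tca):
    """Walk the whole array once and return every declared relation."""
    relations = []
    for table, definition in tca.items():
        for field, column in definition.get("columns", {}).items():
            config = column.get("config", {})
            target = config.get("foreign_table")
            if target:
                relations.append(Relation(table, field, target, config["type"]))
    return relations


def relations_between(tca, source, target):
    """Return only the relations pointing from table `source` to table `target`."""
    return [
        relation
        for relation in extract_relations(tca)
        if relation.source_table == source and relation.target_table == target
    ]


author_to_book = relations_between(TCA, "tx_example_author", "tx_example_book")
```

A consumer such as a query builder integration could then ask for the relations it needs instead of knowing where in the array each key lives.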

Point 4

First off: not all tables have parent fields, which also means that the parent field is not imposed by TYPO3. A parent field can mean many things, but common to all of them is that it’s a relation; in the case of pid, a relation to the pages table. The main reason why a pid column exists is to make it possible to list and retrieve the records associated with a given page (making the page a “storage” separation).

Second: changing the nature of the primary identity field may not be reasonable. Adopting UUID logic as the primary identifier has been discussed several times in the past. The main problem is that making these fields unpredictable (as in: not necessarily named uid and not necessarily containing an integer) has an extreme impact on almost all parts of TYPO3 that handle records.

Therefore these changes would likely not be a priority or focus area for refactoring TCA. That’s not to say it won’t ever happen - but the benefit it brings is minor and the downsides (in terms of accompanying changes) are major.

Point 5

Yes, decoupling is required. Yes, a type-specific filtering ability is also required.

Proposition

I think it is necessary to separate the concerns, and the last segment of your proposal mixes up several concerns that don’t immediately apply to TCA (as mentioned above). The way Repositories work, the Extbase persistence layer, the nature of SQL columns etc. are all concerns which exist separately from TCA. While it is true that most of these can be improved, decoupling the parts of TCA would have no immediate effect on any of them (with one possible minor exception: it may become possible to generate parts of the database schema, which would limit how much has to be defined in SQL schema files).

Therefore I would suggest not making those components a deciding factor (the proposal is about improving TCA, not about improving Repositories; TCA is only step 1, Repositories etc. may be steps 2 to X). If we take this as the starting point, the goal is to separate the concerns of TCA and provide a more uniform API:

  1. Reducing the size and complexity of TCA definitions. This includes making the various controls and definitions inside TCA more uniform and grouping together things that are “the same”, e.g. access constraints (enableFields) and so on.
  2. Providing a formalised API (including validation ability) which will reduce the cognitive load on developers who need to write TCA, as well as on developers who need to read and analyse TCA to create a certain feature.
  3. Separating mapping information from schema and presentation information, which means extracting the “how field looks” from the “how field works”. This may also provide a way to make the “how field looks” definitions more uniform and in the end may result in being able to define how an entire FormEngine setup should look for any given table (with options to add third party fields in specific places, make modifications to existing fields, and so on).
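Goals 2 and 3 can be sketched together. The Python below (real TCA is a PHP array) splits one column's configuration into a "how field works" part and a "how field looks" part, then runs a small validation over the former. Which keys count as presentation is an assumption made for this example, and `split_column` and `validate` are hypothetical helpers, not an existing or proposed TYPO3 API.

```python
# Assumption: these keys only affect how a field is rendered in FormEngine.
PRESENTATION_KEYS = {"size", "rows", "cols", "placeholder", "renderType"}

# A subset of field types, enough for the example.
KNOWN_TYPES = {"input", "text", "check", "select", "group", "inline"}


def split_column(config):
    """Split a column config into (works, looks) dictionaries."""
    looks = {key: value for key, value in config.items() if key in PRESENTATION_KEYS}
    works = {key: value for key, value in config.items() if key not in PRESENTATION_KEYS}
    return works, looks


def validate(works):
    """Return a list of human-readable problems with the mapping part."""
    errors = []
    if works.get("type") not in KNOWN_TYPES:
        errors.append(f"unknown type: {works.get('type')!r}")
    if works.get("type") == "inline" and "foreign_table" not in works:
        errors.append("inline field requires foreign_table")
    return errors


title_config = {"type": "input", "size": 30, "max": 255, "placeholder": "Title"}
works, looks = split_column(title_config)
problems = validate(works)
```

With the two parts separated, a third party could in principle adjust the presentation of a field without touching the mapping information, and the mapping information could be validated before anything attempts to use it.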