The TCA is a mess!

Yes, I know: to some, the mere mention of drastic changes to something they’re used to, or have mastered well, is like an act of war. They first wipe their faces and head out for fresh air. This topic is not for the faint of heart. It’s not for every end user, every ‘used-to-it’ developer or even every ‘experienced’ TYPO3 developer per se. This is for TYPO3 architects, the software architects among you: those who sit down and plan an infrastructure. It’s for people who like to see things cleaned up or done in a proper, clear, consistent, conceptualized and planned fashion. Forgive me if I am not so eloquent. Storytelling is not exactly my strongest suit; I can quickly bore you to sleep. I’m just bringing good experience and code analysis to the table. So, thank you.

This is not a complaint. It’s a note on the inefficiencies and inadequacies of the TCA in its current form, its power and prowess, and the limitations it places on any developer. I won’t get into specifics, but will just give a general overview.

This will benefit all those who work with tables, forms, content etc.

Experience
I was using TYPO3 when version 4 came out, have used every version since, and still work with TYPO3 daily, currently with version 10.4.5. Several things about TYPO3 are great to this day and keep getting better with every release. These include the localization features (though they’re changing for use in classes), the caching system, imaging, the YAML settings, the DBAL (or so I call the database connection features), the menu system and TypoScript (for design). For others, the more things change, the more they remain the same, such as the extension manager and the MVC. Others require heavy-duty changes; these include the normalization of the backend and the TCA. Others keep getting worse, like the lack of proper documentation (what TYPO3 calls documentation is actually mostly white papers), the database limitations, and long development times given that no two websites are exactly the same, among other things. For some of these I shall open separate topics (if you don’t mind) so you can weigh in and suggest improvements and strategies that are simple and cheap. I will too.

TCA
The TCA desperately needs an extreme makeover. A complete overhaul. As demands for complexity and sophistication increase with every passing day, the TCA has been left behind to such a degree that innovation in websites using TYPO3 is slowly becoming difficult. Some follow the TCA like a cult; I don’t understand why. It’s not capable of very much and would like you to sacrifice more and more on its altar: ‘Oh give me more. I want more… feed me moooooore…’. The rest of us say it’s too big and evil and should be broken up, like Amazon. I’m talking from an end-user/developer outsider perspective, not from an insider/establishment one. Using the TCA right now feels like cookie-cutter software where you either fit in or you quit. It’s too restrictive, too cumbersome and a lot more difficult and time-consuming than it needs to be. For example, it should allow custom queries that override the queries TYPO3 generates when processing data. It should allow user-designed joins for foreign tables or fields, or do those joins for you. Heck, it doesn’t allow for either. It simply says: show me your license or step out of your car with your hands up. In other words, it demands perfection rather than compliance, conformity rather than freedom. I’ve read about people pouting and quitting because it takes away more and more of the freedom a developer needs to really showcase his talents and compete readily with the rest of the world, all because of the TCA. These are things TYPO3 can change easily. TYPO3 is actually well suited for all of this, with some clear-cut changes to the TCA that don’t take much.

Shortcomings

Point 1
TCA has no proper way of defining tables, their fields and the relationships between them. Yes, I know, I know. It pretends to, but it doesn’t really. Currently, it simply requires pre-definitions (that is, simple mapping, which is not even that good), and hence you get a pre-determined outcome, which you mostly end up rejecting. Looking at many extensions (a variety of them), although they lay out their tables according to the rules (which also need changing, in favor of a standard), on the inside they later define their own rules using CRUD in their repositories and other methods, disregarding the already-set rules and gaining nothing from them. This raises the question: who are these rules for, then? And the sheer number of such workarounds is alarming. Nearly every extension out there that uses the database has some mechanism that overrides some aspect of the way the TCA works. That is OK, but it is also my point: it shows that the TCA’s mechanisms are not satisfactory. One can say it’s not supposed to satisfy everyone. But if developers were instead allowed to set the rules for their own processing, development time would drop considerably; after all, the developer would then be processing data using his own rules via the TCA. And in the absence of custom rules, a sensible default should kick in.

Point 2
Currently, the TCA is inefficient. Over the years a lot has been added to it. Rather than redesign it, code, snippets, patches and hacks have been added, many with hindsight but no foresight or concern for their impact on usability or robustness. You soon find this out when you have a new table layout. Data processing using the TCA is also inefficient, and so is the retrieval of child records; it is easier and better to use UNIONs and such. When doing a multi-table query in the repository, all children of a foreign field are returned, then stored. Without resorting to hacks, it should be possible to return only the children you want, not all of them as is currently the case. Imagine having thousands of children. And if you detach them, they get deleted. Storage of children is another issue altogether.

Point 3
The TCA has no clear separation of concerns. It does too many things, all lumped together, all synchronized as one: bad, bad software design. Database table definitions, their fields, relationships plus their conditions and accompanying models should have a separate definition, away from backend forms and their interpretation (for those who choose to have backend forms for any of their tables), away from data retrieval and away from data storage and storage locations. This would also require redesigning the FormEngine so one can design forms as one pleases and keep their rules in a completely separate file. By separating these concerns, total processing time would be greatly reduced (since only what is needed is included), and developers could be given the ability to manipulate the data between stages and to add or remove capabilities as they see fit, among many other benefits, such as the ability to develop database administration tools.

Point 4
The TCA has no way of declaring which field in a table is the primary field and which one is the parent field; they are all assumed to be uid and pid. If TYPO3 could do this, I, along with many others, would consider it a BIG win! Additionally, it has no way of knowing whether the parent field exists in the same table or not, and how to connect to it. This by itself opens up a whole can of worms for developers who want to use TYPO3 to create unique experiences for their users and want to manage saving of data normally through the repositories. Lighten the load on the developer.

Point 5
Some of the fields in the TCA configuration only work under certain conditions, yet might be relevant for other uses as well (based on type). Table configuration should exist completely separately, on its own, together with its fields and associations.

— You can stop reading here —

Proposition
By now, I think you realize I don’t like talking much about shortcomings. I just leave them open-ended. I’ve looked at TCA-related questions on the internet and there are people with a lot more to say than I do. There are many of them; I can’t list all their concerns here. These points are just meant to give you a general look and feel of the current state of the TCA and the need to conceptualize it, modularize it and modernize it. Talking about the solutions is far more exciting, since the possibilities are endless. My belief is that a redesign of the whole concept of the TCA will revolutionize TYPO3 and give it an edge to adequately deal with the challenges of the future, which are already knocking at the door.

Beauty in the wilderness
There’s a location called Persistence under Extbase in the Configuration folder (Configuration/Extbase/Persistence) that defines models and their tables. In my opinion, this would be a perfect location to move all table configurations from the TCA files to, each one of them. Perhaps as a start? I don’t know. Allow it to handle all table definitions and all matters of storage and retrieval. Every configuration moved there should be subject to storage. Retrieval configurations can appear separately in the same folder or elsewhere. Table relationships and field relationships can also appear in this location. Each field can be an array of its configuration with an enable/disable option and its control feature declarations, including SQL for its retrieval and formatting and validation classes for its storage. Tables themselves can carry their own SQL, so that whenever they are referenced, that SQL executes in place of the generated one. This could be provided, for example, in array form for the FROM, WHERE, GROUP BY, … clauses of each table. Conditions for data storage can also be introduced, for users to supply their own classes: if the condition returns TRUE, the data is stored; otherwise it is not stored or retrieved.
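A minimal sketch of the per-table configuration described above, written in Python for brevity rather than as TYPO3 PHP. All names (TableConfig, FieldConfig, sql_override, store_condition, the tx_myext_hotel table) are hypothetical and do not correspond to any existing TYPO3 API; the sketch only illustrates per-field enable/disable flags, a developer-supplied SQL override and a storage condition.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional


# Hypothetical configuration model; not an existing TYPO3 API.
@dataclass
class FieldConfig:
    name: str
    enabled: bool = True
    # Optional validator run before storage; returns True if the value may be stored.
    validate: Optional[Callable[[object], bool]] = None


@dataclass
class TableConfig:
    name: str
    fields: list[FieldConfig]
    # Developer-supplied SQL that replaces the generated SELECT entirely.
    sql_override: Optional[str] = None
    # Table-level condition; if it returns False, nothing is stored.
    store_condition: Optional[Callable[[dict], bool]] = None

    def enabled_fields(self) -> list[str]:
        return [f.name for f in self.fields if f.enabled]

    def select_sql(self) -> str:
        """Use the developer's SQL if given, otherwise generate a simple SELECT."""
        if self.sql_override is not None:
            return self.sql_override
        return f"SELECT {', '.join(self.enabled_fields())} FROM {self.name}"

    def prepare_for_storage(self, record: dict) -> Optional[dict]:
        """Return the row to store, or None if a condition or validator rejects it."""
        if self.store_condition is not None and not self.store_condition(record):
            return None
        row = {}
        for f in self.fields:
            if not f.enabled or f.name not in record:
                continue  # disabled or absent fields are never written
            value = record[f.name]
            if f.validate is not None and not f.validate(value):
                return None
            row[f.name] = value
        return row


# Usage with a made-up table.
hotels = TableConfig(
    name="tx_myext_hotel",
    fields=[
        FieldConfig("uid"),
        FieldConfig("title", validate=lambda v: bool(v)),
        FieldConfig("internal_note", enabled=False),
    ],
    store_condition=lambda rec: rec.get("title") != "draft",
)
```

Here select_sql() generates "SELECT uid, title FROM tx_myext_hotel" because internal_note is disabled; setting sql_override would make it return the developer's query verbatim instead.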

There’s no one example that can adequately sum up the deficiencies inherent in the TCA, but I think the following example illustrates most of them. Imagine having just two tables for your extension: a Parent table and a Child table. The Parent table only holds the index records, the relations between index records and, for each index record, fields such as created, modified, deleted, etc. It doesn’t contain any actual data. The associated data (title, first name, last name, …) is stored in the Child table as individual records, with a field indicating which value each record holds. Unfortunately, the TCA in its present state cannot cope with this setup: it would treat the Child records as subordinate to the Parent record and would treat the foreign field in the Child table as the uid of the Parent record when it’s not. Hard-core users would argue this every which way (even I can), but the point is that having a table configuration defined by the DEVELOPER (not by TYPO3’s internally generated scripts), separate from backend form interpretation and separate from table structure and data, would greatly enhance the way we use TYPO3, a great piece of software thanks to its principles (not necessarily its processes). Backend table forms could benefit greatly from this kind of setup, as could queries in the frontend.
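The Parent/Child setup above is essentially an entity-attribute-value layout. A rough Python sketch of assembling flat records from it, assuming hypothetical column names (record_key, attribute, value) and joining on a developer-declared key rather than on the parent's uid:

```python
from collections import defaultdict

# Parent table: index rows only, no actual data. "record_key" is a hypothetical
# developer-declared key; children do NOT reference the parent's uid.
parents = [
    {"uid": 1, "record_key": "p-100", "deleted": 0},
    {"uid": 2, "record_key": "p-200", "deleted": 1},
]

# Child table: one row per value; "attribute" says which value the row holds.
children = [
    {"uid": 10, "record_key": "p-100", "attribute": "title", "value": "Dr."},
    {"uid": 11, "record_key": "p-100", "attribute": "first_name", "value": "Ada"},
    {"uid": 12, "record_key": "p-100", "attribute": "last_name", "value": "Lovelace"},
    {"uid": 13, "record_key": "p-200", "attribute": "first_name", "value": "Old"},
]


def assemble(parents, children, key="record_key", include_deleted=False):
    """Pivot child attribute rows into one flat record per parent index row."""
    values = defaultdict(dict)
    for child in children:
        values[child[key]][child["attribute"]] = child["value"]
    records = []
    for parent in parents:
        if parent["deleted"] and not include_deleted:
            continue  # the index row decides visibility, not the data rows
        records.append({key: parent[key], **values.get(parent[key], {})})
    return records


people = assemble(parents, children)
```

The join key is a parameter on purpose: which column links Child to Parent is exactly the kind of decision the text argues should be declared by the developer.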

Related areas
While we are here, I might as well mention a few related things.

SQL
Table structures are highly restricted, even for minuscule tasks such as having a MySQL-generated date or hash column. Common keywords like ON or bool are not permitted either. It would be nice to have a switch in the configuration file LocalConfiguration.php (under connections), in the table definition, or some other means by which developers can be allowed to use these keywords if they so choose. Additionally, I work in tourism, so spatial keywords in MySQL would be of great benefit if allowed; however, they keep being flagged for change whenever we analyze the database, and we can’t create them. The following examples are perfect illustrations of what would be possible with TYPO3 but won’t run under the current TYPO3 database setup:

modified DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
total_item_price DECIMAL(10,2) AS (quantity * product_price)

Repositories
Repositories have no way of dealing with kids (I mean children), so one has to iterate. Imagine thousands of children. I had to port Laravel collections into TYPO3 (a small script), which I use for those tasks. (The collections are developed by a German company and available for free on GitHub.) I don’t know whether any thought has been given to working with the child objects of a parent field in result sets produced by the createQuery function in the repository. This could easily be configured in a robust TCA.

In the repositories, children are a joy to work with (with the know-how) but a pain when it comes to putting them to sleep (persisting them). You add the children to the parent and persist the parent. Unfortunately, the persistence manager persists ALL children again, which is a disaster. And if you exempt certain children from storage, they get deleted. Dear Lord! So you can’t win with children. Isn’t that how it’s supposed to be… everywhere? I’m just kidding here. But you get it: there ought to be a way to exclude a child from updating without deleting it, and a way to get only the children you want, not all of them.
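The persistence behaviour wished for here can be sketched as a tiny unit of work in Python, with hypothetical names; this is not how Extbase's persistence manager is implemented. Only changed ("dirty") children are updated, detaching a child merely stops tracking it, and deletion has to be requested explicitly:

```python
from dataclasses import dataclass, field


@dataclass
class Child:
    uid: int
    title: str
    dirty: bool = False

    def set_title(self, title: str) -> None:
        if title != self.title:
            self.title = title
            self.dirty = True


@dataclass
class Parent:
    uid: int
    # Only the children actually loaded, keyed by uid (not necessarily all of them).
    children: dict = field(default_factory=dict)
    to_delete: set = field(default_factory=set)

    def detach(self, uid: int) -> None:
        """Stop tracking a child; its database row is left untouched."""
        self.children.pop(uid, None)

    def remove(self, uid: int) -> None:
        """Explicitly schedule a child for deletion."""
        if self.children.pop(uid, None) is not None:
            self.to_delete.add(uid)


def persist(parent: Parent) -> list:
    """Return the operations to run: updates for dirty children, explicit deletes."""
    ops = [("update", c.uid) for c in parent.children.values() if c.dirty]
    ops += [("delete", uid) for uid in sorted(parent.to_delete)]
    for c in parent.children.values():
        c.dirty = False
    parent.to_delete.clear()
    return ops


p = Parent(1, {10: Child(10, "a"), 11: Child(11, "b"), 12: Child(12, "c")})
p.children[10].set_title("A")
p.detach(11)   # no longer tracked, but not deleted
p.remove(12)   # deletion requested explicitly
ops = persist(p)
```

Here ops contains only an update for child 10 and a delete for child 12; child 11 is neither rewritten nor deleted.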

What do you think? Have I summed it up well? I think these changes are long overdue.

Just my 2 short cents, since the text has been veeery long and it’s already a bit late.

While on the one hand you are mentioning some issues that are at least valid, on the other hand you mix up too many different things to really come to a decision - which is what decisions.typo3.org actually is about.

The major parts you are mixing up are TCA, DataHandler and Extbase, especially Extbase Repositories. The job of TCA is the definition of backend forms and behaviours of the DataHandler, which usually does not include any piece of Extbase at all.

Even storing things in the database of an Extbase-based extension will never happen via Extbase repositories as long as it is done from within the usual backend editing forms, which are configured by TCA. Repositories and the Extbase persistence layer might come into action while storing data via self-written importers or via frontend forms, but only if you are using Extbase within your extensions - which I would not recommend for importers, since they should make direct use of the DataHandler instead, but that’s another story.

So IMHO those concerns should be separated from your original text first, before this could be the base for any kind of actual TCA related decision over here.

Regarding SQL, you should keep in mind that TYPO3 has to support several major database systems, not just MySQL, so very specific queries or table definitions that might work with MySQL would break the CMS for users of other database flavors.

TL;DR: Manuel Selbach and I have requested the creation of a working group to completely refactor TCA, with the implicit goals of 1) reducing the size and complexity, 2) providing a fixed API built on formalised methods, and 3) separating schema, presentation and controls (access, etc.). We are fully aware of literally all points raised in this topic.

The definition of TCA is Table Configuration Array. This hints at the context in which TCA is used: it is solely for configuration.

The following is only a very rough run through the points.

Point 1

TCA is not intended to hold database schema information, and a refactoring would not integrate schemas with TCA in any way. Rather, the goal would be to separate the two parts completely. TCA is for mapping and should remain mapping-oriented. The question is not so much whether TCA is satisfactory for defining schema, but how to make it better suited for defining mapping.

That being said - some of the information stored in TCA can be used to generate certain parts of the schema and this may be done as part of a TCA working group’s efforts.

Point 2

TCA itself is not inefficient (it is merely metadata in array form and as such has no “performance” inherently). The way it is processed, however, is in some cases inefficient (cascading through relations in particular). Although the processing of input to map it into tables is not directly related to TCA itself, having a fully formalised API and approach to TCA would help with things like extracting query builder parts for creating joined queries and other optimisations.

Side note: Extbase’s persistence layer attempts to do some of these optimisations by analysing TCA and creating joined queries accordingly. Some parts of this logic are good, others are not. A more generalised approach may be desirable (and would be easier to achieve given a proper API around TCA).

Point 3

This is partially true; TCA does not currently define actual schema, but it does contain presentation-specific instructions (FormEngine configuration). These parts should be decoupled. It is, however, unlikely that it will be entirely possible (or reasonable) to generate schemas in full from TCA alone.

Manipulating data between stages is entirely the responsibility of the DataHandler and is not directly affected, or facilitated, by refactoring TCA itself. At most, refactoring TCA could make it easier to write such integrations, since the mapping information can be made accessible through a proper API rather than having to be read from a huge array (e.g. the ability to extract all relations and their types, from A to B).
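As an illustration of extracting all relations and their types from A to B, here is a Python sketch that reads a TCA-like nested array. The keys used (columns, config, type, foreign_table, foreign_field, MM) follow the familiar TCA column layout, but the example tables are made up and the classification is deliberately simplified (for instance, group fields that use allowed instead of foreign_table are ignored):

```python
from typing import NamedTuple, Optional


class Relation(NamedTuple):
    source: str
    field: str
    target: str
    kind: str                       # "mm", "inline" or the plain config type, e.g. "select"
    mm_table: Optional[str] = None


def extract_relations(tca: dict) -> list:
    """Collect every column whose config points at a foreign_table."""
    relations = []
    for table, definition in tca.items():
        for column, column_def in definition.get("columns", {}).items():
            config = column_def.get("config", {})
            target = config.get("foreign_table")
            if target is None:
                continue  # not a relation, or a kind this sketch does not handle
            if "MM" in config:
                relations.append(Relation(table, column, target, "mm", config["MM"]))
            elif config.get("type") == "inline":
                relations.append(Relation(table, column, target, "inline"))
            else:
                relations.append(Relation(table, column, target, config.get("type", "")))
    return relations


def relations_between(tca: dict, source: str, target: str) -> list:
    return [r for r in extract_relations(tca) if r.source == source and r.target == target]


# Made-up tables in the usual TCA column layout.
tca = {
    "tx_shop_order": {"columns": {
        "customer": {"config": {"type": "select", "foreign_table": "fe_users"}},
        "items": {"config": {"type": "inline", "foreign_table": "tx_shop_item",
                             "foreign_field": "parent_order"}},
        "title": {"config": {"type": "input"}},
    }},
    "tx_shop_item": {"columns": {
        "tags": {"config": {"type": "select", "foreign_table": "tx_shop_tag",
                            "MM": "tx_shop_item_tag_mm"}},
    }},
}
```

With a formal API, a call like relations_between(tca, "tx_shop_order", "tx_shop_item") would be a supported query instead of every consumer re-walking the array this way.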

Point 4

First off: not all tables have parent fields, which also means that the parent field is not imposed by TYPO3. A parent field can mean many things, but common to all of them is that it’s a relation; in the case of pid, a relation to the pages table. The main reason a pid column exists is to make it possible to list and retrieve records by listing the records associated with a given page (making the page a “storage” separation).

Second: changing the nature of the primary identity field may not be reasonable. In the past, and several times, discussions have been raised about adopting UUID logic as primary identifier - the main problem is that making these fields unpredictable (as in: not necessarily named uid and not necessarily containing an integer) has an extreme degree of impact on almost all parts of TYPO3 in context of record handling.

Therefore these changes would likely not be a priority or focus area for refactoring TCA. That’s not to say it won’t ever happen - but the benefit it brings is minor and the downsides (in terms of accompanying changes) are major.

Point 5

Yes, decoupling is required. Yes, a type-specific filtering ability is also required.

Proposition

I think it is necessary to separate the concerns, and your last segment in the proposal mixes up several concerns that don’t immediately apply to TCA (as mentioned above). The way Repositories work, the Extbase persistence layer, the nature of SQL columns etc. are all concerns that exist separately from TCA. While it is true that most of these can be improved, decoupling the parts of TCA would have no immediate effect on any of those (with one possible minor exception: parts of the database schema may become possible to generate, limiting the definition requirements in SQL schema files).

Therefore I would suggest not making those components a deciding factor (the proposal is about improving TCA, not about improving Repositories; TCA is only step 1, Repositories etc. may be steps 2 to X). If we take this as the starting point, the goal is to separate the concerns of TCA and provide a more uniform API:

  1. Reducing the size and complexity of TCA definitions. Hereunder, making the various controls and definitions inside TCA more uniform and grouping together things that are “the same”, e.g. access constraints (enableFields) and so on.
  2. Providing a formalised API (including validation ability) which will reduce the cognitive load of developers who need to write TCA, as well as that of developers who need to read and analyse TCA to create a certain feature.
  3. Separate mapping information from schema and presentation information. Which means extracting the “how field looks” from the “how field works”. This may also provide a way to make the “how field looks” definitions more uniform and in the end may result in being able to define how an entire FormEngine setup should look for any given table (with options to add third party fields in specific places, make modifications to existing fields, and so on).
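A minimal sketch of point 3, splitting one field definition into schema, mapping and presentation parts, in Python and with hypothetical class names (it is not a proposed TYPO3 API). A third party can then change how a field looks without touching how it works or how it is stored:

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Schema:
    """What the column is in the database."""
    sql_type: str
    nullable: bool = False
    default: Optional[str] = None


@dataclass(frozen=True)
class Mapping:
    """How the field works: processing rules applied to incoming values."""
    required: bool = False
    trim: bool = True


@dataclass(frozen=True)
class Presentation:
    """How the field looks in a backend form."""
    label: str
    render_type: str = "input"
    tab: str = "General"


@dataclass(frozen=True)
class Field:
    name: str
    schema: Schema
    mapping: Mapping
    presentation: Presentation


def sql_column(f: Field) -> str:
    """Derive a column definition from the schema part only."""
    parts = [f.name, f.schema.sql_type, "NULL" if f.schema.nullable else "NOT NULL"]
    if f.schema.default is not None:
        parts.append(f"DEFAULT {f.schema.default}")
    return " ".join(parts)


def process(f: Field, value: str) -> str:
    """Apply the mapping part only."""
    if f.mapping.trim:
        value = value.strip()
    if f.mapping.required and not value:
        raise ValueError(f"{f.name} is required")
    return value


def restyle(f: Field, **changes) -> Field:
    """A third party changes presentation only; schema and mapping stay untouched."""
    return replace(f, presentation=replace(f.presentation, **changes))


title = Field("title", Schema("VARCHAR(255)", default="''"),
              Mapping(required=True), Presentation("Title"))
moved = restyle(title, tab="Details")
```

Each consumer reads only the part it needs: schema generation never looks at labels, and form rendering never looks at SQL types.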

Hi Claus. Thanks for the response. I actually didn’t expect anyone to pick up on it. And yes I know, the proposition part sucks. I didn’t understand it myself. It’s wild. That’s why I created a divider on it.

Thanks for the points and for all the clarifications laid out. Your points are legit. Let’s go through them one by one:

  1. Point 1 - A WIN! My point exactly! And yes, I understand the difference between the various facets. Don’t be fooled by my explanations; they were never meant to be exact. I could cry here - just kidding.

  2. Point 2 - I agree on some points and disagree on others. You make some great points, but you lose me on configurations. The whole redesign would hinge on clear-cut goals. I’m not saying you know everything right now; neither do I. Nevertheless, whatever comes out, I imagine, would be better. Speaking for myself and, I believe, for countless programmers out there, I can’t express enough our joy that the wheels are finally turning!

  3. Point 3 - I know. And wonderful: they should be decoupled. Remember, the team working on this will have to invent something new, a new way to store these configurations and processes separately, however simple. In my humble opinion, there ought to be a completely different, new configuration for generating schemas, kept separate and callable wherever it’s needed. I’m also sorry to say that the DataHandler is another beauty. It’s a head-scratcher. It is so intricate, and so much work and dedication went into it for such a long time. Maybe the team will talk to it nicely and see what it can give up to the other side. Those joins are funny. I don’t know. Yeah. It’s hard just thinking about it.

  4. Point 4 - I agree. In that case, if you have a different field as the primary key, just rename your fields and name the primary one uid.

  5. Point 5 - Yep.

You are right. Something like a fixed API would do great. Wow.

Hello Jo,

Thanks so much for your feedback. I’ve enjoyed it. Forget the proposition part; I was so tired while writing it and got mixed up at the end of a hard-working day. You are right. The DataHandler is something else. But I respect it on this one point: it does what it’s supposed to. No questions asked. I don’t mean it has no mistakes; it’s just loyal, like a thief’s assistant. However, if the TCA changes, some parts of it should change too.

Let’s talk about your last paragraph for a second. I am completely aware of the concern. My idea is not to change anything, but to add a feature: something like a switch (in TypoScript, YAML site settings, etc.) that lifts the limitations if one so chooses. Another way might be to add a class in the custom extension which gets picked up and processed, or to add a hook for adding types to those already defined. The added data types should then be taken into account whenever database analysis and comparisons are done. Doctrine already has that feature. I did add it some time back, then realized there was no hook in ConnectionPool.php (line 190) to hook them up. Would you care to take a look at that line? Can something be done there, or in another way? I don’t know; does that sound reasonable?

I/We will try to make it more of a priority to refactor (in the end, recreate) the concept of TCA as described in my bullet points. I actually agree with your points about the DataHandler but I strongly believe that the structure of the underlying configuration layer has to be cleaned up first - otherwise we’d be putting new floors on top of a bad foundation. It also doesn’t make much sense to do both at the same time because this will cause the scope to explode in size.

We would also be looking into potential third party libraries to manage the configuration - if we can avoid inventing something new then that would be ideal. Think Symfony configuration component, ORM systems, etc.

The success criteria are pretty much encapsulated in the bullet points, but of course the main criterion is to have a compatibility measure (a sort of “bridge”; I’m thinking a lot in terms of a generating feature that might generate TCA as a first implementation and later on generate “something else”, basically viewing TCA as a “persistent storage format” that might be stored by arbitrary means and be accessed through the API as opposed to directly from the huge array). So the idea is to write an API that will cover and clean up all existing use cases and segment this API into separated concerns, yet still generate “old” TCA as output. Then, after this, begin the long and slow climb toward accessing TCA exclusively through this new API throughout the TYPO3 core, including in the DataHandler. When the TYPO3 core no longer accesses TCA directly from the array, this frees it up to change the storage mechanism.
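The bridge idea can be sketched roughly as follows, in Python with hypothetical names: definitions live in objects, consumers ask a registry instead of reading the array, and the registry can still emit the classic TCA shape (ctrl with title and label, columns with label and config.type) for code that has not been migrated yet. This is an illustration of the approach, not the working group's design:

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    label: str
    type: str
    config: dict = field(default_factory=dict)


@dataclass
class TableDefinition:
    name: str
    title: str
    label_field: str
    columns: list = field(default_factory=list)

    def to_tca(self) -> dict:
        """Emit the classic TCA array shape for code not yet using the API."""
        return {
            "ctrl": {"title": self.title, "label": self.label_field},
            "columns": {
                c.name: {"label": c.label, "config": {"type": c.type, **c.config}}
                for c in self.columns
            },
        }


class Registry:
    """Consumers ask the registry; how definitions are stored is an internal detail."""

    def __init__(self) -> None:
        self._tables = {}

    def register(self, table: TableDefinition) -> None:
        self._tables[table.name] = table

    def column_type(self, table: str, column: str) -> str:
        for c in self._tables[table].columns:
            if c.name == column:
                return c.type
        raise KeyError(f"{table}.{column}")

    def legacy_tca(self) -> dict:
        return {name: t.to_tca() for name, t in self._tables.items()}


registry = Registry()
registry.register(TableDefinition(
    "tx_myext_hotel", "Hotel", "title",
    [Column("title", "Title", "input", {"max": 255})],
))
```

Once every caller goes through methods like column_type(), the internal storage behind the registry can change without touching those callers, which is the point made above.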

Needless to say, perhaps: this will be a very long running effort. A time-scale of years should probably be expected.

I have another side topic that I wanted to mention. I had a quick chat with Benni Mack about optimizing the TCA as well, concerning the really horrible syntax of tabs, palettes, fields and field definitions, like tx_contentelements_slideshowimages;;;;1-1-1 in the call \TYPO3\CMS\Core\Utility\ExtensionManagementUtility::addToAllTCAtypes('fe_users', 'tx_contentelements_slideshowimages;;;;1-1-1'). Do you know what settings you can set after each semicolon? I don’t. A kind of array syntax with key-value pairs would be much better, in my opinion.
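A sketch of the key-value idea: a Python parser that turns a showitem-style string into named parts. It assumes the older five-part layout (field name; alternative label; palette; special configuration; style codes) that strings such as 'tx_contentelements_slideshowimages;;;;1-1-1' come from. Newer TYPO3 versions use fewer parts, so treat the key names as illustrative:

```python
from __future__ import annotations

# Assumed meaning of the semicolon-separated positions in older showitem strings.
PARTS = ("field", "label", "palette", "special_configuration", "style_codes")


def parse_showitem(showitem: str) -> list[dict]:
    """Turn 'a;b;;;1-1-1, c' into a list of key-value dicts, dropping empty parts."""
    items = []
    for entry in showitem.split(","):
        entry = entry.strip()
        if not entry:
            continue
        values = [v.strip() for v in entry.split(";")]
        items.append({key: value for key, value in zip(PARTS, values) if value})
    return items


print(parse_showitem("tx_contentelements_slideshowimages;;;;1-1-1"))
# [{'field': 'tx_contentelements_slideshowimages', 'style_codes': '1-1-1'}]
```

Seen this way, the four semicolons simply skip three empty positions, which is exactly what named keys would make obvious.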

So if there is any working group, I would also like to be part of it.


@simonschaufi I’ll keep you in mind for the working group!

@tomalo.stuttgart1 just came up with an API for TCA: https://git.spooner.io/spooner-web/tcabuilder/-/tree/main

This isn’t exactly the direction we would go in - our choice would be to have dedicated objects that reflect the structure, which can then be persisted to various formats. So instead of having PHP methods to manipulate TCA (to some degree TYPO3 already has this) we would implement a more object-oriented solution. In short, we would like to move beyond the current string instructions for structure and at the same time decouple UI and domain instructions.