I’ve been doing some experimentation recently (yes, more of it), and Benni suggested this was a good time to throw out my latest skunkworks for consideration.
Executive Summary
Using a Serializer library, we can “deserialize” TYPO3_CONF_VARS into defined classed objects that are exposed through the DI Container. That will overall improve the tractability, self-documentation, and stability of the code, as well as open up the potential for further improvements to the configuration system in the future with less of a BC impact.
Details
- For every top-level key in the $GLOBALS['TYPO3_CONF_VARS'] array, we define a PHP class containing properties that correspond to the keys in the array. Nested items are represented by nested objects. For example:
class MailConfig
{
public string $transport;
public string $command;
public string $encrypt;
public string $server;
public string $username;
public string $password;
}
- We create a new service, for now I’m calling it ConfigFactory, that has a single method that takes a key and returns that portion of the config array, deserialized into the defined class. In concept it’s little more than this:
class ConfigFactory
{
public function getConfigClass(string $key, string $class): object
{
$data = $GLOBALS['TYPO3_CONF_VARS'][$key] ?? [];
return $this->serializer->denormalize($data, $class);
}
}
- That factory is wired into the container, as is a service entry for every top level key and the class that maps to it.
- Now, any service that wants a given part of the site configuration can declare that as a constructor dependency by class name, and it will be injected with all the data from the array, with defaults, types, etc. already handled. Thanks to auto-wiring, in the typical case a given class would need to do nothing more than this (in PHP 7.4):
class SomeClass
{
protected MailConfig $mailConfig;
public function __construct(MailConfig $mailConfig)
{
$this->mailConfig = $mailConfig;
}
public function someMethod()
{
$mailServer = $this->mailConfig->server;
}
}
- The config classes can, and should, also have additional methods on them that handle domain-specific behavior. For instance, the GfxConfig class (see the PoC below) has a $processor key with only 2 legal values, so this line from EnvironmentController:
[
'imageProcessingProcessor' => $GLOBALS['TYPO3_CONF_VARS']['GFX']['processor'] === 'GraphicsMagick' ? 'GraphicsMagick' : 'ImageMagick',
]
Could be moved into the class like so:
public function processor(): string
{
return $this->processor === 'GraphicsMagick' ? 'GraphicsMagick' : 'ImageMagick';
}
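To see the pieces above working together without installing anything, here is a rough, self-contained sketch. It is not the code from the patch: the factory takes the array as a constructor argument instead of reading $GLOBALS, a naive property copy stands in for the serializer's denormalize() call, the EnvironmentReport consumer is a made-up name, and the default value is illustrative only.

```php
<?php

// Illustrative config class: only the 'processor' key from the post is modelled,
// and the default value is an assumption, not TYPO3's actual default.
final class GfxConfig
{
    public string $processor = 'ImageMagick';

    // Domain rule from the post: anything that is not GraphicsMagick counts as ImageMagick.
    public function processor(): string
    {
        return $this->processor === 'GraphicsMagick' ? 'GraphicsMagick' : 'ImageMagick';
    }
}

// Toy factory. It receives the array through its constructor instead of reading
// $GLOBALS, and copies matching keys onto public properties instead of calling
// a real serializer. Nested objects are not handled.
final class ConfigFactory
{
    private array $config;

    public function __construct(array $config)
    {
        $this->config = $config;
    }

    public function getConfigClass(string $key, string $class): object
    {
        $object = new $class();
        foreach ($this->config[$key] ?? [] as $name => $value) {
            if (property_exists($object, $name)) {
                $object->$name = $value;
            }
        }
        return $object;
    }
}

// Hypothetical consumer: it only ever sees the typed object, never the array.
final class EnvironmentReport
{
    private GfxConfig $gfx;

    public function __construct(GfxConfig $gfx)
    {
        $this->gfx = $gfx;
    }

    public function imageProcessingProcessor(): string
    {
        return $this->gfx->processor();
    }
}

// In a real setup the container would do this wiring.
$factory = new ConfigFactory(['GFX' => ['processor' => 'GraphicsMagick']]);
$report = new EnvironmentReport($factory->getConfigClass('GFX', GfxConfig::class));

// With no GFX key at all, the class default applies.
$fallback = (new ConfigFactory([]))->getConfigClass('GFX', GfxConfig::class);
```

The consumer never touches the global array, so swapping the toy factory for a real serializer would not change EnvironmentReport at all.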
Proof of concept(s)
I have two PoCs to demonstrate the concept. One is a patch against core: https://review.typo3.org/c/Packages/TYPO3.CMS/+/70521
Tests are failing right now because 1) Adding composer libraries is a pain and composer.lock got borked again. 2) It’s using Doctrine annotations, and the cglGit rules apparently disagree with the Doctrine documentation about how to properly format strings in annotation arguments. I’m not interested in getting into that fight at the moment. The code itself is fully reviewable, however, and does appear to work in my manual testing.
The other is a PHP 8-targeted standalone repository on GitHub: Crell/serializer-test.
This version uses (mostly) PHP 8 Attributes rather than Doctrine annotations, and takes advantage of Constructor Property Promotion. It’s working from a smaller sample config file (I just grabbed my LocalConfiguration.php file as a test case), but shows off more robust cases with dependent, nested values. I would say this is a more accurate picture of what is possible/intended than the patch above.
Benefits
Direct raw access to a giant global array is problematic for a number of reasons.
- Default values are not centrally managed, meaning they need to be re-handled everywhere that configuration is used.
- There is no guarantee that a given value is defined at all, requiring lots of extra null handling. (The first month I was working on TYPO3 consisted almost entirely of adding that handling to make PHP 8 work, and people are still finding places that need it.)
- There is zero type safety anywhere.
- Because of the previous points, the array structure is essentially undocumentable. Attempts can be made for out-of-band documentation, but the code itself provides zero information on what is even available, much less what it means or how to use it.
- Unit testing becomes more difficult, because every test needs to side-band-populate a global, then run the test, then decompose the global again. The testing framework has extra code to handle that right now, but that’s extra code that should not need to exist.
Using explicitly defined classed objects solves all of these problems.
- Default values are handled directly in the class definition as constructor arguments.
- The type system itself enforces that certain values must always be defined, period, even if with a default value. And if a value is not provided, the code will fail at the correct point with the correct error (a given required property/argument is missing in the configuration) rather than 5,000 lines of code later, far away from the source of the issue.
- Typed properties (especially with union types in 8.0) give us all the type guarantees we want.
- The structure is effectively self-documenting. A given configuration object has a name, its available properties are specified in code, their types are specified in code, their defaults are specified in code. Should additional documentation be necessary, a docblock right on the property guarantees a single-source of information, and one that is readily available to IDEs.
- Unit testing becomes trivially easy. Rather than accessing a global, the appropriate configuration object is a dependency of a class’s constructor, and thus injected by the container. For unit testing, manually creating the config object is a single line of code, and it can be passed to the constructor of the object under test. No additional plumbing necessary.
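A small sketch of the first, second, and last points, assuming PHP 8.0 or later. The hydrate() function is a stand-in for a real serializer (it just spreads the array as named arguments), Mailer is a hypothetical consumer, and the default value is illustrative only.

```php
<?php

// Property names follow the MailConfig example earlier in the post.
final class MailConfig
{
    public function __construct(
        public string $server,                  // required: no default
        public string $transport = 'sendmail',  // optional: the default lives here
    ) {}
}

// Stand-in for the serializer: string keys become named arguments (PHP 8.0+).
function hydrate(string $class, array $data): object
{
    return new $class(...$data);
}

// Hypothetical service that depends on the config object.
final class Mailer
{
    public function __construct(private MailConfig $config) {}

    public function dsn(): string
    {
        return $this->config->transport . '://' . $this->config->server;
    }
}

// The default is filled in without the caller doing anything.
$config = hydrate(MailConfig::class, ['server' => 'mail.example.com']);

// A missing required value fails right here, at hydration time.
$error = null;
try {
    hydrate(MailConfig::class, ['transport' => 'smtp']);
} catch (ArgumentCountError $e) {
    $error = $e->getMessage();
}

// In a unit test, building the dependency is a single line.
$mailer = new Mailer(new MailConfig(server: 'test.invalid', transport: 'smtp'));
```

No global needs to be populated or torn down for the last line to work.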
Additionally, a defined class provides a clear place to handle BC layers, such as multiple methods, duplicating properties if renaming them, etc. That makes long term evolution of the configuration tree easier.
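One possible shape for such a BC layer, as a sketch only. It assumes a hypothetical rename of $server to $host and uses a magic __get() shim rather than a duplicated property; neither the rename nor the shim is taken from the PoC.

```php
<?php

final class MailConfig
{
    public function __construct(
        public string $host = 'localhost', // hypothetical new property name
    ) {}

    // BC shim: the hypothetical old name still resolves, with a deprecation notice.
    public function __get(string $name): mixed
    {
        if ($name === 'server') {
            trigger_error('MailConfig::$server is deprecated, use $host instead.', E_USER_DEPRECATED);
            return $this->host;
        }
        throw new LogicException("Unknown config property: $name");
    }
}

// Collect deprecation notices so the example can show that one was raised.
$notices = [];
set_error_handler(function (int $level, string $message) use (&$notices): bool {
    $notices[] = $message;
    return true;
}, E_USER_DEPRECATED);

$config = new MailConfig(host: 'mail.example.com');
$legacy = $config->server; // old calling code keeps working

restore_error_handler();
```

With a raw array there is no comparable place to hang this logic; every reader of the old key would need its own fallback.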
Open questions
When
In discussion with Benni, he expressed interest in trying to get the framework in place in core for v11. Alternatively, we could do it very early in v12.
Benefits of v11:
- Gets the mechanism out there now, so people can get used to it.
- Allows us to deprecate direct access to the global in v12, and remove it in v13.
- Get a little feedback on it early.
Benefits of v12:
- Having looked through the use of the TYPO3_CONF_VARS array, there are a whole lot of uses that are not quite trivial to convert in this fashion. Mostly that’s due to how much of the system still does not use DI, but also because some parts of the array apparently are not entirely defined and are left mushy. This approach would not be compatible with mushy, so we’d need to sort out how to deal with those. That could take time.
- It’s very late in the v11 cycle, so that’s a lot of time pressure.
- The biggest advantage is that v12 is going to require PHP 8.1, most likely. PHP 8’s constructor property promotion syntax makes defining value objects such as these so vastly easier that just building the example code for the patch above was painful after getting used to 8.0. Really, CPP is the killer feature for PHP 8, and this is exactly the area it is most beneficial. Just compare the config classes in the patch and the separate repository to see the difference.
- We are going to need some level of annotation support, both to link the classes to their definition keys and to support property renaming, et al. (See the PoC code.) If we target v11, that means using Doctrine Annotations for now, and supporting them through at least all of v12. PHP 8 means we can go straight to (and only support) native attributes. Attributes are superior to annotations in every conceivable way: They’re more compact, easier to read, easier to write, easier to parse, and don’t require another dependency.
- PHP 8.1 will also support the readonly flag on object properties. While there’s nothing stopping us from having public properties for this use case even in PHP 7.4 (it does make sense here), the readonly flag would make it more robust. Adding dedicated getter methods would be pointlessly redundant in either case and we should not do it.
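To make the last three points concrete, here is roughly what a config class could look like on PHP 8.1. The ConfigKey attribute and the configKeyFor() helper are hypothetical names for illustration (the PoC repository has its own), and the default values are illustrative only.

```php
<?php

// Hypothetical attribute linking a config class to its top-level key.
#[Attribute(Attribute::TARGET_CLASS)]
final class ConfigKey
{
    public function __construct(public readonly string $key) {}
}

// Promotion plus readonly: the whole value object is one constructor signature.
#[ConfigKey('MAIL')]
final class MailConfig
{
    public function __construct(
        public readonly string $transport = 'sendmail',
        public readonly string $server = '',
    ) {}
}

// Reading the key uses native reflection only; no annotation parser is involved.
function configKeyFor(string $class): ?string
{
    $attributes = (new ReflectionClass($class))->getAttributes(ConfigKey::class);
    return $attributes ? $attributes[0]->newInstance()->key : null;
}

$config = new MailConfig(server: 'mail.example.com');

// Public but immutable: writing after construction throws an Error.
$writeRejected = false;
try {
    $config->server = 'other.example.com';
} catch (Error $e) {
    $writeRejected = true;
}
```

The PHP 7.4 equivalent of MailConfig needs each property declared, listed as a constructor parameter, and assigned in the constructor body, plus a docblock annotation and a parser library for the key.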
So while I think we could get it started in v11, the language version differences here are large enough that I’d personally favor waiting for v12 and avoiding PHP 7.4 and Doctrine Annotations entirely.
Which library to use
The basic mechanism would work with any reasonable serialization library; the basic code needed is really quite simple. However, it does make sense to use a proper serialization library simply for robustness, and because having a good serialization library readily on-hand would help with other tasks as well. There are two options for that.
- Symfony Serializer. This has the advantage of already existing, already working, and already having a whole bunch of options. That’s a pretty big advantage. On the downside, it is so flexible that configuring it properly is a pain in the neck. See the giant block in the patch’s Services.yaml file for the bare minimum configuration I needed to get it to do what I wanted it to do. It also has a total of 9 dependencies, counting transitive ones, including Doctrine Annotations. The PropertyInfo component is already in TYPO3, but the rest are not, so it’s not a small inclusion. As an older component, it also has a lot of older design decisions in it that complicate the architecture. Finally, as of right now it doesn’t support deserializing into readonly properties in 8.1. I’m assuming that will get fixed at some point, but it’s worth noting.
- As part of my various skunkworks projects I started work on a PHP 8-specific serialization library called Crell/Serde. It’s still early and right now only works on JSON (arrays are an intermediary step in that), but since it can assume PHP 8, attributes, and a couple of other modern things it should be much easier to flesh out. It has only 2 small dependencies (one direct, one transitive), both of which I also wrote. The much simpler design should make it much easier to use, and possibly faster. On the flipside, as noted it’s still not fully built, and fully building it will likely end up with something not entirely unlike Symfony Serializer, as despite its complexity it is a good model. (Definitely not that complicated because I can/will make a lot more targeted assumptions, but it is not going to stay as a single class, that’s for sure.) I’ve also tested its hydration mechanism with readonly properties already and it works fine.
I am confident that either library would get us to a successful implementation of config objects as described here, just with different challenges. Naturally I’m biased toward my own library, but that isn’t necessarily a deciding factor. I think the main question is whether or not we’re OK with the dependency weight that Symfony Serializer brings with it.
Future scope
Thinking longer term, this approach opens up a number of additional avenues for us to consider. In no particular order:
- Configuration could be canonically defined not via large arrays, but in the config classes themselves. An extension can ship with one or more config objects that serve as the definition for that object, and then the serialization tool can round-trip that to materialized arrays on disk, to YAML, to JSON, to the database, or whatever else. That’s a much nicer DX for developers, both for extension authors and extension users.
- Arrays are the least user-friendly, least-memory-efficient, least-CPU-efficient data structure in PHP, with one exception. An array that is fully defined in code and not modified is stored in shared opcache memory, not in process memory, and thus the cost of its definition is borne only once, not once per process. On a site that is serving many requests simultaneously that can be a non-trivial savings. Currently, the TYPO3_CONF_VARS array does not get this benefit as it is modified at runtime extensively. However, a follow-on to the previous point would allow for extensions to provide additional information via some more convenient format (or even just arrays as now), then core could build out a combined materialized array and store it to disk once. Then that is a single static memory use, and efficient to turn pieces of into small, efficient objects on-demand. (There are some properties that may not make sense to materialize, such as database information. Those really ought to be separated from the materializable data to begin with. This just provides still more incentive to do so.)
- Some of the existing TYPO3_CONF_VARS structures are quite large and deep, in particular the SYS key. It would be highly beneficial, both from a DX POV and a code POV, to break it up into smaller, more logical pieces. Again, this is already true but putting classes on top of it increases the incentive to do so.
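The materialization idea from the second point could look something like the following sketch. The input arrays, the merge strategy (array_replace_recursive), and the temp-file location are all assumptions for illustration; a real build step would need to decide where the file lives and which values (such as database credentials) stay out of it.

```php
<?php

// Hypothetical inputs: core defaults plus per-extension overrides.
$core = [
    'MAIL' => ['transport' => 'sendmail'],
    'GFX'  => ['processor' => 'ImageMagick'],
];
$extensions = [
    ['GFX' => ['processor' => 'GraphicsMagick']],
];

// Build step: merge once, then write a file that does nothing but return a literal array.
$merged = array_replace_recursive($core, ...$extensions);
$file = sys_get_temp_dir() . '/materialized_config_' . getmypid() . '.php';
file_put_contents($file, '<?php return ' . var_export($merged, true) . ';' . PHP_EOL);

// Runtime: a plain include of a never-modified literal array, which is the
// case opcache is able to keep in shared memory when it is enabled.
$config = require $file;

// Clean up the demo file; a real build would keep it.
unlink($file);
```

The important property is that nothing modifies $config after the require; slices of it would be handed to the serializer to build config objects on demand.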
None of these are immediate goals, but are the sort of things we can look at once the initial mechanism is in place.
Conclusion
Overall, I believe this approach or something very close to it is a good, solid step toward making TYPO3 more consistent, predictable, and testable for developers. That helps improve developer onboarding, as well as their ability to quickly build out other functionality on a solid, self-documenting foundation.
I now step back and again don my flame-retardant suit while I await your comments.