Object interface to TYPO3_CONF_VARS

crell · August 13, 2021, 9:24pm

I’ve been doing some experimentation recently (yes, more of it), and Benni suggested this was a good time to throw out my latest skunkworks for consideration.

Executive Summary

Using a Serializer library, we can “deserialize” TYPO3_CONF_VARS into defined classed objects that are exposed through the DI Container. That will overall improve the tractability, self-documentation, and stability of the code, as well as open up the potential for further improvements to the configuration system in the future with less of a BC impact.

Details

For every top level key in the $GLOBALS['TYPO3_CONF_VARS'] array, we define a PHP class containing properties that correspond to the keys in the array. Nested items are represented by nested objects. For example:

class MailConfig
{
    public string $transport;
    public string $command;
    public string $encrypt;
    public string $server;
    public string $username;
    public string $password;
}

We create a new service, for now I’m calling it ConfigFactory, that has a single method on it that takes a key and returns that portion of the config array, deserialized into the defined class. In concept it’s little more than this:

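use Symfony\Component\Serializer\Normalizer\DenormalizerInterface;

class ConfigFactory
{
    private DenormalizerInterface $serializer;

    public function __construct(DenormalizerInterface $serializer)
    {
        $this->serializer = $serializer;
    }

    public function getConfigClass(string $key, string $class): object
    {
        $data = $GLOBALS['TYPO3_CONF_VARS'][$key] ?? [];

        return $this->serializer->denormalize($data, $class);
    }
}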
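
To make the "deserialize an array into a class" step concrete without pulling in a library, here is a minimal sketch using plain reflection. It is not how Symfony Serializer or Serde work internally; the denormalize() function and the DemoMailConfig and DemoSmtpConfig classes (including their defaults) are hypothetical and exist only to show nested arrays becoming nested objects.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for a real serializer: copies array keys onto
// same-named public properties, recursing into class-typed properties.
function denormalize(array $data, string $class): object
{
    $reflection = new ReflectionClass($class);
    $object = $reflection->newInstanceWithoutConstructor();

    foreach ($reflection->getProperties(ReflectionProperty::IS_PUBLIC) as $property) {
        $name = $property->getName();
        if (!array_key_exists($name, $data)) {
            continue; // Keep the default declared on the class, if any.
        }
        $value = $data[$name];
        $type = $property->getType();
        if ($type instanceof ReflectionNamedType && !$type->isBuiltin() && is_array($value)) {
            // A nested array becomes a nested object of the declared type.
            $value = denormalize($value, $type->getName());
        }
        $property->setValue($object, $value);
    }

    return $object;
}

// Illustrative classes only; the defaults are invented.
class DemoSmtpConfig
{
    public string $server = 'localhost:25';
    public string $encrypt = '';
}

class DemoMailConfig
{
    public string $transport = 'sendmail';
    public DemoSmtpConfig $smtp;
}

$config = denormalize(
    ['transport' => 'smtp', 'smtp' => ['server' => 'mail.example.org:587']],
    DemoMailConfig::class
);
```

Keys missing from the array simply leave the class default in place, which is the behaviour the proposal relies on for centralised defaults.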

That factory is wired into the container, as is a service entry for every top level key and the class that maps to it.
Now, any service that wants a given part of the site configuration can declare that as a constructor dependency by class name, and it will be injected with all the data from the array, with defaults, types, etc. already handled. Thanks to auto-wiring, in the typical case a given class would need to do nothing more than this (in PHP 7.4):

class SomeClass
{
    protected MailConfig $mailConfig;

    public function __construct(MailConfig $mailConfig)
    {
        $this->mailConfig = $mailConfig;
    }

    public function someMethod()
    {
        $mailServer = $this->mailConfig->server;
    }
}

The config classes can, and should, also have additional methods on them that handle domain-specific behavior. For instance, the GfxConfig class (see the PoC below) has a $processor key with only 2 legal values, so this line from EnvironmentController:

[
    'imageProcessingProcessor' => $GLOBALS['TYPO3_CONF_VARS']['GFX']['processor'] === 'GraphicsMagick' ? 'GraphicsMagick' : 'ImageMagick',
]

Could be moved into the class like so:

public function processor(): string
{
    return $this->processor === 'GraphicsMagick' ? 'GraphicsMagick' : 'ImageMagick';
}
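
As a rough sketch of how the call site could then look, assuming a trimmed-down GfxConfig with only the one property (the real class in the PoC has more, and the variable names here are made up):

```php
<?php
declare(strict_types=1);

// Hypothetical, reduced GfxConfig: only the property discussed above.
class GfxConfig
{
    public string $processor = '';

    // Normalizes the raw value to one of the two legal processors.
    public function processor(): string
    {
        return $this->processor === 'GraphicsMagick' ? 'GraphicsMagick' : 'ImageMagick';
    }
}

$gfx = new GfxConfig();
$gfx->processor = 'something-unexpected';

// The controller no longer needs to know the fallback rule.
$view = [
    'imageProcessingProcessor' => $gfx->processor(),
];
```

Any other place that needs the processor name gets the same fallback for free instead of repeating the ternary.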

Proof of concept(s)

I have two PoCs to demonstrate the concept. One is a patch against core: https://review.typo3.org/c/Packages/TYPO3.CMS/+/70521

Tests are failing right now because 1) Adding composer libraries is a pain and composer.lock got borked again. 2) It’s using Doctrine annotations, and the cglGit rules apparently disagree with the Doctrine documentation about how to properly format strings in annotation arguments. I’m not interested in getting into that fight at the moment. The code itself is fully reviewable, however, and does appear to work in my manual testing.

The other is a PHP 8-targeted standalone repository, which can be viewed on GitHub: Crell/serializer-test.

This version uses (mostly) PHP 8 Attributes rather than Doctrine annotations, and takes advantage of Constructor Property Promotion. It’s working from a smaller sample config file (I just grabbed my LocalConfiguration.php file as a test case), but shows off more robust cases with dependent, nested values. I would say this is a more accurate picture of what is possible/intended than the patch above.

Benefits

Direct raw access to a giant global array is problematic for a number of reasons.

Default values are not centrally managed, meaning they need to be re-handled everywhere that configuration is used.
There is no guarantee that a given value is defined at all, requiring lots of extra null handling. (The first month I was working on TYPO3 consisted almost entirely of adding that handling to make PHP 8 work, and people are still finding places that need it.)
There is zero type safety anywhere.
Because of the previous points, the array structure is essentially undocumentable. Attempts can be made for out-of-band documentation, but the code itself provides zero information on what is even available, much less what it means or how to use it.
Unit testing becomes more difficult, because every test needs to side-band-populate a global, then run the test, then decompose the global again. The testing framework has extra code to handle that right now, but that’s extra code that should not need to exist.

Using explicitly defined classed objects solves all of these problems.

Default values are handled directly in the class definition as constructor arguments.
The type system itself enforces that certain values must always be defined, period, even if with a default value. And if a value is not provided, the code will fail at the correct point with the correct error (a given required property/argument is missing in the configuration) rather than 5,000 lines of code later, far away from the source of the issue.
Typed properties (especially with union types in 8.0) give us all the type guarantees we want.
The structure is effectively self-documenting. A given configuration object has a name, its available properties are specified in code, their types are specified in code, their defaults are specified in code. Should additional documentation be necessary, a docblock right on the property guarantees a single-source of information, and one that is readily available to IDEs.
Unit testing becomes trivially easy. Rather than accessing a global, the appropriate configuration object is a dependency of a class’s constructor, and thus injected by the container. For unit testing, manually creating the config object is a single line of code, and it can be passed to the constructor of the object under test. No additional plumbing necessary.
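
The points above about defaults, required values, and testing can be sketched roughly as follows. This assumes PHP 8.0 or later; the MailSettings class and its default values are hypothetical, with property names borrowed from the MailConfig example.

```php
<?php
declare(strict_types=1);

// Hypothetical config class; the defaults are invented for illustration.
final class MailSettings
{
    public function __construct(
        public string $server,                 // required: no default
        public string $transport = 'sendmail', // optional: default lives here
        public string $encrypt = '',
    ) {
    }
}

// Spreading a string-keyed array maps its keys to named arguments (PHP 8.0+).
$settings = new MailSettings(...['server' => 'localhost:25']);

// A missing required key fails immediately, at construction time.
try {
    new MailSettings(...['transport' => 'smtp']);
    $error = null;
} catch (ArgumentCountError $e) {
    $error = $e->getMessage();
}

// In a unit test, building the dependency by hand is a single line.
$forTest = new MailSettings(server: 'test.invalid:25', transport: 'smtp');
```

No global needs to be populated or torn down; the object under test just receives $forTest through its constructor.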

Additionally, a defined class provides a clear place to handle BC layers, such as multiple methods, duplicating properties if renaming them, etc. That makes long term evolution of the configuration tree easier.

Open questions

When

In discussion with Benni, he expressed interest in trying to get the framework in place in core for v11. Alternatively, we could do it very early in v12.

Benefits of v11:

Gets the mechanism out there now, so people can get used to it.
Allows us to deprecate direct access to the global in v12, and remove it in v13.
Get a little feedback on it early.

Benefits of v12:

Having looked through the use of the TYPO3_CONF_VARS array, there are a whole lot of uses that are not quite trivial to convert in this fashion. Mostly that’s due to how much of the system still does not use DI, but also because some parts of the array apparently are not entirely defined and are left mushy. This approach would not be compatible with mushy, so we’d need to sort out how to deal with those. That could take time.
It’s very late in the v11 cycle, so that’s a lot of time pressure.
The biggest advantage is that v12 is going to require PHP 8.1, most likely. PHP 8’s constructor property promotion syntax makes defining value objects such as these so vastly easier that just building the example code for the patch above was painful after getting used to 8.0. Really, CPP is the killer feature for PHP 8, and this is exactly the area it is most beneficial. Just compare the config classes in the patch and the separate repository to see the difference.
We are going to need some level of annotation support, both to link the classes to their definition keys and to support property renaming, et al. (See the PoC code.) If we target v11, that means using Doctrine Annotations for now, and supporting them through at least all of v12. PHP 8 means we can go straight to (and only support) native attributes. Attributes are superior to annotations in every conceivable way: They’re more compact, easier to read, easier to write, easier to parse, and don’t require another dependency.
PHP 8.1 will also support the readonly flag on object properties. While there’s nothing stopping us from having public properties for this use case even in PHP 7.4 (it does make sense here), the readonly flag would make it more robust. Adding dedicated getter methods would be pointlessly redundant in either case and we should not do it.
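
For comparison, here is a sketch of what the PHP 8.1 flavour could look like with a native attribute, constructor property promotion, and readonly properties combined. The ConfigKey attribute, the MailConfigSketch class, its defaults, and the configKeyFor() helper are all hypothetical; neither PoC necessarily uses these names.

```php
<?php
declare(strict_types=1);

// Hypothetical attribute naming the top-level TYPO3_CONF_VARS key a class maps to.
#[Attribute(Attribute::TARGET_CLASS)]
final class ConfigKey
{
    public function __construct(public readonly string $key)
    {
    }
}

#[ConfigKey('MAIL')]
final class MailConfigSketch
{
    public function __construct(
        public readonly string $transport = 'sendmail',
        public readonly string $server = '',
    ) {
    }
}

// Looks up the mapped key through reflection; returns null if the class is untagged.
function configKeyFor(string $class): ?string
{
    $attributes = (new ReflectionClass($class))->getAttributes(ConfigKey::class);

    return $attributes === [] ? null : $attributes[0]->newInstance()->key;
}

$config = new MailConfigSketch(server: 'localhost:25');

try {
    $config->server = 'elsewhere'; // readonly: writes after construction are rejected
    $modified = true;
} catch (Error $e) {
    $modified = false;
}
```

The whole value object fits in a handful of lines, with no getters and no separate annotation parser.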

So while I think we could get it started in v11, the language version differences here are large enough that I’d personally favor waiting for v12 and avoiding PHP 7.4 and Doctrine Annotations entirely.

Which library to use

The basic mechanism would work with any reasonable serialization library; the basic code needed is really quite simple. However, it does make sense to use a proper serialization library simply for robustness, and because having a good serialization library readily on-hand would help with other tasks as well. There are two options for that.

Symfony Serializer. This has the advantage of already existing, already working, and already having a whole bunch of options. That’s a pretty big advantage. On the downside, it is so flexible that configuring it properly is a pain in the neck. See the giant block in the patch’s Services.yaml file for the bare minimum configuration I needed to get it to do what I wanted it to do. It also has a total of 9 dependencies once transitive ones are counted, Doctrine Annotations among them. The PropertyInfo component is already in TYPO3, but the rest are not, so it’s not a small inclusion. As an older component, it also has a lot of older design decisions in it that complicate the architecture. Finally, as of right now it doesn’t support deserializing into readonly properties in 8.1. I’m assuming that will get fixed at some point, but it’s worth noting.
As part of my various skunkworks projects I started work on a PHP 8-specific serialization library called Crell/Serde. It’s still early and right now only works on JSON (arrays are an intermediary step in that), but since it can assume PHP 8, attributes, and a couple of other modern things it should be much easier to flesh out. It has only 2 small dependencies (one direct, one transitive), both of which I also wrote. The much simpler design should make it much easier to use, and possibly faster. On the flipside, as noted it’s still not fully built, and fully building it will likely end up with something not entirely unlike Symfony Serializer, as despite its complexity it is a good model. (Definitely not that complicated because I can/will make a lot more targeted assumptions, but it is not going to stay as a single class, that’s for sure.) I’ve also tested its hydration mechanism with readonly properties already and it works fine.

I am confident that either library would get us to a successful implementation of config objects as described here, just with different challenges. Naturally I’m biased toward my own library, but that isn’t necessarily a deciding factor. I think the main question is whether or not we’re OK with the dependency weight that Symfony Serializer brings with it.

Future scope

Thinking longer term, this approach opens up a number of additional avenues for us to consider. In no particular order:

Configuration could be canonically defined not via large arrays, but in the config classes themselves. An extension can ship with one or more config objects that serve as the definition for that object, and then the serialization tool can round-trip that to materialized arrays on disk, to YAML, to JSON, to the database, or whatever else. That’s a much nicer DX for developers, both for extension authors and extension users.
Arrays are the least user-friendly, least-memory-efficient, least-CPU-efficient data structure in PHP, with one exception. An array that is fully defined in code and not modified is stored in shared opcache memory, not in process memory, and thus the cost of its definition is borne only once, not once per process. On a site that is serving many requests simultaneously that can be a non-trivial savings. Currently, the TYPO3_CONF_VARS array does not get this benefit as it is modified at runtime extensively. However, a follow-on to the previous point would allow for extensions to provide additional information via some more convenient format (or even just arrays as now), then core could build out a combined materialized array and store it to disk once. Then that is a single static memory use, and pieces of it are efficient to turn into small, efficient objects on demand. (There are some properties that may not make sense to materialize, such as database information. Those really ought to be separated from the materializable data to begin with. This just provides still more incentive to do so.)
Some of the existing TYPO3_CONF_VARS structures are quite large and deep, in particular the SYS key. It would be highly beneficial, both from a DX POV and a code POV, to break it up into smaller, more logical pieces. Again, this is already true but putting classes on top of it increases the incentive to do so.
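
The "materialize once, load statically" idea from the list above could be sketched like this. The function names, the temp-file location, and the sample data are assumptions for illustration; a real implementation would need cache invalidation and would keep secrets such as database credentials out of the generated file.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: write a fully resolved config array as a static PHP file.
function materializeConfig(array $config, string $file): void
{
    $code = '<?php' . PHP_EOL . 'return ' . var_export($config, true) . ';' . PHP_EOL;
    file_put_contents($file, $code, LOCK_EX);
}

// Loading is a plain require. With opcache enabled, a literal array in a
// compiled file can be kept in shared memory rather than rebuilt per request.
function loadMaterializedConfig(string $file): array
{
    return require $file;
}

$file = sys_get_temp_dir() . '/conf_vars_' . uniqid() . '.php';
materializeConfig(
    ['MAIL' => ['transport' => 'smtp'], 'SYS' => ['sitename' => 'Demo']],
    $file
);
$loaded = loadMaterializedConfig($file);
unlink($file);
```

The important property is that nothing mutates the array after it is loaded; any runtime modification would copy it back into process memory.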

None of these are immediate goals, but are the sort of things we can look at once the initial mechanism is in place.

Conclusion

Overall, I believe this approach or something very close to it is a good, solid step toward making TYPO3 more consistent, predictable, and testable for developers. That helps improve developer onboarding, as well as their ability to quickly build out other functionality on a solid, self-documenting foundation.

I now step back and again don my flame-retardant suit while I await your comments.

kaystrobach · August 14, 2021, 4:27pm

Thank you for the proposal. Loving it.

Maybe a mixed approach makes sense. Hide it behind a feature flag and require PHP 8, or go straight for v12.

Please go ahead

crell · August 16, 2021, 2:52pm

I’m not sure that’s feasible. It would mean we have to switch a compiler pass to only work on certain PHP versions (doable, but not ideal), and core couldn’t use this API at all.

At that point, it may make more sense to build it for v12, then once we know what that looks like develop a backport extension for v11 for extensions that want it. (Assuming extensions can add Compiler Passes? Can they?)

layne.obserdia · August 17, 2021, 5:44am

They can. They can create a Services.php which can contain arbitrary code, including adding custom compiler passes. I use it to auto tag and register classes implementing interfaces into registries all the time. I’m too lazy to manually register them.

Concrete examples:
https://github.com/werkraum-media/thuecat/blob/v1.0.3/Configuration/
https://github.com/werkraum-media/thuecat/tree/v1.0.3/Classes/DependencyInjection

crell · August 17, 2021, 2:31pm

Ah, good good. I think my recommendation then (given that no one seems to be against this) is to implement it first for v12, figuring out the library to use and the other details, and once that’s merged make a backport extension for v11 that requires PHP 8(.1) and let people get used to that if they are so inclined.

crell · October 22, 2021, 9:50pm

Just a brief update here:

I’ve spent a lot more time working with the Symfony Serializer, and determined that it is NOT a viable alternative after all. It’s missing two key features:

Collecting data into an “other” property on deserialization, and flattening it back out on serialization. Given that the config array has a number of “add other stuff here” type areas, that’s a deal breaker. It’s also not possible to add, as far as I can tell.
It supports static discriminator maps for properties that are typed as an interface and can therefore hold multiple types of object. However, dynamic discriminator maps (e.g., based on what extensions are installed rather than hard coded into the class) are not supported. You can kinda-sorta write your own discriminator, but then you lose the static support, and the documentation for doing so is absolutely nonexistent.
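
To illustrate the first missing feature, here is a hand-rolled sketch of what "collect unknown keys into an other property, then flatten them back out" means. It does not use Serde or Symfony Serializer; the SysConfigSketch class and its method names are hypothetical, and the sample keys are only illustrative.

```php
<?php
declare(strict_types=1);

// Hypothetical config object with one catch-all bucket for unknown keys.
final class SysConfigSketch
{
    public function __construct(
        public string $sitename = '',
        public array $other = [],
    ) {
    }

    // Known keys become properties; everything else is collected into $other.
    public static function fromArray(array $data): self
    {
        $other = array_diff_key($data, ['sitename' => true]);

        return new self($data['sitename'] ?? '', $other);
    }

    // Flatten the bucket back out so the array round-trips unchanged.
    public function toArray(): array
    {
        return ['sitename' => $this->sitename] + $this->other;
    }
}

$input = ['sitename' => 'Demo', 'customFlag' => true, 'extra' => ['a' => 1]];
$config = SysConfigSketch::fromArray($input);
$roundTripped = $config->toArray();
```

A serializer that supports this natively does the same thing declaratively, so "add other stuff here" areas of the array survive a load and save cycle.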

For that reason, I’ve spent more time working on Serde and it’s coming along very nicely. It’s actually a bit closer to Rust’s Serde library than to Symfony Serializer. The library itself has tests that show off how it works. (See link in the original post.) It supports both of the features above, and then some. The only thing it doesn’t do, really, is serialization groups, although that could be added if we need. Right now it supports JSON, YAML, and arrays; I haven’t written an XML formatter for it yet, but I am confident that can be added when necessary.

It’s not quite done yet, but it already does everything we need for handling the conf array. Also, in my benchmarks it’s ~6 times faster than Symfony Serializer at the same task, and the codebase is vastly smaller, too. I have a direct comparison with benchmarks, and some practical examples of objects for the config system in a demo repo. Feel free to check both out. Naturally I’ll also write extensive documentation once I’m certain I won’t be changing the architecture anymore. (There’s still a few changes that may still happen.)

Once v12 is branched and updated to require 8.1, I will get back to a real patch against core to make use of this setup. I’m pretty happy with where it’s ended up.

crell · November 4, 2021, 5:06pm

And another update. On a lark, I decided to try applying the same deserialization process to TCA as I was applying to CONF VARS. And after some additional functional additions to Serde, it’s coming along nicely.

The demo repo has a SerdeTCA directory and matching test class that demonstrates how it would work. I don’t have the entirety of TCA modeled, but it does have a representative sample. Of particular note:

I support flattening/collecting of multiple object parameters in a single object. That means we can take a large set of keys and pull them apart into two separate object properties. Such as type and renderType on column config.
The previous works with type mapped interfaces, so we can have a RenderType interface and property, which gets populated with the appropriate render-type-specific class instance.
Fields that are a comma-separated string list of values can be deserialized into an array.
Similarly, fields that are a comma separated list of key=value pairs can be deserialized into an associative array.
Custom per-object serialization/deserialization is supported using PHP’s built in __serialize()/__unserialize() methods. That means Select field item lists, which are currently magic-position arrays (generally undesirable), can still be upcast to proper struct objects.
If desired, the same approach could also be used on the key-value strings to turn those into objects, as long as the keys are known and fixed.
Enums all over the place, because PHP 8.1.
Although it’s not shown in the demo repo yet, there is also support for post-deserialization callbacks. That can be useful for BC and translating one legacy format into another on-load.
It’s still several times faster than Symfony, though I’m pretty sure there are other performance optimizations that could be made if we want to push it.
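
The two string-to-array conversions in the list above amount to something like the following. These helpers are hypothetical and written from scratch; the thread does not show how Serde exposes the feature, so treat this purely as a description of the transformation.

```php
<?php
declare(strict_types=1);

// Hypothetical helpers, not Serde's API.

/** Turns "a, b,c" into ['a', 'b', 'c']. */
function listFromString(string $value, string $separator = ','): array
{
    if (trim($value) === '') {
        return [];
    }

    return array_map('trim', explode($separator, $value));
}

/** Turns "a=1, b=2" into ['a' => '1', 'b' => '2']. */
function mapFromString(string $value, string $separator = ',', string $assign = '='): array
{
    $map = [];
    foreach (listFromString($value, $separator) as $pair) {
        // array_pad guards against entries that have no "=" at all.
        [$key, $val] = array_pad(explode($assign, $pair, 2), 2, '');
        $map[trim($key)] = trim($val);
    }

    return $map;
}

$list = listFromString('a, b,c');
$map = mapFromString('a=1, b=2');
```

Doing this once at deserialization time means consumers receive real arrays and never have to call explode() themselves.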

I still want to add a few more features to Serde, in particular around field renaming and then more varied formats (XML, streaming JSON, etc.). And I still need to bikeshed the names of a few components in its design that I just don’t like. Either way, I think the potential here to objectify the legacy arrays in a BC-friendly way is enormous.

jonaseberle · November 5, 2021, 8:11am

Having an object structure will make working with configuration so much easier and more fun.

What could an API to set config values during early bootstrapping (AdditionalConfiguration.php, …) look like?

Something like

Container::get(LocalConfiguration)->frontend()->debug = $_ENV['...'];

?

Sorry if that should have been clear and I haven’t grasped that yet.

crell · November 5, 2021, 1:38pm

As I have it set up right now, the “mutate the array in bootstrap before it’s used” part doesn’t change. So AdditionalConfiguration.php, DefaultConfiguration.php, etc. don’t change at all, except that DefaultConfiguration.php may be unnecessary in the future as defaults get handled by the class definitions.

By design, the objects that get loaded are all read-only so that they cannot be modified post-load. (That keeps one extension from pulling the rug out from under another, or weird race conditions, etc.) The big advantage is for reading config from extension or core code, as that can be just injected objects with type-safe readonly properties.

The other advantage is that because you’re no longer reading from global arrays directly, we decouple the read and write sides. We could then change how values get set and stored to something else entirely, and then change the logic that translates that to the objects that get read, and the read-side API doesn’t have to change. It gives us more flexibility to evolve over time.

One possibility would be to move the current “build the array” logic to a build time step, rather than on every request. Then as part of build we take the resulting giant array, deserialize that to objects, and then reserialize that to a standard-format array that we dump into one single giant generated file on disk. Because that array is entirely on disk and immutable, its memory usage gets moved to shared memory rather than process memory. We also skip the “edit the array” step on every page request, although we’d still have the deserialization cost to convert that array to objects on every request. (Looks like that will be small but non-zero.)

jonaseberle · November 6, 2021, 12:17pm

Not using that for places where developers are currently building the array IMHO means we are passing on a great opportunity to improve developer experience. Because setting configuration is much more common than reading outside of the core.

If it is “only” for improved static code analysis, it is a bit overkill IMHO.

Regarding the configuration ecosystem in TYPO3: IMHO all the through-the-web configuration possibilities that are writing configuration should be dropped. For professional application those are more of a potential source of problems than anything else. This really includes all settings (including TypoScript and TsConfig). It is conceptually not sustainable IMHO. Having all settings structured with description in @phpdoc and proper type information will give us a good chance that developers are willing to adopt that.

That is just my opinion. But if we give the configuration structure and context (which is amazing!) I would like to keep that in mind.

crell · November 6, 2021, 4:30pm

I wouldn’t call it “only” for static analysis. There’s gobs of code throughout core (and I presume extensions) that reads from a global array, checks for missing values, checks for bad types, does some kind of trivial processing that’s repeated in 5 places, all to, essentially, see if a value is a 1 or a 2. By moving that data to typed defined object properties, most of that either goes away or becomes a method on a known object, making code throughout the system easier, smaller, and more self-documenting. AND then that object is injectable via DI, making more classes unit testable. I also see this as a useful component in revisiting content storage itself, in the future.

I am totally on board with revising the entire configuration system and philosophy system-wide. There’s a lot of work to be done there, and I am here for it. The main point at the moment is separating the read side from write side; we want read-only objects on the read-side, but readonly objects make the write-side kind of a problem. What the ideal answer is there I’m not sure yet, but the goal for now is essentially divide-and-conquer.

(As a random example: I am fairly certain from the work I’ve done so far that between AttributeUtils and Serde, we can take an arbitrary object and produce a YAML version of it, which then serves as a template for a config file version of that configuration. Poof, we’ve just ported the config from PHP arrays to YAML. Is that what we want to do long term? I don’t know, but this approach puts that on the table, along with various other options we can consider.)

jonaseberle · November 7, 2021, 9:57am

I prefer PHP honestly. But YAML with Expression Language is ok, too. It’s just not that accessible.
A common use case for configuration is setting it from or conditionally depending on environment variables.

We don’t really need writing configuration back to persistence (yet - but I love your thinking in regards to content storage or general serialization/hydration) IMHO. But being able to change configuration objects during application bootstrap would be nicer for developers than working with arrays.

crell · November 7, 2021, 3:05pm

> A common use case for configuration is setting it from or conditionally depending on environment variables.

See, I’d argue the Symfony approach is better there: Have separate files (in whatever format) per environment, which then get merged together into a single per-env cached configuration. (Could be array on disk, could be part of the DI system, lots of ways to shave that yak.) Anything that is run on every single request in bootstrap that doesn’t absolutely have to be is an unnecessary performance hit, and architectural complication.
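
A very rough sketch of that "merge per-environment files once, then cache" idea, using plain arrays. This is not Symfony's actual implementation; the function name and the sample keys and values are assumptions for illustration.

```php
<?php
declare(strict_types=1);

// Hypothetical build step: layer per-environment overrides on top of a base
// array once, so the merged result can be cached instead of recomputed.
function buildEnvironmentConfig(array $base, array $overridesByEnv, string $env): array
{
    // Later values win; nested arrays are merged key by key.
    return array_replace_recursive($base, $overridesByEnv[$env] ?? []);
}

$base = [
    'FE' => ['debug' => false],
    'MAIL' => ['transport' => 'sendmail'],
];
$overrides = [
    'development' => ['FE' => ['debug' => true]],
    'production' => ['MAIL' => ['transport' => 'smtp']],
];

$dev = buildEnvironmentConfig($base, $overrides, 'development');
$prod = buildEnvironmentConfig($base, $overrides, 'production');
```

One caveat with array_replace_recursive() is that numerically indexed lists are replaced index by index rather than appended, so list-like config values would need their own merge rule.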

The challenge with exposing mutable objects for bootstrap manipulation, although they would certainly be more self-documenting, is that we then cannot make them immutable at runtime. There’s a lot of safety that comes from knowing another extension cannot change the config out from under you at a random time, and IMO we want that.

Technically it would be possible to have mutable and readonly versions of config objects (or TCA), pass the mutable one around in one phase, and then transfer it to the RO version to pass around in later phases. However, because PHP object properties are invariant that would require writing two complete sets of objects, or using some form of code generation to produce the alternate version. (Technically possible, but icky.) That also adds still more overhead on bootstrap, which we likely want to avoid. If we really wanted to go that route, I’d say that mandates a compile-time mutable variant that ends up in a materialized array, and then a runtime readonly variant that is all that’s used in a request. I’m still skeptical of that, however, for the complexity it introduces.
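
To show why that means two sets of objects, here is a minimal sketch of a mutable variant that gets frozen into a readonly one (PHP 8.1 or later). The class names, the cacheLifetime property, and the freeze() method are hypothetical.

```php
<?php
declare(strict_types=1);

// Read side: what request-time code would receive.
final class FrontendConfig
{
    public function __construct(
        public readonly bool $debug,
        public readonly int $cacheLifetime,
    ) {
    }
}

// Write side: every property has to be declared a second time.
final class MutableFrontendConfig
{
    public bool $debug = false;
    public int $cacheLifetime = 0;

    // Copies the current state into the immutable variant.
    public function freeze(): FrontendConfig
    {
        return new FrontendConfig($this->debug, $this->cacheLifetime);
    }
}

$draft = new MutableFrontendConfig();
$draft->debug = true;     // allowed during the bootstrap/compile phase
$config = $draft->freeze();

try {
    $config->debug = false; // rejected once frozen
    $changed = true;
} catch (Error $e) {
    $changed = false;
}
```

Even in this two-property case the duplication is visible, which is the maintenance cost being weighed against the nicer write-side API.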

helhum · November 10, 2021, 10:19am

I’m all in with revising the configuration system and I agree that the global array access should be replaced with something different in the future. I also agree that this needs to be done in a “divide-and-conquer” fashion.

What I disagree with, though is starting the topic with representing global config arrays with objects.

We are (still) lacking a point in the bootstrap where we can assume that configuration is final.
And that is already something that needs to be solved before we can change how configuration is read.

There is the point where DefaultConfiguration, LocalConfiguration and AdditionalConfiguration are read, but later on (sometimes but not always) ext_localconf.php files are read. These files (work is ongoing to make them obsolete) are still the only way for extensions to add or change configuration. So we can’t drop that, but we also can’t just have one immutable config array at the moment. So before taking any action to improve how configuration is accessed, we must resolve the issue of how configuration is provided.

If that is solved, we could start with the reading side. However, I would still think that working on an interface for providing configuration for different environments and/or for providing credentials would add more value to the system than changing the way configuration is pulled from an array (in the end configuration will have to be represented as a bunch of key-value pairs).

So thumbs up for the initiative, that part looks like a nice approach, which we can keep in mind. Let’s start with revising how configuration is handled in TYPO3 12.

crell · November 15, 2021, 4:51pm

Hi Helmut.

From talking with Benni, it sounds like ext_localconf.php etc. are already slated to be replaced with Events in v12. Either way, there’s still the point of those having been run (or their replacement events having fired), after which it should be safe to say that the config is “frozen” and anyone using it after that bootstrap step just doesn’t get to mess with it. That would make an object front-end safe for 99% of the codebase, even if there’s a few places in bootstrap itself where it may not be.

As for revisiting all of configuration, yes, very much needed! Which we tackle “first” depends on our desired outcome and what the new overhauled configuration system may look like. I am churning some ideas in my head, including different possible directions, which I will be posting about shortly (later this week, probably). I’ll link it here when it’s written. In short, we can take an incremental approach, in which case starting with this translation layer is probably good, or we can take a Go Big Or Go Home approach, in which case this kind of translation would be just a part of a larger design that may have more cumbersome BC implications. Classic risk/reward trade off.

helhum · November 15, 2021, 11:44pm

A lot of things are replaced with Events already and a lot more will. Providing configuration very likely won’t. While this would be theoretically possible, I don’t think it makes much sense to do so. Which leaves us with the issue, that we have configuration before and after loading those extension files.

The entire code in the install tool works with configuration without ext_localconf being loaded. Even worse, there is code that runs without them and later on reads those extension files.
The same is true for a couple of crucial CLI commands.
Immutable config objects for reading with the current config system will only work out when building two immutable versions: one for before and one for after ext_localconf.php files are loaded. And that is exactly the kind of hassle I would like to avoid.

I’m all in for incremental approaches. That is what we have practiced for years now in different areas. That doesn’t mean, though, that all incremental steps are always completely independent of each other. One of those areas, by the way, is getting rid of ext_localconf.php. The last step for getting rid of those files is to introduce a clean way for extensions to provide (system) configuration. It is a relatively small step independent from making configuration immutable, but still needs to be done beforehand to avoid the immutable config step becoming too messy.

After that I personally would still prioritise spending effort on conceptual work for a revised config system (for which concepts and production-proven code, by the way, have already existed for a long time), because I think it will benefit a broader audience, but true, at this point both could be done mostly independently of each other.

crell · December 6, 2021, 4:37pm

Follow ups to this thread:

Part 1: https://decisions.typo3.org/t/the-state-of-configuration/723
Part 2: https://decisions.typo3.org/t/configuring-a-better-configuration/727