Configuring a better configuration

crell · December 6, 2021, 4:36pm

In our last exciting episode, we laid out a picture of what TYPO3’s configuration story is today in v11. The discussion thread pointed out a lot of other axes of variability that I hadn’t considered, which is great. Today, I want to offer some thoughts on how it can be re-imagined to be better.

The most important question is “what do we want in a configuration system?” Or, put another way, “what capabilities do we want to explicitly say are not supported and not included, by design, and we’re OK with that?” (Design is the art of deciding what to say No to.)

Depending on what we want to prioritize, there are a number of different directions that we could take. What I lay out below are a few possible paths and their trade-offs. I do not claim they are the only possible paths, just the key ones I can think of. The intent is to encourage ideation and debate about what trade-offs we want to make, explicitly.

Core requirements

To start off, these are design decisions that I firmly believe we should take, no matter what other alternatives we build on top of them.

PHP class schema

All configuration should be defined canonically by PHP classes. Properties of those classes must be typed, and have defaults specified if appropriate. Descriptions of each property are the purview of docblocks, which can still be parsed by the system to auto-generate GUIs as appropriate. This provides a clear schema, type safety, self-documentation, and if used correctly scope-locality. There is also a fair bit of automation tooling already available, and my work recently on Serde makes loading into and out of classes to standard formats both fast and easy.
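To make that concrete, here is a minimal sketch of what such a schema class might look like. The class name, properties, and defaults are all hypothetical, and it assumes PHP 8.1 for readonly properties.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical config object for an image-processing extension.
 */
final class ImageProcessingConfig
{
    public function __construct(
        /** Which graphics backend to use, e.g. "gd" or "imagemagick". */
        public readonly string $backend = 'gd',
        /** JPEG output quality, from 1 to 100. */
        public readonly int $jpegQuality = 85,
        /** File extensions that may be processed. */
        public readonly array $allowedExtensions = ['jpg', 'png', 'gif'],
    ) {}
}

// Only the values that differ from the defaults need to be supplied.
$config = new ImageProcessingConfig(jpegQuality: 70);

// The schema is discoverable through reflection, which is what tooling
// (serializers, form generators) would build on.
$property = new ReflectionProperty(ImageProcessingConfig::class, 'jpegQuality');
$type = (string) $property->getType();
$description = $property->getDocComment();
```

The types, defaults, and descriptions all live in one place, which is the point of using the class as the canonical schema.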

Clustered configuration

As a twin recommendation to the previous point, all configuration is provided as classes/objects. There are no free-floating configuration values that are bare primitives. Everything is a config object of some kind or another, and a class or other component that wants a piece of configuration accepts it as a constructor-injected object.
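As a rough illustration of what constructor injection of a config object could look like (both class names are made up for this sketch, and in practice the DI container would do the wiring rather than a manual new):

```php
<?php
declare(strict_types=1);

// Hypothetical config object: the whole cluster is injected, never a bare value.
final class MailConfig
{
    public function __construct(
        public readonly string $transport = 'smtp',
        public readonly string $fromAddress = 'noreply@example.com',
    ) {}
}

// Hypothetical service that needs mail configuration.
final class Mailer
{
    public function __construct(private readonly MailConfig $config) {}

    public function describe(): string
    {
        return sprintf('%s via %s', $this->config->fromAddress, $this->config->transport);
    }
}

// Manual wiring, standing in for what the container would do.
$mailer = new Mailer(new MailConfig(transport: 'sendmail'));
echo $mailer->describe(), PHP_EOL;
```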

There will no doubt be cases where that means an extension has a config object that has only one or two properties. That’s fine. Recall Garfield’s Law: “One is a special case of many.”

In other cases, a given config object may have many nested objects as properties. That’s also fine. Those objects then are not directly accessible outside of that config object. (That is, you cannot XPath your way into some deep property or similar.)

This also means one configuration object must not depend on another configuration object. If you need that, you need a single deeper object. (This may or may not be a problem; it depends on some nuances of what configuration we have that I haven’t figured out yet.)

No core privilege

Core-provided extensions SHOULD NOT be treated any differently than user-added extensions. From the configuration system’s point of view, they should be identical.

Invariably, providing different APIs depending on whether something is pre-bundled or not leads to inconsistencies, bugs, difficulty moving code from packaged to not or vice versa, and creates all kinds of additional conditionals (try saying that five times fast) along critical code paths. The fewer conditionals we have, the more stable and predictable the system.

That means TYPO3_CONF_VARS['EXTENSIONS'] has no equivalent and goes away entirely, because anything that would live there ends up on the same “level” as core-provided configuration.

A given extension should be able to provide any number of config objects as it sees fit: 0, 1, or lots.

Environment is its own configuration

Environment-specific configuration should be handled by a completely different system that does not intersect with Configuration proper in any way. I think the most straightforward place to put it would be to add it to the existing Environment class, but also convert that class to an object instance.

My recommendation here is to define a file (or just repurpose AdditionalConfiguration.php) where certain very specific globals can be set, although they are only global in that file, not overall. (That can be done by pre-declaring the variables inside a function, then including the file.) If those variables are not set, pull their values from well-defined environment variables. Then add in a .env library (there are plenty; pick any) for development purposes, and we offer a variety of ways to define those values as a given host requires.

Those values are not exposed as globals; they are exposed only through the Environment class, or another similar class (or classes?) built for that purpose. The exact structure here I’m not certain about, but that requires a bit more detailed analysis than I want to get into here. Either way, it moves environment-sensitive configuration out of the non-environment-sensitive configuration, which is the goal.
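A minimal sketch of the "pre-declare inside a function, then include" trick with an environment-variable fallback might look like the following. The function name, the variable names, and the TYPO3_DATABASE_URL / TYPO3_MAIL_DSN environment variables are all assumptions for illustration, not existing TYPO3 names.

```php
<?php
declare(strict_types=1);

/**
 * Loads environment values from an optional PHP file, falling back to
 * environment variables. All names here are hypothetical.
 */
function loadEnvironment(string $file): array
{
    // Pre-declare the variables the file is allowed to set. Because the file
    // is included inside this function, they are local, not true globals.
    $databaseUrl = null;
    $mailDsn = null;

    if (is_file($file)) {
        include $file;
    }

    // Anything the file did not set comes from the process environment.
    return [
        'databaseUrl' => $databaseUrl ?? (getenv('TYPO3_DATABASE_URL') ?: null),
        'mailDsn' => $mailDsn ?? (getenv('TYPO3_MAIL_DSN') ?: null),
    ];
}

// Demo: the file sets one value, the environment supplies the other.
$file = tempnam(sys_get_temp_dir(), 'env');
file_put_contents($file, "<?php\n\$databaseUrl = 'mysql://localhost/site';\n");
putenv('TYPO3_MAIL_DSN=smtp://mail.example.com');

$env = loadEnvironment($file);
unlink($file);
```

The returned array would then feed the Environment object (or whatever class ends up exposing these values) rather than being read directly.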

Environment configuration makes no sense to make GUI-editable, so we shouldn’t even bother.

Environment-type overrides included natively

Dev/stage/prod toggles are an important enough consideration that support for them should be built into the system from the start. It’s not something site builders should have to reinvent themselves. Build in such override capability.

Systems like Symfony already provide a useful model to mimic here, so let’s do that.

Compiled representation

We SHOULD make use of configuration pre-processing. Just as the container is built at build time into generated files, we can and should do the same for configuration to produce a high-read-throughput representation. This generated code can and should live in its own writable directory that is never committed to Git.

My recommendation here, since all configuration is PHP objects, is to leverage var_export(), combined with a simple trait or pair of traits (see below) that will allow those objects to be re-read back into PHP. I don’t think it is possible to get a faster read time.

Aside: `var_export()`

PHP’s little-used var_export() function takes an arbitrary PHP value and generates the PHP code that will reconstruct that value. That works great for simple values, but for objects it’s slightly more complicated. The generated code calls a static magic method, __set_state(), that acts as a named constructor to reproduce the object from the data snapshot. For an object to be re-readable, it has to implement that method itself.

Fortunately, in most cases, the logic for such a method is fairly basic and can be easily encapsulated into a trait. There are two cases to consider. The generic should work with nearly any object:

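trait Hydrateable
{
    public static function __set_state(array $data): static
    {
        // Cache one reflector per concrete class. Since PHP 8.1, static
        // variables in an inherited method are shared with the parent class,
        // so a single cached reflector would be wrongly reused by subclasses.
        static $reflectors = [];
        $reflectors[static::class] ??= new \ReflectionClass(static::class);
        $new = $reflectors[static::class]->newInstanceWithoutConstructor();
        // Assign the exported properties directly, bypassing the constructor.
        foreach ($data as $k => $v) {
            $new->$k = $v;
        }
        return $new;
    }
}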

Alternatively, the following trait is, I think, a bit faster in the special case where all properties correspond to constructor arguments. With the advent of constructor property promotion (PHP 8.0), named arguments (8.0), and readonly properties (8.1), that’s going to be a fairly common case, especially for configuration objects.

trait ConstructorHydrateable 
{
    public static function __set_state(array $data): static
    {
        return new static(...$data);
    }
}

That means we can require that any config object implement __set_state(), provide the two above traits to cover the 99% case, and then persist all config objects by var_export()ing them to files that we can just include later to rehydrate the configuration system as needed.
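For anyone who hasn’t used var_export() this way, here is a small round-trip sketch. CacheConfig is a hypothetical class, the temp file stands in for the generated directory, and PHP 8.1 is assumed for readonly.

```php
<?php
declare(strict_types=1);

trait ConstructorHydrateable
{
    public static function __set_state(array $data): static
    {
        return new static(...$data);
    }
}

// Hypothetical config object used only for this illustration.
final class CacheConfig
{
    use ConstructorHydrateable;

    public function __construct(
        public readonly string $backend = 'file',
        public readonly int $lifetime = 3600,
    ) {}
}

// "Build time": write the object out as PHP code that returns it.
$file = tempnam(sys_get_temp_dir(), 'cfg');
$code = '<?php return ' . var_export(new CacheConfig(lifetime: 60), true) . ';';
file_put_contents($file, $code);

// "Runtime": including the file calls CacheConfig::__set_state().
$loaded = include $file;
unlink($file);
```

The generated file contains nothing but a call to CacheConfig::__set_state() with an array literal, so with opcache enabled the include should be about as cheap as loading code gets.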

Open questions

The following are still-open questions that exist regardless of the direction we take. I don’t yet have an answer or strong recommendation on them. Input very welcome.

TCA

At this point, I do not want to consider TCA directly. TCA is better looked at on its own as part of a re-evaluation of the content modeling story, which is a separate line of work. That re-evaluation may end up leveraging a more generic configuration system as described here, or it could be something else entirely. For now, I strongly recommend we just punt on it.

Human-editable file format

If we take a route that includes human-editable files on disk, we need to decide on the format to use for the configuration files. The obvious possibilities are arrays, JSON, YAML, and XML. TOML is technically also an option these days. (Serde can already handle arrays, JSON, and YAML; XML and TOML I am confident could be written and are already on my todo list). All have their pros and cons.

Frankly, I am not entirely sold on any of them, mainly for their lack of self-documentation. As noted in the last installment, XML is the most self-documenting but is not particularly popular for other reasons. JSON lacks comments. In YAML, it’s easy to screw up the whitespace and it has no schema system of its own; sometimes people borrow JSON Schema but that’s an imperfect fit and IDE support is middling.

One fun trick to improve usability here is to take the PHP object with only default values specified (assuming that creates a valid object, i.e., everything has a default), and serialize that out to whatever format is used. That creates a convenient, automated template for new configuration files.
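A quick sketch of that template trick, using plain json_encode() as a stand-in for whatever serializer and file format we end up choosing (SearchConfig is a hypothetical class):

```php
<?php
declare(strict_types=1);

// Hypothetical config object in which every property has a default.
final class SearchConfig
{
    public function __construct(
        public readonly string $engine = 'database',
        public readonly int $resultsPerPage = 20,
        public readonly bool $highlight = true,
    ) {}
}

// Instantiate with defaults only and serialize the public properties.
// The result is a ready-made starting point for a new config file.
$template = json_encode(new SearchConfig(), JSON_PRETTY_PRINT);
echo $template, PHP_EOL;
```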

This all also means having file-per-config-object, rather than a single file for all the things. I believe that is a good thing.

Load-time config alteration

I am undecided if we really want to allow extensions to manipulate other extensions’ configuration arbitrarily. Currently that’s technically possible via AdditionalConfiguration.php, but it becomes considerably more complex when dealing with discrete loaded objects rather than a huge array blob. In particular, it may be incompatible with using readonly properties, which would be a huge win for avoiding “Spooky action at a distance” (SAAAD).

This also impacts the question of registering additional options (eg, adding new image backends), which I also don’t have a clear answer to yet.

I would prefer to avoid this if possible, but I am not sure if that’s feasible. Input welcome here from people doing more esoteric things.

Site-specific configuration

Several people in the previous thread brought up the question of site-specific configuration, or configuration overrides. That becomes another axis of override in addition to environment type, and thus a multiplicative number of possible places configuration should come from. That also adds a question of which axis has priority. If there is an override for the graphics system, for example, for “dev sites” and for “the math department” site, then what do you load when viewing the math department in a dev environment?

If there is a need for a third axis as well (I don’t know what that would be, but I don’t believe for a moment that two is the maximum number that could exist), this question balloons very quickly. I do not yet have a good answer for it, but if we want to support more than just env-type based overrides we will need a generic answer.

It also means that any axis on which configuration can vary cannot itself be part of the configuration system. That is, if we allow configuration to vary by site, then sites are not configuration. They need to be some other file-defined, no-UI concept.

I believe we really should consider an answer here of “disk is cheap, make a new install.”

Per-page configuration

This is an override axis I believe we absolutely should not support. Not because there aren’t use cases for it, but because per-page data… isn’t configuration in the first place. It’s content. Content can and should vary by page, and the content model can be a lot more robust than what TYPO3 has today. But that is a separate topic for another time, and not part of configuration.

I submit that per-page data is not configuration and thus out of scope. There are other ways that should be done.

Alternate futures

With the above recommendations in place, there’s a few broad routes we could take. All of them have trade-offs. Even if we include more override axes, these basic models would still apply.

Admin-only configuration

This is similar to the route used by many frameworks. In this case, configuration is controlled by admin-edited files on disk (in some format, YAML, XML, array, whatever). These are, at build time, loaded into memory one at a time, mapped to a typed object, and then that object is var_export()ed out to a generated directory. (This could be a separate file per object, or a single file with a keyed array of all objects, or some kind of clustering. That’s an implementation detail we don’t need to think about for now.)

Then at runtime, through the DI container we allow classes to be loaded by name; loading a given class becomes just including its exported file, which hydrates the object as fast as PHP is capable of. That object is then cached as any other container instance and injected into whatever classes want it.

The build process loads configuration objects from a series of places, using whichever one is found last. (Likely it would scan in reverse order, but in the list below later items override earlier items.)

The Configuration directory of the extension that defined it.
A site-specific Configuration directory. (Similar to typo3conf, but a directory that is checked into Git.)
An environment-type-specific Configuration directory. Likely this is a sub-directory of the main site-specific directory.

This does mean defining a global value to indicate what “environment mode” the site should be built in.

Additionally, for simplicity I would suggest that config objects are only overridden completely. If you override a single value, you have to duplicate the whole file. That makes the loading process easier than trying to do a deep merge on the files first before materializing them. That’s doable, but it’s also trickier. It also makes it more difficult for users to figure out where a given value is coming from. A similar whole-file override is common in templating engines, too, where you override a whole template file at once.
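The lookup order and whole-file override described above could be sketched roughly as follows. The function name, the directory layout, and the .json extension are assumptions for the example; the real build step would go on to deserialize each winning file into its typed object.

```php
<?php
declare(strict_types=1);

/**
 * Returns the file to use for each config name, scanning directories in
 * order so that later directories replace earlier ones wholesale.
 *
 * @param string[] $directories Lowest priority first.
 * @return array<string, string> Config name => path of the winning file.
 */
function resolveConfigFiles(array $directories): array
{
    $winners = [];
    foreach ($directories as $dir) {
        foreach (glob($dir . '/*.json') ?: [] as $path) {
            // Whole-file override: no merging, the later file simply wins.
            $winners[basename($path, '.json')] = $path;
        }
    }
    return $winners;
}

// Demo layout: extension defaults, a site directory, and a "dev" sub-directory.
$base = sys_get_temp_dir() . '/cfg_' . uniqid();
mkdir("$base/extension", 0777, true);
mkdir("$base/site/dev", 0777, true);
file_put_contents("$base/extension/mail.json", '{}');
file_put_contents("$base/extension/cache.json", '{}');
file_put_contents("$base/site/dev/mail.json", '{}');

$files = resolveConfigFiles(["$base/extension", "$base/site", "$base/site/dev"]);
```

Because nothing is merged, answering "where does this value come from?" is a matter of looking at which single file won.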

Benefits

It’s simple. As configuration systems go, this is a fairly straightforward model.
Because the canonical version of all configuration is files on disk, deploying configuration changes is trivial. git commit && git push, then rerun the build process. The build process can also be database-free, making it safe to run in a cloud host’s build step.
It means changes cannot be made in production and then forgotten, to be overwritten later.
It’s fast. I don’t expect anything to read faster than var_export()ed objects except raw arrays, and then only if we read the raw arrays directly, which we want to avoid for DX reasons.
We can use readonly properties on objects. That allows us to expose the properties directly (faster than method reads, and less work to implement than a bunch of 99% pointless getter methods) without risking SAAAD.
All configuration is available without the database. That’s a win for performance, simplicity, and avoiding weird circular dependencies.

Drawbacks

There is no mechanism for editing configuration through the GUI. At all. For some configuration that may be seen as a regression. For some cases, I expect editing a config file (if well defined) to be much easier than navigating through a GUI. For other cases, it’s likely the other way around. But this approach does lock us into no-GUI configuration. (If you need to step outside of that for whatever reason, you step out of all support for the common config system and you’re on your own.)
I do not know what this does to the Install Tool/Rescue Tool. Aside from cache clearing, it would not be able to edit most configuration because most configuration is not editable to begin with. Whether this is a bad thing, or a good thing because configuration is then harder to screw up in the first place, is a debatable question.

Admin-only with database overrides

This is an extension on the previous model. Same basic idea, but individual config objects MAY be overridden via an admin GUI. I suspect the GUI can be auto-generated in most cases just based on the type information, give or take a few extra attributes. The edited version of the config object is saved into the database, serialized to JSON (easy to do with Serde), in a JSON column in a simple key/value table. When loading an object, there’s another override “layer” beyond what is listed above of “check the database for it.”
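As a very rough sketch of that extra layer: the function below takes the object already built from files and replaces it if an override exists. The plain array stands in for the key/value table, and BrandingConfig and applyDatabaseOverride() are hypothetical names. It also returns the "is overridden" status the admin form would need.

```php
<?php
declare(strict_types=1);

// Hypothetical config object.
final class BrandingConfig
{
    public function __construct(
        public readonly string $siteName = 'My site',
        public readonly string $tagline = '',
    ) {}
}

/**
 * Adds the database "layer" on top of the object built from files.
 * $overrides stands in for the key/value table: class name => JSON string.
 *
 * @return array{0: object, 1: bool} The config object, and whether it is overridden.
 */
function applyDatabaseOverride(object $fromDisk, array $overrides): array
{
    $class = $fromDisk::class;
    $json = $overrides[$class] ?? null;
    if ($json === null) {
        return [$fromDisk, false];
    }
    $data = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
    // The stored JSON holds the whole edited object, so it replaces the
    // on-disk version entirely rather than being merged into it.
    return [new $class(...$data), true];
}

$overrides = [BrandingConfig::class => '{"siteName":"Math Department","tagline":"Numbers!"}'];
[$config, $overridden] = applyDatabaseOverride(new BrandingConfig(), $overrides);
[$untouched, $flag] = applyDatabaseOverride(new BrandingConfig(), []);
```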

The generated admin form would need an indicator for two statuses:

You’re about to override configuration
This configuration is already overridden and you’re editing the overridden version

It would also need a “Clear overrides” button to toss the copy in the database and go back to what is on disk.

Optionally, we could have a flag (attribute or interface, TBD) on a config object to either opt-in or opt-out of having an admin form auto-generated, so that some config objects are still admin-editable-on-disk only.

Benefits

This gives us something of a middle ground between file-based and GUI-based configuration.
For the limited cases where configuration makes sense to have overridable per-instance, it’s fairly easy to do.
The overall system is still fairly simple.

Drawbacks

Reading from the database is always going to be slower than reading from disk, and in practice most objects won’t be there. That means an awful lot of extra DB traffic to find no records. (There are likely optimizations to make here to make it not too bad, but there is unavoidably going to be overhead.)
Any overridden configuration is un-syncable. If you configure something in your dev environment, getting that over to staging and production means manually replicating your changes in files.
It may still be possible to use readonly properties, but it would mean doing the form handling in very specific ways, or building/automating some kind of with-er set of methods. Some heavy duty code generation is also possible, but then we’re dealing with heavy duty code generation. There’s potentially a lot of complexity lurking here.
Access control for who can edit what config objects becomes a thing we need to worry about.
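To give a feel for the "with-er" idea mentioned above, here is one possible shape for it. It only works in the special case where every property is also a constructor argument; the trait and class names are made up, and PHP 8.1 is assumed.

```php
<?php
declare(strict_types=1);

trait WithProperties
{
    /** Returns a copy with the named properties replaced. */
    public function with(mixed ...$changes): static
    {
        // get_object_vars() sees every property from inside the class; the
        // merged array is then passed to the constructor as named arguments.
        return new static(...array_merge(get_object_vars($this), $changes));
    }
}

// Hypothetical config object whose properties all match constructor arguments.
final class ThemeConfig
{
    use WithProperties;

    public function __construct(
        public readonly string $name = 'default',
        public readonly string $accentColor = '#ff8700',
    ) {}
}

// A form handler would build the edited copy instead of mutating the original.
$original = new ThemeConfig();
$edited = $original->with(accentColor: '#336699');
```

The original object is never modified, so readonly stays intact; the cost is that form handling has to collect all changes and produce a new object in one step.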

Full round-trip

This model is largely inspired by Drupal 8/9. It does something essentially similar to this, but with a lot less syntactic structure. (To be fair, PHP’s object syntax was far weaker in 2014 when it was designed.)

In this case, the canonical source for configuration objects is a JSON column in the database. That may be dumped out to exported objects as above for performance, but the canonical source is the database.

Extensions ship with their config class(es), which include the default values for any property. The system then loads those on extension install and promptly serializes the objects to JSON to dump into a key/JSON-blob table.

Admin GUIs can be auto-generated for most objects (again, modulo some additional attributes, etc.), allowing for most configuration to be user-editable.

There is some trigger that can be fired that exports all configuration objects, or some subset of configuration objects, out to a directory in some file format (YAML, XML, etc.). That directory can then be checked into Git and deployed. On the receiving end, the import process consists of deserializing all of those files to their respective objects, then immediately serializing them back out to JSON in the database, overwriting whatever is there.

Config continues to be read from the DB or from a materialized PHP export file. That directory full of files is just a syncing tool, and is not intended to be user-edited.

In Drupal’s case, the whole directory is exported/imported at once, and everything in the DB is wiped first. That way, deleted config objects (the instantiated objects mentioned previously) are cleared along the way. The alternative would be some sort of tombstone record instead, although how long we keep those around is an interesting question.

Drupal also discovered after the fact that some config does need to be overridden at runtime, in order to handle environment-specific configuration. (E.g., setting up a Solr index is done via the UI, but then the server URL needs to be overridden from environment config.) This part of the system is a total mess, and why I firmly believe we should have an entirely separate environment system, thus avoiding this situation entirely.

Reading configuration in this approach is the same as the two previous: All configuration objects are available through the DI Container to be constructor-injected as needed, loaded from whatever backend we end up deciding is most flexible and performant.

Potentially, it may be useful to have a flag in production (or some set of environments) to disable the GUIs, so that changes cannot be made in prod where they couldn’t be exported anyway. (The export directory is, ideally, readonly on disk.) However, that then raises the question of where that flag is stored if not in configuration. Part of the environment variables, perhaps?

Benefits

This has the most flexibility. It offers both GUI editing and deployability, which is a hard combination to get.
If done correctly with an export-materialized file, the runtime performance can still be pretty good.

Drawbacks

This is by far the most complex approach. More moving parts means more complexity, which means more places for bugs and design flaws.
All that syncing introduces new workflows for site admins they may not be comfortable with.
A lot of the configuration probably really shouldn’t be user-editable, as the odds of it breaking things are high. That means we would still need some kind of opt-in/out flag.
As with the previous approach, this makes having an immutable object much more difficult. I think in practice we’d have to either skip readonly and go with straight public properties or get/set pairs (eew, gross, plus potential for SAAAD), or go for heavy code generation to auto-create write-versions of config classes. The problem there is that you cannot change the readonly status in inheritance, so the read and write versions of a class could not be compatible types. It also means duplicating any methods. This gets complicated very fast.
As with the previous approach, this opens up a large box of access control questions to consider.
I am not entirely clear how the DB interacts with doing environment-type-specific overrides. It’s something we would need to implement, but I don’t understand how it would work, other than having env-specific files take precedence over the database, which means the GUI needs to be able to detect that and disable the form/switch it to read-only or whatever. Still more complexity.

Analysis

While I think the problems in approach 3 (full round trip) are surmountable, they are definitely not small. That would be the most work, and the most complex work, and could undermine many of the benefits we want to get from using PHP classes directly in a readonly way.

Currently, the majority of configuration, as far as I am aware, is not user-editable. There’s a lot more that is TYPO3_CONF_VARS only than is exposed in the UI. So how much of a regression it is, I’m not sure.

TYPO3, historically, has put a lot of its configuration in files to begin with. That makes options 1 or 2 more viable than in a system like Drupal or WordPress where GUI-config is their entire selling point, so dropping that is not even worth considering. That doesn’t mean we have to take the file-only approach, but it’s an option where it wouldn’t be in many other systems.

Whither TypoScript

Something I have not touched on at all so far is TypoScript. TypoScript is in a sort of weird place where it is a proprietary declarative config system and a proprietary declarative template engine at the same time. For now, I’d prefer to avoid dealing with the template engine side. The configuration side, however, should transition over to whatever the new system looks like.

Multiple systems?

In the previous thread, someone mentioned splitting “configuration” (canonically in files on disk, no GUI, site-admin-editable-only) from “settings” (canonically in the DB, has a pretty GUI).

That’s an interesting way to approach it. At least some of the same tooling could be used under the hood (Serde, any sort of var_export() materialization, etc.). However, I am skeptical of the amount of work involved.

I am also skeptical about our ability to draw a line in the sand and say “this is admin editable and deployable, this other thing is GUI-editable and not deployable, that choice has been made for you, deal.” For people used to big-amorphous-blobs, that is likely going to be met with “you made the wrong decision on this object!” (probably about every object by at least someone).

One possible way to think about that is to take approach 2, with an opt-in. Everything is on disk, everything uses a materialized object, but select objects can opt-in to a GUI and thus lose deployability. My concern here is extension developers saying “I want maximum flexibility, and GUIs are shinier, so I’ll opt everything into the GUI,” and suddenly we have nothing that can be safely deployed. That’s not a good place to be.

Also, as noted above: Per-page data is not configuration, it is content, and should be treated as such. (And thus use a content lifecycle workflow, whatever we determine that to be, later.)

Transition

Naturally, this is all very different than TYPO3 today. Fortunately, I think it is sufficiently better to justify the change. It also offers an opportunity to re-imagine the current TYPO3_CONF_VARS structure. This is an opportunity to break the configuration up into more discrete, logically-clustered objects that can be more explicitly defined, with a shallower design (rather than the extremely deep and unwieldy SYS block).

In practice, that likely means that for v12, the TYPO3_CONF_VARS array continues to exist but is not read directly; instead, its values are mapped over to the new system at build time (details hand-wavy at the moment) if the “new style” configuration files are not provided. Then it can be removed in v13 (or maybe v14 if we feel the mapping code is sustainable, TBD).

DefaultConfiguration.php and DefaultConfigurationDescription.yaml go away entirely, as they become just the default values provided in the config PHP classes.

LocalConfiguration.php would also go away, long-term, and be replaced by “a directory of whatever files we decide to have.”

AdditionalConfiguration.php changes purpose to be just for environment value glue code, as noted above. I am not sure yet exactly how we want to tie that to the Environment object. It may make sense to have a clean-break and just have a new file entirely to replace it, allowing all three of these files to continue to exist for v12 to support TYPO3_CONF_VARS until that can be removed in v13.

Conclusion

I thank everyone for their time in reading this lengthy analysis. I like to be complete, but I’m sure I have missed something along the way, because humans. My intent here is to provide a solid ground on which we can discuss what we want the future to look like.

At the moment, I am personally leaning toward option 1 (file only), possibly with the extension of option 2 (with DB override that is non-deployable). I’m not, yet, convinced that the complexity of option 3 is worth it.

It’s entirely possible that there’s a 4th approach I haven’t thought of, but hopefully the above will get the thinking juices flowing. If you have an alternate suggestion, post it below and I’ll try to think through its implications. The goal here is to build a consensus around what option to pursue, and therefore what features we are willing to commit to not having, so that we can then go build it with a consistent vision. This is definitely going to require a team to build it, so we need to get the team in sync.

I now don my flame retardant suit.

masi · December 8, 2021, 9:41pm

A quick one on format: in the Node world some packages use JSON5. It simply adds the few things that turn JSON from a transfer format into a configuration format. If it must be JSON I’d vote to add support for JSON5.

Sidenote: I’m not a huge fan of YAML, but I find it strange to move on to yet another format (be it JSON, XML or TOML) now that newer config files have moved from PHP code to YAML.

masi · December 8, 2021, 9:44pm

I agree with the stance on TypoScript. TypoScript as a configuration language should go.

Time will tell if users (ie the senior devs - senior in years not rank) hang on to it and if it will have its use cases. But that depends IMHO on how the rendering system evolves.

crell · December 9, 2021, 1:17am

Talking to Mathias L earlier this week, he suggested that TypoScript really should get split: The template/rendering parts of it need to just go away, now that Fluid exists. The config parts should… evolve, but we’re not sure how yet, other than it desperately needs type safety as much as CONF_VARS does. It also may need to live in the DB simply because it’s so context-dependent, but then mapping it to typed values becomes a very tricky question because any syntax error becomes a parse problem at runtime, and then it may not load correctly, and now your site is broken. That’s clearly undesirable.

I’m still not sure yet what if anything can/should happen there. Hence the discussion here.

masi · December 9, 2021, 1:06pm

Well, for now some things are really strange to implement in Fluid. Sadly, IMHO you too often have to resort to a custom view helper or embedded TypoScript. But I think the current idea to map proper Fluid (the agnostic rendering) and FE or BE related widgets into the same namespace will create more confusion if more stuff moves into the Core (be it from the extension “vhs” or not).

I didn’t think about type safety for configs that much. All I can say is that I love TypeScript (mind the “e”) for typed config objects.

1stmachine · December 14, 2021, 9:56pm

I think configuration should be deployable!
But a configuration module in the backend could allow for viewing, and we could allow editing on a per-environment basis, so in Development mode you could edit the configuration, but not in Production.

On the topic of overwritability: you mention the priority problem. We already use a custom multi-level configuration system to generate siteConfig and TYPO3_CONF_VARS,

and we use TYPO3_CONTEXT as the main deciding factor for which configuration to build.
We have contexts like this:
Production/Staging/SiteA

and our configuration directory looks like this (displayed in loading order):

/config.yaml # base configuration
/Production/config.yaml # for all production systems
/any/Staging/config.yaml # loaded in all Staging contexts
/any/any/SiteA/config.yaml # config for site A
/any/any/SiteB/config.yaml # not loaded because we are in site A context

The any is a kind of wildcard. In our experience it makes for a good editing experience (and a good way to overwrite for a specific system or stage).
For human-editable formats I am not sold on one file per configuration object, but one file per “extension” could be a good middle ground.

crell · December 15, 2021, 1:34am

I think in practice, most extensions will have only one main config object. (It may have sub-objects.) Core extensions are more likely to have many separate objects.

crell · December 15, 2021, 5:21pm

Follow up with a more formal proposal now: https://decisions.typo3.org/t/configuration-overhaul-plan/728
