Configuration Overhaul plan

crell · December 15, 2021, 5:20pm

I’ve chatted a bit more with Benni Mack and others, and based on the previous two posts (1, 2) and their feedback I want to now offer a proposal for the next several steps in overhauling configuration.

The final picture

The endgame for this Epic is as follows. There is a directory tree (in composer or legacy mode, no difference) like this, in every TYPO3 site:

config/
  EnvironmentOverrides.php
  features.yaml
  default/
    Development/
      mail.yaml
    Testing/
      mail.yaml
    Production/
      mail.yaml
    somestring/
      mail.yaml
    mail.yaml
  sites/
    default/
      mail.yaml 
    sitea/
      config.yaml
      overrides/
        Development/
          mail.yaml
        mail.yaml
    siteb/
      overrides/
        mail.yaml

All configuration objects are defined canonically by a typed PHP class. That PHP class corresponds 1:1 with a YAML file. (The exact naming of the YAML files is still TBD.)
Every config class is expected to have a default for every property, so that it can be loaded without a YAML file existing. It also MUST be re-loadable via __set_state(). (Traits provided to make that easy.)
YAML is chosen as the config format on-disk not because it’s good (it isn’t), but because it’s popular. XML would be more self-documenting, but arguing for XML is an uphill battle I don’t want to burn karma on. (Also, as of this writing Serde supports YAML but not XML yet, since it’s more annoying to parse.)
features.yaml - This config object (there is a class for it) is technically outside of “configuration”. It is read into a corresponding object and that object is then exposed to the system via DI. It contains all feature flags, with readonly properties. It replaces the feature flag portion of TYPO3_CONF_VARS.

EnvironmentOverrides.php is part of the environment system. It consists of:

The Symfony DotEnv component.
The override file.
A class that such data is read into, which may or may not be the same as the existing Environment class (TBD).
A new $connections array. (More on that in a moment.)

The Environment

The existing TYPO3_CONTEXT variable is used as an environment mode differentiator. Many systems already have such a flag. We can use it as is. It is used mainly for config directory resolution.

There are additional env vars to define common per-environment things, such as a default database connection. (The exact list is TBD.)

The DotEnv library allows for those to be set via a .env file in development environments. ddev’s existing support for setting env vars is unaffected.

During startup, the environment variables are read into local variables along with a new variable named $connections, and then EnvironmentOverrides.php is invoked. That file may then mutate whatever variables it wants. This is primarily for host-specific glue code on cloud-based hosts (to map the host’s env vars into TYPO3’s env vars). Developers may also enhance the $connections array as desired. This file does not have meaningful access to any true-globals.
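A rough sketch of that startup sequence, assuming a hypothetical helper named loadEnvironment() (not an existing TYPO3 API) and a made-up host variable SEARCH_HOST. Including the override file inside a function is one way to give it only local variables to work with:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical bootstrap helper, for illustration only.
 * Copies the environment into locals, lets the override file mutate
 * them, and hands back the result.
 *
 * @param array<string, string> $env
 * @return array{env: array<string, string>, connections: array<string, mixed>}
 */
function loadEnvironment(array $env, string $overrideFile): array
{
    // The included file shares this function's scope, so it sees
    // $env and $connections as plain local variables.
    $connections = [];

    if (is_file($overrideFile)) {
        include $overrideFile;
    }

    return ['env' => $env, 'connections' => $connections];
}

// Demo: a throwaway override file that maps a host-specific variable
// (SEARCH_HOST is an assumption) into $connections.
$file = tempnam(sys_get_temp_dir(), 'envoverride');
file_put_contents($file, <<<'PHP'
<?php
$env['TYPO3_CONTEXT'] ??= 'Production';
$connections['solr']['main'] = ['host' => $env['SEARCH_HOST'] ?? 'localhost'];
PHP);

$result = loadEnvironment(['SEARCH_HOST' => 'search.internal'], $file);
unlink($file);
```

In a real bootstrap the $env array would come from the process environment (after DotEnv has run) rather than being passed in by hand.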

Connections

The $connections array includes all connections to external services: SQL database, Redis, Solr, Elasticsearch, etc. Its structure is approximately thus:

$connections['db']['default'] = [ ... ];
$connections['db']['legacy'] = [ ... ];
$connections['solr']['main'] = [ ... ];
$connections['elasticsearch']['main'] = [ ... ];
$connections['cache']['redis'] = [ ... ];

A single connection type for each main type of connection is defined by the system, and populated automatically by environment variables. Additional options may be added in EnvironmentOverrides.php. So, for example, the typical site will only need to set these env vars:

DB_TYPE=mysql
DB_HOST=localhost
DB_USER=me
DB_PASS=secret
DB_PORT=3608

And that will automatically map to

$connections['db']['default'] = [
  'type' => 'mysql',
  'host' => 'localhost',
  'user' => 'me',
  'pass' => 'secret',
  'port' => 3608,
];

There are similar env vars pre-defined for Solr, Redis, and whatever else we decide to predefine.
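The automatic mapping could look roughly like this. The function name connectionsFromEnv() is hypothetical, and the variable list simply mirrors the example above since the real list is still TBD:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical mapper: builds the default DB connection from DB_* variables.
 *
 * @param array<string, string> $env
 * @return array<string, mixed>
 */
function connectionsFromEnv(array $env): array
{
    $map = [
        'DB_TYPE' => 'type',
        'DB_HOST' => 'host',
        'DB_USER' => 'user',
        'DB_PASS' => 'pass',
        'DB_PORT' => 'port',
    ];

    $default = [];
    foreach ($map as $var => $key) {
        if (!isset($env[$var])) {
            continue;
        }
        // Env vars are always strings; cast the port to an integer.
        $default[$key] = $key === 'port' ? (int) $env[$var] : $env[$var];
    }

    // No DB_* variables at all means no default connection is defined.
    return $default === [] ? [] : ['db' => ['default' => $default]];
}

$connections = connectionsFromEnv([
    'DB_TYPE' => 'mysql',
    'DB_HOST' => 'localhost',
    'DB_USER' => 'me',
    'DB_PASS' => 'secret',
    'DB_PORT' => '3608',
]);
```

EnvironmentOverrides.php would then receive this array and could add further entries such as $connections['db']['legacy'].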

After that array is populated, it is deserialized into a readonly object (with nested objects, most likely), which is then available through the DI system. Any service that needs to can get that Connections object injected and read whatever it needs.
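As an illustration of what that readonly object might look like (PHP 8.1+), here is a hand-rolled stand-in for what Serde would do. The class names DbConnection and Connections, their properties, and the defaults are all assumptions:

```php
<?php
declare(strict_types=1);

// Hypothetical value object for one SQL connection.
final class DbConnection
{
    public function __construct(
        public readonly string $type = 'mysql',
        public readonly string $host = 'localhost',
        public readonly string $user = '',
        public readonly string $pass = '',
        public readonly int $port = 3306,
    ) {}
}

// Hypothetical container that DI would hand to services.
final class Connections
{
    /** @param array<string, DbConnection> $db */
    public function __construct(public readonly array $db = []) {}

    public static function fromArray(array $connections): self
    {
        $db = [];
        foreach ($connections['db'] ?? [] as $name => $settings) {
            // String keys unpack as named arguments; an unknown key
            // throws an Error, which doubles as basic validation.
            $db[$name] = new DbConnection(...$settings);
        }

        return new self($db);
    }
}

$connections = Connections::fromArray([
    'db' => ['default' => ['type' => 'mysql', 'host' => 'localhost', 'user' => 'me', 'pass' => 'secret', 'port' => 3608]],
]);
```

A service would then read $connections->db['default']->host and friends; any attempt to write to those properties throws an Error.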

Any extension that wants to configure additional connections… is only allowed to do so by having the site admin define them with an appropriate key in that array. Defining new connections through the UI is explicitly not supported.

(It may be possible to make all parsing of that information lazy the first time the Connections service is requested. TBD.)

Sites v2

Extending the current sites definition logic, each site is defined by a config/sites/$key/config.yaml file. A site does not exist unless it is defined there first. That is, its presence in the page tree is dependent on that file existing. Its structure is essentially the same as now, but is parsed into a readonly object to expose to the system. That is how one reads the site. (A “current site” DI service can handle the resolution logic and return the appropriate object.)

Configuration

All other configuration objects vary based on only two axes: TYPO3_CONTEXT and the current site. When a configuration object is loaded, a file is looked up according to the following order. The first file found is the whole configuration object.

config/$current_site/overrides/$current_context/mail.yaml
config/$current_site/overrides/mail.yaml
config/default/$current_context/mail.yaml
config/default/mail.yaml

The first file found is what gets used. It is deserialized into the corresponding object, which is available as a readonly service through DI. (Note: The order of the two middle lines–that is, what happens if there is a site-specific and context-specific file but not one for both–is important but easy to hard code either direction. I don’t much care which it is. That’s a separate discussion we can have at a later time, but should be trivial to change in code once we make up our minds.)
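The lookup itself is small. A minimal sketch, assuming a hypothetical resolveConfigFile() function; $site is passed as a path segment (e.g. 'sites/sitea' to match the directory tree at the top), which is my assumption about how $current_site maps to disk:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical resolver: returns the first existing candidate, or null
 * if no file exists (in which case the class defaults apply).
 */
function resolveConfigFile(string $configDir, string $site, string $context, string $name): ?string
{
    // Most specific first; swapping the two middle entries is the
    // open question mentioned above.
    $candidates = [
        "$configDir/$site/overrides/$context/$name.yaml",
        "$configDir/$site/overrides/$name.yaml",
        "$configDir/default/$context/$name.yaml",
        "$configDir/default/$name.yaml",
    ];

    foreach ($candidates as $candidate) {
        if (is_file($candidate)) {
            return $candidate;
        }
    }

    return null;
}
```

For example, resolveConfigFile('config', 'sites/sitea', 'Development', 'mail') checks the site's Development override first and falls through to config/default/mail.yaml last.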

The loaded object is also cached out to disk using var_export() to typo3temp/config/$site/$context/ (or similar). That way, subsequent reads can just include that file if it exists and skip the file resolution. Because that process does not require a database lookup, it can be done lazily (for development) or all at once during a deployment process to pre-warm the cache. (The cache warmer is a low-priority, later feature.)
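To make the caching step concrete, here is a small sketch of the var_export() round trip. MailConfig and its two properties are invented for the example, the helper function names are hypothetical, and the hand-written __set_state() stands in for whatever the provided traits would do:

```php
<?php
declare(strict_types=1);

// Hypothetical config class; property names are illustrative only.
final class MailConfig
{
    public function __construct(
        public readonly string $transport = 'sendmail',
        public readonly string $fromAddress = '',
    ) {}

    // var_export() emits a call to this method with the property array;
    // named-argument unpacking rebuilds the object from it.
    public static function __set_state(array $properties): static
    {
        return new static(...$properties);
    }
}

// Writes a PHP file that returns the fully built object.
function writeConfigCache(object $config, string $cacheFile): void
{
    file_put_contents($cacheFile, '<?php return ' . var_export($config, true) . ';');
}

// Returns the cached object, or null if the cache file does not exist.
function readConfigCache(string $cacheFile): ?object
{
    if (!is_file($cacheFile)) {
        return null;
    }

    return include $cacheFile;
}
```

Reading the cache is then a single include with no YAML parsing or deserialization, and deleting the file is enough to invalidate it.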

Whither TypoScript

I have concluded that, from an architectural and workflow perspective, we should treat TypoScript as content, not configuration. While one could debate what it is semantically, from a workflow perspective it is… not configuration, but content, because it is data that varies potentially per-page. It is therefore explicitly excluded from this discussion for now.

We may (and should) revisit TypoScript later to put more structure into it and rethink how it should be able to vary, but for now it should be viewed as content, not configuration, and thus out of scope of this discussion.

Because this system is more self-evident and easy to work with than ext_conf_template.txt, though, it may incentivize some extension developers to shift behavior out of TypoScript and into the formal configuration system.

One thing we should do, however, is expose configuration values to TypoScript, so that TS authors can read (but not manipulate) those values as appropriate. What that looks like syntactically, I don’t know right now.

Editing?

In v11, extension configuration defined via ext_conf_template.txt has a basic UI auto-generated for it, and is GUI editable. TYPO3_CONF_VARS proper does not, although some bits of it may be manipulated through the Install Tool.

In this revised model, by default, none of those configuration files are editable through the GUI. In practice I think this is a very small regression, but it is technically a regression. What we get in exchange, however, is a much more explicitly defined, self-documenting, and hand-editable configuration system. More importantly, it has native support for site-based and env mode-based overrides, which are far and away the most common things by which you need to vary configuration.

The configuration, $connections, and environment systems also make TYPO3 vastly more compatible with cloud-based hosts that offer readonly file systems. That’s the win.

A possible (and I stress this is possible; I am not promising it) extension, however, would be to allow the system to detect if it is running in an environment where the config directory is writable. If it is, then all configuration objects can have edit forms auto-generated for them (give or take an opt-in/opt-out flag and some attributes for form customization; I’m pretty sure the form engine configuration should not be UI-exposed, at all). Those forms would write back to the YAML files on disk directly. (Excluding features.yaml and EnvironmentOverrides.php, of course.) Updating the file clears the cached var_export() for that object, allowing it to be regenerated.

If the directory is read only, then all of those forms automatically become read only as well. That is, they become a way to review what the configuration is, but not to edit it.
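The detection could be as simple as the following sketch. The function name and the $guiDisabled flag are assumptions for illustration, not a settled design:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical check: forms are editable only when the config directory
 * exists, is writable, and the admin has not switched the GUI off.
 */
function configFormsEditable(string $configDir, bool $guiDisabled = false): bool
{
    return !$guiDisabled && is_dir($configDir) && is_writable($configDir);
}
```

On a host with a readonly file system this returns false without any extra configuration, so the forms would fall back to review-only mode.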

That would allow local development environments to manipulate the configuration via the GUI, and automatically produce files that are git commit-able. When deploying to production, however, all of those files become locked and readonly and the only way to update them is via a new git push, which is the correct way to update production.

The tricky part is how the forms would interact with the context/site override logic. I think that can be handled by putting some toggles onto the forms themselves. The specific UI/UX of that I am not sure at the moment, but that’s something that would define if this addition is possible. (My prediction: It won’t happen by 12.0, but there’s a better than 50% chance of it happening by v12 LTS, assuming we can make it work at all. Again, I am not promising.)

Transition

This setup replaces TYPO3_CONF_VARS, LocalConfiguration.php, AdditionalConfiguration.php, and FactoryConfiguration.php. It also replaces the ext_conf_template.txt files. Naturally that means we need some transition phase for backward compatibility.

My intent here is that those systems all remain essentially unchanged in v12, but get deprecation warnings if you access them directly. (Assuming we can find a place to put such a warning; if not, we just document it.) Then, appropriate sections of the TYPO3_CONF_VARS array become a final fallback for the config objects if not defined elsewhere. For example, if the mail system is not defined anywhere in the following files:

config/$current_site/overrides/$current_context/mail.yaml
config/$current_site/overrides/mail.yaml
config/default/$current_context/mail.yaml
config/default/mail.yaml

Then it is hard-coded somewhere to look at TYPO3_CONF_VARS['MAIL']. That array is then used to populate the config object and cached. Config objects do not need to correspond directly to a top-level TYPO3_CONF_VARS array; in fact they very much should not. However, most will likely correspond to some segment of that array in order to make the transition easier. (For instance, the Locking class will correspond to locking.yaml, and if one is not found then it will look at TYPO3_CONF_VARS['SYS']['locking']['strategies'].)
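A sketch of that last-resort step, with hypothetical function names. It takes the already-parsed YAML values (or null when none of the four files exists) so that it does not depend on any particular YAML parser:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: walks a key path such as ['SYS', 'locking', 'strategies']
 * into the legacy array and returns that segment, or [] if it is absent.
 */
function legacySegment(array $confVars, array $path): array
{
    $segment = $confVars;
    foreach ($path as $key) {
        if (!is_array($segment) || !array_key_exists($key, $segment)) {
            return [];
        }
        $segment = $segment[$key];
    }

    return is_array($segment) ? $segment : [];
}

/**
 * Values from a YAML file win outright; the legacy array is only
 * consulted when no file was found ($fromFile is null).
 */
function configValues(?array $fromFile, array $confVars, array $legacyPath): array
{
    return $fromFile ?? legacySegment($confVars, $legacyPath);
}
```

An empty result means the config class falls back to its own property defaults. Removing the fallback in v13 would then amount to dropping the legacySegment() call.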

In v13, we just remove that last fallback step and TYPO3_CONF_VARS will be gone.

Implementation steps

Getting here is a multi-step process, of course. I can and will kick it off but it will require help from others, especially to convert core to use the new API rather than reading from TYPO3_CONF_VARS directly. The general 12-step plan is:

1. Clean up the Environment class into an injectable object, cf. Feature #94995: Expose environment object through DI (TYPO3 Forge).
2. Add Symfony/DotEnv as a dependency and wire it in.
3. Add Crell/Serde and its ancillary tools as dependencies. (Doing this as its own step will make merges easier, as it’s less work to keep composer.json in sync.)
4. Add the new features.yaml file.
5. Introduce the EnvironmentOverrides.php file and $connections array, and associated object.
6. Convert the DB system to read from the Connections service value object.
7. Update the installer to write a .env file instead of LocalConfiguration.php. (Or maybe in addition to at the moment, until the transition is further along.)
8. Update how the sites files are read to use Serde and injected objects. This includes appropriately updating the “site figuring-out” logic.
9. Pick one or two easy-ish config objects (mail, logging, and gfx are good candidates) and build out the config reading and fallback system using those as trial balloons. This will be the most complex task, I imagine.
10. Build out config objects for the rest of configuration. This is a very crowd-sourceable task.
11. Convert all uses of TYPO3_CONF_VARS in core to use the new config system. This is a very crowd-sourceable task.
12. Build auto-forms for configuration objects.

As for who does them:

Steps 1, 3, 4, 5, and 9 I intend to tackle directly myself.
Steps 2, 6, and 8 I could do, but so could most contributors so if someone else wants to pitch in, that’s a place to do it.
Step 7 is best done by someone with more knowledge of the installer system.
Steps 10 and 11 can and should be done by as many people as possible as a way to get developers practice with the new APIs.
Step 12 is going to require someone with very dedicated form API knowledge, working closely with me to keep it as elegant as possible. (Volunteers welcome.)

Conclusion

So, this is my proposed roadmap for configuration in TYPO3 v12 and beyond. If anyone would like to chime in with support, approval, pointing out something stupid that I missed, or rotten tomatoes, now is the time.

krueml · December 15, 2021, 7:10pm

As an ordinary extension developer I want to sum it up for my understanding: I am the maintainer of some extensions (e.g. EXT:matomo_integration and EXT:matomo_widgets) where the settings are defined in the site configuration. For this I extended the site configuration with custom tabs and fields. Now (in v10/v11) the settings are written from the GUI into the corresponding site’s config.yaml file. So for each site mostly the same settings have to be defined (like the Matomo URL or API token, which may or may not differ; otherwise you have to define them manually as environment variables and use those variables in config.yaml).

As I now understand the new approach: I will create a class (e.g. MatomoWidgets) with readonly properties that hold the settings from matomo_widgets.yaml. The integrator defines a default matomo_widgets.yaml with the common values (like URL, API token, activated widgets, etc.). She can then adjust the settings for a site by adding a matomo_widgets.yaml file at config/$current_site/overrides/matomo_widgets.yaml.

That is great, because this will be more powerful and flexible: If the Matomo URL is the same for all sites but only the site ID differs (which is most likely), then only this ID has to be adjusted in the overrides folder for a specific site.

mabolek1 · December 15, 2021, 8:07pm

This sounds exciting.

I like that you include so much on transition. It will take some time getting used to, but it seems logically sound and not bad at all.

Could we please use lowerCamelCase or UpperCamelCase instead of snake_case for the file names and variables?

Having had some time to think about it, I think you’re right about TypoScript being in the content realm. You’re right to exclude it from this discussion. (Though the word “content” could be misunderstood, so I guess I should dig in the dictionary and come up with a suggestion for a name. “Garfield Space, the realm between configuration and prose.”)

crell · December 15, 2021, 10:50pm

If I follow your description correctly, that is almost right. In this model, you would have something like this class in your code:

#[Config]
class MatomoWidgets 
{
    use Rehydrateable;

    public function __construct(
      public readonly string $url = '',
      public readonly string $token = '',
    ) {}
}

If you do nothing else, you’ll be able to get MatomoWidgets injected into your services via DI, and it will have those two empty properties.

Then, a site admin would add a file in config/default like this:

# config/default/matomo_widgets.yaml
# (Or whatever the name is)

url: https://www.example.com/
token: dev-token

And that would get used in all circumstances to populate MatomoWidgets.

Then, add the following as config/default/Production/matomo_widgets.yaml:

url: https://www.example.com/
token: prod-token

You would need to repeat the whole file, not just the one key, as described here. But then your code would get whichever file is most specific and is defined, and you can just use that readonly struct in your code, with all the type safety that implies.

crell · December 15, 2021, 11:07pm

More precisely, TypoScript follows a content lifecycle, as opposed to a code lifecycle. “Configuration” is a mushy thing that kinda floats between the two lifecycles depending on the system and details.

For more on this distinction, see https://youtu.be/1OIjInDHqmI?t=255 (about 4:15 through 11:20).

Also, this post, which is a sort of companion: https://platform.sh/blog/6-things-to-do-to-make-your-application-cloud-friendly/

Also, regarding the installer: https://platform.sh/blog/2020/installers-that-dont-suck/

mbrodala · December 16, 2021, 8:47am

Technically duplication could be reduced by making use of the existing import feature of the TYPO3 YAML loader. But that should be avoided if possible to keep everything sane, including the developer.

helhum · December 16, 2021, 10:39am

What I definitely miss is a strategy for how to deal with configuration (TYPO3_CONF_VARS) defined in ext_localconf.php files by extensions.

grep -c TYPO3_CONF_VARS typo3/sysext/*/ext_localconf.php

currently gives a sense of what is in there that definitely needs to be resolved in some way before starting with any action besides tuning the concept.

Another thing that isn’t clear to me is what “connection” exactly means. From the Matomo example above, the URL and token are a connection, right? At least the token is a credential, which I’m not very keen on committing to version control, but I still want to have the flexibility to pull different values somehow on different systems/environments.

Which leads me to a very practical thing, which I’m not sure I understand correctly:

In a typical TYPO3 v10 (or v11) project, I have configuration for a project that is 98% identical for all environments. The 2% differences are spread across all configuration sections.

If I understand the proposal correctly, I would have to duplicate all yaml files then for all systems, ending up with 98% duplication across > 4 files I would have to keep in sync manually?

A few examples for such env specific config in practice:

mailer config (SMTP/sendmail config)
image process config (path to imagemagick)
system locale
proxy/ reverse proxy config
site name
cookie name(s)
system maintainers
…

crell · December 16, 2021, 3:14pm

Re whole-object overrides:

On the read side, yes, it would be possible to allow multiple files to “mask” each other, effectively. It would involve reading the YAML files in as arrays first, doing a deep merge, and then deserializing that into an object rather than deserializing straight from YAML. More complex, but doable.
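For the read side, a deep merge of the parsed arrays might look like this sketch. It uses PHP's built-in array_replace_recursive(); the function name mergeConfigLayers() and the sample mail keys are made up, and the YAML parsing is left out:

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical read-side masking: layers are passed least specific
 * first, and later layers override earlier ones key by key.
 */
function mergeConfigLayers(array ...$layers): array
{
    return array_replace_recursive([], ...$layers);
}

// Stand-ins for parsed config/default/mail.yaml and a Production override.
$default = ['transport' => 'smtp', 'smtp' => ['host' => 'localhost', 'port' => 25]];
$production = ['smtp' => ['host' => 'mail.example.com']];

$merged = mergeConfigLayers($default, $production);
```

The merged array would then be deserialized into the config object as usual. One caveat: array_replace_recursive() also merges numerically indexed lists by position, which may not be what you want for list-valued settings.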

The problem is on the write side. If we want to include a GUI editor as proposed here, I can envision it having sufficient complexity to say “I am editing the Foo object for site A in mode B”. However, “This field on this page should override site A in mode B, this field should apply to site B in all cases, this field applies only to site mode D on any site, etc.” sounds… terrifying. The complexity of that, both for the implementation and for the user trying to use it, seems far more than we want to tackle. Certainly more than I feel competent to tackle.

Just the UX complexity would be huge, to say nothing of the implementation complexity of partially serializing just one or two fields out of an object, and then different fields to a different location. That’s not something Serde can handle (nor frankly should it).

I simply don’t see how that would be possible on the write side. Potentially, we could implement it for the read side but not the write, so if you use the GUI then you will get duplication but if you ignore it or turn it off you can do that manually. (This would likely necessitate a kill-switch to disable the GUI even in cases where the file system is writeable, which I’m OK with.)

Would that be acceptable?

crell · December 16, 2021, 3:22pm

Re tokens/credentials: Those are best handled through environment variables, frankly. The EnvironmentOverrides.php file would need to have some way to ensure those get set, or use .env, or whatever. I don’t have a precise picture of what that looks like yet, but “use env vars more” is part of that part of the epic. Of course, it would be up to the extension developers to use env vars, configuration, or something else appropriately. (We can provide guidance and recommendations, but some people will always go against them, rightly or wrongly.)

As far as ext_localconf.php, you’re correct there is no direct equivalent at the moment. What sort of things are reasonably done there that need to be supported? For things like “registering a new thingie”, that ought to move to a PSR-14 event anyway. (Registration of that sort is specifically an intended feature of PSR-14.) What else is there that needs to be supported?

helhum · December 16, 2021, 4:02pm

This will not work out. A site currently is a request attribute, and depends on either of the two things:

The selected page in the page tree (a site is connected to the root page of a subtree)
The current URI (a site is also connected to a base URI)

Therefore the current site will never be injectable via DI (unless we do some globals hackery, which we aim to get rid of).

crell · December 23, 2021, 3:17pm

@mbrodala Thoughts on the partial write problem? Write-side single-value overrides are not feasible, I think, so I am skeptical about adding it on the read side to avoid confusion.

@helhum What is there in ext_localconf.php right now that cannot reasonably move to events? Any registration really belongs in events instead already, it just hasn’t migrated yet. We need to ensure everything else has a better solution so that we can use it.

masi · December 24, 2021, 9:58am

You would need to repeat the whole file

An approach that works out nicely for TYPO3 forms and docker-compose.yml files is that when more than one file is configured (forms) or specified on the command line (docker-compose), their content is accumulated.

Duplicating complete config files is IMHO a horror.

On the read side, yes, it would be possible to allow multiple files to “mask” each other, effectively.

I simply don’t see how that would be possible on the write side. Potentially, we could implement it for the read side but not the write, so if you use the GUI then you will get duplication but if you ignore it or turn it off you can do that manually.

The write side has also the problem of writing for what: the global settings, per site settings, per context settings (which?)…

Actually the problem exists already for all those who have custom PHP configuration files. E.g., we (my colleagues and I) tend to have context-dependent settings which are never overridden by anything you can set via the UI.

OTOH if the UI is complex enough it could write to the correct file. That is the user has to define which value of the site/context matrix has to be changed.

masi · December 24, 2021, 10:17am

To avoid conflicts with other variables I suggest rather not to use

DB_TYPE

but

TYPO3_DB_TYPE

My intent here is that those systems all remain essentially unchanged in v12, but get deprecation warnings if you access them directly. (Assuming we can find a place to put such a warning; if not, we just document it.)

We could turn it into a nested object which implements all the array interfaces.

crell · December 24, 2021, 3:42pm

With two axes of variation (mode and site), a UI that can handle object-level overrides is potentially challenging, but within the realm of possibility (both from a UX perspective and implementation perspective). Individual key-based overrides are simply not feasible. Even at the code level, it means “Saving” would be a per-property operation. That basically blows the concept of config objects out of the water.

We could turn it into a nested object which implements all the array interfaces.

I’m not sure what that would accomplish? Do you mean to have a place to trigger Deprecations? That might work, but the conversion step itself could be rather costly along the hot path of every request.

masi · December 24, 2021, 4:13pm

We could turn it into a nested object which implements all the array interfaces.

I’m not sure what that would accomplish? Do you mean to have a place to trigger Deprecations? That might work, but the conversion step itself could be rather costly along the hot path of every request.

That’s what I meant. I agree that it’ll slow down the system. Perhaps it can be set up to be an opt-in for development context.
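To illustrate the idea, here is a minimal sketch of such a wrapper (PHP 8.0+). The class name DeprecatedConfArray is hypothetical, and only ArrayAccess is implemented here; a real version would presumably also need IteratorAggregate and Countable:

```php
<?php
declare(strict_types=1);

// Hypothetical wrapper around the legacy array that reports every read.
final class DeprecatedConfArray implements ArrayAccess
{
    public function __construct(
        private array $data,
        private string $path = 'TYPO3_CONF_VARS',
    ) {}

    public function offsetExists(mixed $offset): bool
    {
        return isset($this->data[$offset]);
    }

    public function offsetGet(mixed $offset): mixed
    {
        $path = $this->path . "['" . $offset . "']";
        trigger_error("Direct access to $path is deprecated", E_USER_DEPRECATED);

        $value = $this->data[$offset] ?? null;

        // Nested arrays are wrapped lazily, only when they are read,
        // so deeper lookups are reported too without an up-front conversion.
        return is_array($value) ? new self($value, $path) : $value;
    }

    public function offsetSet(mixed $offset, mixed $value): void
    {
        if ($offset === null) {
            $this->data[] = $value;
        } else {
            $this->data[$offset] = $value;
        }
    }

    public function offsetUnset(mixed $offset): void
    {
        unset($this->data[$offset]);
    }
}
```

Reading $conf['MAIL']['transport'] through this wrapper raises one E_USER_DEPRECATED notice per level. Nested writes such as $conf['MAIL']['transport'] = 'smtp' would not reach the underlying data with this simple version, which is one more reason to keep it opt-in for development.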

masterofd · January 10, 2022, 4:27pm

I guess this is something that can be done on cache warmup (which we conveniently now have, thanks to the good soul(s) who implemented it). That would allow deploying to the cloud and having it fast in all situations as soon as it is deployed: the costly write operation (or the final YAML object) could simply be stored once and never touched again until the next deployment.

As for the whole spiel of “How do we ensure that we don’t have the same configuration 4 times over” I’d suggest a similar road (although, I must admit that I’m not quite sure on how you’d achieve an interface that could tell the differences apart, but in the end, I guess this would be used on configuration and could be cleaned up before commit?).

EDIT: How to be late on a very cool party

crell · January 10, 2022, 8:00pm

How would cache warming help with the problem of individual key overrides being too complex for anything but manual editing? Or did you mean something else?

masterofd · January 11, 2022, 7:55am

That’s not what I meant.

What I meant was that if we have these separate configurations that we deep merge, we’d need to pre-build a final configuration per site for performance reasons.

The approach to individual key overrides that I had in mind was more in the gist of “you’ve updated the configuration, you now have a new configuration file which you should trim down to what you really need”, which IMO is a task that an integrator should be able to do. Writing this out, though, I figured you could support the integrator further by using diff and creating a before_ and after_ configuration file to display what changed in the final YAML. The integrator can then easily integrate that into the/a trimmed-down version of the YAML file (which we could even write for the integrator, if we wanted to), while still having the possibility to just go with the whole file if they want to.

crell · January 11, 2022, 3:12pm

Ah. Yes, the setup I describe above involves caching the resolved config objects to disk as var_export()ed objects that can be read in straight, without any deserialization at all. Just an include call. They’re also DB-agnostic so a cache warming CLI command should be straightforward to do.

For key overrides, if I am following you, you’re saying to have the GUI write the full object to a given site/mode combination, but support masked reading? So if someone is configuring via the GUI they will get full objects, but the admin is welcome to go in and manually trim out duplicate lines if they feel like it before deploying.

That would work, technically. Whether it’s a good approach or not I’m not sure, but on a technical level it would work.

masi · April 12, 2022, 7:12pm

Technically duplication could be reduced by making use of the existing import feature of the TYPO3 YAML loader. But that should be avoided if possible to keep everything sane, including the developer.

Many systems relying on YAML come up with their own import feature. Duplication is a real issue. Some folks are driven mad by it.