Logging Plan TNG

crell · June 23, 2021, 8:45pm

Hi folks. After a long and productive call with Benni, I want to offer yet another plan for modernizing logging in TYPO3.

The root issues to resolve are:

Until last week, we were misusing PSR-3. That was largely resolved in this issue.
Much of the system is not actually using PSR-3. It’s using the old sys_log table directly for both reads and writes, which bypasses all of the logging mechanisms. See this post on the places that do so, which are not all clear on what they do.
The current log system is complex to configure, and relies very heavily on class names for creating “virtual channels.” Yet class names are not a great key in practice, and their configuration is incompatible with using constructor dependency injection for loggers.
There are systems in core now that really do need log-esque information to be available in the UI, but not all logs necessarily need to be in the UI. (That’s a per-site decision.)
The “type” field is a legacy that is largely unuseful, as it’s nominally extensible but in practice is a fixed set of integer values, which are therefore much less readable and decipherable.
A logging system is a commodity. There’s little value to TYPO3 in having its own, in practice, other than one already exists. But then we have to maintain it, and it is weaker than others that already exist.

The proposed solution is to effectively replicate what Symfony does for its logging “channels”, with a TYPO3 spin on it to keep the sys_log table working mostly the same as it does now.

The steps involved would be as follows:

Add a channel field to sys_log, as a string field. Existing “types” would get mapped to new string channel names. There’s only 3-4 places that currently write to that table, so that can just be done manually in each of those locations for now. (cf: https://review.typo3.org/c/Packages/TYPO3.CMS/+/69656 and https://review.typo3.org/c/Packages/TYPO3.CMS/+/69684)
Update BackendUserAuthentication::writeLog and DataHandler get their own dedicated log channels. (cf: https://review.typo3.org/c/Packages/TYPO3.CMS/+/69656)
Update ext-belog to use the new channel field as a filterable field, instead of type. type becomes deprecated. (cf: https://review.typo3.org/c/Packages/TYPO3.CMS/+/69663)
Add Monolog as a dependency. Wire that into the container in a similar fashion as Symfony does. That is, each logger “channel” gets its own DI entry, and each one can have its own set of writers and processors.
Write a new Monolog writer/handler that writes to the sys_log table. This may or may not also involve additional Processors, TBD.
Write a Compiler Pass that has a hard coded list of channels that always have the new sys_log writer configured, and it cannot be disabled. That list is something we can debate later but it would include at least login attempts, page writes, and other “editor actions,” which are the most important to show up in the UI.
Update LogManager to be Container Aware; then, if getLogger() is called with a known channel, return that Monolog channel instance instead. Otherwise, do what it does now.
Update the DI system to allow services to request a specific logger channel through the variable name, as Symfony does. (I’m not certain exactly how that works right now, but presumably we can just copy their code for it, whatever it is.)
Add a default channel logger service that gets used if a channel isn’t specified. This logger would accept a channel context key and forward the log message to the appropriate channel.
Convert key systems to take a logger via constructor DI rather than LoggerAware, so they get the new channel-based system. This includes BackendUserAuthentication and friends, so they stop writing directly to sys_log.
Over time, convert everything else to use constructor DI for logging rather than LoggerAware. Some of this may wait for v12, simply because then we can require PHP 8.0 where constructor property promotion will make it mechanically easier to do. (Or not; it’s not hard, just will take some typing.)
At some point in the future (v12 or later), drop the old logger implementation.
Revisit sys_log and possibly change it to something more purpose-built. TBD, this is future scope.

The net result when the dust settles:

More services are using clean DI, which makes them more mockable and testable. A trivial mock logger is easy to write, and the in-progress fig/log-util package will likely include one we can just use as well.
We no longer have to maintain a logging implementation of our own. Instead, we use one with widespread support and a zillion handlers/writers already available for a wide array of backends.
We have a logging system that is more straightforward to configure, and familiar to developers from other systems

The list of channels is naturally configurable, but I would propose the following out-of-the-box.

default (effectively “other”, and what 3rd party libs would likely use. Would force sys_log_writer)
user (login, logout, bad password, etc… Would force sys_log_writer)
file (file system related issues.)
php (any PHP notice/warning/errors we can catch)
deprecated (both E_USER_DEPRECATED and E_DEPRECATED)
content (page created, edited, deletec, etc.)
security (possible security issue, rare)

The existing SysLog::Type entries would map as follows:

DB → content
File → file
Cache → default (it’s only used once, and shouldn’t be its own thing)
Extension → default (use cases can/should get rewritten to a proper log call)
Error → php (these all come from error handlers right now)
Setting → default (this is only used once, in system setup)
Login → user

For reference, these are the channels I found used by default by other systems:

Symfony:

main
console

Drupal:

default
php
image
cron
file
form
security
$module_name

liayn · June 24, 2021, 7:58am

Thanks for the extensive description.

What I did not understand reading the text is: Will the following scenarios be covered?

I want to collect all security related logs of TYPO3 core (only) in a dedicated log file.
I want to collect all errors of TYPO3 including extensions in a combined log file.
I want to have debug-logs of extension XY in a dedicated log file.
I want my big XYZ extension to log errors and access violations into a dedicated file each.
I want to combine security related logs of extensions and core in a dedicated file.
I want to Null-output a certain log level of a certain class. (e.g. A certain warning is not important/correct in the given usage, and is therefore only spamming logs; i.e. “no_cache warning” of Core if ext:solr is in use)
I want to combine scenarios 1 to 6 freely

Please evaluate your concept in mind, whether this is possible. I totally see this flexibility as a minimum requirement for a modern logging system. “Create logs right, instead of parsing them later”, is my motto.

And finally, just because something is widespread, it does not necessarily mean it is good!
(e.g. COVID-19, some Apple products, some Microsoft products)
So please stay skeptical with whatever library you have in mind.

crell · June 24, 2021, 2:18pm

Hi Markus.

Anything marked for the security channel, yes, you could configure to go to a specific log file.
Yes, there’s no reason I know of why you couldn’t have all channels write to the same file (or 3rd party service). In the most common configuration, I’d expect all channels to write to sys_log.
I’m not sure off the top of my head if you can configure a channel differently per-level in Monolog. I’ll have to check that. I think so, based on Symfony, but I’m not certain. (It may depend on what DI wiring we do.)
That will depend on if the extension defines its own channel. If so, then yes. I don’t know how common that is in Symfony land, but I know in Drupal it’s extremely common for modules to define a per-module channel. Several core modules do, in fact.
Assuming those extensions are using the security channel correctly (for some definition of “correctly” that I’ve not decided on yet, input welcome), that would be the default behavior.
No, per-class logic would not be possible.

The main difference is that how logs are “sliced up” becomes more the responsibility of the caller to decide rather than the system configuration. Or rather, it becomes a joint responsibility (the caller provides the axes that it could be split up on, the system decides what to do with those axes) rather than today where it’s entirely the system’s responsibility.

masterofd · June 30, 2021, 7:53am

Hi Larry,

just for clarification, when you say “per-class logic would not be possible”, do you mean that it is not possible to set a specific log level per class, as it is possible right now?

I found it really useful to be able to log on a debug level for a Package namespace (so Vendor\MyExtension) and disable anything else.

Thanks for clarifying and doing this impressive work up!

crell · June 30, 2021, 5:00pm

What would not be possible is externally (to a class) attaching different loggers to that class specifically.

A class can (and arguably should) take a logger object for an extension-specific channel as a DI dependency, and then call ->debug() or ->info() or whatever on it, as appropriate. But that’s something you would set inside the class, not from the settings file.

What exactly all the Services.yaml configuration looks like, I haven’t figured out yet. I’ll figure out those details when I get there.

masterofd · July 1, 2021, 6:41am

That’s not exactly what I meant. At the moment I’m able to “pollute” my classe que calls to the logger, bearing different logging levels throughtout my class, for example I have info and debug calls on a very granular level but most of the time, I don’t need that. Then at some point I’d like to flick the switch and know that only the debug information for this class in particular will be written somewhere.

No need to rush, I was just interested in this case.

helhum · July 20, 2021, 11:53am

I agree with @masterofd

I used to describe it like this:

A log event should has:

Severity: “How important is the incident?”
Scope: “Where (in which component) did this incident occur?”
Channel: “What is the type of the incident?” (Most prominently “Security” and “Deprecation”)

Severity is nicely covered by PSR-3 already, channels are covered here in this concept, I’d like to see a concept for “Scope” (or however we call it). I also would not describe the current logging implementation to have “virtual channels”, but rather virtual scopes and the configuration relying on these scopes soley.

What I like about virtual (automatic scoping) based on PHP namespaces is, that it allows disabling or enabling very specific loggings by configuration. E.g. “I want default debug messages from all TYPO3 components, but explicitly not from e.g. file abstraction layer”.

Removing this possibility only because other system don’t have them as well would be a pity.

crell · July 26, 2021, 6:23pm

Relevant Forge link: https://forge.typo3.org/issues/94600

I am going to have to change the format a bit from what is there (turns out min_level is only configurable at the handler level, not channel level), but I do have a working per-class override for those use cases where it’s useful. That part shouldn’t be changing.

crell · July 26, 2021, 10:05pm

OK, design feedback needed, please.

I’ve discovered that Monolog itself has no way to filter log items by level. The handlers do that, by virtue of convention only; all of Monolog’s own loggers take a $level parameter that acts as a min-level filter. That means if we want to filter by min level, that needs to be done at the handler level, not channel level.

The current file:

monolog:
  channels:
    default:
      handlers: ['@Monolog\Handler\SyslogHandler']
      processors: ['@Monolog\Processor\MemoryUsageProcessor']
    deprecations:
      handlers: ['@Monolog\Handler\SyslogHandler']
      processors: ~
  classes:
    Foo\Bar:
      handlers: [ '@monolog.stream.deprecations' ]
      processors: ~

Right now, I have the config file setup to be channel-centric. You define a channel, then wire handlers and processors to it. I had planned to make the handlers singly-defined services, but with the min-level being set on the handler that’s not going to work. I will have to specify the handlers inline in the config file, and then spawn new services off of that.

At that point, though, we should consider that Symfony takes an entirely different approach: Its configuration file is handler-centric. You define a handler by name, and its config, and then what channels it should attach to. cf: monolog.yaml Presumably it then spawns new Handler services (though that logic is buried somewhere in Symfony’s config processing, which I don’t fully grok), and then also derives channels off of that config. We could take something closer to that approach.

Either way, the basic model is taking a config file and generating container configuration out of it, so in concept we should be able to do either, I think. I think I can make the “define channel by class in the config file” logic work either way, likely by simply keeping a separate monolog.classes block, but then the handler setup is different there.

I am unsure of which way is the better way to proceed, here. What do others think? A channel-centric config file would look something like this, give or take:

monolog:
  channels:
    default:
      handlers: 
        db:
          blah:
          level: info
        stream:
          path: typo3temp/var/log/typo3.log
          level: notice
    deprecations:
      handlers: 
        stream:
          file: typo3temp/var/log/whatev.log
          level: info
  classes:
    Foo\Bar:
      handlers: 
        stream:
          file: typo3temp/var/log/foobar.log
          level: debug

While a handler-centric model would maybe look something like this:

monolog:
    handlers:
        main:
            type: stream
            path: "%kernel.logs_dir%/%kernel.environment%.log"
            level: debug
            channels: ["*"]
        db:
            type: db
            db_info: whatever
            level: info
            channels: ["!deprecations"]
        console:
            type: console
            channels: ["!event", "!doctrine", "!console"]
  classes:
    Foo\Bar:
      # This would auto-generate something equivalent to 
      # one of the above entries, but with `channels: ['Foo_Bar']`, 
      # and then auto-create a `Foo_Bar` channel to use for that one class.
      handlers: 
        stream:
          file: typo3temp/var/log/foobar.log
          level: debug

There’s still a fair bit of question marks in the above, and it’s possible that one or the other may simply not work at all (I won’t know until I try), but I am looking for feedback on which direction to pursue. (Note: At the moment please ignore the question of where the file lives; that’s not relevant at this time.)

helhum · July 26, 2021, 10:35pm

Hm, before I can lean in one or the other direction I’d need to understand how the usage side of logger would look like. How would my code be able to obtain a logger? To what channels will this logger then log? What exactly is the “classes” config in that regard? Will each class behave like a channel? Does one logger instance will only log to one channel, or multiple ones? Or will we have different logger instances for different channels?

I think it would help to be more clear how we want these things to work before deciding upon a configuration style?

Do you know how exactly this works in SF?

crell · July 27, 2021, 2:00pm

I don’t 100% grok how all of the Symfony wiring works. It’s surprisingly complicated and ties in their config system in ways I don’t fully understand. (Or like, frankly.)

Monolog crash course:

Every instance of Logger is a channel, in that it has a name. A Logger doesn’t do anything on its own.
A Logger has zero or more Processors, which pre-process a log record to add context to it.
A Logger has zero or more Handlers, which are what actually write to a log output.
A Handler has zero or one Formatters, which format the record structure into a string.
Although it is not part of the Handler interface, all of the Handlers that Monolog ships with (there’s a huge list) have a $level constructor parameter that acts as a min-level filter. For the one or two we write ourselves we should follow that pattern.

My current patch has this logic, which I would want to keep (more or less) either way, the first part of which is roughly inspired by Symfony:

A constructor parameter of $logger with an injected service of @logger gets the default logger channel instance.
A constructor parameter of $beepLogger with an injected service of @logger gets the beep channel logger instance. If beep isn’t defined anywhere, it gets cloned from default with the same configuration.
A constructor parameter of anything other than @logger, or named some other way, gets no special processing; you’re on your own.
If in the config file you specify a class override for class Foo\Bar\Baz, then that class will get a new channel created named Foo_Bar_Baz and that will be injected, regardless of the parameter name.

I think that logic is reasonably sound. It’s configuring what each channel has behind it in terms of handlers and processors that I’m currently not clear on.

Does that clear things up a bit?

benni · July 29, 2021, 4:10pm

I think there are two separate things here to discuss:

Configuration and injection of (custom) channels

I will provide a patch for WIP Core (in DataHandler), as this is probably already possible with our current setup. The WIP patch might be a good place to see custom channel. However in our current PSR-3 system, the logger do not care what handlers or processors or writers are assigned to it.

Conceptually / Design topic

Maybe we need to define what a “handler” is versus a “processor” or “writer”. Am I correct in assuming that the “handler” basically connects a set of writers and processors together? This seems like a very tightly coupled solution then, right?

crell · July 29, 2021, 4:59pm

What the current logging system calls a “Writer” is more or less what Monolog calls a “Handler”. They’re effectively the same thing with a different name.

What the current logging system calls a Processor is more or less what Monolog calls a Processor. Slightly different API, but same basic place in the architecture.

So really, Monolog isn’t radically different from what we’re doing now, at an architectural level. What it adds are a metric buttload of existing Handlers (aka Writers) and Processors that we don’t need to maintain ourselves, and a more formal definition of channels.

In TYPO3 core right now, basically every class is a pseudo-channel, and the class cannot control its channel without changing its name. That is suboptimal if you want to categorize messages that end up in the same output, such as the database, which is the common case.

Technically we probably could introduce a channel concept for injection separately from Monolog, I suppose. But the work involved is more or less the same. It would still be some kind of config file (format debatable) that mutates into DI configuration via a CompilerPass. And Monolog gives us 60-odd Handlers/Writers vs the 6 in core now.

benni · July 30, 2021, 2:24pm

So… here’s my PoC of introducing generic loggers (= channels) for common use cases.

https://review.typo3.org/c/Packages/TYPO3.CMS/+/70163

Using custom loggers it becomes obvious that defining loggers should be done in Services.yaml to
use DI injection such as __construct($myLogger), configuration of writers / handlers will become a mess in Yaml, and does not need to happen at DI time (in my opinion) and per-class logging is a stupid idea when you look at the overhead of spawned loggers.

crell · August 5, 2021, 4:24pm

Revised version of the patch now available: https://review.typo3.org/c/Packages/TYPO3.CMS/+/70014/14

The feature file describes it in more detail: https://review.typo3.org/c/Packages/TYPO3.CMS/+/70014/14/typo3/sysext/core/Documentation/Changelog/master/Feature-94600-AdoptMonologForInjectedLoggers.rst

But, long story short, this gives us:

By default, everything logs to the database.
Dynamic channels.
Support for arbitrary handlers on arbitrary channels for arbitrary levels, all configurable by the site admin via a YAML file.
Per-class overrides available to site administrators.
A very simple DX for most extension devs in the majority case.

It’s similar to, but not quite the same as, Symfony’s config file format, although the wiring logic is quite different.

Still to do, but I wanted to touch base here before doing so:

Add support for processors. These are… a little confusing. I am not clear how Symfony handles them, other than through seemingly proprietary flags.
Add support for attaching a formatter to the handler. I am unsure yet exactly how useful that is, frankly, since many formatters are designed to work with one and only one handler anyway.
Verify how this works with auto-wiring. There may be a bit more work to do there.
Decide where the config file will live. It needs to be YAML and cannot be in the main config array because it is all processed at compile time, not at runtime.

Feedback requested before I proceed further.

crell · August 19, 2021, 4:04pm

I spoke with Jordi Boggiano, the maintainer of Monolog, and got a better sense of how to wire things together. Based on that I have a new version of the patch.

New version: https://review.typo3.org/c/Packages/TYPO3.CMS/+/70014/16

The code has been heavily reworked to handle processors and formatters, and I like this version much better. The changelog doc has also been updated and is the best source of “what does this actually do?” right now. I would consider this feature-complete.

As an extra bonus, I included a PHP 8-only way to define channels using Attributes. I like that far better than the variable name mechanism that Symfony uses, so I’ve supported both at this point, and noted in the upgrading doc that the name-based version may be deprecated eventually.

I have not done any integration level testing yet. Please suggest classes I should try to inject the logger into this way to make sure that part works.

Still to do, on which I do really need feedback here:

Decide where the config file will live.
Integration testing and any knock-on from that.