Disable external requests until opted in by admin

Discussion Topic

TYPO3 currently has a couple of implementations which in a default setup will dispatch external requests that may be used for (undeclared) information collection such as usage tracking. I take privacy very seriously and am naturally opposed to software making any external requests unless the user opts in, and I am particularly against such requests if they are enabled by default and have no opt-out setting.

I therefore would like to raise the discussion about whether or not to remove such mandatory external requests from all current TYPO3 versions and make it an official rule to never implement such requests unless they are opt-in and disabled by default, and to carefully consider whether the request is important enough to merit its existence.

Impact

  • TYPO3 currently exposes installations to external requests without any way to opt out.
  • At the time of writing this, such external requests are even causing performance issues (see Bug #91507: System information very slow - TYPO3 Core - TYPO3 Forge).
  • By forcing external requests to happen we risk lowering the trust of users that their usage of TYPO3 is not being unwillingly tracked.
  • GDPR concerns may apply since, presumably, these remote requests are at the very least logged on the remote server along with an IP address.

Possible Mitigations

  • (A) Completely disallow the implementation of external requests from TYPO3 unless they are implemented by a third party developer (making them very clearly opt-in).
  • (B) Implement an opt-in mechanism for such requests which allows them to be disabled by default and only be dispatched if a user knowingly opts in.
  • Institute a rule for future development that such requests should either be disallowed entirely (A) or always be disabled by default and only enabled on explicit opt-in (B).

Organizational

Topic Initiator: Claus Due

TYPO3 currently has a couple of implementations which in a default setup will dispatch external requests that may be used for (undeclared) information collection such as usage tracking.

Just to make it clear: nothing is currently tracked; these requests are made for a simple version check. Of course, accesses get written to the web server logs, but this information is rarely used, and then only for debugging the application. Since 9.5 the new API is used; older versions check the version by downloading get.typo3/json.

But I agree that we have to change this behavior. Another option would be to use the Install Tool for such requests, on demand only.

I am missing a couple of details in this request before I can form an opinion on the matter.

Whose privacy is infringed upon in detail?
Which personal data is sent or stored?
Which forms of usage tracking can be created using the transmitted information?

Thank you in advance

It’s a simple GET request; no data is transferred. Only IPs are stored in the web server log.

The only information I can currently extract is the number of accesses to the API, so nothing really useful.

No claims were made that privacy is infringed upon. The issue is that external requests happen without the users’ knowledge and consent. It affects any and all backend users of any and all TYPO3 sites on any and all of the TYPO3 versions that perform these external requests.

None, but at least one of the requests (version upgrade check) includes the currently installed minor TYPO3 version as URL segment.

Statistical information such as the number of active sites distributed across minor TYPO3 versions, segmented by geography (geoIP), an estimate of the number of active users for a given server, login frequency, and, depending on DNS setup and shared/dedicated hosting, reversal of IP to hostname to identify the specific site by domain name. Sites which use reverse proxies will also expose their real IP if not configured with a system routing-level proxy ability. I’ve not had time to verify this, but requests run through Guzzle and, at a lower level, through PHP, which may also generate an HTTP client identifier that includes anything from the PHP version to the OS being used.

Further, TYPO3 configuration allows replacing the Guzzle handler which could be used to hijack or modify these requests in a way that isn’t immediately detectable by standard PHP safety analysis such as detection of usage of URL stream wrappers.

But as mentioned, the issue isn’t what gets transmitted, or how securely it happens. The issue is that it happens without consent. The discussion is not centered around collecting concrete personal information or infringing on privacy; the main topic is that we don’t ask for permission and offer no way to opt out.

Please help me understand. Maybe give an example: What external requests are made to where? When?

All requests should be guarded with an opt-in at the place of calling. It is a matter of a single if that checks some global array. This is way faster than doing the request.
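
As a rough, language-neutral illustration of that single check (written in Python here rather than PHP), the sketch below guards a request behind an opt-in flag. The `SETTINGS` dictionary, the `core_version_check` key and both function names are hypothetical and do not correspond to any existing TYPO3 configuration.

```python
# Hypothetical settings store standing in for the "global array".
# Every purpose is off unless an admin has explicitly switched it on.
SETTINGS = {"external_requests": {"core_version_check": False}}


def is_allowed(purpose, settings=SETTINGS):
    """Return True only if this purpose has been explicitly opted in."""
    return settings.get("external_requests", {}).get(purpose, False) is True


def fetch_if_allowed(purpose, fetch, settings=SETTINGS):
    """Call fetch() only after the opt-in check; otherwise do nothing."""
    if not is_allowed(purpose, settings):
        return None  # no external request is dispatched
    return fetch()
```

Because the guard sits at the call site, an unknown or unconfigured purpose is treated as "not opted in" and the request is never made.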

I think we should categorize the requests and at least make them public in the Install Tool.

Regarding the version check, I strongly argue for having it enabled by default, but the user should get the option to disable it during installation: suggest having it enabled, but allow switching it off.

Maybe we could move that into an extension called update check.

The same goes for the extension list update. If you push the update button or use the scheduler task, you basically opt in.

Thanks for bringing up this topic, I remember it has been discussed in some Slack channel already a couple of months ago.

Facts

  • on every backend login or reload CoreVersionService is triggered by SystemInformationToolbarItem
  • via AJAX this status is constantly updated during active backend sessions
  • JSON data is retrieved from e.g. https://get.typo3.org/v1/api/major/10
  • the request is executed by the server hosting a particular TYPO3 instance
  • the exposed information is the currently used major version (10 in the example above)
  • the exposed IP address belongs to the web server, not to the currently logged in user

Feedback

  • GDPR applies to individuals, not to web servers
  • web scrapers and bots are crawling the web 24/7 - potentially discovering and tagging TYPO3 websites in their storages - there’s not much we can do against it
  • concerns about a potential negative performance impact seem to be valid, since at least the first request is blocking and subsequent AJAX requests by a frequently used system might cause “some load” on the system

My suggestion would be to completely disable the current check for the time being and to research better alternatives that actually allow us to collect anonymous usage data, helping us to understand use cases much better. Of course that must be an opt-in solution then.

Since TYPO3 9.5, every toolbar refresh makes a request to get.typo3.org to check for minor or security updates; the first such request is made right after the BE login.

TYPO3 8.7 and earlier make this request from the Install Tool only, after an explicit user request, AFAIK.

Since the call was for opinions:
Data clearly indicates that website maintainers do a shitty job updating their clients’ sites.
To counter this, the update check was put in place.

Making this a configurable option will result in exactly those maintainers disabling this check.

So I strongly oppose the removal (I’d argue that an update check once a day is sufficient, though).
I value secure sites higher than pseudo-privacy of webservers (!) which can be publicly scraped (which they effectively are).

In general I think it is polite to inform the site administrator about outgoing calls, their endpoint and their purpose. Providing an API to register calls and configure them is IMHO a nice-to-have. Honest 3rd party developers will have documented these calls right now, and sneaky ones will not use the API anyway.

I agree with @dermattes that the version check has its merits and that once a day (for the whole installation) is enough.
The check should not block the backend, as TYPO3 may run on a server with limited access to the Internet. Actually, some may argue that preventing outgoing calls by default with a firewall etc. is best practice. In this situation admins should get a notice that a version check has not been done.
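
To make the combination of "once a day for the whole installation" and "notify admins when the check could not run" concrete, here is a minimal sketch in Python. It is not taken from any TYPO3 patch; the class name, the notice texts and the injected `fetch` callable are all assumptions. The sketch only limits how often the request is attempted and turns a failed request into a notice. Actually keeping the backend responsive would additionally require a short timeout inside `fetch` or running it in the background.

```python
import time

CHECK_INTERVAL = 24 * 60 * 60  # once a day, in seconds


class VersionCheckCache:
    """Attempts the remote check at most once per interval (hypothetical helper)."""

    def __init__(self, fetch, clock=time.time, interval=CHECK_INTERVAL):
        self._fetch = fetch        # callable that performs the real request
        self._clock = clock        # injectable clock, useful for testing
        self._interval = interval
        self._last_attempt = None
        self.latest = None         # last successfully fetched result
        self.notice = "Version check has not been done yet."

    def get(self):
        now = self._clock()
        due = (self._last_attempt is None
               or now - self._last_attempt >= self._interval)
        if due:
            # Record the attempt even if it fails, so a firewalled server
            # does not retry on every backend request.
            self._last_attempt = now
            try:
                self.latest = self._fetch()
                self.notice = None
            except OSError:
                # Covers socket and timeout errors; keep any older result.
                self.notice = "Version check could not be performed."
        return self.latest
```

A backend could show `notice` to admins whenever it is not `None`, which covers both "never checked" and "blocked by a firewall".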

The 24h solution is available here: https://review.typo3.org/c/Packages/TYPO3.CMS/+/64904
Reviews and comments of course always welcome.

While I agree with the »secure (production) sites = higher value« argument, I created the »Please let me opt-in« request (see https://forge.typo3.org/issues/90934) back then because these requests also happen on my local dev environments, and at least there I would not opt in. Having the toggle within the Production / Development presets would also be an option.

Do you also oppose:

  • Converting it to an opt-in solution
  • Adding an opt-out ability

I’m assuming this means that you also oppose converting to an opt-in solution or even providing an opt-out ability, I just wanted to make sure I understand it correctly.

In my opinion this is the wrong way to think about consent of our users, essentially treating them as if they do not have the right to choose whether or not to allow dispatch of external requests. The argument seems to be that because somebody does it wrong, nobody should be allowed to choose.

For the record, it is not only a question of web servers - the outgoing request also happens for anyone developing on their local machines, on CI (private) systems that implement acceptance testing with e.g. Selenium, and reports the internal IP address of sites running behind reverse proxies. These are things that public scraping cannot reveal.

The version check is a very useful feature and we should not disable it (however, the perf issue has to be solved).
I also don’t see a need to make it disabled by default.

All the CMS and e-commerce systems (both proprietary and open source) I know have this kind of check in place, often collecting much more telemetry data than we have here.

In the longer run TYPO3 should be able to send anonymous usage stats so we have solid data helping us to improve the product, but this would need to be an opt-in thing and a separate topic from this one.

Maybe a compromise is to disable the check completely for dev systems, context = dev

In my opinion, site owners should have the option to opt-in/out at least. If not from a technical perspective, then for the reputation of the product.

I understand the implication of not having a version check in place. Besides the fact that the frequency of this check could/should be adjusted (see @liayn’s comment), this is an important feature to improve the security of production sites, but is pointless for dev environments.

I propose to add an option to enable/disable all (or individual) “contact home” functions during the installation process. Users should be able to choose their preference depending on their individual use case and be able to adjust the settings later. As the version check is a crucial feature, a prominent message should inform admin BE users when they log in that it is currently disabled (e.g. similar to the message that issues have been found and you should check the reports module).

  • Discuss/determine the default setting of “contact home” functions.
  • Does this discussion impact any other functions besides the CoreVersionService?

I think having a configuration for this is important.
My idea about that would be:

  1. Check the version asynchronously at each login (to not block the login)
    This should be enough to keep most systems up-to-date. I do not consider it necessary to constantly send this request (seems to be every 5 minutes at the moment).
  2. Add a scheduler task that checks for new versions.
    With this one can realise e.g. a daily request. Optionally, you can specify an e-mail address that will be notified when updates are available.
  3. Add a configuration to completely disable the request.
    This allows the request to be switched off in the development context, for example. Or activated only for the “Live” environment. Or (almost) anything else the developers want.
  4. Bonus: A possibility to adjust the URL of the request
    This could be used to re-route requests to e.g. an own server. On that server you have to take care that the data of the actual endpoint is available. But in this case the system will no longer call “home”.
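
Points 3 and 4 of this list could look roughly like the following sketch (Python used for illustration only). The `VersionCheckSettings` class and its field names are hypothetical. The default base URL is derived from the example URL quoted earlier in this thread, and the "disabled by default" choice is just one of the options under discussion here.

```python
from dataclasses import dataclass
from typing import Optional

# Derived from the example https://get.typo3.org/v1/api/major/10 in this thread.
DEFAULT_BASE_URL = "https://get.typo3.org/v1/api"


@dataclass(frozen=True)
class VersionCheckSettings:
    enabled: bool = False              # assumption: off until an admin opts in
    base_url: str = DEFAULT_BASE_URL   # can point at a self-hosted mirror


def version_check_url(settings: VersionCheckSettings, major: int) -> Optional[str]:
    """Return the URL to query, or None if the check is switched off."""
    if not settings.enabled:
        return None
    return f"{settings.base_url.rstrip('/')}/major/{major}"
```

With `enabled=True` and a custom `base_url`, the installation would query an own server instead of calling “home”; that server then has to provide the same data as the original endpoint.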

Thanks for bringing up that topic.

From my perspective, I’d like to see the following things:

  • opt-in / opt-out for the user within the installation process regarding outgoing requests from TYPO3
  • store that information as configuration, so it can be managed via code
  • ability to configure it on a context level such as “Production” (and sub-contexts “Production/Staging” or “Production/Integration”), “Development”, just to name some
  • optional: scheduler task/s for those things to let the user decide what should be used and how frequently. This could also be part of the configuration, but would make the process a bit more complex during the installation.
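
One possible reading of the context-level idea, sketched in Python for illustration: a setting is looked up for the most specific context first and falls back to the parent context. The function name, the configuration shape and the fallback default are assumptions, not existing TYPO3 behavior.

```python
def resolve_for_context(context, per_context, default=False):
    """Resolve an opt-in flag for a context such as "Production/Staging".

    Looks up the full context first, then each parent ("Production"),
    and finally falls back to `default` (assumed here to mean "off").
    """
    parts = context.split("/")
    while parts:
        key = "/".join(parts)
        if key in per_context:
            return per_context[key]
        parts.pop()  # drop the most specific sub-context and retry
    return default


# Hypothetical configuration: on for live sites, off for staging and development.
VERSION_CHECK = {
    "Production": True,
    "Production/Staging": False,
    "Development": False,
}
```

Here “Production/Integration” inherits the “Production” setting, while a context that is not configured at all stays off.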

I guess most of the users will opt-in for those requests, as they have a pure benefit.

If that data is used for any “statistics”, it is not really reliable: as mentioned before, those requests are also sent from local development and from other environments that do not represent the actual installation of TYPO3. It’s also kind of “wasted effort” to send those requests there, as they bring no benefit when no user will see the result, taking an environment for acceptance testing as an example.

I’m very excited to see where we end up and look forward to all the feedback here.