Disable external requests until opted in by admin

jonaseberle · June 18, 2020, 9:30am

For me the priority is to fix the slow startup. What is the worst case there? The usual 30s HTTP timeout?

As we currently do not process that data for statistics, there is no need for opt-out in my opinion. I would keep the server (not the client) requesting that data to stay clear of user IPs here.

Could we add a backend route that gets accessed via AJAX from the backend? That would need extending SystemInformationToolbarItem, though.

mabolek1 · June 18, 2020, 9:38am

My two cents from a marketing perspective:

The statistics gathered by this request are extremely useful and I cannot emphasize this enough.
Anonymizing the data is the right way, but it’s good to pick country information from the IP address first.
I think one weekly or monthly request is good enough for marketing.
The request can be disabled on requests originating on localhost or 127.0.0.1.
I understand those who would like a way to disable these requests. Such a feature is a fair thing to ask for, but I will actively discourage people from using such a disable option because no personal information is exposed and the data is extremely helpful to the TYPO3 marketing effort and thus the growth of the CMS.

kevin-appelt · June 18, 2020, 10:37am

Can you explain in more detail how this data can be used?
As far as I can see, the request seems to be sent every 5 minutes when I am logged in. In addition, the request is sent from each system (local development, staging, testing, live, whatever).
A system in which nobody logs in will (probably) not send any information at all.
At least at this point in time I would say that this information is worth little or nothing.

mabolek1 · June 18, 2020, 11:00am

Monitoring adoption rate of TYPO3
Monitoring adoption rate for major releases
Monitoring the effect of specific marketing emphasis
All of the above on a per-country basis

I suggested a weekly or monthly rate as well as excluding localhost. Excluding non-production could also be a good idea, but it depends on exactly which numbers are most useful. (I don’t have the answer.)

A system where nobody logs in will have no statistical significance when we’re measuring changes only. That nobody logs into an install could probably be interesting for some, but we can still track changes in activity without measuring lack of activity.

helhum · June 18, 2020, 5:02pm

My two cents from a marketing perspective

The version check isn’t a marketing feature and by no means should be abused to become one.
Instead, a pseudonymous and optional (opt-in) feature should be introduced that reports back some basic information about the TYPO3 installation, which can be used for statistical purposes.

helhum · June 18, 2020, 5:16pm

Exactly. And while this is likely not a personal (GDPR related) privacy concern, this reveals TYPO3 installations that are NOT publicly available or scrapable. For dev systems it might be GDPR related, as IP addresses of individuals are revealed and stored in server logs without consent.

I don’t think this is the case. I rather would assume the IP address of the gateway will be revealed.

helhum · June 18, 2020, 5:31pm

\TYPO3\CMS\Install\Report\InstallStatusReport exists (and was there before the checks were also added to the system information toolbar) to check for updates and inform users. It is opt-in, as the scheduler task for reports needs to be set up or the reports module viewed. It is also asynchronous by concept, as current as configured, and has zero impact on backend performance.

Information fetched in this report could be cached and shown in the tool bar, so that the information is more prominent.

Benefits:

Information is fetched from online resource when requested, or regularly when configured as opposed to continuously and without explicit “request”
it is opt in
no performance impact
no request without reports module

Downsides:

update check is opt in. no check is performed when not triggered or configured
update check opt in is rather complicated (scheduler and reports module are required)

Conclusion:

Remove polling every 5 minutes
Allow opt out for update check (like any reasonable software)
Only check once a day
Always check when requested (reports module)
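
The conclusion above can be sketched roughly as follows. This is a minimal illustration in Python, not TYPO3 code: `UpdateChecker`, `fetch`, `enabled` and `force` are hypothetical names, and the choice that an opt-out also suppresses explicitly requested checks is an assumption.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional

ONE_DAY = 24 * 60 * 60  # seconds


@dataclass
class UpdateChecker:
    """Cached update check: no polling, at most one automatic request per day."""

    fetch: Callable[[], str]      # hypothetical callable that performs the remote request
    enabled: bool = True          # opt-out switch; False means no request at all (assumption)
    max_age: int = ONE_DAY        # how long a cached answer stays fresh
    _cached: Optional[str] = None
    _fetched_at: float = 0.0

    def latest_version(self, now: Optional[float] = None, force: bool = False) -> Optional[str]:
        """Return the latest known version, contacting the remote only when needed.

        force=True models an explicit request (e.g. opening the reports module)
        and bypasses the cache.
        """
        if not self.enabled:
            return None
        now = time.time() if now is None else now
        stale = self._cached is None or now - self._fetched_at >= self.max_age
        if force or stale:
            self._cached = self.fetch()
            self._fetched_at = now
        return self._cached
```

With this shape, a toolbar item would only ever read the cached value, so rendering the backend never waits on an HTTP timeout unless the cache is empty or a day old.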

mabolek1 · June 18, 2020, 9:26pm

The check is not a marketing feature, and I have not said it is. However, measuring our marketing’s effectiveness is a statistical purpose.

What I am describing is not abuse. It is basic web server log analysis.

erredeco · June 19, 2020, 9:09am

I think that development environments should not be dependent on external systems (I should be free to develop even if I am offline); about the rest, I don’t know… maybe during installation, warn the user about this feature and allow the user to disable it in the Install Tool later.

I’d say I agree with Manuel Selbach

helhum · June 19, 2020, 10:42am

Yah, maybe “abuse” wasn’t an appropriate word. What I mean is that we shouldn’t discuss this feature from a perspective that doesn’t serve its initial purpose. The purpose is to actively inform users about updates. To be able to do so, a remote service is queried. Now arguing that this feature should stay to allow doing statistics on the logs of the remote service sidetracks the discussion into a direction that confirms concerns that are brought to the table.

mabolek1 · June 19, 2020, 1:48pm

Good. That’s a fair concern.

neufeind · June 20, 2020, 10:32am

I was astonished to first read “it’s just a data-fetch” and then learn that the data is actually wanted/collected and used without prior consent. While I know why marketing wants/needs/… that information, I think collecting it without explicit consent is a no-go. If I’m right, other systems ask for permission to collect that data upon installation. With cookies nowadays there is a clear distinction in what you permit, which would maybe mean here: no version-checks, checks only (functional) or also allow submitting telemetry (statistical data, version-information, …). And if people really agree to submit telemetry-data, then I’m fine with collecting data that marketing desires - maybe the number of backend-users or whatever.

namelesscoder · June 20, 2020, 10:53am

@mabolek All the more reason why this request shouldn’t be mandatory or only be disabled on certain hosts (btw you can’t filter that correctly, from a technical perspective - you do not know where/how the request is routed). The least we should do is ask for consent before initiating any data telemetry that gets used for statistical or marketing purposes (including measuring campaign efficiency). It doesn’t matter how useful the data is - the usefulness isn’t an argument for (or against) making telemetry mandatory.

namelesscoder · June 20, 2020, 11:00am

Two things:

“Everyone else does it / does it worse” is not a good argument for TYPO3 doing it as well, but it is a good opportunity for TYPO3 to care more than “everyone else” about consent of users - by making the check opt-in.
Usefulness is not an argument for or against mandatory data telemetry; also, a mandatory version check is not essential to the operation of a TYPO3 site - an optional, opt-in check would be equally useful to the user.

namelesscoder · June 20, 2020, 11:06am

This would depend on the nature of the gateway. NAT may hide such information, other types of routing may not. You can do a manual check for this if you, for example, make a tiny script that requests a resource from a server where you have access to logs, and check the reported REMOTE_ADDR. For hosted server environments it is more frequent than not, that the actual IP of the server is reported (as opposed to the IP of some NAT or gateway). Standard gateway setups do not report the gateway’s IP as origin address.
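
A minimal sketch of such a manual check, written in Python purely for illustration. The handler stands in for "a server where you have access to logs" by echoing the address it sees (the equivalent of REMOTE_ADDR); `EchoAddressHandler`, `reported_address` and `demo_on_loopback` are hypothetical names. To test a real setup, the echo part would run on a remote machine and `reported_address` would be called from the TYPO3 server.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoAddressHandler(BaseHTTPRequestHandler):
    """Responds with the client address as seen by the server."""

    def do_GET(self):
        body = self.client_address[0].encode("ascii")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def reported_address(url: str) -> str:
    """Request the URL and return the origin address the server reported."""
    # An empty ProxyHandler ignores proxy settings from the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=10) as response:
        return response.read().decode("ascii").strip()


def demo_on_loopback() -> str:
    """Run the echo server locally and query it once."""
    server = HTTPServer(("127.0.0.1", 0), EchoAddressHandler)  # port 0: pick a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return reported_address(f"http://127.0.0.1:{server.server_port}/")
    finally:
        server.shutdown()
        server.server_close()
```

Run over loopback, the reported address is simply 127.0.0.1; the interesting result only appears when the echo endpoint sits outside your network.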

Reverse proxies specifically will most definitely report a different (less public) IP address through scraping than through server-initiated requests. Take CloudFlare as an example - there you can configure your DNS records to target CF servers and thereby hide your server’s IP, but if you initiate a request from your own server, that “hidden” IP gets reported as origin address (except in some NAT’ed cases).

mabolek1 · June 20, 2020, 7:05pm

I’m looking for statistically useful data only. A margin of error is to be expected.

Mandatory is not something I support or a word I have used. I would choose default-on with the possibility for opt-out.

On the other hand, I think usefulness is a part of the argument. Better efficiency can mean hundreds of thousands of euro saved for TYPO3, volunteer time put to the right tasks, and potentially turning around what others say is a continuing drop in market share for the CMS. (We sorely lack our own data.)

I see people’s privacy concerns in this thread. I understand them, and I have suggested steps to address these issues. What I’m imagining is no usage data extraction. It is very limited, but the lack of detail is no problem when trying to see the big picture.

namelesscoder · June 20, 2020, 9:07pm

By this I assume you mean filtering the logs that are generated by the request. What I was talking about was preventing the request from happening in the first place, based on origin IP. But even so, the same problem exists on the logging server - you cannot filter out requests based on localhost-or-not origin IP because the origin IP will always be the public one.

For the record, I don’t have any problem with usage tracking as long as it is opt-in. To me it doesn’t matter how much money is saved if the consequence is that we track users without their consent (goal won’t justify the means).

mabolek1 · June 21, 2020, 9:16am

I agree with you. The requests should not happen when the parent request originates on e.g. localhost or 127.0.0.1. TYPO3 should do that filtering preemptively.
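
A minimal sketch of what such a preemptive filter could look like, in Python for illustration only. `is_local_host` and `should_phone_home` are hypothetical names, and the assumption is that the host of the parent (backend) request is available as a string. It only inspects that host; as noted earlier in the thread, it says nothing about the origin IP the remote server will eventually see.

```python
import ipaddress


def is_local_host(host: str) -> bool:
    """True if the host is 'localhost', a *.localhost name, or a loopback IP."""
    name = host.strip().lower()
    if name.startswith("["):
        # Bracketed IPv6 literal, optionally followed by a port: [::1]:8443
        name = name[1:].split("]", 1)[0]
    elif name.count(":") == 1:
        # hostname:port or IPv4:port
        name = name.split(":", 1)[0]
    name = name.rstrip(".")
    if name == "localhost" or name.endswith(".localhost"):
        return True
    try:
        return ipaddress.ip_address(name).is_loopback
    except ValueError:
        return False  # a regular hostname that is not an IP literal


def should_phone_home(parent_request_host: str) -> bool:
    """Skip the external request when the backend is accessed locally."""
    return not is_local_host(parent_request_host)
```

Note that private ranges such as 192.168.x.x are deliberately not treated as local here; whether they should be is a separate decision.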

The margin of error would apply to extracting country names from server IPs.

We can agree to disagree about this. In my view no user is tracked because it is the server making the request.

dermattes · June 22, 2020, 2:36pm

Since this topic started turning into a different direction, I wanted to make some things clear.

Marketing does not have or get access to the server logs of get.typo3.org, nor do we see any reason to grant access to them.
This applies to T3G marketing as well as the marketing team of typo3.org.

These logs are written to the webserver to be able to spot errors during the uptime of the service (aka debugging).

With this out of the way: please continue

benni · June 23, 2020, 9:49am

Hey all,

I opt in for removing the existing functionality in TYPO3 Core completely for the time being, and then come up with a proper concept on HOW to opt-in and WHAT to opt-in (what do I get for phoning home), and what data should be sent (all documented) before starting to code a feature.