T3versions.com continued improvements and reports

What is your idea about?
Continue to improve and extend the T3versions project. Providing the TYPO3 community with valuable insights on TYPO3 usage.

What is the potential impact of your idea?
Valuable insights for both the product team and the marketing team. Maybe even for the security team, the GmbH and agencies/partners.

Who can / should implement your idea?
Working together with Torben. Hope to include more community members.

Approximate Funds needed
€10,000 - €25,000

Application by
Ronald Meeuwissen

Are there any specific goals that you want to achieve within this budget?

While I definitely support the goal of gathering version usage information for marketing and security purposes, I’m not sure whether doing this through that tool is the right approach, because the insights are very limited.

I would instead spend that money on implementing a core feature and the infrastructure to collect these statistics from the TYPO3 installation itself, given consent. There was a discussion around this here: https://decisions.typo3.org/t/disable-external-requests-until-opted-in-by-admin/593

It doesn’t help us figure out the versions of existing old installations in the wild, but I don’t know if that’s really needed. If it is, it could be achieved using a bot, which could then be built as part of the t3versions.com project and serve as a specific goal.

@richardhaeser Sure! We will share our project backlog here.

@moger In what way are the insights very limited? Do you have any suggestions for our statistical pages? Happy to add a few more widgets with additional insights.

We are happy to hook into any core initiatives…

Some tasks from our backlog:

  • Recognize multi-installations on one domain
  • Check known TYPO3 sites for installed extensions
  • Extend the service with security scanning features (e.g. show the result of a security header check)
  • More information on who is behind the domain, and grouping of found TYPO3 websites by industry
  • Country filter by stat: Version distribution
  • Monitor lifecycle of individual websites

@moger I followed the discussion you referenced and yes, it would be really great to have an additional source for TYPO3 usage as you describe (even with more data, e.g. PHP / DB versions). The idea behind such an additional service already exists; Benny, Olly and I discussed it in February this year. Basically it's “just” about creating a public concept, getting the community to agree on it and starting to code :wink:

Our friends from the Joomla Community have integrated such a service in Joomla Core - see:

We could adopt some of the concepts/ideas from there:

  • especially validating, that only “real” data is posted
  • sites have a unique id
  • ensuring that no one tries to spoof the database with invalid data
  • Local dev. sites do not POST data
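
To make the client-side points above more concrete, here is a minimal Python sketch of how an installation could decide whether to report and what it would send. Everything in it is an assumption for illustration only: the function names, the payload fields, the list of "local" hostname suffixes and the idea of deriving the id from a per-installation secret are hypothetical and not taken from Joomla or TYPO3. Server-side validation against spoofed data is not covered here.

```python
import hashlib
import ipaddress
from urllib.parse import urlparse

# Hypothetical hostname suffixes treated as local development sites.
LOCAL_SUFFIXES = (".local", ".localhost", ".test", ".ddev.site")


def is_local_site(base_url: str) -> bool:
    """Return True when the site looks like a local development install."""
    host = urlparse(base_url).hostname or ""
    if host == "localhost" or host.endswith(LOCAL_SUFFIXES):
        return True
    try:
        ip = ipaddress.ip_address(host)
    except ValueError:
        return False  # a regular hostname, not an IP literal
    return ip.is_private or ip.is_loopback


def site_id(install_secret: str) -> str:
    """Derive a stable, anonymous id from a per-installation secret."""
    return hashlib.sha256(install_secret.encode("utf-8")).hexdigest()


def build_report(base_url, install_secret, typo3_version, php_version, opted_in):
    """Return the payload to POST, or None when nothing should be sent."""
    if not opted_in or is_local_site(base_url):
        return None
    return {
        "site_id": site_id(install_secret),
        "typo3_version": typo3_version,
        "php_version": php_version,
    }
```

With this shape, a site that has not opted in, or one running on `localhost` or a private IP, simply produces no payload, so nothing leaves the installation.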

Where I really see an issue is that many site owners will surely not opt in to such a service (see how the discussion you referenced evolved), so we may only gather a small(?) amount of data from all existing TYPO3 websites out there.

So IMO such a service, added as a core feature / infrastructure service, may only be an addition to t3versions. If such a service existed, we could merge the collected data into a central database together with data from t3versions (we have an API in place that could be used).

For t3versions we're open to improvements / new ideas (except publishing all sites we collected, including security-related information about them).

Regarding limited insights: If I understand it correctly, the current statistics on t3versions.com are only collected through the manual version checks. So, statistically speaking, we might have a strong sampling/selection bias in the data because the tool is only used by a small subset of users (probably certain agencies?). So if the goal is to provide statistically reliable marketing insights which are based on a large number of installations, we should consider using other means of data collection.

(Yes, even if we used a service which relies on consent instead, we would still have a bias, but I’d say the overall numbers, and with them the reliability of the statistics, would increase. I would also hope that opt-in behavior follows a normal distribution across TYPO3 versions more closely than use of the website does.)

Checks are executed manually when using the t3versions.com GUI. In order to collect a large number of new TYPO3 websites, we run a cluster of multiple “workers” which crawl millions of domains that we “feed” into the crawling queue. For the last “big” check in October 2020 we crawled ~25 million domains.
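
As a rough illustration of the worker idea only (this is not the actual t3versions code), the Python sketch below feeds a list of domains to a small pool of workers and keeps the ones that look like TYPO3. The `fetch` callable, the function names and the detection heuristic are assumptions made for this example; real detection is certainly more involved than checking for a generator meta tag.

```python
from concurrent.futures import ThreadPoolExecutor


def looks_like_typo3(html: str) -> bool:
    # Simplified heuristic (assumption): look for a TYPO3 generator meta tag.
    return 'name="generator" content="TYPO3' in html


def crawl(domains, fetch, workers=4):
    """Check each domain with a pool of workers.

    `fetch` is a caller-supplied function that returns the HTML of a
    domain's start page. Returns the domains that look like TYPO3,
    in the same order as the input.
    """
    def check(domain):
        try:
            return domain, looks_like_typo3(fetch(domain))
        except Exception:
            return domain, False  # unreachable domains count as "not found"

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [d for d, hit in pool.map(check, domains) if hit]
```

Passing `fetch` in from outside keeps the sketch independent of any particular HTTP library and makes it easy to try with canned pages.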

Ah, thanks for the clarification - yes, then that makes sense :slight_smile:

Not to forget the API usage :wink:

For those who missed our blog post, a bit more background information: https://typo3.org/article/typo3-market-insights-with-t3versionscom