The state of the Internet as of January 2022
In a previous post, I wrote about working with The Internet Society to rewrite an IPv6 crawler. In this post, I wanted to share some of the results I found interesting from the most recent crawl of the top million websites (27th January 2022).
Website Records
This crawl found 1,734,877 published A records for website hostnames (note that it doesn't take the stem, so www.silvermou.se would be recognised but not silvermou.se - this was for backwards compatibility).
IPv4
Of the 1,734,877 records, 1,734,758 had A records, of which only 505,879 (29.16%) were unique.
102,889 (5.93%) of the A records were the same 7 IPs:
8573 208.91.197.25,AS40034 CONFLUENCE-NETWORK-INC
10098 23.227.38.74,AS13335 CLOUDFLARENET
10611 199.59.243.200,AS16509 AMAZON-02
14905 208.91.197.26,AS40034 CONFLUENCE-NETWORK-INC
15671 188.114.97.0,AS13335 CLOUDFLARENET
15776 188.114.96.0,AS13335 CLOUDFLARENET
27255 209.99.64.71,AS40034 CONFLUENCE-NETWORK-INC
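A tally like the one above can be produced by counting how often each address appears across all hostnames. The following is a minimal sketch, not the crawler's actual code: the `records` list, the hostnames in it, and both function names are hypothetical stand-ins for the crawl output.

```python
from collections import Counter

# Hypothetical (hostname, IPv4 address) pairs standing in for crawl output.
records = [
    ("www.example.com", "188.114.96.0"),
    ("www.example.net", "188.114.96.0"),
    ("www.example.org", "188.114.96.0"),
    ("www.example.edu", "188.114.97.0"),
    ("www.example.info", "209.99.64.71"),
]


def top_addresses(pairs, n):
    """Return the n most frequently used addresses with their counts."""
    counts = Counter(ip for _host, ip in pairs)
    return counts.most_common(n)


def shared_fraction(pairs, n):
    """Fraction of all records that point at the n most common addresses."""
    total = len(pairs)
    top = sum(count for _ip, count in top_addresses(pairs, n))
    return top / total if total else 0.0


for ip, count in top_addresses(records, 2):
    print(f"{count:>7} {ip}")
```

With the real data, `shared_fraction(records, 7)` would correspond to the 5.93% figure quoted above.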
550,123 (31.71%) were registered as either Cloudflare, Amazon, Confluence, or Google:
10032 AS22612 NAMECHEAP-NET
10443 AS396982 GOOGLE-CLOUD-PLATFORM
10674 n/a
11354 AS54600 PEGTECHINC
12490 AS26496 AS-26496-GO-DADDY-COM-LLC
13046 AS8560 IONOS SE
13415 AS2635 AUTOMATTIC
15298 AS37963 Hangzhou Alibaba Advertising Co.
18499 AS63949 Linode
20667 AS395954 LEASEWEB-USA-LAX-11
21504 AS54113 FASTLY
22066 AS32244 LIQUIDWEB
22208 AS8075 MICROSOFT-CORP-MSN-AS-BLOCK
23476 AS25504 Vautron Rechenzentrum AG
25872 AS46606 UNIFIEDLAYER-AS-1
36156 AS14061 DIGITALOCEAN-ASN
38157 AS24940 Hetzner Online GmbH
38566 AS16276 OVH SAS
46082 AS14618 AMAZON-AES
61781 AS15169 GOOGLE
107292 AS40034 CONFLUENCE-NETWORK-INC
117007 AS16509 AMAZON-02
200605 AS13335 CLOUDFLARENET
Statistics per A record
Of the 1,734,758 records returned:
- 1,441,639 (83.10%) were reachable on both port 80 and port 443;
- 231,094 (13.32%) were HTTP only;
- 4,606 (0.27%) were HTTPS only;
- 62,025 (3.58%) were unreachable on port 80;
- 288,513 (16.63%) were unreachable on port 443;
- 57,419 (3.31%) were not reachable on either port 80 or port 443.
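The six figures above come from four disjoint buckets (both ports, HTTP only, HTTPS only, neither), and the two "unreachable" lines are sums of those buckets. The sketch below shows that relationship using the numbers from the list. The `port_open` helper assumes that "reachable" means a TCP connection succeeds, which is an assumption about the crawler rather than something stated here, and all function names are illustrative.

```python
import socket


def port_open(address, port, timeout=3.0):
    """Return True if a TCP connection to (address, port) succeeds.

    Assumption: a successful TCP handshake counts as "reachable".
    """
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def classify(http_ok, https_ok):
    """Map the two port results onto one of four disjoint buckets."""
    if http_ok and https_ok:
        return "both"
    if http_ok:
        return "http_only"
    if https_ok:
        return "https_only"
    return "neither"


# Per-A-record figures taken from the list above.
buckets = {
    "both": 1_441_639,
    "http_only": 231_094,
    "https_only": 4_606,
    "neither": 57_419,
}

total = sum(buckets.values())
# Unreachable on port 80 = HTTPS-only hosts plus hosts reachable on neither.
unreachable_80 = buckets["https_only"] + buckets["neither"]
# Unreachable on port 443 = HTTP-only hosts plus hosts reachable on neither.
unreachable_443 = buckets["http_only"] + buckets["neither"]

for name, count in buckets.items():
    print(f"{name:<11} {count:>9,} ({100 * count / total:.2f}%)")
```

The same decomposition applies to the per-unique-address and IPv6 lists in the sections that follow.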
Statistics per unique IPv4 address
To recap the above figures per unique IP address, of which there were 505,879:
- 443,384 (87.65%) were reachable on both port 80 and port 443;
- 39,386 (7.79%) were HTTP only;
- 1,880 (0.37%) were HTTPS only;
- 23,109 (4.57%) were unreachable on port 80;
- 60,615 (11.98%) were unreachable on port 443;
- 21,229 (4.20%) were not reachable on either port 80 or port 443.
IPv6
Of the 1,734,877 records, 292,671 (16.87%) had AAAA records, of which 102,742 (35.11%) were unique.
Statistics per AAAA record
Of the 292,671 records returned:
- 280,551 (95.86%) were reachable on both port 80 and port 443;
- 4,471 (1.53%) were HTTP only;
- 783 (0.27%) were HTTPS only;
- 7,649 (2.61%) were unreachable on port 80;
- 11,337 (3.87%) were unreachable on port 443;
- 6,866 (2.35%) were not reachable on either port 80 or port 443.
Statistics per unique IPv6 address
To recap the above figures per unique IPv6 address, of which there were 102,742:
- 96,524 (93.95%) were reachable on both port 80 and port 443;
- 1,782 (1.73%) were HTTP only;
- 406 (0.40%) were HTTPS only;
- 4,436 (4.32%) were unreachable on port 80;
- 5,812 (5.66%) were unreachable on port 443;
- 4,030 (3.92%) were not reachable on either port 80 or port 443.
Dual connectivity
- 1,447,206 (83.17%) only had A records published;
- 5,119 (0.29%) only had AAAA records published;
- 280,757 of the 1,734,877 (16.18%) were reachable on both IPv4 and IPv6;
- 777 records (0.045%) were not reachable on either protocol.
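Sorting hostnames into A-only, AAAA-only, and dual-stack groups only needs the address family of each published record. Here is a small sketch using Python's standard `ipaddress` module. The `hosts` mapping uses documentation-range addresses and is purely hypothetical, and the sketch looks at published records only, not at reachability.

```python
import ipaddress
from collections import Counter


def record_profile(addresses):
    """Classify a hostname by the IP versions of its published addresses."""
    versions = {ipaddress.ip_address(addr).version for addr in addresses}
    if versions == {4, 6}:
        return "dual_stack"
    if versions == {4}:
        return "a_only"
    if versions == {6}:
        return "aaaa_only"
    return "no_records"


# Hypothetical resolver output: hostname -> all addresses published for it.
hosts = {
    "www.example.com": ["192.0.2.10", "2001:db8::10"],
    "www.example.net": ["192.0.2.20"],
    "www.example.org": ["2001:db8::30"],
}

profile_counts = Counter(record_profile(addrs) for addrs in hosts.values())
for profile, count in profile_counts.items():
    print(f"{profile:<10} {count}")
```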
Mailserver (MX) Records
The total number of domains with MX records published was 744,226, with 1,679,283 total MX records published.
1,202 MX records were set to "localhost.", and a further 20,964 did not resolve to a valid A or AAAA record. Discarding these left 735,875 domains and 1,658,319 MX records.
For the MX records:
- 1,658,091 (99.99%) had corresponding A records;
- 1,487,116 (89.68%) were reachable over IPv4;
- 827,490 (49.90%) had corresponding AAAA records;
- 817,936 (49.32%) were reachable over IPv6;
- 228 (0.014%) had only AAAA records published;
- 5,347 (0.32%) were not reachable over either IPv4 or IPv6;
- 30,068 (1.8%) did not support STARTTLS.
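One way to test a mailserver for STARTTLS support is to connect on port 25, send EHLO, and see whether the server advertises the extension. The sketch below does this with Python's standard `smtplib`. It is not necessarily how the crawler performed the check, and the function name, port, and timeout are assumptions.

```python
import smtplib


def supports_starttls(host, port=25, timeout=10.0):
    """Report whether an SMTP server advertises STARTTLS after EHLO.

    Returns True or False when the server answers, and None when the
    connection or the SMTP dialogue fails (the check is inconclusive).
    """
    try:
        with smtplib.SMTP(host, port, timeout=timeout) as smtp:
            smtp.ehlo()
            return smtp.has_extn("starttls")
    except (OSError, smtplib.SMTPException):
        return None
```

Returning `None` for failed connections keeps "not reachable" separate from "reachable but without STARTTLS", which matters because the list above reports the two as different figures.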
For the domains:
- 735,848 of the 735,875 (99.9963%) had at least one MX record with a corresponding A record;
- 621,352 (84.44%) had at least one MX record which was reachable over IPv4 at the time of the crawl;
- 199,282 of the 735,875 (27.08%) had at least one MX record with a corresponding AAAA record;
- 192,766 (26.20%) had at least one MX record which was reachable over IPv6 at the time of the crawl.
752,436 of the 827,490 AAAA records (90.93%) were pointed to Google's mailservers.
Without Google, the percentage of domains with at least one MX record that has a AAAA record would drop from 27.08% to 6.88%, and the share of MX records with AAAA records would drop from 49.90% to 8.17%.
Uniqueness of mailservers
- Of the 1,658,319 MX records, only 434,810 (26.22%) were unique;
- 751,618 (45.32%) of all published MX records were pointing to one of Google's MX servers;
- An additional 24,750 (1.49%) were pointing to an IP address in AS15169 GOOGLE;
- 148,845 domains (representing 20.23% of those with MX records) had at least one MX record pointing to Google;
- 160,936 domains (21.87%) had at least one MX record pointing to AS15169 GOOGLE;
- 104,357 domains (14.18%) had at least one MX record pointing to Outlook, and an additional 1,845 (0.25%) were pointing to another MX record in AS8075 MICROSOFT-CORP-MSN-AS-BLOCK;
- This represents 36.3% of domains having at least one MX record pointing to either Outlook or Gmail.
I grouped the data by domain,AS entries to catch domains which have varied MX records, and was left with 800,527 entries, sorted as follows:
2785 AS52129 Proofpoint
2893 AS198610 Beget LLC
3163 AS31034 Aruba S.p.A.
3376 AS13916 PROOFPOINT-UT7
3378 AS45102 Alibaba US Technology Co.
3466 AS63949 Linode
3574 AS13335 CLOUDFLARENET
3626 AS32244 LIQUIDWEB
4171 AS42427 Mimecast Services Limited
4758 AS25504 Vautron Rechenzentrum AG
4794 AS14061 DIGITALOCEAN-ASN
6016 AS2639 ZOHO-AS
6637 AS27357 RACKSPACE
7026 AS19994 RACKSPACE
7062 AS26211 PROOFPOINT-ASN-US-WEST
7549 AS14618 AMAZON-AES
7586 AS136958 China Unicom Guangdong IP network
8268 AS30031 MIMECAST
8447 n/a
10102 AS22843 PROOFPOINT-ASN-US-EAST
10366 AS8560 IONOS SE
11518 AS24940 Hetzner Online GmbH
11765 AS13238 YANDEX LLC
14874 AS22612 NAMECHEAP-NET
15775 AS26496 AS-26496-GO-DADDY-COM-LLC
16455 AS16276 OVH SAS
17111 AS46606 UNIFIEDLAYER-AS-1
28245 AS16509 AMAZON-02
105971 AS8075 MICROSOFT-CORP-MSN-AS-BLOCK
160936 AS15169 GOOGLE
The above shows that Google and Microsoft are dominating the mailserver market for the top million sites in this dataset.
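The domain,AS grouping described above amounts to deduplicating (domain, AS) pairs and then counting how many distinct domains land in each AS. A domain with several MX records in the same AS is then counted once for that AS, while a domain split across providers is counted once per provider. This is a minimal sketch with hypothetical rows, not the original processing script.

```python
from collections import Counter

# Hypothetical (domain, AS label) rows, one per MX record.
mx_rows = [
    ("example.com", "AS15169 GOOGLE"),
    ("example.com", "AS15169 GOOGLE"),  # second MX in the same AS
    ("example.com", "AS8075 MICROSOFT-CORP-MSN-AS-BLOCK"),
    ("example.net", "AS15169 GOOGLE"),
    ("example.org", "AS16276 OVH SAS"),
]


def domains_per_as(rows):
    """Count distinct domains per AS after collapsing duplicate pairs."""
    unique_pairs = set(rows)
    return Counter(as_label for _domain, as_label in unique_pairs)


counts = domains_per_as(mx_rows)
# Print in ascending order of count, matching the layout of the table above.
for as_label, n in sorted(counts.items(), key=lambda item: item[1]):
    print(f"{n:>7} {as_label}")
```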
Nameservers
- There were 2,502,001 nameservers (DNS servers) returned;
- 1,605,634 (64.17%) of these were accessible over IPv6;
- Only 198,457 (7.93%) were unique;
- There were 15,706 different providers according to the AS information;
- 554,168 (22.15%) were pointing to Cloudflare;
- 1,006,524 (40.23%) were pointing to one of three providers - Cloudflare, Amazon, or Host Europe.
39929 AS397213 ULTRADNS
46505 AS16276 OVH SAS
56039 AS15169 GOOGLE
58225 AS16552 TIGGEE
59154 AS8560 IONOS SE
154298 AS44273 Host Europe GmbH
298058 AS16509 AMAZON-02
554168 AS13335 CLOUDFLARENET
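The 40.23% figure can be reproduced from the table by parsing the count lines and summing the three largest providers. The sketch below does that for the three relevant lines. The parsing assumes every line has the form "count ASnumber name", so the "n/a" rows in the other tables would need separate handling.

```python
# Lines copied from the nameserver table above.
lines = """554168 AS13335 CLOUDFLARENET
298058 AS16509 AMAZON-02
154298 AS44273 Host Europe GmbH"""

TOTAL_NAMESERVERS = 2_502_001  # total nameservers returned by the crawl


def parse_line(line):
    """Split 'count ASnumber provider name' into its three parts."""
    count, as_number, name = line.split(maxsplit=2)
    return int(count), as_number, name


rows = [parse_line(line) for line in lines.splitlines()]
top_three = sum(count for count, _as_number, _name in rows)
share = 100 * top_three / TOTAL_NAMESERVERS

print(f"{top_three:,} of {TOTAL_NAMESERVERS:,} nameservers ({share:.2f}%)")
```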
Summary
The Internet is increasingly centralised and the majority of WWW and MX records point to the same few places - in the case of WWW a large part of this is due to Cloudflare, which of course masks the true hosting location, but these hosts are still relying on Cloudflare:
- 5.93% of hosts use the same 7 IPv4 addresses - though this has actually decreased from 14.10% in 2010 - 9.74% were using Blogspot domains then;
- 31.71% were registered as either Cloudflare, Amazon, Confluence, or Google. In 2010, only 13.97% were resolving to the same 4 providers, and a 30% share was split across 19 providers;
- Of all of the 3,126,709 IP addresses collected (including for MX, NTP and NS), only 874,980 (27.98%) were unique;
- IPv6 availability is still quite low, with only 16.87% of hosts having AAAA records published (though up from 1.53% in 2010);
- 45.32% of all MX records pointed to Google, and 36.3% of all domains had at least one MX record pointing to either Google or Outlook - in 2010 this was 9.50% of domains;
- Over 90% of the IPv6 reachable MX records were Google;
- Only 7.93% of the 2,502,001 nameservers were unique (down from 15.34% in 2010), and over 40% of them pointed to one of three providers (up from 23% in 2010);
- 17.04% of the domains were using one of three providers for at least one nameserver, up from 6.45% in 2010 - when 17% was split across 24 providers.