Where do hackers come from?

If you have a server that’s accessible from the internet, chances are you’ll be dealing with unwanted hacking attempts. Sometimes they’ll be security researchers trying to warn you about the existence of some vulnerability. Sometimes, they want to check if you’re accidentally running an open relay. Sometimes they’re actually nefarious people. And sometimes they just want you to subscribe to PewDiePie. Where do all these people come from?

Background

Ever since the worldwide lockdown due to COVID-19 began, my home Raspberry Pi server has seen more brute-force SSH login attempts than usual. Fortunately for me, I’d previously set up SSHGuard, which monitors these login attempts and automatically adds firewall rules to block repeat offenders. This originally resulted in about 100 newly blocked addresses a week, but (possibly due to people suddenly having more time on their hands) this has grown to about 1000 a week. Since you can sort of map an IP address to a country, this means that I accidentally got myself a nice geographical data set.

Geographical analysis

The immediate question that came to mind is: where are my biggest fans located? With the following bash one-liner, we can analyse the blacklist and get the ten most-listed countries. You can place your bets, but none of it will be surprising:

$ cut -d\| -f4 < /var/db/sshguard/blacklist.db | xargs -n 1 geoiplookup | sort | uniq -c | sort -n | tail
CA, Canada
RU, Russian Federation
DE, Germany
KR, Korea, Republic of
SG, Singapore
BR, Brazil
IN, India
FR, France
US, United States
CN, China
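
For readers who would rather do the counting outside the shell, the same idea can be sketched in Python. This is only an illustration, not the code used for this post: it assumes each blacklist line is pipe-separated with the address in the fourth field (as the `cut -f4` above implies), and `lookup_country` is a hypothetical stand-in for whatever geolocation lookup you have available, for example a wrapper around `geoiplookup`.

```python
from collections import Counter

def top_countries(blacklist_lines, lookup_country, n=10):
    """Count blocked addresses per country and return the n most common."""
    counts = Counter()
    for line in blacklist_lines:
        fields = line.strip().split("|")
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Field 4 is assumed to hold the blocked address.
        counts[lookup_country(fields[3])] += 1
    return counts.most_common(n)

# Made-up sample data; the addresses come from documentation ranges
# and the country codes are placeholders.
sample = [
    "1587900000|100|4|192.0.2.1",
    "1587900060|100|4|192.0.2.2",
    "1587900120|100|4|198.51.100.7",
]
fake_geo = {"192.0.2.1": "AA", "192.0.2.2": "AA", "198.51.100.7": "BB"}
print(top_countries(sample, fake_geo.get))
```

Unlike the shell pipeline, which ends with the largest count at the bottom, `most_common` returns the list with the largest count first.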

Just listing these numbers doesn’t really communicate just how much these top countries dominate the total number of cases, so let’s look at it in a different way:

Distribution of IP-addresses by country

As is to be expected, there are a few dominating countries and then the numbers fall off quickly. China has a lot of people and a lot of internet users, so obviously they’re number one. The US has an insanely large address space, so they’re not unexpected either. Having Singapore in there is a bit weird though, and I don’t have a good explanation for it. Is this a fair distribution? Let’s look at it differently.

Normalizing for population

There are more people in China than in any other country, so it’s not unexpected that they’re at the top of the list. That does not tell you anything (useful) about the difference between China and the rest of the world. One thing that we can do is normalize for the number of internet users in each country.

This number is a bit fuzzy for various reasons, but we can make a guess at what it should be. Various organisations track the percentage of people using the internet, and the World Bank is nice enough to publish their data in a mostly readable format. If we combine that with the population data that the UN kindly publishes, we can compute what percentage of a particular country’s internet users I’ve blocked. Doing that, we get the following top 20:

Fraction of internet users banned by country
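
The normalization itself is a single division. A minimal sketch, assuming you already have per-country blocked counts, population figures and internet usage percentages loaded; the function name and the sample figures below are made up for illustration and are not the real data. Note that it treats every blocked address as one internet user, which is a simplification.

```python
def fraction_of_users_blocked(blocked, population, internet_pct):
    """Blocked addresses divided by the estimated number of internet users."""
    users = population * internet_pct / 100.0
    if users <= 0:
        return None  # no usable estimate for this country
    return blocked / users

# Made-up figures for two imaginary countries.
rows = {
    "AA": {"blocked": 50, "population": 1_000_000, "internet_pct": 50.0},
    "BB": {"blocked": 50, "population": 10_000_000, "internet_pct": 80.0},
}
ranking = sorted(
    rows,
    key=lambda c: fraction_of_users_blocked(**rows[c]),
    reverse=True,
)
print(ranking)
```

Both imaginary countries have the same number of blocked addresses, but the smaller one ranks higher once its user base is taken into account.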

Other than my being somewhat more popular in Singapore and among my own countrymen, this paints a similar picture to the previous overview. So instead, let’s not look at how many people we have blocked, but how many people we can block.

Normalizing by IP space

Not every country has an IP-address space proportional to its population. Nepal, for example, has a population of approximately 29 million who share about half a million IPv4 addresses between them. The United States, on the other hand, sports a population of 321 million with nearly one billion (a billion with nine zeroes, not twelve) IPv4 addresses. Or in other words, while a Nepalese person might have to be content with one sixtieth of an IPv4 address, an American gets three.
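
The arithmetic behind that comparison, using only the rounded figures quoted above (so the results are rough, and the variable names are just for illustration):

```python
# Rounded figures from the paragraph above.
nepal_addresses, nepal_population = 0.5e6, 29e6
us_addresses, us_population = 1e9, 321e6  # "nearly one billion", so an upper bound

nepal_per_capita = nepal_addresses / nepal_population
us_per_capita = us_addresses / us_population

print(f"Nepal: one address per {1 / nepal_per_capita:.0f} people")
print(f"US: {us_per_capita:.1f} addresses per person")
```

That works out to roughly one address per 58 people in Nepal, and a little over three addresses per person in the US.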

As with all fun statistics, these numbers are misleading. IP addresses are simply never handed out by population. Instead, the US keeps large swaths of its address range reserved for military purposes, but more importantly, IP ranges were sold like candy when the technology was still new. Quite a few large companies happen to have a /8 range (16 million addresses) to themselves for no reason other than that they asked for it. In general, countries that joined the network earlier on and were somewhat affluent at the time got more IP space. Regardless of whether it is a fair normalization, our top 20 now looks as follows:

Fraction of IP space banned by country

Singapore is still number one for some reason, but now the other top countries are places with fewer IP-addresses per capita. The US has completely disappeared as was to be expected. I don’t expect Apple to try to brute force my rPi’s password, and the US military address space also has better stuff to do.

Note: I have discarded any country with fewer than 5 incidents from this normalization, as it can randomly skew the data for countries with a small number of IP addresses.
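
A minimal sketch of this normalization, including the cut-off from the note above. It assumes the allocation data is available as a list of non-overlapping CIDR prefixes per country, which may not match the format of the real data set; the function names and sample values are hypothetical.

```python
import ipaddress

MIN_INCIDENTS = 5  # countries with fewer incidents are discarded

def allocated_addresses(prefixes):
    """Total number of addresses covered by non-overlapping CIDR prefixes."""
    return sum(ipaddress.ip_network(p).num_addresses for p in prefixes)

def fraction_of_space_blocked(blocked, allocations):
    """Map country -> blocked / allocated, skipping sparse countries."""
    result = {}
    for country, count in blocked.items():
        if count < MIN_INCIDENTS:
            continue  # too few incidents to be meaningful
        total = allocated_addresses(allocations.get(country, []))
        if total:
            result[country] = count / total
    return result

# Made-up allocations built from private and documentation ranges.
allocations = {
    "AA": ["10.0.0.0/8"],
    "BB": ["192.0.2.0/24"],
    "CC": ["198.51.100.0/24"],
}
blocked = {"AA": 16, "BB": 8, "CC": 2}
print(fraction_of_space_blocked(blocked, allocations))
```

In this toy data the imaginary country with a single /24 ends up far ahead of the one holding a /8, even though it has fewer blocked addresses, and the country with only two incidents is dropped entirely.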

Friends from Tor

Some IP-addresses are more suspect than others. When I started researching this post, I suspected that the Tor exit nodes would be well represented in the list. Tor, like any other anonymization system, is commonly used to both hide legitimate things from questionable authorities and questionable things from legitimate authorities. If I were on a crusade against every exposed SSH server on the entire internet, that’s what I’d use to hide my identity. You can imagine my surprise when not a single one of the known current exit nodes is on the list.
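
Checking for overlap is a plain set intersection. A sketch, assuming both lists have been reduced to one address per line; the helper name and the sample addresses are made up for illustration.

```python
def blocked_exit_nodes(blocked_ips, exit_node_ips):
    """Return the blocked addresses that also appear in the exit node list."""
    def normalise(ips):
        # Strip whitespace and drop empty lines before comparing.
        return {ip.strip() for ip in ips if ip.strip()}
    return normalise(blocked_ips) & normalise(exit_node_ips)

# Made-up addresses from documentation ranges.
blocked_ips = ["192.0.2.1", "192.0.2.2\n", "198.51.100.7"]
exit_node_ips = ["192.0.2.2", "203.0.113.9", ""]
print(blocked_exit_nodes(blocked_ips, exit_node_ips))
```

An empty result corresponds to the situation described above: no known exit node on the blacklist.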

As a heat map

Listings like the ones shown before don’t really do justice to just how many places these login attempts come from. There are 131 countries represented in the data set, which is a lot, given that there are approximately 200 countries in the world depending on who you ask. We can show this on a logarithmic heat map, and I think it’s beautiful:


My map unfortunately doesn’t show certain city states and smaller countries due to the limitations of my heat-mapping skills, but we can still get a decent picture of just how much of the world is present. You can hover over a country to see more details.
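
The logarithmic scale matters because a handful of countries dominate the counts; on a linear scale everything else would fade to the same pale colour. A minimal sketch of one way to compute such intensities, which is not necessarily how the map above was made. Using `log1p` (the logarithm of one plus the count) is my own choice here, so that a single incident still stands out from zero.

```python
import math

def log_intensity(counts):
    """Scale counts to 0..1 on a logarithmic axis; zero counts map to 0."""
    peak = max(counts.values(), default=0)
    if peak <= 0:
        return {country: 0.0 for country in counts}
    denom = math.log1p(peak)
    return {country: math.log1p(n) / denom for country, n in counts.items()}

# Made-up counts for three imaginary countries.
counts = {"AA": 1000, "BB": 10, "CC": 0}
intensity = log_intensity(counts)
for country, value in intensity.items():
    print(country, round(value, 2))
```

With these made-up counts, a country with 10 incidents gets about a third of the maximum intensity, rather than the 1% it would get on a linear scale.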

In conclusion

There is very little to conclude. This post has been a not-so-serious attempt at analysing where the IP addresses harassing my home server come from. I think it makes for pretty pictures, but that’s it. I will not publish the list of IP addresses for obvious reasons. If there is any takeaway from this article, it is that you should take some defensive measures if you expose a service to the internet. There are quite a few people out there trying to get in.

Disclaimer

This article is intentionally tongue-in-cheek and in no way accuses any particular country or people of computer crimes. IP-based geolocation is far from perfect, and the published IP range assignments don’t necessarily reflect how they are actually being used. Even when someone’s IP-address is used to abuse external services, it does not mean they are responsible for the abuse. Their computer may be infected with malware, or the IP-address may actually be used as a network proxy and the actual attacker will be somewhere else. All this is to say: don’t judge the people from any country based on the data I show here.

Sources used

IP location data from the GeoLite database created by MaxMind was used to determine the location of blacklisted IP addresses. This data was made available under the Open Data License.
Number of allocated IP addresses by country, accessed April 26th 2020.
Individuals using the internet (% of population), accessed April 27th 2020.
World Population Prospects, accessed April 27th 2020.
List of Tor exit nodes, accessed June 7th 2020.