Uncovering Trends in Email Breaches
Data breaches have unfortunately become a frequent occurrence. In the first 6 months of 2019 alone, there were more than 3,800 breaches exposing an estimated 4.1 billion records: a 50% increase over the past 4 years.
So what is a data breach? The GDPR defines it as “a breach of security leading to the accidental or unlawful destruction, loss, alteration, unauthorized disclosure of, or access to, personal data transmitted, stored or otherwise processed.” In other words, a breach is any incident in which personal data is lost, destroyed, altered, or exposed to people who aren’t authorized to access it.
As a small step towards making our data more secure, we set out to find which types of email addresses most commonly appear in data breaches. Are there choices you make when selecting an email address that might make you more or less likely to experience a breach? For instance, is a long or short email address more secure? What about including numbers or your first name in the address?
To explore these questions and more, we ran a random sample of 212,000 public email addresses through the free resource Have I Been Pwned (HIBP) to determine how many of them had been compromised in a data breach, and how many times.
HIBP is trusted by organizations like Mozilla and 1Password to verify breaches. We then analyzed the results for patterns by email provider and by the country domain each address belongs to. We also looked at more general questions, such as how the average number of breaches compares for long and short email addresses. Read on to learn more about what we uncovered.
Note: a more detailed methodology can be found at the bottom of the page.
Comparing Email Domains: .com, .org, .net, and More
First, we compared email addresses by domain to determine what percentage of the addresses on each domain had appeared in a breach. We found that .com addresses had both the highest percentage of addresses to appear in a breach (80%) and the highest average number of breaches (33).
The next-highest percentages of breached emails belonged to .uk addresses (63%) and .ca addresses (59%). Addresses on the .us domain had the lowest percentage, with 29% breached.
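The article doesn’t say exactly how addresses were assigned to a domain. A minimal sketch, assuming grouping by the last label of the domain (the top-level domain), might look like this; the function name `top_level_domain` is hypothetical. Taking the last label means an address at example.co.uk is counted under .uk rather than under .co, which is also Colombia’s country code.

```python
def top_level_domain(address: str) -> str:
    """Return the last dot-separated label of the domain, lower-cased.

    Hypothetical helper: "jane@example.co.uk" -> "uk", "sales@shop.co" -> "co".
    """
    # Everything after the last "@" is the domain.
    domain = address.rpartition("@")[2]
    # The top-level domain is the final label; ignore a trailing dot if present.
    return domain.rstrip(".").rsplit(".", 1)[-1].lower()


for addr in ["jane@example.com", "info@uni.ac.uk", "sales@shop.co"]:
    print(addr, "->", top_level_domain(addr))
```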
Comparing Email Providers: AOL, Gmail, Hotmail, MSN, and Yahoo
We looked at 5 of the largest email providers: AOL, Gmail, Hotmail, MSN, and Yahoo.
AOL addresses in the sample experienced the most breaches by far, with nearly all of them involved in at least one breach.
Note that AOL predates Gmail by almost 20 years, which may explain the difference: older addresses have simply had more time to turn up in breaches.
Gmail was the least affected of the 5, though about 3 in 4 Gmail addresses had still been involved in a breach.
How Often Various Country Domains Appeared in a Data Breach
We analyzed domains for 10 different countries: Australia (.au), Canada (.ca), China (.cn), Colombia (.co), Germany (.de), Ireland (.ie), New Zealand (.nz), Singapore (.sg), the United Kingdom (.uk), and the United States (.us).
Addresses on the German domain had the highest percentage of breached emails (64%), followed very closely by those on the United Kingdom domain (63%). Addresses on the Chinese domain had the lowest: only 8% had been breached, with an average of just 0.36 breaches per address.
Long vs Short Emails: Which Are More Likely to Appear in a Breach?
Next, we looked at the length of email addresses.
For background, the median email address length in the sample was 9 characters, with a standard deviation of 3.6, so we set the standard range of email address lengths at 5 to 12 characters, roughly one standard deviation either side of the median.
Email addresses with 1 to 4 characters were categorized as short, and those with 13 or more characters as long. Short email addresses were breached more frequently: 71% had appeared in a breach, compared with 62% of long ones.
Based on this sample, short email addresses appear more likely to be involved in a breach than long ones.
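The length buckets above can be sketched as follows. This assumes that “length” means the part of the address before the @; the article doesn’t say, but a median of 9 characters would be implausibly short for whole addresses including the domain, so this seems the likelier reading. The function names are hypothetical, and the thresholds are the ones stated above.

```python
import statistics


def local_part(address: str) -> str:
    # ASSUMPTION: "length" refers to the characters before the "@".
    return address.rpartition("@")[0]


def length_category(address: str) -> str:
    """Bucket an address using the article's thresholds:
    1-4 characters short, 5-12 standard, 13 or more long."""
    n = len(local_part(address))
    if n <= 4:
        return "short"
    if n <= 12:
        return "standard"
    return "long"


def length_summary(addresses: list[str]) -> tuple[float, float]:
    """Median and sample standard deviation of local-part lengths.
    (Whether the study used the sample or population deviation is not stated.)"""
    lengths = [len(local_part(a)) for a in addresses]
    return statistics.median(lengths), statistics.stdev(lengths)


print(length_category("jonathan.smith99@example.com"))
```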
Men’s vs Women’s Email: Which Are More Likely to Appear in a Breach?
What about men and women? Which gender’s email is more likely to appear in a breach?
About 55% of the emails in the sample included a first name, and we used lists of the top 1,000 men’s and women’s names to determine which category each fell into.
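The article doesn’t describe how a name was matched to an address or how names appearing on both lists were handled. One possible approach, sketched here under those assumptions, looks for any listed name inside the part of the address before the @, prefers the longest match (so “christina” beats “chris”), and labels an address ambiguous when the longest matches come from both lists. The tiny name sets stand in for the real top-1,000 lists, and `classify_by_first_name` is a hypothetical name.

```python
from typing import Optional, Set


def classify_by_first_name(
    address: str, male_names: Set[str], female_names: Set[str]
) -> Optional[str]:
    """Return "male", "female", "ambiguous", or None if no listed name is found.

    Hypothetical rules: names are lower-case substrings of the part before "@";
    the longest match wins, and a tie across both lists (or a name that is on
    both lists) is labeled "ambiguous".
    """
    local = address.rpartition("@")[0].lower()
    matches = [name for name in male_names | female_names if name in local]
    if not matches:
        return None
    longest = max(len(name) for name in matches)
    top = [name for name in matches if len(name) == longest]
    is_male = any(name in male_names for name in top)
    is_female = any(name in female_names for name in top)
    if is_male and is_female:
        return "ambiguous"
    return "male" if is_male else "female"


# Tiny stand-ins for the real top-1,000 lists.
MALE = {"chris", "john", "jordan"}
FEMALE = {"christina", "mary", "jordan"}
print(classify_by_first_name("christina.b@example.com", MALE, FEMALE))
```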
In our sample, email addresses containing a man’s name were slightly more likely to have been involved in a breach, but only by about 1 percentage point.
Similarly, the average number of breaches was almost the same for men’s and women’s email addresses: 26.31 for men and 25.45 for women.
The First Names That Have Appeared in the Most Data Breaches
Digging further into the email addresses that included names, we also identified the first names with the highest percentage of breached emails and the highest average number of breaches. The percentage of breached emails is the share of addresses containing a given name that have appeared in at least one breach.
The average number of breaches is the mean number of breaches per address containing that name. We calculated these figures for the most commonly recurring names (or portions of a longer name) in the sample.
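The two metrics defined above can be computed per group (a name, a domain, a provider) as in this minimal sketch; `breach_stats` is a hypothetical helper, and the input is assumed to be one (group, breach count) pair per address. The sketch averages over every address in the group, including those never breached. That reading matches the .cn figures reported earlier: with only 8% of addresses breached, an average of 0.36 is only possible if unbreached addresses count as zeros.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def breach_stats(records: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[float, float]]:
    """Map each group to (percentage breached, average number of breaches).

    records: one (group, breach_count) pair per email address.
    - percentage breached: share of the group's addresses with at least 1 breach
    - average breaches: mean breach_count over ALL addresses in the group,
      counting never-breached addresses as 0
    """
    by_group: Dict[str, List[int]] = defaultdict(list)
    for group, count in records:
        by_group[group].append(count)
    stats = {}
    for group, counts in by_group.items():
        breached = sum(1 for c in counts if c > 0)
        stats[group] = (100 * breached / len(counts), sum(counts) / len(counts))
    return stats


# Hypothetical input, not data from the study.
sample = [("alex", 0), ("alex", 3), ("alex", 5), ("sam", 0)]
print(breach_stats(sample))
```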
Email addresses containing the name “Angel” had the highest percentage of breached emails, at almost 95%. Those containing “Nehemiah” had the highest average number of breaches, at 180.
Note that because so many names were compared, some of these extremes may simply be products of chance (a form of statistical data dredging), so they should not be taken as evidence of any causal relationship.
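To see why extreme values turn up when many names are compared, the following simulation gives every name the same underlying breach rate and reports the highest rate observed for any single name. The parameters (500 names, a 70% true rate) are hypothetical and chosen only for illustration, and `max_observed_rate` is not part of the study.

```python
import random


def max_observed_rate(num_names: int, emails_per_name: int,
                      true_rate: float, seed: int = 0) -> float:
    """Give every simulated name the same true breach rate and return the
    highest breach percentage observed for any single name."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    best = 0.0
    for _ in range(num_names):
        # Each address is independently breached with probability true_rate.
        breached = sum(rng.random() < true_rate for _ in range(emails_per_name))
        best = max(best, breached / emails_per_name)
    return 100 * best


# Hypothetical parameters: 500 names, all with a 70% true breach rate.
print(max_observed_rate(500, 10, 0.7))    # few addresses per name
print(max_observed_rate(500, 1000, 0.7))  # many addresses per name
```

With only a handful of addresses per name, the top name will typically look far riskier than the shared 70% rate even though nothing distinguishes it; with a thousand addresses per name, the top value stays much closer to 70%.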
The Role-Based Emails Most Likely to Appear in a Breach
Email addresses like info@ or admin@ often serve as shared accounts for businesses, and several people within an organization typically access them on a regular basis. Does this make them more likely to appear in a breach?
Overall, 62% of emails containing one of these job titles or categories had appeared in a breach, compared with 70% of those that did not contain one.
Among role-based addresses, support@ and admin@ had the lowest percentage of breached emails, both below 25%, while admissions@ had the highest, with 90% breached.
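A minimal sketch of detecting role-based addresses, assuming an address counts as role-based only when the whole part before the @ exactly matches a role name. The article doesn’t give its full list of roles, so the set below contains only the examples mentioned in this section, and `role_prefix` is a hypothetical helper.

```python
from typing import Optional

# Only the examples named in this section; the study's full list isn't given.
ROLE_NAMES = frozenset({"info", "admin", "support", "admissions"})


def role_prefix(address: str, roles: frozenset = ROLE_NAMES) -> Optional[str]:
    """Return the role if the part before "@" exactly matches one, else None.

    ASSUMPTION: exact, case-insensitive match, so "jane.admin@..." is not
    treated as role-based.
    """
    local = address.rpartition("@")[0].lower()
    return local if local in roles else None


print(role_prefix("Admissions@example.edu"))
```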
The Presence of Numbers in an Email Address
Lastly, we explored how the presence of numbers in an email address relates to the percentage of emails breached. About 94% of emails containing numbers had been involved in a breach, compared with only 65% of those without any digits.
On a more fanciful note, are well-known number strings like 69, 420, and 123 associated with a higher likelihood of appearing in a breach? In our sample, they are: nearly all of the addresses containing one of these 3 strings had been involved in a breach, and the average number of breaches was above 40 for each.
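The two checks in this section (any digit at all, and the specific strings 69, 420, and 123) can be sketched as below. This assumes only the part before the @ is inspected, so digits in a domain are ignored, and that a string counts when it appears anywhere in that part (so “1234” contains “123”); the article doesn’t specify either detail, and the function names are hypothetical.

```python
import re
from typing import List

# Number strings singled out in this section.
NOTABLE_STRINGS = ("69", "420", "123")


def local_part(address: str) -> str:
    # ASSUMPTION: only the part before "@" is inspected; domain digits are ignored.
    return address.rpartition("@")[0]


def has_digit(address: str) -> bool:
    # [0-9] rather than str.isdigit(), which also accepts characters like "²".
    return re.search(r"[0-9]", local_part(address)) is not None


def notable_strings(address: str) -> List[str]:
    # Plain substring match, so "1234" counts as containing "123".
    local = local_part(address)
    return [s for s in NOTABLE_STRINGS if s in local]


print(has_digit("jane99@example.com"), notable_strings("jane123@example.com"))
```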
What does all of this mean for you? Numerous insights can be drawn from our study on how to craft the “safest” possible email address.
For one, the next time you set up an email account, you could avoid including numbers in the address, particularly common number strings like 123. We can’t say for sure, but based on the results of this study, doing so could help lower your chances of being involved in a data breach.
Methodology
To produce this report, we, Quality Nonsense Ltd (Company No: 05889123, 27 Mortimer Street, London, W1T 3BL), engaged B. Patt LLC (d/b/a Go Fish Digital) to collect a sample of 212,000 publicly available email addresses from a variety of commercial and government websites, including universities and businesses. The sole purpose of collecting and processing these email addresses was to prepare this report.
Data was processed in the US, but the study is published by an English company. Mindful of the GDPR, we undertook the project on the grounds of “legitimate interest.”
We knew from previous research that the majority of internet users are unaware of just how likely their email address is to appear in a data breach. Despite numerous warnings, user behavior has changed little over recent years.
For greater context, Homeland Security Secretary Kirstjen Nielsen recently cited sophisticated hacking and other cyber threats as a greater danger to the United States than physical attacks, according to The Washington Post.
We believe that the powerful insights from this study will convince internet users around the world to be more careful with how and where they share their email address, given the significant risks of appearing in a data breach. These wider societal benefits are in our, and indeed your, legitimate interest. We believe that showing real statistics is the only way to significantly change user behavior. The only way to do this is by processing real user data, as we have in this study.
As the only personal data we used in this study was publicly available email addresses, we don’t believe that any individual’s interests override the overall legitimate interest of conducting this study. A number of email addresses analyzed in this study were not personal data and, where such email addresses did constitute personal data, the results of the study do not identify any one individual over and above another. The email addresses are not particularly sensitive or private information and the results of the study sought to benefit rather than harm the individuals involved.
We used the Have I Been Pwned (HIBP) API to check how many of the email addresses had been compromised in a data breach, and how often.
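The study doesn’t say which version of the HIBP API it used. The sketch below assumes the current v3 `breachedaccount` endpoint, which requires an API key in the `hibp-api-key` header and a user agent, returns a JSON array of breaches for an address that has been breached, and returns HTTP 404 for one that hasn’t. Helper names are hypothetical, and the sketch doesn’t handle the API’s rate limiting (HTTP 429), which a real run over 212,000 addresses would need to respect.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# ASSUMPTION: current v3 endpoint; the study's API version is not stated.
API_BASE = "https://haveibeenpwned.com/api/v3/breachedaccount/"


def build_request(email: str, api_key: str,
                  user_agent: str = "breach-study-sketch") -> urllib.request.Request:
    # URL-encode the whole address (including "@") as one path segment.
    url = API_BASE + urllib.parse.quote(email, safe="")
    return urllib.request.Request(
        url, headers={"hibp-api-key": api_key, "user-agent": user_agent}
    )


def breach_count_from_response(status: int, body: bytes) -> int:
    # 404 means the address was not found in any breach.
    if status == 404:
        return 0
    # 200 returns a JSON array with one object per breach.
    if status == 200:
        return len(json.loads(body))
    raise ValueError(f"unexpected status {status}")


def fetch_breach_count(email: str, api_key: str) -> int:
    """Make one live API call; not run here because it needs a real key."""
    try:
        with urllib.request.urlopen(build_request(email, api_key)) as resp:
            return breach_count_from_response(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # urlopen raises on 404, so translate it back into "zero breaches".
        if err.code == 404:
            return 0
        raise
```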
No other third parties were involved.
No additional personal data (other than email addresses) was collected, stored, or processed in the study. After the study was completed, all email data was permanently deleted.
If you have questions about the methodology or the study in general, please contact us via our contact form. We’ll be happy to answer them.