What is my dataset?

When a HIPAA-covered organization experiences a data breach that affects 500 or more individuals, Section 13402(e) of the HITECH (Health Information Technology for Economic and Clinical Health) Act requires that entity to notify the Department of Health and Human Services. The Secretary makes this information available on a public web page. The data can be viewed either on the web page or in file form via a download here.

What was my motivation?

I chose this dataset because of how relevant data breaches are in today's news. Every other week or so, it seems like there's another hack, another phishing scam, or another leak of people's credit card and Social Security numbers. To me, this dataset seemed like a great look into just one subset of those data breaches: breaches in the field of healthcare.

Column Descriptions

Name                              Description
Name_of_Covered_Entity            Name of the entity experiencing the breach
State                             2-letter code of the state where the breach occurred
Individuals_Affected              Number of individuals whose records were compromised in the breach
Date_of_Breach                    Date or date range of the breach
Type_of_Breach                    Type of breach (e.g., "Theft", "Unauthorized Access/Disclosure", etc.)
Location_of_Breached_Information  Location from which the breach occurred (e.g., "Paper", "Laptop", etc.)
Date_Posted_or_Updated            Date the information was posted to the HHS database or last updated
breach_start                      Date of the start of the incident
breach_end                        Date of the end of the incident
Year                              Year of the incident
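
The breach_start, breach_end, and Year columns look like they are derived from Date_of_Breach. Here is a minimal sketch of how that derivation could work. It assumes the raw value is either a single "MM/DD/YYYY" date or two such dates separated by a hyphen; that format, along with the helper names parse_breach_dates and breach_year, is my assumption for illustration and not something the dataset documents.

```python
from datetime import date, datetime
from typing import Optional, Tuple

# Assumed date format, e.g. "10/16/2009". The real export may differ.
DATE_FORMAT = "%m/%d/%Y"


def parse_breach_dates(raw: str) -> Tuple[date, Optional[date]]:
    """Split a Date_of_Breach string into (breach_start, breach_end).

    breach_end is None when the string holds a single date
    rather than a range.
    """
    # Dates use slashes, so a hyphen can only be the range separator.
    parts = [p.strip() for p in raw.split("-")]
    start = datetime.strptime(parts[0], DATE_FORMAT).date()
    end = None
    if len(parts) > 1:
        end = datetime.strptime(parts[1], DATE_FORMAT).date()
    return start, end


def breach_year(raw: str) -> int:
    """Return the year of the incident, taken from the start date."""
    start, _ = parse_breach_dates(raw)
    return start.year
```

Taking Year from the start date is a design choice made for this sketch; a breach that spans New Year's could just as reasonably be filed under its end date.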

Tableau Prototype

d3 Visualization

Click on a state to see more info


Findings:

When I hear of a data breach, the first thing that comes to my mind is a hacker sitting in his basement, working to find vulnerabilities in a corporation's systems and exploiting them for his own gain. The story that this data tells, however, is very different. The number of people affected is huge; in one incident alone, almost 5 million people were affected. However, the most common location of breached data wasn't a laptop or some secure server. More often than not, data breaches happened when people left paperwork lying around, or when someone accidentally left a laptop on public transportation. One of the most common types of breaches was "Loss", and that's just the result of employee negligence, not some ace hacker. Data breaches are a real problem, and measures need to be put in place so that people's sensitive information isn't leaked due to one person's mistake.
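
Rankings like "most common type" and "most common location" come down to counting values in one column. The sketch below shows one way to do that tally. The function name tally is hypothetical, and the rows in sample_rows are made-up placeholders to show the shape of the data; they are not records from the HHS dataset.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def tally(rows: Iterable[Dict[str, str]], column: str) -> List[Tuple[str, int]]:
    """Count how often each value appears in a column, most common first.

    Rows with a missing or empty value in that column are skipped.
    """
    counts = Counter(row[column] for row in rows if row.get(column))
    return counts.most_common()


# Made-up placeholder rows, NOT real records from the dataset.
sample_rows = [
    {"Type_of_Breach": "Theft", "Location_of_Breached_Information": "Laptop"},
    {"Type_of_Breach": "Loss", "Location_of_Breached_Information": "Paper"},
    {"Type_of_Breach": "Loss", "Location_of_Breached_Information": "Paper"},
]

by_type = tally(sample_rows, "Type_of_Breach")
by_location = tally(sample_rows, "Location_of_Breached_Information")
```

With the real download, the rows could instead come from csv.DictReader in Python's standard library, which yields one dictionary per line keyed by column name.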

About Me

Hi, my name is Gilbert. I am currently a senior studying Computer Science at the University of San Francisco. I usually code in Java, and hope to be a backend developer coming out of college. As for hobbies, I watch a lot of TV shows, play tennis, and I enjoy going to the movies. Right now, I'm in the middle of Sense8 (a 10/10 show), and waiting for House of Cards to finally come out.