Why we need centralized breach notification

Let’s start with the basics. Data Breaches are common – and will continue to be the norm.

How the App Economy and Big Data ruined it

As we shifted towards the ‘App-Economy’ and ‘Big-Data’ (circa 3 years ago), consumers began sharing more data with more apps. Everyone and their granny wanted to create a new app, and everyone was told to collect as much data as possible. Then, because storage costs were low, they were encouraged to store as much data as they could first – and figure out how to use it later.

[Read more]

Gov.My TLS audit: Version 2.0

Last week I launched a draft of the Gov.my Audit, and this week we have version 2.0.

Here’s what changed:

  1. Added More Sites. We now scan a total of 1324 government websites, up from just 1180.
  2. Added Shodan Results. Results include both the open ports and the time of the Shodan scan (scary shit!)
  3. Added Site Title. Results now include the HTML title to give a better description of the site (hopefully!).
  4. Added Form Fields. If the page on the root directory has an input form, the names of the fields will appear in the results. This allows for a quick glance at which sites have forms, and (roughly!) what the form asks for (search vs. IC Numbers).
  5. Added Domain in the CSV. The CSV is sorted by hostname, to allow for grouping by domain names (e.g. view all sites from selangor.gov.my or perlis.gov.my)
  6. Added an API. Now you can query the API to get more info on the site, including the cert info and HTTP headers.
  7. Released the Serverless.yml files for you to build the API yourself as well :)
All in all, it's a pretty bad-ass project (if I do say so myself). So let's take all that one at a time.
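
To give a feel for the form-field item above, here is a minimal sketch of pulling field names out of a page using only Python's standard library. It is an illustration, not the audit's actual code: the class and function names are made up for this example, and it assumes the root page's HTML has already been downloaded.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collects the name attribute of inputs that sit inside a <form>."""

    def __init__(self):
        super().__init__()
        self.in_form = False
        self.fields = []

    def handle_starttag(self, tag, attrs):
        if tag == "form":
            self.in_form = True
        elif self.in_form and tag in ("input", "select", "textarea"):
            name = dict(attrs).get("name")
            if name:  # skip unnamed controls such as bare submit buttons
                self.fields.append(name)

    def handle_endtag(self, tag):
        if tag == "form":
            self.in_form = False


def form_field_names(html):
    """Return the field names found inside forms, in document order."""
    parser = FormFieldParser()
    parser.feed(html)
    return parser.fields
```

A page with a search box would come back as something like `["q"]`, while a page asking for personal details would show names that hint at what is being collected.
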
[Read more]

I scanned 1000 government sites, what I found will NOT shock you

Previously, I moaned about dermaorgan.gov.my, a site that was probably hacked but was still running without basic TLS. It is unacceptable that in 2018 we have government-run websites that ask for personal information running without TLS.

So I decided to check just how many .gov.my sites actually implemented TLS, and how many would start being labeled ’not secure’ by Google in July. That’s right, Google will start naming and shaming sites without TLS, so I wanted to give .gov.my sites the heads up!

Why check for TLS?

TLS doesn't guarantee a site is secure (nothing does!), but a site without TLS signals lack of care from the administrator. The absence of TLS is an indicator of just how lightly the security of the servers has been taken.

Simply put, TLS is necessary but not sufficient for security – and since it’s the easiest thing to detect without running intrusive network scans, it seems like the best place to start.

How I checked for TLS?

But first I needed a list of .gov.my sites.

To do that, I wrote a web-crawler that started with a few .gov.my links, fetched each page, and stored the links it found. It then repeated the process for the links, the links of the links… and so forth. After 3 iterations, I ended with 20,000 links from 3,000+ individual hostnames (a word I wrongly use in place of FQDN, but since the code uses hostnames, I’m sticking to it for now – please forgive me networking nerds).
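
A rough sketch of that kind of crawler, using only Python's standard library, might look like the following. This is not the code behind the audit: the function names, the 10-second timeout and the injectable `fetch_page` parameter are all assumptions made for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def fetch(url):
    """Download a page and return its body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(seeds, iterations=3, fetch_page=fetch):
    """Follow links breadth-first for a fixed number of rounds."""
    seen = set(seeds)
    frontier = list(seeds)
    for _ in range(iterations):
        next_frontier = []
        for url in frontier:
            try:
                html = fetch_page(url)
            except (OSError, ValueError):
                continue  # unreachable or malformed URL: skip it
            parser = LinkParser()
            parser.feed(html)
            for href in parser.links:
                link = urljoin(url, href)  # resolve relative links
                if urlparse(link).scheme in ("http", "https") and link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier  # the next round only visits new links
    return seen


def hostnames(links):
    """Reduce a set of links to the distinct hostnames they point at."""
    return {urlparse(link).hostname for link in links if urlparse(link).hostname}


def is_gov_host(hostname):
    """True for hostnames under .gov.my or .mil.my."""
    return hostname.endswith((".gov.my", ".mil.my"))
```

Each round only visits links discovered in the previous round, which is why a handful of seeds can balloon into thousands of links after just three iterations.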

I then manually filtered the hostnames to those from a .gov.my or .mil.my domain and scanned them for a few things:

  • Does it have an https website (if it doesn't redirect)
  • Does it redirect users from http to https
  • Does the https site have a valid certificate
    • Does it match the hostname
    • Does it have a fresh certificate (not expired)
    • Can the certificate be validated -- this requires all intermediary certs to be present
  • What is the IP of the site
  • What is the ASN of the IP
  • What are the Server & X-Powered-By headers returned by the host
Obviously, as I was coding this, my mind got distracted and I actually collected quite a bit more data, but those fields are in the csv for you to Excel the shit out of! The repository contains both a json and jsonl file that has more data.
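
The certificate checks in the list above can be approximated with Python's `ssl` module. The sketch below is an assumption about how such a check could be written, not the scanner's real code: the function names and result keys are hypothetical, and it skips the redirect, ASN and header checks entirely.

```python
import socket
import ssl
import time


def cert_is_fresh(cert, now=None):
    """True if `now` falls inside the certificate's validity window."""
    now = time.time() if now is None else now
    not_before = ssl.cert_time_to_seconds(cert["notBefore"])
    not_after = ssl.cert_time_to_seconds(cert["notAfter"])
    return not_before <= now <= not_after


def check_tls(hostname, port=443, timeout=5.0):
    """Try a verified TLS handshake and report what happened."""
    result = {"hostname": hostname, "ip": None, "https": False,
              "cert_valid": False, "error": None}
    # The default context verifies the chain, the expiry and the hostname.
    context = ssl.create_default_context()
    try:
        with socket.create_connection((hostname, port), timeout=timeout) as sock:
            result["ip"] = sock.getpeername()[0]
            with context.wrap_socket(sock, server_hostname=hostname) as tls:
                cert = tls.getpeercert()
                result["https"] = True
                result["cert_valid"] = cert_is_fresh(cert)
    except ssl.SSLCertVerificationError as exc:
        # The site speaks TLS, but the certificate failed validation
        # (wrong hostname, expired, or missing intermediates).
        result["https"] = True
        result["error"] = exc.verify_message
    except OSError as exc:
        # No listener on the port, DNS failure, timeout, or other TLS error.
        result["error"] = str(exc)
    return result
```

A missing intermediate certificate shows up here as a verification failure, which matches the point in the list: browsers may paper over it, but a strict client will not.
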

Now onto the results

[Read more]

Another Day, Another breach

220,000 is a lot of people. It’s the population of a small town like Taiping, and roughly twice the capacity of Bukit Jalil Stadium.

Yet today, a data breach of this size barely registers in the news-cycle. After all, the previous data breach was 200 times bigger, and occurred just 3 months ago. How could we take seriously something that occurs so frequently, and on a scale very few comprehend?

Individually, each breach is not particularly damaging; it’s a thin thread of data about victims, but they do add up. Criminals use multiple breaches to stitch together a fabric of the victim’s identity, eventually being able to forge credit card applications in their name, or to perform typical scams.

But if you’re thinking of avoiding being in a breach, that’s an impossible task. The only Malaysians that weren’t part of the telco breach were those without mobile phones. In the organ donor leak, the victims were kind-hearted souls who were innocent bystanders in the war between attackers and defenders on the internet.

The only specific advice that would work would be to not subscribe to mobile phone accounts and not pledge your organs. That is not useful advice.

I wanted this post to be about encouraging people to stop worrying about data breaches, and move on with their lives. To accept that the price of living in a hyper-connected world is that you’ll be a data breach victim every now and then – I wanted to demonstrate this by actually going out and pledging my organs to show that we shouldn’t be afraid.

But when I went to the Malaysian organ donation website (dermaorgan.gov.my), I was greeted by the all-too-common “Connection is Not Secure” warning. Which just made my head spin!

[Read more]

That long post about Data breaches (you never wanted to read!)

Part 1: An intro to Data Breaches

Let's start with some basics. What is a Data Breach?

According to Verizon, a data breach is when you’ve confirmed that data has been lost to an attacker, while a data incident is merely something that ‘may’ result in a breach.

An incident is when a laptop goes missing from your company’s office.

A breach is when the data on that laptop is published online.

[Read more]

Part 8: False prepaid registrations

Consider this a bonus piece from my long thoughts about data breaches. You might want to read the older post before reading this. So let’s dive in.

The telco breach was a giant hairball of issues, and one of the strands in the hairball is false prepaid registrations.

Immediately after releasing sayakenahack, people reported that they were seeing additional numbers linked to their MyKad numbers. From TheStar:

Malaysian Communications and Multimedia Commission (MCMC) network security and enforcement sector chief officer Zulkarnain Mohd Yassin said it would most likely be a case of other people using another person’s identity to register.

“We are serious about this. That’s why you see many compounds issued by the MCMC to service providers in respect of non-compliance with the guidelines of prepaid registrations,” he said.

He’s right, telcos have been issued summons for false registrations every year from 2014 to 2017, with Tune Talk chief executive officer Jason Lo telling Digital News Asia (DNA):

[Read more]

Writing Millions of rows into DynamoDB

While designing sayakenahack, the biggest problem I faced was trying to write millions of rows efficiently into DynamoDB. I slowly worked my way up from 100 rows/second to around the 1500 rows/second range, and here’s how I got there.

Work with Batch Write Item

The first mistake I made was a data modelling error. Sayakenahack was supposed to take a single field (IC Number) and return all the phone numbers in the breach linked to it. So I initially modeled the phone numbers as an array within an item (what you'd call a row in regular DB speak).

Strictly speaking this is fine: DynamoDB has an update command that lets you update an existing item or insert a new one. The problem is that you can’t batch update commands; each one can only update/insert one item at a time.

Running a script that updated one row in DynamoDB (at a time) was painfully slow: around 100 items/second on my machine. Even when I copied that script to an EC2 instance in the same datacenter as the DynamoDB table, I got no more than 150 items/second.

Even at 150 items/second, a 10 million row file would take over 18 hours to insert. That wasn’t very efficient.

So I destroyed the old paradigm, and re-built.

Instead of phone numbers being arrays within an item, phone numbers were the item itself. I kept IC Number as the partition key (which isn’t what Amazon recommends), which allowed me to query for an IC Number and get an array of items.

This allowed me to use DynamoDB’s batch_write_item functionality, which does up to 25 requests at once (up to a maximum of 16MB). Since my items weren’t anywhere near 16MB, I would theoretically get a 25 fold increase in speed.

In practice though, I got ‘just’ a 10 fold increase, allowing me to write 1000 items/second, instead of 100. This meant I could push through a 10 million row file in under 3 hours.
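
The hours quoted here are plain division, and a few lines of Python make it easy to check them for any throughput. The function name is just for illustration.

```python
def hours_to_insert(rows, rows_per_second):
    """Wall-clock hours needed to write `rows` at a steady rate."""
    return rows / rows_per_second / 3600


for rate in (100, 150, 1000, 1500):
    hours = hours_to_insert(10_000_000, rate)
    print(f"{rate:>5} rows/s -> {hours:.1f} hours")
```

For a 10 million row file that works out to roughly 27.8 hours at 100 rows/second, 18.5 hours at 150, 2.8 hours at 1000 and 1.9 hours at 1500.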

First rule of thumb when trying to write lots of rows into DynamoDB – make sure the data is modeled so that you can batch insert, anything else is painfully slow.
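
As a hedged sketch of what batching can look like, the code below groups rows into chunks of 25 and sends each chunk through a `batch_write_item` call, retrying whatever DynamoDB reports back as unprocessed. It assumes a boto3 low-level DynamoDB client is passed in as `client`; the attribute names `ic_number` and `phone`, and the idea that the phone number is the sort key, are hypothetical and not taken from the sayakenahack code.

```python
from itertools import islice

BATCH_LIMIT = 25  # BatchWriteItem accepts at most 25 put/delete requests


def chunked(rows, size=BATCH_LIMIT):
    """Yield successive lists of at most `size` rows."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def to_put_request(ic_number, phone):
    """One item per phone number, keyed by IC number (names are hypothetical)."""
    return {"PutRequest": {"Item": {"ic_number": {"S": ic_number},
                                    "phone": {"S": phone}}}}


def write_rows(client, table_name, rows):
    """Write (ic_number, phone) pairs in batches and return the count written."""
    written = 0
    for batch in chunked(rows):
        pending = {table_name: [to_put_request(ic, phone) for ic, phone in batch]}
        while pending:
            response = client.batch_write_item(RequestItems=pending)
            unprocessed = response.get("UnprocessedItems", {})
            written += len(pending[table_name]) - len(unprocessed.get(table_name, []))
            pending = unprocessed  # retry only what DynamoDB did not accept
    return written


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   write_rows(boto3.client("dynamodb"), "my-table", [("900101-01-1234", "0123456789")])
```

A real loader would also want a backoff between retries, since unprocessed items usually mean the table is being throttled.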

[Read more]

Identity in a Post-Breach world (draft)

Posting this here first, my thoughts to follow. Random thoughts below are draft :).

Random thoughts on the matter

  1. We still need a single identifier in Malaysia (IC Number); this is an administrative necessity. LHDN needs to check your bank accounts, the Election Commission needs to know you're not double-voting, etc.
  2. But that single identifier should not be used as an authenticator. No one should ask me for my IC number as a means of authenticating myself. When I call the bank, they shouldn't be asking me for my IC number as a means of proving my identity to them.
  3. There's too much info in the IC number (age, state, gender). Take all of that out, and replace it with a random blob of characters that cannot be guessed. Something like 8 numbers and 4 letters would be large enough that criminals can't guess it.
  4. We need 'identity-freezes' in Malaysia. In America you can freeze your credit, but in Malaysia we need to go a step further and put an identity freeze in place, especially for internet services.
  5. Check out Section 114A of the Evidence Act: wrongly registered phone numbers are a thing, and they're bad. If someone registered a pre-paid account in your name, and posted something bad -- you'd be in trouble.
  6. If you took a loan from Maybank to buy a house, and 1 year later defaulted on the loan, no other bank in the country would grant you a loan. This protects the banks from issuing credit to someone who can't pay back. So we have credit freezes.
  7. Let's use the same mechanism to allow people to lock their identities, so no one can open bank accounts, telco accounts, not even Astro, TNB or Indah Water as long as the identity is locked. This way, the value of a stolen identity is tremendously reduced, and we protect breached victims.
  8. Identities can be 'unfrozen', e.g. when you buy a house, and then 're-frozen' shortly after.
  9. More thoughts to come.....
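
To put a number on the "8 numbers and 4 letters" idea from the list above, here is a small sketch that generates such an identifier with Python's `secrets` module and counts how many are possible. The layout (digits first, then uppercase letters) and the function name are assumptions made for this illustration, not a proposed standard.

```python
import secrets
import string


def random_identifier(digits=8, letters=4):
    """Return a random ID carrying no age, state or gender information."""
    number_part = "".join(secrets.choice(string.digits) for _ in range(digits))
    letter_part = "".join(secrets.choice(string.ascii_uppercase) for _ in range(letters))
    return number_part + letter_part


# Number of distinct identifiers with this fixed layout.
SEARCH_SPACE = 10**8 * 26**4

print(random_identifier())
print(f"{SEARCH_SPACE:,} possible identifiers")
```

With this layout there are 45,697,600,000,000 possible identifiers, so a number handed out at random gives an attacker nothing to infer from a person's birthday or home state.
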
[Read more]

Sayakenahack: Epilogue

I keep this blog to help me think, and over the past week, the only thing I’ve been thinking about was sayakenahack.

I’ve declined a dozen interviews, partly because I was afraid to talk about it, and partly because my thoughts weren’t in the right place. I needed time to re-group, re-think, and ponder.

This blog post is the outcome of that ‘reflective’ period.

The PR folks tell me to strike while the iron is hot, but you know – biar lambat asal selamat.

Why I started sayakenahack?

I'm one part geek and one part engineer. I see a problem and my mind races to build a solution. Building sayakenahack, while difficult, and sometimes frustrating, was super-duper fun. I don't regret it for a moment, regardless of the sleepless nights it has caused me.

But that’s not the only reason.

I also built it to give Malaysians a chance to check whether they’ve been breached. I believe this is your right, and no one should withhold it from you. I also know that most Malaysians have no chance of ever checking the breach data themselves because they lack the necessary skills.

I know this, because 400,000 users have visited my post on “How to change your Unifi Password”.

400,000!!!

If they need my help to change a Wifi password, they’ve got no chance of finding the hacker forums, downloading the data, fixing the corrupted zip, and then searching for their details in a file that is 10 million rows long – and no, Excel won’t fit 10mln rows.

So for at least 400,000 Malaysians, most of whom would have had their data leaked, there would have been zero chance of them ever finding out. ZERO!

The ’normal’ world is highly tech-illiterate (I’ve even talked about it on BFM). Sayakenahack was my attempt to make this accessible to common folks. To deny them this right of checking their data is just wrong.

But why tell them at all if there’s nothing they can do about it? You can’t put the genie back in the lamp.

[Read more]

Sayakenahack architecture

I know the picture is a bit hard to read, but I wanted to make sure I had a detailed enough picture to understand the ‘innards’ of sayakenahack. Sometimes when you’re building stuff on the fly, and bottom-up, it’s good to take a step back, and have a top-down view.

I’ll be expanding this post over time; I wanted to get my thoughts down quickly on paper before I moved on.

[Read more]