Posts for: #Keith's Favorite Post

Access Keys in AWS Lambda

Let's look at AWS Access Keys inside a Lambda function, from how they are populated into the function's execution context, how long they last, how to exfiltrate them out and use them, and how we might detect an compromised access keys.

But before that, let's go through some basics. Lambda functions run on Firecracker, a microVM technology developed by Amazon. MicroVMs are like docker containers, but provide VM level isolation between instances. But because we're not going to cover container breakouts here, for the purpose of this post we'll use the term container to refer to these microVMs.

Anyway...

Lambda constantly spins up containers to respond to events, such as http calls via API Gateway, a file landing in an S3 bucket, or even an invoke command executed from your aws-cli.

These containers interact with AWS services in the same exact way as any code in EC2, Fargate or even your local machine -- i.e. they use a version of the AWS SDK (e.g. boto3) and authenticate with IAM access keys. There isn't any magic here, it's just with serverless we can remain blissfully ignorant of the underlying mechanism.

But occasionally it's a good idea to dig deep and try to understand what goes on under the hood, and that's what this post seeks to do.

So where in the container are the access keys stored? Well, we know that AWS SDKs reference credentials in 3 places:

  • Environment Variables
  • The ~/.aws/credentials file
  • The Instance Metadata Service (IMDS)

If we check, we'll find that our IAM access keys for lambda functions are stored in the environment variables of the execution context, namely:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_SESSION_TOKEN

You can easily verify this, by printing out those environment variables in your runtime (e.g. $AWS_ACCESS_KEY_ID) and see for yourself.

OK, now we know where the access stored keys are stored, but how did they end up here and what kind of access keys are they? For that, we need to look at the life-cycle of a Lambda function...

[]

Contact Tracing Apps: they’re OK.

I thought I'd write down my thoughts on contact tracing apps, especially since a recent BFM suggested 53% of Malaysians wouldn't download a contact tracing app due to privacy concerns. It's important for us to address this, as I firmly believe, that contact tracing is an important weapon in our arsenal against COVID-19, and having 54% of Malaysians dismiss outright is concerning.

But first, let's understand what Privacy is.

Privacy is Contextual

Privacy isn't secrecy. Secrecy is not telling anyone, but privacy is about having control over who you tell and in what context.

For example, if you met someone for the first time, at a friends birthday party, it would be completely rude and unacceptable to ask questions like:

  • What's your weight?
  • What's your last drawn salary?
  • What's your age?

In that context you're unlikely to find someone who will answer these questions truthfully.

But...

Age and weight, are perfectly acceptable questions for a Doctor to ask you at a medical appointment, and your last drawn salary is something any company looking to hire you will ask. We've come to accept these questions as OK -- under these contexts.

You might still not want to answer them, which might mean you don't get the job, or the best healthcare -- but you certainly can't be concerned by them. Far more people will answer these same questions truthfully if you change the context from random stranger at a party to doctors appointment.

So privacy is contextual, to justify concerns we have to evaluate both the context and the question before coming to a conclusion.

So let's look at both, starting with the context:

[]

My experience with AWS Certified Security - Specialty

Last week I took the AWS Certified Security - Specialty exam -- and I passed with a score of 930 (Woohoo!!)

In this post I cover why I took it, what I did to pass, my overall exam experience, and some tips I learnt along the way.

So let's go.

Why?

Why would anybody pay good money, subject themselves to hours of studying, only to end up sitting in a cold exam room for hours answering many multiple choice questions!

And the reward for that work is an unsigned PDF file claiming you're 'certified', and 'privilege' access to buy AWS branded notebooks and water bottles!! Unless those water bottles come with a reserved instance for Microsoft SQL server in Bahrain, I'm not interested.

But, jokes asides, I did this for fun and profit, and fortunately I really did enjoy the preparing for this exam. It exposed me to AWS services that I barely knew -- and forced me to level-up my skills even on those that I knew.

The exam has a massive focus on VPC, KMS, IAM, S3, EC2, Cloudtrail and Cloudwatch. While lightly touching Guardduty, Macie, Config, Inspector, Lambda, Cloudfront, WAF, System Manager and AWS Shield.

You need to catch you breath just reading through that list!

But for those diligently keeping count -- you'd notice that the majority of those services are serverless -- meaning the exam combined my two technological love-affairs ... security and serverless!

I wasn't lying when I said it was fun. So what about the profit.

I'm not sure how good this would be for my career (I literally got the cert last week), but for $300, it's is relatively cheap, with a tonne of practical value. So trying to get an ROI on this, isn't going to be hard.

For comparison, the CCSP certification cost nearly twice as much, is highly theoretical and requires professional experience.

The results also help me validate my past years of working on serverless projects, proving I wasn't just some rando posting useless hobby projects on GitHub. Instead, I'm now a certified AWS professional, posting useless hobby projects on GitHub (it's all about how you market it!)

So now that we've covered the why, let's move onto how.

[]

The problem with Grab

As a company, Grab has done enormously well for itself, and naturally will be the target of some hate.

But I think there's a deeper issue with Grab that needs addressing before it becomes an unsolvable problem.

Grab is a win-win

Let's start with what makes Grab so appealing.

Grab (at least in my mind) is the highest paying hourly wage job in the country. As long as you possess a car, and a valid driving license you can be a Grab driver, earning significantly more than any other hourly wage job.

According to this WOB article (which looks suspiciously like a paid ad), the average Grab driver earns RM5,000 per month, which is crazy money for a unskilled job -- and yes driving Grab is unskilled labour.

For unskilled work in Malaysia, earning RM5,000 per month is a god-send, after all even graduate employees don't earn that much. And like all hourly wage jobs, the more hours you put in, the more money they make -- 5,000 is just where it starts

So this seems like a win-win for everyone, drivers get to earn, and at the same time provide a service that is in high demand.

And in truth, Grab is a win-win -- at least for now.

Fast-forward

The problem is that when you fast-forward 10 years, or just 2 elections from now.

Most Grab drivers I've met aren't doing this part-time. They're driving as a full-time job, and they're putting in serious hours (10-12 a day) to make serious money. That means they've no time or to up-skill themselves, because every hour learning a new skill is an hour they could have been driving.

The cost of learning to them is a double-whammy, first they spend on acquiring the new skill (like everybody else), but also the lose income from their not driving. This for most, will be too high a price to pay.

You might argue that driving isn't un-skilled. But all it takes to be a Grab driver is a driving license and a car, skills don't factor into this. Grab doesn't care if you're a PhD, diploma holder or SPM drop-out, it'll pay the same.

Grab views all of it's drivers as a supplier of the one commodity it needs -- cars to move passengers. The only time Grab pays more to drivers is when they turn on the auto-accept feature, because that makes their algorithm more efficient. The more subservient you are to the algorithm, the better it will reward you -- that is a pretty nasty feeling.

So as more folks join the Grab band-wagon, we're sucking out skilled labour from the job-market. Leaving the entire country, as a whole, worse off in terms of competitiveness. But we're just getting started.

[]

Here’s one thing that’s already changed post GE14

In 2015, I was invited to a variety program on Astro to talk about cybersecurity.

This was just after Malaysian Airlines (MAS) had their DNS hijacked, but I was specifically told by the producer that I could NOT talk about the MAS hack, because MAS was a government linked company, and they couldn’t talk bad about GLCs.

Then half-way through the interview they asked me about government intervention, and I said something to the effect of “Governments are part of the problem and should refrain from censoring the internet”, that sound-bite never made it to TV because it was censored.

[]

Gov TLS Audit : Architecture

Last Month, I embarked on a new project called GovTLS Audit, a simple(ish) program that would scan 1000+ government websites to check for their TLS implementation. The code would go through a list of hostnames, and scan each host for TLS implementation details like redirection properties, certificate details, http headers, even stiching together Shodan results into a single comprehensive data record. That record would inserted into a DynamoDB, and exposed via a rest endpoint.

Initially I ran the scans manually Sunday night, and then uploaded the output files to S3 Buckets, and ran the scripts to insert them into the DB.

But 2 weeks ago, I decided to Automate the Process, and the architecture of this simple project is complete(ish!). Nothing is ever complete, but this is a good checkpoint, for me to begin documenting the architecture of GovTLS Audit (sometimes called siteaudit), and for me to share.

What is GovTLS Audit

First let's talk about what GovTLS Audit is -- it's a Python Script that scans a list of sites on the internet, and stores the results in 3 different files, a CSV file (for human consumption), a JSONL file (for insertion into DynamoDB) and a JSON file (for other programmatic access).

A different script then reads in the JSONL file and loads each row into database (DynamoDB), and then uploads the 3 files as one zip to an S3 bucket.

On the ‘server-side’ there are 3 lambda functions, all connected to an API Gateway Endpoint Resource.

  • One that Queries the latest details for a site [/siteDetails]
  • One that Queries the historical summaries for the site [/siteHistory]
  • One that List all scan (zip files) in the S3 Bucket [/listScans]
Finally there's a separate S3 bucket to serve the 'website', but that's just a simple html file with some javascript to list all scan files available for download. In the End, it looks something like this (click to enlarge):

[]

Read this before GE14

Let’s start this post the same way I start my day – by looking at Facebook.

Facebook made $40 Billion dollars in revenue in 2017, solely from advertising to pure schmucks like you. The mantra among the more technically literate is that facebook doesn’t have users it has products that it sells to advertisers, it just so happens that all its products are homo-sapien smart-phone totting urbanites (just like you!)

The platforms meteoric rise from nobody to top-dog, is a dream-story in Silicon Valley, but underneath the veneer of wholesome innovation lies a darker secret, one that could be responsible for the polarization of entire communities, including our own. And it’s all because of their most valuable employee.

No, not Mark Zuckerberg, but the real genius behind the blue and white site. The one responsible for billions of ad revenue facebook generates yearly, and unsurprisingly she’s female.

Anna Lytica and Machine Learning

There's probably thousands of post your facebook friends make everyday, but she decides which 3 to fit onto your smartphone screen first, and the next 3 and so forth. From the millions of videos shared every hour, she painstakingly picks the few you'd see in your timeline, she decides which ads to show you, and which advertisers to sell you too, underneath the hood in the giant ad behemoth, she lies working all day, everyday.

She isn’t a person, ‘she’ is an algorithm, a complex program that does billions of calculations a second, and for this post we’ll give her the name… Anna Lytica.

Facebook doesn’t talk about her much, she is after all a trade secret (sort of), but what she does and how she does it, might be as much a mystery to us, as it is to Mr. Zuckerberg. Machine Learning algorithms are complex things, we know how to build them, and train them, but how they actually work is sometimes beyond our understanding.

Google can train Alpha-Go to play a game, but how it makes decisions is unknown to Google and even itself – it just IS a Go player.And it is really sad, when we watch these AI algorithms make amazing discoveries, but are unable to explain their rationale to us mere humans. It’s the reason why Watson, IBMs big AI algorithm, hasn’t taken off in healthcare, there’s no point recommending a treatment for cancer, if the algorithm can’t explain why it chose the treatment in the first place.

This is hard to grasp, but AI isn’t just a ‘very powerful’ program, AI is something else entirely. We don’t even use traditional words like write or build to refer to the process of creating them (like we do regular programs), instead we use the word train.

We train an algorithm to play Go, to drive, or to treat cancer. We do this the same way we breed dogs, we pick specimens with the traits we want, and breed them till we end up with a something that matches our desires. How a dog works, and what a dog thinks is irrelevant. If we want them big, we simply breed the biggest specimens, the process is focused entirely on outcome.

Similarly, how the algorithm behaves is driven by what it was trained to do. How it works is irrelevant, all that matters is outcome. Can it play Go, can it drive, can it answer jeopardy? If you want to understand an algorithm you need to know what it was trained to do.

Anna Lytica, was trained to keep you browsing Facebook, after all the companies other endeavors like internet.org, and instant articles were built with the same intention. And while good ol’ Mark stated that he’s tweaking Anna to reduce the time people spend on Facebook, this is something new, an exception to the years Facebook tweaked her to keep you on their site.

After all the average monthly user spends 27 minutes per day in the app, and if you go by daily users, they spend about 41 minutes per day on Facebook. If that’s the end-result of tweaking Anna to ensure we spend less time on Facebook – God help us all!

And while it’s difficult to understand how Anna works, its very easy to guess how she’ll behave. If the end result of Anna’s training is to keep you browsing Facebook, then human psychology reveals a simple trait all humans share – confirmation bias.

[]

Gov.My TLS audit: Version 2.0

Last week I launched a draft of the Gov.my Audit, and this week we have version 2.0

Here’s what changed:

  1. Added More Sites. We now scan a total of 1324 government websites, up from just 1180.
  2. Added Shodan Results. Results includes both the open ports and time of the Shodan scan (scary shit!)
  3. Added Site Title. Results now include the HTML title to give a better description of the site (hopefully!).
  4. Added Form Fields. If the page on the root directory has an input form, the names of the fields will appear in the results. This allows for a quick glance at which sites have forms, and (roughly!) what the form ask for (search vs. IC Numbers).
  5. Added Domain in the CSV. The CSV is sorted by hostname, to allow for grouping by domain names (e.g. view all sites from selangor.gov.my or perlis.gov.my)
  6. Added an API. Now you can query the API can get more info on the site, including the cert info and HTTP headers.
  7. Released the Serverless.yml files for you to build the API yourself as well :)
All in all, it's a pretty bad-ass project (if I do say so myself). So let's take all that one at a time.
[]

Sayakenahack: Epilogue

I keep this blog to help me think, and over the past week, the only thing I’ve been thinking about, was sayakenahack.

I’ve declined a dozen interviews, partly because I was afraid to talk about it, and partly because my thoughts weren’t in the right place. I needed time to re-group, re-think, and ponder.

This blog post is the outcome of that ‘reflective’ period.

The PR folks tell me to strike while the iron is hot, but you know – biar lambat asal selamat.

Why I started sayakenahack?

I'm one part geek and one part engineer. I see a problem and my mind races to build a solution. Building sayakenahack, while difficult, and sometimes frustrating, was super-duper fun. I don't regret it for a moment, regardless of the sleepless nights it has caused me.

But that’s not the only reason.

I also built it to give Malaysians a chance to check whether they’ve been breached. I believe this is your right, and no one should withhold it from you. I also know that most Malaysians have no chance of ever checking the breach data themselves because they lack the necessary skills.

I know this, because 400,000 users have visited my post on “How to change your Unifi Password”.

400,000!!!

If they need my help to change a Wifi password, they’ve got no chance of finding the hacker forums, downloading the data, fixing the corrupted zip, and then searching for their details in file that is 10 million rows long – and no, Excel won’t fit 10mln rows.

So for at least 400,000 Malaysians, most of whom would have had their data leaked, there would have been zero chance of them ever finding out. ZERO!

The ’normal’ world is highly tech-illiterate (I’ve even talked about it on BFM).  Sayakenahack was my attempt to make this accessible to common folks. To deny them this right of checking their data is just wrong.

But why tell them at all if there’s nothing they can do about it? You can’t put the genie back in the lamp.

[]

Why does SayaKenaHack have dummy data?

Why does sayakenahack have dummy data? If I enter “123456” and “112233445566” I still get results.

I was struggling with answering this question, as some folks have used it to ‘prove’ that I was a phisher. We’ll get to that later, for now I hope to answer why these ‘fake’ IC numbers exist in the sayakenahack.

Firstly, I couldn’t find a good enough way to validate IC numbers as I was inserting them into the database. Most of you think that IC numbers follow a pre-define pattern :

  • 6-digit birthday (yymmdd format)
  • 2-digit state code
  • 4-digit personal identifier, where the last digit is odd for men, and even for women.
But, there are still folks with old IC numbers, and the army have their own format. Not to mention that the IC Number field  can be populated by passport numbers (for foreigners) and Company registration IDs. So instead of cracking my head on how to validate IC numbers, I decided to pass them all in.

The only ’transformation’ I do is to strip them of all non-AlphaNumeric characters and uppercasing any letters in the result. This would standardize the IC numbers in the database, regardless of source file format.

Had I done some validation, I might have removed these dummy entries – but fortunately I didn’t.

Upon further analyzing the data, I went back to the original source files and notice something strange, the account numbers belonged to some strange names. And then it made sense – this was Test data.

Test data in a Production Environment to be exact.

And when the Database for the telco was dumped, the telco’s didn’t remove these test accounts from their system. So what we have is a bunch of dummy accounts, with dummy IC numbers.

[]