A lambda function is like a little island, surrounded by network. Unlike Fargate containers or EC2 instances, Lambda functions do not have EFS, EBS or some other fast storage attached. Everything that goes into a lambda goes in via the network interface (and the network only).
So everything going in and out of a lambda has to traverse that network ‘moat’. And because Lambdas are ephemeral, with no long-term storage, everything of value must be exfiltrated out of the function’s execution context and onto something else (like S3).
This is easy for HTTP requests or messages via SQS/SNS, but when dealing with files, the common tactic is to store them in /tmp for reading or processing.
But a lesser-known technique bypasses the need to store anything in the lambda’s /tmp directory. Instead, it uses Python’s built-in tempfile module to create temporary files in memory, so the files can be read and processed in place.
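What follows is a minimal sketch of the idea, not the original snippet. It assumes the file lives in S3 and that `s3_client` is a boto3 S3 client (for example, one created with `boto3.client("s3")`). The function name `uppercase_object`, its parameters, and the choice of `tempfile.SpooledTemporaryFile` are illustrative assumptions.

```python
import io
import tempfile


def uppercase_object(s3_client, bucket, key, out_key, max_size=100 * 1024 * 1024):
    """Download a text object, uppercase it, and upload the result.

    Hypothetical helper: `s3_client` is assumed to expose boto3's
    `download_fileobj` and `upload_fileobj` methods.
    """
    # SpooledTemporaryFile buffers in memory until `max_size` bytes are
    # written; beyond that it rolls over to a real file on disk.
    with tempfile.SpooledTemporaryFile(max_size=max_size) as src:
        s3_client.download_fileobj(Bucket=bucket, Key=key, Fileobj=src)
        src.seek(0)  # rewind: the download leaves the cursor at the end
        lines = src.read().decode("utf-8").splitlines()

    # Build the output in an in-memory bytes buffer and upload it directly.
    result = io.BytesIO("\n".join(line.upper() for line in lines).encode("utf-8"))
    s3_client.upload_fileobj(Fileobj=result, Bucket=bucket, Key=out_key)
    return len(lines)
```

Inside a handler this might be called as `uppercase_object(boto3.client("s3"), "my-bucket", "in.txt", "out.txt")`, where the bucket and key names are placeholders. Keeping `max_size` comfortably below the function's configured memory is the main thing to watch, since the decoded copy of the data also lives in memory.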
This bypasses the need for /tmp, and the limitations of the directory’s size (512MB by default). Since the file is loaded into memory, you get a larger capacity (though not by as much as you think).
It does incur some additional complexity as well, because I’m not entirely comfortable coding with io.BytesIO and io.StringIO, but generally speaking this does make your architecture neater at the expense of a couple of lines of ‘not-so-straightforward’ code.
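For a feel of what that ‘not-so-straightforward’ code tends to look like, here is a small sketch of the two buffer types working together. The helper name `rows_to_bytes` is hypothetical. `io.StringIO` holds text, `io.BytesIO` holds bytes, and both keep a cursor that usually needs rewinding before the buffer is read again.

```python
import csv
import io


def rows_to_bytes(rows):
    """Serialize rows to CSV in memory and return a bytes buffer ready for upload."""
    # csv.writer produces text, so it needs a text buffer (StringIO).
    # newline="" leaves the writer's own line endings untouched.
    text_buffer = io.StringIO(newline="")
    csv.writer(text_buffer).writerows(rows)

    # Upload APIs that take a file object generally want bytes, so encode
    # the text and wrap it in a BytesIO. A buffer created from initial
    # bytes starts with its cursor at position 0.
    return io.BytesIO(text_buffer.getvalue().encode("utf-8"))
```

Reading the returned buffer once consumes it: a second `read()` returns empty bytes until `seek(0)` is called, which is the most common surprise when passing these objects between functions.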