A Lambda function is like a little island, surrounded by network. Unlike Fargate containers or EC2 instances, it doesn't get EBS volumes or other fast local storage. Everything that goes into a Lambda goes in via the network interface (and the network only).
And since Lambdas are ephemeral, everything going in and out of the function has to traverse that network ‘moat’. Because they have no long-term storage, everything of value must be exfiltrated out of the function’s execution context and onto something else (like S3).
This is easy for HTTP requests or messages via SQS/SNS, but when dealing with files, the common tactic is to store them in /tmp for reading or processing.
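For contrast, here is a minimal sketch of that conventional /tmp approach. It assumes a boto3 S3 client, whose download_file(Bucket, Key, Filename) method writes the object to a local path. The function name process_via_tmp and the "processing" step (just counting bytes) are hypothetical placeholders.

```python
import os


def process_via_tmp(s3_client, bucket, key, tmp_dir="/tmp"):
    """Download an S3 object to local disk, then read it back (the usual /tmp approach)."""
    # Use only the last segment of the key as the local file name.
    local_path = os.path.join(tmp_dir, os.path.basename(key))
    # The whole object lands on disk, so it has to fit in /tmp.
    s3_client.download_file(bucket, key, local_path)
    try:
        with open(local_path, "rb") as f:
            data = f.read()
        # Placeholder "processing": report how many bytes were read.
        return len(data)
    finally:
        # /tmp can persist between warm invocations, so clean up after ourselves.
        os.remove(local_path)
```

Every file takes a round trip through disk here, and the cleanup step matters because a reused execution environment keeps whatever was left in /tmp.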
But a lesser-known technique bypasses the need to store anything in the Lambda’s /tmp directory. Instead, it uses Python’s built-in in-memory file objects (io.BytesIO, or tempfile’s SpooledTemporaryFile, which stays in memory until it grows past its max_size) to hold temporary files in memory, where they can be read and processed in place.
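A minimal sketch of the in-memory approach follows. It assumes a boto3 S3 client, whose download_fileobj(Bucket, Key, Fileobj) method accepts any writable binary file-like object, and a UTF-8 CSV object. The helper count_csv_rows_in_memory and its row-counting "processing" are hypothetical placeholders.

```python
import csv
import io


def count_csv_rows_in_memory(s3_client, bucket, key, encoding="utf-8"):
    """Read a CSV object from S3 entirely in memory, never touching /tmp."""
    buf = io.BytesIO()
    # download_fileobj writes the object's bytes into the in-memory buffer.
    s3_client.download_fileobj(bucket, key, buf)
    # Rewind before reading, or the reader starts at the end and sees nothing.
    buf.seek(0)
    # csv works on text, so wrap the binary buffer in a text decoder.
    text = io.TextIOWrapper(buf, encoding=encoding, newline="")
    # Placeholder "processing": count the rows.
    return sum(1 for _ in csv.reader(text))
```

In a handler you might call it as count_csv_rows_in_memory(boto3.client("s3"), bucket, key). The object lives only in the function's memory for the duration of the call.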
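This bypasses the need for /tmp and the limits on that directory’s size (512MB by default). Since the file is held in memory, your capacity is set by the function’s memory allocation instead, which can be larger (though not by as much as you think, because that same memory also has to hold the runtime, your code and any working copies of the data).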
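A rough way to guard against that ceiling is to compare an object's size with the function's configured memory before loading it. Lambda exposes the configured memory in MB through the AWS_LAMBDA_FUNCTION_MEMORY_SIZE environment variable. The 50% headroom factor and the 128 MB fallback for local runs are assumptions of this sketch, and fits_in_memory is a hypothetical helper.

```python
import os


def fits_in_memory(object_size_bytes, memory_mb=None, headroom=0.5):
    """Rough check: does an object fit in the function's memory with room to spare?

    headroom (an assumed safety factor) is the fraction of configured memory
    budgeted for the file, leaving the rest for the runtime, your code, and any
    copies made while processing.
    """
    if memory_mb is None:
        # Set by Lambda to the configured memory in MB; 128 is an assumed local fallback.
        memory_mb = int(os.environ.get("AWS_LAMBDA_FUNCTION_MEMORY_SIZE", "128"))
    budget_bytes = memory_mb * 1024 * 1024 * headroom
    return object_size_bytes <= budget_bytes
```

The object's size can be fetched without downloading it, for example from s3_client.head_object(Bucket=bucket, Key=key)["ContentLength"]. Anything that fails the check could fall back to /tmp or be processed in a streaming fashion.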
It also adds some complexity, because I’m not entirely comfortable coding with io.BytesIO and io.StringIO, but generally speaking this does make your architecture neater at the expense of a couple of lines of ‘not-so-straightforward’ code.
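Most of the "not-so-straightforward" part is the split between bytes and text. csv and most text APIs want str, which means io.StringIO, while boto3's upload_fileobj(Fileobj, Bucket, Key) wants a binary file object, which means io.BytesIO. The following minimal sketch writes results back to S3 from memory under those assumptions. The helper upload_rows_as_csv is hypothetical.

```python
import csv
import io


def upload_rows_as_csv(s3_client, bucket, key, rows, encoding="utf-8"):
    """Build a CSV in memory and upload it to S3 without writing to /tmp."""
    # csv.writer produces text, so it writes into a StringIO...
    text_buf = io.StringIO()
    csv.writer(text_buf).writerows(rows)
    # ...but the upload needs bytes, so encode the text and wrap it in a BytesIO.
    data = text_buf.getvalue().encode(encoding)
    byte_buf = io.BytesIO(data)  # positioned at 0, ready to be read
    s3_client.upload_fileobj(byte_buf, bucket, key)
    # Return the uploaded size (computed before the upload consumes the buffer).
    return len(data)
```

Keeping one StringIO for building the text and one BytesIO for the upload makes the conversion point explicit, which is usually where bugs creep in.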