At my company, we have an application that generates NDVI (normalized difference vegetation index) maps from Sentinel-2 scenes. This is a complex operation that requires a lot of RAM and CPU.
To understand the process, imagine that you have a polygon and you want the NDVI map for that polygon.
My example polygon
The simplified version of the whole process to produce an NDVI map is:
1. Find out which Sentinel-2 scene you need - the world is covered by many scenes, so in this step you find the one that covers your polygon
2. Calculate the NDVI using the GDAL project https://gdal.org/ - GDAL has all the features we need; we just orchestrate its CLI commands
3. Color the NDVI with a color scheme - the raw NDVI map is grayscale, and we want to deliver a color version to our customers (steps 2 and 3 are sketched in code after the images below)
Grayscale Polygon
Color Polygon
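To make steps 2 and 3 concrete, here is a minimal sketch of the GDAL CLI orchestration in Ruby. It assumes step 1 has already identified the scene; the band paths, output names, and color ramp values are illustrative assumptions, not our exact production code.

```ruby
require "open3"

# Hypothetical paths -- in practice these come from the Sentinel-2 scene
# that covers the polygon (step 1). B04 is the red band, B08 the NIR band.
red_band  = "scene/B04.jp2"
nir_band  = "scene/B08.jp2"
ndvi_tif  = "ndvi.tif"
color_tif = "ndvi_color.tif"

# Illustrative color ramp: "value R G B" per line, low NDVI in brown,
# high NDVI in green (gdaldem interpolates between the entries).
File.write("ramp.txt", <<~RAMP)
  -1.0 120 69  25
   0.0 230 230 230
   0.5 155 200 112
   1.0 13  121 56
RAMP

def run!(*cmd)
  out, err, status = Open3.capture3(*cmd)
  raise "#{cmd.first} failed: #{err}" unless status.success?
  out
end

# Step 2: NDVI = (NIR - Red) / (NIR + Red). astype(float) avoids the
# integer division you would get with the raw 16-bit Sentinel-2 bands.
run!("gdal_calc.py",
     "-A", red_band, "-B", nir_band,
     "--outfile=#{ndvi_tif}",
     "--calc=(B.astype(float)-A)/(B.astype(float)+A)",
     "--type=Float32")

# Step 3: color the grayscale NDVI with the ramp above.
run!("gdaldem", "color-relief", ndvi_tif, "ramp.txt", color_tif)
```

The `gdaldem color-relief` step interpolates between the ramp entries, which is what turns the grayscale NDVI into the color map shown above.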
We use a Rails API and Sidekiq to generate the maps. The first problem is that we have a lot of polygons to process, and that takes a lot of CPU and RAM. In the beginning we used EC2 machines to do all this work, but we realized it was very expensive.
So we decided to run the Sidekiq workers on our k8s cluster at night (a low-load period), and we saved a few dollars on this operation.
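For illustration, a worker along these lines handles one polygon end to end; the class, queue, and model names are made up for the example, not our actual code.

```ruby
require "sidekiq"

# Hypothetical worker -- names and queue are illustrative assumptions.
class NdviMapWorker
  include Sidekiq::Worker

  # A dedicated queue lets the nightly k8s deployment (started with
  # `sidekiq -q ndvi_maps`) pick up these heavy jobs while the regular
  # API workers ignore them.
  sidekiq_options queue: "ndvi_maps", retry: 3

  def perform(polygon_id)
    polygon = Polygon.find(polygon_id)  # hypothetical ActiveRecord model

    scene = find_scene_for(polygon)     # step 1: which Sentinel-2 scene?
    ndvi  = calculate_ndvi(scene)       # step 2: gdal_calc.py
    store(polygon, colorize(ndvi))      # step 3: gdaldem color-relief
  end

  private

  # Stubs standing in for the GDAL orchestration sketched earlier.
  def find_scene_for(polygon); end
  def calculate_ndvi(scene); end
  def colorize(ndvi); end
  def store(polygon, map); end
end
```

The Rails API only enqueues work with `NdviMapWorker.perform_async(polygon.id)`; the heavy lifting happens on the cluster.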
It worked fine but created another problem: Sentinel-2 scenes are large, and we paid a lot to transfer data from Amazon S3 to our k8s cluster. So we decided to create a Lambda function that cuts the scene inside the AWS environment and responds with only a small file.
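Conceptually, the function's job is roughly what the sketch below shows: receive the polygon, clip the scene to it with gdalwarp, and hand back something small. The event shape, bucket names, and the use of GDAL's /vsis3/ virtual filesystem are assumptions for illustration.

```ruby
require "json"
require "open3"
require "aws-sdk-s3"

# Hypothetical Lambda handler. It assumes the scene is readable through
# GDAL's /vsis3/ virtual filesystem and that gdalwarp is available in the
# runtime environment.
def handler(event:, context:)
  scene_path = event["scene_path"]       # e.g. "/vsis3/<bucket>/<key>/B04.jp2"
  polygon    = event["polygon_geojson"]  # the polygon as GeoJSON

  cutline = "/tmp/polygon.geojson"
  clipped = "/tmp/clipped.tif"
  File.write(cutline, JSON.generate(polygon))

  # Clip the scene to the polygon so only a small raster leaves AWS.
  _out, err, status = Open3.capture3(
    "gdalwarp", "-cutline", cutline, "-crop_to_cutline",
    "-co", "COMPRESS=DEFLATE", scene_path, clipped
  )
  raise "gdalwarp failed: #{err}" unless status.success?

  # Upload the small result and return its location.
  key = "clipped/#{context.aws_request_id}.tif"
  Aws::S3::Client.new.put_object(
    bucket: ENV.fetch("OUTPUT_BUCKET"),  # hypothetical output bucket
    key: key,
    body: File.open(clipped, "rb")
  )

  { "bucket" => ENV["OUTPUT_BUCKET"], "key" => key }
end
```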
So far so good, but we had one more problem: we need the GDAL binaries to cut the scene, i.e. a plain Lambda function is not enough. At this point, we found that it is possible to create a Dockerfile to provision the environment that runs the AWS Lambda function :). We created a Docker image with all the GDAL tooling and it worked great.
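For reference, the image can be put together roughly like this. Treat it as a sketch under assumptions: the base image and tag, the Ruby packages, and the handler string all depend on your setup, and GDAL could just as well be installed on top of one of AWS's official Lambda base images instead.

```dockerfile
# Sketch of a Lambda container image with GDAL available to the function.
# Base image, versions, and handler string are illustrative assumptions.
FROM ghcr.io/osgeo/gdal:ubuntu-small-3.8.5

# Ruby plus the AWS Lambda Runtime Interface Client so the container can
# talk to the Lambda service.
RUN apt-get update && \
    apt-get install -y ruby-full ruby-dev build-essential && \
    gem install aws_lambda_ric aws-sdk-s3 && \
    rm -rf /var/lib/apt/lists/*

WORKDIR /var/task
COPY function.rb .

# The RIC is the entrypoint; the handler string is "<file>.<method>".
ENTRYPOINT ["aws_lambda_ric"]
CMD ["function.handler"]
```

The important part is simply that gdalwarp and the rest of the GDAL binaries end up on the PATH of whatever runtime executes the handler.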