AWS Lambda is a serverless compute service that lets you run code without provisioning or managing servers. With Lambda, you do not manage runtimes or administer servers. Using Lambda is as simple as uploading your code as a ZIP file or a container image; Lambda automatically allocates compute capacity and runs the code in response to incoming requests or events. Lambda functions can be written in many languages, such as Node.js, Python, Go, Java, and more. For more information about Lambda, see the AWS documentation.
In this tutorial, we will explore two important aspects of AWS Lambda: custom layers, and a Lambda function written in Python that uses a layer.
As a best practice, let’s refresh the package index on our Ubuntu instance.
sudo apt update
Let’s install the zip utility.
sudo apt install zip -y
pip is the package installer for Python. Let’s install pip.
sudo apt install python3-pip
The chown command allows us to change the user and/or group ownership of a given file or directory. We will use it to give the ubuntu user ownership of /opt.
sudo chown ubuntu:ubuntu -R /opt
We will cd into the /opt directory. cd is the command used to move between directories.
cd /opt
mkdir is the command to create directories.
mkdir -p appzip/python
Let’s cd into the appzip directory.
cd appzip/
Now, we will use the zip utility to zip the python directory into appzip.zip.
zip -r appzip.zip python
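The archive layout matters here: Lambda extracts a layer under /opt, and the Python runtime looks for packages in /opt/python, so every entry in the zip should start with python/. As a rough sketch, the same archive could be built and checked with Python's standard zipfile module. The helper names and the example_pkg placeholder package below are hypothetical and only serve the illustration.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import List


def build_layer_zip(source_dir: Path, zip_path: Path) -> List[str]:
    """Zip source_dir so that its own name is the top-level folder in the archive."""
    names = []
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(source_dir.rglob("*")):
            if path.is_file():
                # Paths are stored relative to the parent, so entries start with "python/".
                arcname = path.relative_to(source_dir.parent).as_posix()
                archive.write(path, arcname)
                names.append(arcname)
    return names


def demo() -> List[str]:
    """Create a throwaway appzip/python tree, zip it, and return the archive entries."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "appzip"
        package = root / "python" / "example_pkg"  # hypothetical placeholder package
        package.mkdir(parents=True)
        (package / "__init__.py").write_text("# placeholder module\n")
        build_layer_zip(root / "python", root / "appzip.zip")
        with zipfile.ZipFile(root / "appzip.zip") as archive:
            return archive.namelist()


if __name__ == "__main__":
    for entry in demo():
        print(entry)
```

Listing the entries before uploading is a quick way to confirm that nothing sits outside the python/ folder.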
We will download the zip file to our device. We will use Secure Copy (scp), a command-line utility that lets us securely copy files and directories between devices.
From your device’s terminal, use the following command to download the zip file to your local device. For more information, visit AWS.
scp -i <Name>.pem ubuntu@<EC2-Public-IP OR DNS>:/opt/appzip/appzip.zip ./appzip.zip
I’m running this from my Windows Command Prompt:
C:\Users\Omar-PC\Desktop>scp -i Lambda.pem ubuntu@XX.XXX.XX.XXX:/opt/appzip/appzip.zip C:\Users\Omar-PC\Desktop
In the AWS Lambda console, click on “Layers”, then “Create layer”. Give the layer a name (this tutorial uses TikaLayer) and a description, and upload the zip file (appzip.zip). That is all it takes to create a Lambda layer.
Next, create an IAM role for the Lambda function and attach the following policies to it: “AmazonS3FullAccess” and “CloudWatchFullAccess”.
Name the role lambda-pdf-extractor.
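For readers who want to see what sits behind this role, here is a minimal sketch of the trust policy that allows the Lambda service to assume it, along with the ARNs of the two AWS managed policies. The console generates the trust policy for you when you pick Lambda as the trusted service; the function and constant names below are hypothetical and only illustrate the documents involved.

```python
import json

# ARNs of the two AWS managed policies attached to the role.
MANAGED_POLICY_ARNS = [
    "arn:aws:iam::aws:policy/AmazonS3FullAccess",
    "arn:aws:iam::aws:policy/CloudWatchFullAccess",
]


def lambda_trust_policy() -> dict:
    """Return a trust policy that lets the Lambda service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "lambda.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(lambda_trust_policy(), indent=2))
```

Full-access policies keep the tutorial short; a production role would normally be scoped down to the two buckets involved.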
On S3, create two buckets:
A. First bucket name: source[5 random numbers]
B. Second bucket name: destination[5 random numbers]
Note: S3 bucket names are globally unique, which means no two buckets can share the same name, even across different AWS accounts.
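Because names must be globally unique, appending five random digits to a prefix is a simple way to avoid collisions. The sketch below generates such a name and checks it against a subset of the S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit). The function names are hypothetical, and the check is not a complete validator.

```python
import random
import re
from typing import Optional

# Subset of the S3 bucket naming rules; not an exhaustive check.
_BUCKET_PATTERN = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def make_bucket_name(prefix: str, rng: Optional[random.Random] = None) -> str:
    """Append five random digits to prefix, e.g. 'source' plus five digits."""
    rng = rng or random.Random()
    suffix = "".join(rng.choice("0123456789") for _ in range(5))
    return f"{prefix}{suffix}"


def looks_like_valid_bucket_name(name: str) -> bool:
    """Return True if name passes the basic length and character rules."""
    return bool(_BUCKET_PATTERN.fullmatch(name)) and ".." not in name


if __name__ == "__main__":
    for prefix in ("source", "destination"):
        name = make_bucket_name(prefix)
        print(name, looks_like_valid_bucket_name(name))
```

A name can pass this check and still be taken by another account, so the bucket creation itself remains the real test of uniqueness.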
Now, it’s time to create the Lambda function using the layer and the role we have previously created.
In the Lambda function code editor, remove the existing code and replace it with the code from my GitHub repo.
In the Layers section, add a custom layer by selecting our previously created layer (TikaLayer), version 1.
Under Configuration, Environment variables, add the following key and value:
| Key | Value |
|---|---|
| TARGET_BUCKET | destinationxxxxx |
Note: replace destinationxxxxx with the name of your destination S3 bucket.
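To make the role of TARGET_BUCKET concrete, here is a minimal sketch of how function code might read it and decide where to write the extracted text. This is not the code from the GitHub repo: the destination_for helper and the choice of swapping the .pdf extension for .txt are assumptions made for illustration.

```python
import os
from pathlib import PurePosixPath
from typing import Mapping, Tuple


def destination_for(source_key: str, env: Mapping[str, str] = os.environ) -> Tuple[str, str]:
    """Return (bucket, key) for the extracted text of the given source PDF key."""
    # Raises KeyError if TARGET_BUCKET is not set on the function.
    bucket = env["TARGET_BUCKET"]
    # Assumption: the output keeps the same path and swaps the extension for .txt.
    key = str(PurePosixPath(source_key).with_suffix(".txt"))
    return bucket, key


if __name__ == "__main__":
    fake_env = {"TARGET_BUCKET": "destinationxxxxx"}
    print(destination_for("reports/sample.pdf", fake_env))
```

Reading the bucket name from the environment means the same code can be pointed at a different destination bucket without editing the function.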
Under Configuration, General configuration, click Edit and set the following:
A. Memory: 256 MB
B. Timeout: 2 min
C. Under Existing role, select our previously created role, lambda-pdf-extractor
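The same settings can also be applied from code. The sketch below builds the arguments for the boto3 Lambda client's update_function_configuration call, where memory is given in MB and the timeout in seconds (so 2 minutes becomes 120). The function name pdf-extractor is a hypothetical placeholder, and apply_config assumes boto3 is installed and AWS credentials are configured.

```python
from typing import Any, Dict, Optional


def build_config_kwargs(
    function_name: str,
    memory_mb: int = 256,
    timeout_minutes: float = 2,
    role_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for update_function_configuration."""
    kwargs: Dict[str, Any] = {
        "FunctionName": function_name,
        "MemorySize": memory_mb,              # in MB
        "Timeout": int(timeout_minutes * 60),  # the API expects seconds
    }
    if role_arn:
        kwargs["Role"] = role_arn  # full ARN of the execution role
    return kwargs


def apply_config(kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Send the settings to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder works without boto3

    return boto3.client("lambda").update_function_configuration(**kwargs)


if __name__ == "__main__":
    # "pdf-extractor" is a hypothetical function name.
    print(build_config_kwargs("pdf-extractor"))
```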
Next, select Test, and on the Template dropdown menu, select “Amazon S3 Put”
Name the event: s3put
In the JSON section, replace the JSON with the one from my GitHub. Make sure that you update the following:
A. The AWS region. In my case, it is us-east-1
B. The bucket name. This should match the name of your S3 source bucket.
C. The object key. The object name should match the name of your PDF.
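To show where these three values live, here is a trimmed-down sketch of an S3 Put event and of how a handler might read it. The real template contains more fields, and the helper names are hypothetical. Note that object keys arrive URL-encoded in S3 events, so a space in a file name shows up as a plus sign.

```python
from typing import Any, Dict, List, Tuple
from urllib.parse import unquote_plus


def make_test_event(region: str, bucket: str, key: str) -> Dict[str, Any]:
    """Build a minimal S3 Put event with only the fields discussed above."""
    return {
        "Records": [
            {
                "eventSource": "aws:s3",
                "awsRegion": region,
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket, "arn": f"arn:aws:s3:::{bucket}"},
                    "object": {"key": key},
                },
            }
        ]
    }


def extract_targets(event: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Return (region, bucket, decoded key) for every record in the event."""
    return [
        (
            record["awsRegion"],
            record["s3"]["bucket"]["name"],
            unquote_plus(record["s3"]["object"]["key"]),
        )
        for record in event["Records"]
    ]


if __name__ == "__main__":
    event = make_test_event("us-east-1", "source12345", "my+file.pdf")
    print(extract_targets(event))
```

If the bucket name or key in the test event does not match a real object, the function will fail when it tries to read the PDF, so it is worth double-checking both values.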
Click on Save changes.
Now, we are ready to test our function: select “Code”, choose our “s3put” event, and click the “Test” button. If you receive status code 200, our code executed successfully.
If the function fails with the error “Unable to start the Tika server”, run the test one more time.
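The "run it again" advice can also be expressed in code. The sketch below retries a call when it fails with that specific message and re-raises anything else. It assumes the failure surfaces as a RuntimeError whose text contains the message, which may not match how the function in the repo reports it; call_with_retry and the flaky stand-in are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    attempts: int = 2,
    delay_seconds: float = 1.0,
    marker: str = "Unable to start the Tika server",
) -> T:
    """Call func, retrying only when the error message contains marker."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except RuntimeError as error:
            # Unrelated errors, and the final failed attempt, are re-raised as-is.
            if marker not in str(error) or attempt == attempts:
                raise
            time.sleep(delay_seconds)
    raise AssertionError("unreachable")


if __name__ == "__main__":
    calls = {"count": 0}

    def flaky() -> str:
        # Stand-in for the extraction call: fails once, then succeeds.
        calls["count"] += 1
        if calls["count"] == 1:
            raise RuntimeError("Unable to start the Tika server.")
        return "extracted text"

    print(call_with_retry(flaky, delay_seconds=0))
```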
When the Lambda function executes successfully, the extracted text is saved in our S3 destination bucket. The saved file contains the result of the PDF text extraction.
By the end of this tutorial, we have successfully created an AWS Lambda function that extracts text from PDFs uploaded to an S3 source bucket. The extracted text is saved in a designated S3 destination bucket. We achieved this by using a custom layer and a Lambda function written in Python that uses that layer.