7 Mistakes I Made In DynamoDB

When I was working at Glidewell, I regularly attended the weekly gym classes. The company ran 12-week health challenges in which physical activities earned points and points earned prizes, so I volunteered to build an application that automatically calculates and displays each participant's points. Because Glidewell was a heavy user of Amazon Web Services (AWS), I chose DynamoDB as the database to support my Angular app.


According to AWS, “DynamoDB is a fast and flexible nonrelational database service for any scale…You can deploy a nonrelational database in a matter of minutes. DynamoDB automatically scales throughput capacity to meet workload demands, and partitions and repartitions your data as your table size grows. Also, DynamoDB synchronously replicates data across three facilities in an AWS Region, giving you high availability and data durability.”

Well, my strong suit is the front end, so I was all for creating a database in minutes and having it auto-scale to meet workload demands.

While working with DynamoDB, I made many mistakes and learned about its limitations. I'm hoping that sharing my mistakes will help others avoid making the same ones.

  1. By default, a Lambda DOES NOT have permissions to access the DynamoDB database.
  2. Empty strings are not allowed in DynamoDB (no longer true as of May 2020).
  3. Scanning the data is expensive. Consider using query over scan to get the data.
  4. A secondary index is considered a separate ARN resource.
  5. Batch write puts or deletes multiple records, but only up to 25 requests or 16 MB per call.
  6. DynamoDB returns paginated results on query and scan.
  7. Get only the data you need with Projection Expression.

1. By default, a Lambda DOES NOT have permissions to access the DynamoDB database.

A Lambda is basically a function that runs whatever code you write in the cloud. One of the first things I did was write code in the Lambda to read values from the database. Easy peasy, right? Not quite.
In order for the Lambda to read, write, or update data in DynamoDB, you have to grant it those permissions through an IAM (Identity and Access Management) role. If I wanted to ask for permission every time I wanted to do something, I would live with my mom! Why doesn't the Lambda just know that I want to access the database I created?

To give the Lambda permissions, you create an IAM role for it (its execution role) and attach a policy that lists the actions you allow. Once the Lambda uses this role, it can perform the actions listed in the Action array on the DynamoDB table specified in Resource. Replace us-east-1 with the region you are using, and set Resource to the ARN of the DynamoDB table you want to access.

IAM policy for AWS Lambda to access DynamoDB
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadWriteOnGymActivities",
            "Effect": "Allow",
            "Action": [
                "dynamodb:Query",
                "dynamodb:Scan",
                "dynamodb:BatchWrite*",
                "dynamodb:GetItem",
                "dynamodb:BatchGetItem",
                "dynamodb:PutItem",
                "dynamodb:UpdateItem",
                "dynamodb:DeleteItem"
            ],
            "Resource": "arn:aws:dynamodb:us-east-1:123456789012:table/GymActivities"
        }
    ]
}

2. Empty strings are not allowed in DynamoDB.

UPDATE: As of 5/18/2020, DynamoDB supports empty strings for non-key String and Binary attributes. Key attributes still cannot be empty strings.

I noticed that the app was sometimes getting errors when getting records from DynamoDB… but not ALWAYS, just SOMETIMES. After digging into the issue (and complaining to my coworkers), I saw that the only requests getting errors had values that were empty strings. Specifically, the email field was sometimes empty because it was optional. Then a coworker, David, straight-up told me, "Oh yeah, Dynamo databases don't allow empty strings." Wait, what? Is that true?! I thought that was crap, but I googled it, and it WAS true. I guess venting to your coworkers sometimes pays off.

So I fixed the HTTP GET errors simply by transforming my request data so that empty values defaulted to null.

Resolve DynamoDB issues with default null value instead of empty string
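Here is a minimal sketch of that kind of transform. It assumes each request is a flat object of attribute values. The helper name emptyStringsToNull is hypothetical. The DocumentClient stores a JavaScript null as the DynamoDB NULL type. If you use the aws-sdk v2 DocumentClient, its convertEmptyValues constructor option performs a similar conversion for you.

```javascript
// Hypothetical helper: returns a shallow copy of `record` in which every
// empty-string value is replaced by null. Nested objects are not walked.
function emptyStringsToNull(record) {
  const cleaned = {};
  for (const [key, value] of Object.entries(record)) {
    cleaned[key] = value === '' ? null : value;
  }
  return cleaned;
}
```

For example, emptyStringsToNull({ TechID: 'T100', Email: '' }) yields { TechID: 'T100', Email: null }. Other falsy values such as 0 are left alone.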

3. Scanning the data is expensive. Consider using query over scan to get the data.

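There are two ways to get multiple records from DynamoDB: you can query or you can scan. A scan reads ALL of the items in your table, which gets expensive as the table grows. A query reads only the items that match a given partition key value, which is faster and cheaper. Filters only trim what is returned after the data has been read, so a filtered scan still costs as much as a full one. However, to query on an attribute, it has to be a key of the table or of a secondary index.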

Query: searches by primary key or secondary index; can use conditions on the sort key; cheap with the right keys (a large percentage of unique key values).
Scan: reads all data in the table; can specify filters to refine the values returned to you; expensive for large tables.
I used a query on the secondary index (Tech ID) to get the activities of an individual participant.
I used a scan to get the activities for all participants (all activities in the database).
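The screenshots of those two calls are not reproduced, and the post's filterTechId helper is never shown. The sketch below is a guess at what the DocumentClient params for each call could look like. It assumes a secondary index on TechID named TechID-index; that index name and both function names are hypothetical.

```javascript
// Hypothetical: params for a Query against a secondary index on TechID.
// "TechID-index" is an assumed index name; substitute your own.
function buildQueryByTechIdParams(tableName, techId) {
  return {
    TableName: tableName,
    IndexName: 'TechID-index',
    // Only items whose TechID equals :techId are read.
    KeyConditionExpression: 'TechID = :techId',
    // With the DocumentClient, values are plain JavaScript values.
    ExpressionAttributeValues: { ':techId': techId },
  };
}

// Hypothetical: a Scan needs only the table name and reads every item.
function buildScanAllParams(tableName) {
  return { TableName: tableName };
}
```

You would pass these to docClient.query(...) and docClient.scan(...) respectively. The query touches one participant's items, while the scan touches the whole table.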

4. A secondary index is considered a separate ARN resource.

When you first set up your DynamoDB table, you have to choose a primary key, and its value must be unique for every item. If you want to efficiently get filtered data, you will also want to set up a secondary index, whose key values don't have to be unique.

For example, suppose I have all activities by all participants in the database, and I want to get all activities by one participant. The activityId is the primary key, which is always unique, and the secondary index is on participantId. I don't expect participantId to be unique, because one participant can have many activities, but I will often want to know all the activities by that one participant.

In my app, I had set up participantId as a secondary index, but I could not get the data because my queries against it kept erroring. It turns out that a secondary index is considered a separate ARN resource, so you have to add its ARN to the policy in your IAM role.

If you have a secondary index, you have to add an additional ARN resource to your IAM role.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AccessAllIndexesOnActivities",
            "Effect": "Allow",
            "Action": [
                "dynamodb:*"
            ],
            "Resource": [
                "arn:aws:dynamodb:us-east-1:123456789012:table/GymActivities",
                "arn:aws:dynamodb:us-east-1:123456789012:table/GymActivities/index/*"
            ]
        }
    ]
}
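The policy above grants dynamodb:* on the table and all of its indexes. Below is a hedged sketch of a narrower policy built from a list of actions. It shows how the index ARN is formed: the table ARN followed by /index/*. The helper name and its parameters are hypothetical.

```javascript
// Hypothetical helper: builds an IAM policy document that allows the given
// DynamoDB actions on one table and on every secondary index of that table.
function buildTablePolicy({ region, accountId, tableName, actions }) {
  const tableArn = `arn:aws:dynamodb:${region}:${accountId}:table/${tableName}`;
  return {
    Version: '2012-10-17',
    Statement: [
      {
        Effect: 'Allow',
        // e.g. ['Query', 'GetItem'] becomes ['dynamodb:Query', 'dynamodb:GetItem']
        Action: actions.map((action) => `dynamodb:${action}`),
        // The index ARN is what lets Query work against a secondary index.
        Resource: [tableArn, `${tableArn}/index/*`],
      },
    ],
  };
}
```

Running JSON.stringify(buildTablePolicy({ ... }), null, 2) produces a document you can paste into the IAM console.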

5. Batch write puts or deletes multiple records, but only up to 25 requests or 16 MB per call.

When I was creating forms for participants to fill in their activities, I allowed participants to add as many activities as they wanted to one form submission. What I wanted to do was PUT (upsert) them in the database. Then I found out about the DynamoDB batch write action. Oh man, I'm so clever, and this is going to be so efficient. I'm going to send this form in through batch write, and every activity entered in the form will be a separate activity record in my DynamoDB table. It's going to be glorious!

Then one day, someone told me that not all of the activities were being saved and the points were off. Nonsense! Not my app! Well, I looked into it and saw that some of the batch writes were throwing errors and not writing anything to the database.

It turns out that a single batch write can contain no more than 25 put or delete requests and no more than 16 MB of data. So on forms where participants added more than 25 activities, the whole request failed.

As a solution, I split the records into chunks of 25, sent one batch write per chunk, and used Promise.all() to wait for all of them. That way, forms with more than 25 activities are saved.

// Wraps the callback-style DocumentClient.batchWrite in a Promise
const saveToDatabase = (dataRecords, params) => {
    return new Promise((resolve, reject) => {
        docClient.batchWrite(params, (err, data) => {
            if (err) {
                return reject(err);
            }
            return resolve(dataRecords);
        });
    });
};

const updateActivityFromAdmin = async requestBody => {
    // Each record must already be a { PutRequest: { Item } } object,
    // because the array is passed straight into RequestItems below
    const dataRecords = adminFormToRecords(requestBody);
    console.log('dataRecords', JSON.stringify(dataRecords));

    // DynamoDB accepts at most 25 put/delete requests per batch write,
    // so slice the records into chunks: 0-24, 25-49, 50-74, ...
    const dynamoBatchWriteMax = 25;
    const promises = [];
    for (let start = 0; start < dataRecords.length; start += dynamoBatchWriteMax) {
        const partialRecords = dataRecords.slice(start, start + dynamoBatchWriteMax);
        const params = {
            RequestItems: {
                [tableName]: partialRecords
            }
        };
        promises.push(saveToDatabase(partialRecords, params));
    }

    // Resolve with all records once every batch write has succeeded;
    // any failed batch rejects the returned promise
    await Promise.all(promises);
    return dataRecords;
};
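A successful batch write can still leave some items unwritten. If DynamoDB throttles part of a batch, it returns those requests in UnprocessedItems instead of raising an error, and the code above does not look at that field.

The sketch below chunks the requests and re-sends unprocessed items with a short exponential backoff. It assumes a client whose batchWrite(params) returns a Promise. With the aws-sdk v2 DocumentClient, that could be { batchWrite: (p) => docClient.batchWrite(p).promise() }. The function names, the retry count, and the delays are all hypothetical choices.

```javascript
// Hypothetical helper: split an array into chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Writes one batch (at most 25 requests) and re-sends whatever DynamoDB
// reports back in UnprocessedItems, waiting longer before each retry.
async function writeBatchWithRetry(client, tableName, requests, maxAttempts = 5) {
  let requestItems = { [tableName]: requests };
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const result = await client.batchWrite({ RequestItems: requestItems });
    const unprocessed = result.UnprocessedItems || {};
    if (Object.keys(unprocessed).length === 0) {
      return; // everything in this batch was written
    }
    requestItems = unprocessed; // retry only the leftovers
    if (attempt < maxAttempts) {
      await sleep(50 * 2 ** attempt); // 100 ms, 200 ms, 400 ms, ...
    }
  }
  throw new Error(`Items still unprocessed after ${maxAttempts} attempts`);
}

// Splits all requests into batches of 25 and writes them in parallel.
// Resolves with the number of batches sent.
async function writeAllInBatches(client, tableName, requests) {
  const batches = chunk(requests, 25);
  await Promise.all(batches.map((batch) => writeBatchWithRetry(client, tableName, batch)));
  return batches.length;
}
```

Retrying only the unprocessed requests keeps successful writes from being sent twice. The backoff gives throttled capacity a moment to recover.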

6. DynamoDB returns paginated results on query and scan.

I was getting all activities for all participants, and then all activities per participant. Then one day, maybe six weeks into the 12-week challenge, someone reported that the points were off: the totals on the leaderboard seemed lower than expected. I kept wondering if I did the math wrong… but then again, it's just addition. Did I really mess up addition? Then I compared the number of records in the table with the number of records being returned to me, and it dawned on me that DynamoDB was returning paginated results! A single query or scan call returns at most 1 MB of data. When there is more, the response includes a LastEvaluatedKey, which you pass back as ExclusiveStartKey to get the next page.

I even had separate QA and production environments and still didn't catch this. My QA environment had far fewer users and activities, so it never hit the 1 MB page limit. In production, on the other hand, data kept growing as more forms were submitted throughout the challenge.

I needed every activity to calculate each person's points accurately. This is how I solved the problem and got all of the data back:

const getActivitiesByTechId = async (techID) => {
    // filterTechId returns the Query params for this participant
    const params = filterTechId(techID);
    let results = [];
    let page;
    // Each call returns one page (up to 1 MB of data);
    // keep querying from LastEvaluatedKey until DynamoDB stops returning one
    do {
        page = await docClient.query(params).promise();
        page.Items.forEach((item) => results.push(item));
        params.ExclusiveStartKey = page.LastEvaluatedKey;
    } while (typeof page.LastEvaluatedKey !== "undefined");
    return results;
}
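The same loop works for scans, since scan responses carry LastEvaluatedKey too. Below is a hedged, generic version of that loop. The name fetchAllItems is hypothetical. It takes any function that resolves to a DynamoDB-shaped page ({ Items, LastEvaluatedKey }), for example (p) => docClient.query(p).promise() or (p) => docClient.scan(p).promise(). Unlike the loop above, it copies the params instead of mutating them, so the caller's object can be reused.

```javascript
// Collects every item from a paginated Query or Scan.
// `fetchPage` resolves to { Items, LastEvaluatedKey } for the given params.
async function fetchAllItems(fetchPage, baseParams) {
  const items = [];
  let exclusiveStartKey;
  do {
    // Copy the base params and add the start key for pages after the first.
    const params = exclusiveStartKey
      ? { ...baseParams, ExclusiveStartKey: exclusiveStartKey }
      : { ...baseParams };
    const page = await fetchPage(params);
    items.push(...page.Items);
    exclusiveStartKey = page.LastEvaluatedKey;
    // A page can be empty but still have a LastEvaluatedKey, so test the key, not Items.
  } while (exclusiveStartKey !== undefined);
  return items;
}
```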

7. Get only the data you need with Projection Expression.

About 10 weeks into the challenge, there were about 4,000 records in the DynamoDB table. People were complaining that they couldn't see their points, and my investigation showed that the API response was timing out intermittently. When I read the Lambda logs in CloudWatch, they showed that I was hitting the maximum data limit.

As a solution, instead of returning entire records with every key and value, I used a projection expression to return only the fields I displayed in my Angular app. That got rid of my API timeout errors.

// Returns scan params that fetch only the attributes the Angular app displays
const getParamsAllData = () => {
    const params = {
        TableName: tableName,
        ProjectionExpression:
            'DateSubmitted, Activity, ActivityPoints, ActivityPointsApproved, Description, TechID, EmployeeName, Staff, AdminLink, ViewLink, ActivityFormId, NumberOfSteps'
    };
    return params;
}
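A projection expression fails if an attribute name collides with a DynamoDB reserved word (Date is one example; the DynamoDB documentation publishes the full list). The usual workaround is to reference names through ExpressionAttributeNames placeholders.

Here is a minimal sketch that builds both fields from a list of attribute names. The helper name buildProjection is hypothetical. A projection trims the data sent back, but it does not reduce the read capacity a query or scan consumes.

```javascript
// Hypothetical helper: turns attribute names into a ProjectionExpression
// that uses #placeholders, so reserved-word names are still accepted.
function buildProjection(attributeNames) {
  const expressionAttributeNames = {};
  const placeholders = attributeNames.map((name, index) => {
    const placeholder = `#p${index}`;
    expressionAttributeNames[placeholder] = name;
    return placeholder;
  });
  return {
    ProjectionExpression: placeholders.join(', '),
    ExpressionAttributeNames: expressionAttributeNames,
  };
}
```

You could spread the result into your params: { TableName: tableName, ...buildProjection(['DateSubmitted', 'Activity', 'ActivityPoints']) }. If the same request also has a FilterExpression with its own ExpressionAttributeNames, merge the two maps rather than overwriting one with the other.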

Have you had any challenges using DynamoDB?

I hit some time-consuming hurdles while using DynamoDB. What's your experience been? Did you run into any challenges, and what did you learn? I'd love to learn from you as well, so feel free to share your experiences and challenges with DynamoDB in the comments, and we can all learn from each other.
