Amazon DynamoDB: the cost of indexes

By Franck Pachot, August 13, 2020 (AWS, NoSQL)
This is common to any data structure, whether RDBMS or NoSQL: indexes accelerate reads but slow down writes. This post explains the consequences of adding indexes in DynamoDB.

Secondary Indexes

What we call an index in DynamoDB is different from an index in an RDBMS. They have the same goal: store your data with some redundancy so that it is physically partitioned, sorted, and clustered differently than the table, in order to optimize the performance of specific access patterns. The redundancy can be full (covering indexes), so that there is no need to look at the table at all, or partial (only the key values, plus what is needed to access the table efficiently). Indexes are maintained automatically: when the table is updated, the index entries are maintained by the database engine. This can be synchronous, or asynchronous if eventual consistency is acceptable. The major difference is that a relational database separates the logical and physical implementation (Codd's Rule 8: Physical Data Independence) for better agility: there is no change to make in the application code to access the data through one index or another. An RDBMS has an optimizer (query planner) that selects the best access path for the query predicates; that was the topic of the previous post. But following the NoSQL spirit, AWS DynamoDB delegates this responsibility to the application code: an index is used only when you explicitly query it.

Because DynamoDB tables are physically organized by the primary key (hash partitioning, with local ordering when a sort key is defined), this KeySchema can be considered the primary index. Any additional index is then a secondary index. It can be local, sharing the table's hash key but with a different sort key, or global, with a hash key different from the table's.

Table with no indexes


aws dynamodb create-table --table-name Demo \
 --billing-mode PROVISIONED --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
 --attribute-definitions AttributeName=P,AttributeType=S AttributeName=S,AttributeType=S \
 --key-schema AttributeName=P,KeyType=HASH AttributeName=S,KeyType=RANGE

I have created a HASH/RANGE table that is perfect for access with a single value of the attribute P, the partition key, and a single value or a range of S, the sort key.
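That access pattern can be sketched as query parameters for the AWS SDK (a hypothetical boto3 call; the placeholder names #p, #s, :p, :lo, :hi are mine, not from the demo):

```python
# Hypothetical boto3 query for this table: one value for the partition
# key P, and a range on the sort key S. Values are illustrative.
params = {
    "TableName": "Demo",
    "KeyConditionExpression": "#p = :p AND #s BETWEEN :lo AND :hi",
    "ExpressionAttributeNames": {"#p": "P", "#s": "S"},
    "ExpressionAttributeValues": {
        ":p":  {"S": "4707"},
        ":lo": {"S": "10000"},
        ":hi": {"S": "19999"},
    },
}
# With a real client this would run as:
#   boto3.client("dynamodb").query(**params)
```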


{
    "TableDescription": {
        "TableArn": "arn:aws:dynamodb:eu-central-1:802756008554:table/Demo",
        "AttributeDefinitions": [
            {
                "AttributeName": "P",
                "AttributeType": "S"
            },
            {
                "AttributeName": "S",
                "AttributeType": "S"
            }
        ],
        "ProvisionedThroughput": {
            "NumberOfDecreasesToday": 0,
            "WriteCapacityUnits": 5,
            "ReadCapacityUnits": 5
        },
        "TableSizeBytes": 0,
        "TableName": "Demo",
        "TableStatus": "CREATING",
        "TableId": "b2a97f98-611d-451d-99ee-c3aab1129b30",
        "KeySchema": [
            {
                "KeyType": "HASH",
                "AttributeName": "P"
            },
            {
                "KeyType": "RANGE",
                "AttributeName": "S"
            }
        ],
        "ItemCount": 0,
        "CreationDateTime": 1597052214.276
    }
}

This is the output of the create-table command. I use a small provisioned capacity in my blog posts so that you can run the tests on the AWS Free Tier without risk. I look at the metrics, for better understanding, rather than at the response time, which depends on many other factors (network latency, bursting, throttling,…). But of course, you will see the same on larger data sets.

In this table, I’ll insert items by batch from the following JSON (values are randomly generated for each call):


{
 "Demo": [
  {"PutRequest":{"Item":{"P":{"S":"4707"},"S":{"S":"23535"},"A0":{"S":"18781"},"A01":{"S":"10065"},"A02":{"S":"2614"},"A03":{"S":"7777"},"A04":{"S":"19950"},"A05":{"S":"30864"},"A06":{"S":"24176"},"A07":{"S":"22257"},"A08":{"S":"11549"},"A09":{"S":"28368"},"A10":{"S":"29095"},"A11":{"S":"23060"},"A12":{"S":"3321"},"A13":{"S":"30588"},"A14":{"S":"16039"},"A15":{"S":"31388"},"A16":{"S":"21811"},"A17":{"S":"10593"},"A18":{"S":"18914"},"A19":{"S":"23120"},"A20":{"S":"25238"}}}},
  {"PutRequest":{"Item":{"P":{"S":"4106"},"S":{"S":"15829"},"A0":{"S":"28144"},"A01":{"S":"9051"},"A02":{"S":"26834"},"A03":{"S":"1614"},"A04":{"S":"6458"},"A05":{"S":"1721"},"A06":{"S":"8022"},"A07":{"S":"49"},"A08":{"S":"23158"},"A09":{"S":"6588"},"A10":{"S":"17560"},"A11":{"S":"4330"},"A12":{"S":"17578"},"A13":{"S":"8548"},"A14":{"S":"57"},"A15":{"S":"27601"},"A16":{"S":"8766"},"A17":{"S":"24400"},"A18":{"S":"18881"},"A19":{"S":"28418"},"A20":{"S":"14915"}}}},
  ...
  {"PutRequest":{"Item":{"P":{"S":"27274"},"S":{"S":"8548"},"A0":{"S":"11557"},"A01":{"S":"28758"},"A02":{"S":"17212"},"A03":{"S":"17658"},"A04":{"S":"10456"},"A05":{"S":"8488"},"A06":{"S":"28852"},"A07":{"S":"22763"},"A08":{"S":"21667"},"A09":{"S":"15240"},"A10":{"S":"12092"},"A11":{"S":"25045"},"A12":{"S":"9156"},"A13":{"S":"27596"},"A14":{"S":"27305"},"A15":{"S":"22214"},"A16":{"S":"13384"},"A17":{"S":"12300"},"A18":{"S":"12913"},"A19":{"S":"20121"},"A20":{"S":"20224"}}}}
 ]
}

In addition to the primary key, I have attributes from A0 to A20. I put 25 items per call (the maximum for a DynamoDB batch write), and my goal is to have many attributes that I can index later.
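For reference, a JSON file like the one above can be generated with a short script such as this sketch (the file name and value ranges are my choice, not from the original test):

```python
import json
import random

def make_batch(n=25):
    """Build a batch-write request body with n items (25 is the
    maximum DynamoDB accepts per batch-write-item call).
    Attribute names match the demo table: P, S, A0, and A01..A20."""
    items = []
    for _ in range(n):
        item = {"P": {"S": str(random.randint(0, 32767))},
                "S": {"S": str(random.randint(0, 32767))},
                "A0": {"S": str(random.randint(0, 32767))}}
        for i in range(1, 21):
            item["A%02d" % i] = {"S": str(random.randint(0, 32767))}
        items.append({"PutRequest": {"Item": item}})
    return {"Demo": items}

if __name__ == "__main__":
    with open("batch-write.json", "w") as f:
        json.dump(make_batch(), f)
```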


aws dynamodb batch-write-item --request-items file://batch-write.json \
 --return-consumed-capacity INDEXES --return-item-collection-metrics SIZE

This is the simple call for this batch insert, returning the consumed capacity on table and indexes:


aws dynamodb batch-write-item --request-items file://batch-write139.json --return-consumed-capacity INDEXES --return-item-collection-metrics SIZE
...
{
    "UnprocessedItems": {},
    "ItemCollectionMetrics": {},
    "ConsumedCapacity": [
        {
            "CapacityUnits": 25.0,
            "TableName": "Demo",
            "Table": {
                "CapacityUnits": 25.0
            }
        }
    ]
}

25 Write Capacity Units for 25 items: each item smaller than 1KB consumes 1 WCU. My items are about 170 bytes on average here, so each fits in 1 WCU. And batching doesn't help with capacity: the call is batched over the network only, but the items all go to different places, and each requires its own WCU. There is nothing like preparing full blocks with many items (like RDBMS direct-path inserts, fast loads, or insert append…). DynamoDB is there to scale small transactions by scattering data through multiple partitions.
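The arithmetic behind those 25 WCU can be sketched as follows (the 1 KB rounding rule for standard writes is from the DynamoDB capacity model):

```python
import math

def wcu_for_put(item_size_bytes):
    # A standard write consumes 1 WCU per 1 KB of item size, rounded up
    return math.ceil(item_size_bytes / 1024)

# 25 items of ~170 bytes each: the batch saves network round-trips,
# not capacity -- each item still consumes at least 1 WCU
total = sum(wcu_for_put(170) for _ in range(25))
print(total)  # 25
```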

Table with 5 local indexes

Here is the same create statement but with 5 Local Secondary Indexes declared:


aws dynamodb create-table --table-name Demo --billing-mode PROVISIONED --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
 --attribute-definitions AttributeName=P,AttributeType=S AttributeName=S,AttributeType=S \
  AttributeName=A01,AttributeType=S AttributeName=A02,AttributeType=S AttributeName=A03,AttributeType=S AttributeName=A04,AttributeType=S AttributeName=A05,AttributeType=S \
 --key-schema AttributeName=P,KeyType=HASH AttributeName=S,KeyType=RANGE \
 --local-secondary-indexes \
  'IndexName=LSI01,KeySchema=[{AttributeName=P,KeyType=HASH},{AttributeName=A01,KeyType=RANGE}],Projection={ProjectionType=ALL}' \
  'IndexName=LSI02,KeySchema=[{AttributeName=P,KeyType=HASH},{AttributeName=A02,KeyType=RANGE}],Projection={ProjectionType=ALL}' \
  'IndexName=LSI03,KeySchema=[{AttributeName=P,KeyType=HASH},{AttributeName=A03,KeyType=RANGE}],Projection={ProjectionType=ALL}' \
  'IndexName=LSI04,KeySchema=[{AttributeName=P,KeyType=HASH},{AttributeName=A04,KeyType=RANGE}],Projection={ProjectionType=ALL}' \
  'IndexName=LSI05,KeySchema=[{AttributeName=P,KeyType=HASH},{AttributeName=A05,KeyType=RANGE}],Projection={ProjectionType=ALL}'

This recreates the table, adding the definition of 5 local indexes on the same partition key but with different sort keys. I had to add the attribute definitions for the new sort keys because I reference them in the index definitions.


...
        },
        "TableSizeBytes": 0,
        "TableName": "Demo",
        "TableStatus": "CREATING",
        "TableId": "84fc745b-66c5-4c75-bcf4-7686b2daeacb",
        "KeySchema": [
            {
                "KeyType": "HASH",
                "AttributeName": "P"
            },
            {
                "KeyType": "RANGE",
                "AttributeName": "S"
            }
        ],
        "ItemCount": 0,
        "CreationDateTime": 1597054018.546
    }
}

The partition size in DynamoDB is fixed at 10GB, and because local indexes are stored within each partition, the total size of the items plus all their index entries cannot exceed this limit. Here I'm far from the limit, which will often be the case: if your key-value store is a document store, you will not project the document into all local indexes. Use the KEYS_ONLY projection type rather than the ALL one I used here. And anyway, 5 local indexes is the maximum you can create in DynamoDB.
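The impact of the projection type on that 10GB limit can be sketched as a rough estimate (real sizing must also account for attribute names and per-index overhead):

```python
ITEM_COLLECTION_LIMIT = 10 * 1024**3  # 10 GB per partition key when LSIs exist

def collection_size(table_bytes, n_lsi_all):
    """Rough size of one item collection: the table items plus one
    full copy per ALL-projected local index."""
    return table_bytes * (1 + n_lsi_all)

# 2 GB of items under one partition key, with 5 ALL-projected LSIs:
# the collection would need ~12 GB and exceed the limit
print(collection_size(2 * 1024**3, 5) > ITEM_COLLECTION_LIMIT)  # True
```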


aws dynamodb batch-write-item --request-items file://batch-write139.json --return-consumed-capacity INDEXES --return-item-collection-metrics SIZE
...
   "ConsumedCapacity": [
        {
            "CapacityUnits": 150.0,
            "TableName": "Demo",
            "LocalSecondaryIndexes": {
                "LSI01": {
                    "CapacityUnits": 25.0
                },
                "LSI03": {
                    "CapacityUnits": 25.0
                },
                "LSI02": {
                    "CapacityUnits": 25.0
                },
                "LSI05": {
                    "CapacityUnits": 25.0
                },
                "LSI04": {
                    "CapacityUnits": 25.0
                }
            },
            "Table": {
                "CapacityUnits": 25.0
            }
        }
    ]
}

Here we are at 150 WCU in total: the same 25 WCU as before for the 25 items put into the table, and each of the 5 local indexes accounts for an additional 25 WCU. (In a few runs of this test I observed 26 WCU instead of 25 for an index, and I don't really know why.)

Table with 20 global indexes

Now, the same table without any local indexes but with the maximum number of global indexes we can have: 20 Global Secondary Indexes (GSI).


aws dynamodb create-table --table-name Demo --billing-mode PROVISIONED --provisioned-throughput ReadCapacityUnits=5,WriteCapacityUnits=5 \
 --attribute-definitions AttributeName=P,AttributeType=S AttributeName=S,AttributeType=S \
AttributeName=A01,AttributeType=S AttributeName=A02,AttributeType=S AttributeName=A03,AttributeType=S AttributeName=A04,AttributeType=S AttributeName=A05,AttributeType=S AttributeName=A06,AttributeType=S AttributeName=A07,AttributeType=S AttributeName=A08,AttributeType=S AttributeName=A09,AttributeType=S AttributeName=A10,AttributeType=S AttributeName=A11,AttributeType=S AttributeName=A12,AttributeType=S AttributeName=A13,AttributeType=S AttributeName=A14,AttributeType=S AttributeName=A15,AttributeType=S AttributeName=A16,AttributeType=S AttributeName=A17,AttributeType=S AttributeName=A18,AttributeType=S AttributeName=A19,AttributeType=S AttributeName=A20,AttributeType=S \
 --key-schema AttributeName=P,KeyType=HASH AttributeName=S,KeyType=RANGE \
 --global-secondary-indexes \
  'IndexName=GSI01,KeySchema=[{AttributeName=A01,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI02,KeySchema=[{AttributeName=A02,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI03,KeySchema=[{AttributeName=A03,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI04,KeySchema=[{AttributeName=A04,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI05,KeySchema=[{AttributeName=A05,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI06,KeySchema=[{AttributeName=A06,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI07,KeySchema=[{AttributeName=A07,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI08,KeySchema=[{AttributeName=A08,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI09,KeySchema=[{AttributeName=A09,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI10,KeySchema=[{AttributeName=A10,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI11,KeySchema=[{AttributeName=A11,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI12,KeySchema=[{AttributeName=A12,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI13,KeySchema=[{AttributeName=A13,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI14,KeySchema=[{AttributeName=A14,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI15,KeySchema=[{AttributeName=A15,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI16,KeySchema=[{AttributeName=A16,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI17,KeySchema=[{AttributeName=A17,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI18,KeySchema=[{AttributeName=A18,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI19,KeySchema=[{AttributeName=A19,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}' \
  'IndexName=GSI20,KeySchema=[{AttributeName=A20,KeyType=HASH},{AttributeName=S,KeyType=RANGE}],Projection={ProjectionType=ALL},ProvisionedThroughput={ReadCapacityUnits=1,WriteCapacityUnits=1}'

This takes much longer to create because global indexes are actually like separate tables, maintained asynchronously (reads on them are only eventually consistent).


aws dynamodb batch-write-item --request-items file://batch-write139.json --return-consumed-capacity INDEXES --return-item-collection-metrics SIZE
...
{
    "UnprocessedItems": {},
    "ItemCollectionMetrics": {},
    "ConsumedCapacity": [
        {
            "CapacityUnits": 525.0,
            "GlobalSecondaryIndexes": {
                "GSI06": {
                    "CapacityUnits": 25.0
                },
                "GSI07": {
                    "CapacityUnits": 25.0
                },
                "GSI05": {
                    "CapacityUnits": 25.0
                },
 ...
                "GSI08": {
                    "CapacityUnits": 25.0
                },
                "GSI03": {
                    "CapacityUnits": 25.0
                }
            },
            "TableName": "Demo",
            "Table": {
                "CapacityUnits": 25.0
            }
        }
    ]
}

The cost is the same as with local indexes: one capacity unit per item and per index, in addition to the table.
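The totals observed in these three tests (25, 150, and 525 WCU) follow one simple rule, which can be sketched as:

```python
import math

def batch_wcu(n_items, item_size_bytes, n_indexes):
    """WCU for a batch of puts: each item costs ceil(size/1KB) on the
    table, and the same again on every LSI or GSI it is projected to."""
    per_item = math.ceil(item_size_bytes / 1024)
    return n_items * per_item * (1 + n_indexes)

print(batch_wcu(25, 170, 0))   # 25  (no indexes)
print(batch_wcu(25, 170, 5))   # 150 (5 local indexes)
print(batch_wcu(25, 170, 20))  # 525 (20 global indexes)
```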

So, this post is simply there to draw your attention to the fact that adding indexes slows down writes, in NoSQL as in any database. In DynamoDB this is measured in Write Capacity Units, and you can get the whole detail (how many WCU for the table, for the LSIs, and for the GSIs) with "ReturnConsumedCapacity". What is important is that this capacity can scale: you will probably not see a difference in response time, except of course if you go beyond the provisioned capacity, and then you can increase it (at a cost, of course). How does it scale? Because DynamoDB only lets us do things that scale. Maintaining global indexes synchronously would require cross-node synchronization in a distributed database, and that cannot scale, so DynamoDB maintains them asynchronously (reads on a GSI are eventually consistent), and the number of GSIs is limited to 20. Maintaining local indexes does not involve cross-partition latency, so they are maintained synchronously, but to limit the overhead you can create at most 5 LSIs. Within those limits, local and global indexes are useful to keep item access fast (see the previous posts on covering GSIs and LSIs).


Franck Pachot

Principal Consultant / Database Evangelist
Oracle ACE Director
Oracle Database OCM 12c certified
AWS Database Specialty certified
Oak Table member

RSS for this blog: feed
Twitter: @FranckPachot
LinkedIn : www.linkedin.com/in/franckpachot
Podcast en français: DBPod