Pynamodb Quick Start#
Overview#
Pynamodb, an ORM (Object-Relational Mapping) library for DynamoDB, addresses the challenges of the low-level DynamoDB API by providing a more intuitive and Pythonic way to interact with DynamoDB. With Pynamodb, you can define your data models using Python classes and access DynamoDB tables and items using familiar object-oriented syntax.
Pynamodb abstracts away the complexities of the low-level DynamoDB API, providing a higher-level and more readable interface. It handles the data type conversions automatically, allowing you to work with native Python data types directly. This simplifies the code, reduces the chances of errors, and improves overall productivity.
By using Pynamodb, developers can focus on writing business logic and application code, rather than wrestling with the intricacies of the DynamoDB API. The library takes care of the heavy lifting, making it easier to perform common operations like querying, filtering, and updating data in DynamoDB tables.
Without pynamodb#
When working with Amazon DynamoDB using the low-level API directly, developers often face two significant challenges: complex API syntax and verbose data type conversions.
Complex API Syntax: The DynamoDB API is highly complex and lacks human-intuitive naming conventions. The API methods and parameters are often lengthy and difficult to memorize, requiring developers to constantly refer to the official documentation and copy-paste exact syntax. This complexity can lead to increased development time, reduced code readability, and a higher likelihood of making errors.
Verbose Data Type Conversions: The DynamoDB API requires developers to convert data types explicitly. Before making any API call, you need to wrap Python values (e.g., strings, integers, booleans) in their corresponding DynamoDB type descriptors (e.g., ‘S’ for string, ‘N’ for number, ‘BOOL’ for boolean), and unwrap them again when reading a response.
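To make the conversion burden concrete, here is a minimal sketch of the wrapping and unwrapping you end up writing by hand. The helpers `to_attribute_value` and `from_attribute_value` are hypothetical names invented for this illustration and cover only a few types; boto3 ships `TypeSerializer` and `TypeDeserializer` in `boto3.dynamodb.types` for the same job.

```python
from decimal import Decimal


def to_attribute_value(value):
    """Wrap a native Python value in DynamoDB's typed format (hypothetical helper)."""
    # bool must be checked before int, because bool is a subclass of int
    if isinstance(value, bool):
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float, Decimal)):
        # DynamoDB transmits numbers as strings
        return {"N": str(value)}
    if value is None:
        return {"NULL": True}
    raise TypeError(f"unsupported type: {type(value).__name__}")


def from_attribute_value(attribute_value):
    """Unwrap DynamoDB's typed format back into a native Python value (hypothetical helper)."""
    type_code, raw = next(iter(attribute_value.items()))
    if type_code in ("S", "BOOL"):
        return raw
    if type_code == "N":
        return Decimal(raw)
    if type_code == "NULL":
        return None
    raise TypeError(f"unsupported DynamoDB type: {type_code}")


item = {"user_id": "uid-1", "age": 30, "is_active": True}
wire_item = {key: to_attribute_value(value) for key, value in item.items()}
round_trip = {key: from_attribute_value(value) for key, value in wire_item.items()}
print(wire_item)
print(round_trip)
```

Every attribute of every item passes through code like this on the way in and on the way out, which is the boilerplate an ORM layer removes.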
[37]:
import boto3
from rich import print as rprint
aws_profile = "bmt_app_dev_us_east_1"
dynamodb_client = boto3.session.Session(profile_name=aws_profile).client("dynamodb")
res = dynamodb_client.create_table(
    TableName="without_pynamodb_example",
    KeySchema=[
        {"AttributeName": "user_id", "KeyType": "HASH"},
    ],
    AttributeDefinitions=[
        {"AttributeName": "user_id", "AttributeType": "S"},
    ],
    BillingMode="PAY_PER_REQUEST",
)
rprint(res)
{ 'TableDescription': { 'AttributeDefinitions': [{'AttributeName': 'user_id', 'AttributeType': 'S'}], 'TableName': 'without_pynamodb_example', 'KeySchema': [{'AttributeName': 'user_id', 'KeyType': 'HASH'}], 'TableStatus': 'CREATING', 'CreationDateTime': datetime.datetime(2024, 5, 6, 0, 51, 29, 100000, tzinfo=tzlocal()), 'ProvisionedThroughput': {'NumberOfDecreasesToday': 0, 'ReadCapacityUnits': 0, 'WriteCapacityUnits': 0}, 'TableSizeBytes': 0, 'ItemCount': 0, 'TableArn': 'arn:aws:dynamodb:us-east-1:878625312159:table/without_pynamodb_example', 'TableId': '83e80b6a-2101-422c-afaa-164c0fd8cfaf', 'BillingModeSummary': {'BillingMode': 'PAY_PER_REQUEST'} }, 'ResponseMetadata': { 'RequestId': 'DCFFQHQ46954G13B2PHU9ANJTNVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': { 'server': 'Server', 'date': 'Mon, 06 May 2024 04:51:29 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '676', 'connection': 'keep-alive', 'x-amzn-requestid': 'DCFFQHQ46954G13B2PHU9ANJTNVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '2352933615' }, 'RetryAttempts': 0 } }
[38]:
# Insert an item
res = dynamodb_client.put_item(
    TableName="without_pynamodb_example",
    # every value must be wrapped in a DynamoDB type descriptor
    Item={
        "user_id": {"S": "uid-1"},
        # for a non-string value such as a number, you need extra code to convert the raw value to a string
        "email": {"S": "alice@example.com"},
    },
)
rprint(res)
{ 'ResponseMetadata': { 'RequestId': 'UF5AVJQL2BT5GPSLFEK5GTUNEVVV4KQNSO5AEMVJF66Q9ASUAAJG', 'HTTPStatusCode': 200, 'HTTPHeaders': { 'server': 'Server', 'date': 'Mon, 06 May 2024 04:51:41 GMT', 'content-type': 'application/x-amz-json-1.0', 'content-length': '2', 'connection': 'keep-alive', 'x-amzn-requestid': 'UF5AVJQL2BT5GPSLFEK5GTUNEVVV4KQNSO5AEMVJF66Q9ASUAAJG', 'x-amz-crc32': '2745614147' }, 'RetryAttempts': 0 } }
[39]:
# Query
response = dynamodb_client.query(
    TableName="without_pynamodb_example",
    KeyConditions={
        "user_id": {
            "AttributeValueList": [
                {"S": "uid-1"},
            ],
            "ComparisonOperator": "EQ",
        },
    },
)
for item_dict in response["Items"]:
    # each item comes back in DynamoDB's typed format, so you need extra code
    # to convert it to a native Python dictionary; non-string values
    # (integer, binary, ...) must also be converted back to their raw values
    email = item_dict["email"]["S"]
    print(f"user uid-1's email is {email}")
user uid-1's email is alice@example.com
With pynamodb#
Python offers a wide range of data class and ORM (Object-Relational Mapping) libraries that simplify working with data and databases. Popular examples include dataclasses, attrs, pydantic, peewee, sqlalchemy, and django-orm. These libraries provide a convenient way to define data structures, validate data, and interact with databases using an object-oriented approach.
When it comes to working with Amazon DynamoDB, pynamodb stands out as a powerful and intuitive ORM library specifically designed for DynamoDB. With pynamodb, you can define your data models as Python classes, specifying attributes and their types. This declarative style of defining data models makes your code more readable and maintainable.
One of the key advantages of using pynamodb is its human-friendly API for performing CRUD (Create, Read, Update, Delete) operations. Instead of dealing with the low-level DynamoDB API, you work with a high-level and intuitive interface that abstracts away the complexities.
[40]:
# Authenticate to AWS
from pynamodb.connection import Connection
boto_ses = boto3.session.Session(profile_name=aws_profile)
credentials = boto_ses.get_credentials()
connect = Connection(
    aws_access_key_id=credentials.access_key,
    aws_secret_access_key=credentials.secret_key,
    aws_session_token=credentials.token,
)
[41]:
# Declare data model
from pynamodb.models import Model, PAY_PER_REQUEST_BILLING_MODE
from pynamodb.attributes import UnicodeAttribute

class User(Model):
    class Meta:
        table_name = "with_pynamodb_example"
        region = "us-east-1"
        billing_mode = PAY_PER_REQUEST_BILLING_MODE

    user_id = UnicodeAttribute(hash_key=True)
    email = UnicodeAttribute()

User.create_table(wait=True)
[42]:
# Insert item
# pass in the value as it is, no conversion needed
user = User(user_id="uid-1", email="alice@example.com")
rprint(user.save())
{'ConsumedCapacity': {'CapacityUnits': 1.0, 'TableName': 'with_pynamodb_example'}}
[43]:
# Query
for user in User.query(hash_key="uid-1"):
    # access the item as a dictionary
    rprint(user.attribute_values)
    # access an attribute as a native Python value
    print(f"user uid-1's email is {user.email}")
{'email': 'alice@example.com', 'user_id': 'uid-1'}
user uid-1's email is alice@example.com
Sample Code - Single Data Model, No Relationship#
In this example, I will demonstrate:
- How to create a data model (declare a DynamoDB table) in pynamodb.
- How to do basic CRUD operations:
    - Create:
        - insert one item
        - bulk insert
    - Read:
        - query by keys
        - filter by non-key attributes
    - Update:
        - full replacement
        - key / value update (recommended for most scenarios)
    - Delete
[44]:
# Authenticate to AWS
import boto3
import pynamodb
from pynamodb.models import Model
from pynamodb.connection import Connection
from pynamodb.attributes import UnicodeAttribute, NumberAttribute
# create a DynamoDB connection using the credentials of a user-defined boto session
boto_ses = boto3.session.Session(profile_name=aws_profile)
credentials = boto_ses.get_credentials()
connection = Connection(
    aws_access_key_id=credentials.access_key,
    aws_secret_access_key=credentials.secret_key,
    aws_session_token=credentials.token,
)
[45]:
# Declare bank account data model, create table
class Accounts(Model):
    class Meta:
        """
        Declare metadata about the table.
        """
        table_name = "accounts"
        region = "us-east-1"
        # billing mode
        # doc: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/HowItWorks.ReadWriteCapacityMode.html
        # pay as you go mode
        billing_mode = pynamodb.models.PAY_PER_REQUEST_BILLING_MODE
        # provisioned mode
        # write_capacity_units = 10
        # read_capacity_units = 10

    # define attributes
    account_id = UnicodeAttribute(hash_key=True)
    primary_holder_email = UnicodeAttribute()
    balance = NumberAttribute(default=0)  # set default value for attribute
    create_time = UnicodeAttribute()
    description = UnicodeAttribute(null=True)  # allow null value for this attribute

# Create the DynamoDB table if it does not exist; if it already exists, this call does nothing
Accounts.create_table(wait=True)
[46]:
# Insert one item
account = Accounts(
    account_id="111-111-1111",
    primary_holder_email="alice@example.com",
    balance=0,
    create_time="2000-01-01 10:00:00",
    description="alice's account",
)
rprint(account.save())
{'ConsumedCapacity': {'CapacityUnits': 1.0, 'TableName': 'accounts'}}
[47]:
# Query one item
account = Accounts.query(hash_key="111-111-1111").next()
# access a value through the object, should be 0
print(f"{account.balance = }")
# access the values as a Python dictionary, should be {"account_id": "111-111-1111", "primary_holder_email": "alice@example.com", ...}
rprint(f"{account.attribute_values = }")
account.balance = 0
account.attribute_values = {'balance': 0, 'account_id': '111-111-1111', 'create_time': '2000-01-01 10:00:00', 'description': "alice's account", 'primary_holder_email': 'alice@example.com'}
[48]:
# Update one item - key / value update
# change primary holder and description
Accounts(account_id="111-111-1111").update(  # specify the item you want to update by hash key
    # define multiple update actions
    actions=[
        Accounts.primary_holder_email.set("bob@example.com"),
        Accounts.description.set("bob's account"),
    ]
)
account = Accounts.query(hash_key="111-111-1111").next()
rprint(f"{account.attribute_values = }")
account.attribute_values = {'balance': 0, 'account_id': '111-111-1111', 'create_time': '2000-01-01 10:00:00', 'description': "bob's account", 'primary_holder_email': 'bob@example.com'}
[49]:
# Atomic update of the balance
Accounts(account_id="111-111-1111").update(  # specify the item you want to update by hash key
    actions=[
        Accounts.balance.set(Accounts.balance + 1),
    ]
)
# WARNING: never do this, the read-modify-write sequence below is not atomic.
# If multiple programs run this code concurrently, it may cause double pay or double spend.
# account = Accounts.query(hash_key="111-111-1111").next()
# account.balance = account.balance + 1
# account.save()
account = Accounts.query(hash_key="111-111-1111").next()
print(f"{account.balance = }")
account.balance = 1
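The warning above is easier to see with a deterministic simulation. The sketch below uses a plain dictionary as a stand-in for the DynamoDB table (an assumption for illustration, not how pynamodb works internally) and replays the interleaving in which two clients both read the balance before either one writes.

```python
# A plain dict stands in for the DynamoDB table (illustration only).
table = {"111-111-1111": {"balance": 0}}


def read_balance(account_id):
    return table[account_id]["balance"]


def write_balance(account_id, balance):
    table[account_id]["balance"] = balance


def atomic_increment(account_id, amount):
    # Models the server applying "balance = balance + amount" as a single step.
    table[account_id]["balance"] += amount


# Read-modify-write: both clients read before either one writes.
seen_by_a = read_balance("111-111-1111")
seen_by_b = read_balance("111-111-1111")
write_balance("111-111-1111", seen_by_a + 1)
write_balance("111-111-1111", seen_by_b + 1)
lost_update_balance = read_balance("111-111-1111")
print(lost_update_balance)  # 1: one of the two deposits was lost

# Atomic update: each client sends the increment itself, not a precomputed total.
table["111-111-1111"]["balance"] = 0
atomic_increment("111-111-1111", 1)
atomic_increment("111-111-1111", 1)
atomic_balance = read_balance("111-111-1111")
print(atomic_balance)  # 2: both deposits are applied
```

An update action such as `Accounts.balance.set(Accounts.balance + 1)` corresponds to the second pattern: the arithmetic is evaluated by DynamoDB against the stored value, not by the client against a possibly stale copy.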
[50]:
# Update one item - replace the existing one
Accounts(
    account_id="111-111-1111",
    primary_holder_email="cathy@example.com",
    balance=0,
    create_time="2000-01-01 10:00:00",
    # description is not passed here, so the old description is gone: this is a full item replacement
).save()
account = Accounts.query(hash_key="111-111-1111").next()
rprint(f"{account.attribute_values = }")
account.attribute_values = {'balance': 0, 'account_id': '111-111-1111', 'create_time': '2000-01-01 10:00:00', 'primary_holder_email': 'cathy@example.com'}
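The difference between the two update styles can be modeled with plain dictionaries. This is a sketch of the semantics only; `replace_item` and `update_item` are hypothetical names for this illustration, not pynamodb functions.

```python
stored_item = {
    "account_id": "111-111-1111",
    "primary_holder_email": "bob@example.com",
    "balance": 1,
    "create_time": "2000-01-01 10:00:00",
    "description": "bob's account",
}


def replace_item(old_item, new_item):
    """Full replacement: the new item is stored as-is, the old one is discarded."""
    return dict(new_item)


def update_item(old_item, changes):
    """Key / value update: only the named attributes change, the rest survive."""
    return {**old_item, **changes}


after_update = update_item(stored_item, {"primary_holder_email": "cathy@example.com"})
after_replace = replace_item(
    stored_item,
    {
        "account_id": "111-111-1111",
        "primary_holder_email": "cathy@example.com",
        "balance": 0,
        "create_time": "2000-01-01 10:00:00",
    },
)
print("description" in after_update)   # True
print("description" in after_replace)  # False
```

This is why the key / value update is the safer default: a full replacement silently drops any attribute the caller forgot to pass.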
[51]:
# Bulk insert
# create some dummy data in memory, or read it from a CSV file, a database, etc.
many_account_data = [
    dict(account_id="222-222-2222", primary_holder_email="john@example.com", create_time="2000-01-02 00:00:00"),
    dict(account_id="333-333-3333", primary_holder_email="mike@example.com", create_time="2000-01-03 00:00:00"),
    dict(account_id="444-444-4444", primary_holder_email="smith@example.com", create_time="2000-01-04 00:00:00"),
]
with Accounts.batch_write() as batch:
    for account_data in many_account_data:
        account = Accounts(**account_data)
        batch.save(account)
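DynamoDB's BatchWriteItem API accepts at most 25 put or delete requests per call, so any bulk writer has to split its input into chunks. As far as I know, the `batch_write()` context manager takes care of this by flushing pending items automatically, but it helps to see what that chunking amounts to. The sketch below is a standalone illustration; `chunked` and `MAX_BATCH_SIZE` are hypothetical names, not part of pynamodb.

```python
from itertools import islice

# BatchWriteItem limit on put/delete requests per call
MAX_BATCH_SIZE = 25


def chunked(items, size=MAX_BATCH_SIZE):
    """Yield successive lists of at most `size` items."""
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# 60 dummy items need three requests: 25 + 25 + 10
accounts = [{"account_id": f"acc-{i}"} for i in range(60)]
batch_sizes = [len(chunk) for chunk in chunked(accounts)]
print(batch_sizes)
```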
[52]:
# Filter by non-key attribute
for account in Accounts.scan(
    filter_condition=Accounts.create_time.between("2000-01-01 23:59:59", "2000-01-03 23:59:59")
):
    rprint(account.attribute_values)
{ 'balance': 0, 'account_id': '333-333-3333', 'create_time': '2000-01-03 00:00:00', 'primary_holder_email': 'mike@example.com' }
{ 'balance': 0, 'account_id': '222-222-2222', 'create_time': '2000-01-02 00:00:00', 'primary_holder_email': 'john@example.com' }
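The range filter above works on strings, not on datetimes. It gives the right answer because timestamps written as zero-padded `YYYY-MM-DD HH:MM:SS` sort lexicographically in the same order as they sort chronologically. A minimal sketch with the sample data, comparing plain string ordering against parsed datetimes (the variable names are chosen for this illustration):

```python
from datetime import datetime

create_times = {
    "111-111-1111": "2000-01-01 10:00:00",
    "222-222-2222": "2000-01-02 00:00:00",
    "333-333-3333": "2000-01-03 00:00:00",
    "444-444-4444": "2000-01-04 00:00:00",
}
lower, upper = "2000-01-01 23:59:59", "2000-01-03 23:59:59"

# String comparison with inclusive bounds, as a BETWEEN condition on a string attribute does
by_string = sorted(
    account_id
    for account_id, create_time in create_times.items()
    if lower <= create_time <= upper
)

# The same filter on parsed datetimes, for comparison
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse(text):
    return datetime.strptime(text, TIME_FORMAT)


by_datetime = sorted(
    account_id
    for account_id, create_time in create_times.items()
    if parse(lower) <= parse(create_time) <= parse(upper)
)

print(by_string)
print(by_datetime)
```

The equivalence breaks if the format is not zero-padded or puts the day before the year, so a sortable timestamp format matters when timestamps are stored as strings.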
[54]:
# Bulk delete
with Accounts.batch_write() as batch:
    for account in Accounts.scan(
        filter_condition=Accounts.create_time.between("2000-01-01 23:59:59", "2000-01-03 23:59:59"),
        attributes_to_get=["account_id"],
    ):
        batch.delete(account)
for account in Accounts.scan():
    rprint(account.attribute_values)
{ 'balance': 0, 'account_id': '444-444-4444', 'create_time': '2000-01-04 00:00:00', 'primary_holder_email': 'smith@example.com' }
{ 'balance': 0, 'account_id': '111-111-1111', 'create_time': '2000-01-01 10:00:00', 'primary_holder_email': 'cathy@example.com' }
Sample Code - Data Model with many-to-many Relationship and Index#
In this example, I will demonstrate:
- How to define an index.
- How to query a many-to-many relationship efficiently using pynamodb.
[57]:
from pynamodb.indexes import GlobalSecondaryIndex, KeysOnlyProjection

# Create an index that lets us query the orders that contain a specific item
class ItemOrderIndex(GlobalSecondaryIndex):
    class Meta:
        index_name = "item-and-order-index"
        projection = KeysOnlyProjection()

    item_id = UnicodeAttribute(hash_key=True)
    order_id = UnicodeAttribute(range_key=True)

# Create the Order data model
class Order(Model):
    class Meta:
        table_name = "orders"
        region = "us-east-1"
        billing_mode = pynamodb.models.PAY_PER_REQUEST_BILLING_MODE

    # define attributes
    order_id = UnicodeAttribute(hash_key=True)
    item_id = UnicodeAttribute(range_key=True)
    item_unit_price = NumberAttribute()
    quantity = NumberAttribute()
    # associate index
    item_order_index = ItemOrderIndex()

Order.create_table(wait=True)
[58]:
# Insert some items
many_order_data = [
    dict(order_id="order-1", item_id="item-1-apple", item_unit_price=0.8, quantity=3),
    dict(order_id="order-1", item_id="item-2-banana", item_unit_price=0.4, quantity=5),
    dict(order_id="order-2", item_id="item-2-banana", item_unit_price=0.4, quantity=8),
    dict(order_id="order-2", item_id="item-3-cherry", item_unit_price=1.3, quantity=2),
]
with Order.batch_write() as batch:
    for order_data in many_order_data:
        order = Order(**order_data)
        batch.save(order)
[59]:
# Given an order ID, find all items in that order
for order in Order.query(hash_key="order-1"):
    rprint(f"{order.attribute_values = }")
order.attribute_values = {'item_id': 'item-1-apple', 'item_unit_price': 0.8, 'order_id': 'order-1', 'quantity': 3}
order.attribute_values = {'item_id': 'item-2-banana', 'item_unit_price': 0.4, 'order_id': 'order-1', 'quantity': 5}
[60]:
# Given an item ID, find all orders that contain that item
for order in Order.item_order_index.query(hash_key="item-2-banana"):
    rprint(f"{order.attribute_values = }")
order.attribute_values = {'item_id': 'item-2-banana', 'order_id': 'order-1'}
order.attribute_values = {'item_id': 'item-2-banana', 'order_id': 'order-2'}
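Conceptually, the global secondary index is a second copy of the rows, re-keyed by `item_id` instead of `order_id`, and with a keys-only projection it carries only the key attributes. The sketch below models both lookups with plain dictionaries, using a few rows shaped like the sample data; it illustrates the idea and is not how DynamoDB stores data internally.

```python
from collections import defaultdict

order_rows = [
    {"order_id": "order-1", "item_id": "item-1-apple", "item_unit_price": 0.8, "quantity": 3},
    {"order_id": "order-1", "item_id": "item-2-banana", "item_unit_price": 0.4, "quantity": 5},
    {"order_id": "order-2", "item_id": "item-2-banana", "item_unit_price": 0.4, "quantity": 8},
]

# Base table: rows grouped by order_id (the hash key)
base_table = defaultdict(list)
# Index: the same rows grouped by item_id, keeping only the key attributes
item_order_index = defaultdict(list)

for row in order_rows:
    base_table[row["order_id"]].append(row)
    item_order_index[row["item_id"]].append(
        {"item_id": row["item_id"], "order_id": row["order_id"]}
    )

# Order -> items: a lookup on the base table
items_in_order_1 = [row["item_id"] for row in base_table["order-1"]]
# Item -> orders: a lookup on the index, with no scan over all orders
orders_with_banana = [entry["order_id"] for entry in item_order_index["item-2-banana"]]

print(items_in_order_1)
print(orders_with_banana)
print(item_order_index["item-2-banana"][0])  # only the keys, no price or quantity
```

Because the index entries hold only keys, fetching the price or quantity of a matching order takes a follow-up read on the base table, which is the trade-off of a keys-only projection.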
Summary#
If you’re planning to build applications using Amazon DynamoDB, consider using the pynamodb library to simplify your development process and create more reliable and maintainable code.
pynamodb is a mature and actively maintained library that has been available for over a decade. Its long-standing presence in the DynamoDB ecosystem demonstrates its reliability and trustworthiness. You can confidently adopt pynamodb knowing that it has been extensively used and continually improved by the community.
One of the key benefits of using pynamodb is its ability to simplify the development of complex application code. By providing a higher-level abstraction over the DynamoDB API, pynamodb reduces the chances of introducing errors and bugs in your codebase. It offers a more intuitive and Pythonic interface, allowing you to focus on writing the core logic of your application rather than dealing with the intricacies of the low-level DynamoDB API.
Moreover, pynamodb promotes code succinctness and readability. It allows you to express your data models and interactions with DynamoDB using concise and expressive Python code. The library provides a declarative syntax for defining your data schemas, making it easy to understand the structure and relationships of your data. With pynamodb, your codebase becomes more human-friendly, enabling other developers (including your future self) to quickly grasp the purpose and functionality of your code.
The enhanced readability and maintainability offered by pynamodb are particularly valuable in collaborative development environments and long-term projects. As your application grows and evolves, having a clear and comprehensible codebase becomes increasingly important. pynamodb helps you achieve this by providing a consistent and intuitive way to interact with DynamoDB, reducing the cognitive load required to understand and modify your code.
In summary, if you want to develop DynamoDB applications with confidence, simplify your codebase, and create more maintainable and readable code, using the pynamodb library is a wise choice. Its long-standing history, active maintenance, and developer-friendly features make it an excellent tool for building robust and efficient DynamoDB-based applications.