Hosting a static website on Amazon S3#

Keywords: AWS, Amazon, S3, Host, Hosting

Summary#

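Using S3 as the backend for a static website is convenient and cheap. A traditional static site needs a file server that stays online all the time, and that server is itself a cost. With S3 you pay only for storage and traffic. S3 also integrates natively with the rest of AWS: you can put a CDN in front of it, point a custom domain at it with Route 53, and so on, which makes it both convenient and secure.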

Key steps for hosting a static website on S3#

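  1. Create an S3 bucket, then in the bucket's Permissions menu set Block public access (bucket settings) to Off. This does not make your data public; it only turns off the block-public-access guard. Note that besides the per-bucket setting, the account-level S3 settings also contain a Block public access setting that applies to every bucket. A well-secured AWS account usually keeps that setting on, so all buckets reject public access by default. To host a website, you have to turn the setting off at the account level before turning it off on a specific bucket has any effect. The downside is that newly created buckets are then no longer protected by the account-level default. For that reason I generally recommend a dedicated AWS account for anything public facing.

  2. In the bucket's Properties menu, turn on Static website hosting.

  3. In the bucket's Permissions menu, edit the Bucket policy to define who can access the data. As noted above, turning off Block public access does not make your data public; the bucket policy is what actually makes your website public.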

Two statements matter most here: an Allow statement that defines who can access the data, and a Deny statement that defines who cannot. AWS evaluates them in the order explicit deny > explicit allow > default deny.
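
The precedence rule can be illustrated with a tiny evaluator. This is a simplified sketch, not how AWS implements policy evaluation: the function name `evaluate` is hypothetical, and it assumes you already know the effects of all statements that match a request.

```python
import typing as T


def evaluate(matching_effects: T.Iterable[str]) -> str:
    """
    Decide the outcome from the effects of all statements matching a request.

    Simplified model: explicit deny > explicit allow > default deny.
    """
    effects = list(matching_effects)
    if "Deny" in effects:
        return "Deny"  # any explicit deny wins
    if "Allow" in effects:
        return "Allow"  # allowed only when nothing denies
    return "Deny"  # no statement matched: default deny


print(evaluate(["Allow", "Deny"]))  # Deny
print(evaluate(["Allow"]))  # Allow
print(evaluate([]))  # Deny
```

This is why a public Allow statement can be combined with a conditional Deny: the Deny overrides the Allow whenever its condition holds.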

The Allow statement usually looks like this, letting everyone read the objects:

{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::bucket-name/*"
}

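The Deny statement can take many forms, but the usual goal is to deny everyone by default unless the request comes from a trusted network. The example below allows only certain IP ranges and denies everyone else: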

{
    "Sid": "VpcSourceIp",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
    ],
    "Condition": {
        "NotIpAddress": {
            "aws:SourceIp": [
                "111.111.111.111/32"
            ]
        }
    }
}

The key part is the Condition. Here are a few more examples for reference.

Allow access only through certain VPC endpoints, identified by VPC endpoint ID:

"Condition": {
    "StringNotEquals": {
        "aws:SourceVpce": [
            "vpce-1111111",
            "vpce-2222222"
        ]
    }
},

Allow access only from certain VPCs, identified by VPC CIDR block (private IP addresses):

"Condition": {
    "NotIpAddress": {
        "aws:VpcSourceIp": [
            "10.1.1.1/32",
            "172.1.1.1/32"
        ]
    }
},

Allow access only from certain IPs, identified by public IPv4 address:

"Condition": {
    "NotIpAddress": {
        "aws:SourceIp": [
            "11.11.11.11/32",
            "22.22.22.22/32"
        ]
    }
},
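
To make the negated operator concrete, here is a minimal sketch of how a `NotIpAddress` check behaves. It only models the condition with Python's `ipaddress` module; the function name `not_ip_address` is hypothetical and this is not AWS code.

```python
import ipaddress
import typing as T


def not_ip_address(source_ip: str, cidr_list: T.List[str]) -> bool:
    """Return True when source_ip falls outside every CIDR block in cidr_list."""
    ip = ipaddress.ip_address(source_ip)
    return not any(ip in ipaddress.ip_network(cidr) for cidr in cidr_list)


allowed = ["11.11.11.11/32", "22.22.22.22/32"]

# The Deny statement applies only when the condition evaluates to True.
print(not_ip_address("11.11.11.11", allowed))  # listed IP: condition is False, no deny
print(not_ip_address("33.33.33.33", allowed))  # unlisted IP: condition is True, denied
```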

Allow access only from certain AWS accounts, IAM users, or IAM roles:

# AROAEXAMPLEID is the role ID of an IAM role that you want to allow
# AIDAEXAMPLEID is the user ID of an IAM user that you want to allow
# 111122223333 is the AWS account ID of the bucket, which represents the credentials of the AWS account root user

"Condition": {
    "StringNotLike": {
        "aws:userId": [
            "AROAEXAMPLEID:*",
            "AIDAEXAMPLEID",
            "111122223333"
        ]
    }
},
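
The `:*` suffix on the role ID is there because a session of an assumed role has a user ID of the form role-id:session-name. The sketch below approximates `StringNotLike` with Python's `fnmatch`; the function name and the sample user IDs are hypothetical, and `fnmatch` is only a stand-in for IAM's `*` and `?` wildcards.

```python
import fnmatch
import typing as T


def string_not_like(value: str, patterns: T.List[str]) -> bool:
    """Return True when value matches none of the wildcard patterns."""
    return not any(fnmatch.fnmatchcase(value, pattern) for pattern in patterns)


patterns = ["AROAEXAMPLEID:*", "AIDAEXAMPLEID", "111122223333"]

# An assumed-role session matches the "AROAEXAMPLEID:*" pattern, so no deny.
print(string_not_like("AROAEXAMPLEID:my-session", patterns))
# An unlisted user ID matches nothing, so the Deny condition holds.
print(string_not_like("AIDAOTHERUSERID", patterns))
```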

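The bucket policy I use most often allows access only from trusted IP addresses: my home IP address for personal use, or the company VPN IP address at work. CORS stays off, because I usually don't use a custom domain. My policy looks like this: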

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::${bucket_name}/*"
        },
        {
            "Sid": "VpcSourceIp",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::${bucket_name}",
                "arn:aws:s3:::${bucket_name}/*"
            ],
            "Condition": {
                "NotIpAddress": {
                    "aws:SourceIp": "${trusted_ip_address}/32"
                }
            }
        }
    ]
}
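
The `${bucket_name}` and `${trusted_ip_address}` placeholders happen to match the syntax of Python's `string.Template`, so one way to fill them in is sketched below. The function name `render_policy` and the sample values are hypothetical; the policy text is the one shown above.

```python
import json
from string import Template

POLICY_TEMPLATE = Template(
    """
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::${bucket_name}/*"
        },
        {
            "Sid": "VpcSourceIp",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::${bucket_name}",
                "arn:aws:s3:::${bucket_name}/*"
            ],
            "Condition": {
                "NotIpAddress": {
                    "aws:SourceIp": "${trusted_ip_address}/32"
                }
            }
        }
    ]
}
"""
)


def render_policy(bucket_name: str, trusted_ip_address: str) -> dict:
    """Fill in the placeholders and parse the result to confirm it is valid JSON."""
    text = POLICY_TEMPLATE.substitute(
        bucket_name=bucket_name,
        trusted_ip_address=trusted_ip_address,
    )
    return json.loads(text)


policy = render_policy("my-bucket", "111.111.111.111")
print(json.dumps(policy, indent=4))
```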

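At this point you can access your static website. S3 objects map to URLs as follows: s3://${bucket}/${key} -> https://${bucket}.s3.amazonaws.com/${key}. Note that this is the bucket's REST endpoint. The website endpoint turned on in step 2, which is the one that serves the index document, has the form http://${bucket}.s3-website-${region}.amazonaws.com/${key} (some regions use s3-website.${region} instead) and supports HTTP only.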
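
The object-to-URL mapping can be written as a small helper. This is a minimal sketch: the function name `s3_object_url` is hypothetical, and percent-encoding the key with `urllib.parse.quote` is my addition so that keys containing spaces still produce a valid URL.

```python
from urllib.parse import quote


def s3_object_url(bucket: str, key: str) -> str:
    """Map s3://{bucket}/{key} to https://{bucket}.s3.amazonaws.com/{key}."""
    # quote() keeps "/" as-is by default and escapes characters such as spaces
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


print(s3_object_url("my-bucket", "docs/index.html"))
```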

Automation Script#

The script below makes it easy to put an S3 bucket into a state where it can host a static website.

# -*- coding: utf-8 -*-

import typing as T
import json
from urllib import request

import botocore.exceptions

if T.TYPE_CHECKING:
    from mypy_boto3_s3 import S3Client

checkip_url = "https://checkip.amazonaws.com"


def get_public_ip() -> str:
    """Return the public IP address of the current machine."""
    with request.urlopen(checkip_url) as response:
        return response.read().decode("utf-8").strip()


def get_bucket_website(
    s3_client: "S3Client",
    bucket: str,
) -> T.Optional[dict]:
    """Return the website configuration, or None if hosting is not enabled."""
    try:
        return s3_client.get_bucket_website(Bucket=bucket)
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchWebsiteConfiguration":
            return None
        else:
            raise


def enable_bucket_static_website_hosting(
    s3_client: "S3Client",
    bucket: str,
    index_document: str = "index.html",
    error_document: T.Optional[str] = None,
) -> dict:
    """
    Reference:

    - Enable static website hosting: https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html#step2-create-bucket-config-as-website
    """
    website_configuration = dict(
        IndexDocument=dict(Suffix=index_document),
    )
    if error_document is not None:
        website_configuration["ErrorDocument"] = dict(Key=error_document)
    # pass the full configuration so that the error document is not dropped
    return s3_client.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration=website_configuration,
    )


def turn_off_block_public_access(
    s3_client: "S3Client",
    bucket: str,
):
    """
    Reference:

    - Edit Block Public Access settings: https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html#step3-edit-block-public-access
    """
    return s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": False,
            "IgnorePublicAcls": False,
            "BlockPublicPolicy": False,
            "RestrictPublicBuckets": False,
        },
    )


def put_bucket_policy_for_public_website_hosting(
    s3_client: "S3Client",
    bucket: str,
    s3_key_prefix_list: T.Optional[T.List[str]] = None,
):
    """Put a bucket policy that lets everyone read the (prefixed) objects."""
    if s3_key_prefix_list is None:
        allow_resource = f"arn:aws:s3:::{bucket}/*"
    else:
        allow_resource = [
            f"arn:aws:s3:::{bucket}/{prefix}*" for prefix in s3_key_prefix_list
        ]
    allow_statement = {
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": allow_resource,
    }
    bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [
            allow_statement,
        ],
    }
    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(bucket_policy))


def put_bucket_policy_for_website_hosting(
    s3_client: "S3Client",
    bucket: str,
    s3_key_prefix_list: T.Optional[T.List[str]] = None,
    is_public: bool = False,
    allowed_ip_cidr_block_list: T.Optional[T.List[str]] = None,
    allowed_vpc_endpoint_list: T.Optional[T.List[str]] = None,
    allowed_vpc_ip_cidr_block_list: T.Optional[T.List[str]] = None,
    allowed_aws_account_id_list: T.Optional[T.List[str]] = None,
    allowed_iam_user_id_list: T.Optional[T.List[str]] = None,
    allowed_iam_role_id_list: T.Optional[T.List[str]] = None,
):
    """
    Reference:

    - Add a bucket policy that makes your bucket content publicly available: https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html#step4-add-bucket-policy-make-content-public
    - How can I restrict access to my Amazon S3 bucket using specific VPC endpoints or IP addresses?: https://repost.aws/knowledge-center/block-s3-traffic-vpc-ip

    :param s3_client:
    :param bucket:
    :param s3_key_prefix_list: the s3 key prefixes that are allowed
        to be accessed. if not provided, all s3 objects in the bucket are allowed
    :param is_public: if True, then the bucket will be public.
        either set is_public to True, or specify at least one of the
        ``allowed_xyz`` parameters; you cannot do both
    :param allowed_ip_cidr_block_list:
    :param allowed_vpc_ip_cidr_block_list:
    :param allowed_vpc_endpoint_list:
    :param allowed_aws_account_id_list:
    :param allowed_iam_user_id_list:
    :param allowed_iam_role_id_list:
    """
    if is_public is True:
        # all of the allowed_xyz parameters have to be None
        if (
            sum(
                [
                    allowed_ip_cidr_block_list is not None,
                    allowed_vpc_ip_cidr_block_list is not None,
                    allowed_vpc_endpoint_list is not None,
                    allowed_aws_account_id_list is not None,
                    allowed_iam_user_id_list is not None,
                    allowed_iam_role_id_list is not None,
                ]
            )
            > 0
        ):
            raise ValueError(
                "you set 'is_public' to True, but you also specified some of the "
                "allowed_xyz parameters, you cannot do both!"
            )
        return put_bucket_policy_for_public_website_hosting(
            s3_client=s3_client,
            bucket=bucket,
            s3_key_prefix_list=s3_key_prefix_list,
        )

    if s3_key_prefix_list is None:
        allow_resource = f"arn:aws:s3:::{bucket}/*"
        deny_resource = [
            f"arn:aws:s3:::{bucket}",
            f"arn:aws:s3:::{bucket}/*",
        ]
    else:
        allow_resource = [
            f"arn:aws:s3:::{bucket}/{prefix}*" for prefix in s3_key_prefix_list
        ]
        deny_resource = [f"arn:aws:s3:::{bucket}"]
        deny_resource.extend(
            [f"arn:aws:s3:::{bucket}/{prefix}*" for prefix in s3_key_prefix_list]
        )

    allow_statement = {
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": allow_resource,
    }

    # NOTE: condition operators and keys in one statement are combined with
    # logical AND, so the Deny only applies to a request that fails every
    # check below. A request matching any one allowed source is not denied.
    condition = {}
    not_ip_address = {}
    string_not_equal = {}
    string_not_like = {}
    if allowed_ip_cidr_block_list is not None:
        not_ip_address["aws:SourceIp"] = allowed_ip_cidr_block_list

    if allowed_vpc_ip_cidr_block_list is not None:
        not_ip_address["aws:VpcSourceIp"] = allowed_vpc_ip_cidr_block_list

    if allowed_vpc_endpoint_list is not None:
        string_not_equal["aws:SourceVpce"] = allowed_vpc_endpoint_list

    user_id_list = []
    if allowed_aws_account_id_list is not None:
        user_id_list.extend(allowed_aws_account_id_list)
    if allowed_iam_user_id_list is not None:
        user_id_list.extend(allowed_iam_user_id_list)
    if allowed_iam_role_id_list is not None:
        # an assumed role session has a user id like "AROAEXAMPLEID:session-name"
        user_id_list.extend([f"{role_id}:*" for role_id in allowed_iam_role_id_list])
    if user_id_list:
        string_not_like["aws:userId"] = user_id_list

    if not_ip_address:
        condition["NotIpAddress"] = not_ip_address
    if string_not_equal:
        condition["StringNotEquals"] = string_not_equal
    if string_not_like:
        condition["StringNotLike"] = string_not_like

    if condition:
        deny_statement = {
            "Sid": "DenyAllExceptListedBelow",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": deny_resource,
            "Condition": condition,
        }
    else:
        raise ValueError(
            "you set 'is_public' to False, but none of allowed_xyz condition is specified!"
        )

    bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [
            allow_statement,
            deny_statement,
        ],
    }

    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(bucket_policy))


if __name__ == "__main__":
    from boto_session_manager import BotoSesManager
    from rich import print as rprint

    def print_res(res: dict):
        if "ResponseMetadata" in res:
            del res["ResponseMetadata"]
        rprint(res)

    bsm = BotoSesManager(profile_name="bmt_app_devops_us_east_1")
    bucket = "bmt-app-devops-us-east-1-doc-host"

    s3_client = bsm.s3_client

    website_config = get_bucket_website(s3_client, bucket)
    if website_config is None:
        enable_bucket_static_website_hosting(s3_client, bucket)

    turn_off_block_public_access(s3_client, bucket)

    trusted_ip_address = get_public_ip()

    put_bucket_policy_for_website_hosting(
        s3_client=s3_client,
        bucket=bucket,
        is_public=False,
        allowed_ip_cidr_block_list=[
            f"{trusted_ip_address}/32",
        ],
        allowed_iam_user_id_list=[
            bsm.aws_account_user_id,
        ],
    )
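
The script above can put several condition operators into a single Deny statement, for example `NotIpAddress` together with `StringNotLike`. In IAM policy evaluation, multiple operators in one Condition block are combined with logical AND, so the Deny applies only when every negated check holds. The sketch below models that behavior in plain Python; the function name `deny_applies` and the sample IDs are hypothetical, and `fnmatch` only approximates IAM wildcards.

```python
import fnmatch
import ipaddress
import typing as T


def deny_applies(
    source_ip: str,
    user_id: str,
    allowed_cidr_list: T.List[str],
    allowed_user_id_patterns: T.List[str],
) -> bool:
    """Return True when a request would hit the Deny statement."""
    ip = ipaddress.ip_address(source_ip)
    ip_not_listed = not any(
        ip in ipaddress.ip_network(cidr) for cidr in allowed_cidr_list
    )
    user_not_listed = not any(
        fnmatch.fnmatchcase(user_id, pattern) for pattern in allowed_user_id_patterns
    )
    # operators in the same Condition block are ANDed together
    return ip_not_listed and user_not_listed


cidrs = ["111.111.111.111/32"]
users = ["AIDAEXAMPLEID"]

print(deny_applies("111.111.111.111", "AIDAOTHERUSERID", cidrs, users))  # trusted IP
print(deny_applies("33.33.33.33", "AIDAEXAMPLEID", cidrs, users))  # trusted user
print(deny_applies("33.33.33.33", "AIDAOTHERUSERID", cidrs, users))  # neither
```

In other words, a caller is let through if it comes from a trusted IP or is a trusted identity; only a caller that is neither gets denied.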

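What to do when an S3 policy denies even the Admin#

Sometimes, by mistake, you give an S3 bucket policy a Deny All rule. That locks out even your Admin, who can then no longer change the bucket policy back. The way out is to sign in as the account's root user, that is, with email and password, then go to the S3 bucket and delete the bucket policy.

If you created the account yourself with your own email, this is straightforward. If the account was created through AWS Organizations, you need the email alias that was used when the account was created, the kind with a + sign. For example, if your Org root email is alice@gmail.com, your dev account's email would be alice+dev@gmail.com. Choose root user sign-in with alice+dev@gmail.com and go through the password recovery flow. AWS sends an email that arrives in your alice@gmail.com inbox; you can then set a password, sign in as root, and delete the bucket policy.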

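Key steps for using your own domain#

If you want to use your own domain (http://www.my-website.com) for the static website on S3 in place of the default endpoint (http://example-bucket.s3-website-us-west-1.amazonaws.com), enabling CORS is a key step. CORS (Cross-Origin Resource Sharing) is the part of the HTTP protocol that lets a page from one origin read resources from another origin. If you have not changed the domain, the page and the resources it loads both come from the same AWS endpoint, so they share one origin and you don't need CORS. Once you use your own domain, the page's origin is your domain, while requests it makes to the S3 endpoint go to a different origin, so S3 needs a CORS configuration that allows requests from your domain.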

For example, you configure the S3 bucket to allow read operations when the request's origin is http://my-website.com. Then, at your domain registrar, you map http://my-website.com to http://example-bucket.s3-website-us-west-1.amazonaws.com. HTTP requests from your my-website.com pages to S3 then carry the header Origin: http://my-website.com, and S3 allows them and returns the HTML.
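
A rule like the one described above could be applied with boto3's `put_bucket_cors`. This is a minimal sketch under stated assumptions: the helper names are hypothetical, the allowed methods, allowed headers, and `MaxAgeSeconds` value are illustrative choices, and `s3_client` is assumed to be a boto3 S3 client you created yourself.

```python
import typing as T


def build_cors_configuration(allowed_origins: T.List[str]) -> dict:
    """Build a CORS configuration that allows read-only requests from the origins."""
    return {
        "CORSRules": [
            {
                "AllowedOrigins": allowed_origins,
                "AllowedMethods": ["GET", "HEAD"],  # read-only access
                "AllowedHeaders": ["*"],
                "MaxAgeSeconds": 3000,  # how long browsers may cache the preflight
            }
        ]
    }


def put_bucket_cors_for_custom_domain(
    s3_client,
    bucket: str,
    allowed_origins: T.List[str],
) -> dict:
    """Apply the CORS configuration to the bucket."""
    return s3_client.put_bucket_cors(
        Bucket=bucket,
        CORSConfiguration=build_cors_configuration(allowed_origins),
    )
```

A call would then look like `put_bucket_cors_for_custom_domain(s3_client, "example-bucket", ["http://my-website.com"])`.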
