Hosting a static website on Amazon S3#
Keywords: AWS, Amazon, S3, Host, Hosting
Summary#
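Using S3 as the backend for hosting a static website is convenient and inexpensive. A traditional static site needs a file server that stays online around the clock, and that server is itself a cost. With S3 you pay only for storage and traffic. S3 also integrates natively with other AWS services: for example, you can put a CDN in front of it or use Route 53 to assign a custom domain, which makes the setup both convenient and secure.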
Key steps for hosting a static website on S3#
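Create an S3 bucket, then in the bucket's Permissions tab set Block public access (bucket settings) to Off. This does not make your data public; it only turns off the Block Public Access safeguard. Note that besides the per-bucket setting, the account-wide S3 settings also contain Block Public Access settings that apply to every bucket. A well-secured AWS account usually keeps the account-level setting on, so all buckets reject public access by default. Because the account-level setting overrides the bucket-level one, to host a website you have to turn it off at the account level before turning it off on the specific bucket has any effect. The downside is that newly created buckets in that account are no longer covered by the account-level safeguard. For that reason I generally recommend dedicating a separate AWS account to public-facing workloads.
In the bucket's Properties tab, enable Static website hosting.
In the bucket's Permissions tab, edit the Bucket policy to define who can access the data. As noted above, turning off Block public access does not make your data public; the bucket policy is what actually makes the website public.
Two statements matter most here: an Allow statement that defines who can access the data, and a Deny statement that defines who cannot. AWS evaluates policies in the order explicit deny > explicit allow > default deny.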
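The precedence rule can be illustrated with a small sketch. It is a simplification that assumes a single policy and takes as input the effects of the statements that already matched the request; the function name `evaluate` and the return labels are made up for illustration and are not part of any AWS API.

```python
from typing import Iterable


def evaluate(matching_effects: Iterable[str]) -> str:
    """Combine the effects of all statements that matched a request.

    ``matching_effects`` holds "Allow" or "Deny" for every statement whose
    principal, action, resource, and condition matched the request.
    """
    effects = list(matching_effects)
    if "Deny" in effects:
        return "ExplicitDeny"  # an explicit deny always wins
    if "Allow" in effects:
        return "Allow"  # an explicit allow with no explicit deny
    return "ImplicitDeny"  # nothing matched, so the default deny applies


print(evaluate(["Allow"]))          # only the Allow statement matched
print(evaluate(["Allow", "Deny"]))  # both matched
print(evaluate([]))                 # no statement matched
```

This is why a bucket policy can combine a broad Allow with a conditional Deny: any request that satisfies the Deny condition is rejected regardless of the Allow.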
The Allow statement typically looks like this, letting anyone read the objects in the bucket:
{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::bucket-name/*"
}
The Deny statement can take many forms, but the usual goal is to deny everyone by default unless the request comes from a trusted network. The following example allows only certain IP ranges and denies everyone else:
{
    "Sid": "VpcSourceIp",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
        "arn:aws:s3:::bucket-name",
        "arn:aws:s3:::bucket-name/*"
    ],
    "Condition": {
        "NotIpAddress": {
            "aws:SourceIp": [
                "111.111.111.111/32"
            ]
        }
    }
}
The key part here is the Condition block. Here are a few more examples for reference.
Allow access only from certain VPC endpoints, identified by VPC endpoint ID:
"Condition": {
    "StringNotEquals": {
        "aws:SourceVpce": [
            "vpce-1111111",
            "vpce-2222222"
        ]
    }
},
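Allow access only from certain VPCs, identified by private IP ranges within the VPC CIDR block (the aws:VpcSourceIp key is only present when the request arrives through a VPC endpoint):
"Condition": {
    "NotIpAddress": {
        "aws:VpcSourceIp": [
            "10.1.1.1/32",
            "172.1.1.1/32"
        ]
    }
},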
Allow access only from certain IP addresses, identified by public IPv4 address:
"Condition": {
    "NotIpAddress": {
        "aws:SourceIp": [
            "11.11.11.11/32",
            "22.22.22.22/32"
        ]
    }
},
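To make the NotIpAddress logic concrete, here is a minimal local sketch using Python's standard `ipaddress` module. It only mimics the matching rule (the Deny applies when the source IP falls in none of the listed CIDR blocks); the helper name `deny_applies` is hypothetical, and real evaluation happens inside AWS.

```python
import ipaddress
from typing import List


def deny_applies(source_ip: str, allowed_cidrs: List[str]) -> bool:
    """Return True when a Deny guarded by NotIpAddress would apply."""
    ip = ipaddress.ip_address(source_ip)
    # NotIpAddress is true only if the IP is outside every listed block.
    in_allowed = any(
        ip in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs
    )
    return not in_allowed


allowed = ["11.11.11.11/32", "22.22.22.22/32"]
print(deny_applies("11.11.11.11", allowed))  # a trusted address
print(deny_applies("8.8.8.8", allowed))      # any other address
```

A /32 block matches exactly one address; widen the prefix (for example /24) to trust a whole range.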
Allow access only from certain AWS accounts, IAM users, or IAM roles:
# AROAEXAMPLEID is the role ID of an IAM role that you want to allow
# AIDAEXAMPLEID is the user ID of an IAM user that you want to allow
# 111122223333 is the AWS account ID of the bucket, which represents the credentials of the AWS account root user
"Condition": {
    "StringNotLike": {
        "aws:userId": [
            "AROAEXAMPLEID:*",
            "AIDAEXAMPLEID",
            "111122223333"
        ]
    }
},
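The wildcard in "AROAEXAMPLEID:*" is needed because an assumed-role session reports its user ID as the role ID followed by a colon and the session name. The sketch below approximates StringNotLike with `fnmatch.fnmatchcase` from the standard library. This is an assumption made for illustration: `fnmatch` also gives special meaning to square brackets, which IAM does not, and the helper name `deny_applies_to_user` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List


def deny_applies_to_user(user_id: str, allowed_patterns: List[str]) -> bool:
    """Return True when a Deny guarded by StringNotLike would apply."""
    # StringNotLike is true only if the value matches none of the patterns.
    return not any(fnmatchcase(user_id, pattern) for pattern in allowed_patterns)


patterns = ["AROAEXAMPLEID:*", "AIDAEXAMPLEID", "111122223333"]
print(deny_applies_to_user("AROAEXAMPLEID:my-session", patterns))  # allowed role session
print(deny_applies_to_user("AIDAEXAMPLEID", patterns))             # allowed IAM user
print(deny_applies_to_user("AIDAOTHERUSER", patterns))             # any other identity
```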
The bucket policy I use most often allows access only from trusted IP addresses: my home IP address for personal projects, or the company VPN's IP address at work. CORS stays off, because I usually don't use a custom domain. My policy looks like this:
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::${bucket_name}/*"
        },
        {
            "Sid": "VpcSourceIp",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::${bucket_name}",
                "arn:aws:s3:::${bucket_name}/*"
            ],
            "Condition": {
                "NotIpAddress": {
                    "aws:SourceIp": "${trusted_ip_address}/32"
                }
            }
        }
    ]
}
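The `${bucket_name}` and `${trusted_ip_address}` placeholders happen to use the same syntax as Python's `string.Template`, so one possible way to render the policy is sketched below. The function name `render_policy` and the example values (`example-bucket`, `192.0.2.10`) are assumptions for illustration only.

```python
import json
from string import Template

POLICY_TEMPLATE = Template("""
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::${bucket_name}/*"
        },
        {
            "Sid": "VpcSourceIp",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [
                "arn:aws:s3:::${bucket_name}",
                "arn:aws:s3:::${bucket_name}/*"
            ],
            "Condition": {
                "NotIpAddress": {
                    "aws:SourceIp": "${trusted_ip_address}/32"
                }
            }
        }
    ]
}
""")


def render_policy(bucket_name: str, trusted_ip_address: str) -> dict:
    """Fill in the placeholders and parse the result to confirm it is valid JSON."""
    text = POLICY_TEMPLATE.substitute(
        bucket_name=bucket_name,
        trusted_ip_address=trusted_ip_address,
    )
    return json.loads(text)


policy = render_policy("example-bucket", "192.0.2.10")
print(json.dumps(policy, indent=4))
```

The parsed dict can be serialized again with `json.dumps` and pasted into the console's bucket policy editor.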
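At this point you can access your static website. S3 objects map to URLs as follows:
s3://${bucket}/${key} -> https://${bucket}.s3.amazonaws.com/${key}
Note that this is the bucket's REST endpoint. The dedicated website endpoint, which serves the index document, looks like http://example-bucket.s3-website-us-west-1.amazonaws.com; the exact URL for your bucket is shown under Static website hosting in the bucket's Properties tab.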
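The s3:// to https:// mapping can be written as a small helper. This is a sketch under the assumption that the REST-style URL above is the one you want; the function name `s3_uri_to_url` is hypothetical, and keys are percent-encoded with `urllib.parse.quote` so that spaces and similar characters stay valid in a URL.

```python
from urllib.parse import quote


def s3_uri_to_url(s3_uri: str) -> str:
    """Convert s3://bucket/key into https://bucket.s3.amazonaws.com/key."""
    prefix = "s3://"
    if not s3_uri.startswith(prefix):
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    # Everything up to the first slash is the bucket, the rest is the key.
    bucket, _, key = s3_uri[len(prefix):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name: {s3_uri!r}")
    return f"https://{bucket}.s3.amazonaws.com/{quote(key)}"


print(s3_uri_to_url("s3://example-bucket/docs/index.html"))
```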
Reference:
Tutorial: Configuring a static website on Amazon S3: https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html
How can I restrict access to my Amazon S3 bucket using specific VPC endpoints or IP addresses?: https://repost.aws/knowledge-center/block-s3-traffic-vpc-ip
Automation Script#
The script below makes it easy to configure an S3 bucket for static website hosting.
# -*- coding: utf-8 -*-

import typing as T
import json
from urllib import request

import botocore.exceptions

if T.TYPE_CHECKING:
    from mypy_boto3_s3 import S3Client

checkip_url = "https://checkip.amazonaws.com"


def get_public_ip() -> str:
    """Return the public IPv4 address of the machine running this script."""
    with request.urlopen(checkip_url) as response:
        return response.read().decode("utf-8").strip()


def get_bucket_website(
    s3_client: "S3Client",
    bucket: str,
) -> T.Optional[dict]:
    """Return the website configuration, or None if hosting is not enabled."""
    try:
        return s3_client.get_bucket_website(Bucket=bucket)
    except botocore.exceptions.ClientError as e:
        if e.response["Error"]["Code"] == "NoSuchWebsiteConfiguration":
            return None
        else:
            raise e


def enable_bucket_static_website_hosting(
    s3_client: "S3Client",
    bucket: str,
    index_document: str = "index.html",
    error_document: T.Optional[str] = None,
) -> dict:
    """
    Reference:

    - Enable static website hosting: https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html#step2-create-bucket-config-as-website
    """
    website_configuration = dict(
        IndexDocument=dict(Suffix=index_document),
    )
    if error_document is not None:
        website_configuration["ErrorDocument"] = dict(Key=error_document)
    # pass the full configuration so that the error document is not dropped
    return s3_client.put_bucket_website(
        Bucket=bucket,
        WebsiteConfiguration=website_configuration,
    )


def turn_off_block_public_access(
    s3_client: "S3Client",
    bucket: str,
):
    """
    Reference:

    - Edit Block Public Access settings: https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html#step3-edit-block-public-access
    """

    return s3_client.put_public_access_block(
        Bucket=bucket,
        PublicAccessBlockConfiguration={
            "BlockPublicAcls": False,
            "IgnorePublicAcls": False,
            "BlockPublicPolicy": False,
            "RestrictPublicBuckets": False,
        },
    )


def put_bucket_policy_for_public_website_hosting(
    s3_client: "S3Client",
    bucket: str,
    s3_key_prefix_list: T.Optional[T.List[str]] = None,
):
    """Put a policy that lets anyone read the bucket (or the given prefixes)."""
    if s3_key_prefix_list is None:
        allow_resource = f"arn:aws:s3:::{bucket}/*"
    else:
        allow_resource = [
            f"arn:aws:s3:::{bucket}/{prefix}*" for prefix in s3_key_prefix_list
        ]
    allow_statement = {
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": allow_resource,
    }
    bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [
            allow_statement,
        ],
    }
    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(bucket_policy))


def put_bucket_policy_for_website_hosting(
    s3_client: "S3Client",
    bucket: str,
    s3_key_prefix_list: T.Optional[T.List[str]] = None,
    is_public: bool = False,
    allowed_ip_cidr_block_list: T.Optional[T.List[str]] = None,
    allowed_vpc_endpoint_list: T.Optional[T.List[str]] = None,
    allowed_vpc_ip_cidr_block_list: T.Optional[T.List[str]] = None,
    allowed_aws_account_id_list: T.Optional[T.List[str]] = None,
    allowed_iam_user_id_list: T.Optional[T.List[str]] = None,
    allowed_iam_role_id_list: T.Optional[T.List[str]] = None,
):
    """
    Reference:

    - Add a bucket policy that makes your bucket content publicly available: https://docs.aws.amazon.com/AmazonS3/latest/userguide/HostingWebsiteOnS3Setup.html#step4-add-bucket-policy-make-content-public
    - How can I restrict access to my Amazon S3 bucket using specific VPC endpoints or IP addresses?: https://repost.aws/knowledge-center/block-s3-traffic-vpc-ip

    :param s3_client:
    :param bucket:
    :param s3_key_prefix_list: the s3 key prefixes that are allowed
        to be accessed. if not provided, all s3 objects in the bucket are allowed
    :param is_public: if True, then the bucket will be public.
        either set is_public to True, or specify some of the ``allowed_xyz``
        parameters; you cannot do both
    :param allowed_ip_cidr_block_list:
    :param allowed_vpc_ip_cidr_block_list:
    :param allowed_vpc_endpoint_list:
    :param allowed_aws_account_id_list:
    :param allowed_iam_user_id_list:
    :param allowed_iam_role_id_list:
    """
    if is_public is True:
        # all of them have to be None
        if (
            sum(
                [
                    allowed_ip_cidr_block_list is not None,
                    allowed_vpc_ip_cidr_block_list is not None,
                    allowed_vpc_endpoint_list is not None,
                    allowed_aws_account_id_list is not None,
                    allowed_iam_user_id_list is not None,
                    allowed_iam_role_id_list is not None,
                ]
            )
            > 0
        ):
            raise ValueError(
                "you set 'is_public' to True, but you also specified some of the "
                "allowed_xyz parameters, you cannot do both!"
            )
        return put_bucket_policy_for_public_website_hosting(
            s3_client=s3_client,
            bucket=bucket,
            s3_key_prefix_list=s3_key_prefix_list,
        )

    if s3_key_prefix_list is None:
        allow_resource = f"arn:aws:s3:::{bucket}/*"
        deny_resource = [
            f"arn:aws:s3:::{bucket}",
            f"arn:aws:s3:::{bucket}/*",
        ]
    else:
        allow_resource = [
            f"arn:aws:s3:::{bucket}/{prefix}*" for prefix in s3_key_prefix_list
        ]
        deny_resource = [f"arn:aws:s3:::{bucket}"]
        deny_resource.extend(
            [f"arn:aws:s3:::{bucket}/{prefix}*" for prefix in s3_key_prefix_list]
        )

    allow_statement = {
        "Sid": "PublicReadGetObject",
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": allow_resource,
    }

    # TODO: test the logic operator if there's multiple conditions
    condition = {}
    not_ip_address = {}
    string_not_equal = {}
    string_not_like = {}
    if allowed_ip_cidr_block_list is not None:
        not_ip_address["aws:SourceIp"] = allowed_ip_cidr_block_list

    if allowed_vpc_ip_cidr_block_list is not None:
        not_ip_address["aws:VpcSourceIp"] = allowed_vpc_ip_cidr_block_list

    if allowed_vpc_endpoint_list is not None:
        string_not_equal["aws:SourceVpce"] = allowed_vpc_endpoint_list

    user_id_list = []
    if allowed_aws_account_id_list is not None:
        user_id_list.extend(allowed_aws_account_id_list)
    if allowed_iam_user_id_list is not None:
        user_id_list.extend(allowed_iam_user_id_list)
    if allowed_iam_role_id_list is not None:
        # an assumed-role session has the user id "ROLE_ID:session-name",
        # hence the trailing wildcard
        user_id_list.extend([f"{role_id}*" for role_id in allowed_iam_role_id_list])
    if user_id_list:
        string_not_like["aws:userId"] = user_id_list

    if not_ip_address:
        condition["NotIpAddress"] = not_ip_address
    if string_not_equal:
        condition["StringNotEquals"] = string_not_equal
    if string_not_like:
        condition["StringNotLike"] = string_not_like

    if condition:
        deny_statement = {
            "Sid": "DenyAllExceptListedBelow",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": deny_resource,
            "Condition": condition,
        }
    else:
        raise ValueError(
            "you set 'is_public' to False, but none of allowed_xyz condition is specified!"
        )

    bucket_policy = {
        "Version": "2012-10-17",
        "Statement": [
            allow_statement,
            deny_statement,
        ],
    }

    s3_client.put_bucket_policy(Bucket=bucket, Policy=json.dumps(bucket_policy))


if __name__ == "__main__":
    from boto_session_manager import BotoSesManager
    from rich import print as rprint


    def print_res(res: dict):
        if "ResponseMetadata" in res:
            del res["ResponseMetadata"]
        rprint(res)


    bsm = BotoSesManager(profile_name="bmt_app_devops_us_east_1")
    bucket = "bmt-app-devops-us-east-1-doc-host"

    s3_client = bsm.s3_client

    # enable website hosting only if it is not configured yet
    website_config = get_bucket_website(s3_client, bucket)
    if website_config is None:
        enable_bucket_static_website_hosting(s3_client, bucket)

    turn_off_block_public_access(s3_client, bucket)

    trusted_ip_address = get_public_ip()

    # allow only the current public IP and the current IAM identity
    put_bucket_policy_for_website_hosting(
        s3_client=s3_client,
        bucket=bucket,
        is_public=False,
        allowed_ip_cidr_block_list=[
            f"{trusted_ip_address}/32",
        ],
        allowed_iam_user_id_list=[
            bsm.aws_account_user_id,
        ],
    )
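What to do when an S3 bucket policy denies even the admin#
Sometimes, by mistake, you put a Deny-all rule in an S3 bucket policy. The result is that even your admin is denied and can no longer change the bucket policy back. At that point the only way out is to sign in as the account root user, that is, with the account's email and password, and then delete the bucket policy from the S3 bucket.
If you created the account directly with your email address, this is straightforward. If the account was created through AWS Organizations, you need the email alias that was used when the account was created, typically the kind with a + sign. For example, if your Org root email is alice@gmail.com, the email of your dev account would be something like alice+dev@gmail.com. Choose root user sign-in with alice+dev@gmail.com and go through the password recovery flow. AWS sends the reset email to alice+dev@gmail.com, which lands in the alice@gmail.com inbox; you can then set a password, sign in as root, and delete the bucket policy.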
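Key steps for using your own domain#
If you want to use your own domain (http://www.my-website.com) for the static website on S3 (instead of the default http://example-bucket.s3-website-us-west-1.amazonaws.com), enabling CORS is a key step. CORS (Cross-Origin Resource Sharing) is an HTTP-header-based mechanism that lets a page served from one origin read resources on another origin. Without a custom domain, the page and the resources it requests are both served from the same AWS domain, so the requests are same-origin and you do not need CORS. With a custom domain, the page is served from your domain, so any request it sends to the S3 endpoint counts as cross-origin, and S3 needs a CORS configuration that allows requests from your domain.
For example, you configure the S3 bucket to allow read operations for traffic whose origin is http://my-website.com. Then, with your domain registrar or DNS provider, you map my-website.com to http://example-bucket.s3-website-us-west-1.amazonaws.com. HTTP requests from pages on my-website.com to S3 then carry the header Origin: http://my-website.com, and S3 allows them and returns the HTML.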
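A CORS rule of this kind could be built and applied roughly as follows. This is a sketch, not a definitive setup: the helper names `build_cors_configuration` and `put_cors` are made up, the allowed methods, headers, and `MaxAgeSeconds` value are assumptions you should adapt, and `s3_client` is expected to be a boto3 S3 client created elsewhere.

```python
from typing import Any, Dict, List


def build_cors_configuration(allowed_origins: List[str]) -> Dict[str, Any]:
    """Build a CORS configuration that allows read-only requests."""
    return {
        "CORSRules": [
            {
                # origins include the scheme, e.g. "http://my-website.com"
                "AllowedOrigins": allowed_origins,
                # read-only access is enough for a static website
                "AllowedMethods": ["GET", "HEAD"],
                "AllowedHeaders": ["*"],
                # how long browsers may cache the preflight response
                "MaxAgeSeconds": 3000,
            }
        ]
    }


def put_cors(s3_client, bucket: str, allowed_origins: List[str]) -> None:
    """Apply the configuration using the boto3 ``put_bucket_cors`` call."""
    s3_client.put_bucket_cors(
        Bucket=bucket,
        CORSConfiguration=build_cors_configuration(allowed_origins),
    )


print(build_cors_configuration(["http://my-website.com"]))
```

Keeping the builder separate from the API call makes it possible to inspect or test the configuration without AWS credentials.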
Reference:
Tutorial: Configuring a static website using a custom domain registered with Route 53: https://docs.aws.amazon.com/AmazonS3/latest/userguide/website-hosting-custom-domain-walkthrough.html
Using cross-origin resource sharing (CORS): https://docs.aws.amazon.com/AmazonS3/latest/userguide/cors.html
Enabling CORS for a REST API resource: https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-cors.html
Amazon S3 – Cross Origin Resource Sharing Support: https://aws.amazon.com/blogs/aws/amazon-s3-cross-origin-resource-sharing/