AWS Onboarding (and Offboarding)#
The Hubverse team currently provides cloud hosting for hubs. A “cloud-enabled” hub is one that mirrors its data and configuration to an Amazon Web Services (AWS) S3 bucket. By default, the following hub directories are synced to AWS in near real time:
auxiliary-data
hub-config
model-abstracts
model-metadata
model-output
target-data
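To make the default sync scope concrete, the sketch below filters a list of repository-relative paths down to those that fall under one of the directories listed above. It is an illustration only, not the logic of the Hubverse upload workflow; the function name `paths_to_sync` and the sample paths are hypothetical.

```python
from pathlib import PurePosixPath

# Directories mirrored to S3 by default, as listed above.
SYNCED_DIRECTORIES = (
    "auxiliary-data",
    "hub-config",
    "model-abstracts",
    "model-metadata",
    "model-output",
    "target-data",
)


def paths_to_sync(changed_paths):
    """Return the paths whose top-level directory is mirrored by default."""
    selected = []
    for path in changed_paths:
        parts = PurePosixPath(path).parts
        # A file at the repository root (e.g. README.md) has a single part
        # and is therefore never inside a synced directory.
        if len(parts) > 1 and parts[0] in SYNCED_DIRECTORIES:
            selected.append(path)
    return selected


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    changed = [
        "model-output/example-team/example-file.csv",
        "README.md",
        "hub-config/admin.json",
    ]
    print(paths_to_sync(changed))
```

Files outside these directories, such as a top-level README, are left out of the result.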
Cloud Onboarding Setup#
Because each hub has its own S3 bucket and other dedicated AWS resources, a member of the Hubverse team needs to be involved in cloud onboarding.
If a hub admin wants to enable cloud hosting, these are the steps to follow:
Decide on a name for the hub’s S3 bucket. S3 bucket names must follow Amazon’s bucket naming rules and be globally unique.
Create the hub’s AWS resources by following instructions in the hubverse-infrastructure README.
Note: Don’t be intimidated by “creating AWS resources.” The process is automated and requires a three-line config change.
Once the AWS resources are in place, submit a PR to the hub:
Add a cloud section to the admin.json file. See the Hubverse schema documentation for more details.
Add the hubverse-aws-upload.yaml GitHub workflow file. This is a Hubverse-maintained workflow that runs after a PR is merged to the hub’s main branch. You do not need to make changes to this file.
Update the hub’s README to include information about accessing data from S3. The hubTemplate repo has some boilerplate to use as a starting point.
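For the first step, a candidate bucket name can be checked locally before any AWS resources are requested. The sketch below covers only a subset of Amazon’s bucket naming rules (length, allowed characters, and a few forbidden patterns), so Amazon’s documentation remains the authority. It cannot check global uniqueness, which is only known once bucket creation is attempted. The function name `bucket_name_problems` is hypothetical.

```python
import re
import string

_EDGE_CHARS = set(string.ascii_lowercase + string.digits)
_ALLOWED = re.compile(r"[a-z0-9.-]*")
_IPV4_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name):
    """Return a list of naming-rule problems; an empty list means none found.

    Covers a subset of the S3 general purpose bucket naming rules only.
    """
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("must be between 3 and 63 characters long")
    if not _ALLOWED.fullmatch(name):
        problems.append("must use only lowercase letters, digits, dots, and hyphens")
    if name and (name[0] not in _EDGE_CHARS or name[-1] not in _EDGE_CHARS):
        problems.append("must begin and end with a letter or digit")
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    if _IPV4_LIKE.fullmatch(name):
        problems.append("must not be formatted as an IP address")
    if name.startswith("xn--"):
        problems.append("must not start with the prefix 'xn--'")
    if name.endswith("-s3alias"):
        problems.append("must not end with the suffix '-s3alias'")
    return problems


if __name__ == "__main__":
    # Hypothetical candidate names, for illustration only.
    for candidate in ("example-hub", "Example_Hub", "192.168.0.1"):
        print(candidate, bucket_name_problems(candidate))
```

Returning a list of problems rather than a single boolean lets a hub admin see every issue with a proposed name at once.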
Tip
As an example of this process, here are the pull requests used to onboard the variant-nowcast-hub to AWS:
Other notes:
A hub can be onboarded at any time.
The S3 data sync occurs after pull requests are merged to the hub. The sync process does not interfere with hub operations (for example, if AWS is down, hub validations and other tasks will still work).
Mirroring a hub’s data to the Hubverse-hosted AWS account does not require AWS tokens or other secrets to be stored in its repository.
How it works#
At a high level, this diagram describes the interactions between hub users, the hub hosted on GitHub, and the hub’s data mirrored to AWS:
---
config:
  theme: base
  themeVariables:
    primaryBorderColor: '#3c88be'
    primaryColor: '#dbeefb'
---
sequenceDiagram
    create actor A as hub admins and modelers
    create participant h as hub
    A->>h: PR: update hub config
    A->>h: PR: submit model-output
    h-->h: run validations
    h-->h: generate target data
    create participant hc as Hubverse cloud
    h->>hc: sync config, target, and model output data
    actor B as hub data user
    B->>hc: query hub data
    hc->>B: return data
Cloud Offboarding#
Removing a hub from Hubverse AWS hosting is essentially the reverse of the onboarding process, with a few caveats.
Update the hub’s admin.json file, setting the cloud.enabled value to false.
Note: You can also remove the entire cloud section if you prefer.
Optional: Remove the hubverse-aws-upload.yaml workflow file from the hub. Leaving this workflow intact won’t harm anything because it checks admin.json for cloud.enabled = true before syncing data to AWS.
To completely remove the hub’s AWS resources:
Manually delete the contents of the hub’s S3 bucket (AWS does not permit deleting S3 buckets that contain objects).
Submit a PR to hubverse-infrastructure that removes the hub from the hubs.yaml file. Once the PR is merged, the hub’s AWS resources will be deleted.
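The first offboarding step works because the upload workflow checks cloud.enabled before syncing. A minimal sketch of such a check follows. It is not the workflow’s actual code: the function name `is_cloud_enabled` is hypothetical, and treating a missing cloud section as disabled is an assumption based on the note above. Any other keys in the cloud section are omitted here; see the Hubverse schema documentation for the full structure.

```python
import json


def is_cloud_enabled(admin_config):
    """Return True only when the config has a cloud section with enabled set to true."""
    cloud = admin_config.get("cloud")
    if not isinstance(cloud, dict):
        # No cloud section at all: assumed to behave the same as disabled.
        return False
    return cloud.get("enabled") is True


if __name__ == "__main__":
    # In-memory JSON stands in for a hub's real admin.json file.
    still_enabled = json.loads('{"cloud": {"enabled": true}}')
    switched_off = json.loads('{"cloud": {"enabled": false}}')
    section_removed = json.loads("{}")
    for config in (still_enabled, switched_off, section_removed):
        print(is_cloud_enabled(config))
```

Under this reading, setting cloud.enabled to false and removing the cloud section both stop the sync.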
Helpful links for Hubverse cloud#
The hubverse-infrastructure README describes the AWS resources that power each cloud-enabled hub and provides an overview of the OIDC authentication process that allows hubs’ GitHub Actions to write data to S3.
Logs emitted by the hubverse-transform-model-output Lambda (i.e., the function that transforms incoming model-output files) are in the /aws/lambda/hubverse-transform-model-output CloudWatch log group and can be viewed via the AWS console.
Additionally, there is a CloudWatch dashboard that shows errors and warnings emitted by the hubverse-transform-model-output Lambda function.
The Python package that powers the hubverse-transform-model-output Lambda is in the hubverse-transform repo. The README contains instructions for making changes and deploying the updated code to AWS (which is currently a manual, script-driven process).
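As an alternative to the AWS console, the Lambda’s log group can also be queried from a script. The sketch below assumes a boto3 CloudWatch Logs client is passed in and uses its `filter_log_events` call. The function name `recent_error_messages`, the default time window, and the "ERROR" filter pattern are illustrative assumptions, not details of how the Lambda actually formats its logs.

```python
import time

# Log group name as given above.
LOG_GROUP = "/aws/lambda/hubverse-transform-model-output"


def recent_error_messages(logs_client, minutes=60, pattern="ERROR"):
    """Return log messages from the last `minutes` minutes that match `pattern`.

    `logs_client` is expected to behave like boto3.client("logs").
    """
    # CloudWatch Logs expects timestamps in milliseconds since the epoch.
    start_ms = int((time.time() - minutes * 60) * 1000)
    response = logs_client.filter_log_events(
        logGroupName=LOG_GROUP,
        startTime=start_ms,
        filterPattern=pattern,
    )
    # Only the first page of results is read here; a longer time window may
    # require following the nextToken value in the response.
    return [event["message"] for event in response.get("events", [])]
```

With boto3 installed, a region configured, and credentials that can read the log group, this could be called as `recent_error_messages(boto3.client("logs"))`. Passing the client in as an argument keeps the function easy to exercise with a stand-in object.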