AWS Onboarding (and Offboarding)


The Hubverse team currently provides cloud hosting for hubs. A “cloud-enabled” hub is one that mirrors its data and configuration to an Amazon Web Services (AWS) S3 bucket. By default, the following hub directories are synced to AWS in near-real-time:

auxiliary-data
hub-config
model-abstracts
model-metadata
model-output
target-data
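
As a quick local check before onboarding, the sketch below reports which of the directories listed above exist in a clone of a hub. The helper name `present_synced_directories` is hypothetical and is not part of any Hubverse tool; it only inspects the local file system.

```python
from pathlib import Path

# Directory names copied from the list above.
SYNCED_DIRECTORIES = (
    "auxiliary-data",
    "hub-config",
    "model-abstracts",
    "model-metadata",
    "model-output",
    "target-data",
)


def present_synced_directories(hub_root):
    """Return the default-synced directories that exist under hub_root."""
    root = Path(hub_root)
    # Only real directories count; a plain file with a matching name is ignored.
    return [name for name in SYNCED_DIRECTORIES if (root / name).is_dir()]
```

Calling `present_synced_directories(".")` from the root of a hub clone returns the subset of these directories that the hub actually contains.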

Cloud Onboarding Setup

Because each hub has its own S3 bucket and other dedicated AWS resources, a member of the Hubverse team needs to be involved in cloud onboarding.

If a hub admin wants to enable cloud hosting, these are the steps to follow:

1. Decide on a name for the hub’s S3 bucket. S3 bucket names must follow Amazon’s bucket naming rules and be globally unique.
2. Create the hub’s AWS resources by following the instructions in the hubverse-infrastructure README.
   Note: Don’t be intimidated by “creating AWS resources.” The process is automated and requires only a three-line config change.
3. Once the AWS resources are in place, submit a PR to the hub that makes the following changes:
   - Add a cloud section to the admin.json file. See the Hubverse schema documentation for more details.
   - Add the hubverse-aws-upload.yaml GitHub workflow file. This is a Hubverse-maintained workflow that runs after a PR is merged to the hub’s main branch. You do not need to make changes to this file.
   - Update the hub’s README to include information about accessing data from S3. The hubTemplate repo has some boilerplate to use as a starting point.
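
When choosing a bucket name in the first step, a local pre-check can catch obvious naming mistakes. The sketch below covers only a subset of Amazon’s rules for general purpose buckets (length, allowed characters, no adjacent periods, not formatted as an IP address, no `xn--` prefix); Amazon’s documentation remains the authority. The function name `bucket_name_problems` is hypothetical, and global uniqueness cannot be verified without asking AWS.

```python
import re

# Lowercase letters, digits, dots, and hyphens; must start and end with a letter or digit.
_ALLOWED = re.compile(r"[a-z0-9][a-z0-9.-]*[a-z0-9]")
_IPV4_LIKE = re.compile(r"\d{1,3}(\.\d{1,3}){3}")


def bucket_name_problems(name):
    """Return a list of rule violations found in name (empty if none were found)."""
    problems = []
    if not 3 <= len(name) <= 63:
        problems.append("length must be between 3 and 63 characters")
    if not _ALLOWED.fullmatch(name):
        problems.append(
            "use only lowercase letters, digits, dots, and hyphens, "
            "starting and ending with a letter or digit"
        )
    if ".." in name:
        problems.append("must not contain two adjacent periods")
    if _IPV4_LIKE.fullmatch(name):
        problems.append("must not be formatted as an IP address")
    if name.startswith("xn--"):
        problems.append("must not start with the prefix 'xn--'")
    return problems
```

For example, `bucket_name_problems("example-hub")` returns an empty list, while `bucket_name_problems("My_Hub")` reports the disallowed characters. An empty list means only that this partial check found nothing wrong.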

Tip

As an example of this process, here are the pull requests used to onboard the variant-nowcast-hub to AWS:

Other notes:

A hub can be onboarded at any time.
The S3 data sync occurs after pull requests are merged to the hub. The sync process does not interfere with hub operations (for example, if AWS is down, hub validations and other tasks will still work).
Mirroring a hub’s data to the Hubverse-hosted AWS account does not require AWS tokens or other secrets to be stored in its repository.

How it works

At a high level, this diagram describes the interactions between hub users, the hub hosted on GitHub, and the hub’s data mirrored to AWS:

---
config:
  theme: base
  themeVariables:
    primaryBorderColor: '#3c88be'
    primaryColor: '#dbeefb'
---
sequenceDiagram
    create actor A as hub admins and modelers
    create participant h as hub
    A->>h: PR: update hub config
    A->>h: PR: submit model-output
    h-->h: run validations
    h-->h: generate target data
    create participant hc as Hubverse cloud
    h->>hc: sync config, target, and model output data
    actor B as hub data user
    B->>hc: query hub data
    hc->>B: return data

Cloud Offboarding

Removing a hub from Hubverse AWS hosting is essentially the reverse of the onboarding process, with a few caveats.

1. Update the hub’s admin.json file, setting the cloud.enabled value to false.
   Note: You can also remove the entire cloud section if you prefer.
2. Optional: Remove the hubverse-aws-upload.yaml workflow file from the hub. Leaving this workflow intact won’t harm anything because it checks admin.json for cloud.enabled = true before syncing data to AWS.
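
The cloud.enabled check described above can be illustrated with a short sketch. This is not the workflow’s actual code; it only shows why setting the value to false and removing the cloud section have the same effect. The function name `cloud_sync_enabled` is hypothetical, and the sample JSON contains only the `cloud.enabled` field named in this section.

```python
import json


def cloud_sync_enabled(admin):
    """Return True only if the parsed admin.json has cloud.enabled set to true."""
    cloud = admin.get("cloud")
    if not isinstance(cloud, dict):
        # A missing (or malformed) cloud section is treated as "not enabled".
        return False
    return cloud.get("enabled") is True


# Minimal stand-ins for the three states of admin.json.
onboarded = json.loads('{"cloud": {"enabled": true}}')
disabled = json.loads('{"cloud": {"enabled": false}}')
section_removed = json.loads("{}")
```

Under these assumptions, `cloud_sync_enabled(onboarded)` is true, while both `cloud_sync_enabled(disabled)` and `cloud_sync_enabled(section_removed)` are false.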
To completely remove the hub’s AWS resources:

1. Manually delete the contents of the hub’s S3 bucket (AWS does not permit deleting S3 buckets that contain objects).
2. Submit a PR to hubverse-infrastructure that removes the hub from the hubs.yaml file.
3. Once the PR is merged, the hub’s AWS resources will be deleted.

Helpful links for Hubverse cloud

The hubverse-infrastructure README describes the AWS resources that power each cloud-enabled hub and provides an overview of the OIDC authentication process that allows hubs’ GitHub Actions workflows to write data to S3.
Logs emitted by the hubverse-transform-model-output Lambda (i.e., the function that transforms incoming model-output files) are in the /aws/lambda/hubverse-transform-model-output CloudWatch log group and can be viewed via the AWS console.
Additionally, there is a CloudWatch dashboard that shows errors and warnings emitted by the hubverse-transform-model-output Lambda function.
The Python package that powers the hubverse-transform-model-output Lambda is in the hubverse-transform repo. The README contains instructions for making changes and deploying the updated code to AWS (which is currently a manual, script-driven process).
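
As an alternative to the AWS console, the log group mentioned above can also be read programmatically. The sketch below assumes a boto3 CloudWatch Logs client (`boto3.client("logs")`) created with credentials that can read the log group; the function name `recent_error_messages` and the `"ERROR"` filter pattern are assumptions, since this page does not describe the format of the Lambda’s log messages.

```python
import time

# Log group name taken from the text above.
LOG_GROUP = "/aws/lambda/hubverse-transform-model-output"


def recent_error_messages(logs_client, hours=24, pattern="ERROR"):
    """Collect messages matching pattern from the last `hours` hours.

    logs_client is expected to behave like boto3.client("logs").
    """
    start_ms = int((time.time() - hours * 3600) * 1000)  # CloudWatch expects epoch milliseconds
    kwargs = {
        "logGroupName": LOG_GROUP,
        "filterPattern": pattern,  # assumed pattern; adjust to the real log format
        "startTime": start_ms,
    }
    messages = []
    while True:
        page = logs_client.filter_log_events(**kwargs)
        messages.extend(event["message"] for event in page.get("events", []))
        token = page.get("nextToken")
        if not token:
            return messages
        kwargs["nextToken"] = token  # request the next page of results
```

With boto3 installed and suitable credentials configured, `recent_error_messages(boto3.client("logs"), hours=6)` would return the matching messages from the last six hours.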