Failed manual deployment with Cookieplone-based GitHub Actions [SOLVED]

After setting up a project with Cookieplone, the manual deployment fails.
I'm following the instructions in devops/README-GHA.md:

  1. Navigate to my ../actions/workflows/manual-deploy.yml file
  2. Click on Run workflow.
  3. Select Branch: main under Use workflow from.
  4. Press Run workflow.

I already have the target server set up, but the stack fails to deploy the frontend and backend.

Here's the output of the failing step (I've noted where I'm seeing the issue):

Run kitconcept/docker-stack-deploy@v1.2.0
/usr/bin/docker run --name ghcriokitconceptdockerstackdeploy120_f8a827 --label 96febf --workdir /github/workspace --rm -e "INPUT_REGISTRY" -e "INPUT_USERNAME" -e "INPUT_PASSWORD" -e "INPUT_REMOTE_HOST" -e "INPUT_REMOTE_PORT" -e "INPUT_REMOTE_USER" -e "INPUT_REMOTE_PRIVATE_KEY" -e "INPUT_STACK_FILE" -e "INPUT_STACK_NAME" -e "INPUT_STACK_PARAM" -e "INPUT_ENV_FILE" -e "INPUT_DEPLOY_TIMEOUT" -e "INPUT_DEBUG" -e "REGISTRY" -e "USERNAME" -e "PASSWORD" -e "REMOTE_HOST" -e "REMOTE_PORT" -e "REMOTE_USER" -e "REMOTE_PRIVATE_KEY" -e "DEPLOY_TIMEOUT" -e "STACK_FILE" -e "STACK_NAME" -e "STACK_PARAM" -e "ENV_FILE" -e "DEBUG" -e "HOME" -e "GITHUB_JOB" -e "GITHUB_REF" -e "GITHUB_SHA" -e "GITHUB_REPOSITORY" -e "GITHUB_REPOSITORY_OWNER" -e "GITHUB_REPOSITORY_OWNER_ID" -e "GITHUB_RUN_ID" -e "GITHUB_RUN_NUMBER" -e "GITHUB_RETENTION_DAYS" -e "GITHUB_RUN_ATTEMPT" -e "GITHUB_REPOSITORY_ID" -e "GITHUB_ACTOR_ID" -e "GITHUB_ACTOR" -e "GITHUB_TRIGGERING_ACTOR" -e "GITHUB_WORKFLOW" -e "GITHUB_HEAD_REF" -e "GITHUB_BASE_REF" -e "GITHUB_EVENT_NAME" -e "GITHUB_SERVER_URL" -e "GITHUB_API_URL" -e "GITHUB_GRAPHQL_URL" -e "GITHUB_REF_NAME" -e "GITHUB_REF_PROTECTED" -e "GITHUB_REF_TYPE" -e "GITHUB_WORKFLOW_REF" -e "GITHUB_WORKFLOW_SHA" -e "GITHUB_WORKSPACE" -e "GITHUB_ACTION" -e "GITHUB_EVENT_PATH" -e "GITHUB_ACTION_REPOSITORY" -e "GITHUB_ACTION_REF" -e "GITHUB_PATH" -e "GITHUB_ENV" -e "GITHUB_STEP_SUMMARY" -e "GITHUB_STATE" -e "GITHUB_OUTPUT" -e "RUNNER_OS" -e "RUNNER_ARCH" -e "RUNNER_NAME" -e "RUNNER_ENVIRONMENT" -e "RUNNER_TOOL_CACHE" -e "RUNNER_TEMP" -e "RUNNER_WORKSPACE" -e "ACTIONS_RUNTIME_URL" -e "ACTIONS_RUNTIME_TOKEN" -e "ACTIONS_CACHE_URL" -e "ACTIONS_RESULTS_URL" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/file_commands" -v 
"/home/runner/work/myrepo/myrepo":"/github/workspace" ghcr.io/kitconcept/docker-stack-deploy:1.2.0
Environment Variables: Additional values
Container Registry: Logged in ghcr.io as myuser
SSH client: Configured
SSH client: Added private key
SSH remote: Keys added to /root/.ssh/known_hosts
SSH connect: Success
image ghcr.io/myuser/myrepo-varnish:main could not be accessed on a registry to record
its digest. Each node will access ghcr.io/myuser/myrepo-varnish:main independently,
possibly leading to different nodes running different
versions of the image.

image ghcr.io/myuser/myrepo-backend:main could not be accessed on a registry to record
its digest. Each node will access ghcr.io/myuser/myrepo-backend:main independently,
possibly leading to different nodes running different
versions of the image.

Deploy: Updated services
Deploy: Checking status
Service live-myrepo-work_backend state: paused # <--- PROBLEM
Service live-myrepo-work_db state: deployed
Service live-myrepo-work_frontend state: replicating 0/2 # <--- PROBLEM
Service live-myrepo-work_purger state: deployed
Service live-myrepo-work_traefik state: deployed
Service live-myrepo-work_varnish state: paused # <--- PROBLEM
Error: This deployment will not complete
Deploy: Failed
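
As an aside, when the log gets long it can help to pull out only the services that are not up. Below is a minimal Python sketch that scans the `Service ... state: ...` lines shown above. The function name `stuck_services` and the set of "healthy" states (`deployed`, `completed`) are my own assumptions based on this log, not something documented by the deploy action.

```python
import re

# Matches lines such as:
#   Service live-myrepo-work_frontend state: replicating 0/2 # <--- PROBLEM
# Anything after a "#" is treated as a hand-written note and ignored.
STATUS_LINE = re.compile(
    r"^Service\s+(?P<name>\S+)\s+state:\s+(?P<state>.+?)\s*(?:#.*)?$"
)

# Assumption: these are the states that mean "nothing left to wait for".
HEALTHY_STATES = {"deployed", "completed"}


def stuck_services(log_text):
    """Return {service_name: state} for every service not in a healthy state."""
    stuck = {}
    for line in log_text.splitlines():
        match = STATUS_LINE.match(line.strip())
        if match and match.group("state") not in HEALTHY_STATES:
            stuck[match.group("name")] = match.group("state")
    return stuck


# A few status lines copied from the log above.
SAMPLE = """\
Service live-myrepo-work_backend state: paused
Service live-myrepo-work_db state: deployed
Service live-myrepo-work_frontend state: replicating 0/2
Service live-myrepo-work_varnish state: paused
"""

if __name__ == "__main__":
    for name, state in stuck_services(SAMPLE).items():
        print(f"{name}: {state}")
```

Pasting the "Checking status" part of a failed run into `SAMPLE` lists only the services that still need attention.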

For some reason the image for the backend never got built, so I manually ran the backend CI to generate the image, and the frontend CI as well (just for completeness).

Ensure images exist

I take this to mean that I now have images.
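
One way to double-check that the images really exist in the registry is to ask Docker for their manifests. This is a rough Python sketch, not part of the Cookieplone tooling: the helper names `image_refs` and `image_exists` are hypothetical, the `-frontend` image name is assumed to follow the same pattern as the `-backend` and `-varnish` names in the log, and for a private repo it assumes `docker login ghcr.io` has already been done on the machine running it.

```python
import subprocess


def image_refs(owner, repo, tag="main", components=("frontend", "backend", "varnish")):
    """Build the image references the stack is expected to pull from ghcr.io."""
    return [f"ghcr.io/{owner}/{repo}-{name}:{tag}" for name in components]


def image_exists(ref, timeout=60):
    """Return True if `docker manifest inspect` can see the image in the registry."""
    try:
        result = subprocess.run(
            ["docker", "manifest", "inspect", ref],
            capture_output=True,
            text=True,
            check=False,
            timeout=timeout,
        )
    except (FileNotFoundError, subprocess.TimeoutExpired):
        # The docker CLI is missing, or the registry did not answer in time.
        return False
    return result.returncode == 0
```

Calling `image_exists(ref)` for each entry of `image_refs("myuser", "myrepo")` should show which images are still missing before rerunning the deploy.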

Reran the failed deploy step

It still failed, but now it seems only the varnish image is the issue:

Deploy: Updated services
Deploy: Checking status
Service live-my-repo-work_backend state: deployed
Service live-my-repo-work_db state: deployed
Service live-my-repo-work_frontend state: replicating 0/2 # <---- PROBLEM
Service live-my-repo-work_purger state: deployed
Service live-my-repo-work_traefik state: deployed
Service live-my-repo-work_varnish state: paused # <---- PROBLEM
Error: This deployment will not complete
Deploy: Failed
##[debug]Docker Action run completed with exit code 1
##[debug]Finishing: Deploy to cluster

It turns out that my varnish image had not been created.
I manually ran the varnish image creation step and then reran the manual deployment step.
It's almost working now, but failing with a timeout during deployment.

The problem - it's about building those Docker images

The steps to create the various images (backend and varnish) initially failed. What I think happened:

  1. I did not configure my GitHub environment properly (secrets etc.)
  2. I did not do so because I had a private repo, which didn't allow me to create environments

The fix - allow creation of environments

To allow creation of environments, I had to either:

a. Upgrade to GitHub Pro for private repos
b. or make my repo public.

Due to the nature of the project, I couldn't use a public repo, so I upgraded to GitHub Pro.
After that I was able to set up the proper secrets etc. to support the GitHub Actions.
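
As far as I understand, a secret that isn't configured simply reaches the workflow as an empty string, so nothing complains until a later step fails. Here is a small Python sketch of a preflight check for blank values. The names in `EXPECTED_INPUTS` are copied from the `-e` flags in the `docker run` line of the failing step above; which of them are strictly required, and the helper name `missing_inputs`, are assumptions on my part.

```python
import os

# Names copied from the `-e` flags in the failing step's `docker run` line.
# Assumption: all of these need a non-blank value for the deploy to work.
EXPECTED_INPUTS = (
    "REGISTRY",
    "USERNAME",
    "PASSWORD",
    "REMOTE_HOST",
    "REMOTE_PORT",
    "REMOTE_USER",
    "REMOTE_PRIVATE_KEY",
    "STACK_FILE",
    "STACK_NAME",
)


def missing_inputs(env):
    """Return the expected input names that are unset or blank in `env`."""
    return [name for name in EXPECTED_INPUTS if not str(env.get(name, "")).strip()]


if __name__ == "__main__":
    missing = missing_inputs(os.environ)
    print("Missing:", ", ".join(missing) if missing else "none")
```

Run with the same environment variables the deploy step receives, this prints the names that are still blank.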

The final issue (I hope) is that I'm getting a timeout during the deploy. It relates to the frontend.

Deploy: Updated services
Deploy: Checking status
Service live-my-repo-work_backend state: deployed
Service live-my-repo-work_db state: deployed
Service live-my-repo-work_frontend state: replicating 0/2 # <------ PROBLEM
Service live-my-repo-work_purger state: deployed
Service live-my-repo-work_traefik state: deployed
Service live-my-repo-work_varnish state: completed
Error: Timeout exceeded
Deploy: Failed

Assuming the images were no longer the issue, I wondered whether a previously failed step could have caused this: in my case, the creation of the site.

Ensured that the site had been created

On this hunch, I went to the devops folder and ran:

make stack-create-site

Successfully reran manual deployment

Then I went back to GitHub Actions and reran the manual deployment.
It worked :tada:
I'm sure this will save my future self a few hours or days :slight_smile: