`corepack` problems (especially in CI)

It seems there's been a corepack Armageddon in the last few hours: Newly published versions of package managers distributed from npm cannot be installed due to key id mismatch · Issue #612 · nodejs/corepack · GitHub

The error looks like:

C:\Program Files\nodejs\node_modules\corepack\dist\lib\corepack.cjs:21535
  if (key == null || signature == null) throw new Error(`Cannot find matching keyid: ${JSON.stringify({ signatures, keys })}`);
                                              ^

Error: Cannot find matching keyid: {"signatures":[{"keyid":"SHA256:DhQ8wR5APBvFHLF/+Tc+AYvPOdTpcIDqOhxsBHRwC7U","sig":"MEUCIQDlkgmNyZjT7KUY8AO6jH7Gs3fyiXG8nbTnuLbd8fOS2AIgXyJ6SaYhumMFzUYQAZPJGhsnlaD5N0X2MZsbG+eS/Xo="}],"keys":[{"expires":null,"keyid":"SHA256:jl3bwswu80PjjokCgh0o2w5c2U4LhQAE57gj9cz1kzA","keytype":"ecdsa-sha2-nistp256","scheme":"ecdsa-sha2-nistp256","key":"MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE1Olb3zMAFFxXKHiIkQO5cJ3Yhl5i6UPp+IhuteBJbuHcA5UogKo0EWtlWwW6KSaKoTNEYL7JlCQiVnkhBktUgg=="}]}
    at verifySignature (C:\Program Files\nodejs\node_modules\corepack\dist\lib\corepack.cjs:21535:47)
    at installVersion (C:\Program Files\nodejs\node_modules\corepack\dist\lib\corepack.cjs:21882:7)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async Engine.ensurePackageManager (C:\Program Files\nodejs\node_modules\corepack\dist\lib\corepack.cjs:22316:32)
    at async Engine.executePackageManagerRequest (C:\Program Files\nodejs\node_modules\corepack\dist\lib\corepack.cjs:22416:25)
    at async Object.runMain (C:\Program Files\nodejs\node_modules\corepack\dist\lib\corepack.cjs:23102:5)        

Node.js v22.13.0

The workaround, if you are hit by it (especially on CI), is to install the latest corepack:

npm i -g corepack@latest && corepack enable

As far as I can see, this may become unnecessary once a couple of versions are bumped here and there, but I'm posting it in case someone needs it today.

7 Likes

Thank you :blush:

This is making its way through Read the Docs builds for Volto as well:

How I needed this today; it's been 5 days of corepack war :sob: Thank you.

1 Like

@sneridagh I tried to fix it following your suggestion, but it did not help.

During CI in the GitHub Actions Frontend CI workflow, I hit this first error (with several similar ones later):

during code analysis -> Get pnpm store directory

Run echo "STORE_PATH=$(pnpm store path --silent)" >> $GITHUB_ENV
/opt/hostedtoolcache/node/22.13.1/x64/lib/node_modules/corepack/dist/lib/corepack.cjs:21535
  if (key == null || signature == null) throw new Error(`Cannot find matching keyid: ${JSON.stringify({ signatures, keys })}`);
                                              ^

Error: Cannot find matching keyid: {"signatures":[{"sig":"MEYCIQDbcyRXEEpUvMj22WsicmOsvx+ctqHZv1vLScf3/247EAIhANfMkRDNAHdtTDNZ34BVH2z2z0Ef8o5VK4osH6ES9RHW","keyid":"SHA256:DhQ8wR5APBvFHLF/+Tc+AYvPOdTpcIDqOhxsBHRwC7U"}],"keys":[{"expires":null,"keyid":"SHA256:jl3bwswu80PjjokCgh0o2w5c2U4LhQAE57gj9cz1kzA","keytype":"ecdsa-sha2-nistp256","scheme":"ecdsa-sha2-nistp256","key":"MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAE1Olb3zMAFFxXKHiIkQO5cJ3Yhl5i6UPp+IhuteBJbuHcA5UogKo0EWtlWwW6KSaKoTNEYL7JlCQiVnkhBktUgg=="}]}
    at verifySignature (/opt/hostedtoolcache/node/22.13.1/x64/lib/node_modules/corepack/dist/lib/corepack.cjs:21535:47)
    at fetchLatestStableVersion (/opt/hostedtoolcache/node/22.13.1/x64/lib/node_modules/corepack/dist/lib/corepack.cjs:21553:5)
    at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
    at async fetchLatestStableVersion2 (/opt/hostedtoolcache/node/22.13.1/x64/lib/node_modules/corepack/dist/lib/corepack.cjs:21672:14)
    at async Engine.getDefaultVersion (/opt/hostedtoolcache/node/22.13.1/x64/lib/node_modules/corepack/dist/lib/corepack.cjs:22298:23)
    at async Engine.executePackageManagerRequest (/opt/hostedtoolcache/node/22.13.1/x64/lib/node_modules/corepack/dist/lib/corepack.cjs:22396:47)
    at async Object.runMain (/opt/hostedtoolcache/node/22.13.1/x64/lib/node_modules/corepack/dist/lib/corepack.cjs:23102:5)

Node.js v22.13.1

On the dev machine I had:

corepack --version
0.29.4

I ran npm i -g corepack@latest && corepack enable on the dev machine and now have:

corepack --version
0.31.0
pnpm --version
9.13.0

Since nothing changed locally that could be committed to GitHub, the error on GitHub remained.

I reran make install in the project root, and in devops:

  • make server-setup,
  • make stack-deploy

But this does not help to make the Frontend CI image generation on GitHub work, because nothing changed that could be pushed.

What step am I missing?

FYI, other steps are failing too:

Setup pnpm cache

...
Error: Input required and not supplied: path

Linting

...
Error: Cannot find matching keyid: {"signatures":

i18n sync

...
Error: Cannot find matching keyid: {"signatures":

Unit Tests

...
Error: Cannot find matching keyid: {"signatures":

via @sneridagh:

On every occurrence of corepack enable (in CI too), replace it with npm i -g corepack@latest && corepack enable. Also, images updated after that issue are required (18.8.2 at least).

How to fix the corepack issue in the CI workflow

  • The corepack enable command is present e.g. in the project's .github/workflows/frontend.yml around line 50 and needs to be temporarily adjusted as described (see the sketch after this list).
  • corepack enable was not found anywhere else in the workflows.
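
As an illustration only, the adjusted step could look roughly like this (the step name and surrounding layout are assumptions; only the run commands are the actual fix):

  # Hypothetical excerpt from .github/workflows/frontend.yml
  - name: Enable corepack
    run: |
      # Temporary workaround: install the latest corepack before enabling it
      npm i -g corepack@latest
      corepack enable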

I tried it out and can confirm: Frontend CI image creation and Manual deployment went smoothly afterwards.

Run Plone locally with a Docker stack

Remark: There is also an occurrence of corepack enable pnpm at the end of the frontend/Dockerfile. I did not need to modify this, since I run Plone locally without a Docker stack and my global run of npm i -g corepack@latest && corepack enable did the job for now.

Guess: you may need to fix it in frontend/Dockerfile as well if you use Docker locally!

UPDATE: This was not enough. You definitely need to fix it in more places; see the PR by @davisagli below: `corepack` problems (especially in CI) - #7 by davisagli

@acsr You made me realize we need to add the fix in cookieplone-templates: Add corepack workaround in more places by davisagli · Pull Request #162 · plone/cookieplone-templates · GitHub

1 Like

@davisagli A big hug for the move! I had overlooked some other locations as well. I'm including my error logs because I have not found a solution elsewhere so far.

I had this while trying to get the deployment to work on ARM and was not sure whether it was platform related. The images built fine, but the deployment was not finishing on the server. After rolling the same code out on AMD again, the issue remained. There must still have been a bomb ticking in the frontend.

My footgun was forgetting to change line 32 in the frontend/Dockerfile

from corepack enable pnpm to
npm i -g corepack@latest && corepack enable pnpm
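
For clarity, a sketch of that change in context (the surrounding Dockerfile lines in your generated project may differ):

# Before: relies on the corepack version bundled with the Node base image
RUN corepack enable pnpm

# After: install the latest corepack first, then enable pnpm
RUN npm i -g corepack@latest && corepack enable pnpm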

In my case the image creation went fine, but the Manual deployment workflow did not work.

You get constantly restarting frontend containers with this signature when checking the logs via make stack-logs-frontend from the devops folder:

==> Stack my-plone-volto-project-com: Logs for frontend in context prod 
...
my-volto-project-com_frontend.1.3u0sqs69nmwh@kwk    | Command failed with signal "SIGTERM"
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | 
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | > project-dev@1.0.0-alpha.0 start:prod /app
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | > pnpm --filter @plone/volto start:prod
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | 
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | 
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | > @plone/volto@18.8.2 start:prod /app/core/packages/volto
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | > NODE_ENV=production node build/server.js
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | 
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | API server (API_PATH) is set to: https://my.plone.volto.project.com
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | Proxying API requests from https://my.plone.volto.project.com/++api++ to http://backend:8080/Plone
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | 🎭 Volto started at 0.0.0.0:3000 🚀
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    |  ELIFECYCLE  Command failed.
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | /app/core/packages/volto:
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    |  ERR_PNPM_RECURSIVE_RUN_FIRST_FAIL  @plone/volto@18.8.2 start:prod: `NODE_ENV=production node build/server.js`
my-volto-project-com_frontend.1.4e3ajerpdxpm@kwk    | Command failed with signal "SIGTERM"
my-volto-project-com_frontend.2.v5wziasy1xyn@kwk    | 
...
## Note: this repeats forever for every reincarnation of the stalling container after ~60 seconds

See all changes in the PR in detail (not all may have affected your resulting project):

It is still not working for me. Same error; the frontend containers keep restarting permanently, even after the change in the Dockerfile.

@acsr I'm hitting the same issue with a new website on the foundation Docker Swarm cluster, with a very recent cookieplone and Volto 18.9.0. I've updated the corepack calls, but another suspect is the Docker Swarm healthcheck. The default comes with the container image from plone-frontend, via its prod-config Dockerfile.

I'm now overriding it in my project Dockerfile and increasing the timeouts. Maybe pnpm is taking more than 30 seconds for something and then Docker sees an unhealthy container.

cc: @davisagli

@fredvd Can you write out the path of the defaults more explicitly? I am not sure I get this. Do you mean project-title/frontend/Dockerfile, or is the prod-config Dockerfile a default deeper in the packages?

I am asking because I want to understand the settings to override.

Can you please specify how you "increase the timeouts"?

This seems to be clearly project-title/frontend/Dockerfile

Other timeouts?

In my frontend/Dockerfile there is no timeout.

My failure occurs after running the Manual deployment from the GitHub Actions web UI.

There is deploy_timeout: 480 on line 49 of my project-title/.github/workflows/manual_deploy.yml, which is apparently in minutes, resulting in 8 hours as far as I understand.
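
For reference, a rough reconstruction of that deploy step from the action inputs shown in the log below (the secret names here are placeholders, not the real ones):

  # Hypothetical excerpt from .github/workflows/manual_deploy.yml
  - name: Deploy to cluster
    uses: kitconcept/docker-stack-deploy@v1.2.0
    with:
      registry: ghcr.io
      username: ${{ secrets.DEPLOY_USER }}
      password: ${{ secrets.DEPLOY_PASSWORD }}
      remote_host: ${{ secrets.DEPLOY_HOST }}
      stack_file: devops/stacks/my-site.yml
      stack_name: project-title
      deploy_timeout: 480   # apparently minutes, so roughly 8 hours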

This is my Manual Deployment log from GitHub around the deployment failure:

Run kitconcept/docker-stack-deploy@v1.2.0
  with:
    registry: ghcr.io
    username: my-user-id
    password: ***
    remote_host: ***
    remote_port: ***
    remote_user: ***
    remote_private_key: ***
    stack_file: devops/stacks/***.yml
    stack_name: project-title
    stack_param: main
    env_file: ***
    deploy_timeout: 480
    debug: 0
/usr/bin/docker run --name ghcriokitconceptdockerstackdeploy120_afc870 --label d89650 --workdir /github/workspace --rm -e "INPUT_REGISTRY" -e "INPUT_USERNAME" -e "INPUT_PASSWORD" -e "INPUT_REMOTE_HOST" -e "INPUT_REMOTE_PORT" -e "INPUT_REMOTE_USER" -e "INPUT_REMOTE_PRIVATE_KEY" -e "INPUT_STACK_FILE" -e "INPUT_STACK_NAME" -e "INPUT_STACK_PARAM" -e "INPUT_ENV_FILE" -e "INPUT_DEPLOY_TIMEOUT" -e "INPUT_DEBUG" -e "REGISTRY" -e "USERNAME" -e "PASSWORD" -e "REMOTE_HOST" -e "REMOTE_PORT" -e "REMOTE_USER" -e "REMOTE_PRIVATE_KEY" -e "DEPLOY_TIMEOUT" -e "STACK_FILE" -e "STACK_NAME" -e "STACK_PARAM" -e "ENV_FILE" -e "DEBUG" -e "HOME" -e "GITHUB_JOB" -e "GITHUB_REF" -e "GITHUB_SHA" -e "GITHUB_REPOSITORY" -e "GITHUB_REPOSITORY_OWNER" -e "GITHUB_REPOSITORY_OWNER_ID" -e "GITHUB_RUN_ID" -e "GITHUB_RUN_NUMBER" -e "GITHUB_RETENTION_DAYS" -e "GITHUB_RUN_ATTEMPT" -e "GITHUB_REPOSITORY_ID" -e "GITHUB_ACTOR_ID" -e "GITHUB_ACTOR" -e "GITHUB_TRIGGERING_ACTOR" -e "GITHUB_WORKFLOW" -e "GITHUB_HEAD_REF" -e "GITHUB_BASE_REF" -e "GITHUB_EVENT_NAME" -e "GITHUB_SERVER_URL" -e "GITHUB_API_URL" -e "GITHUB_GRAPHQL_URL" -e "GITHUB_REF_NAME" -e "GITHUB_REF_PROTECTED" -e "GITHUB_REF_TYPE" -e "GITHUB_WORKFLOW_REF" -e "GITHUB_WORKFLOW_SHA" -e "GITHUB_WORKSPACE" -e "GITHUB_ACTION" -e "GITHUB_EVENT_PATH" -e "GITHUB_ACTION_REPOSITORY" -e "GITHUB_ACTION_REF" -e "GITHUB_PATH" -e "GITHUB_ENV" -e "GITHUB_STEP_SUMMARY" -e "GITHUB_STATE" -e "GITHUB_OUTPUT" -e "RUNNER_OS" -e "RUNNER_ARCH" -e "RUNNER_NAME" -e "RUNNER_ENVIRONMENT" -e "RUNNER_TOOL_CACHE" -e "RUNNER_TEMP" -e "RUNNER_WORKSPACE" -e "ACTIONS_RUNTIME_URL" -e "ACTIONS_RUNTIME_TOKEN" -e "ACTIONS_CACHE_URL" -e "ACTIONS_RESULTS_URL" -e GITHUB_ACTIONS=true -e CI=true -v "/var/run/docker.sock":"/var/run/docker.sock" -v "/home/runner/work/_temp/_github_home":"/github/home" -v "/home/runner/work/_temp/_github_workflow":"/github/workflow" -v "/home/runner/work/_temp/_runner_file_commands":"/github/file_commands" -v "/home/runner/work/project-title/project-title":"/github/workspace" ghcr.io/kitconcept/docker-stack-deploy:1.2.0
Environment Variables: Additional values
Container Registry: Logged in ghcr.io as my-user-id
SSH client: Configured
SSH client: Added private key
SSH remote: Keys added to /root/.ssh/known_hosts
SSH connect: Success
Deploy: Updated services
Deploy: Checking status
Service project-title_backend state: deployed
Service project-title_db state: deployed
Service project-title_frontend state: replicating 0/2
Service project-title_purger state: deployed
Service project-title_traefik state: deployed
Service project-title_varnish state: completed
Service project-title_frontend state: paused
Error: This deployment will not complete
Deploy: Failed

I already tried to wipe the swarm before redeployment via ssh on the server:

docker service rm $(docker service ls -q)

This helps to remove stalled containers remaining from the last run.

Update, more than 8 hours later: the end of the GitHub Actions log for Deploy to Cluster in Manual Deployment was eventually updated, finally replacing the state Error: This deployment will not complete with:

...
Deploy: Checking status
Service kwk-dev-acsr-de_backend state: replicating 0/2
Service kwk-dev-acsr-de_db state: deployed
Service kwk-dev-acsr-de_frontend state: deployed
Service kwk-dev-acsr-de_purger state: deployed
Service kwk-dev-acsr-de_traefik state: deployed
Service kwk-dev-acsr-de_varnish state: deployed
Service kwk-dev-acsr-de_backend state: deployed
Service kwk-dev-acsr-de_frontend state: replicating 0/2
Error: Timeout exceeded
Deploy: Failed

@acsr:

Can you please specify how you "increase the timeouts"?

Other timeouts?

In my frontend/Dockerfile there is no timeout.

There is some 'inheritance' in play here. When CI/CD builds the frontend image from the Dockerfile in your project definition, it 'inherits', or builds upon, an already generated standard image. In the project where I'm struggling with the same issue, that is done on this line:

Because of this FROM ... AS, it inherits the Dockerfile HEALTHCHECK statement from the built plone/server-prod-config:


Unless you override it again in your 'final' project Dockerfile, which I did to test whether it was a HEALTHCHECK timeout:
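
As a rough sketch of such an override (the interval and timeout values are assumptions, and the check command mirrors the wget call shown further down):

# Override the HEALTHCHECK inherited from plone/server-prod-config with more generous timings
HEALTHCHECK --interval=30s --timeout=60s --start-period=120s --retries=3 \
  CMD [ -n "$LISTEN_PORT" ] || LISTEN_PORT=3000 ; wget -q http://127.0.0.1:"$LISTEN_PORT" -O - || exit 1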

But it doesn't help.

Now that I've increased the HEALTHCHECK timeout to one minute, I have a bit of time to enter the container and inspect what is going on. There are a number of processes consuming CPU. But the healthcheck indeed fails, both the programmed one and a simple telnet:

  PID  PPID USER     STAT   VSZ %VSZ %CPU COMMAND
   31    30 node     S    1252m  32%   0% node build/server.js
   19    18 node     S    1331m  34%   0% node /usr/local/bin/pnpm --filter @plone/volto start:prod
    1     0 node     S    1258m  32%   0% node /usr/local/bin/pnpm start:prod
   50     0 node     S     4188   0%   0% bash
   65    50 node     R     3268   0%   0% top
   30    19 node     S     2592   0%   0% sh -c NODE_ENV=production node build/server.js
   18     1 node     S     2576   0%   0% sh -c pnpm --filter @plone/volto start:prod
node@80ac45e46d56:/app$ ps axuw
PID   USER     COMMAND
    1 node     node /usr/local/bin/pnpm start:prod
   18 node     sh -c pnpm --filter @plone/volto start:prod
   19 node     node /usr/local/bin/pnpm --filter @plone/volto start:prod
   30 node     sh -c NODE_ENV=production node build/server.js
   31 node     node build/server.js
   50 node     bash
   87 node     ps axuw
node@80ac45e46d56:/app$ export LISTEN_PORT=3000
node@80ac45e46d56:/app$ [ -n "$LISTEN_PORT" ] || LISTEN_PORT=3000 ; wget -q http://127.0.0.1:"$LISTEN_PORT" -O - || exit 1
exit
fredvd@worker01:~$ docker exec -it 80ac45e46d56 bash
node@80ac45e46d56:/app$ telnet localhost 3000
telnet: can't connect to remote host: Connection refused

So either the port is wrong, or pnpm is still busy starting up and the frontend SSR server never becomes available.

@fredvd I am also looking at the same project, since I'm preparing the content for that site. I found your HEALTHCHECK addition in the Dockerfile there just a few minutes ago, but had no time to dig deeper because we have a related meeting this afternoon.

Is it worth digging in deeper here, or should I wait until you get a grip on it? I guess the origin of the pitfall is beyond my scope for now, unless I end up being the fool who dug the trap myself.

Does it make sense to reproduce your experimental change and see if I get the same effects? I am not sure exactly where to look.

No, please wait. I'm on it now; this is sysadmin poking and prodding to find out what is happening. I just realised that localhost vs. 127.0.0.1 might be an issue as well. I'll investigate.

I might set the healthcheck to 2-5 minutes so I have longer to peek around in the system.

1 Like

So the problem with tagung.plone.de was a really silly one. I thought about it when checking possible causes and things I changed last Friday, and then forgot to actually check it :frowning:

The backend needs to be up and running, but it also needs to have a valid Plone site configured. When the frontend starts up, the Volto SSR server checks the main (portal) root, and that request HAS to respond with a 200.

If anything else comes back, such as a connection refused or a 404 Not Found (when no Plone site exists yet), you get the current behavior with the latest images: the pnpm process 'spins' and the container gets killed by the (standard) container healthcheck after X seconds. Rinse and repeat.

I kept telling myself this afternoon that there used to be more debug output, because I remember making the same mistake before and then recognising and fixing it... There is now no indication whatsoever. We also switched from yarn to pnpm; maybe yarn was more chatty. But I'm not so sure, perhaps I'm just getting old. :stuck_out_tongue:

I would love to add a preflight check to the frontend startup that reports in the default container output whether the backend is available and whether there is a site that returns 200. It would save future users a lot of searching.
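
A minimal sketch of what such a preflight check could look like in the container start script (the RAZZLE_INTERNAL_API_PATH variable and the fallback URL are assumptions based on the proxy line in the logs above, not an existing Volto feature):

#!/bin/sh
# Hypothetical preflight: warn loudly if the backend does not answer with a successful response
BACKEND_URL="${RAZZLE_INTERNAL_API_PATH:-http://backend:8080/Plone}"
if ! wget -q -O /dev/null "$BACKEND_URL"; then
  echo "Preflight warning: no successful response from $BACKEND_URL (backend down, or Plone site not created yet?)"
fi
# Start the SSR server regardless, so existing behavior is unchanged
exec pnpm start:prod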

@acsr one more thing: the basic auth has an issue, on it now.

1 Like

I would love to add a preflight check

It sounds like the existing request IS a sort of preflight check, maybe it just needs a sane timeout and better error logging.

1 Like

@fredvd Gotcha! I also ran into this missing step: once I killed the swarm and redeployed, the existing content, including the site, was gone (or never there).

Why the Manual Deployment skips the create-site step when starting from scratch is worth a further look.

Without any other effort, a make stack-create-site from the devops folder created a new Plone site; the server is up and Plone is working.

Usually I write down all the steps and repeat them for every procedure I reuse.
When starting from the devops folder, I always did:

  1. make stack-deploy
  2. make stack-create-site

But after the pnpm pitfall I started to do

  1. make stack-deploy
  2. make stack-status

and started to focus on the frontend pnpm issues, ignoring the step needed in the backend to create the site.

Later, when the frontend pnpm issue was solved, I was stuck with this procedure. I had seen these restarting frontend containers before when I missed the create-site step, but forgot to take detailed notes because it seemed obvious I would never make the mistake again.

I added a PR to Troubleshoot deployment issues – Plone Deployment — Plone Training 2025 documentation

Manual Deployment now succeeded at once. I am still wondering why an initial Manual Deployment fails and only succeeds after a make stack-create-site. I need to retry that on ARM as well.