Jenkins Node performance

While porting the robotframework tests I created a new experimental job, pr-6.1-py3.12-robottest-only [Jenkins], which runs "robottests only" so that we get faster feedback on our changes.

Here are the runtimes on each Node for this job (ROBOTSUIT_PREFIX=ONLYROBOT results in 160 Scenarios for Plone 6.1)

Node 1: 9min 55sec -> success
Node 3: permanently offline
Node 4: 22min -> failure (timeouts)

I've therefore "pinned" the PR-6.1 jobs to Node 1 for now to get stable, successful results, but I wanted to know whether the people in charge of running Jenkins are aware of this.

/cc @gforcada @tisto @mauritsvanrees @fredvd @alert

EDIT:
I think there might be a problem when several jobs run in parallel on the same node ... see Plone 6.1 - Python 3.10 - Robot Framework Tests (chrome) #384 Console [Jenkins], where it says: Could not connect to the playwright process at port 46082.

and this one, running in parallel, succeeded: Plone 6.1 - Python 3.12 - Robot Framework Tests (chrome) #622 [Jenkins]
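
To narrow down a "could not connect ... at port" failure, one could check from the node whether anything is listening on the reported port while the job runs. This is only a minimal diagnostic sketch using the Python standard library; the helper name `port_is_open` is made up for illustration, and it does not explain why the playwright process was unreachable.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; not part of Jenkins or rfbrowser.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


if __name__ == "__main__":
    # 46082 is the port from the failing console log quoted above.
    print(port_is_open("127.0.0.1", 46082))
```

If this reports `False` while the job is still waiting, the playwright process most likely never started or has already died, which would point at resources on the node rather than at the tests.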

I am not aware of differences between the nodes, except that node 3 has indeed been down for a while now.
But I have often noticed that parallel runs on the same node can easily go wrong, so there is no full isolation. I try not to start many jobs at once, if I can avoid it.

Moving to running all tests in a Docker container could help, then they would surely be isolated. But that takes effort.
Or move everything over to GitHub, but that takes effort as well, rewiring all the nice things that we currently have with Jenkins and mr.roboto.
Or move all 100 or so plone packages to a mono repo, but this also takes effort.

I restarted node 3.
Node 3 and node 4 are VMs hosted on machines that also have other things to do and they are also quite old, but last time I checked they were actually faster than node 1.
The situation might have changed.

Thanks for the info ... I've started the robottest builds once again sequentially and all is green now. I've also unpinned the 6.1 Jobs in order to let them choose on which node they are running ...

Meanwhile Node3 and Node4 do have severe resource problems :sweat:

Sorry about that, but I do not really control the status of the VMs, just the host.
Anyway, I rebooted both of them and node3 is back online.

Node 4 has been stuck for several minutes on this:

No other VM of that server is having any issue at all :confused:

IIRC during the jenkins rfbrowser config session with @gforcada last week we saw a full disk on Node3 ... Node4 is maybe full too. Not sure about that but maybe the jobs need much more space on disk because of the multiple rfbrowser init commands.
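
As a quick way to test the full-disk theory, something like the following could be run on a node. It is a minimal sketch using only the Python standard library; the function name `disk_report` and the 10% threshold are arbitrary assumptions, not values from the Jenkins setup.

```python
import shutil


def disk_report(path: str, min_free_fraction: float = 0.10):
    """Return (free_fraction, ok) for the filesystem that contains path.

    min_free_fraction is an assumed threshold chosen for illustration.
    """
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    free_fraction = usage.free / usage.total
    return free_fraction, free_fraction >= min_free_fraction


if __name__ == "__main__":
    fraction, ok = disk_report("/")
    print(f"free: {fraction:.1%} -> {'ok' if ok else 'LOW'}")
```

Pointing `path` at the Jenkins workspace or at the browser download folder instead of `/` would show whether the repeated rfbrowser init downloads are what fills the disk.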

I think there is something not configured properly there, as every time a job with robot tests is run browsers get downloaded over and over again.

I remember configuring it to keep the browsers in a global folder, but maybe with the last changes that has to be updated.

I've been having problems ssh'ing into Node3 and Node4 lately, and they are usually rather slow (one notices it right away when running apt-get update, for example).

If the browsers are configured globally, they must be kept in sync with the robotframework version currently in use, and an env var is needed.

@gforcada I can confirm that with the PLAYWRIGHT_BROWSERS_PATH environment var the browsers only get installed once. I additionally created a PR which only installs chromium for headlesschrome since we do not use any other browsers in robottests. See Only install chromium browser for robottests by petschki · Pull Request #374 · plone/jenkins.plone.org · GitHub
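
For anyone wanting to reproduce this outside Jenkins, the idea can be sketched as follows. Only `PLAYWRIGHT_BROWSERS_PATH` comes from this thread; the cache directory, the helper name `browser_env`, and the exact `rfbrowser init chromium` invocation are assumptions for illustration and may differ from what the PR actually does.

```python
import os

# Assumed command: initialise robotframework-browser with chromium only.
INIT_COMMAND = ["rfbrowser", "init", "chromium"]


def browser_env(cache_dir: str, base_env=None) -> dict:
    """Return a copy of the environment with a shared browser folder set.

    cache_dir is a hypothetical path; base_env defaults to os.environ.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Playwright reads this variable to decide where browsers are stored.
    env["PLAYWRIGHT_BROWSERS_PATH"] = cache_dir
    return env


if __name__ == "__main__":
    import subprocess

    # Hypothetical shared location on the node.
    env = browser_env("/home/jenkins/.cache/playwright-browsers")
    subprocess.run(INIT_COMMAND, env=env, check=True)
```

The test runs would need the same variable set, otherwise they would look for the browsers in the default location and not find the shared copy.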


Thanks! Jenkins jobs are updated, please give them a try! :robot:


What's the scenario if we want to contribute an additional Node for Jenkins?

We have an IONOS dedicated server in Germany which has some resources available (500G, 16 cores), so we could set up a virtualized box there (libvirt).

I saw this package GitHub - plone/plone.jenkins_node: Ansible Galaxy Playbook for a jenkins node but I need some guidance there.