CI for PR not working as expected

zopyx · March 19, 2019, 3:46pm

I created a PR for plone.restapi

and I wonder why 3 CI pipelines just show up with an error like

https://jenkins.plone.org/job/pull-request-5.2/2223/console

I have never seen a PR with a working CI pipeline... there is always something. In particular, why is this so fragile?

mauritsvanrees · March 19, 2019, 8:12pm

I have seen lots of PRs with a working pipeline. I did a few fixes today that should fix most flaky robot framework failures.

In this case you started the jobs with a github comment @jenkins-plone-org please run jobs. That should have worked, but the console says:

You seem to forgot to add a pull request URL on the "Build with parameters" form!

Indeed the parameters of this build are empty.
@gforcada Would you have an idea why it failed here? I haven't seen that github comment fail myself, it is working fine for me.

Meanwhile I would suggest manually starting those jobs. I have done so for your PR.

gforcada · March 19, 2019, 9:17pm

Yes, just like @mauritsvanrees mentioned, you are manually triggering jenkins jobs without providing the pull request URL.

It is clearly stated on the message below the text field that you must do that in order to test the pull request.

Or, again, as Maurits mentions, you can easily add a comment on the pull request and it will run. All the details are already on the comment that was added by mr.roboto on that very pull request of yours.

@zopyx Is there a way to make it clearer? What confused you into not using the two methods already available?

mauritsvanrees · March 19, 2019, 10:13pm

No, that is not what I said. He actually used the github comment (and Rotonen did the same before that), but it did not work: it resulted in that message about forgetting to add a PR url.

zopyx · March 20, 2019, 8:55am

My expectation as contributor:

I submit the PR
I expect the verification pipeline, including all Jenkins builds, to start automatically without having to click on build or add "@jenkins-plone-org please run jobs"

Q:

How do I manually restart a single Jenkins build after a failure? I did not see a BUILD button in the Jenkins UI.
In my case some robot tests failed for whatever reason... I contributed a small fix (a two-liner), with all tests passing locally on Python 2 and 3. Do you expect a contributor to fix unrelated robot tests?

mauritsvanrees · March 20, 2019, 9:52am

Being able to do this via a comment is a recent addition and already makes it much easier and faster.
I have added 'please run jobs' as a reply template in github, making it even faster.

Starting a build automatically would be nice. But this only works when this is a change in one package without related and needed changes in other packages. When PRs in multiple packages belong together, it is useless and wasteful to start Jenkins jobs on the individual PRs.
Especially on sprints this would be unworkable.

Were you logged in? Build buttons are only visible to authenticated users. You can log in with your github account.

Full info is on github. There might be a link to that somewhere in a logical spot, but I don't immediately see it. Currently, the main PR job page at least says this in the center top: "To trigger a job, just login with your GitHub account and provide the pull request URL."

The easiest way is to log in, go to the failing job, and click the Rebuild button. This will show a form with the PR url (or multiple urls) already filled in.

Someone has to. I finally made several robot tests more robust yesterday. When there are unrelated failures you have several options:

Fix the tests, possibly in a separate PR in a separate package.
Restart the failing job and hope it works now.
Add a comment on your PR saying that you think the failures are unrelated and that you think it is ready for merge after all.

You can do this yourself, or wait for someone else to step in. Doing it yourself takes more time for you, but may get your PR merged faster.

zopyx · March 20, 2019, 10:20am

Honestly: not me.

plone.restapi 3.7.5 was released just some days ago on 2019-03-14. I assume that all tests worked for the release.
There was no Plone 5.1 release in between. Robot tests are notoriously fragile and we often see false positives. I trust unittest results but not robot tests; in particular, they are hard to debug.

petschki · March 20, 2019, 2:19pm

Maybe it would be an idea to make separate PR pipelines on jenkins with and without robot tests (like we already have for the plone build) ... the default please run jobs comment should be without. If you comment please run robot jobs you'll test on a robot pipeline ... unrelated flaky breaking robot tests bite me nearly every time I do a PR for core components.

tisto · March 21, 2019, 5:35am

@zopyx I agree that as a user you expect things to be as easy and nice as with Travis.

Apart from that: The best way to handle flaky robot tests is to fix them. In most cases, robot tests are flaky because the underlying implementation is actually flaky.

Though, the truth is that robot tests might always fail even if there is no real issue. The best thing you can do then is to automatically re-run just the robot tests that failed. This approach is incredibly effective in my experience and leads to much more stable results.

I am happy to share our internal code how to re-run just the failing robot tests and merge results together.
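
The internal code mentioned above is not shown in this thread. As a rough sketch of the same idea, Robot Framework itself offers a `--rerunfailed` option on `robot` and a `--merge` option on `rebot`. The function names and the file names (`output.xml`, `rerun.xml`, `merged.xml`) below are assumptions for illustration only, and Plone's own test runner setup may invoke robot differently.

```python
import subprocess


def rerun_commands(test_path, first_output="output.xml",
                   rerun_output="rerun.xml", merged_output="merged.xml"):
    """Return the three command lines: first run, rerun of failures, merge.

    File names are hypothetical defaults, not taken from any Plone setup.
    """
    first_run = ["robot", "--output", first_output, test_path]
    # --rerunfailed selects only the tests that failed in the given output file
    rerun = ["robot", "--rerunfailed", first_output,
             "--output", rerun_output, test_path]
    # rebot --merge replaces the original results with the rerun results
    merge = ["rebot", "--merge", "--output", merged_output,
             first_output, rerun_output]
    return first_run, rerun, merge


def run_with_retry(test_path):
    """Run the tests once; if anything fails, rerun the failures and merge."""
    first_run, rerun, merge = rerun_commands(test_path)
    if subprocess.run(first_run).returncode == 0:
        return 0  # everything passed on the first attempt
    subprocess.run(rerun)
    # The return code of rebot reflects the status of the merged results
    return subprocess.run(merge).returncode
```

Calling `run_with_retry("tests/")` would need Robot Framework installed and a real test directory; `rerun_commands` can be inspected on its own to see which commands would run.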

A personal note: Gil is already working his ass off on Jenkins with a very limited amount of time available and the "please run jobs" enhancement is a major accomplishment and improvement! Keeping the CI infrastructure running is lots of work and he has basically been doing this on his own for years now. I have completely failed to contribute anything here in recent years. If Gil didn't do this, our CI infrastructure would completely break apart and core development wouldn't be possible in the way we know it. Together with Maurits, he is one of the major hidden gems in the Plone community. They do their incredibly important work for the community and do not talk too much about it.

The only thing I am asking here is: please keep that in mind when criticizing the current status quo.

zopyx · March 21, 2019, 8:00am

No offense intended, and I am not blaming anyone. Just asking because I have had to deal with the CI for PRs multiple times in the past. In order to receive contributions there must be a stable pipeline. If robot tests are fragile to run, then move them out of the standard pipeline with unittests.

tisto · March 21, 2019, 8:04am

Yeah. I got that. As said, your expectations are reasonable and I thank you for sharing them. Our CI system is a complex beast and running lots of robot tests in a stable manner continuously is not trivial.

gforcada · March 22, 2019, 12:06am

Pull requests are always welcome:

jenkins jobs are defined in jenkins.plone.org repository
jenkins master configuration is on an ansible playbook
jenkins node configuration is on another ansible playbook
the orchestration of those playbooks is on jenkins.plone.org repository

I'm happy to help review pull requests; I will do my best to review them in a timely manner.

gforcada · March 22, 2019, 12:14am

@zopyx do you see a big Rebuild link in the left column of a build job's page?

I made the link so big and obvious as to ensure no one misses it. Do you see it, or maybe it is only available to admins?

My workflow with pull requests is:

create the pull request
add the jenkins trigger job comment
get back to the pull request on github to see if it finished and its status
if any of the jobs fail, look at the failures
if they feel unrelated, click on the enormous Rebuild button to give it a second try
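
The second step above (adding the trigger comment) could in principle be scripted. A minimal sketch, assuming the GitHub REST API endpoint for issue comments is used and that the Jenkins bot treats a comment posted through the API like one typed in the web UI (not confirmed anywhere in this thread). The function name, the PR number `123`, and the token are placeholders.

```python
import json
import urllib.request

# Comment text quoted earlier in this thread
TRIGGER_COMMENT = "@jenkins-plone-org please run jobs"


def build_trigger_request(owner, repo, pr_number, token):
    """Build (but do not send) the request that posts the trigger comment.

    Pull requests share the issue comments endpoint on GitHub.
    """
    url = (
        f"https://api.github.com/repos/{owner}/{repo}"
        f"/issues/{pr_number}/comments"
    )
    body = json.dumps({"body": TRIGGER_COMMENT}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage: 123 and the token are placeholders, not real values.
request = build_trigger_request("plone", "plone.restapi", 123, "<your-token>")
# urllib.request.urlopen(request) would actually post the comment.
```

The request is only built, not sent, so the snippet can be run safely without credentials.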

One downside of triggering a job with a comment is that the email that used to go to the person who triggered the job is no longer sent to them; it goes to the email address behind the jenkins-plone-org github user instead (which is me right now). I will try to find some time to fix this side effect so that users who trigger a job get an email back.

zopyx · March 22, 2019, 7:05am

I saw the "Build" button within the main content area but not in the sidebar, but I could be wrong or blind... I was looking for the rebuild option several times

gforcada · March 23, 2019, 10:51pm

Which one?

I never stated that anywhere, but the only buttons/links one should click on Jenkins (regarding triggering jobs) are the ones in the side bar that I explicitly oversized to make them obvious. So far, those are:

"Build now" and "Rebuild last" links on regular jobs (i.e. the main jobs)
"Build with parameters" link on jobs with parameters (i.e. the pull request jobs)
"Rebuild" link on a specific build (i.e. https://jenkins.plone.org/job/pull-request-5.2-3.7/645/ rather than https://jenkins.plone.org/job/pull-request-5.2-3.7)

With the special comment on github and these three links, one should be able to restart any job at any time. If one does not see those, please state it!
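
To illustrate the difference between a job page and a specific build page from the list above, here is a small sketch that classifies a Jenkins URL by its path. It assumes the `/job/<name>/` and `/job/<name>/<number>/` layout seen in the URLs in this thread; the function name is made up for this example.

```python
from urllib.parse import urlparse


def jenkins_page_kind(url):
    """Return 'job', 'build', or 'unknown' based on the URL path."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    # A job page looks like /job/<job-name>
    if len(parts) == 2 and parts[0] == "job":
        return "job"
    # A build page looks like /job/<job-name>/<build-number>[/...]
    if len(parts) >= 3 and parts[0] == "job" and parts[2].isdigit():
        return "build"
    return "unknown"


print(jenkins_page_kind("https://jenkins.plone.org/job/pull-request-5.2-3.7/645/"))
print(jenkins_page_kind("https://jenkins.plone.org/job/pull-request-5.2-3.7"))
```

The "Rebuild" link belongs to the first kind of page (a specific build), while "Build with parameters" belongs to the second (the job itself).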