TrigMC software release coordination: PTD's logbook - page for October 2018

Notes

Enter any new Merge Requests (MRs) under their shift date, with shift dates in reverse chronological order.

For each shift date, list the NEW MRs and any relevant comments/questions. Note down when I (or someone else) merged/closed each request, and mark the MR as done/closed by ticking the box ([x]). If an MR takes longer than a day to merge, leave it listed under the original shift date, but add comments prefaced with the date of the comment/action. Only MRs that are still active should have the box unticked ([ ]).

Therefore each MR will appear only once in the log, under the date on which I first became aware of it.

31/10/2018

[x] !15506 fix dphi issue (opened by Tomasz Bold)

2.06pm This MR was opened by Tomasz Bold a couple of hours ago. The CI pipeline has been running for about one hour. More details of the underlying issue that this MR addresses can be found in the JIRA ticket ATR-18996.

1/11/2018 10.29am The CI pipeline has concluded OK, and the MR is now review-approved. Merged OK.


[x] !15469 Sweeping 15339 from 21.0 to 21.0-TrigMC. Add art include in head of test files RecJobTransformTests and RecExRecoTest (opened by Atlas Nightlybuild)

10.12am Opened automatically overnight. Pipeline OK and review-approved. Merged OK.

26/10/2018

[x] New release/21.5.10 off of 21.0-TrigMC nightly/2018-10-23T2139

2.28pm Andrzej Olszewski has posted on the ATLHI-226 JIRA thread requesting a new release, following their tests of the cache: specifically, a new release 21.5.10 made from nightly/21.0-TrigMC,r2018-10-23T2139.

2.45pm Checked that the release candidate number is indeed 21.5.10.

2.49pm Started the Jenkins job to copy the files over to eos. The job failed almost immediately; the console output showed an error due to a directory that is either full (no space left) or that should have been empty before the job started. I then noticed that all the earlier copy_RPMs_to_eos jobs run today had failed for the same reason.

3.03pm Created a JIRA ticket (ATLINFR-2742) requesting the new release and also reporting the problem with the copy job. The ticket is assigned to Alessandro de Salvo.

3.14pm Oana has commented on the JIRA ticket: the problem is known (apparently my files have been copied over, but some repository data is missing). She is on the case, will try again in 10 minutes, and will let me know where we stand.

3.27pm New tag created: release/21.5.10. Info message: "This release from the 21.0-TrigMC branch is for HI MC production. For more information see the JIRA tickets ATLHI-226 and ATLHI-220."

3.32pm Updated version.txt to the new release candidate number.

3.43pm Forgot to tick the "Start new MR" box, so had to create the MR (!15386) for updating the release candidate number by hand... Done.

3.49pm MR !15386 merged OK. The CI pipeline started and then, for some unknown reason, went straight into "pending" status...

4.05pm Oana has now copied the missing "repodata" files over to the correct location. Over to Alessandro to deploy on the Grid.

27/10/2018 12.22pm Have checked: release 21.5.10 is deployed on 500+ nodes on the Grid. Have posted on the ATLHI-226 JIRA thread to let Andrzej et al. know. Will now email atlas-trig-relcoord as well.

27/10/2018 12.32pm Regarding the pipeline for MR !15386 (see above): it restarted automatically yesterday evening and passed OK.

25/10/2018

[x] !15317 Merge release/21.0.85 into 21.0-TrigMC (opened by Rafal Bielski)

9.28am Rafal opened this request yesterday evening; it brings 21.0-TrigMC in line with 21.0.85 (until now it had been aligned with 21.0.82). The CI pipeline has passed and the MR has been review-approved by the L1 shifter. Merged OK.

22/10/2018

[x] !15232 Merge nightly/21.1/2018-10-21T2147 into 21.0-TrigMC (opened by Rafal Bielski)

12.00pm This request was opened 30 mins ago. It is apparently an update requested in the ATLHI-226 JIRA ticket for the new HI MC production. The issue seems to be with the threshold for one of the single-muon triggers, which needs to be changed; the ticket mentions that once the developers are happy with these changes (i.e. the nightly is deemed OK) they will want a release made from it. The CI pipeline is running.

12.20pm Have checked that the 21.1 nightly above (nightly/21.1/2018-10-21T2147) is stable (at least as far as the ART summary page is concerned: no sign of any new errors compared to previous nightlies). Ditto for the most recent 21.0-TrigMC nightlies.

23/10/2018 9.40am The CI pipeline failed yesterday on the "required tests". Rafal has since identified the cause of the problem and asked experts for advice on how to solve it. The L1 shifter has removed the "review-pending" label and added "review-user-action-required".

23/10/2018 3.25pm Rafal posted a message: after consultation with the relevant experts, they are of the opinion that the issue is with the test rather than with the MR. He therefore asks me to merge despite the CI pipeline failure. (They will fix the test afterwards.) Merged OK.

24/10/2018 9.34am I have had a quick look at the latest 21.0-TrigMC nightly (2018-10-23T2139) on ART, as well as on NICOS (the GIT group as well as the ATN group). The build had no errors and the tests all completed. Nothing looks out of the ordinary to my eye, especially when compared to previous nightlies. Have posted a message on git to Rafal et al. to let them know. Will wait to hear from them about how to proceed, once they have made more detailed checks on this nightly.

24/10/2018 3.45pm Note that Rafal has posted on the ATLHI-226 JIRA thread a request to the HI experts to look at the nightly and then advise whether they want a release. Therefore, keep an eye on this ticket for news about a possible release.

25/10/2018 9.18am (Yesterday evening Andrzej posted a comment with some details of tests he is carrying out; mostly OK, but still investigating a problem he uncovered.) No new comments posted since, on either this MR page or in the JIRA thread.

26/10/2018 9.55am There have been more posts on the JIRA thread, possibly uncovering other small issues. No request for a release yet.

26/10/2018 2.28pm Release/21.5.10 has now been requested; see new entry above.

17/10/2018

[x] !15074 Sweeping 14887 from 21.0 to 21.0-TrigMC. Modify Input of MuonRecRTT and Sweep Updates for MuonGeomRTT (opened by Atlas Nightlybuild)

1.55pm Opened automatically 11h ago. The pipeline started then but does not seem to have got going (the job is listed as "failed"); ATLAS Robot restarted it 30 mins ago, and it is still running. For reference: !14887 was merged into 21.0 yesterday evening.

10.09pm The pipeline concluded successfully. As this is a sweep of a previously approved MR, it should be safe to approve immediately. Merged OK.

10/10/2018

[x] !14895 Update 21.0-TrigMC to 21.0.82 (opened by Julie Hart Kirk)

12.43pm Request opened about 30 mins ago. Pipeline job started running, but was terminated within seconds...

11/10/2018 10.08am Pipeline passed and "review-approved". Merged OK.


[x] !14887 Modify Input of MuonRecRTT and Sweep Updates for MuonGeomRTT (opened by Zhidong Zhang)

11.21am Opened 15 mins ago; the pipeline is running. For some reason, despite being tagged with "21.0-TrigMC", the MR is not for me to approve...? (Maybe this is because the MR is also tagged with "21.0" and "21.3", and is therefore targeted at other branches as well, for which I do not have approval rights.)

2.45pm Tags for this MR have now been changed to a more sensible "alsoTargeting:21.0-TrigMC" and "alsoTargeting:21.3", alongside the previous label "21.0". Therefore closing this entry.

9.56pm Just for information: the pipeline has passed. (This is relevant as this MR will be automatically swept into 21.0-TrigMC soon, probably even tonight.)

7/10/2018

[x] !14806 Fixing zero count in trktc chains (21.0-TrigMC) with algorithm of tracks and topoclusters (ATR-16263) (opened by Renjie Wang)

7.10pm MR opened yesterday afternoon (Saturday). The CI pipeline passed. As this is not a sweep or an otherwise approved/tested MR, I will wait for the "review-approved" label from the L1 shifter.

8/10/2018 10.02am Still tagged "review-pending-level-1"...

8/10/2018 1.05pm L1 shifter has given their approval, in the past hour. Merged OK.

3/10/2018

[x] Making release/21.5.9 out of nightly/21.0-TrigMC/2018-10-02T2138 (eventually made from the 2018-10-08T2138 nightly instead; see the 9/10/2018 entries)

(NB: for earlier info on this topic, see also the entries under !14635 below and the discussion on the JIRA ticket ATLHI-220.)

5.29pm Still waiting for the green light to go ahead with this; see also the many messages exchanged on the JIRA ticket ATLHI-220. Keep an eye on it... tag the release tomorrow, if at all.

5/10/2018 12.06pm Still no change on JIRA; keep waiting for the green light.

7/10/2018 7.05pm Still no change on JIRA; keep waiting for the green light.

8/10/2018 9.58am (CERN) Iwona has posted another message about the issue on JIRA. To try to get things unstuck, she has tagged a couple of additional experts, to see if anyone can figure out the solution to the issue pointed out by Andrzej.

8/10/2018 2.46pm There has been a flurry of messages on the JIRA thread: the previously highlighted problem seems to have been dealt with; build request could be coming soon. Nevertheless, Andrzej will still have to do a quick test with a small sample of 1000 events before that happens. Keeping my eyes peeled.

9/10/2018 10.09am A few more messages on JIRA; it looks like we will go for an official release. Will confirm details with Iwona and Rafal on the JIRA thread: is it to be based on the 2018-10-02T2138 nightly tag, or do they want a more recent nightly? (The MRs merged after that were !14637 and !14806.) The release number will be 21.5.9.

9/10/2018 10.40am Andrzej Olszewski points out that the 2018-10-02T2138 nightly may by now have been overwritten/erased (it is just over one week old...). I started the job to copy the RPM files over to the official CVMFS location, but the job failed almost immediately because it could not find the files... So the nightly has indeed been erased.

9/10/2018 10.49am Andrzej ran some tests earlier today using the 2018-10-08T2138 nightly, and he is happy to use it. Rafal also confirms he is happy to go with that.

9/10/2018 10.51am Started the Jenkins job to copy over the RPM files for the 2018-10-08T2138 nightly.

9/10/2018 11.54am Just checked: Jenkins job finished successfully.

9/10/2018 12.01pm Created JIRA ticket ATLINFR-2691 for installation of new release; job assigned to Alessandro de Salvo.

9/10/2018 12.16pm Created new tag (release/21.5.9), and then updated release candidate number to 21.5.10 (via new MR).

9/10/2018 12.19pm Approved the MR !14858 for the release candidate number update. CI pipeline started.

9/10/2018 12.22pm "Merged immediately": OK. (Left the pipeline running.)

9/10/2018 4.01pm Still no sign of activity on the JIRA ticket for the new release... Also checked the Grid installation website: release 21.5.9 is not yet known there...

9/10/2018 5.32pm Alessandro has started the installation on the Grid, about 30 mins ago.

9/10/2018 6.05pm Installation on the Grid seems to be progressing apace! According to the database, the new release is already installed in 279 grid nodes (!!) and is pending installation in one other node.

9/10/2018 6.15pm Checked installation on the Grid again: now installed in 440 nodes, 3 in progress. Clearly, this is far from over... Will wait to hear from Alessandro as to when installation is complete.

10/10/2018 8.48am Grid installation seems to be complete (installed in 528 nodes; and 5 still pending, from yesterday evening). Have confirmed this on the HI JIRA thread, to Iwona and the others.

10/10/2018 8.54am Emailed atlas-trig-rel-coord et al. to announce the new release.

1/10/2018

11.35am Something is wrong with the TrigMC nightlies on the ART page: several are missing, listed as "n/a" (19-23/9, 25, 27, 29 and 30/9), including the 2018-09-19 nightly that was made into a release last week... This does not seem to affect nightlies from other branches (e.g. I checked the master and 21.0 branches). However, on the old NICOS page the 21.0-TrigMC nightlies are all listed, without any gaps...??

2.37pm Emailed Emil Obreshkov (cc Oleg Zenin) about the issue above. Also updated the preamble on the main logbook TWiki page to reflect my latest understanding of the multiple TrigMC nightlies and nightly webpages.

2/10/2018 9.18am Oleg Zenin has replied to my email:

It looks like the log processing script is receiving SIGTERM/SIGKILL from some other process in a few minutes after it's invoked from the nightly build job. I saw this before for other nightlies, need to investigate.


[x] !14635 Update 21.0-TrigMC to 21.1,2018-09-30T2139 (opened by Rafal Bielski)

3.37pm Opened just a few minutes ago to bring 21.0-TrigMC in line with the very latest 21.1 nightly (30/9/2018). A large-ish number of MRs are included. The pipeline is running. Check the TrigMC nightly tomorrow (or the day after, depending on when the MR is actually accepted) to assess the impact of all this.

3.51pm Rafal has mentioned that this update is needed for HI MC production, as discussed in ATLHI-220. Subject to them being happy with the nightly tests, this presumably means we will be tagging it as an official 21.0-TrigMC release...

4.06pm Rafal (CC Julie) has confirmed that is the case.

2/10/2018 9.08am The pipeline passed yesterday evening. Given the number of changes, and the fact that we are aligning TrigMC with a 21.1 nightly rather than a tagged and tested release, will wait for the review-approved label to be added by the L1 shifter.

2/10/2018 3.00pm (MR still not approved by the L1 shifter.) Rafal asked (in the comments) a few minutes ago whether we can go ahead with the merge, so as to get it into tonight's nightly. Happy to do so.

2/10/2018 3.03pm Merged OK.

3/10/2018 10.30am Have finished looking at the nightly build and test results. The build went OK, although there was one error in the Tools/ART package due to a failed symlink creation. (This appears to happen regularly: it seems that the same symlink creation is attempted twice, or even more times, so the second attempt fails with a "File already exists" error. This seems to be circumstantial, as it happens on some nights and not others, rather than a bug in the script creating the links. It is probably of little consequence, provided it does not stop the execution of the script, since the symlink gets created anyway.) The ATN tests all completed, and are no worse (and no better) than in previous nightlies. Will email Rafal now.
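The "File already exists" failure described above is easy to reproduce in isolation. The sketch below is hypothetical: it assumes the links are made with a plain symlink call and that a repeat attempt is harmless when the existing link already points to the right target. The actual Tools/ART script was not examined, and `make_link` / `make_link_idempotent` are illustrative names only. It shows the naive call failing on a second attempt, and a variant that treats an identical existing link as success.

```python
import os
import tempfile


def make_link(target: str, link_name: str) -> None:
    """Naive version (hypothetical): a second call for the same link_name raises FileExistsError."""
    os.symlink(target, link_name)


def make_link_idempotent(target: str, link_name: str) -> bool:
    """Create link_name -> target only if it is not already in place (hypothetical helper).

    Returns True if a new link was created, False if an identical link
    already existed. Re-raises FileExistsError if link_name exists but is
    something else (a regular file, or a link to a different target).
    """
    try:
        os.symlink(target, link_name)
        return True
    except FileExistsError:
        # An earlier attempt (or another process) got there first:
        # accept it only if the existing link points where we wanted.
        if os.path.islink(link_name) and os.readlink(link_name) == target:
            return False
        raise


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        target = os.path.join(workdir, "target.txt")
        open(target, "w").close()
        link = os.path.join(workdir, "link")

        make_link(target, link)
        try:
            make_link(target, link)  # repeated attempt, as seen in the nightly
        except FileExistsError as err:
            print("naive second attempt failed:", err.strerror)

        print("idempotent attempt created a link:", make_link_idempotent(target, link))
```

The design choice here is to compare the existing link's target rather than silently ignore every FileExistsError, so that a genuinely wrong pre-existing file would still surface as an error.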

3/10/2018 10.46am Email sent to Rafal.

3/10/2018 11.05am Rafal is happy with the nightly, but has asked the HI group to make further checks on specific HI trigger test counts (via the JIRA ticket ATLHI-220) before we decide whether or not to make a tagged release. Will keep an eye on the JIRA ticket.


[x] !14637 WIP: ATR-18765: Muon deserialisation fix for bs-streamerinfo.root (opened by Catrin Bernius)

4.14pm Opened a few minutes ago. The CI pipeline is running. Note that the MR is marked as a Work in Progress ("WIP:" prefix in the title), therefore it cannot be merged (the "Merge" button is not even available). Will have to wait for Catrin to remove the "WIP:" prefix from the title for that to change.

4.22pm It is probably a good idea to keep this MR (whenever it becomes available to merge), and any other subsequent non-HI MRs, separate from !14635 above: i.e. approve !14635 first, wait for the nightly, and only then approve subsequent MRs.

2/10/2018 9.10am Pipeline passed overnight; MR is still marked as Work in Progress, so not available to merge.

2/10/2018 12.35pm Rafal has added a comment pointing out that there is likely to be a conflict between !14635 and this MR, so !14635 should be merged first (and presumably made into a tagged release?) before going ahead with !14637 (or even: should !14637 be changed to avoid future conflicts?).

2/10/2018 3.58pm Rafal has re-triggered a build/pipeline of this MR, now that !14635 has been merged... At first I did not see the point, since no changes have been made to the present MR. OK, I think I see it now: the build will be of a cache that includes all of the changes from MR !14635, so if there is a conflict (between two files, as Rafal feared) the new pipeline should presumably fail, and we would then know to refrain from merging !14637 until the issue is resolved.

3/10/2018 9.40am Have just noticed that Rafal removed the WIP: prefix from this MR (yesterday afternoon); the latest pipeline succeeded, therefore he is presumably happy to go ahead and merge this.

3/10/2018 11.01am Rafal has confirmed he is happy to go ahead with this MR. Merged OK.

-- PedroTeixeiraDias - 01 Oct 2018



Topic revision: r38 - 28 Nov 2018 - PedroTeixeiraDias

 