[02:00:00] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[03:31:02] *** Quits: tsuyoshi (b42b2067@gateway/web/freenode/ip.180.43.32.103) (Quit: Page closed)
[05:15:24] *** Quits: johnmeneghini1 (~johnmeneg@pool-96-252-112-122.bstnma.fios.verizon.net) (Quit: Leaving.)
[08:07:05] yeah, I suppose once I get this worked out, or if someone gets an Ubuntu VM up in CI first, we'll need to tweak a few scripts to account for it... no biggy
[08:09:21] FYI if I comment out the last loop in the multi_process test it makes it to the next test. Will try running some of those individual perf and identify commands on their own and see what happens...
[08:11:00] *** Joins: sethhowe (~sethhowe@192.55.54.39)
[09:15:41] *** Joins: jstern_ (jstern@nat/intel/x-esibzdwpjwimuayt)
[09:45:14] g'morning -- when we do a doc update on spdk.io, do we get build artifacts (the preview page) from the new gerrithub CI process?
[09:46:38] spdk.io is broken into two parts
[09:46:50] spdk.io/doc, which is all generated from doxygen stuff in the main spdk repository
[09:46:59] so if you update that and submit a regular patch, it will generate a preview
[09:47:07] although the preview will not have the same stylesheet
[09:47:12] example: https://ci.spdk.io/builds/review/512035f9bba4b0bb5b23273d95247085937f7aa3.1497386144/wkb-fedora-03/doc/html/
[09:47:44] updates to the other parts of spdk.io are done through an Intel internal Gerrit (the old process)
[09:47:49] the source for the website is not public
[09:48:11] but the website is intended to mostly be "static"
[09:48:20] and the docs are what really get updated
[09:48:25] OK, understood. I was trying to link to the preview for the Vision statement, which was a doc update, but couldn't find it
[09:48:38] wkb-fedora-03 builds doc previews as part of its test
[09:49:03] the link is broken though
[09:49:24] let me see if I can't sort it out
[09:49:57] I can just naively tack on /wkb-fedora-03/doc/html/ to the link added to the patch and it seems to give me the preview
[09:50:07] https://ci.spdk.io/builds/review/512035f9bba4b0bb5b23273d95247085937f7aa3.1497386144/wkb-fedora-03/doc/html/
[09:50:08] yeah but the link to the vision page is broken
[09:50:20] yeah
[09:50:20] because I didn't add it to the doxygen manifest
[09:50:21] :)
[09:50:21] one sec
[09:54:30] it's rebuilding now
[10:00:37] http://spdk.intel.com/public/spdk/builds/review/101add0022f6fc524acca95a189efc2bd3aad383.1497545611/wkb-fedora-03/doc/html/vision.html
[10:02:11] *** Quits: sethhowe (~sethhowe@192.55.54.39) (Remote host closed the connection)
[10:02:17] * peluse reminds himself to add "move spdk.io to the new process once some other dust settles" to the TODO list...
[10:04:56] spdk.io, besides the doc section, contains two types of things
[10:05:01] process-related stuff, which is rarely updated
[10:05:05] like how to submit a patch
[10:05:10] and blog posts
[10:05:11] totally understand
[10:05:24] the blog posts often have to go through legal review before going public
[10:05:24] nothing is unclear there now :)
[10:05:34] so we can't move to an open review process
[10:05:34] yeah, I get that too
[10:05:38] not true
[10:05:56] will explain it to you later
[10:06:40] if you have sign-off from legal, then I'm happy to make it public
[10:07:59] no an issue
[10:08:04] no=not :)
[10:08:19] after frying some other fish though....
[10:12:42] now I'm hungry
[10:13:41] squirrel!
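For the broken vision-page preview above ("because I didn't add it to the doxygen manifest"): Doxygen only processes sources listed in the Doxyfile's INPUT setting, and a new page is only easy to reach once an existing page links to it, so adding a page typically means touching both. The fragment below is a generic illustration of the \page/\subpage mechanism only; the page names and the way the SPDK doc tree actually wires its pages together are assumptions, not taken from the log.

```c
/*
 * Generic Doxygen example, not the actual SPDK doc source. Declaring a page
 * makes Doxygen emit vision.html, provided the file containing it is listed
 * in the Doxyfile's INPUT setting.
 */

/**
 * \page vision Vision
 *
 * Body of the vision statement goes here (it could equally live in a
 * markdown file that the Doxyfile lists as input).
 */

/**
 * \mainpage SPDK Documentation
 *
 * Reference the new page from an existing one so readers can navigate to it:
 * - \subpage vision
 */
```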
[10:13:43] *** Joins: johnmeneghini (~johnmeneg@216.240.19.91)
[10:18:06] interesting findings on my hang running nvme.sh from autorun.sh...
[10:18:19] • Commenting out the while loop at the end of multi_process
[10:18:20] • Keeping the while loop but commenting out the bdev and event tests that come before it
[10:18:20] • Running the perf and identify commands w/ the same params as nvme.sh but on their own
[10:18:58] and when it hangs, I have 1 thread of arbitration, 2 of perf and one of identify. Their current states are at the gist I'm about to post. When I kill the arbitration thread everything finishes fine
[10:19:07] https://gist.github.com/peluse/260980205ed2256da9250749bfeae477
[10:19:22] BTW those three bullet items are things that work OK
[10:20:21] *** Quits: johnmeneghini (~johnmeneg@216.240.19.91) (Quit: Leaving.)
[10:24:05] well, everything doesn't finish fine after I kill that process - poor wording :) segfault after that, but I exit gracefully; the 2 perf threads and the 1 identify thread are still running
[10:26:28] * peluse offline for a few hours...
[10:52:17] *** Joins: sethhowe (sethhowe@nat/intel/x-dsuplzvzzwndurrh)
[10:52:52] drv: Take a look at this for our test pool runs: http://rr-project.org/
[10:56:50] seems like it could be useful
[11:54:17] *** Joins: johnmeneghini (~johnmeneg@216.240.19.91)
[11:55:41] *** Quits: johnmeneghini (~johnmeneg@216.240.19.91) (Remote host closed the connection)
[12:10:32] *** Quits: jstern_ (jstern@nat/intel/x-esibzdwpjwimuayt) (Ping timeout: 260 seconds)
[12:15:58] *** Joins: jstern_ (~jstern@192.55.55.37)
[12:17:05] *** Joins: jstern__ (jstern@nat/intel/x-xmakyqqasvzlpemq)
[12:20:44] *** Quits: jstern_ (~jstern@192.55.55.37) (Ping timeout: 268 seconds)
[13:54:37] jimharris: I posted a comment on https://review.gerrithub.io/#/c/365712/ - I think driver_ctx isn't correctly handled in vbdev_split if the vbdev itself needs a context
[13:56:56] in this case my use is OK (I think) since I only use it in the reset case where I don't pass the original bdev_io to the base_bdev
[13:57:03] hmm, I see
[13:57:03] but maybe I should clarify that in a comment
[13:57:09] yeah, that is pretty tricky
[13:59:04] the reset stuff is looking much more solid now - going back and re-running the vhost tests with resets enabled like before on my system
[14:04:42] we still have a bug if the blockdev module fails to abort an I/O on a reset
[14:05:10] the NVMe driver will never fail after my patches (assuming it works like we've coded it), because it aborts everything out of software queues
[14:07:25] are you referring to the split/gpt/lvol case?
[14:07:41] yeah, the one we drew on the whiteboard
[14:08:18] for NVMe that other LUN will now at least see a bdev_io completion guaranteed
[14:08:34] because the blockdev_nvme module guarantees all I/O will be aborted on a reset
[14:08:45] by deleting all of the queue pairs
[14:09:05] for other blockdev modules, that may not be the case (i.e. aio)
[14:09:41] and either way, instead of showing those aborts to the other LUN, the split module should be resubmitting them
[14:10:26] the nvme patches you did don't really help the split/gpt/lvol case though - they mostly help with thread safety on the nvme reset operation
[14:10:44] even with your patches, we could have I/O queued up in the bdev layer that don't get aborted
[14:10:56] i.e. on a different split/gpt/lvol associated with the same namespace
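A rough sketch of the resubmit-on-abort idea from the 14:08-14:10 exchange above: the base bdev (blockdev_nvme) aborts all software-queued I/O on a reset by deleting its queue pairs, and the split vbdev resubmits those aborted I/Os instead of surfacing the aborts to an unrelated upper LUN. The helper names (base_io_was_aborted_by_reset, split_queue_for_resubmit) are hypothetical stand-ins, not the actual vbdev_split code, and header names vary across SPDK versions.

```c
/*
 * Illustrative sketch only -- not the actual vbdev_split implementation.
 * Module-side declarations live in "spdk/bdev_module.h" in newer SPDK
 * trees; older trees used different internal headers.
 */
#include "spdk/bdev.h"
#include "spdk/bdev_module.h"

/* Hypothetical helpers standing in for logic that would live in the split module. */
static bool base_io_was_aborted_by_reset(struct spdk_bdev_io *base_io);
static void split_queue_for_resubmit(struct spdk_bdev_io *parent_io);

/* Completion callback the split vbdev registers when submitting I/O to its base bdev. */
static void
split_base_io_complete(struct spdk_bdev_io *base_io, bool success, void *cb_arg)
{
	struct spdk_bdev_io *parent_io = cb_arg;

	if (!success && base_io_was_aborted_by_reset(base_io)) {
		/*
		 * The base bdev aborted this I/O because a reset was issued on
		 * behalf of some other split/gpt/lvol sharing the same namespace.
		 * Queue it for resubmission once the reset finishes rather than
		 * reporting an abort to this LUN's user.
		 */
		split_queue_for_resubmit(parent_io);
	} else {
		/* Normal path: propagate the base bdev's status to the parent I/O. */
		spdk_bdev_io_complete(parent_io, success ?
				      SPDK_BDEV_IO_STATUS_SUCCESS :
				      SPDK_BDEV_IO_STATUS_FAILED);
	}

	spdk_bdev_free_io(base_io);
}
```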
[14:11:16] my patches fix the case where there was I/O pending at the nvme device
[14:11:30] and a reset comes in
[14:11:39] and those completions would not have been propagated because of the gencnt
[14:11:58] i.e. vbdev split would just never get a completion for those
[14:12:13] oh - i see what you're saying - yes, aio needs to also be fixed to reset all of the io_context_ts for all of the channels
[14:13:10] jimharris: any opinion on whether we can merge the vhost-blk qemu changes to the main spdk branch? 1 0 0 │
[14:13:13] asserts 29 29 29 0 n/a │dverkamp-desk2:~/src/spdk
[14:13:17] oops, wrong copy buffer
[14:13:20] lol
[14:13:25] https://review.gerrithub.io/#/c/363361/
[14:13:36] or should we put it on a new branch temporarily?
[14:15:32] hmm, looks like your last patch that re-enables vhost causes a test failure - maybe that sleep that you shortened?
[14:15:56] it looks like it started to try to do an RPC before it was ready
[14:16:01] ugh - I didn't mean to push that yet
[14:16:12] i accidentally added that part to the patch
[14:16:29] on my system i don't need 25s so i shortened it to 10s
[14:16:41] can't we use the wait-for-RPC logic we already have?
[14:16:54] but regardless - i'm going to wait until i get a clean run on 100 iterations before i re-enable that (33 and counting...)
[14:17:00] cool
[14:23:35] i'm ok checking this into the main spdk/qemu master branch
[14:24:27] do you have any problems with that?
[14:29:09] sounds good to me
[14:29:22] then we can get the spdk vhost-blk patches unblocked
[14:33:00] bwalker: i'm wondering if we could make your bdev_nvme changes for reset more generic - i.e. add new bdev callbacks for something like "pre_reset" and "post_reset"
[14:33:28] so all of the channel iteration is driven by common code and aio and other bdev modules don't have to duplicate it
[14:33:50] you could move it up to the bdev layer and then just put all of the I/O channels
[14:33:57] and re-get them after
[14:34:41] to trigger nvme queue deletion and re-creation?
[14:34:44] yeah
[14:34:56] i don't think that will work - the upper layers will still have open references to them
[14:35:24] the upper layers have references to the upper layer channels
[14:35:39] those channels are pointing to the lower ones, but that's the reference we'd be calling put on
[14:36:54] * jimharris looking again at the bdev channel code
[14:38:08] ah - yes, I missed this, the bdev channel has the only reference to the module channel
[14:38:33] yeah, I don't know why I didn't code it generically in the bdev layer to start with
[14:41:51] drv: did you not feel comfortable giving this a +2? https://review.gerrithub.io/#/c/364153/
[14:42:20] jimharris: I just wanted to make sure you had seen the event_perf changes
[14:42:23] I'm OK with merging it
[14:49:01] bwalker: on your subsystem patches - can you remove the Daisuke Aoyama copyright? everything you have in those files is Intel original code
[14:49:28] crap, I copied it from a file he must have been on
[14:49:33] k, let me respin
[14:50:01] thanks - i checked the first two in the series and they look good except for that copyright line
[14:50:33] sethhowe: what's our topic for today's technical training?
[14:55:17] jimharris: We will be covering nvmf today.
[15:20:16] *** Quits: jstern__ (jstern@nat/intel/x-xmakyqqasvzlpemq) (Remote host closed the connection)
[16:33:22] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
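The pre_reset/post_reset idea from the 14:33-14:38 exchange, sketched below: the common bdev layer drives the per-channel iteration around a reset, and each module only supplies the channel-local teardown and re-setup (blockdev_nvme deleting and re-creating its queue pairs, aio resetting its io_context_ts). This is workable because the bdev channel holds the only reference to the module channel. Everything here is a conceptual sketch with hypothetical names (bdev_module_reset_hooks, bdev_for_each_module_channel, bdev_generic_reset), not the actual SPDK bdev API, and the real flow would be asynchronous and completion-driven rather than two back-to-back loops.

```c
/*
 * Conceptual sketch only; the hooks and iterator below are hypothetical.
 * spdk_io_channel comes from "spdk/io_channel.h" in SPDK trees of this era
 * ("spdk/thread.h" in newer ones).
 */
#include "spdk/io_channel.h"

/* Hypothetical per-module hooks a bdev module would register. */
struct bdev_module_reset_hooks {
	/* Called on each module channel before the reset is issued; e.g.
	 * blockdev_nvme deletes the channel's qpair here (aborting all
	 * software-queued I/O), aio tears down its io_context_t. */
	void (*pre_reset)(struct spdk_io_channel *module_ch);

	/* Called on each module channel after the reset completes; the
	 * module re-creates whatever pre_reset tore down. */
	void (*post_reset)(struct spdk_io_channel *module_ch);
};

/* Hypothetical iterator: invoke fn on every module channel of the bdev,
 * on that channel's owning thread. A stand-in for the bdev layer's real
 * channel-iteration machinery. */
void bdev_for_each_module_channel(void *bdev,
				  void (*fn)(struct spdk_io_channel *module_ch));

/* Hypothetical common-code reset path. In reality each step would be
 * asynchronous: iterate channels, wait for completion, submit the reset,
 * wait again, then iterate for post_reset. */
static void
bdev_generic_reset(void *bdev, const struct bdev_module_reset_hooks *hooks)
{
	/* Safe for common code to drive because the bdev channel holds the
	 * only reference to the module channel (per the 14:38 observation). */
	bdev_for_each_module_channel(bdev, hooks->pre_reset);

	/* ... submit the actual device reset here ... */

	bdev_for_each_module_channel(bdev, hooks->post_reset);
}
```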
[16:47:14] jimharris: did this vhost reset test re-enable pass your 100-test run?
[16:50:51] yep
[16:51:16] alright, sounds good enough to me
[16:55:01] sethhowe, jimharris, bwalker: I set up a Trello board to track CI work: https://trello.com/b/3DvD85zi/continuous-integration
[16:55:05] feel free to fill in more stuff
[16:58:39] cool, thanks!
[18:51:42] bueno
[22:22:09] *** Joins: ziyeyang_ (ziyeyang@nat/intel/x-tvqutnbuqxkruztp)