[02:00:00] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[03:31:02] *** Quits: tsuyoshi (b42b2067@gateway/web/freenode/ip.180.43.32.103) (Quit: Page closed)
[05:15:24] *** Quits: johnmeneghini1 (~johnmeneg@pool-96-252-112-122.bstnma.fios.verizon.net) (Quit: Leaving.)
[08:07:05] yeah, I suppose once I get this worked out, or if someone gets an Ubuntu VM up in CI first, we'll need to tweak a few scripts to account for it... no biggy
[08:09:21] FYI if I comment out the last loop in the multi_process test it makes it to the next test. Will try running some of those individual perf and identify commands on their own and see what happens...
[08:11:00] *** Joins: sethhowe (~sethhowe@192.55.54.39)
[09:15:41] *** Joins: jstern_ (jstern@nat/intel/x-esibzdwpjwimuayt)
[09:45:14] g'morning -- when we do a doc update on spdk.io, do we get build artifacts (the preview page) from the new gerrithub CI process?
[09:46:38] spdk.io is broken into two parts
[09:46:50] spdk.io/doc, which is all generated from doxygen stuff in the main spdk repository
[09:46:59] so if you update that and submit a regular patch, it will generate a preview
[09:47:07] although the preview will not have the same stylesheet
[09:47:12] example: https://ci.spdk.io/builds/review/512035f9bba4b0bb5b23273d95247085937f7aa3.1497386144/wkb-fedora-03/doc/html/
[09:47:44] updates to the other parts of spdk.io are done through an Intel internal Gerrit (the old process)
[09:47:49] the source for the website is not public
[09:48:11] but the website is intended to mostly be "static"
[09:48:20] and the docs are what really get updated
[09:48:25] OK, understood. I was trying to link to the preview for the Vision statement, which was a doc update, but couldn't find it
[09:48:38] wkb-fedora-03 builds doc previews as part of its test
[09:49:03] the link is broken though
[09:49:24] let me see if I can't sort it out
[09:49:57] I can just naively tack on /wkb-fedora-03/doc/html/ to the link added to the patch and it seems to give me the preview
[09:50:07] https://ci.spdk.io/builds/review/512035f9bba4b0bb5b23273d95247085937f7aa3.1497386144/wkb-fedora-03/doc/html/
[09:50:08] yeah but the link to the vision page is broken
[09:50:20] yeah
[09:50:20] because I didn't add it to the doxygen manifest
[09:50:21] :)
[09:50:21] one sec
[09:54:30] it's rebuilding now
[10:00:37] http://spdk.intel.com/public/spdk/builds/review/101add0022f6fc524acca95a189efc2bd3aad383.1497545611/wkb-fedora-03/doc/html/vision.html
[10:02:11] *** Quits: sethhowe (~sethhowe@192.55.54.39) (Remote host closed the connection)
[10:02:17] * peluse reminds himself to add "move spdk.io to the new process once some other dust settles" to the TODO list...
[10:04:56] spdk.io, besides the doc section, contains two types of things
[10:05:01] process-related stuff, which is rarely updated
[10:05:05] like how to submit a patch
[10:05:10] and blog posts
[10:05:11] totally understand
[10:05:24] the blog posts often have to go through legal review before going public
[10:05:24] nothing is unclear there now :)
[10:05:34] so we can't move to an open review process
[10:05:34] yeah, I get that too
[10:05:38] not true
[10:05:56] will explain it to you later
[10:06:40] if you have sign-off from legal, then I'm happy to make it public
[10:07:59] no an issue
[10:08:04] no=not :)
[10:08:19] after frying some other fish though....
[10:12:42] now I'm hungry
[10:13:41] squirrel!
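For the broken vision-page preview above ("because I didn't add it to the doxygen manifest"): Doxygen only processes sources listed in the Doxyfile's INPUT setting, and a new page is only easy to reach once an existing page links to it, so adding a page typically means touching both. The fragment below is a generic illustration of the \page/\subpage mechanism only; the page names and the way the SPDK doc tree actually wires its pages together are assumptions, not taken from the log.

```c
/*
 * Generic Doxygen example, not the actual SPDK doc source. Declaring a page
 * makes Doxygen emit vision.html, provided the file containing it is listed
 * in the Doxyfile's INPUT setting.
 */

/**
 * \page vision Vision
 *
 * Body of the vision statement goes here (it could equally live in a
 * markdown file that the Doxyfile lists as input).
 */

/**
 * \mainpage SPDK Documentation
 *
 * Reference the new page from an existing one so readers can navigate to it:
 * - \subpage vision
 */
```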
[10:13:43] *** Joins: johnmeneghini (~johnmeneg@216.240.19.91)
[10:18:06] interesting findings on my hang running nvme.sh from autorun.sh...
[10:18:19] • Commenting out the while loop at the end of multi_process
[10:18:20] • Keeping the while loop but commenting out the bdev and event tests that come before it
[10:18:20] • Running the perf and identify commands w/ the same params as nvme.sh but on their own
[10:18:58] and when it hangs, I have 1 thread of arbitration, 2 of perf and one of identify. Their current states are at the gist I'm about to post. When I kill the arbitration thread everything finishes fine
[10:19:07] https://gist.github.com/peluse/260980205ed2256da9250749bfeae477
[10:19:22] BTW those three bullet items are things that work OK
[10:20:21] *** Quits: johnmeneghini (~johnmeneg@216.240.19.91) (Quit: Leaving.)
[10:24:05] well, everything doesn't finish fine after I kill that process - poor wording :) segfault after that, but I exit gracefully; the 2 perf threads and the 1 identify thread are still running
[10:26:28] * peluse offline for a few hours...
[10:52:17] *** Joins: sethhowe (sethhowe@nat/intel/x-dsuplzvzzwndurrh)
[10:52:52] drv: Take a look at this for our test pool runs: http://rr-project.org/
[10:56:50] seems like it could be useful
[11:54:17] *** Joins: johnmeneghini (~johnmeneg@216.240.19.91)
[11:55:41] *** Quits: johnmeneghini (~johnmeneg@216.240.19.91) (Remote host closed the connection)
[12:10:32] *** Quits: jstern_ (jstern@nat/intel/x-esibzdwpjwimuayt) (Ping timeout: 260 seconds)
[12:15:58] *** Joins: jstern_ (~jstern@192.55.55.37)
[12:17:05] *** Joins: jstern__ (jstern@nat/intel/x-xmakyqqasvzlpemq)
[12:20:44] *** Quits: jstern_ (~jstern@192.55.55.37) (Ping timeout: 268 seconds)
[13:54:37] jimharris: I posted a comment on https://review.gerrithub.io/#/c/365712/ - I think driver_ctx isn't correctly handled in vbdev_split if the vbdev itself needs a context
[13:56:56] in this case my use is OK (I think) since I only use it in the reset case where I don't pass the original bdev_io to the base_bdev
[13:57:03] hmm, I see
[13:57:03] but maybe I should clarify that in a comment
[13:57:09] yeah, that is pretty tricky
[13:59:04] the reset stuff is looking much more solid now - going back and re-running the vhost tests with resets enabled like before on my system
[14:04:42] we still have a bug if the blockdev module fails to abort an I/O on a reset
[14:05:10] the NVMe driver will never fail after my patches (assuming it works like we've coded it), because it aborts everything out of software queues
[14:07:25] are you referring to the split/gpt/lvol case?
[14:07:41] yeah, the one we drew on the whiteboard
[14:08:18] for NVMe that other LUN will now at least see a bdev_io completion guaranteed
[14:08:34] because the blockdev_nvme module guarantees all I/O will be aborted on a reset
[14:08:45] by deleting all of the queue pairs
[14:09:05] for other blockdev modules, that may not be the case (i.e. aio)
[14:09:41] and either way, instead of showing those aborts to the other LUN, the split module should be resubmitting them
[14:10:26] the nvme patches you did don't really help the split/gpt/lvol case though - they mostly help with thread safety on the nvme reset operation
[14:10:44] even with your patches, we could have I/O queued up in the bdev layer that don't get aborted
[14:10:56] i.e. on a different split/gpt/lvol associated with the same namespace
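A rough sketch of the resubmit-on-abort idea from the 14:08-14:10 exchange above: the base bdev (blockdev_nvme) aborts all software-queued I/O on a reset by deleting its queue pairs, and the split vbdev resubmits those aborted I/Os instead of surfacing the aborts to an unrelated upper LUN. The helper names (base_io_was_aborted_by_reset, split_queue_for_resubmit) are hypothetical stand-ins, not the actual vbdev_split code, and header names vary across SPDK versions.

```c
/*
 * Illustrative sketch only -- not the actual vbdev_split implementation.
 * Module-side declarations live in "spdk/bdev_module.h" in newer SPDK
 * trees; older trees used different internal headers.
 */
#include "spdk/bdev.h"
#include "spdk/bdev_module.h"

/* Hypothetical helpers standing in for logic that would live in the split module. */
static bool base_io_was_aborted_by_reset(struct spdk_bdev_io *base_io);
static void split_queue_for_resubmit(struct spdk_bdev_io *parent_io);

/* Completion callback the split vbdev registers when submitting I/O to its base bdev. */
static void
split_base_io_complete(struct spdk_bdev_io *base_io, bool success, void *cb_arg)
{
	struct spdk_bdev_io *parent_io = cb_arg;

	if (!success && base_io_was_aborted_by_reset(base_io)) {
		/*
		 * The base bdev aborted this I/O because a reset was issued on
		 * behalf of some other split/gpt/lvol sharing the same namespace.
		 * Queue it for resubmission once the reset finishes rather than
		 * reporting an abort to this LUN's user.
		 */
		split_queue_for_resubmit(parent_io);
	} else {
		/* Normal path: propagate the base bdev's status to the parent I/O. */
		spdk_bdev_io_complete(parent_io, success ?
				      SPDK_BDEV_IO_STATUS_SUCCESS :
				      SPDK_BDEV_IO_STATUS_FAILED);
	}

	spdk_bdev_free_io(base_io);
}
```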
[14:11:16] my patches fix the case where there was I/O pending at the nvme device
[14:11:30] and a reset comes in
[14:11:39] and those completions would not have been propagated because of the gencnt
[14:11:58] i.e. vbdev split would just never get a completion for those
[14:12:13] oh - i see what you're saying - yes, aio needs to also be fixed to reset all of the io_context_ts for all of the channels
[14:13:10] jimharris: any opinion on whether we can merge the vhost-blk qemu changes to the main spdk branch? 1 0 0 │
[14:13:13] asserts 29 29 29 0 n/a │dverkamp-desk2:~/src/spdk
[14:13:17] oops, wrong copy buffer
[14:13:20] lol
[14:13:25] https://review.gerrithub.io/#/c/363361/
[14:13:36] or should we put it on a new branch temporarily?
[14:15:32] hmm, looks like your last patch that re-enables vhost causes a test failure - maybe that sleep that you shortened?
[14:15:56] it looks like it started to try to do an RPC before it was ready
[14:16:01] ugh - I didn't mean to push that yet
[14:16:12] i accidentally added that part to the patch
[14:16:29] on my system i don't need 25s so i shortened it to 10s
[14:16:41] can't we use the wait-for-RPC logic we already have?
[14:16:54] but regardless - i'm going to wait until i get a clean run on 100 iterations before i re-enable that (33 and counting...)
[14:17:00] cool
[14:23:35] i'm ok checking this into the main spdk/qemu master branch
[14:24:27] do you have any problems with that?
[14:29:09] sounds good to me
[14:29:22] then we can get the spdk vhost-blk patches unblocked
[14:33:00] bwalker: i'm wondering if we could make your bdev_nvme changes for reset more generic - i.e. add new bdev callbacks for something like "pre_reset" and "post_reset"
[14:33:28] so all of the channel iteration is driven by common code and aio and other bdev modules don't have to duplicate it
[14:33:50] you could move it up to the bdev layer and then just put all of the I/O channels
[14:33:57] and re-get them after
[14:34:41] to trigger nvme queue deletion and re-creation?
[14:34:44] yeah
[14:34:56] i don't think that will work - the upper layers will still have open references to them
[14:35:24] the upper layers have references to the upper layer channels
[14:35:39] those channels are pointing to the lower ones, but that's the reference we'd be calling put on
[14:36:54] * jimharris looking again at the bdev channel code
[14:38:08] ah - yes, I missed this, the bdev channel has the only reference to the module channel
[14:38:33] yeah, I don't know why I didn't code it generically in the bdev layer to start with
[14:41:51] drv: did you not feel comfortable giving this a +2? https://review.gerrithub.io/#/c/364153/
[14:42:20] jimharris: I just wanted to make sure you had seen the event_perf changes
[14:42:23] I'm OK with merging it
[14:49:01] bwalker: on your subsystem patches - can you remove the Daisuke Aoyama copyright? everything you have in those files is Intel original code
[14:49:28] crap, I copied it from a file he must have been on
[14:49:33] k, let me respin
[14:50:01] thanks - i checked the first two in the series and they look good except for that copyright line
[14:50:33] sethhowe: what's our topic for today's technical training?
[14:55:17] jimharris: We will be covering nvmf today.
[15:20:16] *** Quits: jstern__ (jstern@nat/intel/x-xmakyqqasvzlpemq) (Remote host closed the connection)
[16:33:22] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
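The pre_reset/post_reset idea from the 14:33-14:38 exchange, sketched below: the common bdev layer drives the per-channel iteration around a reset, and each module only supplies the channel-local teardown and re-setup (blockdev_nvme deleting and re-creating its queue pairs, aio resetting its io_context_ts). This is workable because the bdev channel holds the only reference to the module channel. Everything here is a conceptual sketch with hypothetical names (bdev_module_reset_hooks, bdev_for_each_module_channel, bdev_generic_reset), not the actual SPDK bdev API, and the real flow would be asynchronous and completion-driven rather than two back-to-back loops.

```c
/*
 * Conceptual sketch only; the hooks and iterator below are hypothetical.
 * spdk_io_channel comes from "spdk/io_channel.h" in SPDK trees of this era
 * ("spdk/thread.h" in newer ones).
 */
#include "spdk/io_channel.h"

/* Hypothetical per-module hooks a bdev module would register. */
struct bdev_module_reset_hooks {
	/* Called on each module channel before the reset is issued; e.g.
	 * blockdev_nvme deletes the channel's qpair here (aborting all
	 * software-queued I/O), aio tears down its io_context_t. */
	void (*pre_reset)(struct spdk_io_channel *module_ch);

	/* Called on each module channel after the reset completes; the
	 * module re-creates whatever pre_reset tore down. */
	void (*post_reset)(struct spdk_io_channel *module_ch);
};

/* Hypothetical iterator: invoke fn on every module channel of the bdev,
 * on that channel's owning thread. A stand-in for the bdev layer's real
 * channel-iteration machinery. */
void bdev_for_each_module_channel(void *bdev,
				  void (*fn)(struct spdk_io_channel *module_ch));

/* Hypothetical common-code reset path. In reality each step would be
 * asynchronous: iterate channels, wait for completion, submit the reset,
 * wait again, then iterate for post_reset. */
static void
bdev_generic_reset(void *bdev, const struct bdev_module_reset_hooks *hooks)
{
	/* Safe for common code to drive because the bdev channel holds the
	 * only reference to the module channel (per the 14:38 observation). */
	bdev_for_each_module_channel(bdev, hooks->pre_reset);

	/* ... submit the actual device reset here ... */

	bdev_for_each_module_channel(bdev, hooks->post_reset);
}
```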
[16:47:14] jimharris: did this vhost reset test re-enable pass your 100-test run?
[16:50:51] yep
[16:51:16] alright, sounds good enough to me
[16:55:01] sethhowe, jimharris, bwalker: I set up a Trello board to track CI work: https://trello.com/b/3DvD85zi/continuous-integration
[16:55:05] feel free to fill in more stuff
[16:58:39] cool, thanks!
[18:51:42] bueno
[22:22:09] *** Joins: ziyeyang_ (ziyeyang@nat/intel/x-tvqutnbuqxkruztp)