[06:18:54] jimharris, yeah something along those lines, or maybe sections where we specify primary and assure it runs to completion, and another where we have two with one specified primary, another secondary, and assure they run to completion. All depends on what exactly we want to test I suppose
[09:57:50] bwalker, you added a +2 and a minor suggestion on the README patch, what
[09:57:55] what's your intention there?
[09:58:27] looks like those were comments I forgot to publish from several revisions ago
[09:59:01] they're no longer relevant - you fixed the spelling
[09:59:04] so just ignore
[09:59:12] peluse, jimharris, bwalker: we should talk through what to do with the multiprocess test: https://review.gerrithub.io/#/c/367725/
[09:59:27] I think the approach is reasonable, but this current patch will add ~30 seconds to the tests
[09:59:36] bwalker, cool, thanks
[10:00:00] I'm OK with checking this in as-is if we plan to fix it by using stub or something like that
[10:00:16] drv, yeah I think that sounds like a plan
[10:00:31] would rather add 30 sec and get it right than leave it alone
[10:00:49] i think everything in the multi-process script should be using the stub
[10:01:00] also, we need to address errors in background tasks that are not getting caught
[10:01:26] jimharris, if you guys want to approve what's there now so we're actually testing, I'll start working on making the whole thing use the stub
[10:01:39] and deal with background errors somehow
[10:01:57] I think we need to address how we're going to wait for applications to initialize
[10:02:04] i.e. some mechanism to wait until it is safe to launch a secondary
[10:02:19] bwalker, if we use the stub then they're all secondary
[10:02:19] to catch errors in background processes, we would need to capture all of the PIDs we have launched and individually 'wait' on them
[10:02:55] bwalker, and we can have the stub drop a marker file when it's fully initialized, if the current sentinel file won't serve that purpose as well
[10:02:58] drv, cool
[10:03:19] maybe our event framework should be dropping the sentinel file
[10:03:58] I want to solve the problem as generally as possible, not specifically for this test using the stub
[10:04:58] is DPDK prepared to initialize secondaries when rte_eal_init returns?
[10:05:06] hmmm... that seems like a bigger issue.
[10:05:48] bwalker, I inferred from reading that it was the app's responsibility to assure sync between primary and secondary based on its unique requirements
[10:05:54] that's what makes sense anyway, I could be wrong
[10:06:27] if that's the case then something "general" in the framework would be difficult unless it's coordinating between framework modules alone
[10:06:43] well initialization is a big process - we certainly need to wait for DPDK to initialize before launching the secondary
[10:06:55] but do we also need to wait for other SPDK initialization like the NVMe enumeration?
[10:07:11] yeah I think so
[10:07:19] well, I dunno I guess so :)
[10:07:20] if that's the case, then maybe we can't do something generic
[10:07:25] right
[10:08:15] that's why I think for the test situation it's "OK" to have the stub be an isolated thing for handling sync between apps that have little/nothing to do with each other in real life (identify, perf, arb, etc)
[10:08:56] do any of our SPDK modules operate as primary/secondary in a real life use case for an app using SPDK?
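For the initialization-sync question around [10:01:57]-[10:03:19], here is a minimal bash sketch of one way a test script could wait for a sentinel/marker file before launching secondaries and still catch a failure in the primary. The stub path, the '-i 0' option, and the marker-file location are assumptions for illustration, not the actual SPDK mechanism.

```bash
#!/usr/bin/env bash
set -e

READY_FILE=/var/run/spdk_stub_ready     # hypothetical sentinel the primary would drop

# Start the primary (stub) in the background and remember its PID.
./test/app/stub/stub -i 0 &             # '-i 0' (shared memory group) is assumed
stub_pid=$!

# Poll for the sentinel with a timeout instead of sleeping a fixed amount of time.
ready=false
for _ in $(seq 1 100); do
    if [ -f "$READY_FILE" ]; then
        ready=true
        break
    fi
    sleep 0.2
done
if ! $ready; then
    echo "primary never signaled it was initialized" >&2
    kill "$stub_pid"
    exit 1
fi

# ... launch secondary processes here ...

# Shut the primary down and 'wait' so its exit status is not silently lost.
kill "$stub_pid"
wait "$stub_pid" || true
```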
[10:12:47] yeah
[10:12:56] we support that - so our customers will probably do it
[10:13:42] which ones? I think the general case should be solved there, and then if whatever it is turns out to be applicable in any testing scenarios we can apply it there
[10:13:59] they can use any of our libraries in a multi-process environment
[10:14:31] but only nvme really works in multi-process currently
[10:14:40] the actual primary/secondary thing comes up any time they're using DPDK
[10:14:52] or any time they want to share nvme devices
[10:15:41] is there a specific example of a pair of apps that fire up as primary/secondary that we can talk about (other than the artificial test cases we already talked about)?
[10:15:50] no
[10:16:06] heh, problem solved :)
[10:16:07] what about iSCSI or NVMe-oF target + nvme-cli?
[10:16:27] NVMe-oF + nvme-cli is a good choice - we don't have that implemented right now
[10:16:42] can you explain more about the iSCSI example?
[10:17:01] gang's patches are pretty much ready but we're going to work through a method to reduce the diff with upstream nvme-cli
[10:17:09] but that's ultimately identical in the scope of the problem to starting perf + arb
[10:17:47] let's say you have a long-running process like an iSCSI/NVMe-oF/vhost target that is primarily using the NVMe SSD
[10:18:22] but you want to do some kind of "management" operation on the SSD - for now let's assume that's along the lines of "get log page" or "identify" (and not "format")
[10:18:44] sidebar: looks like the arb app does all sorts of shit, can someone provide a summary of what its main purpose is in life?
[10:18:56] we've developed patches to the Linux nvme-cli utility, so that it can work with SPDK as well as the Linux kernel
[10:19:44] so with multi-process, you could run nvme-cli as a secondary process, to issue commands to the NVMe SSDs without having to shut down the iSCSI/NVMe-oF/vhost server
[10:19:48] so does the nvme-cli run as a secondary to the nvme driver as primary? Or are you talking about something else?
[10:20:02] nvme-cli would be a secondary process
[10:20:03] oh, typing at the same time :)
[10:20:22] the primary process could either be iscsi_tgt/vhost/nvmf_tgt
[10:20:39] or it could be the stub app (and then iscsi_tgt/vhost/nvmf_tgt would be a secondary process)
[10:21:03] the problem with the latter is that hotplug has to be driven from the primary app
[10:21:26] due to some DPDK implementation details
[10:21:26] jimharris, what's driving that requirement?
[10:21:58] I typed that right before you typed the answer. I should just wait on all my questions for about 30 secs from now on
[10:26:57] OK, well I'll get started on an nvme.sh patch that uses stub exclusively as the primary and will catch all errors, background or not, and fail appropriately...
[10:29:28] Hi all, after this release build, I am going to pause the build pool for about a minute to test some functionality on the new CentOS VMs.
[10:35:57] jimharris: why is the stub app set to run on 4 cores?
[10:37:24] I understand that the threads will end up sleeping inside of our reactor loop
[10:37:28] i believe that was the superset of all of the tests that were running in that script
[10:37:28] but does DPDK actually allow that?
[10:37:38] hmmm - never mind
[10:37:56] does DPDK actually allow a primary and a secondary to start up with overlapping core masks?
[10:38:02] yes
[10:39:06] can a secondary start up with a core mask that isn't in use by the primary?
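A hedged sketch of the real-life pairing described around [10:17:47]-[10:20:39]: a long-running target as the DPDK primary and a management-style query running as a secondary against the same NVMe devices. The binary paths, config file, and the multi-process/shared-memory option shown are assumptions for illustration only, not the exact SPDK or nvme-cli command lines.

```bash
#!/usr/bin/env bash

# Long-running target (iscsi_tgt/vhost/nvmf_tgt would all fit here) as the primary.
./app/nvmf_tgt/nvmf_tgt -c nvmf.conf &      # config path is illustrative
tgt_pid=$!

# Crude stand-in for "wait until the primary is fully initialized"
# (see the sentinel-file discussion earlier in the log).
sleep 10

# Management operation ("identify" / "get log page" style) from a second process
# sharing the NVMe devices; '-i 0' as a shared-memory group id is an assumption.
./examples/nvme/identify/identify -i 0

# Stop the target and collect its status.
kill "$tgt_pid"
wait "$tgt_pid" || true
```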
[10:39:25] or will there be no local caches in the mempools available, etc
[10:41:30] mempools allocate the local_cache array based on RTE_MAX_LCORE
[10:41:38] (not the actual number of lcores)
[10:42:18] so the secondary can specify any mask it wants, not just a subset of the primary mask
[10:42:20] to be honest, I'm not sure why we specify anything except just 1 core in the stub core mask
[10:42:58] I think for iscsi you wanted the nvme driver to be able to do NUMA-aware allocations
[10:43:03] so you made the mask span sockets
[10:43:35] but I'm not sure that actually impacts anything
[10:53:21] bwalker, you mean you're not sure if it allows the secondary nvme driver to be NUMA-aware, or do you mean it does but it makes no difference?
[10:54:08] I'm not sure, if we initialize DPDK with a core mask of 0x1, what the behavior will be if we attempt to allocate memory on a different CPU socket - one that doesn't contain cpu 0x1
[10:54:33] peluse: I'm taking a quick look at these multiprocess tests in light of this conversation
[10:54:40] may have a patch for discussion
[10:55:12] in terms of the memory allocation, I bet a NUMA-aware allocation works regardless of the core mask, but I don't know for sure
[10:55:18] you're like a dog who just heard squirrel...
[10:55:25] I'm already in the middle of one
[10:55:42] but keep going, interested to see what you come up with
[10:56:08] I'm definitely taking a much more extreme approach than just making the existing test work with the stub app
[10:56:16] so it's more a discussion of direction than actual code overlap
[10:59:19] rock on man
[11:00:30] btw, the arbitration test is testing NVMe weighted round robin
[11:00:48] ahh, thanks
[11:02:48] except it isn't actually testing anything in the test pool, since the drives in the test machines don't support WRR
[11:03:43] Here is my mp proposal patch: https://review.gerrithub.io/#/c/368208/
[11:03:58] I explained my reasoning in the commit message - that's the part that really needs to be debated
[11:04:47] you can probably coalesce the two if [ `uname` = Linux ] blocks
[11:05:01] probably - but I thought it was clearer to keep them separate
[11:05:10] I thought the diff would show up nicer
[11:05:12] but it doesn't
[11:24:30] so I hadn't noticed nvmemp.sh before, where/when was that run and why do we believe removing it all is OK?
[11:24:46] only run during nightly test
[11:25:01] also, why just perf in the new test (instead of all 3 apps), and why are the first two in background and the third not?
[11:25:27] removing it is ok for the reasons in my commit message - it isn't testing anything that isn't already hit by other tests, or that we intend to support
[11:25:38] although those reasons are what the discussion needs to center on
[11:26:02] OK, I need to look at that a little closer then
[11:26:15] I don't think using different apps than perf adds any additional test coverage value - perf hits more stuff than the rest
[11:26:40] the first two are in the background because I need to launch 3 processes simultaneously
[11:26:44] 2nd set of questions? I'll also run this on my systems here and see how it fares. Also, what about failures in the background tests? Will waiting on their PIDs expose that somehow (timeout or error)?
[11:26:49] I could launch all 3 in the background, but that's more wordy than what I did
[11:28:06] the 'wait' call takes care of background errors
[11:29:08] meaning, it will wait forever if one of them fails, or the shell will exit if one of them fails?
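The "launch 3 processes simultaneously" pattern from [11:26:40]-[11:28:06] could look roughly like the sketch below: two perf instances in the background, one in the foreground, with 'wait' on the saved PIDs so a background failure fails the script. The perf options and core masks here are placeholders, not the exact arguments from the patch under review.

```bash
#!/usr/bin/env bash
set -e                                     # any non-zero status fails the script

PERF=./examples/nvme/perf/perf
PERF_ARGS="-q 1 -w randread -o 4096 -t 10" # illustrative options only

$PERF $PERF_ARGS -c 0x1 &                  # first instance, background
pid0=$!
$PERF $PERF_ARGS -c 0x2 &                  # second instance, background
pid1=$!
$PERF $PERF_ARGS -c 0x4                    # third instance runs in the foreground

# 'wait <pid>' returns the background job's exit status even after it has
# terminated, so failures in the background copies are not silently dropped.
wait "$pid0"
wait "$pid1"
```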
[11:29:56] bwalker: posted a reply on the bdev open/close patch - can you take a look?
[11:30:30] bwalker, oh, one more :) Why 3? The stub is running as primary so is there any reason why 3 instead of 1 or 2?
[11:33:58] 3 is a magic number
[11:35:18] it used to do 4, so I was going to do that
[11:35:21] but I got bored of typing after 3
[11:37:04] heh, OK.
[11:37:06] bwalker: in your patch, how do we know pid0 and pid1 haven't already exited by the time you wait on them?
[11:37:21] and same question for no longer mixing apps and using all instances of perf?
[11:37:51] the previous commands in nvme.sh did something similar
[11:38:17] jimharris, why would we care?
[11:39:32] doesn't wait pid0 exit with error status if pid0 doesn't exist?
[11:40:09] I'm not so sure jimharris - trying to figure out the docs
[11:41:10] oh - never mind
[11:41:15] jimharris, oh, I thought it was a NOP if that were the case
[11:41:49] when you run it in the background, bash keeps the status until you either put the process in the fg, or wait on it
[11:42:01] bwalker, I have to leave for 90 min for an appt. 1 run of your patch looked good on my Fedora system, let me run it a few more times on that one as well as my Ubuntu system before merge if you can. I'll be back later...
[11:42:16] sleep 5 &
[11:42:23] (wait 10 seconds)
[11:42:27] wait
[11:42:38] wait returns success
[11:42:54] yeah, I think a backgrounded process sticks around until you wait on it
[11:42:55] that's what drv told me - I was trying to figure out the actual documentation that says that
[11:43:17] but I can't find any
[11:43:25] but that test on my system does work out correctly
[11:43:37] so wait does some magic for sure
[11:43:45] I don't think the process itself sticks around but the shell keeps its status
[11:43:49] yeah
[11:44:01] so looks like what I wrote addresses all possible failures
[11:44:06] if you do a ps after 5 seconds (i.e. my example above) it doesn't show the pid
[11:44:18] the debatable part is whether we can delete the tests I deleted
[11:45:49] my instinct says yes - i do think that with gang's nvme-cli work, we should add tests with a target app as primary, and nvme-cli as a secondary process
[11:46:20] yeah I agree with that
[11:46:23] but not sure how we want to integrate those tests since the nvme-cli patches won't be in the main spdk repo
[11:46:40] we can do it like rocksdb where the test goes and grabs the nvme-cli code
[11:46:57] from some location
[11:47:05] that kind of sucks actually, but I don't have a better idea
[11:47:27] i think your patch is no worse than what we had before - running arbitration with perf was always a contrived example (because we didn't have anything better)
[11:47:51] an example with iSCSI/NVMe-oF/vhost as primary, plus maybe identify as a secondary app would be another good test case
[11:47:59] until we have nvme-cli
[12:41:30] fedora-03 is throwing doc failures http://spdk.intel.com/public/spdk/builds/release/master/2524/wkb-fedora-03/build.log
[13:03:59] sethhowe is fixing it up
[13:55:53] My apologies. There was a bug in a change that I pushed to the agent script but everything is back up and running now.
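To pin down the 'wait' behavior debated around [11:41:49]-[11:44:06], here is a small experiment extending the "sleep 5 &" example from the log with a deliberately failing job: the child exits and disappears from ps, but bash keeps its exit status until you 'wait' on the PID.

```bash
#!/usr/bin/env bash

( sleep 1; exit 3 ) &     # background job that fails shortly after starting
pid=$!

sleep 5                   # by now the child has exited; 'ps' no longer shows it

wait "$pid"               # the shell still has the saved status for this PID
echo "background job exited with status $?"   # prints 3
```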
[14:17:07] jimharris: looks like this one needs a rebase: https://review.gerrithub.io/#/c/367611/
[14:17:55] yeah - was waiting to see if the GPT patch was also ready to be rebased
[14:18:08] but it's not - so I'll rebase this 367611 at least
[14:18:23] done
[15:00:11] bwalker: can you re-review https://review.gerrithub.io/#/c/367611/11..12
[15:00:25] that link is the diffs since your last +2 review (it's all rebase-related changes)
[15:02:42] done
[15:03:06] gracias
[15:08:24] drv: can you take a look at https://review.gerrithub.io/#/c/368115/
[15:08:45] i'm not sure maxlun => max_lun_count makes this more clear
[15:09:27] yeah, I don't really like that new name either
[15:10:01] maybe we should keep the "maxlun" name but make it actually equal the maximum valid number, rather than maximum + 1?
[15:11:11] would "last_lun" make it more clear?
[15:13:11] yeah
[15:13:16] I like that better
[15:17:09] LUNs don't have to be contiguous, right (memory is fading)?
[15:17:36] there has to be a LUN 0, but otherwise no
[15:17:44] ah yes, thanks
[15:17:57] in our implementation, they are always contiguous though
[15:18:01] for now
[15:18:12] because we're so cool right!?
[15:28:49] you can create an iSCSI target node with non-contiguous LUNs but I don't think we have any tests for that currently
[15:28:57] meaning an SPDK iSCSI target node
[15:29:16] for vhost-scsi, we only support a single LUN per device (if you want to add more LUNs, just add another device)
[15:29:38] SCSI device hotplug is much simpler than dynamic LUN changing
[15:50:21] jimharris: working toward async RPC, here's the first step (just an API change, no actual async support yet): https://review.gerrithub.io/#/c/368225/
[16:03:30] bwalker, I posted some good test results on your multi-proc patch. It failed CI though, if you didn't notice:
[16:03:31] Cannot mmap memory for rte_config at [0x7f22fff5f000], got [0x7f2304bc0000] - please use '--base-virtaddr' option
[16:03:31] 10: [/home/sys_sgsw/build_pool/agent/repo/examples/nvme/identify/identify() [0x404bba]]
[16:08:05] FYI I have more failures, of course, running CI beyond env.sh on these other systems. Will start looking at those a bit later to see what's real and what's not. Going to add some more UT coverage to nvme.c from that really old series first...
[16:12:32] drv, was that failure above the same mem range map issue you ran into last week? Or this week or whenever it was :)
[16:17:03] sounds similar
[16:17:29] we would expect it to be using the --base-virtaddr address, but it's mapping the shm region somewhere else
[16:18:00] so I'm not sure if the primary wasn't able to map the shm region where we requested, or if one of the other processes is starting before the one we expect to be the primary, or... something else :)
[16:18:28] hmmm...
[16:31:12] bwalker and sethhowe have found (part of) the problem with no free hugepages - looks like the hugepage mount got mounted twice, and the second mount hides the first one, which still has files in it
[16:44:56] this patch should fix the double hugepage mount problem: https://review.gerrithub.io/#/c/368228/
[17:23:26] nice!
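For the double hugepage mount problem described at [16:31:12], a rough sketch of how a setup script could detect and peel off stacked hugetlbfs mounts so the original mount (and the files hiding underneath it) become visible again. The mount point is an assumption; this is not the actual fix in the referenced patch.

```bash
#!/usr/bin/env bash
set -e

MOUNTPOINT=${1:-/mnt/huge}    # common SPDK hugepage mount point; adjust as needed

# Count how many hugetlbfs mounts are stacked on this path.
count=$(mount -t hugetlbfs | awk -v mp="$MOUNTPOINT" '$3 == mp' | wc -l)

# Unmount all but the first; each umount removes the top-most mount on the path.
while [ "$count" -gt 1 ]; do
    umount "$MOUNTPOINT"
    count=$((count - 1))
done
```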