[03:07:05] *** Joins: tkulasek (~tkulasek@192.55.54.44)
[04:03:53] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 255 seconds)
[05:39:29] *** Joins: dlw (~Thunderbi@114.246.95.117)
[06:57:54] *** Quits: dlw (~Thunderbi@114.246.95.117) (Ping timeout: 256 seconds)
[08:49:35] *** Quits: tomzawadzki (~tomzawadz@192.55.54.40) (Ping timeout: 240 seconds)
[09:17:44] *** Quits: tkulasek (~tkulasek@192.55.54.44) (Ping timeout: 256 seconds)
[09:34:36] *** Joins: travis-ci (~travis-ci@ec2-107-21-150-51.compute-1.amazonaws.com)
[09:34:37] (spdk/master) nvme: improve error messages in set_num_qpairs (Daniel Verkamp)
[09:34:37] Diff URL: https://github.com/spdk/spdk/compare/a4a497d5b0f1...fa6f7a166dc1
[09:34:37] *** Parts: travis-ci (~travis-ci@ec2-107-21-150-51.compute-1.amazonaws.com) ()
[09:46:56] *** Joins: travis-ci (~travis-ci@ec2-107-21-150-51.compute-1.amazonaws.com)
[09:46:57] (spdk/master) app: Add "NoPci" to command line options (Shuhei Matsumoto)
[09:46:57] Diff URL: https://github.com/spdk/spdk/compare/fa6f7a166dc1...e5c5740911d0
[09:46:57] *** Parts: travis-ci (~travis-ci@ec2-107-21-150-51.compute-1.amazonaws.com) ()
[10:55:03] sbranden: looks like you rebased your patch to re-run it through the test pool so i haven't scheduled it to re-run - patch overall looks great, just one small change requested
[11:17:00] *** Parts: jkkariu (jkkariu@nat/intel/x-twnjjhkdwgdohdox) ("Leaving")
[13:24:23] *** Joins: travis-ci (~travis-ci@ec2-54-144-195-124.compute-1.amazonaws.com)
[13:24:24] (spdk/master) nvme: Remove calls to getpid() when submitting nvme requests (Jonathan Richardson)
[13:24:24] Diff URL: https://github.com/spdk/spdk/compare/e5c5740911d0...ce70f29662d1
[13:24:24] *** Parts: travis-ci (~travis-ci@ec2-54-144-195-124.compute-1.amazonaws.com) ()
[13:24:41] *** Joins: vkon15 (cf8c2b51@gateway/web/freenode/ip.207.140.43.81)
[14:36:38] Hi guys question regarding hot plug support
[14:37:46] I saw a SIGBUS when a drive was removed, when attempting to do an mmio_write
[14:38:44] is it safe to issue requests prior to issuing spdk_nvme_detach()?
[14:48:33] heh, I already have a use for the new bdev entry point to identify the end of bdev layer subsystem init, need it for the refactored crypto bdev to init all virtual & real crypto PMDs after they've been created during examine callbacks, nice!
[14:50:11] vkon15: it should be safe - they should get caught and will result in failed I/O
[16:00:21] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[16:17:33] *** Joins: travis-ci (~travis-ci@ec2-54-144-195-124.compute-1.amazonaws.com)
[16:17:34] (spdk/master) bdev/iscsi: initialize g_iscsi_lun_head staticly and make iqn per bdev (Pawel Wodkowski)
[16:17:34] Diff URL: https://github.com/spdk/spdk/compare/ce70f29662d1...5f6c428dbcff
[16:17:34] *** Parts: travis-ci (~travis-ci@ec2-54-144-195-124.compute-1.amazonaws.com) ()
[16:26:05] *** Joins: travis-ci (~travis-ci@ec2-54-144-195-124.compute-1.amazonaws.com)
[16:26:06] (spdk/master) nvmf: add the assert in spdk_nvmf_ctrlr_disconnect (Ziye Yang)
[16:26:06] Diff URL: https://github.com/spdk/spdk/compare/5f6c428dbcff...b332897a0424
[16:26:06] *** Parts: travis-ci (~travis-ci@ec2-54-144-195-124.compute-1.amazonaws.com) ()
[16:28:45] *** Joins: travis-ci (~travis-ci@ec2-107-21-150-51.compute-1.amazonaws.com)
[16:28:46] (spdk/master) test/nvmf: confirm kernel finds NVMe-oF namespaces (Seth Howell)
[16:28:46] Diff URL: https://github.com/spdk/spdk/compare/b332897a0424...e3263286d34b
[16:28:46] *** Parts: travis-ci (~travis-ci@ec2-107-21-150-51.compute-1.amazonaws.com) ()
[16:33:42] bwalker: i posted a comment on https://review.gerrithub.io/#/c/407602/ - could you take a look?
[16:37:08] now I have to write out all of the possible orders these things can run in
[16:37:24] because I can't answer your question for sure
[16:38:53] I tentatively think you are right
[16:38:59] we know the ch is always operated on from the same thread
[16:39:12] i was tentative in my question
[16:39:51] I was going one way with the patch, then went back another way
[16:39:55] i think since you're taking the ch off the thread up front, you've made all of it a lot simpler
[16:39:56] and I may be able to simplify this more
[16:40:26] yeah the keys are removing the ch right away, and also the device refcnt
[16:40:30] those are the critical parts of the fix
[16:40:39] so the rest maybe I can clean up now
[16:41:08] i added a comment to https://review.gerrithub.io/#/c/407838/ as well - you can do it now or later - your call
[16:41:23] w/travel this week I missed checking up on the nightly build for the last few days... looks like last night was OK but the night before that was a failure - https://ci.spdk.io/spdk/nightly_status.html
[16:41:27] Did anyone look into that?
[16:41:34] (last week)
[16:45:16] 4/14 failure was the intermittent ASLR issue
[16:45:43] 4/10 failures - 2 of them were the intermittent ASLR issue, the other was blobfs issue
[16:46:27] a blobfs fix went in later on 4/10 - although there have been a couple of other intermittent blobfs issues seen on per-patch tests since then, i'm still investigating
[16:47:03] any volunteers to tackle Shuhei's iSCSI JSON configuration dump patch review? https://review.gerrithub.io/#/c/406493/
[16:47:07] for the ASLR issue - we should (only on Linux) run pmap on the stub process after it starts as a debugging aid
[16:47:12] I gave it a once over, but it's a lot of code to take in :)
[16:47:28] i'll take a look tomorrow
[16:48:27] we have a theory that with ASAN enabled, sometimes ASAN gets something mapped in the virtual address region we've specified - but it's only a theory at this point (and would need the pmap data to prove it)
[16:48:37] it's harmless to just capture all the time for now
[16:48:53] yeah, that's a good idea
[16:49:27] looks like it is already wrapped in a nice start_stub helper function
[16:49:48] i'll do the patch now
[16:50:01] after I +2 bwalker's patch update - looks good!
[16:52:24] yeah, much simpler
[16:54:36] jimharris, cool thanks
[16:58:17] https://review.gerrithub.io/#/c/407842/
[17:03:26] jimharris: patch looks good, but running it locally with ASAN enabled, it looks like I never actually get the DPDK pages mapped at the base virtaddr we request
[17:03:32] (at least across a couple of test runs)
[17:03:51] that's interesting
[17:03:53] so we are probably always getting a different base, it just happens to work most of the time because we get lucky with ASLR
[17:05:24] so we probably just need to disable ASLR - but we should be able to do this *only* at the times we need it
[17:05:29] it looks like nothing is mapped at 0x1000000000 in the ASAN enabled case
[17:05:40] so I'm not sure why it doesn't let us map there
[17:05:52] maybe the ASAN hooks intentionally disregard requested addresses for mmap
[17:06:15] we could disable ASLR before starting the stub, then immediately re-enable it
[17:06:21] yeah, that sounds reasonable
[17:06:28] hmmmm - well no, that won't work
[17:06:33] plus any other places where we use multi-process support
[17:06:40] (I think some of the NVMe tests use it without stub)
[17:07:16] well, we could disable ASLR in start_stub and re-enable it in kill_stub
[17:07:36] sounds like we'll need to disable the 0x10xxxx0000 virt-addr stuff as well
[17:07:50] ASAN must not like that - it sounds like it's allowing the mapping in the secondary processes
[17:08:26] primary process: DPDK requests 0x1000000000 but gets 0x7F00004000
[17:08:44] secondary process: DPDK requests 0x7F00004000 and gets that address
[17:09:02] am I wrong?
[17:09:08] something like that
[17:09:30] and the case where it fails is the same, except the secondary has something mapped in the way of wherever the primary put it
[17:10:04] hmmm
[17:11:58] wait - no, this must be different on the failing platforms - DPDK prints out an error message if we couldn't mmap to the requested address, and when it fails, we see an error for only a few of the hugepages when the *primary* process starts
[17:12:59] http://spdk.intel.com/public/spdk/builds/review/526511f7f6d9ddf496e13b6e217daf56b3ba2bb8.1523707244/ubuntu16.04/build.log
[17:13:02] search for "not respected"
[17:13:20] * jimharris waits for peluse to inject snide comment
[17:13:30] hmm, odd, I don't see that message in my test
[17:13:42] but I also don't see anything in pmap at 0x1000000000
[17:13:42] * peluse reading backwards
[17:13:45] something funky going on there
[17:14:12] I definitely see stuff at 0x1000000000 when I run with asan disabled
[17:14:55] LOL, I run into that *all the time* if you need help finding it ;)
[17:16:04] need to get Rodney Dangerfield in here to tell us about getting no respect
[17:16:13] are you running your process with shm_id > 0?
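(aside) The "is anything mapped at 0x1000000000?" check done by eye in pmap above can be sketched in a few lines. This is only an illustration, assuming Linux and the standard /proc/<pid>/maps line format; the helper names parse_maps and region_at and the sample mapping lines are made up here and are not taken from SPDK or from any build log.

```python
from typing import List, Optional, Tuple

Region = Tuple[int, int, str]  # (start, end_exclusive, pathname)


def parse_maps(text: str) -> List[Region]:
    """Parse text in /proc/<pid>/maps format into (start, end, name) tuples."""
    regions: List[Region] = []
    for line in text.splitlines():
        fields = line.split()
        if not fields or "-" not in fields[0]:
            continue  # skip blank or unrecognised lines
        start_s, end_s = fields[0].split("-", 1)
        try:
            start, end = int(start_s, 16), int(end_s, 16)
        except ValueError:
            continue
        # The pathname is the sixth field onward and may be absent.
        name = " ".join(fields[5:])
        regions.append((start, end, name))
    return regions


def region_at(regions: List[Region], addr: int) -> Optional[Region]:
    """Return the region containing addr, or None if nothing is mapped there."""
    for start, end, name in regions:
        if start <= addr < end:
            return (start, end, name)
    return None


# Hypothetical sample data, only to show the shape of the input.
SAMPLE = (
    "1000000000-1000200000 rw-s 00000000 00:2e 12345 /dev/hugepages/example_map_0\n"
    "7ffc6ee00000-7ffc6ee21000 rw-p 00000000 00:00 0 [stack]\n"
)

if __name__ == "__main__":
    regions = parse_maps(SAMPLE)
    print(region_at(regions, 0x1000000000))
    print(region_at(regions, 0x600000000000))
```

For a live process the input would come from reading /proc/<pid>/maps (assumed readable by the caller) rather than from SAMPLE.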
[17:16:23] we only specify base addr in that case
[17:16:24] no, I was testing with shm_id = 0
[17:16:38] I see the --base-virtaddr in the dpdk cmd, though
[17:17:06] it looks the same with shm_id 1
[17:17:25] the mappings start at 0x600000000000 when ASAN is enabled
[17:17:50] * peluse tugs at his collar
[17:18:09] I am testing with DPDK 17.08 as shipped by Fedora, if that matters
[17:18:29] i don't understand how you're seeing base-virtaddr = 0x1000000000 with shm_id not defined
[17:18:52] yeah, that "not respected" message was added in v17.11, which is why I don't see it
[17:19:10] jimharris: it's >= 0 (or really the else of < 0)
[17:20:43] the test pool machines have the same behavior I'm seeing
[17:20:46] e.g. http://spdk.intel.com/public/spdk/builds/review/c5ed2c50a885e7ff51d090e2f72031394379b07e.1523923539/fedora-01/build.log
[17:20:59] the mappings start at 0x600000000000 instead of 0x1000000000
[17:21:27] well, actually, that's not the right mapping - they're at 0x7ffc6ee00000
[17:21:31] but still not where we asked
[17:21:46] yeah, looking at that now
[17:24:07] oh - those error messages aren't from mapping the hugepages, they are from mapping /dev/zero beforehand
[17:24:27] which is why we only get a handful of them - the /dev/zero mappings are done on contiguous regions - not one per hugepage
[17:24:33] I think it intentionally maps a large region from /dev/zero first to make sure the address space is free, then remaps the hugepages on top of that
[17:24:39] well pmap has already helped here :)
[17:24:42] yeah
[17:24:46] yeah - exactly
[17:25:32] i think that's the fix then - disable ASLR in start_stub and re-enable it in kill_stub
[17:26:02] yeah, we just need to double check that everything using multi-process uses the start_stub/kill_stub helpers
[17:36:32] looks ok from what I can tell - there are a few places where we start bdev_svc in multi-process mode unnecessarily - we should fix those in a separate patch
[17:38:26] https://review.gerrithub.io/#/c/407843/
[17:38:44] i say we still commit the pmap stuff for now
[17:42:50] yeah, it's useful for debugging anyway
[18:10:34] *** Joins: dlw (~Thunderbi@114.255.44.143)
[18:33:21] reminder: community meeting in about 13.5 hours from now.. see https://trello.com/b/DvM7XayJ/spdk-community-meeting-agenda for agenda
[18:33:59] or with bwalker's addition you can now check the date/time local to you by looking at http://www.spdk.io/community/
[19:41:39] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[19:46:25] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[20:39:59] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds)
[23:01:37] *** Joins: travis-ci (~travis-ci@ec2-54-144-195-124.compute-1.amazonaws.com)
[23:01:38] (spdk/master) nvmf: make the qpair disconnnect in the right order. (Ziye Yang)
[23:01:38] Diff URL: https://github.com/spdk/spdk/compare/a3f887677776...27d47b9a1032
[23:01:38] *** Parts: travis-ci (~travis-ci@ec2-54-144-195-124.compute-1.amazonaws.com) ()
[23:12:03] *** Joins: pzedlews_ (uid285827@gateway/web/irccloud.com/x-zukojogpekpslrsc)
[23:15:57] *** Joins: tomzawadzki (~tomzawadz@192.55.54.38)
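(aside) The fix agreed on above, disabling ASLR in start_stub and re-enabling it in kill_stub, follows a save/disable/restore pattern. The sketch below shows that pattern only; it is not the content of the linked patch, which is not reproduced in this log. It assumes Linux, where ASLR is controlled through /proc/sys/kernel/randomize_va_space (0 means off), and root privileges for the real sysctl file. The name aslr_disabled and the path parameter are hypothetical, added so the logic can be tried on an ordinary file.

```python
from contextlib import contextmanager
from typing import Iterator

# Linux sysctl file controlling ASLR: 0 = off, 1 or 2 = on.
ASLR_SYSCTL = "/proc/sys/kernel/randomize_va_space"


@contextmanager
def aslr_disabled(path: str = ASLR_SYSCTL) -> Iterator[str]:
    """Write 0 to the ASLR setting and restore the previous value on exit.

    'path' is a parameter (an assumption of this sketch) so that the
    save/restore logic can be exercised against a regular file.
    """
    with open(path) as f:
        previous = f.read().strip()  # remember the current setting
    with open(path, "w") as f:
        f.write("0\n")  # disable ASLR (the start_stub side)
    try:
        yield previous
    finally:
        with open(path, "w") as f:
            f.write(previous + "\n")  # restore it (the kill_stub side)
```

Anything started inside a "with aslr_disabled():" block, such as the primary and secondary processes of a multi-process test, would then run with randomization off, and the original setting comes back even if the block raises.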