[01:48:58] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[02:08:23] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[04:28:04] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 240 seconds)
[04:28:37] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[05:22:27] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 260 seconds)
[05:24:34] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[05:39:00] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[06:07:59] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[07:04:08] *** Joins: lhodev (~Adium@inet-hqmc07-o.oracle.com)
[07:07:27] *** Joins: mark-ddn (4a608da8@gateway/web/cgi-irc/kiwiirc.com/ip.74.96.141.168)
[07:09:26] *** Parts: lhodev (~Adium@inet-hqmc07-o.oracle.com) ()
[07:10:42] *** Parts: mark-ddn (4a608da8@gateway/web/cgi-irc/kiwiirc.com/ip.74.96.141.168) ()
[08:05:05] *** Joins: markyk (4a608da8@gateway/web/cgi-irc/kiwiirc.com/ip.74.96.141.168)
[08:06:11] *** Quits: markyk (4a608da8@gateway/web/cgi-irc/kiwiirc.com/ip.74.96.141.168) (Client Quit)
[08:23:51] *** Joins: nKumar (uid239884@gateway/web/irccloud.com/x-rtszgrhzdfhogbtz)
[08:24:45] so I am super close to having this blobStore implementation done. However, whenever I execute an spdk_bs_io_read_blob, I get no errors, but all of the values in the readBuffer created are 0s
[08:25:49] any ideas why this might be happening?
[08:32:03] nKumar, is this still the same issue you were having before wrt data not matching, or has it morphed from that somehow?
[08:32:16] not really, last time it seemed to be garbage data
[08:32:19] now it's fully zeroed out
[08:36:32] can you post your write & read routines somewhere like you did before? and also info on sizes (size of cluster, size of blob, size of IO that you're doing)?
[08:40:56] *** Quits: nKumar (uid239884@gateway/web/irccloud.com/x-rtszgrhzdfhogbtz) (Excess Flood)
[08:41:16] *** Joins: nKumar (uid239884@gateway/web/irccloud.com/x-wwsckgeegipzmviy)
[10:18:36] peluse: just merged blob_hello - looks good
[10:58:27] drv, awesome thanks!
[10:59:19] * peluse just remembered he needs to go fixup the commit msgs on all those UT patches here soon...
[14:43:29] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[14:56:25] *** Joins: whitepa (~whitepa@2601:601:1200:f23b:8054:7049:7905:d8f7)
[16:12:06] hi all - I've noticed that nvme_pcie_ctrlr_construct allocates using spdk_dma_zmalloc, while nvme_rdma_ctrlr_construct allocates using plain old calloc(). Is that an oversight?
[16:12:42] I'm trying to use nvme in multi-process mode, with both pcie and rdma devices.
[16:14:09] hitting a segfault when one process is trying to access an nvme_rdma_ctrlr that's on the other process' heap.
[16:14:25] seems like that should probably be in dpdk shared mem
[16:18:43] hi whitepa, we've been discussing just this topic recently
[16:18:54] couple of patches that probably are of interest: https://review.gerrithub.io/#/c/373908/ and https://review.gerrithub.io/#/c/373898/
[16:19:36] hey, cool! I was in the middle of doing just that right now :)
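The allocation difference whitepa describes is the crux of the crash: spdk_dma_zmalloc() returns memory from the DPDK hugepage region that every SPDK process maps, while calloc() returns process-private heap. A minimal sketch of that contrast, assuming only the public spdk/env.h API; the example_ctrlr struct and the helper names are illustrative stand-ins, not the real driver types and not code from the patches linked above:

/* Illustrative only: memory from spdk_dma_zmalloc() is zeroed, aligned, and
 * lives in the shared DPDK memory region, so a secondary process can
 * dereference it. Memory from calloc() is valid only inside the allocating
 * process, which is why poking an nvme_rdma_ctrlr from another process
 * segfaults. */
#include <stdlib.h>
#include "spdk/env.h"

struct example_ctrlr {		/* stand-in for a controller structure */
	int	state;
};

static struct example_ctrlr *
alloc_shared_example(void)
{
	/* zeroed, 64-byte aligned, visible to all SPDK processes */
	return spdk_dma_zmalloc(sizeof(struct example_ctrlr), 64, NULL);
}

static struct example_ctrlr *
alloc_private_example(void)
{
	/* zeroed, but private to the calling process's heap */
	return calloc(1, sizeof(struct example_ctrlr));
}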
[16:19:39] thanks
[16:20:34] if you try those out and they work, please let us know (either in IRC or as comments/+1 on the reviews if you have GerritHub set up)
[16:20:43] will do
[16:33:25] it seemed to fix the one particular crash I was seeing, now I'm hitting a different, related one.
[16:33:41] (gdb) print rqpair->cm_id->qp
[16:33:41] Cannot access memory at address 0x14338d8
[16:34:40] hmm, are you attempting to actually use an NVMe-oF controller from more than one process?
[16:34:43] use case is two processes, both accessing the same PCIe local SSDs. One of those processes tries to add a device via RDMA, and the other crashes.
[16:34:44] I don't think that will work even after the patches
[16:38:17] I'm using nvmf_tgt as the primary process, it connects directly to 2 PCIe SSDs. Then a secondary process starts up, connects to the same SSDs, and then tries to open another via RDMA. at that point, nvmf_tgt segfaults as above.
[16:39:06] I don't need nvmf_tgt to be able to use the remote SSDs, I only need it to act as a target for the locals.
[16:39:39] hmm, that seems like it should work, so there may be more patches necessary
[16:39:53] it seems like it's trying to initialize its own controller for the RDMA-connected SSD
[16:40:21] looks like a hotplug event is being handled or something
[16:48:01] whitepa: do you have a backtrace handy? how does the nvmf_tgt side end up trying to create an NVMe-oF qpair in the NVMe driver?
[16:48:36] my suspicion is that the NVMe-oF controller created in the secondary process will get added to the attached_ctrlrs list in the shared driver structure
[16:48:51] yeah - paste here?
[16:49:11] and then the nvmf_tgt's hotplug poller will call spdk_nvme_probe() and try to init the NVMe-oF controller, which won't work since the ibverbs pointers aren't in the shared memory region
[16:49:12] *** Quits: nKumar (uid239884@gateway/web/irccloud.com/x-wwsckgeegipzmviy) (Quit: Connection closed for inactivity)
[16:49:22] whitepa: if it's short, sure, or on a pastebin
[16:49:37] really just want to verify that the call stack includes spdk_nvme_probe() being called in nvmf_tgt
[16:49:39] https://pastebin.com/X5wV9kzp
[16:50:36] it's definitely in there
[16:50:53] hmm, yeah, looks like that's what's going on - we probably need a check in nvme_init_controllers() to avoid trying to touch non-PCIe controllers from a different process
[16:52:43] we already stash the owning process's pid in each qpair, so that might be a way to check without adding too much new stuff
[16:55:22] nm, it's actually in the per_process_tailq and only for io queues, but still shouldn't be too hard to add - let me see if I can put something together
[16:57:00] yeah, I see the active_procs tailq in the ctrlr... each having a pid and is_primary
[17:05:12] this might work around it, untested: https://review.gerrithub.io/#/c/374015/
[17:05:58] makes sense
[17:06:03] testing...
[17:13:53] seems to be in an infinite loop on that continue
[17:16:11] does the controller need to be removed from the init_ctrlrs list?
[17:16:58] hmm, yeah, I totally didn't read the loop condition
[17:17:17] we probably can't remove it from init_ctrlrs, though, since that's shared across the processes... have to think about how to work around that
[17:17:33] it actually should probably not be added to init_ctrlrs in the first place
[17:17:56] let me tweak the patch
[17:25:14] this is harder than I thought at first glance... the secondary process could be adding the controller to init_ctrlrs at the same time as the primary is checking for hotplug controllers
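A rough sketch of the check being discussed, assuming only the public transport-id definitions in spdk/nvme.h; the ctrlr_entry wrapper and its owner_pid field are hypothetical stand-ins for wherever the driver actually records ownership (the conversation mentions the per-process active_procs list), and this is not the GerritHub patch linked above:

/* Idea: the hotplug/init path should only touch controllers it is allowed to
 * touch. PCIe controllers live in shared memory and may be handled by any
 * SPDK process; an NVMe-oF (RDMA) controller keeps ibverbs pointers in its
 * creator's private memory, so only the creating process may use it. */
#include <stdbool.h>
#include <unistd.h>
#include "spdk/nvme.h"

struct ctrlr_entry {				/* illustrative wrapper, not an SPDK type */
	struct spdk_nvme_transport_id	trid;		/* how the controller was attached */
	pid_t				owner_pid;	/* process that created the controller */
};

static bool
ctrlr_usable_by_this_process(const struct ctrlr_entry *entry)
{
	if (entry->trid.trtype == SPDK_NVME_TRANSPORT_PCIE) {
		/* shared-memory PCIe controller: any process may init/use it */
		return true;
	}
	/* fabrics controller: only meaningful inside the creating process */
	return entry->owner_pid == getpid();
}

As the rest of the exchange points out, skipping such a controller with a bare continue is not enough on its own: it also has to be kept off (or taken off) init_ctrlrs, otherwise the init loop never terminates.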
[17:25:50] one quick test would be to disable hotplug in nvmf_tgt if you don't need it - [Nvme] section, add 'HotplugEnable No'
[17:26:13] ah, ok... I'll give that a try.
[17:26:50] FWIW removing that ctrlr from the init tailq did seem to work, but I understand it's probably racy
[17:27:19] yeah, I'm not sure of the consequences of that - we should really be preventing the other process from calling probe_cb on it at all
[17:28:07] makes sense... is the init_ctrlrs tailq meant solely for inter-process communication of new controllers?
[17:29:26] actually, init_ctrlrs might make more sense as a per-process list - a controller is only on init_ctrlrs while it's in the process of being initialized (reset, create admin queue, that kind of stuff)
[17:29:34] once it's done being initialized, it's put on attached_ctrlrs
[17:29:49] but I'm not 100% sure how that all interacts in multi-process mode
[17:30:48] I see... yeah, if RDMA devices essentially will never work multi-process (as it sounds), then perhaps a mechanism that doesn't share the controller at all makes more sense.
[17:32:16] well, no need to tackle this on a Friday night :) Thanks for your help, Daniel.
[17:32:26] sure, we'll get to the bottom of this next week :)
[17:32:37] sounds great, have a good weekend
[20:13:39] *** Joins: bwalker (~bwalker@192.55.54.44)
[20:13:39] *** Server sets mode: +cnt
[20:14:33] *** Joins: pbshah1 (~pbshah1@192.55.54.44)
[20:15:07] *** Joins: cunyinch (~cunyinch@192.55.54.44)
[20:18:35] *** Joins: gangcao (~gangcao@192.55.54.44)
[20:19:06] *** Joins: jstern (~jstern@192.55.54.44)
[20:20:06] *** Joins: pzedlews (~pzedlews@192.55.54.44)
[20:21:07] *** Joins: vermavis (~vermavis@192.55.54.44)
[20:21:37] *** Joins: qdai2 (~qdai2@192.55.54.44)
[20:22:07] *** Joins: ppelplin (~ppelplin@192.55.54.44)
[20:23:08] *** Joins: ziyeyang (~ziyeyang@192.55.54.44)
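For anyone hitting the same crash before a proper fix lands, the stopgap suggested at [17:25:50] is a config change rather than code. A minimal sketch of the relevant nvmf_tgt section, assuming the legacy INI-style configuration file of that era; only the 'HotplugEnable No' line comes from the discussion, and the TransportID entries for the local PCIe SSDs (omitted here) would sit alongside it:

[Nvme]
  # Disable the hotplug poller so the primary process never tries to
  # probe/init a fabrics controller created by a secondary process.
  HotplugEnable No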