[01:48:58] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[02:08:23] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[04:28:04] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 240 seconds)
[04:28:37] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[05:22:27] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 260 seconds)
[05:24:34] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[05:39:00] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[06:07:59] *** Joins: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl)
[07:04:08] *** Joins: lhodev (~Adium@inet-hqmc07-o.oracle.com)
[07:07:27] *** Joins: mark-ddn (4a608da8@gateway/web/cgi-irc/kiwiirc.com/ip.74.96.141.168)
[07:09:26] *** Parts: lhodev (~Adium@inet-hqmc07-o.oracle.com) ()
[07:10:42] *** Parts: mark-ddn (4a608da8@gateway/web/cgi-irc/kiwiirc.com/ip.74.96.141.168) ()
[08:05:05] *** Joins: markyk (4a608da8@gateway/web/cgi-irc/kiwiirc.com/ip.74.96.141.168)
[08:06:11] *** Quits: markyk (4a608da8@gateway/web/cgi-irc/kiwiirc.com/ip.74.96.141.168) (Client Quit)
[08:23:51] *** Joins: nKumar (uid239884@gateway/web/irccloud.com/x-rtszgrhzdfhogbtz)
[08:24:45] so I am super close to having this blobStore implementation done. However, whenever I execute an spdk_bs_io_read_blob, I get no errors, but all of the values in the readBuffer created are 0s
[08:25:49] any ideas why this might be happening?
[08:32:03] nKumar, is this still the same issue you were having before wrt data not matching, or has it morphed from that somehow?
[08:32:16] not really, last time it seemed to be garbage data
[08:32:19] now it's fully zeroed out
[08:36:32] can you post your write & read routines somewhere like you did before? and also info on sizes (size of cluster, size of blob, size of IO that you're doing)?
[08:40:56] *** Quits: nKumar (uid239884@gateway/web/irccloud.com/x-rtszgrhzdfhogbtz) (Excess Flood)
[08:41:16] *** Joins: nKumar (uid239884@gateway/web/irccloud.com/x-wwsckgeegipzmviy)
[10:18:36] peluse: just merged blob_hello - looks good
[10:58:27] drv, awesome thanks!
[10:59:19] * peluse just remembered he needs to go fixup the commit msgs on all those UT patches here soon...
[14:43:29] *** Quits: gila (~gila@5ED4FE92.cm-7-5d.dynamic.ziggo.nl) (Quit: My Mac Pro has gone to sleep. ZZZzzz…)
[14:56:25] *** Joins: whitepa (~whitepa@2601:601:1200:f23b:8054:7049:7905:d8f7)
[16:12:06] hi all - I've noticed that nvme_pcie_ctrlr_construct allocates using spdk_dma_zmalloc, while nvme_rdma_ctrlr_construct allocates using plain old calloc(). Is that an oversight?
[16:12:42] I'm trying to use nvme in multi-process mode, with both pcie and rdma devices.
[16:14:09] hitting a segfault when one process is trying to access an nvme_rdma_ctrlr that's on the other process' heap.
[16:14:25] seems like that should probably be in dpdk shared mem
[16:18:43] hi whitepa, we've been discussing just this topic recently
[16:18:54] couple of patches that probably are of interest: https://review.gerrithub.io/#/c/373908/ and https://review.gerrithub.io/#/c/373898/
[16:19:36] hey, cool! I was in the middle of doing just that right now :)
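The allocation difference whitepa describes is the crux of the crash: spdk_dma_zmalloc() returns memory from the DPDK hugepage region that every SPDK process maps, while calloc() returns process-private heap. A minimal sketch of that contrast, assuming only the public spdk/env.h API; the example_ctrlr struct and the helper names are illustrative stand-ins, not the real driver types and not code from the patches linked above:

/* Illustrative only: memory from spdk_dma_zmalloc() is zeroed, aligned, and
 * lives in the shared DPDK memory region, so a secondary process can
 * dereference it. Memory from calloc() is valid only inside the allocating
 * process, which is why poking an nvme_rdma_ctrlr from another process
 * segfaults. */
#include <stdlib.h>
#include "spdk/env.h"

struct example_ctrlr {		/* stand-in for a controller structure */
	int	state;
};

static struct example_ctrlr *
alloc_shared_example(void)
{
	/* zeroed, 64-byte aligned, visible to all SPDK processes */
	return spdk_dma_zmalloc(sizeof(struct example_ctrlr), 64, NULL);
}

static struct example_ctrlr *
alloc_private_example(void)
{
	/* zeroed, but private to the calling process's heap */
	return calloc(1, sizeof(struct example_ctrlr));
}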
[16:19:39] thanks
[16:20:34] if you try those out and they work, please let us know (either in IRC or as comments/+1 on the reviews if you have GerritHub set up)
[16:20:43] will do
[16:33:25] it seemed to fix the one particular crash I was seeing, now I'm hitting a different, related one.
[16:33:41] (gdb) print rqpair->cm_id->qp
[16:33:41] Cannot access memory at address 0x14338d8
[16:34:40] hmm, are you attempting to actually use an NVMe-oF controller from more than one process?
[16:34:43] use case is two processes, both accessing the same PCIe local SSDs. One of those processes tries to add a device via RDMA, and the other crashes.
[16:34:44] I don't think that will work even after the patches
[16:38:17] I'm using nvmf_tgt as the primary process, it connects directly to 2 PCIe SSDs. Then a secondary process starts up, connects to the same SSDs, and then tries to open another via RDMA. at that point, nvmf_tgt segfaults as above.
[16:39:06] I don't need nvmf_tgt to be able to use the remote SSDs, I only need it to act as a target for the locals.
[16:39:39] hmm, that seems like it should work, so there may be more patches necessary
[16:39:53] it seems like it's trying to initialize its own controller for the RDMA-connected SSD
[16:40:21] looks like a hotplug event is being handled or something
[16:48:01] whitepa: do you have a backtrace handy? how does the nvmf_tgt side end up trying to create an NVMe-oF qpair in the NVMe driver?
[16:48:36] my suspicion is that the NVMe-oF controller created in the secondary process will get added to the attached_ctrlrs list in the shared driver structure
[16:48:51] yeah - paste here?
[16:49:11] and then the nvmf_tgt's hotplug poller will call spdk_nvme_probe() and try to init the NVMe-oF controller, which won't work since the ibverbs pointers aren't in the shared memory region
[16:49:12] *** Quits: nKumar (uid239884@gateway/web/irccloud.com/x-wwsckgeegipzmviy) (Quit: Connection closed for inactivity)
[16:49:22] whitepa: if it's short, sure, or on a pastebin
[16:49:37] really just want to verify that the call stack includes spdk_nvme_probe() being called in nvmf_tgt
[16:49:39] https://pastebin.com/X5wV9kzp
[16:50:36] it's definitely in there
[16:50:53] hmm, yeah, looks like that's what's going on - we probably need a check in nvme_init_controllers() to avoid trying to touch non-PCIe controllers from a different process
[16:52:43] we already stash the owning process's pid in each qpair, so that might be a way to check without adding too much new stuff
[16:55:22] nm, it's actually in the per_process_tailq and only for io queues, but still shouldn't be too hard to add - let me see if I can put something together
[16:57:00] yeah, I see the active_procs tailq in the ctrlr... each having a pid and is_primary
[17:05:12] this might work around it, untested: https://review.gerrithub.io/#/c/374015/
[17:05:58] makes sense
[17:06:03] testing...
[17:13:53] seems to be in an infinite loop on that continue
[17:16:11] does the controller need to be removed from the init_ctrlrs list?
[17:16:58] hmm, yeah, I totally didn't read the loop condition
[17:17:17] we probably can't remove it from init_ctrlrs, though, since that's shared across the processes... have to think about how to work around that
[17:17:33] it actually should probably not be added to init_ctrlrs in the first place
[17:17:56] let me tweak the patch
[17:25:14] this is harder than I thought at first glance... the secondary process could be adding the controller to init_ctrlrs at the same time as the primary is checking for hotplug controllers
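A rough sketch of the check being discussed, assuming only the public transport-id definitions in spdk/nvme.h; the ctrlr_entry wrapper and its owner_pid field are hypothetical stand-ins for wherever the driver actually records ownership (the conversation mentions the per-process active_procs list), and this is not the GerritHub patch linked above:

/* Idea: the hotplug/init path should only touch controllers it is allowed to
 * touch. PCIe controllers live in shared memory and may be handled by any
 * SPDK process; an NVMe-oF (RDMA) controller keeps ibverbs pointers in its
 * creator's private memory, so only the creating process may use it. */
#include <stdbool.h>
#include <unistd.h>
#include "spdk/nvme.h"

struct ctrlr_entry {				/* illustrative wrapper, not an SPDK type */
	struct spdk_nvme_transport_id	trid;		/* how the controller was attached */
	pid_t				owner_pid;	/* process that created the controller */
};

static bool
ctrlr_usable_by_this_process(const struct ctrlr_entry *entry)
{
	if (entry->trid.trtype == SPDK_NVME_TRANSPORT_PCIE) {
		/* shared-memory PCIe controller: any process may init/use it */
		return true;
	}
	/* fabrics controller: only meaningful inside the creating process */
	return entry->owner_pid == getpid();
}

As the rest of the exchange points out, skipping such a controller with a bare continue is not enough on its own: it also has to be kept off (or taken off) init_ctrlrs, otherwise the init loop never terminates.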
[17:25:50] one quick test would be to disable hotplug in nvmf_tgt if you don't need it - [Nvme] section, add 'HotplugEnable No'
[17:26:13] ah, ok... I'll give that a try.
[17:26:50] FWIW removing that ctrlr from the init tailq did seem to work, but I understand it's probably racy
[17:27:19] yeah, I'm not sure of the consequences of that - we should really be preventing the other process from calling probe_cb on it at all
[17:28:07] makes sense... is the init_ctrlrs tailq meant solely for inter-process communication of new controllers?
[17:29:26] actually, init_ctrlrs might make more sense as a per-process list - a controller is only on init_ctrlrs while it's in the process of being initialized (reset, create admin queue, that kind of stuff)
[17:29:34] once it's done being initialized, it's put on attached_ctrlrs
[17:29:49] but I'm not 100% sure how that all interacts in multi-process mode
[17:30:48] I see... yeah, if RDMA devices essentially will never work multi-process (as it sounds), then perhaps a mechanism that doesn't share the controller at all makes more sense.
[17:32:16] well, no need to tackle this on a Friday night :) Thanks for your help, Daniel.
[17:32:26] sure, we'll get to the bottom of this next week :)
[17:32:37] sounds great, have a good weekend
[20:13:39] *** Joins: bwalker (~bwalker@192.55.54.44)
[20:13:39] *** Server sets mode: +cnt
[20:14:33] *** Joins: pbshah1 (~pbshah1@192.55.54.44)
[20:15:07] *** Joins: cunyinch (~cunyinch@192.55.54.44)
[20:18:35] *** Joins: gangcao (~gangcao@192.55.54.44)
[20:19:06] *** Joins: jstern (~jstern@192.55.54.44)
[20:20:06] *** Joins: pzedlews (~pzedlews@192.55.54.44)
[20:21:07] *** Joins: vermavis (~vermavis@192.55.54.44)
[20:21:37] *** Joins: qdai2 (~qdai2@192.55.54.44)
[20:22:07] *** Joins: ppelplin (~ppelplin@192.55.54.44)
[20:23:08] *** Joins: ziyeyang (~ziyeyang@192.55.54.44)
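For anyone hitting the same crash before a proper fix lands, the stopgap suggested at [17:25:50] is a config change rather than code. A minimal sketch of the relevant nvmf_tgt section, assuming the legacy INI-style configuration file of that era; only the 'HotplugEnable No' line comes from the discussion, and the TransportID entries for the local PCIe SSDs (omitted here) would sit alongside it:

[Nvme]
  # Disable the hotplug poller so the primary process never tries to
  # probe/init a fabrics controller created by a secondary process.
  HotplugEnable No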