[00:57:04] *** Quits: ziyeyang_ (~ziyeyang@192.55.54.44) (Quit: Leaving)
[01:43:44] *** Joins: Timo (~Timo@2001:638:807:208:bc71:5750:ce1c:471)
[01:44:26] <Timo> hello everyone :) is this channel still in use?
[01:44:53] <pwodkowx> yes
[01:45:15] <Timo> sweet!
[01:45:38] <Timo> I am having some issues with the I/OAT features of the SPDK and getting no responses on the intel forums unfortunately :/
[01:46:00] <Timo> and googling for "spdk_ioat_submit_copy" literally results in two pages of google results
[01:46:54] <Timo> You guys maybe have an I/OAT expert that could take a look at my forum post? :)
[01:56:39] <pwodkowx> what is the problem?
[01:57:58] <Timo> it's described here: https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/749389
[01:58:31] <Timo> essentially, im trying to scan a large data vector by dividing it into chunks, I/OAT-copying those chunks one at a time into local buffer and scanning it there
[01:58:48] <Timo> but the scans yield slightly different results every time
[01:59:25] <Timo> as if the destination memory region wasn't quite ready to be read yet.
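The symptom described above (a scan that reads the destination buffer "before it is ready") is the usual failure mode when an asynchronous copy is read before its completion has been observed. The log does not say what the actual bug was, so the following is only a minimal sketch of that general pattern. The `fake_*` engine is a hypothetical stand-in written for this example and is not the SPDK I/OAT API; with the real library, the wait loop would presumably poll the channel until the completion callback has fired.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an asynchronous copy engine; not the SPDK I/OAT API. */
struct fake_copy {
    void *dst;
    const void *src;
    size_t len;
    bool *done;      /* set to true once the copy has really happened */
    bool pending;
};

/* Queue a copy; nothing is written to dst yet. */
static void fake_submit_copy(struct fake_copy *c, void *dst, const void *src,
                             size_t len, bool *done)
{
    c->dst = dst;
    c->src = src;
    c->len = len;
    c->done = done;
    c->pending = true;
    *done = false;
}

/* Poll the engine: performs the queued copy and signals completion. */
static void fake_process_events(struct fake_copy *c)
{
    if (c->pending) {
        memcpy(c->dst, c->src, c->len);
        c->pending = false;
        *c->done = true;
    }
}

/* Count occurrences of needle in data, copying chunk by chunk into buf.
 * chunk must be greater than zero and buf must hold at least chunk bytes. */
static size_t scan_chunked(const uint8_t *data, size_t len,
                           uint8_t *buf, size_t chunk, uint8_t needle)
{
    struct fake_copy engine = {0};
    size_t hits = 0;

    for (size_t off = 0; off < len; off += chunk) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        bool done = false;

        fake_submit_copy(&engine, buf, data + off, n, &done);
        while (!done) {
            /* Do not touch buf until the engine reports completion. */
            fake_process_events(&engine);
        }
        for (size_t i = 0; i < n; i++) {
            hits += (buf[i] == needle);
        }
    }
    return hits;
}
```

Scanning `buf` immediately after `fake_submit_copy()` would read stale data, which is one way to get results that vary from run to run.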
[02:00:26] <pwodkowx> there is no source code there
[02:03:28] <Timo> it's attached to the post
[02:03:57] <pwodkowx> ohh yes I see now :D
[02:04:40] <Timo> hah, I wasn't sure if it was okay to post pastebin links, so I just attached the file to the post
[02:09:26] <pwodkowx> For me you have bug in your code
[02:10:05] <pwodkowx> local_1/local_@ are 8 byte long
[02:10:41] <pwodkowx> write unit test that use just memcpy before using IOAT
[02:11:04] <pwodkowx> this will save you a lot of time :)
[02:11:34] <pwodkowx> ohhhh, sorry :D forget my comment
[02:11:58] <pwodkowx> monday morning is not good time for me :)
[02:12:31] <Timo> haha thats okay
[02:12:43] <Timo> I did a test with memcpy and that works
[02:15:17] <Timo> do you by any chance have a machine to test the code on? the architecture of my test server is a bit "special", but it's the only I/OAT-enabled system I have right now
[02:20:38] <pwodkowx> This example doesn't do the full SPDK init sequence. Can you just modify one of the SPDK apps to check your code?
[02:22:08] <Timo> sure, I'll try.
[02:22:20] <Timo> what do you mean with "full SPDK init sequence", though?
[02:26:23] <Timo> huh. putting it into the ex4 example code works. well, that's embarrassing
[02:29:24] *** Joins: tomzawadzki (~tomzawadz@192.55.54.41)
[02:32:30] <pwodkowx> hehe :D
[02:32:39] <pwodkowx> good to know :P
[02:32:46] <Timo> im trying to find out what im doing differently
[02:33:00] <Timo> am I missing a call or something?
[02:38:32] <Timo> okay whooops, I guess it's monday morning for me too. I copied the memcpy-code into the example code. of course that works
[02:38:51] <Timo> The actual I/OAT code is also not working when putting it into the ex4 example code .
[02:42:15] <Timo> even worse, it broke my I/OAT devices again. Now I need to restart the system because the I/OAT devices fail to attach now. I usually only get this when DMA-copying memory that hasn't been allocated by spdk_dma_malloc
[02:50:52] <pwodkowx> can you fit your code into examples/ioat/perf ?
[02:51:18] <pwodkowx> you can modify this example since we know it is working
[02:52:15] <Timo> will do once the server is back up, yea
[02:53:17] <Timo> do you know if there is some kind of command I can execute to "reset" the I/OAT devices somehow? whenever I use the SPDK and do something it doesn't like I need to reboot the whole system because subsequent SPDK inits will fail to attach any devices
[03:06:11] <Timo> Adding it to "perf" also produces varying results. and I can only call perf once, because now the IOAT devices are unusable again
[03:07:01] <Timo> so I'm either asking I/OAT to do something thats completely wrong, or there's some kind of bug here.. probably the first one
[04:13:01] <Timo> I really cannot wrap my head around this. I was actually hoping to incorporate the I/OAT stuff into a paper, but not with incorrect results :D it's a real shame
[04:37:50] *** Joins: Timo_ (~Timo@141.89.226.149)
[04:40:31] *** Quits: Timo (~Timo@2001:638:807:208:bc71:5750:ce1c:471) (Ping timeout: 240 seconds)
[04:41:57] *** Quits: Timo_ (~Timo@141.89.226.149) (Ping timeout: 240 seconds)
[05:12:33] *** Joins: Timo (~Timo@ipservice-092-208-202-115.092.208.pools.vodafone-ip.de)
[06:08:30] *** Quits: ChanServ (ChanServ@services.) (*.net *.split)
[06:17:46] *** Joins: ChanServ (ChanServ@services.)
[06:17:46] *** wolfe.freenode.net sets mode: +o ChanServ   
[08:22:03] *** Quits: Timo (~Timo@ipservice-092-208-202-115.092.208.pools.vodafone-ip.de) (Remote host closed the connection)
[09:35:27] *** Quits: tomzawadzki (~tomzawadz@192.55.54.41) (Ping timeout: 240 seconds)
[09:39:55] *** Joins: lhodev (~Adium@inet-hqmc03-o.oracle.com)
[09:47:41] <lhodev> Mornin' guys.  I'm looking at taking ownership of one of the low-hanging fruit items listed on the Things-To-Do SPDK trello page:  Remove use of exit() and abort() for error handling in library code.  Question:  a cited example in a comment is removing exit() calls in spdk_env_init().   Toward that goal, I would take the approach of altering the return type of spdk_env_init() from void to int.  While that's certainly do-able (along with changing all of the callers in the SPDK repo), what's the SPDK's policy of altering this public API?   Anyone out there in the "wild" who has relied on spdk_env_init() as a void that would terminate with exit() might suddenly discover breakage.  Do I forge ahead with this change, and simply assume that consumers (outside the SPDK repo itself) of this call will just have to alter their code?
[10:19:15] <drv> lhodev: I think it is probably fine to change the spdk_env_init() return type to int, as long as we document the API change in CHANGELOG.md
[10:20:12] <lhodev> Cool.  Thanks Daniel.
[10:20:50] <drv> any apps that don't apply the update will continue to work with a recompile (although won't catch any errors)
[10:21:53] <drv> we could also decorate it with __attribute__((warn_unused_result)), although I'm not sure that's necessary
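A minimal sketch of the change discussed above, assuming a hypothetical `env_init_sketch()` that stands in for `spdk_env_init()`. The `env_opts_sketch` struct and its field are invented for illustration and are not SPDK's real options struct. It shows an initializer that returns a negative errno-style code instead of calling exit(), decorated with the GCC/Clang `warn_unused_result` attribute so that callers that ignore the result get a compiler warning.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical options struct; not SPDK's real struct spdk_env_opts. */
struct env_opts_sketch {
    const char *name;
};

/* GCC/Clang attribute: the compiler warns when a caller ignores the result. */
#define SKETCH_WARN_UNUSED __attribute__((warn_unused_result))

/*
 * Stand-in for an initializer that used to call exit() on failure.
 * It returns 0 on success or a negative errno value, leaving the
 * decision to terminate with the caller.
 */
SKETCH_WARN_UNUSED
static int env_init_sketch(const struct env_opts_sketch *opts)
{
    if (opts == NULL || opts->name == NULL) {
        return -EINVAL;
    }
    return 0;
}
```

A caller that previously relied on the void version still compiles after such a change, but with the attribute in place a bare `env_init_sketch(&opts);` statement would typically draw a warning.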
[11:09:02] *** Joins: AlanA (94571712@gateway/web/freenode/ip.148.87.23.18)
[11:13:59] <AlanA> I would like to be added to the SPDK Trello Board.  My id is @alanadamson1.  Thanks.
[11:14:51] *** Joins: Timo (~Timo@ipservice-092-208-202-115.092.208.pools.vodafone-ip.de)
[11:15:30] <Timo> Hi again, I just wanted to say that someone on the intel forums found the bug in my code. In case that was someone in this channel: thank you very much! :)
[11:17:01] <drv> hi Timo, no problem :) - in the future, we'll be more likely to see the question if you post on the SPDK mailing list: https://lists.01.org/mailman/listinfo/spdk
[11:18:35] <Timo> noted, thank you :)
[11:25:03] *** Quits: Timo (~Timo@ipservice-092-208-202-115.092.208.pools.vodafone-ip.de) (Ping timeout: 248 seconds)
[11:41:51] *** Joins: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de)
[11:46:01] *** Quits: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de) (Ping timeout: 240 seconds)
[12:01:09] *** Joins: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de)
[12:03:44] <bwalker> so what's happening in the hardware that could explain the fact that the registers don't change value after a hot unplug with the IOMMU enabled
[12:03:52] <bwalker> I can't come up with any good theories so far
[12:15:05] *** Quits: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de) (Ping timeout: 240 seconds)
[13:07:49] <darsto_> bwalker: could you rephrase?
[13:08:13] <darsto_> i'm not sure what you're asking about
[13:08:30] <bwalker> I got an email about an experiment with vfio hot plug a few days ago
[13:08:49] <bwalker> where the result was that removing a device and then doing an MMIO read to the PCIe BAR
[13:09:05] <bwalker> kept returning whatever value it was returning prior to removing the device, instead of all 0xF like I expected
[13:09:24] <bwalker> and I can't explain how that's possible
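The expectation mentioned above is that an MMIO read from a device that is no longer present typically comes back as all 1s. A minimal sketch of that check follows; `read_reg32()` and `reg_looks_removed()` are hypothetical helpers written for this example, and the "BAR" in the usage is just an ordinary array rather than a real mapped PCIe region.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a 32-bit register through a mapped BAR pointer. */
static uint32_t read_reg32(const volatile uint32_t *bar, size_t index)
{
    return bar[index];
}

/* An all-ones value is the usual sign that the device did not answer the read. */
static bool reg_looks_removed(uint32_t value)
{
    return value == UINT32_MAX;
}
```

A register whose legitimate value can be 0xFFFFFFFF would make this check ambiguous, so code relying on it would normally pick a register that can never read as all 1s.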
[13:10:12] <darsto_> physical hotremove? weird
[13:10:17] <bwalker> yep
[14:52:56] *** ChanServ sets mode: +o peluse
[15:14:20] *** Joins: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de)
[15:18:46] *** Quits: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de) (Ping timeout: 260 seconds)
[15:44:30] <jimharris> bwalker: looking at where _spdk_bs_release_cluster gets called in blobstore
[15:45:10] <jimharris> so this waits until the blob gets persisted before releasing clusters that were truncated - can't this lead to cluster leaks that wouldn't be recovered until the blobstore was unloaded and reloaded?
[15:46:28] <jimharris> oh - never mind, I see now how this works
[16:05:51] <drv> any opinions on the necessity of spdk_scsi_dev_print()? I assume this is a leftover from istgt
[16:05:59] <drv> can we get rid of it, or at least turn it into INFOLOG or something?
[16:06:19] <jimharris> either sounds good to me
[16:06:26] <drv> I'm not sure why it prints "HDD UNIT" either
[16:34:41] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[16:39:42] <bwalker> if you pass a callback to spdk_bdev_open for removal notification, what are you supposed to do in that callback?
[16:40:14] <bwalker> I would look in the code to see what things actually do, but as far as I can tell it is entirely unused
[16:40:17] <drv> close the bdev, I think
[16:40:19] <jimharris> yes
[16:40:22] <jimharris> close it
[16:40:28] <jimharris> do whatever clean up you need to do first, then close it
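The advice above (clean up first, then close) can be sketched as follows. All types and functions here are hypothetical stand-ins written for this example and are not SPDK's bdev API; the real callback signature and close call should be taken from the SPDK headers.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an open bdev descriptor. */
struct fake_desc {
    bool open;
};

/* Hypothetical per-user context handed to the remove callback. */
struct my_ctx {
    struct fake_desc *desc;
    bool cleaned_up;
};

/* Hypothetical stand-in for the library's close call. */
static void fake_close(struct fake_desc *d)
{
    d->open = false;
}

/* Hot-remove callback: finish the user's own cleanup, then close the descriptor. */
static void on_remove(void *arg)
{
    struct my_ctx *ctx = arg;

    if (ctx->desc == NULL) {
        return;                 /* already closed, nothing to do */
    }
    ctx->cleaned_up = true;     /* placeholder for the application's teardown */
    fake_close(ctx->desc);
    ctx->desc = NULL;           /* prevent a second close */
}
```

Clearing the descriptor pointer after closing makes the callback safe to invoke more than once in this sketch.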
[16:40:32] <bwalker> I see, ok
[16:40:35] <drv> iSCSI should use it, at least
[16:40:38] <drv> or probably scsi_bdev
[16:40:46] <jimharris> vhost too
[16:41:51] <drv> the callback for SCSI is spdk_scsi_lun_hot_remove()
[16:42:19] <drv> which is another layer of callbacks for the SCSI LUN layer
[16:43:59] <drv> hmm, I'm not sure they actually close the bdev
[16:44:02] <drv> or at least the vhost one doesn't seem to
[16:44:11] <bwalker> they don't seem to do anything to me
[16:44:14] <bwalker> at least at first glance
[16:45:15] <drv> the vhost-scsi one uses the same code path as iSCSI via spdk_scsi_lun_hot_remove, so that may be the one to look into
[16:45:26] <drv> I'm not sure vhost-blk is actually hotplug capable
[16:46:06] <jimharris> vhost-blk isn't
[16:46:13] <bwalker> the iscsi target passes NULL for the hot remove callback
[16:47:04] <bwalker> vhost-scsi seems to be the one that's at least attempting to make this work
[16:47:40] <bwalker> I'm not actually trying to make hotplug work - I'm actually just trying to follow along with the lvol shutdown path
[16:47:58] <bwalker> which I think doesn't work - problems exposed by my patches that make channel destruction synchronous
[16:48:11] <bwalker> I think it starts some asynchronous operations but doesn't provide a callback
[16:48:16] <bwalker> so it just drops off and never finishes
[16:48:47] <bwalker> if there are open lvols are shutdown, that is
[16:48:55] <bwalker> .s/are/at
[16:49:21] *** Parts: lhodev (~Adium@inet-hqmc03-o.oracle.com) ()
[16:50:14] <drv> lvol should definitely register a remove_cb when it opens the underlying bdev
[16:51:04] <drv> vbdev_lvs_hotremove_cb() is what it passes currently - but yeah, that looks like it calls vbdev_lvs_unload() without waiting for its completion
[16:51:46] <bwalker> yeah - _vbdev_lvs_remove is bugged when all_lvols_closed is not true
[16:55:14] <bwalker> that aside, something in the lvol shutdown path is not waiting for an I/O to complete
[16:57:37] <bwalker> oh, no it's not lvol. I see the problem
[16:57:42] <bwalker> it's the deferred completion path in bdev itself
[17:36:32] <jimharris> bwalker: just chatting with cunyin - he was doing the hotplug through sysfs, not physical removal of the devices - he's going to retry today with physical removal
[18:00:48] *** Joins: iotsiabah (c0f00e01@gateway/web/freenode/ip.192.240.14.1)
[18:15:01] <iotsiabah> Hello, are you gus still connected?
[18:15:49] <iotsiabah> I meant to say, are you guys still connected?
[18:38:49] <peluse> yo
[18:39:22] <peluse> what up iotsiabah?
[18:53:14] *** Quits: iotsiabah (c0f00e01@gateway/web/freenode/ip.192.240.14.1) (Ping timeout: 260 seconds)