[00:57:04] *** Quits: ziyeyang_ (~ziyeyang@192.55.54.44) (Quit: Leaving)
[01:43:44] *** Joins: Timo (~Timo@2001:638:807:208:bc71:5750:ce1c:471)
[01:44:26] hello everyone :) is this channel still in use?
[01:44:53] yes
[01:45:15] sweet!
[01:45:38] I am having some issues with the I/OAT features of the SPDK and getting no responses on the intel forums unfortunately :/
[01:46:00] and googling for "spdk_ioat_submit_copy" literally results in two pages of google results
[01:46:54] You guys maybe have an I/OAT expert that could take a look at my forum post? :)
[01:56:39] what is the problem?
[01:57:58] it's described here: https://software.intel.com/en-us/forums/software-tuning-performance-optimization-platform-monitoring/topic/749389
[01:58:31] essentially, im trying to scan a large data vector by dividing it into chunks, I/OAT-copying those chunks one at a time into a local buffer and scanning it there
[01:58:48] but the scans yield slightly different results every time
[01:59:25] as if the destination memory region wasn't quite ready to be read yet.
[02:00:26] there is no source code there
[02:03:28] it's attached to the post
[02:03:57] ohh yes I see now :D
[02:04:40] hah, I wasn't sure if it was okay to post pastebin links, so I just attached the file to the post
[02:09:26] For me you have a bug in your code
[02:10:05] local_1/local_@ are 8 byte long
[02:10:41] write a unit test that uses just memcpy before using IOAT
[02:11:04] this will save you a lot of time :)
[02:11:34] ohhhh, sorry :D forget my comment
[02:11:58] monday morning is not a good time for me :)
[02:12:31] haha thats okay
[02:12:43] I did a test with memcpy and that works
[02:15:17] do you by any chance have a machine to test the code on? the architecture of my test server is a bit "special", but it's the only I/OAT-enabled system I have right now
[02:20:38] This example doesn't do the full SPDK init sequence. can you just modify one of the SPDK apps to check your code?
[02:22:08] sure, I'll try.
[02:22:20] what do you mean with "full SPDK init sequence", though?
[02:26:23] huh. putting it into the ex4 example code works. well, thats embarrassing
[02:29:24] *** Joins: tomzawadzki (~tomzawadz@192.55.54.41)
[02:32:30] hehe :D
[02:32:39] good to know :P
[02:32:46] im trying to find out what im doing differently
[02:33:00] am I missing a call or something?
[02:38:32] okay whooops, I guess it's monday morning for me too. I copied the memcpy-code into the example code. of course that works
[02:38:51] The actual I/OAT code is also not working when putting it into the ex4 example code.
[02:42:15] even worse, it broke my I/OAT devices again. Now I need to restart the system because the I/OAT devices fail to attach now. I usually only get this when DMA-copying memory that hasn't been allocated by spdk_dma_malloc
[02:50:52] can you fit your code into examples/ioat/perf ?
[02:51:18] you can modify this example since we know it is working
[02:52:15] will do once the server is back up, yea
[02:53:17] do you know if there is some kind of command I can execute to "reset" the I/OAT devices somehow? whenever I use the SPDK and do something it doesnt like I need to reboot the whole system because subsequent SPDK inits will fail to attach any devices
[03:06:11] Adding it to "perf" also produces varying results. and I can only call perf once, because now the IOAT devices are unusable again
[03:07:01] so I'm either asking I/OAT to do something thats completely wrong, or there's some kind of bug here.. probably the first one
[04:13:01] I really cannot wrap my head around this. I was actually hoping to incorporate the I/OAT stuff into a paper, but not with incorrect results :D it's a real shame
[04:37:50] *** Joins: Timo_ (~Timo@141.89.226.149)
[04:40:31] *** Quits: Timo (~Timo@2001:638:807:208:bc71:5750:ce1c:471) (Ping timeout: 240 seconds)
[04:41:57] *** Quits: Timo_ (~Timo@141.89.226.149) (Ping timeout: 240 seconds)
[05:12:33] *** Joins: Timo (~Timo@ipservice-092-208-202-115.092.208.pools.vodafone-ip.de)
[06:08:30] *** Quits: ChanServ (ChanServ@services.) (*.net *.split)
[06:17:46] *** Joins: ChanServ (ChanServ@services.)
[06:17:46] *** wolfe.freenode.net sets mode: +o ChanServ
[08:22:03] *** Quits: Timo (~Timo@ipservice-092-208-202-115.092.208.pools.vodafone-ip.de) (Remote host closed the connection)
[09:35:27] *** Quits: tomzawadzki (~tomzawadz@192.55.54.41) (Ping timeout: 240 seconds)
[09:39:55] *** Joins: lhodev (~Adium@inet-hqmc03-o.oracle.com)
[09:47:41] Mornin' guys. I'm looking at taking ownership of one of the low-hanging fruit items listed on the Things-To-Do SPDK trello page: Remove use of exit() and abort() for error handling in library code. Question: a cited example in a comment is removing exit() calls in spdk_env_init(). Toward that goal, I would take the approach of altering the return type of spdk_env_init() from void to int. While that's certainly do-able (along with changing all of the callers in the SPDK repo), what's the SPDK's policy of altering this public API? Anyone out there in the "wild" who has relied on spdk_env_init() as a void that would terminate with exit() might suddenly discover breakage. Do I forge ahead with this change, and simply assume that consumers (outside the SPDK repo itself) of this call will just have to alter their code?
[10:19:15] lhodev: I think it is probably fine to change the spdk_env_init() return type to int, as long as we document the API change in CHANGELOG.md
[10:20:12] Cool. Thanks Daniel.
[10:20:50] any apps that don't apply the update will continue to work with a recompile (although won't catch any errors)
[10:21:53] we could also decorate it with __attribute__((warn_unused_result)), although I'm not sure that's necessary
[11:09:02] *** Joins: AlanA (94571712@gateway/web/freenode/ip.148.87.23.18)
[11:13:59] I would like to be added to the SPDK Trello Board. My id is @alanadamson1. Thanks.
[11:14:51] *** Joins: Timo (~Timo@ipservice-092-208-202-115.092.208.pools.vodafone-ip.de)
[11:15:30] Hi again, I just wanted to say that someone on the intel forums found the bug in my code. In case that was someone in this channel: thank you very much! :)
[11:17:01] hi Timo, no problem :) - in the future, we'll be more likely to see the question if you post on the SPDK mailing list: https://lists.01.org/mailman/listinfo/spdk
[11:18:35] noted, thank you :)
[11:25:03] *** Quits: Timo (~Timo@ipservice-092-208-202-115.092.208.pools.vodafone-ip.de) (Ping timeout: 248 seconds)
[11:41:51] *** Joins: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de)
[11:46:01] *** Quits: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de) (Ping timeout: 240 seconds)
[12:01:09] *** Joins: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de)
[12:03:44] so what's happening in the hardware that could explain the fact that the registers don't change value after a hot unplug with the IOMMU enabled
[12:03:52] I can't come up with any good theories so far
[12:15:05] *** Quits: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de) (Ping timeout: 240 seconds)
[13:07:49] bwalker: could you rephrase?
[13:08:13] i'm not sure what you're asking about
[13:08:30] I got an email about an experiment with vfio hot plug a few days ago
[13:08:49] where the result was that removing a device and then doing an MMIO read to the PCIe BAR
[13:09:05] kept returning whatever value it was returning prior to removing the device, instead of all 0xF like I expected
[13:09:24] and I can't explain how that's possible
[13:10:12] physical hotremove? weird
[13:10:17] yep
[14:52:56] *** ChanServ sets mode: +o peluse
[15:14:20] *** Joins: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de)
[15:18:46] *** Quits: Timo (~Timo@ipservice-092-214-202-050.092.214.pools.vodafone-ip.de) (Ping timeout: 260 seconds)
[15:44:30] bwalker: looking at where _spdk_bs_release_cluster gets called in blobstore
[15:45:10] so this waits until the blob gets persisted before releasing clusters that were truncated - can't this lead to cluster leaks that wouldn't be recovered until the blobstore was unloaded and reloaded?
[15:46:28] oh - never mind, I see now how this works
[16:05:51] any opinions on the necessity of spdk_scsi_dev_print()? I assume this is a leftover from istgt
[16:05:59] can we get rid of it, or at least turn it into INFOLOG or something?
[16:06:19] either sounds good to me
[16:06:26] I'm not sure why it prints "HDD UNIT" either
[16:34:41] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97)
[16:39:42] if you pass a callback to spdk_bdev_open for removal notification, what are you supposed to do in that callback?
[16:40:14] I would look in the code to see what things actually do, but as far as I can tell it is entirely unused
[16:40:17] close the bdev, I think
[16:40:19] yes
[16:40:22] close it
[16:40:28] do whatever clean up you need to do first, then close it
[16:40:32] I see, ok
[16:40:35] iSCSI should use it, at least
[16:40:38] or probably scsi_bdev
[16:40:46] vhost too
[16:41:51] the callback for SCSI is spdk_scsi_lun_hot_remove()
[16:42:19] which is another layer of callbacks for the SCSI LUN layer
[16:43:59] hmm, I'm not sure they actually close the bdev
[16:44:02] or at least the vhost one doesn't seem to
[16:44:11] they don't seem to do anything to me
[16:44:14] at least at first glance
[16:45:15] the vhost-scsi one uses the same code path as iSCSI via spdk_scsi_lun_hot_remove, so that may be the one to look into
[16:45:26] I'm not sure vhost-blk is actually hotplug capable
[16:46:06] vhost-blk isn't
[16:46:13] the iscsi target passes NULL for the hot remove callback
[16:47:04] vhost-scsi seems to be the one that's at least attempting to make this work
[16:47:40] I'm not actually trying to make hotplug work - I'm actually just trying to follow along with the lvol shutdown path
[16:47:58] which I think doesn't work - problems exposed by my patches that make channel destruction synchronous
[16:48:11] I think starts some asynchronous operations but doesn't provide a callback
[16:48:16] so it just drops off and never finishes
[16:48:47] if there are open lvols are shutdown, that is
[16:48:55] .s/are/at
[16:49:21] *** Parts: lhodev (~Adium@inet-hqmc03-o.oracle.com) ()
[16:50:14] lvol should definitely register a remove_cb when it opens the underlying bdev
[16:51:04] vbdev_lvs_hotremove_cb() is what it passes currently - but yeah, that looks like it calls vbdev_lvs_unload() without waiting for its completion
[16:51:46] yeah - _vbdev_lvs_remove is bugged when all_lvols_closed is not true
[16:55:14] that aside, something in the lvol shutdown path is not waiting for an I/O to complete
[16:57:37] oh, no it's not lvol. I see the problem
[16:57:42] it's the deferred completion path in bdev itself
[17:36:32] bwalker: just chatting with cunyin - he was doing the hotplug through sysfs, not physical removal of the devices - he's going to retry today with physical removal
[18:00:48] *** Joins: iotsiabah (c0f00e01@gateway/web/freenode/ip.192.240.14.1)
[18:15:01] Hello, are you gus still connected?
[18:15:49] I meant to say, are you guys still connected?
[18:38:49] yo
[18:39:22] what up iotsiabah?
[18:53:14] *** Quits: iotsiabah (c0f00e01@gateway/web/freenode/ip.192.240.14.1) (Ping timeout: 260 seconds)