[09:36:13] *** Joins: bwalker (~bwalker@134.134.139.72)
[09:36:14] *** ChanServ sets mode: +o bwalker
[09:36:14] *** Server sets mode: +cnrt
[09:36:14] *** Server sets mode: +cnrt
[10:34:58] *** Joins: travis-ci (~travis-ci@ec2-54-167-147-211.compute-1.amazonaws.com)
[10:34:59] (spdk/master) vbdev_lvol: set optimal_io_boundary to cluster size (Piotr Pelplinski)
[10:34:59] Diff URL: https://github.com/spdk/spdk/compare/2c5b956ee5a7...36cc61388cf9
[10:34:59] *** Parts: travis-ci (~travis-ci@ec2-54-167-147-211.compute-1.amazonaws.com) ()
[10:35:38] *** Joins: travis-ci (~travis-ci@ec2-174-129-149-194.compute-1.amazonaws.com)
[10:35:39] (spdk/master) test/spdkcli: Add load and save config commands. (Pawel Kaminski)
[10:35:39] Diff URL: https://github.com/spdk/spdk/compare/36cc61388cf9...09a9130ed249
[10:35:39] *** Parts: travis-ci (~travis-ci@ec2-174-129-149-194.compute-1.amazonaws.com) ()
[10:46:02] *** Joins: travis-ci (~travis-ci@ec2-54-205-69-222.compute-1.amazonaws.com)
[10:46:03] (spdk/master) lib/copy: unregister copy engine on finish (shahar salzman)
[10:46:03] Diff URL: https://github.com/spdk/spdk/compare/09a9130ed249...ac5aa2082109
[10:46:03] *** Parts: travis-ci (~travis-ci@ec2-54-205-69-222.compute-1.amazonaws.com) ()
[11:01:40] *** Joins: travis-ci (~travis-ci@ec2-54-205-69-222.compute-1.amazonaws.com)
[11:01:41] (spdk/master) sock: set the fd with non_block flag. (Ziye Yang)
[11:01:41] Diff URL: https://github.com/spdk/spdk/compare/ac5aa2082109...4730cd315840
[11:01:41] *** Parts: travis-ci (~travis-ci@ec2-54-205-69-222.compute-1.amazonaws.com) ()
[11:46:05] jimharris, U there?
[11:46:31] yup
[11:47:05] jimharris: Just noticed that you uploaded a patch (#29) to Pawel's 426364. Was unaware that could be done. Reading now about that. Is that enabled for "all" in SPDK, or just maintainers? If the former, what's the protocol/professional-courtesy to do that?
[11:47:19] I assume we're gonna support ISAL compression as well as DPDK - assuming so we need an ISA-L fork and submodule. Can I set that up or do you need to? I'm ready for it now wrt a patch I'm finishing to build with compressdev
[11:47:50] yay, I know :)
[11:50:34] are ISA-L packages available in any of our supported Linux distributions?
[11:50:50] or I guess maybe we don't *need* a fork, but we need a submodule because DPDK builds from source
[11:51:09] well, let me try another way first
[11:51:10] ah, ok
[11:51:42] never mind, it needs source just like crypto needed ipsec source
[11:51:54] ok
[11:52:07] one sec and i'll create the spdk github fork
[11:52:15] should we use a fork just in case though? Don't see a need right now to change anything but...
[11:52:24] thx
[11:53:49] https://github.com/spdk/isa-l
[11:54:07] i'll let you push the commit that adds the submodule
[11:55:32] yup, was planning on putting it in with the relevant makefile changes here shortly. I already have a patch in our dpdk fork to uncomment the options for compressdev and isal that will need to get in there before this one will work too, thanks
[11:57:10] jimharris, are you going to make an spdk branch on the isa-l fork like with ipsec or just use master?
[11:58:45] until we find that we need some SPDK-specific patch let's just use master
[12:16:08] coolio
[12:16:30] and if we do I think we can track down that ISA-L guy somewhere :)
[12:32:56] sethhowe: could you take a look at https://review.gerrithub.io/#/c/spdk/spdk/+/429285/?
[12:40:36] *** Joins: travis-ci (~travis-ci@ec2-54-204-148-97.compute-1.amazonaws.com)
[12:40:37] (spdk/master) pkgdep: Don't built intel-ipsec-mb as root (Ben Walker)
[12:40:37] Diff URL: https://github.com/spdk/spdk/compare/a2fdc4dd73ed...2c1aaa760433
[12:40:37] *** Parts: travis-ci (~travis-ci@ec2-54-204-148-97.compute-1.amazonaws.com) ()
[12:57:01] jimharris: done.
[13:10:44] jimharris, please take a look at https://review.gerrithub.io/c/spdk/spdk/+/429519 and make sure I'm not crazy (it works)
[13:13:40] why isn't this needed anymore? does DPDK build the source files directly?
[13:16:08] peluse: ^
[13:17:04] oh - I get it, we tell people to build intel-ipsec-mb explicitly
[13:17:15] we do it for them in pkgdep.sh
[13:17:44] and we tell them to do it when you run configure --with-crypto
[13:18:14] I guess DPDK finds the header file because it's in the path somewhere, I'm not sure. But yes to your question on configure. There are messages in configure and pkgdep to make sure it gets installed
[13:18:25] and I'm thinking we should do the same for isa-l
[13:20:40] just one small nit on the patch but otherwise looks good
[13:21:01] OK, cool
[13:21:23] LOL, I was wondering about that semicolon!!
[13:21:24] does dpdk compressdev rely on isa-l already being built and installed?
[13:21:57] it looks almost like there are 2 options if you read the docs but I believe so https://doc.dpdk.org/guides-18.08/compressdevs/isal.html
[13:27:23] *** Joins: KipIngram (~kipingram@185.149.90.58)
[13:27:41] Hey, does spdk write a log anywhere while operating?
[13:28:33] I'm having a problem occur during a job, and I'm trying to see if there's anywhere I can go to see what it might have told me it thought was wrong.
[13:43:41] lhodev: Only the maintainers have permission to do that. We do it very sparingly and only on minor or inconsequential changes in the interest of speeding along a patch
[13:43:54] Thanks bwalker
[13:45:07] KipIngram: if you build in debug mode (./configure --enable-debug) you can usually pass the -L parameter to the SPDK applications to enable logging for various modules
[13:45:12] sorry lhodev - totally missed your question!
[13:45:22] the available options will be listed if you do -h on an SPDK application
[13:45:37] @bwalker: Thanks!
[13:45:57] we also have more high performance tracing stuff built in, but it's much more limited
[13:45:58] bwalker: are you sure only maintainers can push revisions for other people's patches? I thought it was anyone in the "SPDK Developers" group
[13:46:07] I'll take a quick look
[13:46:25] i'm trying to bring up the project options now but gerrithub is being sllllloooooowwwwww for me
[13:46:30] yeah me too
[13:47:05] i'm pretty sure it's more than just maintainers - i remember specifically adding john m and ed r to that list since they were pushing patches on behalf of others at netapp
[13:47:28] but regardless - your point still stands that it's not something normally done except for specific circumstances
[13:48:26] slow for me too
[13:48:26] hmm, if the permission is called "Add Patch Set", then both SPDK Maintainers and SPDK Developers have it
[13:48:42] i.e. lhodev has permission to do that
[13:49:34] got a bunch of errands to run before heading out to the meetup tomorrow - see you all there!!!
[13:52:39] bwalker: jimharris: Duly noted. Would be nice if it were configurable per change. I'd have enjoyed the opportunity to have offered patches to Pawel's spdk.spec instead of only performing tests of his and one-line suggestions in the review panel. But, certainly it's not difficult to imagine scenarios where that can be chaotic or annoying to a change-owner.
[13:53:53] you can always do something like create another patch based on some version of Pawel's patch (in this case)
[13:55:16] I mostly just use that power to fix typos and grammar issues before merging someone's patch
[13:57:06] In the case of a WIP item and collaborating with someone many timezones away, I think that -- i.e. create patch based on some version of another person's patch -- is a good model.
[14:00:53] bwalker: https://review.gerrithub.io/#/c/spdk/spdk/+/429517/ - fixes nightly test failure on iscsi/ext4test
[14:01:32] lgtm
[14:29:40] What is the significance of spdk_bdev vs.
just spdk in the fio ioengine specification? When I first started using spdk with fio, I read instructions that just said "ioengine=spdk." Today I'm seeing some "ioengine=spdk_bdev", and an extra line in the config file.
[14:30:13] I first installed all this under CentOS 7.4 early this year, and it just worked beautifully. I'm trying to bring up a CentOS 7.5 install currently, and it's misbehaving.
[14:32:43] we have two fio plugins
[14:32:47] one uses the nvme driver directly
[14:32:53] and one routes I/O through SPDK's block stack
[14:32:57] I think the nvme one is named 'spdk'
[14:33:02] and the block stack one is 'spdk_bdev'
[14:34:30] I see. So either should work.
[14:39:18] Ah! I refreshed my fio and spdk repos and re-built; now I'm getting a message I didn't before. Something I can actually research. :-)
[14:51:42] What is ASYNC LIMIT EXCEEDED telling me?
[14:52:10] I'm getting a flood of such messages.
[15:04:09] *** Joins: travis-ci (~travis-ci@ec2-54-166-66-206.compute-1.amazonaws.com)
[15:04:10] (spdk/master) bdev/raid: raid_bdev_remove_base_bdev: cleanup not registered raid bdev (wuzhouhui)
[15:04:10] Diff URL: https://github.com/spdk/spdk/compare/2c1aaa760433...e8b0ae0393f0
[15:04:10] *** Parts: travis-ci (~travis-ci@ec2-54-166-66-206.compute-1.amazonaws.com) ()
[15:17:19] hi KipIngram - can you provide more details on the test you are running that is hitting this message?
[15:20:52] Jim, yes; thanks. My fio config file is here:
[15:20:54] https://pastebin.com/GTEVqNJq
[15:21:28] I'm testing a storage unit we're developing that presents a pair of 2-lane PCIe connections to a common backing store controller.
[15:22:15] I've tested this thing quite heavily over the previous part of the year, but I've recently needed to update my OS (CentOS 7.4 to 7.5) and I just installed fio and spdk from scratch on the new system, so I got current code points.
[15:22:29] The previous configuration just worked smooth as silk - never gave me any grief whatsoever.
[15:24:03] I'm currently experimenting with checking out earlier versions of fio and spdk and building, to see if that makes a difference.
[15:24:10] Trying to get around the time I did the first install.
[15:24:13] *** Joins: travis-ci (~travis-ci@ec2-54-204-148-97.compute-1.amazonaws.com)
[15:24:14] (spdk/master) test/iscsi: exclude *.o from ext4test.sh rsync (Jim Harris)
[15:24:15] Diff URL: https://github.com/spdk/spdk/compare/e8b0ae0393f0...d40803609a78
[15:24:15] *** Parts: travis-ci (~travis-ci@ec2-54-204-148-97.compute-1.amazonaws.com) ()
[15:25:15] does your backing store controller support asynchronous event requests?
[15:26:17] I'm not familiar enough with all this to know. I never saw this message before this post-upgrade work. In fact, I didn't see that message until I refreshed my repos and re-built earlier today.
[15:26:34] Things seem to run alright for a minute or two, and then not.
[15:26:40] I see correct performance, etc.
[15:26:52] I have people I can ask that question, but they're likely gone for the day.
[15:28:32] "run alright for a minute or two" -> does this coincide with seeing the async limit exceeded messages?
[15:29:59] Oh, well, good point. That comment really applies better to "pre-seeing-those-messages." Earlier, before I got a build that showed me those messages, I would see correct performance in the fio output line, and then a lot of the information from that line would disappear. But the messages actually start showing up almost immediately, before the output line changes. So I guess things aren't good right from
[15:30:01] the start.
[15:31:46] Once that output line changes, I can no longer use ctrl-c to interrupt the run - I have to ps ax | kill -9.
[15:32:32] when the controller is first initialized, the spdk driver will send an AER (Asynchronous Event Request) message to the controller
[15:32:54] controllers must support at least one of these
[15:32:59] according to spec
[15:33:44] apparently your controller is responding to the AER, with the ASYNC LIMIT EXCEEDED message, which could mean one of two things:
[15:34:23] I see two of the key guys online; I've pushed that message to them.
[15:34:32] 1) the controller doesn't really support AER at all, so it returns this error the first time it sees an AER (and then the host driver will retry it over and over)
[15:35:13] 2) an actual asynchronous event is happening, and the controller is completing the AER - is it possible that the controller is not ready to immediately receive another AER after completing the first one?
[15:35:44] I've just been told that we do support AER, but it's never been well-tested because the higher-level product we intend this thing for doesn't use them.
[15:35:46] the driver probably should be checking for the ASYNC LIMIT EXCEEDED message
[15:36:01] However, I've tested previous software versions of this thing very extensively over the last eight months or so, using fio/spdk.
[15:36:09] That doesn't mean it hasn't been broken though.
[15:36:21] I can try patching my unit down to an earlier version that I've tested before.
[15:37:39] that would be helpful - i'm also going to push a patch that will print an error message but not send another AER if one completes with ASYNC LIMIT EXCEEDED
[15:38:06] if you want to hold off 5 minutes before downgrading your unit I'll have a patch for you
[15:38:59] Oh, wow. That would be great - thank you.
[15:40:09] Meanwhile, I've also been having some issues using just fio and NOT spdk - I'm not 100% sure yet, but the earlier fio version is showing signs of helping that.
[15:40:27] oop- I take it back.
[15:40:29] Still seeing that.
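Editor's note: the driver-side behavior proposed at 15:37:39 (log an error and stop re-sending the AER once one completes with ASYNC LIMIT EXCEEDED) can be sketched roughly as below. This is a minimal, self-contained illustration and not the actual SPDK patch: the struct and function names are hypothetical, and the numeric values assume the NVMe specification's encoding, where status code type 1h is "command specific" and status code 05h is "Asynchronous Event Request Limit Exceeded".

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed encoding from the NVMe specification (not taken from SPDK headers). */
enum { SCT_GENERIC = 0x0, SCT_COMMAND_SPECIFIC = 0x1 };
enum { SC_SUCCESS = 0x00, SC_AER_LIMIT_EXCEEDED = 0x05 };

/* Hypothetical, simplified stand-in for the status field of a completion. */
struct aer_completion {
	uint8_t sct; /* status code type */
	uint8_t sc;  /* status code */
};

/*
 * Decide whether the host should submit another AER after this one completed.
 * Returns false only for the command-specific "limit exceeded" status, so a
 * controller that keeps rejecting AERs is not retried in an endless loop.
 */
bool
should_resubmit_aer(const struct aer_completion *cpl)
{
	if (cpl->sct == SCT_COMMAND_SPECIFIC && cpl->sc == SC_AER_LIMIT_EXCEEDED) {
		fprintf(stderr, "AER completed with ASYNC LIMIT EXCEEDED; "
			"not submitting another AER\n");
		return false;
	}
	return true;
}
```

A caller would invoke should_resubmit_aer() from its AER completion callback and re-arm the request only when it returns true. Checking the status code type as well as the status code matters in this sketch because 05h means something different under the generic status type.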
[15:40:47] I'm in one of those nasty situations where I've changed more stuff at once than I should have. :-(
[15:47:50] KipIngram: https://review.gerrithub.io/#/c/spdk/spdk/+/429533/
[15:48:00] please report any results on this patch
[15:48:23] even if this 'fixes' the problem, i'd still recommend bottoming out on why the AERs are failing with your controller
[15:50:37] Applying now.
[15:54:18] This should apply to a fresh clone? It's telling me cpk is undeclared.
[15:54:35] it's based on the latest master
[15:54:42] but he has a typo
[15:54:46] change cpk to cpl
[15:54:53] Ah. Ok.
[15:55:45] I just fixed it - didn't actually pull it down and test it
[15:57:06] Ok, that built. Checking it out now.
[15:57:56] argh
[15:57:58] sorry about that
[15:59:14] Ok, so now I'm getting your out-of-spec message, and no message flood - just a few at the outset. Still got the borking of the fio output line, though, and plus performance was just cratered.
[15:59:34] So how likely is it that something in CentOS 7.5 has changed enough to throw all this off?
[16:00:44] I'm going to pull fio back up to current and down-patch my target.
[16:00:57] what version of fio is it
[16:01:27] Um, right now I have 3.10.52-g350a, but I haven't reverted that back to latest yet.
[16:01:47] you could also try nvme/perf
[16:02:03] examples/nvme/perf/perf
[16:02:20] Ok - cool. I'll work that into tonight's work too.
[16:02:21] it's an fio-like utility specific to SPDK
[16:02:30] Ah, neat.
[16:02:45] could help you narrow down whether it's an fio issue
[16:02:51] I found fio first, and had been using it very successfully with our Infiniband based product.
[16:03:19] Then I started testing this guy and wasn't getting the greatest results - it ran, but it seemed like there was a lot of "os noise."
[16:03:32] A colleague in Rochester recommended spdk, and that was just the cat's meow.
[16:03:51] Not to mention the product we're integrating into is polling based, so we felt like we had a better match to that as well.
[16:04:18] This is intended to live in a box running IBM's SVC software stack.
[16:05:16] Ok, I'm going to go crawl around in this stuff for a while. Guys, I'm blown away by how responsive you were - I wasn't expecting anything like that much aid. Thank you so much.
[16:05:57] I'll let you know tomorrow if anything significant has (or hasn't) happened. Is it ok if I just leave myself in here for the time being?
[16:06:08] I've got all my IRC stuff set up so it's "always on."
[16:06:24] yeah idling in IRC is a way of life
[16:06:41] :-) True true. Ok, all of you have a nice evening. I'll catch you later.
[17:52:22] Well, it's hard to be "as certain" about this with perf, since I don't get to see performance along the way. But I'm pretty sure I'm not getting the problem with perf. The characteristic behavior before was that it would run fast for some short length of time (generally under a minute), and then drop, very noticeably. I'm running perf for varying lengths of time, from 30 seconds up to 30 minutes, once. The
[17:52:24] results are all clustered together, with the longer runs slightly higher (I assume there's a startup that has to get amortized).
[17:52:39] So looks like this part of things is working properly (aside from the AER issue).
[17:52:58] Product's old software still shows the issue with fio/spdk.
[17:53:21] So I'm thinking I've got decent confidence that it's fio doing this to me.
[17:54:35] Is perf limited to the features visible in that little section of examples? Does it have any ability to log performance, or let me pin to different CPUs, and so on?