[00:20:35] *** Joins: tomzawadzki (tzawadzk@nat/intel/x-wmcfzhbfyhtsptfa)
[00:52:00] *** Quits: tsuyoshi (b42b2067@gateway/web/freenode/ip.180.43.32.103) (Ping timeout: 260 seconds)
[01:00:14] *** Quits: ziyeyang_ (~ziyeyang@134.134.139.77) (Quit: Leaving)
[07:57:31] *** Quits: tomzawadzki (tzawadzk@nat/intel/x-wmcfzhbfyhtsptfa) (Ping timeout: 240 seconds)
[08:22:22] *** Joins: johnmeneghini1 (~johnmeneg@pool-96-252-112-122.bstnma.fios.verizon.net)
[08:25:45] *** Joins: jstern_ (~jstern@192.55.54.42)
[09:25:42] <bwalker> johnmeneghini: that's not a fatal error. Your build failed due to a rare bug with the rocksdb integration that pops up now and again. jimharris root caused it yesterday so we should have a permanent solution going forward
[09:25:57] <bwalker> when we see a known issue fail a build, one of us will re-run the tests
[09:26:19] <bwalker> there aren't many issues like this - we move aggressively to get rid of latent test failures
[10:00:59] <jimharris> drv: should maintainers (you, me, bwalker) should mark patches as -2 instead of -1?
[10:01:02] <jimharris> https://review.gerrithub.io/#/c/362067/
[10:01:38] <jimharris> i marked it +2 and you marked it -1 - the dashboard shows a checkmark in the CR column but I think I'd prefer to see a -2 there instead
[10:04:32] <jimharris> any thoughts on this CI failure?  it timed out during a compile?  https://ci.spdk.io/builds/review/abf1d42c27ffb76e12e43c7ba112dcd33e375f2a.1495557714/
[10:05:56] <peluse> jimharris, not sure I follow exactly what you are saying about the -2, as we had discussed this should be reserved for "this is going the wrong direction, please abandon the current approach".  For "I prefer not to merge this now" because it needs workable changes, everyone should use -1, is that what you understood?
[10:07:58] <jimharris> peluse: good point - let's stick with -1 for this case - mainly trying to simplify my gerrit workflow
[10:08:12] <drv> yeah, the only annoyance is the way it shows up in the CR column
[10:08:16] <jimharris> exactly
[10:08:29] <drv> but I'm fine with continuing to use -1
[10:09:36] <peluse> cool
[10:09:46] <peluse> hopefully we never see a -2 :)
[10:09:48] *** Quits: jstern_ (~jstern@192.55.54.42) (Quit: Leaving)
[10:10:55] <peluse> ... because folks will discuss any big or odd patch in IRC before they submit!
[10:13:16] <jimharris> https://review.gerrithub.io/#/c/362235/ - the link in the gerrithub review comes up with an empty directory
[10:14:01] <jimharris> but the associated link from the CI status page shows the expected log info
[10:14:03] *** Joins: jstern_ (jstern@nat/intel/x-hksujqfkofzzoegl)
[10:14:48] <drv> jimharris: are you looking at the internal status page maybe?
[10:15:00] <drv> the sync process to ci.spdk.io is not instant
[10:15:11] <jimharris> ah
[10:15:38] <drv> we have 9 GB of builds already - need to put our builds on a diet :)
[10:15:58] <drv> sethhowe has a patch out that will clean up review builds for reviews that have been closed/merged
[10:17:14] <drv> we were talking through ways to make sure the build results are published before we post the link, but it's not trivial since the sync takes so long
[10:17:30] <jimharris> it's fine how it is - I'll just be more patient
[10:18:02] <drv> it's a reasonable complaint - I think we can drop a "this build is not ready yet" index page in there when the build starts so that at least you aren't left with an empty page
[10:18:13] <johnmeneghini1> It appears one of the Fedora testbeds in the spdk CI system has been updated to Linux 4.10.  Do you regularly update the kernel in this system?  And if you do, do you update the RPMs as well?
[10:18:56] <drv> johnmeneghini1: we try to keep up to date as much as possible - wkb-fedora-03 should be running a fairly up-to-date Fedora 25 installation, including all RPMs
[10:20:15] <drv> wkb-fedora-07 is still on Fedora 24 but needs to be updated soon, since Fedora 24 will be out of support in a few months once Fedora 26 is released
[10:21:20] <johnmeneghini1> OK.  What about wkb-fedora-04 and wkb-fedora-07, these are running Linux-4.9.14 and Linux-4.9.7
[10:21:35] <johnmeneghini1> Is there any purpose to that?
[10:21:56] <drv> no, we just haven't gotten around to updating them yet - it's been pretty ad hoc up to this point
[10:22:08] <drv> but we should probably have a scheduled update once a month or something
[10:22:09] <johnmeneghini1> Ok thanks.
[12:14:02] *** Quits: jstern_ (jstern@nat/intel/x-hksujqfkofzzoegl) (Ping timeout: 240 seconds)
[12:14:37] <peluse> jimharris, wrt that vagrant patch I just did, I don't know what happened with dpdk. When I submitted patch 1 it was automatically added with no changes?? I did 'git rm' because I wasn't sure why it did that and uploaded that. That hasn't happened to anyone else (other comments on gerrit)
[12:18:14] <peluse> jimharris, and now I have no clue to to 'fix' this in the patch
[12:20:08] <jimharris> with stgit, I would just pop the patch, do an "stg edit -d <my_patch_name>" and remove the dpdk changes from the patch
[12:20:32] <jimharris> i'm guessing bwalker can help with the raw git equivalent
[12:20:37] <jimharris> or that drv guy
[12:20:47] <jimharris> i hear he's pretty smart too
[12:23:24] <peluse> heh, yeah I'm not using stgit but can easily get to that patch but (a) I didn't make any dpdk change and (b) my "git rm dpdk" was my attempt to do exactly what you suggest
[12:24:18] <jimharris> this is the downside to git submodules - you probably rebased from master, but then didn't update the submodule
[12:24:30] <jimharris> we pushed a change to the dpdk submodule earlier late last week
[12:25:13] <peluse> could be
[12:25:14] <jimharris> so what happened was SPDK changed the submodule commit from SHA1 A to SHA1 B
[12:25:34] <peluse> OK, yeah
[12:25:59] <jimharris> but without doing the git submodule update, the dpdk directory was still SHA1 A
[12:26:48] <jimharris> if you did a git diff at this point, it would show a difference on the dpdk submodule
[12:26:59] <jimharris> and then if you do a git commit -a, it will add that diff to your patch
[12:27:16] <peluse> yeah, when I checkout patch set 1 I get this: "CONFLICT (modify/delete): dpdk deleted in HEAD and modified in a4c3b065ba74184e594dbd7bcaf0c4256590fe7e. Version a4c3b065ba74184e594dbd7bcaf0c4256590fe7e of dpdk left in tree.
[12:27:16] <peluse> Automatic merge failed; fix conflicts and then commit the result."
[12:31:06] <jkkariu> I have a question about the new fio plugin. How do I specify the file name? I have filename='trtype=PCIe traddr=0000.06.00.0 ns=1' but I am getting error nvme.c: 576:spdk_nvme_transport_id_parse: ***ERROR*** Unknown transport ID key ''trtype'
[12:37:36] <drv> jkkariu: if you're specifying filename in the FIO configuration file, you don't need the quotes
[12:38:53] <jkkariu> I removed the quotes and got fio_plugin.c: 268:spdk_fio_setup: ***ERROR*** Invaild traddr=0000.06.00.0
[12:39:22] <jkkariu> should I use : instead of .
[12:39:28] <jkkariu> for the bdf
[12:39:34] <jimharris> try it :)
[12:39:38] <drv> no, FIO treats : as a separator, so you can't use it in the filename
[12:40:00] <drv> that message means that spdk_pci_addr_parse() failed, but your address looks OK to me
[12:40:06] <drv> oh wait, no, you have too many fields
[12:40:26] <drv> er, hmm, no, you have the right number
[12:40:40] <drv> also there's a typo in "Invaild" :)
[12:40:55] <jkkariu> yes
[12:41:29] <drv> can you step through it in gdb and see what is going wrong?
[12:41:42] <jkkariu> sure.
[12:41:49] <drv> it should match the first sscanf branch in spdk_pci_addr_parse()
[12:47:54] <drv> it seems to be working for me with this line in the fio config file: filename=trtype=PCIe traddr=0000.04.00.0 ns=1
[12:52:42] <jkkariu> This is what gdb showed me
[12:52:44] <jkkariu> [New Thread 0x7fffc3e73700 (LWP 38041)]
[12:52:44] <jkkariu> fio_plugin.c: 268:spdk_fio_setup: ***ERROR*** Invaild traddr=0000.06.00.0
[12:52:44] <jkkariu> perf_test: you need to specify size=
[12:52:44] <jkkariu> fio: pid=0, err=22/file:filesetup.c:860, func=total_file_size, error=Invalid argument
[12:53:24] <drv> can you break on spdk_pci_addr_parse() and see which error path it is hitting?
[12:54:04] <drv> your line from above without the quotes works for me (I don't have a device on bus 6, but it gets past the parsing step)
[13:09:51] <jkkariu> ok. So that is the right format.
[13:10:18] <drv> yes
[13:10:33] <drv> I can come take a look if you get stuck
[13:13:15] <jkkariu> I think we might have the old copy of the plugin from when the new features were under development and we were helping test. Let me pull down the latest code
[13:17:16] <jimharris> peluse: love the unittest.sh script
[13:19:47] <jimharris> drv: so maybe i'm missing something - but if the scsi layer gets an bdev completion, how does it know whether that completed i/o has scsi or nvme status?  if it's scsi status, then we'd want to just directly use that information when filling out the sense data for the scsi task
[13:19:54] <peluse> jimharris, I can copy-n-paste with the best of them :) It is incredibly handy though, that's for sure.
[13:19:56] <jimharris> but if its nvme status, maybe we want to do nvme => scsi error translation
[13:20:11] * peluse just remembered he has a bunch of nvme unit test updates to submit next
[13:20:15] <drv> jimharris: that is what should happen inside the bdev layer
[13:20:58] <drv> if the user asks for SCSI and the bdev_io has NVMe status, it will get translated by lib/bdev/scsi_nvme.c
[13:21:02] <drv> and vice versa
[13:21:06] <jimharris> so you are proposing moving scsi translation code from scsi_bdev down into the generic bdev layer?
[13:21:13] <drv> it's already there - I think you reviewed that move :)
[13:21:43] <drv> commit dc9e11163e349c7f1ea5cebb52b74ee1c9ccb107
[13:23:11] <jimharris> does spdk_bdev_io_get_posix_status (or get_errno_status) make sense?
[13:24:14] <drv> we could add one, if some user wants it
[13:24:28] <jimharris> bah - i reviewed that four months ago - i can't remember what i had for breakfast this morning
[13:24:50] <drv> or maybe that morning :)
[13:24:51] <peluse> i can't remember what I had for lunch...
[13:24:59] <jimharris> is it time for lunch?
[13:25:09] <peluse> hellifiknow
[13:25:37] <peluse> do you remember what we decided about LD_FLAGS and the --wrap stuff, any way to only include that in debug builds or something?
[13:25:39] <drv> I think we can add an errno getter pretty easily, and we could also add a completion function that sets the errno value
[13:25:48] <jimharris> ok - i'm good with these changes then
[13:26:18] <drv> peluse: can you expand on why you want it for debug builds only?
[13:26:29] <jimharris> oh - i had a question about that - how hard is it going to be to really make use of --wrap with calls like malloc?
[13:26:31] <drv> I think you want to always wrap the functions you need to mock, but just for the unit tests
[13:26:41] <peluse> drv, jimharris thanks for look at the vagrant tweaks too.  BTW that was all w/o VPN and worked just fine
[13:26:47] <drv> cool, glad it's working
[13:27:18] <drv> I didn't notice jimharris had already commented - guess I repeated some of what he said
[13:27:44] <drv> just saw the "minor" in the commit message and the 600 line change :)
[13:28:20] <peluse> drv, heh, yeah why isn't gerrit smart enough to call any doc update minor?? :)
[13:29:10] <jkkariu> I pulled down the latest code and the issue was resolved
[13:29:25] <jimharris> i think the blob_bdev.c code is already asking for a get_errno_status() function
[13:31:01] <drv> sounds good to me
[13:31:11] <jkkariu> thanks for you help
[13:31:21] <jimharris> bwalker: your rocksdb change has 2 +2s now
[13:32:25] <bwalker> cool
[13:49:50] <jimharris> peluse: are you sure the virtualbox extension module is working on linux?
[13:50:10] <jimharris> your debug output in the vagrant/README.md doesn't show any NVMe devices
[13:52:06] * jimharris goes to make a patch to make hello_world fail when this happens
[13:52:16] <peluse> LOL, I know I just commented that this is precisely why its good to have expected output in the readme for newbs!
[13:59:45] <peluse> hmmm, have a test timeout failure and not sure where to find more detail and/or kickstart things.  Clearly the patch in question didn't change anything that the tests are touching.  https://ci.spdk.io/builds/review/ca7a9617fb0053d391882699e1e6ddcfd03bad4a.1495571166/
[14:01:34] <jimharris> something is fishy with wkb-fedora-03
[14:02:02] <jimharris> it's been acting up all day - i've seen several patches today fail - it looks like fedora-03 basically hung during compilation
[14:02:43] <peluse> do we have a way for anyone to restart a test wo submitting a new patch?
[14:03:08] <drv> only the maintainers can do that right now
[14:03:13] <drv> by removing the Verified label
[14:03:34] <drv> jimharris: sethhowe is looking into it
[14:03:56] <peluse> OK, sounds like a good thing for anyone to be able to do. The downside is abuse though but we're small enough we can probably manage that
[14:04:24] <drv> I'm not 100% sure if you can allow removing a Verified label without also allowing posting of a Verified label
[14:04:49] <peluse> openstack, as an example, had a few ways of doing this, by adding a comment with a specific keyword and bug # (if you knew the failure was for something unrelated and that was already reported)
[14:05:23] <peluse> or by using the keyword with "no bug" which was of course discouraged since it was kind of a "lazy button" but it was there for stuff like this
[14:06:38] <drv> yeah, we definitely need a better story for what to do when a build fails from some intermittent issue
[14:42:06] <jimharris> should we shut off the test pool until fedora-03 gets resolved?
[15:06:42] <bwalker> jimharris: I went a different direction on the patch that sends a function to each thread
[15:06:59] <bwalker> I wanted the function to be passed the io_device and the channel instead of just the user context
[15:07:11] <bwalker> and then I also wanted to pass a message back to track completions
[15:07:17] <bwalker> series starts here: https://review.gerrithub.io/#/c/362255/
[15:13:22] <jimharris> cool - i'll take a look
[15:13:38] <jimharris> argh - calsoft doesn't like my i/o counting patch
[15:17:33] <bwalker> I need your patch that adds a channel to reset
[15:17:41] <bwalker> I had the same patch, just hadn't submitted it yet
[15:17:50] <jimharris> i removed the -1 and it's back in the queue
[15:18:06] <bwalker> that, combined with the message passing stuff I just did
[15:18:10] <bwalker> let's me implement reset the right way
[15:18:27] <jimharris> and then i can put in my nvme nomem handling
[15:19:45] <jimharris> well crap - calsoft tests won't fail on my system
[15:35:07] <jimharris> do we need to run the unittests under valgrind on all of the test systems?
[15:37:22] <jimharris> we could cut 30 seconds from the test time for each patch by disabling valgrind on the unit tests on fedora-07
[15:38:19] <drv> we can make that a config option - we don't need it on all the machines
[15:38:27] <drv> actually, we could just make only one machine run the unit tests at all
[15:38:59] <jimharris> where does the agent pass the name of the file with the SPDK_TEST_* parameters?
[15:41:29] <drv> autorun.sh
[16:06:09] <drv> I think we may have diagnosed the cause of the wkb-fedora-03 random reboots (the NIC was not seated properly in the slot)
[16:43:55] *** Joins: tsuyoshi (b42b2067@gateway/web/freenode/ip.180.43.32.103)
[17:06:58] *** Joins: cunyinch_ (~cunyinch@134.134.139.82)
[20:12:43] <peluse> oh geeze
[20:12:58] <peluse> where are the HW guys when you need them :)
[21:45:04] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.72)
[21:46:32] *** Quits: ziyeyang_ (~ziyeyang@134.134.139.72) (Client Quit)
[21:47:04] *** Joins: ziyeyang_ (~ziyeyang@134.134.139.72)