[00:12:05] *** Quits: lhodev (~lhodev@66-90-218-190.dyn.grandenetworks.net) (Ping timeout: 256 seconds)
[00:12:13] *** Joins: lhodev_ (~lhodev@66-90-218-190.dyn.grandenetworks.net)
[00:52:52] *** Joins: tkulasek_ (~tkulasek@134.134.139.75)
[00:52:53] *** Joins: tkulasek (~tkulasek@134.134.139.75)
[02:18:19] *** Quits: tkulasek (~tkulasek@134.134.139.75) (Quit: Leaving)
[02:18:37] *** Quits: tkulasek_ (~tkulasek@134.134.139.75) (Quit: Leaving)
[02:18:48] *** Joins: tkulasek_ (~tkulasek@192.55.54.44)
[02:22:46] *** Joins: dlw (~Thunderbi@114.255.44.143)
[03:44:51] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 240 seconds)
[04:19:19] *** Joins: johnmeneghini (~johnmeneg@216.240.30.5)
[06:40:38] *** Quits: drv (daniel@oak.drv.nu) (Ping timeout: 260 seconds)
[06:45:54] *** Joins: drv (daniel@oak.drv.nu)
[06:45:54] *** ChanServ sets mode: +o drv
[06:49:16] *** Quits: drv (daniel@oak.drv.nu) (Client Quit)
[06:50:51] *** Joins: drv (daniel@oak.drv.nu)
[06:50:51] *** ChanServ sets mode: +o drv
[08:53:55] <drv> johnmeneghini: sorry for the delay - it's running now
[11:14:26] *** Joins: travis-ci (~travis-ci@ec2-54-158-138-97.compute-1.amazonaws.com)
[11:14:27] <travis-ci> (spdk/master) jsonrpc: fix closed connection hadling (Pawel Wodkowski)
[11:14:27] <travis-ci> Diff URL: https://github.com/spdk/spdk/compare/967339f3e533...01a9118d0c29
[11:14:27] *** Parts: travis-ci (~travis-ci@ec2-54-158-138-97.compute-1.amazonaws.com) ()
[12:06:06] <stefanha> jimharris drv: Are you submitting talks for KVM Forum on Oct 24-26 in Edinburgh, Scotland?  https://events.linuxfoundation.org/events/kvm-forum-2018/program/cfp/
[12:06:54] <stefanha> I looked at the SPDK Summit slides.  It would be great to see SPDK material at KVM Forum and learn more about the use cases you've been targetting.
[12:07:58] <jimharris> bwalker: could you take a look at https://review.gerrithub.io/#/c/spdk/spdk/+/412695/ again?
[12:08:28] <jimharris> stefanha: we don't have anything submitted yet
[12:09:01] <stefanha> The CFP deadline is June 14
[12:09:02] <jimharris> not sure if drv or i can get travel budget approved, but would be great to get someone from our Intel team in poland to attend
[12:10:45] *** Quits: tkulasek_ (~tkulasek@192.55.54.44) (Ping timeout: 260 seconds)
[12:14:03] <stefanha> jimharris: That would be cool.  Talks can be either for users (overview of how to configure/deploy SPDK) or internals (vhost-user, I/O architecture, etc)
[12:21:48] *** Joins: JoeGruher (c037362d@gateway/web/freenode/ip.192.55.54.45)
[12:22:19] <JoeGruher> When using the SPDK NVMe FIO plugin to access a namespace on an NVMeoF target, this is the example given: filename=trtype=RDMA adrfam=IPv4 traddr=192.168.100.8 trsvcid=4420 ns=1
[12:22:47] <JoeGruher> but it doesn't seem to allow me to specify which subsystem on the target to access?  shouldn't we need an NQN or something?
[12:35:08] *** Joins: travis-ci (~travis-ci@ec2-54-157-238-183.compute-1.amazonaws.com)
[12:35:09] <travis-ci> (spdk/master) test/vhost: move negative tests to separate file (Karol Latecki)
[12:35:09] <travis-ci> Diff URL: https://github.com/spdk/spdk/compare/4404da7ceaec...0af5182e1994
[12:35:09] *** Parts: travis-ci (~travis-ci@ec2-54-157-238-183.compute-1.amazonaws.com) ()
[12:36:19] *** Joins: travis-ci (~travis-ci@ec2-54-158-138-97.compute-1.amazonaws.com)
[12:36:20] <travis-ci> (spdk/master) blobstore: freeze I/O during resize (Piotr Pelplinski)
[12:36:21] <travis-ci> Diff URL: https://github.com/spdk/spdk/compare/0af5182e1994...69fa57cdf079
[12:36:21] *** Parts: travis-ci (~travis-ci@ec2-54-158-138-97.compute-1.amazonaws.com) ()
[12:39:53] <drv> JoeGruher: you can specify the subsystem NQN with the subnqn= key in the filename
[12:39:58] *** Quits: darsto (~darsto@89-68-12-72.dynamic.chello.pl) (Ping timeout: 260 seconds)
[12:40:15] <drv> if you don't specify a subnqn, it will try to connect to a discovery service at that address
[12:41:46] *** Quits: gila (~gila@5ED74129.cm-7-8b.dynamic.ziggo.nl) (Ping timeout: 264 seconds)
[12:42:39] <JoeGruher> got it, thx
[12:42:57] <JoeGruher> i realize now my FIO plugin build failed on these systems for some reason
[12:42:58] *** Joins: gila (~gila@static.214.50.9.5.clients.your-server.de)
[12:43:11] <JoeGruher> so i'll have to figure that out first
[12:44:34] *** Joins: darsto (~darsto@89-68-12-72.dynamic.chello.pl)
[12:59:51] <drv> jimharris: I posted another comment on https://review.gerrithub.io/#/c/spdk/spdk/+/412695/ - want to get your input before I send Vishal off on a wild goose chase :)
[13:01:41] <jimharris> i was thinking we don't worry about precise stats
[13:09:16] *** Quits: pohly (~pohly@p54BD5098.dip0.t-ipconnect.de) (Quit: Leaving.)
[13:19:17] <bwalker> jimharris: took_action determines whether the poller goes idle or not - shouldn't that be the same metric we use to track idle/active time?
[13:20:36] <bwalker> I'm also not sure why we need all these extra now = spdk_get_ticks calls either
[13:23:09] <JoeGruher> does the SPDK initiator work with the Linux kernel target?
[13:23:12] <bwalker> I think we only need to capture the tsc when the state flips from active to idle and vis versa
[13:23:19] <bwalker> yes it does
[13:23:25] <JoeGruher> I can run IO for about 10 seconds and then it drops to zero and I get this printed on the target system: [  323.024748] nvmet: ctrl 1 keep-alive timer (10 seconds) expired! [  323.036784] nvmet: ctrl 1 fatal error occurred!
[13:23:39] <bwalker> hmm
[13:23:44] <bwalker> what version of the kernel?
[13:23:55] <JoeGruher> 4.16.14
[13:24:04] <drv> is this with the nvme fio_plugin?
[13:24:14] <JoeGruher> yes, fio plugin on the initator system
[13:24:42] <drv> at a glance, it looks like we never call spdk_nvme_ctrlr_process_admin_completions() from the nvme fio_plugin, which is what sends the keep-alives
[13:24:49] <drv> not sure how nobody noticed this before
[13:25:02] <bwalker> yeah - that's just a bug
[13:25:18] <bwalker> must only be using fio on local disks everywhere
[13:25:39] <JoeGruher> will this also cause a failure with SPDK target, or only kernel target?
[13:25:47] <drv> bdev_nvme does poll for admin completions, so if you are willing to switch to the bdev fio_plugin, that should work
[13:25:58] <drv> the SPDK target doesn't currently enforce keep alives
[13:26:16] <drv> (which is also a bug that should eventually get fixed)
[13:26:23] <bwalker> I don't actually think the kernel target used to enforce either
[13:26:27] <bwalker> I think that's more recent
[13:27:25] <JoeGruher> OK - so I can try nvme plugin with SPDK target and bdev plugin with kernel target
[13:27:36] <bwalker> or bdev with both
[13:28:08] <JoeGruher> for the bdev plugin, looks like I need to attach the target to create a bdev on the initiator system, then I run FIO against the bdev - how do I create the bdev, is there a rpc.py command to do it?
[13:29:00] <bwalker> I believe you pass the fio plugin a configuration file that defines the bdevs
[13:29:28] <bwalker> the same kind of configuration format that you pass to the nvme-of target
[13:30:02] <bwalker> see examples/bdev/fio_plugin/README.md
[13:30:08] <JoeGruher> I see, spdk_conf=./examples/bdev/fio_plugin/bdev.conf
[13:30:13] <bwalker> yep
[13:30:33] <JoeGruher> but that bdev.conf example only shows a malloc device
[13:30:44] <JoeGruher> what contents should bdev.conf have for an nvmeof target device
[13:31:07] <bwalker> [Nvme]
[13:31:07] <bwalker>   TransportID "trtype:PCIe traddr:0000:00:00.0" Nvme0
[13:31:25] <JoeGruher> no - trtype rdma, right
[13:31:32] <bwalker> except trtype:RDMA traddr:<ip> trsvcid:<port> subnqn:<nqn>
[13:31:33] <JoeGruher> what would that look like
[13:31:37] <JoeGruher> ah ok
[13:31:43] <jimharris> bwalker: i don't think so - the "idle" stuff in spdk_reactor_run today is all around whether there were any events/pollers/timers that executed
[13:32:20] <jimharris> but in vishal's code, it's interpreting the poller return code to see if it did any "real" work
[13:32:34] <jimharris> it's conflating two different concepts of idle
[13:32:46] <jimharris> which probably needs to be fixed - i'm just saying we should do that separately from vishal's patch
[13:33:02] <bwalker> why? Isn't the point of vishal's patch to change the meaning of idle?
[13:33:35] <jimharris> the point of vishal's patch is to get an idea of how much a reactor core is actually being utilized in terms of cpu cycles
[13:33:43] <jimharris> top always says 100% since its polling
[13:33:52] <bwalker> his patch adds a lot of spdk_get_ticks() overhead that could be eliminated if it was tied to the idle mechanism
[13:34:07] <bwalker> how much does his patch hurt performance in benchmarks?
[13:34:15] <jimharris> i've done measurements and the extra spdk_get_ticks overhead is very minimal
[13:35:38] <bwalker> it's effectively one extra spdk_get_ticks call on every iteration through the loop
[13:35:48] <jimharris> correct
[13:36:18] <jimharris> or more precisely, we call spdk_get_ticks every time through the loop now instead of once every five times through the loop
[13:37:35] <JoeGruher> FIO doesn't seem to like the spdk_conf parameter... that goes right into the FIO .ini file?
[13:37:36] <JoeGruher> Bad option <spdk_conf=./bdev.conf.in>
[13:40:48] <bwalker> didn't we put in the count to 5 without doing timed-pollers in because of a performance benchmark?
[13:40:57] <bwalker> so if we take that out - what changed?
[13:41:02] <bwalker> that's changing our mind
[13:41:56] <jimharris> i think previously there was a suspicion that extra get_ticks calls would hurt performance
[13:42:10] <bwalker> I thought we measured it with your event perf tool
[13:42:33] <jimharris> i can re-run the data again with bdevperf
[13:43:02] <jimharris> it's a bit noticeable with something like event perf which is really the worst case scenario
[13:44:08] <bwalker> vishal's first patch changes it from a call to spdk_get_ticks() every 5th to a call on every iteration (if there are timed pollers registered). His second patch, the one I commented on, changes it to every iteration if there are any type of poller
[13:44:33] <JoeGruher> ah ok if I do the ld_preload and change the ioengine to spdk_bdev then FIO doesn't complain about the parameter
[13:46:10] <JoeGruher> but i get this failure: nvme_rdma.c: 803:nvme_rdma_qpair_connect: *ERROR*: Unhandled ADRFAM 0 nvme_rdma.c:1393:nvme_rdma_ctrlr_construct: *ERROR*: failed to create admin qpair
[13:46:59] <bwalker> trtype:RDMA traddr:<ip> trsvcid:<port> subnqn:<nqn> adrfam:IPv4
[13:48:23] <JoeGruher> great that works
[13:48:31] <JoeGruher> except I get the same failure as with the nvme plugin: [ 1917.393609] nvmet: ctrl 1 keep-alive timer (10 seconds) expired! [ 1917.405660] nvmet: ctrl 1 fatal error occurred!
[13:48:51] <JoeGruher> ran great for the first ten seconds tho
[13:49:16] <bwalker> hmm
[13:50:26] <drv> hm, the bdev_nvme adminq poller is supposed to run every second by default (unless you've changed AdminPollRate in the [Nvme] section), so that should be fine
[13:50:40] <drv> unless we're getting the wrong keep-alive timeout value back from the target somehow
[13:50:46] <bwalker> jimharris: the return code of the poller functions is used to decide whether to increment active/idle/unknown time
[13:50:54] <jimharris> yes
[13:51:01] <bwalker> but I just looked through the code base and some pollers return -1 for active, some return 0 for active, and some return > 0 for active
[13:51:09] <bwalker> so the stats captured in that patch are meaningless
[13:51:30] <bwalker> you have to make all pollers return active/idle before you can gather the stats
[13:52:11] <jimharris> i suggested the vishal work on the stat collection; we also need to go through all of the pollers to have them return the correct values
[13:52:19] <bwalker> and if the pollers are correctly returning active/idle, you may as well improve the took_action flag while you're there
[13:52:34] <jimharris> drv originally had all of them return -1 to mean "unknown" - but it's possible more recent pollers aren't doing the right thing
[13:53:08] <jimharris> really?  if an nvme_of poll group is idle for one second, we don't want the reactor to go to sleep - what will wake it up?
[13:54:07] <bwalker> it will only sleep for a configurable amount of time
[13:54:13] <bwalker> your maximum acceptable latency
[13:54:20] <bwalker> if you set that to 0, it won't ever sleep
[13:54:58] <bwalker> it could be that new pollers are returning 0 for "success"
[13:55:03] <bwalker> and that's what I'm seeing
[14:00:53] <jimharris> currently, as long as there is one poller on the reactor, it will never sleep, even if you set the max_delay_us
[14:01:40] <jimharris> it just keeps track of how long it's been since either an event executed or a poller existed
[14:02:30] <jimharris> but if we change that determination based on vishal's stats - now you could have cases where the target is just idle for a while, not doing any I/Os, and the reactor would go to sleep
[14:03:21] <jimharris> personally, I think we should just remove the max_delay_us stuff for now - maybe in the future when we get the dynamic threading stuff implemented, we can put a reactor to sleep when it's not running any threads
[14:08:00] <JoeGruher> in fio I can't specify an NQN like this in the filename parameter because the ':' breaks it, right?  subnqn=nqn.2018-05.io.spdk:nqn01
[14:08:44] <JoeGruher> is there a workaround?
[14:10:00] <jimharris> drv: i responded to https://review.gerrithub.io/#/c/spdk/spdk/+/414472/
[14:15:31] <drv> oh, I see
[14:15:34] <drv> I'm OK either way
[14:18:41] *** Quits: johnmeneghini (~johnmeneg@216.240.30.5) (Quit: Leaving.)
[14:20:44] <drv> JoeGruher: right, I don't think there is a workaround for that currently (unless there's some escaping mechanism in FIO that we didn't find yet)
[14:21:18] <JoeGruher> can i create nqns without a ':' on the spdk target?  i get an error if i just change it to a '.' the target seems to enforce some NQN rules?
[14:25:18] <JoeGruher> which doesn't really seem necessary, why does the target care if i want to use an out of spec nqn format
[14:25:56] <JoeGruher>   "message": "Invalid parameters",   "code": -32602
[14:28:13] <drv> no, we check that it's a valid NQN, so it won't allow you to make one without a : in it
[14:28:22] <drv> but you could patch that out for testing
[14:28:45] <drv> the check is the spdk_nvmf_valid_nqn call in spdk_nvmf_subsystem_create
[14:29:01] <JoeGruher> so how do you use the fio nvme plugin for nvmeof, if fio can't handle a ':', and the spdk target won't let you omit the ':' in the NQN
[14:29:34] <JoeGruher> it seems like it renders that feature useless?  why does the fio nvme plugin support rdma transport at all then?
[14:40:35] <bwalker>  JoeGruher: we just found the bug in the bdev fio_plugin with keep alive
[14:40:40] <bwalker> trying to come up with a fix
[14:41:58] <JoeGruher> cool
[14:42:27] <bwalker> you can use an invalid NQN with the kernel target that doesn't have a ':' in it
[14:42:32] <bwalker> because the kernel allows for invalid NQNs
[14:42:37] <bwalker> so then you could use our fio nvme plugin
[14:42:50] <bwalker> but I agree - this is a problem
[14:42:56] <bwalker> and I'm not entirely sure how to solve itg
[14:44:27] <JoeGruher> i'd suggest just not enforcing valid NQNs, it doesn't really seem necessary
[14:44:38] <JoeGruher> as you mentioned, the kernel target doesn't enforce, no one seems to mind :)
[14:45:09] <JoeGruher> I commented out the valid NQN check and created a non-spec NQN and I am successfully running the FIO NVMe plugin against SPDK target now, woohoo
[14:45:48] <drv> maybe we can downgrade the valid NQN check to a warning
[14:51:20] <jimharris> bwalker: i just went through all of the poller functions in spdk - the iscsi initiator wasn't returning the right values (it was always return 0 indicating idle) - but otherwise they are all returning a correct value (including returning -1 if they don't know)
[14:53:31] <bwalker> that's not terrible then
[14:54:00] <bwalker> if it really doesn't hurt performance to add those extra spdk_get_ticks, we can move ahead with that then
[14:54:13] <bwalker> and clean it up once the pollers are cleaned up
[14:55:16] <jimharris> i ran some more tests on my system and put the results in gerrithub
[14:55:35] <jimharris> we did base our original thinking on event_perf, which does show a degradation with vishal's patch
[14:55:45] <jimharris> but a "real" workload shows no difference
[14:56:57] <drv> yeah, I'm in favor of removing the once-every-5 check just for simplicity, if nothing else
[15:04:26] *** Quits: JoeGruher (c037362d@gateway/web/freenode/ip.192.55.54.45) (Quit: Page closed)
[15:05:05] *** Quits: gila (~gila@static.214.50.9.5.clients.your-server.de) (Ping timeout: 240 seconds)
[15:08:16] *** Joins: gila (~gila@5ED74129.cm-7-8b.dynamic.ziggo.nl)
[15:25:11] *** Joins: Jianjian (~jianjian@208.185.211.6)
[15:58:52] <drv> jimharris: you're probably already on top of it, but I added some comments about bsdump: https://review.gerrithub.io/#/c/spdk/spdk/+/414479/
[15:59:35] <bwalker> you also don't dump the masks right now
[15:59:51] <jimharris> yeah - there's a lot of stuff that isn't getting dumped yet
[16:00:00] <jimharris> snapshot related stuff, per-blob flags
[16:07:41] <jimharris> drv: fixed two of your comments and replied to the other
[16:12:01] <drv> jimharris: I see where the xattr_name gets terminated in the blobstore.c code, but this is actually the xattr value for the xattr named "name"
[16:12:04] <drv> if that's not confusing enough :)
[16:12:37] <drv> also, I think peluse factored out the hex dump thing from log, so we could use that here if you wanted to
[16:13:04] <drv> spdk_trace_dump() takes a FILE*
[16:15:46] <jimharris> oh - that's where it was, I was looking for it but couldn't find it
[16:17:05] *** Quits: Jianjian (~jianjian@208.185.211.6) (Remote host closed the connection)
[16:17:32] *** Joins: Jianjian (~jianjian@208.185.211.6)
[16:17:56] <drv> probably could use a better name
[16:22:13] *** Quits: Jianjian (~jianjian@208.185.211.6) (Ping timeout: 256 seconds)
[16:23:55] <jimharris> i think i'll keep my hex dump function for now
[16:24:07] <jimharris> i just tried it out but i don't care for how it looks in this case
[16:24:33] <drv> yeah, that's fine
[16:33:12] <jimharris> bsdump output in case you're interested
[16:33:13] <jimharris> http://spdk.intel.com/public/spdk/builds/review/7c62ff12fc63e6e282184e68c4d52158a5f2ae2d.1528499432/fedora-04/rocksdb/bsdump.txt
[16:33:47] <jimharris> (that's an intel-internal link)
[16:34:14] <drv> now we just need it to fail :)
[16:40:46] <jimharris> exactly!
[16:41:31] <jimharris> i was telling bwalker earlier that i already found a pseudo-bug with this bsdump tool - we don't coalesce unallocated clusters into a single extent
[20:43:19] *** Joins: Jianjian (~jianjian@c-73-231-38-189.hsd1.ca.comcast.net)
[21:40:45] *** Quits: Jianjian (~jianjian@c-73-231-38-189.hsd1.ca.comcast.net) (Remote host closed the connection)
[21:41:26] *** Joins: Jianjian (~jianjian@c-73-231-38-189.hsd1.ca.comcast.net)
[21:45:05] *** Quits: darsto (~darsto@89-68-12-72.dynamic.chello.pl) (Ping timeout: 240 seconds)
[21:50:43] *** Joins: darsto (~darsto@89-68-12-72.dynamic.chello.pl)
[22:30:31] *** Quits: Jianjian (~jianjian@c-73-231-38-189.hsd1.ca.comcast.net) (Remote host closed the connection)
[22:31:06] *** Joins: Jianjian (~jianjian@c-73-231-38-189.hsd1.ca.comcast.net)
[22:35:30] *** Quits: Jianjian (~jianjian@c-73-231-38-189.hsd1.ca.comcast.net) (Ping timeout: 260 seconds)