[00:10:38] *** Joins: dlw1 (~Thunderbi@114.255.44.143)
[00:12:17] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 245 seconds)
[00:12:17] *** dlw1 is now known as dlw
[00:52:29] *** Joins: tkulasek (~tkulasek@134.134.139.72)
[03:41:09] *** Joins: gila (~gila@5ED74129.cm-7-8b.dynamic.ziggo.nl)
[03:53:53] *** Quits: dlw (~Thunderbi@114.255.44.143) (Ping timeout: 248 seconds)
[08:27:10] *** Joins: tomzawadzki (~tomzawadz@192.55.54.40)
[09:35:14] *** Joins: tzawadzki (~tomzawadz@192.55.54.40)
[09:35:15] *** Quits: tomzawadzki (~tomzawadz@192.55.54.40) (Remote host closed the connection)
[09:43:25] two patch sets out for open channel
[09:44:05] I need to pick easier code reviews to ease myself into monday morning anyway
[09:53:46] yeah - those are on my todo list for today
[09:54:35] drv: can you take a look at this patch? https://review.gerrithub.io/#/c/spdk/spdk/+/407126/ i marked it -1 but would like your input
[09:56:04] yeah, I glanced at that once before and didn't know exactly what to do with it
[09:56:13] need to think about how that should work, but I agree with your comment in general
[09:57:20] i like the idea of higher-order RPC functionality
[09:57:39] or rather higher-order configuration functionality that is built on a series of RPC calls
[09:57:49] personally, I think we should also change how save_config works so it has less smarts in rpc.py
[09:58:08] yeah I thought it looked neat
[09:58:54] (imo, save_config should really just be a single call to a "get_subsystem_config"-type API that returns all subsystem configs)
[09:58:55] bwalker: my question on the patch is whether it all belongs in scripts/rpc.py? or a new python script
[09:59:16] probably easier to keep it in a separate file, just for future auditing purposes
[09:59:31] drv: could you add this feedback to the gerrithub review?
[09:59:48] yes, although the save_config stuff is actually code that's already checked in, not really relevant to this review
[10:00:36] but we definitely should not have all this hard-coded magic in bdev.py
[10:00:57] also not sure why we need this functionality at all - can't we just restart the app?
[10:11:08] jimharris: any thoughts on adding -fno-strict-aliasing for DPDK? https://review.gerrithub.io/#/c/spdk/dpdk/+/411767/
[10:11:31] this could theoretically have a performance impact, but I'm also surprised DPDK doesn't turn it off already for correctness (I'm sure they're doing lots of crazy type punning)
[10:12:02] (based on Seth's tests, this patch is sufficient to get us working on CentOS 6 again)
[10:12:28] *** Joins: travis-ci (~travis-ci@ec2-54-81-215-159.compute-1.amazonaws.com)
[10:12:29] (spdk/master) test/common: Make common pmap call more generic (Seth Howell)
[10:12:29] Diff URL: https://github.com/spdk/spdk/compare/3c2044e9aa5b...27d7bac9a001
[10:12:29] *** Parts: travis-ci (~travis-ci@ec2-54-81-215-159.compute-1.amazonaws.com) ()
[10:17:13] i hate having to do that by default for all of the gcc versions from the last 9 years that don't need it (gcc v4.4 was released in 2009)
[10:17:30] yeah
[10:17:44] I don't know if there's some easy way to detect just the one we care about
[10:18:04] could we put something in our spdk makefiles (maybe just the dpdkbuild one?) to detect the gcc version and specify EXTRA_CFLAGS/DPDK_CFLAGS?
[10:18:27] or even punt on it from our makefiles entirely
[10:18:45] and just put the onus on the user (we could do something in our autobuild.sh for example)
[10:24:44] well, putting it in autobuild.sh would only fix it for our automated tests, not normal users just running configure/make
[10:24:54] we could possibly put it in the Makefiles somewhere, though
[10:28:43] i guess i'm not convinced that "support for centos 6" means you don't have to do a little bit of extra work
[10:29:30] i'm ok with having it in the Makefiles
[10:54:30] hey if anyone wants a tip on how to make your server reboot by feeding bogus HW descriptors into QAT, I have that like totally mastered!
[10:56:22] spin that as "Optimized Method and Apparatus for Rebooting Servers" and you could have a patent on your hands
[11:00:54] Yes, and it can be done locally or remotely, it's brilliant!
[11:01:24] are you in the lab right now?
[11:01:33] drv that is
[11:01:40] yep
[11:01:57] cover your ears :)
[11:02:18] preparing for takeoff
[11:02:46] hehe, I'll keep those experiments to a minimum on my remotes ;)
[11:18:42] jimharris: can you look at https://review.gerrithub.io/#/c/spdk/spdk/+/407357/?
[11:19:16] there is an intermittent failure on the build pool that I don't want to investigate until I know that the above patch doesn't already fix it
[11:25:39] *** Joins: travis-ci (~travis-ci@ec2-54-81-215-159.compute-1.amazonaws.com)
[11:25:40] (spdk/master) nvme: hold ctrlr_lock when setting timeout callback (Daniel Verkamp)
[11:25:40] Diff URL: https://github.com/spdk/spdk/compare/27d7bac9a001...943c7c69c357
[11:25:40] *** Parts: travis-ci (~travis-ci@ec2-54-81-215-159.compute-1.amazonaws.com) ()
[11:37:45] *** Joins: travis-ci (~travis-ci@ec2-54-227-75-141.compute-1.amazonaws.com)
[11:37:46] (spdk/master) nvmf: move outstandling req list in spdk_nvmf_qpair (Ziye Yang)
[11:37:46] Diff URL: https://github.com/spdk/spdk/compare/943c7c69c357...582d8f86a20c
[11:37:46] *** Parts: travis-ci (~travis-ci@ec2-54-227-75-141.compute-1.amazonaws.com) ()
[11:52:29] *** Quits: tkulasek (~tkulasek@134.134.139.72) (Ping timeout: 252 seconds)
[12:38:32] jimharris: ping on https://review.gerrithub.io/#/c/spdk/spdk/+/410958/ - this is enabling work that we'll need to do to add timeout support on RDMA transport
[12:41:24] I need to think some more about how AER handling should work with multiple processes, but I think at least making it not crash in multi-process is the right first step
[12:47:31] *** Joins: travis-ci (~travis-ci@ec2-54-81-215-159.compute-1.amazonaws.com)
[12:47:32] (spdk/master) bdev/passthru: remove trailing space in config dump (Daniel Verkamp)
[12:47:33] Diff URL: https://github.com/spdk/spdk/compare/582d8f86a20c...571a4615c3ca
[12:47:33] *** Parts: travis-ci (~travis-ci@ec2-54-81-215-159.compute-1.amazonaws.com) ()
[13:31:45] *** Joins: Alex_____ (bcf3adf1@gateway/web/freenode/ip.188.243.173.241)
[13:32:21] Hi everyone. I have some question about spdk bdev module. Could anyone help ?
[13:34:42] *** Joins: lhodev (~lhodev@inet-hqmc03-o.oracle.com)
[13:35:29] Any spdk developers here ?
[13:40:03] *** Quits: Alex_____ (bcf3adf1@gateway/web/freenode/ip.188.243.173.241) (Quit: Page closed)
[14:02:06] *** Quits: tzawadzki (~tomzawadz@192.55.54.40) (Remote host closed the connection)
[14:02:15] *** Joins: tzawadzki (~tomzawadz@192.55.54.40)
[14:02:26] *** Quits: tzawadzki (~tomzawadz@192.55.54.40) (Remote host closed the connection)
[14:02:34] *** Joins: tzawadzki (~tomzawadz@192.55.54.40)
[14:10:55] hi Alex____ - ask away
[14:13:37] *** Quits: tzawadzki (~tomzawadz@192.55.54.40) (Ping timeout: 248 seconds)
[14:46:55] Hey guys, is the malloc bdev the only code that uses the copy-engine stuff? When I grep through the code, outside the copy-engine lib code, itself, my search suggests that bdev_malloc.c is the only thing that makes calls to the spdk_copy_XXXX APIs.
[14:59:46] I believe you are correct
[15:08:20] bwalker: Thanks Ben. I'm just seeking to confirm a conclusion that unless one is targeting a malloc bdev -- at least presently -- we otherwise don't employ any IOAT's that exist on a system.
[15:09:17] correct - there is the IOAT driver that code could use explicitly
[15:09:28] but in terms of automatic use within an application, just the malloc bdev
[15:10:14] right, bdev_malloc -> copy_engine -> one of ioat or memcpy
[15:24:20] My curiosity on this copy-engine topic stems from some research into a presumably orthogonal issue; namely, on one system someone reported some really sluggish behavior. When I ran strace against nvmf_tgt, I saw a number of write()'s "failing" with -EAGAIN to /dev/infiniband/rdma_cm. I noted someone else observed this quite a while ago (see link: https://lists.01.org/pipermail/spdk/2016-August/000098.html). In that threaded discussion, the OP inquired about running examples/ioat/verify/verify and observed output stating "Not enough ioat channels found...."
[15:26:03] rdma_cm is the RDMA connection manager and it is used for generic event reporting when establishing RDMA connections
[15:26:16] it's just a management interface - so if the management interface is returning -EAGAIN
[15:26:25] then the network has something going on
[15:27:02] I've concluded that the write() complaints to rdma_cm are entirely unrelated to the use of an IOAT. ;-)
[15:52:49] *** Quits: lhodev (~lhodev@inet-hqmc03-o.oracle.com) (Remote host closed the connection)
[16:23:41] *** Joins: travis-ci (~travis-ci@ec2-54-227-75-141.compute-1.amazonaws.com)
[16:23:42] (spdk/master) nvme/rdma: create per-process controller struct (Daniel Verkamp)
[16:23:42] Diff URL: https://github.com/spdk/spdk/compare/571a4615c3ca...3148c4807973
[16:23:42] *** Parts: travis-ci (~travis-ci@ec2-54-227-75-141.compute-1.amazonaws.com) ()
[16:54:10] i confirmed we should switch to using lrand48_r instead of rand_r
[16:54:45] rand_r's RNG doesn't give enough variability
[16:55:11] the question is whether lrand48_r is good enough so that we don't have to implement our own random map similar to fio
[16:57:14] we already link to openssl for other stuff (md5 in iSCSI CHAP), so maybe we could use theirs
[16:57:25] depending on how good we want it to be :)
[16:57:47] i want something that isn't going to cause too much overhead :)
[16:58:09] yeah, the openssl one is probably too slow (presumably reads from /dev/urandom or something similar)
[16:58:54] i wrote a little test app/simulator - 2048M IO to a 512M LBA device hits 502M of the LBAs with lrand48_r
[16:59:17] but only 113M with rand_r
[16:59:39] in fact, it doesn't matter how many more I/O you run - it stays at 113M (i.e. the RNG repeats itself)
[17:00:08] hmm, the lrand48 man page says it only generates numbers in [0, 2^31), so this will already start to be a problem with >2TB disks with 512-byte blocks
[17:00:27] or actually 1 TB since it's only 2^31
[17:00:56] we could just generate two numbers and concatenate them or something, though
[17:00:56] good point
[17:01:09] yeah - that would work fine
[17:01:15] let me simulate that
[17:01:19] e.g. ((uint64_t)lrand48_r() << 32) | lrand48_r()
[17:01:30] thanks for showing me how to do bit math :)
[17:01:33] haha
[17:01:37] well, actually that's not quite right either
[17:01:45] since it's only 2^31, the shift should probably be 31 instead
[17:02:08] we better do three calls then in case 2^62 isn't enough
[17:02:09] :)
[17:03:26] does FreeBSD have lrand48_r()?
[17:03:34] in a package
[17:03:39] i already checked
[17:03:59] so i'll have to ask seth to...oh crap
[17:04:36] maybe we can just #ifdef our way around that for now (implement a fallback based on rand_r() or something)
[17:04:56] or install the package, but then we need to add it to the deps and everything
[17:05:16] i was just thinking to install the package
[17:06:08] I guess that's not too bad
[17:12:24] for anyone reading IRC and wants to pick up a small development task - issue #297 on github has details on improving the perf/bdevperf random number generator
[17:12:38] please post a comment in the github issue if you are picking it up
[17:13:17] I can go ahead and install the package on the FreeBSD test machine, as long as we remember to add it to pkgdep.sh later
[18:44:05] *** Joins: dlw (~Thunderbi@114.255.44.143)
[18:53:47] *** Joins: dlw1 (~Thunderbi@114.255.44.143)
[18:53:47] *** Quits: dlw (~Thunderbi@114.255.44.143) (Read error: Connection reset by peer)
[18:53:47] *** dlw1 is now known as dlw