[00:49:04] *** Joins: tomzawadzki (~tomzawadz@192.55.54.38) [00:49:18] *** Quits: tomzawadzki (~tomzawadz@192.55.54.38) (Remote host closed the connection) [00:49:21] *** Joins: tzawadzki (~tomzawadz@192.55.54.38) [01:39:39] *** Guest13780 is now known as darsto [01:40:27] *** Quits: gila (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) (Ping timeout: 240 seconds) [01:51:52] *** Joins: tkulasek (~tkulasek@134.134.139.72) [02:24:58] *** Quits: ziyeyang_ (~ziyeyang@192.55.54.42) (Quit: Leaving) [02:50:56] *** Joins: gila (~gila@94.212.217.200) [03:13:01] *** Joins: darsto_ (~dstojacx@89-68-135-211.dynamic.chello.pl) [06:48:20] *** Joins: kragnizqy (~muvgf@182.16.175.115) [06:50:23] *** Quits: kragnizqy (~muvgf@182.16.175.115) (Remote host closed the connection) [07:22:10] *** Parts: lhodev (~Adium@66-90-218-190.dyn.grandenetworks.net) () [07:22:14] *** Joins: lhodev (~Adium@66-90-218-190.dyn.grandenetworks.net) [07:31:54] *** Quits: param (uid285755@gateway/web/irccloud.com/x-lvvbwgatemsuwgit) (Quit: Connection closed for inactivity) [08:04:15] *** Joins: gila_ (~gila@5ED4D9C8.cm-7-5d.dynamic.ziggo.nl) [08:04:36] *** Quits: gila (~gila@94.212.217.200) (Read error: Connection reset by peer) [08:13:21] peluse: do you plan for crypto virtual bdev could use separate core for crypto operations? [08:34:35] *** Quits: tzawadzki (~tomzawadz@192.55.54.38) (Ping timeout: 240 seconds) [09:01:26] *** Joins: Ranjit (6ac8ea81@gateway/web/freenode/ip.106.200.234.129) [09:01:44] Hi anybody here? [09:10:44] hi Ranjit [09:41:39] *** Joins: param (uid285755@gateway/web/irccloud.com/x-ocjkhqtnvquzxpxh) [09:46:24] *** Joins: param_ (6ad01ed1@gateway/web/freenode/ip.106.208.30.209) [09:47:30] jim : sure we can add the patch.. [09:47:57] jim : drv: what is the next card i can work.. [09:52:44] *** Quits: param_ (6ad01ed1@gateway/web/freenode/ip.106.208.30.209) (Ping timeout: 260 seconds) [09:58:34] pwodkowx, when using the virutal crypto device (all SW) that DPDK provides the current patch will create one virtual crypto device per channel that is allocated by the application that's using it so you won't have a dedicated core for crypto, you'll have a dedicated crypto per thread [09:58:55] haven't started looking into HW assist crypto yet but that's next... [10:01:43] I'm asking, because there will be hard to tell why we have performance drop if one of N bdevs will be pure SW. This will introduce huge lag in polling other bdevs (or virtqueues in vhost) [10:03:23] so, I think there should be an option to move pure SW crypto bdev to different core. [10:04:26] I see your point however crypto won't be something that is automatically on or configured, it would be a choice for a specific application but still not a bad idea for sure [10:04:35] how would we do that, with a vbdev I mean? [10:05:59] optional parameter - the coremask where to do the crypto op? [10:06:37] or lcore? [10:06:48] just thinking :P [10:08:08] yeah, just haven't thought about it for a vbdev, yeah I'll put it on the list for things to consider, thanks! [10:10:06] also, this might be not so virtual problem. in vhost scsi we have up to 8 LUNs per controller. All those are polled on single core. If one bdev is SW crypto device it will impact all other LUNs also if vhost uses multiqueue. [10:21:51] For the spdk_nvme_ns_cmd_write_with_md(), is the write transferred as a single dma or with multiple dmas? [10:22:09] To the controller [10:25:36] I mean one dma for data and metadata [10:40:13] the _with_md() functions are for use when your device is configured for separate metadata buffers (not "extended LBA"/inline data + metadata), so metadata will be in a separate location than data [10:40:43] so the device will definitely have to do multiple DMA transfers [10:54:07] https://ci.spdk.io/spdk/builds/review/f1d5f19415016ec444b370c8112370bcef0f075d.1520962697/ another nvmf crash in _spdk_io_device_attempt_free() [11:01:44] *** Quits: tkulasek (~tkulasek@134.134.139.72) (Ping timeout: 260 seconds) [11:24:22] *** Quits: Ranjit (6ac8ea81@gateway/web/freenode/ip.106.200.234.129) (Quit: Page closed) [11:51:54] *** Quits: param (uid285755@gateway/web/irccloud.com/x-ocjkhqtnvquzxpxh) (Quit: Connection closed for inactivity) [12:34:36] *** Joins: G804L5ruskie (~jxive@134.73.200.37) [12:36:41] *** Quits: G804L5ruskie (~jxive@134.73.200.37) (Remote host closed the connection) [12:37:25] *** Joins: tehvomtt (~zzajwkd@134.73.200.44) [12:38:47] *** Quits: tehvomtt (~zzajwkd@134.73.200.44) (Remote host closed the connection) [13:57:54] *** Joins: tkulasek (tkulasek@nat/intel/x-vgghtnuawfayfabl) [14:02:13] *** Quits: tkulasek (tkulasek@nat/intel/x-vgghtnuawfayfabl) (Ping timeout: 240 seconds) [14:07:55] drv: can you take a look at https://review.gerrithub.io/#/c/401717/? I addressed your comments in the latest patch set. [14:10:49] looks good to me [14:11:06] guess I will just have to unlearn my habit of typing ./unit :) [14:16:06] bwalker: are you there? If not, anyone know when Ben returns from sabbatical? [14:18:42] I think he's back at the beginning of April [14:19:14] *envy* [14:28:22] drv: I'm looking at making changes to lib/iscsi and noting his work, I hoped I might ask him a question or two. In iscsi_subsystem.c, spdk_get_pdu() currently abort()'s on a failure of spdk_mempool_get(). Am looking to return a NULL instead and am looking at the callers. spdk_iscsi_read_pdu() makes that call and returns an int. Currently defined return vals (enum spdk_error_codes) appear to include: SPDK_SUCCESS, SPDK_ISCSI_CONNECT [14:28:22] ION_FATAL and SPDK_PDU_FATAL. Semantically, the last one caught my eye and thought I might use that. However, an upstream caller of the latter, checks the return val for 0 and also only for SPDK_ISCSI_CONNECTION_FATAL. So, I'm left wondering whether I should "simply" return SPDK_ISCSI_CONNECTION_FATAL instead, or alter the upstream caller to either also check for SPDK_PDU_FATAL -or- perhaps any non-zero (which it currently does not). [14:44:06] it seems to me that SPDK_PDU_FATAL isn't really needed anymore - the only place where that gets returned would result in the connection getting torn down in which case why not just use SPDK_ISCSI_CONNECTION_FATAL instead? [14:45:45] so i think returning SPDK_ISCSI_CONNECTION_FATAL is correct [14:47:20] i think the abort() was there originally because the pdu pool is sized such that exhausting pdus should never be possible [14:48:24] jimharris: Thx Jim. I'll just do that (use CONNECTION_FATAL). Oh, and no one will ever need more than 640K of RAM ;-) [14:49:43] i'm just looking at all of the callstacks that have to be modified to handle NULL pdu handling - we will definitely need that broken up into multiple patches - looks like it could potentially be a lot of code changes [14:58:10] There *are* indeed a number of callers. I was walking them one by one. Hmmm. Not sure what criteria to employ for breaking them up. Draw up a diagram to see if there are perhaps any that grouped by some common call stack? If none, then just do each one separately? That'd be some 12 separate patches or so. Does I assume correctly that this would then be done a series such that they could be reviewed separately, but none of them woul [14:58:10] d be merged until the entire series has been reviewed and approved? [15:01:17] Yikes, let me re-word that last bit: Do I assume correctly that this — i.e. a series of patches — would be done such that each patch in the series could be reviewed separately, but none of them would be merged until the entire series has been reviewed and approved? [15:01:48] ideally they could be done such that we could start merging them one at a time once they are ready [15:14:15] Hmmm. I'm trying to think of how to achieve that. If the lowest common denominator — i.e. spdk_get_pdu() — is changed to return NULL (instead of abort()'ing), then I don't see how I can only change just one (or a partial set) of callers to check for NULL and do the right thing leaving the others "exposed" if they're not checking for NULL (nor abort()'ing); i.e. 3 Musketeers: all for one and one for all. Maybe invert the process? [15:14:15] That is, code up each (separately) to check for NULL. And then for the last patch, alter spdk_get_pdu() to return the NULL on failure. The initial set might look, um, "weird" on the timeline as NULL at that time wouldn't be getting returned — we'd still have the abort() in place — but at least it wouldn't break the existing behavior. What do you think? [15:32:32] i like the idea of inverting the process [15:33:20] some of them are easy - for example, any of the spdk_iscsi_op_* calls to spdk_get_pdu() - if they return NULL then they just return error back to spdk_iscsi_execute which already handles it [15:33:31] spdk_iscsi_read_pdu() is also straightforward [15:33:41] Yeah, me too. I think I'm just gonna tackle one at a time per patch allowing me to walk the call stacks up fully on each making any changes, if needed. [15:33:56] Even for the super easy ones, then it'll be a quick review ;-) [15:34:23] sounds good [15:35:44] I'll just submit though one at a time til we can through each merged as almost all of them are in the same source file and I'd like to avoid gerrit perceiving each as a merge conflict. [15:36:46] *** Quits: pzedlews_ (uid285827@gateway/web/irccloud.com/x-hugstytgdbuwcsiu) (Quit: Connection closed for inactivity) [15:46:32] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) [16:36:04] *** Joins: pzedlews_ (uid285827@gateway/web/irccloud.com/x-srzhbobyvdlakmta) [17:07:38] drv - was talking with ziye last night about the iscsi initiator failures - one of the problems is that libiscsi as packaged on some older distros has some blatant bugs in the header files which fail the build due to -Werror [17:08:00] one of the distros we are testing? [17:08:18] both centos7 (which isn't that old) and ubuntu14.04 [17:08:20] also, I thought warnings in system headers were ignored or something, but maybe I'm wrong [17:08:28] https://ci.spdk.io/spdk/builds/review/b041fa27ef31dfa0ef955a1d9951dcfdff1567d0.1520922199/ [17:09:02] is this scsi_task_set_iov_in function just missing in their headers? [17:09:11] (can we add our own declaration to get around that?) [17:09:19] hmmm - maybe [17:09:48] oh, there's different errors on centos7 vs ubuntu [17:10:03] the centos7 one is about scsi_set_uint16â [17:10:45] so one thing i noticed while playing around with this (including explicitly adding -Wno-error to the iscsi makefile) was that any per-Makefile CFLAGS get put at the beginning not the end of the compilation argument list [17:11:09] so even putting -Wno-error in CFLAGS doesn't work (not that it's the solution we want anyways) [17:14:17] yeah - ubuntu14.04 is just a function i'm using that doesn't even exist [17:14:39] maybe we can just drop some of these distros as having too-old libiscsi [17:14:52] or install a newer version in /usr/local if needed [17:15:52] upstream libiscsi added this scsi_task_set_iov_out/in in November 2012 [17:16:28] drop how? [17:17:39] add a configure flag for it and make the user turn it on explicitly, then detect existence of the header file in test/common/autotest_common.sh [17:17:42] like we do for librbd etc. [17:18:14] looks like Ubuntu 16.04 has libiscsi-dev 1.12.0 [17:19:06] oh, never mind, we were looking at Ubuntu 14.04 [17:19:25] that has libiscsi-dev 1.4.0 [17:19:47] and CentOS 7 has libiscsi-devel-1.9.0 [17:20:41] can we work around not having scsi_task_set_iov_in? or is that required for our use? [17:20:44] you think just check for existence of the header file? [17:21:00] well, that plus don't install the packages on the older distros in our test pool [17:21:02] not sure [17:21:14] but then we'd still need to do something about pkgdep.sh [17:21:52] i think we take libiscsi out of pkgdep.sh in that case [17:22:11] we don't have rbd in pkgdep.sh [17:22:24] yeah, although that's probably an oversight [17:22:40] I guess we could keep it in pkgdep.sh but still don't enable it by default - the header file detection would just be for autotest [17:23:47] and then if someone tries --enable-iscsi on centos7 and gets a build error...we just say you need to build libiscsi from source? [17:24:34] yeah [17:24:43] it's unfortunate that they ship such an old version [17:24:43] maybe our configure script could pull this LIBISCSI_API_VERSION from iscsi.h and make some decisions [17:25:07] libiscsi 1.9.0 is from 2013 [17:25:38] yeah, so far we don't do anything like that, but we could start if we have to [17:34:10] how's this for a hack [17:34:47] echo '#include "iscsi/iscsi.h"' | gcc -E - | grep "iscsi/iscsi.h" | head -1 | awk '{print $3}' | xargs cat | grep LIBISCSI_API_VERSION | awk '{print $3}' [17:35:05] * jimharris waits for drv to come up with something way simpler [17:36:13] that probably won't work with cross-compilation (and specifying a different $CC in general) [17:37:10] how does autotools do this? [17:37:34] configure knows which compiler you are using (picks up $CC etc. environment variables) and does test compiles, pretty much what you're doing here [17:37:53] currently, our configure doesn't know about $CC at all, but we could teach it [17:38:12] or just use autotools [17:38:13] * jimharris ducks [17:38:23] yeah [17:38:37] but that is no fun :) [17:39:11] use cc instead of gcc? [17:39:27] i didn't catch the cross-compilation comment [17:39:45] the cross compilation thing is mostly theoretical (I haven't tried it), but I think we are all set up to do it now [17:40:01] if you do something like 'make CC=my-crazy-non-x86-cc' [17:40:12] oh - so just specifying some CC [17:40:13] got it [17:40:17] we'd have to extend that to configure [17:40:26] and make it dump the right value of CC in some config file [17:41:10] I think for the libiscsi case, we can just let the build fail - no need to go to extremes to try to detect it beforehand [17:42:04] hmmm... seems like I might have stumbled into a size limitation on the AESNI PMD over something just over 32K :( Still investigating though [17:43:09] could configure use mk/cc.mk to get the CC value? [17:43:37] i guess you don't have mk/cc.mk at this point though [17:44:32] right, that would all have to be done in configure as well [17:45:05] I guess it's already split out into its own script, so that might not be too hard [17:45:45] ugh [17:45:55] we could just make libiscsi another git submodule [17:47:41] that might be the least painful option, although I'd prefer to avoid it if possible [17:47:50] as long as we can build against system libiscsi too (same as DPDK), I guess it's not too bad [18:06:53] Speaking of iscsi…. in spdk_iscsi_read_pdu(), upon detecting (data_len > max_segment_len) the function invokes spdk_iscsi_reject() and returns SPDK_SUCCESS. There's a comment therein "…We do not want to drop the connection, so return SUCCESS here so that the caller will continue to attempt reading PDUs." The issue? There's no check of the return value of spdk_iscsi_reject() (but it is sampled elsewhere). So, I *want* to sample [18:06:53] the spdk_iscsi_reject() return value (for the case of a spdk_get_pdu() failure). Thus, do I check that return value of spdk_iscsi_reject(), and if < 0, return a SPDK_ISCSI_CONNECTION_FATAL? Or do I leave the code as-is, ignoring the return value of spdk_scsi_reject()? [18:11:04] P.S. I vote option A — check rval of spdk_iscsi_reject() and return ever it returns. [18:15:27] Hi Lance, I got your thought. Please give me a time to think. It may be helpful for us to add it to Gerrit or Trello's iSCSI board to discuss. [18:16:07] https://trello.com/b/qrzKvSYm/iscsi [18:16:18] Shuhei, ok, and thank you. [18:33:51] And returning SPDK_ISCSI_CONNECTION_FATAL is good for me. [18:34:17] I don't know the reason of the comment too. [18:35:04] Would you push the patch to Gerrit? Other experts will be able to review it. [18:36:08] I'm not sure which is better, forwarding the return value of reject() or returning the FATAL yet. [18:37:36] But returning not -ENOMEM but FATAL may be reasonable because spdk_iscsi_conn_handle_incoming_pdus() cannot handle -ENOMEM appropriately. [18:38:24] Besides, as you read already, spdk_get_pdu() will not fail because it will abort before return if any error. [18:40:13] I hope I could understand your insight correctly. [18:41:06] Thanks :) [18:45:39] *** Quits: pzedlews_ (uid285827@gateway/web/irccloud.com/x-srzhbobyvdlakmta) (Quit: Connection closed for inactivity) [18:52:54] *** Quits: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) (Ping timeout: 260 seconds) [20:47:17] Shuhei: the work I'm presently pursuing involves replacing, wherever possible, abort() calls in library code with return's instead. The strategy for now at least is to replace abort()'s when the nature of the failure is some type of resource failure (e.g. spdk_mempool_get()) as opposed to a programming error. So, the plan is that I will be replacing the abort() in spdk_get_pdu() to a return of NULL. As such, I'm evaluating all of the c [20:47:18] alling functions up the stack to ensure they handle a NULL and "do the right thing". As there are many such calls to spdk_get_pdu(), I am altering the callers first (to test for NULL) in separate patches to make it more manageable before finally changing spdk_get_pdu() itself. [21:55:45] *** Joins: Shuhei (caf6fc61@gateway/web/freenode/ip.202.246.252.97) [23:34:09] *** Joins: tomzawadzki (tomzawadzk@nat/intel/x-bxeeyuirkvlexmma) [23:52:53] *** Quits: tomzawadzki (tomzawadzk@nat/intel/x-bxeeyuirkvlexmma) (Ping timeout: 240 seconds)