From: ebiederm@xmission.com (Eric W. Biederman)
To: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>,
	Andrei Vagin <avagin@gmail.com>,
	adobriyan@gmail.com, viro@zeniv.linux.org.uk, davem@davemloft.net,
	akpm@linux-foundation.org, areber@redhat.com, serge@hallyn.com,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	Pavel Tikhomirov <ptikhomirov@virtuozzo.com>,
	<linux-api@vger.kernel.org>
Subject: Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary
Date: Mon, 17 Aug 2020 13:53:52 -0500
Message-ID: <87eeo59k8v.fsf@x220.int.ebiederm.org>
In-Reply-To: <20200817174745.jssxjdcwoqxeg5pu@wittgenstein> (Christian Brauner's message of "Mon, 17 Aug 2020 19:47:45 +0200")

Christian Brauner <christian.brauner@ubuntu.com> writes:

> On Mon, Aug 17, 2020 at 10:48:01AM -0500, Eric W. Biederman wrote:
>>
>> Creating names in the kernel for namespaces is very difficult and
>> problematic.  I have not seen anything that looks like all of the
>> problems have been solved with restoring these new names.
>>
>> When your filter for your list of namespaces is user namespace creating
>> a new directory in proc is highly questionable.
>>
>> As everyone uses proc placing this functionality in proc also amplifies
>> the problem of creating names.
>>
>>
>> Rather than proc having a way to mount a namespace filesystem filter by
>> the user namespace of the mounter likely to have many many fewer
>> problems.  Especially as we are limiting/not allow new non-process
>> things and ideally finding a way to remove the non-process things.
>>
>>
>> Kirill you have a good point that taking the case where a pid namespace
>> does not exist in a user namespace is likely quite unrealistic.
>>
>> Kirill mentioned upthread that the list of namespaces are the list that
>> can appear in a container.  Except by discipline in creating containers
>> it is not possible to know which namespaces may appear in attached to a
>> process.  It is possible to be very creative with setns, and violate any
>> constraint you may have.  Which means your filtered list of namespaces
>> may not contain all of the namespaces used by a set of processes.  This
>
> Indeed. We use setns() quite creatively when intercepting syscalls and
> when attaching to a container.
>
>> further argues that attaching the list of namespaces to proc does not
>> make sense.
>>
>> Andrei has a good point that placing the names in a hierarchy by
>> user namespace has the potential to create more freedom when
>> assigning names to namespaces, as it means the names for namespaces
>> do not need to be globally unique, and while still allowing the names
>> to stay the same.
>>
>>
>> To recap the possibilities for names for namespaces that I have seen
>> mentioned in this thread are:
>> - Names per mount
>> - Names per user namespace
>>
>> I personally suspect that names per mount are likely to be so flexibly
>> they are confusing, while names per user namespace are likely to be
>> rigid, possibly too rigid to use.
>>
>> It all depends upon how everything is used.  I have yet to see a
>> complete story of how these names will be generated and used.  So I can
>> not really judge.
>
> So I haven't fully understood either what the motivation for this
> patchset is.
> I can just speak to the use-case I had when I started prototyping
> something similar: We needed a way to get a view on all namespaces
> that exist on the system because we wanted a way to do namespace
> debugging on a live system. This interface could've easily lived in
> debugfs. The main point was that it should contain all namespaces.
> Note, that it wasn't supposed to be a hierarchical format it was only
> mean to list all namespaces and accessible to real root.
> The interface here is way more flexible/complex and I haven't yet
> figured out what exactly it is supposed to be used for.
>
>>
>>
>> Let me add another take on this idea that might give this work a path
>> forward. If I were solving this I would explore giving nsfs directories
>> per user namespace, and a way to mount it that exposed the directory of
>> the mounters current user namespace (something like btrfs snapshots).
>>
>> Hmm.  For the user namespace directory I think I would give it a file
>> "ns" that can be opened to get a file handle on the user namespace.
>> Plus a set of subdirectories "cgroup", "ipc", "mnt", "net", "pid",
>> "user", "uts") for each type of namespace.  In each directory I think
>> I would just have a 64bit counter and each new entry I would assign the
>> next number from that counter.
>>
>> The restore could either have the ability to rename files or simply the
>> ability to bump the counter (like we do with pids) so the names of the
>> namespaces can be restored.
>>
>> That winds up making a user namespace the namespace of namespaces, so
>> I am not 100% about the idea.
>
> I think you're right that we need to understand better what the use-case
> is. If I understand your suggestion correctly it wouldn't allow to show
> nested user namespaces if the nsfs mount is per-user namespace.

So what I was thinking is that we have the user namespace directories
and that the mount code would perform a bind mount such that the
directory that matches the mounter's user namespace is the root
directory.

> Let me throw in a crazy idea: couldn't we just make the ioctl_ns() walk
> a namespace hierarchy? For example, you could pass in a user namespace
> fd and then you'd get back a struct with handles for fds for the
> namespaces owned by that user namespace and then you could use
> NS_GET_USERNS/NS_GET_PARENT to walk upwards from the user namespace fd
> passed in initially and so on? Or something similar/simpler. This would
> also decouple this from procfs somewhat.

Hmm.  That would remove the need to have names.  We could just keep a
list of the namespaces in creation order.  Hopefully the CRIU folks
could preserve that creation order without too much trouble.

Say with an ioctl NS_NEXT_CREATION which takes two fds, and returns a
new file descriptor.  The arguments would be the user namespace and -1
or the file descriptor last returned from NS_NEXT_CREATION.

Assuming that is not difficult for CRIU to restore that would be a very
simple patch.

Eric
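
To make the iteration contract above concrete, here is a minimal
userspace sketch of how a creation-ordered walk might behave.  It is
only a model under stated assumptions: NS_NEXT_CREATION is a proposal
in this thread and not an existing ioctl, and every name below
(userns_model, model_add, model_next_creation, model_walk) is
hypothetical.  Plain integers stand in for file descriptors, and the
model treats them as stable handles, whereas the proposed ioctl would
return a new file descriptor on each call.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NS 16

/* Hypothetical model; none of these names exist in the kernel. */
struct ns_entry {
	int fd;           /* stand-in for an nsfs file descriptor */
	const char *type; /* "net", "pid", "uts", ... */
};

/* One user namespace and the namespaces it owns, in creation order. */
struct userns_model {
	struct ns_entry created[MAX_NS];
	size_t count;
};

/* Record a namespace as created inside this user namespace. */
static int model_add(struct userns_model *u, int fd, const char *type)
{
	assert(fd >= 0); /* -1 is reserved as the "start" cursor */
	if (u->count == MAX_NS)
		return -1;
	u->created[u->count].fd = fd;
	u->created[u->count].type = type;
	u->count++;
	return 0;
}

/*
 * Model of the proposed call: prev_fd == -1 yields the oldest entry,
 * otherwise the entry created right after prev_fd.  Returns -1 when
 * the list is exhausted or prev_fd is not in this user namespace.
 */
static int model_next_creation(const struct userns_model *u, int prev_fd)
{
	size_t i;

	if (prev_fd == -1)
		return u->count ? u->created[0].fd : -1;

	for (i = 0; i < u->count; i++)
		if (u->created[i].fd == prev_fd)
			return (i + 1 < u->count) ? u->created[i + 1].fd : -1;

	return -1;
}

/* Walk the whole list, storing at most cap handles into out. */
static size_t model_walk(const struct userns_model *u, int *out, size_t cap)
{
	size_t n = 0;
	int fd = model_next_creation(u, -1);

	while (fd != -1 && n < cap) {
		out[n++] = fd;
		fd = model_next_creation(u, fd);
	}
	return n;
}
```

A caller starts with -1 and feeds each returned handle back in until it
gets -1.  In this model a restore tool only has to recreate the
namespaces in the same order for the walk to come out the same, which
is the property the creation-order idea depends on.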