From: Kirill Tkhai <ktkhai@virtuozzo.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: viro@zeniv.linux.org.uk, adobriyan@gmail.com, davem@davemloft.net,
    akpm@linux-foundation.org, christian.brauner@ubuntu.com,
    areber@redhat.com, serge@hallyn.com, linux-kernel@vger.kernel.org,
    linux-fsdevel@vger.kernel.org,
    Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Subject: Re: [PATCH 00/23] proc: Introduce /proc/namespaces/ directory to expose namespaces lineary
Date: Thu, 30 Jul 2020 18:01:20 +0300
Message-ID: <56928404-f194-4194-5f2a-59acb15b1a04@virtuozzo.com>
In-Reply-To: <87k0yl5axy.fsf@x220.int.ebiederm.org>

On 30.07.2020 17:34, Eric W. Biederman wrote:
> Kirill Tkhai <ktkhai@virtuozzo.com> writes:
>
>> Currently, there is no a way to list or iterate all or subset of namespaces
>> in the system. Some namespaces are exposed in /proc/[pid]/ns/ directories,
>> but some also may be as open files, which are not attached to a process.
>> When a namespace open fd is sent over unix socket and then closed, it is
>> impossible to know whether the namespace exists or not.
>>
>> Also, even if namespace is exposed as attached to a process or as open file,
>> iteration over /proc/*/ns/* or /proc/*/fd/* namespaces is not fast, because
>> this multiplies at tasks and fds number.
>
> I am very dubious about this.
>
> I have been avoiding exactly this kind of interface because it can
> create rather fundamental problems with checkpoint restart.

restart/restore :)

> You do have some filtering and the filtering is not based on current.
> Which is good.
>
> A view that is relative to a user namespace might be ok. It almost
> certainly does better as it's own little filesystem than as an extension
> to proc though.
>
> The big thing we want to ensure is that if you migrate you can restore
> everything. I don't see how you will be able to restore these files
> after migration. Anything like this without having a complete
> checkpoint/restore story is a non-starter.

There is no difference between the files in the /proc/namespaces/ directory
and those in /proc/[pid]/ns/. CRIU can already restore open files from
/proc/[pid]/ns/, and the same will work for /proc/namespaces/ files. As
someone who worked extensively on pid_ns and user_ns support in CRIU, I don't
see any problem here. If you have specific concerns, let's discuss them.

CC: Pavel Tikhomirov, a CRIU maintainer, who knows everything about
namespace C/R.

> Further by not going through the processes it looks like you are
> bypassing the existing permission checks. Which has the potential
> to allow someone to use a namespace who would not be able to otherwise.

I agree, and as I wrote to Christian, the permission checks should be
stricter. This just needs to be formalized. Let's discuss this.

> So I think this goes one step too far but I am willing to be persuaded
> otherwise.
>
> Eric
>
>> This patchset introduces a new /proc/namespaces/ directory, which exposes
>> subset of permitted namespaces in linear view:
>>
>> # ls /proc/namespaces/ -l
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'cgroup:[4026531835]' -> 'cgroup:[4026531835]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'ipc:[4026531839]' -> 'ipc:[4026531839]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026531840]' -> 'mnt:[4026531840]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026531861]' -> 'mnt:[4026531861]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532133]' -> 'mnt:[4026532133]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532134]' -> 'mnt:[4026532134]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532135]' -> 'mnt:[4026532135]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'mnt:[4026532136]' -> 'mnt:[4026532136]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'net:[4026531993]' -> 'net:[4026531993]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'pid:[4026531836]' -> 'pid:[4026531836]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'time:[4026531834]' -> 'time:[4026531834]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'user:[4026531837]' -> 'user:[4026531837]'
>> lrwxrwxrwx 1 root root 0 Jul 29 16:50 'uts:[4026531838]' -> 'uts:[4026531838]'
>>
>> Namespace ns is exposed, in case of its user_ns is permitted from /proc's pid_ns.
>> I.e., /proc is related to pid_ns, so in /proc/namespace we show only a ns, which is
>>
>> in_userns(pid_ns->user_ns, ns->user_ns).
>>
>> In case of ns is a user_ns:
>>
>> in_userns(pid_ns->user_ns, ns).
>>
>> The patchset follows this steps:
>>
>> 1)A generic counter in ns_common is introduced instead of separate
>> counters for every ns type (net::count, uts_namespace::kref,
>> user_namespace::count, etc). Patches [1-8];
>> 2)Patch [9] introduces IDR to link and iterate alive namespaces;
>> 3)Patch [10] is refactoring;
>> 4)Patch [11] actually adds /proc/namespace directory and fs methods;
>> 5)Patches [12-23] make every namespace to use the added methods
>> and to appear in /proc/namespace directory.
>>
>> This may be usefull to write effective debug utils (say, fast build
>> of networks topology) and checkpoint/restore software.
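
The visibility rule quoted above can be illustrated with a toy Python model.
This is only a sketch: UserNS, Namespace and is_listed() are hypothetical
names invented for the illustration, and in_userns() is a simplified
rendering of the kernel helper of the same name, assumed here to walk the
child up to the ancestor's level and then compare. It is not code from the
patchset.

```python
class UserNS:
    """Toy stand-in for struct user_namespace: a parent pointer and a depth."""

    def __init__(self, parent=None):
        self.parent = parent
        self.level = 0 if parent is None else parent.level + 1


class Namespace:
    """Toy stand-in for any non-user namespace, owned by a user namespace."""

    def __init__(self, kind, owner):
        self.kind = kind      # e.g. "net", "mnt", "pid"
        self.owner = owner    # models ns->user_ns


def in_userns(ancestor, child):
    """True if `child` is `ancestor` itself or one of its descendants."""
    ns = child
    # Climb from the child until we reach the ancestor's depth.
    while ns.level > ancestor.level:
        ns = ns.parent
    return ns is ancestor


def is_listed(proc_user_ns, ns):
    """Model of the rule: proc_user_ns plays the role of pid_ns->user_ns."""
    # A user namespace is checked directly, any other type via its owner.
    target = ns if isinstance(ns, UserNS) else ns.owner
    return in_userns(proc_user_ns, target)
```

In this model, a /proc instance whose pid_ns belongs to a child user
namespace lists namespaces owned by that user namespace and its descendants,
but not those owned by its parent or by a sibling.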
>> ---
>>
>> Kirill Tkhai (23):
>>       ns: Add common refcount into ns_common add use it as counter for net_ns
>>       uts: Use generic ns_common::count
>>       ipc: Use generic ns_common::count
>>       pid: Use generic ns_common::count
>>       user: Use generic ns_common::count
>>       mnt: Use generic ns_common::count
>>       cgroup: Use generic ns_common::count
>>       time: Use generic ns_common::count
>>       ns: Introduce ns_idr to be able to iterate all allocated namespaces in the system
>>       fs: Rename fs/proc/namespaces.c into fs/proc/task_namespaces.c
>>       fs: Add /proc/namespaces/ directory
>>       user: Free user_ns one RCU grace period after final counter put
>>       user: Add user namespaces into ns_idr
>>       net: Add net namespaces into ns_idr
>>       pid: Eextract child_reaper check from pidns_for_children_get()
>>       proc_ns_operations: Add can_get method
>>       pid: Add pid namespaces into ns_idr
>>       uts: Free uts namespace one RCU grace period after final counter put
>>       uts: Add uts namespaces into ns_idr
>>       ipc: Add ipc namespaces into ns_idr
>>       mnt: Add mount namespaces into ns_idr
>>       cgroup: Add cgroup namespaces into ns_idr
>>       time: Add time namespaces into ns_idr
>>
>>
>>  fs/mount.h                     |    4
>>  fs/namespace.c                 |   14 +
>>  fs/nsfs.c                      |   78 ++++++++
>>  fs/proc/Makefile               |    1
>>  fs/proc/internal.h             |   18 +-
>>  fs/proc/namespaces.c           |  382 +++++++++++++++++++++++++++-------------
>>  fs/proc/root.c                 |   17 ++
>>  fs/proc/task_namespaces.c      |  183 +++++++++++++++++++
>>  include/linux/cgroup.h         |    6 -
>>  include/linux/ipc_namespace.h  |    3
>>  include/linux/ns_common.h      |   11 +
>>  include/linux/pid_namespace.h  |    4
>>  include/linux/proc_fs.h        |    1
>>  include/linux/proc_ns.h        |   12 +
>>  include/linux/time_namespace.h |   10 +
>>  include/linux/user_namespace.h |   10 +
>>  include/linux/utsname.h        |   10 +
>>  include/net/net_namespace.h    |   11 -
>>  init/version.c                 |    2
>>  ipc/msgutil.c                  |    2
>>  ipc/namespace.c                |   17 +-
>>  ipc/shm.c                      |    1
>>  kernel/cgroup/cgroup.c         |    2
>>  kernel/cgroup/namespace.c      |   25 ++-
>>  kernel/pid.c                   |    2
>>  kernel/pid_namespace.c         |   46 +++--
>>  kernel/time/namespace.c        |   20 +-
>>  kernel/user.c                  |    2
>>  kernel/user_namespace.c        |   23 ++
>>  kernel/utsname.c               |   23 ++
>>  net/core/net-sysfs.c           |    6 -
>>  net/core/net_namespace.c       |   18 +-
>>  net/ipv4/inet_timewait_sock.c  |    4
>>  net/ipv4/tcp_metrics.c         |    2
>>  34 files changed, 746 insertions(+), 224 deletions(-)
>>  create mode 100644 fs/proc/task_namespaces.c
>>
>> --
>> Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
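
For comparison, the existing per-task approach that the cover letter calls
slow can be sketched in userspace as follows. This is a minimal illustration
that uses only the already existing /proc/[pid]/ns/ symlinks;
scan_task_namespaces() and its proc_root parameter are hypothetical names
chosen for the example, and the link-target format 'type:[inode]' is taken
from the listing quoted above.

```python
import os
import re
from collections import defaultdict

# Link targets look like "mnt:[4026531840]": namespace type, then inode number.
NS_LINK = re.compile(r"^(\w+):\[(\d+)\]$")


def scan_task_namespaces(proc_root="/proc"):
    """Map (ns_type, inode) to the sorted list of pids attached to it."""
    found = defaultdict(set)
    for entry in os.listdir(proc_root):
        if not entry.isdigit():            # only per-task directories
            continue
        ns_dir = os.path.join(proc_root, entry, "ns")
        try:
            names = os.listdir(ns_dir)
        except OSError:                    # task exited or access denied
            continue
        for name in names:
            try:
                target = os.readlink(os.path.join(ns_dir, name))
            except OSError:
                continue
            match = NS_LINK.match(target)
            if match:
                key = (match.group(1), int(match.group(2)))
                found[key].add(int(entry))
    return {key: sorted(pids) for key, pids in found.items()}
```

The work grows with the number of tasks times the number of namespace links
per task, and a namespace that is kept alive only by an open file not
attached to any process never shows up in the result. Both limitations are
the ones the quoted cover letter describes.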