问题: I’ve recently run into some slightly odd behaviour when running commands over ssh. I would be interested to hear any explanations for the behaviour below.
Running ssh localhost 'touch foobar &' creates a file called foobar as expected:
However running the same command but with the -t option to force pseudo-tty allocation fails to create foobar:
1 2 3 4 5 6
[bob@server ~]$ ssh -t localhost 'touch foobar &' Connection to localhost closed. [bob@server ~]$ echo $? 0 [bob@server ~]$ ls foobar ls: cannot access foobar: No such file or directory
My current theory is that because the touch process is being backgrounded the pseudo-tty is allocated and unallocated before the process has a chance to run. Certainly adding one second sleep allows touch to run as expected:
1 2 3 4
[bob@pidora ~]$ ssh -t localhost 'touch foobar & sleep 1' Connection to localhost closed. [bob@pidora ~]$ ls foobar foobar
回答: This is related with how process groups work, how bash behaves when invoked as a non-interactive shell with -c, and the effect of & in input commands.
The answer assumes you’re familiar with how job control works in UNIX; if you’re not, here’s a high level view: every process belongs to a process group (the processes in the same group are often put there as part of a command pipeline, e.g. cat file | sort | grep 'word' would place the processes running cat(1), sort(1) and grep(1) in the same process group). bash is a process like any other, and it also belongs to a process group. Process groups are part of a session (a session is composed of one or more process groups). In a session, there is at most one process group, called the foreground process group, and possibly many background process groups. The foreground process group has control of the terminal (if there is a controlling terminal attached to the session); the session leader (bash) moves processes from background to foreground and from foreground to background with tcsetpgrp(3). A signal sent to a process group is delivered to every process in that group.
If the concept of process groups and job control is completely new to you, I think you’ll need to read up on that to fully understand this answer. A great resource to learn this is Chapter 9 of Advanced Programming in the UNIX Environment (3rd edition).
That being said, let’s see what is happening here. We have to fit together every piece of the puzzle.
In both cases, the ssh remote side invokes bash(1) with -c. The -c flag causes bash(1) to run as a non-interactive shell. From the manpage:
An interactive shell is one started without non-option arguments and without the -c option whose standard input and error are both connected to terminals (as determined by isatty(3)), or one started with the -i option. PS1 is set and $- includes i if bash is interactive, allowing a shell script or a startup file to test this state.
Also, it is important to know that job control is disabled when bash is started in non-interactive mode. This means that bash will not create a separate process group to run the command, since job control is disabled, there will be no need to move this command between foreground and background, so it might as well just remain in the same process group as bash. This will happen whether or not you forced PTY allocation on ssh with -t.
However, the use of & has the side effect of causing the shell not to wait for command termination (even if job control is disabled). From the manpage:
If a command is terminated by the control operator &, the shell executes the command in the background in a subshell. The shell does not wait for the command to finish, and the return status is 0. Commands separated by a ; are executed sequentially; the shell waits for each command to terminate in turn. The return status is the exit status of the last command executed.
So, in both cases, bash will not wait for command execution, and touch(1) will be executed in the same process group as bash(1).
Now, consider what happens when a session leader exits. Quoting from setpgid(2) manpage:
If a session has a controlling terminal, and the CLOCAL flag for that terminal is not set, and a terminal hangup occurs, then the session leader is sent a SIGHUP. If the session leader exits, then a SIGHUP signal will also be sent to each process in the foreground process group of the controlling terminal.
When you don’t use -t
When you don’t use -t, there is no PTY allocation on the remote side, so bash is not a session leader, and in fact no new session is created. Because sshd is running as a daemon, the bash process that is forked + exec()’d will not have a controlling terminal. As such, even though the shell terminates very quickly (probably before touch(1)), there is no SIGHUP sent to the process group, because bash wasn’t a session leader (and there is no controlling terminal). So everything works.
When you use -t
-t forces PTY allocation, which means that the ssh remote side will call setsid(2), allocate a pseudo-terminal + fork a new process with forkpty(3), connect the PTY master device input and output to the socket endpoints that lead to your machine, and finally execute bash(1). forkpty(3) opens the PTY slave side in the forked process that will become bash; since there’s no controlling terminal for the current session, and a terminal device is being opened, the PTY device becomes the controlling terminal for the session and bash becomes the session leader.
Then the same thing happens again: touch(1) is executed in the same process group, etc., yadda yadda. The point is, this time, there is a session leader and a controlling terminal. So, since bash does not bother waiting because of the &, when it exits, SIGHUP is delivered to the process group and touch(1) dies prematurely.
About nohup
nohup(1) doesn’t work here because there is still a race condition. If bash(1) terminates before nohup(1) has the chance to set up the necessary signal handling and file redirection, it will have no effect (which is probably what happens)
A possible fix
Forcefully re-enabling job control fixes it. In bash, you do that with set -m. This works:
A syslog facility code is used to specify the type of program that is logging the message RFC 5424 Section 6.2.1. Facility code Keyword Description Info 0 kern kernel messages 1 user user-level messages 2 mail mail system Archaic POSIX still supported and sometimes used system, for more mail(1)) 3 daemon system daemons All deamons, including systemd and its subsystems 4 auth security/authorization messages Also watch for different facility 10 5 syslog messages generated internally by syslogd As it standartized for syslogd, not used by systemd (see facility 3) 6 lpr line printer subsystem (archaic subsystem) 7 news network news subsystem (archaic subsystem) 8 uucp UUCP subsystem (archaic subsystem) 9 clock daemon systemd-timesyncd 10 authpriv security/authorization messages Also watch for different facility 4 11 ftp FTP daemon 12 - NTP subsystem 13 - log audit 14 - log alert 15 cron scheduling daemon 16 local0 local use 0 (local0) 17 local1 local use 1 (local1) 18 local2 local use 2 (local2) 19 local3 local use 3 (local3) 20 local4 local use 4 (local4) 21 local5 local use 5 (local5) 22 local6 local use 6 (local6) 23 local7 local use 7 (local7) So, useful facilities to watch: 0,1,3,4,9,10,15.
Priority level
A syslog severity code (in systemd called priority) is used to mark the importance of a message RFC 5424 Section 6.2.1. Value Severity Keyword Description Examples 0 Emergency emerg System is unusable Severe Kernel BUG, systemd dumped core. This level should not be used by applications. 1 Alert alert Should be corrected immediately Vital subsystem goes out of work. Data loss. kernel: BUG: unable to handle kernel paging request at ffffc90403238ffc. 2 Critical crit Critical conditions Crashes, coredumps. Like familiar flash: systemd-coredump[25319]: Process 25310 (plugin-containe) of user 1000 dumped core Failure in the system primary application, like X11. 3 Error err Error conditions Not severe error reported: kernel: usb 1-3: 3:1: cannot get freq at ep 0x84, systemd[1]: Failed unmounting /var., libvirtd[1720]: internal error: Failed to initialize a valid firewall backend). 4 Warning warning May indicate that an error will occur if action is not taken. A non-root file system has only 1GB free. org.freedesktop. Notifications[1860]: (process:5999): Gtk-WARNING **: Locale not supported by C library. Using the fallback ‘C’ locale. 5 Notice notice Events that are unusual, but not error conditions. systemd[1]: var.mount: Directory /var to mount over is not empty, mounting anyway. gcr-prompter[4997]: Gtk: GtkDialog mapped without a transient parent. This is discouraged. 6 Informational info Normal operational messages that require no action. lvm[585]: 7 logical volume(s) in volume group “archvg” now active. 7 Debug debug Information useful to developers for debugging the application. kdeinit5[1900]: powerdevil: Scheduling inhibition from “:1.14” “firefox” with cookie 13 and reason “screen”.
If issue you are looking for, was not found on according level, search it on couple of priority levels above and below. This rules are recommendations. Some errors considered a normal occasion for program so they marked low in priority by developer, and on the contrary, sometimes too many messages plaques too high priorities for them, but often it’s an arguable situation. And often you really should solve an issue, also to understand architecture and adopt best practices.
Examples:
Info message:
pulseaudio[2047]: W: [pulseaudio] alsa-mixer.c: Volume element Master has 8 channels. That’s too much! I can’t handle that!
It is an warning or error by definition.
Plaguing alert message:
sudo[21711]: user : a password is required ; TTY=pts/0 ; PWD=/home/user ; USER=root ; COMMAND=list /usr/bin/pacman –color auto -Sy The reason - user was manually added to sudoers file, not to wheel group, which is arguably normal action, but sudo produced an alert on every occasion.
Compatibility with a classic, non-journald aware syslog implementation can be provided by letting systemd forward all messages via the socket /run/systemd/journal/syslog. To make the syslog daemon work with the journal, it has to bind to this socket instead of /dev/log (official announcement). As of systemd 216 the default journald.conf for forwarding to the socket was changed to ForwardToSyslog=no to avoid system overhead, because rsyslog or syslog-ng (since 3.6) pull the messages from the journal by itself. See Syslog-ng#Overview and Syslog-ng#syslog-ng and systemd journal, or rsyslog respectively, for details on configuration.
有时你希望查看另一个系统上的日志.例如从 Live 环境修复现存的系统. 这种情况下你可以挂载目标系统 ( 例如挂载到 /mnt ),然后用 -D/–directory 参数指定目录,像这样:
1
$ journalctl -D /mnt/var/log/journal -xe
Tips and tricks
Enable installed units by default Tango-view-fullscreen.pngThis article or section needs expansion.Tango-view-fullscreen.png Reason: How does it work with instantiated units? (Discuss in Talk:Systemd (简体中文)#) Arch Linux ships with /usr/lib/systemd/system-preset/99-default.preset containing disable *. This causes systemctl preset to disable all units by default, such that when a new package is installed, the user must manually enable the unit. If this behavior is not desired, simply create a symlink from /etc/systemd/system-preset/99-default.preset to /dev/null in order to override the configuration file. This will cause systemctl preset to enable all units that get installed—regardless of unit type—unless specified in another file in one systemctl preset’s configuration directories. User units are not affected. See the manpage for systemd.preset for more information. Note: Enabling all units by default may cause problems with packages that contain two or more mutually exclusive units. systemctl preset is designed to be used by distributions and spins or system administrators. In the case where two conflicting units would be enabled, you should explicitly specify which one is to be disabled in a preset configuration file as specified in the manpage for systemd.preset.
现在得到了 PID ,你就可以进一步探查错误的详细信息了.通过下列的命令收集日志,PID 参数是你刚刚得到的 Process ID (例如 15630):
1 2 3 4
$ journalctl -b _PID=15630 -- Logs begin at Sa 2013-05-25 10:31:12 CEST, end at So 2013-08-25 11:51:17 CEST. -- Aug 25 11:48:13 mypc systemd-modules-load[15630]: Failed to find module 'blacklist usblp' Aug 25 11:48:13 mypc systemd-modules-load[15630]: Failed to find module 'install usblp /bin/false'
General \copyright show PostgreSQL usage and distribution terms \g [FILE] or ; execute query (and send results to file or |pipe) \h [NAME] help on syntax of SQL commands, * for all commands \q quit psql
Query Buffer \e [FILE] [LINE] edit the query buffer (or file) with external editor \ef [FUNCNAME [LINE]] edit function definition with external editor \p show the contents of the query buffer \r reset (clear) the query buffer \s [FILE] display history or save it to file \w FILE write query buffer to file
Input/Output \copy ... perform SQL COPY with data stream to the client host \echo [STRING] write string to standard output \i FILE execute commands from file \o [FILE] send all query results to file or |pipe \qecho [STRING] write string to query output stream (see \o)
Informational (options: S = show system objects, + = additional detail) \d[S+] list tables, views, and sequences \d[S+] NAME describe table, view, sequence, or index \da[S] [PATTERN] list aggregates \db[+] [PATTERN] list tablespaces \dc[S] [PATTERN] list conversions \dC [PATTERN] list casts \dd[S] [PATTERN] show comments on objects \ddp [PATTERN] list default privileges \dD[S] [PATTERN] list domains \det[+] [PATTERN] list foreign tables \des[+] [PATTERN] list foreign servers \deu[+] [PATTERN] list user mappings \dew[+] [PATTERN] list foreign-data wrappers \df[antw][S+] [PATRN] list [only agg/normal/trigger/window] functions \dF[+] [PATTERN] list text search configurations \dFd[+] [PATTERN] list text search dictionaries \dFp[+] [PATTERN] list text search parsers \dFt[+] [PATTERN] list text search templates \dg[+] [PATTERN] list roles \di[S+] [PATTERN] list indexes \dl list large objects, same as \lo_list \dL[S+] [PATTERN] list procedural languages \dn[S+] [PATTERN] list schemas \do[S] [PATTERN] list operators \dO[S+] [PATTERN] list collations \dp [PATTERN] list table, view, and sequence access privileges \drds [PATRN1 [PATRN2]] list per-database role settings \ds[S+] [PATTERN] list sequences \dt[S+] [PATTERN] list tables \dT[S+] [PATTERN] list data types \du[+] [PATTERN] list roles \dv[S+] [PATTERN] list views \dE[S+] [PATTERN] list foreign tables \dx[+] [PATTERN] list extensions \l[+] list all databases \sf[+] FUNCNAME show a function's definition \z [PATTERN] same as \dp
Formatting \a toggle between unaligned and aligned output mode \C [STRING] set table title, or unset if none \f [STRING] show or set field separator for unaligned query output \H toggle HTML output mode (currently off) \pset NAME [VALUE] set table output option (NAME := {format|border|expanded|fieldsep|footer|null| numericlocale|recordsep|tuples_only|title|tableattr|pager}) \t [on|off] show only rows (currently off) \T [STRING] set HTML <table> tag attributes, or unset if none \x [on|off] toggle expanded output (currently off)
Connection \c[onnect] {[DBNAME|- USER|- HOST|- PORT|-] | conninfo} connect to new database (currently "postgres") \encoding [ENCODING] show or set client encoding \password [USERNAME] securely change the password for a user \conninfo display information about current connection
Operating System \cd [DIR] change the current working directory \timing [on|off] toggle timing of commands (currently off) \! [COMMAND] execute command in shell or start interactive shell
Variables \prompt [TEXT] NAME prompt user to set internal variable \set [NAME [VALUE]] set internal variable, or list all if no parameters \unset NAME unset (delete) internal variable
Large Objects \lo_export LOBOID FILE \lo_import FILE [COMMENT] \lo_list \lo_unlink LOBOID large object operations