FTP PROTOCOL EXTENSIONS IN THE KERMIT FTP CLIENT

F. da Cruz, Columbia University
[email protected]
26 Sep 2002
Most Recent Update: Wed Oct 30 16:51:40 2002


The new releases of C-Kermit (8.0.206) and Kermit 95 (2.1) support new FTP protocol features from RFC 2389 as well as most of what's in the Elz and Hethmon Extensions to FTP Internet Draft (see References). Some of these features, such as SIZE (request a file's size), MDTM (request file's modification time), and REST (restart interrupted transfer) have been widely implemented in FTP clients and servers for years (as well as in the initial release of the Kermit FTP clients). Others such as FEAT and MLSD are rarely seen and are new to the upcoming Kermit releases. TVFS (Trivial Virtual File Store) is supported implicitly, and the UTF-8 character-set is already fully supported at the protocol and data-interchange level.

For Kermit users, the main benefit of the new FTP protocol extensions is the ability to do recursive downloads. But the extensions also introduce complications and tradeoffs that you should be aware of. Of course Kermit tries to "do the right thing" automatically in every case for backwards compatibility. But (as noted later) some cases are inherently ambiguous and/or can result in nasty surprises, and for those situations new commands and switches are available to give you precise control over Kermit's behavior, in case the defaults don't produce the desired results.

The reader is assumed to be familiar with FTP command-line clients and with the Kermit FTP client. If you are not familiar with the Kermit FTP client, see The Kermit FTP Client (overview) in the References for a brief introduction.


TERMINOLOGY

Command-line FTP clients such as Kermit (as well as the traditional FTP programs found on Unix, VMS, ..., even Windows) have commands like PUT, MPUT, GET, MGET, and BYE, which they convert into zero or more FTP protocol commands, such as NLST, RETR, QUIT. For clarity, we'll use "command" to refer to commands given by the user to the FTP client, and "directive" for FTP protocol commands sent by the FTP client to the FTP server.


FEATURE NEGOTIATION

New FTP protocol features are negotiated as follows: the client sends a FEAT directive, and the server responds either with a list of the (new) features it supports or, if it does not support the FEAT directive at all, with an error indication. In the latter case the client has to guess which new features the server supports (Kermit guesses that the server supports SIZE and MDTM but not MLST). Note that the MLST feature includes MLSD, which is not listed separately as a feature.
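As a rough illustration of this negotiation, the following Python sketch parses a FEAT reply laid out as in RFC 2389 and falls back to a guess when the server rejects the directive. The function name, the constant, and the sample reply text are hypothetical; only the fallback itself (SIZE and MDTM but not MLST) comes from the description above, and this is not Kermit's actual code.

```python
# Fallback guess described in the text: SIZE and MDTM but not MLST.
GUESSED_FEATURES = {"SIZE", "MDTM"}

def parse_feat_reply(reply):
    """Return the set of feature names announced in a FEAT reply.

    Assumes the RFC 2389 layout: a first line starting with "211",
    one feature per line indented by a space, and a closing "211 " line.
    Any other reply code is treated as "FEAT not supported".
    """
    lines = reply.splitlines()
    if not lines or not lines[0].startswith("211"):
        return set(GUESSED_FEATURES)
    features = set()
    for line in lines[1:]:
        if line.startswith(" ") and line.strip():
            # The first word is the feature name; the rest are its parameters.
            features.add(line.split()[0].upper())
    return features

# Hypothetical server reply, for illustration only.
sample = "211-Extensions supported:\n MLST size*;modify*;type*\n SIZE\n MDTM\n211 END"
supported = parse_feat_reply(sample)
use_mlsd = "MLST" in supported   # MLST implies MLSD; MLSD is not listed separately
```

A reply such as "500 'FEAT': command not understood" takes the fallback branch, so the sketch would then assume SIZE and MDTM only.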

Guessing is nice when it works, but sometimes it doesn't, and some FTP servers become confused when you send them a directive they don't understand, or they do something you didn't want, sometimes to the point of closing the connection. For this reason, Kermit lets you override default or negotiated features with the following new commands:

FTP { ENABLE, DISABLE } FEAT
Enables or disables the automatic sending of a FEAT directive upon connection to an FTP server. Note that FTP [ OPEN ] /NOINIT also inhibits sending the FEAT directive (and several others) for the connection being opened, but without necessarily disabling FEAT for subsequent connections in the same Kermit instance. FEAT is ENABLED by default, in which case many FTP servers are likely to reply:

500 'FEAT': command not understood

which is normally harmless (but you never know).

FTP ENABLE { MDTM, MLST, SIZE }
Enables the given directive for implicit use by the FTP GET and MGET commands in case it has been disabled or erroneously omitted by the server in its FEAT response. Note: MLSD can be used in the FTP ENABLE and DISABLE commands as a synonym for MLST.

FTP DISABLE { MDTM, MLST, SIZE }
Disables implicit use of the given directive by GET or MGET in case it causes problems; for example, because it makes multifile downloads take too long or the server announces it erroneously or misimplements it. Use DISABLE FEAT before making a connection to prevent Kermit from sending the FEAT directive as part of its initial sequence. Note that disabling FEAT, SIZE, or MDTM does not prevent you from executing explicit FTP FEATURES, FTP SIZE, or FTP MODTIME commands. Also note that disabling SIZE prevents PUT /RESTART (recovery of interrupted uploads) from working.

To enable or disable more than one feature, use multiple FTP ENABLE or FTP DISABLE commands. The SHOW FTP command shows which features are currently enabled and disabled.

FTP FEATURES
This command sends a FEAT directive to the server. In case you have been disabling and enabling different features, this resynchronizes Kermit's feature list with the server's. If the server does not support the FEAT directive, Kermit's feature list is not changed.

FTP OPTIONS directive
Informational only: the server tells what options, if any, it supports for the given directive, e.g. MLST. Fails if the server does not support the OPTS directive or if the directive for which options are requested is not valid. The directive is case-insensitive.

FTP SIZE filename
Sends a SIZE directive to the server for the given file. The filename must not contain wildcards. The server responds with an error if the file can't be found, is not accessible, or the SIZE directive is not supported; otherwise it responds with the length of the file in bytes, which Kermit displays and also makes available to you in its \v(ftp_message) variable. If the directive is successful, Kermit (re-)enables it for internal use by the GET and MGET commands on this connection.

FTP MODTIME filename
Works just like the FTP SIZE command except that it sends an MDTM directive. Upon success, the server sends a modification date-time string, which Kermit interprets for you and also makes available in its \v(ftp_message) variable.

Whenever a SIZE or MDTM directive is sent implicitly and rejected by the server because it is unknown, Kermit automatically disables it.
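The handling of SIZE and MDTM replies can be sketched in Python as follows. This is only an illustration under stated assumptions: successful replies are taken to use code 213, the MDTM string is taken to be YYYYMMDDHHMMSS in UTC (optional fractional seconds are ignored), and codes 500 and 502 are taken to mean "directive unknown". All function and variable names are hypothetical.

```python
from datetime import datetime, timezone

def parse_213(reply):
    """Return the text after a "213" success code, or None for any other reply."""
    code, _, text = reply.strip().partition(" ")
    return text if code == "213" else None

def parse_size(reply):
    """File length in bytes from a SIZE reply, or None on failure."""
    text = parse_213(reply)
    return int(text) if text is not None else None

def parse_mdtm(reply):
    """Modification time from an MDTM reply, or None on failure."""
    text = parse_213(reply)
    if text is None:
        return None
    # Only the first 14 digits are used; fractional seconds are dropped.
    return datetime.strptime(text[:14], "%Y%m%d%H%M%S").replace(tzinfo=timezone.utc)

# Directives currently enabled for implicit use (hypothetical state).
enabled = {"SIZE", "MDTM"}

def note_implicit_reply(directive, reply):
    """Disable a directive only when the server does not recognize it."""
    if reply[:3] in ("500", "502"):
        enabled.discard(directive)

note_implicit_reply("MDTM", "500 'MDTM': command not understood")  # disables MDTM
note_implicit_reply("SIZE", "550 foo.txt: No such file")           # SIZE stays enabled
```

A 550 reply concerns one particular file, so in this sketch it leaves the directive enabled for the files that follow.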


USING MGET: NLST VERSUS MLSD

When you give an MGET command to an FTP client, it sends a request to the FTP server for a list of files, and then upon successful receipt of the list, goes through it and issues a RETR (retrieve) directive for each file on the list (or possibly only for selected files).

With the new FTP protocol extensions, there are now two ways to get the list of files: the NLST directive, which has been part of the FTP protocol since the beginning, and the new MLSD directive, which is not yet widely implemented. When NLST is used and you give a command like "mget *.txt", the FTP client sends:

NLST *.txt

and the server sends back a list of the files whose names match, e.g.

foo.txt
bar.txt
baz.txt

Then when downloading each file, the client must send SIZE (so it can have a percent-done display) and MDTM (if it wants to set the downloaded file's timestamp to match that of the original), as well as RETR (to retrieve the file).

But when MLSD is used, the client is not supposed to send the filename or wildcard to the server; instead it sends an MLSD directive with no argument (or the name of a directory), and the server sends back a list of all the files in the current or given directory; then the client goes through the list and checks each file to see if it matches the given pattern, the rationale being that the user knows only the local conventions for wildcards and not necessarily the server's conventions. So with NLST the server interprets wildcards; with MLSD the client does.
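Client-side wildcard interpretation, as described above for MLSD, might look like the following Python sketch. It uses Python's fnmatch module (shell-style patterns) purely as a stand-in for the client's own wildcard rules; Kermit's pattern syntax is richer and is not reproduced here. The function name and the file list are illustrative.

```python
from fnmatch import fnmatchcase

def select_locally(names, pattern):
    """Keep only the names from the server's full listing that match the pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]

# Names as they might come back from an argument-less MLSD (illustrative).
listing = ["bin", "bar.txt", "baz.txt", "foo.txt", "foo.zip"]
selected = select_locally(listing, "*.txt")
```

With NLST the same pattern would instead be sent to the server, and the client would simply retrieve whatever names came back.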

The interpretation of NLST wildcards by the server is not necessarily required or even envisioned by the FTP protocol definition (RFC 959), but in practice all clients and servers work this way.

The principal advantage of MLSD is that instead of sending back a simple list of filenames, it sends back a kind of database in which each entry contains a filename together with information about the file: type, size, timestamp, and so on; for example:

size=0;type=dir;perm=el;modify=20020409191530; bin
size=3919312;type=file;perm=r;modify=20000310140400; bar.txt
size=6686176;type=file;perm=r;modify=20001215181000; baz.txt
size=3820092;type=file;perm=r;modify=20000310140300; foo.txt
size=27439;type=file;perm=r;modify=20020923151312; foo.zip
(etc etc...)
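Entries in this format can be split into a filename and a set of facts with a few lines of Python. The sketch below assumes the layout shown in the sample above: semicolon-terminated name=value facts, then a single space, then the filename. The function name is hypothetical.

```python
def parse_mlsd_entry(line):
    """Split one MLSD entry into (filename, facts dictionary).

    Facts precede the first space; everything after that space is the
    filename, which may itself contain spaces.
    """
    facts_part, _, name = line.partition(" ")
    facts = {}
    for fact in facts_part.split(";"):
        if fact:
            key, _, value = fact.partition("=")
            facts[key.lower()] = value   # fact names are treated as case-insensitive
    return name, facts

name, facts = parse_mlsd_entry(
    "size=3919312;type=file;perm=r;modify=20000310140400; bar.txt"
)
is_directory = facts.get("type") == "dir"
size_in_bytes = int(facts["size"])
```

Because size and timestamp arrive with the listing, a client working from such entries has no need to send SIZE and MDTM for each file.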

(If the format of the file list were the only difference between NLST and MLSD, the discussion would be finished: it would always be better to use MLSD when available, and the MGET user interface would need no changes. But there's a lot more to MLSD than the file-list format; read on…)

The client learns whether the server supports MLSD in the FEAT exchange. But the fact that the server supports MLSD doesn't mean the client should always use it. It is better to use MLSD:

But it is better to use NLST:

But when using MLSD there are complications:

To further complicate matters, NLST can (in theory) work just like MLSD: if sent with a blank argument or a directory name, it is supposed to return a complete list of files in the current or given directory, which the client can match locally against some pattern. It is not known whether any FTP server or client does this; nevertheless, it should be possible, since this behavior can be inferred from RFC 959.

In view of these considerations, and given the need to preserve the traditional FTP client command structure and behavior so the software will be usable by most people:

  1. The MGET command should produce the expected result in the common cases, regardless of whether NLST or MLSD is used underneath.

  2. For anomalous cases, the user needs a way to control whether the MGET argument is sent to the server or kept for local use.

  3. At the same time, the user might need a way to send a directory name to the server, independent of any wildcard pattern.

  4. The user needs a way to force NLST or MLSD for a given MGET command.

By default, Kermit's MGET command uses MLSD if MLST is reported by the server in its FEAT list. When MLSD is used, the filespec is sent to the server if it is not wild (according to Kermit's own definition of "wild" since it can't possibly know the server's definition). If the filespec is wild it is held for local use to select files from the list returned by the server. If MLST is not reported by the server or is disabled, Kermit sends the MGET filespec with the NLST directive.

The default behavior can be overridden globally with FTP DISABLE MLST, which forces Kermit to use NLST to get file lists. And then for situations in which MLSD is enabled, the following MGET switches can be used to override the defaults for a specific MGET operation:

/NLST
Forces the client to send NLST. Example:
mget /nlst foo.*

/MLSD
Forces the client to send MLSD (even if MLST is disabled). Example:
mget /mlsd foo.*

/MATCH:pattern
When this switch is given, it forces the client to hold the pattern for local use against the returned file list. If a remote filespec is also given (e.g. the "blah" in "mget /match:*.txt blah"), then it is sent as the NLST or MLSD argument, presumably to specify the directory whose files are to be listed. When the /MATCH switch is not given, the MGET filespec is sent to the server if the directive is NLST or if the filespec is not wild. Examples:

  Command:                   With NLST:     With MLSD:
    mget                      NLST           MLSD
    mget *.txt                NLST *.txt     MLSD        
    mget foo                  NLST foo       MLSD foo
    mget /match:*.txt         NLST           MLSD
    mget /match:*.txt foo     NLST foo       MLSD foo

In other words, the pattern is always interpreted locally unless MGET uses NLST and no /MATCH switch was given.
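The rule summarized in the table can be written as a small decision function. The Python sketch below is an illustration, not Kermit's code: is_wild() is a hypothetical stand-in for Kermit's own definition of "wild", and the function and parameter names are invented for the example.

```python
def is_wild(filespec):
    """Hypothetical wildcard test; Kermit applies its own definition."""
    return any(ch in filespec for ch in "*?[{")

def plan_mget(filespec="", match=None, use_mlsd=False):
    """Return (line sent to the server, pattern held for local matching)."""
    directive = "MLSD" if use_mlsd else "NLST"
    if match is not None:
        # /MATCH given: the pattern stays local, the filespec goes to the server.
        argument, local_pattern = filespec, match
    elif use_mlsd and is_wild(filespec):
        # MLSD with a wild filespec: hold it for local use, send no argument.
        argument, local_pattern = "", filespec
    else:
        # NLST, or a filespec that is not wild: send it to the server.
        argument, local_pattern = filespec, None
    return f"{directive} {argument}".strip(), local_pattern

rows = [
    plan_mget("*.txt"),                            # mget *.txt (NLST)
    plan_mget("*.txt", use_mlsd=True),             # mget *.txt (MLSD)
    plan_mget("foo", match="*.txt", use_mlsd=True) # mget /match:*.txt foo (MLSD)
]
```

The three sample calls correspond to rows of the table above: the first sends the pattern to the server, the second holds it locally, and the third sends "foo" while holding "*.txt".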


EXAMPLES

1. Downloading a Single File

There are no choices here; just use the FTP GET command. Kermit always sends the RETR directive, and possibly SIZE and/or MDTM. The small advantage of using MLST in this case is outweighed by the risk and effort of coding a special case.

2. Downloading a Group of Files from a Single Directory

This case presents tradeoffs, especially on slow connections:

3. Downloading a Directory Tree

MLSD is the only choice for recursive downloads; they rarely, if ever, work with NLST (the few cases where they do work rely on extra-protocol "secret" notations for the NLST argument). No special actions are required to force MLSD when the server supports it, unless you have disabled it. Examples:

MGET /RECURSIVE
This tells the server to send all files and directories in the tree rooted at its current directory.

MGET /RECURSIVE *.txt
This tells the server to send all *.txt files in the tree rooted at its current directory.

MGET /MLSD /RECURSIVE *.txt
Same as the previous example but forces Kermit to send MLSD in case it was disabled, or in case the server is known to support it even though it did not announce it in its FEAT listing.

MGET /RECURSIVE /MATCH:*.zip archives
Tells the server to send all ZIP files in the tree rooted at its "archives" directory.

MGET /RECURSIVE /MATCH:* [abc]
The server is running on VMS and you want it to send all the files in the directory tree rooted at [ABC]. But since "[abc]" looks just like a wildcard, you have to include a /MATCH: switch to force Kermit to send "[abc]" as the MLSD argument.

In all cases in which the /RECURSIVE switch is included, the server's tree is duplicated locally.

Although MLSD allows recursion and NLST does not, the MLSD specification places a heavy burden on the client; the obvious, straightforward, and elegant implementation (depth-first, the one that Kermit currently uses) requires as many open temporary files as the server's directory tree is deep, and therefore client resource exhaustion -- e.g. exceeding the maximum number of open files -- is a danger. Unfortunately MLSD was not designed with recursion in mind. (Breadth-first traversal could be problematic due to lack of sufficient navigation information.)
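The resource cost of depth-first traversal can be seen in a small simulation. In the Python sketch below, an in-memory dictionary stands in for the server's MLSD listings, and each level of recursion keeps its listing "open" while it descends, much as a client would hold one temporary file per directory level. The tree, the names, and the "/" path separator are assumptions made for the example; a real client cannot assume the server's path syntax.

```python
# Hypothetical server tree: directory path -> list of (name, type) entries.
TREE = {
    "": [("bin", "dir"), ("foo.txt", "file")],
    "bin": [("tools", "dir"), ("a.out", "file")],
    "bin/tools": [("x.sh", "file")],
}

def walk(list_dir, path="", depth=1, stats=None):
    """Depth-first traversal yielding file paths.

    stats["max_open"] records the largest number of listings held at
    once, which equals the depth of the deepest directory visited.
    """
    if stats is not None:
        stats["max_open"] = max(stats.get("max_open", 0), depth)
    for name, kind in list_dir(path):   # this listing stays in use while we descend
        child = f"{path}/{name}" if path else name
        if kind == "dir":
            yield from walk(list_dir, child, depth + 1, stats)
        else:
            yield child

stats = {}
files = list(walk(lambda p: TREE[p], stats=stats))
```

For this three-level tree the simulation holds three listings at once; a deeper tree raises the count accordingly, which is the exhaustion risk described above.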

Of course all of Kermit's other MGET switches can be used too, e.g. for finer-grained file selection (by date, size, etc), for moving or renaming files as they arrive, to override Kermit's automatic per-file text/binary mode switching, to pass the incoming files through a filter, to convert text-file character sets, and so on.

4. NLST/MLSD Summary Table

Here's a table summarizing MGET behavior when the server supports both NLST and MLSD. /NLST and /MLSD switches are included for clarity to indicate which protocol is being used, and the expected effects. In practice you can omit the /NLST and /MLSD switches and the Kermit client chooses the appropriate or desired protocol as described above. Sample commands presume a Unix file system on the server, but of course the server can have any file system or syntax at all.

User's Command:  mget /nlst
FTP Sends:       NLST
Remarks:         Gets a list of all the files in the server's current directory and downloads each file. The list includes names only, so Kermit also must send SIZE and MDTM directives if size and timestamp information is required (this is always true of NLST). Sending NLST without an argument is allowed by the RFC 959 NLST definition and by the Kermit FTP client, but might not work with other clients, and also might not work with every server.

User's Command:  mget /nlst foo
FTP Sends:       NLST foo
Remarks:         If "foo" is a directory, this gets a list of all the files from the server's "foo" directory and downloads each file; otherwise this downloads the file named "foo" (if any) from the server's current directory.

User's Command:  mget /nlst *.txt
FTP Sends:       NLST *.txt
Remarks:         Gets a list of the files in the server's current directory whose names match the pattern *.txt, and then downloads each file from the list. Because we are using NLST, we send the filespec (*.txt) to the server and the server interprets any wildcards.

User's Command:  mget /nlst foo/*.txt
FTP Sends:       NLST foo/*.txt
Remarks:         Gets a list of the files in the server's "foo" directory whose names match the pattern *.txt, and then downloads each file from the list (server interprets wildcards).

User's Command:  mget /nlst /match:*.txt
FTP Sends:       NLST
Remarks:         Gets a list of all the files in the server's current directory and then downloads each one whose name matches the pattern *.txt (client interprets wildcards).

User's Command:  mget /nlst /match:*.txt foo
FTP Sends:       NLST foo
Remarks:         Gets a list of all the files in the server's "foo" directory and then downloads each one whose name matches the pattern *.txt (client interprets wildcards).

User's Command:  mget /mlsd
FTP Sends:       MLSD
Remarks:         Gets a list of all the files from the server's current directory and then downloads each one. The list might include size and timestamp information, in which case Kermit does not need to send SIZE and MDTM directives for each file (this is always true of MLSD).

User's Command:  mget /mlsd foo
FTP Sends:       MLSD foo
Remarks:         Gets a list of all the files from the server's "foo" directory (where the string "foo" does not contain wildcards) and then downloads each one. If "foo" is a regular file and not a directory, this command is supposed to fail, but some servers have been observed that send the file.

User's Command:  mget /mlsd *.txt
FTP Sends:       MLSD
Remarks:         Gets a list of all the files from the server's current directory and then downloads only the ones whose names match the pattern "*.txt". Because we are using MLSD and the MGET filespec is wild, we do not send the filespec to the server, but treat it as though it had been given in a /MATCH: switch and use it locally to match the names in the list.

User's Command:  mget /mlsd foo/*.txt
FTP Sends:       MLSD
Remarks:         This one won't work because MLSD requires that the notions of server directory and filename-matching pattern be separated. However, the client, which can't be expected to know the server's file-system syntax, winds up sending a request that the server will (or should) reject.

User's Command:  mget /mlsd /match:*.txt
FTP Sends:       MLSD
Remarks:         Gets a list of all the files from the server's current directory and then downloads only the ones whose names match the pattern "*.txt" (client interprets wildcards).

User's Command:  mget /mlsd /match:*.txt foo
FTP Sends:       MLSD foo
Remarks:         If "foo" is a directory on the server, this gets a list of all the files from the server's "foo" directory and then downloads only the ones whose names match the pattern "*.txt" (client interprets wildcards). This leaves the server CD'd to the "foo" directory; there's no way the client can restore the server's original directory because MLSD doesn't give that information, and since the client cannot be expected to know the server's file-system syntax, it would not be safe to guess. If "foo" is a regular file, MLSD fails.

User's Command:  mget /mlsd foo bar
FTP Sends:       MLSD
Remarks:         This one is problematic. You're supposed to be able to give MGET a list of filespecs; in this case we name two directories. The client must change the server's directory to "foo" to get the list of files, and then the files themselves. But then it has no way to return to the server's previous directory in order to do the same for "bar", as explained in the previous example.

User's Command:  mget /mlsd /match:* [abc]
FTP Sends:       MLSD [abc]
Remarks:         Including a /MATCH: switch forces [abc] to be sent to the server even though the client would normally think it was a wildcard and hold it for local interpretation. In this example, [abc] might be a VMS directory name.

User's Command:  mget /mlsd /match:* t*.h
FTP Sends:       MLSD t*.h
Remarks:         Contrary to the MLSD specification, all MLSD-capable FTP servers I've encountered so far do interpret wildcards. This form of the MGET command can be used to force a wildcard to be sent to the server for interpretation.

When MLSD is used implicitly (that is, without an /MLSD switch given to force the use of MLSD) and an MGET command such as "mget foo/*.txt" fails, Kermit automatically falls back to NLST and tries again.
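This fallback can be sketched in Python as follows. The sketch is illustrative only: send_listing is a hypothetical callable standing for "send the directive and return the list of names", ListingError is an invented exception for a rejected request, and fake_server merely simulates a server that rejects an MLSD request containing a wildcard.

```python
class ListingError(Exception):
    """Hypothetical error raised when the server rejects a listing request."""

def list_with_fallback(send_listing, filespec, mlsd_forced=False):
    """Try MLSD first; retry with NLST unless MLSD was explicitly forced."""
    try:
        return "MLSD", send_listing("MLSD", filespec)
    except ListingError:
        if mlsd_forced:          # an /MLSD switch was given: do not fall back
            raise
        return "NLST", send_listing("NLST", filespec)

def fake_server(directive, filespec):
    """Simulated server that refuses MLSD when the argument is wild."""
    if directive == "MLSD" and "*" in filespec:
        raise ListingError("request rejected")
    return ["foo/a.txt", "foo/b.txt"]

used, names = list_with_fallback(fake_server, "foo/*.txt")
```

In the sample call the MLSD attempt is refused, so the listing is obtained with NLST instead; with mlsd_forced=True the error would be passed on to the caller.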


OTHER KERMIT FTP CLIENT CHANGES


REFERENCES

  1. Postel, J., and J. Reynolds, File Transfer Protocol (FTP), RFC 959, October 1985: ftp://ftp.isi.edu/in-notes/rfc959.txt.

  2. Hethmon, P., and R. Elz, Feature negotiation mechanism for the File Transfer Protocol, RFC 2389, August 1998: ftp://ftp.isi.edu/in-notes/rfc2389.txt.

  3. Elz, R., and P. Hethmon, Extensions to FTP, Internet Draft draft-ietf-ftpext-mlst-16.txt, September 2002: http://www.ietf.org/internet-drafts/draft-ietf-ftpext-mlst-16.txt.

  4. The Kermit FTP Client (overview).

  5. The Kermit FTP Client (documentation).



New FTP Features / The Kermit Project / Columbia University / [email protected] / Sep-Oct 2002