SCOUG-Programming: GetLine()

The Southern California OS/2 User Group
USA

SCOUG-Programming Mailing List Archives

Return to [ 02 | June | 1998 ]


Date: Tue, 2 Jun 1998 20:45:18 PST8PDT

From: Tom Emerson <starman@inetworld.net>

Reply-To: scoug-programming@scoug.com

To: scoug-programming@scoug.com

Subject: SCOUG-Programming: GetLine()

Content Type: text/plain

On Tue, 2 Jun 1998, Peter Skye wrote:

> Emerson, Tom # GPS-MDI wrote:
>
> A lot. But it summarizes to "the heck with what's in the buffer, what
> does getline() give us?"

Uh, yeah, I do get that way sometimes :)

>
> And what if somebody else is writing an FTP client, and they send you BS
> (backspace) characters? Should we handle them in getstring(), or just
> blow out the command as containing an invalid character, or what?

Actually, the FTP client should NOT send backspaces -- in fact, the
"client" should not be sending "character" oriented data in the first
place, i.e., the client shouldn't send each and every keystroke of the
user as he or she bangs on the keyboard... (A TELNET client, OTOH, is
quite different; in that case one WOULD expect characters to be
transmitted as they occur at the keyboard.) A "well behaved" FTP client
should prompt the user for a command (or, in the case of a GUI, interpret
mouse clicks on objects in an "appropriate" manner) and transmit this
elusive LINE of data we have been talking about. [BTW: this also
invalidates the scenario of the "slow" keyboardist (?) described a few
messages ago -- the "buffer" wouldn't be sent until the end-user pressed
Return after the phone call...]

>
> Tom says, quite correctly, that we need a definition. We can each write
> our own version of getline() (the code is quite simple), but we should
> at least all agree on the options and then implement just what we
> individually want.
>
> I think you should at least allow (in the options) for multiple lines
> and implementation of BS (backspace). The delimiter string can be any
> string of one or more of {0Dh, 0Ah, 0Ch}, and any other control
> characters (which includes 7Fh) or characters >= 80h are simply
> discarded. This does mean a character-by-character examination of the
> buffer data.
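Peter's proposed rules can be sketched as a single pass over the
received bytes. This is a minimal sketch, assuming the data is already
in memory; `filter_line()` and its signature are hypothetical, not
something from the thread. BS (08h) removes the previously kept
character, any run of 0Dh/0Ah/0Ch ends the line, and other control
characters, 7Fh and bytes >= 80h are dropped.

```c
#include <stddef.h>

static int is_delim(unsigned char c)
{
    return c == 0x0D || c == 0x0A || c == 0x0C;   /* CR, LF, FF */
}

/* Hypothetical helper illustrating the proposed rules.
 * Scans in[0..in_len), writes the cleaned line into out (out_cap >= 1,
 * room for the terminating NUL included) and returns the number of input
 * bytes consumed, including the whole delimiter run. Returns 0 when no
 * delimiter was found yet, i.e. the line is still incomplete. */
size_t filter_line(const char *in, size_t in_len, char *out, size_t out_cap)
{
    size_t i, kept = 0;

    for (i = 0; i < in_len; i++) {
        unsigned char c = (unsigned char)in[i];

        if (is_delim(c)) {
            /* Swallow the entire run of delimiters, e.g. CR LF. */
            while (i < in_len && is_delim((unsigned char)in[i]))
                i++;
            out[kept] = '\0';
            return i;
        }
        if (c == 0x08) {
            if (kept > 0)
                kept--;              /* BS: drop the last kept character */
        } else if (c < 0x20 || c >= 0x7F) {
            /* other control characters, DEL (7Fh) and >= 80h: discarded */
        } else if (kept + 1 < out_cap) {
            out[kept++] = (char)c;   /* printable; truncated if out is full */
        }
    }
    return 0;
}
```

Note that treating any run of delimiters as one terminator also
collapses empty lines, which is what "one or more of {0Dh, 0Ah, 0Ch}"
implies.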

This is where the RFC comes into play -- in the case of the various
protocols (FTP, NNTP, POP(2/3) etc.), each has a list of valid "commands"
as well as valid charactersets for those commands. Take, for instance,
the old Apple ][ computer -- since a backspace is a legal character in a
filename (!), one must somehow logically allow for that character to be
sent as part of, say, a filename in a get/put command [historical note:
the only way one got backspaces, carriage returns, and other "unprintable"
characters into filenames was to assemble the "filename" as a string
variable and open the file from a system call. This was an early form of
anti-piracy (if you couldn't type the file name, you couldn't copy it --
at least, not until someone came up with a character-based GUI...) as well
as "internal setting protection" (same reason -- if you couldn't "type"
the filename, you couldn't view it with an editor...)]

Technically, Unix and its variants use a "backslash" escape character
to indicate "special" characters in filenames. For instance, the "long
file name" "this is a long filename.txt" on a Windows or OS/2 filesystem
could, and often will, be sent via the get/put command as
"this\ is\ a\ long\ filename.txt" [or, alternatively,
"this\040is\040a\040long\040filename.txt"]. The convention is that the
character following the "escape" character is taken verbatim UNLESS it
is numeric, in which case the following THREE digits are treated as an
OCTAL number for the character in question [again, more "modern"
implementations may consider the "digits" to be hexadecimal]. Either
way, all this is in the realm of the command parser and not
(necessarily) the "line grabber" function.
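A parser-side decoder for the escaping convention just described might
look like this minimal sketch. It assumes the three-digit octal form only
(no hexadecimal variant), and `unescape_name()` is a hypothetical name.

```c
#include <stddef.h>

static int is_octal(char c)
{
    return c >= '0' && c <= '7';
}

/* Hypothetical decoder: a backslash followed by a digit must introduce
 * exactly three octal digits (\000 to \377); a backslash followed by
 * anything else yields that character verbatim. out needs room for
 * strlen(in) + 1 bytes. Returns the decoded length (an escaped \000
 * counts as one byte), or -1 if an escape is malformed. */
long unescape_name(const char *in, char *out)
{
    size_t o = 0;

    while (*in != '\0') {
        if (*in != '\\') {
            out[o++] = *in++;             /* ordinary character */
            continue;
        }
        in++;                             /* skip the backslash */
        if (*in >= '0' && *in <= '9') {
            /* Numeric escape: three octal digits, value at most 0377. */
            if (in[0] > '3' || !is_octal(in[1]) || !is_octal(in[2]))
                return -1;
            out[o++] = (char)(((in[0] - '0') << 6) |
                              ((in[1] - '0') << 3) |
                               (in[2] - '0'));
            in += 3;
        } else if (*in != '\0') {
            out[o++] = *in++;             /* escaped character, verbatim */
        } else {
            return -1;                    /* backslash at end of string */
        }
    }
    out[o] = '\0';
    return (long)o;
}
```

Both "this\ is\ a\ long\ filename.txt" and
"this\040is\040a\040long\040filename.txt" decode to the same 27-byte
name, which is why this belongs in the command parser: the line grabber
never needs to know about it.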

But I've digressed -- as I was starting to say, the RFCs in question
SHOULD indicate which characters are "valid" in a command string -- if a
protocol DOESN'T accept non-printing ASCII characters, then their
presence should trigger an error (though the error might not be
triggered until some later time, such as when the "LINE" is parsed).
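Deferring the check to parse time could be as simple as the following
sketch. It assumes printable US-ASCII (20h to 7Eh) is the permitted set;
the real set should come from the relevant RFC, and
`first_invalid_char()` is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical parse-time check on a line the line grabber has already
 * returned unchanged. Returns the offset of the first byte outside
 * printable US-ASCII (20h..7Eh), or -1 if every byte is acceptable. */
long first_invalid_char(const char *line, size_t len)
{
    size_t i;

    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)line[i];
        if (c < 0x20 || c > 0x7E)
            return (long)i;
    }
    return -1;
}
```

On failure an FTP server could, for example, answer with reply code 501
("Syntax error in parameters or arguments") as defined in RFC 959.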

Stating the functionality of getline() without using (much) "programmer"
language, we have a couple of options:

1) [ultra generic] Reads data from some device until a terminator is
found in the data or an error occurs. If no error occurred, the data
[including/excluding] the terminating character is returned and a global
indicator is set to "successful". If an error occurred, the global
indicator is set to "unsuccessful" and no actual data is returned.

2) [just an alternative to the above] Accepts a location to return the
data and a maximum amount of data to return. Reads data from some device
until a terminator is found in the data or an error occurs. If an error
occurs, the return value is set to indicate [an/the] error. If no error
occurs, then the data read is loaded at the location given to the
routine, up to the maximum specified, and the return value is set to
indicate success.

3) [more specific] Reads data FROM A SOCKET until a terminator is found
in the data or an error occurs. If an error occurs, the return value is
set to indicate this and the routine exits. If no error occurs, data is
copied to a location specified as input to the routine [up to some
(specified) maximum] and the return value is set to indicate success.
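Option 3 might look like the following minimal sketch, assuming
BSD/POSIX sockets and LF (optionally preceded by CR) as the terminator.
`get_line()` and the `GL_*` status names are hypothetical. The status,
the length (`*len`) and the data each get their own channel.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical status codes; the names are illustrative only. */
enum getline_status {
    GL_OK,          /* a complete line is in buf */
    GL_TOO_LONG,    /* buf filled up before a terminator arrived */
    GL_CLOSED,      /* peer closed the connection first */
    GL_ERROR        /* recv() failed; errno says why */
};

/* Reads from sock until LF, storing at most max - 1 bytes plus a NUL in
 * buf and the line length in *len. A CR just before the LF is dropped,
 * so both CR LF and bare LF end a line. */
enum getline_status get_line(int sock, char *buf, size_t max, size_t *len)
{
    size_t n = 0;
    enum getline_status st = GL_TOO_LONG;

    *len = 0;
    if (max == 0)
        return GL_TOO_LONG;

    while (n < max - 1) {
        char c;
        ssize_t r = recv(sock, &c, 1, 0);   /* one byte per call */

        if (r < 0 && errno == EINTR)
            continue;                       /* interrupted by a signal */
        if (r < 0) {
            st = GL_ERROR;
            break;
        }
        if (r == 0) {
            st = GL_CLOSED;
            break;
        }
        if (c == '\n') {
            if (n > 0 && buf[n - 1] == '\r')
                n--;                        /* strip CR of CR LF */
            st = GL_OK;
            break;
        }
        buf[n++] = c;
    }
    buf[n] = '\0';                          /* always a valid C string */
    *len = n;
    return st;
}
```

Reading one byte per recv() call keeps the sketch short and never
consumes bytes that belong to the next line, at the cost of one system
call per byte.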

You can see where this is heading -- we essentially have three things we
need to "return" to a calling process:

1) an indication of success or failure
2) the amount of data read from the socket
3) the data itself.

If it is a "well known fact" that the data CANNOT contain certain
characters [namely, NULs or the LINE TERMINATOR itself], then item 2 can
be eliminated and its value inferred from the data itself -- the data
will be terminated either by a NUL (a classic "C" string) or by the
terminator found in the input stream. Also, as is the case with many
UNIX-style "system" calls, the success or failure of a routine is often
gleaned from some (HIGHLY volatile) GLOBAL variable, such as errno.
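Whether the explicit length can really be dropped depends on that "no
NULs" assumption, because strlen() stops at the first NUL byte. The
helpers below are hypothetical and only illustrate the difference.

```c
#include <stddef.h>
#include <string.h>

/* Copies n received bytes into dst (room for n + 1 bytes required),
 * NUL-terminates it and returns the explicit count n. */
size_t copy_received(char *dst, const char *src, size_t n)
{
    memcpy(dst, src, n);
    dst[n] = '\0';
    return n;
}

/* Nonzero when strlen() would recover the true length, i.e. the n bytes
 * contain no embedded NUL. */
int length_is_inferable(const char *data, size_t n)
{
    return memchr(data, '\0', n) == NULL;
}
```

For "USER tom" the explicit count and strlen() agree (8); for the five
bytes "AB", NUL, "CD" the explicit count is 5 while strlen() reports 2.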

Be wary of "co-mingling" items 1 and 2 by returning either a
[presumed positive] number of bytes or an "error". If you aren't careful
in selecting the "error" value, you can run into trouble -- for instance,
if the "error" value is defined to be -1, what happens if you
(legitimately) want to read 65535 bytes of data [such as from the DATA
port]? Right now, the routine WON'T handle this, as the internal buffer
isn't big enough to accept that many characters, but say "somewhere down
the line" [i.e., 5 years from now, when everyone would sneer at you for a
packet size less than a megabyte :) ] this buffer gets expanded so that
it COULD return 65535 bytes -- what then? [Of course, this assumes we're
dealing with a 16-bit return value, not a 32-bit one.] An even WORSE
scenario: by returning a "-1" value, you imply that the value returned IS
SIGNED, but what if someone, in some other part of the program, assigns
the result of this call to an UNSIGNED variable? In the case of an error,
the "calling" program would MISTAKENLY presume that it had REALLY
received 65535 bytes [or worse, 2 or 4 gig -- I can never remember what
2^32 is... :)] and MIGHT actually try to access this "returned" data...
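The signed/unsigned trap, and one way around it, can be shown in a few
lines. Both routines are hypothetical stand-ins; the `fail` argument
merely simulates an error.

```c
#include <stdint.h>

/* Co-mingled design: a 16-bit byte count or -1 for "error". */
int16_t read_bad(int fail)
{
    return fail ? -1 : 42;
}

/* Separated design: the return value is only a status, and the count
 * travels through *count, so no count value is reserved as a marker. */
int read_good(int fail, uint16_t *count)
{
    if (fail)
        return -1;          /* *count is left untouched on error */
    *count = 42;
    return 0;
}
```

Assigning read_bad()'s error result to a `uint16_t` silently yields
65535, and the same conversion at 32 bits yields 4294967295 (2^32 - 1).
With read_good() a caller can only misread the count by ignoring the
status.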

>>ACK<< I've done it again! :) [maybe I should write one of those
"unleashed" books... (or maybe I've already READ too many of them!)]

OK -- one last item before I go [promise!]

I took a look back at the ORIGINAL definition given by Rollin:

"The purpose of GetLine() is to retrieve one line of
text from the remote session. Its prototype is:

char *GetLine (int Socket)"

(the code snippet that follows implies that GetLine() should return NULL
when the recv() call FAILS). He goes on to ask "what errors do you see?"

From this minimal requirement and the mountain of input I've provided thus
far, here are the "errors" I detect in the statement of purpose:

1) no definition of what a "LINE" is [but inferred as "we're all
programmers..."]

2) weak definition of what to do in the case of an error

2a) What about timeouts? Are they an "error" or just an
"inconvenience"? (They're actually both...) This is essentially a case
where recv() comes back without any data, but the actual cause of such a
condition isn't serious enough to warrant returning a NULL (at least,
not the first time a timeout occurs). On the third, 10th, or some other
configurable count, then YES, timeouts should cause an error condition to
be returned to the caller.

3) A presumption that the "magic number" of 4096 bytes is sufficient to
handle all input from the client -- if one ever reads the "CERT" notices
of how hackers are getting into systems, they almost read like
boilerplate: "due to insufficient bounds checking, a carefully constructed
input line will overwrite memory and allow the execution of arbitrary
commands..." What happens if byte 4096 of the actual input buffer ISN'T a
terminator? Later, when "parsing" the line returned, the parsing routine
[may] blithely continue along executing commands until it "falls off" the
end of the buffer, at which point it may CONTINUE trying to execute "the
following memory locations" as if they were FTP commands...

4) What happens when MORE than one "LINE" of data arrives [based upon
the definition of a "LINE" being an arbitrary number of characters,
including zero, followed by a defined terminator]? If the routine that
calls this one simply parses off the first "LINE" and then calls this
routine for another "LINE" of data, then the second call's recv() will
overwrite the data left over from the previous call, which contained a
second (and perhaps partial) "LINE".
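Items 2a, 3 and 4 can be handled together by keeping per-connection
state between calls. This is a minimal sketch, assuming POSIX sockets and
select(); `struct line_reader`, `reader_get_line()` and the `RL_*` codes
are hypothetical. Under POSIX a recv() return of 0 means the peer closed
the connection; a timeout shows up as select() returning 0 (or, with
SO_RCVTIMEO set, as recv() failing with EAGAIN/EWOULDBLOCK).

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

enum rl_status { RL_OK, RL_TIMEOUT, RL_CLOSED, RL_TOO_LONG, RL_ERROR };

/* Per-connection state: bytes after the current line survive between
 * calls instead of being overwritten (item 4). */
struct line_reader {
    int sock;
    long timeout_sec;       /* length of one wait */
    int max_timeouts;       /* consecutive timeouts tolerated (item 2a) */
    size_t used;            /* bytes currently held in buf */
    char buf[4096];
};

static enum rl_status wait_readable(const struct line_reader *r)
{
    int timeouts = 0;

    while (timeouts < r->max_timeouts) {
        fd_set rd;
        struct timeval tv;
        int n;

        FD_ZERO(&rd);
        FD_SET(r->sock, &rd);
        tv.tv_sec = r->timeout_sec;   /* reset: select() may modify tv */
        tv.tv_usec = 0;
        n = select(r->sock + 1, &rd, NULL, NULL, &tv);
        if (n > 0)
            return RL_OK;
        if (n == 0)
            timeouts++;               /* an "inconvenience", for now */
        else if (errno != EINTR)
            return RL_ERROR;
    }
    return RL_TIMEOUT;                /* limit reached: now an error */
}

/* Copies the next line, minus CR LF or LF, into out as a C string and
 * stores its length in *out_len. A line too long for out is discarded. */
enum rl_status reader_get_line(struct line_reader *r, char *out,
                               size_t out_cap, size_t *out_len)
{
    *out_len = 0;
    for (;;) {
        char *lf = memchr(r->buf, '\n', r->used);
        enum rl_status st;
        ssize_t n;

        if (lf != NULL) {
            size_t consumed = (size_t)(lf - r->buf) + 1;  /* through LF */
            size_t len = consumed - 1;

            if (len > 0 && r->buf[len - 1] == '\r')
                len--;
            st = len < out_cap ? RL_OK : RL_TOO_LONG;
            if (st == RL_OK) {
                memcpy(out, r->buf, len);
                out[len] = '\0';
                *out_len = len;
            }
            /* Slide the next (possibly partial) line to the front. */
            memmove(r->buf, r->buf + consumed, r->used - consumed);
            r->used -= consumed;
            return st;
        }
        if (r->used == sizeof r->buf)
            return RL_TOO_LONG;       /* 4096 bytes, no LF (item 3) */
        st = wait_readable(r);
        if (st != RL_OK)
            return st;
        n = recv(r->sock, r->buf + r->used, sizeof r->buf - r->used, 0);
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0)
            return RL_ERROR;
        if (n == 0)
            return RL_CLOSED;
        r->used += (size_t)n;
    }
}
```

The caller sets sock, timeout_sec, max_timeouts and used = 0 once per
connection and then calls reader_get_line() repeatedly. The 4096-byte
size is kept from the original design, but a full buffer with no
terminator is reported as RL_TOO_LONG instead of being handed to the
parser.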

OK -- NOW I'm done ;)

Tom Emerson

=====================================================

To unsubscribe from this list, send an email message
to "steward@scoug.com". In the body of the message,
put the command "unsubscribe scoug-programming".

For problems, contact the list owner at
"rollin@scoug.com".

=====================================================


The Southern California OS/2 User Group
P.O. Box 26904
Santa Ana, CA 92799-6904, USA