SCOUG-SundialSIG Mailing List Archives
Return to [ 01 | June | 2001 ]
 
 
 
Peter Skye wrote:  
 
"Q: What is the "content" and the "context" for the following
three data records?
 
  "The quick brown fox jumped over the lazy dog."  
  "It's crackers to slip a rozzer the dropsy in snide."  
  "Baile, baile la bamba, yo no necesito un poco de gracias."  
 
As Humphrey Bogart would say, "Here's looking at you". The fact
that you included each line of text within quotes indicates that
it is a single attribute (of text type), thus a single field (in a
record), a single column (in a row). Frankly, that would be true
even without the quotes if these existed as separate lines of
text. If they did not, if they constituted a paragraph say, then
their entirety is treated as a single unit, a single attribute. If
they were a section, a chapter, a book, or a library, stored in
whatever manner as a single whole, they would still represent only
a single attribute (field, column, etc.).
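To make the single-attribute point concrete, here is a minimal Python sketch using the standard-library sqlite3 module with an in-memory database. The table and column names (record, paragraph, body) are hypothetical and chosen only for illustration; the sketch shows each quoted line landing in one TEXT column of one row, and the same lines joined into a paragraph still occupying a single column of a single row.

```python
import sqlite3

# The three quoted lines from Peter's question, each treated as one value.
records = [
    "The quick brown fox jumped over the lazy dog.",
    "It's crackers to slip a rozzer the dropsy in snide.",
    "Baile, baile la bamba, yo no necesito un poco de gracias.",
]

conn = sqlite3.connect(":memory:")

# Separate lines: one row per line, one TEXT column (a single attribute) per row.
# Table and column names are hypothetical.
conn.execute("CREATE TABLE record (body TEXT)")
conn.executemany("INSERT INTO record (body) VALUES (?)", [(line,) for line in records])

# The same text as one paragraph: still one row, still one attribute.
conn.execute("CREATE TABLE paragraph (body TEXT)")
conn.execute("INSERT INTO paragraph (body) VALUES (?)", (" ".join(records),))
conn.commit()

line_rows = conn.execute("SELECT body FROM record ORDER BY rowid").fetchall()
para_rows = conn.execute("SELECT body FROM paragraph").fetchall()

for (body,) in line_rows:
    print(body)
print(len(para_rows), "paragraph row")
```

However much text goes into `body`, the schema sees exactly one attribute per row.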
 
So if you have an input "stream" of data, you have to make some
serious decisions when it comes to storing it. If you preprocess
it prior to storage, i.e. decompose it and selectively extract
textual components of your own definition, then I guarantee you
that whatever unstructured nature they may have had on input has
been lost when you "tuck" them into your database or spreadsheet.
We impose structure on data, even on unstructured data.
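A small Python sketch of that loss, assuming a hypothetical preprocessing step that splits a line on whitespace and stores (line number, position, word) rows. The sample is the first of Peter's records with an extra space and a tab added purely for illustration; once the line is decomposed into rows, its original spacing cannot be recovered from them.

```python
from typing import List, Tuple

def decompose(line_no: int, text: str) -> List[Tuple[int, int, str]]:
    """Split one line into (line_no, position, word) rows: a structure we chose."""
    return [(line_no, pos, word) for pos, word in enumerate(text.split())]

# Hypothetical input: irregular spacing (double space, tab) added for illustration.
original = "The  quick brown fox\tjumped over the lazy dog."

word_rows = decompose(1, original)

# Reassembling from the stored rows yields normalized spacing, not the input.
rebuilt = " ".join(word for _, _, word in word_rows)

print(word_rows[:3])        # [(1, 0, 'The'), (1, 1, 'quick'), (1, 2, 'brown')]
print(rebuilt == original)  # False: the double space and the tab are gone
```

The rows are perfectly usable, but they reflect the structure the preprocessing imposed, not whatever form the input originally had.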
 
"It's easier to teach the macro/program how to process the   
different "types" of data files we receive than it is to create   
some "master plan" (a data format) into which all data files must   
fit."  
 
Tsk, tsk, Master Peter, read what you have written. Your only
saving grace here is the word "all", which implies that you have
multiple "master plans", i.e. data formats, into which you fit
"all" your data (at least the extracted portions). You have to,
because no system that you use, and no application written by
you or by others, allows any method other than a structuring
one.
 
If multiple formats are more convenient and easier to use (and
implement), then they just save you from the embarrassment of
having to constantly change the "master file" due to the dynamics
of your data environment. I guarantee that you will use some
translation process, if only in your head, that allows you to
deal with multiple formats at a time. Otherwise the data, in
whatever format, is useless if it cannot serve your purpose.
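The "translation process" can be sketched as one parser per source, each mapping its own layout into a single shared record type. Everything in this Python sketch is hypothetical: the `Quote` record, the three source layouts, and the sample symbols and prices are invented for illustration and do not describe any real feed.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Quote:
    """Hypothetical common format that every source is translated into."""
    symbol: str
    price: float

def parse_csv(line: str) -> Quote:
    # Hypothetical source A: "SYMBOL,PRICE"
    symbol, price = line.split(",")
    return Quote(symbol.strip(), float(price))

def parse_fixed_width(line: str) -> Quote:
    # Hypothetical source B: symbol in the first 6 characters, price after them
    return Quote(line[:6].strip(), float(line[6:]))

def parse_key_value(line: str) -> Quote:
    # Hypothetical source C: "sym=SYMBOL;px=PRICE"
    fields = dict(part.split("=", 1) for part in line.split(";"))
    return Quote(fields["sym"], float(fields["px"]))

# One entry per input "type"; adding a source means adding a parser,
# not redesigning the common Quote format.
PARSERS: Dict[str, Callable[[str], Quote]] = {
    "csv": parse_csv,
    "fixed": parse_fixed_width,
    "kv": parse_key_value,
}

def translate(source_type: str, line: str) -> Quote:
    """Dispatch to the parser for this source, yielding the common format."""
    return PARSERS[source_type](line)

# Invented sample inputs, one per hypothetical source.
inputs = [("csv", "IBM,112.50"), ("fixed", "MSFT  70.25"), ("kv", "sym=INTC;px=29.80")]
quotes = [translate(kind, line) for kind, line in inputs]
for q in quotes:
    print(q)
```

Each parser is "taught" one file type, yet all of them still pour their output into one imposed structure, which is the point being made about multiple formats.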
 
I would think that in investment data capture, when dealing with
multiple "fixed" sources and live feeds, you do as much as
possible to extract only those portions of text data that suit
your needs, and to organize those extractions (impose a format) in
a way that renders them more suitable to your use. Perhaps I err.
Maybe you do not follow the Principle of Least Effort.
 
=====================================================  
 
To unsubscribe from this list, send an email message  
to "steward@scoug.com". In the body of the message,  
put the command "unsubscribe scoug-sundialsig".  
 
For problems, contact the list owner at  
"rollin@scoug.com".  
 
=====================================================  
 
  
  
  
The Southern California OS/2 User Group
 P.O. Box 26904
 Santa Ana, CA  92799-6904, USA
Copyright 2001 the Southern California OS/2 User Group. ALL RIGHTS RESERVED.
 
SCOUG, Warp Expo West, and Warpfest are trademarks of the Southern California OS/2 User Group.
OS/2, Workplace Shell, and IBM are registered trademarks of International Business Machines Corporation.
All other trademarks remain the property of their respective owners.
 