Rich.Uchytil.com

Fix the case of words. This is the article I wrote for Progressions.

Fixing the Case in Words

Recently we were migrating a bunch of data from one Progress database to another. One of my projects was to fix the case in employee names, customers, vendors, and users so they followed the format of capital first letter, lowercase for the rest of the word: FIRST INTERSTATE became First Interstate. At first this seemed really easy: take whatever text is in the field, loop through the words using NUM-ENTRIES and ENTRY with a space as the delimiter, then use SUBSTRING on each word, applying CAPS to the first letter and LC to the rest.
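Here is a rough sketch of that first-pass idea. It only illustrates the naive approach, not the final program, and the variable names (vName, vWord, vResult) are made up for the example.

```progress
/* Naive first pass: split on spaces, capitalize the first letter of each
   word and lowercase the rest. Variable names are illustrative only. */
DEF VAR vName   AS CHAR NO-UNDO INIT "FIRST INTERSTATE".
DEF VAR vWord   AS CHAR NO-UNDO.
DEF VAR vResult AS CHAR NO-UNDO.
DEF VAR i       AS INT  NO-UNDO.

DO i = 1 TO NUM-ENTRIES(vName," "):
    vWord   = ENTRY(i,vName," ").
    vResult = vResult
            + (IF i > 1 THEN " " ELSE "")
            + CAPS(SUBSTR(vWord,1,1))
            + LC(SUBSTR(vWord,2)).
END.

MESSAGE vResult VIEW-AS ALERT-BOX.   /* First Interstate */
```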

But as usual, issues came up. Some names had a slash, hyphen, or period separating the parts instead of a space. FIRST-INTERSTATE couldn't become First-interstate; it had to be First-Interstate. OK, so I had to account for that. Other words always HAD to be uppercase, and some aren't words at all, just initials like B.S.F., which also need to stay in uppercase.
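To see the problem concretely, here is a small sketch that applies the naive rule to two of those names. NaiveCase is a made-up helper for this illustration, not part of the final program.

```progress
/* Hypothetical helper: capital first letter, lowercase the rest. */
FUNCTION NaiveCase RETURNS CHARACTER (INPUT pWord AS CHARACTER):
    RETURN CAPS(SUBSTR(pWord,1,1)) + LC(SUBSTR(pWord,2)).
END FUNCTION.

MESSAGE NaiveCase("FIRST-INTERSTATE") SKIP   /* First-interstate (wrong) */
        NaiveCase("B.S.F.")                  /* B.s.f. (wrong) */
    VIEW-AS ALERT-BOX.
```

Both results are wrong because the rule only ever looks at the first character of the whole space-delimited word.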

This actually was a fun little project and I was quite pleased with my resulting program. It's not 100% perfect. You might have a name like CMI Inc. where you want the CMI to be uppercase. The problem is, how do you know that this type of name is supposed to be CMI and not Cmi? I haven't figured that part out yet. What I came up with gets me about 98% of the way there.

So here's my code. I do have some stuff hard-coded in here (like the states and the words that MUST be in uppercase). This was just a quick and dirty utility needed for a one-time conversion. If I were going to use this in my application, I would put these items in a table somewhere - I hate hard-coding. It's currently written as a standalone program you would call, but you could easily make it an internal procedure or a super procedure. Feel free to play with it. If you do find a way to improve it, I would appreciate it if you shared it with me. Thanks much and enjoy!
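As a rough idea of what moving the hard-coded lists into a table could look like, here is a sketch using a temp-table. The table ttCaseRule, its fields, and the RuleFor function are all hypothetical names invented for this example; a real application would presumably use a database table instead.

```progress
/* Hypothetical rule table standing in for a real database table. */
DEF TEMP-TABLE ttCaseRule NO-UNDO
    FIELD cWord AS CHAR
    FIELD cRule AS CHAR      /* "LOWER", "UPPER" or "NORMAL" */
    INDEX idxWord IS PRIMARY cWord.

/* Return the rule for a word, or "" when the word has no special rule. */
FUNCTION RuleFor RETURNS CHARACTER (INPUT pWord AS CHARACTER):
    FIND FIRST ttCaseRule WHERE ttCaseRule.cWord = pWord NO-ERROR.
    RETURN (IF AVAILABLE ttCaseRule THEN ttCaseRule.cRule ELSE "").
END FUNCTION.

/* A few sample rows taken from the lists in the program. */
CREATE ttCaseRule. ASSIGN ttCaseRule.cWord = "OF"   ttCaseRule.cRule = "LOWER".
CREATE ttCaseRule. ASSIGN ttCaseRule.cWord = "DBA"  ttCaseRule.cRule = "UPPER".
CREATE ttCaseRule. ASSIGN ttCaseRule.cWord = "INC." ttCaseRule.cRule = "NORMAL".

MESSAGE RuleFor("dba") VIEW-AS ALERT-BOX.   /* UPPER */
```

The lookup ignores case because Progress character comparisons are case-insensitive by default, so the table can hold the words in any case.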

/*****************************************************************************
 * PROGRAM: fixcase.p - Fix Case of Words
 * DATE:    01/19/04
 * AUTHOR:  Richard Uchytil
 * DESC:    This will change the Cust Name from all caps or whatever to the
 *          proper format, which is first letter caps, the rest lower. There
 *          are some special exceptions, like the words "of,the,and,in",
 *          these should be lowercase. Others need to be uppercase like
 *          "ID,DBA". In some cases there is a "-" in the word, and need
 *          to change the word after the "-" to the proper case too.
 *          Also, some aren't words, but just initials, like "B.S.F.",
 *          these need to stay as they are. This program should fix about
 *          98% of the problems, but there still might be issues, like a
 *          company named CMI that gets changed to Cmi. I can't think of
 *          a way around this other than to maybe use a spell checker's
 *          dictionary. If the word isn't in the dictionary, then make
 *          it all caps. Of course many company names have names in them
 *          that wouldn't be in the dictionary, so this isn't valid either.
 *
 * INPUT/OUTPUT PARAMETERS
 *  - iWordsToFix - could be one word or a full sentence.
 *  - oFixedWords - fixed words ready to go back to calling program.
 *
 * ---------------------------- MODIFICATIONS --------------------------------
 * WHO
 * DATE     MODIFICATION
 * -------- ------------------------------------------------------------------
 *
 *****************************************************************************/

/** INPUT/OUTPUT PARAMETERS **/
DEF INPUT  PARAM iWordsToFix AS CHAR NO-UNDO.
DEF OUTPUT PARAM oFixedWords AS CHAR NO-UNDO.

/** LOCAL VARS **/
/** TILDE-BACKSLASH IS A LITERAL BACKSLASH ON BOTH WINDOWS AND UNIX **/
DEF VAR vSpecChar AS CHAR NO-UNDO INIT "/,~\,-".
DEF VAR vFixed    AS CHAR NO-UNDO.
DEF VAR vWord     AS CHAR NO-UNDO.
DEF VAR vTemp     AS CHAR NO-UNDO.
DEF VAR i         AS INT  NO-UNDO.
DEF VAR j         AS INT  NO-UNDO.
DEF VAR k         AS INT  NO-UNDO.

/*****************************************************************************/
ASSIGN iWordsToFix = CAPS(iWordsToFix)
       oFixedWords = "".

DO i = 1 TO NUM-ENTRIES(iWordsToFix," "):
    vWord = ENTRY(i,iWordsToFix," ").

    /** WORDS IN PARENTHESES NEED TO BE FIXED - REMOVE THE LEFT **
     ** PAREN FROM THE FIRST WORD SO WE CAN FIX THE FIRST WORD  **/
    IF SUBSTR(vWord,1,1) EQ "(" THEN
        ASSIGN oFixedWords = oFixedWords + SUBSTR(vWord,1,1)
               vWord       = SUBSTR(vWord,2).

    /** IF THERE'S ONE OR MORE "/,\,-" IN THE WORD THEN WE NEED TO **
     ** PROCESS EACH WORD BETWEEN THESE. THE EASIEST WAY TO DO THIS **
     ** IS TO PROCESS EACH CHAR AND BUILD THE WORD ONE LETTER AT A  **
     ** TIME. ONCE WE HIT A SPECIAL CHAR THEN CORRECT THE CASE FOR  **
     ** THAT WORD, THEN MOVE ON TO THE NEXT WORD. IT MIGHT BE A     **
     ** LITTLE SLOWER THAN OTHER WAYS, BUT IT'S THE EASIEST TO      **
     ** UNDERSTAND AND THE LEAST AMOUNT OF CODE.                    **/
    IF INDEX(vWord,"/") > 0 OR INDEX(vWord,"~\") > 0 OR INDEX(vWord,"-") > 0 THEN
    DO j = 1 TO LENGTH(vWord):
        vTemp = vTemp + SUBSTR(vWord,j,1).
        IF INDEX(vSpecChar,SUBSTR(vWord,j,1)) > 0 OR j = LENGTH(vWord) THEN
        DO:
            RUN CorrectCase.ip (INPUT vTemp, OUTPUT vFixed).
            ASSIGN oFixedWords = oFixedWords + vFixed
                   vTemp       = "".
        END.
    END.
    /** IT'S JUST A WORD, CORRECT THE CASE **/
    ELSE
    DO:
        RUN CorrectCase.ip (INPUT vWord, OUTPUT vFixed).
        ASSIGN oFixedWords = oFixedWords + vFixed.
    END.

    /** ADD A SPACE BETWEEN WORDS **/
    oFixedWords = oFixedWords
                + (IF i <> NUM-ENTRIES(iWordsToFix," ") THEN " " ELSE "").
END. /** END OF DO i = 1 TO NUM-ENTRIES(iWordsToFix," ") **/

RETURN oFixedWords.
/*--------------------------- end of main code ------------------------------*/

/*****************************************************************************
 ****                        INTERNAL PROCEDURES                          ****
 *****************************************************************************/
PROCEDURE CorrectCase.ip: /** CORRECT THE CASE OF A WORD **/
    /** INPUT/OUTPUT PARAMETERS **/
    DEF INPUT  PARAM vInWord  AS CHAR NO-UNDO.
    DEF OUTPUT PARAM vOutWord AS CHAR NO-UNDO.

    /** LOCAL VARS **/
    /** NOTE: i IS NOT REDEFINED HERE. THE i > 1 TEST BELOW USES THE **
     ** WORD COUNTER FROM THE MAIN BLOCK. A LOCAL i WOULD ALWAYS BE  **
     ** 0 AND THE LOWERCASE RULE WOULD NEVER FIRE.                   **/
    DEF VAR vAlwaysLC AS CHAR NO-UNDO INIT "of,the,and,in".
    DEF VAR vStates   AS CHAR NO-UNDO.
    DEF VAR vNormal   AS CHAR NO-UNDO INIT "INC.,INT'L.,DIST.,ASSOC.,AUTH.,ST.".
    DEF VAR vToCaps   AS CHAR NO-UNDO INIT "ID,DBA".

    /** STATES **/
    ASSIGN vStates = "AL,AK,AS,AZ,AR,CA,CO,CT,DE,DC,FM,FL,GA,GU,HI,ID,IL,IN,IA," +
                     "KS,KY,LA,ME,MH,MD,MA,MI,MN,MS,MO,MT,NE,NV,NH,NJ,NM,NY,NC," +
                     "ND,MP,OH,OK,OR,PW,PA,PR,RI,SC,SD,TN,TX,UT,VT,VI,VA,WA,WV," +
                     "WI,WY".

    /***************************************************************************/
    /** LOWERCASE THE ENTIRE WORD (BUT NOT WHEN IT IS THE FIRST WORD) **/
    IF CAN-DO(vAlwaysLC,vInWord) AND i > 1 THEN
        vOutWord = LC(vInWord).
    /** STATE ABBREVIATIONS SHOULD BE UPPERCASE **/
    ELSE IF CAN-DO(vStates,vInWord) THEN
        vOutWord = CAPS(vInWord).
    /** THIS WORD HAS A PERIOD IN IT BUT WE WANT IT CHANGED TO **
     ** NORMAL CASE. HAVE TO DO THIS HERE BECAUSE LOWER IN THE **
     ** CODE WE CHECK FOR WORDS WITH PERIODS AND MAKE THOSE ALL **
     ** UPPERCASE.                                              **/
    ELSE IF CAN-DO(vNormal,vInWord) THEN
        vOutWord = CAPS(SUBSTR(vInWord,1,1)) + LC(SUBSTR(vInWord,2)).
    /** UPPERCASE THE ENTIRE WORD **/
    ELSE IF CAN-DO(vToCaps,vInWord) OR INDEX(vInWord,".") > 0 THEN
        vOutWord = CAPS(vInWord).
    /** MAKE THE WORD NORMAL **/
    ELSE
        vOutWord = CAPS(SUBSTR(vInWord,1,1)) + LC(SUBSTR(vInWord,2)).
END PROCEDURE. /** END OF CorrectCase.ip **/
/*----------------------------------- eof() ---------------------------------*/
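
Calling the program from another procedure might look something like this. It assumes fixcase.p is saved somewhere on your PROPATH, and vFixedName is just an illustrative variable name.

```progress
/* Hypothetical caller: pass in a name and get the fixed version back. */
DEF VAR vFixedName AS CHAR NO-UNDO.

RUN fixcase.p (INPUT "FIRST-INTERSTATE BANK", OUTPUT vFixedName).

MESSAGE vFixedName VIEW-AS ALERT-BOX.   /* First-Interstate Bank */
```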