RegExp Based Dimension Filter

Another possible application of regular expressions as extraction tools is to list all the dimensions (distances, times, etc.) found in a string as tuples, each containing a value and its unit.

The code is as follows:

    import re

    def get_number_with_unit(inputString):
        # Units to keep; extend this list to recognise more units.
        list_of_units = ['km', 'kilometers', 'm', 'kilometer', 'meter', 'mts']
        splitString = inputString.split()
        resultList = []
        # Pair each token with the one after it, so a number at the very
        # end of the string cannot cause an IndexError.
        for token, nextToken in zip(splitString, splitString[1:]):
            # Keep the pair only if the token is a number and the next token is a known unit.
            if re.match(r'[0-9]+\.?[0-9]*$', token) and nextToken in list_of_units:
                resultList.append((token, nextToken))
        return resultList

    if __name__ == "__main__":
        print(get_number_with_unit('Paris is 400 kms from London by air but is 7000 km from Mumbai'))

In the above example, the function will return the following list of tuples:
    [('7000', 'km')]

400 kms was not returned because kms is not in list_of_units, the list of units to filter on. The user may add more units to the list so that the function recognises more dimensions.
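As a variation, the unit list can be folded into the regular expression itself, so that the value and the unit are captured in a single pass. The sketch below is one possible way to do this, not part of the original module: the names find_dimensions and DEFAULT_UNITS are hypothetical, and the extra units (kms, meters) are added here only for illustration.

```python
import re

# Hypothetical unit list for illustration; 'kms' and 'meters' are additions
# that the original list_of_units does not contain.
DEFAULT_UNITS = ('km', 'kms', 'kilometer', 'kilometers', 'm', 'meter', 'meters', 'mts')

def find_dimensions(text, units=DEFAULT_UNITS):
    """Return (value, unit) tuples for every number followed by a known unit."""
    # Build an alternation such as km|kms|...; re.escape guards against
    # units that contain regex metacharacters.
    unit_alternation = '|'.join(re.escape(u) for u in units)
    # Group 1: integer or decimal value. Group 2: the unit.
    # \s* allows '7000 km' as well as '7000km'; the trailing \b stops
    # 'km' from matching the start of a longer word.
    pattern = re.compile(r'(\d+(?:\.\d+)?)\s*(' + unit_alternation + r')\b')
    # With two capture groups, findall returns a list of 2-tuples.
    return pattern.findall(text)

if __name__ == '__main__':
    print(find_dimensions('Paris is 400 kms from London by air but is 7000 km from Mumbai'))
```

Because the unit is part of the pattern, this version does not rely on whitespace splitting, so it also picks up values written without a space before the unit. A different unit list can be supplied through the units parameter.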