Another possible application of regular expressions as extraction tools is to list all the dimensions (distances, times, etc.) found in a text as tuples containing their values and units.
The code is as follows:
- import re
- def get_number_with_unit(inputString):
-     # Find every integer or decimal number in the text
-     matchList = re.findall(r'[0-9]+\.?[0-9]*', inputString)
-     splitString = inputString.split()
-     resultList = []
-     list_of_units = ['km', 'kilometers', 'm', 'kilometer', 'meter', 'mts']
-     for matchString in matchList:
-         # Skip numbers that are not standalone words (e.g. the 400 in "400km")
-         if matchString not in splitString:
-             continue
-         matchStringIndex = splitString.index(matchString)
-         # Keep the pair only if a next word exists and is a known unit
-         if matchStringIndex + 1 < len(splitString) and splitString[matchStringIndex + 1] in list_of_units:
-             resultList.append((splitString[matchStringIndex], splitString[matchStringIndex + 1]))
-     return resultList
- if __name__ == "__main__":
-     print(get_number_with_unit('Paris is 400 kms from London by air but is 7000 km from Mumbai'))
In the above example, the function will return the following list of tuples:
- [('7000', 'km')]
400 kms was not returned because 'kms' is not in list_of_units, the list of units the function filters on. The user may add more units to this list so that the function recognizes more measurements.
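As a possible variation, the number and its unit can be captured by a single regular expression instead of splitting the string into words. The sketch below is only one way to do this and is not part of the original module: the names UNITS, build_pattern and get_number_with_unit_regex are hypothetical, and the unit list has been extended with 'kms' and 'meters' purely for illustration.

```python
import re

# Hypothetical unit list (an assumption for this sketch); extend it as needed.
UNITS = ['km', 'kms', 'kilometer', 'kilometers', 'm', 'meter', 'meters', 'mts']

def build_pattern(units):
    # Try longer units first so that "kms" is attempted before "km"
    alternatives = '|'.join(
        re.escape(u) for u in sorted(units, key=len, reverse=True))
    # Group 1 is the number, group 2 is the unit.
    # The trailing \b stops "m" from matching inside a word such as "miles".
    return re.compile(r'\b([0-9]+(?:\.[0-9]+)?)\s*(' + alternatives + r')\b')

def get_number_with_unit_regex(text, units=UNITS):
    # With two capturing groups, findall returns a list of (number, unit) tuples
    return build_pattern(units).findall(text)

if __name__ == "__main__":
    print(get_number_with_unit_regex(
        'Paris is 400 kms from London by air but is 7000 km from Mumbai'))
```

Because the unit is part of the pattern, this version also accepts a number written directly against its unit (for example 400km), and it never has to look up a word position in the split string.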