Regular expression gymnastics - fall 2021


Image by https://wellcomeimages.org/indexplus/obf_images/e8/b3/69290662f5c0300e8558a4e193c1.jpg, Gallery: https://wellcomeimages.org/indexplus/image/L0007192.html, Wellcome, CC-BY-4.0

For this work with regular expressions, we will start with a browser-based tool.  

Below is a list of file names. We want to write a regular expression that captures parts of each string. Copy the text below into the browser window.

  1. write an expression that captures the station id (the leading part of the string, before an underscore)

  2. write an expression that captures the place name

  3. write an expression that captures each of the three components of each name separately (leaving off the “.csv”)

WY480140_ALTA_0658.csv
WY480540_BASIN_1991.csv
WY480915_BORDER_1458.csv
WY481175_BUFFALO_BILL_DAM_3649.csv
WY481675_CHEYENNE_2836.csv
WY481730_CHUGWATER_8790.csv
WY481905_COLONY_5951.csv
WY482595_DIVERSION_DAM_1327.csv
WY482715_DUBOIS_1476.csv
WY483100_EVANSTON_3425.csv
WY484065_GREEN_RIVER_6236.csv
WY485345_Lake_Yellowstone_2847.csv
WY485415_LARAMIE_5154.csv
WY485830_LUSK_6803.csv
WY486195_MIDWEST_1214.csv
WY486440_Moran_9705.csv
WY486660_NEWCASTLE_0202.csv
WY487105_PATHFINDER_DAM_0600.csv
WY487115_PAVILLION_3678.csv
WY487240_PINE_BLUFFS_6539.csv
WY487260_PINEDALE_6771.csv
WY487388_POWELL_FIELD_STATION_6778.csv
WY487760_RIVERTON_0259.csv
WY487845_ROCK_SPRINGS_3352.csv
WY487990_SARATOGA_1846.csv
WY488160_SHERIDAN_FIELD_STATION_7211.csv
WY488995_TORRINGTON_EXP_FARM_9001.csv
WY489615_WHEATLAND_5516.csv
WY489770_WORLAND_0653.csv
WY489905_YNP_MAMMOTH_7791.csv

In R, again write an expression that captures each of the three components of each name separately (leaving off the “.csv”), but further modify the text to be all upper-case.
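One possible approach to the R task is sketched below, using only base R. It is an illustration rather than the intended solution: the variable names (`files`, `pattern`, `station`, `place`, `number`) are made up for the example, only a few of the file names are included, and the pattern assumes that the last component is always exactly four digits, as it is in the list above.

```r
# A few of the file names from the list above (a subset, for brevity)
files <- c("WY480140_ALTA_0658.csv",
           "WY481175_BUFFALO_BILL_DAM_3649.csv",
           "WY485345_Lake_Yellowstone_2847.csv",
           "WY486440_Moran_9705.csv")

# Three capture groups: station id, place name, trailing four-digit number.
# The place name may contain underscores itself, so the greedy (.+) is
# bounded by the first underscore on the left and by the final
# "_dddd.csv" on the right.
pattern <- "^([^_]+)_(.+)_([[:digit:]]{4})\\.csv$"

# sub() replaces the whole match with the chosen backreference;
# toupper() then converts the result to upper case.
station <- toupper(sub(pattern, "\\1", files))
place   <- toupper(sub(pattern, "\\2", files))
number  <- sub(pattern, "\\3", files)

parts <- data.frame(station, place, number)
print(parts)
```

Here the case conversion is done with `toupper()` after the match. With `perl = TRUE`, `sub()` also accepts `\\U` in the replacement string, which would be another way to do the conversion.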

Regular expression components (see ?regexp in R):

  1. character classes

    1. POSIX names [:alnum:] [:alpha:] [:digit:] [:lower:] [:upper:] [:blank:] (these go inside a bracket expression, e.g. [[:digit:]])

    2. sometimes shortcuts like \w or \s (written \\w and \\s inside an R string)

    3. . (any character)

  2. quantifiers: *, +, ?, {1,10}, {2,}

  3. anchors: ^ (beginning of string) and $ (end of string)

  4. capturing matches and backreferencing, using () for capturing and $1 or \1 for backreferencing (in an R replacement string the backslash is doubled: \\1)
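
The four kinds of components listed above can each be tried out in base R. The snippet below is a small sketch; the vector `x` and the result names are chosen only for this example, and the two strings are taken from the file list earlier on this page.

```r
x <- c("WY480140_ALTA_0658.csv", "WY486440_Moran_9705.csv")

# 1. Character classes: POSIX names sit inside a bracket expression.
#    Is there a word between underscores written as Capital + lower case?
mixed_case <- grepl("_[[:upper:]][[:lower:]]+_", x)

#    Shortcut classes need a doubled backslash in an R string.
word_only <- grepl("^\\w+\\.csv$", x)

# 2. Quantifiers: {m,n} means "between m and n repetitions".
#    regexpr() finds the first run of 4 to 6 digits in each string.
first_digits <- regmatches(x, regexpr("[[:digit:]]{4,6}", x))

# 3. Anchors: ^ ties the match to the start, $ to the end.
no_prefix <- sub("^[[:upper:]]{2}", "", x)   # drop the leading two letters
no_ext    <- sub("\\.csv$", "", x)           # drop the trailing ".csv"

# 4. Capturing and backreferencing: reorder the three parts.
swapped <- sub("^([^_]+)_(.+)_([[:digit:]]+)\\.csv$", "\\3_\\2_\\1", x)

print(list(mixed_case, word_only, first_digits, no_prefix, no_ext, swapped))
```

Note that every backslash in the pattern or replacement is doubled, because R first interprets the string literal and only then hands the result to the regular expression engine.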