Regular Expressions are Anything But Regular

Posted in Technology
Sun, Nov 15 - 8:35 pm EST | 5 years ago
Comments: 4

I’ve spent a bit of time over the last couple of days trying to figure out exactly how regular expressions work. I have to use one in an application I’m working on, and I’m just not getting the syntax at all.

[Image: regex-illustration]

Although you may think it’s gibberish, the text above is the regular expression I’m using as the solution to my earlier problem. Doesn’t look very "regular", does it? Aside from the fact that it uses characters I’m familiar with, the order and meaning behind them might as well be hieroglyphics to me. So, what is a "regular expression"?

The Wiktionary website defines it as follows:

A concise description of a regular formal language with notations for concatenation, alternation, and iteration (repetition) of subexpressions.

My task for my application was to prevent .zip files from being uploaded to the server. I never found a regular expression that excluded .zip files, but I did find the one above, which instead allows only a fixed list of other file types.

Here’s the expression above again:

^.+\.(([jJ][pP][eE]?[gG])|([gG][iI][fF])|([pP][dD][fF])|([dD][oO][cC])|([dD][oO][cC][xX])|([bB][mM][pP])|([tT][xX][tT]))$

The "regular expression", or "regex", above looks for any .jpg, .jpeg, .gif, .pdf, .bmp, .doc, .docx, or .txt file and allows it to be uploaded. Each bracketed pair, such as [jJ], matches either the lower case or the upper case version of that letter, so the file extension can be typed either way.
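For anyone who wants to see the pattern in action, here is a minimal sketch in Python. The language, the helper name `is_allowed`, and the sample filenames are all assumptions made for illustration; the post does not say what the application is written in.

```python
import re

# The allow-list pattern from the post, copied verbatim and split across
# two string literals only to keep the line length readable.
ALLOWED = re.compile(
    r"^.+\.(([jJ][pP][eE]?[gG])|([gG][iI][fF])|([pP][dD][fF])|([dD][oO][cC])"
    r"|([dD][oO][cC][xX])|([bB][mM][pP])|([tT][xX][tT]))$"
)


def is_allowed(filename: str) -> bool:
    """Return True when the filename ends in one of the listed extensions."""
    return ALLOWED.match(filename) is not None


# Hypothetical filenames, chosen to show accepted and rejected cases.
for name in ["photo.JPG", "scan.jpeg", "report.docx", "archive.zip", "notes.txt.zip"]:
    print(name, is_allowed(name))
```

Because the pattern is an allow-list, a .zip file is rejected simply because "zip" is not one of the listed extensions, not because anything in the pattern mentions it.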

I’m still not real sure what all the other symbols are really specifying in there. I’ve got more to learn for sure.

  • Graeme

    Most programs will let you do things case insensitively so you could shorten it to something like /.+\.(jpg|jpeg|gif|pdf|doc|docx|bmp|txt)$/i

    Even with egrep you can use the -i flag to turn on ignore case.

    I know you’ve got extra bits in there for jpeg/jpg and doc/docx but that should help you along.

    Again, depending on the software you can break the regex out across multiple lines and comment each line so you know what’s going on if you need to make more complex ones.
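As a rough illustration of both suggestions above, the sketch below uses Python's `re.IGNORECASE` and `re.VERBOSE` flags. Python is an assumption here, as is the name `is_allowed_verbose`. The pattern is a reworking of the shortened one in this comment (anchored at the start, with `jpe?g` and `docx?` folding the paired extensions together), not the original author's code.

```python
import re

# Hypothetical rewrite of the allow-list: IGNORECASE replaces the [jJ]-style
# pairs, and VERBOSE permits whitespace and comments inside the pattern.
ALLOWED_VERBOSE = re.compile(
    r"""
    ^.+              # one or more characters for the base name
    \.               # a literal dot before the extension
    (?:              # exactly one of the allowed extensions
        jpe?g        #   jpg or jpeg
      | gif
      | pdf
      | docx?        #   doc or docx
      | bmp
      | txt
    )
    $                # nothing may follow the extension
    """,
    re.IGNORECASE | re.VERBOSE,
)


def is_allowed_verbose(filename: str) -> bool:
    """Return True when the filename ends in an allowed extension, in any case."""
    return ALLOWED_VERBOSE.match(filename) is not None


print(is_allowed_verbose("Report.DOCX"))
print(is_allowed_verbose("archive.zip"))
```

Other regex engines offer similar options under different names, so the exact flags depend on the language in use.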

  • http://www.ericmartindale.com Eric Martindale

    What language are you writing this in? There’s a big chance you can easily slim this down by using a case-insensitive option.

    Also, if you want to exclude zip files, checking the last few characters of the filename is not a reliable method. What you actually want to do is read the contents of the file and see if it’s a zip file (or other archive, if necessary). Otherwise, people will be able to subvert your method quite easily by simply renaming the file. That said, embarking on a coding journey of this magnitude may be more trouble than it’s worth.

    Be wary of techniques that allow users to hide files inside of images.
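A minimal sketch of the content check described above, assuming Python and its standard `zipfile` module. The helper name `looks_like_zip` and the in-memory sample data are hypothetical. This only detects ZIP data; other archive formats would need their own checks.

```python
import io
import zipfile


def looks_like_zip(data: bytes) -> bool:
    """Inspect the uploaded bytes for ZIP structure, ignoring the filename."""
    return zipfile.is_zipfile(io.BytesIO(data))


# Build a small in-memory ZIP to stand in for an upload that a user has
# renamed to something like "photo.jpg".
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("hello.txt", "hello")
renamed_upload = buffer.getvalue()

# Ordinary non-archive content for comparison.
plain_upload = b"just some ordinary text content, not an archive at all"

print(looks_like_zip(renamed_upload))
print(looks_like_zip(plain_upload))
```

A check like this could run alongside the extension test rather than replace it, since the two catch different problems.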

  • http://www.bnpositive.com/blog Jason Bean

    Thanks for the input guys. I’ll see if I can work those modifications into my app. I basically am just trying to allow attachments that can be converted into an image file for automatic faxing when an email address doesn’t exist for a user. Thanks for the help.
