Regular expressions in technical interviews

by Alex 9. April 2009 20:31


I am personally a strong believer that “syntax doesn't matter” when we are talking about technical interviews. It pisses me off when I hear something like “Name the third parameter of the fourth overload of the XYZNever_heard_before function in C#” during an interview. However, the reality is that you will get these questions eventually and need to be prepared.

As you guessed, I put regular expression questions in exactly this category. Having worked with regular expressions on a daily basis for the last five years, I still use a reference almost every time I need to write something even slightly complicated. It is just unfair to expect a candidate to know the answer right away at the whiteboard. But again, it is something you just need to be prepared for, especially in these tough times when employers are looking at a big pool of candidates and can be picky.

So, for starters, here are a couple of basics which will really help you figure out most RegEx interview problems and might be useful to know anyway.

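  • "*" matches 0 or more occurrences of the preceding element
  • "." matches any single character (except a newline, unless the Singleline option is enabled)
  • "?" matches 0 or 1 occurrence of the preceding element; placed after another quantifier, as in "*?", it makes that quantifier lazy
  • "^" anchors the match at the start of the input; as the first character inside "[]" it negates the class, so [^>] means "any character except >"
  • "[]" defines a character class, including ranges such as [a-z]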
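
To see these basics in action, here is a minimal C# sketch. The sample strings and the class name RegexBasics are made up for illustration; the calls are the standard Regex.IsMatch and Regex.Match from System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexBasics
{
    public static void Main()
    {
        // "*": zero or more of the preceding element
        Console.WriteLine(Regex.IsMatch("ac", "^ab*c$"));          // True (zero b's)
        Console.WriteLine(Regex.IsMatch("abbbc", "^ab*c$"));       // True (three b's)

        // ".": exactly one character, whatever it is
        Console.WriteLine(Regex.IsMatch("cat", "^c.t$"));          // True
        Console.WriteLine(Regex.IsMatch("ct", "^c.t$"));           // False

        // "?": the preceding element is optional
        Console.WriteLine(Regex.IsMatch("color", "^colou?r$"));    // True
        Console.WriteLine(Regex.IsMatch("colour", "^colou?r$"));   // True

        // "[]": character class; a leading "^" inside it negates the class
        Console.WriteLine(Regex.Match("abc123", "[a-c]+").Value);  // abc
        Console.WriteLine(Regex.Match("abc123", "[^a-z]+").Value); // 123
    }
}
```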

As an example, (.*?) lazily matches any sequence of characters on a single line, or across the whole document if the Singleline option is enabled. On its own it matches as few characters as possible, so it is normally placed between two delimiters. Note that the parentheses define a group, which will make your life easier when you iterate through the matches collection in code.
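
A small sketch of the group and of the Singleline behaviour follows. The <b>…</b> delimiters, the sample inputs and the helper FirstGroup are assumptions chosen for illustration only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyGroupDemo
{
    // Returns the text captured by group 1 of the first match, or null if nothing matched.
    public static string FirstGroup(string input, string pattern, RegexOptions options)
    {
        Match m = Regex.Match(input, pattern, options);
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string pattern = "<b>(.*?)</b>";
        string oneLine = "<b>bold</b>";
        string twoLines = "<b>bold\ntext</b>";

        // The group gives direct access to the text between the delimiters.
        Console.WriteLine(FirstGroup(oneLine, pattern, RegexOptions.None));           // bold

        // Without Singleline, "." does not cross the line break, so nothing matches.
        Console.WriteLine(FirstGroup(twoLines, pattern, RegexOptions.None) == null);  // True

        // With Singleline, "." also matches "\n" and the group spans both lines.
        Console.WriteLine(FirstGroup(twoLines, pattern, RegexOptions.Singleline) == "bold\ntext"); // True
    }
}
```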

This brings us to probably the most popular interview question on regular expressions:

Question #1: Write a RegEx which would extract all anchor text from an HTML page

Answer: The simplest regex would be <a[^>]*?>([^>]*?)</a> which matches properly formatted HTML.

I got lots of comments and questions about edge cases. My advice: do not overcomplicate! Just voice your assumptions and in most cases you will be in great shape. If you want to get a little more elaborate, throw in whitespace checking, as in <\s*a[^>]*?>([^>]*?)</a\s*>
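
Here is a minimal sketch of how the simplest pattern could be used from C#. The class name, the method name ExtractAnchorText and the sample HTML are hypothetical; the assumption, as stated above, is well-formed markup with plain text inside the anchors.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class AnchorTextExtractor
{
    // Collects group 1 (the text between <a ...> and </a>) from every match.
    public static List<string> ExtractAnchorText(string html)
    {
        List<string> result = new List<string>();
        foreach (Match m in Regex.Matches(html, @"<a[^>]*?>([^>]*?)</a>"))
        {
            result.Add(m.Groups[1].Value);
        }
        return result;
    }

    public static void Main()
    {
        string html = "<p><a href=\"/home\">Home</a> | <a href=\"/about\" class=\"nav\">About us</a></p>";
        foreach (string text in ExtractAnchorText(html))
        {
            Console.WriteLine(text);
        }
        // Home
        // About us
    }
}
```

Two limitations worth voicing at the whiteboard: an anchor containing nested tags (such as <a><b>x</b></a>) is skipped, and <a[^>]*?> also accepts other tags whose names start with "a".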

Question #2: Write a regular expression which matches an email address

Answer: This is a very tricky question. Matching emails is not as trivial as it seems, and the answer can be as simple as \w+@\w+\.[\w]{3} or as messy as ^(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}

Remember, it is an interview question, not production code. So, keep it simple. I always recommend starting with the basics:
\w+@\w+\.[\w]{3}

Then explain its limitations and give a couple of examples where the pattern would give false positive or false negative results (test@test.111, for instance, is accepted even though it is not a real address). In my experience, this is usually more than enough to impress the interviewer. On the other hand, you can always enhance the pattern iteratively, engaging the interviewer in conversation and giving yourself the chance to showcase your knowledge.
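
The limitations are easy to demonstrate. The sketch below runs the basic pattern, unanchored, against a few addresses; the sample addresses other than test@test.111 and the helper name LooksLikeEmail are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SimpleEmailPattern
{
    public const string Pattern = @"\w+@\w+\.[\w]{3}";

    // True if the basic pattern is found anywhere in the input.
    public static bool LooksLikeEmail(string candidate)
    {
        return Regex.IsMatch(candidate, Pattern);
    }

    public static void Main()
    {
        Console.WriteLine(LooksLikeEmail("alice@example.com"));  // True
        Console.WriteLine(LooksLikeEmail("test@test.111"));      // True:  false positive, numeric "TLD"
        Console.WriteLine(LooksLikeEmail("user@example.de"));    // False: false negative, two-letter TLD
        Console.WriteLine(LooksLikeEmail("user@my-host.com"));   // False: false negative, hyphen in the domain
    }
}
```

Each failing case suggests the next refinement to discuss, for example allowing {2,6} letters in the last part or permitting hyphens in the domain.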

Another popular question is also about parsing HTML:

Question #3: Given a line of text, write a regular expression to strip all the HTML tags from it.

Answer:

// Removes everything that looks like a tag: "<", any run of characters other than ">", then ">".
public static string CleanHTML(string htmlPage)
{
  return Regex.Replace(htmlPage, "<[^>]*>", string.Empty);
}

See the pattern? Just remember the ([^>]*?) and ([^">]*?) expressions and use them if you get a question about using RegEx for HTML parsing.
Phil Haack has a really good post on parsing HTML with regular expressions which I strongly recommend reading.
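
As a quick sanity check, here is a sketch that wraps the CleanHTML function from above in a runnable class (the class name HtmlStripper and the inputs are illustrative). The second input shows one assumption worth stating out loud: the pattern expects that ">" never appears inside an attribute value.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HtmlStripper
{
    // Same pattern as in the answer above.
    public static string CleanHTML(string htmlPage)
    {
        return Regex.Replace(htmlPage, "<[^>]*>", string.Empty);
    }

    public static void Main()
    {
        // Well-behaved markup: all tags are removed.
        Console.WriteLine(CleanHTML("<p>Hello <b>world</b></p>"));  // Hello world

        // A ">" inside an attribute ends the tag match too early and leaves debris behind.
        Console.WriteLine(CleanHTML("<a title=\"a>b\">x</a>"));      // b">x
    }
}
```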

Another big subset of RegEx interview questions is puzzle-like problems. The interviewer gives you a huge and scary RegEx and, with an evil smile, asks you to explain what is going on there and what the code does.

So, let’s start with something very simple:

Question #4:

public static string TestQuestion(string date)
{
  return Regex.Replace(date, "([0-9]+)/([0-9]+)/([0-9]+)", "$2/$1/$3");
}

Explain what this function does. Give an example of how you would use it.

Answer:

OK, you probably already got the hint from the function definition… it has something to do with dates :) If you look at the expression, it is obvious that it defines three numeric groups separated by slashes. The replacement is a simple rearrangement of these groups.

So, in this case, the following example:

string date = "01/02/03";

Console.Write(TestQuestion(date));

would output: “02/01/03”

This can serve as an example of translating a European date into an American one and vice versa.
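
Two follow-up observations you could offer: Regex.Replace rewrites every date it finds in a longer text, and applying the function twice gives back the original. A small sketch, with a made-up sample sentence and class name:

```csharp
using System;
using System.Text.RegularExpressions;

public static class DateSwapDemo
{
    // Same function as in the question: swaps the first two numeric groups.
    public static string TestQuestion(string date)
    {
        return Regex.Replace(date, "([0-9]+)/([0-9]+)/([0-9]+)", "$2/$1/$3");
    }

    public static void Main()
    {
        string text = "Due 25/12/2009, shipped 01/02/03";
        string swapped = TestQuestion(text);

        Console.WriteLine(swapped);                        // Due 12/25/2009, shipped 02/01/03
        Console.WriteLine(TestQuestion(swapped) == text);  // True: swapping twice is a no-op
    }
}
```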

Question #5:
What pattern would the following RegEx match: (#?([A-Fa-f0-9]){3}(([A-Fa-f0-9]){3})?)

Answer: This should be easy for web developers. Looking at the pattern, an example string you can write is #FFFFFF, which is the hex color for white. So, you would probably use it as a validator for a color field. For this type of question, just write strings on the board which would fit the pattern until you understand what this set has in common.
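
The "write strings on the board" approach can be mimicked in code. In the sketch below the pattern is taken from the question as is; the class name, the IsHexColor helper, the sample strings and the added ^…$ anchors are assumptions made for illustration, since a validator has to match the whole input rather than a fragment of it.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HexColorValidator
{
    public const string Pattern = @"(#?([A-Fa-f0-9]){3}(([A-Fa-f0-9]){3})?)";

    // Anchors are added so the whole string has to be a color.
    public static bool IsHexColor(string candidate)
    {
        return Regex.IsMatch(candidate, "^" + Pattern + "$");
    }

    public static void Main()
    {
        Console.WriteLine(IsHexColor("#FFFFFF"));  // True:  six hex digits
        Console.WriteLine(IsHexColor("#FFF"));     // True:  three-digit shorthand
        Console.WriteLine(IsHexColor("a1b2c3"));   // True:  the "#" is optional
        Console.WriteLine(IsHexColor("#FFFF"));    // False: four digits
        Console.WriteLine(IsHexColor("#GGGGGG"));  // False: G is not a hex digit

        // Without anchors the pattern is satisfied by the "#FFF" prefix alone.
        Console.WriteLine(Regex.IsMatch("#FFFF", Pattern));  // True
    }
}
```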

There are a number of interview-style problems covered in Brad Merrill's article C# Regular Expressions, including:

  • Swapping First Two Words using RegEx
  • Removing Leading and Trailing Whitespace
  • Extracting All Numbers from a String
  • Finding All Caps Words
  • Finding All Lowercase Words
  • Finding All Initial Caps
  • Finding Middle Initial in the name
  • And some more..
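
To give a flavor of these problems, here is a rough sketch of three of them. The patterns, method names and sample strings below are my own guesses at reasonable answers, not taken from the article, so treat them as one possible approach.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class WarmUpProblems
{
    // Swaps the first two words, keeping the whitespace between them.
    public static string SwapFirstTwoWords(string s)
    {
        return Regex.Replace(s, @"^(\w+)(\s+)(\w+)", "$3$2$1");
    }

    // Removes whitespace at the start and at the end of the string.
    public static string TrimWhitespace(string s)
    {
        return Regex.Replace(s, @"^\s+|\s+$", string.Empty);
    }

    // Returns every run of digits found in the string.
    public static List<string> ExtractNumbers(string s)
    {
        List<string> numbers = new List<string>();
        foreach (Match m in Regex.Matches(s, "[0-9]+"))
        {
            numbers.Add(m.Value);
        }
        return numbers;
    }

    public static void Main()
    {
        Console.WriteLine(SwapFirstTwoWords("hello world again"));   // world hello again
        Console.WriteLine("[" + TrimWhitespace("  padded text \t") + "]");  // [padded text]
        Console.WriteLine(string.Join(",", ExtractNumbers("Order 66 shipped 3 items for 120 dollars").ToArray()));  // 66,3,120
    }
}
```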

Useful resources:

Regular Expressions Cheat Sheet (V2): really handy for phone screens

Derek Slager’s .NET Regular Expression Tester

 

Tags:

Patterns | C#


All material copyright © North Pacific Technology Group, LLC. All rights are reserved. No part of any material on this web site may be reproduced, or stored in a database or retrieval system, distributed, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission. Terms of use.
