Charlie Calvert on Elvenware

Writing Code and Prose on Computers

Regular Expressions

Python Reg Expressions

HTML: Find all the Style (or span) Tags

The problem here is that there can be a lot of junk in the tags, so you want to use something like style.*. But then you run into greediness: the match won't stop at the end of the tag.

Suppose you have this:

<p>Bar</p>

Your goal is to find <p>, but instead, you keep selecting this: <p>Bar</p>.

A technique called negation is one way to work around greediness. Search on this string: <p[^>]*>. It will find <p>. The negated class [^>] matches any character except the closing angle bracket, so the match runs only as far as the first closing angle bracket.
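The difference is easy to see in a short Python sketch. It assumes Python's standard re module rather than an editor's search box, and it writes the pattern with both angle brackets so that the match is the whole opening tag. The variable names are only for illustration.

```python
import re

html = "<p>Bar</p>"

# Greedy: .* runs to the end of the line, then backs up to the LAST ">".
greedy = re.search(r"<p.*>", html).group(0)

# Negated class: [^>]* cannot cross a ">", so the match ends at the FIRST one.
negated = re.search(r"<p[^>]*>", html).group(0)

print(greedy)   # <p>Bar</p>
print(negated)  # <p>

# The negated form also copes with junk inside the tag.
messy = '<p class="x" style="color:red">Bar</p>'
opening = re.search(r"<p[^>]*>", messy).group(0)
print(opening)  # <p class="x" style="color:red">
```

Note that nothing here forces the tag name to end after the p, so the same pattern would also match a tag such as <pre>.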

For instance, here is how to find all the class attributes in Expression Web:

:bclass="[^"]*"

Or in a tool with Perl-style regex syntax, like Notepad++:

\sclass="[^"]*"

The :b matches a space in Expression Web; in many other regex dialects the equivalent is \s. Then look for the word class followed by an equals sign and an opening quote. Next comes the negation: we accept any characters we find until we stumble upon another quotation mark. Then we stop. But just to make sure we return something that includes that final quote, and nothing else, we explicitly reference that closing quote after the negation.
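As a rough Python equivalent, the sketch below builds the same kind of pattern for any attribute name. The helper find_attribute and the sample markup are made up for this illustration, and the pattern assumes the attribute values are wrapped in double quotes.

```python
import re

def find_attribute(html, name):
    """Return every name="..." attribute found in html (hypothetical helper)."""
    # \s       one whitespace character before the attribute name
    # ="       the equals sign and the opening quote
    # [^"]*    the negation: everything up to, but not including, the next quote
    # "        the closing quote, so the match ends exactly there
    pattern = r'\s' + re.escape(name) + r'="[^"]*"'
    return re.findall(pattern, html)

sample = '<p class="intro" style="color: red">Hi</p><span class="note">x</span>'

class_matches = find_attribute(sample, "class")
print(class_matches)  # [' class="intro"', ' class="note"']
```

Each match keeps its leading space, because the \s is part of the pattern.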

Here is how to do the same thing for styles:

:bstyle="[^"]*"

This kind of construct can help you find newlines:

<span$\r\n([^>]*)
>$\r\n(</span></span>[^>]*)

In the first of the above examples, you can remove the newline by replacing the result with this, where \1 represents the text matched by the part inside the parentheses:

<span \1
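Here is a minimal sketch of the same replacement in Python, using re.sub. It is an adaptation rather than a copy of the editor pattern: the $ is dropped, and the input string is invented for the example.

```python
import re

# Hypothetical input: an opening span tag broken across two lines.
broken = '<span\r\nclass="note">Hi</span>'

# Python's $ does not match before \r, so the $ from the editor pattern is
# dropped here; \r?\n accepts both Windows and Unix line endings.
pattern = r"<span\r?\n([^>]*)"

# \1 in the replacement is whatever the parentheses captured.
joined = re.sub(pattern, r"<span \1", broken)

print(joined)  # <span class="note">Hi</span>
```

The captured group stops at the closing angle bracket, so everything after the tag is left untouched.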

Here is a way to remove all the empty spans from a file in Expression Web:

<span>|<.span>

It looks for <span> or </span>, with the dot standing in for the slash. Replace each match with nothing to remove them.
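The same search and replace can be sketched in Python with re.sub and an empty replacement string. The sample strings are invented, and the second one shows a caveat worth checking before running this on a real file.

```python
import re

html = "<p><span>Hello</span> world</p>"

# Either alternative matches; the dot in <.span> matches the slash.
cleaned = re.sub(r"<span>|<.span>", "", html)
print(cleaned)  # <p>Hello world</p>

# Caveat: a span with attributes keeps its opening tag but loses its closing tag.
with_attrs = '<span class="x">Hi</span>'
damaged = re.sub(r"<span>|<.span>", "", with_attrs)
print(damaged)  # <span class="x">Hi
```

So this approach is probably only safe once every remaining span in the file is attribute-free.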

Some Expressions

C# has a good regular expression engine built in. But not all regular expression engines are good, especially those in editors. The one I rely on the most is built into Notepad++.

Look above for the simplest way to find a style tag with a regular expression. But here are some expressions that might be useful in some cases.

Regular Expression in C#

See this page.