Text to Regex Generator

So on my blogroll, Jason Kottke posted this link: http://txt2re.com/. What is it? For those of you who don’t know, it’s a text-to-regular-expression generator. What the heck is a regular expression? It’s basically a pattern for matching a certain string (or substring) of text. It could be anything from punctuation to certain patterns, words, and so forth. I have to admit, regular expressions (or regex) can be fairly tricky sometimes, but they’re definitely a great tool, especially in the computational field. So what is this generator supposed to do? You type in a string of text, then select the parts you want your expression to match by clicking on the links below those letters or words. At the very end, the generator spits out a piece of code that you can use to find that text.
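To make "matching a certain string or pattern" a bit more concrete, here is a small sketch using Python’s built-in re module. The sample text is the one from the txt2re front page; the three patterns are my own illustrations, not something the generator produced.

```python
import re

text = '13:Apr:2011 "This is an Example!"'

# A literal word: matches the exact characters "Example".
word = re.search(r'Example', text).group()

# A pattern: two digits, a colon, three letters, a colon, four digits.
date = re.search(r'\d{2}:[A-Za-z]{3}:\d{4}', text).group()

# Punctuation: every character that is neither a word character nor whitespace.
punctuation = re.findall(r'[^\w\s]', text)

print(word)
print(date)
print(punctuation)
```

The same idea scales from a single word to fairly elaborate patterns, which is where the syntax starts getting tricky.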

What’s good about it? Well, first off, it can generate code for a lot of languages: C#, Java, Python, PHP, and so forth. There are a lot of possibilities there. Another good thing is that if you don’t know regex, it makes things a lot easier. Again, regex is pretty tricky and does take some practice to master. If you don’t plan on coding much, this will probably be useful, since it doesn’t require you to learn the syntax or work out the expression needed to extract what you want. The generator is also pretty simple to use. Just click the link for what you want the expression to go after: a character, a variable, a word, or even an entire string. If you make a mistake, like picking the entire word when you wanted a single character, you just click that word’s link again and it’ll give you all the options again.

However, there are definitely a few things that are problematic (or at least that a programmer can be nit-picky about). First, if you’re going to be programming, the likelihood of needing to know regex is probably high, so resorting to a generator every time probably isn’t good practice. The second is more of a nit-picky issue. The example you see when you first go to the page is the following string: 13:Apr:2011 “This is an Example!”. Now if you set up the expression so that it simply matches This is an Example, the code is rather unusual. Here it is in Python:

import re

txt='13:Apr:2011 "This is an Example!"'

re1='.*?'	# Non-greedy match on filler
re2='(This)'	# Word 1
re3='( )'	# White Space 1
re4='(is)'	# Word 2
re5='( )'	# White Space 2
re6='(an)'	# Word 3
re7='( )'	# White Space 3
re8='(Example)'	# Word 4

rg = re.compile(re1+re2+re3+re4+re5+re6+re7+re8,re.IGNORECASE|re.DOTALL)
m = rg.search(txt)
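if m:
    # Pull each captured group out of the match before printing
    word1=m.group(1)
    ws1=m.group(2)
    word2=m.group(3)
    ws2=m.group(4)
    word3=m.group(5)
    ws3=m.group(6)
    word4=m.group(7)
    print("("+word1+")"+"("+ws1+")"+"("+word2+")"+"("+ws2+")"+"("+word3+")"+"("+ws3+")"+"("+word4+")"+"\n")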

Notice anything unusual? re3, re5, and re7 are all the same whitespace pattern, yet each one gets its own variable. That’s fairly wasteful: since they’re identical, you could define the pattern once and reuse that one variable. I won’t bash on this (too much, anyway), mainly because this is a generator. It was released fairly recently, so it’s still in the early stages of development. Also, since it’s a generator, it might not necessarily check whether one whitespace character (or variable, or word) is the same as another. But that can be fixed pretty easily by checking whether that word or character has already been used earlier in the string.
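Here is roughly what I mean, as a minimal sketch in Python. The names ws, words, and assign_names are my own inventions for illustration; I have no idea how txt2re is actually implemented, so treat the second half as one possible way a generator could spot repeats, not as its real logic.

```python
import re

txt = '13:Apr:2011 "This is an Example!"'

filler = r'.*?'   # Non-greedy match on filler
ws = r'( )'       # One whitespace fragment, defined once and reused
words = ['This', 'is', 'an', 'Example']

# Join the captured words with the shared whitespace fragment.
pattern = filler + ws.join('(' + re.escape(w) + ')' for w in words)
rg = re.compile(pattern, re.IGNORECASE | re.DOTALL)

m = rg.search(txt)
if m:
    print(m.groups())


def assign_names(fragments):
    """Hypothetical helper: give each distinct fragment exactly one variable name."""
    names = {}   # fragment -> variable name
    order = []   # variable names in the order the fragments appear
    for frag in fragments:
        if frag not in names:
            names[frag] = 're%d' % (len(names) + 1)
        order.append(names[frag])
    return names, order


fragments = ['.*?', '(This)', '( )', '(is)', '( )', '(an)', '( )', '(Example)']
names, order = assign_names(fragments)
print('+'.join(order))
```

With the repeats collapsed, the eight fragments need only six variables, and the whitespace one simply shows up three times in the final concatenation.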

Overall, I think this is a cool idea. It’s simple, works across languages, and does its job pretty quickly. You don’t have to worry about unusual load times for the code when you go from one language to another. Those kinks are still there, but they can be fixed. It’s definitely a great tool if you don’t use regex often or at all… However, if you do use regex on a regular basis, actually learn it rather than resorting to a generator. 😛
