A powerful parser for Java and Android: Jparsec

Today I want to talk about a powerful Java parser library: Jparsec (http://jparsec.codehaus.org/). I recently used it in one of my projects and found it really helpful. This article is mainly for students who are still in school and have no industry experience. If you have been developing in Java for years, you are probably already a user of this tool, or have at least come across it.

Back in school, whenever I needed to parse strings to capture input for my projects, I usually took the quickest shortcut: str.split(). A str.contains() check was normally enough to tell whether the input string was in the format my code required, and splitting the string into an array of words on a few delimiters finished the job.
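To make that shortcut concrete, here is a minimal sketch. The input format ("x,y", two integers separated by a comma) and the names SplitParsing and parsePoint are made up for illustration; they do not come from any particular project.

```java
import java.util.Arrays;

class SplitParsing {
    // Hypothetical input format for illustration: two integers separated by a comma.
    static int[] parsePoint(String input) {
        // Cheap format check: is there a comma at all?
        if (!input.contains(",")) {
            throw new IllegalArgumentException("expected \"x,y\" but got: " + input);
        }
        // Split on the delimiter and convert each piece.
        String[] parts = input.split(",");
        return new int[] {
            Integer.parseInt(parts[0].trim()),
            Integer.parseInt(parts[1].trim())
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parsePoint("3, 4"))); // prints [3, 4]
    }
}
```

It works as long as the input really looks like "3, 4". An input such as "3,,4" gets past the contains() check and then fails inside Integer.parseInt() with a NumberFormatException.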

This approach works well when the project is small and the input format is rigidly controlled. If I ever hit a more complex case, a regex was my weapon of last resort. These methods never failed me, because none of my projects were at an engineering level yet.
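For a sense of what that last resort looks like, here is a sketch of a regex for a slightly looser format. The format itself ("(x, y)" with optional spaces and optional minus signs) and the names RegexParsing and parsePair are assumptions chosen for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexParsing {
    // Hypothetical format: "(x, y)" with optional spaces and optional minus signs.
    private static final Pattern PAIR =
            Pattern.compile("\\s*\\(\\s*(-?\\d+)\\s*,\\s*(-?\\d+)\\s*\\)\\s*");

    static int[] parsePair(String input) {
        Matcher m = PAIR.matcher(input);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a pair: " + input);
        }
        // Group 1 and group 2 are the two captured integers.
        return new int[] { Integer.parseInt(m.group(1)), Integer.parseInt(m.group(2)) };
    }

    public static void main(String[] args) {
        int[] pair = parsePair(" ( -5 , 12 ) ");
        System.out.println(pair[0] + " " + pair[1]); // prints "-5 12"
    }
}
```

The pattern does the job, but even for a format this small most of it is escaping and whitespace handling.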

However, these approaches became unacceptable at my first job. str.split() can only handle simple delimiters, and if you can’t control the input format, things get worse because you never know what the user’s input will look like. Regex is hard to write in the first place and even harder to maintain, because it stops being human readable once it gets long and complicated. An experienced team member recommended Jparsec to me, and it has become my daily solution for parsing.

Strictly speaking, Jparsec is a parser-building tool: it helps you build small parsers in Java quickly. What makes Jparsec stand out is its combinator nature. You start by building simple parsers, such as a whitespace parser, and then combine them into more complex parsers. It works the way functional programming does: simple functions are composed into more powerful ones!
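Before the real thing, a rough sketch of the combinator idea in plain Java may help. Nothing below is Jparsec's API: ToyCombinators, Parser, Result, literal, many and followedBy are hypothetical names made up for this illustration, and a real library handles errors and backtracking far more carefully.

```java
import java.util.Optional;

// A toy illustration in plain Java. These names are hypothetical, not Jparsec's API.
class ToyCombinators {

    /** A successful parse: the value produced and the index just past the match. */
    static final class Result<T> {
        final T value;
        final int next;

        Result(T value, int next) {
            this.value = value;
            this.next = next;
        }
    }

    /** Parses input starting at pos; an empty Optional means failure. */
    interface Parser<T> {
        Optional<Result<T>> parse(String input, int pos);

        /** Runs this parser, then other, and keeps only this parser's value. */
        default Parser<T> followedBy(Parser<?> other) {
            return (input, pos) -> {
                Optional<Result<T>> first = parse(input, pos);
                if (!first.isPresent()) {
                    return Optional.empty();
                }
                Result<T> r = first.get();
                return other.parse(input, r.next).map(o -> new Result<T>(r.value, o.next));
            };
        }
    }

    /** Matches exactly the given text. */
    static Parser<String> literal(String text) {
        return (input, pos) -> {
            if (input.startsWith(text, pos)) {
                return Optional.of(new Result<String>(text, pos + text.length()));
            }
            return Optional.empty();
        };
    }

    /** Applies p zero or more times and returns all the text it consumed. */
    static Parser<String> many(Parser<String> p) {
        return (input, pos) -> {
            int cur = pos;
            while (true) {
                Optional<Result<String>> r = p.parse(input, cur);
                if (!r.isPresent() || r.get().next == cur) {
                    break; // stop on failure, or if p matched without consuming anything
                }
                cur = r.get().next;
            }
            return Optional.of(new Result<String>(input.substring(pos, cur), cur));
        };
    }

    // Simple parsers combined into a bigger one, mirroring the comma() parser in this article.
    static Parser<String> whitespace() {
        return many(literal(" "));
    }

    static Parser<String> comma() {
        return whitespace().followedBy(literal(",")).followedBy(whitespace());
    }

    public static void main(String[] args) {
        System.out.println(comma().parse("  ,  x", 0).isPresent()); // prints true
        System.out.println(comma().parse("x", 0).isPresent());      // prints false
    }
}
```

The point of the sketch is only the shape: whitespace() is built from literal() and many(), and comma() is built from whitespace(), without any of them knowing how the others work inside.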

Let’s look at some Jparsec code:

    // Package names as of Jparsec 2.x.
    import org.codehaus.jparsec.Parser;
    import org.codehaus.jparsec.Scanners;

    /*
     * Fundamental parsers
     */

    // Zero or more spaces, returned as the matched text
    private static final Parser<String> whiteSpace() {
        return Scanners.string(" ").many().source();
    }

    // Left parenthesis with optional surrounding spaces.
    // followedBy() keeps the first parser's result, so this yields the leading spaces.
    private static final Parser<String> leftParen() {
        return whiteSpace().followedBy(Scanners.string("(")).followedBy(whiteSpace());
    }

    // Right parenthesis with optional surrounding spaces
    private static final Parser<String> rightParen() {
        return whiteSpace().followedBy(Scanners.string(")")).followedBy(whiteSpace());
    }

    // Comma with optional surrounding spaces
    private static final Parser<String> comma() {
        return whiteSpace().followedBy(Scanners.string(",")).followedBy(whiteSpace());
    }

    // Dot with optional surrounding spaces
    private static final Parser<String> dot() {
        return whiteSpace().followedBy(Scanners.string(".")).followedBy(whiteSpace());
    }

    // Optional negative sign: yields "-" when present, null otherwise
    private static final Parser<String> negativeSign() {
        return whiteSpace().next(Scanners.string("-").source()).optional().followedBy(whiteSpace());
    }

As the code shows, we first build a whitespace parser: it scans for " " zero or more times and returns the scanned whitespace as a string (by using source()). I won’t go into the syntax and APIs, since they are mostly self-explanatory and you can find pretty much everything you need in the documentation linked at the beginning of this article.

Next, we use the same approach to build more fundamental parsers for parentheses, comma, and dot. Note how these parsers reuse the whitespace parser we built earlier: each parenthesis, comma, or dot can come after whitespace and be followed by more whitespace. The evolution has begun!

Let’s look at a more complex case:

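    // Package names as of Jparsec 2.x.
    import org.codehaus.jparsec.Parsers;
    import org.codehaus.jparsec.functors.Map2;

    // Decimal integer with an optional negative sign
    private static final Parser<Integer> integer() {
        return Parsers.sequence(negativeSign(), Scanners.INTEGER, new Map2<String, String, Integer>() {
            @Override
            public Integer map(String neg, String value) {
                // neg is null when no negative sign was present
                if (neg != null) {
                    return -Integer.parseInt(value);
                } else {
                    return Integer.parseInt(value);
                }
            }
        });
    }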

In the code above, we build a parser for integers with an optional negative sign. Parsers.sequence() runs negativeSign() and then Scanners.INTEGER, and hands both results to a Map2 that translates them into the result we want (in this case an Integer). Map2 takes three type parameters: the first two are the result types of the two parsers in the sequence, and the last is the result (return) type. Inside the map() method we can do whatever we need to produce that result. Here, if we saw a negative sign, we return the negated value from the second parser in the sequence; otherwise we return that value directly.
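The mapping step can also be looked at on its own, without any parser around it. The sketch below uses java.util.function.BiFunction from the standard library, which has the same two-inputs-one-output shape as Map2; SignMapping and TO_INT are hypothetical names, and whether your Jparsec version accepts a BiFunction in place of Map2 is something to check in its documentation.

```java
import java.util.function.BiFunction;

class SignMapping {
    // Same shape as the Map2 above: (sign or null, digits) -> Integer.
    // Assumption: the sign argument is "-" when present and null when absent.
    static final BiFunction<String, String, Integer> TO_INT =
            (neg, digits) -> Integer.parseInt(neg != null ? "-" + digits : digits);

    public static void main(String[] args) {
        System.out.println(TO_INT.apply("-", "5"));   // prints -5
        System.out.println(TO_INT.apply(null, "12")); // prints 12
    }
}
```

This variant attaches the sign before calling parseInt() instead of negating afterwards. The result is the same for ordinary values, and it also lets "-2147483648" through, since 2147483648 on its own does not fit in an int.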

Finally, to parse the string “-5”, we write:

int result = integer().parse("-5"); 

This gives result = -5.

This is only a basic example, but you can already see the power of Jparsec. You can use it to define your own grammar and parse the values you need out of almost any input format. It’s much more readable than regex, and much more reliable than str.split().

Have fun with Jparsec!
