Making NFAs smaller

So, you might be wondering what I’m working on now. Basically, I’m looking at algorithms motivated by the problem of reducing nondeterministic finite automata (NFAs).

The problem of NFA reduction is basically to take an NFA and produce an equivalent NFA with fewer states. The natural question to ask is whether we can find an equivalent NFA with the minimal number of states, which is the NFA minimization problem. There are some good reasons to focus on the easier reduction problem instead. The first is that, unlike with DFAs, there is no unique minimal NFA. The second is that NFA minimization is super hard, as in PSPACE-complete (if you know how NP is “hard”, well, PSPACE is “harder” or “really friggin hard”).

Well, okay, but what if we’re not so worried about getting it as small as possible and we just want to get the damned thing smaller? If we can’t guarantee that the result is as small as we could make it, is there some kind of measurement to tell us how far away we are? Or are there special cases of NFAs where we can get the smallest one? It turns out the answer to these questions is almost always no. Of course, that won’t stop us, because reducing the number of states in an NFA is still pretty useful.

There are a number of ways to do this, and it’s not known which one is the best; if we knew that, we could work on approximation or minimization instead. A lot of approaches turn the NFA into a DFA, minimize it, and then turn it back into an NFA to see what we get. This is nice, because we know there’s a unique smallest DFA and it’s not too hard to find. The problem is that the NFA-to-DFA transformation can blow up the number of states exponentially in the worst case. We’d like to avoid this.
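To make that blowup concrete, here’s a minimal Python sketch of the subset construction, the usual NFA-to-DFA step (the dictionary-of-sets encoding of the transition function is just a convention I’m picking for the sketch):

```python
from itertools import chain

def determinize(states, alphabet, delta, start, finals):
    """Subset construction: build a DFA equivalent to the given NFA.

    `delta` maps (state, letter) -> set of states (missing key = no move).
    DFA states are frozensets of NFA states, which is exactly where the
    worst-case exponential blowup comes from.
    """
    start_set = frozenset([start])
    dfa_states = {start_set}
    dfa_delta = {}
    worklist = [start_set]
    while worklist:
        current = worklist.pop()
        for a in alphabet:
            target = frozenset(chain.from_iterable(
                delta.get((q, a), set()) for q in current))
            dfa_delta[(current, a)] = target
            if target not in dfa_states:
                dfa_states.add(target)
                worklist.append(target)
    dfa_finals = {S for S in dfa_states if S & finals}
    return dfa_states, dfa_delta, start_set, dfa_finals

# The classic bad case: "the third-from-last letter is an a" has a
# 4-state NFA, but the subset construction yields 2^3 = 8 DFA states.
nfa_delta = {(0, 'a'): {0, 1}, (0, 'b'): {0},
             (1, 'a'): {2},    (1, 'b'): {2},
             (2, 'a'): {3},    (2, 'b'): {3}}
dfa_states, _, _, _ = determinize({0, 1, 2, 3}, {'a', 'b'}, nfa_delta, 0, {3})
print(len(dfa_states))  # 8
```

The example at the bottom, “the third-from-last letter is an $a$”, needs only 4 NFA states but 8 reachable DFA states, and in general $n+1$ NFA states against $2^n$ DFA states, which is exactly the kind of growth we want to sidestep.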

The idea is to try finding states that are superfluous. What do we mean by superfluous? Well, if reading a word $ababa$ from the start can land me in either a state $p$ or a state $q$, then we probably don’t need two different states. Similarly, if from either a state $p$ or a state $q$ I can use up the word $aaaba$ to get to a final state, we might not have needed two different states for that. This is essentially the kind of thing we do in DFA minimization, but in that case it’s a lot easier because the state transitions are deterministic.

What we’d do formally is define a relation that captures this idea. For an NFA $N=(Q,\Sigma,\delta,q_0,F)$, we’d define some relation $R$ that satisfies the following conditions:

  1. $R\cap(F\times(Q-F))=\emptyset$
  2. $\forall p,q\in Q, \forall a\in \Sigma, (pRq \implies \forall q'\in\delta(q,a), \exists p'\in \delta(p,a), q'Rp')$.

Condition 1 rules out final states being related with non-final states. Condition 2 tells us that if two states $p$ and $q$ are related, then every state that $q$ has a transition to will be related to some state that $p$ transitions to on the same letter. As long as these two conditions hold, we can define any crazy old relation to work with and use it to gather up related states and merge them together.
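Here’s a rough Python sketch of one standard way to get the largest such relation: start with every pair that condition 1 allows and keep throwing out pairs that violate condition 2 until nothing changes (a greatest-fixpoint computation). Merging states that are related in both directions is one common way to use the result; the encoding and the merge step here are choices I’m making for the sketch, not the only options.

```python
def largest_relation(states, alphabet, delta, finals):
    """Greatest-fixpoint computation of the largest relation R satisfying
    condition 1 (no final state related to a non-final state) and
    condition 2 (pRq implies every a-move of q is matched by some a-move
    of p, with the successors again related as q'Rp').

    `delta` maps (state, letter) -> set of states; missing key = no move.
    """
    # Start with everything condition 1 allows, then refine downwards.
    R = {(p, q) for p in states for q in states
         if not (p in finals and q not in finals)}
    changed = True
    while changed:
        changed = False
        for (p, q) in list(R):
            for a in alphabet:
                q_moves = delta.get((q, a), set())
                p_moves = delta.get((p, a), set())
                # Condition 2: every q' in delta(q, a) needs some p' in
                # delta(p, a) with (q', p') still in R.
                if not all(any((q2, p2) in R for p2 in p_moves)
                           for q2 in q_moves):
                    R.discard((p, q))
                    changed = True
                    break
    return R

def merge_mutually_related(states, R):
    """Group states related in both directions; each group can then be
    collapsed into a single state of the reduced NFA."""
    classes = []
    for s in states:
        for c in classes:
            rep = next(iter(c))
            if (s, rep) in R and (rep, s) in R:
                c.add(s)
                break
        else:
            classes.append({s})
    return classes
```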

This is essentially what we do when we want to minimize DFAs. The difference is that it’s a lot easier to define this relation there, because a transition has only one outcome in the deterministic case. In fact, there’s a specific relation on words, called the Myhill-Nerode equivalence relation, which has the property that $L$ is a regular language if and only if its Myhill-Nerode relation has a finite number of equivalence classes. And these equivalence classes turn out to correspond to the states of the minimal DFA for $L$.
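For contrast, here’s roughly what the deterministic case looks like: a small Moore-style partition refinement sketch that starts from the final/non-final split and keeps splitting blocks until they stabilize (assuming a complete DFA, with `delta` mapping a state and a letter to a single state).

```python
def minimize_dfa(states, alphabet, delta, finals):
    """Moore-style partition refinement for a complete DFA.

    `delta` maps (state, letter) -> state.  The final partition groups
    together states that accept exactly the same words; each block
    becomes one state of the minimized DFA.
    """
    partition = [b for b in (frozenset(finals), frozenset(states - finals)) if b]
    letters = sorted(alphabet)
    while True:
        def block_of(q):
            return next(i for i, b in enumerate(partition) if q in b)
        new_partition = []
        for block in partition:
            # Split a block by which blocks each letter sends its states to.
            groups = {}
            for q in block:
                sig = tuple(block_of(delta[(q, a)]) for a in letters)
                groups.setdefault(sig, set()).add(q)
            new_partition.extend(frozenset(g) for g in groups.values())
        if len(new_partition) == len(partition):
            return partition
        partition = new_partition
```

For a DFA whose states are all reachable, the blocks it ends with correspond to the Myhill-Nerode classes from the paragraph above.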

However, as I mentioned before, we have no such luck with NFAs. We have a lot of relations to choose from, and it’s not clear which one is the best, which combinations to use, or how to iterate them to get the best result. This is where the hard part is: figuring out which of these gets us the smallest NFA as efficiently as possible.