Can interpretable models compete with black box models?

Learning DNF rules via gradient descent.


While many efforts in the field of interpretability try to explain the reasoning behind black-box models, this project aims to learn inherently interpretable models that can compete with them.

Small Boolean rules in Disjunctive Normal Form (DNF) are easily interpretable. In fact, when humans make a decision, they tend to consider only a few important attributes and combine them as disjunctions of conjunctions. For example, someone who plans to buy a used car if it has low mileage and a medium price, or simply a low price, is following a simple DNF rule. This makes DNF rules a natural choice for an interpretable model.

While the learning of DNF rules has been studied from a theoretical point of view since the 1980s, we aim to learn DNF rules on noisy "real-world" data, where little work has been done. One approach we investigate consists of representing DNF rules as neural networks and training them with gradient descent. The objective of the project is to demonstrate that accurate classification can be achieved while extracting interpretable rules.
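To make the idea concrete, here is a minimal sketch of how a DNF rule can be written as a differentiable network: conjunctions become soft-AND units (products over selected literals), and the disjunction becomes a soft-OR over their outputs. This is an illustration under our own assumptions, not the project's actual architecture; all function names and the weight encoding below are made up for the example. With weights in {0, 1} the network computes the exact Boolean rule, so a trained network whose weights are rounded can be read off as a DNF formula.

```python
import numpy as np

def soft_and(lits, w):
    # Soft conjunction over literals. A weight w_i near 1 means "this
    # literal is required"; near 0 means "ignore this literal". At
    # w in {0, 1} this equals the exact Boolean AND of the selected literals.
    return np.prod(1.0 - w * (1.0 - lits), axis=-1)

def soft_or(terms, v):
    # Soft disjunction over conjunction outputs, with term-selection weights v.
    return 1.0 - np.prod(1.0 - v * terms, axis=-1)

def dnf_forward(x, W, V):
    # x: (n, d) features in [0, 1]. Negated copies are appended so a
    # conjunction can use either polarity of each attribute.
    lits = np.concatenate([x, 1.0 - x], axis=-1)        # (n, 2d)
    conj = soft_and(lits[:, None, :], W[None, :, :])    # (n, k) conjunctions
    return soft_or(conj, V[None, :])                    # (n,) rule output

# Hard weights encoding the used-car rule from the text:
# buy = (low_mileage AND medium_price) OR low_price.
# Features: [low_mileage, medium_price, low_price]; columns 3-5 are negations.
W = np.array([[1., 1., 0., 0., 0., 0.],   # term 1: low_mileage AND medium_price
              [0., 0., 1., 0., 0., 0.]])  # term 2: low_price
V = np.array([1., 1.])

x = np.array([[1., 1., 0.],   # low mileage, medium price -> buy
              [0., 0., 1.],   # low price -> buy
              [1., 0., 0.]])  # low mileage only -> don't buy
print(dnf_forward(x, W, V))   # -> [1. 1. 0.]
```

During learning, W and V would instead be parameterized (e.g. as sigmoids of real-valued weights) and updated by gradient descent on a classification loss; since every operation above is differentiable, standard backpropagation applies.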


Data used for experiments