Identifying Soft Cores in Propositional Formulæ

Gilles Audemard, Jean-Marie Lagniez, Marie Miceli and Olivier Roussel

CRIL, Univ. Artois & CNRS, Lens, France

Keywords:

SAT, Explanation, #SAT, Max#SAT.

Abstract:

In view of the emergence of explainable AI, many new concepts intend to explain why systems exhibit certain behaviors while other behaviors are excluded. When dealing with constraints, explanations can take the form of subsets having few solutions, while being small enough to remain intelligible. To make this formal, we present a new notion, called soft core, characterizing both small and highly constrained parts of GCNF instances, whether satisfiable or not. Soft cores can be used in unsatisfiable instances as an alternative to MUSes (Minimal Unsatisfiable Subformulæ) or in satisfiable ones as an alternative to MESes (Minimal Equivalent Subformulæ). We also provide an encoding to translate soft core instances into MAX#SAT instances. Finally, we propose a new method to solve MAX#SAT instances and we use it to extract soft cores.

1 INTRODUCTION

Nowadays, SAT is well known and used for solv-

ing many different problems such as planning, ver-

iﬁcation, cryptography or mathematical conjectures

(Biere et al., 1999), (Heule et al., 2016). Solvers

become increasingly efﬁcient and new applications

emerge frequently. When a problem is modeled as

a SAT instance, it is often observed that its set of so-

lutions may differ from the one that was expected. It

may be due to errors in modeling, or result from the

fact that the problem is too constrained. In both cases,

it is worth providing the user with some form of ex-

planation clearing up why the expected solutions are

impossible.

In the particular case when the set of constraints is

unsatisﬁable, a MUS (Minimally Unsatisﬁable Sub-

set) can be extracted and reported to the user as an

explanation of the discrepancy between the solutions

obtained and those being expected. A MUS is a subset of constraints which is unsatisfiable but every proper subset of which is satisfiable (Bruni and Sassano, 2001). It can be seen as a minimal (for inclusion) part

of the formula which is unsatisﬁable. As instances

may contain an exponential number of MUSes com-

pared to their number of elements, one element from


each MUS has to be modiﬁed or removed to make

the instance satisﬁable. In some practical applications

like hardware and software veriﬁcation (Lifﬁton and

Sakallah, 2008), ﬁnding all MUSes is valuable to get

the best diagnostic in order to repair errors. How-

ever, when it comes to explaining to human users,

returning an exponential number of MUSes does not

make sense. As in extreme cases its size edges toward

the size of the formula, returning a single MUS may

also overwhelm the user. Accordingly, the search for

a smallest MUS (SMUS) has been the central topic

of several papers (Ignatiev et al., 2015), (Mneimneh

et al., 2005). Generating a smallest MUS of a CNF formula is computationally more difficult than extracting any MUS, as these problems are in $FP^{\Sigma_2^P}$ and $FP^{NP}$ respectively (Ignatiev et al., 2015).

However, a SMUS may still be not succinct enough

or may pinpoint a part of the formula that would be

irrelevant when taking into account the entire prob-

lem.

When the set of constraints at hand is satisﬁ-

able, the notion of MUS trivializes and the notion of

Minimal Equivalent Subformula (MES) (Belov et al.,

2014) can be more suited. Given a set of constraints $C$, a MES is a minimal (for inclusion) subset of constraints $C' \subseteq C$ which is equivalent to $C$. Even if computing a MES might gather interesting information,

it does not help in detecting which part of the formula is hard to solve. Moreover, like MUSes, a MES and even a smallest MES (SMES) may not be small enough to be handled by users.

Audemard, G., Lagniez, J., Miceli, M. and Roussel, O.
Identifying Soft Cores in Propositional Formulæ.
DOI: 10.5220/0010892700003116
In Proceedings of the 14th International Conference on Agents and Artificial Intelligence (ICAART 2022) - Volume 2, pages 486-495
ISBN: 978-989-758-547-0; ISSN: 2184-433X

To deal with these issues, we introduce in the fol-

lowing a generalization of the notion of MUS to both

satisﬁable and unsatisﬁable formulæ. A soft core is a

subset of constraints which is both small and highly

constrained, meaning it has a limited number of mod-

els. Identifying a soft core is clearly a bi-criteria op-

timization problem, where both the size of the soft

core and its number of models matter. Depending on

the context, it may be more important to reduce the

size of the soft core, even if we obtain more models.

Alternatively, in other cases, reducing the number of

models will be the main objective, even if this gives

larger soft cores.

Soft cores can prove useful in different application scenarios. They pinpoint the most constrained parts of the formula, and determine which constraints prevent expected solutions from being possible. For example, they can be used in scheduling

problems to ﬁnd which part needs to be relaxed, or in

the analysis of mathematical conjectures (Heule et al.,

2016). Indeed, in such cases, once the parameters of a conjecture (e.g. a number of colors) are instantiated, the problem can often be encoded by constraints. Except for small values of the parameters, the resulting formula is hard to solve. Besides, if the formula is unsatisfiable, a MUS is expected to be quite large, since conjectures generally use a minimum number of hypotheses. Thus, soft cores may provide information and can become the start of a mathematical proof if the models admitted have specific characteristics or are not numerous.

Soft cores can also be used in debugging tools, as

in (Dodaro et al., 2018), whose purpose is to help the

user to model problems. Generally, problems are not

modeled directly into SAT, but ﬁrst in higher level

modeling languages and then transformed into SAT

instances. When encoded as CNF formulæ, problems lose a lot of structural information, and a simple natural constraint (for example, "we want to participate in at least three sessions among the five available", written $s_1 + \dots + s_5 \geq 3$) might become a large set of clauses, whose decision variables are not distinguishable from auxiliary ones, and which, if not gathered into groups, can be lost among other clauses.

Then, using a Group CNF (GCNF) formula instead of

a CNF formula can be preferred to keep structural in-

formation, and to treat together clauses issued from a

former higher-level constraint.

We propose using soft cores to determine, when

modeling a problem into SAT, which parts of the

formula would be hard to explore, in order to either

change modeling or use the information when solving

the generated instance.

The paper is organized as follows. First, we formally define soft cores, and show that the decision problem related to soft cores is $NP^{PP}$-hard by pointing out a reduction from E-MAJSAT (Pipatsrisawat and Darwiche, 2009). Afterwards, we demonstrate how to turn instances of the soft core problem into instances of MAX#SAT, an $NP^{PP}$-complete problem (Fremont et al., 2017). Finally, from a practical side, we present an exact MAX#SAT solver. We show how to leverage it for computing soft cores and we compare it to the approximate MAX#SAT solver Maxcount (Fremont et al., 2017), the only available tool for solving MAX#SAT instances up to now.

2 PRELIMINARIES

Boolean Logic. We consider standard Boolean logic. Let $\mathcal{L}_P$ be a language of formulæ over an alphabet $P$ of Boolean variables also called atoms, denoted by a, b, c, . . . The symbols ∧, ∨, ¬, ⇒ and ⇔ represent the standard conjunctive, disjunctive, negation, material implication and equivalence connectives, respectively. Propositional formulæ are built in the usual way from variables, connectives and parentheses. They are denoted by Greek letters such as α, β, Γ, ∆, . . . We denote by Var(Γ) the set of variables appearing in a formula Γ. For convenience we sometimes write Γ(X) to represent that Γ is assumed to be defined on variables X. A literal is a variable or its negation. A term is a conjunction of literals ($\ell_1 \land \dots \land \ell_k$) while a clause is a disjunction of literals ($\ell_1 \lor \dots \lor \ell_k$). A unit clause is formed of one literal.

Interpretation and Model. An interpretation

(or an assignment) ω to P is a mapping from P

to {true, false}. ω is complete when all variables

are assigned, otherwise it is partial. A variable

not assigned is said to be free. The set of all

interpretations is denoted by Ω. An interpretation

ω is a model of a formula Γ ∈ $\mathcal{L}_P$ if and only if it makes it true in the usual truth functional way.

On the contrary, an interpretation is a counter-

model if it does not satisfy the formula. The set

of models admitted by Γ is denoted Mod(Γ), with

Mod(Γ) = {ω ∈ Ω | ω is a model of Γ}. |= and ≡

denote respectively logical entailment and logical

equivalence. Let Γ and ∆ be two distinct propositional

formulæ, Γ |= ∆ if and only if Mod(Γ) ⊆ Mod(∆)

and Γ ≡ ∆ if and only if Mod(Γ) = Mod(∆).


CNF and DNF. A propositional formula is in

Conjunctive Normal Form (CNF) (resp. in Disjunc-

tive Normal Form (DNF)) when it is written as a

conjunction of clauses (resp. a disjunction of terms).

Alternatively, CNF (resp. DNF) can be represented

by their set of clauses (resp. set of terms). The size

of the CNF Ψ (resp. DNF), denoted by |Ψ|, is its

number of clauses (resp. terms). The conditioning of a CNF formula Ψ by a consistent term γ is the CNF formula $\Psi|_\gamma$, obtained from Ψ by first removing each clause containing a literal $\ell \in \gamma$ and then removing all occurrences of $\neg\ell$ from the remaining clauses. When a CNF formula contains a unit clause $\{\ell\}$, Ψ and $\Psi|_\ell$ are equisatisfiable. The unit propagation of clause $\{\ell\}$ is the conditioning of Ψ on $\ell$. A CNF formula with no unit clause is said to be closed under unit propagation. Boolean Constraint Propagation (BCP) is an algorithm that, given a CNF formula, returns an equivalent CNF closed under unit propagation.
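Conditioning and unit propagation as described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the clause representation (sets of signed integers, as in DIMACS) and the function names `condition` and `bcp` are assumptions made for the example. The sketch returns the propagated literals next to the simplified formula, and it does not treat an empty clause specially.

```python
def condition(cnf, term):
    """Return cnf|term: drop clauses satisfied by the term, then remove falsified literals."""
    term = set(term)
    falsified = {-lit for lit in term}
    return [frozenset(c) - falsified for c in cnf if not (set(c) & term)]


def bcp(cnf):
    """Propagate unit clauses until none is left; return (simplified CNF, propagated literals)."""
    cnf = [frozenset(c) for c in cnf]
    units = []
    while True:
        # Look for a clause made of a single literal (hypothetical helper logic).
        unit = next((next(iter(c)) for c in cnf if len(c) == 1), None)
        if unit is None:
            return cnf, units
        units.append(unit)
        cnf = condition(cnf, [unit])


# (a) ∧ (¬a ∨ b) ∧ (¬b ∨ c ∨ d), with a=1, b=2, c=3, d=4
simplified, units = bcp([{1}, {-1, 2}, {-2, 3, 4}])
print(simplified, units)
```

On this small formula, propagating a and then b leaves the single clause (c ∨ d), which contains no unit clause.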

Related Problems. Many problems revolve around

solutions (or lack thereof) of a given CNF formula

Ψ. The decision problem determining whether

a model of Ψ exists is the Boolean Satisﬁability

Problem (SAT) (Biere et al., 2009). A more general

problem is the counting problem #SAT (Thurley,

2006), (Lagniez and Marquis, 2017) which returns the number of models of Ψ over its own set of variables, denoted by $\|\Psi\|$. When Var(Ψ) is a proper subset of the initial alphabet P, i.e. there are free variables that are omitted in Ψ, we denote its model count over P by $\|\Psi\|_P$. When Ψ ∈ $\mathcal{L}_P$ and X ⊆ P, ∃X.Ψ is a quantified Boolean formula denoting (up to logical equivalence) the most general consequence of Ψ which is independent from the variables of X (Lang et al., 2003). Observe that Var(∃X.Ψ) ⊆ Var(Ψ) \ X. The problem #∃SAT (Aziz et al., 2015) is to determine the number of models of a quantified CNF formula ∃X.Ψ.
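The three counting notions above can be illustrated with a brute-force sketch in Python. It is only meant for tiny formulæ and makes several assumptions for the example: clauses are sets of signed integers, and the helper names `models`, `count` and `projected_count` are hypothetical.

```python
from itertools import product


def models(cnf, variables):
    """Yield every interpretation over `variables` that satisfies the CNF."""
    variables = sorted(variables)
    for values in product([False, True], repeat=len(variables)):
        omega = dict(zip(variables, values))
        if all(any(omega[abs(lit)] == (lit > 0) for lit in clause) for clause in cnf):
            yield omega


def var_set(cnf):
    return {abs(lit) for clause in cnf for lit in clause}


def count(cnf, alphabet=()):
    """Model count over Var(Psi), or over a larger alphabet P when one is given."""
    return sum(1 for _ in models(cnf, var_set(cnf) | set(alphabet)))


def projected_count(cnf, forgotten):
    """Number of distinct models once the variables in `forgotten` are existentially quantified."""
    kept = sorted(var_set(cnf) - set(forgotten))
    return len({tuple(omega[v] for v in kept) for omega in models(cnf, var_set(cnf))})


psi = [{1, 2}]                    # a ∨ b, with a=1 and b=2
print(count(psi))                 # models over {a, b}
print(count(psi, {1, 2, 3}))      # models over P = {a, b, c}
print(projected_count(psi, {2}))  # models of ∃b.(a ∨ b) over {a}
```

Each free variable of the alphabet doubles the count, while forgetting b makes the formula valid over the remaining variable a.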

Transforming a Propositional Formula into a CNF Form. The Tseitin encoding scheme is a linear-time, query-equivalent encoding scheme that translates any propositional formula Γ into a CNF formula Ψ (Tseitin, 1983). To do this, auxiliary variables are

added in order to represent subformulæ. Since

each additional variable is deﬁned from the input

variables, both formulæ have the same number of

models (Lagniez et al., 2020). More precisely, the

models of the resulting CNF encoding are extensions

of the models of the input formula, but no model

is created nor removed. It is also possible to use

a more compact encoding that does not consider

equivalence but only implication. This encoding,

called Plaisted&Greenbaum encoding (Plaisted and

Greenbaum, 1986), does not preserve the number of

models but is equivalent modulo forgetting on the

auxiliary variables.
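The difference between the two schemes can be checked on a toy formula. The sketch below, which assumes the formula (a ∧ b) ∨ c and a single auxiliary variable t standing for (a ∧ b), counts models by brute force; the helper `count_models` and the variable numbering are assumptions made for the example.

```python
from itertools import product


def count_models(cnf, variables, keep=None):
    """Count models over `variables`; if `keep` is given, count distinct projections onto it."""
    variables = sorted(variables)
    keep = variables if keep is None else sorted(keep)
    seen = set()
    for values in product([False, True], repeat=len(variables)):
        omega = dict(zip(variables, values))
        if all(any(omega[abs(l)] == (l > 0) for l in c) for c in cnf):
            seen.add(tuple(omega[v] for v in keep))
    return len(seen)


A, B, C, T = 1, 2, 3, 4
# Tseitin: t ⇔ (a ∧ b), then (t ∨ c)
tseitin = [{-T, A}, {-T, B}, {T, -A, -B}, {T, C}]
# Plaisted&Greenbaum: only t ⇒ (a ∧ b), then (t ∨ c)
plaisted_greenbaum = [{-T, A}, {-T, B}, {T, C}]

original = sum((a and b) or c for a, b, c in product([False, True], repeat=3))
print(original)
print(count_models(tseitin, {A, B, C, T}))
print(count_models(plaisted_greenbaum, {A, B, C, T}))
print(count_models(plaisted_greenbaum, {A, B, C, T}, keep={A, B, C}))
```

The Tseitin encoding keeps the count of the original formula, whereas the implication-only encoding admits extra models that disappear once t is forgotten.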

Group CNF. Given a CNF formula, clauses can be semantically linked and thus gathered into groups $G = \{G_1, \dots, G_m\}$ (Liffiton and Sakallah, 2008). Concretely, each group is associated with an identifier which is assigned to the clauses that compose it. In some problems, it is also relevant to gather integrity constraints into a dedicated group denoted by $D$. A group CNF (GCNF) formula $\Phi = D \cup G$ with $G = \{G_1, \dots, G_m\}$ can be interpreted as $D \land G_1 \land \dots \land G_m$. Thus, all notions introduced so far on CNF naturally extend to GCNF by considering $D \land G_1 \land \dots \land G_m$. Considering GCNF instead of CNF

becomes really crucial when the solution of the mod-

eled problem must satisfy some integrity constraints

and when we have to select some groups, which taken

together with D respect some interesting properties

(Nadel, 2010), (Lifﬁton and Sakallah, 2008), (Belov

et al., 2014).

3 PROBLEM STATEMENT AND COMPUTATIONAL COMPLEXITY

In this section, we first introduce the new notion of soft core, whose objective is to determine a small set of groups that constrains a GCNF formula the most. Then, we show that computing a soft core whose size is predetermined is an $NP^{PP}$-hard problem by proposing a reduction from E-MAJSAT (Pipatsrisawat and Darwiche, 2009), which is the prototypical $NP^{PP}$-complete problem (Littman et al., 1998).

3.1 Deﬁnition of a soft core

Informally, a soft core is a subset of a propositional

formula, that is small both in size and in the number

of models admitted. More formally, it is deﬁned as an

element of the Pareto frontier of the problem whose

purpose is to ﬁnd a subset of the formula with two

objective functions to be minimized: (a) its size and

(b) its number of models. This is detailed in Deﬁni-

tion 1.

Definition 1. Given a GCNF formula $\Phi = D \cup G$, $G'$ is a soft core of Φ if and only if:
1. $G' \subseteq G$;
2. $\forall G'' \subseteq G$, with $G'' \neq G'$ and $|G''| \leq |G'|$, $\|D \cup G''\|_{Var(\Phi)} \geq \|D \cup G'\|_{Var(\Phi)}$.


To simplify, we restrict the bi-criteria problem to a

single criterion optimization problem by ﬁxing the

size of the subset. This size k is chosen by the user.

Then, a k-soft core is a subset of the formula of size k

with a minimum number of models. We note that, in

general, a k-soft core is not a soft core because k may

not appear on the Pareto frontier of the bi-criteria op-

timization problem.

Definition 2. Given a GCNF formula $\Phi = D \cup G$ and an integer $k$ with $k \leq |G|$, $G'$ is a k-soft core if and only if:
1. $G' \subset G$ and $|G'| = k$;
2. $\forall G'' \subseteq G$ and $G'' \neq G'$, with $|G''| = k$, $\|D \cup G''\|_{Var(\Phi)} > \|D \cup G'\|_{Var(\Phi)}$.

We also deﬁne the decision problem associated to the

optimization problem.

Definition 3. Given a GCNF formula $\Phi = D \cup G$, $k \in \mathbb{N}$ and $m \in \mathbb{N}$, a subset $G'$ of $G$ is said to be a $\langle k, m \rangle$-soft core if $|G'| \leq k$ and $\|D \cup G'\|_{Var(\Phi)} \leq m$.

As already mentioned in the introduction, it is easy to demonstrate that soft cores generalize both SMES and SMUS notions. Indeed, given a CNF formula Ψ, the problem of minimizing $k$ for a $\langle k, 0 \rangle$-soft core can be seen as a generalization of SMUS when treating unsatisfiable formulæ, and minimizing $k$ for a $\langle k, \|\Psi\| \rangle$-soft core can be seen as a generalization of SMES when treating satisfiable formulæ.

In the following section, we analyze the computa-

tional complexity of the decision version of the k-soft

core problem.

3.2 Reduction from E-MAJSAT to $\langle k, m \rangle$-soft core

We prove that $\langle k, m \rangle$-soft core is $NP^{PP}$-hard by considering a polynomial-time reduction from the $NP^{PP}$-complete problem E-MAJSAT (Littman et al., 1998). This problem is defined as follows. Let Ψ be a propositional formula in CNF defined over $X \cup Y$, where $X$ and $Y$ are two disjoint sets of propositional variables. Does there exist an assignment ω over $X$ such that the majority of assignments over $Y$ satisfies $\Psi|_\omega$? In other words, E-MAJSAT determines whether an assignment ω over $X$ such that $\|\Psi(\omega, Y)\|_{Var(\Psi)} > \frac{1}{2} \times 2^{|Y|}$ exists.

Proposition 1. Finding a $\langle k, m \rangle$-soft core is $NP^{PP}$-hard.

Proof. Let us consider a CNF formula Ψ defined over the set of propositional variables $X \cup Y$, with $X = \{x_1, \dots, x_n\}$. Without loss of generality we suppose that Ψ is not a tautology. Now, let us associate with Ψ the GCNF formula $\Phi = D \cup G$, with:

G. $G$ simulates the choices of variables $x_i \in X$ thanks to $2 \times n$ propositional variables $C = \{c_{x_1}, c_{\neg x_1}, \dots, c_{x_n}, c_{\neg x_n}\}$. Each element of a pair $\{c_{x_i}, c_{\neg x_i}\}$, that we note $c_\ell$, represents the choice of a literal ℓ in the chosen assignment of $X$. If $x_i$ is true, then $c_{x_i}$ and therefore $c_\ell$ are also true. Otherwise, if $x_i$ is false, then $c_{\neg x_i}$ is true and $c_{x_i}$ is false.

$G = \{G_\ell \text{ s.t. } Var(\ell) \in X \text{ and } G_\ell = \{c_\ell\}\}$ (1)

We denote by $G'$ the set of groups whose unit clause $c_\ell$ (either $c_x$ or $c_{\neg x}$) is fixed to true. In order to select a well constructed interpretation ω of $X$, i.e. one that does not contain both $c_x$ and $c_{\neg x}$, we want to fix $k = n$, ensuring that exactly $n$ choices are made. To do this, we have to add constraints to $D$.

D. Nothing ensures that $x_i$ and its complement $\neg x_i$ will not be chosen together. To detect this situation, we add a first constraint to $D$:

$s \Leftrightarrow \bigvee_{x \in X} (c_x \Leftrightarrow c_{\neg x})$ (2)

Equation 2 introduces a new propositional variable $s$ which is logically defined by $C$. Then, $s$ indicates whether ω is consistent (meaning well constructed): if at least one pair $x_i$ and $\neg x_i$ is set to true, then $s$ is true, otherwise $s$ is false. We use this new variable to rule out the selection of an inconsistent ω by adding to $D$ the following constraint:

$s \lor ((c_x \Rightarrow x) \land (c_{\neg x} \Rightarrow \neg x) \land \neg\Psi)$ (3)

Then, given the selected interpretation ω, Equation 3 either yields the number of models of ¬Ψ conditioned by ω if ω is well constructed or, if not, yields $2^{|Var(\Psi)|}$ models. In the E-MAJSAT problem, we search for an assignment of $X$ such that we get more than $\frac{1}{2} \times 2^{|Y|}$ models, whereas when computing a $\langle k, m \rangle$-soft core, we search for $k$ groups such that we have at most $m$ models. Thus, we use the fact that minimizing the number of models of Ψ is maximizing the number of counter-models of Ψ and we consider the negation of Ψ. We obtain the following GCNF:

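$\Phi = G \land \left(s \Leftrightarrow \bigvee_{x \in X} (c_x \Leftrightarrow c_{\neg x})\right) \land \left(s \lor ((c_x \Rightarrow x) \land (c_{\neg x} \Rightarrow \neg x) \land \neg\Psi)\right)$ (4)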

For the sake of simplicity, we chose to not translate

Equations 2 and 3 into CNF formulæ. However, and


as pointed out in Section 2, this translation into a

CNF formula with the same number of models can

be done in polynomial time using Tseitin encoding.

Now, fixing $m = (2^n - 1) \times 2^{|Var(\Psi)|} + \frac{1}{2} \times 2^{|Y|} - 1$, let us prove that there is an assignment ω over $X$ such that the majority of assignments over $Y$ satisfies $\Psi|_\omega$ if and only if there exists a $G' \subseteq G$ s.t. $|G'| \leq k$ and $\|D \cup G'\|_{Var(\Phi)} \leq m$.

First, let us remark that after selecting $n$ groups, $n$ variables of $C$ are units and the remaining $n$ variables of $C$ are free. From now on, two cases have to be considered: (a) the selected groups are inconsistent in the sense that there exists a literal ℓ of $X$ such that $G_\ell$ and $G_{\neg\ell}$ have been selected and (b) the selected groups are consistent in the sense that there does not exist such a literal ℓ. The remaining case, where there exists a literal ℓ of $X$ such that neither $G_\ell$ nor $G_{\neg\ell}$ has been selected, falls under case (a). Indeed, since $|G| = 2 \times |X|$ and only one group is associated with each literal of $X$, if there exists a literal ℓ of $X$ such that neither $G_\ell$ nor $G_{\neg\ell}$ has been selected, then there exists $\ell'$ of $X$ such that $G_{\ell'}$ and $G_{\neg\ell'}$ have been selected (Dirichlet's drawer principle).

a. Let us show that whatever the selected groups which fall in case (a), we have $\|D \cup G'\|_{Var(\Phi)} = 2^n \times 2^{|Var(\Psi)|}$. If there exists a literal ℓ of $X$ such that $G_\ell$ and $G_{\neg\ell}$ have been selected, then $s$ is necessarily true by Equation 2. By both replacing $s$ by ⊤ in Equation 3 and simplifying Equation 2 we get $D \equiv \top$. Consequently, the number of models of $D \cup G'$ is given by the number of free variables in $D \cup G'$ over $Var(\Phi)$, which are the variables of Ψ as well as half of the variables of $C$, thus $n + |Var(\Psi)|$ free variables. Then, whatever the $n$ selected groups, we always have in case (a):

$m_a = \|D \cup G'\|_{Var(\Phi)} = 2^n \times 2^{|Var(\Psi)|} = (2^n - 1) \times 2^{|Var(\Psi)|} + 2^{|Var(\Psi)|}$ (5)

Since we supposed that $\Psi \not\equiv \top$, then $\|\Psi\|_{Var(\Psi)} < 2^{|Var(\Psi)|}$ and we have $m < m_a$. Consequently, we can not find a subset $G' \subseteq G$ that is a $\langle k, m \rangle$-soft core of Φ for $k = n$ and $m = (2^n - 1) \times 2^{|Var(\Psi)|} + \frac{1}{2} \times 2^{|Y|} - 1$ if we are in case (a).

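b. Let us consider the second case (b). By Dirichlet's drawer principle, $G'$ will only consider one group for each literal, $G' = \{G_{\ell_1}, \dots, G_{\ell_n}\}$, such that $Var(\ell_i) = x_i$ and $x_i \in X$. Then, $D \cup G'$ is equal to $D \cup \{c_{\ell_1}, \dots, c_{\ell_n}\}$, which is equivalent to $\Gamma = D \land c_{\ell_1} \land c_{\ell_2} \land \dots \land c_{\ell_n}$. We can show that the only situation where $s$ is not true by Equation 2 is when $c_{\neg\ell_1}, c_{\neg\ell_2}, \dots, c_{\neg\ell_n}$ are all set to false. Indeed, for all remaining $2^n - 1$ cases, there exists $\ell_i$ such that $c_{\ell_i}$ and $c_{\neg\ell_i}$ are true, which makes $s = \top$. For each assignment $w$ of $c_{\neg\ell_1}, c_{\neg\ell_2}, \dots, c_{\neg\ell_n}$ in these $2^n - 1$ cases we have $\Gamma|_w = \top$. Thus, all variables from Ψ are free and the number of models is $2^{|Var(\Psi)|}$. When we consider the interpretation $w$ that sets all $c_{\neg\ell_1}, \dots, c_{\neg\ell_n}$ to false, $s$ is also set to false by Equation 2. Thus, we get:

$\Gamma|_w \equiv D \land c_{\ell_1} \land c_{\ell_2} \land \dots \land c_{\ell_n} \land \neg c_{\neg\ell_1} \land \neg c_{\neg\ell_2} \land \dots \land \neg c_{\neg\ell_n}$
$\equiv \neg s \land \ell_1 \land \ell_2 \land \dots \land \ell_n \land \neg\Psi \land c_{\ell_1} \land \dots \land c_{\ell_n} \land \neg c_{\neg\ell_1} \land \neg c_{\neg\ell_2} \land \dots \land \neg c_{\neg\ell_n}$ (6)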

Since all the variables except those of $Y$ are units, then $\|\Gamma|_w\|_{Var(\Phi)} = \|(\neg\Psi)|_{\ell_1, \dots, \ell_n}\|_{Var(\Psi)}$. Consequently, in case (b) we have:

$m_b = \|D \cup G'\|_{Var(\Phi)} = (2^n - 1) \times 2^{|Var(\Psi)|} + \|(\neg\Psi)|_{\ell_1, \dots, \ell_n}\|_{Var(\Psi)}$ (7)

Finally, $G'$ is a $\langle k, m \rangle$-soft core of Φ if and only if $G'$ falls in case (b) and $m_b \leq (2^n - 1) \times 2^{|Var(\Psi)|} + \frac{1}{2} \times 2^{|Y|} - 1$, which implies that $\|(\neg\Psi)|_{\ell_1, \dots, \ell_n}\|_{Var(\Psi)} \leq \frac{1}{2} \times 2^{|Y|} - 1$. Since all variables of $X$ are assigned, the last assertion is true only when $(\neg\Psi)|_{\ell_1, \dots, \ell_n}$ has a minority of models over $Y$, which is the case when $\Psi|_{\ell_1, \dots, \ell_n}$ has a majority of models over $Y$. We have proven that there is an assignment ω over $X$ such that the majority of assignments over $Y$ satisfies $\Psi|_\omega$ if and only if there exists a $\langle k, m \rangle$-soft core of Φ.
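The counts of Equations 5 and 7 can be checked by brute force on a tiny instance. The following Python sketch is not part of the proof and rests on stated assumptions: Ψ is given as a Python predicate rather than a CNF, the implications of Equation 3 are read as ranging over every variable of X, the variable s is not enumerated since Equation 2 defines it from C, and the name `reduction_counts` is hypothetical.

```python
from itertools import combinations, product


def reduction_counts(psi, n, y):
    """Map each selection of n groups G_l to the model count of D ∪ G' over Var(Phi).

    psi(xs, ys) -> bool evaluates Ψ; a literal of X is a pair (index, polarity).
    """
    literals = [(i, pol) for i in range(n) for pol in (True, False)]
    counts = {}
    for selection in combinations(literals, n):
        total = 0
        for cs in product([False, True], repeat=2 * n):
            c = dict(zip(literals, cs))
            if not all(c[lit] for lit in selection):
                continue  # the unit clauses of the selected groups force c_l to true
            # Equation 2: s is defined by C, so it never multiplies the count.
            s = any(c[(i, True)] == c[(i, False)] for i in range(n))
            for xs in product([False, True], repeat=n):
                linked = all((not c[(i, True)] or xs[i]) and (not c[(i, False)] or not xs[i])
                             for i in range(n))
                for ys in product([False, True], repeat=y):
                    # Equation 3: s ∨ (links between C and X ∧ ¬Ψ)
                    total += s or (linked and not psi(xs, ys))
        counts[selection] = total
    return counts


n, y = 2, 2
psi = lambda xs, ys: (xs[0] or ys[0]) and (not xs[1] or ys[1])
m = (2**n - 1) * 2**(n + y) + 2**y // 2 - 1
counts = reduction_counts(psi, n, y)
for selection, total in counts.items():
    print(selection, total, total <= m)
```

In this example, inconsistent selections reach 2^n × 2^|Var(Ψ)| models, and the only selection within the bound m is the one encoding x_1 = true and x_2 = false, under which every assignment of Y satisfies Ψ.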

Proposition 1 shows that it is theoretically possible to

leverage an E-MAJSAT solver in order to compute a

k-soft core. However, in practice, the transformation

of an instance of k-soft core into an instance of E-

MAJSAT is not straightforward and more importantly

(to the best of our knowledge) no E-MAJSAT solver

is available. In the next section, we propose a more convenient transformation from an instance of the soft core problem to an instance of MAX#SAT, another $NP^{PP}$-complete problem. MAX#SAT is to determine an assignment of some variables that maximizes the number of models of a given CNF formula. The possibility of using an available tool to approximate MAX#SAT (https://github.com/dfremont/maxcount) is also an argument for using such a translation.


4 EXTRACTING A k-soft core

A naive way to extract a k-soft core from a GCNF

formula would be to enumerate all possible combina-

tions of k groups, extract the selection from the for-

mula and call a model counter to compute its num-

ber of models over the whole alphabet. Obviously, in

practice this is feasible only on very small instances.

Then, we propose an alternative procedure, which

also guarantees completion, that encodes the problem

as an instance of the problem MAX#SAT.
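The naive enumeration mentioned above can be sketched as follows in Python. This is a brute-force illustration under assumptions, not the procedure used in the paper: the model counter is a plain enumeration over the whole alphabet, groups are lists of clauses made of signed integers, and the names `count_models` and `naive_k_soft_core` are hypothetical.

```python
from itertools import combinations, product


def count_models(cnf, alphabet):
    """Brute-force model count of a CNF over the given alphabet."""
    alphabet = sorted(alphabet)
    total = 0
    for values in product([False, True], repeat=len(alphabet)):
        omega = dict(zip(alphabet, values))
        total += all(any(omega[abs(l)] == (l > 0) for l in c) for c in cnf)
    return total


def naive_k_soft_core(D, groups, k):
    """Try every combination of k groups and keep one with the fewest models of D ∪ G'."""
    alphabet = {abs(l) for c in D for l in c} | {abs(l) for g in groups for c in g for l in c}

    def models_of(selection):
        chosen = [c for i in selection for c in groups[i]]
        return count_models(list(D) + chosen, alphabet)

    best = min(combinations(range(len(groups)), k), key=models_of)
    return best, models_of(best)


D = [{-1, -3}]                          # integrity constraint ¬a ∨ ¬c
groups = [[{1}], [{1, 2}], [{-2, 3}]]   # G_1 = a, G_2 = a ∨ b, G_3 = ¬b ∨ c
print(naive_k_soft_core(D, groups, 2))
```

The number of combinations grows as a binomial coefficient and each one requires a model count, which is why this approach only scales to very small instances.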

4.1 Translation to MAX#SAT

MAX#SAT is a recent optimization problem, defined as an extension of the #SAT problem and useful in different applications, such as planning and probabilistic inference (Fremont et al., 2017). Let Ψ be a CNF formula defined on three distinct sets of variables, $X$, $Y$, and $Z$, respectively called maximization, counting and existentially quantified variables. The MAX#SAT problem is to find the truth assignment $\omega_X$ over the variables $X$ that maximizes $\|\exists Z.\Psi(\omega_X, Y, Z)\|_Y$. In other words, $\omega_X$ is the assignment that maximizes the number of assignments to $Y$ such that $\Psi(\omega_X, Y, Z)$ is satisfiable. Thus, MAX#SAT can succinctly be summarized as $\max_X \#Y\, \exists Z.\Psi(X, Y, Z)$. Its decision version is $NP^{PP}$-complete (Fremont et al., 2017).
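As a reference point, the definition can be turned into a brute-force procedure. The Python sketch below simply enumerates the three variable sets; it is an illustration under assumptions (clauses as collections of signed integers, hypothetical function names) and is unrelated to how Maxcount works.

```python
from itertools import product


def satisfies(cnf, omega):
    return all(any(omega[abs(l)] == (l > 0) for l in c) for c in cnf)


def max_sharp_sat_bruteforce(cnf, X, Y, Z):
    """Return (assignment over X, count): the X-assignment maximizing the number of
    Y-assignments that can be extended over Z into a model of the CNF."""
    X, Y, Z = sorted(X), sorted(Y), sorted(Z)
    best = None
    for xs in product([False, True], repeat=len(X)):
        count = 0
        for ys in product([False, True], repeat=len(Y)):
            base = {**dict(zip(X, xs)), **dict(zip(Y, ys))}
            # Existential quantification: one satisfying extension over Z is enough.
            count += any(satisfies(cnf, {**base, **dict(zip(Z, zs))})
                         for zs in product([False, True], repeat=len(Z)))
        if best is None or count > best[1]:
            best = (dict(zip(X, xs)), count)
    return best


# (¬x ∨ y1) ∧ (x ∨ y2 ∨ z) ∧ (¬z ∨ y1), with x=1, y1=2, y2=3, z=4
cnf = [[-1, 2], [1, 3, 4], [-4, 2]]
print(max_sharp_sat_bruteforce(cnf, X={1}, Y={2, 3}, Z={4}))
```

Here setting x to false leaves three assignments of Y that can be completed over z, against two when x is true.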

To the best of our knowledge, the only implementation of MAX#SAT is Maxcount (Fremont et al., 2017), an approximate solver that takes as input a CNF formula and an $(X, Y, Z)$-partition of its variables, $Z$ being possibly empty. Maxcount returns the best truth assignment found over $X$, alongside its approximate projected model count. As it is, it can not be used to extract a k-soft core. We therefore propose an encoding that linearly transforms any k-soft core instance into a MAX#SAT one.

Addition of Selectors. Let $\Phi = D \cup \{G_1, \dots, G_m\}$ be a GCNF formula. First, to be able to extract groups from Φ, a new variable $x_i$ called a selector is added to every clause from the same group $G_i \in \Phi$. For each $i \in \{1 \dots m\}$, we obtain the augmented group $G_i^* = \bigwedge_{j=1}^{|G_i|} (\alpha_j \lor \neg x_i)$. Thus, selectors have the same behavior as identifiers in GCNF: we can interpret groups as sets of clauses gathered together via selectors and the augmented formula as a CNF. Let $Y$ be $Var(\Phi)$ and $X = \{x_1, \dots, x_m\}$ the set of selector variables. Then, we note Ψ the CNF formula obtained, with $\Psi(X, Y) = D \land \bigwedge_{i=1}^{m} G_i^*$. When a selector $x_i$ is fixed to false, the associated set of clauses is satisfied and thus said to be deactivated. Otherwise, the selector is removed and the set of clauses is activated. Conditioning $\Psi(X, Y)$ on any truth assignment $\omega_X$ over $X$ removes all groups whose selectors have been set to false and extracts the remaining ones.
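The effect of selectors can be observed on a small GCNF. In the Python sketch below, the function names `add_selectors` and `condition`, the numbering of the selector variables and the toy groups are all assumptions made for the illustration.

```python
def add_selectors(D, groups, first_selector):
    """Return (Psi, selectors): every clause of group i receives the literal ¬x_i."""
    selectors = [first_selector + i for i in range(len(groups))]
    psi = [frozenset(c) for c in D]
    for x, group in zip(selectors, groups):
        psi += [frozenset(c) | {-x} for c in group]
    return psi, selectors


def condition(cnf, term):
    """Return cnf|term: drop clauses satisfied by the term, then remove falsified literals."""
    term = set(term)
    falsified = {-lit for lit in term}
    return [c - falsified for c in cnf if not (c & term)]


D = [{-1, -3}]                          # integrity constraint ¬a ∨ ¬c
groups = [[{1}], [{1, 2}], [{-2, 3}]]   # G_1 = a, G_2 = a ∨ b, G_3 = ¬b ∨ c
psi, (x1, x2, x3) = add_selectors(D, groups, first_selector=4)

# Activate G_1 and G_3, deactivate G_2.
extracted = condition(psi, [x1, -x2, x3])
print(extracted)
```

Conditioning on the selector assignment leaves exactly the clauses of D, G_1 and G_3, with the selector literals removed.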

Negation of the Formula. Let Θ be $\bigwedge_{\omega_X(x_i) = 1} G_i$, i.e., the subset of groups selected by the assignment to $X$. Then, $\Psi(X, Y)|_{\omega_X} = D \land \Theta$. To make Θ a soft core, we want to find $\omega_X$ that minimizes $\|D \land \Theta\|_Y$. As minimizing the number of models of a propositional formula corresponds to maximizing its number of counter-models, we can use MAX#SAT to compute a soft core from a CNF formula, provided it has been negated first. By De Morgan's law, negating Ψ results in the DNF formula $\Psi^*(X, Y) = \neg D \lor \bigvee_{i=1}^{m} \neg G_i^*$. Thus, solving $\max_X \#Y\, \Psi^*(X, Y)$ extracts a soft core. However, Maxcount asks for a CNF input.

Transformation into a CNF. As pointed out in Section 2, a DNF formula can be linearly transformed into a CNF formula by adding auxiliary variables. To keep the correct number of models, all the auxiliary variables that are not logically defined by $Var(\Psi(X, Y))$ are put into $Z$, which we recall is the set of variables to be existentially quantified. To be more precise, this applies when the Plaisted&Greenbaum scheme is used. As the Tseitin scheme ensures that the number of models is kept, auxiliary variables can either be in $Y$ or $Z$. Whatever the encoding selected, let us call $\widetilde{\Psi}(X, Y, Z)$ the CNF formula that encodes $\Psi^*(X, Y)$.

Activation of k Selectors. Solving $\max_X \#Y\, \exists Z.\widetilde{\Psi}(X, Y, Z)$ results in an assignment $\omega_X$ over $X$ that selects the set of groups minimizing $\|\Psi(\omega_X, Y)\|_Y$, which without further constraint, would correspond to all groups of Φ. To guarantee that exactly $k$ groups are selected, we add over the selectors $X$ the cardinality constraint $\sum_{i=1}^{m} x_i = k$, translated into a CNF formula Γ (Asín et al., 2011) defined over $X$ and $Z_\Gamma$, with $X \cap Z_\Gamma = \emptyset$, $Z_\Gamma$ being the set of auxiliary variables required to generate the selected encoding. The resulting formula is $\Psi'(X, Y, Z') = \widetilde{\Psi}(X, Y, Z) \land \Gamma(X, Z_\Gamma)$ with $Z' = Z \cup Z_\Gamma$. Whenever $\omega_X$ falsifies $\Gamma(X, Z_\Gamma)$, $\|\exists Z'.\Psi'(\omega_X, Y, Z')\|_Y$ would be equal to zero. Therefore, $\omega_X$ can not be the solution to maximizing the number of models and solving $\max_X \#Y\, \exists Z'.\Psi'(X, Y, Z')$ has to return a k-soft core.

To sum up, given a GCNF formula $\Phi = D \land \bigwedge_{i=1}^{m} G_i$, we compute a k-soft core by considering the following MAX#SAT formulation, τ being the transformation chosen to get a CNF from a DNF and χ the CNF encoding of the cardinality constraint:

$\max_X \#Y\, \exists Z'.\left(\tau\left(\neg\left(D \land \bigwedge_{i=1}^{m} \bigwedge_{j=1}^{|G_i|} (\alpha_j \lor \neg x_i)\right)\right) \land \chi\left(\sum_{i=1}^{m} x_i = k\right)\right)$
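The formulation can be exercised end to end on a toy GCNF. The Python sketch below deliberately departs from the encoding on two points, both assumptions of the example: the negated formula and the cardinality constraint are evaluated semantically instead of being translated into CNF through τ and χ, so no auxiliary variable is introduced, and the maximization is done by enumeration. The function names are hypothetical.

```python
from itertools import product


def satisfies(cnf, omega):
    return all(any(omega[abs(l)] == (l > 0) for l in c) for c in cnf)


def k_soft_core_by_counter_models(D, groups, k):
    """Maximize, over selector assignments with exactly k selectors set to true,
    the number of Y-assignments falsifying the selector-augmented formula."""
    Y = sorted({abs(l) for c in D for l in c} | {abs(l) for g in groups for c in g for l in c})
    selectors = [max(Y) + 1 + i for i in range(len(groups))]
    augmented = [set(c) for c in D]
    augmented += [set(c) | {-selectors[i]} for i, g in enumerate(groups) for c in g]
    best = None
    for xs in product([False, True], repeat=len(groups)):
        if sum(xs) != k:
            continue  # cardinality constraint on the selectors
        counter_models = 0
        for ys in product([False, True], repeat=len(Y)):
            omega = {**dict(zip(Y, ys)), **dict(zip(selectors, xs))}
            counter_models += not satisfies(augmented, omega)
        if best is None or counter_models > best[1]:
            best = (tuple(i for i, x in enumerate(xs) if x), counter_models)
    return best


D = [{-1, -3}]                          # integrity constraint ¬a ∨ ¬c
groups = [[{1}], [{1, 2}], [{-2, 3}]]   # G_1 = a, G_2 = a ∨ b, G_3 = ¬b ∨ c
print(k_soft_core_by_counter_models(D, groups, 2))
```

With three variables in Y, the selection of G_1 and G_3 leaves a single model of D ∪ G', hence seven counter-models, which is the maximum over all selections of two groups.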

As already mentioned, there exists only one soft-

ware able to handle the MAX#SAT problem, and it

returns only an approximation. In the next section,

we propose a new and exact approach to tackle the

MAX#SAT problem.

4.2 Algorithm for Computing MAX#SAT

Algorithm 1 provides the pseudo-code of Function max#SAT, which solves MAX#SAT exactly and takes as input a CNF Ψ and an $(X, Y, Z)$-partition of its variables, respectively the maximization, counting and existentially quantified variables. It returns a term $t$, corresponding to an assignment of some variables of $X$, and the projected number of models of $\exists Z.\Psi(t, Y, Z)$ over $Y$.

By construction, $t$ is such that all complete interpretations $w$ of $X$ extending $t$ respect $\|\exists Z.\Psi(t, Y, Z)\|_Y = \|\exists Z.\Psi(w, Y, Z)\|_Y$. Based on the model counter d4 (Lagniez and Marquis, 2017), max#SAT is a top-down tree-search algorithm which is, in our case, decomposed into two parts: (a) as long as the current formula contains variables from $X$, we branch on such variables. We keep the assignment that maximizes the number of projected models, $\|\exists Z.\Psi(t, Y, Z)\|_Y$, which is computed in the second part (b) of the tree as soon as there are no more variables from $X$ to select and by considering in priority variables from $Y$.

We also take advantage of the dynamic decomposition and cache implementation of d4. Let Ψ be the current formula. Ψ can be partitioned into disjoint subformulæ $\{\Psi_1, \dots, \Psi_d\}$ when, for each $i, j \in \{1, \dots, d\}$ with $i \neq j$, $\Psi_i$ and $\Psi_j$ do not share any variable (i.e. $Var(\Psi_i) \cap Var(\Psi_j) = \emptyset$). Then, each subformula $\Psi_i$ is treated separately and their solutions are aggregated afterwards. Furthermore, we use a cache to avoid recomputing an already encountered subformula. Each time a new value of $\langle t, \|\exists Z.\Psi(t, Y, Z)\|_Y \rangle$ is computed, it is stored in a map. If a previously computed formula is found again, then the cache returns the saved combination.

First, at line 1, one tests whether the formula Ψ is satisfiable. If not, whatever the interpretation considered on $X$ is, the corresponding number of models would be equal to zero and max#SAT returns $\langle \emptyset, 0 \rangle$. BCP simplifies Ψ at line 2 and returns an equivalent formula $\Psi'$ alongside the unit literals which were propagated. If $\Psi'$ has already been cached, max#SAT returns cache[$\Psi'$] (line 3). Afterwards, we construct ret (line 4), the temporary result that will be returned at line 21 and which initially contains an empty term with a neutral number of models.

connectedComponent partitions $\Psi'$ into a set of disjoint connected components at line 5. If there is more than one component, then at line 8, the current solution is equal to the aggregation of the solutions of all its sub-components. As they do not share any variables, $\Psi' \equiv \Psi_1 \land \dots \land \Psi_j$. Then, $\exists Z.\Psi' \equiv \exists Z.(\Psi_1 \land \dots \land \Psi_j) \equiv \exists Z.(\Psi_1) \land \dots \land \exists Z.(\Psi_j)$. Therefore, the number of projected models $\|\exists Z.\Psi'\|_Y$ is equal to $\|\exists Z.\Psi_1\|_Y \times \dots \times \|\exists Z.\Psi_j\|_Y$, and the assignments over $X$ are concatenated.

If $\Psi'$ can not be partitioned into more than one component ($j = 1$), a variable $v$ from $X$ (or if $X$ is empty, $Y$) is selected (lines 11-13). If $Y$ is also empty, then the only remaining variables are existentially quantified and the current branch can be stopped, as we already know that the current assignment has an extension on $Z$ that satisfies $\Psi'$. Otherwise, regardless of whether the decision variable is from $X$, we compute via two recursive calls to max#SAT the solutions for conditioning $\Psi'$ by either $v$ or $\neg v$ (lines 15 and 16). If $v$ is a maximization variable (line 17), then the current number of models is equal to the one given by the conditioning that resulted in the higher number of models. If $v$ is a counting variable (line 18), then the current number of models is equal to the sum of the two solutions, as in normal model counters. In line 19, we replace ret by the computed result we just stored in cache[$\Psi'$].

Finally, at line 20, we update the current result

stored in ret by extending its term with the unit lit-

erals of X that have been computed by BCP, and by

multiplying its number of models with the free vari-

ables of Ψ

belonging to Y .
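The branching scheme described above (maximize over $X$, sum over $Y$, test satisfiability over $Z$) can be sketched as follows. This is a deliberately simplified illustration and not the solver evaluated in this paper: it omits BCP, caching and component decomposition, uses an exhaustive satisfiability test instead of a SAT solver, and the names `condition`, `satisfiable` and `max_sharp_sat` are hypothetical.

```python
from itertools import product

def condition(clauses, lit):
    """Assign `lit` to true: drop satisfied clauses, shorten the others."""
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]

def satisfiable(clauses, variables):
    """Exhaustive satisfiability test (stand-in for a SAT solver call)."""
    variables = sorted(variables)
    for bits in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, bits))
        if all(any(model[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def max_sharp_sat(clauses, X, Y, Z):
    """Return (term, count): literals chosen over X and the number of
    assignments to Y that extend them to a model for some assignment to Z."""
    if not satisfiable(clauses, X | Y | Z):
        return frozenset(), 0
    if X or Y:
        # Variables of X are decided before those of Y.
        v = min(X) if X else min(Y)
        pos = max_sharp_sat(condition(clauses, v), X - {v}, Y - {v}, Z)
        neg = max_sharp_sat(condition(clauses, -v), X - {v}, Y - {v}, Z)
        if v in X:
            # Maximization variable: keep the branch with more models.
            (t, c), lit = (pos, v) if pos[1] > neg[1] else (neg, -v)
            return t | {lit}, c
        # Counting variable: add the counts of both branches.
        return frozenset(), pos[1] + neg[1]
    # Only variables of Z remain and the formula is satisfiable.
    return frozenset(), 1

# X = {1}, Y = {2, 3}, Z = {4}; formula (v1 or v2) and (not v1 or v3 or v4).
term, count = max_sharp_sat([[1, 2], [-1, 3, 4]], {1}, {2, 3}, {4})
```

On this toy instance, setting variable 1 to true leaves all four assignments to $Y$ extendable, whereas setting it to false leaves only two, so the sketch returns the term {1} with a count of 4.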

5 EXPERIMENTAL RESULTS

We experimentally evaluated our MAX#SAT encoding of k-soft core instances on both the state-of-the-art approximate solver Maxcount and our complete solver based on the algorithm max#SAT, to compare their performance. To the best of our knowledge, there is no similar notion of soft cores in the literature, so we cannot rely on existing benchmarks. Thus, as performance was not the primary concern in this paper, we only considered small satisfiable instances produced by a random 3-CNF formula

ICAART 2022 - 14th International Conference on Agents and Artiﬁcial Intelligence


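Input: $\Psi$: a CNF formula, $(X, Y, Z)$: a partition of $Var(\Psi)$.
Output: ret $= \langle t, c \rangle$ s.t. any interpretation of $X$ that extends $t$ is a solution for MAX#SAT, and $c$ is the number of models obtained.

if $\Psi$ is unsat then return $\langle \emptyset, 0 \rangle$;
$(\Psi', units) \leftarrow$ BCP($\Psi$)
if cache[$\Psi'$] $\neq \emptyset$ then return cache[$\Psi'$];
ret $\leftarrow \langle \emptyset, 1 \rangle$;
$\{\Psi'_1, \ldots, \Psi'_j\} \leftarrow$ connectedComponent($\Psi'$)
if $j > 1$ then
    for $\Psi'_i \in \{\Psi'_1, \ldots, \Psi'_j\}$ do
        ret $\leftarrow$ ret $\times$ max#SAT($\Psi'_i$, $X$, $Y$, $Z$)
    end
else if $j = 1$ then
    $v \leftarrow$ undef
    if $Var(\Psi') \cap X \neq \emptyset$ then
        $v \leftarrow$ selectVar($Var(\Psi') \cap X$)
    else if $Var(\Psi') \cap Y \neq \emptyset$ then
        $v \leftarrow$ selectVar($Var(\Psi') \cap Y$)
    end
    if $v \neq$ undef then
        $\langle t_1, v_1 \rangle \leftarrow$ max#SAT($\Psi' \wedge v$, $X$, $Y$, $Z$)
        $\langle t_2, v_2 \rangle \leftarrow$ max#SAT($\Psi' \wedge \neg v$, $X$, $Y$, $Z$)
        if $v \in X$ then
            cache[$\Psi'$] $\leftarrow (v_1 > v_2)$ ? $\langle t_1, v_1 \rangle$ : $\langle t_2, v_2 \rangle$
        else
            cache[$\Psi'$] $\leftarrow \langle \emptyset, v_1 + v_2 \rangle$
        end
        ret $\leftarrow$ cache[$\Psi'$]
    end
return ret $\times \langle \{\ell \in units \mid Var(\ell) \in X\},\ 2^{|(Var(\Psi) \setminus (Var(\Psi') \cup Var(units))) \cap Y|} \rangle$

Algorithm 1: Function max#SAT.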

generator, with ten different variables and a clause-to-variable ratio of 4.2. Afterwards, we translated all problems into MAX#SAT instances using the encoding presented in the previous section, with one clause per group. We considered three variants, depending on the transformation selected to get back a CNF. If the Plaisted&Greenbaum scheme was used, then all auxiliary variables were existentially quantified. Otherwise, with the Tseitin scheme, auxiliary variables were added either to the counting set or to the existentially quantified one, which we denote respectively by Tseitin(Y) and Tseitin(Z).
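For readers who want to reproduce instances of the same shape, the following is a minimal sketch of a uniform random 3-CNF generator. The generator actually used is not specified here, so the sampling model (three distinct variables per clause, each negated with probability 1/2), the function name `random_3cnf` and the seed handling are assumptions. The satisfiability filter is not included.

```python
import random

def random_3cnf(num_vars=10, ratio=4.2, seed=0):
    """Draw round(num_vars * ratio) clauses, each over three distinct
    variables, with every literal negated with probability 1/2."""
    rng = random.Random(seed)
    num_clauses = round(num_vars * ratio)
    clauses = []
    for _ in range(num_clauses):
        variables = rng.sample(range(1, num_vars + 1), 3)
        clauses.append([v if rng.random() < 0.5 else -v for v in variables])
    return clauses

formula = random_3cnf()
```

With ten variables and a ratio of 4.2, each generated formula has 42 ternary clauses, which matches the instance size reported in this section.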

For each instance, we measured the time (in seconds) required by max#SAT and Maxcount to terminate, as well as the number of models found. While max#SAT is exact, Maxcount is an approximate solver. Thus, the estimated number of models is important to evaluate correctly the soft cores selected by Maxcount. Furthermore, Maxcount takes as input the number of copies n of the formula to use in the self-composition. We tested each instance with n set to 0, 3 and 5. When n is equal to zero, the assignment to X is drawn at random with no constraints, and during our experiments it always returned a solution that violated the cardinality constraint. However, setting n to a higher value (n = 5) did not result in a better estimated count, and significantly increased the runtime. To be fair, we only report Maxcount with n = 3, the best of the tested parameters.

All the experiments have been conducted on a cluster of Intel XEON X5550 (2.66 GHz) dual-core processors with 32 GiB RAM. Each solver was run with a time-out of three hours and a memory limit of 32 GiB per input instance.

Table 1 reports the results for each considered


Table 1: Computing soft cores from random 3-CNF formulæ.

P&G                       Tseitin(Y)                Tseitin(Z)
max#SAT  Maxcount         max#SAT  Maxcount         max#SAT  Maxcount
time(s)  #mod   time(s)   time(s)  #mod   time(s)   time(s)  #mod   time(s)
102      560    92        135      timeout          107      536    77
91       560    84        147      timeout          101      560    86
91       576    99        136      timeout          111      560    85
99       547    81        151      timeout          95       544    80
85       560    89        130      timeout          105      564    89
105      536    99        148      timeout          100      544    84
95       576    99        137      timeout          125      576    105
107      576    102       153      timeout          102      576    92
94       560    92        137      timeout          115      560    85
107      576    102       142      timeout          97       532    90

variant. As each instance contains 42 clauses and thus 42 selectors, we set k equal to 5 in order to restrict the number of possible combinations, $\binom{42}{5}$ being already equal to 850,668. Yet, the naive method proposed in the introduction of Section 4 did not terminate within the specified time. As the structure stays the same, i.e., instances are all composed of 42 ternary clauses, all formulæ after translation had up to 462 variables, with respectively 1625 and 1667 clauses for the Plaisted&Greenbaum and Tseitin transformations. As max#SAT is complete, it returns the assignment on X that maximizes the number of models of the negated formula. We did not report the number of models computed by max#SAT, as it is always the same, which is here equal to 640 models.
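The size of the search space faced by the naive method, i.e. the number of ways to choose k = 5 selectors among 42, can be recomputed directly. The snippet below is only an arithmetic check; the variable names are illustrative.

```python
from math import comb

num_selectors = 42   # one selector per clause
k = 5                # size of the soft core searched for

# Number of subsets of k selectors that a naive enumeration must consider.
num_subsets = comb(num_selectors, k)
print(num_subsets)
```

Each of these subsets would require its own model-counting call in a naive enumeration, which is consistent with the naive method not terminating within the time limit.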

The experiment shows that Maxcount only terminates when auxiliary variables are put into Z, and is slightly faster if the encoding variables are equivalent to the subformulæ they represent, as in the Tseitin scheme. The results obtained with max#SAT mirror those of Maxcount: besides giving a solution when the auxiliary variables are counted, it is faster with the Plaisted&Greenbaum scheme than with the Tseitin transformation. In most cases, Maxcount is faster than max#SAT, but it also always returns an estimated count that is lower than the optimum. Obviously, this experiment is only an outline of what could be solved by Maxcount and max#SAT, the primary objective being to show that, on small instances, max#SAT is rather competitive with Maxcount. However, on larger instances, Maxcount may scale up better than max#SAT, as it is an approximate solver.
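The difference between the two CNF transformations with respect to counting can be seen on a single gate. The sketch below is a toy illustration, not the encoding used in the experiments: it names the subformula (a AND b) with an auxiliary variable g, and the helper `count_models` is a made-up exhaustive counter.

```python
from itertools import product

def count_models(clauses, num_vars):
    """Count total assignments over variables 1..num_vars satisfying all clauses."""
    count = 0
    for bits in product([False, True], repeat=num_vars):
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            count += 1
    return count

# Variables: 1 = a, 2 = b, 3 = g (auxiliary variable naming "a AND b").
tseitin = [[-3, 1], [-3, 2], [3, -1, -2]]   # g <-> (a AND b)
plaisted_greenbaum = [[-3, 1], [-3, 2]]     # g -> (a AND b) only

tseitin_count = count_models(tseitin, 3)
pg_count = count_models(plaisted_greenbaum, 3)
```

With the equivalence, g is determined by a and b, so counting it leaves the number of models at 4. With the implication only, g may be false when a and b are both true, which gives 5 models. This is one way to read why auxiliary variables can be placed in the counting set with the Tseitin scheme, while they are existentially quantified with Plaisted&Greenbaum.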

6 CONCLUSION AND PERSPECTIVES

In order to explain results from CNF formulæ, we introduced a new notion called soft core, which is a sufficiently small and highly constrained part of a formula. Soft cores can be used to identify which constraints should be relaxed in order to obtain other solutions, to select the most relevant constraints in the formula, or to help when modeling. Identifying soft cores is a bi-criteria optimization problem where both the size and the number of models of the subformula have to be minimized. In this article, we focused on the restricted problem k-soft core, where the size of the soft core is given by the user, which turns it into a single-objective problem. We showed the $\mathsf{NP}^{\mathsf{PP}}$-hardness of the decision version of the k-soft core problem by a reduction from E-MAJSAT, and proposed an encoding to transform k-soft core instances into MAX#SAT ones, as well as a first exact MAX#SAT solver. Finally, an experimental evaluation on small randomly generated CNF formulæ was carried out.

As expected, even the restricted version k-soft core is a difficult problem. At present, as long as the cardinality constraint is considered globally, only small instances can be handled. Indeed, the major part of the search tree corresponds to assignments that falsify the cardinality constraint. Thus, pruning them would improve the performance. Furthermore, the cardinality constraint, in addition to increasing the size of the formula, may prevent it from being partitioned into disjoint components, a decomposition that would otherwise reduce the runtime. Thus, to be more effective when searching for k-soft cores, a first improvement is to handle the cardinality constraint directly. A second perspective is to stop considering the problem as a MAX#SAT instance, and to use instead a dedicated solver whose purpose would be to directly minimize the number of models, using the input formula without transforming it. Finally, we could also compute soft cores approximately by local search: we first pick k groups from the formula, and we swap elements one by one until no swap decreases the number of models admitted.
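The local search perspective can be sketched as follows. This is a hypothetical illustration of the idea only, not an evaluated method: the function names are made up, groups are lists of DIMACS-style clauses, and the exhaustive counter stands in for a real model counter.

```python
import random
from itertools import product

def count_models(clauses, num_vars):
    """Exhaustive model counter (stand-in for a real #SAT solver)."""
    return sum(
        all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
        for bits in product([False, True], repeat=num_vars)
    )

def local_search_soft_core(groups, k, num_vars, seed=0):
    """Start from k random groups and swap one group at a time as long
    as a swap strictly decreases the number of models."""
    rng = random.Random(seed)
    chosen = set(rng.sample(range(len(groups)), k))

    def cost(indices):
        return count_models([c for i in indices for c in groups[i]], num_vars)

    best = cost(chosen)
    improved = True
    while improved:
        improved = False
        for out in sorted(chosen):
            for inn in range(len(groups)):
                if inn in chosen:
                    continue
                candidate = (chosen - {out}) | {inn}
                candidate_cost = cost(candidate)
                if candidate_cost < best:
                    chosen, best, improved = candidate, candidate_cost, True
                    break
            if improved:
                break
    return sorted(chosen), best

# Five single-clause groups over three variables.
groups = [[[1]], [[2]], [[1, 2]], [[-1, -2]], [[3]]]
selected, models = local_search_soft_core(groups, k=2, num_vars=3)
```

Such a procedure only reaches a local minimum in general. On this toy input every local minimum happens to be a global one, with 2 models.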


REFERENCES

Asín, R., Nieuwenhuis, R., Oliveras, A., and Rodríguez-Carbonell, E. (2011). Cardinality networks: a theoretical and empirical study. Constraints, 16(2):195–221.

Aziz, R. A., Chu, G., Muise, C. J., and Stuckey, P. J. (2015). #∃SAT: Projected model counting. In Proceedings of SAT’15, volume 9340 of Lecture Notes in Computer Science, pages 121–137.

Belov, A., Janota, M., Lynce, I., and Marques-Silva, J. (2014). Algorithms for computing minimal equivalent subformulas. Artif. Intell., 216:309–326.

Biere, A., Cimatti, A., Clarke, E. M., and Zhu, Y. (1999). Symbolic model checking without BDDs. In Proceedings of TACAS’99, volume 1579 of Lecture Notes in Computer Science, pages 193–207.

Biere, A., Heule, M., van Maaren, H., and Walsh, T., editors (2009). Handbook of Satisfiability, volume 185 of Frontiers in Artificial Intelligence and Applications. IOS Press.

Bruni, R. and Sassano, A. (2001). Restoring satisfiability or maintaining unsatisfiability by finding small unsatisfiable subformulae. Electron. Notes Discret. Math., 9:162–173.

Dodaro, C., Gasteiger, P., Reale, K., Ricca, F., and Schekotihin, K. (2018). Debugging non-ground ASP programs: Technique and graphical tools. CoRR.

Fremont, D. J., Rabe, M. N., and Seshia, S. A. (2017). Maximum model counting. In Proceedings of AAAI’17, pages 3885–3892.

Heule, M. J. H., Kullmann, O., and Marek, V. W. (2016). Solving and verifying the boolean pythagorean triples problem via cube-and-conquer. In Proceedings of SAT’16, volume 9710 of Lecture Notes in Computer Science, pages 228–245.

Ignatiev, A., Previti, A., Liffiton, M. H., and Marques-Silva, J. (2015). Smallest MUS extraction with minimal hitting set dualization. In Proceedings of CP’15, volume 9255 of Lecture Notes in Computer Science, pages 173–182.

Lagniez, J., Lonca, E., and Marquis, P. (2020). Definability for model counting. Artif. Intell., page 103229.

Lagniez, J. and Marquis, P. (2017). An improved Decision-DNNF compiler. In Proceedings of IJCAI’17, pages 667–673.

Lang, J., Lin, F., and Marquis, P. (2003). Causal theories of action: A computational core. In Proceedings of IJCAI’03, pages 1073–1078.

Liffiton, M. H. and Sakallah, K. A. (2008). Algorithms for computing minimal unsatisfiable subsets of constraints. J. Autom. Reason., 40(1):1–33.

Littman, M. L., Goldsmith, J., and Mundhenk, M. (1998). The computational complexity of probabilistic planning. J. Artif. Intell. Res., 9:1–36.

Mneimneh, M. N., Lynce, I., Andraus, Z. S., Silva, J. P. M., and Sakallah, K. A. (2005). A branch-and-bound algorithm for extracting smallest minimal unsatisfiable formulas. In Proceedings of SAT’05, volume 3569 of Lecture Notes in Computer Science, pages 467–474.

Nadel, A. (2010). Boosting minimal unsatisfiable core extraction. In Proceedings of FMCAD’10, pages 221–229.

Pipatsrisawat, K. and Darwiche, A. (2009). A new d-dnnf-based bound computation algorithm for functional E-MAJSAT. In Proceedings of IJCAI’09, pages 590–595.

Plaisted, D. A. and Greenbaum, S. (1986). A structure-preserving clause form translation. J. Symb. Comput., 2(3):293–304.

Thurley, M. (2006). Sharpsat - counting models with advanced component caching and implicit BCP. In Proceedings of SAT’06, volume 4121 of Lecture Notes in Computer Science, pages 424–429.

Tseitin, G. S. (1983). On the complexity of derivation in propositional calculus. In Automation of Reasoning, pages 466–483.
