When the Optimum is also Blind: A New Perspective on Universal Optimization

SPEAKER 1: So today’s talk is going to be given by Michal Wlodarczyk, who’s a PhD student in Warsaw. His advisor is Marek [INAUDIBLE]. And his main work is on, well, it’s very wide. It ranges from online algorithms, as presented here, to approximation algorithms, and also things like hardness and [INAUDIBLE].
MICHAL: OK, so this is a Halloween talk. I’m sorry I don’t have a costume, but I put some bats on the slides. So this is joint work with Stefano, who visited us last week, and also Fabrizio Grandoni and Marek Adamczyk. And I’ll start with what universal optimization is about, introduce a new benchmark, and then go into deeper detail on some of our results.
So universal optimization is about pre-computing a reaction to scenarios that are not known in advance. For example, for set cover we can think about each element as a client and the set covering procedure as a kind of service. Each client wants to be covered, but we don’t know which clients are going to be active. And for each client, we want to choose one of the sets that [INAUDIBLE] covers it. For example, here I put colors on the elements, so this is an example of such an assignment. And afterwards, after this assignment is done, the set of active elements is revealed, and we then need to pick all the sets that were assigned to them.
Another example can be directed Steiner tree. Here each terminal needs to have a path to the root assigned in advance. So these are three terminals with their paths. And then these two are revealed to be the active ones, and this is the solution [INAUDIBLE]. And as we can see, this solution is far from being optimal. And this is because we cannot foresee what exactly we are going to cover.
SPEAKER 1: So this is like a non-adaptive online– I mean, it’s a non-adaptive solution for an online algorithm, right?
MICHAL: Online is too big a word, because there is only one moment of revealing the input.
SPEAKER 1: OK, so you’re saying that this is just a two-step thing. But this could be an–
MICHAL: Non-adaptive online algorithm, yeah.
SPEAKER 1: It can also be seen this way. OK.
MICHAL: So people were interested in the competitive ratio of such an algorithm. How bad can the solution that we offer for the active set be, in comparison to the optimal solution? And it turns out that it can actually be quite bad, even for universal vertex cover. So in this setting, we want to cover each edge with one of its endpoints, and this is equivalent to choosing the direction of each edge. For example, let’s say we have this assignment made, and this is some subset revealed. The optimal solution is to cover these edges with only the one purple node, but our assignment would choose all three of them. And there is an easy argument that there

is some vertex with big out-degree, so there will be one scenario in which this ratio will be really bad.
AUDIENCE: How many scenarios do you have? Do you put a constraint on the number of scenarios here?
MICHAL: So it’s like in set cover. So the number of– ah, OK. The number of scenarios.
AUDIENCE: So for universal set cover or universal vertex cover, you have this question of different scenarios that can show up. And these are explicitly given scenarios?
MICHAL: So we consider three models. A scenario model, where we have a list of all scenarios, and this is part of the input.
AUDIENCE: And like this one, is that what you’re saying here?
MICHAL: Yeah, they’re given as a list, because the number of scenarios is the number of vertices in this example.
AUDIENCE: But you said one vertex. Each scenario is one vertex? [INTERPOSING VOICES]
MICHAL: In this example, each scenario is a subset of edges. [INTERPOSING VOICES] And each subset can be associated with one vertex in this special instance.
AUDIENCE: So each scenario is all edges connected to one vertex?
MICHAL: That’s right.
AUDIENCE: So the optimum solution in each scenario is one.
MICHAL: Yes.
AUDIENCE: No matter if you have high degree [INAUDIBLE]
MICHAL: And for every assignment we can find the vertex where this assignment would provide a bad solution. So I– [INTERPOSING VOICES]
AUDIENCE: That lower bound, then, omega max [INAUDIBLE] is for what model?
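As an editorial aside, the out-degree argument above can be made concrete with a small script. The clique instance and all names here are illustrative assumptions, not taken from the talk’s slides: orienting each edge of K_n toward the endpoint chosen to cover it forces some vertex to have out-degree at least (n − 1)/2 by averaging, and the scenario consisting of that vertex’s incident edges has optimum cost 1.

```python
import random

def worst_scenario_ratio(n, seed=0):
    """Ratio of a universal solution on the clique K_n in its worst scenario.

    A universal vertex cover solution orients each edge toward the
    endpoint chosen to cover it.  Out-degrees sum to n*(n-1)/2, so some
    vertex v has out-degree >= (n-1)/2; in the scenario "all edges
    incident to v" the optimum is just {v} (cost 1), while the
    universal solution pays one distinct vertex per edge leaving v.
    """
    rng = random.Random(seed)
    out_degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            out_degree[rng.choice((u, v))] += 1  # orient edge (u, v)
    return max(out_degree)                       # ratio against OPT = 1

print(worst_scenario_ratio(9))  # always >= (9 - 1) / 2 = 4
```

The Ω(√n) bound mentioned later in the talk uses a different construction; the clique merely illustrates why one bad scenario always exists for any fixed assignment.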
MICHAL: So this is for the scenario model. And it also applies to the oracle model, which is more difficult. But I want to talk about these models of scenarios later.
AUDIENCE: OK. But when using the scenario model, the number of scenarios is polynomial in m and n?
MICHAL: Yes. Or it’s a part of the input, so we don’t want it to be large. So the main message is that the competitive analysis for universal algorithms has [INAUDIBLE] bounds. And one idea is to make the adversary a little weaker. So one might consider a model where the scenario is drawn from a probability distribution that the algorithm knows in advance, and we want to minimize the expected cost of such a solution. And here, if we analyze the previous example more carefully, you can see that we could still get this square root of n lower bound also in this model.
AUDIENCE: [INAUDIBLE], this cost, right?
MICHAL: Yes. And another idea is to consider some family of probability distributions. For example, we can say that each element is activated independently. And for such a model, Fabrizio et al. proposed an algorithm with a competitive ratio logarithmic in m, where m is the number of sets. And m might be exponential with respect to n, so this is still far worse than for classical set cover. And this competitive ratio is tight.
So what we tried to do is to replace this competitive benchmark with something more fair. So in competitive analysis, we are comparing a single algorithm with the expected value of the optimal solution. But this is not given by any algorithm.

This is just the optimal solution given by some oracle. And what we propose is to rather compare against the best assignment: the best algorithm that plays by the same rules as we do. And one can think that this competitive analysis is like comparing apples to oranges, because there are different classes of solutions being compared. And we claim that this is the reason why these lower bounds are so huge.
AUDIENCE: That’s very interesting. Goes back to [INAUDIBLE]. This second model only makes sense if you have [INAUDIBLE] a known distribution, a stochastic model. It does not make sense to define this new thing in the model where you don’t know the solutions in advance.
MICHAL: That’s right.
AUDIENCE: The definition of the space of solutions–
MICHAL: We can also compute something like a worst-case ratio.
AUDIENCE: Unknown situation. OK, so you have to specify what you mean by distributions. The apples. OK, go on.
AUDIENCE: But how can this comparison be made [INAUDIBLE] The first one is handicapping, because instead of the optimal as a specific benchmark, you have a single algorithm. If you want to do distributions, you could do another thing. You’re saying both the algorithm and the benchmark are distribution-specific. One slightly better apples-and-oranges comparison in between is: the algorithm is distribution-agnostic, the distribution is drawn from some bag of distributions you don’t know, whereas the benchmark is distribution-specific. But it’s not instance-specific, because the first one is instance-specific [INAUDIBLE]
AUDIENCE: You tell us what you mean by this. So we still have this question [INAUDIBLE]
MICHAL: So one example of a result in this apples-to-apples benchmark is a result by Karger and Minkoff that shows that Steiner tree admits a constant approximation against this benchmark in the independent activation model. So each terminal is activated independently and we need to connect them to the [INAUDIBLE]
AUDIENCE: This result is in this model?
MICHAL: Yes.
AUDIENCE: So the benchmark that they compare with is the optimum [INAUDIBLE] program type solution for this independent activation model, where you know the distribution in advance? So we know in advance the probability that each client is activated.
MICHAL: Yes. They call it the [INAUDIBLE] problem and they put it in somewhat different words, but it is totally equivalent to this analysis.
AUDIENCE: I never thought about [INAUDIBLE]
MICHAL: OK. So I’m going to say how to approach set cover, which will be the simplest algorithm in our toolbox. So this is the classical LP formulation for set cover. And we replace these variables x_S with configurations. So what changes? Now, we cannot have one variable per set, because the cost of taking a set depends on how many elements would be covered by it, would be assigned to it. So in the objective function, we get this probabilistic term. This is the probability that at least one of the assigned elements is active.
AUDIENCE: So are you–

in the previous language, is this the scenario-based set [INAUDIBLE]?
MICHAL: For now I’m talking model-obliviously. We are given a probability distribution and we can– let’s just assume we can evaluate something like that. And then I will talk more about specific models.
AUDIENCE: So this is the set of what?
MICHAL: Elements that are assigned to S. So if we have an integral solution, then setting this variable to 1 means that all elements from B are assigned to S, and only them.
AUDIENCE: This is like summing over all subsets, right?
MICHAL: Yes. Yes, so now we have an exponential number of variables. This is a configuration LP. So I have a comparison to the previous one. We can see that the objective function changed because of this new term, and the rest is more or less the same.
So there’s an exponential number of variables. In order to solve such an LP, we need to go to the dual. And in the dual LP, we now have a polynomial number of variables and an exponential number of constraints. And if we are able to provide it with a separation oracle, then we can solve it. So for each solution vector, we need to be able to check if it’s a feasible solution. And if we can check something like that in polynomial time, then we can solve the dual. And with some known tricks, we can also solve the primal LP.
So the main observation here is that this activation function is submodular, because it’s a combination of very simple functions: a sum of functions, one for each scenario. And when we take a sum of submodular functions with positive coefficients, this is also submodular.
AUDIENCE: Is this for independent activation or is this general?
MICHAL: General. So what is crucial about submodular functions is that we can minimize them in polynomial time.
AUDIENCE: P is not a family of subsets. It’s subsets of elements. I’m confused.
MICHAL: What is confusing?
AUDIENCE: So this set cover scenario. I’m trying to reverse engineer what is the problem that you’re trying to solve. Why are you writing this LP?
What is the problem that you’re trying to solve? Universal stochastic set cover? Is it that you have different scenarios, and each scenario corresponds to a family of subsets? Or each scenario corresponds to?
MICHAL: So I don’t have this on any slide, but I believe there is nobody watching remotely, so I can just–
AUDIENCE: There is one person. No, he dropped out. There is no one.
MICHAL: OK. So what are we trying to minimize? We are given our universe, a universe of size n, and the family of sets of size m, and a probability distribution, which is over–

AUDIENCE: Elements or sets?
AUDIENCE: Subsets of elements or something.
AUDIENCE: Subsets of elements. Is it [INAUDIBLE] elements or sets?
MICHAL: Sets.
AUDIENCE: Ooh, it’s a probability distribution over scenarios. Each scenario [INAUDIBLE] ideas and scenarios.
AUDIENCE: So these are subsets of u, right?
MICHAL: So each scenario is a subset of elements.
AUDIENCE: Each scenario is a subset of elements.
AUDIENCE: These are the elements at your disposal to cover all the sets that you have to cover. Or no, these are the elements that you have to cover. The family of subsets that you have is fixed. In each scenario, you have the same family. It’s just that– [INTERPOSING VOICES]
AUDIENCE: A scenario is just to capture inputs, right? The input is captured by scenarios, [INAUDIBLE] a probability distribution over scenarios. The probability distribution could be a general probability distribution or an independent probability distribution. That’s all. What is meant by the scenario-independent LP is he doesn’t say whether it’s independent or not.
AUDIENCE: Each scenario is a subset of elements that you have to cover. And the sets that you have to cover them with are fixed. In each scenario, it doesn’t matter.
MICHAL: So to make everything clear, let’s go once again through this example. Now the elements that are not gray are the scenario that was revealed. So this is the subset of elements that we need to cover. And these three sets that we can use are fixed and known from the beginning. So we have this assignment here. And after this scenario is revealed, it induces a family of sets that we need to pay for.
AUDIENCE: So u is the elements and s is the– what is s?
MICHAL: The family of sets.
AUDIENCE: This is not a scenario. It’s what you can use to cover [INAUDIBLE]
MICHAL: Yes, yes.
AUDIENCE: I think what’s confusing me is that s is not a scenario.
AUDIENCE: S is fixed.
AUDIENCE: OK, s is fixed?
I see. Probably the [INAUDIBLE]. OK.
MICHAL: Yeah. And the distribution is also over the family of subsets. And what we are minimizing is the expected value. This is x drawn from the [INAUDIBLE] distribution. And this is the sum over all sets that are in the image of x, and this is the cost of the set.
AUDIENCE: What is the image?
AUDIENCE: Before any scenario is revealed, you are supposed to say, for each element, which set you are going to use to cover it. This is the solution. Yes.
MICHAL: This is the solution. A mapping from elements to sets [INAUDIBLE]
AUDIENCE: You don’t do it with multiplicity, right?
MICHAL: Here, no.
AUDIENCE: You can have an independent set. But then when the scenario shows up, you look at each element. You pick the set that was supposed to cover that element. And then the cost of that solution is the one that you have to pick from your pre-defined solution. The more overlap you have in that choice, the better.
MICHAL: OK, so are we now on the same page?
AUDIENCE: So what can the optimal algorithm do that isn’t captured by this?
AUDIENCE: The optimal is, because the benchmark that he had is the instance-specific optimal. That means you do it after the scenario is revealed, but you can also do it before the scenario is revealed. Those are the two kinds of benchmarks.
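The solution concept just settled, a mapping from elements to sets whose cost is the expected total cost of its image on a random scenario, can be written down directly. The following is a minimal sketch with names of my own choosing, on a tiny hypothetical instance; it also brute-forces the apples-to-apples benchmark, the best mapping.

```python
from itertools import product

def expected_cost(phi, cost, scenarios):
    """Expected cost of a fixed assignment phi: element -> set name.

    scenarios is a list of (probability, frozenset_of_active_elements).
    """
    total = 0.0
    for p, active in scenarios:
        used = {phi[e] for e in active}          # sets we must pay for
        total += p * sum(cost[s] for s in used)
    return total

def best_mapping(universe, sets, cost, scenarios):
    """Brute-force benchmark: the mapping with the best expected cost."""
    choices = [[s for s, elems in sets.items() if e in elems]
               for e in universe]
    best_val, best_phi = None, None
    for combo in product(*choices):
        phi = dict(zip(universe, combo))
        c = expected_cost(phi, cost, scenarios)
        if best_val is None or c < best_val:
            best_val, best_phi = c, phi
    return best_val, best_phi

# Tiny hypothetical instance: overlapping scenarios reward overlap in phi.
universe = ["a", "b", "c"]
sets = {"S1": {"a", "b"}, "S2": {"b", "c"}, "S3": {"c"}}
cost = {"S1": 1.0, "S2": 1.0, "S3": 0.5}
scenarios = [(0.5, frozenset({"a", "b"})), (0.5, frozenset({"b", "c"}))]
print(best_mapping(universe, sets, cost, scenarios)[0])  # 1.25
```

Note how the best mapping sends b to S1 (shared with a) rather than to S2, exactly the "more overlap, the better" point made above.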

AUDIENCE: What’s the new benchmark?
MICHAL: The new benchmark is the best mapping. This is the mapping minimizing this quantity.
AUDIENCE: OK. And then you compare what you can do in polynomial time to this.
AUDIENCE: The mapping that has the best expected cost.
MICHAL: So when this is settled, we can go back.
AUDIENCE: So now go to Karger Minkoff. Go back, go back. This is a nice theorem. But I think for stochastic Steiner tree, I was under the impression that in the independent activation model, you can even compete with the harder– is it true that you can compete with this harder thing? Like there’s a [INAUDIBLE] paper that does that. So this is a weaker result compared to that. It’s more recent. Or am I wrong?
MICHAL: I can’t say if you can get a constant competitive ratio for Steiner tree. I don’t remember.
AUDIENCE: I’ll tell you. I’ll show you the paper I found and then we can discuss [INAUDIBLE]
AUDIENCE: So you’re saying you can compare against the instance-specific optimal if you relax the set [INAUDIBLE] distributions.
AUDIENCE: Yeah, for Steiner tree, [INAUDIBLE]
MICHAL: So are we now– do we know how this corresponds to the problem? So we encode this assignment [INAUDIBLE], this mapping, using variables y, in the same manner as in classical set cover. But we need to take into account this probabilistic term.
AUDIENCE: [INAUDIBLE] by all elements of b or some element of b?
MICHAL: If we consider the optimal integral solution, then it’s exact. But when we do linear relaxation, then of course we allow these variables to be a little bit less strict. But in the optimal integral solution, yes. So coming back. When we go to–
AUDIENCE: In the optimal integral solution, every element in b chose s?
AUDIENCE: And none outside?
MICHAL: So in order to solve the dual LP, we need to provide it with a separation oracle. So for a vector of alphas, we need to check if all these inequalities hold. And the number of inequalities is exponential, because there is a constraint for every subset B, for each S. But since this function g, defined as follows, is submodular, this function at the bottom is also submodular.
AUDIENCE: That function is submodular not under [INAUDIBLE] model or something?
MICHAL: Always. Always.
AUDIENCE: It comes from– yeah, so why [INAUDIBLE] derive it and then observe it, no? It’s not immediate?
MICHAL: It’s quite immediate.
AUDIENCE: OK, so then I’m missing something.
MICHAL: So this– [INTERPOSING VOICES] These stand for the independent model, so– You see why that function is submodular?
AUDIENCE: I think he’s going to write the [INAUDIBLE]
MICHAL: So this is the sum over all subsets. Take the probability of this subset being drawn.

And here, this is 1 if this intersection is non-empty, and 0 if it’s empty. OK? And this function, as a function of B, is submodular.
AUDIENCE: It’s basically the membership.
MICHAL: Yes.
AUDIENCE: [INAUDIBLE]
MICHAL: And we take a combination. So when we add an additive term, we still have a submodular function. And in the separation oracle, we just need to minimize such an expression for each S from the covering family. So we can solve this LP in polynomial time. And the rounding procedure is the same as for classical set cover, so we can round it with a logarithmic loss.
AUDIENCE: Makes sense.
AUDIENCE: Are you trying to tell us an algorithm now? This was an algorithm that you described for set cover. Are you describing an algorithm for this universal stochastic set cover?
MICHAL: But this is the whole algorithm for universal stochastic set cover. And concerning the model, we need to be able to evaluate this function g. And this is trivial in the [INAUDIBLE] model, because we iterate through all scenarios and check how many overlap with a given set B. And in independent activation, we have a formula for that.
For the oracle model, so a model where the scenarios are taken from a black box, and the algorithm can just sample the oracle a few times and, judging by the results, make decisions, we can reduce it to the scenario model in polynomial time. So for unweighted instances, we get a polynomial-time algorithm. But for weighted instances, we can also show a lower bound of square root of n. And the idea is that with [INAUDIBLE] weights, we can hide a relevant part of the input with really low probability. And this becomes as hard as pure universal optimization, because the algorithm cannot learn anything about the distribution on the relevant part.
So how are we with time?
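The activation function just discussed can be sketched in both models (all variable names are my own): in the scenario model, g(B) sums the probabilities of the scenarios intersecting B, and under independent activation there is a closed formula. Submodularity, the property the separation oracle relies on, can be sanity-checked numerically on a random distribution.

```python
import itertools
import random

def g_scenario(B, scenarios):
    """g(B) = Pr[at least one element of B is active] (scenario model)."""
    return sum(p for p, A in scenarios if A & B)

def g_independent(B, prob):
    """g(B) = 1 - prod_{e in B} (1 - p_e) (independent activation)."""
    miss = 1.0
    for e in B:
        miss *= 1.0 - prob[e]
    return 1.0 - miss

# Sanity check of submodularity in the scenario model:
# g(X) + g(Y) >= g(X | Y) + g(X & Y) for all pairs X, Y.
rng = random.Random(1)
U = list(range(5))
weights = [rng.random() for _ in range(6)]
Z = sum(weights)
scenarios = [(w / Z, frozenset(rng.sample(U, 2))) for w in weights]
subsets = [frozenset(c) for k in range(len(U) + 1)
           for c in itertools.combinations(U, k)]
assert all(
    g_scenario(X, scenarios) + g_scenario(Y, scenarios)
    >= g_scenario(X | Y, scenarios) + g_scenario(X & Y, scenarios) - 1e-12
    for X in subsets for Y in subsets)
print("g is submodular on all pairs of subsets")
```

Each scenario contributes a coverage indicator, and a positive combination of coverage indicators is submodular, which is exactly the "sum of functions, one for each scenario" argument above.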
20 minutes. OK. So this was the simplest algorithm from our paper, and now I want to show something more sophisticated. Here I will show how to use the assumption, how to use the independent activation model. So consider the facility location problem. We need to assign clients to facilities, and we need to pay for opening each facility and pay the distance between a client and its facility. And this is the definition of the classical problem without any stochastic stuff. And there is a nice constant-factor approximation, but only when the distance function is given by a metric.

Because if it’s non-metric, one can reduce set cover to [INAUDIBLE] location, and then one cannot get below a logarithmic factor. And when we go to the universal stochastic setting, we are looking for, again, the best assignment. But we pay for opening a facility only when there is at least one active client who wants to use this facility. And the costs of transport are additive [INAUDIBLE]
AUDIENCE: Yeah.
MICHAL: OK. This is a very similar LP formulation to the previous one, but now there is this additional additive term on the right. And we can also solve it, but we don’t know how to round it to get a constant-factor approximation in the scenario model.
But in the independent activation model, we have this nice inequality. Because we can express this function g, the activation function of a set, we can compare it with an almost linear function. And this is also a trick that Karger used in their work on Steiner tree. So when we replace this g of B with this almost linear term, we lose only a constant factor.
So let’s do that. I only replaced g of B with this term, and now this is an almost linear program. And one can interpret this as a rent-or-buy problem. Because when we take a minimum of 1 and a sum of probabilities, one can interpret this as: you can pay 1 and open some facility for everybody, or every client can pay independently when he wants to use it. And using this rent-or-buy interpretation, we can reformulate this LP into an equivalent one. So now this variable x stands for– the purple variables stand for clients contributing to opening the facility for everybody, and the blue variables are clients that contribute to– that pay for opening the facility only for themselves. And now it’s a purely polynomial LP, and we can just solve it.
And in order to round it, we split clients into two groups: the ones that prefer renting or buying. So when we look at only the big clients, so the ones that have the purple
variables summing to at least 0.75, then the middle term of the objective function vanishes. And when this happens, when we keep only the purple variables, this is just classical facility location, but with this additional term g_c next to the distances. So this is facility location in a distorted metric.
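The "nice inequality" from the independent activation model can be checked numerically. This is a sketch in my own notation: g(B) = 1 − ∏(1 − p_e) is sandwiched between (1 − 1/e)·min(1, Σ p_e) and min(1, Σ p_e), so replacing g by the almost linear surrogate loses only the constant factor e/(e − 1) ≈ 1.58.

```python
import math
import random

def g(ps):
    """g(B) = 1 - prod(1 - p_e) under independent activation."""
    miss = 1.0
    for p in ps:
        miss *= 1.0 - p
    return 1.0 - miss

lo_factor = 1.0 - 1.0 / math.e        # ≈ 0.632
rng = random.Random(0)
for _ in range(1000):
    ps = [rng.random() for _ in range(rng.randint(1, 8))]
    lin = min(1.0, sum(ps))           # the almost linear surrogate
    # (1 - 1/e) * min(1, sum p) <= g(B) <= min(1, sum p)
    assert lo_factor * lin - 1e-12 <= g(ps) <= lin + 1e-12
print("surrogate is within factor e/(e-1) of g")
```

The upper bound is the union bound; the lower bound follows from 1 − p ≤ e^(−p) together with concavity of 1 − e^(−x) on [0, 1].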

And this can be solved by a primal-dual scheme with a factor-3 approximation. And when we consider the small clients, so we consider only the blue variables right now, then this LP becomes trivial, because there are no dependencies between clients, and for each client we can buy the facility that is cheapest for him. We need to rescale these variables to keep them feasible, and this is why we chose this constant. So there is a factor-4 rounding scheme for this LP. And since we used the trick that replaces the activation function with this almost linear term, we also lose another constant factor. And, in the end, we obtain something around a 6-approximation.
AUDIENCE: Why is it 4? [INTERPOSING VOICES]
MICHAL: So for the big clients, their constraints are only satisfied up to 0.75, so we need to rescale these variables.
AUDIENCE: Couldn’t you change the thresholds of 3/4 and 1/4 to play with the– I think I’m asking why this division into more than 3/4, more than 1/4 is optimal.
MICHAL: Because of the rescaling procedure. These purple variables for big clients form a feasible solution for facility location in a distorted metric, but only after rescaling by 4/3. And we also lose this factor 3 on them, so this is 4 in total. And we need to rescale the small clients by 4. So this is the optimal threshold.
OK, so this was the sketch for facility location. And this is the overview of what we are able to obtain. So for set cover, we can also adapt the greedy algorithm, and then we get a deterministic algorithm with just a harmonic-number ratio. And it works also for constrained multicover, so set cover where each element might request to be covered multiple times. For vertex cover and edge cover, we are also able to provide the same approximation ratios as in the classical setting. And also for directed Steiner tree, we are able to adapt the easiest approximation algorithm, because there is a kind of approximation scheme for directed Steiner tree, and we
are only able to adapt the first level of the scheme. And in the independent activation model, we have this constant approximation for facility location that I just showed. And in a similar framework, we can also handle multicut. And we have these lower bounds for the oracle model.
And the most interesting question is getting general algorithms. So far we have this assumption of independent activation for the rest of the problems: facility location, multicut, and Steiner tree. For multicut, the best we can hope for is a logarithmic approximation, and we would also want to have it in the scenario model.
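Returning to the earlier audience question about the 3/4 threshold, the arithmetic can be sketched as follows, assuming (as in the talk) a factor-3 guarantee for the big clients: big clients are rescaled by 1/t and then pay the factor 3, small clients are rescaled by 1/(1 − t), and the rounding loss max(3/t, 1/(1 − t)) is minimized where the two terms meet.

```python
# Rounding loss as a function of the big/small threshold t: big clients
# (purple mass >= t) are rescaled by 1/t and then pay the factor-3
# primal-dual guarantee; small clients are rescaled by 1/(1 - t).
def rounding_loss(t):
    return max(3.0 / t, 1.0 / (1.0 - t))

best_t = min((i / 1000.0 for i in range(1, 1000)), key=rounding_loss)
print(best_t, rounding_loss(best_t))   # 0.75 4.0
```

Multiplying the factor 4 by the linearization loss e/(e − 1) ≈ 1.58 gives roughly 6.3, matching the "around 6" figure from the talk.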

So that’s it. [APPLAUSE]