Suppose you have 3 sensors: A, B, C. Each has a set of readings that goes into a column of a matrix, b:
A matrix A is constructed such that bA, bB, bC represent compressed measurements in some domain. A is Gaussian, since this is a universal sensing matrix for compressive sensing (CS), due to its poor correlation to anything other than itself. [1]
The solution for each xi (i ∈ A,B,C) represents the data recovered from these compressive measurements and can be found with min ||xi||1, or some other sparsity-promoting technique.
I am interested in the condition when all sensors (A,B,C) agree. Consensus, agreement among multiple sensors, is important for many applications. One example is safing functions. Safing is checking for agreement among, sometimes redundant, sensors before engaging a drastic action, for example, deploying the airbags in the event of an automobile crash. When all sensors agree (or agree enough), then the dangerous function of deploying an airbag towards the driver and passengers can be performed.
In this example, I am interested in the case when each sensor finds some nonzero sparse solution. Previously, I developed the notion of the diverse solution to a set of linear equations that require a positive integer solution. The most diverse solution is the one that has values of roughly similar value. In another post, I showed that the diverse solution can be used to mitigate risk.
For the safing function we want to mitigate the risk that our sensor data is in error, indicating that we should erroneously employ our drastic action. We use multiple redundant sensor to mitigate this risk. The agreement among sensors is maximized when diversity is maximized across all sensors. Even though we are seeking a sparse solution for each individual sensor (for example, min ||xi||1), we seek to diversify the number of sensors that have witnessed a significant event, that is, have found a nonzero sparse solution. So the algorithm is a sparsity-based linear program wrapped inside a diversity-based linear program. When all sensors agree, a peak-to-sum ration (PSR) is minimized, if the sensitivity and units of each sensor are normalized. Since we are nesting two linear programs together, it is important that the innermost program, recovering the compressively-sensed (CS) data, seeks a sparse solution, since this can be computationally efficient.
The l1-norm minimization routine is used as an example of a CS recovery routine and exists inside the maximum diversity routine.
Where x(:,i) indicates the ith column of x. The objective function value of the l1-norm minimization of each column of x is stored in a vector, u. This is the innermost routine. The outermost routine seeks to maximize the diversity of u by minimizing its peak-to-sum ratio.
Let’s look at some example solutions. We’ll just look at types of solutions that we can reach with this algorithm, independent of the b and A that actually generated them.
In the first example, above, all sensors seem to be in agreement that something happened (of magnitude 10). This is indicated by the low PSR, which has a lower bound of 1/N (= 1/3, in this case). There must be a little bit of noise in the readings, since the agreement isn’t perfect, but this could be considered close enough to declare consensus. Notice, also, that the sensors don’t get exactly the same reading; the readings are basically shifted by one index. If the columns of x represent samples in time, then each sensor is phase-shifted with respect to the others. In this algorithm, we don’t have to explicitly time-align the data — which can be a computationally-intensive step. We just accept the data as is, even with small difference between sensors.
In the second example, above, 2 of the 3 sensors seem to agree. One sensor appears to be broken, however, this has not completely destroyed our PSR. We could still limp along with 2 functional sensors for a little while, until we are in a position to address the potentially defective sensor. The algorithm has mitigated the risk of one sensor dying by using a diversity-promoting strategy.
In the third example, above, something is clearly busted. One sensor has detected a significant event, but the majority have not. A simple majority voting scheme would not be suspicious of this data set, but our nested diversity/sparsity linear program is. Notice that the PSR is getting closer to its upper bound, which is 1. At this level of PSR, our algorithm diagnoses our system of sensors as broken and can take action accordingly, instead of making a bad decision.
The strategy of reaching a consensus among sensors by using both sparsity- and diversity-promoting techniques have made this system more robust. However, the way the actual computation is performed hasn’t been made clear yet. Sparsity-based recovery techniques have been well covered and recently, I posted about how to solve a diversity-based integer program. Next I’ll look at how to nest these linear programs.
References
[1] Baraniuk, Richard. “Compressive Sensing,” IEEE Signal Processing Magazine. July 2007.
We know about sparsity for vectors, which is the property of having a lot of 0′s (or elements close enough to 0). The 0′s in a sparse matrix or vector, generally, make computation easier. This has been a boon to sampling systems in the form of compressive sampling, which allows recovery of sampled data by using computationally-efficient algorithms that exploit sparsity.
The opposite of sparsity is density. A vector whose entries are mostly nonzero is said to be “dense,” both in the sense that it is not sparse, as well as being slow and difficult to muddle through calculations on this vector if it is really large. In general, dense vectors are not very useful. Or are they? In this post, I introduce the notion of diverse vectors, a subset of dense vectors, as a more interesting foil to sparsity than simply dense vectors.
This post explores a measure of diversity called the peak-to-sum ratio (PSR). We can use PSR to find the most diverse solution in a linear program, but what can it be used for? We’ll find this linear program optimally distributes quantized, positive units, such as currency or genes. Maximizing diversity, by minimizing PSR, is desirable in several applications:
Investing
Workload Distribution
Product Distribution
Sensor Data Fusion and Machine Learning
In contrast to some systems which allow dense vectors, these systems actually desire the most diverse solution.
Consider the following system of equations:
x1 + x2 + x3 + x4 = 12
x2 - x3 = 0
x4 = 0
The following table gives a partial list of possible solutions (for x1, x2, and x3, since x4 is already given).
x3
x2
x1
0
0
12
1
1
10
2
2
8
3
3
6
4
4
4
5
5
2
6
6
0
7
7
-2
The sparsest solution is highlighted in green, (12, 0, 0, 0). It has the most zeroes in it. It has everything concentrated in one element of the vector, the first element. All other elements are zero, making several computations on that vector faster and easier.
Notice the solution highlighted in yellow, (4,4,4,0). It has very few zeros in it, in fact, none except the one variable that was explicitly set to 0. But there are other solutions that have only one zero, or are, as they say, 1-sparse. For instance, (10,1,1,0) is also 1-sparse. The vector (4,4,4,0) is highlighted and (10,1,1,0) is not because (4,4,4,0) is more diverse. What makes it more diverse? The (4,4,4,0) vector has the most equitable distribution among its elements, which is, conceptually, the opposite of being sparse. Whereas the most sparse solution has its energy concentrated in the fewest elements, the most diverse solution has its energy distributed to the most elements. In fact, (10,1,1,0) could be considered to be more sparse than (4,4,4,0) as it is closer to having its energy concentrated in one vector (first posited in [1]).
So if we’re talking about more or less diverse/sparse, this implies that we can quantify it. One way to quantify it is to look at the peak-to-sum ratio (PSR). The peak to sum ratio is defined as the ratio of the largest element in x to the sum of all elements of x.
PSR can also be expressed as the ratio of two norms, the infinity norm and the first norm. Let’s see what this ratio looks like for the solutions we considered above, plus a few more.
x3
x2
x1
PSR
0
0
12
1.000
1
1
10
0.833
2
2
8
0.667
3
3
6
0.500
4
4
4
0.333
5
5
2
0.417
6
6
0
0.500
7
7
-2
0.438
8
8
-4
0.400
9
9
-6
0.375
10
10
-8
0.357
11
11
-10
0.344
12
12
-12
0.333
13
13
-14
0.325
The (4,4,4,0) solution appears to have the lowest PSR, until we proceed to solution (13,13,-14,0). As we proceed past (13,13,-14,0), PSR becomes vanishingly small. In fact, we see that (4,4,4,0) was just a local minima for PSR.
This is unsatisfying, for a couple of reasons. The first is linear programming. We can’t use a linear program to minimize PSR and arrive at the most diverse solution because PSR is not convex, that is, we can’t be sure we’re on a path to smaller and smaller PSR because we could encounter a local minima, like we did at (4,4,4,0).
The second way this is unsatisfying is that (13,13,-14,0) does not seem more diverse than (4,4,4,0). If we’re considering the elements of x to be bins of energy, or some other thing that is to be distributed, then having an element that is -14 does not make any sense. Instead of distributing, or deciding how things are to be allocated, we are actually taking them away, which defeats the whole purpose of what we are trying to do. So let’s add in a constraint that x has to be greater than 0.
For this particular problem, we can proceed in a linear fashion from the first candidate solution (12,0,0,0) to the most diverse one (4,4,4,0). However, in general, is this problem convex? That is, could we choose another set of equations and be able to march towards the most diverse answer?
Consider the definition of convexity for a feasible set of solutions. A set is convex if, when proceeding from one point to the next, you stay in that set [2]. It helps to rewrite our problem in terms of linear algebra. We seek the most diverse solution to a set of linear equations,
where Ax=b represents our equations we’re trying to solve. Since x cannot be negative, we know that,
Is this convex? The shaded region in the following graph shows the x’s that satisfy the quantity (PSR) to be minimized when it’s less than or equal to 1.
In the example in the graph above, there are only two elements in x. The possible values of x outside of the shaded region, when either x1 or x2 or both are negative, are not in the set of feasible solutions. So do the answers in this shaded region form a convex set? Yes, because we can pick two points in the shaded region, and draw a straight line between them that has to stay in the shaded region.
Notice if we restrict values of the elements of x to the integers, we still have a convex set! This result is satisfying because we’ll want to distribute units of energy, currency, etc…, anyway.
In the language of linear algebra,
where Z represents the set of integers. So the two constraints we’ve added to our convex optimization problem to find the most diverse solution:
Must not contain a negative distribution (x greater than or equal to 0)
Must have a base unit of distribution (x must contain integers)
We can now use an integer program to perform the optimization. Restricted to positive, integer values, when we minimize PSR, our linear program will march algorithmically, inexorably to the most diverse solution to a system of linear equations.
References
[1] Zonoobi et al. “Gini Index as Sparsity Measure for Signal Reconstruction from Compressive Samples,” IEEE Journal of Selected Topics in Signal Processing. vol 5, no. 5, Sept. 2011.
So say you want to use compressive sensing to lower the sampling rate of your measurement system. You first need to measure it in a basis which is sparse. You do this so you can sample it randomly far below Nyquist and save power in the process. However, what if the signal you are trying to measure is sparse in time? Then you’re in a little bit of a pickle, because you can’t sample this signal randomly without potentially losing information.
What I learned last week is that to compressively sense signals that are sparse in time, you need to first transform them into a domain that is not sparse in time and then sample randomly. One way to do this is to use the Hadamard transform to convert your signal of interest which is sparse in the time domain into a signal that is sparse in the frequency domain without any loss of information. The Hadamard transform is like taking a bunch of little DFT’s on your signal and putting them into a matrix.
However, if your signal is sparse in time, you probably still care exactly when your signal did the thing that is of interest to you, when it did whatever it was that you couldn’t afford to miss by sampling randomly. The latency or the group delay of your compressive sensing of a sparse in time signal is, ostensibly, going to be a pretty important piece of information to you. Unfortunately, I cannot find one paper that addresses latency or group delay in compressive sensing. To be sure, this has got to be incredibly difficult, if not impossible to predict. I would think that the iterative nature of the recovery algorithm alone (even though the order of its computational complexity can be assessed) must be very difficult to quantify in terms of time or samples.
Don’t worry, it’s just the fundamentals I don’t get; but it’s not for lack of traffic on the subject. There are some great resources on the internet about compressive sensing — which is a groundbreaking technique for reconstructing seemingly lost signals. Igor Carron has a great up-to-the-minute blog, Nuit Blanche on the latest news on compressive sensing and Rich Baranuik has a treasure trove of papers on the subject at http://dsp.rice.edu/cs. NPR did a story on compressive sensing that provides a good illustration of compressive sensing in practice. They have an example of an undersampled piece of music reconstructed using compressive sensing techniques.
Human ears usually pick up frequencies under 16 kHz, some as high as 20 kHz. Most speech is easily recognizable with only the frequency components at or below 4 kHz (for phone calls, speech is usually sampled at 8 kHz). CD’s are sampled at a rate of 44 kHz. In the demonstration on NPR, a musical track is sampled at a rate on the order of 100′s of Hz.
Hit Play on the control above to hear the undersampled version (sampled at ~100′s of Hz).
By using some fancy math, Jordan Ellenberg, from the University of Wisconsin, is able to faithfully reconstruct the original recording.
Hit Play on the control above to hear the sample reconstructed.
Compressive sensing has a wide range of possible applications. In general, there are usually three cases where compressive sensing is appropriate to use in a sampled system.
When the data is soon compressed after sampling
When the data is soon encoded for transmission
When the data is sparse
As it turns out, compressive sensing relies on some fairly complicated matrix algebra to work. Below are some of the topics I’ve encountered while reading about compressive sensing to try to figure out how it works.
Sparsity
A matrix is considered sparse if it can be represented with a much smaller set of values than its full size without much loss of information. For instance, take the matrix [1.0, 0.01, 0.01, 0.01, 0.01] and envision a scenario where it is treated, for all intents and purposes as [1.0, 0.0, 0.0, 0.0, 0.0]. If that representation is good enough for whatever calculation is performed on it, then the matrix is sparse. Say you wanted to know roughly what the average value of that matrix is. You could say it’s about 0.2 (as opposed to 0.208, which would be more accurate). The advantage is a far faster calculation: 1.0 / 5, instead of (1.0 + 0.1 + 0.1 + 0.1 + 0.1) / 5.
L1 Norm Minimization
L1 norm minimization is the technique the receiver employs to reconstruct the compressively sensed signal. The Wired article on compressive sensing gives a great overview of L1 minimization. In the example in the article, the L1 minimization algorithm creates a high resolution image from a low resolution one. It does this by
filling in shapes of color based on nearby known pixels. It iteratively follows this procedure, using smaller and smaller shapes until the picture is complete.
The L1 norm is a measure of distance in a 2-dimensional Cartesian coordinate system that gives the distance between two points. However, this distance is calculated as the sum of absolute difference of the coordinates of the two points. This is also known as taxicab geometry. It assumes that the only way to travel from one point to the other is in a gridded, rectilinear fashion, like streets in a major metropolitan area.
But how does L1 minimization preform this magic? I don’t know. This feat, I guess, is usually executed on a host CPU because when performace metrics are given for a new approach to L1 norm minimization, they are given with the qualification of what kind of machine was used. What’s more, this particular bit of matrix math, the part that actually recovers the desired signal, is iteratively executed. This means there’s really no telling how long this will take, which seems to make compressive sensing inappropriate for real-time sensing applications that rely on low-latency, such as feedback control loops.
Restricted Isometry Property
Isometry is a distance-preserving map between metric spaces.
characterizes matrices which are nearly orthonormal, at least when operating on sparse vectors. … There are no known large matrices with bounded restricted isometry constants, but many random matrices have been shown shown to remain bounded. The current smallest upper bounds for any large rectangular matrices are for those of Gaussian matrices
and further it defines RIP as….
Let A be an m × p matrix and let s < p be an integer. Suppose that there exists a constant δs such that, for every m × s submatrix As of A and for every vector y,
Then, the matrix A is said to satisfy the s-restricted isometry property with restricted isometry constant δs.
This really just completely confuses me. I guess the restricted isometry constant bounds the distance preserving transform between the two spaces? I have no idea, I guess I need to learn some more matrix math…
Coherence
Compressive sensing depends on mutual coherence, which is a matrix of the cross-correlations. But between what and what? I don’t really know, but I think it’s between acquired samples. That is, one sample and the next sample should have low coherence, so that they appear to be random and random measurements can be used for signals sparse in any basis.
I’m sure there’s plenty more I don’t understand, but that’s all I have for now.