-
Effect of stepwise adjustment of Damping factor upon PageRank
Authors:
Subhajit Sahu,
Kishore Kothapalli,
Dip Sankar Banerjee
Abstract:
The effect of adjusting the damping factor α, from a small initial value α0 to the final desired value αf, upon the iterations needed for PageRank computation is observed. The damping factor is adjusted in one or more steps. Results show no improvement in performance over PageRank with a fixed damping factor.
Submitted 9 August, 2021;
originally announced August 2021.
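The stepwise adjustment described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, the L∞ convergence check, the tolerance, the warm start between stages, and the 4-vertex graph are all assumptions made for the example.

```python
def pagerank(out_links, alpha, tol=1e-10, max_iter=500, ranks=None):
    """Power-iteration PageRank; returns (ranks, iterations used)."""
    n = len(out_links)
    r = list(ranks) if ranks is not None else [1.0 / n] * n
    for it in range(1, max_iter + 1):
        # Rank held by dead ends (no out-links) is spread over all vertices.
        dangling = sum(r[u] for u in range(n) if not out_links[u])
        nxt = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for u, targets in enumerate(out_links):
            for v in targets:
                nxt[v] += alpha * r[u] / len(targets)
        # Assumed convergence check: L-infinity norm of the change.
        err = max(abs(a - b) for a, b in zip(nxt, r))
        r = nxt
        if err < tol:
            return r, it
    return r, max_iter


def stepwise_pagerank(out_links, alphas, tol=1e-10):
    """Run PageRank once per damping value, warm-starting each stage
    from the ranks of the previous one; returns (ranks, total iterations)."""
    ranks, total = None, 0
    for alpha in alphas:
        ranks, used = pagerank(out_links, alpha, tol, ranks=ranks)
        total += used
    return ranks, total


graph = [[1, 2], [2], [0], [2]]  # hypothetical graph: vertex -> out-links
fixed_ranks, fixed_iters = pagerank(graph, 0.85)
step_ranks, step_iters = stepwise_pagerank(graph, [0.5, 0.85])
print(fixed_iters, step_iters)
```

Comparing `fixed_iters` with `step_iters` on real graphs is the kind of measurement the abstract reports; both runs should converge to the same ranks, since the final stage uses the same α.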
-
Adjusting PageRank parameters and Comparing results
Authors:
Subhajit Sahu,
Kishore Kothapalli,
Dip Sankar Banerjee
Abstract:
The effect of adjusting the damping factor α and the tolerance τ on the iterations needed for PageRank computation is studied here. The relative performance of PageRank computation with the L1, L2, and L∞ norms used as the convergence check is also compared using six possible mean ratios. It is observed that increasing the damping factor α linearly increases the iterations needed almost exponentially. On the other hand, decreasing the tolerance τ exponentially decreases the iterations needed almost exponentially. On average, PageRank with the L∞ norm as the convergence check is the fastest, closely followed by the L2 norm, and then the L1 norm. For large graphs, above certain tolerance τ values, convergence can occur in a single iteration. Conversely, below certain tolerance τ values, sensitivity issues can begin to appear, causing computation to halt at the maximum iteration limit without convergence. The six mean ratios for relative performance comparison are based on the arithmetic, geometric, and harmonic means, as well as the order of ratio calculation. Among them, GM-RATIO (geometric mean followed by ratio calculation) is found to be the most stable, followed by AM-RATIO.
Submitted 6 August, 2021;
originally announced August 2021.
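One possible reading of the six mean ratios in this abstract is sketched below: three kinds of mean, each applied either before or after the ratio is taken. Only the names GM-RATIO and AM-RATIO come from the abstract; the `RATIO-*` and `HM-RATIO` names, the function name, and the iteration counts are hypothetical and made up for illustration.

```python
from statistics import fmean, geometric_mean, harmonic_mean

MEANS = {"AM": fmean, "GM": geometric_mean, "HM": harmonic_mean}


def mean_ratios(a, b):
    """Compare paired measurements a[i], b[i] with six mean ratios."""
    out = {}
    for name, mean in MEANS.items():
        # Mean first, then ratio (the order the abstract calls e.g. GM-RATIO).
        out[f"{name}-RATIO"] = mean(a) / mean(b)
        # Ratio first, then mean (name assumed for this sketch).
        out[f"RATIO-{name}"] = mean([x / y for x, y in zip(a, b)])
    return out


# Made-up iteration counts on four graphs, for illustration only.
l1_iters = [60, 45, 80, 52]
linf_iters = [50, 40, 64, 47]
ratios = mean_ratios(linf_iters, l1_iters)
for name, value in ratios.items():
    print(f"{name}: {value:.4f}")
```

Under this reading, the geometric mean gives the same value in either order, because the geometric mean of ratios equals the ratio of geometric means; the arithmetic and harmonic variants generally differ with the order.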
-
Sample-and-Gather: Fast Ruling Set Algorithms in the Low-Memory MPC Model
Authors:
Kishore Kothapalli,
Shreyas Pai,
Sriram V. Pemmaraju
Abstract:
Motivated by recent progress on symmetry breaking problems such as maximal independent set (MIS) and maximal matching in the low-memory Massively Parallel Computation (MPC) model (e.g., Behnezhad et al., PODC 2019; Ghaffari-Uitto, SODA 2019), we investigate the complexity of ruling set problems in this model. The MPC model has become very popular as a model for large-scale distributed computing, and it comes with the constraint that the memory-per-machine is strongly sublinear in the input size. For graph problems, extremely fast MPC algorithms have been designed assuming $\tilde{Ω}(n)$ memory-per-machine, where $n$ is the number of nodes in the graph (e.g., the $O(\log\log n)$ MIS algorithm of Ghaffari et al., PODC 2018). However, it has proven much more difficult to design fast MPC algorithms for graph problems in the low-memory MPC model, where the memory-per-machine is restricted to being strongly sublinear in the number of nodes, i.e., $O(n^\epsilon)$ for $0 < \epsilon < 1$.
In this paper, we present an algorithm for the 2-ruling set problem, running in $\tilde{O}(\log^{1/6} Δ)$ rounds whp, in the low-memory MPC model. We then extend this result to $β$-ruling sets for any integer $β > 1$. Specifically, we show that a $β$-ruling set can be computed in the low-memory MPC model with $O(n^\epsilon)$ memory-per-machine in $\tilde{O}(β \cdot \log^{1/(2^{β+1}-2)} Δ)$ rounds, whp. From this it immediately follows that a $β$-ruling set for $β = Ω(\log\log\log Δ)$ can be computed in just $O(β \log\log n)$ rounds whp. The above results assume a total memory of $\tilde{O}(m + n^{1+\epsilon})$. We also present algorithms for $β$-ruling sets in the low-memory MPC model assuming that the total memory over all machines is restricted to $\tilde{O}(m)$.
Submitted 25 September, 2020;
originally announced September 2020.
-
Ramanujan Bipartite Graph Products for Efficient Block Sparse Neural Networks
Authors:
Dharma Teja Vooturi,
Girish Varma,
Kishore Kothapalli
Abstract:
Sparse neural networks have been shown to give predictions competitive in accuracy with their denser versions, while also minimizing the number of arithmetic operations performed. However, current hardware such as GPUs can only exploit structured sparsity patterns for better efficiency. Hence the run time of a sparse neural network may not correspond to the arithmetic operations required.
In this work, we propose the RBGP (Ramanujan Bipartite Graph Product) framework for generating structured multi-level block sparse neural networks by using the theory of graph products. We also propose to use products of Ramanujan graphs, which give the best connectivity for a given level of sparsity. This essentially ensures that (i) the network has the structured block sparsity for which runtime-efficient algorithms exist, (ii) the model gives high prediction accuracy, due to the better expressive power derived from the connectivity of the graph, and (iii) the graph data structure has a succinct representation that can be stored efficiently in memory. We use our framework to design a specific connectivity pattern called RBGP4, which makes efficient use of the memory hierarchy available on GPUs. We benchmark our approach on an image classification task over the CIFAR dataset using VGG19 and WideResnet-40-4 networks and achieve 5-9x and 2-5x runtime gains over unstructured and block sparsity patterns respectively, while achieving the same level of accuracy.
Submitted 2 July, 2020; v1 submitted 24 June, 2020;
originally announced June 2020.
-
Efficient Range Reporting of Convex Hull
Authors:
Jatin Agarwal,
Nadeem Moidu,
Kishore Kothapalli,
Kannan Srinathan
Abstract:
We consider the problem of reporting convex hull points in an orthogonal range query in two dimensions. Formally, let $P$ be a set of $n$ points in $\mathbb{R}^{2}$. A point lies on the convex hull of a point set $S$ if it lies on the boundary of the minimum convex polygon formed by $S$. In this paper, we are interested in finding the points that lie on the boundary of the convex hull of the points in $P$ that also fall within an orthogonal range $[x_{lt},x_{rt}] \times [y_b, y_t]$. We propose an $O(n \log^{2} n)$ space data structure that can support reporting points on a convex hull inside an orthogonal range query in time $O(\log^{3} n + h)$. Here $h$ is the size of the output. This work improves the result of Brass et al. (2013), who build a data structure that uses $O(n \log^{2} n)$ space and has an $O(\log^{5} n + h)$ query time. Additionally, we show that our data structure can be modified slightly to solve other related problems. For instance, for counting the number of points on the convex hull in an orthogonal query rectangle, we propose an $O(n \log^{2} n)$ space data structure that can be queried in $O(\log^{3} n)$ time. We also propose an $O(n \log^{2} n)$ space data structure that can compute the area and perimeter of the convex hull inside an orthogonal range query in $O(\log^{3} n)$ time.
Submitted 23 July, 2013; v1 submitted 22 July, 2013;
originally announced July 2013.
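To make the query in this abstract concrete, here is a brute-force baseline: filter the points by the rectangle, then run Andrew's monotone chain. It takes $O(n \log n)$ time per query and is not the data structure the paper proposes. The function names and sample points are made up, and the sketch reports only hull vertices, dropping points that lie strictly inside a hull edge.

```python
def cross(o, a, b):
    """Z-component of (a - o) x (b - o); positive for a left turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def half(seq):
        chain = []
        for p in seq:
            # Pop while the last two points and p do not make a left turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]  # last point is the first point of the other half

    return half(pts) + half(reversed(pts))


def hull_in_range(points, x_lt, x_rt, y_b, y_t):
    """Hull of the points inside [x_lt, x_rt] x [y_b, y_t] (brute force)."""
    inside = [p for p in points
              if x_lt <= p[0] <= x_rt and y_b <= p[1] <= y_t]
    return convex_hull(inside)


pts = [(0, 0), (4, 0), (4, 4), (0, 4), (2, 2), (1, 3), (9, 9)]
hull = hull_in_range(pts, 0, 5, 0, 5)
print(hull)
```

A baseline like this is mainly useful for checking the output of a faster structure on small inputs.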
-
CPU and/or GPU: Revisiting the GPU Vs. CPU Myth
Authors:
Kishore Kothapalli,
Dip Sankar Banerjee,
P. J. Narayanan,
Surinder Sood,
Aman Kumar Bahl,
Shashank Sharma,
Shrenik Lad,
Krishna Kumar Singh,
Kiran Matam,
Sivaramakrishna Bharadwaj,
Rohit Nigam,
Parikshit Sakurikar,
Aditya Deshpande,
Ishan Misra,
Siddharth Choudhary,
Shubham Gupta
Abstract:
Parallel computing using accelerators has gained widespread research attention in the past few years. In particular, using GPUs for general purpose computing has brought forth several success stories with respect to time taken, cost, power, and other metrics. However, accelerator-based computing has significantly relegated the role of CPUs in computation. As CPUs evolve and also offer matching computational resources, it is important to also include CPUs in the computation. We call this the hybrid computing model. Indeed, most computer systems of the present age offer a degree of heterogeneity and therefore such a model is quite natural.
We reevaluate the claim of a recent paper by Lee et al. (ISCA 2010). We argue that the right question arising out of Lee et al. (ISCA 2010) should be how to use a CPU+GPU platform efficiently, instead of whether one should use a CPU or a GPU exclusively. To this end, we experiment with a set of 13 diverse workloads spanning databases, image processing, sparse matrix kernels, and graphs. We experiment with two different hybrid platforms: one consisting of a 6-core Intel i7-980X CPU and an NVidia Tesla T10 GPU, and another consisting of an Intel E7400 dual-core CPU with an NVidia GT520 GPU. On both these platforms, we show that hybrid solutions offer a good advantage over CPU-alone or GPU-alone solutions. On both these platforms, we also show that our solutions are 90% resource efficient on average.
Our work therefore suggests that hybrid computing can offer tremendous advantages, in the form of significant performance gains and resource efficiency, not only on research-scale platforms but also on the more realistic-scale systems used by the large-scale user community.
Submitted 9 March, 2013;
originally announced March 2013.
-
On the Analysis of a Label Propagation Algorithm for Community Detection
Authors:
Kishore Kothapalli,
Sriram V. Pemmaraju,
Vivek Sardeshmukh
Abstract:
This paper initiates formal analysis of a simple, distributed algorithm for community detection on networks. We analyze an algorithm that we call \textsc{Max-LPA}, both in terms of its convergence time and in terms of the "quality" of the communities detected. \textsc{Max-LPA} is an instance of a class of community detection algorithms called \textit{label propagation} algorithms. As far as we know, most analysis of label propagation algorithms thus far has been empirical in nature, and in this paper we seek a theoretical understanding of label propagation algorithms. In our main result, we define a clustered version of Erdős-Rényi random graphs with clusters $V_1, V_2, \ldots, V_k$, where the probability $p$ of an edge connecting nodes within a cluster $V_i$ is higher than $p'$, the probability of an edge connecting nodes in distinct clusters. We show that even with fairly general restrictions on $p$ and $p'$ ($p = Ω(\frac{1}{n^{1/4-ε}})$ for any $ε > 0$, $p' = O(p^2)$, where $n$ is the number of nodes), \textsc{Max-LPA} detects the clusters $V_1, V_2, \ldots, V_k$ in just two rounds. Based on this and on empirical results, we conjecture that \textsc{Max-LPA} can correctly and quickly identify communities on clustered Erdős-Rényi graphs even when the clusters are much sparser, i.e., with $p = \frac{c\log n}{n}$ for some $c > 1$.
Submitted 13 October, 2012;
originally announced October 2012.
-
Super-Fast 3-Ruling Sets
Authors:
Kishore Kothapalli,
Sriram Pemmaraju
Abstract:
A $t$-ruling set of a graph $G = (V, E)$ is a vertex-subset $S \subseteq V$ that is independent and satisfies the property that every vertex $v \in V$ is at a distance of at most $t$ from some vertex in $S$. A \textit{maximal independent set (MIS)} is a 1-ruling set. The problem of computing an MIS on a network is a fundamental problem in distributed algorithms and the fastest algorithm for this problem is the $O(\log n)$-round algorithm due to Luby (SICOMP 1986) and Alon et al. (J. Algorithms 1986) from more than 25 years ago. Since then the problem has resisted all efforts to yield to a sub-logarithmic algorithm. There has been recent progress on this problem, most importantly an $O(\log Δ\cdot \sqrt{\log n})$-round algorithm on graphs with $n$ vertices and maximum degree $Δ$, due to Barenboim et al. (Barenboim, Elkin, Pettie, and Schneider, April 2012, arxiv 1202.1983; to appear FOCS 2012).
We approach the MIS problem from a different angle and ask whether $O(1)$-ruling sets can be computed much more efficiently than an MIS. As an answer to this question, we show how to compute a 2-ruling set of an $n$-vertex graph in $O((\log n)^{3/4})$ rounds. We also show that the above result can be improved for special classes of graphs such as graphs with high girth, trees, and graphs of bounded arboricity.
Our main technique involves randomized sparsification that rapidly reduces the graph degree while ensuring that every deleted vertex is close to some vertex that remains. This technique may have further applications in other contexts, e.g., in designing sub-logarithmic distributed approximation algorithms. Our results raise intriguing questions about how quickly an MIS (i.e., a 1-ruling set) can be computed, given that 2-ruling sets can be computed in sub-logarithmic rounds.
Submitted 12 July, 2012;
originally announced July 2012.
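The definition of a $t$-ruling set given in this abstract can be checked directly on a small graph. The sketch below is a sequential checker only, not the distributed algorithm of the paper; the function name and the adjacency-dictionary representation of a simple undirected graph are assumptions made for the example.

```python
from collections import deque


def is_t_ruling_set(adj, S, t):
    """Return True if S is independent and every vertex of the graph
    is within distance t of some vertex in S."""
    S = set(S)
    # Independence: no edge may join two vertices of S.
    if any(v in S for u in S for v in adj[u]):
        return False
    # Multi-source BFS from S, cut off at depth t.
    dist = {u: 0 for u in S}
    queue = deque(S)
    while queue:
        u = queue.popleft()
        if dist[u] == t:
            continue  # do not explore beyond distance t
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    # Every vertex must have been reached within t hops.
    return len(dist) == len(adj)


# Hypothetical example: the path 0 - 1 - 2 - 3 - 4.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(is_t_ruling_set(path, {0, 3}, 1))  # an MIS is a 1-ruling set
print(is_t_ruling_set(path, {2}, 2))
print(is_t_ruling_set(path, {2}, 1))
```

On this path, `{0, 3}` is a maximal independent set and hence a 1-ruling set, while `{2}` alone is a 2-ruling set but not a 1-ruling set, which illustrates why larger $t$ allows sparser sets.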
-
Automatic analysis of distance bounding protocols
Authors:
Sreekanth Malladi,
Bezawada Bruhadeshwar,
Kishore Kothapalli
Abstract:
Distance bounding protocols are used by nodes in wireless networks to calculate upper bounds on their distances to other nodes. However, dishonest nodes in the network can render the calculations both illegitimate and inaccurate when they participate in protocol executions. It is important to analyze protocols for the possibility of such violations. Past efforts to analyze distance bounding protocols have only been manual. However, automated approaches are important since they are quite likely to find flaws that manual approaches cannot, as witnessed in the literature on the analysis of key establishment protocols. In this paper, we use the constraint solver tool to automatically analyze distance bounding protocols. We first formulate a new trace property called Secure Distance Bounding (SDB) that protocol executions must satisfy. We then classify the scenarios in which these protocols can operate, considering the (dis)honesty of nodes and the location of the attacker in the network. Finally, we extend the constraint solver so that it can be used to test protocols for violations of SDB in these scenarios, and illustrate our technique on some published protocols.
Submitted 28 March, 2010;
originally announced March 2010.