{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "#\n", "# *\"Two-way kernel matrix puncturing: towards resource-efficient PCA and spectral clustering\"*\n", "## -- Supplementary Material --\n", "## -- Python code for the main article figures --\n", "\n", "## Preamble: useful packages and functions" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import numpy as np\n", "import scipy as sp\n", "\n", "import matplotlib.pyplot as plt\n", "import scipy.linalg as lin\n", "import scipy.stats as stats\n", "import scipy.sparse.linalg\n", "import scipy.special\n", "import pandas as pd\n", "import seaborn as sns\n", "\n", "from numpy.random import default_rng\n", "\n", "rng = default_rng(0)\n", "\n", "%load_ext autoreload\n", "\n", "%autoreload 2\n", "\n", "# helper functions from the local punctutils.py file\n", "from punctutils import *\n", "\n", "%matplotlib inline\n", "\n", "plt.rcParams.update({\"font.size\": 12})" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Two-way puncturing of the kernel matrix\n", "\n", "The data matrix is $X\\in \\mathbb{C}^{p \\times n}$, where $p$ is the number of features and $n$ is the number of samples.\n", "\n", "Then\n", "$$K = \\left[ \\frac 1p (X \\odot S)' (X \\odot S) \\right] \\odot B$$\n", "is the $n\\times n$ *two-way punctured* kernel matrix, where $\\odot$ denotes the entrywise (Hadamard) product and:\n", " - $S$ is the $(p \\times n)$ random matrix with iid Bernoulli entries that selects the **data** entries with rate eS (denoted $\\varepsilon_S$ below)\n", " - $B$ is the $(n \\times n)$ random matrix with iid Bernoulli entries that selects the **kernel** entries with rate eB (denoted $\\varepsilon_B$ below)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Simulations\n", "\n", "#### Figure 1.\n", "Eigenvalue distribution $\\nu_n$ of $K$ versus the limiting measure $\\nu$, for $p=200$, $n=4\\,000$, $x_i\\sim .4 \\mathcal N(\\mu_1,I_p)+.6\\mathcal N(\\mu_2,I_p)$ with \n", "$[\\mu_1^T,\\mu_2^T]^T \\sim\\mathcal{N}(0, \\frac1p \\left[\\begin{smallmatrix} 10 & 5.5 \\\\ 5.5 & 15\\end{smallmatrix}\\right]\\otimes I_p)$; \n", "$\\varepsilon_S=.2$, $\\varepsilon_B=.4$. Sample and theoretical spikes are shown as blue and red circles, respectively. The two \"humps\" are reminiscent of the semicircle and Marchenko-Pastur laws." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "image/png": 