<?xml version="1.0" encoding="UTF-8" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">

    <channel>
        <title>A Not-So Primordial Soup</title>
        <link>https://blog.quipu-strands.com</link>
        <atom:link href="https://blog.quipu-strands.com/feed.xml" rel="self" type="application/rss+xml" />
        <description></description>
        <lastBuildDate>Tue, 03 Feb 2026 00:51:19 -0800</lastBuildDate>
        
        <item>
            <title>The Gumbel-Max Trick</title>
            <link>
                https://blog.quipu-strands.com/gumbel
            </link>
            <description>
                &lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdn.jsdelivr.net/npm/mathjax@3/es5/tex-mml-chtml.js&quot;&gt;
&lt;/script&gt;

&lt;figure&gt;
  &lt;p&gt;&lt;img src=&quot;/assets/gumbel/loaded_die_banner.png&quot; alt=&quot;biased_die&quot; width=&quot;100%&quot; /&gt;&lt;/p&gt;

  &lt;figcaption style=&quot;text-align:center; color: #777;&quot;&gt;

  &lt;/figcaption&gt;
&lt;/figure&gt;

&lt;p&gt;When we set out to learn a function or some property of it (like its maximum), we hope it is &lt;em&gt;differentiable&lt;/em&gt;, because that means we have at our disposal a host of well-studied, and often fast, techniques. But sometimes we are not so lucky - and then there are broadly two options: (a) use a technique that doesn’t rely on differentiability, e.g., Bayesian Optimization, or (b) use an approximation that is differentiable. The topic of this post is a very useful and elegant instance of the latter. It’s a technique to make sampling from a categorical distribution differentiable, using the &lt;em&gt;Gumbel&lt;/em&gt; distribution &lt;a class=&quot;citation&quot; href=&quot;#mcfadden_conditional_1974&quot;&gt;(McFadden, 1974; Jang et al., 2017; Maddison et al., 2017)&lt;/a&gt;. The need to differentiate through the categorical distribution shows up in various places &lt;a class=&quot;citation&quot; href=&quot;#gumbel_review&quot;&gt;(Huijben et al., 2023)&lt;/a&gt;. About the trick, &lt;a href=&quot;https://arxiv.org/pdf/1903.06059&quot;&gt;this&lt;/a&gt; paper &lt;a class=&quot;citation&quot; href=&quot;#DBLP:journals/corr/abs-1903-06059&quot;&gt;(Kool et al., 2019)&lt;/a&gt; says:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;We think the Gumbel-Max trick is like a magic trick.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;I agree! Let’s get started.&lt;/p&gt;

&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#why-differentiability&quot; id=&quot;markdown-toc-why-differentiability&quot;&gt;Why Differentiability&lt;/...
            </description>
            <pubDate>Tue, 03 Feb 2026 00:38:00 -0800</pubDate>
            <guid>
                https://blog.quipu-strands.com/gumbel
            </guid>
        </item>
        
        <item>
            <title>Evaluating LLMs - Notes on a NeurIPS'24 Tutorial</title>
            <link>
                https://blog.quipu-strands.com/eval-llms
            </link>
            <description>
                &lt;script type=&quot;text/x-mathjax-config&quot;&gt; 
    MathJax.Hub.Config({ 
        &quot;HTML-CSS&quot;: { scale: 100, linebreaks: { automatic: true } }, 
        SVG: { linebreaks: { automatic:true } }, 
        displayAlign: &quot;center&quot; });
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;

&lt;/script&gt;

&lt;!-- _includes/image.html --&gt;
&lt;div class=&quot;image-wrapper&quot;&gt;
    
        &lt;img src=&quot;/assets/eval_llm/banner.png&quot; alt=&quot;test&quot; /&gt;
    
    
        &lt;p class=&quot;image-caption&quot;&gt;&lt;/p&gt;
    
&lt;/div&gt;

&lt;style&gt;
@import url(&#39;https://fonts.googleapis.com/css2?family=Indie+Flower&amp;display=swap&#39;);
&lt;/style&gt;

&lt;p&gt;I attended NeurIPS’24 virtually, and I was happy to see that they had two tutorials on topics that I care about. One was on evaluating LLMs, and the other one was on decoding-time strategies. This post covers the former. I have been meaning to publish this for a while, but this languished as a draft for a long time while life got in the way. Well.&lt;/p&gt;

&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#introduction&quot; id=&quot;markdown-toc-introduction&quot;&gt;Introduction&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#tutorial-intro---irina-sigler&quot; id=&quot;markdown-toc-tutorial-intro---irina-sigler&quot;&gt;Tutorial Intro - Irina Sigler&lt;/a&gt;&lt;/li&gt;
  &lt;li&gt;&lt;a href=&quot;#quality-evaluation---yuan-xue-talk-irina-sigler-code&quot; id=&quot;markdown-toc-quality-evaluation---yuan-xue-talk-irina-sigler-code&quot;&gt;Quality Evaluation - Yuan Xue (talk), Irina Sigler (code)&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#scope-of-evaluation&quot; id=&quot;markdown-toc-scope-of-evaluation&quot;&gt;Scope of Evaluation&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#how...
            </description>
            <pubDate>Thu, 06 Mar 2025 09:00:00 -0800</pubDate>
            <guid>
                https://blog.quipu-strands.com/eval-llms
            </guid>
        </item>
        
        <item>
            <title>Inactive Learning?</title>
            <link>
                https://blog.quipu-strands.com/inactive_learning
            </link>
            <description>
                &lt;script type=&quot;text/x-mathjax-config&quot;&gt; 
    MathJax.Hub.Config({ 
        &quot;HTML-CSS&quot;: { scale: 100, linebreaks: { automatic: true } }, 
        SVG: { linebreaks: { automatic:true } }, 
        displayAlign: &quot;center&quot; });
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;

&lt;/script&gt;

&lt;p&gt;I totally stole the title from a paper &lt;a class=&quot;citation&quot; href=&quot;#10.1145/1964897.1964906&quot;&gt;(Attenberg &amp;amp; Provost, 2011)&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In theory, &lt;em&gt;Active Learning (AL)&lt;/em&gt; is a tremendous idea. You need labeled data, but labeling comes at a cost, e.g., you need to obtain the labels from a domain expert. Now, let’s say your goal is to use this labeled data to train a classifier that gets to a held-out accuracy of \(90\%\). If you randomly sampled points to label, you might require \(1000\) points. Active Learning lets you &lt;em&gt;strategically&lt;/em&gt; pick just \(500\) points for labeling, to reach the same accuracy. Half the labeling cost for the same outcome. This is great!&lt;/p&gt;

&lt;p&gt;Except that in a lot of real-world cases this is not how it plays out. I suspected this from my personal experiments, and then from some work we did at &lt;a href=&quot;https://www.247.ai/&quot;&gt;[24]7.ai&lt;/a&gt;. So we decided to thoroughly test out multiple scenarios in text classification, where you would believe (or current literature would lead you to believe) Active Learning &lt;em&gt;should&lt;/em&gt; work … but it just doesn’t. We summarized our observations in the paper &lt;em&gt;“On the Fragility of Active Learners for Text Classification”&lt;/em&gt; &lt;a class=&quot;citation&quot; href=&quot;#fragilityActive&quot;&gt;(Ghose &amp;amp; Nguyen, 2024)&lt;/a&gt; [&lt;a href=&quot;https://arxiv.org/pdf/2403.15744&quot;&gt;PDF&lt;/a&gt;], and th...
            </description>
            <pubDate>Wed, 25 Sep 2024 12:00:00 -0700</pubDate>
            <guid>
                https://blog.quipu-strands.com/inactive_learning
            </guid>
        </item>
        
        <item>
            <title>Jensen's Inequality - A Visual Intuition</title>
            <link>
                https://blog.quipu-strands.com/jensens_inequality
            </link>
            <description>
                &lt;script type=&quot;text/x-mathjax-config&quot;&gt; 
    MathJax.Hub.Config({ 
        &quot;HTML-CSS&quot;: { scale: 100, linebreaks: { automatic: true } }, 
        SVG: { linebreaks: { automatic:true } }, 
        displayAlign: &quot;center&quot; });
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;

&lt;/script&gt;

&lt;p&gt;&lt;strong&gt;Jensen’s inequality&lt;/strong&gt; finds widespread application in mathematical proofs. I am fond of a particular intuitive explanation of it, which doesn’t seem to be very popular. I will try to present it in brief here.&lt;/p&gt;

&lt;p&gt;I am not sure when this argument originated, but Google does turn up a paper &lt;a class=&quot;citation&quot; href=&quot;#jensens_needham&quot;&gt;(Needham, 1993)&lt;/a&gt;. Even if this is not the source, it is a good reference. On a related note, the author of the paper, Tristan Needham, has the very well-reviewed book “&lt;a href=&quot;https://www.amazon.com/Visual-Complex-Analysis-Tristan-Needham/dp/0198534469&quot;&gt;Visual Complex Analysis&lt;/a&gt;” to his credit.&lt;/p&gt;

&lt;p&gt;For the record, &lt;a href=&quot;https://en.wikipedia.org/wiki/Jensen%27s_inequality&quot;&gt;Wikipedia&lt;/a&gt; states one version in this way:&lt;/p&gt;

&lt;!-- _includes/image.html --&gt;
&lt;div class=&quot;image-wrapper&quot;&gt;
    
        &lt;img src=&quot;/assets/jensens/wiki.png&quot; alt=&quot;test&quot; /&gt;
    
    
        &lt;p class=&quot;image-caption&quot;&gt;Jensen&#39;s Inequality, as on Wikipedia.&lt;/p&gt;
    
&lt;/div&gt;

&lt;p&gt;But don’t read too much into this yet; we’ll discover this form along the way.
This is going to sound surprising, but let’s start with the idea of the &lt;em&gt;center of mass (CM)&lt;/em&gt;.&lt;/p&gt;

&lt;h2 id=&quot;center-of-mass&quot;&gt;Center of Mass&lt;/h2&gt;

&lt;p&gt;For our purposes, we can think of th...
            </description>
            <pubDate>Sun, 15 Sep 2024 23:44:00 -0700</pubDate>
            <guid>
                https://blog.quipu-strands.com/jensens_inequality
            </guid>
        </item>
        
        <item>
            <title>Bayesian Optimization, Part 2: Acquisition Functions</title>
            <link>
                https://blog.quipu-strands.com/bayesopt_2_acq_fns
            </link>
            <description>
                &lt;script type=&quot;text/x-mathjax-config&quot;&gt; 
    MathJax.Hub.Config({ 
        &quot;HTML-CSS&quot;: { scale: 100, linebreaks: { automatic: true } }, 
        SVG: { linebreaks: { automatic:true } }, 
        displayAlign: &quot;center&quot; });
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;

&lt;/script&gt;

&lt;!-- _includes/image.html --&gt;
&lt;div class=&quot;image-wrapper&quot;&gt;
    
        &lt;img src=&quot;/assets/bayesopt/bayesopt_cover.png&quot; alt=&quot;test&quot; /&gt;
    
    
        &lt;p class=&quot;image-caption&quot;&gt;&lt;/p&gt;
    
&lt;/div&gt;

&lt;p&gt;This post continues our discussion on BayesOpt. This is &lt;strong&gt;part-2 of a two-part series&lt;/strong&gt;. 
Now we take a look at the other pillar BayesOpt rests on: acquisition functions. My goal is to provide a flavor by looking at a few of them. I’ll go into depth for a couple; this will help us appreciate the role of GPs in conveniently calculating acquisition values. For the rest, I’ll provide an overview.&lt;/p&gt;

&lt;ul id=&quot;markdown-toc&quot;&gt;
  &lt;li&gt;&lt;a href=&quot;#acquisition-functions&quot; id=&quot;markdown-toc-acquisition-functions&quot;&gt;Acquisition Functions&lt;/a&gt;    &lt;ul&gt;
      &lt;li&gt;&lt;a href=&quot;#probability-of-improvement&quot; id=&quot;markdown-toc-probability-of-improvement&quot;&gt;Probability of Improvement&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#expected-improvement&quot; id=&quot;markdown-toc-expected-improvement&quot;&gt;Expected Improvement&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#predictive-entropy-search&quot; id=&quot;markdown-toc-predictive-entropy-search&quot;&gt;Predictive Entropy Search&lt;/a&gt;&lt;/li&gt;
      &lt;li&gt;&lt;a href=&quot;#max-value-entropy-search&quot; id=&quot;markdown-toc-max-value-entropy-search&quot;&gt;Max-value Entropy ...
            </description>
            <pubDate>Sat, 18 Nov 2023 11:00:00 -0800</pubDate>
            <guid>
                https://blog.quipu-strands.com/bayesopt_2_acq_fns
            </guid>
        </item>
        
        <item>
            <title>Bayesian Optimization, Part 1: Key Ideas, Gaussian Processes</title>
            <link>
                https://blog.quipu-strands.com/bayesopt_1_key_ideas_GPs
            </link>
            <description>
                &lt;script type=&quot;text/x-mathjax-config&quot;&gt; 
    MathJax.Hub.Config({ 
        &quot;HTML-CSS&quot;: { scale: 100, linebreaks: { automatic: true } }, 
        SVG: { linebreaks: { automatic:true } }, 
        displayAlign: &quot;center&quot; });
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;

&lt;/script&gt;

&lt;!-- _includes/image.html --&gt;
&lt;div class=&quot;image-wrapper&quot;&gt;
    
        &lt;img src=&quot;/assets/bayesopt/bayesopt_cover.png&quot; alt=&quot;test&quot; /&gt;
    
    
        &lt;p class=&quot;image-caption&quot;&gt;The real reason I like Bayesian Optimization: lots of pretty pictures!&lt;/p&gt;
    
&lt;/div&gt;

&lt;p&gt;If I wanted to sell you on the idea of &lt;em&gt;Bayesian Optimization (BayesOpt)&lt;/em&gt;, I’d just list some of its applications:&lt;/p&gt;
&lt;ul&gt;
  &lt;li&gt;Hyperparameter Optimization (HPO) &lt;a class=&quot;citation&quot; href=&quot;#bayesopt_is_superior&quot;&gt;(Turner et al., 2021)&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Neural Architecture Search (NAS) &lt;a class=&quot;citation&quot; href=&quot;#white2019bananas&quot;&gt;(White et al., 2021)&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Molecule discovery &lt;a class=&quot;citation&quot; href=&quot;#LSBO_paper&quot;&gt;(Gómez-Bombarelli et al., 2018)&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Liquid chromatography &lt;a class=&quot;citation&quot; href=&quot;#BOELRIJK2023340789&quot;&gt;(Boelrijk et al., 2023)&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Creating low-carbon concrete &lt;a class=&quot;citation&quot; href=&quot;#ament2023sustainable&quot;&gt;(Ament et al., 2023)&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Plasma control in nuclear fusion &lt;a class=&quot;citation&quot; href=&quot;#mehta2022an&quot;&gt;(Mehta et al., 2022)&lt;/a&gt;.&lt;/li&gt;
  &lt;li&gt;Parameter tuning for lasers &lt;a class=&quot;citation&quot; href=&quot;#20.500.11850/385...
            </description>
            <pubDate>Sat, 18 Nov 2023 11:00:00 -0800</pubDate>
            <guid>
                https://blog.quipu-strands.com/bayesopt_1_key_ideas_GPs
            </guid>
        </item>
        
        <item>
            <title>Fun with GMMs</title>
            <link>
                https://blog.quipu-strands.com/fun_with_GMMs
            </link>
            <description>
                &lt;script type=&quot;text/x-mathjax-config&quot;&gt; 
    MathJax.Hub.Config({ 
        &quot;HTML-CSS&quot;: { scale: 100, linebreaks: { automatic: true } }, 
        SVG: { linebreaks: { automatic:true } }, 
        displayAlign: &quot;center&quot; });
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;

&lt;/script&gt;

&lt;p&gt;&lt;em&gt;Generative Models&lt;/em&gt; have been all the rage in AI lately, be it image generators like &lt;a href=&quot;https://stability.ai/blog/stable-diffusion-public-release&quot;&gt;Stable Diffusion&lt;/a&gt; or text generators like &lt;a href=&quot;https://openai.com/blog/chatgpt/&quot;&gt;ChatGPT&lt;/a&gt;. These are examples of fairly sophisticated generative systems. But whittled down to basics, they are a means to:&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;(a) concisely represent patterns in data, in a way that …&lt;/li&gt;
  &lt;li&gt;(b) they can &lt;em&gt;generate&lt;/em&gt; later what they have “seen”.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A bit like an artist who witnesses a scene and later recreates it on canvas from memory; her memory acts as the generative model here.&lt;/p&gt;

&lt;p&gt;In this post, I will try to illustrate this mechanism using a specific generative model: the &lt;strong&gt;Gaussian Mixture Model (GMM)&lt;/strong&gt;. We will use it to capture patterns &lt;em&gt;in images&lt;/em&gt;. Pixels will be our data, and patterns are how they are “lumped” together. Of course, this lumping is what humans perceive as the image itself. Effectively then, much like our artist, we will use a generative model to “see” an image and then have it reproduce it later. &lt;strong&gt;Think of this as a rudimentary, mostly visual, tutorial on GMMs&lt;/strong&gt;, where we focus on their representational capability. Or an article where I mostly ramble but touch upon GMMs, use of probabilities, all the w...
            </description>
            <pubDate>Fri, 22 Apr 2022 20:00:00 -0700</pubDate>
            <guid>
                https://blog.quipu-strands.com/fun_with_GMMs
            </guid>
        </item>
        
        <item>
            <title>Hello New Blog!</title>
            <link>
                https://blog.quipu-strands.com/hello-new-blog
            </link>
            <description>
                &lt;script type=&quot;text/x-mathjax-config&quot;&gt; 
    MathJax.Hub.Config({ 
        &quot;HTML-CSS&quot;: { scale: 100, linebreaks: { automatic: true } }, 
        SVG: { linebreaks: { automatic:true } }, 
        displayAlign: &quot;center&quot; });
&lt;/script&gt;

&lt;script type=&quot;text/javascript&quot; async=&quot;&quot; src=&quot;https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML&quot;&gt;

&lt;/script&gt;

&lt;p&gt;Moving to a new place can be hectic and tiresome. I am moving my blog, from &lt;a href=&quot;http://quipu-strands.blogspot.com/&quot;&gt;here&lt;/a&gt;, and it’s none of those.&lt;sup id=&quot;fnref:1&quot; role=&quot;doc-noteref&quot;&gt;&lt;a href=&quot;#fn:1&quot; class=&quot;footnote&quot; rel=&quot;footnote&quot;&gt;1&lt;/a&gt;&lt;/sup&gt; /s&lt;/p&gt;

&lt;p&gt;I tend towards writing technical posts when I tend towards writing at all these days, and &lt;a href=&quot;https://www.blogger.com&quot;&gt;Blogger&lt;/a&gt; doesn’t give me the presentation options I need. So, for now, it’s GitHub Pages, but with my own domain. That way, if I decide to move again, my (almost non-existent) readers won’t be sent scrambling to find my (almost non-existent) content.&lt;/p&gt;

&lt;p&gt;The old blog was titled “Random Thoughts”. I wanted something different and a bit more original this time, so I Googled “Not So Random Thoughts”. Obviously.&lt;/p&gt;

&lt;p&gt;So many hits it isn’t even funny. So many, that you couldn’t squint and ignore. And that is exactly why you are stuck with “A Not So Primordial Soup”; which, by the way, does a good job of telling you that this is going to be a mixed bag of the deep and the frivolous.&lt;/p&gt;

&lt;p&gt;Just for the record, “The Psionic Poodle” was on the list. Since that isn’t the title, joy to us, things could have been worse.&lt;/p&gt;

&lt;p&gt;I have been asked about the domain name “quipu strands” (by the 3 and a 1/2 readers I have). These were a device used by the Incas to...
            </description>
            <pubDate>Wed, 26 Apr 2017 08:01:36 -0700</pubDate>
            <guid>
                https://blog.quipu-strands.com/hello-new-blog
            </guid>
        </item>
        
    </channel>
</rss>
