<p>Clark Fitzgerald. Personal website for Professor Clark Fitzgerald in the Department of Mathematics and Statistics at Sacramento State. Jekyll feed /feed.xml, 2023-10-17T20:57:45+00:00.</p>
<h1>Test Python Syntax Highlight</h1>
<p>2023-10-17T00:00:00+00:00, /test-Python-syntax-hightlight</p>
<p>Will this generate and render properly?</p>
<div class="language-py highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nb">range</span><span class="p">(</span><span class="mi">10</span><span class="p">)</span>
<span class="mi">20</span> <span class="o">+</span> <span class="mi">10</span>
</code></pre></div></div>
<p>I hope so!</p>
<h1>Intro to Julia’s Debugger</h1>
<p>2020-06-19T08:34:00+00:00, /intro-to-Julia's-Debugger</p>
<p>This post demonstrates <a href="https://github.com/JuliaDebug/Debugger.jl">Julia’s Debugger</a> through some simple examples.</p>
<h2 id="introduction">Introduction</h2>
<p>Learning how to use a debugger was an important milestone in my growth as a programmer.
I thought I was doing fine without it, but I just didn’t know what I was missing.
A debugger allows you to stop a program in the middle of execution and interact directly with the software that you’ve written.
Debuggers validate your mental model of the program you’ve written, and your model of the language itself.
Duncan Temple Lang once remarked, “before you learn anything in a new programming language, you should learn the debugger.”
I’m taking his advice.</p>
<p>What follows is a brief, self-contained introduction to Julia’s debugger.
You may also enjoy Norm Matloff’s <a href="http://heather.cs.ucdavis.edu/~matloff/debug.html">general resources on debugging</a>.</p>
<h2 id="stepping-through-a-program">Stepping Through a Program</h2>
<p>We start by loading Debugger and defining a simple function, <code class="language-plaintext highlighter-rouge">f</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">using</span> <span class="n">Debugger</span>
<span class="n">f</span> <span class="o">=</span> <span class="k">function</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">2</span><span class="x">)</span>
    <span class="n">z</span> <span class="o">=</span> <span class="mi">3</span>
    <span class="n">x</span> <span class="o">+</span> <span class="n">y</span> <span class="o">+</span> <span class="n">z</span>
<span class="k">end</span>
</code></pre></div></div>
<p>We can enter the debugger by prefacing a function call with the aptly named macro <code class="language-plaintext highlighter-rouge">@enter</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nd">@enter</span> <span class="n">f</span><span class="x">(</span><span class="mi">1</span><span class="x">)</span>
</code></pre></div></div>
<p>The debugger displays the following output.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">In</span> <span class="c">#1(x, y) at REPL[1]:1</span>
<span class="mi">1</span> <span class="n">f</span> <span class="o">=</span> <span class="k">function</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">2</span><span class="x">)</span>
<span class="mi">2</span> <span class="n">z</span> <span class="o">=</span> <span class="mi">3</span>
<span class="o">></span><span class="mi">3</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span> <span class="o">+</span> <span class="n">z</span>
<span class="mi">4</span> <span class="n">en</span>
<span class="n">About</span> <span class="n">to</span> <span class="n">run</span><span class="o">:</span> <span class="x">(</span><span class="o">+</span><span class="x">)(</span><span class="mi">1</span><span class="x">,</span> <span class="mi">2</span><span class="x">,</span> <span class="mi">3</span><span class="x">)</span>
<span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span>
</code></pre></div></div>
<p>This means that the program is paused inside the call <code class="language-plaintext highlighter-rouge">f(1)</code>.
The <code class="language-plaintext highlighter-rouge">></code> symbol in front of line 3 means the debugger is ready to run line 3: <code class="language-plaintext highlighter-rouge">x + y + z</code>.
The prompt has changed to <code class="language-plaintext highlighter-rouge">1|debug></code>, since we are in the debugger, not the Julia REPL.</p>
<p>From the debug prompt, we can enter any valid Debugger command.
Type <code class="language-plaintext highlighter-rouge">?</code> followed by Enter to see the possible commands.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span> <span class="o">?</span>
<span class="n">Debugger</span> <span class="n">commands</span>
<span class="n">≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡</span>
<span class="n">Below</span><span class="x">,</span> <span class="n">square</span> <span class="n">brackets</span> <span class="n">denote</span> <span class="n">optional</span> <span class="n">arguments</span><span class="o">.</span>
<span class="n">Misc</span><span class="o">:</span>
<span class="o">-</span> <span class="n">o</span><span class="o">:</span> <span class="n">open</span> <span class="n">the</span> <span class="n">current</span> <span class="n">line</span> <span class="k">in</span> <span class="n">an</span> <span class="n">editor</span>
<span class="o">-</span> <span class="n">q</span><span class="o">:</span> <span class="n">quit</span> <span class="n">the</span> <span class="n">debugger</span><span class="x">,</span> <span class="n">returning</span> <span class="nb">nothing</span>
<span class="o">...</span> <span class="x">(</span><span class="n">and</span> <span class="n">many</span> <span class="n">more</span><span class="x">)</span>
</code></pre></div></div>
<p>Let’s start by looking at the variables in our current frame.
These are the local variables inside the function.
We expect to see <code class="language-plaintext highlighter-rouge">x, y, z</code> bound to <code class="language-plaintext highlighter-rouge">1, 2, 3</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span> <span class="n">fr</span>
<span class="x">[</span><span class="mi">1</span><span class="x">]</span> <span class="c">#1(x, y) at REPL[1]:1</span>
<span class="o">|</span> <span class="n">x</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">1</span>
<span class="o">|</span> <span class="n">y</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">2</span>
<span class="o">|</span> <span class="n">z</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">3</span>
</code></pre></div></div>
<p>Indeed they are.
To evaluate any Julia expression, we type <code class="language-plaintext highlighter-rouge">`</code> (a backtick) and Debugger gives us a Julia prompt.
We can call functions on the local variables <em>based on their current state in the function execution</em>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">julia</span><span class="o">></span> <span class="mi">2</span><span class="o">*</span><span class="n">x</span>
<span class="mi">2</span>
</code></pre></div></div>
<p>We can also manipulate the state of the computation.
Suppose we would like to see what happens in the rest of the function if <code class="language-plaintext highlighter-rouge">x = 2.0</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">julia</span><span class="o">></span> <span class="n">x</span> <span class="o">=</span> <span class="mf">2.0</span>
<span class="mf">2.0</span>
</code></pre></div></div>
<p>Type Ctrl+C to exit the Julia prompt and return to the debug prompt.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">julia</span><span class="o">></span> <span class="o">^</span><span class="n">C</span>
<span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span>
</code></pre></div></div>
<p>Inspecting the variables in our current frame shows the new value for <code class="language-plaintext highlighter-rouge">x</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span> <span class="n">fr</span>
<span class="x">[</span><span class="mi">1</span><span class="x">]</span> <span class="c">#1(x, y) at REPL[1]:1</span>
<span class="o">|</span> <span class="n">x</span><span class="o">::</span><span class="kt">Float64</span> <span class="o">=</span> <span class="mf">2.0</span>
<span class="o">|</span> <span class="n">y</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">2</span>
<span class="o">|</span> <span class="n">z</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">3</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">n</code> steps to the next line in the function body, which is the implicit <code class="language-plaintext highlighter-rouge">return</code> statement.
The call returns <code class="language-plaintext highlighter-rouge">2.0 + 2 + 3 = 7.0</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span> <span class="n">n</span>
<span class="n">In</span> <span class="c">#1(x, y) at REPL[1]:1</span>
<span class="mi">1</span> <span class="n">f</span> <span class="o">=</span> <span class="k">function</span><span class="x">(</span><span class="n">x</span><span class="x">,</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">2</span><span class="x">)</span>
<span class="mi">2</span> <span class="n">z</span> <span class="o">=</span> <span class="mi">3</span>
<span class="o">></span><span class="mi">3</span> <span class="n">x</span> <span class="o">+</span> <span class="n">y</span> <span class="o">+</span> <span class="n">z</span>
<span class="mi">4</span> <span class="n">en</span>
<span class="n">About</span> <span class="n">to</span> <span class="n">run</span><span class="o">:</span> <span class="k">return</span> <span class="mf">7.0</span>
</code></pre></div></div>
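<p>The <code class="language-plaintext highlighter-rouge">7.0</code> above comes from Julia’s numeric promotion: adding a <code class="language-plaintext highlighter-rouge">Float64</code> to <code class="language-plaintext highlighter-rouge">Int64</code> values produces a <code class="language-plaintext highlighter-rouge">Float64</code>. As a minimal sketch that needs no debugger, calling the same <code class="language-plaintext highlighter-rouge">f</code> directly with <code class="language-plaintext highlighter-rouge">2.0</code> should give the value that the modified frame is about to return.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>f = function(x, y = 2)
    z = 3
    x + y + z
end

# Integer input stays an integer.
@assert f(1) === 6

# A Float64 argument promotes the whole sum to Float64,
# matching the value seen after setting x = 2.0 in the debugger.
@assert f(2.0) === 7.0
</code></pre></div></div>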
<p><code class="language-plaintext highlighter-rouge">c</code> continues execution until a breakpoint is hit.
We didn’t add any breakpoints, and we’re already at the <code class="language-plaintext highlighter-rouge">return</code> statement anyway, so the call returns and we exit the debugger, returning to the main Julia REPL.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span> <span class="n">c</span>
<span class="mf">7.0</span>
<span class="n">julia</span><span class="o">></span>
</code></pre></div></div>
<p>That’s it for the basic introduction.
The next example contains an actual bug.</p>
<h2 id="stopping-on-error">Stopping on Error</h2>
<p>It’s often useful to stop and enter the debugger when an error occurs, so we can examine the state of the program under the exact conditions that produced the error.
We do this with Debugger by calling <code class="language-plaintext highlighter-rouge">break_on(:error)</code>, then placing <code class="language-plaintext highlighter-rouge">@run</code> in front of the expression that produces the error.
Hopefully, this investigation will lead us to the root cause.</p>
<p>The following code calculates and prints <code class="language-plaintext highlighter-rouge">2*3 + 4</code>, but it contains two bugs.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">axpy1</span> <span class="o">=</span> <span class="k">function</span><span class="x">(</span><span class="n">a</span><span class="x">,</span> <span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
    <span class="n">ax</span> <span class="o">+</span> <span class="n">y</span>
<span class="k">end</span>
<span class="n">f1</span> <span class="o">=</span> <span class="k">function</span><span class="x">()</span>
    <span class="n">a</span> <span class="o">=</span> <span class="mi">2</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">3</span>
    <span class="n">y</span> <span class="o">=</span> <span class="mi">4</span>
    <span class="n">println</span><span class="x">(</span><span class="s">"If a = </span><span class="si">$</span><span class="s">a, x = </span><span class="si">$</span><span class="s">x, y = </span><span class="si">$</span><span class="s">y, then ax + y = </span><span class="si">$</span><span class="s">(axpy1(x, y, a))"</span><span class="x">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>When we call <code class="language-plaintext highlighter-rouge">f1()</code>, we see the following error message:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">julia</span><span class="o">></span> <span class="n">f1</span><span class="x">()</span>
<span class="n">ERROR</span><span class="o">:</span> <span class="kt">UndefVarError</span><span class="o">:</span> <span class="n">ax</span> <span class="n">not</span> <span class="n">defined</span>
<span class="n">Stacktrace</span><span class="o">:</span>
<span class="x">[</span><span class="mi">1</span><span class="x">]</span> <span class="x">(</span><span class="o">::</span><span class="n">var</span><span class="s">"#5#6"</span><span class="x">)(</span><span class="o">::</span><span class="kt">Int64</span><span class="x">,</span> <span class="o">::</span><span class="kt">Int64</span><span class="x">,</span> <span class="o">::</span><span class="kt">Int64</span><span class="x">)</span> <span class="n">at</span> <span class="o">./</span><span class="n">REPL</span><span class="x">[</span><span class="mi">45</span><span class="x">]</span><span class="o">:</span><span class="mi">2</span>
<span class="x">[</span><span class="mi">2</span><span class="x">]</span> <span class="x">(</span><span class="o">::</span><span class="n">var</span><span class="s">"#9#10"</span><span class="x">)()</span> <span class="n">at</span> <span class="o">./</span><span class="n">REPL</span><span class="x">[</span><span class="mi">48</span><span class="x">]</span><span class="o">:</span><span class="mi">5</span>
<span class="x">[</span><span class="mi">3</span><span class="x">]</span> <span class="n">top</span><span class="o">-</span><span class="n">level</span> <span class="n">scope</span> <span class="n">at</span> <span class="n">REPL</span><span class="x">[</span><span class="mi">49</span><span class="x">]</span><span class="o">:</span><span class="mi">1</span>
</code></pre></div></div>
<p>We don’t need debugging to fix this error.
This error message tells us exactly what the problem is: we never defined the variable <code class="language-plaintext highlighter-rouge">ax</code>.
Our program is simple, and the stack trace only contains three frames, so if we have the correct mental model of how the language works, then we can reason through it to fix the error.</p>
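<p>The same diagnosis can be confirmed without the debugger. The sketch below redefines <code class="language-plaintext highlighter-rouge">axpy1</code>, catches the exception, and inspects it; it assumes a fresh session in which no global named <code class="language-plaintext highlighter-rouge">ax</code> exists. Julia does parse a numeric literal coefficient such as <code class="language-plaintext highlighter-rouge">2x</code> as multiplication, but <code class="language-plaintext highlighter-rouge">ax</code> is a single identifier.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>axpy1 = function(a, x, y)
    ax + y   # bug kept on purpose: `ax` is one identifier, not a * x
end

# Capture the exception instead of letting it reach the REPL.
err = try
    axpy1(2, 3, 4)
catch e
    e
end

@assert err isa UndefVarError
@assert err.var === :ax

# Numeric literal coefficients do work as implicit multiplication.
x = 3
@assert 2x == 6
</code></pre></div></div>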
<p>Debugging becomes truly useful when the problem is <strong>not</strong> obvious.
The error message may be uninformative, the program may be complex, and the stack trace may contain more frames than you can fit in your head.
If you hit an error, and you don’t know what’s wrong, then run the same code through the debugger, as follows.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">julia</span><span class="o">></span> <span class="n">break_on</span><span class="x">(</span><span class="o">:</span><span class="n">error</span><span class="x">)</span>
<span class="n">julia</span><span class="o">></span> <span class="nd">@run</span> <span class="n">f1</span><span class="x">()</span>
<span class="n">Breaking</span> <span class="k">for</span> <span class="n">error</span><span class="o">:</span>
<span class="n">ERROR</span><span class="o">:</span> <span class="kt">UndefVarError</span><span class="o">:</span> <span class="n">ax</span> <span class="n">not</span> <span class="n">defined</span>
<span class="n">Stacktrace</span><span class="o">:</span>
<span class="x">[</span><span class="mi">1</span><span class="x">]</span> <span class="x">(</span><span class="o">::</span><span class="n">var</span><span class="s">"#11#12"</span><span class="x">)(</span><span class="o">::</span><span class="kt">Int64</span><span class="x">,</span> <span class="o">::</span><span class="kt">Int64</span><span class="x">,</span> <span class="o">::</span><span class="kt">Int64</span><span class="x">)</span> <span class="n">at</span> <span class="n">REPL</span><span class="x">[</span><span class="mi">52</span><span class="x">]</span><span class="o">:</span><span class="mi">2</span>
<span class="x">[</span><span class="mi">2</span><span class="x">]</span> <span class="x">(</span><span class="o">::</span><span class="n">var</span><span class="s">"#13#14"</span><span class="x">)()</span> <span class="n">at</span> <span class="n">REPL</span><span class="x">[</span><span class="mi">53</span><span class="x">]</span><span class="o">:</span><span class="mi">5</span>
<span class="n">In</span> <span class="c">#11(a, x, y) at REPL[52]:1</span>
<span class="mi">1</span> <span class="n">axpy1</span> <span class="o">=</span> <span class="k">function</span><span class="x">(</span><span class="n">a</span><span class="x">,</span> <span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
<span class="o">></span><span class="mi">2</span> <span class="n">ax</span> <span class="o">+</span> <span class="n">y</span>
<span class="mi">3</span> <span class="n">en</span>
<span class="n">About</span> <span class="n">to</span> <span class="n">run</span><span class="o">:</span> <span class="x">(</span><span class="o">+</span><span class="x">)(</span><span class="n">Main</span><span class="o">.</span><span class="n">ax</span><span class="x">,</span> <span class="mi">2</span><span class="x">)</span>
<span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span>
</code></pre></div></div>
<p>We are now in the debugger prompt, <em>inside the call to <code class="language-plaintext highlighter-rouge">axpy1()</code>,</em> so we can enter any of the debugger commands.
The error message said that the variable <code class="language-plaintext highlighter-rouge">ax</code> is not defined.
What variables are present?
Let’s see.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span> <span class="n">fr</span>
<span class="x">[</span><span class="mi">1</span><span class="x">]</span> <span class="c">#11(a, x, y) at REPL[52]:1</span>
<span class="o">|</span> <span class="n">a</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">3</span>
<span class="o">|</span> <span class="n">x</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">4</span>
<span class="o">|</span> <span class="n">y</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">2</span>
</code></pre></div></div>
<p>We have <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">x</code>, and <code class="language-plaintext highlighter-rouge">y</code>, but no <code class="language-plaintext highlighter-rouge">ax</code>.
Ah, of course: we should have written <code class="language-plaintext highlighter-rouge">a*x</code> instead of <code class="language-plaintext highlighter-rouge">ax</code>.
Let’s fix this bug:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">axpy2</span> <span class="o">=</span> <span class="k">function</span><span class="x">(</span><span class="n">a</span><span class="x">,</span> <span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
    <span class="n">a</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
<span class="k">end</span>
<span class="n">f2</span> <span class="o">=</span> <span class="k">function</span><span class="x">()</span>
    <span class="n">a</span> <span class="o">=</span> <span class="mi">2</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">3</span>
    <span class="n">y</span> <span class="o">=</span> <span class="mi">4</span>
    <span class="n">println</span><span class="x">(</span><span class="s">"If a = </span><span class="si">$</span><span class="s">a, x = </span><span class="si">$</span><span class="s">x, y = </span><span class="si">$</span><span class="s">y, then ax + y = </span><span class="si">$</span><span class="s">(axpy2(x, y, a))"</span><span class="x">)</span>
<span class="k">end</span>
</code></pre></div></div>
<!--
Julia allows [juxtaposed expressions like `10x`](https://docs.julialang.org/en/v1/manual/integers-and-floating-point-numbers/#man-numeric-literal-coefficients-1), but no sane programming language can know that `ax` should actually mean `a*x`, even if the mathematical notation was perfectly clear to the human reader.
-->
<p>We call it as follows:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">julia</span><span class="o">></span> <span class="n">f2</span><span class="x">()</span>
<span class="n">If</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">2</span><span class="x">,</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">3</span><span class="x">,</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">4</span><span class="x">,</span> <span class="n">then</span> <span class="n">ax</span> <span class="o">+</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">14</span>
</code></pre></div></div>
<p>That’s not right.
<code class="language-plaintext highlighter-rouge">2 * 3 + 4 = 10</code>, not <code class="language-plaintext highlighter-rouge">14</code>.
This is an altogether more troubling class of bug: one that executes perfectly fine, but produces the wrong answer. 😭
<code class="language-plaintext highlighter-rouge">break_on(:error)</code> doesn’t help us here, because there is no error to break on.
Let’s use another debugging technique, breakpoints, to find what went wrong.</p>
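<p>Before moving on, breaking on errors can be switched off again once it is no longer wanted. This is a minimal sketch, assuming that Debugger re-exports <code class="language-plaintext highlighter-rouge">break_off</code> from JuliaInterpreter alongside <code class="language-plaintext highlighter-rouge">break_on</code>; check the package documentation for your version.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using Debugger

break_on(:error)    # break when an uncaught error occurs under @run
# ... debugging session ...
break_off(:error)   # back to the default: errors propagate as usual
</code></pre></div></div>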
<h2 id="setting-breakpoints">Setting Breakpoints</h2>
<p>Breakpoints tell the debugger to stop executing code, and instead drop you into an interactive debugger prompt so that you can look around.</p>
<p>One simple way to add a breakpoint is to place the macro <code class="language-plaintext highlighter-rouge">@bp</code> on the line of the source code where you want to stop and examine the state.
Let’s add <code class="language-plaintext highlighter-rouge">@bp</code> as the first line of the <code class="language-plaintext highlighter-rouge">axpy3</code> function, which is equivalent to entering the debugger whenever <code class="language-plaintext highlighter-rouge">axpy3</code> is called.
In practice, you might add <code class="language-plaintext highlighter-rouge">@bp</code> deep in a loop, inside a conditional branch, so it only enters the debugger in the one case you’re interested in.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">axpy3</span> <span class="o">=</span> <span class="k">function</span><span class="x">(</span><span class="n">a</span><span class="x">,</span> <span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
    <span class="nd">@bp</span>
    <span class="n">a</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
<span class="k">end</span>
<span class="n">f3</span> <span class="o">=</span> <span class="k">function</span><span class="x">()</span>
    <span class="n">a</span> <span class="o">=</span> <span class="mi">2</span>
    <span class="n">x</span> <span class="o">=</span> <span class="mi">3</span>
    <span class="n">y</span> <span class="o">=</span> <span class="mi">4</span>
    <span class="n">println</span><span class="x">(</span><span class="s">"If a = </span><span class="si">$</span><span class="s">a, x = </span><span class="si">$</span><span class="s">x, y = </span><span class="si">$</span><span class="s">y, then ax + y = </span><span class="si">$</span><span class="s">(axpy3(x, y, a))"</span><span class="x">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>If we run this code normally, then it behaves as before.
In particular, leaving <code class="language-plaintext highlighter-rouge">@bp</code> in the code does not cause Julia to enter the debugger.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">julia</span><span class="o">></span> <span class="n">f3</span><span class="x">()</span>
<span class="n">If</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">2</span><span class="x">,</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">3</span><span class="x">,</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">4</span><span class="x">,</span> <span class="n">then</span> <span class="n">ax</span> <span class="o">+</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">14</span>
</code></pre></div></div>
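<p>The conditional placement mentioned earlier might look like the following sketch. The function <code class="language-plaintext highlighter-rouge">inv_sum</code> and its inputs are hypothetical, made up for illustration: the breakpoint is only reached for a zero element and, as with <code class="language-plaintext highlighter-rouge">f3</code>, it is ignored when the call runs normally.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code>using Debugger

# Hypothetical example: sum the reciprocals of a collection.
inv_sum = function(xs)
    total = 0.0
    for x in xs
        if iszero(x)
            @bp   # reached only for a zero element
        end
        total += 1 / x
    end
    total
end

# Run normally, @bp is ignored.
@assert inv_sum([1, 2, 4]) == 1.75

# Under the debugger, execution would pause at @bp on the zero element:
# @run inv_sum([1, 0, 4])
</code></pre></div></div>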
<p>To stop at the breakpoint and enter the debugger, we need to preface the code with <code class="language-plaintext highlighter-rouge">@run</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">julia</span><span class="o">></span> <span class="nd">@run</span> <span class="n">f3</span><span class="x">()</span>
<span class="n">Hit</span> <span class="n">breakpoint</span><span class="o">:</span>
<span class="n">In</span> <span class="c">#23(a, x, y) at REPL[62]:1</span>
<span class="mi">1</span> <span class="n">axpy3</span> <span class="o">=</span> <span class="k">function</span><span class="x">(</span><span class="n">a</span><span class="x">,</span> <span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
<span class="n">●2</span> <span class="nd">@bp</span>
<span class="o">></span><span class="mi">3</span> <span class="n">a</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
<span class="mi">4</span> <span class="n">en</span>
<span class="n">About</span> <span class="n">to</span> <span class="n">run</span><span class="o">:</span> <span class="x">(</span><span class="o">*</span><span class="x">)(</span><span class="mi">3</span><span class="x">,</span> <span class="mi">4</span><span class="x">)</span>
<span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span>
</code></pre></div></div>
<p>We hit the breakpoint that we added inside the definition of <code class="language-plaintext highlighter-rouge">axpy3</code>, so we’re now back in the debugger.
<code class="language-plaintext highlighter-rouge">●2</code> indicates the breakpoint we hit on the second line, and <code class="language-plaintext highlighter-rouge">>3</code> indicates the next line to run.
Let’s look at the variables in our current frame, the call to <code class="language-plaintext highlighter-rouge">axpy3</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span> <span class="n">fr</span>
<span class="x">[</span><span class="mi">1</span><span class="x">]</span> <span class="c">#23(a, x, y) at REPL[62]:1</span>
<span class="o">|</span> <span class="n">a</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">3</span>
<span class="o">|</span> <span class="n">x</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">4</span>
<span class="o">|</span> <span class="n">y</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">2</span>
</code></pre></div></div>
<p>Fine, <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">x</code>, and <code class="language-plaintext highlighter-rouge">y</code> are all defined and nothing looks too terribly wrong.
Debugger allows us to step <strong>up</strong> the <a href="https://en.wikipedia.org/wiki/Call_stack">call stack</a>, into the parent frame of the call to <code class="language-plaintext highlighter-rouge">axpy3()</code> where we encountered the breakpoint.
The command is <code class="language-plaintext highlighter-rouge">up</code>.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">1</span><span class="o">|</span><span class="n">debug</span><span class="o">></span> <span class="n">up</span>
<span class="n">In</span> <span class="c">#25() at REPL[63]:1</span>
<span class="mi">1</span> <span class="n">f3</span> <span class="o">=</span> <span class="k">function</span><span class="x">()</span>
<span class="mi">2</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">2</span>
<span class="mi">3</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">3</span>
<span class="mi">4</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">4</span>
<span class="o">></span><span class="mi">5</span> <span class="n">println</span><span class="x">(</span><span class="s">"If a = </span><span class="si">$</span><span class="s">a, x = </span><span class="si">$</span><span class="s">x, y = </span><span class="si">$</span><span class="s">y, then ax + y = </span><span class="si">$</span><span class="s">(axpy3(x, y, a))"</span><span class="x">)</span>
<span class="mi">6</span> <span class="n">en</span>
<span class="n">About</span> <span class="n">to</span> <span class="n">run</span><span class="o">:</span> <span class="x">(</span><span class="n">var</span><span class="s">"#23#24"</span><span class="x">())(</span><span class="mi">3</span><span class="x">,</span> <span class="mi">4</span><span class="x">,</span> <span class="mi">2</span><span class="x">)</span>
<span class="mi">2</span><span class="o">|</span><span class="n">debug</span><span class="o">></span>
</code></pre></div></div>
<p>We are inside the call to <code class="language-plaintext highlighter-rouge">f3()</code>, which called <code class="language-plaintext highlighter-rouge">axpy3()</code>.
In addition, the prompt changed to <code class="language-plaintext highlighter-rouge">2|debug></code>, indicating that we are on frame 2.
Let’s probe the state of the evaluation.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">2</span><span class="o">|</span><span class="n">debug</span><span class="o">></span> <span class="n">fr</span>
<span class="x">[</span><span class="mi">2</span><span class="x">]</span> <span class="c">#25() at REPL[63]:1</span>
<span class="o">|</span> <span class="n">a</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">2</span>
<span class="o">|</span> <span class="n">x</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">3</span>
<span class="o">|</span> <span class="n">y</span><span class="o">::</span><span class="kt">Int64</span> <span class="o">=</span> <span class="mi">4</span>
</code></pre></div></div>
<p>In this frame, we still have the same variables <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">x</code>, and <code class="language-plaintext highlighter-rouge">y</code>, but because of <a href="https://docs.julialang.org/en/v1/manual/variables-and-scoping/">Julia’s lexical scoping rules</a> <em>they’re not the same as the <code class="language-plaintext highlighter-rouge">a</code>, <code class="language-plaintext highlighter-rouge">x</code>, and <code class="language-plaintext highlighter-rouge">y</code> in frame 1</em>.
Press <code class="language-plaintext highlighter-rouge">`</code> (literal backtick) to enter the Julia REPL, where we can evaluate our <code class="language-plaintext highlighter-rouge">axpy3</code> function in the frame where it appeared to have a problem:</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="mi">2</span><span class="o">|</span><span class="n">julia</span><span class="o">></span>
<span class="mi">2</span><span class="o">|</span><span class="n">julia</span><span class="o">></span> <span class="n">axpy3</span><span class="x">(</span><span class="n">a</span><span class="x">,</span> <span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
<span class="mi">10</span>
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">2 * 3 + 4 = 10</code>, so our <code class="language-plaintext highlighter-rouge">axpy3</code> function works just fine.
The bug lies in <code class="language-plaintext highlighter-rouge">f3</code>, where we call <code class="language-plaintext highlighter-rouge">axpy3(x, y, a)</code> with the arguments in the wrong order.
Type <code class="language-plaintext highlighter-rouge">^C</code> followed by <code class="language-plaintext highlighter-rouge">q</code> to get back to the main Julia REPL.</p>
<p>Maybe it wasn’t the best idea to bury our call to <code class="language-plaintext highlighter-rouge">axpy3</code> inside this string interpolation.
Let’s fix the bug.</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">axpy4</span> <span class="o">=</span> <span class="k">function</span><span class="x">(</span><span class="n">a</span><span class="x">,</span> <span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
<span class="n">a</span><span class="o">*</span><span class="n">x</span> <span class="o">+</span> <span class="n">y</span>
<span class="k">end</span>
<span class="n">f4</span> <span class="o">=</span> <span class="k">function</span><span class="x">()</span>
<span class="n">a</span> <span class="o">=</span> <span class="mi">2</span>
<span class="n">x</span> <span class="o">=</span> <span class="mi">3</span>
<span class="n">y</span> <span class="o">=</span> <span class="mi">4</span>
<span class="n">z</span> <span class="o">=</span> <span class="n">axpy4</span><span class="x">(</span><span class="n">a</span><span class="x">,</span> <span class="n">x</span><span class="x">,</span> <span class="n">y</span><span class="x">)</span>
<span class="n">println</span><span class="x">(</span><span class="s">"If a = </span><span class="si">$</span><span class="s">a, x = </span><span class="si">$</span><span class="s">x, y = </span><span class="si">$</span><span class="s">y, then ax + y = </span><span class="si">$</span><span class="s">z"</span><span class="x">)</span>
<span class="k">end</span>
</code></pre></div></div>
<p>Does it work?</p>
<div class="language-julia highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">julia</span><span class="o">></span> <span class="n">f4</span><span class="x">()</span>
<span class="n">If</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">2</span><span class="x">,</span> <span class="n">x</span> <span class="o">=</span> <span class="mi">3</span><span class="x">,</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">4</span><span class="x">,</span> <span class="n">then</span> <span class="n">ax</span> <span class="o">+</span> <span class="n">y</span> <span class="o">=</span> <span class="mi">10</span>
</code></pre></div></div>
<p>Problem solved.</p>
<h2 id="conclusion">Conclusion</h2>
<p>This post highlighted only a couple of Debugger’s rich set of capabilities.
You can add a breakpoint when a function is called, when a particular method is called, when a condition is satisfied, or to an arbitrary line in a file.
You can toggle breakpoints on and off, and generally control the entire sequence of breakpoints.
We didn’t even talk about watching expressions, editing code on the fly, or more sophisticated stepping.</p>
<p>Using a debugger is one of those peripheral skills that takes a little time and effort to learn, so it’s easy to put off.
Don’t.
Debuggers save time by allowing you to quickly pinpoint problems.
They also strengthen your mental model, for any language, by allowing you to constantly test, explore, and verify what you believe to be true.</p>This post demonstrates Julia’s Debugger through some simple examples.ideas for software engineering lessons2019-07-12T09:50:00+00:002019-07-12T09:50:00+00:00/ideas-for-software-engineering-lessons<p>Some notes on what I would like to teach about software engineering.</p>
<p>TODO: Find real examples in real code bases that illustrate these points.</p>
<h3 id="comments">Comments</h3>
<p>Comments should be meaningful, and tell you something that’s not obvious from the code.</p>
<p>Good:</p>
<pre><code class="language-{r}"># Initial state for computing the fibonacci sequence
x[1] = 0
</code></pre>
<p>Bad:</p>
<pre><code class="language-{r}"># Assign 0 for the first element
x[1] = 0
</code></pre>
<h3 id="dependencies-and-software-reuse">Dependencies and software reuse</h3>
<p>Find a balance between reusing code and having excessive dependencies.</p>
<h3 id="commit-messages">Commit messages</h3>
<p>Why did you do what you did?
Did it fix something or add new functionality?
Was it just cosmetic / stylistic?</p>
<h3 id="testing">Testing</h3>
<p>More tests are not necessarily better.
Don’t write tests that tie you to implementation details, if you can help it.
Test the things that matter.</p>
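<p>As a rough sketch of the difference, take a hypothetical function <code class="language-plaintext highlighter-rouge">fib</code> that returns the first <code class="language-plaintext highlighter-rouge">n</code> Fibonacci numbers.
Both the function and the tests are made up for illustration.
The tests check the results and the property that defines the sequence, so they would keep passing if the loop were replaced by some other implementation.</p>
<pre><code class="language-{r}"># Hypothetical function under test: the first n Fibonacci numbers
fib = function(n)
{
    x = c(0, 1)
    for (i in seq_len(n)[-(1:2)]) {
        x[i] = x[i - 1] + x[i - 2]
    }
    x[seq_len(n)]
}

# Test the result for a known case
stopifnot(identical(fib(7), c(0, 1, 1, 2, 3, 5, 8)))

# Test the property that defines the sequence
f = fib(20)
stopifnot(all(f[3:20] == f[1:18] + f[2:19]))

# A test tied to implementation details would instead check something like
# "fib uses a for loop", which breaks as soon as the internals change.
</code></pre>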
<h3 id="names">Names</h3>
<p>Few would be perplexed by a function named <code class="language-plaintext highlighter-rouge">get_data_from_connection</code>.
Few would know what a function named <code class="language-plaintext highlighter-rouge">get_dfc</code> does.
When in doubt, err on the side of verbose names.
Try to keep them under 25 characters.</p>
<h3 id="functions">Functions</h3>
<p>Write small, easy to understand functions.
Decompose them into simpler ones, and give them explicit names.
One general technique is to look for big indented blocks in your code and make those into functions.
These could be the bodies of loops or conditional statements.</p>
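<p>Here is a small sketch of that technique.
The data and the function name <code class="language-plaintext highlighter-rouge">drop_na_and_rescale</code> are invented for the example.</p>
<pre><code class="language-{r}"># Made up data for the example
measurements = list(a = c(1, 5, NA), b = c(2, NA, 4))

# Before: the body of the loop is an indented block doing two things
result = list()
for (nm in names(measurements)) {
    x = measurements[[nm]]
    x = x[!is.na(x)]
    result[[nm]] = (x - min(x)) / (max(x) - min(x))
}

# After: the block becomes a small function with an explicit name
drop_na_and_rescale = function(x)
{
    x = x[!is.na(x)]
    (x - min(x)) / (max(x) - min(x))
}
result2 = lapply(measurements, drop_na_and_rescale)
</code></pre>
<p>Both versions compute the same thing, but the second one gives the step a name and makes it possible to test it on its own.</p>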
<h3 id="simplicity">Simplicity</h3>
<p>Strive to write simple code, using basic data structures whenever possible and convenient.
Don’t show off your knowledge of the language.</p>Some notes on what I would like to teach about software engineering.string formatting with files in bash2018-09-21T11:07:00+00:002018-09-21T11:07:00+00:00/string-formatting-with-files-in-bash<h2 id="the-problem">The problem</h2>
<p>I need to generate a file with very simple structure, just a LaTeX file with a few <code class="language-plaintext highlighter-rouge">include</code> statements.
Until today, I was copying a template and manually changing it.
This was slow, redundant, and error prone.
It makes more sense to just generate the files from the template instead.
I wanted to use bash for this rather than another scripting language, because it connects better with the project’s GNU Make workflow.</p>
<h2 id="the-solution">The solution</h2>
<p>I started by putting the template in the Makefile and using <code class="language-plaintext highlighter-rouge">echo</code>, as I normally do.
This gave me some problems, so I started looking into alternatives that would let me keep the template as a separate file.
Most people recommend <code class="language-plaintext highlighter-rouge">printf</code> as a more robust alternative to <code class="language-plaintext highlighter-rouge">echo</code>.
The problem was, how to pass <code class="language-plaintext highlighter-rouge">printf</code> the <code class="language-plaintext highlighter-rouge">FORMAT</code> argument from a file?
Use <code class="language-plaintext highlighter-rouge">xargs</code>.</p>
<p>Here’s a simple example.
First you’ll need a <code class="language-plaintext highlighter-rouge">format.txt</code> file, or whatever you choose to call it.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ printf "%%s and %%s\n" > format.txt
</code></pre></div></div>
<p>The <code class="language-plaintext highlighter-rouge">%%</code> just escapes the special character <code class="language-plaintext highlighter-rouge">%</code> for <code class="language-plaintext highlighter-rouge">printf</code>, so the contents of <code class="language-plaintext highlighter-rouge">format.txt</code> are <code class="language-plaintext highlighter-rouge">%s and %s</code>.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat format.txt
%s and %s
</code></pre></div></div>
<p>We can use it as follows:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>$ cat format.txt | xargs -0 -I{} printf {} A B
A and B
</code></pre></div></div>
<p>We read this command as <code class="language-plaintext highlighter-rouge">printf <contents of format.txt> A B</code>.
<code class="language-plaintext highlighter-rouge">xargs</code> always makes me do some mental gymnastics.
The <code class="language-plaintext highlighter-rouge">-0</code> flag tells <code class="language-plaintext highlighter-rouge">xargs</code> to split its input only on null characters, so it passes the whole file through as a single item instead of breaking it up on whitespace.
<code class="language-plaintext highlighter-rouge">-I{}</code> allows me to take the text and pass it as the first argument with <code class="language-plaintext highlighter-rouge">printf {} ...</code>.
Otherwise it would be the last argument.</p>
<p>Here’s how I actually use it in my Makefile:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>%.tex: tex/%.tex texformat.txt preamble.tex
	cat texformat.txt | xargs -0 -I{} printf {} $< $< > $@
</code></pre></div></div>The problemdifferences between subset assignment in R2018-09-06T10:50:00+00:002018-09-06T10:50:00+00:00/differences-between-subset-assignment-in-R<p>Today I used a data frame where one column was a list,
so that each row could contain its own list. I used this nesting
approach so that I could add arbitrary elements to each row.</p>
<p>I ran into a difference in the assignment operators <code class="language-plaintext highlighter-rouge">$<-</code> and <code class="language-plaintext highlighter-rouge">[[<-</code> that I
didn’t expect. There are multiple ways to use <code class="language-plaintext highlighter-rouge">[[</code> and <code class="language-plaintext highlighter-rouge">$</code> to select the
same data, but they do different things when assigning back into them.
Here’s what I mean:</p>
<pre><code class="language-{r}">d = data.frame(a = 1:2)
d$b = list(list(foo = 10), list(foo = 20))
d$b
# [[1]]
# [[1]]$foo
# [1] 10
#
# [[2]]
# [[2]]$foo
# [1] 20
</code></pre>
<p>We can use the list method to extract.</p>
<pre><code class="language-{r}">d[["b"]][[1]][["foo"]]
# [1] 10
</code></pre>
<p>Or we can use the data frame method to extract. They both give the same
result.</p>
<pre><code class="language-{r}">d[, "b"][[1]][["foo"]]
# [1] 10
</code></pre>
<p>Assigning with <code class="language-plaintext highlighter-rouge">[[<-</code> does what I expect; it adds the
new value and keeps all the structure.</p>
<pre><code class="language-{r}">d[["b"]][[1]][["bar"]] = 5
d$b
# [[1]]
# [[1]]$foo
# [1] 10
#
# [[1]]$bar
# [1] 5
#
#
# [[2]]
# [[2]]$foo
# [1] 20
</code></pre>
<p>If we use the data frame <code class="language-plaintext highlighter-rouge">[<-</code> to do the assignment then R warns us.</p>
<pre><code class="language-{r}">d[, "b"][[1]][["foobar"]] = TRUE
# Warning message:
# In `[<-.data.frame`(`*tmp*`, , "b", value = list(list(foo = 10, :
# provided 2 variables to replace 1 variables
</code></pre>
<p>We lost the structure and the data. This is not what I wanted.</p>
<pre><code class="language-{r}">d$b
# $foo
# [1] 10
#
# $foobar
# [1] TRUE
</code></pre>
<h2 id="conclusion">Conclusion</h2>
<p>Check your intuition when using lists inside data frames; they may not do
what you expect.</p>
<p>As another example, here’s how NOT to make the same data frame as the original <code class="language-plaintext highlighter-rouge">d</code>:</p>
<pre><code class="language-{r}">d2 = data.frame(a = 1:2, b = list(list(foo = 10), list(foo = 20)))
d2
# a b.foo b.foo.1
# 1 1 10 20
# 2 2 10 20
</code></pre>
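<p>For contrast, one way to get a list column in a single call is to wrap the list in <code class="language-plaintext highlighter-rouge">I()</code>, which asks <code class="language-plaintext highlighter-rouge">data.frame</code> to store the object as is.
This is a small sketch rather than a recommendation; note that the column picks up the extra class <code class="language-plaintext highlighter-rouge">AsIs</code>.</p>
<pre><code class="language-{r}">d3 = data.frame(a = 1:2, b = I(list(list(foo = 10), list(foo = 20))))
dim(d3)
# [1] 2 2
d3[["b"]][[2]][["foo"]]
# [1] 20
</code></pre>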
<p>If I’m missing some higher unifying logic with all this then please let me
know and I’ll update this post.</p>Today I used a data frame where one column was a list consisting so that each row could contain its own list. I used this nesting approach so that I could add arbitrary elements to each row.heuristics for task scheduling algorithms2018-08-21T13:19:00+00:002018-08-21T13:19:00+00:00/heuristics-for-task-scheduling-algorithms<p>I’ve been reading Oliver Sinnen’s book <a href="https://onlinelibrary.wiley.com/doi/book/10.1002/0470121173">Task Scheduling For Parallel
Systems</a>. The
idea of task scheduling is to make a program faster by running several
statements simultaneously. Here’s a simple program:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>x = foo()
y = bar()
foobar(x, y)
</code></pre></div></div>
<p>The computer can potentially run the first two lines at the same time and
then communicate to make <code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code> available to run the third line. This
is an example of a schedule, a mapping of expressions to processors. Task
scheduling generalizes this idea to arbitrary graphs of expressions. In general, finding
an optimal schedule is NP-hard.</p>
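<p>On a Unix-like system, the schedule described above can be sketched with the fork based functions in R’s <code class="language-plaintext highlighter-rouge">parallel</code> package.
The functions <code class="language-plaintext highlighter-rouge">foo</code>, <code class="language-plaintext highlighter-rouge">bar</code>, and <code class="language-plaintext highlighter-rouge">foobar</code> are stand-ins invented for this illustration.</p>
<pre><code class="language-{r}">library(parallel)

# Stand-ins for the expensive calls in the program above
foo = function() { Sys.sleep(1); 1 }
bar = function() { Sys.sleep(1); 2 }
foobar = function(x, y) x + y

# Start both calls in forked child processes, so they run at the same time
job_x = mcparallel(foo())
job_y = mcparallel(bar())

# Communicate the results back to the parent process
x = mccollect(job_x)[[1]]
y = mccollect(job_y)[[1]]

# The third line needs both results, so it runs last
foobar(x, y)
</code></pre>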
<h2 id="types-of-algorithms">Types of Algorithms</h2>
<p>Sinnen discusses three general heuristics for scheduling algorithms.</p>
<ul>
<li><strong>List scheduling</strong> is a greedy algorithm that assigns expressions to the
first ready worker.</li>
<li><strong>Clustering</strong> groups computations on one worker to avoid the expense of
transferring data.</li>
<li><strong>Genetic algorithms</strong> start with random schedules and evolve them until
they’re good enough.</li>
</ul>
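<p>To make list scheduling concrete, here is a minimal sketch in R.
It is a simplified illustration, not an algorithm taken from the book: it assumes the tasks are already listed in an order that respects their dependencies, and it ignores the cost of transferring data between workers.
The function name and its arguments are hypothetical.</p>
<pre><code class="language-{r}"># Greedy list scheduling: each task goes to the worker that becomes idle first
list_schedule = function(duration, deps, nworkers = 2L)
{
    ntasks = length(duration)
    worker_free = numeric(nworkers)   # time at which each worker becomes idle
    finish = numeric(ntasks)
    worker = integer(ntasks)
    for (i in seq_len(ntasks)) {
        # A task is ready once all of the tasks it depends on have finished
        ready = max(0, finish[deps[[i]]])
        w = which.min(worker_free)
        start = max(ready, worker_free[w])
        finish[i] = start + duration[i]
        worker_free[w] = finish[i]
        worker[i] = w
    }
    data.frame(task = seq_len(ntasks), worker = worker,
               start = finish - duration, finish = finish)
}

# The three line program above: foobar (task 3) needs foo (1) and bar (2)
s = list_schedule(duration = c(2, 3, 1),
                  deps = list(integer(), integer(), 1:2))
s
</code></pre>
<p>With these made up durations, the first two tasks start at time 0 on separate workers, and the third task waits until time 3, when both of its inputs are available.</p>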
<h2 id="future-directions">Future Directions</h2>
<p>While timing data analysis programs I’ve noticed that typically only a few
expressions take significant time. This suggests a different kind of
heuristic: Find an optimal schedule for the few long running expressions
and schedule everything else around that. I’ll probably implement this in
the R <a href="https://cran.r-project.org/package=makeParallel">makeParallel
package</a>.</p>I’ve been reading Oliver Sinnen’s book Task Scheduling For Parallel Systems. The idea of task scheduling is to make a program faster by running several statements simultaneously. Here’s a simple program:understanding R S42018-06-12T08:53:00+00:002018-06-12T08:53:00+00:00/understanding-R-S4<p>I’ve been reading John Chambers’ book <a href="https://www.crcpress.com/Extending-R/Chambers/p/book/9781498775717">Extending
R</a> and
it feels like another one of those things that I wish I had read and
understood long ago. He argues for using S4 over S3 for more general software
engineering in R.</p>
<p>S4 is necessarily more complex than S3. This post is for me to gather my
thoughts and see if I understand it. There are copious amounts of
documentation out there.</p>
<h2 id="exploring-the-space">Exploring the space</h2>
<p>Typically in R I use <code class="language-plaintext highlighter-rouge">ls()</code> to see which objects exist. The methods
package has better tools for this.</p>
<p>What classes does a package define?</p>
<pre><code class="language-{r}"># Any package using S4
library(CodeDepends)
pkg = "package:CodeDepends"
getClasses(pkg)
# [1] "ScriptNodeInfo" "Script" "ScriptInfo" "AnnotatedScript"
# [5] "ScriptNode"
</code></pre>
<p>What is the definition of the class, i.e. the slots and the inheritance?</p>
<pre><code class="language-{r}">getClass("AnnotatedScript")
# Class "AnnotatedScript" [package "CodeDepends"]
#
# Slots:
#
# Name: .Data location
# Class: list character
#
# Extends:
# Class "Script", directly
# Class "list", by class "Script", distance 2
# Class "vector", by class "Script", distance 3
</code></pre>
<p>What generic functions does the package use or define?</p>
<pre><code class="language-{r}">getGenerics(pkg)
# An object of class "ObjectsWithPackage":
#
# Object: "[<-" "[" "[[<-" "[[" "$<-" "$" "coerce"
# "getDependsThread"
# Package: "base" "base" "base" "base" "base" "base" "methods" "CodeDepends"
#
# Object: "getInputs" "getVariables" "makeCallGraph" "names" "readScript"
# Package: "CodeDepends" "CodeDepends" "CodeDepends" "base" "CodeDepends"
</code></pre>
<p>What methods have been defined for a particular generic function?</p>
<pre><code class="language-{r}">methods(getInputs)
# [1] getInputs,ANY-method getInputs,DynScript-method
# [3] getInputs,function-method getInputs,Script-method
# [5] getInputs,ScriptNodeInfo-method getInputs,ScriptNode-method
</code></pre>
<p>What is the definition of a particular method?</p>
<pre><code class="language-{r}"># Definition when calling getInputs(obj) where `obj` is of class function:
getMethod(getInputs, "function")
# Method Definition:
#
# function (e, collector = inputCollector(), basedir = ".", reset = FALSE,
# formulaInputs = FALSE, ...)
# {
# ...
</code></pre>
<h2 id="basic-operations">Basic Operations</h2>
<p>Define new classes (usually done in a package):</p>
<pre><code class="language-{r}">
Schedule = setClass("Schedule",
slots = c(code = "expression", evaluation = "data.frame"))
TaskSchedule = setClass("TaskSchedule",
slots = c(transfer = "data.frame"),
contains = "Schedule")
</code></pre>
<p>Now they show up here:</p>
<pre><code class="language-{r}">"Schedule" %in% getClasses(globalenv())
</code></pre>
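<p>As a quick check of my understanding, here is a sketch of creating an instance and inspecting it.
The class definitions are repeated so that the block runs on its own, and the expressions stored in the object are placeholders.</p>
<pre><code class="language-{r}">library(methods)

# Same class definitions as above
Schedule = setClass("Schedule",
    slots = c(code = "expression", evaluation = "data.frame"))
TaskSchedule = setClass("TaskSchedule",
    slots = c(transfer = "data.frame"),
    contains = "Schedule")

# setClass returns a generator function that creates instances
s = TaskSchedule(code = expression(foo(), bar()))

# The subclass is also a Schedule
is(s, "Schedule")
# [1] TRUE

# Slots that were not supplied get an empty default
nrow(s@transfer)
# [1] 0
</code></pre>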
<p>Define a new generic function with a default method (usually done in a package):</p>
<pre><code class="language-{r}">
setGeneric("generateCode", function(Schedule, ...)
{
# Default method: return the code stored in the schedule
Schedule@code
})
</code></pre>
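<p>The next step would be a more specific method for a subclass.
The sketch below repeats the definitions so that it runs on its own, and it names the argument <code class="language-plaintext highlighter-rouge">schedule</code> in lower case to avoid confusion with the class generator.
The extra <code class="language-plaintext highlighter-rouge">library(parallel)</code> line that the method adds is made up for illustration.</p>
<pre><code class="language-{r}">library(methods)

Schedule = setClass("Schedule",
    slots = c(code = "expression", evaluation = "data.frame"))
TaskSchedule = setClass("TaskSchedule",
    slots = c(transfer = "data.frame"),
    contains = "Schedule")

# Generic whose function body becomes the default method
setGeneric("generateCode", function(schedule, ...) schedule@code)

# More specific method for the subclass
setMethod("generateCode", "TaskSchedule", function(schedule, ...)
{
    c(expression(library(parallel)), schedule@code)
})

n_default = length(generateCode(Schedule(code = expression(foo()))))
n_task = length(generateCode(TaskSchedule(code = expression(foo()))))
c(n_default, n_task)
# [1] 1 2
</code></pre>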
<h2 id="lazy-evaluation">Lazy evaluation</h2>
<p>The methods dispatch on the class of the arguments, which requires
evaluating the arguments. Thus lazy evaluation is not possible for
these arguments.</p>
<pre><code class="language-{r}">wont_eval = function(x, ...) NULL
wont_eval(stop())
# NULL
setGeneric("will_eval", function(x, ...) NULL)
will_eval(rnorm(1))
# NULL
will_eval(stop('first arg here'))
# Error in will_eval(stop("first arg here")) : first arg here
</code></pre>I’ve been reading John Chamber’s book Extending R and it feels like another one of those things that I wish I would have read and understood long ago. He argues for using S4 over S3 for more general software engineering in R.customizing and extending R code2018-06-11T17:53:00+00:002018-06-11T17:53:00+00:00/customizing-and-extending-R-code<p>By customizable, we mean that we can control the behaviour of a function via
its parameters, and specifically that we can pass in a function that
performs a particular step of the computation. Below we
discuss the <code class="language-plaintext highlighter-rouge">scheduler</code> parameter, which takes a function that computes the
schedule. By providing a function, we don’t have to define a
new class and a new method and then create an instance of that new class to
get our new method invoked. The function is more direct, dynamic, and
ephemeral: it is in effect only for this function call.</p>
<p>By extensible, we are referring to infrastructure and more specifically
extensibility via class extension/subclassing/interfaces in the Object
Oriented Programming (OOP) world. So we can extend the existing code base
without modifying it by defining one or more new classes (typically derived
from an existing class) and then providing methods for this new class. Then
we create a new instance of our new class and pass it into the existing
system and the new methods get invoked appropriately.</p>
<h2 id="application">Application</h2>
<p>We’ll explore these concepts in the context of a concrete example.
The software described below transforms regular R code into a version that uses
task parallelism, which means that different code blocks execute
simultaneously. This graph illustrates the steps:</p>
<p><img src="/assets/basic_model.png" alt="basic model" /></p>
<p>The package contains a function <code class="language-plaintext highlighter-rouge">task_parallel</code> implementing the above
steps, so users write:</p>
<pre><code class="language-{r}">newcode = task_parallel(oldcode)
</code></pre>
<p>We discuss how to design this function <code class="language-plaintext highlighter-rouge">task_parallel</code> to be flexible and
customizable with respect to the four objects in the diagram:</p>
<ul>
<li>input code</li>
<li>task graph</li>
<li>schedule</li>
<li>output code</li>
</ul>
<p>and with respect to the three functions in the diagram:</p>
<ul>
<li>dependency analysis</li>
<li>scheduling algorithm</li>
<li>code generator</li>
</ul>
<p>This document demonstrates incremental steps to make functions more
extensible and customizable.</p>
<h2 id="simple">Simple</h2>
<p>Here’s the simplest way to implement <code class="language-plaintext highlighter-rouge">task_parallel</code>:</p>
<pre><code class="language-{r}">task_parallel = function(code)
{
tg = task_graph(code)
sc = scheduler(tg)
code_generator(sc)
}
</code></pre>
<p>This has the advantage of convenience for the user, because it implies the
following mental model.</p>
<p><img src="/assets/simple.png" alt="simple model" /></p>
<p><code class="language-plaintext highlighter-rouge">task_parallel()</code> is a black box, completely abstracted away. The user only
controls what code they pass in. Sometimes this level of abstraction is
entirely appropriate, as we don’t want to force users to have to understand
all aspects of the implementation. Indeed, the entire purpose of a function
is to provide a convenient abstraction. The problem with this
implementation is that it fixes the scheduling algorithm and code generator
so that the user has no control over the behavior of the <code class="language-plaintext highlighter-rouge">task_parallel</code>
function.</p>
<p>As we make the function customizable and extensible we would like to always
keep this simple behavior that allows users to write <code class="language-plaintext highlighter-rouge">task_parallel(code)</code>.
It’s convenient, it’s the most common use case, and it serves users who don’t
care about understanding any of the underlying mental models and just
want the end result.</p>
<h2 id="multiple-inputs">Multiple Inputs</h2>
<p>The user may prefer to pass in different types of input to <code class="language-plaintext highlighter-rouge">task_parallel</code>.
For example, they may have the name of a file containing code,
a bunch of code in a character vector,
or maybe a language object produced by R’s <code class="language-plaintext highlighter-rouge">parse()</code> function.</p>
<p><img src="/assets/multiple_inputs.png" alt="multiple inputs model" /></p>
<p>We can keep the current definition for <code class="language-plaintext highlighter-rouge">task_parallel</code> and achieve this
behavior by making <code class="language-plaintext highlighter-rouge">task_graph</code> a generic function. This means <code class="language-plaintext highlighter-rouge">task_graph</code>
will dispatch on the class of the input <code class="language-plaintext highlighter-rouge">code</code>. This is a better choice
than making <code class="language-plaintext highlighter-rouge">task_parallel</code> a generic function, because then <code class="language-plaintext highlighter-rouge">task_graph</code>
becomes flexible and propagates this flexibility through to the calling
function <code class="language-plaintext highlighter-rouge">task_parallel</code>. In S3 our code might look like the following:</p>
<pre><code class="language-{r}">task_graph = function(code, ...)
{
UseMethod("task_graph")
}
task_graph.character = function(code, ...)
{
# TODO: Disambiguate file names from a character vector of unparsed code,
# possibly through use of I(). For now, assume `code` is a file name.
task_graph(parse(code), ...)
}
task_graph.expression = function(code, ...)
{
# The actual work of building a task graph
}
</code></pre>
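<p>To see the dispatch in action, here is a self-contained toy version.
It rests on two assumptions that are not part of the package: the “task graph” is just the parsed code wrapped in a hypothetical <code class="language-plaintext highlighter-rouge">TaskGraph</code> class, and a single string naming an existing file is treated as a file name while anything else is treated as unparsed code.</p>
<pre><code class="language-{r}">task_graph = function(code, ...)
{
    UseMethod("task_graph")
}

task_graph.character = function(code, ...)
{
    # Assumed rule for this sketch: one string that names an existing file
    # is a file name, anything else is unparsed code
    if (identical(file.exists(code), TRUE)) {
        task_graph(parse(code), ...)
    } else {
        task_graph(parse(text = code), ...)
    }
}

task_graph.expression = function(code, ...)
{
    # Placeholder for the actual work of building a task graph
    structure(list(code = code), class = "TaskGraph")
}

# Both kinds of input end up in the expression method
tg1 = task_graph("x = foo(); y = bar()")
tg2 = task_graph(expression(x = foo(), y = bar()))
c(length(tg1$code), length(tg2$code))
# [1] 2 2
</code></pre>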
<h2 id="passing-arguments-through">Passing Arguments Through</h2>
<p>The scheduler happens to be the most complex step in the process, and we
would like to provide a way for users to easily control its parameters.
R’s ellipsis <code class="language-plaintext highlighter-rouge">...</code> provides a mechanism for this.
Note that it really only makes sense to forward <code class="language-plaintext highlighter-rouge">...</code> to a single function.</p>
<pre><code class="language-{r}">task_parallel = function(code, ...)
{
tg = task_graph(code)
sc = scheduler(tg, ...)
code_generator(sc)
}
</code></pre>
<p>Now if a user wants to specify another argument to the scheduling step, say
<code class="language-plaintext highlighter-rouge">maxworkers = 3L</code> to create a schedule with three workers, they can easily
do this:</p>
<pre><code class="language-{r}">newcode = task_parallel(code, maxworkers = 3L)
</code></pre>
<p><img src="/assets/dots_model.png" alt="dots model" /></p>
<p>We could take this further and pass in further arguments in the form of a
list from <code class="language-plaintext highlighter-rouge">task_parallel</code> in to the other steps <code class="language-plaintext highlighter-rouge">task_graph</code> and
<code class="language-plaintext highlighter-rouge">code_generator</code>. We are not doing this at the moment for two reasons. First,
we don’t currently see a need for specifying many arguments for these two
functions. Second, it’s easy to add later without breaking anything.</p>
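<p>If we did add this later, it might look something like the following sketch.
The parameter names <code class="language-plaintext highlighter-rouge">task_graph_args</code> and <code class="language-plaintext highlighter-rouge">code_generator_args</code> are hypothetical, and the three steps are stubs defined only so that the example runs on its own.</p>
<pre><code class="language-{r}"># Stub steps that just record what they were given
task_graph = function(code, ...) list(code = code, ...)
scheduler = function(tg, maxworkers = 2L, ...)
{
    list(tg = tg, maxworkers = maxworkers)
}
code_generator = function(sc, ...) sc

task_parallel = function(code, ..., task_graph_args = list(),
    code_generator_args = list())
{
    # The lists supply extra arguments to the first and last steps
    tg = do.call(task_graph, c(list(code), task_graph_args))
    # The dots still go to the scheduler only
    sc = scheduler(tg, ...)
    do.call(code_generator, c(list(sc), code_generator_args))
}

out = task_parallel("x = foo()", maxworkers = 3L,
    task_graph_args = list(verbose = TRUE))
out$maxworkers
# [1] 3
out$tg$verbose
# [1] TRUE
</code></pre>
<p>Existing calls such as <code class="language-plaintext highlighter-rouge">task_parallel(code, maxworkers = 3L)</code> keep working because the new parameters come after the dots and default to empty lists.</p>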
<h2 id="customizability">Customizability</h2>
<p>In the original computational model the scheduling algorithm and the code
generation are meant to be modular. Users can customize the
system by supplying their own functions that implement scheduling or code
generation.</p>
<p><img src="/assets/modular_model.png" alt="modular model" /></p>
<p>The code becomes:</p>
<pre><code class="language-{r}">task_parallel = function(code, scheduler = default_scheduler, ...,
code_generator = default_code_generator)
{
tg = task_graph(code)
sc = scheduler(tg, ...)
code_generator(sc)
}
</code></pre>
<p>Now users can define and use their own scheduling algorithms, for example
<code class="language-plaintext highlighter-rouge">genetic_scheduler</code> that uses a genetic algorithm.</p>
<pre><code class="language-{r}">newcode = task_parallel(code, genetic_scheduler)
</code></pre>
<p>Suppose the user wants to modify some part of the pipeline. If the user has
a schedule in hand then they can directly call the code generator, and
there’s no need to use <code class="language-plaintext highlighter-rouge">task_parallel</code>. But they may want to modify the
task graph and pass this directly in. R evaluates arguments lazily, so we
can allow users to pass in a task graph by lifting the first line in the
body of the function into a default parameter:</p>
<pre><code class="language-{r}">task_parallel = function(code, taskgraph = task_graph(code), scheduler = default_scheduler,
..., code_generator = default_code_generator)
{
sc = scheduler(taskgraph, ...)
code_generator(sc)
}
</code></pre>
<p>We could even lift several lines into default parameters, provided that we
avoid circular references.</p>
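<p>The following toy example shows the lazy default at work.
The functions <code class="language-plaintext highlighter-rouge">slow_step</code> and <code class="language-plaintext highlighter-rouge">pipeline</code> are made up for this illustration; the default for <code class="language-plaintext highlighter-rouge">taskgraph</code> is only evaluated when the caller does not supply one.</p>
<pre><code class="language-{r}"># Stand-in for an expensive setup step
slow_step = function(code)
{
    message("building the task graph")
    toupper(code)
}

pipeline = function(code, taskgraph = slow_step(code))
{
    paste("scheduled:", taskgraph)
}

# The default is evaluated, so the setup step runs
pipeline("a")
# building the task graph
# [1] "scheduled: A"

# The caller supplies the task graph, so slow_step never runs and the
# missing `code` argument is never touched
pipeline(taskgraph = "custom")
# [1] "scheduled: custom"
</code></pre>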
<h2 id="extensibility">Extensibility</h2>
<p>Some schedulers must be tied to their code generators. We want the runtime
to figure out what the most appropriate code generator is and use that.
This is where the extensibility through object oriented programming comes
in. We change the <strong>package code</strong> as follows:</p>
<pre><code class="language-{r}">generate_code = function(schedule, ...)
{
UseMethod("generate_code")
}
generate_code.default = function(schedule, ...)
{
# ... more code here ...
}
</code></pre>
<p>At this point two of the three steps in the model use methods. We may as
well be consistent and make the scheduling step a method, even though we
don’t expect to dispatch on many different classes of task graphs.</p>
<pre><code class="language-{r}">schedule = function(taskgraph, maxworkers = 2L, ...)
{
UseMethod("schedule")
}
schedule.default = function(taskgraph, maxworkers, ...)
{
# ... more code here ...
class(result) = "Schedule"
result
}
</code></pre>
<p>The primary function becomes:</p>
<pre><code class="language-{r}">task_parallel = function(code, taskgraph = task_graph(code),
scheduler = schedule, ..., code_generator = generate_code)
{
sc = scheduler(taskgraph, ...)
code_generator(sc)
}
</code></pre>
<p>Now we can extend the system through object oriented programming. For
example, <code class="language-plaintext highlighter-rouge">fork_join_schedule</code> is a scheduling algorithm that returns a more
specialized schedule that supports a particular type of code generator.
Then we don’t want to use the default <code class="language-plaintext highlighter-rouge">generate_code</code> method. By using a generic
function the <em>user</em> <strong>OR</strong> the package author can define a scheduling
algorithm with an associated implementation as follows:</p>
<pre><code class="language-{r}">fork_join_schedule = function(taskgraph, maxworkers = 2L, ...)
{
# ... more code here ...
class(result) = c("ForkJoinSchedule", "Schedule")
result
}
generate_code.ForkJoinSchedule = function(schedule, ...)
{
# ... more code here ...
}
</code></pre>
<p>We can call this new code as follows:</p>
<pre><code class="language-{r}">task_parallel(code, scheduler = fork_join_schedule)
</code></pre>
<p><code class="language-plaintext highlighter-rouge">fork_join_schedule</code> creates an object of class <code class="language-plaintext highlighter-rouge">ForkJoinSchedule</code>, and
<code class="language-plaintext highlighter-rouge">generate_code</code> will dispatch to the specialized method. The scheduling
algorithm and its code generator become implicitly tied together through the class, which is what we wanted.</p>
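<p>To make the dispatch concrete, here is a minimal runnable sketch. The function
bodies are hypothetical stand-ins for the elided "more code here" parts (they just
return strings and small lists), and <code class="language-plaintext highlighter-rouge">task_graph</code> is reduced to a
placeholder, so this only illustrates which method gets called:</p>
<pre><code class="language-{r}"># Placeholder task graph builder (assumption: the real one does much more)
task_graph = function(code, ...) list(code = code)

generate_code = function(schedule, ...) UseMethod("generate_code")
generate_code.default = function(schedule, ...) "default code"
generate_code.ForkJoinSchedule = function(schedule, ...) "fork-join code"

schedule = function(taskgraph, maxworkers = 2L, ...) UseMethod("schedule")
schedule.default = function(taskgraph, maxworkers = 2L, ...)
{
    result = list(taskgraph = taskgraph, maxworkers = maxworkers)
    class(result) = "Schedule"
    result
}

fork_join_schedule = function(taskgraph, maxworkers = 2L, ...)
{
    result = list(taskgraph = taskgraph, maxworkers = maxworkers)
    class(result) = c("ForkJoinSchedule", "Schedule")
    result
}

task_parallel = function(code, taskgraph = task_graph(code),
    scheduler = schedule, ..., code_generator = generate_code)
{
    sc = scheduler(taskgraph, ...)
    code_generator(sc)
}

# A plain "Schedule" has no method of its own, so it falls through to the default
out_default = task_parallel(quote(x + 1))

# A "ForkJoinSchedule" picks up the specialized method
out_fork_join = task_parallel(quote(x + 1), scheduler = fork_join_schedule)
</code></pre>
<p>Nothing in <code class="language-plaintext highlighter-rouge">task_parallel</code> mentions <code class="language-plaintext highlighter-rouge">ForkJoinSchedule</code>; the class
vector on the schedule alone selects the code generator.</p>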
<p><img src="/assets/extensible.png" alt="extensible" /></p>
<h2 id="summary">Summary</h2>
<p>We started with a function <code class="language-plaintext highlighter-rouge">task_parallel</code> that could only be called in one
simple way:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">task_parallel(code)</code>.</li>
</ul>
<p>We preserved this desirable simple behavior and extended it so that users can do any
of the following:</p>
<ul>
<li><code class="language-plaintext highlighter-rouge">task_parallel("some_script.R")</code> dispatches to methods for different classes of
the first argument.</li>
<li><code class="language-plaintext highlighter-rouge">task_parallel(taskgraph = tg)</code> skips the setup part of the function in
case the user has already done that or they have a special task graph to
use.</li>
<li><code class="language-plaintext highlighter-rouge">task_parallel(code, scheduler = my_scheduler, code_generator =
my_code_generator)</code> allows users to customize the steps in the process
by passing in their own functions to perform them.</li>
<li><code class="language-plaintext highlighter-rouge">task_parallel(code, scheduler = fork_join_schedule)</code> dispatches on the
class of the resulting schedule, allowing users to extend the system by defining their own
classes.</li>
</ul>
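<p>The first three call styles can be tried with a small sketch. Every helper here
is a hypothetical stand-in (a "task graph" is just a list of node names, and
<code class="language-plaintext highlighter-rouge">reverse_scheduler</code> is an invented example of a user supplied function).
Note that passing <code class="language-plaintext highlighter-rouge">taskgraph</code> directly works without <code class="language-plaintext highlighter-rouge">code</code> because
R evaluates default arguments lazily, so <code class="language-plaintext highlighter-rouge">task_graph(code)</code> is never run:</p>
<pre><code class="language-{r}"># Hypothetical stand-ins for the three steps
task_graph = function(code, ...) list(nodes = as.character(code))
schedule = function(taskgraph, ...) list(order = taskgraph$nodes)
generate_code = function(schedule, ...) paste(schedule$order, collapse = "; ")

task_parallel = function(code, taskgraph = task_graph(code),
    scheduler = schedule, ..., code_generator = generate_code)
{
    sc = scheduler(taskgraph, ...)
    code_generator(sc)
}

# 1. Black box
r1 = task_parallel(c("a", "b"))

# 2. Precomputed task graph: `code` is missing, but it is never evaluated
tg = list(nodes = c("x", "y"))
r2 = task_parallel(taskgraph = tg)

# 3. Custom step passed in as an ordinary function
reverse_scheduler = function(taskgraph, ...) list(order = rev(taskgraph$nodes))
r3 = task_parallel(c("a", "b"), scheduler = reverse_scheduler)
</code></pre>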
<p>This style of code accommodates three increasingly sophisticated classes of
users:</p>
<ol>
<li>Users who just want to treat it as a black box</li>
<li>Users who understand the model and would like to experiment by passing
in new functions</li>
<li>Users who would like to extend and build upon the system by writing
methods and using object oriented programming techniques</li>
</ol>
<p>The final version of the code supports all of these use cases
simultaneously. Further, we don’t force users to do it in any particular
way. This is nice as each usage style has its merits.</p>
<h2 id="appendix---final-code">Appendix - Final code</h2>
<pre><code class="language-{r}"># Different inputs
#------------------------------------------------------------
task_graph = function(code, ...)
{
UseMethod("task_graph")
}
task_graph.character = function(code, ...)
{
# ... Disambiguate file names from a character vector
task_graph(parse(filename))
}
task_graph.expression = function(code, ...)
{
# The actual work of building a task graph
}
# Allows extension through intermediate objects
#------------------------------------------------------------
generate_code = function(schedule, ...)
{
UseMethod("generate_code")
}
generate_code.default = function(schedule, ...)
{
# ... more code here ...
}
schedule = function(taskgraph, maxworkers = 2L, ...)
{
UseMethod("schedule")
}
schedule.default = function(taskgraph, maxworkers = 2L, ...)
{
# ... more code here ...
class(result) = "Schedule"
result
}
# Final user facing function
#------------------------------------------------------------
task_parallel = function(code, taskgraph = task_graph(code),
scheduler = schedule, ..., code_generator = generate_code)
{
sc = scheduler(taskgraph, ...)
code_generator(sc)
}
</code></pre>Duncan Temple Lang, Clark FitzgeraldBy customizable, we mean that we can control the behaviour of a function via its parameters, and specifically, that we can pass a function in to the function that is used to do a particular step in that function. Below we discuss the scheduler parameter which takes a function that computes the schedule. By being able to provide a function, we don’t have to define a new class and a new method and then create an instance of that new class to get our new method invoked. The function is more direct, dynamic and ephemeral. It is in effect for this function call. By extensible, we are referring to infrastructure and more specifically extensibility via class extension/subclassing/interfaces in the Object Oriented Programming (OOP) world. So we can extend the existing code base without modifying it by defining one or more new classes (typically derived from an existing class) and then providing methods for this new class. Then we create a new instance of our new class and pass it into the existing system and the new methods get invoked appropriately.polyglot pipelines2018-04-27T08:23:00+00:002018-04-27T08:23:00+00:00/polyglot-pipelines<p>The <a href="https://randycity.github.io/">amazing Randy Lai</a> recently posted a
<a href="https://stat.ethz.ch/pipermail/r-devel/2018-April/075871.html">bug report</a>
on the <a href="https://stat.ethz.ch/mailman/listinfo/r-devel">R-devel mailing
list</a>.
His example pipes text into an R process that executes the script in <code class="language-plaintext highlighter-rouge">test.R</code>.</p>
<pre><code class="language-{bash}">$ echo "abc\nfoo" | R --slave -f test.R
</code></pre>
<p>R can run on the command line, read from <code class="language-plaintext highlighter-rouge">stdin</code> and write to <code class="language-plaintext highlighter-rouge">stdout</code>,
which means you can build shell pipelines with R. This can
be a nice way to combine several data processing steps in multiple
languages.</p>
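<p>As a rough sketch of what one step in such a pipeline could look like, the
script below reads all lines from <code class="language-plaintext highlighter-rouge">stdin</code>, transforms them, and writes
the result to <code class="language-plaintext highlighter-rouge">stdout</code>. The upper-casing transformation and the function
name <code class="language-plaintext highlighter-rouge">transform_lines</code> are made up for illustration, and I'm assuming the
script is run with <code class="language-plaintext highlighter-rouge">Rscript</code>:</p>
<pre><code class="language-{r}"># Hypothetical pipeline step: upper-case every input line.
# Assumed usage: cat data.txt | Rscript step1.R
transform_lines = function(lines) toupper(lines)

# file("stdin") refers to the process's standard input
con = file("stdin")
lines = readLines(con)
close(con)

writeLines(transform_lines(lines))
</code></pre>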
<p>$ cat data.txt | R --slave -f step1.R
from Randy Lai</p>The amazing Randy Lai recently posted a bug report on the R-devel mailing list. His example pipes text into an R process that executes the script in test.R.automated query optimization in R2018-03-08T08:37:00+00:002018-03-08T08:37:00+00:00/automated-query-optimization-in-R<p>Lately I’ve been reading the documentation for Apache Calcite.</p>
<p>The natural join is associative and commutative.</p>
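<p>A small check of this claim in base R, using made up tables. <code class="language-plaintext highlighter-rouge">merge</code>
joins on the shared column names by default, which makes it a natural join. The
results only agree up to row and column order, so the hypothetical helper
<code class="language-plaintext highlighter-rouge">normalize</code> sorts both before comparing:</p>
<pre><code class="language-{r}"># Three toy tables: a and b share k1, b and c3 share k2
a = data.frame(k1 = c(1, 2, 3), x = c("a", "b", "c"))
b = data.frame(k1 = c(2, 3, 4), k2 = c(10, 20, 30))
c3 = data.frame(k2 = c(10, 20, 40), z = c(TRUE, FALSE, TRUE))

# Put columns and rows in a canonical order so results can be compared
normalize = function(d)
{
    d = d[, sort(names(d)), drop = FALSE]
    d = d[do.call(order, unname(as.list(d))), , drop = FALSE]
    rownames(d) = NULL
    d
}

# Commutative: the order of the two operands does not matter
commutes = identical(normalize(merge(a, b)), normalize(merge(b, a)))

# Associative: the grouping of the joins does not matter
associates = identical(
    normalize(merge(merge(a, b), c3)),
    normalize(merge(a, merge(b, c3))))
</code></pre>
<p>This freedom to reorder joins is what lets a query optimizer pick a cheaper
plan without changing the answer.</p>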
<pre><code class="language-{R}">
d1 = do.call(data.frame, as.list(1:10))
d1$key12 =
d3 = do.call(data.frame, as.list(letters))
</code></pre>Lately I’ve been reading the documentation for Apache Calcite.