Tuesday, August 12, 2014

Create a static site with Hakyll, Github and Travis CI

"Static sites are fast, secure, easy to deploy, and manageable using version control." So states the webpage for Hakyll, a great way to set up a static site or blog. It allows you to write blog posts by simply editing markdown in git, all the while having access to delicious Haskell for deeper customizations.

You can configure things to let you write blog posts directly on Github's interface and use Travis CI to deploy your changes. Most day-to-day blogging will not require using Haskell at all or even having the Haskell environment installed on the blogger's machine.

I'll show you how to set everything up, including an optimized Travis config that can deploy changes in less than a minute. There is some existing information online about doing this sort of thing, but it's all outdated in one way or another.

We'll be using Github Pages to serve the final assets. I'll assume you're making a static site for a Github organization called myorg and want it to live at myorg.io.

Installation

  1. Create a Github organization. E.g. myorg
  2. Create a project myorg/myorg.github.io
  3. Travis will repeatedly overwrite and commit to the master branch later on, so you won't make any edits there directly. For now, add a file called CNAME to the root of the master branch containing the string myorg.io
  4. Create two A records in the DNS for myorg.io pointing at 192.30.252.153 and 192.30.252.154 respectively.
  5. Generate the base Hakyll project. (A sketch of the generated site builder appears after these steps.)
  6. Create an orphan source branch in your git repo and copy the generated files there.
  7. You (and Travis) will use cabal to run the site builder, so create a myorg.cabal:
  8. Reuse the cabal sandbox you created earlier:
  9. Keep build artifacts out of git by adding these lines to .gitignore
  10. Run your new site locally to see that it works!
  11. Create .travis.yml and add the following boilerplate:
  12. Generate a Github auth token.
  13. Set encrypted environment variables to allow Travis to commit to the master branch
  14. Commit all the files.
  15. Enable Travis for your repo. Instructions here.
  16. Push the source branch to Github.
  17. Watch the deploy progress at https://travis-ci.org/myorg/myorg.github.io
Now you can create and edit blog posts right in Github's web interface, and your changes are deployed automatically.
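For the curious, the site builder generated in step 5 is an ordinary Haskell program (usually a file called site.hs). A minimal sketch looks roughly like this; the exact rules and templates vary with the Hakyll version you install:

  {-# LANGUAGE OverloadedStrings #-}
  import Hakyll

  main :: IO ()
  main = hakyll $ do
      -- copy static assets straight through
      match "images/*" $ do
          route   idRoute
          compile copyFileCompiler

      match "css/*" $ do
          route   idRoute
          compile compressCssCompiler

      -- render markdown posts to HTML through a template
      match "posts/*" $ do
          route $ setExtension "html"
          compile $ pandocCompiler
              >>= loadAndApplyTemplate "templates/post.html" defaultContext
              >>= relativizeUrls

      match "templates/*" $ compile templateCompiler

Travis builds and runs this program with cabal and commits the generated output (Hakyll writes it to _site by default) to the master branch.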

(optional) Generating a custom cabal sandbox for Travis


You can use my shared cabal sandbox on Travis as done above, or you can build your own. It's a little trickier. Use this Travis config as a start. It takes advantage of post-build deployment to S3.

Sunday, June 8, 2014

Pair programming with Haskell and Digital Ocean

I've been wanting a standardized Haskell development environment, so I created a vim config and Vagrant image with everything I've always wished for. Check out the screencast below to get an idea of how it works.

Try it for yourself at begriffs/haskell-pair.

Tuesday, April 29, 2014

Database migrations without merge conflicts

Ever discover an open source project that makes you overjoyed? That moment when you feel like somebody gets it, and things are getting better? That's how I felt tonight:


Sqitch is a database migration system built for the git era. It supports non-linear patch history and a system of detailed logging, reversion, and tagging. Oh, and it's built around test-driven database design.

Here's one ingenious property of the system. It has a special format for the so-called "plan file" that it updates as you apply or revert migrations. People are free to work on multiple branches of a codebase and change the plan file willy-nilly. You might be used to getting merge conflicts in your database schema, but the plan file format is designed to merge cleanly and automatically in git with the union merge strategy. Just put sqitch.plan merge=union into your .gitattributes file and relax.

Sqitch is under active development for "better integration with version control systems...to make managing idempotent reworkings even easier."

Objections

  • "It's written in Perl, eww!" Yeah I don't write perl either, but the project is easy to install (even has a homebrew tap) and you don't have to worry about it. That's like complaining about the fuel used in a hyperdimensional warp drive as you ride in your horse-drawn buggy.
  • "It wants me to write crusty stored procedures?" No, not at all. Write the SQL of your choice to match the architecture of your app.
  • "Is that an unsalted md5 password in the documentation example?" Don't worry, read the whole documentation. Some examples intentionally show the wrong ways of doing things in order to correct them later and demonstrate concepts.

Sunday, April 13, 2014

Good songs in classical, romantic, impressionist and 20th century art music

I'll list only those songs I really like but add a (★) to those that have especially stuck with me.

Find a cozy chair, turn up the volume and let's begin.

Aaron Copland

A man of musical contradictions. He has distinct styles, from forbidding angular melodies to gentle populist songs (for instance, commercial background music for a puppet show). He's all-American, like a musical Frank Lloyd Wright.
  • Concerto for Piano and Orchestra (1926). A poignant song of spacious yearning. The second movement gets ragtime and goofy though. [youtube]
  • Two Pieces for Violin and Piano (1926). It confides a sad secret.
  • Piano Variations (1930). Contemptuous, self-sufficient. [youtube]
  • Sunday Afternoon Music (1935). A simple, pure little song.
  • Appalachian Spring (1944) ★. There is a recording of Copland rehearsing it with an orchestra. On the last movement he makes them retry several times, saying "no, do it like a prayer." It's a musical prayer which has stayed with me ever since. [youtube]
  • Midsummer Nocturne (1947). Jazzy yet contemplative. [youtube]
  • Quartet for Piano and Strings (1950). Complex, with a peppy neoclassical second movement. [youtube]
  • Two Ballads for Violin and Piano (1957). Like an edgy lullaby.
  • Night Thoughts (1972).

Virgil Thomson

His compositions have an interesting disinterestedness. It's as if he indicates his musical exhibits with a pointer, murmuring, "now observe this melody." Spend some time with his music and enjoy its curious aloofness.

  • Symphony No. 1 "On a Hymn Tune" (1928) ★. Sweet hymns mingle with tart accompaniment. A memorable piece that displays Thomson's characteristic hesitating transitions and American flavor. Listen to both the orchestral and piano arrangements for a better appreciation of the song.
  • Symphony No. 2 (1941). A mercurial romp, sometimes emotionally distant, sometimes intimate.
  • Solitude, a portrait of Lou Harrison (1945). A thorny bramble.
  • Mother of Us All (1947). Ordinarily I don't enjoy opera -- its artifice distracts. But this opera is hypnotic and impulsive.
  • Four Songs to Poems by Thomas Campion (1951). A beautiful arrangement. (I could only find crappy recordings online though)

Erik Satie

Satie lived by his own secret rules. He created an elaborate religion, posted to-let ads in the paper for imaginary castles, developed refined gothic penmanship, and wore identical velvet suits every day. He was "born very young in a very old world." If you get the chance, look at his musical scores, because they have hidden messages for the performer. I find his first works the best and most authentic. Later his music became more compromised as he wrote to pay the bills. You can listen to his cabaret songs to appreciate how he suffered in life, often broke and at one point evicted from his cramped Montmartre studio and forced to move to Arcueil.

Imagine him on his long walks (he walked everywhere) under lamplight watching the world and judging it by his internal rules. The works are often built in threes, and are designed to view the same musical material from three angles. Many of the song titles are absurd. Satie took refuge in irony and did not like to reveal his real emotions.

The music does reveal them, however, in a precise, odd, inspired, absolutely unique voice. His compositions might seem infantile to virtuosic performers and people with conventional minds, and Satie even took pains later in life to learn traditional composition at the Schola Cantorum. It simply didn't suit him. He played what he played and wrote what he wrote, and thankfully kept true to himself.

  • Ogives (1886). Ascetic and stern. I eat this stuff up. [youtube]
  • Trois Sarabandes (1887). Notice the lack of musical development. The melodies are fragments in a typical Satie dreamworld. [youtube]
  • Trois Gymnopédies (1888). ★ This is Satie, this is what he sounds like, the perfect little tuning fork of his soul. [youtube]
  • Gnossiennes (1889-97). A mystical-contemplative dance. Later pieces have an elegant eccentricity. [youtube]
  • Le Fils des étoiles (1891). Dense, dry, tintinnabular. [youtube]
  • Trois Sonneries de la Rose†Croix (1892). Music for his mystic order, structured around the golden ratio. [youtube]
  • Uspud (1892). A "Christian ballet" with dark ritualistic undertones. [youtube]
  • Danses Gothiques (1893). Static eerie dances. [youtube]
  • Vexations (1893). ★ He lost his love, a vivacious painter named Suzanne Valadon. He wrote her for thirty years thereafter but she never came back. This is his song about it. [youtube]
  • Danses de travers, No. 2 (1897). Patterns unfurl. [youtube]
  • Arrière-propos (1912). Like a jazzier Fils des étoiles.
  • Préludes flasques pour un chien (1912). Sedate and gnomish. [youtube] [funny synth remix]
  • Croquis et agaceries d'un gros bonhomme en bois (1913). Oddball waltz. [youtube]
  • Descriptions automatiques (1913). The first piece is like a tender soliloquy. [youtube]
  • Avant-dernières pensées (1915). ★ Mesmerizing ostinato. The score has a weird little story inside. What does it mean? [youtube]
  • Sonatine bureaucratique (1917). Balanced and self-assured. [youtube]
  • Carnet d'Esquisses et de Croquis (1919). Delightfully deranged ditties.
  • Nocturnes (1919). ★ Haunting. [youtube]

Federico Mompou

Listening to Mompou is like discovering a secret attic you never thought existed. It’s full of mysterious shapes covered in fabric and dusty pictures in tasteful frames. The attic stairs behind you seem to stretch far away now, down into that impossible world of modern times. Or have you been up here all along and have yet to explore the world outside?

  • Pessebres (1917). Folksy and out of kilter. [youtube]
  • Scènes d'Enfants (1918). Expansive and sunny. [youtube]
  • Trois Variations (1921). Variations on a pure little melody. [youtube, now watch the patterns: youtube]
  • Charmes (1921). ★ Magical sleepwalking. [youtube]
  • Paisajes (1960). Follow the secret footpath. [youtube]
  • Musica Callada (1959, 1962, 1965, 1967). ★ Written in four books. Inspired by the mystic poet St. John of the Cross, who wrote of "La Música Callada, la Soledad Sonora," or "the silent music, the murmuring of solitude." [youtube]

Claude Debussy

Debussy, the impressionist’s impressionist, master of lush webs of sound and of subtle orchestration. His music loses itself in a scintillating haze of colors and associations. It ushered in a freedom of composition and nuance lacking in the Germanic romanticism preceding it. As Satie remarked, “I explained to Debussy that a Frenchman had to free himself from the Wagnerian adventure, which wasn't the answer to our national aspirations. I also pointed out that I was in no way anti-Wagnerian, but that we should have a music of our own -- if possible, without any Sauerkraut."

Here’s a funny video that explains more about this musical transition.

  • Beau Soir (1879). Velvety. [youtube]
  • Salut printemps (1882). Composed for the Prix de Rome scholarship competition. Breezy, elegant, slightly pentatonic. [youtube]
  • Printemps, Suite symphonique (1887). Richly orchestrated. [youtube]
  • Deux Arabesques (1888). Charming and feminine. [youtube]
  • Suite bergamasque (1890, published 1905). Overplayed easy listening, but why not? It’s graceful and good. [youtube]
  • String Quartet in G minor (1893). Memorable and lively. [youtube]
  • Prélude à l'après-midi d'un faune (1894). ★ Languorous, kaleidoscopic. [youtube]
  • Nocturnes (1899). ★ Sumptuous strings and a luminous choir. [youtube]
  • Danse sacrée et Danse profane (1903). A shimmering harp piece that contrasts the love of the spiritual and the natural. [youtube]
  • La Mer (1905). Subtle orchestration evoking a rather literary depiction of the sea. [youtube]
  • Images pour piano, Deuxième Série (1907). Ringing and chatty. [youtube]
  • Preludes book 1 (1910)
    • "Footprints in the Snow." ★ Cold, delicate and resigned. Only the footprints remain, the person is never coming back. [youtube]
    • "The girl with the flaxen hair." ★ An intimate gold-tinted memory. [youtube]
    • "The submerged cathedral." A contemplative otherworldly dive. [youtube]
  • Première rhapsodie for clarinet and orchestra (1910). Meandering songbird-like piece. [youtube]
  • Douze Études, book 1 (1915). Hard-to-play studies in chords built of different intervals.
    • Étude 4, pour les sixtes. Like distinguished but tarnished jewelry. [youtube]
  • Page D’Album (1915). Smooth and clear. The left hand reminds me a little of Satie’s Gymnopédies. [youtube]
  • Sonate pour flûte, alto et harpe (1915). Melancholy, somewhat senile wandering. [youtube]
  • Sonate pour violoncelle et piano (1915). A dextrous and sinewy duet. Delightful! [youtube]

Toru Takemitsu

Takemitsu started out writing film scores and making experimental electronic music. As he became better known, he combined traditional eastern sounds and melodies with the western avant-garde. The result is rich orchestral haiku. He ranges from jazzy pop tunes to eerie alien soundscapes, with plenty of good stuff in between.

  • Clouds at Sunset (1967). Lounge music with a Harpsichord! [youtube]
  • A Secret Post-Tokyo War Story Soundtrack (1970). Hippy bongos and vibraphone. [youtube]
  • Les Fils des Etoiles (1975). Satie remix.
  • Ballad of Orin Soundtrack (1977). Eerie and imaginative.
  • In an Autumn Garden (1979). Beautiful intense droning. [youtube]
  • Rain Tree (1982). Translucent and mystical. [youtube]
  • I Hear the Water Dreaming (1987). Ambiguously restless. [youtube]
  • Rikyu Soundtrack (1991). Fascinating mix of baroque melodies and anxious strings. [youtube]

Bohuslav Martinů

Martinů is the Shepard tone of composers. Listening to his stuff feels like climbing a mountain: it gets increasingly hectic and then makes a sudden “switch” to expose a broad vista at the top. Then it starts climbing again and you realize you haven’t reached the real peak.

That said, he does use a certain musical gesture compulsively. I’m fond of it but once you listen to a few songs you’ll know what I mean. Enough talking, get ready to be dazzled by an inventive and overlooked composer.

  • Three Czech Dances (1926). Frenetic, almost like a player-piano score. [youtube]
  • The Butterfly that Stamped (1926). Fluid and percussive, awash in combinations. Starts a little slowly. [youtube]
  • La Revue de Cuisine (1927). ★ Fresh and irregular. [youtube]
  • Suite Miniature: Seven Easy Pieces for Cello and Piano (1931). Tight, balanced duet. [youtube]
  • Les Ritournelles (1932). Nervous and cerebral. [youtube]
  • Julietta, moderato (1937). Quiet and kind of mind-bending. [youtube]
  • Sonata No. 1 for Cello and Piano (1939). Lively, with the surprising twists characteristic of Martinů’s mature style. [youtube]
  • Sinfonietta Giocosa, first movement (1940). ★ Soaring. [youtube]
  • Dumka No. 3 (1941). Matter-of-fact.
  • Piano Quartet (1942). ★ Rolling, boisterous. [youtube]
  • Symphony No. 1 (1942). Spacious and majestic. [youtube]
  • Fantasia for Theremin with Oboe, String Quartet and Piano (1944). ★ Spectral melodies with exciting piano/string accompaniment. [youtube (pitchy recording)]
  • Etudes and Polkas (1945). A river of whirling sounds. [youtube]
  • Toccata e Due Canzoni (1946). Stormy and suspenseful. [youtube]
  • Rhapsody Concerto for Viola and Orchestra (1952). Full and passionate. [youtube]
  • Sonata No. 3 for Cello and Piano (1952). Twisting topsy-turvy. [youtube]
  • Oboe Concerto (1955). Buoyant. [youtube]
  • Chamber Music No 1, second movement (1959). A touch of almost Copland. [youtube]
  • Nonet No 2 (1959). Charming neoclassical work, written as a goodbye on his deathbed. [youtube]

To be continued...

Check back for more great music. Much more.

Tuesday, April 8, 2014

Tikhon Jelvis' ideas about Structural Merging

This afternoon I paired with Tikhon. He's a Haskeller, researcher, and the organizer of the SF Types, Theorems, and Programming Languages group. His project is to extend the unix commands diff and merge to understand and work better on JavaScript code.

As a product of the Unix tradition, the standard diff program operates line by line between files. This affects not just diff itself but also programs like git that rely on it. Have you ever changed a program in a way that does not affect its operation, such as changing indentation, and then been forced to make a big git commit? Have you ever renamed a variable and caused a big, fragmented commit? Tikhon believes that small changes of meaning should appear as small diffs, and that the reason they currently don't is that we still think in terms of teletypes rather than syntax.

Most importantly, Tikhon realized that operating crudely on lines can create merge conflicts when there needn't be any. For instance, consider this original file:

One person edits it by moving one function inside the scope of the other.

Another edits it by changing variable names.

The merge fails! Resolution requires accepting one version and manually adding the changes from the other.

His solution: a structural merge. A traditional diff sees each of these changes as many lines, whereas each of the files being merged differs by only a single structural change, and those changes can be harmlessly resolved. In terms of syntax, the first change (moving foo2 inside foo) looks like this



The second (renaming the variables) looks like this

These representations are created using the Zhang-Shasha tree edit-distance algorithm. It expresses a "tree diff" in terms of the node operations move, relabel, add, and delete, and it finds the minimum number of applications of these operations needed to transform one tree into another.
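Concretely, a tree diff in this sense is just a list of node operations over a labeled tree. A minimal Haskell sketch of the idea (these types are only illustrative, not Tikhon's actual representation):

  -- A rose tree of labeled syntax nodes.
  data Tree a = Node a [Tree a]
    deriving (Show, Eq)

  -- The node operations a tree diff is built from.  The payloads are
  -- deliberately thin; a real implementation has to record where a
  -- node is added or moved.
  data EditOp a
    = Relabel a a       -- change a node's label
    | Add a             -- insert a new node
    | Delete a          -- remove a node
    | Move a            -- reattach a subtree under a different parent
    deriving Show

  type TreeDiff a = [EditOp a]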

Interestingly if we create a tree diff of tree diffs themselves we can use it to display more meaningful merge conflicts. The diff of the two diff trees above looks like this

A second pass with a simplifying algorithm shows there is exactly one edit operation introduced by each change. A structural merge program can interactively ask the user which edit operation to apply (and can do them both if requested).

Tikhon's big hurdle is to make his tree diff fast. As he quipped, "[it runs in] exponential time...I'm not a fan." The solution is dynamic programming, and in a lazy language like Haskell with immutable data structures it only takes a tiny change in a program to automatically memoize functions and enable dynamic programming. We spent the day investigating how to do it for his tree diff function, but began by playing with it in the simpler problem of string edit distance.

Let me show you the trick first. It uses laziness and co-recursion to make the function and its lookup table always keep one step ahead of each other in a magical circle. Observe how it is used to generate Fibonacci numbers.
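A minimal sketch of the pattern: fibs is the lookup table, fib is the function, and each is defined in terms of the other.

  -- The list memoizes the function, and the function reads earlier
  -- results back out of the list.  Laziness makes the circle work.
  fibs :: [Integer]
  fibs = map fib [0 ..]

  fib :: Int -> Integer
  fib 0 = 0
  fib 1 = 1
  fib n = fibs !! (n - 1) + fibs !! (n - 2)

Asking for fib 100 forces each earlier list entry at most once, instead of recomputing subproblems exponentially many times.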

Let's see a naive implementation of string edit distance and how to transform it with The Trick. It's a Haskell implementation of the Wagner–Fischer algorithm, which recursively calculates the edit distance of every initial segment of the two strings, eventually working up to the original strings. Using the edit operations insert, delete, and substitute, it can be expressed succinctly as a recurrence on initial segments.
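Writing \(d_{i,j}\) for the distance between the first \(i\) characters of string \(a\) and the first \(j\) characters of string \(b\):

\[ d_{i,0} = i \qquad d_{0,j} = j \]

\[ d_{i,j} = \begin{cases} d_{i-1,j-1} & \text{if } a_i = b_j \\ 1 + \min(d_{i-1,j},\ d_{i,j-1},\ d_{i-1,j-1}) & \text{otherwise} \end{cases} \]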


Translated to Haskell it becomes
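something like this minimal sketch (naive, and exponential, since overlapping subproblems are recomputed again and again):

  -- Naive edit distance, recursing directly on the two lists.
  naiveDistance :: Eq a => [a] -> [a] -> Int
  naiveDistance [] ys = length ys            -- only insertions remain
  naiveDistance xs [] = length xs            -- only deletions remain
  naiveDistance (x:xs) (y:ys)
    | x == y    = naiveDistance xs ys        -- match: no edit needed
    | otherwise = 1 + minimum
        [ naiveDistance xs     (y:ys)        -- delete x
        , naiveDistance (x:xs) ys            -- insert y
        , naiveDistance xs     ys ]          -- substitute x for y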

The trick to make it fast is to co-recursively fill in a lookup table with the edit distances of initial segments, and to calculate edit distances...by referencing the table. Mind = blown.
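In the same spirit as the Fibonacci example, here's a memoized sketch: the array table and the function dist are defined in terms of each other, and laziness evaluates each cell at most once.

  import Data.Array

  memoDistance :: Eq a => [a] -> [a] -> Int
  memoDistance a b = table ! (n, m)
    where
      n = length a
      m = length b
      arrA = listArray (1, n) a
      arrB = listArray (1, m) b

      -- the lookup table: cell (i, j) holds the distance between the
      -- first i elements of a and the first j elements of b
      table = listArray ((0, 0), (n, m))
                [ dist i j | i <- [0 .. n], j <- [0 .. m] ]

      -- the function: defined in terms of the table it helps fill
      dist 0 j = j
      dist i 0 = i
      dist i j
        | arrA ! i == arrB ! j = table ! (i - 1, j - 1)
        | otherwise = 1 + minimum
            [ table ! (i - 1, j)        -- delete
            , table ! (i, j - 1)        -- insert
            , table ! (i - 1, j - 1) ]  -- substitute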

After implementing the function above, which returns merely the minimum edit distance, we augmented it to return an array of the actual edit actions needed. We ran into some performance problems from repeatedly calculating the length of those arrays when checking for the minimum, but found a way around them.

What remains is to translate this nice memoized string edit distance to trees, using tree edit operations rather than string operations. Notice the lookup table we used above is a two-dimensional array indexed by the lengths of the segments. To translate the lookup table strategy to trees we need a way to uniquely name partial traversals, which we could do by using the number of hops along the traversal as the "index." (We experimented with using a Haskell Map keyed off the trees themselves, but that was really slow.) Ultimately we did not complete the refactor to make the tree diff run in sub-exponential time, but we discovered how it will be done.

Friday, March 28, 2014

Magic numbers in polynomial hash functions

Every time I see copypasta polynomial string hash functions on the internet I am mystified by the arcane and magical numbers they contain. Today it's time to find out which numbers are acceptable and why. Scanning stack overflow discussions and spending some time at the blackboard has revealed the beginning of the secret.

Polynomial hashes are computed from a base number and the character codes of an input string. Let \(s_0 \ldots s_{k-1}\) be the codes of each input character in string \(s\). It's our job to choose constants \(b\) and \(n\) to minimize collisions in the hash function \(h(s) = \sum b^i s_i\ \text{mod}\ n\), where \(b\) is an arbitrary number and \(n\) is the number of buckets in our hash table.

Increasing \(n\) certainly helps. If \(n=1\) then everything will collide and we needn't worry about \(b\). So fix \(n\) as large as sensible for application memory. We'll see that certain choices of \(b\) are statistically better than others. Certain choices are really bad.

Let's get to our conclusion the roundabout way and see what happens when we pick bad values. Assume \(n \mid b\), that is \(b = nm\) for some \(m\). In this case \(h(s) = \sum b^i s_i = s_0 + nmX \equiv s_0\ \text{mod}\ n\). Hence only the first character of the string affects the hash value. This is terrible performance.
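For a concrete example, take \(b = 256\) and \(n = 16\), so that \(n \mid b\). Then \(h(s) \equiv s_0\ \text{mod}\ 16\), and any two strings starting with the same character -- "cat" and "cup", say -- land in the same bucket no matter what follows.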

More generally, if \(d \mid b\) and \(d \mid n\) for some \(d > 1\), then \(b = dj\) and \(n = dk\) for some \(j\) and \(k\). Thus \(\sum b^i s_i\) can be written \(s_0 + djX\). As in the previous case, the \(s_0\) term turns out to be more important than the others. Notice \(s_0 + djX \equiv s_0 + djY\ \text{mod}\ dk\) iff \(dj(X-Y) \equiv 0\ \text{mod}\ dk\) iff \(j(X-Y) \equiv 0\ \text{mod}\ k\). That's not good -- the final terms, whatever they may be, are effectively modded by \(k\), which is \(d\) times smaller than \(n\). A smaller modulus means fewer bucket choices, which makes collisions more likely.

Which brings us to the first conclusion: choose \(b\) and \(n\) to be relatively prime. Beware that integer arithmetic is already modular, so \(h(s)\) is really \(h(s)\ \text{mod}\ 2^{32}\). Don't choose \(b\) as a power of two (in fact choose it to be odd) or else \(\gcd{(b, 2^{32})} > 1\).

This is why the typical hash function snippet on stack overflow uses a prime for \(b\). The author doesn't know what you'll pick for \(n\) so they play it safe. However there is still an interesting question about which prime to pick. Sadly coprimality, while necessary, is not sufficient to guard against collisions. I wrote some code to test various strings and constants.
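Something like this sketch captures the idea (polyHash and collidingKeys are illustrative names):

  import Data.List (sortOn)
  import qualified Data.Map.Strict as Map

  -- h(s) = sum of b^i * s_i, mod n, with s_0 the first character.
  polyHash :: Int -> Int -> String -> Int
  polyHash b n = foldr (\c acc -> (fromEnum c + b * acc) `mod` n) 0

  -- How many of the 26^3 three-letter keys share a bucket with another key?
  collidingKeys :: Int -> Int -> Int
  collidingKeys b n = length [k | k <- keys, buckets Map.! polyHash b n k > 1]
    where
      keys    = [[x, y, z] | x <- ['a'..'z'], y <- ['a'..'z'], z <- ['a'..'z']]
      buckets = Map.fromListWith (+) [(polyHash b n k, 1 :: Int) | k <- keys]

  main :: IO ()
  main = do
    let n  = 26 ^ 3
        bs = [b | b <- [3, 5 .. n - 1], gcd b n == 1]  -- odd bases coprime to n
    mapM_ print (sortOn snd [(b, collidingKeys b n) | b <- bs])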

There are \(26^3 = 17576\) length-three strings of lowercase letters. If we let \(n = 17576\) and run through all relatively prime choices of \(b < n\), there are plenty of bad values. To get a feel for how the performance varies, I counted, for each \(b\), the number of keys that collide with at least one other key, then sorted those counts. (So the x-axis below is not \(b\) itself.) The graph gives a feeling for the range of success.


For the best \(b\) a whole 89% of keys are collision-free. At the worst end all but six collide. Apparently there is some deeper stuff going on. That's as far as I'm going to take it for now.

Wednesday, March 5, 2014

Beyond HTTP Header Links

The idea of Hypertext As The Engine Of Application State is that each hypermedia resource provides links to related resources. These links are typically called affordances. Clients can decouple themselves from a server by following affordances rather than constructing -- or memorizing -- URLs. Servers are then free to change URLs or gracefully degrade service.

One of the more popularly adopted affordances is the pagination link header. The server includes links to the prev, next, first, and last pages with each page in a series. The client can disregard the URL scheme and just follow the links to access the pages as a doubly-linked list.

Problem: pagination is often random-access. Look familiar?


Sure, it has previous and next links but there are also links to go directly to numbered pages. Some interfaces embrace random access even more directly.


In a potentially unbounded sequence we can't include a header link to each page. What link relations would we use, and how would the client know what they mean? Ideally we would like a content-agnostic way to paginate large resources -- a standards-compliant way that doesn't have to adjust the URL.

If we can find a way to expose affordances for random-access pagination then why include the traditional four links? I'd prefer a standard way to access any page at all, which would make the first, last, next and prev links superfluous.

The solution is, I think, range headers. They were originally constructed to resume big downloads. Perfect! What is pagination but a slow download into human eyes? The HTTP 1.1 spec provides a standard way to request part of a resource and to discover how much is left to go. Let's see how it works.

Request
  GET /resource
Response
  Status 206 (partial content)
  Accept-Ranges: items
  Range-Unit: items
  Content-Range: 0-249/1000000

Here the client asks for a resource. The server doesn't want to send more than two hundred and fifty items at once, so it sends a partial response. A client that understands the "Accept-Ranges" affordance can now request specific ranges.

The client needn't inspect or adjust the URL to select a new range. It just sets a header:

Request
  GET /resource
  Range-Unit: items
  Range: 0-24
Response
  Status: 206
  Range-Unit: items
  Content-Range: 0-24/1000000

Notice the negotiation. Both the client and server have limited numbers of items they want to consume or serve at once. In the first request the server limited the response, and in the second it honored the client's request to further limit it. At this point the client knows what range it has been given, along with the total number of items. Requesting any page is just math.
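For instance, a client that wants page p (zero-based) in chunks of k items just asks for the corresponding range. A tiny sketch of the arithmetic (helper names are purely illustrative):

  -- The item range covered by page p (zero-based) when pages hold k
  -- items, and the number of pages needed for `total` items.
  pageRange :: Int -> Int -> (Int, Int)
  pageRange p k = (lo, lo + k - 1)
    where lo = p * k

  pageCount :: Int -> Int -> Int
  pageCount total k = (total + k - 1) `div` k

  -- e.g. pageRange 3 25 == (75, 99), so the client sends
  --   Range-Unit: items
  --   Range: 75-99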

I'd suggest this supersedes the classic pagination link headers. It's equally restful yet more powerful.