Announcing Numbers API
Back when I was interning at Khan Academy a few months ago, my intro project was to create a dashboard for exercise statistics. Among the variety of graphs and widgets was one that reported a factoid about the number of shipped exercises, such as “We now have more exercises than the number of years Harriet the Galápagos tortoise lived (1830–2006).”
People seemed fascinated by these tidbits of random trivia. Sal Khan even suggested hosting a web service for these facts at numbers.khanacademy.org, but nothing came of that. That is, until now.
My friend Mack and I were deciding on a side project to distract our minds from Bode plots and Jacobians (if only Sal had some videos on Control Theory…), and narrowed down a crapload of ideas to just one. The winner was an API for interesting number facts, based on the criteria of being novel, completable in a few weekends, and interesting to us.
We built Numbers API with Node and Express for the back-end and Sass + Compass to make CSS bearable. We designed the landing page and the web service together, and then Mack put together the first iteration of the landing page while I set up the server and wrote the API + docs. We then sort of swapped roles, and I had the chance to prettify the landing page to my tastes. I also yet again self-plagiarized from my work at Khan Academy and did 20% of the work required to convert the rolling numbers counter into a jQuery plugin (the other 80% is more testing, performance tuning, code clean-up, and greater extensibility).
Mack, on the other hand, started hunting for content. I believe he used LXML to scrape Wikipedia and a Python natural-language processing toolkit to process sentences into a consistent grammatical form. This form was specifically chosen to be flexible enough to be used in a sentence like “42 is <fact string>.” as well as “We now have more exercises than <fact string>.” (I leave the exact form as an exercise to the reader.) This turned out to be non-trivial for trivia facts, and we even resorted to manually combing for interesting facts on cardinal numbers (which were far outnumbered by numbers used as names and codes) and phrasing them appropriately.
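As a rough illustration of what "flexible enough" means here, the sketch below assumes the fact string is a bare noun phrase with no leading number and no trailing period. That is only one possible form, not necessarily the one we chose, and the function names and the sample fact are made up for the example.

```python
# Hypothetical sketch: the helper names, the sample fact, and the
# noun-phrase form are all assumptions made for illustration.

def number_sentence(number: int, fact: str) -> str:
    """Slot a fact string into the '<number> is <fact string>.' template."""
    return f"{number} is {fact}."


def comparison_sentence(subject: str, fact: str) -> str:
    """Slot the same fact string into a 'more <subject> than <fact string>.' template."""
    return f"We now have more {subject} than {fact}."


# A fact phrased as a noun phrase reads naturally in both templates.
fact = "the number of cards in a standard deck"

print(number_sentence(52, fact))
print(comparison_sentence("exercises", fact))
```

The point of the constraint is that one stored string can serve both templates without any per-sentence rewriting.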
This is just the beginning. Time permitting, we are also thinking of extracting and paraphrasing interesting facts from Guinness World Records, almanacs, numbers in nature, statistics, the number of works by famous artists, etc.
Here’s a chart of API usage statistics:
Have fun, and we can’t wait to see how you’ll be using these number facts! We ask only that you be gentle with our server.