Colin's Blog: 2011

Saturday, December 17, 2011

Learning Goals

Math:

-more statistics

-all about Bayesian inference

-linear algebra

Programming:

-more R

-a functional language (probably Scheme or Haskell)

-Objective-C

-More Java, because that's what UA teaches, but meh.

-web backend stuff.

Hacking Skillz:

-vi(m)

-more regular expressions

Classes to take:

ISTA 410: Bayesian inference

ISTA 450: Artificial Intelligence

ISTA 421: Machine Learning (not sure if class exists yet)

CSC 345: Discrete Structures

CSC 445: Algorithms

To Do:

-Finish ISTA degree leaning towards mathematical/computational side.

-CSC minor, take the classes I want, don't worry about completing a second major.

-Section Lead ISTA 130

-Continue to be involved in SISTA

-Get a good internship

-Keep learning stuff on Khan Academy

-Take opportunities to talk to smart people

-Try not to be too tied to short-term job with attractive perks if it starts to get in the way of anything else on this list.

Wednesday, December 14, 2011

Harold Cohen's talk for the SISTA colloquium was pretty interesting. In particular, I was intrigued by the relative simplicity of his color selection algorithm. Essentially, he used a random number generator, or some kind of mathematical pattern, to select 9 numeric values, which he rescaled to become values (no pun intended) for the 'V' component of the HSV color model. From those 9 values, he randomly selected a high, mid, and low that ended up serving as the bright, medium, and dark parts of AARON's paintings.
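A rough Python sketch of that idea, as described above. This is not Cohen's actual code: the function names, the choice to split the sorted values into thirds, and the fixed hue and saturation are all assumptions made for illustration.

```python
import colorsys
import random

def pick_values(count=9, rng=random):
    """Draw `count` random numbers and rescale them to span [0, 1], dark to light."""
    raw = [rng.random() for _ in range(count)]
    lo, hi = min(raw), max(raw)
    span = (hi - lo) or 1.0  # guard against the (very unlikely) all-equal case
    return sorted((v - lo) / span for v in raw)

def pick_low_mid_high(values, rng=random):
    """Assumption: pick one value from each third of the sorted list."""
    third = len(values) // 3
    low = rng.choice(values[:third])
    mid = rng.choice(values[third:2 * third])
    high = rng.choice(values[2 * third:])
    return low, mid, high

values = pick_values()
low, mid, high = pick_low_mid_high(values)

# Hue 0.6 and saturation 0.5 are arbitrary placeholders; only V varies.
palette = [colorsys.hsv_to_rgb(0.6, 0.5, v) for v in (low, mid, high)]
```

Because the values are sorted before they are split, the three picks always come out in dark, medium, bright order.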

What's interesting about this is that I have some semi-formal education in fine arts painting, and this very much echoes what I was taught. Highlight, Halftone, and Shadows. I literally have charcoal drawings of nine squares in various shades of gray.

However, in oil paint, I was taught to create my halftones and shadows not by adding black, but by adding a complementary color. This had the effect of producing more neutral-looking halftones. I'm not really very well-read in color theory, but I'm curious how this method would be replicated algorithmically.
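One possible way to approximate this in code is to blend a color toward the hue on the opposite side of the HSV wheel. This is only a sketch under two big assumptions: the HSV wheel's opposites (red and cyan, say) are not the same pairs as on a painter's wheel (red and green), and blending RGB numbers is not a physical model of mixing pigment. The function names are made up for this example.

```python
import colorsys

def complement(rgb):
    """Return the color half a turn around the HSV hue wheel."""
    h, s, v = colorsys.rgb_to_hsv(*rgb)
    return colorsys.hsv_to_rgb((h + 0.5) % 1.0, s, v)

def mix(rgb_a, rgb_b, amount):
    """Linear blend: amount=0 gives rgb_a, amount=1 gives rgb_b."""
    return tuple((1 - amount) * a + amount * b for a, b in zip(rgb_a, rgb_b))

def shade_with_complement(rgb, amount):
    """Darken and neutralize a color by blending in some of its complement."""
    return mix(rgb, complement(rgb), amount)

red = (1.0, 0.0, 0.0)
halftone = shade_with_complement(red, 0.25)
h, s, v = colorsys.rgb_to_hsv(*halftone)
```

In this sketch the blend lowers both S and V at once, so the result gets duller as well as darker, instead of just sliding down the V axis.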

In my third assignment, I built a Processing application that drew very simple Mark Rothko-style images. Perhaps like Rothko, the composition was deliberately simple; my focus was on building a color selection algorithm that was decently good at finding colors that look good with each other. All in all, I'd say my app gets it right maybe 1/3 of the time. And I think the problem is that I can't create truly natural-looking color selections when I'm only varying the V in HSV. I need to figure out what to do with the S at the same time.

And I have no idea how, because I don't know anything about color theory.

Thursday, December 8, 2011

ISTA 301 Blog: Awesome Visualizations I've been saving for a Blog Post

Data Visualization is cool, let's just admit it. And coming up with an intuitive way to do it certainly takes a lot of creativity. Plus some of them are pretty.

The darker spots on this map are the spots that are furthest from a McDonald's. Purdy!

ISTA 301 Blog: More Probabilistic Fun.

More potential has emerged after spending an evening at Time Market with a lot of literature grad students. First of all, the text generator is great fun when fed James Joyce's interminable masterpiece, Ulysses, as source material. Scraping the bottom of the barrel for technology related puns, this one has been christened "Joyce-stick." Its first words are pictured here, along with some of the aforementioned grad student commentary.

But the true inspiration of the night came only after a solid hour of beer sipping: Why not seed the text generator with the words of that infamous mad prophet of the airwaves, Glenn Beck?

Well, we did. And we had dramatic readings right there in Time Market that grew increasingly loud after another round of drinks was ordered. Here's an excerpt:

We learned about Microsoft. What I'm starting to take the set apart and this was a warning in 1939 to the American people. Thank you for sending it. Think how many people even know that? And how it was turned around and it can be used for good. But when it comes to a very hard left groups like Free Press are advocating for. Eric Schmidt, the former CEO talked about where he sees the future. Same thing is happening, America, with your food. Thirty-eight percent of the households even watching her for the next block. And that is the key. That it is that we believe in evil. The only answer to millions in the same things from all over the country was two years ago -- me and you as individuals that listen to the experts. They're usually sells in a year. I mean, I just -- I like live television who told you yet that I've said something wasn't done. You got to print more money. Push it out. Today, it closed at over $1500. What was it -- $1511? That's insane. You would think you come to Washington, D.C. and now, they have been loose with power do. These elitist dopes in the media who are celebrating, I waited for a season because you are being lied to. And you need to know what you are thinking individuals that listen to our programs. Not all of these scenes -- put the six other currencies to determine its value internationally. Six currencies to determine its value internationally. Six currencies and it's the Fed. And that's not good. I think in Australia and the on the road Khilafah or caliphate on the caliphate. That's why it doesn't make sense because oil has to be purchased with dollars -- people holding euros have to sell to the people who hate it here do not believe me when I was only on the air for one hour every day at 5:00. This is where Google's street view cars downloaded e-mails, passwords, and other information. They're because this is the NewsCorp building. 
This is not difficult to figure out some things, maybe a little longer than all of us, and you try to think, what is it I want to show you a few things." She just happens to be running for vice president at the time." He says and he predicts, I saw this video and think he was serious, that every young person will one day be entitled to automatically change his or her name on reaching adulthood to disown youthful hijinks stored on their friends' so-called media sites. And let's not forget this super, super classic, quote, "We know roughly who you are, and all of us, and you think I have to. I'm not leading any boycott. I hate boycotts. You do with your time and information. They are working really, really closely with the government and that`s a spooky thing. And we've done amazing people. And I have made amazing people. And I have done that lately? Let me keep this brief. Tonight America, here`s what you are the answer, and that we must have dollars.

I did get curious though, as I was copying program transcripts off of Beck's web site and viewing the copyright notification that it obnoxiously embeds in text as you copy it to the clipboard. If I were to data mine Beck's site and build a probabilistic map of his prose, and then, say, post my program and the accompanying database online, would this be considered a copyright infringement?

I mean, it's not really a quotation, nor does it contain any actual prose from Beck that is more than a few words long; however, it also contains no original material. It is "analysis," but only in a very mechanical sense. Would it be protected under fair use? As far as I know, nothing like this has ever gone to court (though Beck is not busy defending free speech, he has proven notably litigious).

Interesting question, no?

ISTA 301 Blog: Briannatomaton

Part of one of my Computer Science 227 assignments was to implement a probabilistic text generator. That is, a program that scans through a large text file, finds patterns in which letters tend to follow which other letters most often, and then generates a new text semi-randomly based on those probabilities. It's loads of fun, as it tends to produce mad-libs style nonsense but in a prose style similar to that of the text that fed it.

Oddly enough, as an ISTA student, I'd already been exposed to this idea in Paul Cohen's class a few semesters ago, and shortly before the 227 assignment was posted, I joked with Brianna about setting up a text generator based on her writing. When it turned out that the final project in one of my classes was to implement this very idea, I was ecstatic.

What it does is read every substring of length n in a large text and then build a hash table containing a list of every character that follows that substring, weighted according to the frequency of its occurrence. I use Google Voice for my text messages, so I was able to download an HTML archive of all the text messages Brianna has sent me in the last 6 months. 15 minutes and a little studying up on regular expressions later, I had removed all of my own responses as well as all the HTML tags and just had a solid plaintext block of Brianna's texts. I fed this, along with some of her blog and a few papers she wrote as an undergrad, into a gigantic text file which totalled about 82,000 words.
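The table-building step described above can be sketched in a few lines of Python. This is not the original assignment code (that was for a different class and isn't shown here); the names `build_table` and `generate` are made up for illustration. Storing repeated followers in a plain list is one simple way to get the frequency weighting.

```python
import random
from collections import defaultdict

def build_table(text, n):
    """Map every n-character substring to the characters that follow it.

    A follower that occurs more often appears more times in the list,
    so a uniform random pick from the list is weighted by frequency.
    """
    table = defaultdict(list)
    for i in range(len(text) - n):
        table[text[i:i + n]].append(text[i + n])
    return table

def generate(table, start, length, rng=random):
    """Grow `start` one character at a time until it reaches `length`.

    `start` must be as long as the n used to build the table.
    Stops early if the current substring never appeared in the source.
    """
    n = len(start)
    out = start
    while len(out) < length:
        followers = table.get(out[-n:])
        if not followers:
            break
        out += rng.choice(followers)
    return out

table = build_table("the cat sat on the mat and the cat ran", 3)
sample = generate(table, "the", 40)
```

Larger values of n make the output stick closer to the source text; smaller values make it more random.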

Briannatomaton was born. I modified the code slightly so that I could force it to "seed" each randomly generated blurb with a word or two at the beginning, which let me have it "talk to" specific names that occur in the text I fed into it. Why would I want to do that? Simple. Because Briannatomaton got her own Facebook page.

Wednesday, November 2, 2011

ISTA 301 Blog: Cut with the Data Mining Pick through Internet's last Caffeine-Fueled Cultural Epoch, or, How Australian Internet Censorship Substituted Smurfs for Smut Without Really Trying

For my second computing and the arts assignment, I was supposed to make a collage in homage to the famous dadaist collages of the early 20th century. In particular, I chose Hannah Hoch's Cut with the Dada Kitchen Knife through the Last Weimar Beer Belly Cultural Epoch in Germany. (Or something. Apparently there are variations on the translation of the name of the piece.)

Hoch's collage features the heads of numerous public figures cut and pasted into strange and seemingly nonsensical contexts. Because so much of the collage consists of material drawn from current events, I decided to imitate that in my collage by drawing all my images from trending Google image searches. Since I did this in early October of 2011, I quickly discovered that by far the most common image searches were for young female celebrities and the recently-deceased Steve Jobs. Strictly speaking, this isn't really a data visualization, as I chose freely among the top ten or so trends, which increased the diversity of the images.

I also noticed that Hoch's piece frequently included clippings from newspaper headlines about dadaism, the very movement to which she belonged. I peppered my collage with the word “data” in a similar manner. That was mostly just to be silly.

One of the reasons I thought of creating my collage over a map is that Hoch's collage actually incorporated a map. Her map, however, was an infographic of Europe showing which countries had the most progressive policies on women's rights. If I had a chance to rework this collage, one thing I'd like to do is integrate some kind of visualization of electronic freedom around the world. One possibility might be to present visual elements more sparsely in parts of the world where internet access is heavily censored. Unfortunately, it would be hard to distinguish this from regions where results are sparse for other reasons (developing nations, and regions that don't use Google as much). Another possibility would be to do something based on each nation's laws on copyright and software patents. Like the move toward increased gender equality that was taking place during Hoch's life, I believe that more progressive laws in regards to internet censorship and software patents would eventually lead to greater freedom and perhaps even an economic boom.

I don't really have the skills to do it yet, but another cool thing to do would be to code up a map like this that actually pulls live data from Google Trends and visualizes it on demand. If I were to do that, I'd probably use a vector map based on national borders rather than a satellite image like I did here. I wonder if I could do something with the Google Maps API...

The images in the collage are drawn from Google Image Search Trends for the first two weeks of October 2011, and correspond (roughly) geographically with the areas of the map on which they are arranged. For example, in Russia, trending image searches included Yandex, Steve Jobs, Trollface, Emma Watson, and Minecraft.

US: Steve Jobs, black girls, Hope Solo, Sarah Palin

Canada: Justin Bieber, Natalie Portman

South America: Cristobal Colon, Miss Universo, Virginia Gallardo, and Nick Jonas

Northern Africa: Boko Haram, Ronaldo

Central/Southern Africa: Ramadan, Beyonce, Amy Winehouse

Western Europe: Steve Jobs, Jodie Marsh, BMW, Teresa Fidalgo

Northern Europe: Steve Jobs

China: Steve, Liu Yan, Aoi

Saudi Arabia: Blackberry

India: Pooja Bedi

Pakistan: Salman Khan (not the one from Khan Academy, apparently)

Australia: Smurfs, Ryan Gosling

Malaysia: Randy Pangalila

New Zealand: Dan Carter

Wednesday, October 26, 2011

Investing!

I decided to do a little experiment. I set aside some money and opened up a brokerage account with TD Ameritrade (Supposedly one of the better online brokers). I don't really have any investing experience, but I do have some fairly general background knowledge of how the stock market works. I also read a lot of analysis of the technology industry.

Now, in general I know it's not the greatest idea to buy individual stocks if you're a long-term investor and you're not an expert. I do have some money in a Roth IRA, as well as a 401(k) that is fed by my modest college-student earnings. Buying these stocks isn't an investment strategy, it's just for fun. I could spend the money going to Vegas, but all in all I think I'll enjoy myself more (and probably retain my money longer) investing it in companies that I have some knowledge of.

SO! Here's what I bought:

1 share of Amazon @ $203
Amazon was one of the few survivors of the first dot com bubble. Now they're the biggest and best online retailer, and they're also offering first rate cloud computing services, as well as being the second company in history to launch a tablet that people are falling over themselves to buy. Amazon was by far my most expensive buy, but I think it's a relatively safe bet.

2 shares of Netflix for $159 total
Netflix is the biggest name in streaming media. They've made some PR gaffes in the last few months and raised prices, which caused them to lose about 4% of their customers, but I think their ultimate intention (which is to focus more on streaming and less on mailed DVDs) is forward-looking and intelligent. Analysts are making snide remarks about the plummet in their share price, which was near $300 in midsummer and is now down below $80. Temporary setback and good buying opportunity, in my opinion. Streaming is the future, and I think Netflix shares will recover.

7 shares of Nokia for $46 total
This one is my biggest gamble. Nokia makes the kind of simple, cheap, functional cell phone that is slowly being replaced by much cooler smartphones. They're betting heavily on Microsoft's new Windows Phone 7. WP7 is an excellent product, but it's late to market. They're the underdog. Being the underdog is why Nokia shares are under $7 each. Cheap enough for me to take a gamble on.

Tuesday, October 4, 2011

Important Message to America

I wasn't going to post this but some trusted friends convinced me to try to get the word out in spite of the risk it might bring to me personally. The iPhone 4S has a dual core A5 processor. If you do the math, two A5's come out to A^2 + 10A + 25 cores of power on every silicon. Graph it. It's a parabola. Fact: Intellectual Ventures holds 27 parabola-related patents (the same number as there are amendments to the U.S. Constitution. Coincidence? You decide.) Why do you think Apple is so secretive? If you buy an iPhone 4S with a dual core processor, or any phone that utilizes parabolas, Nathan Myhrvold can send an Intellectual Ventures goon to your house to beat you and your family with his expensive Taliban-backed cookbook. The recently passed America Invents Act contains a buried clause that allows him to do this to you by bypassing habeas corpus (which is latin, in case you don't read history) and establishing Intellectual Ventures as a secret fourth branch of government. Democrats and Republicans voted on this law and Obama signed it with NO media attention. Don't believe me? History is on my side; Galileo was laughed at until the Free Masons backed him against the Pope. Intellectual Ventures was founded by two ultra-wealthy Microsoft execs. That's the same Microsoft that put the "MS" in MSNBC, a news channel that let the America Invents Act slide by without proper scrutiny. It makes sense. The only way to stop this is to call your representatives NOW and repost this on your facebook wall. This is to protect the innovation that makes America great.

Friday, September 23, 2011

ISTA 301 Blog: The Word "Interactive"

To me, the word "interactive" brings up images of gimmicky edutainment software in a clearance bin at Best Buy. Software is interactive almost by definition, so branding a software product as such does little more than emphasize that this is not just a VHS tape. That's hardly an impressive statement in 2011. Growing up in the 90's, I didn't need a sticker on the box of Kid Pix or Oregon Trail to tell me that this was how I wanted to spend my afternoon.

Ye Olde Kid Pix

I have fond memories of Kid Pix in particular, because in retrospect, I can see that Kid Pix used fun tools to slyly teach kids how to use a Macintosh. While using the dynamite tool to blow up your images made for endless fun, many of the standard paint program tools were there too. The toolbar looked a lot like the toolbar from "real" paint programs, the menus were standard Mac menus, and even the cursors were the same as those used in paint programs at the time. And this was at a time when the Mac was taking the graphic design and publishing industry by storm. I can honestly say that Kid Pix introduced me to a lot of computer skills and interface conventions for the first time.

Recent version of Kid Pix

Sadly, modern versions of Kid Pix have abandoned the "standard" interface in favor of a more "fun" and "Fisher Price" looking UI. I don't know whose idea it is that making everything look like it's made out of colored plastic is going to somehow increase the educational value of the product, but I can't say I agree. The old Kid Pix looked like a real program, and I didn't have any trouble learning to use it. This, in my opinion, is a change for the worse. When I was 6, I didn't want to play with a Fisher Price hacksaw and brightly colored plastic drill bits, I wanted to use a real drill to destroy real parts of my parents' house! Well, it's the plastic drill bits I think of when I hear the word "interactive." I think of safety scissors, glue sticks, and the kinds of tools kids learn to hate because they're not the "real" tools the adults use.

In class, we recently discussed a trend in the art world that began with the cyberneticists in which artists began to shift the emphasis of their work from static representations to more "interactive" works. In particular, these artists were interested in getting the audience involved with the art.

From a 21st century perspective, many of the early attempts at interactivity have been eclipsed in their novelty so many times that works like CYSP-0 seem downright quaint. Today, pervasively (and I'm cringing as I type the word) interactive art projects are never more than a few clicks away. Take for example the Arcade Fire's recent "music video" which incorporates imagery of the viewer's childhood home from Google Maps and Street View. Or Chris Milk's 3 Dreams of Black. Or the breathtaking Johnny Cash Project.

I hate using the word "interactive" to describe the Johnny Cash Project. It's just too cheap a word to me.

Yes, this blog was a rant about the word "interactive."

Friday, August 26, 2011

ISTA 301 Blog: Authorship

While the question of what constitutes "art" is a topic of constant debate, few would dispute that all real art involves an act of creativity. In circumstances where multiple individuals or entities are involved in the realization of a work of art, it is not always easy to single out one individual as the author of the piece.

2011 marks roughly two centuries since Beethoven wrote his 7th symphony (composed in 1811-1812 and first performed in 1813). But in hearing this music performed, are we listening to the London Classical Players? Or are we listening to Beethoven? The orchestra merely executes a set of instructions. While there is certainly artistry in the actual performance, the actual content of the symphony is the work of Beethoven, not the orchestra.

This is not as clear-cut for other art forms. What of the theater? Here it seems that creativity is more evenly distributed between the author and the performers. A modern production of King Lear may have been modified by the producer of the play, and certainly there is more room for interpretation on the part of the actors than there is in the performance of an orchestra (or is there?!)

The common thread between music and theater is that both are art forms in which the artist's creative act and the actual realization of the work are separated, linked only by the passing of information (be it a written play or a book of musical notation).

In the field of information science, entropy is one way of measuring the unexpectedness of a piece of information. For example, a coin toss has 1 bit of entropy because there are two possible outcomes and they are equally likely. Random, in other words.
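The 1-bit figure comes from the standard (Shannon) entropy formula. Written out for a fair coin, with H, X, and p(x) as notation introduced here and not in the original post:

```latex
H(X) = -\sum_{x} p(x)\,\log_2 p(x)
\qquad\Longrightarrow\qquad
H_{\text{coin}} = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1\ \text{bit}
```

A lopsided coin would score lower than 1 bit, because its outcome is easier to guess.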

While there may be tens of thousands of possible combinations of Mozart's Musikalisches Würfelspiel (Musical Dice Game) over the course of a few dozen bars of music, given that we know all the building blocks in advance, each note of music is not particularly unexpected. In fact, we only need to know the first two or three notes of each bar for the remainder of the notes to be predicted with 100% accuracy. In the parlance of information science, the entropy of a given note in a Musical Dice Game composition is quite low. While this may seem somewhat nitpicky, it is meaningful to point out that we can represent ALL of the new information that is introduced at the generative stage of the Dice Game with only a handful of bits, simply because there are a very limited number of possible building blocks. Certainly, the dice contribute much less information than the composition itself did. So, beyond the fact that the musical dice game itself is quite a novel idea, it's not as if a lot of unique information is being generated by the dice--the creative credit still belongs to Mozart.
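To put a rough number on "a handful of bits," here is a small Python sketch. It assumes each bar is chosen by rolling two ordinary six-sided dice and adding them, which is an assumption about the game's rules and is not stated above; `entropy_bits` and `dice_game_bits` are names invented for this example.

```python
import math
from collections import Counter
from itertools import product

def entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Count how many of the 36 equally likely rolls produce each total (2..12).
totals = Counter(a + b for a, b in product(range(1, 7), repeat=2))
probabilities = [count / 36 for count in totals.values()]

bits_per_roll = entropy_bits(probabilities)

def dice_game_bits(rolls):
    """Total information contributed by `rolls` independent two-dice rolls."""
    return rolls * bits_per_roll

print(round(bits_per_roll, 2))
```

Under that assumption each roll contributes a little over 3 bits, so even a piece built from a dozen or more rolls carries only a few dozen bits of "new" information from the dice.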

This same concept applies to Sol Lewitt's Arcs and Lines piece we discussed in class. While the instructions do allow for an element of randomness (or creativity) at the time of their execution, the finished work can still be described as a unique arrangement of a relatively small number of pre-defined building blocks. Again, the creative work is primarily in the hands of Sol Lewitt because the amount of new information that is introduced at the time of the execution of the instructions is minimal compared to that of the building blocks themselves.

A more technical example would be any algorithm that incorporates a random number generator. It is trivial for a programmer to create code that produces results that are random, but they are always predictably random. What I mean by "predictably" is that a coinToss() will return either heads or tails at random, but it will never return anything else. Similarly, it would be easy to create a program that generates a unique instance of Arcs and Lines, but no matter how many times you run it, that same code will never generate Beethoven's 7th Symphony. It is only "generative" within the framework that is defined by the instructions the artist provides. And since that framework was created by Sol Lewitt, no matter how many unique instances of Arcs and Lines my program generates, he is always the artist.
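A minimal Python sketch of "predictably random." The `coin_toss` function mirrors the coinToss() mentioned above; `BUILDING_BLOCKS` and `generate_piece` are hypothetical stand-ins invented for illustration and are not Lewitt's actual instructions.

```python
import random

def coin_toss(rng=random):
    """Return 'heads' or 'tails' at random, and never anything else."""
    return rng.choice(["heads", "tails"])

# Hypothetical building blocks standing in for an artist's instructions.
BUILDING_BLOCKS = ["arc from corner", "arc from side", "straight line", "broken line"]

def generate_piece(size, rng=random):
    """Return a random arrangement of `size` elements from BUILDING_BLOCKS."""
    return [rng.choice(BUILDING_BLOCKS) for _ in range(size)]

tosses = {coin_toss() for _ in range(1000)}
piece = generate_piece(12)

# Every result stays inside the framework the code defines.
assert tosses <= {"heads", "tails"}
assert all(element in BUILDING_BLOCKS for element in piece)
```

Each run gives a different `piece`, but the set of things it can possibly contain was fixed when `BUILDING_BLOCKS` was written.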

But where do we draw the line when the instructions become increasingly open-ended? Karlheinz Stockhausen's Richtige Dauern is extremely suspect in this regard. It doesn't take much of a leap from Richtige Dauern to say that I can claim ownership of any creative work by simply issuing the instruction to "play a song." The range of possible outcomes from that instruction is so broad that it would be ridiculous to ascribe any creative credit to the author of the instructions. Even if I say "Play a song in the key of C, 4/4 time, and make the lyrics about falling in love," I can hardly claim ownership of whatever the final outcome is. Heck, you can play dozens of pop songs with the same chords! And yet there's no denying that each of those songs is quite unique.

What I'm getting at is that while I think Arcs and Lines is an interesting concept from start to finish, I think the "art" happened when Sol Lewitt conceived the instructions, not when the instructions are executed. The uniqueness of each implementation is novel, but the particulars of the implementation are not what matters. I also think that the most Stockhausen deserves for a performance of Richtige Dauern is credit for inspiration.

Sunday, July 17, 2011

Girl

So I've been talking to this girl. She's out of town for the summer, but when she gets back, we're going on a date.

Monday, July 11, 2011

From Python to Objective-C

I'm going to go ahead and take this blog for a technical turn in the hopes that it gives me a reason to actually post on it. I'm trying to teach myself the Objective-C programming language. This spring I took my first programming class and learned the Python programming language. Python is really great, because it's a pretty powerful and versatile language, but its syntax and grammar are just about as novice-friendly as you can get and still be considered a "real" programming language. It's elegant and fun, and learning it taught me that I could really enjoy problem-solving by programming.

Sadly, the universe does not run on Python. Next semester I start with Java, but I've also decided I'd like to learn Objective-C. So I purchased Programming in Objective-C, and now I'm slowly learning about all of the things normal programmers have to think about that Python has kept hidden from me.

I'm told that the extra steps are part of what makes Objective-C a powerful language, that (for example) having to define the types of my variables ultimately gives me more control over the code I'm writing. It's hard to see that right now. For example, in Python I could write something like this to create an object from the class "MyClass" and assign it to the variable mySampleObject:

mySampleObject = MyClass()

So that right there creates a variable mySampleObject, and it allocates memory for the object itself, and it creates the object and initializes it with any default values the class might have laid out, and it points the mySampleObject to the instance of MyClass that it just created. It's also important to remember that mySampleObject isn't actually the object, but just a reference to it.

In fairness, there is actually a lot going on behind the scenes that Python keeps hidden from the programmer. That's really nice, until you try to learn a language that doesn't hold your hand so much.

So as I'm currently understanding Obj-C, and fair warning, this may be wrong or misleading, when I create an object in the normal way, I might say something like this:

MyClass *mySampleObject = [[MyClass alloc] init];

and this accomplishes the same thing as the Python code that precedes it. Except it's not quite the same thing. In Python, if I want to point the mySampleObject variable to something else, I can. I can even point it to a totally different type of data, like a string or an integer. Not so with Objective-C. With my code above, mySampleObject must always point to an object that is an instance of MyClass. I can't change it to a different class or make it point to an integer.
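The Python half of that comparison can be checked directly. A small sketch, where the `value` attribute and the `alias` variable are additions for illustration (the original MyClass has no stated contents):

```python
class MyClass:
    def __init__(self):
        self.value = 0  # placeholder default, made up for this example

mySampleObject = MyClass()

# A second name for the same object: both names are references, not copies.
alias = mySampleObject
alias.value = 42
assert mySampleObject.value == 42

# The variable can be re-pointed at a completely different type of data.
mySampleObject = "now I'm a string"
mySampleObject = 7

# The original object is untouched; `alias` still refers to it.
assert isinstance(alias, MyClass)
assert alias.value == 42
```

Re-pointing mySampleObject never changes the MyClass instance itself, which is the "reference, not the object" distinction from a few paragraphs up.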

So then I guess Objective-C has a datatype called "id" which can point to any kind of object. But I'm not going to learn about that until chapter 9 :(

Sunday, June 26, 2011

Well, aren't I great at blogging? Since my last post I've decided to put Guatemala on hold (and then a few weeks later found out I didn't get the grant anyway). Now I have a foot-in-the-door type job with a company that I could potentially stay with for quite some time after graduating.

I'll soon be starting my second semester as an IST major. I'm deliberately slanting my electives towards Computer Science, so I've got learning Java on the menu in the coming months. I enjoyed learning Python last semester quite a lot, and I've even been chipping away at some of the problems on Project Euler over the summer. Hopefully the transition from Python to Java is gentle enough that I don't go too crazy. Then again, it seems I'm always overestimating the difficulty of my academics. I think I have pretty good excuses though. Last semester I was taking a more difficult math class than I'd ever taken and learning my first programming language. I thought it was going to be a lot of work, so I ended up only taking 13 credit hours instead of 16. Then it turned out that only one of those classes was very challenging. This semester, I'm still only taking 4 classes, but I'm having trouble even finding a fifth class that fits my schedule and meaningfully contributes to my degree. Maybe a calculus class at Pima? Then again, I'll probably be working more hours during the school year now. Hmm...

My new job has me working from home. I've set up a nice little nook with my work computer in it, and got myself an ergonomic chair that's tailored for tall people. I've also been riding my bike regularly in a way that could legitimately be considered a workout for the first time since last year's Tour de Tucson. My goal was to make it to the top of A Mountain without stopping by the end of June, and I made it about a week ago. On Thursday I completed the loop around the summit that contains about half of the climb two or three times before descending all the way to the bottom. Here are the statistics on a ride that looped the summit twice. Now I need to find a longer ride. I've been thinking about taking a shot at Gates Pass (that data is from me driving my car around my potential route). It'd be an incredibly beautiful ride, and there's not very much traffic, but there are also some pretty long stretches where there aren't any bike lanes. That's kind of scary. Anybody know any good west-side-of-Tucson bike paths that are suitable for a road bike?