Saturday, May 4, 2013

The Trouble with Data Mining


I don’t know what the right way to go about this would have been, but the way the professor chose was definitely one of the wrong ones.

The class was data mining. It’s my second-to-last class in the MSIT program and it’s the first one that has required any coding. The syllabus for the class said that programming experience was a prerequisite, but it is not the first syllabus in our course of study to say this. To us it doesn’t really matter what the prerequisites are; we take the courses as they happen. It’s not as if, knowing the prerequisites for this course, any of us would eschew the class (which won’t come around for another 6 months to a year) so that we could take an intro to programming at a community college.

Having prerequisites outside the program’s course of study is impractical, like having Wonderland Station on the Blue Line as a prerequisite to get off at Harvard Square on the Red Line*. Which is not to say that the professor was wrong to expect that graduate students in an IT program would be able to write programs, just that he went about it the wrong way.

Our data mining professor is a tall, cheerful Indian man of about my age (perhaps he is younger). He is teaching many courses concurrently and so asked us to set the course title in the subject line of any e-mail we sent him. On the first night he asked a version of the standard MSIT professor’s questions of us all—What’s your name, where do you work, what did you get your undergrad degree in and what is your programming language experience?

A third of us (including me) have had no programming experience within the last decade (or none at all). In spite of this, the professor insisted that programming was an important part of data mining and so we would all have to learn a programming language. On the spur of the moment he decided we should all learn Perl. He recommended getting the O’Reilly books.

Our classes are 6 weeks long (5 weeks really, since there is usually a Monday holiday during any given 6-week period). The idea of learning a programming language along with the course material (which was rather math-heavy) was daunting. At the second class the professor announced that he was giving us a homework assignment to work on over the next few weeks, including a few Perl programs we would have to write. The third week of class was the week of Marathon day so there would be no class. I could fit all the Perl he'd taught us by the end of the second week on my thumbnail.

I spent most of the 2013 Marathon weekend trying to learn Perl. To do the homework we had to install Perl and Perl DBI and get them to work (or work on a command-line Linux server in the cloud) and then write Perl scripts that would load a bunch of data from a CSV file into a SQL database, pull it out again, and report statistics on it (max, min, StDev, etc.). The second of the two programs was supposed to draw a graph of the data.
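
For a sense of the shape of that task, here is a minimal sketch in Python rather than Perl, using only the standard-library csv, sqlite3, and statistics modules. The table name, column name, and sample values are made up for illustration; this is not the assignment's actual data, nor the code anyone in the class wrote, and it leaves out the graphing part entirely.

```python
import csv
import io
import sqlite3
import statistics

# Hypothetical sample data standing in for the assignment's CSV file.
SAMPLE_CSV = "reading\n2\n4\n4\n4\n5\n5\n7\n9\n"


def load_and_summarize(csv_text):
    """Load one numeric CSV column into SQLite, read it back, and summarize it."""
    rows = csv.DictReader(io.StringIO(csv_text))

    # A throwaway in-memory database; the table and column names are assumptions.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (value REAL)")
    conn.executemany(
        "INSERT INTO readings (value) VALUES (?)",
        [(float(row["reading"]),) for row in rows],
    )

    # Pull the data back out of the database before computing anything.
    values = [v for (v,) in conn.execute("SELECT value FROM readings")]
    conn.close()

    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "stdev": statistics.pstdev(values),  # population standard deviation
    }


summary = load_and_summarize(SAMPLE_CSV)
print(summary)
```

The round trip through the database is the point of the exercise; the statistics themselves are one-liners once the values are back in a list.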

This seemed impossible. And that was depressing. This was a graduate level course. I am a professional geek—shouldn’t I be able to rise to this challenge? Haven’t I faced more daunting tasks—situations where I had no idea how to get from A to B and somehow succeeded anyways? Both professionally and academically?  Maybe I wasn’t as smart as I thought I was. Didn’t I want to learn Perl and Perl DBI? Of course I did! So what was I complaining about?

Worse, the Saturday after the professor gave us the assignment, my study-buddy sent me an e-mail titled “Don’t procrastinate on the homework.” Shit oh dear. He’d already spent 7 hours on the programming portion and wasn’t done. I had progressed from “Hello World!” to a few simple scripts that did math, but nothing that approached the complexity of the homework assignment. If it took this guy 7 hours I was toast. And if I was toast, what was the rest of the class? That dry piece of bread that caught fire when I stuck it in the microwave?

I admit I was enjoying working my way through the first few chapters of the O'Reilly Perl book, but I felt like I was playing around in the shallow end of the pool and I was not sure I was going to be able to swim in the deep end by next weekend.  There was only one problem in the problem set that involved Perl. I think in the end some of my colleagues gave up and just didn’t do that problem. I never considered that option because I figured this was not the last Perl problem we were going to be assigned. Also, there was my pride. 

It became apparent that most of my cohort had not started the homework on the first weekend, or at any rate hadn’t looked at the Perl portion. E-mail about how hard this was flew back and forth among all members of the cohort. Finally the professor sent out a note reiterating that “given that we are in an IT Program, and in a Data Mining course, it is reasonable to expect that we will implement Data Mining in code,” and adding, “Inevitably, some students will end up having a hard time, others will have an easy time, and some will find it to be just right.” I found this statement to be callous. I don't think he had any idea how hard we were finding it to make our way into the problem set he had assigned us.

But the real show stopper was that he had cleared all of this with the dean. This meant that none of us could appeal to said dean about the difficulty of the homework. 

My problem with the data mining professor was that he decided we should all go off and learn a language on our own, and then do all the data mining coursework on top of that. I will admit that the Endicott program has spoon-fed the MSIT cohort to a certain extent, so it took me a while to decide if I was upset because this professor was asking something unreasonable or if I was upset because he was asking us to just work a little harder than normal. I decided he was asking us to do something unreasonable (given the 6-week course length).

I e-mailed the dean and told him that in my opinion, if the MSIT students were expected to program it would be helpful if the course of study included a class in programming.

The dean responded by saying that the last time he had tried that the course had been a disaster. I mentioned this to a study buddy of mine who pointed out that it would be ridiculous to have a $17XX course teaching what anyone could pick up at a community college for $3XX—how much programming can you teach in 6 weeks anyways?  I admitted he had a point, but what is the right solution then—warn MSIT applicants that they will be required to code?

Meanwhile I miraculously found my way into the homework assignments. With a lot of luck (and a while loop that the professor gave me) I went from staring at Perl forums online and whimpering to writing two programs that compiled and ran. They didn't do everything they were supposed to do and they were not elegant, but they moved me from the side of people who couldn't figure the assignment out to the side of people who could. Suddenly I didn't hate the professor so much.

By then the professor had grasped that he was asking us to do something a little out of the ordinary. He offered a tutorial before class, starting at 4:30. Most of us were there for it. He then spent most of the class teaching Perl, which was great except that we didn’t learn any data mining.

In the parking lot after class some of us discussed the situation. We felt a little bad for the professor, but most of our sympathy was reserved for ourselves. “You can’t just read Perl code to people after 9 PM and expect them to get anything out of it,” I opined.
“Remember what the dean said about the last time they included a programming course?”
 “It was a disaster!” we all said in chorus.  

Over the course of the next week it became clear that we had convinced our Data Mining professor that we really couldn’t write Perl code (or at least that he was not the man to teach it to us). The professor sent out grades for the Perl assignment (graded on a very lenient curve) and a new assignment with no Perl in it. He also sent out a final project assignment: a research paper on data mining (initially it was supposed to involve getting a data set and doing some data mining). We had broken his spirit. I am sorry anyone had to get his/her spirit broken, but better him than us.