Gursimran Ahuja, IIM A, CAT 2015 – 99.83
iQuanta CAT 15 student (originally shared on Quora)
Location: Taj Chandigarh, March 5, 2016
Profile: Xth: 95%; XIIth: 97.4%; Grad: 8.56; CAT: 99.83 (first attempt)
WAT Topic: A case to evaluate the correctness of a decision taken by the LEGO board to fire the Chief Operations Officer for introducing a product based on the Harry Potter theme that was not really in line with what the company had been doing all these years.
Duration: Approximately 40 minutes
I was second in my panel, meaning that I got just enough time to straighten myself after the WAT before I was called in. There were two panelists; let’s say P1 and P2.
P1: So, you are a final year IDD student. Surely you must have a project you are working on?
I: Yes Sir, I am working on a project that aims to increase the robustness of the speech recognition systems that we use in our day-to-day life.
P1: And how do you plan on doing that?
I: Most of the systems presently in use, like S-Voice, Siri etc. falter a lot when it comes to recognizing speech in noisy conditions. I, for one, can never get my S-Voice to work on a train. So, we try to incorporate the video component into the audio one so that both can be used to get a better outcome. My research, in particular, is based on reducing the errors that changes in a person’s pose bring about to the video component.
P1: How much difference do you think it makes? Introducing the video as well? Is it worth the costs involved?
I: Sir, it does make a lot of difference, especially when it comes to speech recognition in noisy conditions. Technically speaking, introducing video increases the signal strength by somewhere around 8-10 dB, which is a huge difference.
P1: So, if I am sitting facing my webcam, how will your system recognize what I say? Do you take the whole face?
I: No sir, we basically just consider the lip and jaw region to study the spoken word. The first step in the design of any such software is to segregate the lip region and construct a model out of the features. Most of the time, it is a Hidden Markov Model with a probabilistic outcome determination using the Viterbi algorithm.
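(A note for readers unfamiliar with the Viterbi algorithm mentioned in this answer: the block below is a minimal sketch of Viterbi decoding for a discrete Hidden Markov Model. It is not the code from the project. The state names, observation symbols, and all probabilities are made-up assumptions chosen only to keep the example small.)

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most likely state path, its probability) for a discrete HMM."""
    # best[t][s]: probability of the most likely path that ends in state s at time t
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    # back[t][s]: the previous state on that most likely path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prob = max(
                ((p, best[t - 1][p] * trans_p[p][s]) for p in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = prob * emit_p[s][observations[t]]
            back[t][s] = prev
    # Trace back from the most probable final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: hidden lip states and a coarse "mouth opening" feature.
states = ["closed", "open"]
start_p = {"closed": 0.6, "open": 0.4}
trans_p = {
    "closed": {"closed": 0.7, "open": 0.3},
    "open": {"closed": 0.4, "open": 0.6},
}
emit_p = {
    "closed": {"small": 0.9, "large": 0.1},
    "open": {"small": 0.2, "large": 0.8},
}
path, prob = viterbi(["small", "large", "large"], states, start_p, trans_p, emit_p)
```

(In a real recognizer the observations would be continuous lip-feature vectors and the computation would normally be done in log-probabilities to avoid underflow; both are left out here for brevity.)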
P2 (who had just been listening all this while): But how do you plan on handling that pose thing?
I: (frankly, I hadn’t done this part of the project yet, so I knew just some theoretical background and nothing else) Sir, most of the research till now has just solved this problem by means of a regression matrix wherein the side pose is interpolated onto the frontal one to normalize the views and cover up on the missing links. (I still don’t know why I said “interpolated”)
P2: Interpolated? (Passing a paper to me) Can you explain the mathematics?
I: (should have said I don’t know much about it, decided to go ahead with what I knew anyway, probably a low point in my interview) Sir, for instance, if the frontal region of the lip is given and the side pose, we construct two matrices based on the appearance-based models that we draw out of the lips and then we have to find out the regression matrix averaged over a series of subjects. (drew two lips on the paper and wrote the matrix equation)
P2: What is in these matrices?
I: Sir, it is the pixel values for the seven strategic points on the lips that we use to demarcate the boundaries.
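(A note on the mathematics: one common way to write such a pose regression is as a linear least-squares problem. The formulation below is a sketch under assumed notation, not necessarily the equation written on the paper in the interview. Here the columns of $F$ hold the frontal-view lip features and the columns of $S$ hold the matching side-view features, one column per training sample pooled across subjects, and $W$ is the regression matrix. The closed form assumes $S S^{\top}$ is invertible.)

```latex
\[
  W^{*} \;=\; \arg\min_{W} \,\lVert F - W S \rVert_{F}^{2}
  \;=\; F S^{\top} \left( S S^{\top} \right)^{-1},
  \qquad
  \hat{f} \;=\; W^{*} s
\]
```

(Given a new side-view feature vector $s$, the estimate $\hat{f}$ is its predicted frontal-view counterpart, which can then be passed to a recognizer trained on frontal views.)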
P2: I don’t get it. Okay, you have a matrix, you have all these equations and models. But, how are you going to get the pixels?
I: Sir, we have algorithms to mark the boundaries of the lips, including the crossbow structure and extremities. I have used one of those algorithms, which uses hue and luminance to find the lip boundaries.
P2: But lips are so different across geographies. How do you plan on handling that?
I: I agree that people from different ethnicities have different lip structures and colors. But the algorithms that we use focus on the difference in the color of the skin and the lips, which works fine across the different ethnicities.
P2: Can it be used commercially? This project of yours?
I: Sir, though I am now working on a limited-vocabulary system, it can of course be scaled up to create a vast database across different phones and, consequently, words as well.
P2: You mean if I say “apple”, your software would not even recognize it? What good is it then?
I: Sir, every piece of research begins with the most basic problems that need to be solved in order to create a system that is free from every kind of error that an end user may experience. And, in any case, this project, in its present state, can be scaled up to incorporate different phoneme combinations.
P1: You know, I still didn’t get that equation of yours? Explain again.
I: (not this again!! I explained it the same way as before, but I think that he saw through the fact that my knowledge on this part was limited)
P1: Have you actually implemented this pose problem?
I: No Sir, I still have not. I know the theory and the models on which it works, but I have not implemented it in code form yet.
P1: Okay. Tell me what else do you do?
I: (Phew!!) Sir, I have been a freelance writer for a US-based virtual assistance firm since 2014.
P1: What do you write?
I: Initially, I did projects for the firm itself, based on medical virtual assistance, virtual offices etc. But now, in addition to that, I have also started writing health and lifestyle blogs for a client of theirs.
P1: Do they pay you? How much?
I: Yes Sir, they do.
P1: And what else do you like to do when you are free?
I: Sir, I am a very avid reader. And by that I mean I like to read everything, right from something like “Confessions of a Shopaholic” to something along the lines of “Fortune at the Bottom of the Pyramid”. And I am a Kathak dancer as well.
P1: Can you tell me how many gharanas are there in Kathak? And who are their major exponents?
P1: What is the difference between these gharanas?
P1: What is the latest book that you have read?
I: Sir, I have read “ISIS: Inside the Army of Terror” by Michael Weiss and Hassan Hassan.
P1: What is the ideology of ISIS?
I: (Described Salafism and other aspects related to the group’s misinterpretation of the Quran.)
P2: But why did terrorism become so big only after 1990? Muslims were there. The Quran was there. Why now?
I: Sir, the spirit and thought process in which terrorism finds its roots has been around since time immemorial. But, in its present form and ideology, I think it had a lot to do with the radical literary interpretations of the Quran around 1990. (Mentioned Syed Qutb and his works)
P1: Do you think producer or consumer societies are the major victims of terrorism?
I: I think consumer societies are not just victims, but they contribute to the perpetration of terrorism as well. (Gave the example of Pakistan and its tryst with terrorism in the wake of a floundering economy)
P1: (to P2) Anything else?
P2: No. Just take a candy from the bowl. Thank you.
I: Thanks a lot.
All in all, I feel I could have done better, especially in the academics part. But all is well that ends well. Looking forward to life at IIMA.