When I shop online these days, my eyes keep hovering over the “You may also like” section on the website. It is like shopping with a best friend who knows exactly what you are looking for. I used to wonder how these websites always know what I might like. Technically speaking, this best friend is a recommendation engine. Based on my activity on the site, it learns my needs and suggests products that may interest me, drawing on the choices that similar users have made in the past. Tools like these help industries serve their customers better and achieve high customer satisfaction. The field of study behind them is machine learning.
Amidst the lush green 400-acre campus of IISc sits the modest Department of Computer Science and Automation, which has a rich history in 'pattern recognition': the science of finding regularities in datasets. A few labs in the department work on a related and exciting branch of computer science called 'machine learning': the science of 'teaching' machines to recognise patterns in data and learn from them. Prof Chiranjib Bhattacharyya's 'Machine Learning Lab' is one among them.
Professor Chiranjib Bhattacharyya (fondly known as Prof Chiru) joined IISc in 2002. He had earlier worked as a postdoctoral fellow at the University of California, Berkeley, after completing his PhD at the Indian Institute of Science, Bangalore, in 2000. He founded the Machine Learning Lab in 2003 with the aim of tackling practical problems that require deep thinking and push theoretical boundaries. A number of students who did their masters and PhD projects in the lab have continued their education at world-renowned universities abroad, and some of them have taken up academic positions in elite institutions like the IITs. The lab also works closely with industry to solve problems that are relevant to it.
In what can be seen as a testament to the quality of training that students receive in the lab and the department, a group of masters students won the prestigious KDD Cup (Knowledge Discovery and Data Mining Competition) in 2006. The students developed an algorithm to analyse a large number of Computed Tomography Angiography images and detect 'pulmonary embolism', a condition that occurs when an artery in the lung is blocked. Manual reading of these image slices is laborious, time-consuming and complicated. The lab is proud to call this a student-driven, high-quality piece of engineering work. It is one example of the lab's belief in the true spirit of engineering: solving a practical problem at hand with the theoretical concepts learnt from books.
The backbone of most of the lab's work lies in three key foundational areas: statistics, optimization and algorithms. Although the lab works in two broad areas of machine learning, 'unsupervised learning' and 'supervised learning', the approach is the same: build empirical models that help computers recognise patterns and learn from them. The lab's work has been recognised not only through the quality of the students it produces, but also at international conferences. Work presented by lab members won the Best Paper Award at the prestigious SIAM Data Mining Conference in 2011 and the Best Paper Runner-Up award at PAKDD (Pacific Asia Knowledge Discovery and Data Mining) in 2009.
Today, we live in the midst of a vast consumer market. There are tons of products available in different configurations and produced by different vendors. Imagine that you decide to buy a DSLR camera one day. Being a novice in photography, you browse an online shopping catalogue for options. As you read the product descriptions and reviews, you start to learn the different attributes you need to look out for in any DSLR camera you encounter: for example, ISO, weight, battery life and screen resolution. From then on, you compare DSLR cameras based on these attributes. You also browse the user reviews to understand what current users think about these attributes. You finally pick the camera that is closest to your needs. Now, imagine building this intelligence into a machine. The entire task that ran in your brain will now be done by the machine. It works out the probability that you may like a particular product and lists the highest-ranked ones for your selection. The lab strives to derive the most efficient way of solving such problems, which fall under an area called “Unsupervised Learning”.
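The final step described above, scoring each product and listing the highest-ranked ones, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the lab's method: the camera names, the attribute values (scaled to 0..1, higher meaning better for the buyer) and the preference weights are all hypothetical and hand-picked, whereas a real system would learn them from data such as reviews and browsing history.

```python
import math

# Hypothetical catalogue: each attribute is scaled to 0..1, higher is better.
CAMERAS = {
    "camera_a": {"iso": 0.9, "weight": 0.4, "battery_life": 0.7, "screen_resolution": 0.8},
    "camera_b": {"iso": 0.6, "weight": 0.9, "battery_life": 0.5, "screen_resolution": 0.6},
    "camera_c": {"iso": 0.3, "weight": 0.6, "battery_life": 0.9, "screen_resolution": 0.4},
}


def like_probability(attributes, weights, bias=-1.0):
    """Squash a weighted sum of attribute scores into a value between 0 and 1."""
    score = bias + sum(weights[name] * value for name, value in attributes.items())
    return 1.0 / (1.0 + math.exp(-score))


def rank_products(catalogue, weights, top_k=2):
    """Return the top_k product names, highest estimated probability first."""
    scored = [(like_probability(attrs, weights), name) for name, attrs in catalogue.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Hypothetical buyer who cares most about ISO and battery life.
weights = {"iso": 2.0, "weight": 0.5, "battery_life": 1.5, "screen_resolution": 0.5}
top_two = rank_products(CAMERAS, weights)
print(top_two)
```

Changing the weights changes the ranking, which is the point: the hard part, and the part where learning comes in, is estimating what a given buyer cares about without being told.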
Imagine teaching a young child that a particular fruit is an "Apple". We go to a market with the child, point to the fruit and say that it is an "Apple". After we do this several times, the child usually learns to identify the fruit correctly. This is an instance of "Supervised Learning", where in the training phase we provide the supervisory signal by naming the fruit. "Supervised Learning" is one of the fundamental paradigms in machine learning, in which observations are provided with supervisory labels during training. The goal is to build a machine which, when presented with a new observation, can correctly predict its label. Supervised learning has been used extensively and remains an active area of research. Recently, the lab studied the question of noisy observations in supervised learning, an important open problem with many applications. There is a need for a robust predictor that works efficiently irrespective of the noise in the data. The work was motivated by a problem in computational biology, where the goal is to derive a learning machine that can correctly predict the class of a protein structure. Protein structures are often obtained by X-ray crystallography, and the resulting measurements tend to be noisy. The lab collaborates with the Technion, the Israel Institute of Technology, to develop predictors that are robust to noise using the principles of robust optimization. At its core, this is a biologist's problem, but the lab is working on ways in which technology can help solve it efficiently.
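The training and prediction phases described above can be illustrated with a very small classifier. The sketch below is a nearest-centroid classifier, chosen only because it is short; it is not the robust-optimization approach the lab works on. The two features (redness and roundness, each on a 0..1 scale) and all the data points are hypothetical.

```python
import math
from collections import defaultdict


def train(observations):
    """Training phase: average the feature vectors seen for each label."""
    sums = defaultdict(lambda: [0.0, 0.0])
    counts = defaultdict(int)
    for (redness, roundness), label in observations:
        sums[label][0] += redness
        sums[label][1] += roundness
        counts[label] += 1
    return {label: (s[0] / counts[label], s[1] / counts[label]) for label, s in sums.items()}


def predict(centroids, features):
    """Prediction phase: return the label whose centroid is closest to the new observation."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


# Hypothetical labelled examples: ((redness, roundness), label).
training_data = [
    ((0.90, 0.80), "apple"),
    ((0.80, 0.90), "apple"),
    ((0.85, 0.85), "apple"),
    ((0.10, 0.20), "banana"),
    ((0.20, 0.10), "banana"),
    ((0.15, 0.20), "banana"),
]

centroids = train(training_data)
print(predict(centroids, (0.70, 0.75)))  # a new fruit the classifier has not seen
```

Each labelled pair plays the role of pointing at a fruit and naming it. Small measurement errors in the features shift the centroids only slightly, but this simple averaging offers no guarantee against noise, which is the gap that robust predictors aim to close.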
Most of us use social media in different forms. Suppose you are an avid Facebook user with a set of friends in your friends list. If we plot these relationships as a graph, you and all your friends are nodes in that graph, and an edge exists between you and each person in your friends list. Your mutual friends have edges between themselves. Now, try to imagine such a graph for all the Facebook users in the entire world. In it we would see dense subgraphs, where many people are connected with each other. These can be called communities. Identifying such communities helps in recommending the right friends to a user, based on the community they belong to. To do this, a famous concept from graph theory can be used: the Lovász theta function. Here, the function is used to find dense subgraphs in a graph. But what if you want to link to a friend who is not on Facebook but is on Twitter? This requires the algorithm to work on multiple graphs. Unfortunately, the Lovász theta function does not work on multiple graphs, and performance suffers as the graph grows larger. This is where machine learning comes to its aid. The lab has collaborated with the Idiap Research Institute, Switzerland, to work on such large-scale optimization problems.
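The friendship graph described above is easy to build on a small scale. The sketch below does not compute the Lovász theta function, which requires solving a semidefinite program; as a simpler stand-in, it measures how dense a group of users is and suggests friends by counting mutual friends. The user names and friendships are hypothetical.

```python
from itertools import combinations

# Hypothetical friendships: each pair is an undirected edge in the graph.
FRIENDSHIPS = [
    ("asha", "bala"),
    ("asha", "chitra"),
    ("bala", "chitra"),
    ("chitra", "dev"),
    ("dev", "esha"),
]


def build_graph(edges):
    """Store the graph as a mapping from each user to the set of their friends."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def density(graph, members):
    """Fraction of all possible friendships inside the group that actually exist."""
    members = list(members)
    possible = len(members) * (len(members) - 1) // 2
    if possible == 0:
        return 0.0
    present = sum(1 for a, b in combinations(members, 2) if b in graph[a])
    return present / possible


def suggest_friends(graph, user):
    """Rank non-friends by how many mutual friends they share with the user."""
    counts = {}
    for friend in graph[user]:
        for candidate in graph[friend]:
            if candidate != user and candidate not in graph[user]:
                counts[candidate] = counts.get(candidate, 0) + 1
    return sorted(counts, key=lambda name: (-counts[name], name))


graph = build_graph(FRIENDSHIPS)
print(density(graph, ["asha", "bala", "chitra"]))  # a tightly knit group
print(suggest_friends(graph, "asha"))
```

Checking every possible group this way becomes infeasible on a graph with billions of users, which is why scalable optimization techniques matter for this problem.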
Today, thanks to advances in technology, high volumes of data are readily available. Gathering enough data was a challenge in the past, and it has been overcome successfully. As more of the world's entities are connected to the internet, more data is generated and stored every day. Examples include smart energy meters that report electricity usage at your house, wearable sensor systems that can remotely read a patient's biometrics, and intelligent wireless sensors that assess your plants' needs. Correlations in this data can reveal unforeseen insights about the underlying systems. The lab is well poised to work on such technologies in the future.
This is just a glimpse of the interesting real-world problems that the lab is trying to solve. The lab holds joint patents with companies for solving key problems in areas such as review mining, storage workloads and handwriting recognition. With machine learning touching every aspect of our lives, a lab like this provides a great platform for students to experiment with their ideas for the betterment of society. As the lab prepares itself to tackle real-world problems, the future truly lies in nurturing student startups.
Prof Chiranjib Bhattacharyya, Convener of the Machine Learning Lab, can be contacted at 91-80-2293-2468.