Students can now help build Project Manav - The Human Atlas

Pune

21 Sep 2020

Image credits: Project Manav’s Facebook page

In 2019, the Department of Biotechnology (DBT), Government of India, in collaboration with the Indian Institute of Science Education and Research Pune (IISER Pune), the National Centre for Cell Science, Pune (NCCS) and Persistent Systems Limited, Pune, embarked on an ambitious project. They launched Manav - The Human Atlas Initiative, the first such project in the country. It aims to create a complete map of the human body at the tissue, cellular and molecular levels using information available in scientific literature and public databases. The atlas is intended to give a holistic understanding of the human body by showing how organs and tissues behave under healthy conditions and during disease. It can help fill gaps in our knowledge about the human body and guide the search for better or more targeted medicines and therapies.

The first phase of the project is set to be completed by the end of 2021. During this phase, the team will collect information about the human skin as an organ and make this data available as a comprehensive, easy-to-look-up knowledge source. The project relies on crowdsourcing and now invites students and faculty across the country to contribute to its databases, which can benefit the scientific and medical community. Students who participate in this endeavour can develop the skill of reading scientific literature and assimilating the relevant content.

New research often builds on earlier work, but information about that work is scattered across numerous research articles and reviews published in various journals, and across databases. Marking important text in a given article is called annotation and is similar to highlighting with a pen while reading a book or an article in print. Researchers commonly use annotations to collect and connect information from the literature they read. "This allows researchers to understand and make correlations between scientific content in the papers and also helps identify prominent methods and tools used in the study," comments Nagaraj Balasubramanian. He is a professor at IISER Pune and is leading the Manav initiative at his institute.

Outreach activity held last year at the University of Pune [Image credits: Team Manav]

The students and researchers who sign up for the project are assigned scientific papers or articles (specifically focussed on skin biology in the first phase), one at a time. They read the text and add annotations, which the platform software captures and stores in a database. Researchers will then be able to access this collated information easily, saving much of the time otherwise spent collecting and connecting pieces of information from the enormous body of scientific literature. About 15,000 students, 250 faculty, 160 reviewers and 140 experts have signed up for the project.

The annotations students make will go through a two-stage review. "In addition to student contributors, we will also need a pool of reviewers and expert reviewers," notes Archana Beri, the Project Manager of Manav. "We are looking at senior PhD students, post-docs, scientists, doctors and faculty to contribute as reviewers," she adds.
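The article does not describe how the two review stages are implemented. As a rough illustration only, the sketch below models them as a small state machine in which a reviewer checks a student's annotation first and an expert makes the final call. The status names, the role names and the `review` function are all hypothetical, not part of the Manav platform.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical lifecycle states of a single annotation."""
    SUBMITTED = "submitted"  # created by a student contributor
    REVIEWED = "reviewed"    # passed the first (reviewer) stage
    ACCEPTED = "accepted"    # passed the second (expert) stage
    REJECTED = "rejected"    # turned down at either stage


# Assumed rule set: which role may move an annotation from which status
# to which new status. The real platform may work differently.
TRANSITIONS = {
    (Status.SUBMITTED, "reviewer"): {Status.REVIEWED, Status.REJECTED},
    (Status.REVIEWED, "expert"): {Status.ACCEPTED, Status.REJECTED},
}


def review(status: Status, role: str, decision: Status) -> Status:
    """Return the new status, or raise if this role cannot decide at this stage."""
    allowed = TRANSITIONS.get((status, role), set())
    if decision not in allowed:
        raise ValueError(
            f"{role!r} cannot move an annotation from "
            f"{status.value} to {decision.value}"
        )
    return decision
```

Under these assumptions, an annotation can only reach `ACCEPTED` after passing through `REVIEWED`, so an expert cannot skip the first stage.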

The platform will also let contributors label and bucket the annotated text into multiple defined categories. The text could be related to the structure of an organ, diseases that affect it, drugs which can be used to treat them, or genes and pathways associated with a cellular process. "The database of such annotated text will have information from various fields related to a specific organ or tissue. For example, when one looks for 'skin fibroblasts', they should be able to get pointers to all information related to 'skin fibroblasts'. That includes data on replication to survival and migration of fibroblasts," explains Nagaraj.
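To make the idea of labelled, searchable annotations concrete, here is a minimal sketch of how such records might be stored and looked up by term. It is not the Manav platform's actual schema: the category names are loosely taken from the examples above, and `Annotation`, `AnnotationIndex` and all field names are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical category labels, loosely based on the examples in the article.
CATEGORIES = {"structure", "disease", "drug", "gene", "pathway"}


@dataclass(frozen=True)
class Annotation:
    paper_id: str      # identifier of the annotated paper (hypothetical)
    text: str          # the highlighted passage
    terms: tuple       # entities the passage mentions, e.g. ("skin fibroblasts",)
    categories: tuple  # one or more labels drawn from CATEGORIES


class AnnotationIndex:
    """In-memory index from a lower-cased term to the annotations that mention it."""

    def __init__(self):
        self._by_term = defaultdict(list)

    def add(self, annotation):
        # Reject labels outside the defined category set.
        unknown = set(annotation.categories) - CATEGORIES
        if unknown:
            raise ValueError(f"unknown categories: {sorted(unknown)}")
        for term in annotation.terms:
            self._by_term[term.lower()].append(annotation)

    def lookup(self, term, category=None):
        """Return annotations for a term, optionally filtered by one category."""
        hits = self._by_term.get(term.lower(), [])
        if category is None:
            return list(hits)
        return [a for a in hits if category in a.categories]
```

With this layout, a call such as `index.lookup("skin fibroblasts")` would return every stored passage tagged with that term, and passing `category="drug"` would narrow the result to drug-related passages. A production system would more likely use a database or search engine than an in-memory dictionary.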

The Manav annotation platform is a customisable and reusable software solution built using open-source and in-house tools. It also has a built-in review system for manual annotation by curators. The team has carried out a proof of concept during a workshop with about 100 students, which verified the annotation guidelines, data capture and data validation. As a next step, the project plans to automate the categorisation of papers and their allocation to students.

"Making detailed and useful annotations is a lot of work and can be challenging due to the subjectivity introduced because of the diversity of expression. The Manav team is also exploring machine learning and artificial intelligence-based approaches to annotate, curate and represent the data," says Krishnasastry of NCCS. "The platform also has an option to plug in Machine Learning-based auto-annotation," adds Project Investigator Anamika Krishanpal of Persistent Systems.

One of the webinars for outreach [Image credits: Team Manav]

Project Manav conducts several outreach activities to introduce students to scientific reading and familiarise them with data science and its applications, especially in biology. "The original plan was to conduct seminars in colleges and university campuses. However, due to COVID-19, we are now conducting webinars," informs Nagaraj. The team has conducted about 70 webinars on "How to read scientific literature", attended by more than 7,000 participants so far. It has also initiated a data science webinar series conducted by scientists from across the globe, which is archived on Manav's YouTube channel and is publicly available. More than 4,000 participants across the country have attended the 12 webinars held in this series so far.

"Our current task is to reach out to the students and faculty so that they are aware of what the Manav project is aiming to do. We hope to get them to be interested and excited to be part of this national initiative," says Nagaraj. Later on, the team plans to bring academicians, researchers, clinicians and pharmacists on board to build the database further. The project team is also exploring ways to represent the collected knowledge concisely through an engaging user interface that lets users explore multiple combinations of data.