By Janaina Pereira and Tomas Kasza
“There is a sea of data and it might be useful to learn how to sail on it.”
In the past few years, we have generated a gigantic amount of data. Technologies such as next-generation sequencing, digitalization, cloud computing, and even smartphones have provided a massive amount of data. This data has become more and more accessible. If you stop and think about what you did today, you may find that you have contributed many drops into this sea of data. The picture you have posted on social media about your lunch, the review that you have written about a new favorite restaurant, your opinion about that new trending cosmetic on a survey or even the digital form that you filled out in a doctor’s appointment. All of these data points can be very useful to answer different questions, you just have to learn how to use them. For instance, the picture you posted on your social media account can be used to help identify faces or food. Data science is a field that uses specific strategies to find meaningful information in big data. As data generation has increased over the years, the need for a professional to extract meaning from data has arisen in companies across diverse industries. Therefore, data scientists have become experts with in-demand skills. To learn more about this growing field, iJOBS recently promoted a panel about this subject, where data scientists with different backgrounds discussed their career paths. [caption id="attachment_2994" align="alignnone" width="800"] Image source: https://resources.whitesourcesoftware.com/blog-whitesource/bigger-data-bigger-problems-three-major-challenges-in-big-data-security[/caption] The event started with a quick talk from each one of the speakers about their respective career paths and backgrounds. The first speaker was Dr. Ariella Sasson, a Senior Research Investigator at Bristol-Myers Squibb (BMS). She has a bachelor’s degree in mathematics and a Ph.D. in computational biology from Rutgers University. Her Ph.D. work was focused on the technical and analytical aspects of Next Generation Sequencing. At BMS, Dr. Sasson often develops pipelines and storage solutions for genomics, transcriptomics, and proteomics from preclinical and clinical datasets, work that allowed her to develop expertise in how to extract information from big data. Next was Dr. Yodit Seifu, a Senior Principal Scientist at Merck, who holds a Ph.D. in statistics from the University of Toronto. She started working on oncology in the pharmaceutical industry and over the following 19 years has built up an impressive background in data analysis for Phase I-IV clinical trials and registries. Currently, in her role at Merck, she is responsible for providing statistical support to the Safety and Risk Management group. [caption id="attachment_2996" align="alignnone" width="700"] Image source: https://bvijtech.com/big-data-future-of-big-data/[/caption] Then we heard from Dr. Matthew Koh who works at Bloomberg as a Machine Learning Engineer. He holds a Ph.D. in neuroscience from Cold Spring Harbor Laboratory. Before changing careers, Dr. Koh participated in the Insight Data Science Program in 2017, which helped him to achieve his position at Bloomberg. Finally, we heard from Dr. Alexander Izaguirre, a Chief Data Officer and Sr. Assistant Vice President at New York City Health and Hospital. Dr. Izaguirre holds a Ph.D. in viral immunology from Rutgers University and started his career in academia as an assistant professor at Robert Wood Johnson Medical School. Later on, he transitioned to an IT leadership position at New Jersey Medical School and Executive Director of the Office of Information Technology at Rutgers University. In 2013, Dr. Izaguirre founded his own start-up called “Aprenda Systems,” which is focused on solving data challenges between payers, providers, and hospitals through the use of big data. Throughout his career, Dr. Izaguirre has received many awards on technology and innovation. Data, the next Frontier In the second part of the event, we listened to how a diverse set of panelists had found themselves at the forefront of the data frontier. Several of the panelists had obtained their Ph.D. before the data revolution had been kicked off. Now they find themselves leading teams to try and extract meaning from the seemingly endless and vast data sets that are being generated. Most of the panelists had graduated a decade or more before this panel occurred, so they described their current management and hiring problems. Shockingly, the panelists reiterated how coding experience, while appreciated, was not required on an application. Employers are in search of passionate employees who have a desire to seek out training themselves. There was a noticeable sigh of relief from the audience when the panelists said how programming experience was not required. Many of the audience members were afraid that coding would be a prerequisite to applying for a job as a data scientist. The panelists explained that while employees could be taught the necessary programming skills, those same employees could not be taught how to be passionate about data and data analysis. The panelists did mention, however, that there is a coding interview for some jobs, but the test can easily be passed with basic programming training. The panelists also mentioned a website called HackerRank, which posts several common coding interview questions to help interviewees get practice for the coding interview. Potential Training resources I have found, as the panelists described, that there are many online resources available to help train and familiarize anyone who would like to learn to code. For those that would like to get started in data science before applying, the panelists suggested visiting several different online teaching websites including coursera, EdX, and datacamp. In addition, they also suggested completing projects, basically taking a data set and extracting meaning from it. The panelists said that completing practice projects works best when you experiment around and challenge the practice data set with insightful questions. The main two programming languages discussed were python and R. Dr. Matthew Koh described how he uses python almost exclusively at Bloomberg, whereas the other panelists used R. The software environment R has useful libraries with complex built-in functions, so you do not have to be an expert code writer to solve data science tasks! Specific questions: After the panel discussion, we broke into small groups where we had the opportunity to ask questions to each of the panelists individually. These are questions that were asked to Dr. Koh and Dr. Izaguirre. Q (Audience): Is there any computational model-building in industry data science jobs? A (Dr. Matthew Koh): There is very little in biomedical sciences, but there is some in the finance industry. Computational models are less common within biomedical sciences because they are not applicable yet to any model systems whereas building computational models is applicable to financial markets. Q (Audience): What are the hours of a typical data scientist? A (Dr. Alexander Izaguirre): That depends on the boss or who you work for. For some bosses as long as you get the work done on time you can show up whenever. For others, it seemed like 9-5 was mandatory. There also seemed to be a lot of meetings to go to but that’s typical for any job. Are you a potential data scientist? From the panelist's comments, data science careers are plentiful, and employers are hiring inquisitive minds to work on extracting meaning from large data sets. The panelists were handing out business cards and clearly looking for potential employees. There is no denying that a fresh Ph.D. who enters data science will make much more than other potential industry jobs. Our panelists came from a diverse set of backgrounds that did not necessarily include computational training; this suggests that there are diverse paths leading to a career in data science. Read some of our other blog posts about data science to find out if it is the right field for you! Junior Editor: Brianna Alexander Senior Editor: Monal Mehta and Tomas Kasza