Diving Into the Delights of Data Science
Allan Miller has had a long career in data science and analytics. Over that time, he has seen the field grow from an offshoot of statistics into a discipline of its own.
While working his way through graduate studies in economics at UC Riverside, Allan got the chance to work on an early computer system: an Altos ACS 8000 with 64K of memory to be shared among all its users! That early computing experience gave him a front seat to the development of statistical tools and languages that would coalesce into the field of data science.
From that experience came a data science career, with some of his latest highlights below:
- He was the first analytics specialist at Zendesk. (2010–2012)
- He was senior data scientist at Hipmunk. (2014–2015)
- He recently was a senior data scientist at LinkedIn Learning. (2015)
- He has taught programming and machine learning courses with us for the past 12 years. (2009–current)
- He helped create our Certificate Program in Data Science and still sits on our Advisory Board.
“People who go into this field should have some passion for working with data,” Allan shares. “I feel very fortunate that it was sort of the first thing that I learned. And then, over the years I've worked with so many different types of data—from health care to marketing to clinical studies to genomics data. If you have an interest in a particular domain, that's one thing that can really attract you to working with data.”
As I chat with Allan about his career and how other burgeoning professionals can jump-start their data science endeavors, I can’t help but catch his enthusiasm for this rapidly growing field. The Bureau of Labor Statistics states, “Employment of computer and information research scientists is projected to grow 15 percent from 2019 to 2029, much faster than the average for all occupations.”
“I think a lot of people are attracted to this field because it's such a hot career area right now,” Allan says. “There's such a huge demand for it at the level of business analyst, data analyst and data scientist, so that's certainly a good thing. There's nothing wrong with getting a skill that you can earn a living on.”
Broad Opportunities in Data
Beyond that data point, Allan gives you the sense that there are all sorts of opportunities for data scientists, ranging from social science to pure theory.
Allan refers to the field as a “spectrum,” morphing from business analysis to data analysis to data science.
One End of the Spectrum, Business Analysis
“These are separate compartments that kind of spill over into each other,” he explains. “I would put on one end what you might call a business analyst. A business analyst is somebody who's very rooted in the domain in which they're working, such as marketing or sales. Typically, they use a lot of GUI tools like Microsoft Excel or Tableau. They're doing a lot of descriptive reporting and analysis, working with the data that people give them.”
Continuing Through to Data Analysis
“And then,” he continues, “you have what I would call a data analyst. They would be doing things that might involve just a little bit more in terms of data acquisition. For example, they might be accessing databases and actually running queries to pull data sets down.
“They would still probably be using a lot of Excel. They're also doing that descriptive reporting, but adding data visualization and the communication of results. But they might be doing just a little bit more in terms of preparing the data for analysis. And then they actually start to get into some basic inferential analysis, inferential statistics and predictive analytics (looking at the statistical differences between subgroups of their market).”
Arriving at Data Science
Allan continues his explanation of the spectrum: “Then you take the next step over to what I would call data science. All of those previous aspects are kind of wrapped up quite a bit more, so the data acquisition and preparation part can actually get to be quite complicated, both in terms of the complexity of the data and the types of transformations that have to be made on the data.
“On the data science side, you’re doing a lot more sophisticated predictive analytics using more complex predictive analytics algorithms that don't have the kind of restrictive assumptions behind them that linear regression has, for example. The data starts to get much more open in terms of the complexity of getting the data, the larger scale of the data, and then the type of predictive analytics and visualizations that you do.”
Your Entry Point for Data Science
“Transitioning into data analysis is actually a good stepping stone for people who have degrees in other fields,” Allan explains. “I'm somebody who's like that, because my degree is in economics and I caught the computer bug early on doing data analysis. I went back and did a master's degree in computer science from Mills College because I was so interested in the application of computers to data analysis.
“There really weren't these kinds of programs back then, and the UC Berkeley Extension certificate is actually quite unique because it is designed for people who have, for example, a marketing background or a clinical one to succeed in data science.
“In one of my classes right now,” he continues, “I have a clinical data analyst at UCSF who's working in clinical trials. I also have a petrochemical engineer in that class. I’ve had Stanford Medical School professors in my class who are doing research projects. It's a good vehicle for people who have the education and maturity and experience to move into the field—an especially good first one, I think.”
Extensible to Non-Tech Interests
It’s not just “propellerheads” or techies who will find value in the certificate program. Allan himself is a fan of social science, and is working with a grad student in archeology who is performing petroglyph analysis.
“She is doing rock art analysis at a world heritage site in Nicaragua of petroglyphs from the pre-Columbian era,” Allan enthuses. “She's trying to use statistics to try to tie it back to some of the ethno-anthropology that people have studied in terms of tribes moving through the region.
“And so to me, this is like, ‘Wow!’ What a fantastic opportunity to learn about rock art. I'm working with this really interesting material.”
Really an R Person
Of course, you still have to use some pretty sophisticated tools and programs to properly analyze data. That’s where our Data Science certificate comes in.
One of the courses that Allan teaches is Introduction to R: Data Exploration and Visualization. He sees the open-source R programming language as fundamental to the pursuit of data analytics and science.
In Allan’s own self-description, he calls himself “really, an R person.”
He continues our conversation with a deep dive into the ways that R is a critical tool for data scientists. “First, it's a language that grew out of statistical analysis, so there's this rich community of people who are theoretical statisticians who have developed the science of data science and created these really complex algorithms.
“Another part of it is that R is an open-source platform. There are over 11,000 software packages, and there are people doing work in everything from market analysis to genomics. A lot of this stuff is free; it's all open source and there's a real community behind it.
“Just to give you an example,” Allan shares, “I worked for a water resource management consulting firm some years back and we ran into this problem of a very proprietary, very complex and outdated data format that the Army Corps of Engineers was using to do water resource management-related analysis. Turned out that there was a group of people who had taken this data format, developed a way to import it into R and published it on the web.
“So you can understand that this science has been developed in R in many cases,” Allan says, “and it really helps to have the understanding of the language because of the manner in which it is presented, and its structure really helps you learn the material.”
Meet Up With R Enthusiasts
Allan is such a devotee of R that he helps to manage the East Bay R Language Enthusiasts Group. We had first heard of this cohort from program graduate Joseph Walker, who had raved about the connections that he made in the program and in the Meetup.
Allan himself gets a lot of satisfaction from meeting with like minds.
“We meet once a month to discuss projects and problems,” he shares. “Now we have almost 2,000 members! COVID-19 kind of slowed us down a bit, but we're actually just getting ready to start up again. It's a club of people who are sharing experiences, and you don't have to be a world-class statistician in order to participate.”