Information Age Education Blog
Big Data: A New Facet of Science Research
All of the sciences make use of theoretical, experimental, and computational approaches as they work to advance their frontiers. This IAE Blog entry discusses a fourth approach that is now becoming quite important. It is based on analyzing huge sets of data (Big Data) that are gathered through the use of computerized data acquisition devices. Quoting from Wikipedia:
Big data is an all-encompassing term for any collection of data sets so large and complex that it becomes difficult to process them using traditional data processing applications.
The challenges include analysis, capture, curation, search, sharing, storage, transfer, visualization, and privacy violations. The trend to larger data sets is due to the additional information derivable from analysis of a single large set of related data, as compared to separate smaller sets with the same total amount of data, allowing correlations to be found to "spot business trends, prevent diseases, combat crime and so on."
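The point about one large data set versus several separate smaller ones can be illustrated with a toy example. The sketch below is only an illustration under made-up assumptions: the three stores, their prices, and their weekly sales figures are all hypothetical. Each store charges a single fixed price, so no store can see any relationship between price and sales in its own records. Pooling the same nine records into one data set makes the relationship visible.

```python
from math import sqrt


def pearson(xs, ys):
    """Return Pearson's correlation coefficient, or None if either variable never varies."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None  # correlation is undefined when one variable is constant
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return cov / sqrt(var_x * var_y)


# Hypothetical records: (price charged, units sold in one week) for three stores.
stores = {
    "store_a": [(2.0, 50), (2.0, 54), (2.0, 52)],
    "store_b": [(3.0, 40), (3.0, 43), (3.0, 41)],
    "store_c": [(4.0, 30), (4.0, 33), (4.0, 31)],
}

# Analyzed separately: price is constant within each store, so nothing can be learned.
separate = {
    name: pearson([p for p, _ in rows], [s for _, s in rows])
    for name, rows in stores.items()
}

# Analyzed as a single set: the same nine records, now with variation in price.
pooled_rows = [row for rows in stores.values() for row in rows]
pooled = pearson([p for p, _ in pooled_rows], [s for _, s in pooled_rows])

print("separate:", separate)
print("pooled:", pooled)
```

In this toy data the separate analyses all come back undefined, while the pooled analysis shows a strong negative correlation between price and sales. Real data sets are far messier, but the same idea applies at scale.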
A Personal Story
The computer-oriented aspects of knowing and doing the sciences are gradually impacting the undergraduate curriculum, and still more gradually affecting precollege education. For example, a few months ago I was donating blood and struck up a conversation with a college student. He told me he was majoring in Computational Biology. This particularly impressed me because I know one of the people who helped start the International Society for Computational Biology in 1997. Quoting again from Wikipedia:
Computational biology involves the development and application of data-analytical and theoretical methods, mathematical modeling and computational simulation techniques to the study of biological, behavioral, and social systems. The field is broadly defined and includes foundations in computer science, applied mathematics, animation, statistics, biochemistry, chemistry, biophysics, molecular biology, genetics, genomics, ecology, evolution, anatomy, neuroscience, and visualization.
The history of the development of computational biology goes back to the late 1960s. It took many years for the power of computers to reach a level that would support the types of research projects that could advance the field of biology.
A Filter Down Effect
Leading edge researchers in many disciplines tend to be early adopters of computer technology as an aid to their research work. This early adoption spreads to other researchers, to the development of professional societies, to graduate programs of study, to undergraduate programs of study, and so on. Right now, each of the sciences takes three major approaches to research: theoretical, experimental, and computational (computer modeling and simulation). We now have computational biology, computational chemistry, computational earth science, computational physics, and so on.
Here is another filter down story. In the early 1970s, when I was Head of the Computer Science Department at the University of Oregon, I was disturbed by the fact that computers were not being used in the undergraduate physics courses. Note that The Journal of Computational Physics had been established in 1966. I did a survey of the Physics Department faculty, and it turned out that all of the faculty used computers and knew how to program in the FORTRAN programming language. The programming language FORTRAN, for formula translation, first became available in 1957, and was specifically designed to meet the needs of scientists and engineers.
What this told me was that by the early 1970s, the filter down process had not yet had the time needed to bring computers into the undergraduate physics curriculum at the University of Oregon. That is not surprising, because the computer facilities at the University of Oregon at that time were rather limited, and appropriate physics-oriented computing textbooks and course materials were not yet widely available.
Now We Have “Big Data” (Huge Data Sets)
I recently encountered a new report from the National Academies about future computing needs in the sciences (Committee on Future Directions for NSF Advanced Computing, 2014). The frontiers of research in the various sciences are now very computationally intense. In summary, the report says researchers are being hampered by a lack of needed computing power.
Quoting from the report:
Advanced computing, a term used in this report to include both compute- and data-intensive capabilities, is used to tackle a rapidly growing range of challenging science and engineering problems.
New knowledge and skills will be needed to effectively use these new advanced computing technologies. “Hybrid” disciplines such as computational science and data science and interdisciplinary teams may come to play an increasingly important role.
The report indicated that we are now moving toward a situation in which research in the sciences has four major approaches: theoretical, experimental, modeling/simulation, and data analysis (big data).
An example of the data analysis (big data) is provided by the Large Hadron Collider. Quoting from the linked site:
Approximately 600 million times per second, particles collide within the Large Hadron Collider (LHC). Each collision generates particles that often decay in complex ways into even more particles. Electronic circuits record the passage of each particle through a detector as a series of electronic signals, and send the data to the CERN Data Centre (DC) for digital reconstruction. The digitized summary is recorded as a "collision event". Physicists must sift through the 30 petabytes [30 times ten to the 15th bytes] or so of data produced annually to determine if the collisions have thrown up any interesting physics.
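To get a feel for the numbers in this quotation, here is a small back-of-the-envelope calculation. It is a rough sketch rather than a description of how CERN actually handles its data. It assumes a decimal petabyte (ten to the 15th bytes, as in the bracketed note) and, unrealistically, that data arrives evenly over a full calendar year. The variable names are my own.

```python
# Figures taken from the quotation above.
COLLISIONS_PER_SECOND = 600_000_000   # "approximately 600 million times per second"
ANNUAL_DATA_PETABYTES = 30            # "30 petabytes or so of data produced annually"

# Assumptions made for this sketch (not stated in the quotation).
BYTES_PER_PETABYTE = 10**15           # decimal petabyte
SECONDS_PER_YEAR = 365 * 24 * 60 * 60 # a full calendar year with no downtime

annual_bytes = ANNUAL_DATA_PETABYTES * BYTES_PER_PETABYTE

# Average rate at which data would accumulate if spread evenly over the year.
average_bytes_per_second = annual_bytes / SECONDS_PER_YEAR

# Stored bytes available per collision if every single collision were kept.
bytes_per_collision = average_bytes_per_second / COLLISIONS_PER_SECOND

# Number of one-terabyte (10**12 byte) drives needed to hold one year of data.
one_terabyte_drives = annual_bytes // 10**12

print(f"Bytes stored per year:         {annual_bytes:,}")
print(f"Average megabytes per second:  {average_bytes_per_second / 10**6:,.0f}")
print(f"Bytes per collision (if all):  {bytes_per_collision:.2f}")
print(f"One-terabyte drives per year:  {one_terabyte_drives:,}")
```

Under these assumptions the stored data works out to fewer than two bytes per collision, which suggests that only a small fraction of the collisions can be recorded in any detail. Even so, the yearly total would fill tens of thousands of one-terabyte drives.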
Big data is affecting all of us. For example, when I turn on my email, Facebook provides me information about the Facebook messages I have received. Facebook has more than a billion members, and it provides this “individualized” service to each member. At the same time, it is collecting information from me. Quoting from My Secure Cyberspace (n.d.):
Facebook collects two types of information: personal details provided by a user and usage data collected automatically as the user spends time on the Web site clicking around.
Regarding personal information, the user willfully discloses it, such as name, email address, telephone number, address, gender and schools attended, for example. Facebook may request permission to use the user's email address to send occasional notifications about the new services offered.
Facebook records Web site usage data, in terms of how users access the site, such as type of web browser they use, the user's IP address, how long they spend logged into the site, and other statistics. Facebook compiles this data to understand trends for improving the site or making marketing decisions.
I make regular use of Amazon.com to purchase eBooks and other products. This leads to my receiving “personalized” ads and requests for information from them. See the Amazon.com Privacy Notice for details.
The Facebook and Amazon.com examples given above are just the tip of the iceberg. Big data is now firmly embedded in our business world, our “spy” world, and other aspects of our everyday lives.
Example from Science Research
Storrs, C. (2/1/2015). Screening goes in silico. The Scientist. Retrieved 2/16/2015 from http://www.the-scientist.com//?articles.view/articleNo/41979/title/Screening-Goes-In-Silico/.
The expression "in silico" means "performed on computer or via computer simulation." The in silico process is now common in science research involving big data. Quoting from this article:
In the last decade, a growing number of drug discovery researchers have replaced robots and reagents in their high-throughput screens with computer modeling, relying on software to identify compounds that will bind to a protein target of interest.
Researchers often combine virtual screening with other computational tools that make predictions about the activity of individual compounds, such as how they will interact with proteins. Together, these tools help narrow down large libraries of compounds into a subset to test experimentally. The biggest compound libraries boast several million molecules, an unrealistic load for the best-equipped lab to screen the old-fashioned way.
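The narrowing-down step described in this quotation can be sketched in a few lines of code. This is a toy illustration, not the method used in the article or in any real screening software. The compound names and predicted binding scores are invented, the convention that a lower score means stronger predicted binding is an assumption, and the `shortlist` function is hypothetical.

```python
from dataclasses import dataclass
from heapq import nsmallest


@dataclass(frozen=True)
class Compound:
    name: str
    predicted_binding_energy: float  # hypothetical score; lower = stronger predicted binding


def shortlist(library, max_energy, top_n):
    """Keep compounds scoring at or below max_energy, then return the top_n best of those."""
    candidates = (c for c in library if c.predicted_binding_energy <= max_energy)
    return nsmallest(top_n, candidates, key=lambda c: c.predicted_binding_energy)


# A made-up "library" of six compounds; a real one might hold several million.
library = [
    Compound("cmpd-001", -9.2),
    Compound("cmpd-002", -4.1),
    Compound("cmpd-003", -7.8),
    Compound("cmpd-004", -6.5),
    Compound("cmpd-005", -8.4),
    Compound("cmpd-006", -3.0),
]

# Select the three most promising compounds to test experimentally in the lab.
to_test = shortlist(library, max_energy=-6.0, top_n=3)
for compound in to_test:
    print(compound.name, compound.predicted_binding_energy)
```

The computer does the inexpensive sorting and filtering of the whole library, and only the short list goes on to the costly laboratory work. In real drug discovery the hard part is producing trustworthy predicted scores, which this sketch simply takes as given.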
If a computer can solve or greatly help in solving a type of problem that is important in an academic discipline, what should students be learning about the use of computers as they study this type of problem? At the college undergraduate level, the filter down process has reached the stage that computer use is becoming a routine part of coursework.
However, such is not (yet) the case in high school and the lower grades in the K-12 curriculum. There, the growing emphasis is on the use of computers as an aid to teaching the “traditional” curriculum. I believe this use of computers is a very worthwhile endeavor. However, since students taking such computer-assisted learning courses have routine access to computers, it seems to me that using computers to help solve the types of problems they are studying, and to master the curriculum, should be a routine part of each course.
What You Can Do
Learn more about current uses of computers to help represent and solve problems in the disciplines that you teach and/or study. Use some of this “state of the art” information to liven up the courses you teach. Provide opportunities for your students to explore these modern aspects of the subjects they are studying.
References
Committee on Future Directions for NSF Advanced Computing (2014). Future directions for NSF advanced computing infrastructure to support U.S. science and engineering in 2017-2020: Interim report. National Academies Press. Retrieved 11/19/2014 from http://www.nap.edu/openbook.php?record_id=18972.
My Secure Cyberspace (n.d.). What Facebook collects and shares. Carnegie Mellon University. Retrieved 11/20/2014 from http://www.mysecurecyberspace.com/articles/features/what-facebook-collects-and-shares.html.
Readings from IAE Publications
Moursund, D. (2014). What the future is bringing us. IAE-pedia. Retrieved 11/20/2014 from http://iae-pedia.org/What_the_Future_is_Bringing_Us.
Moursund, D. (2014). What is science? IAE-pedia. Retrieved 11/20/2014 from http://iae-pedia.org/What_is_Science.
Moursund, D. (9/1/2014). A mind-expanding experience. IAE Blog. Retrieved 11/20/2014 from http://i-a-e.org/iae-blog/entry/a-mind-expanding-experience.html.
Moursund, D. (12/21/2013). Education for the future. IAE Blog. Retrieved 11/20/2014 from http://i-a-e.org/iae-blog/entry/education-for-the-future.html.
Sylwester, R., & Moursund, D., eds. (March, 2014). Understanding and mastering complexity. Eugene, OR: Information Age Education. Download the PDF file from http://i-a-e.org/downloads/doc_download/256-understanding-and-mastering-complexity.html. Download the Microsoft Word file from http://i-a-e.org/downloads/doc_download/255-understanding-and-mastering-complexity.html.