
Improving Science through Data Management and Sharing

Kathryn A. Kane
Washington State University Vancouver

Introduction

Professor Alexandra Bennett runs a typical lab at a large university that studies the surface characteristics of semiconductors. The lab's findings help industry developers build better electronic devices. Within this lab group, doctoral students are engaged in their own individual projects and are responsible for recording and archiving their results. However, there is no standardized system for digitally storing and accessing research data, largely because the researchers are unfamiliar with data storage and have different working styles. Because recordkeeping is so idiosyncratic, it is difficult for the doctoral students to share related data with one another, much less with researchers outside their institution. Professor Bennett remarks, "It scares me how much data was lost (sic) because it wasn't well organized" (Akmon, Daniels, Hedstrom, & Zimmerman, 2011, p. 337).

This is just one small example of how valuable data can be lost to both current and future researchers when there is no data management plan in place. Scientific discovery and innovation move society into the future, and it is the responsibility of researchers to use their work to advance that purpose. By effectively managing and sharing their data with the public, researchers can facilitate collaboration with their peers, thus conserving time and resources. This also leads to increased transparency and improved scientific reputations. There are some challenges facing this proposal, but with a concerted effort data management and sharing can become an integral part of the scientific culture.

Background

Data curation is a field that is growing in response to the massive amounts of information being generated by research professionals. The amount of scientific data worldwide is estimated to increase at an annual rate of thirty percent (Pryor, 2012). These data range from measurements collected during experimental procedures, to the exact location of a star, to the image of a cell. Data curation refers to the preservation of digital information, with an emphasis on adding value to the data over the long term (Harvey, 2007). Generally, it involves activities related to preserving, maintaining, archiving, and depositing data (Jahnke & Asher, 2012). Curation, according to the Digital Curation Centre's Digital Curation Manual (Harvey, 2007), includes appraisal and selection, capturing and maintaining metadata, ontology development and maintenance, and updating storage technologies.

Appraisal and selection refer to the processes by which curators determine what information is worth preserving for the long term; because resources and curatorial effort are limited, not all data that are generated can be saved. Metadata, defined as labels that describe pieces of data, add value by giving original datasets context, such as experimental conditions, equipment, or materials. This makes replication by other scientists much more manageable because they possess both the original data and its context. Ontologies, which classify the concepts within a specific field and the relationships between those terms, are useful for structuring information and assisting with its retrieval. Ontologies are often maintained by data curators to ensure consistency and operability and are updated as new information is generated. Updating storage technologies simply means keeping pace with the newest storage devices so that datasets remain accessible in the future (Harvey, 2007).
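To make the idea of metadata and ontologies more concrete, the sketch below shows, in Python, how a single experimental dataset might be described and how a field-level vocabulary could standardize its terms. The field names and ontology labels are hypothetical, invented only for illustration; real repositories define their own schemas.

```python
# Minimal, hypothetical example of a metadata record attached to a dataset.
# Field names are illustrative only; real repositories define their own schemas.
metadata = {
    "title": "Surface roughness measurements, semiconductor wafer batch 12",
    "creator": "A. Bennett Lab (hypothetical example)",
    "date_collected": "2013-02-14",
    "instrument": "atomic force microscope",                      # equipment used
    "conditions": {"temperature_C": 21.5, "humidity_pct": 40},    # experimental conditions
    "units": "nanometers",
    "related_publication": None,  # filled in once the paper appears
}

# An ontology-style vocabulary maps local terms to agreed-upon concepts,
# so that different labs describe the same thing the same way.
ontology = {
    "atomic force microscope": "instrument/scanning_probe/AFM",
    "surface roughness": "measurement/topography/roughness",
}

print(ontology[metadata["instrument"]])  # -> instrument/scanning_probe/AFM
```

Even a record this small captures the conditions, equipment, and units a later researcher would need in order to interpret or replicate the measurement.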

The most important reason to manage scientific data in this way is to enable researchers to share data easily now and to archive it for future accessibility. Many scientists are skeptical of this idea, but widespread adherence to such principles would revolutionize the way research is conducted, and in some cases it already has. In today's science, collaboration is the key to success, and managing data effectively facilitates that process by making it easier to share data with other researchers. New scientific discoveries can be encouraged by making more datasets publicly available, not just those that appear in published papers. Doing so would also lend new credibility to the conclusions being drawn by researchers and influence scientific reputations.

Responsibility for data curation activities primarily lies with the researchers collecting the data firsthand. Ideally, researchers would record, organize, and store the metadata, data, and results associated with their experiments as they are generated. Unfortunately, for many reasons, this is not the case, and important data are lost as a result. Scientific progress is often measured by the number and impact of publications rather than by the generation and sharing of information, and no formal system of incentives exists to encourage data curation, so researchers have little reason to curate their data. A lack of formal training in data curation during higher education leaves many researchers unaware of the importance of such methods and of how to implement them. Concerns about ownership, privacy, and other legal issues may make some researchers nervous about sharing their data. Finally, the relative novelty of data curation means that there is not yet a consensus on best practices and standards.

Data curators are information professionals who bridge the gap between researchers and librarians. They have the scientific background to interact with researchers on a peer-to-peer level while possessing the training to implement appropriate data management methods. For example, Melissa Haendel and Nicole Vasilevsky are scientific data curators for the eagle-i Network at Oregon Health and Science University (OHSU) in Portland, Oregon. Both began their careers as bench scientists and transitioned into roles as scientific data curators. Their background in biomedical research enabled them to understand the significance of the information they were curating and to preserve it accordingly (Haendel & Vasilevsky, personal communication, January 18, 2013).

Benefits of Data Management and Sharing

Easier Collaboration to Facilitate Research

In the past, great discoveries were made by scientists working alone; today, more and more scientists are working together, especially across disciplines. Structuring and managing data facilitates sharing and collaboration with others, and it can also lead to the detection of significant patterns and ultimately facilitate discovery. With more data available to public view, people may look at the data in different ways, notice implications or patterns that the initial researchers missed, and launch new lines of research. Allowing others to analyze datasets post-publication thus has the potential to increase the value of the data.

Researchers can also benefit from the efforts of amateur scientists, people who do not practice science professionally yet have a deep interest in one or more scientific fields. Researchers collecting large datasets, such as astronomical observations, can make them available to the public and set up a standard procedure for organizing the information. Amateur scientists around the world can then access the data and assist in organizing and categorizing them.

Galaxy Zoo is one such project; launched in 2007, it asks users to sort astronomical images taken by various telescopes, including the Sloan Digital Sky Survey. Within twenty-four hours of launching, the website was receiving 70,000 galaxy classifications per hour. For increased accuracy, multiple users examine each galaxy image. As a result of this citizen science, researchers can use these data in publications and save the time and resources it would have taken to classify the images themselves (Galaxy Zoo). It is worth mentioning that Galaxy Zoo is merely one of twelve citizen science platforms operating in "The Zooniverse," all of which invite lay users to help classify and organize massive amounts of data (Zooniverse).

Save Time and Resources

Collaboration in science is easier today than ever before, with technology linking researchers across the globe. Improved collaboration saves the time that would otherwise be spent performing the same experiments in multiple isolated labs. Instead, researchers can access previously established findings, incorporate them into their experimental design, and take that particular line of research to the next step.

Proper data management makes this even more useful to the scientific community. Inclusion of metadata, accessible formatting, appropriate citations, and relevant links ensures that whoever re-uses the data receives the full value of the work. Subsequent research based on such data would be commensurately more reliable. As a whole, the scientific community would be able to conduct research and release results at a faster pace, thus increasing the rate of scientific discovery.

Increased Transparency

The benefit of including metadata and data with published works is that other researchers can see every detail of the experiment. Typically, researchers publish only the summaries and conclusions that are directly relevant to their hypotheses; it is not yet the norm to include detailed descriptions of methods or to provide the underlying datasets with the publication. Yet what seems irrelevant at the time of publication may become important in future experiments, and the included datasets could become meaningful for comparing new data and replicating results.

There are instances of scientific misconduct in which researchers publish dishonest studies by altering or fabricating results, often because of pressure to publish. Five percent of scientists who responded to an anonymous questionnaire admitted to removing data that contradicted previous research (Weiss, 2005). The number of retracted articles is increasing each year, and the majority of retractions are due to scientific misconduct (Fang, Steen, & Casadevall, 2012). Such actions can have serious consequences, such as misdirecting future research and thus wasting time and resources. Researchers who make their metadata public along with their papers allow peers to analyze the results closely, increasing confidence in the quality of the published work.

Improved Evaluation of Reputation

In academia, the number of citations one's publications garner is a major factor in reputation and credibility. The most reputable researchers will turn out a large number of published papers that prove relevant and reliable enough for their peers to apply to their own work. Therefore, a large number of citations are indicative of trustworthy, well-done research. In many instances, the decision to award tenure to an individual is also based on this factor. Researchers who make their data available along with their published papers increase the chance that someone doing similar research will come across their findings, which in turn increases the chance that their work will be cited in other publications.  Essentially, more available data leads to more citations.

Current Problems and Possible Solutions

Influence of Career Ambitions

There is incredible pressure in the research community to secure grants and publish papers. Without funding, researchers cannot gather the data necessary to publish papers, and their careers cannot advance. In the face of these harsh realities, proper storage of data drops to the bottom of the priority list, and sharing data too early is considered risky for publication purposes. Theft of data is a real concern: "People steal data, that does happen," said Dr. Cooper (personal communication, March 16, 2013).

There is also the matter of time and effort. As Borgman, Enyedy, and Wallis note in their 2007 paper, organizing data to be accessible to the wider public takes more time than making it available to a small lab group, and managing data to be usable over the long term is more difficult than creating the data summaries that appear in published papers. In this context, researchers have little reason to manage their data effectively, since it is publications that are rewarded, not data management (Borgman, Enyedy, & Wallis, 2007).

Solution. Widespread recognition of the aforementioned benefits of increased citations and improved reputation might change the mindset of current and future researchers. It would add value to the task of data management and sharing and provide incentives to make such activities a priority. Researchers would be willing to devote more time and energy to a task that promises a strong return, such as grants or tenure.

Some data repositories address the risk of releasing data too early by allowing an embargo period between submission of a paper and availability of the related data. Dryad is a digital repository that stores and archives "data files associated with any published article in the sciences or medicine" (Dryad, 2013). By default, the repository places an embargo on data related to an article until that article is published in a journal. Researchers may also select the "no-questions-asked" embargo option, which keeps the embargo in place for one year after publication. This allows the data to be professionally managed while the original researcher gains recognition for the work, all before the data become publicly available (Dryad, 2013).

Lack of Incentives from Grant Agencies

The driving force behind any scientific research is grant funding. Several major grant agencies, such as the National Institutes of Health (NIH) and the National Science Foundation (NSF), require a data sharing and management plan to be included with the grant proposal. The NIH data sharing policy states that it "endorses the sharing of final research data to serve these and other important scientific goals and expects and supports the timely release and sharing of final research data from NIH-supported studies for use by other researchers" (National Institutes of Health, 2012b). The NSF has a similar policy, which states that "Investigators are expected to share with other researchers, at no more than incremental cost and within a reasonable time, the primary data, samples, physical collections, and other supporting materials created or gathered in the course of work under NSF grants" (National Science Foundation, 2013).

Such policies are encouraging, yet the agencies fail to provide any incentives or sanctions (Pryor, 2012). In Dr. Christine Portfors's experience, NSF grants require a data management plan, but the agency does not check up on it, "so there is a disconnect" (personal communication, March 18, 2013). With no rewards or punishments to encourage follow-through, it is entirely possible that a researcher may include a plan to share data in a field-specific repository to satisfy the grant proposal requirements and then fail to do so at the conclusion of the study, rendering such policies useless. In addition, even when a funding agency has a requirement, it often does not provide any guidelines or assistance to the researcher on how to manage data. Perhaps even worse than a perfunctory policy is a nonexistent one; the policies described above are not yet widespread practice (Arzberger et al., 2004).

Solution. What is needed is a system of incentives and sanctions along with assistance. Rewards and assistance should be discipline-based to provide the most encouragement for researchers to share their data and attain recognition for their work (Arzberger et al., 2004). A present-day example of this practice is the Protein Data Bank, a repository for protein structures that began in 1971 and has grown to contain more than 40,000 entries (Berman, 2007). In the late 1980s, a group of researchers who felt strongly about data sharing worked with committees from the International Union of Crystallography Commission on Biological Macromolecules, the American Crystallographic Association, and the United States National Committee for Crystallography to create an official policy requiring data deposition from crystallographers (Berman, 2007). The policy states that structure coordinates must be deposited at publication and released within a year, and that structure factors must be both deposited and released within four years (Berman, 2007). To further encourage researchers to comply with these guidelines, many major journals, such as the Journal of Biological Chemistry, require that they be followed for publication (Berman, 2007).

Lack of Universal Plans

Requiring data management plans as part of grant proposals is a step forward for data sharing, yet the lack of existing standardized data management procedures prevents this step from reaching its full potential. The first problem is that collection and archive plans used by libraries for print collections do not easily translate to digital collections. Secondly, variation in data types makes it difficult to establish one set of universal standards for shared data and procedures for depositing data in repositories.

Libraries have long-standing acquisition, selection, and preservation policies for print collections, but these are not easy to apply directly to collections of data. With print resources, librarians consider things such as patron demand, immediate and long-term cost, and physical maintenance. With the influx of digital data in recent decades, those strategies have had to be modified to suit this different format, with new emphasis on digital storage, ongoing technical maintenance, and early decisions to prevent loss of data (Harvey, 2007).

The variety of data types and the legal issues surrounding data are so complex that developing a single universal policy and procedure is impossible. Different fields collect data differently, whether as images, structures, numbers, or some combination. For example, the Cell Image Library is a public-access repository that stores cell images along with information such as the cell source, how the image was acquired, and the conditions under which it was acquired (Cell Image Library). The GEON repository is also public access and stores three-dimensional elevation models and topographic data (Geosciences Network). It would be impractical to attempt to establish uniformity between two such repositories because the data they manage are vastly different.

Solution. The best way to ensure the quality of the data in any given repository is to create standards within each scientific field. Universal standards are impossible because data collection varies so widely across fields, but governing bodies within each field can devise a system of standards to ensure data integrity. This would make intra-discipline data sharing faster and easier and take the guesswork out of who is responsible for managing data.

This approach could also be applied to devising an appraisal and selection policy. Graham Pryor suggests that such a policy might include a series of questions that must be adequately addressed before a particular data set is selected for preservation. For example: Does the data set "support research requirements of the user community"? Are there "any legal obligations for the data"? Does accessibility need "to be maintained through changes of technology" (Pryor, 2012)? The best such policy would address the needs of both the research community and information professionals.
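One way to picture such an appraisal policy is as a simple checklist applied to every candidate data set. The sketch below, written in Python, paraphrases Pryor's example questions as pass/fail criteria; the function and field names are hypothetical, meant only to illustrate the logic rather than to represent any repository's actual procedure.

```python
# Illustrative sketch of an appraisal checklist based on the questions above.
# The criteria names paraphrase Pryor (2012); they are not a standard API.
def should_preserve(dataset):
    """Return True only if a candidate dataset meets every appraisal criterion."""
    criteria = [
        dataset.get("supports_user_community", False),    # research value to the field
        dataset.get("legal_obligations_resolved", False),  # ownership/privacy questions settled
        dataset.get("accessibility_maintainable", False),  # can survive changes of technology
    ]
    return all(criteria)

candidate = {
    "supports_user_community": True,
    "legal_obligations_resolved": True,
    "accessibility_maintainable": False,  # e.g., stored only on obsolete media
}
print(should_preserve(candidate))  # -> False: not selected for preservation
```

A real policy would likely weigh such questions rather than treat them as strict gates, and would be tailored to the standards of the discipline.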

The Human Variome Project held a forum in 2009 to discuss establishing standards for the submission of genetic data. The project seeks to collect genetic information as it relates to human health (Human Variome Project, 2013). The forum discussed formalizing "mutation nomenclature, description and annotation, clinical ontology, means to better characterize unclassified variants (UVs), and methods to capture mutations from diagnostic laboratories for broader distribution to the medical genetics research community" (Howard et al., 2010). These actions would eliminate guesswork and ad hoc data management methods, making it easier for researchers to share and access data relevant to their work and advancing the goals of the organization.

Questions of Intellectual Property and Privacy

Collection and analysis of data involve a huge investment of time and energy on the part of researchers, imparting a strong sense of ownership (Fry, Lockyer, Oppenheim, Houghton, & Rasmussen, 2009). With the increase in interdisciplinary research, the question of ownership is already ambiguous (Borgman et al., 2007), even without the additional complications inherent in data repositories. Legal issues regarding data, especially data of a sensitive nature, also concern researchers. In Martin Feijen's review of literature from Europe, Australia, the United Kingdom, and the United States, forty-one percent of researchers cited legal issues, and the same number cited misuse of data, as their main concerns regarding data sharing (Feijen, 2011).

Disputes over data ownership may also discourage researchers from making data widely available. This is less of an issue in academic institutions and more of a concern in corporate and government research. At academic institutions, it is generally recognized that the researcher and institution share ownership of data, but that researchers may manage the data as they see fit (Culliton, 1988). At corporate and especially government institutions, research is owned by the institution, and the researcher may not have the authority to initiate data sharing (Culliton, 1988). Sensitive information acquired by corporations or by the original researchers is often destroyed due to privacy policies (King, 2011). The problem with this policy is that it makes the results impossible to replicate and, in extreme cases, may lead to questions of fraud (King, 2011).

Privacy is an especially pressing issue when human research subjects are involved. Anonymity helps, but may not be enough: in a 1997 paper, Latanya Sweeney demonstrated how supposedly anonymous data could be cross-referenced to reveal the identity of the people the data described. This tends to be an issue in medical research, where demographic, clinical, and genomic information is collected and stored. There can be legal and financial consequences for institutions that fail to adequately protect their participants' privacy (Gkoulalas-Divanis & Loukides, 2011).

In terms of privacy, researchers' concerns range from protecting the identity and physical safety of their subjects, to preventing the theft of valuable objects, to ethical questions about who can access their data (Jahnke & Asher, 2012). Such concerns may contribute to the all-too-common failure of researchers to manage their data and secondary resources; if the results are not going to be made publicly available, why invest the time and energy? (Jahnke & Asher, 2012).

Solution. The Genetic Testing Registry, launched by the NIH on February 29, 2012, provides a model for handling medical data in a way that protects privacy. The database allows researchers to upload information relating to each genetic test, including "the purpose of each genetic test and its limitations; the name and location of the test provider; whether it is a clinical or research test; what methods are used; and what is measured" (National Institutes of Health, 2012a). It also offers information regarding analytic validity, clinical validity, and clinical utility. However, no data regarding the people who underwent the tests, or their test results, are included. The utility of the database lies purely in its capacity to link researchers and medical professionals to tests, conditions, and genes. In this way, the data collected by these genetic tests remain available to science without endangering the privacy of patients.
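The structure of such an entry can be pictured with a hypothetical sketch: the record below, modeled loosely on the fields described above, contains information about the test itself but deliberately omits anything about the individuals tested. All names and values are invented for illustration; the actual registry defines its own schema.

```python
# Hypothetical registry entry, loosely modeled on the fields described above.
# All names and values are illustrative; the real Genetic Testing Registry has its own schema.
registry_entry = {
    "test_name": "Example hereditary condition gene panel",
    "purpose": "diagnostic",
    "limitations": "does not detect large structural rearrangements",
    "provider": {"name": "Example Genetics Lab", "location": "Portland, OR"},
    "test_type": "clinical",            # clinical vs. research test
    "methods": ["targeted sequencing"],
    "measures": "sequence variants in the panel genes",
}

# The privacy protection is structural: the schema simply has no place for
# patient identifiers or individual results.
assert "patient_id" not in registry_entry
assert "test_results" not in registry_entry
```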

Lack of Training in Data Curation

It is easy to say that researchers need to spend more time sharing and managing their data, but it is not fair to expect them to do so with their current level of training in that area. A survey of forty-eight ecology programs showed that more than half did not include data management as part of the curriculum (Carly Strasser, personal communication). In addition, Drs. Cooper, Portfors, and Tissot all said that they never received formal training in data management (personal communication, March 18, 2013).

The process of sharing and managing data is inherently technical, which can be daunting to a researcher with no experience in such areas. It requires a huge investment of time to learn about the repositories and databases available in one's field and to keep up with rapidly changing technologies. It may seem easier for researchers simply to maintain the status quo of publishing as many papers as possible and keeping the data private. When researchers do choose to put effort into managing data, they must do so in an ad hoc fashion, which may not result in the most effective methods (Jahnke & Asher, 2012). The 2008-2009 PARSE Insight survey, conducted by a consortium of European scientific research institutions, asked researchers about their attitudes toward the preservation of digital information; the results showed that researchers feel more knowledge and expertise are important to improving the current state of digital preservation (Kuipers & van der Hoeven, 2010).

Data curation education could also stand improvement on the library and information science side. Only five master's programs in library and information science offer specialties in data curation (Jahnke & Asher, 2012); other programs provide training only in individual courses embedded within a general master's degree in library science (Jahnke & Asher, 2012). This means there is a shortage of data curation specialists currently operating in the field.

Solution. Placing greater emphasis on data management and encouraging data sharing while future researchers are still in school is necessary to solve this problem. Science programs can include a data management course as a requirement for an undergraduate degree, and undergraduate laboratory courses can require maintaining an accurate and honest data journal. This emphasis should continue at the graduate level, with teachers and mentors educating graduate students about field-specific databases and their uses, as well as the newest data management methods. Information professionals can act as consultants for students seeking more knowledge about the technical and legal aspects of data curation and sharing. Once these students begin working as professional researchers, they will have a strong foundation in the essentials of managing data.

The University of Oregon Libraries has developed a program that could be a model for other institutional libraries to improve data curation education. The library website includes an outline of what a data management plan should look like and an informational section that includes links and overviews for topics like data repositories, intellectual property, and data storage. The library hosts an institutional repository where researchers can submit data for preservation and archiving, all of which will be available for citation. Perhaps their most helpful service is a series of workshops for faculty, graduate students, and researchers to teach them about data management. This program provides a way for information professionals and researchers to work together to properly manage data with a common understanding (University of Oregon, n.d.).

Haendel and Vasilevsky have also attempted to address this problem through their work on eagle-i by designing the system from the researcher's perspective to increase ease of use. They helped develop curation guidelines and added "tool tips" to the data collection interface to eliminate guesswork and promote uniformity across the database. The goal is to make it quick and simple for researchers to provide information to the database (Vasilevsky et al., 2011).

Rapidly Changing Technologies

Data storage has come a long way since the first hard disk drive in the 1950s, which was the size of a small refrigerator and had a five-megabyte capacity. Since then, society has progressed through the LaserDisc, the floppy disk, VHS, and the compact disc. Current researchers often store experimental results on servers with multiple backup copies, and may also use more recent cloud technology. Yet data still exist on outdated storage devices that are no longer compatible with current technologies and cannot be accessed by researchers seeking to make comparisons with past data.

For example, Dr. Tissot's lab has many outmoded storage devices, most of which contain the results of marine surveys performed over many years. Each requires a specific type of machine to access the data, some of which no longer exist. The bulk of the information is video taken during survey dives, stored on VHS tapes and cassettes (B. Tissot, personal communication, March 7, 2013). Dr. Tissot is not averse to sharing information; he has built his career on collaboration. Yet it would be extremely difficult, if not impossible, to share the data he has collected over the years with other researchers simply because the technology required to access them is obsolete.

Situations like this are common with scientific researchers who have been conducting research for a number of years. Proper data management includes ensuring that data remains compatible with current technologies. Doing so prevents data from becoming damaged or inaccessible. This task is often deemed a low priority next to the pressure to conduct new research and publish papers. It is also possible that researchers do not have the money, resources, or expertise to update technologies.

Solution. The technology problem is unique in that it can be readily addressed once several of the problems above are solved. With the previously mentioned career emphasis on data sharing, the technical aspects of data management would become more important, and researchers may be more willing to invest the effort in keeping data technologically current. This is also where a stronger partnership with information professionals would benefit researchers: researchers could outsource the task of managing and updating storage technologies to those trained to do so. Grant agencies that require a data management plan could also include additional funding to purchase the appropriate technologies and hire information professionals.

Summary

The catalyst for change will be a paradigm shift in the research world that places more emphasis on data management as a means to facilitate data sharing. As more researchers become aware of the benefits of data management and sharing, more effort will be expended to solve the problems outlined here. The benefits are significant; sharing data with the research community allows for rapid innovation, which in turn improves the state of society. Several groups and institutions have already taken the first steps, and the rest of the research community can use them as models for their own changes. All of this can be made possible if every researcher makes a conscious, dedicated effort to curate their data. It is a small thing that can have huge impacts on science both now and in the future.

References

Akmon, D., Daniels, M., Hedstrom, M., & Zimmerman, A. (2011). The application of archival concepts to a data-intensive environment: working with scientists to understand data management and preservation needs. Archival Science, 11, 329-348. doi: 10.1007/s10502-011-9151-4

Arzberger, P., Schroeder, P., Beaulieu, A., Bowker, G., Casey, K., Laaksonen, L., et al. (2004). Promoting access to public research data for scientific, economic, and social development. Data Science Journal, 3, 135-152.

Berman, H. M. (2007). The Protein Data Bank: a historical perspective. Foundations of Crystallography, A64, 88-95. doi:10.1107/S0108767307035623

Borgman, C. L., Enyedy, N., & Wallis, J. C. (2007). Little science confronts the data deluge: habitat ecology, embedded sensor networks, and digital libraries. International Journal on Digital Libraries, 7, 17-30. doi: 10.1007/s00799-007-0022-9

Cell Image Library. (n.d.). About. Retrieved from http://www.cellimagelibrary.org/pages/about

Culliton, B. J. (1988). Authorship, data ownership examined. Science, 242, 658. doi:10.1126/science.3187511

Dryad. (2013). About Dryad.  Retrieved from http://datadryad.org/pages/repository

Fang, F. C., Steen, R. G., & Casadevall, A. (2012). Misconduct accounts for the majority of retracted scientific publications. Proceedings of the National Academy of Sciences of the United States of America, 109, 17028-17033. doi: 10.1073/pnas.1212247109

Feijen, M. (2011). What researchers want. SURF. Retrieved from http://www.surf.nl/nl/publicaties/Documents/What_researchers_want.pdf

Fry, J., Lockyer, S., Oppenheim, C., Houghton, J., & Rasmussen, B. (2009). Identifying benefits arising from the curation and open sharing of research data produced by UK Higher Education and research institutes. Retrieved from JISC Repository: http://repository.jisc.ac.uk/279/2/JISC_data_sharing_finalreport.pdf

Galaxy Zoo. (n.d.). The story so far. Retrieved from http://www.galaxyzoo.org/#/story

Geosciences Network. (n.d.). Vision. Retrieved from http://www.geongrid.org/index.php/about/

Gkoulalas-Divanis, A., & Loukides, G. (2011). Medical data sharing: privacy challenges and solutions. Tutorial presented at the annual meeting of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Athens, Greece. Retrieved from http://www.zurich.ibm.com/medical-privacy-tutorial/Tutorial_MEDPRIV_GL.pdf

Harvey, R. (2007). Installment on "Appraisal and Selection". In Digital Curation Centre (Ed.), Digital Curation Manual (pp.1-39). Retrieved from http://www.dcc.ac.uk/sites/default/files/documents/resource/curation-manual/chapters/appraisal-and-selection/appraisal-and-selection.pdf

Howard, H. J., Horaitis, O., Cotton, R. G. H., Vihinen, M., Dalgleish, R., Robinson, P., & Tuffery-Giraud, S. (2010). The Human Variome Project (HVP) 2009 Forum "Towards Establishing Standards". Human Mutation, 31. Retrieved from http://onlinelibrary.wiley.com/doi/10.1002/humu.21175/pdf

Human Variome Project. (2013). About the Human Variome Project. Retrieved from http://www.humanvariomeproject.org/index.php/about

Jahnke, L. M., & Asher, A. (2012). The problem of data: data management and curation practices among university researchers (CLIR Report 154). Retrieved from Council on Library and Information Resources website: http://www.clir.org/pubs/reports/pub154/pub154.pdf

King, G. (2011).  Ensuring the data-rich future of the social sciences. Science, 331, 719-721. doi: 10.1126/science.1197872

Kuipers, T., & van der Hoeven, J. (2010). Insight into digital preservation of research output in Europe. Retrieved from Parse-Insight website: http://www.parse-insight.eu/downloads/PARSE-Insight_D3-6_InsightReport.pdf

National Institutes of Health. (2012a). Confused by genetic tests? NIH's new online tool may help. Retrieved from http://www.nih.gov/news/health/feb2012/od-29.htm

National Institutes of Health. (2012b). NIH Grants Policy Statement. Retrieved from http://grants.nih.gov/grants/policy/nihgps_2012/nihgps_ch8.htm#_Toc271264948

National Science Foundation. (2013). Chapter IV – Other Post Award Requirements and Considerations. Retrieved from http://www.nsf.gov/pubs/policydocs/pappguide/nsf13001/aag_6.jsp

Pryor, G. (2012). Why manage research data?. In G. Pryor (Ed.), Managing research data (pp. 1-16). London: Facet Publishing.

Sweeney, L. (1997). Weaving technology and policy together to maintain confidentiality. The Journal of Law, Medicine & Ethics, 25, 98-110. Retrieved from http://dataprivacylab.org/dataprivacy/projects/law/jlme.pdf

University of Oregon. (n.d.). Research data management. Retrieved from http://library.uoregon.edu/datamanagement/index.html

Vasilevsky, N., Johnson, T., Corday, K., Torniai, C., Brush, M., Segerdell, E., Wilson, M., Shaffer, C., Robinson, D., & Haendel, M. (2011). Research resources: curating the new eagle-i discovery system. Database: The Journal of Biological Databases and Curation, 2012, n.p. Retrieved from http://database.oxfordjournals.org/content/2012/bar067.full

Weiss, R. (2005, June 9). Many scientists admit to misconduct. Washington Post. Retrieved from http://www.washingtonpost.com/wp-dyn/content/article/2005/06/08/AR2005060802385.html

Zooniverse. (n.d.). Projects. Retrieved from https://www.zooniverse.org/projects

