CODATA: South Africa: Data Citation Workshop 2015

CODATA: The South Africa Data Citation Workshop, titled “Data citation as a catalyst for good RDM practices”, was held on 10 December 2015 in the Knowledge Commons building (Ulwazi) of the Council for Scientific and Industrial Research (CSIR), Pretoria, South Africa.

Agenda point | Presenter
Science as an open enterprise – looking at the wider context for research data management (RDM) as well as Open Science | Prof Colin Wright, Department of Science & Technology
The international drive and importance of data citation. RDA Task group as well as any other relevant international initiatives | Simon Hodson, Executive Director of CODATA
CODATA Data Citation Task Team & its outputs: The citation principles developed | Martie van Deventer, CODATA Data Citation Task Team Member
Research using public funding: The South African case – (a) NRF Open Access Statement, (b) Requirements to be met | Daisy Selematsela, Executive Director: KMC
NeDICC – Its role and plans for 2016 | Lucia Lötter, Chairperson: NeDICC
DIRISA Progress & function/role – (a) Repository certification: A network of trusted data repositories for South Africa; (b) DIRISA repository; (c) Encouraging good research practice – also for proper citation; (d) National digital object identifier infrastructure; (e) Researcher IDs | Anwar Vahed, Manager: DIRISA
“$100 is not much to you”: Unseen Barriers in Open Science | Louise Bezuidenhout, University of Exeter and WITS
Workshop discussion: Data citation | Concerns, success stories. Will data citation really be seen as a reward for the proper management of research data?
Workshop discussion: Shared language / Jargon | Clarification of terms (some examples): Ownership / Management / Stewardship; Persistent identifiers; Open, Embargoed and Restricted access; Data center, Preservation; DMP / RDM / RDA
Workshop discussion: Roles and responsibilities | The RDM role and responsibility of each of the following: Funders; Research offices / officers; Libraries; Ethics committees; IP / Licensing office; The researchers; and the Role / responsibility for a data center
Roadmap session 1: A proposed national repository. Institutions have three options when making their research data accessible – (a) Establish your own managed data repository; (b) Make use of an established subject / domain repository; or (c) Use DIRISA’s repository (prototype will be demonstrated) | Anwar Vahed, Heila Pienaar, Isak van der Walt
  • Join the planning and development team … Contribute to the requirements document for the repository (including the citation reference / protocol) and preservation management & infrastructure.
  • Ensure that all repositories could be harvested to allow for a national ‘register’ of South African data sets.
Roadmap session 2: A proposed national DMP tool. Institutions could implement / develop their own Data Management Planning (DMP) tools or collaborate and make use of a reliable national resource. | Anwar Vahed, Johann van Wyk, Isak vd Walt
  • RDM Plans – what are the funder requirements
  • Specify the requirements for a national initiative.

Journal of Empirical Research on Human Research Ethics: Special issue

See “Ethics and sharing individual-level health research data from low and middle income settings”, vol 10(3), July 2015, for more information.

Big Data & Society: Read Top Articles

Received via HSRC from

  • Big Data, new epistemologies and paradigm shifts by Rob Kitchin. Abstract: This article examines how the availability of Big Data, coupled with new data analytics, challenges established epistemologies across the sciences, social sciences and humanities, and assesses the extent to which they are engendering paradigm shifts across multiple disciplines. In particular, it critically explores new forms of empiricism that declare ‘the end of theory’, the creation of data-driven rather than knowledge-driven science, and the development of digital humanities and computational social sciences that propose radically different ways to make sense of culture, history, economy and society. It is argued that: (1) Big Data and new data analytics are disruptive innovations which are reconfiguring in many instances how research is conducted; and (2) there is an urgent need for wider critical reflection within the academy on the epistemological implications of the unfolding data revolution, a task that has barely begun to be tackled despite the rapid changes in research practices presently taking place. After critically reviewing emerging epistemological positions, it is contended that a potentially fruitful approach would be the development of a situated, reflexive and contextually nuanced epistemology.
  • Surveillance, Snowden, and Big Data: Capacities, consequences, critique by David Lyon. Abstract: The Snowden revelations about National Security Agency surveillance, starting in 2013, along with the ambiguous complicity of internet companies and the international controversies that followed provide a perfect segue into contemporary conundrums of surveillance and Big Data. Attention has shifted from late C20th information technologies and networks to a C21st focus on data, currently crystallized in “Big Data.” Big Data intensifies certain surveillance trends associated with information technology and networks, and is thus implicated in fresh but fluid configurations. This is considered in three main ways: One, the capacities of Big Data (including metadata) intensify surveillance by expanding interconnected datasets and analytical tools. Existing dynamics of influence, risk-management, and control increase their speed and scope through new techniques, especially predictive analytics. Two, while Big Data appears to be about size, qualitative change in surveillance practices is also perceptible, accenting consequences. Important trends persist – the control motif, faith in technology, public-private synergies, and user-involvement – but the future-orientation increasingly severs surveillance from history and memory and the quest for pattern-discovery is used to justify unprecedented access to data. Three, the ethical turn becomes more urgent as a mode of critique. Modernity’s predilection for certain definitions of privacy betrays the subjects of surveillance who, so far from conforming to the abstract, disembodied image of both computing and legal practices, are engaged and embodied users-in-relation whose activities both fuel and foreclose surveillance.
  • Emerging practices and perspectives on Big Data analysis in economics: Bigger and better or more of the same? by Linnet Taylor, Ralph Schroeder, and Eric Meyer. Abstract: Although the terminology of Big Data has so far gained little traction in economics, the availability of unprecedentedly rich datasets and the need for new approaches – both epistemological and computational – to deal with them is an emerging issue for the discipline. Using interviews conducted with a cross-section of economists, this paper examines perspectives on Big Data across the discipline, the new types of data being used by researchers on economic issues, and the range of responses to this opportunity amongst economists. First, we outline the areas in which it is being used, including the prediction and ‘nowcasting’ of economic trends; mapping and predicting influence in the context of marketing; and acting as a cheaper or more accurate substitute for existing types of data such as censuses or labour market data. We then analyse the broader current and potential contributions of Big Data to economics, such as the ways in which econometric methodology is being used to shed light on questions beyond economics, how Big Data is improving or changing economic models, and the kinds of collaborations arising around Big Data between economists and other disciplines.
  • Networks of digital humanities scholars: The informational and social uses and gratifications of Twitter by Anabel Quan-Haase, Kim Martin, and Lori McCay-Peet. Abstract: Big Data research is currently split on whether and to what extent Twitter can be characterized as an informational or social network. We contribute to this line of inquiry through an investigation of digital humanities (DH) scholars’ uses and gratifications of Twitter. We conducted a thematic analysis of 25 semi-structured interview transcripts to learn about these scholars’ professional use of Twitter. Our findings show that Twitter is considered a critical tool for informal communication within DH invisible colleges, functioning at varying levels as both an information network (learning to ‘Twitter’ and maintaining awareness) and a social network (imagining audiences and engaging other digital humanists). We find that Twitter follow relationships reflect common academic interests and are closely tied to scholars’ pre-existing social ties and conference or event co-attendance. The concept of the invisible college continues to be relevant but requires revisiting. The invisible college formed on Twitter is messy, consisting of overlapping social contexts (professional, personal and public), scholars with different habits of engagement, and both formal and informal ties. Our research illustrates the value of using multiple methods to explore the complex questions arising from Big Data studies and points toward future research that could implement Big Data techniques on a small scale, focusing on sub-topics or emerging fields, to expose the nature of scholars’ invisible colleges made visible on Twitter.
  • Small Big Data: Using multiple data-sets to explore unfolding social and economic change by Emily Gray, Will Jennings, Stephen Farrall, and Colin Hay. Abstract: Bold approaches to data collection and large-scale quantitative advances have long been a preoccupation for social science researchers. In this commentary we further debate over the use of large-scale survey data and official statistics with ‘Big Data’ methodologists, and emphasise the ability of these resources to incorporate the essential social and cultural heredity that is intrinsic to the human sciences. In doing so, we introduce a series of new data-sets that integrate approximately 30 years of survey data on victimisation, fear of crime and disorder and social attitudes with indicators of socio-economic conditions and policy outcomes in Britain. The data-sets that we outline below do not conform to typical conceptions of ‘Big Data’. But, we would contend, they are ‘big’ in terms of the volume, variety and complexity of data which has been collated (and to which additional data can be linked) and ‘big’ also in that they allow us to explore key questions pertaining to how social and economic policy change at the national level alters the attitudes and experiences of citizens. Importantly, they are also ‘small’ in the sense that the task of rendering the data usable, linking it and decoding it, required both manual processing and tacit knowledge of the context of the data and intentions of its creators.

New service announced by DataCite

Some of the da|ra data centers have already been included in the Thomson Reuters Data Citation Index (TR DCI). These are the main factors Thomson Reuters (TR) looks for when evaluating a potential resource for inclusion:

  • Is the source regularly maintained?
  • Does it contain the minimum metadata fields for inclusion (title, authors, created date, persistent identifier, URL/DOI)?
  • Are all required fields in English?
  • Does the source provide unique landing pages?
  • Does the source provide permanent IDs?
  • Does the resource have a commitment to data curation?
  • Is there evidence that the resource’s data have been cited in research literature?
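
The minimum-metadata criterion in the list above can be checked mechanically before records are submitted. The following is only a minimal sketch in Python: the field names (`title`, `authors`, `created_date`, `persistent_identifier`, `url`, `doi`) and the function `missing_required_fields` are hypothetical and are not taken from the da|ra or Thomson Reuters schemas; only the list of required items comes from the criteria above.

```python
# Hypothetical field names; only the list of required items comes from the
# inclusion criteria (title, authors, created date, persistent identifier, URL/DOI).
REQUIRED_FIELDS = ("title", "authors", "created_date", "persistent_identifier")


def is_blank(value):
    """Return True if a metadata value is missing or empty."""
    if value is None:
        return True
    if isinstance(value, str):
        return not value.strip()
    if isinstance(value, (list, tuple)):
        return len(value) == 0
    return False


def missing_required_fields(record):
    """List the required metadata items that a record (a dict) does not supply."""
    missing = [name for name in REQUIRED_FIELDS if is_blank(record.get(name))]
    # The criteria ask for "URL/DOI", so either one is accepted here (an assumption).
    if is_blank(record.get("url")) and is_blank(record.get("doi")):
        missing.append("url_or_doi")
    return missing


# Made-up example record, for illustration only.
example_record = {
    "title": "Example survey dataset",
    "authors": ["A. Researcher"],
    "created_date": "2015-12-10",
    "persistent_identifier": "doi:10.1234/example",
    "doi": "10.1234/example",
}

print(missing_required_fields(example_record))
print(missing_required_fields({"title": "Untitled", "authors": []}))
```

A check like this covers only the presence of fields. The other criteria (regular maintenance, commitment to curation, evidence of citation) still need human judgement.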

If you want your data to be included in the TR DCI, you should consider improving the quality of the metadata you are providing to da|ra. Refer to Best Practice Assurance of metadata quality.

News from DataCite: DataCite has announced a new service that might be of interest to you. It is available via the new DataCite Labs Search and includes all functionality of the DataCite/ORCID claiming tool, which was developed in the EC-funded ODIN Project. It allows principal investigators and authors to add works with DataCite DOI names to their ORCID profiles.

The tool is open for testing. I am happy to forward your feedback and comments to the DataCite developer, or you can contact Martin Fenner directly.

Furthermore, DataCite has started a Data Level Metrics (DLM) pilot project. Data level metrics are an emerging activity designed to track and measure data usage, and DataCite has been involved in DLM activities for some time. See the recent blog post for more information on MDC.

PS: Please visit the da|ra news page or follow them on Twitter (@dara_info) for updates and news.

Sharing Research Data and Intellectual Property Law: A Primer by Michael W Carroll

Abstract: Sharing research data by depositing it in connection with a published article or otherwise making data publicly available sometimes raises intellectual property questions in the minds of depositing researchers, their employers, their funders, and other researchers who seek to reuse research data. In this context or in the drafting of data management plans, common questions are (1) what are the legal rights in data; (2) who has these rights; and (3) how does one with these rights use them to share data in a way that permits or encourages productive downstream uses? Leaving to the side privacy and national security laws that regulate sharing certain types of data, this Perspective explains how to work through the general intellectual property and contractual issues for all research data.

Citation: Carroll MW (2015) Sharing Research Data and Intellectual Property Law: A Primer. PLoS Biol 13(8): e1002235. doi:10.1371/journal.pbio.1002235.
Published: August 27, 2015, Copyright: © 2015 Michael W. Carroll. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

NISO (National Information Standards Organization) Publication on Research Data Management

The National Information Standards Organization (NISO) has launched a new Primer Series on information management technology issues with the publication of the first primer on the topic of RDM. The primer on Research Data Management provides an overview of how data management has changed in recent years, and outlines best practices for the collection, documentation, and preservation of research data. The importance of creating a data management plan (DMP) before beginning a research data project is emphasized. Crucial questions regarding how the data will be managed are answered ahead of time in a DMP, thus making it easier for the researcher to collect and document the data properly for future use and reuse. Creating research data that is easily reproducible and transparent is the ultimate goal, and following the guidelines in this primer can help educate researchers to ensure their data is available for others. The differences between publishing papers and publishing datasets and the citation challenges the data community are working on solving are also discussed.

Full text: Research Data Management by Carly Strasser
